The proliferation of user-generated content in today’s digital landscape has increased reliance on online reviews as a source for decision-making in the hospitality industry. There has been growing interest in automating this decision-support mechanism through recommender systems. However, training an effective algorithm typically requires a large labelled corpus, and where such data are lacking, human annotators must be engaged to develop them. Although manual annotation can enrich the training corpus, it can also introduce errors and annotator bias, including subjectivity and cultural bias, which affect data quality and model fairness. This paper examines how well ratings derived from different annotation sources align with the original ratings provided by customers, which are treated as the ground truth. It compares predictions from Generative Pre-trained Transformer (GPT) models against ratings assigned by Amazon Mechanical Turk (MTurk) workers. The GPT-4o annotation outputs closely mirror the original ratings, showing a strong positive correlation (0.703) with them, whereas GPT-3.5 Turbo and MTurk show weaker correlations (0.663 and 0.15, respectively). The large discrepancy between the original ratings and the MTurk annotations, which are driven largely by human perception, likely stems from subjectivity, quantitative bias, and variability in context comprehension. These findings suggest that advanced models such as GPT-4o can substantially reduce the bias and variability introduced by MTurk annotators, thereby improving the alignment of predicted ratings with actual user sentiment as expressed in textual reviews.
Moreover, with the per-annotation cost of an LLM shown to be one-thirtieth that of MTurk, the proposed LLM-based textual review annotation approach is also cost-effective for the hospitality industry.
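The alignment measure described above can be sketched as follows. The abstract does not specify which correlation coefficient was used, so this is a minimal illustration assuming Pearson correlation between original customer ratings and ratings from one annotation source; the rating values shown are hypothetical, not taken from the study's data.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of ratings."""
    mx, my = mean(xs), mean(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    num = sum(a * b for a, b in zip(dx, dy))
    den = (sum(a * a for a in dx) * sum(b * b for b in dy)) ** 0.5
    return num / den

# Hypothetical 1-5 star ratings: ground-truth customer ratings vs.
# ratings assigned by one annotation source for the same reviews.
original = [5, 4, 2, 5, 1, 3, 4, 2]
annotated = [5, 4, 3, 4, 1, 3, 5, 2]

r = pearson(original, annotated)
```

A value of `r` near 1 indicates that the annotation source closely tracks the original ratings, as reported for GPT-4o (0.703), while a value near 0 indicates weak alignment, as reported for MTurk (0.15).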