Ngo-Ye and Sinha (2014) developed a text regression model for predicting the helpfulness of 7,465 online restaurant reviews posted at Yelp.com and 584 book reviews posted at Amazon.com. According to the authors, most review opinion mining studies focused on sentiment rather than quality and ignored reviewer engagement characteristics. The authors argued that reviewer characteristics influence the perception of helpfulness and that decision makers at online organizations who rely on user-generated content to gain marketing advantage could better leverage that content if they understood what makes it helpful to others and if they could predict that helpfulness.
Ngo-Ye and Sinha took a hybrid approach that combined three techniques: the bag-of-words (BOW) model, a type of vector space model (VSM) that represents a document as an unordered collection of its words; recency, frequency, and monetary (RFM) analysis, a tool commonly used for maximizing response rates in direct marketing; and correlation-based feature selection (CFS), a dimension reduction technique. They compared the predictive strength of their proposed model with that of the ZeroR model, a simple baseline method that relies on the target alone and ignores predictor variables. The authors further experimented with and cross-compared four common index weighting schemes: binary occurrence, term occurrence, term frequency, and term frequency-inverse document frequency. All models used the number of useful votes as the target variable. A summary of the conceptual models and predictor variables used by the authors is shown in the following table.
|Model|Predictor variables|
|RFM|Recency, Frequency, Monetary value|
|BOW|Word1, Word2, … WordN|
|BOW/CFS|Word1, Word2, … Wordp|
|BOW/CFS + RFM|Word1, Word2, … Wordp, Recency, Frequency, Monetary value|
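The four index weighting schemes named above can be illustrated with a minimal sketch. The definitions used here are common textbook ones and are assumptions, not necessarily the exact formulas the authors applied: term frequency is taken as the count divided by document length, and inverse document frequency as the natural log of (number of documents / documents containing the term). The two toy reviews and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the real study used Yelp.com and Amazon.com reviews.
REVIEWS = ["great food great service", "slow service"]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def weight_documents(docs, scheme):
    """Return (vocabulary, document vectors) under one weighting scheme.

    scheme is one of "binary", "occurrence", "tf", "tfidf".
    These definitions are assumed, not taken from the original paper.
    """
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        row = []
        for term in vocab:
            count = counts[term]
            if scheme == "binary":
                weight = 1.0 if count > 0 else 0.0
            elif scheme == "occurrence":
                weight = float(count)
            elif scheme == "tf":
                weight = count / total if total else 0.0
            elif scheme == "tfidf":
                idf = math.log(n_docs / doc_freq[term])
                weight = (count / total) * idf if total else 0.0
            else:
                raise ValueError(f"unknown scheme: {scheme}")
            row.append(weight)
        vectors.append(row)
    return vocab, vectors


if __name__ == "__main__":
    for scheme in ("binary", "occurrence", "tf", "tfidf"):
        vocab, vectors = weight_documents(REVIEWS, scheme)
        print(scheme, vocab, vectors)
```

Each row of the result is one review expressed as a vector over the vocabulary. Note that under this TF-IDF definition a word appearing in every review, such as "service" in the toy corpus, receives a weight of zero.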
The authors sought to determine whether the BOW/CFS model was a better predictor of review helpfulness than ZeroR, whether the BOW/CFS + RFM model was a better predictor than BOW/CFS alone, and whether BOW/CFS was a better predictor than RFM alone. To do so, they instantiated 28 models using support vector regression (SVR), a regression technique based on support vector machines (Basak, Pal, & Patranabis, 2007). The results indicated that BOW/CFS + RFM significantly (p < .05) outperformed the other models in predicting review helpfulness for both the Yelp.com and Amazon.com datasets.
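To make the role of the ZeroR baseline concrete, the sketch below shows how such a baseline is commonly built for a numeric target: it ignores every predictor and always predicts the mean of the training targets. The vote counts, the class name, and the choice of root mean squared error (RMSE) as the metric are all assumptions for illustration; the summary above does not state which error measure the authors used.

```python
import math


class ZeroRRegressor:
    """Baseline that ignores all predictors and predicts the training mean."""

    def fit(self, targets):
        self.mean_ = sum(targets) / len(targets)
        return self

    def predict(self, n_samples):
        return [self.mean_] * n_samples


def rmse(actual, predicted):
    """Root mean squared error (an assumed metric, chosen for illustration)."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


# Hypothetical numbers of useful votes for training and held-out reviews.
train_votes = [0, 2, 4, 10]
test_votes = [1, 3, 8]

baseline = ZeroRRegressor().fit(train_votes)
baseline_rmse = rmse(test_votes, baseline.predict(len(test_votes)))
print(f"ZeroR predicts {baseline.mean_} votes for every review")
print(f"ZeroR RMSE on held-out reviews: {baseline_rmse:.3f}")
```

A candidate model such as BOW/CFS would be judged against this figure: if its held-out error is not reliably lower than that of ZeroR, its predictors add nothing beyond knowing the average number of votes.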
Anderson, Sweeney, Williams, Camm, and Martin (2012) stated that there are numerous linear programming applications in marketing, such as media selection and market research. The research done by Ngo-Ye and Sinha (2014) is another example of a quantitative approach in marketing in that it demonstrates how quantitative models can help marketers decide which content to display prominently, to which users, and at which points in time. The ability to accurately predict which online reviews users will perceive as most helpful could give a company a competitive edge and increase user satisfaction.