PRODUCT REPUTATION MANIPULATION: THE CHARACTERISTICS AND IMPACT OF SHILL REVIEWS
by
TOAN C. ONG
B.S. in Information Systems, University of Economics Ho Chi Minh City, 2003
M.S. in Information Systems, University of Colorado Denver, 2008
A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Computer Science and Information Systems (CSIS)
2013
This thesis for the Doctor of Philosophy degree by Toan C. Ong has been approved for the Computer Science and Information Systems Program by
Dawn Gregg, Chair
Michael V. Mannino, Advisor
Judy Scott
Tom Altman
April 15, 2013
Ong, Toan C. (Ph.D., Computer Science and Information Systems)
Product Reputation Manipulation: The Impact of Shill Reviews on Perceived Quality
Thesis directed by Associate Professor Michael V. Mannino

ABSTRACT
Online reviews have become a popular method for consumers to express personal evaluations of products. Ecommerce firms have invested heavily in review systems because of the impact of product reviews on product sales and shopping behavior. However, the usefulness of product reviews is undermined by the increasing appearance of shill or fake reviews. As initial steps to deter and detect shill reviews, this study attempts to understand the characteristics of shill reviews and the influence of shill reviews on product quality and shopping behavior. To reveal the linguistic characteristics of shill reviews, this study compares shill reviews and normal reviews on informativeness, readability and subjectivity level. The results show that these features can be used as reliable indicators to separate shill reviews from normal reviews. An experiment was conducted to measure the impact of shill reviews on perceived product quality. The results showed that positive shill reviews significantly increased the quality perceptions of consumers for thinly reviewed products. This finding provides strong evidence about the risks of shill reviews and emphasizes the need to develop effective detection and prevention methods.

The form and content of this abstract are approved. I recommend its publication.
Approved: Michael V. Mannino
Dedicated to the ones I love
ACKNOWLEDGEMENTS
I would like to express my deepest appreciation to my advisor, Dr. Michael V. Mannino, who supported and motivated me throughout my Master's and Ph.D. programs at the University of Colorado Denver. Without his guidance and persistent help, this dissertation would not have been possible.

I would like to thank Dr. Dawn Gregg, the chair of my dissertation committee, and my other committee members, Dr. Judy Scott and Dr. Tom Altman, for giving me helpful advice for the completion of this dissertation.

In addition, I would like to thank Dr. Stefanie Johnson for assisting me with the data collection process.
TABLE OF CONTENTS
1. Introduction
  1.1 Overview
  1.2 The linguistic characteristics of shill reviews
  1.3 The impact of shill reviews on perceived quality
2. Literature Review
  2.1 The effect of product reviews
  2.2 Review manipulation
  2.3 Product feature extraction
  2.4 Perceived quality
3. Linguistic Characteristics of Shill Reviews
  3.1 Development of hypotheses
  3.2 Data Collection
    3.2.1 Shill review collection
    3.2.2 Normal review collection
  3.3 Linguistic characteristics
    3.3.1 Product feature extraction background
    3.3.2 Informativeness
      3.3.2.1 Official feature detection
      3.3.2.2 Unofficial feature detection
      3.3.2.3 Performance
    3.3.3 Readability
      3.3.3.1 The Fog-Index
      3.3.3.2 The Flesch-Kincaid index (FK)
      3.3.3.3 The Automated Readability Index
      3.3.3.4 Simple Measure of Gobbledygook (SMOG)
  3.5 Subjectivity Analysis
  3.4 Results and discussion
    3.4.1 Results
    3.4.2 Discussion
4. The Impact of Shill Reviews on Perceived Quality
  4.1 Theoretical background
  4.2 Models
  4.3 The experiment
  4.4 Results and discussion
    4.4.1 Multicollinearity testing and sample size
    4.4.2 Results
    4.4.3 Discussion
5. Conclusions
  5.1 Summary of chapters
  5.2 Results and implications
    5.2.1 Linguistic characteristics of shill reviews
    5.2.2 The impact of shill reviews on perceived quality
  5.3 Limitations and future research
    5.3.1 Negative shill reviews and shill review detection
    5.3.2 The impact of shill reviews on product preference
References
Appendix
  A. The website for shill collection
  B. The product evaluation website
LIST OF TABLES
Table
2.1 Literature summary about the impact of product reviews
3.1 Distribution of target product reviews
3.2 Data structure of a review
3.3 Example of POS Tagging
3.4 Result of manual classification
3.5 Performance of DFEM
3.6 Subjectivity classifier confusion matrix
3.7 Analysis results
4.1 Summary of the hypotheses
4.2 Family-wise and comparison-wise Type I error rate
4.3 Minimum required sample size
4.4 The impact of shill reviews on the first impression
4.5 Factors that impacts quantity of reviews read
4.6 Factors that impacts median time spent on each review
4.7 Factors that impacts the change in perceived quality
4.8 The impact of shill level on perceived quality
LIST OF FIGURES
Figure
3.1 Examples of review format
3.2 The Description-based Feature Extraction Method
4.1 Factors that impact perceived quality
4.2 Relationship of the models
4.3 The experiment
4.4 The scatter plots of independent variables
1. Introduction
1.1 Overview
Buyers use online reviews as a source of knowledge about products they want to buy. The knowledge contained in the reviews reflects the personal experiences of the reviewers. These experiences may provide consumers with additional information not mentioned in the official product description or allow them to verify that the information advertised by the manufacturer is accurate. The information in product reviews can be used to overcome the problem of information asymmetry (sellers possess more product information than buyers), which is exacerbated in online sales environments (Ba and Pavlou, 2002). Thus, reviews help online buyers make more informed purchase decisions. For example, a study of video game buyers shows that purchase decisions were positively influenced by the usage of online reviews (Bounie, Bourreau, Gensollen and Waelbroeck, 2005). Because reviews affect buyers’ purchase decisions, which directly impact product sales, there is motivation for sellers to use fake reviews to provide buyers with misleading or incorrect product information.

In this study, we refer to fake reviews as “shill reviews”. The terms “shill” and “shilling” are used in studies about reputation manipulation. Lam and Riedl (2004) define shills as users “whose false opinions are intended to mislead other users”. We extend this definition by specifying that a shill is a person who writes a review for a product without disclosing the relationship between the seller and the review writer. A shill can be the seller or someone compensated by the seller for writing a review. Thus, shills can be sellers, distributors, manufacturers and authors who benefit from the sales of the
product. Wu, Greene, Smyth and Cunningham (2010) define “shill reviews” as reviews that “distort popularity rankings given that the objective is to improve the online reputation”. By the definition above, any review can potentially be a shill review, making it very difficult to detect shill reviews.

Despite the difficulty in detecting shill reviews, some anecdotal evidence has emerged about their existence. Review manipulation was found even on reputable online marketplaces such as Amazon.com and BarnesandNoble.com (Hu, Bose, Gao and Liu, 2011a; Hu, Liu and Sambamurthy, 2011b). The review system on Google was also attacked. An investigation by the Denver 7News channel discovered that a woman was hired to create more than 50 Google accounts to publish 5-star reviews for multiple local businesses¹. BBC News reported that Gary Beal, a business owner, was a victim of review manipulation². Beal found that a local competitor had posted a negative review about his company in order to damage his reputation and steal his customers. In 2009, Belkin, a networking and peripheral manufacturer, was reported to be hiring people to write fake positive reviews for its products on Amazon.com³. Later, Belkin management issued an apology for this action⁴. In the music industry, marketers disguised as consumers promoted newly released CDs on online communities such as discussion forums or fan sites (Mayzlin, 2006). According to Gartner, an IT research and advisory company, by the year 2014, 10-15% of media reviews will be fake reviews⁵.

¹ http://www.thedenverchannel.com/news/31087210/detail.html
² http://news.bbc.co.uk/2/hi/programmes/click_online/8826258.stm
³ http://www.thedailybackground.com/2009/01/16/exclusive-belkins-development-rep-is-hiring-people-to-write-fake-positive-amazon-reviews/
⁴ http://news.cnet.com/8301-1001_3-10145399-92.html
⁵ http://www.gartner.com/it/page.jsp?id=2161315
There are several factors that allow shill attacks to be effective. First, the most important part of a product review is its overall rating. In current review systems, the overall rating of a product is the simple average of the ratings of all of its reviews. So a direct way to impact the average rating of a product is to simply submit a review. The fewer reviews a product has, the more impact a new review has on the overall rating. Therefore, thinly reviewed products, such as new products or specialized products, can benefit from shill attacks. Second, it is very simple to submit a review for a product. Normally, an account is required for a reviewer to submit a review, but the account registration process usually only requires the reviewer to have an email address, which can easily be obtained for free. Third, reviewers are usually anonymous, so they do not have to take responsibility for the content of their reviews. Finally, unlike reviews for sellers, product reviews can be submitted by reviewers who are not required to demonstrate product ownership.

1.2 The linguistic characteristics of shill reviews
Although the existence of review manipulation is known, researchers are having difficulty developing effective methods to detect fake reviews and measuring the impact of shill reviews on consumers. Research efforts were made to identify product groups whose reviews are more likely to be manipulated (Hu et al., 2011a; Hu et al., 2011b). However, the results of these studies have been limited to verifying the existence of review manipulation instead of identifying the fake reviews. It is difficult to specifically identify fake reviews even when the identification process is done manually (Jindal and Liu, 2007). We argue that to effectively detect fake product reviews, better
understanding of the linguistic characteristics of fake reviews must be developed. In this study, we explore the linguistic characteristics of fake reviews, such as informativeness, subjectivity and readability, by comparing their text comments to those of normal reviews using natural language processing (NLP) techniques.

A comparison between shill reviews and normal reviews reveals the characteristics of shill reviews. To measure the informativeness of a review, a novel method integrating multiple NLP techniques was developed to extract product features from the review text. [...]
1.3 The impact of shill reviews on perceived quality
Despite the impact of reviews on purchase decisions found in multiple studies (Bounie et al., 2005; Chevalier and Mayzlin, 2006), there has been no clear explanation of the underlying reason for this relationship. We argue that consumers use online reviews to gain trust in products. Product reviews have become an important source of product information (Urban, 2005). Consumers use reviews to verify the quality of the product advertised by the manufacturer. Thus, consumers use reviews to confirm their perception of product quality (Moe, 2009). Since perceived quality plays an important role in consumers’ purchase decision-making process (Tsiotsou, 2005; Zeithaml, 1988), reviews indirectly impact customer purchase decisions. The purpose of shill reviews is to change consumers’ perception of the quality of target products. The objective of this study is to measure the impact of product reviews on perceived quality and the effect of positive shill reviews on improving quality perceptions.

Following Zeithaml’s definitions, product reviews can be treated as an extrinsic attribute that has an impact on the reputation of the product (Zeithaml, 1988). We extend Zeithaml’s model by hypothesizing that customer reviews can also impact perceived quality. To isolate the effect of the reviews, other factors in this model, such as price and brand name, are controlled. Different sets of reviews contain different numbers of positive shill reviews. An advantage of collecting data via an experiment is the ability to monitor review usage, such as the quantity of reviews read and the time spent reading the reviews, which is difficult to observe in a real-world environment.
Our results reveal some interesting characteristics of the relationship between product reviews and review usage. Among the variables included in the rating summary, only the average rating influences the first impression. When there are ten or fewer reviews, consumers tend to read all the reviews. However, the results show that when more reviews are available, consumers spend less time reading each review. The results also show that shill reviews have a significant effect on product quality perception: the appearance of shill reviews increases perceived product quality.

The findings of this study make both theoretical and practical contributions. Theoretically, we identify customer reviews as one of the factors that have an impact on perceived quality. We provide evidence that word-of-mouth in the online shopping environment influences perceived product quality. In practice, marketers can use online reviews as a tool to improve quality perception, especially in cases where advertising doesn’t effectively do so (Clark, Doraszelski and Draganska, 2009). Our findings about the effect of shill reviews also help to raise awareness of the risks posed by shill reviews.
2. Literature Review
2.1 The effect of product reviews
Literature in economics and marketing has shown that online product reviews are used widely by consumers to make both online and offline purchases (Bansal and Voyer, 2000; Chatterjee, 2001; Godes and Mayzlin, 2004). The information contained in product reviews helps consumers gather useful information about products they intend to purchase. For instance, gamers between 19 and 25 who read more online video game reviews tend to purchase more video games (Bounie et al., 2005). The additional product information provided by the reviews also helps consumers mitigate the problem of information asymmetry and therefore increases their confidence in making the purchase decision, which has a direct impact on product sales (Ba et al., 2002; Infosino, 1986).

Table 2.1 summarizes studies that show the important role of online product reviews in product sales and buyer behaviors. Using movie box office data, Liu (2006) and Duan, Gu and Whinston (2008) showed that the quantity of reviews positively impacts movie revenue. These two studies found no effect of the average rating on movie sales. On the other hand, a significant effect of the average rating of the reviews on product sales was found in other studies (Chevalier et al., 2006; Cui, Lui and Guo, 2010; Dellarocas, Zhang and Awad, 2007; Ye, Law and Gu, 2009). Explaining the effect of the rating score on product sales, Forman, Ghose and Wiesenfeld (2008) stated that consumers use ratings as a measurement of product quality. The authors also believed that a good rating can draw the attention of buyers and lead to a buying decision.
Table 2.1 Literature summary about the impact of product reviews

Article | Product category | Dependent variable | Significant WOM effects
--- | --- | --- | ---
Liu (2006) | Movies | Sales | Number of posts (Volume)
Duan et al. (2008) | Movies | Sales | Number of posts (Volume)
Dellarocas et al. (2008) | Movies | Sales diffusion parameters | Number of posts (Volume)
Clemons et al. (2006) | Beer | Sales growth rate | Average rating (Valence); Standard deviation (Variance)
Godes & Mayzlin (2004) | Television shows | TV viewership ratings | Entropy of post (Variance); Number of posts (Volume)
Chevalier & Mayzlin (2006) | Books | Sales rank | Average rating (Valence)
Park et al. (2007) | Portable multimedia player | Intention to buy | Quality of the reviews; Number of posts (Volume)
Bounie et al. (2005) | Video games | Intention to buy | Review usage

Studies about the effect of the review rating score find that the effect of low-end (1- and 2-star) and high-end (4- and 5-star) reviews on sales depends on product characteristics such as the price premium of the product (Chevalier et al., 2006; Clemons, Gao and Hitt, 2006). However, compared to medium ratings, strong ratings (low-end or high-end) appear to be more attractive. According to Forman et al. (2008), reviews with strong ratings provide “a great deal of information to inform purchase decision”. Supporting this argument, Cao, Duan and Gan (2011) concluded that reviews with extreme opinions receive more helpfulness votes than reviews with neutral or mixed opinions.

The rating score is not the only element of the review that has an impact on consumers. The variance in ratings among the reviews of a product also impacts the product’s market performance (Awad and Zhang, 2006). Sun (2008) found that even
when a product has a low rating score, some of its high-rating reviews, which create high variance in the review set, can increase the sales of that product. The existence of the high ratings means that the product is still appreciated by several users, and potential consumers may shift their focus to these high-rating reviews. Other factors such as reviewer identity also play an important role in the influence of the reviewer on consumers. Reviews with disclosed reviewer identity receive more helpfulness votes from consumers and enjoy an increase in sales (Forman et al., 2008). Similarly, Hu, Liu and Zhang (2008) concluded that “the market responds more favorably to reviews written by reviewers with better reputation and higher exposure”.

The content of the text comments of the reviews reportedly has an influence on product sales. Ghose and Ipeirotis (2004) reported that the subjectivity level of reviews positively impacted the sales of electronic items. The sentiment in the text comment expresses the attitude of the reviewer toward the quality of the product. This piece of information is important to consumers because they want to know what others think about the product (Pang and Lee, 2008). In other words, when reading the reviews, consumers are looking not only for confirmation of product features but also for the reviewers’ personal feelings about using the product. In addition, the sentiment of early reviews impacted the sentiment of later reviews and indirectly affected the overall reputation of the product (Gao, Gu and Lin, 2006; Sakunkoo and Sakunkoo, 2009).
2.2 Review manipulation
Two common challenges review systems face are a lack of incentive to leave feedback and the existence of dishonest feedback (Resnick, Zeckhauser, Friedman and Kuwabara, 2000). Leaving detailed feedback is a time-consuming process. Many buyers do not bother to leave feedback if there is no reward for doing so (Gao et al., 2006). Lack of feedback can leave products thinly reviewed and susceptible to review manipulation (Prawesh and Padmanbhan, 2012). Another issue of review systems is the existence of shill reviews, which threaten the effectiveness of the review systems. Shill reviews not only hurt consumers by tricking them into buying a product, but also hurt both honest and dishonest sellers. If a shill attack is successful, honest sellers can’t sell their products and will be eliminated from the market. The market will be filled with lemon products and could eventually collapse (Akerlof, 1970).

Several attempts have been made to provide evidence of the existence of review manipulation. Hu et al. (2011a) view review manipulation as review management, which they define as “vendors, publishers or writers consistently monitoring consumer online reviews, posting non-authentic messages to message board, or writing inflated online reviews on behalf of customers when needed, with the goal of boosting their product sales, in the online review context”. By exploring book reviews on Amazon.com, the authors revealed that review manipulation does exist for several groups of books, namely non-bestseller books, popular and high-priced books, and books whose reviews have high divergence in helpfulness votes. Also using reviews on Amazon.com as the sample, Jindal et al. (2007) found that the problem of review manipulation is widespread. After examining over 5.8 million reviews on Amazon, the
authors found “a large number of duplicate or near duplicate reviews written by the same reviewers”. Liu estimates that about 30% of online reviews are fake reviews. Another study found that 10.3% of the products on Amazon.com are subject to online review manipulation (Hu, Bose, Koh and Liu, 2012).

The downsides of review manipulation are that the action often has no effect and can be very costly if caught (Dellarocas, 2004). As an example, the Huffington Post reports that “Bestselling, award-winning crime author R.J. Ellory was caught faking Amazon reviews for both his own books and the books of his competitors”⁶. The author later issued an apology for this action. Such negative publicity has the potential to create long-term damage to the reputation of the person caught faking reviews, potentially causing online stores to refuse to sell the product or consumers to be reluctant to purchase it, both of which could reduce revenue for the product being sold. As another example, Legacy Learning Systems was fined $250,000 by the Federal Trade Commission (FTC) for hiring affiliate marketers to write positive reviews⁷. The FTC also caught Reverb Communications, a public relations firm, posting phony positive reviews on iTunes without revealing that it was being paid to do so⁸. Despite the existence of review manipulation, there has been little research to understand or ameliorate it (Dellarocas, 2004; Hu et al., 2011a; Mayzlin, 2006).

⁶ http://www.huffingtonpost.com/2012/09/04/rj-ellory-fake-amazon-reviews-caught_n_1854713.html
⁷ http://ftc.gov/opa/2011/03/legacy.shtm
⁸ http://www.inc.com/news/articles/2010/08/ftc-settles-case-over-fraudulent-reviews.html

The difficulties of review manipulation research are ineffective detection methods and a lack of labeled manipulated reviews (Hu et al., 2011a). There are several approaches
to prevent review manipulation. One approach is for websites to encourage more reviews to be submitted, making it more difficult to change the average rating (Dellarocas, 2004). To encourage review submission, websites usually give some kind of reward to reviewers⁹. Another, less common approach is to limit name changes by allowing users to commit to their identification or by charging an entry fee (Friedman and Resnick, 2001). This reduces shilling because on other sites it usually costs almost nothing to create a user account, allowing shills to easily publish multiple reviews under different identities without any consequences. An alternative solution that Amazon.com utilizes is to increase the credibility of the reviews by providing certifications, such as publishing the reviewer’s real name or indicating that the review was written by an Amazon verified consumer.

⁹ Multiple websites such as Epinion.com and Ciao.co.uk reward their members for writing reviews. Amazon.com recognizes reviewers’ efforts in posting helpful reviews by creating lists such as Amazon’s Top Customer Reviewers and Hall of Fame Reviewers.

Besides prevention of review manipulation, three categories of shill review detection methods have been developed to reduce the prevalence of shill reviews: review-centric, reviewer-centric and item-centric. The review-centric approach detects reviews submitted multiple times to multiple products. These reviews are then used to train a shill review classifier (Jindal et al., 2007). The drawback of this approach is that not all shill reviews are duplicates. The reviewer-centric approach can overcome this drawback by analyzing the rating behaviors of individual reviewers to identify suspicious reviewers (Lim, Nguyen, Jindal, Liu and Lauw, 2010). For example, reviewers who submit similar reviews for many products are considered suspicious, allowing their reviews to be flagged as spam. The primary drawback of the reviewer-centric approach is that it is ineffective when reviewers use multiple identities. The item-centric approach focuses on analyzing the review set for
an item (Wu et al., 2010). The analysis checks whether removing a group of reviews from the review set significantly changes the ranking of the product; if it does, those reviews might be spam reviews. The problem with this approach is that it assumes that there will be homogeneity among reviews, so it has the potential to eliminate normal reviews, especially for products with wildly varied customer opinions.

2.3 Product feature extraction
There are multiple methods to extract product features from the text comments of reviews (Abulaish, Jahiruddin, Doja and Ahmad, 2009; Archak, Ghose and Ipeirotis, 2007; Liu, 2010). A sequential rule-based method was used to extract product features (Liu, 2005). This method generates a set of rules about the location of product features in a statement. Then, all the statements are matched against that set of rules so that the features can be located. Hu and Liu (2004) and Dave, Lawrence and Pennock (2003) used statistical patterns to detect product features. First, part-of-speech (POS) tagging is used to identify nouns and noun phrases in the reviews. The nouns or noun phrases that appear multiple times are classified as frequent features. A feature pruning process is then used to eliminate redundant frequent features. The frequent-feature method above can be improved by calculating the Pointwise Mutual Information (PMI) score between the phrase and meronymy discriminators associated with the product class (Popescu and Etzioni, 2005).

2.4 Perceived quality
There are two different kinds of quality: objective quality and perceived quality. Objective quality is defined as “the technical superiority or excellence of the product” (Zeithaml, 1988). Objective quality is the true quality of the product and is often stable
(Clark et al., 2009). Objective quality of the product is observable and can be measured by predetermined standards. According to the literature, customer reviews cannot reveal the objective quality of a product because of biases and influences such as self-selection bias and cultural influence (Hu, Pavlou and Zhang, 2006; Koh, Hu and Clemons, 2010; Moe and Trusov, 2011; Schlosser, 2005). Objective quality is not the target of this study.

Perceived quality is defined as "the consumer's judgment about a product's overall excellence or superiority" (Zeithaml, 1988). Perceived quality is not the same as the objective quality of the product. It is what consumers think the quality of the product might be. Perceived product quality is an important factor that impacts consumer behaviors such as intention to buy or product selection (Jacoby, Chestnut, Hoyer, Sheluga and Donahue, 1978; Sawyer, 1975; Tsiotsou, 2005). Therefore, one way to influence purchasing behavior is to influence perceived product quality. Shill reviews are used to change the perceived quality judgments of potential customers. The objective of this study is to measure how effective shill reviews are in accomplishing this task.
3. Linguistic Characteristics of Shill Reviews

3.1 Development of hypotheses

To understand the characteristics of shill reviews, we examine the differences between shill and normal reviews. The main differences between shill reviewers and normal reviewers are reflected in the reviewers' knowledge about the product and product usage experience. We assume that shill reviewers have never used the target product, a reasonable assumption since it is too costly to send most products to shill reviewers and to compensate shill reviewers for the time necessary to actually evaluate the product.

Informativeness of a review is defined as the amount of product information provided in the review (Liu, Cao, Lin, Huang and Zhou, 2007). Product information is represented by product features mentioned in the review. Product features are divided into two categories: official features and unofficial features. An official feature is a noun or a noun phrase about the product which is included in the official product description. Official features are usually the product information that a consumer sees when reading the description of the product. An official feature is public information which is usually provided by the manufacturer of the product. An unofficial feature is also a noun or a noun phrase about the product. However, unofficial features are not a part of the product description. An example of an unofficial feature is the word "case", which might not be mentioned in the product description but is described in the reviews as an accessory that comes with the device. Hence, unofficial features are private information known only to users of the product.
Product features in normal reviews may differ from those found in shill reviews. Expectancy theory posits that "the motivation force experienced by an individual to select one behavior from a larger set is some function of the perceived likelihood that that behavior will result in the attainment of various outcomes weighted by the desirability of these outcomes to the person" (Oliver, 1974). Because the reward from the act of writing a shill review is not high, expectancy theory suggests that shill reviewers will not spend time looking for additional information about the product but will rather use the readily available information provided by the product descriptions when writing their reviews. This assumption is consistent with a study of criminal behavior which found that the amount of reward from a criminal act significantly impacts the intensity of the criminal activity (Viscusi, 1986). Shill reviewers are unlikely to know about the unofficial features of the product, and consequently their reviews will contain fewer unofficial product features and more official features. Thus, we hypothesize that:

H1a: Shill reviews contain more official features per sentence than normal reviews.

H1b: Shill reviews contain fewer unofficial features per sentence than normal reviews.

H1c: The percentage of sentences containing official features in shill reviews is higher than that of normal reviews.

H1d: The percentage of sentences containing unofficial features in shill reviews is lower than that of normal reviews.
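The four quantities compared in H1a to H1d can be computed per review once each sentence has been reduced to its candidate terms. The following is a minimal sketch, assuming the sets of official and unofficial features are already known; the names `FeatureStats` and `feature_stats` and the sample terms are hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass
class FeatureStats:
    official_per_sentence: float      # compared in H1a
    unofficial_per_sentence: float    # compared in H1b
    pct_sentences_official: float     # compared in H1c
    pct_sentences_unofficial: float   # compared in H1d


def feature_stats(sentences, official, unofficial):
    """Compute the H1 measures for one review.

    sentences: list of lists of candidate terms, one inner list per sentence.
    official / unofficial: sets of feature terms (assumed to be given).
    """
    n = len(sentences)
    if n == 0:
        raise ValueError("review has no sentences")
    off_counts = [sum(term in official for term in s) for s in sentences]
    unoff_counts = [sum(term in unofficial for term in s) for s in sentences]
    return FeatureStats(
        official_per_sentence=sum(off_counts) / n,
        unofficial_per_sentence=sum(unoff_counts) / n,
        pct_sentences_official=sum(c > 0 for c in off_counts) / n,
        pct_sentences_unofficial=sum(c > 0 for c in unoff_counts) / n,
    )


# Hypothetical three-sentence review
review = [["screen", "battery"], ["case"], ["friend"]]
stats = feature_stats(review, official={"screen", "battery"}, unofficial={"case"})
```

Percentages are returned as fractions between 0 and 1, which is an illustrative choice rather than a documented detail of the study.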
Product usage experience is measured using the subjectivity and objectivity of the sentences in the reviews. A subjective sentence "gives a very personal description of the product" and an objective sentence "lists the characteristics of the product" (Ghose et al., 2004). An example of a subjective sentence is "It's really a great little player". An example of an objective sentence is "It even includes a computer USB interface and built-in speaker". Knapp, Hart and Dennis (2006) stated that liars usually avoid statements of ownership because of a lack of personal experience. Agreeing with this argument, Newman, Pennebaker, Berry and Richards (2003) showed that one of the important factors that distinguish deceptive sentences from other sentences is self-reference.

These findings suggest that shill reviewers will avoid subjective statements in their reviews because they have never actually used or owned the product. Instead, they are more likely to focus on describing the product. In contrast, since normal reviewers have used the product, they have experience with it and will be confident in expressing their feelings about it. So, normal reviews are expected to include more subjective sentences than shill reviews. We hypothesize that:

H2: Shill reviews are less subjective than normal reviews.

Readability can be another measure to compare shill and normal reviews. Readability is defined as the cognitive effort required for a person to comprehend a piece of text (Zakaluk and Samuels, 1988). Readability is usually measured by the length of the text, the complexity of the words and the number of sentences. Readability characteristics have been used as linguistic cues to detect text deception (Afroz, Brennan and Greenstadt, 2012; Daft and Lengel, 1984). Moffitt and Burns (2009) find that fraudulent
financial reports usually contain more complex words, making them less readable than truthful ones. Reviewing the literature, Vartapetiance and Gillam (2012) suggest that texts that are less readable are likely to be deceptive. Thus, we hypothesize that:

H3: Shill reviews are less readable than normal reviews.

3.2 Data Collection

To explore the linguistic characteristics of shill reviews, a collection of shill and normal reviews is required. Shill reviews are reviews submitted by shills who have an undisclosed relationship with the seller. A normal review, unlike a shill review, is free of undisclosed relationships between seller and reviewer. The collected shill and normal reviews are compared to reveal their differences in informativeness, subjectivity and readability.

3.2.1 Shill review collection

Reputation manipulation related studies require a dataset of labeled shill reviews. It is difficult to obtain a labeled shill review dataset from publicly available reviews because there is no effective method to classify them as shill reviews. Several studies have collected duplicate reviews and labeled them as "spam reviews" (Jindal et al., 2007; Jindal and Liu, 2008). While this approach is appropriate for some research about shill reviews, it is not appropriate for this study because a sufficient quantity of reviews for a specific product is required. Due to this challenge, shill reviews must be collected as primary data.

In this study, shill reviews were collected via a data collection procedure in which the subjects were asked to become shills and intentionally write positive reviews for an
MP3 player. The participants were undergraduate students at the University of Colorado Denver. Undergraduate students, a convenience population, were chosen for this study because they are active technology and internet users. The students received some course credit as a reward for writing the shill reviews. To increase the quality of the reviews, we offered a chance to win a $20 gift card to the five reviewers whose reviews were in the top five of most helpful reviews. Each participant could submit more than one shill review.

To simulate real conditions for writing shill reviews, the product information available to the subjects was limited. The subjects were provided with the product specifications and two pictures of the product. To ensure that the reviewers would not seek the product's information or its reviews online, product identification information such as brand name, product name and model number was changed. The price of the product was also hidden from the reviewers. With the provided information, the subjects were asked to rate the product and write a short review title and a text comment. Although the shill reviewers were asked to submit positive shill reviews, they were not given specific instructions about review content, structure, or format. The subjects had no specified time limit to write the reviews.
3.2.2 Normal review collection

While the shill reviews were collected as primary data, normal reviews were collected on Amazon.com. Although there is no foolproof way to verify the lack of undisclosed relationships underlying the reviews collected, risk was reduced by only including reviews that either disclosed the reviewer's name or were from Amazon.com verified purchasers. Shill reviewers typically will not disclose their real names in shill reviews because of the risk of losing reputation. It is also unlikely for a shill reviewer to actually buy the product just to have the "Amazon.com verified purchase" badge because it will increase the cost of the shill reviews submitted.

3.3 Linguistic characteristics

The linguistic characteristics analyzed in this study are informativeness, subjectivity and readability. The informativeness of the reviews is measured by the quantity of the product features included in a review. Product features are extracted by feature extraction methods. In the following subsection, the background on current approaches to extract product features is discussed and the Description-based Feature Extraction Method (DFEM) is described. DFEM integrates existing text mining tools to capture features from product reviews.
3.3.1 Product feature extraction background

Liu (2010) divides product features into two categories: explicit features and implicit features. An explicit feature is usually described by a noun or a noun phrase. For example, in the sentence "The touch screen of this MP3 player is very sensitive", the explicit product feature mentioned is the "touch screen". An implicit feature does not mention a product feature directly. However, an explicit feature can be inferred from the implicit features. An implicit feature can take any form. For example, bad durability can be inferred from the sentence "This camera dies after 3 days of use". Another challenge for the feature extraction method is that some features are context-dependent. For example, the word "pen" may not be a feature of a TV set but can mean a "stylus" of a touch screen MP3 player.

According to Liu (2010), there are two popular formats of product reviews. Format 1 includes a list of pros/cons at the beginning followed by the explanation text. Format 2 only includes the explanation text. The difference between Format 1 and Format 2 is the pros/cons section. According to the author, a different method should be used to handle that section. Figure 3.1 shows examples of Format 1 and Format 2.
Format 1

Pros:
Portability
Sound Quality
Expandable Memory
Cons:
Battery Life could be better
Included headphones definitely do not do the player justice.

While the smart phone replaces most mp3 players, would you want to drop your 500-700 dollar phone while running or at the gym and risk breaking it? Enter the Clip Zip. For a mere 100 dollars you can have a 36gb mp3 (4gb + 32gb microSD card) player that is the size of a book of matches, and has amazing sound quality. The improvements over the clip+ include a color screen for album art, support for your AAC files, and alphabetical browsing...

Format 2

I purchased a Clip+ Plus over a year ago and have enjoyed so much that I purchased a Clip Zip as a backup as I never wanted to be without my portable music. At first glance it seems to be a slightly upgraded model with a color screen and a stop watch. Which just about covers the main changes.

The problem comes when you start loading it with music and play lists. Just like the Plus model it supports external memory cards up to 32gb. However, unlike the Plus model, the Zip model does NOT support play lists stored on the external memory card. As I see it this makes the memory card a useless waste of space no way I'm going to navigate 32gb of songs one at a time.

Figure 3.1 Examples of review format

Liu (2005) uses the sequential rule based method to extract the features from the pros/cons section of Format 1. The basic idea behind this method is to generate a set of rules about where the product feature might be located in a statement. Then, all the statements are matched against that set of rules so that the feature can be located. The strength of this method is that it can detect not only explicit features but also implicit features.
For the second format, there are multiple methods to extract product features from a review (Abulaish et al., 2009; Archak et al., 2007). Hu et al. (2004) and Dave et al. (2003) use statistical patterns to detect the product features. First, Part-of-Speech (POS) tagging is used to identify nouns and noun phrases from the reviews. The nouns or noun phrases which appear multiple times are classified as frequent features. A feature pruning process then removes redundant frequent features. Next, the sentiment adjectives associated with the retrieved frequent features are identified. These sentiment words are then used to detect the infrequent features. Popescu et al. (2005) improve the feature pruning method above by calculating the Pointwise Mutual Information (PMI) score between the phrase and meronymy discriminators associated with the product class. For more information about this method, see Popescu et al. (2005).

3.3.2 Informativeness

The goal of DFEM is to identify product features mentioned in product reviews and classify them into official features and unofficial features. Feature detection is context dependent because a term can have different meanings in different contexts. A noun or noun phrase might describe a feature of one product category but not a feature of other products. For example, the word "note" might refer to the ability of an MP3 player to record voice notes, but the same word doesn't describe a feature of a coffee-maker. Although DFEM can run fully automatically, an optional manual step can improve its classification accuracy. DFEM uses basic NLP techniques such as POS tagging, sentence splitting, approximate matching, word stemming and spell checking to preprocess the reviews and uses the publicly available product description to filter the features (Manning and Schütze, 1999).
Figure 3.2 The Description-based Feature Extraction Method

Figure 3.2 gives an overview of the product feature extraction approach used in DFEM. Given the target product and product category, DFEM first collects the target product technical description. Then, it crawls to get all reviews of the target product, and the technical descriptions and reviews of all products in the same category as the target product. After that, the reviews of the target product are preprocessed for POS tagging. Next, the nouns and noun phrases extracted from the reviews of the target product are compared with the ones in the target product technical description. If a term is a part of the product description, it is classified as an official feature. The terms which do not appear in the product description go through a filtering process that uses the technical descriptions of other products in the same category to identify which terms represent unofficial features of the product (as opposed to simply nouns that are unrelated to the product category).

The data collected from the review crawling process is sufficient for our study. 226 reviews were collected for the target product, an off-brand MP3 player. Table 3.1 shows the distribution of target product reviews. The four-star and five-star reviews were
used to create the positive normal review dataset, which will be compared with the positive shill review dataset. The one-, two- and three-star reviews were not used in the dataset in this study.

Table 3.1 Distribution of target product reviews

All the data related to the review (except for the username of the reviewer) were collected from Amazon.com. Table 3.2 shows the fields in the review table, covering both target product reviews and reviews of other products in the MP3 category. The fields RealName and Verified are used to verify the authenticity of the review. The review crawling process yielded the descriptions of 2,225 MP3 products with 68,981 reviews. The product descriptions were used to filter official and unofficial features, and the product reviews were used to check the popularity level of candidate terms. 339 products were eliminated because they did not include product descriptions. In addition, multiple products from the same manufacturer have the same or similar product descriptions, which could cause a problem when they are used to filter unofficial product features. These products were not eliminated because they have different customer reviews.
Table 3.2 Data structure of a review

Field               Type       Description
ReviewID            String     A combination of the Amazon.com ASIN code and the position of the review
Rating              Integer    The rating of the review
Title               String     The title of the review
Comment             String     The text comment of the review
Helpfulness rating  Ratio      The helpfulness vote of the review. Although this field has a String data type, it is stored in the format "# of #". For example, "7 of 8" means 7 out of 8 shoppers consider the review helpful.
ASIN                String     The ASIN number of the product. The ASIN number is the private product ID on Amazon.com
ReviewDate          Date/Time  The date the review was submitted
RealName            Boolean    Yes: the reviewer's real name is disclosed. No: the reviewer's real name is not disclosed
Verified            Boolean    Yes: the reviewer's purchase is verified. No: the reviewer's purchase is not verified

To identify the product features, one important task is to classify the type of each word in a sentence. The reviews must be broken into sentences before words are classified. After the reviews are broken into sentences, the sentences are tokenized. These tokens are the inputs for the POS tagging tool. We used the POS tagging tool from the OpenNLP toolkit [10]. Table 3.3 shows an example of a POS tagged sentence. In this example, the first row contains the tokens, the second row contains the POS tags and the third row contains the chunk tags. "grandson" is identified as a noun and "bought" as a verb. For the full explanation of word type abbreviations, see Penn Treebank II Tags [11].

[10] http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.postagger
[11] http://bulba.sdsu.edu/jeanette/thesis/PennTags.html
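The review record of Table 3.2 and the selection rule for positive normal reviews (four or five stars, with either a disclosed real name or a verified purchase) can be sketched as follows. The class name `Review`, the helper names and the sample values are hypothetical; only the field meanings come from Table 3.2.

```python
from dataclasses import dataclass


@dataclass
class Review:
    review_id: str      # ASIN code combined with the review position
    rating: int
    title: str
    comment: str
    helpfulness: str    # stored as "# of #", for example "7 of 8"
    asin: str
    review_date: str
    real_name: bool
    verified: bool


def helpfulness_ratio(text):
    """Convert a '# of #' string into a fraction, or None if nobody voted."""
    helpful, _, total = text.partition(" of ")
    helpful, total = int(helpful), int(total)
    return helpful / total if total else None


def is_positive_normal(review):
    """Four- or five-star review with a real name or a verified purchase."""
    return review.rating >= 4 and (review.real_name or review.verified)


# Hypothetical record; the identifiers are placeholders, not real data
sample = Review("ASIN0000-1", 5, "Great", "Works well.", "7 of 8",
                "ASIN0000", "2012-01-01", real_name=False, verified=True)
ratio = helpfulness_ratio(sample.helpfulness)
keep = is_positive_normal(sample)
```

`helpfulness_ratio` raises `ValueError` on strings that do not follow the "# of #" pattern, which seems preferable to silently guessing at malformed data.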
The next step involves word phrase identification. The Chunker tool from OpenNLP was used for phrase identification. The Chunker identifies "a birthday gift" and "our grandson" as noun phrases.

Original sentence: We bought this for a birthday gift for our grandson

Table 3.3 Example of POS Tagging

Token      We    bought  this  for   a     birthday  gift  for   our   grandson
POS tag    PRP   VBD     DT    IN    DT    NN        NN    IN    PRP$  NN
Chunk tag  B-NP  B-VP    B-NP  B-PP  B-NP  I-NP      I-NP  B-PP  B-NP  I-NP      O

After the nouns and noun phrases are identified, they are preprocessed. The preprocessing step involves removal of stopwords, spell checking, singularizing, word stemming and approximate matching. The preprocessing step produces two lists of terms: product technical description terms and review terms. To ensure that all official terms are detected, it is necessary to find the synonyms of the terms currently on the list. For example, the terms "headphone" and "earphone" refer to the same product feature. So if the word "headphone" is already in the list, the synonym generator will add the word "earphone". In this study, we used the SynSet tool [12] to find the synonyms of given words. Since finding synonyms is context-dependent, a general tool cannot find a complete list of synonyms. To compensate, an additional manual step added terms missing from the list generated by the synonym generating tool. This step should increase the accuracy of this classification method.

[12] http://lyle.smu.edu/~tspell/jaws/doc/edu/smu/tspell/wordnet/impl/file/synset/package-summary.html
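The phrase identification step shown in Table 3.3 can be illustrated by grouping tokens according to their chunk tags. This is a minimal sketch that assumes the tags follow the usual B-/I- chunk convention and are already available as a list; it does not call OpenNLP, and the function name `noun_phrases` is hypothetical.

```python
def noun_phrases(tokens, chunk_tags):
    """Group tokens into noun phrases using B-NP / I-NP chunk tags."""
    phrases, current = [], []
    for token, tag in zip(tokens, chunk_tags):
        if tag == "B-NP":
            # A new noun phrase starts; close any phrase still open
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I-NP" and current:
            current.append(token)
        else:
            # Any other tag ends the current noun phrase
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


tokens = "We bought this for a birthday gift for our grandson".split()
tags = ["B-NP", "B-VP", "B-NP", "B-PP", "B-NP", "I-NP", "I-NP", "B-PP", "B-NP", "I-NP"]
phrases = noun_phrases(tokens, tags)
```

Besides the two multi-word phrases named in the text, the single-token chunks "We" and "this" are also returned; the stopword removal step described above would be expected to discard them.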
3.3.2.1 Official feature detection

By definition, official features are mentioned in the product description. Let:

$D$ be the set of features in the product description.
$T$ be the set of terms in the reviews of the target product.
$S$ be the set of all sentences in the target product reviews.

Each sentence $s \in S$ contains a subset of $T$. A term $t \in T$ is an official feature if $t \in D$. If $t$ is an official feature, it is removed from $T$. Therefore, after the official feature detection step, $T$ becomes $T'$, which only contains terms that are not official features.

3.3.2.2 Unofficial feature detection

All the terms left in $T'$ are nouns and noun phrases that are not official features. However, not all nouns and noun phrases in $T'$ are unofficial features of the product. A feature pruning process must be applied to eliminate terms that are unlikely to be product features. In prior research, multiple pruning steps were used to extract only those features that appear frequently enough (Hu et al., 2004). Although this approach has been successful in detecting many features, it might also ignore many features which do not appear frequently. For smaller datasets, like the one used for this study, it is necessary to try to detect all the possible features even when they appear just a few times in the reviews.
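The official feature detection step amounts to a set intersection and a set difference between the review terms and the description features. The sketch below assumes both inputs have already been preprocessed into comparable strings; the function name `split_official` and the sample terms are hypothetical.

```python
def split_official(review_terms, description_features):
    """Split review terms into official features and remaining candidates.

    Both arguments are sets of normalized terms (assumed to be stemmed,
    singularized and synonym-expanded beforehand).
    """
    official = review_terms & description_features
    remaining = review_terms - description_features
    return official, remaining


# Hypothetical terms for an MP3 player
review_terms = {"headphone", "screen", "case", "friend"}
description_features = {"headphone", "screen", "fm radio"}
official, remaining = split_official(review_terms, description_features)
```

The remaining terms are only candidates: some, like "case", may be unofficial features, while others, like "friend", are unrelated nouns that a later pruning step has to remove.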
After filtering all the official features, the remaining nouns and noun phrases are possible unofficial features. However, many of these terms are not really unofficial features. A feature pruning process is used to detect the non-feature terms. The feature pruning process has two stages. In the first stage, the product descriptions of the other products in the same category as the target product are used to identify phrases that are likely to be unofficial features. If a term is a product feature, then even though it is not mentioned in the description of the target product, it is likely to appear in the descriptions of other products in the same category. For every remaining candidate term $t$, if at least $k$ product brands contain $t$ in their product descriptions, $t$ goes on to stage two of the pruning process. $k$ is an arbitrary parameter. In this study, $k = 5$ (10% of the total quantity of brand names). This value is reasonable in the category of MP3 players because these players share many common features. In other product categories in which a feature is not supported by many brand names, the value of $k$ should be reduced. If a term is included in the descriptions of products of 5 different brand names, it has a good chance of being classified as a product feature. The quantity of brand names is counted instead of the quantity of products because many products have identical or nearly identical descriptions. If a term appears in the descriptions of many products, it doesn't necessarily mean that it is a feature. Therefore, brand name is a stronger measure of feature popularity.

The second stage of the feature pruning process eliminates extremely popular terms which are not product features. For example, although the word "friend" might be included in the descriptions of the products of 5 or more brand names, it is not a product feature. To make sure that a term is not an extremely popular term, we count its
occurrences in the reviews of all other products in the category. If a term appears in more than $p$ reviews, it is classified as an extremely popular term and eliminated from the unofficial feature list. $p$ is an integer. In this study, if an unofficial feature appears in more than 5% of the reviews, it is considered extremely popular.

3.3.2.3 Performance

The description-based feature classification method was used to detect and classify the features in the reviews of the target product. The review set included 60 positive shill reviews and 93 positive normal reviews. To measure the performance of the description-based feature classification method, the reviews were manually read and features were tagged. Then, the results of the automatic method were compared to the manual classification results. Recall, precision, and their harmonic mean ($F$) were used as measures of performance.

\[ F = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}} \]

Table 3.4 contains the total quantity of features identified by the human tagger. There are 3058 nouns and noun phrases in the review set. 1822 of them are product-related terms. 82.93% of the features mentioned in the reviews (1511 of 1822) are official features. Table 3.5 shows that the performance of the description-based feature detection and classification method is very promising. After step 1, official feature detection, 1589 terms were classified as product features. Because this step detects only official features, not all the features, the recall is comparatively low. The overall precision in step 1 is high because the tasks of detecting official features automatically and manually using the human tagger are very similar. Both look for terms mentioned in the product description.
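As a quick consistency check, the harmonic mean can be recomputed from the recall and precision values that Table 3.5 reports for the three steps. The sketch below is illustrative only; the function name `f_measure` is hypothetical and the inputs are the rounded values from the table.

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (recall, precision) pairs as reported in Table 3.5
steps = {
    "step 1": (0.83, 0.96),
    "step 2": (0.96, 0.86),
    "step 3": (0.94, 0.91),
}
scores = {name: round(f_measure(r, p), 2) for name, (r, p) in steps.items()}
```

The harmonic mean penalizes imbalance between the two inputs, which is why a step with high precision and lower recall still receives a moderate score.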
Table 3.4 Result of manual classification

Quantity of manual features
# of features/# of terms  Official  Unofficial
1822/3058                 1511      311

Table 3.5 Performance of DFEM

Step                                   Quantity  Recall  Precision  Harmonic mean (F)
Step 1: Official features detection    1589      0.83    0.96       0.89
Step 2: Unofficial features detection  1980      0.96    0.86       0.91
Step 3: Popular term elimination       1870      0.94    0.91       0.92

Step 2 detected 391 additional product-related terms. Since many of the unofficial product features are included in the newly detected terms, the recall improved significantly from 0.83 to 0.96. However, many of the detected unofficial terms are not actual unofficial terms, reducing the precision from 0.96 to 0.86. Step 3 reduced the number of false positive unofficial features detected, providing a recall of 0.94 and a precision of 0.91. The results demonstrate that DFEM is an effective method for detecting both official and unofficial features in product reviews.

3.3.3 Readability

Readability has been used in prior studies to predict the usefulness of customer reviews (Korfiatis, 2008; O'Mahony and Smyth, 2010). Five popular readability measures are used in this study, namely the Fog-Index (Gunning, 1969), the Flesch Reading Ease Index (Flesch, 1951), the Automated Readability Index, the Coleman-Liau Index (Coleman and Liau, 1975) and the SMOG Index (Laughlin, 1969).
3.3.3.1 The Fog-Index

Developed in the 1940s, the Fog-Index was used to measure the readability of newspaper writing. It measures how well an individual with an average high school education can read an evaluated piece of text. Fog index values typically range from 1 to 12. A lower Fog index means more readable text. The Fog index of each review can be calculated as follows:

\[ \mathrm{Fog} = 0.4 \times \left( \frac{\#\mathrm{words}}{\#\mathrm{sentences}} + 100 \times \frac{\#\mathrm{complex\_words}}{\#\mathrm{words}} \right) \]

Where:
complex_word: word with three syllables or more

3.3.3.2 The Flesch-Kincaid index (FK)

The Flesch-Kincaid or Flesch Reading Ease (FRE) index measures how easy a piece of text is to understand. The FK index calculation is based on syllables per word and words per sentence. The value of this index ranges from 0 to 100, with smaller scores indicating less readable text. Text content with an FK index higher than 60 can be understood by almost everyone. Advanced content such as the Harvard Law Review has scores in the 30s, indicating a level understood by law school students. The FRE index of each review can be calculated as follows:

\[ \mathrm{FRE} = 206.835 - 1.015 \times \frac{\#\mathrm{words}}{\#\mathrm{sentences}} - 84.6 \times \frac{\#\mathrm{syllables}}{\#\mathrm{words}} \]
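The two indexes above can be sketched in a few lines, using the standard published coefficients (0.4 and 100 for the Fog index; 206.835, 1.015 and 84.6 for Flesch Reading Ease). The syllable counter is a rough vowel-group heuristic and the sentence splitter is a simple regular expression; both are assumptions made for illustration and are not the tools used in the study.

```python
import re


def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels (at least 1)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def text_counts(text):
    """Return (#sentences, #words, #syllables, #complex_words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]
    complex_words = sum(1 for n in syllables if n >= 3)
    return len(sentences), len(words), sum(syllables), complex_words


def fog_index(text):
    n_sent, n_words, _, n_complex = text_counts(text)
    return 0.4 * (n_words / n_sent + 100 * n_complex / n_words)


def flesch_reading_ease(text):
    n_sent, n_words, n_syll, _ = text_counts(text)
    return 206.835 - 1.015 * (n_words / n_sent) - 84.6 * (n_syll / n_words)


sample = "It even includes a computer USB interface and built-in speaker."
fog = fog_index(sample)
fre = flesch_reading_ease(sample)
```

Both functions divide by the word and sentence counts, so empty input raises `ZeroDivisionError`. Scores from this sketch will differ from those of dedicated readability tools because the syllable heuristic is approximate.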
3.3.3.3 The Automated Readability Index

This index is simpler than the other two indexes. Its calculation uses the quantity of characters (excluding standard punctuation such as hyphens and semicolons) per word as a measure of the readability of the text. The AR index ranges from 1 to 12, indicating the grade level required to understand the text. For example, AR = 5 means that a fifth grade education is required to understand the review. The AR index can be calculated as follows:

\[ \mathrm{ARI} = 4.71 \times \frac{\#\mathrm{characters}}{\#\mathrm{words}} + 0.5 \times \frac{\#\mathrm{words}}{\#\mathrm{sentences}} - 21.43 \]

3.3.3.4 Simple Measure of Gobbledygook (SMOG)

Simple Measure of Gobbledygook is a readability measuring method proposed by Laughlin (1969). SMOG is widely used, especially for health documents. The main component of the SMOG method is polysyllables, defined as words with 3 or more syllables. The formula is a regression of the interaction between the length of words and sentences. SMOG results also range from 1 to 12. SMOG is calculated as follows:

\[ \mathrm{SMOG} = 1.043 \times \sqrt{30 \times \frac{\mathrm{number\_of\_polysyllables}}{\mathrm{number\_of\_sentences}}} + 3.1291 \]

3.5 Subjectivity Analysis

The purpose of subjectivity analysis is to classify sentences in a text as subjective or objective. Subjectivity analysis is usually a step in a multiple step process to extract
the polarity [13] of a review (Pang and Lee, 2004a). Using movie reviews, Pang et al. (2004a) used subjectivity analysis to detect subjective sentences in the reviews. According to these authors, the polarity of the review can only be extracted from the subjective sentences. The subjectivity analysis approach of Pang and Lee was implemented in a software module included in the LingPipe [14] toolkit. This toolkit was used for subjective/objective sentence classification in this study. To classify the sentences, training data of labeled subjective and objective sentences must be obtained.

One approach to automatically obtain labeled objective/subjective sentences is to use product descriptions and product reviews (Ghose et al., 2004). Objective sentences are extracted from the product description page and subjective sentences are extracted from the product reviews. About 3800 objective sentences were retrieved from the product description pages on Amazon.com. More than 200 thousand sentences were collected from the product reviews. To create two datasets of the same size, 3800 sentences were randomly selected from the subjective sentence dataset above. 90 percent of each dataset was used in the training dataset and the remaining 10 percent was used in the testing dataset. Table 3.6 shows the confusion matrix of the subjectivity classifier.

Table 3.6 Subjectivity classifier confusion matrix

                        Response
                        Objective  Subjective
Reference  Objective    399        10
           Subjective   14         386

[13] According to Pang and Lee (2004), the term polarity is used to indicate the sentiment of the review: positive or negative
[14] http://alias-i.com/lingpipe/demos/tutorial/lm/read-me.html
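Summary measures such as accuracy, recall and precision follow directly from the counts in Table 3.6. The sketch below assumes that rows are the reference (true) labels and columns are the classifier responses, as the table layout suggests; the dictionary layout and function names are hypothetical.

```python
# (reference label, response label) -> count, taken from Table 3.6
confusion = {
    ("objective", "objective"): 399,
    ("objective", "subjective"): 10,
    ("subjective", "objective"): 14,
    ("subjective", "subjective"): 386,
}


def accuracy(cm):
    """Share of sentences whose response matches the reference label."""
    correct = sum(n for (ref, resp), n in cm.items() if ref == resp)
    return correct / sum(cm.values())


def recall(cm, label):
    """Correctly classified share of sentences whose reference is `label`."""
    total = sum(n for (ref, _), n in cm.items() if ref == label)
    return cm[(label, label)] / total


def precision(cm, label):
    """Correct share of sentences that the classifier labeled `label`."""
    total = sum(n for (_, resp), n in cm.items() if resp == label)
    return cm[(label, label)] / total


overall = accuracy(confusion)
subjective_recall = recall(confusion, "subjective")
subjective_precision = precision(confusion, "subjective")
```

If the rows and columns of the table were meant the other way round, recall and precision would simply swap roles while accuracy would stay the same.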
3.4 Results and discussion

3.4.1 Results

The empirical results answer many questions about the differences between shill and normal reviews. Shill reviews and normal reviews are compared based on informativeness, subjectivity and readability. Informativeness reflects the knowledge of reviewers about the products they are reviewing. Subjectivity shows the personal assessments of the reviewers, while readability is commonly used as a linguistic cue to detect text deception.

The above measures of shill and normal reviews are compared using one-tailed independent T tests. According to Ruxton and Neuhäuser (2010), a one-tailed hypothesis test is justified when only one direction has meaning and evidence of a difference in the other direction is treated identically to non-rejection of a two-tailed test. Each hypothesis indicates a direction consistent with a one-tailed test. In addition, non-rejection of the null hypothesis will be treated the same as evidence of a difference in the opposite direction. Non-rejection essentially means that the characteristic is not suitable to differentiate shill and normal reviews. For example, evidence that the quantity of official features in shill reviews is not larger than the quantity of official features in normal reviews indicates that this variable is not suitable to differentiate between shill and normal reviews.
Table 3.7 shows that shill reviewers concentrate on the official features included in the product description page. This conclusion is supported by the rejection of the null hypotheses, along with the large effect sizes, for H1a and H1c about the quantity of official features and the percentage of sentences containing official features in shill reviews. The large effect sizes (1.019 and 0.966) provide evidence that shill reviews contain substantially more official features and a substantially higher percentage of sentences containing official features than normal reviews. While there is not enough evidence to detect a difference between the quantity of unofficial features per review in shill reviews and that in normal reviews (H1b), there is enough evidence to show a difference between the percentage of sentences that contain unofficial features in normal reviews and that in shill reviews (H1d). Although the negative effect size of -0.344 is small, it means that the percentage of sentences containing unofficial features per review is higher in normal reviews than in shill reviews.
Table 3.7 Analysis results

Hypothesis  Measurement                                              Source  N   Mean   Std. Dev.  Std. Err.  p-value (one-tailed)  Effect size (f²)
H1a         Official feature quantity per sentence                   Shill   61  14.59  8.57       1.100      0.000                 1.019
                                                                     Normal  93  6.47   7.84       0.813
H1b         Unofficial feature quantity per sentence                 Shill   61  3.20   2.72       0.348      0.099                 -0.185
                                                                     Normal  93  3.72   4.84       0.504
H1c         % Sentences containing an official feature per review    Shill   61  0.81   0.16       0.021      0.000                 0.966
                                                                     Normal  93  0.60   0.25       0.025
H1d         % Sentences containing an unofficial feature per review  Shill   61  0.16   0.13       0.016      0.018                 -0.344
                                                                     Normal  93  0.22   0.20       0.021
H2          Flesch-Kincaid Reading Ease                              Shill   61  71.82  10.52      1.347      0.001                 -0.498
                                                                     Normal  93  77.80  12.99      1.347
            Gunning Fog Index                                        Shill   61  11.18  2.70       0.346      0.056                 0.263
                                                                     Normal  93  10.24  4.08       0.423
            Automatic Readability Index                              Shill   61  6.88   2.93       0.376      0.018                 0.320
                                                                     Normal  93  5.59   4.61       0.478
            Coleman-Liau Index                                       Shill   61  8.82   1.59       0.204      0.000                 0.630
                                                                     Normal  93  7.60   2.15       0.223
            SMOG Index                                               Shill   61  6.88   1.85       0.237      0.000                 0.604
                                                                     Normal  93  5.55   2.39       0.248
H3          % Subjective sentences in the review                     Shill   61  0.68   0.26       0.033      0.000                 -1.312
                                                                     Normal  93  0.93   0.14       0.014
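The group comparisons in Table 3.7 can be approximated from the summary statistics alone. The following is a minimal sketch, assuming a pooled-variance independent-samples t statistic and a pooled-standard-deviation standardized mean difference; the thesis does not state which formulas produced its reported effect sizes, so the values computed here need not match the table exactly. The function names are hypothetical.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def standardized_difference(m1, sd1, n1, m2, sd2, n2):
    """Mean difference in pooled-standard-deviation units (an assumed effect size formula)."""
    return (m1 - m2) / pooled_sd(n1, sd1, n2, sd2)


def t_statistic(m1, sd1, n1, m2, sd2, n2):
    """Pooled-variance t statistic for the difference between two group means."""
    standard_error = pooled_sd(n1, sd1, n2, sd2) * math.sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / standard_error


# H1a row of Table 3.7: official feature quantity, shill versus normal reviews.
shill = dict(m1=14.59, sd1=8.57, n1=61)
normal = dict(m2=6.47, sd2=7.84, n2=93)

d = standardized_difference(**shill, **normal)
t = t_statistic(**shill, **normal)
print(f"standardized difference = {d:.3f}, t = {t:.3f}, df = {61 + 93 - 2}")
```

A positive value indicates that the shill group has the larger mean, which is the direction hypothesized for the official feature measures.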
These results show that hypothesis H1 is weakly supported. Despite this, we strongly believe that product features are an important variable that can be used to separate shill reviews from normal reviews. Mentioning official features frequently does not mean that the review is informative, especially when the reviewer is simply repeating the information found in the product description. It could mean that the shill reviewer is trying to convince consumers that they know a lot about the product. Further evidence of this effect is that 100% of the unofficial features mentioned in shill reviews are also mentioned in normal reviews. These unofficial features are usually common nouns (e.g. "size" and "user") which are not mentioned in the product description. Meanwhile, there are unofficial features which only the normal reviews discuss. The weakness of shill reviewers is that they don't have real product experience and therefore do not know about the existence of these unofficial features.

There is enough evidence to reject the null hypothesis about the difference in the percentage of subjective sentences between normal and shill reviews. The large negative effect size means that the percentage of subjective sentences in normal reviews is substantially larger than that in shill reviews. In other words, normal reviewers tend to express their personal opinions about the product in their reviews, while shill reviewers describe the features of the product instead of giving their personal opinions about it. Thus, hypothesis 2 is strongly supported.

The results indicate that all readability measures except the Gunning Fog Index show sufficient evidence to reject the null hypothesis of no difference between shill and normal reviews. The effect sizes of the Coleman-Liau Index, the SMOG Index and the Flesch-Kincaid Reading Ease are medium, while the effect size of
the Automatic Readability Index is small. This result provides evidence to support hypothesis H3. Shill reviews are more difficult to read than normal reviews.

3.4.2 Discussion

The occurrence of review manipulation has the potential to undermine the effectiveness of review systems. A successful shill attack might trick consumers into buying a low quality product and damage sales of competing products. An unsuccessful shill attack (e.g. one in which shill reviews are detected by consumers) might result in a loss of trust in review systems and drive consumers from the marketplace. Therefore, a powerful shill review detection method is essential for online marketplaces moving forward. The results of this study indicate that official features, readability and subjectivity of the reviews are reliable factors to separate shill reviews from normal reviews. These factors can be added to current methods to strengthen their ability to detect shill reviews. Effective shill review detection mechanisms help gain consumer trust in review systems and maintain a fair marketplace.

Product reviews have become an important resource for both consumers and sellers on online marketplaces. For consumers, product reviews provide an information channel, different from the ads of the sellers, about the product features and their quality. The product usage experience of reviewers helps consumers make an informed purchase decision (Bounie et al., 2005). With promising performance, the product feature extraction method proposed in this study might increase the benefits that reviews bring to consumers and sellers. An effective product feature extraction method can enhance the performance of existing review summarization methods, making it easier for consumers to read reviews.
For sellers, reviews not only increase product sales but also provide useful information about the product features that consumers discuss (Cui et al., 2010; Ye et al., 2009). Manufacturers can gather feedback from their consumers by extracting the product features in reviews. The feature extraction method introduced in this study can detect official and unofficial features separately. The ability to detect official features provides product manufacturers with valuable consumer opinions on the product. In addition, comments on unofficial features might bring useful knowledge about the features that consumers care about but that are not included in the product descriptions. This knowledge can be helpful in marketing the product or improving its quality.
4. The Impact of Shill Reviews on Perceived Quality

4.1 Theoretical background

Perceived quality is an important variable in marketing research (Zeithaml, 1988). Perceived quality not only influences the behavior of consumers but also tells manufacturers what consumers think about their products. Zeithaml (1988) presents a model with factors that impact perceived quality. In Zeithaml's model, the components that have an effect on perceived quality are reputation, an abstract dimension, and perceived monetary price. These components are called "the perception of lower level attributes". For perceived monetary price, studies show that consumers don't always remember the price of the item but encode the price in a way that is meaningful or easy to remember for them (Dickson and Sawyer, 1985; Jacoby et al., 1978). Instead of remembering the exact price of an MP3 player, say $47.84, a shopper may encode it as "low or high" or "affordable or expensive". It is the perception of lower level attributes that has the direct impact on perceived quality.

With the emergence of review systems, a new factor that impacts perceived quality is product reviews. While advertising provides product information from the manufacturer's perspective, product reviews provide product information from the product user's perspective. According to a 2009 report by the Nielsen Company (http://blog.nielsen.com/nielsenwire/consumer/global-advertising-consumers-trust-realfriends-and-virtual-strangers-the-most/), online opinions are more trusted than most forms of advertising. Li and Hitt (2008) found that prior to buying an experience good, the product quality expectation of consumers can be
affected by product reviews. Therefore, we expect that positive shill reviews can impact perceived product quality.

Figure 4.1 Factors that impact perceived quality

Three metrics, valence, volume and variance, are frequently used to measure the impact of the rating summary on the first impression of product quality. Although the presence of positive shill reviews is not apparent in the rating summary, all three metrics are affected by the ratings of positive shill reviews. Since positive shill reviews usually have high ratings (Mukherjee, Liu and Glance, 2012), the appearance of shill reviews in the review set will increase the valence of the product. The volume increases when shill reviews are added. The ratings of shill reviews also increase the variance of product ratings because only negatively rated products need help from shills. So the ratings of shill reviews must be in the opposite direction of other reviews in the review set (Wu et al., 2010).
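The effect of injected shill reviews on the three rating summary metrics can be illustrated with a small computation. This is a sketch with made-up ratings, not data from the experiment: `normal_ratings` and `shill_ratings` are hypothetical, and variance is taken as the population variance of the star ratings.

```python
from statistics import mean, pvariance


def rating_summary(ratings):
    """Return the valence (average), volume (count) and variance of star ratings."""
    return {
        "valence": mean(ratings),
        "volume": len(ratings),
        "variance": pvariance(ratings),  # population variance: divide by the count
    }


# Hypothetical thinly reviewed product with negative normal reviews only.
normal_ratings = [1, 2, 1, 2]
# Hypothetical positive shill reviews injected by the seller.
shill_ratings = [5, 5, 4]

before = rating_summary(normal_ratings)
after = rating_summary(normal_ratings + shill_ratings)

for name in ("valence", "volume", "variance"):
    print(f"{name}: before={before[name]:.2f}, after={after[name]:.2f}")
```

With these illustrative numbers all three metrics rise after the injection, matching the argument above that high shill ratings added to a negatively rated product raise valence, volume and variance together.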
Valence is usually represented by the average rating measure (Clemons et al., 2006; Dellarocas and Narayan, 2006; Dellarocas et al., 2007). The average rating is the overall assessment of the reviewers towards the product. Consumers use the average rating to compare products, with the average rating serving as a proxy for product quality (Cui et al., 2010; Forman et al., 2008). A product with a better average rating can be considered a better product. Agreeing with this argument, Moe et al. (2011) stated that "a 'good' product is likely to experience higher sales and receive more positive ratings than a 'bad' product".

Volume is usually measured by the quantity of reviews. Volume has been found to impact product sales (Awad et al., 2006; Duan et al., 2008; Liu, 2006). One reason for this effect is that the volume of reviews shows the level of discussion about the product, which can help increase awareness among consumers (Cui et al., 2010). The volume of ratings received is one measure to estimate the size of the group of consumers who have bought and used the product. A larger group means more people have used the product regardless of the ratings they provide.

Variance is measured using the statistical variance of the ratings. Typically, variance is available to consumers in a rating distribution chart. Variance has also been found to have a significant effect on product sales. Variance represents the disagreement among the reviewers about a product (Awad et al., 2006). High disagreement among the reviewers means different users perceive product quality differently. A large variance does not necessarily mean that the product is a good product. Instead, variance in ratings may just signal that the product is suitable for a portion of consumers and less suitable for
others. Sun (2008) found that when the average rating of the product is low, more variance helped to increase profit.

Based on the prior research related to valence, volume and variance, we hypothesize:

H4a: The valence of product ratings positively impacts perceived product quality.
H4b: The volume of product ratings positively impacts perceived product quality.
H4c: The influence of variance on perceived product quality is affected by average rating.

Consumer behavior is affected by risk perception because any action may have unanticipated consequences (Bauer, 1960). Bauer (1960) revealed that perceived risk is associated with the consumer's data acquisition process both before and after the purchase decision. According to Lutz and Reilly (1974), when product risk perception is high, consumers tend to collect more information about the product. The more information is collected, the fewer unknown problems about the product remain. Since word-of-mouth has an effect on perceived risk (Ross, 1975), we argue that word-of-mouth can influence the data acquisition process. In this study, the data acquisition process is represented by the review usage of consumers.

Usage of actual review comments can also be related to the valence, volume and variance of reviews. Research shows that average rating is used as a proxy for product quality (Cui et al., 2010; Forman et al., 2008). Thus, better ratings can reduce perceived product risk. Similarly, a large variance shows disagreement among the reviewers, which can lead to higher perceived risk (Awad et al., 2006). Review usage can also be related to
the number of reviews available to read. Past research has shown that consumers read no more than two pages of reviews (Pavlou and Dimoka, 2006). Since thinly reviewed products have a small number of reviews, volume dictates the quantity of reviews available for the consumers to read. Thus, we hypothesize that:

H5a: The valence of product ratings positively impacts the total quantity of reviews read.
H5b: The valence of product ratings positively impacts the median time spent on reading reviews.
H6a: The volume of product ratings positively impacts the total quantity of reviews read.
H6b: The volume of product ratings positively impacts the median time spent on reading reviews.
H7a: The variance of product ratings positively impacts the total quantity of reviews read.
H7b: The variance of product ratings positively impacts the median time spent on reading reviews.

Prior to product purchases, consumers purchasing online usually don't have physical contact with products. Thus, consumers tend to look for additional product information from previous product users. By reading the reviews, consumers might collect new information which is not available in the rating summary or in the official product description. The new information might change the consumer's first impression about the quality of the product. As an example, an experiment compared one group of participants that read positive reviews of a film with another group of participants that read negative reviews of the same film (Wyatt and Badger, 1984). After viewing the film, the participants were asked to evaluate it. The results showed
that the direction of the reviews significantly impacted the direction of the evaluation. In practice, negative reviews have been found to impact product sales (Cui et al., 2010; Dellarocas et al., 2007; Forman et al., 2008; Moe et al., 2011). One explanation for this effect is that negative reviews negatively impact perceived product quality which, in turn, negatively impacts the buying decision of the consumers (Buttle, 1998). We hypothesize that:

H8a: Total quantity of negative normal reviews read negatively changes the first impression about product quality.
H8b: The median time spent on negative normal reviews negatively changes the first impression about product quality.

As shown by Wyatt and Badger (1984), if positive reviews are read, they can impact consumer perception about product quality. The reason is that consumers read positive reviews to strengthen their beliefs about the quality of products (Moe, 2009). The rating of positive shill reviews is usually 5 stars or 4 stars (Mukherjee et al., 2012). Because the overall rating of a product is calculated as the simple average of its ratings (Jøsang and Ismail, 2002), shill reviews with high ratings increase the overall average rating of the target product, especially for thinly reviewed products. In this study, shill reviews are injected when the product is negatively rated. In such a situation, positive shill reviews might have a better chance of being read because the subjects seek opposing opinions. Thus, we hypothesize that:

H9a: Total quantity of shill reviews read positively changes the first impression about product quality.
H9b: The time spent on reading the shill reviews positively changes the first impression about product quality.

Table 4.1 Summary of the hypotheses [table body illegible in the source; it summarizes hypotheses H4a through H9b stated above]
4.2 Models

Three linear regression models were used to test the hypotheses (Figure 4.2). Model 1 answers the research question about the impact of the rating summary on the first impression a consumer may have about a product. In Model 1, the average rating positively impacts perceived quality and also influences the effect of review variance on perceived quality. If the average rating is high, more variance is expected to negatively impact perceived quality. If the average rating is low, more variance may positively impact perceived quality.

In Figure 4.2, Model 2 overlaps with both Model 1 and Model 3: the independent variables of Model 2 are also the independent variables of Model 1, and the dependent variables of Model 2 are the independent variables of Model 3. In Models 1 and 2, average rating, volume and variance are the factors that impact both the first impression about product quality and review usage. Average rating, variance and volume are all expected to have a positive effect on review usage. Review usage, then, will change the first impression about product quality because the content of the reviews provides more detailed information about the product.

Model 3 addresses the change in product quality perception once the content of the reviews is read. Review usage is measured by two variables: total quantity of reviews read and median time spent on each review. The usage of shill reviews and the usage of normal reviews are observed separately in order to measure the effect of each type of review on the change in quality perception.
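The structure of Model 1 as described above (average rating, volume, and an interaction between variance and average rating) can be sketched as an ordinary least squares fit. Everything in this example is an assumption made for illustration: the review sets and coefficients are made up, the data contain no noise so the fit simply recovers the chosen coefficients, and the small normal-equations solver stands in for whatever statistical package the study actually used.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_ols(predictors, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in predictors]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical review sets: (average rating, volume, variance).
review_sets = [(1.5, 4, 0.3), (2.0, 6, 1.2), (2.5, 5, 2.0), (3.0, 8, 2.6),
               (3.5, 10, 1.9), (4.0, 7, 0.8), (4.5, 9, 0.4), (2.2, 3, 1.5)]
# Made-up coefficients: intercept, average rating, volume, variance x average.
true_coefs = [2.0, 0.7, -0.04, -0.01]

# Model 1 predictors: average rating, volume, and the variance-by-average interaction.
predictors = [(ave, vol, var * ave) for ave, vol, var in review_sets]
first_impression = [true_coefs[0] + sum(c * x for c, x in zip(true_coefs[1:], row))
                    for row in predictors]

estimated = fit_ols(predictors, first_impression)
print([round(c, 4) for c in estimated])
```

With real responses the fitted coefficients would be accompanied by standard errors and t-tests; this sketch only shows how the interaction term enters the design matrix.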
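Figure 4.2 Relationship of the models

Model 1:

$$PQ_i^0 = \alpha_0 + \alpha_1 \mathit{AveRating}_i + \alpha_2 \mathit{VolRating}_i + \alpha_3 \mathit{VarRating}_i \times \mathit{AveRating}_i + \varepsilon$$

Where:

$PQ_i^0$: The first impression of product $i$

$$\mathit{AveRating}_i = \frac{\sum_{j=1}^{n} \mathit{NormalRating}_{ij} + \sum_{j=1}^{m} \mathit{ShillRating}_{ij}}{n + m}$$

$$\mathit{VolRating}_i = n + m$$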
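$$\mathit{VarRating}_i = \frac{\sum_{j=1}^{n} (\mathit{NormalRating}_{ij} - \mathit{AveRating}_i)^2 + \sum_{j=1}^{m} (\mathit{ShillRating}_{ij} - \mathit{AveRating}_i)^2}{n + m}$$

$n$: Quantity of normal reviews in the review set of product $i$
$m$: Quantity of shill reviews in the review set of product $i$

Model 2:

$$TQ_i = \alpha_0 + \alpha_1 \mathit{AveRating}_i + \alpha_2 \mathit{VolRating}_i + \alpha_3 \mathit{VarRating}_i + \varepsilon$$

$$MT_i = \beta_0 + \beta_1 \mathit{AveRating}_i + \beta_2 \mathit{VolRating}_i + \beta_3 \mathit{VarRating}_i + \varepsilon$$

Where:

$TQ_i$ is the total number of reviews of product $i$ read by the consumer.
$MT_i$ is the median time the consumer spends on the reviews of product $i$.

Model 3:

$$\Delta PQ_i = PQ_i^1 - PQ_i^0 = \beta_0 + \beta_1 TQ_i^N + \beta_2 MT_i^N + \beta_3 TQ_i^S + \beta_4 MT_i^S + \varepsilon$$

Where:

$PQ_i^1$ is the perceived quality of product $i$ after the reviews are read.
$\Delta PQ_i$ is the difference in perceived quality before and after the reviews are read.
$TQ_i^N$ is the total of normal reviews of product $i$ read by the consumer.
$MT_i^N$ is the median time the consumer spends on the normal reviews of product $i$.
$TQ_i^S$ is the total of shill reviews of product $i$ read by the consumer.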
$MT_i^S$ is the median time the consumer spends on the shill reviews of product $i$.

4.3 The experiment

Testing the impact of shill reviews on perceived quality requires both shill reviews and normal reviews to be gathered and shown to the subjects. Shill reviews were submitted by shills who have an undisclosed relationship with the seller. A normal review, unlike a shill review, is free of undisclosed relationships between seller and reviewer. The collected shill and normal reviews were mixed together to create different review sets. The same product was shown to the subjects along with one of these review sets. Different review sets with different combinations of shill and normal reviews allowed us to measure the impact of the reviews on perceived quality. Figure 4.3 illustrates the steps of the experiment.
Figure 4.3 The experiment

The experiment simulates the situation in which a seller wants to dishonestly promote a thinly reviewed product. In the experiment, an MP3 player was shown to the subjects with basic technical information about the product along with the overall rating information, such as the average rating, the quantity of reviews and the distribution of the ratings. The review set of the product was a random mix of positive shill reviews and negative normal reviews. With the available information, the subjects were asked to give their perception of the product quality. This response could not be changed after it was submitted. Then, the subjects were shown the rating, title and text comment of all the reviews. The review usage of the subjects was recorded. Finally, the subjects were asked about their product quality perception again. The comparison of the
product quality perceptions before and after the reviews were read indicated the impact of the review content on product quality perception.

Perceived quality is a multidimensional construct which is unobservable, context dependent and difficult to measure (Zeithaml, 1988). Accurate measurement of perceived quality involves identification of specific quality dimensions and careful justification of the validity of the dimensions (Parasuraman, Zeithaml and Berry, 1985). Instead of multidimensional measurement, unidimensional scales have been used to measure quality (Moorthy and Zhao, 2000; Tsiotsou, 2005; Zeithaml, 1988). The problem with using a unidimensional scale to measure perceived quality is the difficulty of interpreting the results. A unidimensional scale cannot provide detailed information about the specific quality dimensions associated with respondent ratings. Because there are too many product quality dimensions mentioned in review content, it is difficult to design a multidimensional scale which measures all of these dimensions. Therefore, a unidimensional scale is selected to measure perceived quality in this study (see Appendix B).

Review usage is measured in two dimensions: quantity of reviews read and median time spent on each review. The total quantity of reviews read provides the quantity of each type of review (i.e. normal and shill) read by the consumers. To isolate the impact of shill reviews, we have to determine that shill reviews were read. Median time spent on a review measures the reading effort of the consumers.

To recruit the subjects for this experiment, a probability sample of undergraduate and graduate students was used. Students are an appropriate population for this study because they are young and familiar with the internet and online shopping. In addition, students are active users of technology, especially MP3 players. 6000 invitation emails were
sent at random to students on a list of 17,000. To comply with the requirement of probability sampling, every student in the list had the same probability of receiving the invitation email. The quantity of students who participated in the experiment was 175. This sample size is reasonable according to the analysis in section 4.4.1.

4.4 Results and discussion

4.4.1 Multicollinearity testing and sample size

The independent variables of all three models were tested for multicollinearity using scatter plots. If multicollinearity occurs between two variables, a linear pattern should appear on the scatter plot. Figure 4.4 suggests that there is no multicollinearity among the independent variables in all three models.

Figure 4.4 The scatter plots of independent variables

There are two approaches that we can take to test the models: stepwise and confirmatory specification. The stepwise approach allows us to find the best set of predictor variables (Hair, Black, Babin, Anderson and Tatham, 2005). However, the stepwise approach might drop variables of interest whose effect we really want to know about. That is, it might drop the very variables we want to study for their
effect. It is equally important to know whether a particular factor is important or not. For example, the result of this estimation helps determine if the quantity of positive shill reviews read has a positive impact on perceived quality.

Since the purpose of the estimation process is not to look for the best model but to confirm the role of the variables of interest, we don't use the stepwise approach. Instead, we take the confirmatory specification approach to test the models because this approach helps us answer the research questions. The t-test is used to assess the significance level of the coefficients. To assess the overall model fit, we use adjusted R².

Since this research involves multiple t-tests, the problem of multiple comparisons must be addressed. The multiple comparisons problem involves two error rates (www.statistics.com¹⁶): the comparison-wise rate applying to individual t-tests and the family-wise rate applying to the entire set of t-tests. The family-wise rate indicates the probability of making at least one type I error when conducting the experiment. So, even though the comparison-wise error rate is satisfied, the family-wise error rate may not be acceptable. The experimenter determines one rate and the other rate is determined by the multiple comparisons technique (Huberty and Morris, 1989). In our experiment, we set the family-wise error rate to α_FW = 0.05. According to McClave, Benson and Sincich (2008), there are three widely used techniques to address the problem of multiple comparisons: Bonferroni, Scheffé and Tukey. We selected the Bonferroni correction method in this experiment because we do pair-wise comparisons and our samples have unequal sizes. Table 4.2 shows the results of the Bonferroni correction method.

¹⁶ www.statistics.com is the official website of the Institute for Statistics Education
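The Bonferroni adjustment described above divides the family-wise error rate by the number of tests in the family. The sketch below is illustrative only: the test counts used in the example (3 and 5) are assumptions, since the thesis does not state how many t-tests it counted for each model, and the function names are hypothetical.

```python
def bonferroni_comparison_wise_alpha(family_wise_alpha, num_tests):
    """Comparison-wise error rate under the Bonferroni correction."""
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    return family_wise_alpha / num_tests


def family_wise_upper_bound(comparison_wise_alpha, num_tests):
    """Probability of at least one type I error if the tests were independent."""
    return 1 - (1 - comparison_wise_alpha) ** num_tests


FAMILY_WISE_ALPHA = 0.05  # family-wise rate chosen in this study

for num_tests in (3, 5):  # assumed numbers of t-tests, for illustration
    alpha_cw = bonferroni_comparison_wise_alpha(FAMILY_WISE_ALPHA, num_tests)
    bound = family_wise_upper_bound(alpha_cw, num_tests)
    print(f"{num_tests} tests: comparison-wise alpha = {alpha_cw:.3f}, "
          f"family-wise rate under independence = {bound:.4f}")
```

The second function shows why the correction is conservative: with the adjusted comparison-wise rate, the family-wise rate stays just below the 0.05 target.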
Table 4.2 Family-wise and comparison-wise Type I error rate

Model    α_FW    α_CW
1        0.05    0.017
2        0.05    0.017
3        0.05    0.010

According to Hair et al. (2005), the factors that impact statistical power are the effect size, the type I error rate (α) and the sample size. A larger effect size is more likely to result in higher statistical power. Since type I and type II errors are inversely related, reducing alpha will increase the probability of a type II error, which in turn decreases the power. A larger sample size also increases the power of the test. Following the guidelines suggested in (Cohen, 1998), our objective for the power level of the models is at least 0.80 and the type I error rate shouldn't be larger than 0.05. In this study, we use Cohen's f² to measure the effect size. According to (Cohen, 1998), the effect size can be divided into three levels: small (0.10), medium (0.25) and large (0.40). Table 4.3 shows the minimum sample size required with power = 0.8 at different levels of effect size. So in case the effect size is small, we set alpha at 0.01 with a sample size of 174.
Table 4.3 Minimum required sample size

                  alpha (α) = 0.05                      alpha (α) = 0.01
                  Quantity of independent variables     Quantity of independent variables
Effect size       3          4                          3          4
Small (0.10)      112        124                        160        174
Medium (0.25)     48         53                         68         74
Large (0.40)      32         35                         45         49

Another approach to identifying the sample size is to use the rule of thumb suggested in (Hair et al., 2005). The ideal sample size for multiple regression models should follow a ratio of 20 observations to 1 independent variable. Because the most complicated model in our study contains 4 independent variables, we need a sample with at least 80 data points. However, we want to collect as much data as possible because the size of the sample has a direct impact on the statistical power of a linear regression model. The most difficult challenge in getting the desired sample is the availability of resources for the incentive. To collect the required sample, we gave a $5 gift card per complete submission.

4.4.2 Results

The empirical results answer the questions about the influence of product reviews on perceived quality and the effectiveness of shill reviews in changing consumer quality perception. Three regression models were employed to estimate the impact of the variables in the rating summary on the first impression and on review usage, and the impact of review usage on the difference between the final perceived quality and the first impression.
We are concerned about the correlation between variance and the average rating in the review sets because each review set is the result of mixing reviews from two groups with opposite rating values. The positive shill review group consists only of reviews with high ratings. The negative normal review group consists only of reviews with low ratings. Therefore, when the average rating approaches the middle value (e.g. 3 stars), the variance is larger. This concern is partly addressed by using 4- and 5-star reviews in the high rating review group and 1- and 2-star reviews in the low rating review group.

As indicated in Table 4.4, there is a significant relationship between the average rating and a consumer's first impression about the quality of the product. The effect size of the model is large. This result supports the statement by Forman et al. (2008) that the product rating can be used as a proxy for product quality. The results suggest that one additional star in the average rating can increase the perceived quality rating by 0.72. This situation is ideal for shill attacks because the direct result of a positive shill review is an improved average rating of the product. Therefore, if consumers use only the average rating to assess the quality of the product, shill attacks can be very effective in making them think that the quality of the product is good. The effects of variance and volume on the first impression were not statistically significant. This makes sense because, unlike the average rating, which directly reflects product quality, volume and variance only imply the size of the population of product users and the agreement among reviews about the product.
These results are similar to the conclusions of Hu et al. (2012) that volume and variance are not reliable factors to predict perceived product quality.

Table 4.4 The impact of shill reviews on the first impression

Model        Coefficient B    S.E.     Significance
Constant     1.935            0.302    0.000
AveRating    0.720            0.074    0.000
VarRating    -0.007           0.059    0.909
VolRating    -0.039           0.031    0.211

Dependent variable: The first impression
Adjusted R-square: 0.353
Significance of model: 0.000
Effect size: 0.572

The results from Table 4.5 show that only volume has a statistically significant impact on the total quantity of reviews read by the consumers. This result makes sense because volume limits the quantity of the reviews that are available for the consumers to read. Furthermore, in this study, the maximum quantity of reviews for each product was 10. Because this quantity is small, the subjects might read the entire review set regardless of the variance and the average rating of the reviews.

Table 4.5 Factors that impact the quantity of reviews read

Model        Coefficient B    S.E.     Significance
Constant     -0.165           0.462    0.722
AveRating    0.147            0.113    0.194
VarRating    0.113            0.090    0.211
VolRating    0.792            0.048    0.000

Dependent variable: Quantity of reviews read
Adjusted R-square: 0.643
Significance of model: 0.000
Effect size: 1.849
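The effect sizes reported with Tables 4.4 and 4.5 are consistent with Cohen's f² = R² / (1 − R²) computed from the unadjusted R². The sketch below works under that assumption, which the thesis does not state explicitly; it also assumes n = 175 participants and 3 predictors per model, and the function names are hypothetical.

```python
def r2_from_adjusted(adjusted_r2, n_observations, n_predictors):
    """Recover the unadjusted R-square from the adjusted R-square."""
    shrink = (n_observations - n_predictors - 1) / (n_observations - 1)
    return 1 - (1 - adjusted_r2) * shrink


def cohens_f2(r2):
    """Cohen's f-squared effect size for a regression model."""
    return r2 / (1 - r2)


N_PARTICIPANTS = 175  # participants reported in section 4.3
N_PREDICTORS = 3      # AveRating, VarRating, VolRating

for label, adjusted_r2 in (("Table 4.4", 0.353), ("Table 4.5", 0.643)):
    r2 = r2_from_adjusted(adjusted_r2, N_PARTICIPANTS, N_PREDICTORS)
    print(f"{label}: R2 = {r2:.3f}, f2 = {cohens_f2(r2):.3f}")
```

Under these assumptions the computed values (about 0.573 and 1.850) are close to the reported effect sizes of 0.572 and 1.849; the small gaps are plausibly due to rounding of the published adjusted R-square.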
Table 4.6 indicates that only volume impacts the median time spent on each review. The effect size of this model is very small. One can interpret from the results that when more reviews are available, the subjects spend less time reading each review. In combination with the previous results, we can conclude that when the quantity of reviews is small, consumers tend to read all the reviews; however, they spend less time reading each individual review. The data also support this conclusion: 82.43% of the participants read all the reviews that were available. The subjects who read all of their reviews had 5.21 reviews in their review set on average, while the subjects who did not read all of the reviews had 6.44 reviews. However, the difference in the quantity of reviews between the two groups of subjects is not statistically significant at the 95% confidence level (p-value = 0.066).

Table 4.6 Factors that impact the median time spent on each review

Model        Coefficient B    S.E.     Significance
Constant     25.639           3.347    0.000
AveRating    0.248            0.819    0.763
VarRating    0.037            0.650    0.951
VolRating    -1.054           0.346    0.003

Dependent variable: Median time spent
R-square: 0.041
Significance of model: 0.017
Effect size: 0.060

The subjects were asked to assess the quality of the product at two different times: before and after they read the content of the reviews. After reading the reviews, 66.9% of the subjects changed their first impression about the quality of the product. The results from Table 4.7 indicate that shill reviews have a statistically significant impact on the change in quality perception while normal reviews do not. Reading an additional shill review increased the perceived quality rating by 0.095. The results also suggest that one
additional second spent on reading a shill review increases the perceived quality rating of the consumers by 0.012. The small adjusted R-square and effect size indicate that review usage explains only a small part of the change in consumer quality perception. Additional research will be necessary to find the major factors that influence the change in perceived quality.

Table 4.7 Factors that impact the change in perceived quality

Model        B        S.E.     Significance
Constant    -0.399    0.212    0.061
TQ_S         0.095    0.041    0.020
MT_S         0.012    0.005    0.021
TQ_N        -0.096    0.049    0.052
MT_N        -0.003    0.006    0.626

Dependent variable: Change in perceived quality
Adjusted R-square: 0.081
Significance of model: 0.001
Effect size: 0.114

Further analysis also provides evidence about the effect of shill level on perceived quality. The shill level is measured as the percentage of shill reviews in the review set. Each subject is exposed to shill reviews at two different levels. At the rating summary level, the shill level is based on all shill reviews and normal reviews for the product. After the subjects read the reviews, the shill level can be measured as the percentage of shill reviews among the reviews actually read. The results in Table 4.8 show that both shill levels have a significant impact on perceived quality. If the initial shill level increases by 1 percentage point, the first impression perceived quality rating increases by 0.024. Once the subjects read the reviews, if the shill level based on reviews read increases by 1 percentage point, the final perceived quality rating increases by 0.034.
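The two shill levels described above can be sketched as follows. The review-set representation (a list of booleans marking shill reviews) and the function names are assumptions made for illustration; the coefficients are those reported in Table 4.8, with the shill level expressed in percentage points (0 to 100).

```python
def shill_level(is_shill):
    """Percentage (0-100) of shill reviews in a list of booleans."""
    if not is_shill:
        raise ValueError("review set must not be empty")
    return 100.0 * sum(is_shill) / len(is_shill)


def first_impression(level):
    """Table 4.8, first model: 2.594 + 0.024 * shill level."""
    return 2.594 + 0.024 * level


def final_quality(level_read):
    """Table 4.8, second model: 1.886 + 0.034 * shill level of reviews read."""
    return 1.886 + 0.034 * level_read


if __name__ == "__main__":
    # Hypothetical review set: 10 reviews, 4 of them shill (True = shill).
    review_set = [True] * 4 + [False] * 6
    # Hypothetical reading behavior: 3 shill and 2 normal reviews were opened.
    reviews_read = [True, True, True, False, False]
    print(round(first_impression(shill_level(review_set)), 3))
    print(round(final_quality(shill_level(reviews_read)), 3))
```

Keeping the two levels as separate inputs mirrors the design of the experiment: the first is fixed by the composition of the review set, while the second depends on which reviews a subject chooses to open.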
Table 4.8 The impact of shill level on perceived quality

Model                          B       S.E.    Significance   Effect size (f2)   Adjusted R2
Dependent variable: First impression
Constant                       2.594   0.163   0.000          0.46               0.347
Shill level                    0.024   0.002   0.000
Dependent variable: Final perceived quality
Constant                       1.886   0.161   0.000          1.15               0.531
Shill level of reviews read    0.034   0.002   0.000

4.4.3 Discussion

The results suggest that shill reviews are more influential for thinly reviewed products because shill reviews have the greatest potential to improve overall product ratings for products with fewer normal reviews. In addition, the effect of shill reviews was even stronger after being read. One explanation for this result is that shill reviews are usually extreme, which can make them more persuasive to consumers than more neutral (unbiased) reviews (Mukherjee et al., 2012). Another reason for shill reviews to have such an impact is that consumers are unaware that a particular review is actually a shill review. These findings should raise awareness of the danger of review manipulation. Online marketplaces should pay more attention to developing effective shill review detection methods to protect both consumers and honest sellers.

The empirical results show that consumers use the average rating as an indicator of product quality. This result is consistent with prior research suggesting that average product ratings are used as a proxy for product quality. Therefore, maintaining a good product rating is very important to the long-term success of products. Regardless of the review quantity, products with a better average rating create a better first impression
about product quality. However, the overall review quantity does influence the quantity of reviews read by consumers. When the quantity of reviews is relatively small (e.g., less than or equal to 10), consumers tend to read all the reviews.
5. Conclusions

In this chapter, the content of the previous chapters is summarized and the implications of the findings, limitations and future work are discussed.

5.1 Summary of chapters

Chapter 1 introduced the problem of product reputation manipulation. Product reputation is manipulated using shill reviews. The objectives of this study are to explore the linguistic characteristics of shill reviews and to measure the impact of shill reviews on perceived product quality.

Chapter 2 reviewed the literature about the effect of product reviews on consumer behavior, the approaches to shill review detection and deterrence, the methods to extract product features from product reviews and the concept of perceived quality. This study explores the linguistic characteristics of shill reviews and measures the impact of positive shill reviews on perceived product quality. These two aspects were not fully addressed in the literature.

Chapter 3 addressed questions about the linguistic characteristics of shill reviews. To reveal these characteristics, shill reviews are compared to normal reviews using measures of informativeness, subjectivity and readability. Informativeness is measured based on the quantity of official and unofficial product features included in the reviews. Product features are detected and classified by a novel approach called the description-based feature extraction method. Subjectivity is measured using the well-known subjectivity extraction model proposed by Pang et al. (2004b). Readability is measured using five
readability indexes widely used in studies about online reviews. The above measures of shill and normal reviews are compared using one-tailed independent samples t-tests.

Chapter 4 measured the impact of shill reviews on perceived product quality. In an experiment, consumer quality perceptions are measured at two points: when consumers have only seen the rating summary and when consumers have read the content of the reviews. On both occasions, a quantity of shill reviews is injected into the product's review sets. The measures of perceived quality are then compared to reveal the impact of reading shill reviews on perceived product quality.

5.2 Results and implications

5.2.1 Linguistic characteristics of shill reviews

Shill reviews are an emerging problem for online review systems. To detect shill reviews effectively, it is important to distinguish them from normal reviews. The objective of this study was to explore the characteristics of shill reviews by comparing them with normal reviews and to create a novel method to perform product feature detection and classification. Unlike previous studies, which try to detect shill reviews among publicly available reviews, this study collects shill reviews via a data collection procedure. Official and unofficial features of the product were extracted from the reviews using a description-based feature extraction method. Having a wide variety of unofficial features in a review indicates the reviewer's knowledge of the product.

Our results suggest significant differences between shill reviews and normal reviews in terms of informativeness, readability and subjectivity level. The shill reviews
are less readable than normal reviews. The content of shill reviews is usually repetitive and long because shill reviewers try to mention as many features as possible in the reviews. The repetitive content of the shill reviews focused mainly on the official features included in the product description. This is not surprising, as the product description is the main source of information used by shill reviewers. In addition, unlike normal reviewers, who usually use subjective statements to express their personal opinion about the product, shill reviewers use more objective sentences similar to the ones in the product description. This finding suggests that normal reviewers personally used the product and were confident in judging it. In contrast, shill reviewers had no personal experience using the product and just described the product features in their reviews.

5.2.2 The impact of shill reviews on perceived quality

Customer reviews play an important role in the success of online product sales. Our explanation for this relationship is that customer reviews impact perceptions of product quality, which affect purchase decisions and product sales. To increase product sales, shill reviews are published to improve the reputation of the target product. The objective of this study is to understand the relationship between product reviews and quality perceptions and the role that shill reviews play in influencing perceived product quality.

Collecting shill reviews via an experiment and normal reviews from Amazon.com, we composed multiple different review sets for an MP3 player. In another experiment, the same product with different review sets was shown to different groups of consumers. Perceptions of product quality were measured using a survey, first when consumers had only seen the rating summary and then after consumers had read the review content. Three linear
regression models are used to measure the impact of the review rating summary on the first impression and on review usage, and the impact of consumer reading behaviors on the change in perceived product quality.

The results show that the product average rating affects the first impression and that review quantity influences the total number of reviews read and the median time spent reading each review. However, not all reviews had the same effect on changes in consumer quality perceptions. Reading normal reviews did not make consumers change their minds about the quality of the product. However, when a consumer read more shill reviews and spent more time reading shill reviews, his or her assessment of the quality of the product changed. The results show that when there are more shill reviews in the review set, perceived product quality increases. This result holds both before and after the consumers read the reviews. After reading the reviews, the effect of shill reviews was even stronger, indicating that consumers were unable to detect that the reviews were shill reviews and that the shill reviews were successful in influencing the assessment of product quality. This finding is consistent with the conclusion of Jindal and Liu (2008), who state that it is impossible to distinguish shill reviews from normal reviews even if they are manually read.

The findings of this study have both theoretical and practical contributions. Theoretically, we provided evidence that word-of-mouth, in an online shopping environment, can be used as an important indicator of perceived product quality. Practically, marketers can use online review ratings as a measure of perceived product quality. Online reviews can also be used as a tool to improve consumers' perceptions of
quality, especially in cases where advertising does not effectively do so (Clark et al., 2009).

Our approach is different from the previous studies in several ways. First, shill reviews were submitted by real shills. Unlike other studies that collected publicly available reviews from online marketplaces and classified them as shill reviews, this study collected shill reviews in an experiment in which the subjects posed as shills. Second, most of the previous studies focus on the sales of the product, while our focus is on perceived product quality. We attribute the association between customer reviews and sales to the mediating effect of perceived product quality. We extend the Zeithaml framework by identifying the relationship between customer reviews and perceived quality. Third, we determined the connection between the overall reputation information and review usage, which was not addressed in previous studies. It is feasible to explore this relationship because the experiment setting allows us to monitor review usage. Finally, while other research indicated awareness of shill reviews, the effect of shill reviews on consumers was unknown. In this study, we measured the magnitude of the impact of shill reviews.
5.3 Limitations and future research

5.3.1 Negative shill reviews and shill review detection

The first part of this study has several limitations. First, although the sample size meets the requirements for the analysis method used, it is still relatively small. Our results might be more convincing with a larger sample. Second, we only collected positive shill reviews, while negative shill reviews are also very interesting to analyze. In a market with few competitors, damaging the reputation of the competitors might result in increased sales of a target product, so the incentive for submitting negative shill reviews can also be very high. Third, our method is limited to comparing the characteristics of shill and normal reviews. Future research is necessary to address these limitations and to extend this approach so that it can become a robust method to detect shill reviews.

5.3.2 The impact of shill reviews on product preference

The second part of this study also has a number of limitations. First, only positive shill reviews are considered, while there are also negative shill reviews, especially in markets where the competition is narrow. Second, although prize money was offered to improve the quality of the shill reviews, their quality is not verifiable. Third, only one product was used in the experiment. To generalize the results, different product categories should be used. Fourth, the participants were a convenience sample. Finally, we only collected reviews from hired shill reviewers who never used the product. In real life, shill reviews can come from anyone, including the author of a book or the manufacturer of a television. A shill review written by the author of a book may differ from a shill review submitted by someone who never read that book.
This research can be extended in multiple directions. Further work can be done to measure the effect of shill reviews on perceived quality when other factors such as prices and product features are not fixed. In such a situation, it is interesting to see whether shill reviews are powerful enough to make consumers change their product preference. Another future direction for this research is to measure how the order of appearance changes the effectiveness of shill reviews, because reviews higher on the list have a better chance of being read. We also want to look into the characteristics and effect of negative shill reviews in a future study. Negative shill reviews can be effective, especially in cases where a limited quantity of products is being offered and the consumers do not have many products to choose from. Finally, indirect review manipulation is an important area to investigate. Indirect review manipulation occurs when shill reviews are rated as helpful by the shills themselves. How highly rated shill reviews impact perceived quality is another promising question for future study.
References

Abulaish, M., Jahiruddin, Doja, M. N., and Ahmad, T. "Feature and Opinion Mining for Customer Review Summarization," in: PReMI '09 Proceedings of the 3rd International Conference on Pattern Recognition and Machine Intelligence 2009, pp. 219-224.
Afroz, S., Brennan, M., and Greenstadt, R. "Detecting Hoaxes, Frauds, and Deception in Writing Style Online," in: 2012 IEEE Symposium on Security and Privacy (SP) San Francisco, CA, 2012.
Akerlof, G. "The Market for "Lemons": Quality Uncertainty and the Market Mechanism," The Quarterly Journal of Economics (84:3) 1970, pp 488-500.
Archak, N., Ghose, A., and Ipeirotis, P. G. "Show me the money!: deriving the pricing power of product features by mining consumer reviews," KDD '07 Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining 2007, pp. 56-65.
Awad, N. F., and Zhang, J. "Stay out of My Forum! Evaluating Firm Involvement in Online Ratings Communities," in: 40th Annual Hawaii International Conference on System Sciences (HICSS'07) 2006.
Ba, S., and Pavlou, P. A. "Evidence of the Effect of Trust Building Technology in Electronic Markets: Price Premiums and Buyer Behavior," MIS Quarterly (26:3) 2002, pp 243-268.
Bansal, H. S., and Voyer, P. A. "Word-of-Mouth Processes within a Services Purchase Decision Context," Journal of Service Research (3:2) 2000, pp 166-177.
Bauer, R. A. "Consumer Behavior as Risk Taking," Dynamic Marketing for a Changing World, Chicago, IL, 1960.
Bounie, D., Bourreau, M., Gensollen, M., and Waelbroeck, P. "The Effect of Online Customer Reviews on Purchasing Decisions: the Case of Video Games," in: ENST 2005.
Buttle, F. A. "Word of mouth: understanding and managing referral marketing," Journal of Strategic Marketing (6) 1998, pp 241-254.
Cao, Q., Duan, W., and Gan, Q. "Exploring determinants of voting for the 'helpfulness' of online user reviews: A text mining approach," Decision Support Systems (50) 2011, pp 511-521.
Chatterjee, P. "Online reviews: do consumers use them?," ACR 2001 Proceedings, eds. M. C. Gilly and J. Myers-Levy, Provo, UT: Association for Consumer Research, 2001, pp. 129-134.
Chevalier, J., and Mayzlin, D. "The effect of word of mouth on sales: online book reviews," Journal of Marketing Research (43:3) 2006, pp 345-354.
Clark, C. R., Doraszelski, U., and Draganska, M. "The Effect of Advertising on Brand Awareness and Perceived Quality: An Empirical Investigation using Panel Data," Quantitative Marketing and Economics (7:2) 2009, pp 207-236.
Clemons, E., Gao, G., and Hitt, L. M. "When Online Reviews Meet Hyperdifferentiation: A Study of the Craft Beer Industry," Journal of Management Information Systems (23:2) 2006, pp 149-171.
Cohen, J. Statistical Power Analysis for Behavioral Sciences (2nd ed.) Lawrence Erlbaum Publishing, Hillsdale, NJ, 1988.
Coleman, M., and Liau, T. L. "A Computer Readability Formula Designed for Machine Scoring," Journal of Applied Psychology (60:2) 1975, pp 283-284.
Cui, G., Lui, H.-k., and Guo, X. "Online Reviews as a Driver of New Product Sales," in: International Conference on Management of e-Commerce and e-Government 2010.
Daft, R. L., and Lengel, R. H. "Information Richness: A New Approach to Managerial Behaviour and Organizational Design," Research in Organizational Behaviour (6) 1984, pp 191-233.
Dave, K., Lawrence, S., and Pennock, D. M. "Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews," WWW '03 Proceedings of the 12th international conference on World Wide Web 2003, pp. 519-528.
Dellarocas, C. "Strategic Manipulation of Internet Opinion Forums: Implications for Consumers and Firms," in: MIT Sloan Working Paper 4501-04 2004.
Dellarocas, C., and Narayan, R. "A Statistical Measure of a Population's Propensity to Engage in Post-Purchase Online Word-of-Mouth," Statistical Science (21:2) 2006, pp 277-285.
Dellarocas, C., Zhang, X., and Awad, N. F. "Exploring the value of online product reviews in forecasting sales: The case of motion pictures," Journal of Interactive Marketing (21:4) 2007, pp 23-45.
Dickson, P., and Sawyer, A. "Point of Purchase Behavior and Price Perceptions of Supermarket Shopper," in: Marketing Science Institute Working Paper Series 1985.
Duan, W., Gu, B., and Whinston, A. B. "Do online reviews matter? - An empirical investigation of panel data," Decision Support Systems (45) 2008, pp 1007-1016.
Flesch, R. F. "How to Test Readability," Harper 1951.
Forman, C., Ghose, A., and Wiesenfeld, B. "Examining the Relationship Between Reviews and Sales: The Role of Reviewer Identity Disclosure in Electronic Markets," Information Systems Research (19:3) 2008, pp 291-313.
Friedman, E. J., and Resnick, P. "The Social Cost of Cheap Pseudonyms," Journal of Economics & Management Strategy (10:2) 2001, pp 173-199.
Gao, G., Gu, B., and Lin, M. "The Dynamics of Online Consumer Reviews," in: Workshop on Information Systems and Economics (WISE) 2006.
Ghose, A., and Ipeirotis, P. G. "Designing Ranking Systems for Consumer Reviews: The Impact of Review Subjectivity on Product Sales and Review Quality," in: Proceedings of the International Conference on Decision Support System 2004.
Godes, D., and Mayzlin, D. "Using Online Conversations to Study Word-of-Mouth Communication," Marketing Science (23:4) 2004, pp 545-560.
Gunning, R. "The Fog Index After Twenty Years," Journal of Business Communication (6:3) 1969.
Hair, J. F., Black, B., Babin, B., Anderson, R. E., and Tatham, R. L. Multivariate Data Analysis (6th Edition) 2005.
Hu, M., and Liu, B. "Mining opinion features in customer reviews," AAAI'04 Proceedings of the 19th national conference on Artificial intelligence 2004.
Hu, N., Bose, I., Gao, Y., and Liu, L. "Manipulation in digital word-of-mouth: A reality check for book reviews," Decision Support Systems (50) 2011a, pp 627-635.
Hu, N., Bose, I., Koh, N. S., and Liu, L. "Manipulation of online reviews: An analysis of ratings, readability, and sentiments," Decision Support Systems (52) 2012, pp 674-684.
Hu, N., Liu, L., and Sambamurthy, V. "Fraud detection in online consumer reviews," Decision Support Systems (50) 2011b, pp 614-626.
Hu, N., Liu, L., and Zhang, J. J. "Do online reviews affect product sales? The role of reviewer characteristics and temporal effects," Inf Technol Manage (9) 2008, pp 201-214.
Hu, N., Pavlou, P. A., and Zhang, J. "Can Online Reviews Reveal a Product's True Quality? Empirical Findings and Analytical Modeling of Online Word-of-Mouth Communication," EC '06 Proceedings of the 7th ACM conference on Electronic commerce Ann Arbor, Michigan, USA, 2006.
Huberty, C. J., and Morris, J. D. "Multivariate Analysis Versus Multiple Univariate Analyses," Psychological Bulletin (105:2) 1989, pp 302-308.
Infosino, W. J. "Forecasting New Product Sales from Likelihood of Purchase Ratings," Marketing Science (5:4) 1986, pp 372-384.
Jacoby, J., Chestnut, R. W., Hoyer, W. D., Sheluga, D. A., and Donahue, M. J. "Psychometric characteristics of behavioral process data: preliminary findings on validity and reliability," Advances in Consumer Research (5) 1978, pp 546-554.
Jindal, N., and Liu, B. "Review Spam Detection," WWW 2007 / Poster Paper 2007.
Jindal, N., and Liu, B. "Opinion Spam and Analysis," in: WSDM'08 Palo Alto, California, USA, 2008.
Jøsang, A., and Ismail, R. "The Beta Reputation System," in: 15th Bled Electronic Commerce Conference e-Reality: Constructing the e-Economy Bled, Slovenia, 2002.
Knapp, M. L., Hart, R. P., and Dennis, H. S. "An exploration of deception as a communication construct," Human Communication Research (1:1) 2006, pp 1229.
Koh, N. S., Hu, N., and Clemons, E. K. "Do online reviews reflect a product's true perceived quality? An investigation of online movie reviews across cultures," Electronic Commerce Research and Applications 2010.
Korfiatis, N. "Evaluating Content Quality and Usefulness of Online Product Reviews," 2008.
Lam, S. K., and Riedl, J. "Shilling Recommender Systems for Fun and Profit," WWW2004, New York, New York USA, 2004.
Laughlin, G. H. M. "SMOG Grading-a New Readability Formula," Journal of Reading (12:8) 1969, pp 639-646.
Li, X., and Hitt, L. M. "Self-Selection and Information Role of Online Product Reviews," Information Systems Research (19:4) 2008, pp 456-474.
Lim, E.-P., Nguyen, V.-A., Jindal, N., Liu, B., and Lauw, H. W. "Detecting product review spammers using rating behaviors," in: CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management 2010.
Liu, B. "Opinion Observer: Analyzing and Comparing Opinions on the Web," WWW '05: Proceedings of the 14th international conference on World Wide Web, 2005.
Liu, B. "Sentiment Analysis and Subjectivity," in: To appear in Handbook of Natural Language Processing, Second Edition, N.I.a.F.J. Damerau (ed.), 2010.
Liu, J., Cao, Y., Lin, C.-Y., Huang, Y., and Zhou, M. "Low-Quality Product Review Detection in Opinion Summarization," Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Prague, 2007, pp. 334-342.
Liu, Y. "Word of Mouth for Movies: Its Dynamics and Impact on Box Office Revenue," Journal of Marketing (70:July) 2006, pp 74-89.
Lutz, R. J., and Reilly, P. J. "An exploration of the effects of perceived social and performance risk on consumer information acquisition," Advances in Consumer Research (1) 1974, pp 393-405.
Manning, C., and Schütze, H. "Foundations of Statistical Natural Language Processing," MIT Press 1999.
Mayzlin, D. "Promotional Chat on the Internet," Marketing Science (25:2) 2006, pp 155-163.
McClave, J. T., Benson, P. G., and Sincich, T. "Statistics For Business And Economics 10ed," 2008, pp. 500-505.
Moe, W. W. "How Much Does A Good Product Rating Help A Bad Product? Modeling The Role Of Product Quality In The Relationship Between Online Consumer Ratings And Sales," in: Working Paper University Of Maryland College Park, 2009.
Moe, W. W., and Trusov, M. "Measuring the Value of Social Dynamics in Online Product Ratings Forums," Journal of Marketing Research (48:3) 2011, pp 444-456.
Moffitt, K., and Burns, M. B. "What Does That Mean? Investigating Obfuscation and Readability Cues as Indicators of Deception in Fraudulent Financial Reports," in: AMCIS 2009 Proceedings 2009.
Moorthy, S., and Zhao, H. "Advertising Spending and Perceived Quality," Marketing Letters (11:3) 2000, pp 221-233.
Mukherjee, A., Liu, B., and Glance, N. "Spotting Fake Reviewer Groups in Consumer Reviews," International World Wide Web Conference Committee (IW3C2), Lyon, France, 2012.
Newman, M. L., Pennebaker, J. W., Berry, D. S., and Richards, J. M. "Lying Words: Predicting Deception From Linguistic Styles," Personality and Social Psychology Bulletin (29:5) 2003, pp 665-675.
O'Mahony, M. P., and Smyth, B. "Using Readability Tests to Predict Helpful Product Reviews," Proceeding RIAO '10 Adaptivity, Personalization and Fusion of Heterogeneous Information Paris, France, 2010.
Oliver, R. L. "Expectancy Theory Predictions of Salesmen's Performance," Journal of Marketing Research (11:3) 1974, pp 243-253.
Pang, B., and Lee, L. "A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts," ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, 2004a.
Pang, B., and Lee, L. "A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts," Proceedings of the ACL, 2004b, pp. 271-278.
Pang, B., and Lee, L. "Opinion mining and sentiment analysis," Foundations and Trends in Information Retrieval (2:1-2) 2008, pp 1-135.
Parasuraman, A., Zeithaml, V. A., and Berry, L. L. "A Conceptual Model Of Service Quality And Its Implications For Future Research," Journal of Marketing (49:Fall) 1985.
Pavlou, P. A., and Dimoka, A. "The Nature and Role of Feedback Text Comments in Online Marketplaces: Implications for Trust Building, Price Premiums, and Seller Differentiation," Information Systems Research (17:4) 2006, pp 392-414.
Popescu, A.-M., and Etzioni, O. "Extracting product features and opinions from reviews," HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, 2005, pp. 339-346.
Prawesh, S., and Padmanabhan, B. "Manipulation Resistance in Feedback Models of Top-N Recommenders," WITS 2012, Orlando, FL, 2012.
Resnick, P., Zeckhauser, R., Friedman, E., and Kuwabara, K. "Reputation systems," in: Communications of the ACM 2000.
Ross, I. "Perceived risk and consumer behavior: a critical review," Advances in Consumer Research (2) 1975, pp 1-20.
Ruxton, G. D., and Neuhäuser, M. "When should we use one-tailed hypothesis testing?," Methods in Ecology and Evolution (1) 2010, pp 114-117.
Sakunkoo, P., and Sakunkoo, N. "Analysis of Social Influence in Online Book Reviews," Proceedings of the Third International ICWSM Conference, 2009.
Sawyer, A. G. "Demand Artifacts in Laboratory Experiments in Consumer Research," Journal of Consumer Research (1:4) 1975, pp 20-30.
Schlosser, A. "Posting Versus Lurking: Communicating in a Multiple Audience Context," Journal of Consumer Research (23:September) 2005, pp 260-265.
Sun, M. J. "How Does Variance of Product Ratings Matter?," 2008.
Tsiotsou, R. "Perceived Quality Levels and their Relation to Involvement, Satisfaction, and Purchase Intentions," Marketing Bulletin (16) 2005.
Urban, G. L. "Customer Advocacy: A New Era in Marketing?," Journal of Public Policy & Marketing (24:1) 2005, pp 155-159.
Vartapetiance, A., and Gillam, L. "'I Don't Know Where He is Not': Does Deception Research yet offer a basis for Deception Detectives?," Proceedings of the EACL 2012 Workshop on Computational Approaches to Deception Detection, Avignon, France, 2012, pp. 5-14.
Viscusi, W. K. "Market Incentives for Criminal Behavior," in: The Black Youth Employment Crisis University of Chicago Press, 1986, pp. 301-351.
Wu, G., Greene, D., Smyth, B., and Cunningham, P. "Distortion as a Validation Criterion in the Identification of Suspicious Reviews," in: 1st Workshop on Social Media Analytics (SOMA '10) Washington, DC, USA, 2010.
Wyatt, R. O., and Badger, D. P. "How Reviews Affect Film Interest and Evaluation," in: Annual Meeting of the Association for Education in Journalism and Mass Communication Gainesville, FL, 1984.
Ye, Q., Law, R., and Gu, B. "The impact of online user reviews on hotel room sales," International Journal of Hospitality Management (28:180-182) 2009.
Zakaluk, B. L., and Samuels, S. J. Readability: its past, present, and future International Reading Association, 1988.
Zeithaml, V. A. "Consumer Perceptions of Price, Quality, and Value: A Means-End Model and Synthesis of Evidence," The Journal of Marketing (52:3) 1988, pp 2-22.
Appendix

A. The website for shill collection

This website has 3 sections.

Picture 1 Contact information and Terms & Instructions

• The students are asked to provide their full name, class information and email address. The class information is used to identify which class the student attends for the purpose of allocating course credit.
• The student must use an @ucdenver.edu email address to participate in this study. The purpose of the email address is to uniquely identify the student and to communicate about the monetary reward.
• Even though we already changed the brand name, the product name and the model number of the product, we specifically ask the student not to look for the product on the internet.
• We ask the reviewer to intentionally submit a positive review for the product even though they might never have used this product before. There is no instruction about what the content of the review should be.

Picture 2 Product information

• Two pictures of the product.
• The product specifications.
• No price is shown.
Picture 3 The review

• The reviewers can rate the product (1 star to 5 stars).
• The length of the review title is limited to 128 characters. This limitation is similar to the Amazon.com review title length requirement.
• The length of the text comment is unlimited.
B. The product evaluation website

Picture 4 Contact information

• Each student can participate in this experiment only one time. Every student at the University of Colorado Denver has one unique @ucdenver.edu email address. We use this email address to limit the participation of the students.
• The participants are not told that there are shill reviews in the review set. However, they are informed that the content of the reviews was not verified.
• The participants are specifically asked not to look for the product information elsewhere while participating in this experiment.
Picture 5 Product information

• We control for the factors that might impact the consumer's perception of product quality.
  o Brand name: the brand name was changed.
  o Advertising: two pictures of the product and the product specifications. No additional advertising information is available.
  o Price: no price is shown.
Picture 6 The overall reputation information

• In this section, the participants are shown the overall reputation information of the product:
  o Average rating (valence)
  o Quantity of reviews (volume)
  o Rating distribution (variance)
• With only the overall reputation information, the participants are asked to give their opinion about the quality of the product.
• At this moment, the text content of the reviews is hidden. The participants must answer the question about product quality perception before moving on to the next section.
Picture 7 Full review

• To expand a review, the participant must click on its title.
• Once a review is expanded, we clock the time the participant spends reading that review. The clock for that review stops once the participant expands another review or finishes reading by moving on to the next section.
• After reading the review, the participant must rate its helpfulness.
• The participant must read at least one review. The participants are not required to read all the reviews.
• The order of appearance of the reviews is random.

Picture 8 Product quality evaluation after reading the reviews
• The participants are asked to evaluate the product quality one more time after reading the reviews.
• Then, the participants must answer all the questions in a short survey before finishing their participation.

[Survey form: text not recoverable from the source]
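The reading-time measurement described under Picture 7 could be implemented along the following lines. This is only a sketch under stated assumptions: the class name, method names and the use of explicit timestamps in seconds are hypothetical, and the choice to accumulate time when a review is reopened is an assumption, since the appendix does not say how reopened reviews are handled.

```python
from statistics import median


class ReviewTimer:
    """Tracks seconds spent on each expanded review (hypothetical helper)."""

    def __init__(self):
        self.seconds = {}   # review id -> accumulated reading time
        self._open = None   # (review id, timestamp when it was expanded)

    def _stop(self, now):
        # Stop the running clock, if any, and credit the elapsed time.
        if self._open is not None:
            review_id, started = self._open
            elapsed = now - started
            self.seconds[review_id] = self.seconds.get(review_id, 0.0) + elapsed
            self._open = None

    def expand(self, review_id, now):
        """Expanding a review stops the previous clock and starts a new one."""
        self._stop(now)
        self._open = (review_id, now)

    def finish(self, now):
        """Moving on to the next section stops the running clock."""
        self._stop(now)

    def reviews_read(self):
        return len(self.seconds)

    def median_time(self):
        return median(self.seconds.values())


if __name__ == "__main__":
    timer = ReviewTimer()
    timer.expand("r1", 0.0)    # hypothetical timestamps in seconds
    timer.expand("r2", 20.0)
    timer.expand("r3", 35.0)
    timer.finish(65.0)
    print(timer.reviews_read(), timer.median_time())
```

The two summary methods correspond to the review usage measures analyzed in Chapter 4: the quantity of reviews read and the median time spent on each review.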