The Role of Fake Review Detection in Managing Online Corporate Reputation

R. E. Loke and Z. Kisoen

Centre for Market Insights, Amsterdam University of Applied Sciences, Amsterdam, The Netherlands

Keywords: Online Reviews, Feature Extraction, Spam Detection, Supervised Machine Learning, Strong Corporate Reputation Management.

Abstract: In a recent official statement, Google highlighted the negative effects of fake reviews on review websites and specifically requested companies not to buy, and users not to accept payment for, fake reviews (Google, 2019). Governmental authorities have also started acting against organisations shown to have a high number of fake reviews on their apps (DigitalTrends, 2018; Gov UK, 2020; ACM, 2017). However, while the phenomenon of fake reviews is well known in industries such as online journalism and business and travel portals, it remains a difficult challenge in software engineering (Martens & Maalej, 2019). Fake reviews threaten the reputation of an organisation and devalue reviews as a source for determining public opinion about brands. Negative fake reviews can confuse customers and cause a loss of sales. Positive fake reviews can also lead to wrong insights about real users’ needs and requirements. Although fake reviews have been studied for some time, only a limited number of spam detection models are available for companies to protect their corporate reputation. Especially during the coronavirus pandemic, organisations need to put extra focus on their online presence and limit the amount of negative input that affects their competitive position and can even lead to business loss. Using state-of-the-art features engineered from review texts, we derive a spam detector based on supervised machine learning in an experiment; it performs quite well on the well-known Amazon Mechanical Turk dataset.

1 INTRODUCTION

The last few months have changed the landscape of

the world drastically (McGrath & Ross, 2020). The

outbreak of COVID-19, or the coronavirus, is already

stamped as a human tragedy and has a growing

impact on the global economy. Businesses in particular face a huge number of challenges in order to survive (Gerdeman, 2020). Iansiti et al. (2020) state that business leaders all over the world are struggling with a wide variety of problems, from decreasing sales and stalling supply chains to keeping employees safe and ensuring that the operational core can continue operating without too many obstacles from the coronavirus. Another recently published

study from McKinsey (2020) shows that although the

coronavirus has caused the biggest quarterly drop in share prices since 1987, a record number of unemployment claims and a global drop in crude oil prices, it has turned more people to technology than ever. Governments


around the world have urged people to work from home where possible; together with the lockdown measures, this leads to a new way of using technologies in our daily lives. According to the Dutch Institute of

International Relations (2020), “COVID-19 is a

digital pandemic in terms of its origin, and it is also

one in its effects”. As workplaces instruct employees

to work from home, universities shift fully to online

teaching and the restaurant industry transitions faster

than before to online ordering and delivery, one of

the most rapid organizational transformations in the

history of the modern firm is happening right now

(Iansiti & Richards, 2020). In this huge digital

transformation, organizations are forced to move to a

fundamentally new operating architecture based on

software, data, and digital networks. With more at stake digitally for organisations, online corporate reputation has become more important than ever and can make the difference between surviving the coronavirus period or not.

Loke, R. and Kisoen, Z.

The Role of Fake Review Detection in Managing Online Corporate Reputation.

DOI: 10.5220/0011144600003269

In Proceedings of the 11th International Conference on Data Science, Technology and Applications (DATA 2022), pages 245-256

ISBN: 978-989-758-583-8; ISSN: 2184-285X


According to Chandler (2020), the coronavirus

has driven a massive rise in the use of technology

globally. Their recently published article states

that “the coronavirus boosted online spending and

usage in Q1 of 2020 to the highest in history”. It also

shows that digital platforms are thriving as consumers

seek more entertainment, shopping opportunities and

new ways of connecting during the crisis. This increase in online activity generates more data that organizations can use to improve their online corporate reputation. More organisations start to

realize the importance of having an online strategy

and strong digital visibility as part of corporate

reputation. In times of the corona pandemic,

organisations rely more than ever on strong online

presence in terms of their websites and apps (Lincoln,

2020). Recent research from The New York Times

(2020) stated that people are spending almost one

hour a day extra on websites since the outbreak of

COVID-19. This means that an important way to

reach a broader audience is by having a multi-channel

strategy including an app, social media pages and

websites. However, with more organisations

strengthening their digital strategies, the online

market becomes more crowded in terms of

competitors. Also, organisations that shift from a

traditional marketing toolbox to multi-channel

become more vulnerable in terms of corporate

reputation. The rise of social media and reviewing

websites has empowered consumers and weakened

the position of organisations by exposing them to

negative publicity, customer attacks and reputation

damage (Horn, Taros & Dirkes, 2015). To provide topical and up-to-date research, this study focuses on the rising concern of fake reviews and their relationship with corporate reputation.

Fake reviews can quite easily be written by

anyone on the Internet. Martens and Maalej (2019) state that reviews, as a form of feedback, are often used by managers to prepare business decisions and to measure the corporate reputation of organisations. Research shows that positive feedback

improves app downloads, sales and the reputation of

the company. However, as a side effect, a market for

fake reviews has emerged, which can have very negative consequences for organisations (Martens & Maalej, 2019). For several years now, extensive research has been done on the effects of negative and fake reviews on online corporate reputation. Many

researchers indicate that small, seemingly insignificant

comments or reviews can have a far-reaching impact

on an organisation (DiMauro & Bulmer, 2014).

According to Otar (2018), negative and fake reviews

can damage corporate reputation online and business

growth. Statistics show that as few as four negative or fake reviews can cost an organisation 70% of potential customers (Otar, 2018). Fake reviews in particular are recognized as a real challenge by both the research community and the e-commerce industry. Although large app store operators such as Google and Apple try to combat fake reviews, an estimated 15-30% of all reviews per product or service are fake (Barbado et al., 2019). Therefore, fake reviews in app stores can be seen as a current, critical business problem that affects all layers of business.

Fake review detection has been a hot topic in

research and industry for many years now (Li, Lui &

Qin, 2018). However, it remains worthwhile to analyse the background and effect of fake reviews in business and, because people are generally found to detect fake reviews with low accuracy, how such reviews can be detected using machine learning methods. With the rising market for apps, organisations have become more vulnerable to user feedback in the form of app ratings and reviews. As research shows that even a single fake review can have a significant impact on business, it is important to take this problem seriously; we analyse it in more detail in a literature survey below. In addition, we report the outcomes of an experiment and make them available to other companies to help tackle the issue of fake app reviews.

The remainder of the paper is structured into sections on literature review (Section 2), research methodology (Section 3), results (Section 4), discussion (Section 5), and conclusions (Section 6).

2 LITERATURE REVIEW

In this section, we first give an in-depth description

and background of corporate reputation and then discuss developments in the phenomenon of fake reviews that can be related to online corporate reputation. Thereafter, we present a preliminary conceptual model and report on common spam review detection techniques.

The aim is to provide an overview of current

state-of-the-art knowledge addressing relevant

theories, methods, and unforeseen gaps in existing

research.

2.1 Corporate Reputation

The definition of corporate reputation has been

widely discussed over the years in the research


industry and is in continual change. Although it is a

hot topic, this concept is still vague and has many

different definitions that sometimes even contradict

each other. According to Giovanni (2010, p.74), “the

reputation of a company can be considered one of the

most valued organizational assets”. Chun (2005) and

Dowling (2016) both agree that corporate reputation

has one aligning element; the term is often described

as a reflection of the company to insiders and

outsiders. Also, corporate reputation is often linked with terms such as corporate identity, corporate image, and

corporate goodwill (Wartick, 2002; Barnett et al.,

2006).

For this study, it is important to fix a single definition of corporate reputation; therefore, the definition from Fombrun and van Riel (1997) is used throughout the paper. According to their early research, corporate reputation can be identified as “a perceptual representation of a

company’s past actions and future prospects that

describes the firm’s overall appeal to all of its key

constituents when compared with other leading

rivals”. Another important finding that comes across

in most academic papers on corporate reputation is

that many researchers define corporate reputation as

a collective concept; it is seen as the sum of the

perception of external stakeholders (Barbado et al.,

2019; Barnett et al., 2006; Horn et al., 2015). Chun

(2005) states that corporate reputation can be seen as

an umbrella construct for corporate image and

corporate identity.

Figure 1: Key elements of corporate reputation (Chun,

2005).

Figure 1 illustrates the statement from Chun (2005) that identity, desired identity, and image are partly independent variables that form corporate reputation. Image can be described as the perception that others have of a company (“how others see us”) (Chun, 2005). Identity can be described as the internal view of the company, that is, how members of the organization perceive, feel, and think about the company (“how we see ourselves”).

Desired identity describes how an organisation wants

to be perceived which refers to the name, logo,

symbol as well as strategic actions and philosophy

(“how we want others to perceive ourselves”). The

gap in the middle represents how an organisation is

being perceived internally and externally, as well as

how it wants to be perceived (Chun, 2005). A wide

gap indicates inconsistencies in strategy or

communication and can damage the corporate

reputation of an organisation. Walker (2010) states

that alignment between these variables can lead to

strategic benefits, such as increasing profitability,

lower costs, and a competitive advantage.

We now discuss the relevant concepts of electronic word-of-mouth (EWOM), online corporate reputation and corporate reputation management.

Table 1: Touch points of EWOM (adapted from Mishra & Satish, 2016).

Stage | Touch points of EWOM
Problem or interest | External stimuli (ads on websites, social media personalization and recommendations)
Information search | Search engines, social media, product websites, e-retailers
Evaluation of alternatives | Websites with compare options, social media for feedback, online reviews, and rating websites
Purchase decision | Channels (e-commerce websites), discussion and feedback on social media
Post-purchase behaviour | Review sites, social media, online rating and reviews, feedback on social media or product sites

2.1.1 EWOM (Electronic Word-of-Mouth)

Internet and social media platforms have added a new

element to the traditional word-of-mouth (WOM)

term. Electronic word-of-mouth, or EWOM, refers to

any positive or negative content made by potential,

actual, or previous customers about a product or

company, which is made available to an audience of

people and institutions via the web 2.0 (Mishra &

Satish, 2016). EWOM is expressed in different forms

of communication such as opinions, online ratings,

online feedback, reviews, comments, and experience

sharing via online communication channels.

According to a study by Mishra & Satish (2016), EWOM is a critical factor in marketing efforts


and has an impact on different stages in the consumer

purchase decision process. Table 1 shows how

consumers are in touch with EWOM during the

purchasing process (Mishra & Satish, 2016; Dewey,

1910).

Although there seems to be a clear link between

EWOM and corporate reputation, there is little

literature on this connection. Hoyer and Macinnis

(2001) found that WOM is the most credible and

objective influence on corporate reputation. Other

researchers agree that in meeting or exceeding

customers’ expectations, customer satisfaction is

achieved, EWOM is uttered, and good reputations are

built (Davies et al., 2010). However, the corporate

reputation of companies is considered fragile; while

it may take time to build, it can be easily destroyed.

2.1.2 Online Corporate Reputation

A concept that often simultaneously appears with

corporate reputation is online reputation. According

to Jones, Temperley and Lima (2010), “online

corporate reputation is a reputation, which involves a

corporate reputation created in the online

environment”. Online reputation is not only created

on social media but is also created by groups of

people sharing and collaborating online and through

search engines such as Google, Ask and Yahoo (Weber

Shandwick & KRC Research, 2019). In this digital

era, online corporate reputation is as important as

offline reputation (Abimbola & Vallaster, 2009). The

emergence of social media platforms and review websites gives people new tools to publicly judge companies at a much greater scale and faster pace than before. On these platforms,

consumers do not only discuss content from

companies, but they also create it (Barnett et al.,

2006). Fournier and Avery (2011) have defined social

media as “a venue for open-source branding” in

which consumers can co-create the nature of

reputations of a brand. Companies try to influence

this process of co-creation by creating solid online

presence and strong online marketing strategies. The

online presence, according to Waters et al. (2009),

“offers various benefits to companies like the

opportunity to communicate directly with customers,

strengthen relationships, stimulate co-creation and to

assess consumer’s brand attitudes”. Nowadays,

companies experience more pressure from outside to

take part in online conversations that influence

corporate reputation. Therefore, the online corporate

reputation is associated with increased loss of control

and increased need for active monitoring (Gensler et

al., 2015).

2.1.3 Corporate Reputation Management

Since the overall goal of this research is to contribute to good online reputation management for companies (for example, by emphasizing genuine reviews in EWOM to consumers and eliminating fake ones), it is important to understand the meaning of reputation management.

According to Hutton et al. (2001), reputation

management, which is considered a business

function, is based on the traditional term “public relations”, also known as “corporate affairs”. Beal

& Strauss (2009) state that online reputation

management is placed between marketing

communications, public relations, and search engine

optimization (SEO). Jones et al. (2010) agree with this view, stating that “online reputation

management is the process of positioning,

monitoring, measuring, talking and listening as the

organization engages in a transparent and ethical

dialogue with its various online stakeholders”. The literature consistently indicates that, to build

and maintain corporate reputation, it is important for

a company to understand who its stakeholders are and

how they perceive the company (Beal & Strauss,

2009). This can be linked to the umbrella theory of

Chun (2005) and is aligned with the perception that

reputation is formed by a collective perception of

different individuals. The more the perceptions of

several individuals are aligned with each other, the

stronger the corporate reputation of a company

(Gensler et al., 2015).

When looking at how corporate reputation can

best be maintained, research from Page and Fearn

(2005) indicates that organisations should focus on

aligning the perceptions of different stakeholders. To

do so, organisations should focus on clear

communication about leadership and successes of the

organisation and the organisation’s perspective on

consumer fairness in advertisement, marketing,

websites, reviews, and other forms of

communication. In more detail, the reputation of an organisation is reflected by the leadership style and successes of its CEO. A clear example is Tesla, an automotive company that is mainly known for its famous CEO, Elon Musk. Reputation is also reflected by consumer fairness, including the fair treatment of consumers regarding pricing, quality of products and services, and transparency in advertisement, which also includes reviews.

To conclude on reputation management, literature

indicates that it is important for organisations to

measure, monitor and co-ordinate the different


stakeholder reputations with the overall goal to align

these as much as possible. Page and Fearn (2005) and Gensler et al. (2015) strongly emphasize that the more similar the different stakeholder reputations are, the stronger the corporate reputation of an organisation

is. To create alignment, organisations should focus on

creating clear and transparent messages with regards

to leadership style, successes of an organisation,

advertisement and marketing communication. It is

important for an organisation to be authentic and

transparent towards all its stakeholders.

2.2 The Role of Fake Reviews in

Corporate Reputation

In today’s tech-savvy world, review websites, social media and mobile applications have become the most important channels for consumers to express themselves. It is very easy for people to share their views about products and services using e-commerce websites such as TripAdvisor and Trustpilot, forums and blogs (Hussain, Mirza, Rasool, Hussain & Kaleem, 2019). In app stores in particular, users

can rate downloaded apps on a scale from 1 to 5 stars

and write a review message in which they can express

satisfaction, report bugs, or make suggestions

(Martens & Maalej, 2019). A recent study on online

consumer buying behaviour confirms the statement

that most people read these reviews about products

and services before buying them (Xhema, 2019). In

the case of apps, consumers often read through the

reviews before deciding to download the app.

Harman, Jia, and Zhang (2012) identified in their

research a positive relationship between the number of positive ratings and reviews and the sales and download ranks of apps. As they state, “stable

numerous ratings lead to higher downloads and sales

numbers”, which will have a positive effect on

corporate reputation (Barnett et al., 2006).

As a result of the positive connection between

reviews and sales, a new illegal market that is focused

on producing fake reviews has emerged. The

phenomenon of producing fake reviews on products

and services with the goal to boost sales is also

referred to in academic studies as “spam attack”

(Hussain et al., 2019). In regular situations, real users

are motivated by their satisfaction level to provide

feedback on apps; however, fake reviewers get paid

or similarly rewarded to submit reviews (Martens &

Maalej, 2019). An important distinction between real

users and fake reviewers is that fake reviewers might

not even be real app users and thus their reviews

might not be truly reflecting honest opinions.

According to Martens and Maalej (2019), fake

reviews can be defined as non-spontaneous, requested

and rewarded. Another definition states that a fake

review is a positive, neutral, or negative review that

is not an actual consumer’s honest and impartial

opinion or that does not reflect a consumer’s genuine

experience of a product, service, or business

(Fontanarava et al., 2017).

Many studies agree that fake reviews have a

negative effect on the online corporate reputation of a

company (Horn et al., 2015; Barbado et al., 2019;

Xhema, 2019; Hussain et al., 2020). One of the main

issues with opinion sharing websites and apps is that

fake reviews can easily create hype about a particular

product based on misleading information. These fake

reviews can become the key factor for consumers in

their buying decision and thus lead to negative

financial consequences. Although it seems clear to people that not everything on the Internet is believable, research shows that almost 84% of consumers consider online reviews to be as trustworthy as personal recommendations. However, organisations that make use of fake reviews or fake reviewers can harm their corporate reputation in two ways. Firstly, fake reviews create false expectations, whereas true reviews help organisations learn where to improve and can thereby increase business success. Secondly, if an organisation gets caught buying fake reviews for its own products or to decrease the value of those of its competitors, it will lose much more reputation than it could possibly gain. An example from 2013 is Samsung,

which was fined for paying people to negatively

review HTC products. Another example is a report

from BBC that showed that fake online reviews get

openly bought and sold and that shoppers often can

get products for free in return for fake reviews.

We now first underline why it can be extremely

important for a company to focus on strong corporate

reputation management and then elaborate on the role

of fake reviews in consumer buying behavior that can

be related to corporate reputation.

2.2.1 Benefits of Strong Corporate

Reputation Management

The above-mentioned examples indicate what can

happen if organisations do not put effort in strong

reputation management and alignment of stakeholder

reputations as discussed by Page and Fearn (2005).

Positive reputation can strengthen the overall

performance of an organisation, while negative

reputation is considered a competitive disadvantage

(Aula, 2010).


According to Helm and Klode (2007), there are

five major benefits that strong corporate reputation

can bring to an organisation: (1) increased financial performance; (2) greater competitiveness; (3) higher satisfaction and loyalty among consumers; (4) attraction and retention of employees; (5) support in crisis. Some explanatory notes follow. Firstly, increased financial performance can result in an increased stock value. According to Helm and Klode (2007), a strong reputation limits risks for investors, who are then more willing to invest in the organisation. Secondly, greater competitiveness goes hand-in-hand with increased financial performance. Helm and Klode (2007) identify that organisations with a strong corporate reputation can more easily charge higher prices because consumers perceive the quality of their products and services as better. Thirdly, several studies indicate that a good corporate reputation can increase consumer satisfaction and loyalty (Helm & Klode, 2007; Chun, 2005; van Riel & Fountain, 2008). Fourthly, a positive company image attracts more highly skilled employees (Helm & Klode, 2007). Lastly, according

to Helm and Klode (2007), in times of crisis for an

organisation, a positive reputation can help

companies to overcome economic consequences.

Organisations with a strong image experience less

market decline compared to organisations with a

weak reputation (van Riel & Fombrun, 2008).

To conclude, strong corporate reputation can

bring several major benefits to an organisation. These

benefits are linked to financial, strategic, and

competitive advantages that all have a positive effect

on the performance of an organisation. Therefore, it

is highly advisable and important for an organisation

to focus on strengthening its corporate reputation and

on limiting threats such as fake reviews.

2.2.2 Fake Reviews in Consumer Buying

Behaviour

A study by Constantinides and Fountain (2008) describes the relationships that arise when consumers are exposed to information about organisations. Four stimulating factors are identified, A, B, C and D (see Figure 2), each of which affects the purchasing decision. Although purchasing behaviour should be treated separately from corporate reputation, it is important to describe the theory of Constantinides and Fountain (2008) in order to emphasize the role that fake reviews play in purchasing behaviour. Organisations that use fake reviews attempt to turn stimulus D into a controllable stimulating factor. Since Constantinides and Fountain (2008) postulate that all stimulating factors are equally distributed, this explains why organisations with bad reputations, as part of their sales strategies, focus on making the uncontrollable controllable (Grutzmacher, 2011).

Figure 2: Four stimuli on consumer behaviour

(Constantinides and Fountain, 2008).

2.3 Conceptual Model

The goal of the conceptual model that we propose is to visualize the concepts in this study and to outline the relationship between fake reviews and corporate reputation. The model, inspired by Fombrun (1997), shows how several variables identified in the literature above are related to each other and eventually create corporate reputation; see Figure 3.

According to Dowling (2016), firstly, corporate

identity is, in short, how people recognize an

organisation. Secondly, corporate image is defined as

“a set of beliefs and feelings an audience has about an

organization”. Together, these lead to corporate reputation, which is formed by the judgement about the organisation’s attributes, as indicated in the conceptual model.

Figure 3: Fake reviews and corporate reputation: a

conceptual framework that we propose in this paper that has

been derived from scientific literature (see text in 2.3).


We posit that (fake) reviews play an important role in the perception of the customer and community image.

2.4 Spam Review Detection

As fake reviews are becoming much more of a

problem with more review websites popping up and

with consumers’ ability to produce feedback at any

time, demand for spam detection methods is rising.

As we have discussed, such methods are needed for

strong corporate reputation management. However, although much more research has recently appeared on the topic of spam detection, practical implementation remains a challenge. Major review websites such as Yelp and Amazon have already taken first steps in detecting fake reviews on their websites; however, there seems to be a lot of room for improvement. Hussain et al. (2019) surveyed several spam detection techniques. According to their paper, spam detection consists of the following steps: (1) gather a review dataset; (2) engineer and select features; (3) apply a detection technique, for example machine learning. Below, each of these three steps is discussed in order to generate useful findings for implementing a spam detection model in the experiment that we set up.

2.4.1 Gathering a Review Dataset

To be able to set up a machine learning model for

review spam detection, it is important to have a

dataset to work with. However, in terms of spam

detection it is considered difficult to find an available,

labelled dataset (Hussain et al., 2019). A prior

inventory on spam detection models indicates that

there is only one labelled hotel review dataset

available that includes review text and has no other

features available (Kaggle, 2020). Many of the

studies that analyze spam detection methods do not

publish used datasets publicly, which makes it

difficult for new researchers to continue to optimize

and improve on spam detection models. Our review of multiple studies on spam detection indicates that only a limited number of labelled datasets are available, which is at odds with the current high urgency for spam detection methods in society.

2.4.2 Feature Engineering

According to Hussain et al. (2019), the linguistic

approach is the most common approach for feature

extraction from review datasets. As they explain in

their research, this approach focuses on review text

and includes data pre-processing, tokenization,

transformation, and feature selection. Section 3 gives an experimental setup showing how these practical steps can be executed for spam detection; here we proceed with the fourth step, feature selection, which is the most crucial step because it has the most significant effect on the performance of spam detection models. Previous research on feature selection, according to Hussain et al. (2019), shows that the following spammer features are used to distinguish spam from non-spam reviews: (1) Maximum number of reviews: spammers often write more than one review per day. (2) Percentage of positive reviews: most spammers write positive and favourable reviews; therefore, a high percentage of positive reviews could indicate spam. (3) Review length: most spammers do not write very lengthy reviews with a lot of detail; therefore, short reviews can indicate spam. (4) Reviewer deviation: spammers often give very high ratings, so their ratings deviate from the average review rating. (5) Maximum content similarity: similar reviews are used for multiple products and services across different organisations.
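As an illustration, the five spammer features listed above could be computed per reviewer along the following lines. This is a minimal sketch under stated assumptions and not the implementation used in this paper: the `Review` record, the function names, the threshold of 4 stars for a "positive" review, and the use of Jaccard word overlap as the similarity measure are all hypothetical choices made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Review:
    """Hypothetical review record; field names are assumptions for this sketch."""
    reviewer: str
    product: str
    day: str      # e.g. an ISO date string such as "2020-01-01"
    rating: int   # star rating, 1 to 5
    text: str


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard overlap between two texts (assumed similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def spammer_features(reviews, positive_threshold=4):
    """Return one dict of the five reviewer-level features per reviewer."""
    # Average rating per product, needed for the rating deviation feature.
    ratings_by_product = defaultdict(list)
    for r in reviews:
        ratings_by_product[r.product].append(r.rating)
    product_mean = {p: sum(v) / len(v) for p, v in ratings_by_product.items()}

    reviews_by_reviewer = defaultdict(list)
    for r in reviews:
        reviews_by_reviewer[r.reviewer].append(r)

    features = {}
    for reviewer, rs in reviews_by_reviewer.items():
        per_day = defaultdict(int)
        for r in rs:
            per_day[r.day] += 1
        similarities = [jaccard(a.text, b.text) for a, b in combinations(rs, 2)]
        features[reviewer] = {
            # (1) maximum number of reviews written on a single day
            "max_reviews_per_day": max(per_day.values()),
            # (2) share of reviews with a rating at or above the threshold
            "pct_positive": sum(r.rating >= positive_threshold for r in rs) / len(rs),
            # (3) average review length in words
            "avg_review_length": sum(len(r.text.split()) for r in rs) / len(rs),
            # (4) mean absolute deviation from each product's average rating
            "rating_deviation": sum(abs(r.rating - product_mean[r.product]) for r in rs) / len(rs),
            # (5) highest pairwise similarity among the reviewer's own reviews
            "max_content_similarity": max(similarities, default=0.0),
        }
    return features
```

The resulting dictionaries can serve as input rows for a classifier; which features actually help depends on the dataset, and datasets that contain only review text cannot supply the reviewer-level fields assumed here.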

According to the sources analysed by Hussain et al. (2019), the linguistic approach achieves the highest accuracy among spam detection methods. However, this depends on the feature selection process, as the selected features become the input for the actual spam review detection method.

Figure 4: Taxonomy of spam review detection techniques

(Hussain et al., 2019, p. 13).

2.4.3 Machine Learning Techniques

To classify reviews into the two classes of spam and non-spam, an
appropriate classification model must be chosen. Hussain et al.

(2019) published a taxonomy of spam detection

techniques (Figure 4). It was created to enable other

researchers “to classify existing approaches and to

figure out the most appropriate technique to solve a

spam detection problem”. Spam detection models fall into two categories
(see again Figure 4): machine-learning-based methods and lexicon-based
approaches. The former can be divided into supervised and unsupervised
learning. Research shows that, among supervised methods, Support Vector
Machine and Naïve Bayes achieve the best accuracy; among unsupervised
methods, Aspect Based and K-Nearest Neighbour perform best. This paper
focuses on machine learning techniques; the lexicon-based approach is
therefore not discussed further. For an overview of the accuracy rate per
approach, please refer to Hussain et al. (2019).

3 RESEARCH METHODOLOGY

We adopted an exploratory research methodology, as we intended to generate
general insights into the fake review problem that businesses currently
face. Throughout, we kept in mind our main motivation: the relevance of
online corporate reputation for e-commerce.

Figure 5 graphically represents our research

methodology that consisted of data collection,

preparation, and analysis processes. Below, we give

some more detail about the datasets that we employed

as well as how we concretely implemented our data

processing and machine learning.

Figure 5: Research design process.

3.1 Datasets

Our main dataset was obtained from the open-source data platform Kaggle
(2020). This is a well-known hotel reviews dataset from Amazon Mechanical
Turk in combination with TripAdvisor. It was created to enable researchers
to develop and test new spam detection models and other solutions to the
fake review problem. During the production of this labelled dataset, a
group of people was paid to write 400 fake positive and 400 fake negative
reviews about hotel experiences. These deceptive reviews were combined with
800 genuine, truthful reviews (again, 400 positive and 400 negative),
giving a total of 1600 reviews. Positive and negative refer to the
sentiment of the review text. The dataset was obtained in April 2020 via
the Kaggle website.

As good common practice, the dataset was analysed descriptively for
differences between the groups deceptive versus truthful, negative versus
positive, and TripAdvisor versus non-TripAdvisor. We found no statistically
significant difference (p-value = 0.2) in the length of words between
deceptive and truthful reviews, but statistically significant differences
(p-value = 0 after rounding) between negative and positive reviews as well
as between TripAdvisor and non-TripAdvisor reviews.
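
The text does not state which statistical test was used for these group comparisons. As one possible illustration, the sketch below runs a two-sided permutation test on the difference in mean word count between two groups. The choice of test, the function name, and the word counts are all assumptions and are not taken from the dataset.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference of group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Reassign group membership at random and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # The add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical word counts per review for two groups.
positive_lengths = [95, 110, 102, 88, 120, 99, 105, 93]
negative_lengths = [160, 175, 150, 182, 168, 171, 158, 165]

p_value = permutation_p_value(positive_lengths, negative_lengths)
print(f"p-value: {p_value:.4f}")
```

A small p-value suggests that the difference in length between the two groups is unlikely to be due to chance alone; with real data, the two lists would hold the word counts of all reviews in each group.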

In addition, to test and apply our algorithms on other datasets that are
relevant for many businesses, a scraper was built to crawl about 200,000
product reviews of eight food and beverage suppliers from the review site
Google Play. The scraper was implemented in Python using the package Google
Play Scraper. However, it proved too challenging and too costly to turn
this dataset into a labelled dataset with the spam identification labels
needed to train our supervised machine learning algorithms.

3.2 Data Processing

Once the dataset and the structure of the data were understood, the next
step was to process the data.

Stop words that carry little meaning were removed from the reviews using a
natural language toolkit library.
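
Stop word removal can be sketched in plain Python as follows. The short stop word list and the regular-expression tokenizer are illustrative assumptions; a natural language toolkit library ships a much longer list and a more careful tokenizer.

```python
import re

# Tiny illustrative stop word list (assumption); real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "was", "were", "is", "in", "of", "to", "it", "we", "at"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(text):
    """Return the tokens of a review without stop words."""
    return [token for token in tokenize(text) if token not in STOP_WORDS]


review = "The room was clean and the staff were friendly."
tokens = remove_stop_words(review)
word_count = len(tokenize(review))
print(tokens, word_count)
```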

It is important to explain how the relevant review text features were
computed. First, we computed the length of words variable already mentioned
in the previous section. Second, we included the sentiment polarity
(positive or negative). Third, we extracted Parts of Speech components for
each word in a review and fed them to the machine learning model as a
feature vector. The motivation is that, according to Ott et al. (2011),
there is a large difference in review classification between informative
and imaginative writing: the former typically contains more nouns,
adjectives, prepositions, determiners, and coordinating conjunctions, while
the latter contains more verbs, adverbs, pronouns, and pre-determiners.
Fourth, we experimented with weighting meaningful words to form topics.

DATA 2022 - 11th International Conference on Data Science, Technology and Applications

All review text data were vectorized using

TfidfVectorizer.
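
To make the TF-IDF weighting concrete, the following plain-Python sketch computes TF-IDF vectors for already tokenised reviews. It uses a smoothed inverse document frequency and L2 normalisation, which to our understanding correspond to the documented defaults of scikit-learn's TfidfVectorizer; it is an illustration of the weighting scheme, not a replacement for the library, and the example documents are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        weights = {term: count * idf[term] for term, count in tf.items()}
        # L2-normalise so that review length does not dominate the vector.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


# Hypothetical tokenised reviews.
docs = [
    ["great", "hotel", "great", "staff"],
    ["terrible", "hotel", "rude", "staff"],
    ["great", "location"],
]
vectors = tfidf(docs)
print(vectors[1])
```

Terms that occur in few reviews (here "terrible" and "rude") receive a higher weight than terms that occur in many reviews (here "hotel" and "staff").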

3.3 Machine Learning

Once the relevant features had been extracted, the data were split into a
training set and a test set using a common split: 80% training and 20%
testing. Only the training data was used to fit the machine learning
models.
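
The 80/20 split can be sketched with the standard library as follows. The helper name and the seed are assumptions; the sketch only shows how 1600 reviews yield 1280 training and 320 test reviews.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle the sample indices and split them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(1600)
print(len(train_idx), len(test_idx))
```

Changing the seed changes which reviews end up in the test set, which is why results can vary slightly between runs.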

We implemented the following machine learning models: Logistic Regression,
Decision Tree, Support Vector Machine (SVM), and Random Forest.

We used GridSearchCV to fine-tune the ML algorithms and find their best
hyperparameters.

We also systematically tested our machine learning models with different
random states (random seeds).
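
The steps of this section can be sketched with scikit-learn as follows. This is a minimal illustration and not the exact configuration used in the experiment: the toy corpus, the hyperparameter grid, the two-fold cross-validation, and the stratified split are all assumptions chosen so that the example runs on a handful of reviews.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus: 1 = deceptive, 0 = truthful.
deceptive = [
    "my husband and I had the most amazing luxury experience ever",
    "absolutely the best hotel in the world, I will definitely return",
    "my family loved this amazing luxury hotel, truly the best",
    "I was amazed by this wonderful hotel, the best vacation ever",
    "the most luxurious experience of my life, I loved everything",
    "my wife and I had an amazing and wonderful vacation here",
    "truly the best hotel experience, I will recommend it to my family",
    "I loved this amazing hotel, my husband loved it as well",
    "an absolutely wonderful luxury stay, the best I ever had",
    "my vacation was amazing, I will definitely recommend this hotel",
]
truthful = [
    "room on the 12th floor was small but the bed was comfortable",
    "check-in took 20 minutes and the elevator was slow",
    "bathroom was clean, the shower pressure was weak",
    "location is two blocks from the station, breakfast costs extra",
    "the room rate was fair and the front desk was helpful",
    "street noise at night, but the windows kept most of it out",
    "parking was 40 dollars per night, the room was fine",
    "the lobby was renovated and the rooms on floor 5 were dated",
    "wifi was slow in the room but fine in the lobby",
    "bed was firm, the minibar was empty, checkout was quick",
]
texts = deceptive + truthful
labels = [1] * len(deceptive) + [0] * len(truthful)


def evaluate(seed):
    """Split, tune an SVM on the training part, and score it on the test part."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=seed
    )
    pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("svm", SVC())])
    # Assumed grid; the values actually searched are not reported.
    grid = {"svm__C": [0.1, 1, 10], "svm__kernel": ["linear", "rbf"]}
    search = GridSearchCV(pipeline, grid, cv=2, scoring="accuracy")
    search.fit(X_train, y_train)  # only the training data is used for fitting
    return search.best_params_, search.score(X_test, y_test)


# Repeat the experiment for several random states.
results = [evaluate(seed) for seed in range(3)]
for best_params, accuracy in results:
    print(best_params, accuracy)
```

Placing the vectorizer inside the pipeline ensures that the TF-IDF vocabulary is learned from the training folds only, so no information from held-out reviews leaks into the model.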

4 RESULTS

The machine learning model that performed best was the Support Vector
Machine; see Table 2 for several of its performance scores on the test set
of 320 reviews. The accuracy was 89%. When we systematically tested this
model with different random states, a slightly higher accuracy could be
obtained.

Table 2: Several performance scores of our SVM machine

learning model.

               precision    recall    f1-score    support

deceptive           0.88      0.90        0.89        155
truth               0.90      0.88        0.89        165

avg / total         0.89      0.89        0.89        320
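
The scores in Table 2 follow from the counts of correctly and incorrectly classified reviews. The paper does not report these counts; the sketch below uses hypothetical counts chosen only because they reproduce the rounded scores of Table 2, to show how precision, recall, F1-score, and accuracy are derived.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true/false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts (not reported in the paper), consistent with Table 2:
# 155 deceptive reviews: 139 detected, 16 missed.
# 165 truthful reviews: 146 recognised, 19 wrongly flagged as deceptive.
deceptive_scores = precision_recall_f1(tp=139, fp=19, fn=16)
truth_scores = precision_recall_f1(tp=146, fp=16, fn=19)
accuracy = (139 + 146) / 320

print([round(s, 2) for s in deceptive_scores])
print([round(s, 2) for s in truth_scores])
print(round(accuracy, 2))
```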

To check the generalization capabilities of the machine learning model, we
also tested it on several reviews from Yelp, which are likely to belong to
the same application domain; the outcomes were promising (see Figure 6).

Figure 6: Generalization capabilities of our SVM machine

learning model on some Yelp reviews.

Table 3: Comparison of several good-scoring spam

detection models, i.e., different supervised learning

techniques on different datasets (adapted from Hussain et

al. (2019)).

5 DISCUSSION

When we compare the results of our study with related work (see Table 3),
our spam detection model yields a high performance. Our best-performing
machine learning model would rank fourth in this list in terms of accuracy,
with an accuracy similar to that obtained in the two studies that use the
same dataset, which rank third and fifth, respectively. It outperforms the
lower-ranked studies in terms of accuracy and, since accuracy is one of the
most important performance indicators, could be applied in practice in any
defence strategy that an organisation might define to tackle fake reviews.

Our study contributes to the research community

by providing another successful example of how fake

reviews can be detected.

Although fake reviews are a rising concern and a

hot topic in the machine learning domain,

unfortunately, not many datasets in which fake

reviews have been identified are accessible.

Obviously, in supervised learning, a large, diverse

dataset is needed for proper training of classifiers in

different application domains.

The scarcity of labelled datasets forms a real

challenge for further research in the field. It can be

recommended to synthesize and produce a new

labelled dataset to bring more variety into the domain

of spam detection. Currently, many models are being

built upon the same sort of data, and, therefore, it will

be valuable for future research to have different or

more ample datasets to analyse.

6 CONCLUSIONS

Many organisations struggle to defend their online corporate reputations
against fake reviews. It can be argued that positive and neutral fake
reviews, like negative fake reviews, have negative consequences for
corporate reputation. To provide organisations with an asset for corporate
reputation management, a state-of-the-art machine learning model has been
built that separates fake reviews from regular ones. The model yields a
high accuracy compared to others and could therefore be implemented by
organisations as part of their corporate reputation management strategy. In
the future, the model should be further optimized and extended to
incorporate new datasets that are relevant for organisations by fine-tuning
the processing steps outlined in this paper.

ACKNOWLEDGEMENTS

This paper was inspired by the MSc project of Zoë Kisoen, who was involved
via the master Digital Driven Business at HvA. Thanks go to Frederik
Situmeang as well as several anonymous reviewers for providing useful
suggestions on an initial version of this manuscript. Rob Loke is assistant
professor of data science at CMIHvA.

REFERENCES

Abdulhamid, S.M., Abd Latiff, M.S., Chiroma, H., Osho,

O., Abdul-Salaam, G., Abubakar, A.I., Herawan, T.,

2017. A Review on Mobile SMS Spam Filtering

Techniques. IEEE Access 5, 15650–15666.

https://doi.org/10.1109/ACCESS.2017.2666785

Abimbola, T., Vallaster, C., 2007. Brand, organisational

identity and reputation in SMEs: An overview.

Qualitative Market Research: An International Journal

10, 341–348. https://doi.org/10.1108/1352275071081

9685

ACM, 2017. Eindrapportage ACM-verkenning naar online

reviews: ‘Reviews gereviewd’. https://www.acm.nl/nl/

publicaties/publicatie/17217/Eindrapportage-ACM-

verkenning-naar-online-reviews-Reviews-gereviewd

Aula, P., 2010. (PDF) Social media, reputation risk and

ambient publicity management [WWW Document].

ResearchGate. http://dx.doi.org/10.1108/1087857101

1088069

Barbado, R., Araque, O., Iglesias, C.A., 2019. A framework

for fake review detection in online consumer

electronics retailers. Information Processing &

Management 56, 1234–1244. https://doi.org/10.1016/

j.ipm.2019.03.002

Barnett, M., Jermier, J., Lafferty, B.A., 2006. Corporate

Reputation: The Definitional Landscape. Corporate

Reputation Review, 9, 26-38.

Beal, A., Strauss, J., 2009. Radically Transparent:

Monitoring and Managing Reputations Online. John

Wiley & Sons.

Bryant, D., 2019. How Chinese Sellers are Manipulating

Amazon in 2019 [WWW Document]. URL

https://www.ecomcrew.com/chinese-sellers-

manipulating-amazon/ (accessed 6.20.20).

Carlisle, K., 2015. Fake Online Reviews are Bad for

Business. 910 West. URL https://910west.com/2015/

02/fake-online-reviews-bad-business/ (accessed

6.20.20).

Chandler, S., 2020. Coronavirus Drives 72% Rise In Use

Of Fintech Apps [WWW Document]. URL

https://www.forbes.com/sites/simonchandler/2020/03/

30/coronavirus-drives-72-rise-in-use-of-fintech-

apps/#68435c9066ed (accessed 6.17.20).

Chun, R., 2005. Corporate reputation: Meaning and

measurement. International Journal of Management

Reviews 7, 91–109.

Constantinides, E., & Fountain, S. J. (2008). Web 2.0:

Conceptual foundations and marketing issues. Journal

of direct, data and digital marketing practice, 9(3), 231-

244.

Crawford, M., Khoshgoftaar, T.M., Prusa, J.D., Richter,

A.N., Al Najada, H., 2015. Survey of review spam

detection using machine learning techniques. Journal of

Big Data 2, 23. https://doi.org/10.1186/s40537-015-

0029-9

Davies, G., Chun, R., Kamins, MA., 2010. Reputation gaps

and the performance of service organizations. Strategic

Management Journal 31, 530–546.

Dewey, J. (1910). How we think. Boston: D.C. Heath and Company.

DigitalTrends (2018) Can you really trust app store ratings?

We asked the experts. https://www.digitaltrends.com/

android/can-you-really-trust-app-store-ratings/

DiMauro, V., & Bulmer, D. (2014). The Social Consumer

Study. The Society for new Communications Research.

Dowling, G. R. (2016). Defining and measuring corporate

reputations. European Management Review, 13(3),

207-223.

Dutch Institute of International Relations, S., n.d. Economic

impact of COVID-19 [WWW Document]. Statistics

Netherlands. URL https://www.cbs.nl/en-gb/dossier/

coronavirus-crisis-cbs-figures/economic-impact-of-

covid-19 (accessed 6.20.20).

Fombrun, C., van Riel, C., 1997. The Reputational

Landscape. Corporate Reputation Review.

Fontanarava, J., Pasi, G., Viviani, M., 2017. Feature

Analysis for Fake Review Detection through

Supervised Classification, in: 2017 IEEE International

Conference on Data Science and Advanced Analytics

(DSAA), pp. 658–666. https://doi.org/10.1109/

DSAA.2017.51

Fournier, S., Avery, J., 2011. The uninvited brand. Business

Horizons, SPECIAL ISSUE: SOCIAL MEDIA 54,

193–207. https://doi.org/10.1016/j.bushor.2011.01.001

Gensler, S., Völckner, F., Egger, M., Fischbach, K.,

Schoder, D., 2015. Listen to Your Customers: Insights

into Brand Image Using Online Consumer-Generated

Product Reviews. International Journal of Electronic

Commerce, 1 20.

Gerdeman, D., 2020. How the Coronavirus Is Already

Rewriting the Future of Business [WWW Document].

HBS Working Knowledge. URL http://hbswk.hbs.edu/

item/how-the-coronavirus-is-already-rewriting-the-

future-of-business (accessed 6.17.20).

Giovanni, S., 2010. Managing Knowledge Assets and

Business Value Creation in Organizations: Measures

and Dynamics: Measures and Dynamics. IGI Global.

Google (2019) In reviews we trust - making Google Play

ratings and reviews more trustworthy. https://android-

developers.googleblog.com/2018/12/in-reviews-we-

trust-making-google-play.html

Gov UK, n.d. Online reviews: letting your customers see

the true picture [WWW Document]. GOV.UK. URL

https://www.gov.uk/government/publications/online-

reviews-and-endorsements-advice-for-businesses/

online-reviews-giving-consumers-the-full-picture

(accessed 6.20.20).

Grützmacher, A., n.d. Reputation 2.0: The role of social

media in corporate reputation - Case Nokia 160.

Harman, M., Jia, Y., Zhang, Y., 2012. App Store Mining

and Analysis: MSR for App Stores.

Helm, S., Klode, C., 2011. Challenges in Measuring

Corporate Reputation. pp. 99–110.

https://doi.org/10.1007/978-3-642-19266-1_11

Horn, I., Taros, T., Dirkes, S., 2015. (PDF) Business

Reputation and Social Media: A Primer on Threats and

Responses [WWW Document]. ResearchGate.

http://dx.doi.org/10.1057/dddmp.2015.1

Hoyer, WD., Macinnis, DJ., 2001. Consumer Behavior. 2nd

ed., Boston: Houghton Mifflin.

Hussain, N., Mirza, H., Rasool, G., Hussain, I., Kaleem, M.,

2019. Spam review detection techniques: a systematic

literature review.

Hutton, J., Goodman, M., Alexander, J., Genest, C., 2001.

Reputation management: the new face of corporate

public relations? Public Relations Review 27, 247–261.

https://doi.org/10.1016/S0363-8111(01)00085-6

Iansiti, M., Richards, G., 2020. Coronavirus Is Widening

the Corporate Digital Divide. Harvard Business

Review.

Jamie, 2019. Fake Reviews Are a Real Problem: 8 Statistics

That Show Why [WWW Document]. BrightLocal.

URL https://www.brightlocal.com/learn/fake-reviews-

are-a-real-problem-8-statistics-that-show-why/

(accessed 6.20.20).

Jankauskaite, D., Urboniene, A., 2016. Organization’s

Reputation Management Through Content Creation

And Sharing In The Social Media 3, 35.

Jones, B., Temperley, J., Lima, A., 2010. Corporate

Reputation in the Era of Web 2.0: The Case of Primark.

Journal of Marketing Management November 2009,

927–939. https://doi.org/10.1362/026725709X479309

Kaggle, 2020. Find Open Datasets and Machine Learning

Projects | Kaggle [WWW Document]. URL

https://www.kaggle.com/datasets?sortBy=relevance&

group=public&search=spam&page=1&pageSize=20&

size=sizeAll&filetype=fileTypeAll&license=licenseAl

l (accessed 6.19.20).

Kaggle Reviews, 2020. Rome wasn’t built in a day: spotting

fake reviews [WWW Document]. URL

https://kaggle.com/nicodds/rome-wasn-t-built-in-a-

day-spotting-fake-reviews (accessed 6.17.20).

Kaplan, A., Haenlein, M., 2010. Users of the World, Unite!

The Challenges and Opportunities of Social Media.

Business Horizons 53, 59–68. https://doi.org/10.1016/

j.bushor.2009.09.003

Li, L.-Y., Qin, B., Liu, T., 2018. Survey on Fake Review

Detection Research. Jisuanji Xuebao/Chinese Journal

of Computers 41, 946–968. https://doi.org/10.11897/

SP.J.1016.2018.00946

Lincoln, J., n.d. As Coronavirus Spreads, Digital Marketing

Becomes More Important Than Ever [WWW

Document]. Business 2 Community. URL

https://www.business2community.com/small-

business/as-coronavirus-spreads-digital-marketing-

becomes-more-important-than-ever-02296683

(accessed 6.17.20).

Martens, D., Maalej, W., 2019. Towards understanding and

detecting fake reviews in app stores. Empir Software

Eng 24, 3316–3355. https://doi.org/10.1007/s10664-

019-09706-9

McGrath, J., Ross, T., 2020. Corporate reputation and the

coronavirus. Ipsos.

McKinsey, 2020. Customer experience: new capabilities,

new audiences, new opportunities. McKinsey &

Company.

Mishra A, Satish SM., 2016. eWOM: Extant Research

Review and Future Research Avenues. Vikalpa

41(3):222-233. doi:10.1177/0256090916650952

Otar, C., 2018. https://www.forbes.com/sites/forbesfinance

council/2018/10/05/how-review-sites-can-affect-your-

business-and-what-you-can-do-about-it/

Page, G., & Fearn, H. (2005). Corporate reputation: what

do consumers really care about? Journal of Advertising

Research, 45(3), 305-313.

Siano, A., Vollero, A., Confetto, M., Siglioccolo, M., 2013.

Corporate communication management: A framework

based on decision-making with reference to

communication resources. Journal of Marketing

Communications 19. https://doi.org/10.1080/135272

66.2011.581301

Statista, 2020. Amazon fake product review categories

2018 l Statistic [WWW Document]. Statista. URL

https://www.statista.com/statistics/997026/amazon-sh

opping-categories-largest-share-fake-product-reviews/

(accessed 6.20.20).

The New York Times, 2020. https://www.nytimes.com/

2020/02/28/business/economy/companies-coronavirus

-economy.html

Walker, K. (2010). A systematic review of the corporate

reputation literature: Definition, measurement, and

theory. Corporate reputation review, 12(4), 357-387.

Wartick, S., 2002. Measuring Corporate Reputation

Definition and Data. Business & Society 41, 371–392.

https://doi.org/10.1177/0007650302238774

Waters, R.D., Burnett, E., Lamm, A., Lucas, J., 2009.

Engaging stakeholders through social networking: How

nonprofit organizations are using Facebook. Public

Relations Review 35, 102–106. https://doi.org/10.1016/

j.pubrev.2009.01.006

Weber Shandwick, KRC Research, 2019. The state of

corporate reputation in 2020: everything matters now.

Xhema, J., 2019. Effect of Social Networks on Consumer

Behaviour: Complex Buying. IFAC-PapersOnLine,

19th IFAC Conference on Technology, Culture and

International Stability TECIS 2019 52, 504–508.

https://doi.org/10.1016/j.ifacol.2019.12.594
