Research on COE Program with Machine Learning Algorithms
Jiashi Li
Department of Statistics and Applied Probability, UC Santa Barbara, CA, U.S.A.
Keywords: COE Coffee, Machine Learning, One-Hot Coding, Hedonic Pricing Theory.
Abstract: The market price of COE coffee depends on its heterogeneous characteristics, and hedonic pricing theory is the
dominant research approach to coffee prices. In this paper, the correlation between coffee prices and the New
York C-futures index is first verified. Machine learning algorithms such as Support Vector Regression (SVR) and
Multilayer Perceptron (MLP) models are then used to study the factors that affect prices. A more general coding
of coffee types is designed to improve on the one-hot coding used in the regression model. The results show
that the improved model performs better: its prediction accuracy improved by 24.65% after generalized coding
of coffee categories. The study further explores the application of hedonic pricing theory.
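The one-hot coding referred to in the abstract represents each coffee category as a vector of 0/1 indicator columns, one column per distinct category. The following is a minimal Python sketch of that standard encoding only; the category labels (processing methods) are hypothetical examples, and the paper's generalized coding is not reproduced here.

```python
from typing import List, Sequence, Tuple

def one_hot(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Map each categorical value to a 0/1 indicator vector.

    Returns the sorted list of distinct categories (the column order) and one
    indicator row per input value, with a single 1 in that value's column.
    """
    categories = sorted(set(values))
    column = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[column[value]] = 1
        rows.append(row)
    return categories, rows

# Hypothetical coffee categories (processing methods), for illustration only.
cats, rows = one_hot(["Washed", "Natural", "Washed", "Honey"])
print(cats)  # ['Honey', 'Natural', 'Washed']
print(rows)  # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

When these indicator columns enter a linear regression that also has an intercept, one column is usually dropped, because the full set of indicators always sums to 1 and would otherwise be perfectly collinear with the intercept.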
1 INTRODUCTION
The growing demand for specialty coffee has led to
a rapidly expanding specialty coffee market in many
countries, and the percentage of adults who consume
specialty coffee has increased in recent years.
Specialty coffee quality is a key factor in stabilizing
market development (Traore, 2018, Wilson, 2018,
Fields, 2018).
Many Latin American countries participate in the
Cup of Excellence (COE) program, under which a coffee
auction is held every year. A jury tastes the coffee
based on samples of brown beans submitted by the
farms. Each cup is given a score from 0 to 100, and
those scoring 84 or more quality points in the competition
are awarded the prestigious Cup of Excellence Award
(Bacon, 2004). The winning coffees are ranked
according to their scores: the coffee with the
highest score in a given category is awarded first
place, and the others follow in descending order of score.
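The award and ranking rule described above amounts to a filter followed by a sort. In this minimal Python sketch, the farm names and scores are hypothetical, and tied scores simply receive consecutive places in input order, which is a simplifying assumption rather than the COE's actual tie-breaking rule.

```python
from typing import List, Tuple

AWARD_THRESHOLD = 84.0  # minimum score for the award, as stated in the text (Bacon, 2004)

def rank_awarded(entries: List[Tuple[str, float]],
                 threshold: float = AWARD_THRESHOLD) -> List[Tuple[int, str, float]]:
    """Keep entries scoring at least `threshold`, ranked by descending score.

    Ties receive consecutive places in input order (a simplifying assumption).
    """
    awarded = [(name, score) for name, score in entries if score >= threshold]
    awarded.sort(key=lambda entry: entry[1], reverse=True)  # stable sort keeps tie order
    return [(place, name, score) for place, (name, score) in enumerate(awarded, start=1)]

# Hypothetical entries: (farm, cupping score)
print(rank_awarded([("Farm A", 88.5), ("Farm B", 83.9), ("Farm C", 91.2)]))
# [(1, 'Farm C', 91.2), (2, 'Farm A', 88.5)]
```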
Scholars have made extensive use of the COE dataset to
predict the price of specialty coffee, studying the
role of product differentiation and quality production
in the world coffee market (Teuber, 2010, Ferreira,
2016, Liska, 2016, Cirillo, 2016). There is a growing
literature on the relationship between coffee quality
and regional environmental characteristics,
especially for the so-called specialty coffees, while
consumer price analysis can also provide useful
information on differences in coffee quality. A
comprehensive analysis is possible if the datasets
cover both objective and subjective quality attributes.
Donnet et al. analyzed the importance of sensory and
reputational attributes in the origin markets (Donnet,
2007, Weatherspoon, 2007, Hoehn, 2007). They
found that a country-of-origin effect is evident beyond
sensory quality and scores.
The hedonic price model has been used to study
the relationship between prices and attributes in
agriculture, food, and real estate. The model was
inspired by Waugh's article "Quality Factors
Influencing Vegetable Prices" and by the work of
Rosen. This approach is used to measure and analyze
the contribution of each of a product's attributes to its price (Hu, 2019,
He, 2019, Han, 2019, Gu, 2011, Zhu, 2011, Jiang,
2011). The price of a product is usually modeled as
a function of its attributes. Thus, a regression model is used to
predict the quality score and the price from
various attributes.
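As an illustration of the hedonic regression described above, the sketch below fits a semi-log specification, ln P = b0 + Σ b_k z_k, by ordinary least squares with NumPy. The semi-log functional form, the attributes (cupping score and lot size), and the synthetic coefficients are assumptions made for illustration; they are not this paper's specification or data.

```python
import numpy as np

def fit_hedonic(attributes: np.ndarray, prices: np.ndarray) -> np.ndarray:
    """Fit ln(price) = b0 + sum_k b_k * z_k by ordinary least squares.

    attributes: (n, K) matrix of attribute values z_k; prices: (n,) positive prices.
    Returns the coefficient vector [b0, b1, ..., bK].
    """
    design = np.column_stack([np.ones(len(prices)), attributes])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, np.log(prices), rcond=None)
    return coef

# Synthetic, noise-free example (hypothetical attributes and coefficients).
rng = np.random.default_rng(0)
score = rng.uniform(84, 92, size=50)      # hypothetical cupping scores
lot_size = rng.uniform(10, 60, size=50)   # hypothetical lot sizes
true_coef = np.array([-5.0, 0.08, -0.002])
prices = np.exp(true_coef[0] + true_coef[1] * score + true_coef[2] * lot_size)

coef = fit_hedonic(np.column_stack([score, lot_size]), prices)
print(np.round(coef, 4))
```

Because the synthetic prices contain no noise, the fit recovers the generating coefficients. In the semi-log form, 100·(e^{b_k} − 1) is the percentage change in price for a one-unit increase in z_k; for example, b = 0.08 corresponds to roughly an 8.3% higher price per additional cupping point.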
However, achieving healthy, sustainable prices is a
challenge. The framework presented in this paper
integrates machine learning and consumption
models. It aggregates multiple regressions fitted on different
subsamples of the dataset, which improves prediction
accuracy and helps control overfitting. Tree-based regression algorithms
are regarded as non-linear, non-parametric methods
that generalize well.
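The aggregation step described above corresponds to bootstrap aggregation (bagging). The minimal NumPy sketch below uses depth-1 regression trees (stumps) as base learners to keep it short. The choice of base learner and the `n_estimators` and `seed` parameters are illustrative assumptions, not the configuration used in this paper.

```python
import numpy as np

def fit_stump(X: np.ndarray, y: np.ndarray):
    """Depth-1 regression tree: choose the (feature, threshold) split x_j <= t
    that minimizes the summed squared error of the two leaf means."""
    best = (np.inf, 0, 0.0, y.mean(), y.mean())  # fallback: constant prediction
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # these thresholds leave both sides non-empty
            left = X[:, j] <= t
            yl, yr = y[left], y[~left]
            sse = ((yl - yl.mean()) ** 2).sum() + ((yr - yr.mean()) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, t, yl.mean(), yr.mean())
    _, j, t, left_mean, right_mean = best
    return j, t, left_mean, right_mean

def predict_stump(stump, X: np.ndarray) -> np.ndarray:
    j, t, left_mean, right_mean = stump
    return np.where(X[:, j] <= t, left_mean, right_mean)

def bagged_stumps(X: np.ndarray, y: np.ndarray, n_estimators: int = 25, seed: int = 0):
    """Fit one stump per bootstrap subsample (rows drawn with replacement)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stumps = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)
        stumps.append(fit_stump(X[idx], y[idx]))
    return stumps

def predict_bagged(stumps, X: np.ndarray) -> np.ndarray:
    """Aggregate by averaging the individual stump predictions."""
    return np.mean([predict_stump(s, X) for s in stumps], axis=0)

# Toy step-function data (hypothetical): y jumps from 0 to 1 after x = 5.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = (X[:, 0] > 5).astype(float)
model = bagged_stumps(X, y)
print(predict_bagged(model, np.array([[0.0], [9.0]])))
```

Practical tree ensembles such as random forests grow deeper trees and also subsample features at each split. Stumps are used here only to keep the sketch brief.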
Through an analysis of historical data, this paper
validates the original purpose of establishing the
COE program: to keep the futures price from affecting