A Hybrid Approach for Product Classification based on Image and Text Matching

Sebastian Bast, Christoph Brosch and Rolf Krieger

Institute for Software Systems, Trier University of Applied Sciences,

Environmental Campus Birkenfeld, Birkenfeld, Germany

Keywords:

Product Classification, Machine Learning, Convolutional Neural Networks, Image and Text Matching.

Abstract:

Classifying products requires considerable effort from retail companies because, in many cases, products must be classified manually. To optimize the product data creation process, methods for automating product classification are necessary. Digital product images are an important component of product data records. Due to recent developments in pattern recognition, these images can be used for product classification. Artificial neural networks can already classify digital images with lower error rates than humans on certain tasks. However, the enormous variety of products and frequent changes in the product assortment pose major challenges for current methods of classifying product images automatically.

In this paper, we present a system that automatically classifies products according to the Global Product Classification Standard (GPC), based on their images and on the textual descriptions extracted from these images. The system uses machine learning methods to find similarities in image and text datasets. Our experiments show that the manual effort required to classify product data can be significantly reduced by machine learning techniques.

1 INTRODUCTION

The amount of product information that retailers must manage has increased significantly in recent years. In some companies, more than one thousand new product data records are created and classified every day. This process often involves considerable manual effort. A single record can contain several hundred attributes, many of which must be edited by hand. This time-consuming task is error-prone and costly.

In retail, products are classified using classification systems with predefined product categories and hierarchies that model product relationships, such as the Global Product Classification (GPC) (GS1, 2018). In this hierarchical classification system, each product is assigned to a precisely defined segment, family, class and brick code. In the Food/Beverage/Tobacco segment alone, 25 GPC families, 137 GPC classes and 884 GPC bricks must be distinguished. In addition, many retailers use several classification systems that group products according to different criteria to support purchasing and sales processes, which makes product classification a difficult task.

To reduce manual activities, we developed a system that classifies a product based on its product image by automatically finding similar images in a labeled dataset. This paper describes the automated classification process for product images, which extracts information from an image and matches it against already classified images using machine learning. We also explain how we used textual descriptions on product images to further improve classification performance. The primary use case we have in mind is the automated classification of products, which supports the user by eliminating manual tasks.

This paper is organized as follows. In section 2, we briefly review related work on product image classification and feature extraction and name key problems of automated image classification systems. In section 3, we describe our approach to solving them and discuss the architecture of our system, including the models and procedures we used. The dataset is described in section 4. In section 5, we present the results of our experiments and explain how we tuned parameters and applied feature engineering. In section 6, we summarize our work, discuss the advantages and limitations of our system and mention ideas for future research.

Bast, S., Brosch, C. and Krieger, R.

A Hybrid Approach for Product Classification based on Image and Text Matching.

DOI: 10.5220/0011260200003269

In Proceedings of the 11th International Conference on Data Science, Technology and Applications (DATA 2022), pages 293-300

ISBN: 978-989-758-583-8; ISSN: 2184-285X


2 RELATED WORK

Reliable product classification tools will become increasingly important because they raise the level of automation in retail. There are various approaches to classifying product data: products can be classified based on unstructured data, structured data or image data. Developments in machine learning have advanced automated image classification, but several problems remain to be solved.

Current issues are large-scale classification, data limitations, intraclass variation and lack of flexibility (Wei et al., 2020). Large-scale classification is an issue because accuracy decreases as the number of classes grows. Deep learning approaches require a large amount of labeled data, and labeling is very time-consuming; in many cases, the supply of labeled data is limited. In the case of product images, products from similar subcategories often show only minor visual differences, which makes them hard to distinguish. In addition, the visual appearance of products can change over time, which increases the demand for a flexible solution. Convolutional neural networks are not flexible and have to be retrained from time to time in order to recognize the frequent changes in the product assortment.

A central task of systems for automated image classification is the extraction of visual features, which are used to find patterns in images and to predict classes. Conventional methods are algorithms such as SIFT, SURF, BRIEF and ORB (Karami et al., 2017), in which pattern recognition is based on features specified by humans. However, manually designing a reliable and robust pattern recognition system is difficult because an enormous number of different patterns must be considered. Deep learning is a machine learning method for feature extraction based on artificial neural networks, which has led to great advances in the field of image classification (LeCun et al., 2015). Today, neural networks are preferred over conventional pattern recognition algorithms in many cases because they solve certain image classification problems with lower error rates than humans (Russakovsky et al., 2015). By using transfer learning (Tan et al., 2018), researchers can build on knowledge that has already been learned, which significantly reduces the amount of training data needed.

In our work, we used image matching to automatically identify similarities in product images. This method detects similar patterns by matching visual image features; similar images are determined by comparing their feature vectors (Szeliski, 2021). It can be used in recommender systems to suggest images with a similar visual appearance, in reverse image search engines to return similar images and in systems for product image classification (Bast, 2021).

In addition to image-based approaches, text-based classification is also promising (Chavaltada et al., 2017; Allweyer et al., 2020). The text-based model described in (Allweyer et al., 2020) classifies a given product dataset with weighted f1-scores of 0.860 on GPC brick, 0.912 on GPC class and 0.938 on GPC family level. Lowercasing and removing special characters improve the result, while omitting numerals has a detrimental impact.

3 APPROACH AND MODELLING

In this section, we describe our approach to product classification based on product images and textual information extracted from these images. We predict the GPC family, GPC class and GPC brick codes of unclassified products by extracting the image- and text-based features of their images and comparing them to the features of already classified images.

3.1 System Overview

Our system for classifying products with image matching consists of two phases: Preprocessing and Classification (figure 1).

In the Preprocessing phase, visual features of all classified product images are extracted as described in section 3.2. A feature vector is generated for every classified image and stored together with the corresponding GPC labels on brick, class and family level for later use. This data provides the basis for all product classification tasks. Preprocessing only needs to be done once.

In the Classification phase, a new product is classified by comparing the feature vector of its product image to the vectors generated during Preprocessing. The visual features of this image are matched with the visual features of already classified product images by examining their similarity, as described in section 3.4.

3.2 Feature Extraction

A key element of image classification is feature extraction. Since images are distinguished based on their features, a robust method is needed to extract them reliably. We applied a convolutional neural network (CNN) with pretrained weights for feature extraction. The weights of the network were trained on the ImageNet dataset (Deng et al., 2009).

DATA 2022 - 11th International Conference on Data Science, Technology and Applications


Figure 1: System for Product Image Classification based on Image Matching. [Diagram omitted. Preprocessing: Classified Images -> CNN Feature Extraction -> Feature Vectors -> Feature Reduction -> Reduced Feature Vectors. Classification: Unclassified Image -> CNN Feature Extraction -> Feature Vector -> Feature Reduction -> Reduced Feature Vector -> kNN -> Class.]

This allowed us to use the knowledge already stored in the weights of the network to solve our problem. In section 5.4, we compare three network architectures by using each of them for feature extraction and measuring the classification performance for each of the three feature sets. With ResNet50 (He et al., 2015), InceptionV3 (Szegedy et al., 2016) and VGG16 (Simonyan and Zisserman, 2015), we investigated three architectures that have proven suitable for automated image classification in the past.

3.3 Feature Reduction

The network architectures we used produce a feature vector of up to 2048 dimensions for a single image. This results in a large amount of data and slows down classification, because the extracted data must be scanned completely for every classification. In section 5.4, we use Principal Component Analysis (Lever et al., 2017) to reduce the data and speed up classification, and we measure the impact on both runtime and classification performance.
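The reduction step can be sketched with the PCA implementation in scikit-learn. This is a minimal sketch under stated assumptions, not the authors' code: the random matrix stands in for CNN feature vectors, the sizes (200 vectors, 64 reduced to 16 dimensions) are placeholders chosen to run quickly, and fitting the projection once on the classified images and reusing it for new images is an assumed design.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in for CNN feature vectors (assumption: the real vectors have up to 2048 dims).
train_features = rng.normal(size=(200, 64))
query_feature = rng.normal(size=(1, 64))

# Fit the projection on the vectors of the already classified images ...
pca = PCA(n_components=16)
train_reduced = pca.fit_transform(train_features)

# ... and apply the same fitted projection to the vector of a new image.
query_reduced = pca.transform(query_feature)

print(train_reduced.shape, query_reduced.shape)
```

Reusing one fitted projection keeps stored and new vectors in the same reduced space, which is what a distance-based comparison requires.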

3.4 Matching and Classification

We used a nearest neighbor approach for matching and classification, performed by the kNN algorithm (Mitchell, 1997) on the feature vectors of our product images. It finds the k feature vectors with the highest similarity to the query vector and uses them for classification. The target class is determined by a majority vote over the classes assigned to these vectors, and the class with the highest probability is assigned to the image. To achieve the best possible classification result, we investigate the parameters of kNN in section 5.3. A result predicted by kNN with k=3 is illustrated in figure 2.

Figure 2: Unclassified product image (left) and three correctly predicted brick codes with the corresponding images. [Images omitted; the query image and all three matches are labeled with brick code 10000262.]
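The matching step described in section 3.4 can be illustrated with a small, dependency-free sketch. It assumes Euclidean distance and an unweighted majority vote; the two-dimensional vectors and the second brick code are made up for illustration, and `knn_predict` is a hypothetical helper, not part of the described system.

```python
import math
from collections import Counter

def knn_predict(query, vectors, labels, k=3):
    """Return the majority label among the k vectors closest to `query`."""
    # Sort the indices of all stored vectors by Euclidean distance to the query.
    order = sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    nearest_labels = [labels[i] for i in order[:k]]
    # Majority vote; on a tie, Counter keeps the label encountered first (the closest one).
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy 2-D "feature vectors". Brick code 10000262 appears in figure 2;
# 99999999 is a placeholder for some other brick.
vectors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9)]
labels = ["10000262", "10000262", "10000262", "99999999", "99999999"]

prediction = knn_predict((0.1, 0.1), vectors, labels, k=3)
print(prediction)
```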

3.5 Hybrid Approach

We combined the image-based classification approach with text-based classification and merged the results to increase classification performance.

In a first step, we extracted the texts of all available product images with Optical Character Recognition (OCR) (Google, 2022). In a second step, we standardized the texts by removing special characters and converting upper case characters into lower case characters. We also split the extracted texts into blocks and formed tokens (figure 3) to represent the text of a product image as a feature vector.

['am besten eiskalt',
 'kohlensäurehaltig',
 'ed bul',
 'energy drink',
 'mit taurin',
 'belebt geist und körper']

Figure 3: Product image and its extracted German text.
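The standardization of OCR text blocks can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes that "special characters" means everything except letters, digits and whitespace, that umlauts are kept (as in figure 3), and that tokens are single words. The helper names `normalize_block` and `tokenize` are hypothetical.

```python
import re

def normalize_block(block: str) -> str:
    """Lowercase a text block and drop everything except letters, digits and spaces."""
    lowered = block.lower()
    # \w is Unicode-aware in Python 3, so umlauts such as "ä" and "ö" are kept.
    cleaned = re.sub(r"[^\w\s]", " ", lowered)
    cleaned = cleaned.replace("_", " ")  # "_" counts as \w, so remove it separately
    return " ".join(cleaned.split())

def tokenize(blocks):
    """Turn a list of OCR text blocks into one flat list of word tokens."""
    tokens = []
    for block in blocks:
        tokens.extend(normalize_block(block).split())
    return tokens

ocr_blocks = ["Am besten EISKALT!", "Energy Drink®", "mit Taurin."]
tokens = tokenize(ocr_blocks)
print(tokens)
```

Digits are deliberately kept here, in line with the observation cited in section 2 that omitting numerals hurts text-based classification.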


In addition, we assigned the corresponding GPC brick code to each text vector as a label for classification. We used a nearest neighbor approach to classify the texts, similar to the technique used in our image-based approach.

To classify a text, we compared its feature vector with all feature vectors in the dataset and counted matching words. We then selected the feature vector with the largest number of matches and assigned its label to the text. The following listing shows the pseudocode of our heuristic for determining the final class based on the two best predictions of the image-based method (imgclass1, imgclass2) and the text-based method (txtclass1, txtclass2).

predicted_classes = [imgclass1, imgclass2, txtclass1, txtclass2];
majority = most_common(predicted_classes);
minority = least_common(predicted_classes);
if (majority.frequency >= 2 && minority.frequency == 1)
    predicted_class = majority.class;
else
    predicted_class = imgclass1;

The class that occurs most frequently among the four predicted classes is the final class prediction. In case of a tie, the first class proposed by the image-based approach is used as the final class, since in our experiments the image-based method achieves a higher accuracy than the text-based method (table 9).
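The word-matching step and the voting heuristic of this section can be written as runnable Python. This is a sketch under assumptions rather than the original implementation: texts are represented as token lists, the two best text predictions are simply the labels of the two best-matching dataset entries, and the function names and example labels are hypothetical.

```python
from collections import Counter

def classify_text(query_tokens, labeled_texts, top_n=2):
    """Rank labels by the number of words shared with the query text."""
    query = set(query_tokens)
    scored = [(len(query & set(tokens)), label) for tokens, label in labeled_texts]
    # Stable sort: among equal match counts, earlier dataset entries stay first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in scored[:top_n]]

def hybrid_vote(imgclass1, imgclass2, txtclass1, txtclass2):
    """Combine the two best image-based and the two best text-based predictions."""
    counts = Counter([imgclass1, imgclass2, txtclass1, txtclass2]).most_common()
    majority_class, majority_freq = counts[0]
    minority_freq = counts[-1][1]
    if majority_freq >= 2 and minority_freq == 1:
        return majority_class
    return imgclass1  # tie or no agreement: fall back to the best image prediction

labeled_texts = [
    (["energy", "drink", "mit", "taurin"], "brick_a"),
    (["mineralwasser", "still"], "brick_b"),
    (["energy", "drink", "zuckerfrei"], "brick_a"),
]
text_predictions = classify_text(["energy", "drink", "taurin"], labeled_texts)
final_class = hybrid_vote("brick_b", "brick_a", *text_predictions)
print(text_predictions, final_class)
```

In the example, the image-based method ranks brick_b first, but three of the four predictions agree on brick_a, so the vote overrides the image-based first choice.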

4 PRODUCT DATASET

To evaluate the performance of our approach, we used product images and corresponding product data records provided by a German retail company. All data belongs to the segment Food/Beverage/Tobacco of the Global Product Classification (GPC). We cleaned the dataset by removing duplicate images and images belonging to bricks with fewer than 10 images. The final dataset contains 69,256 product images and 36,624 corresponding product data records. Most of the images show a single product in front of a neutral background (figure 4).

Figure 4: Product Image Samples.

These front images belong to one of two image types defined in the GS1 Product Image Style Guide (GS1, 2019): Functional Product Images and Primary Product Images. Functional Product Images are captured with no spatial depth of the product. Primary Product Images show not only the height and width, but also the depth of a product. The images in the dataset are in PNG format and measure up to 1000 pixels in width and height. The product images are not uniformly distributed among the classes (figure 5). Every image in our dataset is assigned to exactly one brick, one class and one family according to the GPC standard.

Table 1: Quantity of images, products, texts and labels in the dataset we used in our experiments.

type           quantity
images           69,256
texts            67,587
products         36,624
GPC-bricks          197
GPC-classes          73
GPC-families         20

5 EXPERIMENTS

We implemented the described approach using the scikit-learn library (scikit learn, 2022). In sections 5.1 and 5.2, we describe the data preparation steps and the performance metrics used in this paper. In section 5.3, we study the parameters of the kNN algorithm and their effects on system performance to determine the best configuration. In section 5.4, we describe approaches to reducing the amount of data and show their effect on classification accuracy. In section 5.5, we evaluate the performance of our system and present our results. In section 5.6, we analyze the hybrid approach, which combines our image-based and text-based approaches to increase classification performance.

5.1 Data Preparation

All results of our experiments are based on the dataset described in section 4. In our experiments, we used 20% of the dataset as a test set and classified it according to the GPC standard using the remaining 80% of the dataset. While splitting our data, we made sure that each of the two sets contained approximately the same percentage of samples of each class as the total dataset. In every experiment, we used the metrics described in section 5.2 to measure the classification performance of the system.
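A class-preserving 80/20 split of this kind can be sketched with `train_test_split` from scikit-learn and its `stratify` argument. Whether the authors used this exact function is an assumption; the sample ids and the two labels below are toy data.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Toy stand-in: 100 sample ids with an unbalanced label distribution (80 vs. 20).
samples = list(range(100))
labels = ["brick_a"] * 80 + ["brick_b"] * 20

train_x, test_x, train_y, test_y = train_test_split(
    samples, labels, test_size=0.2, stratify=labels, random_state=0
)

# Both parts keep the 4:1 ratio between the two classes.
print(Counter(train_y))
print(Counter(test_y))
```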


Figure 5: Distribution of images per GPC-family (69,256 images from 20 different families).

5.2 Performance Metrics

We evaluated the performance of our system by calculating the metrics precision, recall and f1-score (Sokolova and Lapalme, 2009) in weighted form. As mentioned before, our dataset is unbalanced because the number of images differs from class to class. We therefore calculated the metrics for each class and averaged them, weighting each class by its number of true instances (scikit learn, 2022).
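The weighted averaging can be made concrete with a short, dependency-free sketch. It follows the description above (per-class metrics, weighted by the number of true instances per class); the function name `weighted_scores` and the four-sample example are illustrative only.

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Return support-weighted precision, recall and f1-score over all classes."""
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    precision = recall = f1 = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        p_cls = tp / n_pred if n_pred else 0.0
        r_cls = tp / n_true
        f_cls = 2 * p_cls * r_cls / (p_cls + r_cls) if p_cls + r_cls else 0.0
        weight = n_true / total
        precision += weight * p_cls
        recall += weight * r_cls
        f1 += weight * f_cls
    return precision, recall, f1

y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "b", "b"]
precision, recall, f1 = weighted_scores(y_true, y_pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

For this example, class a has precision 1.0, recall 2/3 and f1 0.8, class b has precision 0.5, recall 1.0 and f1 2/3; with weights 0.75 and 0.25 this yields 0.875, 0.75 and roughly 0.767.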

5.3 K-Nearest Neighbors Algorithm

In this experiment, we analyzed the configuration of the kNN algorithm in detail to find the best possible configuration for classification.

5.3.1 Computation of Nearest Neighbors

We used an implementation of kNN (scikit learn, 2022) which provides three different methods for fast computation of nearest neighbors. brute computes the distances between all pairs of points in the dataset, while kd_tree and ball_tree use an internal tree structure. In this section, we compare their classification performance and runtime. In our case, all three methods achieve the same classification performance, but brute solves the classification problem fastest on our dataset. Constructing the internal data structure for the kd_tree and ball_tree methods apparently takes so much time that it cannot be made up for during classification.

Table 2: kNN computation performance comparison on GPC brick level (k=2) with 20% of our data as test set.

metric        kd_tree   ball_tree   brute
precision       0.864       0.864   0.864
recall          0.862       0.862   0.862
f1-score        0.861       0.861   0.861
runtime [s]   2118.12     1995.31   29.11
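The three search strategies correspond to the `algorithm` parameter of `KNeighborsClassifier` in scikit-learn. The following sketch only illustrates why they are expected to give identical predictions; the random data, its size and the five label values are assumptions, and no runtime claim is made for such small inputs.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Random stand-ins for feature vectors and brick labels (assumption).
X_train = rng.normal(size=(300, 16))
y_train = rng.integers(0, 5, size=300)
X_test = rng.normal(size=(50, 16))

predictions = {}
for algorithm in ("brute", "kd_tree", "ball_tree"):
    knn = KNeighborsClassifier(n_neighbors=2, algorithm=algorithm)
    knn.fit(X_train, y_train)
    predictions[algorithm] = knn.predict(X_test)

# The search strategy changes how the neighbors are found, not which ones are found.
same = all(
    np.array_equal(predictions["brute"], predictions[name])
    for name in ("kd_tree", "ball_tree")
)
print(same)
```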

5.3.2 Distance Weights and k

In this section, we investigate the influence of k on the classification accuracy of kNN by using up to 20 nearest neighbors for classification.

Figure 6: F1-score for weighted and unweighted kNN.

Based on our data, the system classifies more accurately when nearest neighbors are weighted by distance, and classification accuracy decreases as k increases. The performance metrics reach their highest values when k is 2 and neighbors are weighted by the inverse of their distance. Therefore, we use this configuration for the kNN algorithm in all of the following experiments.
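Inverse-distance weighting can be illustrated with a small sketch in plain Python. It is a simplified illustration of the idea and not the scikit-learn implementation; `weighted_knn_predict` and the toy vectors are hypothetical, and an exact match (distance 0) is handled by returning its label directly.

```python
import math
from collections import defaultdict

def weighted_knn_predict(query, vectors, labels, k=2):
    """Vote among the k nearest neighbors, each weighted by 1 / distance."""
    neighbors = sorted(
        (math.dist(query, vec), label) for vec, label in zip(vectors, labels)
    )[:k]
    votes = defaultdict(float)
    for distance, label in neighbors:
        if distance == 0.0:
            return label  # an exact match decides the vote on its own
        votes[label] += 1.0 / distance
    return max(votes, key=votes.get)

vectors = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
labels = ["brick_a", "brick_b", "brick_b"]

# For the query (0.2, 0) and k=2, the neighbors are brick_a at distance 0.2
# and brick_b at distance 0.8, so brick_a receives the larger vote.
prediction = weighted_knn_predict((0.2, 0.0), vectors, labels, k=2)
print(prediction)
```

One plausible reading of the result for k=2 is that an unweighted vote ties whenever the two neighbors disagree, whereas weighting resolves such cases in favor of the closer neighbor.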

5.3.3 Distance Metrics

In this section, we examined three distance metrics for determining the k nearest neighbors used for classification. As mentioned before, k is set to 2.

Table 3: kNN distance metric performance comparison on GPC brick level.

distance metric   precision   recall   f1-score
euclidean             0.864    0.862      0.861
minkowski             0.850    0.849      0.848
chebyshev             0.743    0.742      0.740

Based on our data, each distance metric produces different classification results, and the Euclidean distance achieves the best classification performance.
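For reference, the three distance metrics can be written down directly. The sketch below uses plain Python; the order p of the Minkowski distance is left as a free parameter because the paper does not state which value was used.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equally long vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def euclidean(a, b):
    """Euclidean distance, i.e. the Minkowski distance of order 2."""
    return minkowski(a, b, 2)

def chebyshev(a, b):
    """Largest absolute coordinate difference (limit of Minkowski as p grows)."""
    return max(abs(x - y) for x, y in zip(a, b))

a, b = (0.0, 0.0), (3.0, 4.0)
print(euclidean(a, b), minkowski(a, b, 1), chebyshev(a, b))
```

Chebyshev distance considers only the single largest coordinate difference, so it discards most of the information in a high-dimensional feature vector; this may help explain its lower scores in table 3, although the paper does not analyze the cause.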

5.4 Feature Engineering

We investigated how the model architecture used for feature extraction and the size of the feature vectors affect the classification performance and the runtime of the system. In a first step, we examined the influence of different model architectures on classification performance on brick level. We used each architecture for feature extraction and measured the resulting classification performance. The results in table 4 show that the model architecture used for feature extraction has an impact on classification performance. In our case, ResNet50 is the most suitable architecture.

Table 4: Performance comparison of different model architectures on GPC brick level.

model         precision   recall   f1-score
ResNet50          0.864    0.862      0.861
VGG16             0.851    0.850      0.849
InceptionV3       0.806    0.805      0.802

To determine the class as quickly as possible, we reduced the size of the generated feature vectors. We used Principal Component Analysis (PCA) (Lever et al., 2017) for dimensionality reduction (table 5) and measured the classification performance of the system by classifying the differently sized feature vectors on GPC brick level. The values in table 6 indicate a correlation between system performance and the number of elements in the feature vectors: fewer elements per vector lead to lower performance.

Table 5: Influence of the PCA feature reduction on the amount of data based on 69,256 images.

vec. size   data amount [MB]
2048                  546.73
512                   272.00
128                    68.00

Table 6: Performance and runtime of the classification of PCA-reduced feature vectors at GPC-brick level based on 20% of the data.

vec. size   precision   recall   f1-score   time [s]
2048            0.864    0.862      0.861      29.34
512             0.861    0.860      0.859      13.91
128             0.850    0.849      0.848      11.45

Reducing the number of elements per vector by 75% to 512 lowers the f1-score by only 0.23% while cutting the runtime by a factor of 2.11. We can therefore double the speed of the system if we accept a small loss in accuracy. We also compared PCA to Neighborhood Components Analysis (NCA) and found that PCA leads to better classification results in our case.
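The figures quoted above follow directly from table 6. The short calculation below reproduces them; it assumes that the 0.23% refers to the relative drop in f1-score, which is the reading that matches the table values.

```python
# Values taken from table 6 (vector sizes 2048 and 512).
f1_full, f1_reduced = 0.861, 0.859
time_full, time_reduced = 29.34, 13.91

size_reduction = 1 - 512 / 2048  # fraction of vector elements removed
f1_drop_percent = (f1_full - f1_reduced) / f1_full * 100  # relative f1 decrease
speedup = time_full / time_reduced

print(f"{size_reduction:.0%} fewer elements, "
      f"f1 -{f1_drop_percent:.2f}%, speedup x{speedup:.2f}")
```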

5.5 Performance Evaluation

We evaluated the performance of the system with stratified k-fold cross-validation. We divided the dataset into five subsets of 20% of the data each, preserving the proportions of each class in the total dataset. In five runs, we used one of the subsets as test data and automatically classified its content based on the remaining data using the kNN algorithm. We calculated the weighted performance metrics precision, recall and f1-score after each run and took the arithmetic mean of the results as the total classification performance.

Figure 7: k-Fold stratified cross validation.
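The evaluation procedure can be sketched with `StratifiedKFold` from scikit-learn. The sketch assumes the kNN configuration chosen in section 5.3 (k=2, inverse-distance weights, brute-force search) and uses synthetic, well-separated clusters in place of CNN features, so the resulting score says nothing about the real dataset.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in: three unbalanced, well-separated clusters (assumption).
y = np.repeat([0, 1, 2], [60, 30, 10])
X = rng.normal(size=(100, 8)) + 10.0 * y[:, None]

scores = []
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in folds.split(X, y):
    knn = KNeighborsClassifier(n_neighbors=2, weights="distance", algorithm="brute")
    knn.fit(X[train_idx], y[train_idx])
    y_pred = knn.predict(X[test_idx])
    scores.append(f1_score(y[test_idx], y_pred, average="weighted"))

# Arithmetic mean over the five runs.
mean_f1 = float(np.mean(scores))
print(len(scores), mean_f1)
```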

The k-fold cross-validation resulted in a weighted f1-score of 0.856 on GPC brick level, 0.886 on GPC class level and 0.911 on GPC family level. Classification accuracy is highest at family level. A likely explanation is that the visual differences between products are larger at family level, which makes them easier to distinguish than at class and brick level.

Table 7: Results of system performance evaluation based on k-fold stratified cross validation.

GPC-Layer   k   precision   recall   f1-score
Brick       5       0.859    0.857      0.856
Class       5       0.887    0.886      0.886
Family      5       0.911    0.911      0.911

In another experiment, we removed all product images that belong to products in the test set from the remaining dataset before classification. This ensured that no data of products from the test set appeared in the remaining 80% of the data. We observed that the weighted f1-score decreased by 0.031 at GPC brick level, by 0.025 at GPC class level and by 0.020 at GPC family level, as shown in table 8.

Table 8: Results of system performance evaluation with a dataset where the products in the test set are not included in the remaining 80% of the data.

GPC-Layer   precision   recall   f1-score
Brick           0.827    0.826      0.825
Class           0.862    0.862      0.861
Family          0.891    0.890      0.891
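Removing the test products from the search space amounts to a simple filter on product identifiers. The sketch below assumes that every image record carries the id of its product; the record layout, ids and file names are made up for illustration.

```python
def drop_test_products(train_items, test_items):
    """Remove training images whose product also appears in the test set."""
    test_products = {product_id for product_id, _ in test_items}
    return [item for item in train_items if item[0] not in test_products]

# (product_id, image_name) pairs; ids and file names are hypothetical.
train_items = [("p1", "p1_front.png"), ("p2", "p2_front.png"), ("p3", "p3_side.png")]
test_items = [("p1", "p1_side.png")]

filtered_train = drop_test_products(train_items, test_items)
print(filtered_train)
```

Without this filter, a second image of the same product can remain in the search space and act as a near-duplicate match, which is consistent with the lower scores in table 8.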

5.6 Hybrid Classification Approach

In all previous experiments, we used image-based classification only. To improve the classification performance of the system, we combined the image-based classification with a text-based approach and merged the classification results as described in section 3.5.

For classification, we took 20% of the image data as test data and extracted the text from each test image using OCR (Google, 2022). This gave us 13,476 images with 13,476 texts, which we first classified separately with each method on GPC brick level. We then classified all 13,476 elements with our combined approach.

Table 9: System performance comparison of the image-based, text-based and hybrid approach on GPC brick level.

approach           precision   recall   f1-score
images and texts       0.884    0.884      0.883
texts only             0.846    0.858      0.850
images only            0.870    0.864      0.866

Combining the two approaches leads to a better overall classification performance. The weighted f1-score of the combined approach (0.883) is 1.7 percentage points higher than that of the image-based approach (0.866) and 3.3 percentage points higher than that of the text-based approach (0.850).

6 SUMMARY

The system presented in this paper is based on a neural network (ResNet50) for feature extraction and a supervised learning algorithm (kNN) for classification of the extracted features. The results of our experiments are based on 69,256 product images of 36,624 different products, which are assigned to 197 GPC-bricks, 73 GPC-classes and 20 GPC-families. We classified the given dataset with weighted f1-scores of 0.856 on GPC brick level, 0.886 on GPC class level and 0.911 on GPC family level (table 7). After removing product images that belong to products in the test data from the search space, we achieved weighted f1-scores of 0.825 on GPC brick level, 0.861 on GPC class level and 0.891 on GPC family level (table 8). Our hybrid approach, which combines image- and text-based classification, increased the weighted f1-score on GPC brick level to 0.883 (table 9).

Given the current error rates, completely error-free and fully automated classification is not yet possible, but the system can considerably reduce the manual workload during the classification process.

6.1 Advantages and Limitations

Our system is not limited to the GPC standard and works with other classification systems as well. Changes to the classification system do not require adjustments to the system. For example, the built-in neural network does not need to be retrained when the classification system changes, which makes it a flexible solution. The system determines the class of an unknown product image based on similarities with already classified images. Each newly classified image is added to the search space after classification and is used for future classification tasks.

The system needs at least one classified item per class to be able to predict that class. The first item of a class must therefore be labeled manually by the user. Initial experiments have shown that predictions improve as the amount of data increases and that the system performs better when the dataset contains multiple images per product. In a production environment, we recommend using a probability value for each predicted class. If this value is above a given threshold, the corresponding class can be assigned automatically by the system. Otherwise, the user can be prompted to perform the classification manually.
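The threshold rule can be sketched in a few lines. The function name `route_prediction`, the dictionary of class probabilities and the threshold of 0.9 are illustrative assumptions; the paper does not specify a threshold value.

```python
def route_prediction(class_probabilities, threshold=0.9):
    """Return the most probable class and whether it may be assigned automatically."""
    label = max(class_probabilities, key=class_probabilities.get)
    if class_probabilities[label] > threshold:
        return label, "auto"
    return label, "manual"  # below the threshold: ask the user to confirm or correct

confident = route_prediction({"brick_a": 0.95, "brick_b": 0.05})
uncertain = route_prediction({"brick_a": 0.60, "brick_b": 0.40})
print(confident, uncertain)
```

In practice, the threshold would be tuned to balance the share of automatically assigned products against the tolerated error rate.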

6.2 Future Work

Our system supports the user during the product classification process by using the corresponding product images. Automating this task completely requires further research and development. The system can be combined with object detection and image segmentation to detect the product in an image and cut it out before feature extraction. This would allow almost any type of product image to be used for classification. Currently, product images are compared in their entirety, which can lead to misclassification when the product makes up only a small part of the image.

The system can also be combined with additional text-based and color-based techniques to further increase classification performance. Despite the high accuracy of the system, errors can occur during a fully automated classification process. To minimize the error rate, the user can be asked for input if the probability of a predicted class falls below a given threshold. In this way, the user interactively contributes to the improvement of the system by generating labeled image data, which can be used for classification in the future.

ACKNOWLEDGEMENTS

This work was funded by the German Federal Ministry of Education and Research (FKZ 01IS20085).

REFERENCES

Allweyer, O., Schorr, C., Krieger, R., and Mohr, A. (2020). Classification of Products in Retail using Partially Abbreviated Product Names Only. Proceedings of the 9th International Conference on Data Science, Technology and Applications - DATA, pages 67–77.

Bast, S. (2021). Image Matching for Product Image Classification based on Machine Learning. University of Applied Science Trier, Institute for Software Systems, Master thesis.

Chavaltada, C., Pasupa, K., and Hardoon, D. R. (2017). A Comparative Study of Machine Learning Techniques in Automatic Product Categorisation. Proceedings of the 14th International Symposium on Neural Networks - ISNN, pages 10–17.

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: a Large-Scale Hierarchical Image Database. IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255.

Google (2022). Google Vision API. https://cloud.google.com/vision. (2022-02-16).

GS1 (2018). Global Product Classification Development & Implementation Guide. https://www.gs1.org/sites/default/files/gpc development and implementation 1.pdf. (2022-02-12).

GS1 (2019). GS1 Product Images Application Guideline for the Retail Grocery & Foodservice Industries. https://www.gs1us.org/grocery-image-guide. (2022-02-12).

He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep Residual Learning for Image Recognition. CoRR, abs/1512.03385.

Karami, E., Prasad, S., and Shehata, M. (2017). Image Matching using SIFT, SURF, BRIEF and ORB: Performance Comparison for Distorted Images. CoRR, abs/1710.02726.

LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521:436–444.

Lever, J., Krzywinski, M., and Altman, N. (2017). Principal Component Analysis. Nature Methods, 14(7):641–642.

Mitchell, T. M. (1997). Machine Learning. McGraw-Hill Education Ltd.

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., and Fei-Fei, L. (2015). ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), pages 211–252.

scikit learn (2022). scikit-learn - Machine Learning in Python. https://scikit-learn.org. version 1.0.2. (2022-02-12).

Simonyan, K. and Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. 3rd International Conference on Learning Representations, ICLR 2015.

Sokolova, M. and Lapalme, G. (2009). A systematic analysis of performance measures for classification tasks. Information Processing & Management, 45(4):427–437.

Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2016). Rethinking the Inception Architecture for Computer Vision. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2818–2826.

Szeliski, R. (2021). Computer Vision: Algorithms and Applications. Springer, 2nd edition.

Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., and Liu, C. (2018). A Survey on Deep Transfer Learning. Artificial Neural Networks and Machine Learning, pages 270–279.

Wei, Y., Tran, S., Xu, S., Kang, B., and Springer, M. (2020). Deep Learning for Retail Product Recognition: Challenges and Techniques. Computational Intelligence and Neuroscience.
