A Case Study for Minimal Cost Shopping in a Cluster of Online Stores
Thiago Alexandre Nakao Franc¸a
1 a
, Rodrigo Campiolo
1 b
and Lilian Caroline Xavier Candido
2 c
1
Department of Computer Science, Federal University of Technology Paran
´
a, Campo Mour
˜
ao, Paran
´
a, Brazil
2
Department of Mathematics, Federal University of Technology Paran
´
a, Campo Mour
˜
ao, Paran
´
a, Brazil
Keywords:
Web Crawler, Integer Linear Programming, Caching, e-Commerce.
Abstract:
This work proposes and evaluates mechanisms to converge to the purchase configuration with minimal cost for
a product list in an online stores cluster. We collected and stored product prices in a database, and used caching
to reduce client response time. We designed, implemented, and compared two Integer Linear Programming
solutions to achieve purchase configuration with minimal cost. A case study was conducted to evaluate these
mechanisms. The results demonstrated that developed mechanisms found optimal solutions with response
time guarantees. Empirical tests in the case study with 100 different products and 118 providers converged in
about 9 seconds.
1 INTRODUCTION
Price comparison shopping websites (shopbots) such
as Google Shopping, Yahoo Shopping, and Camel-
camelcamel enable shoppers to filter and compare
products on many online stores. Typically, this kind
of application aggregate product listing from a spe-
cific group of retailers. In web shopping, users fetch
on comparison shopping sites for one specific prod-
uct they want to acquire. Then the application re-
turns a list of retailers ordered by price, reviews, or
other criteria. Most of these applications work well
for displaying retailers when the shoppers need to
purchase one product, but they do not offer multiple
items search features.
There are many niches in which buyers are in-
terested in purchasing multiple items at once, such
as trading card games, books, comics, action figures,
construction materials, and market foods. In order to
build a comparison shopping application for multiple
products as input, there are two main issues. First,
fetching the product prices from retailers web sites,
and second, finding the retailers in which the cost of
the products is minimum, considering that each re-
tailer has a shipping cost.
Retrieving prices is a problem due to the need
of fetching multiple products data from multiple
sources. Usually the price of a product in a singu-
a
https://orcid.org/0000-0002-1267-6513
b
https://orcid.org/0000-0001-7160-7262
c
https://orcid.org/0000-0001-9927-8843
lar source is retrieved via Hypertext Transfer Protocol
(HTTP) requests. Therefore the number of requests
triggered will be the multiplication of the number of
products by the number of sources, considering one
program execution.
Determining the best purchase configuration con-
sists of selecting the stores where the purchase price is
minimum. Consider a set of products I = {1, . . . , n}, a
set of online stores J = {1, . . . , m}, and k = min(m, n).
C
k, j
is the total number of purchase configurations
that uses j online stores from m available ones, where
j k (equation 1).
C
k, j
=
k!
j! · (k j)!
(1)
The total number of possible purchase configura-
tions for m online stores is the sum of the number of
purchase configurations with j online stores, for each
j J. This is presented in the equation 2, in which
C is the total number of possible purchase configura-
tions and C
k, j
is calculated using the equation 1.
C =
k
j=1
C
k, j
= 2
k1
(2)
Due to the exponential number of possible pur-
chase configurations, a brute force algorithm is not
indicated to find the purchase configuration with min-
imum cost. We would have n( j 1) comparisons for
each purchase configuration in the worst case (i.e., ev-
ery retailer has set I in stock). So we modeled, ap-
França, T., Campiolo, R. and Candido, L.
A Case Study for Minimal Cost Shopping in a Cluster of Online Stores.
DOI: 10.5220/0011040000003179
In Proceedings of the 24th International Conference on Enterprise Information Systems (ICEIS 2022) - Volume 1, pages 717-723
ISBN: 978-989-758-569-2; ISSN: 2184-4992
Copyright
c
2022 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
717
plied, and compared two Integer Linear Programming
models to solve this problem.
We propose and evaluate mechanisms to fetch
product prices from a group of online stores to find the
purchase configuration in which the total price is min-
imum. A case study was conducted on online stores
that focus on the trading card game Magic: the Gath-
ering.
Our main contributions are: (i) mechanism to col-
lect, store and maintain product prices from various
online stores, to reduce the response time; (ii) eval-
uation of two Integer Linear Programming models
to find the optimal purchase configuration of online
stores.
The rest of this work is organized as follows. Sec-
tion 2 presents other works that are related to this re-
search. Section 3 presents the methods applied in this
study, such as case study, web crawler, caching mech-
anisms, proposed models, and experiments specifica-
tions. Section 4 shows and discusses the results of the
experiments. Section 5 exposes the identified limita-
tions of our approach. And finally, Section 6 exposes
our conclusions and future works.
2 RELATED WORKS
This section presents works that address data syn-
chronization and combinatorial optimization prob-
lems similar to this work. Cho and Garcia-Molina
(Cho and Garcia-Molina, 2000) compared metrics
and synchronization politics to maintain local copies
from external sources fresh. They define three syn-
chronization politics as (i) uniform, (ii) proportional,
and (iii) optimal. The optimal synchronization pol-
icy can be used when the change frequency is known.
They conclude that the uniform politic keeps the local
copies from autonomous databases more updated than
the proportional politic and demonstrate that the opti-
mal is better than the previous policies. However, it is
needed to know the update rate of the data. Garfinkel
(Garfinkel et al., 2006) provides a model to com-
pute a purchase plan that fetches many products from
distinct retailers. They propose an algorithm named
Greedy Addition of Bundles (GRAB), which is a vari-
ation of the Fisher and Wolsey algorithm (FISHER
and WOLSEY, 1982). GRAB presented a lower ex-
ecution time than Integer Linear Programming (ILP)
approach in their experiments. Garfinkel (Garfinkel
et al., 2008) proposed an approach to integrate market
promotions and recommendation services on compar-
ison shopping websites. This approach used ILP to
solve a combinatorial problem related to electronic
commerce, focusing on maximizing discounts from a
set of applicable promotions. Their model uses re-
strictions to control the number of bonus products,
discount coupons, and the value required to obtain
free shipping. The model was applied for a set of
Amazon products, and it is described in his work.
(Kandi et al., 2018) proposed a resource allocation
method for query optimizations in the cloud using
an ILP model. The multiple items purchase prob-
lem is similar to a capacitated facility location prob-
lem. Cooper (Cooper, 1963) developed a model for
the classic facility location problem. Its goal was to
decide the locations of warehouses and the allocation
of customer demand, given the locations and demands
of customers. Various extensions of the problem have
been proposed in the literature. Gao (Gao et al., 2010)
proposed a model for the capacitated facility location
problem with freight cost discount. Yu et. al (Yu et al.,
2012) proposed a model for capacitated facility loca-
tion problems in terms of serve radius and economic
benefit. Even though there are similar approaches,
none of these works has proposed the same model
presented in our research. Our work also presents a
caching mechanism to keep local copies of the prices
and the optimization process applicable in a realistic
situation.
3 METHODS
This section presents the methods applied in this re-
search. It includes the case study, the web crawl-
ing technique, the caching mechanism, the proposed
models to converge to the optimal purchase configu-
rations, and experiments specifications.
3.1 Case Study
In order to evaluate the proposed mechanisms and
models, a case study was conducted in a Magic: the
Gathering online marketplace.
Magic: the Gathering is a top-rated trading card
game in various countries, such as United States of
America, Brazil, China, Japan, United Kingdom, Ger-
many, and many more. Many online stores sell the
individual cards of the game. For this kind of prod-
uct, batch purchases are regular, and this happens due
to the nature of the game that instigates shoppers to
purchase multiple game pieces to play the game.
We developed an application that finds optimal
purchase configurations for Magic: the Gathering
cards. The prices and stock are collected from online
marketplace that catalogs prices from various online
stores. With the stock prices available, we fetched
ICEIS 2022 - 24th International Conference on Enterprise Information Systems
718
optimal purchase configurations of various combina-
tions of randomly selected input card lists, aiming to
evaluate the response time of the proposed mecha-
nisms and application. Details of the run executions
are provided in the experiments section.
3.2 Web Crawler
Executing an HTTP request for each card at every on-
line store has a high cost. A simple alternative is to
fetch prices from a marketplace that has already in-
dexed the prices from many retailers. In this way, only
one HTTP request is needed for each input product.
In this case study, the prices from Magic: the
Gathering cards are collected from a marketplace.
Unfortunately, the selected source does not provide
an application programming interface (API). For that
reason, a web crawler that collects its web pages
directly was created, respecting the requirements
present in the site robots.txt file. Figure 1 illustrates
the crawling process at a marketplace.
Figure 1: Prices crawling process.
In our context, every input product is a Magic the
Gathering card. For each input card, a crawling pro-
cess executes an HTTP GET request in the Magic:
the Gathering target marketplace website. It filters
card price and stock for every electronic commerce
returned in the response. When all processes have
finished their search request, the collected data is cen-
tralized in a data structure that organizes the cards
present in each electronic commerce. This structure
is stored in the local database.
3.3 Caching Mechanism
The use of cache mechanisms reduces the number of
HTTP requests on a search. We store products and
prices from individual and marketplace websites on
the cache.
We implement a uniform update policy for cache
consistency. We use a default frequency of 24 hours
to refresh the prices. We chose this policy because
Cho and Garvia-Molina (Cho and Garcia-Molina,
2000) compared updating metrics and policies for lo-
cal copies of remote systems. They describe the uni-
form allocation policy have better results than the pro-
portional allocation policy.
In the case study, we use a singular table in a
PostgreSQL (PostgreSQL, 2021) database to store
the product prices because of two main advantages:
(i) the method execute values from psycopg2.extras
python library that can perform massive inserts on a
table quickly, and (ii) the possibility of using Struc-
tured Query Language (SQL) queries later to re-
trieve the updated prices from the cards on the local
database.
3.4 Proposed Models
We propose two models to address the combinatorial
problem of finding the minimum price for a product
list in a group of online stores. For both models, con-
sider a set of product models indexed by I = {1, ..., n}
and a set of retailers indexed by J = {1, ..., m}. For ev-
ery product model i I there is the purchase quantity
q
i
. To every retailer j J, the shipping cost f
j
is con-
sidered if the retailer j is designated for the purchase
of a product. For every product model i associated to
a retailer j there is the price cost c
i j
, also there is the
available stock e
i j
for every product i in the retailer
j. Linear Models 1 and 2 are presented in the follow-
ing subsections. Linear Model 1 solves optimization
problems when the desired quantity of each product
is equal to 1, and Linear Model 2 is capable of solv-
ing problems with more units for each product. The
linear programming models were implemented using
PuLP Python library.
3.4.1 Linear Model 1
Be the variables x
i j
and y
j
defined by:
x
i j
=
(
1, if the product i is purchased in the store j
0, otherwise
y
j
=
(
1, if the shipping cost from j it’s used
0, otherwise
The problem consists in minimizing the total pur-
chase cost for every q
i
= 1. The cost is composed of
the sum of the prices c
i j
from selected cards, and the
shipping costs f
j
from the selected stores. Therefore,
the objective function to be minimized is expressed in
the equation 3:
F =
jJ
f
j
· y
j
+
iI
jJ
c
i j
· x
i j
(3)
A Case Study for Minimal Cost Shopping in a Cluster of Online Stores
719
The objective function is subjected to the con-
straints:
jJ
x
i j
= 1, i I (4)
iI
x
i j
n · y
j
, j J (5)
y
j
, x
i j
{0, 1}, i I, j J (6)
The constraint 4 ensures that a product is pur-
chased from a single store. The constraint 5 activates
the variable y
j
for every store utilized, in this way
the shipping y
j
is considered in the objective func-
tion. The restriction 6 delimits the value interval for
y
j
and x
i, j
, that can be only integer numbers 0 or 1.
We reinforce that the goal of this model is to min-
imize the function 3 constrained by 4, 5 and 6, the
minimization of this function solves the best purchase
configuration problem. In this model, every q
i
= 1. In
other words, there is only one product of each kind as
input.
This model is similar to the Dual-Based Procedure
for the Unconstrained Facility Location Problem pro-
posed by Erlenkotter (Erlenkotter, 1978), the differ-
ence is that in our approach, x
i j
can only assume the
values 0 or 1.
3.4.2 Linear Model 2
Let the variables x
i j
and y
j
be:
x
i j
Z, quantity of items i in store j
y
j
=
(
1, if the shipping in the store j is used
0, otherwise
The problem consists in minimizing the total pur-
chase cost, which is composed of the sum of the se-
lected card prices c
i j
and the shipping costs f
j
from
the selected stores. The function to be minimized is
expressed by the equation 7:
F =
jJ
f
j
· y
j
+
iI
jJ
c
i j
· x
i j
(7)
The objective function is constrained by:
jJ
x
i j
= q
i
, i I (8)
x
i j
e
i j
i I, j J (9)
iI
x
i j
y
j
·
iI
e
i j
, j J (10)
y
j
{0, 1}, j J (11)
x
i j
0, x
i j
Z, i I, j J (12)
The constraint 8 ensures that the quantity of prod-
ucts do not surpass the desired total. The constraint
9 ensures that the quantity used of each product x
i j
be lower or equal the stock e
i j
. The constraint 10 en-
sures that the total of purchased items is lower than
the available stock. The constraint 11 sets the range
of possible values for y
j
and the constraint 12 define
the range of values for x
i, j
.
Finally, we reinforce that the goal of this model
is to minimize the function 7, constrained by 8, 9, 11
and 10, the minimization of this function will result
in the optimal purchase configuration.
3.5 Experiments
The experiments were performed on an Intel(R)
Core(TM) i7-7700K CPU @ 4.20GHz processor,
with 20GB of RAM, running Ubuntu 17.04.
We registered the execution time for requests, in-
cluding 30, 60, and 100 cards. We used the card
lists A (Tappedout.net, 2019a) and B (Tappedout.net,
2019b) for the executions. These lists are realistic
Magic: the Gathering playable decks.
In order to evaluate the execution time for the pro-
posed models, we performed tests using random input
card lists. Both models executed lists with 40 sets of
25, 50, 100, and 200 cards, keeping the number of
units of each card equal to 1 for both proposed mod-
els. Model 2 can converge to solutions even if the
number of units for each card is greater than 1. Thus,
we also performed tests using two units for each card.
It is unlikely that every store has every possible
product in stock in a real-world scenario. For this rea-
son, we performed tests using the same random input
lists, in which every store provides all products. A
random numeric price between 1 and 50 was assigned
for every product missing from a store.
The evaluation of the mechanisms considers two
main factors: (i) returning the purchase configuration
with minimum purchase cost and (ii) performing the
price fetch and the linear optimization with guarantee
in terms of response time.
4 RESULTS AND DISCUSSIONS
This section presents the data crawling time, cache
analysis, and linear models performance.
4.1 Web Crawler
We fetched the product prices for lists A and B, vary-
ing the number of cards. The results are presented in
Table 1.
ICEIS 2022 - 24th International Conference on Enterprise Information Systems
720
Table 1: Crawling response time in seconds.
List 30 cards 60 cards 100 cards
A 9.72 18.32 29.20
A 10.35 17.70 29.75
A 9.90 17.69 28.04
B 10.60 18.44 30.10
B 10.11 18.60 29.53
B 10.74 19.01 29.40
Even by distributing the web crawler tasks into
multiple processes, the price retrieving still requires
about 30 seconds for 100 distinct cards. So, for a
list of 100 cards, when all the products to fetch are
cache hits, the retrieval time is much smaller, in our
experiments the worst time recorded was 0.2 seconds.
Therefore, the experiments justify using a caching
mechanism.
4.2 Performance of the Linear Models
We tested 40 sets of 25, 50, 100, and 200 cards
for both real and virtual price databases, keeping the
number of units of each card equal to 1 for both pro-
posed models and testing the performance of model 2
with two units of each product. In each execution of
the Integer Linear Programming, the total number of
stores considered by the models varied between 120
and 127 stores.
Table 2: ILP Models optimization time in realistic stock.
Quantity of Model 1 (s) Model 2 (s)
cards units AVG SD AVG SD
25 1 1.1 0.63 1.16 0.7
25 2 - - 1.24 1.0
50 1 2.66 3.48 2.33 2.14
50 2 - - 5.82 7.33
100 1 11.02 11.24 9.1 7.96
100 2 - - 20.6 15.03
200 1 46.23 72.44 31.18 35.43
200 2 - - 48.07 47.71
Table 2 presents the average execution time and
standard deviation for both of the proposed models
in the realistic stock database. It is observed that for
25 cards, both models converged in an average of 1.1
seconds and 1.16 seconds, for 50 cards 2.66 seconds
and 2.33 seconds, for 100 cards 11.02 seconds and
9.1 seconds, for 200 cards, model 1 took an aver-
age of 46.23 seconds while model 2 took 31.18 sec-
Figure 2: Execution time for 100 cards in seconds.
onds. However, the standard deviation for both cases
is high, indicating a relevant variance in model execu-
tion time.
Model 2 can find the optimal purchase configura-
tion for more units of the same product. We can ob-
serve that in every execution, the fetch time increased.
However, the execution time of increasing the number
of units is lower than increasing the number of differ-
ent items on this model.
The execution time distribution for 100 unique in-
put cards is presented in Figure 2. We can observe that
for model 1 most of the executions took more than 2
seconds and less than 11 seconds, with some outliers.
The lower execution time for model 1 with 100 cards
was 0.89 seconds, while the greater was 56.3 seconds.
Model 2 got similar results, and the lower execution
time was 0.88 seconds and the greatest 27.9 seconds.
Overall, model 2 can converge to the optimal solution
faster for most inputs.
Figure 3 displays the time distribution for 200
unique input cards. We can see that compared to 100
input cards, the execution time is much greater, and
here the better performance for model 2 is even more
perceptible. The lower execution time for model 1
with 200 cards was 1.5 seconds, while the greater was
271.71 seconds. Similarly, the lower execution time
was 1.6 seconds and the greatest 150.3 seconds for
model 2.
We observed that the time to converge to the opti-
mal solution had significant variations. In our experi-
ments with 100 cards, the list random13 converged in
56 seconds, while the list random2 converged in 0.93
seconds. The number of iterations of the models is
expected to be associated with the fetch space’s char-
acterization, which is defined by the behavior of the
constraints that vary depending on the input.
We also investigated the models’ behavior when
every store has all the products. We used the same
input lists from the previous experiments. When a
A Case Study for Minimal Cost Shopping in a Cluster of Online Stores
721
Figure 3: Execution time for 200 cards in seconds.
store misses a particular item from the input list, we
insert a virtual product with a random numeric price
between 1 and 50.
Table 3 presents the average execution time and
standard deviation for both of the proposed models.
Every store has all the products in stock, and as ex-
pected, in this scenario, the execution time is much
greater than in the realistic case for every singular in-
put list.
Table 3: ILP Models optimization time in the virtual stock.
Quantity of Model 1 (s) Model 2 (s)
cards units AVG SD AVG SD
25 1 4.41 2.33 5.02 2.54
25 2 - - 6.21 3.58
50 1 17.8 7.52 17.8 9.34
50 2 - - 22.5 12.8
100 1 110 112 87.6 69.3
100 2 - - 113.4 95.9
200 1 1048 731.7 655.8 766.9
200 2 - - 786.7 860.4
We observed that for 25 cards, the average execu-
tion time for both models is 4.41 and 5.02 seconds to
find the optimal purchase configuration, for 50 cards
17.86 and 17.84 seconds, for 100 cards 110.06 and
87.67 seconds, and for 200 cards 1048.14 and 655.83
seconds. Tests with two units of each card, model 2
converged in an average of 6.21 seconds for 25 cards,
22.57 seconds for 50 cards, 113.46 seconds for 100
cards, and 786.75 seconds for 200 cards.
As expected, when every store has every input
item in stock, the time to converge to the optimal solu-
tion scales faster when compared to the realistic stock
scenario. It is natural since the fetch space will be
larger.
5 LIMITATIONS
The proposed solution has two main limitations: (i)
the need for a single marketplace to collect the stock
prices and (ii) the fetch space characterization. Updat-
ing the stock by collecting prices for many resources
in real-time is a problem because of the number of
HTTP requests to be made. For this reason, our ap-
proach is to fetch from a singular marketplace that al-
ready catalogs the stock from many sellers. Another
limitation is the exponential processing relative to the
stock and input query from a specific execution. In
our experiments, for 200 input products, the execu-
tion time in both proposed models was high, having
averages of 31 seconds and 46 seconds, which would
not be acceptable for web applications.
6 CONCLUSION AND FUTURE
WORKS
In this work, two Integer Linear Programming models
were designed to solve the batch purchase combina-
torial problem. We performed tests using Magic: the
Gathering online stores. The products used in the ex-
periments are randomly chosen Magic: the Gathering
cards.
We performed experiments using the actual stock
from online stores and virtual stock experiments. All
the online stores have every input card in stock in the
virtual experiments. For the realistic scenario, both
solutions converged to the optimal solution in less
than 60 seconds for most executions. However, in the
case in which every store has every item, the execu-
tion time is at least double.
In our experiments, we observed that both mod-
els converge to the optimal solution when feasible.
Model 2 was able to find the optimal solution in
less time when compared to model 1. We also ob-
served that the time required to find the optimal solu-
tion grows considerably as the number of input items
grows. However, the time required to converge when
the number of units of each item grows is lower.
In the context of purchasing multiple items from
different sources, in cases such as Magic: the Gather-
ing online stores, the proposed models are viable for
use since there are about 150 different stores, and due
to the nature of this niche, it is common that singular
purchases include between 2 and 100 distinct items,
since the number of cards contained in a Magic: the
Gathering varies between 45 and 100 playable cards.
However, even in niches such as Magic: the Gath-
ering, there are applications that allow ordinary users
ICEIS 2022 - 24th International Conference on Enterprise Information Systems
722
to sell their cards online, such as Card Market
1
and
MypCards
2
. In this scenarios, the number of pos-
sible retailers can be considerably larger since there
are much more persons playing the game than online
stores.
In the current stage of our research, we are explor-
ing heuristic models to find at least reasonable solu-
tions if the computational cost becomes too high. We
are working in parallel solutions to improve response
time and investigating if choosing between an optimal
solution versus reasonable solutions analyzing input
from shoppers. We also intent to collect feedback of
the solution in a commercial online application.
REFERENCES
Cho, J. and Garcia-Molina, H. (2000). Synchronizing a
database to improve freshness. In Proceedings of
the 2000 ACM SIGMOD International Conference on
Management of Data, SIGMOD ’00, pages 117–128,
New York, NY, USA. ACM.
Cooper, L. (1963). Location-allocation problems. Opera-
tions Research, 11(3):331–343.
Erlenkotter, D. (1978). A dual-based procedure for unca-
pacitated facility location. Oper. Res., 26:992–1009.
FISHER, M. L. and WOLSEY, L. A. (1982). On the
greedy heuristic for continuous covering and pack-
ing problems. LIDAM Reprints CORE 505, Univer-
sit
´
e catholique de Louvain, Center for Operations Re-
search and Econometrics (CORE).
Gao, D., Jiao, W., and Zhang, J. (2010). Capacitated facility
location problem with freight cost discount. In 2010
7th International Conference on Service Systems and
Service Management, pages 1–5.
Garfinkel, R., Gopal, R., Pathak, B., and Yin, F. (2008).
Shopbot 2.0: Integrating recommendations and pro-
motions with comparison shopping. Decision Support
Systems, 46(1):61 – 69.
Garfinkel, R., Gopal, R., Tripathi, A., and Yin, F.
(2006). Design of a shopbot and recommender sys-
tem for bundle purchases. Decision Support Systems,
42(3):1974 – 1986.
Kandi, M. M., Yin, S., and Hameurlain, A. (2018). An
integer linear-programming based resource allocation
method for sql-like queries in the cloud. In Proceed-
ings of the 33rd Annual ACM Symposium on Applied
Computing, SAC ’18, page 161–166, New York, NY,
USA. Association for Computing Machinery.
PostgreSQL (2021). Postgresql web page. Last access:
2021-03-12.
Tappedout.net (2019a). A commander deck sample. Last
access: 2019-06-10.
Tappedout.net (2019b). A commander deck sample. Last
access: 2019-06-10.
1
https://www.cardmarket.com/en/Magic
2
https://mypcards.com/
Yu, H., Gao, L., and Lei, Y. (2012). Model and solution for
capacitated facility location problem. In 2012 24th
Chinese Control and Decision Conference (CCDC),
pages 1773–1776.
A Case Study for Minimal Cost Shopping in a Cluster of Online Stores
723