The Design and Implementation of Big Data Analysis System for
Enterprise Economic Operation
Dan Liu*, Yuan Sun and Libin Zhang
School of Big Data and Artificial Intelligence, Dalian University of Finance and Economics, Dalian City,
Liaoning Province, 116622, China
Keywords: Economic Operation of Enterprises, Big Data, Hadoop, Java.
Abstract: To make big data better serve the development of the local economy, the authors have built a
comprehensive data analysis platform for the economic operation of enterprises in their city with the help of
big data technology. The system is a B/S (browser/server) application written in Java. Its development
environment runs on Linux and uses an SSH framework that combines Spring, Spring MVC and Hibernate. System
data is collected, converted, cleaned and aggregated on a Hadoop cluster of five servers. The dynamic time
warping (DTW) algorithm is improved, and a C4.5 decision tree classification algorithm that partitions sets of
time series is used to predict the economic trend of local enterprises more scientifically and reasonably.
From the perspective of society and government, the platform supports overall planning for the economic
operation of regional enterprises, so that big data can better serve local economic operation and
development. The goals are to improve the economic operation monitoring and analysis system, raise the quality
and level of monitoring and analysis, realize data integration and sharing, and establish a basic mechanism
for data classification and collection.
1 INTRODUCTION
The economic operation of enterprises is an important means by which the government and the industrial and
commercial administrative departments manage the local economy, and its proper use helps promote the sound
development of the local enterprise economy. Economic operation is also a very important part of the
operation and management of the enterprise itself: managing it with scientific methods helps the enterprise's
plans and operations achieve sustainable development. In the Internet era, economic forms have become
increasingly complex. Many local governments are aware of this and have begun to attach importance to
building Internet-based environments for monitoring and analyzing economic operation, so as to integrate the
data of the information systems established by various enterprises and monitor them effectively.
However, the information systems of most enterprises do not communicate with each other, and the data of each
enterprise is neither comprehensive nor standardized, so it is difficult to correlate the data of local
economic operation monitoring projects and share its value. In addition, data on enterprises' economic
operation in different markets is generally obtained through field investigation by relevant personnel, and
because such data is too specific it often fails to reflect the development trend of an industry. Poor data
quality in turn lowers the quality of the reports that analyze the economic operation of local enterprises,
which affects the overall development of the local economy. It is therefore advisable to use big data
technology to build a cross-departmental, cross-unit big data warehouse platform that effectively integrates
all kinds of information and helps the industrial and commercial departments analyze economic operation
(Zhu, 2021).
On the basis of the above analysis, the authors consider that a data analysis system for the economic
operation of local enterprises should be developed on big data technology. The system is a B/S
(browser/server) application written in Java. Its development environment runs on Linux and uses an SSH
framework that combines Spring, Spring MVC and Hibernate. System data is collected, converted, cleaned and
aggregated on a Hadoop cluster of five servers. The system aims to give users a rigorous and efficient
decision-making platform, designed from the perspective of the staff of local industrial and commercial
administration departments. A data warehouse and a self-service business data analysis platform that
integrate the economic operation data of various enterprises make data analysis much more convenient for
local government and industrial and commercial managers, reduce the workload of statistical staff and improve
the efficiency of local economic management.
Liu, D., Sun, Y. and Zhang, L.
The Design and Implementation of Big Data Analysis System for Enterprise Economic Operation.
DOI: 10.5220/0011751800003607
In Proceedings of the 1st International Conference on Public Management, Digital Economy and Internet Technology (ICPDI 2022), pages 551-554
ISBN: 978-989-758-620-0
Copyright © 2023 by SCITEPRESS Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)
2 KEY TECHNOLOGIES
2.1 B/S Structure
The big data analysis system for enterprise economic operation designed in this paper adopts the B/S
(browser/server) structure, which is widely used in web application development. In the B/S structure, the
client is the browser, while the server runs the core logic. B/S applications are mostly deployed over wide
area networks, and a client only needs an operating system and a browser, so this structure suits
applications with a broad user base (Li, 2019).
2.2 Hadoop Ecology
Hadoop is a distributed system infrastructure developed by the Apache Software Foundation. It is designed
mainly to solve the problems of storing, analyzing and computing over massive data in the era of big data.
The Hadoop ecosystem consists mainly of the MapReduce computing component, the YARN resource scheduling
component, the HDFS data storage component and other auxiliary tools. A Hadoop cluster covers the layers of
the big data technology stack, including the business model layer, task scheduling layer, data computing
layer, resource management layer, data storage layer and data transmission layer (Wang, 2015).
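To make the role of the MapReduce component more concrete, the following is a minimal single-process sketch of the map, shuffle and reduce phases in plain Python. It is not Hadoop code and is not taken from the system described here. The record layout (enterprise id, industry type, registered capital) and all function names are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (industry, (1, capital)) pair per enterprise record."""
    for _enterprise_id, industry, capital in records:
        yield industry, (1, capital)


def shuffle(pairs):
    """Group mapped values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the enterprise count and the registered capital for each industry."""
    return {
        key: (sum(count for count, _ in values), sum(cap for _, cap in values))
        for key, values in groups.items()
    }


def count_by_industry(records):
    """Run the three phases in sequence on an in-memory list of records."""
    return reduce_phase(shuffle(map_phase(records)))


# Hypothetical sample records, for illustration only.
sample = [
    ("e1", "manufacturing", 100.0),
    ("e2", "retail", 50.0),
    ("e3", "manufacturing", 250.0),
]
summary = count_by_industry(sample)
```

On a real cluster the map and reduce functions would run in parallel on different nodes and the shuffle would be handled by the framework. The sketch only shows how the work is divided.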
2.3 Classification and Prediction Algorithm for Data Mining
2.3.1 K-nearest Neighbor Algorithm
The K-nearest neighbor algorithm divides the data set into several categories and computes representative
points for each category. For a point x to be predicted, the distance between x and each representative point
is computed, and the minimum distance to the representative points of a category is taken as the value of
that category.
Assuming that the number of categories is n and the number of representative points of each category is m,
the classification function of category i is:
$$g_i(x) = \min_{k} \lVert x - x_i^k \rVert, \quad k = 1, 2, \ldots, m \tag{2}$$
where the subscript i in $x_i^k$ denotes the i-th of the n categories and the superscript k denotes the k-th
of its m representative points. The category that occurs most often among the K smallest distances from the
predicted point x is the category of the predicted point, and K = 1 gives the nearest neighbor method.
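The classification rule above can be sketched in a few lines of Python. This is an illustrative sketch, not the system's implementation: Euclidean distance is an assumption (the text does not name the metric), and the class labels and points in the usage example are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def class_distance(x, representatives):
    """Minimum distance from x to the representative points of one class."""
    return min(euclidean(x, r) for r in representatives)


def knn_predict(x, reps_by_class, k=1):
    """Vote among the k representative points closest to x.

    reps_by_class maps a class label to its list of representative points.
    With k=1 this reduces to the nearest neighbor method.
    """
    distances = sorted(
        (euclidean(x, r), label)
        for label, reps in reps_by_class.items()
        for r in reps
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical representative points for two classes.
reps = {
    "growing": [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)],
    "declining": [(6.0, 6.0), (7.0, 6.5), (6.5, 7.0)],
}
label = knn_predict((6.4, 6.3), reps, k=3)
```

Ties between classes are broken by whichever label was counted first, which is a simplification. A production version would need an explicit tie-breaking rule.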
2.3.2 Decision Tree Algorithm
The decision tree algorithm is an inductive method that derives classification rules, in the form of a tree,
from a set of unordered samples. It is a top-down recursive algorithm that constructs the relationship
between categories and attributes in order to predict unknown classes. The current mainstream decision tree
algorithms include C4.5, ID3 and CART. This paper focuses on the C4.5 decision tree algorithm, which is an
improvement on ID3. Constructing a C4.5 decision tree takes three inputs: the sample set S, the class
attribute C and the candidate attribute set V. The procedure is as follows:
1. Create a node N.
2. If all samples in S belong to the same class of C, mark N as a leaf with that class; otherwise go to step 3.
3. If V is empty, mark N as a leaf with the most frequent class in S; otherwise go to step 4.
4. Compute the information gain ratio of each attribute in V, select the attribute v with the highest ratio,
and label N with v.
5. For each value of v, take the subset of S with that value. If the subset is empty, add a leaf node;
otherwise recurse on the subset with the attribute set V - {v} and the class attribute C.
6. The results of the recursion complete the construction (Mao, 2018).
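The construction procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the system's code: it handles only categorical attributes, omits the pruning and the continuous-attribute and missing-value handling of full C4.5, and represents samples as dictionaries. The attribute names and labels in the usage example are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Information gain of splitting on attr, divided by the split information."""
    total = len(rows)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    split_info = -sum(
        len(s) / total * math.log2(len(s) / total) for s in subsets.values()
    )
    if split_info == 0:  # the attribute takes a single value: nothing to split
        return 0.0
    return (entropy(labels) - remainder) / split_info


def build_tree(rows, labels, attrs):
    """Return a class label (leaf) or a tuple (attribute, {value: subtree})."""
    if len(set(labels)) == 1:  # all samples share one class
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    if not attrs:  # no candidate attributes left
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    if gain_ratio(rows, labels, best) == 0:  # no attribute separates the classes
        return majority
    remaining = [a for a in attrs if a != best]
    children = {}
    # Only values present in the data are visited, so no subset is empty here.
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining
        )
    return (best, children)


def predict(tree, row, default=None):
    """Walk the tree for one sample; return default for an unseen value."""
    while isinstance(tree, tuple):
        attr, children = tree
        if row[attr] not in children:
            return default
        tree = children[row[attr]]
    return tree


# Hypothetical training samples.
rows = [
    {"scale": "large", "industry": "manufacturing"},
    {"scale": "large", "industry": "retail"},
    {"scale": "small", "industry": "manufacturing"},
    {"scale": "small", "industry": "retail"},
]
labels = ["up", "up", "down", "down"]
tree = build_tree(rows, labels, ["scale", "industry"])
```

In this toy data set the class depends only on `scale`, so the tree splits once on that attribute and ignores `industry`.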
2.4 Development Environment
The development environment of the big data analysis system for enterprise economic operation has two parts:
the Hadoop big data cluster and the Java web application environment. According to the required amount of
data, this paper builds a Hadoop cluster composed of one primary node (the NameNode) and four secondary nodes
(DataNodes). The cluster stores massive data in HDFS distributed storage. The code for configuring the HDFS
components in the cluster is shown in Figure 1. Functional components such as ZooKeeper 3.5.5 and Flume 1.9.0
are then installed and deployed on all five nodes, which completes the initial construction of the Hadoop
cluster. The cluster runs on five machines with Linux installed; this paper uses the CentOS 7.8 server
release. The JavaWeb application is developed in IDEA 2021.1.3 with JDK 1.8, the development language is
Java, and Apache Tomcat 9.0 is used as the application server. The code for checking whether the JDK is
installed successfully is shown in Figure 2. The system follows the MVC pattern and uses the SSH framework
(Spring + Spring MVC + Hibernate), with MySQL 8.0.28 for data management.
Figure 1: hdfs-site.xml configuration code (Original).
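The content of Figure 1 is not reproduced in this text. As a rough illustration of what an hdfs-site.xml file contains, the sketch below generates one from a dictionary of properties. The property names are standard HDFS settings, but every value (replication factor, directory paths) is an assumption and not the configuration used by the authors.

```python
import xml.etree.ElementTree as ET

# Assumed values for a cluster with one NameNode and four DataNodes.
# The directory paths are placeholders, not the authors' settings.
HDFS_PROPERTIES = {
    "dfs.replication": "3",
    "dfs.namenode.name.dir": "/data/hadoop/namenode",
    "dfs.datanode.data.dir": "/data/hadoop/datanode",
}


def build_hdfs_site(properties):
    """Render a <configuration> document with one <property> per entry."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = build_hdfs_site(HDFS_PROPERTIES)
```

The resulting string has the `<configuration><property><name>...</name><value>...</value></property>...</configuration>` layout that Hadoop expects in hdfs-site.xml.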
Figure 2: Code for detecting whether JDK is successfully installed (Original).
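The content of Figure 2 is likewise not reproduced here. A common way to check a JDK installation is to run `java -version` and inspect the banner it prints. The sketch below wraps that check in Python; the function names are hypothetical, and whether the authors used this exact command is an assumption.

```python
import re
import shutil
import subprocess


def parse_java_version(output):
    """Extract the quoted version string from `java -version` output, or None."""
    match = re.search(r'version "([^"]+)"', output)
    return match.group(1) if match else None


def installed_jdk_version():
    """Return the installed Java version, or None if `java` is not on PATH."""
    if shutil.which("java") is None:
        return None
    # `java -version` writes its banner to stderr rather than stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_version(result.stderr + result.stdout)
```

For JDK 1.8 the banner contains a string such as `java version "1.8.0_301"`, from which `parse_java_version` returns `1.8.0_301`.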
3 FUNCTION REALIZATION
3.1 Basic Client
The data warehouse in the data classification function module is organized around the fact data of market
entity registration with the industrial and commercial administration, along five dimensions: time, region,
enterprise type, industry type and enterprise scale.
In the business analysis module, this paper builds indicator models along single dimensions from the existing
data, in order to show the industrial and commercial management departments the state of local economic
development clearly. The module also mines the patterns of economic development and uses a time-series-based
C4.5 algorithm to forecast the local economy, which helps the industrial and commercial administration
departments adjust the layout of the market economy in time. The underlying data for intelligent analysis is
the historical data accumulated by the local business administration departments, which has the
characteristics of time series. The inputs of the time series decision tree are the initial data set D and
the number of candidate sequence pairs M. If every x in D has the same class label y(x), the node is a leaf
node. Otherwise, sequence pairs are repeatedly drawn from D and added to the candidate set S until S contains
M pairs. The information gain and gain ratio of each candidate pair are then calculated in turn, the pair
with the largest gain ratio is used to split the data into child nodes, and the decision tree is constructed
recursively. The formula for the information gain ratio is shown in Formula 1.
$$\mathrm{InfoGainRatio}(D, s) = \frac{\mathrm{InfoGain}(D, s)}{\mathrm{SplitInfo}(D, s)} \tag{1}$$
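The following sketch illustrates how a candidate sequence pair could be scored with the gain ratio. It rests on several assumptions that the text does not state: a pair splits the data by sending each series to whichever of the two reference sequences it is closer to, closeness is measured with the classic DTW distance (the improved DTW variant is not specified and is not reproduced here), and the split information is the entropy of the two branch sizes. All names and data are hypothetical.

```python
import math
from collections import Counter


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def split_by_pair(series, pair):
    """Assign each series index to the reference sequence it is closer to."""
    left, right = [], []
    for idx, s in enumerate(series):
        closer_to_first = dtw_distance(s, pair[0]) <= dtw_distance(s, pair[1])
        (left if closer_to_first else right).append(idx)
    return left, right


def info_gain_ratio(series, labels, pair):
    """Gain ratio of the two-way split induced by one candidate pair."""
    left, right = split_by_pair(series, pair)
    if not left or not right:  # degenerate split carries no information
        return 0.0
    total = len(labels)
    parts = [[labels[i] for i in left], [labels[i] for i in right]]
    gain = entropy(labels) - sum(len(p) / total * entropy(p) for p in parts)
    split_info = -sum(len(p) / total * math.log2(len(p) / total) for p in parts)
    return gain / split_info


def best_pair(series, labels, candidate_pairs):
    """Pick the candidate pair with the largest gain ratio."""
    return max(candidate_pairs, key=lambda p: info_gain_ratio(series, labels, p))


# Hypothetical rising and falling series of different lengths.
series = [[1, 2, 3, 4], [1, 2, 2, 3, 4], [4, 3, 2, 1], [4, 3, 3, 2, 1]]
labels = ["up", "up", "down", "down"]
pair = (series[0], series[2])
ratio = info_gain_ratio(series, labels, pair)
```

DTW is what allows the four-point and five-point series to be compared directly: a repeated value is absorbed by the warping path at no cost.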
In the intelligent report generation and export function module, users select the time period, content and
form they need, and the system generates the data report automatically. A report includes data tables and
various charts generated with ECharts. A chart is displayed by loading the ECharts plug-in and the data into
the web page generated for the report. The chart styles specific to this system are customized through the
ECharts API by configuring the corresponding option object. During development, in addition to setting the
attributes of option, the setOption function is called to render the chart (Chen, 2019).
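As an illustration of the option object mentioned above, the sketch below builds one on the server side and serializes it to JSON, which a report page could then pass to the chart's setOption call. The keys follow the documented ECharts option structure, but the helper function, the chart title and the data values are hypothetical, and producing the option on the server is an assumption about the design.

```python
import json


def build_line_option(title, periods, values, series_name):
    """Build a dict that mirrors the ECharts option structure for a line chart."""
    if len(periods) != len(values):
        raise ValueError("periods and values must have the same length")
    return {
        "title": {"text": title},
        "tooltip": {"trigger": "axis"},
        "xAxis": {"type": "category", "data": list(periods)},
        "yAxis": {"type": "value"},
        "series": [{"name": series_name, "type": "line", "data": list(values)}],
    }


# Hypothetical figures, for illustration only.
option = build_line_option(
    "Newly registered enterprises",
    ["2019", "2020", "2021"],
    [120, 135, 160],
    "Enterprises",
)
option_json = json.dumps(option)
```

Keeping the option as plain data makes it easy to let users choose the period and chart form before the page renders it.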
3.2 Management Client
In the data preparation and uploading function module, the administrator reviews the collected economic
operation data of local enterprises and selects the appropriate data to upload to the system. The data is
provided by the relevant departments of the Ministry of Industry and Commerce and gathered by relevant
personnel through on-site enterprise investigation, and it includes the registration data and annual
inspection data of local enterprises. Administrators enter the coding of each data item. For example, the
industry type code field is named HYML, its type is varchar, and its length is 100. Its data source is the
industry category registration form, and when the field is referenced in the industry category dimension
table it must not be null.
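The field rules in this example (name HYML, type varchar, length 100, not null) can be expressed as a small validation sketch. The `FieldSpec` class and `validate` function are hypothetical names introduced for illustration; the system itself would enforce these rules in its Java and MySQL layers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Description of one uploaded data field."""
    name: str
    sql_type: str
    max_length: int
    nullable: bool
    source: str


# The industry type code field as described in the text.
INDUSTRY_TYPE_CODE = FieldSpec(
    name="HYML",
    sql_type="varchar",
    max_length=100,
    nullable=False,
    source="industry category registration form",
)


def validate(spec, value):
    """Return a list of problems; an empty list means the value is acceptable."""
    problems = []
    if value is None:
        if not spec.nullable:
            problems.append(f"{spec.name} must not be null")
        return problems
    if not isinstance(value, str):
        problems.append(f"{spec.name} must be a string")
    elif len(value) > spec.max_length:
        problems.append(f"{spec.name} exceeds {spec.max_length} characters")
    return problems
```

Checking values before upload keeps invalid codes out of the dimension table, where a null industry type would break the joins.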
In the model building and deployment function module, the administrator can adjust the attributes used for
predictive modeling. The key model of this paper is the regional economic forecasting model. Its attributes
include industry division, industry type, enterprise type, enterprise scale, time and region. The indicators
under each attribute are the number of enterprises and the amount of registered capital, and the predicted
target attribute is the development trend of the local economy.
In the user management and data maintenance function module, administrators can add, delete and modify the
information and permissions of basic users. Because the volume of system data is large, administrators also
need to monitor and maintain the Hadoop cluster data of each module.
4 CONCLUSION
This research on a big data analysis system for enterprise economic operation is aimed mainly at the
underlying business data of local industrial and commercial systems. The system uses the C4.5 decision tree
algorithm, together with data mining techniques and data visualization tools, to predict and analyze the
economic operation and development of local enterprises.
Owing to the authors' limited ability, time and working environment, the current research has considerable
limitations and needs further improvement by other researchers. First, the data sources of the system are not
comprehensive enough, and fusion with data from other government departments is lacking. Second, the
underlying database is very large: as much as 70 GB of data remains after processing, and the performance of
the algorithm and of the Hadoop cluster servers is limited, so further optimization is needed to reduce data
processing time.
REFERENCES
Chen Manju. The Application of Statistical Analysis in Enterprise Economic Operation Analysis. Economic
Forum. 2019.04.
Li Ling. The Research and Design of Regional Economic Trend Prediction and Analysis System Based on
Industrial and Commercial Data. Guizhou University. 2019.04.
Mao Hongwei, Ruan Bohu. The Monitoring and Research of Big Data Economic Operation in Zhuji City. MIN YIN KE
JI. 2018.02.
Wang Xiaoyong. The Design and Implementation of Jiangxi Industrial Economic Operation Analysis and Forecast
System. Jiangxi University of Finance and Economics. 2015.12.
Zhu Tao. The Monitoring and Analysis System of Municipal Private Economy Based on Big Data. Science and
Technology. 2021.04.