databases. Transactions are undone only at the site
where the fault occurs and are resubmitted later.
This strategy always implies a period of latency
before the data become available everywhere
(replication latency). Updates will always be delayed
at databases that are inaccessible, but the other
databases will be updated once the latency period
has elapsed. Nevertheless, the application can use a
measure of this latency to limit the risks of some
transactions. For example, an application can adapt
its behavior by treating an estimate of the latency as
an advisory: if the latency is above a predefined
threshold, the application can lower the amount
allowed for a loan or withdrawal.
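The latency advisory described above can be sketched as follows. This is a minimal illustration built on assumptions that do not come from the paper: latency is estimated with a heartbeat timestamp that the primary database writes periodically and that replicates like any other row, so the age of the newest heartbeat seen locally approximates the replication latency. The threshold, the limits (`LATENCY_THRESHOLD`, `NORMAL_LIMIT`, `REDUCED_LIMIT`) and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy values: the paper does not specify a threshold or limits.
LATENCY_THRESHOLD = timedelta(minutes=5)
NORMAL_LIMIT = 10_000.00
REDUCED_LIMIT = 1_000.00


def estimate_latency(last_heartbeat: datetime, now: datetime) -> timedelta:
    """Approximate replication latency as the age of the newest heartbeat
    (assumed to be written periodically at the primary) seen on this replica."""
    return now - last_heartbeat


def allowed_amount(requested: float, last_heartbeat: datetime, now: datetime) -> float:
    """Cap a loan or withdrawal when the latency estimate exceeds the threshold."""
    latency = estimate_latency(last_heartbeat, now)
    limit = REDUCED_LIMIT if latency > LATENCY_THRESHOLD else NORMAL_LIMIT
    return min(requested, limit)


now = datetime.now(timezone.utc)
print(allowed_amount(5_000.0, now - timedelta(seconds=30), now))  # 5000.0
print(allowed_amount(5_000.0, now - timedelta(minutes=20), now))  # 1000.0
```

The application keeps working when replicas lag; it only becomes more conservative, which is the trade-off the paragraph describes.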
3 IMPLEMENTATION AT THE UNIVERSITY OF SÃO PAULO
Founded in 1934, the University of São Paulo (USP)
is the largest institution of higher education and
research in Brazil and the third largest in Latin
America. It offers 746 courses in its teaching and
research units, 202 of them undergraduate courses
attended by approximately 46,000 students and 487
graduate courses (including 280 master's and 264
doctoral programs). Its teaching units are distributed
among eight campuses in six cities.
To support its activities, USP has a complex
administrative infrastructure, most of which is
centralized in the city of São Paulo, but its operations
are widely decentralized: each teaching unit has its
own administrative office, and each campus hosts a
regional headquarters.
To cope with this complexity, each business area
has its workflows implemented in its own control
systems (here called “applications”). However, since
many of these workflows cross more than one area,
data integration is essential.
The corporate data model of the University of
São Paulo was conceived to support this integration
and the distribution of data and applications. This
logical model covers all business areas of the
University (academic control of undergraduate and
master's programs, finance, human resources, etc.)
and is structured as a single relational model for the
whole institution, with a large number of entities
and relationships. For performance and availability
reasons, its physical implementation is distributed
among several databases.
At the time of the initial implementation, the
hardware available at the University could not run
all these databases satisfactorily on a single server.
The databases were therefore distributed among 4
servers, each dedicated to one of the main business
areas.
Implemented in the traditional “n-phase commit”
way, this scenario would have entailed high code
complexity and transactional cost for the tables
shared by the applications: besides repeating the same
transaction in several databases to guarantee the
referential integrity of the model, each application
would have to open connections to every one of these
databases for read and write operations.
The physical implementation followed the
premises and definitions below:
3.1 Global Data Model
The downsizing process that drove the migration of
systems from the mainframe to a client-server
architecture started from the premise of a single,
integrated logical data model, internally named the
“global data model”. This logical abstraction was the
answer to the integration problems of the old
mainframe data structure (separate databases, one
for each business area), such as different identifiers
for the same person, addresses or personal data
updated in only one database and left incorrect in
the others, and difficulty in identifying the same
person's data in each database, among others.
The global data model contains a single logical
abstraction for each concept used in the corporate
systems, regardless of which application uses it.
For example, all personal data, used by all
applications, is stored in a “PERSON” table.
People’s roles, such as student, graduate, professor
or staff member, are stored in another set of tables.
All these tables (persons, roles, relationships)
constitute a sub-model named “PERSON”.
Organizational unit information is stored in a
sub-model named “STRUCTURE”. Another example
is application access, which is centralized and whose
control data is represented in the sub-model “USER”.
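A heavily simplified version of the PERSON sub-model can be sketched with Python's built-in `sqlite3` module. The table and column names (`person`, `person_role`, `person_id`, `role`) are hypothetical, and the real sub-model has many more entities and attributes. The sketch shows the idea behind the global model: a person has a single identifier no matter how many roles they hold, and a role cannot reference a person that does not exist.

```python
import sqlite3

# Hypothetical, heavily simplified tables for the PERSON sub-model.
SCHEMA = """
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE person_role (
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    role      TEXT NOT NULL
              CHECK (role IN ('student', 'graduate', 'professor', 'staff')),
    PRIMARY KEY (person_id, role)
);
"""


def build_person_submodel() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


conn = build_person_submodel()
conn.execute("INSERT INTO person VALUES (1, 'Maria Silva')")
# One person, one identifier, several roles.
conn.executemany("INSERT INTO person_role VALUES (?, ?)",
                 [(1, "graduate"), (1, "professor")])
roles = conn.execute(
    "SELECT p.name, r.role FROM person p JOIN person_role r USING (person_id) "
    "ORDER BY r.role").fetchall()
print(roles)  # [('Maria Silva', 'graduate'), ('Maria Silva', 'professor')]
```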
Although the logical abstraction of each concept
(PERSON, for example) is unique, its physical
implementation is distributed among several
databases. Every database that needs PERSON for
reading data, for consistency or for referential
integrity holds a replica of the PERSON table.
PERSON is primary in one and only one database
(see 3.5.1). Currently, there are PERSON table
replicas in 28 databases.
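The primary-copy arrangement, combined with the per-site deferral of updates at unreachable databases described earlier, can be sketched as below. This is an in-memory illustration under stated assumptions: the class names, the per-replica queues and the site names `finance` and `hr` are hypothetical, and the paper does not say here which replication mechanism USP uses. Writes are accepted only at the primary, and each replica has its own queue, so a failure at one site leaves only that site's changes pending.

```python
from collections import deque


class Replica:
    """A hypothetical database holding a copy of PERSON; applications only read it."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.rows = {}

    def apply(self, key, row):
        if not self.online:
            raise ConnectionError(f"{self.name} unreachable")
        self.rows[key] = row


class PrimaryCopy:
    """Sketch of primary-copy replication: writes only at the primary,
    asynchronous propagation, and retry limited to the failed site."""

    def __init__(self, replicas):
        self.rows = {}                                       # the single primary copy
        self.replicas = replicas
        self.pending = {r.name: deque() for r in replicas}   # one queue per site

    def write(self, key, row):
        self.rows[key] = row             # commit locally at the primary only
        for queue in self.pending.values():
            queue.append((key, row))     # propagation happens later

    def propagate(self):
        for replica in self.replicas:
            queue = self.pending[replica.name]
            while queue:
                key, row = queue[0]
                try:
                    replica.apply(key, row)
                except ConnectionError:
                    break                # keep the change queued for this site only
                queue.popleft()


sites = [Replica("finance"), Replica("hr")]  # hypothetical site names
person = PrimaryCopy(sites)
sites[1].online = False
person.write(1, {"name": "Maria Silva"})
person.propagate()    # finance is updated; the change stays queued for hr
sites[1].online = True
person.propagate()    # hr catches up once it is reachable again
print(sites[1].rows)  # {1: {'name': 'Maria Silva'}}
```

The time a change waits in a queue is exactly the replication latency discussed at the start of this section.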
ICEIS 2006 - DATABASES AND INFORMATION SYSTEMS INTEGRATION
324