
 
databases. Transactions are undone only at the site 
where the fault occurs and are resubmitted 
later. This strategy always implies a period of 
latency before the data become available (replication 
latency). Update delays will always occur in 
databases that are inaccessible, but the other 
databases will be updated once the latency period has elapsed.  
Nevertheless, a measure of the latency can be 
used by the application to limit risks for some 
transactions. For example, an application can change 
its behavior using an estimate of latency as an 
advisory. If the latency is above a pre-defined 
threshold, the application can reduce the maximum value 
allowed for a loan or withdrawal. 
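
The advisory use of latency described above can be sketched as follows. This is a minimal illustration, not code from the USP systems: the function name `advised_limit`, the 60-second threshold and the halving rule are all assumptions chosen for the example.

```python
from decimal import Decimal

# Hypothetical values: the text does not specify a threshold or a reduction rule.
LATENCY_THRESHOLD_SECONDS = 60.0
REDUCTION_FACTOR = Decimal("0.5")


def advised_limit(normal_limit: Decimal, estimated_latency_seconds: float) -> Decimal:
    """Return the maximum loan or withdrawal value the application should allow.

    The latency estimate is used only as an advisory: above the threshold the
    local data may be stale, so the application lowers its exposure.
    """
    if estimated_latency_seconds > LATENCY_THRESHOLD_SECONDS:
        return normal_limit * REDUCTION_FACTOR
    return normal_limit
```

With these assumed values, a limit of 1000 stays unchanged while the estimated latency is at or below 60 seconds and is halved above it. A real application would choose the threshold and the reduction rule per transaction type.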
3 IMPLEMENTATION AT THE 
UNIVERSITY OF SÃO PAULO 
Founded in 1934, the University of São Paulo (USP) 
is the largest institution of higher education and 
research in Brazil, and the third largest in Latin 
America. It has 746 courses taught in its teaching and 
research units, of which 202 are undergraduate 
courses attended by approximately 46,000 students, 
and 487 are graduate courses (including 280 for 
master's and 264 for doctoral degrees). Its teaching 
units are distributed among eight campuses 
in six cities.  
To support its activities, USP has a complex 
administrative infrastructure, most of which is 
centralized in the city of São Paulo, but operations 
are widely decentralized: each of the teaching units 
has its own administrative office and there are regional 
headquarters on each campus. 
To cope with this complexity, each business area 
has its workflows implemented in control systems 
specific to that area (here called “applications”). 
However, since many of the flows pass through 
more than one area, data integration is 
essential. 
The corporate data model of the University of 
São Paulo was conceived to support this integration 
and distribution of data and applications. This 
logical model covers all business areas of the University 
(academic control for undergraduate and master's 
degrees, Finances, Human Resources etc.) and it is 
structured as a single relational model for the whole 
institution, with an extensive number of entities 
and relationships. The physical implementation, due 
to performance and availability considerations, is 
distributed among several databases. 
At the time of the initial implementation, the 
hardware available at the University could not 
satisfactorily run all these databases on a 
single server. The choice made, therefore, was to 
distribute the databases among four servers, each one 
concentrating on one of the main business 
areas. 
This scenario, if implemented in the traditional 
“n-phase commit” way, would have entailed high 
code complexity and transactional cost for 
tables shared by the applications. In 
addition to repeating the same transaction in 
several databases to guarantee the 
referential integrity of the model, each application 
would have to establish connections with each one 
of these databases for read or write 
operations. 
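
The burden described above can be illustrated with a small sketch in which one application repeats the same insert in every database and commits only if all of them accept it. It is a simplified stand-in, not the protocol evaluated at USP: the in-memory SQLite databases replace the real servers, the `person` table layout and both function names are assumptions, and a real atomic commit protocol would also need a durable prepare step to survive a failure during the commit phase.

```python
import sqlite3


def open_databases(names):
    """Open one in-memory database per name (stand-ins for the real servers)."""
    connections = {}
    for name in names:
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
        conn.commit()
        connections[name] = conn
    return connections


def insert_person_everywhere(connections, person_id, name):
    """Repeat the same insert in every database; commit only if all succeed."""
    try:
        # Phase 1: run the statement in every database without committing.
        for conn in connections.values():
            conn.execute(
                "INSERT INTO person (id, name) VALUES (?, ?)", (person_id, name)
            )
    except sqlite3.Error:
        # Any failure undoes the uncommitted work in all databases.
        for conn in connections.values():
            conn.rollback()
        return False
    # Phase 2: commit everywhere.
    for conn in connections.values():
        conn.commit()
    return True
```

Even in this reduced form, the application holds one connection per database and a single unavailable or inconsistent database blocks the write everywhere, which is the cost the text refers to.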
The physical implementation followed the 
premises and definitions below: 
3.1  Global Data Model 
The downsizing process, which motivated the 
migration of systems from mainframe to client-server, started 
from the premise of a single, integrated logical data 
model, internally named the “global data model”. This 
logical abstraction was the answer to integration 
problems with the old mainframe data structure (separate 
databases, one for each business area), such as 
different ids for the same person, addresses or personal data 
updated in only one database and incorrect in the others, 
difficulty in identifying the same person's data in each 
database, and so on. 
The global data model contains a single logical 
abstraction for each concept used in corporate 
systems, regardless of which application uses it. 
For example, all personal data, used by all 
applications, are stored in a “PERSON” table. 
People’s roles, such as student, graduate, professor 
or faculty staff, are stored in another set of tables. All 
these tables (persons, roles, relationships) constitute 
a sub-model named “PERSON”. Organizational 
unit information is stored in a sub-model named 
“STRUCTURE”. Another example is application 
access, which is centralized and whose control data is 
represented in the sub-model “USER”.  
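
A reduced version of the PERSON sub-model might look like the sketch below. Only the name PERSON comes from the text; the `person_role` table, every column name, the list of role values and the helper functions are assumptions made for illustration, and SQLite is used only so that the example runs on its own.

```python
import sqlite3

# Hypothetical table and column names; the text names only the PERSON sub-model.
SCHEMA = """
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE person_role (
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    role      TEXT NOT NULL
              CHECK (role IN ('student', 'graduate', 'professor', 'staff')),
    PRIMARY KEY (person_id, role)
);
"""


def create_person_submodel() -> sqlite3.Connection:
    """Create an in-memory database holding the sketched PERSON sub-model."""
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless asked to.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def roles_of(conn: sqlite3.Connection, person_id: int) -> list[str]:
    """Return all roles recorded for one person, sorted alphabetically."""
    rows = conn.execute(
        "SELECT role FROM person_role WHERE person_id = ? ORDER BY role",
        (person_id,),
    )
    return [role for (role,) in rows]
```

Keeping roles in a separate table lets one person be, for instance, both a student and a staff member under a single identifier, which addresses the "different ids for the same person" problem of the old mainframe structure.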
Although the logical abstraction is unique for each 
concept (PERSON, for example), its physical 
implementation is distributed among several 
databases. Every database that requires PERSON for 
reading data, consistency or referential integrity has a 
replica of the PERSON table. PERSON is primary in one 
and only one database (see 3.5.1). Currently there 
are PERSON table replicas in 28 databases. 
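
The single-primary arrangement can be sketched as below: writes enter only at the primary, and each replica receives them when it is reachable, with updates for an inaccessible replica kept for later resubmission. This is an in-memory illustration under stated assumptions, not the replication mechanism actually used at USP; the class names `Replica` and `PrimaryPerson`, their fields and the per-replica queue are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A database holding a copy of PERSON (hypothetical stand-in)."""
    name: str
    accessible: bool = True
    rows: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # updates awaiting resubmission


@dataclass
class PrimaryPerson:
    """The single database where PERSON is primary; all writes enter here."""
    rows: dict = field(default_factory=dict)
    replicas: list = field(default_factory=list)

    def upsert(self, person_id: int, name: str) -> None:
        """Write to the primary, then queue and try to deliver the update."""
        self.rows[person_id] = name
        for replica in self.replicas:
            replica.pending.append((person_id, name))
        self.propagate()

    def propagate(self) -> None:
        """Apply queued updates, in order, to every replica that is reachable."""
        for replica in self.replicas:
            if not replica.accessible:
                continue  # updates stay queued until the site is back
            for person_id, name in replica.pending:
                replica.rows[person_id] = name
            replica.pending.clear()
```

In this sketch the length of a replica's `pending` list is a rough indicator of how far that replica lags behind the primary.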
ICEIS 2006 - DATABASES AND INFORMATION SYSTEMS INTEGRATION