
 
representations are intuitive presentations for these 
conceptual representations where nodes represent 
entities and arcs or edges represent the relationships.  
To facilitate working with the representations, 
entities and relationships have labels; however, the 
labels are merely for the purpose of discussion and 
do not suggest the “meaning” of the node or 
relationship.  Nodes and edges derive their meaning 
strictly from their connections. 
4.2 Graph-based Information Representation 
Graphs, as a mathematical construct, have been 
studied for hundreds of years. More recently, graphs 
have been applied to practical problems involving 
networks, particularly in transportation and 
communication. The key observation is that network 
problems focus not on the things themselves but on the 
nature of the connections between them. The essential 
information in the traveling salesman problem is not 
the destination cities, but the ways in which those 
cities are connected in a transportation network and 
the cost of making a trip between two particular 
cities. As we observed earlier, the interpretive 
frameworks that enable us to operate in terms of 
information emphasize relationships. A graph-based 
representation is the natural choice for expressing 
relationships. (Ebert, 1996) 
 
In a graph-based information representation scheme, 
each node is a labeled end-point that represents a 
single, atomic entity, and each arc represents an 
assertion of an association between two nodes. Arcs are typed so 
that multiple associations may be expressed in a 
single representation. Values may be associated with 
each node to carry information that may be needed 
at other levels of the system (e.g., a string label to be 
displayed to a user) but are treated, insofar as the 
graph representation is concerned, as opaque blocks 
of data. 
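
The scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names Node, Arc, Graph, assert_arc, and targets are hypothetical, as is the sample data. The sketch shows labeled nodes with an opaque value, typed arcs, and node identity that does not depend on the label.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(eq=False)  # compared by identity: the label does not define the node
class Node:
    label: str         # for discussion and display only
    value: Any = None  # opaque payload; the graph never inspects it


@dataclass(frozen=True)
class Arc:
    arc_type: str      # typing lets several associations share one graph
    source: Node
    target: Node


class Graph:
    """Hypothetical container: a set of typed, directed arcs."""

    def __init__(self):
        self.arcs = set()

    def assert_arc(self, arc_type, source, target):
        """Record the assertion that `source` is associated with `target`."""
        arc = Arc(arc_type, source, target)
        self.arcs.add(arc)
        return arc

    def targets(self, source, arc_type):
        """Nodes reached from `source` over arcs of the given type."""
        return {a.target for a in self.arcs
                if a.source is source and a.arc_type == arc_type}


# Illustrative data only.
person = Node("person-1", value=b"display string for a user interface")
city = Node("city-1")
graph = Graph()
graph.assert_arc("lives_in", person, city)
graph.assert_arc("born_in", person, city)  # second association, same two nodes
```

Because the arcs are typed, the two associations between the same pair of nodes remain distinct, and the byte string attached to the first node is carried along without ever being interpreted by the graph code.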
 
Because information is expressed in relationships, 
systems that implement a graph-based information 
representation will be optimized to store and manage 
networks of relationships. Graph theory considers 
directed and undirected arcs. We have found that a 
pair of directed arcs, where one arc points from the 
first node to the second and another points from the 
second to the first, gives us a general construct that 
can be used as either a directed or undirected 
connection. More importantly, this representation 
captures the fact that if we can assert that one object 
has a relationship with another, we also implicitly 
assert that the other object has a reciprocal 
relationship with the first. By making the reciprocal 
relationship explicit, the graph-based representation 
naturally provides back-links that double the 
possible traversal patterns. We call this pair of 
directed arcs a relationship. 
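
A minimal sketch of this construct, assuming a simple in-memory adjacency map, is given here. The class RelationshipGraph, its methods relate and follow, and the arc type names are all hypothetical; the point is only that every assertion stores a forward arc and its reciprocal together.

```python
from collections import defaultdict


class RelationshipGraph:
    """Hypothetical store in which every relationship is a pair of directed arcs."""

    def __init__(self):
        # node -> arc type -> set of nodes reached by arcs of that type
        self._out = defaultdict(lambda: defaultdict(set))

    def relate(self, a, forward_type, b, reverse_type):
        """Assert a relationship: one arc from a to b and the reciprocal arc."""
        self._out[a][forward_type].add(b)
        self._out[b][reverse_type].add(a)

    def follow(self, node, arc_type):
        """Return the nodes reached from `node` over arcs of `arc_type`."""
        return set(self._out.get(node, {}).get(arc_type, ()))


g = RelationshipGraph()

# Directed use: the two arcs carry different types.
g.relate("employee-17", "works_for", "dept-sales", "employs")

# Undirected use: the same type in both directions.
g.relate("city-a", "adjacent_to", "city-b", "adjacent_to")
```

The back-link means a traversal can start from either end: following "employs" from "dept-sales" reaches "employee-17" without any search over the forward arcs.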
 
With the majority of the information residing in the 
networks of relationships, nodes must represent 
single, fine-grained entities. Because any two nodes 
in a graph may be linked by a relationship, a concept 
need only be expressed once and represented by a 
single node. This has the important side effect of 
naturally creating a fully normalized representation. 
 
The notion that nodes represent atomic entities can 
be a difficult concept. In a graph-based information 
representation scheme, each node should represent 
one, atomic thing. In practice, this generally means 
that what would be an object in an object-oriented 
system or a row in a relational system would be a 
network in a graph: The graph representation of, say, 
an employee record would have a node for each field 
in the record and all of those nodes would be 
connected with the node that represents the 
employee record. 
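
The employee-record example can be made concrete with a short sketch. The function record_to_network, the node naming convention, the arc type names, and the sample records are assumptions made for illustration. Each field becomes its own node connected to the record node, and a value shared by two records is represented by a single node.

```python
def record_to_network(record_node, fields):
    """Turn one flat record into a small network of arcs.

    Arcs are (source, arc_type, target) triples. Every field value gets its
    own node, linked to the record node in both directions.
    """
    arcs = set()
    for field_name, value in fields.items():
        value_node = f"{field_name}:{value}"  # one node per distinct value
        arcs.add((record_node, f"has_{field_name}", value_node))
        arcs.add((value_node, f"{field_name}_of", record_node))
    return arcs


# Illustrative records only.
arcs = record_to_network("employee#1", {"name": "Ada", "department": "Sales"})
arcs |= record_to_network("employee#2", {"name": "Grace", "department": "Sales"})

# Collect every node that appears at either end of an arc.
nodes = {node for source, _, target in arcs for node in (source, target)}

# Both employees link to the same department node.
sales_staff = {target for source, arc_type, target in arcs
               if source == "department:Sales" and arc_type == "department_of"}
```

Two records with two fields each yield five nodes rather than six, because "department:Sales" is expressed once and shared, which is the normalization effect described earlier in this section.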
4.3 Performance of Graph-Based Information Representation 
Extracting precise information can be time-consuming 
and expensive when working with 
complex data sets. For example, consider the 
following comparison between a modeling paradigm 
implemented in a graph-based storage system and 
current solutions in a relational database system.  
Administrators using relational database technology 
strive to optimize queries across multiple tables, but 
this often involves iterative cycles for filtering out 
irrelevant information and structuring statements 
that reduce the answer set based on ordered 
sequences. Because of this, relational queries 
through chained data are often limited to four or five 
connection levels. In many cases, a four- or five-degree 
search becomes unmanageable and overly 
time-consuming, and requires additional hardware and 
software. 
 
Queries against a graph database are 
significantly simpler because they traverse data 
that was never de-structured to fit into tables. To a 
large degree, data in a graph follows its natural 
pattern of existence, with relevant information 
related through close association. This pattern 
persists even when the data is committed to disk. 
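
To make the idea of a multi-degree query concrete, the following sketch performs a breadth-first traversal limited to a chosen number of connection levels. It assumes the graph is held as a plain adjacency dictionary; the function name within_degrees and the sample data are hypothetical, and the sketch makes no claim about the performance of any particular storage system.

```python
from collections import deque


def within_degrees(adjacency, start, max_degree):
    """Return {node: degree} for nodes within max_degree hops of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_degree:
            continue  # do not expand past the requested degree
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:  # first visit is the shortest path
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]  # report only the nodes that were reached
    return distance


# Illustrative chain of six nodes with links in both directions.
chain = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d", "f"],
    "f": ["e"],
}
reached = within_degrees(chain, "a", 4)
```

A four-degree search is expressed here as a single traversal that follows existing links, whereas the relational approach described above would chain one join per connection level.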
 
To illustrate, assume a large data set with records 
indicating parent-child relationships but no extended 
INFORMATION-CENTRIC VS. STORAGE/DATA-CENTRIC SYSTEMS