SNAP: Scalable Networkable ABM Platform for the Social Sciences

Christopher M. Conway

IÉSEG, School of Management, 3 Rue de la Digue, Lille, France

Keywords: Agent-Based Modeling, Distributed Systems, Computing for Social Sciences.

Abstract: Agent-based models (ABMs), although increasingly useful and widespread, are underused in social science. I show that extant “user-friendly” platforms are not well suited for social science research, while platforms that would support such research are not easy to use. I outline requirements for a sufficiently powerful, easy-to-use system, which requires no programming skills on the part of the user. I explain the design and development of an ABM platform which features a GUI and a menu of commonly desired agents, statistics, and visualizations. This system is robust enough for social science research; it is portable, flexible, and customizable. Users will have access to pre-designed complex and recursive agents, running distributed across all available processors, as well as user-selected geometries and time clocks. Progress and future work are discussed.

1 INTRODUCTION

Agent-based modeling (ABM) has come a long way

from its humble beginnings, and has proved its worth

in a number of fields, from environmental sciences to

genetics and epidemiology. ABM can provide

preliminary results to research questions which

cannot be explored directly; it can eliminate

inappropriate solutions; it can illuminate emergent

behavior; and it can simplify complex environments.

Perhaps most usefully, ethical considerations can be set aside or resolved by experimenting on “agents” before testing with live subjects, whether those subjects are the inhabitants of a pond ecosystem or corporate employees (Gilbert, 2007).

Because the social science field considers the

behavior of human actors, ethical considerations are

in play almost immediately, which constrains

research. Furthermore, experiments are often difficult to replicate: human subjects are prone to biases and research threats (such as social desirability, hypothesis guessing, or experimental interference) which may distort their reactions to a study. ABMs are immune to these threats (Gilbert, 2007). Thus, ABMs enable social science

researchers to do considerable preliminary work,

eliminating dead ends and ethical traps before the first

subject is contacted.

Author ORCID: https://orcid.org/0000-0003-0108-9177

However, ABMs are considerably underused in

social science (Conte & Paolucci, 2014; Hughes et al.,

2012; Jackson et al., 2017). Seeing an opportunity, I

began to search for social science research,

particularly in applied social psychology, which was

well adapted to expansion with ABM. However, it

quickly became clear that the existing platforms are either not well suited to social science research or difficult to use. What was needed was an ABM platform that is well adapted to social science research.

We know from decades of research with the

Technology Acceptance Model (TAM) (Davis, 1989)

that, in order to be adopted, technology should be

perceived as both easy to use and useful. Of currently

available ABM platforms, those that are easy to use

are not well suited to social science research, and

those that support social science research are not easy

to use. This contention is supported by papers

reviewing ABM platforms (Abar et al., 2017).

As an experienced software engineer now

working in the social sciences (MIS), I am well

positioned to create an ABM for social science

research. In this paper, I will first provide the

requirements for a system which would be easy to use

and useful for research in social science, and briefly

examine major current platforms in light of those requirements. I will then present a design for a system which I believe will meet all the requirements. Finally, I will explain my progress in implementing this platform, and what future work I am planning.

Conway, C.
SNAP: Scalable Networkable ABM Platform for the Social Sciences.
DOI: 10.5220/0011051400003274
In Proceedings of the 12th International Conference on Simulation and Modeling Methodologies, Technologies and Applications (SIMULTECH 2022), pages 165-171
ISBN: 978-989-758-578-4; ISSN: 2184-2841

2 ABM AND THE SOCIAL SCIENCES

2.1 Making an ABM Easy to Use

Social scientists are not, generally, also experts in

computer science and programming, especially

systems programming. Early conversations with my

social sciences colleagues suggest that they need a way

to specify their ABM models simply and intuitively,

without requiring knowledge of application efficiency, parallelization, synchronization, or consistency guarantees. As with many non-specialist users, this means that they need a graphical interface which provides a small set of intuitive and, to the extent possible, self-explanatory tools for carrying out typical tasks without detailed knowledge of the underlying system. For ABM systems, these typical needs are as follows:

1. Select among “typical” agents and add them to a

simulation run, without needing to program them

2. Extract typical data and statistics from a run,

either by integrating with a common statistical

program or by providing raw data in a format

usable by common statistical programs

3. Initialize simulation runs easily, including options

to specify multiple runs with or without parameter

changes

4. Predict run time required when possible, given a

set of computing resources and past history

5. Provide typically desired visualizations of a run in

progress

6. Perform the above tasks from the user interface,

without any required programming

2.2 Making an ABM Useful

The research domain of social sciences is humans and

their interactions. This introduces complexity at both

the micro and macro levels. At the micro level, each

human is a complex entity with goals, desires,

emotions, moods, beliefs, attitudes and behaviors. An

individual is, in fact, an entire ecosystem. At the

macro level, humans are embedded in the society that

surrounds them. This embedding is recursive;

humans are embedded in groups, which are

embedded in larger groups, and larger, until one gets

to the entire population of the planet. Additionally,

each individual is embedded in multiple groups,

which may or may not share memberships. The study

of humans is inherently multilevel and has an

extremely complex geometry. Thus, the ideal ABM

would require considerable flexibility and nuance in

the agent specifications. In fact, the meaning of “agent” itself becomes complicated, since agents may contain other agents: individuals are embedded in groups, which are recursively embedded in larger groups. Individuals are also often members of several groups at once, and the memberships of those groups may overlap without being identical.
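One way to picture agents that contain other agents, with overlapping memberships, is the following minimal sketch. It is an illustration only and not part of the platform: the `Agent` class, its `add` and `individuals` methods, and the example names are all hypothetical.

```python
class Agent:
    """Hypothetical agent; an agent with members acts as a group."""

    def __init__(self, name):
        self.name = name
        self.members = []  # agents embedded in this agent
        self.groups = []   # agents this agent is embedded in

    def add(self, member):
        """Embed `member` in this agent and record the membership both ways."""
        self.members.append(member)
        member.groups.append(self)

    def individuals(self):
        """Names of all individuals (agents without members) inside this agent."""
        if not self.members:
            return {self.name}
        names = set()
        for member in self.members:
            names |= member.individuals()
        return names


ann, bob, cy = Agent("ann"), Agent("bob"), Agent("cy")
team, club, firm = Agent("team"), Agent("club"), Agent("firm")
team.add(ann)
team.add(bob)
club.add(bob)
club.add(cy)
firm.add(team)  # a group embedded in a larger group

# The two groups share bob but are not identical.
overlap = team.individuals() & club.individuals()
```

Here `team` and `club` have a non-empty, non-identical intersection, and `firm` reaches its individuals only through the group it contains.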

What this means in practice is a considerable

increase in complexity, on several fronts.

In the first place, the addition of nuance requires

that each agent make multiple choices, often among

multiple options. Most ABM platforms require that

these choices happen in sequence. Each additional option, or nuance, is likely to at least double the processing time and memory required. At its most

basic level, the model is turn based. Each agent

calculates and performs the appropriate action for its

current state, sequentially. When all of those

calculations have finished, the timer sets to the next

turn, and so forth. As the model increases in

complexity, whether in complexity of choice or in

number of agents, the length of time required for each

turn increases, and the number of turns required to

reach a new equilibrium also increases.
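The sequential, turn-based scheme described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not code from any existing platform: the `Agent` class, its `decide` rule (move halfway toward the mean opinion of the other agents), and the `run_turns` function are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical agent holding a single numeric opinion."""
    opinion: float

    def decide(self, others):
        # Stand-in rule: move halfway toward the mean opinion of the others.
        mean = sum(a.opinion for a in others) / len(others)
        return self.opinion + 0.5 * (mean - self.opinion)


def run_turns(agents, max_turns=1000, tol=1e-6):
    """Advance turn by turn until no opinion moves by more than `tol`.

    Within a turn the agents are evaluated one after another, and the
    clock only advances once all of them have finished. Needs >= 2 agents.
    """
    for turn in range(1, max_turns + 1):
        # Each agent computes its next state from the previous turn's state.
        new_opinions = [
            a.decide([b for b in agents if b is not a]) for a in agents
        ]
        change = max(abs(n - a.opinion) for n, a in zip(new_opinions, agents))
        for a, n in zip(agents, new_opinions):
            a.opinion = n
        if change < tol:
            return turn  # equilibrium reached
    return max_turns
```

In this toy rule every agent inspects every other agent, so the work per turn grows quadratically with the number of agents, and all of it runs on one processor.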

The multilevel nature adds further complexity.

The groupings may operate simultaneously with the

individuals, sequentially with the individuals, or may

even be in a feedback loop with the individuals, a loop

which needs to reach an equilibrium before the

current turn can end. This becomes more complicated

the more levels are introduced, and the more complex

the geometry of the groupings becomes.
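The feedback case, in which a group and its members must settle before the turn can end, amounts to a fixed-point iteration inside each turn. The sketch below assumes a deliberately simple rule (the group norm is the mean attitude, and each member moves a fraction `pull` toward it); `settle_turn` and its parameters are hypothetical names chosen for illustration.

```python
def settle_turn(attitudes, pull=0.5, tol=1e-9, max_iter=10_000):
    """Iterate the individual/group feedback until it stops changing.

    attitudes: list of floats, one per individual in the group.
    Returns (settled attitudes, group norm, iterations used).
    """
    current = list(attitudes)
    for iteration in range(1, max_iter + 1):
        # The group reacts to its members.
        norm = sum(current) / len(current)
        # The members react to the group.
        updated = [x + pull * (norm - x) for x in current]
        if max(abs(u - c) for u, c in zip(updated, current)) < tol:
            return updated, norm, iteration
        current = updated
    raise RuntimeError("feedback loop did not reach an equilibrium")
```

Only when this inner loop returns would the outer clock advance, which is why each additional level of grouping can add to the cost of a single turn.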

Current ABM platforms primarily run as a single

process, and hence on a single CPU. Any attempt to

schedule other tasks risks thrashing by the CPU

and/or interruption to the program. CPU speed is the

hard limit for the length of time required for a

complete run. However, modern systems have

multiple CPUs; these CPUs are left idle by most

ABM platforms. Restricting the simulation to a single

CPU makes scaling to the level needed by social

simulations extremely problematic.

In the second place, current ABM platforms

primarily use a checkerboard model, in which each


agent occupies a specific geographical space. Many

of its actions will be prompted by the change in state

of its neighbors in the previous turn. Empty spaces

must be evaluated at each turn, as well.

However, many human interactions take place in

social space. Although this often overlaps with

physical space, social space is increasingly virtual

(e.g., Facebook), and proximity is less important than

relationships. For example, we are likely to be more

influenced by the opinion of a relative or close friend

(strong relationship) than by a neighbor or fellow

commuter (strong proximity). Social space groupings

are also critical and increase the dimensional space of

the geometry exponentially.

Social space is difficult to map onto a standard

checkerboard model, because the exact number of

connections between this agent and other agents is

fluid. On a checkerboard, a square has 4 direct

neighbors and 4 diagonal neighbors; similar

calculations apply to an n-dimensional matrix.

However, in a social space, the number and strength

of these connections does not align easily with a

physical diagram. What's more, it is a sparse matrix;

there are many, many empty spaces. If we draw a

diagram showing whether a given agent is connected

to each of the other agents, we will find that the

number of extant connections is much lower than the

number of non-connections. I may know one thousand people; I may have one million Twitter followers; but there are roughly 8 billion people in the world, and I have no connection to about 7,999,000,000 of them.

In a checkerboard model, each of those 7.999 billion non-connections will take up a space, and must

be processed at each turn. Thus, the ABM platform

needs to support a more abstract notion of space, to

allow these kinds of sparse geometries to be used

efficiently.
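The contrast between the two geometries can be made concrete with a short sketch. It is illustrative only: `grid_neighbors` enumerates the neighbors of a cell on an n-dimensional checkerboard, while the hypothetical `SocialGraph` class stores only the ties that actually exist, each with a strength.

```python
from itertools import product


def grid_neighbors(cell, size):
    """Neighbors (direct and diagonal) of a cell on an n-dimensional board."""
    result = []
    for offset in product((-1, 0, 1), repeat=len(cell)):
        if not any(offset):
            continue  # skip the cell itself
        candidate = tuple(c + o for c, o in zip(cell, offset))
        if all(0 <= x < size for x in candidate):
            result.append(candidate)
    return result


class SocialGraph:
    """Sparse geometry: only existing ties are stored, each with a strength."""

    def __init__(self):
        self.ties = {}  # agent -> {other agent: strength}

    def connect(self, a, b, strength):
        self.ties.setdefault(a, {})[b] = strength
        self.ties.setdefault(b, {})[a] = strength

    def neighbors(self, agent):
        return self.ties.get(agent, {})

    def stored_ties(self):
        return sum(len(v) for v in self.ties.values()) // 2


graph = SocialGraph()
graph.connect("me", "sister", 0.9)    # strong relationship
graph.connect("me", "neighbor", 0.1)  # strong proximity, weak relationship
```

An interior cell has 8 neighbors in two dimensions and 3^n - 1 in n dimensions, regardless of who actually interacts. The graph's storage and per-turn work scale with the number of real ties instead, and non-connections cost nothing.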

This leads to the following requirements:

1. The platform needs to be portable. As computer

systems and resources change, current ABM

platforms become obsolete. The platform should not rely on a particular operating system or programming language.

2. The platform needs to make use of all available

computing resources. This means it must make

use of

2.1. multiple processors and cores in a system

2.2. GPUs contained in a system, and

2.3. networked nodes.

3. The platform needs flexibility. In particular, this

means that the platform needs to pre-specify as

little as possible of a simulation.

3.1. The platform must support multiple user-

defined geometries.

3.2. The platform must permit embedding of

groups of agents into other agents to enable

multilevel models.

3.3. The platform must allow use of multiple

types of time, as time may flow differently

for different agents (e.g. quarterly results

versus weekly sales quotas versus sudden

interruptions for emergency tasks of

undefined length).

4. The platform needs to be customizable and

extensible by the user.

4.1. It must be possible to easily modify

existing components to provide new or

modified functionality, or to add new

components such as new types of agents,

new groupings, new geometries, and new

measures of time.

4.2. Adding or modifying modules necessarily

implies some level of sophistication in the

user beyond our expected case. The

sophisticated user may be familiar with

specific programming languages or

systems. The platform needs to allow for

use of the programming language, design,

and run environment which is comfortable

for this more experienced user. For them,

use of their preferred language and

environment will also add to the platform's

ease of use, since they will not need to

learn a new language or system

environment.

3 CURRENTLY AVAILABLE ABMS

I have taken as my starting point a recent

categorization of ABM platforms. In the

categorization, platforms were ranked on ease of

model development and scalability. Ease of model

development had three levels: easy, moderate, or

hard; scalability had four levels: small, medium,

large, and extreme (Abar et al., 2017). In Table 1

below, I have looked at platforms which are "easy" to

develop models on, and which have scalability

"large" or better. While these do not directly

correspond to "ease of use" and "usefulness" as

explored above, they are a good proxy for limiting the

search to a manageable number. This gives three

possible candidates: Altreva Adaptive Modeler,

SeSAm, and NetLogo. Of these, SeSAm and

NetLogo are the only general-purpose platforms;

Altreva Adaptive Modeler only handles financial


market simulations, so I exclude it from

consideration. Table 1 shows which requirements

SeSAm and NetLogo meet.

Table 1: Requirements and Existing Platforms.

Requirement | NetLogo | SeSAm
1. Select among “typical” agents and add them to a simulation run, without needing to program them | Yes | Yes
2. Extract typical data and statistics from a run | Yes | Yes
3. Initialize simulation runs easily, including options to specify multiple runs with or without parameter changes | Yes | Yes
4. Predict run time required when possible, given a set of computing resources and past history | No | No
5. Provide typically desired visualizations of a run in progress | Yes | Yes
6. Perform the above tasks from the user interface, without any required programming | Yes | Yes
1. Portable | Yes; requires Java Virtual Machine | Yes; requires Java Virtual Machine
2.1 Uses multiple processors and cores in a system | Only when multiple runs are made; a single ecosystem is not split across processors | Only when multiple runs are made; a single ecosystem is not split across processors
2.2 Uses GPUs contained in a system | No | No
2.3 Uses networked nodes | No, although an extension can run multiple versions of a simulation across clustered systems; again, a single ecosystem cannot be split across nodes | No, although an extension can run multiple versions of a simulation across clustered systems; again, a single ecosystem cannot be split across nodes
3.1 Allows user-specified geometries | No | Yes
3.2 Supports multilevel agents | No | Yes
3.3 Allows user-specified types of time | No | No
4.1 Allows modification of included agents and creation of new agents | Yes | Yes
4.2 Allows creation of agents in any language | Native creation in NetLogo language. Bindings for some languages exist to allow external calls; others could be added |

4 BUILDING A BETTER MOUSETRAP

Clearly the available platforms do not simultaneously provide ease of use and usefulness to the social science researcher. A better solution is needed. Based on my experience in computer science and real-time, distributed, and systems programming, I propose the following design. Two principles guide this design. First, small tools that do one thing well, and can be easily combined with other tools, are more powerful and flexible than large tools that try to do everything. Second, offload functionality to already extant system components whenever possible, as much as possible, to take advantage of domain expertise beyond my own.

1. Framework management is decoupled from the interface and the agents. This is an approach which has previously been used for reasons of scalability and performance (Bosse, 2021; Collier & North, 2013).


2. A basic communications library abstracts all the information needed for two independent programs to send each other messages in the most efficient possible manner, regardless of the location of the agents (on the same system, on a GPU, or on another networked node). Messages between agents will be plain text strings (encrypted in transit, but plain text to the agent itself).

3. A directory server allows programs to find each other (similar to the RPC portmapper or CORBA). The directory server will also enable broadcast and multicast messages. The directory server is simply an agent with a well-known (to the ABM platform) address.

4. All agents are self-contained programs which use the library in (2) above and the directory in (3) above to find and communicate with other agents; this creates extreme modularity. The agents are normal processes scheduled by the operating system, leveraging the expertise that goes into operating systems implementation to efficiently use the CPUs and cores available. This approach also has some precedent, though not quite to the fundamental level that I propose (Collier & North, 2013). This also allows agent parameterization to be handled by command-line arguments.

5. All framework components (GUI, visualizations, geometry, time, etc.) are themselves agents, no more or less powerful than user agents.

6. A library of typical agents is provided for ease of use (visualization, data tapping, statistics creation, etc.).

7. The communications library, directory server, and all included agents are written in C using POSIX standard libraries for portability.

8. Library stubs in other languages are provided, to allow agent programming in any language.

9. The design builds in security from the start. Agents will need to authenticate with each other in order to communicate. Messages sent between agents will be encrypted. Both are handled by the provided library, and the user does not need to be concerned with them beyond very simple decisions (e.g., what is the password for the simulation?). Encryption and authentication will be handled by POSIX standard libraries.
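The roles of the directory server and of plain-text messages in the design above can be illustrated with an in-process stand-in. This sketch is not the platform's library or API: the `Directory` class, its methods, the `drain` helper, and the message formats are hypothetical, and in-memory queues take the place of real inter-process transport, authentication, and encryption.

```python
import queue


class Directory:
    """In-process stand-in for a directory server: maps names to mailboxes."""

    def __init__(self):
        self._mailboxes = {}

    def register(self, name):
        """Register an agent under a name and return its mailbox."""
        box = queue.Queue()
        self._mailboxes[name] = box
        return box

    def lookup(self, name):
        return self._mailboxes[name]

    def send(self, to, text):
        """Deliver a plain-text message to one named agent."""
        self.lookup(to).put(text)

    def broadcast(self, text, sender=None):
        """Deliver a plain-text message to every agent except the sender."""
        for name, box in self._mailboxes.items():
            if name != sender:
                box.put(text)


def drain(box):
    """Collect all messages currently waiting in a mailbox."""
    messages = []
    while not box.empty():
        messages.append(box.get())
    return messages


directory = Directory()
inbox_a = directory.register("agent-a")
inbox_b = directory.register("agent-b")
directory.send("agent-b", "HELLO from agent-a")
directory.broadcast("TICK 1", sender="agent-a")
```

Because agents only ever see a name and a text message, the same agent code would be unaffected if the mailbox were replaced by shared memory or a network socket.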

4.1 How Does This Design Match the Requirements?

First, a word on terminology. For this discussion,

"system" means the hardware and operating system

components. "Platform" means the overall ABM

platform, which, with this design, may have multiple

independent simulations running simultaneously.

"Ecosystem" is a user-defined set of agents which are

able to talk to each other in a particular simulation

run. Thus, the platform may run on multiple systems,

and may contain multiple ecosystems. There is no a

priori linkage between a system and an ecosystem.

4.1.1 Ease of Use

Since the user interface is completely decoupled from

the ABM platform, all usability aspects are handled

by an agent library. If the user wants a GUI, they start

the provided GUI agent. This interface provides for

agent creation, standard data extraction, run control,

statistical results, parameterization, run time

predictions, and visualization via provided agents.

These are each individual programs, which

communicate with the GUI agent to interface with the

user. A preset suite of interface agents is provided, but

it is possible for the user to determine how many or

few of these predefined agents they wish to use.

The fact that the agents are self-contained

programs using command line arguments for

parameterization makes the creation of a graphic user

interface very simple. The interface will not be as

efficient as an integrated GUI would be; however,

since the execution time of the GUI is going to be tiny

compared to the execution time of the simulation

runs, this tradeoff for simplicity seems worthwhile.
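To illustrate why command-line parameterization keeps the interface simple, the sketch below turns the contents of a form into an argument list and parses it on the agent side. Everything named here is an assumption made for the example: the program name `./schelling_agent`, the `--tolerance` and `--seed` options, and the helper functions are hypothetical and not taken from the platform.

```python
import argparse
import shlex


def build_command(program, params):
    """Turn a parameter dict (e.g. from a GUI form) into an argv list."""
    argv = [program]
    for key, value in params.items():
        argv += [f"--{key}", str(value)]
    return argv


def parse_agent_args(argv):
    """What a hypothetical agent might do with the arguments it receives."""
    parser = argparse.ArgumentParser(prog="agent")
    parser.add_argument("--tolerance", type=float, default=0.5)
    parser.add_argument("--seed", type=int, default=0)
    return parser.parse_args(argv)


command = build_command("./schelling_agent", {"tolerance": 0.3, "seed": 42})
display = shlex.join(command)          # a string the GUI could show or log
args = parse_agent_args(command[1:])   # the agent's view of the same values
```

A GUI built this way only needs to know each agent's option names; starting the agent would then be a matter of handing `command` to the operating system, for example through `subprocess.Popen`.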

4.1.2 Usefulness

Because the provided programs and library are

written in C using POSIX libraries, they can easily be

compiled and run on any POSIX-compliant system

(including Windows, macOS, Linux, and any UNIX system, whether a System V or BSD derivative). As

standard research practices, university resources, and

technical best practices change, the entire model or

individual systems can be moved to any hardware

desired.

The problem of scheduling CPU resources has

been well studied in the computer science field; good

solutions exist and better ones are being developed

(e.g., Chen et al., 2020; Cheng et al., 2016; Kim et al., 2020; Wang et al., 2020). An ABM platform which

allows the operating system to perform scheduling


tasks can take advantage of the latest updates, without

itself having to change. Thus, the operating system

handles internal load balancing. External load

balancing among networked nodes is an ongoing field

of study in cloud computing, with experimental load

balancers available (Hamdani et al., 2020). I

anticipate providing an agent which will handle this

in the medium-term future.

The reduced size of the ABM platform core

results in low overhead. Again, since agents are

simply programs, a program which runs in a GPU can

easily be an agent in the ecosystem like any other.

Since individual agents are self-contained, they can

be easily migrated to other systems. The directory

server and the communications library understand how to communicate with agents which are located on the same system, on a GPU, or on other network nodes. The library uses shared memory on a single system and either IPv6 or IPv4 to communicate with other systems.
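The transport rule in this paragraph can be written as a small decision function. This is a sketch of the rule only, with hypothetical names (`choose_transport`, the host labels, and the returned strings); it does not open any connection and says nothing about how the actual library is implemented.

```python
import ipaddress


def choose_transport(local_host, target_host, target_addr=None):
    """Pick a transport for a message according to where the target runs.

    Same host: shared memory. Other host: IPv4 or IPv6, depending on the
    address under which the target is registered.
    """
    if target_host == local_host:
        return "shared-memory"
    version = ipaddress.ip_address(target_addr).version
    return f"ipv{version}"
```

For example, `choose_transport("node1", "node2", "2001:db8::1")` selects IPv6, while two agents on `node1` would use shared memory. The addresses used here come from the ranges reserved for documentation.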

Geometry is just an agent. The user can use a

predefined agent (e.g. checkerboard), or can provide

their own agent to provide the geometry. This permits

use of agents which can efficiently handle the sparse

space of a social network. Similarly, time is just an

agent. The user can use a predefined agent (e.g. event

tick) or provide their own to create a more complex

idea of time, possibly including a real-time clock. The

ability of agents to migrate to other systems makes

dynamic reallocation of computing resources simple.

All of these being individual programs, starting or not

starting any individual agent is trivial.

Again, the fact that every agent is a self-contained

program guarantees the maximum amount of

modularity. Users can easily pick and choose which

agents to include in an ecosystem.

All the ease of use functions are provided by

predefined agents. The user is free to modify those or

provide their own to allow for more customizability.

Since each agent is its own program, the user is not

limited to the language of the framework for

programming their own agents. The stub libraries

provided allow the user to use whatever programming

language they choose to implement their own agents.

5 CURRENT STATUS OF THE WORK

I have created the basic framework to allow agent

communication, creation, and removal. I also have a

basic checkerboard geometry agent and event tick

timer agent. I anticipate that in the near future I will be able to reproduce the models of Schelling (1971) and March (1991) to verify correct operation, with statistics showing scalability across CPUs and additional systems.

6 FURTHER WORK

Currently, very little of the standard set of ease-of-use agents is complete. Additionally, "ease of use" is

currently based upon my conceptions, informed by a

few colleagues. I intend to formally research what the

simulation community considers a full set of

requirements. Much of this will be ongoing; in the

spirit of agile development, it is better to get

something working, and then fine-tune it as users interact with it and better understand their needs, than

to lock in a feature set at the very beginning.

I have not implemented use of GPUs at this point. Library bindings are currently only in C, so agents

can only be written in C right now. Stub libraries will

need to be provided for other languages, to include at

least: Java, C++, R, and Python.

REFERENCES

Abar, S., Theodoropoulos, G. K., Lemarinier, P., & O’Hare, G. M. P. (2017). Agent Based Modelling and Simulation tools: A review of the state-of-art software. Computer Science Review, 24, 13–33. https://doi.org/10.1016/j.cosrev.2017.03.001

Bosse, S. (2021). Parallel and Distributed Agent-based Simulation of Large-scale Socio-technical Systems with Loosely Coupled Virtual Machines. 344–351. https://www.scitepress.org/PublicationsDetail.aspx?ID=trckToYQetc=&t=1

Chen, J., Soomro, P. N., Abduljabbar, M., Manivannan, M., & Pericas, M. (2020). Scheduling Task-parallel Applications in Dynamically Asymmetric Environments. 49th International Conference on Parallel Processing - ICPP: Workshops, 1–10. https://doi.org/10.1145/3409390.3409408

Cheng, L., Rao, J., & Lau, F. C. M. (2016). vScale: Automatic and efficient processor scaling for SMP virtual machines. Proceedings of the Eleventh European Conference on Computer Systems, 1–14. https://doi.org/10.1145/2901318.2901321

Collier, N., & North, M. (2013). Parallel Agent-Based Simulation with Repast for High Performance Computing. SIMULATION, 89(10), 1215–1235. https://doi.org/10.1177/0037549712462620

Conte, R., & Paolucci, M. (2014). On Agent-Based Modeling and Computational Social Science. Frontiers in Psychology, 5. https://doi.org/10.3389/fpsyg.2014.00668


Davis, F. D. (1989). Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology. MIS Quarterly, 13(3), 319–340. https://doi.org/10.2307/249008

Gilbert, N. (2007). Agent-Based Models. SAGE Publications, Inc.

Hamdani, M., Aklouf, Y., & Chaalal, H. (2020). A Comparative Study on Load Balancing Algorithms in Cloud Environment. Proceedings of the 10th International Conference on Information Systems and Technologies, 1–4. https://doi.org/10.1145/3447568.3448466

Hughes, H. P. N., Clegg, C. W., Robinson, M. A., & Crowder, R. M. (2012). Agent-based modelling and simulation: The potential contribution to organizational psychology. Journal of Occupational and Organizational Psychology, 85(3), 487–502. https://doi.org/10.1111/j.2044-8325.2012.02053.x

Jackson, J. C., Rand, D., Lewis, K., Norton, M. I., & Gray, K. (2017). Agent-Based Modeling: A Guide for Social Psychologists. Social Psychological and Personality Science, 8(4), 387–395. https://doi.org/10.1177/1948550617691100

Kim, J., Kim, J., & Park, Y. (2020). Navigator: Dynamic multi-kernel scheduling to improve GPU performance. Proceedings of the 57th ACM/EDAC/IEEE Design Automation Conference, 1–6.

March, J. G. (1991). Exploration and Exploitation in Organizational Learning. Organization Science, 2(1), 71–87. https://doi.org/10.1287/orsc.2.1.71

Schelling, T. C. (1971). Dynamic models of segregation. The Journal of Mathematical Sociology, 1(2), 143–186. https://doi.org/10.1080/0022250X.1971.9989794

Wang, Y., Li, R., Huang, Z., & Zhou, X. (2020). An In-depth Analysis of System-level Techniques for Simultaneous Multi-threaded Processors in Clouds. Proceedings of the 2020 4th International Conference on High Performance Compilation, Computing and Communications, 145–149. https://doi.org/10.1145/3407947.3407964

Willinsky, J. (2005). The unacknowledged convergence of open source, open access, and open science. First Monday. https://doi.org/10.5210/fm.v10i8.1265
