Gephi, Conspiracies, and SNA in the Classroom: Midterm Thoughts

Image from https://gephi.org/images/screenshots/layout2.png

This semester I designed a class, Introduction to Social Networks and Conspiracy Theories, that makes extensive use of Gephi along with the downloadable version of Networks, Crowds, and Markets: Reasoning About a Highly Connected World by David Easley and Jon Kleinberg. I compiled readings on real conspiracies and conspiracy theories into a course packet; they cover ancient Athens, the assassination of Philip II, the Knights Templar, the Gunpowder Plot, the French Revolution, and conspiracy theories in the United States from the Revolution to the present. In short, this class uses social network analysis to study paranoia from Plato to NATO.

Key to this class is the understanding and use of SNA software. I chose Gephi as it has a forgiving learning curve for creating networks and conducting basic analysis, and its cross-platform capability was essential, as I do not have access to a computer lab for the class. The other deciding factor was the ease of exporting Gephi files to the web (through the excellent Sigmajs exporter), as the students will produce a number of publicly available network visualizations, in addition to a written report, for their final project.

Gephi has shown its usefulness to the class. The ability to very quickly take .csv files and make meaningful network diagrams impressed the students, and filtering networks in real time is a powerful way to demonstrate how eliminating bridges and key nodes can throw a network into confusion. Some other positive points:

Gephi’s GUI vs. command-line tools
For my students, using a GUI has been a far better choice than a command-line or text-driven interface. While the Gephi GUI can sometimes do strange things (like eliminate buttons or workspaces), on the whole its basic functionality is relatively intuitive. After a few demonstrations of the basics, the students have grasped how to create a network from spreadsheet data.
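
For example, an edge list as small as the following toy example (a few Gunpowder Plot conspirators from the readings) can be loaded through Gephi’s spreadsheet importer and rendered as a network in a few clicks:

Source,Target
Catesby,Fawkes
Catesby,Wintour
Fawkes,Wintour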

Real-time rendering
Keeping the various layouts running while filtering / changing elements of the network (especially the standby Force Atlas) powerfully illustrates many network concepts. It is also a very cheap (in time and effort!) way to create animated networks for the class.

Ease of stats
While some of the statistics I would like to see are not in the core of Gephi, the ones that are present are excellent. The students, after learning about the math and logic behind various network statistics, were quite relieved to discover how quickly Gephi can compute centrality, density, and degree measurements.

Styling
After spending some time going over the interface, the ease of selecting different attributes and measurements for node styling is something that really captured the students’ attention. I anticipate a flood of very interesting network diagrams for their final projects based on different styling / visualization choices, which is an excellent way for students to support their arguments.

Creating diagrams for the course
Using Gephi to create network diagrams for the conspiracy portion of the course is a very straightforward process, and the excellent export capabilities ensure that all of the networks I share look very professional.

While Gephi is an excellent piece of software, extensive use in the classroom has revealed some issues and missing features that do present a source of frustration for the class.

Java can be difficult
Supporting multiple operating systems with different Java installs on student laptops is an exercise in frustration. A class that uses Gephi extensively MUST have a supported computer lab, at the very least so that Java problems can be addressed and fixed for everyone at the same time in the same way. I am running my course without this, and I can attest that much class time has been wasted trying to troubleshoot Java and install issues on different OS / JVM combinations.

Gephi is not very fault-tolerant
Data, at least in the humanities, is often messy, malformed, and non-standards-compliant. I was stymied in class by a single character causing an issue in a data set that we found online – while text editors and Excel / OpenOffice handled the file gracefully, Gephi choked on it.

Many of the concepts discussed in SNA texts cannot be easily seen in Gephi
Concepts like triadic closure are difficult to capture: Gephi can compute the total number, but there is nothing that identifies individual triads, which is less useful for showing students where the triads are in a graph. I also could not find a way to view cliques or to identify bridges programmatically. Network balance is likewise not readily apparent in Gephi.

Filtering can be difficult
While there are some powerful filtering features in Gephi, the class has had a difficult time conceptualizing their use and using them to their full potential. A more intuitive interface may solve some of these problems.

Some features are broken
Embeddedness is not a core feature of Gephi, and the plugin that computes it is incompatible with the current version of the software. In addition, filtering on partition for edges does not currently seem to work – this makes identification of cliques and balanced graphs more difficult. Gephi can also be very unstable at times, and some workarounds (like exporting a newly created graph and re-importing it to ensure compatibility with multiple edges) can be a hassle.

Summary
In short, I think Gephi is a good choice for the classroom, but one that will require some serious work from the instructor. I would HIGHLY recommend teaching Gephi in a supported lab setting, where JVM and OS choices are restricted and maintained by IT staff. I would like to see more educators using Gephi so we can pressure the developers (or encourage interested students!) to add more functionality to the core of the software.

Reflections on the BAM Conference

I had the absolute privilege of attending the Big Ancient Mediterranean Conference (#BAM2016) this week. The remarkable projects, enthusiasm for all things digital, and congenial atmosphere were inspiring. Now that the conference has ended, I think it is a good time to organize my thoughts, and perhaps point out some of the common themes that particularly struck me.

1) Our projects are ready to talk to each other. This was one of the most exciting revelations of the conference. Many of the digital humanities projects and initiatives represented here not only offer their data in downloadable formats (.csv files, JSON dumps, etc.), but also expose feature-rich APIs. Even if we are not quite at the point where we are using the same metadata / data standards (more on that later), the use of APIs with permanent URIs allows our data sets to meaningfully interact. The work of Pelagios creates an excellent medium to facilitate such communication, and opens up our data to initiatives that are not limited to studies of the ancient world.

2) Users, users, users, users. We had some spirited and fascinating debate about who the audience is for digital humanities projects, and whether it is even possible to create an application that can be effectively used by different audiences (experts, the general public, grad students, etc.). I fall squarely on the side of the idea that we engage with multiple audiences by the very nature of a freely-accessible online platform, but our debate revealed a fundamental design question that is often not explicitly addressed: exactly *who* is a digital humanities project for? Although I may differ with the voices questioning the multi-audience approach, I certainly agree with the position that we need more usability studies and more robust user information. It is not enough for us to create DH projects that answer our individual questions for ourselves – we need to understand how to communicate with an audience that is used to the visual literacies of the web and is less familiar with the conventions of scholarly communication derived from a print medium. The sample edition of Calpurnius from the Digital Latin Library (http://digitallatin.github.io/viewer/editio-2.0.html) is a great model – it captures the information of a textual apparatus free of technical jargon, rendering critical information to a wider audience without a loss of scholarly rigor.

3) Uncertainty. A corollary to the discussion around users is the question of representing uncertainty. There was an interesting question of why we should recognize fuzzy data at all: if an application is directed solely at an academic audience, is it not correct to assume that our users implicitly know that any data or representation of the ancient world is somewhat problematic, and therefore have no problem consuming visual representations that ignore the idea of uncertainty entirely? As I think that our projects need to communicate with non-academic audiences (and indeed academics who may not be as familiar with the inherent uncertainty of the ancient world), I see a very real need to represent the imprecision and uncertainty of our data. Almost all of the projects at BAM grappled with fuzzy data, whether that was geo-spatial (location, assignment to a place), textual (uncertain letter forms, unclear manuscript tradition), or interpretive (multiple archaeological reconstructions, the placement of garrison soldiers at a specific community). Almost every project dealt with uncertainty in a way that reflected the scholarly tradition of their subject area, like placing notes in an apparatus, or describing fuzzy data through text. I see a critical need to establish a common meta-data vocabulary that can, at the very least, alert users (both human and computational) to the presence of uncertainty in our work. I also see room for a common visual literacy for representing uncertainty in maps, social networks, or other visualizations, which is a far more complex issue.

4) Metadata and documentation. Even if it proves impossible, impractical, or undesirable to create a common visual literacy surrounding uncertainty, we need to implement a common way of indicating and describing fuzzy data that can be computationally consumed. This returns to my first point: our projects can now talk to each other through computational agents, but we must agree on the vocabulary governing that conversation. Alignment with Pelagios will help in that regard, but I think more attention needs to be paid at all levels of DH projects to metadata standards. For DH projects in the ancient world, the ontology for Linked Ancient World Data offered by LAWD (https://github.com/lawdi/LAWD) should be a starting point.

Much like the slow, often tedious process of generating metadata, creating documentation for DH projects is often overlooked. From comments in code to capturing design decisions, DH documentation needs to go beyond the narrative of the research question and record the entire creative, intellectual, and industrial process of a DH project. The suggestion to look to the hard sciences for guidance in this process is a fruitful place to start.

5) The use of open-source repositories and the continued importance of institutional support. Most of the projects at BAM have a presence on GitHub, and there was some very interesting discussion around the practicality and usefulness of a non-profit, academically oriented alternative. This debate had as a background the reality that GitHub and other free services are currently a critical component of our work, as many DH projects operate on a shoestring budget and depend on largesse from an institution or grants. Such funding is often uncertain; Pleiades, one of the most exemplary projects at BAM, has a 50% success rate at securing NEH funding. For smaller projects this rate may be even lower; some participants indicated that the reward-to-work ratio of grant applications is not attractive for smaller projects.

There is some good news though, as many institutions have expressed growing interest in the digital humanities as a field. As a digital humanities community we need to build on this interest with a push for institutional backing. The University of Iowa clearly demonstrated the excellent outcomes of a group that is both dedicated to digital humanities and able to provide hosting, archiving, and other technical support.

6) The continued need for face-to-face gatherings. While we have many electronic forums for communication (Twitter, Slack, IRC, site forums, etc), there is still something special that happens when DH scholars are brought together for several days, freed of other distractions, and think about the same issues as a group. For me, the headspace of a conference is entirely different than using Skype in my office; my other projects and papers are out of sight, (largely) out of mind, and my focus is squarely on the discussion.

7) Release the tweets. One place where documentation is somewhat overlooked is at conferences like BAM. Many conferences generate an end product like proceedings, which, while valuable, cannot capture the conversation that surrounds each presentation. The incredible use of Twitter by BAM attendees, and the use of Storify to capture those tweets, can serve as a model for other conferences. Conference organizers should establish an “official” hashtag, advertise it widely on social media, and ensure that the conference venue offers free wifi access to attendees. This extends the reach of the conference in real-time to remote presenters and participants who would otherwise be unable to join the conversation. A critical component of this is also archiving – for BAM, the use of Storify (part 1, part 2) and the support of the Iowa libraries ensures that there is a searchable account of the conference and the wider conversation it sparked.

The BAM conference generated a lot of intriguing conversation and displayed a host of excellent projects. If this kind of interest, scholarship, and congeniality can be maintained, the future of DH is bright indeed.

2 of N: Gephi, D3.js, and maps: Success!

A working, geographically accurate map using Gephi, D3.js, and Leaflet. NOTE: Link subject to change.

In my previous post I outlined how I used D3.js to display “raw” JSON output from Gephi. After some hacking around, I am now able to display my Gephi data on an interactive Leaflet map!

This is a departure from other work on the subject for a few reasons:

  1. Not all of my data has geographic information – indeed, in many cases a specific longitude / latitude combination is inappropriate and would lend a false sense of permanence to anyone looking at the map. In my case I have names of Greek garrison commanders who have some relation to a place, but it is unclear in some instances whether they were actually at a specific place, had dominion over the location, or are mentioned in an inscription for some other reason. Therefore, I need to locate data that has a fuzzy relation to a location (ancient people who may originate, reside, work, and be mentioned in different and / or unknown locations) alongside locations that may themselves have fuzzy or unknown geography. This is a problem for just about every ancient to pre-modern project, as we do not have a wealth of location information, or even a clear idea of where some people were at any particular moment.
  2. I want to show how social networks form around specific geographic points which are known, and have those social networks remain “reactive” on zooms, changing map states, etc. This can be expanded to encompass epistolary networks, knowledge maps, etc – basically anything that links people together who may not be locatable themselves.
  3. Gephi does not output in GeoJSON, and the remaining export options that are geographically oriented require that *all* nodes have geographic information. As this is not my case (see above), the standard export options will not work for me. Also, as part of my work on BAM, I want to create a framework that is as “plug and play” as possible, so that we can simply take Gephi files and drop them into the system to make new modules. Therefore this work has to be reproducible with a minimum of tweaking.

So, let us get to the code!

First things first: you need to make your HTML, bring in your JavaScript, and style some elements. I put the CSS in the file for testing – it will be split off later.


<!DOCTYPE html>

<head>
<meta charset="utf-8">
<meta name='viewport' content='width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no' />
<!-- Mapbox includes below -->
<script src='https://api.mapbox.com/mapbox.js/v2.2.2/mapbox.js'></script>
<link href='https://api.mapbox.com/mapbox.js/v2.2.2/mapbox.css' rel='stylesheet' />
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>
<script src="http://d3js.org/d3.v3.js"></script>
</head>
<!-- Will split off css when done with testing -->

<style>
.node circle {
stroke: grey;
stroke-width: 10px;
}

.link {
stroke: black;
stroke-width: 1px;
opacity: .2;
}

.label {
font-family: Arial;
font-size: 12px;
}

#map {
height: 98vh;
}

#attributepane {
display: none;
position: absolute;
height: auto;
bottom: 20%;
top: 20%;
right: 0;
width: 240px;
background-color: #fff;
margin: 0;
background-color: rgba(255, 255, 255, 0.8);
border-left: 1px solid #ccc;
padding: 18px 18px 18px 18px;
z-index: 8998;
overflow: scroll;
}
</style>

<body>

<div id='attributepane'></div>

<div id='map'>
</div>

Next, make a map.

<script>
var map = L.mapbox.map('map', 'yourmap', {
accessToken: 'yourtoken'
});

//set the initial view. This is pretty standard for most of the ancient med. projects
map.setView([40.58058, 36.29883], 4);

Pretty basic so far. Next we follow some of the examples that are already in the wild to initiate D3 goodness:


var force = d3.layout.force()
.charge(-120)
.linkDistance(30);

/* Initialize the SVG layer */
map._initPathRoot();

/* We simply pick up the SVG from the map object */
var svg = d3.select("#map").select("svg"),
g = svg.append("g");

Next, we bring in our json file from Gephi. Again, this is pretty standard:


d3.json("graph.json", function(error, json) {

if (error) throw error;

Now we get into the actual modifications to make the json, D3, and leaflet all talk to each other. The first thing to do is to modify the colors (from http://stackoverflow.com/questions/13070054/convert-rgb-strings-to-hex-in-javascript) so that D3 displays what we have in Gephi:


//fix up the data so it is what we want for d3
json.nodes.forEach(function(d) {
//convert the rgb colors to hex for d3
var a = d.color.split("(")[1].split(")")[0];
a = a.split(",");

var b = a.map(function(x) { //For each array element
x = parseInt(x).toString(16); //Convert to a base16 string
return (x.length == 1) ? "0" + x : x; //Add zero if we get only one character
})
b = "#" + b.join("");
d.color = b;

Next, we need to put in “dummy” coordinates for locations that do not have geography. This is messy and could probably be removed with some more efficient coding later. For the nodes that do have geography, map.latLngToLayerPoint translates the values into map units, which places them where they need to go. These are simply lat / lng attributes in the Gephi file. I also set nodes as fixed / not fixed, based on the presence of lat / lng data.


if (!("lng" in d.attributes) == true) {
//if there is no geography, then allow the node to float around
d.LatLng = new L.LatLng(0, 0);
d.fixed = false;
} else //there is geography, so place the node where it goes
{
d.LatLng = new L.LatLng(d.attributes.lat, d.attributes.lng);
d.fixed = true;
d.x = map.latLngToLayerPoint(d.LatLng).x;
d.y = map.latLngToLayerPoint(d.LatLng).y;
}
})

Now to setup the links. As we are keyed on attributes and not an index value, we need to follow this fix:


var edges = [];
json.edges.forEach(function(e) {
var sourceNode = json.nodes.filter(function(n) {
return n.id === e.source;
})[0],
targetNode = json.nodes.filter(function(n) {
return n.id === e.target;
})[0];

edges.push({
source: sourceNode,
target: targetNode,
value: e.Value
});
});

var link = svg.selectAll(".link")
.data(edges)
.enter().append("line")
.attr("class", "link");

Now to setup the nodes. I wanted to do a popup on a mouseclick event, but for some reason this is not firing (mousedown and mouseover do work, however). The following code builds the nodes, with radii, fill, and other information pulled from the JSON file. It also toggles a div that is populated with attribute information from the JSON. There is still some work to do at this part: the .css needs to be cleaned up, images need to be resized, and the attribute information for the nodes should be a configurable option when importing the JSON.


var node = svg.selectAll(".node")
.data(json.nodes)
.enter().append("circle")
//display nodes and information when a node is clicked on
//for some reason the click event is not registering, but mousedown and mouseover are.
.on("mouseover", function(d) {

//put in blank values if there are no attributes
var imageForBox = '', descriptionForBox = '';
var titleForBox = '<h1>' + d.label + '</h1>';

if (typeof d.attributes.Description != "undefined") {
descriptionForBox = d.attributes.Description;
} else {
descriptionForBox = '';
}

if (typeof d.attributes.image != "undefined") {
imageForBox = '<img src="' + d.attributes.image + '" align="left">';
} else {
imageForBox = '';
}

var htmlForBox = imageForBox + ' ' + titleForBox + descriptionForBox;
document.getElementById("attributepane").innerHTML = htmlForBox;
toggle_visibility('attributepane');
})
.style("stroke", "black")
.style("opacity", .6)
.attr("r", function(d) {
return d.size * 2;
})
.style("fill", function(d) {
return d.color;
})
.call(force.drag);

Now for the transformations when the map state changes. The idea is to keep the fixed nodes in the correct place, but to redraw the “floating” nodes when the map is zoomed in and out. The nodes that need to be transformed are dealt with first, then the links are rebuilt with the new (or fixed) x / y data.


//for when the map changes viewpoint
map.on("viewreset", update);
update();

function update() {

node.attr("transform",
function(d) {
if (d.fixed == true) {
d.x = map.latLngToLayerPoint(d.LatLng).x;
d.y = map.latLngToLayerPoint(d.LatLng).y;
return "translate(" +
map.latLngToLayerPoint(d.LatLng).x + "," +
map.latLngToLayerPoint(d.LatLng).y + ")";
}
}
);

link.attr("x1", function(d) {
return d.source.x;
})
.attr("y1", function(d) {
return d.source.y;
})
.attr("x2", function(d) {
return d.target.x;
})
.attr("y2", function(d) {
return d.target.y;
});

node.attr("cx", function(d) {
if (d.fixed == false) {
return d.x;
}
})
.attr("cy", function(d) {
if (d.fixed == false) {
return d.y;
}
})

//this kickstarts the simulation, so the nodes will realign to a zoomed state
force.start();
}

Next, time to start the simulation for the first time and close out the d3 json block:


force
.links(edges)
.nodes(json.nodes)
.start();
force.on("tick", update);

}); //end

Finally, time to put a function in to toggle the visibility of the div (from here) and close out our file:


function toggle_visibility(id) {
var e = document.getElementById(id);
if (e.style.display == 'block')
e.style.display = 'none';
else
e.style.display = 'block';
}
</script>
</body>

There you have it – a nice, interactive map with a mix of geographic information and social networks. While I am pleased with the result, there are still some things to fix / address:

  1. The click event not working. This is a real puzzler.
  2. Tweaking the distances of the simulation – I do not want nodes to be placed half a world away from their connections. This may have to be map zoom level dependent.
  3. Style the links according to Gephi and provide popups where applicable. This should be easy enough to do, but simply hasn’t been done in this code (see the sketch after this list).
  4. Tweak the visibility of the connections and nodes. While retaining an option to show the entire network at once, my idea is to have a map that starts out with JUST the locations, and then makes the nodes that are connected to that location visible when you click on it (which would also apply to the unlocated nodes – i.e. you see what they are connected to when you click on them).
  5. Connected to the above point, the implementation of a slider to show nodes in a particular timeframe. As my data spans a period from the 600s BCE to the 200s CE, this would provide a better snapshot of a particular network at a particular time.
  6. Implement a URI-based system – you will be able to go to address/someEntityName and that entity will be selected with its information pane and connected neighbors displayed. This will result in an RDF file that will be sent to the Pelagios Project.
  7. Fix up the .css for the information pane.
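
As a rough sketch for point 3 above, carrying Gephi’s edge colors through to the links might look something like the following. This is only a sketch: it assumes the exported JSON stores an rgb(...) color string on each edge object (e.color), which I have not verified for every export, and the edges.push call sits inside the same json.edges.forEach loop shown earlier.

edges.push({
source: sourceNode,
target: targetNode,
value: e.Value,
color: e.color //assumption: the Sigmajs / Gephi export includes an edge color string
});

//when building the links, style the stroke from that color
var link = svg.selectAll(".link")
.data(edges)
.enter().append("line")
.attr("class", "link")
.style("stroke", function(d) {
return d.color || "black"; //fall back to the stylesheet default if no color is present
});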

I will detail further steps in a later post.