I will be delivering a talk May 20th at 12:15 at UCSB (location TBD) entitled Mediterranean Pathways: GIS, Network Analysis, and the Ancient World on some of the geospatial and network analysis I have been performing with my own research in conversation with ORBIS, Pleiades, Nomisma, and other linked open data sets. An abstract of the talk is below:
We live in a world of maps and networks. GPS-enabled phones allow us to instantly locate ourselves on the earth’s surface, guide us to stores or restaurants, or announce our location to the world through social media. Likewise, programs like Google Earth and desktop Geographic Information Systems (GIS) have revolutionized our engagement with maps and map-making, and have challenged traditional notions of space and place.
The proliferation of GIS technologies and the “spatial turn” in digital humanities have also provided new avenues for challenging assumptions about the representations of past societies, the nature of empire, and the reach of imperial power. Despite their aesthetic beauty, traditional print maps, with clearly delineated static borders, often artificial naming conventions, and fixed viewpoints, do not convey the complexity and uncertainty of the past.
Ancient societies and empires were far from static; they were networks of complex interactions and fierce contestation which unfolded in geographic space. This talk demonstrates how new digital methodologies, gazetteers, and Linked Open Data (LOD) resources can be used to model and study these networks, and how new mapping techniques are transforming our understanding of ancient empire. Using the Attalid Kingdom as a guide, this talk examines the theory and practicalities of building an entity-relationship gazetteer and how to align it with LOD resources. It then addresses the construction of networks in desktop software, the impact of networks on cartography, and how new maps and digital models provide unique insights into the study of ancient Greek garrisons. The talk will then end with a brief overview of how Pleiades and other ancient world digital initiatives, including the Pelagios project’s Recogito platform, are developing new tools to enable the research and mapping of ancient networks.
Two projects that I am involved with, Pleiades and the World-Historical Gazetteer at the University of Pittsburgh, have been devoting considerable time and energy to modeling conceptual places and their connections, so I thought it was worth discussing a few of our observations and presenting some preliminary steps to visualize what we are doing.
First, a somewhat crowded overview of all of the Pleiades data set with map symbols representing different place types.
At this level of zoom the map is nearly incomprehensible, but it does reveal some interesting aspects of our data set. The grid-like structure in India and central Asia is the result of “dumping” places for which we have insufficient data into the middle of Barrington Atlas grid squares. For the editorial board such a view is actually quite useful, as it highlights where we need to clean our data and focus on creating better locations.
Another way to show the reach of the Pleiades project is through a choropleth map, which shades different countries according to the number of Pleiades places within them.
This is interesting, but I think it gives a fairly misleading sense of Pleiades coverage. From this map a reader would be unable to tell the extent of our data into Russia, China, and other countries where our locations are clustered around certain areas, not evenly spread throughout the country. It does highlight the areas where we have fairly extensive coverage, namely Italy, Greece, and Turkey.
To get around these issues, very often projects like ours use heat-maps to show both the concentration and extent of their data. I find this particular approach to be more aesthetically pleasing than simply throwing all of the points on the map, but due to the nature of a heat-map, I am still not convinced that it accurately depicts the extent of our coverage.
One of my issues with heat-maps is how the colors “bleed” into areas where there are not points. While this can be adjusted and refined by decreasing the radius around each point, if taken too far the heat-map will simply show isolated dots of color instead of the expected continuous whole.
One experiment that I have done is to try to combine heat maps with a Voronoi diagram. The basic idea behind this approach is that the GIS creates a polygon around each point, and any spot within that polygon is closer to that particular point than to any other known point. This helps Pleiades editors, as a “hotspot” in one polygon indicates that there are multiple places “stacked” on one another on the same point, which is a good indication that we are dealing with inaccurate data. Conversely, a “hotspot” that extends through multiple polygons is expected behavior, and signifies that there is a dense cluster of points that are in close proximity but are nevertheless in distinct locations.
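The “stacked” places that show up as a hotspot inside a single polygon can also be found without a map. The following is a minimal sketch, assuming each place has been reduced to an identifier plus a latitude and longitude; the function name, the rounding precision, and the sample records are hypothetical and are not taken from the Pleiades data.

```python
from collections import defaultdict

def find_stacked_places(places, precision=4):
    """Group place ids by rounded (lat, lon) and keep groups with more than one member."""
    by_coord = defaultdict(list)
    for place_id, lat, lon in places:
        # Rounding treats near-identical coordinates as the same point.
        key = (round(lat, precision), round(lon, precision))
        by_coord[key].append(place_id)
    return {coord: ids for coord, ids in by_coord.items() if len(ids) > 1}

# Hypothetical sample records: (id, latitude, longitude)
sample_places = [
    ("place-a", 25.5, 77.5),
    ("place-b", 25.5, 77.5),
    ("place-c", 25.5, 77.5),
    ("place-d", 41.8919, 12.4853),
    ("place-e", 41.9028, 12.4964),
]

stacked = find_stacked_places(sample_places)
print(stacked)
```

Here the three places that share one coordinate are reported together, while the two nearby but distinct places are left alone, which mirrors the difference between a hotspot in one polygon and a hotspot spread across several.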
This is a very aesthetically pleasing map, but it is still difficult to quickly identify the correspondence between points, polygons, and the heat map. Using a hex-bin map (which is essentially a choropleth map with small hex shapes) styled like a heat map perhaps provides the cleanest and most comprehensible view of both our data coverage and density.
Of all the representations mentioned here (and many tests which were far too incomprehensible to show), I believe this map offers by far the best combination of understandability, honesty, and presentation. It clearly shows the concentration of our data in the Mediterranean like a heat map, but does a far better job of showing the precise location of the data points. It also shows a far more honest depiction of the number of points per country and the actual location of those points, which is not the case with a choropleth map at a country scale.
What these maps do not capture is the presence of connections in the Pleiades data set. As part of our evolving data modeling and best practices, we are now experimenting with a more robust system for expressing relationships between different places in our data set. These relationships could be political, geographic, or highly conceptual. One highly interesting product of this approach is that we can start thinking of the Pleiades gazetteer as a description of a network of places, not just as a list of their names and locations.
As a result, it is now possible to graph some of the relationships in our data. This is highly experimental and very incomplete, but I hope that by sharing our first steps in this direction we can generate some discussion on our approach.
The first thing that I did was to download the Pleiades data set, then extract the connections information, creating a spreadsheet that listed each connection as a source – target combination that social network analysis software would understand. Essentially, any place that connected to another place was the source, while the place connected to was the target. This was then put into Gephi, where different “communities”, or places with denser connections to each other, are indicated by different colors.
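The extraction step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records below use a simplified, hypothetical structure (an `id` and a `connects_to` list), not the actual layout of the Pleiades download, and the place identifiers are made up. The `Source` and `Target` column headers are the ones Gephi's spreadsheet importer looks for in an edge table.

```python
import csv
import io

def connections_to_edges(places):
    """Turn each place's outgoing connections into (source, target) pairs."""
    edges = []
    for place in places:
        for target in place.get("connects_to", []):
            # The place holding the connection is the source;
            # the place it points to is the target.
            edges.append((place["id"], target))
    return edges

def edges_to_csv(edges):
    """Serialize edges as CSV text with Source and Target columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return buffer.getvalue()

# Hypothetical records in an assumed, simplified structure.
sample = [
    {"id": "syracuse", "connects_to": ["sicily"]},
    {"id": "panormus", "connects_to": ["sicily"]},
    {"id": "sicily", "connects_to": []},
]

edges = connections_to_edges(sample)
csv_text = edges_to_csv(edges)
print(csv_text)
```

Writing the text to a `.csv` file gives an edge list that can be loaded into Gephi for community detection.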
The figure above is a detail of a portion of the resulting graph. You can see communities clustering around regions like Sicily and Sardinia, or around extremely important cities like Rome. The square on the outer reaches of the graph is simply a number of unconnected places that are pushed to the edges by the Gephi visualization software. While this is an interesting and somewhat compelling visualization, it is devoid of any geographic context. Luckily, Gephi has a plugin that places nodes (in our case the places) in a geographic location if there is data available. As we have location data for most of our places, we can use this plugin, which yields the result below.
Now we are getting somewhere! The broad outlines of the Mediterranean are visible, as are features like the Nile river and even the outline of India. However, this network is still not on a geographic map (the Gephi globe plugin does not exactly match the coordinate system used by the geography plugin, and also it is based on modern geography), so we are somewhat missing the larger spatial context. Unfortunately there is not an easy way to export the spatially enhanced network with Gephi’s statistics and colors – the .kml plugin does capture the color, but lumps all of the statistics into a single description tag.
After some experimentation with exporting, importing, and reexporting in Gephi and QGIS, I finally found a solution by importing the .kml exported from Gephi into QGIS and exporting that as a .csv file which can then be manipulated in OpenRefine to “extract” all of the information from the description field. From there, the .csv file can be re-imported into QGIS, which results in the visualization below.
While somewhat crowded and messy, a closer view of Italy shows the power of this visualization.
These visualizations show the networks of connections within a spatial context, and are an intriguing way to approach entities like kingdoms, political entities, or other place groupings. We are already experimenting with placing regions and larger entities (like Sardinia and Sicily) as the “midpoint” between all of their constituent connections, which you can see displayed on the maps above.
However, I want to take this idea one step further and eliminate the representative point entirely from such places. To do so, I decided that a monomodal network, or a network of just one place type, would be an interesting way to represent these connections. In short, any place that connected to the place Sardinia would now connect directly to all of the other places that connected to Sardinia, and the place marker of Sardinia would be eliminated from the network entirely. This resulted in a very interesting visualization where the density of network connections almost resembles a polygon.
Even though I am still figuring out a method to transfer the color of the links from Gephi to QGIS, this type of representation has tremendous potential. If we can class different connections and pull those from the data set, we can begin to represent political areas, land masses, and other groupings as the sum of their shared connections in geographic space. So, instead of drawing arbitrary polygons, it is the connections themselves that create the “area” of a place. If these connections are able to respect underlying geography (roads, mountain passes, navigable rivers, springs, and other features), I think we may have a very powerful way of representing economic regions, areas of social interaction, political control, etc., and of exploring how those different networks interact and influence each other in geographic space.
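The transformation described above can be sketched as follows. This is a minimal illustration, not the procedure used to build the map: it assumes an undirected edge list of (source, target) pairs, and the function name and sample place names are hypothetical.

```python
from itertools import combinations

def collapse_hub(edges, hub):
    """Remove `hub` and connect every pair of its former neighbors directly."""
    neighbors = set()
    kept = set()
    for source, target in edges:
        if source == hub and target == hub:
            continue  # ignore a self-loop on the hub
        if source == hub:
            neighbors.add(target)
        elif target == hub:
            neighbors.add(source)
        else:
            # Keep unrelated edges, stored in sorted order so that
            # (a, b) and (b, a) count as the same undirected edge.
            kept.add(tuple(sorted((source, target))))
    for a, b in combinations(sorted(neighbors), 2):
        kept.add((a, b))
    return sorted(kept)

# Hypothetical sample: three places connected to a regional node.
sample_edges = [
    ("caralis", "sardinia"),
    ("olbia", "sardinia"),
    ("nora", "sardinia"),
    ("caralis", "nora"),
]

collapsed = collapse_hub(sample_edges, "sardinia")
print(collapsed)
```

A region with n connected places yields n(n-1)/2 direct links once the regional node is removed, which is why the resulting web of lines becomes dense enough to read almost as a polygon.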
A (VERY!) brief synopsis: The institute will focus on three areas of concern to digital art history: provenance, geographies, and visualization. We will create detailed specifications, assess different methodologies, and create a detailed proof of concept for each of these three areas. The results of this work will be translatable to different project plans and research opportunities at the close of the institute.
Given the detailed description of the institute (linked above) and the various specialties and strengths of the organizers, I think this will be a fascinating exploration of the intersection of art history, ancient history, linked data, geospatial research, material culture, and digital humanities. I expect that this institute will not only create outstanding scholarly output, but will serve as the core of a new, robust community of scholars interested in linked data, material culture, and art history.
Part of my work on the Big Ancient Mediterranean project involves creating a general software framework that can display social networks produced with Gephi, either as “stand alone” displays or integrated with geographic and textual information.
I created this particular module, “Hellenistic” Royal Relationships, to highlight the “stand alone” social network analysis (SNA) capabilities of BAM, and to serve as the start of a more generalized Hellenistic prosopography. Some other, more specialized work has been done in this direction; notably Trismegistos Networks and the efforts of SNAP:DRGN to create data standards for describing prosopographies and linking to other projects. Eventually this module will take advantage of these efforts, and provide stable URIs for its own data.
I envision this module serving several purposes. First, it provides an interesting visual representation of data contained within Wikipedia articles, including textual data that is not “linked” to other entries and therefore not discoverable by automated means. It serves as a quick reference for familial relationships, and provides an entry point for further exploration and study. This project has created a “core” of relationships that can be further expanded by different projects. It also can function as a check on Wikipedia data; some of the relationships here are highly controversial, or could even be wrong.
For future development, the next steps are to add more data on the subjects, including birth / death / reigning dates and a time-line browser based on those dates. As mentioned above, more work needs to be done to take advantage of linked data projects, including linkages to Pleiades locations where appropriate, linkages to Nomisma IDs if the monarch minted coins, and the presentation of the underlying data in a format that is compatible with SNAP:DRGN. Finally, I would like to develop a method for the automatic discovery and extraction of relationships described in Wikipedia articles, which is an interesting, but difficult, problem.
Intended for class or research use, the map can be printed, distributed digitally, or remixed as desired. It is the same scale and general size as the AWMC’s other wall map offerings through Routledge, so if you are so inclined, you can add it to a “mega-map” of the Mediterranean World. Demand for the map was so high that Dropbox suspended our public folder; you can e-mail the AWMC (firstname.lastname@example.org) for a new direct download link.
Although this project is a static map of Asia Minor, the data behind the map can be found at the AWMC GitHub page. In a future post, I’ll write up how to use the AWMC geodata and the BAM framework to make an interactive version of this map which you can modify for your own needs.
With increasing reports of election-related violence on Twitter and other social media, I decided to perform a quick network analysis of #ReportHate and whywereafraid (which, as of this writing, has removed its Twitter link from its site). I am interested in examining the development of these online communities, whether there are significant overlaps between them, and whether there are opportunities for increased cooperation.
First, I looked at each network in isolation. I started with the network formed around #ReportHate, which consists of 2,781 nodes, 4,217 edges, and 79 components. (A quick network primer: nodes are users or hashtags, while edges represent users mentioning a hashtag or another user. Components are parts of the graph where every node can trace a path through a number of edges to every other node in that part, and degree is the number of edges connecting a node to other nodes.)
Surprisingly to me, the SPLC (@SPLCENTER) is not the node with the highest degree; that honor belongs to Dr. Simran Jeet Singh (@SIKHPROF), a professor of religion at Trinity University, despite SPLC’s approximately 9-to-1 advantage in followers (96.3 thousand to 10.7 thousand). It will be interesting to see if this disparity closes as more individuals become aware of the hashtag.
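The terms in the primer can be made concrete with a short sketch. It assumes an undirected edge list of mentions; the handles and hashtags below are hypothetical placeholders, not accounts from the data set, and degree here counts distinct neighbors rather than repeated mentions. Nodes with no edges at all do not appear in an edge list and would have to be added separately.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Map every node to the set of nodes it shares an edge with."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)

def count_components(adjacency):
    """Count connected components with an iterative depth-first search."""
    seen = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return components

# Hypothetical mentions: a user mentions a hashtag or another user.
sample_edges = [
    ("@user_a", "#ReportHate"),
    ("@user_b", "#ReportHate"),
    ("@user_e", "#ReportHate"),
    ("@user_b", "@user_a"),
    ("@user_c", "@user_d"),
]

adjacency = build_adjacency(sample_edges)
degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
top_node = max(degree, key=degree.get)

print(len(adjacency), "nodes,", len(sample_edges), "edges,",
      count_components(adjacency), "components; highest degree:", top_node)
```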
The top ten nodes by degree are dominated by two very different philosophies. @SIKHPROF, @SPLCENTER, @SHAUNKING, @AMYWESTERVELT, @TRUMPSWORLD2016, and @THIERISTAN are certainly aligned with progressive causes and appear to be supporters of the SPLC’s efforts to accurately report hate crimes. However, the next major node on the graph, @STOPHATECRIMEZ, appears to be an alt-right account (including an emoticon frog as a stand-in for Pepe the Frog), which tweets links to accounts of violence against Trump voters (dominated by links to YouTube) and refutations of violence committed by Trump supporters. The accounts that retweeted this account likewise seem to be dominated by alt-right and far right wing individuals, and the hashtag #HATECRIME is almost exclusively used by this group.
Moving on from the alt-right component of the graph, it is apparent that there are several large clusters of SPLC supporters that as of yet do not have much interconnectivity. As this is a relatively new hashtag, I expect a growth of connections between clusters; if not, there is an opportunity for the “central” nodes of each cluster to reach out to each other and establish a more robust online community. Another potential issue is the set of nodes that are otherwise disconnected from the network; if these individuals are tweeting about incidents, it would be beneficial to reach out (virtually) and bring them into the larger #ReportHate network.
Unlike the #ReportHate network, with a strong connected component, the whywereafraid network is far more dispersed and much smaller. There are 992 nodes and 938 edges, with 151 components. The node with the highest degree count is Patrick Kingsley (@PATRICKKINGSLEY), a foreign correspondent with the Guardian paper; his high degree is the result of his tweet linking to the whywereafraid tumblr account.
The other two of the top three nodes, @ADAMPOWERS and @JAMIETWORKOWSKI, seem to be allied with the progressive movement. The next node with the highest degree is the official account of Donald Trump (@REALDONALDTRUMP). However, this is due to other twitter users castigating him over election violence.
I then placed the networks together, to see if there was any overlap between the two growing communities. There are 26 users and 19 hashtags in common; when the entire network is placed in a graph, the node with the highest degree of the 26 is @SHAUNKING, who is mentioned four times by other users to bring his attention to whywereafraid. There are other tentative connections, but for the most part the two networks are very distinct, with little cross conversation.
This represents both a danger and an opportunity for the supporters of #ReportHate and whywereafraid. As the #ReportHate and whywereafraid networks grow, they are likely to develop more links due to shared interests, but there is a real possibility that many users will remain tied to their initial choice of hashtag and not participate in the wider community or conversation. If nodes that are structurally important (those with a high betweenness centrality) in the #ReportHate graph, such as @SIKHPROF and @AMYWESTERVELT, could be brought into conversation with the major nodes of the whywereafraid graph, then there is a good chance of merging the two networks, increasing awareness, mutual support, and online presence.
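Finding the nodes that two networks have in common is a set intersection. The sketch below assumes each network is available as a list of node labels in which users start with "@" and hashtags with "#"; the labels are hypothetical, and case is normalized because the same handle may be capitalized differently in different tweets.

```python
def shared_nodes(nodes_a, nodes_b):
    """Return the (users, hashtags) that appear in both node lists, ignoring case."""
    common = {n.upper() for n in nodes_a} & {n.upper() for n in nodes_b}
    users = sorted(n for n in common if n.startswith("@"))
    hashtags = sorted(n for n in common if n.startswith("#"))
    return users, hashtags

# Hypothetical node lists for two separately collected networks.
network_one = ["@user_a", "@User_B", "#election", "#ReportHate"]
network_two = ["@USER_B", "@user_c", "#Election", "#other"]

users, hashtags = shared_nodes(network_one, network_two)
print(len(users), "shared users;", len(hashtags), "shared hashtags")
```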
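Betweenness centrality measures how often a node lies on the shortest paths between other pairs of nodes, which is why it is a useful way to spot potential bridges between communities. Gephi computes this statistic itself; the sketch below is a plain-Python version of Brandes' algorithm for an unweighted, undirected graph, included only to show what the number means. The tiny graph and its node names are hypothetical, and the scores are left unnormalized.

```python
from collections import deque

def betweenness_centrality(adjacency):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes)."""
    scores = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        predecessors = {node: [] for node in adjacency}
        path_counts = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_counts[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbor in adjacency[node]:
                if distance[neighbor] < 0:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[node] + 1:
                    path_counts[neighbor] += path_counts[node]
                    predecessors[neighbor].append(node)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_counts[pred] / path_counts[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != source:
                scores[node] += dependency[node]
    # Each undirected pair was counted once from each end.
    return {node: score / 2 for node, score in scores.items()}

# Hypothetical graph: two small clusters joined by a single bridge.
sample_graph = {
    "a1": {"hub_a"}, "a2": {"hub_a"},
    "hub_a": {"a1", "a2", "hub_b"},
    "hub_b": {"b1", "b2", "hub_a"},
    "b1": {"hub_b"}, "b2": {"hub_b"},
}

centrality = betweenness_centrality(sample_graph)
print(sorted(centrality.items(), key=lambda item: -item[1])[:2])
```

In this toy graph the two hub nodes score highest because every path between the clusters runs through them, which is the structural role attributed above to the accounts that could link the two hashtag communities.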