Category Archives: Social Networks

#ReportHate, whywereafraid, and SNA

With increasing social media incidents of election-related violence on twitter and social media, I decided to perform a quick network analysis of #ReportHate and whywereafraid (which, as of this writing, has removed its twitter link from its site). I am interested in examining the development of these online communities, if there are significant overlaps between them, and if there are opportunities for increased cooperation.

maincomp
The main component of the #ReportHate network. Dr. Singh’s community is in purple, the SPLC is in green, and the alt-right grouping is in red.

First, I looked at each network in isolation. I started with the network formed around #ReportHate, which consists of 2,781 nodes, 4,217 edges and 79 components. (A quick network primer: nodes are users or hashtags, while edges represent users mentioning a hashtag or another user. Components are parts of the graph where every node can trace a path through a number of edges to another node, and degree is the number of edges connecting a node to other nodes).

Surprisingly to me, the SPLC (@SPLCENTER) is not the node with the highest degree; that honor belongs to Dr. Simran Jeet Singh (@SIKHPROF), a professor of religion at Trinity University, despite SPLC’s approximate 9-1 advantage in followers (96.3 thousand to 10.7 thousand). It will be interesting to see if this disparity closes as more individuals are aware of the hashtag.

The top ten nodes by degree are dominated by two very different philosophies. @SIKHPROF, @SPLCENTER, @SHAUNKING, @AMYWESTERVELT, @TRUMPSWORLD2016, and @THIERISTAN are certainly aligned with progressive causes and appear to be supporters of the SPLC’s efforts to accurately report hate crimes. However, the next major node on the graph, @STOPHATECRIMEZ, appears to be an alt-right account (including an emoticon frog as a stand-in for Pepe the Frog), which tweets links to accounts of violence against Trump voters (dominated by links to YouTube) and refutations of violence committed by Trump supporters. The accounts that retweeted this account likewise seem to be dominated by alt-right and far right wing individuals, and the hashtag #HATECRIME is almost exclusively used by this group.

Moving on from the alt-right component of the graph, it is apparent that there are several large clusters of SPLC supporters that as of yet do not have much interconnectivity. As this is a relatively new hashtag, I expect a growth of connections between clusters; if not, there is is an opportunity for the “central” nodes of each cluster to reach out to each other and establish a more robust online community. Another potential issue are nodes that are otherwise disconnected from the network; if these individuals are tweeting about incidents, it would be beneficial to reach out (virtually) and bring them into the larger #ReportHate network.

Unlike the #ReportHate network, with a strong connected component, the whywereafraid network is far more dispersed and much smaller. There are 992 nodes and 938 edges, with 151 components. The node with the highest degree count is Patrick Kingsley (@PATRICKKINGSLEY), a foreign correspondent with the Guardian paper; his high degree is the result of his tweet linking to the whywereafraid tumblr account.

afraid-no-label
The whywereafraid network

The other two of the top three nodes, @ADAMPOWERS and @JAMIETWORKOWSKI, seem to be allied with the progressive movement. The next node with the highest degree is the official account of Donald Trump (@REALDONALDTRUMP). However, this is due to other twitter users castigating him over election violence.

I then placed the networks together, to see if there was any overlap between the two growing communities. There are 26 users and 19 hashtags in common; when the entire network is placed in a graph, the node with the node with the highest degree of the 26 is @SHAUNKING, who is mentioned four times by other uses to bring his attention to whywereafraid. There are other tentative connections, but for the most part the two networks are very distinct, with little cross conversation.

connected
The combined network. Edges that are from the #ReportHate data are in red, edges from the whywereafraid data are in blue.

This represents a danger and an opportunity for the supporters of #ReportHate and whywereafraid. As the #ReportHate and whyweareafraid networks grow, there are likely to have increased links due to shared common interests, but there is the real possibility that many users will remain tied to their initial choice of hashtag, and not participate in the wider community or conversation. If nodes that are structurally important (a high betweenness centrality) in the #ReportHate graph, such as @SIKHPROF and @AMYWESTERVELT, could be brought into conversation with the major nodes of the whyweareafraid graph, then there is a good chance to merge the two networks, increasing awareness, mutual support, and an increased online presence.

#NoDAPL Twitter Analysis

Introduction:

dapl_routes_map_large
Map By Carl Sack

The approximately 1,172 mile Dakota Access Pipeline1 has been highly controversial since its public unveiling in 2014.2 The Standing Rock Sioux and allied organizations took ultimately unsuccessful legal action to stop construction of the project3 while youth from the reservation began a social media campaign which gradually morphed into a larger movement with dozens of associated hashtags.4 I performed network analysis on #NODAPL, the most prominent of these hashtags on Twitter, between October 22 – 30, 2016. This revealed some interesting trends in the data, including the key role of alternative media, celebrities, and seemingly random twitter users holding the network together. Another surprising finding was the relatively minor role that republican candidate Donald Trump’s twitter account plays in the #NODAPL conversation, especially compared to the accounts of Barack Obama, Hilary Clinton, Bernie Sanders, and Dr. Jill Stein.

funtitled
My Visualization of the #NODAPL network

Preliminary Network Analysis:

Due to restrictions from the Twitter API and crashes / limitations from the software (see below), I do not have complete access to all Tweet traffic involving #NODAPL.5 I used the Twitter Archiving Google Sheet (TAGS) 6.16 to capture tweets that featured #NODAPL somewhere the tweet text. The resulting sheets were then imported into a database, then exported into an edges table for use in Gephi. For technical details, see the “Detailed Procedure” section below.

Basic to any network analysis is the concept of nodes and edges. Nodes can represent people, places, things, ideas, etc – they are entities on the graph. In this case, nodes are twitter users and hashtags. Edges associate nodes in some manner; they can represent friendship, biological relationships, enmity, or anything else that links two nodes. For my analysis, edges are anytime a user includes a user name or hashtag in a tweet. For example, one of the most prominent users in this study, @RUTHHHOPKINS is represented as a node, with an edge created to the node #NODAPL every time she uses the hashtag in a tweet, like the example below:

# NODAPL itself was excluded as a node in this analysis, as every tweet and user would be directly connected to it. This network features 133,702 nodes linked by 630,393 edges.7 I used Gephi to identify communities of nodes that are strongly linked together, which are represented by different colors in the network visualization.8 In addition, I ran some basic network statistics, including measuring the degree of nodes (the number of edges between two individuals, hashtags, or individuals and hashtags) on the graph. In these measurements out-degree indicates that a node initiates a link to another node in the graph, which in this case means another user name or hashtag was mentioned in a text by the node in question. in-degree measures incoming edges, which indicates that a particular node is the subject of a twitter conversation.

I first looked at the in-degree measurement. #STANDINGROCK was by far the node with the highest in-degree, indicating its popularity as a potential alternative hashtag to #NODAPL. @POTUS, the official twitter account of the President of the United States, was in second place, followed by #WATERISLIFE, @HILLARYCLINTON, @OFFICIALJADEN, @UR_NINJA, @SHAILENEWOODLEY, @MARKRUFFALO, and @RUTHHHOPKINS. In this list, only two nodes are not politicians, hashtags, or celebrities. @UR_Ninja is the official twitter account of Unicorn Riot9, a 501(c)3 nonprofit organization based in Minneapolis, Minnesota10 which has done extensive reporting on the Dakota Access Pipeline protests. @RUTHHHOPKINS is the twitter account of Ruth Hopkins, a Dakota/Lakota Sioux writer, journalist, and blogger. The high degree count on these nodes indicates that they may function as an information service, where their reporting on the situation is retweeted and mentioned by many other nodes in the network.

This measurement also revealed a marked difference between the in-degree and out-degree of nodes. The top 34 nodes by number of degrees are so dominated by in-degree connections that no node has an out-degree that contains more than 3.17% of its total edges. This reveals that such nodes are being “talked at”: they are mentioned in tweets, retweeted in large numbers, but by and large feature extremely limited further engagement with other Twitter users.

A particular user group is indicative of this trend. Few politicians have used Twitter to actively engage with activists or to contribute to the dialogue surrounding the #NoDAPL movement. In some cases this is not surprising; the official twitter account of the President of the United States can scarcely be expected to contribute extensively to dialogue on twitter. Despite being the seventh highest degree node and an occupation of her Brooklyn campaign headquarters on October 27, 201611 @HILLARYCLINTON, the official account of Hillary Clinton, has likewise not responded to #NoDAPL conversations on twitter. The official account of Bernie Sanders, @SENSANDERS, has also not extensively engaged with #NODAPL. However, on October 31, 2016, which is outside of the bounds of my data set, his account did issue a series of tweets in support of the # NODAPL movement.12

Another account of a politician, Dr. Jill Stein (@DRJILLSTEIN), is twelfth on in-degree, but only has five outwardly directed edges. Despite active involvement at the protests leading to charges of criminal trespass and criminal mischief,13 Dr. Stein’s twitter account has barely engaged with other users, with the only mentions in this data set originating from a retweet that mentioned Hillary Clinton and Barack Obama.14 Interestingly, despite over 1,000 retweets (many of which were collected by this study), her tweet mentioning both Hillary Clinton and Donald Trump15 was not captured by the TAGS software.

Perhaps surprisingly for a major party candidate, the twitter handle of Donald Trump, @REALDONALDTRUMP, is an outlier on this list: he ranks at 112,160 with only 933 total mentions. Trump’s publicized investments and connections with the Dakota Access project16 and environmental positions, including discounting climate change,17 almost certainly makes him unlikely to be sympathetic, let alone an ally of the # NODAPL movement. Indeed, most of his mentions on the network are simply retweets of Dr. Jill Stein’s criticism against Donald Trump and Hillary Clinton’s lack of involvement in the pipeline issue.18

Drilling down further into the data, I next looked at the nodes with the highest out-degree, which represents nodes who mentioned other users and hashtags. There were some interesting variations from the trends of in-degree nodes. Three users, @DEANLEH, @CANATIVEOBT, and @WMN4SRVL had in-degree and out-degree measurements that were no more than 20% divergent from each other. However, this does not mean that these nodes are engaged in extensive online conversations. These accounts all feature extensive retweets and linkages to different causes often associated with the progressive movement, including climate change awareness, opposition to institutional racism, feminism, and anti-corporatism. All three of these accounts seem to perform a function similar to news aggregation, as the majority of their mentions are retweets from other sources and are not extensive discussions with other users.

Another useful statistics, betweenness, measures the number of shortest paths (connections between any two nodes on the graph that may involve any number of additional nodes) that pass through a specific node.19 Nodes with a high betweenness are “central” in that they play a critical role in connecting (and therefore moving information) through the network. The single node with the highest betweenness is @UR_NINJA, which combined with its high degree ranking, suggests that the news service plays a critical role in bringing together individuals on the graph who are interested in social justice / progressive issues. Four other nodes in the top 25 betweenness list are likewise in the top 25 nodes by degree.

The remaing nodes are somewhat surprising. The twitter profile for second highest betweenness node, @TNPMR has a limited online footprint outside of Twitter, and does not seem to be involved in a leadership capacity in a social movement or media organization. Another important node in this measurement, @AMAZONMILLER, only scores 1638th in total degrees, yet still retains an important place in the network structure. Looking further at this data, I next examined at each individual user’s Twitter profile who scored in the top 25 for betweenness. I divided this list into people who seem to be primarily interested in progressive causes in general vs. those who expressed affinity for indigenous rights issues. The results were nearly evenly split, with a slight edge to the more general progressivists. However, only two of top ten nodes in the betweenness category focused primarily on indigenous issues, while the rest were concerned with progressivist issues more broadly. What this may indicate is that, as a whole, indigenous activists may face future difficulties in promoting their narrative outside of the more general progressive interests of the online community.

Further Observations:

These preliminary steps have also revealed some issues about data collection and curation. Twitter’s REST and streaming APIs are woefully inadequate for examining the whole data set. While Twitter provides, in theory, a representative sample of the data set, one of the powers of social network analysis is the discovery of weak ties and other network structures which are by definition not representative of the network as a whole. This can be frustrating for academic study of the network, and extremely detrimental to movements that depend on social media to transmit their messages. Groups can look at their own twitter histories, but the larger network structure, along with crucial weak ties, may be invisible to them.

Although Twitter does provide mechanisms for obtaining the entire history of hashtag usage, the organic development of other hashtags which are not heavily watched from the beginning is almost certainly a cost-prohibitive proposition for social movements that are loosely organized, under-funded, and / or have limited computer infrastructure. It would be a significant benefit for such groups to gain access to the Twitter history of their movements, and be able to the evolution of the conversation on social media. As hashtag use can grow organically, with many different signifiers used for conversations, Twitter’s current pricing structure and data access model puts these groups at a severe disadvantage and hinders the identification and cultivation of allied communities and supporters.

A less pressing, but nevertheless important, issue is access to Twitter’s archive by researchers. Unlike print material or traditional media, which may be tedious to analyze but are fully (and for the most part cheaply) accessible to interested parties, the complete set of tweets on a topic are impossible to study without significant funding. Even if a researcher could guess all of the hashtags that could emerge from a dynamic topic, the Twitter streaming API does not provide all relevant tweets. Such limitations make it challenging to use Twitter data in a pedagogical setting. Some of my students have expressed interest in conducting similar projects, but the need for constant downstream connections and the high cost of historical tweets have made all but the most superficial studies impossible. There needs to be a more cost-effective means for projects operating on a limited budget, students, and other academic uses of Twitter’s data.

Next Steps:

In addition to the data set on #NoDAPL featured here, I have also compiled a number of hashtags and data in separate TAGS sheets which can be combined to see more of the network. I am currently running a python script to grab more tweet data from the streaming API, which should provide more tweets. After placing this data in the network and performing some basic sentiment analysis, I want to see if distinct communities have formed around different hashtags, and if those communities have noticeably different rhetorical strategies that correspond to the inclusion of certain hashtags. A long term goal is to secure funding to obtain the complete twitter archive of #NODAPL and related hashtags in order to perform a full social network and sentiment analysis. In addition, I would like to examine the twitter history of @UR_Ninja and other alternative news organizations to see if their followers form recognizable activist communities. As part of this analysis, I am especially interested to see how these communities change when news organization shifts their focus between causes (like #FERGUSON to #NODAPL), and to examine the interactions of these virtual communities with different social movements.

To overcome the issues I discovered with TAGS and TwitterStreamingImporter, I am currently running a python script (modeled after http://adilmoujahid.com/posts/2014/07/twitter-analytics/) that pulls in the full json object from Twitter’s streaming API for a number of hashtags related to #NODAPL. I think the best approach is to perform a weekly update of a “master” network that captures all of the data that I can dealing with #NODAPL, and then running statistics / etc from a filtered network in Gephi. I will be sure to post any additional developments here.

Detailed Procedure:

The first difficulty in analyzing Twitter traffic is actually obtaining Twitter data. While Twitter does retain a historical archive of all tweets, this resource is currently inaccessible for academic research unless licensing fees are paid to an archival service such as GNIP. There is an indication that GNIP is aware of the power of Twitter analytics for academic research, and there are different pricing plans available,20 but as my project is currently in the exploratory phase, I am operating without any funding. As such, I needed an alternative.

I first used TAGS to pull historical and incoming tweets into separate google sheets for each hashtag I was interested in. TAGS uses Twitter’s REST API, which limits search rates and results.21 I ran into rate limits rather quickly with my searches; in addition, my documents in google also hit their size and row limit. TAGS does not provide the entire result from the Twitter API: fields like place, retweeted (which indicates if a tweet was retweeted or not), and other useful fields are left off. Finally, I noticed that the text of tweets was often truncated; this made searching form complete user names, hashtags, and full text problematic. Although TAGS is a convent way to collect tweets, it can not possibly hope to represent the full network.

Despite these imitations, TAGS can still provide some powerful insights with a little modification. After importing my TAGS documents into a postgresql database, I mined the tweet text for all user mentions and hashtags from individual twitter users, which formed the edges of my network. I then imported this into Gephi v.0.9.122, where I performed some basic network analysis and visualizations of the data.

After this analysis, I decided that I needed to capture more tweets as they are issued. I used the TwitterStreamingImporter plugin for Gephi,23 which uses Twitter’s streaming API.24 The result is not all tweets that contain specified search terms, but is instead a representative sample that numbers up to 1% of global tweets. At ~ 300 -500 million tweets per day,25 the streaming api will return 3 – 5 million tweets on a given subject. For small data sets this may be sufficient, but it is impossible to tell how truly representative this sample is without the complete Twitter firehose.26

Unlike TAGS, TwitterStreamingImporter requires a constant internet connection to compile tweets. This is impracticable if not impossible for individuals who use a single laptop or other machine between different locations. I also experienced some crashes while performing analytics and changing/ running the visualization layouts; anyone wishing to style twitter data using this technique may wish to save constantly and export different files for styling purposes. This plugin does a nice job of drawing edges between users, tweets, and hashtags, and specifies the type of edge (tweet, retweet, hashtag, etc), although I would still like some more detailed information. The code is freely accessible,27 so I may be able to fork the repository and create a new plugin that pulls in all the data that I am interested in (especially geolocations, time of the tweet, etc). However, I think simply using a python script on a persistent connection will be my next step in this analysis. 

Notes:

1  LLC Dakota Access and United States Army Corps of Engineers, “Environmental Assessment: Dakota Access Pipeline Project, Crossings of Flowage Easements and Federal Lands” (U.S. Army Corps of Engineers, Omaha District, 2016), 8, http://purl.fdlp.gov/GPO/gpo74064.

2  Further reading can be found at https://nycstandswithstandingrock.wordpress.com/standingrocksyllabus/, created by NYC Stands for Standing Rock committee, a self described group “…group of Indigenous scholars and activists, and settler/ POC supporters.” (https://nycstandswithstandingrock.wordpress.com/about/).

3  A. B. C. News, “Court Denies Tribe’s Appeal to Block Dakota Access Pipeline,” ABC News, October 11, 2016, http://abcnews.go.com/US/court-denies-tribes-appeal-block-controversial-dakota-access/story?id=42700614.

4  “Rezpect Our Water,” accessed November 6, 2016, http://rezpectourwater.com/; “Thousands Nationwide Show Solidarity with the Standing Rock Sioux and #NoDAPL,” Sierra Club, September 13, 2016, http://www.sierraclub.org/planet/2016/09/thousands-nationwide-show-solidarity-standing-rock-sioux-and-nodapl.

5  n.b. Twitter is case insensitive, but all user names and hashtags are capitalized here.

7  I used Gephi with the OpenOrd Layout to create the network visualization1 after modifying TAGS data in a postgresql database. Although the OpenOrd layout is intended for undirected graphs (see https://marketplace.gephi.org/plugin/openord-layout/), its ability to handle large datasets and limited computing resources made it an attractive choice for this investigation.

8  The modularity for the graph is 0.414, with 862 communities detected. 32 of these communities had 100 or more nodes, and totaled 131,080 of the 133,702 total, which is 98.04% of the total.

10  “About,” Unicorn Riot, accessed October 30, 2016, http://www.unicornriot.ninja/?page_id=372.

11  The Root Staff, “#NoDAPL: Indigenous Youths Occupy Hillary Clinton’s Brooklyn, NY, Headquarters,” The Root, October 29, 2016, http://www.theroot.com/articles/news/2016/10/nodapl-indigenous-youth-occupy-hillary-clintons-brooklyn-headquarters/; “Indigenous Youth Occupy Hillary Clinton Campaign Headquarters to Demand She Take Stand on #DAPL,” Democracy Now!, accessed November 4, 2016, http://www.democracynow.org/2016/10/28/indigenous_youth_occupy_hillary_clinton_campaign.

16  Oliver Milman, “Dakota Access Pipeline Company and Donald Trump Have Close Financial Ties,” The Guardian, October 26, 2016, sec. US news, https://www.theguardian.com/us-news/2016/oct/26/donald-trump-dakota-access-pipeline-investment-energy-transfer-partners; “The Latest: Trump Holds Dakota Access Pipeline Company Stock,” US News & World Report, accessed November 4, 2016, http://www.usnews.com/news/us/articles/2016-10-26/the-latest-pipeline-protesters-think-their-removal-imminent.

17  “Did Trump Say Climate Change Was a Chinese Hoax?,” @politifact, accessed November 4, 2016, http://www.politifact.com/truth-o-meter/statements/2016/jun/03/hillary-clinton/yes-donald-trump-did-call-climate-change-chinese-h/.

19  I performed Eigenvector analysis on the data set, but there was little deviation in the top ranked nodes from ranking by total degree.

25  Jim Edwards, “Leaked Twitter API Data Shows the Number of Tweets Is in Serious Decline,” Business Insider, accessed November 2, 2016, http://www.businessinsider.com/tweets-on-twitter-is-in-serious-decline-2016-2; “Twitter Usage Statistics – Internet Live Stats,” accessed November 2, 2016, http://www.internetlivestats.com/twitter-statistics/#sources.

26  Research on the representative accuracy of Twitter’s API has been mixed; see Fred Morstatter et al., “Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose,” arXiv Preprint arXiv:1306.5204, 2013; Fred Morstatter, Jürgen Pfeffer, and Huan Liu, “When Is It Biased?: Assessing the Representativeness of Twitter’s Streaming API,” in Proceedings of the 23rd International Conference on World Wide Web (New York, NY: ACM, 2014).

Bibliography

“A Pipeline Fight and America’s Dark Past.” The New Yorker, September 6, 2016. http://www.newyorker.com/news/daily-comment/a-pipeline-fight-and-americas-dark-past.

“About.” NYC Stands with Standing Rock, September 13, 2016. https://nycstandswithstandingrock.wordpress.com/about/.

“About.” Unicorn Riot. Accessed October 30, 2016. http://www.unicornriot.ninja/?page_id=372.

“Appeals Court Halts Dakota Access Pipeline Work Pending Hearing.” Indianz. Accessed November 6, 2016. http://www.indianz.com/News/2016/09/16/appeals-court-halts-dakota-access-pipeli.asp.

CNN, Marlena Baldacci, Emanuella Grinberg and Holly Yan. “Dakota Access Pipeline: Police Remove Protesters.” CNN. Accessed November 6, 2016. http://www.cnn.com/2016/10/27/us/dakota-access-pipeline-protests/index.html.

Dakota Access, LLC, and United States Army Corps of Engineers. “Environmental Assessment: Dakota Access Pipeline Project, Crossings of Flowage Easements and Federal Lands.” U.S. Army Corps of Engineers, Omaha District, 2016. http://purl.fdlp.gov/GPO/gpo74064.

“Dakota Access Pipeline.” Accessed November 6, 2016. http://www.daplpipelinefacts.com/.

“Dakota Access Pipeline: Overview.” Accessed November 6, 2016. http://www.daplpipelinefacts.com/about/overview.html.

“Did Trump Say Climate Change Was a Chinese Hoax?” @politifact. Accessed November 4, 2016. http://www.politifact.com/truth-o-meter/statements/2016/jun/03/hillary-clinton/yes-donald-trump-did-call-climate-change-chinese-h/.

Edwards, Jim. “Leaked Twitter API Data Shows the Number of Tweets Is in Serious Decline.” Business Insider. Accessed November 2, 2016. http://www.businessinsider.com/tweets-on-twitter-is-in-serious-decline-2016-2.

Healy, Jack. “From 280 Tribes, a Protest on the Plains.” The New York Times, September 11, 2016. http://www.nytimes.com/interactive/2016/09/12/us/12tribes.html.

“Indigenous Youth Occupy Hillary Clinton Campaign Headquarters to Demand She Take Stand on #DAPL.” Democracy Now! Accessed November 4, 2016. http://www.democracynow.org/2016/10/28/indigenous_youth_occupy_hillary_clinton_campaign.

“Judge Rules That Construction Can Proceed On Dakota Access Pipeline.” NPR.org. Accessed November 6, 2016. http://www.npr.org/sections/thetwo-way/2016/09/09/493280504/judge-rules-that-construction-can-proceed-on-dakota-access-pipeline.

“Life in the Native American Oil Protest Camps.” BBC News, September 2, 2016, sec. US & Canada. http://www.bbc.com/news/world-us-canada-37249617.

McCausland, Phil. “More Than 80 Dakota Pipeline Protesters Arrested, Some Pepper Sprayed.” NBC News, October 23, 2016. http://www.nbcnews.com/news/us-news/more-80-dakota-access-pipeline-protesters-arrested-some-pepper-sprayed-n671281.

McCleary, Mike. “As Standing Rock Protesters Face Down Armored Trucks, the World Watches on Facebook.” WIRED. Accessed October 30, 2016. https://www.wired.com/2016/10/standing-rock-protesters-face-police-world-watches-facebook/.

Milman, Oliver. “Dakota Access Pipeline Company and Donald Trump Have Close Financial Ties.” The Guardian, October 26, 2016, sec. US news. https://www.theguardian.com/us-news/2016/oct/26/donald-trump-dakota-access-pipeline-investment-energy-transfer-partners.

Morstatter, Fred, Jürgen Pfeffer, and Huan Liu. “When Is It Biased?: Assessing the Representativeness of Twitter’s Streaming API.” In Proceedings of the 23rd International Conference on World Wide Web. New York, NY: ACM, 2014.

Morstatter, Fred, Jürgen Pfeffer, Huan Liu, and Kathleen M Carley. “Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose.” arXiv Preprint arXiv:1306.5204, 2013.

News, A. B. C. “Court Denies Tribe’s Appeal to Block Dakota Access Pipeline.” ABC News, October 11, 2016. http://abcnews.go.com/US/court-denies-tribes-appeal-block-controversial-dakota-access/story?id=42700614.

———. “Timeline of the Dakota Access Pipeline Protests.” ABC News, October 31, 2016. http://abcnews.go.com/US/timeline-dakota-access-pipeline-protests/story?id=43131355.

“Rezpect Our Water.” Accessed November 6, 2016. http://rezpectourwater.com/.

Staff, The Root. “#NoDAPL: Indigenous Youths Occupy Hillary Clinton’s Brooklyn, NY, Headquarters.” The Root, October 29, 2016. http://www.theroot.com/articles/news/2016/10/nodapl-indigenous-youth-occupy-hillary-clintons-brooklyn-headquarters/.

“The Digital Transition: How the Presidential Transition Works in the Social Media Age.” Whitehouse.gov, October 31, 2016. https://www.whitehouse.gov/blog/2016/10/31/digital-transition-how-presidential-transition-works-social-media-age.

“The Latest: Trump Holds Dakota Access Pipeline Company Stock.” US News & World Report. Accessed November 4, 2016. http://www.usnews.com/news/us/articles/2016-10-26/the-latest-pipeline-protesters-think-their-removal-imminent.

“Thousands Nationwide Show Solidarity with the Standing Rock Sioux and #NoDAPL.” Sierra Club, September 13, 2016. http://www.sierraclub.org/planet/2016/09/thousands-nationwide-show-solidarity-standing-rock-sioux-and-nodapl.

“Twitter Usage Statistics – Internet Live Stats.” Accessed November 2, 2016. http://www.internetlivestats.com/twitter-statistics/#sources.

Williams, Weston. “Standing Rock Protests Escalate, as Tribe Calls for DOJ to Investigate.” Christian Science Monitor, October 24, 2016. http://www.csmonitor.com/USA/Justice/2016/1024/Standing-Rock-protests-escalate-as-tribe-calls-for-DOJ-to-investigate.

 

Gephi, Conspiracies, and SNA in the Classroom: Midterm Thoughts

Image from https://gephi.org/images/screenshots/layout2.png

This semester I designed a class, Introduction to Social Networks and Conspiracy Theories, that makes extensive use of Gephi along with the downloadable version of Networks, Crowds, and Markets: Reasoning About a Highly Connected World by David Easley and Jon Kleinberg. Readings on real conspiracies and conspiracy theories were compiled by myself into a course packet, and cover Ancient Athens, the assassination of Philip II, Knights Templar, the Gunpowder plot, the French Revolution, and conspiracy theories in the United States from the revolution to the present. In short, this class uses social network analysis to study paranoia from Plato to NATO.

Key to this class is the understanding and use of SNA software. I chose Gephi as it has a forgiving learning curve for creating networks and conducting basic analysis, and its cross platform capabilities were required as I do not have access to a computer lab for the class. The other deciding factor was the ease of exporting Gephi files (through the use of the excellent Sigmajs exporter) to the web, as the students will produce a number of publicly available network visualizations, in addition to a written report, for their final project.

Gephi has shown its usefulness to the class. The ability to very quickly take .csv files and make meaningful network diagrams impressed the students, and showing network filtering in real-time is a powerful way to show conceptually how eliminating bridges and key nodes can throw a network into confusion. Some other positive points:

Gephi’s GUI vs. command line tools
For my students, using a GUI has been a far better choice than a command-line or text driven interface. While the Gephi GUI can sometimes do strange things (like eliminate buttons or workspaces), on the whole its basic functionality is relatively intuitive. After a few demonstrations in the basics, the students have grasped how to create a network from spreadsheet data.

Real-time rendering
Keeping the various layouts running while filtering / changing elements of the network (especially the stand-by force atlas) powerfully illustrates many network concepts. It is also a very cheap (in time and effort!) method to create animated networks for the class.

Ease of stats
While some of the statistics I would like to see are not in the core of Gephi, the ones that are present are excellent. The students, after learning about the math and logic behind various network statistics, were quite relieved to discover how quickly Gephi can compute centrality, density, and degree measurements.

Styling
After spending some time going over the interface, the ease of selecting different attributes and measurements for node styling is something that really captured the student’s attention. I anticipate a flood of very interesting network diagrams for their final projects based on different styling / visualization choices, which is an excellent way for students to support their arguments.

Creating diagrams for the course
Using Gephi to create network diagrams for the conspiracy portion of the course is a very straightforward process, and the excellent export capabilities ensure that all of the networks I share look very professional.

While Gephi is an excellent piece of software, extensive use in the classroom has revealed some issues and missing features that do present a source of frustration for the class.

Java can be difficult
Supporting multiple operating systems with different Java installs on student laptops is an exercise in frustration. A class that uses Gephi extensively MUST have a supported computer lab, at the very least so that Java problems can be addressed and fixed for everyone at the same time in the same way. I am running my course without this, and I can attest that much class time has been wasted trying to troubleshoot Java and install issues on different OS / JVM combinations.

Gephi is not very fault tolerant 
Data, at least in the humanities, is often messy, malformed, and non standards-compliant. I was stymied in class due to one character causing an issue in a data set that we found online – while text programs and Excel / OpenOffice handled the file gracefully, it blew up on Gephi.

Many of the concepts discussed in SNA texts can not be easily seen in Gephi
Concepts like triadic closure are somewhat difficult to capture, but there is nothing in Gephi to identify triads. It can compute the total number, but this is less useful for showing students where the triads are in a graph. I could also not find a way to view cliques, or to identify bridges programatically. Network balance is also something that is not readily apparent in Gephi.

Filtering can be difficult
While there are some powerful filtering features in Gephi, the class has had a difficult time conceptualizing their use and using them to their full potential. A more intuitive interface may solve some of these problems.

Some features are Broken
Embeddedness is not a core feature for Gephi, and the plugin that computes this is incompatible with the current version of the code. In addition, filtering on partition for edges does not seem to currently work – this makes identification of cliques and balanced graphs more difficult. Along with this, Gephi can be very unstable at times, and some workarounds (like exporting a newly created graph and re-importing it to ensure compatibility with multiple edges) can be a hassle.

Summary
In short, I think Gephi is a good choice for the classroom, but one that will require some serious work from the instructor. I would HIGHLY recommend that you teach Gephi in a classroom setting, where JVM and OS choices are restricted and supported by IT staff. I would like to see more educators using Gephi so we can pressure the developers (or encourage interested students!) to add more functionality to the core of the software.

2 of N: Gephi, D3.js, and maps: Success!

gephileafletd3js
A working, geographically accurate map using Gephi, D3.js, and Leaflet. NOTE: Link subject to change.

In my previous post I outlined how I used D3.js to display a “raw” JSON output from Gephi. After some hacking around, I am now able to display my Gephi data on an interactive leaflet map!

This is a departure from other work on the subject for a few reasons:

  1. Not all of my data has geographic information – indeed in many cases a specific longitude / latitude combination is inappropriate and would lend a false sense of permanence to anyone looking at the map. In my case I have names of Greek garrison commanders which have some relation to a place, but it is unclear in some instances if they are actually at a specific place, have dominion over the location, or are mentioned in an inscription for some other reason. Therefore, I need to locate data that has a fuzzy relation to a location (ancient people who may originate, reside, work, and be mentioned in different and / or unknown locations) and locations that may themselves have fuzzy or unknown geography. This is a problem for just about every ancient to pre-modern project, as we do not have a wealth of location information, or even a clear idea of where some people are at any particular moment.
  2. I want to show how social networks form around specific geographic points which are known, and have those social networks remain “reactive” on zooms, changing map states, etc. This can be expanded to encompass epistolary networks, knowledge maps, etc – basically anything that links people together who may not be locatable themselves.
  3. Gephi does not output in GeoJSON, and the remaining export options that are geographically oriented require that *all* nodes have geographic information. As this is not my case (see above), the standard export options will not work for me. Also, as part of my work on BAM, I want to create a framework that is as “plug and play” as possible, so that we can simply take Gephi files and drop them into the system to make new modules. Therefore this work has to be reproducible with a minimum of tweaking.

So, let us get to the code!

First things first: You need to make your html, bring in your javascript,and style some elements. I put the css in the file for testing – it will be split off later.


<!DOCTYPE html>

<head>
<meta name='viewport' content='width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no' />
<!-- Mapbox includes below -->
<script src='https://api.mapbox.com/mapbox.js/v2.2.2/mapbox.js'></script>
<link href='https://api.mapbox.com/mapbox.js/v2.2.2/mapbox.css' rel='stylesheet' />
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>
<script src="http://d3js.org/d3.v3.js"></script>
</head>
<meta charset="utf-8">
<!-- Will split off css when done with testing -->

<style>
.node circle {
stroke: grey;
stroke-width: 10px;
}

.link {
stroke: black;
stroke-width: 1px;
opacity: .2;
}

.label {
font-family: Arial;
font-size: 12px;
}

#map {
height: 98vh;
}

#attributepane {
display: block;
display: none;
position: absolute;
height: auto;
bottom: 20%;
top: 20%;
right: 0;
width: 240px;
background-color: #fff;
margin: 0;
background-color: rgba(255, 255, 255, 0.8);
border-left: 1px solid #ccc;
padding: 18px 18px 18px 18px;
z-index: 8998;
overflow: scroll;
}
</style>

<body>

<div id='attributepane'></div>

<div id='map'>
</div>

Next, make a map.

<script>
var map = L.mapbox.map('map', 'yourmap', {
accessToken: 'yourtoken'
});

//set the initial view. This is pretty standard for most of the ancient med. projects
map.setView([40.58058, 36.29883], 4);

Pretty basic so far. Next we follow some of the examples that are already in the wild to initiate D3 goodness:


var force = d3.layout.force()
.charge(-120)
.linkDistance(30);

/* Initialize the SVG layer */
map._initPathRoot();

/* We simply pick up the SVG from the map object */
var svg = d3.select("#map").select("svg"),
g = svg.append("g");

Next, we bring in our json file from Gephi. Again, this is pretty standard:


d3.json("graph.json", function(error, json) {

if (error) throw error;

Now we get into the actual modifications to make the json, D3, and leaflet all talk to each other. The first thing to do is to modify the colors (from http://stackoverflow.com/questions/13070054/convert-rgb-strings-to-hex-in-javascript) so that D3 displays what we have in Gephi:


//fix up the data so it is what we want for d3
json.nodes.forEach(function(d) {
//convert the rgb colors to hex for d3
var a = d.color.split("(")[1].split(")")[0];
a = a.split(",");

var b = a.map(function(x) { //For each array element
x = parseInt(x).toString(16); //Convert to a base16 string
return (x.length == 1) ? "0" + x : x; //Add zero if we get only one character
})
b = "#" + b.join("");
d.color = b;

Next, we need to put in “dummy” coordinates for locations that do not have geography. This is messy and could probably be removed with some more efficient coding later. For the nodes that do have geography, the map.latLngToLayerPoint will translate the values into map units, which places them where they need to go. These are simply lat lon attributes in the Gephi file. I also set nodes that are fixed / not fixed, based on the presence of lat/lon data.


if (!("lng" in d.attributes) == true) {
//if there is no geography, then allow the node to float around
d.LatLng = new L.LatLng(0, 0);
d.fixed = false;
} else //there is geography, so place the node where it goes
{
d.LatLng = new L.LatLng(d.attributes.lat, d.attributes.lng);
d.fixed = true;
d.x = map.latLngToLayerPoint(d.LatLng).x;
d.y = map.latLngToLayerPoint(d.LatLng).y;
}
})

Now to setup the links. As we are keyed on attributes and not an index value, we need to follow this fix:


var edges = [];
json.edges.forEach(function(e) {
var sourceNode = json.nodes.filter(function(n) {
return n.id === e.source;
})[0],
targetNode = json.nodes.filter(function(n) {
return n.id === e.target;
})[0];

edges.push({
source: sourceNode,
target: targetNode,
value: e.Value
});
});

var link = svg.selectAll(".link")
.data(edges)
.enter().append("line")
.attr("class", "link");

Now to setup the nodes. I wanted to do a popup on a mouseclick event, but for some reason this is not firing (mousedown and mouseover do work, however). The following code builds the nodes, with radii, fill, and other information pulled from the JSON file. It also toggles a div that is populated with attribute information from the JSON. There is still some work to do at this part: the .css needs to be cleaned up, images need to be resized, and the attribute information for the nodes should be a configurable option when importing the JSON.


var node = svg.selectAll(".node")
.data(json.nodes)
.enter().append("circle")
//display nodes and information when a node is clicked on
//for some reason the click event is not registering, but mousedown and mouseover are.
.on("mouseover", function(d) {

//put in blank values if there are no attributes
var titleForBox, imageForBox, descriptionForBox = '';
titleForBox = '
<h1>' + d.label + '</h1>

';

if (typeof d.attributes.Description != "undefined") {
descriptionForBox = d.attributes.Description;
} else {
descriptionForBox = '';
}

if (typeof d.attributes.image != "undefined") {
imageForBox = '<img src="' + d.attributes.image + '" align="left">';
} else {
imageForBox = '';
}

var htmlForBox = imageForBox + ' ' + titleForBox + descriptionForBox;
document.getElementById("attributepane").innerHTML = htmlForBox;
toggle_visibility('attributepane');
})
.style("stroke", "black")
.style("opacity", .6)
.attr("r", function(d) {
return d.size * 2;
})
.style("fill", function(d) {
return d.color;
})
.call(force.drag);

Now for the transformations when the map state changes. The idea is to keep the fixed nodes in the correct place, but to redraw the “floating” nodes when the map is zoomed in and out. The nodes that need to be transformed are dealt with first, then the links are rebuilt with the new (or fixed) x / y data.


//for when the map changes viewpoint
map.on("viewreset", update);
update();

function update() {

node.attr("transform",
function(d) {
if (d.fixed == true) {
d.x = map.latLngToLayerPoint(d.LatLng).x;
d.y = map.latLngToLayerPoint(d.LatLng).y;
return "translate(" +
map.latLngToLayerPoint(d.LatLng).x + "," +
map.latLngToLayerPoint(d.LatLng).y + ")";
}
}
);

link.attr("x1", function(d) {
return d.source.x;
})
.attr("y1", function(d) {
return d.source.y;
})
.attr("x2", function(d) {
return d.target.x;
})
.attr("y2", function(d) {
return d.target.y;
});

node.attr("cx", function(d) {
if (d.fixed == false) {
return d.x;
}
})
.attr("cy", function(d) {
if (d.fixed == false) {
return d.y;
}
})

//this kickstarts the simulation, so the nodes will realign to a zoomed state
force.start();
}

Next, time to start the simulation for the first time and close out the d3 json block:


force
.links(edges)
.nodes(json.nodes)
.start();
force.on("tick", update);

}); //end

Finally, time to put a function in to toggle the visibility of the div (from here) and close out our file:


function toggle_visibility(id) {
var e = document.getElementById(id);
if (e.style.display == 'block')
e.style.display = 'none';
else
e.style.display = 'block';
}
</script>
</body>

There you have it- a nice, interactive map with a mix of geographic information and social networks. While I am pleased with the result, there are still some things to fix / address:

  1. The click even not working. This is a real puzzler.
  2. Tweaking the distances of the simulation – I do not want nodes to be placed half a world away from their connections. This may have to be map zoom level dependent.
  3. Style the links according to Gephi and provide popups where applicable. This should be easy enough to do, but simply hasn’t been done in this code.
  4. Tweak the visibility of the connections and nodes. While retaining an option to show the entire network at once, my idea is to have a map that starts out with JUST the locations, and then makes the nodes that are connected to that location visible when you click on it (which would also apply to the unlocated nodes – i.e. you see what they are connected to when you click on them).
  5. Connected to the above point, the implementation of a slider to show nodes in a particular timeframe. As my data spans a period from the 600s BCE to the 200s CE, this would provide a better snapshot of a particular network at a particular time.
  6. Implement a URI based system – you will be able to go to address/someEntityName and that entity will be selected with its information pane and connected neighbors displayed. This will result in an RDF file that will be sent to the Pelagios Project.
  7. Fix up the .css for the information pane.

I will detail further steps in a later post.

1 of N: Gephi, D3.js, and maps

Update (11/12/15): See this post to integrate the following code with leaflet.

After finding no real way to use background maps with SigmaJs, I stumbled on this example of combining leaflet with D3.jshttp://bost.ocks.org/mike/leaflet/. The example is more closely aligned with what I want to achieve, which is using a display library to show a social network that respects / interacts with underlying geography. This would be a very valuable visualization for both TBib/BAM and my own work on garrisons, and completing it will allow me to get back to other tasks, like pounding out Greek inscriptions.

For this work I am not tied to Gephi, but I do like its interface and low learning curve, which is valuable for pedagogical and collaborative use. So, my first order of business is getting a Gephi project to talk nicely with D3.js. There is, of course, a nice example already in the wild: http://bl.ocks.org/susielu/9526340. However, this presented some serious problems, which I will outline to (hopefully!) help others who may be going down this path. So, refer back to http://bl.ocks.org/susielu/9526340 for the code template – what follows below are additions / modifications.

geo-attemprFor this project, I want to recreate the image to the right, which was created in Gephi. If you read my previous post on this topic, this image uses a geo-layout plugin to place locations from Pleiades in their correct geographic placement, then uses other layouts to place the people and other non locatable nodes. The eventual goal is to make an interactive network map above an interactive geographic map, so simply exporting these out as a flat svg file will not provide the functionality I need.

My first attempt to simply plug in my own data met with disaster. First, I got hit with an “Uncaught TypeError: Cannot read property ‘weight’ of undefined” error and absolutely no graph. Looking into it, I noticed that the example assumed that nodes would be referenced by their position in an index, NOT by their own id.


 var links = json.edges.map(function(d){
 return {
 'source': parseInt(d.source),
 'target': parseInt(d.target)
 }
 })

My linkages use a unique ID text attribute, which plays havoc with this function. However, this seems like a simple fix: simply remove the parseInt() function, and the actual linkages should work.


var links = json.edges.map(function(d){
 return {
 'source': d.source,
 'target': d.target
 }
 })

netminusnetGetting closer: I see a network graph….only minus the network. Yikes. So, what is going wrong?

It seems that linking nodes by attribute instead of index is a somewhat common problem in D3.js, with a good solution here: http://stackoverflow.com/questions/23986466/d3-force-layout-linking-nodes-by-name-instead-of-index. Following this example, I modified my code by adding the following:


var edges = [];
links.forEach(function(e) {
// Get the source and target nodes
var sourceNode = nodes.filter(function(n) { return n.id === e.source; })[0],
targetNode = nodes.filter(function(n) { return n.id === e.target; })[0];

// Add the edge to the array
edges.push({source: sourceNode, target: targetNode});
});

...

var force = d3.layout.force()
.nodes(nodes)
.links(edges)

...

var link = svg.selectAll(".link")
.data(edges)

workingFinally, the links show! The nodes, however, are of a uniform size. I want the nodes to reflect their size in Gephi. Luckily this was an easy fix: adding


.attr("r", function(d) { return d.size * 3; })

to

 node.append("svg:circle") 

did the trick. I also wanted to add colors from Gephi – the following code does so (with a conversion from RGB to hex provided by http://stackoverflow.com/questions/13070054/convert-rgb-strings-to-hex-in-javascript) :


var a = d.color.split("(")[1].split(")")[0];
a = a.split(",");

var b = a.map(function(x){ //For each array element
 x = parseInt(x).toString(16); //Convert to a base16 string
 return (x.length==1) ? "0"+x : x; //Add zero if we get only one character
})

b = "#"+b.join("");

 
 return {
 'id' : d.id,
 'x' : d.x,
 'y' : d.y,
 'fixed': true,
 'label' : d.label,
 'size' : d.size,
 'color' : b,
 }
 
 })

and


.style("fill", function (d) { return d.color; })

added to


node.append("svg:circle")

onemoreproblemThis produces a graph that looks correct except for one MAJOR problem: It seems the Y axis is inverted from the original! This is obviously not acceptable if I am trying to capture actual coordinates for a map. All is not lost: I do remember this being a problem in the SigmaJS exporter. A fix is provided here: https://github.com/oxfordinternetinstitute/gephi-plugins/issues/5#issuecomment-22291683. For me, this was as simple as adding the following code:


finalY = -d.y;
return {
'id' : d.id,
'x' : d.x,
'y' : finalY,
'fixed': true,
'label' : d.label,
'size' : d.size,
'color' : b,
}

})

to the

  var nodes = json.nodes.map(function(d)

block.

inorderThe next task will be to finalize some functionality for the D3.js portion of the graph, then on to integrating the whole mess with leaflet. Then, when I have all of this in order, time to re-write it to accept all manner of different inputs / etc for BAM. More on both of these ideas later.

Quick and Dirty Footnotes For Gephi / SigmaJS

Before I begin, I once again want to recognize the excellent SigmaJS Exporter plugin for Gephi. This really does mitigate a lot of the grunt work involved in quickly making a usable, interactive social network graph. However, sometimes you just want another feature or some further refinement – in my case adding workable footnotes to information on each node.

For those of us in the humanities, citations are sine qua non for scholarship. However, there are few good was to maintain linkable citations on the web that are not hardcoded beforehand, or reliant on javascript trickery. What I wanted to do was find a tool or a method to easily move text and citations contained in my dissertation to a description field in a Gephi-based application without manually entering footnotes, footnote numbers, or linking them myself, as I have over 2,000 footnotes to deal with.

What I found is a bit of a hack, and certainly can be improved, but it works. First, you are going to want to have your document in a format that is readable by OpenOffice / LibreOffice / etc. What you need to do is select the bit of text you are interested in, dump it into a new file (making sure to include your footnotes!) and then export that file as XHTML.

export-textOnce this is complete, you will have a lovely, fully encapsulated xml file of your text – including all formatting, footnotes, etc. However, we want to eliminate some of the elements produced by this process. Open this file in your favorite text editor. You will notice that you have code similar to the following at the top of the document:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN" "http://www.w3.org/Math/DTD/mathml2/xhtml-math11-f.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<!--This file was converted to xhtml by LibreOffice - see http://cgit.freedesktop.org/libreoffice/core/tree/filter/source/xslt for the code.-->
<head profile="http://dublincore.org/documents/dcmi-terms/">
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
<title xml:lang="en-US">- no title specified</title>
<meta name="DCTERMS.title" content="" xml:lang="en-US"/>
<meta name="DCTERMS.language" content="en-US" scheme="DCTERMS.RFC4646"/>
<meta name="DCTERMS.source" content="http://xml.openoffice.org/odf2xhtml"/>
<meta name="DCTERMS.issued" content="2015-09-26T21:09:50.283345000" scheme="DCTERMS.W3CDTF"/>
<meta name="DCTERMS.modified" content="2015-09-26T21:23:01.904161000" scheme="DCTERMS.W3CDTF"/>
<meta name="DCTERMS.provenance" content="" xml:lang="en-US"/>
<meta name="DCTERMS.subject" content="," xml:lang="en-US"/>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" hreflang="en"/>
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" hreflang="en"/>
<link rel="schema.DCTYPE" href="http://purl.org/dc/dcmitype/" hreflang="en"/>
<link rel="schema.DCAM" href="http://purl.org/dc/dcam/" hreflang="en"/>

This can all be eliminated. Make sure you retain the




<style type="text/css">

tag at the end of the line.

Likewise, delete this from the start of the last line:

</head><body dir="ltr" style="max-width:8.5in;margin-top:0.7874in; margin-bottom:0.7874in; margin-left:0.7874in; margin-right:0.7874in; writing-mode:lr-tb; ">

and this from the end of the last line:

</body></html>

Now, simply paste what is left into a field in your Gephi data.

Gephi with footnotes

Export as usual, and viola! you have your clickable, interactive footnotes.

footnote-gephi

Now, this is good for a quick and dirty solution, but it would require a modification of the json datafile if you ever make a change or wish to add more information (or, even worse, a re-export of your entire network). As such, this solution will not be used for BAM, as we are seeking a more flexible and modifiable code base.

Networks, Geography, and Gephi: Lots of Promise, but Lots of Work to be Done

This post will outline some of my efforts to bring social networks into dialog with geography. Although I have found some interesting plugins and hacks, the results still leave something to be desired.


Screen Shot 2015-10-07 at 1.40.46 PMTo provide some background: From my dissertation I have a nice, interactive map of all garrisons (phrourai, in orange), and garrison commanders (phrourarchoi, in white) from all of Greek sources up to the mid second century C.E. This is all nicely georeferenced, linked to other projects such as Pleiades and Pelagios, and serves its purpose pretty well. However, this provides the location and frequency of garrisons and commanders, and does not really show the social network that developed between commanders, monarchs, and communities. I could perhaps use a clustering strategy to create dynamic markers around specific points, but that seems to be a very unwieldy solution.

Strictly speaking, by modeling people (phrourarchoi, monarchs) with places and abstract communities I am moving beyond a social network and instead looking at an information network, as I am interested in a number of different connections (social, geographic, ideological) that are not traditionally associated with social network analysis.

The first step to get all of my data into Gephi, assign different “types” to my nodes (in my case people, offices, places, phrourarchoi). I then created a network map, ran statistics, assigned the node size based on degree, and ran a force atlas layout. At the same time I also color coded the network based on type. This is all pretty basic Gephi use so far, and produced a perfectly serviceable network graph.

degree
First graph. Pretty basic and serviceable.

Now it was time to experiment with different types of ranking. Betweenness centrality, or the measure of a node’s influence, led to an interesting difference in graphs:

Untitled
Betweenness Graph. Note the increased importance of individuals.

However, this result is somewhat meaningless, as my graph covers a period from the 400s BCE to the 100s CE. Despite any of his wishes to the contrary, Ptolemy VIII did not live forever, yet he is the unquestioned central authority of this graph. All of the other Egyptian monarchs also score highly, underlining their importance in the communications and relationships between different phrourarchoi. This is an interesting yet hardly unsurprising finding – a good portion of the surviving data on phroruarchoi originates from Ptolemaic Egypt, which may inflate the relative importance of the dynasty in this kind of analysis. What this map does show is the enormous influence of individuals – most of whom were not phrourarchoi themselves.

However, I am interested in garrisons as a sustained phenomena across several centuries, so I want to get back to the importance of location and geography on garrisons. In other words: Where are the most important locations for phrourarchoi, and how do those relate to one another?

Running an Eigenvector Centrality measurement produces a graph that somewhat mimics my original map, with physical locations, not people as the most significant authorities. This gives a better impression of what I am looking for – the centrality of a node relative to the whole network, which in my case privileges locations, which often serve as a bridge between different populations of nodes.

egienvector
Eigenvector Centrality

To me this is an interesting graph: It shows the importance of locations, while still highlighting important individuals. Now that I have this graph, I would love to place it on a map. I actually have coordinates for all of the locations, so a simple use of the Gephi GeoLayout plugin puts all of my identified places in a rough geographic layout.

Screen Shot 2015-10-07 at 12.48.40 PM

From here I simply fixed the location of the places, then ran some other layouts to try and make a coherent graph of people and offices that did not have a specific geographic value.The results were generally less than satisfying. The individuals in my dataset are not assigned coordinates because it would make little sense to do so – some phroruarchoi served in multiple locations, and almost all imperial phrourarchoi served outside their place of origin, were buried somewhere else, possibly lived in yet another location, etc.

geo
Force Atlas combined with GeoLayout
circle
Force atlas and Fruchterman-Reingold
geo-attempr
Adjusting the size of the nodes and running force atlas eventually produced  a result that looks more comprehensible, if a bit small.

From this step, I thought I would try out some Gephi plugins to push my data into a format I could drop onto a map. Only a very small percentage of my nodes actually contain geographic information, so the ExportToEarth plugin was not going to help. My first attempt at pushing out a shapefile using Export to SHP initially looked like a success in QGIS:

Screen Shot 2015-10-07 at 1.43.16 PM
This looks promising…

So, I decided to throw in some background, and that is when the trouble started. QGIS does a good job of transforming coordinates, but this was just messy (and not to mention wrong – there certainly were no phrourarchoi in Antarctica!)

Screen Shot 2015-10-07 at 1.35.59 PM
Note how the nodes are now literally all over the map.

So, what happened? If you do not have coordinates already explicitly assigned to your data, Export to SHP actually does not use “geographic” coordinates, and instead uses, in the words of the plugin, “fake geography – that is the current position of the nodes in the Gephi layout”. My thought that this position would line up with correct coordinates fromGeoLayout were false –Export to SHP treats the middle of the map as an origin point (instead of using whatever geographic data is present), and as such it does not match with any projection in QGIS.

This is a bit of a let down. It seems that all of mapping plugins in Gephi need for *ALL* of the nodes to have geographic information already baked in, or they will not export a geographically accurate map. This does make some sense, but it would be nice if you could use GeoLayout to place nodes with actual geographic data, then use force atlas or some other layout to produce a graph, and finally use the location of those nodes as coordinates. In other words, the location of nodes on the graph that have no actual geographic data of their own are located relative to nodes that do have geographic data. I tried the Sigmajs exporter, but the json object also does not use real coordinates, as seen in the fragment below ( lng and lat are the real-world coordinates, while x and are used by SigmaJS):


"label":"Priene/‘Lince’?",
"x":-22.65546226501465,
"y":32.66741943359375,
"id":"Pl_599905",
"attributes":{
...
"lng":"27.297566084106442",
"lat":"37.659724652604169",
...},

So, is there a way around this?

Short of writing a new plugin to do so, it looks like Gephi is simply missing the functionality of assigning geographic points to nodes that do not already have that information, then exporting that graph in a way that makes sense to mapping software. I could export an image and georeference that, but that will not provide the functionality I am looking for either.

What I would like is for a graph produced by Gephi to use coordinates for nodes that have them, and make real world coordinates for nodes that do not. This map could then be placed on Leaflet / OpenLayers / whatever map, providing a level of interaction beyond a static image. As it is impracticable to duplicate the functionality (especially the statistical tools and layouts) of Gephi in a mapping application, this strikes me as something that would be very valuable to visualization and study.

My next idea is to see if R has something close to what I want, which I will detail in a future post.