CASOS Produced Datasets


a2c2_team_d_scuds

Description:

The a2c2_team_d_scuds dataset. This data was originally made publicly available from CASOS in May 2007.

File Downloads:

How to Cite:

Carley, Kathleen M. (2014). ORA: A Toolkit for Dynamic Network Analysis and Visualization, In Reda Alhajj and Jon Rokne (Eds.) Encyclopedia of Social Network Analysis and Mining, Springer. Retrieved from http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php


a2c2_team_d_no_scuds

Description:

The a2c2_team_d_no_scuds dataset. This data was originally made publicly available from CASOS in May 2007.

File Downloads:

How to Cite:

Carley, Kathleen M. (2014). ORA: A Toolkit for Dynamic Network Analysis and Visualization, In Reda Alhajj and Jon Rokne (Eds.) Encyclopedia of Social Network Analysis and Mining, Springer. Retrieved from http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php


Company

Description:

The Company dataset. This data was originally made publicly available from CASOS in May 2007.

File Downloads:

How to Cite:

Carley, Kathleen M. (2014). ORA: A Toolkit for Dynamic Network Analysis and Visualization, In Reda Alhajj and Jon Rokne (Eds.) Encyclopedia of Social Network Analysis and Mining, Springer. Retrieved from http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php


Hatfield-McCoy dataset

Description

This dataset contains a list of individuals (agents) involved in the Hatfield-McCoy feud between two rural families who lived in the border area between the US states of West Virginia and Kentucky along the Tug Fork of the Big Sandy River in the years 1863–1891. The list of agents is augmented by multiple attributes describing agent affiliation, participation and associations.

The dataset was constructed based on the family webpages and Wikipedia and is presented in two forms: as an Excel spreadsheet and as an ORA DyNetML nodeset.

Networks can be constructed using ORA based on these attributes but the dataset itself contains only the agent list.

The nodeset is encapsulated as part of a meta-network, per DyNetML convention. The following attributes are associated with the Agent nodeset: Node Name, Node Title, Devil Anse Kids, Harmed Hatfield, Harmed McCoy, Hatfield, Intermarried, Killed due to Feud, McCoy, Randolph Kids, female, male, man, woman.
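As a rough illustration of how a network might be derived from the attribute list outside ORA, the sketch below links agents that share an attribute. The attribute names (Hatfield, McCoy, Intermarried) are taken from the list above; the agent IDs and values are hypothetical placeholders, not rows from the dataset, and this is not the procedure ORA itself uses.

```python
from itertools import combinations

# Hypothetical rows standing in for the Agent nodeset. The attribute names
# come from the description above; the agent IDs and 0/1 values do not.
agents = {
    "agent_1": {"Hatfield": 1, "McCoy": 0, "Intermarried": 0},
    "agent_2": {"Hatfield": 1, "McCoy": 0, "Intermarried": 1},
    "agent_3": {"Hatfield": 0, "McCoy": 1, "Intermarried": 1},
}

def shared_attribute_edges(agents):
    """Map each agent pair to the list of attributes both have set to 1."""
    edges = {}
    for a, b in combinations(sorted(agents), 2):
        shared = [k for k, v in agents[a].items() if v and agents[b].get(k)]
        if shared:
            edges[(a, b)] = shared
    return edges

edges = shared_attribute_edges(agents)
for pair, shared in edges.items():
    print(pair, shared)
```

Pairs with no shared attribute are simply left out, so the result is an undirected agent-by-agent edge list labeled by the attributes that produced each link.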

File Downloads


How to Cite:

Carley, Kathleen M., Morgan, Geoffrey M., & Levine, Joel. (2017). Socio-cultural Cognitive Mapping. Carnegie Mellon University, School of Computer Science, Institute for Software Research, Technical Report, CMU-ISR-17-115.


On August 7, 1998, truck bombs were detonated nearly simultaneously in front of the United States embassies in Nairobi, Kenya and Dar es Salaam, Tanzania. There were 224 fatalities, and more than 4,000 wounded, the vast majority of the fatalities being local citizens but including 12 Americans. Most of the casualties were incurred in Nairobi where the embassy was located in a densely built-up area. The subsequent investigation traced both explosions to members of the Egyptian Islamic Jihad and brought the terrorist organization al-Qaeda, led by Osama bin Laden, to public attention.

The datasets in this section are based on news accounts and Wikipedia entries.

Tanzania Embassy CT

Description

This dataset contains five meta-networks with similar structure describing the plot to bomb the US embassy in Tanzania at different periods in time. Files are in separate ZIP archives.

The time periods bracket the embassy bombing event:

  • Periods 1 and 2 - Network growth during preparation phase.
  • Period 3 - Bombing event.
  • Periods 4 and 5 - Network attrition after event.

How to Cite:

Carley, Kathleen M. (2014). ORA: A Toolkit for Dynamic Network Analysis and Visualization, In Reda Alhajj and Jon Rokne (Eds.) Encyclopedia of Social Network Analysis and Mining, Springer. Retrieved from http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php

Tanzania Embassy Bombing - 2009

Description

These are two different versions of the Tanzania embassy bombing, showing movement over time. One (tanzania_3_2009b_trails.xml) is specifically for trails: things move across time, agents only have one location at a time. In this meta-network, only the location edges are changing. All other information is known from the start.

The other (tanzania_3_2009b_learning.xml) is based on a learning/surveillance scenario, where more and more information is slowly learned about the target. In this one, you start out with only a little information about the targets and slowly the meta-network is built up over time. This dataset has trails, too, but they tend to be shorter.

File Downloads


How to Cite:

CASOS Center, Institute for Software Research, Carnegie Mellon University. (2009). Tanzania Embassy Bombing 2009 data set [Data set]. Retrieved from http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php

Tanzania Embassy Bombing - 2006

Description

The Tanzania Embassy Bombing - 2006 dataset. An expanded and more comprehensive version of the Tanzania Kenya dataset developed by Il-Chul Moon. The dataset covers the attrition of the original network.

The networks include the following sequence of events:

  • Tanzania_3 - Tanzania bombing event and precursor terror events.
  • Tanzania 4 - Adds bombing of USS Cole on October 12, 2000.
  • Tanzania 5 - Adds 9/11 attacks in the United States (September 11, 2001).

File Downloads

Original DyNet Format

How to Cite:

CASOS Center, Institute for Software Research, Carnegie Mellon University. (2008). Tanzania Embassy Bombing 2006 data set [Data set]. Retrieved from http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php

Tanzania Kenya

Description

Data based on the Tanzania - Kenya bombing. A small dataset showing key actors in the two bombings, good for demonstration purposes but incomplete in terms of incorporating all actors. Developed by Il-Chul Moon and used for his dissertation.

File Downloads

References:

  • Carley, Kathleen M. (2014). ORA: A Toolkit for Dynamic Network Analysis and Visualization, In Reda Alhajj and Jon Rokne (Eds.) Encyclopedia of Social Network Analysis and Mining, Springer. Retrieved from http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php
  • Moon, Il-Chul. (2008). Destabilization of Adversarial Organizations with Strategic Interventions. Carnegie Mellon University, School of Computer Science, Institute for Software Research, Computation, Organizations and Society, Doctor of Philosophy. http://www.casos.cs.cmu.edu/publications/papers/CMU-ISR-08-124.pdf.

How to Cite:

Moon, Il-Chul. (2008). Destabilization of Adversarial Organizations with Strategic Interventions. Carnegie Mellon University, School of Computer Science, Institute for Software Research, Computation, Organizations and Society, Doctor of Philosophy. http://www.casos.cs.cmu.edu/publications/papers/CMU-ISR-08-124.pdf.


Glitch

Description

This report presents a collection of data describing many of the economic and social interactions that occurred in Glitch, a recently closed massively multiplayer online game aimed at casual players. The bulk of the data are records of sales of virtual goods by different players during the game's life, stored as CSV files. The records depict a history of the Currant, the game's principal currency. The data also includes both a snowball-sample based scrape of all of the explicit friendships between different players (where friendships are minimally defined relationships that players may create for any reason) and a scrape of the seven different Glitch forums. This scrape is presented as both raw HTML and in a minimally parsed format sufficient for generating a basic network of posters and responders, while poster and responder networks are provided in DyNetML format. This data was used and collected in Tech Report, CMU-ISR-13-100.

The data is divided into two sections based on format: Network DyNetML files are contained in the "DyNetML files" directory, while CSV files and raw HTML are contained in the "CSV and HTML files" directory. The directory tree is as follows:

  • CSV and HTML files
    • economicData
      • allGlitchAuctions.csv - All of the auctions carried out during Glitch's life that were successfully scraped via the API.
      • allGlitchSDBsales.csv - All SDB-based sales carried out during Glitch's life that were successfully scraped via the API.
      • glitchStreetPricesDecember.csv - The official street price of every item listed in the official Glitch Encyclopedia in December, 2012. (The game includes many items not present in the encyclopedia.)
    • forumDigests
      • announcementsDigest.csv - Digest of all of the posts and responses in the Announcements Forum.
      • bugsDigest.csv - Digest of all of the posts and responses in the Bugs Forum.
      • developerDigest.csv - Digest of all of the posts and responses in the Developer Forum.
      • generalDigest.csv - Digest of all of the posts and responses in the General Forum.
      • ideasDigest.csv - Digest of all of the posts and responses in the Ideas Forum.
      • marketplaceDigest.csv - Digest of all of the posts and responses in the Marketplace Forum.
      • offtopicDigest.csv - Digest of all of the posts and responses in the Offtopic Forum.
    • forumsRaw
      • announcementsThreads (directory) - HTML scrape of all threads posted to the Announcements Forum.
      • bugsThreads (directory) - HTML scrape of all threads posted to the Bugs Forum.
      • developerThreads (directory) - HTML scrape of all threads posted to the Developer Forum.
      • generalThreads (directory) - HTML scrape of all threads posted to the General Forum.
      • ideasThreads (directory) - HTML scrape of all threads posted to the Ideas Forum.
      • marketplaceThreads (directory) - HTML scrape of all threads posted to the Marketplace Forum.
      • offtopicThreads (directory) - HTML scrape of all threads posted to the Off Topic Forum.
    • playerFriendships.csv - A CSV file of all explicit player friendships in Glitch. These connections are not necessarily mutual.
  • DyNetML files
    • Player Friendships.xml - A DyNetML network of all explicit player friendships in Glitch. These connections are not necessarily mutual.
    • forumNetworks
      • Announcements Network.xml - DyNetML network linking posters, repliers, posts, and replies in the Announcements Forum. Contained in two nodesets, Posts and Replies.
      • Bugs Network.xml - DyNetML network linking posters, repliers, posts, and replies in the Bugs Forum. Contained in two nodesets, Posts and Replies.
      • Developer Network.xml - DyNetML network linking posters, repliers, posts, and replies in the Developer Forum. Contained in two nodesets, Posts and Replies.
      • General Network.xml - DyNetML network linking posters, repliers, posts, and replies in the General Forum. Contained in two nodesets, Posts and Replies.
      • Ideas Network.xml - DyNetML network linking posters, repliers, posts, and replies in the Ideas Forum. Contained in two nodesets, Posts and Replies.
      • Marketplace Network.xml - DyNetML network linking posters, repliers, posts, and replies in the Marketplace Forum. Contained in two nodesets, Posts and Replies.
      • Offtopic Network.xml - DyNetML network linking posters, repliers, posts, and replies in the Off Topic Forum. Contained in two nodesets, Posts and Replies.
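Because the friendship connections are not necessarily mutual, a first step in analysis is often to separate reciprocated ties from one-way ties. The sketch below shows one way to do that on a directed edge list. The player IDs are hypothetical, and the column layout of playerFriendships.csv is not documented here, so loading the file is left out.

```python
def split_by_reciprocity(directed_edges):
    """Split directed (source, target) pairs into mutual and one-way ties."""
    edge_set = set(directed_edges)
    # A tie is mutual when the reverse edge is also present; each mutual
    # pair is stored once, in sorted order. Self-loops are ignored.
    mutual = {
        tuple(sorted(edge))
        for edge in edge_set
        if edge[0] != edge[1] and (edge[1], edge[0]) in edge_set
    }
    one_way = {edge for edge in edge_set if (edge[1], edge[0]) not in edge_set}
    return mutual, one_way

# Hypothetical edges, not taken from the dataset.
example_edges = [("p1", "p2"), ("p2", "p1"), ("p1", "p3")]
mutual, one_way = split_by_reciprocity(example_edges)
print(mutual, one_way)
```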

File Downloads

References

Landwehr, Peter. (2013). A Collection of Economic and Social Data from Glitch, a Massively Multiplayer Online Game. Carnegie Mellon University, School of Computer Science, Institute for Software Research, Technical Report, CMU-ISR-13-100.

How to Cite:

Landwehr, Peter. (2013). A Collection of Economic and Social Data from Glitch, a Massively Multiplayer Online Game. Carnegie Mellon University, School of Computer Science, Institute for Software Research, Technical Report, CMU-ISR-13-100. Retrieved from http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php


kc_node_aa

Description

The kc_node_aa dataset.

File Downloads

How to Cite:

Carley, Kathleen M. (2014). ORA: A Toolkit for Dynamic Network Analysis and Visualization, In Reda Alhajj and Jon Rokne (Eds.) Encyclopedia of Social Network Analysis and Mining, Springer. Retrieved from http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php


kc_node_af

Description

The kc_node_af dataset.

File Downloads

How to Cite:

Carley, Kathleen M. (2014). ORA: A Toolkit for Dynamic Network Analysis and Visualization, In Reda Alhajj and Jon Rokne (Eds.) Encyclopedia of Social Network Analysis and Mining, Springer. Retrieved from http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php


Sci-Fi Books

Description

This data was collected by Dr. Kathleen M. Carley. Dataset does not contain a pre-built network. The attributes (which can also be viewed as knowledge nodes) are:

  • frequency - the number of lists of top N science fiction books that these appeared in circa 2016
  • date - the date first published; if it is a series, date listed is the date first book was published
  • author - author of the book or series
  • century - which century it was written in; 1 = prior to 1900, 2 = prior to 2000, 3 = prior to 2100
  • quarter century - 1 = 1800 to 1824; 2 = 1825 to 1849; 3 = 1850 to 1874; 4 = 1875 to 1899; 5 = 1900 to 1924; 6 = 1925 to 1949; 7 = 1950 to 1974; 8 = 1975 to 1999; 9 = 2000 to 2024
  • author gender - 1 = male; 2 = female
  • content of story - for each of the following, ratings were given about the content of the story. A 4-point scale was used and each book was read by 2 people, with disagreements argued until agreement was reached; 0 = that was not present; 1 = that was present but peripheral; 2 = that was present at a stronger level but not strongly integral to the story; 3 = that was present, strong and integral to the story.
    • robots, androids or AI computers
    • battles
    • romance
    • magic
    • time travel
    • interplanetary
    • multi-species, sentient species
    • beasts
    • psychic powers
    • novel technology (not AIish, ex. steam based technology is considered novel)
    • after catastrophe - often post-apocalyptic
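The century and quarter century codes above can be derived directly from the publication year. The following is a minimal sketch of that mapping, assuming the date attribute is available as a four-digit year; the function names are illustrative and are not part of the dataset.

```python
def century_code(year: int) -> int:
    """1 = prior to 1900, 2 = prior to 2000, 3 = prior to 2100."""
    if year < 1900:
        return 1
    if year < 2000:
        return 2
    if year < 2100:
        return 3
    raise ValueError(f"year {year} is outside the coded range")

def quarter_century_code(year: int) -> int:
    """1 = 1800 to 1824, 2 = 1825 to 1849, ..., 9 = 2000 to 2024."""
    if not 1800 <= year <= 2024:
        raise ValueError(f"year {year} is outside the coded range")
    return (year - 1800) // 25 + 1

print(century_code(1950), quarter_century_code(1950))
```

Years before 1800 have a century code but no quarter century code in the scheme as listed, so the second function rejects them rather than guessing.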

File Downloads

How to Cite:

Kathleen M. Carley (2017). Sci-Fi Books data set [Data set]. Retrieved from http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php


Syria in the News

Description

This data was collected from the web and indicates features of the various groups in Syria. These are the same groups as in another Syrian data set, so it can be used for comparison. The data is binary: a 1 indicates that a feature is present and a 0 indicates that it is not. The dataset does not contain a pre-built network.

File Downloads

How to Cite:

Kathleen M. Carley (2017). Syria in the news data set [Data set]. Retrieved from http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php


Startup Company

Description

The Startup Company dataset hand coded by Dr. Kathleen M. Carley and company Netanomics. This is based on ethnographic observations of a CMU based startup, May 2012.

File Downloads

How to Cite:

Carley, Kathleen M. (2014). ORA: A Toolkit for Dynamic Network Analysis and Visualization, In Reda Alhajj and Jon Rokne (Eds.) Encyclopedia of Social Network Analysis and Mining, Springer. Retrieved from http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php


Welsh Canals

Description

A single input file in ORA's advanced table format encodes geographical, location and time information relating to the network of canals in Wales (as they were described in Wikipedia in 2016). The data covers the canal network's evolution from 1790 to 2016. In several cases, individual canals were closed as commercial enterprises but have been restored and reopened for recreational use causing canals to appear, disappear and then reappear when viewed over time.

As described in the ORA help entry Import Advanced Table the file can be used to create a dynamic meta-network of the Welsh canal network over time. In this encoding, a series of geographic places form the nodes which are connected by links representing the canals. Since the number of places is small, the resulting network is only approximate in a geographic sense but, like a subway map, provides an abstract representation of the network. The date information defines the duration of the link from completion to abandonment on a decade scale. (Interestingly, the choice of places mentioned in Wikipedia varies between articles, depending on the author's orientation and might link original canal features, towns and villages or modern transportation features.)

The dynamic meta-network produced by following the Import Advanced Table procedure in the ORA help is also provided.
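The idea of links that appear, disappear and reappear over time can be sketched as a set of open intervals per link, sampled once per decade. The example below is only an illustration of that encoding: the place names and years are hypothetical placeholders, and it does not read ORA's advanced table format.

```python
# Hypothetical link records: (place, place) -> list of (opened, closed) years.
# closed=None means the link is still open. None of these values come from
# the dataset.
links = {
    ("place_A", "place_B"): [(1800, 1930), (1990, None)],  # closed, later restored
    ("place_B", "place_C"): [(1810, None)],
}

def active_links(links, year):
    """Return the links that are open in the given year."""
    active = []
    for pair, spans in links.items():
        if any(start <= year and (end is None or year < end) for start, end in spans):
            active.append(pair)
    return sorted(active)

# One snapshot per decade, from 1790 through 2010.
snapshots = {decade: active_links(links, decade) for decade in range(1790, 2020, 10)}
print(snapshots[1950])
```

A restored canal is represented by a second interval on the same link, which is why the first link is absent from the mid-century snapshots and present again later.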

File Downloads

How to Cite:

Neal Altman (2016). Welsh Canals data set [Data set]. Retrieved from http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php


West Bank

Description

The West Bank dataset. This data was collected at CASOS. It is about six major terrorist groups that operate in the West Bank. These groups are the Al Aksa Martyrs Brigades, Al Fatah, Al Qaeda, Hamas, Hezbollah, and the Islamic Jihad. The 18 texts from which the networks were extracted were gathered from LexisNexis Academia via an exact-match Boolean keyword search for each of the groups. The media searched with LexisNexis were The Economist, The Washington Post, and The New York Times. The data set covers articles published from 2000 to 2003.

File Downloads

How to Cite:

Carley, Kathleen M. (2014). ORA: A Toolkit for Dynamic Network Analysis and Visualization, In Reda Alhajj and Jon Rokne (Eds.) Encyclopedia of Social Network Analysis and Mining, Springer. Retrieved from http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php


Stargate

Description

The Stargate dataset, based on the TV series Stargate.

Note: If using the Stargate dataset please refer to the ORA Users Guide.

File Downloads

How to Cite:

Carley, Kathleen M. (2014). ORA: A Toolkit for Dynamic Network Analysis and Visualization, In Reda Alhajj and Jon Rokne (Eds.) Encyclopedia of Social Network Analysis and Mining, Springer. Retrieved from http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php


Star Wars

Description

This dataset was created by Dave Columbus (CASOS) and Jon Storrick (CASOS) for use in the ORA Help documentation. The 24 separate meta-network files contain time information and can be visualized using the Loom analysis or combined into a dynamic meta-network.

File Downloads

How to Cite:

Carley, Kathleen M. (2014). ORA: A Toolkit for Dynamic Network Analysis and Visualization, In Reda Alhajj and Jon Rokne (Eds.) Encyclopedia of Social Network Analysis and Mining, Springer. Retrieved from http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php


OSN Threat Groups

Description

The following files are contained in this dataset:

  • CJTC_readme.txt - contains information about the data files.
  • attributes.tsv - contains node attribute information for users associated with the 2-hop snowball sample described in the work referenced below. The fields correspond to fields provided by the Twitter API. The file contains the following fields:
    • userID
    • ScreenName
    • followingCount
    • followerCount
    • tweetCount
    • lastTweet
    • creation_date
    • urlCount
    • mentionCount
  • friend_edgefile.xml - a directed network edge list of the following or friend ties associated with all nodes listed in attributes.tsv. Divided into 2 separate nodesets.
  • mention_edgefile.xml - a directed network edge list of the mention ties associated with all nodes listed in attributes.tsv. Additionally epoch time for each edge is provided in the 'date' field. Divided into 3 separate nodesets.
  • user_ht_edgefile.tsv - a bipartite network edge list of the user to hash tag ties associated with all nodes listed in attributes.tsv. Additionally epoch time for each edge is provided in the 'date' field. This data can be analyzed using R source code provided at: https://github.com/mbenigni/OSNThreatGroups.
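For readers working outside the R code linked above, the sketch below shows one way a tab-separated user-to-hashtag edge list with an epoch 'date' field could be read in Python. The header names userID and hashtag, the sample rows, and the assumption that the epoch values are in seconds are all hypothetical; check CJTC_readme.txt for the actual layout before relying on it.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical sample in the assumed layout; these rows are not from the dataset.
sample = (
    "userID\thashtag\tdate\n"
    "101\ttag_a\t1420070400\n"
    "101\ttag_b\t1420156800\n"
    "202\ttag_a\t1420070400\n"
)

def load_user_hashtag_edges(handle):
    """Return (user, hashtag, UTC datetime) triples from a TSV file handle."""
    edges = []
    for row in csv.DictReader(handle, delimiter="\t"):
        # Assumes the 'date' field holds epoch seconds.
        when = datetime.fromtimestamp(int(row["date"]), tz=timezone.utc)
        edges.append((row["userID"], row["hashtag"], when))
    return edges

edges = load_user_hashtag_edges(io.StringIO(sample))
users_by_tag = {}
for user, tag, _ in edges:
    users_by_tag.setdefault(tag, set()).add(user)
print(users_by_tag)
```

Passing an open file handle instead of the io.StringIO wrapper would read the real file, provided the header names match.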

File Downloads

References:

Matthew Benigni and Kathleen M. Carley, 2016, From Tweets to Intelligence: Understanding the Islamic Jihad Supporting Communities in Twitter, In Proceedings of the International Conference SBP-BRiMS 2016, Kevin S. Xu, David Reitter, Dongwon Lee and Nathaniel Osgood (Eds.) June 28-July 1, 2016 Washington DC, Springer, DOI: 10.1007/978-3-319-39931-7.

How to Cite:

Matthew Benigni and Kathleen M. Carley (2016) From Tweets to Intelligence: Understanding the Islamic Jihad Supporting Communities in Twitter, In Proceedings of the International Conference SBP-BRiMS 2016, Kevin S. Xu, David Reitter, Dongwon Lee and Nathaniel Osgood (Eds.) June 28-July 1, 2016 Washington DC, Springer, DOI: 10.1007/978-3-319-39931-7. Retrieved from http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php


OEC Detection

Description

This dataset was created by CASOS graduate student Matthew Benigni for the 2016 SBP-BRiMS Tutorial on Covert Groups.

File Downloads

How to Cite:

Benigni, Matthew (2016). Detection, Analysis, and Disruption of Online Extremist Communities [Tutorial Presentation]. 2016 SBP-BRiMS International Conference, Washington, DC. Retrieved from http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php


Good Flightpaths

Description:

The flightpaths datasets are nearly identical and were developed to serve as a tutorial on how importing data into the ORA GIS works. Contained in 3 separate nodesets.

Good Flightpaths showcases the proper form. Airport terminals are of node type Location, each has its own latitude and longitude, and the agents have Agent x Location connections, which ORA takes to be an "Agent is located at..." connection.

When you import data into the ORA GIS, it asks two questions in the backend: "Does this data have coordinate data (LatLon, MGRS, etc.)?" and "What nodesets have coordinate data?" If no coordinate data is found, you have to configure it. If it finds coordinate data ONLY in nodesets of type Location (as in Good Flightpaths), then all locations are placed, and any network that TARGETS a Location nodeset is said to have an "is located at" relationship. If it finds coordinate data in nodesets of ANY type besides Location, all nodes are placed according to their coordinates, and no assumptions are made about networks. Links are just links.
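The three cases described above can be summarized as a small decision function. This is only a paraphrase of the prose for illustration, not ORA's actual code; the function name, the return strings, and the nodeset names in the example are hypothetical.

```python
def gis_import_mode(nodesets):
    """Classify an import from a mapping of nodeset name -> (node type, has_coordinates)."""
    types_with_coords = [node_type for node_type, has_coords in nodesets.values() if has_coords]
    if not types_with_coords:
        # No coordinate data anywhere: the user has to configure it.
        return "configure manually"
    if all(node_type == "Location" for node_type in types_with_coords):
        # Coordinates only on Location nodesets: networks targeting them
        # are read as "is located at".
        return "place locations; links into Location nodesets mean 'is located at'"
    # Coordinates on some other node type: place nodes, assume nothing about links.
    return "place every node by its own coordinates; links are just links"

# Hypothetical nodesets for the two flightpaths cases.
good = {"Airports": ("Location", True), "Passengers": ("Agent", False)}
not_good = {"Airports": ("Resource", True), "Passengers": ("Agent", False)}
print(gis_import_mode(good))
print(gis_import_mode(not_good))
```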

File Downloads:

How to Cite:

Storrick, Jon. Good flightpaths [dataset]. Available from: http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php.


Not Good Flightpaths

Description:

The flightpaths datasets are nearly identical and were developed to serve as a tutorial on how importing data into the ORA GIS works.

Not Good Flightpaths is mostly the same as the Good Flightpaths dataset, but the nodesets with geographic data aren't of type Location. Not Good Flightpaths, at the time it was developed, would not load in the ORA Visualizer at all. Using it was an exercise in showing users how to configure non-formatted data to load it into GIS. In current versions of ORA, the dataset will load and display without error.

File Downloads:

How to Cite:

Storrick, Jon. Not Good flightpaths [dataset]. Available from: http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php.


Marvel Phase 1

Description:

This dataset was created by Netanomics staff and is based on early Marvel superhero comics. Created June 2016.

File Downloads:

How to Cite:

Netanomics. Marvel Phase 1 [dataset]. Available from: http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php.


PR1

Description:

PR1 is a random network for larger data demonstrations.

File Downloads:

How to Cite:

Storrick, Jon. PR1 [dataset]. Available from: http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php.


Raiders of the Lost Ark

Description:

Based on the movie Raiders of the Lost Ark (1981), directed by Steven Spielberg. Encodes the location of the characters for each time interval (where the location of characters is known). All meta-networks contain the same number of nodesets and nodes; 756 nodes in total are contained in the dynamic meta-network as a whole.

File Downloads:

How to Cite:

CASOS Center. Raiders of the Lost Ark [dataset]. Available from: http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php.


Valkyria Chronicles

Description:

Valkyria Chronicles is a mapping of character affinities and powers from the video game Valkyria Chronicles.

File Downloads:

How to Cite:

Storrick, Jon. Valkyria Chronicles [dataset]. Available from: http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php.


NATO 2016 Exercises

Description

NATO conducted one of its largest exercises over the past decade, Trident Juncture, from October 21st to November 6th 2015. As part of this exercise, the Center for Computational Analysis of Social and Organizational Systems (CASOS) at Carnegie Mellon was asked to assess, in partnership with the Data Mining and Machine Learning Lab at Arizona State University, the social media response to Trident Juncture. The dataset provides social media observations for the period of one week, October 5-11, 2015.

File Downloads

References:

Frankenstein, Will, Huang, Binxuan, and Carley, Kathleen M. (2016). NATO Trident Juncture on Twitter: Public Discussion. Carnegie Mellon University, School of Computer Science, Institute for Software Research, Technical Report, CMU-ISR-16-100.

How to Cite:

Frankenstein, Will, Huang, Binxuan, and Carley, Kathleen M. (2016). NATO Trident Juncture on Twitter: Public Discussion. Carnegie Mellon University, School of Computer Science, Institute for Software Research, Technical Report, CMU-ISR-16-100. Retrieved from http://www.casos.cs.cmu.edu/tools/datasets/internal/index.php