Rapid Ethnographic Analysis - From text to networks to simulation
International security is increasingly defined in terms of ethnic, religious or ideological groups engaged in asymmetric conflicts. Such conflicts are characterized by a high degree of uncertainty, global distribution, and rapidly changing adversary strategies that adapt to past operations and to the cultural and geopolitical environment. To act successfully in such environments, it is crucial to assess current activity, identify changes, predict possible futures, and continuously refine models of the adversary based on new data, and so recognize and respond to changing adversary strategies. In this project, we will study cultural factors that allow the development of validated models to predict adversary strategy, group formation, leadership and organizational structure over time, as well as the circumstances for adversarial change and the nature of that change. More specifically, we are interested in modeling phenomena such as military coups, changes in access to infrastructure, political shifts and provision of medical aid that may at times serve as tipping points, either triggering abrupt shifts in adversarial behavior or acting as a catalyst for stability relative to larger issues such as genocide and reconstruction.
The overall goal of this multidisciplinary research is to develop grounded, culturally sensitive theories and flexible, robust and scalable computational techniques for modeling and predicting the adaptive behavior of adversaries in asymmetric threat environments. The initial domain of study will be Sudan, with a secondary focus on Darfur and the conflict in that area. The research aims to (a) understand how socio-cultural and political-economic factors can be incorporated into adversary models; (b) create and evaluate robust adversary models that incorporate socio-cultural factors, political and economic factors, local attitudes, values and social structure, and local asymmetric threat strategies; and (c) incorporate these context-sensitive models into computational algorithms and tools that help analysts better understand and predict adversary behavior and enable commanders to evaluate action plans and strategies for countering adversary intentions. The baseline data includes a series of subject matter expert (SME) writings on the Sudan, data gathered from various studies, and almost 10 years' worth of articles from the Sudan Tribune Review.
We are working toward an integrated system, with associated user documentation, that will enable rapid and flexible assessment of the adversary and of the impact, including unintended consequences, of various courses of action (COA) (see Figure 1). This system will operate end-to-end to enable the military to rapidly collect and assess data in novel situations, understand and identify points of strength, vulnerability and deception in that situation, and assess possible COA strategically. The figure shows the software that will be developed or extended (dark gray circles) and the way the SMEs, scientists and military personnel will interact with that software. The gray-blue circles indicate the three main components: rapid ethnographic retrieval (RER), analysis, and strategic adversarial reasoning, each composed of multiple tools and models. RER uses multiple data ingest tools to encode, retrieve and provide data to the analysis and reasoning tools. Data ingest tools include new, existing and extended tools that enable SME-directed data retrieval: CEMAP, a tool that can read, parse and extract header data like that in email and blogs; AutoMap, a text-mining tool for extracting networks; and various parsers for semi-structured data. ORA and its planned extensions, as the key analysis tool, will be at the heart of this project. It is a dynamic network analysis, link analysis and pattern detection toolkit with integrated visualization. Key extensions will enable temporal and geospatial scaling, context-sensitive capability assessments, and automated input for the wide range of strategic adversarial models (SAMs) that will be developed and used to assess adaptive adversarial activity.
Figure 1. High Level View of Integrated System for Adaptive Adversarial Modeling.
Component technologies include a variety of techniques for rapid ethnographic assessment, link and entity extraction, dynamic network analysis (DNA), machine learning and game theoretic reasoning. We envision an end-to-end integrated framework that the military can use as they enter a new situation to: rapidly collect data on the underlying context; process that data for analysis with DNA tools; collect data on an ongoing basis from HUMINT, SIGINT, newspapers, blogs, event tracers and standard semi-structured datasets (such as those provided by the UN and World Bank), and use them to update the ethnographic assessment of the city, country or region; process updates with DNA and cultural assessment tools; assess the structure, key actors, events, resources and capabilities in the situation; identify points of intervention; reason about possible courses of action using DNA and EGT SAMs; and suggest which information is deceptive. One goal, technologically, will be to use this system to rapidly collect and assess data on multiple scenarios and countries. A second goal, theoretically, is to use this system to develop a better understanding of conflict predicated on how context and deception affect adaptive adversary behavior. A third goal, operationally, is to provide improved support for the military operational need to characterize, assess and predict the behavior of adaptive adversaries. Note that our approach differs sharply from traditional IR, where SMEs would be asked to do independent topic studies: we are using these tools to capture SME input, use that input to tune the models in a semi-automatic fashion, and then provide assessments to the SMEs for topical studies. We are trying to create the necessary paradigm shift in the way SMEs work with models and modelers to facilitate rapid and accurate model assessment and validation.
Landwehr, Peter. 'NUBBI: An Introduction and a Non-examination of Sudanese Newspapers' presented at the INSNA Sunbelt Social Networks Conference XXXII, Redondo Beach, California, March 16, 2012.
Traditional methods for extracting semantic networks have relied on linking together words that are used within a particular proximity or on tying together all noun phrases. While effective, both of these methods sacrifice some level of meaning in the text: the former emphasizes words that co-occur but not the meaning of the connection; the latter deliberately sacrifices information conveyed by verbs because it is difficult to interpret. In this paper we present a new implementation of the NUBBI method, developed by Blei et al., which attempts to retain some of the information lost by these other techniques. NUBBI uses the probabilistic distribution of all words in a text to create links that represent latent properties of specified agents and the terms surrounding these agents. We then compare and contrast the results of a NUBBI analysis of a multi-year corpus of Sudanese English-language newspapers with network analyses of the same corpus using conventional windowing techniques, discussing the costs and benefits of each approach.
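To make the conventional windowing approach concrete, the following is a minimal Python sketch of proximity-based link extraction. It is an illustration only, not the NUBBI method and not the implementation used in the talk. The function name `cooccurrence_edges`, the simple regex tokenizer, the default window size, and the choice to drop stopwords before windowing are all assumptions made for this example.

```python
from collections import Counter
import re


def cooccurrence_edges(text, window=3, stopwords=frozenset()):
    """Count undirected word pairs that fall within `window` tokens of each other.

    Hypothetical helper for illustration: tokens are lowercased runs of
    letters and apostrophes, and stopwords are removed *before* windowing
    (an assumption; removing them afterwards would change the distances).
    """
    tokens = [t for t in re.findall(r"[a-z']+", text.lower())
              if t not in stopwords]
    edges = Counter()
    for i, left in enumerate(tokens):
        # Pair the current token with the next (window - 1) tokens.
        for right in tokens[i + 1:i + window]:
            if left != right:
                # Sort the pair so that (a, b) and (b, a) count as one edge.
                edges[tuple(sorted((left, right)))] += 1
    return edges


if __name__ == "__main__":
    sample = "Rebels attacked the village near the border"
    links = cooccurrence_edges(sample, window=2,
                               stopwords=frozenset({"the", "near"}))
    for (a, b), weight in sorted(links.items()):
        print(a, b, weight)
```

The resulting weighted edge list records only that two words appeared near each other, which is the limitation the abstract points to: the link carries no information about what kind of relationship connects the two terms.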