Terrorist Networks: Rethinking the Logic Behind Web Search Engines



Terror as Coercion: The Major Stumbling Block in a New Subject for Social Network Analysis

Since the notorious events of September 11th, 2001, the study of suicide terrorism and of the strategic logic behind large-scale acts of extremism has taken off across a variety of disciplines. As readers of the national news, we receive predominantly qualitative analyses of terrorist cells: how they are organized, how they recruit and how they sometimes cooperate. With little reference to numerical data or statistics, and seemingly without the goal of precise and generalizable measurement, newscasters present footage of interviews with Iraqis and Americans that gets at the “why” of the issue but not exactly the “how.” Qualitative research on terrorism isn’t confined to the media; it extends into academia as well. There, a large and growing body of literature traces the rise and fall of various terrorist campaigns while commenting on the history of terrorism in general. Such studies are extremely beneficial to our understanding of, for instance, how an ideology is transformed into a formidable terrorist organization. However, because terrorist groups are uniquely decentralized, diffuse and dynamic, and are made up of dense clusters that are otherwise isolated from or only weakly linked to other clusters[1], it is unreasonable to hope that qualitative methods will produce a sufficiently thorough and dependable explanation of how terrorists function, or a reliable framework for predicting future terrorist attacks. Qualitative research, which takes considerable time even under the best of circumstances, cannot fully and quickly address the “how” of terrorism, and when it comes to the systematic use of terror, time is of the essence.
A much more practical approach is social network analysis: a set of techniques, theories, models and applications that have proven remarkably valuable to the study of interaction, interdependency, sustained and truncated relational ties, opportunities for and constraints upon individual action, and social, economic and political structures.

Social network analysis conceives of relationships, contacts, affiliations, friendships and other web-like structures in terms of “nodes” and “edges,” as illustrated by the graphic below[2]:

In this example, which depicts a pattern of email communication among employees of Hewlett-Packard, the nodes (red dots) represent individual employees and the edges (gray lines) between them represent their email exchanges. It is easy to see how we might map a terrorist network in the same way, letting nodes represent terrorist suspects and edges represent instances of contact between them. It is also easy to see how the resulting graph would aid our understanding of how terrorist organizations operate and solve problems, as well as how disaggregated (or centralized!) they really are.
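To make the node-and-edge idea concrete, here is a minimal Python sketch of how contact records might be stored as an undirected graph, and how one simple measure, degree centrality, flags the most directly connected person. The names and contacts are invented placeholders, not data from any real investigation, and degree centrality is only one of many measures an analyst might use.

```python
from collections import defaultdict

def build_graph(contacts):
    """Build an undirected adjacency map from (person, person) contact pairs."""
    graph = defaultdict(set)
    for a, b in contacts:
        if a != b:  # ignore self-contacts
            graph[a].add(b)
            graph[b].add(a)
    return graph

def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly tied to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

# Hypothetical contact records: each pair is one observed instance of contact.
contacts = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
graph = build_graph(contacts)
centrality = degree_centrality(graph)
print(max(centrality, key=centrality.get))  # A
```

Here "A" is tied to three of the four other people, so its score is 0.75, while "E", with a single tie, scores 0.25.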

In addition to possessing the attributes described above, the study of terrorism also lends itself to social or dynamic network analysis because it is associated with massive volumes of information that must be synthesized and disseminated efficiently. According to Patrick Keefe, while it was still engaged in wiretapping, the National Security Agency intercepted some 650 million communications every day. Furthermore, the National Counterterrorism Center’s database of suspected terrorists currently contains over 325,000 names.[3] Network analysis can help make sense of this wealth of data by giving it a form and shape that can be grasped far more easily than an interminable list of names and dates.
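The idea of giving raw intercepts “a form and shape” can be illustrated by collapsing a long list of individual communication records into weighted edges, where each weight counts how often two parties were in contact. This is a minimal sketch; the (sender, recipient, date) record layout and the sample records are assumptions made for illustration.

```python
from collections import Counter

def weighted_edges(records):
    """Collapse (sender, recipient, date) records into undirected edge weights."""
    weights = Counter()
    for sender, recipient, _date in records:
        # Sort the pair so A->B and B->A count toward the same edge.
        edge = tuple(sorted((sender, recipient)))
        weights[edge] += 1
    return weights

# Hypothetical intercept log; in practice this list would be vastly longer.
records = [
    ("A", "B", "2009-01-02"),
    ("B", "A", "2009-01-03"),
    ("A", "C", "2009-01-03"),
    ("A", "B", "2009-01-05"),
]
for (u, v), count in weighted_edges(records).most_common():
    print(u, v, count)
```

Four records shrink to two edges, A–B with weight 3 and A–C with weight 1, which is the kind of compression that makes a large log readable as a network.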

However, in light of these intimidatingly large numbers, it is no surprise that productive network analysis is often hindered by an overabundance of information, most of which is extraneous or of limited relevance. Valdis Krebs, the first person to diagram the network of terrorist cells associated with the 9/11 hijackings, notes that the nodes (people, potential terrorists) in the existing body of information often have “fuzzy boundaries” between them, making it difficult to determine whom to include in, and whom to exclude from, the map of a particular terrorist network.[4] False leads are inevitable but not always immediately apparent, and they therefore waste a debilitating amount of time.[5] The question, then, is how this profusion of information can be gathered, managed and propagated efficiently. Assuming we can surmount some major roadblocks, such as this baffling quantity of data, the answer may lie in the relatively new but burgeoning field of social network analysis.
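One simple, admittedly blunt, way to draw a boundary around a fuzzy network is to keep only the people within a fixed number of contact steps of known suspects. The breadth-first search below is a sketch under that assumption; the cutoff `max_hops` and the sample contacts are hypothetical, and choosing the cutoff is itself an analytic judgment rather than something the code can settle.

```python
from collections import deque

def within_hops(graph, seeds, max_hops):
    """Return every node reachable from any seed in at most max_hops steps."""
    distance = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the boundary
        for nbr in graph.get(node, ()):
            if nbr not in distance:
                distance[nbr] = distance[node] + 1
                queue.append(nbr)
    return set(distance)

# Hypothetical undirected contact graph stored as an adjacency dict.
graph = {
    "S1": {"A", "B"}, "A": {"S1", "C"}, "B": {"S1"},
    "C": {"A", "D"}, "D": {"C"},
}
print(sorted(within_hops(graph, ["S1"], 2)))  # ['A', 'B', 'C', 'S1']
```

With a two-step cutoff, "D" (three steps from the seed) falls outside the mapped network; raising the cutoff pulls it back in, along with any noise that comes with it.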

The Next Generation of the Web: Semantics

As many readers will already know, as the number of pages on the World Wide Web grows, so does the number of search engines, portals and directories, all of which are designed to help users locate useful information. Indeed, much of the information amassed and used by government officials originates on the Internet. So what if there were a way to organize and integrate all of this information into a single model while selectively removing unusable or worthless data? This may in fact be feasible through the Semantic Web, a group of methods and technologies that allows users to build vocabularies, or “ontologies,” that enrich data with additional meaning and therefore increase opportunities for its effective use.[6] Put succinctly, the purpose of the Semantic Web, an intelligent technique for information categorization, extraction and search, is to make the Web “smarter” and better able to perform useful services for users by adding semantic annotation to Web documents and other resources, so that what is consistently accessed is knowledge rather than unstructured material. Through Semantic Web methods and technologies, machines can process the semantics, or meaning, of information, text and data and then create connections for the people who use them, relieving those people (at least partially) of the laborious task of consolidating and making sense of scattered bits and pieces of information.
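How an ontology “enriches data with additional meaning” can be sketched with a tiny class hierarchy: once a machine knows that one class is a subclass of another, it can infer facts nobody stated explicitly. The class names below are hypothetical, and this plain Python merely stands in for the RDFS `subClassOf` reasoning that real Semantic Web toolkits perform.

```python
def superclasses(cls, subclass_of):
    """All ancestors of cls in a subclass hierarchy (transitive closure)."""
    found, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

# Hypothetical vocabulary: each class maps to its direct parent classes.
subclass_of = {
    "Financier": ["Suspect"],
    "Courier": ["Suspect"],
    "Suspect": ["Person"],
}
# Only the most specific type is stated; the broader types are inferred.
stated_type = {"X": "Financier"}
for entity, cls in stated_type.items():
    print(entity, sorted({cls} | superclasses(cls, subclass_of)))
# X ['Financier', 'Person', 'Suspect']
```

A search for every `Suspect` would therefore find entity X even though its record only ever said `Financier`.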

Traditional Web portals, the kind most familiar to the average Internet user, are websites that collect information and links to pages, usually around a specific theme or topic. Semantic Web portals instead “collect URIs of files on the Semantic Web, and allow users to interact with…statements.” Statements are carefully crafted descriptions of resources identified by URIs, and they are eventually translated into Resource Description Framework (RDF) graphs in which each resource is represented by a node and each statement, which conveys a property, is represented by an edge. For example, we could take the sentence “Wali Zazi is the father of Najibullah Zazi” (the two were arrested in 2009 for conspiring to carry out domestic terrorism) and endow it with machine-readable meaning. Using an RDF graph, which can be written out in eXtensible Markup Language (XML), the Semantic Web would identify “Wali Zazi” as the sentence’s subject, “is the father of” as its property, and “Najibullah Zazi” as its object. To capture the names in the sentence and the relationship between them more fully, the Semantic Web would use uniform resource identifiers, or URIs (strings of characters that identify names or resources and that, on the Web, generally begin with “http”), to associate each element of the sentence with a resource describing its nature, a resource that need not itself be part of the Web. For example, it might associate “Wali Zazi” and “Najibullah Zazi” with a list of suspected terrorists that the general public cannot access online. After this and many other such meaning-enriched sentences have been entered into the RDF graph, our hypothetical machine can start drawing inferences, eventually making connections between the Zazi men and others who may have helped them develop their terrorism scheme.
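The subject-property-object structure described above can be sketched in plain Python as a set of triples that use URIs as identifiers. The `http://example.org/` namespace, the property names, the placeholder `PersonX`, and the single inference rule (a father-of statement implies the two people are related, in both directions) are illustrative assumptions; a production system would use an RDF library and a standard vocabulary instead.

```python
EX = "http://example.org/"  # hypothetical namespace for illustration only

def uri(name):
    return EX + name

# Each statement is a (subject, property, object) triple.
triples = {
    (uri("WaliZazi"), uri("isFatherOf"), uri("NajibullahZazi")),
    # Hypothetical placeholder statement, not a real intelligence finding.
    (uri("NajibullahZazi"), uri("contacted"), uri("PersonX")),
}

def infer_related(triples):
    """Toy rule: isFatherOf implies relatedTo, and relatedTo is symmetric."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == uri("isFatherOf"):
            inferred.add((s, uri("relatedTo"), o))
            inferred.add((o, uri("relatedTo"), s))
    return inferred

graph = infer_related(triples)

# Query: everyone related to Najibullah Zazi, whether stated or inferred.
related = {o for s, p, o in graph
           if s == uri("NajibullahZazi") and p == uri("relatedTo")}
print(related)  # {'http://example.org/WaliZazi'}

# Two-step query: whom did Wali Zazi's relatives contact?
contacts_of_relatives = {
    o2
    for s1, p1, o1 in graph if s1 == uri("WaliZazi") and p1 == uri("relatedTo")
    for s2, p2, o2 in graph if s2 == o1 and p2 == uri("contacted")
}
print(contacts_of_relatives)  # {'http://example.org/PersonX'}
```

The second query only works because of the inferred `relatedTo` triples; nobody stated a link between Wali Zazi and PersonX, yet the machine can surface one.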
A significant implication of this structure is that it allows users of a given RDF graph to navigate through it according to their own interests, following statements to relevant information that reflects individual objectives and areas of curiosity. In plainer terms, although the Semantic Web cannot make computers self-aware, intelligent or sensible, it can make the Web “readable” to machines so that they are able to find and, to a certain degree, interpret information.
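Navigating an RDF graph according to personal interests amounts to following only the properties an analyst cares about. The sketch below walks outward from a starting resource along a chosen set of properties; the triples are hypothetical and use short names in place of full URIs to keep the example readable.

```python
from collections import deque

def follow(triples, start, properties):
    """Breadth-first walk from start, following only the given properties."""
    seen, queue, path = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for s, p, o in sorted(triples):  # sorted for a deterministic order
            if s == node and p in properties and o not in seen:
                seen.add(o)
                path.append((s, p, o))
                queue.append(o)
    return path

# Hypothetical statements; short names stand in for full URIs.
triples = {
    ("A", "memberOf", "GroupG"),
    ("A", "livesIn", "CityC"),
    ("GroupG", "fundedBy", "OrgO"),
    ("CityC", "locatedIn", "CountryK"),
}
# An analyst interested in affiliations ignores residence data entirely.
for step in follow(triples, "A", {"memberOf", "fundedBy"}):
    print(step)
```

The same graph yields a different path (A, CityC, CountryK) for an analyst who instead follows `livesIn` and `locatedIn`.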

Consider the following example. You want to buy “The Lord of the Rings” boxed set online, but you have a few specifications: you want widescreen DVDs, and only the extended versions with bonus material. You are willing to buy a used set, but only if the DVDs are still in excellent condition, and while you don’t want to wait too long for delivery, you also don’t want to pay an excessive amount for shipping. Rather than making you compare the items available at Amazon.com, BestBuy.com, DVDEmpire.com and so on, the Semantic Web would allow you to enter your preferences into a software agent that, in addition to searching the Web and finding the best option for you, could record the amount you spent in the financial software on your computer and mark the date your DVDs should arrive on your computer’s calendar.[7]
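The shopping agent in this example essentially applies hard constraints and then ranks whatever survives. A minimal sketch follows; the offer fields, the shipping cap, the delivery limit and the ranking rule (cheapest total, then fastest delivery) are all assumptions, and the store names and prices are invented rather than real listings.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    store: str
    widescreen: bool
    extended: bool
    condition: str      # e.g. "new", "excellent", "good"
    price: float
    shipping: float
    days_to_deliver: int

def acceptable(o, max_shipping, max_days):
    """Hard constraints taken from the buyer's stated preferences."""
    return (o.widescreen and o.extended
            and o.condition in ("new", "excellent")
            and o.shipping <= max_shipping
            and o.days_to_deliver <= max_days)

def best_offer(offers, max_shipping=10.0, max_days=7):
    candidates = [o for o in offers if acceptable(o, max_shipping, max_days)]
    # Rank by total cost, breaking ties by delivery time; None if nothing fits.
    return min(candidates,
               key=lambda o: (o.price + o.shipping, o.days_to_deliver),
               default=None)

offers = [  # hypothetical listings
    Offer("StoreA", True, True, "good", 40.0, 5.0, 3),
    Offer("StoreB", True, True, "excellent", 55.0, 4.0, 5),
    Offer("StoreC", True, True, "new", 60.0, 0.0, 4),
    Offer("StoreD", False, True, "new", 35.0, 3.0, 2),
]
print(best_offer(offers).store)  # StoreB
```

The cheapest listings are rejected outright (one is only in “good” condition, one is not widescreen), which is exactly the filtering a human shopper would otherwise do by hand.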

The above is only a synopsis of the components and capabilities of the Semantic Web, whose attributes cannot be fully explored here. Building on this basic knowledge, the remainder of this essay will try to demonstrate its potential usefulness for ongoing analyses of terrorist networks.

The Consequences of the Semantic Web for Terrorist Network Analysis

Today, for better or for worse, the U.S. government and its various representatives in the intelligence community constantly monitor travelers’ behavior in a surreptitious but large-scale search for ill-intentioned individuals. Regardless of whether this is consistent with the Constitution, analysts have access to information about which individuals are denied entry into which countries, where suspects stay when they are in transit, where those who visit them at their hotels come from, and so on. Sometimes, by following the movements of two or more suspects simultaneously, they try to determine whether collaboration is at work. Semantic Web technologies can facilitate these related processes of pursuit and decision-making by allowing analysts to draw on stored information about the individual suspects in question, such as where they have traveled in the past, whether they have ever been in the same location, whether they share an affiliation with a specific (religious) organization, and so on. Here, the results of initial behavioral scrutiny essentially become inputs to the Semantic Web, which rapidly links them to relevant available files, allowing analysts to assemble a comprehensive picture of what is going on and to respond in a suitable and timely manner without being distracted by superfluous information.
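Checking whether two suspects have ever been in the same location can be sketched as an overlap test on travel records. Below, each record is a (person, city, arrival, departure) tuple; that layout and the sample data are hypothetical, and a real system would also have to cope with missing dates and inconsistently spelled place names.

```python
from datetime import date
from itertools import combinations

def co_locations(records):
    """Yield (person1, person2, city) for every overlapping stay in one city."""
    for (p1, c1, a1, d1), (p2, c2, a2, d2) in combinations(records, 2):
        # Two date ranges overlap when each starts before the other ends.
        if p1 != p2 and c1 == c2 and a1 <= d2 and a2 <= d1:
            yield (p1, p2, c1)

records = [  # hypothetical travel records
    ("Suspect1", "CityA", date(2009, 3, 1), date(2009, 3, 5)),
    ("Suspect2", "CityA", date(2009, 3, 4), date(2009, 3, 9)),
    ("Suspect2", "CityB", date(2009, 4, 1), date(2009, 4, 2)),
    ("Suspect3", "CityB", date(2009, 5, 1), date(2009, 5, 3)),
]
print(list(co_locations(records)))  # [('Suspect1', 'Suspect2', 'CityA')]
```

Suspect2 and Suspect3 both passed through CityB, but a month apart, so the pair is not flagged; visiting the same city is not the same as being there together.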

The careful plotting of a terrorist network or organization is no modest task; it requires an onerous series of steps, including collecting data, harmonizing data and accurately pinpointing relationships between data points. As this process is repeated again and again for the sake of completeness, irrelevant data inevitably accumulates in discouragingly large quantities. Google, the most prominent and most widely used of traditional search engines, only exacerbates this glut of irrelevant information through its PageRank link analysis algorithm, an approximation of citation importance on the Web that assigns a numerical weight to each page in order to determine its importance relative to other pages. PageRank is, essentially, a vote by all the other pages on the Web about how important a page is, where a link to a page counts as a vote of support. In the end, the more often a page is linked to, and the more often it is linked to by frequently cited pages, the greater its numerical weight and the more likely it is to appear at the top of search results. Inevitably, this algorithm will treat any frequently or habitually cited page as “relevant,” regardless of whether that page’s content truly reflects, from the user’s perspective, the query that caused it to surface. A Google search for “Abu al Zarqawi,” a close associate of Osama bin Laden, and “Israel,” into which Zarqawi has been accused of smuggling terrorists, generates approximately 148,000 results; however, only a handful of these, scattered throughout, offer information about the connection between Zarqawi and Israel. Most are Web pages that contain both “Zarqawi” and “Israel,” but only coincidentally.
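The “vote” described above is usually computed by power iteration. The sketch below follows the standard textbook formulation with the commonly cited damping factor of 0.85; the four-page link graph is invented, and letting dangling pages (pages with no outgoing links) spread their score evenly across all pages is one common convention among several. It is a simplified illustration, not Google’s production ranking.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict: page -> list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share  # each link is a "vote" worth a share of rank
            else:
                # Dangling page: spread its rank evenly over every page.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical link graph: C is linked to by every other page.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # C
```

Nothing in this computation looks at what a page says; C ranks first purely because of who links to it, which is why heavily cited but off-topic pages can crowd the top of the results.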
The Semantic Web, by contrast, is more discriminating: it would allow researchers to endow the search phrase “Zarqawi+Israel” or “Zarqawi and Israel” with a specific meaning (perhaps related to Zarqawi’s smuggling activity in Israel) so that only the most appropriate information is retrieved and entered into the Web portal.
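The contrast between coincidental keyword matches and a query with a specific meaning can be sketched as follows. The page snippets, the statement, and the property name `accusedOfSmugglingInto` are hypothetical stand-ins; the point is only that a keyword filter matches any page containing both words, whereas a query over typed statements matches only the stated relationship.

```python
documents = {  # hypothetical page snippets
    "page1": "Zarqawi was accused of smuggling terrorists into Israel.",
    "page2": "Israel news roundup; separately, a profile of Zarqawi.",
    "page3": "Weather in Israel this week.",
}

def keyword_search(docs, *terms):
    """Return pages containing every term, regardless of how the terms relate."""
    return sorted(name for name, text in docs.items()
                  if all(t.lower() in text.lower() for t in terms))

# The same knowledge as typed statements: (subject, property, object, source page).
statements = [
    ("Zarqawi", "accusedOfSmugglingInto", "Israel", "page1"),
]

def relation_search(stmts, subject, prop, obj):
    """Return source pages for statements asserting exactly this relationship."""
    return sorted(src for s, p, o, src in stmts
                  if (s, p, o) == (subject, prop, obj))

print(keyword_search(documents, "Zarqawi", "Israel"))  # ['page1', 'page2']
print(relation_search(statements, "Zarqawi", "accusedOfSmugglingInto", "Israel"))  # ['page1']
```

The keyword search returns the coincidental page2 alongside the relevant page1; the relationship query returns only page1.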

Additionally, standard search engines work largely with data formatted as words and numbers, even though images, pictures and photographs often convey linkages as well. Fortunately for intelligence case officers, the Semantic Web makes it possible to associate notes, theories and other facts and messages with these forms of information so that no stone is left unturned. For instance, terrorists sometimes communicate through graffiti on the walls and sides of buildings in urban spaces, in a way that signifies a “secret cabal of ‘others’ who could strike without warning.”[8] Images of this crafty means of communication could undoubtedly help intelligence officers as they shadow prospective threats and establish their relationships to other members of a terrorist network.

The Semantic Web is also valuable when aliases, pseudonyms, monikers and other assumed names come into play. MSNBC.com provides a list of basic information on at-large al Qaeda operatives, including their nicknames where these are known. Some on the list, such as Ayman al Zawahiri, have upwards of twelve known aliases; Fazul Abdullah Mohammed has more than 15. Others, including Midhat Mursi, have only one, but it can be drastically different from the individual’s actual name (Mursi’s is Abu Khabab). What is more, MSNBC stresses that some of its spellings/transliterations “may vary from what has been published elsewhere since different Arab countries use different spellings of even the most common names,”[9] which compels us to acknowledge that illicit activity has probably gone unnoticed in the past because traditional search tools failed to encode the relationship between, for example, Uday and Oday or Khaddafy and Ghaddafi. A Google search for “Ghaddafi” does not return results containing “Khaddafy,” a fairly common alternative spelling, nor does it offer the latter as a suggested related search. The Semantic Web offers a way to equate multiple alternative spellings and to recognize aliases such as “The Doctor” or “The Manager,” making it less likely that useful information will be overlooked.
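Equating spellings and aliases can be sketched with two ingredients: explicit “same as” assertions (the role `owl:sameAs` plays in Semantic Web vocabularies) for aliases that no string comparison could guess, and a fuzzy string match for transliteration variants. A union-find structure merges both kinds of evidence into clusters of names. The 0.7 similarity threshold is an arbitrary assumption, and Python’s `difflib` ratio is a crude stand-in for the transliteration-aware matching a real system would need.

```python
from difflib import SequenceMatcher
from itertools import combinations

class AliasSets:
    """Union-find over names: names in one set refer to the same person."""
    def __init__(self):
        self.parent = {}

    def find(self, name):
        self.parent.setdefault(name, name)
        while self.parent[name] != name:
            self.parent[name] = self.parent[self.parent[name]]  # path halving
            name = self.parent[name]
        return name

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def similar(a, b, threshold=0.7):
    """Crude spelling-variant test; the threshold is an assumption."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

names = ["Ghaddafi", "Khaddafy", "Uday", "Oday", "Midhat Mursi", "Abu Khabab"]
aliases = AliasSets()
# Explicit alias assertion, such as those listed by MSNBC.com.
aliases.union("Midhat Mursi", "Abu Khabab")
# Spelling variants: merge any pair whose strings are sufficiently similar.
for a, b in combinations(names, 2):
    if similar(a, b):
        aliases.union(a, b)

print(aliases.find("Ghaddafi") == aliases.find("Khaddafy"))  # True
print(aliases.find("Uday") == aliases.find("Khaddafy"))      # False
```

The six names collapse into three people. String similarity alone would never have linked “Midhat Mursi” to “Abu Khabab”; only the explicit assertion does.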

In the end, a terrorist network is the outcome of hundreds of personal connections. Understanding the relationships and links among the members of such a network is critical to deterring terrorist plans preemptively, or at least to interrupting them. Thankfully, the Semantic Web gives us a way to describe these relationships exhaustively, using our knowledge of members’ hometowns, workplaces, residences, communal affiliations, involvement in certain events, and so on. An excellent example of the Semantic Web in action is a dataset known as Profiles in Terror (PIT). Developed at the University of Maryland, College Park, this resource contains counterterrorism intelligence collected from various publicly available real-world sources, such as federal court indictments and news reports.[10]

This diagram[11], generated using a PIT demo, serves as a visual representation of the arguments presented above. As we can see, it contains both events (Passover Massacre, Taher calls Sayyed, Driver recruited, etc.) and names (Mohammed Taher, etc.), so that a complete or near-complete picture is revealed to the analyst, unlike in conventional Web portals, whose graphs simply cannot contain such a range of information.
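A graph that mixes events with people, as the PIT diagram does, is an affiliation (or bipartite) network. One standard way to analyze it is to project it onto people alone, linking two people whenever they took part in the same event. The sketch below uses hypothetical placeholder people and events rather than anything drawn from the PIT dataset.

```python
from collections import defaultdict
from itertools import combinations

def project_people(participation):
    """Link two people once for every event they both took part in."""
    people_by_event = defaultdict(set)
    for person, event in participation:
        people_by_event[event].add(person)
    ties = defaultdict(int)
    for people in people_by_event.values():
        for a, b in combinations(sorted(people), 2):
            ties[(a, b)] += 1  # edge weight = number of shared events
    return dict(ties)

participation = [  # hypothetical (person, event) links
    ("P1", "Meeting"), ("P2", "Meeting"), ("P3", "Meeting"),
    ("P1", "Phone call"), ("P2", "Phone call"),
    ("P3", "Recruitment"), ("P4", "Recruitment"),
]
for (a, b), shared in sorted(project_people(participation).items()):
    print(a, b, shared)
```

P1 and P2 end up with the strongest tie (two shared events), and P4 is connected to the rest only through P3, the kind of structural detail that is invisible in a flat list of names.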

The Obstacles that Remain: Making the Web a More Understandable (Mine-able) Place

At the foundation of the Semantic Web are machine-understandable Web pages; this characteristic is essential because it allows for the creation of expansive portals of highly relevant, coherent information. Continuing with our example of terrorism, we may want to extract newspaper articles, video clips and other material about terrorist activity from various Web pages. These resources, and the information they contain, must undergo a process of data mining, during which patterns are extracted from the data, so that they become intelligible to the machines that are directly responsible for gathering and coherently arranging information and therefore indirectly responsible for alerting analysts to potential terrorist threats.
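Turning unstructured text into statements a machine can use is a large research area; the sketch below shows only its simplest form, hand-written patterns that pull (subject, property, object) triples out of sentences. The patterns, the property names and the sample article (with placeholder names) are hypothetical, and real extraction systems rely on far more robust language processing than regular expressions.

```python
import re

NAME = r"[A-Z]\w+(?: [A-Z]\w+)*"  # one or more capitalized words

# Hypothetical patterns: each maps a phrasing to a property name.
PATTERNS = [
    (re.compile(rf"(?P<s>{NAME}) met with (?P<o>{NAME})"), "metWith"),
    (re.compile(rf"(?P<s>{NAME}) traveled to (?P<o>{NAME})"), "traveledTo"),
]

def extract_triples(text):
    """Return (subject, property, object) triples matched in the text."""
    triples = []
    for pattern, prop in PATTERNS:
        for m in pattern.finditer(text):
            triples.append((m.group("s"), prop, m.group("o")))
    return triples

article = ("Officials said John Doe met with Richard Roe in March. "
           "Later, Richard Roe traveled to Springfield.")
print(extract_triples(article))
# [('John Doe', 'metWith', 'Richard Roe'), ('Richard Roe', 'traveledTo', 'Springfield')]
```

Once extracted, triples like these can be fed into an RDF graph alongside hand-curated statements; the brittleness of such patterns is part of why robust Web mining remains difficult.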

Yet establishing a robust Web mining capacity in the context of the Semantic Web is not without its challenges. As Syed Ahsan and Abad Shah note, applying data mining technology to terrorism is difficult because much of the relevant information sits in disparate databases scattered among numerous federal, provincial and local entities that often cannot, or simply do not, share knowledge.[12] Nonetheless, the inadequacy of traditional Web portals compared with the Semantic Web has been exposed, and if maximum efficiency and accuracy are the goals of the CIA and the U.S. government, a concerted effort must be made to make the transition. Rather than forsake the possibilities inherent in the Semantic Web, we should work toward greater intergovernmental transparency and information sharing.

Join The Triple Helix Online on Facebook. Follow The Triple Helix  Online on Twitter.

[1] Sara Amin and Tanya Trussler, “Terrorist Network Structures: A Dynamic Analysis of Cellular Durability,” Paper presented at the annual meeting of the American Society of Criminology, November 14, 2007, http://www.allacademic.com/meta/p_mla_apa_research_citation/2/0/1/2/7/p201276_index.html.

[2] David Easley and Jon Kleinberg, Networks, Crowds, and Markets: Reasoning about a Highly Connected World, Cambridge University Press, 2010, p. 3, http://www.cs.cornell.edu/home/kleinber/networks-book/.

[3] Patrick Keefe, “Can Network Theory Thwart Terrorists?,” The New York Times Magazine, March 12, 2006, http://www.nytimes.com/2006/03/12/magazine/312wwln_essay.html?_r=1.

[4] Valdis Krebs, “Uncloaking Terrorist Networks,” First Monday, vol. 7, no. 4 (April 2002), http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/viewArticle/941/863.

[5] Robert Baer, quoted in Jennifer Golbeck, Aaron Mannes and James Hendler, “Semantic Web Technologies for Terrorist Network Analysis.”

[6] “Semantic Web,” W3C, http://www.w3.org/standards/semanticweb/.

[7] Tracy V. Wilson, “How Semantic Web Works,” How Stuff Works: A Discovery Company.

[8] Rene Larche, “Global Terrorism Issues and Developments,” Nova Science Publishers, Inc., 2008, p. 37.

[9] “Al-Qaida Leaders, Associates,” MSNBC.com.

[10] Lise Getoor, Prithviraj Sen and Bin Zhao, “Entity and Relationship Labeling in Affiliation Networks,” Conference on Machine Learning, 2006.

[11] profilesinterror.mindswap.org

[12] Syed Ahsan and Abad Shah, “Data Mining, Semantic Web and Advanced Information Technologies for Fighting Terrorism,” IEEE Xplore, 2008, p. 3, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4547644&userType=inst&tag=1.