Tuesday, February 24, 2015

Wisdom of the crowd or technicity of content? Wikipedia as a sociotechnical system


Sabine Niederer, University of Amsterdam, the Netherlands
José van Dijck, University of Amsterdam, the Netherlands

new media & society XX(X) 1–19 © The Author(s) 2010. Reprints and permission: sagepub.co.uk/journalsPermissions.nav. DOI: 10.1177/1461444810365297. http://nms.sagepub.com

Corresponding author: Sabine Niederer, University of Amsterdam, Turfdraagsterpad 9, 1012 XT Amsterdam, the Netherlands. Email: s.m.c.niederer@uva.nl
Abstract
Wikipedia is often considered an example of ‘collaborative knowledge’. Researchers have contested the value of Wikipedia content on various accounts. Some have disputed the ability of anonymous amateurs to produce quality information, while others have contested Wikipedia’s claim to accuracy and neutrality. Even if these concerns about Wikipedia as an encyclopaedic genre are relevant, they misguidedly focus on human agents only. Wikipedia’s advance is not only enabled by its human resources, but is equally defined by the technological tools and managerial dynamics that structure and maintain its content. This article analyzes the sociotechnical system – the intricate collaboration between human users and automated content agents – that defines Wikipedia as a knowledge instrument.

Keywords
collaborative knowledge, protocol, sociotechnical system, Web 2.0, Wikipedia
Introduction
User-generated content has revived the idea of the web as a place of human collaboration and a place of activity for ‘everybody’ (Shirky, 2008). The Wikipedia project is often considered an example par excellence of ‘collaborative knowledge’, of ‘social media’
or the ‘wisdom of crowds’. Since 2001, a group of editors and volunteers have engaged in developing an online encyclopaedia, whereby everyone is invited to contribute and articles are open to continuous editing. Large numbers of contributors, the so-called ‘Wikipedians’, produce an online encyclopaedia that is unprecedented in scale and scope. Researchers who have evaluated or contested the value of Wikipedia content have almost unanimously focused on its human contributors. For instance, it has been disputed whether an encyclopaedia that is produced by many (anonymous) minds results in quality information (Keen, 2008). Other critics have contested Wikipedia’s claim to accuracy and neutrality by pointing at the liability of allowing anonymous contributors whose interest or expertise remains undisclosed.
Even if these concerns about Wikipedia as an encyclopaedic genre are legitimate and relevant, we argue that they misguidedly focus on human agents only, while neglecting the role of technology. In this article, we focus on the intricate interrelation between humans and technological tools which lies at the heart of several debates concerning Wikipedia. The first debate revolves around the question of whether Wikipedia is authored primarily by a few elite users or by many common contributors, an opposition we would like to question, for Wikipedia has a refined hierarchical structure in which contributing administrators, registered users, anonymous users and ‘bots’ (short for ‘software robots’; see below) all have a distinct rank in an orderly system. Second, we would like to refocus the public debate on the quality of Wikipedia’s encyclopaedic information (disputing whether its entries are accurate and neutral) by shifting attention to the protocols and technologies deployed to facilitate consensus editing. A basic comprehension of Wikipedia’s automated editing systems, as well as of emerging tracking tools like the WikiScanner, is needed to evaluate the encyclopaedia’s ability to meet standards of neutrality and accuracy, while preventing overt bias and vandalism.
In the third section of this article, we want to show how dependent various user groups and entries are on non-human content agents (or bots) that assist in editing Wikipedia content. Examining the variable dependency of human editors on bots for editing encyclopaedic content per language Wikipedia, we explore how an automated system of ‘interwiki’ and ‘interlanguage’ bots helps maintain the overall content strategies of the online encyclopaedia. Linking and networking specific language Wikipedias into one global system is less the result of people working together across languages and borders than the product of collaboration between humans and bots.
From our analysis, we conclude that any evaluation of Wikipedia’s qualities should acknowledge the significance of the encyclopaedia’s dynamic nature as well as the power of its partially automated content management system. It is the intricate collaboration between large numbers of human users and sophisticated automated systems that defines Wikipedia’s ultimate success as a knowledge instrument. In order to unravel this intricate human-technological interaction, we deploy several ‘natively’ digital methods for web research (Rogers, 2009).1 In his publication The End of the Virtual, Rogers calls for ‘research with the Internet’ that applies novel medium-specific methods and tools. Rather than importing existing analytical methods onto the web, natively digital methods can ‘move beyond the study of online culture alone’ (2009: 5) and help understand new media as the interplay between human and technological agents. The concrete goal of this analysis is to provide a better understanding of how Wikipedia’s automated
technological systems and the management of large numbers of edits are inextricably intertwined. More philosophically, we want to theorize human and machine contributions as complementary parts of a sociotechnical system that lies at the heart of many Web 2.0 platforms.
Many minds collaborating
Wikipedia has been described with terms such as ‘many minds’ (Sunstein, 2006) and similar notions such as ‘the wisdom of crowds’ (Kittur and Kraut, 2008; Surowiecki, 2004), ‘distributed collaboration’ (Shirky, 2008), ‘mass collaboration’ (Tapscott and Williams, 2006), ‘produsage’ (Bruns, 2008), ‘crowdsourcing’ (Howe, 2006), ‘Open Source Intelligence’ (Stalder and Hirsh, 2002) and ‘collaborative knowledge’ (Poe, 2006). The collectively written encyclopaedia on a wiki platform is often heralded as an example of collaborative knowledge production at its best. In early 2008, an article in the New York Review of Books explained the compelling charm of Wikipedia:
So there was this exhilarating sense of mission – of proving the greatness of the Internet through an unheard-of collaboration. Very smart people dropped other pursuits and spent days and weeks and sometimes years of their lives doing ‘stub dumps,’ writing ancillary software, categorizing and linking topics, making and remaking and smoothing out articles – without getting any recognition except for the occasional congratulatory barnstar on their user page and the satisfaction of secret fame. Wikipedia flourished partly because it was a shrine to altruism – a place for shy, learned people to deposit their trawls. (Baker, 2008)
Since the start of the Wikipedia project in 2001, the dedication of its contributors, as well as the group effort as an alternative to the professional expert approach, have been sources of both excitement and criticism. Even if Wikipedia has now become famous for its collaborative character of many minds producing knowledge, it is interesting to remind ourselves that the project was originally intended to be an expert-generated encyclopaedia. Started under the name ‘Nupedia’, it invited a small team of selected academics to write the entries, with the aim of creating a ‘free online encyclopaedia of high quality’ (Shirky, 2008: 109). The articles would be made available with an open content licence. Founder Jimmy ‘Jimbo’ Wales and his employee Larry Sanger put into place a protocol based on academic peer review (Poe, 2006; Shirky, 2008). This expert approach failed partly because of the slowness of the editing process by invited scholars. To speed up the process, Sanger suggested a wiki as a collective place where scholars and interested laypeople from all over the globe could help with publishing and editing draft articles. The success of Wikipedia and the commitment of the Wikipedians took them by surprise. Sanger became the ‘chief organizer’, a wiki-friendly alternative to the job of ‘editor-in-chief’ that he had held for Nupedia. He made a great effort to keep Wikipedia organized, while at the same time providing space for some of the ‘messiness’ (edit wars, inaccuracies, mistakes, fights, and so on) that collaborative editing brings along. In early 2002, however, Sanger turned away from the epistemic free-for-all of Wikipedia towards an expert-written encyclopaedic model called Citizendium (http://en.citizendium.org/wiki/Welcome_to_Citizendium), while Wales chose to further pursue the wiki model.
The question as to whether online encyclopaedias and similar enterprises should be produced by few (expert) or many (amateur) minds has been the source of heated debate ever since the Sanger/Wales split. Internet critic Andrew Keen (2008: 186) applauded Sanger for coming to his senses about the debased value of amateur contributions in favour of expert professionals. On the other end of the spectrum, many Wikipedia fans have praised its democratizing potential as well as its ethos of community and collaboration, a source of knowledge free for everyone to read and write (Benkler, 2006; Jenkins, 2006). By the same token, the notion that Wikipedia is actually produced by ‘crowds’ has been regularly challenged, most notably by Wikipedia’s founders. During the first five years of its existence, Wikipedia was largely dependent upon the work of a small group of dedicated volunteers. Although they soon formed a thriving community, the notion of a massive collective of contributors was repeatedly downplayed by Wales. As he pointed out in a talk at Stanford University in 2006:
The idea that a lot of people have of Wikipedia is that it’s some emergent phenomenon – the wisdom of mobs, swarm intelligence, that sort of thing – thousands and thousands of individual users each adding a little bit of content and out of this emerges a coherent body of work (…) [But Wikipedia is in fact written by] a community, a dedicated group of a few hundred volunteers (…) I expected to find something like an 80–20 rule: 80% of the work being done by 20% of the users (…) But it’s actually much, much tighter than that: it turns out over 50% of all the edits are done by just [0].7% of the users. (Wales cited in Swartz, 2006)
As Wales asserts, until 2006, Wikipedia was largely written and maintained by a small core of dedicated editors (2% doing 73.4% of all the edits). The disproportionate contribution of (self-)designated developers versus ‘common users’ can also be found in research into the open source movement. Rishab Aiyer Ghosh and Vipul Ved Prakash were among the first to disaggregate the ‘many minds’ collaborating in the open software movement. Their conclusion was that ‘free software development is less a bazaar of several developers involved in several projects and more a collation of projects developed single-mindedly by a large number of authors’ (Ghosh and Prakash, 2000: 1). In the open source movement, very few people were actually collaborating in developing software.
It would be a mistake, however, to dismiss the idea of Wikipedia’s ‘many contributors’ as a myth. Starting in 2006, the online encyclopaedia showed a distinct decline in ‘elite’ users, while, at the same time, the number of edits made by novice users and ‘masses’ was steadily increasing. Various researchers have pointed to the dramatic shift of the workload to the common user (Kittur et al., 2008). Instead of pitting the power of the expert against the wisdom of the crowds, Kittur et al. speak of ‘the rise of the bourgeoisie’, a marked growth in the population of low-edit users between 2006 and 2008. Interestingly, these researchers explain this shift by describing Wikipedia in terms of a dynamic social system that evolves as a result of the gradual development, implementation and distribution of content management systems. After an initial period of being managed by a small group of high-powered, dedicated volunteers, the ‘pioneers were dwarfed by the influx of settlers’ (Kittur et al., 2008: 8). The early adopters select and refine technological and managerial systems, followed by a majority of novice users who
begin to be the primary users of the system. Kittur and his colleagues observe a similar decline in elite users of Web 2.0 platforms and suggest that it may be a common phenomenon in the evolution of online collaborative knowledge systems. This tentative conclusion is underscored by other researchers who show that, in order to sustain the encyclopaedia’s growing popularity, its organizers need to identify more productive workers and grant them ‘administrator’s status’ (Burke and Kraut, 2008).
Although these researchers correctly observe significant changes in the ‘wisdom of crowds’ paradigm, they seem to be stuck in the antagonism of (few) experts versus (many) common users. Even if they notice the growing presence of non-human actors in the evolution of Wikipedia’s social dynamics, such as software tools and managerial protocols, they tend to underestimate their importance. In fact, the increasing openness of Wikipedia to inexperienced users is made possible by a sophisticated technomanagerial system, which facilitates collaboration on various levels. Without the implementation of this strict hierarchical content management system, Wikipedia would most likely have become a chaotic experiment.
According to Alexander Galloway, the internet and many of its (open source) applications are not simply open or closed, but modulated. Networked technology and management style are moderated by protocol, which gains its authority ‘from technology itself and how people program it’ (2004: 121). Wikipedia, built as an open system and carried out by large numbers of contributors, appears to be a ‘warm, friendly’ technological space, but only becomes warm and friendly through what Galloway refers to as ‘technical standardization, agreement, organized implementation, broad adoption and directed participation’ (2004: 142).
This is exactly what happened during the first five years of Wikipedia, during which time administrators developed strict protocols for distributing permission levels, imposing a hierarchical order in deciding what entries to include or exclude, what edits to allow or block. If we look more closely at Wikipedia’s organizational hierarchy (see Figure 1), we can distinguish various user groups, some of which are ‘global’ (in the sense that they edit across various language Wikipedias) while others are specific to a certain ‘local’ Wikipedia.

Figure 1. Schematic overview of global and local categories of Wikipedia users according to permission levels, ranked from most permissions to no permissions: Developer/System administrator, Steward, Check user, Oversight, Bureaucrat, Administrator/Sysop, Bot, Registered user, Newly registered user, Anonymous user, Blocked user. Available at: http://meta.wikimedia.org/wiki/User_groups
Each user group maintains the same pecking order, regulating the distribution of permission levels: ‘blocked users’ have the fewest permissions, for they can only edit their own talk page; anonymous users have fewer permissions than registered users, who, in turn, are at a lower level of permission than bots; bots are just below administrators (‘admins’), who occupy the highest level in the elaborate Wikipedia bureaucracy; system administrators (or developers) have the most permissions, including server access. This is a small user group of only 10 people who ‘manage and maintain the Wikimedia Foundation Servers’ (http://meta.wikimedia.org/wiki/System_administrators). Remarkable in this ranking system is the position of bots, whose permission level is just below that of administrators, but well above the authority of registered users. We will return to the status of bots in the third section. For now, it is important to note the significant role of automated mechanisms in the control of content.
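To make this pecking order concrete: it is essentially an ordered scale on which an edit can be overruled by anyone at a similar or higher level. A minimal sketch (the level names follow Figure 1; the ordering type and the `can_overrule` rule are our own illustration of the principle, not MediaWiki’s actual access-control code):

```python
from enum import IntEnum

class Level(IntEnum):
    """Wikipedia user groups ordered by permission level (after Figure 1)."""
    BLOCKED = 0            # may only edit their own talk page
    ANONYMOUS = 1
    NEWLY_REGISTERED = 2
    REGISTERED = 3
    BOT = 4                # above registered users, just below admins
    ADMINISTRATOR = 5
    BUREAUCRAT = 6
    OVERSIGHT = 7
    CHECK_USER = 8
    STEWARD = 9
    DEVELOPER = 10         # system administrators, including server access

def can_overrule(editor: Level, author: Level) -> bool:
    """An edit can be overruled by anyone at a similar or higher level."""
    return editor != Level.BLOCKED and editor >= author

assert can_overrule(Level.REGISTERED, Level.ANONYMOUS)   # revert allowed
assert not can_overrule(Level.ANONYMOUS, Level.BOT)      # outranked
```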
In fact, we could argue that the very success of the Wikipedia project lies in the regulation of collaborative production at every level, from a small edit or a single upload to a more extensive contribution or even the development of the platform or its content. Like any large public system, Wikipedia works through a system of disciplinary control, by issuing rewards, such as granting a dedicated user the authority level of administrator (Burke and Kraut, 2008), and by revoking editing rights from those users who deviate from the rules. A disciplinary system of power distribution in the digital age, however, cannot be regarded exclusively as a system of social control. As Gilles Deleuze (1990) has pointed out in his acute revision of Foucault’s disciplinary institutions, a ‘society of control’ deploys technology as an intricate part of its social mechanisms. Wikipedia’s content management system, with its distinct levels of permissions, moreover allows for protocological control, a mode of control that is at once social and technological – one cannot exist without the other (Galloway, 2004: 17). Along the same lines, Bruno Latour (1991: 129) proposes to analyze technological objects and infrastructures as ‘sociotechnical ensembles’, in which the strict division between ‘material infrastructure’ and ‘social superstructure’ is to be dissolved:
Rather than asking ‘is this social’ or ‘is this technical or scientific’ … we simply ask: has a human replaced a non-human? Has a non-human replaced a human? … Power is not a property of any of those elements [of humans or non-humans] but of a chain. (1991: 110)
Similar to Latour’s attempt to dissolve the ‘technology/society’ divide, we argue that the dynamic interwovenness of human and non-human content agents is an underrated yet crucial aspect of Wikipedia’s performance. The online encyclopaedia’s success is based on sociotechnical protocological control, a combination of its technical infrastructure and the collective ‘wisdom’ of its contributors. Rather than assessing Wikipedia’s epistemology exclusively in terms of the ‘power of the few’ versus the ‘wisdom of crowds’, we propose to define Wikipedia as a gradually evolving sociotechnical system that carefully orchestrates all kinds of human and non-human contributors by implementing managerial hierarchies, protocols and automated editing systems.
Accurate and neutral encyclopaedic information
A similar disregard of technological aspects can be observed in another heated debate that has haunted the online project from its inception: the question regarding the quality of Wikipedia’s encyclopaedic information. Wikipedia entries have often been held against the standards of accuracy and objectivity set by reputed encyclopaedias such as the Encyclopaedia Britannica. Wikipedia entries are based on three core principles, which serve as leading rules for its contributors and aim at holding up the encyclopaedia’s quality standards.
The first core rule is ‘verifiability’ (i.e. readers have to be able to retrieve Wikipedia content in reliable sources). Therefore, referring to published articles and verifiable resources is necessary to have an article (or edit) accepted (http://en.wikipedia.org/wiki/Wikipedia:Verifiability). A second, related core rule is called ‘no original research’: Wikipedia simply does not accept ‘new’ (unpublished) research or thought (http://en.wikipedia.org/wiki/Wikipedia:No_original_research). Again, reliability on Wikipedia means citing proven published sources. Third, articles have to be written from a ‘neutral point of view’ (NPoV) to avoid bias, meaning the articles have to be based on facts, and facts about opinions, but not on opinions (http://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view). All contributors, whether single anonymous users or administrators, are required to comply with these rules.2 Compliance is regulated by the abovementioned core rules, and non-compliance is punished by removal of edits. In past debates on Wikipedia’s standards of accuracy and neutrality, the emphasis has been on whether they can be kept up by crowds of human users. The more profound question, in line with our research thesis, is, however, how these standards are maintained and controlled through the organization and mechanics of Wikipedia’s content management system.
Initially, the quality debate concentrated mainly on accuracy or, more precisely, on the lack thereof due to the impossibility of verifying and authenticating sources. With so many anonymous and amateur contributors, the likelihood of vandalism, inaccuracy and downright sloppiness in factual details was more than real. As danah boyd (2005) observes, Wikipedia ‘lacks the necessary research and precision’ and ‘students are often not media-savvy enough to recognize when to trust Wikipedia and when this is a dreadful idea’. Other researchers entered the quality-of-content debate by testing Wikipedia’s robustness in terms of content vandalism. Alexander Halavais (2004) intentionally contributed incorrect information to existing articles. For his ‘Isuzu experiment’, he inserted 13 mistakes into 13 different articles, expecting that most of the errors would remain intact. Much to Halavais’s surprise, his wrongful edits did not last long, but were all corrected within a couple of hours.3
However, the explanation for the speed at which his vandalism was detected lies less in human acuity than in technological perspicacity. The fact that Halavais had made all his changes from the same username and IP address arguably made it all too easy for Wikipedians and their tools to undo his edits. Making 13 changes in 13 different articles in a short timeframe obviously attracts attention from automated bots, and even human Wikipedians, after spotting one mistake, would have probably looked into his other edits in the other articles and could have easily retrieved the other mistakes ‘by association’. Philosopher of science P.D. Magnus therefore provided a corrective to Halavais’s
research method in his 2008 study, in which he inserted inaccuracies distributed across different IP addresses and fields of expertise. He found that one-third of the errors were corrected within 48 hours and that most of the others were ‘corrected by association’, as was the case with Halavais’s experiment, whereby Wikipedians probably started checking his other edits after initially finding three mistakes. Some researchers conclude from these tests that the online encyclopaedia is robust to vandalism due to its huge numbers of watchful community members (Poe, 2006). Instead, we argue that it is rather the strict implementation of protocological control and the use of automated bots that account for Wikipedia’s vigilance. With their experiments, Halavais and Magnus have proven less the reliability of Wikipedia’s articles than the reliability of the encyclopaedia’s technomanagerial system.
The notion of a community of vigilant users has continued to feed the accuracy debate, particularly among academic researchers who questioned the reliability of Wikipedia’s supposed egalitarian approach. Collaboratively written amateur content, as these critics contend, finds itself at odds with knowledge production. If not written by known experts, how accurate and reliable are these encyclopaedic entries? In December 2005, the first academic research that systematically compared the accuracy of Wikipedia and Encyclopaedia Britannica was published in Nature (Giles, 2005). Researchers compared the two encyclopaedias by checking 42 science articles in both publications. The reviewers were academics, who checked the articles without knowing their source. They found Wikipedia and Britannica to be almost equally accurate; not surprisingly, the news was triumphantly announced as ‘Wikipedia Survives Research Test’ (BBC, 2005). With this outcome, Wikipedia was recognized as an encyclopaedia, at least on the level of its accuracy. But this was not the symbolic end of the reliability discussion. On the contrary, the debate heated up and more research followed. In 2006, information systems researcher Thomas Chesney (2006) conducted further empirical research into the credibility of Wikipedia, asking a total of 258 experts (academics) and non-experts to fill out a survey about a Wikipedia article from their area of expertise (or, for the laypeople, their realm of interest). The respondents found mistakes in 13 percent of the Wikipedia articles. But Chesney also found that the experts gave the Wikipedia articles a higher credibility rating than did the non-experts. Contrary to what Sanger described as the ‘perceived inaccuracy of Wikipedia’, the respondents expected (and found) Wikipedia to be a reliable source of information on the web.
In response to this accuracy debate, centring on the assumed polarity between (known) experts and (unknown) laypersons, a few academics proposed to redirect its focus from product to process and from the abilities of people to the qualities of its technological tools. Historian Roy Rosenzweig (2006), who conducted a thorough analysis of Wikipedia biographical entries and compared them to entries from the American National Biography Online (written by known scholars), concludes that the value of Wikipedia should not be sought in the accuracy of its published content at one moment in time, but in the dynamics of its continuous editing process – an intricate process whereby amateurs and experts collaborate in an extremely disciplined manner to improve entries each time they are edited. Rosenzweig notices the benefits of many edits to the factuality of an entry. As he points out, it is not so much crowds of anonymous users that make Wikipedia a reliable resource, but a regulated system of consensus editing that lays bare how history is
written: ‘Although Wikipedia as a product is problematic as a sole source of information, the process of creating Wikipedia fosters an appreciation of the very skills that historians try to teach’ (2006: 138). One of the most important features, in this respect, is the website’s built-in history page for each article, which lets users check the edit history of an entry. According to Rosenzweig, the history of an article, as well as personal watch lists and recent changes pages, are important instruments that give users additional clues to determine the quality of individual Wikipedia entries.
Part of the discussion disputing the accuracy and neutrality of Wikipedia’s content concentrated on the inherent unreliability of anonymous sources. How can an entry be neutral and objective if the encyclopaedia accepts copy edits from anonymous contributors who might have a vested interest in its outcome? Critics like Keen (2008) and Denning et al. (2005) have objected to the distribution of editing rights to all users. What remains unsaid in this debate is that the impact of anonymous contributors is clearly restricted due to technological and protocological control mechanisms. For one thing, every erroneous anonymous edit is systematically overruled by anyone who has a (similar or) higher level of permission (which is anyone except blocked users). Since anonymous users are very low in the Wikipedia pecking order, their edit longevity is likely to be short when they break the rules of objectivity and neutrality.
On top of that, there is an increasing availability of ‘counter tools’ that allow for checking the identity of contributors or at least their location of origin. On the history page of each Wikipedia entry, we can find the time stamp and IP address for every anonymous edit made. The WikiScanner, a tool created by California Institute of Technology student Virgil Griffith in 2007, makes it possible to geo-locate anonymous edits by looking up the IP addresses in an IP-to-Geo database, listing the IP addresses and the companies and institutions they belong to. It facilitates the tracking of anonymous users by revealing who and where they actually are. The WikiScanner has proven to be a powerful tool for journalists trying to localize and expose biased content. In the WikiScanner FAQ on his website, Griffith states that he created the WikiScanner (among other reasons) to ‘create a fireworks display of public relations disasters in which everyone brings their own fireworks, and enjoys’ (2008a). The WikiScanner was designed to reveal bias, and Griffith collects the most spectacular results on his website.4
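The WikiScanner’s core operation is a join of the IP addresses found on history pages against such an IP-to-Geo database. A minimal sketch of the lookup (the ranges and organization names below are fabricated placeholders drawn from documentation address space; the real tool’s database and interface are far richer):

```python
import ipaddress

# Fabricated IP-to-organization ranges, standing in for an IP-to-Geo database.
IP_RANGES = [
    (ipaddress.ip_network("198.51.100.0/24"), "Example Corp, Springfield"),
    (ipaddress.ip_network("203.0.113.0/24"), "Example University, Amsterdam"),
]

def locate_anonymous_edit(ip: str) -> str:
    """Map an anonymous editor's IP address to the organization owning it."""
    addr = ipaddress.ip_address(ip)
    for network, owner in IP_RANGES:
        if addr in network:
            return owner
    return "unknown"

# On a history page, an anonymous edit carries its IP address as the username:
print(locate_anonymous_edit("198.51.100.42"))  # Example Corp, Springfield
```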
The debates concerning Wikipedia’s accuracy and neutrality have been dominated by fallacious oppositions of human actors (experts versus amateurs, registered versus anonymous users) and have favoured a static evaluation of its content (correct or incorrect at one particular moment in time). Both qualifications, however, are ill suited when applied to a dynamic online encyclopaedia such as Wikipedia, mostly because a debate grounded in such parameters fails to acknowledge the crucial impact of a non-human actor: Wikipedia’s dynamic content management system and the protocols by which it is run. Arguably, Wikipedia is neither the often-advertised platform for many minds, nor a space for anonymous knowledge production. The WikiScanner has made the revealing of anonymous users much easier by matching IP addresses with contact information.5 Bias can now be identified, tracked and, if necessary, reverted. But there is more to the technicity of Wikipedia content than fast users armed with notification feeds and monitoring devices. The technicity of Wikipedia content, as we will show in the next section, lies in the totality of tools and software robots used for creating, editing and linking
entries, combating vandalism, banning users, scraping and feeding content and cleaning articles. It is the complex collaboration not of crowds, but of human and non-human agents combined that defines the quality standards of Wikipedia content.
Co-authored by bots
The significant presence of bots appears counter to the common assumption that Wikipedia is authored by human ‘crowds’. In fact, human editors would never be able to keep up the online encyclopaedia if they weren’t assisted by a large number of software robots. Bots are pieces of software or scripts that are designed to ‘make automated edits without the necessity of human decision-making’ (http://en.wikipedia.org/wiki/Wikipedia:Bot_policy). They can be recognized by a username that contains the word ‘bot’, such as SieBot or TxiKiBoT. Bots are created by Wikipedians and, once approved, they obtain their own user page and form their own user group with a certain level of access and administrative rights, made visible by flags on a user account page. One year after Wikipedia was founded, bots were introduced as useful helpers for repetitive administrative tasks (http://en.wikipedia.org/wiki/Wikipedia:History_of_Wikipedia_bots). Since the first bot was created on Wikipedia, the number of bots has grown exponentially. In 2002, there was only one active bot on Wikipedia; in 2006, the number had grown to 151 and, in 2008, there were 457 active bots (http://en.wikipedia.org/wiki/Wikipedia:Editing_frequency/All_bots).
In general, there are two types of bots: editing (or ‘co-authoring’) bots and non-editing (or administrative) bots. Each bot has a very specific approach to Wikipedia content, related to its often narrow task. Administrative bots are the most well known and well liked among Wikipedia users. They are deployed to perform policing tasks, such as blocking spam and detecting vandalism. Vandalism-combat bots come into action when ‘vandalism-like’ edits are made. Vandalism is recognizable, for it often means a large amount of deleted content in an article or a ‘more than usual’ change in content. Spellchecking bots check language and make corrections in Wikipedia articles. Ban enforcement bots can block a user from Wikipedia and thus take away his or her editing rights, which is something a registered user is not able to do. Non-editing bots also include data miners, used to extract information from Wikipedia, and copyright violation identifiers; the latter compare text in new Wikipedia entries to what is already available on the web about that specific topic and report their findings to a page for human editors to review. Most bots, being created to perform repetitive tasks, make many edits. In 2004, the first bots reached the record number of 100,000 edits.
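The ‘large amount of deleted content’ heuristic just mentioned can be made concrete in a few lines. A toy sketch of such a detector (the threshold and function name are our own assumptions, not the logic of any actual Wikipedia bot, which weighs many more signals):

```python
def looks_like_vandalism(old_text: str, new_text: str,
                         deletion_threshold: float = 0.7) -> bool:
    """Flag an edit as 'vandalism-like' when it deletes a large share of
    an article's content. The 70% threshold is purely illustrative."""
    if not old_text:
        return False
    removed = max(0, len(old_text) - len(new_text)) / len(old_text)
    return removed >= deletion_threshold

# An edit that blanks most of an article trips the detector:
print(looks_like_vandalism("A long, well-sourced article on a town.", "lol"))  # True
print(looks_like_vandalism("Short stub.", "Short stub, now expanded."))        # False
```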
The second category of editing or co-authoring bots seems to be much less known by Wikipedia users and researchers (for it would otherwise certainly have played a role in the debates about reliability and accuracy). While not every bot is an author, all bots can be classified as ‘content agents’, as they all actively engage with Wikipedia content. The most active Wikipedians are in fact bots. A closer look at various user groups reveals that bots create a large number of revisions of high quality (http://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits#List). Adler et al. (2008) discovered that the two highest contributors in their ‘edit longevity survival test’ were bots. As mentioned before, bots as a user group have more rights than
registered users and also a very specific set of permissions. For instance, bot edits are by default invisible in recent changes logs and watch lists. Research cited above has already pointed out that Wikipedians rely on these notification systems and feeds for the upkeep of articles.
Describing Wikipedians in bipolar categories of humans and non-humans, however, doesn’t do justice to what is, in fact, a third category: that of the many active users assisted by administrative and monitoring tools, also referred to as software-assisted human editors. Bots are Wikipedians’ co-authors of many entries. One of the first editing bots to be deployed by Wikipedians was rambot, a piece of software created by Derek Ramsey (http://en.wikipedia.org/wiki/User:Ram-Man). Rambot pulls content from public databases and feeds it into Wikipedia, creating or editing articles on specific content, either one by one or as a batch. Since its inception in 2002, rambot has created approximately 30,000 articles on US cities and counties on Wikipedia, using data from the CIA World Factbook and the US census. And since content produced by authoring bots relies heavily on its sources, errors in the data set caused rambot to publish around 2000 corrupted articles. In the course of time, bot-generated articles on American cities and counties were corrected and complemented by human editors, following a strict format protocol: history, geography, demographics, and so on. The articles appear strikingly tidy and informative and remarkably uniform. If we compare, for instance, an article on La Grange, Illinois, as created by rambot in 2002, with a more recent version of this article from 2009, it clearly shows the outcomes of a collaborative editing process; the entry has been enriched with facts, figures and images (see Figure 2). The basic format, however, has remained the same. To date, it is still rambot’s main task to create and edit articles about US counties and cities, while human editors check and complement the facts provided by this software robot.
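rambot’s workflow, pulling records from public databases and feeding them into a fixed article format, amounts to template filling. A minimal illustration (the template text, field names and figures below are our own invention, loosely following the history/geography/demographics format described above, not rambot’s actual source code):

```python
ARTICLE_TEMPLATE = """'''{name}''' is a municipality in {county} County, {state},
United States. As of the census of {census_year}, the population was {population}.

== Geography ==
{name} is located at {latitude}, {longitude}.

== Demographics ==
There were {households} households residing in the municipality."""

def make_article(record: dict) -> str:
    """Turn one row of (illustrative) census data into a uniform stub."""
    return ARTICLE_TEMPLATE.format(**record)

print(make_article({
    "name": "La Grange", "county": "Cook", "state": "Illinois",
    "census_year": 2000, "population": 15608,            # figures illustrative
    "latitude": "41.80N", "longitude": "87.87W", "households": 5561,
}))
```

Because every article is generated from the same template, a single error in the upstream data set propagates into every article built from it, which is exactly how rambot came to publish its batch of corrupted entries.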
But how dependent is Wikipedia on the use of bots for the creation and editing of content? What is the relative balance of human versus non-human contributions in the online encyclopaedia? Peculiarly, the answer to this simple question turns out to be layered and nuanced. When we started to look for answers, we found striking differences between the various language Wikipedias. As a global project, Wikipedia features over 10 million articles in over 250 languages (2 million in English), serving a large number of language communities.6 The fact that Wikipedia distinguishes between local and global user groups already suggests that bot activity might differ across local Wikipedias, which indeed turns out to be the case.7 Specific language Wikipedias not only vary greatly in size and number of articles, but also in bot activity. The Wikipedia Bot Activity Matrix, Wikipedia’s own meta-data record, offers an overview of total bot activity as well as bot activity per language (http://stats.wikimedia.org/EN/BotActivityMatrix.htm). The percentage of bot edits in all Wikipedias combined is 21.5 percent. Excluding the English language Wikipedia, total bot activity amounts to 39 percent, which means that bot activity is unevenly deployed across languages and communities.
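The percentages reported here are simple ratios over the Bot Activity Matrix counts. A sketch of the computation (the edit counts below are made-up placeholders, not Wikimedia’s actual figures):

```python
def bot_share(bot_edits: int, total_edits: int) -> float:
    """Percentage of all edits made by bot accounts."""
    return 100.0 * bot_edits / total_edits

# Made-up counts per language Wikipedia: (bot edits, total edits).
edits = {"en": (3_000_000, 50_000_000),
         "de": (900_000, 10_000_000),
         "kw": (38_000, 40_000)}          # 'kw' = Cornish

for lang, (bots, total) in edits.items():
    print(f"{lang}: {bot_share(bots, total):.1f}% bot activity")

# Aggregate share excluding English, analogous to the 39 percent figure:
bots_rest = sum(b for lang, (b, t) in edits.items() if lang != "en")
total_rest = sum(t for lang, (b, t) in edits.items() if lang != "en")
print(f"excl. en: {bot_share(bots_rest, total_rest):.1f}%")
```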
In order to account for the differences in bot activity versus human activity, it is interesting to compare bot activity in the most used language Wikipedias (English, Japanese, German) to bot activity in endangered and revived language Wikipedias (such as Cornish, Oriya, Ladino). In the Dorling maps (see Figures 3 and 4), the thin-lined outer circle depicts the language Wikipedia, sized according to the number of articles in total. The inner dot represents the share of bot activity in that language Wikipedia. The English, Japanese and German Wikipedias show that by far most of their editing is done by human editors. The German Wikipedia, for instance, has only 9 percent bot activity; the English version even less. Wikipedias of small and endangered languages show a high dependency on bots and a relatively small percentage of human edits. Oriya, for instance, depends 89 percent on automated software programmes; one small Wikipedia, in the language Bishnupriya Manipuri, has seen 97 percent of its edits made by bots (http://stats.wikimedia.org/EN/TablesWikipediaBPY.htm).

Figure 2. Bot-created article compared to a human-edited article. The upper screenshot is the La Grange, Illinois article as created by rambot on 11 December 2002. The lower screenshot shows the same article on 12 January 2009. Available at: http://en.wikipedia.org/wiki/La_Grange,_Illinois

Figure 3. Visualization of Wikipedia bot activity in most-used languages worldwide, overview and detail. The outer circle is the Wikipedia size in that specific language; the inner dot depicts the percentage of bot activity in that language Wikipedia. Source: Rogers et al. (2008). Graphic by Auke Touwslager using the Dorling Map Tool, Digital Methods Initiative, Amsterdam, 2009. Available at: http://wiki.digitalmethods.net/Dmi/NetworkedContent
Figure 4. Bot activity in endangered language Wikipedias. Analysis: Rogers et al. (2008). Graphic by Auke Touwslager using the Dorling Map Tool, Digital Methods Initiative, Amsterdam, 2009. Available at: http://wiki.digitalmethods.net/Dmi/NetworkedContent

Further analysis of bot activity versus human activity reveals that the variety of bot dependency can be an indicator of the state of the language Wikipedia, if not the state of that language, in the global constellation. Looking at the types of bots, we may notice that smaller Wikipedias are maintained mainly by bots that network the content, so-called interwiki and interlanguage bots. Phrased differently, the bots active in these spaces take care of linking articles to articles in other Wikipedias, to prevent them from becoming ‘orphans’ or dead ends. Wikipedia policy states that articles should be networked and be part of the Wikipedia web. This core principle is summarized as: ‘Link articles sideways to neighbors, upwards to categories and contexts, and downwards to sub-articles to create a useful web of information’ (http://en.wikipedia.org/wiki/Wikipedia:Build_the_web). Not only are ‘good’ Wikipedia articles full of links to reliable sources, but they should also link to related Wikipedia articles and sub-articles and be linked to in turn. Articles that only refer to each other, but are not linked to or linking to other articles, are also considered a threat to the principle of building the web.
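The ‘orphan’ problem that interwiki and interlanguage bots address can be stated as a simple check on the link graph. A toy sketch (the article names and graph are invented for illustration):

```python
# Toy link graph: article -> articles it links to (names invented).
links = {
    "Kernow": ["Penzance", "Cornish language"],
    "Penzance": ["Kernow"],
    "Cornish language": ["Kernow"],
    "Lost page": [],
}

def orphans(link_graph: dict[str, list[str]]) -> set[str]:
    """Articles that no other article links to: candidates for sideways
    or interwiki linking, per the 'build the web' principle."""
    linked_to = {target for targets in link_graph.values() for target in targets}
    return set(link_graph) - linked_to

print(orphans(links))  # {'Lost page'}
```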
We can analyze a language’s state of interconnectedness using the Wikipedia statistics pages, featuring lists of the most active bots per language Wikipedia. These reveal that the most-used language Wikipedias, which obviously contain much more content than the smaller language Wikipedias, have bot activity distributed across administrative tasks. In German, for instance, the top 45 listing of most active bots features 27 interwiki bots and 18 bots that are meant to edit content, add categories and fix broken links.8 In the smaller language Wikipedias, bots significantly outnumber human editors and are mostly dedicated to linking articles to related articles in other Wikipedias. They make sure that the content, however scarce, is networked. The Cornish Wikipedia’s top 45 of most active bots, for instance, shows at least 35 interwiki bots, while the remainder are bots with unspecified functions. These interwiki bots, such as ‘Silvonenbot’, a bot that adds interlanguage links, make connections between the various language Wikipedias. The bots of smaller language Wikipedias thus make sure that every article is properly linked sideways and prevent the language Wikipedia from becoming isolated.
Tracing the collaboration between human and non-human agents in Wikipedias thus allows for an interesting and unexpected insight into the culturally and linguistically diverse makeup of this global project. Following the ‘wisdom of crowds’ paradigm, we might have been tempted to look for cultural-linguistic diversity in terms of many people across the world collaborating in different languages and from a number of cultural backgrounds. In line with this paradigm, British information scientists have demonstrated that
the internet (and Wikipedia in particular) is anything but a culturally neutral space; major aspects of collaborative online work are influenced by pre-existing cultural differences between human contributors (Pfeil et al., 2006). Adding a ‘natively digital’ analysis of the varied distribution of bot dependency across the wide range of language Wikipedias, we show that cultural differences in the collaborative authoring of Wikipedia content cannot be accounted for solely in terms of its human users; they reveal themselves perhaps more candidly in the relative shares of human and non-human contributions, as revealed through automated patterns of contributions. High levels of bot activity, mainly dedicated to networking content and to building the web, are an indicator of small or endangered languages. A richer variety of bot activity, largely subservient to human edit activity, could be considered an indicator of a large and lively language space.
Conclusion
Wikipedia has most commonly been evaluated – either praised or criticized – for its collaborative knowledge production by many (anonymous) minds. From the start, researchers have pointed out that Wikipedia thrived by virtue of a small core of dedicated contributors rather than a large crowd of collaborators, even if the encyclopaedia became more hospitable to common users after 2006. In addition, Wikipedia has never been this mythical egalitarian space, for the various user groups have very distinct levels of permissions. Past Wikipedia research has focused mainly on the crowdsourcing of knowledge as well as on the reliability of Wikipedia content. As we have shown, some researchers, such as Halavais and Magnus, tested Wikipedia by entering false information. These types of research have isolated Wikipedia content as a static product, mainly by assessing it against other encyclopaedic records.9 Other research, mostly (investigative) journalism, was intent on ‘outing’ anonymous human editors, making use of ‘counter-technology’ like the WikiScanner to reveal the identity of contributors or the origin of an edit. More recently, we have seen different research approaches to Wikipedia, such as that of Rosenzweig, who acknowledges the encyclopaedia’s dynamic content as well as the significance of its partially automated content management system. For the most part, though, scholarly evaluations of Wikipedia have adhered to the human content-agent paradigm.
In this article, we have argued that Wikipedia’s nature and quality should be evaluated in terms of the collaborative qualities not only of its human users, but of its human and non-human actors together. Since 2002, Wikipedia content has been maintained by both tool-assisted human editors and bots, and collaboration has been modulated by protocols and strict managerial hierarchies. Bots are systematically deployed to detect and revert vandalism, monitor certain articles and, if necessary, ban users, but they also play a substantial role in the creation and maintenance of entries. As we have shown, bot activity may be analyzed as an indicator of the international or intercultural dimension of Wikipedia as a global project.
In the fall of 2009, Wikipedia introduced WikiTrust, a MediaWiki extension developed by the WikiLab at the University of California in Santa Cruz. With WikiTrust, newly edited parts of Wikipedia articles are colour coded according to reliability, based on the author’s reputation, which is established by the lifespan of their other contributions.
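WikiTrust’s basic idea, that an author’s reputation grows with the lifespan of their past contributions and that new text is colour coded accordingly, can be caricatured in a few lines (a sketch of the principle only; the actual extension’s reputation algorithm is considerably more elaborate, and the thresholds and colour bands below are our own assumptions):

```python
def author_reputation(edit_lifespans: list[int], horizon: int = 100) -> float:
    """Crude reputation in [0, 1]: the average fraction of a 'horizon' of
    subsequent revisions that each of the author's past edits survived."""
    if not edit_lifespans:
        return 0.0
    survived = sum(min(lifespan, horizon) for lifespan in edit_lifespans)
    return survived / (horizon * len(edit_lifespans))

def trust_colour(reputation: float) -> str:
    """Colour code newly inserted text by its author's reputation."""
    if reputation > 0.8:
        return "white"         # high trust: no highlight
    return "orange" if reputation > 0.4 else "dark orange"

print(trust_colour(author_reputation([120, 95, 60])))  # white
print(trust_colour(author_reputation([2, 1, 0])))      # dark orange
```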
Instead of turning to the expert to check all articles, Wikipedia builds further on the combination of rules, hierarchies and editors. To understand Wikipedia’s collaborative process, we need to unravel not simply Wikipedia’s human agents, but the specificities of its technicity.
We propose to extend this kind of analysis from Wikipedia to various kinds of Web 2.0 infrastructures. Non-human actors and coded protocols are often overlooked in the many optimistic Web 2.0 theories that triumphantly claim the virtues of mass collaboration on Web 2.0 platforms (Tapscott and Williams, 2006). It is important to question the assumption of the internet as a merely social laboratory of human interaction, and instead to analyze and interrogate the sociotechnical system that lies at the core of Web 2.0 platforms. Human and machine contributions are complementary parts of a society of control in which social interactions are increasingly facilitated by means of coded, automated processes. Human judgements such as reliability, accuracy or factuality are turned into machine-coded and regulated alert systems, as illustrated by Wikipedia’s mechanisms for generating and checking content. Nicholas Carr compares Web 2.0 to a Mechanical Turk, which ‘turns people’s actions and judgments into functions in a software program’ (2008: 218). A thorough and critical understanding of the automated processes that structure human judgements and decisions requires analytical skills and medium-specific methods which are crucial to a full understanding of how the internet works. Instead of succumbing to the mechanisms of control, users should learn to critically analyze their interactions with technology and actively engage in technology’s development (Zittrain, 2008: 245).
In line with David Beer’s call in this journal for a more thorough understanding of the ‘technological unconscious’ of participatory web cultures, we have tried to explain the ‘performative infrastructure’ of Wikipedia by focusing on the second level of analysis that Beer proposes, namely unravelling software infrastructures and their applications (Beer, 2009: 998). In this article, we have deployed several natively digital methods to unravel in detail the close interdependency of human and technological agents. It is important to comprehend the powerful information technologies that shape our everyday lives and the coded mechanisms behind our informational practices and cultural experiences. The analysis of the Wikipedia platform as a sociotechnical system is a first step in that direction.
Notes
1. ‘Digital methods’ is a term for medium-specific methods for web research coined by Richard Rogers (2009). The research for this article was conducted with the Digital Methods Initiative and the Govcom.org Foundation (2008).
2. On top of the set of rules, there is a fourth important general principle emphasizing the ideals of openness and collaboration that lie at the core of the project: ignore all rules. This general principle was written up by Larry Sanger to make clear that, above all, Wikipedia is an open platform. Wikipedians should first and foremost strive to improve and maintain Wikipedia (see: http://en.wikipedia.org/wiki/Wikipedia:Ignore_all_rules).
3. Halavais’s approach was heavily criticized, mainly because he deliberately littered his object of study. After the event, Halavais regretted his approach, especially because the media attention on his experiment encouraged others to test Wikipedia by inserting mistakes. His website (http://alex.halavais.net/the-isuzu-experiment) now includes a call for testing Wikipedia in a ‘non-destructive way’.
4. In the summer of 2008, Virgil Griffith launched the WikiWatcher suite, a set of tools designed for monitoring and maintaining Wikipedia. The suite includes a tool that makes it possible to de-anonymize users with a username whose IP addresses match those of other user(name)s or companies/institutions in an IP-to-Geo database. This stretches the notion of anonymity from the unregistered to the registered with a username (see Griffith, 2008b).
5. WikiScanner 2 works the other way around: enter a company name or URL and the tool shows you which Wikipedia articles were edited from that organization’s IP address.
6. An overview of the available local Wikipedias is given on the Wikipedia portal page (www.wikipedia.org). Wikipedia currently has 264 language versions, some of which only have a main page and no articles as of yet. The largest Wikipedia is in English, with more than 2 million articles, followed by the German, French, Polish and Japanese editions, each of which contains more than half a million articles. Seventeen other language editions contain 100,000+ articles, and more than 100 other languages contain 1000+ articles; the overview also includes the smaller ones with only 100+ articles and even Wikipedias that have only a main page (http://wikimediafoundation.org/wiki/Our_projects).
7. In a case study on Wikipedia as ‘networked content’ that Sabine Niederer conducted with Richard Rogers et al. (2008), during the 10-Year Jubilee Workshop of the Govcom.org Foundation, the researchers noticed the great discrepancy between bot activity in English and in the other language versions of Wikipedia.
8. The top 45 most active bots on the German Wikipedia consist of 27 interwiki bots and 18 bots with various tasks such as editing, fixing links and adding categories (http://stats.wikimedia.org/EN/TablesWikipediaDE.htm).
9. To date, communication scholars like Halavais and Lackaff (2008) examine Wikipedia’s reliability and completeness, assessing the qualities of its users rather than those of its systems.
References
Adler T, de Alfaro L, Pye I and Raman W (2008) Measuring author contributions to Wikipedia. In: Proceedings of WikiSym 2008, Porto, 8–10 September. New York: ACM. Available at: http://users.soe.ucsc.edu/~luca/papers/08/wikisym08-users.pdf
Baker N (2008) The charms of Wikipedia. New York Review of Books 55(4). Available at: http://www.nybooks.com/articles/21131
BBC (2005) Wikipedia survives research test. BBC News (15 December). Available at: http://news.bbc.co.uk/2/hi/technology/4530930.stm
Beer D (2009) Power through the algorithm? Participatory web cultures and the technological unconscious. New Media & Society 11(6): 985–1002.
Benkler Y (2006) The Wealth of Networks: How Social Production Transforms Markets and Freedom. New Haven, CT: Yale University Press.
boyd d (2005) Academia and Wikipedia. Corante (4 January). Available at: http://many.corante.com/archives/2005/01/04/academia_and_wikipedia.php
Bruns A (2008) Blogs, Wikipedia, Second Life, and Beyond: From Production to Produsage. New York: Peter Lang. Available at: http://produsage.org/
Burke M and Kraut R (2008) Taking up the mop: identifying future Wikipedia administrators. In: Proceedings of the 2008 CHI Conference, Florence, 5–10 April. New York: ACM, 3441–6. Available at: http://portal.acm.org/citation.cfm?id=1358628.1358871
Carr N (2008) The Big Switch: Rewiring the World, from Edison to Google. New York: Norton.
Chesney T (2006) An empirical examination of Wikipedia’s credibility. First Monday 11(11). Available at: http://firstmonday.org/issues/issue11_11/chesney/
Deleuze G (1990) Society of control. L’autre journal (1). Available at: http://www.nadir.org/nadir/archiv/netzkritik/societyofcontrol.html
Denning P, Horning J, Parnas D and Weinstein L (2005) Inside risks: Wikipedia risks. Communications of the ACM 48(12): 152.
Digital Methods Initiative (2008) Digital Methods Initiative. Available at: http://www.digitalmethods.net
Galloway A (2004) Protocol: How Control Exists after Decentralization. Cambridge, MA: MIT Press.
Ghosh RA and Prakash VV (2000) Orbiten free software survey. First Monday 5(7). Available at: http://www.firstmonday.org/issues/issue5_7/ghosh/
Giles J (2005) Internet encyclopaedias go head to head. Nature 438: 900–1. Available at: http://www.nature.com/nature/journal/v438/n7070/full/438900a.html
Govcom.org Foundation (2008) Govcom.org 10 year jubilee workshop. Digital Methods Initiative (11–15 August). Available at: http://wiki.digitalmethods.net/Dmi/GovcomorgJubilee#Jubilee_Programme_archived
Griffith V (2008a) WikiScanner homepage. Available at: http://virgil.gr/31.html
Griffith V (2008b) Wikiwatcher homepage. Available at: http://wikiwatcher.virgil.gr/
Halavais A (2004) The Isuzu experiment. In: A Thaumaturgical Compendium Blog. Available at: http://alex.halavais.net/the-isuzu-experiment/
Halavais A and Lackaff D (2008) An analysis of topical coverage of Wikipedia. Journal of Computer-Mediated Communication 13: 429–40.
Howe J (2006) The rise of crowdsourcing. Wired 14(6). Available at: http://www.wired.com/wired/archive/14.06/crowds.html
Jenkins H (2006) Convergence Culture: Where Old and New Media Collide. Cambridge, MA: MIT Press.
Keen A (2008) The Cult of the Amateur: How Blogs, MySpace, YouTube, and the Rest of Today’s User-generated Media Are Killing Our Culture and Economy. London: Nicholas Brealey.
Kittur A and Kraut RE (2008) Harnessing the wisdom of crowds in Wikipedia: quality through coordination. In: Proceedings of the ACM 2008 Conference on Computer Supported Cooperative Work. New York: ACM, 37–46.
Kittur A, Chi E, Pendleton B, Sun B and Mytkowicz T (2008) Power of the few vs wisdom of the crowd: Wikipedia and the rise of the bourgeoisie. Paper presented at ‘CHI 2007’, San Jose, 28 April–3 May.
Latour B (1991) Technology is society made durable. In: Law J (ed.) A Sociology of Monsters: Essays on Power, Technology and Domination. London: Routledge, 103–32.
Magnus PD (2008) Early response to false claims in Wikipedia. First Monday 13(9). Available at: http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/viewArticle/2115/2027
Negroponte N (1995) Being Digital. New York: Vintage.
Pfeil U, Zaphiris P and Ang CS (2006) Cultural differences in collaborative authoring of Wikipedia. Journal of Computer-Mediated Communication 12: 88–113.
Poe M (2006) The hive. Atlantic Online (September). Available at: http://www.theatlantic.com/doc/200609/wikipedia
Rogers R (2009) The End of the Virtual: Digital Methods. Amsterdam: Vossiuspers UvA.
Rogers R, Niederer S, Deveraux Z and Nijhof B (2008) Networked content. Digital Methods Initiative. Available at: http://wiki.digitalmethods.net/Dmi/NetworkedContent
Rosenzweig R (2006) Can history be open source? Wikipedia and the future of the past. Journal of American History 93(1): 117–46.
Shirky C (2005) K5 article on Wikipedia anti-elitism. Corante. Available at: http://many.corante.com/archives/2005/01/03/k5_article_on_wikipedia_antielitism.php
Shirky C (2008) Here Comes Everybody: The Power of Organizing without Organizations. New York: Penguin.
Stalder F and Hirsh J (2002) Open source intelligence. First Monday 7(6). Available at: http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/961
Stevenson M and Rogers R (2009) Digital methods: first steps. EASST Review 28(2). Available at: http://www.easst.net/review/june2009/stevenson
Sunstein CR (2006) Infotopia: How Many Minds Produce Knowledge. Oxford: Oxford University Press.
Surowiecki J (2004) The Wisdom of Crowds: Why the Many Are Smarter than the Few and How Collective Wisdom Shapes Business, Societies and Nations. New York: Doubleday.
Swartz A (2006) Who writes Wikipedia? Raw Thought Blog. Available at: http://www.aaronsw.com/weblog/whowriteswikipedia/
Tapscott D and Williams AD (2006) Wikinomics: How Mass Collaboration Changes Everything. New York: Penguin.
Zittrain J (2008) The Future of the Internet. New York: Penguin.
Sabine Niederer is a PhD candidate in Media Studies at the University of Amsterdam and a member of the Digital Methods Initiative, Amsterdam. She is also Managing Director of the Institute of Network Cultures, the new media research centre at the Amsterdam University of Applied Sciences.

José van Dijck is Professor of Media and Culture at the University of Amsterdam, where she is currently Dean of Humanities. She has published widely in the area of media and science, (digital) media technologies, and television and culture. Her latest book is entitled Mediated Memories in the Digital Age (Stanford University Press, 2007). Address: University of Amsterdam, Spuistraat 210, 1012 VT Amsterdam, the Netherlands. [email: j.f.t.m.vanDijck@uva.nl]
