new media & society

Wikipedia is often considered as an example of
‘collaborative knowledge’. Researchers have contested the value
of Wikipedia content on various accounts. Some have disputed the ability
of anonymous amateurs to produce quality information, while others have
contested Wikipedia’s claim to accuracy and neutrality. Even if these
concerns about Wikipedia as an encyclopaedic genre are relevant, they
misguidedly focus on human agents only. Wikipedia’s advance is not only
enabled by its human resources, but is equally defined by the
technological tools and managerial dynamics that structure and maintain
its content. This article analyzes the sociotechnical system – the
intricate collaboration between human users and automated content agents
– that defines Wikipedia as a knowledge instrument.
collaborative knowledge, protocol, sociotechnical system, Web 2.0, Wikipedia
or the ‘wisdom of crowds’. Since 2001, a group of
editors and volunteers have engaged in developing an online
encyclopaedia, whereby everyone is invited to contribute and articles
are open to continuous editing. Large numbers of contributors, the so-called
‘Wikipedians’, produce an online encyclopaedia that is unprecedented in
scale and scope. Researchers who have evaluated or contested the value
of Wikipedia content have almost unanimously focused on its human
contributors. For instance, it has been disputed whether an
encyclopaedia that is produced by many (anonymous) minds results in
quality information (Keen, 2008). Other critics have contested
Wikipedia’s claim to accuracy and neutrality by pointing at the
liability of allowing anonymous contributors whose interest or expertise
remains undisclosed.
Even if these concerns about Wikipedia as an
encyclopaedic genre are legitimate and relevant, we argue that they
misguidedly focus on human agents only, while neglecting the role of
technology. In this article, we focus on the intricate interrelation
between human and technological tools which lies at the heart of several
debates concerning Wikipedia. The first debate revolves around the
question of whether Wikipedia is authored primarily by a few elite users
or by many common contributors, an opposition we would like to
question, for Wikipedia has a refined hierarchical structure in which
contributing administrators, registered users, anonymous users and
‘bots’ (short for ‘software robots’; see below) all have a distinct
rank in an orderly system. Second, we would like to refocus the public
debate on the quality of Wikipedia’s encyclopaedic information
(disputing whether its entries are accurate and neutral) by shifting
attention to the protocols and technologies deployed to facilitate
consensus editing. A basic comprehension of Wikipedia’s automated
editing systems, as well as emerging tracking tools like the
WikiScanner, is needed to evaluate the encyclopaedia’s ability to meet
standards of neutrality and accuracy, while preventing overt bias and
In the third section of this article, we want to show how dependent various user groups and entries are on non-human
content agents (or bots) that assist in editing Wikipedia content.
Examining the variable dependency of human editors on bots for editing
encyclopaedic content per language Wikipedia, we explore how an
automated system of ‘interwiki’ and ‘interlanguage’ bots helps maintain
the overall content strategies of the online encyclopaedia. Linking and
networking specific language Wikipedias into one global system is less
the result of people working together across languages and borders than it
is the product of collaboration between humans and bots.
From our analysis, we conclude that any evaluation
of Wikipedia’s qualities should acknowledge the significance of the
encyclopaedia’s dynamic nature as well as the power of its partially
automated content management system. It is the intricate collaboration
between large numbers of human users and sophisticated automated systems
that defines Wikipedia’s ultimate success as a knowledge instrument. In
order to unravel this intricate human-technological interaction, we deploy several ‘natively’ digital methods for web research (Rogers, 2009).1 In his publication The End of the Virtual, Rogers calls for ‘research with the Internet’ that applies novel medium-specific
methods and tools. Rather than importing existing analytical methods
onto the web, natively digital methods can ‘move beyond the study of
online culture alone’ (2009: 5) and help understand new media as the
interplay between human and technological agents. The concrete goal of
this analysis is to provide a better understanding of how Wikipedia’s
technological systems and the management of large
numbers of edits are inextricably intertwined. More philosophically, we
want to theorize human and machine contributions as complementary
parts of a sociotechnical system that lies at the heart of many Web 2.0
Many minds collaborating
Wikipedia has been described with terms such as
‘many minds’ (Sunstein, 2006) and similar notions such as ‘the wisdom of
crowds’ (Kittur and Kraut, 2008; Surowiecki, 2004), ‘distributed
collaboration’ (Shirky, 2008), ‘mass collaboration’ (Tapscott and
Williams, 2006), ‘produsage’ (Bruns, 2008), ‘crowdsourcing’ (Howe,
2006), ‘Open Source Intelligence’ (Stalder and Hirsh, 2002) and
‘collaborative knowledge’ (Poe, 2006). The collectively written
encyclopaedia on a wiki platform is often heralded as an example of
collaborative knowledge production at its best. In early 2008, an
article in the New York Review of Books explained the compelling charm of Wikipedia:
So there was this exhilarating sense of mission – of proving the greatness of the Internet through an unheard-of
collaboration. Very smart people dropped other pursuits and spent days
and weeks and sometimes years of their lives doing ‘stub dumps,’ writing
ancillary software, categorizing and linking topics, making and
remaking and smoothing out articles – without getting any recognition
except for the occasional congratulatory barnstar on their user page and
the satisfaction of secret fame. Wikipedia flourished partly because it
was a shrine to altruism – a place for shy, learned people to deposit
their trawls. (Baker, 2008)
Since the start of the Wikipedia project in 2001,
the dedication of its contributors, as well as the group effort as an
alternative to the professional expert approach, have been sources of
both excitement and criticism. Even if Wikipedia has now become famous
for its collaborative character of many minds producing knowledge, it is
interesting to remind ourselves that the project originally intended to
be an expert-generated encyclopaedia. Started under the
name ‘Nupedia’, the project invited a small team of selected academics to
write the entries, with the aim of creating a ‘free online encyclopaedia
of high quality’ (Shirky, 2008: 109). The articles would be made
available with an open content licence. Founder Jimmy ‘Jimbo’ Wales and
his employee Larry Sanger put into place a protocol based on academic
peer review (Poe, 2006; Shirky, 2008). This expert approach failed
partly because of the slowness of the editing process by invited
scholars. To speed up the process, Sanger suggested a wiki as a
collective place where scholars and interested laypeople from all over
the globe could help with publishing and editing draft articles. The
success of Wikipedia and the commitment of the Wikipedians took them by
surprise. Sanger became the ‘chief organizer’, a wiki-friendly alternative to the job of ‘editor-in-chief’
that he held for Nupedia. He made a great effort to keep Wikipedia
organized, while at the same time providing space for some of the
‘messiness’ (edit wars, inaccuracies, mistakes, fights, and so on) that
collaborative editing brings along. In early 2002, however, Sanger
turned away from the epistemic free-for-all of Wikipedia towards an expert-written
encyclopaedic model called Citizendium (
wiki/Welcome_to_Citizendium), while Wales chose to further pursue the
Wiki model.
The question as to whether online encyclopaedias and similar enterprises should be produced
by few (expert) or many (amateur) minds has been the source of heated
debate ever since the Sanger/Wales split. Internet critic Andrew Keen
(2008: 186) applauded Sanger for coming to his senses about the debased
value of amateur contributions in favour of expert professionals. On the
other end of the spectrum, many Wikipedia fans have praised its
democratizing potential as well as its ethos of community and collaboration,
a source of knowledge free for everyone to read and write
(Benkler, 2006; Jenkins, 2006). By the same token, the notion that
Wikipedia is actually produced by ‘crowds’ has been regularly
challenged, most notably by Wikipedia’s founders. During the first five
years of its existence, Wikipedia was largely dependent upon the work of
a small group of dedicated volunteers. Although they soon formed a
thriving community, the notion of a massive collective of contributors
was repeatedly downplayed by Wales. As he pointed out in a talk at
Stanford University in 2006:
The idea that a lot of people have of Wikipedia is
that it’s some emergent phenomenon – the wisdom of mobs, swarm
intelligence, that sort of thing – thousands and thousands of individual
users each adding a little bit of content and out of this emerges a
coherent body of work (…) [But Wikipedia is in fact written by] a
community, a dedicated group of a few hundred volunteers (…) I expected
to find something like an 80–20 rule: 80% of the work being
done by 20% of the users (…) But it’s actually much, much tighter than
that: it turns out over 50% of all the edits are done by just [0].7% of
the users. (Wales cited in Swartz, 2006)
As Wales asserts, until 2006, Wikipedia was largely
written and maintained by a small core of dedicated editors (2% doing
73.4% of all the edits). The disproportionate contribution of (self-)designated
developers versus ‘common users’ can also be found in research into the
open source movement. Rishab Aiyer Ghosh and Vipul Ved Prakash were
among the first to disaggregate the ‘many minds’ collaborating in the
open software movement. Their conclusion was that, ‘free software
development is less a bazaar of several developers involved in several
projects and more a collation of projects developed single-mindedly
by a large number of authors’ (Ghosh and Prakash, 2000: 1). In the open
source movement, very few people were actually collaborating in
developing software.
It would be a mistake, however, to dismiss the idea
of Wikipedia’s ‘many contributors’ as a myth. Starting in 2006, the
online encyclopaedia showed a distinct decline in ‘elite’ users, while,
at the same time, the number of edits made by novice users and ‘masses’
was steadily increasing. Various researchers have pointed to the
dramatic shift in the workload to the common user (Kittur et al., 2008).
Instead of pitching the power of the expert versus the wisdom of the
crowds, Kittur et al. speak of ‘the rise of the bourgeoisie’, a marked
growth in the population of low-edit users between 2006
and 2008. Interestingly, these researchers explain this shift by
describing Wikipedia in terms of a dynamic social system that evolves as
a result of the gradual development, implementation and distribution
of content management systems. After an initial period of being managed
by a small group of high-powered, dedicated volunteers, the
‘pioneers were dwarfed by the influx of settlers’ (Kittur et al., 2008:
8). The early adopters select and refine technological and managerial
systems, followed by a majority of novice users who
begin to be the primary users of the system. Kittur
and his colleagues observe a similar decline in elite users of Web 2.0
platforms and suggest that it may be a common phenomenon in the
evolution of online collaborative knowledge systems. This tentative
growing popularity, its organizers need to identify more productive
workers and grant them ‘administrator’s status’ (Burke and Kraut, 2008).
Although these researchers correctly observe
significant changes in the ‘wisdom of crowds’ paradigm, they seem to be
stuck in the antagonism of (few) experts versus (many) common users.
Even if they notice the growing presence of non-human
actors in the evolution of Wikipedia’s social dynamics, such as software
tools and managerial protocols, they tend to underestimate their
importance. In fact, the increasing openness of Wikipedia to
inexperienced users is made possible by a
sophisticated technomanagerial system, which facilitates collaboration
on various levels. Without the implementation of this strict
hierarchical content management system, Wikipedia would most likely have
become a chaotic experiment.
According to Alexander Galloway, the internet and many of its (open source) applications are not simply open or closed, but modulated.
Networked technology and management style are moderated by protocol,
which gains its authority ‘from technology itself and how people program
it’ (2004: 121). Wikipedia, built as an open system and carried out by
large numbers of contributors, appears to be a ‘warm,
friendly’ technological space, but only becomes warm and friendly through
what Galloway refers to as ‘technical standardization, agreement, organized
implementation, broad adoption and directed participation’ (2004:
This is exactly what happened during the first five
years of Wikipedia, during which time administrators developed strict
protocols for distributing permission levels, imposing a hierarchical
order in deciding what entries to include or exclude, what edits to
allow or block. If we look more closely at Wikipedia’s organizational
hierarchy (see Figure 1), we can distinguish various user groups, some of
which are ‘global’ (in the sense that they edit across various language
Wikipedias) while others are specific to a certain ‘local’ Wikipedia.
Figure 1. Schematic overview of global and local categories of Wikipedia users according to permission levels, ranked from most permissions (developer/system administrator, check user, registered user, newly registered user, anonymous user) to no permissions (blocked user).
Each user group maintains the same pecking order,
regulating the distribution of permission levels: ‘blocked users’ have
the least permissions, for they can only edit their own talk page;
anonymous users have fewer permissions than registered users, who, in
turn, are at a lower level of permission than bots; bots are just below
administrators (‘admins’), who occupy the highest level in the elaborate
Wikipedia bureaucracy; system administrators (or developers) have the
most permissions, including server access. This is a small user group of
only 10 people who ‘manage and maintain the Wikimedia Foundation
Servers’ (
Remarkable in this ranking system is the position of bots, whose
permission level is just below that of administrators, but well above
the authority of registered users. We will return to the status of bots
in the third section. For now, it is important to note the significant
role of automated mechanisms in the control of content.
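The pecking order described above can be illustrated with a small sketch. The Python fragment below is purely illustrative: the `UserGroup` enum, its numeric ranks and the `outranks` helper are hypothetical names of our own, and the single linear rank is a simplification of the text's description, not a rendering of how the wiki software actually assigns individual rights.

```python
from enum import IntEnum


class UserGroup(IntEnum):
    """Hypothetical ranks following the order given in the text (low to high)."""
    BLOCKED = 0
    ANONYMOUS = 1
    REGISTERED = 2
    BOT = 3
    ADMINISTRATOR = 4
    SYSTEM_ADMINISTRATOR = 5


def outranks(a: UserGroup, b: UserGroup) -> bool:
    """Return True when group `a` holds strictly more permissions than `b`."""
    return a > b


if __name__ == "__main__":
    # Bots sit above registered users but below administrators.
    print(outranks(UserGroup.BOT, UserGroup.REGISTERED))
    print(outranks(UserGroup.BOT, UserGroup.ADMINISTRATOR))
```

Modelling the groups as an ordered enumeration makes the point of the paragraph visible at a glance: a non-human group occupies a rank that most human contributors never reach.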
In fact, we could argue that the very success of the Wikipedia project lies in the regulation of collaborative production at any level,
from a small edit or a single upload to a more extensive contribution
or even development of the platform or its content. Like any large
public system, Wikipedia works through a system of disciplinary control
by issuing rewards, such as granting a dedicated user the authority
level of administrator (Burke and Kraut, 2008) and by blocking
contributors’ rights to those users who deviate from the rules. A
disciplinary system of power distribution in the digital age, however,
can’t be regarded exclusively as a system of social control.
As Gilles Deleuze (1990) has pointed out in his acute revision of
Foucault’s disciplinary institutions, a ‘society of control’ deploys
technology as an intricate part of its social mechanisms. Wikipedia’s
content management system, with distinct levels of permissions, allows
moreover for protocological control, a mode
of control that is at once social and technological – one cannot exist
without the other (Galloway, 2004: 17). Along the same lines, Bruno
Latour (1991: 129) proposes to analyze technological objects and
infrastructures as ‘sociotechnical ensembles’, in which the strict
division between ‘material infrastructure’ and ‘social superstructure’
is to be dissolved:
Rather than asking ‘is this social’ or ‘is this technical or scientific’ … we simply ask: has a human replaced a non-human? Has a non-human replaced a human? … Power is not a property of any of those elements [of humans or non-humans] but of a chain. (1991: 110)
Similar to Latour’s attempt to dissolve the ‘technology/society’ divide, we argue that the dynamic interwovenness of human and non-human
content agents is an underrated yet crucial aspect of Wikipedia’s
performance. The online encyclopaedia’s success is based on
sociotechnical protocological control, a combination of its technical
infrastructure and the collective ‘wisdom’ of its contributors. Rather
than assessing Wikipedia’s epistemology exclusively in terms of ‘power
of the few’ versus the ‘wisdom of crowds’, we propose to define
Wikipedia as a gradually evolving sociotechnical system that carefully
orchestrates all kinds of human and non-human contributors by implementing managerial hierarchies, protocols and automated editing systems.
Accurate and neutral encyclopaedic information
A similar disregard of technological aspects can be
observed in another heated debate that has haunted the online project
from its inception: the question regarding the quality of Wikipedia’s
encyclopaedic information. Wikipedia entries have often been held
against the standards of accuracy and objectivity set
by reputed encyclopaedias such as the Encyclopaedia Britannica.
Wikipedia entries are based on three core principles, which serve as
leading rules for its contributors and aim at holding up the encyclopaedia’s
quality standards.
The first core rule is ‘verifiability’ (i.e. readers
have to be able to retrieve Wikipedia content in reliable sources).
Therefore, referring to published articles and verifiable resources is
necessary to have the article (or edits) accepted
( wiki/Wikipedia:Verifiability). A second,
related core rule is called ‘no original research’. Wikipedia simply
does not accept ‘new’ (unpublished) research or thought (http:// Again,
reliability on Wikipedia means citing proven published sources. Third,
articles have to be written from a ‘neutral point of view’ (NPoV) to
avoid bias, meaning the articles have to be based on facts, and
point_of_view). All contributors, whether single anonymous users or
administrators, are required to comply with these rules.2 Compliance is regulated by the abovementioned core rules, and non-compliance
is punished by removal of edits. In past debates on Wikipedia’s
standards of accuracy and neutrality, the emphasis has been on whether
they can be kept up by crowds of human users. The more profound question
in line with our research thesis is, however, how these standards are
maintained and controlled through the organization and mechanics of
Wikipedia’s content management system.
Initially, the quality debate concentrated mainly on
accuracy or, more precisely, on the lack thereof due to the
impossibility of verifying and authenticating sources. With so many
anonymous and amateur contributors, the likelihood of vandalism,
inaccuracy and downright sloppiness in factual details was more than
real. As danah boyd (2005) observes, Wikipedia ‘lacks the necessary
research and precision’ and ‘students are often not media-savvy
enough to recognize when to trust Wikipedia and when this is a dreadful
idea’. Other researchers entered the quality of content debate by
testing Wikipedia’s robustness in terms of content vandalism. Alexander
Halavais (2004) intentionally contributed incorrect information to
existing articles. For his ‘Isuzu experiment’, he inserted 13 mistakes
into 13 different articles, expecting that most of the errors would
remain intact. Much to Halavais’s surprise, his wrongful edits did not
last long, but were all corrected within a couple of hours.3
However, the explanation for the speed at which his
vandalism was detected lies less in human acuity than in technological
perspicacity. The fact that Halavais had made all his changes from the
same username and IP address arguably made it all too easy for
Wikipedians and their tools to undo his edits. Making 13 changes in 13
different articles in a short timeframe obviously attracts attention
from automated bots, and even human Wikipedians, after spotting one
mistake, would have probably looked into his other edits in the other
articles and could have easily retrieved the other mistakes ‘by
association’. Philosopher of science P.D. Magnus therefore provided a
corrective to Halavais’s
research method in his 2008 study, in which he
inserted inaccuracies distributed across different IP addresses and
fields of expertise. He found that one-third of the errors
were corrected within 48 hours and that most of the others were
‘corrected by association’, as was the case with Halavais’s experiment,
whereby Wikipedians probably started checking his other edits after
initially finding three mistakes. Some researchers conclude from these
tests that the online encyclopaedia is robust to vandalism due to its
huge numbers of watchful community members (Poe, 2006). Instead, we
argue that it is rather the strict implementation of protocological
control and the use of automated bots that account for
Wikipedia’s vigilance. With their experiments,
Halavais and Magnus have proven not so much the reliability of Wikipedia's
articles as the reliability of the encyclopaedia’s technomanagerial system.
The notion of a community of vigilant users has
continued to feed the accuracy debate, particularly by academic
researchers who questioned the reliability of Wikipedia’s supposed
egalitarian approach. Collaboratively written amateur content, as these
critics contend, finds itself at odds with knowledge production. If
not written by known experts, how accurate and reliable are these
encyclopaedic entries? In December 2005, the first academic research
that systematically compared the accuracy of Wikipedia and Encyclopaedia
Britannica was published in Nature (Giles,
2005). Researchers compared the two encyclopaedias by checking 42
science articles in both publications. The reviewers were academics,
who checked the articles without knowing their source. They found
Wikipedia and Britannica to be almost equally accurate; not
surprisingly, the news was triumphantly announced as ‘Wikipedia Survives
Research Test’ (BBC, 2005). With this outcome, Wikipedia was recognized
as an encyclopaedia, at least on the level of its accuracy. But this
was not the symbolic end of the reliability discussion. On the contrary,
the debate heated up and more research followed. In 2006, information
systems researcher Thomas Chesney (2006) conducted more empirical
research into the credibility of Wikipedia, asking a total of 258
experts (academics) and non-experts to fill out a survey
about a Wikipedia article from their area of expertise (or, for the
laymen, in their realm of interest). The respondents found mistakes in
13 percent of the Wikipedia articles. But Chesney also found that the
experts gave the Wikipedia articles a higher credibility rating than
did the non-experts. Contrary to what Sanger described as
the ‘perceived inaccuracy of Wikipedia’, the respondents expected (and
found) Wikipedia to be a reliable source of information on the web.
In response to this accuracy debate, centring on the
assumed polarity between (known) experts and (unknown) laypersons, a few
academics proposed to redirect its focus from product to process and from the abilities of people to the qualities of its technological tools.
Historian Roy Rosenzweig (2006), who conducted a thorough analysis of
Wikipedia biographical entries and compared them to entries from
the American National Biography Online (written by known scholars),
concludes that the value of Wikipedia should not be sought in the
accuracy of its published content at one moment in time, but in the
dynamics of its continuous editing process – an intricate process
whereby amateurs and experts collaborate in an
extremely disciplined manner to improve entries each time they are
edited. Rosenzweig notices the benefits of many edits to the factuality
of an entry. As he points out, it is not so much crowds of anonymous
users that make Wikipedia a reliable resource, but a regulated system of
consensus editing that lays bare how history is
written: ‘Although Wikipedia as a product is
problematic as a sole source of information, the process of creating
Wikipedia fosters an appreciation of the very skills that historians try
to teach’ (2006: 138). One of the most important features, in this
respect, is the website’s built-in history page for each
article, which lets you check the edit history of an entry. According to
Rosenzweig, the history of an article as well as personal watch lists
and recent changes pages are important instruments that give users
additional clues to determine the quality of individual Wikipedia
Part of the discussion disputing the accuracy and
neutrality of Wikipedia’s content concentrated on the inherent
unreliability of anonymous sources. How can an
entry be neutral and objective if the encyclopaedia accepts copy edits
from anonymous contributors who might have a vested interest in its
outcome? Critics like Keen (2008) and Denning et al. (2005) have
objected to the distribution of editing rights to all users. What
remains unsaid in this debate is that the impact of anonymous
contributors is clearly restricted due to technological and
protocological control mechanisms. For one thing, every erroneous
anonymous edit is systematically overruled by anyone who has a (similar
or) higher level of permission (which is anyone except for blocked
users). Since anonymous users are very low in the Wikipedia pecking
order, their edit longevity is likely to be short when they break the
rules of objectivity and neutrality.
On top of that, there is an increasing availability
of ‘counter tools’ that allow for checking the identity of contributors
or at least their location of origin. On the history page of each
Wikipedia entry, we can find the time stamp and IP address for every
anonymous edit made. The WikiScanner, a tool created by California
Institute of Technology student Virgil Griffith in 2007, makes it
possible to geo-locate anonymous edits by looking up the IP addresses in an IP-to-Geo
database, listing the IP addresses and the companies and institutions
they belong to. It facilitates the tracking of anonymous users by
revealing who and where they actually are. The WikiScanner has proven to
be a powerful tool for journalists trying to localize and expose biased
content. In the WikiScanner FAQ on his website, Griffith states that he
created the WikiScanner (among other reasons) to ‘create a fireworks
display of public relations disasters in which everyone brings their own
fireworks, and enjoys’ (2008a). The WikiScanner was designed to reveal
bias, and Griffith collects the most spectacular results on his website.4
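The kind of lookup described above can be sketched in a few lines. The following Python fragment is a minimal illustration under stated assumptions, not a reconstruction of Griffith's tool: the `RANGES` table, the organization names and the helper functions `owner_of` and `attribute_edits` are hypothetical, and the address blocks are reserved documentation ranges rather than real institutional networks.

```python
import ipaddress
from typing import List, Optional, Tuple

# Hypothetical IP-range-to-organization table (an assumption for illustration).
# The ranges are reserved documentation networks; the owners are invented.
RANGES = {
    ipaddress.ip_network("192.0.2.0/24"): "Example Corp",
    ipaddress.ip_network("198.51.100.0/24"): "Example University",
}


def owner_of(ip: str) -> Optional[str]:
    """Return the organization whose range contains `ip`, or None if unknown."""
    address = ipaddress.ip_address(ip)
    for network, owner in RANGES.items():
        if address in network:
            return owner
    return None


def attribute_edits(edits: List[Tuple[str, str]]) -> List[Tuple[str, str, Optional[str]]]:
    """Pair each anonymous edit (article, ip) with the organization behind the IP."""
    return [(article, ip, owner_of(ip)) for article, ip in edits]


if __name__ == "__main__":
    history = [("Some article", "192.0.2.17"), ("Other article", "203.0.113.5")]
    for row in attribute_edits(history):
        print(row)
```

The sketch shows why anonymity on the history page is relative: once an IP address is logged with each edit, matching it against a table of registered ranges is a simple membership test.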
The debates concerning Wikipedia’s accuracy and
neutrality have been dominated by fallacious oppositions of human actors
(experts versus amateurs, registered versus anonymous users) and have
favoured a static evaluation of its content (correct or incorrect at
one particular moment in time). Both qualifications, however, are ill
suited when applied to a dynamic online encyclopaedia such as Wikipedia,
mostly because a debate grounded in such parameters fails to
acknowledge the crucial impact of a non-human actor: Wikipedia’s dynamic content management system and the protocols by which it is run. Arguably, Wikipedia is neither the often-advertised
platform for many minds, nor is it a space for anonymous knowledge
production. The WikiScanner has made the revealing of anonymous users
much easier by matching IP addresses with contact information.5 Bias
can now be identified, tracked and, if necessary, reverted. But there
is more to the technicity of Wikipedia content than fast users armed
with notification feeds and monitoring devices. The technicity of
Wikipedia content, as we will show in the next section, lies in the
totality of tools and software robots used for creating, editing and
entries, combating vandalism, banning users,
scraping and feeding content and cleaning articles. It is the complex
collaboration not of crowds, but of human and non-human agents combined that defines the quality standards of Wikipedia content.
The significant presence of bots appears counter to
the common assumption that Wikipedia is authored by human ‘crowds’. In
fact, human editors would never be able to keep up the online
encyclopaedia if they weren’t assisted by a large number of software
robots. Bots are pieces of software or scripts that are designed to
‘make automated edits without the necessity of human decision-making’
( Wikipedia:Bot_policy). They can be
recognized by a username that contains the word ‘bot’, such as SieBot or
TxiKiBoT. Bots are created by Wikipedians and, once approved, they
obtain their own user page and form their own user group with a certain
level of access and administrative rights, made visible by flags on a
user account page. One year after Wikipedia was founded, bots were
introduced as useful helpers for repetitive administrative tasks.
Since the first bot was created on Wikipedia, the number of bots has
grown exponentially. In 2002, there was only one active bot on
Wikipedia; in 2006, the number had grown to 151 and, in 2008, there were
457 active bots (
In general, there are two types of bots: editing (or ‘co-authoring’) bots and non-editing
(or administrative) bots. Each of the bots has a very specific
approach to Wikipedia content, related to its often narrow task.
Administrative bots are most well known and well liked among Wikipedia
users. They are deployed to perform policing tasks, such as blocking
spam and detecting vandalism. Vandalism combat bots come into action
when ‘vandalism-like’ edits are made. Vandalism is
recognizable, for it often means a large amount of deleted content in an
article or a ‘more than usual’ change in content. Spellchecking bots
check language and make corrections in Wikipedia articles. Ban
enforcement bots can block a user from Wikipedia and, thus, take away
his or her editing rights, which is something a registered user is not
able to do. Non-editing bots are also data miners, used to
extract information from Wikipedia, and copyright violation identifiers;
the latter compare text in new Wikipedia entries to what is
already available on the web about that specific topic and report this
to a page for human editors to review. Most bots, being created to
perform repetitive tasks, make many edits. In 2004, the first bots
reached the record number of 100,000 edits.
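The criterion mentioned above for ‘vandalism-like’ edits, a large amount of deleted content, can be sketched as a simple heuristic. The Python fragment below is an assumption-laden illustration only: the function names `deleted_fraction` and `looks_like_vandalism` and the 50% threshold are hypothetical choices of ours, and actual vandalism combat bots may rely on quite different signals.

```python
def deleted_fraction(old_text: str, new_text: str) -> float:
    """Share of the old revision's length that is missing from the new one."""
    if not old_text:
        return 0.0
    removed = max(len(old_text) - len(new_text), 0)
    return removed / len(old_text)


def looks_like_vandalism(old_text: str, new_text: str, threshold: float = 0.5) -> bool:
    """Flag an edit that deletes more than `threshold` of the article.

    The threshold is a hypothetical value chosen for illustration.
    """
    return deleted_fraction(old_text, new_text) > threshold


if __name__ == "__main__":
    article = "La Grange is a village in Cook County, Illinois. " * 20
    blanked = "lol"
    print(looks_like_vandalism(article, blanked))
    print(looks_like_vandalism(article, article + " It has a public library."))
```

A rule of this kind only flags candidates; in the division of labour the text describes, the decision to revert or block still follows the permission levels and protocols set by human administrators.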
The second category of editing or co-authoring
bots seems to be much less known by Wikipedia users and researchers
(for it would otherwise certainly have played a role in the debates
about reliability and accuracy). While not every bot is an author, all
bots can be classified as ‘content agents’, as they all actively engage
with Wikipedia content. The most active Wikipedians are in fact bots. A
closer look at various user groups reveals that bots create a large
number of revisions of high quality (http://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits#List). Adler
et al. (2008) discovered that the two highest contributors in their
‘edit longevity survival test’ were bots. As mentioned before, bots as a
user group have more rights than
registered users and also a very specific set of
permissions. For instance, bot edits are by default invisible in recent
changes logs and watch lists. Research cited above has already pointed
out that Wikipedians rely on these notification systems and feeds for
the upkeep of articles.
Describing Wikipedians in bipolar categories of humans and non-humans,
however, does not do justice to what is, in fact, a third category:
that of the many active users assisted by administrative and monitoring
tools, also referred to as software-assisted human editors. Bots are
Wikipedians’ co-authors of many entries. One of the first
editing bots to be deployed by Wikipedians was rambot, a piece of
software created by Derek Ramsey.
Rambot pulls content from public databases and feeds it into Wikipedia,
creating or editing articles on specific content, either one by one or
as a batch. Since its inception in 2002, rambot has created
approximately 30,000 articles on US cities and counties on Wikipedia
using data from the CIA World Factbook and the US census. Because
content produced by authoring bots relies heavily on its sources,
errors in the data set caused rambot to publish around 2000 corrupted
articles. In the course of time, bot-generated articles
on American cities and counties were corrected and complemented by human
editors, following a strict format protocol: history, geography,
demographics, and so on. The articles appear strikingly tidy and
informative and remarkably uniform. If we compare, for instance, an
article on La Grange, Illinois, as created by rambot in 2002, with a
more recent version of this article from 2009, the later version clearly
shows the outcomes of a collaborative editing process; the entry has been enriched
with facts, figures and images (see Figure 2). The basic format,
however, has remained the same. To date, it is still rambot’s main task
to create and edit articles about US counties and cities, while human
editors check and complement the facts provided by this software robot.
But how dependent is Wikipedia on the use of bots
for the creation and editing of content? What is the relative balance of
human versus non-human contributions in the online
encyclopaedia? Peculiarly, the answer to this simple question turns out
to be layered and nuanced. When we started to look for answers, we found
striking differences between various language Wikipedias.
As a global project, Wikipedia features over 10 million articles in
over 250 languages (2 million in English) serving a large number of
language communities.6 The fact that Wikipedia
distinguishes between local and global user groups already suggests
that bot activity might differ across local Wikipedias, which indeed
turns out to be the case.7 Specific language
Wikipedias not only greatly vary in size and number of articles, but
also in bot activity. The Wikipedia Bot Activity Matrix, Wikipedia’s own
meta-data record, offers an overview of total bot activity
as well as bot activity per language (
BotActivityMatrix.htm). The percentage of bot edits in all Wikipedias
combined is 21.5 percent. Excluding the English language Wikipedia,
total bot activity amounts to 39 percent, which means that bot activity
is unevenly deployed across languages and communities.
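The arithmetic behind this observation can be made explicit with a small sketch: when one very large edition with a low bot share is left out, the combined bot share of the remaining editions rises. The edit counts below are invented for illustration and are not taken from the Bot Activity Matrix; the function name is likewise hypothetical.

```python
def bot_share(edits):
    """Return the percentage of bot edits over all editions in `edits`.

    `edits` maps an edition code to a (human_edits, bot_edits) pair.
    """
    total = sum(human + bot for human, bot in edits.values())
    bots = sum(bot for _, bot in edits.values())
    return 100.0 * bots / total


# Invented counts: one large edition with few bot edits and two small
# editions that depend heavily on bots.
edits = {
    "large": (900, 100),
    "small_a": (60, 40),
    "small_b": (10, 90),
}

share_all = bot_share(edits)
share_without_large = bot_share(
    {code: counts for code, counts in edits.items() if code != "large"}
)
print(f"all editions: {share_all:.1f}% bot edits")
print(f"without the large edition: {share_without_large:.1f}% bot edits")
```

With these invented counts the combined share is about 19 percent, rising to 65 percent once the large edition is excluded, which mirrors the direction of the difference reported above.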
In order to account for the differences in bot
activity versus human activity, it is interesting to compare bot
activity in the most used language Wikipedias (English, Japanese,
German) to bot activity in endangered and revived language Wikipedias
(such as Cornish,

Figure 2. Bot-created article compared to a human-edited article. The upper screenshot is the La Grange, Illinois article as created by rambot on 11 December 2002. The lower
screenshot shows the same article on 12 January 2009. Available at: La_Grange,_Illinois
Oriya, Ladino). In the Dorling maps (see Figures 3 and 4), the thin-lined
outer circle depicts the language Wikipedia, sized according to the
total number of articles. The inner dot represents the share of bot
activity in that language Wikipedia. The English, Japanese and German
Wikipedias show that by far most of their editing is done by human
editors. The German Wikipedia, for instance, has only 9 percent bot
activity; the English version even less. Wikipedias of small and
endangered languages show a high dependency on bots and a relatively
small percentage of human edits. Oriya, for instance,

Figure 3. Visualization of Wikipedia bot activity in most-used languages worldwide, overview and detail. The outer circle is the Wikipedia size in that specific language; the inner dot depicts the
percentage of bot activity in that language Wikipedia. Source: Rogers
et al. (2008). Graphic by Auke Touwslager using the Dorling Map Tool,
Digital Methods Initiative, Amsterdam, 2009.
Available at:
depends 89 percent on automated software programmes;
one small Wikipedia, in the language Bishnupriya Manipuri, has seen 97
percent of its edits made by bots (http://stats.
Further analysis of bot activity versus human
activity reveals that the variety of bot dependency can be an indicator
of the state of the language Wikipedia, if not the state of that
language, in the global constellation. Looking at the types of bots, we
may notice that Wikipedias are maintained mainly by bots that network
the content, so-called interwiki and interlanguage bots.
Phrased differently, the bots active in these spaces take care of
linking articles to articles in other Wikipedias, to prevent them from
becoming ‘orphans’ or dead ends. Wikipedia policy states that articles
should be networked and be part of the

Figure 4. Bot activity in
endangered language Wikipedias. Analysis: Rogers et al. (2008). Graphic
by Auke Touwslager using the Dorling Map Tool, Digital Methods
Initiative, Amsterdam, 2009. Available at:
Wikipedia web. This core principle is summarized as:
‘Link articles sideways to neighbors, upwards to categories and
contexts, and downwards to sub-articles to create a useful web of
information’. Not only are
‘good’ Wikipedia articles full of links to reliable sources, but they
should also link to related Wikipedia articles and sub-articles and should themselves be linked to.
Articles that only refer to each other, but are not linked to or
linking to other articles, are also considered a threat to the principle
of building the web.
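The notions of ‘orphans’ and dead ends used above can be sketched on a toy link graph. The sketch assumes a hypothetical dictionary that maps each article title to the set of articles it links to; it is not the logic of any actual interwiki bot, and it does not detect the further case of articles that only refer to each other.

```python
def find_orphans_and_dead_ends(links):
    """Return (orphans, dead_ends) for a mapping of article -> linked articles.

    An orphan has no incoming links from other articles; a dead end has
    no outgoing links to other articles. Self-links are ignored.
    """
    # Collect every article that appears as a source or as a link target.
    articles = set(links)
    for targets in links.values():
        articles |= set(targets)

    # Articles that receive at least one link from a different article.
    linked_to = set()
    for source, targets in links.items():
        linked_to |= {target for target in targets if target != source}

    orphans = articles - linked_to
    dead_ends = {
        article for article in articles
        if not (set(links.get(article, ())) - {article})
    }
    return orphans, dead_ends


# Hypothetical toy graph: D is linked from nowhere, C links nowhere.
toy_links = {"A": {"B"}, "B": {"A", "C"}, "C": set(), "D": {"A"}}
orphans, dead_ends = find_orphans_and_dead_ends(toy_links)
```

In this toy graph, D is an orphan and C is a dead end; a linking bot would aim to leave both sets empty.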
We can analyze a language’s state of
interconnectedness using the Wikipedia statistics pages, featuring lists
of the most active bots per language Wikipedia. They reveal that most-used
language Wikipedias, which obviously contain much more content than the
smaller language Wikipedias, have bot activity distributed across
administrative tasks. In German, for instance, the top 45 listing of
most active bots features 27 interwiki bots and 18 bots that are meant
to edit content, add categories and fix broken links.8 In
the smaller language Wikipedias, bots significantly outnumber human
editors and are mostly dedicated to linking articles to related
articles in other Wikipedias. They make sure that the content, however
scarce, is networked. The Cornish Wikipedia’s top 45 of most active
bots, for instance, shows at least 35 interwiki bots, while the
remainder are bots with unspecified functions. These interwiki bots,
such as ‘Silvonenbot’, a bot that adds interlanguage links, make
connections between various language Wikipedias. Smaller language
Wikipedias thus make sure that every article is properly linked sideways
and prevent the language Wikipedia from becoming isolated.
Tracing the collaboration between human and non-human
agents in Wikipedias thus allows for an interesting and unexpected
insight into the culturally and linguistically diverse makeup of this
global project. Following the ‘wisdom of crowds’ paradigm, we might have
been tempted to look for cultural-linguistic diversity in
terms of many people across the world collaborating in different
languages and from a number of cultural backgrounds. In line with this
paradigm, British information scientists have demonstrated that
the internet (and Wikipedia, in particular) is
anything but a culturally neutral space; major aspects of collaborative
online work are influenced by pre-existing cultural differences
between human contributors (Pfeil et al., 2006). Adding a
‘natively digital’ analysis of the varied distribution of bot
dependency across the wide range of language Wikipedias, we show that
cultural differences in collaborative authoring of Wikipedia content
cannot just be accounted for in terms of its human users; they reveal
themselves perhaps more candidly in the relative shares of human and
non-human contributions, as shown by automated patterns of contributions.
High levels of bot activity, mainly dedicated to networking content and
to building the web, are an indicator of small or endangered languages. A
richer variety of bot activity, largely subservient to human edit
activity, could be considered an indicator of a large and lively
language space.
Wikipedia has most commonly been evaluated – either
praised or criticized – for its collaborative knowledge production by
many (anonymous) minds. From the start, researchers have pointed out
that Wikipedia thrived by virtue of a small core of dedicated
contributors rather than a large crowd of collaborators, even if the
encyclopaedia became more hospitable to common users after 2006. In
addition, Wikipedia has never been this mythical egalitarian space, for
the various user groups have very distinct levels of permissions. Past
Wikipedia research has focused mainly on the crowdsourcing of knowledge
as well as on the reliability of Wikipedia content. As we have
shown, some researchers, such as Halavais and Magnus, tested Wikipedia
by entering false information. These types of research have isolated
Wikipedia content as a static product mainly by assessing it against
other encyclopaedic records.9 Other research, mostly (investigative) journalism, was intent on ‘outing’ anonymous human editors, making use of ‘counter-technology’
like the WikiScanner to reveal the identity of contributors or the
origin of an edit. More recently, we have seen different research
approaches to Wikipedia, such as that of Rosenzweig who acknowledges the
encyclopaedia’s dynamic content as well as the significance of its
partially automated content-management system. For the most part, though, scholarly evaluations of Wikipedia have adhered to the human content-agent paradigm.
In this article, we have argued that Wikipedia’s
nature and quality should be evaluated in terms of collaborative
qualities, not only of its human users, but specifically of its human and non-human actors. Since 2002, Wikipedia content has been maintained by both tool-assisted
human editors and bots, and collaboration has been modulated by
protocols and strict managerial hierarchies. Bots are systematically
deployed to detect and revert vandalism, monitor certain articles and,
if necessary, ban users, but they also play a substantial role in the
creation and maintenance of entries. As we have shown, bot activity may
be analyzed as an indicator of the international or intercultural
dimension of Wikipedia as a global project.
In the fall of 2009, Wikipedia introduced
WikiTrust, a MediaWiki extension developed by the WikiLab at the
University of California in Santa Cruz. With WikiTrust, newly edited
parts of Wikipedia articles are colour coded according to reliability,
based on the author’s reputation, which is established by the lifespan
of their other contributions.
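The principle described here, in which an author’s reputation follows from how long their earlier contributions survived, can be illustrated with a toy sketch. It is not WikiTrust’s actual algorithm: the function names, the cap of ten revisions, the thresholds and the colour labels are all assumptions made for illustration.

```python
def author_reputation(lifespans, cap=10):
    """Return a reputation score between 0.0 and 1.0.

    `lifespans` lists, for each past contribution, how many later
    revisions it survived. Each value is capped at `cap` so that one
    very old contribution does not dominate the average.
    """
    if not lifespans:
        return 0.0  # unknown authors start without reputation
    capped = [min(lifespan, cap) for lifespan in lifespans]
    return sum(capped) / (cap * len(capped))


def colour_for(reputation):
    """Map a reputation score to an illustrative background colour."""
    if reputation >= 0.7:
        return "white"         # text by a well-reputed author
    if reputation >= 0.3:
        return "light orange"  # intermediate reliability
    return "orange"            # text by an author with little reputation


established = author_reputation([10, 12, 8])
newcomer = author_reputation([1, 0, 2])
```

Under these assumptions, new text by the established author would be left uncoloured, while text by the newcomer would be highlighted for closer reading.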
Instead of turning to the expert to check all
articles, Wikipedia builds further on the combination of rules,
hierarchies and editors. To understand Wikipedia’s collaborative
process, we need to unravel not simply Wikipedia’s human agents, but the
specificities of its technicity.
We propose to extend this kind of analysis from Wikipedia to various kinds of Web 2.0 infrastructures. Non-human
actors and coded protocols are often overlooked in the many optimistic
Web 2.0 theories that triumphantly claim the virtues of mass
collaboration on Web 2.0 platforms (Tapscott and Williams, 2006). It is important
to question the assumptions of the internet as a merely social
laboratory of human interaction, instead analyzing and interrogating the
sociotechnical system that lies at the core of Web 2.0 platforms. Human
and machine contributions are complementary parts of a society of
control in which social interactions are increasingly facilitated by
means of coded, automated processes. Human judgements such as
reliability, accuracy or factuality are turned into machine-coded
and regulated alert systems, as illustrated by Wikipedia’s mechanisms
for generating and checking content. Nicholas Carr compares Web 2.0 to a
Mechanical Turk, which ‘turns people’s actions and judgments into
functions in a software program’ (2008: 218). A thorough and critical
understanding of the automated processes that structure human
judgements and decisions requires analytical skills and medium-specific
methods which are crucial to a full understanding of how the internet
works. Instead of succumbing to the mechanisms of control, users should
learn to critically analyze their interactions with technology and
actively engage in technology’s development (Zittrain, 2008: 245).
In line with David Beer’s call in this journal for a
more thorough understanding of the ‘technological unconsciousness’ of
participatory web cultures, we have tried to explain the ‘performative
infrastructure’ of Wikipedia by focusing on the second level of analysis
that Beer proposes; namely, unravelling software infrastructures and
their applications (Beer, 2009: 998). In this article, we have
deployed several natively digital methods to unravel in detail the close
interdependency of human and technological agents. It is important to
comprehend the powerful information technologies that shape our everyday
life and the coded mechanisms behind our informational practices and
cultural experiences. The analysis of the Wikipedia platform as a
sociotechnical system is a first step in that direction.
1 ‘Digital methods’ is a term for medium-specific
methods for web research coined by Richard Rogers (2009). The research
for this article was conducted with the Digital Methods Initiative and
the Foundation (2008).
2 On top of the set of rules, there is a fourth important general principle
emphasizing the ideals of openness and collaboration that lie at the
core of the project: ignore all rules. This general principle was
written up by Larry Sanger to make clear that, above all, Wikipedia is
an open platform. Wikipedians should first and foremost strive to
improve and maintain Wikipedia (see:
approach was heavily criticized, mainly because he deliberately
littered his object of study. After the event, Halavais regretted his
approach, especially because the media attention on
his experiment encouraged others to test Wikipedia by inserting mistakes. His website (http:// now includes a call for testing Wikipedia in a ‘non-destructive way’.
4 In the summer of 2008, Virgil Griffith launched the WikiWatcher suite, a set
of tools designed for monitoring and maintaining Wikipedia. The suite
includes a tool that makes it possible to de-anonymize users with a username whose IP addresses match those of other user(name)s or companies/institutions in an IP-to-Geo database. This stretches the notion of anonymity from the unregistered to the registered with a username (see Griffith, 2008b).
2 works the other way around: enter a company name or URL and the tool
shows you which Wikipedia articles were edited from that organization’s
IP address.
6 An overview of the available local Wikipedias is given on the Wikipedia
portal page (www. Wikipedia currently has 264 language
versions, some of which only have a main page and no articles as of yet.
The largest Wikipedia is in English, with more than 2 million articles;
followed by the German, French, Polish and Japanese editions, each of
which contains more than half a million articles. Seventeen other language
editions contain 100,000+ articles, and more than 100 other languages
contain 1000+ articles; the overview also includes the smaller ones with
only 100+ articles and even Wikipedias that have only a main page
7 In a case study on Wikipedia as ‘networked content’ that Sabine Niederer
conducted with Richard Rogers et al. (2008), during the 10-Year
Jubilee Workshop of the Foundation, the researchers noticed
the great discrepancy between bot activity in English and the other
language versions of Wikipedia.
8 The top 45 most active bots on the German Wikipedia consist of 27 interwiki
bots and 18 bots with various tasks such as editing, fixing links and
adding categories (http://stats.wikimedia.
9 To this date, communication scholars like Halavais and Lackaff (2008)
examine Wikipedia’s reliability and completeness, assessing the
qualities of its users rather than those of its systems.
