First published 10 August 2005, by Jeroen Coumans, as my final thesis for Arts and Culture.
Chapter 1. Introduction
The Internet has a large number of applications, but one of the oldest and most basic applications is its use as a communication medium. The Internet was used for e-mailing and sharing publications in the 70’s and 80’s, long before the World Wide Web was invented by Tim Berners-Lee in 1991. The unique nature of e-mail, where delivery and pickup of new mail is instantaneous, has also radically changed the nature of the messages themselves. The cost of traditional letters is quite high in comparison to e-mail, in temporal, spatial and economic terms. Thus, the effort to send a traditional letter is also higher, which affects its nature and quality. For example, a hand-written letter can’t be easily copied or sent to multiple recipients at once. E-mail on the other hand makes it trivial to send and copy messages. As a consequence, e-mail has brought us many new categories of messages and types of mail: ’chainmail’, ’flamewars’, spam, automated ’out-of-office’ replies, viruses, ’attachments’, and so on. And even though a traditional keyboard character set has less expressive capability than a traditional letter, e.g. in terms of characters, typography, and free-hand drawing, people found ways to make it more expressive through the use of abbreviations, signs and symbols (Sproull, 1989). For example, to communicate emotions, people often used symbols such as :) (a smiley) to convey happiness and :( to convey sadness. The restrictions inherent in e-mail were thus a source of inspiration for seeking richer communication, and this form of communication changed the communication itself and the actors involved in it. In other words, while e-mail and traditional letter exchange can be seen as the same from a purely functional perspective, they are radically different from a technical and social perspective.
The Web has not only been used as a publication platform, but also increasingly as an application platform, with famous examples such as Google, Amazon and eBay, a trend which industry analysts have even called a “paradigm shift” (O’Reilly, 2004). But just as communication itself has been changed by using the Web as a platform, applications have also changed by using the Web as a platform, not only on a technological level (programmatic interfaces, functionality, speed, efficiency, etc.), but also on a social level. This is especially true for collaboration software, since these applications are specifically designed to support and enhance human interaction and collaboration. In my research, I have explored the use of online (Internet-enabled) collaboration software, with a focus on how its use has changed the projects and the people themselves. I have also explored how the available collaboration applications are fundamentally different from traditional (non-Internet-enabled) means of computer-mediated collaboration, and how this leads to interesting research problems.
The main research problem which I have examined is how online collaboration tools change their actors, and conversely, how online collaboration tools are changed by the actors involved. Contrary to common perception, technology is not considered an unproblematic deterministic force which shapes its environment. Rather, technology is seen as an aspect of a socio-technical ensemble, and analysed as such with the ’SCOT’ model (Social Construction of Technology). To examine my research problem, I will deconstruct two case studies by applying the SCOT model, although its use will be critically evaluated for its applicability in a strictly online context. The case studies are complementary and focus on different issues and shaping forces. The first case study, Linux From Scratch, about a project where participants collaboratively write a book, is mostly driven by the political force of internal groups and individuals. The second case study, Wikipedia, about a project where participants collaboratively contribute to the development of an encyclopaedia, is much more influenced by outside opinions and interpretations of the technology. Using two case studies allows me to touch upon aspects which are not present in each individual case, and thus helps establish a more generalised thesis about the use of online collaboration tools.
Computer-mediated communication (CMC) is about the ways in which humans and computers interact, where computers are specifically used to assist and support human actors in the process of working together. CMC and Computer-Supported Co-operative Work (CSCW) have developed extensive models of technology and society since social scientists started playing a more active role, some of which are of use in examining the research problem. Science and technology studies (STS) have critically examined the mutual relationship of society and technology, and how they affect each other. To put my research into context, it’s necessary to explore the history of these fields.
This leads me to the following specific questions:
- What are the challenges of doing online (virtual) research, and what theories exist to support this?
- How can existing theories and models in social sciences and social informatics be used for online case studies?
- What is meant by (online) computer-mediated collaboration? What does it mean for doing research?
- In the case studies, what criteria did the various actors use to choose a specific online collaboration model? Did these criteria evolve during the history of the project, and if so, how?
- What socially relevant groups can be identified in each case study?
- Can a dominant interpretation of the project be identified, and is there closure of meaning? Did this affect the use of the collaboration tools?
Chapter 2. Theoretical background
First of all, I’ll discuss the methodology which I’ve used to examine the case studies. Then, I’ll frame this in a theoretical perspective for analysing them. Next, I’ll give a short history of the relevant disciplines in which this research is embedded. Finally, recent insights from these various disciplines will be presented.
Methodology and case studies
I will discuss two unique case studies to determine the transformations in technology and actors through the use of collaboration applications on the Internet. Both case studies are projects which successfully use a diverse range of online collaboration tools, and are thus well suited to show the developments and choices which have been made and what effect they had on the projects. Furthermore, they both have an active user community and a relatively long history. The first case study, Linux From Scratch, makes extensive use of e-mail and mailing lists for support, communication, co-ordination and collaboration. Other technologies, such as IRC, the website and a Wiki, are also used, but to a lesser extent and mostly to fill the gaps which can’t be filled with e-mail. The second case study, Wikipedia, is the other way around. It makes extensive use of various Wikis and websites for discussion and collaboration, and only uses other technologies, such as mailing lists, in smaller sub-projects where they are explicitly required.
For my research on the Linux From Scratch case study I’ve drawn on my own experiences from when I was an active participant in the community. My first encounters with the project date back to 2001, and I was an official member from 2003 to the end of 2004. As such, I have first-hand experience with the community and can provide information from an inside perspective. This was also of use when researching the mailing list archives. While these are available online, it takes considerable time to catch up with the rich and complicated history of the lists themselves and the various cross-postings with other lists. As a former member, I was able to quickly pick out relevant discussion ’threads’, which significantly sped up the research.
Of course, the fact that I was an active member of the project can also be seen as a handicap: I’m more biased or ’subjective’ than outsiders would be. Such objections mostly come from the ’harder’ sciences, where a quantitative methodology is the foundation for establishing authoritative claims and fact-making, based on a strict separation between the researcher and his research object. To research the LFS community, I’ve instead used an ethnographic approach, which requires not only a researcher who is immersed in the field (where immersion is based on connection rather than location), but also an active engagement from the researcher (Hine, 2000).
In order to say something meaningful about a community, one has to become part of it and experience it. The level of involvement in ethnography should be balanced by a reflexive use of the methodology, a participatory ethnography, which means that a researcher should be neither so distant as to stay superficial, nor so involved as to go native. My previous involvement in LFS ensures that I’m not a distant observer. And since I’m no longer an active member of the community, I can better reflect on my experiences, which saves me from the risk of ’going native’.
For my research on the LFS community, I’ve only used sources available on the web. This keeps me in a position symmetrical to the participants of the LFS community with respect to the available information. I’ve also included a lot of hypertext links to these sources, which makes it easy to verify my statements and makes the text more authoritative. That also means that this paper is best experienced in hypertextual format. Most information comes from the archive of the ’lfs-dev’ mailing list from 2003 and 2004, but I’ve also used information from the website, the FAQ and the Wiki.
To analyse the mailing lists, I’ve used discourse analysis techniques (Tonkiss, 2001). Discourse analysis is described as “part of a larger body of social and cultural research that is concerned with the production of meaning through text, together with semiotics, ethnomethodology and conversation analysis” (Tonkiss, 2001). This means that the messages on the mailing lists are seen as attempts to construct a social reality instead of a reflection of society. This is especially useful and obvious in the online context of the project, where one’s identity is determined by his or her expressions. Discourse analysis offers means to deconstruct language itself and see the formative properties and influence it has on the reality which is constructed through language. I have used this analysis to examine various incidents which were formative of the project’s direction and played a crucial role in determining collaboration tools. This is a constitutive element in the formation of socially relevant groups in the project.
In the second case study, Wikipedia, I’ve taken an outsider’s position and applied the discourse analysis techniques to various debates on the Internet about Wikipedia. Although I have been actively involved in Wikipedia, my engagement is not extensive enough to warrant an ethnographic study. Therefore, while I’ve attempted to describe the Wikipedia community, its roots and how internal groups shape the project, the main focus is on how debates by people outside the direct Wikipedia community have constructed an interpretation of the project, and how the project’s technology and community were transformed because of these debates.
In order to identify social groups in the debate about the trustworthiness of Wikipedia, I’ve searched for and collected various online materials. I’ve used various search engines, such as Google, and followed hypertext links between sources. I’ve found a wide variety of comments on Wikipedia, ranging from journalistic articles to personal ’weblogs’ or ’blogs’ (and the dialogues held there), to technological analyses and academic papers. The goal was not to be conclusive or complete in the collection of opinions or to get a representative sample, which would be an impossible task, but rather to discern patterns and common jargon between groups, in order to identify interpretations of the Wikipedia project and its technology.
As theoretical framework, I’ve mostly worked from ’social shaping’ theory instead of the often-simplified technological-determinism point of view. In tackling the research questions, I’m going to deconstruct two case studies of projects which are good examples of successful online collaboration. A ’soft’ technological-deterministic (Smith and Marx, 1994) approach would focus on technological capabilities, design decisions and inherent ’politics’, i.e. the opening or closing of certain social options (Winner, 1999), to determine the impact of technology on virtual groups. The question for Wikipedia, then, could be answered by analysing the various barriers and openings it creates, for example access to the Internet, intelligence, trust, language, etc. As such, it could be argued that Wikipedia (or rather, the engine that drives Wikipedia and the project itself) is an inherently political technical system. Such an analysis would say a lot about the effects that the technology has on its participants and how they adapt to it. However, it would say very little about how the participants actually shape the technology itself, e.g. by modifying the engine’s source code as a result of discussions, or by tweaks to enhance usability with the intended effect of attracting more participants. In other words, a deterministic perspective regards participants as passive recipients of the technology, while we will see that there is a rich variety of participants who, in various degrees, actively shape the project and its corresponding technologies.
Rather than seeing the technology or technological artifact as having a fixed meaning, social shaping theory, and especially the ’Social Construction of Technology’ or ’SCOT’ model, stresses that socially relevant groups determine the meaning, acceptance and success of a technology. Different groups have different interpretations of an artifact, and can thus shape its meaning and construction. Whether an artifact works or not is not an intrinsic property of the artifact, but is determined by the user. This so-called ’interpretative flexibility’ is crucial in understanding the meaning of an artifact. In later versions of the SCOT model, interpretative flexibility is a continuous process, although there are phases of ’closure’ or stabilisation when a dominant interpretation arises. Society and technology together are seen as the technological ensemble. The SCOT model does not give more importance to producers or consumers of a technology, but sees them as mutually constitutive.
The definition of socially relevant groups in SCOT makes it hard to identify groups who are not explicit in their interpretation of a technology, but who can nevertheless contribute to its interpretation or development. In other words, the SCOT model makes it hard to be complete, or even to know when you are complete. This is especially true for the case studies which I’ve chosen for this paper, since they are both set in a strictly online setting on the Internet. Therefore, this research does not pretend to give a complete survey of interpretations; rather, it will look at patterns which are evident in various sources, and see how they contribute to the construction of technology.
The Internet comes with its own challenges in how to find and define socially relevant groups. One of the methods I’ve used is to consult search engines to find people who comment on a certain technology. The danger in that approach, however, is that the specific ways in which search engines rank pages determine in large part which socially relevant groups will be uncovered. Therefore, I’ve used multiple search engines and cross-site links to counteract search engine bias. At the same time, the Internet also allows new ways to research data. Search engines function as useful filters for irrelevant comments, and thus as indicators for determining whether a particular interpretation has had impact or not. Again, the consequence for the SCOT model is that the results of this research are not about the completeness of the socially relevant groups which have been identified, but rather about the interpretations which have been the most relevant. This is what Hine (2000) calls an ’adaptive ethnography’, by which she means that a virtual ethnography can never be complete, since its boundaries are defined by the connections which are made between the research data, and it is thus up to the ethnographer to define the research object.
A particular challenge of the Internet is the lack of persistent information resources. This makes it harder to collect and validate research data. For example, to examine the changes which the projects went through, it’s necessary to examine digital archives which are available on the Internet. Most websites have a limited life span, so a lot of information is lost or relocated. While there are attempts to provide permanent archives of snapshots of websites, e.g. the Internet Archive, they aren’t complete. Most of my data come from the websites on which they originally appeared, and the references were still valid as of June 2005.
The Internet also brings interesting challenges for identity and group affiliation. Each Internet user is effectively anonymous to other users and constructs his or her identity through Internet publications. This raises methodological issues for validating subject resources such as mailing lists. Hine (2000) and Miller and Slater (2001) have both published excellent books on the subject of virtual research. Hine side-steps the issue of authenticity by stating that it’s not authenticity that should be researched; rather, “the ethnographer aims to assess how the culture is organised and experienced on its own terms.” (Hine, 2000, p. 119). The authentication of identity should only be of concern for the researcher if it’s of concern for the informants. In other words, identity is a research object, not a methodological issue. Miller and Slater state that the question of authenticity is only relevant as long as the distinction between online and off-line identity is made, which, in the context of identity play, is often not experienced as such.
Both the LFS and the Wikipedia communities have wrestled with the question of identity and authenticity. In the LFS project, identity is mostly seen in relation to the identification of persons and groups. Most people who post, especially veteran members, post under their real names, although there are notable exceptions of people who only post under an alias. The use of an alias is also more prevalent on IRC, where a short ’nick’ (nickname) makes it easier to differentiate between individuals, especially when people share their first or last names. In general, and specifically in the LFS community, the use of a real name or an alias is not an interesting issue, since it has no implications for the interpretation of the messages which are written by these people. More important is that whatever identifier an individual uses to distinguish himself or herself, it is used consistently.
The issue of identity and group formation is more complex in the case of Wikipedia, especially in the debate about the trustworthiness of Wikipedia’s sources, which will be analysed in chapter 4. Staying anonymous or using real names is constructed as an issue when it comes to assessing the quality of articles on Wikipedia, since it is directly linked to claims of authority. Not only is the Wikipedia community much larger than the LFS community, it is distributed among different localities with their own languages, and there are more levels of being ’in’ or ’out’ of the community. Rather than a single, coherent group, it should be seen as a network of various groups, connected by their ideas of what Wikipedia is or should be. Lastly, the Wikipedia project is more reflexive about its own functioning, and is more regulated than the LFS community.
Although the distinction between ’virtual’ and ’real world’, or ’online’ and ’off-line’, is often made in popular terminology, recent ethnographic studies by Hine and by Miller and Slater indicate that it is not a difference in the experience of people. There is no difference in the way people experience identity, community or culture when online or off-line; rather, the ’virtual’ is very much a part of life, instead of apart from life. The case studies and the communities behind them don’t seem to support this, though. Their strictly online nature is a necessity to successfully collaborate and reach the project goals; there is no off-line LFS or Wikipedia community. This is not to say that these communities have no relation with the off-line world, rather that they can only be experienced on the Internet.
Finally, concepts such as community and membership are not self-evident. Participants in both the LFS and the Wikipedia project often refer to themselves as ’members’ and to the group as a ’community’. These concepts have a long history and tradition in the social sciences, and typically differ in meaning from their use on the Internet and in computer-mediated communication. Traditionally, they referred to geographical connection and moral commitment (Baym, 1998). On the Internet, community and membership refer more to shared interests and subscription. As such, a community on the Internet should be seen as emergent from several sources, such as external contexts, temporal structure, system infrastructure, group purposes and participant characteristics, rather than predictable from environmental factors.
“Social Informatics”, Science, Technology & Society
My research draws on insights from the fields of social informatics and science and technology studies. Social informatics is the bridge between the social sciences, such as anthropology, ethnography and ethnomethodology, and computer science, in the fields of human-computer interaction (HCI), computer-supported co-operative work (CSCW), computer-mediated communication (CMC), artificial intelligence, etc. A more formal definition is “the interdisciplinary study of the design, uses and consequences of information technologies that takes into account their interaction with institutional and cultural contexts.” (Kling, 1999). As such, social informatics studies the producing side of technology, especially with respect to the design, implementation and effects of information and communication technology. Social informatics in the above definition is a young discipline, and the distance between the social sciences and computer science was until recently still described as “the great divide” (Bowker et al., 1997). Still, the research and co-education of both disciplines have steadily moved towards each other, which led to the emergence of a new discipline now called ’social informatics’.
Initially, in the 60’s and 70’s, social scientists were invited by computer scientists to provide metaphors for solving research problems in the domain of artificial intelligence. For example, the metaphor which resulted from the question of how human intelligence can best be modelled in artificial intelligence was that of a scientific community. Gradually, the relationship between social and computer scientists changed to one of partnership and discovery. Computer scientists began to study ethnography in order to understand the interaction between humans and computers, which eventually led to the field of computer-supported co-operative work in the 80’s and 90’s. Since the 70’s, a turn from modelling and extrapolating laboratory circumstances to research in complex real-world dynamics led to the development of “new approaches to system dynamics based on the hypothesis of unequal individual access to collective information resources” (Bowker et al., 1997, p. 16). This led to the methodological rule that technology models social structures, but is shaped by its use in the workplace (ibid.). Current themes in social informatics are all centred around the question of how technology models society and vice versa.
A central idea in social informatics is that the success and use of any information technology depends on the social context. This means that technology and social behaviour should not be seen as separate from each other; they are mutually constitutive. While research in the 60’s and 70’s focused on the deterministic aspects of technology, later research, influenced by the input of social scientists, is more attuned to the idea of the ’sociotechnical ensemble’. These ideas are partly inspired by science and technology studies, which has studied the relationship between technology and society on a more abstract level.
Science and Technology Studies (STS) experienced a similar ’turn to the social’ in the 80’s. Before the 60’s, science and technology were seen as autonomous entities separated from their social context. In the mid-1960s, a break occurred when technology was no longer seen as a neutral, autonomous tool but became contextualised in society and human values. A second break occurred in the mid-1970s, when scientific knowledge itself was examined by sociological researchers. This led to a similar contextualisation of science in the natural world, which “places constraints upon the construction of scientific knowledge claims and technological artifacts” (Bowden, 1995, p. 71). A third break came in the late 80’s, when science studies turned to technology and extended these insights to it. This led to social shaping theories, which state that technology is shaped by social factors. However, the ’pendulum’ (Bijker, 1995) swung too far, and technology came to be seen as merely a social construct. The concept of ’sociotechnical ensembles’ emerged to indicate the co-construction of science and technology.
Current collaboration possibilities
To put these technologies in their social context, it is helpful to categorise them according to their functionality. Andriessen (2003) states that collaboration technology serves five functions or categories:
- Communication: these tools overcome the distance between geographically separated people. Both synchronous and asynchronous tools serve this function. E-mail, fax, telephone, video and chat all belong to this category. It is the most fundamental aspect of collaboration, and can even be seen as a meta-function, since the other categories can be viewed as aspects of communication.
- Information sharing: storage and retrieval of information, via shared databases or file servers. Message boards and presentation systems also belong to this category.
- Co-operation: document sharing and co-authoring belong to this category.
- Co-ordination: these tools synchronize work processes, e.g. through shared calendars or workflow management systems.
- Social encounter: this rather vague category includes virtual reality and media spaces. This category is only useful when collaboration technology is viewed as a supplement to traditional groups. Although it allows for nice comparisons with regular social encounters, it is the least relevant with respect to strictly virtual collaboration with no previous relation or basis in a non-virtual context.
While this classification is not unambiguous, we will see in the examination of the case studies how the different categories of functionality affect the two projects.
Chapter 3. Linux From Scratch
What is Linux From Scratch?
Linux and Open Source
Linux is, roughly speaking, an Operating System (OS) for computers. An OS is the most fundamental piece of software required to operate a computer; it takes care of driving the hardware and running the software. Linux has been developed by Linus Torvalds since 1991, when he began the design and implementation of this ’kernel’ for IBM-compatible 386 PCs. He released its code under the GPL on the Internet for other developers and hobbyists to look at and improve upon. Some of these so-called ’hackers’ (1) became important actors, and the community which formed around the project grew considerably.
Linux wasn’t developed in a vacuum. A big factor in its success is the license under which it is distributed: the GPL, or “GNU General Public License”, written by Richard Stallman. The GPL is a software distribution license which grants people the right to inspect, modify and re-release software, provided that these modifications are distributed under the same license. As such, it is sometimes described as a “copyleft” license, since it has the opposite purpose of copyright. This creates an environment very much like the scientific community (Bezroukov, 1999), in which peer review and the sharing of ideas lead to better design, better integration and better software.
The idea of freedom (“free as in free speech, not as in free beer”, Stallman, 1991) is central to the Open Source software movement and was instrumental in the subsequent success of Linux. By combining the Linux kernel with other Open Source software, it was possible to construct a self-supporting OS. To ease this process, there were efforts to collect this software, which was only available in source code, and distribute it in a machine-readable way, as ’binaries’ or ’executables’. This led to the concept of a “Linux distribution”: a system which comes pre-compiled with binaries, ready for installation and use. A core aspect is that throughout the history of Linux, a wide variety of distributions have existed, each filling a particular need or niche. For example, the Ubuntu distribution is targeted towards novice end-users, who just want to “use” their system, while the Slackware distribution is more for experienced users who want to tweak and customize every aspect of their system.
History of Linux From Scratch
Similar to how Linux started as a hobby project, the Linux From Scratch (LFS) project initially started as an article for a Linux magazine. The central idea in this article was to describe the process of building a Linux distribution using the freely available software source packages. The motivation for doing so was that distributions apply customizations to the core components of a system which are not always in the interest of the users, and that a distribution can never be as fine-tuned or customized as a self-built system. The LFS article guides the reader in downloading, ’compiling’ and configuring all the required source packages to construct his or her own Linux system. Important motives to build a “Linux From Scratch” system are the perceived ’cleanliness’ of the resulting system, free from unwanted customizations, the pervasive power and control which one has over every aspect of it, and the optimization of the system.
The article in the Linux magazine quickly took on a life of its own and grew to become “The Linux From Scratch How-To”. The “Linux Documentation Project” played a critical role at that time in gathering and distributing up-to-date and technically correct Linux documentation. Because of this community inclusion, the Linux From Scratch How-To attained a number of properties which were mandated by the Linux Documentation Project. This had consequences for the article’s distribution license (which was meant to give the reader a similar freedom in use), the technical format in which the book was initially written, and the use of version numbers for the documents. These properties were instrumental in achieving reader adoption and, later on, co-authorship.
Initial development went very fast; going from version 1.0 (the first version of the How-To) to 2.3.1 (the first version as a TLDP guide or stand-alone book) took little over a year. After that, development slowed down a bit, typically taking at least a year between ’major’ versions: 3.0 came in September 2001, 4.0 in October 2002 and 5.0 in November 2003. Versions 4.1 and 6.0 were even printed as actual books (2), demonstrating the success of the project. The most recent ’stable’ version, 6.0, dates from October 2004, although currently (June 2005) work is well underway towards version 7.0. A new version indicates that the book has gone through major revisions, such as a new build methodology or the use of all-new source package versions. The book is continually under development, using version numbers to indicate milestones, just like software projects, down to the use of collaborative technologies. Development is driven by the project’s set-up and goals, which (as we will see) are largely dependent on the interpretation of the relevant groups.
Even the adoption of externally developed software, such as a new version of the C library (glibc), which usually has a profound impact on the structure and methodology of the book (as with LFS-5.0), can be attributed to the community’s need to always use the latest software releases possible. As such, LFS is firmly embedded in the Open Source ’culture’, where the latest version represents speed, features and stability, and is thus always the goal.
The How-To quickly grew into a guide, which grew into a book. Perhaps the most essential factor in this growth was the successful use of a constant, archived communication channel (mailing lists) between the author and his users. This enabled the author to consistently refine the documentation while enabling more users to follow the book. The mailing list functioned as a support forum as well as a development forum. When the volume on the list grew too large, it was split into a development list and a support list (lfs-discuss and lfs-support). Over the course of the years, as communication diffused over more topics, additional lists were created to support and manage them.
Mission and philosophy
The current LFS project can best be defined as a collaborative effort, using strictly public communication via the Internet, to write a book about building a Linux system from the original source code of the packages. The original How-To was expanded into a book and now has various offspring books, such as “Beyond Linux From Scratch” and “Hardened Linux From Scratch”, and offspring projects, such as the “Hints”, the “Patches”, the “FAQ” and “Automated Linux From Scratch”. This specialization has allowed the project to limit its scope and define its goals more clearly. A large community was born, with facilities for communication, cooperation, coordination and collaboration.
We can distinguish a number of values and norms in the community, which explain the group processes. These are: control, technical correctness and purity, timeliness and education.
An important driver is control. Control is the most fundamental principle in the book and is used as one of the two primary motivations to read it, the other being education about the workings of a Linux system. Most Linux users who turn to LFS do so because they are not satisfied with the opaqueness and complexity of a Windows system, nor even with a regular Linux system. Like building a house, they only trust themselves to do it properly (3).
Technical excellence and purity naturally flow from the control which people wish to exercise over their systems. Purity is a term which comes from the pure LFS hint, a document written by two LFS community members. It builds on the thought that controlling the environment in which LFS is built increases its stability. In order to control the environment, a build method was devised with redundancy and regression tests. When the LFS system is built, this environment is used to bootstrap another LFS system, and binary diffs are used to verify that they are exactly the same. A popular slang term used to signify this obsessive technical correctness is ’anal retentiveness’, also first coined in the pure LFS hint.
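As an illustration of how such a verification could work, the following Python sketch compares two build trees by file digest. The mount-point paths are hypothetical, and the actual pure LFS hint worked with binary diffs of the compiled programs rather than a script like this:

    import hashlib
    from pathlib import Path

    def tree_digests(root: Path) -> dict:
        """Map each file's path (relative to root) to its SHA-256 digest."""
        digests = {}
        for path in sorted(root.rglob("*")):
            if path.is_file():
                digests[str(path.relative_to(root))] = hashlib.sha256(
                    path.read_bytes()).hexdigest()
        return digests

    # Hypothetical mount points for the first build and the system
    # it bootstrapped.
    first = tree_digests(Path("/mnt/lfs-pass1"))
    second = tree_digests(Path("/mnt/lfs-pass2"))

    # Report every file that is missing from one tree or differs in content.
    for name in sorted(set(first) | set(second)):
        if first.get(name) != second.get(name):
            print("differs:", name)

In practice such a comparison must ignore files that legitimately differ between builds, such as logs and embedded timestamps; the point is that ’purity’ is operationalised as a mechanical, repeatable check.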
Timeliness has two facets. The first facet is the need for LFS to always be up to date. Each week, a new version of some package is released. The community integrates these as fast as they can, in order to determine the impact on the rest of the book. Some packages, such as the GNU C library (glibc) and the GNU Compiler Collection (gcc), require more testing and have a larger impact than other packages. New versions of these packages often warrant the release of a new (minor) version of the book. The other facet of timeliness is the rather slow release cycle of the book. Apart from package updates and minor textual corrections, the abundance of communication channels also makes releasing an area with a large overhead. For example, the release of version 5.1 came six months after 5.0 and featured relatively few fixes and updates. At the same time, lfs-dev had a traffic of 5422 (4) messages, which averages to a little over 900 messages per month, or 30 per day.
Education, or the transference of the accumulated and collective knowledge in the LFS community, is implicit in the book but not reflected upon in the community. The sustaining of the support forum lfs-support is often considered a chore, and there have been discussions on how to raise the starting knowledge of new LFS users (5). The FAQ was especially created as a starting point for new LFS users on how to deal with the community and how to get one’s own knowledge up to par with the community. While education is often considered one of the most important project goals, its priority competes with technical correctness and timeliness. The educational value of the book is considered to come from these properties, and other features which could be attributed to educational value, such as reading flow and expanded explanations, attract far less discussion.
The author, Gerard Beekmans, is a Linux user and thus used the Open Source tools which came with his system, even though most of these tools were development-oriented rather than writing-oriented. The following tools have been used in the lifetime of the project:
- Mailing list software (Listar and Mailman), news groups and IRC (Internet Relay Chat) for synchronous and asynchronous communication. Without these, LFS wouldn’t have attracted such a large audience. Most of the following systems are integrated into or intertwined with the mailing lists, either through notification of changes (Subversion, website) or as a front-end for them (news groups, archive searching).
- The website has multiple interfaces. We can roughly distinguish the project documentation (the HTML files found on http://www.linuxfromscratch.org), the mailing list and search interface, the archive and file interface, and the Bugzilla interface. Both the project documentation and the Bugzilla interface function as information-sharing systems. The website is used for external communication and member initiation. It also functions as an elaborate dissemination platform, utilizing additional technologies such as FTP, mirror servers and peer-to-peer (P2P) networks. Bugzilla functions as issue tracker and a fancy “to do” list for long-term storage.
- For co-operation, Subversion and its predecessor CVS are used to centrally store the project’s SGML and XML files. Both are widely used to allow multiple developers to work on the project files, an essential feature in the distributed Open Source world. As such, it was only natural that the LFS project’s ’source files’ were managed in the same manner. The final collaborative tool is a simple Wiki at http://wiki.linuxfromscratch.org, which has been used as a community tool for collaborative discussion and writing.
- For co-ordination, no separate tools are used. Real-time coordination, usually only required in the case of server maintenance, takes place via IRC.
Identifying socially relevant groups
The LFS community can be divided into several groups, depending on the criteria. When we use participation as group delimiter, we can distinguish four groups: those who have never read LFS, those who read it but don’t participate in active discussion on public fora about it (6), those who read it and participate in public fora, and those who help create it. The first two are implied groups; the latter two are those which I’ve examined on the lists. The first two groups are referred to by LFS users as “newbies” and often figure in discussions about the educational value of LFS or the support burden they cause. They are specifically excluded from support, since the assumption (based on experience) is that they lack the necessary knowledge to successfully complete the book. The distinction between the latter two groups is rather artificial. It could be argued that everybody who participates in the LFS mailing lists contributes to the development of the book, although only official members (which means those installed by Gerard Beekmans) have the executive power to do so.
Even then, non-members have a legion of options to significantly contribute to LFS: they can participate in development discussion, they can open ’bugs’ or comment upon them through Bugzilla, they have access to the development versions, they can contribute patches for problematic source packages and can contribute hints for solving problems unique to the LFS community.
When we dig deeper into the history of the LFS community, we can further refine the groups identified above. In the history of the LFS project, we can single out certain development milestones which led to a redefinition of the project. Some of these were caused by internal crises, brought about by differences over the goal of the project, the pace with which it developed, or the organizational structure. These milestones were instrumental in redefining groups and interpretations of the LFS project. They were also instrumental in determining which collaboration tools were used, and how they were used to support the new interpretations. I will use these events to try to answer the research question of how the use of collaboration tools has changed the users and vice versa.
The first milestone in the history of LFS which I have examined was a crisis initiated by an e-mail titled “Let’s open a can of worms, shall we” from long-time contributor Jesse Tie-Ten-Quee. The point of his e-mail was to discuss the organizational structure of the LFS project, and the position of Gerard Beekmans as project leader. Gerard’s role in LFS was steadily diminishing due to time constraints, and he became perceived, by some long-time members of the LFS project, as a bottleneck for the pace of development of LFS. For example, the integration of the pure LFS hint took more than two months, and many commands and explanations remained out of sync. Co-editors of the book were frustrated by this, because discussions about development weren’t resolved, or resolved issues wouldn’t be implemented. Some editors even left the project because of this. While there were enough worthwhile developments and developers that could contribute to LFS, these contributions were not made due to a lack of leadership and decision-making. The LFS project was seen as Gerard Beekmans’ project, and other contributors (whether they had write access to the book or not) were seen as, and considered themselves, helpers instead of co-authors. The e-mail from Jesse brought a new interpretation to the surface, namely that the LFS project is a community project and should thus be carried out by that community.
In the resulting discussion, this dichotomy between ’leader’ Gerard Beekmans and ’the community’ was carved out further, and compared to the larger context of Open Source development models, notably the Linux kernel and the BSD systems. The Linux kernel employs a ’benevolent dictatorship’ development model, where one person (Linus Torvalds) reserves the right to make the final decision about what code goes into the kernel, helped by an elaborate system of lieutenants (Iannaci, 2005) whom he trusts to decide what code goes in or not. Gerard chose a similar way, calling for volunteers in key areas to help distribute the responsibility and workload and minimize the time required for actual decision-making from Gerard himself. This resulted in a new organizational structure and the call for organizational documents, in order to formalize work procedures. While LFS was already split up into LFS, BLFS (Beyond LFS), ALFS, patches, hints, etc., the organizational move provided more granularity in the division of roles. LFS was redefined into several projects, such as LFS leadership, editorial, package maintenance, quality assurance, testing, ’toolchain’, FAQ, website, FTP, IRC, etc. Not all of these existed previously, so it wasn’t just a matter of formalizing existing structure. Rather, projects were created whenever someone volunteered to do the work, or convincingly suggested the need for it. Some of these projects, such as quality assurance and testing, died due to lack of interest.
The new LFS leadership team, consisting of Gerard Beekmans and Bill W. Maltby, required a formal ’manifesto’ from the LFS project, and a ’mini-manifesto’ from each team to define their subproject. This was meant to “describe LFS in terms of goals, strategies, major policies and functions, and similar attributes.” A new collaboration technology in the form of a WikiWikiWeb, which had recently been introduced, was used to collect these documents and improve them. The Wiki allowed the creation of web documents through a web browser, and allowed other visitors to edit the page without logging in. This was seen as the ideal method to grow an idea into a full proposal. It was used for the manifestos, but also for book development, such as the new XML structure.
At the same time, just before the summer, work had commenced on a new website. The main reason for doing so was to externalise the dynamics and developments which happened inside the community to those outside the community. To do this, news sections were added, which contained announcements and summaries of important developments on the lists. Another ’dynamic’ feature was the auto-generated output of log messages from CVS, so that any change in the book was reflected in a dynamic ’changelog’ on the website. The website also contained a separate section for each subproject, which allowed for specific documentation or news regarding that subproject (7). For some new subprojects, such as the FAQ and the website, mailing lists were created as well.
The development of this website and the organizational formalizations led to the strengthening of Jesse’s interpretation of LFS as a community project, causing it to become the dominant one. We can say that the Linux From Scratch book is thus socially constructed as a community project by a group of editors who interpreted it as such. Since this interpretation became the dominant one, technologies such as the website, the Wiki and the mailing lists were adjusted to fit this new interpretation.
The second milestone was the announcement of an LFS ’fork’ (8) called the BE-LFS project, which stood for “Bleeding-Edge LFS”. This caused quite some consternation in the LFS community, since the ones announcing this ’fork’ were long-time contributors. They had collaborated outside the regular LFS communication channels on a version of the LFS book which incorporated the very latest software versions of critical components, some of which weren’t even declared stable yet. The reactions were divided, but most members complained about the secrecy and ’private’ development through which this book was written. Everyone agreed that this was not in the spirit of the Open Source community, and that development should be done as much as possible through publicly available, archived communication channels.
The BE-LFS book also brought a new interpretation of the contents of the LFS book. Whereas previously, technical correctness and educational value were the main norms which guided development of the book, the BE-LFS book was specifically written for ’veteran’ LFS users. It was meant for those who regularly build an LFS system using the instructions in the book but who also deviate from those instructions; in other words, this was a book for the writers themselves. This caused quite some consternation, since it meant a radical diversion from the regular LFS book. People complained that BE-LFS was introduced as a ’fork’, although the BE-LFS members didn’t intend it as such. It also widened the gap between the LFS developers and the community for which they targeted the book, although the developers themselves stated that doing it this way meant faster development cycles, since there was no need to discuss every little detail to death. An e-mail thread, which was meant to find out whether the community wanted to integrate BE-LFS or not, resulted in a threat to fork. The reactions to this were furious, and a change in development methodology was seen as the only reasonable compromise.
The proposed change was to move to a model where one version of the book would be the ’stable, educational’ version, meant for the target audience of LFS. This would be the latest released version and was labeled ’stable’. Another version of the book would work towards the next ’stable, educational’ version. Changes therein would consist of relatively stable changes, such as new package versions, spelling fixes or better explanations. This branch would be labeled ’testing’. Finally, to integrate BE-LFS and the need for unrestricted development, there would be another version specifically for trying out new packages or changes with a large impact. This branch was labeled “unstable” (see http://www.linuxfromscratch.org/lfs/development.html). Most people agreed to this model, although the BLFS editors complained that it made their task of staying compatible with LFS more difficult.
At the same time, Gerard Beekmans realized his time was so limited that he required someone to help make these decisions, because the community would otherwise not move forward. He therefore asked for volunteers for a ’second in command’, who would have the same authority as himself. Bill responded by stepping down from the LFS leadership team, since his role in formalizing the LFS organization was complete.
The implementation of the new development model had some consequences for the tools which were used to support LFS development. The CVS system, which had always been used to hold the book files and their history, was technically outdated and slated for replacement. A proposal for replacement had already been discussed, and Subversion was seen as the most logical successor. The new development model and the migration to Subversion were therefore carried out together. This also allowed room to take advantage of some unique features of Subversion, namely the ease of so-called “branching”. This means that multiple versions of the same source tree can exist next to each other, with independent version control and development. It was the perfect fit for the new development model, and Subversion made it much easier to implement than CVS.
Although Subversion is optimized for working with branches, it wasn’t the reason that LFS changed its development model. Rather, the choice for Subversion was the result of a lack of leadership, a new group which required a platform for quick development, and a community which liked to have an educational and stable book version. There was no closure of meaning here; rather, the new interpretation of the LFS book was added to the existing interpretation, and integrated in the community. And again, just as with the use of new mailing lists and the Wiki, the tools were adapted to the people, socially constructed, and not the other way around.
This does not mean that technology is therefore merely socially constructed. We can’t deny that the tools themselves in large part configure the use and users, although not in a deterministic sense. Rather, the users and technology mutually define each other. The community is shaped through the use of tools, but the selection of tools shapes the community as well. In effect, the users and the technology co-construct the Linux From Scratch project.
The LFS book cannot be seen as separate from its users, its history and its embedding in the larger culture of Open Source. It is a peculiar mix of mutual dependencies between passive readers, users, participants and leaders, and various technologies to facilitate the interaction and communication between community members and the cooperation and collaboration on the book; in short, it is a ’technological ensemble’.
Chapter 4. Wikipedia
Wiki Wiki Wiki
History of Wiki
“WikiWiki” is the Hawaiian word for fast, speedy, quick or informal. On the Internet, a WikiWikiWeb server, or Wiki (9), refers to
“a freely expandable collection of interlinked Web ’pages,’ a hypertext system for storing and modifying information — a database, where each page is easily editable by any user with a forms-capable Web browser client” (Leuf & Cunningham, 2001, p. 14).
The concept was conceived by Ward Cunningham, a programmer who created it because he needed a quick way to publish software patterns on the Web. That first Wiki, the Portland Pattern Repository, was initially a really simple server script which added a link “Edit this page” to each web page. Every visitor with a browser that understood forms could freely edit any page, without technical restrictions. As a result, it managed to attract other pattern designers, who felt compelled to add pages, edit and correct current ones, or add to discussions.
The Wiki software was released as Open Source, which allowed other people to inspect the code and improve upon it. Technical features, such as access control, link categories and change notification, were added according to the needs of the communities using them. There are currently (10) about a thousand public Wikis.
As an in-browser editable website, a Wiki has unique features and problems. Some of its most interesting features are:
- A Wiki is, depending on the engine and the setup, an open system. Most Wikis allow all pages to be edited freely and anonymously. This has led to many interesting communities, but also to interesting problems.
- A Wiki is a collaborative software tool. It allows multiple authors to write on the same page, although most engines don’t allow this to be done at the same time, in order to prevent conflicting changes.
- It’s quick and easy to edit any page. Quick because editing is one click away (most Wikis feature an “Edit this page” link at the bottom of the page) and easy because it doesn’t force you to use HTML. Instead, you can use so-called WikiMarkup, which is plain text with some formatting rules. These formatting rules allow the Wiki to automatically convert a page to HTML for the web browser (a minimal conversion sketch follows this list).
- A Wiki preserves each saved change and revision. It’s easy to roll back a page to a previous version. This is an essential feature if other people are allowed to edit the pages in the Wiki.
- A Wiki is a low-maintenance documentation and collaboration system. Low-maintenance because, after initial setup, users are only busy with reading, editing, writing and structuring content. At least, technically they are.
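To illustrate the formatting rules mentioned above, here is a minimal sketch of a WikiMarkup-to-HTML converter in Python. The rules shown (quote characters for emphasis, double square brackets for links) follow common Wiki conventions, but each engine defines its own dialect, so this is an illustration rather than any particular engine’s implementation:

    import re

    def wiki_to_html(text: str) -> str:
        """Convert a small subset of WikiMarkup to HTML."""
        html_lines = []
        for line in text.splitlines():
            # '''bold''' and ''italic'' (bold first, since ''' contains '')
            line = re.sub(r"'''(.+?)'''", r"<b>\1</b>", line)
            line = re.sub(r"''(.+?)''", r"<i>\1</i>", line)
            # [[Page name]] becomes an internal hyperlink
            line = re.sub(r"\[\[(.+?)\]\]", r'<a href="/wiki/\1">\1</a>', line)
            if line.startswith("== ") and line.endswith(" =="):
                html_lines.append("<h2>%s</h2>" % line[3:-3])
            elif line.startswith("* "):
                html_lines.append("<li>%s</li>" % line[2:])
            else:
                html_lines.append("<p>%s</p>" % line)
        return "\n".join(html_lines)

    print(wiki_to_html("== Example ==\nA ''quick'' link to [[WikiWikiWeb]]."))

A real engine adds many more rules (tables, images, nested lists) and, crucially, the storage and revision tracking behind the “Edit this page” form.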
Just like any other new method of communication or collaboration, one has to experience a Wiki in order to appreciate it.
Some of the (social) problems which most Wikis face can be seen as a result of their (technical) features. One of the most common is vandalism, also referred to as graffiti. A Wiki heavily relies on the assumption that there are more people who have an interest in improving it than there are people who want to take advantage of it. An open Wiki is an easy target for vandalism, and it takes an active user base to constantly repair the damage done by vandals. An especially malicious form of vandalism is link spam. Similar to e-mail spam, Wiki spam consists of advertisements or links to commercial websites. Links boost search engine ranking, and Wikis are thus an easy target for cheap extra links. Most Wiki spam is automatically inserted by a “spambot”, a computer program that searches for Wikis it can inject with links. Typically, spambots are combatted through technical measures, such as restricted access, devaluation of external links or frequency/time-based access restrictions. Vandals (users intentionally writing nonsense) and trolls (users provoking for the sake of discussion) are typically dealt with by the community, who watch their contributions and the most recent changes.
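As a sketch of the frequency-based access restrictions just mentioned: the idea is simply to refuse edits from a client that saves pages faster than a human plausibly could. The class below is a minimal illustration; the limit values and the use of an IP address as client identifier are my assumptions, not those of any particular Wiki engine:

    import time
    from collections import defaultdict, deque

    class EditThrottle:
        """Reject edits from a client that exceeds a simple rate limit."""

        def __init__(self, max_edits: int = 5, window_seconds: float = 60.0):
            self.max_edits = max_edits
            self.window = window_seconds
            self.history = defaultdict(deque)  # client id -> edit timestamps

        def allow(self, client: str) -> bool:
            now = time.time()
            timestamps = self.history[client]
            # Drop timestamps that have fallen out of the sliding window.
            while timestamps and now - timestamps[0] > self.window:
                timestamps.popleft()
            if len(timestamps) >= self.max_edits:
                return False  # too many edits in the window: likely a spambot
            timestamps.append(now)
            return True

    throttle = EditThrottle()
    print(throttle.allow("203.0.113.7"))  # True for a normal visitor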
Another issue, which has more to do with attracting new users, is that a Wiki is constantly a work in progress. This can make a Wiki harder to read when it is also used as a communication platform (which is fairly common). The C2 Wiki refers to this as ’thread mode’: a page composed of a collection of comments from visitors. On the C2 Wiki, when the discussion reaches a consensus, the page is restructured into so-called ’document mode’, which means that it is rewritten to read more like a conventional article.
Other Wiki-related problems revolve around a (deliberate) misinterpretation of the tool. Common examples are Wiki squatting and the walled garden. In both cases, the community Wiki is used for personal goals: to set up a personal website or a private corner in the Wiki.
The problems commonly associated with Wikis are a consequence of the social aspects of the software (malignant users), not of the technical features of the system. Nonetheless, due to the technical configuration of a Wiki, and the configuration of its users, a Wiki is much more resilient to graffiti or vandalism than one might expect. In economic terms, this can be explained by the relatively high transaction costs of vandalizing a page and the low transaction costs of restoring a page (Ciffolilli, 2003).
The largest Wiki, in terms of pages at least, is Wikipedia, which boasts more than 1.6 million articles in 195 registered languages (http://en.wikipedia.org/wiki/Wikipedia, checked June 2005) since its inception in 2001. There are more than 300,000 contributors (11), who are referred to as Wikipedians, the most fervent ones called 'Wikipediholics'. The Wikipedia community is also quite reflexive about its organization, its social structure and its culture. The project is currently managed and paid for by the Wikimedia Foundation. Wikipedia has its roots as a complement to the Nupedia project. Nupedia was a research project by Jimmy Wales and Larry Sanger to produce an encyclopedia with free articles (in the sense of freedom, similar to Open Source). It was based on an elaborate peer-review system with strictly expert contributors: authors who had at least a PhD. However, this system prevented Nupedia from being very productive, and progress was considered slow at a rate of 18 articles in the first year. Therefore, the idea was put forward to set up an open side-project powered by a Wiki to attract more contributors, with the intent to move the more elaborate articles into Nupedia after the approval process. However, Wikipedia quickly gained dominance, and Nupedia was, due to loss of interest and contributors, cancelled in 2002.
Wikipedia is not a regular Wiki
On several points, Wikipedia deviates from common Wiki practices. First of all, to better serve the project's goal of producing a free encyclopedia, the Wiki software behind Wikipedia, Mediawiki, separates content from 'discussion'. Regular Wikipedia articles feature a selection bar at the top. This includes links to the article view, which is the default and is strictly for topical information, and the discussion view, which is reserved for discussion about the article. Furthermore, any discussion about Wikipedia itself, or its goals, is deferred to the Wikimedia Meta-Wiki and the Wikipedia mailing list. If a visitor creates a user account, he or she is given a "User" page, where some personal information can be given and discussion between users is facilitated.
Another difference between Wikipedia and traditional Wikis is the set of elaborate foundation issues, which can be seen as a constitution of rules or principles which can't be debated. One of the most important of these rules is the neutral point of view. This policy
"(…) basically states that its mission as an encyclopedia is best served not by advancing or detracting particular points of view on any given subject, but by trying to present a fair, neutral description of the facts — among which are the facts that various interpretations and points of view exist." (http://meta.wikimedia.org/wiki/NPOV).
The neutral point of view is an important social mechanism to deal with the heterogeneity of Wikipedia contributors. Together with the other elaborate policies and guidelines which Wikipedia has set up to control and shape the community, it is commonly used to resolve conflicts and editing wars. It is also a significant difference from traditional Wikis, which are more of a work in progress, where documents can be in thread mode or document mode.
Finally, the Mediawiki engine, which powers the Wikipedia project, is fine-tuned for supporting encyclopedic activities and getting people to collaborate. For example, a new article without a lot of content is called a "stub", and the Mediawiki software has elaborate mechanisms for identifying stubs and encouraging members to expand them. Special pages are dedicated to categorizing stubs, voting for deletion and setting up collaborations.
Wikipedia is a sociotechnical ensemble
A community as large as Wikipedia is not a uniform one. Just as in Linux From Scratch, there are several groups to distinguish. We can even distinguish similar groups from a participational point of view: those who have nothing to do with Wikipedia (non-users), those who just read it but don't edit articles, those who read and help edit or create articles (they commonly refer to themselves as "Wikipedians"), and finally those who work on an organizational level, determining policy or enforcing it. This structure was, contrary to LFS, set up from the beginning, although explicit guidelines and policies were formed at the end of the first two years, mostly by Larry Sanger (Sanger, 2005, http://features.slashdot.org/article.pl?sid=05/04/18/164213&tid=95). Although Sanger doesn't note it in his list of reasons why Wikipedia works, he later argues that 'official' guidelines and policies were crucial after his departure, when Wikipedia's growth was exponential.
Wikipedia users are configured
There is a great deal of user configuration going on in Wikipedia, both implicitly and explicitly. Explicitly through the technical properties of the software used, since there are various user roles with different capabilities for manipulating content or the system. Implicitly through the guidelines and policies, as revealed by the various sub-communities in Wikipedia.
The technical properties of the software used, Mediawiki, are coded specifically to support Wikipedia and its related projects. As such, they reflect in large part the guidelines and policies which have been created. For example, the distinction between an article page and a discussion page is the result of the decision to separate the writing of an article from the talking about an article, even though this was not common practice or a standard feature in other Wiki engines. Similarly, the distinction of user privileges between anonymous users, signed-in users, "sysops" (system operators or administrators), bureaucrats and developers is hard-coded in the technical framework of the Mediawiki software. This technical feature shapes the interactions within the Wikipedia community: for example, because signed-in users are given more editorial freedom, anonymous users are encouraged to sign in or create an account. The "sysop" (or "administrator") user role has even more control over content; for example, this role is allowed to edit protected pages, delete or undelete pages and block users. Typically, this role is liberally given to anyone who has been an active member for a longer time. However, an administrator is no more or less authoritative over content than other users; typically, their decisions just reflect the community's decisions. For example, the "Vote for Deletion" page (abbreviated to VfD, see http://en.wikipedia.org/wiki/WP:VFD) collects all pages which someone has nominated for deletion. Articles can get nominated for deletion for many reasons: an article can be a joke, it can violate copyrights or it can have a trivial subject. It's up to administrators to, after a lag time in which members vote for deletion or inclusion, carry out the decision of keeping or deleting a page. Despite appearances, however, this is not a democratic process; the administrator's decision is still an individual judgment.
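A hard-coded privilege distinction of the kind described above could look roughly like the sketch below. The role names follow the ones mentioned in the text; the permission sets and the helper function are my own simplification, not the actual Mediawiki code.

    # Role names as used on Wikipedia; permission sets are illustrative.
    PERMISSIONS = {
        "anonymous": {"read", "edit"},
        "signed-in": {"read", "edit", "move"},
        "sysop": {"read", "edit", "move", "delete", "undelete",
                  "edit-protected", "block-user"},
        "bureaucrat": {"read", "edit", "move", "delete", "undelete",
                       "edit-protected", "block-user", "grant-sysop"},
    }

    def allowed(role, action):
        # The check itself is trivial; the shaping force lies in which
        # actions each role is granted in the first place.
        return action in PERMISSIONS.get(role, set())

    print(allowed("anonymous", "delete"))       # False
    print(allowed("sysop", "edit-protected"))   # True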
Internal Wikipedia politics
The VfD page is also the locus for a number of sub-communities within Wikipedia, namely the "Deletionists" and "Inclusionists". These self-proclaimed sub-communities distinguish themselves very explicitly by their different interpretations of the Wikipedia goals. The VfD page is arguably the most political page in Wikipedia, since it reflects opposing interpretations of the project's direction. On one end of the spectrum is a group of Wikipedians who call themselves "Inclusionists". Their interpretation of Wikipedia is that of a catalogue of all knowledge in the world. Their credo is that "Wikipedia is not paper", which means that Wikipedia is not bound by the limitations of paper and has virtually unlimited storage capacity, since disk space is cheap. On the other end of the spectrum is a group who call themselves the "Deletionists". Their interpretation of the Wikipedia project goal is to produce a quality encyclopedia with as little junk as possible. Therefore, their credo is "Wikipedia is not a junkyard". Both groups only seem to exist to keep the other in balance, which explains the existence of groups like the "Mergists" or the "Apathetics", who strive for quality content and a broad range of articles, and AWWDMBJAWGCAWAIFDSPBATDMTD, the "Association of Wikipedians Who Dislike Making Broad Judgements About the Worthiness of a General Category of Article, and Who Are In Favor of the Deletion of Some Particularly Bad Articles, but That Doesn't Mean They are Deletionist", which obviously makes fun of both groups.
Neither of these groups has gained dominance within the broader Wikipedia community though, and their relevance for the project's direction is mostly restricted to the peripheral Vote for Deletion page.
In July 2004, the Boston Globe ran a critical article on Wikipedia by reporter Hiawatha Bray, "One great source – if you can trust it". The point of the article was that Wikipedia's greatest strength – open content – is also its greatest flaw. Since content is freely editable, all accountability is lost, because an article becomes the aggregation of multiple authors. As opposed to a traditional encyclopedia, where known experts contribute articles, Wikipedia relies on the principle that many eyes together make sure that an article is continually updated and improved. Referring to the Darwinist principle of "survival of the fittest", this is called "Darwikinism".
In August, the discussion about the trustworthiness of Wikipedia reached new heights when the Syracuse newspaper posted a provocative article in which a high-school teacher is quoted as saying that Wikipedia is not a trustworthy source (12). The incident quickly spread around various 'weblogs' on the Internet as "The Great Wikipedia Authority Debate". The journalist, Al Fasoldt, had recommended in a previous column that his readers go to Wikipedia for more background information, but didn't realize that the content was freely editable by readers. After he received an alarming e-mail from a high-school teacher, who explained what a Wiki was, he claimed to be misled by the site and called it a "supposedly authoritative Web site that [is] untrustworthy". While the article was written without any prior research, it apparently touched a painful nerve with Wikipedia supporters, especially the point that Wikipedia is untrustworthy because it is freely editable. Comparable articles, e.g. in the Guardian, also point to this problem with authority.
Wikipedia supporters regard the free editing as its greatest strength. One particular supporter of Wikipedia (http://techdirt.com/articles/20040827/0132238_F.shtml) contacted the writer to prove the trustworthiness of Wikipedia. His argument was that Wikipedia has a "self-correcting community" which corrects the errors other people (intentionally or not) introduce into articles. He suggested an experiment: intentionally make some errors in a Wikipedia article and see how long it takes before they are corrected. Fasoldt didn't take the challenge, but Alex Halavais, an Assistant Professor of Communication and the Director of the Masters in Informatics program within the School of Informatics at the University at Buffalo, did, in the Isuzu Experiment. He made 13 obscure errors on various pages of Wikipedia, and hypothesized that most of them wouldn't be fixed within two weeks. To his surprise, the errors were fixed literally within hours of his changes. Comments on his site criticized his methodology and pointed out that the "RecentChanges" page and the dedicated "RC Patrol" deliberately keep an eye out for suspicious edits. Another blogger conducted the same experiment, but was more subtle, spreading out the mistakes across more days and across multiple articles. This made them harder to notice for dedicated error-correction teams, and it was thus felt to be more of a test of the self-correcting powers of the project. When he checked back a week later, his errors hadn't been corrected, which was deemed newsworthy enough to appear on Slashdot. Other tests to quantify the quality of Wikipedia were carried out; for example, Edward Felten compared a number of articles between Wikipedia and Britannica. He cautiously concluded that Wikipedia has a slight advantage due to its larger articles and bigger breadth of articles. This was explained as the result of a so-called systemic bias in Wikipedia. Wikipedia describes an encyclopedia as 'a compendium of human knowledge'. The systemic bias manifests itself due to the mismatch between the demography of Wikipedia and the demography of the world. As a result, the articles which are written for Wikipedia reflect the interests of the Wikipedia community instead of the interests of the world.
Critical arguments have been made by all kinds of people, from traditional media to academics, such as Robert McHenry and Larry Sanger. Robert McHenry, former editor of the respected Encyclopaedia Britannica, wrote the critical article "The faith-based encyclopedia". Its main point is that there is no closure of an article, which makes it continually susceptible to deterioration. Instead of a line of continual improvement, he attests that an article is edited to mediocrity. (His article has been criticized by Aaron Krowne in "The FUD-based encyclopedia", 2005.) Larry Sanger, co-founder of Wikipedia, states that the common problems of Wikipedia, namely the lack of public perception of credibility and the dominance of difficult people, are caused by the root problem of "anti-elitism, or lack of respect for expertise" (Sanger, 2004). Sanger attests that, because there is no attitude or key policy of deferring to experts, Wikipedia only manages to keep and attract non-experts. Finally, there is criticism from librarians, who oppose the idea that more eyes mean better articles: "damage and low quality will win over high quality, because high quality requires effort and low quality does not." (Scott, J., "The Great Failure of Wikipedia", 2004).
Identifying socially relevant groups
In the course of these discussions, some clear interpretations of Wikipedia are emerging, which brings me to the interpretative flexibility of the Wikipedia project. The first interpretation of Wikipedia can be labeled a "communist" interpretation. Subscribers to this interpretation strongly believe in the combined knowledge of the collective: "The quality of Wikipedia Articles, at the very least, at a moment in time are better than they were before and will improve over time." (Mayfield, R., 2004. Available: http://www.corante.com/many/archives/2004/08/29/wikipedia_reputation_and_the_wemedia_project.php). The communist interpretation consists of Wikipedians and supporters of the project who stress that Wikipedia is a community technology. Wikipedia enthusiasts, contributors and supporters ("Wikipedians") describe it as "a free-content encyclopedia, written collaboratively by people from around the world." (From: http://en.wikipedia.org/wiki/Wikipedia:About, retrieved June 2005). This points to the two most central ideas in Wikipedia: the freedom to use content and the freedom to write or edit it. These ideas are so central that first-time users who test-edit a real page instead of the sandbox are thanked for their contribution, even though this can be seen as an act of vandalism (see http://en.wikipedia.org/wiki/Wikipedia_talk:Please_do_not_bite_the_newcomers). Similarly, the NPOV (Neutral Point Of View) policy ensures that multiple authors can collaborate on articles without disrespecting each other's opinions. Technical features of the Mediawiki software are built to support this further, e.g. by separating the article and the discussion. Other concepts, such as "Darwikinism", consensus, openness and collaboration, also fall under this interpretation.
I label the second interpretation an "aristocratic" one, because subscribers to this interpretation strongly believe in the unequal distribution of (encyclopedic) knowledge. Professions which are more likely to be part of this group are journalists, librarians and other traditional media spokespersons. The aristocratic interpretation stresses that Wikipedia is unstable and continually a work in progress, and can therefore not be relied upon as an online encyclopedia. The assumption upon which their arguments are based is that it is not likely that articles will get better as time goes by. To improve Wikipedia and erase weak points such as the lack of accountability and authority, they suggest that Wikipedia would be better if it incorporated features such as proper peer review or an expert board to assess articles.
Closure of meaning
Interestingly, the aristocratic interpretation is beginning to become the dominant view on Wikipedia. Despite the numerous arguments made by supporters of the communist interpretation, the flaws described are recognized as real pitfalls which should be resolved. This can be seen in the plans of Jimmy Wales and the Wikimedia Foundation to stabilize certain parts of the content of Wikipedia. Various actions to make Wikipedia more trustworthy have also been initiated. For example, there are plans for a stable, "1.0" version of Wikipedia, in which articles are peer-reviewed by experts and printed on paper. Numerous outside experts have organized themselves in the WeMedia project, which carries out a fact-checking exercise. The self-declared "1.0 Editorial Team" should play an instrumental role in identifying good quality articles, building on the work of the "featured article" process. There are also factions within the Wikipedia community which have taken it upon themselves to check facts and references, although their attempts are currently splintered. The Mediawiki software will also be adjusted for this, gaining features for peer review and article validation. Users would also be encouraged to sign articles with their real names, in order to increase the perception of trust and to establish the credentials of users.
The Mediawiki software is an Open Source effort, built by a large and active group of developers who operate more or less independently of the larger Wikipedia community. Similar to Linux From Scratch, all development files are stored in a versioning system (CVS), and bugs and feature requests are entered into a Bugzilla database. New features are added if someone from the development team implements them and if they are approved by the rest of the team. Access to the team is open to anyone who makes meaningful contributions (13). Larger features which are more determinative of the project's functioning or direction, such as the article validation features, are only implemented after thorough discussion by the Wikimedia Foundation. This includes the features for the "Wikipedia 1.0" version.
Plans on the roadmap for the Mediawiki software reveal that some of the discussed features are almost ready for implementation. Examples which are listed for future development are "liquid threads" and "Wikiflow". Liquid threads is a new form of discussion, based on a mix of web forums and Wiki discussion pages. Wikiflow intends to add common publication modes to pages, for example stub, draft and stable. This will allow readers to better assess the state a page is in, and thus to determine the trustworthiness of a page. The most concrete example of a new technical feature added to Mediawiki in response to the aristocratic interpretation is the addition of an article validation feature. Although it is not declared stable in the current release of the Mediawiki software, it is actively being tested for the next stable version. With a special "Validate" tab, readers can qualify an article on a number of metrics, for example to assess the NPOV of an article. These metrics are currently not stabilized, although various proposals have been made. Combining this feature with special user privileges for editorial purposes, or with a workflow which separates publication from editing, is a step closer to peer-review processes such as those planned for "Wikipedia 1.0".
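To illustrate what such a validation feature amounts to technically, the sketch below averages reader ratings per metric. Since the metrics were not yet stabilized at the time of writing, the metric names and the averaging rule are assumptions made purely for illustration.

    # Assumed metrics and 1-5 ratings collected through a "Validate" tab.
    ratings = {
        "NPOV": [4, 5, 3, 4],
        "accuracy": [5, 4, 4],
        "completeness": [2, 3, 3, 2],
    }

    def summarise(ratings):
        # Reduce each metric to a simple mean; a real feature might weigh
        # raters by privilege level or discard outliers instead.
        return {metric: sum(scores) / float(len(scores))
                for metric, scores in ratings.items()}

    for metric, average in sorted(summarise(ratings).items()):
        print("%-13s %.2f" % (metric, average))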
Chapter 5: Conclusion
This paper started with the research question of how on-line collaboration tools change their actors, and reversely, how on-line collaboration tools are changed by the actors involved. To research this question, a number of specific questions have been asked, which I'll try to answer in this chapter. I've framed these questions in the theoretical background of social informatics and science and technology studies. To gain insight into the actual use of collaboration tools, I've analysed two case studies which are representative of collaboration tools unique to the Internet.
The Linux From Scratch case study was examined as a virtual ethnographic study, in which I could draw on my own experiences as a participant in the community. By using the mailing list archives, I could analyse part of the history of LFS and how certain events influenced the choice and use of collaboration tools. The Wikipedia case study was examined as a discursive practice, which means that the meaning and function of Wikipedia is constructed by the ideas and language of its participants. This research uses sources from the extensive documentation published by Wikipedia and its organisations, as well as various debates and critiques which were published on the Internet.
In each case study, I stumbled upon problems in the collection and validation of the raw research material. These problems ranged from references which couldn't be retrieved to the practical necessity of limiting the number of articles to examine. I've tried to deal with these challenges by being reflexive about my methodology and adjusting it to its research object. As Hine suggests, a virtual ethnography is an adaptive ethnography, but it is still possible to apply the core principles.
The case studies were framed within the theoretical framework of Science and Technology Studies, which is concerned with the development of the meaning of technologies through the interpretations given to them by socially relevant groups. When a new interpretation of the meaning of the LFS book as a community project emerged in the LFS community, the use of the tools was adjusted to reflect this meaning. Similarly in the Wikipedia project, where the development of the software is directly influenced by the new interpretation of Wikipedia as a trustworthy encyclopaedia. The theoretical frameworks were furthermore framed within the history and development of the relevant disciplines. For categorising and analysing the collaboration tools, I've used models from social informatics, which has its roots in CMC and CSCW. The SCOT model was placed in the context of technological determinism and social determinism, between which a middle ground must be found in the co-construction of society and technology.
In describing the case studies, I’ve placed them in their historical context, showing relationships with the larger narratives of the Open Source software movement in Linux From Scratch, and the ideas and philosophy of the Wiki systems in Wikipedia. Furthermore, each was described with references to first-hand experience, demonstrating the cultural competence which is required by the researcher to understand his object.
Comparison of case studies
Linux From Scratch has a rich community and is deeply rooted in the Linux and Open Source culture. As such, it is no surprise that the choice of collaboration technologies is largely based on the conventions in that context. The use of mailing lists, Bugzilla and Subversion is easily attributable to those conventions alone. However, the tools themselves influence how the LFS project can manage itself. The ratio of communication to actual work done on the project files is sometimes disproportionate. The development milestones which I've examined clearly show how a new interpretation of the project and its goals emerges, and how this affects the use of collaboration tools.
In Wikipedia, the choice of Wiki software was much more determined by the need of the project founders to attract a wide range of contributors. While Wikipedia has therefore inherited many of the problems which plague Wikis, it has also managed to circumvent a number of them by keeping a clear project goal. Due to the development of social conventions in the form of key policies and guidelines, Wikipedia has managed to attract a large user base which writes and reviews a lot of articles. The software which powers Wikipedia was initially adjusted to scale with the growth of the user base and the number of articles, and is adjusted to be easy to use for newcomers and efficient for more experienced users. Several public debates last year show that there are conflicting interpretations of Wikipedia. The aristocratic interpretation, which defines Wikipedia as a project which can only be authoritative and trustworthy if it manages to attract and use experts, is becoming more dominant. This is reflected in the plans and intentions of the Wikimedia Foundation, and is seeping into new versions of the software that powers Wikipedia.
While there are obviously many commonalities between the two projects, there are also some differences. These differences can mostly be attributed to the project goals and the choice of collaboration tools derived from them. In broad terms, LFS is more driven by internal conflicts, while Wikipedia is more driven by external influences. This is because LFS has clearly separated its community from the outside world by using subscription-based mailing lists, which gives it clear boundaries and raises the threshold for new users. Its development is thus more inwardly focused. Wikipedia, on the other hand, has erased as many boundaries between the participant and the outsider as possible, because it needs to attract as many people as possible, reasoning that more participants raise the quality and volume of articles.
There is one question which is still left unanswered by this paper: can a dominant interpretation be identified in each project, and is there closure of meaning? Both projects examined in the case studies have very active communities and are still very much in development. Due to their relative youth, there is less stabilisation of meaning and more searching for project definition. The use of certain collaboration tools is the result of choices which were made early in the development of the projects, such as in Linux From Scratch, where mailing list software was the primary means to communicate with the author and thus establish a community. In the case of Wikipedia, the use of Wiki technology was a technical experiment to attract many contributors, which proved very successful and was repurposed to fill an open encyclopaedia. These basic uses and understandings of the technology and collaboration tools are unchallenged and stable. In a sense, the case could be made that the meaning and use of the collaboration tools are thus closed. Nevertheless, in both cases the meaning of the collaboration tools is socially defined by certain uses, and the tools are tweaked and adjusted to fit these uses. As the meaning of the projects changes due to new interpretations from socially relevant groups, the tools are adjusted to reflect this. So while there is a basic understanding of what the technology does, its use is continually refined and reflected upon to adjust to new interpretations. The technology and the community are thus co-constructed in myriad ways.
- The term 'hackers' is not to be confused with the popular meaning of the word, which most commonly refers to computer burglary. In this context, a hacker refers to a programmer, while a cracker refers to a computer burglar. See the 'official' hackers dictionary: http://catb.org/~esr/jargon/html/H/hacker.html and http://catb.org/~esr/jargon/html/C/cracker.html
- The 4.1 version can be bought at http://cart.cheapbytes.com/cgi-bin/cart/0970010001.html and the 6.0 version can be bought at http://cart.cheapbytes.com/cgi-bin/cart/0970010004.html. Note that the latter is titled 'Second Edition', at the publisher's request, to avoid confusing readers outside the regular Linux From Scratch community, who aren't aware of the book's development cycle.
- In previous research, I asked users how they entered the community. The responses showed a clear trend. See e.g. http://linuxfromscratch.org/pipermail/lfs-book/2002-February/003338.html, http://linuxfromscratch.org/pipermail/lfs-book/2003-July/003573.html, http://linuxfromscratch.org/pipermail/lfs-book/2002-February/003340.html
- The lfs-dev archives of November 2003 to May 2004 have 1517+670+1370+106+723+352+684=5422 messages posted to them.
- See http://linuxfromscratch.org/pipermail/lfs-dev/2002-September/028426.html and http://linuxfromscratch.org/pipermail/lfs-dev/2002-September/028283.html for examples.
- For an interesting discussion on non-users, see Wyatt (2003), in Pinch & Oudshoorn (eds.). In LFS, these users are either the generic LFS reader who isn’t subscribed to a mailing list, or the so-called ’lurker’, a passive subscribed mailing list member.
- A summary of all changes is collected at http://www.linuxfromscratch.org/website.html.
- A 'fork' is a separate development path independent of the original project, which in general is frowned upon in the Open Source community. The BE-LFS book was initially hosted on a different server than the Linux From Scratch server. Also, the people who contributed to it all collaborated via private e-mail or via a separate IRC channel.
- In “The Wiki Way” (Leuf & Cunningham, 2001), the authors refer to “Wiki” as the concept and to “wiki” as a technical implementation. This paper adheres to that convention.
- In June 2005, the WorldWideWiki listed 975 public Wikis.
- See http://en.wikipedia.org/w/index.php?title=Special:Listusers. Not all users are listed due to technical limitations.
- The original article link was at http://www.syracuse.com/news/poststandard/index.ssf?/base/news-0/1093338972139211.xml, but the actual content is missing. There is a copy at the author’s homepage at http://aroundcny.com/technofile/texts/mac082504.html.
- See http://meta.wikimedia.org/wiki/Development_policy and http://meta.wikimedia.org/wiki/How_to_become_a_Mediawiki_hacker.
- Andriessen, J.H.E. (2003). Working with Groupware: Understanding and Evaluating Collaboration Technology. London: Springer-Verlag.
- Baym, N. (1998). The Emergence of On-Line Community. In: Jones, S.G. (ed.), Cybersociety 2.0: Revisiting Computer-Mediated Communication and Community. Thousand Oaks, Calif.: Sage.
- Bezroukov, N. (1999). Open Source Software Development as a Special Type of Academic Research (Critique of Vulgar Raymondism). First Monday, volume 4, number 10 (October 1999). [Online] Available: http://firstmonday.org/issues/issue4_10/bezroukov/index.html
- Bowden, G. (1995). Coming of Age in STS. In: Jasanoff, Markle, Petersen & Pinch (eds.), Handbook of Science and Technology Studies. California: Sage, 64-80.
- Bowker, Star, Turner & Gasser (eds.) (1998). Social Science, Technical Systems, and Cooperative Work: Beyond the Great Divide. London: Lawrence Erlbaum Associates.
- Ciffolilli, A. (2003). Phantom authority, self-selective recruitment and retention of members in virtual communities: The case of Wikipedia. First Monday, volume 8, number 12. [Online] Available: http://www.firstmonday.org/issues/issue8_12/ciffolilli/
- Hine, C. (2000). Virtual Ethnography. London: Sage.
- Iannacci, F. (2005). Coordination Processes in Open Source Software Development: The Linux Case Study. London School of Economics. [Online] Available: http://opensource.mit.edu/papers/iannacci3.pdf
- Jones, S.G. (1998). Cybersociety 2.0: Revisiting Computer-Mediated Communication and Community. Thousand Oaks, Calif.: Sage.
- Kling, R. (1999). What is Social Informatics and Why Does it Matter? In: D-Lib Magazine, Volume 5, Number 1. [Online] Available: http://www.slis.indiana.edu/faculty/kling/pubs/kling99_01.pdf
- Leuf, B. & Cunningham, W. (2001). The Wiki Way: Quick Collaboration on the Web. Boston, MA: Addison-Wesley.
- MacKenzie, D. and Wajcman, J. (1999). The Social Shaping of Technology. Second Edition. Philadelphia: Open University Press.
- Matzat, U. (2001). Social Networks and Cooperation in Electronic Communities: A Theoretical-Empirical Analysis of Academic Communication and Internet Discussion Groups. Amsterdam: Thela Publishers. [Online] Available: http://www.ub.rug.nl/eldoc/dis/ppsw/u.matzat/
- Miller, D. and Slater, D. (2000). The Internet: An Ethnographic Approach. New York: University Press.
- O'Reilly, T. (2004). The Open Source Paradigm Shift. [Online] Available: http://tim.oreilly.com/opensource/paradigmshift_0504.html
- Pinch & Oudshoorn (eds.) (2002). How Users Matter: The Co-construction of Users and Technology. Massachusetts: MIT Press.
- Sawyer, S. and Rosenbaum, H. (2000). Social Informatics in the Information Sciences: Current Activities and Emerging Directions. [Online] Available: http://inform.nu/Articles/Vol3/v3n2p89-96r.pdf
- Smith, M.R. and Marx, L. (eds.) (1994). Does Technology Drive History? The Dilemma of Technological Determinism. Cambridge: MIT Press.
- Stallman, R. (1995). The Free Software Definition. [Online] Available: http://www.gnu.org/philosophy/free-sw.html
- Tonkiss, F. (2001). Analysing Discourse. In: Seale, C. (ed.), Researching Society and Culture. London [etc.]: Sage.
- Tuomi, I. (2001). Internet, Innovation, and Open Source: Actors in the Network. First Monday, volume 6, number 1. [Online] Available: http://firstmonday.org/issues/issue6_1/tuomi/
- Winner, L. (1999). Do Artifacts Have Politics? In: The Social Shaping of Technology. Second Edition. Philadelphia: Open University Press.