Thesis/Design

From Researchwiki


The last chapter reviewed the literature in the field and identified some gaps in the knowledge, along with possible questions for study. This chapter builds on those questions, defining the research methods whose results are presented in later chapters.

The research here focused on one question identified from the literature review. The question asked was as follows.

Are there any mechanisms by which the community (or a subset) can provide some level of authority?

This research tested a few simple mechanisms in the attempt to provide an answer to this question.

One proposed method of making the validation of articles easier is to identify good articles to validate, so that poor articles never begin a validation process they would ultimately fail. This research focused on two such mechanisms: user ratings and attention data. The following general questions were investigated by this research, using the methods presented in this chapter:

  • Can ratings made by users be reliably used to identify quality articles?
  • When users are asked to rate an article, what exactly are they rating?
  • Can attention data be used in a wiki environment to estimate the quality of an article?

In order to answer these questions, this research uses a range of research methodologies to process different types of data, and to triangulate results from these different sources to provide a clearer picture (Preece 2000). Considering the little research done in the field, this research looks at a wiki in a naturalistic setting, focusing on aspects of quality arising from a real-world wiki community.

The community studied was one created for this research, formed from the students of a "computer supported collaborative work" subject at CSU. Using this group of participants also allowed observations to be made of their participation in an authentic learning setting regarding the following questions from the literature review:

  • How do smaller wikis behave differently from Wikipedia?
  • How do wikis differ when the Wikipedia-style limitation prohibiting first-hand data or subjective experiences is removed?

And also the question:

  • How effective are wikis in educational settings?

Justification for the paradigm and methodology

Myers (1997) and Straub, Gefen and Boudreau (2005) explain the two major forms of research, qualitative and quantitative research.

  • Quantitative research was originally designed to study natural phenomena, and deals with numeric data collection. Methods include surveys, experiments, and various formal methods.
  • Qualitative research was designed for studying social and cultural phenomena using methods such as case studies, action research and ethnography, dealing with descriptive data.

This research uses both forms of research in analysing different sets of data. Preece (2000) presents an expanded set of research approaches for application in the study of online communities, and summarises the methods into the matrix of evaluation types in table 3.1.

Evaluation Type | Qualitative Data | Quantitative Data
Subjective | Ethnographic data, for example, interviews, observations, artifacts are interpreted by ethnographers | Questionnaires, for example, take subjective input, then express it using numeric rating scales
Objective | For example, content analysis categorises user comments seeking to identify patterns and frequencies | For example, usage logs generate data that is statistically analysed

Table 3.1 "matrix of evaluation types" (Preece 2000, pg. 305)

This research combined two formal data collection methods with several informal methods. The literature review, presented in chapter two, identified opportunities for research, allowing questions to be selected for further investigation and a suitable methodology to be built. The research then implemented a wiki and designed a rating system, which were trialled in a naturalistic setting while log data was gathered. Building on the results of this wiki experiment, a survey was conducted. The survey results were used to validate results from the experiment, and to answer questions arising from it. This process is summarised in figure 3.1.

Figure 3.1 "A summary of the research procedures"

Preece (2000) details five approaches to research in online communities: reviews, surveys, observations, experiments, and data logging. This research employs three of these methods, as explained below.

Data collection occurred in two phases: first, by collecting web server logs from a running wiki extended with a rating system; second, through a follow-up survey distributed to the users of the wiki. The server logs allowed participants' behaviour to be analysed, while the survey aided in interpreting results from the server logs, and helped to answer any remaining questions. Observations from the wiki and from the development process were also recorded.

The wiki experiment employs data logging to collect data, from which users' actions can be quantified for statistical analysis. The "wiki experiment" is not a formal experiment. It is better described as a pre-experimental method, using a single group of participants and few controls (Tanner 2002). It does not fit the definition of an experiment presented by Preece, as it does not manipulate controls to directly test hypotheses. In fact, the research deliberately avoided placing controls upon the community, in order to observe how the community developed on its own and how members behaved and interacted freely within it.

A survey was employed in the final phase of data collection, where participants were asked to answer a questionnaire comprising questions that arose from observations and results in the wiki experiment phase. The survey aimed to determine users' perceptions, opinions, feelings and motives regarding their participation in the wiki.

Observation was used in both the design phase and wiki experiment phase. Observation was used in an informal manner, but was useful for providing an extra layer of detail. Observation allowed subjective data to be elicited where there was otherwise no formal data collection. In the design phase observation was used to gain and present a general understanding of the process of writing a MediaWiki extension, which is otherwise poorly documented. In the wiki experiment phase, it allowed the researcher, as an active member of the wiki, to gain a better understanding of the dynamics of the community.

Returning to Preece's matrix of data and evaluation types above, this research covers each of the four categories, as shown in table 3.2.

Evaluation Type | Qualitative Data | Quantitative Data
Subjective | Observations during development and reviewing comments from survey data | Survey data, captured using Likert scales
Objective | Analysing comments from survey data | Studying wiki logs

Table 3.2 "data collection in this research grouped according to Preece's matrix of evaluation types"

Implementation of the wiki, including a rating mechanism, involves the use of systems development procedures. Nunamaker, Chen and Purdin (1991) explain that systems development can be used as a tool in research for developing an artefact to be studied, either as a proof-of-concept following a theory building exercise, or as the focus of an experiment, such as a naturalistic trial. This research uses a wiki system as the main tool in the experiment. This system and its implementation are detailed later in this chapter.

Research procedures

This research examines how a rating system might be used in a wiki environment, its effects on the community, and its usefulness as a measure of article quality. To achieve this, a wiki community was established and monitored, by implementing a wiki extended with a simple page rating mechanism. The rating mechanism allowed users to rate an article within the wiki, using a standard 5-star rating. A group of students enrolled in an undergraduate IT subject were invited to the wiki.

As part of their study, students were asked to undertake assignments encouraging typical wiki usage through the following guidelines:

  • write 1000 words (approximately 2 articles) or word equivalent (words could be split across more articles, by contributing to other partial articles)
  • rate other articles you read
  • be creative, let ideas flow, making students search and rate, and produce content

These activities were designed to encourage an organic (as per Cunningham's ideals, see 2.2.2.1) site, where students may otherwise not have been used to studying this way.

At the completion of the experiment, a follow-up survey was issued. This data was used to aid in the interpretation of the results from the wiki logs.

The following data was collected:

  • Article Content - The content created by users of the wiki. This includes any textual content, discussions, comments, and historic revisions of articles.
  • Ratings - As determined by users of the wiki through the embedded rating system.
  • Server logs - allowing user behaviours to be monitored, to see how users interact with the system, and to determine time spent using the system, including attention data.
  • Survey results - General demographics, users' computer self-efficacy, and user thoughts on the use of the wiki and rating system.
  • Observations throughout the development of the rating mechanism
  • Observations from the live wiki community
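As one illustration of how attention data might be derived from the server logs listed above, the sketch below estimates the time a user spent on a page as the gap between that request and the same user's next request. The record layout, the 30-minute cut-off, and the function name are assumptions made for this example; they are not taken from the wiki's actual logs or from the analysis used in this research.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical cut-off: longer gaps are treated as the user having left
# the site, so they are not counted as attention.
SESSION_TIMEOUT = timedelta(minutes=30)


def estimate_attention(requests):
    """Estimate seconds of attention per page.

    `requests` is an iterable of (user, timestamp, page) tuples, an assumed
    simplification of a web server log. Time spent on a page is taken to be
    the gap until the same user's next request.
    """
    by_user = defaultdict(list)
    for user, timestamp, page in requests:
        by_user[user].append((timestamp, page))

    attention = defaultdict(float)
    for visits in by_user.values():
        visits.sort()
        # Pair each request with the one that follows it.
        for (start, page), (end, _next_page) in zip(visits, visits[1:]):
            gap = end - start
            if gap <= SESSION_TIMEOUT:
                attention[page] += gap.total_seconds()
    return dict(attention)


log = [
    ("alice", datetime(2006, 9, 1, 10, 0, 0), "Main_Page"),
    ("alice", datetime(2006, 9, 1, 10, 2, 0), "Game_Theory"),
    ("alice", datetime(2006, 9, 1, 10, 7, 0), "Main_Page"),
    ("bob", datetime(2006, 9, 1, 11, 0, 0), "Game_Theory"),
    ("bob", datetime(2006, 9, 1, 13, 0, 0), "Main_Page"),
]
print(estimate_attention(log))
```

A limitation of this approach is that the final request of each visit has no following request, so its duration cannot be estimated this way.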

Once data was collected, articles were analysed to determine an objective measure of quality. These measurements provided a baseline measure of article quality, for comparison with system-generated ratings.

Ethical considerations

This research involved human participants contributing to a wiki site, and the analysis of their contributions and actions when using that wiki. This creates the possibility that participants could be adversely affected by the research. That possibility was kept to an absolute minimum.

Study of this wiki was done with permission from the server administrator and the CSU Ethics Committee. Students from the ITC213 subject were required to participate in this wiki as part of their class work. The requirements of this participation were kept general and in keeping with the subject objectives, keeping disruption to a minimum. No requirement was made for students to use their real identity within the system (although their identity within the system had to be revealed to the subject assessors for marking). No personally identifiable information within the system was used for this research, and all data was de-identified before analysis. No identifying data or large samples of raw data were released as part of this research.

Design

To facilitate the data collection, a live wiki server was implemented. The wiki system chosen was MediaWiki, for several reasons. MediaWiki is one of the more common wiki systems in use. It is well known by many Internet users, and provided the features necessary to track student work so that a lecturer could assess it. MediaWiki runs on a standard Apache/PHP/MySQL stack, a common and trusted set of applications installed on many web servers by default. Netcraft Ltd. (2006) reports Apache to have about 60% market share, and Seguy (2006) reports PHP to have about 35% market share in September 2006. The MediaWiki installation is an easy automated process that detects its environment and configures itself accordingly (shown in figure 3.2). Due to its popularity, and despite the poor state of the official documentation, there is an abundance of support and development information available, which is useful for setting up and maintaining the engine. The MediaWiki engine is also designed with extension in mind: it supports a growing set of hooks, which provide a convenient interface for adding functionality to the engine.

Figure 3.2 "A screenshot of the installation of MediaWiki 1.7.1."

MediaWiki Extension

A PHP extension was written and applied to the MediaWiki installation to add a page rating mechanism.

This rating mechanism added a set of five stars to each page in the User, Project, Image, Template, Help, Category and Main namespaces, as well as their associated talk namespaces (see 2.2.3.1). The stars allowed users, with a single click, to rate a page from one to five. This rating was stored and averaged by the system, and presented to the user both by colouring the stars to show the rating, and as a three-digit numeric rating. A minimum of three ratings from different users was required before a rating was shown. Until this number of ratings was reached, the rating stars and text were highlighted with the intention of drawing the user's attention to rating the page.
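The display rule described above can be sketched as follows. This is a minimal Python illustration, not the PHP extension itself; the function name, the mapping of user IDs to ratings, and the choice of two decimal places for the "three-digit" rating are assumptions made for the example.

```python
MIN_RATINGS = 3  # ratings from different users required before display


def displayed_rating(ratings_by_user):
    """Return the average rating as a three-digit string, or None.

    `ratings_by_user` maps a user ID to that user's rating (1 to 5), so
    each user counts only once towards the minimum.
    """
    for value in ratings_by_user.values():
        if not 1 <= value <= 5:
            raise ValueError("ratings must be between 1 and 5")
    if len(ratings_by_user) < MIN_RATINGS:
        return None  # too few ratings: highlight the stars instead
    average = sum(ratings_by_user.values()) / len(ratings_by_user)
    return f"{average:.2f}"


print(displayed_rating({"u1": 5, "u2": 4}))           # too few ratings
print(displayed_rating({"u1": 5, "u2": 4, "u3": 4}))  # enough ratings
```

Returning None rather than a number makes the "not yet rated" state explicit, so the caller can decide to highlight the stars.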

The extension is a single PHP file of about 600 lines, accompanied by a set of small images providing the stars for the rating mechanism. This PHP file was activated with the addition of a single "require" instruction added to the MediaWiki configuration file, instructing the software to load the extension whenever the software was executed.

The extension made several "hooks" into the MediaWiki software. These were three parser hooks, an OutputPageBeforeHTML hook, and a SkinTemplateSetupPageCss hook.

The SkinTemplateSetupPageCss hook instructed MediaWiki to add CSS code to its HTML output. The CSS added graphical formatting instructions for the rating mechanism. The OutputPageBeforeHTML hook instructed MediaWiki to call upon the extension after any page was generated, but before it was returned to the user. This allowed the extension to add the rating mechanism to the top of the page.

The three parser hooks informed MediaWiki that three new tags should be allowed in wikitext, and that when found, the extension should be called to process them. The three tags allow dynamic content to be added to a page: one shows the rating for a specified page, one shows the number of ratings for a specified page, and the third shows a list of pages and their ratings, sorted and filtered by several factors, and displayed using a variety of formats (see Appendix 1).

The stars, when clicked, instruct the browser to visit a page. Requesting this page causes the rating to be recorded for the page the user was viewing and clears the MediaWiki page cache for that page, allowing it to be updated with the new rating. The browser is then instructed to return to the updated page. The links triggered when the user clicks on a star carry the rel="nofollow" attribute. This was intended to stop spiders following these links (Google Inc. 2005), as doing so would trigger the rating mechanism, causing false ratings.
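To make the link markup concrete, the following Python sketch builds five star links carrying rel="nofollow". The query parameter names (rate_page, rate_value), the base URL, and the function name are hypothetical; the extension's real URLs are not given in this chapter.

```python
from html import escape
from urllib.parse import urlencode


def star_links(page_id, base_url="/index.php"):
    """Build five rating links for a page, each marked rel="nofollow".

    The parameter names and base URL are placeholders for illustration.
    """
    links = []
    for value in range(1, 6):
        query = urlencode({"rate_page": page_id, "rate_value": value})
        # Escape the URL so the ampersand is valid inside an attribute.
        href = escape(f"{base_url}?{query}", quote=True)
        links.append(
            f'<a href="{href}" rel="nofollow" title="Rate {value} of 5">'
            f"{value}</a>"
        )
    return links


for link in star_links(42):
    print(link)
```

Because a plain link triggers the rating when it is fetched, any client that requests the URL records a rating, which is why the links are marked for crawlers.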

The extension makes use of two database tables, ratings and ratings_cache. The ratings table stores each rating made within the system, using three fields: the ID of the page, the ID of the user making the rating, and the numeric rating itself. The ratings_cache table summarises the rating for each page revision by storing the page ID, the calculated page rating, and the total number of ratings made for that page. The record for a page is updated whenever a rating is made for that page. If a user rates a revision twice, the previous rating is discarded and replaced by the most recent rating. The ratings_cache table is not strictly necessary, but allows the system to provide prompt responses for tasks such as displaying the top ten ranked pages. This task would otherwise require the calculation of ratings for every page, an increasingly time-consuming task as the wiki grows.
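The two-table arrangement described above can be sketched with Python's built-in sqlite3 module, used here in place of MySQL only so that the example is self-contained. The column names are assumptions, and the cache is keyed on the page ID, following the fields listed above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE ratings (
    page_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    rating  INTEGER NOT NULL CHECK (rating BETWEEN 1 AND 5),
    PRIMARY KEY (page_id, user_id)
);
CREATE TABLE ratings_cache (
    page_id      INTEGER PRIMARY KEY,
    rating       REAL NOT NULL,
    rating_count INTEGER NOT NULL
);
"""


def record_rating(conn, page_id, user_id, rating):
    """Store a rating and refresh the cached summary for that page."""
    with conn:  # commit both statements together, or roll back on error
        # A repeat rating by the same user replaces the earlier one.
        conn.execute(
            "INSERT OR REPLACE INTO ratings (page_id, user_id, rating) "
            "VALUES (?, ?, ?)",
            (page_id, user_id, rating),
        )
        conn.execute(
            "INSERT OR REPLACE INTO ratings_cache "
            "(page_id, rating, rating_count) "
            "SELECT page_id, AVG(rating), COUNT(*) FROM ratings "
            "WHERE page_id = ? GROUP BY page_id",
            (page_id,),
        )


def top_pages(conn, limit=10):
    """Read the highest rated pages from the cache alone."""
    return conn.execute(
        "SELECT page_id, rating, rating_count FROM ratings_cache "
        "ORDER BY rating DESC, rating_count DESC LIMIT ?",
        (limit,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_rating(conn, page_id=1, user_id=10, rating=2)
record_rating(conn, page_id=1, user_id=10, rating=4)  # replaces the 2
record_rating(conn, page_id=1, user_id=11, rating=5)
record_rating(conn, page_id=2, user_id=10, rating=3)
print(top_pages(conn))
```

The listing query touches only ratings_cache, so its cost grows with the number of rated pages rather than with the number of individual ratings.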

Code in the extension is called in two ways. Normally, functions are called by the MediaWiki engine. However, to handle rating submissions and to return image and CSS files, code in the main scope of the program (not in any function) interrupted the normal execution of the MediaWiki engine.

Returning files is a fairly simple operation. If a request for a file is detected, the plug-in returns the file and ends the execution of the script before the MediaWiki engine can load.

Recording ratings, however, required access to the database. Although PHP provides standard functions for accessing a MySQL database, the obvious solution to this problem is to use the MediaWiki functions. Using these functions allows access to the standard set of queries typical for retrieving data from the MediaWiki database. The MediaWiki documentation, however, is not comprehensive enough to explain how the database components are loaded. To solve this problem, whenever database access was required, the entire MediaWiki engine was loaded, making available the required functions. This solution is inefficient, but sufficient given the small scale on which the extension is intended to be implemented.

Writing to the database using the manually loaded engine, however, uncovered another problem. The MediaWiki engine does not perform write queries until the end of the execution. Normally this would not cause a problem; however, because the MediaWiki engine was not run to completion, it never executed these queries. The poorly documented Database->immediateCommit() function was used to force the engine to flush data out to the database so that the script could be safely terminated.

Wiki

MediaWiki 1.7.1 was the version implemented (the most up-to-date stable version at the time of installation). It was configured for public access, and seeded with a set of pages explaining the purpose of the wiki, instructions for editing the wiki, and the set of assignment tasks to be completed by the students.

The standard installation script was run, and after providing the information required, the script initialised the database, and created a configuration file (see figure 3.2 for an example output). The configuration file was modified to allow users to upload files of any type (manual monitoring of the wiki ensured this was not abused), and to enable the rating mechanism.

The seed pages provided communal areas for communication and finding information relevant to the students' tasks. Certain key pages were omitted, with the intention that students should create these based on the instructions provided. Example pages were also added, as a guide to students when writing their own pages.

Over the course of the experiment, the wiki was monitored, and kept tidy by the researcher (who filled the "janitorial" role, see 2.2.2.4.2), following the guidelines set out by the Wikipedia Contributors (2006).

With no set due date for completion of POD exercises, contributions to the wiki were made at fairly regular intervals. It was observed that most students would post an entire article with a single edit. It seemed that students preferred to draft their writing in an off-line system (perhaps a word processor). There were, however, a few exceptions, where students posted a paragraph at a time. There was little modification to text after its original submission.

The exceptions to this contribution style were the POD Group pages, which students were specifically asked to use to collaborate with other group members.

... create an article for your POD group. Have each member add a short description of themselves, including your full name, a link to your user page, and whether you are a distance or internal student. Use this page to communicate with the world what your group is doing.
POD Exercise 1, see Appendix 3.1.1

These pages were not created when seeding the wiki; they were left for students to create. Much editing of these pages was observed, usually by several members of the group. These pages were frequently updated, often with minor presentation changes, with students following each other's leads and fixing each other's mistakes. It seemed these pages were perceived as common property, whereas where a single student posted their writing on a page of its own, other students would not venture to interfere.

POD Exercises

Participants in the wiki were students from the ITC213 undergraduate class for spring 2006. ITC213, or Computer Supported Collaborative Work, teaches a range of social and technical topics surrounding online communities. The class was volunteered by the lecturer and coordinator Ken Eustace. The subject is very hands-on, with students collaborating using various collaborative tools (Charles Sturt University 2006).

Students in the class are placed into Pools of Online Dialogue (POD) groups, and in those groups, complete fortnightly collaborative exercises (Eustace 2006). This research was designed to provide activities for the first two POD exercises.

The POD exercises were designed with three goals in mind:

  • Provide a suitable exercise for learning and assessment
  • Generate valuable content for the wiki
  • Promote typical wiki behaviour

A wiki does not generally have a set of tasks that each user completes; rather, there may be a to-do list that volunteers may complete, with no requirement to do so. For the participants of this research, however, the activities needed a compulsory element to ensure participation, and to facilitate the academic assessment of the students' work.

The wiki's long term plan is to be used by academics studying instructional gaming. The design of the exercises attempted to influence the content being created to provide useful springboards for research by academics. It encouraged students to think critically about how games related to theory in areas such as communication or social interaction.

The exercises attempted to impose, where possible, few limitations on the type of interaction within the wiki. This was to simulate as accurately as possible a wiki community, where there are generally no such limitations. Students were required to contribute 500 words for each of the two activities; however, the distribution of these 500 words was not limited. The words could be "spent" adding to existing articles, or pooled with a group of people to generate a larger article.

Survey

A follow-up survey was conducted after initial analysis of wiki data. This survey was designed to provide information useful in interpreting the wiki data, and answering any resulting questions. The survey was designed to achieve the following goals:

  • Determine general demographics
  • Determine computer self-efficacy of participants
  • Discover participants' previous experience with online tools
  • Determine what the participant has learned about wikis
  • Probe for factors affecting levels of participation in the wiki, including:
    • Difficulties with POD exercises, or online game
    • Difficulties submitting to the wiki, and perceptions of the wiki
  • Determine usefulness of rating mechanism:
    • When rating "friends" vs. "fiends"
    • In terms of self meta-moderation - or how confident participants are about their own rating submissions
    • What factors were more important in rating? Content, quality writing, or graphical appearance?
    • Did the rating mechanism make sense? Was it understandable and useful?

A full listing of the survey questions is provided in Appendix 2.

Conclusion

This chapter defined the methods used in this research, and detailed the design of the software and wiki, including observations made during those processes. The next chapter will present an analysis of the results collected using the methods presented here.

Bibliography

Charles Sturt University 2006, 2006 Undergraduate Handbook, electronic version, viewed 8 November 2006, <http://www.csu.edu.au/handbook/subjects/ITC213.html>.

Eustace, K 2006, ITC213 Subject Outline, electronic version, Charles Sturt University, viewed 9 November 2006, <http://ispg.csu.edu.au/subjects/cscw/pods/instructions>.

Google Inc. 2005, 'Preventing comment spam', last edited 18 January, viewed 19 October 2006, <http://googleblog.blogspot.com/2005/01/preventing-comment-spam.html>.

Myers, MD 1997, 'Qualitative Research in Information Systems', electronic version, MISQ Discovery, 20 May, pp.241-242, viewed 13 September 2006, <http://www.misq.org/discovery/MISQD_isworld/>.

Netcraft Ltd. 2006, 'October 2006 Web Server Survey', last edited 6 October, viewed 19 October 2006, <http://news.netcraft.com/archives/2006/10/06/october_2006_web_server_survey.html>.

Nunamaker, JF, Chen, M & Purdin, TDM 1991, 'Systems Development in Information Systems Research', Journal of Management Information Systems, vol 7 no 3, pp.89-106.

Preece, J 2000, Online Communities - Designing Usability, Supporting Sociability, John Wiley & Sons, West Sussex, England.

Seguy, D 2006, 'PHP statistics for August 2006', last edited 4 September, viewed 19 October 2006, <http://www.nexen.net/chiffres_cles/phpversion/php_statistics_for_august_2006.php#global>.

Straub, DW, Gefen, D & Boudreau, MC 2005, 'Quantitative Research', electronic version, Pries-Heje, D.AaJ. (ed.), pp.221-238, viewed 13 September 2006, <http://dstraub.cis.gsu.edu:88/quant/>.

Tanner, K 2002, Experimental research designs, in Research methods for students, academics and professionals: Information management and systems, Williamson, K (ed.), Centre for Information Studies, Charles Sturt University, Wagga Wagga, NSW, Australia.

Wikipedia Contributors 2006, 'Wikipedia:Don't bite the newcomers', last edited 7 November, viewed 9 November 2006, <http://en.wikipedia.org/w/index.php?title=Wikipedia:Please_do_not_bite_the_newcomers&oldid=86176581>.
