
Encyclopedia of Life Curates Wikipedia’s Species Articles

Friday, December 10, 2010 at 06:55 AM EST

There are more than 1.9 million known species of animals, plants, and other forms of life on Earth. In May 2007, some of the world's leading scientists announced the development of the Encyclopedia of Life (EOL) to document them all. Inspired by biologist E. O. Wilson's TED Wish and supported by more than $25 million in funding, the project aggregates information about species from sources ranging from 19th-century journals to modern online databases, and makes that information accessible.

See the page about Solanum lycopersicum, the garden tomato, as an example. Much of its information comes from Solanaceae Source, a specialized source of name lists, species descriptions, specimen collections and publication lists for the genus Solanum. The Biodiversity Heritage Library provides historical public domain texts about the species from various published journals. Many other specialized and general resources contribute to the overall species page.

A Wikipedia article included in an Encyclopedia of Life species page. The yellow background indicates that no curator has reviewed the content yet.

You'll also find a "Wikipedia" entry in the table of contents. It reveals a copy of the Wikipedia article about tomatoes. As of this writing, the article text has a yellow background.

This means that an Encyclopedia of Life curator has not yet reviewed the content for inclusion in EOL. An EOL species page can have one or more curators who select and validate information added to EOL pages. Wikipedia articles, where they exist, are included by default.

Once a curator has validated the article, the yellow background is removed. EOL's pages on information for curators and on curation standards give additional background on the curation process, which applies to all content objects in EOL. Specific guidelines have been written for curating content from Wikipedia and Wikimedia Commons. We're particularly pleased that EOL encourages its curators to improve Wikipedia directly when they find errors or omissions.

So far, more than 200 Wikipedia articles have been reviewed through this process. Reviewers classify the information as follows:

  • ‘trusted’: reviewed by a curator and not deemed to contain substantially incorrect information
  • ‘untrusted’: reviewed by a curator and deemed to include incorrect or unverifiable information
  • ‘inappropriate’: reviewed by a curator and deemed ineligible for inclusion in EOL for other reasons (e.g. too short to add value)

EOL publishes all review information (who reviewed which article version, when, and with what outcome) through an Atom feed. This means that Wikipedians, and others, can easily use this information to develop new applications.
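For readers curious what consuming such a feed might look like, here is a minimal Python sketch using only the standard library. The feed layout is an assumption, not EOL's documented format: we pretend each Atom `<entry>` carries the article title in `<title>`, the reviewed revision as an `oldid` parameter in its `<link>`, the curator in `<author>`, the review time in `<updated>`, and the outcome as a `<category term>`. The names `Review`, `Verdict` and `parse_reviews` are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from enum import Enum
from urllib.parse import parse_qs, urlparse

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


class Verdict(Enum):
    """The three review outcomes used by EOL curators."""
    TRUSTED = "trusted"
    UNTRUSTED = "untrusted"
    INAPPROPRIATE = "inappropriate"


@dataclass(frozen=True)
class Review:
    title: str        # Wikipedia article title
    revision: int     # reviewed revision id (the "oldid")
    reviewer: str     # curator who did the review
    reviewed_at: str  # ISO 8601 timestamp, kept as text
    verdict: Verdict


def parse_reviews(atom_xml: str) -> list[Review]:
    """Parse a (hypothetically structured) Atom review feed into Review records."""
    root = ET.fromstring(atom_xml)
    reviews = []
    for entry in root.findall("atom:entry", ATOM_NS):
        # Assumption: each entry links to the reviewed revision via ?oldid=N
        href = entry.find("atom:link", ATOM_NS).get("href")
        oldid = parse_qs(urlparse(href).query)["oldid"][0]
        reviews.append(Review(
            title=entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            revision=int(oldid),
            reviewer=entry.findtext("atom:author/atom:name", default="", namespaces=ATOM_NS),
            reviewed_at=entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
            # Assumption: the outcome is stored as <category term="...">
            verdict=Verdict(entry.find("atom:category", ATOM_NS).get("term")),
        ))
    return reviews


# Made-up sample entry in the assumed layout.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example review feed</title>
  <entry>
    <title>Tomato</title>
    <link href="https://en.wikipedia.org/w/index.php?title=Tomato&amp;oldid=123"/>
    <author><name>Example Curator</name></author>
    <updated>2010-11-30T12:00:00Z</updated>
    <category term="trusted"/>
  </entry>
</feed>"""

for review in parse_reviews(SAMPLE):
    print(review.title, review.revision, review.verdict.value)  # Tomato 123 trusted
```

Because Atom is a standard format, any generic feed library or XML parser can read the data; only the mapping from entry fields to review details has to be agreed on.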

The book creator tool makes it possible to order a printed and bound book from any Wikipedia article selection. A custom cover can be chosen. Nautilus photograph by Lee Berger, Creative Commons Attribution/Share-Alike License.

A proof-of-concept for expert reviews

Magnus Manske is a biochemist and programmer at the Sanger Institute in the United Kingdom. He is also a long-time Wikimedia volunteer, and wrote the first version of the PHP software used by Wikipedia, which later became MediaWiki. As a scientist, Magnus has advocated for the scientific community to use and improve Wikipedia, most recently as co-author of the paper Ten Simple Rules for Editing Wikipedia.

I told Magnus about the new EOL review information, and suggested that we might explore using it to generate printed books or PDF collections of reviewed articles. The software for exporting Wikipedia articles into books already exists, so it was just a matter of putting two and two together.

So Magnus used the available data feed to build an automated tool that compiles all EOL-reviewed article versions into a list that Wikipedia's book tool can use.
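The core idea of such a tool can be sketched in a few lines of Python. This is not Magnus' code: it assumes the reviews have already been parsed into `(title, revision, verdict)` records, keeps only verdicts in a configurable `accepted` set (defaulting, as an assumption, to ‘trusted’), and pins each article to its newest accepted revision. The output syntax `:[{{fullurl:Title|oldid=N}} Title]` is our assumption of how a saved book page refers to a specific revision; `Review` and `book_lines` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    title: str     # Wikipedia article title
    revision: int  # reviewed revision id (the "oldid")
    verdict: str   # "trusted", "untrusted" or "inappropriate"


def book_lines(reviews, accepted=frozenset({"trusted"})):
    """Return one book-page line per article, pinned to its newest accepted revision.

    Assumption: only verdicts listed in `accepted` qualify for the book.
    """
    newest = {}
    for r in reviews:
        if r.verdict not in accepted:
            continue
        # Higher revision ids are newer on a given wiki.
        if r.revision > newest.get(r.title, -1):
            newest[r.title] = r.revision
    # Assumed syntax for linking a specific revision from a book page.
    return [
        f":[{{{{fullurl:{title}|oldid={rev}}}}} {title}]"
        for title, rev in sorted(newest.items())
    ]


reviews = [
    Review("Tomato", 100, "trusted"),
    Review("Tomato", 120, "trusted"),
    Review("Nautilus", 50, "untrusted"),
]
print("\n".join(book_lines(reviews)))  # :[{{fullurl:Tomato|oldid=120}} Tomato]
```

Pinning revisions rather than titles is what keeps later, unreviewed edits out of the book: the PDF or printed copy reflects the text a curator actually looked at.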

This makes it possible to download a PDF file, or order a printed book, containing only EOL-reviewed versions of Wikipedia species articles.

To try it out, visit the page for Magnus' example book. Click "Download PDF" to generate the (very large) PDF file that contains all the species articles, or "order printed book" to preview or order a printed book from PediaPress (which, as of this month, also offers books in color and hardcover format). If you want to remix or play with the book further, you can click "Open book creator".

We're very pleased with this first proof-of-concept, and are grateful to the Encyclopedia of Life team for engaging its community in the curation of Wikipedia articles. Both parties benefit: The Encyclopedia of Life enriches its species pages using the often well-developed Wikipedia content. Wikipedia benefits because EOL's trusted reviewers add their stamp of approval to Wikipedia articles, which helps Wikipedia readers and editors alike. Where EOL reviewers do not approve, they are encouraged to edit the Wikipedia article.

I asked Bob Corrigan, EOL Product Manager and Acting Deputy Director, to give his take on this project. He writes: "This is definitely a win-win partnership. EOL is focused on providing very deep, structured access to trusted biodiversity information from our network of content partners and curators, and vetted Wikipedia articles can be a terrific gateway to this information. We see a closer relationship with Wikimedia as an important way to expand access to global knowledge about life on Earth."

Hardcover book made from curated Wikipedia articles. Photo credit: Guillaume Paumier; Nautilus photograph by Lee Berger. Creative Commons Attribution/Share-Alike License 3.0

Example page from the book. Photo credit: Guillaume Paumier; Nautilus photograph by Lee Berger. Creative Commons Attribution/Share-Alike License 3.0

A replicable model

Magnus built his implementation with future extensibility in mind. If you'd like a closer technical look, check out his "Sifter-Books" script, which generates the book data and could support multiple partner institutions and organizations that provide article reviews. As of this writing, Magnus has already added reviews from two more groups: Rfam and Pfam, databases of RNA and protein families, respectively.
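One way such extensibility could work is to treat every reviewing partner as an interchangeable source. The Python sketch below is illustrative only and does not reflect how "Sifter-Books" is actually structured: each hypothetical source is a function returning `(title, revision, verdict)` tuples, and `collect` tags every review with its source name and groups the results by article title. The sample sources and articles are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical interface: a source returns (title, revision, verdict) tuples.
ReviewSource = Callable[[], Iterable[tuple[str, int, str]]]


@dataclass(frozen=True)
class Review:
    source: str    # which partner produced the review, e.g. "EOL"
    title: str     # Wikipedia article title
    revision: int  # reviewed revision id
    verdict: str   # partner-specific outcome, e.g. "trusted"


def collect(sources: dict[str, ReviewSource]) -> dict[str, list[Review]]:
    """Gather reviews from every registered partner, grouped by article title."""
    by_title: dict[str, list[Review]] = defaultdict(list)
    for source_name, fetch in sources.items():
        for title, revision, verdict in fetch():
            by_title[title].append(Review(source_name, title, revision, verdict))
    return dict(by_title)


# Adding a partner means registering one more fetch function (sample data is made up).
sources: dict[str, ReviewSource] = {
    "EOL": lambda: [("Tomato", 120, "trusted")],
    "Rfam": lambda: [("Transfer RNA", 900, "trusted")],
    "Pfam": lambda: [("Zinc finger", 400, "trusted")],
}
for title, reviews in collect(sources).items():
    print(title, [r.source for r in reviews])
```

Keeping the source name on each review means a book or a Wikipedia-side indicator can show not just that an article was reviewed, but by whom.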

Magnus has also written a small proof-of-concept script that makes the existence of reviews visible on Wikipedia itself. To use it, you need a user account on the English Wikipedia; then follow the installation instructions. Once the script is installed, a "Reviews" tab indicates which reviews are available for an article.

We look forward to exploring similar partnerships with subject-matter experts in institutions (like universities and libraries), scientific associations, and specialized knowledge communities. If you're interested in this model, drop me a note (erik at wikimedia dot org).

Erik Moeller
Deputy Director, Wikimedia Foundation
Representative of Wikimedia in the Encyclopedia of Life Institutional Council