Where to Share Your Data?
by Ben Mazzotta
Thursday, July 16, 2009 at 06:29 PM EDT
There are many competing standards out there for how to publish datasets with due credit to the author and publisher. Rich, structured metadata and interoperable standards for data identification are rapidly developing, but it’s not clear which standard is going to win the day or which search engines will successfully organize all that data.
Here is my short wish list for data standards:
Once structured information about datasets is available in an open format, aggregators will step into the breach and serve that information to researchers, along with well-maintained links to the data publishers’ sites. The main barrier to aggregating and indexing information about datasets is the lack of common standards, not a dearth of universities and companies willing to do the job.
Academics seem to have one set of standards, and government websites another. Upstart companies (both for-profit and not-) are piling into this space. Their websites’ grandiose claims to universal data cataloging are completely at odds with the slim pickings you’ll find if you bother to visit.
Every discipline has a relatively small number of expert publishers that aggregate information for researchers. Access to the most important datasets may be open or closed, but the links are gathered together by professional societies. Academic departments and courses also have resource portals for students and practitioners.
Up until recently, the vast majority of datasets were housed on purpose-built websites. For a book, this would be the equivalent of having a single library for each book, or at best for each publisher. We need the equivalent of a library for online datasets. Even though publishers ultimately retain authorship of (and control over) their data, it is senseless that libraries’ search engines for datasets lag so far behind search engines for books and journal articles.
Bibliographic software (Endnote, BibTeX, Zotero) needs to catch up too. Common practice in academia is now to cite the datasets used in your writing if you are using any publicly available data as correlates for the primary observations in your study.
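Until bibliographic tools handle datasets well, one workaround is to write the citation by hand as a generic entry. The sketch below is only an illustration, not an established standard: it formats a dataset citation as a BibTeX @misc entry. The function name dataset_bibtex, the choice of fields, and the sample dataset (author, title, archive, and URL) are all hypothetical placeholders.

```python
def dataset_bibtex(key, author, title, publisher, year, url, accessed):
    """Format a dataset citation as a BibTeX @misc entry.

    The field selection is an assumption for illustration; adapt it to
    whatever your publisher or journal asks for.
    """
    fields = [
        ("author", author),
        ("title", title),
        ("howpublished", publisher),  # who hosts or publishes the data
        ("year", str(year)),
        ("url", url),                 # some older BibTeX styles ignore this field
        ("note", f"Dataset, accessed {accessed}"),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@misc{{{key},\n{body}\n}}"


# Hypothetical dataset, used only to show the shape of the output.
entry = dataset_bibtex(
    key="doe2009survey",
    author="Doe, Jane",
    title="Example Household Survey",
    publisher="Example University Data Archive",
    year=2009,
    url="https://example.org/data/survey",
    accessed="2009-07-16",
)
print(entry)
```

Recording the publisher and the access date alongside the author gives both parties due credit and tells readers which version of the data was used.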
This article originally appeared on Ben Mazzotta's Weblog.