Last week there was a flurry of comments around a post by Bret Taylor, "We need a Wikipedia for data". Taylor describes a model for a wiki that would aggregate common data in one database that could be cross-searched. Great idea.
One interesting thing about the types of datasets he mentions is that they are all copyrighted - stations own TV schedules, exchanges own market data (the free stuff is usually 20 minutes delayed) and a variety of companies own publishing rights over telephone numbers. This is the data that could be really useful if it were truly free, but given the amount of updating required, I wonder who would maintain it without a business or legislative imperative.
But that issue is perhaps beside the point. There are many, many incredible datasets out there, covering everything from Census data to older market information to astronomy. Reading the comments and suggestions on Taylor’s post and Read/Write Web’s post about the topic revealed dozens of sites where these resources can be found.
Looking through the list, I did feel that libraries may have missed an opportunity. We have been recommending and linking to various datasets on our websites for years, but there is huge potential to go beyond this and build something collaboratively that could be used as an input for different libraries. Many libraries now take Open Access journal records into their catalogues and search engines via DOAJ, and there is no reason not to do something similar for Open Data.
Certainly, it is an issue that few of these datasets can talk to each other - but perhaps the move towards a more standards-based Semantic Web will encourage standardisation and interoperability, at least within, for example, individual government departments, so that Census records can be analysed against education records.
One of the sites recommended by Read/Write Web is CKAN, which is backed by the Open Knowledge Foundation, an organisation that counts someone who has worked in the library sector among its leadership. Are these the types of groups more of us should be involved in if we want a role in information access on a larger scale?
Originally published on the semanticlibrary.net blog