An openwetware blog on the challenges of open and connected science

US trip Oct/Nov 07

Sourceforge for science

I got to meet Jeremiah Faith this morning and we had an excellent, wide-ranging discussion which I will try to capture in more detail later. However, I wanted to get down some thoughts we had at the end of the discussion. We were talking about how to publicise Open Notebook Science and generate more interest and activity around it. Jeremiah suggested the idea of a Sourceforge for science: a central clearing house somewhere on the web where projects could be described and people could opt in to contribute. There have been some ideas in this direction, such as Totally retrosynthetic, but I don’t think there has been a lot of uptake there.

This was all tied into the idea of making lab books findable and indexed in places where people might look for them. I have been taken with the way PostGenomic and ChemicalBlogSpace aggregate blogs, particularly blog posts on the peer-reviewed literature, and in the case of ChemicalBlogSpace aggregate comments on molecules, based on trawling for InChI keys (I think). So can we propose that one of (both of?) these sites start aggregating online notebook posts? If we could make these point at peer-reviewed papers online, it would also be possible to use a modified version of the Blue Obelisk Greasemonkey script that would pop up whenever you were looking at a paper for which there was raw data online.

It wouldn’t be necessary, or perhaps even advisable, to limit these to people strictly practising Open Notebook Science. People could put up data once a paper was published or after a delay. Perhaps we need not even require that all the raw data be put up. If the barriers are lowered, more people may do it. A range of appropriate tags (‘Partial Raw Data is available for this paper’, ‘Full raw data is available for this paper’, ‘Full raw data and associated data is available as an open notebook’) would distinguish between what people are making available. Data could be dropped anywhere online, and by aggregation it gains more visibility, encouraging people to move from making specific data available towards making all their data available.
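
To make the aggregation idea a little more concrete, here is a minimal Python sketch. It assumes each aggregated post carries the identifier of the paper it relates to and one of the availability tags above. The names `Post`, `index_by_paper` and `data_available`, and the idea of keying on a DOI, are hypothetical and only for illustration; nothing like this exists on either aggregator as far as I know.

```python
from collections import defaultdict
from dataclasses import dataclass

# Availability tags taken from the post; how they would be encoded is an assumption.
PARTIAL = "Partial Raw Data is available for this paper"
FULL = "Full raw data is available for this paper"
NOTEBOOK = "Full raw data and associated data is available as an open notebook"


@dataclass(frozen=True)
class Post:
    url: str       # where the data or notebook entry lives (hypothetical)
    paper_id: str  # identifier of the related paper, e.g. a DOI (hypothetical)
    tag: str       # one of the availability tags above


def index_by_paper(posts):
    """Group aggregated posts by the paper they point at."""
    index = defaultdict(list)
    for post in posts:
        index[post.paper_id].append(post)
    return dict(index)


def data_available(index, paper_id):
    """Return the distinct availability tags known for a paper (empty if none)."""
    return sorted({post.tag for post in index.get(paper_id, [])})
```

A browser script could then call something like `data_available(index, paper_id)` for the paper being viewed and pop up only when the result is non-empty.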

Any thoughts?

Discussion with OpenWetWare people

This morning I got to sit down with Bill Flanagan, Barry Canton, Austin Che, and Jason Kelly and throw some ideas around about electronic notebooks. This is an approximate summary of some of the points that came out of the discussion. It may be a bit of a brain dump, so I might re-edit later.

  1. Neither Wikis nor Blogs provide all the functionality required. Wikis are good at providing a framework within which to organise information, whereas blogs are good at logging information and presenting it in a journal format. Barry showed me a hack that he uses in his Wiki-based notebook that essentially provides a means of organising his lab book into experiments and projects but also provides a date-style view. In the Southampton system we would achieve this by creating categories for different experiments, and possibly independent blogs for different projects.
  2. Feature requests at Southampton have been driven largely by me, which means that the system is being shaped by the needs of the PI. At OpenWetWare the development has been driven by grad students, which means it has focussed on their issues. The question was raised of where the best place to ‘promote’ these systems is. Is it the PIs, who, at least at the moment, will get the greatest tangible benefits from the system? Or is it better to persuade grad students to take this up, as they are the end users? The two groups have very different needs.
  3. Development based on the needs of a single person is unlikely to take us forward as the needs of a specific person are probably not general enough to be useful. Development should focus on enabling the interactions between people, therefore the minimum size ‘user unit’ is two (PI plus researcher, or group of researchers).
  4. The biggest wins for these systems are where collaboration is required and is enabled by a shared space to work in. This is shown by the iGEM lab books and by uptake by my collaborators in the UK. This will be the best place to take development forward.

I need to add links to this post but will do so later.

Talks on Open Notebook Science - some initial thoughts

So I have given three talks in ten days or so: one at the CanSAS meeting at NIST, one at Drexel University and one at MIT last night. Jean-Claude Bradley was kind enough to help me record the talk at Drexel as a screencast and you can see this in various formats here. He has also made some comments on the talk on the UsefulChem Blog and Scientific Blogging site.

The talks at Drexel and MIT were interesting. I was expecting the questions to focus more on the issues of being open: the risks, benefits, and problems. In fact the questions focused on the technicalities, and in particular on people wanting to get under the hood and play with the underlying data. Several of the questions I was asked could be translated as ‘do you have an API?’. The answer at the moment is no, but we know it is a direction we need to go in.

We have two crucial things we need to address at the moment: the first is the issue of automating some of the posting. We believe this needs to be achieved through an application or script that sits outside the blog itself and that it can be linked to the process of actually labelling the stuff we make. The second issue is that of an API or web service that allows people to get at the underlying data in an automated fashion. This will be useful for us as we move towards doing analysis of our data as well. Jean-Claude said he was also looking at how to automate processes so clearly this is the next big step forward.
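
As a rough sketch of what the second item might look like, the Python below filters posts by their key/value metadata and serialises the result as JSON, which is roughly what a simple web service would hand back. This is an assumption about one possible design, not a description of anything we have built: the post structure (a dict with a `metadata` dict inside it) and the function names `query_posts` and `to_json` are hypothetical.

```python
import json


def query_posts(posts, **criteria):
    """Return posts whose metadata matches every key/value pair in criteria."""
    return [
        post
        for post in posts
        if all(post.get("metadata", {}).get(k) == v for k, v in criteria.items())
    ]


def to_json(posts):
    """Serialise the matching posts as the body of a JSON response."""
    return json.dumps(posts, indent=2, sort_keys=True)
```

For example, `to_json(query_posts(posts, type="sample"))` would return every post whose metadata has the hypothetical key `type` set to `sample`.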

Another question raised at MIT was how you could retro-fit our approach into an existing blog or wiki engine. One key issue here is templates (which are next on my list to describe here in detail), which would probably require some sort of plugin. The other issue is the metadata. Our blog engine goes one step beyond tagging by providing keys with values. Presumably this could be coded into a conventional engine using RDF or microformats - perhaps we should be doing this in our blog in any case?
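
To illustrate the RDF option, here is a minimal Python sketch that turns a post's keys and values into N-Triples lines, with the post URL as the subject, the key as the predicate and the value as a plain string literal. The namespace `http://example.org/labblog/terms/` and the function names are made up for the example; a real implementation would want an agreed vocabulary and probably an RDF library rather than hand-built strings.

```python
from urllib.parse import quote

# Hypothetical namespace for metadata keys; not a real vocabulary.
TERMS = "http://example.org/labblog/terms/"


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def post_to_ntriples(post_url, metadata):
    """Turn a post's key/value metadata into a list of N-Triples lines."""
    lines = []
    for key, value in sorted(metadata.items()):
        # Percent-encode the key so that spaces and the like are safe in an IRI.
        predicate = TERMS + quote(key, safe="")
        lines.append(f'<{post_url}> <{predicate}> "{escape_literal(str(value))}" .')
    return lines
```

The point of the sketch is only that a key with a value maps naturally onto a predicate with an object, which a plain tag does not.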

Incidentally a point I made in both talks, partly in response to the question ‘does anyone really look at it’, is that in many cases it is your own access you are enabling. Making it open means you can always get at your own data, which is a surprisingly helpful thing.

The CanSAS meeting was also interesting. This is traditionally a meeting where Small Angle Scattering instrument scientists, the people who maintain and support these instruments at large-scale neutron and X-ray facilities, fail to agree on a standard data format. I wanted to make two points: first, the general point that making data available is a good thing, and second, that making the instrument data available without a detailed description of the sample is pretty useless. However, against all precedent, they not only agreed a data format, but it is also a flexible XML format allowing different tags for different ‘dialects’. So I can insert a tag into the data file that will point to our lab book, which is what I wanted.
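
As a sketch of what inserting such a tag could look like, the Python below appends a child element holding a lab book URL to the root of an XML document using the standard library. The element name `labbook_link` and the example document are hypothetical; I am not reproducing the agreed format's actual element names here.

```python
import xml.etree.ElementTree as ET


def add_labbook_link(xml_text, labbook_url, tag="labbook_link"):
    """Append a child element carrying the lab book URL to the root element."""
    root = ET.fromstring(xml_text)
    link = ET.SubElement(root, tag)  # hypothetical 'dialect' tag
    link.text = labbook_url
    return ET.tostring(root, encoding="unicode")
```

Anyone reading the data file can then follow the link back to the sample description in the lab book, while software that does not know the tag can simply ignore it.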

Today I head off to talk to the OpenWetWare developers and the Simile group so that will be very interesting. More details as I have time to post.

The Soton Lab Blog Book US Tour

Given that most people reading this probably also read the UsefulChem Blog, I would guess that they have already figured out I am visiting the States. However, as I am now here and, due to jet lag, have a few hours to kill before breakfast, I thought I might detail the itinerary for anyone interested.