As science progresses we witness evolution and revolution in scientific understanding. Coupled intricately with this is an evolution and revolution in scientific techniques and methods – the problem-solving ability of science advances as new methods lead to new understanding, and this in turn leads to new methods…

In the last 10 years science has experienced a step change in problem-solving ability brought about by increasing digitisation, automation and participation. This has in part been achieved through the engagement between scientists and the providers of the advanced information, computational and software techniques that they need. That’s what we call e-Science.

Is it done now?  Can we put the “e-” in the history books and just get on with Science?

In a thinktank some months ago about a future strategy for e-Science I argued for “the next level”. e-Science has been successful and, in a properly Darwinian way, is bedding down in our research institutions.  But I don’t think we’re done yet – in fact we’re just beginning! What happens next is that we assemble the pieces in order to do entirely new things.  That is the next level.

This week I’ve seen a great example of the next-level in action. The music information retrieval community has been developing tools to extract features from digital recordings of music. It’s hard but this multidisciplinary community is flourishing and making great progress. For example, here in Tokyo I’ve just seen (or rather heard…) the vocal extracted from a pop song to generate two files – one with just the vocal, one without.  It works incredibly well! Check out the International Society for Music Information Retrieval (ISMIR) and the Music Information Retrieval Evaluation eXchange (MIREX).

And now we’re seeing the research move up a level – having extracted the features, they can be used for the next tier of work; having mastered the computational infrastructure we have it ready to support this. So, for example, you can extract some features and then use autocorrelation to find repeating patterns and figure out the musical structure – and move from signal processing into musicology. To do this involves remote processing, smart algorithms and stonking amounts of computation. And it’s a community effort (for content, algorithms and ground truth).
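To make the autocorrelation idea concrete, here is a minimal sketch in Python. It is not the method used by any particular MIR tool: the per-beat `feature` values are made up for illustration, and the function names `autocorrelation` and `strongest_repeat` are my own. It simply assumes that a feature has already been extracted as a list of numbers, and looks for the lag at which the sequence best matches a shifted copy of itself.

```python
def autocorrelation(values):
    """Return the normalised autocorrelation of a sequence for lags 0..n-1."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    energy = sum(c * c for c in centred)
    if energy == 0:
        # A constant sequence has no structure to correlate.
        return [1.0] + [0.0] * (n - 1)
    return [
        sum(centred[i] * centred[i + lag] for i in range(n - lag)) / energy
        for lag in range(n)
    ]


def strongest_repeat(values, min_lag=1):
    """Return the lag (>= min_lag, up to half the length) with the highest autocorrelation."""
    acf = autocorrelation(values)
    max_lag = len(values) // 2
    return max(range(min_lag, max_lag + 1), key=lambda lag: acf[lag])


# Hypothetical per-beat feature (say, loudness) that repeats every 4 beats.
feature = [0.9, 0.2, 0.5, 0.1] * 8
print(strongest_repeat(feature))  # prints 4
```

Real recordings are far messier than this toy sequence, of course, which is where the smart algorithms and the stonking amounts of computation come in.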

Perhaps the job of work for the science-serving computer scientist is now the assembly rather than the pieces.  The music IR community is so ready for assembly – for example, the linked data cloud has substantial music content, thanks to MusicBrainz and the BBC activities. We’ll be exploring this joining-up in the Networked Environment for Music Analysis (NEMA) project.

I wonder what the science history books will say. “1st Generation e-Science 2000-2010 – Accelerated Research”, “2nd Generation e-Science 2010… – stuff that just wouldn’t have happened otherwise” :-)

DDeR, Tokyo

This month sees the publication of our article “Software Design for Empowering Scientists” in IEEE Software (IEEE Xplore; DOI). It was fun to write. Basically it tells the story of the principles that we used in delivering the Taverna scientific workflow workbench (56,000 downloads and counting) and how we applied them to the myExperiment virtual research environment. Which is interesting, because Taverna is software you install on your PC while myExperiment is a Web 2.0 site – quite different beasts.

I’m looking forward to seeing how this article goes down. Already it’s upset some computer scientists. I gave a lecture on it to my 3rd year class and there were protests that I was “saying the opposite to other lecturers”. It wasn’t just that I dared suggest that Web 2.0 isn’t hype but rather is a well-defined set of observed design patterns. Worse, I advocated the perpetual beta and – outrageously – I suggested that sometimes doing the specific is more important than the generic…

This goes against the established wisdom in Computer Science. We train our computer scientists and software engineers to elicit requirements, design, build, test, deploy. We teach abstraction, and install in them (sic) an imperative toward the generic. Which is fine in certain situations, but in e-research it can be a formula for delivering a solution to a problem our users didn’t know they had and perhaps never will.

Tell me if you disagree! :-)


The Open Science workshop at the Pacific Symposium on Biocomputing was held on Monday 5th January 2009 on Big Island, Hawaii. The workshop was organised by Cameron Neylon and Shirley Wu, and is part of a series of open science workshops being organised across a variety of events in different communities. The workshop materials are available (the links in this summary are to slideshare).

Cameron introduced the workshop with some excellent slides. He didn’t dwell on the definition of Open Science lest this become an obstacle (he sums it up informally as “more stuff, more available, more quickly”, and being about “Open Access, Open Data and Open Process”). Cameron doesn’t insist on what one might call “extreme open science”, but simply observes that more open means better research (eg sharing within labs, sharing across labs) and that those with the privilege of conducting publicly funded research have a duty to make it publicly accessible.

The keynote talk was by Phil Bourne of PLoS, on Open Science: one Person’s view and what we are doing about it, which set the scene in terms of open access and linking data to publications (Phil had some disturbing figures about the decay of supplemental materials). I presented The myExperiment approach to Open Science, emphasising sharing of process and method as exemplified by workflows and the principle of journeying with the users. Chang Feng Quo’s talk on Community annotation in translational bioinformatics: lessons from Wikipedia considered why the Wikipedia approach sits differently in the science context. Nigam Shah from NCBO presented How bio-ontologies enable Open Science including the Open-Biomedical Annotator Web Service. In her talk on Measuring the adoption of Open Science Data, Heather Piwowar provided results of a study into data sharing which gives a really useful evidence base, and pointed out the need to find ways of measuring the adoption of open science (we need quantitative data to convince anyone of anything!)  It always helps to have a critical friend in the room and this role was filled excellently by Steven Brenner in his talk Challenges for Open Science (thoughts from a skeptical supporter). Online you will also find Kaitlin Thaney’s slides (Science Commons) on Laying out the principles of Open Science.

The panel discussion involved the presenters (with Carole Goble in the myExperiment seat this time) and a contribution from Drew Endy (now at Stanford, previously at MIT where OpenWetWare began as the Wiki for his lab). Discussion included “build and they won’t come”, with Carole explaining myExperiment’s solution of working closely with users, the use of user advocates and a willingness to pay experts to bootstrap content.  There was also a discussion of persistence and sustainability, with OpenWetWare as an example, and on the need somehow to cite tools. At the end Cameron asked what is needed to go forward – success stories to convince people of the approach was one of the answers (celebration of success and reflection on failure!)

It was a great workshop – the talks were useful and insightful (not commenting on my own of course!) and showed an appreciation of the social and cultural context, not just technical. Thanks to Cameron and Shirley for organising it, and I recommend that people take a look and build on the success of this event to take open science forward.


I wrote this essay for Bill Dutton and he’s kindly given me permission to post it here – it sets the scene nicely for the Blog. It’s based on the “New e-Science” presentation I gave back in September 2008, which gave a 10-point definition of “The New e-Science” (that’s the number of points, not the font size!)

The New e-Research

The success stories of e-Science have emerged in so-called ‘big science’, where researchers harness large scale computational and data resources to cope with the huge volumes of data arising from new experimental techniques and data capture technologies. But what about all the other researchers in our universities and R&D departments – the everyday researchers, working on their PhD or assisting other researchers, working in and across the diversity of disciplines? While e-Science provides a case study of advanced techniques in early adopter communities, other researchers have not been standing still. Here I define the ‘New e-Research’, a set of observations on the key trends which frame our future research environment.

Increasing Scale and Diversity of Participation and Digital Content

The single most influential driver for the New e-Research is the increasing scale and diversity of participation. The decreasing cost of entry into digital research means more people, data, tools and methods. Anyone can participate: researchers in labs, archaeologists at digs or schoolchildren designing drug molecules in Web browsers; they may be professionals or enthusiasts. This leads to talk of ‘citizen science’ and ‘crowdsourcing’. Everyone becomes a first class citizen: each can be a publisher as well as a consumer. It isn’t enough that people can participate – they are also incentivized to do so, by the benefits of working digitally and by the opportunity to play a role. Elitists may disapprove.

The scale of participation, together with spectacular data volumes from new research instruments, also brings increasing scale, diversity and complexity of digital content. Research increasingly involves discovering and combining sources, so we now have more to find and more to combine. Furthermore, digital research is generating new digital artefacts: workflows, data provenance records, ontologies, electronic lab books and interactive papers. Associated with research content we have attribution and licensing to support rights flow, and technical challenges in versioning, aggregation and security.

Context and provenance information is essential for data reuse, quality and trust. Not only is there more data of more kinds, but policy and practice are moving to make it more available: we are seeing open data, open access journals and increasing adoption of Creative Commons and Science Commons licences. The technologies of linked data (analogous to the linked documents of the Web) are now gaining traction to assist with discovery, integration, reuse and trust. With all this comes a very substantial digital curation challenge.

Supporting Wider and More Efficient Sharing

Our next defining characteristic is sharing – anyone can play, and they can play together. Research has always been a social process, in which researchers collaborate in order to compete, but now we’re using new social tools like wikis, blogs, instant messaging and social websites. The new knowledge lifecycle accelerates research and reduces time-to-experiment compared with traditional scholarly communications. Some of our new digital artefacts are not just pieces of data but rather they capture pieces of research process – like the details of a method, a workflow, an experimental plan or script. These digital representations of methods make research repeatable, reproducible and reusable. They facilitate automation, and sharing them is particularly powerful because it enables practice to propagate and reputations to grow.

Empowering Researchers

The increased participation, content and sharing is building a social network of people and things. We see the network effects through ‘community intelligence’: tagging, reviewing, discussion, and recommendation based on usage. This is a powerful new force with which to tackle discovery of data and methods, and rise to the curation challenge. We also see more and more services being made available for researchers to access remotely, which in itself is a mechanism for sharing capability and know-how – a form of distributed global collaboration.

Crucially, researchers are being empowered: increasing fluency with new tools and resources puts the researchers in control of their supporting tools, techniques and methods. Empowerment enables creativity and encourages bottom-up innovation in the practice of research, giving rise to new and potentially sharable methods. The techniques for automated processing, one of the great successes of e-Research, are empowering when they enable researchers to do what they’re good at and let the digital machinery deal with the drudgery – but not when they take away control or challenge autonomy. Researchers favour tools that make their work better or easier, rather than complex solutions that demand disproportionate effort to adopt or that constrain desired practice. This de facto principle of ‘better not perfect’ is sometimes at odds with the established mindset of software and service providers. A traditional software engineering approach of requirements capture and top-down design fails in this bottom-up world, where requirements can only be elicited by sharing the journey with the researchers.

The Future: Pervasive Deployment?

The computing infrastructure that supports research is evolving too.  The 1990s’ instinct to harness remote computing resources is giving way to a new problem, which can be called ‘pervasive deployment’: making full use of the resources we have around us on an everyday basis.  There is an increasingly rich intersection between the physical and digital worlds as we interact with the devices around us, from phones to laboratory equipment to sensor networks. Researchers at the desktop favour interaction through familiar Web-based interfaces as opposed to software supported by the enterprise. This is consistent with the shift towards cloud computing (Hey et al, this volume) and virtualization, whereby remote computers can trivially be summoned on demand. While big science attracts highly engineered infrastructure solutions delivered by specialists, it seems that for many the Web provides a highly adequate distributed computing infrastructure for our work.

The future research environment is tremendously exciting, because we will do things that simply were not possible before. This isn’t just about acceleration, it’s about entirely new research outcomes. Lines will be redrawn as in silico practice evolves; entirely new methods and even disciplines will be created. The emerging pattern is an ‘open science’ world where data, methods and outputs are shared and reusable, where tools and services and the software that provides them are all shared. Though much of this was intended by pioneers of e-Science, it may come about through the emergence of a bottom-up paradigm where the simplicity of assembly of the pieces will enable the New e-Research to flourish.

It’s all Duncan’s fault. For years I’ve known he lives in that other dimension, the parallel universe that is the Blogosphere (across the void, bit like Dr Who). Then in September 2008 he responded to the challenge to get more professors blogging by mailing some of his colleagues. In my case, Duncan’s email pushed on an open door with creaky hinges…

Subsequent discussion with colleagues suggested several possible reasons not to write a personal blog: it takes time better spent doing far more important things, blogs are just vanity publishing for arrogant people with an inflated ego, and that blogging is strategically naive because information is power and should therefore be provided only on a need-to-know basis… :-)

But I want to. I lead a hectic (possibly crazy…) academic life where I get to work with experts in many disciplines – I get a unique, perhaps privileged, view of the world and it’s one I want to share. For example, when I’ve been in a good panel, there is information to be shared and debate to be continued too – time to blog. And from where I sit, not only do I get to see things but I get to see the connections between things – what better mechanism than a blog for communicating that interconnectedness? So for me it’s not ego, it’s duty and the appropriate tool.

And it’s part of my research – research is about connectedness and I want to understand how to achieve it. I see a compelling analogy between the informal communications of the great scientists of old – the “invisible college” communicating by letter and annotated book margin – and the emerging research practices of open science and Science 2.0.  I see the benefit in understanding how the scholarly knowledge cycle can evolve, especially in the context of the shift to digital and data-centric research.

And, number 3, I enjoy writing – and communicating is surely how we will achieve the vision of joined-up-research.

Join me in the experiment that is my blog.

Dave, on a flight to LA

Hello world!

New Year 2009 – my resolution is to blog!  Thanks to Duncan Hull, Jean-Claude Bradley and Cameron Neylon for leading the way, Bill Flanagan for making the way and HaikuGirl for proving that people from Bognor can blog!
