I wrote this essay for Bill Dutton and he’s kindly given me permission to post it here – it sets the scene nicely for the Blog. It’s based on the “New e-Science” presentation I gave back in September 2008, which gave a 10-point definition of “The New e-Science” (that’s the number of points, not the font size!)
The New e-Research
The success stories of e-Science have emerged in so-called ‘big science’, where researchers harness large scale computational and data resources to cope with the huge volumes of data arising from new experimental techniques and data capture technologies. But what about all the other researchers in our universities and R&D departments – the everyday researchers, working on their PhD or assisting other researchers, working in and across the diversity of disciplines? While e-Science provides a case study of advanced techniques in early adopter communities, other researchers have not been standing still. Here I define the ‘New e-Research’, a set of observations on the key trends which frame our future research environment.
Increasing Scale and Diversity of Participation and Digital Content
The single most influential driver for the New e-Research is the increasing scale and diversity of participation. The decreasing cost of entry into digital research means more people, data, tools and methods. Anyone can participate: researchers in labs, archaeologists at digs or schoolchildren designing drug molecules in Web browsers; they may be professionals or enthusiasts. This leads to talk of ‘citizen science’ and ‘crowdsourcing’. Everyone becomes a first class citizen: each can be a publisher as well as a consumer. It isn’t enough that people can participate – they are also incentivized to do so, by the benefits of working digitally and by the opportunity to play a role. Elitists may disapprove.
The scale of participation, together with spectacular data volumes from new research instruments, also brings increasing scale, diversity and complexity of digital content. Research increasingly involves discovering and combining sources, so we now have more to find and more to combine. Furthermore, digital research is generating new digital artefacts: workflows, data provenance records, ontologies, electronic lab books and interactive papers. Associated with research content we have attribution and licensing to support rights flow, and technical challenges in versioning, aggregation and security.
Context and provenance information is essential for data reuse, quality and trust. Not only is there more data of more kinds, but policy and practice are moving to make it more available: we are seeing open data, open access journals and increasing adoption of Creative Commons and Science Commons licences. The technologies of linked data (analogous to the linked documents of the Web) are now gaining traction to assist with discovery, integration, reuse and trust. With all this comes a very substantial digital curation challenge.
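The core idea of linked data mentioned above – statements expressed as subject–predicate–object triples whose terms are URIs, so that independently published datasets can be merged and traversed – can be sketched in a few lines of Python. All the URIs and helper names here are hypothetical, purely for illustration:

```python
# A minimal sketch of the linked-data idea: each statement is a
# (subject, predicate, object) triple, and every term is a URI,
# so two datasets published independently can still be combined.
# All URIs below are invented examples, not real vocabularies.

dataset_a = [
    ("http://example.org/person/jane",
     "http://example.org/vocab/authorOf",
     "http://example.org/paper/42"),
]

dataset_b = [
    ("http://example.org/paper/42",
     "http://example.org/vocab/usesInstrument",
     "http://example.org/instrument/sensor-net-7"),
]

# Because both datasets identify the paper by the same URI,
# merging them is just concatenation of triple lists.
graph = dataset_a + dataset_b

def objects(graph, subject, predicate):
    """Return every object linked from `subject` via `predicate`."""
    return [o for s, p, o in graph if s == subject and p == predicate]

# Follow links across the merged datasets: which instruments
# lie behind Jane's papers?
for paper in objects(graph, "http://example.org/person/jane",
                     "http://example.org/vocab/authorOf"):
    print(objects(graph, paper, "http://example.org/vocab/usesInstrument"))
```

The point of the sketch is the discovery-and-integration claim in the text: shared global identifiers make combining sources trivial, while the provenance of each triple (which dataset it came from) remains traceable.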
Supporting Wider and More Efficient Sharing
Our next defining characteristic is sharing – anyone can play, and they can play together. Research has always been a social process, in which researchers collaborate in order to compete, but now we’re using new social tools like wikis, blogs, instant messaging and social websites. The new knowledge lifecycle accelerates research and reduces time-to-experiment compared with traditional scholarly communications. Some of our new digital artefacts are not just pieces of data but rather they capture pieces of research process – like the details of a method, a workflow, an experimental plan or script. These digital representations of methods make research repeatable, reproducible and reusable. They facilitate automation, and sharing them is particularly powerful because it enables practice to propagate and reputations to grow.
The increased participation, content and sharing are building a social network of people and things. We see the network effects through ‘community intelligence’: tagging, reviewing, discussion, and recommendation based on usage. This is a powerful new force with which to tackle discovery of data and methods, and to rise to the curation challenge. We also see more and more services being made available for researchers to access remotely, which in itself is a mechanism for sharing capability and know-how – a form of distributed global collaboration.
Crucially, researchers are being empowered: increasing fluency with new tools and resources puts the researchers in control of their supporting tools, techniques and methods. Empowerment enables creativity and encourages bottom-up innovation in the practice of research, giving rise to new and potentially sharable methods. The techniques for automated processing, one of the great successes of e-Research, are empowering when they enable researchers to do what they’re good at and let the digital machinery deal with the drudgery – but not when they take away control or challenge autonomy. Researchers favour tools that make their work better or easier, rather than complex solutions that demand disproportionate effort to adopt or that constrain desired practice. This de facto principle of ‘better not perfect’ is sometimes at odds with the established mindset of software and service providers. A traditional software engineering approach of requirements capture and top-down design fails in this bottom-up world, where requirements can only be elicited by sharing the journey with the researchers.
The Future: Pervasive Deployment?
The computing infrastructure that supports research is evolving too. The 1990s’ instinct to harness remote computing resources is giving way to a new problem, which can be called ‘pervasive deployment’: making full use of the resources we have around us on an everyday basis. There is an increasingly rich intersection between the physical and digital worlds as we interact with the devices around us, from phones to laboratory equipment to sensor networks. Researchers at the desktop favour interaction through familiar Web-based interfaces as opposed to software supported by the enterprise. This is consistent with the shift towards cloud computing (Hey et al, this volume) and virtualization, whereby remote computers can trivially be summoned on demand. While big science attracts highly engineered infrastructure solutions delivered by specialists, it seems that for many researchers the Web provides a perfectly adequate distributed computing infrastructure for their work.
The future research environment is tremendously exciting, because we will do things that simply were not possible before. This isn’t just about acceleration, it’s about entirely new research outcomes. Lines will be redrawn as in silico practice evolves; entirely new methods and even disciplines will be created. The emerging pattern is an ‘open science’ world where data, methods and outputs are shared and reusable, where tools and services and the software that provides them are all shared. Though much of this was intended by the pioneers of e-Science, it may come about through the emergence of a bottom-up paradigm in which the ease of assembling the pieces will enable the New e-Research to flourish.