An openwetware blog on the challenges of open and connected science

communication

Telling stories…

On Tuesday I was able to sit in on a conversation that is regularly held within the Computer Science department at the University of Toronto that focuses broadly on what computer science can bring as a discipline and a skill set to the sciences more generally. The conversation is led by Steve Easterbrook so there is a focus on climate science but we also roamed much more widely than that.

A key question that was raised, one which many of us have been struggling with for some time, is how to describe and publish descriptions of the progress of research projects in a way that provides a route in for non-specialists. Blogs provide a great way to do this, either as a generic journal with more or less detail, or as an overlay over a more detailed open notebook. Jean-Claude Bradley’s UsefulChem blog is a great example of the latter, and the blogs of Rosie Redfield’s group an example of the former.

The conversation was interesting for me in that it pinned down the idea and necessity of creating a narrative. This contrasts with the kind of (largely incomprehensible) detail found in a notebook which is usually fragmented and often distributed. One of the things that researchers are quite poor at in my experience is actually recording the why of an experiment; the question of how it fits into the wider context. Again blogs are a great format for doing this but where is the motivation? Writing that narrative in any form is hard work, a classic example of work that “takes me away from the bench” so how can it be justified?

One reason is that it raises the profile of the research, always an important issue in today’s research environment. But this is more important to some people than to others. Another very valid reason is to take personal notes, to create a personal narrative of what you are doing and have done that you can return to and use as an index to your own work. In a later discussion with Alicia Grubb she mentioned that her supervisors insisted on her blogging about literature that she collected. Equally, taking notes in a bookmarking service could provide the same functionality. But understanding the context in which you bookmarked something is valuable. Brent Mombourquette, an undergraduate student, also demonstrated a nice Firefox plugin that he had developed which captures and displays browsing history as a directed graph, which is an interesting tool to think about in this context. I’ll write more about some of the fabulous student demos I saw later.
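As a rough illustration of what "browsing history as a directed graph" can mean, here is a minimal Python sketch. It is not how Brent's plugin works (I haven't seen the code); the function name, the (referrer, url) input format, and the page names are all assumptions made up for the example.

```python
from collections import defaultdict


def build_history_graph(visits):
    """Build a directed graph from (referrer, url) pairs.

    An edge A -> B means page B was opened from page A. A referrer of
    None marks a page that was opened directly (typed URL, bookmark).
    """
    graph = defaultdict(set)
    for referrer, url in visits:
        graph[url]  # touching the key makes every visited page a node
        if referrer is not None:
            graph[referrer].add(url)
    # Convert to plain lists so the result is easy to print or store
    return {page: sorted(targets) for page, targets in graph.items()}


# Hypothetical browsing session, invented for illustration
visits = [
    (None, "search"),
    ("search", "paper-a"),
    ("paper-a", "paper-b"),
    ("search", "blog-post"),
]
history = build_history_graph(visits)
```

Walking back along the edges from a bookmarked page is one way to recover the context in which you found it.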

For me though, the biggest benefit of making your research accessible is that it provides an entry route for new people to come in and help. The story of Galaxy Zoo shows how by placing a question within an understandable narrative you can enable people to come in and help out. No-one is going to come in from the outside and comment on my lab notebook unless they are already a specialist with a specific question. I’ve often thought I should start another blog to discuss more generally what is happening in my “real” research. Maybe this is the time to do that.

But you can also take this one step further. There was a debate in the comments on the RealClimate blog a few months ago about making the details of data and analysis publicly available. One real and valid concern was that denialists would dig into the detail and mis-represent problems or mistakes to advance their own agenda. Dealing with this kind of thing takes up valuable time, time the average researcher, particularly if they are committed to taking time to engage with a wider community, doesn’t have. My question was whether you could configure that public release of data and process in such a way that even those who are working against you are helping you. If people are searching your code for bugs then surely there must be a way of taking advantage of that?

The argument that releasing data costs you time sounds compelling when it comes from a researcher. But equally the same argument sounds dangerous when it is made by, for example, a government. As Steve said tongue in cheek, perhaps channeling some recently removed political leaders, “clearly we’re not going to release this data because it would take us time to deal with public complaints, and that will cost the taxpayer. In fact, we’re not even going to run the consultation because that would cost money. It’s a much cheaper and more effective use of your tax dollars if you just trust us to do the right thing.” The situations clearly aren’t exact parallels and resources for communication are much more limited in a research setting but it would be interesting to think about parallel cases in different domains, such as government and research, and how the domain affects the credibility of the argument. If you believe in the value of sunshine as disinfectant for government data then you need a strong case to argue the same doesn’t apply to research data.

But if you decide that you want to make that narrative public, or even better the narrative along with the underlying data, it does take work to make it comprehensible. As I’ve discovered recently such posts don’t translate easily into papers so making the argument that you can re-use the text doesn’t really work, at least for me. To make this worthwhile you either have to be required to do it (JISC in the UK basically requires that all funded projects have blogs) or you have to believe in and work towards the benefits it can bring you. In a sense this is actually just recapturing the idea of the research notebook that many historical scientists kept, and which makes such rich pickings for modern historians of science. Somewhere along the line we lost that. There are lots of tools around that can help you create that narrative, from clickstreams to records to environmental capture, but these remain only an aide memoire. Making the story is something that will probably remain a purely human activity. It is something that we seem highly evolved to do and it remains the most effective means of human to human communication. The computers can help, and they can provide the detail to dig down into if desired, but the story itself will remain ours alone for a while yet I think.

Conferences as Spam? Liveblogging science hits the mainstream

I am probably supposed to be writing up some weighty blog post on some issue of importance but this is much more fun. Last year’s International Conference on Intelligent Systems for Molecular Biology (ISMB) kicked off one of the first major live blogging exercises in a mainstream biology conference. It was so successful that the main instigators were invited to write up the exercise and the conference in a paper in PLoS Comp Biol. This year, the conference organizers, with significant work from Michael Kuhn and many others, have set up a Friendfeed room and publicised this from the off, with the idea of supporting a more “official”, or at least coordinated, process of disseminating the conference to the wider world. Many who could not attend in person, because of logistical or financial difficulties, have been waiting in anticipation for the live blogging to start.

However, there were also concerns. Many of the original ring leaders were not attending. With the usual suspects confined to their home computers would the general populace take up the challenge and provide the rich feed of information the world was craving? Things started well, then moved on rapidly as the room filled up. But the question as to whether it was sustainable was answered pretty effectively when the Friendfeed room went suddenly quiet. Fear gripped the microbloggers. Could the conference go on? Gradually the technorati figured out they could still post by VPNing to somewhere else. Friendfeed was blocking the IP corresponding to the conference wireless network. So much traffic was being generated it looked like spam! This has now been corrected, and normal service resumed, but in a funny and disturbing kind of way it seems to me like a watershed. There were enough people, and certainly not just the usual suspects, live blogging a scientific conference that the traffic looked like spam. Ladies and Gentlemen. Welcome to the mainstream.

Now that’s what I call social networking…

So there’s been a lot of antagonistic and cynical commentary about Web2.0 tools particularly focused on Twitter, but also encompassing Friendfeed and the whole range of tools that are of interest to me. Some of this is ill informed and some of it more thoughtful but the overall tenor of the comments is that “this is all about chattering up the back, not paying attention, and making a disruption” or at the very least that it is all trivial nonsense.

The counter argument for those of us who believe in these tools is that they offer a way of connecting with people, a means for the rapid and efficient organization of information, but above all, a way of connecting problems to the resources that can let us make things happen. The trouble has been that the best examples that we could point to were flashmobs, small scale conversations and collaborations, strangers meeting in a bar, the odd new connection made. But overall these are small things; indeed in most cases trivial things. Nothing that registers on the scale of “stuff that matters” to the powers that be.

That was two weeks ago. In the last couple of weeks I have seen a number of remarkable things happen and I wanted to talk about one of them here because I think it is instructive.

On Friday last week there was a meeting held in London to present and discuss the draft Digital Britain Report. This report, commissioned by the government, is intended to map out the needs of the UK in terms of digital infrastructure: physical, legal, and perhaps even social. The current tenor of the draft report is what you might expect, heavy on the need to put broadband everywhere, to get content to people, and heavy on the need to protect big media from the rising tide of piracy. Actually it’s not all that bad but many of the digerati felt that it is missing important points about what happens when consumers are also content producers and what that means for rights management as the asymmetry of production and consumption is broken but the asymmetry of power is not. Anyway, that’s not what’s important here.

What is important is that the sessions were webcast, a number of people were twittering from the physical audience, and a much larger number were watching and twittering from outside, aggregated around a hashtag #digitalbritain. There was reportage going on in real time from within the room and a wide-ranging conversation going on beyond the walls of the room. In this day and age nothing particularly remarkable there. It is still relatively unusual for the online audience to be bigger than the physical one for these kinds of events but certainly not unheard of.

Nor was it remarkable when Kathryn Corrick tweeted the suggestion that an unconference should be organized to respond to the forum (actually it was Bill Thomson who was first with the suggestion but I didn’t catch that one). People say “why don’t we do something?” all the time; usually in a bar. No, what was remarkable was what followed this as a group of relative strangers aggregated around an idea, developed and refined it, and then made it happen. One week later, on Friday evening, a website went live, with two scheduled events [1, 2], and at least two more to follow. There is an agreement with the people handling the Digital Britain report on the form an aggregated response should take. And there is the beginning of a plan as to how to aggregate the results of several meetings into that form. They want the response by 13 May.

Let’s rewind that. In a matter of hours a group of relative strangers, who met each other through something as intangible as a shared word, agreed on, and started to implement, a nationwide plan to gather the views of maybe a few hundred, perhaps a few thousand people, with the aim, and the expectation, of influencing government policy. Within a week there was a scalable framework for organizing the process of gathering the response (anyone can organize one of the meetings) and a process for pulling together a final report.

What made this possible? Essentially the range of low barrier communication, information, and aggregation tools that Web2.0 brings us.

  1. Twitter: without twitter the conversation could never have happened. Friendfeed never got a look in because that wasn’t where this specific community was. But much more than just twitter, the critical aspect was:
  2. The hashtag #digitalbritain: the hashtag became the central point of a conversation between people who didn’t know each other, weren’t following each other, and without that link would never have got in contact. As the conversation moved to discussing the idea of an unconference the hashtags morphed first to #digitalbritain #unconference (an intersection of ideas) and then to #dbuc09. In a sense it became serious when the hashtag was coined. The barrier to a group of sufficiently motivated people to identify each other was low.
  3. Online calendars: it was possible for me to identify specific dates when we might hold a meeting at my workplace in minutes because we have all of our rooms on an online calendar system. Had it been more complex I might not have bothered. As it was it was easy to identify possible dates. The barrier to organization was low.
  4. Free and easy online services: A Yahoo Group was set up very early and used as a mailing list. Wordpress.com provides a simple way of throwing up a website and giving specified people access to put up material. Eventbrite provides an easy method to manage numbers for the specific events. Sure, someone could have set these up for us on a private site but the almost zero barrier of these services makes it easy for anyone to do this.
  5. Energy and community: these services lead to low barriers, not zero barriers. There still has to be the motivation to carry it through. In this case Kathryn provided the majority of the energy and others chipped in along the way. Higher barriers could have put a stop to the whole thing, or perhaps stopped it going national, but there needs to be some motivation to get over the barriers that do remain. What was key was that a small group of people had sufficient energy to carry these through.
  6. Flexible working hours: none of this would be possible if the people who would be interested in attending such meetings couldn’t come on short notice. The ability of people to either arrange their own working schedule or to have the flexibility to take time out of work is crucial, otherwise no-one could come. Henry Gee had a marvelous riff on the economic benefits of flexible working just before the budget. The feasibility of our meetings is an example of the potential efficiency benefits that such flexibility could bring.

The common theme here is online services making it easy to aggregate the right people and the right information quickly, and to re-publish that information in a useful form. We will use similar services (blogs, wikis, online documents) to gather back the outputs from these meetings to push back into the policy making process. Will it make a big difference? Maybe not, but even in showing that this kind of response, this kind of community consultation, can be done effectively in a matter of days and weeks, I think we’re showing what a Digital Britain ought to be about.

What does this mean for science or research? I will come back to more research related examples over the next few weeks but one key point was that this happened because there was a pretty large audience watching the webcast and communicating around it. As I and others have recently argued, in research the community sizes probably aren’t big enough in most cases for these sorts of network effects to kick in effectively. Building up community quantity and quality will be the main challenge of the next 6-12 months but where the community exists and where the time is available we are starting to see rapid, agile, and bursty efforts in projects and particularly in preparing documents.

There is clearly a big challenge in taking this into the lab but there is a good reason why when I talk to my senior management about the resources I need that the keywords are “capacity” and “responsiveness”. Bursty work requires the capacity to be in place to resource it. In a lab this is difficult, but it is not impossible. It will probably require a reconfiguring of resource distribution to realize its potential. But if that potential can be demonstrated then the resources will almost certainly follow.

The failure of online communication tools

Coming from me that may sound a strange title, but while I am very positive about the potential for online tools to improve the way we communicate science, I sometimes despair about the irritating little barriers that constantly prevent us from starting to achieve what we might. Today I had a good example of that.

Currently I am in Sydney, a city where many old, and some not so old, friends live. I am a bit rushed for time so decided the best way to catch up was to propose a date, send out a broadcast message to all the relevant people, and then sort out the minor details of where and exactly when to meet up. Easy right? After all tools like Friendfeed and Facebook provide good broadcast functionality. Except of course, as many of these are old friends, they are not on Friendfeed. But that’s ok because many of them are on Facebook. Except some of them are not old friends, or are not people I have yet found on Facebook, but that’s ok, they’re on Friendfeed, so I just need to send two messages. Oh, except there are some people who aren’t on Facebook, so I need to email them - but they don’t all know each other so I shouldn’t send their email addresses in the clear. That’s ok, that’s what bcc is for. Oh, but this email address is about five years old…is it still correct?

So - I end up sending messages by three independent routes: one via Friendfeed, three via Facebook (one status message, one direct message, and another direct message to the person I found but hadn’t yet friended), and one via email (some unfortunate people got all three - and it turns out they have to do their laundry anyway). It almost came down to trying some old mobile numbers to send out texts. Twitter (which I don’t use very much) wouldn’t have helped either. But that’s not so bad - it only took me ten minutes to cut and paste and get them all sent. They seem to be getting through to people as well, which is good.

Except now I am getting back responses via email, via Facebook, and at some point via Friendfeed as well no doubt. All of which are inaccessible to me when I am out and about anyway because I’m not prepared to pay the swingeing rates for roaming data.

What should happen is that I have a collection of people, I choose to send them a message, whether private or broadcast, and they choose how to receive that message and how to prioritise it. They then reply to me, and I see all their responses nicely aggregated because they are all related to my one query. As this query was time dependent I would have prioritised responses so perhaps I would receive them by text or direct to my mobile in some other form. The point is that each person controls the way they receive information from different streams and is in control of the way they deal with it.
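To make that wish a little more concrete, here is a minimal Python sketch of the model I have in mind. It talks to no real service: the class names, channel names, and the "sydney-drinks" thread are all hypothetical, and actual delivery is left out entirely. The only point is that the recipient picks the channel and that replies are grouped by the query they answer, not by the service they arrived on.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    preferred_channel: str  # chosen by the recipient, e.g. "email" or "sms"


@dataclass
class Reply:
    thread_id: str  # identifies the query this reply answers
    sender: str
    channel: str    # whichever service the reply came back on
    text: str


def route(thread_id, text, recipients):
    """Return one delivery per recipient, on the channel each one chose."""
    return [(p.preferred_channel, p.name, thread_id, text) for p in recipients]


def aggregate(replies):
    """Group replies by query, regardless of the channel they arrived on."""
    threads = defaultdict(list)
    for reply in replies:
        threads[reply.thread_id].append(reply)
    return dict(threads)


# Hypothetical example: one query out, replies back over two channels
friends = [Person("Ann", "email"), Person("Bo", "facebook")]
deliveries = route("sydney-drinks", "Drinks on Thursday?", friends)
replies = [
    Reply("sydney-drinks", "Ann", "email", "Yes"),
    Reply("sydney-drinks", "Bo", "facebook", "Maybe"),
    Reply("other-query", "Ann", "email", "Unrelated"),
]
inbox = aggregate(replies)
```

The hard part is not this logic, of course, but getting each service to expose an API that lets a third party carry the thread identifier across it.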

It’s not just filter failure which is creating the impression of information overload. The tools we are using, their incompatibility, and the cost of transferring items from one stream to another are also contributing to the problem. The web is designed to be sticky because the web is designed to sell advertising. Every me-too site wants to hold its users and communities, so my community, the specific community that I want to meet up with for a drink, is split across multiple services. I don’t have a solution to the business model problem - I just want services with proper APIs that let other people build services that get all of my streams into one place. I hope someone comes up with a business model - but I also have to accept that maybe I just need to pay for it.

The distinction between recording and presenting - and what it means for an online lab notebook

Something that has been bothering me for quite some time fell into place for me in the last few weeks. I had always been slightly confused by my reaction to the fact that on UsefulChem Jean-Claude actively works to improve and polish the description of the experiments on the wiki. Indeed this is one of the reasons he uses a wiki, as the process of making modifications to posts on blogs is generally less convenient and in most cases there isn’t a robust record of the different versions. I have always felt uncomfortable about this because to me a lab book is about the record of what happened - including any mistakes in recording you make along the way. There is some more nebulous object (probably called a report) which aggregates and polishes the description of the experiments together.

Now this is fine, but the point is that the full history of a UsefulChem page is immediately available from the history. So the full record is very clearly there - it is just not what is displayed. In our system we tend to capture a warts and all view of what was recorded at the time and only correct typos or append comments or observations to a post. This tends not to be very human readable in most cases - to understand the point of what is going on you have to step above this to a higher level - one which we are arguably not very good at describing at the moment.

I had thought for a long time that this was a difference between our respective fields. The synthetic chemistry of UsefulChem lends itself to a slightly higher level description where the process of a chemical reaction is described in a fairly well defined, community accepted, style. Our biochemistry is more a set of multistep processes where each of those steps is quite stereotyped. In fact for us it is difficult to define where the ‘experiment’ begins and ends. This is at least partly true, but actually if you delve a little deeper and also have a look at Jean-Claude’s recent efforts to use a controlled vocabulary to describe the synthetic procedures a different view arises. Each line of one of these ‘machine readable’ descriptions actually maps very well onto each of our posts in the LaBLog. Something that maps on even better is the log that appears near the bottom of each UsefulChem page. What we are actually recording is rather similar. It is simply that Jean-Claude is presenting it at a different level of abstraction.

And that I think is the key. It is true that synthetic chemistry lends itself to a slightly different level of abstraction than biochemistry and molecular biology, but the key difference actually comes in motivation. Jean-Claude’s motivation from the beginning has been to make the research record fully available to other scientists; to present that information to potential users. My focus has always been on recording the process that occurs in the lab and in particular on capturing the connections between objects and data files. Hence we have adopted a fine grained approach that provides a good record, but does not necessarily make it easy for someone to follow the process through. On UsefulChem the ideal final product contains a clear description of how to repeat the experiment. On the LaBLog this will require tracking through several posts to pick up the thread.

This also plays into the discussion I had some months ago with Frank Gibson about the use of data models. There is a lot to be said for using a data model to present the description of an experiment. It provides all sorts of added value to have an agreed model of what these descriptions look like. However it is less clear to me that it provides a useful way of recording or capturing the research process as it happens, at least in a general case. Stream of consciousness recording of what has happened, rather than stopping halfway through to figure out how what you are doing fits into the data model, is what is required at the recording stage. One of the reasons people feel uncomfortable with electronic lab notebooks is that they feel they will lose the ability to scribble such ‘free form’ notes - the lack of any presuppositions about what the page should look like is one of the strengths of pen and paper.

However, once the record, or records, have been made then it is appropriate to pull these together and make sense of them - to present the description of an experiment in a structured and sensible fashion. This can of course be linked back to the primary records and specific data files but it provides a comprehensible and fine grained description of the rationale for and conduct of the experiment as well as placing the results in context. This ‘presentation layer’ is something that is missing from our LaBLog but could relatively easily be pulled together by writing up the methodology section for a report. This would be good for us and good for people coming into the system looking for specific information.
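A small Python sketch may help pin down the split between recording and presenting. This is not the LaBLog's data model or anything from UsefulChem; the Record class, its field names, and the sample notes are hypothetical. The records stay exactly as written at the time, and the presentation is generated separately, in order, with a pointer back to each primary record.

```python
from dataclasses import dataclass


@dataclass
class Record:
    post_id: str    # identifier of the primary record (hypothetical format)
    timestamp: str  # ISO 8601, so that string order matches time order
    note: str       # free-form text, left exactly as recorded


def present(title, rationale, records):
    """Assemble a readable summary that links back to the primary records."""
    lines = [title, "", "Rationale: " + rationale, "", "Steps:"]
    ordered = sorted(records, key=lambda r: r.timestamp)
    for number, record in enumerate(ordered, start=1):
        lines.append(f"{number}. {record.note} [record {record.post_id}]")
    return "\n".join(lines)


# Hypothetical records, deliberately supplied out of order
records = [
    Record("post-2", "2008-09-02T10:00", "Ran gel"),
    Record("post-1", "2008-09-01T09:00", "Prepared sample"),
]
summary = present("Example experiment", "Check sample purity", records)
```

Note that the rationale has to be supplied by a person. The ordering and linking can be automated; the why cannot, which is rather the point of the argument above.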

The long slow catchup…

I’m a little shell shocked really. I’ve spent the last couple of weeks running around like a lunatic, being at meetings, organising meetings, flying out to other meetings. And then flying back to try and catch up with all the things that need doing before the next flurry of activity strikes (which involves less travel and more experiments you will be pleased to know). There are two things I desperately need to write up.

The Open Science workshop at Southampton on September 1 seemed to be well received and was certainly interesting for me. Despite having a very diverse group of people we did seem to manage to have a sensible discussion that actually came to some conclusions. This was followed up by discussions with the web publishing group at Nature where some of these ideas were refined - more on this will follow!

Following on from this (and with a quick afternoon jaunt to Bristol for the Bristol Knowledge Unconference on the evening of September 5) I flew to Toronto en route to Waterloo for Science in the 21st Century, allowing for a brief stop for a Nature Network Toronto pub night panel session with Jen Dodd, Michael Nielsen, and Timo Hannay. The organisers of Science21, but in particular Sabine Hossenfelder, deserve huge congratulations for putting together one of the most diverse and exciting conferences I have ever been to. With speakers from historians to sociologists, hedge fund managers to writers, and even the odd academic scientist, the sheer breadth of material covered was quite breathtaking.

You can see most of the talks and associated material on the Perimeter Institute Seminar Archive page here. The friendfeed commentary is also available in the science21 room. Once again it was a great pleasure to meet people I kind of knew but hadn’t ever actually met such as Greg Wilson and John Dupuis as well as to meet new people including (but by no means limited to) Harry Collins, Paul Guinnessy, and David Kaiser. We have yet to establish whether I knew Jen Dodd in a previous life…

Very many ideas will come out of this meeting I think - and I have no doubt you will see some interesting blog posts from others with the science21 tag coming out over the next few weeks and months. A couple of particular things I will try to follow up on:

  • Harry Collins spoke about categorisations of tacit (i.e. non-communicated) knowledge and how these relate to different categories of expertise. This has obvious implications for our mission to describe our experiments to a level where there is ‘no insider information’. The idea that we may be able to rationally describe what we can and cannot expect to be able to communicate and that we can therefore concentrate on the things that we can is compelling.
  • Greg Wilson made a strong case for the fully supported experiment that echoed my own thoughts about the recording of data analysis procedures. He was focussed on computational science but I think his point goes much wider than that. This requires some thought and processing but for me it is clear that the big challenge in communicating the details of our experiments now clearly lies in communicating process rather than data.

Each of these deserves its own post and will hopefully get it. And I am also aware that I owe many of you comments, replies, or other things - some more urgent than others. I’ll be getting to them as soon as I can dig myself out from under this pile of……