Drop and Compute

Did you know you can run remote computations from your Windows/Mac/Linux box without any special client software installed, just by dragging and dropping?  And it doesn’t even matter if your machine isn’t online all the time…

It’s a great idea from Ian Cottam at The University of Manchester, and it makes some powerful points.

The trick uses Dropbox, which is software that syncs your files across your computers. This is incredibly handy – as time goes on we all use more PCs, laptops (and indeed iPhones!) and Dropbox synchronises the contents of your Dropbox folder across all these for you. Note this is quite different from having some centralised filestore (or WebDAV drive) mounted on everything – it doesn’t need you to be online at the time of use and it doesn’t need a sysadmin to set it up. Dropbox is very easy to install and incredibly easy to use – there really is no need to read a manual and the benefits are immediate. (Other synchronising software exists, but Ian prefers the simplicity and ease of Dropbox.)

With the “Drop and Compute” model you just drag and drop your “job” into the appropriate Dropbox folder. Later Dropbox notifies you about new files and when you look you find the results.  This is a totally familiar interface for file and data management. Behind the scenes, the server spotted the job – via a simple monitoring script – and did its thing. To find out all the details of how Ian makes this work with Condor job submission, check out the Drop and Compute Wiki page for instructions and a video.
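To make the "simple monitoring script" idea concrete, here is a minimal sketch in Python of what the server side might do. It is not Ian's script (his is a Bash script, described on the wiki). The job file name `submit.txt`, the one-folder-per-job layout and the 30-second polling interval are all assumptions made for illustration; the only real external command used is `condor_submit`.

```python
import subprocess
import time
from pathlib import Path

SUBMIT_FILE = "submit.txt"  # hypothetical name for the Condor submit description


def find_new_jobs(drop_dir, seen):
    """Return job folders that contain a submit file and have not been handled yet."""
    jobs = []
    for entry in sorted(Path(drop_dir).iterdir()):
        if entry.is_dir() and entry.name not in seen and (entry / SUBMIT_FILE).is_file():
            jobs.append(entry)
    return jobs


def condor_submit(job_dir):
    """Hand the job to Condor; assumes condor_submit is on the PATH."""
    subprocess.run(["condor_submit", SUBMIT_FILE], cwd=job_dir, check=True)


def process_once(drop_dir, seen, submit=condor_submit):
    """Submit each new job exactly once and return the names that were submitted."""
    submitted = []
    for job in find_new_jobs(drop_dir, seen):
        submit(job)
        seen.add(job.name)
        submitted.append(job.name)
    return submitted


def watch(drop_dir, interval=30):
    """Poll the synced folder forever, submitting jobs as they appear."""
    seen = set()
    while True:
        process_once(drop_dir, seen)
        time.sleep(interval)
```

Calling `watch("/path/to/Dropbox/jobs")` on the server would start the loop. A real script would also need to cope with a folder that has only partly synced when the submit file appears, which this sketch ignores.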

Couldn’t we have done this before?  Yes, but nowhere near as easily for all concerned. If we say to researchers “mount this network drive and put all your files there on every machine you use” then we are creating an extra burden and perhaps worsening a version control problem, and obliging them to be online to use it.  However, if we say “you can use Dropbox to keep your files in sync between all your machines – and your iPhone too” then the user has an immediate benefit for next to no extra work. Dropbox is a solution that makes things simpler while other solutions make things more complicated – this is the only acceptable direction!

I think there’s some interesting psychology involved here too. Asking people to put their files somewhere central stops them feeling that the files are still their own, whereas syncing them across personal machines keeps them close and personal. Of course, behind the scenes, Dropbox is indeed putting them somewhere central, but that’s an implementation detail (and has the benefit that you can also manage them from a Web browser).  Fundamentally, the user model is empowering rather than disempowering.

Is there a catch?  Just one little issue at the moment – if Dropbox stores your data in a different territory then that data may be subject to different laws, the best known example being the Patriot Act in the US.

The Condor example has exercised the model well, and in principle this approach could be used for any remote processing. But best of all it’s an example of understanding that ease of use really matters: above all, it’s a solution which actually makes people’s lives simpler.

  1. Ian Cottam

    Thanks for blogging about DropAndCompute.

    To clarify the bit about the “Patriot Act”….
    Yes, currently Dropbox shares pass through Amazon S3 servers in the USA.
    On the positive side, as soon as you drag your folder of inputs/outputs out of the share, it is gone from the US based server. Data on the servers is also encrypted.

    On the negative side: the encryption key is owned by Dropbox. It is possible to do user-side encryption oneself, but this is a little tricky because the result files are generated on the server side and would need encrypting too. Also, deleted data on the server can be undeleted. This used to be unlimited versions forever; now one has the choice of not keeping deleted files, although I think they still hang around for about 30 days.

    In the long run we need one of two things:
    a) Amazon is moving to keep European data in Europe, in its Dublin data centre. We need to encourage Dropbox to support this.
    b) Dropbox could support user-side encryption.

    On the Dropbox web site
    http://www.dropbox.com
    there is a Votebox area. All us Europeans should vote for the above!
    -Ian


  2. Anna Croft

    Ian, this is brilliant. We had been using a similar idea to manually shuffle files between our compute machines, but this is ace. Do you know if we could convince NGS or another ‘public’ grid to adopt it?


  3. David Wallom

    Hi Dave/Anna/Ian

    The NGS is currently looking into this, or something like it built on the same idea. We do have a fairly significant issue with the location of the resulting data, since sizeable user communities won’t have a clear understanding of the implications of things such as the Patriot Act.

    What would be interesting, for example, is if the same simple idea as Dropbox could be used by large data repositories such as UKRDS and the BBSRC data store. The key here is a standard interface that the simple tools can connect to.

    Regards

    David

    Technical Director, UK NGS


  4. Ian Cottam

    David Wallom knows about it. You could lobby. -Ian


  5. Ian Cottam

    I just emailed the founders of Dropbox about this, so will report back if they reply.
    Perhaps we will have to use Amazon S3 directly, but it is nowhere near as convenient as Dropbox.
    -Ian


  6. Ian Cottam

    A Dropbox founder replied to say that for mainly technical reasons they are currently constrained to use US data centres. However, in the future, as they start to look at supporting organisations and businesses, things could change.
    -Ian


  7. Ian Cottam

    A modified Bash Script and Process including full, user-side hard encryption

    I haven’t written or tested this, but see if you think it sounds good:

    1. User submits an encrypted zip file. Encryption method should be e.g. AES256 and the key needs to be known to the DropAndCompute script.

    2. When the receiving Bash script spots it, it copies it to a secondary, non-Dropbox-synced folder before decrypting, unzipping and submitting the job.

    3. If the user drags a *.debug file to the shared folder, it is copied over to the secondary folder where it is actioned and the debug results copied back to the share for the user to see. This step requires no encryption.

    4. If the user drags a *.kill file to the shared folder, it is copied over to the secondary folder where it is actioned and the results (just a log file) copied back to the share for the user to see. The secondary submission (unencrypted) folder is deleted. This step requires no encryption.

    5. The Bash script has to monitor the log file or the queue to detect when the job has finished. When it has, the secondary folder — now containing the result files too — is encrypted, zipped and copied back to the share for the user to see. The secondary folder is deleted.

    The main disadvantage over the clear, unencrypted form would be that the user does not see outputs as they are generated; however, this seems minor.

    -Ian
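The staging and return steps in the proposal above (steps 2 and 5) can be sketched roughly as follows. This is an illustration in Python, not the Bash script being discussed, which by the comment's own account had not been written. The `.zip.enc` suffix, the function names and the `.results.zip.enc` output name are hypothetical. The cipher is deliberately left as a pair of caller-supplied `encrypt` and `decrypt` functions, since the Python standard library has no AES implementation; a real version would plug in AES-256 from a vetted library.

```python
import shutil
import zipfile
from pathlib import Path

ENC_SUFFIX = ".zip.enc"  # hypothetical suffix for an encrypted job archive


def classify(name):
    """Map a dropped file name to the action described in the numbered steps."""
    if name.endswith(".debug"):
        return "debug"
    if name.endswith(".kill"):
        return "kill"
    if name.endswith(ENC_SUFFIX):
        return "job"
    return "ignore"


def stage_job(dropped, work_root, decrypt):
    """Step 2: copy the archive out of the share, then decrypt and unzip it there."""
    dropped = Path(dropped)
    job_dir = Path(work_root) / dropped.name[: -len(ENC_SUFFIX)]
    job_dir.mkdir(parents=True)
    local_copy = job_dir / dropped.name
    shutil.copy2(dropped, local_copy)
    plain_zip = job_dir / "job.zip"
    plain_zip.write_bytes(decrypt(local_copy.read_bytes()))
    with zipfile.ZipFile(plain_zip) as zf:
        zf.extractall(job_dir)
    # Remove the archives so only the job's own files remain in the folder.
    local_copy.unlink()
    plain_zip.unlink()
    return job_dir


def return_results(job_dir, share, encrypt):
    """Step 5: zip and encrypt the secondary folder, copy it to the share, delete it."""
    job_dir = Path(job_dir)
    # The archive is written next to job_dir, so it does not include itself.
    archive = Path(shutil.make_archive(str(job_dir), "zip", root_dir=job_dir))
    out = Path(share) / (job_dir.name + ".results" + ENC_SUFFIX)
    out.write_bytes(encrypt(archive.read_bytes()))
    archive.unlink()
    shutil.rmtree(job_dir)
    return out
```

The point of the layout is that plaintext only ever exists under `work_root`, which sits outside the synced share, so nothing unencrypted is uploaded. Job submission itself, the `*.debug` and `*.kill` actions, and detecting that the job has finished are left out of this sketch.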


  8. Ian Cottam

    There is an even simpler solution for the user-side encryption some people desire: use EncFS (Google for it). An encrypted file system is kept within the shared Dropbox folder. EncFS keeps another folder, outside of the Dropbox area, where you can see the unencrypted version of your files. As the unencrypted files never pass through the Dropbox servers, one is safe from prying eyes.

    Unfortunately, it only works with Macs and Linux boxes (not Windows).
    -Ian
