Stop Spam. Read Books.
By Ben February 19th, 2008
In New technologies · Stories · Tool development
Following on from the worldwide quest to digitize all books (see below), here’s another way in which you and I can exert a tiny effort and collaborate on a greater cause.
Stumbling through the ether yesterday, I noticed that the CAPTCHA (get this: “Completely Automated Public Turing test to tell Computers and Humans Apart”) I had to complete to register for the site I was on presented two real words, rather than an arbitrary string of characters. It turns out that this is no ordinary security device. reCAPTCHA is using the power of social computing to get around the problem of unrecognisable words in scanned text. When pages from printed books are scanned, OCR software often fails to translate some words correctly, especially if the print quality is poor. reCAPTCHA throws those words out to CAPTCHA widgets around the web and invites us to translate them, at a moment when we would have been typing in a CAPTCHA code anyway. The site claims to be making use of 60 million wasted ten-second moments every day - that’s more than 150,000 lost hours. Brilliant; those are the kind of numbers we love.
Social computing has fascinated me ever since I heard of the SETI@home project. I don’t mean computer science for social interactions, like building widgets for Facebook, but the ceding of tasks to collaborative groups, so that massive results come from many smaller efforts. The Search for Extra-Terrestrial Intelligence project asks us to donate some of the spare power in our computers to analyse radio telescope data in the hope of spotting a message from distant lifeforms. It’s a bit like SpringWatch for Aliens.
More on social computing when I have time. For now though, the message from a galaxy far, far away remains as pertinent for us busy workers as ever: phone home.
N.B.: as mentioned above, here is more about the efforts being made to put all the world’s books into digital formats and make them publicly available. Some links to the big guys: Project Gutenberg, Google Library Project and Open Content Alliance.
1 Maggie Walsh // Feb 19, 2008 at 12:00 pm
I am really fascinated by the re-CAPTCHA project — smart, smart SMART! Also check out the folding@home project from Stanford if you want to do some social computing closer to home. Like the SETI project, they use the power of many computers to research proteins linked to diseases like Alzheimer’s, Parkinson’s, and cancer.
2 Dan O'Connor // Feb 19, 2008 at 4:41 pm
I like this, but I wonder is reCAPTCHA capable of putting translated words into context?
3 Mat Morrison // Feb 19, 2008 at 8:03 pm
See also this Wired article on Mechanical Turk-like activities.
4 Nigel Shardlow // Feb 20, 2008 at 11:43 am
We’re implementing re-captcha on our web interface. It presents you with one word image it knows the character string for, and one it doesn’t. The security is (obviously) based on the one it knows, and it stores your guess at the other image. If three people make the same guess at an unknown image, it calls it a wrap and moves on. Very clever.
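The scheme described above can be sketched in a few lines of Python. This is only an illustration of the idea as the comment tells it, not reCAPTCHA's actual code: the class name `WordPairChallenge`, the `submit` method, the image ids and the case-insensitive matching are all assumptions made for the example. The three-guess threshold is taken from the comment.

```python
from collections import Counter, defaultdict

# Taken from the comment above: three matching guesses settle an unknown word.
AGREEMENT_THRESHOLD = 3


class WordPairChallenge:
    """Hypothetical sketch: one known word gates access, one unknown word collects guesses."""

    def __init__(self):
        # unknown image id -> tally of the guesses submitted for it
        self.guesses = defaultdict(Counter)
        # unknown image id -> accepted text, once enough people agree
        self.resolved = {}

    def submit(self, known_answer, known_guess, unknown_id, unknown_guess):
        """Return True if the user passes the check on the known word."""
        # Security rests only on the word whose text is already known.
        if known_guess.strip().lower() != known_answer.strip().lower():
            return False

        # The guess at the unknown word is stored only for users who passed.
        if unknown_id not in self.resolved:
            guess = unknown_guess.strip().lower()
            self.guesses[unknown_id][guess] += 1
            if self.guesses[unknown_id][guess] >= AGREEMENT_THRESHOLD:
                self.resolved[unknown_id] = guess
        return True


if __name__ == "__main__":
    pool = WordPairChallenge()
    for _ in range(AGREEMENT_THRESHOLD):
        pool.submit("morning", "morning", "img-1", "upon")
    print(pool.resolved)
```

Guesses from users who fail the known word are thrown away in this sketch, on the assumption that a failed check says nothing reliable about the unknown word either.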
5 Matt Rebeiro // Feb 20, 2008 at 2:37 pm
This sounds an awful lot like Amazon’s “Mechanical Turk” social computing site. It deals with HITs (Human Intelligence Tasks).
It was used in the search for millionaire Steve Fossett when he went missing. Up-to-date images of the area in which he was believed to have crashed (provided by none other than Google Earth) were given to individuals to scour for plane wreckage.
The great thing is you can get paid for doing tasks too!