There’s a heartwarming story at the New York Times, “We Found Our Son in the Subway“. You should check it out. Really, I’ll wait… the rest of this post won’t make sense until you’ve read it.
It strikes me that the judge managed, on an ultra-compressed timescale, to provide this same-sex couple the same experience of surprise parenthood that alt-sex couples may have through an unplanned pregnancy.
This principle could be advanced more systematically. That is, perhaps adoption by same-sex couples should not just be permitted, but random and compulsory. If you’re married and of child-rearing age, you might be assigned a baby!
The agency making the assignments can be called the State Taskforce Organizing Reproductive Co-equality, or STORC. So when these children ask where they came from, the happy parents can honestly say, “the STORC dropped you off”.
As computing systems have the idea of ‘Turing equivalence‘, emerging 3D-printing will exhibit a sort of ‘Gatling equivalence’: minimal configurations of home manufacturing systems will be capable of making fully-automatic weapons. Such maker technology will be ubiquitous, rendering other licensing, registration, or ownership restrictions on specific kinds of weapons moot. Any script kiddie will be able to print an AK-47.
Ardent gun supporters sometimes dream that with widespread weapon-carrying by the public, there would be skilled armed resistance to violent crime in every workplace, mall, and school. “Take that, deranged rampage shooter!”
That ubiquitous armed resistance may arrive… but from the holsters of robots rather than humans. Mass-produced and largely autonomous lethally-armed defense drones are only a matter of time, whether they be wheeled, walking, flying – or simply wall-mounted.
Imagine: “The 2026 model year of the ED-208 is finally cheap enough to install one in every room. For legal reasons, the ED-208 only ever shoots second… but it never misses.”
(This was written after the first, but before viewing the second, match of the Jeopardy IBM Challenge.)
I’m a fan of Jeopardy, a professional software developer, and a strong optimist about the prospects for artificial intelligence – so I’m immensely enjoying the contest between the IBM Watson system and human Jeopardy Champions Ken Jennings and Brad Rutter.
Watson’s ability to answer quickly and confidently, even when clues are indirect or euphemistic, has been impressive. It’s rightly hailed as one of the most impressive demonstrations yet of information retrieval and even natural-language understanding.
When Alex Trebek walked by the 10 racks of 9 servers each, said to include 2880 computing cores and 15 terabytes (15,000 gigabytes) of high-speed RAM main-memory, I couldn’t shake the feeling: this seems like too much hardware… at least if any of the software includes new breakthroughs of actual understanding. As parts of the show took on the character of an IBM infomercial, my skepticism only grew.
Let me explain.
While Jeopardy questions are challenging and wide-ranging, in highly idiomatic (even whimsical) English, this trivia game remains a very constrained domain. The clues are short; the answers just a few words, at most, and usually discrete named entities – the kinds of things that have their own titled entries in encyclopedias, dictionaries, and gazetters/almanacs of various sorts.
While the clues often have wordplay, many also have signifiers that clearly indicate exactly what kind of word/phrase completion is expected. (The strongest is perhaps the word ‘this’, as in “this protein” or “‘Storm on the Sea of’ this”. But categories which promise a certain word or phrase will be in the answer, with that portion in quotes, also help brute-force search plenty.)
I strongly suspect that almost anyone with even a rudimentary understanding of language, and enough time to research clues in an offline static copy of Wikipedia, could get 90%+ of the clues right.
An offline copy of all of Wikipedia’s articles, as of the last full data-dump, is about 6.5GB compressed, 30GB uncompressed – that’s 1/500th Watson’s RAM. Furthermore, chopping this data up for rapid access – such as creating an inverted index, and replacing named/linked entities with ordinal numbers – tends to result in even smaller representations. So with fast lookup and a modicum of understanding, one server, with 64GB of RAM, could be more than enough to contain everything a language-savvy agent would need to dominate at Jeopardy.
But what if you’re not language savvy, and only have brute-force text-lookup? We can simulate the kinds of answers even a naive text-search approach against a Wikipedia snapshot might produce, by performing site-specific queries on Google.
For many of the questions Watson got right, a naive Google query of the ‘en.wikipedia.org’ domain, using the key words in the clue, will return as the first result the exact Wikipedia article whose title is the correct answer. For example:
DON’T WORRY ABOUT IT for $2000:
“It’s just acne! You don’t have this skin infection also known as Hansen’s Disease”
[site:en.wikipedia.org acne skin infection hansen's disease]
1st result: LEPROSY (right answer)
CAMBRIDGE for $1600/Daily Double:
“The chapels at Pembroke & Emmanuel Colleges were designed by this architect “
[site:en.wikipedia.org chapels pembroke emmanuel colleges designed architect]
1st result: CHRISTOPER WREN (right answer)
HEDGEHOG-PODGE for $2000 :
“A recent bestseller by Muriel Barbery is called this ‘of the hedgehog’”
[site:en.wikipedia.org bestseller Muriel Barbery "of the Hedgehog"]
1st result: THE ELEGANCE OF THE HEDGEHOG (right answer)
CAMBRIDGE for $2000:
“This ‘Narnia’ author went from teaching at Magdalen College, Oxford to teaching at Magdalene College, Cambridge”
[site:en.wikipedia.org "Narnia" author teaching Magdalen College Oxford Magdalene College Cambridge]
1st result: C.S. LEWIS (right answer)
Often, the correct answer isn’t first, but other trivial heuristics can reveal the answer further down. For example, discard any title that has already appeared in the question, and thus is unlikely to be the answer. Consider:
“CHURCH” and “STATE” for $400:
“A Dana Carvey character on ‘Saturday Night Live’; Isn’t that special…”
[site:en.wikipedia.org dana carvey character "saturday night live" special]
Dana Carvey (struck as appearing in clue)
2nd: THE CHURCH LADY (right answer)
ETUDE, BRUTE for $2000:
“From 1911 to 1917, this Romantic Russian composed ‘Etudes-Tableaux’ for piano “
[site:en.wikipedia.org 1911 1917 romantic russian "etudes-tableaux" piano]
Etudes-Tableaux (struck as appearing in clue)
2nd: SERGEI RACHMANINOFF (right answer)
Even when this technique fails, it sometimes fails just like Watson or real contestants:
THE ART OF THE STEAL for $1600:
“In May 2010 5 paintings worth $125 million by Braque, Matisse & 3 others left Paris’ Museum of this art period”
[site:en.wikipedia.org may 2010 paintings 125 million braque matisse paris museum art period]
1st: Picasso (Watson’s wrong, nonsensical answer)
…iteratively stripping some words trying to find candidates matching ‘this art period’…
[site:en.wikipedia.org paintings braque matisse paris museum art period]
Braque (struck as appearing in clue)
2nd: Cubism (Ken Jennings’ wrong answer)
Matisse (struck as appearing in clue)
4th: MODERN ART (right answer; Watson’s 3rd option)
And even where this this technique doesn’t yield the answer in a top page title, the answer is usually close at hand.
HEDGEHOG-PODGE for $400:
“Some hedgehogs enter periods of torpor; the Western European species spends the winter in this dormant condition”
[site:en.wikipedia.org hedgehogs periods of torpor western european species winter dormant condition]
1st: Short-beaked Echidna (HIBERNATION, the correct answer, appears prominently in the snippet a few words from ‘torpor’)
2nd: Bat (HIBERNATION appears alongside ‘winter’ in snippet)
There are many, many other possible heuristics for knowing when to accept or reject the naive top-results, and when patterns of words in the source material could yield other answers not in the title, or create confidence in answers stitched together from elsewhere. For example, consider the clue:
HEDGEHOG-PODGE for $1600:
“Hedgehogs are covered with quills or spines, which are hollow hairs made stiff by this protein”
The phrase ‘this protein’ strongly indicates the answer is a protein; Wikipedia has a ‘list of proteins’. Only one protein on that list also appears on each of the ‘hedgehog’ and ‘Spines (zoology)’ pages: KERATIN, the correct answer.
With a full, inverse-indexed, cross-linked, de-duplicated version of Wikipedia all in RAM, even a single server, with a few cores, can run hundreds of iteratively-refined probe queries, and scan the full-text of articles for sentences that correlate with the clue, in the seconds it takes Trebek to read the clue.
That makes me think that if you gave a leaner, younger, hungrier team millions of dollars and years to mine the entire history of Jeopardy answers-and-questions for workable heuristics, they could match Watson’s performance with a tiny fraction of Watson’s hardware.
Unfortunately, Jeopardy didn’t open this as a general challenge to all, like the DARPA Grand Challenges, with a large prize to motivate creative entries. Jeopardy seems to have simply followed IBM’s lead – and perhaps even received promotional payments from IBM for doing so. (I can’t find a definitive statement either way.)
IBM is known, and rightly admired, for many things… but hardware thrift isn’t one of them. And the boost to IBM’s sales from this whole exercise wouldn’t be nearly as large if Watson were a single machine, able to be positioned on the podium next to its human challengers, barely larger than the monitor displaying Final Jeopardy answers. That wouldn’t move roomfuls of computers!
Nice job, Jeopardy and IBM, but next time: open it to stingier teams!
Memesteading, June 24, 2009: Seven Score Characters, The Gettysburg Tweet
The Daily Show, December 2, 2010: The Twittersburg Address
Their version appeared briefly at the end of a segment mocking Sarah Palin’s tweets.
At this rate, I expect the Daily Show to joke about the Facebook Likernet replacing the Internet by 2012.
They made different choices starting from the same concept, emphasized faithfulness to the original wording, but brutally abbreviated.
I chose faithfulness to spartan, cutesy tweetspeak – while translating the key points. So in place of the familiar poetry –”Fourscore and seven years ago our fathers brought forth on this continent a new nation…” (87 characters) – I favored clunky twinglish: “87yr ago natl dads…” (18 characters).
That helped me squeeze in, as “Battle hallowed ground > our words”, one extra shard of a salient Lincoln point: “The brave men, living and dead, who struggled here, have consecrated [this ground] far above our poor power to add or detract”. But I still had to cut any form of the speech’s ironically unfulfilled prediction: “[t]he world will little note nor long remember what we say here”.
Well, I suppose both the Daily Show and I have ‘little-noted’ it, after all.
Once thought of as the best concise speech ever, the future may view the Gettysburg Address, at over 1400 characters, as yet another example of how Before Twitter, people took forever to get to the point.
Prime Minister David Cameron is leader of the United Kingdom, a parliamentary democracy of 62 megacitizens that is hundreds of years old. Prince Mark Zuckerberg is Chief Executive of Facebook, a networked-membership corporate principality of 500 megacitizens that is just over 6 years old. The two nations share over 20 megacitizens.
Here we see Cameron and Zuckerberg’s exchange, on the occasion of the UK Government reaching out to Zuckerberg’s citizenry for economic advice.
By traditional standards of status, decorum and context, this exchange looks all wrong. One of these people certainly doesn’t belong. But given the aesthetics of the presentation — either embedded here or at YouTube — it’s hard to say which one. Bravo to Cameron for trying something new — and giving us a glimpse of international relations to come.
The Internet was a great prototype for geeks and knowledge-worker bees.
But the cool kids and average folks have arrived, and the Internet has been kind of a mess for them — what with spammers and phishers and predators and nutballs all over.
So now Facebook brings us the successor to the Internet: the Likernet.
Instead of the Internet’s web of links, the Likernet offers a social graph of likes.
What the hell was a “link”, anyway? And “web” sounds like something you’re stuck in before a spider eats you. I know what I like, and it’s not chains and spiders.
The Internet interprets censorship as damage and routes around it, which was kind of nice. Unfortunately the Internet also interprets every unguarded email, form, website, and program as an opening into which to spray its unsolicited marketing, harassment, and malware.
In the Likernet, things only come to you from friends. I like friends. Who doesn’t? In the Likernet, you don’t need filters and antivirus software — a stern look or sarcastic remark is enough to let your friend know when they should cut out the monkey business.
Google did a bang-up job of making the anarchic shantytown Internet habitable, with their rankings and filters and reported-attack warnings and sandboxes, but Google can now take some well-earned time off. The shiny Facebook highrises are ready for occupancy, with their reliable doormen and standard modern social plugin appliances.
Facebook’s Likernet is a bright, safe, sanitary metropolis. It’s like Singapore, but in cyberspace with 100 times more citizens. Most current Internet residents will prefer to move to the Likernet. And even if you don’t want to move, you may find the Likernet rising all around you, leaving older Internet districts as blighted slums.