When I first heard that IBM created a machine called Watson specifically to play Jeopardy!, I thought, "PR stunt." Especially because its televised bout, which starts tonight, is against the two closest approximations to rock stars in Jeopardy!’s history, Ken Jennings (a record 74 straight wins) and Brad Rutter (a record $3,255,102 in cumulative winnings).
I also thought, "Well, come on: Every day I use my phone to ask Google how many ounces are in 2.6 pounds, or what the current temperature is in Honolulu, or who created Batman. That a team of IBM scientists took four years to make a really good trivia computer seems sad and wasteful, not impressive. Did they get hung up on getting it to answer in the form of a question?"
But it doesn’t take much reflection to realize how extremely specific our language must be to get a desired result from, say, a Google search, Google’s sophisticated autocorrect mechanism notwithstanding. A search query is still a far cry from how normal humans speak: Different people have different accents and phrase things differently, they use puns, and they pepper their talk with references specific to their listeners.
Recall that Jeopardy! has all kinds of wordplay-based questions, so when Watson is presented with, for example, "A rhyming reminder of the past in the city of the NBA's Kings" (a "Sacramento memento"), its processes are vastly more complicated than those that occur when you string together proper nouns and click “I’m Feeling Lucky” on Google. While one can’t deny the PR IBM’s getting from this, it’s hard to imagine a better benchmark for a question-and-answer machine than a gold medal on TV’s most sophisticated trivia show. (Though admittedly, watching a computer totally own Are You Smarter Than A Fifth Grader and put that smarmy Jeff Foxworthy in his place would have been mighty satisfying.)
There are two particularly neat and human-like aspects to how Watson operates. The first is that it chooses whether to buzz in based on its level of confidence. It’s vital to Jeopardy! strategy to account for the fact that you’re penalized for wrong answers. So upon recognizing a question, Watson not only takes into account sentence structure, proper nouns, homophones, etc., it also calculates the likelihood of its answer being correct and weighs that against how much money is at stake.
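To make that weighing concrete, here is a minimal sketch in Python. It is not IBM's actual method: the function names, the `min_gain` threshold, and the assumption that a wrong answer costs exactly the clue's dollar value are all illustrative choices of mine.

```python
def expected_gain(confidence: float, clue_value: int) -> float:
    """Expected dollars won by buzzing in.

    Assumes (for illustration) that a right answer earns the clue's value
    and a wrong answer loses the same amount.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    win = confidence * clue_value
    loss = (1.0 - confidence) * clue_value
    return win - loss


def should_buzz(confidence: float, clue_value: int, min_gain: float = 0.0) -> bool:
    """Buzz only when the expected gain beats a hypothetical safety margin."""
    return expected_gain(confidence, clue_value) > min_gain


if __name__ == "__main__":
    for confidence, value in [(0.9, 1000), (0.5, 800), (0.4, 1000)]:
        decision = "buzz" if should_buzz(confidence, value) else "stay quiet"
        print(f"confidence={confidence:.0%} value=${value}: {decision}")
```

Under these assumptions, buzzing pays off on average only when confidence is above 50 percent. Raising `min_gain` makes the player more cautious when little money is on the line.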
That trade-off between buzzing in and being right is also how Watson’s designers measured its progress. They noted that average Jeopardy! winners buzz in on about 41 percent of the game’s questions, getting about 88 percent correct. Championship players like Jennings and Rutter answer about 64 percent of the questions on the board, and are about 90 percent accurate. No question-and-answer machine can ever be judged as 100 percent accurate, so besting experts at a wordplay-heavy trivia game is the high-water mark.
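Those percentages are easier to compare once they're multiplied out. The short Python sketch below does the arithmetic per 100 clues. The function name is my own, and the calculation ignores real-game wrinkles such as opponents buzzing in first.

```python
def per_100_clues(attempt_rate: float, accuracy: float) -> tuple[float, float]:
    """Return (correct, wrong) answers per 100 clues on the board."""
    attempted = 100 * attempt_rate
    correct = attempted * accuracy
    wrong = attempted - correct
    return correct, wrong


if __name__ == "__main__":
    players = {
        "average winner": (0.41, 0.88),  # figures quoted above
        "champion": (0.64, 0.90),
    }
    for name, (attempt_rate, accuracy) in players.items():
        correct, wrong = per_100_clues(attempt_rate, accuracy)
        print(f"{name}: {correct:.1f} right, {wrong:.1f} wrong per 100 clues")
```

By this rough measure, an average winner gets about 36 of every 100 clues right, while a champion gets about 58. That gap is the ground Watson had to make up.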
The second human-like aspect is the concept of “machine learning.” It’s the notion of computers continually refining their definitions by making associations. An example: What do you think of when you see the word tree? You may picture a maple tree, or an evergreen. Does it bear fruit? Does a sapling count? It can be nearly impossible to come up with a simple way to encompass the full concept evoked by a simple, nonexclusive word. We understand it, of course, not by having memorized Webster’s definition, but by considering several overlapping associations.
In a similar sense, Watson refines its definitions by continually updating its associations. When it gets an answer wrong, it’s told how it should have answered, which allows it to perpetually improve itself. Watson’s gone through scores of old episodes of Jeopardy!, refining what constitutes a correct answer.
It now even learns from other contestants’ answers. Here’s an example: In a test match a few months ago in front of Jeopardy! producers, Watson was up by $3,000 when it came to a category called “Celebrations of the Month.”
The first clue was, “Administrative Professionals Day & National CPAs Goof-Off Day.” Watson got it wrong, answering “holiday.” One of the two human contestants knew it: “April.”
“D-Day anniversary & Magna Carta day” was next. Watson wasn’t confident with any answer, but the other human was, correctly guessing: “June.”
But even though Watson didn’t understand the category, it saw a pattern: Every correct response was a month associated with some of the key words in the clue. By the fifth clue in the category, “National Teacher Day & Kentucky Derby day,” Watson had regained enough confidence to buzz in again, this time getting it right: “May.”
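One way to picture that kind of in-category learning is the toy Python sketch below. It is a deliberate simplification and not Watson's real algorithm: the `KNOWN_TYPES` table, both function names, and the confidence numbers are invented for illustration. The idea is just that once every revealed answer turns out to be a month, non-month candidates can be thrown out.

```python
MONTHS = {
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
}

# Hypothetical table of answer types the player knows how to recognize.
KNOWN_TYPES = {"month": MONTHS}


def infer_answer_type(revealed_answers):
    """Return the name of a known type containing every revealed answer, else None."""
    if not revealed_answers:
        return None
    for type_name, members in KNOWN_TYPES.items():
        if all(answer in members for answer in revealed_answers):
            return type_name
    return None


def filter_candidates(candidates, revealed_answers):
    """Keep only (answer, confidence) pairs that fit the inferred answer type."""
    type_name = infer_answer_type(revealed_answers)
    if type_name is None:
        return list(candidates)  # no pattern yet, so keep everything
    members = KNOWN_TYPES[type_name]
    return [(answer, conf) for answer, conf in candidates if answer in members]


if __name__ == "__main__":
    revealed = ["April", "June"]  # correct responses heard so far
    candidates = [("holiday", 0.40), ("May", 0.30)]  # made-up confidences
    print(filter_candidates(candidates, revealed))
```

With "April" and "June" already revealed, the sketch discards "holiday" and keeps "May," which mirrors the shift described above.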
Dr. David Ferrucci, the lead researcher on the Watson/Jeopardy! project (if you’ve seen the commercials for this special, he’s the guy with the imposing goatee), says that once the game is over, Watson’s going to be dismantled forever, as was the fate of IBM’s Deep Blue in 1997. But IBM will immediately begin using Watson’s technology in a number of fields, likely government and military, as well as science and medicine. Most important to Ferrucci is the technology's usefulness as a diagnostic tool, for example in helping doctors read MRIs and CAT scans. He foresees a paradigm shift in which Watson-like computers, using machine learning, will sift through millions of X-rays, paired with patient info and the patients’ diagnoses, until they can confidently name what’s wrong with a patient with greater accuracy than a human doctor.
So at the risk of sounding too operatic, if you tune in to Jeopardy! starting tonight and continuing through Wednesday, you can monitor the progress of one of the most important technological developments of our day. If Watson's descendants are going to one day advise me on whether to get a kidney transplant, I’ll be much more comfortable if I’ve already watched Watson outsmart some of the greatest minds in game-show history.
Watson competes on Jeopardy! from February 14 through February 16.