From Siri answering our questions and Watson advising nurses to smart apps that aggregate information to help us out (or spy on us), artificial intelligence is transforming our world. Despite incredible advances, these amazingly “intelligent” systems can still seem profoundly stupid. Hector Levesque, a professor of computer science at the University of Toronto, likens them to savants. He was recently awarded the Research Excellence Award at the International Joint Conference on Artificial Intelligence in Beijing, and he used his acceptance speech to highlight important questions about our approach to artificial intelligence and what it can tell us about ourselves.
In his speech, Professor Levesque distinguishes the technology of AI (everything from the expert systems and smart apps that are becoming part of our lives to advanced systems like Watson and Deep Blue) from the science of AI, which aims to understand what intelligence is and how it works. Like engineering and physics, the two fields are intertwined and feed back into each other, but they involve different goals, priorities, and processes. As Professor Levesque sees it, building AI systems is one way to test our theories of intelligence and discover their shortcomings. Before we can begin, we need to figure out how to judge whether or not something is intelligent. For that, we have to go back to Alan Turing.
Alan Turing was one of the great minds of the 20th century. The linchpin of Britain’s wartime cryptanalysis, he shaped the nascent field of computer science and also made major contributions to our understanding of how patterns form in developmental biology. Often called “the father of artificial intelligence”, Turing came up with a test to help decide whether a computer program should be considered intelligent. In the Turing Test, someone converses (via computer) with another person and with a computer program; if they can’t tell which is which, the program is considered “intelligent”. The idea is basically “intelligent is as intelligent does”: an intelligent program should behave so much like a person that we can’t distinguish them simply from a conversation (or other behavioral test).
Levesque seems to agree with the basic idea, but he’s not convinced about the Turing Test. The problem is that programs can pass the Turing Test by being deceptive rather than intelligent. All the program has to do in order to pass is fool a human. It turns out that it doesn’t necessarily take intelligence to engage in convincing conversation; canned responses using evasive answers, wordplay, and emotional outbursts can go a long way towards creating a simulacrum of conversation without the need for an intelligent system. Fooling people in a casual conversation is a pretty impressive trick, but maybe it’s not the best basis for deciding whether or not something is intelligent.
So what would be a better way to test a program? Professor Levesque suggests seeing how well it answers carefully crafted multiple-choice questions like:
The trophy would not fit in the brown suitcase because it was so small. What was so small?
a) the trophy
b) the brown suitcase
The large ball crashed right through the table because it was made of styrofoam. What was made of styrofoam?
a) the large ball
b) the table
These Winograd schemas, which Levesque named after the computer scientist Terry Winograd, seem like they would be less vulnerable to the sorts of tricks that undermine the Turing Test. Determining the correct answer doesn’t rely on deception, and statistical (“big data”) approaches won’t necessarily work either. For example, a program might scan a large amount of text and find out how frequent the phrases “trophy is so small” and “suitcase is so small” are in the context of the phrase “trophy does not fit in suitcase”, but it still wouldn’t have enough information to answer the question. The correct answer also depends on the relationship between the objects; if you change the question by replacing “because” with “despite the fact that”, the opposite answer becomes correct. Likewise, the second example presents a challenge to a program because answering correctly depends on knowing something about styrofoam.
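To make the limitation of frequency counting concrete, here is a minimal Python sketch. It is not Levesque’s method or any published system: the `Schema` class, the `frequency_guess` function, and the co-occurrence counts are all hypothetical, invented purely for illustration. It shows a baseline that picks whichever candidate appears most often with the adjective, applied to the trophy question and to its “despite the fact that” variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    sentence: str      # sentence containing the ambiguous pronoun
    candidates: tuple  # the two possible referents
    adjective: str     # the property the question asks about
    answer: str        # the correct referent


# Made-up co-occurrence counts standing in for statistics from a large corpus.
TOY_COUNTS = {
    ("trophy", "small"): 3,
    ("suitcase", "small"): 5,
}


def frequency_guess(schema, counts):
    """Pick the candidate seen most often with the adjective.

    Note that the connective ("because" vs. "despite") is never consulted.
    """
    return max(schema.candidates,
               key=lambda c: counts.get((c, schema.adjective), 0))


because = Schema(
    sentence="The trophy would not fit in the brown suitcase "
             "because it was so small.",
    candidates=("trophy", "suitcase"),
    adjective="small",
    answer="suitcase",
)
despite = Schema(
    sentence="The trophy would not fit in the brown suitcase "
             "despite the fact that it was so small.",
    candidates=("trophy", "suitcase"),
    adjective="small",
    answer="trophy",
)

pair = (because, despite)
guesses = [frequency_guess(s, TOY_COUNTS) for s in pair]
correct = [g == s.answer for g, s in zip(guesses, pair)]
print(guesses, correct)
```

Because the counts never look at the connective, the guess is identical for both variants, so whatever the counts happen to be, this baseline can be right on at most one question of the pair.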
These sorts of questions are hardly new to computer science (research on them dates back to the 1970s), but Professor Levesque has revived interest in them by focusing on their relevance to artificial intelligence and the Turing Test. Even though they’re simple questions which humans can easily answer, most state-of-the-art programs only get the answer right about half the time; in other words, they might as well be guessing. In a recent paper, Altaf Rahman and Vincent Ng of the University of Texas at Dallas present a program which does significantly better, answering correctly roughly three out of four times. It’s a step forward, but there’s still a lot of room for improvement.
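Why does “about half the time” amount to guessing, while “three out of four” does not? With two choices per question, a coin flip is right half the time on average. The sketch below computes how likely a pure guesser is to reach a given score. The test-set size of 100 questions is a hypothetical chosen for illustration, not a figure from either paper, and `p_at_least` is a name introduced here.

```python
from math import comb


def p_at_least(k, n, p=0.5):
    """Probability that a guesser with per-question success rate p
    answers at least k of n two-choice questions correctly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


n = 100  # hypothetical number of questions

print(p_at_least(50, n))  # a guesser reaches 50% or better more often than not
print(p_at_least(75, n))  # reaching 75% by luck alone is extremely unlikely
```

Under these assumptions, a score near 50% is exactly what random guessing would produce, whereas a score near 75% is very hard to explain by luck.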
So maybe these questions should be used in the Turing Test? That would certainly be useful, but I think the real point of this approach is to shift our focus away from trying to convince (or fool) a human and towards trying to understand the basis of intelligence. The Winograd schemas are challenging because they combine logical reasoning and knowledge about the world. As such, the questions embody a hypothesis about what constitutes intelligence and how it works. Although we’ve come to excel at building systems which are experts at a specific task, like playing chess or recognizing pictures, few people would call these systems truly intelligent. Perhaps a new approach will help us better understand this thing we call intelligence, a crucial aspect both of who we are and of the AIs we hope to one day build.
Rahman, Altaf and Ng, Vincent (2012) Resolving Complex Cases of Definite Pronouns: The Winograd Schema Challenge. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 777–789, Jeju Island, Korea, 12–14 July 2012.
Levesque, Hector J. (2013) On our best behaviour. A written version of the Research Excellence Lecture presented at the IJCAI-13 conference in Beijing.