From Siri answering our questions and Watson advising nurses to smart apps that aggregate information to help us out (or spy on us), artificial intelligence is transforming our world. Yet despite incredible advances, these amazingly “intelligent” systems can still seem profoundly stupid. Hector Levesque, a professor of computer science at the University of Toronto, likens them to savants. He recently received the Research Excellence Award at the International Joint Conference on Artificial Intelligence in Beijing, and he used his acceptance speech to highlight important questions about our approach to artificial intelligence and what it can tell us about ourselves.
In his speech, Professor Levesque distinguishes the technology of AI — everything from the expert systems and smart apps that are becoming part of our lives to advanced systems like Watson and Deep Blue — from the science of AI, which aims to understand what intelligence is and how it works. Like engineering and physics, the two fields are intertwined and feed back into each other, but they have different goals, priorities, and processes. As Professor Levesque sees it, building AI systems is one way to test our theories about intelligence and discover their shortcomings. But before we can begin, we need to figure out how to judge whether or not something is intelligent. For that, we have to go back to Alan Turing.
Alan Turing was one of the great minds of the 20th century. The lynchpin of Britain’s cryptanalysis, he shaped the nascent field of computer science and also made major contributions to our understanding of how patterns form in developmental biology. Often called “the father of artificial intelligence”, Turing came up with a test to help decide whether a computer program should be considered intelligent. In the Turing Test, someone converses (via computer) with another person and with a computer program; if they can’t tell which is which, the program is “intelligent”. The idea is basically “intelligent is as intelligent does” — an intelligent program should behave so much like a person that we can’t distinguish them simply from a conversation (or other behavioral test).
Levesque seems to agree with the basic idea, but he’s not convinced about the Turing Test. The problem is that programs can pass the Turing Test by being deceptive rather than intelligent. All the program has to do in order to pass is fool a human. It turns out that it doesn’t necessarily take intelligence to engage in convincing conversation; canned responses using evasive answers, wordplay, and emotional outbursts can go a long way towards creating a simulacrum of conversation without the need for an intelligent system. Fooling people in a casual conversation is a pretty impressive trick, but maybe it’s not the best basis for deciding whether or not something is intelligent.
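To get a feel for how far canned responses can go, here is a minimal, purely illustrative Python sketch of an ELIZA-style chatbot. The keyword rules and stock replies are all made up for this example; programs that have done well in Turing-style competitions use much larger response libraries and cleverer tricks, but the principle is the same: pattern-match, deflect, and never actually understand.

```python
import random
import re

# Keyword-triggered stock responses: no understanding, just pattern matching.
RULES = [
    (r"\bmy (mother|father|family)\b", ["Tell me more about your {0}.",
                                        "How do you feel about your {0}?"]),
    (r"\bwhy\b",                        ["Why do you ask?",
                                         "Does the reason really matter to you?"]),
    (r"\b(computer|machine|robot)s?\b", ["Do machines worry you?",
                                         "I'd rather talk about you."]),
]

# Evasive fallbacks for anything the rules don't catch.
FALLBACKS = [
    "I see. Please go on.",
    "That's interesting. What makes you say that?",
    "Ha! Let's not get into that right now.",
]

def reply(user_input: str) -> str:
    """Return a canned response that merely sounds conversational."""
    text = user_input.lower()
    for pattern, responses in RULES:
        match = re.search(pattern, text)
        if match:
            choice = random.choice(responses)
            return choice.format(*match.groups()) if match.groups() else choice
    return random.choice(FALLBACKS)

if __name__ == "__main__":
    print(reply("Why do you think my mother never calls?"))
```

A program like this never represents a single fact about the world, which is exactly the worry: fooling an interlocutor and being intelligent are different achievements.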
So what would be a better way to test a program? Professor Levesque suggests seeing how well it answers carefully crafted multiple choice questions like:
The trophy would not fit in the brown suitcase because it was so small. What was so small?
a) the trophy
b) the brown suitcase
The large ball crashed right through the table because it was made of styrofoam. What was made of styrofoam?
a) the large ball
b) the table
These Winograd schemas, which Levesque named after the computer scientist Terry Winograd, seem less vulnerable to the sorts of tricks that undermine the Turing Test. Determining the correct answer doesn’t reward deception, and statistical (“big data”) approaches won’t necessarily work either. For example, a program might scan a huge amount of text and count how often phrases like “trophy was so small” and “suitcase was so small” occur near phrases like “trophy would not fit in the suitcase”, but it still wouldn’t have enough information to answer the question. The correct answer depends on the relationship between the objects: if you replace “because” with “despite the fact that”, the opposite answer becomes correct even though those statistics haven’t changed. Likewise, the second example is a challenge because answering correctly depends on knowing something about styrofoam.
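To make the point about statistical approaches concrete, here is a rough Python sketch of the kind of phrase-frequency heuristic described above. The corpus_count helper and its numbers are invented stand-ins for a real web-scale n-gram lookup; none of this reflects how Levesque’s examples or Rahman and Ng’s system actually work.

```python
# A Winograd schema pairs a short sentence with two candidate referents for a
# pronoun; swapping the "special word" (because -> despite the fact that)
# flips which candidate is correct.
SCHEMA = {
    "sentence":   "The trophy would not fit in the brown suitcase because it was so small.",
    "pronoun":    "it",
    "candidates": ["the trophy", "the brown suitcase"],
    "answer":     "the trophy",          # with "because"
    "flipped":    "the brown suitcase",  # with "despite the fact that"
}

def corpus_count(phrase: str) -> int:
    """Hypothetical stand-in for a web-scale n-gram count lookup."""
    fake_counts = {
        "trophy was so small":   1200,  # invented numbers, purely for illustration
        "suitcase was so small":  900,
    }
    return fake_counts.get(phrase, 0)

def frequency_guess(schema: dict) -> str:
    """Pick whichever candidate co-occurs more often with the pronoun's predicate."""
    scores = {
        "the trophy":         corpus_count("trophy was so small"),
        "the brown suitcase": corpus_count("suitcase was so small"),
    }
    return max(scores, key=scores.get)

# The phrase statistics are identical whether the sentence says "because" or
# "despite the fact that", so the heuristic cannot get both variants right.
print(frequency_guess(SCHEMA))   # "the trophy": right here, wrong on the flipped variant
```

Because the counts don’t change when “because” becomes “despite the fact that”, whichever candidate the heuristic favours it has to be wrong on one of the two variants; that is the sense in which the schemas resist purely statistical shortcuts.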
These sorts of questions are hardly new to computer science — research on them dates back to the 1970s — but Professor Levesque has revived interest in them by focusing on their relevance to artificial intelligence and the Turing Test. Even though they’re simple questions that humans answer easily, most state-of-the-art programs only get the right answer about half the time — in other words, they might as well be guessing. In a recent paper, Altaf Rahman and Vincent Ng of the University of Texas at Dallas present a program that does significantly better, answering correctly about three times out of four. It’s a step forward, but there’s still a lot of room for improvement.
So maybe these questions should be used in the Turing Test? That would certainly be useful, but I think the real point of this approach is to shift our focus away from trying to convince (or fool) a human and towards trying to understand the basis of intelligence. The Winograd schemas are challenging because they combine logical reasoning and knowledge about the world. As such, the questions embody a hypothesis about what constitutes intelligence and how it works. Although we’ve come to excel at building systems which are experts at a specific task, like chess or recognizing pictures, few people would call these systems truly intelligent. Perhaps a new approach will help us better understand this thing we call intelligence, a crucial aspect both of who we are and of the AIs we hope to one day build.
Refs
Rahman, Altaf and Ng, Vincent (2012) Resolving Complex Cases of Definite Pronouns: The Winograd Schema Challenge. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 777–789, Jeju Island, Korea, 12–14 July 2012.
Levesque, Hector J. (2013) On our best behaviour. A written version of the Research Excellence Lecture presented at the IJCAI-13 conference in Beijing.
I know there are tons of studies out there trying to understand and “quantify” animal intelligence (birds, primates, dogs, etc.). Does any of that research play into or influence how we create intelligent machines?
I’m not an AI researcher, but I think all kinds of behavioural studies inform research into creating “intelligent” systems. In many cases, we just want to build something that will integrate information from different sources and make an appropriate decision…and evolution has furnished us with many mechanisms for doing that (if we can figure out how they work). If memory serves, even things like quorum sensing by micro-organisms have been informative.
I wasn’t aware of those Winograd schemas, thanks for sharing. I actually discussed some of this with a friend last night and it really becomes challenging to even define what we mean when we say intelligence. I suppose the holy grail of intelligence could be about self-consciousness, and we’re far, far away from this. In the meantime, though, all the approaches out there are kind of stumbling in the right direction, and maybe by progressively bringing together maturing sub-fields of AI we can get there. You’ve talked a few times on this blog about how science evolves over time, and it’s really interesting to see it play out in AI. The subject of scrutiny, though, is our own minds, and in that regard it’s neat to think on and observe how advances in neuroscience, psychology, and even philosophy contribute to our understanding of intelligence and define goals for AI. Thanks for the post!
I’m glad you liked the post! Defining “intelligence” is definitely one of the big hurdles, but I guess that’s the idea behind the Turing Test (and the Winograd “improvements”): it’s supposed to offer us a way to identify intelligence without explicitly defining it.
Self-consciousness is where that falls short, though. It might be the holy grail of AI, but I’m not sure it’s actually required for intelligence (even if we restrict that to non-trivial forms of intelligence). If a program passed the Turing Test or correctly answered Winograd schemas or even convincingly controlled an avatar in, e.g., Second Life, we would probably concede that it’s intelligent… but I think we would also just assume that meant it was self-conscious. It’s easy to spot intelligent systems that aren’t self-conscious, but I imagine that distinction could get harder with really advanced AIs. I guess the issue I’m getting at is: could something fake being self-aware? Or would convincingly faking self-awareness mean that it actually was self-aware (because it’s a reflexive property)?
I’m not sure if I managed to convey all that clearly — sorry if I failed. 🙂