Why toddlers are smarter than computers | Gary Marcus | TEDxCERN


Translator: Ellen Maloney
Reviewer: Helene Batt

One of the biggest reasons I work in artificial intelligence is because I think it genuinely has the potential to change the world. I think there are a lot of problems that we scientists can’t solve on our own, that our brains basically aren’t big enough to handle the complexity of. So curing cancer, understanding how the brain works, reducing energy consumption, curing mental illness. These are all really complex problems. To take just one example, this is a diagram of all the genes involved in one tiny part of the brain in Alzheimer’s. It’s an incredibly complex network, and we don’t understand, as individuals, how all of these things relate to one another. We would like computers to be able to help.

The trouble is we’re not making as much progress in artificial intelligence, I think, as the world seems to think. You read headlines today and they’re all about “deep learning.” “Scientists See Promise In Deep-Learning Programs,” “‘Deep Learning’ Will Soon Give Us Super Smart Robots.” That’s what most people think. I’m actually not so sure. (Laughter)

Intelligence, it’s important to remember, is not homogeneous. There are lots of things that go into intelligence. There’s perception, common sense, planning, analogy, language, reasoning. These are all part of what we’d call “intelligence,” and many more things. If you know Howard Gardner’s notion of multiple intelligences, I think it’s fundamentally right. There are lots of things that go into intelligence. We’ve made enormous progress in AI, but really just in one piece of that, which is perception. And even in perception, we haven’t got it all figured out yet.

Here’s something machines can do very well: they can identify a person. You train them on a lot of data about some celebrities, and sure enough, it identifies that this is Tiger Woods. Once in a while, it might get confused and think it’s a golf ball; probably it will never tell you that this is Angelina Jolie. (Laughter) The way that we do this nowadays is with big data. We derive statistical approximations to the world from that big data.
The most common technique for this now is called a convolutional neural network, which was invented by my NYU colleague, Yann LeCun. The idea is you have a series of inputs into the system with labels on them. So this is a robot, you get told it’s a robot. The system either gets that correct or wrong; if it gets it wrong, you adjust the stuff in-between. The stuff in-between is a set of nodes which are modeled on neurons, very loosely modeled on neurons, and you’ll see layers there going from left to right, and the idea is you start by detecting low-level things about the image, like differences between light and dark, and you move up some hierarchy to things like lines, circles, and curvy parts, until at the top of the hierarchy, you have things like Tiger Woods or Oprah Winfrey, or what have you. As I say, it works perfectly fine for simple categorization.
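
[A minimal sketch of the kind of pipeline described above, assuming PyTorch; the layer sizes, the number of labels, and the random stand-in images are illustrative assumptions, not any particular system from the talk.]

```python
# Labeled images go in, the prediction is compared with the label, and the
# "stuff in-between" (the weights of the layers) is nudged to reduce the error.
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            # low-level detectors: light/dark contrasts, edges
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            # mid-level detectors: lines, circles, curvy parts
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # top of the hierarchy: one score per labeled category ("Tiger Woods", ...)
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = TinyConvNet(num_classes=10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Stand-in "big data": 64 random 64x64 images with made-up labels.
images = torch.randn(64, 3, 64, 64)
labels = torch.randint(0, 10, (64,))

for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)  # how wrong is the network right now?
    loss.backward()                        # propagate the error backwards
    optimizer.step()                       # adjust the stuff in-between
```
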
But it doesn’t work the minute the problem gets a little bit harder. Suppose you see an image like this. You might be able to get your neural network – as these things are called – to recognize the barbell. That wouldn’t be that hard. If you’re lucky, it would recognize the dog. It might not, because the ears are in a configuration in which dog ears almost never appear. Sort of straight out, right? That might actually stump the machine. But whether or not that stumped the machine, I’m pretty sure that your neural net would not be able to tell you that this is an unusual scene; that you don’t see a dog doing a bench press every day, and this is something out of the ordinary.

Here’s another example. This ran in the New York Times when it came out. This was a paper that said, “Hey, wow! These deep learning systems, they can caption images now!” If they really could do that perfectly, I would be really impressed. But what we’ve got now is certain cases that work really well, and others, not so well. Here’s a case that works really well: “A person riding a motorcycle on a dirt road.” You show the computer this image and it gives you the right answer. Here on the right, it gets it right too: “Group of young people playing a game of Frisbee.” If you just look at these examples, you’d say, “We’ve solved the problem. The machine understands what’s going on.” Then you show it this one. I would say the correct answer is maybe a parking sign with stickers on it, but you could describe it differently. None of those would be what the machine gives you, (Laughter) which is “a refrigerator filled with lots of food and drinks.” It makes me think of Oliver Sacks’ book, “The Man Who Mistook His Wife for a Hat.” If my child did this, I would think there was a neurological problem. I would rush them to the doctor. The system doesn’t really understand what a parking sign is, what a sticker is, what a refrigerator is, what drinks are, so it’s looking for the nearest thing in its database, which is this melange of colors, but that’s not really understanding.
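
[A deliberately crude sketch of “looking for the nearest thing in its database,” assuming NumPy: images are reduced to color histograms and a new image simply inherits the caption of the closest match. The captions and the random arrays standing in for photos are placeholders; real captioning systems are more sophisticated than this, but it illustrates the failure mode being described.]

```python
import numpy as np

def color_histogram(pixels: np.ndarray, bins: int = 8) -> np.ndarray:
    """pixels: (H, W, 3) array of RGB values in [0, 255]."""
    hist, _ = np.histogramdd(pixels.reshape(-1, 3), bins=bins, range=[(0, 256)] * 3)
    return hist.flatten() / hist.sum()

# "Training" captions paired with the color signature of their (stand-in) images.
database = [
    ("a person riding a motorcycle on a dirt road",
     color_histogram(np.random.randint(0, 256, (64, 64, 3)))),
    ("a group of young people playing a game of Frisbee",
     color_histogram(np.random.randint(0, 256, (64, 64, 3)))),
    ("a refrigerator filled with lots of food and drinks",
     color_histogram(np.random.randint(0, 256, (64, 64, 3)))),
]

def caption(new_pixels: np.ndarray) -> str:
    query = color_histogram(new_pixels)
    # return the caption whose image signature is nearest to the query
    return min(database, key=lambda item: np.linalg.norm(item[1] - query))[0]

# A sticker-covered parking sign can come back as "a refrigerator filled with
# lots of food and drinks" purely because the colors happen to be close;
# no understanding of signs, stickers, or refrigerators is involved.
print(caption(np.random.randint(0, 256, (64, 64, 3))))
```
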
I’m about to show you a video that IEEE Spectrum put together last year after the DARPA competition. DARPA was trying to have people build “emergency robots,” and people did all kinds of work in their labs to build robots that could do things like open doors in case of emergency. People were thinking of the Fukushima event, so you want to send robots in where you can’t send people. All the things I’ll show you were well practiced by the labs that participated in this competition, but as you’ll see, the results left something to be desired.

(Music)

There’s more you can see on YouTube later, but that’s probably enough mocking those particular sets of robots. The broader point that I want to make is that what we’re good at right now as a field, in artificial intelligence, is the stuff on the left: the routine things for which we have big data. So if you have a lot of data about opening doors in a particular environment, you’re great. But what if the environment changes? Then you have only little data, the unusual but important things. Or, what I jokingly call, “small data.” Humans are really good at small data, but machines still aren’t very good at it.

Part of it is because there’s little depth of understanding, not even common sense. I recently wrote this article with Ernie Davis on “Common Sense Reasoning in AI,” and the people who put together the cover made this great cover that makes the point very nicely: you have the robot here that is sawing a tree limb. One way you could learn about which side of the limb to sit on, when you were sitting with your chainsaw, would be to collect a lot of data. But, you know, this is not good for people sitting below the chainsaw, and not good for the robot. You don’t want to learn this on the basis of big data; you want to have more abstract principles.

Things get worse when you get to scientific reasoning. Here’s a multiple-choice exam, originally drawn from eighth-grade questions, made by Paul Allen’s Allen Institute for AI. What do earthquakes tell scientists about the history of the planet? One possibility – multiple choice – is, A: Earth’s climate is constantly changing. B: The continents of Earth are continually moving. C: Dinosaurs became extinct about 65 million years ago. Or, D: The oceans are much deeper than millions of years ago. Well, apparently if you’re a machine, most models that entered the competition said: “C: Dinosaurs became extinct about 65 million years ago.” Why is that? Probably because they’re doing the equivalent of a Google search; they’re doing keyword search. And “history of the planet,” “65 million years ago,” “dinosaurs,” and “extinct” all kind of pop up at once. There’s no real understanding here of what an earthquake is, or what the history of the planet is.
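
[A toy illustration of the keyword-matching strategy just described. The tiny “corpus” is made up for the example and chosen so that answer C wins on word overlap alone; no real system works on so little text, but the scoring logic is the same in spirit.]

```python
# Score each answer by how often its content words co-occur with the question's
# content words in a small made-up corpus. Nothing here knows what an earthquake is.
STOPWORDS = {"the", "of", "a", "an", "is", "are", "do", "about", "what", "tell", "than", "to"}

corpus = [
    "dinosaurs became extinct about 65 million years ago early in the history of the planet",
    "the history of the planet includes mass extinctions such as that of the dinosaurs",
    "scientists study the history of the planet",
    "plate tectonics shows that the continents are moving",
]

question = "What do earthquakes tell scientists about the history of the planet?"
answers = {
    "A": "Earth's climate is constantly changing.",
    "B": "The continents of Earth are continually moving.",
    "C": "Dinosaurs became extinct about 65 million years ago.",
    "D": "The oceans are much deeper than millions of years ago.",
}

def keywords(text: str) -> set:
    return {w.strip(".,?'").lower() for w in text.split()} - STOPWORDS

def score(answer: str) -> int:
    q, a = keywords(question), keywords(answer)
    # count corpus sentences that mention both a question keyword and an answer keyword
    return sum(1 for s in corpus if keywords(s) & q and keywords(s) & a)

print(max(answers, key=lambda k: score(answers[k])))  # prints "C", not the correct "B"
```
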
Hopefully many of you sitting in CERN realize that the answer is B, but not many of the machines did. As Wired magazine put it, “The Best AI Still Flunks 8th Grade Science.”

I’ve already told you my vision is AI systems that could do scientific reasoning on their own, and we’re not there yet. Here’s something I wrote a few years ago, and I stand by every word of it. I wrote this for the New Yorker when deep learning became popular and was front-page news in the New York Times. “Realistically, deep learning is only part of the challenge of building intelligent machines. Such techniques lack ways of representing causal relationships – what did what to whom – and are likely to face challenges in acquiring abstract ideas.” Four years later, there’s much hype about deep learning, and billions of dollars’ worth of investment. But we haven’t had progress on the causal relationships, the abstract ideas, the logical inferences, and so forth.

It reminds me of an old parable. The parable is about building a ladder when you want to get to the moon. Solving science through AI is getting to the moon. Selling more advertisements isn’t, so we can use AI now to tell you what else you might buy. “If you buy that book, you might like this one,” and that’s great, but if you don’t buy the book, it doesn’t really matter. But it matters when it comes to things like medicine. We want the AI to really do it right. Well, building ladders that get us an inch closer here or there might not be the right approach.

What I think we need to think about is the difference between data and abstract understanding. These are Boyle’s and Charles’s laws, which you learned in high school chemistry. The blue dots represent the data. It’s easy for a big data collection machine to organize that data, but what you really want is essentially the lines. You want the idea: what is the relationship behind this data? So you can interpolate where you haven’t seen things before, and extrapolate beyond what you’ve seen before.
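
[A small sketch of the point with made-up numbers: data generated from Boyle’s law, fit by a flexible polynomial of the kind a purely data-driven system might learn. Inside the measured range the fit and the law agree; outside it, only the law keeps making sense.]

```python
import numpy as np

k = 8.0                               # Boyle's law: P * V = k at fixed temperature
volumes = np.linspace(1.0, 2.0, 20)   # the narrow range we actually measured
pressures = k / volumes               # the "blue dots"

# A big-data-style fit: a cubic polynomial matched to the observed dots.
poly = np.polynomial.Polynomial.fit(volumes, pressures, deg=3)

for v in [1.5, 4.0]:                  # 1.5 interpolates, 4.0 extrapolates
    print(f"V={v}: law predicts {k / v:.2f}, polynomial fit predicts {poly(v):.2f}")
# The fit is fine where we have data and wildly wrong (even negative pressure)
# beyond it; the abstract relationship is what lets you extrapolate.
```
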
Which means, really, you want your AI systems to do something they haven’t done, which is ask the question “Why?” Not just “How much?” and “When?” and “What is correlated with what?” but “Why are the things in the world related to other things?”

I think we have only one model of a creature that asks this question a lot, and that would be the human toddler. This is my daughter, Chloe. She’s two and a half, and she asks me “Why?” roughly 20 times a day. (Laughter) “Why is it dark now?” “Why are you wearing a hat now?” She’s constantly asking “why” questions. This is her brother; he’s a little older – it’s an older picture – he’s four. When he was two, he was studying what I would describe as “the functional utility of the hole” on the top of a raspberry. He developed the concept, not for the first time in history, of the “fingerberry,” as he called it. So here he is with the fingerberry, and maybe a few days after this picture, I was on the road, I was giving a talk, and my wife sent me this text message. She says to my son Alexander – he’s two and a half years old – “Which of your animals will come to school today?” And he says, “Big bunny. Bear and platypus are eating.” So she walks to the next room, where his bedroom is, and she sees that he’s created a diorama of bear and platypus, and they are, in fact, eating. At this point, he was 100 percent honest in his answers.

What does this tell us? Well, for one, he understands complex syntax. In a linguist’s terms, this is called a “wh-question”: “Which of your animals will come to school today?” If you’ve worked with Siri, you know that syntax is still a challenge sometimes for computers. He was able to give novel answers depending on recent updates to the state of the world. Or, instead of memorizing things and finding the most popular answer that had been Googled for before, he was thinking about what had happened right now, what the current state of the world was, and directly reflecting that in his answers. He was doing logical reasoning: if they’re over there, they’re not coming with me. So he’s able to integrate all this, and importantly from the perspective of AI, he didn’t do this with massive data; he did this with modest data. Two years, basically, of people talking to him. The first six months, I don’t think he understood the phonology. (Laughter) So two years of people talking to him, and no direct access to what we call in my trade “labeled data.” I told you how you have Tiger Woods: a picture of him, and you have the label. A picture of a golf ball, and “golf ball.” He doesn’t get that most of the time, and yet he was able to work it all out.

So by now, a year and a half later, he’s very flexible. When I was putting together this talk, I showed him this and said, “What’s going on in the picture?” He said, “It’s an elephant carrying an umbrella.” It’s not like in one of his books there was an elephant with an umbrella, and he had memorized that. He has a perceptual system integrated with his language system, and he puts it all together. I said, “Is the umbrella the right size for the elephant?” He said, “No, it’s too small.” He can, on the fly, make inferences about things for which he has a very small amount of data.

This brings me to my main point, which is very much inspired by where we are. CERN is this vast, interdisciplinary and multi-country consortium to solve particular scientific problems. Maybe we need the same thing for AI. Most of the efforts in AI right now are individual companies, or small labs working on small problems, like how to sell more advertising, and things like that. What if we brought people together to try this moonshot of doing better science? And what if we not only brought together machine-learning experts and engineers who can make faster hardware, but researchers who look at cognitive development and cognitive science? I think maybe we could make some progress. I’m not saying humans are better than machines at everything; humans aren’t nearly as good at arithmetic. But we are better at asking “Why?” and understanding science. Maybe we can learn something from human children.

So, here’s a way to think about it: we’ve been working on computers for 60 years. We’ve made them much smaller, much faster, more energy efficient. This watch that I have can do everything the ENIAC could do with an entire room 60 years ago. And yet, we still haven’t understood how to program into a machine the flexibility of human thought, or the ability of a child, a toddler, a tiny toddler, to learn something new. Maybe it’s time that we try.

Thank you very much.

(Applause)
