Just 10 years ago, the notion that computers might soon match the human ability to understand speech and steer cars seemed fanciful at best. But that was before the rise of deep neural networks (DNNs)—a kind of software architecture whose unusual pattern-matching capabilities now power many of today’s artificial intelligence applications, from the speech recognition algorithms in smartphones to the software that helps guide self-driving vehicles. And yet, nobody knows why DNNs work so well on these tasks.
Tomaso Poggio has begun to close this theory gap. Poggio, the Eugene McDermott Professor of Brain and Cognitive Sciences at MIT and director of the Center for Brains, Minds, and Machines (CBMM)—a multi-institutional collaboration headquartered at MIT’s McGovern Institute for Brain Research—has joined with colleagues to author a trio of papers that attempt to bring some of the mysterious, almost magical-seeming features of deep neural networks down to earth.
For example, the first paper explains how deep neural networks are ideally equipped to exploit a certain kind of mathematical structure—called a compositional function—that occurs in knotty pattern-recognition tasks. Discerning an image of a school bus within a grid of pixels, for instance, requires “composing” the image from smaller, nested subcomponents (like edges and basic shapes). This hierarchical structure mirrors the internal structure of DNNs themselves. It’s this hand-in-glove fit between deep neural networks and compositional functions that makes DNNs excel at “intelligent” tasks that stymie traditional computing methods.
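To make the idea concrete, here is a minimal sketch (not drawn from the papers themselves) of what a compositional function looks like: eight inputs combined by simple two-argument constituent functions arranged in a binary tree, so that the output is built from nested subcomponents layer by layer, much as a deep network builds a scene from edges and shapes. The function `h` is an arbitrary stand-in chosen for illustration.

```python
# Illustrative sketch: a compositional function on eight inputs, built
# entirely from two-argument constituent functions arranged in a tree --
# the kind of hierarchical structure deep networks are said to exploit.

def h(a, b):
    # A generic low-dimensional constituent function; any smooth
    # two-argument function could stand in here.
    return max(a, b) + 0.5 * min(a, b)

def compositional_f(x):
    # Layer 1: pair up the 8 raw inputs (like edges from pixels).
    layer1 = [h(x[i], x[i + 1]) for i in range(0, 8, 2)]
    # Layer 2: combine those parts (like basic shapes from edges).
    layer2 = [h(layer1[0], layer1[1]), h(layer1[2], layer1[3])]
    # Layer 3: a single top-level output (like "school bus" from shapes).
    return h(layer2[0], layer2[1])

print(compositional_f([1, 2, 3, 4, 5, 6, 7, 8]))  # → 19.125
```

The point of the sketch is structural: although `compositional_f` takes eight inputs, no single constituent function ever sees more than two of them at once, and each layer of the tree maps directly onto a layer of a deep network.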
Not only does this theory provide a roadmap for what kinds of problems deep-learning networks are ideally equipped to solve—it also sheds light on what kinds of tasks these networks probably won’t handle especially well. In other words, if a complicated task or problem can be described using compositional functions, a deep neural network may be the best computational tool to approach it with. But if the problem’s complexity doesn’t match the language of compositional functions, neural networks won’t handle it any better than other computer architectures will.
The theory may also illuminate similarities between the behavior of DNNs and that of another mysteriously effective pattern-matching system: the human brain. The fact that both deep-learning networks and our own cognitive machinery seem to “prefer” processing compositional functions strikes Poggio as more than mere coincidence. “For certain problems like vision, it’s kind of obvious that [humans] can recognize objects and then put them together in a scene,” he says. “Text and speech have this structure, too. You have letters, you have words, then you compose words in sentences, sentences in paragraphs, and so on. Compositionality is what language is.” If deep neural networks and our own brains are “wired up” in similar ways, Poggio says, “then you would expect our brains to do well with problems that are compositional”—just as deep-learning systems do.
Can a working theory of deep neural networks begin to crack the puzzle of intelligence itself? To Poggio, the question of how intelligence mysteriously arises out of certain arrangements of matter and not others gets at “not only one of the great problems in science, like the origin of the universe—it’s actually the greatest of all, because it means understanding the very tool we use to understand everything else: our mind.”
Portrait: Jason Grow Photography