Hand a toddler a little red ball, and they will probably learn to recognize the next ball they encounter as a ball — even if it’s blue, or twice as large, or doesn’t bounce.
Show the same ball to a computer algorithm, and it might need a thousand other images of a ball to do the same thing.
Bridging the gulf between these abilities is the challenge that connects the work of David Crandall, a professor at the Indiana University Luddy School of Informatics, Computing and Engineering, with that of Linda Smith and T. Rowan Candy, professors in the College of Arts and Sciences’ Department of Psychological and Brain Sciences and the IU School of Optometry, respectively. Together, they are studying how children learn and process information visually.
To conduct their research, Smith and Candy’s labs study the behavior of toddlers wearing head-mounted cameras as they play with toys and interact with caregivers. The resulting footage provides a wealth of visual information from children’s point of view as they explore the world around them. Crandall’s lab uses that data to glean insights into challenges in the fields of computer vision and artificial intelligence.
“Obviously the ‘hardware’ we’re studying — the computer versus the baby’s brain — is totally different,” Crandall said. “The whole mechanism is completely different; it’s electrical versus biological. But if you imagine both as a ‘black box’ learning machine, then you can find commonalities. What are the inputs? What is the training data? What are the system’s motivations? What is it trying to achieve? You can glean a lot of insights that are directly applicable to AI.”
One of the key insights to emerge from Smith’s lab has been that learning and the physical world are strongly linked. Toddlers don’t just see the ball; they feel it, they turn it in their hands to examine it from different angles and lighting conditions, and they might even smell or taste it. A single image or video, which is all that’s available to artificial intelligence, can’t match the richness of information that results from that kind of encounter with an object, Crandall said.
“It turns out that being embodied provides some significant learning advantages,” he said. “Kids get multimodal input. They get sounds, they get touch, they get the other senses. They get to manipulate their environment. Computers can’t do any of that — or at least not yet.”
Drawing on insights from Smith and Candy’s labs, Crandall has successfully improved some computational learning models. For example, he said, AI researchers tend to take a “more is more” approach to training data. To teach a computer to recognize a car, for instance, they might show it every possible example of a car they can find on the Internet. But data from the toddler learning labs suggests that too much variety causes confusion. Instead, Crandall’s research has found that AI can be trained most efficiently with a mix of images that fall within certain boundaries of similarity.
“You want some diversity in your data, but not too many ‘outliers,’” Crandall said. “We’ve found that this combination yields better results with fewer images. That’s very different from the typical computer vision data set.”
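The article doesn’t spell out how such similarity boundaries are computed, but one way to picture the idea is as a band-pass filter over image embeddings. The snippet below is purely illustrative, not Crandall’s actual method: the `select_training_images` helper, the 0.4 and 0.9 cutoffs, and the random stand-in embeddings are all assumptions; a real pipeline would use features from a pretrained image encoder and tuned thresholds.

```python
import numpy as np

def select_training_images(embeddings, low=0.4, high=0.9):
    """Keep images whose cosine similarity to the class centroid falls
    inside a band: varied enough to generalize, but free of extreme
    outliers and redundant near-duplicates."""
    # Normalize each embedding to unit length so dot products are
    # cosine similarities.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    centroid = unit.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    # Similarity of every candidate image to the "typical" example.
    sims = unit @ centroid
    # Band-pass filter: drop outliers (below `low`) and
    # near-duplicates of the centroid (above `high`).
    return np.flatnonzero((sims >= low) & (sims <= high))

# Demo with synthetic embeddings: a shared "ball-ness" direction plus
# per-image noise of varying strength.
rng = np.random.default_rng(0)
prototype = rng.normal(size=512)
noise = rng.normal(size=(1000, 512)) * rng.uniform(0.2, 3.0, size=(1000, 1))
candidates = prototype + noise
selected = select_training_images(candidates)
print(f"kept {selected.size} of 1000 candidate images")
```

The two cutoffs mirror the quote: the lower one removes the “outliers” Crandall mentions, while the upper one keeps some diversity by discarding images that add almost no new information.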
This insight can be applied to other projects in Crandall’s lab. For example, using computer vision to identify faulty electronics in military hardware — a partnership with Naval Surface Warfare Center Crane — requires training machines to recognize visual information for which there are few existing examples.
“You can’t just go online and download 10 million images of counterfeit parts if you want to teach a machine to recognize faulty microchips,” Crandall said. “You need to make your training as data efficient as possible.”
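The article doesn’t describe how the Crane project achieves that efficiency, but a common way to learn from a handful of images is transfer learning: reuse a network pretrained on abundant everyday images and fit only a small classification head on the scarce examples. The sketch below is a generic illustration under that assumption, not the Crane system; it uses PyTorch and torchvision, an ImageNet-pretrained ResNet-18, and random stand-in tensors in place of real chip images.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained backbone and freeze it, so the few
# labeled examples only have to teach the final layer.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in backbone.parameters():
    param.requires_grad = False

# Replace the 1000-class ImageNet head with a 2-class head
# (hypothetically: genuine part vs. suspect part).
backbone.fc = nn.Linear(backbone.fc.in_features, 2)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in batch; a real pipeline would load the scarce labeled
# images with standard 224x224 preprocessing.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))

backbone.train()
loss = loss_fn(backbone(images), labels)
loss.backward()
optimizer.step()
print(f"one training step, loss = {loss.item():.3f}")
```

Because only the final layer’s few thousand parameters are trained, a few dozen labeled images can be enough to get a usable classifier — the kind of data efficiency the quote calls for.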
Crandall’s work with infant learning data goes back about 10 years, he added, starting with a collaboration with Smith and Chen Yu, who was then also a professor of psychological and brain sciences at IU studying infant learning using visual data.
That effort focused on a comparatively rudimentary challenge: how to teach a computer to label objects in the blurry and chaotic footage that results from strapping a camera to the head of an energetic toddler. Prior to the collaboration with Crandall’s lab, the only practical way to annotate this data was painstaking, time-consuming human effort.
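For context, here is what automated labeling of such footage can look like today with an off-the-shelf pretrained detector. This is a modern convenience, not the model Crandall’s lab built a decade ago, and the “frame” here is a random stand-in tensor rather than a real head-camera image.

```python
import torch
from torchvision import models
from torchvision.models.detection import FasterRCNN_ResNet50_FPN_Weights

# Pretrained Faster R-CNN detector with COCO object categories.
weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
detector = models.detection.fasterrcnn_resnet50_fpn(weights=weights)
detector.eval()

# Stand-in "frame": a real pipeline would decode frames from the
# head-camera video and convert them to float tensors in [0, 1].
frame = torch.rand(3, 480, 640)

with torch.no_grad():
    predictions = detector([frame])[0]

# Report detections above a confidence threshold, mapped to
# human-readable category names (random input will likely yield none).
categories = weights.meta["categories"]
for label, score in zip(predictions["labels"], predictions["scores"]):
    if score > 0.5:
        print(f"{categories[int(label)]}: {score:.2f}")
```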
The project later grew into a more robust collaboration with support from the IU Bloomington Office of the Provost. Insights from the work have since gone on to inform the collaboration with Crane as well as other efforts to apply AI to creating more effective and engaging learning environments.
“A decade ago was a great time to start exploring these questions,” Crandall said. “That’s when deep learning and neural networks ‘revolutionized’ AI, and interest in the field really took off. But, despite what all of the hype around AI would have you believe, I think AI research is really in its infancy. And I think the path forward — to make AI that is safe, effective and reliable — is to better understand how kids learn and how people think.”