It’s 2025 (give or take), and the long-awaited Big One has hit the San Francisco Bay Area. In the frenetic aftermath, teams of specialized rescue workers begin tearing through piles of wreckage—searching for signs of life, administering care, and calling for backup.
These first responders aren’t Red Cross volunteers or paramedics. As Stanford University’s leading AI scientist Fei-Fei Li imagines it, they’re robots with the smarts to “see” through their immediate surroundings and respond to humans in need, saving the maximum number of lives they can. The enabling technology behind this scenario is one Li has thought about and researched deeply—and it’s not too far off, she argues, if computers can master what is arguably humankind’s most complicated cognitive ability: vision.
Current research, led by Li and the Stanford Artificial Intelligence Laboratory she directs, has already gotten us partially there, thanks to a database of more than 15 million digital images built in 2009. Each year since, researchers have used the database for the Large Scale Visual Recognition Challenge, a competition to develop algorithms that can teach computers to identify and understand the content of the images. In 2014, participants’ software programs recognized objects and actions with nearly double the accuracy of previous years, thanks to faster computing and smarter code. In late 2014, Li and her students produced one of the first computer vision models capable of generating human-like sentences to describe an image it “sees.”
Computer vision, Li argues, is the key enabling technology for all of AI. “Understanding vision and building visual systems is really understanding intelligence,” says Li in her office at Stanford’s Gates Computer Science Building. “And by see, I mean to understand, not just to record pixels.”
A New Kind of Brainpower
There’s a simple reason why AI scientists—not just Li and academics, but researchers at Google, Facebook and Microsoft—are pouring resources into computer-vision technology. We use half of our precious human brainpower for visual processing; it’s a cognitive capability that has taken 540 million years of evolution to develop. Li points to her head and jokes: “This real estate is pricier than Bay Area housing.” Vision is so critical to how we understand the world, Li argues, that it’s hard to imagine any intelligent computer of the future without it. Any decent self-driving car will eventually need to distinguish between, say, a large rock in the roadway and a similar-sized paper bag—and to know that it should brake and steer to avoid the rock but ignore the bag.
Today, computers can spot a cat or tell us the make, model, and year of a car in a photo, but they’re still a long way from seeing and reasoning like humans—understanding context, not just content. (A bat means something very different on a youth baseball field than at a crime scene.) “The next step for my lab,” Li says, “is to build the cognitive capability we need in fundamental vision tasks like understanding scenes, human behaviors, and relationships, and reasoning and telling stories.”
Illuminating Humanity’s “Dark Matter”
Teaching computers to see has applications well beyond identifying things that merely appear in our physical world. Better machine vision could reveal details and insights about us that we don’t even know. Every day, the Internet generates what Li calls the “dark matter of the digital age”—trillions of images and videos and other scraps of digital minutiae. More than 85 percent of content on the Web is multimedia imagery—and it’s a chaotic mess. “There is a fundamental reason that we need to understand this,” she says. “The recording of our lives, our daily activities, our relationships—be it my personal life or what’s going on in society—is in this content.”
Those visual descriptors of humankind are growing faster than we can imagine. Humanity has generated more photos and videos in the past 30 days than in all the years since the dawn of civilization. It’s humanly impossible to document all of this data, but intelligent machines that recognize patterns and can describe visual content in natural language could well be our future historians.
While Li says computer vision will eventually impact everything from monitoring and combating the effects of climate change to building smart homes, she’s most excited about its medical applications. “The day healthcare can fully embrace AI is the day we have a revolution in terms of cutting costs and improving care,” she says.
Small wonder that Li and students at the Stanford Computer Vision Lab are working with Stanford Medical School and Hospitals to relieve nurses of mundane charting tasks, on which the average American nurse spends 45 minutes every day. In Stanford Hospital’s ICU, clinicians check on gravely ill patients every two hours and score their health on a scale of -4 to 4. Li says she wants to build a system to continuously monitor each patient (detecting mobility, pain level, and alertness, for instance), not only to relieve busy nurses and doctors, but also to provide denser, more accurate, and unbiased data to the clinicians who oversee the patient’s care.
The Vision Lab is also working with San Francisco nursing homes to figure out how AI can help seniors live more independently.
As Diversity Advances, So Will the Tech
Like any new tech innovation, computer vision has the potential to be used for nefarious purposes, starting with high-level, highly intrusive visual surveillance. Li doesn’t take the issue lightly. “Every technology can be an enabler of vices,” she says, “but as a scientist you have to have that social awareness and be very aware of these potential risks.”
Such risks are deeply intertwined with what Li calls the crisis of her professional life—the lack of diversity in technology research and AI, from corporations to academia. Solving the diversity issue long-term, she says, will help preserve the benevolent direction of research and mitigate the dark-side risks. “We need to inject humanism into our AI education and research by injecting all walks of life into the process,” she says, adding that attracting diverse groups to the field will provide the needed checks and balances and keep values front and center.
“From the day an idea is conceptualized to the day the technology is built, carried out and regulated, it’s important to have that human awareness,” she says. But that’s not the way it works today. Although she’s the director, Li is the only full-time female faculty member—out of 15—at the Stanford AI Lab. (Elsewhere, the 39-person Facebook AI Research (FAIR) team includes just two women.) And although Li is working to change that—she holds afternoon teas for women in AI and is organizing an inaugural AI summer camp for ninth-grade girls at Stanford—she admits that, like her own research, progress on diversity has a long way to go.