When Google used 16,000 machines to build a simulated brain that could correctly identify cats in YouTube videos, it signaled a turning point in the art of artificial intelligence.
Applying its massive cluster of computers to an emerging breed of AI algorithm known as “deep learning,” the so-called Google brain was twice as accurate as any previous system in recognizing objects pictured in digital images, and it was hailed as another triumph for the mega data centers erected by the kings of the web.
“The research is representative of a new generation of computer science that is exploiting the falling cost of computing and the availability of huge clusters of computers in giant data centers,” The New York Times wrote in 2012, “leading to significant advances in areas as diverse as machine vision and perception, speech recognition, and language translation.”
Indeed, in the two years since, Microsoft released a Skype service that uses deep learning to instantly translate conversations from one language to another, Facebook hired one of the leading experts in the field to boost image recognition and other tools on its service, and everyone from Twitter to Yahoo snapped up their own deep learning startups.
But in the middle of this revolution, a researcher named Alex Krizhevsky showed that you don’t need a massive computer cluster to benefit from this technology’s unique ability to “train itself” as it analyzes digital data. As described in a paper published later that same year, he outperformed Google’s 16,000-machine cluster with a single computer—at least on one particular image recognition test.
This was a rather expensive computer, equipped with large amounts of memory and two top-of-the-line GPU cards, a specialized breed of hardware packed with small processing cores that allow one machine to behave like many. But it was a single machine nonetheless, and it showed that you didn’t need a Google-like computing cluster to exploit the power of deep learning.
Harnessing this AI technology still requires a certain expertise—that’s why the giants of the web are buying up all the talent—and thanks to their massive data centers and deep pockets, the Googles of the world can take this technology to places others can’t. But many data scientists are now using single machines—ordinary consumer machines built for gaming—to solve their own problems via deep learning algorithms.
At Kaggle, a site where data scientists compete to solve problems on behalf of other businesses and organizations, deep learning has become one of the tools of choice, and according to Kaggle chief scientist Ben Hamner, single machines have been used to tackle everything from image analysis and speech recognition to chemoinformatics.
For Richard Socher, a Stanford University researcher who has made extensive use of deep learning in systems that recognize natural language, this is another sign that these AI techniques can trickle down to smaller companies. “It’s very easy to deploy these kinds of models,” Socher says. “Anyone can buy a GPU machine.”
At the same time, startups are beginning to build cloud services that offer deep learning tools, and others are rolling out new software and consulting services to companies outside the giants of the web. This too can help democratize the technology. “There are only so many companies that have datasets the size of Google’s and Facebook’s and Yahoo’s,” says Socher, who has used only single machines in his own deep learning work. “Other, normal companies have smaller datasets, and they can train models too.”
The Rise of the GPU
GPU is short for graphics processing unit. These chips were originally built to quickly generate graphics and other images on behalf of games and other highly visual applications, but because of their ability to handle a certain kind of math calculation, they’re good for all sorts of other tasks. As it turns out, one of these tasks is deep learning.
Deep learning tries to mimic the behavior of neural networks in the human brain. In essence, it creates multi-layered software systems that—if properly configured—can train themselves as they analyze more and more data. Whereas traditional machine learning requires an awful lot of hand-holding from human engineers, deep learning does not.
These multi-layered neural nets involve enormous amounts of computation done in parallel—thus Google’s 16,000 machines—but you can also handle this kind of parallel processing with GPU cores, small processors that can be packed into a single machine in enormous numbers. A top-of-the-line graphics card includes more than 2,000 of these cores.
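To see why the chips suit the task, it helps to look at what a single layer of such a network actually computes: mostly one large matrix multiplication, the kind of arithmetic a GPU’s cores can split up and run side by side. Here is a minimal sketch in Python with NumPy, running on an ordinary CPU purely for illustration—the layer sizes are invented, and this is not code from Krizhevsky’s system:

```python
import numpy as np

# Toy illustration: one layer of a neural net is, at its core, a large
# matrix multiplication followed by a simple nonlinearity. That is exactly
# the kind of arithmetic thousands of GPU cores can compute at the same time.

rng = np.random.default_rng(0)
inputs = rng.standard_normal((256, 1024))   # a batch of 256 examples, 1024 features each
weights = rng.standard_normal((1024, 512))  # connections into a 512-unit layer
biases = np.zeros(512)

# Forward pass through one layer: multiply, add biases, apply ReLU.
activations = np.maximum(0, inputs @ weights + biases)
print(activations.shape)  # (256, 512)
```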
By running deep learning algorithms on a machine with two GPU cards, Alex Krizhevsky could better the performance of 16,000 machines and their primary CPUs, the central chips that drive our computers. The trick lies not only in how the algorithms operate but also in the fact that all those GPU cores sit so close together. Unlike with Google’s massive cluster, he didn’t have to send large amounts of data across a network.
As it turns out, Krizhevsky now works for Google—he was part of a deep learning startup recently acquired by the company—and Google, like other web giants, is exploring the use of GPUs in its own deep learning work. But as Socher explains, the larger point here is that GPUs provide an onramp to deep learning for much smaller outfits.
At Kaggle, data scientists are using deep learning algorithms on $3,000 gaming machines that include a single graphics card. Typically, they’re working on problems involving image and speech recognition, but the technology can help in other areas as well. The first Kaggle competition won by a deep learning machine involved predicting biological responses to certain molecules based on their chemical structure. “They trained on a single system,” Hamner explains. “We take the same technology that’s used for graphics and video games and apply it to scientific purposes.”
A Question of Size
Certainly, there are cases where a 16,000-system cluster is far more useful—to say the least. The likes of Google and Facebook are analyzing enormously large collections of images and digital sound as they train their systems. But if your datasets are smaller, a single system can still provide a level of artificial intelligence that traditional machine learning systems aren’t capable of.
As Socher points out, deep learning involves two stages of computing. There’s the training stage, where a system learns to operate by analyzing data, and then there’s the stage where you actually put the system to work on a problem. The training stage requires more processing power, but in many cases, he says, you can even train systems on single machines. “It all depends on how fast of a turnaround you want,” he says.
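A rough sketch of that split, again in Python: the loop below stands in for the training stage, where a model repeatedly adjusts its weights against labeled data, and the last two lines stand in for deployment, where the finished model is simply applied to new input. A single-layer model fills in for a deep net here, and the data and sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Training stage: the costly part. Synthetic labeled data stands in for a
# real training set.
X = rng.standard_normal((500, 20))        # 500 examples, 20 features each
y = (X @ rng.standard_normal(20) > 0).astype(float)

w = np.zeros(20)
for _ in range(1000):                     # repeatedly adjust the weights
    preds = 1 / (1 + np.exp(-(X @ w)))    # sigmoid predictions
    w -= 0.1 * (X.T @ (preds - y)) / len(y)

# Deployment stage: far cheaper. The learned weights are fixed; scoring a
# new example is a single pass through them.
new_example = rng.standard_normal(20)
print("predicted probability:", 1 / (1 + np.exp(-(new_example @ w))))
```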
The added rub is that, well, the giants of the web are buying up all the deep learning talent, and as Socher says, this talent is still vital in setting up these neural nets. “Training a deep neural net is still just as much an art as a science. Many parameters used to train neural networks are based on intuition.”
That said, many deep learning algorithms are open source, meaning anyone can use them, and various startups, including a San Francisco outfit called Skymind, are working to train data scientists in the vagaries of these algorithms. The Googles and the Facebooks are leading the way in this AI revolution, but many others will follow.