Leading the Way for Big Data Startups, Yahoo Spin-Off Files for IPO


Hortonworks—the big-data startup spun off from Yahoo—has filed for an initial public offering.


The Silicon Valley startup sells support and services for its own version of the open source data-crunching software Hadoop, a mainstay among modern web companies. Its Wall Street debut is a milestone for a much larger effort to create a market for new data center technologies developed at web giants like Google and Facebook, technologies designed to solve unusually large problems involving online data.


Yahoo funded the early development of Hadoop, which was initially based on technologies Google built to help run its search engine, and it quickly found a home outside of Yahoo, at companies like Facebook, eBay, and Twitter, spawning a whole ecosystem of new tools for storing and analyzing large amounts of online data. Along the way, Hortonworks spun out of Yahoo, bringing much of the company’s original Hadoop team with it.


Hortonworks is part of the first generation of open source companies aiming to bring Google-type know-how to the larger market, and now many other startups are nipping at its heels. A new company called Databricks is bringing the next-generation data processing platform Spark to wider audiences, for instance, while Continuuity offers an online service that mimics Facebook’s internal data management platform.


Investors have been bullish on big data startups, and Hadoop-related companies have received much of their attention. Hortonworks has raised $248 million in venture capital to date, and its biggest competitor, Cloudera, which employs Hadoop co-creator Doug Cutting, has raised $1.2 billion. Other competitors include MapR, co-founded by former Google infrastructure lead M.C. Srivas, and the EMC spin-off Pivotal.


Hortonworks has tried to set itself apart by staying as true to its open source roots as possible. While other Hadoop companies have focused on building proprietary Hadoop management tools to add value to the open source project, the startup has focused more on offering services. Although the company’s enterprise version has a few features reserved only for paying customers, the vast majority of its features are open source.


Hortonworks is the first of its peers to offer an IPO, but the competition will continue. “I don’t think that this IPO means that Hortonworks has won,” says Forrester analyst Mike Gualtieri. In fact, Hortonworks has a long way to go before it’s even a profitable company. As pointed out by Recode, Hortonworks has reported an $86.7 million loss on $33.3 million in revenue so far this year.


Gualtieri estimates that the other Hadoop companies are probably making around the same amount. Many companies are testing Hadoop, but they’re still using free versions of the software and won’t have to start paying vendors until the projects move out of the experimental phase and into real commercial use. And though there are many players in the market, along with many new big data tools, he expects there to be huge demand for Hadoop services in coming years. “For what it does,” he says, “Hadoop is still the only game in town.”


Correction 11/10/2014 at 8:25 PM EST: An earlier version of this article incorrectly stated that Doug Cutting is a co-founder of Cloudera. He was an early employee of the company, but not a co-founder.



Sorry, But Technology Alone Can’t Help Us Build a Better World


Can tech bring equality and peace? From left: James Surowiecki, Nandan Nilekani, Jack Dorsey, David Miliband, and Genevieve Bell. Techonomy



Sometimes, when tech VIPs get together to opine about “the future,” the ambition goes a little over-the-top. On the other hand, if you’re going to go there, why not swing for the fences?


Such was the case on Sunday night at Techonomy, a gathering of tech CEOs, startup entrepreneurs, scientists, and assorted big thinkers put on by veteran tech journalist David Kirkpatrick.


In a restaurant lounge at the Ritz Carlton overlooking the Pacific Ocean, the question of the evening was: “Can tech bring equality and peace?” Given the high profiles of the panelists assembled to entertain an answer, you could almost be forgiven for thinking they were going to come up with a definitive yes-or-no.


The group included Jack Dorsey, the founder of Twitter and CEO of Square; anthropologist Genevieve Bell, director of user experience research at Intel; IT mogul Nandan Nilekani, who led a massive project to create and issue a universal ID for hundreds of millions of Indian citizens; and former British foreign secretary David Miliband, one of the U.K.’s most visible public officials and now president and CEO of the International Rescue Committee.


If anything close to a consensus emerged, it was that technology can’t transcend history or politics, and that within the everyday messiness of human lives and conflict, technology is only as good as the people whose hands it’s in. “It certainly can’t alone,” Dorsey said, when asked if technology could bring equality and peace. “To me, technology fundamentally is just a tool. It’s up to us to figure out how to use those tools and how to apply those tools.”


For Dorsey, the question of technology’s capacity to empower us but also keep us down is especially fraught. Twitter is credited with helping to spawn revolutions, but it has also become a funnel for bullying and harassment. Dorsey acknowledged that the tool he helped create—and others like it—could be misused. But he remained committed to the optimistic notion on which he has staked his career: that enabling radically efficient human connections nets out to the good.


“There is the potential for quality of voice, at least,” Dorsey said. “The potential exists that someone could have an idea, that someone anywhere around the world could have an idea, and it could spread instantly.”


The Work to Be Done


On the topic of equality, Bell pointedly noted that she was the only woman on the panel, which was moderated by New Yorker business writer James Surowiecki. The imbalance corresponded closely to the lopsided gender ratios at big tech companies.


Trying to decide whether a technology is a tool or a weapon misses the point, Bell said. “It’s both of these things and neither. It has no agency,” she said. “Technology can’t do the work that we as a society must do ourselves.”


In a way, the ID project led by Nilekani is one of the most striking examples in recent memory of trying to use technology to do that work. The effort to create a unique digital ID for every Indian citizen was intended to make access to everything from healthcare to banking easier in a country that lacked the equivalent of, for example, the Social Security number given to every U.S. child at birth. But Nilekani would not make a blanket claim for technology as an automatic enabler of a better life.


“A lot of it goes down to how you design it for empowerment,” he said. Toward the end of the conversation, Kirkpatrick pointed out that Tesla CEO Elon Musk and a kid on the subway both have the same iPhone. A kind of equality, right? Nilekani responded that such parity doesn’t mean much when 100 million kids can’t read. “The iPhone 9 isn’t going to solve that problem.”


Instant Translation


One problem Miliband hoped a future iteration of the iPhone could solve was instant translation. As the head of a global organization with teams posted to refugee crises around the world, he said that the ability to speak and be understood universally could be a powerful tool. “That is a democratizing and equalizing change,” he said.


Miliband also praised technologies like Twitter as a way to draw near-immediate attention to hypocrisy and atrocities. Miliband, who has his own complicated relationship with the Edward Snowden leaks, said that technology has great potential for abuse by the powerful. But it also offers a way to push back.


“The means of secrecy are always renewed,” he said, “but they’re under much greater pressure.”



Enough With ‘Feel Good’ Data Science




Your SaaS startup reaches its two-year anniversary, and you lock a new round of funding. Every measure of customer success is strong. Users report high levels of satisfaction. They log in a lot, they “like” you on Facebook and they read a lot of your emails. In a survey, 90% said they’d recommend your product to a friend. Investors are impressed. Churn is at a high but acceptable level for a young startup, but over the next six months, it fails to improve. Instead, it slowly creeps up to problematic levels – and you can’t understand why.


Startups get blindsided like this when they rely on “feel good” data science: big data analytics that mash up qualitative measurements with quantitative science. Being data-driven is the stated goal of most tech executives, but you can’t be data-driven just because you wave your magical data science hands in the air. If you want to really understand what your customers think, and whether they are primed for upselling, conversion or churn, you need to strictly separate qualitative and quantitative data. It’s time to discover rather than assume what metrics mean, and it’s time to stop dicing customers into imaginary groups.


How to Kill Data Science


We intuitively know that qualitative metrics are unscientific, but they look good. When you take a number like average log-ins and arbitrarily give it a weight of 20% in your customer success ‘algorithm’, you’re converting it into a qualitative metric. This kills the data science and lulls you into a fantasy.


Unfortunately, that is how most data science is conducted today. All sorts of measurements – logins, time spent in the product, engagement with marketing emails, etc. – are given subjective weights.


Companies also rely heavily on self-reported data. Customers are often willing to give their satisfaction levels, rate different experiences and declare whether or not they’d recommend the service to a friend. There’s nothing wrong with this data, but if you mash it together with data based on user actions and weight the result, you spoil the quantitative data.


Stop tricking yourself.


Finding Versus Assigning a Meaning to Data


When it comes to understanding a customer’s probability of upgrading, continuing to pay for your service or unsubscribing, you cannot equate what people say with what they do. Likewise, you can’t impose meaning on quantitative data until you establish correlations between actions.


The whole point of big data is to find patterns and trends independent of opinions. However, drive-by data science – occasionally running large-scale data science projects to uncover correlations – is common and misleading because the conclusions begin to decay immediately as your customer base, onboarding process, marketing campaigns and other variables change.


An even bigger problem is the practice of pre-assigning meaning to data. For instance, you could (smartly) assume that your most active users are most likely to upgrade. And you could be wrong.


One way to test such assumptions is to routinely take random samples of SaaS users and split them into three groups: a random control group, the most active users (those who log in most) and an algorithmically selected group that we identify as most likely to upgrade by applying machine learning to a large number of behavioral inputs for each customer. Then observe.
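
To make the setup concrete, here is a minimal Python sketch of such a three-way split. It assumes each user record carries a login count and a precomputed `upgrade_score`; that score stands in for the output of whatever machine-learning model is used and is purely hypothetical, as are the function and field names.

```python
import random


def split_into_groups(users, group_size, seed=0):
    """Return (control, most_active, likely_upgraders) as lists of user ids.

    `users` maps a user id to a dict with two illustrative fields:
    "logins" (int) and "upgrade_score" (float, a hypothetical model output).
    """
    rng = random.Random(seed)

    # Random control group: drawn without regard to any behavior.
    control = rng.sample(sorted(users), group_size)

    # Most active users: ranked only by how often they log in.
    most_active = sorted(
        users, key=lambda uid: users[uid]["logins"], reverse=True
    )[:group_size]

    # Likely upgraders: ranked by the (assumed) model score.
    likely_upgraders = sorted(
        users, key=lambda uid: users[uid]["upgrade_score"], reverse=True
    )[:group_size]

    return control, most_active, likely_upgraders
```

In this sketch the three groups may overlap, since a user can be both highly active and highly scored; a stricter experiment might remove overlaps before observing the groups.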


One month later, the results are always a surprise to our customer. In one typical case with equal-sized groups, 10 members of the random group, 16 of the most active users and 356 people in the likely-upgraders group had upgraded. Logins and overall activity were a poor predictor of upgrades and barely better than random selection. Put simply, we can’t assume to know the meaning of quantitative behavior until we interrogate the data.
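
As a rough check on those figures, the short Python sketch below divides each group's upgrade count by the control group's. Because the groups are described as equal-sized, the ratio of counts equals the ratio of upgrade rates. The group labels and the function name are illustrative, not taken from the original analysis.

```python
# Upgrade counts after one month, as reported in the example above.
upgrades = {
    "random control": 10,
    "most active": 16,
    "likely upgraders": 356,
}


def lift_over_control(counts, control="random control"):
    """Return each group's upgrade count as a multiple of the control group's."""
    base = counts[control]
    return {name: n / base for name, n in counts.items()}


lift = lift_over_control(upgrades)
for name, value in lift.items():
    print(f"{name}: {value:.1f}x the control group's upgrades")
```

By this measure the most active users upgraded 1.6 times as often as the random group, while the algorithmically selected group upgraded 35.6 times as often.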


Cohorts of One


If you’re doing real data science, every user is his or her own cohort. Men 25 to 40 is just an imaginary and potentially misleading segment. Why age and gender? What about urban versus rural? New York versus Los Angeles? Home owners versus renters? Segmentation of this kind can continue infinitely. So to predict anything with certainty, reduce each cohort to one individual (or account). Assume no one is the same.


This is the same concept that drives personalization at Amazon, Netflix or Pandora. Their recommendations are based strictly on what you do – they are unconcerned with arbitrary group identities. What you purchase, watch or listen to, and how you do it, is what matters.


A group bigger than one is a myth in data science. Analyze data from thousands of users to find patterns, but apply the insights to individuals, not groups with arbitrary boundaries.


If you’re responsible for growing a subscription service – if you want to forecast and predict what users will do based on data science – you have to rely on real data science. If you’ve been mashing qualitative and quantitative data, assuming meanings for metrics and segmenting huge swaths of users, you can shift course. You can choose to handle your data more scientifically. Data science is now mature enough that it doesn’t need to be scary.


The SaaS space is far too competitive for feel good data science. Big data, despite all the hype, will be lethal if it weaves comforting illusions around reality. So if you’re succeeding, know why. If you’re failing, know why and do something about it.


Christopher Gooley is a co-founder of Preact.



Verdict Overturned for Italian Geoscientists Convicted of Manslaughter


Judge Fabrizia Ida Francabandera, center, reads the appeal decision in L’Aquila, Italy, Monday, Nov. 10, 2014. Sandro Perozzi/AP



An appeals court in Italy has overturned the 2012 manslaughter conviction handed down to seven prominent scientists and engineers following a devastating earthquake in 2009. The decision came as a surprise—and a relief—to many of the accused’s colleagues, who worried that pressure from the community, victims’ families, and local press would compel the court to agree with the earlier decision.


The original conviction, handed down in October 2012, found the seven men guilty of manslaughter after a magnitude 6.3 earthquake killed 309 people in the Italian mountain town of L’Aquila. They each received a six-year sentence, two more than the prosecutor had requested, for not properly assessing the seismic risk and informing the public.


The decision was based on what the scientists said, and didn’t say, in the days leading up to the earthquake. For much of the winter and early spring of 2009, the mountain town of L’Aquila was shaking. A phenomenon known as a seismic swarm was delivering thousands of small earthquakes to the region, many of them significant enough to send glasses crashing to kitchen floors. On March 31, the head of the country’s Civil Protection Department (essentially Italy’s FEMA) asked a group of experts to convene a meeting in L’Aquila to assess the situation and speak to local officials about the risk at hand.


Some of the men were part of a group known as the Serious Risks Commission, a group of distinguished scientists that advises the government on matters such as earthquakes, floods, and nuclear hazards. Although the commission usually met behind closed doors, the L’Aquila meeting was also attended by local officials. Both before and after the meeting, some of the scientists spoke to the media, which was also atypical.


Six days later, a major earthquake destroyed much of the city and killed 309 people. Three years later, each of the men received a six-year sentence for manslaughter, in what critics deemed one of the biggest science-on-trial cases in ages.


Today, after a surprisingly swift-by-Italian-standards appeals process, the three-judge panel acquitted six of the men. The seventh, Bernardo De Bernardinis, received a two-year sentence for causing the death of some, but not all, of the 29 victims involved in the trial.


Much of the case, and subsequent appeal, hinged on an especially moronic statement made by De Bernardinis on the day of the now infamous meeting in L’Aquila preceding the quake. At the time, he was the number two official at the Civil Protection Department. In a television interview, De Bernardinis—whose training is in hydrology, not seismology—was asked if the swarm was a sign of worse to come.


“On the contrary,” he said. “The scientific community assures me that the situation is good because of the continuous discharge of energy.”


As a stand-alone comment, it does sound reassuring, but almost all seismologists would say this is rubbish. Worse, the brief clip from that interview was aired after the experts met on the afternoon of March 31, leaving the false impression that it was a summary of their opinions, not a rogue misstatement from an official who should have known better.


But to go from this to causing the deaths of dozens of people was too much of a stretch for the appeals court. For the people of L’Aquila, there is little solace in this decision: Their city is still in shambles and so many loved ones are gone. But for science, and anyone who thinks scientists should be free to advise on matters of public policy and safety without fear of legal repercussions, today is a good day.



Thousands of never-before-seen human genome variations uncovered

Thousands of never-before-seen genetic variants in the human genome have been uncovered using a new genome sequencing technology. These discoveries close many human genome mapping gaps that have long resisted sequencing.



The technique, called single-molecule, real-time DNA sequencing (SMRT), may now make it possible for researchers to identify potential genetic mutations behind many conditions whose genetic causes have long eluded scientists, said Evan Eichler, professor of genome sciences at the University of Washington, who led the team that conducted the study.


"We now have access to a whole new realm of genetic variation that was opaque to us before," Eichler said.


Eichler and his colleagues report their findings Nov. 10 in the journal Nature.


To date, scientists have been able to identify the genetic causes of only about half of inherited conditions. This puzzle has been called the "missing heritability problem." One reason for this problem may be that standard genome sequencing technologies cannot map many parts of the genome precisely. These approaches map genomes by aligning hundreds of millions of small, overlapping snippets of DNA, typically about 100 bases long, and then analyzing their DNA sequences to construct a map of the genome.


This approach has successfully pinpointed millions of small variations in the human genome. These variations arise from substitution of a single nucleotide base, called a single-nucleotide polymorphism, or SNP. The standard approach also made it possible to identify very large variations, typically involving segments of DNA that are 5,000 bases long or longer. But for technical reasons, scientists had previously not been able to reliably detect variations whose lengths are in between -- those ranging from about 50 to 5,000 bases in length.
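
The size ranges described above can be summarized in a few lines of Python. This is only an illustrative sketch: the article gives the boundaries as approximate ("about 50 to 5,000 bases"), so the exact cutoffs, the labels and the function name here are assumptions.

```python
def size_class(length_in_bases):
    """Place a variant into one of the article's approximate size ranges."""
    if length_in_bases < 1:
        raise ValueError("length must be at least one base")
    if length_in_bases >= 5000:
        # Very large variations: identifiable with the standard approach.
        return "large"
    if length_in_bases >= 50:
        # The in-between range that could not be reliably detected before.
        return "intermediate"
    # Below roughly 50 bases; the article only discusses single-base
    # substitutions (SNPs) in this range.
    return "small"


for length in (1, 300, 5000):
    print(length, size_class(length))
```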


The SMRT technology used in the new study makes it possible to sequence and read DNA segments longer than 5,000 bases, far longer than standard gene sequencing technology.


This "long-read" technique, developed by Pacific Biosciences of California, Inc. of Menlo Park, Calif., allowed the researchers to create a much higher resolution structural variation map of the genome than has previously been achieved. Mark Chaisson, a postdoctoral fellow in Eichler's lab and lead author on the study, developed the method that made it possible to detect structural variants at the base pair resolution using this data.


To simplify their analysis, the researchers used the genome from a hydatidiform mole, an abnormal growth caused when a sperm fertilizes an egg that lacks the DNA from the mother. The fact that the mole genome contains only one copy of each gene, instead of the two copies that exist in a normal cell, simplifies the search for genetic variation.


Using the new approach in the hydatidiform mole genome, the researchers were able to identify and sequence 26,079 segments that were different from a standard human reference genome used in genome research. Most of these variants, about 22,000, have never been reported before, Eichler said.


"These findings suggest that there is a lot of variation we are missing," he said.


The technique also allowed Eichler and his colleagues to map some of the more than 160 segments of the genome, called euchromatic gaps, that have defied previous sequencing attempts. Their efforts closed 50 of the gaps and narrowed 40 others.


The gaps include some important sequences, Eichler said, including parts of genes and regulatory elements that help control gene expression. Some of the DNA segments within the gaps show signatures that are known to be toxic to Escherichia coli, the bacterium that is commonly used in some genome sequencing processes.


Eichler said, "It is likely that if a sequence of this DNA were put into an E. coli, the bacteria would delete the DNA." This may explain why it could not be sequenced using standard approaches. He added that the gaps also carry complex sequences that are not well reproduced by standard sequencing technologies.


"The sequences vary extensively between people and are likely hotspots of genetic instability," he explained.


For now, SMRT technology will remain a research tool because of its high cost, about $100,000 per genome.


Eichler predicted, "In five years there might be a long-read sequence technology that will allow clinical laboratories to sequence a patient's chromosomes from tip to tip and say, 'Yes, you have about three to four million SNPs and insertions deletions but you also have approximately 30,000-40,000 structural variants. Of these, a few structural variants and a few SNPs are the reason why you're susceptible to this disease.' Knowing all the variation is going to be a game changer."



Enough With ‘Feel Good’ Data Science


datascience_660

ifindkarma/Flickr



Your SaaS startup reaches its two-year anniversary, and you lock a new round of funding. Every measure of customer success is strong. Users report high levels of satisfaction. They log in a lot, they “like” you on Facebook and they read a lot of your emails. In a survey, 90% said they’d recommend your product to a friend. Investors are impressed. Churn is at a high but acceptable level for a young startup, but over the next six months, it fails to improve. Instead, it slowly creeps up to problematic levels – and you can’t understand why.


Startups get blindsided like this when they rely on “feel good” data science: big data analytics that mashup qualitative measurements with quantitative science. Being data-driven is the stated goal of most tech executives, but you can’t be data-driven just because you wave your magical data science hands in the air. If you want to really understand what your customers think, and whether they are prime for upselling, conversion or churn, you need to strictly separate qualitative and quantitative data. It’s time to discover rather than assume what metrics mean, and it’s time to stop dicing customers into imaginary groups.


How to Kill Data Science


We intuitively know that qualitative metrics are unscientific, but they look good. When you take a number like average log-ins and arbitrarily give it a weight of 20% in your customer success ‘algorithm’, you’re converting it into a qualitative metric. This kills the data science and lulls you into a fantasy.


Unfortunately, that is how most data science is conducted today. All sorts of measurements – logins, time spent in the product, engagement with marketing emails, etc. – are given subjective weights.
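To see why subjective weights undermine the numbers, consider a minimal sketch in Python. The two customers, their metric values and both weightings are invented purely for illustration; none of them come from a real product or from the author's own tooling.

```python
# Hypothetical, normalized metrics (0 to 1) for two made-up customers.
customers = {
    "acme": {"logins": 0.9, "time_in_product": 0.2, "email_engagement": 0.3},
    "globex": {"logins": 0.3, "time_in_product": 0.8, "email_engagement": 0.7},
}


def score(metrics, weights):
    """Weighted sum of metrics: the kind of hand-tuned 'algorithm' described above."""
    return sum(metrics[name] * weight for name, weight in weights.items())


def ranking(weights):
    """Customer names ordered from highest to lowest weighted score."""
    return sorted(customers, key=lambda c: score(customers[c], weights), reverse=True)


# Two equally arbitrary weightings, each summing to 1.
weights_a = {"logins": 0.6, "time_in_product": 0.2, "email_engagement": 0.2}
weights_b = {"logins": 0.2, "time_in_product": 0.4, "email_engagement": 0.4}

print(ranking(weights_a))  # acme ranks first
print(ranking(weights_b))  # globex ranks first
```

The underlying measurements never change between the two runs. Only the opinion encoded in the weights does, and the "healthiest" customer flips with it.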


Companies also rely heavily on self-reported data. Customers are often willing to give their satisfaction levels, rate different experiences and declare whether or not they’d recommend the service to a friend. There’s nothing wrong with this data, but if you mash it together with data based on user actions and weight the mix, you spoil the quantitative data.


Stop tricking yourself.


Finding Versus Assigning a Meaning to Data


When it comes to understanding a customer’s probability of upgrading, continuing to pay for your service or unsubscribing, you cannot equate what people say with what they do. Likewise, you can’t impose meaning on quantitative data until you establish correlations between actions.


The whole point of big data is to find patterns and trends independent of opinions. However, drive-by data science – occasionally running large-scale data science projects to uncover correlations – is common and misleading because the conclusions begin to decay immediately as your customer base, onboarding process, marketing campaigns and other variables change.


An even bigger problem is the practice of pre-assigning meaning to data. For instance, you could (smartly) assume that your most active users are most likely to upgrade. And you could be wrong.


One way to find out is to routinely take random samples of SaaS users and split them into three groups: a random control group, the most active users (those who log in most) and an algorithmically selected group identified as most likely to upgrade by applying machine learning to a large number of behavioral inputs for each customer. Then observe.


One month later, the results are always a surprise to our customer. In one typical case with equal-sized groups, 10 members of the random group, 16 of the most active users and 356 people in the likely-upgraders group had upgraded. Logins and overall activity were a poor predictor of upgrades and barely better than random selection. Put simply, we can’t assume to know the meaning of quantitative behavior until we interrogate the data.
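The gap between the groups is easier to judge as a ratio against the random control. The short Python sketch below uses only the three upgrade counts quoted above. The group size is not given, so it compares counts directly, which is a fair comparison of rates only because the groups are stated to be equal in size. The function name is an illustrative choice.

```python
# Upgrade counts reported above for three equal-sized groups.
upgrades = {"random": 10, "most_active": 16, "likely_upgraders": 356}


def lift_over_random(counts, baseline="random"):
    """Ratio of each group's upgrades to the baseline group's upgrades.

    This equals the ratio of upgrade rates only when all groups are the same size.
    """
    base = counts[baseline]
    return {group: n / base for group, n in counts.items()}


for group, lift in lift_over_random(upgrades).items():
    print(f"{group}: {lift:.1f}x")
```

The most active users upgrade at 1.6 times the random rate, while the algorithmically selected group upgrades at 35.6 times that rate.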


Cohorts of One


If you’re doing real data science, every user is his or her own cohort. Men 25 to 40 is just an imaginary and potentially misleading segment. Why age and gender? What about urban versus rural? New York versus Los Angeles? Home owners versus renters? Segmentation of this kind can continue infinitely. So to predict anything with certainty, reduce each cohort to one individual (or account). Assume no one is the same.


This is the same concept that drives personalization at Amazon, Netflix or Pandora. Their recommendations are based strictly on what you do – they are unconcerned with arbitrary group identities. What you purchase, watch or listen to, and how you do it, is what matters.


A group bigger than one is a myth in data science. Analyze data from thousands of users to find patterns, but apply the insights to individuals, not groups with arbitrary boundaries.


If you’re responsible for growing a subscription service – if you want to forecast and predict what users will do based on data science – you have to rely on real data science. If you’ve been mashing qualitative and quantitative data, assuming meanings for metrics and segmenting huge swaths of users, you can shift course. You can choose to handle your data more scientifically. Data science is now mature enough that it doesn’t need to be scary.


The SaaS space is far too competitive for feel good data science. Big data, despite all the hype, will be lethal if it weaves comforting illusions around reality. So if you’re succeeding, know why. If you’re failing, know why and do something about it.


Christopher Gooley is a co-founder of Preact.



Verdict Overturned for Italian Geoscientists Convicted of Manslaughter


Judge Fabrizia Ida Francabandera, center, reads the appeal decision in L’Aquila, Italy, Monday, Nov. 10, 2014. Sandro Perozzi/AP



An appeals court in Italy has overturned the 2012 manslaughter conviction handed down to seven prominent scientists and engineers following a devastating earthquake in 2009. The decision came as a surprise—and a relief—to many of the accused’s colleagues, who worried that pressure from the community, victims’ families, and local press would compel the court to agree with the earlier decision.


The original conviction, handed down in October 2012, found the seven men guilty of manslaughter after a magnitude 6.3 earthquake killed 309 people in the Italian mountain town of L’Aquila. They each received a six-year sentence, two more than the prosecutor had requested, for not properly assessing the seismic risk and informing the public.


The decision was based on what the scientists said, and didn’t say, in the days leading up to the earthquake. For much of the winter and early spring of 2009, the mountain town of L’Aquila was shaking. A phenomenon known as a seismic swarm was delivering thousands of small earthquakes to the region, many of them significant enough to send glasses crashing to kitchen floors. On March 31, the head of the country’s Civil Protection Department (essentially Italy’s FEMA) asked a group of experts to convene a meeting in L’Aquila to assess the situation and speak to local officials about the risk at hand.


Some of the men were part of a group known as the Serious Risks Commission, a group of distinguished scientists that advises the government on matters such as earthquakes, floods, and nuclear hazards. Although the commission usually met behind closed doors, the L’Aquila meeting was also attended by local officials. Both before and after the meeting, some of the scientists spoke to the media, which was also atypical.


Six days later, a major earthquake destroyed much of the city and killed 309 people. Three years later, each of the men received a six-year sentence for manslaughter, in what critics deemed one of the biggest science-on-trial cases in ages.


Today, after a surprisingly swift-by-Italian-standards appeals process, the three-judge panel acquitted six of the men. The seventh, Bernardo De Bernardinis, received a two-year sentence for causing the death of some, but not all, of the 29 victims involved in the trial.


Much of the case, and subsequent appeal, hinged on an especially moronic statement made by De Bernardinis on the day of the now infamous meeting in L’Aquila preceding the quake. At the time, he was the number two official at the Civil Protection Department. In a television interview, De Bernardinis—whose training is in hydrology, not seismology—was asked if the swarm was a sign of worse to come.


“On the contrary,” he said. “The scientific community assures me that the situation is good because of the continuous discharge of energy.”


As a stand-alone comment, it does sound reassuring, but almost all seismologists would say this is rubbish. Worse, the brief clip from that interview was aired after the experts met on the afternoon of March 31, leaving the false impression that it was a summary of their opinions, not a rogue misstatement from an official who should have known better.


But to go from this to causing the deaths of dozens of people was too much of a stretch for the appeals court. For the people of L’Aquila, there is little solace in this decision: Their city is still in shambles and so many loved ones are gone. But for science, and anyone who thinks scientists should be free to advise on matters of public policy and safety without fear of legal repercussions, today is a good day.



President Obama Calls On FCC to Uphold Net Neutrality


Image: Free Press/CC



President Barack Obama has called on the Federal Communications Commission to lay down “the strongest possible rules” to protect net neutrality, saying that internet service providers should not be allowed to pick winners and losers in the online marketplace.


On Monday, in a move likely to please operations such as Netflix and Google that offer video and other content over the net, while raising the ire of big-name ISPs such as Comcast and Verizon, the White House released a statement and video in which President Obama urges the FCC to reclassify internet service as a utility under Title II of the Communications Act of 1934 and to return to rules that would prevent ISPs from restricting what you can do and see online.


“I believe the FCC should create a new set of rules protecting net neutrality and ensuring that neither the cable company nor the phone company will be able to act as a gatekeeper,” the statement reads.


In 2010, the FCC established a set of rules designed to protect net neutrality (the notion that all internet traffic should be treated equally), but last year, after a suit from Verizon, a federal court struck down these 2010 rules. Earlier this year, the FCC issued a new proposal that seemed to undermine net neutrality. This was met with protests from internet activists and companies such as Netflix, with many saying the proposal would allow ISPs such as Comcast or Verizon to throttle video and other content streamed across the net by Netflix and others.


If the FCC reclassifies internet service as a utility under Title II, making it something akin to telephone service, electricity, or water, the agency would have the legal authority to lay down net neutrality rules. But some worry that this would end up slowing the expansion of the internet, giving ISPs less incentive to expand their networks.


In his statement, Obama acknowledged that the FCC is an independent agency, free to address net neutrality as it sees fit, but he made his own stance clear, providing the agency with more political freedom to push back against the big ISPs. “The rules I am asking for are simple, common-sense steps that reflect the Internet you and I use every day, and that some ISPs already observe,” his statement continues.


The President called for new rules that prevent ISPs from blocking or throttling content and from prioritizing certain content for a fee. “No service should be stuck in a ‘slow lane’ because it does not pay a fee,” the statement says. “That kind of gatekeeping would undermine the level playing field essential to the Internet’s growth. So, as I have before, I am asking for an explicit ban on paid prioritization and any other restriction that has a similar effect.”


He also said that the notion of net neutrality should apply not only to the “last mile” connections between ISPs and consumers, but to the connections between the various networks at the heart of the internet. “The connection between consumers and ISPs—the so-called ‘last mile’—is not the only place some sites might get special treatment,” he says. “I am also asking the FCC to…if necessary to apply net neutrality rules to points of interconnection between the ISP and the rest of the internet.”


This last stance could mark a change in the very notion of net neutrality, which many argue should only apply to the last mile. Certainly, ISPs also have the power to affect how traffic flows between back-end network providers, though the situation is rather complicated. It’s so complicated, the debate will likely continue for years to come.



DarkHotel: A Sophisticated New Hacking Attack Targets High-Profile Hotel Guests


This hotel is not implicated in the DarkHotel attacks. It is shown here to stand in for all luxury hotels. Flickr: L'HOTEL PORTO BAY SÃO PAULO



The hotel guest probably never knew what hit him. When he tried to get online using his five-star hotel’s WiFi network, he got a pop-up alerting him to a new Adobe software update. When he clicked to accept the download, he got a malicious executable instead.


What he didn’t know was that the sophisticated attackers who targeted him had been lurking on the hotel’s network for days waiting for him to check in. They uploaded their malware to the hotel’s server days before his arrival, then deleted it from the hotel network days after he left.


That’s the conclusion reached by researchers at Kaspersky Lab and the third-party company that manages the WiFi network of the unidentified hotel where the guest stayed, located somewhere in Asia. Kaspersky says the attackers have been active for at least seven years, conducting surgical strikes against targeted guests at other luxury hotels in Asia as well as infecting victims via spear-phishing attacks and P2P networks.


Kaspersky researchers named the group DarkHotel, but they’re also known as Tapaoux by other security firms who have been separately tracking their spear-phishing and P2P attacks. The attackers have been active since at least 2007, using a combination of highly sophisticated methods and pedestrian techniques to ensnare victims, but the hotel hacks appear to be a new and daring development in a campaign aimed at high-value targets.


“Every day this is getting bigger and bigger,” says Costin Raiu, manager of Kaspersky’s Global Research and Analysis Team. “They’re doing more and more hotels.” The majority of the hotels that are hit are in Asia but some are in the U.S. as well. Kaspersky will not name the hotels but says they’ve been uncooperative in assisting with the investigation.


“This Is NSA-Level Infection Mechanism”


The attackers’ methods include the use of zero-day exploits to target executives in spear-phishing attacks as well as a kernel-mode keystroke logger to siphon data from victim machines. They also managed to crack weak digital signing keys to generate certificates for signing their malware, in order to make malicious files appear to be legitimate software.


“Obviously, we’re not dealing with an average actor,” says Raiu. “This is a top-class threat actor. Their ability to do the kernel-mode key logger is rare, the reverse engineering of the certificate, the leveraging of zero days—that puts them in a special category.”


“Their targeting is nuclear themed, but they also target the defense industry base in the U.S.”


Targets in the spear-phishing attacks include high-profile executives—among them a media executive from Asia—as well as government agencies and NGOs and U.S. executives. The primary targets, however, appear to be in North Korea, Japan, and India. “All nuclear nations in Asia,” Raiu notes. “Their targeting is nuclear themed, but they also target the defense industry base in the U.S. and important executives from around the world in all sectors having to do with economic development and investments.” Recently there has been a spike in the attacks against the U.S. defense industry.


The attackers seem to take a two-pronged approach: using the P2P campaign to infect as many victims as possible, and then the spear-phishing and hotel attacks for surgically targeted attacks. In the P2P attacks thousands of victims are infected with botnet malware during the initial stage, but if the victim turns out to be interesting, the attackers go a step further to place a backdoor on the system to exfiltrate documents and data.


Until recently, the attackers had about 200 command-and-control servers set up to manage the operation. Kaspersky managed to sinkhole 26 of the command server domains and even gained access to some of the servers, where they found unprotected logs identifying thousands of infected systems. A lot of the machines in the attackers’ logs, however, turned out to be sandboxes set up by researchers to ensnare and study botnets, showing how indiscriminate the attackers were in their P2P campaign. The attackers shut down much of their command infrastructure in October, presumably after becoming aware that the Kaspersky researchers were tracking them.


“As far as I can see there was an emergency shut down,” Raiu says. “I think there is a lot of panic over this.”


Signs Point to South Korea


That panic may be because the campaign shows signs of possibly emanating from an important U.S. ally: South Korea. Researchers point out that one variant of malware the attackers used was designed to shut down if it found itself on a machine whose codepage was set to Korean. The key logger the attackers used also has Korean characters inside and appears to have ties to a coder in South Korea. The sophisticated nature of the key logger as well as the attack on the RSA keys indicates that DarkHotel is likely a nation-state campaign—or at least a nation-state supported campaign. If true, this would make the attack against the U.S. defense industry awkward, to say the least.


Raiu says the key logger, a kernel-mode logger, is the best written and most sophisticated logger he’s seen in his years as a security researcher. Kernel-mode malware is rare and difficult to pull off. Operating at the core of the machine, rather than at the user level where most software applications run, allows the malware to better bypass antivirus scanners and other detection systems. But kernel-mode malware requires a skillful touch since it can easily crash a system if not well designed.


“You have to be very skilled in kernel-level development and this is already quite a rare skillset,” says Vitaly Kamluk, principal security researcher at Kaspersky Lab. “Then you have to make it very stable…. It must be very stable and very well tested.”


There’s no logical reason to use a kernel-level keylogger, says Raiu, since it’s so easy to write key loggers that hook the Windows API using about four lines of code. “But these guys prefer to do a kernel-level keylogger, which is about 300 kilobytes in size—the driver for the key logger—which is pretty crazy and very unusual. So the guy who did it is super confident in his coding skills. He knows that his code is top-notch.”


The logger, which was created in 2007, appears to have been written by someone who goes by the name “Chpie”—a name that appears in source code for the logger. Chpie is the name used by a South Korean coder who is known to have created another kernel-level key logger that Raiu says appears to be an earlier version of this one. The key logger in the DarkHotel attack uses some of the same source code but is more sophisticated, as if it’s an upgraded version of the earlier keylogger.


Aside from the sophisticated key logger, the attackers’ use of digital certificates to sign their malware also points to a nation-state or nation-state supported actor. The attackers found that a certificate authority belonging to the Malaysian government as well as Deutsche Telekom were using weak 512-bit signing keys. The small key size allowed the attackers, with a little super-computing power, to factor the 512-bit RSA keys (essentially re-engineer them) to generate their own digital certificates to sign their malware.
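Why factoring a key amounts to re-engineering it can be shown with a toy example. The Python sketch below uses the small textbook modulus 3233, not any key from the DarkHotel case, and naive trial division. Real 512-bit keys require far more sophisticated factoring algorithms and substantial computing power. The point is only that once the two prime factors are known, the private exponent follows from simple arithmetic.

```python
def factor(n):
    """Trial division: adequate for a toy modulus, hopeless for real key sizes."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p, n // p
        p += 1
    raise ValueError("n is prime")


# Toy public key (textbook-sized numbers, NOT the keys from the article).
n, e = 3233, 17

p, q = factor(n)                # recovers the secret primes 53 and 61
phi = (p - 1) * (q - 1)         # Euler's totient of n
d = pow(e, -1, phi)             # private exponent via modular inverse (Python 3.8+)

message = 65
signature = pow(message, d, n)  # "sign" with the recovered private exponent
assert pow(signature, e, n) == message  # verifies against the public key
```

Anyone holding only the public values `n` and `e` can verify the signature, which is why a forged signature made with a recovered key looks legitimate.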


“You very rarely, if ever, see such techniques used by APT (advanced persistent threat) groups,” Raiu says. “Nobody else as far as we know has managed to do something similar, despite the fact that these certificates existed for some time…. This is [an] NSA-level infection mechanism.”


These sophisticated elements of the attack are important, but the most intriguing part of the DarkHotel campaign is the hotel operation.


Unravelling the Mystery of DarkHotel


The Kaspersky researchers first became aware of the hotel attacks last January when they got reports through their automated system about a cluster of customer infections. They traced the infections to the networks of a couple of hotels in Asia. Kamluk traveled to the hotels to see if he could determine how guests were being infected, but nothing happened to his machine. The hotels proved to be of no help when Kamluk told them what was happening to guests. But during his stay, he noticed that both hotels used the same third-party firm to manage their guest WiFi.


Some hotels own and operate their network infrastructure; others use a managed services firm. The company managing the WiFi network of the two hotels Kamluk visited wishes to remain anonymous, but it was an unusually willing partner in getting to the bottom of the attacks. It acted quickly to provide Kaspersky with server images and logs to track down the attackers.


Although the attackers left very few traces, “There were certain command lines which should not have been there in the hotel system,” a senior executive with the managed-services company says.


In one case, the researchers found a reference to a malicious Windows executable in the directory of a Unix server. The file itself was long gone, but a reference pointing to its former existence remained. “[T]here was a file-deletion record and a timestamp of when it happened,” says Kamluk. Judging from traces left behind, the attackers had operated outside normal business hours to place their malware on the hotel system and infect guests.


“They started early in the morning before the hotel staff would arrive to the office and then after they leave the office they were also distributing the malware then,” says the senior executive. “This is not just something that happened yesterday. These are people who have been taking their time. They’ve been trying to access networks over the last years.”


It’s unclear how many other hotels they’ve attacked, but it appears the hackers cherry-pick their targets, only hitting hotels where they know their victims will be staying.


When victims attempt to connect to the WiFi network, they get a pop-up alert telling them their Adobe Flash player needs an update and offering them a file, digitally signed to make it look authentic, to download. If the victims accept the download, they get a Trojan delivered instead. Crucially, the alerts pop up before guests actually get onto the WiFi network, so even if they abandon their plan to get online, they are infected the moment they hit “accept.” The malware doesn’t then immediately go to work. Instead it sits quietly for six months before waking up and calling home to a command-and-control server. Raiu says this is likely meant to circumvent the watchful eyes of IT departments who would be on the lookout for suspicious behavior immediately after an executive returned from a trip to Asia.


At some of the hotels, only a few victims appear to have been targeted. But on other systems, it appears the attackers targeted a delegation of visitors; in that instance, evidence shows they tried to hit every device attempting to get online during a specific period of time.


“Seems like some event occurred or maybe some delegation visited the hotel and stayed there for a few days and they tried to hit as many members of the delegation as possible,” Raiu says. He thinks the victims were ones the attackers couldn’t reach through ordinary spearphishing attacks—perhaps because their work networks were carefully protected.


Kaspersky still doesn’t know how the attackers get onto the hotel servers. They don’t live on the servers the way criminal hackers do—that is, maintain backdoor access to the servers to gain re-entry over an extended period of time. The DarkHotel attackers come in, do their deed, then erase all evidence and leave. But in the logs, the researchers found no backdoors on the systems, so either the attackers never used them or successfully erased any evidence of them. Or they had an insider who helped them pull off the attacks.


The researchers don’t know exactly who the attackers were targeting in the identified hotel attacks. Guests logging onto WiFi often have to enter their last name and room number in the WiFi login page, but neither Kaspersky, nor the company that maintained the WiFi network, had access to the guest information. Reports that come into Kaspersky’s automated reporting system from customers are anonymous, so Kaspersky is seldom able to identify a victim beyond an IP address.


The number of hotels that have been hit is also unknown. So far the researchers have found fewer than a dozen hotels with infection indicators. “Maybe there are some hotels that … use to be infected and we just cannot learn about that because there are no traces,” the network-management executive says.


The company worked with Kaspersky to scour all of the hotel servers it manages for any traces of malware and is “fairly confident that the malware doesn’t sit on any hotel server today.” But that is just one network-management company. Presumably, the DarkHotel operation is still active on other networks.


Safeguarding against such an attack can be difficult for hotel guests. The best defense is to double check update alerts that pop up on your computer during a stay in a hotel. Go to the software vendor’s site directly to see if an update has been posted and download it directly from there. Though, of course, this won’t help if the attackers are able to redirect your machine to a malicious download site.
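One way to double check a download is to compare its hash with a checksum the vendor publishes. The Python sketch below assumes the vendor does publish a SHA-256 checksum (not all do) and that you obtained it over a connection you trust. Neither assumption comes from the Kaspersky research, and the function names are illustrative. As noted above, the check is of little use if the attackers control the site serving both the file and the checksum.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """True if the file's digest equals the checksum copied from the vendor's site."""
    return hmac.compare_digest(sha256_of(path), published_hex.strip().lower())
```

A caller would pass the path of the downloaded installer and the checksum string copied from the vendor's page, and refuse to run the file if the function returns `False`.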



DarkHotel: A Sophisticated New Hacking Attack Targets High-Profile Hotel Guests


This hotel is not implicated in the DarkHotel attacks. It is shown here to stand in for all luxury hotels.

This hotel is not implicated in the DarkHotel attacks. It is shown here to stand in for all luxury hotels. Flickr: L'HOTEL PORTO BAY SÃO PAULO



The hotel guest probably never knew what hit him. When he tried to get online using his five-star hotel’s WiFi network, he got a pop-up alerting him to a new Adobe software update. When he clicked to accept the download, he got a malicious executable instead.


What he didn’t know was that the sophisticated attackers who targeted him had been lurking on the hotel’s network for days waiting for him to check in. They uploaded their malware to the hotel’s server days before his arrival, then deleted it from the hotel network days after he left.


That’s the conclusion reached by researchers at Kaspersky Lab and the third-party company that manages the WiFi network of the unidentified hotel where the guest stayed, located somewhere in Asia. Kaspersky says the attackers have been active for at least seven years, conducting surgical strikes against targeted guests at other luxury hotels in Asia as well as infecting victims via spear-phishing attacks and P2P networks.


Kaspersky researchers named the group DarkHotel, but they’re also known as Tapaoux by other security firms who have been separately tracking their spear-phishing and P2P attacks. The attackers have been active since at least 2007, using a combination of highly sophisticated methods and pedestrian techniques to ensnare victims, but the hotel hacks appear to be a new and daring development in a campaign aimed at high-value targets.


“Every day this is getting bigger and bigger,” says Costin Raiu, manager of Kaspersky’s Global Research and Analysis Team. “They’re doing more and more hotels.” The majority of the hotels that are hit are in Asia but some are in the U.S. as well. Kaspersky will not name the hotels but says they’ve been uncooperative in assisting with the investigation.


“This Is NSA-Level Infection Mechanism”


The attackers’ methods include the use of zero-day exploits to target executives in spear-phishing attacks as well as a kernel-mode keystroke logger to siphon data from victim machines. They also managed to crack weak digital signing keys to generate certificates for signing their malware, in order to make malicious files appear to be legitimate software.


“Obviously, we’re not dealing with an average actor,” says Raiu. “This is a top-class threat actor. Their ability to do the kernel-mode key logger is rare, the reverse engineering of the certificate, the leveraging of zero days—that puts them in a special category.”


“Their targeting is nuclear themed, but they also target the defense industry base in the U.S.”


Targets in the spear-phishing attacks include high-profile executives—among them a media executive from Asia—as well as government agencies and NGOs and U.S. executives. The primary targets, however, appear to be in North Korea, Japan, and India. “All nuclear nations in Asia,” Raiu notes. “Their targeting is nuclear themed, but they also target the defense industry base in the U.S. and important executives from around the world in all sectors having to do with economic development and investments.” Recently there has been a spike in the attacks against the U.S. defense industry.


The attackers seems to take a two-pronged approach—using the P2P campaign to infect as many victims as possible and then the spear-phishing and hotel attacks for surgically targeted attacks. In the P2P attacks thousands of victims are infected with botnet malware during the initial stage, but if the victim turns out to be interesting, the attackers go a step further to place a backdoor on the system to exfiltrate documents and data.


Until recently, the attackers had about 200 command-and-control servers set up to manage the operation. Kaspersky managed to sinkhole 26 of the command server domains and even gained access to some of the servers, where they found unprotected logs identifying thousands of infected systems. A lot of the machines in the attackers’ logs, however, turned out to be sandboxes set up by researchers to ensnare and study botnets, showing how indiscriminating the attackers were in their P2P campaign. The attackers shut down much of their command infrastructure in October, however, presumably after becoming aware that the Kaspersky researchers were tracking them


“As far as I can see there was an emergency shut down,” Raiu says. “I think there is a lot of panic over this.”


Signs Point to South Korea


That panic may be because the campaign shows signs of possibly emanating from an important U.S. ally: South Korea. Researchers point out that one variant of malware the attackers used was designed to shut down if it found itself on a machine whose codepage was set to Korean. The key logger the attackers used also has Korean characters inside and appears to have ties to a coder in South Korea. The sophisticated nature of the key logger as well as the attack on the RSA keys indicates that DarkHotel is likely a nation-state campaign—or at least a nation-state supported campaign. If true, this would make the attack against the U.S. defense industry awkward, to say the least.


Raiu says the key logger, a kernel-mode logger, is the best written and most sophisticated logger he’s seen in his years as a security researcher. Kernel-mode malware is rare and difficult to pull off. Operating at the core of the machine rather than the user level where most software applications run, allows the malware to better bypass antivirus scanners and other detection systems. But kernel-mode malware requires a skillful touch since it can easily crash a system if not well-designed.


“You have to be very skilled in kernel-level development and this is already quite a rare skillset,” says Vitaly Kamluk, principal security researcher at Kaspersky Lab. “Then you have to make it very stable…. It must be very stable and very well tested.”


There’s no logical reason to use a kernel-level keylogger says Raiu since it’s so easy to write key loggers that hook the Windows API using about four lines of code. “But these guys prefer to do a kernel-level keylogger, which is about 300 kilobytes in size—the driver for the key logger—which is pretty crazy and very unusual. So the guy who did it is super confident in his coding skills. He knows that his code is top-notch.”


The logger, which was created in 2007, appears to have been written by someone who goes by the name “Chpie”—a name that appears in source code for the logger. Chpie is the name used by a South Korean coder who is known to have created another kernel-level key logger that Raiu says appears to be an earlier version of this one. The key logger in the DarkHotel attack uses some of the same source code but is more sophisticated, as if it’s an upgraded version of the earlier keylogger.


Aside from the sophisticated key logger, the attackers’ use of digital certificates to sign their malware also points to a nation-state or nation-state-supported actor. The attackers found that certificate authorities belonging to the Malaysian government and Deutsche Telekom were using weak 512-bit signing keys. The small key size allowed the attackers, with a little super-computing power, to factor the 512-bit RSA keys (essentially re-engineer them) and generate their own digital certificates to sign their malware.
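To see why factoring matters, consider a toy sketch of the arithmetic. An RSA public key contains a modulus that is the product of two secret primes; anyone who recovers those primes can rebuild the private key and produce signatures that verify as genuine. The example below is only an illustration under stated assumptions: the key (3233, 17) is a hypothetical textbook-sized value, the function names are invented for this sketch, and trial division stands in for the far heavier computation a real 512-bit modulus would require. It is not the attackers’ actual method or tooling.

```python
# Toy sketch: why a factored RSA modulus lets an outsider sign as the key's owner.
# All numbers and function names here are hypothetical. A real 512-bit modulus is
# far too large for trial division and needs a dedicated factoring effort.


def factor_semiprime(n):
    """Return (p, q) for a small odd semiprime n using trial division."""
    f = 3
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 2
    raise ValueError("no odd factor found")


def recover_private_exponent(n, e):
    """Rebuild the private exponent d from nothing but the public key (n, e)."""
    p, q = factor_semiprime(n)
    phi = (p - 1) * (q - 1)
    return pow(e, -1, phi)  # modular inverse; requires Python 3.8+


def sign(digest, d, n):
    """Textbook RSA signature (no padding) over a small integer digest."""
    return pow(digest, d, n)


def verify(digest, signature, e, n):
    """Check a textbook RSA signature using only the public key."""
    return pow(signature, e, n) == digest


# Hypothetical public key: n = 53 * 61, public exponent e = 17.
n, e = 3233, 17
d = recover_private_exponent(n, e)

# A signature produced with the recovered key passes the public check.
forged = sign(65, d, n)
print(verify(65, forged, e, n))
```

Real certificate signatures add padding and hashing on top of this arithmetic, but the weakness is the same: once the modulus is factored, the public key no longer protects anything.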


“You very rarely, if ever, see such techniques used by APT (advanced persistent threat) groups,” Raiu says. “Nobody else as far as we know has managed to do something similar, despite the fact that these certificates existed for some time…. This is [an] NSA-level infection mechanism.”


These sophisticated elements of the attack are important, but the most intriguing part of the DarkHotel campaign is the hotel operation.


Unravelling the Mystery of DarkHotel


The Kaspersky researchers first became aware of the hotel attacks last January when they got reports through their automated system about a cluster of customer infections. They traced the infections to the networks of a couple of hotels in Asia. Kamluk traveled to the hotels to see if he could determine how guests were being infected, but nothing happened to his machine. The hotels proved to be of no help when Kamluk told them what was happening to guests. But during his stay, he noticed that both hotels used the same third-party firm to manage their guest WiFi.


Some hotels own and operate their network infrastructure; others use a managed services firm. The company managing the WiFi network of the two hotels Kamluk visited wishes to remain anonymous, but it was an unusually willing partner in getting to the bottom of the attacks. It acted quickly to provide Kaspersky with server images and logs to track down the attackers.


Although the attackers left very few traces, “There were certain command lines which should not have been there in the hotel system,” a senior executive with the managed-services company says.


In one case, the researchers found a reference to a malicious Windows executable in the directory of a Unix server. The file itself was long gone, but a reference pointing to its former existence remained. “[T]here was a file-deletion record and a timestamp of when it happened,” says Kamluk. Judging from traces left behind, the attackers had operated outside normal business hours to place their malware on the hotel system and infect guests.


“They started early in the morning before the hotel staff would arrive to the office and then after they leave the office they were also distributing the malware then,” says the senior executive. “This is not just something that happened yesterday. These are people who have been taking their time. They’ve been trying to access networks over the last years.”


It’s unclear how many other hotels they’ve attacked, but it appears the hackers cherry-pick their targets, only hitting hotels where they know their victims will be staying.


When victims attempt to connect to the WiFi network, they get a pop-up alert telling them their Adobe Flash player needs an update and offering them a file to download, digitally signed to make it look authentic. If the victims accept the download, they get a Trojan delivered instead. Crucially, the alerts pop up before guests actually get onto the WiFi network, so even if they abandon their plan to get online, they are infected the moment they hit “accept.” The malware doesn’t immediately go to work. Instead it sits quietly for six months before waking up and calling home to a command-and-control server. Raiu says this is likely meant to circumvent the watchful eyes of IT departments that would be on the lookout for suspicious behavior immediately after an executive returned from a trip to Asia.


At some of the hotels, only a few victims appear to have been targeted. But on other systems, it appears the attackers targeted a delegation of visitors; in that instance, evidence shows they tried to hit every device attempting to get online during a specific period of time.


“Seems like some event occurred or maybe some delegation visited the hotel and stayed there for a few days and they tried to hit as many members of the delegation as possible,” Raiu says. He thinks the victims were ones the attackers couldn’t reach through ordinary spearphishing attacks—perhaps because their work networks were carefully protected.


Kaspersky still doesn’t know how the attackers get onto the hotel servers. They don’t live on the servers the way criminal hackers do, that is, by maintaining backdoor access to the servers to gain re-entry over an extended period of time. The DarkHotel attackers come in, do their deed, then erase all evidence and leave. In the logs, the researchers found no backdoors on the systems, so either the attackers never used them or they successfully erased any evidence of them. Or they had an insider who helped them pull off the attacks.


The researchers don’t know exactly who the attackers were targeting in the identified hotel attacks. Guests logging onto WiFi often have to enter their last name and room number in the WiFi login page, but neither Kaspersky nor the company that maintained the WiFi network had access to the guest information. Reports that come into Kaspersky’s automated reporting system from customers are anonymous, so Kaspersky is seldom able to identify a victim beyond an IP address.


The number of hotels that have been hit is also unknown. So far the researchers have found fewer than a dozen hotels with infection indicators. “Maybe there are some hotels that … used to be infected and we just cannot learn about that because there are no traces,” the network-management executive says.


The company worked with Kaspersky to scour all of the hotel servers it manages for any traces of malware and is “fairly confident that the malware doesn’t sit on any hotel server today.” But that is just one network-management company. Presumably, the DarkHotel operation is still active on other networks.


Safeguarding against such an attack can be difficult for hotel guests. The best defense is to double-check update alerts that pop up on your computer during a hotel stay. Go to the software vendor’s site to see if an update has been posted and download it directly from there. Of course, this won’t help if the attackers are able to redirect your machine to a malicious download site.



This Device Diagnoses Hundreds of Diseases Using a Single Drop of Blood


rHEALTH X1. XPRIZE Foundation



The digital health revolution is still stuck.


Tech giants are jumping into the fray with fitness offerings like Apple Health and Google Fit, but there’s still not much in the way of, well, actual medicine. The Fitbits and Jawbones of the world measure users’ steps and heart rate, but they don’t get into the deep diagnostics of, say, biomarkers, the internal indicators that can serve as an early warning sign of a serious ailment. For now, those who want to screen for a disease or measure a medical condition with clinical accuracy still need to go to the doctor.


Dr. Eugene Chan and his colleagues at the DNA Medical Institute (DMI) aim to change that. Chan’s team has created a portable handheld device that can diagnose hundreds of diseases using a single drop of blood with what Chan claims is gold-standard accuracy. Known as rHEALTH, the technology was developed over the course of seven years with grants from NASA, the National Institutes of Health, and the Bill and Melinda Gates Foundation. On Monday, the team received yet another nod (and more funding) as the winners of this year’s Nokia Sensing XChallenge, one of several competitions run by the moonshot-seeking XPrize Foundation.



In the New New World of Tech, This Company Is Banking on Longevity for the Win




There are almost infinite ways to find things to do on a Friday night. You could check Ticketmaster for movie times, Yelp for nearby restaurants, or OpenTable for a place with seating. If you’re in another city or country, there’s TripAdvisor, or you could just Google whatever you had in mind and see what turns up.


Rarely, however, does such planning include a quick search of Zerve.com. But Scott Newman, founder of Zerve, believes that will change. His article of faith? Newman and his company have been around for what in the context of tech counts as a long, long time—more than a decade. Now that the internet industry, despite its obsession with new new things, isn’t so new anymore, Zerve hopes the experience gained in the online trenches will start to count as a virtue.


Despite its relative obscurity, Zerve is not some fresh-faced startup hoping to beat veterans at their own game. Zerve itself is the veteran in this space. For the last 12 years, the New York City company has been quietly building one of the country’s largest databases of local events and activities. Thousands of local event companies—the kind that offer walking tours and teach wine and cheese classes—use Zerve’s software to manage their businesses and book reservations, much like restaurants use OpenTable. And yet, throughout its more than decade-long history, Zerve has focused exclusively on luring more business clients.


Now it’s making a play for the consumer market, too.


On Monday, the company is debuting a new version of Zerve.com that Newman says will help people answer the question: What should I do tonight? Users can choose from 32 cities, select a date and time of day, and find a list of things to do, from food and wine festivals in town to special shows and performances.


One Thing to All People


In going after the consumer market, however, Zerve is setting itself up to battle formidable competitors. It’s asking consumers, in essence, to break the Yelp-Google cycle and instead start their search on Zerve. That’s a big ask for a company with next to no brand recognition. And yet, Newman is hoping that Zerve’s narrow expertise will make it a more valuable starting point. “There’s more to life than dinner and a movie,” he says. “We want to make it easy to find all those other things around you that you didn’t know existed in the first place.”


But what’s more interesting than what Zerve is doing now is how it got here. Zerve is a rare example these days of a tech company that’s become successful by doing just one thing, and doing it really well. More often, it seems, tech companies are looking over their shoulders, trying to mimic what the other guy is doing.


But even as Yelp’s star began rising in the late 2000s, years after Zerve launched, Newman says he was never tempted to make the company another all-purpose review site. “We tried to ignore the well-funded industries that already have tons of companies going after them,” he says. “We didn’t feel the need to fight it out.”


Business Takes Time


Zerve also was bootstrapped for its first 10 years and only recently raised $18 million from the likes of Draper Fisher Jurvetson and Yahoo founder Jerry Yang. By sticking to one niche and remaining independent of investors looking for a quick exit, Zerve was able to develop relationships with businesses who were willing to hand over real-time data about what events are running when. Zerve also says it has amassed more reviews about the companies it serves than all the other platforms combined. Now that Zerve is launching a consumer product, that information will be crucial to serving up more relevant search results.


Zerve’s move to embrace consumers is an especially appealing prospect to the company’s thousands of existing business clients. “I’ve been bugging them for a long time to do this,” says Rick Scarano, founder of the Classic Harbor Line boat tours. Scarano has been using Zerve since 2005 to track customers and monitor reviews. “I have no way of quantifying how much of the success we’ve had can be attributed to them, but what I do know is they’ve made our lives a lot easier,” Scarano says. Now, he hopes the new site will help attract a lot more business, too.


