Enough With ‘Feel Good’ Data Science


Image credit: ifindkarma/Flickr



Your SaaS startup reaches its two-year anniversary, and you lock in a new round of funding. Every measure of customer success is strong. Users report high levels of satisfaction. They log in a lot, they “like” you on Facebook and they read a lot of your emails. In a survey, 90% said they’d recommend your product to a friend. Investors are impressed. Churn is high but acceptable for a young startup, yet over the next six months it fails to improve. Instead, it slowly creeps up to problematic levels, and you can’t understand why.


Startups get blindsided like this when they rely on “feel good” data science: big data analytics that mash up qualitative measurements with quantitative science. Being data-driven is the stated goal of most tech executives, but you can’t be data-driven just because you wave your magical data science hands in the air. If you want to really understand what your customers think, and whether they are primed for upselling, conversion or churn, you need to strictly separate qualitative and quantitative data. It’s time to discover rather than assume what metrics mean, and it’s time to stop dicing customers into imaginary groups.


How to Kill Data Science


We intuitively know that qualitative metrics are unscientific, but they look good. When you take a number like average logins and arbitrarily give it a weight of 20% in your customer success ‘algorithm’, you’re converting it into a qualitative metric. This kills the data science and lulls you into a fantasy.


Unfortunately, that is how most data science is conducted today. All sorts of measurements – logins, time spent in the product, engagement with marketing emails, etc. – are given subjective weights.
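To make the anti-pattern concrete, here is a minimal sketch in Python; the metric names and the 20/30/50 weights are invented for illustration and are not drawn from any real scoring system.

```python
# A minimal sketch of the hand-weighted "customer success" score described
# above. The metrics and the 20/30/50 weights are arbitrary; that
# arbitrariness is exactly the problem.

def feel_good_score(avg_logins, email_open_rate, nps_response):
    """Blend behavioral and self-reported data with subjective weights."""
    return (
        0.20 * avg_logins          # quantitative, but arbitrarily weighted
        + 0.30 * email_open_rate   # quantitative, but arbitrarily weighted
        + 0.50 * nps_response      # self-reported ("would you recommend us?")
    )

# A customer can look "healthy" on this score and still churn next month.
print(feel_good_score(avg_logins=12, email_open_rate=0.8, nps_response=9))
```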


Companies also rely heavily on self-reported data. Customers are often willing to give their satisfaction levels, rate different experiences and declare whether or not they’d recommend the service to a friend. There’s nothing wrong with this data, but if you mash and weight it together with data based on user actions, you spoil the quantitative data.


Stop tricking yourself.


Finding Versus Assigning a Meaning to Data


When it comes to understanding a customer’s probability of upgrading, continuing to pay for your service or unsubscribing, you cannot equate what people say with what they do. Likewise, you can’t impose meaning on quantitative data until you establish correlations between actions and outcomes.


The whole point of big data is to find patterns and trends independent of opinions. However, drive-by data science – occasionally running large-scale data science projects to uncover correlations – is common and misleading because the conclusions begin to decay immediately as your customer base, onboarding process, marketing campaigns and other variables change.


An even bigger problem is the practice of pre-assigning meaning to data. For instance, you could (smartly) assume that your most active users are most likely to upgrade. And you could be wrong.


One way to test such assumptions is to routinely take a random sample of SaaS users and split it into three equal groups: a random control group, the most active users (those who log in most often) and an algorithmically selected group identified as most likely to upgrade by applying machine learning to a large number of behavioral inputs for each customer. Then observe.
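As a rough sketch of how that split might be set up (assuming a pandas DataFrame of per-customer behavioral data and an already-trained upgrade model; every name here is hypothetical):

```python
import pandas as pd

# Assumptions: `users` has one row per customer, a "logins" column and
# behavioral feature columns; `model` is an already-trained classifier
# exposing predict_proba. All names are illustrative.

def build_groups(users: pd.DataFrame, model, feature_cols, group_size=1000, seed=42):
    pool = users.sample(frac=1.0, random_state=seed)  # shuffle the customer base

    # 1. Random control group.
    control = pool.iloc[:group_size]
    pool = pool.iloc[group_size:]

    # 2. Most active users (most logins) among the remainder.
    most_active = pool.nlargest(group_size, "logins")
    pool = pool.drop(most_active.index)

    # 3. Model-selected group: highest predicted probability of upgrading.
    scores = model.predict_proba(pool[feature_cols])[:, 1]
    likely_upgraders = pool.assign(score=scores).nlargest(group_size, "score")

    return control, most_active, likely_upgraders

# A month later, compare the share of actual upgrades in each group.
```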


One month later, the results are always a surprise to our customers. In one typical case with equal-sized groups, 10 members of the random group, 16 of the most active users and 356 people in the likely-upgraders group had upgraded. Logins and overall activity were poor predictors of upgrades, barely better than random selection. Put simply, we can’t assume we know the meaning of quantitative behavior until we interrogate the data.


Cohorts of One


If you’re doing real data science, every user is his or her own cohort. “Men 25 to 40” is just an imaginary and potentially misleading segment. Why age and gender? What about urban versus rural? New York versus Los Angeles? Homeowners versus renters? Segmentation of this kind can continue infinitely. So to predict anything with certainty, reduce each cohort to one individual (or account). Assume no one is the same.


This is the same concept that drives personalization at Amazon, Netflix or Pandora. Their recommendations are based strictly on what you do – they are unconcerned with arbitrary group identities. What you purchase, watch or listen to, and how you do it, is what matters.


A group bigger than one is a myth in data science. Analyze data from thousands of users to find patterns, but apply the insights to individuals, not groups with arbitrary boundaries.
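A minimal sketch of that idea, using synthetic data and an off-the-shelf classifier (the features, outcomes and model choice are all placeholders): learn from everyone’s behavior, then score each account on its own.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for behavioral features (logins, feature usage, email
# engagement) and upgrade outcomes across a few thousand past customers.
X = rng.normal(size=(5000, 3))
y = (X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=5000) > 1).astype(int)

# Patterns are learned across thousands of users...
model = GradientBoostingClassifier().fit(X, y)

# ...but the output is a score per individual account (a "cohort of one"),
# not an average over "men 25 to 40" or any other demographic bucket.
per_account_scores = model.predict_proba(X[:10])[:, 1]
print(per_account_scores.round(3))
```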


If you’re responsible for growing a subscription service and you want to forecast what users will do, you have to rely on real data science. If you’ve been mashing qualitative and quantitative data, assuming meanings for metrics and segmenting huge swaths of users, you can shift course. You can choose to handle your data more scientifically. Data science is now mature enough that it doesn’t need to be scary.


The SaaS space is far too competitive for “feel good” data science. Big data, despite all the hype, will be lethal if it weaves comforting illusions around reality. So if you’re succeeding, know why. If you’re failing, know why and do something about it.


Christopher Gooley is a co-founder of Preact.


