4 Ways You Can Mess Up Your Predictive Analytics Algorithms

Predictive analytics algorithms are the holy grail of analytics. Every business owner dreams of opening their analytics dashboard to find pinpoint-perfect predictions about the future of their business. But, like the Wizard of Oz, predictive analytics algorithms can be both great and terrible. When they’re great, they help you prepare for your business’ future. When they’re terrible, they mislead you about upcoming earnings.

So what are the mistakes that can lead to inaccurate predictions?

Bad data

Here’s where the old saying “garbage in, garbage out” applies. Predictive analytics algorithms can only work with the data they’re given, so if the input data is biased, or there are too few data points, or you aren’t following data hygiene best practices, the predictions are going to come out skewed. The takeaway here is that Statistics 101 guidelines for good data apply even when advanced algorithms are working their magic. Comprehensive, reliable, accurate, consistent data is always a good idea.

When your predictive algorithms are using machine learning, this advice is especially important. Machine learning algorithms, as the name suggests, learn from the data they’re given. If they’re trained on bad data, they’ll draw incorrect conclusions, then build on those incorrect conclusions until they’re totally lost in the weeds.
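As a rough illustration of what a basic data hygiene check might look like before any data reaches a predictive algorithm, here is a minimal Python sketch. The function name, the field names, and the 30-row minimum are all hypothetical placeholders, not part of any particular product; the right checks and thresholds depend on your own data.

```python
def data_quality_report(rows, required_fields, min_rows=30):
    """Count basic problems in a list of dict records before modelling.

    min_rows=30 is an arbitrary placeholder threshold, not a rule.
    """
    seen = set()
    missing = 0
    duplicates = 0
    for row in rows:
        # A row is incomplete if any required field is absent or None.
        if any(row.get(field) is None for field in required_fields):
            missing += 1
        # Rows whose required fields all match an earlier row count as duplicates.
        key = tuple(row.get(field) for field in required_fields)
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {
        "rows": len(rows),
        "rows_with_missing_values": missing,
        "duplicate_rows": duplicates,
        "too_few_rows": len(rows) < min_rows,
    }


# Hypothetical order records: one has a missing amount, one is a repeat.
orders = [
    {"order_id": 1, "amount": 29.0},
    {"order_id": 2, "amount": None},
    {"order_id": 1, "amount": 29.0},
]
report = data_quality_report(orders, ["order_id", "amount"])
print(report)
```

A report like this won’t catch biased data, which usually takes human judgment, but it can flag the mechanical problems early.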

Being too optimistic

Humans, as a rule, are more optimistic than computers. If asked to make a guess based on our gut instinct, we’re biased towards predicting an outcome that is favorable to us. This can be a problem if, for example, you’re running a subscription business and using LimeLight’s predicted subscription revenue dashboard. A predictive dashboard like this requires you to enter your estimated average rebill rate, but if you input 73% when reality leans more towards 68%, the prediction output will look great, but it won’t be accurate. And because a rebill rate applies at every billing cycle, a gap of a few percentage points grows with each cycle you project.
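To get a feel for how much a few percentage points can matter, here is a small Python sketch. It assumes a deliberately simple model in which the same rebill rate applies at every billing cycle and no new subscribers join; this is an illustration only, not the formula any particular dashboard uses, and the subscriber count, price, and number of cycles are made-up figures.

```python
def projected_revenue(subscribers, price, rebill_rate, cycles):
    """Total revenue over `cycles` billing cycles.

    Assumes every subscriber pays in the first cycle and a constant
    fraction (`rebill_rate`) of them rebills in each following cycle.
    """
    total = 0.0
    active = float(subscribers)
    for _ in range(cycles):
        total += active * price
        active *= rebill_rate  # only this share is billed again next cycle
    return total


# Hypothetical business: 1,000 subscribers at $30 over six billing cycles.
optimistic = projected_revenue(1000, 30.0, 0.73, 6)
realistic = projected_revenue(1000, 30.0, 0.68, 6)
overstatement = (optimistic - realistic) / realistic

print(f"Optimistic estimate: ${optimistic:,.0f}")
print(f"Realistic estimate:  ${realistic:,.0f}")
print(f"Overstated by:       {overstatement:.1%}")
```

Under these assumptions, a five-point difference in the rebill rate inflates the six-cycle revenue estimate by a bit over 10%, and the gap widens as more cycles are added.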

Applying the predictions too far out into the future

While we’d love for predictive analytics algorithms to hand us our revenue estimates for the next five years on a silver platter, those estimates just aren’t going to be that reliable. Sure, you can set the time bounds on your predictions to go years out, but that doesn’t mean the results are trustworthy. Particularly when your business is just starting and growth is unsteady, there are countless complicating variables that can throw off your predictions for the far-off future.

Instead, use your predictive analytics to help you with your day-to-day work. Predictive dashboards are great at showing you what to expect for the next 7, 15, or 30 days. When it comes to longer-term predictions, you’re better off taking the old-fashioned route: looking at your business’ past performance as a benchmark and doing your research.
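One way to see why far-off predictions are shakier is to look at how a small error in an assumed growth rate compounds over time. The Python sketch below assumes steady month-over-month growth, which real businesses rarely have, and uses made-up rates of 5% (assumed) and 4% (actual) purely for illustration.

```python
def forecast(current_revenue, monthly_growth, months):
    """Revenue after `months` of steady compound growth."""
    return current_revenue * (1 + monthly_growth) ** months


def overshoot(assumed_growth, actual_growth, months):
    """How far the forecast lands above reality, as a fraction of reality."""
    predicted = forecast(1.0, assumed_growth, months)
    actual = forecast(1.0, actual_growth, months)
    return predicted / actual - 1


# Same one-point error in the growth rate, three different horizons.
for months in (1, 12, 60):
    gap = overshoot(0.05, 0.04, months)
    print(f"{months:>2} months out: forecast is {gap:.1%} too high")
```

With these hypothetical numbers, the forecast is off by less than 1% one month out, but by well over 50% at the five-year mark, even though the underlying mistake never changed.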

Making your predictive analytics algorithm too complex

This mistake leads to a phenomenon known as “overfitting.” It may seem like the more variables you add to your predictive analytics algorithm, the more accurate it will be, but an algorithm with too many variables can end up fitting the quirks and niche outliers of its historical data rather than the underlying pattern, which makes its predictions on new data less accurate.

Let’s say you (or your data scientist) are defining a price-prediction algorithm for your antique teddy bear eCommerce business. You want your algorithm to guess the most accurate price to assign to any given antique teddy bear, so you include as many variables as you can:

  • The number of teddy bears of this model that were produced
  • The year the teddy bear model was released
  • The quality of the teddy bear material
  • The price of the teddy bear when it was first released
  • The number of this model of teddy bear in circulation now
  • The shape of the teddy bear’s face
  • Whether the teddy bear is wearing a sweater vs a bow tie
  • The fluffiness level of the bear’s fur
  • The fabric pattern on the bear’s paws
  • The dimensions of the teddy bear
  • Whether Shirley Temple was ever photographed carrying this model of teddy bear
  • Whether the bear originally came with a related picture book or story book
  • Whether the bear is smiling or has a neutral expression

This list, while impressive, is going to give you strange results. There probably is a teddy bear somewhere that has been photographed with Shirley Temple and is therefore more valuable, but it is such an outlier that it will skew the results: the algorithm learns from a handful of examples that a Shirley Temple photo drives the price, so your other antique teddy bears, no matter how in-demand, can have their prices artificially lowered because they haven’t been photographed with her! Instead, your list of variables should stick to the universal characteristics that matter. It might look a little more like this:

  • The number of teddy bears of this model that were produced
  • The year the teddy bear model was released
  • The price of the teddy bear when it was first released
  • The number of this model of teddy bear in circulation now

This is one case where it’s a good idea to be an underachiever.
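The telltale sign of overfitting is an algorithm that looks nearly perfect on the data it has already seen and much worse on data it hasn’t. The Python sketch below illustrates that pattern with entirely synthetic bears: the price depends only on age, and the twelve extra variables are pure noise. The “complex” model here is a simple memorizer that copies the price of the most similar bear across every variable, used as a stand-in for an over-flexible algorithm; it is not a claim about how any real pricing tool works, and every name and number is hypothetical.

```python
import random


def make_bears(n, rng, n_extra=12):
    """Synthetic bears: price depends on age only; extra columns are noise."""
    rows = []
    for _ in range(n):
        age = rng.uniform(0, 100)
        extras = [rng.uniform(0, 100) for _ in range(n_extra)]
        price = 50 + 2 * age + rng.gauss(0, 5)
        rows.append(([age] + extras, price))
    return rows


def fit_line(rows):
    """Simple model: least-squares straight line on age alone."""
    xs = [feats[0] for feats, _ in rows]
    ys = [price for _, price in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda feats: intercept + slope * feats[0]


def fit_memorizer(rows):
    """Complex model: copy the price of the closest known bear on ALL variables."""
    def predict(feats):
        nearest = min(
            rows,
            key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], feats)),
        )
        return nearest[1]
    return predict


def mean_abs_error(model, rows):
    return sum(abs(model(feats) - price) for feats, price in rows) / len(rows)


rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_bears(40, rng)   # bears the models learn from
test = make_bears(200, rng)   # new bears the models have never seen

simple = fit_line(train)
complex_model = fit_memorizer(train)

for name, model in (("simple", simple), ("complex", complex_model)):
    print(f"{name:>7}: error on known bears = {mean_abs_error(model, train):.1f}, "
          f"error on new bears = {mean_abs_error(model, test):.1f}")
```

The memorizer makes no errors on the bears it has already seen, because it is just looking them up, but its errors on new bears are far larger than those of the plain straight line. That gap between known and new data is what to watch for when you’re tempted to add one more variable.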

Don’t get intimidated!

Although there are many ways to get the wrong results from your predictive analytics algorithms, there are also many ways that things can go right. Predictive analytics is a powerful tool that can give you important insight into your business, and you shouldn’t be afraid to harness it.

LimeLight Analytics takes advantage of predictive analytics to give you actionable insights.