How I Started: Machine Learning Is Not a Magic Stick (Part 2)

Jan 14 · 2 min read

As a statistician, the concept of machine learning wasn’t too hard for me to grasp. It was also becoming increasingly popular, so I decided to dive in. I took a deep breath and immersed myself in whatever resources I could find—YouTube tutorials, Coursera courses, and more. I explored topics like Random Forest, Linear Regression, Support Vector Machines, and Nearest Neighbors.

I still vividly remember a Coursera instructor saying, “Machine learning is not a magic stick.” That statement resonated deeply with me and set the tone for my learning journey.

Back at Rossmann, I was tasked with building a model to calculate sales speed. New products and stocked-out items posed a challenge, however: simply averaging their sales did not give a usable estimate. While exploring solutions, I came up with a classification system that let me apply a different calculation to each sales pattern.

1. Predictable Sales

For products with consistent sales patterns, predicting future sales is straightforward. For example, if a product sold roughly the same quantity in every 35-day period, I could confidently forecast its sales for the next 35-day period.

Even if there were obvious increases or decreases in one 35-day period, a meaningful average could still be calculated over three periods.

2. New Products

Products with zero sales in one or two recent periods were labeled as "New Products." By excluding the periods with no sales from the average, I could reveal their potential sales trends.

3. Unpredictable Sales

Some products exhibited too much variation between sales periods, making it difficult to categorize them. While the model could calculate averages for these items, double-checking their forecasts was prudent to ensure accuracy.
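
The three categories above can be sketched in a few lines of Python. This is only an illustration, not the model I built at Rossmann: the function name `classify`, the use of the coefficient of variation to measure "too much variation," the 0.5 cutoff, and the extra "No Sales" label for items with no sales at all are assumptions made for the example. The input is a list of sales totals for three consecutive 35-day periods.

```python
from statistics import mean, pstdev

# Assumed cutoff for "too much variation"; the real threshold is not given here.
CV_THRESHOLD = 0.5


def classify(period_sales, cv_threshold=CV_THRESHOLD):
    """Return (category, average sales per period) for a list of period totals."""
    nonzero = [s for s in period_sales if s > 0]
    zero_periods = len(period_sales) - len(nonzero)

    # No sales in any period: nothing to estimate (hypothetical extra label).
    if not nonzero:
        return "No Sales", 0.0

    # Some periods without sales: average only over the periods with sales.
    if zero_periods > 0:
        return "New Product", mean(nonzero)

    # Sales in every period: measure spread relative to the average.
    avg = mean(period_sales)
    cv = pstdev(period_sales) / avg
    if cv > cv_threshold:
        return "Unpredictable Sales", avg
    return "Predictable Sales", avg


print(classify([100, 105, 95]))   # steady seller
print(classify([0, 0, 40]))       # sales only in the latest period
print(classify([10, 200, 30]))    # large swings between periods
```

Under these assumptions the first item lands in "Predictable Sales," the second in "New Product" with its average taken over the single selling period, and the third in "Unpredictable Sales," which is the group worth double-checking by hand.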

Key Insights

The share of items in the "Predictable Sales" category served as a measure of the model's forecasting precision. If this share exceeded 80%, it indicated that the majority of items were measurable, and that the calculations for sales speed and stock coverage days were likely reliable.
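
The 80% check can be sketched as follows. The category counts are invented for illustration and `predictable_ratio` is a hypothetical helper name; only the 80% threshold comes from the text above.

```python
def predictable_ratio(categories):
    """Share of items labeled 'Predictable Sales' (0.0 for an empty list)."""
    if not categories:
        return 0.0
    predictable = sum(1 for c in categories if c == "Predictable Sales")
    return predictable / len(categories)


# Made-up example: 20 items, 17 of them predictable.
categories = (
    ["Predictable Sales"] * 17
    + ["New Product"] * 2
    + ["Unpredictable Sales"]
)

ratio = predictable_ratio(categories)
reliable = ratio > 0.80  # threshold from the text

print(f"Predictable share: {ratio:.0%}, reliable: {reliable}")
```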