Predicting Michael Jordan’s Scoring with Machine Learning

I began my journey in the vast world of data with Kushagramati Analytics. The training opportunities this job has afforded me have been extraordinary, to say the least, and the freedom to learn has been tremendously encouraging.

I started off by learning Python, Pandas and NumPy, and subsequently moved on to ML concepts. Owing to my passion for the game of basketball, I decided to use an NBA dataset (up to 2017, found on Kaggle) as part of my learning. What better way to begin my journey in predictive analytics than by predicting MJ’s scoring?

I began my project with the standard exploration of the data. I split his 15-season career into 10 seasons as a training set for the ML model and 5 seasons as a testing set. The ML model I used was linear regression.
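
To make the split-and-fit step concrete, here is a minimal sketch in plain Python. It assumes the seasons are taken in chronological order (first 10 for training, last 5 for testing) and that the model regresses PTS on G alone. The data values and the helper names fit_line and fit_through_origin are hypothetical placeholders for illustration, not the dataset or code used in this project.

```python
# Placeholder (games, points) pairs for 15 seasons in chronological order.
# These are synthetic values lying on an exact line, NOT real NBA statistics.
games = [20, 45, 60, 70, 75, 78, 80, 82, 64, 50, 82, 79, 66, 55, 81]
seasons = [(g, 30.0 * g - 100.0) for g in games]

# First 10 seasons train the model; the last 5 are held out for testing.
train, test = seasons[:10], seasons[10:]


def fit_line(rows):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_through_origin(rows):
    """Least squares for y = slope * x (no intercept term)."""
    return sum(x * y for x, y in rows) / sum(x * x for x, _ in rows)


intercept, slope = fit_line(train)
slope_no_intercept = fit_through_origin(train)

# Predict the held-out seasons from their games played.
predictions = [intercept + slope * g for g, _ in test]
```

Because the placeholder data is exactly linear, the fit recovers the line it was generated from. Real season totals would scatter around the fitted line instead.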

Using linear regression, I obtained the following equation, where PTS is the points scored in a season and G is the games played:

PTS = -178.5739 + 34.7715*G

Fitting without an intercept gave:

PTS = 32.4790*G

Written in terms of a position coefficient (pos_coef), the first equation is:

PTS = 34.7715*G + pos_coef

SG pos_coef = -178.57. Michael being a shooting guard, this was the coefficient I obtained for him.
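
As a quick worked example, the reported coefficients can be plugged into a small function. The sketch below assumes G is the number of games played in a season and uses 82 games (a full NBA regular season) purely as an illustrative input. The function names are my own and not from any library.

```python
INTERCEPT = -178.5739         # reported intercept (the SG position coefficient)
SLOPE = 34.7715               # reported points per game played
SLOPE_NO_INTERCEPT = 32.4790  # reported slope of the fit without an intercept


def predict_points(games):
    """PTS = -178.5739 + 34.7715 * G"""
    return INTERCEPT + SLOPE * games


def predict_points_no_intercept(games):
    """PTS = 32.4790 * G"""
    return SLOPE_NO_INTERCEPT * games


full_season = 82  # illustrative input: a full NBA regular season
with_intercept = predict_points(full_season)                  # about 2672.7
without_intercept = predict_points_no_intercept(full_season)  # about 2663.3
```

For a full season the two fits land within about ten points of each other. They diverge more for short seasons, where the negative intercept pulls the first prediction down.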

The error metrics for this model are:

MSE = 432839.0, RMSE = 657.9
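
For reference, here is a minimal sketch of how these two metrics are computed and how they relate: RMSE is simply the square root of MSE, so it is expressed in the same units as PTS. The toy numbers are made up to show the arithmetic and are not from the dataset.

```python
import math


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, in the same units as the target."""
    return math.sqrt(mean_squared_error(actual, predicted))


# Made-up toy values, only to show the arithmetic.
toy_actual = [100.0, 200.0, 300.0]
toy_predicted = [110.0, 190.0, 330.0]
toy_mse = mean_squared_error(toy_actual, toy_predicted)  # (100 + 100 + 900) / 3
toy_rmse = root_mean_squared_error(toy_actual, toy_predicted)

# Consistency check on the reported figures: sqrt(432839.0) is about 657.9.
reported_rmse = math.sqrt(432839.0)
```

Assuming PTS is a season total, an RMSE of roughly 658 means the predictions miss by several hundred points per season on average.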

This was what I finally obtained:


The Test Data Set with Predicted Points


Predicted Points vs Actual Points

It would appear that my model has overestimated Jordan’s scoring averages, but I can forgive it for making that mistake with the GOAT. Currently I’m looking at how I can improve the accuracy, and I’ve been learning a lot about various ML models.

In my next little attempt at ML, I’ll be looking at predicting the outcomes of the Los Angeles Lakers’ playoff matches.

Rishan Sanjay

Software (Data) Engineer and Product Management
at Kushagramati
