Predicting Michael Jordan’s Scoring with Machine Learning
I began my journey in the vast world of data with Kushagramati Analytics. The training opportunities this job has afforded me have been extraordinary, to say the least, and the freedom to learn has been tremendously encouraging. I started off by learning Python, Pandas and NumPy, and subsequently moved on to ML concepts. Owing to my passion for basketball, I decided to use an NBA dataset (up to 2017, found on Kaggle) as part of my learning. What better way to begin my journey in predictive analytics than by predicting MJ’s scoring?
I began my project with the standard exploration of the data. I then split his 15-season career into 10 seasons as a training set for the ML model and 5 seasons as a testing set. The ML model I used was linear regression.
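To make the setup concrete, here is a minimal sketch of a 10/5 season split followed by a one-feature least-squares fit, written in plain Python. The data are synthetic placeholders (not Jordan’s real numbers), taking the first 10 rows for training is an assumption since the split order isn’t stated, and the helper names fit_line and fit_through_origin are made up for illustration. In practice a library would normally do the fitting.

```python
# Synthetic placeholder data for 15 "seasons": games played and total points.
# These are NOT Jordan's real stats; points follow an exact line for clarity.
games = list(range(50, 80, 2))           # 15 values: 50, 52, ..., 78
points = [30 * g - 100 for g in games]   # hypothetical relationship

# Assumed split: first 10 seasons for training, last 5 for testing.
train_games, test_games = games[:10], games[10:]
train_points, test_points = points[:10], points[10:]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_through_origin(x, y):
    """Least squares for y = slope * x (no intercept term)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


intercept, slope = fit_line(train_games, train_points)
slope_no_intercept = fit_through_origin(train_games, train_points)

# Predict the held-out seasons with the intercept model.
predicted = [intercept + slope * g for g in test_games]
```

When the fitted intercept is negative and all inputs are positive, the no-intercept slope comes out smaller than the slope of the intercept model, which is the same pattern as the 32.4790 and 34.7715 slopes reported in this post.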
Using linear regression, I obtained the following equations, where PTS is points and G is games played:
With an intercept: PTS = -178.5739 + 34.7715*G
Without an intercept: PTS = 32.4790*G
The first equation can also be written as PTS = 34.7715*G + pos_coef, where pos_coef is the coefficient for the player’s position. Michael being a shooting guard (SG), the coefficient I obtained for him was -178.57.
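As a quick worked example, the two fitted equations can be evaluated for a given number of games. The coefficients below are the ones reported in this post; the function names and the 82-game input (a full NBA regular season) are only illustrative assumptions.

```python
# Coefficients as reported in the post.
INTERCEPT = -178.5739         # also the SG position coefficient
SLOPE = 34.7715               # points per additional game, model with intercept
SLOPE_NO_INTERCEPT = 32.4790  # slope when fitting without an intercept


def predict_points(games):
    """Season points from games played, model with intercept."""
    return INTERCEPT + SLOPE * games


def predict_points_no_intercept(games):
    """Season points from games played, model fitted through the origin."""
    return SLOPE_NO_INTERCEPT * games


full_season = 82  # assumed example input: a full NBA regular season
print(round(predict_points(full_season), 1))               # 2672.7
print(round(predict_points_no_intercept(full_season), 1))  # 2663.3
```

For a full season the two models give nearly the same prediction; they diverge more for short seasons, where the negative intercept pulls the first model’s prediction down.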
The errors for this model are:
MSE = 432839.0, RMSE = 657.9
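For reference, RMSE is simply the square root of MSE, so it is expressed in the same units as PTS (season points, going by the equation). The sketch below shows how both could be computed in plain Python and checks that the two reported figures agree with each other. The toy actual and predicted values are hypothetical and only there to exercise the functions.

```python
import math


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, in the same units as the target."""
    return math.sqrt(mean_squared_error(actual, predicted))


# Hypothetical toy values, only to exercise the functions.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 320.0]
toy_mse = mean_squared_error(actual, predicted)  # (100 + 100 + 400) / 3 = 200.0
toy_rmse = root_mean_squared_error(actual, predicted)

# Consistency check on the figures reported in the post.
reported_mse = 432839.0
print(round(math.sqrt(reported_mse), 1))  # 657.9
```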
This is what I finally obtained:
[Figure: The Test Data Set with Predicted Points]
[Figure: Predicted Points vs Actual Points]
It would appear that my model has overestimated Jordan’s scoring, but I can forgive it for making that mistake with the GOAT. I’m currently looking at how I can improve the accuracy, and I’ve been learning a lot about various ML models.
In my next little attempt at ML, I’ll be looking at predicting the outcomes of the Los Angeles Lakers’ playoff matches.
Rishan Sanjay
Software (Data) Engineer and Product Management
at Kushagramati