Checking three linear regression assumptions (linearity, homoscedasticity and normality of residuals), with seaborn graphics arranged as subplots.
The code for creating the linear regression model can be found in this post
Linearity Assumption
‘Each predictor variable (x) is linearly related to the outcome variable (y).’
Checking the Homoscedasticity Assumption with a Scatterplot
y_pred are the predicted y values from a regression line.
The residuals are the differences between the actuals and the predicted y values.
Homoscedasticity means that the residuals have equal or almost equal variance across the regression line.
Plotting the residuals (error terms) against the predicted values lets us check this: the points should be scattered around zero with no visible pattern.
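As a rough illustration of the definitions above, here is a minimal sketch of how y_pred and residuals can be computed, plus a crude numeric look at the spread. It uses synthetic stand-in data and a straight-line fit with NumPy, which are assumptions made for this example and not the model from the referenced post; the names `x`, `y` and `spread_ratio` are hypothetical.

```python
import numpy as np

# Synthetic stand-in data (assumption: NOT the post's advertising dataset).
rng = np.random.default_rng(0)
x = rng.uniform(0, 50, size=200)               # e.g. radio spend
y = 10 + 0.5 * x + rng.normal(0, 2, size=200)  # outcome with constant-variance noise

# Fit a straight line by least squares; polyfit returns [slope, intercept] for deg=1.
slope, intercept = np.polyfit(x, y, deg=1)

y_pred = intercept + slope * x  # predicted y values on the regression line
residuals = y - y_pred          # actual minus predicted

# Crude homoscedasticity check: compare the residual spread for the lower
# and upper halves of the fitted values. A ratio near 1 suggests equal variance.
order = np.argsort(y_pred)
half = len(order) // 2
spread_ratio = residuals[order[half:]].std() / residuals[order[:half]].std()
print(f"spread ratio (upper/lower half): {spread_ratio:.2f}")
```

A ratio far from 1 would hint at a funnel shape in the residuals-vs-fitted scatterplot, but the plot itself remains the main check.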
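Checking the Normality Assumption
Use a histogram to plot the residuals of the regression line (the actual y values minus the predicted y values). The residuals should be approximately normally distributed; a Q-Q plot gives a second view of the same check.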
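To complement the visual check, the shape of the residual distribution can also be summarised numerically. The sketch below computes skewness and excess kurtosis by hand with NumPy; both are close to 0 for normally distributed values. The residuals here are synthetic stand-ins (an assumption for illustration, not the post's data), and this summary is an addition to, not a replacement for, the histogram and Q-Q plot.

```python
import numpy as np

# Synthetic stand-in residuals (assumption: in practice use y - y_pred from your model).
rng = np.random.default_rng(1)
residuals = rng.normal(loc=0.0, scale=2.0, size=1000)

# Central moments of the residuals.
centered = residuals - residuals.mean()
m2 = np.mean(centered ** 2)
m3 = np.mean(centered ** 3)
m4 = np.mean(centered ** 4)

skewness = m3 / m2 ** 1.5            # about 0 for a symmetric distribution
excess_kurtosis = m4 / m2 ** 2 - 3   # about 0 for a normal distribution

print(f"skewness: {skewness:.3f}, excess kurtosis: {excess_kurtosis:.3f}")
```

Values well away from 0 would suggest a skewed or heavy-tailed histogram of residuals.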
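import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm

# df, y_pred and residuals come from the linear regression model in the post referenced above
fig, ax = plt.subplots(2, 2, figsize=(18, 10))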
fig.suptitle('Assumption Checks')
# Check for linearity
sns.regplot(
    ax=ax[0, 0],
    data=df,
    x='Radio',
    y='Sales',
)
ax[0, 0].set_title('Radio Sales')
ax[0, 0].set_xlabel('Radio Spend ($K)')
ax[0, 0].set_ylabel('Sales ($)')
#ax[0].set_xticks(range(0,10,10))
#ax[0].set_xticks(rotation=90)
# Check for homoscedasticity
# Plot residuals against the fitted values
sns.scatterplot(ax=ax[0, 1], x=y_pred, y=residuals)
ax[0, 1].set_title("Residuals vs Fitted Values")
ax[0, 1].set_xlabel("Fitted Values")
ax[0, 1].set_ylabel("Residuals")
ax[0, 1].axhline(0, linestyle='--', color='red')
#Check for normality
ax[1, 0] = sns.histplot(ax=ax[1, 0], x=residuals)
ax[1, 0].set_xlabel("Residual Value")
ax[1, 0].set_title("Histogram of Residuals")
# Check for normality with a Q-Q plot
# sm.qqplot returns a Figure, so draw onto the axes without reassigning ax[1, 1]
sm.qqplot(residuals, line='s', ax=ax[1, 1])
ax[1, 1].set_title("Q-Q Plot")
#sm.qqplot(test, loc = 20, scale = 5 , line='45')
plt.show()