I wanted to share our recent progress in predicting diabetes rates from two independent variables, obesity rate and inactivity rate, using a multiple linear regression model.
Key Variables: In our model, obesity rate and inactivity rate are the independent variables (X), and diabetes rate is the dependent variable (y).
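Dataset Division: We divided our dataset into training and testing sets. The model is fitted on the training set only and then evaluated on the held-out testing set, so the evaluation reflects how well the model generalizes to data it has not seen rather than how well it memorized the training data.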
Model Initialization and Fitting: We initialized a linear regression model and fitted it to the training data, with obesity rate and inactivity rate as the independent variables and diabetes rate as the dependent variable. The model estimates a linear relationship of the form: diabetes rate ≈ intercept + (coefficient 1 × obesity rate) + (coefficient 2 × inactivity rate).
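As a rough sketch of the splitting and fitting steps described above: the example below assumes scikit-learn and NumPy, which this post does not confirm we used, and it runs on synthetic stand-in data rather than our dataset. The 80/20 split, the random seeds, and the numbers used to generate the data are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the real dataset); rates are in percent.
rng = np.random.default_rng(0)
n = 200
obesity = rng.uniform(20, 40, n)
inactivity = rng.uniform(15, 35, n)
# Assumed relationship, used only to generate example data.
diabetes = 1.0 + 0.2 * obesity + 0.1 * inactivity + rng.normal(0, 0.5, n)

X = np.column_stack([obesity, inactivity])  # independent variables (X)
y = diabetes                                # dependent variable (y)

# Hold out 20% of the rows for testing (the ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Initialize the model and fit it on the training rows only.
model = LinearRegression()
model.fit(X_train, y_train)

print("coefficients:", model.coef_)      # one value per independent variable
print("intercept:", model.intercept_)
print("test R2:", model.score(X_test, y_test))
```

Fixing `random_state` makes the split reproducible, which helps when comparing runs while the model is being refined.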
Coefficient Analysis: After fitting the model, we analyzed the coefficients to understand how obesity rate and inactivity rate relate to the prevalence of diabetes. Each coefficient estimates the change in the predicted diabetes rate for a one-unit increase in that variable while the other variable is held constant. These are associations captured by the model, not evidence of causation on their own.
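Intercept Interpretation: The intercept is the diabetes rate the model predicts when both obesity rate and inactivity rate are zero. Since such values may lie outside the range of the data, it is best read as a reference point that anchors the fitted relationship rather than as a realistic baseline.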
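To make the reading of the coefficients and the intercept concrete, here is a minimal sketch in plain Python. The function name and the three numeric values are hypothetical placeholders chosen for illustration; they are not our fitted results.

```python
def predict_diabetes_rate(obesity_rate, inactivity_rate,
                          intercept, coef_obesity, coef_inactivity):
    """Linear prediction: intercept plus each coefficient times its variable."""
    return (intercept
            + coef_obesity * obesity_rate
            + coef_inactivity * inactivity_rate)

# Hypothetical placeholder values (NOT fitted results).
intercept = 1.0
coef_obesity = 0.2
coef_inactivity = 0.1
params = (intercept, coef_obesity, coef_inactivity)

# With both variables at zero, the prediction is exactly the intercept.
baseline = predict_diabetes_rate(0.0, 0.0, *params)

# Raising obesity rate by one unit, with inactivity rate held constant,
# changes the prediction by the obesity coefficient.
before = predict_diabetes_rate(30.0, 25.0, *params)
after = predict_diabetes_rate(31.0, 25.0, *params)

print("baseline prediction:", baseline)
print("change per unit of obesity rate:", after - before)
```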
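Model Evaluation with R2 Score: To assess the model’s performance on the test set, we used the R2 score. The R2 score measures the proportion of variation in the dependent variable (diabetes rate) that is explained by the independent variables (obesity rate and inactivity rate). A score of 1 means the predictions match the observed values exactly, a score of 0 means the model does no better than always predicting the mean, and on a test set the score can even be negative. A higher R2 score therefore indicates a better fit, although it does not by itself show that the model is reliable.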
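The definition above can be written out directly as R2 = 1 - SS_res / SS_tot. The sketch below is a plain-Python illustration of that formula; the function name `r2_score_manual` and the observed and predicted values are made up for the example and are not taken from our results.

```python
def r2_score_manual(y_true, y_pred):
    """R2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up observed and predicted diabetes rates, for illustration only.
y_true = [8.0, 9.5, 11.0, 12.5]
y_pred = [8.5, 9.0, 11.5, 12.0]

mean_only = [sum(y_true) / len(y_true)] * len(y_true)

print("model:", r2_score_manual(y_true, y_pred))
print("perfect predictions:", r2_score_manual(y_true, y_true))
print("always predicting the mean:", r2_score_manual(y_true, mean_only))
```

Comparing the model against the "always predict the mean" case shows what the score is measured relative to: any useful model should land clearly above 0.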
Our ongoing research aims to further refine this model and explore additional metrics for a comprehensive evaluation of its predictive capabilities.