10-04-23

We are currently preparing a concise, impactful report summarizing the results of our study of the CDC Diabetes dataset.

In our examination of the CDC Diabetes dataset, we applied a wide range of statistical techniques: exploratory data analysis, correlation analysis, simple and multiple linear regression, and the Breusch-Pagan test to assess constant variance (homoscedasticity) of the residuals. We also introduced interaction terms, explored higher-order relationships through polynomial regression, and used cross-validation to evaluate how well our models perform and generalize.
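As an illustration of the constant-variance check, here is a minimal sketch of the Breusch-Pagan test written with NumPy only. It implements Koenker's studentized variant (the squared OLS residuals are regressed on the same regressors, and LM = n·R² of that auxiliary regression is compared with a chi-square distribution). The helper names (`chi2_sf`, `ols_residuals`, `breusch_pagan`) and the synthetic data are hypothetical and are not taken from our analysis; a library routine such as statsmodels' `het_breuschpagan` could be used instead.

```python
import math

import numpy as np


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with a positive integer df.

    Uses Q(x; k + 2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    starting from Q(x; 0) = 0 (even df) or Q(x; 1) = erfc(sqrt(x/2)) (odd df).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, k = 0.0, 0
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def ols_residuals(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """OLS residuals of y ~ X (X must already contain an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """In-sample R^2 of the OLS fit y ~ X."""
    resid = ols_residuals(X, y)
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def breusch_pagan(X: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Koenker's studentized Breusch-Pagan test; returns (LM statistic, p-value).

    Degrees of freedom = number of regressors excluding the intercept.
    """
    n, cols = X.shape
    e2 = ols_residuals(X, y) ** 2
    lm = n * r_squared(X, e2)
    return lm, chi2_sf(lm, cols - 1)


# Hypothetical synthetic data: one predictor on a percentage-like scale.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(10, 40, size=n)
X = np.column_stack([np.ones(n), x])
y_const = 2 + 0.3 * x + rng.normal(0, 1.0, size=n)       # constant variance
y_hetero = 2 + 0.3 * x + rng.normal(0, 0.1 * x, size=n)  # spread grows with x

lm_c, p_c = breusch_pagan(X, y_const)
lm_h, p_h = breusch_pagan(X, y_hetero)
print(f"constant variance: LM = {lm_c:.2f}, p = {p_c:.3f}")
print(f"heteroscedastic:   LM = {lm_h:.2f}, p = {p_h:.3g}")
```

A small p-value is evidence against constant variance; on the heteroscedastic series the test should reject clearly, while on the constant-variance series it usually should not.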

Our findings reveal some interesting insights into the predictive power of our models. When we first added an interaction term to the simple linear model, it reached an explanatory power of 36.5%. This rose to 38.5% when we built a multiple linear regression model with quadratic terms for predicting diabetes, using both ‘% INACTIVE’ and ‘% OBESE’. Interestingly, when we applied Support Vector Regression, the explanatory power dropped to 30.1%.
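To make the model comparison concrete, the sketch below fits an interaction model and a full quadratic model on two predictors and scores each with in-sample R² and 5-fold cross-validated R². It assumes that "explanatory power" refers to R², and it runs on hypothetical synthetic columns standing in for ‘% INACTIVE’ and ‘% OBESE’, so its numbers are not our study's results. The function names are illustrative, and Support Vector Regression is not reproduced here.

```python
import numpy as np


def design_interaction(x1: np.ndarray, x2: np.ndarray) -> np.ndarray:
    """Intercept, both main effects, and their product (interaction term)."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])


def design_quadratic(x1: np.ndarray, x2: np.ndarray) -> np.ndarray:
    """Full second-order model: main effects, squared terms, and interaction."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])


def r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination of predictions against observed values."""
    ss_res = float(((y_true - y_pred) ** 2).sum())
    ss_tot = float(((y_true - y_true.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def fit_predict(X_train, y_train, X_test):
    """Fit OLS on the training rows and predict the test rows."""
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    return X_test @ beta


def cv_r2(X: np.ndarray, y: np.ndarray, k: int = 5, seed: int = 0) -> float:
    """Mean out-of-fold R^2 over k shuffled folds, refitting OLS on each split."""
    idx = np.random.default_rng(seed).permutation(len(y))
    scores = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        scores.append(r2(y[fold], fit_predict(X[train], y[train], X[fold])))
    return float(np.mean(scores))


# Hypothetical synthetic county-level percentages (not the CDC data).
rng = np.random.default_rng(42)
n = 300
inactive = rng.uniform(10, 35, size=n)
obese = rng.uniform(15, 40, size=n)
diabetic = (1 + 0.1 * inactive + 0.08 * obese + 0.002 * inactive * obese
            + 0.004 * obese**2 + rng.normal(0, 1.0, size=n))

results = {}
for name, X in [("interaction", design_interaction(inactive, obese)),
                ("quadratic", design_quadratic(inactive, obese))]:
    in_sample = r2(diabetic, fit_predict(X, diabetic, X))
    results[name] = {"in_sample": in_sample, "cv": cv_r2(X, diabetic)}
    print(f"{name:12s} in-sample R^2 = {in_sample:.3f}, "
          f"5-fold CV R^2 = {results[name]['cv']:.3f}")
```

Because the quadratic design contains every column of the interaction design, its in-sample R² can never be lower; the cross-validated score is the fairer check on whether the extra terms actually generalize.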

Although ‘% INACTIVE’ and ‘% OBESE’ clearly play a significant role in predicting diabetes, they may not fully capture the complex dynamics involved. A more comprehensive analysis that incorporates a broader set of influencing variables is therefore needed for a deeper, more holistic understanding of diabetes prediction.
