Continuing our analysis, we implemented Support Vector Regression (SVR) on the CDC dataset, using quadratic and interaction terms as features. SVR is a machine learning algorithm for regression tasks that works well with high-dimensional data and, through kernels, can capture non-linear relationships between variables that traditional linear regression models struggle to represent.
The SVR model was initialized with three key parameters: the RBF (Radial Basis Function) kernel, the regularization parameter ‘C’, and the error tolerance ‘epsilon’. The RBF kernel lets SVR capture non-linear relationships by measuring the similarity between two data points as a function of the distance between them; ‘C’ controls the trade-off between fitting the training data closely and keeping the model simple enough to avoid overfitting; and ‘epsilon’ defines a margin around the predictions within which errors are not penalized.
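To make the roles of these parameters concrete, here is a minimal NumPy sketch of the RBF kernel, the epsilon-insensitive loss, and how ‘C’ weights that loss in a simplified SVR objective. It is an illustration only, not the solver that scikit-learn or any other library actually runs: the function names (`rbf_similarity`, `epsilon_insensitive_loss`, `svr_objective`) and the kernel width `gamma` are hypothetical choices of ours, and the objective is written in its primal form with the squared weight norm passed in directly.

```python
import numpy as np


def rbf_similarity(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2); equals 1 for identical points."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero for residuals inside the epsilon margin, linear in the excess outside it."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)


def svr_objective(w_norm_sq, y_true, y_pred, C, epsilon):
    """Simplified primal SVR objective: 0.5 * ||w||^2 + C * (sum of epsilon-insensitive losses).

    A larger C makes errors outside the margin more costly relative to model complexity.
    """
    total_loss = float(np.sum(epsilon_insensitive_loss(y_true, y_pred, epsilon)))
    return 0.5 * w_norm_sq + C * total_loss


# Illustrative values (hypothetical, not from the CDC analysis)
print(rbf_similarity([0.0, 0.0], [1.0, 1.0], gamma=0.5))
print(epsilon_insensitive_loss([1.0, 2.0, 3.0], [1.05, 2.5, 2.0], epsilon=0.1))
print(svr_objective(2.0, [1.0, 2.0, 3.0], [1.05, 2.5, 2.0], C=1.0, epsilon=0.1))
```

Points that are closer together get a kernel value nearer 1, residuals smaller than epsilon contribute nothing to the loss, and raising C makes each residual beyond the margin cost more relative to the weight-norm term, which pushes the model to fit the training data more tightly.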
In this analysis, we used ‘INACTIVE’ and ‘OBESE’ as features, along with their squared values (‘INACTIVE_sq’ and ‘OBESE_sq’) and an interaction term (‘OBESE*INACTIVE’). We used a K-Fold cross-validator with 5 folds to split the data into training and testing sets. In each fold, a new SVR model with the RBF kernel was created, fitted to the training data, and used to make predictions on the testing data, allowing it to learn non-linear relationships between the input features and the target variable.
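The cross-validation procedure described above might look roughly like the following scikit-learn sketch. Several details are assumptions not stated in the text: the data are a synthetic stand-in for the CDC columns (the target is left generic as `y`), `C=1.0` and `epsilon=0.1` are simply scikit-learn's defaults for `SVR`, the folds are shuffled with a fixed seed, the features are standardized inside a pipeline because the RBF kernel is sensitive to feature scale, and the summary score is the mean of the five per-fold R² values, which is one common convention but not necessarily how the 0.30 below was computed. The helper names `add_poly_features` and `cross_validated_svr_r2` are hypothetical.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def add_poly_features(inactive, obese):
    """Stack INACTIVE, OBESE, INACTIVE_sq, OBESE_sq, and the OBESE*INACTIVE interaction."""
    inactive = np.asarray(inactive, dtype=float)
    obese = np.asarray(obese, dtype=float)
    return np.column_stack([inactive, obese, inactive ** 2, obese ** 2, obese * inactive])


def cross_validated_svr_r2(X, y, C=1.0, epsilon=0.1, n_splits=5, seed=42):
    """Fit a fresh RBF-kernel SVR on each training fold and score R^2 on the held-out fold."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in kf.split(X):
        # New model per fold; the scaler is fitted on the training fold only.
        model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=C, epsilon=epsilon))
        model.fit(X[train_idx], y[train_idx])
        scores.append(r2_score(y[test_idx], model.predict(X[test_idx])))
    return float(np.mean(scores)), scores


# Synthetic stand-in data: NOT the CDC dataset, just plausible-looking percentages.
rng = np.random.default_rng(0)
inactive = rng.uniform(15, 35, size=300)
obese = rng.uniform(20, 40, size=300)
y = 0.2 * inactive + 0.1 * obese + rng.normal(0.0, 1.0, size=300)

X = add_poly_features(inactive, obese)
mean_r2, fold_scores = cross_validated_svr_r2(X, y)
print(f"mean R^2 over folds: {mean_r2:.2f}")
```

Fitting the scaler inside the pipeline means it only ever sees the training fold, so no information from the held-out fold leaks into preprocessing.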
We evaluated the SVR model using the R-squared (R2) score, which came out to 0.30. This is lower than the R-squared of our quadratic model, suggesting that the SVR model may not capture the relationships in the data as effectively.
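For reference, the R² score compares a model's squared prediction errors with those of a baseline that always predicts the mean of the observed targets, $\bar{y}$:

```latex
R^2 = 1 - \frac{\sum_{i} \left(y_i - \hat{y}_i\right)^2}{\sum_{i} \left(y_i - \bar{y}\right)^2}
```

A score of 0.30 therefore means the residual sum of squares is about 70% of the total sum of squares, i.e., the model explains roughly 30% of the variance in the target. A score of 1 is a perfect fit, 0 is no better than predicting the mean, and a negative score is worse than that baseline.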