Cross-validation and grid search are two important techniques for finding the hyperparameters that give a model its best performance. Hyperparameters are not learned during training, so they typically need to be tuned manually (or via a systematic search) to get optimal results.
In this article we demonstrate hyperparameter optimization for a Support Vector Classifier (SVC) performing classification on the Iris dataset. Some of the key hyperparameters for an SVC are the C value, the gamma value, and the kernel function.
- Import the necessary packages, including GridSearchCV.
- Load the Iris dataset and create the features (X) and labels (y).
- Instantiate the classifier (a support vector machine) and inspect its available parameters. Although this classifier has many hyperparameters, we will focus on optimizing the three main ones: C, gamma, and kernel.
- We will try 5 different values for C, 4 values for kernel, and 5 values for gamma, as listed in the param_grid in the code sketch after this list.
- Note that the candidate parameter values must be passed as a dictionary (key-value pairs), with each parameter name as a key and its list of candidate values as the value.
- Run all the candidate models with 10-fold cross-validation. With the grid above, that is 5 × 4 × 5 = 100 candidate combinations, each fit 10 times, for 1,000 model fits in total. See the link at the end of this article for more on the utility of cross-validation. CAUTION: both grid search and cross-validation are resource-intensive operations, so use them carefully on larger datasets.
- In this example, the best parameters come out to be C = 10, gamma = 0.01, and kernel = 'linear', giving the best cross-validated accuracy score of 0.98 on this data.
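
Putting these steps together, here is a minimal sketch of the full workflow. The exact candidate values in param_grid are illustrative assumptions (chosen to match the counts above and to include the winning combination); substitute your own ranges as needed.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Load the Iris dataset and split into features (X) and labels (y)
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Instantiate the classifier and inspect its tunable parameters
svc = SVC()
print(svc.get_params())

# Candidate values go in a dictionary: parameter name -> list of values.
# These particular values are illustrative, not the only sensible choices.
param_grid = {
    'C': [0.1, 1, 10, 100, 1000],                    # 5 values
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],  # 4 values
    'gamma': [1, 0.1, 0.01, 0.001, 0.0001],          # 5 values
}

# Exhaustively evaluate every combination with 10-fold cross-validation
grid = GridSearchCV(estimator=svc, param_grid=param_grid, cv=10)
grid.fit(X, y)

print(grid.best_params_)  # e.g. {'C': 10, 'gamma': 0.01, 'kernel': 'linear'}
print(grid.best_score_)   # mean cross-validated accuracy of the best model
```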
Finally, here is an excellent link to learn more about cross-validation:
https://scikit-learn.org/stable/modules/cross_validation.html
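
If you just want to see cross-validation in isolation, here is a minimal sketch (assuming the tuned parameters found above) that scores a single model with cross_val_score:

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# 10-fold cross-validation of the tuned model from the grid search above
scores = cross_val_score(SVC(C=10, gamma=0.01, kernel='linear'), X, y, cv=10)
print(scores.mean())  # should be close to the 0.98 reported above
```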
Thanks for reading. Please share!