Application of Variable Selection on K-Nearest Neighbors and Support Vector Machine for Classification of the Quality of Junior High Schools in Papua Province

Januarius Anongtop

Keywords: Classification, Quality of Junior High Schools, Boruta, KNN, SVM


Abstract

Classification is a supervised learning task that assigns an object to one of a set of predetermined classes or categories. Data with many variables can hinder the training of a classification algorithm. The Boruta algorithm is a wrapper method, built around the random forest algorithm, that selects the relevant variables. K-Nearest Neighbors (KNN) is a classification algorithm that assigns an unknown sample to a class based on its nearest neighbors. Support Vector Machine (SVM) is a classification algorithm that seeks the optimal (maximum-margin) hyperplane separating the classes. Boruta identifies five variables that contribute to the quality of Junior High Schools in Papua Province. The KNN model trained on the relevant variables, with the training data balanced by SMOTE (Synthetic Minority Over-sampling Technique), achieves an accuracy of 70%, a sensitivity of 69%, a specificity of 100%, and an F1-score of 83%. The SVM model trained on the relevant variables with SMOTE-balanced training data achieves an accuracy of 81%, a sensitivity of 83%, a specificity of 50%, and an F1-score of 89%. Overall, the SVM model performs better than the KNN model in accuracy, sensitivity, and F1-score, although the KNN model attains a higher specificity.
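The KNN rule described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the author's implementation: it assumes numeric feature vectors, Euclidean distance, and a simple majority vote, and the function name `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training samples closest to `query`."""
    # Pair each training sample's Euclidean distance to the query with its label.
    scored = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Sort by distance only, so labels never need to be comparable.
    scored.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in scored[:k]]
    # Majority vote; with an odd k and two classes a tie cannot occur.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two numeric features per school, class labels 0 and 1.
train_X = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [6.0, 6.0], [6.0, 7.0], [7.0, 6.0]]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, [1.5, 1.5]))  # 0
print(knn_predict(train_X, train_y, [6.5, 6.5]))  # 1
```

In practice the features would be scaled first, because KNN distances are dominated by variables with large numeric ranges.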
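The SVM idea of a separating hyperplane can be illustrated with a linear SVM trained by Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss. This is a swapped-in training method chosen for brevity; the abstract does not state which solver or kernel the author used. The sketch assumes labels coded as +1 and -1, regularizes the bias by appending a constant feature (a simplification), and uses hypothetical names, parameters (`lam`, `epochs`), and toy data.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style subgradient descent on the regularized hinge loss.

    Labels must be +1 or -1. A constant 1.0 feature is appended to every
    sample, so the last weight acts as a (regularized) bias term.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink step from the regularizer term (lam / 2) * ||w||^2.
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:
                # Sample is inside the margin: take a hinge-loss subgradient step.
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def svm_predict(w, x):
    """Classify x by the sign of its score relative to the hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical, linearly separable toy data.
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([svm_predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

A kernel SVM (for example with an RBF kernel) replaces the dot product with a kernel function and is usually trained with a dual solver instead; the linear version is shown only because it fits in a short, dependency-free example.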
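SMOTE balances the training data by creating synthetic minority-class samples on the line segment between a minority sample and one of its nearest minority-class neighbors. The sketch below is a simplified version under stated assumptions: numeric features and randomly drawn base samples (the original SMOTE iterates over every minority sample). The function name `smote` and its parameters are hypothetical and do not correspond to any particular library.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples from a list of minority-class samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbors of the base sample, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical minority-class samples with two numeric features each.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority, n_new=6)
print(len(new_samples))  # 6
```

Only the training data is oversampled in this way; the test data keeps its original class distribution so that the reported metrics reflect real schools.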
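The reported accuracy, sensitivity, specificity, and F1-score are all derived from the confusion matrix. The hypothetical helper below shows the standard definitions for a binary problem. Which class is treated as positive is an assumption here; swapping it exchanges the roles of sensitivity and specificity.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity and F1-score for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # recall, true positive rate
    specificity = ratio(tn, tn + fp)  # true negative rate
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical labels: tp=3, fn=1, tn=1, fp=1.
m = binary_metrics([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1])
print(m)
```

Because the F1-score combines precision and sensitivity only, it ignores true negatives, which is why a model can pair a high F1-score with a low specificity.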
