DATA MINING ANALYSIS USING THE KNN ALGORITHM TO DETERMINE THE ACTIVE AND INACTIVE STATUS OF 5TH SEMESTER INFORMATION SYSTEMS STUDENTS (CASE STUDY: SEPULUH NOPEMBER UNIVERSITY PAPUA)
Keywords:
Data Mining, KNN Algorithm, Student Status Classification, Academic Performance, Early Warning System
Abstract
The rapid advancement of information technology requires higher education institutions, including Universitas Sepuluh Nopember Papua (USNP), to use data analytics to support strategic decision-making, particularly in managing student activity status. A major challenge is the early and accurate identification of students at risk of becoming inactive, especially in the fifth semester, a critical stage of study at which inactivity rates of approximately 15–22% have been identified within the Information Systems program. This study addresses the limited body of research on student status classification at universities in Eastern Indonesia. Its primary objective is to examine the activity patterns of fifth-semester students by developing a classification model based on the K-Nearest Neighbor (KNN) algorithm. The study adopts a quantitative research design combined with computational experimentation, using total population sampling of 80 students from the 2023 Information Systems cohort. It relies on secondary data from the datasets “Mahasiswa SI 2023.xlsx” and “ipk mhs aktif sistem informasi.xlsx”. The dependent variable is Student Status (Active/Inactive); the independent variables include Cumulative Grade Point Average (IPK), the number of credits successfully completed, and other relevant administrative attributes. Data preprocessing consists of dataset integration, data cleaning, imputation of missing IPK values with the mean (2.891, based on 58 observations), label encoding (Active = 1, Inactive = 0), and normalization of numerical features. The KNN classification model uses the Euclidean distance metric, and several K values (3, 5, and 7) are evaluated to determine which performs best.
Model effectiveness is then assessed using accuracy, precision, recall, and F1-score. The results are expected to show that the KNN algorithm can classify student status accurately, with IPK identified as a key influencing variable. This study contributes a KNN-based classification model designed to predict student activity status in the fifth semester, providing USNP with a practical, data-driven analytical tool to support early intervention and improve the quality of academic services.
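The pipeline described in the abstract (mean imputation of missing IPK values, label encoding, normalization, Euclidean-distance KNN with K = 3, 5, and 7, and evaluation by accuracy, precision, recall, and F1-score) can be illustrated with a minimal Python sketch. This is not the study's implementation. The records below are hypothetical values invented for illustration and are not drawn from the USNP datasets; min-max scaling and leave-one-out evaluation are assumptions, since the abstract does not name the normalization method or the validation scheme; all function names are illustrative.

```python
import math
from collections import Counter

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points nearest to query (Euclidean)."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def scores(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical records: (IPK or None if missing, credits completed, status).
records = [
    (3.45, 88, "Active"), (3.10, 84, "Active"), (None, 40, "Inactive"),
    (2.95, 80, "Active"), (1.80, 36, "Inactive"), (3.60, 90, "Active"),
    (2.10, 52, "Inactive"), (None, 76, "Active"), (2.70, 72, "Active"),
    (1.50, 30, "Inactive"),
]

label = {"Active": 1, "Inactive": 0}  # encoding stated in the abstract
ipk = min_max(impute_mean([r[0] for r in records]))
credits = min_max([float(r[1]) for r in records])
X = list(zip(ipk, credits))
y = [label[r[2]] for r in records]

# Leave-one-out evaluation (an assumption): each student is classified
# using all other students as the training set. Scaling is done once on
# the full set for brevity; a stricter setup would refit it per fold.
for k in (3, 5, 7):
    preds = [knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k)
             for i in range(len(X))]
    print(k, scores(y, preds))
```

Odd K values avoid tied votes in a two-class problem, which is one plausible reason for testing 3, 5, and 7. Normalization matters here because IPK and credit counts sit on very different scales, and unscaled credits would otherwise dominate the Euclidean distance.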





