THE EFFECT OF DATA SPLIT COMPOSITION ON THE ACCURACY OF DIABETES PATIENT CLASSIFICATION USING MACHINE LEARNING ALGORITHMS

  • Febby Refindha Aftha Harianto, Universitas Nahdlatul Ulama Sunan Giri
  • Zakki Alawi, Universitas Nahdlatul Ulama Sunan Giri
  • Ita Aristia Sa’ida, Universitas Nahdlatul Ulama Sunan Giri
Keywords: Algorithm, Classification, Diabetes, Machine Learning, Split Data

Abstract

The rising number of people with diabetes is an international health problem. Early diagnosis and accurate classification are essential to prevent diabetic complications. This study examines how the composition of the data split affects the performance of diabetes patient classification with three machine learning algorithms: Random Forest, Naive Bayes, and Support Vector Machine (SVM). The research data were obtained from Bojonegoro Regency Hospital and consist of 128 samples with 10 main features. The data first pass through a preprocessing stage to ensure they are ready for use. They are then divided into training and testing sets at ratios of 90:10, 80:20, 70:30, 60:40, and 50:50. Each algorithm is assessed for accuracy, precision, recall, and F1 score using a confusion matrix. This study focuses on the accuracy values, and the results show that the split proportion affects algorithm performance. Random Forest achieved 100% accuracy in some scenarios and proved to be the most effective algorithm for classifying diabetes patients. In conclusion, algorithm selection and data split composition are both important for optimizing model performance. These results are relevant to the development of more accurate and efficient machine learning-based diagnosis systems. Further research could consider larger datasets and additional algorithms to improve results.
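
The splitting and evaluation steps described in the abstract can be sketched in plain Python. This is a minimal illustration and not the authors' implementation: the function names `split_indices` and `binary_metrics`, the fixed random seed, the rounding rule for the test-set size, and the example labels are all assumptions made here, and no study data is used.

```python
import random

def split_indices(n_samples, test_pct, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into (train, test)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Rounding to the nearest integer is an assumption; libraries may differ.
    n_test = round(n_samples * test_pct / 100)
    return indices[n_test:], indices[:n_test]

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Train/test sizes for the five ratios in the abstract, with 128 samples.
split_sizes = {}
for test_pct in (10, 20, 30, 40, 50):
    train_idx, test_idx = split_indices(128, test_pct)
    split_sizes[f"{100 - test_pct}:{test_pct}"] = (len(train_idx), len(test_idx))

# Hypothetical labels and predictions, for illustration only.
example = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0],
                         [1, 1, 0, 0, 0, 1, 1, 0])

for ratio, (n_train, n_test) in split_sizes.items():
    print(ratio, "train:", n_train, "test:", n_test)
print(example)
```

Under these assumptions, the 90:10 split of 128 samples leaves 13 samples for testing, so a single misclassification changes the accuracy by 1/13, or roughly 7.7 percentage points. In practice a library routine with stratified splitting would likely be used in place of `split_indices`.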

Published
2025-01-07
How to Cite
Aftha Harianto, F., Alawi, Z., & Sa’ida, I. (2025). PENGARUH KOMPOSISI SPLIT DATA PADA AKURASI KLASIFIKASI PENDERITA DIABETES MENGGUNAKAN ALGORITMA MACHINE LEARNING. Jurnal Sistem Informasi Dan Informatika (Simika), 8(1), 36-44. https://doi.org/10.47080/simika.v8i1.3663