Comparative Analysis of the KNN and Naive Bayes Machine Learning Methods on Network Attack Logs

Agus Fs Ndruru; Rika Rosnelly; Bob Subhan Riza

doi:10.33395/jmp.v15i1.15956

Authors

Agus Fs Ndruru Universitas Potensi Utama
Rika Rosnelly Universitas Potensi Utama
Bob Subhan Riza

Keywords:

CIC-IDS2017, Intrusion Detection System, K-Nearest Neighbor, Machine Learning, Naive Bayes

Abstract

Large-scale cybercrime, such as Distributed Denial of Service (DDoS) attacks, demands rapid and accurate prevention through an Intrusion Detection System (IDS). This research analyzes and compares the performance of two machine learning algorithms, K-Nearest Neighbor (KNN) and Naive Bayes, in classifying network attack logs. The methodology uses the public CIC-IDS2017 dataset and proceeds through data preprocessing, model design, parameter optimization, and confusion-matrix-based evaluation. The test results show that KNN with an optimal neighbor count of K=3 achieved an accuracy of 99.92%, outperforming Gaussian Naive Bayes, which recorded an accuracy of 99.52%. KNN's advantage is consistent across the precision, recall, and F1-score metrics, as its distance-based (Euclidean) approach can capture correlations among complex, nonlinear attack patterns. Conversely, the probabilistic approach of Naive Bayes is computationally much lighter, but its performance is slightly hindered by its assumption of attribute independence. As a practical guideline, KNN is recommended for security systems that prioritize the highest possible accuracy and minimal false negatives, while Naive Bayes is well suited as an efficient initial monitoring filter. The study concludes that KNN is significantly more adaptive and accurate than Naive Bayes in detecting network anomalies. Future research should test hybrid models, apply deep learning, or implement real-time detection on live network traffic in order to examine the system's scalability and computational load comprehensively.
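
The comparison described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation and does not reproduce their results: it assumes a small synthetic two-class dataset in place of CIC-IDS2017, and every function name (`make_synthetic`, `knn_predict`, `gnb_fit`, `gnb_predict`, `binary_metrics`) is hypothetical. It only shows the two ideas the abstract contrasts, a Euclidean majority vote with K=3 and a Gaussian likelihood that treats features as independent, followed by confusion-matrix metrics.

```python
import math
import random
from collections import Counter


def make_synthetic(n_per_class=200, seed=0):
    """Two Gaussian clusters standing in for benign (0) and attack (1) records.
    Purely illustrative data; this is NOT CIC-IDS2017."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    order = list(range(len(X)))
    rng.shuffle(order)
    return [X[i] for i in order], [y[i] for i in order]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def gnb_fit(train_X, train_y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for label in set(train_y):
        rows = [x for x, t in zip(train_X, train_y) if t == label]
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # Small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(train_y), means, variances)
    return model


def gnb_predict(model, x):
    """Pick the class with the highest log posterior.
    Summing per-feature log densities is the independence assumption."""
    def log_posterior(label):
        prior, means, variances = model[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
        return math.log(prior) + log_likelihood
    return max(model, key=log_posterior)


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from the confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# 80/20 train/test split on the shuffled synthetic data.
X, y = make_synthetic()
split = int(0.8 * len(X))
X_train, y_train, X_test, y_test = X[:split], y[:split], X[split:], y[split:]

knn_scores = binary_metrics(
    y_test, [knn_predict(X_train, y_train, x, k=3) for x in X_test])
nb_model = gnb_fit(X_train, y_train)
nb_scores = binary_metrics(y_test, [gnb_predict(nb_model, x) for x in X_test])

print("KNN (K=3):", knn_scores)
print("Gaussian Naive Bayes:", nb_scores)
```

The sketch also hints at the efficiency trade-off mentioned above: `knn_predict` scans every training point for each query, whereas `gnb_predict` only evaluates a few stored means and variances per class. On real traffic logs, feature scaling and an indexed neighbor search would likely be needed, and the scores printed here say nothing about the accuracy figures reported in the study.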
