YOUTUBE SPAM COMMENT DETECTOR USING BAG OF WORDS AND RANDOM FOREST

Ronald Julio, Hanna Pratiwi, Yulia Wahyuningsih

Abstract

YouTube is the world's largest video-sharing platform. Its rapidly growing popularity has allowed spammers to exploit it illegitimately. These spammers post spam comments in large volumes and many varieties on YouTube videos, which makes them difficult for YouTube to detect. A spam comment can be defined as a comment that is inappropriate or irrelevant to the content it is attached to. To filter out users who spam on YouTube, this study combines a bag-of-words representation with the Random Forest classifier. The experiments use the YouTube Spam Collection dataset from UCI, which consists of 2000 comments. Tuning with GridSearchCV produced an accuracy of 96%, with the selected configuration using 2000 words and no TF-IDF weighting.
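The pipeline described in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' code: the toy comments, the label encoding (1 = spam, 0 = not spam), the grid values, and the fold count are all assumptions made for illustration. Only the 2000-word vocabulary option and the choice between using and skipping TF-IDF come from the abstract.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy comments, NOT taken from the YouTube Spam Collection.
comments = [
    "check out my channel for free gift cards",
    "subscribe to my channel and win an iphone",
    "click this link to earn money fast",
    "visit my page for free followers now",
    "free money click here and subscribe",
    "win free gift cards visit this link",
    "this song brings back so many memories",
    "the guitar solo at the end is amazing",
    "i love the lyrics of this track",
    "great video, the editing is really clean",
    "her voice is beautiful in this performance",
    "best live performance i have ever seen",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # assumed encoding: 1 = spam

# Bag-of-words counts, an optional TF-IDF step, then a Random Forest.
pipeline = Pipeline([
    ("bow", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("forest", RandomForestClassifier(random_state=0)),
])

# Assumed search space: vocabulary size, TF-IDF on or off, forest size.
param_grid = {
    "bow__max_features": [1000, 2000],
    "tfidf": [TfidfTransformer(), "passthrough"],  # "passthrough" skips TF-IDF
    "forest__n_estimators": [20, 50],
}

# Cross-validated grid search that keeps the most accurate configuration.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(comments, labels)

best_params = search.best_params_
predictions = search.predict([
    "subscribe for free gift cards",
    "this performance is amazing",
])
print(best_params)
print(predictions)
```

On a dataset this small the selected parameters and the cross-validated score carry no meaning; the sketch only shows how the vocabulary size and the TF-IDF step can be treated as tunable choices alongside the forest's own hyperparameters.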

Keywords

spam; YouTube; Random Forest
