TY - GEN
T1 - An Evaluation of Image-Based Malware Classification Using Machine Learning
AU - Son, Tran The
AU - Lee, Chando
AU - Le-Minh, Hoa
AU - Aslam, Nauman
AU - Raza, Moshin
AU - Long, Nguyen Quoc
N1 - Publisher Copyright: © 2020, Springer Nature Switzerland AG.
PY - 2020
Y1 - 2020
AB - This paper investigates image-based malware classification using machine learning techniques. In this recent approach, malware binaries are converted into images (i.e., malware images) before being fed to machine learning models such as k-nearest neighbour (k-NN), Naïve Bayes (NB), Support Vector Machine (SVM) or Convolutional Neural Networks (CNN). The approach relies on image texture to classify malware instead of signatures or behaviours collected via malware analysis, so it is not affected when the signatures of a new malware variant have not yet been collected or its behaviours have not yet been updated. The paper evaluates the classification performance of these classifiers (k-NN, NB, SVM and CNN) when fed malware images of various dimensions (128 × 128, 64 × 64, 32 × 32 and 16 × 16). Experimental results on three datasets, Malimg, Malheur and BIG2015, show that k-NN outperforms the other classifiers on all three, with accuracies of 97.9%, 94.41% and 95.63% respectively. In contrast, NB proved weak at image-based malware classification. The results also indicate that the accuracy of k-NN peaks at an input image size of 32 × 32 and tends to decrease when larger input images (64 × 64, 128 × 128) provide too much feature information.
KW - CNN
KW - Deep Learning
KW - Image-based malware classification
KW - k-NN
KW - Naïve Bayes
KW - SVM
UR - http://www.scopus.com/inward/record.url?scp=85097058491&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-63119-2_11
DO - 10.1007/978-3-030-63119-2_11
M3 - Conference contribution
AN - SCOPUS:85097058491
SN - 9783030631185
T3 - Communications in Computer and Information Science
SP - 125
EP - 138
BT - Advances in Computational Collective Intelligence - 12th International Conference, ICCCI 2020, Proceedings
A2 - Hernes, Marcin
A2 - Wojtkiewicz, Krystian
A2 - Szczerbicki, Edward
PB - Springer
T2 - 12th International Conference on Computational Collective Intelligence, ICCCI 2020
Y2 - 30 November 2020 through 3 December 2020
ER -