Hybrid Feature Selection Method for Improving File Fragment Classification

Alia Algurashi, Wenjia Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Identifying the type of a file fragment in isolation from its context is an essential task in digital forensic analysis, and several methods exist for it. One common approach is to extract various types of features from file fragments and use them as inputs to classification algorithms. However, this approach suffers from the curse of dimensionality: the number of extracted features is too high, which makes learning and classification both inefficient and inaccurate. This paper proposes a hybrid method that addresses this issue by using filters and wrappers to significantly reduce the number of features while also improving the accuracy of file type classification. First, it combines three appropriate filters to remove a large number of irrelevant or less important features; it then applies wrappers to reduce the remaining features further to the most salient ones. The method was tested on benchmark datasets from GovDocs, and the experimental results indicated that it not only reduced the number of features from 66,313 to 11–32, but also improved classification accuracy compared with other methods that used all the features.
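The two-stage idea in the abstract (filters first, wrappers second) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the three filter scores (variance, absolute Pearson correlation, Fisher score), the rule for combining them (union of each filter's top-k features), the wrapper (greedy forward selection around a nearest-centroid classifier scored by leave-one-out accuracy), and the synthetic data standing in for fragment features.

```python
import random
from statistics import mean, pvariance

# --- Stage 1: three illustrative filter scores (assumed, not the paper's) ---
def variance_score(col, y):
    return pvariance(col)

def correlation_score(col, y):
    mx, my = mean(col), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
    sx = sum((a - mx) ** 2 for a in col) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return abs(cov / (sx * sy)) if sx and sy else 0.0

def fisher_score(col, y):
    # Between-class separation divided by within-class spread.
    overall, num, den = mean(col), 0.0, 0.0
    for c in set(y):
        vals = [v for v, lab in zip(col, y) if lab == c]
        num += len(vals) * (mean(vals) - overall) ** 2
        den += len(vals) * pvariance(vals)
    return num / den if den else 0.0

def filter_stage(X, y, k):
    """Keep the union of the top-k features under each filter (assumed rule)."""
    n_features = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_features)]
    kept = set()
    for score in (variance_score, correlation_score, fisher_score):
        ranked = sorted(range(n_features),
                        key=lambda j: score(cols[j], y), reverse=True)
        kept.update(ranked[:k])
    return sorted(kept)

# --- Stage 2: wrapper = greedy forward selection around a simple classifier ---
def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier on `feats`."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for c in set(y):
            rows = [X[r] for r in range(len(X)) if r != i and y[r] == c]
            if rows:
                centroids[c] = [mean(row[j] for row in rows) for j in feats]
        pred = min(centroids, key=lambda c: sum(
            (X[i][j] - m) ** 2 for j, m in zip(feats, centroids[c])))
        correct += pred == y[i]
    return correct / len(X)

def wrapper_stage(X, y, candidates):
    selected, best = [], 0.0
    while True:
        trials = [(loo_accuracy(X, y, selected + [j]), j)
                  for j in candidates if j not in selected]
        if not trials:
            break
        acc, j = max(trials, key=lambda t: (t[0], -t[1]))
        if acc <= best:  # stop when no candidate improves accuracy
            break
        selected.append(j)
        best = acc
    return selected, best

# Synthetic stand-in data: feature 0 carries the class signal, the rest is noise.
random.seed(0)
y = [i % 2 for i in range(40)]
X = [[5 * label + random.gauss(0, 0.3)] +
     [random.gauss(0, 1) for _ in range(11)] for label in y]

candidates = filter_stage(X, y, k=3)                    # at most 9 of 12 features
selected, accuracy = wrapper_stage(X, y, candidates)
print(candidates, selected, accuracy)
```

The filter stage is cheap because it scores each feature independently, so it can prune a very large feature set before the wrapper, which retrains a classifier for every candidate subset, is run on the much smaller remainder.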
Original language: English
Title of host publication: Artificial Intelligence XXXVI - 39th SGAI International Conference on Artificial Intelligence, AI 2019, Proceedings
Editors: Max Bramer, Miltos Petridis
Number of pages: 13
ISBN (Electronic): 978-3-030-34885-4
Publication status: Published - 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11927 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


  • Feature selection
  • File fragment classification
  • Filters
  • Forensics
  • Hybrid method
  • Wrappers
