A large number of mobile applications (Apps) are uploaded, distributed, and updated every day in various Android markets, e.g., Google Play and Huawei AppGallery. An ongoing challenge in the daily security management of Android App markets is to detect malicious Apps (also known as malware) among these newcomers accurately and efficiently. Customers rely on the detection results when selecting Apps to download, and undetected malware may cause serious damage. In this paper, we propose a cloud-based malware detection system called SaaS, which combines approaches from diverse domains: natural language processing (n-gram), image processing (GLCM), cryptography (fuzzy hash), machine learning (random forest), and complex networks. We first extract n-gram features and GLCM features from an App's smali code and DEX file, respectively. We then use those features to build a training data set and train a machine learning detection model. The model is further enhanced with fuzzy hashing to detect whether an inspected App is repackaged. Extensive experiments (involving 1495 samples) demonstrate that the detection accuracy is more than 98.5% and that the system supports large-scale detection and monitoring. In addition, the proposed system can be deployed as a service in the cloud, where customers can access it on demand.
- Fuzzy hash
- Machine learning
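The feature extraction step described in the abstract (n-grams from smali code, GLCM from the DEX file) can be illustrated with a minimal sketch. It assumes that the smali code has already been tokenized into a list of opcodes and that the DEX file is read as raw bytes laid out row by row as a grayscale image. The function names, the quantization to 16 gray levels, and the horizontal-neighbor offset are illustrative assumptions, not details taken from the system itself.

```python
from collections import Counter


def opcode_ngrams(opcodes, n=3):
    """Count contiguous n-grams in a sequence of opcode tokens (hypothetical helper)."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def glcm(data, width, levels=16):
    """Gray-level co-occurrence matrix for horizontally adjacent pixels.

    `data` is a bytes object treated as an image `width` pixels wide.
    Byte values (0..255) are quantized to `levels` gray levels (an assumption).
    """
    pixels = [b * levels // 256 for b in data]
    matrix = [[0] * levels for _ in range(levels)]
    rows = len(pixels) // width  # a trailing partial row is ignored
    for r in range(rows):
        row = pixels[r * width:(r + 1) * width]
        # Count each (left, right) pair of neighboring gray levels.
        for left, right in zip(row, row[1:]):
            matrix[left][right] += 1
    return matrix


def glcm_contrast(matrix):
    """Contrast statistic: mean squared gray-level difference over all counted pairs."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return 0.0
    weighted = sum(
        (i - j) ** 2 * count
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )
    return weighted / total
```

In such a setup, the n-gram counts and a few GLCM statistics such as contrast would be concatenated into one feature vector per App and passed to a classifier such as a random forest.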