About The Talk

The Android platform is becoming increasingly popular, and over the years organizations have developed a wide range of applications (apps) to meet market trends. Meanwhile, security on the Android platform has attracted considerable attention from both the academic and industrial communities. Many approaches have been proposed to detect Android malware, and most of them perform well under the given Android environment settings and labelled samples. However, existing approaches suffer from robustness problems when either the Android OS specification or the labels of the app samples used to train the detection model change.

The first work in this proposal presents a slow-aging solution for Android malware detection named SDAC. SDAC evolves its feature set by evaluating the contributions of new APIs to malware detection according to the contributions of existing APIs. In detail, SDAC evaluates the contributions of APIs by their contexts in API call sequences. These sequences are extracted from Android apps, which provide real-world statistics on how APIs are used. An embedding algorithm named API2Vec is introduced to map APIs into a vector space, where the vectors represent the semantics of the APIs. SDAC then clusters all APIs based on the semantic distances among them to create a feature set in the training phase, and extends the feature set to include all new APIs in the detection phase. Through this feature extension, SDAC adapts to changes in Android specifications by simply identifying the new APIs that appear in the detection phase, and thus remains robust against such changes.

The second work in this proposal is a general framework named NoiseInspector, which provides robustness against label noise. We observed that the labels of some samples provided by anti-virus organizations changed over time as their detection signature databases were updated.
These changes indicate that earlier labels can be erroneous and can thus distort model performance when they are used to train Android malware detection models. As a general framework, NoiseInspector can detect label noise in combination with different Android malware detection approaches, and it does not need samples with guaranteed correct labels. This is achieved by utilizing the intermediate states of detection models during training. From these intermediate states, NoiseInspector identifies mismatches between the predicted labels and the labels in the dataset. Such mismatches are considered to be caused by mistaken target labels. NoiseInspector then applies unsupervised outlier detection algorithms to the mismatches, identifies the outliers as noise, and corrects the corresponding labels. With this correction, NoiseInspector provides Android malware detection approaches with robustness against label noise, helps improve their accuracy, and allows their actual performance to be evaluated fairly.

This dissertation proposal contributes to the robustness of Android malware detection approaches. On the one hand, SDAC resists Android specification changes and ages more slowly than state-of-the-art approaches. On the other hand, NoiseInspector, as a general framework, can help various kinds of Android malware detection approaches increase their robustness against label noise.
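
The feature extension step of SDAC can be illustrated with a minimal sketch. It is not the SDAC implementation: the embedding values are made up rather than produced by API2Vec, the clusters are given by hand, and assigning a new API to the cluster with the nearest centroid (Euclidean distance) is an assumption, since the abstract does not name the clustering algorithm or the distance measure. The function names are hypothetical.

```python
import math

# Hypothetical 2-D embeddings; real API2Vec vectors would be learned
# from API call sequences and have many more dimensions.
embeddings = {
    "sendTextMessage": [0.9, 0.1],
    "sendDataMessage": [0.8, 0.2],
    "getLastKnownLocation": [0.1, 0.9],
    "requestLocationUpdates": [0.2, 0.8],
    "sendMultimediaMessage": [0.85, 0.15],  # "new" API, unseen in training
}

# Clusters built in the training phase (assumed given here).
clusters = {
    0: {"sendTextMessage", "sendDataMessage"},
    1: {"getLastKnownLocation", "requestLocationUpdates"},
}


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def extend_clusters(clusters, embeddings, new_apis):
    """Add each new API to the cluster whose centroid is nearest to it."""
    centroids = {
        cid: centroid([embeddings[api] for api in members])
        for cid, members in clusters.items()
    }
    extended = {cid: set(members) for cid, members in clusters.items()}
    for api in new_apis:
        nearest = min(
            centroids,
            key=lambda cid: math.dist(embeddings[api], centroids[cid]),
        )
        extended[nearest].add(api)
    return extended


def app_features(app_apis, clusters):
    """One binary feature per cluster: 1 if the app calls any API in it."""
    called = set(app_apis)
    return [int(bool(clusters[cid] & called)) for cid in sorted(clusters)]


extended = extend_clusters(clusters, embeddings, ["sendMultimediaMessage"])
print(app_features(["sendMultimediaMessage"], clusters))  # [0, 0]
print(app_features(["sendMultimediaMessage"], extended))  # [1, 0]
```

In this toy setting, an app that only calls the new API is invisible to the original feature set but activates the messaging cluster after the extension, without retraining the clusters themselves.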
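
The label noise detection idea behind NoiseInspector can be sketched in a similar way. The sketch below is an illustration under stated assumptions, not the framework itself: the "intermediate states" are reduced to hypothetical per-epoch predicted labels, the mismatch measure is the fraction of epochs in which the prediction disagrees with the dataset label, and a modified z-score based on the median absolute deviation stands in for the unsupervised outlier detection algorithms, which the abstract does not name.

```python
from statistics import median

# Dataset labels (1 = malware, 0 = benign); sample 4 is assumed mislabelled.
labels = [1, 1, 0, 0, 1, 0]

# Hypothetical predictions recorded after each of four training epochs.
epoch_predictions = [
    [0, 0, 0, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
]


def mismatch_scores(epoch_predictions, labels):
    """Fraction of recorded epochs in which the model disagreed with each label."""
    n_epochs = len(epoch_predictions)
    return [
        sum(preds[i] != labels[i] for preds in epoch_predictions) / n_epochs
        for i in range(len(labels))
    ]


def flag_outliers(scores, threshold=3.5):
    """Flag unusually high scores with a modified z-score (MAD-based) rule."""
    med = median(scores)
    mad = median(abs(s - med) for s in scores)
    if mad == 0:
        # Degenerate case: at least half of the scores equal the median,
        # so any score above the median is treated as an outlier.
        return [s > med for s in scores]
    return [0.6745 * (s - med) / mad > threshold for s in scores]


scores = mismatch_scores(epoch_predictions, labels)
flags = flag_outliers(scores)
# Flip the labels of the samples identified as noise.
corrected = [1 - y if noisy else y for y, noisy in zip(labels, flags)]

print(scores)     # [0.25, 0.25, 0.0, 0.25, 1.0, 0.0]
print(flags)      # [False, False, False, False, True, False]
print(corrected)  # [1, 1, 0, 0, 0, 0]
```

Samples that the model only misclassifies early in training receive low scores and are kept, while the sample that the model consistently contradicts stands out and has its label flipped. No sample with a guaranteed correct label is needed at any point.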