In recent years, Machine Learning (ML) methods have developed rapidly and achieved remarkable success in many domains and applications. This rapid development has also widened the complexity gap between the methods investigated in research and those used in practice. One reason is that many algorithms, despite achieving state-of-the-art performance, ignore important efficiency and practical constraints of real-world applications. For example, in hashing, practitioners often prefer simpler data-independent or linear algorithms such as Locality Sensitive Hashing over complex nonlinear, neural-network-based methods. Although the deep methods noticeably improve retrieval performance, they usually require significantly more training time.
In this talk, the speaker will discuss his research efforts, built on generative models, to bridge the gap between Machine Learning research and practice in two important domains: retrieval and ML model security. He will first discuss the motivations and broad goals of his work. Then, in the retrieval domain, he will introduce an optimization framework that replaces the many quantization losses (often more than three or four) in existing hashing-based methods with a single divergence-minimization loss, significantly reducing their training time. This framework has been used successfully to improve both the training process and the retrieval performance of hashing-based methods in computational advertising and text/image retrieval. Next, in the ML security domain, he will describe an adversarial formulation of a security game between the attacker and the model trainer, used to craft backdoor attacks on ML models that are extremely difficult to detect in practice. This formulation also allows the attacker to adapt flexibly to newly developed countermeasures, further demonstrating the critical security risks in the Machine-Learning-as-a-Service supply chain. Finally, he will conclude the talk with potential future and unexplored directions. Together, these directions suggest a path toward building ML models that are better suited to practical applications.
About the Speaker
Khoa Doan is currently an AI Researcher at Baidu Research, USA. He received his PhD in Computer Science from Virginia Tech and his MS in Computer Science from the University of Maryland, College Park. His research focuses on developing practical Deep Learning-To-Hash models and generative-based Machine Learning approaches in areas such as ML security and retrieval. He has first-authored several papers at Data Mining (e.g., WWW), Information Retrieval (e.g., SIGIR), Machine Learning (NeurIPS), and Computer Vision (e.g., CVPR) conferences. Previously, he worked as a software developer at several enterprise software companies and as a data scientist/researcher on high-performance computing projects at NASA and at advertising companies such as Criteo AI Lab. He is also involved with AI startups and currently serves as an ML Advisor for a stealth-mode analytics startup.
He is a tenure-track faculty candidate in Artificial Intelligence & Data Science and Machine Learning & Intelligence.