Hierarchical Learning of Cross-Language Mappings through Distributed Vector Representations for Code

Speaker(s): BUI Duy Quoc Nghi, PhD Candidate, School of Information Systems, Singapore Management University
Date: May 25, 2018, Friday
Time: 2:00pm - 2:30pm
Venue: Seminar Room 3.1, Level 3, School of Information Systems, Singapore Management University, 80 Stamford Road, Singapore 178902

We look forward to seeing you at this research seminar.
ABOUT THE TALK

Translating a program written in one programming language into another can be useful for software development tasks that need functionality implementations in different languages. Although past studies have considered this problem, their approaches may be either specific to the language grammars or specific to certain kinds of code elements (e.g., tokens, phrases, API uses). We propose a new approach to automatically learn cross-language representations for various kinds of structural code elements that may be used for program translation. Our key idea is twofold. First, we normalize and enrich code token streams with additional structural and semantic information. We then train cross-language vector representations for the tokens (a.k.a. shared embeddings) based on word2vec, a neural-network-based technique for producing word embeddings. Second, working hierarchically from the bottom up, we construct shared embeddings for code elements at higher levels of granularity (e.g., expressions, statements, methods) from the embeddings of their constituents. We then build mappings among code elements across languages based on the similarities among their embeddings. When compared with existing tools for mapping library API methods, our approach accurately identifies many more mappings. Our approach can also automatically learn shared embeddings for various code elements in different languages and identify their cross-language mappings with reasonable Mean Average Precision (MAP) scores.

This is a pre-conference talk for the 40th International Conference on Software Engineering (ICSE 2018).

ABOUT THE SPEAKER

Nghi Bui is a second-year PhD candidate in the School of Information Systems, Singapore Management University, supervised by Associate Professor Lingxiao Jiang. His current research focuses on machine learning for programming language semantics to understand the behavior of software programs.
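The following toy sketch illustrates the two ideas in the talk abstract. It composes a statement's embedding bottom-up from its token embeddings. It then ranks candidate statements in another language by similarity and scores that ranking with average precision, the per-query quantity behind MAP. This is a minimal sketch under stated assumptions, not the speaker's implementation:

- The token vectors are hand-picked stand-ins for word2vec-trained shared embeddings.
- The `java:`/`cs:` token names and the Java-to-C# pairing are hypothetical.
- Composition is assumed to be plain element-wise averaging.
- Similarity is assumed to be cosine similarity.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = List[float]

# Hypothetical token embeddings in one shared Java/C# space. In the approach
# described in the talk these would be learned (based on word2vec) from
# normalized, enriched token streams; here they are hand-picked toy values.
TOKENS: Dict[str, Vector] = {
    "java:System.out.println": [1.0, 0.1, 0.0],
    "java:String": [0.1, 1.0, 0.0],
    "cs:Console.WriteLine": [0.9, 0.2, 0.0],
    "cs:string": [0.2, 0.9, 0.1],
    "cs:Math.Max": [0.0, 0.1, 1.0],
    "cs:int": [0.1, 0.0, 0.9],
}


def compose(children: Sequence[Vector]) -> Vector:
    """Bottom-up step: derive a parent element's vector from its constituents.
    Element-wise averaging is an assumed composition function."""
    dim = len(children[0])
    return [sum(c[i] for c in children) / len(children) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(query: Vector, candidates: Dict[str, Vector]) -> List[str]:
    """Order target-language elements from most to least similar to the query."""
    return sorted(candidates, key=lambda n: cosine(query, candidates[n]), reverse=True)


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Sum of precision@k at each rank k holding a relevant item,
    divided by the total number of relevant items."""
    hits, total = 0, 0.0
    for k, name in enumerate(ranked, start=1):
        if name in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(rankings: List[List[str]], truths: List[Set[str]]) -> float:
    """MAP: average of per-query average precision."""
    return sum(average_precision(r, t) for r, t in zip(rankings, truths)) / len(rankings)


# Statement-level embeddings composed from token-level ones.
java_stmt = compose([TOKENS["java:System.out.println"], TOKENS["java:String"]])
cs_stmts = {
    "Console.WriteLine(string)": compose([TOKENS["cs:Console.WriteLine"], TOKENS["cs:string"]]),
    "Math.Max(int, int)": compose([TOKENS["cs:Math.Max"], TOKENS["cs:int"], TOKENS["cs:int"]]),
}
ranked = rank_candidates(java_stmt, cs_stmts)
print(ranked)  # Console.WriteLine(string) is ranked first
print(average_precision(ranked, {"Console.WriteLine(string)"}))  # 1.0
```

Averaging ignores token order and tree structure, so it cannot distinguish, say, `a - b` from `b - a`. A structure-aware composition would likely be needed for realistic code.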