This thesis proposes a general solution framework that integrates methods in machine learning in creative ways to solve a diverse set of problems arising in urban environments. It particularly focuses on modeling spatiotemporal data for the purpose of predicting urban phenomena. Concretely, the framework is applied to solve three specific real-world problems: human mobility prediction, traffic speed prediction and incident prediction.
For human mobility prediction, I use visitor trajectories collected at a large theme park in Singapore, which serves as a simplified microcosm of an urban area. A trajectory is an ordered sequence of attraction visits and corresponding timestamps produced by a visitor. This problem has two related sub-problems: (spatial) bundle prediction and trajectory prediction. In the first sub-problem, I apply the framework to predict the bundle (i.e., the unordered set) of attractions that a given visitor would visit within a given time budget. In the second sub-problem, the framework is applied to predict the visitor's actual trajectory given the current partial trajectory and time budget. In both sub-problems, I apply trajectory clustering, hidden Markov models, revealed preference learning and (inverse) reinforcement learning within the integrated framework.
In traffic speed prediction, I aim to predict the spatiotemporal distribution of traffic speed over urban road networks. To this end, I propose local Gaussian processes, which combine non-negative matrix factorization (NMF) with Gaussian processes (GPs) to make model training efficient enough for the solution to be deployed in real-time use cases. Here, NMF essentially serves as a spatiotemporal clustering technique. The solution is extensively evaluated using real-world traffic data collected in two U.S. cities.
The incident prediction problem concerns predicting the distribution of the number of crime incidents over urban areas in future time periods. Because of its similarity to the traffic speed prediction problem above, its solution benefits greatly from the GP model developed there. In particular, the GP kernel function is inherited and extended to model the distribution of incidents in urban areas and their features. The proposed solution is evaluated using real-world incident data collected in a large Asian city.
Conceptually, this thesis uses big data and machine learning techniques to solve three separate urban problems, a contribution that belongs to the broad field of urban computing. At its core, the technical contribution lies in unifying the separate solutions tailored to those problems into an integrated framework that reasons with spatiotemporal data and is thus highly generalizable to other problems of a similar nature.
LE Truc Viet is a PhD candidate in Information Systems, specializing in Intelligent Systems & Optimization (IS&O) under the supervision of Prof. Lau Hoong Chuin. He is interested in applying artificial intelligence (AI) to solve real-world problems in human mobility and urban computing. He has authored several papers at top-tier AI conferences such as AAMAS and ECAI. He spent one year at Carnegie Mellon University (CMU) during AY 2014-15 and did a summer internship in 2016 at IBM Research (Singapore). He has been a machine learning developer at SAP since July 2017, specializing in deep learning for practical business use cases.