Today's analytics environments are characterized by a high degree of heterogeneity in data systems, formats and types of analysis. Many occasions call for the rapid, ad hoc, on-demand construction of a data model that represents (parts of) an organization's data infrastructure, including ML tasks. This data model is then given to data scientists to work with (to express reports, build ML models, explore, etc.). We present a novel graph-based conceptual model, the Data Virtual Machine (DVM), which represents the data (persistent, transient, derived) of an organization. By working at a higher level of abstraction, users deal with entities and attributes, concepts that most people are familiar with. A DVM can be built quickly and in an agile manner, offering schema flexibility. It is amenable to visual interfaces for schema and query management. Dataframing, one of the most frequent analytics tasks, is usually carried out by experienced data engineers using Python or R: a procedural approach with all the known drawbacks. Dataframes over DVMs are expressed declaratively and visually, via a simple and intuitive tool. This way, non-IT experts can be involved in dataframing. In addition, query evaluation takes place within an algebraic framework, with all the known benefits. In other words, a DVM enables the delegation of data engineering tasks to less technical users. We have seen analogous cases in the past, e.g. with the introduction of SQL. Finally, a DVM offers a formalism that facilitates data sharing, data portability and a single view of any entity, because a DVM's node is an attribute and an entity at the same time. In this respect, DVMs can serve well as a data virtualization technique, an emerging trend in the industry. We argue that DVMs can have a significant practical impact in today's analytics environments.
About the Speaker
Damianos Chatziantoniou received his B.Sc. in Applied Mathematics from the University of Athens (June 1991, summa cum laude) and continued his studies in Computer Science at the Courant Institute of Mathematical Sciences at New York University (M.Sc.) and at Columbia University (Ph.D.). His academic research interests include big data systems, business intelligence, large-scale analytics, query processing, data streams and real-time analysis. He has published more than 40 articles in top conferences and journals, including VLDB, ICDE, EDBT, KDD, SIGMOD, CIKM, Journal of Information Systems, and Journal of Data and Knowledge Engineering. He is currently an Associate Professor in the Department of Management Science and Technology at Athens University of Economics and Business (AUEB), and Director of AUEB's new Master's program in Business Analytics. Besides academia, Damianos has been involved in several technology start-ups. Panakea Software Inc. (founder, 1998), based in New York City, developed and marketed BI technology to make certain analytics (ad hoc OLAP) easier to express and faster to evaluate. Clients included Dun & Bradstreet, Columbia-Presbyterian Medical Center and Philips North America. VoiceWeb SA (founder, 2001), based in Athens, focused on speech and telecom applications. In 2007-2008, Damianos served as a senior research consultant at Aster Data Systems, a pioneer in big data systems. Aster Data was acquired by Teradata in March 2011.