CS Colloquium: Ahmed Eldawy (University of California, Riverside) - Interactive Data Exploration as a Service
Tue, Oct 22, 2019
3:30 PM - 4:50 PM
Location: SAL 101
Speaker: Ahmed Eldawy, University of California, Riverside
Talk Title: Interactive Data Exploration as a Service
Series: Computer Science Colloquium
Abstract: Recently, there has been tremendous growth in data collection from sources such as satellites, IoT sensors, smartphones, and autonomous cars. At the same time, there is a movement toward open data, led by governments, non-profit organizations, and industry, which makes hundreds of thousands of datasets publicly available. This abundance of publicly available open data has led to a new data revolution in which everyone is interested in exploring this data to look for interesting patterns and innovative findings. While computer scientists and data scientists know how to process this data, no one is there to help citizen scientists, those with little to no knowledge of programming and data management.
This talk describes a new approach to provide citizen scientists with interactive data exploration as a service (IDEAS). The goal is to allow anyone to start exploring those publicly available datasets without the costly process of installing and learning data processing tools, or even downloading the datasets of interest. This system will act as an icebreaker that helps engage more citizen scientists in the field of data science. The main challenge is how to provide real-time processing for petabytes of data spread across hundreds of thousands of datasets through a simple interface. This talk describes three modules of this system: synoptic computation, incremental indexing, and interactive visualization. The synoptic computation module scales up query processing by providing real-time approximate answers over small synopses of the data, such as samples and histograms. The incremental indexing module works in the background and incrementally organizes the data over a cluster of machines to speed up query processing. Finally, the interactive visualization module presents the results in a visual format that allows users to inspect the query answers. Preliminary results on the proposed system show that it can bridge the gap between user requirements for interactivity and the increasing volume of big spatial data.
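As a rough illustration of the synoptic computation idea, the sketch below answers a range-count query approximately from two small synopses, a uniform random sample and an equi-width histogram, instead of scanning the full dataset. It is not taken from the talk or from the IDEAS system; the function names, the one-dimensional data, and the bucket layout are all assumptions made for illustration.

```python
import random


def build_sample(values, k, seed=0):
    """Synopsis 1 (hypothetical helper): a uniform random sample of k values."""
    rng = random.Random(seed)
    return rng.sample(values, k)


def estimate_count_from_sample(sample, total, lo, hi):
    """Estimate how many of the `total` values fall in [lo, hi).

    The fraction of sampled values inside the range is scaled up
    to the size of the full dataset.
    """
    hits = sum(1 for v in sample if lo <= v < hi)
    return hits * total / len(sample)


def build_histogram(values, vmin, vmax, buckets):
    """Synopsis 2 (hypothetical helper): equi-width bucket counts over [vmin, vmax)."""
    width = (vmax - vmin) / buckets
    counts = [0] * buckets
    for v in values:
        # Clamp so that a value equal to vmax lands in the last bucket.
        i = min(int((v - vmin) / width), buckets - 1)
        counts[i] += 1
    return counts


def estimate_count_from_histogram(counts, vmin, vmax, lo, hi):
    """Estimate the count in [lo, hi), assuming values are spread
    uniformly inside each bucket."""
    width = (vmax - vmin) / len(counts)
    estimate = 0.0
    for i, c in enumerate(counts):
        b_lo = vmin + i * width
        b_hi = b_lo + width
        # Length of the intersection between the query range and this bucket.
        overlap = max(0.0, min(hi, b_hi) - max(lo, b_lo))
        estimate += c * overlap / width
    return estimate
```

Both synopses are far smaller than the data they summarize, so a query touches only a few hundred numbers regardless of dataset size. The price is an approximate answer: the sample estimate varies with the random draw, and the histogram estimate is only as good as the uniformity assumption within each bucket.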
This lecture satisfies requirements for CSCI 591: Research Colloquium.
Biography: Ahmed Eldawy is an Assistant Professor of Computer Science at the University of California, Riverside. His research interests lie in the broad area of databases, with a focus on big data management and spatial data processing. Ahmed is the main inventor of SpatialHadoop, the most comprehensive open-source system for big spatial data management. Ahmed has many collaborators in industrial research labs, including Microsoft Research and IBM Watson. He was awarded the Quality Metrics Fellowship in 2016, the Doctoral Dissertation Fellowship in 2015, and the Best Poster Runner-up award at ICDE 2014. His work is supported by the National Science Foundation (NSF) and the US Department of Agriculture (USDA).
Host: Shahram Ghandeharizadeh