The ROADRUNNER project is a research project that is co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: Aristeia II. Investing in knowledge society through the European Social Fund.
Computation local to data avoids network overload.
Partial failures are easy to handle, sparing the programmer the horrors of failure recovery. Speculative execution works around stragglers.
Designed for cheap, commodity hardware.
The "end-user" programmer only writes MapReduce tasks.
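To illustrate what "only writes MapReduce tasks" means in practice, the following is a minimal single-process sketch in Python. It is not Hadoop code and not part of ROADRUNNER: the names `map_words`, `reduce_counts` and `run_mapreduce` are hypothetical, and the tiny driver only imitates the grouping step that a real framework performs across a cluster. The programmer's share of the work is the two short task functions; distribution, fault handling and data movement are left to the framework.

```python
from collections import defaultdict


def map_words(_key, line):
    """Map task (user code): emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce task (user code): sum all counts emitted for one word."""
    yield word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Toy stand-in for the framework: map, group by key, then reduce.

    A real framework would run mappers and reducers on many machines;
    this hypothetical driver runs everything in one process.
    """
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # "shuffle": group by key

    results = []
    for out_key in sorted(groups):
        results.extend(reducer(out_key, groups[out_key]))
    return results


# Input records are (line number, line text) pairs.
lines = [(0, "big data needs big clusters"), (1, "data local computation")]
word_counts = dict(run_mapreduce(lines, map_words, reduce_counts))
```

Here `word_counts` maps each distinct word to its number of occurrences in the input lines.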
We intend to explore the applicability of various types of data synopses for supporting efficient data analytics.
Our focus will be on effective techniques that produce the requested query result as soon as possible.
We intend to exploit intentionally constructed data synopses that guide the search and enable retrieval of the top-k results without scanning large portions of the data.
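The idea of a synopsis that guides the search can be sketched as follows. This is an illustrative example under stated assumptions, not the project's actual technique: the data is assumed to be split into blocks of numeric scores, the synopsis is assumed to be just the maximum score of each block, and the names `build_synopsis` and `top_k_with_synopsis` are hypothetical. Blocks are visited in decreasing order of their maximum, and processing stops as soon as no unvisited block can improve the current top-k.

```python
import heapq


def build_synopsis(blocks):
    """Assumed synopsis: one (max score, block id) pair per non-empty block."""
    return [(max(block), block_id)
            for block_id, block in enumerate(blocks) if block]


def top_k_with_synopsis(blocks, k):
    """Return the k highest scores and the number of blocks actually scanned."""
    if k <= 0:
        return [], 0

    # Visit the most promising blocks first.
    synopsis = sorted(build_synopsis(blocks), reverse=True)
    heap = []  # min-heap holding the best k scores seen so far
    blocks_scanned = 0

    for block_max, block_id in synopsis:
        # Early termination: the k-th best score already matches or beats
        # the best score this block (and every later block) could contain.
        if len(heap) == k and heap[0] >= block_max:
            break
        blocks_scanned += 1
        for score in blocks[block_id]:
            if len(heap) < k:
                heapq.heappush(heap, score)
            elif score > heap[0]:
                heapq.heapreplace(heap, score)

    return sorted(heap, reverse=True), blocks_scanned


blocks = [[3, 7, 1], [42, 40, 39], [5, 2, 8], [10, 9, 4]]
top, scanned = top_k_with_synopsis(blocks, k=2)
```

In this example only one of the four blocks is scanned, because the synopsis shows that no other block holds a score above the current second-best result.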
As the volume of generated data increases every day at a tremendous rate, we have entered the era of Big Data. Processing and analyzing such massive data sets is accomplished by means of cloud computing. In the case of Big Data, the sheer volume of data poses tremendous challenges for efficient and scalable data analytics.
Arguably, the most popular solution for massive data analysis is MapReduce, a programming model that allows for parallel processing of jobs on a large cluster of commodity machines, with built-in salient features such as scalability, fault-tolerance, flexibility and ease of programming.
However, MapReduce has been criticized by the data management community for its lack of efficiency when compared to parallel database systems.
The ROADRUNNER project aims to directly address this shortcoming of existing state-of-the-art techniques for efficient and scalable data analytics by introducing various optimizations to MapReduce processing, thus enhancing its functionality and improving its performance. In essence, ROADRUNNER intends to propose a new framework for large-scale analysis of Big Data that builds on successful ideas originating from the MapReduce model (scalability), coupled with appropriately adjusted techniques for query processing and optimization that have been established for decades in the parallel data management community (efficiency). It is important to mention that the ROADRUNNER framework will not trade scalability or fault-tolerance for efficiency, but will provide efficient query processing mechanisms without compromising the remaining good features of MapReduce. To achieve this objective, the ROADRUNNER framework will focus on optimizing the utilization of resources and minimizing the amount of wasteful processing, by determining opportunities for early termination of processing without accessing the underlying data exhaustively.