There is a fundamental discrepancy between the targeted and actual users of current analytics frameworks. Most systems are designed for the problems faced by the Googles and Facebooks of the world: petabytes of data distributed across large cloud deployments consisting of thousands of cheap commodity machines. Yet the vast majority of users operate clusters ranging from a few to a few dozen nodes and analyze relatively small data sets of up to a few terabytes. Targeting these users fundamentally changes the way we should build analytics systems.
Therefore, we are developing Tupleware, a new system specifically aimed at the challenges faced by the typical user. Tupleware’s architecture brings together ideas from the database and compiler communities to create a powerful end-to-end solution for data analysis. We propose novel techniques that consider the data, computations, and hardware together to achieve maximum performance on a case-by-case basis.