There is a fundamental discrepancy between the targeted and actual users of current analytics frameworks. Most systems are designed for the problems faced by the Googles and Facebooks of the world: petabytes of data distributed across large cloud deployments consisting of thousands of cheap commodity machines. Yet the vast majority of users operate clusters ranging from a few to a few dozen nodes and analyze relatively small datasets of up to a few terabytes. Targeting these users fundamentally changes the way we should build analytics systems. Therefore, we are developing Tupleware, a new system aimed specifically at the challenges faced by the typical user. Tupleware’s architecture brings together ideas from the database and compiler communities to create a powerful end-to-end solution for data analysis. We propose novel techniques that consider the data, computations, and hardware together to achieve maximum performance on a case-by-case basis.
Machine learning and advanced statistics are important tools for drawing insights from large datasets. However, these techniques often require human intervention to steer the computation towards meaningful results. To that end, we are building Vizdom, a new system for interactive analytics through pen and touch. Vizdom’s frontend allows users to visually compose complex workflows of machine learning and statistics operators on an interactive whiteboard, and the backend leverages recent advances in workflow compilation techniques to run these computations at interactive speeds. Additionally, we are exploring approximation techniques for quickly visualizing partial results that are refined incrementally over time.
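As a rough illustration of what incrementally refined partial results can look like, the sketch below estimates a mean over a randomly ordered dataset and reports a running estimate with an approximate 95% confidence interval after each batch. It is not Vizdom's implementation: the function name `progressive_mean`, the batch size, and the normal-approximation interval are all assumptions made for this example.

```python
import math
import random


def progressive_mean(values, batch_size=1000, seed=0):
    """Yield (rows_seen, running_mean, ci_half_width) after each batch.

    Hypothetical helper for illustration only. The interval uses a
    normal approximation and ignores finite-population effects.
    """
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)  # random order, so every prefix is a uniform sample

    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's algorithm)
    for start in range(0, len(order), batch_size):
        for x in order[start:start + batch_size]:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
        variance = m2 / (n - 1) if n > 1 else 0.0
        half_width = 1.96 * math.sqrt(variance / n)
        yield n, mean, half_width


if __name__ == "__main__":
    data = range(10_000)
    for rows, estimate, half_width in progressive_mean(data):
        # A frontend could redraw its chart here with the current estimate.
        print(f"{rows:>6} rows: mean ~ {estimate:.1f} +/- {half_width:.1f}")
```

Each yielded tuple is a partial result that a visualization could display immediately; the interval narrows as more rows are processed, and the final estimate equals the exact mean once all rows have been seen.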