Thomas M. Siebel Professor of Computer Science
K2I Distinguished Lecture - Dr. Michael Franklin "Making Sense of Big Data with the Berkeley Data Analytics Stack"
Thursday, October 9, 2014
to 6:00 PM
McMurtry Auditorium, Duncan Hall
6100 Main St
Houston, Texas, USA
The Berkeley AMPLab was founded to create a new approach to data analytics systems. Launching in early 2011, the project set out to break down the technical and intellectual barriers that had arisen during decades of evolutionary development. The vision of the lab is to seamlessly integrate the three main resources available for making sense of data at scale: Algorithms (such as machine learning and statistical techniques), Machines (in the form of scalable clusters and elastic cloud computing), and People (both individually as analysts and in crowds). The lab is realizing its ideas through the development of a freely available open source software stack called BDAS: the Berkeley Data Analytics Stack. In the nearly four years the lab has been in operation, we've released major components of BDAS. Several of these components have gained significant traction in industry and elsewhere: the Mesos cluster resource manager, the Spark in-memory computation framework, and the Shark query processing system. BDAS features prominently in many industry discussions of the future of the Big Data analytics ecosystem, a rare degree of impact for an ongoing academic project.
Given this initial success, the lab is continuing on its research path, moving "up the stack" to better integrate and support advanced analytics and to make people a full-fledged resource for making sense of data. In this talk, I'll first outline the motivation and insights behind our research approach and describe how we have organized to address the cross-disciplinary nature of Big Data challenges. I will then describe the current state of BDAS with an emphasis on our newest efforts, including some or all of: the GraphX graph processing system, the MLBase machine learning platform, and the SampleClean framework for combining sampling and hybrid human/computer data cleaning. Finally, I will present our current views of how all the pieces will fit together to form a system that can adaptively bring the right resources to bear on a given data-driven question to meet time, cost, and quality requirements throughout the analytics lifecycle.
Biography of Michael Franklin:
Michael Franklin is the Thomas M. Siebel Professor of Computer Science and Chair of the Computer Science Division at the University of California, Berkeley. He has over 30 years of experience in the database, data analytics, and data management fields as a researcher, lab director, faculty member, entrepreneur, and software developer. Prof. Franklin is also the Director of the Algorithms, Machines, and People Laboratory (AMPLab) at UC Berkeley. The AMPLab currently works with 23 industrial sponsors, including founding sponsors Amazon Web Services, Google, and SAP, and received a National Science Foundation CISE "Expeditions in Computing" award in March 2012. Prof. Franklin is a co-PI and Executive Committee member for the Berkeley Institute for Data Science, part of a multi-campus initiative to advance Data Science Environments. He is an ACM Fellow, a two-time winner of the ACM SIGMOD "Test of Time" award, and a recipient of the Outstanding Advisor Award from the Computer Science Graduate Student Association at Berkeley.