Big-Data streaming applications are used in domains such as social media analysis, financial analysis, video annotation, surveillance, medical services and traffic prediction. These applications, which run on platforms ranging from mobile devices to servers, are characterized by a highly variable stochastic input data stream, stringent delay constraints and complex task graphs. Several software and hardware optimization techniques have been proposed to maximize the execution quality and throughput of these applications and to minimize their energy consumption on many-core platforms. An analysis of the existing techniques shows that most solutions fall into one of three categories: hardware-based task-specific optimizations, operating system scheduler optimizations, or load-shedding mechanisms at the application level. Each of these categories is limited in scope and can be blind to the nature of the program, the data being processed or the characteristics of the hardware. Because Big-Data streaming applications span a wide range of host hardware and content and face dynamically changing input streams, they expose the fragmentation of these optimization techniques and create a clear need for a better approach. In this thesis, I propose a suite of energy-efficient hardware-software co-design techniques to bridge the gap between modern Big-Data streaming applications and existing many-core platforms. I model the task graph of the class of applications considered here as a directed acyclic graph (DAG). First, at the application layer, I propose a unified DAG monitoring solution that processes the general DAG model of the application online and provides a set of relevant information that a connected scheduler leverages at run time. At the operating system layer, I propose three different online scheduling solutions for many-core platforms that leverage feedback from both the application and the hardware layers.
The first scheduler addresses the problem of minimizing the energy consumption and the deadline miss rate of multimedia applications. It uses the output of the DAG monitoring solution to adapt the mapping of multimedia application tasks to the hardware according to the detected performance and the targeted quality of service. The second and third schedulers address the problem of maximizing quality and throughput while minimizing the energy consumption of Big-Data stream mining applications, for a single stream and for multiple streams originating from different sources, respectively. To cope with dynamically changing Big-Data characteristics, these schedulers integrate machine-learning techniques that learn the environment dynamics and the application requirements and adapt the scheduling policy to the desired quality of service. I show that the proposed schedulers are able to scale the execution of data-mining applications to the system capability even in the presence of concept drift. Last, at the hardware layer, to address the limitations of existing system architectures and to further increase throughput, I propose a novel low-power many-core architecture for modern Big-Data stream mining applications. It integrates a novel flexible memory hierarchy that adapts to the dynamic, data-driven nature of the input data stream.
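To make the DAG task-graph model mentioned in the abstract more concrete, the following is a minimal illustrative sketch and not the thesis's monitoring solution or any of its schedulers. It assumes a small hypothetical streaming pipeline, made-up per-task execution-time estimates, and two hypothetical helper functions, `critical_path_ms` and `meets_deadline`. It shows one piece of information a DAG monitor could plausibly expose to a scheduler: the critical-path latency of the task graph compared against a delay constraint.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task graph: each task maps to the set of tasks it depends on.
deps = {
    "decode": set(),
    "extract_features": {"decode"},
    "classify_a": {"extract_features"},
    "classify_b": {"extract_features"},
    "fuse": {"classify_a", "classify_b"},
}

# Hypothetical per-task execution-time estimates, in milliseconds.
cost_ms = {
    "decode": 4.0,
    "extract_features": 6.0,
    "classify_a": 3.0,
    "classify_b": 5.0,
    "fuse": 1.0,
}


def critical_path_ms(deps, cost_ms):
    """Return the longest dependency-chain latency through the DAG.

    Assumes unlimited cores, so independent tasks run in parallel.
    TopologicalSorter raises CycleError if the graph is not acyclic.
    """
    finish = {}
    for task in TopologicalSorter(deps).static_order():
        # A task can start once all of its predecessors have finished.
        start = max((finish[d] for d in deps[task]), default=0.0)
        finish[task] = start + cost_ms[task]
    return max(finish.values())


def meets_deadline(deps, cost_ms, deadline_ms):
    """Check the critical-path latency against a delay constraint."""
    return critical_path_ms(deps, cost_ms) <= deadline_ms


print(critical_path_ms(deps, cost_ms))
print(meets_deadline(deps, cost_ms, deadline_ms=20.0))
```

With unlimited parallelism assumed, the critical path is a lower bound on the latency of one pass through the graph; a real scheduler on a many-core platform would also have to account for the number of available cores, energy, and run-time variation in the task costs.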
Andrea Rinaldo, Gianluca Botter