Model-based Hadoop YARN Auto-Configuration
Apache Hadoop YARN is a widely used open-source software framework for processing big data on a cluster of commodity nodes. Because of its simplicity, cost efficiency, scalability, and fault tolerance, a wide variety of organizations and companies, such as Google, Yahoo!, Facebook, and Amazon, have used Hadoop YARN for both research and production. However, the performance of Hadoop YARN relies heavily on choosing appropriate configuration parameters. The traditional approach is trial and error, which is time-consuming and offers no guarantee of achieving the best performance.
The goal of this master's thesis is to investigate an appropriate hyperparameter optimization approach for ABS-YARN, an executable model written in the ABS (Abstract Behavioral Specification) language for modeling and simulating Hadoop YARN. The syntax of ABS is similar to Java, so it is easy to understand and learn. You are expected to implement such an approach for ABS-YARN and automate the entire optimization process.
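To make the idea of an automated optimization loop concrete, here is a minimal sketch in Python using plain random search. It is only an illustration under stated assumptions: the parameter names, the value ranges, and the function `simulated_job_time` are hypothetical placeholders. In the actual project the objective would come from running the ABS-YARN model, and the optimization algorithm would be chosen as part of the thesis work.

```python
import random

# Hypothetical search space. The parameter names and values are placeholders
# for illustration, not actual ABS-YARN or Hadoop YARN settings.
SEARCH_SPACE = {
    "container_memory_mb": [512, 1024, 2048, 4096],
    "containers_per_node": [1, 2, 4, 8],
}


def simulated_job_time(config):
    """Toy stand-in for one simulation run (invented cost function).

    In the real project this would invoke the ABS-YARN model with the
    given configuration and return a measured metric such as job
    completion time.
    """
    memory = config["container_memory_mb"]
    containers = config["containers_per_node"]
    # Assumed node memory budget of 8192 MB; exceeding it is penalized.
    oversubscription = max(0, memory * containers - 8192) / 100.0
    return 1000.0 / containers + 200000.0 / memory + oversubscription


def random_search(space, objective, trials, seed=0):
    """Sample `trials` random configurations and keep the lowest-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        score = objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


if __name__ == "__main__":
    best, score = random_search(SEARCH_SPACE, simulated_job_time, trials=20)
    print("best configuration found:", best)
    print("objective value:", score)
```

Random search is used here only because it is the simplest baseline that replaces manual trial and error. A model-based method such as Bayesian optimization would fit the same loop by changing how the next configuration is proposed.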
Through this project, you will become familiar with:
- Apache Hadoop YARN technology
- State-of-the-art ABS language for specifying distributed systems
- State-of-the-art hyperparameter optimization algorithms
You will work in an active research environment and gain practical experience in developing and implementing techniques and tools.
Jia-Chun Lin, Ingrid Chieh Yu, Einar Broch Johnsen, and Ming-Chang Lee, “ABS-YARN: A Formal Framework for Modeling Hadoop YARN Clusters,” Proceedings of the 19th International Conference on Fundamental Approaches to Software Engineering (FASE 2016). Springer, Berlin, Heidelberg, 2016, pp. 49–65.