Domain-specific language for "pipe-and-filter" based chain processing of data

With the abundance of open data, data science is becoming increasingly mainstream, and data workers are trying to exploit the opportunities this creates. However, most data transformation approaches rely on general-purpose scripting or programming languages, which require data workers to acquire extensive programming skills. The few higher-level visual approaches, on the other hand, are not powerful enough to fully support user needs. The task of this thesis is to design a domain-specific language (DSL) that abstracts away the lower-level details of programming and scripting languages while supporting the needs of data workers as fully as possible, including allowing them to write short custom programs. The task also involves developing a compiler that translates the high-level language into lower-level program code, which can then be executed to transform the data. The DSL should follow a "pipe-and-filter" pattern, in which the result of any operation can be passed on to the next operation.
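To make the "pipe-and-filter" idea concrete, the following is a minimal Python sketch of the kind of lower-level code such a compiler might target. It is an illustration only, not a proposed design: the helper names (pipeline, drop_missing, derive, keep) and the sample columns are hypothetical and do not come from the thesis description.

```python
from functools import reduce
from typing import Callable

# Assumption: a dataset is a list of rows, each row a plain dict.
Row = dict
Filter = Callable[[list[Row]], list[Row]]


def pipeline(*stages: Filter) -> Filter:
    """Compose filters left to right: each stage receives the previous result."""
    def run(rows: list[Row]) -> list[Row]:
        return reduce(lambda data, stage: stage(data), stages, rows)
    return run


def drop_missing(column: str) -> Filter:
    """Filter that removes rows where `column` is absent or None."""
    def stage(rows: list[Row]) -> list[Row]:
        return [r for r in rows if r.get(column) is not None]
    return stage


def derive(column: str, fn: Callable[[Row], object]) -> Filter:
    """Filter that adds a computed column; `fn` plays the role of a short custom program."""
    def stage(rows: list[Row]) -> list[Row]:
        return [{**r, column: fn(r)} for r in rows]
    return stage


def keep(*columns: str) -> Filter:
    """Filter that projects each row onto the named columns."""
    def stage(rows: list[Row]) -> list[Row]:
        return [{c: r[c] for c in columns} for r in rows]
    return stage


# Hypothetical sample data and pipeline.
rows = [
    {"city": "A", "population": 1000, "area_km2": 10},
    {"city": "B", "population": None, "area_km2": 5},
]

clean = pipeline(
    drop_missing("population"),
    derive("density", lambda r: r["population"] / r["area_km2"]),
    keep("city", "density"),
)

result = clean(rows)
```

Each stage is an ordinary function from dataset to dataset, so stages can be reordered or reused freely. A DSL would presumably hide the closures and the composition step behind a more compact chain syntax.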

Published Aug. 17, 2016 13:12 - Last modified Aug. 17, 2016 13:12

Scope (credits)