MEWSE - Multi Engine Workflow Submission and Execution on Apache YARN

Authors

Sundaravarathan, Kiran

Abstract

In the era of Big Data, designing a workflow to gain insights from vast amounts of data has become more complex. Several frameworks process batch or streaming data individually, but coordinating jobs across engines within a single workflow incurs performance penalties. Current workflow systems typically run on only one engine and do not offer the versatility that today’s workflows require. Submitting jobs to different engines manually is not only time-consuming but also requires expertise in each engine. In this thesis, we address these issues by proposing MEWSE, a system for Multi Engine Workflow Submission and Execution on Apache YARN. MEWSE is also designed with plug-and-play functionality to allow the inclusion of new engines. It has been tested on Amazon EC2 with a sample workflow that uses Hadoop, Mahout, Java, and several scripts to process the data.

Description

Thesis (Master, Computing) -- Queen's University, 2015-09-14 18:00:28.306

Keywords

Big Data, Analytic systems, Workflow Submitter, Apache YARN
