Dynamic Job Ordering and Slot Configurations for MapReduce Workloads
In the dynamic MapReduce (MR) process, in addition to the three concepts presented in the paper, we introduce a clustering approach alongside multi-data-center processing. Because the data is split and processed across multiple data centers, we first group similar data into clusters using the k-means algorithm; clustering the data allows it to be processed in a shorter execution time. After preprocessing, we split the input into multiple files and apply the clustering process, using the standard k-means algorithm. We improve the performance of a MapReduce cluster by optimizing slot utilization, primarily from two perspectives. First, we classify slots into two types: busy slots (i.e., with running tasks) and idle slots (i.e., with no running tasks). Given the total number of map and reduce slots configured by users, one optimization approach (macro-level optimization) is to improve slot utilization by maximizing the number of busy slots and reducing the number of idle slots. Second, it is worth noting that not every busy slot is utilized efficiently. Thus, our second optimization approach (micro-level optimization) is to improve the utilization efficiency of busy slots after the macro-level optimization. In particular, we identify two main affecting factors, including speculative tasks. Based on these, we propose DynamicMR, a dynamic utilization optimization framework for MapReduce, to improve the performance of a shared Hadoop cluster under fair scheduling between users.
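The clustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `kmeans` and `split_by_cluster`, the use of numeric feature tuples, the choice of the first k points as initial centroids, and the iteration cap are all assumptions made for illustration. It shows plain k-means (Lloyd's algorithm) grouping similar records so that each group could be written to its own file before processing.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means on numeric tuples.

    Assumption: the first k points seed the centroids, which keeps
    the sketch deterministic; the paper does not state an initialization.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Converged: no centroid moved.
        centroids = new_centroids
    return labels, centroids


def split_by_cluster(records, labels, k):
    """Group records by cluster label (hypothetical helper);
    each group could become one input file."""
    groups = [[] for _ in range(k)]
    for record, label in zip(records, labels):
        groups[label].append(record)
    return groups


points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 10.0)]
labels, centroids = kmeans(points, k=2)
groups = split_by_cluster(points, labels, k=2)
```

In this sketch each resulting group holds mutually similar records, so the groups could be dispatched to separate data centers; how features are derived from real input data and how k is chosen are left open here.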
Published by: Prathamesh Chaudhari, Gaurav S. Salve, Nilesh Ghadge, Harish Barapatre
Author: Prathamesh Chaudhari
Paper ID: V3I2-1361
Paper Status: published
Published: March 28, 2017