This paper is published in Volume-9, Issue-2, 2023
Area
Computational Science: Big Data & Distributed Systems
Author
Rajkamal Mahamuni Natarajan, Raj Pravinbhai Manvar, Thai Bui
Org/Univ
Indeed Inc, Austin, TX, USA
Keywords
Data Engineering, Autoscaling, Team Tenancy, Cost Attribution, Big Data
Citations
IEEE
Rajkamal Mahamuni Natarajan, Raj Pravinbhai Manvar, Thai Bui. Advanced autoscaling and team tenancy for Hadoop job clusters, International Journal of Advance Research, Ideas and Innovations in Technology, www.IJARIIT.com.
APA
Rajkamal Mahamuni Natarajan, Raj Pravinbhai Manvar, Thai Bui (2023). Advanced autoscaling and team tenancy for Hadoop job clusters. International Journal of Advance Research, Ideas and Innovations in Technology, 9(2) www.IJARIIT.com.
MLA
Rajkamal Mahamuni Natarajan, Raj Pravinbhai Manvar, Thai Bui. "Advanced autoscaling and team tenancy for Hadoop job clusters." International Journal of Advance Research, Ideas and Innovations in Technology 9.2 (2023). www.IJARIIT.com.
Abstract
Hadoop provides a powerful framework for processing large datasets across distributed nodes. A Hadoop data lake empowers users in different roles across an organization, such as engineers, data scientists, and analysts. Data slicing and dicing and complex computations can be expressed in a query language, with no need for expertise in MapReduce or Spark jobs. Once the complex logic is expressed as a query and the query is scheduled to run on a regular cadence, the cluster can be auto-scaled based on the applications running at a given time, rather than being run as a large static cluster. In a multi-tenant cluster where autoscaling is performed, resources should be isolated and dedicated to each tenant to achieve multi-tenancy. Dedicated resources make it possible to attribute costs to a specific team or tag, and they also solve the noisy neighbor problem. This paper details the architecture, algorithm, and framework that allow multi-tenant job clusters to achieve team tenancy and auto-scale seamlessly.
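The idea of sizing dedicated per-team capacity from the applications running at a given time, and attributing cost by team tag, can be illustrated with a minimal sketch. This is not the paper's algorithm, which the abstract does not spell out. The `App` record, the memory-based sizing rule, the node size, the node limits, and the hourly price are all hypothetical assumptions chosen for illustration.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    """A running application (hypothetical record, not from the paper)."""
    team: str        # team tag assumed to drive tenancy and cost attribution
    memory_gb: int   # memory currently requested by the application


def desired_nodes_per_team(apps, node_memory_gb=64, min_nodes=0, max_nodes=20):
    """Size a dedicated node pool for each team from its current demand.

    Assumption: demand is measured in memory only and every node offers
    node_memory_gb of usable memory.
    """
    demand = defaultdict(int)
    for app in apps:
        demand[app.team] += app.memory_gb

    plan = {}
    for team, total_gb in demand.items():
        needed = math.ceil(total_gb / node_memory_gb)
        # Clamp to the allowed pool size so a single team cannot grow unbounded.
        plan[team] = max(min_nodes, min(max_nodes, needed))
    return plan


def hourly_cost_per_team(plan, node_price_per_hour=1.0):
    """Attribute cost to each team from the nodes dedicated to it."""
    return {team: nodes * node_price_per_hour for team, nodes in plan.items()}


if __name__ == "__main__":
    running = [App("search", 100), App("search", 60), App("ads", 10)]
    plan = desired_nodes_per_team(running)
    print(plan)
    print(hourly_cost_per_team(plan, node_price_per_hour=2.0))
```

Because each team's nodes are counted separately, the sketch never lets one team's demand consume capacity sized for another, which is the isolation property the abstract ties to both cost attribution and the noisy neighbor problem.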