Preprint
Article

Efficient Resource Management for Deep Learning Applications with Virtual Containers

This version is not peer-reviewed

Submitted: 01 November 2020

Posted: 02 November 2020

Abstract
The explosion of data has transformed the world, since far more information is available for collection and analysis than ever before. To extract valuable information from this data along different dimensions, various deep learning models have been developed in recent years. Although these models have demonstrated a strong capability to improve products and services in many applications, training them is still a time-consuming and resource-intensive process. Presently the cloud, one of the most powerful computing infrastructures, is used for this training. However, managing cloud computing resources and performing the training efficiently remains a challenge for current techniques. For example, general resource scheduling approaches, such as spread priority and balanced resource schedulers, do not work well with deep learning workloads. Moreover, the resource allocation problem on a cluster can be divided into two subproblems: (1) local resource optimization, which improves the resource configuration of a single machine; and (2) global resource optimization, which improves cluster-wide resource allocation. In this thesis, we propose two novel container schedulers, FlowCon and SpeCon, which are designed to address these two subproblems respectively, and specifically to optimize the performance of short-lived deep learning applications in the cloud. FlowCon focuses on the resource configuration of a single node in a cluster. We show that it improves the completion time and resource utilization of deep learning tasks, and that it reduces the completion time of a specific job by up to 42.06% without sacrificing the overall system time. SpeCon targets cluster-wide resource configuration: it speculatively migrates slow-growing models to release resources for fast-growing ones. Based on our experiments, SpeCon improves makespan by up to 24.7% compared to current approaches.
Subject: Computer Science and Mathematics - Software
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2024 MDPI (Basel, Switzerland) unless otherwise stated