Involving CPUs into Multi-GPU Deep Learning

Bibliographic Details
Title: Involving CPUs into Multi-GPU Deep Learning
Authors: Tung D. Le, Haruki Imai, Taro Sekiyama, Yasushi Negishi, Kiyokuni Kawachiya
Source: ICPE
Publication Information: ACM, 2018.
Publication Year: 2018
Subject Terms: Speedup, Artificial neural network, Data parallelism, Computer science, Deep learning, Training, CPU time, Parallel computing, Scalability, Artificial intelligence, Host (network)
Description: The most important part of deep learning, training the neural network, often requires the processing of a large amount of data and can take days to complete. Data parallelism is widely used for training deep neural networks on multiple GPUs in a single machine thanks to its simplicity. However, its scalability is bounded by the data transfers needed mainly for exchanging and accumulating gradients among the GPUs. In this paper, we present a novel approach to data parallel training called CPU-GPU data parallel (CGDP) training that utilizes free CPU time on the host to speed up training on the GPUs. We also present a cost model for analyzing and comparing the performance of typical data parallel training and CPU-GPU data parallel training. Using the cost model, we formally show why our approach is better than the typical one and clarify the remaining issues. Finally, we explain how we optimized CPU-GPU data parallel training by introducing chunks of layers and present a runtime algorithm that automatically finds a good configuration for the training. The algorithm is effective for very deep neural networks, which are the current trend in deep learning. Experimental results showed that we achieved speedups of 1.21, 1.04, 1.21, and 1.07 for four state-of-the-art neural networks: AlexNet, GoogLeNet-v1, VGGNet-16, and ResNet-152, respectively. Weak scaling efficiency greater than 90% was achieved for all networks across four GPUs.
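To make the abstract's idea concrete, the following is a minimal, purely illustrative Python sketch of the overlap the paper describes: while GPU replicas produce gradients chunk of layers by chunk of layers, otherwise-idle CPU threads accumulate each finished chunk across replicas, so accumulation overlaps with backpropagation. The paper does not provide code; all names, sizes, and the thread/queue structure here are hypothetical stand-ins (numpy arrays for gradients, Python threads for GPU replicas), not the authors' implementation.

```python
# Hypothetical sketch of CPU-side gradient accumulation overlapping GPU work.
import queue
import threading
import numpy as np

NUM_REPLICAS = 4        # stands in for 4 GPUs
NUM_CHUNKS = 8          # layers grouped into chunks, as the paper proposes
CHUNK_SIZE = 100_000    # parameters per chunk (hypothetical size)

ready = queue.Queue()   # (chunk_id, gradient) items handed off by the "GPU" threads
accumulated = [np.zeros(CHUNK_SIZE, dtype=np.float32) for _ in range(NUM_CHUNKS)]
counts = [0] * NUM_CHUNKS

def gpu_backward(replica_id: int) -> None:
    """Simulate one replica's backward pass, emitting a gradient per chunk."""
    rng = np.random.default_rng(replica_id)
    for chunk_id in range(NUM_CHUNKS):
        grad = rng.standard_normal(CHUNK_SIZE).astype(np.float32)
        ready.put((chunk_id, grad))  # hand the finished chunk to the CPU side

def cpu_accumulator() -> None:
    """Accumulate and average gradients on the host as chunks arrive."""
    finished = 0
    while finished < NUM_CHUNKS:
        chunk_id, grad = ready.get()
        accumulated[chunk_id] += grad
        counts[chunk_id] += 1
        if counts[chunk_id] == NUM_REPLICAS:
            accumulated[chunk_id] /= NUM_REPLICAS  # averaged gradient for this chunk
            finished += 1

workers = [threading.Thread(target=gpu_backward, args=(r,)) for r in range(NUM_REPLICAS)]
accumulator = threading.Thread(target=cpu_accumulator)
accumulator.start()
for w in workers:
    w.start()
for w in workers:
    w.join()
accumulator.join()
print(f"averaged {NUM_CHUNKS} gradient chunks across {NUM_REPLICAS} replicas on the CPU")
```

In a real system the hand-off would happen via GPU-to-host transfers on separate streams rather than an in-process queue, but the scheduling idea is the same: the host starts accumulating a chunk as soon as every replica has produced it, instead of waiting for the full backward pass to finish.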
DOI: 10.1145/3184407.3184424
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_________::9a7082ab254a3eb84d20bbf1f2dc095d
https://doi.org/10.1145/3184407.3184424
Rights: CLOSED
Accession Number: edsair.doi...........9a7082ab254a3eb84d20bbf1f2dc095d
Database: OpenAIRE