swFLOW: A large-scale distributed framework for deep learning on Sunway TaihuLight supercomputer

Bibliographic Details
Title: swFLOW: A large-scale distributed framework for deep learning on Sunway TaihuLight supercomputer
Authors: Mingfan Li, Qian Xiao, Junshi Chen, Rongfen Lin, Fei Wang, Guang R. Gao, Han Lin, Jose Monsalve Diaz, Hong An
Source: Information Sciences. 570:831-847
Publication Information: Elsevier BV, 2021.
Publication Year: 2021
Subject Terms: Information Systems and Management, Speedup, Computer science, Distributed computing, 02 engineering and technology, computer.software_genre, Convolutional neural network, Bottleneck, Theoretical Computer Science, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Sunway TaihuLight, business.industry, Deep learning, 05 social sciences, 050301 education, Supercomputer, Computer Science Applications, Software framework, Stochastic gradient descent, Control and Systems Engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, 0503 education, computer, Software
Description: Deep learning technology is widely used in many modern fields, and a number of models and software frameworks have been proposed. However, it is still very difficult to process deep learning tasks efficiently on traditional high performance computing (HPC) systems. In this paper, we propose swFLOW: a large-scale distributed framework for deep learning on Sunway TaihuLight. Based on the performance analysis results of a convolutional neural network (CNN), we optimize the convolutional layer and obtain a 10.42× speedup compared to the original version. For distributed training, we use the elastic averaging stochastic gradient descent (EASGD) algorithm to reduce communication. On 512 processes, we achieve a parallel efficiency of 81.01% with communication period τ = 8. In particular, a decentralized implementation of the distributed swFLOW system is presented to alleviate the bottleneck of the central server. Using the distributed swFLOW system, we can scale the batch size up to 4096 among 1024 concurrent processes for a cancerous region detection algorithm. The successful application on swFLOW reveals the great opportunity for combining deep learning and HPC systems.
ISSN: 0020-0255
DOI: 10.1016/j.ins.2020.12.079
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_________::85dd0f956723ecae0eb701eb60f8a1d3
https://doi.org/10.1016/j.ins.2020.12.079
Rights: CLOSED
Accession Number: edsair.doi...........85dd0f956723ecae0eb701eb60f8a1d3
Database: OpenAIRE
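
Note on EASGD: the description names elastic averaging SGD with a communication period τ but does not spell out the update. The following is a minimal sketch of the synchronous EASGD rule as it is generally described in the literature, run on a toy one-dimensional quadratic. It is not the swFLOW implementation: the function name `easgd`, the objective, and all parameter values other than τ = 8 are assumptions chosen for illustration. The point it shows is that workers exchange with the center variable only on every τ-th step, which is how the communication volume is reduced.

```python
import random


def easgd(num_workers=4, tau=8, steps=400, lr=0.05, alpha=0.1, seed=0):
    """Toy synchronous EASGD on f(x) = 0.5 * (x - 3)^2 with noisy gradients.

    All names and defaults here are illustrative assumptions, not values
    taken from the swFLOW paper (except the communication period tau = 8).
    """
    rng = random.Random(seed)
    target = 3.0
    # Each worker holds its own copy of the parameter; the center is shared.
    workers = [rng.uniform(-5.0, 5.0) for _ in range(num_workers)]
    center = 0.0
    comm_rounds = 0

    for t in range(steps):
        if t % tau == 0:
            # Elastic exchange: workers are pulled toward the center and the
            # center is pulled toward the workers, using the same differences.
            diffs = [x - center for x in workers]
            workers = [x - alpha * d for x, d in zip(workers, diffs)]
            center += alpha * sum(diffs)
            comm_rounds += 1
        # Local SGD step on every worker; no communication is needed here.
        workers = [
            x - lr * ((x - target) + rng.gauss(0.0, 0.1)) for x in workers
        ]

    return center, workers, comm_rounds


if __name__ == "__main__":
    center, workers, comm_rounds = easgd()
    print(f"center = {center:.3f}, exchanges = {comm_rounds} of 400 steps")
```

With τ = 8 the sketch performs an exchange on one step in eight, and the remaining steps are purely local. In this centralized form every exchange goes through the single center variable, which is the kind of central-server bottleneck the description says the decentralized implementation is meant to alleviate.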