Subgraph Stationary Hardware-Software Inference Co-Design
| Title | Subgraph Stationary Hardware-Software Inference Co-Design |
|---|---|
| Authors | Behnam, Payman; Tong, Jianming; Khare, Alind; Chen, Yangyu; Pan, Yue; Gadikar, Pranav; Bambhaniya, Abhimanyu Rajeshkumar; Krishna, Tushar; Tumanov, Alexey |
| Publication Year | 2023 |
| Collection | Computer Science |
| Subject Terms | Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Machine Learning |
| Description | A growing number of applications depend on Machine Learning (ML) functionality and benefit from both higher-quality ML predictions and better timeliness (latency) at the same time. A growing body of research in the computer architecture, ML, and systems software literature focuses on reaching better latency-accuracy tradeoffs for ML models. Efforts include compression, quantization, pruning, early-exit models, and mixed DNN precision, as well as ML inference accelerator designs that minimize latency and energy while preserving delivered accuracy. All of them, however, yield improvements for a single static point in the latency-accuracy tradeoff space. We make a case for applications that operate in dynamically changing deployment scenarios, where no single static point is optimal. We draw on a recently proposed weight-shared SuperNet mechanism to enable serving a stream of queries that uses (activates) different SubNets within this weight-shared construct. This creates an opportunity to exploit the inherent temporal locality with our proposed SubGraph Stationary (SGS) optimization. We take a hardware-software co-design approach with a real implementation of SGS in SushiAccel and the implementation of a software scheduler, SushiSched, which controls which SubNets to serve and what to cache in real time. Combined, they are vertically integrated into SUSHI, an inference serving stack. For the stream of queries, SUSHI yields up to a 25% improvement in latency and a 0.98% increase in served accuracy. SUSHI can achieve up to 78.7% off-chip energy savings. Comment: 16 pages; MLSYS 2023 |
| Document Type | Working Paper |
| Access URL | http://arxiv.org/abs/2306.17266 |
| Accession Number | edsarx.2306.17266 |
| Database | arXiv |
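To illustrate the scheduling idea summarized in the description, the sketch below shows a SushiSched-style decision loop in Python. It is a hypothetical simplification, not the paper's algorithm: the `SubNet` definitions, the per-subgraph fetch cost, and the helpers `estimate_latency` and `pick_subnet` are all assumptions. The point it demonstrates is that caching shared subgraph weights (the temporal locality SGS exploits) can let a higher-accuracy SubNet fit within a later query's latency budget.

```python
# Hypothetical sketch of a SushiSched-style scheduling decision.
# Assumption: serving latency = base compute latency + fetch cost for
# every activated subgraph whose weights are not already cached on-chip.

from dataclasses import dataclass
from typing import List, Set


@dataclass
class SubNet:
    name: str
    accuracy: float          # estimated accuracy of this SubNet
    subgraphs: Set[str]      # weight-shared subgraphs this SubNet activates
    base_latency_ms: float   # latency when all its subgraph weights are cached


FETCH_COST_MS = 0.5          # assumed per-subgraph off-chip weight-fetch cost


def estimate_latency(subnet: SubNet, cached: Set[str]) -> float:
    """Base latency plus a fetch penalty for each uncached subgraph."""
    misses = subnet.subgraphs - cached
    return subnet.base_latency_ms + FETCH_COST_MS * len(misses)


def pick_subnet(subnets: List[SubNet], cached: Set[str], budget_ms: float) -> SubNet:
    """Serve the highest-accuracy SubNet that fits the query's latency budget;
    if none fits, fall back to the fastest one."""
    feasible = [s for s in subnets if estimate_latency(s, cached) <= budget_ms]
    if feasible:
        return max(feasible, key=lambda s: s.accuracy)
    return min(subnets, key=lambda s: estimate_latency(s, cached))


if __name__ == "__main__":
    # Toy weight-shared SuperNet with three nested SubNets (values are made up).
    supernet = [
        SubNet("small",  0.72, {"g0"},             2.0),
        SubNet("medium", 0.76, {"g0", "g1"},       3.5),
        SubNet("large",  0.79, {"g0", "g1", "g2"}, 5.0),
    ]
    cached: Set[str] = set()
    for budget in (3.0, 6.0, 4.0):                 # stream of per-query budgets
        chosen = pick_subnet(supernet, cached, budget)
        latency = estimate_latency(chosen, cached)
        cached |= chosen.subgraphs                 # warmed cache helps later queries
        print(f"budget={budget}ms -> serve {chosen.name} (est. {latency:.1f}ms)")
```

In this toy run, the second query's larger budget warms the cache with all three subgraphs, so the third query can be served by the medium SubNet within a 4 ms budget instead of falling back to the small one and paying off-chip fetch costs.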