Bibliographic Details
Title: |
Memory-Driven Data-Flow Optimization for Neural Processing Accelerators |
Authors: |
Nie, Qi |
Contributors: |
Malik, Sharad, Electrical Engineering Department |
Publication Information: |
Princeton University |
Publication Year: |
2020 |
Collection: |
DataSpace at Princeton University |
Subject Terms: |
Data flow optimization, Design space pruning, Memory constraints, Memory utilization, Neural network accelerators, Power efficiency, Electrical engineering |
Description: |
Neural processing applications are widely used in fields such as vision, speech recognition, and language processing to realize artificial intelligence, but they are computationally demanding given the large scale of their data. The demand for high-performance, low-power computing for these applications has led to their implementation on specialized accelerators. In both time and energy, memory access is much more expensive than computation. Thus, in accelerators, computation needs to exploit locality in SRAM to reduce DRAM requests, as well as locality in datapath registers to reduce SRAM accesses. Neural applications generally exhibit highly interleaved data reuse, which results in a working set that scales with the input size. However, the limited size and bandwidth of on-chip SRAM and registers are insufficient to provide local data storage and movement for the working set in large-scale computation. This results in significant, expensive data movement across the memory hierarchy. To address this challenge, in this dissertation, I first define an optimization problem for minimizing data movement for a given application and architecture. The degrees of freedom (variables) in this optimization problem are loop ordering, loop tiling, and memory partitioning. The solution to this problem provides optimal values of these variables to maximize data reuse at each level of memory. The design space for optimizing local memory utilization is large and challenging to explore completely. For each point in the design space, I first provide analytical models to estimate its cost, i.e., the number of data movements across memory levels. I then investigate multiple techniques to prune the design space to find the optimal design efficiently. Finally, I extend this work to sparse scenarios, where data distribution and reuse patterns are irregular. 
In summary, this thesis demonstrates the necessity of optimizing dataflow across the memory hierarchy to reduce expensive remote memory accesses and thus ... |
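The optimization problem outlined in the abstract (choose tile sizes so that data movement between memory levels is minimized under a capacity limit) can be illustrated with a toy example. The sketch below is not the dissertation's model. It assumes a dense matrix multiplication C[M,N] = A[M,K] x B[K,N], a two-level hierarchy (DRAM plus one on-chip buffer), and a fixed output-stationary loop order in which each C tile stays on chip while the K dimension is traversed. The names `dram_traffic`, `best_tiling`, and `capacity` are hypothetical and chosen only for illustration.

```python
from itertools import product


def divisors(n):
    """Return all positive divisors of n (candidate tile sizes)."""
    return [d for d in range(1, n + 1) if n % d == 0]


def dram_traffic(M, N, K, Tm, Tn):
    """Estimate DRAM element transfers for an output-stationary tiled matmul.

    Assumed model (illustrative only):
      - A is re-read once per column tile of C  -> M*K * (N / Tn) loads
      - B is re-read once per row tile of C     -> K*N * (M / Tm) loads
      - each element of C is written back once  -> M*N stores
    """
    a_loads = M * K * (N // Tn)
    b_loads = K * N * (M // Tm)
    c_stores = M * N
    return a_loads + b_loads + c_stores


def best_tiling(M, N, K, capacity):
    """Exhaustively search tile sizes that fit in the on-chip buffer.

    A tiling (Tm, Tn, Tk) is feasible when one tile each of A, B, and C
    fits in `capacity` elements. Returns (cost, (Tm, Tn, Tk)) for the
    cheapest feasible tiling, or None if nothing fits.
    """
    best = None
    for Tm, Tn, Tk in product(divisors(M), divisors(N), divisors(K)):
        footprint = Tm * Tk + Tk * Tn + Tm * Tn
        if footprint > capacity:
            continue  # skip tilings that exceed the buffer
        cost = dram_traffic(M, N, K, Tm, Tn)
        if best is None or cost < best[0]:
            best = (cost, (Tm, Tn, Tk))
    return best


if __name__ == "__main__":
    print(best_tiling(4, 4, 4, capacity=12))
```

In this simplified model, a larger buffer admits larger C tiles, which cuts the number of times A and B are re-read. A fuller treatment would also vary the loop order and the split of the buffer among operands, and would replace the exhaustive search with pruning, as the abstract describes.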
Document Type: |
doctoral or postdoctoral thesis |
Language: |
English |
Relation: |
The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu; http://arks.princeton.edu/ark:/88435/dsp01cf95jf42w |
Availability: |
http://arks.princeton.edu/ark:/88435/dsp01cf95jf42w |
Accession Number: |
edsbas.8E7AA6C6 |
Database: |
BASE |