KiloCore: A 32-nm 1000-Processor Computational Array
Title: | KiloCore: A 32-nm 1000-Processor Computational Array |
---|---|
Authors: | Jon J. Pimentel, Emmanuel Adeagbo, Brent Bohnenstiehl, Anh T. Tran, Aaron Stillmaker, Bin Liu, Timothy Andreas, Bevan M. Baas |
Source: | IEEE Journal of Solid-State Circuits, vol. 52, iss. 4 |
Publication info: | Institute of Electrical and Electronics Engineers (IEEE), 2017. |
Publication year: | 2017 |
Subject terms: | Electrical & Electronic Engineering, Instructions per cycle, Computer science, Clock rate, Throughput, 02 engineering and technology, Processor array, Affordable and Clean Energy, Globally asynchronous locally synchronous, many core, Hardware_INTEGRATEDCIRCUITS, 0202 electrical engineering, electronic engineering, information engineering, Other Technology, Independent clock, parallel processor, Electrical and Electronic Engineering, business.industry, multicore, 020208 electrical & electronic engineering, Electrical engineering, Condensed Matter Physics, 020202 computer hardware & architecture, Memory management, CMOS, business, NoC |
Description: | A processor array containing 1000 independent processors and 12 memory modules was fabricated in 32-nm partially depleted silicon on insulator CMOS. The programmable processors occupy 0.055 mm² each, contain no algorithm-specific hardware, and operate up to an average maximum clock frequency of 1.78 GHz at 1.1 V. At 0.9 V, processors operating at an average of 1.24 GHz dissipate 17 mW while issuing one instruction per cycle. At 0.56 V, processors operating at an average of 115 MHz dissipate 0.61 mW while issuing one instruction per cycle, resulting in an energy consumption of 5.3 pJ/instruction. On-die communication is performed by complementary circuit and packet-based networks that yield a total array bisection bandwidth of 4.2 Tb/s. Independent memory modules handle data and instructions and operate up to an average maximum clock frequency of 1.77 GHz at 1.1 V. All processors, their packet routers, and the memory modules contain unconstrained clock oscillators within independent clock domains that adapt to large supply voltage noise. Compared with a variety of Intel i7s and Nvidia GPUs, the KiloCore at 1.1 V has geometric mean improvements of 4.3× higher throughput per area and 9.4× higher energy efficiency for AES encryption, 4095-b low-density parity-check decoding, 4096-point complex fast Fourier transform, and 100-B record sorting applications. |
ISSN: | 1558-173X 0018-9200 |
DOI: | 10.1109/jssc.2016.2638459 |
Access URL: | https://explore.openaire.eu/search/publication?articleId=doi_dedup___::f0a8daa1aa55988b862b79367fb8e8b0 https://doi.org/10.1109/jssc.2016.2638459 |
Rights: | OPEN |
Accession number: | edsair.doi.dedup.....f0a8daa1aa55988b862b79367fb8e8b0 |
Database: | OpenAIRE |
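
The energy-per-instruction figure quoted in the description can be checked from the stated power and clock rate. The sketch below is illustrative only and is not taken from the paper: it assumes energy per instruction is simply average power divided by instruction rate, with one instruction issued per cycle as the description states. The function name `energy_per_instruction_pj` is a hypothetical helper. The 0.9 V result is a derived value under the same assumption, not a number reported in the record.

```python
def energy_per_instruction_pj(power_mw: float, freq_mhz: float, ipc: float = 1.0) -> float:
    """Return energy per instruction in picojoules.

    Assumes energy = power / (clock frequency * instructions per cycle).
    This is a back-of-the-envelope model, not the authors' measurement method.
    """
    instructions_per_second = freq_mhz * 1e6 * ipc
    joules_per_instruction = (power_mw * 1e-3) / instructions_per_second
    return joules_per_instruction * 1e12  # J -> pJ


# Operating points quoted in the description (per-processor averages).
low = energy_per_instruction_pj(power_mw=0.61, freq_mhz=115.0)   # 0.56 V point
mid = energy_per_instruction_pj(power_mw=17.0, freq_mhz=1240.0)  # 0.9 V point

print(f"0.56 V: {low:.1f} pJ/instruction")  # matches the quoted 5.3 pJ/instruction
print(f"0.9 V:  {mid:.1f} pJ/instruction")  # derived here, not stated in the record
```

Under this simple model the 0.56 V operating point reproduces the quoted 5.3 pJ/instruction, which suggests the description's figure is the ratio of the two stated averages.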