Material Type | Thesis/Dissertation |
---|---|
Title/Author | Design of a Scalable, Configurable, and Cluster-based Hierarchical Hardware Accelerator for a Cortically Inspired Algorithm and Recurrent Neural Networks. |
Personal Author | Dey, Sumon.
Corporate Author | North Carolina State University.
Publication | [S.l.]: North Carolina State University, 2019.
Publication | Ann Arbor: ProQuest Dissertations & Theses, 2019.
Physical Description | 128 p.
Source Record | Dissertations Abstracts International 81-05B.
ISBN | 9781392766576 |
Dissertation Note | Thesis (Ph.D.)--North Carolina State University, 2019.
General Note |
Source: Dissertations Abstracts International, Volume: 81-05, Section: B.
Advisor: Laber, Eric |
Use Restrictions | This item must not be sold to any third party vendors.
Abstract | Machine learning algorithms based on deep learning have achieved enormous success, delivering high performance in applications ranging from object recognition to defeating human experts at complex games. Deep learning can further broaden this horizon by processing massive amounts of multimodal natural data (video, audio) and learning useful joint representations for applications. In addition to deep learning, cortical learning algorithms can learn representations in a way much closer to how the human brain works. Unlike the dense data used in deep learning, cortical learning uses binary data to model sparse distributed memory for different representations. Both families of algorithms are modeled in hardware as artificial neural networks. However, implementing such networks in hardware depends on the throughput and memory bandwidth of the hardware architecture, which must handle massive amounts of data. To advance research on these rapidly evolving techniques, there is a direct need to design and implement specialized hardware that accelerates these algorithms. In this work, a scalable, configurable, and cluster-based hierarchical hardware accelerator is designed and implemented as an application-specific integrated circuit (ASIC) for Sparsey, a cortical learning algorithm. In addition, an application-specific instruction set processor (ASIP) is designed and implemented for recurrent neural networks (RNNs). A distributed on-chip memory organization is designed and implemented in the ASIC to improve memory bandwidth and accelerate read and write operations on synaptic weight matrices. Bit-level data processing from memory, bit-level storage, and special multiply-accumulate hardware are implemented for multiply-accumulate operations. Fixed-point arithmetic and fixed-point storage are also adopted in the ASIC implementation. 
At 16 nm, the Sparsey ASIC achieved an overall speedup of 25.24x, a 353.12x reduction in energy per frame, and a 1.43x reduction in silicon area compared with a GPU. In the ASIP, emerging 3D-stacked memory is used to increase off-chip memory bandwidth, and the on-chip memory is sized to improve data locality inside the processor. A set of short instructions is also implemented in the ASIP architecture by decomposing complex, time-consuming special operations into high-level functional blocks, and look-up-table-based special-function operations further improve its performance. State-of-the-art mixed-precision training and inference are also adopted in this architecture. A high-level programming environment is developed to generate Very Long Instruction Word (VLIW) instructions for the ASIP to process variants of RNNs. At 16 nm, the ASIP achieved 1.5x - 5.6x faster processing, a 4.3x - 40.8x reduction in energy per sequence, and a 1.5x area benefit over a GPU. |
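As an illustrative aside (not code from the dissertation), the abstract's combination of fixed-point storage with bit-level multiply-accumulate over binary cortical-learning data can be sketched as follows. The Q4.12 format, bit widths, and function names here are assumptions chosen for illustration only; they show why binary activations let a MAC unit replace multipliers with conditional adds.

```python
# Hypothetical sketch: fixed-point MAC with binary (0/1) activations.
# Q4.12 format is an assumption, not the dissertation's actual format.

FRAC_BITS = 12          # assumed fractional bits (Q4.12)
SCALE = 1 << FRAC_BITS

def to_fixed(x: float) -> int:
    """Quantize a real-valued weight to signed fixed point."""
    return int(round(x * SCALE))

def to_float(q: int) -> float:
    """Convert a fixed-point value back to a float."""
    return q / SCALE

def mac_binary(weights, activations):
    """MAC where activations are single bits: each multiply reduces
    to a conditional add, which is what makes the hardware cheap."""
    acc = 0
    for w, a in zip(weights, activations):
        if a:               # bit-level gating instead of a multiplier
            acc += w
    return acc

weights_q = [to_fixed(w) for w in [0.25, -0.5, 0.75, 0.125]]
acts = [1, 0, 1, 1]
result = to_float(mac_binary(weights_q, acts))
# 0.25 + 0.75 + 0.125 = 1.125
```

Because the accumulator only ever adds pre-quantized integers, no rounding occurs inside the loop; precision is fixed once at quantization time, mirroring the fixed-point storage described in the abstract.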
Subject | Computer engineering. Artificial intelligence.
Language | English
Link |
: The full text of this material is provided by the Korea Education and Research Information Service (KERIS).