Daegu Haany University Hyangsan Library


Design of a Scalable, Configurable, and Cluster-based Hierarchical Hardware Accelerator for a Cortically Inspired Algorithm and Recurrent Neural Networks


Detailed Information
Material Type: Thesis (Dissertation)
Title/Author: Design of a Scalable, Configurable, and Cluster-based Hierarchical Hardware Accelerator for a Cortically Inspired Algorithm and Recurrent Neural Networks.
Personal Author: Dey, Sumon.
Corporate Author: North Carolina State University.
Publication: [S.l.]: North Carolina State University, 2019.
Publication: Ann Arbor: ProQuest Dissertations & Theses, 2019.
Physical Description: 128 p.
Source Record: Dissertations Abstracts International 81-05B.
ISBN: 9781392766576
Dissertation Note: Thesis (Ph.D.)--North Carolina State University, 2019.
General Note: Source: Dissertations Abstracts International, Volume: 81-05, Section: B.
Advisor: Laber, Eric.
Restrictions on Use: This item must not be sold to any third party vendors.
Abstract: Machine learning algorithms based on deep learning have met with enormous success, achieving high performance in applications ranging from object recognition to defeating human experts in complex games. Deep learning can broaden this horizon further by processing massive amounts of multimodal natural data (video, audio) and learning useful joint representations across applications. In addition to deep learning, cortical learning algorithms can learn representations in a way much closer to how the human brain works. Unlike deep learning, which operates on dense data, cortical learning uses binary data to model sparse distributed memory for different representations. Both families of algorithms are modeled as artificial neural networks for hardware implementation. However, implementing such networks in hardware depends on the throughput and memory bandwidth of the hardware architecture, which must handle massive amounts of data. To advance research on these rapidly evolving techniques, there is a direct need for the design and implementation of specialized hardware to accelerate these algorithms.

In this work, a scalable, configurable, and cluster-based hierarchical hardware accelerator is designed and implemented as an application-specific integrated circuit (ASIC) for Sparsey, a cortical learning algorithm, and an application-specific instruction set processor (ASIP) is designed and implemented for recurrent neural networks (RNNs). In the ASIC, a distributed on-chip memory organization improves memory bandwidth and accelerates read and write operations on synaptic weight matrices. Bit-level data handling from memory, bit-level storage, and special multiply-accumulate hardware are implemented for the multiply-accumulate operations, and fixed-point arithmetic and fixed-point storage are adopted (see the sketches following this record). At 16nm, the Sparsey ASIC achieved an overall speedup of 25.24x, a 353.12x reduction in energy per frame, and a 1.43x reduction in silicon area relative to a GPU.

In the ASIP, emerging 3D-stacked memory increases off-chip memory bandwidth, and appropriately sized on-chip memory improves data locality inside the processor. A set of short instructions is implemented in the ASIP architecture by decomposing complex, time-consuming special operations into high-level functional blocks, and look-up-table-based special function operations further improve performance. State-of-the-art mixed-precision training and inference are also adopted in this architecture. A high-level programming environment is developed to generate Very Long Instruction Word (VLIW) instructions for the ASIP to process variants of RNNs. At 16nm, the ASIP achieved 1.5x - 5.6x faster processing, a 4.3x - 40.8x reduction in energy per sequence, and a 1.5x area benefit relative to a GPU.
Subject: Computer engineering.
Subject: Artificial intelligence.
Language: English
Link: The full text of this item is provided by the Korea Education and Research Information Service (KERIS).
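The bit-level multiply-accumulate path mentioned in the abstract can be pictured with a small sketch. Because Sparsey's representations are binary, a dot product between a synaptic weight row and an input vector reduces to a bitwise AND followed by a population count. The C below is a minimal illustration of that idea only, not the dissertation's hardware or code; the function name binary_mac, the word width, and the data are invented.

#include <stdint.h>
#include <stdio.h>

/* Illustrative only: with binary (0/1) activations, a
 * multiply-accumulate over a synaptic weight row becomes a
 * bitwise AND plus a popcount, which is what a bit-level MAC
 * unit can exploit. Uses the GCC/Clang popcount builtin. */
static unsigned binary_mac(const uint64_t *weights,
                           const uint64_t *activations,
                           size_t n_words)
{
    unsigned acc = 0;
    for (size_t i = 0; i < n_words; i++) {
        /* AND selects the active synapses; popcount sums the
         * 1-bits, replacing 64 multiply-adds per memory word. */
        acc += (unsigned)__builtin_popcountll(weights[i] & activations[i]);
    }
    return acc;
}

int main(void)
{
    uint64_t w[2] = { 0xF0F0F0F0F0F0F0F0ull, 0x1ull };
    uint64_t a[2] = { 0xFFFF00000000FFFFull, 0x1ull };
    printf("overlap = %u\n", binary_mac(w, a, 2));
    return 0;
}

The point of the sketch is that one wide memory word feeds many synaptic comparisons per cycle, which is the kind of bandwidth advantage a distributed on-chip memory organization is meant to serve.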

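Similarly, the fixed-point storage and the look-up-table special-function operations mentioned for the ASIP can be sketched in a few lines of C. The Q4.12 format, the 256-entry tanh table, and all names below are assumptions chosen for illustration, not the dissertation's actual design.

#include <stdint.h>
#include <stdio.h>
#include <math.h>

/* Illustrative only: Q4.12 fixed-point storage plus a small
 * look-up table standing in for a special-function unit. */
#define FRAC_BITS 12
#define Q(x) ((int16_t)lrintf((x) * (1 << FRAC_BITS)))

static int16_t lut_tanh[256]; /* covers inputs in [-4, 4) */

static void build_lut(void)
{
    for (int i = 0; i < 256; i++) {
        float x = -4.0f + 8.0f * i / 256.0f;
        lut_tanh[i] = Q(tanhf(x));
    }
}

static int16_t tanh_q(int16_t x) /* Q4.12 in, Q4.12 out */
{
    /* Map [-4, 4) in Q4.12 onto table index 0..255, clamped. */
    int idx = ((int32_t)x + (4 << FRAC_BITS)) * 256 / (8 << FRAC_BITS);
    if (idx < 0) idx = 0;
    if (idx > 255) idx = 255;
    return lut_tanh[idx];
}

/* Fixed-point MAC: 16x16 -> 32-bit products accumulated, then
 * rescaled (arithmetic right shift assumed for negatives). */
static int16_t mac_q(const int16_t *w, const int16_t *x, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)w[i] * x[i];
    return (int16_t)(acc >> FRAC_BITS);
}

int main(void)
{
    build_lut();
    int16_t w[3] = { Q(0.5f), Q(-1.25f), Q(2.0f) };
    int16_t x[3] = { Q(1.0f), Q(0.5f),  Q(0.25f) };
    int16_t y = tanh_q(mac_q(w, x, 3));
    printf("tanh(w.x) ~= %f\n", y / (float)(1 << FRAC_BITS));
    return 0;
}

The trade-off a table like this captures is the one the abstract alludes to: a bounded-error lookup replaces an expensive transcendental evaluation inside the recurrent cell, at the cost of a small on-chip ROM.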