Daegu Haany University Hyangsan Library

Heterogeneous Monolithic 3D and FinFET Architectures for Energy-Efficient Computing

Detailed Information
Material type: Thesis (dissertation)
Title/Author: Heterogeneous Monolithic 3D and FinFET Architectures for Energy-Efficient Computing.
Personal author: Yu, Ye.
Corporate author: Princeton University. Electrical Engineering.
Publication: [S.l.]: Princeton University, 2019.
Publication: Ann Arbor: ProQuest Dissertations & Theses, 2019.
Physical description: 210 p.
Source record: Dissertations Abstracts International 81-04B.
ISBN: 9781085774390
Dissertation note: Thesis (Ph.D.)--Princeton University, 2019.
General note: Source: Dissertations Abstracts International, Volume: 81-04, Section: B.
Advisor: Jha, Niraj K.
Restrictions on use: This item must not be sold to any third party vendors.
Abstract: More transistors are integrated within the same footprint area as the technology node shrinks, delivering higher performance. However, this is accompanied by a higher power density that usually exceeds the coping capability of inexpensive cooling techniques. This Power Wall prevents the chip from running at full speed with all devices powered on. Another major bottleneck in chip design is the imbalance between processor clock rate and memory access speed. This Memory Wall keeps the processor from fully utilizing its compute power. To address both the Power and Memory Walls, we propose several approaches and architectures.

To tackle the Memory Wall, we develop an efficient memory interface for monolithic 3D-stacked non-volatile RAMs (NVRAMs). It takes advantage of the tremendous bandwidth made available by monolithic inter-tier vias (MIVs) to implement an on-chip memory bus that hides the latency of large data transfers. To tackle the Power Wall, we add a fine-grain dynamically reconfigurable (FDR) field-programmable gate array (FPGA) to our monolithic 3D architecture. It uses the concept of temporal logic folding to localize on-chip communication. We show that the architecture significantly reduces both power and energy while delivering better performance for both memory- and compute-intensive applications.

The second problem targeted in this work is the development of energy-efficient architectures for convolutional neural networks (CNNs). CNNs have been shown to outperform conventional machine-learning algorithms across a wide range of applications, e.g., object detection, image classification, and image segmentation. However, the high computational complexity of CNNs often necessitates extremely fast and efficient hardware, and the problem is getting worse as the size of neural networks grows exponentially. As a result, customized hardware accelerators have been developed to accelerate CNN processing without sacrificing model accuracy.
However, previous accelerator design studies have not fully considered the characteristics of the target applications, which may lead to sub-optimal architecture designs. On the other hand, new CNN models have been developed for better accuracy, but their compatibility with the underlying hardware accelerator is overlooked most of the time. We propose an application-driven framework for architectural design space exploration of CNN accelerators. This framework is based on a hardware analytical model for individual CNN operations, and it models the accelerator design task as a multi-dimensional optimization problem. We demonstrate that it can be used efficaciously in application-driven accelerator architecture design. In addition, it is capable of improving neural network models to best fit the underlying hardware resources.

Most existing CNN accelerators focus on exploring various dataflow styles and computational parallelism designs, while the potential performance improvement from sparsity (in activations and weights) remains underexploited. The amount of computation and the memory footprint of CNNs can be significantly reduced if sparsity is exploited during network evaluation. With the design space exploration method discussed above, we develop SPRING, a sparsity-aware reduced-precision CNN accelerator architecture for both training and inference. We use a binary-mask scheme to encode the sparsity of activations and weights, and adopt the stochastic rounding algorithm to train CNNs with reduced precision without accuracy loss. We use the efficient monolithic 3D non-volatile memory interface to alleviate the memory bottleneck of CNN evaluation, especially during training.

The last research direction of this thesis focuses on analyzing the timing, leakage power, and dynamic power of FinFET architectures under process, supply voltage, and temperature (PVT) variations.
We propose a statistical optimization framework, based on dual device-type assignment at the architecture level, that accounts for spatial correlations under PVT variations and leverages circuit-level statistical analysis techniques.
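The binary-mask sparsity encoding and stochastic rounding that the abstract attributes to SPRING can be illustrated in software. The following is a minimal Python sketch under stated assumptions, not SPRING's hardware implementation: the function names, the flat-list tensor representation, and the quantization step are illustrative choices, not details from the dissertation.

```python
import math
import random

def encode_binary_mask(values):
    """Split a flat list into a bit mask (1 = nonzero) plus the packed
    nonzero values; zeros cost only one mask bit each, so sparse
    activations and weights shrink considerably."""
    mask = [1 if v != 0 else 0 for v in values]
    nonzeros = [v for v in values if v != 0]
    return mask, nonzeros

def decode_binary_mask(mask, nonzeros):
    """Reconstruct the original flat list from (mask, nonzeros)."""
    it = iter(nonzeros)
    return [next(it) if bit else 0 for bit in mask]

def stochastic_round(x, step, rng=random):
    """Quantize x to a multiple of `step`, rounding up with probability
    equal to the fractional remainder, so the result is unbiased:
    E[stochastic_round(x)] == x."""
    q = x / step
    lo = math.floor(q)
    p_up = q - lo  # distance past the lower grid point
    return (lo + 1) * step if rng.random() < p_up else lo * step

# Round-trip a sparse activation vector through the mask encoding.
acts = [0.0, 1.3, 0.0, 0.0, -0.7, 2.1]
mask, nz = encode_binary_mask(acts)
assert mask == [0, 1, 0, 0, 1, 1]
assert decode_binary_mask(mask, nz) == acts
```

Averaged over many draws, `stochastic_round(0.3, 0.25)` yields 0.25 about 80% of the time and 0.5 about 20%, so its expectation is 0.3; this unbiasedness is what lets reduced-precision training avoid the systematic error that round-to-nearest accumulates across many updates.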
Subject headings: Computer engineering.
Electrical engineering.
Language: English
Quick link: URL: The full text of this item is provided by the Korea Education and Research Information Service (KERIS).
