Daegu Haany University Hyangsan Library

Raising an Abstraction Level of Compilation and Optimization for Customized Computing

Detailed information
Material type	Dissertation
Title/Author	Raising an Abstraction Level of Compilation and Optimization for Customized Computing.
Personal author	Yu, Hao.
Corporate author	University of California, Los Angeles. Computer Science 0201.
Publication	[S.l.]: University of California, Los Angeles., 2019.
Publication	Ann Arbor: ProQuest Dissertations & Theses, 2019.
Physical description	210 p.
Host item entry	Dissertations Abstracts International 81-03B. Dissertation Abstract International
ISBN	9781085666480
Dissertation note	Thesis (Ph.D.)--University of California, Los Angeles, 2019.
General note	Source: Dissertations Abstracts International, Volume: 81-03, Section: B. Advisor: Cong, Jason.
Restrictions on use	This item must not be sold to any third party vendors.
Abstract	The demand for scalable, high-performance computing has increased as the size of datasets has grown in recent years. However, the breakdown of Dennard scaling has made energy efficiency an important concern in datacenters and has spawned exploration into using power-efficient processors such as GPUs (Graphics Processing Units) and FPGAs (Field-Programmable Gate Arrays) as accelerators in datacenters. In particular, the FPGA's low power consumption and re-programmability allow datacenters to use FPGAs as highly energy-efficient accelerators for a variety of applications. On the other hand, the FPGA has poor programmability compared to instruction-based architectures such as the CPU and GPU. To facilitate the process of implementing and deploying FPGA accelerators, High-Level Synthesis (HLS), which generates functionally equivalent RTL from C-based programming languages, has attracted increasing attention over the past decades. Nowadays, both major FPGA vendors have their own commercial HLS products: Xilinx SDx and Intel FPGA SDK for OpenCL. However, modern HLS is still not friendly to software designers who have limited FPGA domain knowledge. Since the hardware architecture inferred from a syntactic C implementation can be ambiguous, current commercial HLS tools usually generate architecture structures according to specific HLS C code patterns. As a result, although HLS tools have been shown to be capable of generating FPGA designs whose performance is competitive with RTL designs, designers must manually restructure the HLS C kernel with specific code patterns to achieve high performance. This problem has become one of the main impediments to cooperation and development in the FPGA community.
In this dissertation, we first present an automated framework that frees designers from manual code reconstruction and design space exploration (DSE). The framework uses the Merlin compiler to create a more comprehensive micro-architecture design space from a user-written C-based kernel, so the design space should cover design points with better performance than those in the HLS-pragma-based design space. To efficiently identify the best design configuration in this vast design space, we first propose efficient design space pruning processes that reduce the design space by 24.65x. Accordingly, we develop and evaluate several approaches, including a multi-armed bandit hyper-heuristic approach, a gradient-based approach, and a design bottleneck optimization approach. The evaluation results show that our DSE framework is able to identify a design point that achieves on average (using geometric mean) 93.78% QoR compared to the corresponding manual design.
Based on the proposed DSE framework, we further support automated design optimization for high-level domain-specific languages (DSLs). Since DSLs might not explicitly provide interfaces for users to specify design configurations, automatic DSE becomes even more important when supporting DSLs for FPGAs. Specifically, we adopt Merlin C, an OpenMP-like C-based programming model, as the intermediate representation (IR) and implement DSL-to-Merlin front-end compilers while preserving semantic and domain-specific information such as parallel patterns, systolic patterns, and scheduling functions. We first implement a Spark-to-Merlin front-end compiler that translates Spark applications in Scala to Merlin C for FPGA acceleration. By leveraging parallel patterns as scheduling hints, the generated accelerators are able to achieve a 50x speedup on geometric mean for a set of machine learning kernels. In addition, we demonstrate that our DSE framework can be even more practical for DSLs with many scheduling functions. Specifically, we implement a HeteroCL-to-Merlin front-end that takes the HeteroCL programming model embedded in Python. Our DSE framework is capable of exploring a subset of HeteroCL scheduling primitives and lets users focus on platform-independent loop transformations. With the help of the DSE framework, we achieve a 27.62x speedup on geometric mean over a CPU core for a variety of compute-intensive kernels (chapter 3).
On the other hand, a main challenge of performing design space exploration for a design with arbitrary functionality is the lack of assumptions about the underlying micro-architecture. As we will illustrate in the dissertation, evaluating the quality of a design point is extremely expensive (15-60 minutes), so only a limited number of design points can be explored. In addition, due to the uncertainty of vendor tool behaviors, developing performance and resource models is also unrealistic. As a result, we propose the composable, parallel and pipeline (CPP) architecture template to limit the design space to a region that is more practical and has fewer exceptions (chapter 4). With the CPP architecture, we are able to derive an incremental analytical model, which requires only a few HLS runs to be initialized, to facilitate the DSE process.
In the last part of this dissertation, we use a convolutional neural network (CNN) to demonstrate that the HLS runtime cost can be eliminated entirely with the use of a more domain-specific architecture (chapter 5). Specifically, we leverage a systolic array architecture template for CNN accelerator generation. By mapping a CNN model to the pre-defined systolic array template, we can guarantee the model accuracy and DSE efficiency. The experimental results show that our analytical model for the architecture template achieves 96% accuracy, and the mapped CNN model achieves up to 1.2 Tops throughput on an Intel Arria 10 FPGA.
Subject	Computer engineering.
Language	English
Link	: The full text of this item is provided by the Korea Education and Research Information Service (KERIS).
