MARC View
LDR00000nam u2200205 4500
001000000436344
00520200228141510
008200131s2019 ||||||||||||||||| ||eng d
020 ▼a 9781085572187
035 ▼a (MiAaPQ)AAI13805962
040 ▼a MiAaPQ ▼c MiAaPQ ▼d 247004
0820 ▼a 621.3
1001 ▼a Motamedi, Mohammad.
24510 ▼a Efficient Inference using Deep Convolutional Neural Networks on Resource-Constrained Platforms.
260 ▼a [S.l.]: ▼b University of California, Davis., ▼c 2019.
260 1 ▼a Ann Arbor: ▼b ProQuest Dissertations & Theses, ▼c 2019.
300 ▼a 123 p.
500 ▼a Source: Dissertations Abstracts International, Volume: 81-02, Section: B.
500 ▼a Advisor: Ghiasi, Soheil.
5021 ▼a Thesis (Ph.D.)--University of California, Davis, 2019.
506 ▼a This item must not be sold to any third party vendors.
520 ▼a Deep Convolutional Neural Networks (CNNs) exhibit remarkable performance in many pattern recognition, segmentation, classification, and comprehension tasks that were widely considered open problems for most of computing history. For example, CNNs have been shown to outperform humans in certain visual object recognition tasks. Given the significant potential of CNNs for advancing autonomy and intelligence in systems, the Internet of Things (IoT) research community has witnessed a surge in demand for CNN-enabled data processing, technically referred to as inference, for critical tasks such as visual, voice, and language comprehension. Inference using modern CNNs involves billions of operations on millions of parameters, so their deployment requires significant compute, storage, and energy resources. However, such resources are scarce in many resource-constrained IoT applications. Designing an efficient CNN architecture is the first step in alleviating this problem. Asymmetric kernels, breadth-control techniques, and reduce-expand structures are among the most important approaches for effectively decreasing a CNN's parameter budget and computational intensity. Architectural efficiency can be further improved by eliminating ineffective neurons with pruning algorithms and by quantizing the parameters to decrease the model size. Hardware-driven optimization is the subsequent step in addressing the computational demands of deep neural networks. Mobile Systems on Chip (SoCs), which usually include a mobile GPU, a DSP, and a number of CPU cores, are strong candidates for CNN inference on embedded platforms. Depending on the application, it is also possible to develop customized FPGA-based and ASIC-based accelerators. ASIC-based acceleration drastically outperforms other approaches in both power consumption and execution time.
However, this approach is reasonable only if designing a new chip is economically justifiable for the target application. This dissertation aims to bridge the gap between the computational demands of CNNs and the computational capabilities of embedded platforms. We contend that one must strike a judicious balance between the functional requirements of a CNN and its resource requirements for an IoT application to be able to utilize the CNN. We investigate several concrete formulations of this broad concept and propose effective approaches for addressing the identified challenges. First, we target platforms equipped with reconfigurable fabric, such as Field Programmable Gate Arrays (FPGAs), and offer a framework for the generation of optimized FPGA-based CNN accelerators. Our solution leverages an analytical approach to characterizing and exploring the accelerator design space, through which it synthesizes an efficient accelerator for a given CNN on a specific FPGA. Second, we investigate the problem of CNN inference on mobile SoCs, propose effective approaches for CNN parallelization targeting such platforms, and explore the underlying tradeoffs. Finally, in the last part of this dissertation, we investigate the use of an existing optimized CNN model to automatically generate a competitive CNN for an IoT application whose objects of interest are a fraction of the categories the original CNN was designed to classify, such that the resource requirements of inference using the synthesized CNN are proportionally scaled down. We use the term resource scalability to refer to this concept and propose solutions for the automated synthesis of context-aware, resource-scalable CNNs that meet the functional requirements of the target IoT application at a fraction of the resource requirements of the original CNN.
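As a rough illustration of the "reduce-expand" idea mentioned in the abstract, the sketch below counts weights in a SqueezeNet-style squeeze/expand block versus a plain 3x3 convolution. The channel sizes are illustrative assumptions, not figures taken from the dissertation.

```python
# Hedged sketch: compare parameter counts of a plain 3x3 convolution
# against a reduce-expand (squeeze, then parallel 1x1 and 3x3 expand)
# block with the same number of output channels. Channel sizes here
# (128 in/out, squeeze to 16) are assumed for illustration only.

def conv_params(k, c_in, c_out):
    """Number of weights in a k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def reduce_expand_params(c_in, squeeze, expand1, expand3):
    """Squeeze with 1x1 convs, then expand with parallel 1x1 and 3x3 convs."""
    return (conv_params(1, c_in, squeeze)      # reduce: 1x1 squeeze layer
            + conv_params(1, squeeze, expand1) # expand: 1x1 branch
            + conv_params(3, squeeze, expand3))# expand: 3x3 branch

plain = conv_params(3, 128, 128)               # 147,456 weights
fire = reduce_expand_params(128, 16, 64, 64)   # 12,288 weights, still 128 output channels
print(plain, fire, plain // fire)              # 12x fewer parameters
```

The squeeze layer shrinks the channel count seen by the expensive 3x3 kernels, which is where the bulk of the savings comes from.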
590 ▼a School code: 0029.
650 4 ▼a Computer engineering.
650 4 ▼a Electrical engineering.
690 ▼a 0464
690 ▼a 0544
71020 ▼a University of California, Davis. ▼b Electrical and Computer Engineering.
7730 ▼t Dissertations Abstracts International ▼g 81-02B.
773 ▼t Dissertations Abstracts International
790 ▼a 0029
791 ▼a Ph.D.
792 ▼a 2019
793 ▼a English
85640 ▼u http://www.riss.kr/pdu/ddodLink.do?id=T15490471 ▼n KERIS ▼z The full text of this material is provided by the Korea Education and Research Information Service (KERIS).
980 ▼a 202002 ▼f 2020
990 ▼a ***1008102
991 ▼a E-BOOK