MARC View
LDR03176nam u200445 4500
001000000419496
00520190215163725
008181129s2018 |||||||||||||||||c||eng d
020 ▼a 9780438325531
035 ▼a (MiAaPQ)AAI10840868
035 ▼a (MiAaPQ)berkeley:18070
040 ▼a MiAaPQ ▼c MiAaPQ ▼d 247004
0820 ▼a 004
1001 ▼a Devarakonda, Aditya.
24510 ▼a Avoiding Communication in First Order Methods for Optimization.
260 ▼a [S.l.]: ▼b University of California, Berkeley, ▼c 2018.
260 1 ▼a Ann Arbor: ▼b ProQuest Dissertations & Theses, ▼c 2018.
300 ▼a 125 p.
500 ▼a Source: Dissertation Abstracts International, Volume: 80-01(E), Section: B.
500 ▼a Adviser: James W. Demmel.
5021 ▼a Thesis (Ph.D.)--University of California, Berkeley, 2018.
520 ▼a Machine learning has gained renewed interest in recent years due to advances in computer hardware (processing power and high-capacity storage) and the availability of large amounts of data that can be used to develop accurate, robust models.
520 ▼a In addition to hardware improvements, algorithm redesign is also an important direction for further reducing running times. On modern computer architectures, the cost of moving data (communication) from main memory to caches in a single machine is significantly higher than the cost of performing computation, and this gap continues to grow.
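(Context for the abstract fields above and below: the communication-avoiding literature conventionally models running time as $T = \gamma F + \beta W + \alpha L$, where $F$ is the number of flops performed, $W$ the number of words moved, and $L$ the number of messages sent, with per-message latency $\alpha \gg$ per-word cost $\beta \gg$ per-flop cost $\gamma$. These symbols are the standard ones from that literature, not quoted from this record; they make precise why deferring or batching communication attacks the dominant $\alpha L$ and $\beta W$ terms.)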
520 ▼a Many problems in machine learning reduce to mathematical optimization problems which, in most non-linear and non-convex cases, require iterative methods. This thesis is focused on deriving communication-avoiding variants of the block coordinate descent method.
520 ▼a This thesis adapts well-known techniques from existing work on communication-avoiding (CA) Krylov and s-step Krylov methods. CA-Krylov methods unroll vector recurrences and rearrange the sequence of computation in a way that defers communication for s iterations.
520 ▼a We apply a similar recurrence unrolling technique to block coordinate descent in order to obtain communication-avoiding variants which solve the L2-regularized least-squares, L1-regularized least-squares, Support Vector Machines, and kernel problems.
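To make the recurrence-unrolling idea concrete, here is a minimal single-process sketch of communication-avoiding block coordinate descent for the L2-regularized least-squares case. It is an illustration under stated assumptions (dense NumPy arrays, single coordinates as blocks, invented function name and parameters), not the thesis's implementation: the point is that the Gram matrix G and vector g are formed once per outer round, standing in for the single allreduce that replaces s separate synchronizations.

```python
import numpy as np

def ca_bcd_ridge(A, b, lam, s=4, outer_iters=50, seed=0):
    """Sketch of communication-avoiding block coordinate descent for
    min_x 0.5*||Ax - b||^2 + 0.5*lam*||x||^2.

    Rather than recomputing an inner product with the residual before
    every coordinate update (one synchronization each), s columns are
    chosen up front and their Gram matrix is formed once; the s updates
    then use the unrolled recurrence G @ dx with no further passes over A.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    r = b - A @ x                        # residual b - Ax
    col_sq = (A * A).sum(axis=0)         # precomputed ||A_j||^2
    for _ in range(outer_iters):
        S = rng.choice(d, size=s, replace=False)
        AS = A[:, S]
        G = AS.T @ AS                    # one round of "communication"
        g = AS.T @ r                     # A_S^T r at the start of the round
        dx = np.zeros(s)
        for k in range(s):
            # exact coordinate minimization; earlier local updates are
            # folded in through G @ dx instead of touching r or A again
            grad = -(g[k] - G[k] @ dx) + lam * (x[S[k]] + dx[k])
            dx[k] -= grad / (col_sq[S[k]] + lam)
        x[S] += dx
        r -= AS @ dx                     # restore the true residual once
    return x
```

In the distributed MPI setting the record describes, forming G and g would be the round's only communicated step (a single allreduce over locally held rows of A); the s coordinate updates then proceed entirely locally.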
520 ▼a Our experimental results illustrate that our new communication-avoiding methods can obtain speedups of up to 6.1x on a Cray XC30 supercomputer using MPI for parallel processing. For CA-kernel methods, we show modeled speedups of 26x, 120x, and 1
520 ▼a Finally, we also present an adaptive batch size technique which reduces the latency cost of training convolutional neural networks (CNNs). With this technique we have achieved speedups of up to 6.25x when training CNNs on up to 4 NVIDIA P100 GPUs.
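As an illustration of the adaptive batch size idea, the following hedged sketch (schedule, names, and parameters are invented for exposition, not taken from the thesis) grows the batch size during training so that each epoch issues fewer latency-bound gradient synchronizations:

```python
from torch.utils.data import DataLoader

def adaptive_batch_loaders(dataset, base_bs=128, epochs=30,
                           double_every=5, max_bs=4096):
    """Yield (epoch, DataLoader) pairs whose batch size doubles every
    `double_every` epochs, capped at `max_bs`.

    Larger batches mean fewer iterations per epoch, hence fewer
    per-iteration gradient synchronizations in multi-GPU training.
    Practical schedules typically also rescale the learning rate as
    the batch grows; that is omitted here.
    """
    bs = base_bs
    for epoch in range(epochs):
        if epoch > 0 and epoch % double_every == 0:
            bs = min(2 * bs, max_bs)
        yield epoch, DataLoader(dataset, batch_size=bs, shuffle=True)

# usage: one optimizer step per (progressively larger) batch
# for epoch, loader in adaptive_batch_loaders(train_set):
#     for xb, yb in loader:
#         ...
```

Shifting work from many small batches to fewer large ones trades the per-iteration latency term for bandwidth and compute, which is the mechanism behind the latency reduction the abstract reports.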
590 ▼a School code: 0028.
650 4 ▼a Computer science.
690 ▼a 0984
71020 ▼a University of California, Berkeley. ▼b Electrical Engineering & Computer Sciences.
7730 ▼t Dissertation Abstracts International ▼g 80-01B(E).
790 ▼a 0028
791 ▼a Ph.D.
792 ▼a 2018
793 ▼a English
85640 ▼u http://www.riss.kr/pdu/ddodLink.do?id=T14999755 ▼n KERIS ▼z The full text of this item is provided by the Korea Education and Research Information Service (KERIS).
980 ▼a 201812 ▼f 2019
990 ▼a ***1012033