Material type | Thesis |
---|---|
Title/Author | Avoiding Communication in First Order Methods for Optimization. |
Personal author | Devarakonda, Aditya. |
Corporate author | University of California, Berkeley. Electrical Engineering & Computer Sciences. |
Publication | [S.l.]: University of California, Berkeley., 2018. |
Publication | Ann Arbor: ProQuest Dissertations & Theses, 2018. |
Physical description | 125 p. |
Host item entry | Dissertation Abstracts International 80-01B(E). Dissertation Abstracts International |
ISBN | 9780438325531 |
Dissertation note | Thesis (Ph.D.)--University of California, Berkeley, 2018. |
General note | Source: Dissertation Abstracts International, Volume: 80-01(E), Section: B. Adviser: James W. Demmel. |
Abstract | Machine learning has gained renewed interest in recent years due to advances in computer hardware (processing power and high-capacity storage) and the availability of large amounts of data which can be used to develop accurate, robust models. Wh |
Abstract | In addition to hardware improvements, algorithm redesign is also an important direction to further reduce running times. On modern computer architectures, the cost of moving data (communication) from main memory to caches in a single machine is |
Abstract | Many problems in machine learning solve mathematical optimization problems which, in most non-linear and non-convex cases, require iterative methods. This thesis is focused on deriving communication-avoiding variants of the block coordinate des |
Abstract | This thesis adapts well-known techniques from existing work on communication-avoiding (CA) Krylov and s-step Krylov methods. CA-Krylov methods unroll vector recurrences and rearrange the sequence of computation in a way that defers communication f |
Abstract | We apply a similar recurrence unrolling technique to block coordinate descent in order to obtain communication-avoiding variants which solve the L2-regularized least-squares, L1-regularized least-squares, Support Vector Machines, and Kernel prob |
Abstract | Our experimental results illustrate that our new, communication-avoiding methods can obtain speedups of up to 6.1x on a Cray XC30 supercomputer using MPI for parallel processing. For CA-kernel methods we show modeled speedups of 26x, 120x, and 1 |
Abstract | Finally, we also present an adaptive batch size technique which reduces the latency cost of training convolutional neural networks (CNN). With this technique we have achieved speedups of up to 6.25x when training CNNs on up to 4 NVIDIA P100 GPUs |
Subject | Computer science. |
Language | English |
Link | The full text of this item is provided by the Korea Education and Research Information Service (KERIS). |