Material Type | Thesis/Dissertation |
---|---|
Title/Author | Intelligent and Scalable Algorithms for Canonical Polyadic Decomposition. |
Personal Author | Aggour, Kareem Sherif. |
Corporate Author | Rensselaer Polytechnic Institute. Computer Science. |
Publication | [S.l.]: Rensselaer Polytechnic Institute, 2019. |
Publication | Ann Arbor: ProQuest Dissertations & Theses, 2019. |
Physical Description | 168 p. |
Source Record | Dissertations Abstracts International 81-02B. |
ISBN | 9781085567510 |
Dissertation Note | Thesis (Ph.D.)--Rensselaer Polytechnic Institute, 2019. |
General Note | Source: Dissertations Abstracts International, Volume: 81-02, Section: B. Advisor: Yener, Bulent. |
Use Restrictions | This item must not be sold to any third-party vendors. |
Abstract | Over the past two decades, there has been a dramatic increase in the volume and variety of data generated in almost every scientific discipline. To enable the efficient storage and processing of these massive datasets, a variety of fault-tolerant, scalable distributed storage and processing platforms have been popularized---most famously, Hadoop MapReduce and Spark. Novel distributed algorithms are being developed to take full advantage of these platforms, including scalable variants of algorithms such as Canonical Polyadic Decomposition (CPD), an unsupervised learning technique frequently used in data mining and machine learning applications to discover latent factors in a class of multimodal datasets referred to as tensors. To date, however, research on scalable CPD algorithms has focused almost exclusively on the analysis of large sparse tensors. This research addresses the complementary need for efficient, scalable algorithms to decompose the large dense tensors that arise in many signal processing and anomaly detection applications. To that end, we developed a progression of algorithms designed for MapReduce settings that incorporate combinations of regularization and sketching to operate efficiently on dense, skewed tensors. The first MapReduce CPD algorithm uses an Alternating Least Squares (ALS) strategy that is mathematically equivalent to the classical sequential CPD-ALS algorithm. A second algorithm was then developed in which regularization and sketching work in tandem to accelerate and stabilize tensor decompositions. Prior research had demonstrated the benefits of applying either regularization or sketching to CPD-ALS, but to our knowledge this work is the first to demonstrate the utility of using both together, outperforming the use of either technique alone.
However, this algorithm requires manual selection of the sketching and regularization hyperparameter values. We next developed two novel algorithms that employ online learning-based approaches to dynamically select the sketching rate and regularization parameters at runtime, further optimizing CP decompositions while eliminating the burden of manual hyperparameter selection. This work is the first to intelligently choose the sketching rate and regularization parameters at each iteration of a CPD algorithm to balance the trade-off between minimizing runtime and maximizing decomposition accuracy. On both synthetic and real data, we observed that for noisy tensors, our intelligent CPD algorithm produces decompositions of accuracy comparable to the exact distributed CPD-ALS algorithm in less time, often half the time. For ill-conditioned tensors, given the same time budget, the intelligent CPD algorithm produces decompositions with significantly lower relative error, often an order-of-magnitude improvement. |
Subject | Computer science. Artificial intelligence. |
Language | English |
Link | The full text of this item is provided by the Korea Education and Research Information Service (KERIS). |
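For readers unfamiliar with the method named in the abstract, the classical CPD-ALS update that the dissertation's distributed algorithms build on can be sketched as follows. This is an illustrative single-machine NumPy sketch, not the author's MapReduce implementation; the `lam` ridge term and `sketch_rate` row-sampling parameter are stand-ins for the regularization and sketching techniques the abstract describes, with names chosen here for illustration.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Khatri-Rao product; row (j*K + k) is U[j] * V[k]."""
    r = U.shape[1]
    return (U[:, None, :] * V[None, :, :]).reshape(-1, r)

def unfold(X, mode):
    """Mode-n unfolding, ordered consistently with khatri_rao above."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def cp_als(X, rank, n_iters=50, lam=0.0, sketch_rate=1.0, seed=0):
    """Rank-`rank` CP decomposition of a 3-way tensor by ALS.

    lam         -- ridge (Tikhonov) regularization strength
    sketch_rate -- fraction of least-squares rows sampled per update
                   (1.0 recovers the exact CPD-ALS update)
    """
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in X.shape]
    for _ in range(n_iters):
        for mode in range(3):
            # Fix the other two factors and solve for this one.
            others = [factors[m] for m in range(3) if m != mode]
            M = khatri_rao(others[0], others[1])
            Xn = unfold(X, mode)
            if sketch_rate < 1.0:  # uniform row sampling of the LS system
                n = M.shape[0]
                idx = rng.choice(n, size=max(rank, int(sketch_rate * n)),
                                 replace=False)
                M, Xn = M[idx], Xn[:, idx]
            # Regularized normal equations: factor = Xn M (M^T M + lam I)^-1
            G = M.T @ M + lam * np.eye(rank)
            factors[mode] = np.linalg.solve(G, (Xn @ M).T).T
    return factors

def reconstruct(factors):
    """Rebuild the tensor from its CP factor matrices."""
    A, B, C = factors
    return np.einsum('ir,jr,kr->ijk', A, B, C)
```

In the unsketched case, `M.T @ M` could be computed more cheaply as the Hadamard product of the small Gram matrices of the fixed factors; it is computed directly here so the same code path remains correct when rows are sampled.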
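The abstract's later contribution selects the sketching rate and regularization at each iteration via online learning. The abstract does not specify the policy, so the following is purely a hypothetical illustration of the idea using a toy epsilon-greedy bandit over a discrete grid of sketching rates, where the caller would report a reward such as error reduction per unit runtime after each iteration; all names here are invented for illustration.

```python
import random

class EpsilonGreedySelector:
    """Toy epsilon-greedy bandit over candidate hyperparameter values.

    Hypothetical sketch only: the dissertation's actual online-learning
    policy is not described in the abstract.
    """
    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)          # e.g. candidate sketching rates
        self.epsilon = epsilon          # exploration probability
        self.rng = random.Random(seed)
        self.counts = [0] * len(self.arms)
        self.values = [0.0] * len(self.arms)  # running mean reward per arm

    def select(self):
        """Return the index of the arm to use this iteration."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.arms))
        return max(range(len(self.arms)), key=lambda i: self.values[i])

    def update(self, i, reward):
        """Fold the observed reward into arm i's running mean."""
        self.counts[i] += 1
        self.values[i] += (reward - self.values[i]) / self.counts[i]
```

A decomposition loop would call `select()` before each ALS sweep, run the sweep with the chosen rate, and then call `update()` with a reward balancing accuracy gain against elapsed time, which is one simple way to trade off runtime and decomposition accuracy as the abstract describes.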