MARC보기
LDR00000nam u2200205 4500
001000000433633
00520200225151816
008200131s2019 ||||||||||||||||| ||eng d
020 ▼a 9781085763684
035 ▼a (MiAaPQ)AAI22615061
040 ▼a MiAaPQ ▼c MiAaPQ ▼d 247004
0820 ▼a 004
1001 ▼a Jonathan, Albert.
24510 ▼a Multi-tenant Geo-distributed Data Analytics.
260 ▼a [S.l.]: ▼b University of Minnesota., ▼c 2019.
260 1 ▼a Ann Arbor: ▼b ProQuest Dissertations & Theses, ▼c 2019.
300 ▼a 146 p.
500 ▼a Source: Dissertations Abstracts International, Volume: 81-02, Section: B.
500 ▼a Advisor: Chandra, Abhishek
5021 ▼a Thesis (Ph.D.)--University of Minnesota, 2019.
506 ▼a This item must not be sold to any third party vendors.
520 ▼a Geo-distributed data analytics has gained much interest in recent years due to the need for extracting insights from geo-distributed data. Traditionally, data analytics has been done within a cluster/data center environment. However, analyzing geo-distributed data using existing cluster-based systems typically cannot satisfy the timeliness requirement of most applications and result in wasteful resource consumption due to the fundamental differences of the environments, especially due to the scarce, highly heterogeneous, and dynamic nature of the wide-area resources: compute power and network bandwidth. This thesis addresses the challenges faced by geo-distributed data analytics systems in ensuring high-performance and reliable execution of multiple data analytics applications/queries. Specifically, the focus is on sharing resources across multiple users, applications, and computing frameworks. Sharing resources is attractive as it increases resource utilization and reduces operational cost. However, ensuring high-performance execution of multiple applications in a shared environment is challenging as they may compete for the same resources, especially in a wide-area environment with scarce resources. Furthermore, dynamics such as workload variation, resource variation, stragglers, and failures are inevitable in large-scale distributed systems. These can cause large resource perturbation that significantly affect the performance of query executions.This thesis makes the following contributions. First, we present a resource sharing technique across multiple geo-distributed data analytics frameworks. The main challenge here is how to elastically partition resources while allowing high locality scheduling to each individual framework, which is critical to the execution performance of geo-distributed analytics queries. We then address the problem of how to identify and exploit common executions across multiple queries to mitigate wasteful resource consumption. We demonstrate that traditional multi-query optimization may degrade the overall query execution performance due to its lack of support for network awareness. Finally, we highlight the importance of adaptability in ensuring reliable query execution in the presence of dynamics, both for single and multiple query executions. We propose a systematic approach that can selectively determine which queries to adapt and how to adapt them based on the types of queries, dynamics, and optimization goals.
590 ▼a School code: 0130.
650 4 ▼a Computer science.
690 ▼a 0984
71020 ▼a University of Minnesota. ▼b Computer Science.
7730 ▼t Dissertations Abstracts International ▼g 81-02B.
773 ▼t Dissertation Abstract International
790 ▼a 0130
791 ▼a Ph.D.
792 ▼a 2019
793 ▼a English
85640 ▼u http://www.riss.kr/pdu/ddodLink.do?id=T15493267 ▼n KERIS ▼z 이 자료의 원문은 한국교육학술정보원에서 제공합니다.
980 ▼a 202002 ▼f 2020
990 ▼a ***1008102
991 ▼a E-BOOK