MARC View
LDR00000nam u2200205 4500
001000000433914
00520200226132949
008200131s2019 ||||||||||||||||| ||eng d
020 ▼a 9781085796194
035 ▼a (MiAaPQ)AAI13885135
040 ▼a MiAaPQ ▼c MiAaPQ ▼d 247004
0820 ▼a 310
1001 ▼a Chu, Lynna.
24510 ▼a A Graph-Based Approach to Change-Point Detection for Multivariate and Non-Euclidean Data.
260 ▼a [S.l.]: ▼b University of California, Davis., ▼c 2019.
260 1 ▼a Ann Arbor: ▼b ProQuest Dissertations & Theses, ▼c 2019.
300 ▼a 156 p.
500 ▼a Source: Dissertations Abstracts International, Volume: 81-04, Section: B.
500 ▼a Advisor: Chen, Hao.
5021 ▼a Thesis (Ph.D.)--University of California, Davis, 2019.
506 ▼a This item must not be sold to any third party vendors.
506 ▼a This item must not be added to any third party search indexes.
520 ▼a We consider the testing and estimation of change-points, locations where the distribution abruptly changes, in a sequence of multivariate or non-Euclidean observations. While the change-point problem has been extensively studied for low-dimensional data, advances in data collection technology have produced data sequences of increasing volume and complexity. Motivated by the challenges of modern data, we study a non-parametric framework that can be effectively applied to various data types as long as an informative similarity measure on the sample space can be defined. We first consider the change-point problem in the offline setting, where the sequence of observations has been completely observed. The existing approach along this line has low power and/or biased estimates for change-points under some common scenarios. To address these problems, we present new tests based on similarity information that exhibit substantial improvements in detecting and estimating change-points. In addition, under some mild conditions, the new test statistics are asymptotically distribution-free under the null hypothesis of no change. Analytic p-value approximation formulas for the new test statistics are derived, making the new approaches easy off-the-shelf tools for large datasets. Moreover, in many applications it is of scientific significance to detect anomalous events as data are being collected. We extend the new test statistics to the related, but distinct, online setting where change-points are detected sequentially as data are being generated. The approach utilizes nearest-neighbor information and can be applied to ongoing sequences of multivariate or non-Euclidean data. Analytic formulas for approximating the average run lengths of the new approaches are derived to make them quickly applicable to large datasets. The effectiveness of the new approaches is illustrated in an analysis of New York taxi data.
590 ▼a School code: 0029.
650 4 ▼a Biostatistics.
650 4 ▼a Statistics.
690 ▼a 0308
690 ▼a 0463
71020 ▼a University of California, Davis. ▼b Biostatistics.
7730 ▼t Dissertations Abstracts International ▼g 81-04B.
773 ▼t Dissertation Abstract International
790 ▼a 0029
791 ▼a Ph.D.
792 ▼a 2019
793 ▼a English
85640 ▼u http://www.riss.kr/pdu/ddodLink.do?id=T15491418 ▼n KERIS ▼z The full text of this item is provided by the Korea Education and Research Information Service (KERIS).
980 ▼a 202002 ▼f 2020
990 ▼a ***1816162
991 ▼a E-BOOK