MARC보기
LDR00000nam u2200205 4500
001000000434290
00520200226143923
008200131s2019 ||||||||||||||||| ||eng d
020 ▼a 9781085576550
035 ▼a (MiAaPQ)AAI13863164
040 ▼a MiAaPQ ▼c MiAaPQ ▼d 247004
0820 ▼a 310
1001 ▼a Moran, Gemma E.
24510 ▼a Bayesian Approaches for Modeling Variation.
260 ▼a [S.l.]: ▼b University of Pennsylvania., ▼c 2019.
260 1 ▼a Ann Arbor: ▼b ProQuest Dissertations & Theses, ▼c 2019.
300 ▼a 144 p.
500 ▼a Source: Dissertations Abstracts International, Volume: 81-02, Section: B.
500 ▼a Advisor: George, Edward I.
5021 ▼a Thesis (Ph.D.)--University of Pennsylvania, 2019.
506 ▼a This item must not be sold to any third party vendors.
506 ▼a This item must not be added to any third party search indexes.
506 ▼a This item must not be added to any third party search indexes.
506 ▼a This item must not be sold to any third party vendors.
520 ▼a A core focus of statistics is determining how much of the variation in data may be attributed to the signal of interest, and how much to noise. When the sources of variation are many and complex, a Bayesian approach to data analysis offers a number of advantages. In this thesis, we propose and implement new Bayesian methods for modeling variation in two general settings. The first setting is high-dimensional linear regression where the unknown error variance is also of interest. Here, we show that a commonly used class of conjugate shrinkage priors can lead to underestimation of the error variance. We then extend the Spike-and-Slab Lasso (SSL, Rockova and George, 2018) to the unknown variance case, using an alternative, independent prior framework. This extended procedure outperforms both the fixed variance approach and alternative penalized likelihood methods on both simulated and real data.For the second setting, we move from univariate response data where the predictors are known, to multivariate response data in which potential predictors are unobserved. In this setting, we first consider the problem of biclustering, where a motivating example is to find subsets of genes which have similar expression in a subset of patients. For this task, we propose a new biclustering method called Spike-and-Slab Lasso Biclustering (SSLB). SSLB utilizes the SSL prior to find a doubly-sparse factorization of the data matrix via a fast EM algorithm. Applied to both a microarray dataset and a single-cell RNA-sequencing dataset, SSLB recovers biologically meaningful signal in the data.The second problem we consider in this setting is nonlinear factor analysis. The goal here is to find low-dimensional, unobserved "factors'' which drive the variation in the high-dimensional observed data in a potentially nonlinear fashion. For this purpose, we develop factor analysis BART (faBART), an MCMC algorithm which alternates sampling from the posterior of (a) the factors and (b) a functional approximation to the mapping from the factors to the data. The latter step utilizes Bayesian Additive Regression Trees (BART, Chipman et al., 2010). On a variety of simulation settings, we demonstrate that with only the observed data as the input, faBART is able to recover both the unobserved factors and the nonlinear mapping.
590 ▼a School code: 0175.
650 4 ▼a Statistics.
690 ▼a 0463
71020 ▼a University of Pennsylvania. ▼b Statistics.
7730 ▼t Dissertations Abstracts International ▼g 81-02B.
773 ▼t Dissertation Abstract International
790 ▼a 0175
791 ▼a Ph.D.
792 ▼a 2019
793 ▼a English
85640 ▼u http://www.riss.kr/pdu/ddodLink.do?id=T15490985 ▼n KERIS ▼z 이 자료의 원문은 한국교육학술정보원에서 제공합니다.
980 ▼a 202002 ▼f 2020
990 ▼a ***1816162
991 ▼a E-BOOK