대구한의대학교 향산도서관

상세정보

부가기능

Rich and Scalable Models for Text

상세 프로파일

상세정보
자료유형학위논문
서명/저자사항Rich and Scalable Models for Text.
개인저자Nguyen, Thang Dai.
단체저자명University of Maryland, College Park. Computer Science.
발행사항[S.l.]: University of Maryland, College Park., 2019.
발행사항Ann Arbor: ProQuest Dissertations & Theses, 2019.
형태사항246 p.
기본자료 저록Dissertations Abstracts International 81-05B.
Dissertation Abstract International
ISBN9781687922106
학위논문주기Thesis (Ph.D.)--University of Maryland, College Park, 2019.
일반주기 Source: Dissertations Abstracts International, Volume: 81-05, Section: B.
Advisor: Boyd-Graber, Jordan
이용제한사항This item must not be sold to any third party vendors.
요약Topic models have become essential tools for uncovering hidden structures in big data. However, the most popular topic model algorithm-Latent Dirichlet Allocation (LDA)- and its extensions suffer from sluggish performance on big datasets. Recently, the machine learning community has attacked this problem using spectral learning approaches such as the moment method with tensor decomposition or matrix factorization. The anchor word algorithm by Arora et al. [2013] has emerged as a more efficient approach to solve a large class of topic modeling problems. The anchor word algorithm is high-speed, and it has a provable theoretical guarantee: it will converge to a global solution given enough number of documents. In this thesis, we present a series of spectral models based on the anchor word algorithm to serve a broader class of datasets and to provide more abundant and more flexible modeling capacity.First, we improve the anchor word algorithm by incorporating various rich priors in the form of appropriate regularization terms. Our new regularized anchor word algorithms produce higher topic quality and provide flexibility to incorporate informed priors, creating the ability to discover topics more suited for external knowledge.Second, we enrich the anchor word algorithm with metadata-based word representation for labeled datasets. Our new supervised anchor word algorithm runs very fast and predicts better than supervised topic models such as Supervised LDA on three sentiment datasets. Also, sentiment anchor words, which play a vital role in generating sentiment topics, provide cues to understand sentiment datasets better than unsupervised topic models.Lastly, we examine ALTO, an active learning framework with a static topic overview, and investigate the usability of supervised topic models for active learning. We develop a new, dynamic, active learning framework that combines the concept of informativeness and representativeness of documents using dynamically updating topics from our fast supervised anchor word algorithm. Experiments using three multi-class datasets show that our new framework consistently improves classification accuracy over ALTO.
일반주제명Computer science.
Artificial intelligence.
언어영어
바로가기URL : 이 자료의 원문은 한국교육학술정보원에서 제공합니다.

서평(리뷰)

  • 서평(리뷰)

태그

  • 태그

나의 태그

나의 태그 (0)

모든 이용자 태그

모든 이용자 태그 (0) 태그 목록형 보기 태그 구름형 보기
 
로그인폼