Daegu Haany University Hyangsan Library

Connecting Documents, Words, and Languages Using Topic Models

Detailed Information
Material Type: Dissertation
Title/Author: Connecting Documents, Words, and Languages Using Topic Models.
Personal Author: Yang, Weiwei.
Corporate Author: University of Maryland, College Park. Computer Science.
Publication: [S.l.]: University of Maryland, College Park, 2019.
Publication: Ann Arbor: ProQuest Dissertations & Theses, 2019.
Physical Description: 170 p.
Source Record: Dissertations Abstracts International 81-05B.
Dissertation Abstracts International
ISBN: 9781687913074
Dissertation Note: Thesis (Ph.D.)--University of Maryland, College Park, 2019.
General Note: Source: Dissertations Abstracts International, Volume: 81-05, Section: B.
Advisor: Boyd-Graber, Jordan L.
Use Restrictions: This item must not be sold to any third party vendors.
Abstract: Topic models discover latent topics in documents and summarize documents at a high level. To improve topic models' topic quality and extrinsic performance, external knowledge is often incorporated as part of the generative story. One form of external knowledge is weighted text links that indicate similarity or relatedness between the connected objects. This dissertation 1) uncovers the latent structures in observed weighted links and integrates them into topic modeling, and 2) learns latent weighted links from other external knowledge to improve topic modeling.

We consider incorporating links at three different levels: documents, words, and topics. We first look at binary document links, e.g., citation links of papers. Document links indicate topic similarity of the connected documents. Past methods model the document links separately, ignoring the entire link density. We instead uncover latent document blocks in which documents are densely connected and tend to talk about similar topics. We introduce LBH-RTM, a relational topic model with lexical weights, block priors, and hinge loss. It extracts informative topic priors from the document blocks for documents' topic generation. It predicts unseen document links with block and lexical features and hinge loss, in addition to topical features. It outperforms past methods in link prediction and gives more coherent topics.

Like document links, words are also linked, but usually with real-valued weights. Word links are known as word associations and indicate the semantic relatedness of the connected words. They provide more information about word relationships in addition to the co-occurrence patterns in the training corpora. To extract and incorporate the knowledge in word associations, we introduce methods to find the most salient word pairs. The methods organize the words in a tree structure, which serves as a prior (i.e., tree prior) for tree LDA. The methods are straightforward but effective, yielding more coherent topics than vanilla LDA, and slightly improving the extrinsic classification performance.

Weighted topic links are different. Topics are latent, so it is difficult to obtain ground-truth topic links, but learned weighted topic links could bridge the topics across languages. We introduce a multilingual topic model (MTM) that assumes each language has its own topic distributions over the words only in that language and learns weighted topic links based on word translations and words' topic distributions. It does not force the topic spaces of different languages to be aligned and is more robust than previous MTMs that do. It outperforms past MTMs in classification while still giving coherent topics on less comparable and smaller corpora.
Subject: Computer science.
Artificial intelligence.
Language: English
Link URL: The full text of this material is provided by the Korea Education and Research Information Service (KERIS).
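
The three methods summarized in the abstract lend themselves to small illustrative sketches. First, for LBH-RTM's hinge-loss link prediction: the sketch below stands in a linear SVM (hinge loss) trained on hand-built document-pair features. The element-wise product of two documents' topic proportions and the random placeholder labels are assumptions for illustration, not LBH-RTM's actual parameterization, which also uses block and lexical features.

```python
# Hedged sketch: hinge-loss link prediction from topical features.
# doc_topics, the pair features, and the labels are all placeholders.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
doc_topics = rng.dirichlet(np.ones(5), size=20)   # 20 documents, 5 topics

# One feature vector per document pair: element-wise product of the two
# documents' topic proportions (an illustrative choice, not the paper's).
pairs = [(i, j) for i in range(20) for j in range(i + 1, 20)]
X = np.array([doc_topics[i] * doc_topics[j] for i, j in pairs])
y = rng.integers(0, 2, size=len(pairs))           # placeholder link labels

clf = LinearSVC(loss="hinge").fit(X, y)           # hinge-loss classifier
print(clf.predict(X[:5]))                         # predicted links, first 5 pairs
```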
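Second, for the word-association tree prior: one simple way to keep only the most salient word pairs while still forming a tree over the vocabulary is a maximum spanning tree on the association graph. This Kruskal-style sketch is an assumption-laden stand-in; the dissertation's actual tree-construction methods may differ.

```python
# Hedged sketch: organize real-valued word associations into a tree by
# greedily keeping the heaviest edges that do not form a cycle (Kruskal).
def max_spanning_tree(words, assoc):
    """assoc maps (word1, word2) pairs to real-valued association weights."""
    parent = {w: w for w in words}

    def find(w):                      # union-find with path compression
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    tree = []
    for (a, b), weight in sorted(assoc.items(), key=lambda kv: -kv[1]):
        ra, rb = find(a), find(b)
        if ra != rb:                  # adding this edge keeps the graph acyclic
            parent[ra] = rb
            tree.append((a, b, weight))
    return tree

assoc = {("dog", "cat"): 0.8, ("dog", "bone"): 0.6, ("cat", "bone"): 0.1}
print(max_spanning_tree({"dog", "cat", "bone"}, assoc))
```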
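Third, for the MTM's weighted topic links: one way to score a link is to map one language's topic-word distribution through a word-translation dictionary and take the cosine similarity with the other language's topics. The toy dictionary and the cosine scoring rule are assumptions for illustration, not the MTM's exact learning procedure.

```python
# Hedged sketch: score weighted links between topics of two languages
# via a word-translation dictionary. Vocabularies, the dictionary, and
# the cosine rule are toy assumptions.
import numpy as np

vocab_en = ["dog", "cat", "house"]
vocab_de = ["hund", "katze", "haus"]
translations = {"hund": "dog", "katze": "cat", "haus": "house"}

rng = np.random.default_rng(1)
topics_en = rng.dirichlet(np.ones(3), size=2)   # 2 English topics over vocab_en
topics_de = rng.dirichlet(np.ones(3), size=2)   # 2 German topics over vocab_de

def link_weight(t_de, t_en):
    # Move the German topic's probability mass onto English words, then
    # compare the two distributions with cosine similarity.
    mapped = np.zeros(len(vocab_en))
    for w_idx, w in enumerate(vocab_de):
        mapped[vocab_en.index(translations[w])] += t_de[w_idx]
    return mapped @ t_en / (np.linalg.norm(mapped) * np.linalg.norm(t_en))

for i, t_de in enumerate(topics_de):
    for j, t_en in enumerate(topics_en):
        print(f"de topic {i} <-> en topic {j}: {link_weight(t_de, t_en):.3f}")
```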
