Daegu Haany University Hyangsan Library

Training and Architecting Sequence to Sequence Language Models for Applications in Varied Domains

Detailed Information
Material Type: Thesis/Dissertation
Title/Statement of Responsibility: Training and Architecting Sequence to Sequence Language Models for Applications in Varied Domains.
Personal Author: Li, Congrui.
Corporate Author: Rensselaer Polytechnic Institute. Computer Science.
Publication: [S.l.]: Rensselaer Polytechnic Institute, 2019.
Publication: Ann Arbor: ProQuest Dissertations & Theses, 2019.
Physical Description: 142 p.
Source Item: Dissertations Abstracts International 81-02B.
ISBN: 9781085558693
Dissertation Note: Thesis (Ph.D.)--Rensselaer Polytechnic Institute, 2019.
General Note: Source: Dissertations Abstracts International, Volume: 81-02, Section: B.
Advisor: Fox, Peter.
Restrictions on Use: This item must not be sold to any third-party vendors. This item must not be added to any third-party search indexes.
Abstract: Many challenges arise when working directly with natural language text sequence data at the document level. The sequence-to-sequence (seq2seq) model is an ideal tool for this task. A basic seq2seq model consists of two recurrent networks: an encoder that processes the input and a decoder that generates the output. To give the decoder more direct access to the input, researchers introduced an attention mechanism that lets the decoder peek into the input at every decoding step. To improve long-term dependencies, more sophisticated neuron cell structures, such as the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU), were also developed. Neural Machine Translation was the very first testbed for seq2seq models, with wild success, followed by chatbot applications in various domains.
This thesis introduces three innovative case studies using variants of the seq2seq model, each focusing on a different stage of the model's training process.
The first case study focuses on the stage before training. We introduce a generative chatbot in Chinese trained on data at a finer level of granularity. Based on A/B testing results judged by multiple human evaluators, we conclude that the character-level model maintains the performance of the word-level benchmark.
The second case study focuses on the stage during training. We introduce an unsupervised information retrieval (IR) model based on a sequence autoencoder that is competitive with multiple existing techniques, including Jaccard similarity, bag-of-words cosine similarity, and tf-idf cosine similarity, as well as recent neural network approaches such as Doc2Vec and Skip-Thoughts.
The third case study focuses on the stage after training. We explore mergers and acquisitions in the domain of business analytics. We demonstrate the effectiveness of the IR model from the previous case study for measuring business proximity, and investigate using the IR model's output as pre-trained input for a downstream supervised task: predicting acquisitions. For this task, we compare model variants with two different types of inputs and three different types of network structure. Sophisticated data preprocessing is carried out for each experiment to improve the quality of the training data. Bidirectional seq2seq models with GRU cells and Luong attention are used for all tasks (illustrative sketches of this architecture and of a tf-idf retrieval baseline follow this record).
In conclusion, research is conducted before, during, and after the training of the seq2seq model, and improvements or discoveries are made in each case study to more effectively encode natural language text sequence data at the document level and obtain responses, answers, and trends from the various training corpora.
General Subject: Computer science.
Language: English
Link: URL: The full text of this item is provided by the Korea Education and Research Information Service (KERIS).
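
As a companion to the abstract above, the following is a minimal sketch of the architecture it describes: a bidirectional GRU encoder and a GRU decoder with Luong-style ("general") attention, written in PyTorch. This is an illustrative reconstruction, not the author's code; the class names, dimensions, and the toy usage at the bottom are assumptions made only for the example.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Bidirectional GRU encoder: embeds token ids and returns per-step states."""
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True, bidirectional=True)

    def forward(self, src):                                  # src: (batch, src_len)
        outputs, hidden = self.gru(self.embed(src))
        # outputs: (batch, src_len, 2*hid_dim); hidden: (2, batch, hid_dim)
        return outputs, hidden

class LuongAttnDecoder(nn.Module):
    """GRU decoder that 'peeks' at every encoder step via Luong 'general' attention."""
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.attn = nn.Linear(2 * hid_dim, hid_dim, bias=False)  # score(h_dec, h_enc) = h_dec . W h_enc
        self.out = nn.Linear(hid_dim + 2 * hid_dim, vocab_size)

    def forward(self, tgt, enc_outputs, hidden):             # tgt: (batch, tgt_len)
        dec_outputs, hidden = self.gru(self.embed(tgt), hidden)
        # Attention weights over all source positions, computed at every decoding step.
        scores = torch.bmm(dec_outputs, self.attn(enc_outputs).transpose(1, 2))
        weights = F.softmax(scores, dim=-1)                  # (batch, tgt_len, src_len)
        context = torch.bmm(weights, enc_outputs)            # weighted sum of encoder states
        return self.out(torch.cat([dec_outputs, context], dim=-1)), hidden

# Toy usage with random token ids (shapes only; no training loop shown).
enc, dec = Encoder(vocab_size=5000), LuongAttnDecoder(vocab_size=5000)
src = torch.randint(0, 5000, (8, 20))                        # 8 source sequences of length 20
tgt = torch.randint(0, 5000, (8, 15))                        # teacher-forced target prefix
enc_out, enc_hidden = enc(src)
logits, _ = dec(tgt, enc_out, enc_hidden.sum(0, keepdim=True))  # logits: (8, 15, 5000)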
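
The abstract also lists tf-idf cosine similarity among the classical baselines compared against the unsupervised IR model. The short sketch below shows what such a baseline looks like with scikit-learn; the documents and query are invented here purely for illustration and are not taken from the thesis data.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical business descriptions; the corpus used in the thesis is different.
docs = [
    "a cloud analytics company focused on enterprise data pipelines",
    "enterprise software vendor for data warehousing and analytics",
    "a retail chain selling outdoor sporting goods",
]
query = ["cloud data analytics platform"]

vectorizer = TfidfVectorizer()               # learn vocabulary and idf weights from the corpus
doc_vecs = vectorizer.fit_transform(docs)    # sparse (n_docs, n_terms) tf-idf matrix
query_vec = vectorizer.transform(query)

# Rank documents by cosine similarity to the query; a higher score means a "closer" business.
scores = cosine_similarity(query_vec, doc_vecs)[0]
for idx in scores.argsort()[::-1]:
    print(f"{scores[idx]:.3f}  {docs[idx]}")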
