MARC View
LDR00000nam u2200205 4500
001000000433710
00520200225153031
008200131s2019 ||||||||||||||||| ||eng d
020 ▼a 9781088389799
035 ▼a (MiAaPQ)AAI22618672
040 ▼a MiAaPQ ▼c MiAaPQ ▼d 247004
0820 ▼a 660
1001 ▼a Zhang, Tong.
24510 ▼a Chemical Process Data Analytics via Text Mining and Machine Learning.
260 ▼a [S.l.]: ▼b Carnegie Mellon University, ▼c 2019.
260 1 ▼a Ann Arbor: ▼b ProQuest Dissertations & Theses, ▼c 2019.
300 ▼a 185 p.
500 ▼a Source: Dissertations Abstracts International, Volume: 81-05, Section: B.
500 ▼a Advisor: Sahinidis, Nikolaos V.
5021 ▼a Thesis (Ph.D.)--Carnegie Mellon University, 2019.
506 ▼a This item must not be sold to any third party vendors.
506 ▼a This item must not be added to any third party search indexes.
520 ▼a Today, chemical engineers have access to enormous amounts of data from a variety of sources. Decision makers are frequently tasked with manipulating and analyzing complex datasets. This data can be generated in different forms, such as numerical data, text data, and graphical data. Although numerous studies explore numerical data analysis, only a very small number explore text data and graphical data. In this dissertation, we develop different methodologies that integrate text mining techniques with optimization algorithms to automatically extract information from text and graphical data in chemical engineering applications. We use graphical data in Chapter 2, and text data in Chapters 3, 4, and 5. In Chapter 2, we address the problem of mining chemical flowsheets for process patterns. We propose a systematic methodology for mining structural patterns in chemical process flowsheets using sequence comparison algorithms. Our proposed methodology consists of three major steps. First, we generate graphical representations of general process flowsheets. Second, we use a depth-first search algorithm to traverse the graph of a flowsheet and convert it into a string. Finally, we use sequence alignment algorithms to mine flowsheet strings for process patterns. Depending on which alignment algorithm is used, the identified process patterns may or may not have inserted gaps. In addition, we conduct several case studies and present many resulting flowsheet patterns, which we are able to relate to heuristic rules in the literature. In Chapter 3, we address the problem of evaluating chemical patents. We propose the simultaneous use of eight criteria for patent ranking and evaluation. We also develop an intuitive linear optimization model that determines how to weigh different criteria.
Our proposed methodology has been implemented in a web-based decision support system, and tested for its ability to identify the most important patents in the production of 22 chemicals. In Chapter 4, we analyze a collection of scientific literature using a technique in unsupervised data analytics, called "topic modeling." We use a state-of-the-art topic model to study the topic coverage in Computers & Chemical Engineering. This topic model uses the nonnegative matrix factorization technique to uncover the latent semantic structure (topics) in the documents of the journal. The results show that the journal has expanded its original four topics to 18 topics nowadays. Since 2000, the supply chain topic has grown rapidly and become a popular research area. In Chapter 5, we tackle a supervised learning task. We propose a modeling framework that uses derivative-free optimization to optimize document classification models with imbalanced datasets. Document classification models are considered to be black-box systems due to their hyperparameters. Derivative-free optimization is a well-suited technique for optimizing the performance of black-box systems. The nature of data imbalance affects a model's performance in two ways, both of which we address in our proposed modeling framework. To address the first effect, we maximize the smallest F1 prediction accuracy, and to address the second effect, we maximize the model prediction accuracy. Applied to a real dataset from Linde, our methodology resulted in up to 61% improvements of manual classification schemes.
590 ▼a School code: 0041.
650 4 ▼a Chemical engineering.
690 ▼a 0542
71020 ▼a Carnegie Mellon University. ▼b Chemical Engineering.
7730 ▼t Dissertations Abstracts International ▼g 81-05B.
773 ▼t Dissertation Abstract International
790 ▼a 0041
791 ▼a Ph.D.
792 ▼a 2019
793 ▼a English
85640 ▼u http://www.riss.kr/pdu/ddodLink.do?id=T15493559 ▼n KERIS ▼z The full text of this item is provided by the Korea Education and Research Information Service (KERIS).
980 ▼a 202002 ▼f 2020
990 ▼a ***1008102
991 ▼a E-BOOK