MARC View
LDR00000nam u2200205 4500
001000000434777
00520200227103000
008200131s2019 ||||||||||||||||| ||eng d
020 ▼a 9781392516041
035 ▼a (MiAaPQ)AAI27543912
040 ▼a MiAaPQ ▼c MiAaPQ ▼d 247004
0820 ▼a 400
1001 ▼a Simpson, Sean S.
24510 ▼a A Multifactorial, Multitask Approach to Automated Speaker Profiling.
260 ▼a [S.l.]: ▼b Georgetown University., ▼c 2019.
260 1 ▼a Ann Arbor: ▼b ProQuest Dissertations & Theses, ▼c 2019.
300 ▼a 301 p.
500 ▼a Source: Dissertations Abstracts International, Volume: 81-06, Section: B.
500 ▼a Advisor: Zeldes, Amir
5021 ▼a Thesis (Ph.D.)--Georgetown University, 2019.
506 ▼a This item must not be sold to any third party vendors.
520 ▼a Automated Speaker Profiling (ASP) refers broadly to the computational prediction of speaker traits based on cues mined from the speech signal. Accurate prediction of such traits can have a wide variety of applications, such as automating the collection of customer metadata, improving smart-speaker/voice-assistant interactions, and narrowing down suspect pools in forensic situations. Approaches to ASP to date have primarily focused on single-task computational models, i.e., models which each predict one speaker trait in isolation. Recent work, however, has suggested that using a multi-task learning framework, in which a system learns to predict multiple related traits simultaneously, each trait-prediction task having access to the training signals of all other trait-prediction tasks, can increase classification accuracy along all trait axes considered. Likewise, most work on ASP to date has focused primarily on acoustic cues as predictive features for speaker profiling. However, there is a wide range of evidence from the sociolinguistic literature that lexical and phonological cues may also be of use in predicting social characteristics of a given speaker. Recent work in the field of author profiling has also demonstrated the utility of lexical features in predicting social information about authors of textual data, though few studies have investigated whether this carries over to spoken data. In this dissertation, I focus on prediction of five different social traits: sex, ethnicity, age, region, and education. Linguistic features from the acoustic, phonetic, and lexical realms are extracted from 60-second chunks of speech taken from the 2008 NIST SRE corpus and used to train several types of predictive models. Naive (majority class prediction) and informed (single-task neural network) models are trained to provide baseline predictions against which multi-task neural network models are evaluated.
Feature importance experiments are performed in order to investigate which features and feature types are most useful for predicting which social traits. Results presented in chapters 5-7 of this dissertation demonstrate that multitask models consistently outperform single-task models, that models are most accurate when provided information from all three linguistic levels considered, and that lexical features as a group contribute substantially more predictive power than either phonetic or acoustic features.
590 ▼a School code: 0076.
650 4 ▼a Linguistics.
650 4 ▼a Artificial intelligence.
650 4 ▼a Sociolinguistics.
690 ▼a 0290
690 ▼a 0800
690 ▼a 0636
71020 ▼a Georgetown University. ▼b Linguistics.
7730 ▼t Dissertations Abstracts International ▼g 81-06B.
773 ▼t Dissertation Abstract International
790 ▼a 0076
791 ▼a Ph.D.
792 ▼a 2019
793 ▼a English
85640 ▼u http://www.riss.kr/pdu/ddodLink.do?id=T15494471 ▼n KERIS ▼z The full text of this item is provided by the Korea Education and Research Information Service (KERIS).
980 ▼a 202002 ▼f 2020
990 ▼a ***1008102
991 ▼a E-BOOK