Daegu Haany University Hyangsan Library

Detailed Information

A Multifactorial, Multitask Approach to Automated Speaker Profiling

Material type: Dissertation
Title/Author: A Multifactorial, Multitask Approach to Automated Speaker Profiling.
Personal author: Simpson, Sean S.
Corporate author: Georgetown University. Linguistics.
Publication: [S.l.]: Georgetown University., 2019.
Publication: Ann Arbor: ProQuest Dissertations & Theses, 2019.
Physical description: 301 p.
Host item entry: Dissertations Abstracts International 81-06B.
Dissertation Abstract International
ISBN: 9781392516041
Dissertation note: Thesis (Ph.D.)--Georgetown University, 2019.
General note: Source: Dissertations Abstracts International, Volume: 81-06, Section: B.
Advisor: Zeldes, Amir
Restrictions on use: This item must not be sold to any third party vendors.
Summary: Automated Speaker Profiling (ASP) refers broadly to the computational prediction of speaker traits based on cues mined from the speech signal. Accurate prediction of such traits can have a wide variety of applications, such as automating the collection of customer metadata, improving smart-speaker/voice-assistant interactions, and narrowing down suspect pools in forensic situations.

Approaches to ASP to date have primarily focused on single-task computational models, i.e., models that each predict one speaker trait in isolation. Recent work, however, has suggested that using a multi-task learning framework, in which a system learns to predict multiple related traits simultaneously, each trait-prediction task having access to the training signals of all other trait-prediction tasks, can increase classification accuracy along all trait axes considered.

Likewise, most work on ASP to date has focused primarily on acoustic cues as predictive features for speaker profiling. However, there is a wide range of evidence from the sociolinguistic literature that lexical and phonological cues may also be of use in predicting social characteristics of a given speaker. Recent work in the field of author profiling has also demonstrated the utility of lexical features in predicting social information about authors of textual data, though few studies have investigated whether this carries over to spoken data.

In this dissertation I focus on prediction of five different social traits: sex, ethnicity, age, region, and education. Linguistic features from the acoustic, phonetic, and lexical realms are extracted from 60-second chunks of speech taken from the 2008 NIST SRE corpus and used to train several types of predictive models. Naive (majority class prediction) and informed (single-task neural network) models are trained to provide baseline predictions against which multi-task neural network models are evaluated. Feature importance experiments are performed in order to investigate which features and feature types are most useful for predicting which social traits.

Results presented in chapters 5-7 of this dissertation demonstrate that multitask models consistently outperform single-task models, that models are most accurate when provided information from all three linguistic levels considered, and that lexical features as a group contribute substantially more predictive power than either phonetic or acoustic features.
Subject: Linguistics.
Artificial intelligence.
Sociolinguistics.
Language: English
Link URL: The full text of this item is provided by the Korea Education and Research Information Service (KERIS).
