Identifying, Evaluating and Applying Importance Maps for Speech.
Record type: Bibliographic - electronic resource : Monograph/item
Title/Author: Identifying, Evaluating and Applying Importance Maps for Speech.
Author: Trinh, Viet Anh.
Publisher: Ann Arbor : ProQuest Dissertations & Theses, 2022
Physical description: 119 p.
Note: Source: Dissertations Abstracts International, Volume: 83-06, Section: B.
Note: Advisor: Mandel, Michael I.
Contained By: Dissertations Abstracts International 83-06B.
Subject: Computer science.
Electronic resource: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28861594
ISBN: 9798759951452
Trinh, Viet Anh.
Identifying, Evaluating and Applying Importance Maps for Speech. - Ann Arbor : ProQuest Dissertations & Theses, 2022 - 119 p.
Source: Dissertations Abstracts International, Volume: 83-06, Section: B.
Thesis (Ph.D.)--City University of New York, 2022.
This item must not be sold to any third party vendors.
Like many machine learning systems, speech models often perform well on data from the same domain as their training data, but their performance suffers when inference is performed on out-of-domain data. With a fast-growing number of applications of speech models in healthcare, education, automotive, automation, etc., it is essential to ensure that speech models can generalize to out-of-domain data, especially to the noisy environments of real-world scenarios. Human listeners, in contrast, are quite robust to noise. Thus, a thorough understanding of the differences between human listeners and speech models is urgently needed to improve speech model performance in noise. These differences presumably exist because speech models do not use the same information as humans to recognize speech. A possible solution is to encourage speech models to attend to the same time-frequency regions as human listeners, which may improve their generalization in noise. We define the time-frequency regions that humans or machines focus on to recognize speech as importance maps (IMs). In this research, we first investigate how to identify speech importance maps. Second, we compare human and machine importance maps to understand how they differ and how speech models can learn from humans to improve their performance in noise. Third, we develop a structured saliency benchmark (SSBM), a metric for evaluating IMs. Finally, we propose a new application of IMs as data augmentation for speech models, which enhances their performance and enables them to generalize better to out-of-domain noise. Overall, our work demonstrates that importance maps can be used to improve speech models and to achieve out-of-domain generalization to different noise environments. In future work, we will extend this approach to large-scale speech models and explore other methods of identifying IMs and using them to augment speech data, such as methods based on human responses.
The technique can also be extended to computer vision tasks such as image recognition, by predicting importance maps for images and using them to improve model performance on out-of-domain data.
ISBN: 9798759951452
Subjects--Topical Terms:
Computer science.
Subjects--Index Terms:
Data augmentation
LDR 03357nmm a2200361 4500
001 616501
005 20220513114354.5
008 220920s2022 ||||||||||||||||| ||eng d
020 $a 9798759951452
035 $a (MiAaPQ)AAI28861594
035 $a AAI28861594
040 $a MiAaPQ $c MiAaPQ
100 1 $a Trinh, Viet Anh. $3 915880
245 1 0 $a Identifying, Evaluating and Applying Importance Maps for Speech.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2022
300 $a 119 p.
500 $a Source: Dissertations Abstracts International, Volume: 83-06, Section: B.
500 $a Advisor: Mandel, Michael I.
502 $a Thesis (Ph.D.)--City University of New York, 2022.
506 $a This item must not be sold to any third party vendors.
590 $a School code: 0046.
650 4 $a Computer science. $3 199325
653 $a Data augmentation
653 $a Explainable artificial intelligence
653 $a Importance maps
653 $a Noise robustness
653 $a Speech perception
653 $a Speech recognition
690 $a 0984
710 2 $a City University of New York. $b Computer Science. $3 492891
773 0 $t Dissertations Abstracts International $g 83-06B.
790 $a 0046
791 $a Ph.D.
792 $a 2022
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28861594
Holdings:
Barcode: 000000208594
Location: Electronic holdings
Circulation category: 1 (Books)
Material type: E-book
Call number: EB 2022
Use type: Normal
Loan status: On shelf
Holds: 0