National University of Kaohsiung Library and Information Center

Chen, Barry Yue.

Learning discriminant narrow-band temporal patterns for automatic recognition of conversational telephone speech.

Record type:	Bibliographic, electronic resource : Monograph/item
Title/Author:	Learning discriminant narrow-band temporal patterns for automatic recognition of conversational telephone speech.
Author:	Chen, Barry Yue.
Physical description:	181 p.
Note:	Chair: Nelson Morgan.
Note:	Source: Dissertation Abstracts International, Volume: 66-08, Section: B, page: 4314.
Contained By:	Dissertation Abstracts International 66-08B.
Subject:	Computer Science.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3187001
ISBN:	9780542292125

Thesis (Ph.D.)--University of California, Berkeley, 2005.

Typical automatic speech recognition (ASR) systems extract features from the full spectrum of speech over relatively short time spans (from about 25 milliseconds to approximately 100 milliseconds). They rely on the short-term spectral envelope of speech for modeling speech sounds, and this dependence may account for the fact that ASR systems still fall short of human recognition ability. Variability in the speech signal comes from environmental sources (such as noise and reverberation) as well as from the speaker (such as accent and speaking style), and it creates difficult problems for typical ASR systems that rely on the short-term spectral envelope.

This thesis further explores the extraction of discriminant speech information from long-term narrow-frequency energy trajectories of speech. These trajectories stretch over 500 milliseconds of speech, and each spans a critical bandwidth in frequency. Previous work on extracting information from these long-term trajectories led to the development of a neural network architecture called Neural TRAP [52, 112]. Neural TRAP consists of two stages of multi-layer perceptrons (MLPs), each of which is a fully connected MLP with a single hidden layer. The first stage is trained to estimate the phone posterior probabilities within each critical band, while the second stage uses the critical-band level phone probabilities to produce an overall estimate of the full-spectrum phone posterior probabilities. This system was competitive with conventional ASR systems, but in combination with conventional systems, Neural TRAP significantly improved ASR performance.

We extend the Neural TRAP work along two major directions in this thesis. First, we develop two new Neural TRAP-like architectures that extract different critical-band level information. The first new architecture, Hidden Activation TRAP (HAT), is like Neural TRAP except that instead of using the outputs of the critical-band MLPs, which estimate critical-band level phone probabilities, it uses the outputs of the critical-band hidden units, which represent probabilities of certain discriminant energy trajectories. The second new architecture, Tonotopic Multi-Layer Perceptron (TMLP), has the same network topology as HAT, but the critical-band hidden unit parameters and the discriminant energy trajectories that they model are not constrained to learn critical-band level phone posteriors; rather, they are free to learn useful critical-band discriminant patterns for the estimation of the full-band phone posteriors.

The second major extension in this thesis is the integration of the long-term narrow-band systems with a conventional ASR system for the recognition of conversational telephone speech (CTS). By augmenting conventional short-term features with features derived from a combination of phone posteriors estimated by the long-term systems and by more conventional intermediate-term systems, we achieve word error rate reductions of
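
The difference between Neural TRAP and HAT described in the abstract can be illustrated with a forward pass. The following is only a minimal sketch, not the thesis implementation: the band count, window length (51 frames, assuming a 10 ms frame step), layer sizes, sigmoid hidden units, random untrained weights, and all function names are assumptions made for illustration. What it takes from the abstract is the wiring: Neural TRAP feeds the merger with each critical-band MLP's phone posteriors, while HAT feeds it with the same MLPs' hidden-unit outputs.

```python
import math
import random

random.seed(0)

# All sizes below are illustrative assumptions, not values from the thesis.
N_BANDS = 15           # number of critical bands
N_FRAMES = 51          # frames per trajectory (about 500 ms at an assumed 10 ms step)
N_HIDDEN = 20          # hidden units in each critical-band MLP
N_PHONES = 40          # size of the phone inventory
N_MERGER_HIDDEN = 64   # hidden units in the second-stage (merger) MLP


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out):
    # Random, untrained weights: this sketch shows data flow only.
    weights = [[random.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def affine(layer, x):
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def init_mlp(n_in, n_hidden, n_out):
    return init_layer(n_in, n_hidden), init_layer(n_hidden, n_out)


def mlp_forward(mlp, x):
    """Single-hidden-layer MLP; returns (hidden activations, output posteriors)."""
    hidden_layer, output_layer = mlp
    hidden = [sigmoid(a) for a in affine(hidden_layer, x)]
    return hidden, softmax(affine(output_layer, hidden))


# Stage 1: one MLP per critical band, shared by both architectures here.
band_mlps = [init_mlp(N_FRAMES, N_HIDDEN, N_PHONES) for _ in range(N_BANDS)]
# Stage 2: the merger input size depends on what stage 1 passes on.
trap_merger = init_mlp(N_BANDS * N_PHONES, N_MERGER_HIDDEN, N_PHONES)
hat_merger = init_mlp(N_BANDS * N_HIDDEN, N_MERGER_HIDDEN, N_PHONES)


def neural_trap(trajectories):
    """Merger sees the band-level phone posteriors."""
    merged = []
    for mlp, trajectory in zip(band_mlps, trajectories):
        _, band_posteriors = mlp_forward(mlp, trajectory)
        merged.extend(band_posteriors)
    return mlp_forward(trap_merger, merged)[1]


def hat(trajectories):
    """Merger sees the band-level hidden activations instead."""
    merged = []
    for mlp, trajectory in zip(band_mlps, trajectories):
        band_hidden, _ = mlp_forward(mlp, trajectory)
        merged.extend(band_hidden)
    return mlp_forward(hat_merger, merged)[1]


# One synthetic energy trajectory per critical band.
trajectories = [[random.gauss(0.0, 1.0) for _ in range(N_FRAMES)] for _ in range(N_BANDS)]
trap_posteriors = neural_trap(trajectories)
hat_posteriors = hat(trajectories)
```

Under these assumptions, a TMLP forward pass would look the same as `hat`, since the abstract states that the two share a network topology. The difference lies in training: per the abstract, the TMLP critical-band hidden units are not constrained to learn band-level phone posteriors.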

ISBN: 9780542292125

Subjects--Topical Terms: Computer Science. (212513)

LDR:04061nmm _2200265 _450
001 170803
005 20061228142249.5
008 090528s2005 eng d
020 $a 9780542292125
035 $a 00242833
040 $a UnM $c UnM
100 0 $a Chen, Barry Yue. $3 244834
245 1 0 $a Learning discriminant narrow-band temporal patterns for automatic recognition of conversational telephone speech.
300 $a 181 p.
500 $a Chair: Nelson Morgan.
500 $a Source: Dissertation Abstracts International, Volume: 66-08, Section: B, page: 4314.
502 $a Thesis (Ph.D.)--University of California, Berkeley, 2005.
590 $a School code: 0028.
650 # 0 $a Computer Science. $3 212513
650 # 0 $a Engineering, Electronics and Electrical. $3 226981
690 $a 0544
690 $a 0984
710 0 # $a University of California, Berkeley. $3 212474
773 0 # $g 66-08B. $t Dissertation Abstracts International
790 $a 0028
790 1 0 $a Morgan, Nelson, $e advisor
791 $a Ph.D.
792 $a 2005
856 4 0 $u http://libsw.nuk.edu.tw:81/login?url=http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3187001 $z http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3187001