Term selection for information retrieval applications.
Record type:
Bibliographic - Electronic resource : Monograph/item
Title/Author:
Term selection for information retrieval applications.
Author:
Schultz, J. Michael.
Extent:
136 p.
Note:
Source: Dissertation Abstracts International, Volume: 64-10, Section: A, page: 3667.
Note:
Supervisor: Mark Y. Liberman.
Contained By:
Dissertation Abstracts International 64-10A.
Subject:
Language, Linguistics.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3109218
ISBN:
0496567616
Thesis (Ph.D.)--University of Pennsylvania, 2003.
In this dissertation we investigate methods for selecting terms in the context of a number of specific tasks. As a practical test-case for some of the approaches developed here, we participate in the formal evaluations of Topic Detection and Tracking. In the spirit of residual-idf, a metric which measures deviation from Poisson, we develop a sum-log-ratios metric which improves upon residual-idf in two significant ways---it incorporates document length normalization and it is a function of the entire within-document term count distribution. Also developed here is the idea of a "universal dictionary" as a basis for translingual information retrieval tasks. In the methods section, we describe a suffix array based indexing scheme ideally suited to efficiently calculate within-document term counts for ngrams in very large corpora.
ISBN: 0496567616
Subjects--Topical Terms:
212724
Language, Linguistics.
LDR 03220nmm _2200277 _450
001 162235
005 20051017073425.5
008 230606s2003 eng d
020 $a 0496567616
035 $a 00148736
035 $a 162235
040 $a UnM $c UnM
100 0 $a Schultz, J. Michael. $3 227361
245 1 0 $a Term selection for information retrieval applications. $h [electronic resource]
300 $a 136 p.
500 $a Source: Dissertation Abstracts International, Volume: 64-10, Section: A, page: 3667.
500 $a Supervisor: Mark Y. Liberman.
502 $a Thesis (Ph.D.)--University of Pennsylvania, 2003.
520 # $a
In this dissertation we investigate methods for selecting terms in the context of a number of specific tasks. As a practical test-case for some of the approaches developed here, we participate in the formal evaluations of Topic Detection and Tracking. In the spirit of residual-idf, a metric which measures deviation from Poisson, we develop a sum-log-ratios metric which improves upon residual-idf in two significant ways---it incorporates document length normalization and it is a function of the entire within-document term count distribution. Also developed here is the idea of a "universal dictionary" as a basis for translingual information retrieval tasks. In the methods section, we describe a suffix array based indexing scheme ideally suited to efficiently calculate within-document term counts for ngrams in very large corpora.
520 # $a
The selection and identification of terms is an important part of many natural language applications. In the information retrieval domain documents are often abbreviated to their most salient terms in order to reduce storage requirements and processing time and also to make algorithms more efficient. The quality of search results is a direct reflection of the quality of these representative features. In translingual applications translation dictionaries must be built in order to bridge the gap between source and target languages. With limited time and resources the most effective terms for translation must somehow be chosen. Techniques for term selection are also fundamental to a number of other tasks including automatic generation of indices, concordances and abstracts and the extraction of terminology.
520 # $a
We test our methods in a number of real-world applications. In the formal evaluations of TDT2 we show that the simple vector space model performs as well as much more complicated models. In the context of building a "universal dictionary", we use our method of term selection to choose a vocabulary of less than 10,000 terms which is essentially as effective for topic tracking as an unlimited vocabulary of over 300,000 terms. We demonstrate that this same method extends well to other applications, employing it as a novel approach to multi-word terminology and collocation extraction.
590 $a School code: 0175.
650 # 0 $a Language, Linguistics. $3 212724
650 # 0 $a Computer Science. $3 212513
650 # 0 $a Information Science. $3 212402
710 0 # $a University of Pennsylvania. $3 212781
773 0 # $g 64-10A. $t Dissertation Abstracts International
790 $a 0175
790 1 0 $a Liberman, Mark Y., $e advisor
791 $a Ph.D.
792 $a 2003
856 4 0 $u http://libsw.nuk.edu.tw/login?url=http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3109218 $z http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3109218
Holdings (1 item):
Barcode: 000000000728
Location: Electronic collection
Circulation category: 1 (Books)
Material type: Dissertation
Use type: Normal
Loan status: On shelf
Holds: 0