Term selection for information retrieval applications.
Record Type:
Electronic resources : Monograph/item
Title/Author:
Term selection for information retrieval applications.
Author:
Schultz, J. Michael.
Description:
136 p.
Notes:
Source: Dissertation Abstracts International, Volume: 64-10, Section: A, page: 3667.
Notes:
Supervisor: Mark Y. Liberman.
Contained By:
Dissertation Abstracts International, 64-10A.
Subject:
Language, Linguistics.
Online resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3109218
ISBN:
0496567616
Dissertation note:
Thesis (Ph.D.)--University of Pennsylvania, 2003.
Abstract:
In this dissertation we investigate methods for selecting terms in the context of a number of specific tasks. As a practical test case for some of the approaches developed here, we participate in the formal evaluations of Topic Detection and Tracking. In the spirit of residual-idf, a metric which measures deviation from a Poisson model, we develop a sum-log-ratios metric which improves upon residual-idf in two significant ways: it incorporates document length normalization, and it is a function of the entire within-document term count distribution. Also developed here is the idea of a "universal dictionary" as a basis for translingual information retrieval tasks. In the methods section, we describe a suffix-array-based indexing scheme ideally suited to efficiently calculating within-document term counts for n-grams in very large corpora.
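For readers unfamiliar with the residual-idf baseline named in the abstract, the following is a minimal sketch of the commonly used formulation: the observed inverse document frequency of a term minus the inverse document frequency that a Poisson model would predict from the term's total count. It is not the dissertation's sum-log-ratios metric, which the abstract does not define. The function names `term_stats` and `residual_idf`, the base-2 logarithm, and the toy documents are assumptions chosen for illustration.

```python
import math
from collections import Counter


def term_stats(docs):
    """Return (document frequency, collection frequency) counters for tokenized docs."""
    doc_freq, coll_freq = Counter(), Counter()
    for tokens in docs:
        counts = Counter(tokens)
        coll_freq.update(counts)         # total occurrences of each term
        doc_freq.update(counts.keys())   # number of documents containing each term
    return doc_freq, coll_freq


def residual_idf(doc_freq, coll_freq, n_docs):
    """Observed IDF minus the IDF expected under a Poisson model (illustrative sketch)."""
    if not 0 < doc_freq <= n_docs or coll_freq < doc_freq:
        raise ValueError("need 0 < doc_freq <= n_docs and coll_freq >= doc_freq")
    observed_idf = -math.log2(doc_freq / n_docs)
    rate = coll_freq / n_docs            # Poisson rate: mean occurrences per document
    # Under Poisson, P(term appears in a document) = 1 - exp(-rate) = -expm1(-rate).
    expected_idf = -math.log2(-math.expm1(-rate))
    return observed_idf - expected_idf


# Hypothetical toy corpus: "suffix" is concentrated in one document, "the" is spread out.
docs = [
    ["suffix", "array", "suffix", "suffix"],
    ["the", "array"],
    ["the", "index"],
    ["the", "term"],
]
df, cf = term_stats(docs)
for term in ("suffix", "the"):
    print(term, round(residual_idf(df[term], cf[term], len(docs)), 3))
```

A term whose occurrences cluster in few documents scores higher than a term with the same total count spread evenly, which is the deviation from Poisson that the abstract refers to. Note that this formulation uses only document frequency and total count; it has no document length normalization, one of the two limitations the abstract says the sum-log-ratios metric addresses.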
LDR :03220nmm _2200277 _450
001 162235
005 20051017073425.5
008 230606s2003 eng d
020 $a 0496567616
035 $a 00148736
035 $a 162235
040 $a UnM $c UnM
100 0 $a Schultz, J. Michael. $3 227361
245 1 0 $a Term selection for information retrieval applications. $h [electronic resource]
300 $a 136 p.
500 $a Source: Dissertation Abstracts International, Volume: 64-10, Section: A, page: 3667.
500 $a Supervisor: Mark Y. Liberman.
502 $a Thesis (Ph.D.)--University of Pennsylvania, 2003.
520 # $a In this dissertation we investigate methods for selecting terms in the context of a number of specific tasks. As a practical test case for some of the approaches developed here, we participate in the formal evaluations of Topic Detection and Tracking. In the spirit of residual-idf, a metric which measures deviation from a Poisson model, we develop a sum-log-ratios metric which improves upon residual-idf in two significant ways: it incorporates document length normalization, and it is a function of the entire within-document term count distribution. Also developed here is the idea of a "universal dictionary" as a basis for translingual information retrieval tasks. In the methods section, we describe a suffix-array-based indexing scheme ideally suited to efficiently calculating within-document term counts for n-grams in very large corpora.
520 # $a The selection and identification of terms is an important part of many natural language applications. In the information retrieval domain, documents are often abbreviated to their most salient terms in order to reduce storage requirements and processing time and to make algorithms more efficient. The quality of search results is a direct reflection of the quality of these representative features. In translingual applications, translation dictionaries must be built in order to bridge the gap between source and target languages. With limited time and resources, the most effective terms for translation must somehow be chosen. Techniques for term selection are also fundamental to a number of other tasks, including the automatic generation of indices, concordances, and abstracts, and the extraction of terminology.
520 # $a We test our methods in a number of real-world applications. In the formal evaluations of TDT2, we show that the simple vector space model performs as well as much more complicated models. In the context of building a "universal dictionary", we use our method of term selection to choose a vocabulary of fewer than 10,000 terms, which is essentially as effective for topic tracking as an unlimited vocabulary of over 300,000 terms. We demonstrate that this same method extends well to other applications, employing it as a novel approach to multi-word terminology and collocation extraction.
590 $a School code: 0175.
650 # 0 $a Language, Linguistics. $3 212724
650 # 0 $a Computer Science. $3 212513
650 # 0 $a Information Science. $3 212402
710 0 # $a University of Pennsylvania. $3 212781
773 0 # $g 64-10A. $t Dissertation Abstracts International
790 $a 0175
790 1 0 $a Liberman, Mark Y., $e advisor
791 $a Ph.D.
792 $a 2003
856 4 0 $u http://libsw.nuk.edu.tw/login?url=http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3109218 $z http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3109218
Items
Inventory Number:
000000000728
Location Name:
Electronic collection
Item Class:
1 Books
Material type:
Thesis
Usage Class:
Normal
Loan Status:
On shelf
No. of reservations:
0