Algorithms for high dimensional data.
Nguyen, Huy Le.
Record type: Bibliographic – electronic resource : Monograph/item
Title/Author: Algorithms for high dimensional data.
Author: Nguyen, Huy Le.
Physical description: 168 p.
Note: Source: Dissertation Abstracts International, Volume: 76-03(E), Section: B.
Note: Adviser: Moses S. Charikar.
Contained By: Dissertation Abstracts International, 76-03B(E).
Subject: Computer science.
Electronic resource: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3642130
ISBN: 9781321287837
LDR  :04945nmm a2200337 4500
001  457727
005  20150805065227.5
008  150916s2014 ||||||||||||||||| ||eng d
020    $a 9781321287837
035    $a (MiAaPQ)AAI3642130
035    $a AAI3642130
040    $a MiAaPQ $c MiAaPQ
100  1  $a Nguyen, Huy Le. $3 708783
245  1 0  $a Algorithms for high dimensional data.
300    $a 168 p.
500    $a Source: Dissertation Abstracts International, Volume: 76-03(E), Section: B.
500    $a Adviser: Moses S. Charikar.
502    $a Thesis (Ph.D.)--Princeton University, 2014.
506    $a This item must not be sold to any third party vendors.
520    $a The quest to understand large, high-dimensional data has always been plagued by the so-called "curse of dimensionality": algorithms regularly suffer a terrible exponential dependence on the dimension. One popular approach to this problem is dimension reduction: reducing high-dimensional objects to lower-dimensional objects or other compact representations while preserving the objectives of the problem at hand. This philosophy manifests itself in many different forms: from analytic notions like metric embeddings, to locality sensitive hashing, and even information-theoretic notions like sketches.
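The dimension reduction philosophy described above can be made concrete with a minimal sketch (illustrative only, not code from the thesis): a classic Johnson-Lindenstrauss-style Gaussian random projection, in which pairwise distances survive a drastic drop in dimension. All dimensions and parameters below are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 10_000, 1_000          # points, original dimension, reduced dimension
X = rng.standard_normal((n, d))

# Random Gaussian projection, scaled so squared norms are preserved in expectation.
G = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ G

# Compare a pairwise distance before and after projection.
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
ratio = proj / orig
print(f"distance ratio after projection: {ratio:.3f}")  # concentrates near 1
```

The ratio concentrates around 1 with deviation on the order of 1/sqrt(k), which is the quantitative content of the Johnson-Lindenstrauss lemma.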
520    $a A fundamental task in data processing is identifying patterns, from similar objects to relations between different attributes of the same object. An abstract way to model similarity is via the notion of distance: two objects are similar if their distance is small, a set of regressors is predictive if the observed dependent variable is at a small distance from the predicted value, and so on. All three aforementioned notions preserve distances, but with varying amounts of structure. Metric embeddings preserve the most structure and hence can be combined easily with existing algorithms. In contrast, sketches require no additional structure and are therefore the easiest to obtain, but often require new algorithms. Locality sensitive hashing has only limited structure, which is nevertheless enough to enable certain applications such as nearest neighbor search. The flexibility of such a large toolkit is hugely beneficial. We will demonstrate applications of these techniques to design fast algorithms for practical questions including nearest neighbor search, linear regression, and low rank approximation, study their limitations, and overcome those limitations with new techniques.
520    $a Specifically, in this thesis, we present the following contributions.
520    $a Fast dimension reduction for sparse data. In many applications, the input data is high-dimensional but also sparse. Examples of such inputs include columns of document-word incidence matrices in information retrieval, the person-object scoring matrix in collaborative filtering tasks, and constraint matrices for geodetic survey, photogrammetry, and molecular structure problems [Rice83]. In the last decade, subspace embeddings have emerged as a useful tool for designing algorithms for such data. We present a new construction of subspace embeddings with nearly optimal dimension and embedding time nearly linear in the sparsity of the input. Our new embeddings have improved the running time of many important problems, from least squares regression to low rank approximation, which are workhorses of statistical analysis. For instance, our running time for least squares regression is nearly optimal unless one can avoid generic matrix multiplication.
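The "sketch and solve" idea behind this contribution can be sketched with a CountSketch-style sparse embedding (one nonzero per input row), applied to least squares regression. This is an illustrative toy, not the thesis's construction; the sketch size and data are arbitrary choices, and the cost of forming the sketched matrix is proportional to the number of nonzeros of the input.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 20_000, 10, 2_000          # tall input matrix, sketch size m << n

A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)

# Sparse embedding S (m x n): each input row is hashed to one output row
# with a random sign, so computing S @ A touches each nonzero of A once.
rows = rng.integers(0, m, size=n)
signs = rng.choice([-1.0, 1.0], size=n)
SA = np.zeros((m, d))
Sb = np.zeros(m)
np.add.at(SA, rows, signs[:, None] * A)
np.add.at(Sb, rows, signs * b)

# Solve the small sketched problem instead of the large original one.
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
x_sketch = np.linalg.lstsq(SA, Sb, rcond=None)[0]
ratio = np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b)
print(f"residual ratio (sketched vs exact): {ratio:.3f}")  # close to 1
```

The sketched solve works on an m x d system rather than n x d, while the residual of the sketched solution stays within a (1 + epsilon) factor of optimal with high probability.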
520    $a Lower bounds for dimension reduction. We show the embeddings above are in fact nearly optimal in many parameter regimes. Additionally, the same techniques also yield nearly tight bounds for sparse Johnson-Lindenstrauss embeddings and RIP matrices. We also give new lower bounds for dimension reduction in l_1, showing that for distortion 1+epsilon, there are point sets of size n in l_1 requiring n^(1-1/log(1/epsilon)) dimensions. This bound is nearly optimal, as one can reduce the dimension to linear in n while approximately preserving all distances.
520    $a Better approximate nearest neighbor search. For approximation factor c, we give a new algorithm for approximate nearest neighbor search in Euclidean space with query time n^rho and space n^(1+rho), where rho = 7/(8c^2) + O(1/c^3). This algorithm bypasses the limit of locality sensitive hashing, the main previous technique for approximate nearest neighbor search in high dimensions. Furthermore, the algorithm achieves faster running time and smaller space via data-dependent hashing, a method widely used in practice that had eluded satisfactory mathematical explanation.
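The locality sensitive hashing baseline that this contribution improves upon can be illustrated with the standard p-stable hash family for Euclidean distance (a textbook construction, not the thesis's data-dependent scheme; all table counts and widths below are illustrative). Nearby points land in the same bucket of some table far more often than distant ones, so a query only inspects a short candidate list.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)
d, w = 100, 4.0                      # dimension, bucket width
L, k = 10, 3                         # hash tables, hashes concatenated per table

A = rng.standard_normal((L, k, d))   # random Gaussian directions
B = rng.uniform(0, w, size=(L, k))   # random offsets

def table_keys(x):
    # p-stable LSH: h(x) = floor((a . x + b) / w); one k-tuple key per table.
    h = np.floor((A @ x + B) / w).astype(int)
    return [tuple(row) for row in h]

points = rng.standard_normal((200, d))
query = points[0] + 0.05 * rng.standard_normal(d)  # planted near neighbor of point 0

tables = [defaultdict(list) for _ in range(L)]
for i, p in enumerate(points):
    for tbl, key in zip(tables, table_keys(p)):
        tbl[key].append(i)

# Candidate set: points colliding with the query in at least one table.
candidates = set()
for tbl, key in zip(tables, table_keys(query)):
    candidates.update(tbl[key])
print(len(candidates), 0 in candidates)
```

Concatenating k hashes per table suppresses collisions with far points, while using L independent tables keeps the probability of missing the true near neighbor small; the query time/space trade-off is governed by the exponent rho discussed above.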
590    $a School code: 0181.
650  4  $a Computer science. $3 199325
690    $a 0984
710  2  $a Princeton University. $b Computer Science. $3 708784
773  0  $t Dissertation Abstracts International $g 76-03B(E).
790    $a 0181
791    $a Ph.D.
792    $a 2014
793    $a English
856  4 0  $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3642130
Holdings (1 item):
Barcode: 000000108666
Location: Electronic collection
Circulation category: Book
Material type: Thesis
Call number: TH 2014
Use type: Normal
Loan status: On shelf
Holds: 0
Multimedia file: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3642130