Optimal Integration of Machine Learning Models: A Large-Scale Distributed Learning Framework with Application to Systematic Prediction of Adverse Drug Reactions.
Record type:
Bibliographic - electronic resource : Monograph/item
Title/Author:
Optimal Integration of Machine Learning Models: A Large-Scale Distributed Learning Framework with Application to Systematic Prediction of Adverse Drug Reactions.
Author:
Ngufor, Che G.
Pagination:
208 p.
Note:
Source: Dissertation Abstracts International, Volume: 76-03(E), Section: B.
Note:
Advisers: Janusz Wojtusiak; James Gentle.
Contained By:
Dissertation Abstracts International, 76-03B(E).
Subject:
Statistics.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3644793
ISBN:
9781321331738
LDR
:05380nmm a2200337 4500
001
457734
005
20150805065228.5
008
150916s2014 ||||||||||||||||| ||eng d
020
$a
9781321331738
035
$a
(MiAaPQ)AAI3644793
035
$a
AAI3644793
040
$a
MiAaPQ
$c
MiAaPQ
100
1
$a
Ngufor, Che G.
$3
708796
245
1 0
$a
Optimal Integration of Machine Learning Models: A Large-Scale Distributed Learning Framework with Application to Systematic Prediction of Adverse Drug Reactions.
300
$a
208 p.
500
$a
Source: Dissertation Abstracts International, Volume: 76-03(E), Section: B.
500
$a
Advisers: Janusz Wojtusiak; James Gentle.
502
$a
Thesis (Ph.D.)--George Mason University, 2014.
506
$a
This item must not be sold to any third party vendors.
520
$a
Too often in the real world, information from multiple sources such as humans, experts, agents, or classifiers needs to be integrated to support a decision-making system. One popular approach in machine learning is to combine these sources through an ensemble learning method. Ensemble learning has been proven to provide appealing solutions to many complex and challenging problems in machine learning. These include, for example, learning under non-standard conditions such as learning from large volumes of data, learning in the presence of uncertainties, learning with data streams, or learning when the concept to be learned drifts over time. Although a considerable amount of research has been done on ensemble learning in recent years, many open issues and challenges remain. This thesis explores three major challenges in this research area: first, the development of techniques that scale up to large and possibly physically distributed databases; second, the construction of exact or approximately exact global models from distributed heterogeneous datasets with minimal data communication while preserving the privacy of the data; and third, how to learn efficiently from modern large-scale datasets, which are often characterized by noisy data points, unlabeled or poorly labeled examples, sample bias, and missing values.
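The ensemble idea described above can be made concrete with a minimal sketch: several base classifiers vote, and each vote is weighted by an estimate of that learner's reliability. Everything below (function names, weights, data) is illustrative and not taken from the thesis.

```python
# Minimal weighted-majority-vote ensemble: combine hard labels
# from K base classifiers over N samples.
import numpy as np

def weighted_majority_vote(predictions, weights, n_classes):
    """predictions: (K, N) integer class labels from each base learner.
    weights:     (K,) non-negative reliability weights, one per learner.
    Returns an (N,) array of fused labels."""
    K, N = predictions.shape
    votes = np.zeros((N, n_classes))
    for k in range(K):
        # Each classifier adds its weight to the class it predicts.
        votes[np.arange(N), predictions[k]] += weights[k]
    return votes.argmax(axis=1)

# Three toy "classifiers" disagreeing on five samples.
preds = np.array([[0, 1, 1, 0, 2],
                  [0, 1, 0, 0, 2],
                  [1, 1, 1, 0, 1]])
weights = np.array([0.5, 0.3, 0.2])   # e.g., validation accuracies
print(weighted_majority_vote(preds, weights, n_classes=3))  # [0 1 1 0 2]
```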
520
$a
These challenges are addressed in this thesis by the introduction of a large-scale, parallel, efficient, optimal Bayesian integration framework. The framework is divided into three main parts. In the first part, two simple, fast, and scalable active learning techniques are used to provide the base learners with "desirable" training sets. Then, application of a Bayesian inference and decision-making technique allows the computation of several performance measures for ranking and selecting the base learners. The second part presents computationally efficient Bayesian inference and generative models to optimally integrate the outputs of a collection of classifiers, optionally selected in the first part. The models improve overall performance through their ability to incorporate highly informative features not available to the classifiers at training time. In addition, the influence of weak classifiers on the final decision can be mitigated while the performance of reliable ones is complemented. The last part presents a collective machine learning system for learning from large-scale homogeneous and heterogeneous distributed databases through the integration of parts one and two. Each distributed data site, modeled as an agent, is tightly integrated with a collection of classifiers, a data and algorithm selection model, a classification evidence model, and an ensemble model.
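The abstract does not spell out the generative combination models, but one standard Bayesian-flavoured fusion rule for classifier posteriors is a weighted logarithmic opinion pool: the fused posterior is proportional to the product of each classifier's posterior raised to a reliability weight. The sketch below shows that rule only; all names and numbers are illustrative assumptions, not the thesis's actual models.

```python
# Weighted logarithmic opinion pool over per-classifier posteriors.
import numpy as np

def log_opinion_pool(posteriors, weights):
    """posteriors: (K, N, C) class probabilities from K classifiers,
    N samples, C classes.
    weights:    (K,) non-negative reliability weights.
    Returns an (N, C) fused, normalized posterior."""
    # Work in log space for numerical stability.
    log_p = np.log(np.clip(posteriors, 1e-12, 1.0))
    fused = np.einsum('k,knc->nc', weights, log_p)
    fused -= fused.max(axis=1, keepdims=True)   # guard against overflow
    p = np.exp(fused)
    return p / p.sum(axis=1, keepdims=True)

# Two classifiers, one sample, three classes: a confident and a weak vote.
post = np.array([[[0.7, 0.2, 0.1]],
                 [[0.4, 0.5, 0.1]]])
print(log_opinion_pool(post, weights=np.array([1.0, 0.5])))
```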
520
$a
Two parallel programming models are explored for collective learning: an efficient single-pass MapReduce programming model is proposed for homogeneous agents, while the MPI programming model with minimal communication is proposed for heterogeneous agents. By sharing a small set of highly informative, non-sensitive feature vectors hidden from the agents at training time, the system is able to improve classification accuracy compared to traditional methods. A salient feature of the system is that the global models it generates are approximately exact. Further, since the information shared during learning is non-restrictive, it in some cases negates the need for difficult and computationally intensive privacy-preserving machine learning algorithms.
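One way a single pass can yield an exact global model is when the model depends on the data only through additive sufficient statistics: each homogeneous site "maps" its local statistics and a single "reduce" sums them, so communication is independent of the number of rows. The sketch below illustrates this with linear least squares via the normal equations; it is an assumed example, not the thesis's actual base learners.

```python
# Single-pass map/reduce for an exact global least-squares model:
# each partition sends only its d x d and d x 1 statistics.
import numpy as np

def map_stats(X_part, y_part):
    # Local sufficient statistics: X'X and X'y for this partition.
    return X_part.T @ X_part, X_part.T @ y_part

def reduce_stats(stats):
    # Sum the per-partition statistics, then solve the normal equations.
    XtX = sum(s[0] for s in stats)
    Xty = sum(s[1] for s in stats)
    return np.linalg.solve(XtX, Xty)

rng = np.random.default_rng(0)
X, w_true = rng.normal(size=(300, 4)), np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true + 0.01 * rng.normal(size=300)

# Three "sites" holding horizontal partitions of the same schema.
parts = [(X[i::3], y[i::3]) for i in range(3)]
w_hat = reduce_stats([map_stats(Xp, yp) for Xp, yp in parts])
print(np.round(w_hat, 2))   # approximately [ 1. -2.  0.5  3.]
```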
520
$a
Three real-world applications demonstrate the accuracy and scalability of the proposed integration framework. First, experiments on a series of benchmark datasets demonstrate the superiority of the framework in learning from distributed heterogeneous data sites. Second, the framework is applied to predict flight delays with high accuracy from large-scale distributed flight arrival, departure, and aircraft information modeled as homogeneous agents on a small Hadoop cluster. The last experiment is a case study of the usability of the collective machine learning system. Implemented on low-cost cloud computing infrastructure such as Google Compute Engine, the system is used in a novel, systematic, and structured approach to detect and predict large-scale drug side-effect interactions with very high accuracy.
590
$a
School code: 0883.
650
4
$a
Statistics.
$3
182057
650
4
$a
Mathematics.
$3
184409
650
4
$a
Computer science.
$3
199325
690
$a
0463
690
$a
0405
690
$a
0984
710
2
$a
George Mason University.
$b
Computational Sciences and Informatics.
$3
708797
773
0
$t
Dissertation Abstracts International
$g
76-03B(E).
790
$a
0883
791
$a
Ph.D.
792
$a
2014
793
$a
English
856
4 0
$u
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3644793
Holdings
Barcode: 000000108673
Location: Electronic collection
Circulation category: Book
Material type: Thesis/Dissertation
Call number: TH 2014
Usage type: Normal
Loan status: On shelf
Holds: 0
Multimedia file:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3644793