Topics in Machine Learning Optimization.
Record type: Bibliographic - Electronic resource : Monograph/item
Title/Author: Topics in Machine Learning Optimization.
Author: Fang, Biyi.
Publisher: Ann Arbor : ProQuest Dissertations & Theses, 2021
Physical description: 247 p.
Note: Source: Dissertations Abstracts International, Volume: 83-07, Section: B.
Note: Advisor: Klabjan, Diego.
Contained By: Dissertations Abstracts International 83-07B.
Subject: Applied mathematics.
Electronic resource: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28771127
ISBN: 9798759972181
Topics in Machine Learning Optimization.
Fang, Biyi.
Topics in Machine Learning Optimization. - Ann Arbor : ProQuest Dissertations & Theses, 2021. - 247 p.
Source: Dissertations Abstracts International, Volume: 83-07, Section: B.
Thesis (Ph.D.)--Northwestern University, 2021.
This item must not be sold to any third party vendors.
Recently, machine learning and deep learning, which have made many theoretical and empirical breakthroughs and are widely applied in various fields, have attracted a great number of researchers and practitioners. They have become one of the most popular research directions and play a significant role in many fields, such as machine translation, speech recognition, image recognition, and recommender systems. Optimization, as one of the core components, attracts much attention from researchers. The essence of most machine learning and deep learning algorithms is to build an optimization model and learn the parameters of the objective function from the given data. With the exponential growth of data volume and the increase in model complexity, optimization methods in machine learning face more and more challenges. In the era of immense data, the effectiveness and efficiency of numerical optimization algorithms dramatically influence the adoption and application of machine learning and deep learning models. In this study, we propose several effective optimization algorithms for different optimization problems, which improve the performance and efficiency of machine learning and deep learning methods. This dissertation consists of four chapters: 1) Stochastic Large-scale Machine Learning Algorithms with Distributed Features and Observations, 2) Convergence Analyses of Online ADAM, 3) Topic Analysis for Text with Side Data, and 4) Tricks and Plugins to GBM on Images and Sequences.

In the first chapter, we propose a general stochastic offline algorithm where observations, features, and gradient components can be sampled in a doubly distributed setting, i.e., with both features and observations distributed. Moreover, technical analyses establish convergence properties of the algorithm under different conditions on the learning rate (diminishing to zero or constant). Furthermore, computational experiments in Spark demonstrate superior performance of our algorithm versus a benchmark in early iterations, which is due to its stochastic components.

In the second chapter, we explore how to apply optimization algorithms with a fixed learning rate in online learning. Online learning is an appealing learning paradigm that is of great interest in practice due to the recent emergence of large-scale applications. Standard online learning assumes a finite number of samples, while in practice data is streamed indefinitely. In such a setting, gradient descent with a diminishing learning rate does not work. In this chapter, we first introduce regret with rolling window, a performance metric that measures the performance of an algorithm on every fixed number of contiguous samples. We then propose a family of algorithms with a constant or adaptive learning rate and provide technical analyses establishing regret-bound properties. We cover the convex setting, showing regret of the order of the square root of the window size in both the constant and dynamic learning-rate scenarios. Our proof also applies to the standard online setting, where we provide analyses of the same regret order (the previous proofs have flaws). We also study a two-layer neural network setting with ReLU activation; in this case we establish that if the initial weights are close to a stationary point, the same regret bound is attainable. We conduct computational experiments demonstrating superior performance of the proposed algorithms.
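The chapter-2 summary above describes constant-learning-rate online algorithms evaluated by regret over a rolling window of contiguous samples. The following is a minimal illustrative sketch of that evaluation idea, assuming squared loss, a plain online-gradient-descent learner, and per-window comparison against the best fixed linear predictor in hindsight; the dissertation's formal definitions and algorithms (e.g., online ADAM) may differ.

```python
# Illustrative sketch only: a toy online learner with a CONSTANT learning rate,
# evaluated by a simple rolling-window regret. The loss, the comparator, and the
# window handling are assumptions made for illustration.
import numpy as np

def rolling_window_regret(X, y, lr=0.1, window=50):
    """Run online gradient descent with a fixed step size and, for each window of
    `window` contiguous samples, report the learner's cumulative loss minus the
    loss of the best fixed linear predictor on that same window."""
    n, d = X.shape
    w = np.zeros(d)
    losses = np.zeros(n)
    for t in range(n):
        pred = X[t] @ w
        losses[t] = 0.5 * (pred - y[t]) ** 2
        w -= lr * (pred - y[t]) * X[t]          # constant learning-rate update
    regrets = []
    for start in range(0, n - window + 1, window):
        Xw, yw = X[start:start + window], y[start:start + window]
        w_star, *_ = np.linalg.lstsq(Xw, yw, rcond=None)   # best fixed predictor in hindsight
        best = 0.5 * np.sum((Xw @ w_star - yw) ** 2)
        regrets.append(losses[start:start + window].sum() - best)
    return regrets

# Example: an indefinitely streamed-looking dataset, where a diminishing step size would stall.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=1000)
print(rolling_window_regret(X, y)[:5])
```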
In the third chapter, we employ text with side data to tackle limitations such as cold start and non-transparency in latent factor models (e.g., matrix factorization). We introduce a hybrid generative probabilistic model that combines a neural network with a latent topic model; it is a four-level hierarchical Bayesian model. In the model, each document is modeled as a finite mixture over an underlying set of topics, and each topic is modeled as an infinite mixture over an underlying set of topic probabilities. Furthermore, each topic probability is modeled as a finite mixture over the side data. In the context of text, the neural network provides an overview distribution of the side data for the corresponding text, which serves as the prior distribution in LDA to help perform topic grouping. The approach is evaluated on several different datasets, where the model is shown to outperform standard LDA and Dirichlet-multinomial regression (DMR) in terms of topic grouping, model perplexity, classification, and comment generation.

In the fourth chapter, we propose a new algorithm for boosting deep convolutional neural networks (BoostCNN) that combines the merits of dynamic feature selection and BoostCNN, and another new family of algorithms combining boosting and transformers. To learn these new models, we introduce subgrid selection and importance sampling strategies and propose a set of algorithms to incorporate boosting weights into a deep learning architecture based on a least-squares objective function. These algorithms not only reduce the manual effort required to find an appropriate network architecture but also result in superior performance and lower running time. Experiments show that the proposed methods outperform benchmarks on several fine-grained classification tasks. A systematic retrospective and summary of optimization methods from the perspective of machine learning is of great significance and can offer guidance for the development of both optimization and machine learning research.
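As an illustration of the least-squares boosting mechanism described for the fourth chapter, here is a hedged sketch in which each boosting round fits a weak learner to the current residuals via a least-squares objective. A plain linear model stands in for the CNN/transformer weak learners, and the function name, shrinkage value, and number of rounds are assumptions, not the dissertation's actual method.

```python
# Illustrative sketch only: least-squares boosting, where each round fits a new
# weak learner to the current residuals and adds it to the ensemble. A linear
# least-squares fit stands in for the deep weak learners used in the dissertation.
import numpy as np

def ls_boost(X, y, rounds=10, shrinkage=0.5):
    """Additive model F(x) = sum_k shrinkage * f_k(x); each f_k is fit by least
    squares to the residual y - F(x) left by the previous rounds."""
    n, d = X.shape
    F = np.zeros(n)                      # current ensemble prediction
    learners = []
    for _ in range(rounds):
        residual = y - F                 # targets for this round's weak learner
        coef, *_ = np.linalg.lstsq(X, residual, rcond=None)
        F += shrinkage * (X @ coef)      # add the new weak learner's contribution
        learners.append(coef)
    return learners, F

# Toy usage: the ensemble's training error shrinks as rounds are added.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
learners, fitted = ls_boost(X, y)
print("training MSE:", np.mean((fitted - y) ** 2))
```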
ISBN: 9798759972181
Subjects--Topical Terms: Applied mathematics.
Subjects--Index Terms: Algorithm design
LDR    06599nmm a2200349 4500
001    616484
005    20220513114349.5
008    220920s2021 ||||||||||||||||| ||eng d
020    $a 9798759972181
035    $a (MiAaPQ)AAI28771127
035    $a AAI28771127
040    $a MiAaPQ $c MiAaPQ
100 1  $a Fang, Biyi. $3 915842
245 10 $a Topics in Machine Learning Optimization.
260 1  $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2021
300    $a 247 p.
500    $a Source: Dissertations Abstracts International, Volume: 83-07, Section: B.
500    $a Advisor: Klabjan, Diego.
502    $a Thesis (Ph.D.)--Northwestern University, 2021.
506    $a This item must not be sold to any third party vendors.
590    $a School code: 0163.
650  4 $a Applied mathematics. $3 377601
650  4 $a Operations research. $3 182516
653    $a Algorithm design
653    $a Deep learning
653    $a Machine learning
653    $a Optimization
690    $a 0364
690    $a 0796
710 2  $a Northwestern University. $b Engineering Sciences and Applied Mathematics. $3 886704
773 0  $t Dissertations Abstracts International $g 83-07B.
790    $a 0163
791    $a Ph.D.
792    $a 2021
793    $a English
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28771127
Holdings (1 record • page 1):
Barcode: 000000208577
Location: Electronic collection (電子館藏)
Circulation category: Book (1圖書)
Material type: E-book (電子書)
Call number: EB 2021
Use type: Normal (一般使用)
Loan status: On shelf (在架)
Hold status: 0
Multimedia
Multimedia file: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28771127