Co-designing Model Compression Algorithms and Hardware Accelerators for Efficient Deep Learning.
Record type:
Bibliographic, electronic resource : Monograph/item
Title/Author:
Co-designing Model Compression Algorithms and Hardware Accelerators for Efficient Deep Learning.
Author:
Zhao, Ritchie.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2020
Pagination:
131 p.
Notes:
Source: Dissertations Abstracts International, Volume: 81-12, Section: B.
Notes:
Advisor: Zhang, Zhiru.
Contained By:
Dissertations Abstracts International, 81-12B.
Subject:
Electrical engineering.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=27993782
ISBN:
9798617029545
Zhao, Ritchie.
Co-designing Model Compression Algorithms and Hardware Accelerators for Efficient Deep Learning. - Ann Arbor : ProQuest Dissertations & Theses, 2020 - 131 p.
Source: Dissertations Abstracts International, Volume: 81-12, Section: B.
Thesis (Ph.D.)--Cornell University, 2020.
This item must not be sold to any third party vendors.
Over the past decade, machine learning (ML) with deep neural networks (DNNs) has become extremely successful in a variety of application domains including computer vision, natural language processing, and game AI. DNNs are now a primary topic of academic research among computer scientists, and a key component of commercial technologies such as web search, recommendation systems, and self-driving vehicles. However, factors such as the growing complexity of DNN models, the diminished benefits of technology scaling, and the proliferation of resource-constrained edge devices are driving a demand for higher DNN performance and energy efficiency. Consequently, neural network training and inference have begun to shift from commodity general-purpose processors (e.g., CPUs and GPUs) to custom-built hardware accelerators (e.g., FPGAs and ASICs). In line with this trend, there has been extensive research on specialized algorithms and architectures for dedicated DNN processors. Furthermore, the rapid pace of innovation in the DNN algorithm space is mismatched with the time-consuming process of hardware implementation. This has generated increased interest in novel design methodologies and tools which can reduce the human effort and turnaround time of hardware design. This thesis studies how low-precision quantization and structured matrices can improve the performance and energy efficiency of DNNs running on specialized accelerators. We co-design both the DNN compression algorithms and the accelerator architectures, enabling us to evaluate the impact of our ideas on real hardware. In the process, we examine the use of high-level synthesis tools in reducing the hardware design effort. This thesis represents a cross-domain research effort toward efficient deep learning. First, we propose specialized architectures for accelerating binarized neural networks on FPGAs. Second, we study novel high-level synthesis techniques to reduce the manual effort in FPGA accelerator design. Third, we show a fundamental link between group convolutions and circulant matrices, two previously disparate lines of research in DNN compression. Using this insight, we propose HadaNet, an alternative to circulant compression which achieves identical accuracy with asymptotically fewer multiplications. Fourth, we present outlier channel splitting, a technique that improves DNN weight quantization by removing outliers from the weight distribution without arduous retraining. Finally, we show preliminary results on overwrite quantization, a technique that addresses outliers in DNN activation quantization using extremely lightweight architectural extensions to a spatial accelerator template.
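The outlier channel splitting idea in the abstract can be made concrete with a short sketch. The snippet below is a minimal NumPy illustration for a single linear layer, not code from the thesis; the function name split_outlier_channels and all variable names are illustrative assumptions. The channel holding the largest-magnitude weight is halved and duplicated, which leaves the layer output unchanged while shrinking the weight range a uniform quantizer must cover.

# A minimal sketch of outlier channel splitting for one linear layer,
# assuming a NumPy weight matrix W of shape (out_features, in_features).
# Names are illustrative, not taken from the thesis code.
import numpy as np

def split_outlier_channels(W, x, num_splits=1):
    """Duplicate-and-halve the input channels holding the largest weights.

    The output W @ x is preserved exactly, because each duplicated input
    channel contributes w/2 + w/2 = w, while the maximum weight magnitude
    (and hence the quantization range) shrinks.
    """
    W = W.copy()
    x = x.copy()
    for _ in range(num_splits):
        # Find the input channel containing the largest-magnitude weight.
        ch = np.unravel_index(np.argmax(np.abs(W)), W.shape)[1]
        # Halve that column and append the halved copy as a new channel.
        W[:, ch] /= 2.0
        W = np.concatenate([W, W[:, ch:ch + 1]], axis=1)
        # Duplicate the matching activation so W_new @ x_new == W @ x.
        x = np.concatenate([x, x[ch:ch + 1]], axis=0)
    return W, x

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
x = rng.normal(size=(8,))
W2, x2 = split_outlier_channels(W, x, num_splits=2)
assert np.allclose(W @ x, W2 @ x2)          # output unchanged
assert np.abs(W2).max() <= np.abs(W).max()  # quantization range reduced

Each split is exact in floating point (halving and re-summing equal halves loses no precision), which is why no retraining is needed; the cost is one extra input channel per split, so the overhead grows linearly in the number of splits.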
ISBN: 9798617029545
Subjects--Topical Terms: Electrical engineering.
Subjects--Index Terms: Computer architecture
LDR :03840nmm a2200373 4500
001 616369
005 20220513114319.5
008 220920s2020 ||||||||||||||||| ||eng d
020 $a 9798617029545
035 $a (MiAaPQ)AAI27993782
035 $a AAI27993782
040 $a MiAaPQ $c MiAaPQ
100 1 $a Zhao, Ritchie. $0 (orcid)0000-0003-1656-9165 $3 915522
245 1 0 $a Co-designing Model Compression Algorithms and Hardware Accelerators for Efficient Deep Learning.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2020
300 $a 131 p.
500 $a Source: Dissertations Abstracts International, Volume: 81-12, Section: B.
500 $a Advisor: Zhang, Zhiru.
502 $a Thesis (Ph.D.)--Cornell University, 2020.
506 $a This item must not be sold to any third party vendors.
520 $a Over the past decade, machine learning (ML) with deep neural networks (DNNs) has become extremely successful in a variety of application domains including computer vision, natural language processing, and game AI. DNNs are now a primary topic of academic research among computer scientists, and a key component of commercial technologies such as web search, recommendation systems, and self-driving vehicles. However, factors such as the growing complexity of DNN models, the diminished benefits of technology scaling, and the proliferation of resource-constrained edge devices are driving a demand for higher DNN performance and energy efficiency. Consequently, neural network training and inference have begun to shift from commodity general-purpose processors (e.g., CPUs and GPUs) to custom-built hardware accelerators (e.g., FPGAs and ASICs). In line with this trend, there has been extensive research on specialized algorithms and architectures for dedicated DNN processors. Furthermore, the rapid pace of innovation in the DNN algorithm space is mismatched with the time-consuming process of hardware implementation. This has generated increased interest in novel design methodologies and tools which can reduce the human effort and turnaround time of hardware design. This thesis studies how low-precision quantization and structured matrices can improve the performance and energy efficiency of DNNs running on specialized accelerators. We co-design both the DNN compression algorithms and the accelerator architectures, enabling us to evaluate the impact of our ideas on real hardware. In the process, we examine the use of high-level synthesis tools in reducing the hardware design effort. This thesis represents a cross-domain research effort toward efficient deep learning. First, we propose specialized architectures for accelerating binarized neural networks on FPGAs. Second, we study novel high-level synthesis techniques to reduce the manual effort in FPGA accelerator design. Third, we show a fundamental link between group convolutions and circulant matrices, two previously disparate lines of research in DNN compression. Using this insight, we propose HadaNet, an alternative to circulant compression which achieves identical accuracy with asymptotically fewer multiplications. Fourth, we present outlier channel splitting, a technique that improves DNN weight quantization by removing outliers from the weight distribution without arduous retraining. Finally, we show preliminary results on overwrite quantization, a technique that addresses outliers in DNN activation quantization using extremely lightweight architectural extensions to a spatial accelerator template.
590 $a School code: 0058.
650 4 $a Electrical engineering. $3 454503
650 4 $a Computer engineering. $3 212944
650 4 $a Artificial intelligence. $3 194058
650 4 $a Computer science. $3 199325
653 $a Computer architecture
653 $a FPGA
653 $a Hardware accelerators
653 $a Machine learning
690 $a 0544
690 $a 0984
690 $a 0464
690 $a 0800
710 2 $a Cornell University. $b Electrical and Computer Engineering. $3 915523
773 0 $t Dissertations Abstracts International $g 81-12B.
790 $a 0058
791 $a Ph.D.
792 $a 2020
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=27993782
Holdings:
Barcode: 000000208459
Location: Electronic collection
Circulation category: Book
Material type: E-book
Call number: EB 2020
Use type: Normal
Loan status: On shelf
Holds: 0
Multimedia file:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=27993782