Algorithm and Hardware Design for Efficient Deep Learning Inference.
Record Type:
Bibliographic - Electronic resource : Monograph/item
Title/Author:
Algorithm and Hardware Design for Efficient Deep Learning Inference.
Author:
Mohanty, Abinash.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2018
Pagination:
151 p.
Note:
Source: Dissertation Abstracts International, Volume: 80-04(E), Section: B.
Note:
Adviser: Yu Cao.
Contained By:
Dissertation Abstracts International, 80-04B(E).
Subject:
Electrical engineering.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10981580
ISBN:
9780438727489
Mohanty, Abinash.
Algorithm and Hardware Design for Efficient Deep Learning Inference. - Ann Arbor : ProQuest Dissertations & Theses, 2018 - 151 p.
Source: Dissertation Abstracts International, Volume: 80-04(E), Section: B.
Thesis (Ph.D.)--Arizona State University, 2018.
Deep learning (DL) has proved to be one of the most important developments to date, with far-reaching impacts in numerous fields such as robotics, computer vision, surveillance, speech processing, machine translation, and finance. Deep networks are now widely used for countless applications because of their ability to generalize to real-world data, their robustness to noise in previously unseen data, and their high inference accuracy. With the ability to learn useful features from raw sensor data, deep learning algorithms have outperformed traditional AI algorithms and pushed the boundaries of what can be achieved with AI. In this work, we demonstrate the power of deep learning by developing a neural network to automatically detect cough instances from audio recorded in unconstrained environments. For this, 24-hour-long recordings from 9 different patients were collected and carefully labeled by medical personnel. A pre-processing algorithm is proposed to convert the event-based cough dataset into a more informative dataset with the start and end of each cough; data augmentation is also introduced to regularize the training procedure. The proposed neural network achieves 92.3% leave-one-out accuracy on data captured in the real world.
ISBN: 9780438727489
Subjects--Topical Terms: Electrical engineering.
LDR    04900nmm a2200349 4500
001    547660
005    20190513114600.5
008    190715s2018 ||||||||||||||||| ||eng d
020    $a 9780438727489
035    $a (MiAaPQ)AAI10981580
035    $a (MiAaPQ)asu:18509
035    $a AAI10981580
040    $a MiAaPQ $c MiAaPQ
100 1  $a Mohanty, Abinash. $3 827040
245 10 $a Algorithm and Hardware Design for Efficient Deep Learning Inference.
260 1  $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2018
300    $a 151 p.
500    $a Source: Dissertation Abstracts International, Volume: 80-04(E), Section: B.
500    $a Adviser: Yu Cao.
502    $a Thesis (Ph.D.)--Arizona State University, 2018.
520    $a Deep learning (DL) has proved to be one of the most important developments to date, with far-reaching impacts in numerous fields such as robotics, computer vision, surveillance, speech processing, machine translation, and finance. Deep networks are now widely used for countless applications because of their ability to generalize to real-world data, their robustness to noise in previously unseen data, and their high inference accuracy. With the ability to learn useful features from raw sensor data, deep learning algorithms have outperformed traditional AI algorithms and pushed the boundaries of what can be achieved with AI. In this work, we demonstrate the power of deep learning by developing a neural network to automatically detect cough instances from audio recorded in unconstrained environments. For this, 24-hour-long recordings from 9 different patients were collected and carefully labeled by medical personnel. A pre-processing algorithm is proposed to convert the event-based cough dataset into a more informative dataset with the start and end of each cough; data augmentation is also introduced to regularize the training procedure. The proposed neural network achieves 92.3% leave-one-out accuracy on data captured in the real world.
520    $a Deep neural networks are composed of multiple layers that are compute- and memory-intensive. This makes it difficult to execute these algorithms in real time with low power consumption on existing general-purpose computers. In this work, we propose hardware accelerators for a traditional AI algorithm based on random forests and for two representative deep convolutional neural networks (AlexNet and VGG). With the proposed acceleration techniques, ~30x performance improvement over a CPU was achieved for random forests. For deep CNNs, we demonstrate that much higher performance can be achieved through architecture space exploration, using an optimization algorithm that takes system-level performance and area models of hardware primitives as inputs and minimizes latency under given resource constraints. With this method, ~30 GOPS performance was achieved on Stratix V FPGA boards.
520    $a Hardware acceleration of DL algorithms alone is not always the most efficient way, nor sufficient, to achieve the desired performance. There is huge headroom available for performance improvement provided the algorithms are designed with hardware limitations and bottlenecks in mind. This work achieves hardware-software co-optimization for the Non-Maximal Suppression (NMS) algorithm through the proposed algorithmic changes and hardware architecture.
520    $a With CMOS scaling coming to an end and memory bandwidth bottlenecks increasing, CMOS-based systems might not scale enough to accommodate the requirements of more complicated and deeper neural networks in the future. In this work, we explore RRAM crossbars and arrays as a compact, high-performing, and energy-efficient alternative to CMOS accelerators for deep learning training and inference. We propose and implement RRAM periphery read and write circuits and achieve ~3000x performance improvement in online dictionary learning compared to CPU.
520    $a This work also examines realistic RRAM devices and their non-idealities. We do an in-depth study of the effects of RRAM non-idealities on inference accuracy when a pretrained model is mapped to RRAM-based accelerators. To mitigate this issue, we propose Random Sparse Adaptation (RSA), a novel scheme aimed at tuning the model to accommodate the faults of the RRAM array to which it is mapped. Our proposed method can achieve inference accuracy much higher than the traditional Read-Verify-Write (R-V-W) method. RSA can also recover lost inference accuracy 100x~1000x faster compared to R-V-W. Using 32-bit high-precision RSA cells, we achieved ~10% higher accuracy with faulty RRAM arrays compared to what can be achieved by mapping a deep network to a 32-level RRAM array with no variations.
590    $a School code: 0010.
650  4 $a Electrical engineering. $3 454503
650  4 $a Artificial intelligence. $3 194058
690    $a 0544
690    $a 0800
710 2  $a Arizona State University. $b Electrical Engineering. $3 708713
773 0  $t Dissertation Abstracts International $g 80-04B(E).
790    $a 0010
791    $a Ph.D.
792    $a 2018
793    $a English
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10981580
Holdings (1 item):
Barcode: 000000163839
Location: Electronic collection
Circulation category: 1 Book
Material type: Thesis/dissertation
Call number: TH 2018
Use type: Normal
Loan status: On shelf
Holds: 0
Multimedia:
Multimedia file: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10981580