Library and Information Center, National University of Kaohsiung

Kunz, Robert C.

Performance bottlenecks on large-scale shared-memory multiprocessors.

Record Type:	Electronic resources : Monograph/item
Title/Author:	Performance bottlenecks on large-scale shared-memory multiprocessors.
Author:	Kunz, Robert C.
Description:	137 p.
Notes:	Adviser: John Hennessy.
Notes:	Source: Dissertation Abstracts International, Volume: 65-11, Section: B, page: 5934.
Contained By:	Dissertation Abstracts International 65-11B.
Subject:	Engineering, Electronics and Electrical.
Online resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3153509
ISBN:	0496138243

Thesis (Ph.D.)--Stanford University, 2005.

Even setting aside contention, the coherence protocol is a smaller bottleneck than other aspects of the system, including the operating system's scheduling policies and the application's effective or ineffective use of the cache-coherent memory system. Programmers still need to tune their programs to a specific architecture; such tuning limits portability. While coherence protocols might be able to reduce remote communication, the mismatch between an application and the architecture is often more significant and prevents major performance improvements.

ISBN: 0496138243
Subjects--Topical Terms:

226981
Engineering, Electronics and Electrical.

LDR:03548nmm _2200289 _450
001 167294
005 20061005085843.5
008 090528s2005 eng d
020 $a 0496138243
035 $a 00197910
040 $a UnM $c UnM
100 0 $a Kunz, Robert C. $3 237440
245 1 0 $a Performance bottlenecks on large-scale shared-memory multiprocessors.
300 $a 137 p.
500 $a Adviser: John Hennessy.
500 $a Source: Dissertation Abstracts International, Volume: 65-11, Section: B, page: 5934.
502 $a Thesis (Ph.D.)--Stanford University, 2005.
520 # $a Even setting aside contention, the coherence protocol is a smaller bottleneck than other aspects of the system, including the operating system's scheduling policies and the application's effective or ineffective use of the cache-coherent memory system. Programmers still need to tune their programs to a specific architecture; such tuning limits portability. While coherence protocols might be able to reduce remote communication, the mismatch between an application and the architecture is often more significant and prevents major performance improvements.
520 # $a Large-scale multiprocessors remain difficult to program because the memory system alone cannot eliminate the need for programmers to remain aware of implicit communication. The software libraries, compiler, and operating system must apply complex machine-specific optimizations to reduce second- and third-order performance bottlenecks. Therefore, the memory system should provide meaningful visibility and feedback to programming monitoring tools and compilers. Without such tools to assist programmers, the programming advantages of a coherent shared-memory multiprocessor over a message-passing multiprocessor are likely to be small for larger processor counts.
520 # $a Researchers working on multiprocessor memory systems have advocated easing the programming burden by adding enhancements to the memory system designed to reduce memory latency and coherence overhead. Analogous to the lessons learned during the RISC movement over 20 years ago, simpler memory system designs are faster than more complicated ones, primarily because the additional contention present in the memory system overwhelms minor reductions in latency that more complicated protocols provide. Thus, architects should focus on minimizing memory controller occupancy on large-scale multiprocessors rather than just latency.
520 # $a While multiprocessors have existed for many years, most parallel architectures are difficult to program efficiently. The key challenge is how to simplify the programming model so that programmers can write portable, highly efficient parallel programs with minimal effort. For example, cache-coherent shared-memory architectures trade the memory system complexity of the coherence protocol for a simpler programming model that does not require communication to be programmed explicitly. Using the FLASH machine, a large-scale cc-NUMA multiprocessor, this dissertation explores the interaction between hardware and software design trade-offs and quantifies the performance gains of memory system enhancements.
590 $a School code: 0212.
650 # 0 $a Engineering, Electronics and Electrical. $3 226981
690 $a 0544
710 0 # $a Stanford University. $3 212607
773 0 # $g 65-11B. $t Dissertation Abstracts International
790 $a 0212
790 1 0 $a Hennessy, John, $e advisor
791 $a Ph.D.
792 $a 2005
856 4 0 $u http://libsw.nuk.edu.tw:81/login?url=http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3153509 $z http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3153509