Fault-tolerance techniques for high-performance computing
Record Type:
Electronic resources : Monograph/item
Title/Author:
Fault-tolerance techniques for high-performance computing / edited by Thomas Herault, Yves Robert.
Other author:
Herault, Thomas.
Published:
Cham : Springer International Publishing, 2015.
Description:
ix, 320 p. : ill., digital ; 24 cm.
Contained By:
Springer eBooks
Subject:
Fault-tolerant computing.
Online resource:
http://dx.doi.org/10.1007/978-3-319-20943-2
ISBN:
9783319209432 (electronic bk.)
Fault-tolerance techniques for high-performance computing [electronic resource] / edited by Thomas Herault, Yves Robert. - Cham : Springer International Publishing, 2015. - ix, 320 p. : ill., digital ; 24 cm. - Computer communications and networks, 1617-7975.
Part I: General Overview -- Fault-Tolerance Techniques for High-Performance Computing -- Part II: Technical Contributions -- Errors and Faults -- Fault-Tolerant MPI -- Using Replication for Resilience on Exascale Systems -- Energy-Aware Checkpointing Strategies.
This timely text/reference presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance. Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models.

Topics and features:
Includes self-contained contributions from an international selection of preeminent experts.
Provides a survey of resilience methods and performance models.
Examines the various sources for errors and faults in large-scale systems, detailing their characteristics, with a focus on modeling, detection and prediction.
Reviews the spectrum of techniques that can be applied to design a fault-tolerant message passing interface.
Investigates different approaches to replication, comparing these to the traditional checkpoint-recovery approach.
Discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems, proposing a methodology to estimate such energy consumption.

This authoritative volume is essential reading for all researchers and graduate students involved in high-performance computing.

Dr. Thomas Herault is a Research Scientist in the Innovative Computing Laboratory (ICL) at the University of Tennessee, Knoxville, TN, USA. Dr. Yves Robert is a Professor in the Laboratory of Parallel Computing at the École Normale Supérieure de Lyon, France, and a Visiting Research Scholar in the ICL.
ISBN: 9783319209432 (electronic bk.)
Standard No.: 10.1007/978-3-319-20943-2 (doi)
Subjects--Topical Terms:
324957
Fault-tolerant computing.
LC Class. No.: QA76.9.F38
Dewey Class. No.: 004.2
Fault-tolerance techniques for high-performance computing
LDR :03089nmm a2200325 a 4500
001 472557
003 DE-He213
005 20160223100031.0
006 m d
007 cr nn 008maaau
008 160316s2015 gw s 0 eng d
020 $a 9783319209432 (electronic bk.)
020 $a 9783319209425 (paper)
024 7 $a 10.1007/978-3-319-20943-2 $2 doi
035 $a 978-3-319-20943-2
040 $a GP $c GP
041 0 $a eng
050 4 $a QA76.9.F38
072 7 $a UYD $2 bicssc
072 7 $a COM074000 $2 bisacsh
082 0 4 $a 004.2 $2 23
090 $a QA76.9.F38 $b F263 2015
245 0 0 $a Fault-tolerance techniques for high-performance computing $h [electronic resource] / $c edited by Thomas Herault, Yves Robert.
260 $a Cham : $b Springer International Publishing : $b Imprint: Springer, $c 2015.
300 $a ix, 320 p. : $b ill., digital ; $c 24 cm.
490 1 $a Computer communications and networks, $x 1617-7975
505 0 $a Part I: General Overview -- Fault-Tolerance Techniques for High-Performance Computing -- Part II: Technical Contributions -- Errors and Faults -- Fault-Tolerant MPI -- Using Replication for Resilience on Exascale Systems -- Energy-Aware Checkpointing Strategies.
520 $a This timely text/reference presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance. Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Topics and features: Includes self-contained contributions from an international selection of preeminent experts; Provides a survey of resilience methods and performance models; Examines the various sources for errors and faults in large-scale systems, detailing their characteristics, with a focus on modeling, detection and prediction; Reviews the spectrum of techniques that can be applied to design a fault-tolerant message passing interface; Investigates different approaches to replication, comparing these to the traditional checkpoint-recovery approach; Discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems, proposing a methodology to estimate such energy consumption. This authoritative volume is essential reading for all researchers and graduate students involved in high-performance computing. Dr. Thomas Herault is a Research Scientist in the Innovative Computing Laboratory (ICL) at the University of Tennessee, Knoxville, TN, USA. Dr. Yves Robert is a Professor in the Laboratory of Parallel Computing at the École Normale Supérieure de Lyon, France, and a Visiting Research Scholar in the ICL.
650 0 $a Fault-tolerant computing. $3 324957
650 0 $a High performance computing. $3 211079
650 1 4 $a Computer Science. $3 212513
650 2 4 $a System Performance and Evaluation. $3 273898
650 2 4 $a Performance and Reliability. $3 277564
650 2 4 $a Numeric Computing. $3 275524
700 1 $a Herault, Thomas. $3 727766
700 1 $a Robert, Yves. $3 727767
710 2 $a SpringerLink (Online service) $3 273601
773 0 $t Springer eBooks
830 0 $a Computer communications and networks. $3 560387
856 4 0 $u http://dx.doi.org/10.1007/978-3-319-20943-2
950 $a Computer Science (Springer-11645)
Items
Inventory Number: 000000118662
Location Name: Electronic Collection
Item Class: 1 (Book)
Material type: E-book
Call number: EB QA76.9.F38 F263 2015
Usage Class: Normal
Loan Status: On shelf
No. of reservations: 0