SC|05: Gateway to Discovery

Schedule: November 12-18, 2005

Transparent Incremental Checkpointing at Kernel Level: A Foundation for Fault Tolerance for Parallel Computers

Session: Cluster Environments

Event Type: Paper

Time: 11:30am - 12:00pm

Session Chair: Dongyan Xu

Speaker(s): Roberto Gioiosa, Jose Carlos Sancho, Song Jiang, Fabrizio Petrini, Kei Davis

Location: 608-609

Abstract:

We describe the software architecture, technical features, and performance of TICK (Transparent Incremental Checkpointer at Kernel level), a system-level checkpointer implemented as a kernel thread and designed specifically to provide fault tolerance in Linux clusters. The implementation, based on the Linux 2.6.11 kernel, provides the essential functionality for transparent, highly responsive, and efficient fault tolerance based on full or incremental checkpointing at system level. TICK is completely user-transparent: it requires no changes to user code or system libraries. It is highly responsive: an interrupt, such as a timer interrupt, can trigger a checkpoint in as little as 2.5 µs. It supports incremental and full checkpoints with minimal overhead: less than 6% with full checkpointing to disk performed as frequently as once per minute.
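
The difference between full and incremental checkpoints mentioned in the abstract can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not TICK's kernel implementation: the class ToyCheckpointer, the function restore, and the 4096-byte page size are hypothetical names and values chosen for illustration. A full checkpoint saves every page, while an incremental checkpoint saves only the pages written since the previous checkpoint.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class ToyCheckpointer:
    """Hypothetical user-space model of page-level checkpointing.

    This is not TICK's kernel code; it only shows the bookkeeping idea.
    """

    def __init__(self, num_pages):
        # Each page is an immutable bytes object, so snapshots can share pages.
        self.memory = [bytes(PAGE_SIZE) for _ in range(num_pages)]
        self.dirty = set()  # indices of pages written since the last checkpoint

    def write(self, page, data):
        # Store the data padded (or truncated) to one page and mark the page dirty.
        self.memory[page] = data.ljust(PAGE_SIZE, b"\0")[:PAGE_SIZE]
        self.dirty.add(page)

    def full_checkpoint(self):
        # Save every page and reset the dirty set.
        self.dirty.clear()
        return dict(enumerate(self.memory))

    def incremental_checkpoint(self):
        # Save only the pages modified since the previous checkpoint.
        snapshot = {p: self.memory[p] for p in sorted(self.dirty)}
        self.dirty.clear()
        return snapshot


def restore(full, increments):
    """Rebuild memory from a full checkpoint plus later increments, in order."""
    pages = dict(full)
    for inc in increments:
        pages.update(inc)
    return [pages[i] for i in range(len(pages))]
```

In this model, recovery replays the most recent full checkpoint and then each later incremental checkpoint in order. The incremental checkpoints stay small when few pages change between checkpoints, which is the usual motivation for the technique.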

This paper can be found in the ACM and IEEE Digital Libraries.



Chair/Speaker Details:

Dongyan Xu (Chair)
Purdue University

Roberto Gioiosa
Los Alamos National Laboratory

Jose Carlos Sancho
Los Alamos National Laboratory

Song Jiang
Los Alamos National Laboratory

Fabrizio Petrini
Pacific Northwest National Laboratory

Kei Davis
Los Alamos National Laboratory