Dynamic and Fault-Tolerant Clustering for Scientific Workflows

Abstract

Task clustering has proven to be an effective method to reduce execution overhead and to improve the computational granularity of scientific workflow tasks executing on distributed resources. However, a job composed of multiple tasks may have a higher risk of suffering from failures than a single-task job.
In this paper, we conduct a theoretical analysis of the impact of transient failures on the runtime performance of scientific workflow executions. We also propose three fault-tolerant clustering strategies to improve the runtime performance of workflow executions in faulty execution environments.

System Configuration

H/W System Configuration

Speed             : 1.1 GHz
RAM               : 256 MB (min)
Hard Disk         : 20 GB
Floppy Drive      : 1.44 MB
Keyboard          : Standard Windows Keyboard
Mouse             : Two or Three Button Mouse
Monitor           : SVGA

S/W System Configuration

Platform          : Cloud computing
Operating system  : Windows XP, 7
Server            : WAMP/Apache
Working on        : Browser such as Firefox, IE

Conclusion

In this work, we modeled transient failures in a distributed environment and evaluated their influence on task clustering. We proposed three dynamic clustering methods to improve the fault tolerance of task clustering and applied them to five widely used scientific workflows. Experimental results showed that the proposed methods significantly improve the makespan of the workflows compared to an existing task clustering method used in workflow management systems. The Dynamic Reclustering method, in particular, performed best among all methods because it adjusts the clustering size based on Maximum Likelihood Estimation of task runtime, system overheads, and the inter-arrival time of failures.
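
The idea of choosing a clustering size from estimated parameters can be illustrated with a minimal sketch. It is not the paper's Dynamic Reclustering algorithm. It assumes, purely for illustration, that failure inter-arrival times are exponentially distributed, that a failed job restarts from scratch, and that every task has the same runtime. The function names (mle_exponential_rate, expected_job_time, best_cluster_size) are hypothetical.

```python
import math
from statistics import mean


def mle_exponential_rate(inter_arrival_times):
    """MLE of an exponential rate: the reciprocal of the sample mean.

    Assumption: failures arrive as a Poisson process, so the observed
    gaps between failures are exponentially distributed.
    """
    return 1.0 / mean(inter_arrival_times)


def expected_job_time(k, task_runtime, overhead, failure_rate):
    """Expected wall time to finish one clustered job of k tasks.

    Assumption: a failure discards all progress and the job restarts
    immediately. For a job of length L and failure rate r, the expected
    completion time is (exp(r * L) - 1) / r.
    """
    length = overhead + k * task_runtime
    if failure_rate == 0:
        return length
    return math.expm1(failure_rate * length) / failure_rate


def best_cluster_size(n_tasks, task_runtime, overhead, failure_rate):
    """Cluster size in 1..n_tasks with the lowest expected time per task."""
    return min(
        range(1, n_tasks + 1),
        key=lambda k: expected_job_time(k, task_runtime, overhead, failure_rate) / k,
    )


if __name__ == "__main__":
    # Hypothetical observations: seconds between consecutive failures.
    rate = mle_exponential_rate([100.0, 300.0, 200.0])
    k = best_cluster_size(n_tasks=50, task_runtime=10.0, overhead=20.0, failure_rate=rate)
    print("estimated failure rate:", rate)
    print("chosen cluster size:", k)
```

Under these assumptions the trade-off described above is visible directly: with no failures, larger clusters always win because the overhead is shared across more tasks, while with frequent failures and little overhead, small clusters win because less work is lost per failure.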