Leveraging Data Deduplication to Improve the Performance of Primary Storage Systems in the Cloud

Abstract

With the explosive growth in data volume, the I/O bottleneck has become an increasingly daunting challenge for big data analytics in the cloud. Recent studies have shown that moderate to high data redundancy clearly exists in the cloud's primary storage systems.

In addition, directly applying data deduplication to primary cloud storage systems is likely to cause space contention in memory and data fragmentation on disk. Based on these observations, we propose POD, a performance-oriented I/O deduplication scheme, as an alternative to capacity-oriented I/O deduplication such as iDedup. POD aims to enhance the I/O performance of primary cloud storage systems without sacrificing the capacity savings that capacity-oriented schemes provide.
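The following Python sketch makes the fragmentation concern concrete by modelling a naive, capacity-oriented block-level deduplicator. Everything in it is an illustrative assumption rather than the design of iDedup or POD: the 4 KiB block size, the SHA-1 fingerprints, the append-only allocator, and the hypothetical class `NaiveBlockDedup`. It shows how remapping every duplicate block scatters a single logical write across non-contiguous physical blocks.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size in bytes


def fingerprint(block: bytes) -> str:
    """Content fingerprint of one block (SHA-1, assumed here)."""
    return hashlib.sha1(block).hexdigest()


class NaiveBlockDedup:
    """Hypothetical capacity-oriented deduplicator (illustrative only, not
    iDedup or POD): every duplicate block is remapped to an existing
    physical block, ignoring request boundaries."""

    def __init__(self) -> None:
        self.index: dict[str, int] = {}    # fingerprint -> physical block number
        self.mapping: dict[int, int] = {}  # logical block -> physical block
        self.next_free = 0                 # append-only physical allocator

    def write(self, lbn: int, blocks: list[bytes]) -> int:
        """Write consecutive blocks starting at logical block `lbn` and
        return how many blocks were physically written."""
        written = 0
        for i, block in enumerate(blocks):
            fp = fingerprint(block)
            if fp in self.index:
                # Duplicate: point the logical block at the stored copy.
                self.mapping[lbn + i] = self.index[fp]
            else:
                pbn = self.next_free
                self.next_free += 1
                self.index[fp] = pbn
                self.mapping[lbn + i] = pbn
                written += 1
        return written

    def extents(self, lbn: int, count: int) -> int:
        """Count the physically contiguous runs needed to read `count`
        logical blocks from `lbn`; each extra run implies an extra seek."""
        runs, prev = 0, None
        for i in range(count):
            pbn = self.mapping[lbn + i]
            if prev is None or pbn != prev + 1:
                runs += 1
            prev = pbn
        return runs


if __name__ == "__main__":
    d = NaiveBlockDedup()
    A, B = b"A" * BLOCK_SIZE, b"B" * BLOCK_SIZE
    C, E = b"C" * BLOCK_SIZE, b"E" * BLOCK_SIZE
    d.write(0, [A, B])                    # stored at physical blocks 0, 1
    print(d.write(100, [C, A, E, B]))     # prints 2 (A and B deduplicated)
    print(d.extents(100, 4))              # prints 4 (physical order 2, 0, 3, 1)
```

In this demo, the second request writes only 2 of its 4 blocks and saves space. However, reading it back now needs 4 separate physical runs instead of 1, which is the kind of read penalty a performance-oriented scheme tries to avoid.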

Advantages

  • POD significantly improves the performance of primary storage systems in the cloud while also saving storage capacity.

Disadvantages

  • From a performance perspective, existing data deduplication schemes fail to account for the I/O workload characteristics of primary storage systems. They therefore miss the opportunity to address one of the most important issues in primary storage: performance.

System Configuration

H/W System Configuration
Processor Speed    : 1.1 GHz
RAM                : 256 MB (minimum)
Hard Disk          : 20 GB
Floppy Drive       : 1.44 MB
Keyboard           : Standard Windows keyboard
Mouse              : Two- or three-button mouse
Monitor            : SVGA
S/W System Configuration

Platform           : Cloud computing
Operating System   : Windows XP, Windows 7
Server             : WAMP/Apache
Browser            : Firefox, Internet Explorer, or similar

Conclusion

In this paper, we propose POD, a performance-oriented deduplication scheme that improves the performance of primary cloud storage systems. POD applies data deduplication on the I/O path to remove redundant write requests while also saving storage space. It uses a request-based selective deduplication approach (Select-Dedupe) to eliminate I/O redundancy on the critical I/O path in a way that minimizes data fragmentation.
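The request-based selective idea can be sketched in Python as follows. The rule used here is a deliberate simplification and an assumption: a write request is removed from the I/O path only when every one of its blocks is already stored; otherwise, the whole request is written contiguously. POD's actual Select-Dedupe policy may use different criteria. The class `SelectiveDedup`, the SHA-1 fingerprints, and the append-only allocator are all hypothetical.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Content fingerprint of one block (SHA-1, assumed here)."""
    return hashlib.sha1(block).hexdigest()


class SelectiveDedup:
    """Hypothetical request-based selective deduplicator.

    Assumed rule (a simplification, not POD's exact Select-Dedupe policy):
    a write request is deduplicated only if *all* of its blocks are already
    stored; otherwise the whole request is written contiguously so that it
    is not fragmented on disk.
    """

    def __init__(self) -> None:
        self.index: dict[str, int] = {}    # fingerprint -> physical block
        self.mapping: dict[int, int] = {}  # logical block -> physical block
        # Append-only allocator: old copies are never overwritten, so
        # index entries always point at valid data.
        self.next_free = 0
        self.removed_writes = 0            # requests eliminated from the I/O path

    def write(self, lbn: int, blocks: list[bytes]) -> int:
        """Handle one write request; return the number of blocks written."""
        fps = [fingerprint(b) for b in blocks]
        if all(fp in self.index for fp in fps):
            # Fully redundant: update the mapping only, no data reaches disk.
            for i, fp in enumerate(fps):
                self.mapping[lbn + i] = self.index[fp]
            self.removed_writes += 1
            return 0
        # Unique or partially redundant: write every block sequentially,
        # trading some capacity for an unfragmented layout.
        for i, fp in enumerate(fps):
            pbn = self.next_free
            self.next_free += 1
            self.index.setdefault(fp, pbn)  # keep the first stored copy
            self.mapping[lbn + i] = pbn
        return len(blocks)


if __name__ == "__main__":
    s = SelectiveDedup()
    A, B, C = b"A" * 4096, b"B" * 4096, b"C" * 4096
    print(s.write(0, [A, B]))     # prints 2: new data
    print(s.write(50, [A, B]))    # prints 0: fully redundant, write removed
    print(s.write(100, [C, A]))   # prints 2: partially redundant, written in full
```

The last call shows the trade-off. Block A is stored a second time, giving up a little capacity so that the request stays sequential on disk. The fully redundant request at logical block 50 never reaches the disk at all.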