EPAS: A Sampling Based Similarity Identification Algorithm for the Cloud

Abstract

The explosive growth of data brings new challenges to data storage and management in cloud environments. These data usually need to be processed in the cloud in a timely fashion, so any increase in latency can cause enterprises a massive loss. Similarity detection plays a very important role in data management.
This paper proposes an Enhanced Position-Aware Sampling algorithm (EPAS) that identifies file similarity in the cloud by applying a modulo operation to the file length. EPAS simultaneously samples data blocks from the head and the tail of the modulated file to avoid shifts in sampling position. Meanwhile, an improved metric is proposed to measure the similarity between different files and to bring the estimated detection probability close to the actual probability. This paper also describes a query algorithm that reduces the time overhead of similarity detection.
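
The head-and-tail sampling idea can be illustrated with a minimal Python sketch. This is not the EPAS implementation: the parameter names (num_samples, block_size, modulus), the use of MD5 as the block fingerprint, and the Jaccard ratio used as the similarity score are all assumptions made for illustration, and the paper's improved metric is not reproduced here. The sketch trims the file length to a multiple of a modulus so that a small size change leaves the sampling stride unchanged, then samples blocks at offsets counted from the head and at offsets counted from the tail.

```python
import hashlib


def sample_fingerprints(data: bytes, num_samples: int = 4,
                        block_size: int = 8, modulus: int = 64) -> set:
    """Hash blocks sampled from both ends of `data` (illustrative parameters)."""
    # Assumption: round the length down to a multiple of `modulus` so that
    # a small change in file size does not change the sampling stride.
    usable = len(data) - (len(data) % modulus)
    if usable < block_size:
        return set()
    stride = max(block_size, usable // num_samples)

    fingerprints = set()
    for i in range(num_samples):
        # Head sample: offset counted forward from the start of the file.
        start = i * stride
        if start + block_size <= len(data):
            fingerprints.add(hashlib.md5(data[start:start + block_size]).hexdigest())
        # Tail sample: offset counted backward from the end of the file.
        end = len(data) - i * stride
        if end - block_size >= 0:
            fingerprints.add(hashlib.md5(data[end - block_size:end]).hexdigest())
    return fingerprints


def similarity(a: bytes, b: bytes) -> float:
    """Jaccard ratio of the sampled fingerprints (a stand-in metric)."""
    fa, fb = sample_fingerprints(a), sample_fingerprints(b)
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0


if __name__ == "__main__":
    # Deterministic, non-repeating 1024-byte test file.
    original = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(32))
    # Insert three bytes in the middle: content after the insertion shifts
    # relative to the head but keeps its distance from the tail.
    modified = original[:500] + b"XYZ" + original[500:]
    print(similarity(original, original))
    print(similarity(original, modified))
```

In this sketch, head samples taken before the insertion point still match, and so do tail samples taken after it, because their distance from the end of the file is unchanged. Sampling from the head alone would lose every sample located after the insertion.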

System Configuration

H/W System Configuration

Speed            : 1.1 GHz
RAM              : 256 MB (min)
Hard Disk        : 20 GB
Floppy Drive     : 1.44 MB
Keyboard         : Standard Windows Keyboard
Mouse            : Two or Three Button Mouse
Monitor          : SVGA

S/W System Configuration

Platform         : Cloud computing
Operating System : Windows XP, 7
Server           : WAMP/Apache
Working on       : Browsers such as Firefox, IE

Conclusion

This paper proposes an Enhanced Position-Aware Sampling (EPAS) algorithm for the cloud environment. Comprehensive experiments are carried out to select optimal parameters for EPAS, and the corresponding analysis and discussion of parameter selection are presented. The evaluation of precision and recall shows that EPAS is very effective at detecting file similarity compared with Shingle, Simhash, Traits, and PAS.