**Explicit Minimum Storage Regenerating Codes**

## Abstract

**Explicit Minimum Storage Regenerating Codes **management report in data mining **.**In distributed storage, a file is stored in a set of nodes and protected by erasure-correcting codes. Regenerating code is a type of code with two properties: first, it can reconstruct the entire file in the presence of any r node erasures for some specified integer r; second, it can efficiently repair an erased node from any subset of remaining nodes with a given size. In the repair process, the amount of information transmitted from each node normalized by the storage size per node is termed repair bandwidth (fraction).

When the storage size per node is minimized, the repair bandwidth is lower bounded by 1/r, where r is the number of parity nodes. A code attaining this lower bound is said to have optimal repair. We consider codes with minimum storage size per node and optimal repair, called MSR (minimum storage regenerating) codes. In particular, if an MSR code has r parities and any r erasures occur, then by transmitting all the information from the remaining nodes, the original file can be reconstructed. On the other hand, if only one erasure occurs, only a fraction of 1/r of the information in each remaining node needs to be transmitted.

If we view each node as a vector or a column over some field, then the code forms a 2D array. Given the length of the column l, the number of parities r, we construct explicitly high-rate MSR codes. The number of systematic nodes of our construction is (r + 1)logr l, which is longer than previously known results. Besides, we construct MSR codes with other desirable properties: first codes with low complexity when the information is updated, and second, codes with low access or storage node I/O cost during repair.

## INTRODUCTION

We introduce explicit minimum storage regenerating codes. Regenerating codes have two properties: first one remote user to get same file and same size. Second, failed file can be regenerate by new node. If any unauthorized person modified particular file then that file will be failed. Regenerating file help the end user to get whole data as it is in old file. We can provide security key on each node. Even we come to know were particular file has been modified and were particular node has been failed.No proposed regeneration of all file. It will leave few data sets while regenerating. It takes more time to regenerate a file. We take one source file that file will be split in four nodes, that file will stored in each nodes when unauthorized person will modify any node it will corrupt the file.

After that file shows which node will modify and that node will regenerate the new node in same size that file will get the end user. Since the repair operation takes a lot of system data to measure, and because the system cannot access the unsuccessful hub until recovery, so the reduction of the repair information measurement. Contrasted and repairing disappointment independently the agreeable repair of various disappointment additionally spare data transfer capacity utilization when different disappointment are being repaired. The new node exchange a unsuccessfully systematic node ought to have identical knowledge as within the unsuccessful node. This is after termed actually regeneration are lot of easer to take care of since no further communication and process with reference to the new node co-efficient is needed.

The family of regenerating codes defined by the parameters n, k, d, α, β,M, was proposed in [8] for the following distributed storage model. Consider a file of size M bits to be encoded into a length n codeword, where each symbol of the encoding is stored on a distinct (storage) node of size α bits. Consider codes satisfying the following two properties: (i) Reconstruction: the entire file can be decoded from the information stored on any k nodes; (ii) Repair: in case of a node erasure (failure) the information can be functionally repaired from any d ≤ n − 1 helper nodes, each transmitting β bits of information. Functional repair means that the newly repaired node does not need to be identical to the erased node, but only to satisfy the reconstruction condition. It was shown that by the above two properties, the parameters satisfy k−1 ∑ i=0 min{(d − i)β, α} ≥ M.

For fixed n, k, d, α,M, codes that can repair any node erausre with β bits of transmission satisfying (1) with equality are called optimal repair codes. Codes with reconstruction property and optimal repair property are called regenerating codes. We study in the paper Minimum Storage Regenerating (MSR) codes, which have the minimal possible storage size, i.e., α = M k . Furthermore, we consider the case with d = n − 1 helper nodes, and assume that the codes are systematic, i.e., the first k nodes store the information itself, and the last r = n − k are the parity nodes. The code is assumed to have k > r, namely, to be high rate. Typically the information is viewed as an array of n columns (also referred to as nodes or symbols), where each column stores a vector of information over some finite field and it corresponds to one of the n codeword symbols.

The length of each vector is denoted by l and its elements are referred to as sub-symbols. These type of codes are called array codes ,and they are widely used in data storage to protect information against erasures due to their error correction capability and low computational complexity. As typically required from storage systems, we restrict our focus to codes that perform exact repair, i.e., the erased information is exactly recovered during the repair process. Note that the lower bound still holds even if we consider exact repair, since it is a special case of functional repair. The efficiency of the repair is measured by the repair bandwidth, β/α, which is the amount of transmitted information from each helper node, normalized by the amount of information it stores. For MSR codes with d = n − 1, constraint becomes β ≥ M/k(n − k), and thus the repair bandwidth for a single erasure satisfies β α ≥ 1 n − k = 1 r . Since systematic nodes account for the majority of the nodes, and repairing them is much more crucial than repairing the parity nodes, we focus on the optimal repair of only the systematic part. We note here that from the derivation, still holds even if we restrict to optimal repair of only the systematic nodes1 . Finally, in this paper an MSR code is referred to a code that can optimally repair any of its systematic nodes, i.e., the amount of transmitted information is equal to 1/r amount of stored information for each of the n − 1 helper nodes.

The main contribution of this paper is as follows:

1) For any two positive integers l,r, we construct a highrate MSR code with r parity nodes, column length l, and k = (r + 1)logr l systematic nodes. In particular, for r = 2 parity nodes we obtain an MSR code with k = 3 log2 l, moreover, the code is defined over a finite field of size 1 + 2 log2 l.

2) We rigorously state the necessary and sufficient conditions of linear optimal repair codes, and thus enable explicit code constructions and simplify proofs of repair optimality.

3) We design optimal update codes with 2 parities and k = 2 log2 l systematic nodes. This construction exceeds the upper bound of k ≤ log2 l given by [18] for optimal update and diagonal encoding matrices. Diagonal encoding matrices means that the encoding is done only within each row in the information array. However our construction allows mixing of different rows in encoding. As a result, we can see a fundamental difference between these two types of codes.

4) We construct a family of codes that further reduces the access. We introduce a technique that equivalently transforms a linear code through block-diagonal matrix. This technique can be applied to an arbitrary MSR code and therefore can be a useful tool for transforming regenerating codes.

**PROBLEM DEFINATION**

When any unauthorized person access or modify a particular file. We can’t get same data present in file before. Even we can’t identify where, the modify has been done and which file has been modified. By regenerating file we can help the end user to get same file and same size. If any unauthorized person modified particular file then that file will be failed and that failed file can be regenerate by new node. Regenerating file help the end user to get whole data as it is in old file. We can provide security it terms of node. Where, in new node the old file will be same as it is. Even we come to know were particular file has been modified and were particular node has been failed

**IMPLEMENTATION**

The MSR code has two properties. In any case, the perfect repair attribute indicates the ability to repair in any k precision center errors. More specifically, if the effective center i∈ [k] is turned over, then at each point of the n-1 remaining center is traversed by traversing the traversal of repair method 1/r sub image, the sub- in a small part of the 1/r it stores the propagation property, that is, the information can be decoded from the information set on any k center points. For i=1, n, let Ci ∈ Fl be the i=th fragment code. Allowing the necessary k-point to store the information itself. Each of the equilibrium center point Ck + i, i = 1, r, is the linear limit of the information center point, that is the existence of the size of l × l, Ai, j, j = 1 reversible coding grid, k to such a degree, to consider the perfect repair of a code. It is considered that the center point is cleared and is repaired by sending information of a fraction of the information of 1/r of the information stored there in from each of the n-1 of the collaborator center points. The perfect repair code for the accomplice (n, k, l) is reminiscent of the following logical space attribute: for any i ∈ [k], there are subspace Si, j ⊆F1, j 6 = i, j ∈ [n] / R, for all j ∈ [k] \, s ∈ [r]

## System Configuration:

**H/W System Configuration:-**

Processor : Pentium IV

Speed : 1 Ghz

RAM : 512 MB (min)

Hard Disk : 20GB

Keyboard : Standard Keyboard

Mouse : Two or Three Button Mouse

Monitor : LCD/LED Monitor

**S/W System Configuration:-**

Operating System : Windows XP/7

Programming Language : Java/J2EE

Software Version : JDK 1.7 or above

Database : MYSQL

**CONCLUDING REMARKS**

In this paper, we presented a family of codes with parameters (n = (r + 1)m + r, k = (r + 1)m, l = r m) and they are the longest known high-rate MSR codes. The codes were constructed using eigenspaces of the encoding matrices, such that they satisfy the subspace property. This property gives more insights on the structure of the codes, and simplifies the proof of optimal repair. Next we make some observations on the code paraemters, and then point out open problems for MSR codes. If we require that the code rate approaches 1, i.e., r being a constant and m goes to infinity, then the column length l is exponential in the number of systematic nodes k. However, if we require the code rate to be roughly a constant fraction, i.e., m being a constant and r goes to infinity, then l is polynomial in k. Therefore, depending on the application and therefore the different codes rate, one can obtain different asymptotic characteristics of the number of systematic nodes. For n ≥ 2k or k ≤ r (low code rate), constructions in [13], [16] give the column length l = r. With some modifications, this column length is feasible for all k ≤ r + 1. In our construction (high code rate), the column length is l = r k r+1 .

Fix the value of k, we can draw the graph of the column length with respect to the number of parities. Even though we need integer values for k,r, l, this graph still shows the trend of the code parameters. For example, this relationship is shown in Figure 9 for k = 10. These two regimes coincide when r = k − 1 = 9. Actually, we can see that these two constructions are identical for r = k − 1. Note that our construction only considers the repair of systematic nodes, so is only practical when k >> r + 1. It is interesting to investigate the actual shape of this curve, and to understand for fixed k how the column length l changes with the number of parities r. Moreover, it is still an open problem what the largest k is given the column length l for an optimal repair code. Moreover, the bound of the finite field size used for the codes may not be tight. Unlike the constructions in this paper, the field size may be reduced when we assume that the encoding matrices do not have eigenvalues or eigenvectors, namely, they are not diagonalizable. In this paper, we addressed the repair of systematic nodes only. See [19] for a construction of codes that optimally repair systematic and parity codes. In that work, the encoding matrices are constructed directly instead of from their eigenspaces. Regardless of this difference, the subspace property need to be satisfied to ensure optimal repair. It is an interesting problem to construct codes with large k for a fixed column length l, such that they repair any node optimally. At last, one possible application of the codes is to store hot/cold data. We notice that in our construction, some of the nodes have lower access ratio than others during repair. Since hot data is more commonly requested, we can put the hot data in the low-access nodes, and cold data in the others. More generally, appropriate measurement of repair cost should be defined, and outer bounds and code constructions can be considered for hot/cold data storage.

## Conclusion

**Explicit Minimum Storage Regenerating Codes management report in data mining.**When any file corrupted by the other end user, they come to know were actually file has been corrupted. When file corrupted the hub new node where we get same size and same data present in that file. End user can get whole data as it is. While regenerating file the hub shows were actually the file has been modified. Other nodes help to get same data present in file.