Graph Regularized Feature Selection with Data Reconstruction

Abstract

Feature selection is a challenging problem in high-dimensional data processing, and it arises in many real applications such as data mining, information retrieval, and pattern recognition. In this paper, we study the problem of unsupervised feature selection, which is challenging because no label information is available to guide the selection. We formulate unsupervised feature selection from the viewpoint of graph regularized data reconstruction. The underlying idea is that the selected features should not only preserve the local structure of the original data space via graph regularization, but also approximately reconstruct each data point via a linear combination. The graph regularized data reconstruction error therefore becomes a natural criterion for measuring the quality of the selected features. By minimizing this reconstruction error, we are able to select the features that best preserve both the similarity and the discriminant information in the original data.
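To make the criterion concrete, the following is a minimal sketch and not the paper's actual objective or solver. It assumes a symmetric k-nearest-neighbor graph with 0/1 weights, its unnormalized Laplacian L, and a score of the form ||X - X_S A||_F^2 + lam * tr((X_S A)^T L (X_S A)), where X_S holds the selected feature columns and A is fitted in closed form for a fixed subset S. The function names and the parameters k, lam, and ridge are illustrative assumptions that do not come from the paper.

```python
import numpy as np


def knn_laplacian(X, k=5):
    """Unnormalized Laplacian L = D - W of a symmetric kNN graph (0/1 weights).

    Assumption: the graph construction is illustrative, not the paper's.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances between samples (rows of X).
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)  # symmetrize: keep an edge if either end chose it
    return np.diag(W.sum(axis=1)) - W


def graph_regularized_error(X, selected, L, lam=0.1, ridge=1e-8):
    """Score a feature subset by an assumed graph regularized reconstruction error.

    X is (n_samples, n_features); `selected` lists the chosen column indices.
    """
    Xs = X[:, selected]
    M = np.eye(X.shape[0]) + lam * L
    # Setting the gradient with respect to A to zero gives
    # Xs^T (I + lam L) Xs A = Xs^T X; `ridge` only stabilizes the solve.
    G = Xs.T @ M @ Xs + ridge * np.eye(Xs.shape[1])
    A = np.linalg.solve(G, Xs.T @ X)
    Z = Xs @ A  # reconstruction of all features from the selected ones
    recon = np.sum((X - Z) ** 2)
    smooth = np.trace(Z.T @ L @ Z)  # penalizes reconstructions that vary across graph edges
    return recon + lam * smooth


# Usage on synthetic data: a lower score means the subset reconstructs the data better.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 8))
L_demo = knn_laplacian(X_demo, k=5)
for subset in ([0, 1], [0, 1, 2, 3], list(range(8))):
    print(subset, graph_regularized_error(X_demo, subset, L_demo))
```

Scoring every subset this way is combinatorial, so a practical method needs a continuous relaxation or an iterative solver; the sketch only illustrates how a single candidate subset could be evaluated under the stated assumptions.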

Conclusion

We formulate the problem of unsupervised feature selection from a new perspective of graph regularized data reconstruction. We argue that discriminant information can be preserved by selecting the features that minimize the data reconstruction error, while the similarity structure of the original data space is preserved through graph regularization. Our approach integrates data reconstruction and graph regularization into a common framework for unsupervised feature selection. In this way, it selects the features that best preserve the similarity and discriminant information in the original data space by minimizing the graph regularized data reconstruction error. We devise a novel gradient method to solve the resulting optimization problem, and we conduct text clustering experiments on the TDT2 and Reuters corpora.