Clustering of data in high dimensionality spaces
(2012-2013)

Project funded by the Italian National Group for Scientific Computation (GNCS)

One of the most serious problems affecting current machine learning techniques is dataset dimensionality. In many real-world applications, the data have anywhere from a few dozen to many thousands of dimensions. Such high-dimensional data spaces are often encountered in medicine and biology, where DNA microarray technology can produce a large number of measurements at once; in the clustering of text documents, where, if a word-frequency vector is used, the number of dimensions equals the size of the dictionary; and in many other areas, including data integration and management and social network analysis. In all these cases, the dimensionality of the data makes learning problems barely tractable.
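The text-document case can be illustrated with a minimal sketch. It assumes a toy three-document corpus and a hypothetical helper, `word_frequency_vectors`, neither of which comes from the project itself. It shows only that every document vector has one dimension per distinct word in the dictionary.

```python
from collections import Counter


def word_frequency_vectors(documents):
    """Map each document to a word-count vector over a shared dictionary.

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split.
    """
    tokenized = [doc.lower().split() for doc in documents]
    # The dictionary is the sorted set of all distinct words in the corpus.
    dictionary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One coordinate per dictionary word; absent words count as 0.
        vectors.append([counts[word] for word in dictionary])
    return dictionary, vectors


# Toy corpus (an assumption made for this example).
documents = [
    "clustering of high dimensional data",
    "high dimensional data in biology",
    "text documents as word frequency vectors",
]
dictionary, vectors = word_frequency_vectors(documents)
print(len(dictionary), [len(v) for v in vectors])
```

Even in this tiny corpus most coordinates of each vector are zero. With a realistic dictionary of many thousands of words the vectors become both very high-dimensional and very sparse.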

In particular, the high dimensionality of data is a critical factor for the clustering task, and several problems must be faced when clustering high-dimensional data.

The project aims to study current approaches to clustering high-dimensional data, with particular emphasis on relational clustering, data reduction using rough and fuzzy sets, biclustering/co-clustering, and related methods for intrinsic dimension estimation and for clustering comparison.

Research Group

PI

Francesco Masulli

University of Genova, Italy

Co-PI

Stefano Rovetta

University of Genova, Italy

Co-PI

Alfredo Petrosino

University of Naples Parthenope, Italy

Co-PI

Alessio Ferone

University of Naples Parthenope, Italy

Co-PI

Francesco Camastra

University of Naples Parthenope, Italy

Participant

Hassan Mahmoud

University of Genova, Italy

Participant

Giorgio Gemignani

University of Naples Parthenope, Italy

Participant

Mario Manzo

University of Naples Parthenope, Italy

"Giornata di studio"

A focal activity of this project is CHDD 2012, the International Workshop on Clustering High-Dimensional Data (May 15th, 2012, Naples, Italy), with free participation open to all interested researchers.

Last updated 17 April 2012 by masulli@disi.unige.it