Tutorial on

Introduction to Bioinformatics Data Sets Mining using Fuzzy Biclustering 


International Joint Conference on Neural Networks (IJCNN 2009)

Westin PeachTree Hotel, Atlanta, Georgia, USA, June 14, 2009




The analysis of genomic data from DNA microarrays can yield valuable information on the biological relevance of genes and on the correlations among them. In the last few years several biclustering techniques have been proposed and applied to this analysis. Biclustering is an unsupervised learning task that aims to find clusters of samples with similar characteristics, together with the features that produce these similarities. Starting from the seminal paper by Cheng and Church published in 2000 [1], many biclustering techniques have been proposed for bioinformatics data analysis [2]. Biclustering is especially useful for DNA microarray data, since it tackles the important problem of identifying genes with similar behavior with respect to different conditions. Biological tasks where biclustering can be applied successfully include: (1) identification of co-regulated genes and/or specific regulation processes; (2) gene functional annotation; (3) sample and/or tissue classification. In this tutorial we will focus on the fuzzy model of biclustering, which is very promising from both a computational and a representational point of view [4,5,6,7]. This model can find multiple solutions (thus avoiding problems such as random interference [7]) and does so quickly. Moreover, experimental results show that some techniques based on the fuzzy-possibilistic approach to clustering can find very large and homogeneous biclusters. In the tutorial we will also present an experimental assessment of fuzzy biclustering algorithms, using some computationally parsimonious stability indexes [8].
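To make the notion of a homogeneous bicluster concrete, the following minimal sketch computes the mean squared residue, the coherence score introduced by Cheng and Church [1], for a chosen set of rows (genes) and columns (conditions). It is a crisp illustration only, not one of the fuzzy algorithms covered in the tutorial; the function name `mean_squared_residue` and the toy matrix `expression` are hypothetical and the values are made up.

```python
from typing import Sequence


def mean_squared_residue(data: Sequence[Sequence[float]],
                         rows: Sequence[int],
                         cols: Sequence[int]) -> float:
    """Mean squared residue of the submatrix selected by rows and cols."""
    sub = [[data[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    # Row means, column means, and overall mean of the submatrix.
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: deviation from an additive row-plus-column model.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Toy expression matrix (genes x conditions); the values are made up.
expression = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 1.0],
    [4.0, 5.0, 6.0, 7.0],
    [8.0, 1.0, 5.0, 2.0],
]

# Genes 0-2 differ only by a constant shift across conditions 0-2.
coherent = mean_squared_residue(expression, rows=[0, 1, 2], cols=[0, 1, 2])
# The full matrix does not follow a single additive pattern.
whole = mean_squared_residue(expression, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])
```

A residue close to zero indicates that the selected genes behave coherently over the selected conditions, so `coherent` should be far smaller than `whole` here.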


[1] Y. Cheng and G. M. Church, Biclustering of expression data, Proc Int Conf Intell Syst Mol Biol, vol. 8, pp. 93-103, 2000.
[2] S. C. Madeira and A. L. Oliveira, Biclustering algorithms for biological data analysis: A survey, IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 1, pp. 24-45, 2004.
[3] K. Umayahara, S. Miyamoto, and Y. Nakamori, Formulations of fuzzy clustering for categorical data, Int. J. of Innovative Computing, Information and Control, vol. 1, no. 1, pp. 83-94, 2005.
[4] W.-C. Tjhi and L. Chen, Minimum sum-squared residue for fuzzy co-clustering, Intelligent Data Analysis, vol. 10, no. 3, pp. 237-249, 2006.
[5] C. Cano, L. Adarve, J. Lopez, and A. Blanco, Possibilistic approach for biclustering microarray data, Computers in Biology and Medicine, vol. 37, no. 10, pp. 1426-1436, October 2007.
[6] M. Filippone, F. Masulli, S. Rovetta, S. Mitra, and H. Banka, Possibilistic approach to biclustering: An application to oligonucleotide microarray data analysis, in Lecture Notes in Bioinformatics, C. Priami, Ed., vol. 4210. Springer, October 2006, pp. 312-322.
[7] J. Yang, H. Wang, W. Wang, and P. Yu, Enhanced biclustering on expression data, in BIBE 03: Proceedings of the 3rd IEEE Symposium on BioInformatics and BioEngineering. Washington, DC, USA: IEEE Computer Society, 2003, p. 321.
[8] M. Filippone, F. Masulli, and S. Rovetta, Comparing fuzzy approaches to biclustering, in Computational Intelligence Methods for Bioinformatics and Biostatistics, Proceedings of the CIBB 2008, LNCS/LNBI, Springer-Verlag, Heidelberg (Germany), 2008 (in press).


Francesco Masulli (1,2) and Stefano Rovetta (1)

(1) DISI, Dept. of Computer and Information Sciences
University of Genova and CNISM
Via Dodecaneso 35, 16146 Genoa, Italy
E-mails: masulli <at> disi.unige.it, rovetta <at> disi.unige.it
(2) Sbarro Institute for Cancer Research and Molecular Medicine,
Temple University, 1900 N 12th Street, Philadelphia, PA 19122, USA

Last updated 27/3/2009