An improved sparse representation model with structural information for Multicolour Fluorescence In-Situ Hybridization (M-FISH) image classification
BMC Systems Biology volume 7, Article number: S5 (2013)
Multicolour Fluorescence In-Situ Hybridization (M-FISH) images are employed for detecting chromosomal abnormalities such as translocations, deletions, duplications and inversions. This technique uses mixed colours of fluorochromes to paint whole chromosomes for rapid detection of chromosome rearrangements. The M-FISH data sets used in our research are obtained by microscopic scanning of a metaphase cell labelled with five different fluorochromes and a DAPI stain. The reliability of the technique depends on accurate classification of chromosomes (24 classes for male and 23 classes for female) from M-FISH images. However, because of imaging noise, misalignment between channels and many other imaging problems, classification errors are unavoidable and can lead to incorrect detection of chromosomal abnormalities. Accurately classifying the different types of chromosomes in M-FISH images therefore remains a challenging problem.
This paper presents a novel sparse representation model that incorporates structural information for the classification of M-FISH images. In our previous work, we proposed a sparse representation based classification model that used only the information of individual pixels. The new approach extends the previous one to the regional case by using the information of each pixel together with the structural information of its neighbouring pixels. Building on Orthogonal Matching Pursuit (OMP), we employed the simultaneous OMP (SOMP) algorithm to derive an efficient solution of the improved sparse representation model incorporating the structural information.
A statistical comparison of the two models (p-value) shows that the newly proposed model incorporating the structural information is significantly superior to our previous one. In addition, we evaluated the effect of several parameters, such as sparsity level, neighbourhood size and training sample size, on the classification accuracy.
The comparison with our previously used sparse model demonstrates that the improved sparse representation model is more effective than the previous one for classifying chromosomes, and hence for detecting chromosomal abnormalities.
Chromosomal abnormalities (e.g., numerical changes and structural rearrangements such as translocations) can cause genetic diseases and cancers. To detect these serious diseases, the multicolour Fluorescence In-Situ Hybridization (M-FISH) technique uses different colours to paint human chromosomes, so that these abnormalities can be analysed simultaneously [1, 2]. This cytogenetic approach labels a metaphase cell with N fluorochromes, giving $2^N - 1$ different combinations that can be used to differentiate chromosome types. With five fluorochromes, $2^5 - 1 = 31$ combinations are available, which is enough to differentiate the 24 types of human chromosomes. Therefore, S Gold (F), S Green (G), S Aqua (A), Red (R) and S Red (Y) are used to paint the chromosomes. The painted chromosomes are illuminated with light of specific wavelengths, and the fluorochromes on the chromosomes emit fluorescent light at distinct wavelengths, which is detected by the microscope. To acquire images of the different fluorescence colours, 5 different emission filters were employed, each passing the emission of one fluorochrome while blocking the light of the others. Figure 1 illustrates an M-FISH image set collected by a microscope with a CCD camera; the last image in Figure 1 is the DAPI channel, which shows all the chromosomes in a cell. For each fluorescence channel, one image is generated, and the chromosomes are detected as pixels with high intensity. For example, a chromosome may be dyed with at least two fluorochromes, such as S Green (G) and DAPI. Ideally, such a chromosome should be visible only in the G and DAPI channels, but it may also appear in other channels because of spectral mixing or an inhomogeneous background. Therefore, identifying the chromosomes accurately from an M-FISH image set is extremely challenging in practice.
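The count $2^N - 1$ follows from the fact that each chromosome type is labelled by a non-empty subset of the N fluorochromes. The short sketch below (the function name `label_combinations` is ours, for illustration only) enumerates those subsets for the five fluorochromes named above; it only checks the arithmetic and does not reproduce the actual assignment of combinations to chromosomes.

```python
from itertools import combinations


def label_combinations(fluors):
    """Enumerate every non-empty subset of fluorochromes (combinatorial labelling)."""
    combos = []
    for k in range(1, len(fluors) + 1):
        combos.extend(combinations(fluors, k))
    return combos


# Single-letter labels for S Gold, S Green, S Aqua, Red and S Red.
fluors = ["F", "G", "A", "R", "Y"]
combos = label_combinations(fluors)
print(len(combos))  # 31 = 2**5 - 1, more than the 24 chromosome types
```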
To detect chromosomal abnormalities associated with genetic diseases or cancers using the M-FISH technique, it is important to improve the accuracy of chromosome classification. Before classification, preprocessing methods [3–7] are needed to increase the accuracy by reducing noise in the original images. There are two major types of classifiers: pixel-by-pixel classifiers [8–10] and region-based classifiers [6, 7]. For classification, we have proposed a Bayesian classifier and a sparse representation based classification (SRC) method. For segmentation, we have developed an Adaptive Fuzzy C-Means (AFCM) segmentation method. To bring the imaging technique into clinical use, further effort is needed to improve the classification accuracy.
Sparse representation methods, including compressive sensing, have been widely studied in applied mathematics and signal/image processing because of their advantages in processing high-dimensional data [13, 14]. Many algorithms have been developed to solve sparse models, e.g., greedy algorithms such as Matching Pursuit (MP) and OMP, and the Homotopy method. Recently, Multiple Measurement Vector (MMV) models have also been proposed to recover a set of vectors that share a common support. Such models have wide applications in many research fields, e.g., multiple signal classification (MUSIC), blind multiband signal reconstruction and compressive diffuse optical tomography. Motivated by these efforts on the MMV problem, we proposed a novel sparse representation model that incorporates structural information into the classification of M-FISH image sets, which was reported in our preliminary study. This improved model considers the correlations of neighbouring pixels, which often share the same features and belong to the same class. By using information both from the neighbourhood of a pixel and from the different spectral channels, the proposed sparse model achieves better classification results than the sparse model we used before.
The paper is organized as follows. First, we introduce the SRC model without structural information, and then propose an improved sparse model together with the corresponding algorithm (i.e., SOMP) for estimating its solution. Next, we apply the improved model to M-FISH classification and compare it with the conventional sparse model employed in our previous work. Finally, the paper concludes with a short summary and discussion of the proposed model.
The SRC model has been successfully used in many fields (e.g., hyperspectral image classification and M-FISH chromosome classification). Before introducing the improved sparse model, we first review the sparse model and show how to apply it to M-FISH image data analysis. Then, we present the improved sparse model with structural information for M-FISH chromosome classification, which uses the correlated information of neighbouring pixels within a region. Finally, we describe the numerical algorithm, SOMP, for solving this improved model.
SRC algorithm for M-FISH data
A general type of sparse model is shown in Eq. (1), where y is a vector of observations, A is a matrix consisting of features from different classes, and x is the vector of coefficients corresponding to the observation vector y:
$$y = Ax \qquad (1)$$
If the observation y belongs to a particular class, only a few entries of x, concentrated in the block associated with that class, will be non-zero, while the rest will be zero; i.e., x is a sparse vector with many zero entries. Figure 2 shows the schematic diagram of the sparse model. In Figure 2, matrix A consists of features from three different classes, represented by different colours: yellow, red and green, respectively. x is a sparse vector with non-zero entries in the red region and zero entries in the white regions. Given an observation vector y, the sparse vector x can be obtained by solving the optimization model in Eq. (2). Assuming we have m classes (m = 24 in our case) and each pixel corresponds to an n-dimensional feature vector $a \in \mathbb{R}^n$ (n = 5), we can form a feature matrix $A = [A_1, A_2, \dots, A_m]$, where each sub-matrix is $A_i = [a_{i,1}, a_{i,2}, \dots, a_{i,N_i}]$ (i.e., $A_i \in \mathbb{R}^{n \times N_i}$), and $A \in \mathbb{R}^{n \times N}$. Here $N_i$ is the number of training pixels from the i-th class, so the total number of training pixels in A is $N = \sum_{i=1}^{m} N_i$. Based on the sparse model in Eq. (1), a test sample y can be approximated by a sparse solution whose non-zero coefficients correspond to a particular class, using Eq. (2):
$$\hat{x} = \arg\min_{x} \|y - Ax\|_2 \quad \text{subject to} \quad \|x\|_p \le K_0 \qquad (2)$$
where $y \in \mathbb{R}^n$ is the test pixel to be classified, and $\|x\|_p$ is the $L_p$ norm of x, which is used to shrink the solution so that only a small percentage of its coefficients are non-zero, making the solution sparse; by specifying the value of $K_0$, we can obtain solutions with different sparsity levels. For simplicity, we take the case $p = 0$, where $\|x\|_0$ is the corresponding $L_0$ norm of x, i.e., the number of non-zero coefficients in x.
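For $p = 0$, Eq. (2) is usually approximated greedily, for example with OMP. The NumPy sketch below is a generic OMP, not the authors' implementation: the name `omp` is hypothetical, and it assumes the columns of A are L2-normalised (the text does not say) and stops after $K_0$ selections or once the residual vanishes.

```python
import numpy as np


def omp(A, y, k0, tol=1e-12):
    """Greedy approximation of argmin ||y - A x||_2 s.t. ||x||_0 <= k0.

    Assumes the columns of A are (approximately) L2-normalised.
    """
    y = np.asarray(y, dtype=float)
    N = A.shape[1]
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(min(k0, N)):
        # Select the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # "Orthogonal" step: refit y on all selected columns by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    x = np.zeros(N)
    x[support] = coef
    return x


# Toy usage on a random, column-normalised 5 x 40 dictionary.
rng = np.random.default_rng(0)
A_demo = rng.standard_normal((5, 40))
A_demo /= np.linalg.norm(A_demo, axis=0)
x_demo = omp(A_demo, A_demo[:, 3] - 0.5 * A_demo[:, 17], k0=2)
```

Because the least-squares refit makes the residual orthogonal to every selected column, each iteration adds a new column until the residual is (numerically) zero.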
After estimating the solution of Eqs. (1) and (2), we classify a test sample y as follows:
$$\operatorname{class}(y) = \arg\min_{i = 1, \dots, m} \|y - A_i \hat{x}_i\|_2 \qquad (3)$$
where m is the number of classes and $\hat{x}_i$ contains the entries of $\hat{x}$ associated with the sub-matrix $A_i$ of class i. The test sample y is assigned to the class for which the distance between y and its reconstruction $A_i \hat{x}_i$ is minimal.
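The decision rule of Eq. (3) can be sketched as follows, assuming a sparse solution $\hat{x}$ is already available and that `labels[j]` records the class of column j of A (both the function name `src_classify` and the `labels` array are illustrative assumptions, not the authors' code).

```python
import numpy as np


def src_classify(A, labels, y, x_hat):
    """Eq. (3): pick the class whose coefficients best reconstruct y.

    labels[j] is the class of column j of A; x_hat is a sparse solution of Eq. (2).
    Returns the winning class and the residual of every class.
    """
    labels = np.asarray(labels)
    classes = np.unique(labels)
    residuals = []
    for c in classes:
        # Keep only the coefficients of class c, i.e. compute A_i x_hat_i.
        x_c = np.where(labels == c, x_hat, 0.0)
        residuals.append(np.linalg.norm(y - A @ x_c))
    residuals = np.array(residuals)
    return classes[int(np.argmin(residuals))], residuals
```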
Improved sparse model with structural information for M-FISH data analysis
In Eq. (1), y is a feature vector containing the five-channel spectral information of a single pixel. In practice, however, a pixel usually shares similar features with its neighbouring pixels, which is the case in M-FISH image sets. As illustrated in Figure 3, the neighbouring pixels with similar intensity values form the nearest neighbourhood of the central pixel $y_5$.
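Gathering the neighbourhood of Figure 3 into a matrix can be sketched as below. The image layout (height × width × channels), the row-major column order, and skipping pixels that fall outside the image are assumptions made for this illustration; with `radius=1` an interior pixel yields s = 9 columns, and the central pixel $y_5$ is column index 4.

```python
import numpy as np


def neighbourhood_matrix(img, row, col, radius=1):
    """Stack the spectra of a (2*radius+1)^2 window centred on (row, col).

    img: array of shape (H, W, n) holding the n spectral channels.
    Returns an n x s matrix Y; columns are in row-major window order and
    pixels outside the image are skipped (an assumed border policy).
    """
    H, W, _ = img.shape
    cols = []
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r, c = row + dr, col + dc
            if 0 <= r < H and 0 <= c < W:
                cols.append(img[r, c, :])
    return np.stack(cols, axis=1)
```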
The classification accuracy of a pixel-by-pixel classifier and its robustness to noise can be improved by considering the structural information of a pixel within its neighbourhood. We therefore propose a new sparse model with structural information that uses the information of the neighbouring pixels simultaneously, instead of a single pixel, as shown in Eq. (4):
$$Y = [y_1, y_2, \dots, y_9] = [Ax_1, Ax_2, \dots, Ax_9] = A[x_1, x_2, \dots, x_9] = AX \qquad (4)$$
where $y_1, \dots, y_9$ are the test samples within a neighbourhood, forming the matrix Y, and $y_5$ is the central pixel; $x_1, \dots, x_9$ are the corresponding weight vectors, forming the matrix X. Eq. (4) shows that $y_1, \dots, y_9$ are represented by the same feature matrix A but with different weights. Figure 4 shows the schematic diagram of the improved sparse model with structural information based on Eq. (4).
Since X is a row-wise sparse matrix, as shown in Figure 4, the improved model extends our previous sparse model (1) by considering multiple pixels simultaneously. With this improved sparse model, we propose to obtain the solution from the following optimization:
$$\hat{X} = \arg\min_{X} \|Y - AX\|_F \quad \text{subject to} \quad \|X\|_{\text{row},0} \le K_0 \qquad (5)$$
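where $Y = [y_1, y_2, \dots, y_s] \in \mathbb{R}^{n \times s}$ is a test matrix that replaces the vector y of the SRC model. The test matrix contains s test pixels within a neighbourhood, which are assumed to be spatially correlated. The row-sparse solution $\hat{X} = [\hat{x}_1, \hat{x}_2, \dots, \hat{x}_s] \in \mathbb{R}^{N \times s}$ corresponds to the input matrix Y, and its columns $\hat{x}_1, \dots, \hat{x}_s$ share the same non-zero support. They are obtained by Eq. (5) with the following regularization term:
$$\|X\|_{\text{row},0} = \sum_{i=1}^{N} I(x^i) \qquad (6)$$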
where $\|X\|_{\text{row},0}$ is the number of non-zero rows of X and $x^i$ is the i-th row of X. $I(x^i)$ is an indicator function that equals 1 if $x^i \neq 0$ and 0 otherwise. In this work, we set …. The solution vectors are row-wise sparse (i.e., their non-zero entries lie in the same rows), which reflects the high correlation of the neighbouring pixels.
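The row-sparsity measure of Eq. (6) simply counts the rows of X that contain at least one non-zero entry. A minimal sketch (the name `row_l0` and the optional tolerance are illustrative):

```python
import numpy as np


def row_l0(X, tol=0.0):
    """||X||_{row,0}: number of rows of X with at least one entry above tol in magnitude."""
    return int(np.sum(np.any(np.abs(np.asarray(X)) > tol, axis=1)))


X_demo = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 2.0],
                   [0.0, 0.0, 0.0],
                   [0.0, 3.0, 0.0]])
```

For `X_demo`, three entries are non-zero but they lie in only two rows, so `row_l0(X_demo)` is 2: the constraint in Eq. (5) limits the number of shared atoms, not the total number of coefficients.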
The decision rule of the improved model is similar to that of Eq. (3). After obtaining $\hat{X}$, we use Eq. (7) to determine the class of the central pixel $y_c$ from the test samples in its neighbourhood:
$$\operatorname{class}(y_c) = \arg\min_{i = 1, \dots, m} \|Y - A_i \hat{X}_i\|_F \qquad (7)$$
where $y_c$ is the central pixel of the neighbourhood, $\hat{X}_i$ contains the rows of $\hat{X}$ associated with class i, and $\|Y - A_i \hat{X}_i\|_F$ is the residual between the input matrix Y, consisting of the neighbouring pixels around $y_c$, and the product of the sub-matrix $A_i$ with $\hat{X}_i$. The class with the minimum residual is assigned to the central pixel.
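Eq. (7) can be sketched in the same way as Eq. (3), replacing the vector norm by the Frobenius norm over the whole neighbourhood. As before, `labels` (the class of each column of A) and the function name are assumptions for illustration.

```python
import numpy as np


def joint_classify(A, labels, Y, X_hat):
    """Eq. (7): label the central pixel by the class whose rows of X_hat
    best reconstruct the whole neighbourhood matrix Y (Frobenius norm)."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    residuals = np.array([
        # A[:, labels == c] is A_i; X_hat[labels == c, :] is X_hat_i.
        np.linalg.norm(Y - A[:, labels == c] @ X_hat[labels == c, :])
        for c in classes
    ])
    return classes[int(np.argmin(residuals))], residuals
```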
Algorithms for the solution of the improved sparse model
Many approximate algorithms exist for solving the optimization problems in Eqs. (2) and (5). When p = 0 (i.e., the L0 norm), greedy algorithms such as MP and OMP [15, 16] can be employed to solve Eq. (2). For Eq. (5), the simultaneous OMP (SOMP) algorithm is employed instead of the OMP algorithm used for Eq. (2); the details of SOMP are given in Table 1. At each iteration, the algorithm selects the column of the training matrix A whose projection onto the current residual matrix has the largest q-norm. The selected column is added to the set of selected columns, the signal is re-estimated by least squares on that set, and the residual is updated accordingly. The algorithm continues until the solution reaches the pre-specified sparsity level $K_0$.
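Table 1 is not reproduced here, so the following is a sketch of the standard SOMP of Tropp, Gilbert and Strauss rather than the authors' exact code. The choice q = 2, the early-stopping tolerance and the name `somp` are assumptions.

```python
import numpy as np


def somp(A, Y, k0, q=2, tol=1e-12):
    """Simultaneous OMP: greedy approximation of Eq. (5).

    Selects at most k0 columns of A shared by all columns of Y and returns
    a row-sparse coefficient matrix X of shape (N, s).
    """
    Y = np.asarray(Y, dtype=float)
    N = A.shape[1]
    residual = Y.copy()
    support = []
    coef = np.zeros((0, Y.shape[1]))
    for _ in range(min(k0, N)):
        # q-norm of each row of A^T R: how well atom j explains all residuals jointly.
        scores = np.linalg.norm(A.T @ residual, ord=q, axis=1)
        scores[support] = -np.inf  # never re-select an atom
        support.append(int(np.argmax(scores)))
        # Re-estimate every coefficient on the current support by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], Y, rcond=None)
        residual = Y - A[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    X = np.zeros((N, Y.shape[1]))
    X[support, :] = coef
    return X
```

Because every column of Y is fitted on the same support, the returned X has at most `k0` non-zero rows, which is exactly the row-sparsity required by Eq. (6).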
Results and analysis
We have collaborated with Advanced Digital Imaging Research (ADIR; League City, Texas, USA) to establish an M-FISH image database, which is a valuable resource for chromosome imaging studies. The database is publicly available from the M-FISH Database website. A set of images from five different fluorescence channels and a DAPI channel was acquired by microscopy; an example is shown in Figure 1. In addition, to evaluate the classification accuracy, an experienced cytogeneticist provided a ground-truth image, shown in pseudo colours in Figure 5(a), where different colours indicate different types of chromosomes. There are 24 chromosome classes in total across male and female cells. In the ground-truth images, background pixels were labelled 0, pixels in overlapping regions were labelled 255, and all other pixels were labelled with numbers from 1 to 24 to discriminate the different types of chromosomes. The ground truth is used to verify the accuracy of the classification algorithms on M-FISH image sets.
Segmentation of chromosome regions
In M-FISH images, the background contains most of the pixels, whereas the chromosomal regions are of primary interest. Therefore, to separate the chromosomal regions from the background and improve the efficiency of classification, a mask was generated from the DAPI channel, which shows all chromosomes in a cell, using the AFCM method we proposed previously. This mask was then applied to the other five channels, so that the chromosome regions were extracted and the pixels outside the mask were removed. Figure 6 shows an image of a DAPI channel and how the mask is generated by segmentation.
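The AFCM segmentation itself is not reproduced here. Assuming a boolean chromosome mask has already been obtained from the DAPI channel, applying it to the five fluorescence channels reduces to boolean indexing, as in this sketch (the function name and the (H, W, n) channel layout are assumptions):

```python
import numpy as np


def apply_mask(channels, mask):
    """Keep only the pixels inside the chromosome mask.

    channels: (H, W, n) stack of the fluorescence channels.
    mask: (H, W) boolean mask, e.g. from segmenting the DAPI channel.
    Returns the n x P feature matrix of the P masked pixels and their (row, col) coordinates.
    """
    mask = np.asarray(mask, dtype=bool)
    coords = np.argwhere(mask)      # row-major order, same order as boolean indexing
    features = channels[mask].T     # (P, n) -> (n, P): one column per pixel, like y in Eq. (1)
    return features, coords
```

Keeping the coordinates allows the per-pixel labels to be written back into an image for display as in Figure 5.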
M-FISH training and testing data
The improved sparse model with structural information was applied to the classification of M-FISH image data. Twenty cells (10 male and 10 female) were chosen from our database. The features of the different types of chromosomes were constructed by randomly sampling pixels from M-FISH images to form the training matrix A, and the samples were required to satisfy the sparsity concentration index (SCI) proposed by Wright et al. The SCI measures how concentrated the sparse coefficients are on a single class. Matrix A is an n × N matrix, where n is the spectral dimension of the pixels and N is the number of training features; for M-FISH image data, n = 5. After the training matrix was constructed, the remaining pixels were used as test data to validate the proposed classification method.
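The SCI of Wright et al. is $\mathrm{SCI}(x) = \left(k \cdot \max_i \|\delta_i(x)\|_1 / \|x\|_1 - 1\right) / (k - 1)$, where k is the number of classes and $\delta_i(x)$ keeps only the coefficients of class i. It equals 1 when all coefficients belong to one class and 0 when they are spread evenly. The sketch below only computes the index; the text does not state the threshold used to screen training samples, and returning 0 for an all-zero vector is our assumption.

```python
import numpy as np


def sci(x, labels, classes):
    """Sparsity concentration index (Wright et al., 2009), assuming k >= 2 classes."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    k = len(classes)
    total = np.sum(np.abs(x))
    if total == 0:
        return 0.0  # assumed convention for an all-zero coefficient vector
    per_class = np.array([np.sum(np.abs(x[labels == c])) for c in classes])
    return (k * per_class.max() / total - 1.0) / (k - 1.0)
```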
The analysis of the classification results with different models
Both the sparse model incorporating the structural information and our previously used sparse model were tested and compared on our M-FISH data set. Figures 5(b) and 5(c) show the classification results of the two models on the same cell, with and without structural information, respectively. There are more isolated spots in the chromosomal regions of Figure 5(c) than in those of Figure 5(b). These isolated spots are mostly misclassifications, which are effectively corrected by the improved sparse model with structural information. The ratio of correct classification (RCC) is defined as follows:
$$\text{RCC} = \frac{\text{number of correctly classified pixels}}{\text{total number of classified pixels}} \times 100\%$$
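A sketch of the RCC computation against the ground-truth labels described earlier. Excluding the background (label 0) and overlap (label 255) pixels from the denominator is our assumption, consistent with only chromosome pixels being classified.

```python
import numpy as np


def rcc(predicted, ground_truth, ignore=(0, 255)):
    """Ratio of correct classification, in percent.

    Pixels whose ground-truth label is in `ignore` (assumed: background 0,
    overlap 255) are excluded from both numerator and denominator.
    """
    predicted = np.asarray(predicted)
    ground_truth = np.asarray(ground_truth)
    valid = ~np.isin(ground_truth, ignore)
    if not valid.any():
        return float("nan")
    return 100.0 * np.mean(predicted[valid] == ground_truth[valid])
```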
Table 2 shows the RCC of different types of chromosomes for one M-FISH image set. The RCC of the improved sparse model with structural information is generally higher than that of our previously used sparse model. Figure 7 compares the RCC of both models on each cell: the classification accuracy of the improved sparse model with structural information (in red) is higher than that of the previously used sparse model (in blue). Therefore, by using the structural information of the neighbouring region, the improved sparse model increases the classification accuracy for M-FISH image sets.
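A per-chromosome breakdown like Table 2 can be obtained by restricting the RCC to the ground-truth pixels of each class. The sketch below assumes this per-class definition (the share of class-c pixels predicted as c); the text does not spell out how Table 2 was computed.

```python
import numpy as np


def rcc_per_class(predicted, ground_truth, classes=range(1, 25)):
    """Per-class RCC (%): for each class c present in the ground truth,
    the share of its pixels that were predicted as c."""
    predicted = np.asarray(predicted)
    ground_truth = np.asarray(ground_truth)
    out = {}
    for c in classes:
        sel = ground_truth == c
        if sel.any():
            out[c] = 100.0 * np.mean(predicted[sel] == c)
    return out
```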
Significance analysis of the new sparse model with structural information
A paired-sample t-test was performed to assess the significance of the difference between the two models, with the null hypothesis that there is no difference between them. Figure 8 shows the results of this analysis, based on the per-cell results in Figure 7. The improved sparse model with structural information has a higher mean RCC and a smaller standard deviation, 76.72 ± 9.3% (the left box plot in Figure 8), than the previous sparse model, 72.94 ± 9.82% (the right box plot in Figure 8). The p-value of this test is less than $10^{-6}$. Therefore, by incorporating the structural information available in the neighbourhood of each pixel, the improved sparse model significantly outperforms our previous sparse model.
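The paired t statistic is computed from the per-cell differences $d_j$ between the two models: $t = \bar{d} / (s_d / \sqrt{n})$ with $n - 1$ degrees of freedom. The NumPy sketch below computes only the statistic; the p-value then follows from the t distribution (SciPy's `scipy.stats.ttest_rel(a, b)` returns both). No per-cell data from this study are reproduced here.

```python
import numpy as np


def paired_t(a, b):
    """Paired-sample t statistic for scores a and b measured on the same cells.

    Returns (t, degrees_of_freedom).
    """
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = d.size
    # Sample standard deviation (ddof=1) of the differences.
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    return t, n - 1
```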
Effects of parameters used
The improved sparse model involves three important parameters: the neighbourhood size (s), the sparsity level ($K_0$) and the training sample size ($N_i$). Because these parameters affect the classification accuracy, it is worthwhile to study their effects. Figure 9 shows how the RCC is affected by different values of $K_0$ and s. When $K_0$ is fixed, the RCC rises with increasing neighbourhood size s up to a certain threshold (e.g., s = 121). This indicates that using correlated information within a window generally increases the classification accuracy; however, if the window is too large, it is likely to include irrelevant pixels or pixels from other chromosomes, which tends to increase the classification error. An appropriate window size is therefore needed, and a neighbourhood size of s = 9 (a 3 × 3 window) is recommended based on our experiments. Figure 9 also shows that, when s is fixed, a smaller sparsity level $K_0$ gives a higher classification accuracy.
In addition, as shown in Figure 10, the RCC of both models is affected by the training sample size $N_i$. Several percentages of training samples were tested: 1%, 3%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40% and 50%. In Figure 10, the RCCs of the two models are represented by stars and triangles, respectively. The results show that, as expected, the RCC of both models increases with the size of the training set.
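Drawing a given percentage of training pixels to build $A = [A_1, \dots, A_m]$ can be sketched as follows. Sampling the same fraction within each class (stratified sampling) and keeping at least one pixel per class are assumptions; the text states only that pixels were sampled randomly. The remaining pixels form the test set.

```python
import numpy as np


def sample_training(features, labels, fraction, rng):
    """Randomly draw `fraction` of the pixels of every class to form A.

    features: (n, P) pixel spectra; labels: (P,) class labels.
    Returns A (n x N) with columns grouped by class, the column labels,
    and a boolean mask marking the training pixels.
    """
    labels = np.asarray(labels)
    train = np.zeros(labels.size, dtype=bool)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        k = max(1, int(round(fraction * idx.size)))  # assumed: at least one pixel per class
        train[rng.choice(idx, size=k, replace=False)] = True
    order = np.flatnonzero(train)
    # Group columns by class so that A = [A_1, ..., A_m].
    order = order[np.argsort(labels[order], kind="stable")]
    return features[:, order], labels[order], train


rng = np.random.default_rng(0)
```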
Conclusions and discussion
The sparse-model-based classifier that we proposed before used pixel-by-pixel classification and overlooked structural information, so its results contained many more isolated spots, leading to lower classification accuracy. In this paper we proposed an improved sparse model in which the information of a central pixel and of its neighbouring pixels is used simultaneously for improved classification. This was validated by comparing the chromosome classification accuracy of the two models on a real M-FISH database. The comparison (as illustrated by Figure 5) shows that there are more isolated spots (i.e., misclassifications) in the classification results of our previous model than in those of the new sparse model incorporating the structural information. The RCC values in Table 2 also show the improved accuracy of the improved sparse model. The statistical comparison between the two models indicates that the new sparse model with structural information is superior to the previously used sparse model, with a p-value less than $10^{-6}$. We also investigated the effects of the model parameters on the classification accuracy. We have shown how the sparsity level ($K_0$), the neighbourhood size (s) and the training sample size ($N_i$) affect the RCC of our improved sparse model incorporating structural information, and how the training sample size ($N_i$) affects the RCC of both our previously used model and the improved model. Based on our experiments, a sparsity level $K_0 \le 5$ and a neighbourhood size s = 9 are recommended.
In summary, all the results show that our proposed improved sparse model incorporating structural information significantly improves the classification accuracy compared with the general sparse model that we proposed before. This will in turn improve the M-FISH imaging technique for detecting chromosomal abnormalities and help to better diagnose genetic diseases and cancers.
Schrock E, du Manoir S, Veldman T, Schoell B, Wienberg J, Ferguson-Smith MA, Ning Y, Ledbetter DH, Bar-Am I, Soenksen D, et al.: Multicolor spectral karyotyping of human chromosomes. Science. 1996, 273 (5274): 494-497. 10.1126/science.273.5274.494.
Speicher MR, Ballard SG, Ward DC: Karyotyping human chromosomes by combinatorial multi-fluor FISH. Nat Genet. 1996, 12 (4): 368-375. 10.1038/ng0496-368.
Choi H, Bovik AC, Castleman KR: Feature normalization via expectation maximization and unsupervised nonparametric classification for M-FISH chromosome images. IEEE Trans Med Imaging. 2008, 27 (8): 1107-1119.
Choi H, Castleman K, Bovik A: Joint segmentation and classification of M-FISH chromosome images. Conf Proc IEEE Eng Med Biol Soc. 2004, 3: 1636-1639.
Cao HB, Deng HW, Wang YP: Segmentation of M-FISH Images for Improved Classification of Chromosomes With an Adaptive Fuzzy C-means Clustering Algorithm. IEEE T Fuzzy Syst. 2012, 20 (1): 1-8.
Karvelis PS, Fotiadis DI, Tsalikakis DG, Georgiou IA: Enhancement of multichannel chromosome classification using a region-based classifier and vector median filtering. IEEE Trans Inf Technol Biomed. 2009, 13 (4): 561-570.
Karvelis PS, Tzallas AT, Fotiadis DI, Georgiou I: A multichannel watershed-based segmentation method for multispectral chromosome classification. IEEE Trans Med Imaging. 2008, 27 (5): 697-708.
Sampat MP, Bovik AC, Aggarwal JK, Castleman KR: Pixel-by-pixel classification of MFISH images. 24th IEEE Ann Intern Conf (EMBS). 2002, Houston, TX, 2: 999-1000.
Schwartzkopf WC, Bovik AC, Evans BL: Maximum-likelihood techniques for joint segmentation-classification of multispectral chromosome images. IEEE T Med Imaging. 2005, 24 (12): 1593-1610.
Sampat MP, Bovik AC, Aggarwal JK, Castleman KR: Supervised parametric and non-parametric classification of chromosome images. Pattern Recogn. 2005, 38 (8): 1209-1223. 10.1016/j.patcog.2004.09.010.
Wang YP, Castleman KR: Normalization of multicolor fluorescence in situ hybridization (M-FISH) images for improving color karyotyping. Cytom Part A. 2005, 64A (2): 101-109. 10.1002/cyto.a.20116.
Cao HB, Deng HW, Li M, Wang YP: Classification of Multicolor Fluorescence In Situ Hybridization (M-FISH) Images With Sparse Representation. IEEE T Nanobiosci. 2012, 11 (2): 111-118.
Simoncelli EP, Olshausen BA: Natural image statistics and neural representation. Annu Rev Neurosci. 2001, 24: 1193-1216. 10.1146/annurev.neuro.24.1.1193.
Li Y, Cichocki A, Amari S: Analysis of sparse representation and blind source separation. Neural Comput. 2004, 16 (6): 1193-1234. 10.1162/089976604773717586.
Mallat SG, Zhang ZF: Matching Pursuits with Time-Frequency Dictionaries. IEEE T Signal Proces. 1993, 41 (12): 3397-3415. 10.1109/78.258082.
Tropp JA, Gilbert AC: Signal recovery from random measurements via orthogonal matching pursuit. IEEE T Inform Theory. 2007, 53 (12): 4655-4666.
Donoho DL, Tsaig Y: Fast Solution of ℓ1-Norm Minimization Problems When the Solution May Be Sparse. IEEE T Inform Theory. 2008, 54 (11): 4789-4812.
Kim JM, Lee OK, Ye JC: Compressive MUSIC: Revisiting the Link Between Compressive Sensing and Array Signal Processing. IEEE T Inform Theory. 2012, 58 (1): 278-301.
Mishali M, Eldar YC: Blind Multiband Signal Reconstruction: Compressed Sensing for Analog Signals. IEEE T Signal Proces. 2009, 57 (3): 993-1009.
Lee O, Kim JM, Bresler Y, Ye JC: Compressive diffuse optical tomography: noniterative exact reconstruction using joint sparsity. IEEE Trans Med Imaging. 2011, 30 (5): 1129-1142.
Li J, Lin D, Cao H, Wang Y: Classification of multicolor fluorescence in-situ hybridization (M-FISH) image using structure based sparse representation model. Bioinformatics and Biomedicine (BIBM), 2012 IEEE International Conference on: 4-7 October 2012. 2012, 1-6. 10.1109/BIBM.2012.6392672.
Chen Y, Nasrabadi NM, Tran TD: Hyperspectral Image Classification Using Dictionary-Based Sparse Representation. IEEE T Geosci Remote. 2011, 49 (10): 3973-3985.
Tropp JA, Gilbert AC, Strauss MJ: Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit. Signal Process. 2006, 86 (3): 572-588. 10.1016/j.sigpro.2005.05.030.
M-Fish Database website. [https://sites.google.com/site/xiaobaocao006/database-for-download]
Wright J, Yang AY, Ganesh A, Sastry SS, Ma Y: Robust Face Recognition via Sparse Representation. IEEE T Pattern Anal. 2009, 31 (2): 210-227.
Based on "Classification of multicolor fluorescence in-situ hybridization (M-FISH) image using structure based sparse representation model", by Jingyao Li, Dongdong Lin, Hongbao Cao and Yu-Ping Wang, which appeared in the 2012 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). © 2012 IEEE. This work has been partially supported by the NIH and NSF.
The publication costs for this article were funded by the corresponding author.
This article has been published as part of BMC Systems Biology Volume 7 Supplement 4, 2013: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2012: Systems Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcsystbiol/supplements/7/S4.
The authors declare that they have no competing interests.
JL, DL and YPW designed the research. JL designed the algorithm. HC performed the segmentation. All authors read and approved the final manuscript.