Spatially-Aware Clustering of Ion Images in Mass Spectrometry Imaging Data Through the Use of Pre-trained Neural Networks
This project concerns improving unsupervised methods for clustering ion images, which makes it possible to identify m/z bins with similar spatial expression. All data analyses shown here were performed by Wanqiu Zhang under the supervision of Nico Verbeeck and Marc Claesen for all bioinformatics aspects.
Most clustering approaches for MSI data focus on clustering spectra, i.e., grouping pixels with comparable chemical content, which is a different problem from the one we address here.
In this work we specifically aim to improve clustering of ion images by exploiting both global and local spatial structure. The simplest approach is to run a clustering algorithm directly on raw image vectors, but this does not incorporate spatial information: each pixel is treated as an independent feature, regardless of where it sits in the image. As a result, the clusterings usually focus exclusively on global trends - which is often good enough - but miss salient local patterns.
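The lack of spatial information in raw image vectors can be made concrete with a small sketch. It assumes ion images are stored as a NumPy array of shape (n_images, height, width); the array names, sizes and random data are hypothetical and not taken from the study. Applying the same pixel shuffle to every image destroys all spatial structure, yet leaves every pairwise Euclidean distance unchanged, so a distance-based clustering of the flattened vectors cannot tell the difference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stack of ion images: 6 m/z bins, 32 x 32 pixels each (random data).
ion_images = rng.random((6, 32, 32))

# "Raw image vectors": one flattened row per ion image.
raw_vectors = ion_images.reshape(len(ion_images), -1)


def pairwise_distances(x):
    """Euclidean distance between every pair of rows."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Apply one fixed pixel permutation to all images, scrambling their spatial layout.
permutation = rng.permutation(raw_vectors.shape[1])
scrambled_vectors = raw_vectors[:, permutation]

original_d = pairwise_distances(raw_vectors)
scrambled_d = pairwise_distances(scrambled_vectors)

# The two distance matrices agree up to floating-point rounding.
distances_match = np.allclose(original_d, scrambled_d)
print(distances_match)
```

Any method that only sees these distances therefore treats an ion image and its scrambled counterpart identically, which is why local spatial patterns are easily missed.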
We leverage neural networks because they are known for their aptitude in detecting salient, localized image features. However, large neural networks have a large number of parameters to learn, so training them is computationally demanding and prone to overfitting given a limited sample such as the ion images of a single MSI dataset.
To avoid these problems, we use a general-purpose pre-trained network - Xception (freely available) - as a high-level image feature detector. The key idea is that clustering based on the embeddings produced by this network for each ion image, rather than the raw intensity data of these ion images, drives the clustering algorithm to identify structures based on spatial features without any modifications to the clustering pipeline itself. To test whether our hypothesis holds, we compared a pipeline with and without neural network embeddings, as shown in Figure 1.
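As a rough illustration of using a pre-trained network as a feature detector, the sketch below embeds ion images with the Xception model shipped in Keras. Several choices here are assumptions rather than details of the study: the min-max scaling to [0, 255], the nearest-neighbour resize to 299 x 299, repeating the single intensity channel three times, and taking the globally averaged final convolutional features as the embedding. The function names are hypothetical.

```python
import numpy as np


def to_xception_input(ion_image, size=299):
    """Turn one 2-D ion image into a (size, size, 3) array with values in [0, 255]."""
    img = np.asarray(ion_image, dtype=np.float32)
    lo, hi = float(img.min()), float(img.max())
    # Min-max scaling is an assumption; a flat image maps to all zeros.
    scaled = (img - lo) / (hi - lo) * 255.0 if hi > lo else np.zeros_like(img)
    # Nearest-neighbour resize via integer index arrays (keeps this step NumPy-only).
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    resized = scaled[rows][:, cols]
    # Xception expects three channels, so the intensity channel is repeated.
    return np.repeat(resized[:, :, None], 3, axis=2)


def embed_ion_images(ion_images):
    """Return one 2048-dimensional Xception embedding per ion image."""
    # Imported lazily: needs TensorFlow and downloads ImageNet weights on first use.
    import tensorflow as tf

    model = tf.keras.applications.Xception(
        weights="imagenet", include_top=False, pooling="avg"
    )
    batch = np.stack([to_xception_input(img) for img in ion_images])
    # Xception's own preprocessing rescales [0, 255] inputs to [-1, 1].
    batch = tf.keras.applications.xception.preprocess_input(batch)
    return model.predict(batch, verbose=0)
```

No weights are trained here: the network is used purely as a fixed function from images to feature vectors, which is what sidesteps the training cost and overfitting risk on a single dataset.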
Figure 1: Illustration of the two data analysis pipelines used in this work. In the UMAP-only model (left column), the raw image vectors were fed into UMAP and then clustered using DBSCAN. In contrast, the NN-based model embeds raw image vectors with the pretrained Xception model and then feeds those embeddings into UMAP followed by DBSCAN clustering.
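The two pipelines in Figure 1 differ only in what is fed into UMAP, which the following sketch tries to make explicit. It assumes the umap-learn and scikit-learn packages; the function names and all hyperparameter values (n_components, eps, min_samples) are hypothetical placeholders, not the settings used in this work. The reducer and clusterer can be passed in, so the shared part of the pipeline is literally the same code.

```python
import numpy as np


def cluster_features(features, reducer=None, clusterer=None):
    """Reduce dimensionality, then cluster; returns one integer label per row."""
    if reducer is None:
        import umap  # provided by the umap-learn package

        reducer = umap.UMAP(n_components=2, random_state=0)  # hypothetical settings
    if clusterer is None:
        from sklearn.cluster import DBSCAN

        clusterer = DBSCAN(eps=0.5, min_samples=5)  # hypothetical settings
    low_dim = reducer.fit_transform(np.asarray(features))
    return clusterer.fit_predict(low_dim)


def umap_only_pipeline(ion_images, **kwargs):
    """Left column of Figure 1: raw image vectors -> UMAP -> DBSCAN."""
    raw_vectors = np.asarray(ion_images).reshape(len(ion_images), -1)
    return cluster_features(raw_vectors, **kwargs)


def nn_based_pipeline(ion_images, embed, **kwargs):
    """Right column of Figure 1: embeddings -> UMAP -> DBSCAN.

    `embed` is any callable mapping ion images to an (n_images, n_features)
    array, for example a wrapper around a pre-trained Xception model.
    """
    return cluster_features(embed(ion_images), **kwargs)
```

Note that scikit-learn's DBSCAN assigns the label -1 to points it does not place in any cluster, so downstream code should treat that label separately from regular cluster indices.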
Both pipelines produced a fairly good clustering overall. We observed that the NN-based model identifies more clusters, and that it specifically manages to pull apart ion images with comparable global structure but clear, localized differences, as we had hoped.
Interestingly, both pipelines usually end up grouping all ion images with fairly little spatial expression (i.e., noise images) together in a single cluster. For the NN-based model, this cluster of noisy images was consistently smaller than for the UMAP model.
To illustrate the difference between both pipelines, Figure 2 shows some ion images and their associated clusters for both approaches. The image shows sets of ion images corresponding to three isotopic distributions (1 Da apart), so each set should at least be clustered together and, ideally, in a cluster with comparable overall spatial expression.
Figure 2: Clustering outcomes for both pipelines for ion images of different isotopes. Note that these three sets of ion images are situated on single TOF peaks, but still exhibit a clearly distinct spatial expression. The data was acquired using a rapifleX MALDI-TOF instrument at 10 µm spatial resolution from human lymphoma tissue.
Figure 2 clearly shows the merits of using a neural network to detect spatial features. In the NN-based approach, ion images from the same isotope distribution are clustered together and the mean cluster images show a comparable spatial expression. For the UMAP-only approach this is not the case, with 8 out of 9 ion images being clustered together in a cluster without a clear spatial expression.
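The first criterion above, that ion images from the same isotope distribution should share a cluster, can be checked with a few lines of code. The sketch below is a hypothetical helper, and the labels and index sets are made up for illustration rather than taken from Figure 2. It does not assess the second criterion, namely whether the assigned cluster has a comparable spatial expression, which still requires inspecting the mean cluster images.

```python
from collections import Counter


def isotope_set_consistency(labels, isotope_sets):
    """For each isotope set, return its most common cluster label and the
    fraction of the set's ion images that carry that label."""
    report = {}
    for name, indices in isotope_sets.items():
        counts = Counter(labels[i] for i in indices)
        top_label, top_count = counts.most_common(1)[0]
        report[name] = (top_label, top_count / len(indices))
    return report


# Made-up cluster labels for 9 ion images forming three isotope sets of three.
labels = [0, 0, 0, 1, 1, 1, 2, 2, 1]
isotope_sets = {"set A": [0, 1, 2], "set B": [3, 4, 5], "set C": [6, 7, 8]}

report = isotope_set_consistency(labels, isotope_sets)
for name, (label, fraction) in report.items():
    print(f"{name}: cluster {label}, fraction {fraction:.2f}")
```

A fraction of 1.0 for every set is necessary but not sufficient: a clustering that lumps all sets into one uninformative cluster would also score perfectly, which is the failure mode seen for the UMAP-only approach.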
For more information, please refer to the poster below or feel free to contact us.
Wanqiu Zhang (1,2), Nico Verbeeck (1,2), Thomas Moerman (2), Etienne Waelkens (3), Marc Claesen (1,2), Bart De Moor (1). Spatially-Aware Clustering of Ion Images in Mass Spectrometry Imaging Data Through the Use of Pre-trained Neural Networks, ASMS 2020 Reboot, online, 2020.
(1) KU Leuven, ESAT-STADIUS, Leuven, Belgium
(2) Aspect Analytics NV, Genk, Belgium
(3) KU Leuven, Dept. of Cellular and Molecular Medicine, Leuven, Belgium