In this work, we explored the utility of a recently introduced nonlinear dimensionality reduction method, Uniform Manifold Approximation and Projection (UMAP), for MSI data analysis. We compared UMAP to PCA and t-SNE, another widespread nonlinear dimensionality reduction approach that is increasingly used for MSI data analysis. The work was primarily carried out by Tina Smets, with input and supervision from Nico Verbeeck and Marc Claesen on the data analysis aspects.
Figure 1: Visualizations of two human lymphoma tissues using different approaches. The top row shows the resulting three-dimensional embedding encoded in the color channels of each image. The bottom row shows the associated embedding space. Euclidean distance was used for all methods that allow a choice of distance metric. The data was acquired using a rapifleX MALDI-TOF instrument at 10 µm spatial resolution.
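As a rough illustration of how a three-dimensional embedding can be encoded in the color channels of an image, the sketch below rescales each embedding axis to [0, 1] and writes the result into the red, green, and blue channels at each pixel position. It is not the code used in the paper: the function name `embedding_to_rgb`, the per-axis min-max scaling, and the toy random data are all assumptions made for this example.

```python
import numpy as np

def embedding_to_rgb(embedding, rows, cols, shape):
    """Map an (n_pixels, 3) embedding onto an RGB image of size shape = (height, width).

    Hypothetical helper: rows/cols give the image position of each embedded pixel.
    """
    embedding = np.asarray(embedding, dtype=float)
    lo = embedding.min(axis=0)
    span = embedding.max(axis=0) - lo
    span[span == 0] = 1.0  # guard against constant axes
    scaled = (embedding - lo) / span  # each axis now lies in [0, 1]
    image = np.zeros((*shape, 3))  # pixels without a spectrum stay black
    image[rows, cols] = scaled
    return image

# Toy stand-in for a real embedding: one 3-D point per pixel of a 4 x 5 image.
rng = np.random.default_rng(0)
rows, cols = np.nonzero(np.ones((4, 5), dtype=bool))
embedding = rng.normal(size=(rows.size, 3))
rgb = embedding_to_rgb(embedding, rows, cols, (4, 5))
```

With this encoding, pixels that lie close together in the embedding space receive similar colors, which is what makes the top-row images interpretable.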
Specifically, our results illustrate that UMAP and t-SNE yield comparable results, which are clearly superior to simpler linear methods like PCA. Compared to t-SNE, however, UMAP provides significant computational advantages, namely:
- UMAP shows dramatically reduced computation time compared to t-SNE. In our experiments, we’ve observed an order of magnitude speedup of UMAP compared to the well-known Barnes-Hut approximation of t-SNE.
- In contrast to t-SNE, UMAP enables out-of-sample prediction, which means that the model can be used to embed data it was not trained on. This is a critical advantage for many applications.
In addition to investigating UMAP itself, we compared various distance metrics for MSI data. The results are shown in the figure below.
Figure 2: UMAP-based visualizations of two human lymphoma tissues using various distance metrics.
Comparing Figures 1 and 2, we can clearly see that distance metrics like cosine and correlation distance model chemical similarity across spectra better than standard Euclidean distance. The main mathematical weaknesses of the Euclidean distance in this setting are its sensitivity to outliers (in this case, m/z bins with very high intensity compared to the rest) and its well-known problems in sparse, high-dimensional spaces.
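One simple way to see how a magnitude-sensitive distance can mislead is a toy example with three made-up spectra. Spectra `a` and `b` share the same relative profile but differ tenfold in overall intensity, while `c` has a different profile at a similar intensity to `a`. The values and function names below are assumptions chosen for illustration, not data or code from the paper.

```python
import numpy as np

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

def cosine_distance(a, b):
    # 1 - cosine similarity: depends only on the angle between the spectra
    return float(1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def correlation_distance(a, b):
    # Cosine distance after mean-centering, i.e. 1 - Pearson correlation
    return cosine_distance(a - a.mean(), b - b.mean())

a = np.array([1.0, 2.0, 3.0, 4.0])  # reference spectrum (toy values)
b = 10.0 * a                        # same profile, 10x the intensity
c = np.array([4.0, 3.0, 2.0, 1.0])  # different profile, similar intensity

for name, dist in [("euclidean", euclidean),
                   ("cosine", cosine_distance),
                   ("correlation", correlation_distance)]:
    print(f"{name:12s} d(a,b)={dist(a, b):.3f}  d(a,c)={dist(a, c):.3f}")
```

In this toy case, Euclidean distance places `a` closer to `c` than to `b`, whereas cosine and correlation distance both treat `a` and `b` as essentially identical and `c` as different.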
Finally, during our investigation we identified a region of outlier pixels that skewed the UMAP analysis such that the dynamic range of colors was poorly used. After removing the impact of these outliers, we managed to improve our visualizations further.
Figure 3: UMAP-based visualization and associated embedding of a human lymphoma tissue after removing the influence of outlier pixels.
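The effect of a few outlier pixels on the color range can be sketched as follows. This summary does not describe how the outliers were handled in the paper, so the percentile clipping shown here is only one possible approach, and the function name `robust_rescale`, the 1st/99th percentile bounds, and the synthetic data are all assumptions for this example.

```python
import numpy as np

def robust_rescale(embedding, lower=1.0, upper=99.0):
    """Rescale each embedding axis to [0, 1] using percentiles instead of min/max.

    Hypothetical helper: values outside the percentile range are clipped.
    """
    embedding = np.asarray(embedding, dtype=float)
    lo = np.percentile(embedding, lower, axis=0)
    hi = np.percentile(embedding, upper, axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant axes
    return np.clip((embedding - lo) / span, 0.0, 1.0)

# Synthetic embedding: 1000 pixels, 5 of which sit far away from the bulk.
rng = np.random.default_rng(0)
embedding = rng.normal(size=(1000, 3))
embedding[:5] += 50.0

# Naive min-max scaling lets the outliers compress the bulk into a narrow band.
value_range = embedding.max(axis=0) - embedding.min(axis=0)
naive = (embedding - embedding.min(axis=0)) / value_range
robust = robust_rescale(embedding)

print("spread of bulk colors, naive: ", naive[5:].std())
print("spread of bulk colors, robust:", robust[5:].std())
```

With naive scaling, the bulk of the pixels occupies only a small part of each color channel; with the clipped scaling, the same pixels spread over most of the available range.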
We were the first to apply UMAP to MSI data and showed that it has significant potential for MSI data analysis. Since our publication, UMAP has received a lot of attention from the MSI community and is now being adopted as a staple approach for MSI dimensionality reduction.
Evaluation of Distance Metrics and Spatial Autocorrelation in Uniform Manifold Approximation and Projection Applied to Mass Spectrometry Imaging Data
Tina Smets 1, Nico Verbeeck 1,2, Marc Claesen 1,2, Arndt Asperger 3, Gerard Griffioen 4, Thomas Tousseyn 5, Wim Waelput 6, Etienne Waelkens 7, Bart De Moor 1
Analytical Chemistry 91:5706-5714, 2019
1. STADIUS Center for Dynamical Systems, Signal Processing, and Data Analytics, Department of Electrical Engineering (ESAT), KU Leuven, 3001 Leuven, Belgium
2. Aspect Analytics NV, C-mine 12, 3600 Genk, Belgium
3. Bruker Daltonik GmbH, Fahrenheitstrasse 4, 28359 Bremen, Germany
4. reMYND, Bio-Incubator, Gaston Geenslaan 1, 3000 Leuven, Belgium
5. Department of Pathology, University Hospitals KU Leuven, 3001 Leuven, Belgium
6. Department of Pathology, UZ-Brussel, 1000 Brussels, Belgium
7. Department of Cellular and Molecular Medicine, KU Leuven, 3000 Leuven, Belgium