Two-Tier Mapper, an unbiased topology-based clustering method for enhanced global gene expression analysis

Motivation: Unbiased clustering methods are needed to analyze growing numbers of complex datasets. Currently available clustering methods often depend on parameters set by the user, lack stability, and are not applicable to small datasets. To overcome these shortcomings we used topological data analysis, an emerging field of mathematics with a wide range of applications that can discern additional features and uncover hidden insights in datasets.

Results: We have developed a topology-based clustering method called Two-Tier Mapper (TTMap) for enhanced analysis of global gene expression datasets. First, TTMap discerns divergent features in the control group, adjusts for them, and identifies outliers. Second, the deviation of each test sample from the control group in a high-dimensional space is computed, and the test samples are clustered using a new Mapper-based topological algorithm at two levels: a global tier and local tiers. All parameters are either carefully chosen or data-driven, avoiding any user-induced bias. The method is stable, different datasets can be combined for analysis, and significant subgroups can be identified. It outperforms current clustering methods in sensitivity and stability on synthetic and biological datasets, in particular when sample sizes are small; the outcome is not affected by removal of control samples, by choice of normalization, or by subselection of data. TTMap is readily applicable to complex, highly variable biological samples and holds promise for personalized medicine.
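To make the second step more concrete, the following is a minimal sketch of a generic Mapper-style clustering applied to deviations from a control group. It is not the TTMap implementation: the control adjustment and outlier detection of the first step are omitted, and the control summary (per-gene mean), the filter function (total absolute deviation), and the parameters `n_intervals`, `overlap`, and `eps` are all assumptions chosen for illustration, whereas the abstract states that TTMap's parameters are carefully chosen or data-driven.

```python
import numpy as np


def deviation_from_control(control, test):
    """Deviation of each test sample from the per-gene control mean.

    Using the mean as the control summary is an assumption of this sketch.
    Rows are samples, columns are genes.
    """
    return test - control.mean(axis=0)


def connected_components(points, eps):
    """Single-linkage clustering: samples closer than eps share a label."""
    n = len(points)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                close = np.linalg.norm(points[i] - points[j]) < eps
                if labels[j] == -1 and close:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def mapper_sketch(dev, n_intervals=3, overlap=0.25, eps=1.0):
    """Generic Mapper: cover the filter range, cluster per interval, link overlaps.

    Returns (nodes, edges). Each node is a set of sample indices; an edge
    joins two nodes that share at least one sample.
    """
    filt = np.abs(dev).sum(axis=1)  # hypothetical filter function
    lo, hi = filt.min(), filt.max()
    width = (hi - lo) / n_intervals
    if width == 0:
        width = 1.0  # all filter values equal: one degenerate interval width
    nodes = []
    for k in range(n_intervals):
        a = lo + k * width - overlap * width
        b = lo + (k + 1) * width + overlap * width
        idx = np.where((filt >= a) & (filt <= b))[0]
        if len(idx) == 0:
            continue
        labels = connected_components(dev[idx], eps)
        for c in sorted(set(labels)):
            nodes.append({int(idx[m]) for m, lab in enumerate(labels) if lab == c})
    edges = [
        (i, j)
        for i in range(len(nodes))
        for j in range(i + 1, len(nodes))
        if nodes[i] & nodes[j]
    ]
    return nodes, edges
```

Calling `mapper_sketch(deviation_from_control(control, test))` on two expression matrices with matching gene columns returns the clusters as sets of test-sample indices together with the edges between overlapping clusters. The fixed defaults here are exactly the kind of user-set parameter the abstract argues against, so they should be read as placeholders only.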
