Dimensionality reduction techniques, such as t-SNE, can construct informative visualizations of high-dimensional data. When jointly visualizing multiple data sets, a straightforward application of these methods often fails: instead of revealing underlying classes, the resulting visualizations expose dataset-specific clusters. To circumvent these batch effects, we propose an embedding procedure that uses a t-SNE visualization constructed on a reference data set as a scaffold for embedding new data points. Each data instance from a new, unseen, secondary data set is embedded independently and does not change the reference embedding. This prevents any interactions between instances in the secondary data and implicitly mitigates batch effects. We demonstrate the utility of this approach by analyzing six recently published single-cell gene expression data sets with up to tens of thousands of cells and thousands of genes. The batch effects in our studies are particularly strong, as the data come from different institutions using different experimental protocols. The visualizations constructed by our proposed approach are clear of batch effects, and the cells from secondary data sets correctly co-cluster with cells of the same type from the primary data. We also show that the predictive power of our simple, visual classification approach in t-SNE space matches the accuracy of specialized machine learning techniques that consider the entire compendium of features that profile single cells.

Two-dimensional embeddings and their visualizations may assist in the analysis and interpretation of high-dimensional data. Intuitively, two data instances should be co-located in the resulting visualization if their multi-dimensional profiles are similar.
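To make the idea of embedding into a fixed reference concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The function name `embed_into_reference` and the parameters `k`, `n_iter`, and `lr` are hypothetical, and the sketch makes simplifying assumptions: affinities use a fixed-bandwidth Gaussian kernel over the `k` nearest reference points rather than a perplexity-calibrated one, and the position is refined with plain gradient descent. What it does preserve is the key property described above: each new point is placed on its own, and the reference embedding is never modified.

```python
import numpy as np


def embed_into_reference(X_ref, Y_ref, X_new, k=10, n_iter=100, lr=0.1):
    """Place each row of X_new into the fixed embedding Y_ref, one at a time.

    X_ref: (n_ref, d) reference data; Y_ref: (n_ref, 2) its existing embedding.
    Hypothetical helper for illustration only; Y_ref is read, never written.
    """
    X_ref = np.asarray(X_ref, dtype=float)
    Y_ref = np.asarray(Y_ref, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    Y_new = np.empty((X_new.shape[0], Y_ref.shape[1]))

    for i, x in enumerate(X_new):
        # Squared distances from the new point to every reference point.
        d2 = np.sum((X_ref - x) ** 2, axis=1)
        nn = np.argsort(d2)[:k]

        # Assumption: Gaussian affinities with a simple data-driven bandwidth
        # (t-SNE proper calibrates the bandwidth from a target perplexity).
        sigma2 = d2[nn].mean() + 1e-12
        p = np.zeros(X_ref.shape[0])
        p[nn] = np.exp(-d2[nn] / (2.0 * sigma2))
        p /= p.sum()

        # Start at the affinity-weighted mean of the neighbors' positions.
        y = p @ Y_ref

        # Minimize KL(p || q) over y alone; q uses the Student-t kernel and
        # is normalized over the reference points only.
        for _ in range(n_iter):
            diff = y - Y_ref
            w = 1.0 / (1.0 + np.sum(diff ** 2, axis=1))
            q = w / w.sum()
            grad = 2.0 * ((p - q) * w) @ diff
            y = y - lr * grad

        Y_new[i] = y

    return Y_new
```

Because the loop body depends only on the reference data and the single point being placed, embedding a secondary data set all at once gives the same positions as embedding its points one by one. This is the sense in which secondary instances cannot interact with each other.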