Key Differences Between PCA, t-SNE, and UMAP for High-Dimensional Data
-
How They Work:
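- PCA: A linear method. It finds the orthogonal directions (principal components) along which the data varies most and projects the data onto them.
- t-SNE: A nonlinear method. It arranges points so that each point's close neighbors stay close, which preserves small-scale cluster structure.
- UMAP: Also nonlinear and neighbor-based, like t-SNE, but it tends to retain more of the large-scale arrangement between clusters.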
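To make the PCA idea concrete, here is a minimal sketch in plain Python that finds the single direction of greatest variance using power iteration on the covariance matrix. The helper names (`center`, `covariance`, `first_component`) and the toy points are illustrative assumptions; real libraries typically use an SVD instead.

```python
import math

def center(rows):
    """Subtract the per-feature mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - mu for x, mu in zip(row, means)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, m = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(m)]
            for i in range(m)]

def first_component(cov, iterations=200):
    """Power iteration: converges to the direction of maximum variance."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy data lying close to the line y = 2x (values are illustrative).
points = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
centered = center(points)
cov = covariance(centered)
pc1 = first_component(cov)

# Project each 2-D point onto the component to get a 1-D representation.
projected = [sum(a * b for a, b in zip(row, pc1)) for row in centered]
print("first principal component:", pc1)
print("1-D projection:", projected)
```

The component points roughly along the line the data follows, and the projection keeps more variance than either original axis does on its own. t-SNE and UMAP have no comparably short closed form, since both rely on iterative optimization over neighbor relationships.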
-
Output Size:
- PCA reduces the data to k dimensions, where k is at most the original number of features and usually much smaller.
- t-SNE usually reduces the data to 2 or 3 dimensions for visualization.
- UMAP can also produce 2 or 3 dimensions, and it handles higher-dimensional outputs when needed.
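In practice the output size is just a parameter. The sketch below assumes NumPy and scikit-learn are installed and uses synthetic random data purely to show the resulting shapes. The `umap-learn` package exposes the same `n_components` argument through `umap.UMAP`; it is left out here only to keep the dependencies small.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Synthetic data: 60 samples with 10 features each (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))

# PCA: any k up to the number of features; here k = 4.
X_pca = PCA(n_components=4).fit_transform(X)

# t-SNE: typically 2 (or 3) output dimensions for plotting.
# perplexity must be smaller than the number of samples.
X_tsne = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X)

print(X_pca.shape)   # (60, 4)
print(X_tsne.shape)  # (60, 2)
```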
-
Speed of Calculation:
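- PCA: Usually the cheapest of the three. A full SVD costs O(min(n^2·m, n·m^2)), where n is the number of data points and m is the number of features, so the cost grows linearly in n when m is small.
- t-SNE: The exact algorithm is O(n^2); the Barnes-Hut approximation brings this down to O(n log n).
- UMAP: Commonly cited as roughly O(n log n), which scales better to large datasets.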
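The gap between quadratic and n log n growth is easier to appreciate with numbers. This small sketch compares the number of point pairs (the work behind an O(n^2) method) with n·log2(n). It ignores constant factors, so it illustrates growth rates rather than predicting actual runtimes; the function names are illustrative.

```python
import math

def pairwise_pairs(n):
    """Number of distinct point pairs, the work behind an O(n^2) method."""
    return n * (n - 1) // 2

def n_log_n(n):
    """Growth term for an O(n log n) method (constant factors ignored)."""
    return n * math.log2(n)

for n in (1_000, 10_000, 100_000):
    pairs = pairwise_pairs(n)
    nlogn = n_log_n(n)
    print(f"n={n:>7}: pairs={pairs:>13,}  n*log2(n)={nlogn:>12,.0f}  "
          f"ratio={pairs / nlogn:,.0f}")
```

The ratio keeps widening as n grows, which is why approximate neighbor search matters for large datasets.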
-
What Is Preserved:
- PCA concentrates the most variance in the first few principal components.
- t-SNE mainly preserves local similarities; distances between well-separated clusters in the plot are not reliable.
- UMAP tries to preserve both local neighborhoods and the broader global structure.
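One simple way to check how well an embedding keeps close neighbors is to compare each point's k nearest neighbors before and after the reduction. The sketch below is an illustrative brute-force measure in plain Python (Python 3.8+ for `math.dist`), not a standard library metric; the function names and toy coordinates are assumptions.

```python
import math

def knn(points, k):
    """Indices of the k nearest neighbors of every point (brute force)."""
    result = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: math.dist(p, points[j]))
        result.append(set(order[:k]))
    return result

def neighbor_overlap(high, low, k):
    """Mean fraction of each point's k nearest neighbors kept in the embedding."""
    high_nn, low_nn = knn(high, k), knn(low, k)
    shared = sum(len(a & b) for a, b in zip(high_nn, low_nn))
    return shared / (k * len(high))

# Two small clusters in 3-D (toy data).
high = [(0, 0, 0), (0, 1, 0), (1, 0, 0),
        (10, 10, 0), (10, 11, 0), (11, 10, 0)]

# Embedding A drops the constant third coordinate, so all distances survive.
good = [(x, y) for x, y, _ in high]
# Embedding B interleaves the two clusters along a single axis.
bad = [(0,), (10,), (1,), (11,), (2,), (12,)]

good_score = neighbor_overlap(high, good, k=2)
bad_score = neighbor_overlap(high, bad, k=2)
print("overlap, embedding A:", good_score)
print("overlap, embedding B:", bad_score)
```

A score near 1 means local neighborhoods survived the reduction. This measure says nothing about global structure, which would need a separate check such as comparing distances between cluster centers.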
-
When to Use Them:
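- PCA: Good for straightforward linear dimensionality reduction, for example as a preprocessing step.
- t-SNE: Well suited to visualizing complex, high-dimensional data in 2 or 3 dimensions.
- UMAP: Useful when the same embedding serves both clustering and visualization.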