
Explore possibilities for better t-SNE convergence #6

@ItsLastDay

Description


t-SNE, in essence, minimizes the Kullback-Leibler divergence between neighbour distributions in the high- and low-dimensional spaces. The minimum possible value of the KL divergence is 0, attained when the distributions are identical.
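For reference, the objective as stated in van der Maaten and Hinton's original paper, where $p_{ij}$ and $q_{ij}$ are the pairwise similarities in the high- and low-dimensional spaces:

$$
KL(P \,\|\, Q) = \sum_{i \ne j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
$$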

However, on our data bhtsne does not yield a zero- or near-zero-divergence map. For a small set of tags (~300), the observed divergence is about 1.1. For the full set of tags it is as high as 5.0, and additional iterations bring only small improvements.
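For context, a minimal sketch of how such divergence values can be read off. This uses scikit-learn's TSNE rather than bhtsne (an assumption for illustration: sklearn exposes the final cost programmatically via `kl_divergence_`, whereas bhtsne only prints the error to stdout), and `X` is a stand-in for our tag feature matrix:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.random((300, 50))          # placeholder for ~300 tag vectors

tsne = TSNE(n_components=2, perplexity=30, random_state=0)
embedding = tsne.fit_transform(X)
print(f"final KL divergence: {tsne.kl_divergence_:.3f}")
```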

This can mislead users who try to draw conclusions from the visualization. There should be some way to improve convergence in our case. Possible steps:

  • perform multiple maps t-SNE, as suggested by this video. It may turn out to be infeasible because that approach does not scale (i.e. the Barnes-Hut optimization does not apply to it). Search for an open-source implementation or write one;
  • much research has been done on visualizing specific classes of data. We can find analogues and borrow tricks from existing approaches. For instance, our problem closely resembles "given n authors and k papers by them, visualize all authors based on co-authorship of papers";
  • we can experiment with the conversion from tag-tag post counts to edge weights in the tag graph (one candidate conversion is sketched after this list). Some approaches can be inferred from similar work. We can also look here and here at an attempt to visualize SO tags, though at a much smaller scale;
  • maybe visualizing intermediate steps of t-SNE can give insight into how to improve convergence (see the second sketch below for one way to capture snapshots).
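On the count-to-weight conversion, here is one hedged candidate: positive pointwise mutual information (PPMI), which discounts ubiquitous tags that co-occur with nearly everything. The matrix name `counts` and the function itself are my own illustration, not existing project code:

```python
import numpy as np

def ppmi_weights(counts: np.ndarray) -> np.ndarray:
    """PPMI weights from a symmetric tag-tag co-occurrence matrix.

    counts[i, j] = number of posts tagged with both tag i and tag j.
    """
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)   # marginal count of tag i
    col = counts.sum(axis=0, keepdims=True)   # marginal count of tag j
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0              # log(0) -> no association
    return np.maximum(pmi, 0.0)               # keep positive associations only
```

Feeding these weights into t-SNE (after a monotone weight-to-distance transform) instead of raw counts might flatten the heavy-tailed count distribution that could be hurting convergence.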
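And for the intermediate-steps idea, a sketch under the assumption that we can switch to scikit-learn for the experiment: run the optimizer in short chunks and re-initialize each chunk from the previous embedding (bhtsne does not expose intermediates; sklearn's `init` parameter accepts an array, which makes this chaining possible):

```python
import numpy as np
from sklearn.manifold import TSNE

def tsne_snapshots(X, chunks=4, iters_per_chunk=250, seed=0):
    """Collect (embedding, kl_divergence) pairs after each optimization chunk.

    Note: each chunk restarts early exaggeration, so this approximates
    a single long run rather than reproducing it exactly.
    """
    snapshots = []
    init = "pca"
    for _ in range(chunks):
        tsne = TSNE(n_components=2, init=init, random_state=seed,
                    n_iter=iters_per_chunk)   # named `max_iter` in newer sklearn
        emb = tsne.fit_transform(X)
        snapshots.append((emb, tsne.kl_divergence_))
        init = emb                            # resume from the previous layout
    return snapshots
```

Plotting the snapshots side by side should show whether the map stalls during early exaggeration or only later in the optimization.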
