
Explore possibilities for better t-SNE convergence #6

@ItsLastDay

Description


t-SNE, in essence, minimizes the Kullback-Leibler divergence between neighbour distributions in the high- and low-dimensional spaces. The minimum possible value of the KL divergence is 0, attained when the distributions are identical.
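For reference, the objective as stated in van der Maaten and Hinton's original paper, where $p_{ij}$ and $q_{ij}$ are the pairwise similarities in the high- and low-dimensional spaces:

$$
KL(P \,\|\, Q) = \sum_{i \ne j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
$$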

However, on our data bhtsne does not yield a zero- or near-zero-divergence map. For a small set of tags (~300), the observed divergence is about 1.1. For the full set of tags it is as high as 5.0, and additional iterations bring only small improvements.
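For context, a minimal sketch of how such divergence values can be read off. This uses scikit-learn's TSNE rather than bhtsne (an assumption for illustration: sklearn exposes the final cost programmatically via `kl_divergence_`, whereas bhtsne only prints the error to stdout), and `X` is a stand-in for our tag feature matrix:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.random((300, 50))          # placeholder for ~300 tag vectors

tsne = TSNE(n_components=2, perplexity=30, random_state=0)
embedding = tsne.fit_transform(X)
print(f"final KL divergence: {tsne.kl_divergence_:.3f}")
```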

This can mislead users who try to draw conclusions from the visualization. There should be some way to improve convergence in our case. Possible steps:

  • perform multiple maps t-SNE, as suggested by this video. It may turn out to be infeasible because that approach does not scale (i.e. the Barnes-Hut optimization does not apply to it). Search for an open-source implementation or write one;
  • much research has been done on visualizing specific classes of data. We can find analogues and borrow tricks from existing approaches. For instance, our problem closely resembles "given n authors and k papers by them, visualize all authors based on co-authorship of papers";
  • we can experiment with the conversion from tag-tag post counts to edge weights in the tag graph (one candidate conversion is sketched after this list). Some approaches can be inferred from similar work. We can also look here and here at an attempt to visualize SO tags, though at a much smaller scale;
  • maybe visualizing intermediate steps of t-SNE can give insight into how to improve convergence (see the second sketch below for one way to capture snapshots).
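On the count-to-weight conversion, here is one hedged candidate: positive pointwise mutual information (PPMI), which discounts ubiquitous tags that co-occur with nearly everything. The matrix name `counts` and the function itself are my own illustration, not existing project code:

```python
import numpy as np

def ppmi_weights(counts: np.ndarray) -> np.ndarray:
    """PPMI weights from a symmetric tag-tag co-occurrence matrix.

    counts[i, j] = number of posts tagged with both tag i and tag j.
    """
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)   # marginal count of tag i
    col = counts.sum(axis=0, keepdims=True)   # marginal count of tag j
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0              # log(0) -> no association
    return np.maximum(pmi, 0.0)               # keep positive associations only
```

Feeding these weights into t-SNE (after a monotone weight-to-distance transform) instead of raw counts might flatten the heavy-tailed count distribution that could be hurting convergence.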
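And for the intermediate-steps idea, a sketch under the assumption that we can switch to scikit-learn for the experiment: run the optimizer in short chunks and re-initialize each chunk from the previous embedding (bhtsne does not expose intermediates; sklearn's `init` parameter accepts an array, which makes this chaining possible):

```python
import numpy as np
from sklearn.manifold import TSNE

def tsne_snapshots(X, chunks=4, iters_per_chunk=250, seed=0):
    """Collect (embedding, kl_divergence) pairs after each optimization chunk.

    Note: each chunk restarts early exaggeration, so this approximates
    a single long run rather than reproducing it exactly.
    """
    snapshots = []
    init = "pca"
    for _ in range(chunks):
        tsne = TSNE(n_components=2, init=init, random_state=seed,
                    n_iter=iters_per_chunk)   # named `max_iter` in newer sklearn
        emb = tsne.fit_transform(X)
        snapshots.append((emb, tsne.kl_divergence_))
        init = emb                            # resume from the previous layout
    return snapshots
```

Plotting the snapshots side by side should show whether the map stalls during early exaggeration or only later in the optimization.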
