latent-simplex-variable-visualizer

Reference docs Visualize latent variables obtained with latent Dirichlet methods (topic models)

Installation

pip3 install toplot

Getting started

After training your topic model with $K$ components, you've inferred two latent variables:

The posterior over the weights of the model $\pmb{W} = [\pmb{w}_1, \dots, \pmb{w}_K]^T$. We assume that the weights have a two-level structure: each weight is composed of multiple sets of categories.
Per example $i$, the posterior of the hidden units $\pmb{h}^{(i)}$ (also denoted as $\pmb{\theta}_i$ in LDA).

To visualise, we generate some "fake" weights $\pmb{W}$ from two sets, BMI and sex containing three and two categories each, respectively:

import pandas as pd
from numpy.random import dirichlet

weights_bmi = dirichlet([16.0, 32.0, 32.0], size=1_000)
weights_sex = dirichlet([8.1, 4.1], size=1_000)
weights = pd.concat(
    {
        "BMI": pd.DataFrame(
            weights_bmi, columns=["Underweight", "Healthy Weight", "Overweight"]
        ),
        "sex": pd.DataFrame(weights_sex, columns=["Male", "Female"]),
    },
    axis="columns",
)

This is how you visualize weights, including the 95% quantile range:

from toplot import bar_plot, bar_plot_stacked

bar_plot(weights)

If you have many multinomials, you can reduce the size of the plot by folding the categories (e.g., "Underweight", "Healthy Weight", and "Overweight") belonging to the same multinomial (BMI) into a single bar

bar_plot_stacked(weights)

Next, we plot the hidden units/topic identities $[\pmb{h}^{(1)}, \dots, \pmb{h}^{(m)}]^T$: that is, for each example/participant/document $i$, the proportion over the components/topics. Let's generate the (average) proportion for $m=30$ examples to visualize:

hidden = pd.DataFrame(dirichlet([0.6, 0.8, 0.2], size=30), columns=["A", "B", "C"])

The function plot_cohort computes the distance between all examples (the cohort) and, by default, sorts them accordingly using the travelling salesman problem. Currently, no uncertainty visualization is supported for plot_cohort (like in bar_plot), so you need to pass the posterior average.

from toplot import plot_cohort, plot_polar_cohort

plot_cohort(hidden)

You can emphasize the periodicity inherent in the travelling salesman solution by visualizing all the examples using a polar plot:

plot_polar_cohort(hidden)

For the scattermap plot call:

from toplot import scattermap_plot

scattermap_plot(dataframe=weights, dataframe_counts=weights, marker_scaler=100)

Resulting in:

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
.github/workflows		.github/workflows
examples		examples
gallery/figures		gallery/figures
src		src
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

latent-simplex-variable-visualizer

Installation

Getting started

About

Releases

Packages

Languages

License

luntergroup/toplot

Folders and files

Latest commit

History

Repository files navigation

latent-simplex-variable-visualizer

Installation

Getting started

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages