A probabilistic circuit for imputing missing tabular data

December 22, 2023

2023 · class-notes food-for-thought · blog-post

SPNs for imputing data

The following is a Jupyter notebook, run on Google Colab with Kaggle’s Titanic dataset, that illustrates a practical use case for a probabilistic circuit: using a sum-product network (SPN) from deeprob-kit to impute missing tabular data.

This notebook is forked from a Kaggle tutorial on using a TensorFlow autoencoder to impute missing data. Standard autoencoders can’t:

- fill in missing data in a customized way
- flexibly incorporate domain knowledge, such as which distribution best models a feature

Furthermore, probabilistic circuits can tractably compute missing data through maximum a posteriori (MAP) estimation: given the observed features, each missing entry is replaced by its most probable value under the model.
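To make the idea concrete, here is a minimal sketch of MAP imputation with a toy sum-product network written in plain NumPy. It does not use the deeprob-kit API or the Titanic data: the two binary features, the weights in `WEIGHTS`, the probabilities in `P_ONE`, and the helper names `leaf_likelihoods`, `marginal`, and `impute_map` are all made up for illustration. The leaves are Bernoulli here, but picking a leaf distribution per feature is exactly where domain knowledge can enter. Missing values are encoded as NaN and marginalized by setting their leaf likelihood to 1.

```python
import itertools

import numpy as np

# Hypothetical toy SPN over two binary features (X0, X1):
# one sum node mixing two product nodes of independent Bernoulli leaves.
WEIGHTS = np.array([0.6, 0.4])  # sum-node mixture weights
# P(X_j = 1 | component k), shape (components, features); values are invented
P_ONE = np.array([[0.9, 0.8],
                  [0.2, 0.1]])


def leaf_likelihoods(x):
    """Leaf likelihoods per component and feature; a NaN (missing) leaf evaluates to 1."""
    x = np.asarray(x, dtype=float)
    lik = np.where(x == 1, P_ONE, 1.0 - P_ONE)
    return np.where(np.isnan(x), 1.0, lik)


def marginal(x):
    """Probability of the observed entries, with missing entries summed out."""
    return float(WEIGHTS @ leaf_likelihoods(x).prod(axis=1))


def impute_map(x):
    """Fill NaN entries with their jointly most probable values.

    Brute-force enumeration keeps the sketch short; it is only practical
    for a handful of missing binary features.
    """
    x = np.asarray(x, dtype=float)
    missing = np.flatnonzero(np.isnan(x))
    best, best_p = x.copy(), -1.0
    for values in itertools.product([0.0, 1.0], repeat=len(missing)):
        candidate = x.copy()
        candidate[missing] = values
        p = marginal(candidate)
        if p > best_p:
            best, best_p = candidate, p
    return best


row = [1, np.nan]          # X0 observed, X1 missing
print(marginal(row))       # probability of the observed part only
print(impute_map(row))     # row with X1 filled in
```

In this toy model, observing X0 = 1 makes the first component more likely, so X1 is imputed as 1; observing X0 = 0 flips the imputation to 0. Libraries for probabilistic circuits typically replace the enumeration with a single max-product pass through the circuit, which is what keeps imputation cheap on real tables.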
