A probabilistic circuit for imputing missing tabular data
SPNs for imputing data
The following is a Jupyter notebook, run on Google Colab, that uses Kaggle’s Titanic dataset to illustrate a practical use case of a probabilistic circuit (here, a sum-product network from deeprob-kit): imputing missing tabular data.
This notebook is forked from a Kaggle tutorial on using a TensorFlow autoencoder to impute missing data. Standard autoencoders can’t:
- fill in missing data in a custom way
- flexibly incorporate domain knowledge, such as which distribution best models a feature

Probabilistic circuits address both limitations. Furthermore, they can tractably impute missing values through maximum a posteriori estimation.
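To make the imputation idea concrete, here is a minimal sketch in plain NumPy. It is not deeprob-kit's API and not the notebook's code. It assumes a hypothetical toy circuit: one sum node over two product nodes, each with independent Gaussian leaves for `age` and `fare`, with made-up parameters. Missing values are filled with a max-product pass, a common approximation to MAP inference in sum-product networks: each missing leaf contributes its mode, the sum node is replaced by a max over its children, and the winning child's modes become the imputed values.

```python
import numpy as np

# Hypothetical toy circuit (all parameters are made up for illustration):
# a sum node over two product nodes, each with Gaussian leaves for (age, fare).
WEIGHTS = np.array([0.6, 0.4])      # sum-node weights
MEANS = np.array([[25.0, 10.0],     # child 0: means of (age, fare)
                  [45.0, 80.0]])    # child 1: means of (age, fare)
STDS = np.array([[5.0, 5.0],
                 [10.0, 20.0]])


def gaussian_logpdf(x, mean, std):
    """Elementwise log-density of a univariate Gaussian."""
    return -0.5 * np.log(2 * np.pi * std**2) - 0.5 * ((x - mean) / std) ** 2


def mpe_impute(row):
    """Fill the NaN entries of one row with a max-product pass."""
    row = np.asarray(row, dtype=float)
    observed = ~np.isnan(row)
    # Per child: keep observed values, and set missing ones to that child's
    # leaf mode (the Gaussian mean), which maximizes the leaf density.
    completed = np.where(observed, row, MEANS)           # shape (2, 2)
    leaf_ll = gaussian_logpdf(completed, MEANS, STDS)    # shape (2, 2)
    # Product nodes add leaf log-densities; the sum node becomes a max.
    scores = np.log(WEIGHTS) + leaf_ll.sum(axis=1)
    best = int(np.argmax(scores))
    return completed[best]


print(mpe_impute([np.nan, 75.0]))   # age missing, fare observed
print(mpe_impute([24.0, np.nan]))   # age observed, fare missing
```

Because every leaf is an explicit distribution, swapping a Gaussian for a categorical or other leaf type is where domain knowledge about a feature would enter in a sketch like this.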