SPNs for imputing data

The following is a Jupyter notebook, run on Google Colab with Kaggle’s Titanic dataset, that illustrates a practical use case for a probabilistic circuit (a sum-product network from deeprob-kit): imputing missing tabular data.

This notebook is forked from a Kaggle tutorial on using a TensorFlow autoencoder to impute missing data. Standard autoencoders can’t:

  • fill in missing data in a customized way
  • flexibly incorporate domain knowledge, such as which distribution best models a feature

Furthermore, probabilistic circuits can tractably compute missing values through maximum a posteriori estimation.
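To make the idea of imputing through maximum a posteriori estimation concrete, here is a minimal sketch in plain NumPy. It does not use the notebook's library; the toy network (one sum node over two product nodes with Gaussian leaves for age and fare), its weights, means, and standard deviations, and the function name `impute_max_product` are all hypothetical and chosen only for illustration. It uses the common max-product pass, which replaces each missing leaf by its mode and picks the best-scoring branch of the sum node.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

# Hypothetical toy network: a sum node over two product nodes.
# Each product node has independent Gaussian leaves for (age, fare).
WEIGHTS = np.array([0.6, 0.4])                    # sum-node weights
MEANS = np.array([[25.0, 10.0], [45.0, 80.0]])    # leaf means per branch
STDS = np.array([[8.0, 5.0], [10.0, 30.0]])       # leaf std devs per branch

def impute_max_product(row):
    """Fill NaN entries of `row` using a max-product pass over the toy network."""
    row = np.asarray(row, dtype=float)
    missing = np.isnan(row)
    scores = WEIGHTS.copy()
    for k in range(len(WEIGHTS)):
        for j in range(row.size):
            # A missing leaf is maximized at its mode (the Gaussian mean);
            # an observed leaf is evaluated at the observed value.
            x = MEANS[k, j] if missing[j] else row[j]
            scores[k] *= gaussian_pdf(x, MEANS[k, j], STDS[k, j])
    best = int(np.argmax(scores))                 # best branch of the sum node
    return np.where(missing, MEANS[best], row)    # fill gaps with that branch's modes

# A passenger with an unknown age and a high fare is assigned the age mode
# of the branch that best explains the observed fare.
print(impute_max_product([np.nan, 75.0]))
```

Because the leaf distributions are chosen per feature, swapping a Gaussian for, say, a categorical leaf is how domain knowledge about a feature would enter a model like this.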