Automating Interpretability with ChatGPT
A project for the BlueDot Alignment course exploring whether LLMs can automatically explain neural network behavior, tested on the XOR problem and the MNIST dataset.
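The core loop is small enough to sketch: train a tiny network, collect per-input activations, and ask an LLM to describe what each neuron computes. Below is a minimal illustration on XOR, assuming the OpenAI Python SDK (`pip install openai`); the hand-trained network, prompt wording, and model name are illustrative choices, not the project's exact setup.

```python
# Minimal sketch of the automated-interpretability loop on XOR.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment;
# the model name below is an assumption, not the project's exact choice.
import numpy as np
from openai import OpenAI

# Tiny 2-4-1 MLP trained on XOR by hand so the example is self-contained.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Plain gradient descent on squared error.
for _ in range(10000):
    h = sigmoid(X @ W1 + b1)        # hidden activations
    out = sigmoid(h @ W2 + b2)      # network output
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out
    b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h
    b1 -= 0.5 * d_h.sum(axis=0)

# Show the LLM each hidden neuron's activations on every input and ask
# for a one-sentence explanation per neuron.
h = sigmoid(X @ W1 + b1)
records = "\n".join(
    f"input={x.astype(int).tolist()} activations={np.round(a, 3).tolist()}"
    for x, a in zip(X, h)
)
prompt = (
    "A hidden layer's activations on every XOR input:\n"
    f"{records}\n"
    "In one sentence per neuron, what feature does each neuron compute?"
)

client = OpenAI()
reply = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model; any chat model works here
    messages=[{"role": "user", "content": prompt}],
)
print(reply.choices[0].message.content)
```

The same pattern scales to MNIST by swapping in a larger network and summarizing which inputs most strongly activate each neuron, since the full activation table no longer fits in a prompt.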
More Coming Soon
Future projects will appear here as they're completed. Stay tuned for more AI safety research and experiments.