AI Safety

Projects

Hands-on research and experiments in AI safety — from interpretability to alignment techniques.

Automating Interpretability with ChatGPT

A project for the BlueDot Alignment course exploring whether LLMs can automatically explain neural network behavior, tested on the XOR problem and the MNIST dataset.

Interpretability · Python · GPT-4 · BlueDot

More Coming Soon

Future projects will appear here as they're completed. Stay tuned for more AI safety research and experiments.


© 2024 Sean Herrington. All rights reserved.