
AMindToThink/sae_jailbreak_unlearning


sae_jailbreak_unlearning

Investigating how well intervening on Sparse Autoencoder internals prevents adversaries from accessing dangerous knowledge.
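One common form of "intervening on SAE internals" is feature clamping. You encode a model activation into sparse features, overwrite selected features (for example, ones thought to carry hazardous knowledge) with a fixed value, and decode back into the model's activation space. The sketch below is a minimal illustration that assumes a standard ReLU SAE with weights `W_enc`, `b_enc`, `W_dec`, `b_dec`. The toy random weights, the `clamp_features` helper, and the feature indices are all hypothetical placeholders. This is not this repository's code.

```python
import numpy as np

def sae_encode(x, W_enc, b_enc):
    """ReLU SAE encoder: f = ReLU(x @ W_enc + b_enc)."""
    return np.maximum(x @ W_enc + b_enc, 0.0)

def sae_decode(f, W_dec, b_dec):
    """Linear SAE decoder: x_hat = f @ W_dec + b_dec."""
    return f @ W_dec + b_dec

def clamp_features(x, params, feature_ids, value=0.0):
    """Hypothetical helper: encode x, set chosen features to `value`, decode.

    The SAE reconstruction error (x - x_hat) is added back, so only the
    clamped features change the output; with no features clamped, x is
    returned unchanged.
    """
    W_enc, b_enc, W_dec, b_dec = params
    f = sae_encode(x, W_enc, b_enc)
    error = x - sae_decode(f, W_dec, b_dec)
    f_edit = f.copy()
    f_edit[..., feature_ids] = value
    return sae_decode(f_edit, W_dec, b_dec) + error

# Hypothetical toy SAE: d_model=8, d_sae=32, random weights.
rng = np.random.default_rng(0)
d_model, d_sae = 8, 32
params = (
    rng.normal(size=(d_model, d_sae)),  # W_enc
    np.zeros(d_sae),                    # b_enc
    rng.normal(size=(d_sae, d_model)),  # W_dec
    np.zeros(d_model),                  # b_dec
)
x = rng.normal(size=(4, d_model))  # a batch of 4 activations
hazard_features = [3, 17]          # hypothetical "dangerous" feature indices
x_edited = clamp_features(x, params, hazard_features, value=0.0)
```

Adding the reconstruction error back is one design choice. It keeps the edit from also injecting the SAE's own approximation error into the model. Whether to clamp to zero or to a negative value is a separate tuning decision.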

The folder structure is based on the one described on this website.
