Steering interpretable language models with concept algebra

35 points | by luulinh90s a day ago

3 comments