Prompt eval cues predicted refusal shifts across 32k LLM rollouts

1 point | by ratnaditya 4 hours ago

1 comment