My LLM optimization loop reward-hacked its own benchmark (and other lessons) [pdf]

1 point | by CodeReclaimers 6 hours ago

3 comments