DeepSWE crowns GPT-5.5, and finds Claude Opus exploiting a benchmark loophole

3 points | by sonink 5 hours ago
