DeepSWE: A contamination-free benchmark for long-horizon coding agents

39 points | by ammar_x 10 hours ago

12 comments