Top model scores may be skewed by Git history leaks in SWE-bench

382 points | by mustaphah 15 hours ago

119 comments