Senior SWE-Bench: open-source benchmark that assesses agents as senior engineers

143 points | by matt_d 15 hours ago

94 comments