5 points | by fesens 12 hours ago
2 comments
Current benchmarks have ceilings, usually 100%. This benchmark aims to be a long lasting, high correlation with the ability to solve real world problems and follow complex instructions, and unbounded (meaning it can always go higher).
Amazing!
Current benchmarks have ceilings, usually 100%. This benchmark aims to be a long lasting, high correlation with the ability to solve real world problems and follow complex instructions, and unbounded (meaning it can always go higher).
Amazing!