Skepticism is warranted: the ARC-AGI result hasn't been reproduced, and there's at least some debate about validation data leaking into the training set: https://github.com/sapientinc/HRM/issues/18
I'm eagerly awaiting reproduction. I hope the results hold up and someone finds a way to bolt language onto it.
You're right to point that out, and thank you. The questions around the ARC-AGI validation set are fair, and reproducibility is key.
My core argument in the post, however, is deliberately independent of that specific result. The architectural innovation is the breakthrough, and its success is already clearly demonstrated by the SOTA performance on the Sudoku and Maze benchmarks.
Those results show that the model's approach to deep, iterative reasoning works, and that's the story I wanted to tell; the inner workings are what matter.
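For anyone curious what those inner workings roughly look like, here's a minimal sketch of the two-timescale idea as I understand it: a fast low-level module iterates several times per update of a slow high-level module. Everything here (class name, GRU cells, dimensions, step counts) is illustrative, not the repo's actual code:

    import torch
    import torch.nn as nn

    class TwoTimescaleReasoner(nn.Module):
        """Hypothetical sketch: a fast module refines details several
        times per update of a slow module holding the abstract plan."""

        def __init__(self, dim=256, low_steps=4, high_steps=8):
            super().__init__()
            self.low = nn.GRUCell(2 * dim, dim)   # fast: sees input + slow state
            self.high = nn.GRUCell(dim, dim)      # slow: updates from fast state
            self.low_steps = low_steps
            self.high_steps = high_steps

        def forward(self, x):
            z_low = x.new_zeros(x.shape)           # fast state, (batch, dim)
            z_high = x.new_zeros(x.shape)          # slow state, (batch, dim)
            for _ in range(self.high_steps):       # slow, abstract cycles
                for _ in range(self.low_steps):    # fast, detailed refinement
                    z_low = self.low(torch.cat([x, z_high], dim=-1), z_low)
                z_high = self.high(z_low, z_high)  # one slow update per cycle
            return z_high

    model = TwoTimescaleReasoner()
    out = model(torch.randn(8, 256))               # -> tensor of shape (8, 256)

The point of the nesting is that the effective computational depth is high_steps × low_steps recurrent updates, which is where the "deep, iterative" part comes from; the actual model layers more machinery on top of this loop.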
Agreed on your last point. Adding language to this core reasoning engine is the next big step.