Show HN: Autonomous recovery for distributed training jobs

12 points | by tsvoboda 6 days ago

3 comments