I've been working on a completely open-source LLM that uses MLA from DeepSeek and PEER from DeepMind's research, plus a couple of performance optimizations for GH200s and for PEER itself (including what I think is a nifty caching strategy). I named it LLM720 because the goal is for it to be the next iteration of what was accomplished with LLM360.
I'm looking for collaborators. We're about to start a large training run now that the ablations are wrapping up, and I'd like to have more people along for the ride.
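For anyone unfamiliar with PEER: it routes each token to a small subset of a very large pool of tiny experts, using product-key retrieval so you never score all N experts directly. Here's a minimal pure-Python sketch of that retrieval step under my own naming; this is an illustration of the general product-key idea, not code from the repo:

```python
def product_key_topk(query, subkeys1, subkeys2, k):
    """Select top-k expert indices via product keys (PEER-style routing sketch).

    Expert (i, j) is keyed by the pair (subkeys1[i], subkeys2[j]), so
    n1 * n2 experts are scored with only n1 + n2 dot products.
    """
    d = len(query) // 2
    q1, q2 = query[:d], query[d:]
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    # Score each half of the query against its sub-key set.
    s1 = [dot(row, q1) for row in subkeys1]
    s2 = [dot(row, q2) for row in subkeys2]
    # Top-k candidates per half; the global top-k experts are guaranteed
    # to lie in the k x k grid of these candidates.
    top1 = sorted(range(len(s1)), key=lambda i: s1[i], reverse=True)[:k]
    top2 = sorted(range(len(s2)), key=lambda j: s2[j], reverse=True)[:k]
    # Full score of expert (i, j) is the sum of its two half-scores.
    cand = [(s1[i] + s2[j], i * len(subkeys2) + j) for i in top1 for j in top2]
    cand.sort(key=lambda t: t[0], reverse=True)
    return [idx for _, idx in cand[:k]]
```

In the real model the selected experts are single-neuron MLPs whose outputs are combined with softmax weights over these scores; the caching strategy I mentioned sits on top of this retrieval path.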