This was fun to work on. LLMs for writing kernels still have a long way to go. It's honestly a little surprising how decent they are now. I guess I've been pretty consistently "surprised" by codegen for a while now (meaning the last two years).
This is the first step towards fully automated GPU performance optimization. The idea is to automatically generate GPU kernels, then automatically integrate them into vLLM/SGLang/PyTorch.
Quite cool. It's interesting that the LLM is able to optimize code based on the target hardware itself.