Dispersion loss counteracts embedding condensation in small language models

37 points | by E-Reverance 13 hours ago

8 comments