3 points | by kp1197 17 hours ago
1 comment
Does performing gradient descent on token input embeddings lead to interpretable results? And if not, why?