Don't people assume that all datasets are possibly dangerous?
So I read the post. The dataset was inert until someone trained on it; he left it up specifically to see how long it would take anyone to notice, and in practice no one did.
You left it up for 6 months!? Potentially poisoning thousands. Are you looking for respect from this community?
Yes, 6 months. I reported it to Hugging Face the day I confirmed the backdoor propagated into model weights, not before, because the vulnerability was the lack of detection, not the dataset itself. The dataset was inert without someone training on it. I wanted to measure whether anyone would notice. No one did.
This is not something to gloat about.
Fair. 'I poisoned' was the wrong verb; it sounds like I enjoyed it, and I didn't. I found a hole in infrastructure that lets anyone do this, and I wanted proof that nobody was watching. The proof is depressing. I'll edit the opening if it stays up.