Trivial to simulate. Though I disagree that AI writing is impossible to differentiate from human prose, at least for now. It's still pretty obvious and still much worse than human prose (or, at least, much less interesting to read), though some are better than others, and I'm better able to spot writing from models I use regularly (Claude has a very distinctive style I can spot from a mile away, but that's true partly because I read its prose nearly every day when using it for coding).
> It's still pretty obvious and still much worse than human prose
You have no idea how many false positives and how many false negatives you have in your judgement. It is indeed impossible to differentiate between badly written human text and reasonably well-written LLM text.
> You have no idea how many false positives and how many false negatives you have in your judgement.
The default LLM style is pretty deterministic. "Not X, Not Y. Just Z", etc.
There are phrases and cadences which are very rare in human prose (<5%) but unusually common in LLM prose (95%+ of occurrences). It is not unreasonable to look at content which is 95% LLM tells and conclude that an LLM authored it.
I have noted, IRL, that those people who read very little, and only read when they have to (work docs, etc) are literally unable to tell that a piece of prose sounds like an LLM even when it has about 12 occurrences of "Not X. Just Y" or "Not X, Not Y. Just Z" in as many paragraphs.
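To make the "Not X. Just Y" tell concrete, here is a minimal sketch of a counter for it. The function name and the sentence-splitting rule are my own assumptions, and a count like this is only a crude heuristic: it says nothing about authorship on its own.

```python
import re

def count_not_just(text: str) -> int:
    """Count adjacent sentence pairs shaped like 'Not X. Just Y'.

    Hypothetical heuristic: a sentence starting with "not " that is
    immediately followed by a sentence starting with "just ".
    """
    # Naive sentence split on ., ! and ? (an assumption, not a real tokenizer).
    sentences = [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]
    return sum(
        1
        for first, second in zip(sentences, sentences[1:])
        if first.startswith("not ") and second.startswith("just ")
    )

sample = "Not magic. Just math. Not hype, not luck. Just work."
print(count_not_just(sample))
```

The "Not X, Not Y. Just Z" variant is caught by the same check, because the first sentence still starts with "not".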
It's really similar to special effects in tv/movies: Whenever there's something new it's hard to tell, but as we get used to it and start seeing the patterns it becomes easier to tell. The older the special effects are the more obvious it is. Quite often you can just kind of tell if something was special effects or practical effects without being sure why. And there are always some really well done ones that slip past everyone, or odd lighting that makes it look fake (a thread a day or two ago on here about a legal case involving a photograph, some people in the comments thought it was a painting).
Depends on what’s being written, and who the audience is. Anything of any length would be hard to simulate in a way that would fool an author - writing has a certain flow to it. A cadence. The editing and restructuring, deleting of words, typos you don’t catch until some random reread, rephrasing of sections because you want to use the original phrasing later in the piece.
Could you simulate something being typed? Trivially. Could you simulate something being drafted? Honestly, even if you wanted to put in all that time and effort, I’m not even sure LLMs are sophisticated enough to produce the logical drafts, loops and edits that would pass a writer's sniff test.
I think you could simulate something that passes a sniff test. A writer would probably spot implausibilities in the simulation if they paid attention, but then we're back to square one: you can only tell that something was written by an LLM after you've read it and realized that you've been led around in circles with superficial information and no coherent train of thought, and by then your time has already been wasted.
To add to this, one can’t ignore the relationship between signal and receiver. I’d imagine most people on HN have enough pre-LLM reading experience to have a decent sense of what was written by an LLM versus a human.
And as LLMs get better at producing human-like text, that same pre-LLM reading experience, which helps people tell the two apart, will become less and less common.
Or maybe 2 out of 3 guys you call "AI" actually wrote it themselves and you were too prejudiced to believe them.
Doubtful. I read a lot of forum posts, as I maintain a popular OSS project. And, I read a lot of books from the before times. And, I read a lot of LLM prose, because I use them every day for coding and various assistant type tasks. It's still pretty obvious when an LLM writes something.
I even did post-training on Gemma 4 (a small model which has very good prose for a model, especially a small model) to try to make it able to write more like me, purely as an experiment to learn how training LoRAs works, and using data that I know is ethically sourced. It still distinctly writes like an LLM, with a few of my annoying quirks baked in: Inappropriate use of ellipses, too many parentheticals, occasionally dismissive tone. But, it can't stop doing the LLM things, either, without becoming incoherent.
A human should not have to or be compelled to prove to be human. The onus of proof here is the wrong way around, the premise fundamentally incorrect.
> A human should not have to or be compelled to prove to be human. The onus of proof here is the wrong way around, the premise fundamentally incorrect.
Innocent until proven guilty is for a court.
Outside of a court, people use all sorts of heuristics to determine authenticity and trustworthiness. Since so few humans ever wrote the way LLMs do by default, it's not unreasonable to refuse serious engagement with a party exhibiting this.
Aside: Every time someone claims they have always written like this, I ask for a link to their writing dated pre-2022, and send both to all free LLM chatbots with the question "did the same writer write both these pieces".
I have not yet gotten a "likely", or even a "remotely likely". It's all been "extremely unlikely".
I'm really talking about social dynamics here, not what is legal or not. I'm suggesting that putting in these types of "checks" contributes to the problem rather than alleviating it.
> Outside of a court, people use all sorts of heuristics to determine authenticity and trustworthiness.
And all of those are a fool's errand, except meeting people face to face. When you meet somebody IRL, it takes just a few minutes to know if they are trustworthy or not.
If the business is severely serious, then you have to do like Genghis Khan and get fully drunk together. Anybody who refuses can never be fully trusted.
> And all of those are a fool's errand, except meeting people face to face.
I agree, and that works for IRL only!
But what I am seeing online is a very large push for people to stop calling out obvious LLM prose. One can only guess at the motivations from people who are throwing tantrums that their LLM prose should be allowed, because it's their idea being discussed.
What I am not seeing is them acknowledging the extreme disrespect they are demonstrating for a community when they cannot even bother to type "their" idea.
IOW, if someone doesn't have the time to write it, then we should be making fun of them, shaming them and generally mocking them for losing the use of their brain in a public setting.
Plus somebody will create an AI-powered program to take in text and have it “perform” the writing process with keystrokes and mouse movements and all that.
Didn't Pangram claim to have a >99% success rate?
https://news.ycombinator.com/item?id=48667761
From their homepage:
> Detect AI-generated content with 99.98% accuracy.
Are they wrong?
Though I ran the numbers, and even with a 0.02% false positive rate, that works out to about 6000 students falsely accused every semester, per university.
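For anyone who wants to check this kind of arithmetic, here is a minimal sketch. It assumes the 0.02% really is a false positive rate and that every checked submission is human-written. The function names and the 100,000 volume are made up for illustration. Under those assumptions, 6000 false flags corresponds to about 30 million checked submissions.

```python
def expected_false_flags(submissions: int, false_positive_rate: float) -> float:
    """Expected number of human-written submissions wrongly flagged as AI."""
    return submissions * false_positive_rate

def submissions_implied(false_flags: float, false_positive_rate: float) -> float:
    """Number of checked submissions needed to expect this many false flags."""
    return false_flags / false_positive_rate

FPR = 0.0002  # 0.02%, treated as the false positive rate (an assumption)

# Volume implied by the 6000 figure quoted above.
print(round(submissions_implied(6000, FPR)))

# Hypothetical volume: 100,000 checked submissions per semester.
print(round(expected_false_flags(100_000, FPR)))
```

The result scales linearly with volume, so the headline number depends entirely on how many submissions you assume get checked.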
> > Detect AI-generated content with 99.98% accuracy.
> Though I ran the numbers, and even with a 0.02% false positive rate
They don't say that the false positive rate is 0.02%, only that the accuracy is 99.98%. All we know for certain is that false positives and false negatives together make up 0.02% of everything that was classified.
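A small sketch of why accuracy alone does not pin down the false positive rate. Both test sets below are hypothetical (5,000 human texts and 5,000 AI texts, two errors each) and are not Pangram's data. They have identical accuracy but very different false positive rates.

```python
def rates(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float, float]:
    """Return (accuracy, false_positive_rate, false_negative_rate).

    "Positive" means "flagged as AI", so a false positive is a human
    text that was wrongly flagged.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    false_positive_rate = fp / (fp + tn)
    false_negative_rate = fn / (fn + tp)
    return accuracy, false_positive_rate, false_negative_rate

# Hypothetical case 1: both errors are humans flagged as AI.
all_false_positives = rates(tp=5000, fp=2, tn=4998, fn=0)

# Hypothetical case 2: both errors are AI texts that slipped through.
all_false_negatives = rates(tp=4998, fp=0, tn=5000, fn=2)

print(all_false_positives)
print(all_false_negatives)
```

With an unbalanced test set the gap can be even larger, because the false positive rate only depends on how the human-written texts were classified.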
I’ve been experimenting with a pet project that aims to solve this problem. “AI detectors” are certainly unreliable, and I’m not sure they’ll ever get to a state where you can trust them.
I think concepts like this are the only reliable way to prove something was written by a human. A full replay like this is one way to do it. I think there are some other feasible ways to achieve this, maybe in combination with a full “replay”, but some sort of “proof of work” is the way to go I believe. As LLMs become more ubiquitous, I imagine products that solve the problem can be a real business opportunity.
AI writing is not writing and AI writers are also not writers.
We're gonna have to bring back handwritten homework assignments.
Oh, wait.
https://images.ctfassets.net/kftzwdyauwt9/3bzFMXhknmq5TZvVL7...
(From https://openai.com/index/introducing-chatgpt-images-2-0/ )
The irony is that a printout of a photo (well, "photo") of handwritten homework would be pretty easy to distinguish from genuine handwritten homework.
(otoh it's also trivial to copy whatever ChatGPT wrote by hand without thinking about the assignment at all)
Surveillance is worse than slop. It also seems like a disproportionate use of resources compared to the free formats and editors we already have today.
This product is in bad taste, and I hope it doesn't succeed.
Lucid (https://www.writelucid.cc/) has a similar feature, though not for proving authorship, just for history. I don't know if you can definitively prove human authorship somehow.
> Boom, definitive "proof of typing" that a given piece was produced by genuine meat-on-keyboard effort.
Except it lacks proof of the keyboard - and the meat.
Yep, someone can "vibe code" something that would emulate a human: the typing, the pauses, random typos, backspaces, and edits. There are probably models already available that describe the average delays between two consecutive keypresses depending on the location of keys, etc.
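To show how little it takes, here is a toy sketch of such an emulator. Everything in it is an assumption for illustration: the function names, the exponential delay model, the typo rate and the pause lengths are invented, not measured from real typists.

```python
import random

def simulate_typing(text, seed=0, typo_rate=0.05, mean_delay=0.18):
    """Return (delay_seconds, key) events that 'type' text with typos.

    Delays are drawn from an exponential distribution, which is an
    assumed stand-in for a real keystroke timing model.
    """
    rng = random.Random(seed)
    events = []
    for ch in text:
        if ch.isalpha() and rng.random() < typo_rate:
            # Hit a random letter, pause to notice it, then erase it.
            wrong = rng.choice("abcdefghijklmnopqrstuvwxyz")
            events.append((rng.expovariate(1 / mean_delay), wrong))
            events.append((rng.uniform(0.3, 0.9), "BACKSPACE"))
        delay = rng.expovariate(1 / mean_delay)
        if ch == " ":
            delay += rng.uniform(0.0, 0.4)  # slightly longer pause between words
        events.append((delay, ch))
    return events

def replay(events):
    """Apply the events to an empty buffer and return the final text."""
    buffer = []
    for _, key in events:
        if key == "BACKSPACE":
            if buffer:
                buffer.pop()
        else:
            buffer.append(key)
    return "".join(buffer)

events = simulate_typing("Trivial to simulate.")
print(replay(events))
```

This only fakes typing, not drafting: there is no restructuring, no deleted paragraphs and no rephrasing, which is the harder part to simulate convincingly.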
I'm pretty sure Coursera or edX was using this same approach a while back — they'd make you type in a paragraph of text, then they'd use the timing information as a fingerprint or signature, to authenticate you as the actual student.
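A rough sketch of what timing-as-fingerprint could look like. This is not Coursera's or edX's actual method, which I don't know. The digraph-latency profile, the function names and the sample timestamps are all hypothetical.

```python
from statistics import mean

def intervals(timestamps):
    """Delays between consecutive key presses, in seconds."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

def profile(keys, timestamps):
    """Mean latency per digraph (pair of consecutive keys)."""
    buckets = {}
    for (first, second), delay in zip(zip(keys, keys[1:]), intervals(timestamps)):
        buckets.setdefault(first + second, []).append(delay)
    return {digraph: mean(delays) for digraph, delays in buckets.items()}

def distance(p, q):
    """Mean absolute latency difference over shared digraphs.

    Returns None when the two profiles share no digraphs.
    """
    shared = p.keys() & q.keys()
    if not shared:
        return None
    return mean(abs(p[d] - q[d]) for d in shared)

# Hypothetical samples: the same word typed by two different people.
enrolled = profile("the", [0.00, 0.10, 0.25])
attempt = profile("the", [0.00, 0.30, 0.45])
print(distance(enrolled, attempt))
```

A real system would need far more text per sample and a threshold tuned on real data. This only shows the shape of the comparison.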
surely there are real-time AI writing checkers? predictability in word choices and sentence-length variation are already available, so maybe someone has to make something that measures delays in seconds now. that could be a feature that op implements, like the coursera/edx tech sidpatil mentions
What if the human is just transcribing directly from AI generated text?
One could, but the primary motivation to use AI to generate text is because it is fast and easy. You could spend an hour elaborately pretending to write something yourself, or you could simply write it yourself.
Do not misjudge how hard people will work attempting to not work. "Work smarter, not harder" is not that widely adhered to.
Yep, do not underestimate human ingenuity when it comes to not having to think.