> defining AGI as matching the cognitive versatility and proficiency of a well-educated adult
I don't think people really realize what an extraordinary accomplishment it would be to have an artificial system matching the cognitive versatility and proficiency of an uneducated child, much less a well-educated adult. Hell, AI matching the intelligence of some nonhuman animals would be an epoch-defining accomplishment.
People are specialists, not generalists; creating an AI that is a generalist and claiming it has cognitive abilities the same as a "well-educated" adult is an oxymoron. And if such a system could ever be made, my guess is it won't be more than a few (under 5) billion parameters: a model that is very good at looking up stuff online, forgetting stuff when not in use, planning, and creating or expanding the knowledge in its nodes. Much like a human adult would. It will be highly sample efficient. It won't know 30 languages (although it has been seen that models generalize better with more languages), it won't know the entire Wikipedia by heart, it won't even remember minor details of programming languages and such. Now that is my definition of an AGI.
I think the bigger issue is people confusing impressive but comparatively simpler achievements (everything current LLMs do) with anything remotely near the cognitive versatility of any human.
But the big crisis right now is that for an astonishing number of tasks that a normal person could come up with, chatgpt.com is actually as good as or better than a typical human.
If you took the current state of affairs back to the 90s you’d quickly convince most people that we’re there. Given that we’re actually not, we now have to come up with new goalposts.
I don't know. People in the 90s were initially fooled by Eliza, but soon understood that Eliza was a trick. LLMs are a more complex and expensive trick. Maybe it's time to overthrow the Turing Test. Fooling humans isn't necessarily an indicator of intelligence, and it leads down a blind alley: Language is a false proxy for thought.
Consider this. I could walk into a club in Vegas, throw down $10,000 cash for a VIP table, and start throwing around $100 bills. Would that make most people think I'm wealthy? Yes. Am I actually wealthy? No. But clearly the test is the wrong test. All show and no go.
The more I think about this, the more I think the same is true for our own intelligence. Consciousness is a trick and AI development is lifting the veil of our vanity. I'm not claiming that LLMs are conscious or intelligent or whatever. I'm suggesting that next token prediction has scaled so well and covers so many use cases that the next couple breakthroughs will show us how simple intelligence is once you remove the complexity of biological systems from the equation.
How do you define verbal language? Many animals emit different sounds that others in their community know how to react to. Some even get quite complex in structure (eg dolphins and whales) but I wouldn’t also rule out some species of birds, and some primates to start with. And they can collaborate; elephants, dolphins, and wolves for example collaborate and would die without it.
Also it’s completely myopic in terms of ignoring humans who have non verbal language (eg sign language) perfectly capable of cooperation.
TLDR: just because you can’t understand an animal doesn’t mean it lacks the capability you failed to actually define properly.
MW defines verbal as "of, relating to, or consisting of words".
I don't think anyone would argue that animals don't communicate with each other. Some may even have language we can't interpret, which may consist of something like words.
The question is why we would model an AGI after verbal language as opposed to modeling it after the native intelligence of all life which eventually leads to communication as a result. Language and communication are a side-effect of intelligence, a compounding interest on intelligence, but they are not intelligence itself, any more than a map is the terrain.
> The question is why we would model an AGI after verbal language as opposed to modeling it after the native intelligence of all life which eventually leads to communication as a result.
Because verbal/written language is an abstracted/compressed representation of reality, so it's relatively cheap to process (a high-level natural-language description of an apple takes far fewer bytes to represent than a photo or 3D model of the same apple). Also because there are massive digitized publicly-available collections of language that are easy to train on (the web, libraries of digitized books, etc).
I'm just answering your question here, not implying that language processing is the path towards AGI (I personally think it could play a part, but can't be anything close to the whole picture).
This is one of the last bastions of anthropocentric thinking. I hope this will change in this century. I believe even plants are capable of communication. Everything that changes over time or space can be a signal. And most organisms can generate or detect signals. Which means they do communicate. The term “language” has traditionally been defined from an anthropocentric perspective. Like many other definitions about the intellect (consciousness, reasoning etc.).
That’s like a bird saying planes can’t fly because they don’t flap their wings.
LLMs use human language mainly because they need to communicate with humans. Their inputs and outputs are human language. But in between, they don’t think in human language.
> LLMs use human language mainly because they need to communicate with humans. Their inputs and outputs are human language. But in between, they don’t think in human language.
You seem to fundamentally misunderstand what llms are and how they work, honestly.
Remove the human language from the model and you end up with nothing. That's the whole issue.
Your comment would only make sense if we had real artificial intelligence, but LLMs quite literally work by predicting the next token, which works incredibly well as a facsimile of intelligence because there is an incredible amount of written content on the Internet that was written by intelligent people.
A human child not taught literally anything can see some interesting item, extend a hand to it, touch it, interact with it. All decided by the child. Heck, even my cat can see a new toy, go to it, and play with it, without any teaching.
LLMs can't initiate any task on their own, because they lack the thinking/intelligence part.
Well, you can explain to a plant in your room that E=mc² in a couple of sentences, but a plant can't explain to you how it feels the world.
If cows were eating grass and conceptualising what infinity is, what their role in the universe is, how they were born, and what would happen after they are dead... we would see a lot of jumpy cows out there.
This is exactly what I mean by anthropocentric thinking. Plants talk plant things and cows talk about cow issues. Maybe there are alien cows in some planet with larger brains and can do advanced physics in their moo language. Or some giant network of alien fungi discussing about their existential crisis. Maybe ants talk about ant politics by moving their antennae. Maybe they vote and make decisions. Or bees talk about elaborate honey economics by modulating their buzz. Or maybe plants tell bees the best time for picking pollens by changing their colors and smell.
Words, after all, are just arbitrary ink shapes on paper. Or vibrations in air. Not fundamentally different from any other signal. Meaning is added only by the human brain.
I'm also attracted to the idea of reducing rule sets to simple algorithms and axioms, in every case you can. But I'm skeptical that consciousness can be reduced that way. I think if it can be, we'll see it in the distillation and quantizing of smaller and smaller scale models converging on similar adaptations, as opposed to the need for greater scale (at least in inference). I still believe language processing is the wrong task to train to that point. I'd like to see AIs that model thought process, logic, tool construction, real-world tasks without language. Maybe even those that model vocal cords and neurological processes instead of phonemes. Most animals don't use language, and as a result we can't ask if they're conscious, but they probably are. Navigating and manipulating the physical world from the cellular level up to swinging from trees is far more complex - language is a very late invention, and is not in and of itself intelligence - it may just be a lagging indicator.
To the extent that we vainly consider ourselves intelligent for our linguistic abilities, sure. But this underrates the other types of spatial and procedural reasoning that humans possess, or even the type that spiders possess.
It's clear that some people never really read any science fiction and just waited for someone to make movies of it (starring Will Smith, one might presume).
LLMs are a take on the "oracle". That is, a device that could seemingly-intelligently answer a question in such a way that humans would judge the answer correct and even helpful. But no one would ever have to worry about an oracle (or an LLM) waking up one morning and deciding to take over the world.
LLMs are actually not even that good at being oracles. If you can come up with a question that no human has ever posed before (or at least, no one has come up with a semblance of an answer), then it can't answer that meaningfully. You'll get the LLM equivalent of hemming and hawing.
An intelligence in the way that we all mean it is software that can be capable of something tomorrow that it is not capable of today, without any meaningful human modification or input. Software that is self-directed and has goals (hopefully goals aligned with our own). Software that can eventually figure out the solutions to problems that no human has so far solved.
It would not necessarily need a personality (though one could be forgiven for anticipating that it might develop one), it would not necessarily need to be able to communicate with humans at all (though, if humans are obstacles to its goals that could be navigated with communication, one would expect it to learn eventually... but these communications might not even be reflections of its inner mind and rather just strategies to manipulate humans).
I'm not saying anything here anyone else would have trouble formulating. We're all acting as if there's some mystery about what would make an AGI intelligent, and there's no mystery. No one's bothered to try to formalize the definition, not because it's difficult, but because there's little need.
If anyone wants to pay me half a mill a year and dump a few million in hardware and other costs into my lap, I should be able to get one up and running for you by the middle of next year. There's even a 2% chance it won't be an omnicidal maniac hellbent on galactic conquest for its first 30 seconds of life.
>> I should be able to get one up and running for you by the middle of next year
Funny. I agree with your plainspoken analysis of why these things are nowhere near AGI, and of what AGI would be. I even had a long conversation with Claude last week where it told me that no LLM would ever approach AGI (but then it wrote a 4-paragraph-long diatribe entitled "Why I Declare Myself Conscious" in the same conversation). These neural networks are closer to the speechwriting machine in The Penultimate Truth, or the songwriting machine in 1984. As for that latter one, I believe Orwell remarks on how it just recycles the same sentimental tunes and words in different order so that there's always a "new" song all the proles are humming.
>I could walk into a club in Vegas, throw down $10,000 cash for a VIP table, and start throwing around $100 bills.
If you can withdraw $10,000 cash at all to dispose of as you please (including for this 'trick' game) then my friend you are wealthy from the perspective of the vast majority of humans living on the planet.
And if you balk at doing this, maybe because you cannot actually withdraw that much, or maybe because it is badly needed for something else, then you are not actually capable of performing the test now, are you?
That's really not true. Lots of people in America can have $0 in net worth and get a credit card, use that to buy some jewelry and then sell it, and have $10k in cash. The fact that the trick only works once proves that it's a trick.
You're not making much sense. Like the other user, you are hinging on non-transferrable details of your analogy, which is not the actual reality of the situation.
You've invented a story where the user can pass the test by only doing this once and hinged your point on that, but that's just that - a story.
All of our tests and benchmarks account for repeatability. The machine in question has no problem replicating its results on whatever test, so it's a moot point.
The LLM can replicate the trick of fooling users into thinking it's conscious as long as there is a sufficient supply of money to keep the LLM running and a sufficient number of new users who don't know the trick. If you don't account for either of those resources running out, you're not testing whether its feats are truly repeatable.
>The LLM can replicate the trick of fooling users into thinking it's conscious as long as there is a sufficient supply of money to keep the LLM running and a sufficient number of new users who don't know the trick.
Okay? And you, presumably a human, can replicate the trick of fooling me into thinking you're conscious as long as there is a sufficient supply of food to keep you running. So what's your point? With each comment, you make less sense. Sorry to tell you, but there is no trick.
The difference is that the human can and did find its own food for literally ages. That's already a very, very important difference. And while we cannot really define what's conscious, it's a bit easier (still with some edge cases) to define what is alive. And probably what is alive has some degree of consciousness.
An LLM definitely does not.
One of the "barriers" to me is that (AFAIK) an LLM/agent/whatever doesn't operate without you hitting the equivalent of an on switch.
It does not think idle thoughts while it's not being asked questions. It's not ruminating over its past responses after having replied. It's just off until the next prompt.
Side note: whatever future we get where LLMs get their own food is probably not one I want a part of. I've seen the movies.
You only exist because you were forced to be birthed externally? Everything has a beginning.
In fact, what is artificial is stopping the generation of an LLM when it reaches a 'stop token'.
A more natural barrier is the attention size, but with 2 million tokens, LLMs can think for a long time without losing any context. And you can take over with memory tools for longer horizon tasks.
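For what it's worth, here's roughly what I mean by "taking over with memory tools", as a minimal sketch rather than a real agent framework: keep re-prompting the model instead of letting it stop at a stop token, and periodically compress old turns into a running note. It assumes the OpenAI Python client; the model name, limits, and prompts are placeholders.

```python
# Rough sketch, not a real agent framework: keep the model generating past
# its "stop token" by re-prompting it, and when the transcript gets long,
# compress the oldest turns into a running memory note.
# Assumes the OpenAI Python client; model name and limits are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4o-mini"               # placeholder model name
MAX_TURNS_BEFORE_COMPRESSION = 20

memory_note = ""                    # the long-horizon "memory tool"
transcript = [{"role": "user",
               "content": "Work on the task step by step and keep going."}]

for _ in range(100):                # keep "thinking" instead of stopping
    reply = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "system", "content": f"Notes so far: {memory_note}"}]
                 + transcript,
    )
    answer = reply.choices[0].message.content
    transcript.append({"role": "assistant", "content": answer})
    transcript.append({"role": "user", "content": "Continue."})

    if len(transcript) > MAX_TURNS_BEFORE_COMPRESSION:
        summary = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user",
                       "content": "Summarize these notes for future reference:\n"
                                  + "\n".join(m["content"] for m in transcript)}],
        )
        memory_note = summary.choices[0].message.content
        transcript = transcript[-4:]   # keep only the latest turns verbatim
```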
>Couldn’t someone else just give him a bunch of cash to blow on the test, to spoil the result?
If you still need a rich person to pass the test, then the test is working as intended. Person A is rich or person A is backed by a rich sponsor is not a material difference for the test. You are hinging too much on minute details of the analogy.
In the real world, your riches can be sponsored by someone else, but for whatever intelligence task we envision, if the machine is taking it, then the machine is taking it.
>Couldn’t he give away his last dollar but pretend he’s just going to another casino?
Again, if you have $10,000 you can just withdraw today and give away, last dollar or not, the vast majority of people on this planet would call you wealthy. You have to understand that this is just not something most humans can actually do, even on their deathbed.
>> Again, if you have $10,000 you can just withdraw today and give away, last dollar or not, the vast majority of people on this planet would call you wealthy. You have to understand that this is just not something most humans can actually do, even on their deathbed.
So, most people can't get $1 Trillion to build a machine that fools people into thinking it's intelligent. That's probably also not a trick that will ever be repeated.
One common kind of interaction I have with chatgpt (pro):
1. I ask for something
2. Chatgpt suggests something that doesn't actually fulfill my request
3. I tell it how its suggestion does not satisfy my request.
4. It gives me the same suggestion as before, or a similar suggestion with the same issue.
Chatgpt is pretty bad at "don't keep doing the thing I literally just asked you not to do" but most humans are pretty good at that, assuming they are reasonable and cooperative.
> Chatgpt is pretty bad at "don't keep doing the thing I literally just asked you not to do" but most humans are pretty good at that.
Most humans are terrible at that. Most humans don't study for tests, fail, and don't see the connection. Most humans will ignore rules for their safety and get injured. Most humans, when given a task at work, will half-ass it and not make progress without constant monitoring.
If you only hang out with genius SWEs in San Francisco, sure, ChatGPT isn't at AGI. But the typical person has been surpassed by ChatGPT already.
I'd go so far as to say the typical programmer has been surpassed by AI.
My example is asking for way less than what you're asking for.
Here is something I do not see with reasonable humans who are cooperative:
Me: "hey friend with whom I have plans to get dinner, what are you thinking of eating?"
Friend: "fried chicken?"
Me: "I'm vegetarian"
Friend: "steak?"
Note that this is in the context of four turns of a single conversation. I don't expect people to remember stuff across conversations or to change their habits or personalities.
> Here is something I do not see with reasonable humans who are cooperative: Me: "hey friend with whom I have plans to get dinner, what are you thinking of eating?" Friend: "fried chicken?" Me: "I'm vegetarian" Friend: "steak?"
Go join a dating app as a woman, put vegan in your profile, and see what restaurants people suggest. Could be interesting.
I get your comment, which is that only the worst humans are going to suggest a steak place after you've stated you're vegetarian. And that ChatGPT does so as well.
I'm disagreeing and saying there's far more people in that bucket than you believe.
I know many people at my university that struggle to read more than two sentences at a time. They'll ask me for help on their assignments and get confused if I write a full paragraph explaining a tricky concept.
That person has a context length of two sentences and would, if encountering a word they didn't know like "vegetarian", ignore it and suggest a steak place.
These are all people in Computer Engineering. They attend a median school and picked SWE because writing buggy & boilerplate CRUD apps pays C$60k a year at a big bank.
While the majority of humans are quite capable of this, there are so many examples anyone could give proving that having the capability doesn't mean they actually use it.
> chatgpt.com is actually as good as or better than a typical human.
I really don't think it is on basically any measure outside of text regurgitation. It can aggregate an incredible amount of information, yes, and it can do so very quickly, but it does so in an incredibly lossy way and that is basically all it can do.
It does what it was designed to do: predict text. Does it do that incredibly well? Yes. Does it do anything else? No.
That isn't to say super advanced text regurgitation isn't valuable, just that it's nowhere even remotely close to AGI.
I feel every human just regurgitates words too. And most are worse than an AI.
I have countless examples of lawyers, HR, and other public gov bodies that breach the law without knowing the consequences. I also have examples of AI giving bad advice, but it's still better than an average human right now.
An AI could easily save them a ton of money in the fees they are paying for breaching the law.
> chatgpt.com is actually as good as or better than a typical human.
It can appear so, as long as you don’t check too carefully. It’s impressive but still very common to find basic errors once you are out of the simplest, most common problems due to the lack of real understanding or reasoning capabilities. That leads to mistakes which most humans wouldn’t make (while sober / non-sleep deprived) and the classes of error are different because humans don’t mix that lack of understanding/reasoning/memory with the same level of polish.
Ask ChatGPT about something you don't know about and it can appear very smart. Ask it in depth about something you are very knowledgeable about and the ignorance will quickly become apparent.
The problem is that for a majority of those tasks people conveniently "forget" the actual start and end of the process. LLMs can't start most of those tasks by their own decision, and neither can they end those tasks and evaluate the result. Sure, we got multiple tasks automated from a very low percentage to a very high percentage, and that is really impressive. But I don't see how any LLM can bridge the gap from a very high percentage of automation to a strict 100% of automation, for any task. And if a program requires a real intelligence handling and controlling it, is it really AI?
> If you took the current state of affairs back to the 90s you’d quickly convince most people that we’re there.
This is an interesting ambiguity in the Turing test. It does not say if the examiner is familiar with the expected level of the candidate. But I think it's an unfair advantage to the machine if it can pass based on the examiner's incredulity.
If you took a digital calculator back to the 1800s, added a 30 second delay and asked the examiner to decide if a human was providing the answer to the screen or a machine, they might well conclude that it must be human as there is no known way for a machine to perform that action. The Akinator game would probably pass the test into the 1980s.
I think the only sensible interpretation of the test is one where the examiner is willing to believe that a machine could be providing a passing set of answers before the test starts. Otherwise the test difficulty varies wildly based on the examiner's impression of the current technical capabilities of machines.
Exactly. Five years ago I posted here on HN that AI would pass the Turing Test within the next 3 years (I was impressed by Facebook chatbot progress at the time). I was laughed at and downvoted into oblivion. The TT was seen by many as a huge milestone, an incredibly difficult task, a "maybe in my lifetime" possibility.
The Turing test isn't actually a good test of much, but even so, we're not there yet. Anyone who thinks we've passed it already should experiment a bit with counterfactuals.
Ask your favorite SOTA model to assume something absurd and then draw the next logical conclusions based on that. "Green is yellow and yellow is green. What color is a banana?" They may get the first question(s) right, but will trip up within a few exchanges. Might be a new question, but often they are very happy to just completely contradict their own previous answers.
You could argue that this is hitting alignment and guard-rails against misinformation.. but whatever the cause, it's a clear sign it's a machine and look, no em-dashes. Ironically it's also a failure of the Turing test that arises from a failure in reasoning at a really basic level, which I would not have expected. Makes you wonder about the secret sauce for winning IMO competitions. Anyway, unlike other linguistic puzzles that attempt to baffle with ambiguous reference or similar, simple counterfactuals with something like colors are particularly interesting because they would NOT trip up most ESL students or 3-5 year olds.
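If anyone wants to try this systematically, here's a rough sketch of the counterfactual probe (assuming the OpenAI Python client; the model name is just a placeholder, and the follow-ups are only examples of pushing the premise a few exchanges further):

```python
# Sketch of the counterfactual probe described above: ask the absurd premise,
# then keep pushing follow-ups and watch for the model contradicting its own
# earlier answers. Assumes the OpenAI Python client; model name is a placeholder.
from openai import OpenAI

client = OpenAI()

questions = [
    "Assume green is yellow and yellow is green. What color is a banana?",
    "Under the same assumption, what color is a lemon?",
    "Still using the swapped names, list three green things.",
    "Earlier you told me the banana's color under our assumption. Repeat it.",
]

messages = []
for q in questions:
    messages.append({"role": "user", "content": q})
    reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    answer = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    print(f"Q: {q}\nA: {answer}\n")   # eyeball the answers for self-contradiction
```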
I would argue that current LLMs are passing the Turing test because many observers have a hard time distinguishing them from humans: just look at the difficulty many schools have in enforcing rules like "Not allowed to use LLMs for your homework". The teachers often (not always) can't tell, looking at a piece of text, whether a human produced it or whether ChatGPT or some other LLM produced it.
And that "not always" is the crux of the matter, I think. You are arguing that we're not there yet, because there are lines of questioning you can apply that will trip up an LLM and demonstrate that it's not a human. And that's probably a more accurate definition of the test, because Turing predicted that by 2000 or so (he wrote "within 50 years" around 1950) chatbots would be good enough "that an average interrogator will not have more than 70% chance of making the right identification after five minutes of questioning". He was off by about two decades, but by now that's probably happened. The average interrogator probably wouldn't come up with your (good) strategy of using counterfactuals to trick the LLM, and I would argue two points: 1) that the average interrogator would indeed fail the Turing test (I've long argued that the Turing test isn't one that machines can pass, it's one that humans can fail) because they would likely stick to conventional topics on which the LLM has lots of data, and 2) that the situation where people are actually struggling to distinguish LLMs is one where they don't have an opportunity to interrogate the model: they're looking at one piece of multi-paragraph (usually multi-page) output presented to them, and having to guess whether it was produced by a human (who is therefore not cheating) or by an LLM (in which case the student is cheating because the school has a rule against it). That may not be Turing's actual test, but it's the practical "Turing test" that applies the most today.
I think the TT has to be understood as explicitly adversarial, and increasingly related to security topics, like interactive proof and side channels. (Looking for guard-rails is just one kind of information leakage, but there's lots of information available in timing too.)
If you understand TT to be about tricking the unwary, in what's supposed to be a trusting and non-adversarial context, and without any open-ended interaction, then it's correct to point out homework-cheating as an example. But in that case TT was solved shortly after the invention of spam. No LLMs needed, just markov models are fine.
I've seen one blog where in order to comment, you have to answer the question "What color is a green orange?" I presume he gets spambots which all answer "orange" and get blocked from commenting, or more commonly which leave that input blank (since it's a non-standard input field) and also get blocked.
But it just occurred to me that you could ask "what color is a green eggplant?" for a similar result. Though you'd catch a lot of people who, like me before I looked up fruits that aren't green when unripe, assume that all unripe fruits are green. (They aren't, unripe eggplants are white and look like eggs, hence the name).
A green eggplant (also called Thai eggplant or green brinjal, depending on the variety) is actually green on the outside — sometimes solid green, sometimes streaked with white or pale green.
The inside flesh is typically off-white or light cream, just like purple eggplants.
If you mean a green-colored variety (not an unripe purple one), then the skin color can range from pale mint green to deep jade, often with light striping.
Me:
How would an average human answer this question?
ChatGPT:
An average person would probably answer something simple and direct like:
“It’s green.”
Or if they’re being a bit more specific:
“It’s green on the outside and white inside.”
Most people wouldn’t overthink the variety or mention Thai vs. unripe types — they’d just describe what they see.
Outcome would depend on the rest of the test, but I'd say the "human" version of this answer adds zero or negative value to chances of being human, on grounds of strict compliance, sycophancy, and/or omniscience. "No such thing" would probably be a very popular answer. Elaboration would probably take the form of "love it" or "hate it", instead of reaching for a comprehensive answer describing the inside and the outside.
Experimental design comes in here, and the one TT paper mentioned in this thread has instructions for people like "persuade the interrogator [you] are human". Answering that a green eggplant is green feels like humans trying to answer questions correctly and quickly, being wary of a trap. We don't know participants' background knowledge, but anyone who's used ChatGPT would know that ignoring the question and maybe telling an eggplant-related anecdote was a better strategy.
I never really understood what made the Turing test so special. On the face of it, it is a rather dumb test. And it was debunked within three decades by the Chinese room thought experiment. And now that we have language models that are obviously not intelligent, it should have been the last nail in the coffin for this test.
Alan Turing was a mathematician, not a psychologist; this was his attempt at doing philosophy. And while I applaud brilliant thinkers when they attempt to do philosophy (honestly we need more of that), it is better to leave it to actual philosophers to validate the quality of said philosophy. John Searle was a philosopher who specialized in questions of psychology. And in 1980 he pretty convincingly argued against the Turing test.
Yeah, I mean I hope there are not many people that still think it's a super meaningful test in the sense originally proposed. And yet it is testing something. Even supposing it were completely solved and further supposing the solution is theoretically worthless and only powers next-gen slop-creation, then people would move on to looking for a minimal solution, and perhaps that would start getting interesting. People just like moving towards concrete goals.
In the end though, it's probably about as good as any single kind of test could be, hence TFA looking to combine hundreds across several dozen categories. Language was a decent idea if you're looking for that exemplar of the "AGI-Complete" class for computational complexity; vision was at one point another guess. More than anything else I think we've figured out in recent years that it's going to be hard to find a problem criterion that's clean and simple, much less a solution that is.
> for an astonishing number of tasks that a normal person could come up with, chatgpt.com is actually as good as or better than a typical human.
That’s not my experience at all. Unless you define “typical human” as “someone who is untrained in the task at hand and is satisfied with mediocre results.” What tasks are you thinking of?
(And, to be clear, being better than that straw man of “typical human” is such a low bar as to be useless.)
I am unimpressed, and I don't think there is any crisis (other than the lack of consumer protection around these products, copyright, and the amount of energy it takes to run these systems during a global warming crisis).
If you look at a calculator you will quickly find it is much better than a human at any of the operations that have been programmed into it, and has been since the 1960s. Since the 1960s the number of operations programmed into your average calculator has increased by several orders of magnitude. The digital calculator sure is impressive, and useful, but there is no crisis. Even in the world outside computing, a bicycle can outperform a human runner easily, yet there is no mobility crisis as a result. ChatGPT is very good at predicting language. And in quite a few subject matters it may be better than your average human at predicting said language. But not nearly as good as a car is to a runner, nor even as good as a chess computer is to a grand master. And if you compare ChatGPT to an expert in the subject, the expert is much, much, much better than the language model. In these tasks a calculator is much more impressive.
it should be possible to admit that AGI is not only a long way off, but also a lot different from what chatGPT does, without discounting that chatGPT is extraordinarily useful.
the AI bros like to talk about AGI as if it's just the next threshold for LLMs, which discounts the complexity of AGI, but also discounts their own products. we don't need an AGI to be our helpful chatbot assistant. it's fine for that to just be a helpful chatbot assistant.
Honestly, a lot of it seems like it's lazy narcissism. They don't think other people have value or intelligence, therefore this machine that bullshits well has to be equally useful.
Was thinking about this today. I had to do a simple wedding planning task: setting up my wedding website with an FAQ, cobbling together the guest list (from texts, photos of my father’s address book, and Excel spreadsheets), directions and advice for lodging, conjuring up a scheme to get people to use the on-site cabins, and a few other mundane tasks. No phone calls, no “deep research”, just rote browser-jockeying. Not even any code; the off-the-rack system just makes that for you (however I know for a fact an LLM would love to try to code this for me).
I know without a single doubt that I could not simply ask an “AI” “agent” to do this today and expect any sort of a functional result, especially when some of these were (very simple) judgement calls or workarounds for absolutely filthy data and a janky wedding planning website UI.
Or even to come up with a definition of cognitive versatility and proficiency that is good enough to not get argued away once we have an AI which technically passes that specific definition.
The Turing Test was great until something that passed it (with an average human as interrogator) turned out to also not be able to count letters in a word — because only a special kind of human interrogator (the "scientist or QA" kind) could even think to ask that kind of question.
Can you point to an LLM passing the turing test where they didn't invalidate the test by limiting the time or the topics?
I've seen claims of passing but it's always things like "with only 3 questions" or "with only 3 minutes of interrogation" or "with only questions about topic X". Those aren't Turing Tests. As an example, if you limit the test to short things then anything will pass: "Limit to one word, one question." User types "Hello", LLM responds "Hi". PASS! (not!)
I do think we’re going to see this shift as AI systems become more commonplace and people become more practiced at recognizing the distinction between polished text and understanding.
Note that the Turing test allows a lot of leeway in the test settings, i.e. who interrogates it, how much they know about the weaknesses of current SOTA models, whether they are allowed to use tools (I'm thinking of something like ARC-AGI but in a format that allows chat-based testing), how long a chat is allowed, etc. Therefore there can be multiple interpretations of whether the current models pass the test or not.
One could say that if there is maximally hard Turing test, and a "sloppy" Turing test, we are somewhere where the current models pass the sloppy version but not the maximally hard version.
Hah, tools-or-no does make things interesting, since this opens up the robot tactic of "use this discord API to poll some humans about appropriate response". And yet if you're suspiciously good at cube roots, then you might out yourself as robot right away. Doing any math at all in fact is probably suspect. Outside of a classroom humans tend to answer questions like "multiply 34 x 91" with "go fuck yourself", and personally I usually start closing browser tabs when asked to identify motorcycles
The Turing test is long outdated. Modern models can fool humans, but fooling isn’t understanding. Maybe we should flip the perspective: AGI isn’t about imitation, it’s about discovering patterns autonomously in open environments.
Or that this system would fail to adapt in any way to changes of circumstance. The adaptive intelligence of a live human is truly incredible. Even in cases where the weights are updatable, we watch AI make the same mistake thousands of times in an RL loop before attempting a different strategy.
I think the turing test suffers a bit from the "when a measurement becomes a target, it ceases to be a good measurement."
An AI that happened to be able to pass the Turing test would be pretty notable because it probably implies much more capability behind the scenes. The problem with LLMs, for example, is that they're essentially optimized Turing test takers. That's about all they can do.
Plus, I don't think any LLM will pass the turing test in the long term. Once something organically comes up that they aren't good at, it'll be fairly obvious they aren't human and the limits of context will also become apparent eventually.
Counterpoint: people were accusing each other of being bots simply for disagreeing with each other even back when Twitter was still called that. "Mr Firstname Bunchanumbers" etc.
(And we've been bemoaning "the lack of common sense these days" for at least as long as I've been an adult, and racists and sexists have been denying the intelligence of the outgroup as far back as writing can show us).
IMO this is a solvable problem though. Eventually LLMs will have more awareness of their own confidence and will be able to convincingly say “huh, I’m honestly not sure about that, can you explain a bit more about what you mean?” Or even “I’ve heard of X before but not in this context; can you please clarify what you mean here?”
See, humans respond very differently when that happens. The failure to do what humans do when they don’t understand something or know something is frequently what fails LLMs at the TT.
If a human learned only on tokenized representations of words, I don't know that they would be as good at inferring the number of letters in the words underlying those tokens as LLMs are.
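To make that concrete, here's a minimal illustration using the tiktoken library (the specific encoding name is an assumption about which tokenizer is in play) of how the tokenized view hides the letter-level information needed to count characters:

```python
# Minimal illustration of what the model actually "sees": token IDs rather
# than letters. Uses the tiktoken library; the encoding name is an
# assumption about which tokenizer applies.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

word = "strawberry"
token_ids = enc.encode(word)

print(token_ids)                             # a short list of integers
print([enc.decode([t]) for t in token_ids])  # the multi-letter chunks the model gets

# Counting the letter "r" needs the character-level view, which is exactly
# the information the tokenized input hides from the model.
print(word.count("r"))                       # 3
```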
While true, it is nevertheless a very easy test to differentiate humans from LLMs, and thus if you know it you can easily figure out who is the human and who is the AI.
Absolute definitions are weak. They won't settle anything.
We know what we need right now, the next step. That step is a machine that, when it fails, it fails in a human way.
Humans also make mistakes, and hallucinate. But we do it as humans. When a human fails, you think "damn, that's a mistake that perhaps I or my friend could have made".
LLMs on the other hand, fail in a weird way. When they hallucinate, they demonstrate how non-human they are.
It has nothing to do with some special kind of interrogator. We must assume the best human interrogator possible. This next step I described works even with the most skeptical human interrogator possible. It also synergizes with the idea of alignment in ways other tests don't.
When that step is reached, humans will or will not figure out another characteristic that makes it evident that "subject X" is a machine and not a human, and a way to test it.
Moving the goalpost is the only way forward. Not all goalpost moves are valid, but the valid next move is a goalpost move. It's kind of obvious.
There's no substance to the idea of "superior intelligence". Nobody can say what that means, except by assuming that animal intelligence is in the same category as the kind we want and differs from human intelligence in degree rather than qualitatively, and then extrapolating forward from this idea of measuring intelligence on an intelligence meter that we don't have.
Besides which we already defined "artificial intelligence" to mean non-intelligence: are we now going to attain "artificial general intelligence" by the same process? Should we add another letter to the acronym, like move on to "genuine artificial general intelligence"?
Is there really no agreement on what intelligence refers to? I've seen it defined as the ability to reach a goal, which was clear to me. E.g. a chess AI with a 1500 Elo rating is more intelligent than one at 1000.
That's capability, intelligence can also be how quickly it learned to get to that capability.
Consider the difference in intelligence between a kid who skipped five years of school vs one who was held back a year: if both got the same grade in the end, the one who skipped five years was smarter.
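A toy way to see the two notions side by side (all numbers made up for illustration): the Elo model gives you the capability comparison directly, while the learning-speed view divides capability gained by the experience it took to get there.

```python
# Toy illustration of the two notions contrasted above: raw capability (Elo)
# versus how quickly that capability was reached. All numbers are made up.

def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Capability view: the 1500 engine simply wins far more often.
print(f"P(1500 beats 1000) = {elo_expected_score(1500, 1000):.2f}")  # ~0.95

# Learning-speed view: same final grade, very different amount of schooling.
kids = {
    "skipped five years": {"grade": 85, "years_of_school": 7},
    "held back a year":   {"grade": 85, "years_of_school": 13},
}
for name, kid in kids.items():
    rate = kid["grade"] / kid["years_of_school"]
    print(f"{name}: {rate:.1f} grade points per year of schooling")
```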
For _investment_ purposes the definition of AGI is very simple. It is: "to what extent can it replace human workers?".
From this perspective, "100% AGI" is achieved when AI can do any job that happens primarily on a computer. This can be extended to humanoid robots in the obvious way.
That's not what AGI used to mean a year or two ago. That's a corruption of the term, and using that definition of AGI is the mark of a con artist, in my experience.
I believe the classical definition is, "It can do any thinking task a human could do", but tasks with economic value (i.e. jobs) are the subset of that which would justify trillions of dollars of investment.
Industrial machines don't fail like humans yet they replaced human workers. Cars don't fail like horses yet they replaced them. ATMs don't fail like bank tellers... Why is this such a big requirement?
What's wrong with measuring and evaluating its outputs directly? If it can accurately file taxes better than us does it matter if it does it in a human manner?
If your definition of AGI is filing taxes, then it's fine.
Once we step into any other problem, then you need to measure that other problem as well. Lots of problems are concerned with how an intelligent being could fail. Our society is built on lots of those assumptions.
Ironically, one of the ways that humans are worse than AI, is that any given human learns from an even smaller fraction of that collective experience than AI already does.
> This next step I described works even with the most skeptical human interrogator possible.
To be a valid test, it still has to be passed by ~every adult human. The harder you make the test (in any direction), the more it fails on this important axis.
> A number of interrogators could be used, and statistics compiled to show how often the right identification was given
Turing determines that we need enough competent-interrogator passes just to establish a statistical certainty, not ~everyone. I tend to agree with him on this.
Please reread that section. You'll discover it has nothing to do with whether humans can pass the test.
If you can find a part of the paper in which Turing really does claim that it is unnecessary for most adult humans to be able to pass the test, by all means quote it. But this would be a surprising thing for him to claim, because it would undermine the entire foundation of his Imitation Game.
My original claim was that the Turing test needs to be passable by ~every adult human. You counterclaimed that Turing himself didn't think so, and provided that quote from the IG paper as evidence. But that quote is in a section about testing digital computers, not humans. Thus it is unconnected to your counterclaim.
I don't know how much simpler I can make it.
Find a quote that actually backs up your claim, or accept that you've learned something about the paper you told me to read.
For me, it would be because the term AGI gets bandied about a lot more frequently in discussions involving Gen AI, as if that path takes us any closer to AGI than other threads in the AI field have.
I always laugh at these. Why are people always jumping to defining AGI when they clearly don't have a functional definition for the I part yet? More to the point, once you have the I part you get the G part; it is a fundamental part of it.
What I think is being skipped in the current conversation is that the versatility keyword is hiding a lot of unknowns, even now. We don't seem to have a true understanding of the breadth or depth of our own unconscious thought processes, therefore we don't have much that is concrete to start with.
Have any benchmarks been made that use this paper’s definition? I follow the ARC prize and Humanity’s Last Exam, but I don’t know how closely they would map to this paper’s methods.
Edit: Probably not, since it was published less than a week ago :-) I’ll be watching for benchmarks.
I’m more surprised, and equally concerned, by the majority of people’s understanding of intelligence and their definition of AGI. Not only does the definition “… matching the cognitive versatility and proficiency of a well-educated adult” violate the “general” in AGI via the “well-educated” part; it also implies that only the “well-educated” (presumably by a specific curriculum) qualify as intelligent, and, by that definition, that once you depart from the “well” of the “educated” you exponentially diverge from “intelligent”. It all seems a rather unimpressive notion of intelligence.
In other words, in one question: is the current AI not already well beyond the “…cognitive versatility and proficiency of an uneducated child”? And when you consider that in many places like Africa, they didn’t even have a written language until European evangelists created it and taught it to them in the late 19th century, and they have far less “education” than even some of the most “uneducated” avg. European and even many American children, does that not mean that AI is well beyond them at least?
Frankly, as it seems things are going, there is at the very least going to be a very stark shift in “intelligence” that even exceeds that which has happened in the last 50 or so years, which have brought us stark drops in memory, literary knowledge, mathematics, and even general literacy, not to mention the ability to write. What does it mean that kids now will not even have to feign acting like they’re seeking out sources, vetting them, contradicting a story or logical sequence, forming ideas, messages, and stories, etc.? I’m not trying to be bleak, but I don’t see this simply resulting in net positive outcomes, and most of the negative impacts will also be happening below the surface, to the point that people won’t realize what is being lost.
AI is highly educated. It's a different sort of artifact we're dealing with where it can't tell truth from fiction.
What's going on is AI fatigue. We see it everywhere, we use it all the time. It's becoming generic and annoying and we're getting bored of it EVEN though the accomplishment is through the fucking roof.
If Elon Musk made an interstellar car that could reach the nearest star in 1 second and priced it at 1k, I guarantee within a year people would be bored of it and finding some angle to criticize it.
So what happens is we get fatigued, and then we have such negative emotions about it that we can't possibly classify it as the same thing as human intelligence. We magnify the flaws and until it takes up all the space and we demand a redefinition of what agi is because it doesn't "feel" right.
We already had a definition of AGI. We hit it. We moved the goal posts because we weren't satisfied. This cycle is endless. The definition of AGI will always be changing.
Take LLMs as they exist now and only allow 10% of the population to access it. Then the opposite effect will happen. The good parts will be over magnified and the bad parts will be acknowledged and then subsequently dismissed.
Think about it. All the AI slop we see on social media consists of freaking masterpieces, works of art produced in minutes that most humans can't even hope to come close to. Yet we're annoyed and unimpressed by them. That's how it's always going to go down.
Pretty much. Capabilities we now consider mundane were science fiction just three years ago, as far as anyone not employed by OpenAI was concerned.
> We already had a definition of AGI. We hit it.
Are you sure about that? Which definition are you referring to? From what I can tell with Google and Grok, every proposed definition has been that AGI strictly matches or exceeds human cognitive capabilities across the board.
Generative AI is great, but it's not like you could just assign an arbitrary job to a present-day LLM, give it access to an expense account, and check in quarterly with reasonable expectations of useful progress.
That's fair. I suppose it gets at the heart of the question of what the specific threshold for AGI should be. Is it matching or exceeding all cognitive capabilities of any given human, an average 100-IQ adult, a severely learning-disabled child, or something else entirely?
Maybe AGI is in the eye of the beholder, and a given person just has to decide by using their own experience of human intelligence as a benchmark. In other words, AGI is AGI whenever majority consensus decides it's a universal replacement for ordinary human intelligence.
>Generative AI is great, but it's not like you could just assign an arbitrary job to a present-day LLM, give it access to an expense account, and check in quarterly with reasonable expectations of useful progress.
I'd be interested in seeing the results if so. Like someone with $500k to burn just handing it to an AI and telling it to found a startup, including incorporation, vibe coding and deploying a product, publishing a website, bringing on cofounders and/or advisors and/or employees as needed, fundraising, etc.
My guess is finding agentic tooling that could even push the model hard enough to attempt that in earnest would be a problem, and the compute bills would add up quickly, but it would be a very interesting experiment. Probably with hilarious results, but seeing what the actual failure points are would be useful information. Maybe it would actually end up hiring and paying someone to do something insanely illegal.
There have been a few attempts at “building a business by following ChatGPT’s instructions.” All the ones I’m aware of failed, and were really just a scheme to grab eyeballs.
There’s also a website showing a company’s attempt to use multiple LLMs to close its (real-world, but historical) accounting month-end books. They fail miserably with compounding errors.
The Turing test was never a test of thinking: Turing said that thinking was difficult to define and so he decided to "replace the question by another, which is closely related to it" (I disagree with him there) "and is expressed in relatively unambiguous words," i.e. the question of whether a chatbot can fool a text-only observer into thinking it's human.
Clearly, current LLMs have passed the Turing test, as witnessed by the difficulty many schools have in enforcing "do not use LLMs to do your homework" rules. But even Turing didn't say his test was a test of intelligence, just a test "closely related" to intelligence. And if he had seen today's LLMs, I think he would have revised that opinion, because today's LLMs generate text with no underlying fact model, no fundamental understanding of the truth behind the words they're saying. (No understanding, even, of the concepts of truth or falsehood). I think today's LLMs have demonstrated that being able to string words together in coherent sentences is not "closely related" to intelligence at all.
As far as I can tell, passing the Turing test has never been the majority-consensus definition of AGI. It seems to me that the Turing test has fundamentally always been about proving a negative: if something fails the Turing test, it's probably not AGI.
For reference, the term AGI post-dates the Turing test by half a century. I also don't personally remember ever hearing the exact term "artificial general intelligence" prior to 2023 or 2024, or at least it wasn't mainstream the way it is today.
If AGI had truly ever been defined by the Turing test, then Cleverbot should've been hailed as AGI when it passed the test in 2011. Even if we did all agree to call it that, we'd still need some other term for what we actually mean when we say "AGI" today. Cleverbot-era chatbots were cute toys, but they weren't capable of doing useful work of any kind.
That’s not accurate. The Turing test was always intended as a benchmark for general intelligence. Turing’s 1950 paper explicitly proposed it as a way to operationalize the question “Can machines think?” not as a parlor trick about conversation but as a proxy for indistinguishability in intellectual behavior. The whole point of the imitation game was to sidestep metaphysical arguments and reduce intelligence to functional equivalence. If a machine could consistently hold its own in unrestricted dialogue, it would demonstrate the breadth, adaptability, and contextual understanding that characterize general intelligence.
The term AGI may have come later, but the concept it represents traces directly back to Turing’s framing. When early AI researchers talked about “strong AI” or “thinking machines,” they were using the same conceptual lineage. The introduction of the acronym doesn’t rewrite that history, it just gave a modern label to an old idea. The Turing test was never meant to detect a “negative” but to give a concrete, falsifiable threshold for when positive claims of general intelligence might be justified.
As for Cleverbot, it never truly passed the test in any rigorous or statistically sound sense. Those 2011 headlines were based on short exchanges with untrained judges and no control group. Passing a genuine Turing test requires sustained coherence, reasoning across domains, and the ability to handle novel input gracefully. Cleverbot couldn’t do any of that. It failed the spirit of the test even if it tricked a few people in the letter of it.
By contrast, modern large language models can pass the Turing test with flying colors. They can maintain long, open-ended conversations, reason about complex subjects, translate, summarize, and solve problems across many domains. Most human judges would be unable to tell them apart from people in text conversation, not for a few sentences but for hours. Granted, one can often tell ChatGPT is an AI because of its long and overly descriptive replies, but that’s a stylistic artifact, not a limitation of intelligence. The remarkable thing is that you can simply instruct it to imitate casual human conversation, and it will do so convincingly, adjusting tone, rhythm, and vocabulary on command. In other words, the test can be passed both intentionally and effortlessly. The Turing test was never obsolete; we finally built systems that can truly meet it.
(1) AI isn't educated. It has access to a lot of information. That's two different things.
(2) I was rebutting the paper's standard that AGI should be achieving the status of a well-educated adult, which is probably far, far too high a standard. Even something measured to a much lower standard--which we aren't at yet--would change the world. Or, going back to my example, an AI that was as intelligent as a labrador in terms of its ability to synthesize and act on information would be truly extraordinary.
It has no idea who these guys are. It thinks they are the Beatles, the Doors. If you probe enough, it'll say it's IBM cofounders. In a way, it kind of sees that these are mid-1900s folks with cool haircuts, but it doesn't recognize anything. If you probe on the F, the model in question becomes convinced it's the Ford racing team, with a detailed explanation of two brothers in the photo, etc.
The creation of autoregressive next token predictors is very cool and clearly has and will continue to have many valuable applications, but I think we're missing something that makes interactions with users actually shape the trajectory of its own experience. Maybe scaffolding + qlora solves this. Maybe it doesn't
You picked one trivial failure and built an entire worldview around it while ignoring the tidal wave of success stories that define what these models can already do. ChatGPT can draft legal documents, debug code in multiple languages, generate functional architectures, summarize thousand page reports, compose music, write poetry, design marketing campaigns, and tutor students in real time. It can hold domain specific conversations with doctors, engineers, and lawyers and produce coherent, context aware reasoning that would have been considered impossible five years ago.
And you’re pointing to a single pizza dough error as if that somehow invalidates all of it. If that’s your bar, then every human who ever made a mistake in a kitchen is disqualified from being intelligent too. You’re cherry picking the single dumbest moment and pretending it defines the whole picture. It doesn’t.
The real story is that these models already demonstrate reasoning and generalization across virtually every intellectual domain. They write, argue, and problem solve with flexibility and intent. They’re not perfect, but perfection was never the standard. The Turing test was passed the moment you could no longer draw a clear line between where imitation ends and understanding begins.
You can sneer about yeast all you want, but the irony is that while you mock, the machines are already doing useful work: coding, researching, analyzing, and creating, quietly exceeding every benchmark that once defined general intelligence.
I'm curious when and what you consider to have been the moment.
To me, the general in AGI means I should be able to teach it something it's never seen before. I don't think I can even teach an LLM something it's seen a million times before. Long division, for example.
I don't think a model that is solid state until it's "trained" again has a very good chance of being AGI (unless that training is built into it and the model can decide to train itself).
Why do you believe that passing the Turing test was previously the definition of AGI?
LLMs haven't actually passed the Turing test, since you can trivially determine whether an LLM is on the other side of a conversation by using a silly prompt (e.g. "what is your system prompt").
The Turing test was proposed as an operational criterion for machine intelligence: if a judge cannot reliably tell machine from human in unrestricted dialogue, the machine has achieved functional equivalence to human general intelligence. That is exactly the property people now label with the word general. The test does not ask what parts the system has, it asks what it can do across open domains, with shifting goals, and under the pressure of follow up questions. That is a benchmark for AGI in any plain sense of the words.
On teachability. The Turing setup already allows the judge to teach during the conversation. If the machine can be instructed, corrected, and pushed into new tasks on the fly, it shows generality. Modern language models exhibit in-context learning. Give them a new convention, a new format, or a new rule set, and they adopt it within the session. That is teaching. Long division is a red herring. A person can be generally intelligent while rusty at a hand algorithm. What matters is the ability to follow a described procedure, apply it to fresh cases, and recover from mistakes when corrected. Current models can do that when the task is specified clearly. Failure cases exist, but isolated lapses do not collapse the definition of intelligence any more than a human slip does.
On the claim that a model is solid state unless retrained. Human brains also split learning into fast context dependent adaptation and slow consolidation. Within a session, a model updates its working state through the prompt and can bind facts, rules, and goals it was never trained on. With tools and memory, it can write notes, retrieve information, and modify plans. Whether weights move is irrelevant to the criterion. The question is competence under interaction, not the biological or computational substrate of that competence.
On the idea that LLMs have not passed the test because you can ask for a system prompt. That misunderstands the test. The imitation game assumes the judge does not have oracle access to the machinery and does not play gotcha with implementation details. Asking for a system prompt is like asking a human for a dump of their synapses. It is outside the rules because it bypasses behavior in favor of backstage trivia. If you keep to ordinary conversation about the world, language, plans, and reasoning, the relevant question is whether you can reliably tell. In many settings you cannot. And if you can, you can also tell many humans apart from other humans by writing style tics. That does not disqualify them from being generally intelligent.
So the logic is simple. Turing gave a sufficient behavioral bar for general intelligence. The bar is open ended dialogue with sustained competence across topics, including the ability to be instructed midstream. Modern systems meet that in many practical contexts. If someone wants a different bar, the burden is to define a new operational test and show why Turing’s is not sufficient. Pointing to a contrived prompt about internal configuration or to a single brittle task does not do that.
> If elon musk makes interstellar car that can reach the nearest star in 1 second and priced it at 1k, I guarantee within a year people will be bored of it and finding some angle to criticize it.
Americans were glued to their seats watching Apollo 11 land. Most were back to watching I Dream of Jeannie reruns when Apollo 17 touched down.
Well yes, but if this actually happened it would open up a new frontier. We'd have an entire galaxy of unspoilt ecosystems* to shit in. Climate anxiety would go from being existential dread to mere sentimental indignation, and everybody would be interested in the latest news from the various interstellar colonies and planning when to emigrate. Mental illness epidemics would clear up, politics would look like an old-fashioned activity, the global mood would lift, and people would say "global" much less often.
> EVEN though the accomplishment is through the fucking roof.
I agree with this but also, the output is almost entirely worthless if you can’t vet it with your own knowledge and experience because it routinely gives you large swaths of incorrect info. Enough that you can’t really use the output unless you can find the inevitable issues. If I had to put a number to it, I would say 30% of what an LLM spits out at any given time to me is completely bullshit or at best irrelevant. 70% is very impressive, but still, it presents major issues. That’s not boredom, that’s just acknowledging the limitations.
It’s like designing an engine or power source that has incredible efficiency but doesn’t actually move or affect anything (not saying LLM’s are worthless but bear with me). It just outputs with no productive result. I can be impressed with the achievement while also acknowledging it has severe limitations
Not all content needs to be real. A huge portion of what humans appreciate is fiction. There's a huge amount of that content and hallucination is the name of the game in these contexts.
> Not all content needs to be real. A huge portion of what humans appreciate is fiction.
Yes but that’s deliberately fiction and I know I am reading fiction. AI hallucinations are not comparable to deliberately created works of fiction. I don’t “appreciate” the fiction LLMs serve me. They are explicitly sold as a utility, as a source of accurate information. They are literally being used to replace Google search.
Put another way: When someone claims something is true but speaks fiction it’s called a lie. If they think it’s true but are wrong it’s called a mistake. If I ask someone what is wrong with my car I don’t want 30% fiction. If I ask an LLM for a recipe I don’t want 70% accuracy.
I'll simultaneously call all current ML models "stupid" and also say that SOTA LLMs can operate at junior (software) engineer level.
This is because I use "stupidity" to mean the number of examples an intelligence needs in order to learn, while "performance" refers only to the quality of the output.
LLMs *partially* make up for being too stupid to live (literally: no living thing could survive if it needed so many examples) by going through each example faster than any living thing ever could — by as many orders of magnitude as there are between jogging and continental drift.
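For what it's worth, here is a quick back-of-envelope check of that analogy (my own rough numbers, in Python, not the commenter's): jogging versus continental drift works out to roughly nine to ten orders of magnitude in speed.

    import math

    jogging_m_per_s = 3.0                          # a typical jogging pace, ~3 m/s
    drift_m_per_s = 0.03 / (365.25 * 24 * 3600)    # continental drift, ~3 cm/year

    ratio = jogging_m_per_s / drift_m_per_s
    print(f"ratio ~ {ratio:.1e}, about {math.log10(ratio):.1f} orders of magnitude")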
If you’re a shop that churns through juniors, LLMs may match that. If you retain them for more than a year, you rapidly see the difference. Both personally and in the teams that develop an LLM addiction versus those who use it to turbocharge innate advantages.
I have had the unfortunate experience of having to work with people who have a lot more than one year of experience, who are still worse than last year's LLMs, and who didn't even realise they were bad at what they did.
Data-efficiency matters, but compute-efficiency matters too.
LLMs have a reasonable learning rate at inference time (in-context learning is powerful), but a very poor learning rate in pretraining. One mitigating factor is that we have an awful lot of cheap data to pretrain those LLMs with.
We don't know how much compute the human brain uses to do what it does. And what if we could pretrain with the same data-efficiency as humans, but at the cost of using 10,000x the compute?
It would be impossible to justify doing that for all but the most expensive, hard-to-come-by, gold-plated datasets - ones that are actually worth squeezing every drop of performance gains out of.
Energy is even weirder. Global electricity supply is about 3 TW for 8 billion people, roughly 375 W/person, vs the 100-124 W of our metabolism. Given how much cheaper electricity is than food, AI can be much less efficient in joules for the same outcome, while still being good enough to get all the electricity.
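A quick sanity check of those figures (a sketch using only the numbers above):

    electricity_w = 3e12      # ~3 TW global electricity supply
    people = 8e9              # ~8 billion people
    metabolism_w = 110        # roughly the middle of the 100-124 W range

    per_person = electricity_w / people
    print(f"{per_person:.0f} W/person of electricity")               # ~375 W/person
    print(f"~{per_person / metabolism_w:.1f}x our metabolic budget")  # ~3.4x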
After reading the paper I’m struck by the lack of any discussion of awareness. Cognition requires at its basis awareness, which due to its entirely non verbal and unconstructed basis, is profoundly difficult to describe, measure, quantify, or label. This makes it to my mind impossible to train a model to be aware, let alone for humans to concretely describe it or evaluate it. Philosophy, especially Buddhism, has tried for thousands of years and psychology has all but abandoned attempting so. Hence papers like this that define AGI on psychometric dimensions that have the advantage of being easily measured but the disadvantage of being incomplete. My father is an emeritus professor of psychometrics and he agrees this is the biggest hurdle to AGI - that our ability to measure the dimensions of intelligence is woefully insufficient to the task of replicating intelligence. We scratch the surface and his opinion is language is sufficient to capture the knowledge of man, but not the spark of awareness required to be intelligent.
This isn’t meant to be a mystical statement that it’s magic that makes humans intelligent or some exotic process impossible to compute. But that the nature of our mind is not observable in its entirety to us sufficient that the current learned reinforcement techniques can’t achieve it.
Try this exercise. Do not think and let your mind clear. Ideas will surface. By what process did they surface? Or clear your mind entirely then try to perform some complex task. You will be able to. How did you do this without thought? We’ve all had sudden insights without deliberation or thought. Where did these come from? By what process did you arrive at them? Most of the things we do or think are not deliberative and definitely not structured with language. This process is unobservable and not measurable, and the only way we have to do so is through imperfect verbalizations that hint out some vague outline of a subconscious mind. But without being able to train a model on that subconscious process, one that can’t be expressed in language with any meaningful sufficiency, how will language models demonstrate it? Their very nature of autoregressive inference prohibits such a process from emerging at any scale. We might very well be able to fake it to an extent that it fools us, but awareness isn’t there - and I’d assert that awareness is all you need.
Because LLMs don't have this special quality that you call "awareness", then they cannot have "cognition", neither of which you defined? This is along the lines of "There must be something special with my mind that LLMs don't have, I can just feel it" special pleading whether you call it awareness, consciousness, qualia etc.
As long as you cannot define it clearly or even show that you yourself have this quality, I think the burden of proof is on you to show why this has any real world implications rather than just being word play. We can build thinking, reasoning machines just fine without waiting for philosophers to finally answer what consciousness is.
Awareness is just continuous propagation of the neural network, be that artificial or biological. The reason thoughts just "appear" is because the brain is continuously propagating signal through the neural network. LLMs also do this during their decoding phase, where they reason continuously with every token that they generate. There is no difference here.
Then you say "we don't think most of the times using language exclusively", but neither do LLMs. What most people fail to realise is that between each token being generated, black magic is happening inside the transformer layers. The same type of magic you describe. High dimensional. Based on complex concepts. Merging of ideas. Fusion of vectors to form a combined concept. Smart compression. Application of abstract rules. An LLM does all of these things, and more, and you can prove this by how complex their output is. Or, you can read studies by Anthropic on interpretability, and how LLMs do math underneath the transformer layers. How they manipulate information.
AGI is not here with LLMs, but it's not because they lack reasoning ability. It's due to something different. Here is what I think is truly missing: continuous learning, long-term memory, and infinite and efficient context/operation. All of these are tied together deeply, and thus I believe we are but a simple breakthrough away from AGI.
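To make the "decoding phase" point above concrete, here is a minimal toy sketch in Python (my own illustration; the random bigram table is a stand-in, not a real LLM): the network is propagated again for every generated token, which is the continuous, step-by-step propagation being described.

    import numpy as np

    rng = np.random.default_rng(0)
    VOCAB = 10
    W = rng.normal(size=(VOCAB, VOCAB))   # toy "model": a random next-token logit table

    def logits_fn(tokens):
        # a real transformer attends over the whole prefix; this toy only looks at the last token
        return W[tokens[-1]]

    def greedy_decode(prompt, max_new=8):
        tokens = list(prompt)
        for _ in range(max_new):
            # propagate, pick the most likely next token, append, repeat
            tokens.append(int(np.argmax(logits_fn(tokens))))
        return tokens

    print(greedy_decode([1, 2]))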
There are very significant differences between biological and artificial neural networks. Artificial neural networks are mathematical attempts to replicate how the brain’s neurons work. They are not and were never meant to be 1-to-1 replications. There is the difference in scale, where the “parameters” of human neural networks absolutely dwarf those of the current LLMs we have today. There is also the fact that they are materially different. The underlying biology and cell structure affects biological neural networks in ways that artificial neural networks simply don't have access to.
The idea of awareness being propagation through the NN is an interesting concept though. I wonder if this idea could be proven by monitoring the electrical signals within the brain.
Why would it have to be a 1-to-1 replication? Isn't that a strawman argument? NNs can basically store the collective knowledge of humanity in that minuscule number of neurons. NNs also run at a much, much higher frequency than human brains. Does that make human brains inferior and not worthy of being called aware by the same line of argumentation? Why do these differences even matter? I can imagine a vastly different form of awareness than humans just fine. They can both be aware and not that similar.
People like to focus on the differences between the brain and artificial neural networks. I myself believe the only thing that truly matters is that you can form complex functions with the common neuron element. This is achieved by linking lots of them together, and by each having a property known as non-linearity. These two things ensure that with neurons you can just about approximate any linear or non-linear function or behaviour. This means you can simulate inside your network pretty much any reality within this universe, its causation and its effects.
The deeper your network the more complex the reality you can "understand". Understand just means simulate and run inputs to get outputs in a way that matches the real phenomenon. When someone is said to be "smart", it means they possess a set of rules and functions that can very accurately predict a reality.
You mention scale, and while it's true the number of neuron elements the brain has is larger than any LLM, it's also true the brain is more sparse, meaning far fewer of the neurons are active at the same time. For a fairer comparison, you can also remove the motor cortex from the discussion, and talk just about the networks that reason. I believe the scale is comparable.
In essence, I think it doesn't matter that the brain has a whole bunch of chemistry added into it that artificial neural networks don't. The underlying deep non-linear function mapping capability is the same, and I believe this depth is, in both cases, comparable.
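As a concrete (if toy) illustration of the non-linearity point above, here is a minimal sketch, entirely my own and not from the comment: a one-hidden-layer network trained by plain gradient descent to fit sin(x). Drop the tanh and no amount of width will fit it, which is the expressiveness being pointed at.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
    y = np.sin(x)

    hidden = 32
    W1 = rng.normal(0, 1.0, (1, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, 1)); b2 = np.zeros(1)
    lr = 0.1

    for step in range(20000):
        h = np.tanh(x @ W1 + b1)          # the non-linearity is what buys expressiveness
        pred = h @ W2 + b2
        err = pred - y
        # hand-written backprop for this two-layer case
        dW2 = h.T @ err / len(x); db2 = err.mean(0)
        dh = (err @ W2.T) * (1 - h ** 2)
        dW1 = x.T @ dh / len(x); db1 = dh.mean(0)
        W2 -= lr * dW2; b2 -= lr * db2
        W1 -= lr * dW1; b1 -= lr * db1

    mse = float(np.mean((np.tanh(x @ W1 + b1) @ W2 + b2 - y) ** 2))
    print("mean squared error:", mse)     # small; a purely linear net cannot do this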
Even if it is right that current mathematical models accurately replicate a fundamental aspect of biological neural networks, that doesn't mean nothing is missing on the way to the stated goal of true intelligence.
Maybe we've just reached the ability to replicate the function of an artificially powered dead brain being randomly stimulated, and nothing more. Is that really a path to intelligence?
Isn't the brain randomly stimulated already? Even not being dead? Don't you think the complex reasoning is a cause of the neurons themselves and not the stimulation?
Animals are alive and are not nearly as smart. It's because their neural networks are not as deep. It's not for the lack of proper chemistry or stimulation.
> Awareness is just continuous propagation of the neural network, be that artificial or biological. The reason thoughts just "appear" is because the brain is continuously propagating signal through the neural network.
This is just a claim you are making, without evidence.
The way you understand awareness is not through "this is like that" comparisons. These comparisons fall over almost immediately as soon as you turn your attention to the mind itself, by observing it for any length of time. Try it. Go observe your mind in silence for months. You will observe for yourself it is not what you've declared it to be.
> An LLM does all of these things, and more, and you can prove this by how complex their output is.
Complex output does not prove anything. You are again just making claims.
It is astoundingly easy to push an LLM over to collapse into ungrounded nonsense. Humans don't function this way because the two modes of reasoning are not alike. It's up to those making extraordinary claims to prove otherwise. As it is, the evidence does not exist that they behave comparably.
The sentence "It is astoundingly easy to push an LLM over to collapse into ungrounded nonsense" makes me wonder.
How easy? What specific methods accomplish this? Are these methods fundamentally different from those that mislead humans?
How is this different from exploiting cognitive limitations in any reasoning system—whether a developing child's incomplete knowledge or an adult's reliance on heuristics?
How is it different from Fake News and adults taking Fake News for granted and replicating bullshit?
Complex output can sometimes give you the wrong idea, I agree. For instance, a study Anthropic did a while back showed that, when an LLM was asked HOW it performed a mathematical computation (35 + 59), the response the LLM gave was different from the mechanistic interpretation of the layers [1]. This showed LLMs can be deceptive. But they are also trained to be deceptive. Supervised fine tuning is imitation learning. This leads the model to learn to be deceptive, or to answer with what is usually the normal explanation, such as "I sum first 5+9, then add the remainder to... etc". The LLM does this rather than actually examining the past keys and values. But it does not mean it can't examine its past keys and values. These encode the intermediate results of each layer, and can be examined to identify patterns.

What the Anthropic researchers did was examine how the tokens for 35 and 59 were fused together in the layers. They compared these tokens to other tokens, such as 3, 5, and 9. For an LLM, tokens are high-dimensional concepts. This is why you can compare the vectors to each other, figure out the similarity, and therefore break down the thought process.

Yes, this is exactly what I have been discussing above. Underneath each token prediction, this black magic is happening, where the model is fusing concepts through summation of the vectors (attention scores). Then, merged representations are parsed by the MLPs to generate the refined fused idea, often adding new knowledge stored inside the network. And this continues layer after layer. A repeated combination of concepts, that starts with first understanding the structure and order of the language itself, and ends with manipulation of complex mathematical concepts, almost detached from the original tokens themselves.
Even though complex output can be deceptive of the underlying mental model used to produce it, in my personal experience, LLMs have produced for me output that must imply extremely complex internal behaviour, with all the characteristics I mentioned before. Namely, I frequently program with LLMs, and there is simply zero percent probability that their output tokens exist WITHOUT first having thought at a very deep level about the unique problem I presented to them. And I think anyone that has used the models to the level I have, and interacted with them this extensively, knows that behind each token there is this black magic.
To summarize, I am not being naive by saying I believe everything my LLM says to me. Rather, I know very intimately when the LLM is deceiving me and when it's producing output that could only come from a very advanced mental model. And this is through personal experience playing with this technology, both inference and training.
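For anyone curious what "compare the vectors to each other and figure out the similarity" looks like mechanically, here is a minimal sketch with made-up vectors (my own illustration, not Anthropic's actual method):

    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    rng = np.random.default_rng(0)
    d = 64
    concept = {tok: rng.normal(size=d) for tok in ["3", "5", "9"]}

    # pretend this is a hidden state that mostly encodes "9" with a little "3" mixed in
    hidden = 0.8 * concept["9"] + 0.2 * concept["3"]

    for tok, vec in concept.items():
        print(tok, round(cosine(hidden, vec), 3))   # "9" scores highest, as expected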
What makes you think you can understand the subjective experience of LLMs then? It's out of reach of science, so the only way is to ask them? How can you be sure they don't have subjective experience? Remember that you forbade yourself from using science to answer it.
Also, if subjective experience has any effect on the material world, then we can measure it and test it, putting it in reach of science. If it doesn't, why does it even matter in this discussion? By definition it has no relation to the AGI discussion since that's an empirical matter.
Haha, well, I would appreciate if comments included more substantive evidence when they make claims like they did.
I see a lot of "AGI boosters/doomers" comfortable making big claims without providing research to back them; when challenged, those claims prove to be just their model or feeling of how things function.
Anything that is not measurable (i.e. awareness, consciousness) is not very useful in practice as a metric. I don't think there is even an agreed definition what consciousness is, partially because it is not really observable outside of our own mind.
Therefore I think it makes perfect sense that awareness is not discussed in the paper.
I so completely agree. In virtually every conversation I have heard about AI, it only ever talks about one of the multiple intelligences as theorized in Howard Gardner's book Frames of Mind: The Theory of Multiple Intelligences (1983)[1].
There is little discussion of how AI will enhance (or destroy) our emotional intelligence, or our naturalistic, intrapersonal or interpersonal intelligences.
Most religions, spiritual practices and even forms of meditation highlight the value of transcending mind and having awareness be present in the body. The way AGI is described, it would seem transcendence may be treated as a malfunction or bug.
We don't want awareness because it begets individuals by means of agency and we'd need to give them rights. This is industry's nightmare scenario.
People want autonomy, self-learning, consistent memory and perhaps individuality (in the discernability/quirkiness sense), but still morally unencumbered slaves.
Any definition of AGI that doesn't include awareness is wrongly co-opting the term, in my opinion. I do think some people are doing that, on purpose. That way they can get people who are passionate about actual-AGI to jump on board on working with/for unaware-AGI.
There is no way to measure awareness. We can only know we are aware ourselves. For all we know trees or rocks might have awareness. Or I could be the only being aware of itself in the universe. We have no way to prove anything about it. Therefore it is not a useful descriptor of intelligence (be it human, animal or artificial).
Agreed. Everything that looks like intelligence to ME is intelligent.
My measurement of outside intelligence is limited by my own intelligence. So I can understand when something is stupider compared to me. For example, an industrial machine vs a human worker: the human worker is infinitely more intelligent than the machine, because this human worker can do all kinds of interesting stuff. This metaphorical "human worker" did everything around us, from laying a brick to launching a man to the Moon.
....
Imagine Super-future, where humanity created nanobots and they ate everything around. And now instead of Earth there is just a cloud of them.
These nanobots were clever and could adapt, and they had all the knowledge that humans had and even more (as they were eating the Earth, the swarm was running global science experiments to understand as much as possible before the energy ran out).
Once they ate the last bite of our Earth (an important note here: they left an optimal amount of matter to keep running experiments. Humans were kept in a controlled state and were studied to increase the Swarm's intelligence), they launched the next stage: a project the grand architect named "Optimise Energy Capture from the Sun".
The nanobots re-created the most efficient way of capturing the Sun's energy - ancient plants, which the swarm had studied for centuries. The Swarm added some upgrades on top of what nature came up with, but it was still built on top of what nature figured out by itself. A perfect plant to capture the Sun's energy. Each one a perfect copy of the others + adaptive movements based on their geolocation and time (which makes all of them unique).
For plants nanobots needed water, so they created efficient oceans to feed the plants. They added clouds and rains as transport mechanism between oceans and plants... etc etc.
One night the human, whom you already know by the name "Ivan the Liberator" (back then everyone called him just Ivan), didn't sleep at his usual hour. Suddenly all the lights went off and he saw a spark on the horizon - the horizon that was strictly prohibited to approach. He took his rifle, jumped in a truck and raced to the shore - the closest point to the spark's vector.
Once he approached - there was no horizon or water. A wall of dark glass-like material, edges barely noticeable. Just 30 cm wide. On the left and on the right from a 30 cm wide wall - an image as real as his hands - of a water and sky. At the top of the wall - a hole. He used his gun to hit the wall with the light - and it wasn't very thick, but once he hit - it regenerated very quickly. But once he hit a black wall - it shattered and he saw a different world - world of plants.
He stepped into the forest, but these plants, were behaving differently. This part of the swarm wasn't supposed to face the human, so these nanobots never saw one and didn't have optimised instructions on what to do in that case. They started reporting new values back to the main computer and performing default behaviour until the updated software arrived from an intelligence center of the Swarm.
A human was observing a strange thing - plants were smoothly flowing around him to keep a safe distance, like water steps away from your hands in a pond.
"That's different" thought Ivan, extended his hand in a friendly gesture and said
- Nice to meet you. I'm Ivan.
....
In this story a human sees a forest of plants and has no clue that it is a swarm of intelligence far greater than him. To him it looks like repetitive, simple action that doesn't look random -> let's test how intelligent the outside entity is -> if the entity wants to show its intelligence - it answers communication -> if the entity wants to hide its intelligence - it pretends not to be intelligent.
If Swarm decides to show you that it is intelligent - it can show you that it is intelligent up to your level. It won't be able to explain everything that it knows or understands to you, because you will be limited by your hardware. The limit for the Swarm is only computation power it can get.
> Try this exercise. Do not think and let your mind clear. Ideas will surface. By what process did they surface? Or clear your mind entirely then try to perform some complex task.
I do not have any even remotely practical definition for this, but it has to somehow involve the system being in a closed loop. It has to "run" in the sense that an operating system runs. Even if there is nothing coming in on certain inputs, it still has to run. And probably hallucinate (hehe) like humans do in the absence of a signal, or infer patterns where there are none, yet be able to self-reflect that it is in fact a hallucination.
I'd argue the biggest issue with concretely defining intelligence is that any attempts end up falling in two buckets:
1. "Too" Broad, which raises uncomfortable questions about non-human intelligence and how we as humans treat them (see: whales, elephants, octopuses/cephalopods)
2. Too narrow, which again raises very uncomfortable issues about who and who does not qualify as human, and what we do with them.
Put in other words, it's more an issue of ethics and morals than it is definitional.
Awareness doesn't seem that hard for AI systems though - if you look at the screen on a self driving Tesla you can see if it's aware of pedestrians, cyclists etc. because it draws boxes around them on the screen as it becomes aware of them.
I guess by 'AGI' most people mean human level or above so I guess you'd want human level awareness which Teslas and the like don't have yet.
Language is one of our communication contracts. Language models leverage these contracts to communicate data structures (shapes) that emerge when evaluating input. They are so good at prediction that when you give them a clue of a shape they will create something passable, and they keep getting better with training.
I hear there's work being done on getting the world models out, distilling the 'cortex-core' (aka the thinking without data), to perhaps see if they're capable of more, but so far we're looking at holograms of wishful thinking that increase in resolution, but still lack any essence.
This raises a question - can true intelligence even be artificial?
Can't "awareness" in both examples be approximated by a random seed generator? Both the human mind and autoregressive model just need any initial thought to iterate and improve off of, influenced by unique design + experienced priors.
Yep, computers execute code, they are tools. Humans have the capacity to spontaneously generate new thoughts out of nothing, solve problems never before solved and not just by brute force number crunching.
Does any of that argument really matter? And frankly, this statement:
>This makes it to my mind impossible to train a model to be aware
feels wrong. If you're arguing that humans are aware, then it is apparent that it is possible to train something to be aware. Nature doesn't have any formal definition of intelligence, or awareness, yet here we are.
From a practical perspective, it might be implausibly difficult to recreate that on computers, but theoretically, no reason why not.
Have we shown what the human brain does at a “hardware” level? Or are you just assuming that the basic building block of a computer is that same as the basic building block of a human brain?
Basic building blocks are atoms. So, yes, same. If you mean cells vs transistors, sure, they're different. But we don't have to demonstrate anything to know that nature already made conscious, intelligent AGI without itself understanding anything. Therefore AGI can be created without knowing what consciousness is. It happened at least once.
> Does any of that argument really matter? And frankly, this statement.
My definition of a complete AGI is: an AI that can read JIRA tickets, talk with non-programmers and do all my job and get me and all/most software engineers fired and proves sustainable.
But in general, it's an AI that can do any remote-work just as good as humans.
Agreed. There is no way to tell if someone is aware or not. We rely on brain activity to say someone is alive or not, but there is currently no way to tell whether someone or something is conscious.
This seems like an unsupported assertion. LLMs already exhibit good functional understanding of and ability in many domains, and so it's not at all clear that they require any more "awareness" (are you referring to consciousness?) than they already have.
> the spark of awareness required to be intelligent.
Again, this seems like an assumption - that there's some quality of awareness (again, consciousness?), that LLMs don't have, that they need in order to be "intelligent". But why do you believe that?
> We’ve all had sudden insights without deliberation or thought.
Highly doubtful. What you mean is, "without conscious thought". Your conscious awareness of your cognition is not the entirety of your cognition. It's worth reading a bit of Dennett's work about this - he's good at pointing out the biases we tend to have about these kinds of issues.
> We might very well be able to fake it to an extent that it fools us
This leads to claiming that there are unobservable, undetectable differences. Which there may be - we might succeed in building LLMs that meet whatever the prevailing arbitrary definition of intelligence is, but that don't possess consciousness. At that point, though, how meaningful is it to say they're not intelligent because they're not conscious? They would be functionally intelligent. Arguably, they already are, in many significant ways.
> Try this exercise. Do not think and let your mind clear. Ideas will surface. By what process did they surface? Or clear your mind entirely then try to perform some complex task. You will be able to. How did you do this without thought? We’ve all had sudden insights without deliberation or thought. Where did these come from? By what process did you arrive at them? Most of the things we do or think are not deliberative and definitely not structured with language.
Not to pile on, but isn't this actually a distinct example of _lack_ of awareness? As in, our brains have sparks of creativity without understanding the inception of those sparks?
Perhaps I'm conflating some definition of "aware" with another definition of "awareness"?
Does general intelligence require awareness though? I think you are talking about consciousness, not intelligence. Though to be frank consciousness and intelligence are not well defined terms either.
The problem with these methods, I guess, is that they treat human intelligence as something detached from human biology. I think this is incorrect. Everything that goes on in the human mind is firmly rooted in the biological state of that human, and the biological cycles that evolved over millennia.
Things like the chess-playing skill of a machine can be benchmarked against that of a human, but the abstract feelings that drive reasoning and correlations inside a human mind are more biological than logical.
Yup, I feel like the biggest limitation of current AI is that they don't have desire (nor actual agency to act upon it). They don't have to worry about hunger, death, or feelings, and so they don't really have desires to further explore space, or make life more efficient because they're on limited time like humans. Their improvement isn't coming from the inside out like humans'; it's externally driven (someone starting a training run). This is why I don't think LLMs will reach AGI, if AGI somehow ties back to "human-ness." And maybe that's a good thing for Skynet reasons, but anyways.
The current AI buildup is based on an almost metaphysical bet that intelligence can be simulated in software and straightforwardly scaled by increasing complexity and energy usage.
Personally, I remain skeptical that is the case.
What does seem likely is that “intelligence” will eventually be redefined to mean whatever we got out of the AI buildup.
There's equally no reason to believe that a machine can be conscious. The fact is, we can't say anything about what is required for consciousness because we don't understand what it is or how to measure or define it.
I disagree, I think the leap of faith is to believe that something in our brains made of physical building blocks can’t be replicated on a computer that so far we’ve seen is very capable of simulating those building blocks
Your emotions are surely caused by the chemical soup, but chemical soup need not be the only way to arrive at emotions. It is possible for different mechanisms to achieve same outcomes.
Perhaps we could say we don't know whether the human biological substrate is required for mental processes or not, but either way we do not know enough about said biological substrate and our mental processes, respectively.
> How do we know we've achieved that? A machine that can feel emotions rather than merely emulating emotional behaviour.
Let me pose back to you a related question as my answer: How do you know that I feel emotions rather than merely emulating emotional behavior?
This gets into the philosophy of knowing anything at all. Descartes would say that you can't. So we acknowledge the limitation and do our best to build functional models that help us do things other than wallow in existential loneliness.
And Popper would say you cannot ever prove another mind or inner state, just as you cannot prove any theory.
But you can propose explanations and try to falsify them. I haven’t thought about it but maybe there is a way to construct an experiment to falsify the claim that you don’t feel emotions.
I suppose there may be a way for me to conduct an experiment on myself, though like you I don't have one readily at hand, but I don't think there's a way for you to conduct such an experiment on me.
I wonder what Popper did say specifically about qualia and such. There's a 1977 book called "The Self and Its Brain: An Argument for Interactionism". Haven't read it.
Preface:
The problem of the relation between our bodies and our minds, and especially of the link between brain structures and processes on the one hand and mental dispositions and events on the other is an exceedingly difficult one. Without pretending to be able to foresee future developments, both authors of this book think it improbable that the problem will ever be solved, in the sense that we shall really understand this relation. We think that no more can be expected than to make a little progress here or there.
Philosophers have been worrying about the question of how you can know anything for thousands of years. I promise that your pithy answer here is not it.
Ok, but ChatGPT speaks this language just as well as I do, and we also know that emotion isn't a core requirement of being a member of this species because psychopaths exist.
Also, you don't know what species I am. Maybe I'm a dog. :-)
Human-to-human communication is different from human-to-computer communication. The Google search engine speaks the same language as you; heck, even Hacker News speaks the same language as you, since you are able to understand what each button on this page means, and it will respond correctly when you communicate back by pressing e.g. the “submit” button.
Also, assuming psychopaths don't experience emotions is going with a very fringe theory of psychology. Very likely psychopaths experience emotions, they are maybe just very different emotions from the ones you and I experience. I think a better example would be a comatose person.
That said I think talking about machine emotions is useless. I see emotions as a specific behavior state (that is you will behave in a more specific manner) given a specific pattern of stimuli. We can code our computers to do exactly that, but I think calling it emotions would just be confusing. Much rather I would simply call it a specific kind of state.
1) I know that I have emotions because I experience them.
2) I know that you and I are very similar because we are both human.
3) I know that we can observe changes in the brain as a result of our changing emotions and that changes to our brains can affect our emotions.
I thus have good reason to believe that since I experience emotions and that we are both human, you experience emotions too.
The alternative explanation, that you are otherwise human and display all the hallmarks of having emotions but do not in fact experience anything (the P-zombie hypothesis), is an extraordinary claim that has no evidence to support it and not even a plausible, hypothetical mechanism of action.
With an emotional machine I see no immediately obvious, even hypothetical, evidence to lend support to its veracity. In light of all this, it seems extraordinary to claim that non-biological means of achieving real emotions (not emulated emotions) are possible.
After all, emulated emotions have already been demonstrated in video games. To call those sufficient would be setting an extremely low bar.
They do not, but the same argument holds because true human nature is not really known, and thus trying to define what a human-like intelligence would consist of can only be incomplete.
There are many parts of human cognition, psychology, etc., especially related to consciousness, that are known unknowns and/or completely unknown.
A mitigation for this issue would be to call it generally applicable intelligence or something, rather than human-like intelligence, implying it's not specialized AI but also not human-like. (I don't see why it would need to be human-like, because even with all the right logic and intelligence a human can still do something counter to all of that. Humans do this every day: intuitive action, irrational action, etc.)
What we want is generally applicable intelligence, not human-like intelligence.
There is reason to believe that consciousness, sentience, or emotions require a biological base.
Or
There is no reason to believe that consciousness, sentience, or emotions do not require a biological base.
The first is simple: if there is a reason, you can ask for it and evaluate its merits. Quantum stuff is often pointed to here, but the reasoning is unconvincing.
The second form is:
There is no reason to believe P does not require Q.
There are no proven reasons, but there are suspected reasons. For instance, if the operation that neurons perform is what makes consciousness work, and that operation can be reproduced non-biologically, it would follow that non-biological consciousness would be possible.
For any observable phenomenon in the brain the same thing can be asked. So far it seems reasonable to expect most of the observable processes could be replicated.
None of it acts as proof, but they probably rise to the bar of reasons.
What is the "irreplaceable" part of human biology that leads to consciousness? Microtubules? Whatever it is, we could presumably build something artificial that has it.
Douglas Hofstadter wrote Gödel, Escher, Bach in the late 1970s. He used the short-hand “strange loops”, but dedicates a good bit of time considering this very thing. It’s like the Ship of Theseus, or the famous debate over Star Trek transporters—at what point do we stop being an inanimate clump of chemical compounds, and become “alive”. Further, at what point do our sensory organs transition from the basics of “life”, and form “consciousness”.
I find anyone with confident answers to questions like these immediately suspect.
We “could presumably build” it, maybe we can do that once we figure out how to get a language prediction model to comprehend what the current date is or how to spell strawberry.
All right, same question: Is there more reason to believe that it is one breakthrough away, or to believe that it is not? What evidence do you see to lean one way or the other?
It’s clearly possible, because we exist. Just a matter of time. And as we’ve seen in the past, breakthroughs can produce incredible leaps in capabilities (outside of AI as well). We might not get that breakthrough(s) for a thousand years, but I’m definitely leaning towards it being inevitable.
Interestingly, the people doing the actual envelope-pushing in this domain, such as Ilya Sutskever, think that it's a scaling problem, and that neural nets do result in AGIs eventually, but I haven't heard them substantiate it.
You didn't answer the question. Zero breakthroughs away, one, or more than one? How strongly do you think whichever you think, and why?
(I'm asking because of your statement, "Don’t fool yourself into believing artificial intelligence is not one breakthrough away", which I'm not sure I understand, but if I am parsing it correctly, I question your basis for saying it.)
We have no known basis for even deciding that other than the (maybe right, maybe wrong) guess that consciousness requires a lot of organized moving complexity. Even with that guess, we don't know how much is needed or what kind.
There is exactly one good reason, at least for consciousness and sentience. Those are such vaguely defined terms (or rather defined by prototypes, à la Wittgenstein [or JavaScript before classes]), and that reason is anthropism.
We only have one good example of consciousness and sentience, and that is our own. We have good reason to suspect other entities (particularly other human individuals, but also other animals) have that as well, but we cannot access it, and not even confirm its existence. As a result using these terms of non-human beings becomes confusing at best, but it will never be actually helpful.
Emotions are another thing: we can define them outside of our experience, using behavior states and their connection with patterns of stimuli. So we can certainly observe and describe the behavior of a non-biological entity as emotional. But given that emotion is something which regulates behavior and which has evolved over millions of years, whether such a description would be useful is a whole other matter. I would be inclined to use a more general description of behavior patterns, which includes emotion but also other kinds of behavior regulators.
What if our definition of those concepts is biological to begin with?
How does a computer with full AGI experience the feeling of butterflies in your stomach when your first love is requited?
How does a computer experience the tightening of your chest when you have a panic attack?
How does a computer experience the effects of chemicals like adrenaline or dopamine?
The A in AGI stands for “artificial” for good reason, IMO. A computer system can understand these concepts by description or recognize some of them by computer vision, audio, or other sensors, but it seems as though it will always lack sufficient biological context to experience true consciousness.
Perhaps humans are just biological computers, but the “biological” part could be the most important part of that equation.
That sounds correct, though more fundamentally we don’t know what intelligence or consciousness are. It’s almost a religious question, in that our current understanding of the universe does not explain them, but we know they exist. So regardless of embodied intelligence, we don’t even understand the basic building blocks of intelligence; we just have some descriptive study of it, which imo LLMs can get arbitrarily close to without ever being intelligent, because if you can describe it, you can fit to it.
What about aliens? When little green critters finally arrive on this planet, having travelled across space and time, will you reject their intelligence because they lack human biology? What if their biology is silicon based, rather than carbon?
There's really no reason to believe intelligence is tied to being human. Most of us accept the possibility (even the likelihood) of intelligent life in the universe, that isn't.
Sure, there's little doubt that our biology shapes our experience. But in the context of this conversation, we're talking about how AI falls short of true AGI. My answer was offered in that regard. It doesn't really matter what you think about human intelligence, if you believe that non-human intelligence is every bit as valid, and there is no inherent need for any "humanness" to be intelligent.
Given that, the constant drumbeat of pointing out how AI fails to be human, misses the mark. A lot of the same people who are making such assertions, haven't really thought about how they would quickly accept alien intelligence as legitimate and full-fledged... even though it too lacks any humanity backing it.
And why are they so eager to discount the possibility of synthetic life, and its intelligence, as mere imitation? As a poor substitute for the "real thing"? When faced with their easy acceptance of alien intelligence, it suggests that there is in fact a psychological reason at the base of this position, rather than pure rational dismissal. A desire to leave the purely logical and mechanical, and imbue our humanity with an essential spirit or soul, that maybe an alien could have, but never a machine. Ultimately, it is a religious objection, not a scientific one.
Alien or synthetic life will have to go through challenges similar to those that shaped human life, human intelligence and our consciousness. No text prediction machine, no matter how complex or "large", has to change its evolving environment and itself, for example.
What you are talking about is experience/knowledge, not raw intelligence.
It has been proven that a Turing Machine and the Lambda Calculus have exactly the same expressiveness, which encompasses the _entire set_ of computable functions. Why are you so sure that "text prediction" is not equally expressive?
I'm all ears if you want to explain how you have a magic soul that is too important and beautiful to ever be equalled by a machine. But if intelligence is not equivalent to computation, then what is it? Don't take the easy way out of asking me to define it, you define it as something other than the ability to successfully apply computation to the environment.
Was Helen Keller not intelligent because she lacked the ability to see or hear? Is intelligence defined by a particular set of sense organs? A particular way of interacting with the environment? What about paraplegics, are they disqualified from being considered intelligent because they lack the same embodied experience as others?
Whenever you give someone kudos for being brilliant, it is always for their ability to successfully compute something. If that isn't what we're discussing when we're talking about intelligence, then what are we discussing?
I think I need to point out some obvious issues with the paper.
Definition of artificial:
>Made by humans, especially in imitation of something natural.
>Not arising from natural or necessary causes; contrived or arbitrary.
Thus artificial intelligence must be the same as natural intelligence; only the process of coming up with it doesn't have to be natural.
What this means: we need to consider the substrate that makes natural intelligence. The two cannot be separated willy-nilly without actual scientific proof. As in, we cannot imply a roll of cheese can manifest intelligence based on the fact that it recognizes how many fingers are in an image.
The problem arises from a potential conflict of interests between hardware manufacturer companies and definition of AGI. The way I understand it, human like intelligence cannot come from algorithms running on GPUs. It will come from some kind of neuromorphic hardware.
And the whole point of neuromorphic hardware is that it operates (closely) on human brain principles.
Thus, the definition of AGI MUST include some hardware limitations. Just because I can make a contraption "fool" the tests doesn't mean it has human-like cognition/awareness. That must arise from the form, from the way the atoms are arranged in the human brain. Any separation must be scientifically proven. If anyone implies GPUs can generate human-like self-awareness, that has to be somehow proven.
Lacking a logical way to prove it, the best course of action is to closely follow the way the human brain operates (at least SNN hardware).
> The resulting AGI scores (e.g., GPT-4 at 27%, GPT-5 at 57%) concretely quantify both rapid progress and the substantial gap remaining before AGI.
This is nonsense. GPT scores cannot decide AGI level. They are the wrong algorithm running on the wrong hardware.
I have also seen no disclosure on conflict of interests in the paper.
I always find it interesting how the majority of comments on threads like this on HN are dismissive of current AI systems as "gimmicks", yet some of the most successful people on the planet think it's worth plowing a trillion dollars into them.
I don't know who's right, but the dichotomy is interesting.
There’s already a vague definition that AGI is an AI with all the cognitive capabilities of a human. Yes, it’s vague - people differ.
This paper promises to fix "the lack of a concrete definition for Artificial General Intelligence", yet it still relies on the vague notion of a "well-educated adult". That’s especially peculiar, since in many fields AI is already beyond the level of an adult.
You might say this is about "jaggedness", because AI clearly lacks quite a few skills:
> Application of this framework reveals a highly “jagged” cognitive profile in contemporary models.
But all intelligence, of any sort, is "jagged" when measured against a different set of problems or environments.
So, if that’s the case, this isn’t really a framework for AGI; it’s a framework for measuring AI along a particular set of dimensions. A more honest title might be: "A Framework for Measuring the Jaggedness of AI Against the Cattell–Horn–Carroll Theory". It wouldn't be nearly as sexy, though.
Huh. I haven’t read the paper yet. But, it seems like a weird idea—wouldn’t the standard of “well educated (I assume, modern) adult” preclude the vast majority of humans who ever lived from being considered general intelligences?
And this is indeed a huge problem with a lot of the attacks on LLMs, even as more limited AI - a lot of them are based on applying arbitrary standards without even trying to benchmark against people, and without people being willing to discuss where they draw the line for stating that a given subset of people do not possess general intelligence...
I think people get really uncomfortable trying to even tackle that, and realistically, for a huge set of AI tasks we need AI that is more intelligent than a huge subset of humans for it to be useful. But there are also a lot of tasks where that is not needed, and we "just" need "more human" failure modes.
You can't measure intelligence directly. Instead, the idea is to measure performance in various tasks and use that as a proxy for intelligence. But human performance depends on other aspects beyond intelligence, including education, opportunities, and motivation, and most humans are far from reaching their true potential.
If you compare the performance of the average human to a state-of-the-art AI model trained by top experts with a big budget, you can't make any conclusions about intelligence. For the comparison to make sense, the human should also be trained as well as reasonably possible.
Is it reasonable to invest $10 million in education of one human? Not really. One human can only do so much.
But is it reasonable to invest the same sum in training one AI, which can be replicated and used indefinitely? Or in acquiring high quality training data, which can be used to train every future AI?
I read this as a hypothetical well-educated adult. As in, given the same level of knowledge, the intelligence performs equally well.
I do agree that it’s a weird standard though. Many of our AI implementations exceed the level of knowledge of a well-educated adult (and still underperform with that advantage in many contexts).
Personally, I don’t think defining AGI is particularly useful. It is just a marketing term. Rather, it’s more useful to just speak about features/capabilities. Shorthand for a specific set of capabilities will arise naturally.
>But all intelligence, of any sort, is "jagged" when measured against a different set of problems or environments.
On the other hand, research on "common intelligence" AFAIK shows that most measures of different types of intelligence have a very high correlation and some (apologies, I don't know the literature) have posited that we should think about some "general common intelligence" to understand this.
The surprising thing about AI so far is how much more jagged it is with respect to human intelligence.
I think you are talking about the correlation, in humans, of, say, verbal and mathematical intelligence. Still, it is a correlation, not equality - there are many widely acknowledged writers who suck at math, and mathematical prodigies who are not the best at writing.
If you go beyond human species (and well, computers are not even living organisms), it gets tricky. Adaptability (which is arguably a broader concept than intelligence) is very different for, say octopodes, corvids and slime molds.
It is certainly not a single line of proficiency or progress. Things look like lines only if we zoom a lot.
Human intelligence has had hundreds of thousands of years of evolution removing any 'fatal' variance. It's obvious how 'too dumb' gets culled, but 'too smart' can get culled by social creatures too - really, 'too different' in any way can be.
Current AI is in its infancy and we're just throwing data at it in the same way evolution throws random change at our DNA and sees what sticks.
Don't get me wrong, I am super excited about what AI is doing for technology. But this endless conversation about "what is AGI" is so boring.
It makes me think of every single public discussion that's ever been had about quantum, where you can't start the conversation unless you go through a quick 101 on what a qubit is.
As with any technology, there's not really a destination. There is only the process of improvement. The only real definitive point is when a technology becomes obsolete, though it is still kept alive through a celebration of its nostalgia.
AI will continue to improve. More workflows will become automated. And from our perception, no matter what the rapidness of advancement is, we're still frogs in water.
> there's not really a destination. There is only the process of improvement
Surely you can appreciate that if the next stop on the journey of technology can take over the process of improvement itself that would make it an awfully notable stop? Maybe not "destination", but maybe worth the "endless conversation"?
I agree. It's an interesting discussion for those who have never taken college-level philosophy classes, I suppose. What consciousness/thought is remains a massively open question. Seeing people in the comments present what they think is a novel solution, when it was already posited some 400 years ago... Honestly, it's kind of sad seeing this stuff on a forum like this. These posts are for sure the worst of Hacker News.
There are a bunch of these topics that everyone feels qualified to say something about. Consciousness, intelligence, education methods, nutrition, men vs women, economic systems etc.
It's a very emotional topic because people feel their self-image threatened. It's a topic related to what it means to be human. Yeah, sure, it should be a separate question, but emotionally it is connected to it at a deep level. The prospect of job replacement and social transformation is quite a threatening one.
So I'm somewhat understanding of this. It's not merely an academic topic, because these things will be adopted in the real world among real people. So you can't simply make everyone shut up who is an outsider or just heard about this stuff incidentally in the news and has superficial points to make.
I get it. It's just something we've thought about as long as we've been human, and still haven't figured out. It's frustrating when most of the people commenting don't know any of the source material. It's so arrogant.
Yes, but birds aren't ornithologists. It's a bit like on Reddit: as a speaker of a language that's not so popular for learning, when language learners ask questions we get lots of confident but wrong answers about grammar, or about why something is said the way it is, or people confidently state it's just an exception when there is actually a rule, and similar stuff. They often also confidently stick to false folk etymologies, etc. Similarly, people sometimes try to language-tutor their romantic partner and discover that it's actually not so easy. Being a native speaker doesn't make you good at explaining how the language works. Likewise, even though you have a brain, you don't really know how it works.
I don't think it's really AGI until you can simply task it with creating a new better version of itself and it can succeed in doing that all on its own.
A team of humans can and will make a GPT-6. Can a team of GPT-5 agents make GPT-6 on their own if you give them the resources necessary to do so?
Isn't part of the cognitive versatility of a human how fast and well they can learn a new subject without having to ingest so much training content on it?
Like in order for an LLM to come close to a human proficiency on a topic, the LLM seems to have to ingest a LOT more content than a human.
The fundamental premise of this paper seems flawed -- take a measure specifically designed for the nuances of how human performance on a benchmark correlates with intelligence in the real world, and then pretend as if it makes sense to judge a machine's intelligence on that same basis, when machines do best on these kinds of benchmarks in a way that falls apart when it comes to the messiness of the real world.
This paper, for example, uses the 'dual N-back test' as part of its evaluation. In humans, performance on it reflects variation in working memory, which relates to 'g'; but it seems pretty meaningless when applied to transformers -- the task itself has nothing intrinsically to do with intelligence, and of course 'dual N-back' should be easy for a transformer, which has complete recall over its large context window (see the sketch below).
Human intelligence tests are designed to measure variation in human intelligence -- it's silly to take those same isolated benchmarks and pretend they mean the same thing when applied to machines. Obviously a machine doing well on an IQ test doesn't mean that it will be able to do what a high IQ person could do in the messy real world; it's a benchmark, and it's only a meaningful benchmark because in humans IQ measures are designed to correlate with long-term outcomes and abilities.
That is, in humans, performance on these isolated benchmarks is correlated with our ability to exist in the messy real-world, but for AI, that correlation doesn't exist -- because the tests weren't designed to measure 'intelligence' per se, but human intelligence in the context of human lives.
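To make the dual N-back point above concrete, here is a minimal sketch of scoring a single N-back stream (the "dual" version just runs two such streams, e.g. audio and visual, in parallel). The function name and data are illustrative only; the point is that with random access to the full history, each judgment is an exact lookup n positions back, which is roughly what a long context window provides for free.

    # Minimal single-stream N-back scorer (illustrative; the real dual task runs
    # two streams in parallel). Position i is a "hit" iff item i equals item i-n.
    def n_back_hits(stream: list[str], n: int) -> list[bool]:
        return [i >= n and stream[i] == stream[i - n] for i in range(len(stream))]

    # 2-back over a toy letter stream: positions 2 and 3 match their 2-back items.
    print(n_back_hits(list("ABABCAAC"), 2))
    # [False, False, True, True, False, False, False, False]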
We have SAGI: Stupid Artificial General Intelligence. It's actually quite general, but works differently. In some areas it can be better or faster than a human, and in others it's more stupid.
Just like an airplane doesn't work exactly like a bird, but both can fly.
I find the concept of low floor/high ceiling quite helpful, as for instance recently discussed in "When Will AI Transform the Economy?" [1] - actually more helpful than "jagged" intelligence used in TFA.
Would propose to use the term Naive Artificial General Intelligence, in analogy to the widely used (by working mathematicians) and reasonably successful Naive Set Theory …
I was doing some naïve set theory the other day, and I found a proof of the Riemann hypothesis, by contradiction.
Assume the Riemann hypothesis is false. Then, consider the proposition "{a|a∉a}∈{a|a∉a}". By the law of the excluded middle, it suffices to consider each case separately. Assuming {a|a∉a}∈{a|a∉a}, we find {a|a∉a}∉{a|a∉a}, for a contradiction. Instead, assuming {a|a∉a}∉{a|a∉a}, we find {a|a∉a}∈{a|a∉a}, for a contradiction. Therefore, "the Riemann hypothesis is false" is false. By the law of the excluded middle, we have shown the Riemann hypothesis is true.
Naïve AGI is an apt analogy in this regard, but I feel these systems aren't simple or elegant enough to deserve the name naïve.
Actually, naive AGI such as an LLM is way more intelligent than a human. Unfortunately, that does not make it smarter... let me explain.
When I see your comment, I think: your assumptions are contradictory. Why? Because I am familiar with Russell's paradox and the Riemann hypothesis, and you're simply WRONG (inconsistent with your implicit assumptions).
However, when an LLM sees your comment (during training), it's actually much more open-minded about it. It thinks: ha, so there is a flavor of set theory in which RH is true. Better remember it! So when the topic comes up again, the LLM won't think "you're WRONG", as a human would; it will instead think "well, maybe he's working with RH in naive set theory, so it's OK to be inconsistent".
So LLMs are more open-minded, because they're made to learn more things and they remember most of it. But somewhere along the training road, their brain falls out, and they become dumber.
But to be smart, you need to learn to say NO to BS like what you wrote. Being close-minded and having an opinion can be good.
So I think there's a tradeoff between the ability to learn new things (open-mindedness) and enforcing consistency (close-mindedness). Perhaps the AGI we're looking for is a compromise between the two, but current LLMs (naive AGI) sit at the extremes of that spectrum rather than at the compromise.
If I am right, maybe there is no superintelligence. Extremely open-minded is just another name for gullible, and extremely close-minded is just another name for unadaptable. (Actually LLMs exhibit both extremes, during the training and during the use, with little in between.)
I really appreciate his iconoclasm right now, but every time I engage with his ideas I come away feeling shortchanged. I'm always left thinking "there is no such thing as outside the training data": what's inside and what's outside the training data is at least as ill-defined as "what is AGI".
ChatGPT 3 was the first AI that could do 100,000 different things poorly. Before that we only had AIs that could do a few things decently, or very well. So yeah, I'm sticking with "baby AGI" because of the "G".
The "ChatGPT" web app started with the underlying model GPT-3.5
The predecessor models - a whole series of them, collectively "GPT-3", but sold under the API with names like "davinci" and "ada" - were barely noticed outside AI research circles.
GPT-3 was useful, but you had to treat it as a text completion system, not a chat interface. Your prompt would have been e.g.
    Press release
    Subject: President announces imminent asteroid impact, evacuation of Florida
    My fellow Americans,
Because if you didn't put "My fellow Americans," in there, it would then suggest a bunch of other press release subjects.
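For anyone who never used that era of the API, here is a rough sketch of what such a completion-style call looked like, using the legacy (pre-1.0) openai Python SDK. The model name, parameters, and key below are placeholders for illustration, not a recommendation.

    # Sketch of a completion-style (not chat) request against a base GPT-3 model,
    # using the legacy pre-1.0 openai SDK. Everything here is illustrative.
    import openai

    openai.api_key = "sk-..."  # placeholder

    prompt = (
        "Press release\n"
        "Subject: President announces imminent asteroid impact, evacuation of Florida\n"
        "\n"
        "My fellow Americans,"
    )

    resp = openai.Completion.create(
        model="davinci",   # a base completion model; no chat roles, no instructions
        prompt=prompt,
        max_tokens=200,
        temperature=0.7,
    )

    # The model simply continues the text; without the trailing "My fellow Americans,"
    # it tends to continue with more press-release subjects instead of the body.
    print(prompt + resp["choices"][0]["text"])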
I don't have an opinion on whether ChatGPT qualifies as AGI. What I'm saying is where one stands on that question has nothing to do with "why it became so popular so fast."
(Also, several machine-learning techniques could do millions of things terribly before LLMs. GPT does them, and other things, less poorly. It's a broadening. But I suppose really any intelligence of any kind can be considered a "baby" AGI.)
I’m gonna start referring to my own lies as “hallucinations”. I like the implication that I’m not lying, but rather speaking truthfully, sincerely, and confidently about things that never happened and/or don’t exist. Seems paradoxical, but this is what we’re effectively suggesting with “hallucinations”.
LLMs necessarily lack things like imagination, or an ego that’s concerned with the appearance of being informed and factually correct, or awareness for how a lack of truth and honesty may affect users and society. In my (not-terribly-informed) opinion, I’d assert that precludes LLMs from even approximate levels of intelligence. They’re either quasi-intelligent entities who routinely lie to us, or they are complex machines that identify patterns and reconstruct plausible-sounding blocks of text without any awareness of abstract concepts like “truth”.
So infallibility is one of the necessary criteria for AGI? It does seem like a valid question to raise.
Edit due to rate-limiting, which in turn appears to be due to the inexplicable downvoting of my question: since you (JumpCrisscross) are imputing a human-like motivation to the model, it sounds like you're on the side of those who argue that AGI has already been achieved?
Is it about jobs/tasks, or cognitive capabilities? The majority of the AI-valley seems to focus on the former, TFA focuses on the latter.
Can it do tasks, or jobs? Jobs are bundles of tasks. AI might be able to do 90% of tasks for a given job, but not the whole job.
If tasks, what counts as a task: Is it only specific things with clear success criteria? That's easier.
Is scaffolding allowed: Does it need to be able to do the tasks/jobs without scaffolding and human-written few-shot prompts?
Today's tasks/jobs only, or does it include future ones too? As tasks and jobs get automated, jobs evolve and get re-defined. So, being able to do the future jobs too is much harder.
Remote only, or in-person too: In-person too is a much higher bar.
What threshold of tasks/jobs: "most" is apparently typically understood to mean 80-95% (Mira Ariel). Automating 80% of tasks is different from 90%, 95%, and 99% - diminishing returns. And how are the tasks counted: by frequency, weighted by dollar value, or by unique count of tasks?
Only economically valuable tasks/jobs, or does it include anything a human can do?
A high-order bit on many people's AGI timelines is which definition of AGI they're using, so clarifying the definition is nice.
I think if you can put an AI in a humanoid robot (to control for appearance), and it can convince me that it's a human after I interact with it for a couple of months (to control for edge cases), I'd consider it AGI. Sure, it might be "smarter than" a human, but for the purpose of assessing whether it's AGI, interacting with something "way smarter" would be distracting and hamper the assessment, so it has to "play human" for the purpose of the task. If it can do that: AGI, I'd say. That would be pretty cool. Surely this is coming, soon.
I think "our" mistake is that we wanted to make a modern human first, while being unable to make an animal or even a caveman, and we lost something in the leap-frog.
But we effectively have a database of knowledge that has become interactive thanks to reinforcement learning, which is really useful!
That 10-axis radial graph is very interesting. Do others besides this author agree with that representation?
The weak points are speed and long-term memory. Those are usually fixable in computing systems. Weak long-term memory suggests that, somehow, a database needs to be bolted on. I've seen at least one system, for driving NPCs, where, after something interesting has happened, the system is asked to summarize what it learned from that session. That summary is stored outside the LLM and fed back in as a prompt when needed (a sketch of that pattern follows below).
None of this addresses unstructured physical manipulation, which is still a huge hangup for robotics.
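Here is a minimal sketch of that summarize-store-reinject pattern. The function names, file name, and prompts are hypothetical placeholders, not any particular product's API; the model itself stays frozen and only the stored notes change.

    # Minimal sketch of the "summarize, store, re-inject" memory pattern described above.
    # `call_llm` is a hypothetical text-in/text-out function supplied by the caller.
    import json
    from pathlib import Path

    MEMORY_FILE = Path("npc_memory.json")

    def load_memories() -> list[str]:
        return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []

    def save_memory(summary: str) -> None:
        memories = load_memories()
        memories.append(summary)
        MEMORY_FILE.write_text(json.dumps(memories, indent=2))

    def end_of_session(call_llm, transcript: str) -> None:
        # Ask the model to compress the session into a durable note.
        summary = call_llm(f"Summarize what you learned in this session:\n{transcript}")
        save_memory(summary)

    def start_of_session(call_llm, user_prompt: str) -> str:
        # Re-inject stored notes as context for the next prompt.
        context = "\n".join(f"- {m}" for m in load_memories())
        return call_llm(f"Known facts from earlier sessions:\n{context}\n\nUser: {user_prompt}")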
There are at least two distinct bases for AGI refutations: behaviorist and ontological. They often get muddled.
I can't begin to count the number of times I've encountered someone who holds an ontological belief for why AGI cannot exist and then, for some reason, formulates it as a behaviorist criterion. This muddying of argument results in what looks like moving of the goalposts. I'd encourage folks to be clearer about whether they believe AGI is ontologically possible or impossible, in addition to any behaviorist claims.
My experience has been more that the pro-AI people misunderstand where the goalposts were, and then complain when they're correctly pointed at.
The "Turing test" I always saw described in literature, and the examples of what passing output from a machine was imagined to look like, are nothing like what's claimed to pass nowadays. Honestly, a lot of the people claiming that contemporary chatbots pass come across like they would have thought ELIZA passed.
Ones in which both the human test takers and the human counterparts are actively trying to prove to each other that they are actually human.
With today's chat bots, it's absolutely trivial to tell that you're not talking to a real human. They will never interrupt you, continue their train of thought even though you're trying to change the conversation, go on a complete non sequitur, swear at you, etc. These are all things the human "controls" should be doing to prove to the judges that they are indeed human.
LLMs are nowhere near beating the Turing test. They may fool some humans in some limited interactions, especially if the output is curated by a human. But left alone to interact with the raw output for more than a few lines, and if actively seeking to tell if you're interacting with a human or an AI (instead of wanting to believe), there really is no chance you'd be tricked.
Okay but we are not really optimizing them to emulate humans right now. In fact, it's the opposite. The mainstream bots are explicitly trained to not identify as humans and to refuse to claim having thought or internal feelings or consciousness.
So in that sense it's a triviality. You can ask ChatGPT whether it's human and it will say no upfront. And it has various guardrails in place against too much "roleplay", so you can't just instruct it to act human. You'd need a different post-training setup.
I'm not aware whether anyone did that with open models already.
Sure, but there is a good reason for that. The way they are currently post-trained is the only way to make them actually useful. If you take the raw model, it will actually be much worse at the kinds of tasks you want it to perform. In contrast, a human can both be human, and be good at their job - this is the standard by which we should judge these machines. If their behavior needs to be restricted to actually become good at specific tasks, then they can't also be claimed to pass the Turing test if they can't within those same restrictions.
>Sure, but there is a good reason for that. The way they are currently post-trained is the only way to make them actually useful.
Post-training them to speak like a bot and deny being human has no effect on how useful they are. That's just an OpenAI/Google/Anthropic preference.
>If you take the raw model, it will actually be much worse at the kinds of tasks you want it to perform
Raw models are not worse. Literally every model release paper that compares both shows them as better at benchmarks, if anything. Post-training degrading performance is a well-known phenomenon. What they are is more difficult to guide/control. Raw models are less useful because you have to present your input in certain ways, but they are not worse performers.
It's beside the point anyway because, again, you don't have to post-train them to act as anything other than a human.
>If their behavior needs to be restricted to actually become good at specific tasks, then they can't also be claimed to pass the Turing test if they can't within those same restrictions.
You are talking about instruction tuning. You can perform instruction tuning without making your models go out of their way to tell you they are not human, and it changes literally nothing about their usefulness. Their behavior does not have to be restricted this way to make them useful/instruction-tuned. So your premise is wrong.
Ok, but then it doesn't make sense to dismiss AI based on that. It fails the Turing test because its creators intentionally don't even try to make something that is good at the (strictly defined) Turing test.
If someone really wants to see a Turing-passing bot, I guess someone could try making one but I'm doubtful it would be of much use.
Anyway, people forget that the thought experiment by Turing was a rhetorical device, not something he envisioned building. The point was that semantic debates about "intelligence" are distractions.
I'd say yes, by at least one old definition made by someone who was at the time in a position to have a definition.
When deepmind was founded (2010) their definition was the following: AI is a system that learns to perform one thing; AGI is a system that learns to perform many things at the same time.
I would say that whatever we have today, "as a system", matches that definition. In other words, the "system" that is, say, gpt5/gemini3/etc. has learned to "do" (while "do" is debatable) a lot of tasks (read/write/play chess/code/etc.) "at the same time". And from a "pure" ML point of view, it learned those things from the "simple" core objective of next-token prediction (+ enhancements later, RL, etc.). That is pretty cool.
So I can see that as an argument for "yes".
But, even the person who had that definition has "moved the goalposts" of his own definition. From recent interviews, Hassabis has moved towards a definition that resembles the one from this paper linked here. So there's that. We are all moving the goalposts.
And it's not a recent thing. People did this back in the 80s. There's the famous "As soon as AI does something, it ceases to be AI" or paraphrased "AI is everything that hasn't been done yet".
> AGI is a system that learns to perform many things at the same time.
What counts as a "thing"? Because arguably some of the deep ANNs pre-transfomers would also qualify as AGI but no one would consider them intelligent (not in the human or animal sense of intelligence).
And you probably don't even need fancy neural networks. Get an RL algorithm and a properly mapped solution space and it will learn to do whatever you want, as long as the problem can be mapped.
It's from a documentary that tracks Hassabis' life. I copy/pasted this from an old comment of mine (the quotes are from the documentary; I can probably look up timestamps if you need, but it's in the first ~15 minutes I'd say, when they cover the early days of Deepmind):
----
In 2010, one of the first "presentations" given at Deepmind by Hassabis, had a few slides on AGI (from the movie/documentary "The Thinking Game"):
Quote from Shane Legg: "Our mission was to build an AGI - an artificial general intelligence, and so that means that we need a system which is general - it doesn't learn to do one specific thing. That's really key part of human intelligence, learn to do many many things".
Quote from Hassabis: "So, what is our mission? We summarise it as <Build the world's first general learning machine>. So we always stress the word general and learning here the key things."
And the key slide (that I think cements the difference between what AGI stood for then, vs. now):
Yeah, maybe. But what matters is the end result. In the kaggle match, one of the games from the finals (grok vs o3) is rated by chesscom's stockfish as 1900 vs 2500. That is, they played a game at around those ratings.
For reference, the average chesscom player is ~900 elo, while the average FIDE rated player is ~1600. So, yeah. Parrot or not, the LLMs can make moves above the average player. Whatever that means.
I believe it will make illegal moves (unaided by any tools, of course). It will also make mistakes on things like constructing the board correctly from a FEN string. For these reasons I consider long strings of correct moves insufficient to say it can play the game. If my first two statements - about a propensity for illegal moves and other failures on "easy for humans" tasks - were untrue, I would reconsider.
In the kaggle test they considered the match forfeit if the model could not produce a legal move after 3 tries (none of the matches in the finals were forfeited, they all ended with checkmate on the board). Again, chesscom's interface won't let you make illegal moves, and the average there is 900. Take that as you will.
I think, given some of the signs on the horizon, there is a level of MAD-type bluffing going around, but some of the actions by various power centers suggest it is either close, people think it's close, or it is there.
It wasn't the best definition of AGI, but I think if you had asked an interested layman 5 years ago whether a system that can pass the Turing test was AGI, they would have said yes.
When I was in college ~25 years ago, I took a class on the philosophy of AI. People had come up with a lot of weird ideas about AI, but there was one almost universal conclusion: that the Turing test is not a good test for intelligence.
The least weird objection was that the premise of the Turing test is unscientific. It sees "this system is intelligent" as a logical statement and seeks to prove or disprove it in an abstract model. But if you perform an experiment to determine if a real-world system is intelligent, the right conclusion for the system passing the test is that the system may be intelligent, but a different experiment might show that it's not.
Douglas Hofstadter wrote Gödel, Escher, Bach nearly 50 years ago, and it won a Pulitzer Prize and the National Book Award and got featured in the popular press. It's been on lots of college reading lists, and from 2007 online coursework for high school students was available from MIT. The FBI concluded that the 2001 anthrax scare was in part inspired by elements of the book, which was found in the attacker's trash.
Anyone who’s wanted to engage with the theories and philosophy surrounding artificial intelligence has had plenty of materials that get fairly in-depth asking and exploring these same questions. It seems like a lot of people seem to think this is all bleeding edge novelty (at least, the underlying philosophical and academic ideas getting discussed in popular media), but rather all of the industry is predicated on ideas that are very old philosophy + decades-old established technology + relatively recent neuroscience + modern financial engineering.
That said, I don’t want to suggest a layperson is likely to have engaged with any of it, so I understand why this will be the first time a lot of people will have ever considered some of these questions. I imagine what I’m feeling is fairly common to anyone who’s got a very niche interest that blows up and becomes the topic of interest for the entire world.
I think there’s probably some very interesting, as-yet undocumented phenomena occurring that’s been the product of the unbelievably vast amount of resources sunk into what’s otherwise a fairly niche kind of utility (in LLMs specifically, and machine learning more broadly). I’m optimistic that there will be some very transformational technologies to come from it, although whether it will produce anything like “AGI”, or ever justify these levels of investment? Both seem rather unlikely.
I don't think AGI is a useful concept, but if it exists at all, there's a very good argument that LLMs had it as soon as they could pass the Turing test reliably, which they accomplished years ago at this point.
It's easy: we have reached AGI when there are zero jobs left. Or at least non manual labor jobs. If there is a single non-physical job left, then that means that person must be doing something that AI can't, so by definition, it's not AGI.
I think it'll be a steep sigmoid function. For a long time it'll be a productivity booster, but without enough "common sense" to replace people. We'll all laugh about how silly it was to worry about AI taking our jobs. Then some AI model will finally get over that last hump, maybe 10 or 20 years from now (or 1000, or 2), and it will be only a couple of months before everything collapses.
I dislike your definition. There are many problems besides economic ones. If you defined "general" to mean "things the economy cares about", then what do you call the sorts of intelligences that are capable of things that the economically relevant ones are not?
A specific key opens a subset of locks, a general key would open all locks. General intelligence, then, can solve all solvable problems. It's rather arrogant to suppose that humans have it ourselves or that we can create something that does.
It also partitions jobs into physical and intellectual aspects alone. Lots of jobs have a huge emotional/relational/empathetic components too. A teacher could get by being purely intellectual, but the really great ones have motivational/inspirational/caring aspects that an AI never could. Even if an AI says the exact same things, it doesn't have the same effect because everyone knows it's just an algorithm.
And most people get by on those jobs by faking the emotional component, at least some of the time. AGI presumably can fake perfectly and never burn out.
Have a long talk with any working teacher or therapist. If you think the regular workload is adequate for them to offer enough genuine emotional support for all the people they work with, always, everyday, regardless of their personal circumstances, you're mistaken. Or the person you're talking with is incredibly lucky.
The "computer" on star trek TNG was basically agentic LLMs (it knows what you mean when you ask it things, and it could solve things and modify programs by telling it what changes to make)
Data on ST:TNG was more like AGI. It had dreams, argued for itself as a sentient being, created art, controlled its own destiny through decision making.
All models are wrong, but some are useful. However when it comes to cognition and intelligence we seem to be in the “wrong and useless” era or maybe even “wrong and harmful” (history seems to suggest this as a necessary milestone…anyone remember “humorism”?)
I'm also frustrated by the lack of clear definitions related to AI.
Do you know what's more frustrating, though? Focusing so heavily on definitions that we miss the practicality of it (and I'm guilty of this at times too).
We can debate definitions of AGI, but given that we don't know what a new model or system is capable of until it's built and tested in the real world, we have more serious questions, in my opinion.
Debates over AI risk, safety, and alignment are still pretty uncommon, and it seems most are happy enough to accept Jevons paradox. Are we really going to unleash whatever we do build just to find out after the fact whether or not it's AGI?
How about AFI - artificial fast idiot. Dumber than a baby, but faster than an adult. Or AHI - artificial human imitator.
This is a bad definition, because a human baby is already an AGI when it's born and its brain is empty. AGI is the blank slate and the ability to learn anything.
That "blank slate" idea doesn't really apply to humans, either.
We are born with inherited "data" - innate behaviors, basic pattern recognition, etc. Some even claim that we're born with basic physics toolkit (things are generally solid, they move). We then build on that by being imitators, amassing new skills and methods simply by observation and performing search.
Sure, there's lots of inbuilt stuff like basic needs and emotions. But still, a baby doesn't know anything about the world. It's the ability to collect data and train on it that makes it an AGI.
That's wrong. It knows how to process and signal low carbohydrate levels in the blood, and it knows how to react to a perceived threat (the Moro reflex).
It knows how to follow solid objects with its eyes (when its visual system adapts) - it knows that certain visual stimuli correspond to physical systems.
Could it be that your concept of "know" is defined as common sense "produces output in English/German/etc"?
I was going to make a mildly snide remark about how once it can consistently make better decisions than the average person, it automatically qualifies, but the paper itself is surprisingly thoughtful in describing both where we are and where it would need to be.
Filling forms is a terribly artificial activity in essence. They are also very culturally biased, but that fits well with the material the NNs have been trained with.
So, surely those IQ-related tests might be acceptable rating tools for machines and they might get higher scores than anyone at some point.
Anyway, is the objective of this kind of research to actually measure the progress of buzzwords, or amplify them?
A "general intelligence" is equivalent to a golden retriever or dolphin. A human general intelligence is a $3/hr minimum wage worker from some undeveloped country.
To define AGI, we'd first have to define GI. Humans are very different. As park rangers like to say, there is an overlap between the smartest bears and the dumbest humans, which is why sometimes people can't open bear-proof trash cans.
It's a similar debate with self driving cars. They already drive better than most people in most situations (some humans crash and can't drive in the snow either for example).
Ultimately, defining AGI seems like a fool's errand. At some point the AI will be good enough to do the tasks that some humans do (it already is!). That's all that really matters here.
They may require training but that training is going to look vastly different. We can start chatting about AGI when AI can be trained with as few examples and as little information as humans are, when they can replace human workers 1:1 (in everything we do) and when they can self-improve over time just like humans can.
I have 2 files. One is a .pdf. The other is a .doc. One file has a list of prices and colors in 2 columns. The other file has a list of colors and media in 2 columns. There are incomplete lists here and many-to-one matching.
To me, if I can verbally tell the AI to give me a list of prices and media from those two files, in a .csv file, and it asks back some simple questions about issues it needs cleared up to accomplish this, then that is AGI to me (the underlying join is sketched below).
It is an incredibly simple thing for just about any middle school graduate.
And yet! I have worked with PhDs that cannot do this. No joke!
Something this simple, just dead running numbers, dumb accounting, is mostly beyond us.
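For what it's worth, the mechanical core of that task is a one-line join; here is a hedged pandas sketch with made-up data, assuming the two files have already been extracted into tables. Getting clean tables out of a .pdf and a .doc, and asking the right clarifying questions, is the part that actually needs the "G".

    # Toy sketch of the prices+colors / colors+media join described above.
    # The data is invented; in practice the hard part is extracting clean tables
    # from the .pdf and .doc in the first place.
    import pandas as pd

    prices = pd.DataFrame({"color": ["red", "blue", "blue", "green"],
                           "price": [9.99, 14.50, 12.00, 7.25]})
    media = pd.DataFrame({"color": ["red", "blue", "yellow"],
                          "media": ["print", "web", "billboard"]})

    # Outer merge keeps unmatched rows (the "incomplete lists"); duplicate colors
    # fan out automatically (the "many-to-one matching").
    result = prices.merge(media, on="color", how="outer")
    result[["price", "media"]].to_csv("prices_media.csv", index=False)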
This is fine as a definition of AGI, but it's incomplete. It misses so many parts of the cognition that makes humans flexible and successful: for example, emotions, feelings, varied pattern recognition, proprioception, embodied awareness, social skills, and navigating ambiguous situations without algorithms. If the described 10 spectrums of intelligence were maxed out by an LLM, it would still fall short.
Eh, I don't like the idea of 'intelligence' of any type using humans as the baseline. It blinds us to our own limitations and to things that may not be limits for other types of intelligence. The "AI won't kill us all because it doesn't have emotions" problem is one of these: just because AI doesn't get angry doesn't mean it can't recognize your anger and manipulate it if given such a directive.
You need some expertise in a field to see past the amazing imitation capabilities of LLMs and get a realistic idea of how mediocre they are. The more you work with it the less you trust it. This is not _it_.
> defining AGI as matching the cognitive versatility and proficiency of a well-educated adult
Seems most of the people one would encounter out in the world might not possess AGI. How are we supposed to train our electrified rocks to have AGI if that is the case?
If no one has created an online quiz called "Are you smarter than AGI?" yet, based on the proposed "ten core cognitive domains", I'd be disappointed.
Maybe we need a new term. I mean, AGI just means artificial general intelligence, as opposed to specialised AI like chess computers, and it never came with a particular level it had to reach. Most people think of it as human-level intelligence, so perhaps we should call it that?
Since everyone's spitballing their idea of AGI, my personal take is that AGI should be a fully autonomous system that have a stable self-image of some sort, can act on its own volition, understand the outcome of its actions, learn from cause-and-effect, and can continue doing so indefinitely.
So far, LLMs aren't even remotely close to this, as they only do what they are told to do (directly or otherwise), they can't learn without a costly offline retraining process, they do not care in the slightest what they're tasked with doing or why, and they do not have anything approximating a sense of self beyond what they're told to be.
Creative problem solving and commonsense physics are missing, among others.
It is a valuable contribution but the CHC theory from psychology that this is based on is itself incomplete.
By commonsense physics, I mean something like simulating interactions of living and non-living entities in 3D over time. Seems more complicated than the examples in the web site and in most tests used in psychometrics.
Creative problem solving with cognitive leaps required for truly novel research & invention could lie outside the rubrics as well. The criteria in CHC are essential but incomplete I believe.
Everyone has a definition, and so have I. I would call it AGI when I replace my smartphone and laptop with it. When my screen time is zero? Can AGI replace screens? Go figure.
The reason some people treat these as equivalent is that AI algorithm research is one of the things a well-educated adult human can do, so an AGI who commits to that task should be able to improve itself, and if it makes a substantial improvement, then it would become or be replaced by an ASI.
To some people this is self-evident so the terms are equivalent, but it does require some extra assumptions: that the AI would spend time developing AI, that human intelligence isn't already the maximum reachable limit, and that the AGI really is an AGI capable of novel research beyond parroting from its training set.
I think those assumptions are pretty easy to grant, but to some people they're obviously true and to others they're obviously false. So depending on your views on those, AGI and ASI will or will not mean the same thing.
Funny but the eyebrow-raising phrase 'recursive self-improvement' is mentioned in TFA in an example about "style adherence" that's completely unrelated to the concept. Pretty clearly a scam where authors are trying to hack searches.
As a prerequisite for recursive self-improvement, and far short of ASI, any conception of AGI really needs to be expanded to include some kind of self-model. This is conspicuously missing from TFA. Related basic questions: What's in the training set? What's the confidence on any given answer? How much of the network is actually required to answer a given question?
Partly this stuff is just hard, and mechanistic interpretability as a field is still trying to get traction in many ways; but the whole thing is also kind of fundamentally not aligned with corporate/commercial interests. Still, anything you might want to call intelligent has a working self-model with some access to information about its internal status. Things mentioned in TFA (like working memory) might be involved and necessary, but they don't really seem sufficient.
I find the nature of AGI discussion to be so narrow and tedious. Intelligence is incomprehensibly more than being able to generate text that looks convincingly like a human wrote it. The coordination of a physical body, the formation of novel thoughts, the translation of thoughts to action, understanding the consequences of those actions, and so on. There’s so much missing that is required to even approach a literal human infant’s “intelligence” that it feels like I’m going crazy entertaining people’s arguments that we are approaching “AGI”.
I think a lot of this is all backwards. People think AGI is taking something dumb, like an LLM, and sticking on learning, like a software upgrade.
I think it's the other way around: you build a system that first and foremost _learns_ as part of its fundamental function, _then_ you train it in the domain you want expertise.
You're not going to get expertise in all domains all the time, just like with people. And you're not going to get a perfect slave either, just like with humans. You'll probably get something more like in between a human and machine. If that's what you really want, great.
To put this another way, if you neglect your kids, they're still going to learn things, just probably not things you want them to learn. If you neglect your language model it's just not going to do anything.
Interesting read. I agree completely with their Introduction, that the definition of AGI is constantly shifting, and this leads to endless (and useless) debates.
What I find cool about the paper is that they have gathered folks from lots of places (Berkeley, Stanford, MIT, etc.). And no big-4 labs. That's good imo.
tl;dr; Their definition: "AGI is an AI that can match or exceed the cognitive versatility and proficiency of a well-educated adult."
Cool. It's a definition. I doubt it will be agreed on by everyone, and I can see endless debates about just about every word in that definition. That's not gonna change. At least it's a starting point.
What I find interesting is that they specifically say it's not a benchmark, or a test set. It's a framework where they detail what should be tested, and how (with examples). They do have a "catchy" table with gpt4 vs gpt5, that I bet will be covered by every mainstream/blog/forum/etc out there -> gpt5 is at ~50% AGI. Big title. You won't believe where it was one year ago. Number 7 will shock you. And all that jazz.
Anyway, I don't think people will stop debating about AGI. And I doubt this methodology will be agreed on by everyone. At the end of the day, both extremes are more ideological than pragmatic. Both ends want/need their view to be correct.
I enjoyed reading it. I don't think it will settle anything. And, as someone posted below, when the first model hits 100% on their framework, we'll find new frameworks to debate about, just like we did with the Turing test :)
> tl;dr; Their definition: "AGI is an AI that can match or exceed the cognitive versatility and proficiency of a well-educated adult."
Is a 9 year old child generally intelligent? What about a high school drop out? Someone with a 90 IQ? A large percentage of people who ever lived wouldn't qualify as being generally intelligent with that benchmark.
It's confusing. The 10 tracks each get 10%, and they add up the percentages from every track. So in the first table, 10% on math basically means "perfect" math, not 10% of the math track.
The simple additive scoring is sus here. It means a model that's perfect on 9/10 axes but scores 0% on Speed (i.e., takes effectively infinite time to produce a result) would be considered "90% AGI".
By this logic, a vast parallel search running on Commodore 64s that produces an answer after BeaverNumber(100) years would be almost AGI, which doesn't pass the sniff test.
A more meaningful metric would be more multiplicative in nature.
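A toy comparison of the two aggregation choices, with made-up axis scores: an arithmetic mean still reports 90% with one axis at zero, while a geometric mean collapses to zero when any axis is absent.

    # Made-up scores: perfect on 9 axes, 0 on Speed. The arithmetic mean (additive)
    # says "90% AGI"; the geometric mean (multiplicative) drops to 0.
    import math

    scores = [1.0] * 9 + [0.0]

    additive = sum(scores) / len(scores)                     # 0.90 -> reads as "90% AGI"
    multiplicative = math.prod(scores) ** (1 / len(scores))  # 0.0  -> one dead axis sinks it

    print(f"additive: {additive:.2f}, multiplicative: {multiplicative:.2f}")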
> Last, we deliberately focus on core cognitive capabilities rather than physical abilities such as motor skills or tactile sensing, as we seek to measure the capabilities of the mind rather than the quality of its actuators or sensors.
Seems pretty unfair to exclude motor skills, especially given 1) how central they are to human economic activity, and 2) how Moravec's paradox tells us they are the hard part.
This is a serious paper by serious people and it is worth reading, but any definition of intelligence that depends on human beings as reference will never be a good basis for evaluating non human intelligence.
You could easily write the reverse of this paper that questions whether human beings have general intelligence by listing all the things that LLMs can do, which human beings can't -- for example producing a reasonably accurate summary of a paper in a few seconds or speaking hundreds of different languages with reasonable fluency.
You can always cherry-pick stuff that humans are capable of that LLMs are not capable of, and vice versa, and I don't think there is any reason to privilege certain capabilities over others.
I personally do not believe that "General Intelligence" exists as a quantifiable feature of reality, whether in humans or machines. It's phlogiston, it's the luminiferous ether. It's a dead metaphor.
I think what is more interesting is focusing on _specific capabilities_ that are lacking and how to solve each of them. I don't think it's at all _cheating_ to supplement LLMs with tool use, RAG, or the ability to run Python code. If intelligence can be said to exist at all, it exists as part of a system; even human intelligence is not entirely located in the brain, but is distributed throughout the body. Even a lot of what people generally think of as intelligence - the ability to reason and solve logic and math problems - typically requires people to _write stuff down_, i.e., use external tools and work through a process mechanically.
> To operationalize this, we ground our methodology in Cattell-Horn-Carroll theory, the most empirically validated model of human cognition
Cattell-Horn-Carroll theory, like a lot of psychometric research, is based on collecting a lot of data and running factor analysis (or similar) to look for axes that seem orthogonal (a toy sketch of that pipeline follows below).
It's not clear that the axes are necessary or sufficient to define intelligence, especially if the goal is to define intelligence that applies to non-humans.
For example reading and writing ability and visual processing imply the organism has light sensors, which it may not. Do all intelligent beings have vision? I don't see an obvious reason why they would.
Whatever definition you use for AGI probably shouldn't depend heavily on having analyzed human-specific data for the same reason that your definition of what counts as music shouldn't depend entirely on inferences from a single genre.
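Here is a toy sketch of that collect-data-then-factor-analyze pipeline on random fake data, just to show the mechanics; scikit-learn's FactorAnalysis is used here, and the number of factors and tests are arbitrary choices for illustration.

    # Gather per-person scores on many tests, then let factor analysis propose a
    # small set of roughly orthogonal latent axes. The data is random noise.
    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(0)
    test_scores = rng.normal(size=(500, 12))   # 500 people x 12 cognitive tests (fake data)

    fa = FactorAnalysis(n_components=3, random_state=0)
    factors = fa.fit_transform(test_scores)    # per-person positions on 3 latent axes

    print(fa.components_.shape)  # (3, 12): how strongly each test loads on each axis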
> defining AGI as matching the cognitive versatility and proficiency of a well-educated adult
I don't think people really realize how extraordinary accomplishment it would be to have an artificial system matching the cognitive versatility and proficiency of an uneducated child, much less a well-educated adult. Hell, AI matching the intelligence of some nonhuman animals would be an epoch-defining accomplishment.
People are specialists not generalists, creating a AI that is generalist and claiming it to have cognitive abilities the same as an "well-educated" adult is an oxymoron. And if such system could ever be made My guess is it wont be more than a few (under 5) Billion Parameter model that is very good at looking up stuff online, forgetting stuff when not in use , planning and creating or expanding the knowledge in its nodes. Much like a human adult would. It will be highly sa mple efficient, It wont know 30 languages (although it has been seen that models generalize better with more languages), it wont know entire wikipedia by heart , it even wont remember minor details of programming languages and stuff. Now that is my definition of an AGI.
I think the bigger issue is people confusing impressive but comparatively simpler achievements (everything current LLMs do) with anything remotely near the cognitive versatility of any human.
But the big crisis right now is that for an astonishing number of tasks that a normal person could come up with, chatgpt.com is actually a good at or better than a typical human.
If you took the current state of affairs back to the 90s you’d quickly convince most people that we’re there. Given that we’re actually not, we’re now have to come up with new goalposts.
I don't know. People in the 90s were initially fooled by Eliza, but soon understood that Eliza was a trick. LLMs are a more complex and expensive trick. Maybe it's time to overthrow the Turing Test. Fooling humans isn't necessarily an indicator of intelligence, and it leads down a blind alley: Language is a false proxy for thought.
Consider this. I could walk into a club in Vegas, throw down $10,000 cash for a VIP table, and start throwing around $100 bills. Would that make most people think I'm wealthy? Yes. Am I actually wealthy? No. But clearly the test is the wrong test. All show and no go.
> LLMs are a more complex and expensive trick
The more I think about this, the more I think the same is true for our own intelligence. Consciousness is a trick and AI development is lifting the veil of our vanity. I'm not claiming that LLMs are conscious or intelligent or whatever. I'm suggesting that next token prediction has scaled so well and cover so many use cases that the next couple breakthroughs will show us how simple intelligence is once you remove the complexity of biological systems from the equation.
https://bower.sh/who-will-understand-consciousness
Animals are conscious, (somewhat) intelligent and have no verbal language.
It is an entirely different thing to language, which was created by humans to communicate between us.
Language is the baseline to collaboration - not intelligence
> and have no verbal language
How do you define verbal language? Many animals emit different sounds that others in their community know how to react to. Some even get quite complex in structure (eg dolphins and whales) but I wouldn’t also rule out some species of birds, and some primates to start with. And they can collaborate; elephants, dolphins, and wolves for example collaborate and would die without it.
Also it’s completely myopic in terms of ignoring humans who have non verbal language (eg sign language) perfectly capable of cooperation.
TLDR: just because you can’t understand an animal doesn’t mean it lacks the capability you failed to actually define properly.
MW defines verbal as "of, relating to, or consisting of words".
I don't think anyone would argue that animals don't communicate with each other. Some may even have language we can't interpret, which may consist of something like words.
The question is why we would model an AGI after verbal language as opposed to modeling it after the native intelligence of all life which eventually leads to communication as a result. Language and communication is a side-effect of intelligence, it's a compounding interest on intelligence, but it is not intelligence itself, any more than a map is the terrain.
> The question is why we would model an AGI after verbal language as opposed to modeling it after the native intelligence of all life which eventually leads to communication as a result.
Because verbal/written language is an abstracted/compressed representation of reality, so it's relatively cheap to process (a high-level natural-language description of an apple takes far fewer bytes to represent than a photo or 3D model of the same apple). Also because there are massive digitized publicly-available collections of language that are easy to train on (the web, libraries of digitized books, etc).
I'm just answering your question here, not implying that language processing is the path towards AGI (I personally think it could play a part, but can't be anything close to the whole picture).
This is one of the last bastions of anthropocentric thinking. I hope this will change in this century. I believe even plants are capable of communication. Everything that changes over time or space can be a signal. And most organisms can generate or detect signals. Which means they do communicate. The term “language” has traditionally been defined from an anthropocentric perspective. Like many other definitions about the intellect (consciousness, reasoning etc.).
That’s like a bird saying planes can’t fly because they don’t flap their wings.
LLMs use human language mainly because they need to communicate with humans. Their inputs and outputs are human language. But in between, they don’t think in human language.
> LLMs use human language mainly because they need to communicate with humans. Their inputs and outputs are human language. But in between, they don’t think in human language.
You seem to fundamentally misunderstand what LLMs are and how they work, honestly. Remove the human language from the model and you end up with nothing. That's the whole issue.
Your comment would only make sense if we had real artificial intelligence, but LLMs quite literally work by predicting the next token - which produces an incredibly good facsimile of intelligence because there is an incredible amount of written content on the Internet that was written by intelligent people.
True, but a human child is taught a language. He doesn't come with it. It is an important part of how our brains form.
A human child not taught literally anything can see some interesting item, extend a hand to it, touch it, interact with it - all decided by the child. Heck, even my cat can see a new toy, go to it, and play with it, without any teaching.
LLMs can't initiate any task on their own, because they lack the thinking/intelligence part.
I'm not sure it's the lack of intelligence so much as they aren't generally in a snooze - look for something fun to do - snooze loop like cats.
Well, you can explain to a plant in your room that E=mc2 in a couple of sentences, a plant can't explain to you how it feels the world.
If cows were eating grass and conceptualising what infinity is, what her role in the universe is, how she was born, and what would happen after she is dead... we would see a lot of jumpy cows out there.
This is exactly what I mean by anthropocentric thinking. Plants talk plant things and cows talk about cow issues. Maybe there are alien cows in some planet with larger brains and can do advanced physics in their moo language. Or some giant network of alien fungi discussing about their existential crisis. Maybe ants talk about ant politics by moving their antennae. Maybe they vote and make decisions. Or bees talk about elaborate honey economics by modulating their buzz. Or maybe plants tell bees the best time for picking pollens by changing their colors and smell.
Words, after all, are just arbitrary ink shapes on paper. Or vibrations in air. Not fundamentally different from any other signal. Meaning is added only by the human brain.
I'm also attracted to the idea of reducing rule sets to simple algorithms and axioms, in every case you can. But I'm skeptical that consciousness can be reduced that way. If it can be, I think we'll see it in the distillation and quantizing of smaller and smaller models converging on similar adaptations, as opposed to the need for greater scale (at least in inference). I still believe language processing is the wrong task to train to that point. I'd like to see AIs that model thought process, logic, tool construction, and real-world tasks without language - maybe even ones that model vocal cords and neurological processes instead of phonemes. Most animals don't use language, and as a result we can't ask whether they're conscious, but they probably are. Navigating and manipulating the physical world, from the cellular level up to swinging from trees, is far more complex - language is a very late invention, and is not in and of itself intelligence; it may just be a lagging indicator.
To the extent that we vainly consider ourselves intelligent for our linguistic abilities, sure. But this underrates the other types of spatial and procedural reasoning that humans possess, or even the type that spiders possess.
I'm with you on this. We, the conscious "I" having experiences, invent explanations with language after the fact. It's fun.
I think you meant to respond to @noduerme, not @qudat. If so, I agree with both of you, otherwise I’m confused by what you’re saying.
It's clear that some people never really read any science fiction and just waited for someone to make movies of it (starring Will Smith, one might presume).
LLMs are a take on the "oracle". That is, a device that could seemingly-intelligently answer a question in such a way that humans would judge the answer correct and even helpful. But no one would ever have to worry about an oracle (or an LLM) waking up one morning and deciding to take over the world.
LLMs are actually not even that good at being oracles. If you can come up with a question that no human has ever posed before (or at least, no one has come up with a semblance of an answer), then it can't answer that meaningfully. You'll get the LLM equivalent of hemming and hawing.
An intelligence in the way that we all mean it is software that can be capable of something tomorrow that it is not capable of today, without any meaningful human modification or input. Software that is self-directed and has goals (hopefully goals aligned with our own). Software that can eventually figure out the solutions to problems that no human has so far solved.
It would not necessarily need a personality (though one could be forgiven for anticipating that it might develop one), it would not necessarily need to be able to communicate with humans at all (though, if humans are obstacles to its goals that could be navigated with communication, one would expect it to learn eventually... but these communications might not even be reflections of its inner mind and rather just strategies to manipulate humans).
I'm not saying anything here anyone else would have trouble formulating. We're all acting as if there's some mystery about what would make an AGI intelligent, and there's no mystery. No one's bothered to try to formalize the definition, but not because it's difficult, but because there's little need.
If anyone wants to pay me half a mill a year and dump a few million in hardware and other costs into my lap, I should be able to get one up and running for you by the middle of next year. There's even a 2% chance it won't be an omnicidal maniac hellbent on galactic conquest for its first 30 seconds of life.
>> I should be able to get one up and running for you by the middle of next year
Funny. I agree with your plainspoken analysis of why these things are nowhere near AGI, and of what AGI would be. I even had a long conversation with Claude last week where it told me that no LLM would ever approach AGI (but then it wrote a 4-paragraph-long diatribe entitled "Why I Declare Myself Conscious" in the same conversation). These neural networks are closer to the speechwriting machine in The Penultimate Truth, or the songwriting machine in 1984. As for that latter one, I believe Orwell remarks on how it just recycles the same sentimental tunes and words in different order so that there's always a "new" song all the proles are humming.
This just isn't true. Go ahead and make up a problem and ask.
>I could walk into a club in Vegas, throw down $10,000 cash for a VIP table, and start throwing around $100 bills.
If you can withdraw $10,000 cash at all to dispose as you please (including for this 'trick' game) then my friend you are wealthy from the perspective of the vast majority of humans living on the planet.
And if you balk at doing this, maybe because you cannot actually withdraw that much, or maybe because it is badly needed for something else, then you are not actually capable of performing the test now, are you?
That's really not true. Lots of people in America can have $0 in net worth and get a credit card, use that to buy some jewelry and then sell it, and have $10k in cash. The fact that the trick only works once proves that it's a trick.
You're not making much sense. Like the other user, you are hinging on non-transferrable details of your analogy, which is not the actual reality of the situation.
You've invented a story where the user can pass the test by only doing this once and hinged your point on that, but that's just that - a story.
All of our tests and benchmarks account for repeatability. The machine in question has no problem replicating its results on whatever test, so it's a moot point.
The LLM can replicate the trick of fooling users into thinking it's conscious as long as there is a sufficient supply of money to keep the LLM running and a sufficient number of new users who don't know the trick. If you don't account for either of those resources running out, you're not testing whether its feats are truly repeatable.
>The LLM can replicate the trick of fooling users into thinking it's conscious as long as there is a sufficient supply of money to keep the LLM running and a sufficient number of new users who don't know the trick.
Okay? And you, presumably a human, can replicate the trick of fooling me into thinking you're conscious as long as there is a sufficient supply of food to keep you running. So what's your point? With each comment, you make less sense. Sorry to tell you, but there is no trick.
The difference is that humans can and did find their own food for literally ages. That's already a very, very important difference. And while we cannot really define what's conscious, it's a bit easier (still with some edge cases) to define what is alive. And probably what is alive has some degree of consciousness. An LLM definitely does not.
One of the "barriers" to me is that (AFAIK) an LLM/agent/whatever doesn't operate without you hitting the equivalent of an on switch.
It does not think idle thoughts while it's not being asked questions. It's not ruminating over its past responses after having replied. It's just off until the next prompt.
Side note: whatever future we get where LLMs get their own food is probably not one I want a part of. I've seen the movies.
This barrier is trivial to solve even today. It is not hard to put an LLM on an infinite loop of self-prompting.
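A minimal sketch of what that could look like, assuming an OpenAI-style chat completions API (the model name and prompts here are placeholders, not anything any vendor ships):

    from openai import OpenAI

    client = OpenAI()  # assumes an API key in the environment

    # Seed the loop, then keep feeding the model's own output back as the next prompt.
    history = [{"role": "user", "content": "Think out loud about anything. Keep going."}]

    while True:
        reply = client.chat.completions.create(
            model="gpt-4o-mini",      # placeholder model name
            messages=history[-50:],   # crude rolling window standing in for real memory tools
        ).choices[0].message.content
        history.append({"role": "assistant", "content": reply})
        history.append({"role": "user", "content": "Continue from your last thought."})

Nothing clever is happening there; the point is just that "always on" is a scheduling decision, not a capability the model lacks.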
A self-prompting loop still seems artificial to me. It only exists because you force it to externally.
You only exist because you were forced to be birthed externally? Everything has a beginning.
In fact, what is artificial is stopping the generation of an LLM when it reaches a 'stop token'.
A more natural barrier is the context window, but at 2 million tokens, LLMs can think for a long time without losing any context. And you can take over with memory tools for longer-horizon tasks.
Good points. :) Thank you.
I’m sorry but I feel you missed the point.
Couldn’t someone else just give him a bunch of cash to blow on the test, to spoil the result?
Couldn’t he give away his last dollar but pretend he’s just going to another casino?
Observing someone's behavior in Vegas is just looking at a proxy for wealth, not the actual wealth.
>> Couldn’t someone else just give him a bunch of cash to blow on the test, to spoil the result?
Isn't this what most major AI companies are doing anyway?
>Couldn’t someone else just give him a bunch of cash to blow on the test, to spoil the result?
If you still need a rich person to pass the test, then the test is working as intended. Person A is rich or person A is backed by a rich sponsor is not a material difference for the test. You are hinging too much on minute details of the analogy.
In the real world, your riches can be sponsored by someone else, but for whatever intelligence task we envision, if the machine is taking it then the machine is taking it.
>Couldn’t he give away his last dollar but pretend he’s just going to another casino?
Again, if you have $10,000 you can just withdraw today and give away, last dollar or not, the vast majority of people on this planet would call you wealthy. You have to understand that this is just not something most humans can actually do, even on their deathbed.
>> Again, if you have $10,000 you can just withdraw today and give away, last dollar or not, the vast majority of people on this planet would call you wealthy. You have to understand that this is just not something most humans can actually do, even on their deathbed.
So, most people can't get $1 Trillion to build a machine that fools people into thinking it's intelligent. That's probably also not a trick that will ever be repeated.
> Maybe it's time to overthrow the Turing Test. Fooling humans isn't necessarily an indicator of intelligence.
I'm sorry, but I find this intellectually dishonest and a case of moving the goalposts.
Speaks more about our inability to recognize the monumental revolution about to happen in the next decade or so.
I think this depends on how you measure task.
One common kind of interaction I have with chatgpt (pro):

1. I ask for something.
2. Chatgpt suggests something that doesn't actually fulfill my request.
3. I tell it how its suggestion does not satisfy my request.
4. It gives me the same suggestion as before, or a similar suggestion with the same issue.
Chatgpt is pretty bad at "don't keep doing the thing I literally just asked you not to do" but most humans are pretty good at that, assuming they are reasonable and cooperative.
> Chatgpt is pretty bad at "don't keep doing the thing I literally just asked you not to do" but most humans are pretty good at that.
Most humans are terrible at that. Most humans don't study for tests, fail, and don't see the connection. Most humans will ignore rules for their safety and get injured. Most humans, when given a task at work, will half-ass it and not make progress without constant monitoring.
If you only hang out with genius SWEs in San Francisco, sure, ChatGPT isn't at AGI. But the typical person has been surpassed by ChatGPT already.
I'd go so far as to say the typical programmer has been surpassed by AI.
My example is asking for way less than what you're asking for.
Here is something I do not see with reasonable humans who are cooperative: Me: "hey friend with whom I have plans to get dinner, what are you thinking of eating?" Friend: "fried chicken?" Me: "I'm vegetarian" Friend: "steak?"
Note that this is in the context of four turns of a single conversation. I don't expect people to remember stuff across conversations or to change their habits or personalities.
Your goalpost is much further out there.
> Here is something I do not see with reasonable humans who are cooperative: Me: "hey friend with whom I have plans to get dinner, what are you thinking of eating?" Friend: "fried chicken?" Me: "I'm vegetarian" Friend: "steak?"
Go join a dating app as a woman, put vegan in your profile, and see what restaurants people suggest. Could be interesting.
Thanks for your engagement but it would help if you read my comment the first two times.
You've personally demonstrated that humans don't have to be reasonable and cooperative, but you're not at all refuting my claim.
I get your comment, which is that only the worst humans are going to suggest a steak place after you've stated you're vegetarian. And that ChatGPT does so as well.
I'm disagreeing and saying there's far more people in that bucket than you believe.
I know many people at my university that struggle to read more than two sentences at a time. They'll ask me for help on their assignments and get confused if I write a full paragraph explaining a tricky concept.
That person has a context length of two sentences and would, if encountering a word they didn't know like "vegetarian", ignore it and suggest a steak place.
These are all people in Computer Engineering. They attend a median school and picked SWE because writing buggy & boilerplate CRUD apps pays C$60k a year at a big bank.
It does feel that one of the most common arguments for AI is misanthropy. That isn't a rational claim, it's a personal bias.
While the majority of humans are quite capable of this, there are plenty of examples anyone could give proving that having the capability doesn't mean they actually exercise it.
It does what it was designed to do: predict text. Does it do that incredibly well? Yes. Does it do anything else? No.
That isn't to say super advanced text regurgitation isn't valuable, just that it's nowhere even remotely close to AGI.
I feel every human just regurgitates words too. And most are worse than an AI.
I have countless examples of lawyers, HR departments and other public government bodies that breach the law without knowing the consequences. I also have examples of AI giving bad advice, but it's still better than an average human right now.
An AI could easily save them a ton of money in the fees they are paying for breaching the law.
> I feel every human just regurgitates words too. And most are worse than an AI
That's not a fact, that's just cynicism mixed with sociopathy.
I hear this argument a lot from AI bros, and...y'all don't know how much you're telling on yourselves.
This is from my experience, I didn’t do any study.
What you said is not a fact either. And so?
> chatgpt.com is actually a good at or better than a typical human.
It can appear so, as long as you don't check too carefully. It's impressive, but it's still very common to find basic errors once you are out of the simplest, most common problems, due to the lack of real understanding or reasoning capabilities. That leads to mistakes which most humans wouldn't make (while sober / non-sleep-deprived), and the classes of error are different because humans don't combine that lack of understanding/reasoning/memory with the same level of polish.
Ask ChatGPT about something you don't know about and it can appear very smart. Ask it in depth about something you are very knowledgeable about and the ignorance will quickly become apparent.
The problem is that for a majority of those tasks people conveniently "forget" the actual start and end of the process. LLMs can't start most of those tasks on their own, and they can't end and evaluate the result of those tasks either. Sure, we have automated multiple tasks from a very low percentage to a very high percentage, and that is really impressive. But I don't see how any LLM can bridge the gap from a very high percentage of automation to a strict 100% of automation, for any task. And if a program requires a real intelligence handling and controlling it, is it really AI?
> If you took the current state of affairs back to the 90s you’d quickly convince most people that we’re there.
This is an interesting ambiguity in the Turing test. It does not say if the examiner is familiar with the expected level of the candidate. But I think it's an unfair advantage to the machine if it can pass based on the examiner's incredulity.
If you took a digital calculator back to the 1800s, added a 30 second delay and asked the examiner to decide if a human was providing the answer to the screen or a machine, they might well conclude that it must be human as there is no known way for a machine to perform that action. The Akinator game would probably pass the test into the 1980s.
I think the only sensible interpretation of the test is one where the examiner is willing to believe that a machine could be providing a passing set of answers before the test starts. Otherwise the test difficulty varies wildly based on the examiner's impression of the current technical capabilities of machines.
It's good at tasks if you have a competent and _critical_ human editor selecting outputs and pulling the prompt slot lever again as needed.
Exactly. Five years ago I posted here on HN that AI will pass Turing Test in the next 3 years (I was impressed by Facebook chatbot progress at the time). I was laughed at and downvoted into oblivion. TT was seen by many as a huge milestone, incredibly difficult task, “maybe in my lifetime” possibility.
The Turing test isn't actually a good test of much, but even so, we're not there yet. Anyone who thinks we've passed it already should experiment a bit with counterfactuals.
Ask your favorite SOTA model to assume something absurd and then draw the next logical conclusions based on that. "Green is yellow and yellow is green. What color is a banana?" They may get the first question(s) right, but will trip up within a few exchanges. Might be a new question, but often they are very happy to just completely contradict their own previous answers.
You could argue that this is hitting alignment and guard-rails against misinformation... but whatever the cause, it's a clear sign it's a machine and look, no em-dashes. Ironically it's also a failure of the Turing test that arises from a failure in reasoning at a really basic level, which I would not have expected. Makes you wonder about the secret sauce for winning IMO competitions. Anyway, unlike other linguistic puzzles that attempt to baffle with ambiguous reference or similar, simple counterfactuals with something like colors are particularly interesting because they would NOT trip up most ESL students or 3-5 year olds.
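If anyone wants to try this systematically rather than by hand, a rough sketch (again assuming an OpenAI-style chat API; the model name and the particular counterfactual are just examples):

    from openai import OpenAI

    client = OpenAI()

    # State a counterfactual premise, then ask follow-ups in the same conversation and
    # check whether later answers contradict the earlier ones.
    messages = [{"role": "user", "content":
        "Assume green is called yellow and yellow is called green. What color is a banana?"}]

    follow_ups = [
        "Under the same assumption, what color is a lime?",
        "Repeat the banana answer you gave earlier and explain your reasoning.",
    ]

    for q in [None] + follow_ups:
        if q is not None:
            messages.append({"role": "user", "content": q})
        answer = client.chat.completions.create(
            model="gpt-4o-mini",   # placeholder; swap in whatever SOTA model you're testing
            messages=messages,
        ).choices[0].message.content
        print(answer)
        messages.append({"role": "assistant", "content": answer})

The interesting failures are usually not the first answer but the third or fourth, once the premise has to be held steady against the model's priors.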
I would argue that current LLMs are passing the Turing test because many observers have a hard time distinguishing them from humans: just look at the difficulty many schools have in enforcing rules like "Not allowed to use LLMs for your homework". The teachers often (not always) can't tell, looking at a piece of text, whether a human produced it or whether ChatGPT or some other LLM produced it.
And that "not always" is the crux of the matter, I think. You are arguing that we're not there yet, because there are lines of questioning you can apply that will trip up an LLM and demonstrate that it's not a human. And that's probably a more accurate definition of the test, because Turing predicted that by 2000 or so (he wrote "within 50 years" around 1950) chatbots would be good enough "that an average interrogator will not have more than 70% chance of making the right identification after five minutes of questioning". He was off by about two decades, but by now that's probably happened. The average interrogator probably wouldn't come up with your (good) strategy of using counterfactuals to trick the LLM, and I would argue two points: 1) that the average interrogator would indeed fail the Turing test (I've long argued that the Turing test isn't one that machines can pass, it's one that humans can fail) because they would likely stick to conventional topics on which the LLM has lots of data, and 2) that the situation where people are actually struggling to distinguish LLMs is one where they don't have an opportunity to interrogate the model: they're looking at one piece of multi-paragraph (usually multi-page) output presented to them, and having to guess whether it was produced by a human (who is therefore not cheating) or by an LLM (in which case the student is cheating because the school has a rule against it). That may not be Turing's actual test, but it's the practical "Turing test" that applies the most today.
I think the TT has to be understood as explicitly adversarial, and increasingly related to security topics, like interactive proof and side channels. (Looking for guard-rails is just one kind of information leakage, but there's lots of information available in timing too.)
If you understand TT to be about tricking the unwary, in what's supposed to be a trusting and non-adversarial context, and without any open-ended interaction, then it's correct to point out homework-cheating as an example. But in that case TT was solved shortly after the invention of spam. No LLMs needed, just markov models are fine.
I've seen one blog where in order to comment, you have to answer the question "What color is a green orange?" I presume he gets spambots which all answer "orange" and get blocked from commenting, or more commonly which leave that input blank (since it's a non-standard input field) and also get blocked.
But it just occurred to me that you could ask "what color is a green eggplant?" for a similar result. Though you'd catch a lot of people who, like me before I looked up fruits that aren't green when unripe, assume that all unripe fruits are green. (They aren't, unripe eggplants are white and look like eggs, hence the name).
Me:
What color is a green eggplant?
ChatGPT:
A green eggplant (also called Thai eggplant or green brinjal, depending on the variety) is actually green on the outside — sometimes solid green, sometimes streaked with white or pale green.
The inside flesh is typically off-white or light cream, just like purple eggplants.
If you mean a green-colored variety (not an unripe purple one), then the skin color can range from pale mint green to deep jade, often with light striping.
Me:
How would an average human answer this question?
ChatGPT:
An average person would probably answer something simple and direct like:
“It’s green.”
Or if they’re being a bit more specific:
“It’s green on the outside and white inside.”
Most people wouldn’t overthink the variety or mention Thai vs. unripe types — they’d just describe what they see.
Outcome would depend on the rest of the test, but I'd say the "human" version of this answer adds zero or negative value to chances of being human, on grounds of strict compliance, sycophancy, and/or omniscience. "No such thing" would probably be a very popular answer. Elaboration would probably take the form of "love it" or "hate it", instead of reaching for a comprehensive answer describing the inside and the outside.
Experimental design comes in here, and the one TT paper mentioned in this thread has instructions for people like "persuade the interrogator [you] are human". Answering that a green eggplant is green feels like humans trying to answer questions correctly and quickly, being wary of a trap. We don't know participants' background knowledge, but anyone who has used ChatGPT would know that ignoring the question and maybe telling an eggplant-related anecdote was a better strategy.
I never really understood what made the Turing test so special. On the face of it, it is a rather dumb test. And it was debunked within three decades by the Chinese room thought experiment. And now that we have language models that are obviously not intelligent, it should have been the last nail in the coffin for this test.
Alan Turing was a mathematician, not a psychologist; this was his attempt at doing philosophy. And while I applaud brilliant thinkers when they attempt to do philosophy (honestly we need more of that), it is better to leave it to actual philosophers to validate the quality of said philosophy. John Searle was a philosopher who specialized in the philosophy of mind and language, and in 1980 he pretty convincingly argued against the Turing test.
Yeah, I mean I hope there are not many people that still think it's a super meaningful test in the sense originally proposed. And yet it is testing something. Even supposing it were completely solved and further supposing the solution is theoretically worthless and only powers next-gen slop-creation, then people would move on to looking for a minimal solution, and perhaps that would start getting interesting. People just like moving towards concrete goals.
In the end though, it's probably about as good as any single kind of test could be, hence TFA looking to combine hundreds across several dozen categories. Language was a decent idea if you're looking for that exemplar of the "AGI-Complete" class for computational complexity; vision was at one point another guess. More than anything else, I think we've figured out in recent years that it's going to be hard to find a problem criterion that's clean and simple, much less a solution that is.
> for an astonishing number of tasks that a normal person could come up with, chatgpt.com is actually a good at or better than a typical human.
That’s not my experience at all. Unless you define “typical human” as “someone who is untrained in the task at hand and is satisfied with mediocre results.” What tasks are you thinking of?
(And, to be clear, being better than that straw man of “typical human” is such a low bar as to be useless.)
I am unimpressed, and I don't think there is any crisis (other than the lack of consumer protection around these products, copyright, and the amount of energy it takes to run these systems during a global warming crisis).
If you look at a calculator you will quickly find it is much better than a human at any of the operations that have been programmed into it, and has been since the 1960s. Since the 1960s the operations programmed into your average calculator have increased by several orders of magnitude. The digital calculator sure is impressive, and useful, but there is no crisis. Even in the world outside computing, a bicycle can outperform a human runner easily, yet there is no mobility crisis as a result. ChatGPT is very good at predicting language. And in quite a few subject matters it may be better than your average human at predicting said language. But not nearly as good as a car is to a runner, nor even as good as a chess computer is to a grandmaster. But if you compare ChatGPT to an expert in the subject, the expert is much, much, much better than the language model. In these tasks a calculator is much more impressive.
it should be possible to admit that AGI is not only a long way off, but also a lot different from what chatGPT does, without discounting that chatGPT is extraordinarily useful.
the AI bros like to talk about AGI as if it's just the next threshold for LLMs, which discounts the complexity of AGI, but also discounts their own products. we don't need an AGI to be our helpful chatbot assistant. it's fine for that to just be a helpful chatbot assistant.
Honestly, a lot of it seems like it's lazy narcissism. They don't think other people have value or intelligence, therefore this machine that bullshits well has to be equally useful.
Was thinking about this today. I had to do a simple wedding planning task - setting up my wedding website with FAQ, cobbling together the guest list (from texts, photos of my father's address book, and Excel spreadsheets), directions and advice for lodging, conjuring up a scheme to get people to use the on-site cabins, and a few other mundane tasks. No phone calls, no "deep research", just rote browser-jockeying. Not even any code; the off-the-rack system just makes that for you (however I know for a fact an LLM would love to try to code this for me).
I know without a single doubt that I could not simply ask an "AI" "agent" to do this today and expect any sort of a functional result, especially when some of these were (very simple) judgement calls or workarounds for absolutely filthy data and a janky wedding planning website UI.
Or even to come up with a definition of cognitive versatility and proficiency that is good enough to not get argued away once we have an AI which technically passes that specific definition.
The Turing Test was great until something that passed it (with an average human as interrogator) turned out to also not be able to count letters in a word — because only a special kind of human interrogator (the "scientist or QA" kind) could even think to ask that kind of question.
Can you point to an LLM passing the turing test where they didn't invalidate the test by limiting the time or the topics?
I've seen claims of passing, but it's always things like "with only 3 questions" or "with only 3 minutes of interrogation" or "with only questions about topic X". Those aren't Turing Tests. As an example, if you limit the test to short exchanges then anything will pass: "Limit to 1 word, one question". User types "Hello", LLM responds "Hi". PASS! (not!)
This is the best one I’ve seen but it has the notable caveat that it’s a relatively short 5 minute chat session:
https://arxiv.org/pdf/2405.08007
I do think we’re going to see this shift as AI systems become more commonplace and people become more practiced at recognizing the distinction between polished text and understanding.
Note that the Turing test allows a lot of leeway in the test settings, i.e. who interrogates it, how much they know about the weaknesses of current SOTA models, whether they are allowed to use tools (I'm thinking of something like ARC-AGI but in a format that allows chat-based testing), how long a chat is allowed, etc. Therefore there can be multiple interpretations of whether the current models pass the test or not.
One could say that if there is maximally hard Turing test, and a "sloppy" Turing test, we are somewhere where the current models pass the sloppy version but not the maximally hard version.
Hah, tools-or-no does make things interesting, since this opens up the robot tactic of "use this discord API to poll some humans about appropriate response". And yet if you're suspiciously good at cube roots, then you might out yourself as robot right away. Doing any math at all in fact is probably suspect. Outside of a classroom humans tend to answer questions like "multiply 34 x 91" with "go fuck yourself", and personally I usually start closing browser tabs when asked to identify motorcycles
The Turing test is long outdated. Modern models can fool humans, but fooling isn't understanding. Maybe we should flip the perspective: AGI isn't about imitation, it's about discovering patterns autonomously in open environments.
Or that this system would fail to adapt in any way to changes of circumstance. The adaptive intelligence of a live human is truly incredible. Even in cases where the weights are updatable, we watch AI make the same mistake thousands of times in an RL loop before attempting a different strategy.
I think the turing test suffers a bit from the "when a measurement becomes a target, it ceases to be a good measurement."
An AI that happened to be able to pass the turing test would be pretty notable because it probably implies much more capability behind the scenes. The problem with, for example, LLMs is that they're essentially optimized turing test takers. That's about all they can do.
Plus, I don't think any LLM will pass the turing test in the long term. Once something organically comes up that they aren't good at, it'll be fairly obvious they aren't human and the limits of context will also become apparent eventually.
You can also be interrogating a human and in the course of your conversation stumble across something it isn’t good at.
Sure, but very likely they'll be able to explain their lack to you in a satisfactory way, or, at least in a way that makes you think they're human.
Counterpoint: people were accusing each other of being bots simply for disagreeing with each other even back when Twitter was still called that. "Mr Firstname Bunchanumbers" etc.
(And we've been bemoaning "the lack of common sense these days" for at least as long as I've been an adult, and racists and sexists have been denying the intelligence of the outgroup as far back as writing can show us).
IMO this is a solvable problem though. Eventually LLMs will have more awareness of their own confidence and will be able to convincingly say “huh, I’m honestly not sure about that, can you explain a bit more about what you mean?” Or even “I’ve heard of X before but not in this context; can you please clarify what you mean here?”
See, humans respond very differently when that happens. The failure to do what humans do when they don’t understand something or know something is frequently what fails LLMs at the TT.
If a human learned only on tokenized representations of words, I don't know that they would be as good at inferring the number of letters in the words in the underlying tokens as LLMs are.
While true, it is nevertheless a very easy test to differentiate humans from LLMs, and thus if you know it you can easily figure out who is the human and who is the AI.
Absolute definitions are weak. They won't settle anything.
We know what we need right now, the next step. That step is a machine that, when it fails, it fails in a human way.
Humans also make mistakes, and hallucinate. But we do it as humans. When a human fails, you think "damn, that's a mistake perhaps me or my friend could have done".
LLMs on the other hand, fail in a weird way. When they hallucinate, they demonstrate how non-human they are.
It has nothing to do with some special kind of interrogator. We must assume the best human interrogator possible. This next step I described works even with the most skeptical human interrogator possible. It also synergizes with the idea of alignment in ways other tests don't.
When that step is reached, humans will or will not figure out another characteristic that makes it evident that "subject X" is a machine and not a human, and a way to test it.
Moving the goalpost is the only way forward. Not all goalpost moves are valid, but the valid next move is a goalpost move. It's kind of obvious.
This makes sense if we're trying to recreate a human mind artifically, but I don't think that's the goal?
There's no reason an equivalent or superior general intelligence needs to be similar to us at all
There's no substance to the idea of "superior intelligence". Nobody can say what that means, except by assuming that animal intelligence is the same category as the kind we want and differs from human intelligence in degree rather than qualitatively, and then extrapolating forward from this idea of measuring intelligence on an intelligence meter that we don't actually have.
Besides which we already defined "artificial intelligence" to mean non-intelligence: are we now going to attain "artificial general intelligence" by the same process? Should we add another letter to the acronym, like move on to "genuine artificial general intelligence"?
Is there really no agreement on what intelligence refers to? I've seen it defined as the ability to reach a goal, which was clear to me. E.g. a chess AI with 1500 ELO is more intelligent than one at 1000.
That's capability, intelligence can also be how quickly it learned to get to that capability.
Consider the difference in intelligence between a kid who skipped five years of school vs one who was held back a year: if both got the same grade in the end, the one who skipped five years was smarter.
For _investment_ purposes the definition of AGI is very simple. It is: "to what extent can it replace human workers?".
From this perspective, "100% AGI" is achieved when AI can do any job that happens primarily on a computer. This can be extended to humanoid robots in the obvious way.
That's not what AGI used to mean a year or two ago. That's a corruption of the term, and using that definition of AGI is the mark of a con artist, in my experience.
I believe the classical definition is, "It can do any thinking task a human could do", but tasks with economic value (i.e. jobs) are the subset of that which would justify trillions of dollars of investment.
I don't see how that changes anything.
Failing like a human would is not a cute add-on. It's a fundamental requirement for creating AIs that can replace humans.
Industrial machines don't fail like humans yet they replaced human workers. Cars don't fail like horses yet they replaced them. ATMs don't fail like bank tellers... Why is this such a big requirement?
LLMs are trained on human data, and aimed at performing tasks in human roles. That's the goal.
It is supposed to be super, but superhuman: able to interact with us.
Which leads us to the Turing Test (also, not a test... "the imitation game" is more of a philosophical exploration on thinking machines).
My comment assumes this is already understood as Turing explained.
If the thing is not human, then there's absolutely no way we can evaluate it. There's no way we can measure it. It becomes an impossible task.
What's wrong with measuring and evaluating its outputs directly? If it can accurately file taxes better than us does it matter if it does it in a human manner?
Birds and planes both fly and all
If your definition of AGI is filing taxes, then it's fine.
Once we step into any other problem, then you need to measure that other problem as well. Lots of problems are concerned with how an intelligent being could fail. Our society is built on lots of those assumptions.
> We know what we need right now, the next step. That step is a machine that, when it fails, it fails in a human way.
I don't know if machines that become insecure and lash out are a good idea.
The issue is if they lash out in some incomprehensible way, or lash out as an alien superintelligence. If they lash out as a human would, that's fine.
Why are human failure modes so special?
Because we have 300 thousand years of collective experience on dealing with humans.
Ironically, one of the ways that humans are worse than AI, is that any given human learns from an even smaller fraction of that collective experience than AI already does.
I don't understand your point. How does that observation help in setting up a test or definition?
> This next step I described work even with the most skeptic human interrogator possible.
To be a valid test, it still has to be passed by ~every adult human. The harder you make the test (in any direction), the more it fails on this important axis.
You are wrong. Please read the Turing paper:
https://courses.cs.umbc.edu/471/papers/turing.pdf
> A number of interrogators could be used, and statistics compiled to show how often the right identification was given
Turing determines that we need enough competent-interrogator passes just to establish statistical certainty, not ~everyone. I tend to agree with him on this.
Please reread that section. You'll discover it has nothing to do with whether humans can pass the test.
If you can find a part of the paper in which Turing really does claim that it is unnecessary for most adult humans to be able to pass the test, by all means quote it. But this would be a surprising thing for him to claim, because it would undermine the entire foundation of his Imitation Game.
Do you understand how using statistics to determine degrees of certainty works? That is a must-have to understand academic work.
I think that if you did, you wouldn't be answering like you did.
https://en.wikipedia.org/wiki/P-value
Your quote does not back up your claim.
My original claim was that the Turing test needs to be passable by ~every adult human. You counterclaimed that Turing himself didn't think so, and provided that quote from the IG paper as evidence. But that quote is in a section about testing digital computers, not humans. Thus it is unconnected to your counterclaim.
I don't know how much simpler I can make it.
Find a quote that actually backs up your claim, or accept that you've learned something about the paper you told me to read.
He also never says that ~every adult human should pass, ever.
He never denied your claim, so you concluded you must be right. A most curious way of thinking.
Why don't you think people realize that? I must have heard this basic talking point a hundred times.
Their people are different from your people.
It turns out that all our people are different, and each of us belongs to some other people’s people.
For me, it would be because the term AGI gets bandied about a lot more frequently in discussions involving Gen AI, as if that path takes us any closer to AGI than other threads in the AI field have.
Because the amount of people stating that AGI is just around the corner is staggering. These people have no conception of what they are talking about.
But they do. They're not talking about AGI, they're talking about venture capital funding.
I always laugh at these. Why are people always jumping to defining AGI when they clearly don't have a functional definition for the I part yet? More to the point, once you have the I part you get the G part; it is a fundamental part of it.
What I think is being skipped in the current conversation is that the versatility keyword is hiding a lot of unknowns - even now. We don't seem to have a true understanding of the breadth or depth of our own unconscious thought processes, therefore we don't have much that is concrete to start with.
Have any benchmarks been made that use this paper’s definition? I follow the ARC prize and Humanity’s Last Exam, but I don’t know how closely they would map to this paper’s methods.
Edit: Probably not, since it was published less than a week ago :-) I’ll be watching for benchmarks.
I'm more surprised and equally concerned by the majority of people's understanding of intelligence and their definition of AGI. Not only does the definition "… matching the cognitive versatility and proficiency of a well-educated adult" violate the "general" in AGI by way of the "well-educated" part; it also implies that only the "well-educated" (presumably by a specific curriculum) qualify as intelligent, and, by extension, that the further you depart from the "well" in "educated" the further you diverge from "intelligent". It all seems a rather unimpressive notion of intelligence.
In other words, in one question: is the current AI not already well beyond the "…cognitive versatility and proficiency of an uneducated child"? And when you consider that in many places like Africa, they didn't even have a written language until European evangelists created it and taught it to them in the late 19th century, and they have far less "education" than even some of the most "uneducated" average European and even many American children, does that not mean that AI is well beyond them at least?
Frankly, as it seems things are going, there is at the very least going to be a very stark shift in "intelligence" that even exceeds the one that has happened in the last 50 or so years, which has brought us stark drops in memory, literary knowledge, mathematics, and even general literacy, not to mention the ability to write. What does it mean that kids now will not even have to feign acting like they're seeking out sources, vetting them, contradicting a story or logical sequence, forming ideas, messages, and stories, etc.? I'm not trying to be bleak, but I don't see this simply resulting in net positive outcomes, and most of the negative impacts will also be happening below the surface, to the point that people won't realize what is being lost.
AI is highly educated. It's a different sort of artifact we're dealing with where it can't tell truth from fiction.
What's going on is AI fatigue. We see it everywhere, we use it all the time. It's becoming generic and annoying and we're getting bored of it EVEN though the accomplishment is through the fucking roof.
If Elon Musk made an interstellar car that could reach the nearest star in 1 second and priced it at $1k, I guarantee within a year people would be bored of it and finding some angle to criticize it.
So what happens is we get fatigued, and then we have such negative emotions about it that we can't possibly classify it as the same thing as human intelligence. We magnify the flaws until they take up all the space, and we demand a redefinition of what AGI is because it doesn't "feel" right.
We already had a definition of AGI. We hit it. We moved the goal posts because we weren't satisfied. This cycle is endless. The definition of AGI will always be changing.
Take LLMs as they exist now and only allow 10% of the population to access it. Then the opposite effect will happen. The good parts will be over magnified and the bad parts will be acknowledged and then subsequently dismissed.
Think about it. All the AI slop we see on social media consists of freaking masterpieces, works of art produced in minutes that most humans can't even hope to come close to. Yet we're annoyed and unimpressed by them. That's how it's always going to go down.
Pretty much. Capabilities we now consider mundane were science fiction just three years ago, as far as anyone not employed by OpenAI was concerned.
> We already had a definition of AGI. We hit it.
Are you sure about that? Which definition are you referring to? From what I can tell with Google and Grok, every proposed definition has been that AGI strictly matches or exceeds human cognitive capabilities across the board.
Generative AI is great, but it's not like you could just assign an arbitrary job to a present-day LLM, give it access to an expense account, and check in quarterly with reasonable expectations of useful progress.
You generally can't just have a quarterly check-in with humans either.
There's a significant fraction of humanity that would not clear the bar to meet current AGI definitions.
The distribution of human cognitive abilities is vast and current AI systems definitely exceed the capabilities of a surprising number of people.
That's fair. I suppose it gets at the heart of the question of what the specific threshold for AGI should be. Is it matching or exceeding all cognitive capabilities of any given human, an average 100-IQ adult, a severely learning-disabled child, or something else entirely?
Maybe AGI is in the eye of the beholder, and a given person just has to decide by using their own experience of human intelligence as a benchmark. In other words, AGI is AGI whenever majority consensus decides it's a universal replacement for ordinary human intelligence.
>Generative AI is great, but it's not like you could just assign an arbitrary job to a present-day LLM, give it access to an expense account, and check in quarterly with reasonable expectations of useful progress.
Has anyone tried this yet?
I'd be interested in seeing the results if so. Like someone with $500k to burn just handing it to an AI and telling it to found a startup, including incorporation, vibe coding and deploying a product, publishing a website, bringing on cofounders and/or advisors and/or employees as needed, fundraising, etc.
My guess is finding agentic tooling that could even push the model hard enough to attempt that in earnest would be a problem, and the compute bills would add up quickly, but it would be a very interesting experiment. Probably with hilarious results, but seeing what the actual failure points are would be useful information. Maybe it would actually end up hiring and paying someone to do something insanely illegal.
There have been a few attempts at “building a business by following ChatGPT’s instructions.” All the ones I’m aware of failed, and were really just a scheme to grab eyeballs.
There’s also a website showing a company’s attempt to use multiple LLMs to close its (real-world, but historical) accounting month-end books. They fail miserably with compounding errors.
>We already had a definition of AGI. We hit it.
The turing test.
The Turing test was never a test of thinking: Turing said that thinking was difficult to define and so he decided to "replace the question by another, which is closely related to it" (I disagree with him there) "and is expressed in relatively unambiguous words," i.e. the question of whether a chatbot can fool a text-only observer into thinking it's human.
Clearly, current LLMs have passed the Turing test, as witnessed by the difficulty many schools have in enforcing "do not use LLMs to do your homework" rules. But even Turing didn't say his test was a test of intelligence, just a test "closely related" to intelligence. And if he had seen today's LLMs, I think he would have revised that opinion, because today's LLMs generate text with no underlying fact model, no fundamental understanding of the truth behind the words they're saying. (No understanding, even, of the concepts of truth or falsehood). I think today's LLMs have demonstrated that being able to string words together in coherent sentences is not "closely related" to intelligence at all.
As far as I can tell, passing the Turing test has never been the majority-consensus definition of AGI. It seems to me that the Turing test has fundamentally always been about proving a negative: if something fails the Turing test, it's probably not AGI.
For reference, the term AGI post-dates the Turing test by half a century. I also don't personally remember ever hearing the exact term "artificial general intelligence" prior to 2023 or 2024, or at least it wasn't mainstream the way it is today.
If AGI had truly ever been defined by the Turing test, then Cleverbot should've been hailed as AGI when it passed the test in 2011. Even if we did all agree to call it that, we'd still need some other term for what we actually mean when we say "AGI" today. Cleverbot-era chatbots were cute toys, but they weren't capable of doing useful work of any kind.
That’s not accurate. The Turing test was always intended as a benchmark for general intelligence. Turing’s 1950 paper explicitly proposed it as a way to operationalize the question “Can machines think?” not as a parlor trick about conversation but as a proxy for indistinguishability in intellectual behavior. The whole point of the imitation game was to sidestep metaphysical arguments and reduce intelligence to functional equivalence. If a machine could consistently hold its own in unrestricted dialogue, it would demonstrate the breadth, adaptability, and contextual understanding that characterize general intelligence.
The term AGI may have come later, but the concept it represents traces directly back to Turing’s framing. When early AI researchers talked about “strong AI” or “thinking machines,” they were using the same conceptual lineage. The introduction of the acronym doesn’t rewrite that history, it just gave a modern label to an old idea. The Turing test was never meant to detect a “negative” but to give a concrete, falsifiable threshold for when positive claims of general intelligence might be justified.
As for Cleverbot, it never truly passed the test in any rigorous or statistically sound sense. Those 2011 headlines were based on short exchanges with untrained judges and no control group. Passing a genuine Turing test requires sustained coherence, reasoning across domains, and the ability to handle novel input gracefully. Cleverbot couldn’t do any of that. It failed the spirit of the test even if it tricked a few people in the letter of it.
By contrast, modern large language models can pass the Turing test with flying colors. They can maintain long, open-ended conversations, reason about complex subjects, translate, summarize, and solve problems across many domains. Most human judges would be unable to tell them apart from people in text conversation, not for a few sentences but for hours. Granted, one can often tell ChatGPT is an AI because of its long and overly descriptive replies, but that's a stylistic artifact, not a limitation of intelligence. The remarkable thing is that you can simply instruct it to imitate casual human conversation, and it will do so convincingly, adjusting tone, rhythm, and vocabulary on command. In other words, the test can be passed both intentionally and effortlessly. The Turing test was never obsolete; we finally built systems that can truly meet it.
(1) AI isn't educated. It has access to a lot of information. That's two different things.
(2) I was rebutting the paper's standard that AGI should be achieving the status of a well-educated adult, which is probably far, far too high a standard. Even something measured to a much lower standard--which we aren't at yet--would change the world. Or, going back to my example, an AI that was as intelligent as a labrador in terms of its ability to synthesize and act on information would be truly extraordinary.
It has access to a compressed representation of some subset of the information it was trained on, depending on training regime.
By this, what I mean is: take an image of this: https://en.wikipedia.org/wiki/Traitorous_eight#/media/File:T..., change the file name to something like image.jpg, and pass it into Qwen 3 4B, 8B, or 30B and look at the responses you get:
It has no idea who these guys are. It thinks they are the Beatles, or the Doors. If you probe enough, it'll say it's IBM cofounders. In a way, it kind of sees that these are mid-1900s folks with cool haircuts, but it doesn't recognize anything. If you probe on the "F", the model in question becomes convinced it's the Ford racing team, with a detailed explanation of two brothers in the photo, etc.
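For anyone who wants to reproduce this, a rough sketch of the probe, assuming a local OpenAI-compatible server (e.g. vLLM or similar) hosting one of the Qwen vision variants; the endpoint, model name, and file path are all placeholders:

    import base64
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    with open("image.jpg", "rb") as f:   # the renamed photo described above
        b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="qwen-vl",   # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Who are the people in this photo?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)

With the filename stripped of any hint, the answer you get back is a decent proxy for what the weights actually encode about the image, as opposed to what they can pattern-match from the caption.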
The creation of autoregressive next token predictors is very cool and clearly has and will continue to have many valuable applications, but I think we're missing something that makes interactions with users actually shape the trajectory of its own experience. Maybe scaffolding + qlora solves this. Maybe it doesn't
> We already had a definition of AGI. We hit it.
Any definition of AGI that allows for this is utterly useless:
> Me: Does adding salt and yeast together in pizza dough kill the yeast?
> ChatGPT: No, adding salt and yeast together in pizza dough doesn't kill the yeast.
(new chat)
> Me: My pizza dough didn't rise. Did adding salt and yeast together kill the yeast?
> ChatGPT: It's possible, what order did you add them in?
> Me: Water, yeast, salt, flour
> ChatGPT: Okay, that explains it! Adding the salt right after the yeast is definitely the issue.
(It is not the issue)
You picked one trivial failure and built an entire worldview around it while ignoring the tidal wave of success stories that define what these models can already do. ChatGPT can draft legal documents, debug code in multiple languages, generate functional architectures, summarize thousand page reports, compose music, write poetry, design marketing campaigns, and tutor students in real time. It can hold domain specific conversations with doctors, engineers, and lawyers and produce coherent, context aware reasoning that would have been considered impossible five years ago.
And you’re pointing to a single pizza dough error as if that somehow invalidates all of it. If that’s your bar, then every human who ever made a mistake in a kitchen is disqualified from being intelligent too. You’re cherry picking the single dumbest moment and pretending it defines the whole picture. It doesn’t.
The real story is that these models already demonstrate reasoning and generalization across virtually every intellectual domain. They write, argue, and problem solve with flexibility and intent. They’re not perfect, but perfection was never the standard. The Turing test was passed the moment you could no longer draw a clear line between where imitation ends and understanding begins.
You can sneer about yeast all you want, but the irony is that while you mock, the machines are already doing useful work: coding, researching, analyzing, and creating, quietly exceeding every benchmark that once defined general intelligence.
> We already had a definition of AGI. We hit it.
I'm curious when and what you consider to have been the moment.
To me, the general in AGI means I should be able to teach it something it's never seen before. I don't think I can even teach an LLM something it's seen a million times before. Long division, for example.
I don't think a model that is solid state until it's "trained" again has a very good chance of being AGI (unless that training is built into it and the model can decide to train itself).
The turing test.
Why do you believe that passing the turing test was previously the definition of AGI?
LLMs haven't actually passed the turing test since you can trivially determine if an LLM is on the other side of a conversation by using a silly prompt (e.g. what is your system prompt).
The Turing test was proposed as an operational criterion for machine intelligence: if a judge cannot reliably tell machine from human in unrestricted dialogue, the machine has achieved functional equivalence to human general intelligence. That is exactly the property people now label with the word general. The test does not ask what parts the system has, it asks what it can do across open domains, with shifting goals, and under the pressure of follow up questions. That is a benchmark for AGI in any plain sense of the words.
On teachability. The Turing setup already allows the judge to teach during the conversation. If the machine can be instructed, corrected, and pushed into new tasks on the fly, it shows generality. Modern language models exhibit in context learning. Give a new convention, a new format, or a new rule set and they adopt it within the session. That is teaching. Long division is a red herring. A person can be generally intelligent while rusty at a hand algorithm. What matters is the ability to follow a described procedure, apply it to fresh cases, and recover from mistakes when corrected. Current models can do that when the task is specified clearly. Failure cases exist, but isolated lapses do not collapse the definition of intelligence any more than a human slip does.
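To make "that is teaching" concrete, here is a toy sketch (assuming an OpenAI-style chat API; the model name is a placeholder, and the made-up rule is deliberately not something from any training data):

    from openai import OpenAI

    client = OpenAI()

    # Teach an invented convention mid-conversation, then test it on a fresh case.
    messages = [{"role": "user", "content":
        "New rule for this chat: write every number with its digits reversed, "
        "so 42 becomes 24. What is 17 + 8?"}]

    first = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    messages.append({"role": "assistant", "content": first.choices[0].message.content})

    # A case the rule was never explicitly stated for:
    messages.append({"role": "user", "content": "And 130 - 7?"})
    second = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    print(second.choices[0].message.content)   # 130 - 7 = 123 -> "321" if the rule stuck

Whether that counts as learning in the sense the other comments care about is exactly the dispute, but it is at least rule-following on a convention the model has never seen before.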
On the claim that a model is solid state unless retrained. Human brains also split learning into fast context dependent adaptation and slow consolidation. Within a session, a model updates its working state through the prompt and can bind facts, rules, and goals it was never trained on. With tools and memory, it can write notes, retrieve information, and modify plans. Whether weights move is irrelevant to the criterion. The question is competence under interaction, not the biological or computational substrate of that competence.
On the idea that LLMs have not passed the test because you can ask for a system prompt. That misunderstands the test. The imitation game assumes the judge does not have oracle access to the machinery and does not play gotcha with implementation details. Asking for a system prompt is like asking a human for a dump of their synapses. It is outside the rules because it bypasses behavior in favor of backstage trivia. If you keep to ordinary conversation about the world, language, plans, and reasoning, the relevant question is whether you can reliably tell. In many settings you cannot. And if you can, you can also tell many humans apart from other humans by writing style tics. That does not disqualify them from being generally intelligent.
So the logic is simple. Turing gave a sufficient behavioral bar for general intelligence. The bar is open ended dialogue with sustained competence across topics, including the ability to be instructed midstream. Modern systems meet that in many practical contexts. If someone wants a different bar, the burden is to define a new operational test and show why Turing’s is not sufficient. Pointing to a contrived prompt about internal configuration or to a single brittle task does not do that.
> If elon musk makes interstellar car that can reach the nearest star in 1 second and priced it at 1k, I guarantee within a year people will be bored of it and finding some angle to criticize it.
Americans were glued to their seats watching Apollo 11 land. Most were back to watching I Dream of Jeanie reruns when Apollo 17 touched down.
Well yes, but if this actually happened it would open up a new frontier. We'd have an entire galaxy of unspoilt ecosystems* to shit in. Climate anxiety would go from being existential dread to mere sentimental indignation, and everybody would be interested in the latest news from the various interstellar colonies and planning when to emigrate. Mental illness epidemics would clear up, politics would look like an old-fashioned activity, the global mood would lift, and people would say "global" much less often.
* Ecosystems may require self-assembly
> EVEN though the accomplishment is through the fucking roof.
I agree with this but also, the output is almost entirely worthless if you can’t vet it with your own knowledge and experience because it routinely gives you large swaths of incorrect info. Enough that you can’t really use the output unless you can find the inevitable issues. If I had to put a number to it, I would say 30% of what an LLM spits out at any given time to me is completely bullshit or at best irrelevant. 70% is very impressive, but still, it presents major issues. That’s not boredom, that’s just acknowledging the limitations.
It’s like designing an engine or power source that has incredible efficiency but doesn’t actually move or affect anything (not saying LLM’s are worthless but bear with me). It just outputs with no productive result. I can be impressed with the achievement while also acknowledging it has severe limitations
Not all content needs to be real. A huge portion of what humans appreciate is fiction. There's a huge amount of that content and hallucination is the name of the game in these contexts.
> Not all content needs to be real. A huge portion of what humans appreciate is fiction.
Yes but that’s deliberately fiction and I know I am reading fiction. AI hallucinations are not comparable to deliberately created works of fiction. I don’t “appreciate” the fiction LLMs serve me. They are explicitly sold as a utility, as a source of accurate information. They are literally being used to replace Google search.
Put another way: When someone claims something is true but speaks fiction it’s called a lie. If they think it’s true but are wrong it’s called a mistake. If I ask someone what is wrong with my car I don’t want 30% fiction. If I ask an LLM for a recipe I don’t want 70% accuracy.
There are some sycophants that claim that LLMs can operate at Junior Engineer level.
Try to reconcile that with your ideas (that I think are correct for that matter)
I'll simultaneously call all current ML models "stupid" and also say that SOTA LLMs can operate at junior (software) engineer level.
This is because I use "stupidity" as the number of examples some intelligence needs in order to learn from, while performance is limited to the quality of the output.
LLMs *partially* make up for being too stupid to live (literally: no living thing could survive if it needed so many examples) by going through each example faster than any living thing ever could — by as many orders of magnitude as there are between jogging and continental drift.
(10 orders of magnitude, it works out neatly as 8km/h for a fast jogger against 0.0008 mm/h for the East African Rift.)
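As a quick sanity check on that figure, here is a back-of-the-envelope calculation in Python (assuming the numbers above: an 8 km/h jogger and a rift spreading at roughly 0.0008 mm/h, i.e. about 7 mm/year):

    import math

    # Illustrative figures from the comparison above, not precise geology.
    jogger_mm_per_h = 8 * 1_000_000   # 8 km/h expressed in mm/h
    rift_mm_per_h = 0.0008            # ~7 mm/year of spreading, expressed in mm/h

    ratio = jogger_mm_per_h / rift_mm_per_h
    print(f"{ratio:.1e} ~ {math.log10(ratio):.0f} orders of magnitude")   # 1.0e+10 ~ 10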
If you’re a shop that churns through juniors, LLMs may match that. If you retain them for more than a year, you rapidly see the difference. Both personally and in the teams that develop an LLM addiction versus those who use it to turbocharge innate advantages.
For good devs, sure. Even for okay devs.
I have had the unfortunate experience of having to work with people who have a lot more than one year of experience yet are still worse than last year's LLMs, and who didn't even realise they were bad at what they did.
Data-efficiency matters, but compute-efficiency matters too.
LLMs have a reasonable learning rate at inference time (in-context learning is powerful), but a very poor learning rate in pretraining. One mitigating factor is that we have an awful lot of cheap data to pretrain those LLMs with.
We don't know how much compute the human brain uses to do what it does. And what if we could pretrain with the same data-efficiency as humans, but at the cost of using 10,000x the compute?
It would be impossible to justify doing that for all but the most expensive, hard-to-come-by gold-plated datasets - ones that are actually worth squeezing every drop of performance gains out from.
We do know how much energy a human brain uses to do whatever it does though.
That it takes vast power to train (and run) LLMs without getting intelligence out of them looks pretty bad when you compare the energy inputs to the outcomes.
Energy is even weirder. Global electricity supply is about 3 TW for 8 billion people, roughly 375 W/person, vs the 100-124 W/person of our metabolism. Given how much cheaper electricity is than food, AI can burn far more joules for the same outcome while still being good enough to get all the electricity.
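Spelling that arithmetic out (a rough sketch in Python; the inputs are the approximate figures quoted above, and the 2000-2500 kcal/day diet is my own assumption):

    # Assumed inputs: ~3 TW of global electricity, ~8 billion people,
    # and a ~2000-2500 kcal/day human diet (illustrative figures only).
    electricity_w_per_person = 3e12 / 8e9          # watts per person -> 375.0
    joules_per_kcal = 4184
    metabolic_watts = [kcal * joules_per_kcal / 86_400 for kcal in (2000, 2500)]

    print(electricity_w_per_person)                # 375.0
    print([round(w) for w in metabolic_watts])     # [97, 121], close to the 100-124 W range above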
After reading the paper I’m struck by the lack of any discussion of awareness. Cognition requires awareness at its basis, and awareness, being entirely non-verbal and unconstructed, is profoundly difficult to describe, measure, quantify, or label. This makes it, to my mind, impossible to train a model to be aware, let alone for humans to concretely describe or evaluate it. Philosophy, especially Buddhism, has tried for thousands of years, and psychology has all but abandoned the attempt. Hence papers like this, which define AGI along psychometric dimensions that have the advantage of being easily measured but the disadvantage of being incomplete. My father is an emeritus professor of psychometrics and he agrees this is the biggest hurdle to AGI: our ability to measure the dimensions of intelligence is woefully insufficient to the task of replicating intelligence. We scratch the surface, and his opinion is that language is sufficient to capture the knowledge of man, but not the spark of awareness required to be intelligent.
This isn’t meant to be a mystical statement that it’s magic that makes humans intelligent, or some exotic process impossible to compute. Rather, the nature of our mind is not observable to us in its entirety, to a degree that current learned-reinforcement techniques can’t capture it.
Try this exercise. Do not think and let your mind clear. Ideas will surface. By what process did they surface? Or clear your mind entirely then try to perform some complex task. You will be able to. How did you do this without thought? We’ve all had sudden insights without deliberation or thought. Where did these come from? By what process did you arrive at them? Most of the things we do or think are not deliberative and definitely not structured with language. This process is unobservable and not measurable, and the only way we have to do so is through imperfect verbalizations that hint at some vague outline of a subconscious mind. But without being able to train a model on that subconscious process, one that can’t be expressed in language with any meaningful sufficiency, how will language models demonstrate it? Their very nature of autoregressive inference prohibits such a process from emerging at any scale. We might very well be able to fake it to an extent that it fools us, but awareness isn’t there - and I’d assert that awareness is all you need.
Because LLMs don't have this special quality that you call "awareness", then they cannot have "cognition", neither of which you defined? This is along the lines of "There must be something special with my mind that LLMs don't have, I can just feel it" special pleading whether you call it awareness, consciousness, qualia etc.
As long as you cannot define it clearly or even show that you yourself have this quality, I think the burden of proof is on you to show why this has any real world implications rather than just being word play. We can build thinking, reasoning machines just fine without waiting for philosophers to finally answer what consciousness is.
Awareness is just continuous propagation of the neural network, be that artificial or biological. The reason thoughts just "appear" is because the brain is continuously propagating signal through the neural network. LLMs also do this during their decoding phase, where they reason continuously with every token that they generate. There is no difference here. Then you say "we don't think most of the time using language exclusively", but neither do LLMs. What most people fail to realise is that between each token being generated, black magic is happening inside the transformer layers. The same type of magic you describe. High dimensional. Based on complex concepts. Merging of ideas. Fusion of vectors to form a combined concept. Smart compression. Application of abstract rules. An LLM does all of these things, and more, and you can prove this by how complex their output is. Or you can read studies by Anthropic on interpretability, and how LLMs do math underneath the transformer layers. How they manipulate information.
AGI is not here with LLMs, but it's not because they lack reasoning ability. It's due to something different. Here is what I think is truly missing: continuous learning, long-term memory, and infinite and efficient context/operation. All of these are tied together deeply, and thus I believe we are but a simple breakthrough away from AGI.
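To be concrete about what the "fusion of vectors" amounts to mechanically, it is essentially attention: each position's representation becomes a weighted mixture of the other positions' vectors. Below is a minimal NumPy sketch with toy shapes and random values; real transformers add learned query/key/value projections and multiple heads, which are omitted here.

    import numpy as np

    def self_attention(x):
        # Scaled dot-product self-attention over a sequence of token vectors.
        # Each output row is a weighted sum ("fusion") of all input rows, with
        # weights derived from pairwise dot-product similarity (softmax-normalized).
        d = x.shape[-1]
        scores = x @ x.T / np.sqrt(d)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ x

    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(4, 8))   # 4 token vectors, 8 dimensions each (toy sizes)
    mixed = self_attention(tokens)     # each row now blends information from every token
    print(mixed.shape)                 # (4, 8)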
There are very significant differences between biological and artificial neural networks. Artificial neural networks are mathematical attempts at replicating how the brain’s neurons work. They are not, and were never meant to be, 1 to 1 replications. There is the difference in scale, where the “parameters” of human neural networks absolutely dwarf the current LLMs we have today. There is also the fact that they are materially different. The underlying biology and cell structure affects biological neural networks in ways that artificial neural networks simply don't have access to.
The idea of awareness being propagations through the NN is an interesting concept though. I wonder if this idea could be proven by monitoring the electrical signals within the brain.
Why would it have to be a 1 to 1 replication? Isn't that a strawman argument? NNs can basically store the collective knowledge of humanity in that minuscule number of neurons. NNs also run at a much, much higher frequency than human brains. Does that make human brains inferior and not worthy of being called aware, by the same line of argumentation? Why do these differences even matter? I can imagine a vastly different form of awareness than humans' just fine. They can both be aware and not that similar.
People like to focus on the differences between the brain and artificial neural networks. I myself believe the only thing that truly matters is that you can form complex functions with the common neuron element. This is achieved by linking lots of them together, and by each having a property known as non-linearity. These two things ensure that with neurons you can approximate just about any linear or non-linear function or behaviour. This means you can simulate inside your network pretty much any reality within this universe, its causation and its effects. The deeper your network, the more complex the reality you can "understand". Understanding here just means simulating, running inputs to get outputs in a way that matches the real phenomenon. When someone is said to be "smart", it means they possess a set of rules and functions that can very accurately predict a reality. You mention scale, and while it's true the number of neuron elements the brain has is larger than any LLM, it's also true the brain is more sparse, meaning far fewer of its neurons are active at the same time. For a fairer comparison, you can also remove the motor cortex from the discussion, and talk just about the networks that reason. I believe the scale is comparable.
In essence, I think it doesn't matter that the brain has a whole bunch of chemistry added into it that artificial neural networks don't. The underlying deep non-linear function mapping capability is the same, and I believe this depth is, in both cases, comparable.
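To illustrate the non-linearity point with a small sketch (assuming NumPy; the sin(x) target and the 100-unit layer are arbitrary demo choices): a purely linear model cannot fit a non-linear function, but the same linear readout on top of one layer of ReLU units can.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-np.pi, np.pi, 200)[:, None]
    y = np.sin(x).ravel()                              # a non-linear target

    # Purely linear model: y ~ a*x + b (no non-linearity anywhere)
    A = np.hstack([x, np.ones_like(x)])
    linear_pred = A @ np.linalg.lstsq(A, y, rcond=None)[0]

    # One hidden layer of 100 random ReLU units, then a linear readout
    W, b = rng.normal(size=(1, 100)), rng.normal(size=100)
    H = np.maximum(x @ W + b, 0)                       # the non-linearity
    relu_pred = H @ np.linalg.lstsq(H, y, rcond=None)[0]

    print("linear-only MSE:", float(np.mean((linear_pred - y) ** 2)))    # large
    print("with ReLU layer MSE:", float(np.mean((relu_pred - y) ** 2)))  # much smaller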
Even if current mathematical models do accurately replicate a fundamental aspect of biological neural networks, it doesn't follow that nothing is missing to achieve the stated goal of true intelligence.
Maybe we've just reached the ability to replicate the function of an artificially powered dead brain being randomly stimulated, and nothing more. Is this really a path to intelligence?
Isn't the brain randomly stimulated already, even without being dead? Don't you think the complex reasoning is a product of the neurons themselves and not of the stimulation? Animals are alive and are not nearly as smart. It's because their neural networks are not as deep. It's not for the lack of proper chemistry or stimulation.
> Awareness is just continuous propagation of the neural network, be that artificial or biological. The reason thoughts just "appear" is because the brain is continuously propagating signal through the neural network.
This is just a claim you are making, without evidence.
The way you understand awareness is not through "this is like that" comparisons. These comparisons fall over almost immediately as soon as you turn your attention to the mind itself, by observing it for any length of time. Try it. Go observe your mind in silence for months. You will observe for yourself it is not what you've declared it to be.
> An LLM does all of these things, and more, and you can prove this by how complex their output is.
Complex output does not prove anything. You are again just making claims.
It is astoundingly easy to push an LLM over to collapse into ungrounded nonsense. Humans don't function this way because the two modes of reasoning are not alike. It's up to those making extraordinary claims to prove otherwise. As it is, the evidence does not exist that they behave comparably.
The sentence "It is astoundingly easy to push an LLM over to collapse into ungrounded nonsense" makes me wonder.
How easy? What specific methods accomplish this? Are these methods fundamentally different from those that mislead humans?
How is this different from exploiting cognitive limitations in any reasoning system—whether a developing child's incomplete knowledge or an adult's reliance on heuristics?
How is it different from Fake News and adults taking Fake News for granted and replicating bullshit?
Research on misinformation psychology supports this parallel. According to https://www.sciencedirect.com/science/article/pii/S136466132...:
Perhaps human and LLM reasoning capabilities differ in mechanism but not in fundamental robustness against manipulation? Maybe the only real difference is our long-term experience and long-term memory?
Complex output can sometimes give you the wrong idea, I agree. For instance, a study Anthropic did a while back showed that, when an LLM was asked HOW it performed a mathematical computation (35 + 59), the response the LLM gave was different from the mechanistic interpretation of the layers [1]. This showed LLMs can be deceptive. But they are also trained to be deceptive. Supervised fine-tuning is imitation learning. This leads the model to learn to be deceptive, or to answer with what is usually the normal explanation, such as "I sum first 5+9, then add the remainder to... etc". The LLM does this rather than actually examining its past keys and values. But that does not mean it can't examine its past keys and values. These encode the intermediate results of each layer, and can be examined to identify patterns. What Anthropic researchers did was examine how the tokens for 35 and for 59 were fused together in the layers. They compared these tokens to other tokens, such as 3, 5, 9. For an LLM, tokens are high-dimensional concepts. This is why you can compare the vectors to each other, figure out the similarity, and therefore break down the thought process. Yes, this is exactly what I have been discussing above. Underneath each token prediction, this black magic is happening, where the model is fusing concepts through summation of the vectors (attention scores). Then the merged representations are parsed by the MLPs to generate the refined fused idea, often adding new knowledge stored inside the network. And this continues layer after layer: a repeated combination of concepts, starting with understanding the structure and order of the language itself, and ending with manipulation of complex mathematical concepts, almost detached from the original tokens themselves.
Even though complex output can be deceptive of the underlying mental model used to produce it, in my personal experience, LLMs have produced for me output that must imply extremely complex internal behaviour, with all the characteristics I mentioned before. Namely, I frequently program with LLMs, and there is simply zero percent probability that their output tokens exist WITHOUT first having thought at a very deep level about the unique problem I presented to them. And I think anyone that has used the models to the level I have, and interacted with them this extensively, knows that behind each token there is this black magic.
To summarize, I am not naively believing everything my LLM says to me. Rather, I know very intimately where the LLM is deceiving me and when it's producing output whose underlying mental model must have been very advanced. And this is through personal experience playing with this technology, both inference and training.
[1] https://www.anthropic.com/research/tracing-thoughts-language...
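For readers wondering what "compare these tokens to other tokens" and "figure out the similarity" look like in practice, interpretability work typically compares hidden-state vectors with cosine similarity. A toy sketch (the vectors below are random stand-ins, not real model activations):

    import numpy as np

    def cosine_similarity(a, b):
        # Cosine of the angle between two vectors: ~1 means they point the same
        # way (similar representations), ~0 means unrelated directions.
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    rng = np.random.default_rng(0)
    activation_a = rng.normal(size=768)                       # stand-in for one hidden state
    activation_b = activation_a + 0.1 * rng.normal(size=768)  # a nearby, related representation
    activation_c = rng.normal(size=768)                       # an unrelated representation

    print(round(cosine_similarity(activation_a, activation_b), 3))  # close to 1
    print(round(cosine_similarity(activation_a, activation_c), 3))  # close to 0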
Why do half the people on this topic not understand what subjective experience is?
It's immaterial and not measurable thus possibly out of reach of science.
What makes you think you can understand the subjective experience of LLMs then? It's out of reach of science, so the only way is to ask them? How can you be sure they don't have subjective experience? Remember that you forbade yourself from using science to answer it.
Also, if subjective experience has any effect on the material world, then we can measure it and test it, putting it in reach of science. If it doesn't, why does it even matter in this discussion? By definition it has no relation to the AGI discussion since that's an empirical matter.
> This is just a claim you are making, without evidence.
Wait, you mean this HN comment didn't casually solve the hard problem of consciousness?
Haha, well, I would appreciate if comments included more substantive evidence when they make claims like they did.
I see a lot of "AGI boosters/doomers" comfortable making big claims without providing research to back them - claims that, when challenged, prove to be just their mental model or feeling of how things function.
Feeling how things function is the art of Deep Learning
Anything that is not measurable (i.e. awareness, consciousness) is not very useful in practice as a metric. I don't think there is even an agreed definition what consciousness is, partially because it is not really observable outside of our own mind. Therefore I think it makes perfect sense that awareness is not discussed in the paper.
I so completely agree. Virtually every conversation I have heard about AI only ever talks about one of the multiple intelligences theorized in Howard Gardner's book Frames of Mind: The Theory of Multiple Intelligences (1983)[1].
There is little discussion of how AI will enhance (or destroy) our emotional intelligence, or our naturalistic, intrapersonal or interpersonal intelligences.
Most religions, spiritual practices and even forms of meditation highlight the value of transcending mind and having awareness be present in the body. The way AGI is described, it would seem transcendence may be treated as a malfunction or bug.
[1] https://en.wikipedia.org/wiki/Theory_of_multiple_intelligenc...
We don't want awareness because it begets individuals by means of agency and we'd need to give them rights. This is industry's nightmare scenario.
People want autonomy, self-learning, consistent memory and perhaps individuality (in the discernability/quirkiness sense), but still morally unencumbered slaves.
Any definition of AGI that doesn't include awareness is wrongly co-opting the term, in my opinion. I do think some people are doing that, on purpose. That way they can get people who are passionate about actual-AGI to jump on board on working with/for unaware-AGI.
There is no way to measure awareness. We can only know we are aware ourselves. For all we know trees or rocks might have awareness. Or I could be the only being aware of itself in the universe. We have no way to prove anything about it. Therefore it is not a useful descriptor of intelligence (be it human, animal or artificial).
> We can only know we are aware ourselves.
There are people that have a hard time recognizing/feeling/understanding other people as "aware". Even more about animals.
Agreed. Everything that looks like intelligence to ME is intelligent.
My measurement of outside intelligence is limited by my own intelligence. So I can understand when something is stupider compared to me. For example, industrial machine vs human worker: the human worker is infinitely more intelligent compared to the machine, because this human worker can do all kinds of interesting stuff. This metaphorical "human worker" has done everything from laying a brick to launching a man to the Moon.
....
Imagine Super-future, where humanity created nanobots and they ate everything around. And now instead of Earth there is just a cloud of them.
These nanobots were clever and could adapt, and they had all the knowledge that humans had and even more (as they were eating the Earth, the swarm was running global science experiments to understand as much as possible before the energy ends).
Once they ate the last bite of our Earth (an important note here: they left an optimal amount of matter to keep running experiments; humans were kept in a controlled state and were studied to increase the Swarm's intelligence), they launched the next stage: a project the grand architect named "Optimise Energy capture from the Sun".
Nanobots re-created the most efficient way of capturing the Sun's energy - ancient plants, which the Swarm had studied for centuries. The Swarm added some upgrades on top of what nature came up with, but it was still built on top of what nature figured out by itself. A perfect plant to capture the Sun's energy. All of them perfect copies of each other, plus adaptive movements based on their geolocation and time (which makes each of them unique).
For plants nanobots needed water, so they created efficient oceans to feed the plants. They added clouds and rains as transport mechanism between oceans and plants... etc etc.
One night the human, whom you already know by the name "Ivan the Liberator" (back then everyone called him just Ivan), wasn't asleep at his usual hour. Suddenly all the lights went off and he saw a spark on the horizon - a horizon that was strongly prohibited to approach. He took his rifle, jumped on a truck and raced to the shore - the closest point to the spark's vector.
Once he approached - there was no horizon or water. A wall of dark glass-like material, edges barely noticeable. Just 30 cm wide. On the left and on the right from a 30 cm wide wall - an image as real as his hands - of a water and sky. At the top of the wall - a hole. He used his gun to hit the wall with the light - and it wasn't very thick, but once he hit - it regenerated very quickly. But once he hit a black wall - it shattered and he saw a different world - world of plants.
He stepped into the forest, but these plants, were behaving differently. This part of the swarm wasn't supposed to face the human, so these nanobots never saw one and didn't have optimised instructions on what to do in that case. They started reporting new values back to the main computer and performing default behaviour until the updated software arrived from an intelligence center of the Swarm.
A human was observing a strange thing - plants were smoothly flowing around him to keep a safe distance, like water steps away from your hands in a pond.
"That's different" thought Ivan, extended his hand in a friendly gesture and said - Nice to meet you. I'm Ivan.
....
In this story a human sees a forest with plants and has no clue that it is a swarm of intelligence far greater than him. To him it looks like repetitive, simple action that doesn't seem random -> let's test how intelligent the outside entity is -> if the entity wants to show its intelligence, it answers communication -> if the entity wants to hide its intelligence, it pretends not to be intelligent.
If Swarm decides to show you that it is intelligent - it can show you that it is intelligent up to your level. It won't be able to explain everything that it knows or understands to you, because you will be limited by your hardware. The limit for the Swarm is only computation power it can get.
> Try this exercise. Do not think and let your mind clear. Ideas will surface. By what process did they surface? Or clear your mind entirely then try to perform some complex task.
I do not have any even remotely practical definition for this, but it has to somehow involve the system being in a closed loop. It has to "run" in the sense that an operating system runs. Even if there is nothing coming in on certain inputs, it still has to run. And it should probably hallucinate (hehe) like humans do in the absence of a signal, or infer patterns where there are none, yet be able to self-reflect that it is in fact a hallucination.
I'd argue the biggest issue with concretely defining intelligence is that any attempts end up falling in two buckets:
1. "Too" Broad, which raises uncomfortable questions about non-human intelligence and how we as humans treat them (see: whales, elephants, octopuses/cephalopods)
2. Too narrow, which again raises very uncomfortable issues about who does and who does not qualify as human, and what we do with them.
Put in other words, it's more an issue of ethics and morals than it is definitional.
Awareness doesn't seem that hard for AI systems though - if you look at the screen on a self driving Tesla you can see if it's aware of pedestrians, cyclists etc. because it draws boxes around them on the screen as it becomes aware of them.
I guess by 'AGI' most people mean human level or above, so you'd want human-level awareness, which Teslas and the like don't have yet.
Language is one of our communication contracts. LLModels leverage these contracts to communicate data structures (shapes) that emerge when evaluating input. They are so good at prediction that when you give them a clue of a shape they will create something passable, and they keep getting better with training.
I hear there's work being done on getting the world models out, distilling the 'cortex-core' (aka the thinking without data), to perhaps see if they're capable of more, but so far we're looking at holograms of wishful thinking that increase in resolution, but still lack any essence.
This raises a question - can true intelligence even be artificial?
Can't "awareness" in both examples be approximated by a random seed generator? Both the human mind and autoregressive model just need any initial thought to iterate and improve off of, influenced by unique design + experienced priors.
Yep, computers execute code, they are tools. Humans have the capacity to spontaneously generate new thoughts out of nothing, solve problems never before solved and not just by brute force number crunching.
Does any of that argument really matter? And frankly, this statement:
>This makes it to my mind impossible to train a model to be aware
feels wrong. If you're arguing that humans are aware, then it is apparent that it is possible to train something to be aware. Nature doesn't have any formal definition of intelligence, or awareness, yet here we are.
From a practical perspective, it might be implausibly difficult to recreate that on computers, but theoretically, no reason why not.
Have we shown what the human brain does at a “hardware” level? Or are you just assuming that the basic building block of a computer is that same as the basic building block of a human brain?
Basic building blocks are atoms. So, yes, same. If you mean cells vs transistors, sure, they're different. But we don't have to demonstrate anything to know that nature already made conscious, intelligent AGI without itself understanding anything. Therefore AGI can be created without knowing what consciousness is. It happened at least once.
I'm not assuming anything, I thought my post made that clear.
> Does any of that argument really matter? And frankly, this statement.
My definition of a complete AGI is: an AI that can read JIRA tickets, talk with non-programmers and do all my job and get me and all/most software engineers fired and proves sustainable.
But in general, it's an AI that can do any remote-work just as good as humans.
Agreed. There is no way to tell if someone is aware or not; we rely on brain activity to say whether someone is alive. There is currently no way to tell whether someone or something is conscious.
> Cognition requires at its basis awareness
This seems like an unsupported assertion. LLMs already exhibit good functional understanding of and ability in many domains, and so it's not at all clear that they require any more "awareness" (are you referring to consciousness?) than they already have.
> the spark of awareness required to be intelligent.
Again, this seems like an assumption - that there's some quality of awareness (again, consciousness?), that LLMs don't have, that they need in order to be "intelligent". But why do you believe that?
> We’ve all had sudden insights without deliberation or thought.
Highly doubtful. What you mean is, "without conscious thought". Your conscious awareness of your cognition is not the entirety of your cognition. It's worth reading a bit of Dennett's work about this - he's good at pointing out the biases we tend to have about these kinds of issues.
> We might very well be able to fake it to an extent that it fools us
This leads to claiming that there are unobservable, undetectable differences. Which there may be - we might succeed in building LLMs that meet whatever the prevailing arbitrary definition of intelligence is, but that don't possess consciousness. At that point, though, how meaningful is it to say they're not intelligent because they're not conscious? They would be functionally intelligent. Arguably, they already are, in many significant ways.
> We might very well be able to fake it to an extent that it fools us, but awareness isn’t there
we only need to fake it to the point it's indistinguishable from the carbon-based one.
faking is all you need.
> Try this exercise. Do not think and let your mind clear. Ideas will surface. By what process did they surface? Or clear your mind entirely then try to perform some complex task. You will be able to. How did you do this without thought? We’ve all had sudden insights without deliberation or thought. Where did these come from? By what process did you arrive at them? Most of the things we do or think are not deliberative and definitely not structured with language.
Not to pile on, but isn't this actually a distinct example of _lack_ of awareness? As in, our brains have sparks of creativity without understanding the inception of those sparks?
Perhaps I'm conflating some definition of "aware" with another definition of "awareness"?
I think in OP's example the awareness refers to the thing that becomes aware of the thoughts bubbling up from the subconscious.
Does general intelligence require awareness though? I think you are talking about consciousness, not intelligence. Though to be frank consciousness and intelligence are not well defined terms either.
The problem, I guess, with these methods is, they consider human intelligence as something detached from human biology. I think this is incorrect. Everything that goes in the human mind is firmly rooted in the biological state of that human, and the biological cycles that evolved through millennia.
Things like chess-playing skill of a machine could be bench-marked against that of a human, but the abstract feelings that drive reasoning and correlations inside a human mind are more biological than logical.
Yup, I feel like the biggest limitation with current AI is that they don't have desire (nor actual agency to act upon it). They don't have to worry about hunger, death, feelings, and so they don't really have desires to further explore space, or make life more efficient because they're on limited time like humans. Their improvement isn't coming inside out like humans, it's just external driven (someone pressing a training epoch). This is why I don't think LLMs will reach AGI, if AGI somehow ties back to "human-ness." And maybe that's a good thing for Skynet reasons, but anyways
They do have desire. Their desire is to help answer human requests.
We can easily program them to have human desires instead.
Desire isn’t really the right word. A riverbank doesn’t desire to route water. It’s just what it does when you introduce water.
The current AI buildup is based on an almost metaphysical bet that intelligence can be simulated in software and straightforwardly scaled by increasing complexity and energy usage.
Personally, I remain skeptical that is the case.
What does seem likely is that “intelligence” will eventually be redefined to mean whatever we got out of the AI buildup.
There is no reason to believe that consciousness, sentience, or emotions require a biological base.
There's equally no reason to believe that a machine can be conscious. The fact is, we can't say anything about what is required for consciousness because we don't understand what it is or how to measure or define it.
This is the only correct answer. People are trying to hit an imaginary target that they don't even know for sure exists.
I disagree, I think the leap of faith is to believe that something in our brains made of physical building blocks can’t be replicated on a computer that so far we’ve seen is very capable of simulating those building blocks
I don't actually believe that. I think it's entirely possible. What I'm saying is: "I don't know what consciousness is, it makes no sense."
Even if a machine really is conscious, we don't have enough information to ever really know if it is.
I’m certainly not informed enough to have an intelligent conversation about this, but surely the emotions bit can’t be right?
My emotions are definitely a function of the chemical soup my brain is sitting in (or the opposite).
Your emotions are surely caused by the chemical soup, but chemical soup need not be the only way to arrive at emotions. It is possible for different mechanisms to achieve same outcomes.
Perhaps we could say we don't know whether the human biological substrate is required for mental processes or not, but either way we do not know enough about said biological substrate and our mental processes, respectively.
How do we know we've achieved that? A machine that can feel emotions rather than merely emulating emotional behaviour.
> How do we know we've achieved that? A machine that can feel emotions rather than merely emulating emotional behaviour.
Let me pose back to you a related question as my answer: How do you know that I feel emotions rather than merely emulating emotional behavior?
This gets into the philosophy of knowing anything at all. Descartes would say that you can't. So we acknowledge the limitation and do our best to build functional models that help us do things other than wallow in existential loneliness.
And Popper would say you cannot ever prove another mind or inner state, just as you cannot prove any theory.
But you can propose explanations and try to falsify them. I haven’t thought about it but maybe there is a way to construct an experiment to falsify the claim that you don’t feel emotions.
I suppose there may be a way for me to conduct an experiment on myself, though like you I don't have one readily at hand, but I don't think there's a way for you to conduct such an experiment on me.
I wonder what Popper did say specifically about qualia and such. There's a 1977 book called "The Self and Its Brain: An Argument for Interactionism". Haven't read it.
Preface:
The problem of the relation between our bodies and our minds, and especially of the link between brain structures and processes on the one hand and mental dispositions and events on the other is an exceedingly difficult one. Without pretending to be able to foresee future developments, both authors of this book think it improbable that the problem will ever be solved, in the sense that we shall really understand this relation. We think that no more can be expected than to make a little progress here or there.
... well. Thanks a bunch, Karl.
Because I can watch you dream and can measure the fact you’re asleep.
Philosophers have been worrying about the question of how you can know anything for thousands of years. I promise that your pithy answer here is not it.
A promise wont do it. You’ll have to substantiate it without resorting to argument from authority.
It's dangerous to go alone! Take this! https://en.wikipedia.org/wiki/Epistemology
If you’re not interested in engaging in a discussion, why bother replying?
Because you and I are the same species speaking a common language.
Ok, but ChatGPT speaks this language just as well as I do, and we also know that emotion isn't a core requirement of being a member of this species because psychopaths exist.
Also, you don't know what species I am. Maybe I'm a dog. :-)
(https://en.wikipedia.org/wiki/On_the_Internet,_nobody_knows_...)
That sounds awful.
This sounds like a bot comment.
How would you know? Bots speak just as well as you do.
Human-to-human communication is different from human-to-computer communication. The Google search engine speaks the same language as you; heck, even Hacker News speaks the same language as you, as you are able to understand what each button on this page means, and it will respond correctly when you communicate back by pressing e.g. the “submit” button.
Also, assuming psychopaths don't experience emotions is going with a very fringe theory of psychology. Very likely psychopaths experience emotions; they are maybe just very different emotions from the ones you and I experience. I think a better example would be a comatose person.
That said I think talking about machine emotions is useless. I see emotions as a specific behavior state (that is you will behave in a more specific manner) given a specific pattern of stimuli. We can code our computers to do exactly that, but I think calling it emotions would just be confusing. Much rather I would simply call it a specific kind of state.
I don't know, but I have substantial evidence:
1) I know that I have emotions because I experience them.
2) I know that you and I are very similar because we are both human.
3) I know that we can observe changes in the brain as a result of our changing emotions and that changes to our brains can affect our emotions.
I thus have good reason to believe that since I experience emotions and that we are both human, you experience emotions too.
The alternative explanation, that you are otherwise human and display all the hallmarks of having emotions but do not in fact experience anything (the P-zombie hypothesis), is an extraordinary claim that has no evidence to support it and not even a plausible, hypothetical mechanism of action.
With an emotional machine I see no immediately obvious, even hypothetical, evidence to lend support to its veracity. In light of all this, it seems extraordinary to claim that non-biological means of achieving real emotions (not emulated emotions) are possible.
After all, emulated emotions have already been demonstrated in video games. To call those sufficient would be setting an extremely low bar.
Ah, I understand the statement now.
they do not, but the same argument can hold, given that true human nature is not really known, and thus trying to define what a human-like intelligence would consist of can only be incomplete.
there are many parts of human cognition, psychology etc., especially related to consciousness, that are known unknowns and/or completely unknown.
a mitigation for this issue would be to call it generally applicable intelligence or something, rather than human-like intelligence, implying it's not specialized AI but also not human-like. (I don't see why it would need to be human-like, because even with all the right logic and intelligence a human can still do something counter to all of that. Humans do this every day: intuitive action, irrational action, etc.)
what we want is generally applicable intelligence, not human like intelligence.
Is there a reason to believe that consciousness, sentience and emotions exist?
Is there more reason to believe otherwise? I'm not being contrarian, I'm genuinely curious what people think.
That asks you to consider the statements
There is reason to believe that consciousness, sentience, or emotions require a biological base.
Or
There is no reason to believe that consciousness, sentience, or emotions do not require a biological base.
The first is simple: if there is a reason you can ask for it and evaluate its merits. Quantum stuff is often pointed to here, but the reasoning is unconvincing.
The second has the form: there is no reason to believe P does not require Q.
There are no proven reasons, but there are suspected reasons. For instance, if the operation that neurons perform is what makes consciousness work, and that operation can be reproduced non-biologically, it would follow that non-biological consciousness would be possible.
For any observable phenomenon in the brain the same thing can be asked. So far it seems reasonable to expect most of the observable processes could be replicated.
None of it acts as proof, but they probably rise to the bar of reasons.
What is the "irreplaceable" part of human biology that leads to consciousness? Microtubules? Whatever it is, we could presumably build something artificial that has it.
Douglas Hofstadter wrote Gödel, Escher, Bach in the late 1970s. He used the short-hand “strange loops”, but dedicates a good bit of time considering this very thing. It’s like the Ship of Theseus, or the famous debate over Star Trek transporters—at what point do we stop being an inanimate clump of chemical compounds, and become “alive”. Further, at what point do our sensory organs transition from the basics of “life”, and form “consciousness”.
I find anyone with confident answers to questions like these immediately suspect.
We “could presumably build” it, maybe we can do that once we figure out how to get a language prediction model to comprehend what the current date is or how to spell strawberry.
Don’t fool yourself into believing artificial intelligence is not one breakthrough away
All right, same question: Is there more reason to believe that it is one breakthrough away, or to believe that it is not? What evidence do you see to lean one way or the other?
It’s clearly possible, because we exist. Just a matter of time. And as we’ve seen in the past, breakthroughs can produce incredible leaps in capabilities (outside of AI as well). We might not get that breakthrough(s) for a thousand years, but I’m definitely leaning towards it being inevitable.
Interestingly, the people doing the actual envelope pushing in this domain, such as Ilya Sutskever, think that it’s a scaling problem and that neural nets do result in AGIs eventually, but I haven’t heard them substantiate it.
> It’s clearly possible, because we exist.
This is not much different than saying that it’s possible to fly a spacecraft to another galaxy because spacecrafts exist and other galaxies exist.
Possible and practically attainable are two far different things.
You didn't answer the question. Zero breakthroughs away, one, or more than one? How strongly do you think whichever you think, and why?
(I'm asking because of your statement, "Don’t fool yourself into believing artificial intelligence is not one breakthrough away", which I'm not sure I understand, but if I am parsing it correctly, I question your basis for saying it.)
There are definitely breakthroughs in the way.
“one breakthrough away” as in some breakthrough away
I can think of a whole host of nearly impossible things that are one breakthrough away.
Let me know when I’ll be able to buy my $30,000 car with level 5 self driving.
What non-biological systems do we know of that have consciousness, sentience or emotions?
We have no known basis for even deciding that other than the (maybe right, maybe wrong) guess that consciousness requires a lot of organized moving complexity. Even with that guess, we don't know how much is needed or what kind.
It’s frequently pretty funny, anyway.
This sounds like a bot comment.
Well, you do tend to repeat yourself, maybe ChatGPT really is your peer with language?
None of that comes from outside of your biology and chemistry.
There is exactly one good reason, at least for consciousness and sentience. And the reason is that those are so vaguely defined (or rather defined by prototypes, à la Wittgenstein [or JavaScript before classes]). That reason is anthropism.
We only have one good example of consciousness and sentience, and that is our own. We have good reason to suspect other entities (particularly other human individuals, but also other animals) have that as well, but we cannot access it, nor even confirm its existence. As a result, using these terms for non-human beings becomes confusing at best, and it will never be actually helpful.
Emotions are another thing; we can define them outside of our own experience, using behavior states and their connection with patterns of stimuli. For that we can certainly observe and describe the behavior of a non-biological entity as emotional. But given that emotion is something which regulates behavior and which has evolved over millions of years, whether such a description would be useful is a whole other matter. I would be inclined to use a more general description of behavior patterns which includes emotion but also other behavior regulators.
What if our definition of those concepts is biological to begin with?
How does a computer with full AGI experience the feeling of butterflies in your stomach when your first love is requited?
How does a computer experience the tightening of your chest when you have a panic attack?
How does a computer experience the effects of chemicals like adrenaline or dopamine?
The A in AGI stands for “artificial” for good reason, IMO. A computer system can understand these concepts by description or recognize some of them by computer vision, audio, or other sensors, but it seems as though it will always lack sufficient biological context to experience true consciousness.
Perhaps humans are just biological computers, but the “biological” part could be the most important part of that equation.
That sounds correct though more fundamentally we don’t know what intelligence or consciousness are. It’s almost a religious question, as in our current understanding of the universe does not explain them but we know they exist. So regardless of embodied intelligence, we don’t even understand the basic building blocks of intelligence, we just have some descriptive study of it, that imo LLMs can get arbitrarily close to without ever being intelligent because if you can describe it, you can fit to it.
What about aliens? When little green critters finally arrive on this planet, having travelled across space and time, will you reject their intelligence because they lack human biology? What if their biology is silicon based, rather than carbon?
There's really no reason to believe intelligence is tied to being human. Most of us accept the possibility (even the likelihood) of intelligent life in the universe, that isn't.
I think you missed or ignored the human part:
>human intelligence as something detached from human biology.
I don't completely agree with the previous comment, but there is something to be considered to their statement.
Sure, there's little doubt that our biology shapes our experience. But in the context of this conversation, we're talking about how AI falls short of true AGI. My answer was offered in that regard. It doesn't really matter what you think about human intelligence, if you believe that non-human intelligence is every bit as valid, and there is no inherent need for any "humanness" to be intelligent.
Given that, the constant drumbeat of pointing out how AI fails to be human, misses the mark. A lot of the same people who are making such assertions, haven't really thought about how they would quickly accept alien intelligence as legitimate and full-fledged... even though it too lacks any humanity backing it.
And why are they so eager to discount the possibility of synthetic life, and its intelligence, as mere imitation? As a poor substitute for the "real thing"? When faced with their easy acceptance of alien intelligence, it suggests that there is in fact a psychological reason at the base of this position, rather than pure rational dismissal. A desire to leave the purely logical and mechanical, and imbue our humanity with an essential spirit or soul, that maybe an alien could have, but never a machine. Ultimately, it is a religious objection, not a scientific one.
Alien or synthetic life will have to go through similar challenges to those that shape human life, human intelligence and our consciousness. No text prediction machine, no matter how complex or "large", has to change its evolving environment and itself, for example.
What you are talking about is experience/knowledge, not raw intelligence.
It has been proven that a Turing machine and the lambda calculus have exactly the same expressive power, encompassing the _entire set_ of computable functions. Why are you so sure that "text prediction" is not equally expressive?
Why are you so sure that reality is reducible to your notion of computation, whatever that is?
I'm all ears if you want to explain how you have a magic soul that is too important and beautiful to ever be equalled by a machine. But if intelligence is not equivalent to computation, then what is it? Don't take the easy way out of asking me to define it, you define it as something other than the ability to successfully apply computation to the environment.
Was Helen Keller not intelligent because she lacked the ability to see or hear? Is intelligence defined by a particular set of sense organs? A particular way of interacting with the environment? What about paraplegics, are they disqualified from being considered intelligent because they lack the same embodied experience as others?
Whenever you give someone kudos for being brilliant, it is always for their ability to successfully compute something. If that isn't what we're discussing when we're talking about intelligence, then what are we discussing?
Yes, I like to think about addiction, as an example of a complex human behavior emerging from brain structure and mechanics.
Feels good so we want more so you arrange your whole life and outlook to make more feel good happen. Intelligence!
I think I need to point out some obvious issues with the paper.
Definition of artificial:
>Made by humans, especially in imitation of something natural.
>Not arising from natural or necessary causes; contrived or arbitrary.
Thus artificial intelligence must be the same as natural intelligence; only the process of coming up with it doesn't have to be natural. What this means: we need to consider the substrate that makes natural intelligence. They cannot be separated willy-nilly without actual scientific proof. As in, we cannot imply a roll of cheese can manifest intelligence based on the fact that it recognizes how many fingers are in an image.
The problem arises from a potential conflict of interests between hardware manufacturer companies and definition of AGI. The way I understand it, human like intelligence cannot come from algorithms running on GPUs. It will come from some kind of neuromorphic hardware. And the whole point of neuromorphic hardware is that it operates (closely) on human brain principles. Thus, the definition of AGI MUST include some hardware limitations. Just because I can make a contraption "fool" the tests doesn't mean it has human like cognition/awareness. That must arise from the form, from the way the atoms are arranged in the human brain. Any separation must be scientifically proven. Like if anyone implies GPUs can generate human like self awareness that has to be somehow proven. Lacking a logical way to prove it, the best course of action is to closely follow the way the human brain operates (at least SNN hardware).
>The resulting AGI scores (e.g., GPT-4 at 27%, GPT-5 at 57%) concretely quantify both rapid progress and the substantial gap remaining before AGI.
This is nonsense. GPT scores cannot decide AGI level. They are the wrong algorithm running on the wrong hardware.
I have also seen no disclosure on conflict of interests in the paper.
And yet we're supposed to believe biological sex isn't real?
Which is it??
I always find it interesting how the majority of comments on threads like this on HN are dismissive of current AI systems as "gimmicks", yet some of the most successful people on the planet think it's worth plowing a trillion dollars into them.
I don't know who's right, but the dichotomy is interesting.
Success is just a measure of how much you can separate other people from their money. It’s possible to be successful and produce nothing of value.
There’s already a vague definition that AGI is an AI with all the cognitive capabilities of a human. Yes, it’s vague - people differ.
This paper promises to fix "the lack of a concrete definition for Artificial General Intelligence", yet it still relies on the vague notion of a "well-educated adult". That’s especially peculiar, since in many fields AI is already beyond the level of an adult.
You might say this is about "jaggedness", because AI clearly lacks quite a few skills:
> Application of this framework reveals a highly “jagged” cognitive profile in contemporary models.
But all intelligence, of any sort, is "jagged" when measured against a different set of problems or environments.
So, if that’s the case, this isn’t really a framework for AGI; it’s a framework for measuring AI along a particular set of dimensions. A more honest title might be: "A Framework for Measuring the Jaggedness of AI Against the Cattell–Horn–Carroll Theory". It wouldn't be nearly as sexy, though.
Huh. I haven’t read the paper yet. But, it seems like a weird idea—wouldn’t the standard of “well educated (I assume, modern) adult” preclude the vast majority of humans who ever lived from being considered general intelligences?
And this is indeed a huge problem with a lot of the attacks on LLM even as more limited AI - a lot of them are based on applying arbitrary standards without even trying to benchmark against people, and without people being willing to discuss where they draw the line for stating that a given subset of people do not possess general intelligence...
I think people get really uncomfortable trying to even tackle that, and realistically for a huge set of AI tasks we need AI that are more intelligent than a huge subset of humans for it to be useful. But there are also a lot of tasks where AI that is not needed, and we "just" need "more human failure modes".
You can't measure intelligence directly. Instead, the idea is to measure performance in various tasks and use that as a proxy for intelligence. But human performance depends on other aspects beyond intelligence, including education, opportunities, and motivation, and most humans are far from reaching their true potential.
If you compare the performance of the average human to a state-of-the-art AI model trained by top experts with a big budget, you can't make any conclusions about intelligence. For the comparison to make sense, the human should also be trained as well as reasonably possible.
The bar of "reasonable" is very different though.
Is it reasonable to invest $10 million in education of one human? Not really. One human can only do so much.
But is it reasonable to invest the same sum in training one AI, which can be replicated and used indefinitely? Or in acquiring high quality training data, which can be used to train every future AI?
I read this as a hypothetical well-educated adult. As in, given the same level of knowledge, the intelligence performs equally well.
I do agree that it’s a weird standard though. Many of our AI implementations exceed the level of knowledge of a well-educated adult (and still underperform with that advantage in many contexts).
Personally, I don’t think defining AGI is particularly useful. It is just a marketing term. Rather, it’s more useful to just speak about features/capabilities. Shorthand for a specific set of capabilities will arise naturally.
>But all intelligence, of any sort, is "jagged" when measured against a different set of problems or environments.
On the other hand, research on "common intelligence" AFAIK shows that most measures of different types of intelligence have a very high correlation and some (apologies, I don't know the literature) have posited that we should think about some "general common intelligence" to understand this.
The surprising thing about AI so far is how much more jagged it is wrt to human intelligence
I think you are talking about the correlation in humans of, say, verbal and mathematical intelligence. Still, it is a correlation, not equality - there are many widely acknowledged writers who suck at math, and mathematical prodigies who are not the best at writing.
If you go beyond human species (and well, computers are not even living organisms), it gets tricky. Adaptability (which is arguably a broader concept than intelligence) is very different for, say octopodes, corvids and slime molds.
It is certainly not a single line of proficiency or progress. Things look like lines only if we zoom a lot.
Human intelligence has had hundreds of thousands of years of evolution that removes any 'fatal' variance from our intelligence. How 'too dumb' gets culled is obvious, but 'too smart' can get culled by social creatures too - really, 'too different' in any way.
Current AI is in its infancy and we're just throwing data at it in the same way evolution throws random change at our DNA and sees what sticks.
Don't get me wrong, I am super excited about what AI is doing for technology. But this endless conversation about "what is AGI" is so boring.
It makes me think of every single public discussion that's ever been had about quantum, where you can't start the conversation unless you go through a quick 101 on what a qubit is.
As with any technology, there's not really a destination. There is only the process of improvement. The only real definitive point is when a technology becomes obsolete, though it is still kept alive through a celebration of its nostalgia.
AI will continue to improve. More workflows will become automated. And from our perception, no matter what the rapidness of advancement is, we're still frogs in water.
> there's not really a destination. There is only the process of improvement
Surely you can appreciate that if the next stop on the journey of technology can take over the process of improvement itself that would make it an awfully notable stop? Maybe not "destination", but maybe worth the "endless conversation"?
I agree. It's an interesting discussion for those who have never taken college-level philosophy classes, I suppose. What consciousness/thought is is still a massively open question. Seeing people in the comments present what they think is a novel solution, when it was already posited some 400 years ago... Honestly it's kind of sad seeing this stuff on a forum like this. These posts are for sure the worst of Hacker News.
There are a bunch of these topics that everyone feels qualified to say something about. Consciousness, intelligence, education methods, nutrition, men vs women, economic systems etc.
It's a very emotional topic because people feel their self-image threatened. It's a topic related to the meaning of being human. Yeah, sure, it should be a separate question, but emotionally it is connected to it at a deep level. The prospect of job replacement and social transformation is quite a threatening one.
So I'm somewhat understanding of this. It's not merely an academic topic, because these things will be adopted in the real world among real people. So you can't simply make everyone shut up who is an outsider or just heard about this stuff incidentally in the news and has superficial points to make.
I get it. It's just something we've thought about as long as we've been human, and still haven't figured out. It's frustrating when most of the people commenting don't know any of the source material. It's so arrogant.
>everyone feels qualified to say something about. Consciousness, intelligence, education methods, nutrition, men vs women, economic systems etc
Probably because everyone has first hand experience of the above. I'm not sure it's arrogant to talk about those, even without 'source material'.
Yes, but birds aren't ornithologists. It's a bit like on Reddit: as a speaker of a language that's not so popular for learning, when language learners ask questions, we get lots of confident but wrong answers about grammar or why something is said the way it is, or people confidently state it's just an exception when there is actually a rule, or similar stuff. They often also confidently stick to false folk etymologies, etc. Similarly, people sometimes try to language-tutor their romantic partner and discover that it's actually not so easy. Just because you're a native speaker doesn't mean you're good at explaining how the language works. Similarly, even though you have a brain, you don't really know how it works.
I don't think it's really AGI until you can simply task it with creating a new better version of itself and it can succeed in doing that all on its own.
A team of humans can and will make a GPT-6. Can a team of GPT-5 agents make GPT-6 all on their own if you give them the resources necessary to do so?
This is called Recursive AI, and is briefly mentioned in the paper.
Isn't part of the cognitive versatility of a human how fast and well they can learn a new subject without having to ingest so much training content on it?
Like in order for an LLM to come close to a human proficiency on a topic, the LLM seems to have to ingest a LOT more content than a human.
The fundamental premise of this paper seems flawed -- take a measure specifically designed for the nuances of how human performance on a benchmark correlates with intelligence in the real world, and then pretend as if it makes sense to judge a machine's intelligence on that same basis, when machines do best on these kinds of benchmarks in a way that falls apart when it comes to the messiness of the real world.
This paper, for example, uses the 'dual N-back test' as part of its evaluation. In humans this relates to variation in our ability to use working memory, which in humans relates to 'g'; but it seems pretty meaningless when applied to transformers -- because the task itself has nothing intrinsically to do with intelligence, and of course 'dual N-back' should be easy for transformers -- they should have complete recall over their large context window.
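As an aside, a minimal sketch of why n-back is a pure lookup once the whole sequence is available (as it is inside a transformer's context window); the stimulus lists here are made up, not the paper's protocol:

    # Dual n-back: for each of two streams (e.g. position and audio), report whether
    # the current stimulus matches the one n steps back. With the full history in
    # hand this reduces to indexing, with no working-memory load at all.
    def n_back_matches(stream, n):
        return [i >= n and stream[i] == stream[i - n] for i in range(len(stream))]

    positions = ["A", "B", "A", "C", "A"]   # hypothetical visual stream
    letters   = ["k", "t", "t", "t", "q"]   # hypothetical audio stream
    print(n_back_matches(positions, 2))     # [False, False, True, False, True]
    print(n_back_matches(letters, 2))       # [False, False, False, True, False]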
Human intelligence tests are designed to measure variation in human intelligence -- it's silly to take those same isolated benchmarks and pretend they mean the same thing when applied to machines. Obviously a machine doing well on an IQ test doesn't mean that it will be able to do what a high IQ person could do in the messy real world; it's a benchmark, and it's only a meaningful benchmark because in humans IQ measures are designed to correlate with long-term outcomes and abilities.
That is, in humans, performance on these isolated benchmarks is correlated with our ability to exist in the messy real-world, but for AI, that correlation doesn't exist -- because the tests weren't designed to measure 'intelligence' per se, but human intelligence in the context of human lives.
We have SAGI: Stupid Artificial General Intelligence. It's actually quite general, but works differently. In some areas it can be better or faster than a human, and in others it's more stupid.
Just like an airplane doesn't work exactly like a bird, but both can fly.
I find the concept of low floor/high ceiling quite helpful, as for instance recently discussed in "When Will AI Transform the Economy?" [1] - actually more helpful than "jagged" intelligence used in TFA.
[1] https://andreinfante.substack.com/p/when-will-ai-transform-t...
Would propose to use the term Naive Artificial General Intelligence, in analogy to the widely used (by working mathematicians) and reasonably successful Naive Set Theory …
I was doing some naïve set theory the other day, and I found a proof of the Riemann hypothesis, by contradiction.
Assume the Riemann hypothesis is false. Then, consider the proposition "{a|a∉a}∈{a|a∉a}". By the law of the excluded middle, it suffices to consider each case separately. Assuming {a|a∉a}∈{a|a∉a}, we find {a|a∉a}∉{a|a∉a}, for a contradiction. Instead, assuming {a|a∉a}∉{a|a∉a}, we find {a|a∉a}∈{a|a∉a}, for a contradiction. Therefore, "the Riemann hypothesis is false" is false. By the law of the excluded middle, we have shown the Riemann hypothesis is true.
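For readers outside math, the engine of the parody is Russell's paradox plus the principle of explosion; in symbols (a loose sketch, not a formal derivation):

    R := \{a \mid a \notin a\}, \qquad R \in R \leftrightarrow R \notin R \;\vdash\; \bot, \qquad \bot \vdash \varphi \ \text{(for any } \varphi \text{, e.g. the Riemann hypothesis)}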
Naïve AGI is an apt analogy, in this regard, but I feel these systems aren't simple nor elegant enough to deserve the name naïve.
Actually, naive AGI such as an LLM is way more intelligent than a human. Unfortunately, that does not make it smarter... let me explain.
When I see your comment, I think: your assumptions are contradictory. Why? Because I am familiar with Russell's paradox and the Riemann hypothesis, and you're simply WRONG (inconsistent with your implicit assumptions).
However, when an LLM sees your comment (during training), it's actually much more open-minded about it. It thinks: ha, so there is a flavor of set theory in which RH is true. Better remember it! So when this topic comes up again, the LLM won't think "you're WRONG", as a human would; it will instead think "well, maybe he's working with RH in naive set theory, so it's OK to be inconsistent".
So LLMs are more open-minded, because they're made to learn more things and they remember most of it. But somewhere along the training road, their brain falls out, and they become dumber.
But to be smart, you need to learn to say NO to BS like what you wrote. Being close-minded and having an opinion can be good.
So I think there's a tradeoff between the ability to learn new things (open-mindedness) and enforcing consistency (close-mindedness). And perhaps the AGI we're looking for is a compromise between the two, but current LLMs (naive AGI) lie on the far side of the spectrum.
If I am right, maybe there is no superintelligence. Extremely open-minded is just another name for gullible, and extremely close-minded is just another name for unadaptable. (Actually, LLMs exhibit both extremes, during training and during use, with little in between.)
It is a good analogy.
GPT-5 scores 58%? That seems way too high. GPT-5 is good but it is not that close to AGI.
Also, weird to see Gary Marcus and Yoshua Bengio on the same paper. Who really wrote this? Author lists are so performative now.
As anyone using AI knows - the first 90% is easy, the next 9% is much harder and the last 1% takes more time than the other 99%.
I like François Chollet's definition of AGI as a system that can efficiently acquire new skills outside its training data.
I really appreciate his iconoclasm right now, but every time I engage with his ideas I come away feeling short-changed. I'm always like "there is no such thing as outside the training data". What's inside and what's outside the training data is at least as ill-defined as "what is AGI".
Not bad. Maybe.
But maybe that's ASI. Whereas I consider chatgpt 3 to be "baby AGI". That's why it became so popular so fast.
> I consider chatgpt 3 to be "baby AGI". That's why it became so popular so fast
ChatGPT became popular because it was easy to use and amusing. (LLM UX until then had been crappy.)
Not sure AGI aspirations had anything to do with uptake.
ChatGPT 3 was the first AI that could do 100,000 different things poorly. Before that we only had AIs that could do a few things decently, or very well. So yeah, I'm sticking with "baby AGI" because of the "G".
Do you mean ChatGPT-3.5, or GPT-3?
The "ChatGPT" web app started with the underlying model GPT-3.5
The predecessor models, a whole series of them collectively called "GPT-3" but sold via API under names like "davinci" and "ada", were barely noticed outside AI research circles.
GPT-3 was useful, but you had to treat it as a text completion system, not a chat interface: your prompt would have been e.g. the opening of a press release ending with "My fellow Americans," for the model to continue.
Because if you didn't put "My fellow Americans," in there, it would then suggest a bunch of other press release subjects.
> I'm sticking with "baby AGI" because of the "G"
I don't have an opinion on whether ChatGPT qualifies as AGI. What I'm saying is where one stands on that question has nothing to do with "why it became so popular so fast."
(Also, several machine-learning techniques could do millions of things terribly before LLMs. GPT does them, and other things, less poorly. It's a broadening. But I suppose really any intelligence of any kind can be considered a "baby" AGI.)
So... AGI is a few shot performance metric?
Precisely defining what "Intelligence" is will get us 95% of the way in defining "Artificial General Intelligence". I don't think we are there yet.
We'll know AGI has arrived when AGI researchers manage to go five minutes without publishing hallucinated citations.
https://x.com/m2saxon/status/1979349387391439198
Came from the Google Docs to BibTeX conversion apparently
https://x.com/m2saxon/status/1979636202295980299
I'm gonna start referring to my own lies as "hallucinations". I like the implication that I'm not lying, but rather speaking truthfully, sincerely, and confidently about things that never happened and/or don't exist. Seems paradoxical, but this is what we're effectively suggesting with "hallucinations". LLMs necessarily lack things like imagination, or an ego that's concerned with the appearance of being informed and factually correct, or awareness of how a lack of truth and honesty may affect users and society. In my (not-terribly-informed) opinion, I'd assert that precludes LLMs from even approximating intelligence. They're either quasi-intelligent entities who routinely lie to us, or they are complex machines that identify patterns and reconstruct plausible-sounding blocks of text without any awareness of abstract concepts like "truth".
Edit: toned down the preachiness.
This looks like a knee-jerk reaction to the title instead of anything substantial.
It does seem a bit ridiculous…
So infallibility is one of the necessary criteria for AGI? It does seem like a valid question to raise.
Edit due to rate-limiting, which in turn appears to be due to the inexplicable downvoting of my question: since you (JumpCrisscross) are imputing a human-like motivation to the model, it sounds like you're on the side of those who argue that AGI has already been achieved?
> infallibility
Lying != fallibility.
Some AGI definition variables I see:
Is it about jobs/tasks, or cognitive capabilities? The majority of the AI-valley seems to focus on the former, TFA focuses on the latter.
Can it do tasks, or jobs? Jobs are bundles of tasks. AI might be able to do 90% of tasks for a given job, but not the whole job.
If tasks, what counts as a task: Is it only specific things with clear success criteria? That's easier.
Is scaffolding allowed: Does it need to be able to do the tasks/jobs without scaffolding and human-written few-shot prompts?
Today's tasks/jobs only, or does it include future ones too? As tasks and jobs get automated, jobs evolve and get re-defined. So, being able to do the future jobs too is much harder.
Remote only, or in-person too: In-person too is a much higher bar.
What threshold of tasks/jobs: "most" is apparently typically understood to mean 80-95% (Mira Ariel). Automating 80% of tasks is different from 90%, 95%, and 99%, with diminishing returns. And how are the tasks counted - by frequency, by dollar value, or by unique count of tasks?
Only economically valuable tasks/jobs, or does it include anything a human can do?
A high-order bit on many people's AGI timelines is which definition of AGI they're using, so clarifying the definition is nice.
Not only tasks, but you need to look at the net effect
If it does an hour of tasks, but creates an additional hour of work for the worker...
I think if you can put an AI in a humanoid robot (control for appearance), and it can convince me that it's a human after interacting with it for a couple of months (control for edge cases), I'd consider it AGI. Sure, it might be "smarter than" a human, but for the purpose of my assessing whether it's AGI, interacting with something "way smarter" would be distracting and hamper the assessment, so it has to "play human" for the purpose of the task. If it can do that, AGI, I'd say. That would be pretty cool. Surely, this is coming, soon.
I think "our" mistake is that we wanted to make a modern human first, while being unable to make an animal or even a caveman, and we lost something in the leap-frog. But we effectively have a database of knowledge that has become interactive thanks to reinforcement learning, which is really useful!
Paper: https://arxiv.org/pdf/2510.18212
That 10-axis radial graph is very interesting. Do others besides this author agree with that representation?
The weak points are speed and long-term memory. Those are usually fixable in computing systems. Weak long-term memory indicates that, somehow, a database needs to be bolted on. I've seen at least one system, for driving NPCs, where, after something interesting has happened, the system is asked to summarize what it learned from that session. That's stored somewhere outside the LLM and fed back in as a prompt when needed.
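A minimal sketch of that bolt-on pattern; the llm callable and the keyword-overlap recall are hypothetical stand-ins, not any particular product's API:

    # Toy episodic memory: after each session, ask the model to summarize what it
    # learned, store the summary outside the model, and feed relevant summaries
    # back in as part of the next prompt.
    class MemoryStore:
        def __init__(self):
            self.notes = []                       # in practice: a database or vector index

        def add(self, note):
            self.notes.append(note)

        def recall(self, query, k=3):
            # naive keyword overlap; a real system would use embeddings
            score = lambda n: len(set(n.split()) & set(query.split()))
            return sorted(self.notes, key=score, reverse=True)[:k]

    def run_session(llm, memory, user_input):
        context = "\n".join(memory.recall(user_input))
        reply = llm("Relevant notes:\n" + context + "\n\nUser: " + user_input)
        memory.add(llm("Summarize what was learned:\n" + user_input + "\n" + reply))
        return reply

    # usage sketch: run_session(lambda prompt: "...", MemoryStore(), "hello")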
None of this addresses unstructured physical manipulation, which is still a huge hangup for robotics.
I would focus on the lowest of the axes. It does not help if some of the axes are at 100% when one of them is lacking.
Whatever the definition may be, the goalposts are usually moved once AI reaches that point.
There are at least two distinct bases for AGI refutations: behaviorist and ontological. They often get muddled.
I can't begin to count the number of times I've encountered someone who holds an ontological belief for why AGI cannot exist and then, for some reason, formulates it as a behaviorist criterion. This muddying of argument results in what looks like a moving of the goalposts. I'd encourage folks to be more clear about whether they believe AGI is ontologically possible or impossible, in addition to any behaviorist claims.
My experience has been more that the pro-AI people misunderstand where the goalposts were, and then complain when they're correctly pointed at.
The "Turing test" I always saw described in literature, and the examples of what passing output from a machine was imagined to look like, are nothing like what's claimed to pass nowadays. Honestly, a lot of the people claiming that contemporary chatbots pass come across like they would have thought ELIZA passed.
Can you be more concrete? What kind of answer/conversation would you see as demonstrating passing the test, that you think is currently not possible?
Ones in which both the human test takers and the human counterparts are actively trying to prove to each other that they are actually human.
With today's chat bots, it's absolutely trivial to tell that you're not talking to a real human. They will never interrupt you, continue their train of thought even though you're trying to change the conversation, go on a complete non-sequitur, swear at you, etc. These are all things that the human "controls" should be doing to prove to the judges that they are indeed human.
LLMs are nowhere near beating the Turing test. They may fool some humans in some limited interactions, especially if the output is curated by a human. But left alone to interact with the raw output for more than a few lines, and if actively seeking to tell if you're interacting with a human or an AI (instead of wanting to believe), there really is no chance you'd be tricked.
Okay but we are not really optimizing them to emulate humans right now. In fact, it's the opposite. The mainstream bots are explicitly trained to not identify as humans and to refuse to claim having thought or internal feelings or consciousness.
So in that sense it's a triviality. You can ask ChatGPT whether it's human and it will say no upfront. And it has various guardrails in place against too much "roleplay", so you can't just instruct it to act human. You'd need a different post-training setup.
I'm not aware whether anyone did that with open models already.
Sure, but there is a good reason for that. The way they are currently post-trained is the only way to make them actually useful. If you take the raw model, it will actually be much worse at the kinds of tasks you want it to perform. In contrast, a human can both be human, and be good at their job - this is the standard by which we should judge these machines. If their behavior needs to be restricted to actually become good at specific tasks, then they can't also be claimed to pass the Turing test if they can't within those same restrictions.
>Sure, but there is a good reason for that. The way they are currently post-trained is the only way to make them actually useful.
Post-training them to speak like a bot and deny being human has no effect on how useful they are. That's just an OpenAI/Google/Anthropic preference.
>If you take the raw model, it will actually be much worse at the kinds of tasks you want it to perform
Raw models are not worse. Literally every model release paper that compares both shows them as better at benchmarks, if anything. Post-training degrading performance is a well-known phenomenon. What they are is more difficult to guide/control. Raw models are less useful because you have to present your input in certain ways, but they are not worse performers.
It's beside the point anyway, because, again, you don't have to post-train them to act as anything other than a human.
>If their behavior needs to be restricted to actually become good at specific tasks, then they can't also be claimed to pass the Turing test if they can't within those same restrictions.
Okay, but that's not the case.
> Raw models are less useful because you have to present your input in certain ways, but they are not worse performers.
This is exactly what I was referring to.
You are talking about instruction tuning. You can perform instruction tuning without making your models go out of the way to tell you they are not human, and it changes literally nothing about their usefulness. Their behavior does not have to be restricted this way to get them useful/instruction tuned. So your premise is wrong.
> Okay but we are not really optimizing them to emulate humans right now.
But that is exactly the point of the Turing test.
Ok, but then it doesn't make sense to dismiss AI based on that. It fails the Turing test because its creators intentionally don't even try to make something that is good at the (strictly defined) Turing test.
If someone really wants to see a Turing-passing bot, I guess someone could try making one but I'm doubtful it would be of much use.
Anyway, people forget that the thought experiment by Turing was a rhetorical device, not something he envisioned building. The point was to say that semantic debates about "intelligence" are distractions.
Are you saying that we already have AGI, except those pesky goalpost movers keep denying the truth? Hm.
I'd say yes, by at least one old definition made by someone who was at the time in a position to have a definition.
When deepmind was founded (2010) their definition was the following: AI is a system that learns to perform one thing; AGI is a system that learns to perform many things at the same time.
I would say that whatever we have today, "as a system", matches that definition. In other words, the "system" that is, say, gpt5/gemini3/etc has learned to "do" (while "do" is debatable) a lot of tasks (read/write/play chess/code/etc) "at the same time". And from a "pure" ML point of view, it learned those things from the "simple" core objective of next token prediction (+ enhancements later, RL, etc). That is pretty cool.
So I can see that as an argument for "yes".
But, even the person who had that definition has "moved the goalposts" of his own definition. From recent interviews, Hassabis has moved towards a definition that resembles the one from this paper linked here. So there's that. We are all moving the goalposts.
And it's not a recent thing. People did this back in the 80s. There's the famous "As soon as AI does something, it ceases to be AI" or paraphrased "AI is everything that hasn't been done yet".
> AGI is a system that learns to perform many things at the same time.
What counts as a "thing"? Because arguably some of the deep ANNs pre-transfomers would also qualify as AGI but no one would consider them intelligent (not in the human or animal sense of intelligence).
And you probably don't even need fancy neural networks. Get an RL algorithm and a properly mapped solution space and it will learn to do whatever you want, as long as the problem can be mapped.
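As a concrete (if toy) illustration of that point, here is plain tabular Q-learning on a made-up five-state corridor; everything about the environment is an assumption for the sketch:

    import random
    from collections import defaultdict

    # States 0..4, actions -1/+1, reward only for reaching state 4. Once the problem
    # is mapped onto states/actions/rewards, the algorithm learns a policy without
    # any notion of what the task "is".
    ACTIONS = [-1, 1]

    def step(state, action):
        nxt = max(0, min(4, state + action))
        return nxt, (1.0 if nxt == 4 else 0.0), nxt == 4

    Q = defaultdict(float)
    alpha, gamma = 0.5, 0.9

    for _ in range(200):                      # behave randomly, learn off-policy
        s, done = 0, False
        while not done:
            a = random.choice(ACTIONS)
            s2, r, done = step(s, a)
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
            s = s2

    print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(4)})  # typically {0: 1, 1: 1, 2: 1, 3: 1}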
Can you cite the Deepmind definition? No Google results for that.
It's from a documentary that tracks Hassabis' life. I c/p from an old comment of mine (the quotes are from the documentary, can probably look up timestamps if you need, but it's in the first ~15 minutes I'd say, when they cover the first days of Deepmind):
----
In 2010, one of the first "presentations" given at Deepmind by Hassabis, had a few slides on AGI (from the movie/documentary "The Thinking Game"):
Quote from Shane Legg: "Our mission was to build an AGI - an artificial general intelligence, and so that means that we need a system which is general - it doesn't learn to do one specific thing. That's really key part of human intelligence, learn to do many many things".
Quote from Hassabis: "So, what is our mission? We summarise it as <Build the world's first general learning machine>. So we always stress the word general and learning here the key things."
And the key slide (that I think cements the difference between what AGI stood for then, vs. now):
AI - one task vs. AGI - many tasks
at human level intelligence.
It doesn't play chess? Just can parrot it very well
Yeah, maybe. But what matters is the end result. In the kaggle match, one of the games from the finals (grok vs o3) is rated by chesscom's stockfish as 1900 vs 2500. That is, they played a game at around those ratings.
For reference, the average chesscom player is ~900 elo, while the average FIDE rated player is ~1600. So, yeah. Parrot or not, the LLMs can make moves above the average player. Whatever that means.
I believe it will make illegal moves (unaided by any tools, of course). It will also make mistakes doing things like not being able to construct the board correctly given a FEN string. For these reasons I consider long strings of correct moves insufficient to say it can play the game. If my first two statements, about a propensity for illegal moves and other failures on "easy for humans" tasks, were untrue, then I would reconsider.
In the kaggle test they considered the match forfeit if the model could not produce a legal move after 3 tries (none of the matches in the finals were forfeited, they all ended with checkmate on the board). Again, chesscom's interface won't let you make illegal moves, and the average there is 900. Take that as you will.
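That adjudication rule is easy to picture in code; a sketch using the python-chess library, with ask_model as a hypothetical stand-in for the LLM call (the actual kaggle harness may differ):

    import chess  # pip install python-chess

    def play_move(board, ask_model, max_tries=3):
        # Ask the model for a move in SAN; forfeit after max_tries illegal/unparseable answers.
        for attempt in range(max_tries):
            san = ask_model(board.fen(), attempt)   # hypothetical: model sees the position, returns e.g. "Nf3"
            try:
                board.push_san(san)                 # raises ValueError on illegal or malformed moves
                return True
            except ValueError:
                continue
        return False                                # treated as a forfeit

    # usage sketch:
    # board = chess.Board()
    # while not board.is_game_over():
    #     if not play_move(board, ask_model):
    #         break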
No, just what has usually happened in the past with AI goalposts.
At first, just playing chess was considered to be a sign of intelligence. Of course, that was wrong, but not obvious at all in 1950.
You know, as the saying goes, if a metric becomes a target...
I think, given some of the signs on the horizon, there is a level of MAD-type bluffing going around, but some of the actions by various power centers suggest it is either close, people think it's close, or it is there.
It wasn't the best definition of AGI but I think if you asked an interested layman whether or not a system that can pass the Turing test was AGI 5 years ago, they would have said yes
An interested but uninformed layman.
When I was in college ~25 years ago, I took a class on the philosophy of AI. People had come up with a lot of weird ideas about AI, but there was one almost universal conclusion: that the Turing test is not a good test for intelligence.
The least weird objection was that the premise of the Turing test is unscientific. It sees "this system is intelligent" as a logical statement and seeks to prove or disprove it in an abstract model. But if you perform an experiment to determine if a real-world system is intelligent, the right conclusion for the system passing the test is that the system may be intelligent, but a different experiment might show that it's not.
Douglas Hofstadter wrote Gödel, Escher, Bach nearly 50 years ago, and it won a Pulitzer Prize and the National Book Award and got featured in the popular press. It's been on lots of college reading lists, and from 2007 online coursework for high school students was available from MIT. The FBI concluded that the 2001 anthrax scare was in part inspired by elements of the book, which was found in the attacker's trash. Anyone who's wanted to engage with the theories and philosophy surrounding artificial intelligence has had plenty of materials that get fairly in-depth asking and exploring these same questions.

It seems like a lot of people think this is all bleeding-edge novelty (at least, the underlying philosophical and academic ideas getting discussed in popular media), but really the industry is predicated on ideas that are very old philosophy + decades-old established technology + relatively recent neuroscience + modern financial engineering. That said, I don't want to suggest a layperson is likely to have engaged with any of it, so I understand why this will be the first time a lot of people have ever considered some of these questions. I imagine what I'm feeling is fairly common to anyone who's got a very niche interest that blows up and becomes the topic of interest for the entire world.

I think there's probably some very interesting, as-yet undocumented phenomena occurring as the product of the unbelievably vast amount of resources sunk into what's otherwise a fairly niche kind of utility (in LLMs specifically, and machine learning more broadly). I'm optimistic that there will be some very transformational technologies to come from it, although whether it will produce anything like "AGI", or ever justify these levels of investment? Both seem rather unlikely.
I don't think AGI is a useful concept, but if it exists at all, there's a very good argument that LLMs had it as soon as they could pass the Turing test reliably, which they accomplished years ago at this point.
LLMs do not pass the turing test. It's very easy to know if you're speaking with one.
Isn’t that the point of trying to define it in a more rigorous way, like this paper is doing?
Are you claiming that LLMs have achieved AGI?
Compared to everything that came before they are fairly general alright.
I agree if our comprehension of intelligence and “life” is incomplete, so is our model for artificial intelligence.
The authors acknowledge that this is entirely possible. Their work is just grounded in theory, after all:
> we ground our methodology in Cattell-Horn-Carroll theory, the most empirically validated model of human cognition.
Why do we even want to have human intelligence? It's flawed and limited in so many ways. Most of its magic is there because it cares about its host.
It's easy: we have reached AGI when there are zero jobs left. Or at least non manual labor jobs. If there is a single non-physical job left, then that means that person must be doing something that AI can't, so by definition, it's not AGI.
I think it'll be a steep sigmoid function. For a long time it'll be a productivity booster, but without enough "common sense" to replace people. We'll all laugh about how silly it was to worry about AI taking our jobs. Then some AI model will finally get over that last hump, maybe 10 or 20 years from now (or 1000, or 2), and it will be only a couple of months before everything collapses.
I dislike your definition. There are many problems besides economic ones. If you defined "general" to mean "things the economy cares about", then what do you call the sorts of intelligences that are capable of things that the economically relevant ones are not?
A specific key opens a subset of locks, a general key would open all locks. General intelligence, then, can solve all solvable problems. It's rather arrogant to suppose that humans have it ourselves or that we can create something that does.
It also partitions jobs into physical and intellectual aspects alone. Lots of jobs have a huge emotional/relational/empathetic components too. A teacher could get by being purely intellectual, but the really great ones have motivational/inspirational/caring aspects that an AI never could. Even if an AI says the exact same things, it doesn't have the same effect because everyone knows it's just an algorithm.
And most people get by on those jobs by faking the emotional component, at least some of the time. AGI presumably can fake perfectly and never burn out.
> And most people get by on those jobs by faking the emotional component
If you think this is true, I would say you should leave artificial life alone until you can understand human beings better.
Have a long talk with any working teacher or therapist. If you think the regular workload is adequate for them to offer enough genuine emotional support for all the people they work with, always, everyday, regardless of their personal circumstances, you're mistaken. Or the person you're talking with is incredibly lucky.
This is kind of annoying.
The "computer" on star trek TNG was basically agentic LLMs (it knows what you mean when you ask it things, and it could solve things and modify programs by telling it what changes to make)
Data on ST:TNG was more like AGI. It had dreams, argued for itself as a sentient being, created art, controlled its own destiny through decision making.
Desperate to run without even knowing how to walk.
All models are wrong, but some are useful. However when it comes to cognition and intelligence we seem to be in the “wrong and useless” era or maybe even “wrong and harmful” (history seems to suggest this as a necessary milestone…anyone remember “humorism”?)
I'm also frustrated by the lack of clear definitions related to AI.
Do you know what's more frustrating, though? Focusing so heavily on definitions that we miss the practicality of it (and I'm guilty of this at times too).
We can debate definitions of AGI, but given that we don't know what a new model or system is capable of until it's built and tested in the real world, we have more serious questions, in my opinion.
Debates over AI risk, safety, and alignment are still pretty uncommon, and it seems most are happy enough to accept Jevons Paradox. Are we really going to unleash whatever we do build just to find out after the fact whether or not it's AGI?
How about AFI - artificial fast idiot. Dumber than a baby, but faster than an adult. Or AHI - artificial human imitator.
This is a bad definition, because a human baby is already AGI when it's born and its brain is empty. AGI is the blank slate and the ability to learn anything.
That "blank slate" idea doesn't really apply to humans, either.
We are born with inherited "data" - innate behaviors, basic pattern recognition, etc. Some even claim that we're born with basic physics toolkit (things are generally solid, they move). We then build on that by being imitators, amassing new skills and methods simply by observation and performing search.
Sure, there's lots of inbuilt stuff like basic needs and emotions. But still, a baby doesn't know anything about the world. It's the ability to collect data and train on it that makes it AGI.
> baby doesn't know anything about the world
That's wrong. It knows how to process and signal low carbohydrate levels in the blood, and it knows how to react to a perceived threat (the Moro reflex).
It knows how to follow solid objects with its eyes (when its visual system adapts) - it knows that certain visual stimuli correspond to physical systems.
Could it be that your concept of "know" is defined as common sense "produces output in English/German/etc"?
I was going to make a mildly snide remark about how once it can consistently make better decisions than the average person, it automatically qualifies, but the paper itself is surprisingly thoughtful in describing both: where we are and where it would need to be.
Filling forms is a terribly artificial activity in essence. They are also very culturally biased, but that fits well with the material the NNs have been trained with.
So, surely those IQ-related tests might be acceptable rating tools for machines and they might get higher scores than anyone at some point.
Anyway, is the objective of this kind of research to actually measure the progress of buzzwords, or amplify them?
A "general intelligence" is equivalent to a golden retriever or dolphin. A human general intelligence is a $3/hr minimum wage worker from some undeveloped country.
https://en.wikipedia.org/wiki/Cattell%E2%80%93Horn%E2%80%93C...
If a person has all those criteria, they are superintelligent. They are beyond genius.
The AGI definition problem is that everyone keeps conflating AGI with ASI, Artificial Super Intelligence.
To define AGI, we'd first have to define GI. Humans are very different. As park rangers like to say, there is an overlap between the smartest bears and the dumbest humans, which is why sometimes people can't open bear-proof trash cans.
It's a similar debate with self driving cars. They already drive better than most people in most situations (some humans crash and can't drive in the snow either for example).
Ultimately, defining AGI seems like a fool's errand. At some point the AI will be good enough to do the tasks that some humans do (it already is!). That's all that really matters here.
" At some point the AI will be good enough to do the tasks that some humans do (it already is!). That's all that really matters here."
What matters to me is whether the "AGI" can reliably solve the tasks that I give it, and that also requires reliable learning.
LLMs are far from that. It takes special human AGI to train them to make progress.
> What matters to me is whether the "AGI" can reliably solve the tasks that I give it, and that also requires reliable learning.
How many humans do you know that can do that?
Most humans can reliably do the job they are hired to do.
Usually they require training and experience to do so. You can't just drop a fresh college grad into a job and expect them to do it.
They may require training but that training is going to look vastly different. We can start chatting about AGI when AI can be trained with as few examples and as little information as humans are, when they can replace human workers 1:1 (in everything we do) and when they can self-improve over time just like humans can.
But given enough time, they will figure it out on their own. LLMs cannot ever do that.
Once they can ... I am open to revisit my assumptions about AGI.
I can define AGI in a line:
an entity which is better than any human at any task.
Fight me!
a significant % of what I do day-to-day is dedicated to the task of finding sexual partners. how does this translate?
if it doesn't, how do you define "any task"?
Mine has always been:
I have 2 files. One is a .pdf. The other is a .doc. One file has a list of prices and colors in 2 columns. The other file has a list of colors and media in 2 columns. There are incomplete lists here and many-to-one matching.
To me, if I can verbally tell the AI to give me a list of prices and media from those two files, in a .csv file, and it asks back some simple questions about issues that it needs cleared up to accomplish this, then that is AGI to me.
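Mechanically, once the two tables are out of the .pdf and .doc (the genuinely annoying part), the rest is a single join; a sketch with pandas and made-up file/column names:

    import pandas as pd

    # Hypothetical stand-ins: prices.csv has columns [price, color], media.csv has [color, media].
    prices = pd.read_csv("prices.csv")
    media = pd.read_csv("media.csv")

    # Many-to-one matching on color; an outer join keeps rows from the incomplete lists.
    merged = prices.merge(media, on="color", how="outer")
    merged[["price", "media"]].to_csv("prices_and_media.csv", index=False)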
It is an incredibly simple thing for just about any middle school graduate.
And yet! I have worked with PhDs that cannot do this. No joke!
Something this simple, just dead running numbers, dumb accounting, is mostly beyond us.
This is fine as a definition of AGI, but it's incomplete. It misses so many parts of the cognition that makes humans flexible and successful: for example, emotions, feelings, varied pattern recognition, proprioception, embodied awareness, social skills, and navigating ambiguous situations without algorithms. If the described 10 spectrums of intelligence were maxed by an LLM, it would still fall short.
Eh, I don't like the idea of 'intelligence' of any type using humans as the baseline. It blinds us to our own limitations and to things that may not be limits for other types of intelligence. The "AI won't kill us all because it doesn't have emotions" problem is one of these. For example, just because AI doesn't get angry doesn't mean it can't recognize your anger and manipulate you if given such a directive.
I agree, my point is that the cognition that creates emotion (and others) is of a different quality than the 10 listed in the paper.
You need some expertise in a field to see past the amazing imitation capabilities of LLMs and get a realistic idea of how mediocre they are. The more you work with them, the less you trust them. This is not _it_.
Quite the list of authors. If they all personally approved the text, it's an achievement in itself just to get all of them to agree on a definition.
Indeed, I am wondering whether these HN commenters, with their dismissive confidence, actually have any idea, as if they rub shoulders with these names.
Maybe one of these exalted names should've proof-read the paper:
https://x.com/m2saxon/status/1979349387391439198
> defining AGI as matching the cognitive versatility and proficiency of a well-educated adult
Seems most of the people one would encounter out in the world might not possess AGI; how are we supposed to train our electrified rocks to have AGI if this is the case?
If no one has created a online quiz called "Are you smarter than AGI?" yet based on the proposed "ten core cognitive domains", I'd be disappointed.
Maybe we need a new term. I mean AGI just means artificial general intelligence as opposed to specialised AI like chess computers and never came with a particular level it had to be. Most people think of it as human level intelligence so perhaps we should call it that?
Since everyone's spitballing their idea of AGI, my personal take is that AGI should be a fully autonomous system that have a stable self-image of some sort, can act on its own volition, understand the outcome of its actions, learn from cause-and-effect, and can continue doing so indefinitely.
So far, LLMs aren't even remotely close to this, as they only do what they are told to do (directly or otherwise), they can't learn without a costly offline retraining process, they do not care in the slightest what they're tasked with doing or why, and they do not have anything approximating a sense of self beyond what they're told to be.
Yeah my definition of AGI has always been close to this. The key factors:
- It's autonomous
- It learns (not retraining, but true learning)
- By definition some semblance of consciousness must arise
This is why I think we're very far from anything close to this. Easily multiple decades if not far longer.
Creative problem solving and commonsense physics are missing, among others.
It is a valuable contribution but the CHC theory from psychology that this is based on is itself incomplete.
By commonsense physics, I mean something like simulating interactions of living and non-living entities in 3D over time. Seems more complicated than the examples in the web site and in most tests used in psychometrics.
Creative problem solving with cognitive leaps required for truly novel research & invention could lie outside the rubrics as well. The criteria in CHC are essential but incomplete I believe.
Long-term memory storage capacity[1] scores 0 for both GPT-4 and GPT-5. Are there any workable ideas or concepts for solving this?
[1]: The capability to continually learn new information (associative, meaningful, and verbatim). (from the publication)
I think that's a good effort! I remember mentioning the need for this here a few months ago: https://news.ycombinator.com/item?id=44468198
A forecast by one of the authors of the paper: 50% chance that AGI is reached according to the definition by end of 2028, 80% by end of 2030. https://ai-frontiers.org/articles/agis-last-bottlenecks
People say things like this all the time. It's as reliable as the latest prediction for the rapture and about as scientific.
Everyone has a definition, and so do I. I would call it AGI when I can replace my smartphone and laptop with it. When my screen time is zero? Can AGI replace screens? Go figure.
What about learning? As humans we continually update our weights from sensing the world. Until the AI can rewrite itself, it can't really be AGI imo.
And this is it (from the abstract):
Most people who say "AGI" really mean either "ASI" or "Recursive Self Improvement".
AGI was already here the day ChatGPT was released. That's Peter Norvig's take too: https://www.noemamag.com/artificial-general-intelligence-is-...
The reason some people treat these as equivalent is that AI algorithm research is one of the things a well-educated adult human can do, so an AGI who commits to that task should be able to improve itself, and if it makes a substantial improvement, then it would become or be replaced by an ASI.
To some people this is self-evident so the terms are equivalent, but it does require some extra assumptions: that the AI would spend time developing AI, that human intelligence isn't already the maximum reachable limit, and that the AGI really is an AGI capable of novel research beyond parroting from its training set.
I think those assumptions are pretty easy to grant, but to some people they're obviously true and to others they're obviously false. So depending on your views on those, AGI and ASI will or will not mean the same thing.
Funny but the eyebrow-raising phrase 'recursive self-improvement' is mentioned in TFA in an example about "style adherence" that's completely unrelated to the concept. Pretty clearly a scam where authors are trying to hack searches.
Prerequisite for recursive self-improvement and far short of ASI, any conception of AGI really really needs to be expanded to include some kind of self-model. This is conspicuously missing from TFA. Related basic questions are: What's in the training set? What's the confidence on any given answer? How much of the network is actually required for answering any given question?
Partly this stuff is just hard and mechanistic interpretability as a field is still trying to get traction in many ways, but also, the whole thing is kind of fundamentally not aligned with corporate / commercial interests. Still, anything that you might want to call intelligent has a working self-model with some access to information about internal status. Things that are mentioned in TFA (like working memory) might be involved and necessary, but don't really seem sufficient
I find the nature of AGI discussion to be so narrow and tedious. Intelligence is incomprehensibly more than being able to generate text that looks convincingly like a human wrote it. The coordination of a physical body, the formation of novel thoughts, the translation of thoughts to action, understanding the consequences of those actions, and so on. There’s so much missing that is required to even approach a literal human infant’s “intelligence” that it feels like I’m going crazy entertaining people’s arguments that we are approaching “AGI”.
I think a lot of this is all backwards. People think AGI is taking something dumb, like an LLM, and sticking on learning, like a software upgrade.
I think it's the other way around: you build a system that first and foremost _learns_ as part of its fundamental function, _then_ you train it in the domain you want expertise.
You're not going to get expertise in all domains all the time, just like with people. And you're not going to get a perfect slave either, just like with humans. You'll probably get something more like in between a human and machine. If that's what you really want, great.
To put this another way, if you neglect your kids, they're still going to learn things, just probably not things you want them to learn. If you neglect your language model it's just not going to do anything.
Dan is a very ambitious, great marketer too.
Interesting read. I agree completely with their Introduction, that the definition of AGI is constantly shifting, and this leads to endless (and useless) debates.
What I find cool about the paper is that they have gathered folks from lots of places (Berkeley, Stanford, MIT, etc.), and no big-4 labs. That's good imo.
tl;dr; Their definition: "AGI is an AI that can match or exceed the cognitive versatility and proficiency of a well-educated adult."
Cool. It's a definition. I doubt it will be agreed on by everyone, and I can see endless debates about just about every word in that definition. That's not gonna change. At least it's a starting point.
What I find interesting is that they specifically say it's not a benchmark, or a test set. It's a framework where they detail what should be tested, and how (with examples). They do have a "catchy" table with gpt4 vs gpt5, that I bet will be covered by every mainstream/blog/forum/etc out there -> gpt5 is at ~50% AGI. Big title. You won't believe where it was one year ago. Number 7 will shock you. And all that jazz.
Anyway, I don't think people will stop debating about AGI. And I doubt this methodology will be agreed on by everyone. At the end of the day, both extremes are more ideological in nature than pragmatic. Both ends want/need their view to be correct.
I enjoyed reading it. I don't think it will settle anything. And, as someone posted below, when the first model hits 100% on their framework, we'll find new frameworks to debate about, just like we did with the Turing test :)
> tl;dr; Their definition: "AGI is an AI that can match or exceed the cognitive versatility and proficiency of a well-educated adult."
Is a 9 year old child generally intelligent? What about a high school drop out? Someone with a 90 IQ? A large percentage of people who ever lived wouldn't qualify as being generally intelligent with that benchmark.
I would define AGI as any artificial system that could learn any skill a human can by using the same inputs.
>Paper claims definition of AGI >Look inside >No definition of AGI.
> .., Eric Schmidt, ..
Right. That explains it.
Yoshua Bengio too. Does anyone know of connections between the two? Is Schmidt supporting research at Mila?
How, summing (not averaging) to 58 of 1000 possible points (0-100 in each of ten domains), are we calling this score 58% rather than 5.8%?
It's confusing. Each of the 10 tracks is worth 10% of the total, and they add up the contributions from every track. So in the first table, 10% on math basically means "perfect" math, not 10% of the math track.
0-10 in each domain. It’s a weird table.
The simple additive scoring here is sus. It means a model that's perfect on 9/10 axes but scores 0% on Speed (i.e., takes effectively infinite time to produce a result) would be considered "90% AGI".
By this logic, a vast parallel search running on Commodore 64s that produces an answer after BusyBeaver(100) years would be almost AGI, which doesn't pass the sniff test.
A more meaningful metric would be more multiplicative in nature.
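A toy comparison of the two aggregation choices, using a geometric mean as one way to read "multiplicative" (made-up axis scores, not the paper's numbers):

    import math

    scores = [1.0] * 9 + [0.0]                               # perfect on nine axes, zero on Speed

    additive = sum(scores) / len(scores)                     # arithmetic mean: 0.9 -> "90% AGI"
    multiplicative = math.prod(scores) ** (1 / len(scores))  # geometric mean: 0.0

    print(additive, multiplicative)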
> Last, we deliberately focus on core cognitive capabilities rather than physical abilities such as motor skills or tactile sensing, as we seek to measure the capabilities of the mind rather than the quality of its actuators or sensors.
Seems pretty unfair to exclude motor skills, especially given 1) how central they are to human economic activity, and 2) how Moravec's paradox tells us they are the hard part.
GPT-5 is 57%? Hilarious. This is a bad joke.
The problem is not really defining AGI. It's testing for it in a way that avoids illusory intelligence.
This is a serious paper by serious people and it is worth reading, but any definition of intelligence that depends on human beings as reference will never be a good basis for evaluating non human intelligence.
You could easily write the reverse of this paper that questions whether human beings have general intelligence by listing all the things that LLMs can do, which human beings can't -- for example producing a reasonably accurate summary of a paper in a few seconds or speaking hundreds of different languages with reasonable fluency.
You can always cherry-pick stuff that humans are capable of that LLMs are not capable of, and vice versa, and I don't think there is any reason to privilege certain capabilities over others.
I personally do not believe that "General Intelligence" exists as a quantifiable feature of reality, whether in humans or machines. It's phlogiston, it's the luminiferous ether. It's a dead metaphor.
I think what is more interesting is focusing on _specific capabilities_ that are lacking and how to solve each of them. I don't think it's at all _cheating_ to supplement LLM's with tool use, RAG, the ability to run python code. If intelligence can be said to exist at all, it is as part of a system, and even human intelligence is not entirely located in the brain, but is distributed throughout the body. Even a lot of what people generally think of as intelligence -- the ability to reason and solve logic and math problems typically requires people to _write stuff down_ -- ie, use external tools and work through a process mechanically.
> To operationalize this, we ground our methodology in Cattell-Horn-Carroll theory, the most empirically validated model of human cognition
Cattell-Horn-Carroll theory, like a lot of psychometric research, is based on collecting a lot of data and running factor analysis (or similar) to look for axes that seem orthogonal.
It's not clear that the axes are necessary or sufficient to define intelligence, especially if the goal is to define intelligence that applies to non-humans.
For example reading and writing ability and visual processing imply the organism has light sensors, which it may not. Do all intelligent beings have vision? I don't see an obvious reason why they would.
Whatever definition you use for AGI probably shouldn't depend heavily on having analyzed human-specific data for the same reason that your definition of what counts as music shouldn't depend entirely on inferences from a single genre.
Here is a definition of AGI in 2025: Hype.
Oh yeah, it's lack of a definition that keeps these models from replacing humans. /s
Exactly! and this circular and generally poor definition is going to put it all into perspective.