Intelligence: A Working Definition
By Frederick Lowe, May 11, 2026

This essay is the first of six in a larger series discussing Transformers and Post-Transformers. It establishes the meaning of "Intelligence" and "Functional General Intelligence," and provides a reference point for those terms in future essays.
Definitions
I use the word intelligence throughout this document, sometimes prefaced by "human" or "machine". But there is a unifying meaning a reader should carry, and it's central to this essay's claim: the lynchpin of "intelligence" as a recognizable human trait is "useful known-to-novel context synthesis". Accordingly, when I use terms like "Functional General Intelligence", I'm talking about intelligence that evidences the same useful known-to-novel context synthesis we associate with human intelligence.
I also use "contexts" broadly and deliberately. The known-to-novel context synthesis I assert as central to the definition of intelligence in any form arises from codified or empirically discoverable knowledge. There are too many forms of that to comprehensively enumerate them.
I use "Large Language Models" and "LLM" interchangeably; the acronym carries the plural of its expansion, so readers should expect "LLM are" rather than "LLMs are".
Functional General Intelligence
I propose that Functional General Intelligence is:
The ability to efficiently synthesize known contexts into novel contexts of social, economic, or scientific value (thinking), and to plastically incorporate novel contexts into known contexts (learning).
Qualifiers:
Dependent
"Value" - Illustrative enumeration. I could not hope to list every category of value. Further, I acknowledge that "value" is generally entangled with concepts like "benefit" and "improvement".
"Thinking" - For precision, known-to-novel context synthesis does not represent the full scope of thinking. Recall, chunking, and pattern-matching are all forms of thinking as well, and they are assumed as prerequisites to useful synthesis.
Dispositive
"Efficiency" - inefficiency is a barrier to unqualified functionality. A General Intelligence whose energy costs exceed the economic value of the novel contexts it generates is not Functional. Nor is an Intelligence whose environmental impact is irreversible destruction of dependent ecosystems.
"Plasticity" - a lack of plasticity is a barrier to unqualified functionality. An Intelligence which cannot incorporate novel contexts is neither, by definition, Functional nor General, and therefore does not qualify as a Functional General Intelligence.
"General Utility" - An intelligence that understands the needs of all known applications, but cannot serve them, is not Generally Useful. Functional General Intelligence requires both capability and utility.
It is worth pointing out that by this definition, most humans don't meet the Functional General Intelligence bar: most humans aren't polymaths who (given sufficient training time) could acquire arbitrary capability leading to arbitrary utility.
Plasticity
The biological concept of "plasticity" is foundational to the concept of Functional General Intelligence. An intelligence capable of known-to-novel context synthesis, but incapable of incorporating that synthesis into its known contexts, becomes stale on its first act of synthesis.
Reinforcement Learning from Human Feedback (RLHF) changes LLM weights, which alters outputs in a way that simulates learning. But it is not learning in the Hebbian meaning of that term (repeated use strengthens neural pathways), due to its dependency on external supervisory signal.
This is not a trivial difference. Despite anything a reader may have encountered to the contrary, RLHF is not learning, in the sense that anyone outside LLM post-training means when they use that word.
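The distinction can be made concrete with a toy sketch. Everything below is illustrative, not a model of any real training pipeline: actual RLHF optimizes a policy against a learned reward model. But the structural point survives the simplification: the Hebbian rule is local (co-activation alone strengthens the weight), while the RLHF-style update goes nowhere without an external supervisory signal.

```python
def hebbian_update(w, pre, post, lr=0.1):
    """Hebbian rule: repeated co-activation of the pre- and post-synaptic
    units strengthens the connection. Purely local; no external signal."""
    return w + lr * pre * post

def rlhf_style_update(w, gradient, reward, lr=0.1):
    """RLHF-style rule (toy form): the update direction is scaled by an
    externally supplied reward. No supervisory signal, no change."""
    return w + lr * reward * gradient

w_hebb = hebbian_update(0.5, pre=1.0, post=1.0)            # 0.6: use alone strengthened it
w_rlhf = rlhf_style_update(0.5, gradient=1.0, reward=0.0)  # 0.5: unchanged without reward
```

The asymmetry is the essay's point: the first function learns from use; the second only moves when something outside the system says it should.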
General Capability vs General Utility
As this essay concerns itself with defining types of intelligence in the Transformer vs. Post-Transformer discussion, it bears noting that "General Intelligence" comprises two distinct sub-components. LLM increasingly satisfy the first (General Capability), but cannot satisfy the second (General Utility), now or foreseeably, for a sizable number of applications.
Functional General Intelligence assumes General Capability, a scope consideration. It describes facilities that no known, non-mystical intelligence delivers: unqualified grasp of all known contexts, and the ability to synthesize novel contexts from that understanding.
It also assumes General Utility, an applied consideration. An intelligence which understands all known contexts, but cannot leverage that understanding at the rate demanded by general application, is inarguably useful, but not generally so.
Learning Rate and Subject Mastery
Known contexts are a result of training. Two quantifiable aspects of training, learning rate and subject mastery, are also broadly accepted and readily identifiable aspects of human intelligence.
A person who learns to speak a non-native language to conversational fluency in 400 hours of training would likely be seen as intelligent. A person who learns a non-native language to full fluency in the same 400-hour window would likely be seen as "considerably more intelligent".
How fast you learn and how much you learn are broadly accepted evidence of intelligence, and serve as defensible proxies for it.
Known-To-Novel Context Synthesis
A third aspect, harder to quantify, is useful known-to-novel context synthesis. It arises reliably along a few specific paths:
Path 1 - Deliberate results of scientific or lay reasoning, such as when a person considers a known A and a known B, and synthesizes them to hypothesize a novel C, tests that hypothesis, and subsequently observes repeatable evidence that establishes or disproves the predicted C.
Scientific examples: Mendel's pea-plant trait inheritance experiments. Hertz confirming Maxwell's predicted electromagnetic waves. Edison's exhaustive filament screening for the incandescent bulb.
Path 2 - Surprise results of scientific or lay reasoning, such as when a person considers a known A and a known B, and synthesizes them to hypothesize a novel C, tests that hypothesis, and subsequently observes repeatable evidence that disproves novel C, but establishes novel D.
Scientific examples: Rutherford's gold foil experiment (expected light scattering, found nuclei). Penzias and Wilson hunting radio noise, discovering the cosmic microwave background. Michelson-Morley designed to measure ether drift, found the null result that opened the door to relativity.
Path 3 - Sudden insight results of scientific or lay reasoning, such as when a novel E reveals itself via accidental, intuitive, or subconscious paths, to a person with sufficient known contexts to understand what they've seen is novel, and to claim discovery on that basis.
Scientific examples: Kekulé's benzene-ring dream. Fleming's penicillin contamination. Poincaré's bus-step group theory insight.
Across all three paths, methodical adherence to rationality and/or scientific rigor bordering on reflexiveness is, itself, a marker of intelligence. Path 3 cases look magical, but Newton's apple doesn't get promoted from curiosity to evidence in a mind that isn't already reasoning about gravity.
The paths described above are illustrative, not definitive; I am certain there are paths I have not listed, and given the centrality of known-to-novel context synthesis to this project, I'll likely return to expand this section with other concrete examples. But irrespective of how it's achieved, useful known-to-novel context synthesis is widely accepted as evidence of intelligence.
I assert it is the most probative available evidence.
Measurement
Measuring learning rate and subject mastery is easy, because both can be described in advance of learning and tested post-hoc.
Learning rate is a simple observation of the time to achieve a given level of subject mastery. Subject mastery can be established by testing the learner's command using the training material itself, and generalized mastery can be established by testing adjacent, held-out material.
Returning to non-native language learning, a useful example is verb conjugation:
- A person who has learned a specific set of regular and irregular verbs can be said to have mastered conjugation of those verbs.
- A person who has learned to conjugate all regular verbs within a class (the -ar, -er, and -ir endings in Spanish) can be said to have mastered regular conjugation for that class.
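The measurement scheme described above reduces to a few lines. The function names, scores, and the 0.85 threshold below are all hypothetical; the point is only that rate and mastery are quantities one can define in advance and test post-hoc.

```python
def mastery(correct: int, total: int) -> float:
    """Fraction of test items answered correctly."""
    return correct / total

def hours_to_mastery(hours_trained: float, threshold: float, achieved: float):
    """Learning rate as time-to-threshold; None if the threshold wasn't reached."""
    return hours_trained if achieved >= threshold else None

# Hypothetical learner, per the verb-conjugation example:
trained = mastery(90, 100)    # tested on the training material itself
held_out = mastery(72, 100)   # tested on adjacent, held-out material
rate = hours_to_mastery(400, threshold=0.85, achieved=trained)  # 400: threshold met
```

The trained/held-out split is what separates subject mastery from generalized mastery in the scheme above.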
Thus aspects one and two, rate and subject mastery, are both well understood and easily measurable. Aspect three, known-to-novel context synthesis, is less measurable; it may be resistant to scalable post-hoc measurement altogether.
That's partly because novel context synthesis isn't inherently an "intelligent" activity when viewed through the lenses of utility and complexity.
For example, nearly any intelligence can synthesize a set of useless-but-novel contexts from known contexts (consider a babbling toddler delighted with the sound of its newfound voice). And synthesizing a simple-and-novel context that is highly useful might be seen as evidence of a different level of intelligence than synthesizing a complex-and-novel context that is marginally useful.
Still, humans have a time-honored benchmark for synthesis, measured by a human institution operating retrospectively.
Inventorship As A Codified, Retrospective Intelligence Benchmark
That institution is the patent system. Ask a seasoned examiner how they adjudge whether a novel context is worthy of the protection sought by an applicant, and you might be surprised to hear an answer that frames patentable novelty in the same terms as the definition-resistant concept of obscenity: "you know it when you see it."
No application is allowed strictly on that intuition, however. Examiners spend their professional days reviewing applications against an ever-expanding bolus of prior art. Office actions force applicants to narrow their applications to claims that extend art, and final rejections serve as a firewall against cruft. Utility is a requirement (under 35 U.S.C. § 101), novelty is a requirement (under 35 U.S.C. § 102), and an application must be "non-obvious to one of ordinary skill in the art" (under 35 U.S.C. § 103): roughly, a median competent practitioner in the field. Allowances raise the bar for novelty.
The patent system of 2026 is messy in a way that has diminished its usefulness to most legitimate filers, while remaining lucrative for litigation specialists. But irrespective of that criticism, it persists as an institution established by humans to judge whether a synthesis from known contexts is both novel and useful.
It's worth remembering that patents reward publication of the details of novel contexts by granting a temporary legal monopoly on their value, thereby increasing known contexts and stimulating additional known-to-novel context synthesis.
For a time, that system produced a social class that was a form of intellectual celebrity: The Inventor. That honorarium, and its attendant potential economic fortune, remain as evidence that humanity regards acts of useful known-to-novel context synthesis as distinct, recognizable, and worthy of significant reward.
AI As An Inventor
Against this backdrop, large language models are trivial to assess on the first two aspects, and currently impossible to assess on the third, due to the PTO's refusal to review LLM-generated patent applications.
Aspect 1 - Rate
There is no remotely comparable human analog.
- A frontier LLM ingests approximately the content of the internet as training data over a period of months, tokenizing it in its entirety.
- A human scholar reads perhaps a hundredth of one percent of that over a lifetime, encoding a tiny fraction of that material neurologically.
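The gap can be put in rough numbers. The corpus size below is an assumption (frontier training sets are commonly reported in the trillions of tokens); the human fraction is the essay's "hundredth of one percent".

```python
# Assumed frontier training corpus; real figures vary by model and lab.
CORPUS_TOKENS = 15e12

# "a hundredth of one percent" = 0.01% = 1e-4
HUMAN_LIFETIME_FRACTION = 1e-4

human_lifetime_tokens = CORPUS_TOKENS * HUMAN_LIFETIME_FRACTION  # ~1.5 billion
ratio = CORPUS_TOKENS / human_lifetime_tokens  # the model ingests ~10,000 lifetimes
```

Under these assumptions, months of machine training consume on the order of ten thousand scholarly lifetimes of reading.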
This observation is pivotal to the Transformer / Post-Transformer conversation, because Transformer-based machine intelligences require extraordinary training material volume to generate extraordinary results. Humans do not. A human intelligence does not need to ingest the content of the internet to learn to "walk and chew gum", in the cognitive sense of that phrase.
Aspect 2 - Subject Mastery
All frontier LLM generate output reflecting expert-level command of subject matter that is sufficiently represented in their training corpora. Sub-expert outputs are generally rooted in training-data volume or quality. Technological causes, when they apply, are narrower: context-window limitations that constrain how much information the model can hold in working attention at once (fuller treatment here), and chain-of-thought failures, where small errors compound across multi-step inference.
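The chain-of-thought failure mode has a simple probabilistic shape. Assuming, simplistically, that each reasoning step is independently correct with probability p, an n-step chain is correct end-to-end with probability p^n. The numbers below are illustrative, not measurements of any model.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Probability an n-step reasoning chain is correct end-to-end,
    assuming each step is independently correct with probability p_step."""
    return p_step ** n_steps

chain_success(0.95, 1)    # 0.95: one step is as reliable as the step itself
chain_success(0.95, 20)   # ~0.36: twenty "95% reliable" steps usually fail
```

This is why per-step error rates that look excellent in isolation produce unreliable multi-step inference.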
Aspect 3 - Synthesis From Known Contexts To Novel Contexts
The patent system has declared that AI cannot be a named inventor, perhaps for good reason: the office couldn't efficiently manage application flow even in the pre-LLM era, and the volume of submissions it would receive if it began accepting machine-generated applications would drown it.
But saying "LLM can't pass an historic Aspect 3 test of intelligence because the institution responsible for that benchmark has formally refused to consider AI submissions" isn't the same as saying "AI can't perform known-to-novel context synthesis." LLM can and do generate novel-looking contexts. Whether what they produce meets the extant framework's criteria is genuinely unsettled, and subject matter for a different essay.
Conclusion
To attract capital, a machine intelligence architecture must externalize utility that can be leveraged for competitive advantage on investment-cycle timelines. In theory, architectures that fail this bar never attract the capital to proceed beyond the level of "academic curiosity." In practice, during a bubble, irrational exuberance takes over.
There is currently no machine intelligence architecture more capable than the Large Language Model for most human cognitive tasks. In 2026 we passed an inflection point where, given sufficient inference and reasoning time, LLM outperform most humans on many cognitive tasks. It seems likely that by the end of 2026 we will see another milestone: Large Language Models outperforming all humans on most cognitive tasks. It's defensible speculation that by 2027, we'll see models that outperform all humans on most or all cognitive tasks.
It's easy to mistake legitimately mind-blowing General Capability milestones for General Utility, but Large Language Models are not useful for a broad class of applications. For example, they cannot be used to increase the value of online advertisement within the ad industry's realtime bid window of 50ms, because even the fastest LLM needs longer than that to reason. Further, the attribution signals that make targeted ad serving work are asynchronous, and LLM are pre-trained: they don't update on live signal. LLM are not useful for self-driving or smart-grid switching. Broadly: if an application requires realtime, very-low-latency reasoning, LLM are not the right solution, and may never be.
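The bid-window argument reduces to a budget check. The 50ms window is the figure used above; the inference latencies and the overhead figure are assumptions for illustration, not benchmarks of any particular model.

```python
BID_WINDOW_MS = 50  # realtime bid window, per the essay

def fits_bid_window(inference_ms: float, overhead_ms: float = 10.0) -> bool:
    """A model is only usable for bidding if inference plus assumed
    network/auction overhead fits inside the window."""
    return inference_ms + overhead_ms <= BID_WINDOW_MS

fits_bid_window(300.0)  # False: an assumed LLM latency, far over budget
fits_bid_window(5.0)    # True: a small specialized model could fit
```

No amount of capability rescues a model that cannot return an answer before the auction closes; that is the General Utility gap in miniature.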
That's because high inference time is an architectural feature of LLM (see: Understanding Transformer Attention). Making them larger reliably improves their capability, but unless they can be made faster, their utility has hard limits.
Functional General Intelligence requires both General Capability and General Utility: recognizable known-to-novel context synthesis, and inference speeds aligned with all applications, not just latency-tolerant ones. LLM-pilled advocates answer this criticism with something like "Generally Capable LLM will help us find a way to make Generally Useful LLM".
In the end, they may be right. But Entropy is the opponent to Optimism in that boxing ring, and Entropy hits harder.
AI Use Disclosure
Content Authorship: 100% human.
AI Use, Claude Opus 4.7 (Editor Role): Outline review, adherence, spelling and grammar review, multi-pass guidance, sentence order changes limited to a few paragraphs.
AI Use, ChatGPT 5.5 (Reviewer Role): Full essay review, not resulting in changes. Observation: frontier models are getting worse on the "utility" side for this use case in publishing, I fear. GPT's output might have made the essay more complete structurally, but following its "helpful" suggestions would have steered it into subject matter a layperson can't read.