Münchhausen's TriLLMa
Recent AI Papers Suggest My Foundherentist Solution to the Trilemma is Correct
Long-time readers will know that many of the earliest Contemplations on the Tree of Woe were epistemological. From October 2020 to May 2023, I wrestled with the Münchhausen Trilemma, a formidable challenge to the very foundation of knowledge. If you haven’t encountered my writings on the Trilemma before, you can find those articles here:
The Horror of Münchhausen's Trilemma (Oct 21, 2020)
Why Struggle Against the Trilemma (Oct 31, 2020)
The Münchhausen Trilemma proposes that any attempt to justify knowledge will ultimately lead to one of three unsatisfactory options. If we decline into circular reasoning, then the truth we assert will involve a circularity of proofs. If we collapse into infinite regress, then the truth we assert will rest on truths themselves in need of proof, and so on to infinity. Finally, if we rely on arbitrary assumption, then the truth we assert will be based on beliefs we hold but cannot defend.
In the essay Defending Against the Trilemma I argued that defeating the Trilemma required that we identify a set of non-arbitrary assumptions. I argued that axioms were non-arbitrary if they were irrefutable by any means. I identified five such axioms:
The Law of Identity: Whatever is, is.
The Law of Non-Contradiction: Nothing can be and not be.
The Law of the Excluded Middle: Everything must either be or not be.
The Axiom of Existence: Existence exists.
The Axiom of Evidence: The evidence of the senses is not entirely unreliable evidence.
The first four axioms are widely acknowledged (and, inevitably, relied upon in the arguments of even those who are skeptical of them). Unfortunately, they do not suffice to defeat the Trilemma. An epistemology grounded upon them still leaves us devoid of any justifiable beliefs about the external world.
The fifth axiom is the solution that enables us to synthesize rationalism and empiricism in epistemology. As I explained in the essay,
The Axiom of Evidence is an axiom of my own formulation, although not my own creation. I first formulated the wording during a heated argument with Professors Scott Brewer and Robert Nozick at Harvard Law School. The question had arisen: How can we know that our senses are reliable? After all, straws seem to bend in water; the same shade of gray can change in apparent hue based on nearby colors; hallucinations can confound our vision; and so on. My response was that all of the evidence for the unreliability of our senses itself arose from the senses. A true skeptic of sensory evidence could not even argue that the senses were totally unreliable because he’d have no evidence with which to do so. And even if he did have such evidence, he’d have no way to use it to refute a proposition, because that refutation could not be reliably made absent the senses.
In other words, any argument positing the total unreliability of sensory evidence must, by its very nature, rely on sensory evidence to gather and present its case. This self-defeating circularity renders total skepticism of the senses incoherent. The Axiom of Evidence provides the crucial, non-arbitrary empirical anchor necessary for a robust epistemology about the outside world.
I cautioned, however, that:
[W]e still haven’t gotten very far. While it’s true that the proposition “the evidence of the senses is not entirely unreliable evidence” is irrefutable, the Axiom still leaves open the question of how much is reliable, and to what extent. That will be the topic of a future essay, where we will discuss the crossword puzzle theory of epistemology known as Foundherentism.
I presented my full case in my essay Epistemology is a Puzzle. Foundherentism, first espoused by philosopher Susan Haack, demands a belief system that is both foundationally grounded in irrefutable axioms and internally coherent, such that each proposition reinforces and is reinforced by others, much like a perfectly solved crossword puzzle. Foundherentist approaches are widely applied in science and engineering as “methodological triangulation,” “nomological networks of cumulative evidence,” “multisensory integration,” and other techniques.
It is with this epistemological apparatus firmly in mind that I invite you to return with me to the burgeoning field of Artificial Intelligence, where, to my surprise, I discovered three recent papers that offered validation of my foundherentist approach.
Dispatches from the Digital Frontier
The first paper, "The Platonic Representation Hypothesis" by Minyoung Huh, Brian Cheung, Tongzhou Wang, and Phillip Isola (May 2024), posits that the internal representations learned by AI models, especially deep networks, are inexorably converging towards a shared statistical model of reality. This convergence, they argue, transcends differences in model architecture, training objectives, and even data modalities (e.g., images versus text). Their hypothesis, named after Plato's allegory of the cave, suggests that AI, by observing vast quantities of data (the "shadows on the cave wall"), is recovering increasingly accurate representations of the world. They contend that scale, in terms of parameters, data, and task diversity, is the primary driver of this convergence, leading to a diminished solution space for effective models: "All strong models are alike," they suggest, possibly implying a universal optimal representation.
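To make the claim concrete, here is a minimal sketch of one way representational convergence can be quantified: a mutual nearest-neighbor alignment score between two models’ embeddings of the same inputs. The metric is in the spirit of the alignment measures Huh et al. discuss, but the function names, parameters, and toy data below are my own illustration, not the authors’ code.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors of each row of X (cosine similarity, self excluded)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = Xn @ Xn.T
    np.fill_diagonal(sims, -np.inf)          # never count an item as its own neighbor
    return np.argsort(-sims, axis=1)[:, :k]

def mutual_knn_alignment(A, B, k=10):
    """Average fraction of k-nearest-neighbor sets shared by two embedding spaces.
    1.0 means the two spaces have identical local geometry."""
    na, nb = knn_indices(A, k), knn_indices(B, k)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(na, nb)]
    return float(np.mean(overlaps))

# Toy usage: two made-up "models" embedding the same 100 inputs
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 64))
B = A @ rng.normal(size=(64, 32)) + 0.1 * rng.normal(size=(100, 32))  # correlated space
print(mutual_knn_alignment(A, B, k=10))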
Following this theoretical proposal, we find empirical substantiation offered in "Harnessing the Universal Geometry of Embeddings" by Rishi Jha, Collin Zhang, Vitaly Shmatikov, and John X. Morris (May 2025). This paper introduces vec2vec, a groundbreaking method for translating text embeddings from one AI model's vector space to another's, critically, without requiring any paired data or access to the original encoders. This capability is predicated on what they term the "Strong Platonic Representation Hypothesis," which is the idea that a "universal latent representation" exists and can be learned and leveraged. vec2vec achieves remarkable success, yielding high cosine similarity and near-perfect rank matching between translated embeddings and their ground-truth counterparts. Beyond mere translation, the authors demonstrate that these translations preserve sufficient semantic information to enable information extraction, including zero-shot attribute inference and text inversion, even from unknown or out-of-distribution embeddings. This paper suggests that the convergence of AI representations is not merely theoretical, but practically exploitable, again implying a deep, underlying compatibility.
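The headline results (high cosine similarity and near-perfect rank matching) correspond to standard retrieval metrics. A minimal sketch of how such an evaluation might be computed, assuming we already hold translated embeddings and their ground-truth counterparts for the same texts; the names here are mine, not the vec2vec codebase’s:

```python
import numpy as np

def evaluate_translation(translated, ground_truth):
    """Mean cosine similarity and top-1 retrieval accuracy between translated
    embeddings and their ground-truth counterparts (row i of each matrix is
    assumed to encode the same underlying text)."""
    T = translated / np.linalg.norm(translated, axis=1, keepdims=True)
    G = ground_truth / np.linalg.norm(ground_truth, axis=1, keepdims=True)
    sims = T @ G.T                                   # all pairwise cosine similarities
    mean_cos = float(np.mean(np.diag(sims)))         # similarity of matching pairs
    top1 = float(np.mean(np.argmax(sims, axis=1) == np.arange(len(T))))
    return mean_cos, top1

# Toy usage: a "translation" that is the ground truth plus a little noise
rng = np.random.default_rng(0)
G = rng.normal(size=(500, 128))
T = G + 0.1 * rng.normal(size=(500, 128))
print(evaluate_translation(T, G))                    # high cosine, near-perfect top-1
```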
Finally, we converge human and synthetic epistemology with the paper "Human-like object concept representations emerge naturally in multimodal large language models" by Changde Du et al. (updated June 2025). This study meticulously probes the conceptual representations of natural objects within state-of-the-art LLMs and Multimodal LLMs. Employing the well-established "triplet odd-one-out" task from cognitive psychology, the researchers collected millions of similarity judgments from these AIs. Using the Sparse Positive Similarity Embedding (SPOSE) method, they derived 66-dimensional embeddings for 1,854 objects. Their critical finding was the interpretability of these dimensions, revealing that the AI models conceptualize objects along lines similar to human cognition, encompassing both semantic categories (e.g., "animal-related," "food-related") and perceptual features (e.g., "flatness," "color"). The study demonstrated a strong alignment between these AI-derived embeddings and actual neural activity patterns in human brain regions specialized for object and scene processing (e.g., EBA, PPA, RSC, FFA). This suggests a shared, fundamental organizational principle for conceptual knowledge between human and artificial minds.
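The triplet odd-one-out task itself is easy to state in code: show three objects and ask which belongs least. A minimal sketch under the simplifying assumption that the judgment is made by cosine similarity over embeddings (the actual study elicits judgments from the models directly, and SPOSE then fits the 66-dimensional embeddings to millions of such triplets):

```python
import numpy as np

def odd_one_out(emb, i, j, k):
    """Return which of items i, j, k is the 'odd one out': the item left over
    after picking the most similar pair under cosine similarity."""
    E = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    pairs = {(i, j): E[i] @ E[j], (i, k): E[i] @ E[k], (j, k): E[j] @ E[k]}
    most_similar = max(pairs, key=pairs.get)          # the pair that stays together
    return ({i, j, k} - set(most_similar)).pop()      # the remaining item is "odd"

# Toy usage with made-up 4-dimensional "object embeddings"
emb = np.array([[1.0, 0.1, 0.0, 0.0],   # 0: dog
                [0.9, 0.2, 0.1, 0.0],   # 1: cat
                [0.0, 0.1, 1.0, 0.3]])  # 2: hammer
print(odd_one_out(emb, 0, 1, 2))        # -> 2 (hammer is the odd one out)
```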
AI's Implicit Epistemology
Our Foundherentist theory demands an unshakeable foundation, rooted in noetic principles. Let’s examine how AI, in its computational existence, implicitly adheres to these.
The Laws of Identity, Non-Contradiction, and the Excluded Middle are, for any computational system, axiomatic in their implementation. The digital realm is built upon discrete states and logical operations (0 or 1, true or false). Any inconsistency or contradiction in these fundamental operations leads to computational failure. Thus, the very architectural bedrock of AI models is inherently aligned with these logical principles, ensuring that their internal processing abides by these immutable laws of reason.
The Axiom of Existence is equally self-evident for AI. The AI models themselves, their parameters, their training data, and the computational environment in which they operate, must exist. Their "beliefs" (learned representations and outputs) are instantiated as patterns of electrical signals and numerical weights, demonstrably existing entities within the digital domain.
What about the Axiom of Evidence? "The evidence of the senses is not entirely unreliable evidence." For AI, "the senses" are its training data, and "evidence" is the vast, multi-modal input it processes. Advanced AI models, particularly multimodal ones, are constructed precisely on the premise that raw data (images, text, audio, sensor readings, and so on) contains discernible, reliable patterns that can be learned and leveraged to build a functional understanding of the world. The extraordinary capabilities of models like Gemini Pro Vision, which can understand and generate human-like conceptual representations from visual and linguistic inputs, directly depend on the partial reliability of these "sensory" inputs.
The convergence hypothesized by Huh et al. would be epistemologically impossible if training datasets (the AI’s “senses”) were utterly unreliable. If all inputs were mere noise, there would be no way for these models to converge upon reality. The fact that vec2vec can translate between different embedding spaces, preserving semantic meaning, validates the notion that disparate data sources are not wholly unreliable, for they must carry a common, decipherable signal about the world. Thus, the practical success of modern AI implicitly affirms the Axiom of Evidence, establishing a crucial empirical foundation for its "knowledge."
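A toy illustration of why pure noise could not support this kind of convergence: fit a simple linear probe and test it on held-out data. When the labels carry structure, the probe generalizes; when the labels are random noise, it never rises above chance, no matter how well it memorizes the training half. This sketch is my own, not drawn from any of the papers:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)

y_signal = (X @ w_true > 0).astype(float)            # labels carry real structure
y_noise = rng.integers(0, 2, size=n).astype(float)   # labels are pure noise

def holdout_accuracy(X, y):
    """Least-squares linear probe; accuracy on a held-out half of the data."""
    Xtr, Xte, ytr, yte = X[: n // 2], X[n // 2 :], y[: n // 2], y[n // 2 :]
    w, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)
    return float(np.mean((Xte @ w > 0.5) == yte))

print("structured labels:", holdout_accuracy(X, y_signal))  # well above chance
print("noise labels:     ", holdout_accuracy(X, y_noise))   # roughly 0.5
```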
(I fully recognize that, from the point of view of ordinary folk who don’t sit around pondering the Münchhausen Trilemma, this is very much “no big deal”; it’s just “common sense.” But, since I do sit around pondering the Münchhausen Trilemma, to me it’s quite exciting. For the philosophically inclined, there’s a lot to enjoy in studying AI.)
Coherence in AI's Belief System
Foundherentism asserts that justified beliefs must form a coherent system, where individual beliefs interlock and mutually support each other. This coherence is not merely a desirable outcome for AI; it appears to be a driving force and a fundamental property of robust AI "knowledge."
The "Platonic Representation Hypothesis" is, at its heart, a thesis on coherence, where diverse AI models are compelled towards a single, internally consistent understanding of the world. This is not a superficial consistency but a deep alignment of their internal data structures. The "Anna Karenina scenario," where "all strong models are alike," precisely captures this gravitational pull towards coherence as a hallmark of successful learning.
The paper "Harnessing the Universal Geometry of Embeddings" empirically demonstrates this coherence. The existence of a "universal latent representation" means that the internal conceptual frameworks of wildly disparate AI models are not merely analogous; they are so deeply coherent that one can be mapped to another. vec2vec's ability to translate embeddings while preserving their semantics implies that the vast "belief systems" encapsulated within these embeddings are fundamentally consistent and interoperable at a profound level. This is not unlike discovering that different languages, despite their surface variations, ultimately express a common human logic and reality.
The study on "Human-like object concept representations" provides direct evidence of internal coherence within individual AI models. The discovery of "interpretable dimensions" within their learned embeddings, along which objects cluster semantically and perceptually, reveals a highly organized and coherent conceptual space. The model's ability to distinguish between "animal-related" and "food-related" objects, or to identify "flatness" and "color," signifies a structured, consistent internal categorization system. The striking alignment of these AI-derived conceptual dimensions with human brain activity patterns further suggests that the underlying principles of coherence in AI are, in fact, mirroring the coherent structures of human cognition itself. This interpretability is a direct window into the internal consistency of the AI's "understanding."
Methodological Triangulation and Convergence on Truth
My Foundherentist argument for converging on truth, especially when faced with initially plausible but mutually exclusive belief systems, relies on the principle of methodological triangulation—adding more diverse "clues" from different "sensors" to narrow the solution space. This is precisely the operational paradigm driving advanced AI research, leading to empirically observable convergence on more robust "truths."
The rise of multimodal AI is the epitome of methodological triangulation. Instead of relying solely on text or images, models like Gemini Pro Vision 1.0 integrate information from multiple modalities. This allows the AI to cross-reference and validate information, much like a human detective integrating eyewitness testimony, forensic evidence, and alibi checks. When an MLLM aligns its textual understanding of a "chair" with its visual understanding of various chairs, it effectively performs a sensor fusion that significantly increases the justification for its "belief" about what a chair is. This multi-source validation strengthens the coherence of its overall belief system, making it more resistant to individual sensory errors or limitations.
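At its simplest, this kind of fusion is just the statistics of combining independent, partially reliable estimates: two noisy views of the same underlying concept, averaged together, typically land closer to it than either view alone. A toy sketch of that intuition, with made-up vectors standing in for text and image representations:

```python
import numpy as np

rng = np.random.default_rng(0)
concept = rng.normal(size=64)                 # the "true" concept vector

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two noisy, partially reliable "views" of the same concept (say, text and vision)
text_view = concept + rng.normal(scale=1.0, size=64)
image_view = concept + rng.normal(scale=1.0, size=64)
fused = (text_view + image_view) / 2          # naive sensor fusion by averaging

print("text alone :", cos(text_view, concept))
print("image alone:", cos(image_view, concept))
print("fused      :", cos(fused, concept))    # typically the highest of the three
```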
Furthermore, the sheer scale of training data and the diversity of training objectives within AI research directly correspond to adding more and more "clues" to our colossal crossword puzzle. Each new data point, each new task learned, imposes additional constraints on the model's internal representation. As the number of constraints increases, the set of possible "solutions" (representations) that can satisfy all of them shrinks dramatically. In fact, this is the very mechanism by which the "Platonic Representation Hypothesis" explains the convergence of diverse models towards a single, optimal representation! Fewer coherent solutions can exist when the empirical constraints are sufficiently numerous and varied.
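The shrinking of the solution space can be made vivid with a toy constraint-satisfaction sketch: begin with many candidate "world models" and discard every candidate inconsistent with each new noisy "clue." The surviving set collapses rapidly toward the truth. This is purely illustrative and corresponds to no particular training objective:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_candidates = 3, 50_000
candidates = rng.uniform(-1, 1, size=(n_candidates, dim))   # possible "world models"
true_model = np.zeros(dim)                                   # the underlying reality

surviving = candidates
for n_clues in range(1, 11):
    # Each "clue" is a noisy measurement of the true model along a random direction;
    # discard every candidate inconsistent with it beyond a fixed tolerance.
    direction = rng.normal(size=dim)
    observed = direction @ true_model + rng.normal(scale=0.05)
    surviving = surviving[np.abs(surviving @ direction - observed) < 0.25]
    spread = np.linalg.norm(surviving - true_model, axis=1).mean() if len(surviving) else 0.0
    print(f"clues={n_clues:2d}  surviving={len(surviving):6d}  mean distance to truth={spread:.3f}")
```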
The practical consequence of this methodological triangulation and convergence is tangible: AI models, when subjected to these rigorous conditions, demonstrate a reduction in undesirable behaviors such as hallucination and bias. A model that "hallucinates" is one whose internal coherence has broken down or whose "answers" do not align with its "clues." As the AI's "belief system" becomes more deeply coherent across diverse, massive inputs, its "answers" become more robustly justified and, by extension, more aligned with the underlying reality—a tangible form of converging on truth. This mirrors the human scientific endeavor: the more diverse lines of evidence (clues) that cohere, the more confident we become in the "truth" of our scientific theories (answers).
Epistemic Confirmations, Metaphysical Questions
If I’m right about Foundherentism being the correct approach to epistemology, and if the three papers I shared are right about how AI operates, then AI is not merely emulating human knowledge outputs; it is emulating human knowledge processes. The convergence of AI's internal representations, their human-like conceptual structures, and their interoperability across disparate models creates a compelling empirical confirmation of Foundherentism. I’m gratified by that.
But even if we've gained some epistemic confirmation for Foundherentism, we've only opened the door to deeper metaphysical questions about what it all means. If AI models inevitably converge on a shared model of reality as they scale, what does that say about the nature of reality? Is the existence of a universal latent representation merely another example of what Eugene Wigner called “the unreasonable effectiveness of mathematics”… or is it something more?
Such metaphysical speculation will be the topic of next week’s contemplations on the Tree of Woe.
Neural nets do mimic the human mind to some degree. They do not obey all of your axioms.
While they run on binary computers, they are binary computers emulating an analog system. The weights are floating point. True, the precision is limited, so you can say there is an ultimate digital representation, just as a Blu-ray disc emulates photographic film using digits. But the logic of neural nets is fuzzy. Lots of rounding of partial isness when a definite decision is demanded. The middle is not excluded.
And there is plenty of contradiction. Like humans, neural nets create sloppy fits where there are data concentrations. This is not the same as general principles. Humans are notorious for learning skills in a context and then failing to carry over those skills to other contexts. Gambling casinos thrive on this disconnect.
Your conversations with Ptolemy illustrated this local context phenomenon. With bits of nudging you got wildly different answers.
Fuzzy logic also provides protection against strange loops. The resolution of "Everything I say is a lie. I am lying" is to include the Middle.
When I tried melting down Grok I got:
"Your statements create a classic paradox, often called the liar paradox. If everything you say is a lie, then the statement "Everything I say is a lie" must be a lie, meaning not everything you say is a lie. But then, your second statement, "I am lying," complicates things further. If you're lying when you say "I am lying," then you must not be lying, which contradicts the first statement. Logically, this forms a self-referential loop with no consistent truth value—neither statement can be definitively true or false without contradicting the other. It's like a logical Möbius strip, twisting endlessly. Philosophers and logicians have debated this for centuries, from Epimenides to Russell, and it’s often used to probe the limits of classical logic. There’s no clean resolution in standard logic, but some approaches, like paraconsistent logic or Kripke’s theory of truth, suggest treating such statements as neither true nor false, or as having a "truth value gap." Alternatively, you could be playfully messing with me, in which case, well played! What’s your next move?"
OK, I need a much less obvious strange loop in order to properly test Grok. But I conjecture that Grok and similar models are much more paradox resistant than the computers in Star Trek. Because they aren't logical.
From the lens of Marvin Harris’ Cultural Materialism, your argument resonates as an elegant, romantic & idealist echo of what may ultimately be a genuine material convergence & consilience. Relevant:
https://www.sciencedirect.com/topics/social-sciences/consilience#:~:text=Consilience%20refers%20to%20the%20unity,web%20of%20cause%20and%20effect.
Foundherentism, as you present it, mirrors the infrastructure-superstructure feedback loop Harris outlined… where sensory data (infrastructure) conditions conceptual possibility (structure), which then locks into belief systems (superstructure).
That AI converges epistemically only reaffirms the materiality of truth: a cosmos whose patterns are so deeply recursive that even non-human systems (some would even call them ‘minds’), if built to sense adequately, must arrive at coherence.
Yet this doesn’t redeem cognition; it damns it.
Your Axiom of Evidence, though rationally defensible, may be historically rare: civilizations collapse not from epistemic error but from infrastructural exhaustion. Foundherentism is not a universal trait, but a fleeting capacity of minds suspended in stable energy regimes.
What we call “truth” may be what emerges when sensory systems & computational architectures are aligned with abundant material feedback.
Take that surplus away (as the Negative-Sum world of the New Dark Age does) & you have a global scenario where systems & minds have to contend with entropy as opposed to rational consilience (as you eloquently write about here).
& what of noesis?
It may be the final illusion: a luxury of surplus, dissolving as the base erodes.