sotto voce

Testimony Without a Testifier

Reliabilism asks whether the process is reliable. It doesn’t ask what the source owes the one who relies on it.

But when you trust a human expert’s diagnosis of your bug, something beyond reliability seems to be in play. The expert can be held accountable. They stake something on what they say. They stand in a certain relationship to their assertions, one that involves responsibility: the possibility of being wrong and of being held answerable for it. If they mislead you, something has gone wrong beyond mere inaccuracy.

Testimony theory takes this seriously. It asks what’s required of sources and what’s required of hearers—not just whether the process is reliable. The question isn’t only “does this tend to produce true outputs?” but “what kind of relationship makes testimony work as a source of knowledge?”

AI outputs look like testimony. They’re statements that convey information, offered in response to queries, presented as answers. But if testimony requires a testifier—someone who asserts, who can be challenged, who bears responsibility for what they say—then AI outputs may be testimony without anyone to testify.


The Traditional Picture

Testimony has long been recognized as a fundamental source of knowledge. Most of what we know comes from others. I believe that smoking causes cancer, that the earth orbits the sun, that there was a Roman Empire—not because I’ve verified these claims myself, but because I’ve received them from sources I trust.

Hardwig’s analysis of epistemic dependence captures the structure. When I believe something on expert testimony, I’m trusting that a chain of epistemic dependence terminates with someone who actually knows. The physicist at the end of the chain has done the experiments, understands the theory, possesses the evidence. My knowledge is vicarious—I know because I know that they know.

The traditional picture of testimony treats this as transmission. The speaker has a belief with certain epistemic properties (justified, warranted, known). Through testimony, those properties are transmitted to the hearer. I come to know because the speaker knows, and their testimony conveys that knowledge to me.

This picture assumes a lot. It assumes there’s a speaker with a belief. It assumes the speaker is sincere, actually believing what they say. It assumes the speaker has the relevant epistemic standing: their belief is itself justified or known. And it assumes the hearer’s role is simply to receive what the speaker transmits.


Lackey’s Challenge

Jennifer Lackey argues this picture is wrong. We don’t learn from one another’s beliefs—we learn from one another’s words.

Her key case: Bertha suffered a brain injury that creates a precise mismatch between her beliefs and her statements. When she sees a deer, she believes it’s a horse but reports seeing a deer. Her perceptual beliefs are unreliable, but her testimony is perfectly reliable. Her statements track truth even though her beliefs don’t.

Henry, her neighbor, asks what she saw on the hiking trail. Bertha says “I saw a deer.” A deer was in fact there. Henry has good evidence of Bertha’s reliability as a testifier. Does Henry gain knowledge from her testimony?

Lackey argues yes. Henry knows there was a deer, even though Bertha doesn’t believe there was a deer, doesn’t know there was a deer, has no justified belief about deer at all. The chain of epistemic dependence doesn’t terminate in Bertha’s knowledge—she doesn’t have any. Yet testimony works.

The implication: speaker belief is not necessary for hearer knowledge. What matters is whether the statement is reliable, not whether the speaker has the corresponding belief.

This opens space for AI. If we learn from words rather than beliefs, then AI’s lack of beliefs might be epistemically irrelevant. The question becomes: are AI statements reliable?


The Debugging Case

Return to the bug. You ask the AI what’s wrong with your code. It says: “Off-by-one error in the loop on line 47.”

On Lackey’s view, the question isn’t whether the AI believes this, or knows this, or has justified belief. The question is whether the statement is reliable—whether it tracks truth in the relevant way.

But here the case strains even Lackey’s more permissive framework.

Lackey’s Bertha is an unusual case, but her statements are counterfactually reliable. If there hadn’t been a deer, Bertha wouldn’t have said she saw a deer. Her statements are sensitive to the facts, even though her beliefs aren’t. The mechanism that produces her testimony—whatever strange process the brain injury created—tracks truth.

AI outputs may not have this property. The model produces confident-sounding diagnoses whether or not they’re correct. The surface presentation doesn’t vary with accuracy. “Off-by-one error in line 47” sounds the same whether the diagnosis is right or wrong. The AI would have produced a confident statement even if the underlying claim had been false.

This is closer to another of Lackey’s cases: the Almost a Liar. Jill tells Phil she saw an orca whale while boating. She did see one, and she believes she did. But Jill is starting a whale-watching business, and she would have reported seeing an orca whether or not she actually saw one—to promote her business. Phil’s belief, formed on Jill’s testimony, fails to be knowledge. The statement isn’t counterfactually reliable. Jill would have said the same thing either way.

AI outputs may have this structure. The training process produces systems that generate plausible-sounding responses, not systems whose outputs are counterfactually tied to truth. The model might often be right—but it would have produced a confident-sounding output regardless.
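The contrast between Bertha and Jill can be made concrete with a toy model. This is only a sketch under simplifying assumptions of my own: each source is reduced to a function from "does the fact hold?" to "does the source assert it?", and the names Source, report, asserts_truly, and is_sensitive are invented for illustration rather than taken from Lackey. It says nothing about how any real AI system behaves; it only shows what the counterfactual test is checking.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Source:
    """A hypothetical testifier, reduced to what it would say in each case."""
    name: str
    # Maps "does the fact hold?" to "does the source assert that it holds?"
    report: Callable[[bool], bool]


def asserts_truly(source: Source, fact_holds: bool = True) -> bool:
    """Actual-world check: the fact holds and the source asserts it."""
    return fact_holds and source.report(fact_holds)


def is_sensitive(source: Source) -> bool:
    """Counterfactual check: asserts the fact when it holds, not otherwise."""
    return source.report(True) and not source.report(False)


# Bertha's statements follow the facts, whatever her beliefs are doing.
bertha = Source("Bertha", report=lambda deer_present: deer_present)
# Jill would report an orca whether or not one was there.
jill = Source("Jill", report=lambda orca_present: True)

for source in (bertha, jill):
    print(source.name, asserts_truly(source), is_sensitive(source))
# Bertha True True
# Jill True False
```

On this toy picture the two sources are indistinguishable if all you ever see is the actual-world report. The difference shows up only when you ask what each would have said had the facts been otherwise, and that is the property the worry about AI outputs turns on.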

This is distinct from the accountability question—whether there’s someone to hold responsible. An output could be counterfactually reliable without anyone being accountable for it, or someone could be accountable for an output that isn’t counterfactually reliable. The concerns are independent, and AI outputs may fail on both.


Three Positions

The relationship between AI outputs and testimony admits several interpretations. These aren’t settled positions to choose among—they’re lenses that reveal different features of the problem.

The permissive view: AI outputs are testimony, or close enough. Lackey’s framework extends naturally. What matters is statement reliability, not speaker properties. If AI outputs are reliable in a given domain, they can ground knowledge. We should treat AI like a sophisticated version of Bertha—a source whose statements track truth despite lacking the usual mental states.

This view has appeal. It sidesteps questions about AI consciousness, beliefs, or understanding. It focuses on what we can assess: does this system tend to produce accurate outputs in this domain? If yes, reliance is warranted.

But it may extend Lackey’s framework beyond its intended scope. Lackey’s cases involve human speakers whose statements and beliefs come apart. The framework still assumes a speaker—someone who produces the statement, who could in principle be questioned, who exists in a social context of testimony exchange. AI may be a more radical departure: not testimony with unusual properties, but something that isn’t testimony at all.

The restrictive view: AI outputs are not testimony. Testimony requires a testifier—an agent who asserts, who stakes something on the claim, who can be held accountable. AI lacks these properties entirely. There’s no one who asserts “off-by-one error in line 47.” There’s no one to hold accountable if the diagnosis is wrong. The output emerges from a process, but no one stands behind it.

On this view, applying testimony theory to AI is a category mistake. We need a different framework—perhaps reliabilism (which doesn’t require a source with particular properties), perhaps something new.

The third category view: AI outputs are neither testimony nor mere instrument readings. They’re a distinct epistemic category—what Ori Freiman calls “technology-based belief.” Testimony involves human sources with mental states. Instrument readings involve simple causal chains (the thermometer responds to temperature). AI outputs are more complex than instruments but lack the features that make testimony work. They require their own epistemological treatment.

Freiman’s proposal is helpful precisely because it refuses to force AI outputs into existing categories. But it also makes clear what remains to be explained. Calling beliefs “technology-based” identifies their source, not their warrant. It does not yet tell us what epistemic norms govern such beliefs, how responsibility is distributed when they go wrong, or what practices ought to replace the testimonial checks that no longer apply. In that sense, the proposal clarifies the shape of the problem without settling it.

I find it difficult to say which of these is correct. The frameworks don’t settle the question—they offer different ways of carving up the conceptual space, each with different implications for how we should relate to AI outputs.


The Accountability Gap

What’s at stake in this classification?

Hardwig’s analysis of epistemic dependence emphasizes something that pure reliability assessments miss: the structure of accountability.

When I trust an expert’s diagnosis, I’m not just betting that they’re reliable. I’m entering a relationship where they bear responsibility for what they tell me. If they’re wrong, they’ve let me down in a way that goes beyond mere inaccuracy. They represented themselves as having knowledge they didn’t have. They can be challenged, asked for reasons, held to account.

This accountability structure does epistemic work. It gives me something to assess beyond raw reliability. I can ask: Is this person operating in good faith? Do they have reason to deceive me? Are they under social pressure that might distort their testimony? Do other experts agree? These checks don’t require me to evaluate the content of what they say—I can’t do that, which is why I’m depending on them in the first place. But they let me assess the conditions under which testimony is trustworthy.

AI outputs lack this structure. There’s no one operating in good faith or bad faith. No one has reason to deceive me—or reason not to. The concept of social pressure doesn’t apply. And “do other experts agree?” doesn’t translate—the AI’s output doesn’t represent expert consensus in any straightforward way.

The accountability gap means the checks appropriate for human testimony don’t work for AI. I can’t assess the AI’s good faith because it doesn’t have faith, good or bad. I can’t ask whether it’s lying because it can’t lie—lying requires intending to deceive. I can’t hold it responsible because there’s no one there to hold.

Where does the chain of epistemic dependence terminate? When I rely on the AI’s diagnosis, who knows that there’s an off-by-one error?

Not the AI—it doesn’t know anything, on most accounts. Not the developers—they didn’t make this specific claim. Not the training data—that’s a corpus, not a knower. Not the users who provided feedback during training—they responded to outputs, they didn’t author this one.

The chain seems to terminate in… nothing. Or in a process that produces outputs without anyone knowing the specific things those outputs claim.


The Hearer’s Position

Even if AI outputs could count as testimony, our position as hearers is unusual.

Lackey emphasizes that hearer conditions matter. Her case of Bill, who is compulsively trusting of Jill, shows that even reliable testimony doesn’t confer knowledge on a hearer who can’t be appropriately sensitive to defeaters. Bill would believe Jill no matter what—he’s incapable of doubting her. His belief fails to be knowledge because his relationship to the testimony is defective.

For AI, many users may be in a position closer to Bill’s than we’d like to admit. The confident presentation of AI outputs, combined with a lack of the expertise needed to evaluate them independently, can produce a kind of compulsive trust: not psychological compulsion, but a practical inability to do otherwise. If I can’t assess whether the diagnosis is correct, and the AI presents it confidently, what am I supposed to do but accept it?

Hardwig’s analysis offers some comfort. The rational layman, he argues, can legitimately assess conditions of trustworthy testimony without assessing content. I can check whether the expert is biased, operating in good faith, under distorting pressures. I don’t need to evaluate their physics to evaluate their trustworthiness.

But these checks translate poorly to AI. The biases in AI systems are real but different in kind: statistical patterns in training data, not personal interests or motivated reasoning. Good faith doesn’t apply. The pressures on AI outputs come from how the system was built and trained (training objectives, reinforcement learning from human feedback), not from a social setting the system occupies.

We’re left in an awkward position. The practices developed for receiving human testimony—the checks and filters and assessments that let us calibrate trust—don’t map cleanly onto AI outputs. We receive testimony-shaped outputs without the epistemic practices suited to them.


Where Testimony Theory Goes Silent

Testimony theory illuminates several things about AI outputs.

The shift from beliefs to statements opens conceptual space. Lackey’s framework shows that what matters is statement reliability, not speaker mental states. This means AI’s lack of beliefs isn’t automatically disqualifying—the question is whether the outputs track truth reliably.

Hearer conditions matter independently. Even if AI outputs are reliable, users can fail to gain knowledge if they’re not appropriately positioned—if they can’t recognize defeaters, can’t calibrate their trust, can’t maintain the right kind of sensitivity to evidence.

Something is missing. The accountability structure that characterizes human testimony (the ability to challenge, to demand reasons, to hold sources responsible) is absent from AI outputs. Whether that structure is essential to testimonial knowledge or merely typical of it is a question testimony theory raises but doesn’t settle.

Testimony theory goes silent on deeper questions. Whether AI outputs count as testimony at all remains unclear. The category question—testimony, instrument, or something else—isn’t resolved by the frameworks. And what epistemic practices are appropriate for AI outputs specifically is left open.

These silences mark the boundaries of what testimony theory asks. The framework was developed to analyze how we learn from one another’s words. Whether AI outputs are “one another’s words” in the relevant sense is a question at the edge of the framework’s scope.


What Comes Next

Testimony theory asks about the relationship between speaker and hearer. But there’s a prior question: What makes something an assertion at all?

Brandom’s inferentialism offers an answer. To assert is to undertake a commitment in the space of reasons. It’s to make oneself responsible for a claim, liable to challenge, obligated to provide reasons if asked. Assertion isn’t just producing a sentence with a certain form—it’s making a move in the game of giving and asking for reasons.

Can AI outputs be assertions in this sense? If there’s no one to hold the commitment, is anything being asserted? The AI produces sequences that look like assertions—“Off-by-one error in line 47” has the form of a claim. But is the AI undertaking a commitment? Is it entering the space of reasons? Or are its outputs something else—patterns that mimic assertions without being moves in the game?

If AI outputs aren’t really assertions, then perhaps the testimony question is moot. You can’t have testimony without assertion. But if AI outputs are assertions in some sense, we need to understand what sense—and what that means for our epistemic relationship to them.

That’s the question for the next post.