sotto voce

Reliabilism Applied

Reliabilism offers the cleanest answer to whether we should trust AI outputs. If the process that generates those outputs tends to produce true ones, beliefs based on them are justified. That's the claim. Applying it to AI reveals whether the claim holds and where it strains.

The outputs that matter here are those we rely on epistemically—outputs that function as knowledge claims. This includes the obvious cases: assertions, explanations, answers to questions. But it also includes AI-generated code (an implicit claim about what will work), analysis (a claim about what the data shows), recommendations (claims about what's best), and agent actions (implicit claims about what the user wants done). Wherever we rely on AI output as if it were knowledge, the epistemic question arises.

Consider a concrete case. You ask an AI to help debug code. It tells you the bug is an off-by-one error in a loop on line 47. It recommends changing < to <=. If you permit, it implements the fix. Each step involves reliance. Is that reliance warranted?

Reliabilism says: it depends on whether the process is reliable. But which process?


What Counts as "the Process"?

When you rely on the AI's diagnosis of your bug, what process produced it?

The candidates range from general to specific. AI inference. Large language model inference. This specific model's inference. This model on coding tasks. This model on debugging tasks in Python. This model on debugging this type of error given this codebase and this prompt.

Each description picks out a different process with a different reliability profile. Reliabilism needs to specify which level is relevant for assessing justification. It never has. The generality problem is decades old and unsolved for human cognition. For AI, the problem intensifies.
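To make the generality problem concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the Trial fields, the tiny track record, and the reliability helper are hypothetical, and the numbers are made up. The point is only that one and the same track record yields a different reliability figure for each description of "the process."

```python
from collections import namedtuple

# Hypothetical track record. Fields and values are invented for illustration.
Trial = namedtuple("Trial", "task language bug_type correct")

RECORD = [
    Trial("debugging", "python", "off_by_one", True),
    Trial("debugging", "python", "off_by_one", True),
    Trial("debugging", "python", "race_condition", False),
    Trial("debugging", "rust", "off_by_one", False),
    Trial("debugging", "rust", "race_condition", False),
    Trial("codegen", "python", "n/a", True),
    Trial("codegen", "python", "n/a", True),
    Trial("codegen", "rust", "n/a", True),
]


def reliability(record, **description):
    """Fraction of correct outputs among trials matching the description."""
    matching = [
        t for t in record
        if all(getattr(t, field) == value for field, value in description.items())
    ]
    if not matching:
        return None  # no track record exists at this level of specificity
    return sum(t.correct for t in matching) / len(matching)


# The same record, described at four levels of generality.
print(reliability(RECORD))
print(reliability(RECORD, task="debugging"))
print(reliability(RECORD, task="debugging", language="python"))
print(reliability(RECORD, task="debugging", language="python",
                  bug_type="off_by_one"))
# A description nobody has tested yet has no reliability figure at all.
print(reliability(RECORD, task="debugging", language="python",
                  bug_type="logic_error"))
```

Nothing in the code says which call is the right one to consult. That choice is exactly what reliabilism leaves open.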

The same model, used differently, exhibits different reliability. A vague "fix my code" prompt produces different outputs than a detailed prompt with context, error messages, and iteration. An agent given clear constraints performs differently than one given ambiguous instructions.

This raises a harder question: is the user part of the process?

If yes, then reliability isn't a property of the AI system. It's a property of the human-AI dyad. The relevant question becomes: is this user, working with this model, in this way, reliable at producing correct diagnoses?

I find myself genuinely uncertain here. This isn't a difficulty I'm noting for completeness—it's a difficulty that threatens to undermine the application before it starts. If we can't say what the process is, how can we assess its reliability?

One response: Goldman's reliabilism already allows that the same cognitive faculty used in different conditions has different reliability profiles. Perception in fog versus daylight. Maybe human-AI interaction is similar—one process with contextual variation. The AI system is the process; the user's contribution is part of the conditions under which it operates.

But this feels like a dodge. When I prompt poorly and get bad output, is that the AI process being unreliable under certain conditions? Or is it a different process—one that includes my prompting as a constituent, not just a condition? I don't know how to settle this. The frameworks I'm applying don't have a clear answer, and I'm not sure the question has been posed in quite this form before.

What follows treats the AI system as the process—a simplifying assumption, not a resolution. The question of human-AI dyads as unified processes returns when we examine epistemic cooperation directly.

For now: even granting the simplifying assumption, reliabilism faces further problems.


Synchronic Opacity

Reliability is domain-relative. A process can be highly reliable for some outputs and unreliable for others. The model that's excellent at diagnosing common bugs may be unreliable on edge cases, unfamiliar languages, or subtle logic errors.

You ask about your bug. You get an answer: "off-by-one error in the loop on line 47." Is this a domain where the model is reliable?

You can't tell. The output doesn't signal which domain it's in. The model's confident tone is identical whether it's diagnosing a pattern it's seen a thousand times or confabulating about something it's never encountered. The reliability of the output depends on features you can't observe: how well this bug type is represented in training, whether the diagnosis requires reasoning the model handles well, whether your codebase has patterns that confuse it.

This opacity applies across output types. The assertion "it's an off-by-one error" might be reliable or confabulated. The recommendation "change < to <=" might be correct or might break something else. An agent's implementation of the fix might work or might introduce new bugs. In each case, you receive confident output without access to domain boundaries.
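A small Python sketch shows how the same recommendation can be right in one context and wrong in another. The function sum_through and its inclusive flag are hypothetical, invented here to stand in for the loop in the example. The flag plays the role of the suggested edit: False is the original "<", True is the recommended "<=".

```python
def sum_through(values, last, inclusive):
    """Sum values from index 0 up to `last`; `inclusive` picks `<=` over `<`."""
    total, i = 0, 0
    while (i <= last) if inclusive else (i < last):
        total += values[i]
        i += 1
    return total


data = [10, 20, 30]

# Case 1: `last` is meant to be the final index to include.
# With `<` the loop drops data[2]; changing to `<=` is the correct fix.
print(sum_through(data, 2, inclusive=False))  # 30, one element short
print(sum_through(data, 2, inclusive=True))   # 60, as intended

# Case 2: `last` is the length of the list.
# `<` was already correct; the same edit now reads past the end.
print(sum_through(data, len(data), inclusive=False))  # 60
try:
    sum_through(data, len(data), inclusive=True)
except IndexError as exc:
    print("the same fix breaks here:", exc)
```

The edit is textually identical in both cases. Whether it is a fix or a new bug depends on what the bound means in the surrounding code, which is the kind of fact the confident recommendation does not display.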

The asymmetry is structural: presentation stays constant while reliability varies. The model's outputs don't signal their own reliability, so you can't distinguish high-reliability outputs from low-reliability ones by their surface features. At any given moment, you can't tell which reliability domain you're in.


Diachronic Opacity

Reliabilism doesn't require you to understand how the process works. That's a feature of the framework. I don't know how my visual system processes light into perception. I can still have justified beliefs through vision because vision is reliable.

But vision's reliability is established. It's been tested by natural selection over millions of years. It's corroborated by extensive track records under varied conditions. Its failure modes are known and learnable—optical illusions, low light, peripheral vision limitations. We've had time to calibrate.

AI systems are different. They're novel processes with short track records. They change rapidly—the model you used last year differs from today's, and next year's will differ again. Their failure modes aren't fully characterized, and they shift as capabilities change. There's no evolutionary vetting, no long-term empirical foundation.

How reliable is this model at debugging Python code? We have some benchmarks. But benchmarks don't capture your codebase, your bug type, your prompting pattern. The track record that would justify trust at the relevant level of specificity doesn't exist yet.

The verification problem varies by what you're relying on. For the assertion that this is the bug, you can sometimes verify after the fact—but verification takes effort and expertise, and you're often relying on the AI precisely because you lack that expertise. For the recommendation to fix it a certain way, you can test the fix—but testing is incomplete, and bugs hide. For an agent action implementing the fix, consequences may be irreversible before you can assess whether the action was reliable.

Viewed over time, we don't yet have the track record needed to learn the reliability profile at the level of specificity that matters. Reliance may be running ahead of evidence.


The Calibration Gap

Even if a process is reliable, warranted reliance requires calibration. Your confidence should track the process's actual reliability. AI outputs create a calibration gap through two distinct mechanisms.

The first is presentation miscalibration. AI outputs present uniform surface confidence regardless of actual reliability. The model's diagnosis "off-by-one error in line 47" sounds the same whether it's highly probable or a guess. Code appears syntactically correct whether or not it handles edge cases. Recommendations are delivered with equal confidence whether well-grounded or shallow. The surface features don't track epistemic status.

The second is reception miscalibration. Users treat outputs as more reliable than they are. Confident presentation encourages confident reception. Users who lack expertise to evaluate outputs independently—often the same users relying on AI in the first place—are most susceptible.

These problems compound. The model says "off-by-one error" with no hedging. You implement the fix. It works. Next time, same confident tone, but the diagnosis is wrong. The fix breaks something else. Nothing in the presentation distinguished these cases. What have you learned about calibration? The model's presentation gave you nothing to work with.

Users who interact with AI poorly—vague prompts, no iteration, no verification—get less reliable outputs. But the surface confidence remains constant. The calibration gap is widest for users least equipped to close it.

There's a further complication. If the human-AI interaction is the process, then reception miscalibration isn't external to reliability—it's part of it. A user who consistently over-trusts is contributing to an unreliable process. The user's epistemic failure becomes part of the reliability profile of the dyad.

Reliability of the process may be necessary for warranted reliance. It's not sufficient. Calibration matters—both how AI presents and how users receive.
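One way to picture the two gaps is as the difference between a confidence level and the observed hit rate. The sketch below is in Python and rests on loud assumptions: AI outputs don't come with numeric confidence, so the "presented" and "received" figures are hypothetical stand-ins for how sure the output sounds and how sure the user feels, and the outcomes are invented.

```python
def calibration_gap(confidences, outcomes):
    """Mean confidence minus observed accuracy; positive means overconfidence."""
    if len(confidences) != len(outcomes) or not outcomes:
        raise ValueError("need equal-length, non-empty inputs")
    mean_confidence = sum(confidences) / len(confidences)
    accuracy = sum(outcomes) / len(outcomes)
    return mean_confidence - accuracy


# Hypothetical record of ten diagnoses: six correct, four wrong.
outcomes = [True, True, True, False, True, False, True, False, True, False]

# Presentation: the same confident tone on every output (assumed value).
presented = [0.95] * len(outcomes)
# Reception: a user who trusts every output about equally (assumed value).
received = [0.90] * len(outcomes)

print(round(calibration_gap(presented, outcomes), 2))  # presentation gap
print(round(calibration_gap(received, outcomes), 2))   # reception gap

# Uniform confidence carries no signal: it takes a single value,
# so it cannot separate the hits from the misses.
print(len(set(presented)) == 1)
```

A gap of zero would not show that any single output is trustworthy. It would only show that confidence, on average, matches the hit rate, which is what calibration asks for.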


What Reliabilism Offers

After these complications, what does reliabilism still offer?

Consider what other frameworks would ask about the debugging case. Testimony theory would ask: is the AI a testifier? Does it assert? Can it be held accountable? Inferentialism would ask: does the AI grasp the concept of "bug"? Is it in the space of reasons?

Reliabilism asks something simpler: does the process tend to produce correct diagnoses?

Whether simpler is better is precisely what the other frameworks contest. Testimony theory thinks the source-hearer relationship matters independently. Inferentialism thinks genuine understanding matters independently. Reliabilism says: if the process is reliable, those questions are secondary. The disagreement is real, and I'm not resolving it here—only noting that reliabilism's simplicity is either a virtue or an evasion, depending on what the other frameworks reveal.

What reliabilism undeniably provides is tractability. We can study reliability empirically. We could run this model on a thousand debugging tasks of this type and measure accuracy. That's a tractable study. We can identify conditions under which reliability improves or degrades. We can compare systems. We don't need to settle questions about AI minds, AI understanding, or AI accountability to make progress on reliability.
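The kind of study described above can be sketched in a few lines of Python. This is a sketch under assumptions, not a real evaluation harness: diagnose is a hypothetical callable standing in for the model, the task list is invented, and the Wilson score interval is my choice of a standard way to put error bars on an observed accuracy.

```python
import math


def measure_accuracy(diagnose, tasks):
    """Run `diagnose` on (code, known_diagnosis) pairs; return (hits, trials)."""
    results = [diagnose(code) == expected for code, expected in tasks]
    return sum(results), len(results)


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("need at least one trial")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return centre - half, centre + half


def always_off_by_one(code):
    """Stub 'model' for illustration: gives the same diagnosis every time."""
    return "off_by_one"


# Hypothetical tasks: (buggy code, diagnosis established independently).
tasks = [
    ("while i < n: ...", "off_by_one"),
    ("for x in xs[:-1]: ...", "off_by_one"),
    ("if a = b: ...", "syntax_error"),
]

hits, trials = measure_accuracy(always_off_by_one, tasks)
print(hits, trials, wilson_interval(hits, trials))

# With a thousand tasks the interval tightens considerably.
print(wilson_interval(870, 1000))
```

The interval is only as informative as the task set is representative. A narrow interval on benchmark tasks says little about your codebase, which is the generality problem again in statistical dress.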

The debugging case shows this. We don't need to know whether the model "understands" Python to ask whether its debugging outputs are accurate. We can test. We can measure. We can improve.

Reliabilism provides a tractable empirical question—does this work?—that applies across output types. It offers a framework that takes AI seriously as an epistemic source without requiring claims about AI cognition. These are real advantages, not because reliabilism is complete, but because it gives us something to do while other questions remain open.


Where Reliabilism Goes Silent

Reliabilism illuminates reliability. It goes silent on other questions.

It doesn't tell us which description of the process is correct. Reliabilism needs an answer to the generality problem but doesn't provide one.

It doesn't tell us how users should assess reliability when they can't observe domain boundaries or track records. Reliabilism is a theory of what makes belief justified, not a guide to epistemic practice.

It doesn't argue for its position on whether it matters, independent of reliability, that the source is AI rather than human. Reliabilism's answer is that it doesn't matter: reliability is what matters. But this is an answer by stipulation, not argument. Whether source type has independent epistemic significance is a question reliabilism sets aside.

On action specifically, reliabilism strains. The framework was developed for belief-forming processes—processes that produce representations that can be true or false. Actions complicate this. What does "reliable" mean for the AI implementing your code fix?

Reliable at executing your stated intention? Reliable at achieving good outcomes?

These come apart. The AI implements a fix that matches your intention exactly—and introduces a security vulnerability you didn't anticipate. Was the process reliable? At executing intention, yes. At achieving good outcomes, no.

For reliabilism to handle this, it would need a theory of action-reliability distinct from belief-reliability—some account of what makes an action-producing process reliable, and reliable at what. It doesn't have one. The framework was built for belief, and the extension to action isn't straightforward.

This doesn't mean reliabilism fails for agentic AI. It means the extension is genuinely uncertain—not a problem to be solved by careful application, but a question about whether the framework fits the phenomenon.

These silences aren't failures. They're the boundaries of what reliabilism asks. The framework illuminates one dimension—truth-conduciveness of the process—and goes silent on others. The silences indicate where other frameworks may speak.


What Comes Next

Reliabilism asks: is the process reliable?

It doesn't ask: what is owed between the source and the one who relies on it?

When you trust a human expert's diagnosis of your bug, something beyond reliability seems to be in play. The expert can be held accountable. They stake something on what they say. They stand in a certain relationship to their assertions—one that involves responsibility, the possibility of being wrong and answerable for it. If they mislead you, something has gone wrong beyond mere inaccuracy.

Testimony theory takes this seriously. It asks what's required of sources and what's required of hearers—not just whether the process is reliable. Can AI be held accountable? Does accountability matter for warranted trust, or is reliability enough?

Reliabilism says accountability is beside the point. Testimony theory disagrees. That's the question for the next post.