When an AI helps with a Daubert challenge.

Daubert is fundamentally a question of method. AI does not change the question. It can make the answer reproducible.

A Daubert challenge is not, at its core, a fight about whether an expert is smart, credentialed, or correct. It is a fight about whether the method the expert used is one the court should let a jury hear. The proponent of the testimony — the side offering the expert — bears the burden under Federal Rule of Evidence 702 to demonstrate that the testimony is reliable, helpful, and grounded in sufficient facts. The trial judge sits as a gatekeeper, applying the framework the Supreme Court set out in Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993).

When part of that method involves an AI pipeline — transcription, contradiction detection, speaker attribution, statement-level analysis — the gate gets narrower, not wider. The opposing side will (correctly) press hard on whether the model is reliable, whether the pipeline was run consistently, whether anyone can reproduce what the expert says happened. If the answers to those questions are vague, the testimony is in trouble before it ever reaches the jury.

Felarity does not make an AI “Daubert-proof.” Nothing does. But the attestation chain is engineered for exactly this kind of scrutiny, and it gives the proponent something concrete to put in front of the judge.

The Daubert factors, briefly

The Court in Daubert identified a non-exhaustive list of considerations relevant to whether scientific testimony rests on a reliable foundation:

Whether the technique can be (and has been) tested.
Whether it has been subjected to peer review and publication.
The known or potential error rate, and the existence of standards controlling the technique’s operation.
Whether the technique enjoys general acceptance within the relevant community.

Rule 702, as amended in 2023, sharpened the proponent’s burden of proof and made explicit that the court must find each requirement met by a preponderance of the evidence, that is, more likely than not. The catch-all reminder in Rule 702(d), that the expert’s opinion must reflect a reliable application of the principles and methods to the facts of the case, is doing more work post-amendment than it used to.

What the AI-skeptical side will say

Take the skepticism seriously, because it is mostly fair:

“The model is a black box. We can’t test it.”
“You ran it once, three months ago. We can’t reproduce the run.”
“Different versions of the model would give different answers.”
“You haven’t shown the error rate on inputs that look like ours.”
“Nobody in this field has accepted this technique.”

If the proponent has to answer those questions from memory, with screenshots, the testimony is exposed. The attestation chain exists so the proponent does not have to.

Mapping the chain to each factor

Testability — via hashed reproducibility

Each of the eight nodes in the chain (audio capture, transcription, contradiction detection pre-attribution, diarization, attribution binding, acoustic analysis, topology, final report) is hashed with SHA-256. The hashes are linked, Merkle-style, and the root is signed with Ed25519. The signing public key is published at /.well-known/felarity-signing-key.pem and the verifier endpoint is POST /api/verify. The opposing side, the court’s technical advisor, or an independent expert can request the input artifact and confirm, byte-for-byte, that the inputs at each stage match what is in the report. The method becomes testable in the literal sense: re-run the pipeline on the same audio, and the same hashes should appear.
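A minimal sketch of the idea in Python, using only the standard library. The article does not specify the exact chain encoding, so the linking rule here (each link hashes the previous link together with the SHA-256 of that stage's artifact) and the names `link_hash` and `chain_root` are illustrative assumptions, not Felarity's actual format. Checking the Ed25519 signature over the root is left out because it needs a third-party library.

```python
import hashlib

# The eight stage names come from the article; everything else is illustrative.
STAGES = [
    "audio capture",
    "transcription",
    "contradiction detection pre-attribution",
    "diarization",
    "attribution binding",
    "acoustic analysis",
    "topology",
    "final report",
]


def link_hash(previous_link: bytes, stage: str, artifact: bytes) -> bytes:
    """Hash one stage, binding it to every stage that came before it."""
    artifact_digest = hashlib.sha256(artifact).digest()
    material = previous_link + stage.encode("utf-8") + b"\x00" + artifact_digest
    return hashlib.sha256(material).digest()


def chain_root(artifacts: dict[str, bytes]) -> str:
    """Fold all stage artifacts, in pipeline order, into one root hash (hex)."""
    link = b"\x00" * 32  # assumed fixed starting value for the first link
    for stage in STAGES:
        link = link_hash(link, stage, artifacts[stage])
    return link.hex()


# Hypothetical artifacts standing in for real stage outputs.
original = {stage: f"output of {stage}".encode("utf-8") for stage in STAGES}
reported_root = chain_root(original)

# An independent recomputation over identical bytes reproduces the root.
assert chain_root(dict(original)) == reported_root

# A single changed byte at any stage changes the root.
tampered = dict(original)
tampered["transcription"] = b"output of transcription!"
assert chain_root(tampered) != reported_root
```

Under these assumptions, a reviewer would recompute the root from the produced artifacts, compare it with the root in the report, and then check the signature on that root against the published key.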

Peer review — via documented composition

Felarity is not a single model. It is a composition of components that are each described in published, peer-reviewed research: OpenAI Whisper large-v3 for transcription, pyannote-audio 3.1 for speaker diarization, and DeBERTa-v3 fine-tuned on MNLI for natural-language inference scoring of candidate contradictions. The composition itself, and the topology and acoustic layers we add on top, are described in our public documentation. None of this hides behind “proprietary.” A reviewer can read the papers for the underlying components, read our composition description, and form an opinion.

Known error rate — component-by-component

The honest version of this factor is that the end-to-end system does not have a single published error rate, because the end-to-end task (“find the consequential contradictions in a multi-party meeting”) is not a benchmark anyone runs. What the proponent can put on the record is the published error rate of each component on its standard benchmark (word error rate, or WER, for Whisper; diarization error rate, or DER, for pyannote; MNLI accuracy for DeBERTa-v3), plus our internal evaluation of contradiction precision and recall on a labelled corpus. We do not invent a single number for the whole pipeline. We document the components.
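For readers less familiar with the two metrics, precision and recall on a labelled corpus reduce to simple counts. The sketch below assumes each candidate contradiction carries a hypothetical string identifier and that reviewers have marked which ones are real. The identifiers and sets are made-up placeholders, not evaluation results.

```python
def precision_recall(flagged: set[str], labelled_true: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for flagged items against reviewer labels."""
    true_positives = len(flagged & labelled_true)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(labelled_true) if labelled_true else 0.0
    return precision, recall


# Placeholder identifiers only: four items flagged, three of them real,
# and five real contradictions in the labelled corpus overall.
flagged = {"c1", "c2", "c3", "c7"}
labelled_true = {"c1", "c2", "c3", "c4", "c5"}

precision, recall = precision_recall(flagged, labelled_true)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.60
```

Precision answers how often a flagged contradiction is real; recall answers how many of the real contradictions were flagged. Reporting both is what keeps a single headline number from hiding either kind of error.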

General acceptance — in the relevant community

Whisper, pyannote, and DeBERTa-v3 are accepted tools in the speech and NLP research communities and are deployed in production by enough organizations to be uncontroversial as building blocks. The forensic application of contradiction detection over meeting transcripts is newer, and the proponent should say so plainly rather than over-claim. The relevant community is still forming. That is a candor point, not a fatal one — Daubert never required unanimous consensus.

The Rule 702 catch-all — reliable application to these facts

This is the factor the chain helps with most. The court does not just want to know that the method is reliable in general. It wants to know that in this case, on this audio, the method was applied as designed. The attestation chain records: which model versions ran, in what order, on what input bytes, producing what intermediate outputs, leading to which final claims. The proponent can show the judge a single signed document and a verification endpoint. That is what reliable application to the facts of this case looks like in 2026.
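As an illustration, the per-stage facts listed above could be captured in a record like the one below. The `NodeRecord` type, its field names, and the `unexplained_inputs` check are hypothetical; the article does not publish the actual attestation schema, and the version strings are placeholders.

```python
import hashlib
from dataclasses import dataclass


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class NodeRecord:
    order: int                      # position of the stage in the run
    stage: str                      # e.g. "transcription"
    model_version: str              # which model version ran
    input_sha256: tuple[str, ...]   # hashes of the bytes this stage consumed
    output_sha256: str              # hash of the bytes this stage produced


def unexplained_inputs(source_sha256: str, records: list[NodeRecord]) -> list[tuple[str, str]]:
    """List (stage, hash) pairs whose input is neither the source nor an earlier output."""
    known = {source_sha256}
    problems = []
    for record in sorted(records, key=lambda r: r.order):
        for digest in record.input_sha256:
            if digest not in known:
                problems.append((record.stage, digest))
        known.add(record.output_sha256)
    return problems


# Placeholder bytes standing in for real artifacts.
audio = sha256_hex(b"audio bytes")
transcript = sha256_hex(b"transcript bytes")
speakers = sha256_hex(b"diarization bytes")
bound = sha256_hex(b"attributed transcript bytes")

records = [
    NodeRecord(1, "transcription", "placeholder-asr-version", (audio,), transcript),
    NodeRecord(2, "diarization", "placeholder-diarizer-version", (audio,), speakers),
    NodeRecord(3, "attribution binding", "n/a", (transcript, speakers), bound),
]

# Every input traces back to the source audio or to an earlier stage's output.
assert unexplained_inputs(audio, records) == []
```

The design choice worth noticing is that each stage names the hashes it consumed, so a reviewer can confirm that no stage worked from bytes that appear nowhere else in the record.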

What the chain does NOT do

Be candid with the court — opposing counsel will be:

It does not prove the model’s output is correct. It proves the output was generated by the documented pipeline on the documented input, and has not been altered since.
It does not eliminate model error. Whisper still mis-hears words. Pyannote still merges or splits speakers under hard acoustics. DeBERTa-v3 still scores some non-contradictions as contradictions.
It does not authenticate the source audio. If the audio itself is disputed (chain of custody, alteration before ingestion), the attestation only covers what happened after Felarity received the file.
It does not satisfy Frye in the state courts that still apply it. Under Frye, general acceptance is the whole test rather than one factor among several, and the field is too new to meet it.
It does not make the expert’s interpretation of the contradictions admissible. The chain authenticates the inputs the expert used. The expert still has to be qualified and still has to explain their reasoning.

Practical checklist for the proponent

Disclose the pipeline composition in the expert’s Rule 26 report — component models, versions, the eight-node chain, the signing key location.
Produce the signed attestation alongside the transcript and contradiction list. Do not produce the conclusions without the chain.
Offer the opposing side the verification command and the public key in writing. Make non-reproducibility their problem, not yours.
Have the expert state plainly which questions the AI answered (transcription, diarization, candidate contradiction surfacing) and which questions the expert answered (which contradictions matter, what they mean, how confident).
Concede component error rates rather than fight them. Conceding a known error rate usually strengthens the witness’s credibility more than it costs on the factor itself.
Be ready to walk the judge through one example, end-to-end — this thirty-second slice of audio, this hash, this transcript line, this contradiction, this signed root. Concrete beats abstract every time at a hearing.

The smaller point, and the bigger one

The smaller point: an attestation chain helps a proponent answer the specific questions Daubert and Rule 702 require to be answered. It will not win a hearing on its own. It removes the failure modes — “we can’t reproduce the run,” “we don’t know what version ran,” “the model is a black box” — that lose hearings even when the underlying analysis is sound.

The bigger point: AI in litigation is not going to retreat. Courts are going to see more of it, not less. The systems that survive scrutiny will be the ones that produce evidence you can verify from outside the institution that produced it. Felarity is built on that premise. Private. Remembered. Defended.

Nothing in this article is legal advice. Consult counsel admitted in the relevant jurisdiction before relying on any of this in a specific matter.