# No detectable in-silico molecular mimicry between bacterial CdtB and human vinculin: a rigorous, null-controlled, multi-level immunoinformatics analysis

> A reproducible, null-controlled immunoinformatics pipeline (CVMP) tests the proposed molecular mimicry between bacterial CdtB and human vinculin, the mechanism behind commercialized anti-CdtB / anti-vinculin IBS biomarkers, at five independent levels. The pipeline recovers a known positive control (streptococcal M5 vs cardiac myosin) and rejects a negative control, but finds no statistically robust CdtB-vinculin mimicry at the sequence, structural, or B-cell-epitope level. The result constrains, but does not disprove, the clinical phenomenon and relocates the burden of proof to direct experiment.

_Source: https://leovanschaik.xyz/research/cdtb-vinculin-mimicry/_

---

Independent / SIBO Research. Correspondence: leo@etherian.io. Code and data:
[github.com/leonardo-vanschaik/cdtb-vinculin-mimicry](https://github.com/leonardo-vanschaik/cdtb-vinculin-mimicry)
(MIT). Status: hypothesis-generating / cautionary negative.

## Abstract

**Background.** Post-infectious irritable bowel syndrome (IBS) and small-intestinal bacterial overgrowth (SIBO) have been attributed to molecular mimicry between the cytolethal distending toxin B subunit (CdtB) of enteric bacteria and the human cytoskeletal protein vinculin: gastroenteritis is proposed to raise anti-CdtB antibodies that cross-react with vinculin in the enteric nervous system, and circulating anti-CdtB/anti-vinculin antibodies are marketed as IBS biomarkers. The cross-reactive epitope has never been published as a defined sequence, no in-silico CdtB-vs-vinculin mimicry analysis exists, and independent diagnostic replication is weak.

**Methods.** We built an open, reproducible, null-controlled immunoinformatics pipeline (CVMP) and tested CdtB↔vinculin mimicry at five independent levels: (i) exact pentapeptide identity, (ii) local-alignment similarity, (iii) fold/domain and (iv) fine surface-patch structural superposition, (v) overlap with experimentally validated IEDB B-cell epitopes and with DiscoTope-3.0-predicted conformational epitopes, each against rigorous empirical null models (dipeptide-preserving and composition-preserving sequence shuffles; length-stratified random-fragment RMSD nulls) with Benjamini-Hochberg correction. A bona-fide mimic (streptococcal M5 protein ↔ cardiac myosin) and a non-mimic (E. coli MalE ↔ vinculin) were the positive/negative controls.

**Results.** The pipeline recovered the positive control (M5↔cardiac-myosin local-alignment empirical p = 0.002, z = 7.7, spanning the known cross-reactive region) and rejected the negative control (p = 0.71). Across 72 non-redundant CdtB variants (family identity 16 to 100%) and human vinculin (isoform P18206-2), we found no statistically robust mimicry at any level: no FDR-significant linear similarity (best local-alignment score 71 vs 225 for the true mimic), no significant fold/domain (Foldseek best e-value 4.7) or fine-patch structural mimicry, 0 of 153 experimental vinculin B-cell epitopes matched within CdtB, and no overlap between the conformational B-cell epitopes the two proteins present.

**Conclusion.** The CdtB-vinculin mimicry hypothesis has no in-silico support at the sequence, structural, or B-cell-epitope level. This is concordant with the weak independent clinical replication of the biomarkers and with the principle that short-peptide similarity is statistically ubiquitous. The result constrains, but does not disprove, the clinical phenomenon, and relocates the burden of proof to direct experiment; we propose a peptide-competition ELISA as the decisive test. The benchmarked pipeline is released open-source.

## 1. Introduction

A widely promoted model of post-infectious IBS/SIBO, originating from Cedars-Sinai (Pimentel and colleagues), holds that acute gastroenteritis induces antibodies against bacterial CdtB, the active, DNase-I-family subunit of cytolethal distending toxin. CdtB is produced by *Campylobacter jejuni*, pathogenic *E. coli*, *Shigella*, *Salmonella*, *Helicobacter*, *Aggregatibacter* and *Haemophilus*. In this model the anti-CdtB antibodies then cross-react, via molecular mimicry, with host vinculin in the interstitial cells of Cajal and the myenteric plexus, impairing motility and permitting bacterial overgrowth. Anti-CdtB and anti-vinculin ELISAs are commercialized as IBS biomarkers.

Three gaps motivate a rigorous computational test. First, **the cross-reactive epitope has never been disclosed**: the originating patents reference only an antigenic CdtB peptide and sequence identifiers, not an aligned CdtB-vinculin map. Second, **no in-silico CdtB-vs-vinculin mimicry analysis has been published**, nor a ranking of cdt-bearing pathogens by predicted vinculin mimicry. Third, **independent replication is weak**: multiple cohorts and a 2024 to 2025 meta-analysis report poor sensitivity or failure to discriminate.

One methodological hazard sits underneath any such test: short-peptide "mimicry" is statistically ubiquitous. No human protein lacks a bacterial pentapeptide motif (Trost, Lucchese, Kanduc et al., 2010), so any mimicry claim must beat an explicit empirical null, and recent work shows sequence identity is near-uncorrelated with structural similarity at the peptide level (MimicryDB-Auto, 2026). We therefore built a multi-level, null-controlled, benchmarked pipeline and applied it to CdtB↔vinculin across all cdt-bearing taxa.

## 2. Methods

**Data and provenance.** Human vinculin isoform **P18206-2** (1066 aa; the ubiquitous, gut-relevant form) was the primary antigen; canonical P18206 (metavinculin, 1134 aa) was retained as a secondary ensemble member, with an insert-aware coordinate mapper (the metavinculin-specific insert spans canonical residues 916 to 983). CdtB sequences were retrieved from UniProt across cdt-bearing taxa (queried by gene and protein name), filtered to full length (220 to 330 aa, fragments removed) and clustered to non-redundancy at 90% identity (MMseqs2), yielding 207 records → 117 full-length → **72 non-redundant variants** (pairwise identity 16 to 100%, median 52%). Experimental structures: CdtB 1SR4/2F1N/2F2F/4K6L; vinculin 1TR2/6FUY/1RKE; AlphaFold DB models (v6) AF-P18206-2, AF-P18206, AF-Q46101. Ground-truth B-cell epitopes were obtained from the IEDB IQ-API (vinculin n = 153 positive; CdtB n = 4). The human reviewed proteome (UP000005640, 20,416 sequences) and *Bacillus subtilis* (UP000001570) served as background and control sets. All accessions, releases and checksums are recorded (`config/provenance/`).
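The insert-aware coordinate mapper can be sketched as follows. This is a minimal illustration, assuming the two isoforms differ only by a pure 68-residue deletion of canonical residues 916 to 983 (1134 − 1066 = 68) with no flanking substitutions; the function and constant names are hypothetical and not taken from the CVMP repository.

```python
from typing import Optional

# Hypothetical sketch: boundaries come from the Methods text (insert at
# canonical P18206 residues 916-983); names are illustrative, not CVMP's API.
CANONICAL_LEN = 1134  # P18206, metavinculin
ISOFORM_LEN = 1066    # P18206-2, vinculin
INSERT_START = 916    # first canonical residue of the insert (1-based)
INSERT_END = 983      # last canonical residue of the insert (1-based, inclusive)
INSERT_LEN = INSERT_END - INSERT_START + 1  # 68 = 1134 - 1066

# Assumption: the isoforms differ only by this deletion (no flanking substitutions).
assert CANONICAL_LEN - ISOFORM_LEN == INSERT_LEN


def isoform_to_canonical(pos: int) -> int:
    """Map a 1-based P18206-2 position onto P18206 numbering."""
    if not 1 <= pos <= ISOFORM_LEN:
        raise ValueError(f"position {pos} is outside P18206-2 (1..{ISOFORM_LEN})")
    # Residues at or after the insertion point shift right by the insert length.
    return pos if pos < INSERT_START else pos + INSERT_LEN


def canonical_to_isoform(pos: int) -> Optional[int]:
    """Map a 1-based P18206 position onto P18206-2; None if it lies in the insert."""
    if not 1 <= pos <= CANONICAL_LEN:
        raise ValueError(f"position {pos} is outside P18206 (1..{CANONICAL_LEN})")
    if pos < INSERT_START:
        return pos
    if pos <= INSERT_END:
        return None  # metavinculin-only residue, absent from P18206-2
    return pos - INSERT_LEN


if __name__ == "__main__":
    # A hit at P18206-2 residue 1000 corresponds to canonical residue 1068.
    print(isoform_to_canonical(1000), canonical_to_isoform(950))
```

Returning `None` for insert residues, rather than snapping to a neighbour, keeps metavinculin-only hits from being silently attributed to the ubiquitous isoform.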

**Sequence arm.** Exact pentapeptide identity (longest common substring; "1D-mimic") and Smith-Waterman local alignment (BLOSUM62) between each CdtB variant and vinculin.
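A minimal pure-Python sketch of the two sequence-arm statistics follows. The exact-identity part enumerates shared pentapeptides and the longest common substring. The local-alignment part is a textbook Smith-Waterman score, but for brevity it uses an assumed identity scoring (match +2, mismatch −1, linear gap −2) in place of BLOSUM62 with affine gaps, so its scores are not comparable to the values reported in Results. Biopython's `Bio.Align.PairwiseAligner` in `"local"` mode with `substitution_matrices.load("BLOSUM62")` is one existing way to obtain the BLOSUM62 version; the function names below are illustrative.

```python
from __future__ import annotations


def shared_kmers(a: str, b: str, k: int = 5) -> set[str]:
    """Exact k-mers (default: pentapeptides) present in both sequences."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return kmers_a & kmers_b


def longest_common_substring(a: str, b: str) -> str:
    """Longest exact shared run of residues (dynamic programming, O(len(a)*len(b)))."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # prev[j]: common-suffix length of a[:i-1] and b[:j]
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                         gap: int = -2) -> int:
    """Best local-alignment score with linear gap penalties.

    Assumption: identity scoring stands in for BLOSUM62 and affine gaps.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: cells are floored at 0 so an alignment can restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    cdtb_toy, vcl_toy = "ACDEFGHIK", "XXCDEFGYY"  # toy strings, not real proteins
    print(shared_kmers(cdtb_toy, vcl_toy), longest_common_substring(cdtb_toy, vcl_toy))
    print(smith_waterman_score(cdtb_toy, vcl_toy))
```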

**Null models.** Exact-k-mer significance vs an Altschul-Erickson **dipeptide-preserving** shuffle; local-alignment significance vs a **composition-preserving** shuffle (classic shuffle-and-realign); ≥10³ iterations; empirical p with the +1/(N+1) correction; Benjamini-Hochberg FDR across variants/epitopes.
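The shuffle-and-realign null, the +1/(N+1) empirical p-value and the Benjamini-Hochberg step-up adjustment can be sketched as below. Assumptions: the query (CdtB) is the sequence that is shuffled, a null score at least as large as the observed one counts as an exceedance, and only the composition-preserving shuffle is shown; the Altschul-Erickson dipeptide-preserving shuffle used for the exact-k-mer statistic requires a random Eulerian walk and is omitted. Function names are illustrative.

```python
from __future__ import annotations

import random
from typing import Callable, Sequence


def composition_shuffle(seq: str, rng: random.Random) -> str:
    """Permute residues: keeps amino-acid composition, destroys order."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def empirical_p(observed: float, null_scores: Sequence[float]) -> float:
    """(b + 1) / (N + 1), where b counts null scores >= the observed score."""
    b = sum(1 for s in null_scores if s >= observed)
    return (b + 1) / (len(null_scores) + 1)


def shuffle_null_p(query: str, target: str,
                   score_fn: Callable[[str, str], float],
                   n_iter: int = 1000, seed: int = 0) -> float:
    """Shuffle-and-realign: compare the observed score with scores of shuffled queries."""
    rng = random.Random(seed)  # fixed seed, so the null is reproducible
    observed = score_fn(query, target)
    null = [score_fn(composition_shuffle(query, rng), target) for _ in range(n_iter)]
    return empirical_p(observed, null)


def benjamini_hochberg(pvals: Sequence[float]) -> list[float]:
    """BH step-up adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    def identity_score(a: str, b: str) -> float:
        # Toy statistic (positional identities) standing in for an alignment score.
        return float(sum(x == y for x, y in zip(a, b)))

    p = shuffle_null_p("CDEFGHIKLM", "CDEFGHIKLM", identity_score, n_iter=999)
    print(p, benjamini_hochberg([p, 0.2, 0.5]))
```

The +1 correction sets a floor: with N = 1000 iterations the smallest attainable p is 1/1001 ≈ 0.001, so no comparison is ever reported as p = 0.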

**Structure arm.** Foldseek easy-search (CdtB vs vinculin ensemble) for fold/domain similarity; fine surface-patch superposition (Cα, contiguous 5-to-12-mer windows, relative SASA ≥ 20%) vs a **length-stratified random-fragment RMSD null** (EMoMiS convention; Z < −1.645; RMSD ≤ 1 Å), with FDR.
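A sketch of the patch-level comparison: Kabsch superposition of two equal-length Cα windows, a random-fragment null drawn at the same window length (the length stratification), and the significance rule stated above (Z < −1.645 and RMSD ≤ 1 Å). It assumes NumPy and coordinates given as (L, 3) arrays, and it omits the relative-SASA ≥ 20% surface filter and the FDR step; all names are hypothetical rather than CVMP's own.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD of two (L, 3) C-alpha windows after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two coordinate arrays of identical shape (L, 3)")
    Pc = P - P.mean(axis=0)  # remove translation
    Qc = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)  # SVD of the 3x3 covariance matrix
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def random_fragment_null(coords_a, coords_b, length: int, n: int,
                         rng: np.random.Generator) -> np.ndarray:
    """Length-stratified null: RMSDs of random contiguous windows of one length."""
    coords_a = np.asarray(coords_a, dtype=float)
    coords_b = np.asarray(coords_b, dtype=float)
    rmsds = np.empty(n)
    for k in range(n):
        i = rng.integers(0, len(coords_a) - length + 1)  # upper bound is exclusive
        j = rng.integers(0, len(coords_b) - length + 1)
        rmsds[k] = kabsch_rmsd(coords_a[i:i + length], coords_b[j:j + length])
    return rmsds


def patch_z(observed_rmsd: float, null_rmsds) -> float:
    """Z-score of an observed RMSD against the same-length null (sample SD)."""
    null = np.asarray(null_rmsds, dtype=float)
    return float((observed_rmsd - null.mean()) / null.std(ddof=1))


def is_patch_mimic(rmsd: float, z: float, z_cut: float = -1.645,
                   rmsd_cut: float = 1.0) -> bool:
    """Both criteria from the text: unusually low vs the null and <= 1 Å absolute."""
    return z < z_cut and rmsd <= rmsd_cut


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    chain_a, chain_b = rng.normal(size=(60, 3)), rng.normal(size=(80, 3))  # toy coords
    null = random_fragment_null(chain_a, chain_b, length=7, n=200, rng=rng)
    obs = kabsch_rmsd(chain_a[:7], chain_b[:7])
    print(obs, patch_z(obs, null), is_patch_mimic(obs, patch_z(obs, null)))
```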

**Epitope arms.** (a) Each IEDB vinculin B-cell epitope was tested for exact whole-epitope identity within CdtB and local-alignment similarity vs a per-epitope composition null (the method of the HPV-L1 negative, Int J Clin Oncol 2026). (b) DiscoTope-3.0 predicted conformational B-cell epitopes on vinculin and CdtB structures (DTU web server); contiguous conformational-epitope regions were compared for exact and local-alignment mimicry vs a composition null with FDR.
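The exact whole-epitope test of arm (a) and the extraction of contiguous regions for arm (b) reduce to substring search and run-length segmentation. This sketch assumes the DiscoTope-3.0 output has already been thresholded into one boolean per residue and that a region must span at least five residues; neither the threshold nor the minimum length is specified in the text, and the function names are hypothetical. The resulting peptides would then go through the same local-alignment and composition-null machinery as the sequence arm.

```python
from __future__ import annotations

from typing import Sequence


def exact_epitope_hits(epitopes: dict[str, str],
                       variants: dict[str, str]) -> list[tuple[str, str]]:
    """(epitope_id, variant_id) pairs where the whole epitope occurs verbatim."""
    hits = []
    for ep_id, ep_seq in epitopes.items():
        for var_id, var_seq in variants.items():
            if ep_seq in var_seq:
                hits.append((ep_id, var_id))
    return hits


def contiguous_regions(sequence: str, is_epitope: Sequence[bool],
                       min_len: int = 5) -> list[tuple[int, str]]:
    """(1-based start, peptide) for runs of predicted epitope residues >= min_len."""
    if len(sequence) != len(is_epitope):
        raise ValueError("need exactly one epitope flag per residue")
    regions = []
    start = None
    for i, flag in enumerate(list(is_epitope) + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                regions.append((start + 1, sequence[start:i]))
            start = None
    return regions


if __name__ == "__main__":
    flags = [False, True, True, True, True, True, False, True, True, False]
    print(contiguous_regions("ABCDEFGHIJ", flags))  # toy sequence and flags
    print(exact_epitope_hits({"ep1": "KLM"}, {"cdtb_toy": "AKLMA"}))
```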

**Controls.** Positive: *S. pyogenes* M5 protein (P02977) ↔ cardiac myosin MYH7 (P12883), the rheumatic-fever precedent. Negative: *E. coli* MalE (P0AEX9) ↔ vinculin. Background: *B. subtilis* proteome (length-matched, max-vs-max bootstrap + Mann-Whitney).

**Reproducibility.** Seed-fixed Snakemake/Python pipeline; every table regenerates from accessions; code and provenance are public under MIT. Predicted structures were taken from AlphaFold DB (v6) and conformational-epitope predictions from the DiscoTope-3.0 web server, so no local GPU is required.

## 3. Results

**3.1 Pipeline calibration (Fig 1).** The positive control recovered the established mimic: M5↔cardiac-myosin local-alignment BLOSUM62 score 225, empirical p = 0.0020 (z = 7.7) vs the composition null, with the aligned span (M5 ≈ 54 to 419) containing the known cross-reactive region 84 to 116 (Dale & Beachey, 1986). The negative control (MalE↔vinculin) was non-significant (p = 0.71). The pipeline detects a true mimic and rejects a non-mimic.

<figure class="research-figure">
  <div class="crt">
    <img class="crt__base" src="/research/cdtb-vinculin/fig1_controls.svg" alt="Control calibration: the positive control M5 vs cardiac myosin is recovered as significant, the negative control MalE vs vinculin is rejected." loading="lazy" decoding="async" />
    <img class="crt__top" src="/research/cdtb-vinculin/fig1_controls.svg" alt="" aria-hidden="true" loading="lazy" decoding="async" />
  </div>
  <figcaption><strong>Fig 1.</strong> Control calibration: positive (M5↔myosin) recovered, negative (MalE↔vinculin) rejected.</figcaption>
</figure>

**3.2 Sequence arm: negative (Fig 2).** Across 72 non-redundant CdtB variants, no CdtB↔vinculin region survived FDR. The strongest local-alignment hit (an *Aggregatibacter*-class variant) had nominal p ≈ 0.01 but did not survive Benjamini-Hochberg correction; the best exact-pentapeptide hit had p ≈ 0.14. The single best CdtB local alignment to vinculin scored 71, roughly one-third of the positive control (225).

<figure class="research-figure">
  <div class="crt">
    <img class="crt__base" src="/research/cdtb-vinculin/fig2_sequence_arm.svg" alt="Per-variant empirical p-value distribution across 72 CdtB variants; none are FDR-significant." loading="lazy" decoding="async" />
    <img class="crt__top" src="/research/cdtb-vinculin/fig2_sequence_arm.svg" alt="" aria-hidden="true" loading="lazy" decoding="async" />
  </div>
  <figcaption><strong>Fig 2.</strong> Sequence arm: per-variant empirical-p distribution across 72 CdtB variants (0 FDR-significant).</figcaption>
</figure>

**3.3 Fold/domain structure arm: negative.** Foldseek returned 46 short local alignments between CdtB (DNase-I fold) and vinculin (α-helical bundles), none significant (best e-value 4.7, probability 0.000), as expected for unrelated folds.

**3.4 Fine surface-patch arm: negative.** Pairwise Cα superposition of surface patches produced near-perfect short matches (best RMSD 0.11 Å over 7 residues), but these are generic secondary-structure fragments; none survived the length-stratified null after FDR. Taking the best of many superpositions makes the test anti-conservative, yet it is still negative. This is the structural analogue of the short-peptide-ubiquity principle.

<figure class="research-figure">
  <div class="crt">
    <img class="crt__base" src="/research/cdtb-vinculin/fig3_identity_matrix.svg" alt="Pairwise percent-identity matrix across the non-redundant CdtB family, ranging 16 to 100 percent." loading="lazy" decoding="async" />
    <img class="crt__top" src="/research/cdtb-vinculin/fig3_identity_matrix.svg" alt="" aria-hidden="true" loading="lazy" decoding="async" />
  </div>
  <figcaption><strong>Fig 3.</strong> CdtB family pairwise %identity matrix (16 to 100%).</figcaption>
</figure>

**3.45 Conformational epitope arm: negative.** DiscoTope-3.0 predicted conformational B-cell epitopes on vinculin (P18206-2; 248 epitope residues) and on the CdtB family (the AlphaFold *C. jejuni* model plus experimental CdtB chains; 46 to 55 epitope residues each). Restricting the mimicry search to the contiguous conformational-epitope regions each protein presents (5 vinculin and 14 CdtB peptides) yielded 0 exact matches and 0 FDR-significant similarities (best p = 0.25).

**3.5 Background comparison.** A non-pathogen proteome comparison (CdtB vs *B. subtilis*, both vs vinculin) was sensitive to the control length/composition window (best-hit p ranging 0.0005 to 0.55) and is therefore not a reliable test; we rely on the per-variant shuffle-null with FDR, which is negative throughout.

**3.6 Experimental-epitope arm: negative (Fig 4).** Of the 153 experimentally validated IEDB vinculin B-cell epitopes, **0 occur (as a whole epitope) within any CdtB variant** and 0 show FDR-significant similarity (best nominal p = 0.033). The four IEDB CdtB epitopes likewise match nothing in vinculin. This directly tests whether the regions antibodies actually target are mimicked.

<figure class="research-figure">
  <div class="crt">
    <img class="crt__base" src="/research/cdtb-vinculin/fig4_epitope_overlap.svg" alt="Experimental and conformational epitope-overlap p-value distributions; none are significant." loading="lazy" decoding="async" />
    <img class="crt__top" src="/research/cdtb-vinculin/fig4_epitope_overlap.svg" alt="" aria-hidden="true" loading="lazy" decoding="async" />
  </div>
  <figcaption><strong>Fig 4.</strong> Experimental and conformational epitope-overlap p-value distributions (0 significant).</figcaption>
</figure>

**3.7 Pathogen ranking.** Under FDR control no cdt-bearing pathogen's CdtB variant passes the staged filter; the predicted "SIBO-risk ranking" is empty.

## 4. Discussion

Across five independent levels (sequence identity, sequence similarity, fold/domain structure, fine surface-patch structure, and both experimental and predicted B-cell epitopes) bacterial CdtB shows **no statistically robust molecular mimicry of human vinculin**, while the pipeline recovers a bona-fide mimic and rejects a non-mimic. The negative is therefore unlikely to be a methodological false-negative; it is consistent with the weak independent clinical replication of anti-CdtB/anti-vinculin biomarkers and with the principle that short-peptide and short-patch similarity are statistically ubiquitous and mostly coincidental.

**Limitations / what would change the conclusion.** (i) The cross-reactive epitope could be **discontinuous**; the conformational-epitope arm (DiscoTope-3.0) is negative, leaving only a full surface shape-and-electrostatics (MaSIF-style) comparison untested. (ii) Antibody cross-reactivity can arise at the paratope level and need not be visible in antigen-antigen comparison; current antibody-antigen docking (AlphaFold3/Boltz, <15% accuracy) cannot adjudicate this. (iii) Post-translational modification, conformational state, or species differences (animal-model vinculin) could matter. (iv) Anti-vinculin antibodies occur in other conditions (e.g., systemic sclerosis), so the biomarker need not reflect CdtB-driven mimicry at all. **In-silico mimicry is not cross-reactivity:** this analysis constrains the mechanism but does not disprove the clinical association.

**Contribution beyond the negative.** We release an open, benchmarked, null-controlled mimicry pipeline (positive control included), a non-redundant CdtB family with a computed identity matrix, and a vinculin B-cell-epitope map: reusable resources for any host-pathogen mimicry question.

## 5. Proposed experimental validation

A peptide-competition ELISA is the decisive test: synthesize the top predicted shared CdtB peptide(s) and assay whether they block anti-CdtB binding to vinculin (mirroring the undisclosed "CdtB blocking peptide"); alanine-scan/truncation to map any minimal epitope; and localize any cross-reactive region on vinculin (head 1 to 835 vs tail 879 to 1066), which the field has never reported.
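The alanine-scan and truncation panels for such an experiment can be enumerated mechanically. A minimal sketch, assuming single-residue alanine substitutions (native alanines skipped) and stepwise N- and C-terminal truncations down to a hypothetical five-residue floor; the function names are illustrative and the peptide in the usage line is a placeholder, not a CdtB sequence.

```python
from __future__ import annotations


def alanine_scan(peptide: str) -> list[tuple[str, str]]:
    """(mutation label, mutant peptide) for every non-alanine position."""
    variants = []
    for i, aa in enumerate(peptide):
        if aa == "A":
            continue  # substituting Ala for Ala would be a no-op
        mutant = peptide[:i] + "A" + peptide[i + 1:]
        variants.append((f"{aa}{i + 1}A", mutant))
    return variants


def truncation_series(peptide: str, min_len: int = 5) -> list[str]:
    """N- and C-terminal truncations, one residue at a time, down to min_len."""
    series = []
    for trim in range(1, len(peptide) - min_len + 1):
        series.append(peptide[trim:])   # N-terminal truncation
        series.append(peptide[:-trim])  # C-terminal truncation
    return series


if __name__ == "__main__":
    placeholder = "GKALDWYT"  # hypothetical peptide, not a real CdtB sequence
    for label, seq in alanine_scan(placeholder):
        print(label, seq)
    print(truncation_series(placeholder))
```

Pairing each truncation with its alanine scan lets a minimal epitope be narrowed from both ends before individual contact residues are tested.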

## 6. Data and code availability

Open pipeline (MIT) at [github.com/leonardo-vanschaik/cdtb-vinculin-mimicry](https://github.com/leonardo-vanschaik/cdtb-vinculin-mimicry). All results regenerate from `config/accessions/` + `config/provenance/manifest.json` via Snakemake. An archived release with a Zenodo DOI will accompany submission.

## References (key)

1. Pimentel M, et al. Development and validation of a biomarker for diarrhea-predominant IBS. *PLoS One* 2015;10(5):e0126438.
2. Morales W, et al. Second-generation biomarker testing for IBS. *Dig Dis Sci* 2019;64:3115-21.
3. Barros LL, et al. *BMC Gastroenterol* 2024;24:448; and Mansoor (review), *J Pak Med Assoc* 2024;74:1300-8.
4. Trost B, Lucchese G, Kanduc D, et al. No human protein is exempt from bacterial motifs. *Self/Nonself* 2010;1(4):328-34.
5. Balbin CA, et al. Epitopedia: identifying molecular mimicry between pathogens and known immune epitopes. *ImmunoInformatics* 2023;9:100023.
6. Høie MH, et al. DiscoTope-3.0: improved B-cell epitope prediction using inverse folding latent representations. *Front Immunol* 2024;15:1322712.
7. van Kempen M, et al. Fast and accurate protein structure search with Foldseek. *Nat Biotechnol* 2024.
8. Mirdita M, et al. ColabFold: making protein folding accessible to all. *Nat Methods* 2022;19:679-82.
9. Jumper J, et al. Highly accurate protein structure prediction with AlphaFold. *Nature* 2021;596:583-9.
10. Dale JB, Beachey EH. Sequence of myosin-crossreactive epitopes of streptococcal M protein. *J Exp Med* 1986;164:1785.
11. MimicryDB-Auto: structural validation reveals the inadequacy of sequence-based mimicry screening. *Preprints* 2026, 202603.2306.
12. Lack of molecular mimicry between HPV vaccine L1 antigen and human proteins by computational analysis. *Int J Clin Oncol* 2026.
13. Shreiner AB, et al. Anti-vinculin antibodies in systemic sclerosis. *Arthritis Care Res* 2023.
14. Noori et al. What does AlphaFold3 learn about antibody and nanobody docking. *mAbs* 2025.
