Preprint, 2026

No detectable in-silico molecular mimicry between bacterial CdtB and human vinculin: a rigorous, null-controlled, multi-level immunoinformatics analysis

Leo Van Schaik

Abstract

A reproducible, null-controlled immunoinformatics pipeline (CVMP) tests the proposed molecular mimicry between bacterial CdtB and human vinculin, the mechanism behind commercialized anti-CdtB / anti-vinculin IBS biomarkers, at five independent levels. The pipeline recovers a known positive control (streptococcal M5 vs cardiac myosin) and rejects a negative control, but finds no statistically robust CdtB-vinculin mimicry at the sequence, structural, or B-cell-epitope level. The result constrains, but does not disprove, the clinical phenomenon and relocates the burden of proof to direct experiment.

Independent / SIBO Research. Correspondence: [email protected]. Code and data: github.com/leonardo-vanschaik/cdtb-vinculin-mimicry (MIT). Status: hypothesis-generating / cautionary negative.

Abstract

Background. Post-infectious irritable bowel syndrome (IBS) and small-intestinal bacterial overgrowth (SIBO) have been attributed to molecular mimicry between the cytolethal distending toxin B subunit (CdtB) of enteric bacteria and the human cytoskeletal protein vinculin: gastroenteritis is proposed to raise anti-CdtB antibodies that cross-react with vinculin in the enteric nervous system, and circulating anti-CdtB/anti-vinculin antibodies are marketed as IBS biomarkers. The cross-reactive epitope has never been published as a defined sequence, no in-silico CdtB-vs-vinculin mimicry analysis exists, and independent diagnostic replication is weak.

Methods. We built an open, reproducible, null-controlled immunoinformatics pipeline (CVMP) and tested CdtB↔vinculin mimicry at five independent levels: (i) exact pentapeptide identity; (ii) local-alignment similarity; (iii) fold/domain structural similarity; (iv) fine surface-patch structural superposition; and (v) overlap with experimentally validated IEDB B-cell epitopes and with DiscoTope-3.0-predicted conformational epitopes. Each level was tested against a rigorous empirical null model (dipeptide-preserving and composition-preserving sequence shuffles; length-stratified random-fragment RMSD nulls) with Benjamini-Hochberg correction. A bona fide mimic (streptococcal M5 protein ↔ cardiac myosin) and a non-mimic (E. coli MalE ↔ vinculin) served as the positive and negative controls.

Results. The pipeline recovered the positive control (M5↔cardiac-myosin local-alignment empirical p = 0.002, z = 7.7, spanning the known cross-reactive region) and rejected the negative control (p = 0.71). Across 72 non-redundant CdtB variants (family identity 16 to 100%) and human vinculin (isoform P18206-2), we found no statistically robust mimicry at any level: no FDR-significant linear similarity (best local-alignment score 71 vs 225 for the true mimic), no significant fold/domain (Foldseek best e-value 4.7) or fine-patch structural mimicry, 0 of 153 experimental vinculin B-cell epitopes matched within CdtB, and no overlap between the conformational B-cell epitopes the two proteins present.

Conclusion. The CdtB-vinculin mimicry hypothesis has no in-silico support at the sequence, structural, or B-cell-epitope level. This is concordant with the weak independent clinical replication of the biomarkers and with the principle that short-peptide similarity is statistically ubiquitous. The result constrains, but does not disprove, the clinical phenomenon, and relocates the burden of proof to direct experiment; we propose a peptide-competition ELISA as the decisive test. The benchmarked pipeline is released open-source.

1. Introduction

A widely promoted model of post-infectious IBS/SIBO, originating from Cedars-Sinai (Pimentel and colleagues), holds that acute gastroenteritis induces antibodies against bacterial CdtB, the active, DNase-I-family subunit of cytolethal distending toxin. CdtB is produced by Campylobacter jejuni, pathogenic E. coli, Shigella, Salmonella, Helicobacter, Aggregatibacter and Haemophilus. Under this model the antibodies cross-react, via molecular mimicry, with host vinculin in the interstitial cells of Cajal and the myenteric plexus, impairing motility and permitting bacterial overgrowth. Anti-CdtB and anti-vinculin ELISAs are commercialized as IBS biomarkers.

Three gaps motivate a rigorous computational test. First, the cross-reactive epitope has never been disclosed: the originating patents reference only an antigenic CdtB peptide and sequence identifiers, not an aligned CdtB-vinculin map. Second, no in-silico CdtB-vs-vinculin mimicry analysis has been published, nor a ranking of cdt-bearing pathogens by predicted vinculin mimicry. Third, independent replication is weak: multiple cohorts and a 2024 to 2025 meta-analysis report poor sensitivity or failure to discriminate.

One methodological hazard sits underneath any such test: short-peptide “mimicry” is statistically ubiquitous. No human protein lacks a bacterial pentapeptide motif (Trost, Lucchese, Kanduc et al., 2010), so any mimicry claim must beat an explicit empirical null, and recent work shows sequence identity is near-uncorrelated with structural similarity at the peptide level (MimicryDB-Auto, 2026). We therefore built a multi-level, null-controlled, benchmarked pipeline and applied it to CdtB↔vinculin across all cdt-bearing taxa.

2. Methods

Data and provenance. Human vinculin isoform P18206-2 (1066 aa; the ubiquitous, gut-relevant form) was the primary antigen; canonical P18206 (metavinculin, 1134 aa) was retained as a secondary ensemble member, with an insert-aware coordinate mapper (insert at canonical 916 to 983). CdtB sequences were retrieved from UniProt across cdt-bearing taxa (gene + protein name), filtered to full-length (220 to 330 aa, fragments removed) and clustered to non-redundancy at 90% identity (MMseqs2), yielding 207 records → 117 full-length → 72 non-redundant variants (pairwise identity 16 to 100%, median 52%). Experimental structures: CdtB 1SR4/2F1N/2F2F/4K6L; vinculin 1TR2/6FUY/1RKE; AlphaFold DB models (v6) AF-P18206-2, AF-P18206, AF-Q46101. Ground-truth B-cell epitopes were obtained from the IEDB IQ-API (vinculin n = 153 positive; CdtB n = 4). The human reviewed proteome (UP000005640, 20,416 sequences) and Bacillus subtilis (UP000001570) were the backgrounds/controls. All accessions, releases and checksums are recorded (config/provenance/).
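The insert-aware coordinate mapping described above can be illustrated with a minimal Python sketch. It assumes that isoform P18206-2 differs from canonical P18206 only by the absence of canonical residues 916 to 983 (68 residues, matching 1134 − 1066). The function names are hypothetical and are not taken from the released pipeline.

```python
from typing import Optional

# Values taken from the text; everything else here is illustrative.
CANONICAL_LEN = 1134      # P18206 (metavinculin)
ISOFORM2_LEN = 1066       # P18206-2
INSERT_START = 916        # first canonical residue of the insert (1-based)
INSERT_END = 983          # last canonical residue of the insert (1-based)
INSERT_LEN = INSERT_END - INSERT_START + 1  # 68 residues


def canonical_to_isoform2(pos: int) -> Optional[int]:
    """Map a 1-based canonical position to P18206-2, or None inside the insert."""
    if not 1 <= pos <= CANONICAL_LEN:
        raise ValueError(f"canonical position out of range: {pos}")
    if pos < INSERT_START:
        return pos
    if pos <= INSERT_END:
        return None  # residue exists only in the canonical (metavinculin) form
    return pos - INSERT_LEN


def isoform2_to_canonical(pos: int) -> int:
    """Map a 1-based P18206-2 position back to canonical numbering."""
    if not 1 <= pos <= ISOFORM2_LEN:
        raise ValueError(f"isoform-2 position out of range: {pos}")
    return pos if pos < INSERT_START else pos + INSERT_LEN
```

With a mapper of this kind, epitope or structure coordinates reported on either isoform can be compared on a common numbering, and anything falling inside the insert is flagged rather than silently shifted.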

Sequence arm. Exact pentapeptide longest-common-substring (“1D-mimic”) and Smith-Waterman local alignment (BLOSUM62) between each CdtB variant and vinculin.
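As a rough illustration of the exact-identity part of the sequence arm, the sketch below finds the longest exact shared run between two sequences and the set of shared k-mers (k = 5 for pentapeptides). It is a plain dynamic-programming version written for clarity; the function names and the toy sequences in the usage line are hypothetical, and the released pipeline may implement this differently.

```python
from typing import Set, Tuple


def longest_common_substring(a: str, b: str) -> Tuple[int, int, int]:
    """Return (length, start_in_a, start_in_b) of the longest exact shared run.

    Starts are 0-based; ties are resolved in favor of the first run found.
    """
    best_len, best_a, best_b = 0, 0, 0
    prev = [0] * (len(b) + 1)  # run lengths ending at the previous row of a
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the diagonal run
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_a = i - best_len
                    best_b = j - best_len
        prev = curr
    return best_len, best_a, best_b


def shared_kmers(a: str, b: str, k: int = 5) -> Set[str]:
    """Return every exact k-mer present in both sequences."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    return {b[j:j + k] for j in range(len(b) - k + 1) if b[j:j + k] in kmers_a}


# Toy usage with made-up sequences (not CdtB or vinculin):
length, start_a, start_b = longest_common_substring("MKTAYIAKQR", "GGTAYIAKWW")
```

A shared run of five or more residues is only a candidate; as the null-model step makes clear, it counts as evidence only if it is rarer than what shuffled sequences produce.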

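Null models. Exact-k-mer significance was assessed against an Altschul-Erickson dipeptide-preserving shuffle, and local-alignment significance against a composition-preserving shuffle (classic shuffle-and-realign), with ≥10³ iterations each. Empirical p-values used the +1 correction, p = (r + 1)/(N + 1), where r is the number of null scores at least as large as the observed score and N is the number of shuffles. Benjamini-Hochberg FDR was applied across variants/epitopes.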
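The shuffle-and-realign logic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline's code: the query sequence is the one shuffled, the scoring function is passed in by the caller (the study uses Smith-Waterman with BLOSUM62, which is not reimplemented here), and all function names are hypothetical. The dipeptide-preserving Altschul-Erickson shuffle is more involved and is not shown.

```python
import random
from typing import Callable, List, Sequence


def composition_shuffle(seq: str, rng: random.Random) -> str:
    """Permute residues at random, preserving amino-acid composition exactly."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def empirical_p(observed: float, null_scores: Sequence[float]) -> float:
    """Empirical p-value with the +1 correction: (r + 1) / (N + 1)."""
    r = sum(1 for s in null_scores if s >= observed)
    return (r + 1) / (len(null_scores) + 1)


def shuffle_null_p(query: str, target: str,
                   score_fn: Callable[[str, str], float],
                   n_iter: int = 1000, seed: int = 0) -> float:
    """Score the real pair, then re-score n_iter shuffled queries against target."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = score_fn(query, target)
    null = [score_fn(composition_shuffle(query, rng), target)
            for _ in range(n_iter)]
    return empirical_p(observed, null)


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Return BH-adjusted p-values (q-values) in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

One consequence of the +1 correction is that the smallest attainable p-value is 1/(N + 1), so the number of shuffles sets the resolution of the test before any multiple-testing correction is applied.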

Structure arm. Foldseek easy-search (CdtB vs vinculin ensemble) for fold/domain similarity; fine surface-patch superposition (Cα, contiguous 5-to-12-mer windows, relative SASA ≥ 20%) vs a length-stratified random-fragment RMSD null (EMoMiS convention; Z < −1.645; RMSD ≤ 1 Å), with FDR.
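The two-part patch criterion (Z < −1.645 against a length-matched null, and RMSD ≤ 1 Å) can be made concrete with a small sketch. It assumes the null RMSDs for each fragment length have already been computed by superposing random fragment pairs; the superposition itself, the function names, and the dictionary layout are illustrative assumptions rather than the pipeline's actual interface.

```python
from statistics import mean, stdev
from typing import Dict, Sequence

Z_CUTOFF = -1.645    # one-sided cutoff quoted in the text
RMSD_CUTOFF = 1.0    # Angstrom, quoted in the text


def rmsd_z(observed_rmsd: float, null_rmsds: Sequence[float]) -> float:
    """Z-score of an observed RMSD against null RMSDs of the same length."""
    sd = stdev(null_rmsds)
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    return (observed_rmsd - mean(null_rmsds)) / sd


def passes_patch_filter(observed_rmsd: float, length: int,
                        null_by_length: Dict[int, Sequence[float]]) -> bool:
    """Apply both cutoffs, using only the null stratum for this length."""
    z = rmsd_z(observed_rmsd, null_by_length[length])
    return z < Z_CUTOFF and observed_rmsd <= RMSD_CUTOFF
```

Stratifying by length matters because short fragments superpose well by chance: a very low RMSD over a handful of residues may be unremarkable relative to random fragments of the same length. In the study, candidates passing a filter of this kind are additionally subject to FDR correction.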

Epitope arms. (a) Each IEDB vinculin B-cell epitope was tested for exact whole-epitope identity within CdtB and for local-alignment similarity against a per-epitope composition null (following the method of the negative HPV-L1 mimicry analysis, Int J Clin Oncol 2026). (b) DiscoTope-3.0 (DTU web server) was used to predict conformational B-cell epitopes on the vinculin and CdtB structures; contiguous conformational-epitope regions were then compared for exact and local-alignment mimicry against a composition null with FDR.

Controls. Positive: S. pyogenes M5 protein (P02977) ↔ cardiac myosin MYH7 (P12883), the rheumatic-fever precedent. Negative: E. coli MalE (P0AEX9) ↔ vinculin. Background: B. subtilis proteome (length-matched, max-vs-max bootstrap + Mann-Whitney).

Reproducibility. Seed-fixed Snakemake/Python pipeline; every table regenerates from accessions; code + provenance public under MIT. Structure prediction and epitope prediction used AlphaFold DB (v6) and the DiscoTope-3.0 web server respectively; no local GPU required.

3. Results

3.1 Pipeline calibration (Fig 1). The positive control recovered the established mimic: M5↔cardiac-myosin local-alignment BLOSUM62 score 225, empirical p = 0.0020 (z = 7.7) vs the composition null, with the aligned span (M5 ≈ 54 to 419) containing the known cross-reactive region 84 to 116 (Dale & Beachey, 1986). The negative control (MalE↔vinculin) was non-significant (p = 0.71). The pipeline detects a true mimic and rejects a non-mimic.

Fig 1. Control calibration: positive (M5↔myosin) recovered, negative (MalE↔vinculin) rejected.

3.2 Sequence arm: negative (Fig 2). Across 72 non-redundant CdtB variants, no CdtB↔vinculin region survived FDR. The strongest local-alignment hit (an Aggregatibacter-class variant) had nominal p ≈ 0.01 but did not survive Benjamini-Hochberg correction; the best exact-pentapeptide hit had p ≈ 0.14. The single best CdtB local alignment to vinculin scored 71, roughly one-third of the positive control (225).
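A short worked calculation may help show why a nominal p ≈ 0.01 does not survive correction. It assumes a conventional FDR level of α = 0.05 and m = 72 tests (one per variant); neither the α level nor the exact test count entering the correction is stated above, so the numbers are indicative only. Benjamini-Hochberg rejects the k smallest p-values for the largest k satisfying the first inequality below.

```latex
\[
p_{(k)} \le \frac{k}{m}\,\alpha, \qquad m = 72,\quad \alpha = 0.05
\]
\[
k = 1:\quad \frac{1}{72}\times 0.05 \approx 6.9\times 10^{-4} \;<\; p_{(1)} \approx 0.01
\]
\[
\frac{k}{72}\times 0.05 \ge 0.01 \iff k \ge 14.4 \;\Rightarrow\; k \ge 15
\]
```

Under these assumptions the best hit fails the rank-1 threshold by more than an order of magnitude, and it could only be rescued if at least 15 variants had p-values at or below their own rank thresholds (about 0.0104 at k = 15).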

Fig 2. Sequence arm: per-variant empirical-p distribution across 72 CdtB variants (0 FDR-significant).

3.3 Fold/domain structure arm: negative. Foldseek returned 46 short local alignments between CdtB (DNase-I fold) and vinculin (α-helical bundles), none significant (best e-value 4.7, probability 0.000), as expected for unrelated folds.

3.4 Fine surface-patch arm: negative. Pairwise Cα superposition of surface patches produced near-perfect short matches (best RMSD 0.11 Å over 7 residues) that are generic secondary-structure fragments; 0 survived the length-stratified null after FDR. The test is anti-conservative (best-of-many) yet still negative. This is the structural analogue of the short-peptide-ubiquity principle.

Fig 3. CdtB family pairwise %identity matrix (16 to 100%).

3.45 Conformational epitope arm: negative. DiscoTope-3.0 predicted conformational B-cell epitopes on vinculin (P18206-2; 248 epitope residues) and the CdtB family (AlphaFold C. jejuni + experimental CdtB chains; 46 to 55 epitope residues each). Restricting the mimicry search to the contiguous conformational-epitope regions both proteins present (5 vinculin, 14 CdtB peptides), there were 0 exact matches and 0 FDR-significant similarities (best p = 0.25).

3.5 Background comparison. A non-pathogen proteome comparison (CdtB vs B. subtilis, both vs vinculin) was sensitive to the control length/composition window (best-hit p ranging 0.0005 to 0.55) and is therefore not a reliable test; we rely on the per-variant shuffle-null with FDR, which is negative throughout.

3.6 Experimental-epitope arm: negative (Fig 4). Of the 153 experimentally-validated IEDB vinculin B-cell epitopes, 0 occur (as a whole epitope) within any CdtB variant and 0 show FDR-significant similarity (best nominal p = 0.033). The four IEDB CdtB epitopes likewise match nothing in vinculin. This directly tests whether the regions antibodies actually target are mimicked.

Fig 4. Experimental and conformational epitope-overlap p-value distributions (0 significant).

3.7 Pathogen ranking. Under FDR control no cdt-bearing pathogen’s CdtB variant passes the staged filter; the predicted “SIBO-risk ranking” is empty.

4. Discussion

Across five independent levels (sequence identity, sequence similarity, fold/domain structure, fine surface-patch structure, and both experimental and predicted B-cell epitopes) bacterial CdtB shows no statistically robust molecular mimicry of human vinculin, while the pipeline recovers a bona-fide mimic and rejects a non-mimic. The negative is therefore unlikely to be a methodological false-negative; it is consistent with the weak independent clinical replication of anti-CdtB/anti-vinculin biomarkers and with the principle that short-peptide and short-patch similarity are statistically ubiquitous and mostly coincidental.

Limitations / what would change the conclusion. (i) The cross-reactive epitope could be discontinuous; the conformational-epitope arm (DiscoTope-3.0) is negative, leaving only a full surface shape-and-electrostatics (MaSIF-style) comparison untested. (ii) Antibody cross-reactivity can arise at the paratope level and need not be visible in an antigen-antigen comparison; current antibody-antigen docking (AlphaFold3/Boltz, <15% accuracy) cannot adjudicate this. (iii) Post-translational modification, conformational state, or species differences (animal-model vinculin) could matter. (iv) Anti-vinculin antibodies occur in other conditions (e.g., systemic sclerosis), so the biomarker need not reflect CdtB-driven mimicry at all. In-silico mimicry is not cross-reactivity: this analysis constrains the mechanism but does not disprove the clinical association.

Contribution beyond the negative. We release an open, benchmarked, null-controlled mimicry pipeline (positive control included), a non-redundant CdtB family with a computed identity matrix, and a vinculin B-cell-epitope map: reusable resources for any host-pathogen mimicry question.

5. Proposed experimental validation

A peptide-competition ELISA is the decisive test. We propose three steps: synthesize the top predicted shared CdtB peptide(s) and assay whether they block anti-CdtB binding to vinculin (mirroring the undisclosed “CdtB blocking peptide”); use alanine scanning and truncation to map any minimal epitope; and localize any cross-reactive region on vinculin (head 1 to 835 vs tail 879 to 1066), which the field has never reported.

6. Data and code availability

Open pipeline (MIT) at github.com/leonardo-vanschaik/cdtb-vinculin-mimicry. All results regenerate from config/accessions/ + config/provenance/manifest.json via Snakemake. An archived release with a Zenodo DOI will accompany submission.

References (key)

  1. Pimentel M, et al. Development and validation of a biomarker for diarrhea-predominant IBS. PLoS One 2015;10(5):e0126438.
  2. Morales W, et al. Second-generation biomarker testing for IBS. Dig Dis Sci 2019;64:3115-21.
  3. Barros LL, et al. BMC Gastroenterol 2024;24:448. / Mansoor review, J Pak Med Assoc 2024;74:1300-8.
  4. Trost B, Lucchese G, Kanduc D, et al. No human protein is exempt from bacterial motifs. Self/Nonself 2010;1(4):328-34.
  5. Balbin CA, et al. Epitopedia: identifying molecular mimicry between pathogens and known immune epitopes. ImmunoInformatics 2023;9:100023.
  6. Høie MH, et al. DiscoTope-3.0: improved B-cell epitope prediction using inverse folding latent representations. Front Immunol 2024;15:1322712.
  7. van Kempen M, et al. Fast and accurate protein structure search with Foldseek. Nat Biotechnol 2024.
  8. Mirdita M, et al. ColabFold: making protein folding accessible to all. Nat Methods 2022;19:679-82.
  9. Jumper J, et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021;596:583-9.
  10. Dale JB, Beachey EH. Sequence of myosin-crossreactive epitopes of streptococcal M protein. J Exp Med 1986;164:1785.
  11. MimicryDB-Auto: structural validation reveals the inadequacy of sequence-based mimicry screening. Preprints 2026, 202603.2306.
  12. Lack of molecular mimicry between HPV vaccine L1 antigen and human proteins by computational analysis. Int J Clin Oncol 2026.
  13. Shreiner AB, et al. Anti-vinculin antibodies in systemic sclerosis. Arthritis Care Res 2023.
  14. Noori et al. What does AlphaFold3 learn about antibody and nanobody docking. mAbs 2025.