STRC Research Methodology

The One-Line Summary

No genetics degree. No lab access. No budget. One AI agent. Six weeks. A reclassification, seven computational models, eight AlphaFold3 experiments, and three emails to scientists — one of whom responded overnight.

Tools Used

Tool | Purpose | Cost
OpenClaw | AI agent orchestration | Free (open source)
Claude Opus 4.6 / Sonnet 4.6 | Research, analysis, coding | ~$50-100 API cost total
AlphaFold Server (alphafoldserver.com) | Protein structure prediction | Free (Google account)
AlphaMissense database | Variant pathogenicity | Free (download)
UniProt | Protein sequences | Free
gnomAD | Population variant frequency | Free
ClinVar | Variant classifications | Free
Ensembl REST API | Genomic coordinates, sequences | Free
dbNSFP (via REVEL) | Ensemble pathogenicity scores | Free
Python (scipy, numpy) | ODE models, statistical analysis | Free
GitHub | Model hosting | Free

Reproducibility: every database is free. AlphaFold Server requires only a free Google account, and the Python models run on an ordinary computer. Apart from the ~$50-100 in Claude API usage, anyone with internet access can reproduce the entire project.

Step-by-Step: How the Reclassification Was Done

Step 1: Recognize the Pseudogene Problem

Starting point: the whole-exome sequencing (WES) report from HK Children’s Hospital labels c.4976A>C as a variant of uncertain significance (VUS), citing “conflicting computational predictions.”

Investigation: STRC has a pseudogene, STRCP1, with 99.6% sequence identity. SIFT and PolyPhen-2 rely on sequence alignments, which the pseudogene sequences contaminate, so these standard tools are unreliable for this gene.

Action: Exclude SIFT/PolyPhen-2 scores from analysis. Focus on pseudogene-resistant tools.

→ See STRC Pseudogene Problem

Step 2: AlphaMissense

Download the AlphaMissense saturation mutagenesis dataset from Google DeepMind (alphafold.ebi.ac.uk/downloads).

Extract scores for STRC (UniProt Q7RTU9). Find E1659A score: 0.9016 (Likely Pathogenic, threshold 0.564).

Key: AlphaMissense uses protein structure, not DNA alignment, so it is immune to pseudogene contamination.

Cross-check with Pejaver et al. (2022) thresholds for ACMG PP3 evidence:

  • PP3_Supporting: ≥0.397
  • PP3_Moderate: ≥0.840
  • E1659A at 0.9016 → PP3_Moderate

Step 3: REVEL Score

Query dbNSFP or ClinVar for REVEL score at chr15:43600551, c.4976A>C.

Result: REVEL 0.65. Per Pejaver 2022:

  • Supporting evidence: ≥0.423
  • Moderate evidence: ≥0.644
  • REVEL 0.65 → barely PP3_Moderate (corroborates AlphaMissense)
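The threshold checks in Steps 2 and 3 can be sketched as a small lookup. This is a minimal illustration, not the code used in the project: the function name `pp3_level` and the dictionary layout are hypothetical, and the cut-offs are copied exactly as quoted in this note rather than re-derived, so they should be checked against the cited publication before reuse.

```python
def pp3_level(score, supporting, moderate):
    """Map a predictor score to a PP3 evidence level using two cut-offs."""
    if score >= moderate:
        return "PP3_Moderate"
    if score >= supporting:
        return "PP3_Supporting"
    return "none"


# Cut-offs and scores as quoted in this note (assumption: not independently verified).
THRESHOLDS = {
    "AlphaMissense": {"supporting": 0.397, "moderate": 0.840},
    "REVEL": {"supporting": 0.423, "moderate": 0.644},
}
SCORES = {"AlphaMissense": 0.9016, "REVEL": 0.65}

levels = {name: pp3_level(SCORES[name], **THRESHOLDS[name]) for name in SCORES}
for name, level in levels.items():
    print(f"{name}: {SCORES[name]} -> {level}")
```

Keeping the cut-offs in one table makes it easy to rerun the check if a different calibration is adopted.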

Step 4: gnomAD Population Frequency

Query gnomAD by rs number or genomic coordinates. Result: 0 alleles in 251,000+ gnomAD controls.

ACMG criterion: PM2_Supporting (ultra-rare in population).
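To see what "0 observed alleles" implies numerically, one can compute an upper confidence bound on the true allele frequency. The sketch below is an illustration only. It assumes the note's "251,000+" figure is a count of sampled alleles (if it is a count of individuals, the allele count would be twice that), and the function name is hypothetical.

```python
def zero_count_upper_bound(n_alleles, confidence=0.95):
    """Exact one-sided upper bound on allele frequency when 0 of n alleles carry the variant.

    Solves (1 - f) ** n = 1 - confidence for f.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_alleles)


N_ALLELES = 251_000  # assumption: the note's "251,000+" read as an allele count

bound = zero_count_upper_bound(N_ALLELES)
print(f"95% upper bound on allele frequency: {bound:.2e}")
print(f"rule-of-three approximation (3/n):   {3 / N_ALLELES:.2e}")
```

The two numbers agree closely, which is why 3/n is a common shorthand for a zero-count observation.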

Step 5: In-trans Evidence (PM3)

WES report confirms allele 1 = 98 kb deletion (STRC + CATSPER2), confirmed pathogenic by MLPA.

c.4976A>C found on the OTHER allele (compound heterozygous). This is in-trans configuration with a confirmed pathogenic variant.

ACMG criterion: PM3_Moderate (found in trans with pathogenic variant in autosomal recessive disease).

Step 6: Conservation Analysis

Download STRC protein sequences from UniProt for 9 mammals:

  • Human, mouse, rat, cow, monkey (Macaca mulatta), pig, dog, bat, bear

Run a multiple sequence alignment (Clustal Omega format, via the Claude API). Result: the PEIFTEIGTIAAG motif is 100% conserved in all 9 species, and position 1659 is Glu in every species.

~80 million years of conserved Glu at this position.

ACMG criterion (as originally assigned): PP1_Supporting (evolutionary conservation). This assignment was later withdrawn; see the correction under Step 7.
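A per-column conservation check over an alignment can be sketched as follows. This is a toy illustration: the alignment contains only the motif quoted above, repeated for the nine species named in this step, and the extra divergent sequence is a made-up placeholder added to show what a non-conserved column looks like. The function name is hypothetical.

```python
def conserved_columns(alignment):
    """Return 0-based column indices where every aligned sequence has the same residue.

    Columns consisting of gap characters ('-') are not reported as conserved.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    return [
        i for i in range(length)
        if len({s[i] for s in seqs}) == 1 and seqs[0][i] != "-"
    ]


MOTIF = "PEIFTEIGTIAAG"  # motif reported as identical in all 9 species
SPECIES = ["human", "mouse", "rat", "cow", "macaque", "pig", "dog", "bat", "bear"]
alignment = {name: MOTIF for name in SPECIES}

# Hypothetical divergent sequence (not real data), differing at one arbitrary column.
with_outlier = dict(alignment, hypothetical_species="PEIFTAIGTIAAG")

print(conserved_columns(alignment))
print(conserved_columns(with_outlier))
```

On a real alignment the same function would be run over the full-length columns, then mapped back to human residue numbering to find position 1659.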

Step 7: ACMG Summation

Criterion | Level | Evidence
PM2 | Supporting | Absent from gnomAD
PM3 | Moderate | In trans with pathogenic deletion
PP3 (AlphaMissense + REVEL) | Moderate | AlphaMissense 0.9016 + REVEL 0.789, concordant

CORRECTION (2026-04-01 — per Jeffrey Holt): The original analysis included PP1_Supporting (conservation), which was incorrect. PP1 requires co-segregation evidence from affected family members. Evolutionary conservation is already integrated inside PP3 (REVEL uses GERP++, SiPhy, phyloP). Using it separately as PP1 is double-counting.

Corrected total: 2 Moderate + 1 Supporting = VUS — does not reach Likely Pathogenic threshold.

Per the Holt Lab framework (Boston Children’s Hospital / Harvard Medical School), this subclassifies as VUS-high — recognized by many clinical trial sites for enrollment in recessive disease studies when one allele is P/LP.

Lab report: VUS. Original analysis (with PP1): Likely Pathogenic. Corrected classification: VUS-high (not Likely Pathogenic).
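The effect of removing one Supporting criterion can be checked against the ACMG/AMP 2015 combining rules for pathogenic evidence. The sketch below encodes only the pathogenic side of those rules (it assumes there is no benign evidence to weigh) and the function name is hypothetical. It is an illustration, not a clinical classification tool.

```python
def classify_pathogenic_evidence(very_strong=0, strong=0, moderate=0, supporting=0):
    """Combine counts of pathogenic criteria per the ACMG/AMP 2015 rules.

    Assumes no benign criteria are met; returns 'Pathogenic',
    'Likely Pathogenic', or 'VUS'.
    """
    pvs, ps, pm, pp = very_strong, strong, moderate, supporting

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    if pathogenic:
        return "Pathogenic"

    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    return "Likely Pathogenic" if likely_pathogenic else "VUS"


# Original tally: PM3 + PP3 at Moderate, PM2 + PP1 at Supporting.
original = classify_pathogenic_evidence(moderate=2, supporting=2)
# Corrected tally: PP1 removed.
corrected = classify_pathogenic_evidence(moderate=2, supporting=1)
print("original: ", original)
print("corrected:", corrected)
```

The "2 Moderate + 2 or more Supporting" rule is the one the original tally satisfied, so dropping PP1 is what moves the result back to VUS.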

Step 8: Formal Letter to HK Children’s Hospital

Wrote formal reclassification request citing all 4 criteria with references.

Letter: ~/Documents/Disability Card/misha-france-mdph/LETTER-Reclassification-Request-STRC-2026-03-15.pdf

Step-by-Step: How the Hypotheses Emerged

The reclassification was the beginning, not the end. The research didn’t stop at “confirm pathogenic” — it asked “what do we do about it?”

After Reclassification — Questions That Led to Hypotheses

Q: What does the protein structure look like? → Run AlphaFold3 on full STRC → pTM 0.63, 16% disordered. N-terminal is disordered. → This is the mini-STRC insight. → See STRC Mini-STRC Single-Vector Hypothesis

Q: If the N-terminal is disordered, does removing it break anything? → AF3 Job 2 (mini-STRC + TMEM145): ipTM 0.43 vs 0.47 for full STRC. Negligible change. → AF3 Job 5 (mini-STRC solo): pTM 0.81. Better folding than full protein. → Mini-STRC is viable. → See STRC AlphaFold3 Computational Experiments

Q: How much better is single-vector than dual-vector mathematically? → Write Poisson model. Calibrate on Omichi 2020 mouse data. Scale to human. → 56.5x advantage. → See STRC Dual-Vector vs Single-Vector Transduction
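The general shape of a Poisson transduction comparison can be sketched as below. This is not the calibrated model behind the 56.5x figure and does not attempt to reproduce it. The dose split, the `reconstitution_efficiency` parameter, and all numeric values are hypothetical placeholders chosen only to show the mechanics.

```python
import math


def p_at_least_one(moi):
    """Poisson probability that a cell receives at least one vector genome."""
    return 1.0 - math.exp(-moi)


def single_vector_fraction(total_moi):
    """Fraction of cells expressing the gene with one vector at the full dose."""
    return p_at_least_one(total_moi)


def dual_vector_fraction(total_moi, reconstitution_efficiency):
    """Fraction of cells expressing the gene when the dose is split across two vectors.

    A cell must receive both halves, and the halves must then recombine
    (reconstitution_efficiency is a hypothetical lumped parameter).
    """
    per_vector_moi = total_moi / 2.0
    return p_at_least_one(per_vector_moi) ** 2 * reconstitution_efficiency


def advantage(total_moi, reconstitution_efficiency):
    """Ratio of single-vector to dual-vector expressing fractions."""
    return single_vector_fraction(total_moi) / dual_vector_fraction(
        total_moi, reconstitution_efficiency
    )


for moi in (0.1, 0.5, 1.0, 5.0):  # illustrative effective doses per cell
    print(f"MOI {moi}: single/dual advantage = {advantage(moi, 0.1):.1f}x")
```

The ratio grows as the effective dose per cell falls, because requiring two independent hits penalizes low-dose regimes quadratically.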

Q: Why is E1659A pathogenic if the structure is intact (pLDDT 95.69)? → Electrostatic analysis: Glu→Ala loses one negative charge, 2 H-bonds, and 49.8 ų of volume. → ΔΔG = 8.62 kcal/mol, i.e. a ~10^6-fold decrease in binding affinity. → See STRC Electrostatic Analysis E1659A
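The step from ΔΔG to a fold change in affinity follows from the standard relation K_mut / K_wt = exp(ΔΔG / RT). The sketch below is a worked check of that arithmetic. The temperature of 310.15 K (body temperature) is an assumption, since the note does not state which temperature was used.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def affinity_fold_change(ddg_kcal_per_mol, temperature_k=310.15):
    """Fold change in dissociation constant implied by a binding free-energy change."""
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temperature_k))


fold = affinity_fold_change(8.62)  # ΔΔG value quoted in this note
print(f"fold change: {fold:.2e} (log10 = {math.log10(fold):.2f})")
```

At 298.15 K the result is somewhat larger but still on the order of 10^6, so the order-of-magnitude claim does not depend strongly on the temperature chosen.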

Q: Can you fix the variant directly (not just replace the gene)? → Prime editing feasibility: NGG PAM 4bp from variant. Yes, feasible. → See Prime Editing for STRC
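A first-pass PAM search of this kind can be sketched as a scan for NGG on the forward strand and CCN (an NGG on the reverse strand) near the variant position. The sequence below is a made-up placeholder, not STRC sequence, and the distance definition (bases between the variant and the nearest PAM base) is an assumption. Real prime-editing design also has to consider nick position, strand, and pegRNA constraints.

```python
def find_ngg_pams(seq, target_index, max_distance):
    """List NGG PAM sites within max_distance of target_index.

    Returns (strand, start, distance) tuples, where start is the 0-based
    forward-strand index of the 3-base window and distance is measured from
    target_index to the nearest base of that window.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2):
        window = seq[i:i + 3]
        distance = min(abs(target_index - j) for j in range(i, i + 3))
        if distance > max_distance:
            continue
        if window[1:] == "GG":   # NGG on the forward strand
            hits.append(("+", i, distance))
        if window[:2] == "CC":   # CCN here is NGG on the reverse strand
            hits.append(("-", i, distance))
    return hits


TOY_SEQ = "ATATATCATATTAGGATATA"  # hypothetical sequence for illustration only
TOY_VARIANT_INDEX = 8             # hypothetical variant position in TOY_SEQ

for strand, start, distance in find_ngg_pams(TOY_SEQ, TOY_VARIANT_INDEX, 10):
    print(f"strand {strand}, window starts at {start}, {distance} bp from variant")
```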

Q: What if you could make the therapy self-dosing? → OHCs have Ca²⁺ signaling via MET channels. NFAT promoter fires on Ca²⁺. Hearing aid = dosing device. → ODE model: 29x dynamic range, 197% protein target in 16h/8h cycle. → See Sonogenetic STRC Computational Proof
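The structure of such an ODE model can be sketched with a two-variable system (mRNA and protein) driven by a 16 h on / 8 h off stimulus. This is a generic inducible-expression toy, not the model behind the 29x and 197% figures. Every rate constant below is a hypothetical placeholder, and simple forward-Euler integration is used to keep the example dependency-free.

```python
def simulate(hours, on_hours=16.0, off_hours=8.0, dt=0.01,
             k_on=1.0, k_basal=0.05, d_m=0.5, k_tl=1.0, d_p=0.05,
             stimulated=True):
    """Integrate dm/dt = k_tx(t) - d_m*m and dp/dt = k_tl*m - d_p*p with forward Euler.

    k_tx(t) equals k_on during the 'on' part of each cycle and k_basal otherwise.
    All rate constants are hypothetical, in arbitrary units per hour.
    Returns the final (mRNA, protein) levels.
    """
    m = 0.0
    p = 0.0
    period = on_hours + off_hours
    for step in range(int(round(hours / dt))):
        t = step * dt
        active = stimulated and (t % period) < on_hours
        k_tx = k_on if active else k_basal
        dm = k_tx - d_m * m
        dp = k_tl * m - d_p * p
        m += dm * dt
        p += dp * dt
    return m, p


_, protein_cycled = simulate(240.0)                     # ten 24 h cycles
_, protein_basal = simulate(240.0, stimulated=False)    # promoter never induced
print(f"protein with 16h/8h stimulation: {protein_cycled:.2f}")
print(f"protein at basal expression:     {protein_basal:.2f}")
print(f"ratio: {protein_cycled / protein_basal:.1f}x")
```

Even a rough model like this exposes which parameters matter: the induced-to-basal ratio is set mostly by the promoter's on/off rates and the duty cycle, while protein half-life controls how much the level sags during the off period.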

Q: What about immune response? How long is the window? → Seroprevalence analysis. Anc80L65: ~5-10% seroprevalence in children under 5. Misha is 4. Now is optimal. → Hybrid strategy: AAV year 0 + LNP year 5. → See STRC Anti-AAV Immune Response Model

Q: What if surgery could be avoided entirely? → Sonoporation model: ultrasound + LNP. Baseline (standard LNPs) fails; the optimized scenario (ionizable lipids) reaches a therapeutic dose in 2 sessions. → See Alternative STRC Delivery Hypotheses

Beyond the Database: The Process

The research followed a specific protocol:

  1. Question first: don’t mine databases randomly; ask a specific question, then find data
  2. Write it all down: every finding → Brain vault note within 24h
  3. Model everything: if it’s quantitative, write a Python model (even rough ones expose wrong intuitions)
  4. Connect domains: cross-references between notes surface non-obvious bridges
  5. Contact experts: researchers answer genuine scientific questions; don’t wait for permission

On Reproducibility

Anyone can do this:

  1. gnomAD: gnomad.broadinstitute.org — free population data
  2. AlphaFold DB: alphafold.ebi.ac.uk — free protein structures
  3. AlphaMissense: download from DeepMind (alphafold.ebi.ac.uk/downloads) — free
  4. AlphaFold Server: alphafoldserver.com — free (Google account, limited slots)
  5. UniProt: uniprot.org — free protein sequences and annotations
  6. ClinVar: ncbi.nlm.nih.gov/clinvar — free variant database
  7. OpenClaw: github.com/OpenClaw — free AI agent framework

Approximate cost for the full project:

  • API usage (Claude Opus 4.6): $50-100
  • Compute for Python models: ~0 (laptop)
  • AF3 jobs: free (with Google account, limited to N jobs per day)

The Broader Point

This demonstrates what AI-augmented individual research can do in a field that has historically required institutional resources. The tools are all free. The data is all public. The bottleneck was always synthesis — knowing which questions to ask, and how to connect answers across domains.

OpenClaw + Claude made it possible for a non-geneticist, non-biologist to:

  • Navigate ACMG guidelines correctly
  • Identify the pseudogene problem
  • Run protein structure predictions
  • Build ODE models from literature parameters
  • Write compelling letters to leading researchers
  • Get a response from Harvard within 24 hours

Connections