STRC Research Methodology
The One-Line Summary
No genetics degree. No lab access. No budget. One AI agent. Six weeks. A reclassification, seven computational models, eight AlphaFold3 experiments, and three emails to scientists — one of whom responded overnight.
Tools Used
| Tool | Purpose | Cost |
|---|---|---|
| OpenClaw | AI agent orchestration | Free (open source) |
| Claude Opus 4.6 / Sonnet 4.6 | Research, analysis, coding | ~$50-100 API cost total |
| AlphaFold Server (alphafoldserver.com) | Protein structure prediction | Free (Google account) |
| AlphaMissense database | Variant pathogenicity | Free (download) |
| UniProt | Protein sequences | Free |
| gnomAD | Population variant frequency | Free |
| ClinVar | Variant classifications | Free |
| Ensembl REST API | Genomic coordinates, sequences | Free |
| dbNSFP (source of REVEL scores) | Ensemble pathogenicity scores | Free |
| Python (scipy, numpy) | ODE models, statistical analysis | Free |
| GitHub | Model hosting | Free |
Total reproducibility: all databases are free. AlphaFold3 requires a Google account (free). Python modeling requires a computer. Apart from the Claude API cost listed above, the entire research project is reproducible by anyone with internet access.
Step-by-Step: How the Reclassification Was Done
Step 1: Recognize the Pseudogene Problem
Starting point: WES report from HK Children’s Hospital labels c.4976A>C as VUS, citing “conflicting computational predictions.”
Investigation: STRC has a pseudogene, STRCP1, at 99.6% identity. The sequence alignments that SIFT and PolyPhen-2 rely on are contaminated by pseudogene sequences, so these standard tools are unreliable for this gene.
Action: Exclude SIFT/PolyPhen-2 scores from analysis. Focus on pseudogene-resistant tools.
→ See STRC Pseudogene Problem
Step 2: AlphaMissense
Download AlphaMissense saturation mutagenesis dataset from Google DeepMind (alphafold.ebi.ac.uk/downloads).
Extract scores for STRC (UniProt Q7RTU9). Find E1659A score: 0.9016 (Likely Pathogenic, threshold 0.564).
Key: AlphaMissense uses protein structure, not DNA alignment. Pseudogene immune.
Cross-check with Pejaver et al. (2022) thresholds for ACMG PP3 evidence:
- PP3_Supporting: ≥0.397
- PP3_Moderate: ≥0.840
- E1659A at 0.9016 → PP3_Moderate
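The extraction in this step can be sketched as a small lookup over the downloaded table. This is a minimal sketch, not the script actually used: it assumes the file is a tab-separated table with `#` comment lines followed by a header containing `uniprot_id`, `protein_variant`, `am_pathogenicity`, and `am_class` columns, and the helper name `find_am_score` is hypothetical. The sample row reuses the E1659A score quoted above; its class label is an assumed spelling.

```python
import csv
import io


def find_am_score(tsv_lines, uniprot_id, protein_variant):
    """Return (score, class) for one amino-acid substitution, or None if absent.

    tsv_lines is any iterable of text lines (an open file, a StringIO, ...).
    Column names are assumptions about the download's layout.
    """
    # Drop '#' comment lines so the first remaining line is the header row.
    rows = (line for line in tsv_lines if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    for row in reader:
        if row["uniprot_id"] == uniprot_id and row["protein_variant"] == protein_variant:
            return float(row["am_pathogenicity"]), row["am_class"]
    return None


# Tiny in-memory stand-in for the real download (one row, score from this note).
SAMPLE = (
    "# illustrative comment line\n"
    "uniprot_id\tprotein_variant\tam_pathogenicity\tam_class\n"
    "Q7RTU9\tE1659A\t0.9016\tlikely_pathogenic\n"
)

print(find_am_score(io.StringIO(SAMPLE), "Q7RTU9", "E1659A"))
```

For the real gzip-compressed download, the same function should work on a handle opened with `gzip.open(path, "rt")`, since it only needs an iterable of text lines.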
Step 3: REVEL Score
Query dbNSFP or ClinVar for REVEL score at chr15:43600551, c.4976A>C.
Result: REVEL 0.65. Per Pejaver 2022:
- Supporting evidence: ≥0.423
- Moderate evidence: ≥0.644
- REVEL 0.65 → barely PP3_Moderate (corroborates AlphaMissense)
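The threshold lookups in Steps 2 and 3 amount to a simple mapping from score to evidence level. The sketch below hard-codes the cut-offs exactly as they are quoted in this note; they should be checked against the published calibration tables before being reused. The names `PP3_THRESHOLDS` and `pp3_level` are hypothetical.

```python
# Lower bounds (inclusive) as quoted in this note, not independently verified.
PP3_THRESHOLDS = {
    "alphamissense": {"supporting": 0.397, "moderate": 0.840},
    "revel": {"supporting": 0.423, "moderate": 0.644},
}


def pp3_level(tool, score):
    """Map a predictor score to a PP3 evidence level, or None if below all cut-offs."""
    cutoffs = PP3_THRESHOLDS[tool]
    # Check the stronger level first so a high score is not reported as Supporting.
    if score >= cutoffs["moderate"]:
        return "PP3_Moderate"
    if score >= cutoffs["supporting"]:
        return "PP3_Supporting"
    return None


print(pp3_level("alphamissense", 0.9016))
print(pp3_level("revel", 0.65))
```

Keeping the cut-offs in one table makes it easy to swap in updated calibration values without touching the logic.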
Step 4: gnomAD Population Frequency
Query gnomAD by rsID or genomic coordinates. Result: 0 alleles in 251,000+ gnomAD controls.
ACMG criterion: PM2_Supporting (ultra-rare in population).
Step 5: In-trans Evidence (PM3)
WES report confirms allele 1 = 98 kb deletion (STRC + CATSPER2), confirmed pathogenic by MLPA.
c.4976A>C found on the OTHER allele (compound heterozygous). This is an in-trans configuration with a confirmed pathogenic variant.
ACMG criterion: PM3_Moderate (found in trans with pathogenic variant in autosomal recessive disease).
Step 6: Conservation Analysis
Download STRC protein sequences from UniProt for 9 mammals:
- Human, mouse, rat, cow, monkey (Macaca mulatta), pig, dog, bat, bear
Run multiple sequence alignment (ClustalOmega format, Claude API). Find: PEIFTEIGTIAAG motif 100% conserved in all 9 species. Position 1659 = Glu in every species.
~80 million years of conserved Glu at this position.
ACMG criterion (as originally assigned): PP1_Supporting (evolutionary conservation). This was later withdrawn; see the correction in Step 7.
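Once an alignment exists, the conservation check itself is a column-by-column comparison. The sketch below assumes the aligned sequences are already available as equal-length strings. The three toy sequences are invented for illustration (only the motif is taken from this note), and the function names are hypothetical.

```python
def conserved_columns(aligned):
    """Return one entry per alignment column: the shared residue, or None."""
    seqs = list(aligned.values())
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    shared = []
    for column in zip(*seqs):
        # Conserved only if every species has the same residue and it is not a gap.
        same = len(set(column)) == 1 and column[0] != "-"
        shared.append(column[0] if same else None)
    return shared


def motif_is_conserved(aligned, reference, motif):
    """True if every column covered by the motif in the reference is conserved."""
    start = aligned[reference].find(motif)
    if start < 0:
        return False
    shared = conserved_columns(aligned)
    return all(shared[i] is not None for i in range(start, start + len(motif)))


# Toy alignment: flanking residues are made up, the motif is the one from this note.
TOY_ALIGNMENT = {
    "human": "AKPEIFTEIGTIAAGLS",
    "mouse": "AKPEIFTEIGTIAAGMS",
    "dog":   "SKPEIFTEIGTIAAGLS",
}

print(motif_is_conserved(TOY_ALIGNMENT, "human", "PEIFTEIGTIAAG"))
```

This simple version looks the motif up as an ungapped substring of the reference row, so it would need adjusting if the reference sequence contained gaps inside the motif.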
Step 7: ACMG Summation
| Criterion | Level | Evidence |
|---|---|---|
| PM2 | Supporting | Absent from gnomAD |
| PM3 | Moderate | In trans with pathogenic deletion |
| PP3 (AlphaMissense + REVEL) | Moderate | AlphaMissense 0.9016 + REVEL 0.789, concordant |
CORRECTION (2026-04-01 — per Jeffrey Holt): The original analysis included PP1_Supporting (conservation), which was incorrect. PP1 requires co-segregation evidence from affected family members. Evolutionary conservation is already integrated inside PP3 (REVEL uses GERP++, SiPhy, phyloP). Using it separately as PP1 is double-counting.
Corrected total: 2 Moderate + 1 Supporting = VUS — does not reach Likely Pathogenic threshold.
Per the Holt Lab framework (Boston Children’s Hospital / Harvard Medical School), this subclassifies as VUS-high — recognized by many clinical trial sites for enrollment in recessive disease studies when one allele is P/LP.
Was: VUS. Corrected classification: VUS-high (not Likely Pathogenic).
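The effect of dropping one Supporting criterion can be checked mechanically. The sketch below encodes the pathogenic-evidence combining rules from the 2015 ACMG/AMP guidelines (Richards et al.) as understood here. It ignores benign evidence and any point-based refinements, and the function name is hypothetical.

```python
from collections import Counter


def classify_pathogenic_evidence(levels):
    """Combine pathogenic evidence strengths into a classification.

    levels: iterable of "very_strong", "strong", "moderate", "supporting".
    Benign evidence is not modelled in this sketch.
    """
    c = Counter(levels)
    vs, s, m, p = c["very_strong"], c["strong"], c["moderate"], c["supporting"]

    pathogenic = (
        (vs >= 1 and (s >= 1 or m >= 2 or (m >= 1 and p >= 1) or p >= 2))
        or s >= 2
        or (s >= 1 and (m >= 3 or (m >= 2 and p >= 2) or (m >= 1 and p >= 4)))
    )
    if pathogenic:
        return "Pathogenic"

    likely_pathogenic = (
        (vs >= 1 and m >= 1)
        or (s >= 1 and m >= 1)
        or (s >= 1 and p >= 2)
        or m >= 3
        or (m >= 2 and p >= 2)
        or (m >= 1 and p >= 4)
    )
    return "Likely Pathogenic" if likely_pathogenic else "VUS"


corrected = ["moderate", "moderate", "supporting"]   # PM3, PP3, PM2
original = corrected + ["supporting"]                # plus the withdrawn PP1
print(classify_pathogenic_evidence(corrected))
print(classify_pathogenic_evidence(original))
```

Under these rules the original four-criterion set satisfies "2 Moderate + 2 Supporting", while the corrected three-criterion set does not, which is consistent with the corrected total above.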
Step 8: Formal Letter to HK Children’s Hospital
Wrote formal reclassification request citing all 4 criteria (including the PP1 criterion later withdrawn in Step 7) with references.
Letter: ~/Documents/Disability Card/misha-france-mdph/LETTER-Reclassification-Request-STRC-2026-03-15.pdf
Step-by-Step: How the Hypotheses Emerged
The reclassification was the beginning, not the end. The research didn’t stop at “confirm pathogenic” — it asked “what do we do about it?”
After Reclassification — Questions That Led to Hypotheses
Q: What does the protein structure look like? → Run AlphaFold3 on full STRC → pTM 0.63, 16% disordered. The N-terminal region is disordered. → This is the mini-STRC insight. → See STRC Mini-STRC Single-Vector Hypothesis
Q: If the N-terminal region is disordered, does removing it break anything? → AF3 Job 2 (mini-STRC + TMEM145): ipTM 0.43 vs 0.47 for full STRC. Negligible change. → AF3 Job 5 (mini-STRC solo): pTM 0.81. Better folding than full protein. → Mini-STRC is viable. → See STRC AlphaFold3 Computational Experiments
Q: How much better is single-vector than dual-vector mathematically? → Write Poisson model. Calibrate on Omichi 2020 mouse data. Scale to human. → 56.5x advantage. → See STRC Dual-Vector vs Single-Vector Transduction
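The core idea of a Poisson transduction model can be illustrated in a few lines. This is a deliberately simplified sketch, not the calibrated model behind the 56.5x figure: it assumes vector genomes per cell are Poisson-distributed, that the two halves of a dual-vector system arrive independently at the same mean dose, and it ignores recombination efficiency. The mean doses used below are arbitrary placeholders.

```python
import math


def frac_single(mean_genomes):
    """Fraction of cells receiving at least one genome (Poisson, P(k >= 1))."""
    return 1.0 - math.exp(-mean_genomes)


def frac_dual(mean_a, mean_b):
    """Fraction of cells receiving at least one copy of BOTH vectors,
    assuming the two vectors transduce independently."""
    return frac_single(mean_a) * frac_single(mean_b)


def single_vector_advantage(mean_genomes):
    """Ratio of single-vector to dual-vector coverage at equal per-vector dose."""
    return frac_single(mean_genomes) / frac_dual(mean_genomes, mean_genomes)


for dose in (0.1, 1.0, 5.0):  # placeholder mean genomes per cell
    print(dose, round(single_vector_advantage(dose), 2))
```

In this toy version the advantage is largest at low effective dose and shrinks toward 1 as nearly every cell receives both vectors; any realistic estimate depends on the calibration data and the additional loss terms.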
Q: Why is E1659A pathogenic if the structure is intact (pLDDT 95.69)? → Electrostatic analysis: Glu→Ala loses one negative charge, 2 H-bonds, and 49.8 ų of volume. → ΔΔG = 8.62 kcal/mol, corresponding to a ~10^6x decrease in binding affinity. → See STRC Electrostatic Analysis E1659A
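The step from ΔΔG to a fold change in affinity follows from the standard relation between binding free energy and the dissociation constant, fold change = exp(ΔΔG / RT). A short check, assuming room temperature by default (the temperature is not stated in this note):

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def affinity_fold_change(ddg_kcal_per_mol, temp_k=298.15):
    """Fold change in dissociation constant implied by a binding ddG."""
    return math.exp(ddg_kcal_per_mol / (R_KCAL_PER_MOL_K * temp_k))


print(f"{affinity_fold_change(8.62):.2e}")                  # room temperature
print(f"{affinity_fold_change(8.62, temp_k=310.15):.2e}")   # body temperature
```

At both temperatures the result is on the order of 10^6, which matches the figure quoted above.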
Q: Can you fix the variant directly (not just replace the gene)? → Prime editing feasibility: an NGG PAM lies 4 bp from the variant. Yes, feasible. → See Prime Editing for STRC
Q: What if you could make the therapy self-dosing? → OHCs have Ca²⁺ signaling via MET channels. NFAT promoter fires on Ca²⁺. Hearing aid = dosing device. → ODE model: 29x dynamic range, 197% protein target in 16h/8h cycle. → See Sonogenetic STRC Computational Proof
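To show the kind of model this refers to, here is a one-variable sketch of protein production under a stimulus-gated promoter. It is not the model that produced the 29x and 197% figures. It assumes the 16h/8h cycle means 16 hours of stimulation followed by 8 hours off, uses placeholder rate constants, borrows the 29x ratio only as an illustrative induced-to-basal promoter ratio, and integrates with plain forward Euler so it runs without dependencies (`scipy.integrate.solve_ivp` would be the more natural choice given the tools listed above).

```python
def simulate_protein(hours, on_hours=16.0, period=24.0,
                     basal_rate=1.0, induced_rate=29.0,
                     k_deg=0.05, dt=0.01):
    """Integrate dP/dt = rate(t) - k_deg * P with forward Euler.

    rate(t) is induced_rate during the first on_hours of each period and
    basal_rate otherwise. All rate constants are hypothetical placeholders
    in arbitrary units. Returns the list of P values at each time step.
    """
    protein = 0.0
    trace = []
    for step in range(int(round(hours / dt))):
        t = step * dt
        stimulated = (t % period) < on_hours
        rate = induced_rate if stimulated else basal_rate
        protein += dt * (rate - k_deg * protein)
        trace.append(protein)
    return trace


cycled = simulate_protein(240.0)                      # 16 h on / 8 h off for 10 days
always_off = simulate_protein(240.0, on_hours=0.0)    # promoter at basal level only
always_on = simulate_protein(240.0, on_hours=24.0)    # continuous stimulation
print(round(always_off[-1], 1), round(cycled[-1], 1), round(always_on[-1], 1))
```

Even this toy version shows the qualitative behavior of interest: the cycled schedule settles between the basal and fully induced levels, so daily stimulation time acts as a dose control.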
Q: What about immune response? How long is the window? → Seroprevalence analysis. Anc80L65: ~5-10% seroprevalence in <5y. Misha is 4. Now is optimal. → Hybrid strategy: AAV year 0 + LNP year 5. → See STRC Anti-AAV Immune Response Model
Q: What if surgery could be avoided entirely? → Sonoporation model: ultrasound + LNP. → Baseline (standard LNPs) fails; optimized (ionizable lipids) reaches a therapeutic dose in 2 sessions. → See Alternative STRC Delivery Hypotheses
Beyond the Database: The Process
The research followed a specific protocol:
- Question first: don’t mine databases randomly; ask a specific question, then find data
- Write it all down: every finding → Brain vault note within 24h
- Model everything: if it’s quantitative, write a Python model (even rough ones expose wrong intuitions)
- Connect domains: cross-references between notes surface non-obvious bridges
- Contact experts: researchers answer genuine scientific questions; don’t wait for permission
On Reproducibility
Anyone can do this:
- gnomAD: gnomad.broadinstitute.org — free population data
- AlphaFold DB: alphafold.ebi.ac.uk — free protein structures
- AlphaMissense: download from DeepMind (alphafold.ebi.ac.uk/downloads) — free
- AlphaFold Server: alphafoldserver.com — free (Google account, limited slots)
- UniProt: uniprot.org — free protein sequences and annotations
- ClinVar: ncbi.nlm.nih.gov/clinvar — free variant database
- OpenClaw: github.com/OpenClaw — free AI agent framework
Approximate cost for the full project:
- API usage (Claude Opus 4.6): $50-100
- Compute for Python models: ~0 (laptop)
- AF3 jobs: free (with Google account, limited to N jobs per day)
The Broader Point
This demonstrates what AI-augmented individual research can do in a field that has historically required institutional resources. Apart from modest API usage, the tools are all free. The data is all public. The bottleneck was always synthesis: knowing which questions to ask, and how to connect answers across domains.
OpenClaw + Claude made it possible for a non-geneticist, non-biologist to:
- Navigate ACMG guidelines (with one expert correction, recorded in Step 7)
- Identify the pseudogene problem
- Run protein structure predictions
- Build ODE models from literature parameters
- Write compelling letters to leading researchers
- Get a response from Harvard within 24 hours
Connections
- STRC Hearing Loss — the personal motivation for this methodology
- STRC E1659A Conservation and Reclassification
- STRC Electrostatic Analysis E1659A
- STRC Mini-STRC Single-Vector Hypothesis
- STRC Dual-Vector vs Single-Vector Transduction
- Sonogenetic STRC Computational Proof
- Alternative STRC Delivery Hypotheses
- STRC Anti-AAV Immune Response Model
- Prime Editing for STRC
- [see-also] STRC Pseudogene Problem: step 1 of the methodology
- [see-also] STRC Website and Communication: the research was shared publicly
- [see-also] STRC AlphaFold3 Computational Experiments: AF3 experiments as part of methodology
- Jeffrey Holt (Harvard): responded to the outreach
- [about] Misha: all of this is for him
- [about] Egor Lyfar: the researcher
- [see-also] Computational Confidence Scores as Epistemic Tools: how to interpret pTM/AlphaMissense scores used throughout this methodology