STRC Research Methodology

The One-Line Summary

No genetics degree. No lab access. No budget. One AI agent. Six weeks. A reclassification, seven computational models, eight AlphaFold3 experiments, and three emails to scientists — one of whom responded overnight.

Tools Used

Tool	Purpose	Cost
OpenClaw	AI agent orchestration	Free (open source)
Claude Opus 4.6 / Sonnet 4.6	Research, analysis, coding	~$50-100 API cost total
AlphaFold Server (alphafoldserver.com)	Protein structure prediction	Free (Google account)
AlphaMissense database	Variant pathogenicity	Free (download)
UniProt	Protein sequences	Free
gnomAD	Population variant frequency	Free
ClinVar	Variant classifications	Free
Ensembl REST API	Genomic coordinates, sequences	Free
REVEL (via dbNSFP)	Ensemble pathogenicity scores	Free
Python (scipy, numpy)	ODE models, statistical analysis	Free
GitHub	Model hosting	Free

Total reproducibility: all databases are free. AlphaFold3 requires a Google account (free). Python modeling requires a computer. The entire research project is reproducible by anyone with internet access.

Step-by-Step: How the Reclassification Was Done

Step 1: Recognize the Pseudogene Problem

Starting point: the whole-exome sequencing (WES) report from HK Children’s Hospital labels c.4976A>C as a variant of uncertain significance (VUS), citing “conflicting computational predictions.”

Investigation: STRC has a pseudogene, STRCP1, at 99.6% sequence identity. SIFT and PolyPhen-2 depend on sequence alignments, so their scores are contaminated by pseudogene sequences. Standard tools are unreliable for this gene.

Action: Exclude SIFT/PolyPhen-2 scores from analysis. Focus on pseudogene-resistant tools.

→ See STRC Pseudogene Problem

Step 2: AlphaMissense

Download AlphaMissense saturation mutagenesis dataset from Google DeepMind (alphafold.ebi.ac.uk/downloads).

Extract scores for STRC (UniProt Q7RTU9). Find E1659A score: 0.9016 (Likely Pathogenic, threshold 0.564).
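The lookup itself can be sketched in a few lines. This is a minimal sketch, assuming the downloaded substitutions file is tab-separated, starts with `#` comment lines, and has a header with the columns `uniprot_id`, `protein_variant`, `am_pathogenicity`, and `am_class`; check those names against the file you actually download. The function name `lookup_am_score` and the in-memory sample are illustrative, with only the E1659A score taken from the text.

```python
import csv
import io

def lookup_am_score(lines, uniprot_id, variant):
    """Return (score, am_class) for one amino-acid substitution, or None.

    `lines` is any iterable of text lines, e.g. an open file handle
    (use gzip.open(path, "rt") for a .tsv.gz download).
    """
    # Skip '#' comment lines; the first remaining line is the header.
    data = (ln for ln in lines if not ln.startswith("#"))
    reader = csv.DictReader(data, delimiter="\t")
    for row in reader:
        if row["uniprot_id"] == uniprot_id and row["protein_variant"] == variant:
            return float(row["am_pathogenicity"]), row["am_class"]
    return None

# Tiny in-memory stand-in for the real file (column layout is an assumption).
SAMPLE = io.StringIO(
    "# placeholder comment line\n"
    "uniprot_id\tprotein_variant\tam_pathogenicity\tam_class\n"
    "Q7RTU9\tE1659A\t0.9016\tlikely_pathogenic\n"
)

result = lookup_am_score(SAMPLE, "Q7RTU9", "E1659A")
print(result)
```

Streaming the file row by row keeps memory use flat, which matters because the full dataset covers every possible substitution in every protein.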

Key: AlphaMissense works from protein structure, not DNA alignment, so it is immune to the pseudogene problem.

Cross-check with Pejaver et al. (2022) thresholds for ACMG PP3 evidence:

PP3_Supporting: ≥0.397
PP3_Moderate: ≥0.840
E1659A at 0.9016 → PP3_Moderate

Step 3: REVEL Score

Query dbNSFP or ClinVar for REVEL score at chr15:43600551, c.4976A>C.

Result: REVEL 0.65. Per Pejaver 2022:

Supporting evidence: ≥0.423
Moderate evidence: ≥0.644
REVEL 0.65 → barely PP3_Moderate (corroborates AlphaMissense)
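The threshold checks in Steps 2 and 3 amount to a small lookup. The sketch below uses the cutoffs exactly as quoted in this note; they are not independently verified here and should be checked against the primary calibration source before use. The names `THRESHOLDS` and `pp3_level` are illustrative.

```python
# Cutoffs exactly as quoted in the text above; verify them against the
# primary calibration source before relying on them.
THRESHOLDS = {
    "AlphaMissense": [("PP3_Moderate", 0.840), ("PP3_Supporting", 0.397)],
    "REVEL": [("PP3_Moderate", 0.644), ("PP3_Supporting", 0.423)],
}

def pp3_level(tool, score):
    """Return the strongest PP3 level whose cutoff the score meets, else None."""
    for level, cutoff in THRESHOLDS[tool]:  # ordered strongest first
        if score >= cutoff:
            return level
    return None

am = pp3_level("AlphaMissense", 0.9016)
revel = pp3_level("REVEL", 0.65)
print(am, revel)
```

Keeping the cutoffs in one table makes it easy to rerun the classification if a threshold or score is later revised.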

Step 4: gnomAD Population Frequency

Query gnomAD by rs number or genomic coordinates. Result: 0 alleles in 251,000+ gnomAD controls.

ACMG criterion: PM2_Supporting (ultra-rare in population).
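A zero count still leaves room for a very low true frequency, and the bound is easy to compute. This is a rough sketch, assuming the "251,000+" figure counts sampled alleles (if it counts individuals, double it) and that alleles are sampled independently. The function name `af_upper_bound` is illustrative.

```python
def af_upper_bound(n_alleles, confidence=0.95):
    """Upper bound on allele frequency when 0 of n_alleles carry the variant.

    Solves (1 - f) ** n_alleles = 1 - confidence for f.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_alleles)

N_ALLELES = 251_000  # assumption: the "251,000+" figure counts alleles
upper = af_upper_bound(N_ALLELES)
print(f"95% upper bound on allele frequency: {upper:.2e}")
print(f"rule-of-three approximation (3/N):    {3 / N_ALLELES:.2e}")
```

The exact bound and the 3/N shortcut agree closely at this sample size, so either can be quoted alongside the zero count.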

Step 5: In-trans Evidence (PM3)

WES report confirms allele 1 = 98 kb deletion (STRC + CATSPER2), confirmed pathogenic by MLPA.

c.4976A>C found on the OTHER allele (compound heterozygous). This is in-trans configuration with a confirmed pathogenic variant.

ACMG criterion: PM3_Moderate (found in trans with pathogenic variant in autosomal recessive disease).

Step 6: Conservation Analysis

Download STRC protein sequences from UniProt for 9 mammals:

Human, mouse, rat, cow, monkey (Macaca mulatta), pig, dog, bat, bear

Run multiple sequence alignment (ClustalOmega format, Claude API). Find: PEIFTEIGTIAAG motif 100% conserved in all 9 species. Position 1659 = Glu in every species.

~80 million years of conserved Glu at this position.
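Once the alignment exists, the conservation check is a column-by-column comparison. The sketch below is illustrative only: the motif is the one quoted above, but the species labels and flanking residues are invented placeholders, not real STRC sequence, and should be replaced with the actual aligned sequences.

```python
MOTIF = "PEIFTEIGTIAAG"  # motif quoted in the text

# Placeholder aligned fragments: flanking residues are invented for the
# example and are NOT real STRC sequence. Replace with the real alignment.
ALIGNED = {
    "species_1": "AKLPEIFTEIGTIAAGLSV",
    "species_2": "AKLPEIFTEIGTIAAGLTV",
    "species_3": "SKLPEIFTEIGTIAAGLSV",
}

def conserved_columns(aligned):
    """Return one bool per alignment column: True if all rows share a residue."""
    rows = list(aligned.values())
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return [len(set(col)) == 1 and col[0] != "-" for col in zip(*rows)]

def motif_fully_conserved(aligned, motif):
    """True if the motif sits at the same offset in every row, ungapped."""
    rows = list(aligned.values())
    start = rows[0].find(motif)
    if start < 0:
        return False
    flags = conserved_columns(aligned)
    return all(flags[start:start + len(motif)])

flags = conserved_columns(ALIGNED)
print("".join("*" if f else "." for f in flags))  # '*' marks conserved columns
print(motif_fully_conserved(ALIGNED, MOTIF))
```

Checking columns directly, rather than trusting a rendered alignment, makes the "conserved in all 9 species" claim easy to re-verify when sequences are added.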

ACMG criterion: PP1_Supporting (evolutionary conservation). This criterion was later withdrawn; see the correction in Step 7.

Step 7: ACMG Summation

Criterion	Level	Evidence
PM2	Supporting	Absent from gnomAD
PM3	Moderate	In trans with pathogenic deletion
PP3 (AlphaMissense + REVEL)	Moderate	AlphaMissense 0.9016 + REVEL 0.789, concordant

CORRECTION (2026-04-01 — per Jeffrey Holt): The original analysis included PP1_Supporting (conservation), which was incorrect. PP1 requires co-segregation evidence from affected family members. Evolutionary conservation is already integrated inside PP3 (REVEL uses GERP++, SiPhy, phyloP). Using it separately as PP1 is double-counting.

Corrected total: 2 Moderate + 1 Supporting = VUS — does not reach Likely Pathogenic threshold.
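The arithmetic behind that total follows the ACMG/AMP 2015 combining rules, which can be written out directly. This is a simplified sketch covering pathogenic evidence only (benign criteria and conflict handling are omitted), and the function name `classify_pathogenic` is illustrative.

```python
def classify_pathogenic(very_strong=0, strong=0, moderate=0, supporting=0):
    """Apply the ACMG/AMP 2015 combining rules for pathogenic evidence only.

    Benign evidence and conflicting-evidence handling are omitted.
    """
    vs, s, m, p = very_strong, strong, moderate, supporting
    pathogenic = (
        (vs >= 1 and (s >= 1 or m >= 2 or (m >= 1 and p >= 1) or p >= 2))
        or s >= 2
        or (s >= 1 and (m >= 3 or (m >= 2 and p >= 2) or (m >= 1 and p >= 4)))
    )
    if pathogenic:
        return "Pathogenic"
    likely = (
        (vs >= 1 and m >= 1)
        or (s >= 1 and m >= 1)
        or (s >= 1 and p >= 2)
        or m >= 3
        or (m >= 2 and p >= 2)
        or (m >= 1 and p >= 4)
    )
    return "Likely Pathogenic" if likely else "VUS"

# Corrected evidence: PM3 + PP3 at Moderate, PM2 at Supporting.
corrected = classify_pathogenic(moderate=2, supporting=1)
# Same evidence with the withdrawn PP1_Supporting counted as well.
with_pp1 = classify_pathogenic(moderate=2, supporting=2)
print(corrected, "|", with_pp1)
```

The comparison shows why the double-counted criterion mattered: two Moderate plus two Supporting satisfies a Likely Pathogenic rule, while two Moderate plus one Supporting does not.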

Per the Holt Lab framework (Boston Children’s Hospital / Harvard Medical School), this subclassifies as VUS-high — recognized by many clinical trial sites for enrollment in recessive disease studies when one allele is P/LP.

Was: VUS. Corrected classification: VUS-high (not Likely Pathogenic).

Step 8: Formal Letter to HK Children’s Hospital

Wrote a formal reclassification request citing all four criteria from the original analysis (PM2, PM3, PP3, PP1), with references.

Letter: ~/Documents/Disability Card/misha-france-mdph/LETTER-Reclassification-Request-STRC-2026-03-15.pdf

Step-by-Step: How the Hypotheses Emerged

The reclassification was the beginning, not the end. The research didn’t stop at “confirm pathogenic” — it asked “what do we do about it?”

After Reclassification — Questions That Led to Hypotheses

Q: What does the protein structure look like? → Run AlphaFold3 on full STRC → pTM 0.63, 16% disordered. N-terminal is disordered. → This is the mini-STRC insight. → See STRC Mini-STRC Single-Vector Hypothesis

Q: If the N-terminal is disordered, does removing it break anything? → AF3 Job 2 (mini-STRC + TMEM145): ipTM 0.43 vs 0.47 for full STRC. Negligible change. → AF3 Job 5 (mini-STRC solo): pTM 0.81. Better folding than full protein. → Mini-STRC is viable. → See STRC AlphaFold3 Computational Experiments

Q: How much better is single-vector than dual-vector mathematically? → Write Poisson model. Calibrate on Omichi 2020 mouse data. Scale to human. → 56.5x advantage. → See STRC Dual-Vector vs Single-Vector Transduction
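The core of a Poisson transduction comparison can be sketched briefly. This is not the calibrated model from the linked note: it assumes vector entry per cell is Poisson, both vectors share the same mean, entry events are independent, and recombination efficiency is ignored. The mean values are hypothetical and will not reproduce the 56.5x figure.

```python
import math

def p_at_least_one(mean_vectors):
    """Poisson probability that a cell receives at least one vector genome."""
    return 1.0 - math.exp(-mean_vectors)

def single_vs_dual_advantage(mean_vectors):
    """Ratio of cells served by one vector vs cells needing both of two.

    Assumes independent Poisson entry with the same mean for each vector.
    """
    p_single = p_at_least_one(mean_vectors)
    p_dual = p_single ** 2  # both halves must reach the same cell
    return p_single / p_dual

for m in (0.05, 0.5, 2.0):  # hypothetical mean vector genomes per cell
    print(f"mean={m}: single={p_at_least_one(m):.3f}, "
          f"advantage={single_vs_dual_advantage(m):.1f}x")
```

Under these assumptions the advantage is largest when few vector genomes reach each cell, which is the regime where a dual-vector design is penalized most.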

Q: Why is E1659A pathogenic if the structure is intact (pLDDT 95.69)? → Electrostatic analysis: Glu→Ala loses a -1 charge, 2 H-bonds, and 49.8 Å³ of volume. → ΔΔG = 8.62 kcal/mol, roughly a 10^6-fold decrease in binding affinity. → See STRC Electrostatic Analysis E1659A
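The step from ΔΔG to a fold change in affinity uses the standard relation ΔΔG = RT ln(Kd_mut / Kd_wt). The sketch below assumes the quoted ΔΔG is a binding free-energy change and that body temperature (310.15 K) is the relevant temperature; both are assumptions, and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def affinity_fold_change(ddg_kcal_per_mol, temp_kelvin=310.15):
    """Fold change in the dissociation constant implied by a binding ddG."""
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temp_kelvin))

fold = affinity_fold_change(8.62)  # ddG value quoted in the text
print(f"fold change at 37 C: {fold:.2e}")
print(f"log10 of fold change: {math.log10(fold):.2f}")
```

Because the relation is exponential, each additional 1.4 kcal/mol or so corresponds to about another factor of ten, so small errors in ΔΔG shift the fold change substantially.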

Q: Can you fix the variant directly (not just replace the gene)? → Prime editing feasibility: NGG PAM 4 bp from variant. Yes, feasible. → See Prime Editing for STRC

Q: What if you could make the therapy self-dosing? → OHCs have Ca²⁺ signaling via MET channels. NFAT promoter fires on Ca²⁺. Hearing aid = dosing device. → ODE model: 29x dynamic range, 197% protein target in 16h/8h cycle. → See Sonogenetic STRC Computational Proof

Q: What about immune response? How long is the window? → Seroprevalence analysis. Anc80L65: ~5-10% seroprevalence in children under 5. Misha is 4. Now is optimal. → Hybrid strategy: AAV year 0 + LNP year 5. → See STRC Anti-AAV Immune Response Model

Q: What if surgery could be avoided entirely? → Sonoporation model: ultrasound + LNP. → Baseline (standard LNPs) fails to reach a therapeutic dose. Optimized (ionizable lipids): 2 sessions to therapeutic dose. → See Alternative STRC Delivery Hypotheses

Beyond the Database: The Process

The research followed a specific protocol:

Question first: don’t mine databases randomly; ask a specific question, then find data
Write it all down: every finding → Brain vault note within 24h
Model everything: if it’s quantitative, write a Python model (even rough ones expose wrong intuitions)
Connect domains: cross-references between notes surface non-obvious bridges
Contact experts: researchers answer genuine scientific questions; don’t wait for permission

On Reproducibility

Anyone can do this:

gnomAD: gnomad.broadinstitute.org — free population data
AlphaFold DB: alphafold.ebi.ac.uk — free protein structures
AlphaMissense: download from DeepMind (alphafold.ebi.ac.uk/downloads) — free
AlphaFold Server: alphafoldserver.com — free (Google account, limited slots)
UniProt: uniprot.org — free protein sequences and annotations
ClinVar: ncbi.nlm.nih.gov/clinvar — free variant database
OpenClaw: github.com/OpenClaw — free AI agent framework

Approximate cost for the full project:

API usage (Claude Opus 4.6): $50-100
Compute for Python models: ~0 (laptop)
AF3 jobs: free (with Google account, limited to N jobs per day)

The Broader Point

This demonstrates what AI-augmented individual research can do in a field that has historically required institutional resources. The databases and prediction servers are free, the only cash cost was roughly $50-100 of API usage, and the data is all public. The bottleneck was always synthesis: knowing which questions to ask, and how to connect answers across domains.

OpenClaw + Claude made it possible for a non-geneticist, non-biologist to:

Navigate ACMG guidelines (with one expert correction, see Step 7)
Identify the pseudogene problem
Run protein structure predictions
Build ODE models from literature parameters
Write compelling letters to leading researchers
Get a response from Harvard within 24 hours

Connections

STRC Hearing Loss — the personal motivation for this methodology
STRC E1659A Conservation and Reclassification
STRC Electrostatic Analysis E1659A
STRC Mini-STRC Single-Vector Hypothesis
STRC Dual-Vector vs Single-Vector Transduction
Sonogenetic STRC Computational Proof
Alternative STRC Delivery Hypotheses
STRC Anti-AAV Immune Response Model
Prime Editing for STRC
[see-also] STRC Pseudogene Problem — step 1 of the methodology
[see-also] STRC Website and Communication — the research was shared publicly
[see-also] STRC AlphaFold3 Computational Experiments — AF3 experiments as part of methodology
Jeffrey Holt — Harvard; responded to the outreach
[about] Misha — all of this is for him
[about] Egor Lyfar — the researcher
[see-also] Computational Confidence Scores as Epistemic Tools — how to interpret pTM/AlphaMissense scores used throughout this methodology
