STRC Cross-Species Conservation Analysis
The Question
If the N-terminal region of STRC is functionally dispensable (the mini-STRC hypothesis), we might expect it to be less conserved across species than the C-terminal region. Is there a conservation gradient?
Methodology
- Fetched STRC orthologs from UniProt for 9 mammalian species
- Human (Q7RTU9, 1775 aa) as reference
- Species: Rhesus macaque, Green monkey, Mouse, Rat, Cow, Dog, Horse, Pig, Bat
- Initial approach: positional alignment — FAILED (gave misleading 10% vs 77% result)
- Correct approach: BioPython pairwise alignment, BLOSUM62 matrix, local alignment
Key Finding: N and C Are Equally Conserved
BioPython alignment results:
| Region | Avg identity |
|---|---|
| N-terminal (1-699) | 91.5% |
| C-terminal (700-1775) | 92.4% |
| Delta | +0.9 pp (C-term slightly higher) |
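As a rough illustration of how per-region figures like these can be read off a gapped pairwise alignment, the sketch below counts identity in reference (human) coordinates. It is not the original script. The function name `region_identity`, the toy sequences, and the choice of denominator (reference residues in the region that the alignment covers, with query gaps counted as non-identical) are all assumptions made for the example.

```python
def region_identity(ref_aln, qry_aln, start, end, ref_start=1):
    """Percent identity over reference positions start..end (1-based, inclusive).

    ref_aln / qry_aln: the two rows of a gapped pairwise alignment ('-' = gap).
    ref_start: reference coordinate of the first aligned reference residue
               (hypothetical parameter, useful when a local alignment does
               not begin at residue 1).
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned rows must have equal length")
    ref_pos = ref_start - 1
    covered = 0   # reference residues inside the region
    matches = 0   # of those, how many are identical in the query
    for r, q in zip(ref_aln, qry_aln):
        if r == "-":
            continue  # insertion in the query: no reference coordinate
        ref_pos += 1
        if start <= ref_pos <= end:
            covered += 1
            if r == q:
                matches += 1
    return 100.0 * matches / covered if covered else float("nan")


# Toy alignment (made-up sequences): the query carries a 2-residue insertion
# after reference position 3 and one substitution at reference position 7.
ref_aln = "MKT--AYIAKQR"
qry_aln = "MKTLLAYIGKQR"
n_side = region_identity(ref_aln, qry_aln, 1, 3)    # 3 of 3 identical
c_side = region_identity(ref_aln, qry_aln, 4, 10)   # 6 of 7 identical
```

With real data the region boundaries would be the human coordinates from the table (1-699 and 700-1775), and the per-species values would then be averaged. Because positions are counted along the reference row only, an insertion in the other species does not shift the region boundaries.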
Per-species breakdown:
- Primates (macaque, green monkey): 95-97% in both halves
- Rodents (mouse, rat): 87-88% in both halves
- Ungulates (cow, horse, pig): 90-92% in both halves
The initial positional analysis was WRONG. Mouse and rat have 34 extra residues in the N-terminal repeat region (1809 vs 1775 aa), so a position-by-position comparison puts the N-terminal residues out of register. The 10% identity is an artifact of that misalignment, not a real conservation difference.
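The effect is easy to reproduce on a toy sequence. The sketch below is illustrative only: it uses made-up sequences and Python's standard-library `difflib.SequenceMatcher` as a simple gap-tolerant stand-in, not the BioPython/BLOSUM62 alignment used for the real analysis. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def positional_identity(ref, qry):
    """Percent of reference positions i where ref[i] == qry[i]."""
    same = sum(1 for r, q in zip(ref, qry) if r == q)
    return 100.0 * same / len(ref)


def block_identity(ref, qry):
    """Percent of reference residues covered by gap-tolerant matching blocks.

    Stand-in for a real aligner: SequenceMatcher finds ordered identical
    blocks and skips over insertions, but it has no substitution matrix.
    """
    matcher = SequenceMatcher(None, ref, qry, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / len(ref)


# Toy reference: 20 distinct residues. The "ortholog" is identical except
# for a 2-residue insertion after position 5.
ref = "ACDEFGHIKLMNPQRSTVWY"
qry = ref[:5] + "WW" + ref[5:]

pos = positional_identity(ref, qry)  # 25.0: only the 5 residues before the insertion line up
blk = block_identity(ref, qry)       # 100.0: every reference residue is recovered
```

A single short insertion is enough to drop the positional score from 100% to 25% here, even though no reference residue actually changed.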
Mouse/Rat Length Difference
Mouse and rat STRC: 1809 aa (human: 1775 aa). The 34 extra residues are in the N-terminal repeat region. This length variation is itself informative: it suggests that repeat expansion/contraction in this region is tolerated without loss of function. But it does NOT mean the N-terminal is less conserved (the sequences are conserved, just offset).
Revised Argument for Mini-STRC
Conservation does NOT support the “N-term is less conserved” argument. But it doesn’t invalidate mini-STRC either. The argument shifts:
- Original hypothesis: N-term is less conserved → dispensable
- Revised conclusion: N-term is highly conserved AND structurally disordered → may serve as a spacer/stalk/extension, not an enzymatic domain
This is actually a stronger and more honest framing:
- Conserved sequence + no stable fold = likely structural role (scaffolding, stalk, binding via disordered motif)
- Loss of spacer may be tolerated if the functional domains (LRR + C-term) are preserved and correctly oriented
- Precedent: many extracellular proteins have conserved disordered N-terminal extensions that serve as flexible linkers
Honest Presentation on Site
Cross-species conservation added to MiniSTRC.astro with full disclosure:
- Both halves ~92% conserved
- Earlier “N-term less conserved” claim corrected
- Reframed as “conserved but intrinsically disordered” = spacer/stalk interpretation
- 9-species table published on strc.egor.lol
Scripts:
- /tmp/strc_conservation_v2.py (positional, produces misleading result; deprecated)
- /tmp/strc_conservation_v4.py (BioPython pairwise alignment, correct)
Key Lesson
Always run a pairwise alignment before comparing sequence identity position by position across orthologs. Insertions or deletions in one lineage invalidate a purely positional comparison. The first-pass result (10% vs 77%) looked dramatic but was entirely an artifact.
Connections
- STRC Mini-STRC Single-Vector Hypothesis — updates the conservation argument
- STRC pLDDT Profile and Cut Point Analysis — N-term disordered despite being conserved
- Positional alignment misleads when length varies; use pairwise BLOSUM62
- strc.egor.lol (MiniSTRC.astro, cross-species conservation section)
[about]Misha