STRC Cross-Species Conservation Analysis

The Question

If the N-terminal region of STRC is functionally dispensable (the mini-STRC hypothesis), we might expect it to be less conserved across species than the C-terminal region. Is there a conservation gradient along the sequence?

Methodology

  • Fetched STRC orthologs from UniProt for 9 mammalian species
  • Human (Q7RTU9, 1775 aa) as reference
  • Species: Rhesus macaque, Green monkey, Mouse, Rat, Cow, Dog, Horse, Pig, Bat
  • Initial approach: positional (index-by-index) comparison without alignment. FAILED: gave a misleading 10% vs 77% result
  • Correct approach: BioPython pairwise alignment, BLOSUM62 matrix, local alignment
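The alignment step can be illustrated with a minimal sketch. The analysis itself used BioPython with BLOSUM62; the code below swaps in a small pure-Python Smith-Waterman local aligner with flat match/mismatch/gap scores so that it runs without dependencies. The function names, scoring values, and toy sequences are assumptions for illustration, not the contents of strc_conservation_v4.py.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment with a linear gap penalty.

    Stand-in for a BLOSUM62-scored aligner: scores here are flat
    match/mismatch values (an assumption of this sketch).
    Returns the two aligned substrings, with '-' marking gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            s = max(0,
                    score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                    score[i - 1][j] + gap,       # gap in b
                    score[i][j - 1] + gap)       # gap in a
            score[i][j] = s
            if s > best:
                best, best_pos = s, (i, j)

    # Trace back from the best-scoring cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-')
            i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1])
            j -= 1
    return ''.join(reversed(out_a)), ''.join(reversed(out_b))


def identity(aln_a, aln_b):
    """Percent of alignment columns (gap columns included) with identical residues."""
    same = sum(x == y and x != '-' for x, y in zip(aln_a, aln_b))
    return 100.0 * same / len(aln_a)


# Toy sequences (hypothetical): the "ortholog" carries a 2-residue insertion.
ref = "MKTLLVAGWERPQ"
ortho = "MKTLLVAGAAWERPQ"
aln_ref, aln_ortho = local_align(ref, ortho)
print(aln_ref)
print(aln_ortho)
print(round(identity(aln_ref, aln_ortho), 1))
```

The aligner opens a gap opposite the inserted residues, so the flanking sequence stays in register. Whether gap columns count toward the identity denominator is a design choice; this sketch counts them.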

Key Finding: N and C Are Equally Conserved

BioPython alignment results:

Region                   Avg identity
N-terminal (1-699)       91.5%
C-terminal (700-1775)    92.4%
Delta                    +0.9 pp (C-term slightly higher)
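One way to obtain per-region figures like these is to walk the gapped alignment and count matches only at columns whose reference (human) coordinate falls inside the region. The sketch below assumes the alignment is available as two equal-length gapped strings. Skipping columns where the reference has a gap is an assumption of this sketch, not necessarily what the original script does, and the sequences are toy data.

```python
def region_identity(ref_aln, other_aln, start, end):
    """Percent identity over reference residues start..end (1-based, inclusive).

    ref_aln and other_aln are equal-length gapped strings from a pairwise
    alignment. Columns where the reference has a gap have no reference
    coordinate and are skipped (an assumption of this sketch).
    """
    pos = 0              # 1-based position in the ungapped reference
    same = total = 0
    for r, o in zip(ref_aln, other_aln):
        if r == '-':
            continue
        pos += 1
        if start <= pos <= end:
            total += 1
            same += (r == o)
    return 100.0 * same / total if total else 0.0


# Toy alignment (hypothetical): 10-residue reference, 2-residue insertion
# in the other sequence, one substitution in the first half.
ref_aln = "ACDEF--GHIKL"
other_aln = "ACDQFWWGHIKL"

n_half = region_identity(ref_aln, other_aln, 1, 5)
c_half = region_identity(ref_aln, other_aln, 6, 10)
print(n_half, c_half, c_half - n_half)
```

For the real data the boundaries would be 1-699 and 700-1775 in human coordinates, and the delta is the difference of the two averages (92.4 - 91.5 = 0.9 pp).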

Per-species breakdown:

  • Primates (macaque, green monkey): 95-97% in both halves
  • Rodents (mouse, rat): 87-88% in both halves
  • Ungulates (cow, horse, pig): 90-92% in both halves

The initial positional analysis was wrong. Mouse and rat STRC have 34 extra residues in the N-terminal repeat region (1809 vs 1775 aa), so an index-by-index comparison throws the whole N-terminal region out of register. The resulting 10% identity is an artifact of that misalignment, not a real difference in conservation.
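The artifact is easy to reproduce on toy data. The sketch below uses a hypothetical 20-residue sequence and a hypothetical 2-residue insertion (not real STRC sequence) to show how an index-by-index comparison collapses downstream of an insertion, and how the identity returns once the offset is accounted for.

```python
def positional_identity(a, b):
    """Percent identity comparing residue i of a with residue i of b (no alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return 100.0 * sum(x == y for x, y in zip(a, b)) / n


# Hypothetical sequences: the ortholog is identical to the reference
# except for a 2-residue insertion after position 5.
ref = "MKTLLVAGWERPQSDNHYFC"
ortholog = ref[:5] + "GG" + ref[5:]

overall = positional_identity(ref, ortholog)             # whole sequence, by index
downstream = positional_identity(ref[5:], ortholog[5:])  # after the insertion, by index
shifted = positional_identity(ref[5:], ortholog[7:])     # same region, offset corrected

print(overall, downstream, shifted)
```

Only the residues before the insertion match by index; everything after it is compared against the wrong partner until the 2-residue shift is removed. A pairwise aligner finds that shift automatically.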

Mouse/Rat Length Difference

Mouse and rat STRC: 1809 aa (human: 1775 aa). The 34 extra residues are in the N-terminal repeat region. This length variation is itself informative: it suggests that repeat expansion or contraction in this region is tolerated without loss of function. But it does NOT mean the N-terminal is less conserved (the sequences are conserved, just offset).

Revised Argument for Mini-STRC

Conservation does NOT support the “N-term is less conserved” argument. But it doesn’t invalidate mini-STRC either. The argument shifts:

Original hypothesis: N-term is less conserved → dispensable

Revised conclusion: N-term is highly conserved AND structurally disordered → may serve as a spacer/stalk/extension, not an enzymatic domain

This is actually a stronger and more honest framing:

  • Conserved sequence + no stable fold = likely structural role (scaffolding, stalk, binding via disordered motif)
  • Loss of spacer may be tolerated if the functional domains (LRR + C-term) are preserved and correctly oriented
  • Precedent: many extracellular proteins have conserved disordered N-terminal extensions that serve as flexible linkers

Honest Presentation on Site

Cross-species conservation added to MiniSTRC.astro with full disclosure:

  • Both halves ~92% conserved
  • Earlier “N-term less conserved” claim corrected
  • Reframed as “conserved but intrinsically disordered” = spacer/stalk interpretation
  • 9-species table published on strc.egor.lol

Scripts:

  • /tmp/strc_conservation_v2.py (positional, produces misleading result — deprecated)
  • /tmp/strc_conservation_v4.py (BioPython pairwise alignment, correct)

Key Lesson

Always run a pairwise alignment before comparing sequence identity position by position across orthologs. Insertions or deletions in one lineage shift the residues around them out of register and invalidate a purely positional comparison. The first-pass result (10% vs 77%) looked dramatic but was entirely an artifact.

Connections