GeneStrand Explorer
Sequence Composition
GC Content by Element
Comprehensive Legend
Splice Site Recognition (Consensus Motifs)
How does the spliceosome know exactly where an intron begins and ends? It relies on statistical sequence patterns at the boundaries. In the Sequence Logos below, the total height of a column reflects how strongly that position is conserved, and the relative height of each letter reflects how frequently that nucleotide appears at that position.
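As a rough illustration of how column and letter heights in a sequence logo can be computed, the sketch below uses the common information-content convention: column height is 2 bits minus the column's Shannon entropy, and each letter's height is its frequency times the column height. This convention, the function name `logo_heights`, and the toy alignment are assumptions; the page does not state how its logos are scaled, and no small-sample correction is applied.

```python
import math
from collections import Counter


def logo_heights(aligned_sites, alphabet="ACGT"):
    """Return one dict per column mapping each letter to its height in bits.

    Assumes all sites have equal length and every column contains at least
    one letter from the alphabet.
    """
    n_cols = len(aligned_sites[0])
    heights = []
    for col in range(n_cols):
        counts = Counter(site[col] for site in aligned_sites)
        total = sum(counts[b] for b in alphabet)
        freqs = {b: counts[b] / total for b in alphabet}
        # Shannon entropy of the column (0 bits when fully conserved).
        entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
        # Information content: maximum possible entropy minus observed entropy.
        info = math.log2(len(alphabet)) - entropy
        heights.append({b: freqs[b] * info for b in alphabet})
    return heights


# Hypothetical toy alignment of the first six intron bases at 5' splice sites.
sites = ["GTAAGT", "GTGAGT", "GTAAGA", "GTAAGT"]
heights = logo_heights(sites)
```

In this toy input the first two columns are invariant, so each reaches the full 2 bits, while the third column (three A, one G) is shorter because it is less conserved.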
1. The Major Spliceosome (GT-AG Class)
Accounts for ~99% of human introns. Characterized by a highly conserved GT at the start of the intron and an AG at the end.
5' Splice Site (Exon-Intron Border)
3' Splice Site (Intron-Exon Border)
2. The Minor Spliceosome (U12-type / Atypical)
A rare class of introns (< 1%). Some are bounded by AT-AC instead of GT-AG, although many U12-type introns still carry GT-AG termini. Their 5' splice sites are highly constrained.
5' Splice Site (AT-AC)
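The boundary dinucleotides described above can be checked with a few lines of code. This is a minimal sketch that only labels the terminal dinucleotides of an intron sequence; the function name `terminal_class` and the example sequences are hypothetical, and the termini alone do not establish which spliceosome processes an intron.

```python
def terminal_class(intron):
    """Label an intron by its first and last two bases (DNA alphabet)."""
    intron = intron.upper()
    if len(intron) < 4:
        raise ValueError("intron too short to have distinct termini")
    ends = (intron[:2], intron[-2:])
    if ends == ("GT", "AG"):
        return "GT-AG"
    if ends == ("AT", "AC"):
        return "AT-AC"
    return "other"


# Hypothetical intron sequences for illustration only.
print(terminal_class("GTAAGTCCTTTTCAG"))   # GT-AG
print(terminal_class("ATATCCTTTTTTCAC"))   # AT-AC
```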
Direct Repeats (DR) at Intron/Exon Borders
Analysis of sequences within 80 bp of the 5' and 3' intron/exon borders reveals the presence of Direct Repeats (DRs): identical sequence motifs that appear near both junctions. The most common DRs observed are 11 bp in length (mean 11.4 ± 4.6 bp). As DR length increases, the probability of finding a random match decreases exponentially, because the number of possible sequences of length n grows as 4^n.
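One simple way to look for such repeats is to index every k-mer in the 3' window and look up each k-mer from the 5' window. The sketch below assumes exact matches only; the function name `shared_kmers` and the short toy windows are hypothetical, and this is not necessarily the procedure used to produce the figures on this page.

```python
def shared_kmers(window5, window3, k):
    """Return {kmer: (starts in window5, starts in window3)} for exact matches."""
    # Index every k-mer of the 3' window by its start position.
    index = {}
    for i in range(len(window3) - k + 1):
        index.setdefault(window3[i:i + k], []).append(i)

    # Look up each k-mer of the 5' window in that index.
    hits = {}
    for i in range(len(window5) - k + 1):
        kmer = window5[i:i + k]
        if kmer in index:
            hits.setdefault(kmer, ([], index[kmer]))[0].append(i)
    return hits


# Hypothetical toy windows (real windows would be 80 bp each).
window5 = "ACGTTAGGCTAAGT"
window3 = "TTCCTAGGCTAACA"
repeats = shared_kmers(window5, window3, k=8)
```

Here the only shared 8-mer is `TAGGCTAA`, found at offset 4 in both windows.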
Interactive DR Probability Model
Distribution of DR Lengths
Frequency of Direct Repeats found at exon/intron borders, by repeat length. The curve is a locally weighted regression fit.
Probabilities of Random DR Match
Probability of a sequence within 80 bp of the 5' border being present within 80 bp of the 3' border.
| Length of DR (bp) | Number Found | Possible Combinations | Probability of Random Match |
|---|---|---|---|
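The "Possible Combinations" and "Probability of Random Match" columns can be approximated under a simple null model. The sketch below assumes uniform, independent bases, counts 4^k possible sequences of length k, and estimates the chance that one specific k-mer appears at least once among the start positions of an 80 bp window. The function name and the independence assumption are illustrative; the page does not specify the exact formula behind its table.

```python
import math


def random_match_probability(k, window=80):
    """Return (possible sequences of length k, P(specific k-mer occurs in window)).

    Assumes equiprobable, independent bases and treats overlapping start
    positions as independent trials, which is only an approximation.
    """
    combos = 4 ** k
    positions = window - k + 1
    # 1 - (1 - 1/combos)**positions, computed stably for very small probabilities.
    p = -math.expm1(positions * math.log1p(-1.0 / combos))
    return combos, p


for k in (6, 8, 11, 14):
    combos, p = random_match_probability(k)
    print(f"k={k:2d}  combinations={combos:>12,d}  P(random match)={p:.3e}")
```

Each additional base multiplies the number of possible sequences by four, which is why the probability falls off roughly exponentially with repeat length.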
Distance of DRs from Exon/Intron Borders
Direct Repeats are not merely found near splice sites; they are heavily concentrated at the borders themselves. The largest share of DRs (44.4%) span the exon/intron borders or lie immediately adjacent to them. Furthermore, 80% of all observed DRs are located within 10 bp of the junctions.
Evolutionary Implication: This precise localization strongly supports the theory that introns may have originated via transposon-like insertions into Double-Strand Breaks (DSBs). Mobile element insertion typically involves staggered DNA breaks and error-prone repair, naturally generating flanking direct repeats.
Interactive DR Distance Explorer
Distribution of Distances from Repeat to Exon/Intron Border
Values atop the bars represent the percentage of total DRs found at that specific distance.
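The percentages on such a chart can be derived from repeat coordinates in two steps: measure each repeat's distance to the junction, then tabulate the share at each distance. The sketch below is a minimal version under assumed conventions (half-open `[start, end)` coordinates, a junction placed between positions `junction - 1` and `junction`, and distance 0 for repeats that span or abut it). The coordinates shown are hypothetical, not data from this page.

```python
from collections import Counter


def distance_to_junction(start, end, junction):
    """Distance in bp between a repeat [start, end) and a junction boundary.

    Returns 0 when the repeat spans the junction or is immediately adjacent.
    """
    if start <= junction <= end:
        return 0
    return start - junction if start > junction else junction - end


def percent_by_distance(distances):
    """Map each distance to the percentage of repeats found at that distance."""
    counts = Counter(distances)
    total = len(distances)
    return {d: 100 * c / total for d, c in sorted(counts.items())}


# Hypothetical repeats around a junction at position 80.
repeats = [(75, 86), (69, 80), (60, 71), (85, 96)]
distances = [distance_to_junction(s, e, 80) for s, e in repeats]
print(percent_by_distance(distances))
```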
Macroscopic to Microscopic: The Genomic Scale
While the models above focus on the base-pair level of a single gene, it is vital to place this in the context of the full human genome. A single human chromosome contains tens to hundreds of millions of base pairs tightly packed into chromatin.
This visualization demonstrates the extreme "zoom" required to get from a mitotic chromosome down to the exact regulatory sequences and exons we explored earlier. Remarkably, only a tiny fraction of this massive structure (on the order of 1-2% of the human genome) actually codes for proteins.
Comparative Gene Structure: Prokaryote vs. Eukaryote
While eukaryotic genes are complex and discontinuous (their coding sequence is interrupted by introns), prokaryotic genes (like those in bacteria) are much simpler and organized differently. A key feature of prokaryotes is the operon: a cluster of functionally related genes transcribed together into a single polycistronic mRNA molecule, controlled by a shared promoter and operator.
This horizontal model highlights structural differences using our established color palette: Coding regions in Rose, Promoters/Enhancers in Emerald & Indigo, Introns in Stone, and Boundaries/Operators in Slate.
Prokaryotic Gene Structure
Eukaryotic Gene Structure
Additional Genomic Features (Not Visualized)
While the models above provide a comprehensive view of the core structural and regulatory elements of a gene, biological systems are inherently more complex. Below are critical genomic mechanisms and features that influence gene expression but are omitted from the current spatial visualizer.
Alternative Splicing
A single primary transcript can be spliced in multiple ways (e.g., exon skipping, mutually exclusive exons) to produce different protein isoforms. This greatly expands the proteomic diversity from a limited number of genes.
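Exon skipping can be illustrated by enumerating which internal exons are retained. The sketch below is deliberately simplified: it assumes the first and last exons are always kept and that every internal exon can be skipped independently, which real genes do not generally permit. The function name and exon sequences are hypothetical.

```python
from itertools import combinations


def exon_skipping_isoforms(exons):
    """Yield spliced transcripts for every subset of retained internal exons.

    Exon order is preserved; the first and last exons are always included.
    Requires at least two exons.
    """
    first, *internal, last = exons
    for n_kept in range(len(internal), -1, -1):
        for kept in combinations(internal, n_kept):
            yield "".join((first, *kept, last))


# Hypothetical exon sequences (RNA alphabet) for illustration only.
exons = ["AUGGCC", "GAUUAC", "AAGUGA"]
isoforms = list(exon_skipping_isoforms(exons))
```

With one internal exon this produces two transcripts, the full-length form and the skipped form; with n internal exons the simplified model yields 2^n candidates.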
Epigenetic Landscape
Chemical modifications to DNA (e.g., CpG methylation) and histone proteins (e.g., H3K4me3 at active promoters, H3K27ac at enhancers). These marks dynamically control chromatin accessibility and gene silencing.
3D Chromatin Folding
Genes exist in a 3-dimensional space. Elements like Enhancers and Promoters are brought together physically via chromatin looping, often anchored by CTCF proteins within Topologically Associating Domains (TADs).
RNA Editing
Post-transcriptional alterations to the RNA sequence, such as Adenosine to Inosine (A-to-I) editing by ADAR enzymes. This can recode amino acids, alter splice sites, or change miRNA binding targets.
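The recoding effect can be shown on a single codon, relying on the fact that inosine is read as guanosine during translation. The sketch below uses a deliberately tiny codon table and hypothetical helper names; it models only the sequence change, not ADAR substrate recognition.

```python
# Minimal excerpt of the standard genetic code, enough for this example.
CODON_TABLE = {"CAG": "Gln", "CGG": "Arg"}


def edit_a_to_i(rna, position):
    """Return the RNA with the adenosine at `position` replaced by inosine (I)."""
    if rna[position] != "A":
        raise ValueError("A-to-I editing requires an adenosine at that position")
    return rna[:position] + "I" + rna[position + 1:]


def as_read_by_ribosome(rna):
    """Inosine base-pairs like guanosine, so decode I as G."""
    return rna.replace("I", "G")


codon = "CAG"
edited = edit_a_to_i(codon, 1)            # "CIG"
decoded = as_read_by_ribosome(edited)     # "CGG"
print(CODON_TABLE[codon], "->", CODON_TABLE[decoded])
```

A single edited adenosine here changes the encoded residue from glutamine to arginine without any change to the underlying DNA.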
SNPs & eQTLs
Single Nucleotide Polymorphisms (SNPs) within these regions can act as expression quantitative trait loci (eQTLs), affecting the efficiency of promoters, enhancers, or splice sites, contributing to phenotypic variation and disease.
ncRNA Interactions
The interactions of the mRNA with microRNAs (miRNAs) or long non-coding RNAs (lncRNAs), particularly at the 3' UTR. These interactions are crucial for post-transcriptional silencing and mRNA degradation.
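A common first-pass way to find candidate miRNA binding sites is to search the 3' UTR for the reverse complement of the miRNA seed (nucleotides 2-8). The sketch below assumes perfect Watson-Crick pairing across the seed and ignores G:U wobble, site context, and accessibility; the function name and both sequences are illustrative inputs, not data from this page.

```python
# Translation table for RNA Watson-Crick complements.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_sites(mirna, utr, seed_start=1, seed_end=8):
    """Return 0-based start positions in `utr` that pair perfectly with the seed.

    The default slice [1:8] corresponds to miRNA nucleotides 2-8 (1-based).
    """
    seed = mirna[seed_start:seed_end]
    # The target site on the mRNA is the reverse complement of the seed.
    site = seed.translate(COMPLEMENT)[::-1]
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


# Illustrative inputs: a 22-nt miRNA and a short hypothetical 3' UTR fragment.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCAGG"
positions = seed_sites(mirna, utr)
```

For these inputs the seed `GAGGUAG` pairs with `CUACCUC`, which occurs once in the fragment at offset 3.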