Interactive Gene Structure Explorer: Complete 5' to 3' Analysis

GeneStrand Explorer

Complete Molecular Map

This complete model includes all regulatory elements, UTRs, start/stop codons, spacers, introns, and exons. The strand is presented in the standard genomic orientation: 5' end at the top (upstream) to the 3' end at the bottom (downstream).

Hover over individual rectangles to identify sequence portions. Click to lock functional data.

Functional Detail


Select a sequence segment from the complete strand on the right to view its biological significance, structural role, and exact metrics.

Sequence Components

Total Segments: 14
Coding Regions: 5
Strand orientation: 5' END (UPSTREAM) to 3' END (DOWNSTREAM)

Sequence Composition

GC Content by Element

Comprehensive Legend

5' Cap / Start Site
Regulatory (Promoter/Enhancer)
Untranslated Regions (UTRs)
Start / Stop Codons
Exons (Coding)
Introns (Non-Coding)
Spacers
Boundary/Insulator
Locus Control Region
Silencer
Hormone Response Element

Splice Site Recognition (Consensus Motifs)

How does the spliceosome know exactly where an intron begins and ends? It relies on conserved sequence patterns at the boundaries. In the Sequence Logos below, the total height of a column indicates how conserved that position is (its information content), and the relative height of each letter within the column reflects how frequently that nucleotide appears at that position.

Adenine (A)
Cytosine (C)
Guanine (G)
Thymine (T)
Uracil (U)
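
The column heights in a sequence logo can be computed from nucleotide frequencies. The sketch below assumes the standard information-content definition for a four-letter alphabet (2 bits minus the Shannon entropy of the column) and ignores small-sample corrections. The function name and the example columns are hypothetical and are not taken from the logos on this page.

```python
import math

def column_information(freqs):
    """Return (information content in bits, per-letter heights) for one logo column.

    freqs maps each nucleotide to its observed frequency; values should sum to 1.
    """
    # Shannon entropy of the column, skipping zero-frequency letters.
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    # Four possible nucleotides give a maximum of log2(4) = 2 bits.
    info = 2.0 - entropy
    # Each letter gets a share of the column height proportional to its frequency.
    heights = {base: p * info for base, p in freqs.items()}
    return info, heights

# Hypothetical columns for illustration only.
invariant_g = {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0}
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

for name, column in (("invariant G", invariant_g), ("uniform", uniform)):
    info, heights = column_information(column)
    print(name, round(info, 3), heights)
```

A fully conserved position reaches the 2-bit maximum, while a position where all four nucleotides are equally likely has zero height.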

1. The Major Spliceosome (GT-AG Class)

Accounts for ~99% of human introns. Characterized by a highly conserved GT at the start of the intron and an AG at the end.

5' Splice Site (Exon-Intron Border)

3' Splice Site (Intron-Exon Border)

Notice the C/T-rich Pyrimidine Tract preceding the AG.

2. The Minor Spliceosome (U12-type / Atypical)

A rare class of introns (< 1%). Some are bounded by AT-AC instead of GT-AG, although many U12-type introns retain GT-AG termini. They have highly constrained 5' splice sites.

5' Splice Site (AT-AC)

Why distinct motifs? Minor introns are processed by a completely different set of cellular machinery (the U12-dependent spliceosome) which recognizes these specific atypical sequences rather than the standard ones.
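
As a rough illustration of the boundary dinucleotides described above, the sketch below labels an intron by its first and last two nucleotides. It is an assumption-laden simplification: terminal dinucleotides alone do not determine which spliceosome processes an intron, and the function name and example sequences are hypothetical.

```python
def classify_intron_termini(intron):
    """Label an intron by its first and last two nucleotides (DNA alphabet)."""
    seq = intron.upper().replace("U", "T")  # accept RNA input as well
    if len(seq) < 4:
        raise ValueError("intron too short to have distinct termini")
    termini = (seq[:2], seq[-2:])
    if termini == ("GT", "AG"):
        return "GT-AG"
    if termini == ("AT", "AC"):
        return "AT-AC"
    return "other"

# Hypothetical intron sequences for illustration only.
print(classify_intron_termini("GTAAGTCCCTTTTCAG"))
print(classify_intron_termini("ATATCCTTTTAC"))
```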

Direct Repeats (DR) at Intron/Exon Borders

Analysis of sequences within 80 bp of the 5' and 3' intron/exon borders reveals Direct Repeats (DRs): identical sequence motifs that appear near both junctions. The most common DRs observed are 11 bp long (mean 11.4 ± 4.6 bp). As DR length increases, the probability of a chance match falls exponentially, by roughly a factor of four for each added base.
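
One way to look for a direct repeat is to find the longest identical stretch shared by the two border windows. The sketch below is a minimal version of that idea using a standard longest-common-substring table. The page does not describe the search method actually used, so the exact-match criterion, the function name, and the example windows are all assumptions.

```python
def longest_shared_repeat(window5, window3):
    """Return (length, sequence) of the longest identical stretch present in both windows."""
    best_len, best_end = 0, 0
    prev = [0] * (len(window3) + 1)
    for i in range(1, len(window5) + 1):
        curr = [0] * (len(window3) + 1)
        for j in range(1, len(window3) + 1):
            if window5[i - 1] == window3[j - 1]:
                # Extend the match that ended at the previous position in both windows.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return best_len, window5[best_end - best_len:best_end]

# Hypothetical short windows; a real analysis would pass the 80 bp regions at each border.
length, repeat = longest_shared_repeat("AAACCGGTTACGT", "TTTCCGGTTAGGG")
print(length, repeat)
```

The caller is responsible for slicing the windows around each border; when ties occur, the first repeat found in the 5' window is reported.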

Interactive DR Probability Model

Schematic: 5' EXON | 80 bp Search Zone | INTRON (Variable Length) | 80 bp Search Zone | 3' EXON
Selected DR length: 4 bp
Random Match Probability: 0.300781250 (High Probability)
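
The probability shown for a 4 bp repeat, 0.300781250, equals 77/256. That is consistent with dividing the number of start positions in an 80 bp zone (80 − L + 1) by the number of possible sequences of length L (4^L). The sketch below assumes that formula; the page does not state it, and it treats all four bases as equally likely. The function name is hypothetical.

```python
def random_match_probability(dr_length, zone=80):
    """Approximate chance that a given DR of dr_length occurs somewhere in the zone.

    Assumed formula: (zone - L + 1) / 4**L, capped at 1.0 for very short repeats.
    """
    if not 1 <= dr_length <= zone:
        raise ValueError("DR length must be between 1 and the zone size")
    windows = zone - dr_length + 1      # possible start positions in the zone
    combinations = 4 ** dr_length       # possible sequences of this length
    return min(1.0, windows / combinations)

for length in (4, 8, 11, 16):
    print(length, random_match_probability(length))
```

Because the ratio is really an expected count of matches, it overstates the probability for very short repeats, which is why the sketch caps the result at 1.0.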

Distribution of DR Lengths

Frequency of Direct Repeats found among exon/intron borders. The curve represents the local weighted regression.

Probabilities of Random DR Match

Probability of a sequence within 80 bp of the 5' border being present within 80 bp of the 3' border.

Length of DR | Number Found | Possible Combinations | Probability of Random Match

Distance of DRs from Exon/Intron Borders

Direct Repeats are not just found near splice sites; they are heavily concentrated exactly at the borders. The largest number of DRs (44.4%) span the exon/intron borders or are immediately adjacent. Furthermore, 80% of all observed DRs are located within 10 bp of the junctions.

Evolutionary Implication: This precise localization strongly supports the theory that introns may have originated via transposon-like insertions into Double-Strand Breaks (DSBs). Mobile element insertion typically involves staggered DNA breaks and error-prone repair, naturally generating flanking direct repeats.
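
The way a staggered break produces flanking direct repeats can be sketched with plain string operations. The model below is deliberately simplified: it assumes a clean staggered cut whose single-stranded overhangs are filled in on both sides of the inserted element, and it ignores any errors introduced during repair. The function name, coordinates, and sequences are hypothetical.

```python
def insert_with_target_site_duplication(target, cut_pos, stagger, element):
    """Insert element at a staggered break and fill in the single-stranded gaps.

    Returns the resulting top-strand sequence and the duplicated target segment.
    """
    if stagger < 0 or not 0 <= cut_pos <= len(target) - stagger:
        raise ValueError("staggered cut must lie inside the target")
    duplicated = target[cut_pos:cut_pos + stagger]
    # After gap filling, the staggered segment is present on both sides of the insert.
    product = target[:cut_pos + stagger] + element + target[cut_pos:]
    return product, duplicated

# Hypothetical target and element; lowercase marks the inserted sequence.
product, repeat = insert_with_target_site_duplication("AAACCGTTT", 3, 3, "nnnn")
print(product, repeat)
```

In the result, the 3 bp segment at the cut site appears immediately before and immediately after the inserted element, which is the direct-repeat signature the text describes.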

44.4%: At Exactly 0 bp Distance
80.0%: Within 10 bp of Border

Interactive DR Distance Explorer

Schematic: Exon (Coding Sequence) | Junction (0 bp) | Intron (Non-Coding Sequence)
Selected DR distance: 0 bp
Frequency of Occurrence: 44.4% (Highest Frequency, Spanning Junction)

Distribution of Distances from Repeat to Exon/Intron Border

Values atop the bars represent the percentage of total DRs found at that specific distance.

Macroscopic to Microscopic: The Genomic Scale

While the models above focus on the base-pair level of a single gene, it is vital to place this in the context of the human genome's scale. A single chromosome contains tens to hundreds of millions of base pairs tightly packed into chromatin.

This visualization demonstrates the extreme "zoom" required to get from a mitotic chromosome down to the exact regulatory sequences and exons we explored earlier. Remarkably, only a tiny fraction of this massive structure actually codes for proteins.

1.5% of the genome codes for proteins.
(A) Human chromosome 22 in mitotic conformation (48 × 10⁶ bp), with heterochromatin labeled
×10
(B) 10% of chromosome arm containing ~40 genes
×10
(C) 1% of chromosome arm containing 4 genes
×10
(D) One gene of 3.4 × 10⁴ nucleotide pairs, with regulatory DNA, exons, and introns labeled
Gene expression: RNA, then protein, then folded protein

Comparative Gene Structure: Prokaryote vs. Eukaryote

While eukaryotic genes are highly complex and discontinuous (containing introns), prokaryotic genes (like those in bacteria) are much simpler and organized differently. A key feature of prokaryotes is the operon: a cluster of functional genes transcribed together into a single mRNA molecule, controlled by a shared promoter and operator.

This horizontal model highlights structural differences using our established color palette: Coding regions in Rose, Promoters/Enhancers in Emerald & Indigo, Introns in Stone, and Boundaries/Operators in Slate.

Prokaryotic Gene Structure

Regulatory gene: DNA 5' | Promoter | Enhancer/Repressor
Operon: Promoter | Operator | Gene A | Gene B | Gene C | Gene D | Gene E | Terminator | 3'
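
The operon layout above can be expressed as a small data structure. The sketch below assumes simple negative regulation, in which a repressor bound at the operator blocks transcription of every gene in the cluster. The class, its fields, and that regulatory logic are illustrative assumptions rather than a description of any particular operon.

```python
from dataclasses import dataclass

@dataclass
class Operon:
    promoter: str
    operator: str
    genes: list                      # structural genes in 5' to 3' order
    terminator: str = "Terminator"

    def transcribe(self, repressor_bound):
        """Return the genes carried on the single polycistronic mRNA, or [] if blocked."""
        if repressor_bound:
            return []
        # All genes share one promoter, so they are transcribed together.
        return list(self.genes)

operon = Operon("Promoter", "Operator", ["Gene A", "Gene B", "Gene C", "Gene D", "Gene E"])
print(operon.transcribe(repressor_bound=False))
print(operon.transcribe(repressor_bound=True))
```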

Eukaryotic Gene Structure

Upstream Regulatory sequence: DNA 5' | Enhancer/Silencer (Distal control elements) | // | Proximal control elements | Promoter (TATA box)
Open Reading Frame (ORF): Exon | Intron | Exon | Intron | Exon
Downstream Regulatory sequence: Poly-A signal sequence | Terminator | // | Distal control elements (Enhancer or Silencer) | 3'

Additional Genomic Features (Not Visualized)

While the models above provide a comprehensive view of the core structural and regulatory elements of a gene, biological systems are inherently more complex. Below are critical genomic mechanisms and features that influence gene expression but are omitted from the current spatial visualizer.

Alternative Splicing

A single primary transcript can be spliced in multiple ways (e.g., exon skipping, mutually exclusive exons) to produce different protein isoforms. This greatly expands the proteomic diversity from a limited number of genes.
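
To make the combinatorics of exon skipping concrete, the sketch below enumerates every isoform that keeps the first and last exon and includes any subset of the internal exons. This is an upper bound under a simplifying assumption: real splicing is regulated, and most of these combinations may never be produced. The function name and exon labels are hypothetical.

```python
from itertools import combinations

def exon_skipping_isoforms(exons):
    """Enumerate isoforms that keep the terminal exons and skip any subset of internal exons."""
    if len(exons) < 2:
        raise ValueError("need at least a first and a last exon")
    first, internal, last = exons[0], exons[1:-1], exons[-1]
    isoforms = []
    # Start with the full-length transcript, then skip progressively more exons.
    for kept_count in range(len(internal), -1, -1):
        for kept in combinations(internal, kept_count):
            isoforms.append([first, *kept, last])
    return isoforms

for isoform in exon_skipping_isoforms(["E1", "E2", "E3", "E4"]):
    print("-".join(isoform))
```

With n internal exons this yields 2^n candidate isoforms, which illustrates how a small number of genes can give rise to many transcripts.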


Epigenetic Landscape

Chemical modifications to DNA (e.g., CpG methylation) and histone proteins (e.g., H3K4me3 at active promoters, H3K27ac at enhancers). These marks dynamically control chromatin accessibility and gene silencing.


3D Chromatin Folding

Genes exist in a 3-dimensional space. Elements like Enhancers and Promoters are brought together physically via chromatin looping, often anchored by CTCF proteins within Topologically Associating Domains (TADs).


RNA Editing

Post-transcriptional alterations to the RNA sequence, such as Adenosine to Inosine (A-to-I) editing by ADAR enzymes. This can recode amino acids, alter splice sites, or change miRNA binding targets.
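
Amino acid recoding by A-to-I editing follows from the fact that inosine is read as guanosine during translation. The sketch below applies that rule to a single codon, using a two-entry excerpt of the standard genetic code. The function name is hypothetical, and the example shows only one possible recoding event.

```python
# Excerpt of the standard genetic code, limited to the codons used below.
CODONS = {"CAG": "Gln", "CGG": "Arg"}

def read_edited_codon(codon, edited_index):
    """Apply A-to-I editing at one position and return the codon as it is read."""
    if codon[edited_index] != "A":
        raise ValueError("only adenosine can be edited to inosine")
    # Inosine base-pairs like guanosine, so the edited base is read as G.
    return codon[:edited_index] + "G" + codon[edited_index + 1:]

original = "CAG"
edited = read_edited_codon(original, 1)
print(original, CODONS[original], "->", edited, CODONS[edited])
```

Here editing the middle adenosine of a glutamine codon causes it to be read as an arginine codon.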

SNPs & eQTLs

Single Nucleotide Polymorphisms (SNPs) within these regions can act as expression quantitative trait loci (eQTLs), affecting the efficiency of promoters, enhancers, or splice sites, contributing to phenotypic variation and disease.


ncRNA Interactions

mRNAs interact with microRNAs (miRNAs) and long non-coding RNAs (lncRNAs), particularly at the 3' UTR. These interactions are crucial for post-transcriptional silencing and mRNA degradation.
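
A common first step in locating candidate miRNA binding sites is to search the 3' UTR for sequence complementary to the miRNA seed. The sketch below assumes the canonical seed definition (miRNA positions 2 to 8) and perfect Watson-Crick pairing. Real target prediction weighs many additional features, and the function names and example sequences are chosen purely for illustration.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def seed_match_site(mirna):
    """Return the 7-nt mRNA site complementary to miRNA positions 2-8, written 5' to 3'."""
    seed = mirna[1:8]
    return seed.translate(COMPLEMENT)[::-1]

def find_seed_sites(utr, mirna):
    """Return 0-based start positions of every seed-match site in the UTR."""
    site = seed_match_site(mirna)
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return positions

# Example sequences for illustration only (RNA alphabet, 5' to 3').
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCGGGCUACCUCAA"
print(seed_match_site(mirna), find_seed_sites(utr, mirna))
```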

Complete Genomic Architecture Model • 5' to 3' Vertical Integration