Professur für
Institut für Informatik
Universität Freiburg

Bioinformatics analysis of alternative splicing

Prior to translation, the majority of human primary transcripts undergo a maturation process that splices out intronic sequences. Alternative splicing can yield different mature mRNAs from one gene by skipping exons, retaining introns, or including only parts of exons or introns. It is considered to contribute significantly to the complexity of transcriptomes and proteomes. Alternatively spliced protein isoforms can differ in their functional domain composition, subcellular localization, or ligand-binding affinity. It has been estimated that up to 60% of all human genes are alternatively spliced.

Non-EST based prediction of alternative splice events

One major goal of research in the post-genomic era is to elucidate and characterize the entire spectrum of alternative splice forms. Although EST-based approaches are powerful, not all splice events are represented in EST databases, since ESTs have several biases. It is therefore of great interest to apply non-EST based methods that predict alternative splice events ab initio.

To predict alternative splice events, we apply a novel strategy that is based solely on the annotation of protein domain families (Pfam). Our approach does not depend on the existence of orthologous sequences, and the predictions are very reliable. It complements a growing list of bioinformatics tools for non-EST based splice event prediction.

Bioinformatics analysis of alternative splice events at tandem splice sites

Alternative splice events often have large effects on proteins, for example by deleting functional units such as protein domains or transmembrane helices. On the other hand, alternative splicing also allows the production of many very similar protein isoforms. The most frequent of these subtle events is alternative splicing at NAGNAG or tandem acceptors. This splice acceptor motif frequently allows the selection of either of the two AGs in the splice process, resulting in the insertion or deletion of a single NAG triplet in the mRNA. About 5% of all human acceptors are NAGNAG acceptors. A detailed analysis revealed that NAGNAG acceptors are significantly biased towards intron phase 1, single amino acid indels, polar amino acid contexts, concurrent charge modifications and proteins interacting with other macromolecules. NAGNAG acceptors are conserved in mammalian evolution and are also extensively functional in Drosophila. We experimentally demonstrated tissue specificity for tandem acceptors of ITGAM, SMARCA4 and BTNL2.
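To make the NAGNAG mechanism concrete, the following is a minimal Python sketch and not the pipeline used in our analysis. It assumes that the intron is annotated up to the downstream AG, so that its last six nucleotides contain the whole motif; the function name `tandem_acceptor_isoforms` and the example sequences are hypothetical.

```python
import re

# N stands for any nucleotide; a tandem acceptor is two NAG triplets in a row.
NAGNAG = re.compile(r"[ACGT]AG[ACGT]AG")


def tandem_acceptor_isoforms(upstream_exon, intron, downstream_exon):
    """Return (long, short) spliced sequences for an intron ending in NAGNAG.

    Assumption: `intron` is annotated up to the downstream AG, so its last
    six nucleotides hold the whole motif. Returns None if there is no motif.
    """
    exon1, intron, exon2 = (
        s.upper() for s in (upstream_exon, intron, downstream_exon)
    )
    if not NAGNAG.fullmatch(intron[-6:]):
        return None
    # Upstream AG used as acceptor: the last NAG triplet stays in the mRNA.
    long_form = exon1 + intron[-3:] + exon2
    # Downstream AG used as acceptor: the whole annotated intron is removed.
    short_form = exon1 + exon2
    return long_form, short_form


# Toy sequences, invented for illustration only.
long_form, short_form = tandem_acceptor_isoforms(
    "ATGGCT", "GTAAGTTTTTCAGCAG", "GGATAA"
)
print(long_form, short_form, len(long_form) - len(short_form))
```

The two returned sequences differ by exactly three nucleotides, so the reading frame is preserved and the protein isoforms differ only locally.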

We also investigated alternative splicing at unusual donor sites with the motif GYNGYN (Y stands for C or T). While only one GY functions as a splice donor for the majority of these splice sites in humans, we provide computational and experimental evidence that 110 (1.3%) allow alternative splicing at both GY donors. Analyzing what distinguishes alternatively spliced from non-alternatively spliced GYNGYN donors, we found differences in the binding to U1 snRNA, a strong correlation between U1 snRNA binding strength and the preferred donor, overrepresented sequence motifs in the adjacent introns, and a higher conservation of the exonic and intronic flanks between human and mouse. Extending our genome-wide analysis to seven other eukaryotic species, we found alternatively spliced GYNGYN donors in all species from mouse to C. elegans and even in A. thaliana. Experimental verification of a conserved GTAGTT donor of the STAT3 gene in human and mouse revealed a remarkably similar ratio of alternatively spliced transcripts in both species.
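The GYNGYN motif itself can be located with a simple pattern scan. The sketch below is only an illustration, assuming a plain DNA string as input; the function name `find_gyngyn` and the toy sequence are hypothetical. It finds candidate motifs only and says nothing about whether both GY donors are actually used, which depends on features such as U1 snRNA binding.

```python
import re

# Y = C or T, N = any nucleotide. The lookahead lets overlapping motifs
# (for example in GTGGTGGTG) all be reported.
GYNGYN = re.compile(r"(?=(G[CT][ACGT]G[CT][ACGT]))")


def find_gyngyn(seq):
    """Return (0-based start, motif) for every GYNGYN occurrence in seq."""
    return [(m.start(), m.group(1)) for m in GYNGYN.finditer(seq.upper())]


# Toy sequence containing the GTAGTT motif mentioned for STAT3;
# the flanking nucleotides are invented.
print(find_gyngyn("AAGGTAGTTCC"))
```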

Alternative splicing at tandem acceptors is an important mechanism to increase proteome diversity in a wide range of species. We are interested in further research focused on the prevalence, regulation and mechanism of alternative splicing at tandem acceptors, its functional effects on the affected proteins as well as its general impact on biology.

We have developed a relational database, TassDB (TAndem Splice Site DataBase), that stores extensive data about alternative splice events at GYNGYN donors and NAGNAG acceptors.

SNPs affect alternative NAGNAG splicing

Aberrant or modified splicing patterns of genes cause many human diseases. The identification of genetic variations that change the splicing pattern of a gene is therefore important. We performed a genome-wide screen for SNPs that affect NAGNAG acceptors. From 121 SNPs identified, we extracted 64 SNPs that most likely affect alternative NAGNAG splicing. We demonstrated that the NAGNAG motif is necessary and sufficient for this type of alternative splicing. The evolutionarily young NAGNAG alleles, as determined by comparison to the chimpanzee genome, exhibit the same biases towards intron phase 1 and single amino acid insertions or deletions that were already observed for all human NAGNAG acceptors. As 28% of the NAGNAG SNPs occur in known disease genes, they are preferred candidates for a more detailed functional analysis, especially since the splice relevance of some of the coding SNPs (cSNPs) has been overlooked. The presented approach is highly effective in predicting polymorphisms that are causal for variations in alternative splicing.
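The core test behind such a screen, whether one allele of a SNP contains the NAGNAG motif and the other does not, can be sketched as follows. This is a simplified illustration and not our actual screening procedure: it assumes the six nucleotides at the acceptor have already been extracted, and the function name `nagnag_snp_effect` and its return labels are hypothetical.

```python
import re

NAGNAG = re.compile(r"[ACGT]AG[ACGT]AG")


def nagnag_snp_effect(site, offset, alt_allele):
    """Classify a single-nucleotide change within a 6-nt acceptor site.

    site:       the six reference nucleotides at the acceptor
    offset:     0-based position of the SNP within the site
    alt_allele: the alternative nucleotide
    """
    ref_site = site.upper()
    if len(ref_site) != 6 or not 0 <= offset < 6 or len(alt_allele) != 1:
        raise ValueError("expected a 6-nt site, an offset in 0..5 and a 1-nt allele")
    alt_site = ref_site[:offset] + alt_allele.upper() + ref_site[offset + 1:]
    in_ref = bool(NAGNAG.fullmatch(ref_site))
    in_alt = bool(NAGNAG.fullmatch(alt_site))
    if in_ref and not in_alt:
        return "destroys"
    if in_alt and not in_ref:
        return "creates"
    return "preserves" if in_ref else "no motif"


# Toy alleles, invented for illustration only.
for site, offset, alt in [("CAGCAG", 4, "G"), ("CAGCGG", 4, "A"), ("CAGCAG", 0, "T")]:
    print(site, offset, alt, nagnag_snp_effect(site, offset, alt))
```

A change at either N position leaves the motif intact, whereas a change in one of the AG dinucleotides creates or destroys it.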

Contributing group members

Main Publications