MultiSplice: User Guide

 


Note: For inquiries and bug reports, please contact Yan Huang (yan (at) netlab.uky.edu) and Dr. Jinze Liu (liuj (at) netlab.uky.edu)

 

 


System requirements

OS: Ubuntu 9.04 (64-bit)
Compiler: g++ 4.3.3 or higher
Matlab: MATLAB 2008 or higher


 

 

Installation

You can download the latest MultiSplice release package here. To compile, run the command "make -f Makefile" in the package directory.

 


Inputs and parameters

 The following is a detailed description of the inputs and parameters used in MultiSplice:


 

1. Reference annotation file. MultiSplice uses GFF format with GTF-style gene_id and transcript_id attributes, as shown below:

chr1 hg19_knownGene exon 10003486 10003573 0.000000 + . gene_id "uc001aqp.2"; transcript_id "uc001aqp.2"; 
chr1 hg19_knownGene exon 10032076 10032246 0.000000 + . gene_id "uc001aqp.2"; transcript_id "uc001aqp.2";
chr1 hg19_knownGene start_codon 10032132 10032134 0.000000 + . gene_id "uc001aqp.2"; transcript_id "uc001aqp.2";
chr1 hg19_knownGene CDS 10032132 10032246 0.000000 + 0 gene_id "uc001aqp.2"; transcript_id "uc001aqp.2";
chr1 hg19_knownGene CDS 10035650 10035833 0.000000 + 2 gene_id "uc001aqp.2"; transcript_id "uc001aqp.2";
chr1 hg19_knownGene exon 10035650 10035833 0.000000 + . gene_id "uc001aqp.2"; transcript_id "uc001aqp.2";
chr1 hg19_knownGene CDS 10041089 10041228 0.000000 + 1 gene_id "uc001aqp.2"; transcript_id "uc001aqp.2";
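
As a rough illustration of how the records above are laid out, the following Python sketch splits one annotation line into its nine columns and extracts the gene_id and transcript_id attributes. It is not part of MultiSplice; the function name parse_annotation_line is hypothetical, and the sketch assumes columns are separated by whitespace (tabs or spaces) and that attributes follow the key "value"; pattern seen in the listing.

```python
import re

def parse_annotation_line(line):
    """Split one annotation record into its nine columns (hypothetical helper)."""
    # Assumption: columns are whitespace-separated; the ninth column
    # (attributes) is kept intact because it contains spaces itself.
    fields = line.strip().split(None, 8)
    if len(fields) < 9:
        raise ValueError("expected 9 columns, got %d" % len(fields))
    seqname, source, feature, start, end, score, strand, frame, attrs = fields
    # Attributes follow the pattern: key "value"; key "value";
    attributes = dict(re.findall(r'(\S+)\s+"([^"]*)"', attrs))
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),   # 1-based, inclusive
        "end": int(end),       # 1-based, inclusive
        "score": float(score),
        "strand": strand,
        "frame": frame,
        "attributes": attributes,
    }

# One of the records shown above.
example = ('chr1 hg19_knownGene CDS 10032132 10032246 0.000000 + 0 '
           'gene_id "uc001aqp.2"; transcript_id "uc001aqp.2";')
record = parse_annotation_line(example)
feature_length = record["end"] - record["start"] + 1
```

Because both coordinates are inclusive, the length of a feature is end - start + 1.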


2. Read alignment file. MultiSplice uses SAM format as shown below:

HWI-ST254_0000:5:1:1575:1998#0/1 0 chr1 1263644 90 100M * 0 0 CAGTGCCCTCCATGCCCTGGCTGGCAGAAACCCTCAACAGCAGTCTGGGCACTGTGGGGCTCTCCCCGCCTCTCCTGCCTTGTTTGCCCCTCAGCGTGCC ccc\ccadbdgfbcfefdeedZcdYeb^ecaef_eaeea]cb^X\QWZ[[]_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:0 IH:i:1 HI:i:1
HWI-ST254_0000:5:1:1878:1970#0/1 0 chrX 70782818 136 50M112N50M * 0 0 NCCACACTTTTTTTATTGGTGATCATGCTAATATGTTCCCTCACCTGAAGAAAAAAGCAGTCATCGATTTTAAGTCCAATGGGCACATTTATGACAATCG BXSWTSUWXSccUYYYUUcVZY]]Y[^^\\^^\^^\\\]\^^^^^cc_ccc_YUUVVXVPUW[PV]YYSZYZYYSWWQTU[[[[]___[UUTTWW[[[[] NM:i:1 IH:i:1 HI:i:1
HWI-ST254_0000:5:1:2112:1951#0/1 0 chr5 134260898 83 100M * 0 0 NGGAGGTAGCGATGAGAGTAATAGATAGGGCTCAGGCGTTTGTGTATGATATGTTTGCGGTTTCGATGATGTGGTCTTTGGAGTAGAAACCTGTGCGGAA BWXWT[Q[WW^\^^^][[U[][[[]cccZc_____]U\\\SXTTXVVXYVVVVVV[[\]\^X^^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:4 IH:i:1 HI:i:1
HWI-ST254_0000:5:1:2434:1959#0/1 0 chr15 75212726 116 58M3205N42M * 0 0 NTCCAGGTAACTGTTCACACTCAAGTAGCAATGTCAATAAATCCTTGGGGAAGCCCATCCATGCGGTTTACACTTTGTCAAGGCCCAGTTCCTCCGGAGT BXTXX[YYYWccccccccXc___Z_cc_acc_c_c_ccccYTcccccc\_^^Z^U[[[][V[[]RZY]YUQR[VU_____BBBBBBBBBBBBBBBBBBBB NM:i:1 IH:i:1 HI:i:1
HWI-ST254_0000:5:1:2833:1963#0/1 0 chr17 16284421 60 78M717N22M * 0 0 NTTTGTTGGGTGAGCTTGTTTGTGTCCCTGTGGGTGGACGTGGTTGGTGATTGGCAGGATCCTGGTATCCGCTAACAGGGCAAAATGCAGATCTTCGTGA BJIMFIGIMKUTSXX^^X^^TT^^^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:2 IH:i:1 HI:i:1
HWI-ST254_0000:5:1:3057:1986#0/1 0 chr22 38627253 138 88M14600N12M * 0 0 ATGGGTTTTCCTCTGATCTCCGACATGATGGAACTTTCTCCTCCTAGGTACTCATAGCACAGGCTCAGGAAATTATAGATGACCAAGGCCTCATAGCAGT ggeeggeegggggggggfggggfgagcdgggdgdggggdggcggfgegcgggggg[\dccXadadd`b_abd_dbfeZd_defeef^gg`eeeZcecda` NM:i:0 IH:i:1 HI:i:1
HWI-ST254_0000:5:1:3148:1979#0/1 0 chr18 33702102 127 100M * 0 0 TCCCTTCCACGTGGAATCAAAATAGGCACTTTCTTCTATGTTTTGAGAAGACAGATGACTGAAAAGTGGTCCTCTTTTCATTTCCATTGCTGGTTCTTCA ggggfgeggggegggef^ffgggeggggeggggggggggggggegdgdbdcceebececegdbddd_`e[babObedfddeegeegeeeeedeaddadaB NM:i:1 IH:i:1 HI:i:1
HWI-ST254_0000:5:1:3100:1997#0/1 0 chr7 140399582 138 100M * 0 0 CAGTGATGTTTGTAAAAATCTTAAAAATCTCATCTAGCAGCAGAGTTGTTAATTTGGAGGGTTGTTCAGGGTATCTAGTTTGCCACAATGCAGTTCACAA ggggggfggggggggggfgggffggggggggggggggefggafdgcegdgbafffec`f_fTccbXW\__]UWa]``c\cedb^_cd_addbbadaeedc NM:i:0 IH:i:1 HI:i:
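
Several of the alignments above are spliced: their CIGAR strings (the sixth column) contain an N operation, which marks a region of the reference skipped by the read, such as an intron. The Python sketch below shows one way to turn a start position and a CIGAR string into the reference blocks that the read covers. It is an illustration based on the SAM specification, not MultiSplice's own code; the function name aligned_blocks is hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def aligned_blocks(pos, cigar):
    """Return 1-based inclusive (start, end) reference blocks of an alignment.

    Each N operation (skipped reference region) closes the current block
    and opens a new one after the gap.
    """
    blocks = []
    start = pos
    cursor = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "MD=X":
            # These operations consume reference bases inside a block.
            cursor += length
        elif op == "N":
            if cursor > start:
                blocks.append((start, cursor - 1))
            cursor += length
            start = cursor
        # I, S, H and P do not consume reference bases.
    if cursor > start:
        blocks.append((start, cursor - 1))
    return blocks

# Position and CIGAR taken from the second alignment above (chrX).
spliced = aligned_blocks(70782818, "50M112N50M")
# Position and CIGAR taken from the first alignment above (chr1).
unspliced = aligned_blocks(1263644, "100M")
```

For the chrX read, this yields two 50-base blocks separated by a 112-base gap; the unspliced chr1 read yields a single 100-base block.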


3. To run MultiSplice, use the command:

./bin/MultiSplice MultiSplicePath AnnotationFileName SAMFileName OutputFilePath ReadLength FragmentLength




Examples 

 

The UCSC human hg19 annotation and a sample read alignment file are provided here. To run MultiSplice on this dataset, use:
./bin/MultiSplice path/MultiSplice_v0.10/ path/SampleData/hg19.gtf path/SampleData/alignments.sam path/result/ 100 100

Please note that the first "100" is the read length and the second "100" is the transcript fragment length.
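
When MultiSplice is run from a script, the two numeric arguments are easy to swap. The Python sketch below assembles the argument list in the order given by the command line above and names each argument explicitly. The helper names build_command and run_multisplice are hypothetical and not part of the MultiSplice package; the paths are the placeholder paths from the example.

```python
import subprocess

def build_command(multisplice_path, annotation_file, sam_file, output_path,
                  read_length, fragment_length, binary="./bin/MultiSplice"):
    """Assemble the MultiSplice argument list in the documented order."""
    return [
        binary,
        multisplice_path,
        annotation_file,
        sam_file,
        output_path,
        str(read_length),       # first number: read length
        str(fragment_length),   # second number: transcript fragment length
    ]

def run_multisplice(*args, **kwargs):
    """Run MultiSplice and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(build_command(*args, **kwargs), check=True)

# Same invocation as the example above, using its placeholder paths.
command = build_command(
    "path/MultiSplice_v0.10/",
    "path/SampleData/hg19.gtf",
    "path/SampleData/alignments.sam",
    "path/result/",
    read_length=100,
    fragment_length=100,
)
```

Passing the arguments as a list avoids shell quoting problems if a path contains spaces.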



MultiSplice output

locus transcriptId coverage

locus: transcript locus on the reference genome.

transcriptId: transcript name.

coverage: coverage of the transcript as estimated by MultiSplice.
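
For downstream analysis, the three output fields can be read into a simple table. The Python sketch below assumes that the fields are separated by whitespace and that a header line may or may not be present; neither detail is stated in this guide. The function name parse_output is hypothetical, and the sample lines are made-up placeholders, not real MultiSplice results.

```python
HEADER = ["locus", "transcriptId", "coverage"]

def parse_output(lines):
    """Parse 'locus transcriptId coverage' records into tuples.

    Assumption: fields are whitespace-separated; blank lines and an
    optional header line are skipped.
    """
    records = []
    for line in lines:
        fields = line.split()
        if not fields or fields == HEADER:
            continue
        if len(fields) != 3:
            raise ValueError("expected 3 fields, got %d: %r" % (len(fields), line))
        locus, transcript_id, coverage = fields
        records.append((locus, transcript_id, float(coverage)))
    return records

# Made-up placeholder lines for illustration only.
sample = [
    "locus transcriptId coverage",
    "locusA transcriptA 12.5",
    "locusB transcriptB 0.0",
]
records = parse_output(sample)
expressed = [r for r in records if r[2] > 0]
```

To read a real result file, pass an open file object in place of the sample list.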



Last Update: 05/25/2012