User Guide
| Note: For inquiries and bug reports, please contact Yan Huang (yan (at) netlab.uky.edu) and Dr. Jinze Liu (liuj (at) netlab.uky.edu) |
System requirements
| OS: | Ubuntu 9.04 (64-bit) |
| Compiler: | g++ 4.3.3 or higher |
| Matlab: | MATLAB 2008 or higher |
Installation
You can download the latest MultiSplice release package here. To compile it, run the command "make -f Makefile".
Inputs and parameters
The following is a detailed description of the inputs and parameters used in MultiSplice:
1. Reference annotation file. MultiSplice uses GFF format (the GTF variant, with gene_id and transcript_id attributes) as shown below:
chr1 hg19_knownGene exon 10003486 10003573 0.000000 + . gene_id "uc001aqp.2"; transcript_id "uc001aqp.2";
chr1 hg19_knownGene exon 10032076 10032246 0.000000 + . gene_id "uc001aqp.2"; transcript_id "uc001aqp.2";
chr1 hg19_knownGene start_codon 10032132 10032134 0.000000 + . gene_id "uc001aqp.2"; transcript_id "uc001aqp.2";
chr1 hg19_knownGene CDS 10032132 10032246 0.000000 + 0 gene_id "uc001aqp.2"; transcript_id "uc001aqp.2";
chr1 hg19_knownGene CDS 10035650 10035833 0.000000 + 2 gene_id "uc001aqp.2"; transcript_id "uc001aqp.2";
chr1 hg19_knownGene exon 10035650 10035833 0.000000 + . gene_id "uc001aqp.2"; transcript_id "uc001aqp.2";
chr1 hg19_knownGene CDS 10041089 10041228 0.000000 + 1 gene_id "uc001aqp.2"; transcript_id "uc001aqp.2";
2. Read alignment file. MultiSplice uses SAM format as shown below:
HWI-ST254_0000:5:1:1575:1998#0/1 0 chr1 1263644 90 100M * 0 0 CAGTGCCCTCCATGCCCTGGCTGGCAGAAACCCTCAACAGCAGTCTGGGCACTGTGGGGCTCTCCCCGCCTCTCCTGCCTTGTTTGCCCCTCAGCGTGCC ccc\ccadbdgfbcfefdeedZcdYeb^ecaef_eaeea]cb^X\QWZ[[]_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:0 IH:i:1 HI:i:1
HWI-ST254_0000:5:1:1878:1970#0/1 0 chrX 70782818 136 50M112N50M * 0 0 NCCACACTTTTTTTATTGGTGATCATGCTAATATGTTCCCTCACCTGAAGAAAAAAGCAGTCATCGATTTTAAGTCCAATGGGCACATTTATGACAATCG BXSWTSUWXSccUYYYUUcVZY]]Y[^^\\^^\^^\\\]\^^^^^cc_ccc_YUUVVXVPUW[PV]YYSZYZYYSWWQTU[[[[]___[UUTTWW[[[[] NM:i:1 IH:i:1 HI:i:1
HWI-ST254_0000:5:1:2112:1951#0/1 0 chr5 134260898 83 100M * 0 0 NGGAGGTAGCGATGAGAGTAATAGATAGGGCTCAGGCGTTTGTGTATGATATGTTTGCGGTTTCGATGATGTGGTCTTTGGAGTAGAAACCTGTGCGGAA BWXWT[Q[WW^\^^^][[U[][[[]cccZc_____]U\\\SXTTXVVXYVVVVVV[[\]\^X^^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:4 IH:i:1 HI:i:1
HWI-ST254_0000:5:1:2434:1959#0/1 0 chr15 75212726 116 58M3205N42M * 0 0 NTCCAGGTAACTGTTCACACTCAAGTAGCAATGTCAATAAATCCTTGGGGAAGCCCATCCATGCGGTTTACACTTTGTCAAGGCCCAGTTCCTCCGGAGT BXTXX[YYYWccccccccXc___Z_cc_acc_c_c_ccccYTcccccc\_^^Z^U[[[][V[[]RZY]YUQR[VU_____BBBBBBBBBBBBBBBBBBBB NM:i:1 IH:i:1 HI:i:1
HWI-ST254_0000:5:1:2833:1963#0/1 0 chr17 16284421 60 78M717N22M * 0 0 NTTTGTTGGGTGAGCTTGTTTGTGTCCCTGTGGGTGGACGTGGTTGGTGATTGGCAGGATCCTGGTATCCGCTAACAGGGCAAAATGCAGATCTTCGTGA BJIMFIGIMKUTSXX^^X^^TT^^^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB NM:i:2 IH:i:1 HI:i:1
HWI-ST254_0000:5:1:3057:1986#0/1 0 chr22 38627253 138 88M14600N12M * 0 0 ATGGGTTTTCCTCTGATCTCCGACATGATGGAACTTTCTCCTCCTAGGTACTCATAGCACAGGCTCAGGAAATTATAGATGACCAAGGCCTCATAGCAGT ggeeggeegggggggggfggggfgagcdgggdgdggggdggcggfgegcgggggg[\dccXadadd`b_abd_dbfeZd_defeef^gg`eeeZcecda` NM:i:0 IH:i:1 HI:i:1
HWI-ST254_0000:5:1:3148:1979#0/1 0 chr18 33702102 127 100M * 0 0 TCCCTTCCACGTGGAATCAAAATAGGCACTTTCTTCTATGTTTTGAGAAGACAGATGACTGAAAAGTGGTCCTCTTTTCATTTCCATTGCTGGTTCTTCA ggggfgeggggegggef^ffgggeggggeggggggggggggggegdgdbdcceebececegdbddd_`e[babObedfddeegeegeeeeedeaddadaB NM:i:1 IH:i:1 HI:i:1
HWI-ST254_0000:5:1:3100:1997#0/1 0 chr7 140399582 138 100M * 0 0 CAGTGATGTTTGTAAAAATCTTAAAAATCTCATCTAGCAGCAGAGTTGTTAATTTGGAGGGTTGTTCAGGGTATCTAGTTTGCCACAATGCAGTTCACAA ggggggfggggggggggfgggffggggggggggggggefggafdgcegdgbafffec`f_fTccbXW\__]UWa]``c\cedb^_cd_addbbadaeedc NM:i:0 IH:i:1 HI:i:
3. To run MultiSplice, use command:
./bin/MultiSplice MultiSplicePath AnnotationFileName SAMFileName OutputFilePath ReadLength FragmentLength
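Before a run, it can help to check that both input files look like the samples above. The following Python sketch is not part of MultiSplice; the function names parse_gtf_line and cigar_introns are hypothetical. It assumes the nine-column annotation layout and the SAM CIGAR strings shown in the samples, and it treats each N operation in a CIGAR string as a skipped reference region (a splice junction), as defined by the SAM format.

```python
import re

# Hypothetical helpers for inspecting MultiSplice inputs; not part of the tool.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
GTF_ATTR = re.compile(r'(\S+) "([^"]*)";')


def parse_gtf_line(line):
    """Split one annotation line into its nine columns."""
    # Split on whitespace at most 8 times so the attribute column stays whole.
    cols = line.rstrip("\n").split(None, 8)
    if len(cols) != 9:
        raise ValueError("expected 9 columns, got %d" % len(cols))
    seqname, source, feature, start, end, score, strand, frame, attrs = cols
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        # Attributes look like: gene_id "..."; transcript_id "...";
        "attributes": dict(GTF_ATTR.findall(attrs)),
    }


def cigar_introns(pos, cigar):
    """Return the (start, end) reference intervals skipped by N operations.

    pos is the 1-based leftmost mapping position from the SAM POS column.
    """
    introns = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((pos, pos + length - 1))
        # Only these operations advance the position on the reference.
        if op in "MDN=X":
            pos += length
    return introns


if __name__ == "__main__":
    rec = parse_gtf_line(
        'chr1 hg19_knownGene exon 10003486 10003573 0.000000 + . '
        'gene_id "uc001aqp.2"; transcript_id "uc001aqp.2";'
    )
    print(rec["feature"], rec["attributes"]["transcript_id"])
    # CIGAR and position taken from the chrX sample read above.
    print(cigar_introns(70782818, "50M112N50M"))
```

For the chrX sample read, the 112-base N operation follows 50 matched bases, so the skipped interval begins 50 bases after the mapping position.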
Examples
The UCSC human hg19 annotation and a sample read alignment file are provided here. To run MultiSplice on this dataset, use:
./bin/MultiSplice path/MultiSplice_v0.10/ path/SampleData/hg19.gtf path/SampleData/alignments.sam path/result/ 100 100
Note that the first "100" is the read length and the second "100" is the transcript fragment length.
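Because the six arguments are positional and the last two are both integers, it is easy to swap them by mistake. The Python sketch below assembles the argument list in the documented order. The function build_multisplice_command and its parameter names are hypothetical; only the executable path and the argument order come from this guide.

```python
import shlex


def build_multisplice_command(multisplice_path, annotation_file, sam_file,
                              output_path, read_length, fragment_length,
                              executable="./bin/MultiSplice"):
    """Return the MultiSplice command as a list of arguments.

    The order follows the usage line in this guide:
    MultiSplicePath AnnotationFileName SAMFileName OutputFilePath
    ReadLength FragmentLength
    """
    lengths = {"read_length": int(read_length),
               "fragment_length": int(fragment_length)}
    for name, value in lengths.items():
        if value <= 0:
            raise ValueError("%s must be a positive integer" % name)
    return [
        executable,
        multisplice_path,
        annotation_file,
        sam_file,
        output_path,
        str(lengths["read_length"]),
        str(lengths["fragment_length"]),
    ]


if __name__ == "__main__":
    cmd = build_multisplice_command(
        "path/MultiSplice_v0.10/",
        "path/SampleData/hg19.gtf",
        "path/SampleData/alignments.sam",
        "path/result/",
        read_length=100,
        fragment_length=100,
    )
    # Print the command for inspection; nothing is executed here.
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the directory that contains bin/MultiSplice, assuming the package has already been compiled.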
MultiSplice reports the following fields for each transcript:
| locus | transcriptId | coverage |
locus: transcript locus on the reference genome.
transcriptId: transcript name.
coverage: estimated coverage of the transcript by MultiSplice.
Last Update: 05/25/2012
