User Guide

 

Navigation

What is MapSplice?
How does MapSplice work?
What's new in MapSplice 2?
System requirements
Obtaining & Building MapSplice 2
Command Line
Sample Dataset
Output of MapSplice 2





What is MapSplice?

MapSplice is software for mapping RNA-seq reads to a reference genome for splice junction discovery.

  • It depends only on the reference genome, not on any further annotations.
  • It supports both paired-end and single-end reads, and uses paired-end information for better mapping accuracy.
  • It supports variable-length reads.
  • It finds unspliced and spliced alignments simultaneously.
  • It detects:
      • novel canonical, semi-canonical and non-canonical splice junctions
      • novel insertions and deletions
      • novel gene fusion events
 
 

How does MapSplice work?

MapSplice first splits reads into segments and maps them to the reference genome using Bowtie. For segments that remain unmapped, MapSplice then tries to place them as gapped alignments, with each gap corresponding to a splice junction. A later remapping step identifies spliced alignments that involve small exons. MapSplice leverages the quality and diversity of the read alignments supporting a given splice junction to increase accuracy.
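The first step, splitting a read into fixed-length segments, can be sketched as follows. This is a minimal illustration rather than MapSplice's own code: the function name split_read is hypothetical, the default of 25 mirrors the documented --seglen default, and how MapSplice handles a trailing remainder shorter than the segment length is not described in this guide.

```python
def split_read(read, seglen=25):
    """Split a read into consecutive segments of at most `seglen` bases.

    The last segment is shorter when the read length is not a multiple of
    `seglen`. This is an assumption made for illustration; the guide does
    not say how MapSplice treats such a remainder.
    """
    if seglen <= 0:
        raise ValueError("seglen must be positive")
    return [read[i:i + seglen] for i in range(0, len(read), seglen)]


read = "ACGT" * 15  # a hypothetical 60 bp read
segments = split_read(read, seglen=25)
for number, segment in enumerate(segments, start=1):
    print(number, len(segment), segment)
```

Each segment would then be aligned on its own, and the gaps between neighbouring mapped segments are where splice junctions are looked for.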

 

What's new in MapSplice 2? 

  • MapSplice 2 has improved mapping sensitivity.
  • MapSplice 2 now supports multithreading, which dramatically reduces running time on multi-core systems.
  • MapSplice 2 now supports variable-length reads.
  • MapSplice 2 is optimized for repeats.
  • All command line parameters have been redesigned for easier use.

 

System requirements

  • OS: 64-bit x86 Linux
  • Memory: 6GB
  • Compiler: g++ 4.3.3 or higher
  • Script: Python 2.4.3 or higher

 

Obtaining & Building MapSplice 2

  • You can download the latest version of MapSplice here. For better compatibility and convenience, Bowtie 0.12.7 and SAMtools 0.1.9 are included in the package.
  • To build MapSplice, extract the compressed file, go to the MapSplice directory, and run "make".


Command Line

Usage

python mapsplice.py [options]* -c <Reference_Sequence> -x <Bowtie_Index> -1 <Read_List1> -2 <Read_List2>

Main Arguments
-c <string>

The directory containing the sequence files of the reference genome. All sequence files must meet the following requirements:

  • They are in FASTA format, with a '.fa' extension.
  • Each sequence file contains exactly one chromosome.
  • The chromosome name in the header line (not including the '>') is the same as the base name of the sequence file, and does not contain any blank space.
  • E.g., if the header line is '>chr1', the sequence file must be named 'chr1.fa'.
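
The requirements above can be checked before starting a long run. The following is a minimal sketch, not part of MapSplice: the function name check_reference_dir and the wording of the messages are hypothetical, and the demo files are created only for illustration.

```python
import os
import tempfile


def check_reference_dir(ref_dir):
    """Return a list of problems found in the '.fa' files under ref_dir."""
    problems = []
    for name in sorted(os.listdir(ref_dir)):
        if not name.endswith(".fa"):
            continue  # only '.fa' files are treated as reference sequences
        base = name[:-len(".fa")]
        with open(os.path.join(ref_dir, name)) as handle:
            # Header lines start with '>'; drop the '>' and the line ending.
            headers = [line[1:].rstrip("\r\n") for line in handle
                       if line.startswith(">")]
        if len(headers) != 1:
            problems.append("%s: expected 1 header line, found %d"
                            % (name, len(headers)))
        elif any(ch.isspace() for ch in headers[0]):
            problems.append("%s: header contains blank space" % name)
        elif headers[0] != base:
            problems.append("%s: header '%s' does not match base name '%s'"
                            % (name, headers[0], base))
    return problems


# Demo on a throwaway directory with one valid and one invalid file.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "chr1.fa"), "w") as out:
    out.write(">chr1\nACGTACGT\n")
with open(os.path.join(demo_dir, "chr2.fa"), "w") as out:
    out.write(">chromosome2\nACGTACGT\n")
for problem in check_reference_dir(demo_dir):
    print(problem)
```

An empty result means every '.fa' file in the directory has a single header that matches its file name.
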
-x <string>

The basename (including directory path) of the Bowtie index to be searched. The basename is the name of any of the index files up to but not including the final .1.ebwt / .rev.1.ebwt / etc.

  • MapSplice uses Bowtie 1 indexes. Bowtie 2 indexes are not supported.
  • If you do not have a Bowtie index, you can omit this option. In that case, or if the index you specified cannot be found, MapSplice builds the index from the reference sequences specified in -c. Building the index takes extra time; the index is saved in the output directory.
  • Pre-built indexes for various reference genomes can be downloaded from Bowtie's website. Use pre-built indexes with caution: make sure the index is consistent with the sequence files specified in -c. For more information about Bowtie indexes, please visit Bowtie's website.
-1 <string> Comma-separated (no blank space) list of read sequence files in FASTA/FASTQ format. When running with paired-end reads, this should contain the #1 mates (filename usually includes _1).
-2 <string> Comma-separated (no blank space) list of read sequence files in FASTA/FASTQ format. -2 is only used when running with paired-end reads. This should contain the #2 mates (filename usually includes _2). Files must be in the same order as those specified in -1.
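
As an illustration of how the main arguments fit together, the sketch below assembles a command line from lists of read files and checks that -1 and -2 line up. The helper build_mapsplice_command and all file and directory names are hypothetical; only the flags -c, -x, -1 and -2 come from this guide. Rejecting file names that contain commas or blanks is a conservative assumption based on the list format.

```python
def build_mapsplice_command(ref_dir, reads_1, reads_2=None, bowtie_index=None):
    """Build the argument list for a MapSplice 2 run from the main arguments."""
    if not reads_1:
        raise ValueError("at least one read file is required for -1")
    for path in list(reads_1) + list(reads_2 or []):
        # Conservative assumption: commas or blanks in a name would break
        # the comma-separated, blank-free list format.
        if "," in path or any(ch.isspace() for ch in path):
            raise ValueError("unsupported character in file name: %r" % path)
    if reads_2 and len(reads_1) != len(reads_2):
        raise ValueError("-1 and -2 must list the same number of files")

    cmd = ["python", "mapsplice.py", "-c", ref_dir]
    if bowtie_index:
        cmd += ["-x", bowtie_index]  # optional: the index is built if omitted
    cmd += ["-1", ",".join(reads_1)]
    if reads_2:
        cmd += ["-2", ",".join(reads_2)]  # same order as the -1 files
    return cmd


cmd = build_mapsplice_command(
    "hg19_chromosomes/",
    ["laneA_1.fq", "laneB_1.fq"],
    ["laneA_2.fq", "laneB_2.fq"],
    bowtie_index="index/hg19",
)
print(" ".join(cmd))
```

The returned list can be passed to subprocess.call from the MapSplice directory; for single-end data, leave out the second list.
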
Optional arguments
  •  Input/Output and Performance options
-p / --threads <int> Number of threads to be used for parallel aligning. Default is 1.
-o / --output <string> The directory in which MapSplice will write its output. Default is "./mapsplice_out/".
--qual-scale <string>

Type of input qualities. By default MapSplice tries to determine the quality scale automatically. This option overrides the automatically detected quality scale.

  • phred33: the input quality type is Phred+33 (Illumina 1.8+, Sanger)
  • phred64:  the input quality type is Phred+64 (Illumina 1.3+ ~ 1.7+)
  • solexa64: the input quality type is Solexa+64 (Solexa)
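
If you are unsure which value to pass, the lowest quality character in the file is a useful hint. The sketch below is a common heuristic and is not MapSplice's own detection routine, which this guide does not describe; the function name guess_qual_scale is hypothetical. It relies on the encodings themselves: Phred+33 characters start at '!' (ASCII 33), Solexa+64 at ';' (ASCII 59) and Phred+64 at '@' (ASCII 64).

```python
def guess_qual_scale(quality_strings):
    """Guess a --qual-scale value from FASTQ quality strings (heuristic)."""
    codes = [ord(ch) for qual in quality_strings for ch in qual]
    if not codes:
        raise ValueError("no quality characters given")
    lowest = min(codes)
    if lowest < 59:
        return "phred33"   # characters below ';' only occur in Phred+33
    if lowest < 64:
        return "solexa64"  # ';' to '?' encode negative Solexa scores
    return "phred64"


# Hypothetical quality strings, one call per made-up file.
print(guess_qual_scale(["IIIIHHGG#", "IIIIIIIII"]))
print(guess_qual_scale(["hhhhggf;;"]))
print(guess_qual_scale(["hhhhggfB@"]))
```

The guess is only as good as the sample: a Phred+33 file containing nothing but very high qualities would be misclassified, so check enough reads to include some low-quality bases.
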
--bam Generate BAM output. By default MapSplice reports alignments in SAM format.
--keep-tmp Keep the intermediate files. By default MapSplice deletes all the intermediate files once finished running.
  • Alignment options
-s / --seglen <int>  Reads will be divided into <int> bp segments for the initial alignment. Default is 25.
  • The suggested range is [18,25]. Segment lengths shorter than 18 may cause more false positives and may make MapSplice significantly slower.
  • For reads longer than 50 bp, a segment length of 25 (the default) is highly recommended.
--min-map-len <int>

MapSplice will only report read alignments that are completely mapped or that are mapped over more than <int> bases. Default is 50.

-k / --max-hits <int> Maximum alignments per read. Any read that has more than <int> alignments will be discarded. Default is 4.
-i / --min-intron <int> Minimum length of splice junctions. MapSplice will not search for any splice junctions with a gap shorter than <int> bp. Default is 50.
-I / --max-intron <int> Maximum length of splice junctions. MapSplice will not search for any splice junctions with a gap longer than <int> bp. Default is 300,000.
--non-canonical Search for non-canonical junctions in addition to canonical and semi-canonical junctions.
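
This guide does not define the three junction classes. The sketch below assumes the commonly used definitions based on the two bases at each end of the intron on the forward strand: GT-AG is canonical, GC-AG and AT-AC are semi-canonical, and everything else is non-canonical. The function name classify_junction is hypothetical, and reverse-strand junctions, which show the reverse-complemented motifs, are not handled.

```python
# Assumed definitions; they are not taken from this guide.
CANONICAL = {("GT", "AG")}
SEMI_CANONICAL = {("GC", "AG"), ("AT", "AC")}


def classify_junction(donor, acceptor):
    """Classify a junction by the first and last two bases of its intron."""
    pair = (donor.upper(), acceptor.upper())
    if pair in CANONICAL:
        return "canonical"
    if pair in SEMI_CANONICAL:
        return "semi-canonical"
    return "non-canonical"


for donor, acceptor in [("GT", "AG"), ("gc", "ag"), ("AT", "AC"), ("CT", "GG")]:
    print(donor, acceptor, classify_junction(donor, acceptor))
```

Under these assumed definitions, only junctions in the last class would require --non-canonical.
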
-m / --splice-mis <int> Maximum number of mismatches that are allowed in a segment crossing a splice junction. Default is 1.
--max-append-mis <int>  Maximum number of mismatches allowed when appending a high-error exonic segment to an adjacent low-error segment. Default is 3.
--ins <int> Maximum insertion length (insertion in read / deletion in reference genome). Default is 6.
--del <int> Maximum deletion length (deletion in read / insertion in reference genome). Default is 6.
--fusion | --fusion-non-canonical

  • --fusion: Search for canonical and semi-canonical fusion junctions.
  • --fusion-non-canonical: Search for canonical, semi-canonical, and non-canonical fusion junctions.
--filtering <int>

The stringency level of filtering splice junctions in the range of [1, 2]. Default is 2.

  • 1: Less stringent filtering, with higher sensitivity of splice junction detection.
  • 2: Standard filtering.
  •  Other Options
-h/--help Print the usage message
-v/--version  Print the version of MapSplice

 

 

Sample Dataset

 Coming Soon...

 

Output of MapSplice 2

  • Alignment

By default, read alignments are reported in SAM format to alignments.sam.  If --bam is specified, read alignments are reported in BAM format to alignments.bam. Please see the SAM / BAM format specification.
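
As a quick sanity check on the alignment output, spliced alignments can be counted from the CIGAR strings, where the SAM specification uses the N operation for a skipped reference region such as an intron. The sketch below is independent of MapSplice: the function names are hypothetical and the example records are made up. To run it on real output, pass an open file handle for alignments.sam instead of the list.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def intron_lengths(cigar):
    """Return the lengths of all N (skipped region) operations in a CIGAR."""
    return [int(length) for length, op in CIGAR_OP.findall(cigar) if op == "N"]


def count_spliced(sam_lines):
    """Return (spliced, mapped) alignment counts for an iterable of SAM lines."""
    spliced = mapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, cigar = int(fields[1]), fields[5]
        if flag & 0x4 or cigar == "*":
            continue  # unmapped read
        mapped += 1
        if intron_lengths(cigar):
            spliced += 1
    return spliced, mapped


# Made-up records: one spliced, one unspliced, one unmapped.
seq, qual = "A" * 50, "I" * 50
example = [
    "@SQ\tSN:chr1\tLN:100000\n",
    "\t".join(["r1", "0", "chr1", "100", "255", "25M1000N25M", "*", "0", "0", seq, qual]) + "\n",
    "\t".join(["r2", "0", "chr1", "500", "255", "50M", "*", "0", "0", seq, qual]) + "\n",
    "\t".join(["r3", "4", "*", "0", "0", "*", "*", "0", "0", seq, qual]) + "\n",
]
print(count_spliced(example))
```
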

  • Normal Splice Junction

Splice junctions are reported to "junctions.txt". Please see the detailed description of all the columns here.

  • Insertion

Insertions are reported to insertions.txt. Please see the detailed description of all the columns here.

  • Deletion

Deletions are reported to deletions.txt.  Please see the detailed description of all the columns here.

  • Fusion alignment

If --fusion | --fusion-non-canonical is specified, fusion alignments are reported to fusion_alignment.txt. Please see the detailed description of all the columns here.

  • Fusion Splice junction

If --fusion | --fusion-non-canonical is specified, fusion splice junctions are reported to fusion_junction.txt. Please see the detailed description of all the columns here.