User Guide
| Note: For inquiries and bug reports, please contact mapsplice (at) netlab.uky.edu. |
System requirements
| OS: | Ubuntu 9.04 (64-bit), Red Hat 4.1.2 (64-bit), Red Hat 4.3.0 (64-bit) |
| Compiler: | g++ 4.1.2, g++ 4.3.0, g++ 4.3.3 or higher |
| Script: | Python 2.4.3, Python 2.5.1, Python 2.6 |
* MapSplice 1.15.2 was tested in the environments listed above.
Obtaining and installing MapSplice
You can download the MapSplice 1.15.2 release package here.
MapSplice uses Bowtie for segment mapping in its pipeline. The bowtie and bowtie-build executables are located in MapSplice/bin/. The Bowtie version tested with MapSplice 1.15.2 is 0.12.7.
Run MapSplice with a configuration file
1. Download the MapSplice 1.15.2 package.
2. Edit the MapSplice.cfg file to point to your input data files and output directory. You may also need to change the default settings.
3. Run the MapSplice pipeline with "python bin/mapsplice_segments.py MapSplice.cfg".
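The steps above can be wrapped in a small launcher script. The following is only a sketch, not part of MapSplice: the helper names `config_command` and `missing_paths` are hypothetical, and the helper itself is written for a current Python 3 interpreter, whereas MapSplice 1.15.2 is invoked through `python` as shown in step 3.

```python
import os


def config_command(cfg_path="MapSplice.cfg", script="bin/mapsplice_segments.py"):
    """Return the argument list for a configuration-file run (step 3)."""
    # Hypothetical helper: it only assembles the command quoted in the guide.
    return ["python", script, cfg_path]


def missing_paths(paths):
    """Return the paths that do not exist, so they can be fixed before the run."""
    return [p for p in paths if not os.path.exists(p)]
```

From the package root, one could first call `missing_paths(["MapSplice.cfg", "bin/mapsplice_segments.py"])` and, if the result is empty, pass `config_command()` to `subprocess.run`.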
Inputs and Command-line options
The following is a detailed description of the options used to control the MapSplice script:
| Usage: |
python bin/mapsplice_segments.py MapSplice.cfg
or
python bin/mapsplice_segments.py [inputs|options] MapSplice.cfg
or
python bin/mapsplice_segments.py [inputs|options] |
| Inputs and output: |
|
| -u/--reads-file <string> |
A comma-separated (no blank space) list of FASTA or FASTQ read files (including path). Format constraint: read names after @ or > should not contain a blank space or tab. |
| -c/--chromosome-files-dir <string> |
The directory containing the sequence files corresponding to the reference genome (in FASTA format) |
| -B/--Bowtieidx <string> |
The path and basename of the index to be searched by Bowtie. E.g., if the index file name is index.1.ebwt, then the basename is index. The index only needs to be built once, and pre-built indexes of various reference genomes are downloadable from Bowtie's page. However, use caution when downloading a pre-built index: know what you are downloading, and be sure the Bowtie index is consistent with the chromosome files specified with the -c option. |
| -o/--output-dir <string> |
The name of the directory in which MapSplice will write its output. The default is "mapsplice_out/" under the current directory MapSplice is run in. |
| -t/--avoid-regions <string> (optional) |
Regions to avoid (i.e. mask) while searching for alignments - gff format required - e.g. ~/examples/islands.gff |
| -T/--interested-regions <string> (optional) |
Regions of interest while searching for alignments - gff format required |
| -M/--sam-file <string> (optional) |
A comma-separated (no blank space) list of SAM files (including path) |
| --bam <string> (optional) |
A comma-separated (no blank space) list of BAM files (including path). Only single-end reads are supported. If this value is specified, the reads_file option is not used; the unmapped reads in the BAM files are converted to FASTQ format and used as the input reads. |
| --filter-fusion-by-repeat <string> (optional) |
Filter out a fusion junction if its donor sequence and acceptor sequence appear repeatedly. blat needs to be installed on the system, and a chromosome index in blat format needs to be provided (e.g. human index in blat format: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.2bit). The output is "fusion_remap_junction.unique.chr_seq.extracted.repeat_filtered". |
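Several of the input constraints above can be checked before a run. The sketch below is illustrative only and not part of MapSplice; the function names are hypothetical. It assumes Bowtie 1 index files named `<basename>.N.ebwt` or `<basename>.rev.N.ebwt`, and FASTQ files laid out as unwrapped four-line records.

```python
import re

# Assumption: Bowtie 1 index files end in .N.ebwt or .rev.N.ebwt
_EBWT_SUFFIX = re.compile(r"(\.rev)?\.\d+\.ebwt$")


def bowtie_basename(index_file):
    """Strip the Bowtie suffix, e.g. index.1.ebwt -> index (value for -B)."""
    return _EBWT_SUFFIX.sub("", index_file)


def bad_read_names(lines, reads_format):
    """Return header lines whose read name contains a blank space or tab.

    reads_format is "fa" or "fq". FASTQ is assumed to use four lines per
    record, so only every fourth line is treated as a header; this avoids
    mistaking a quality line that starts with '@' for a read name.
    """
    lines = [ln.rstrip("\r\n") for ln in lines]
    if reads_format == "fq":
        headers = lines[0::4]
    else:
        headers = [ln for ln in lines if ln.startswith(">")]
    return [h for h in headers if " " in h or "\t" in h]


def reads_file_argument(paths):
    """Join read files into the comma-separated, space-free list for -u."""
    for p in paths:
        if " " in p or "," in p:
            raise ValueError("path must not contain a space or comma: %r" % p)
    return ",".join(paths)
```

For instance, `bad_read_names(open(path), "fq")` lists the headers that would need renaming before the file is passed to -u.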
| Basic options: | Options that should be specified for MapSplice to run correctly |
| -L/--seglen <int> |
Length of read segments |
| -Q/--reads-format <string> | Format of input reads, fa OR fq |
| --pairend |
Whether the input reads are paired-end or single-end. Must be specified for paired-end reads |
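The basic options can be assembled programmatically. The sketch below is a hypothetical helper, not part of MapSplice; it infers the -Q value from the first line of a reads file, relying only on the convention stated earlier that read names follow `>` (FASTA) or `@` (FASTQ).

```python
def guess_reads_format(first_line):
    """Return "fa" or "fq" based on the first character of a reads file."""
    if first_line.startswith(">"):
        return "fa"
    if first_line.startswith("@"):
        return "fq"
    raise ValueError("first line starts with neither '>' nor '@'")


def basic_options(seglen, first_line, paired_end=False):
    """Build the basic option list: -L, -Q and, if needed, --pairend."""
    opts = ["-L", str(seglen), "-Q", guess_reads_format(first_line)]
    if paired_end:
        opts.append("--pairend")
    return opts
```

The segment length itself still has to be chosen by the user; the helper only formats it.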
| Advanced options: | |
| -E/--segment-mismatches <int> |
The maximum number of mismatches (Hamming distance) that are allowed in an unspliced aligned read or segment. The default is 1. Must be in the range [0-3] |
| --non-canonical | --semi-canonical |
Whether or not semi-canonical and non-canonical junctions should be output. If --non-canonical is specified, all junctions are output. If --semi-canonical is specified, semi-canonical and canonical junctions are output. If neither is specified, only canonical junctions are output |
| --fusion-non-canonical | --fusion-semi-canonical |
Whether or not semi-canonical and non-canonical fusion junctions should be output. If --fusion-non-canonical is specified, all fusion junctions are output. If --fusion-semi-canonical is specified, semi-canonical and canonical fusion junctions are output. If neither is specified, only canonical fusion junctions are output. Outputting only canonical fusion junctions is suggested |
| --not-rem-temp |
If specified, do not remove temporary directory and files after MapSplice is finished running |
| --full-running |
If specified, run a remapping step to increase the junction coverage |
| -n/--min-anchor <int> |
The anchor length that will be used for single anchored spliced alignment |
| -R/--remap-mismatches <int> | The maximum number of mismatches that are allowed during remapping. The default is 2. Should be in range [0-3] |
| -m/--splice-mismatches <int> | The maximum number of mismatches that are allowed in a segment crossing a splice junction. The default is 1. |
| -i/--min-intron-length <int> | The "minimum intron length". MapSplice will not report alignments with a gap shorter than this many bases. The default is 1. |
| -x/--max-intron-length <int> | The "maximum intron length". MapSplice will not report alignments with a gap longer than this many bases for a single anchored spliced alignment. The default is 200000. |
| -X/--threads <int> | Number of threads Bowtie runs on when mapping reads |
| --max-hits <int> | max_hits x 10 is the maximum number of repeated hits permitted during segment mapping and read mapping (default is 4 x 10 = 40) |
| -r/--max-insert <int> |
The maximum small indel length (default is 3, suggested to be in [0-3]) |
| --min-missed-seg <int> |
An option to output incomplete alignments: the minimal number of segments contained in an alignment. E.g., if the read length is 75bp and segment_length is 25, setting min_missed_seg to 1 will output 50bp alignments when there are no 75bp alignments for the corresponding reads. By default, only alignments of full read length are output |
| --search-whole-chromosome |
If specified, search up to the maximum intron length away in exonic region and non-exonic region. |
| --map-segments-directly |
If specified, MapSplice will try to find both spliced and unspliced alignments of a read and select the best alignment (this will increase running time) |
| --run-MapPER | If specified, run MapPER (PMID 20576625) and generate read mappings based on a probabilistic framework. Valid for PER reads |
| --fusion | Whether or not fusion junctions should be output. Reads not aligned as normal unspliced or spliced alignments are considered fusion candidates. The outputs are "fusion.junction" and "fusion_junction.unique" if full-running is not turned on, and "fusion_remap_junction.unique.chr_seq.extracted" if full-running is turned on |
| --cluster |
Whether or not to use paired-end reads to generate cluster regions for fusion read mappings. Paired-end reads are used to find fusion alignments with a single anchored method, e.g. use 2x50 paired reads and a 25bp segment length to find fusion alignments. Only valid for paired-end reads with both full running and fusion turned on (set full_running = yes and do_fusion = yes) |
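Three of the advanced options above follow simple rules that can be restated as code. This is a reading of the option descriptions, not MapSplice's implementation, and the function names are hypothetical. Two simplifications are assumed: --non-canonical takes precedence when both junction flags are given (the guide does not say), and the maximum intron length is applied to every gap, although the guide attaches it to single anchored spliced alignments.

```python
def junction_classes(non_canonical=False, semi_canonical=False):
    """Return the junction classes reported for a given flag combination."""
    if non_canonical:
        # Assumption: --non-canonical wins if both flags are set.
        return {"canonical", "semi-canonical", "non-canonical"}
    if semi_canonical:
        return {"canonical", "semi-canonical"}
    return {"canonical"}


def gap_is_reportable(gap, min_intron=1, max_intron=200000):
    """True if a gap lies within the -i/-x bounds (defaults from the guide)."""
    return min_intron <= gap <= max_intron


def max_repeated_hits(max_hits=4):
    """Maximum repeated hits permitted: the --max-hits value times 10."""
    return max_hits * 10
```

With the defaults, `max_repeated_hits()` gives 40, matching the figure quoted for --max-hits.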
| Help and version options: | |
| -h/--help | Print the help message and exit |
| -v/--version | Print the version of MapSplice and exit |
Examples
Three examples run on the hg18 chr20 reference genome
Before running the examples, make sure that bowtie and bowtie-build are in the MapSplice path, and that the reference genome and index are in the paths given in the command-line options.
Example 1: 1M 36bp fastq reads
python mapsplice_segments.py -Q fq -o 1M_36bp_output_path -c chr20_sequence_index_path -u reads_path/1M_36bp_fastq.txt -B chr20_sequence_index_path/index -L 18 2>36bp_time.log
Example 2: 1M 50bp fastq reads
python mapsplice_segments.py -Q fq -o 1M_50bp_output_path -c chr20_sequence_index_path -u reads_path/1M_50bp_fastq.txt -B chr20_sequence_index_path/index -L 25 2>50bp_time.log
Example 3: 1M 100bp fastq reads
python mapsplice_segments.py -Q fq -o 1M_100bp_output_path -c chr20_sequence_index_path -u reads_path/1M_100bp_fastq.txt -B chr20_sequence_index_path/index -L 25 2>100bp_time.log
*hints:
*bowtie-build can take hours to build an index, but the index only needs to be built once and is reusable, so don't delete it.
*The time log is written to stderr. It can be redirected with '2>'.
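The example commands share one shape, so they can be generated from a few parameters. The following is a sketch with hypothetical helper names, not part of MapSplice; it reproduces the option order of the examples and sends stderr to a log file in the same way as the `2>` redirection.

```python
import subprocess


def example_command(reads, out_dir, chrom_dir, bowtie_index, seglen,
                    reads_format="fq", script="mapsplice_segments.py"):
    """Build the argument list used in the three examples."""
    return ["python", script,
            "-Q", reads_format,
            "-o", out_dir,
            "-c", chrom_dir,
            "-u", reads,
            "-B", bowtie_index,
            "-L", str(seglen)]


def run_with_time_log(cmd, log_path):
    """Run cmd with stderr written to log_path; return the exit code."""
    with open(log_path, "w") as log:
        return subprocess.run(cmd, stderr=log).returncode
```

Example 1 would then correspond to `run_with_time_log(example_command("reads_path/1M_36bp_fastq.txt", "1M_36bp_output_path", "chr20_sequence_index_path", "chr20_sequence_index_path/index", 18), "36bp_time.log")`.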
| best_remapped_junction.bed | best_junction.bed | Junctions in UCSC bed format |
| alignments.sam | Spliced and unspliced read alignments in SAM format |
| fusion_junction | fusion_remap_junction.unique.chr_seq.extracted (remapped junction) | fusion_remap_junction.unique.chr_seq.extracted.repeat_filtered (remapped repeat filtered junction) | Format |
| fusion.remapped.unique | Format |
| prob_alignment.sam | Predicted read alignments based on a probabilistic framework |
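As a quick check of alignments.sam, spliced and unspliced alignments can be counted from the CIGAR column. This is a minimal sketch, not part of MapSplice; it assumes the file follows the SAM specification, in which header lines start with `@`, the sixth tab-separated field is the CIGAR string, and an `N` operation marks a skipped region such as an intron. The function name is hypothetical.

```python
def count_alignments(sam_lines):
    """Return (spliced, unspliced) counts for an iterable of SAM lines."""
    spliced = 0
    unspliced = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 6:
            continue  # not a complete alignment record
        cigar = fields[5]
        if cigar == "*":
            continue  # no alignment information available
        if "N" in cigar:
            spliced += 1
        else:
            unspliced += 1
    return spliced, unspliced
```

It can be applied directly to an open file, e.g. `count_alignments(open("alignments.sam"))`.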
Last Update: 07/08
