Version 1.0
Bairong Shen, Wentao Wu
March 26, 2012
Contents
1. Introduction to NGSFormatConverter
1.1 What can you do with NGSFormatConverter
2. Quick start to use NGSFormatConverter
2.2.2 Conversion by built-in scripts
Appendix B Output naming rules
---------------------------------------------------------------------------------------------------------------------------------------------------
NGSFormatConverter (NGSFC) was first designed for format conversion and database retrieval for
deep sequencing data. However, I found that many excellent deep sequencing programs are not easy to use.
For example, many of them can be described as "one command, one file": when there are many
files to process, running them is very tedious. NGSFC can therefore call other programs and scripts to process
a list of files. I hope researchers will find it useful.
In brief, NGSFC has these functions:
1. Format conversion, including loading external scripts
2. Batch processing
3. Executing other programs with batch processing
4. Database retrieval, including adding new databases
NGSFC runs on most platforms with Java 7 or above installed.
java -jar NGSFC.jar -i <input> -o <output> [-s <script>] -t <type> [arguments]
The order of the options is not fixed, but each option must be given as a pair (e.g. -i <input>). Anything left over is treated as [arguments].
java -jar NGSFC.jar -c <configfile>
All the information is read from the config file. For more information, please refer to the config file example.
java -jar NGSFC.jar -h
Show help.
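The first usage form handles one input per call, so batch use means repeating it for each file. Below is a minimal Python sketch of that idea, not part of NGSFC itself. The options -i, -o, -s and -t are the ones documented here; the jar location, the helper names build_command and run_batch, and passing the type with its leading dash (as written in the config file example) are assumptions.

```python
import subprocess
from typing import List, Optional, Sequence

NGSFC_JAR = "NGSFC.jar"  # assumed location of the jar; adjust to your install


def build_command(input_file: str, output_dir: str, conv_type: str,
                  script: Optional[str] = None,
                  extra_args: Sequence[str] = ()) -> List[str]:
    """Build the argument list for one NGSFC run using the documented options."""
    cmd = ["java", "-jar", NGSFC_JAR, "-i", input_file, "-o", output_dir]
    if script is not None:
        # The manual states that external scripts require type 10000.
        cmd += ["-s", script]
        conv_type = "10000"
    cmd += ["-t", conv_type]
    cmd += list(extra_args)  # anything left over is passed as [arguments]
    return cmd


def run_batch(input_files: Sequence[str], output_dir: str, conv_type: str) -> None:
    """Run one conversion per input file and stop at the first failure."""
    for path in input_files:
        subprocess.run(build_command(path, output_dir, conv_type), check=True)
```

For example, build_command("1.fq", "outputs", "-fq2fa") returns the argument list for a single FASTQ to FASTA conversion, and run_batch would execute one such call per file.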
-i
Input file.
-o
Output directory.
-s
Script file, if an external script is used.
-t
Conversion type for the built-in scripts. Use 10000 when using an external script. For more information on the conversion type, please see Appendix A.
-c
Config file. In the inputs and args fields, use ; (semicolon) to separate multiple values. To use a script, type must be 10000.
Example:
inputs = 1.fq;2.fq #multiple inputs must be separated by semicolons
output = outputs #the output directory path
script = gtf2gff3.pl #if using an external script, the script file path
type = -fq2fa #conversion type; for more information, please refer to Appendix A. Write 10000 if using an external script.
args = arg1;arg2 #arguments for the conversion, separated by semicolons
-h
Show help.
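A config file in the layout of the example above can also be generated rather than typed by hand. The following Python sketch only illustrates that layout. The field names and the semicolon separator come from the example; the function name render_config and the choice to leave out the script and args lines when they are not needed are assumptions, since the manual does not say whether fields may be omitted.

```python
from typing import Optional, Sequence


def render_config(inputs: Sequence[str], output: str, conv_type: str,
                  script: Optional[str] = None,
                  args: Sequence[str] = ()) -> str:
    """Return config file text in the 'key = value' layout of the example."""
    lines = [
        "inputs = " + ";".join(inputs),  # multiple inputs joined by semicolons
        "output = " + output,
    ]
    if script is not None:
        lines.append("script = " + script)
        conv_type = "10000"  # the manual requires type 10000 for scripts
    lines.append("type = " + conv_type)
    if args:
        lines.append("args = " + ";".join(args))
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file and passing that file with -c would then correspond to the second usage form.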
Fig.1. Overview of NGSFormatConverter
2.2.2 Conversion by built-in scripts
Step 1: Select the Built-in Script tab.
Step 2: Select the input format in the list.
Step 3: Select the output format in the list.
Step 4: Add the files you want to process.
Step 5: Select the output directory. NGSFC will name the output file automatically. For more information, please refer to Appendix B.
Step 6: Input parameters if required.
Step 7: Press the start button and have a coffee; it will be done soon.
If NGSFC doesn't have the conversion script you want, or you want to execute a program (for example, an analysis program), follow these steps.
Step 1: Select the Custom Scripts tab.
Step 2: Select the script or program in the list. NGSFC loads all the scripts (pl, py, exe, jar...) in the scripts directory.
Note: To execute a Perl script, you must have a Perl interpreter installed. The same applies to other script types.
Step 3: Continue with steps 4 to 7 above.
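The note about interpreters can be illustrated with a short Python sketch. It shows one way a launcher could map a script's extension to the command that runs it and check that the interpreter is installed. This is not NGSFC's actual code: the table INTERPRETERS, the function names, and the interpreter command names (perl, python, java) are assumptions for illustration.

```python
import os
import shutil
from typing import List, Sequence

# Assumed mapping from file extension to the command prefix that runs it.
INTERPRETERS = {
    ".pl": ["perl"],
    ".py": ["python"],
    ".jar": ["java", "-jar"],
    ".exe": [],  # executables are started directly
}


def command_for(script_path: str, script_args: Sequence[str] = ()) -> List[str]:
    """Return the full command line for a script, based on its extension."""
    ext = os.path.splitext(script_path)[1].lower()
    if ext not in INTERPRETERS:
        raise ValueError("unsupported script type: " + ext)
    return INTERPRETERS[ext] + [script_path] + list(script_args)


def interpreter_available(script_path: str) -> bool:
    """Check whether the interpreter needed by the script is on the PATH."""
    ext = os.path.splitext(script_path)[1].lower()
    prefix = INTERPRETERS.get(ext)
    if prefix is None:
        return False
    return not prefix or shutil.which(prefix[0]) is not None
```

With this mapping, command_for("gtf2gff3.pl", ["in.gtf"]) yields a perl command, and interpreter_available reports whether perl can be found before the script is started.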
To retrieve information from databases, follow these steps.
Step 1: Select the databases you want to search.
Step 2: Enter your keywords.
Step 3: Press the Search button.
To add new databases, do as follows.
Step 1: Press the Edit button.
Step 2: Select a database to modify it, or select nothing to add a new one.
Step 3: Enter a name and URL.
Step 4: Press the addnew button.
Note: The URL must not contain the keywords. For example, to search PubMed, enter "http://www.ncbi.nlm.nih.gov/pubmed/?term=".
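The note suggests that the keywords are appended to the stored URL at search time. A minimal Python sketch of that step follows. It assumes simple concatenation with URL encoding of the keywords, which the manual does not spell out, and the function name build_search_url is hypothetical.

```python
from urllib.parse import quote_plus


def build_search_url(base_url: str, keywords: str) -> str:
    """Append URL-encoded keywords to a stored database URL."""
    # quote_plus encodes spaces as '+' and escapes reserved characters.
    return base_url + quote_plus(keywords)
```

This is why the stored URL should end where the keywords begin, as in the PubMed example ending with "?term=".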
-fa2fq
FASTA to FASTQ.
The qual file must have the same file name as the FASTA file, with the suffix ".qual", and must be in the same directory as the FASTA file.
-fa2fa
CSFASTA to FASTA.
-fa22bit
FASTA to 2bit.
-fq2fa
FASTQ to FASTA.
-fq2fq
Change FASTQ quality score. 0 for Sanger, 1 for Solexa, 2 for Illumina 1.3-1.4, 3 for Illumina 1.5+.
| Arguments | Description |
| Input score | Input file quality score, required. |
| Output score | Output file quality score, required. |
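To show what changing the quality score involves, here is a Python sketch of the conversion on a single quality string. It is an illustration under stated assumptions, not the script NGSFC ships. The numeric codes 0 to 3 are the ones listed above. The offsets follow the usual conventions: Sanger is Phred+33, Solexa is Solexa-scaled with offset 64, and Illumina 1.3+ and 1.5+ are Phred+64. The sketch does not treat the special 'B' quality of Illumina 1.5+ and does not write Solexa-scaled output.

```python
import math

# code -> (ASCII offset, whether scores use the Solexa scale)
ENCODINGS = {
    0: (33, False),  # Sanger
    1: (64, True),   # Solexa
    2: (64, False),  # Illumina 1.3-1.4
    3: (64, False),  # Illumina 1.5+
}


def solexa_to_phred(q: int) -> float:
    """Convert a Solexa-scaled score to the Phred scale."""
    return 10 * math.log10(10 ** (q / 10) + 1)


def convert_quality(qual: str, in_code: int, out_code: int) -> str:
    """Re-encode one FASTQ quality string from in_code to out_code."""
    in_offset, in_solexa = ENCODINGS[in_code]
    out_offset, out_solexa = ENCODINGS[out_code]
    if out_solexa:
        raise ValueError("this sketch does not write Solexa-scaled scores")
    out = []
    for ch in qual:
        score = ord(ch) - in_offset
        if in_solexa:
            score = round(solexa_to_phred(score))
        out.append(chr(score + out_offset))
    return "".join(out)
```

For instance, convert_quality("hhhh", 2, 0) re-encodes four Illumina 1.3 scores of 40 as the Sanger string "IIII".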
-fq2sm
FASTQ to SAM. For more information, please refer to Picard (http://picard.sourceforge.net/command-line-overview.shtml#FastqToSam).
| Arguments | Description |
| FASTQ2 | Input fastq file (optionally gzipped) for the second read of paired end data. Default value: null. |
| Quality format | Quality score format: sanger, solexa, Illumina 1.3+ or Illumina 1.5+. Default value: sanger. |
| Read group name | Read group name. Default value: A. |
| Sample name | Sample name to insert into the read group header. Default value: SampleName. |
| Library name | The library name to place into the LB attribute in the read group header. Default value: null. |
| Platform unit | The platform unit (often run_barcode.lane) to insert into the read group header. Default value: null. |
| Platform | The platform type (e.g. illumina, solid) to insert into the read group header. Default value: null. |
| Sequence center | The sequencing center from which the data originated. Default value: null. |
| Predicted insert size | Predicted median insert size, to insert into the read group header. Default value: null. |
| Description | Inserted into the read group header. Default value: null. |
-qseq2fq
Qseq to FASTQ.
-qseq2fa
Qseq to FASTA.
-sc2fq
SCARF to FASTQ.
-sc2fa
SCARF to FASTA.
-2bit2fa
2bit to FASTA.
| Arguments | Description |
| In one file | Put all the sequences in one FASTA file or in separate files. Default value: false. |
-sm2bm
SAM to BAM.
-sm2fa
SAM to FASTA.
| Arguments | Description |
| Second end FASTA | Output fasta file (if paired, second end of the pair fasta). Default value: null. Cannot be used in conjunction with option(s) OUTPUT_PER_RG (OPRG). |
| Output per read group | Output a fastq file per read group (two fastq files per read group if the group is paired). Default value: false. Possible values: {true, false}. Cannot be used in conjunction with option(s) SECOND_END_FASTQ (F2) FASTQ (F). |
| Re-reverser | Re-reverse bases and qualities of reads with negative strand flag set before writing them to fastq. Default value: true. Possible values: {true, false}. |
| Include non-PF reads | Include non-PF reads from the SAM file into the output FASTQ files. Default value: false. Possible values: {true, false}. |
| Clipping attribute | The attribute that stores the position at which the SAM record should be clipped. Default value: null. |
| Clipping action | The action that should be taken with clipped reads: 'X' means the reads and qualities should be trimmed at the clipped position; 'N' means the bases should be changed to Ns in the clipped region; and any integer means that the base qualities should be set to that value in the clipped region. Default value: null. |
| Read1 trim | The number of bases to trim from the beginning of read 1. Default value: 0. This option can be set to 'null' to clear the default value. |
| Read1 max bases to write | The maximum number of bases to write from read 1 after trimming. If there are fewer than this many bases left after trimming, all will be written. If this value is null then all bases left after trimming will be written. Default value: null. |
| Read2 trim | The number of bases to trim from the beginning of read 2. Default value: 0. This option can be set to 'null' to clear the default value. |
| Read2 max bases to write | The maximum number of bases to write from read 2 after trimming. If there are fewer than this many bases left after trimming, all will be written. If this value is null then all bases left after trimming will be written. Default value: null. |
-sm2fq
SAM to FASTQ. For more information, please refer to Picard (http://picard.sourceforge.net/command-line-overview.shtml#SamToFastq).
The arguments are the same as for -sm2fa.
-bm2sm
BAM to SAM.
-bm2fa
BAM to FASTA. The arguments are the same as for -sm2fa.
-bm2fq
BAM to FASTQ. The arguments are the same as for -sm2fq.
-psl2sm
PSL to SAM.
| Arguments | Description |
| Match | The score for a match. Default value: 1. |
| Mismatch | The score for a mismatch. Default value: 2. |
| Open | The score for gap open. Default value: 5. |
| Extension | The score for gap extension. Default value: 2. |
-bw2sm
Bowtie to SAM.
-bd2bg
BED to BedGraph.
-wg2bd
Wig to BED.
-wg2bg
Wig to BedGraph.
The output name follows the rules below:
The output file name is the same as the input file name, with the suffix of the output format.
If a file with that name already exists, "_1" is added to the end of the output name. The number increases as long as the name still matches an existing file.
As a result, the output file never overwrites an existing file.
These rules apply only to the built-in scripts.
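The naming rules can be sketched in Python as follows. This is an illustration, not NGSFC's own code. The function name output_path is hypothetical, and placing the counter before the format suffix (reads_1.fa rather than reads.fa_1) is an assumption, because the rules do not say exactly where "_1" goes.

```python
import os


def output_path(input_file: str, output_dir: str, suffix: str) -> str:
    """Pick an output path that does not overwrite an existing file."""
    stem = os.path.splitext(os.path.basename(input_file))[0]
    candidate = os.path.join(output_dir, stem + suffix)
    n = 1
    # Try stem_1, stem_2, ... until a free name is found.
    while os.path.exists(candidate):
        candidate = os.path.join(output_dir, f"{stem}_{n}{suffix}")
        n += 1
    return candidate
```

Calling output_path("reads.fq", "outputs", ".fa") gives outputs/reads.fa on the first run, and a numbered variant once that file exists.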