VirAmp Documentation

Release 1.0
Yinan Wan
February 13, 2015
Contents
1 Setting up an EC2 instance
1.1 Step-1: Choosing the instance
1.2 Step-2: Review Instance type
1.3 Step-3: Launch the Instance
1.4 Step-4: Create Key-pairs
2 Login to your VirAmp instance and start the server
2.1 Start exploring the VirAmp platform
2.2 (optional) Log in to the new instance
2.3 (optional) FTP configuration for large dataset uploading
3 VirAmp assembly pipeline manual
3.1 One-click pipeline
3.2 Step-by-Step Process
4 Post-Assembly Process: Assessment and Variation Analysis
4.1 QUAST REPORT
4.2 Assembly-Reference Alignment
4.3 Circos graph visualization
4.4 SNP analysis
4.5 Repeat and Tandem repeat analysis
4.6 BWA aligner
VirAmp Documentation, Release 1.0
VirAmp is a Galaxy-based system for fast virus genome assembly and variation discovery.
The following is an overview of how the VirAmp platform works:
For VirAmp platform installation and usage on the cloud:
CHAPTER 1
Setting up an EC2 instance
Go to http://aws.amazon.com/ in a web browser.
Select ‘My Account/Console’ at the top right if you already have an account; otherwise sign up for a new account.
Go to the ‘AWS Management Console’ option and click ‘EC2’ at the upper left.
Before importing the AMI, make sure you are in the right region. Amazon EC2 is hosted in multiple regions
worldwide, each with multiple Availability Zones, and resources are not replicated across regions unless you
explicitly request it. Our AMI is stored in the region “US East (N. Virginia)”. Check the upper right corner next to
your account name and make sure it is set to that region. If not, click it and select the right one from the dropdown menu.
Click the blue ‘Launch Instance’ button in the middle of the page.
1.1 Step-1: Choosing the instance
Click the Community AMIs tab at mid-left and search for “viramp”.
1.2 Step-2: Review Instance type
Choose a suitable instance type. For trials you can choose the free tier, but for serious usage we advise selecting at
least m3.large (the third option).
1.3 Step-3: Launch the Instance
1.4 Step-4: Create Key-pairs
Congratulations, you have successfully launched your own instance. To log in and start the VirAmp
server, continue to Chapter 2, Login to your VirAmp instance and start the server.
CHAPTER 2
Login to your VirAmp instance and start the server
At this point you have your own VirAmp instance running. So what’s next?
2.1 Start exploring the VirAmp platform
Open VirAmp in a browser by entering public_IP:8080 (for example, the demo is at viramp.com:8080), where public_IP is
the IP address assigned to your instance. By default the server is open to the public on port 8080.
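Before opening the browser, it can help to confirm that the server answers on that port. The following is a minimal sketch, not part of VirAmp itself: the function names are hypothetical, and it assumes curl is installed on your local machine.

```shell
#!/bin/sh
# Build the VirAmp URL for an instance.
# Arguments: $1 = public IP or host name, $2 = port (defaults to 8080).
viramp_url() {
    printf 'http://%s:%s\n' "$1" "${2:-8080}"
}

# Return success if the server answers at that URL.
# Assumes curl is available; --max-time keeps the check from hanging.
viramp_is_up() {
    curl --silent --fail --max-time 10 --output /dev/null "$(viramp_url "$@")"
}
```

For example, `viramp_is_up 203.0.113.10 && echo "VirAmp is reachable"`, where 203.0.113.10 is a placeholder for your own public IP.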
2.2 (optional) Log in to the new instance
Alternatively, experienced users can modify the system to suit their specific requirements.
The console provides instructions and an overview of the basic steps and parameters you need to log in to the
instance.
Hit the “Connect” button to view the information you need to log in to the backend of the system.
Start your terminal and type the following command:
chmod 400 inst-demo.pem
Connect to your instance using your public IP (the exact user name and address are shown in the “Connect” dialog):
ssh -i inst-demo.pem <user>@<public_IP>
Change to the galaxy directory:
cd /mnt/galaxy/galaxy-dist/
Change viramp settings:
vi universe_wsgi.ini
Start the viramp server:
sh run.sh
2.3 (optional) FTP configuration for large dataset uploading
Galaxy’s generic upload function cannot properly handle files larger than 2GB. Use FTP to upload such data instead.
ProFTPd is preinstalled in the instance and most of the configuration is already done, but users may still need
to log in to the instance to make a few changes.
• Log in to the instance following the instructions in the section above.
• Change to the Galaxy home directory: cd /mnt/galaxy/galaxy-dist
• Edit the config file (universe_wsgi.ini) and change the ftp_upload_site parameter to the IP address of the instance.
• The FTP configuration file is located at /usr/local/etc. In general, it is already configured to fit the system. Only
experienced users may want to modify it for further adjustment.
For more information about general FTP configuration on Galaxy, please visit the Galaxy wiki.
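The edit to ftp_upload_site can also be scripted rather than made by hand in an editor. The sketch below is one possible way to do it. The function name is hypothetical, and it assumes the config file already contains an ftp_upload_site line, possibly commented out with a leading '#'.

```shell
#!/bin/sh
# Set ftp_upload_site in a Galaxy config file.
# Arguments: $1 = path to universe_wsgi.ini, $2 = IP address of the instance.
# Assumption: the file already has an ftp_upload_site line (optionally
# commented out with '#'); every other line is left untouched.
set_ftp_upload_site() {
    ini=$1
    ip=$2
    tmp="$ini.tmp.$$"
    # Write the edited copy to a temporary file, then copy it back so the
    # original file keeps its permissions.
    sed "s|^#\{0,1\}ftp_upload_site[[:space:]]*=.*|ftp_upload_site = $ip|" "$ini" > "$tmp" &&
        cat "$tmp" > "$ini" &&
        rm -f "$tmp"
}
```

For example, from the Galaxy home directory: `set_ftp_upload_site universe_wsgi.ini 203.0.113.10`, with the placeholder address replaced by the IP of your instance. Keeping a backup copy of the file first is a sensible precaution.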
CHAPTER 3
VirAmp assembly pipeline manual
This chapter gives a general description of each tool available through the VirAmp website or your own installation of
the platform. A detailed description is posted on each tool’s web page.
3.1 One-click pipeline
Two general pipelines are provided as one-click options, one for paired-end data and one for single-end data. Users only
need to submit the raw read files and a reference file. Besides the default settings, advanced settings are
provided so that users can configure the pipeline.
3.2 Step-by-Step Process
Next we introduce each step of the process individually.
3.2.1 Quality Control
Trim out low-quality bases. The input file is the raw data in FASTQ format. You can either trim low-quality bases
or trim a fixed length from every read.
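To make the fixed-length option concrete, here is a minimal sketch that removes a set number of bases from the 3' end of every read. It is an illustration only, not the code VirAmp runs: the function name is hypothetical, and it assumes a plain four-line-per-record FASTQ file.

```shell
#!/bin/sh
# Remove a fixed number of bases from the 3' end of every read in a FASTQ file.
# Arguments: $1 = FASTQ file (four lines per record), $2 = number of bases to remove.
# Output: trimmed FASTQ on standard output.
trim_fixed() {
    awk -v n="$2" '
        # Lines 2 and 4 of each record hold the bases and their quality scores;
        # both are shortened by the same amount so they stay the same length.
        NR % 4 == 2 || NR % 4 == 0 {
            keep = length($0) - n
            if (keep < 0) keep = 0
            print substr($0, 1, keep)
            next
        }
        { print }
    ' "$1"
}
```

For example, `trim_fixed reads.fastq 5 > reads.trimmed.fastq`. seqtk, which is listed among the dependencies for a local installation, offers a comparable option (`seqtk trimfq -e 5 reads.fastq`); whether the VirAmp tool calls it this way is an assumption.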
3.2.2 Diginorm
Reduce coverage and bias using digital normalization. This step reduces sample variation as well as sample bias.
3.2.3 de novo Contig assembly
Assemble the short reads into longer contigs. By default the One-click pipeline uses Velvet; two alternatives, SPAdes
and VICUNA, are provided.
3.2.4 Reference-based scaffolding
Assemble the contigs into longer super-contigs. This step is a modification of AMOScmp.
3.2.5 Reference-independent scaffolding
Super-contig extension and connection. This step uses SSPACE. At the end of this step, the pipeline produces a
draft genome: a multi-FASTA file that usually contains 5~15 contigs, listed in the same order as in the reference.
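A quick way to check the draft genome is to count its contigs and list their lengths. The sketch below assumes the draft has been downloaded as an uncompressed FASTA file; the function names are hypothetical.

```shell
#!/bin/sh
# Print one line per sequence in a FASTA file: name, a tab, and length in bases.
# Argument: $1 = (multi-)FASTA file; sequences may span several lines.
contig_lengths() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len
            name = substr($1, 2)   # header up to the first space, without ">"
            len = 0
            next
        }
        { len += length($0) }
        END { if (name != "") print name "\t" len }
    ' "$1"
}

# Number of sequences in the file.
count_contigs() {
    grep -c '^>' "$1"
}
```

For example, `count_contigs draft.fasta` should report a number in the range mentioned above for a typical run.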
3.2.6 Gap closing
This step connects all the contigs in the multi-FASTA file from the previous step into one linear genome, for the
convenience of downstream functional analysis, especially for non-computational biologists. The step is optional, and
we highly recommend running it only after the draft genome has been fully assessed, because the gaps between contigs
could come from misassembly, sequencing, genome features, etc.
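The idea of joining contigs into one record can be illustrated with a short sketch. How VirAmp actually fills the gaps is not described here, so the run of N characters used as a spacer below is purely an assumption made for illustration, as is the function name.

```shell
#!/bin/sh
# Join every sequence of a multi-FASTA file into a single record.
# Arguments: $1 = multi-FASTA file, $2 = record name, $3 = spacer length (default 100).
# The run of N between contigs is a placeholder chosen for this sketch.
join_contigs() {
    awk -v name="$2" -v gap="${3:-100}" '
        BEGIN {
            spacer = ""
            for (i = 0; i < gap; i++) spacer = spacer "N"
            print ">" name
        }
        /^>/ {
            # Emit the spacer before every contig except the first.
            if (seen++) printf "%s", spacer
            next
        }
        { printf "%s", $0 }
        END { print "" }
    ' "$1"
}
```

For example, `join_contigs draft.fasta linear_genome 100 > linear.fasta`. The contigs are joined in file order, which matters because the draft lists them in reference order.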
CHAPTER 4
Post-Assembly Process: Assessment and Variation Analysis
VirAmp not only provides all the processes related to assembly; the platform also integrates multiple tools for post-assembly processing, including quality assessment and variation analysis.
4.1 QUAST REPORT
It is important to evaluate how robust the new assembly is before feeding it into downstream functional analysis.
VirAmp first provides a report of common assembly evaluation metrics based on comparison with the reference. A
detailed QUAST report can be downloaded.
The input is the reference genome and the new assembly.
The primary output is a summary of common assembly evaluation metrics.
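N50 is one of the common assembly metrics of this kind. As an illustration of what such a number means, the sketch below computes it directly from a FASTA file. It is a stand-alone example with a hypothetical function name, not the QUAST implementation.

```shell
#!/bin/sh
# Print the N50 of a FASTA file: the length L such that sequences of length >= L
# together cover at least half of all assembled bases.
# Argument: $1 = (multi-)FASTA file.
n50() {
    # First awk prints one length per sequence; sort orders them longest first;
    # second awk walks down the list until half of the total is covered.
    awk '
        /^>/ { if (len > 0) print len; len = 0; next }
        { len += length($0) }
        END { if (len > 0) print len }
    ' "$1" | sort -rn | awk '
        { lens[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run * 2 >= total) { print lens[i]; exit }
            }
        }
    '
}
```

For contigs of lengths 10, 5, 4, 3 and 2 (24 bases in total), the two longest contigs already cover 15 bases, so the N50 is 5.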
Alternatively, the full QUAST report can be downloaded for more details.
Unzip the report and open it in a local folder.
A demonstration of a QUAST plot
4.2 Assembly-Reference Alignment
VirAmp reports the differences between the reference and the new assembly based on the MUMmer alignment. Coordinates and percentage identity are provided for each aligned region between the two sequences. This helps
users identify large INDELs as well as other complex structures and variations. Table 1 demonstrates an example of
the comparison report.
4.3 Circos graph visualization
To help users further understand the information provided above (Assembly-Reference Alignment), a visualization
is provided. Circos projects the assembled draft genome onto the aligned part of the reference, creating a straightforward
visualization of large structural variation.
4.4 SNP analysis
With the alignment between assembly and reference, SNP information is also provided in VCF format.
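Once the VCF file is downloaded, a quick tally of the reported SNPs can serve as a sanity check. The sketch below assumes an uncompressed, tab-separated VCF file; the function name is hypothetical and the definition of a SNP it uses is deliberately simple.

```shell
#!/bin/sh
# Count single-nucleotide records in a VCF file.
# Argument: $1 = uncompressed VCF file.
# A record counts as a SNP here when REF (column 4) and ALT (column 5) are each
# one base; multi-allelic and indel records are skipped in this simple sketch.
count_snps() {
    awk -F '\t' '
        /^#/ { next }                                  # header lines
        length($4) == 1 && length($5) == 1 && $5 != "." { n++ }
        END { print n + 0 }
    ' "$1"
}
```

For example, `count_snps assembly_vs_reference.vcf`, where the file name is a placeholder for your own download.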
4.5 Repeat and Tandem repeat analysis
By aligning the assembly against itself, VirAmp also provides repeat information. Starting coordinates and lengths are
provided based on the alignment.
4.6 BWA aligner
Besides the specific tools listed above, general tools such as BWA are also provided for users’ own custom analyses.
To install VirAmp on your local machine:
• Download Galaxy and follow its installation instructions.
• The Script/vamp directory contains all the scripts and Galaxy tool config files; place the folder under galaxy-dist/tools.
• Place the tool_config.xml in config under ‘galaxy-dist’.
• Configure ProFTPd as in ‘config/proftpd.conf’.
• To get everything running, you will need the following software installed:
– seqtk
– diginorm
– velvet
– AMOS
– Quast
– MUMmer
– Circos
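A small helper can report which of these programs are not yet on your PATH. The sketch below is generic: the function name is hypothetical, and the executable names in the usage example are assumptions about how each package is typically installed, so check them against your own system.

```shell
#!/bin/sh
# Print every command from the argument list that is not found on PATH.
# Returns success only when nothing is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return $rc
}
```

For example, `missing_tools seqtk velveth velvetg nucmer circos` prints nothing when all five commands are available. Some packages in the list, such as diginorm and AMOS, provide several scripts rather than one command, so their names need to be looked up after installation.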