VirAmp Documentation
Release 1.0
Yinan Wan
February 13, 2015

Contents

1 Setting up an EC2 instance
  1.1 Step-1: Choosing the instance
  1.2 Step-2: Review Instance type
  1.3 Step-3: Launch the Instance
  1.4 Step-4: Create Key-pairs
2 Log in to your VirAmp instance and start the server
  2.1 Start exploring the VirAmp platform
  2.2 (optional) Log in to the new instance
  2.3 (optional) FTP configuration for large dataset uploading
3 VirAmp assembly pipeline manual
  3.1 One-click pipeline
  3.2 Step-by-Step Process
4 Post-Assembly Process: Assessment and Variation Analysis
  4.1 QUAST REPORT
  4.2 Assembly-Reference Alignment
  4.3 Circos graph visualization
  4.4 SNP analysis
  4.5 Repeat and tandem repeat analysis
  4.6 BWA aligner
VirAmp is a Galaxy-based system for fast virus genome assembly and variation discovery.

The following figure gives an overview of how the VirAmp platform works:

[Figure: overview of the VirAmp platform]

For VirAmp platform installation and usage on the cloud, follow the chapters below.

CHAPTER 1 Setting up an EC2 instance

Go to http://aws.amazon.com/ in a web browser. Select ‘My Account/Console’ at the top right if you already have an account; otherwise, sign up for a new account.

Go to the ‘AWS Management Console’ option and click ‘EC2’ at the upper left.

Before importing the AMI, make sure you are in the right region. Amazon EC2 is hosted in multiple locations worldwide, with multiple Availability Zones in each region. Resources are not replicated across regions unless you explicitly do so. Our AMI is stored in the region “US East (N. Virginia)”. Check the upper right corner next to your account name and make sure it is set to this region. If not, click it and select the right region from the dropdown menu.

Click the blue ‘Launch Instance’ button in the middle of the page.

1.1 Step-1: Choosing the instance

Click the Community AMIs tab at the mid-left and search for “viramp”.

1.2 Step-2: Review Instance type

Choose a suitable instance type. For trials, you can choose the free tier. For serious use, we advise selecting at least m3.large (the third option).

1.3 Step-3: Launch the Instance

1.4 Step-4: Create Key-pairs

Congratulations, you have successfully launched your own copy of the instance. To log in and start VirAmp, go to Chapter 2, “Log in to your VirAmp instance and start the server”.

CHAPTER 2 Log in to your VirAmp instance and start the server

At this point you have your own VirAmp instance. What's next?
2.1 Start exploring the VirAmp platform

Open VirAmp in a web browser by entering public_IP:8080, where public_IP is the IP address assigned to your instance. For example, the demo is at viramp.com:8080. By default, the server is open to the public on port 8080.

2.2 (optional) Log in to the new instance

Alternatively, experienced users can log in to the instance and modify the system to fit their specific requirements. The EC2 console provides instructions and an overview of the basic steps and parameters you need to log in to the instance.

Click the “Connect” button to view the information you need to log in to the backend of the system.

Start your terminal and restrict the permissions of your key file:

    chmod 400 inst-demo.pem

Connect to your instance using its public IP. The Connect dialog shows the exact user name and address:

    ssh -i inst-demo.pem <user>@<public_IP>

Change to the Galaxy directory:

    cd /mnt/galaxy/galaxy-dist/

Change the VirAmp settings:

    vi universe_wsgi.ini

Start the VirAmp server:

    sh run.sh

2.3 (optional) FTP configuration for large dataset uploading

Galaxy's generic upload function cannot properly handle files larger than 2 GB, so use FTP to upload such data instead. ProFTPD is preinstalled on the instance and most of its configuration is already done. You may still need to log in to the instance to make some changes:

• Log in to the instance following the instructions in the section above.
• Change to the Galaxy home directory: cd /mnt/galaxy/galaxy-dist
• Edit the config file (universe_wsgi.ini) and set the ftp_upload_site parameter to the IP address of the instance.
• The FTP configuration file is located in /usr/local/etc. In general, it is already configured to fit the system; only experienced users should modify it for further adjustments.

For more information about general FTP configuration on Galaxy, please visit the Galaxy wiki.
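Editing ftp_upload_site by hand in vi is all that 2.3 requires. If you would rather script the change, the following Python sketch sets the parameter. It makes several assumptions:

• ftp_upload_site lives in the [app:main] section of universe_wsgi.ini, as in Galaxy's sample configuration.
• The file parses cleanly with Python's configparser.
• The function name set_ftp_upload_site is hypothetical; it is not part of VirAmp or Galaxy.

```python
import configparser


def set_ftp_upload_site(ini_path: str, site: str) -> None:
    """Set ftp_upload_site in the [app:main] section of a Galaxy ini file.

    Assumes the file parses with configparser. Rewriting the file this way
    drops all comments, so keep a backup of the original first.
    """
    # interpolation=None leaves Galaxy-style values such as %(here)s untouched
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    if not parser.read(ini_path):
        raise FileNotFoundError(ini_path)
    if not parser.has_section("app:main"):
        raise KeyError("no [app:main] section in " + ini_path)
    parser.set("app:main", "ftp_upload_site", site)
    with open(ini_path, "w") as fh:
        parser.write(fh)
```

For example, set_ftp_upload_site("/mnt/galaxy/galaxy-dist/universe_wsgi.ini", "203.0.113.10") would set the parameter. Here 203.0.113.10 is a placeholder documentation address; use your instance's public IP instead. Restart the server with sh run.sh afterwards so Galaxy picks up the change.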
CHAPTER 3 VirAmp assembly pipeline manual

This chapter gives a general description of the function of each tool, whether used through the VirAmp website or in your own copy of the platform. A detailed description is posted on the web page of each tool.

3.1 One-click pipeline

Two general one-click pipelines are provided, one for paired-end data and one for single-end data. Users only need to submit the raw read files and a reference file. Besides running with the default settings, an advanced setting lets users configure the pipeline.

3.2 Step-by-Step Process

Next we introduce each step of the process individually.

3.2.1 Quality Control

Trim out low-quality bases. The input file is the raw data in FASTQ format. You can choose either to trim low-quality bases or to trim a fixed length from the reads.

3.2.2 Diginorm

Reduce coverage and bias using digital normalization. This step reduces both coverage variation and bias in the sample.

3.2.3 de novo Contig assembly

Assemble the short reads into longer contigs. By default, the one-click pipeline uses Velvet; SPAdes and VICUNA are provided as alternatives.

3.2.4 Reference-based scaffolding

Assemble the contigs into longer super-contigs. This step is a modification of AMOScmp.

3.2.5 Reference-independent scaffolding

Extend and connect the super-contigs using SSPACE. At the end of this step, the pipeline produces a draft genome. This is a multi-FASTA file that usually contains 5 to 15 contigs, listed in the same order as the reference.
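The quality-control step (3.2.1) offers two trimming modes. The following minimal Python sketch illustrates both; it is not VirAmp's actual trimming code.

• Quality mode keeps the highest-scoring stretch of the read. Each base scores err_threshold - P(error), with P(error) = 10^(-Q/10). This is a Mott-style trimming approach, similar in spirit to what seqtk does.
• Fixed mode removes a set number of bases from each end.

The function names, the 0.05 default and the Phred+33 quality encoding are all assumptions.

```python
from typing import List, Tuple


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in qual]


def trim_by_quality(seq: str, qual: str, err_threshold: float = 0.05) -> Tuple[str, str]:
    """Keep the maximum-scoring stretch of the read (Mott-style trimming).

    Each base scores err_threshold - P(error), where P(error) = 10 ** (-Q / 10),
    so bases more reliable than the threshold add to the score and worse ones
    subtract from it.
    """
    best_sum, best_start, best_end = 0.0, 0, 0
    cur_sum, cur_start = 0.0, 0
    for i, q in enumerate(phred_scores(qual)):
        cur_sum += err_threshold - 10 ** (-q / 10)
        if cur_sum < 0:
            # a negative running score can never help; restart after this base
            cur_sum, cur_start = 0.0, i + 1
        elif cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i + 1
    return seq[best_start:best_end], qual[best_start:best_end]


def trim_fixed(seq: str, qual: str, left: int = 0, right: int = 0) -> Tuple[str, str]:
    """Unconditionally remove `left` bases from the 5' end and `right` from the 3' end."""
    end = max(0, len(seq) - right)
    return seq[left:end], qual[left:end]
```

For example, trim_by_quality("ACGTACGT", "IIIIII##") keeps "ACGTAC", because the two trailing "#" bases (Q2) are far worse than the threshold.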
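Digital normalization (3.2.2) discards reads from regions that are already well covered. It estimates each read's coverage as the median abundance of its k-mers among the reads kept so far.

The following Python sketch shows that idea under some simplifying assumptions:

• It uses an exact Counter, whereas khmer's diginorm implementation relies on a memory-efficient probabilistic counting structure.
• It ignores reverse complements.
• The function name and the k=20, cutoff=20 defaults are illustrative assumptions.

```python
from collections import Counter
from statistics import median
from typing import Iterable, List


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping k-mers of seq (empty if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalize(reads: Iterable[str], k: int = 20, cutoff: int = 20) -> List[str]:
    """Keep a read only if the median count of its k-mers, among reads kept so far, is below cutoff."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # reads shorter than k carry no k-mers and are dropped
        # median k-mer abundance serves as an estimate of the read's coverage
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only retained reads add to the counts
    return kept
```

Ten identical reads with cutoff=3 shrink to three copies, while a read from a new, uncovered region is always kept. This is how the step lowers redundant coverage without losing rare sequence.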
3.2.6 Gap closing

This step connects all the contigs in the multi-FASTA file from the previous step into one linear genome. A single sequence is convenient for downstream functional analysis, especially for non-computational biologists. However, this step is optional. We highly recommend running it only after assessing the draft genome, because the gaps between contigs may stem from misassembly, sequencing problems, genome features, etc.

CHAPTER 4 Post-Assembly Process: Assessment and Variation Analysis

VirAmp not only provides all the processes related to assembly. The platform also integrates multiple tools for post-assembly processing, including quality assessment and variation analysis.

4.1 QUAST REPORT

It is important to evaluate how robust the new assembly is before feeding it into downstream functional analysis. VirAmp first provides a report of common assembly evaluation metrics, computed by comparing the assembly with the reference. The input is the reference genome and the new assembly; the primary output is a summary of common assembly evaluation metrics.

Alternatively, the full QUAST report can be downloaded for more details. Unzip it and open the report from a local folder.

[Figure: a demonstration of the QUAST plot]

4.2 Assembly-Reference Alignment

VirAmp reports the differences between the reference and the new assembly based on the MUMmer alignment. Coordinates and percentage identity are provided for each aligned region between the two sequences.
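Gap closing (3.2.6) turns the ordered multi-FASTA draft into a single linear sequence. The following Python sketch shows only the simplest form of that join: contigs are concatenated in their given order, separated by runs of N.

This is an assumption made for illustration. VirAmp's gap-closing tool may fill or size gaps differently. The 100-base spacer, the 60-column line width and both function names are hypothetical.

```python
from typing import List, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse multi-FASTA text into (header, sequence) pairs, preserving order."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def join_contigs(contigs: List[Tuple[str, str]], gap_len: int = 100,
                 name: str = "draft_genome", width: int = 60) -> str:
    """Concatenate contigs in their given order, separated by runs of N."""
    seq = ("N" * gap_len).join(s for _, s in contigs)
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"
```

Because the N runs mark where the gaps were, the joined genome still shows which regions are uncertain. This matters because, as noted above, a gap may reflect misassembly rather than missing sequence.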
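N50 is among the most commonly quoted assembly metrics in a QUAST summary. When a reference is supplied, NG50 is reported as well. The following Python sketch shows how both are defined.

This is a re-derivation of the standard definitions, not QUAST's code, and the function name nx is hypothetical:

• Nx is the length L such that contigs of length L or longer cover at least x% of the total assembly length.
• NGx measures the same coverage against the reference genome size instead.

```python
from typing import Optional, Sequence


def nx(lengths: Sequence[int], x: float = 50.0, genome_size: Optional[int] = None) -> Optional[int]:
    """Return the Nx (or NGx, if genome_size is given) of a set of contig lengths.

    Returns None if the contigs never reach x% of the total, which can happen
    for NGx when the assembly is shorter than the reference.
    """
    total = genome_size if genome_size is not None else sum(lengths)
    target = total * x / 100.0
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None
```

For contigs of lengths 100, 80, 50, 30 and 20 (280 bases in total), the two longest contigs already exceed 140 bases, so N50 is 80. With a 400-base reference, NG50 drops to 50.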
This helps users identify large INDELs as well as other complex structural variations. Table 1 demonstrates an example of the comparison report.

4.3 Circos graph visualization

To help users further understand the information provided by the assembly-reference alignment, a visualization is provided. Circos projects the assembled draft genome onto the aligned parts of the reference, creating a straightforward visualization of large structural variations.

4.4 SNP analysis

Based on the alignment between the assembly and the reference, SNP information is also provided in VCF format.

4.5 Repeat and Tandem repeat analysis

By aligning the assembly against itself, VirAmp also provides repeat information. The starting coordinates and length of each repeat are reported based on the alignment.

4.6 BWA aligner

Besides the specific tools listed above, general tools such as BWA are also provided for users to build their own analyses.

To install VirAmp on your local machine:

• Download Galaxy and follow its installation instructions.
• The Script/vamp directory contains all the scripts and Galaxy tool config files; place the folder under galaxy-dist/tools.
• Place the tool_config.xml from config under ‘galaxy-dist’.
• Configure ProFTPD as in ‘config/proftpd.conf’.
• To get everything running, you will need the following software installed:
  – seqtk
  – diginorm
  – velvet
  – AMOS
  – Quast
  – MUMmer
  – Circos
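SNP analysis (4.4) reports assembly-reference differences as VCF records; which tool VirAmp uses to derive them is not described here. The following Python sketch shows the shape of a VCF record. It makes several simplifying assumptions:

• It takes a single gap-free aligned segment whose 1-based reference start is known.
• It reports only single-base substitutions.
• It skips ambiguous bases.
• The function name is hypothetical.

For real MUMmer alignments, the show-snps utility is the usual starting point.

```python
from typing import List


def snps_to_vcf(chrom: str, ref_seg: str, qry_seg: str, ref_start: int = 1) -> List[str]:
    """Emit VCF lines for single-base mismatches in a gap-free aligned segment.

    ref_start is the 1-based reference coordinate of ref_seg[0]; positions with
    an ambiguous base (anything other than A/C/G/T) on either side are skipped.
    """
    if len(ref_seg) != len(qry_seg):
        raise ValueError("aligned segments must have equal length (no gaps)")
    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for offset, (r, q) in enumerate(zip(ref_seg.upper(), qry_seg.upper())):
        if r != q and r in "ACGT" and q in "ACGT":
            # columns: CHROM POS ID REF ALT QUAL FILTER INFO
            lines.append("\t".join([chrom, str(ref_start + offset), ".", r, q, ".", "PASS", "."]))
    return lines
```

For an aligned reference segment "ACGTACGT" starting at position 101 and an assembly segment "ACCTACGA", the sketch emits two records: G>C at position 103 and T>A at position 108.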