BARCODE OF LIFE DATA SYSTEMS
BOLDSYSTEMS.org
Handbook
October 2008

Table of Contents

1. Introduction
2. BOLD General System Map
3. Signing up for BOLD
4. Taxonomy Browser
5. BOLD Search
6. Create a BOLD Project
7. Submission Protocols
   a) Data Submission
   b) Image Submission
   c) Trace Submission
   d) Sequence Submission
8. BOLD Project Summary

1. Introduction

The Barcode of Life Data System (BOLD) is an informatics workbench aiding the acquisition, storage, analysis and publication of DNA barcode records. By assembling molecular, morphological and distributional data, it bridges a traditional bioinformatics chasm. BOLD is freely available to any researcher with interests in DNA barcoding. By providing specialized services, it aids the assembly of records that meet the standards needed to gain BARCODE designation in the global sequence databases. Because of its web-based delivery and flexible data security model, it is also well positioned to support projects that involve broad research alliances.

This handbook provides details on how to sign up for BOLD and create a project. It also explains how to upload specimen data, images, traces and sequences to your project on BOLD.

Figure 1-1: The front page of BOLD.

2. BOLD General System Map

[Diagram: BOLD general system map, entered from www.barcodinglife.org. The layout is not recoverable; the labels are grouped here by area.]

Public site: Request an account; Tax Browser (Browse Hierarchy, Species Level, Species Page, Species Barcoded Report, Download Sequences); BOLD Taxon Search; Identification (COI Database, ITS Database); Documentation (BOLD Tutorial, BOLD Handbook, Manuals, Templates); Published Projects*.

Private Projects* (log-in): Create New Project; Search All Records; Project Summary; Uploads (Specimen Data, Images, Traces, Sequences); Primers (View All Primers, Register Primers); Publication (Submit to GenBank).

Project Management from Project Console or Record Listing Page: Sequence Analysis (Taxon ID Tree, Nearest Neighbour, Distance Summary, Distribution Map, Spec Age vs Seq Length, Sequence Composition, Image Library); Downloads (Sequences, Traces, Data Spreadsheets, Specimen Labels); Specimen Aggregates.

Legend: Manual Input, Document, Action, Viewable Data, Downloadable Data.

* The published projects are also accessible when a user is signed in to the private projects workspace.
Some functions are marked in the diagram as available only from the private project console.

3. Signing up for BOLD

Getting an account on BOLD allows you to upload your data into a private workspace and take advantage of the integrated analytical tools.

On the BOLD main page (www.boldsystems.org), click on either of the two links: ‘Request an Account’ or ‘Request a new user account’. Either link will take you to the New User Application Form (http://www.boldsystems.org/views/newuserapp.php).

Click on ‘Submit Request’ to send your application to BOLD. An introductory e-mail will be sent to you with the information you need to log in and begin using BOLD.
Once you have an account, you can log in via the main page to access your private workspace. Your next step will be to create a project. Please see section 6 (Creating a new project) for instructions.

The application form asks for the following information:

Valid Email Address: Use a current institutional email.
First Name: Fill in your first name; the first letter should be capitalized.
Middle Initial: Fill in middle initial(s) if needed, capitalized.
Last Name: Fill in your last name; the first letter should be capitalized.
Institutional Affiliation: Select the name of your institution.
Add New Institution: If your institution is not listed, click on this button to register it.
Password: Should be at least 5 characters.

Table 3-1: Information required to create a new user account on BOLD.

Figure 3-1: New user account creation on BOLD.

4. BOLD Taxonomy Browser

The taxonomy browser allows users to examine the progress of DNA barcoding and to browse different levels of the taxonomic hierarchy. Animals, Plants, Fungi, and Protists are being barcoded, and the user can browse through each kingdom from phylum down to the species level.

Figure 4-1: The BOLD taxonomy browser.

Lineage: Lists the higher taxonomic levels.
Specimen Records: The number of specimen records.
Specimens with Barcodes: The number of barcoded specimens.
Public Sequences: The number of public sequences and a link to download them.
List of Species Barcoded: A list of all species with records on BOLD. The number of specimens, the number of sequences and the number of sequences greater than 500 bp are listed.
Link Outs: Links to several community partners' pages for that specimen.
Lower Taxonomy: Links to all lower classifications.
Graphic Displays of:
» the total number of barcodes and reference barcodes.
» quantity of species barcodes and those used as reference barcodes.
» the institutions where the specimens are deposited.
» a map of the world highlighting specimen collection locations.
» a graph showing the frequency of specimens/barcodes against age.
» a list of countries where specimens were collected, including the number of specimens from each country.
» various images of specimens within that taxonomic group.

Table 4-1: Information available at each taxonomic level within the BOLD taxonomy browser.

5. BOLD Search

On the BOLD project list page, select the ‘Search All Records’ link on the top left hand side. There are two types of searches for BOLD: Basic Search and Advanced Search.

Advanced search fields:

Taxonomy: Searches the taxonomic names on BOLD. There are text fields for search terms that should be either included in or excluded from the search.
Geography Country/Province: Searches the country and province names on BOLD. There are text fields for search terms that should be either included in or excluded from the search.
Geography Region: Searches region names on BOLD. There is a text field for search terms that should be included in the search.
Sequence Length: Text fields for the minimum and the maximum number of base pairs.
Specimen/Sample ID: Searches sample IDs and process IDs on BOLD. There is also the option of pasting a list of sample or process IDs from a spreadsheet (link to the right).
Include GenBank Data: When checked, the search includes GenBank records on BOLD.
Single Representative per Species: When checked, the search will only display one representative per species found.

Basic search fields (drilldown menus):

Taxonomy: Searches the taxonomic names on BOLD.
Geography (Country/FAO and State/Province): Searches the country and province names on BOLD.
Note that any search criteria containing spaces, such as species names, country names that consist of more than one word, and sample IDs with spaces, should be wrapped in double quotes (e.g. “United States” or “Drosophila melanogaster”). The Paste from Spreadsheet function allows you to paste a column of sample IDs or process IDs from a spreadsheet and will automatically place quotes around search criteria that require them.

Table 5-1: Explanation of the terms used within the Basic BOLD search functions (Taxonomy, Geography).
Table 5-2: Explanation of the terms used within the Advanced BOLD search functions.

Figure 5-1: The BOLD search engine, showing both basic and advanced search functions.

6. Creating a new project

Once logged into BOLD, select the ‘Create New Project’ link on the top left hand side of the project list page. It will take you to the New Project Submission Form. The following pieces of information need to be entered in order to create the project:

Project Title: Please create a descriptive name.
Project Code: A 3-5 letter code. It needs to be unique across BOLD.
Project Type: Choose between the following options:
  • Data Project (contains specimen & sequence records)
  • Folder Project (contains other projects)
Primary Marker: Select your primary marker. COI is the default.
  • Cytochrome Oxidase Subunit 1 5’ Region
  • Interspacer (ITS) Region
Campaign: Select the name of the campaign the project is part of, or ‘None (General Project)’ if it is not part of a campaign.
Place in container: Select the name of the Folder Project, or ‘Independent Project’ if it does not belong in a folder project.
Project Description: A brief summary of the use and intention of the project.
Project Manager: The person who creates a project is automatically the project manager, and has full specimen and sequence access.
Assign Users: Other BOLD users can be added to a project. Different levels of access are possible:
  • Sequence Access: Analyze, View, and Edit Sequences
  • Specimen Access: Edit Specimens

Table 6-1: Required information for BOLD project creation.

Figure 6-1: The BOLD new project submission form.

Sequence Access permissions consist of three levels. With Analyze permission, the user can perform analysis on the data, but cannot view more than a summary of the data (sequence and related information remain hidden). With View permission, the user can view or download the sequence data. With Edit permission, the user can upload sequences or make changes to existing sequence features.

Specimen Access permission allows the user control over sample identifiers, taxonomy, collection data, and images of the specimen. This level is intended for project managers, collectors, and taxonomists only.

To submit your entries to BOLD, click ‘Save’ at the bottom of the form.

Please note that the person who creates a project is automatically the project manager of that project. The project manager has full access to the project and can assign other users to it. The project manager can change any details or add or remove users by clicking on ‘Modify Project Properties’ in the upper left corner of the project.

7a) Data Submission Protocol

This protocol assists in the submission of bulk data to BOLD. This is the easiest way to populate your project with records, as well as the only way to enter new species taxonomy into the BOLD library. The format required for a correct submission is described below.

Whenever a bulk submission is sent to the data manager ([email protected]), the following pieces of information need to be sent in the body of the email, with the standard submission spreadsheet attached:
I. Project title
II. Project code
III. Project manager
IV. Priority Level (High, Intermediate or Low)
V. Submission type (New Records or Update)*

* If type is ‘Update’: Please specify which worksheets (Voucher Info, Taxonomy, Specimen Details, or Collection Data) need to be updated. See ‘Data Submission - Types’ for more information.

The data spreadsheet consists of four worksheets: a main specimen identifier worksheet (Voucher Info) that is linked to three other worksheets (Taxonomy, Specimen Details, and Collection Data). Refer to Tables 7a-1 through 7a-4 for field definitions.

Sample ID *: ID associated with the sample being sequenced (often an extension of Field or Museum ID).
Field ID *: Specimen identifier from a private collection, or field number from a collection event.
Museum ID *: Catalog number in a curated collection for a vouchered specimen.
Collection Code: Code associated with the given collection.
Institution Storing *: Full name of the institution where the specimen is vouchered.
Sample Donor: Full name of the individual responsible for providing the specimen or tissue sample.
Donor E-mail: E-mail of the sample donor.

Table 7a-1: Field definitions for Voucher Info page on accompanying spreadsheet.

Sex: Male/female/hermaphrodite only.
Reproduction: Sexual/asexual/cyclic parthenogen only.
Life Stage: Adult/immature only.
Extra Info: User Specified Characteristics (free text). Can be displayed on a tree or used to sort records. Limited to a maximum of 50 characters. Designate FAO region here.
Notes: Free text or XML tagged text. All XML text should be surrounded by the XML start (<xml>) and stop (</xml>) tags.

Table 7a-3: Field definitions for Specimen Details page on accompanying spreadsheet.

Collectors: Comma delimited list of collectors.
Collection Date: Date of collection, must be in MM-DD-YYYY format.
Continent/Ocean: ISO Continents and Oceans.
Country: ISO Countries.
State/Province: States and provinces (according to Getty Geographical Thesaurus).
Region: Park, county, district, lake or river.
Sector: Sector of park or county/city.
Exact Site: Description of collection location.
GPS Coordinates: Latitude & Longitude in “degrees.decimal degrees” format (e.g. 45.837).
Elevation/Depth: Elevation or depth in meters.

Table 7a-4: Field definitions for Collection Data page on accompanying spreadsheet.

Full Taxonomy: Full taxonomy consisting of phylum*, class, order, family, subfamily (optional), genus, species binomial.
Identifier: Full name of the primary individual responsible for providing taxonomic identification of the specimen.
Identifier E-mail: E-mail address of the primary identifier.
Identifier Institution: Institution of the identifier.

Table 7a-2: Field definitions for Taxonomy page on accompanying spreadsheet.

* Minimum required fields for new records.

Data Submission - Examples

Here is an example of a properly filled in data submission. You can get the blank template in two ways:
• From the info CD that came along with your sampling units from the CCDB
• Online, by clicking on “Specimen Data” under the Uploads menu; the sheet is available through the link at the top

Use the tabs at the bottom of the workbook to navigate through the four pages.

All of the data in BOLD is organized by projects. There is a limit of 1000 entries for a given project, to keep the size manageable. Related projects can be grouped into containers. An individual entry in the database represents a barcode of a given specimen.

The Process ID uniquely represents a specimen in BOLD. This is the identifier that is used to track a specimen through the barcoding process: collection, taxonomic identification, sequencing, analysis and final publication of data.
Process ID is assigned internally when a specimen record is created.

Specimen data can be entered in one of two ways. As outlined here, for larger sets of samples, the data can be entered on the Data Submission Template spreadsheet and sent to BOLD. Data managers will review the data to ensure that it meets the minimum requirements, and input it to BOLD. For smaller numbers of entries (i.e. 1-10 records), users can enter sample data through the web interface by clicking on “Specimen Data” under the Uploads menu and using the manual interface there.

Specimen Info
Sample ID | Field ID | Museum voucher ID | Collection Code | Institution Storing | Sample Donor | Donor Email
Sample-demo01 | Sample-demo01 | | | BIO | Joe Smith | [email protected]
Sample-demo02 | Sample-demo02 | 15466-JUC-ISC | ISC | ROM | Joe Smith | [email protected]
Sample-demo03 | Sample-demo03 | | | BIO | Joe Smith | [email protected]

Figure 7a-1: Example data for Specimen Info

Taxonomy
Sample ID | Phylum | Class | Order | Family | Subfamily | Genus | Species | Identifier | Identifier Email | Identifier Institution
Sample-demo01 | Arthropoda | Insecta | Diptera | Asilidae | Hydropsychinae | Efferia | Efferia aestuans | Joe Smith | [email protected] | Oxford
Sample-demo02 | Arthropoda | Insecta | Diptera | Asilidae | | Asilus | | Joe Smith | [email protected] | Oxford
Sample-demo03 | Arthropoda | Insecta | Diptera | | | | | Joe Smith | [email protected] | Oxford

Figure 7a-2: Example data for Taxonomy

Specimen Details
Sample ID | Sex | Reproduction | Life Stage
Sample-demo01 | Female | Sexual | Adult
Sample-demo02 | Male | Sexual | Adult
Sample-demo03 | Male | Sexual | Adult
Extra Info and Notes entries shown in the example (row assignment not recoverable): “Commonly called ‘Robber Fly’”, “feeding on fruit”.

Figure 7a-3: Example data for Specimen Details

Collection Info
Sample ID | Collectors | Collection Date | Continent / Ocean | Country | State / Province | Region | Sector | Exact Site | Latitude | Longitude | Elevation
Sample-demo01 | Joe Smith | 27-Jul-07 | Asia | Japan | Hokkaido | | | Izarigawa, Eniwa | 42.878 | 141.572 | 45
Sample-demo02 | Joe Smith | 27-Jul-07 | | Japan | Hokkaido | Soya | | | 44.671 | 142.788 |
Sample-demo03 | Joe Smith | 5-Sept-07 | Central America | Costa Rica | Guanacaste | ACG | | Mundo Nuevo | 10.772 | -85.434 | 305

Figure 7a-4: Example data for Collection Info

Data Submission - Types

There are two types of submissions: “New Submission” and “Update”. A new submission is made every time new records are added to a project. Update submissions are for modifying records that already exist in a project.

If you wish to update only one or two records, please select the specimen manually from the species record listing in your project and click on the “edit” button in the upper right corner. Any details can be edited in this way, except for adding new taxonomy to BOLD. If there is new taxonomy to add to the BOLD library, it should be sent in as an update.

New Submission

New submissions are project specific, so that their data can be associated with a project on BOLD. If records are submitted that need to be entered into different projects on BOLD, a separate file for each project needs to be sent.

Update Submission

The quickest way to update data is to download the Data Spreadsheet from BOLD containing the records that need to be modified. To do so, click on “Data Spreadsheets” from the Downloads menu on the upper left side of your project. Only download the worksheets and records that will be affected by the update (e.g. if the taxonomy needs to be updated, only download the Taxonomy worksheet; if specimen details and collection data need to be updated, only download the Specimen Details and Collection Data worksheets). Once the worksheets are downloaded, modify the data and copy it into the standard submission spreadsheet. The submitted update should reflect what the data should be on BOLD. Please send it on to the data manager.
The minimal requirements for a new submission on BOLD are:
• Voucher Info Page - Sample ID
• Voucher Info Page - Field ID and/or Museum voucher ID
• Voucher Info Page - Institution Storing
• Taxonomy Page - Phylum

Other useful information:

It is important to use a unique and original format for the sample IDs. If the sample IDs provided are not original to BOLD, they will need to be changed before the data can go online.

Provide as much detail and additional information as possible with a new submission, so that less time is needed later to fill in the blanks.

Only the following characters may be used in the sample ID, field ID, and museum ID: numbers, letters, and ^ . : - _ ( ) #
All other characters will be removed.

If the specimen has sex, reproduction or life stage values that do not fit the accepted values for Specimen Details, please move the information to the Extra Info or Notes fields.

NOTE: Any fields left empty will be considered blank and thus removed from BOLD during an update. Do not remove any data from the update sheet if you would like it to stay on BOLD. The computer cannot distinguish between “blank: do not update this field” and “blank: delete the content of this field”.

Updates to Voucher Info are slightly different from updates to Taxonomy, Specimen Details, and Collection Data.

a) Updates to Voucher Info
As with new submissions, updates to the voucher info are project specific. The records need to be split by their corresponding project. If the donor or identifier is deceased or retired, please make note of that in the email field. This information is important for keeping the database up to date.

b) Updates to Taxonomy, Specimen Details, and Collection Data
Updates to taxonomy, specimen details, and collection data are project independent. Records from any number of projects can be submitted in one submission spreadsheet, and the number of records is (in theory) unlimited for this type of update.
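
The minimum requirements and the identifier character rule above can be checked locally before a spreadsheet is sent to the data manager. The following is a minimal Python sketch, not a BOLD tool: the function names and the dictionary keys are hypothetical, the rule is read literally (so spaces are stripped too, which the handbook does not state either way), and "letters" is assumed to mean unaccented A-Z and a-z.

```python
import re

# Characters the handbook lists as permitted in Sample ID, Field ID and Museum ID:
# letters, numbers, and ^ . : - _ ( ) #
# Assumption: "letters" means unaccented A-Z/a-z; spaces are not in the list.
_DISALLOWED = re.compile(r"[^A-Za-z0-9^.:\-_()#]")


def clean_identifier(value):
    """Drop every character outside the permitted set (literal reading of the rule)."""
    return _DISALLOWED.sub("", value)


def missing_required(record):
    """Return the minimum required fields that are empty in `record`.

    `record` is a hypothetical dict keyed by worksheet column name.
    """
    def filled(key):
        return bool(str(record.get(key, "")).strip())

    problems = []
    if not filled("Sample ID"):
        problems.append("Sample ID")
    # Either identifier is enough: "Field ID and/or Museum voucher ID".
    if not (filled("Field ID") or filled("Museum ID")):
        problems.append("Field ID and/or Museum ID")
    if not filled("Institution Storing"):
        problems.append("Institution Storing")
    if not filled("Phylum"):
        problems.append("Phylum")
    return problems
```

Running `missing_required` over each row of a new submission gives a quick list of records that would be held back for missing data. Because empty cells in an update delete existing content, a check like this is meant for new submissions, not for update sheets.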
If the submission is part of a campaign, iBOL Working Group, or a checklist, please let us know in the submission email. Please see ‘Data Submission - Examples’ for an example of the filled in spreadsheet.

7b) Image Submission Protocol

This protocol outlines the image submission process on BOLD. It describes the necessary format of the images and the ancillary data, and the steps required to build the uploadable package required for a successful submission.

The image submission package for BOLD is a zip file containing a set of images and an Excel spreadsheet that associates the necessary data with each image. There must be a row in the spreadsheet for each image uploaded, and the required columns must be filled in (see Table 7b-1). A template spreadsheet can be downloaded from the BOLD site (www.boldsystems.org).

Image File *: Complete (incl. extension) and identical file name (case sensitive) of the image.
Original Specimen *: Enter yes if the image shows the actual specimen for this record. Otherwise enter no.
View Metadata *: A short tag describing the orientation of the image that will appear on BOLD.
Caption: Additional information about the image. Copyright info or descriptions are recommended.
Measurement: Measurement that was taken (including the unit of measurement).
Measurement Type: Item that was measured (e.g. body length, wing span, etc.).
Sample ID *: Sample ID for photographed specimen, must match Sample ID in BOLD.
Process ID *: Process ID for photographed specimen, must match Process ID in BOLD.

Table 7b-1: Field definitions for accompanying spreadsheet.
* Required Fields

The recommended steps are outlined below:

1. Collect Images: Collect high-quality images of specimens in .jpg format for your project. BOLD accepts high resolution images (up to 20 megapixels) but only displays a greatly reduced thumbnail. Your high resolution image is archived but will not be used without the submitter’s consent. Refer to the Photography Guide for advice on picture orientation and quality.

2. Assemble Package: The image submission package should consist of all images (.jpg) and a spreadsheet with the file names and ancillary data. Make sure that all images in the package are accounted for in the spreadsheet. When submitting more than one image per specimen, simply copy the ‘Sample ID’ and ‘Process ID’ to the next line with the file name of the next image. You can upload 1 to 10 images per specimen, depending on organism characteristics. Please photograph several different orientations if needed. The submission spreadsheet should be named ImageData.xls and contain the columns described in Table 7b-1.

Steps:

A. Fill in the ImageData.xls data sheet with all the data related to the images in the submission package. To create the list of image files in a folder, open a terminal window (Start > Run > cmd in Windows), navigate to the folder containing the image files, and then run one of the following commands:

Windows:     dir /b *.jpg > list.txt
MacOS:       ls *.jpg *.JPG > list.txt
Linux/Unix:  ls *.jpg *.JPG > list.txt

These commands will generate a list of the image files in the current folder and save it in list.txt. You can then open list.txt and move the data into the Image File column.
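
The file listing in step A, and the check that every image is accounted for in the spreadsheet, can also be done in one place on any operating system. The sketch below is an illustration under assumptions: the function names are hypothetical, and the Image File column is passed in as a plain list of names copied from ImageData.xls, since reading the spreadsheet itself is outside its scope.

```python
import os


def list_image_files(folder, extensions=(".jpg", ".JPG")):
    """Return the sorted names of files in `folder` ending with one of `extensions`."""
    return sorted(
        name for name in os.listdir(folder)
        if name.endswith(extensions) and os.path.isfile(os.path.join(folder, name))
    )


def write_list(folder, out_name="list.txt"):
    """Write one image file name per line to `out_name` inside `folder`."""
    names = list_image_files(folder)
    with open(os.path.join(folder, out_name), "w", encoding="utf-8") as handle:
        handle.write("\n".join(names) + ("\n" if names else ""))
    return names


def compare_with_sheet(folder, sheet_names):
    """Compare files on disk with the Image File column, case-sensitively.

    Returns (images missing from the sheet, sheet entries with no matching file).
    """
    on_disk = set(list_image_files(folder))
    listed = set(sheet_names)
    return sorted(on_disk - listed), sorted(listed - on_disk)
```

The comparison is deliberately case sensitive, because the handbook states that file names and extensions must match exactly; a sheet entry of photo.jpg will be reported as unmatched if the file on disk is photo.JPG.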

Image File | Original Specimen | View Metadata | Caption | Measurement | Measurement Type | Sample Id | Process Id
ROM101912-D.JPG | yes | Dorsal | skull | 15 mm | skull length | ROM 10912 | BM272-03
ROM101912-L.JPG | yes | Lateral | lower jaw | 7 mm | length | ROM 10912 | BM272-03
ROM101912-L2.JPG | yes | Lateral | skull | 15 mm | skull length | ROM 10912 | BM272-03
ROM101912-V.JPG | yes | Ventral | skull | 15 mm | skull length | ROM 10912 | BM272-03
ROM101912-D2.JPG | yes | Dorsal | skin | 50 mm | body length | ROM 10912 | BM272-03
ROM101912-V2.JPG | yes | Ventral | skin | 50 mm | body length | ROM 10912 | BM272-03
ROM101944-D.JPG | no | Dorsal | skull | 17 mm | skull length | ROM 10944 | BM278-03

Figure 7b-1: Image Submission Spreadsheet (ImageData.xls) completed with sample data.

B. These two components (image files and spreadsheet) need to be placed in a single folder. Compress them all into a single file before submitting. The following free tools are available to provide this functionality:
» WinZip - http://www.winzip.com
» WinRar - http://www.rarsoft.com
» MacZipIt - http://www.maczipit.com

C. BOLD will accept a maximum file size of 195 MB. Upload the images to BOLD by clicking on the link Specimen Images in the Uploads menu of the desired project. Select the zipped folder of images and then hit “submit”.

Image Submission - Tips and Troubleshooting

This section describes the most commonly encountered image upload problems.

• Zipped file must be under 195 MB in size. If the upload fails to initialize, the zipped file may be too large. Break it into two uploads, each with its own spreadsheet.
• The spreadsheet cannot contain any formulas.
• If the upload program cannot find the image files, it is possibly because it cannot read the names. Make sure that the spreadsheet contains text values only.
• Full filenames must be used in the Excel sheet. The extension (.jpg) must be included in the image file name. The file extension is case sensitive.
• Spreadsheet must be named ImageData.xls.
  If the upload program cannot find the Excel sheet, confirm that it is named correctly (case sensitive).
• Max of 30 characters in the free text fields of the Excel sheet. Verify the data length in these fields and make adjustments if necessary.
• Data must start on the second line of the spreadsheet. There is only one line for the column headers.
• Adding extra columns to the sheet will cause errors.

Photography Guide

Please take pictures using the high quality mode on your camera. The specimen should be centered in the image frame. Photos should be taken as close-up as possible, leaving very little gap around the edges.

All images should be in landscape orientation, with a 2x3 aspect ratio. If your specimens do not easily fit these criteria, please try to keep them in a standardized position, as this makes it much easier to compare specimens within a project. If desired, a measurement scale may be included in the image to provide a size reference.

The following standard orientations should be adhered to when appropriate.

Dorsal
• The anterior of the specimen should be facing the top of the image frame
• The specimen should be face-down, with the dorsal aspect of the head visible

Lateral
• The anterior of the specimen should be facing the left side of the image frame
• The specimen should be oriented with the feet towards the bottom of the image

Ventral
• The anterior of the specimen should be facing the top of the image frame
• The specimen should be face-up, with the ventral aspect of the head visible

Figure 7b-2: Suggested sample photographs (Dorsal, Lateral, Ventral).

7c) Trace File Submission Protocol

This protocol assists in the submission of trace files to BOLD.
It describes the necessary format of the files and the ancillary data that is required for a correct submission.

1. Register Primers: Please see ‘Trace File - Primer Registration’ for details on how to register primers.

2. Assemble Package: The submission package consists of trace files (.ab1), corresponding phred files (.phd.1) and a spreadsheet with the file names and ancillary data. The submission spreadsheet should be named data.xls and contain the columns described in Table 7c-1.

Trace File *: Complete (incl. extension) and identical file name (case sensitive).
Score File: Complete (incl. extension) and identical file name (case sensitive).
PCR Primers Fwd/Rev *: Primer codes are case sensitive.
Sequence Primer: Primer codes are case sensitive.
Read Direction *: Forward or Reverse.
Process ID *: Process ID of specimen, must match Process ID in BOLD.

Table 7c-1: Field definitions for accompanying spreadsheet.
* Required Fields

Steps:

A. Fill in the data.xls sheet with all the data about your files. To create the list of the files in a folder, open a terminal window (Start > Run > cmd in Windows), navigate to the folder where the trace and score files have been placed, and then run one set of the following commands:

Windows:     dir /b *.ab1 > ab1.txt    and    dir /b *.phd.1 > phd.txt
MacOS:       ls *.ab1 > ab1.txt        and    ls *.phd.1 > phd.txt
Linux/Unix:  ls *.ab1 > ab1.txt        and    ls *.phd.1 > phd.txt

These commands will generate lists of the trace and score files in the current folder and save them in ab1.txt and phd.txt. You can then open the text files and move the data into the appropriate columns.

B. These components (trace files, score files and spreadsheet) need to be placed in a single folder. Compress them all into a single file before submitting. The following free tools are available to provide this functionality:
» WinZip - http://www.winzip.com
» WinRar - http://www.rarsoft.com
» MacZipIt - http://www.maczipit.com

C. BOLD will accept a maximum file size of 195 MB.
Upload the package to BOLD by clicking on the link “Trace Files” in the Uploads panel of the desired project. Select the zipped folder of files and then hit “submit”.

Trace File             Score File               PCR Fwd   PCR Rev   Seq Primer   Read Direction   Process Id
KKBNA001-04_H01.ab1    KKBNA001-04_H01.phd.1    BirdF1    BirdR1    BirdR1       Forward          KKBNA001-04
KKBNA001-04r_H07.ab1   KKBNA001-04r_H07.phd.1   BirdF1    BirdR1    BirdR1       Reverse          KKBNA001-04
KKBNA002-04_G01.ab1    KKBNA002-04_G01.phd.1    BirdF1    BirdR1    BirdR1       Forward          KKBNA002-04
KKBNA002-04r_G07.ab1   KKBNA002-04r_G07.phd.1   BirdF1    BirdR1    BirdR1       Reverse          KKBNA002-04
KKBNA003-04_F01.ab1    KKBNA003-04_F01.phd.1    BirdF1    BirdR1    BirdR1       Forward          KKBNA003-04
KKBNA003-04r_F07.ab1   KKBNA003-04r_F07.phd.1   BirdF1    BirdR1    BirdR1       Reverse          KKBNA003-04
KKBNA004-04_E01.ab1    KKBNA004-04_E01.phd.1    BirdF1    BirdR1    BirdR1       Forward          KKBNA004-04

Figure 7c-1: Trace File Submission Spreadsheet (data.xls) completed with sample data.

Trace File - Primer Registration

Be sure that your primer codes are registered with BOLD before assembling the submission package. To register your primers, select “Register Primers” from the Project Options menu in your project on BOLD. On the form, you are asked to fill in the following information:

Figure 7c-2: BOLD Primer submission form

Primer Code            Create a code for your primer. If the primer is already published in a manuscript, please use the published code.
Direction              Select the direction.
Primer Description     A description of what the primer is used for.
Reference/Citation     Fill in references and/or citations.
Notes                  Notes about the primer.
Alias Codes            Fill in any other known code names for your primer, separated by commas.
Publicly Available     If the primer has already been published, or if you wish to make it publicly available, this should be left public.
Target Marker          Select the target marker from the controlled list of markers (e.g. ITS, COI 5’, matK, etc.).
Primer Sequence        Fill in the sequence, 5’ to 3’.

Table 7c-2: Field definitions for accompanying figure.

If the primer you used has already been registered under a different name, you will be provided with the registered code to be used in your submission.

Trace File Submission - Tips and Troubleshooting

This section describes the most commonly encountered trace file upload problems.

• Primers must be registered before upload. If the primers are not registered, the upload will report an error. Please refer to the previous page for details on how to register primers.
• Zipped file must be under 195MB in size. If the upload fails to initialize, it is probably because the zipped file is too large. Try breaking it into two uploads, each with its own spreadsheet.
• The spreadsheet cannot contain any formulas.
• If the upload program cannot find the files, it is possibly because it cannot read the names. Make sure that you have text values only in the spreadsheet.
• Full file names must be used in the Excel sheet. The extension (.ab1, .phd.1) must be included in the file name. These extensions are case sensitive.
• Spreadsheet must be named data.xls. If the upload program cannot find the Excel sheet, confirm that it is named correctly (case sensitive).
• Data must start on the second line of the spreadsheet. There is only one line for the column headers.
• Do not add extra columns to the spreadsheet.
• Trace files will not be downloadable from BOLD until 24 hours after they have been submitted.

Figure 7c-3: A list of public primers available from the project console. These are helpful for those who are new to barcoding.

Figure 7c-4: Trace file for Vulpes vulpes (red fox).
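The packaging steps of the trace file protocol (listing the files, filling in the spreadsheet columns, spotting missing score files, checking the size limit) can be sketched in Python. This is only an illustration under stated assumptions, not a BOLD tool: the file naming pattern (a trailing "r" before the underscore marking a reverse read) is inferred from the sample data in Figure 7c-1, the column headers follow that figure, the 195MB limit is read as decimal megabytes, and the function names are hypothetical. The output is tab-separated text intended to be pasted into data.xls as plain text values.

```python
import re
from pathlib import Path

# Handbook limit; interpreting "195MB" as decimal megabytes is an assumption.
MAX_ZIP_BYTES = 195 * 1000 * 1000

# Assumed naming pattern, inferred from the sample spreadsheet only:
# <ProcessID>[r]_<well>.ab1, where a trailing "r" marks a reverse read.
TRACE_NAME = re.compile(r"^(?P<pid>[A-Z]+\d+-\d+)(?P<rev>r?)_[A-Z]\d+\.ab1$")

# Column headers as they appear in the sample spreadsheet figure.
COLUMNS = ["Trace File", "Score File", "PCR Fwd", "PCR Rev",
           "Seq Primer", "Read Direction", "Process Id"]


def build_rows(file_names, pcr_fwd, pcr_rev, seq_primer):
    """Return (rows, problems) for the .ab1 files found in file_names."""
    names = set(file_names)
    rows, problems = [], []
    for name in sorted(names):
        if not name.endswith(".ab1"):  # extensions are case sensitive
            continue
        match = TRACE_NAME.match(name)
        if match is None:
            problems.append(f"unrecognised trace name: {name}")
            continue
        score = name[:-len(".ab1")] + ".phd.1"
        if score not in names:
            problems.append(f"missing score file for {name}")
            score = ""  # Score File is not a required field
        direction = "Reverse" if match["rev"] else "Forward"
        rows.append([name, score, pcr_fwd, pcr_rev, seq_primer,
                     direction, match["pid"]])
    return rows, problems


def to_tsv(rows):
    """Render one header line followed by the data rows, tab-separated."""
    lines = ["\t".join(COLUMNS)]
    lines += ["\t".join(row) for row in rows]
    return "\n".join(lines)


def package_fits(zip_path):
    """True if the compressed package is under the upload size limit."""
    return Path(zip_path).stat().st_size < MAX_ZIP_BYTES
```

A typical use would be to call build_rows(os.listdir("."), "BirdF1", "BirdR1", "BirdR1") inside the folder of trace files (after import os), review the reported problems, and paste the output of to_tsv(rows) into the spreadsheet. Primer codes still have to be registered with BOLD beforehand; the sketch does not check that.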
7d) Sequence Submission Protocol

This protocol outlines the sequence file submission process on BOLD. It describes the required format of the sequences and the steps needed for a successful submission.

1. Assemble Package: The sequence submission should consist of sequences in FASTA format, each identified by its BOLD Process ID.

2. Upload Package: You can put up to 1000 sequences into one upload. Upload the sequences to BOLD by clicking on the link “Sequences” in the Uploads menu of the desired project. Paste the sequences into the text box and hit “submit”.

Figure 7d-1: Pop-up window for uploading traces

» If you wish to replace a sequence on BOLD, simply upload the new one with the same Process ID.
» If you wish to delete a sequence on BOLD, simply upload “NNNNN” associated with the Process ID.

Example:

>TZBNA001-05
CTGCAGGANCAAAAAATGAAGTATTTAAATTTCGATCTGTTAATAATATAGTAATAGCTCCTGCTAATACAGGTAAAGATAATAATAATAAAAAAGCTGTAATTCCTACAGCTCAAACGAAAAGGGGTAGTTGATCGAAAAATATATTATTTAATCGTATATTAATAATAGTTGTAATAAAATTAATTGCTCCTAAAATAGAAGAA
>TZBNA002-05
CAGCTAATACGGGTAAAGATAATAATAATAAAAAAGCTGTAATTCCTACTGCCCAAACAAAAAGAGGTAATTGATCAAAAAATATATTATTTAAGCGTATATTAATAATAGTTGTAATAAAATTAATTGCCCCTAAAATAGAAGAAATTCCTGCTAAATGAAGAGAAAAAATAGCTAAATCTACAGAACTACCCCCATGGGCGATATTAGAAGATAATGGGGGGTAGACTGTTCATCCTGTT
>TZBNA012-05
AAAATAGCTAAATCAACTGAGCTTCCTCCATGAGCAATATTAGATGATAGTGGGGGGTAAACTGTTCATCCTGTTCCAGCTCCATTTTCTACCACTCTTCTTGAAATTAAAAGAGTAATAGAAGGGGGGAGTAATCAAAATCTTATATTATTTATTCGTGGGAAAGCN

Figure 7d-2: Illustrative barcode for Homo sapiens (human).

8. BOLD Console

Once your project has been populated with the data, images, traces and sequences that you have uploaded to BOLD, it will look like Figures 8-1 to 8-4.
For further information on how to navigate a project, please refer to the description below.

Project Console

The console shows a report of the number of specimens, along with tallies of any missing components of the records. It includes graphs that give a quick visual overview of the project, as well as a list of all the users on the project. The links to the left provide access to uploads, downloads and various analysis tools. The record listing can be accessed by clicking on “View All Records” under the Project Data Views menu in the upper left corner.

Figure 8-1: BOLD Project Console

Record List

The record list gives access to the individual specimen and sequence data for each record. You can select specific records for analysis or updates using the checkboxes. Icons appear next to a record to indicate the following:

• GPS coordinates present for sample
• Images present for sample
• The number of traces present
• Stop codons present in sequence
• Contamination present in sequence
• Flagged record, not in ID engine

Table 8-1: BOLD Record List icons

Click on the Sample ID or the Process ID to access the Specimen Data or the Sequence Data, respectively, for each record.

Figure 8-2: BOLD Record List

Specimen Window

This window provides voucher details, taxonomy, specimen details and collection data, along with a world map of where the specimen was collected. The images for the specimen are located at the bottom of the window. To edit any details, simply select “Edit” from the upper right corner.

Sequence Window

The sequence page gives access to various details about the trace files and sequences for the specimen. Trace files can be viewed or downloaded from this window. If desired, the ID engine can be used to identify the sequence. Near the bottom of the page is an illustrative barcode of the species, along with a link to the Laboratory Information Management System (LIMS) for the Canadian Centre for DNA Barcoding.
Figure 8-3: Specimen Data

Figure 8-4: Sequence Data

Last modified: Oct 2008
BOLDSYSTEMS.org
Biodiversity Institute of Ontario, University of Guelph
579 Gordon Street, Guelph, Ontario, Canada N1G 2W1
Copyright ©2008 Biodiversity Institute of Ontario