Nucleotide Sequence Manipulation IMBB Workshop 20, May 2015

Joyce Njoki Nzioki
BecA-ILRI Hub, Nairobi, Kenya
http://hub.africabiosciences.org/
http://www.Ilri.org/

Sequence Analysis / Annotation
Any analytical methods connected to identifying higher biological meaning out of raw sequence data.
Annotation: bridging the gap from sequence to the biology of an organism.

Nucleotide level Analysis
✓ Sequence transcription (DNA to RNA)
✓ Reverse complement
✓ Finding regions of interest
✓ Translation of DNA in 6 reading frames
✓ Database searching

Converting DNA to RNA
✓ DNA is double stranded, having a forward and a reverse strand.
✓ Typically a sequence is read in the 5' to 3' direction: left to right for the forward strand and right to left for the reverse strand.
✓ During transcription, RNA polymerase reads the template strand 3' to 5' and synthesizes a complementary RNA strand in the 5' to 3' direction, with uracil in place of thymine.

Complementary Strand
✓ In the DNA double helix adenine pairs with thymine and guanine with cytosine.
✓ Each base in the double helix must pair with its complementary base.
GATTCGTACG
CTAAGCATGC
✓ There are tools for complementary strand determination: CLC Main Workbench, Expasy tools

Reverse complement
✓ In some cases you may need the reverse complement of a DNA sequence (this is the complementary strand written in the 5' to 3' direction).
✓ For instance, as a negative control in a DNA hybridization experiment.
✓ There are tools for complementary strand determination: CLC Main Workbench, Expasy tools

Central dogma of molecular biology
✓ Genes are stretches of DNA that encode
✓ A triplet of nucleotides codes for an amino acid; amino acids are the building blocks of proteins
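
The complement, reverse complement and DNA-to-RNA steps described in the slides above can be sketched in a few lines of Python. This is only an illustration using the standard library; the function names are hypothetical, and `to_rna` assumes its input is the forward (coding) strand, so the RNA has the same sequence with U in place of T.

```python
# Minimal sketch; helper names are illustrative, not from the slides.
COMPLEMENT = str.maketrans("ACGT", "TGCA")  # A<->T, G<->C pairing


def complement(dna: str) -> str:
    """Return the complementary strand, base by base (not reversed)."""
    return dna.upper().translate(COMPLEMENT)


def reverse_complement(dna: str) -> str:
    """Return the complementary strand written in the 5' to 3' direction."""
    return complement(dna)[::-1]


def to_rna(dna: str) -> str:
    """Replace T with U; assumes the input is the forward (coding) strand."""
    return dna.upper().replace("T", "U")


seq = "GATTCGTACG"  # example sequence from the slides
print(complement(seq))          # CTAAGCATGC
print(reverse_complement(seq))  # CGTACGAATC
print(to_rna(seq))              # GAUUCGUACG
```

The complement printed here matches the strand pair shown on the Complementary Strand slide; the reverse complement is the same strand read from its own 5' end.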
Standard Genetic Code

Translating Nucleotide sequences
GATTCGTACG
CTAAGCATGC

Forward strand (GATTCGTACG):
1. GAT TCG TAC G     Asp Ser Tyr
2. G ATT CGT ACG     Ile Arg Thr
3. GA TTC GTA CG     Phe Val

Complementary strand (CTAAGCATGC), read left to right as on the slide:
4. CTA AGC ATG C     Leu Ser Met
5. C TAA GCA TGC     # Ala Cys
6. CT AAG CAT GC     Lys His

(# marks a stop codon)
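
A six-frame translation like the one on the slide above can be sketched as follows. This is a minimal, standard-library illustration with hypothetical function names. It makes two assumptions that differ from the slide: amino acids are printed as one-letter codes with `*` for a stop codon, and the three reverse frames are read 5' to 3' along the reverse complement (the usual convention) rather than left to right along the complementary strand, so frames -1 to -3 will not match frames 4 to 6 as laid out on the slide.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str, offset: int = 0) -> str:
    """Translate complete codons starting at `offset`; '*' is a stop codon."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def six_frame_translation(dna: str) -> dict:
    """Return translations for frames +1..+3 and -1..-3."""
    forward = dna.upper()
    # Reverse frames are read 5' to 3' on the reverse complement (assumption).
    reverse = forward.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward, offset)
        frames[f"-{offset + 1}"] = translate(reverse, offset)
    return frames


for name, protein in sorted(six_frame_translation("GATTCGTACG").items()):
    print(name, protein)
```

For the slide's sequence the forward frames come out as DSY, IRT and FV, which correspond to Asp Ser Tyr, Ile Arg Thr and Phe Val.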
Translation tools
• CLC Main Workbench
• Expasy translation tool: http://web.expasy.org/translate/
• EMBOSS: transeq
• SeWeR analysis (Sequence analysis using web resources): http://www.bioinformatics.org/SeWeR/

Gene finding
Gene finding refers to identifying stretches of sequence (genes) in genomic DNA that are biologically functional.

Gene finding
Eukaryotes
✓ Larger genomes, ~3 billion bases
✓ For yeast <25% of genome is coding, in humans <3%
✓ Splicing and alternative splicing
✓ Introns and exons present
Prokaryotes
✓ Smaller genomes, ~5 million bases
✓ For H. influenzae ~85% of genome is coding
✓ Gene calling involves using programs that carry out 6-frame translation and identify all ORFs longer than a given threshold

Gene Finding Approaches
• Computational methods proposed to find genes in both prokaryotes and eukaryotes include:
✓ Homology based approaches
✓ Ab initio approaches
✓ Comparative genomics

Homology based approaches
✓ Based on sequence similarity, e.g. BLAST
✓ Searching a library of known genes for sequences that resemble the target sequence.
✓ If the identified sequences are genes, the target sequence is putatively a gene and its function can be inferred.
✓ Limitation: identifies only known genes that are present in databases

Ab initio Approaches
• Rely on statistical qualities of exons rather than homology
• Tools: GenScan, fgenesh, HMMER, Augustus
• Limitations:
✓ Prone to false positives
✓ Often can't predict splice variants
✓ Cannot predict 3' and 5' UTRs

Protein Level Annotation
Compilation of a catalogue of the organism's proteins, naming them and assigning putative function.
Protein family classification uses similarities to better characterized proteins.
Search for similarity using Blastp or PSI-Blast tools against several protein databases.

Protein level annotation
✓ Domain searches: protein domains are functional units of a protein; these are key in predicting the function of genes
✓ Several databases of functional domains: PFAM, PRINTS, PROSITE, ProDom, BLOCKS
✓ InterPro integrates all the protein signatures and domains in one resource

Process level annotation
How do the genes and proteins relate to:
• Cell cycle
• Apoptosis
• Metabolism
• Health and disease
• Resistance
• Reproduction

Gene Ontology (GO)
A standard vocabulary for describing gene and gene product attributes. Provides vital clues to the role that genes and proteins have in biological processes and provides a layer of annotation.
✓ Molecular function: tasks carried out by individual gene products, e.g. enzymatic activity (catalysis)
✓ Biological process: broader biological roles in the functioning of integrated living units: cells, tissues, organs and organisms
✓ Cellular component: describes gene products in terms of subcellular structure, e.g. localized in the nucleus

The End