Information-Theoretic Analysis of Molecular
(Co)Evolution Using Graphics Processing Units
Michael Waechter, Kathrin Jaeger, Stephanie Weissgraeber,
Sven Widmer, Michael Goesele, and Kay Hamacher
...AEERYAEYKEAFTLFDSDGD...
...TEEQGRQFRQMFEMFDKNGD...
...TDEQQRQYRQMFETFDKDGN...
...TKEQVEEFKQAFSMFDTDGD...
...SEEQVAEFKEAFDRFDKNKD...
...SKEQVAKFKEAFDRIDKNKD...
...SPEQVAEFKQAFSRFDKNGD...
...SEEQVAKFKAAFSRFDTNGD...
...PPEQVAKFKEVFSRFDKNGD...
June 18, 2012 | ECMLS 2012 | Michael Waechter & Stephanie Weissgraeber
Motivation
●A huge number of Multiple Sequence Alignments (MSAs) are available, some of them very large
● E.g., HIV protease [1]:
> 45,000 sequences of length > 1400
●Put them to use for coevolutionary and structural analysis
●But: our computations take > 25 days
[1] Pan et al.: “The HIV positive selection mutation database”
Outline
●In this talk we will show…
● MSA analysis using Mutual Information
● GPU parallelization & speed improvements
● 3-point Mutual Information
● an application to a well-known protein
contributions
● that the use of 3-point MI is beneficial
Introduction – Mutual Information
●Given an MSA:
Sequence 1: AEERYAEYKEAFTLFDSDGD...
Sequence 2: TEEQGRQFRQMFEMFDKNGD...
Sequence 3: TDEQQRQYRQMFETFDKDGN...
Sequence 4: TKEQVEEFKQAFSMFDTDGD...
Sequence 5: SEEQVAEFKEAFDRFDKNKD...
Sequence 6: SKEQVAKFKEAFDRIDKNKD...
Sequence 7: SPEQVAEFKQAFSRFDKNGD...
Sequence 8: SEEQVAKFKAAFSRFDTNGD...
●Mutual Information between two columns
(correlation → coevolution):
●Iteration over all column pairs → MI matrix:
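For two columns i and j, let p_i(x) and p_j(y) be the relative amino acid frequencies and p_ij(x,y) the joint frequencies. The standard definition is then MI(i,j) = Σ_{x,y} p_ij(x,y) · log( p_ij(x,y) / (p_i(x) · p_j(y)) ). The Python sketch below computes this value and the full MI matrix from plain frequency counts. It rests on several assumptions: gaps count as an ordinary symbol, logarithms are natural (nats), and there are no pseudocounts or sequence weighting. The function names are hypothetical and are not taken from the authors' code.

```python
from collections import Counter
from math import log


def mutual_information(col_a, col_b):
    """MI in nats between two alignment columns, from plain frequency counts.

    Assumption: gaps count as an ordinary symbol; no pseudocounts or weighting.
    """
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), c in count_ab.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))) with every p written as count / n
        mi += (c / n) * log(c * n / (count_a[x] * count_b[y]))
    return mi


def mi_matrix(msa):
    """Symmetric MI matrix over all column pairs of an MSA (equal-length strings)."""
    length = len(msa[0])
    cols = ["".join(seq[j] for seq in msa) for j in range(length)]
    matrix = [[0.0] * length for _ in range(length)]  # diagonal stays 0
    for i in range(length):
        for j in range(i + 1, length):
            matrix[i][j] = matrix[j][i] = mutual_information(cols[i], cols[j])
    return matrix


# Usage: the eight sequence fragments of the example MSA
msa = [
    "AEERYAEYKEAFTLFDSDGD",
    "TEEQGRQFRQMFEMFDKNGD",
    "TDEQQRQYRQMFETFDKDGN",
    "TKEQVEEFKQAFSMFDTDGD",
    "SEEQVAEFKEAFDRFDKNKD",
    "SKEQVAKFKEAFDRIDKNKD",
    "SPEQVAEFKQAFSRFDKNGD",
    "SEEQVAKFKAAFSRFDTNGD",
]
m = mi_matrix(msa)
```

A fully conserved column, such as the all-D column at index 15, has zero MI with every other column. This matches the later remark that there is no coevolution without variation.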
Introduction – Shuffling Null-Model
●MI is sensitive to the underlying amino acid distribution
●Computational Normalization: Shuffling Null-Model [2]
●Is MI distinguishable from “random evolution” MI?
[2] K. Hamacher: “Relating sequence evolution of HIV1-protease to its underlying molecular mechanics”
Introduction – Shuffling Null-Model
●Compute original MI
●Iterate 10,000 times:
● Shuffle each MSA column
● Compute randomized MI matrix
●Normalize original MI using random MI
[Figure: original MSA excerpt (AEER..., TEEQ..., TDEQ..., SEEQ..., SKEQ..., PPEQ...) shown next to column-shuffled copies (e.g., SEEQ..., TDER..., TKEQ..., SEEQ..., APEQ..., PEEQ...), one per iteration]
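The loop above (compute original MI, shuffle every column, recompute MI, normalize) can be sketched in Python as follows. Shuffling a column permutes its residues across sequences. This keeps the column's amino acid composition but destroys any correlation with other columns. The slides do not spell out the normalization formula, so this sketch assumes a z-score against the shuffled MI values. That choice is illustrative only, and the function names are hypothetical.

```python
import random
from collections import Counter
from math import log, sqrt


def mutual_information(col_a, col_b):
    """MI in nats between two columns (plain counts, gaps as ordinary symbols)."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    return sum((c / n) * log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in count_ab.items())


def shuffle_columns(cols, rng):
    """Permute every column independently; each column's composition is preserved."""
    shuffled = []
    for col in cols:
        residues = list(col)
        rng.shuffle(residues)  # in-place random permutation of this column only
        shuffled.append("".join(residues))
    return shuffled


def shuffled_null_zscores(msa, iterations=1000, seed=0):
    """Z-score of each original MI value against MI values of column-shuffled MSAs.

    Assumption: the z-score is an illustrative choice of normalization; the
    slides only state that original MI is normalized using random MI.
    """
    rng = random.Random(seed)
    length = len(msa[0])
    cols = ["".join(seq[j] for seq in msa) for j in range(length)]
    pairs = [(i, j) for i in range(length) for j in range(i + 1, length)]
    original = {p: mutual_information(cols[p[0]], cols[p[1]]) for p in pairs}
    total = dict.fromkeys(pairs, 0.0)
    total_sq = dict.fromkeys(pairs, 0.0)
    for _ in range(iterations):
        rand_cols = shuffle_columns(cols, rng)
        for i, j in pairs:
            r = mutual_information(rand_cols[i], rand_cols[j])
            total[(i, j)] += r
            total_sq[(i, j)] += r * r
    z = {}
    for p in pairs:
        mean = total[p] / iterations
        std = sqrt(max(total_sq[p] / iterations - mean * mean, 0.0))
        # A constant column has zero MI in every shuffle, hence std == 0
        z[p] = (original[p] - mean) / std if std > 0 else 0.0
    return z


toy = ["AA", "BB", "AA", "BB", "CC", "CC", "DD", "DD"]  # two perfectly coevolving columns
z = shuffled_null_zscores(toy, iterations=200)
print(z[(0, 1)])  # positive: original MI lies above the shuffled background
```

The slides use 10,000 iterations. The default of 1000 here only keeps the pure-Python sketch fast.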
Massive parallelism
●Highly compute-intensive
●HIV-1 protease on a single core:
● MI computation for all column pairs: ~3.5 min
● Repeat for 10,000 iterations: > 25 days
●But:
● Computation of each MI matrix entry independent of all others
● Shuffling of each MSA column independent of all others
●Parallelizable (to hundreds of thousands of threads)
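Counting the independent tasks makes "hundreds of thousands of threads" concrete. One null-model iteration contains one shuffle per column and one MI computation per column pair, i.e. L(L-1)/2 entries for L columns. The 3-point MI mentioned under Implications needs L(L-1)(L-2)/6 column triples. The Python sketch below uses 1400 columns as a lower bound for HIV-1 protease and 264 for calmodulin, as given in the speed results. The helper name is hypothetical.

```python
from math import comb


def independent_tasks(num_columns):
    """Independent work items in one shuffling null-model iteration (hypothetical helper)."""
    return {
        "column_shuffles": num_columns,      # each MSA column is shuffled on its own
        "mi_pairs": comb(num_columns, 2),    # one MI matrix entry per column pair i < j
        "mi_triples": comb(num_columns, 3),  # one 3-point MI value per column triple
    }


hiv = independent_tasks(1400)  # lower bound: the HIV-1 protease MSA has > 1400 columns
cam = independent_tasks(264)   # calmodulin MSA from the speed results
print(hiv["mi_pairs"], hiv["mi_triples"])  # 979300 456353800
```

For HIV-1 protease, the 2-point case already yields close to a million independent MI entries per iteration. The 3-point case yields several hundred times more.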
GPU Implementation
●Iterate 10,000 times:
● Shuffling
– Map MSA columns to blocks of threads
– Shuffle columns (GPU suited algorithm)
– Synchronize
● MI computation
– Map MI matrix entries to blocks of threads
(in a way that suits the MSA access pattern)
– Compute MI matrix entries
– Synchronize
●Combine results & normalize original MI with randomized MI
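The slides do not detail how MI matrix entries are assigned to threads, beyond noting that the mapping suits the MSA access pattern. One common way to give each thread exactly one column pair is to enumerate the strict upper triangle row by row. Each thread then inverts its linear index k in closed form, with no loop. The Python sketch below shows this index arithmetic. It is a hypothetical illustration, not the authors' mapping.

```python
from math import isqrt


def pair_to_index(i, j, n):
    """Linear index of pair (i, j), i < j, in row-major order of the strict upper triangle."""
    return i * n - i * (i + 1) // 2 + (j - i - 1)


def index_to_pair(k, n):
    """Closed-form inverse of pair_to_index, so a thread finds its pair without looping."""
    m = 4 * n * (n - 1) - 8 * k - 7  # always >= 1 for 0 <= k < n(n-1)/2
    i = n - 2 - (isqrt(m) - 1) // 2  # equals floor(sqrt(m)/2 - 1/2) computed exactly
    j = k + i + 1 - n * (n - 1) // 2 + (n - i) * (n - i - 1) // 2
    return i, j


print(index_to_pair(0, 1400), index_to_pair(979299, 1400))  # (0, 1) (1398, 1399)
```

In a CUDA kernel, k would typically be `blockIdx.x * blockDim.x + threadIdx.x`. For MSAs of this size the arithmetic needs 64-bit integers and an exact integer square root.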
Speed Results
                          Calmodulin          HIV-1 protease
                          753 sequences       > 45,000 seqs.
                          of length 264       of length > 1400
GeForce GTX 480           1.1 min             1.85 days
Core i7-960 (4 threads)   13.4 min            7.3 days
Speed-up                  ~12x                ~4x
●Speed-up depends on problem size
Implications
●One order of magnitude speed-up
●Quickly redo previous steps (e.g., alignment) and recompute MI
●New analysis tool feasible: 3-point MI, i.e., coevolution of a ‘3-clique’ of MSA columns
●Can we deduce more information from 3-point MI than we
could from 2-point MI alone?
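The slides do not define 3-point MI. One common definition, assumed here, is the co-information (interaction information up to a sign convention): I(i;j;k) = I(i;j) - I(i;j|k) = H(i) + H(j) + H(k) - H(i,j) - H(i,k) - H(j,k) + H(i,j,k), where H is the Shannon entropy of the column frequencies. Unlike 2-point MI, this value can be negative. The Python sketch below evaluates it from plain counts. The function names are hypothetical.

```python
from collections import Counter
from math import log


def entropy(*cols):
    """Joint Shannon entropy (nats) of one or more equal-length columns."""
    n = len(cols[0])
    counts = Counter(zip(*cols))  # each row of the selected columns is one joint symbol
    return -sum((c / n) * log(c / n) for c in counts.values())


def three_point_mi(a, b, c):
    """Co-information I(a;b;c) = I(a;b) - I(a;b|c), via inclusion-exclusion of entropies.

    Assumption: this is one standard 3-point MI definition, not necessarily the authors'.
    """
    return (entropy(a) + entropy(b) + entropy(c)
            - entropy(a, b) - entropy(a, c) - entropy(b, c)
            + entropy(a, b, c))


# Column c is the XOR of a and b: no pairwise MI, but a strong 3-way dependence
print(three_point_mi("0011", "0101", "0110"))
```

In the XOR example every pair of columns is independent, so 2-point MI sees nothing. The triple is still fully determined, and co-information reports a magnitude of log 2 (negative under this sign convention). This illustrates the kind of information that pairwise MI alone cannot capture.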
Calmodulin
●149 amino acids
●Ca2+ binding → conformational change
●Regulates various signaling pathways
Coevolution in Calmodulin – 2-point MI
●Finding coevolving pairs of
amino acids
●Structural or functional
connection
●Here: coevolution within N- and C-terminus
● Ca2+ binding
● Propagation of conformational
change
●Conserved inner helix
● No coevolution without variation
Coevolution in Calmodulin – 3-point MI
●‘3-cliques’ of amino acids
●Higher order correlations
● Concerted motions
● Binding sites
Coevolution in Calmodulin – 3-point MI
●‘3-cliques’ of amino acids
●Color indicates the frequency
with which an amino acid
contributes to the ‘3-cliques’ set
●Key residues for important
functions
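The slides do not say how the ‘3-cliques’ set is selected. This sketch assumes, hypothetically, that it consists of the top_k column triples with the highest 3-point MI scores. Under that assumption, the per-residue frequency used for coloring reduces to a simple count. The function name and toy scores are illustrative only.

```python
from collections import Counter


def clique_participation(triple_scores, top_k):
    """Count how often each column occurs in the top_k highest-scoring triples.

    Assumption: the '3-cliques' set is simply the top_k triples by score.
    """
    top = sorted(triple_scores, key=triple_scores.get, reverse=True)[:top_k]
    counts = Counter()
    for triple in top:
        counts.update(triple)  # each of the three column indices gains one count
    return counts


scores = {(0, 1, 2): 0.9, (1, 2, 3): 0.8, (0, 3, 4): 0.1}  # toy 3-point MI scores
print(clique_participation(scores, top_k=2))
```

Residues that appear in many high-scoring triples get high counts and would receive the strongest color.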
Conclusions
●MI for coevolutionary analysis
●GPU implementation ~10x faster on typical MSAs
●3-point MI analysis possible in acceptable time
●3-point MI does reveal new insights
● Next step could be k-point MI
●It may be possible to detect key residues in unknown proteins
What happened since?
●Multi-GPU parallelization:
● Distribute Shuffling Null-Model iterations among GPUs
● First tests: 32 GPUs → ~32x speed-up (on top of basic GPU speed-up!)
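Null-model iterations are independent, so they can be split into near-equal chunks, one per GPU. Only per-entry summary statistics then need to be combined at the end. The Python sketch below shows that bookkeeping. It assumes, hypothetically, that each GPU returns a (count, sum, sum of squares) triple per MI entry and that the normalization needs the mean and variance of the random MI. Neither detail is stated on the slide, and the function names are illustrative.

```python
def split_iterations(total, num_gpus):
    """Near-equal contiguous chunks of null-model iterations, one per GPU."""
    base, extra = divmod(total, num_gpus)
    chunks, start = [], 0
    for g in range(num_gpus):
        size = base + (1 if g < extra else 0)  # the first `extra` GPUs take one more
        chunks.append(range(start, start + size))
        start += size
    return chunks


def merge_partial_sums(partials):
    """Combine per-GPU (count, sum, sum_of_squares) into overall mean and variance."""
    n = sum(p[0] for p in partials)
    s = sum(p[1] for p in partials)
    sq = sum(p[2] for p in partials)
    mean = s / n
    return mean, sq / n - mean * mean  # population variance


chunks = split_iterations(10_000, 32)
print([len(c) for c in chunks])  # 16 chunks of 313 followed by 16 of 312
```

Because sums and sums of squares add up exactly, merging is independent of how iterations were distributed. This is consistent with the near-linear scaling reported on the slide. In practice each GPU would also need its own random seed so that the shuffles differ.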
Thank you.
Please visit tinyurl.com/tud-comic
for code & documentation
or contact us.