Robustness of the Parsimonious Reconciliation Method in Cophylogeny
Laura Urbini, Blerina Sinaimeri, Catherine Matias, Marie-France Sagot
Trujillo, Spain
June 21-22, 2016
L. Urbini
June 21-22, 2016 - AlCoB
Introduction
The cophylogeny problem
Reconciliation model
Reconcile the trees through a mapping of S into H (the two trees play asymmetric roles).
Events that can be recovered: cospeciation, duplication, host switch and loss.
Definition
Input: H, S (rooted trees), the map φ : Leaves(S) → Leaves(H) and a cost vector $c = \langle c_c, c_d, c_s, c_l \rangle$ (costs of cospeciation, duplication, host switch and loss).
Output: a reconciliation function ρ : V(S) → V(H) that extends φ (i.e. ∀v ∈ Leaves(S), ρ(v) = φ(v)).
ρ partitions the set V(S) into three sets:
Σ: vertices associated with cospeciation.
Δ: vertices associated with duplication.
Γ: vertices associated with host switches.
Loss events are associated with host vertices.
Parsimony method: assign a cost to each event and minimize the total cost.
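As a small illustration of the parsimony criterion, the total cost of a reconciliation is the sum of its event counts weighted by the cost vector. The sketch below assumes the counts of cospeciations, duplications, host switches and losses are already known; the names `CostVector`, `Pattern`, `total_cost` and `most_parsimonious` are hypothetical and are not taken from EUCALYPT or any other tool.

```python
from typing import NamedTuple


class CostVector(NamedTuple):
    """Hypothetical container for c = <c_c, c_d, c_s, c_l>."""
    cospeciation: int
    duplication: int
    host_switch: int
    loss: int


class Pattern(NamedTuple):
    """Hypothetical container for the event counts <n_c, n_d, n_s, n_l>."""
    cospeciations: int
    duplications: int
    host_switches: int
    losses: int


def total_cost(p: Pattern, c: CostVector) -> int:
    """Parsimony cost: each event count multiplied by the cost of that event."""
    return sum(n * w for n, w in zip(p, c))


def most_parsimonious(candidates: list[Pattern], c: CostVector) -> list[Pattern]:
    """Keep the candidates of minimum total cost (ties are all kept)."""
    best = min(total_cost(p, c) for p in candidates)
    return [p for p in candidates if total_cost(p, c) == best]
```

For instance, with the cost vector ⟨0, 1, 1, 1⟩ the hypothetical counts ⟨3, 1, 0, 2⟩ cost 0·3 + 1·1 + 1·0 + 1·2 = 3. A real reconciliation algorithm minimizes over all reconciliations rather than over a precomputed list; the list here only illustrates the criterion.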
Limitations of the reconciliation model
(Leaves with multiple associations)
The model makes a strong assumption about the input data:
each symbiont leaf is mapped to at most one host leaf.
One dataset is obtained for each choice among the multiple associations.
Association changes → Similar reconciliation?
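One way to build these datasets is to take the Cartesian product of the candidate host leaves of every symbiont leaf, so that each resulting map φ sends each symbiont leaf to exactly one host leaf. The sketch assumes the associations are given as a dictionary from symbiont leaf to its list of candidate host leaves; the function names and the leaf names in the example are hypothetical.

```python
from itertools import product
from math import prod


def association_choices(assoc: dict[str, list[str]]) -> list[dict[str, str]]:
    """Every map phi that picks exactly one candidate host leaf per symbiont leaf."""
    leaves = sorted(assoc)  # fixed leaf order keeps the enumeration reproducible
    return [dict(zip(leaves, hosts))
            for hosts in product(*(assoc[leaf] for leaf in leaves))]


def number_of_datasets(assoc: dict[str, list[str]]) -> int:
    """Count the datasets without building them: product of the choice counts."""
    return prod(len(hosts) for hosts in assoc.values())
```

With `assoc = {"s1": ["h1"], "s2": ["h2", "h3"], "s3": ["h3", "h4", "h5"]}`, only `s2` and `s3` have multiple associations, and `number_of_datasets(assoc)` gives 1 · 2 · 3 = 6. Counting first is useful because the number of datasets grows multiplicatively with each multiply-associated leaf.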
Limitations of the reconciliation model
(Rooting a phylogenetic tree)
The model makes a strong assumption about the input data: the trees are rooted, so the root of the symbiont tree S is taken as given.
One dataset is obtained for each choice of root.
Root changes → Similar reconciliation?
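Placing the root on an edge of the unrooted symbiont tree gives one rooting per edge, so a tree with m edges yields m candidate datasets. The sketch below assumes the unrooted tree is given as an undirected edge list over hypothetical vertex names, and returns each rooted tree as a map from vertex to children, with a new vertex named "root" subdividing the chosen edge (both representation choices are assumptions, not the authors' data format).

```python
from collections import defaultdict

Edge = tuple[str, str]


def root_on_edge(edges: list[Edge], root_edge: Edge) -> dict[str, list[str]]:
    """Root an unrooted tree by subdividing root_edge with a new vertex "root".

    Returns a children map; assumes no existing vertex is already named "root".
    """
    adj: dict[str, list[str]] = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    u, v = root_edge
    children: dict[str, list[str]] = {"root": [u, v]}
    # Orient every edge away from the root: (vertex, vertex we came from).
    stack = [(u, v), (v, u)]
    while stack:
        node, parent = stack.pop()
        children[node] = [w for w in adj[node] if w != parent]
        stack.extend((w, node) for w in children[node])
    return children


def all_rootings(edges: list[Edge]) -> list[dict[str, list[str]]]:
    """One rooted version of the tree per edge of the unrooted tree."""
    return [root_on_edge(edges, e) for e in edges]
```

For the unrooted quartet with edges a-x, b-x, x-y, c-y, d-y, `all_rootings` returns five rooted trees; rooting on x-y gives the balanced tree ((a, b), (c, d)).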
PLATEAU PROPERTY [GET13]: all optimal rootings form a subtree, called the plateau, and along every path from the plateau toward a leaf the cost of the rootings increases monotonically.
Coevolution → “many” cospeciations → “low” total reconciliation cost.
Limitations of the reconciliation model
The model makes strong assumptions about the input data, which real data may violate:
One symbiont leaf can be associated with more than one host leaf.
Finding the root of a phylogenetic tree is often problematic.
We explore the robustness of the parsimonious model to errors in the input, given H, S, φ and c:
Change the associations of leaves with multiple associations.
Change the root of the symbiont tree:
Try all possible rootings and test the plateau property.
Try the rootings at distance k ≤ max(5%|V(S)|, 3) from the original root.
Input changes → Similar reconciliation?
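The bound on k and the set of nearby rootings can be sketched as follows. Since 5% of |V(S)| is |V(S)|/20, the largest admissible integer k is max(⌊|V(S)|/20⌋, 3). How the distance between two rootings is measured is not stated here, so the sketch assumes that moving the root to an edge sharing a vertex with the current root edge counts as one step (a breadth-first search on the line graph of the tree). The function names and the edge-list representation are hypothetical.

```python
from collections import defaultdict, deque

Edge = tuple[str, str]


def max_distance(num_vertices: int) -> int:
    """Largest integer k with k <= max(5% of |V(S)|, 3); n // 20 is floor(0.05 * n)."""
    return max(num_vertices // 20, 3)


def rootings_within_k(edges: list[Edge], original: Edge, k: int) -> dict[Edge, int]:
    """Distance (in root moves) of every root edge at most k moves from the original.

    Assumption: moving the root to an edge that shares a vertex with the current
    root edge counts as one move (breadth-first search on the tree's line graph).
    """
    def norm(e: Edge) -> Edge:
        return (e[0], e[1]) if e[0] <= e[1] else (e[1], e[0])

    incident: dict[str, list[Edge]] = defaultdict(list)
    for e in map(norm, edges):
        for vertex in e:
            incident[vertex].append(e)
    start = norm(original)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        e = queue.popleft()
        if dist[e] == k:
            continue  # do not expand beyond distance k
        for vertex in e:
            for f in incident[vertex]:
                if f not in dist:
                    dist[f] = dist[e] + 1
                    queue.append(f)
    return dist
```

A symbiont tree with 40 vertices gives `max_distance(40) == 3`, since 5% of 40 is only 2; the lower bound of 3 matters for small trees.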
EUCALYPT [DBS+14]
Problem: generating all optimal reconciliations.
The number of optimal reconciliations can be exponential in the size of the trees.
A polynomial-delay algorithm: the time between two successive solutions is polynomial in the size of the input.
eucalypt.gforge.inria.fr
The Input
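A dataset consists of H, S and the map φ. We consider the following cost vectors $c = \langle c_c, c_d, c_s, c_l \rangle \in C$, where
$$C = \{\langle -1, 1, 1, 1 \rangle, \langle 0, 1, 1, 1 \rangle, \langle 0, 1, 2, 1 \rangle, \langle 0, 2, 3, 1 \rangle, \langle 1, 1, 1, 1 \rangle, \langle 1, 1, 3, 1 \rangle\}.$$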
The Output
A reconciliation is summarised as a pattern of integers $\pi = \langle n_c, n_d, n_s, n_l \rangle$, giving its numbers of cospeciations, duplications, host switches and losses.
Definition
For a given input H, S, φ and cost vector c, the optimal solution is the multiset of patterns $\Lambda_{H,S,\varphi,c} = \{\pi : \pi \text{ has optimal cost}\}$, containing the pattern of each optimal reconciliation.
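Dissimilarity between two multisets of patterns, where $\|\cdot\|$ is the L1 norm (sum of the absolute values of the entries):
$$d(\Lambda_1, \Lambda_2) = \frac{\left\| \sum_{\pi \in \Lambda_1} \pi - \sum_{\pi \in \Lambda_2} \pi \right\|}{(|\Lambda_1| + |\Lambda_2|) \cdot \max_{\pi \in \Lambda_1 \cup \Lambda_2} \|\pi\|} \tag{1}$$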
Example 1:
Λ1 = {[4, 2, 0, 1], [4, 2, 0, 1], [5, 1, 1, 0]}
Λ2 = {[4, 1, 0, 1]}
$$d(\Lambda_1, \Lambda_2) = \frac{\|[4, 2, 0, 1] + [4, 2, 0, 1] + [5, 1, 1, 0] - [4, 1, 0, 1]\|}{(3 + 1) \cdot \max(7, 7, 7, 6)} = \frac{\|[9, 4, 1, 1]\|}{4 \cdot 7} = \frac{15}{28} \approx 0.536$$
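Equation (1) transcribes almost literally into code. The sketch assumes patterns are plain integer sequences ⟨n_c, n_d, n_s, n_l⟩ and reads ‖·‖ as the L1 norm, which is the reading that reproduces Example 1 (7, 7, 7 and 6 are the L1 norms of the four patterns). The function names are hypothetical, and both multisets are assumed non-empty with at least one nonzero pattern.

```python
from collections.abc import Sequence


def l1_norm(v: Sequence[int]) -> int:
    """||v||, read here as the sum of absolute values (this reproduces Example 1)."""
    return sum(abs(x) for x in v)


def dissimilarity(lam1: list[Sequence[int]], lam2: list[Sequence[int]]) -> float:
    """d(Lambda_1, Lambda_2) of equation (1) for two non-empty multisets of patterns."""
    sum1 = [sum(column) for column in zip(*lam1)]  # componentwise sum over Lambda_1
    sum2 = [sum(column) for column in zip(*lam2)]  # componentwise sum over Lambda_2
    diff = [a - b for a, b in zip(sum1, sum2)]
    largest = max(l1_norm(p) for p in lam1 + lam2)  # assumed > 0
    return l1_norm(diff) / ((len(lam1) + len(lam2)) * largest)
```

Calling `dissimilarity([[4, 2, 0, 1], [4, 2, 0, 1], [5, 1, 1, 0]], [[4, 1, 0, 1]])` returns 15/28 ≈ 0.536, matching Example 1. Lists are used rather than sets so that repeated patterns, such as [4, 2, 0, 1] above, are counted with their multiplicity.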
Tested Datasets
Biological datasets:
15 datasets, including:
EC - Encyrtidae (7 leaves) & Coccidae (10 leaves)
PP - Primates (36 leaves) & Pinworms (40 leaves)
RH - Rodents (34 leaves) & Hantaviruses (42 leaves)
Multiple associations:
3 of these datasets present multiple associations (namely MP, SBL and SFC).
Simulated datasets:
For each biological dataset, we created 50 simulated datasets.
The simulated datasets are used only for testing the rooting of the trees.
Results
Perturbation of associations
(Leaves with multiple associations)
SBL dataset: 5 of the 8 leaves of the symbiont tree have multiple associations → 560 datasets.
Cost vector $\langle 0, 1, 1, 1 \rangle$.
70% keep the optimal cost of 7.
30% change the optimal cost (from 7 to 6, 8 or 9).
65.5% of the datasets have a dissimilarity different from 0.
8.5% reach the largest dissimilarity (0.6).
Rerooting
(Testing the plateau property)
2 biological datasets and several simulated datasets have more than one plateau.
The plateau property does not hold in our model (because of host switches).
In 37% of the biological datasets and in 17% of the simulated datasets, the original root is not in the plateau.
Hypothesis: for real datasets, the original root is not in the correct position.
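Counting plateaux amounts to finding the connected groups of optimal rootings. The sketch assumes the optimal reconciliation cost has already been computed for every rooting (one cost per root edge, keyed by a consistently oriented edge tuple) and that two rootings are adjacent when their root edges share a vertex. The function names and the cost values in the example are hypothetical, not results from the datasets above.

```python
from collections import defaultdict

Edge = tuple[str, str]


def plateaux(costs: dict[Edge, int]) -> list[set[Edge]]:
    """Split the optimal rootings into connected groups ("plateaux").

    `costs` maps each root edge to the optimal reconciliation cost obtained with
    that rooting. Assumption: two rootings are adjacent when their root edges share
    a vertex, so a plateau is a connected set of optimal root edges.
    """
    best = min(costs.values())
    optimal = [e for e, c in costs.items() if c == best]
    by_vertex: dict[str, list[Edge]] = defaultdict(list)
    for e in optimal:
        for vertex in e:
            by_vertex[vertex].append(e)
    groups: list[set[Edge]] = []
    seen: set[Edge] = set()
    for start in optimal:
        if start in seen:
            continue
        seen.add(start)
        group: set[Edge] = set()
        stack = [start]
        while stack:  # depth-first search restricted to optimal edges
            e = stack.pop()
            group.add(e)
            for vertex in e:
                for f in by_vertex[vertex]:
                    if f not in seen:
                        seen.add(f)
                        stack.append(f)
        groups.append(group)
    return groups


def original_root_optimal(costs: dict[Edge, int], original: Edge) -> bool:
    """True when the original root edge lies in some plateau, i.e. is optimal."""
    return costs[original] == min(costs.values())
```

Under the plateau property the first function would always return a single group; a dataset with more than one plateau shows up as two or more groups. With the hypothetical costs a-x: 5, b-x: 5, x-y: 4, c-y: 4, y-z: 6, d-z: 4, e-z: 4, the non-optimal edge y-z separates two plateaux, and the root edge a-x lies outside both.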
Rerooting
(At distance k)
Rootings at distance k ≤ max(5%|V(S)|, 3) from the original root.
Real datasets:
The dissimilarity of the reconciliations globally increases as k increases.
Simulated datasets:
The dissimilarity of the reconciliations also globally increases as k increases.
Conclusions
(Leaves with multiple associations)
Associating a symbiont with a unique host in case of multiple associations:
has no big impact on the reconciliation cost.
The choice of leaf associations may have a strong impact on the variability of the reconciliation output.
Open problem:
Simulating the coevolution of symbionts and hosts while allowing multiple associations.
(Rooting a phylogenetic tree)
Rerooting:
The number of plateaux depends on the presence of host switches.
The original root may not lie inside the plateau.
In general, the variance of the dissimilarity of reconciliations increases as the distance k increases.
Open problems:
Is there a relation between the number of plateaux and the level of dissimilarity of the patterns?
Is there a relation between the number of plateaux and the number of host switches in the optimal solutions?
Thank you
References
[DBS+14] Beatrice Donati, Christian Baudet, Blerina Sinaimeri, Pierluigi Crescenzi, and Marie-France Sagot. EUCALYPT: efficient tree reconciliation enumerator. Algo. Mol. Biol., 10(1):3, 2014.
[GET13] Pawel Górecki, Oliver Eulenstein, and Jerzy Tiuryn. Unrooted tree reconciliation: A unified approach. IEEE/ACM Trans. Comput. Biology Bioinf., 10(2):522–536, 2013.
Time feasibility of the solutions
If no time information is available, finding an optimal time-feasible solution is NP-hard.
Allowing time-infeasible host switches → polynomial time.
Checking the time consistency of a solution → polynomial time.