Qiang Li

Transcription

Qiang Li
Estimation of evolutionary distance
based on K-string composition
Qiang Li
Department of Combinatorics and Geometry
CAS-MPG Partner Institute for Computational Biology
Difficulties in phylogenomics
• Species trees versus
gene trees
• Too few common
genes
“The tree of one
percent”
• Model selection
A science, or an art?
• Computational
complexity
• …
A tree based on 31 genes (Ciccarelli ’06)
Phylogenetics above sequence-level
• Rare genomic change
• Gene content & gene order
• K-string composition
“In summary, the methods for inferring trees based on
whole-genome features are at an early stage of their
development, which might be comparable to that of
sequence-based methods in the early 1970s. In particular,
they generally lack a global probabilistic modeling.”
(Philippe 2005)
CVTree: Compositional vector
• “Random background”: (K – 2)-order Markov chain
f (α1 α K −1 ) f (α 2 α K )
f ( α 1 α K ) =
f (α 2 α K −1 )
0
• “Subtraction”
 f (i ) − f 0 (i )
,

0
f (i )
ai = 
0,

f 0 (i ) ≠ 0
f 0 (i ) = 0
• Distance (dissimilarity)
1 − cos(a, b)
d (a, b)
=
∈ [0,1]
2
i ∈ ΣK
f versus f0
Courtesy of Alexander Grossmann
CVTree
•
•
•
•
simple
efficient
parameter free
truly whole
genomic
…
• mysterious
(Qi et al ’04)
CVTree of chloroplasts
CVTree of coronavirus
CVTree of dsDNA viruses
CVTree of fungi
Mycge
Mycpn
Mycpe
Urepa
Mycp
Borb u
Trep u
Chl a
Ch mu
Ch ltr
C lpnA
Ch hlpnC
A lp
Ag grt5 nJ
R rt5
hi W
m
e
pe
Aer ae
Pyr o
t
l
Su so
l
Su evo
Th eac
Th lsp
Ha tma
e
M
e
um su
r
B ru
B hilo r
R auc
C oliC
Ec oliE
Ec oliO
Ec liK
Eco l
Shif
Salti
Salty
YerpeC
YerpeK
Haein
Pasmu
Sheon
Vibch
Vibvu
Neim
Neim eM
Pse eZ
Ps ae
Ra epu
Xa lso
Xa nax
X nc
Bu ylfa a
Bu ca
ca i
p
S
St taa
a u
Lis auM N
m
L o
Oc isin
Ba eih
c
Ba hd
csu
Bac
The an
m
Aqua a
e
Lepin
Chlte
Deira
Strco
MyctuH
MyctuC
Mycle
Corgl
Coref
Biflo l
e
The c
p
n
Sy sp
a
An
br
ig n
W icc r
R cp
Ri amje
C lpj
He lpy
He
M
M e ta
et c
M ka
Me etth
Py tja
Py rfu
Py rho
ra
Arc b
Enc fu
cu
Plafa
Worm
Yeast
Schpo
Arath
Fusnu
Thete
Clope
Cloab
StrpnT
R
Strpn u
Strm S
y
Strp yM
p
Str pyG
Str agN
Str agV a
r
St acl p
L ae
St auW
a
St
0.05
The tree plotted to scale
“Concentrating on the
topology of the trees in
the first place, we did
not scale the branch
lengths on the tree.
However, these lengths
should reflect evolution
rates in terms of Kstring composition
changes. The
calibration of branch
lengths is further
complicated by the
overlapping nature of
the K-strings when K ≥
2.”
• “However, the methods proposed so far are
extremely crude. Oligonucleotide or oligopeptide
frequencies are transformed into distances
without any underlying model of evolution. It is
nevertheless remarkable that something
considered as a bias in standard sequence-based
methods (Lockhart et al. 1994) contains a
phylogenetic signal, but it is not yet clear whether
accurate methods can be developed to extract it.”
(Philippe et al. 2005)
• “Why, for example, does the K-string complement
of a proteome yield a tree that is similar to
sequence-based trees?”
(Snel et al. 2005)
A heuristic probabilistic model
• Transition of frequency distribution:
Assume that the evolution process is stationary, and
the average mutation rate per site is 1 – α. Let
L = |S| - K + 1, then the conditional distribution
(
( t +1)
f S
)
(i ) S (i=
) n= binomial(n, α )
(t )
K

K E S(i ) 
*binomial  L, (1 − α )

L 

• Time evolution of expectation
(
)
n =+
E S (i ) S (i ) =
E S(i ) α
( t ')
(t )
K ( t ' −t )
(n − E S(i ))
Calibration of CVTree
• Generalized CV
w(S,
=
i ) (S(i ) − S (i ))c(i )
0
where S0(i) is an unbiased estimator of S(i),
and c(i) is arbitrary “normalization” factor.
• Angle between vectors vs. sequence identity
E cos( w(S), w(S')) = q K
• Calibration
pˆ (S,S') = 1 − cos1/ K ( w(S), w(S'))
M Myc
yc p
ge n
Fu
sn
u
T
Clo het
C pe e
Str loab
p
Strp nT
StrmnR
Strpy u
StrpyMS
StrpyG
StragN
StragV
Lacla
Staep
StaauW
StaauN
M
Staaumo
Lis in
Lis ih
Oce hd
c
Ba csu
Ba can
Ba
0.05
e
m e
i
Rh rumsu
B ru ilo
B Rh cr
u
Ca iccn
R cpr
Ri cai
Bu cap
Bu gbr
Wi
EcoliC
EcoliE
EcoliO
EcoliK
Shifl
Salti
Salty
Yerp
Yerp eC
Hae eK
Pas in
S mu
Vibheon
c
NeVibvuh
Ne im
e
Ps ime M
Z
Ps ea
ep e
u
so
al ax
R an ca
X n
Xa lfa
Xy
M
e
P
Py yr tac
f
Py rho u
r
Me Me ab
tka tth
Me
tja
Plaf Arcfu
Encc a
u
Yeas
t
Schpo
Worm
Arath
Lepin
Chlte
n
lp
Ch J
ChlpnC
A
Chlpn ltr
Ch
u
Chlmpa
Tre u
rb
Bo elpy j
H elp
H mje
Ca
u
cp
y
M pa e
e p
Ur yc
M
Anasp
Synpc
Theel
Biflo
Core
Corg f
Myc l
Myc le
My tuC
ct
S
De trcouH
A i
Th qua ra
Ag emae
Ag r t
rt5 5
W
pe
Aer ae
Pyr to
l
Su so
l
Su evo
Th eac p
Th als tma
H
e
M
The calibrated tree
Using pre-/absence of words
• Any sequence S as above can be represented
also by just the set X(S) of its K-strings. In a
“gedanken alignment”, ignoring repeats and
collisions, we may put
1  X (S)  X (S') X (S)  X (S')
(K )
+
Q (S,S') : 
X (S)
X (S')
2
and
q
(K )
(S,S') := Q
(K )
1/ K
(S,S')



K = 9 Tree
M
M Me eta
et
k tth c
Me a
Py tja
Py rfu
rh
Py o
ra
Arc b
Plaf fu
a
Wor
m
Y eas
t
Schpo
Enccu
iE
ol liO
c
E co iK
E col
E hifl
S alti
S lty
Sa rpeC
Ye rpeK
Ye in
H ae u
Pasm n
S heo
Vibch
Vibvu
Pseae
Psepu
NeimeM
NeimeZ
Ralso
X ana
X an x
Xylf ca
Ca a
m
He je
l
H e pj
Bo lpy
Tr rbu
Le ep
C h pi n a
lm
u
tr A
hl n
C lp J
Ch hlpn C
C lpn
Ch
lte
Ch sp
a
A n pc
n
S y el
T he
C
Cl lop
S oa e
St trpn b
rp T
Str nR
Str mu
Str pyS
p
Strp yM
Stra yG
g
Strag V
N
Lacla
Staep
StaauW
StaauN
StaauM
Lismo
Lisin
Oceih
d
Bach
u
Bacs n
a
B ac
Arath
Deira
Strco
MyctuH
MyctuC
Mycle
l
Corg f
e
C or
o
f
i
B l
cpu
My epa
Ur e
cp
My ycpn e u
M ycg sn ete
M Fu Th
0.05
Aquae
Thema
Agrt5
Agrt5W
Rhim
Brum e
Bru e
s
R hi u
Ca lo
Ric ucr
R c
Bu icpr n
Bu ca
i
W cap
E c i gb
ol r
iC
pe
A er
to
S ul o
ls
Su
rae vo
e
Py
T h eac p
s
Th al tma
H
e
M
K =9
M
e
Py tja
Py rf
r u
P y ho
ra
Ar b
c
Me fu
t
Me ma
t
a
c
Ha
Plaf lsp
a
Y ea
st
S ch p
o
Worm
Enccu
p
ca gbr
u
B Wi oliC
Ec coliE
E oliO
Ec oliK
Ec ifl
Sh lti
Sa t y
S al eC
Yerp K
Yerpe
Haein
Pasmu
Sheon
Vibch
Vibvu
Pseae
Psep
Neim u
Neim eM
R al eZ
Xanso
X ax
X y a n ca
Ca lfa
He mje
H lp
Bo elp j
rb y
u
a
ep
T r pi n u
Le hlm r
C lt
A
Ch hlpn J
C lpn
Ch lpnC
Ch
te
C hl
C
Cl lop
S o e
St trpn ab
rp T
Str nR
Str mu
Str pyS
p
Strp yM
Stra yG
g
Strag V
N
L a cl a
Staep
StaauW
StaauN
StaauM
Lismo
Lisin
Oceih
d
Bach
u
s
c
Ba n
a
c
Ba
Arath
Deira
Strco
MyctuH
MyctuC
Mycle
l
Corg f
e
C or
o
Bifl
cpu
My epa
Ur e
cp n
My ycp e u e
M ycg sn et
M Fu Th
0.05
Anasp
Synpc
T heel
A quae
T hem
Agrt a
A gr 5
R h t 5W
Bru ime
Bru me
R su
Ca hilo
R i uc
Bu Ric ccn r
ca p r
i
pe
A er
to
S ul
lso
Su e
o
ra
ev
Py
T h eac h
t
T h et
M
ka
et
M
K = 10
M
e
Py tja
Py rf
u
Py rho
ra
Ar b
cfu
Me
t
Me ma
t
H a ac
Plaf lsp
a
Y ea
st
Schp
o
Worm
0.05
p
ca gbr
u
B Wi oliC
Ec coliE
E oliO
Ec oliK
Ec ifl
Sh lti
Sa t y
S al eC
Yerp K
Yerpe
Haein
Pasmu
Sheon
Vibch
Vibvu
Pseae
P se p
Neim u
Neim eM
R al eZ
Xanso
X ax
Xy anca
Ca lfa
He mje
H lp
Bo elp j
rb y
u
a
ep
T r pi n u
Le hlm r
C lt
A
Ch hlpn J
C lpn
Ch lpnC
Ch
te
C hl
C
Cl lop
S o e
St trpn ab
rp T
Str nR
Str mu
Str pyS
p
Strp yM
Stra yG
g
Strag V
N
Lacla
Staep
StaauW
StaauN
StaauM
Lismo
Lisin
Oceih
d
B a ch u
s
c
Ba n
a
B ac
Arath
Deira
Strco
MyctuH
MyctuC
Mycle
l
Corg f
e
C or o
i
B fl
cpu
My repa
U e
cp n
My ycp e u e
M ycg sn et
M Fu Th
Anasp
Synpc
T heel
A quae
T hem
Agrt a
A gr 5
R h t 5W
Bru ime
Bru me
R su
Ca hilo
R i uc
BuRicp ccn r
ca r
i
pe
A er
to
S ul o
ls
Su e
ra vo
e
Py
T h eac h
t
T h et
M
ka
Enccu
et
M
K = 11
Arath
Deira
Strco
MyctuH
MyctuC
Mycle
l
Corg f
e
r
Co o
Bifl
cpu
My repa
U e
cp
My ycpn e u e
M ycg sn het
M Fu T
0.05
Anasp
Synpc
T heel
A quae
T hem
Agrt a
A gr 5
R h t 5W
Bru ime
Bru me
R su
Ca hilo
R i uc
BuRicp ccn r
ca r
i
Py
Py rho
ra
Th Arc b
f
Th evo u
ea
c
Me
tm
Me a
H al t ac
Plaf sp
a
Wor
m
Y eas
t
Schpo
p
ca gbr
u
B Wi oliC
Ec coliE
E oliO
Ec oliK
Ec ifl
Sh lti
Sa t y
S al eC
Yerp K
Yerpe
Haein
Pasmu
Sheon
Vibch
Vibvu
Pseae
Psep
Neim u
Neim eM
R al eZ
Xa so
X a nax
Xy nca
Ca lfa
H mje
He elpj
Bo lp
rb y
u
a
ep
T r pi n u
Le hlm r
C lt
A
Ch hlpn J
C lpn
Ch lpnC
Ch
te
C hl
C
Cl lop
S o e
St trpn ab
rp T
Str nR
Str mu
Str pyS
p
Strp yM
Stra yG
g
Strag V
N
Lacla
Staep
StaauW
StaauN
StaauM
Lismo
Lisin
Oceih
d
Bach u
s
c
Ba n
a
B ac
Enccu
pe
A er
to
S ul
lso
Su e
h
ra Mett
Py
tka tja
Me Me rfu
Py
K = 12
iC
ol liE
c
E co liO
E co K
E coli
E hifl
S alti
S lty
Sa rpeC
Y e peK
Y er n
H aei u
Pasm n
S heo
Vibch
Vibvu
Pseae
Psepu
NeimeM
Neime
Z
Ralso
X an
X an ax
Xyl ca
Ca fa
m
He je
H lpj
Bo elpy
T rb
Le rep u
pi a
n
Ch
u
m
hl
C hltr lpnA
J
C
Ch hlpn C
C lpn
Ch
lte
sn
u
C l T he
o
t
C p e
Str loa e
Str pnT b
p
Str nR
m
Strp u
Strp yS
y
Strpy M
StragVG
StragN
Lacla
Staep
StaauW
StaauN
StaauM
Lismo
Lisin
ih
Oce d
h
c
Ba su
c
Ba can
Ba
Arath
Thema
Aquae
Deira
Strco
H
Myctu C
u
t
c
My ycle
M gl
Cor ref
Co flo
Bi
u
cp pa
y
M re
U e
cp n
My ycp cge
M y
M
0.05
Anasp
Synpc
T heel
Agrt5
Agrt5
Rhim W
Bru e
Bru me
Rh su
C ilo
Ric aucr
R c
B ic n
Bu uca pr
W ca p i
ig
br
M
et M
ka et
th
M
Th etja
Th evo
ea
c
Me
tm
Me a
H al t ac
Plaf sp
a
Wor
m
Y eas
t
Schpo
Fu
Enccu
pe
A er
to
S ul
lso
Su ae rfu
r
Py o
Py
rh
Py rab fu
Py Arc
K = 13
Arath
Deira
Strco
MyctuH
MyctuC e
Mycl
l
Corg f
e
r
Co o
Bifl
cpu
My repa
U e
cp n
My ycp e u e
M ycg usn het
M F T
0.05
p
ca igbr C
u
B W oli
Ec coliEO
E coli
E oliK
Ec ifl
Sh lti
Sa lty
S a peC
Y er eK
Yerp
Haein
Pasmu
Sheon
Vibch
Vibvu
Pseae
Pse
Neim pu
Neim eM
R al eZ
X a so
X a nax
Xy nca
Ca lfa
He mje
H lp
Bo elp j
rb y
u
a
ep
T r pi n u
Le hlmtr
C l nA
Ch hlp nJ
C lp
Ch lpnC
Ch
te
C hl
C
lo
St Clo pe
S t r pn ab
r
T
Str pnR
m
St
u
Str rpyS
p
Strp yM
Stra yG
StraggV
N
Lacla
Staep
StaauW
StaauN
StaauM
Lismo
Lisin
Oceih
d
Bach u
Bacsan
B ac
Enccu
ka M
et
th
M
e
Th tja
Th evo
ea
c
Me
t
Me ma
t
a
Ha
c
Plaf lsp
a
Wor
m
Y eas
t
Schpo
Anasp
Synpc
T heel
A quae
T hem
Agrt5 a
Ag
Rhi rt5W
Br me
Bruume
R su
Ca hilo
R
Ri icc ucr
Bu cp n
ca r
i
M
et
pe
A er
ae
P yr t o
l
Su lso
rfu
y
P
Su
o
rh b
Py yra cfu
P Ar
K = 14
Using pre-/absence of words
• Any sequence S as above can be represented
also by just the set X(S) of its K-strings. In a
“gedanken alignment”, ignoring repeats and
collisions, we may put
1  X (S)  X (S') X (S)  X (S')
(K )
+
Q (S,S') : 
X (S)
X (S')
2
and
or
q ( K ) (S,S') := Q ( K ) (S,S ')1/ K
:= Q ( K +1) (S,S') Q ( K ) (S,S')



Making up alignments
• Pattern:
1: key, must match
0: space, to watch
• “Align” pairwise:
For keys occur in both
sequences, concatenate
letters in spaces.
• Sequence-based distance
111110
X: VKAAWG,HAGEYG…
Y: VKAAWS,HAGEYG…
GG…
SG…
pˆ = nd / n
Y eas
t
Schpo
Enccu
Arath
Thema
Aquae
Mycpu
Urepa
Mycpeycpn
M cge
My
ei
ra
C l Th
C l op e
oa e te
Str b
St m
Str rpnTu
p
StrpnR
Strp yS
Strp yM
StragyVG
StragN
Lacla
Staep
StaauW
StaauN
StaauM
Lismo
Lisin
ih
Oce d
h
Bacsu
c
B a an
c
Ba
D
iE
ol liO
c
E co iK
E ol
Ec ifl
Sh alti
S alty
S rpeC
Ye rpeK
Y e heon
S
Vibchvu
Vib
Haein
Pasmu
Pseae
Psepu
Wigbr
Neim
NeimeeM
Z
Rals
X an o
X yl X a n a x
ca
Ca fa
He mje
B H lpj
Tr orbelpy
C h ep u
lm a
u
A
tr n
hl lp J
C Ch lpn C
Ch lpn
Ch
pi n
lte Le
Ch
o
H
Strc yctu C ycle
M ctu M
My
rgl f
o
C ore iflo u
C B sn
Fu
ol
iC
0.1
Ec
M
et
M
et M ac
ka e
tth
Me
Py tja
Py rfu
rh
Py o
ra
Arc b
fu
Anasp
Synpc
T heel
Agrt5
Agrt5
Rhim W
e
Bru
R
Bru me hilo
s
Bu R Ric
Bu c icp cn Cau
a
uc
ca i r
r
p
pe
A er
to
S ul o
ls
Su e
ra
Py
Pl
Wor afa
m
o
ev
Th eac sp
T h al a
H
m
et
M
011111110 tree
The missing words approach
• Probabilities of double- vs. single-missing
1/ K
p0 = u
2−q K
ln p0 

⇒ qˆ =  2 −

u
ln


• Estimation of probabilities
pˆ 0 =
n0
Σ
K
n0 + nB n0 + nA
·
uˆ =
K
K
Σ
Σ
E nc
M
e
Py tja
Py rf
u
Py rho
ra
Ar b
Me cfu
t
Me ma
t ac
H al
sp
e
im e
h
R um
Br rusu ilo
B Rh r
uc
Ca cn
Ric pr
Ric ai
B uc ap
B uc br
Wig
EcoliC
EcoliE
EcoliO
EcoliK
Shifl
Salti
Salty
YerpeC
Yerp
H ae eK
Pas in
m
S he u
Vib on
ch
V
Ne ibv
Ne ime u
M
i
Ps meZ
P s ea
ep e
u
C
St loa
S t r pn b
r
T
S t pnR
r
St mu
Str rpyS
p
Str yM
pyG
Stra
Stra gN
gV
Lacla
Staep
StaauW
StaauN
StaauM
Lismo
Lisin
Oceih
Bachd
u
Bacs
n
a
B ac
so
al x
R a n a ca
X n
Xa lfa
Xy mje
Ca elpj
H lpy
H e ae
Aqu ma
T he
cu
Y ea
st
Schp
o
Plafa
Worm
Arath
Mycpu
Urepa
Mycpe
Mycpn
Mycge J
n
Chlp C
n
p
C hl A
pn
Chl hltr
C u
lm
Ch repa
T bu u
r n te
B o F us he
T pe
lo
C
0.05
Biflo
Coref
Corgl
Mycle
Myctu
Myct C
uH
Dei Strco
r
An a
a
S
s
Th ynpc p
e
C h el
lte
Le
A p
Ag grt5 in
rt5
W
pe
A er
ae
P yr
lto
Su so
l
S u e vo
T h eac h
t
Th Met
ka
et
M
MW9 Tree
ilo r
Rh auc n
C icc
R icpr
R cai
Bu cap
Bu gbr
Wi liC
Eco liE
Eco O
Ecoli
EcoliK
Shifl
Salti
Salty
YerpeC
YerpeK
Haein
Pasmu
Sheo
Vibc n
Vib h
Ne vu
Ne imeM
Ra imeZ
Ps lso
P ea
Xa sep e
Xa na u
nc x
a
S
St trpy
St rpy S
St rpyG M
r
Str agN
ag
La V
Sta cla
S ta e p
au
Staa W
Staa uN
uM
Lismo
Lisin
Oceih
Bachd
Bacsu
Bacan
Thema
Aquae
lfa je
Xy m
Ca elpj y
H lp
He asp
An npc
Sy eel
Th te
Chl bu
Bor a
Trep
Lepin
M
M e ta
et c
M ka
Me etth
Py tja
Py rfu
Py rho
ra
Arc b
Enc fu
cu
Plafa
Worm
Yeast
Schpo
Arath
Deira
Strco
MyctuH
MyctuC
Mycle
l
Corg f
Core o
Bifl u
n
Fus ete
Th pe
Clo oab
Cl pnT
r R
St rpn u
St trm
S
Mycge
Mycpn
Mycpe
Urepa
Mycp
Chlm u
Chlt u
Ch r
Ch lpnA
Ch lpnC
Ag lpnJ
A rt5
Rh grt5
B im W
Br rum e
us e
u
pe
Aer lto
Su
lso
Su ae
r
Py evo
Th eac
Th lsp
Ha tma
e
M
A tree based on
“Jensen-Shannon”
distance JSh(S, S’),
the “symmetrized”
Kullback-Leibler
distance between the
“induced K-string
probabilities”
ilo r
Rh auc n
C icc
R icpr
R cai
B u ca p
Bu gbr
Wi liC
Eco liE
Eco O
Ecoli
EcoliK
Shifl
Salti
Salty
YerpeC
YerpeK
Haein
Pasmu
Sheo
Vibc n
Vib h
Ne vu
Ne imeM
Ra imeZ
Ps lso
P ea
Xa sep e
Xa na u
nc x
a
S
St trpy
St rpy S
St rpyG M
r
Str agN
ag
La V
Sta cla
S ta e p
au
Staa W
Staa uN
uM
Lismo
Lisin
Oceih
Bachd
Bacsu
Bacan
Thema
Aquae
lfa je
Xy m
Ca elpj y
H lp
He asp
An npc
Sy eel
Th te
Chl bu
Bor a
Trep
Lepin
M
M e ta
e
M tka c
e
Me tth
Py tja
Py rfu
Py rho
ra
Arc b
Enc fu
cu
Plafa
Worm
Yeast
Schpo
Arath
Deira
Strco
MyctuH
MyctuC
Mycle
l
Corg f
e
Cor o
Bifl u
n
Fus ete
Th pe
Clo oab
Cl pnT
r R
St rpn u
St trm
S
0.1
Mycge
Mycpn
Mycpe
Urepa
Mycp
Chlm u
Chl u
Chl tr
Ch pnA
Ch lpnC
Ag lpnJ
A rt5
Rh grt5
B im W
Br rum e
us e
u
pe
Aer lto
Su
lso
Su ae
r
Py evo
Th eac
Th lsp
Ha tma
e
M
The tree drawn to scale
0.05
N
au uW
a
St taa ep
S ta cla
S La agV
StrtragN
S rpyG
St rpyM
St rpyS
St mu
S tr n R
Strp T
Strpn
Cloab
Clope
Thete
Fusnu
Mycge
Mycp
Mycp n
Urep e
Agr Mycpa
u
A t5
Rhgrt5W
B i
Br rumme
R us e
Ca hil u
uc o
r
cn
ic r
R icp
R
M
et
k
M a
e
Th Ar tja
Th evo cfu
ea
Me c
Me tma
H tac
Plaf alsp
Wor a
m
Yeas
Schpot
Enccu
Arath
Thema
Aquae
Chlte
Theel
Synppc
AnasJ
n
Chlp nC
p
Chl lpnA r
Ch Chlt u
lm
Ch epina
L ep u
Tr orb py
B el
H
Deira
Biflo
Coref
Co
Mycrlegl
My
MycctuC
tu
Ba Strco H
c
B a
B acs n
Ocachd u
L
Li isineih
St sm
aa o
uM
1/ K
H
Ca el
m pj
Xa Xyl je
Xa ncafa
Ps nax
Pseepu
Ra ae
Neimlso
Neim eZ
e
VibvuM
Vibch
Sheon
Pasmu
Haein
YerpeK
YerpeCy
Salt i
Salt fl
ShiK
li
Eco liO
o
Ec oliE
Ec oliC
Ec igbr p
W uca ai
B uc
B
1 − (1 − JSh(S,S') )
pe
Aer lto
Su
l so
Su ae rfu
r
Py Py rho b
Py yra tth
P Me
The tree using the calibrated
distance
Comparison with “standard” taxonomy
12
10
# diff
8
6
JSh
4
cali.
2
0
8
9
10
11
12
Word length
13
14
Outlook
• More realistic evolutionary dynamics
• Taking account of the variance of the
frequency distribution
– Better construction of composition vector
– Optimal value of K
Acknowledgement
• Bailin Hao
• Andreas Dress
• Zhao Xu

Similar documents

Hjullastare 2,8-8,0 ton

Hjullastare 2,8-8,0 ton Egenvikt ..................................................ca 5050 kg Brytkraft ....................................................5160 daN Tipplast - rak maskin.................................36...

More information