What is DNA copy number?

Normally, each somatic cell contains 2 copies of every
chromosome.
One of the earliest observed “copy number” changes is trisomy
of chromosome 21 in Down’s Syndrome.
In fact, it became apparent later that chromosome aberrations
come in all forms and sizes.
High density DNA copy number data
Array-based Comparative Genomic Hybridization
Figures from Garnis et al. (2004)
DNA Copy Number Data from Different Platforms
Why analyze DNA copy number?
Cancer genomics
Douglas et al. (2004), colorectal cancer.
Copy number polymorphisms in Hapmap samples
Statistical methods for single sample, total copy
number segmentation
1. Circular Binary Segmentation algorithm of Olshen et al. (2004)
2. HMM-based methods (Fridlyand et al. (2004), Lai et al. (2007))
3. Wavelet-based methods of Hsu et al. (2005)
4. Cluster ALong Chromosomes method of Wang et al. (2005)
5. Many others: CBS, HMM, GLAD, CNV, CGHseg, Quantreg, Wavelet,
Lowess, ChARM, GA, L1 Regularization, ACE...
HMM Model of Fridlyand et al. (2004)
This is a classic application of hidden Markov models:
▶ The underlying states $1, \dots, K$ represent the “true” copy number.
▶ Given state $k$, the observed intensity levels are $N(\mu_k, \sigma^2)$.
▶ The transition matrices and emission parameters are estimated by EM.
▶ The AIC or BIC criterion is used to choose $K$.
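To make the model-selection step concrete, here is a minimal sketch of how the log-likelihood of a Gaussian HMM with a shared variance can be evaluated by the scaled forward algorithm and turned into a BIC score. The function names, the initial distribution `pi`, and the parameter count used in `bic` are assumptions for illustration; they are not taken from Fridlyand et al. (2004), and the EM fitting step itself is not shown.

```python
import numpy as np

def gaussian_hmm_loglik(y, pi, A, means, sigma):
    """Log-likelihood of y under a K-state HMM with N(means[k], sigma^2) emissions.

    pi    : initial state distribution, shape (K,)  (an assumption of this sketch)
    A     : transition matrix, shape (K, K), rows sum to one
    means : state means mu_k, shape (K,)
    sigma : shared emission standard deviation
    """
    y = np.asarray(y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    A = np.asarray(A, dtype=float)
    means = np.asarray(means, dtype=float)
    # Emission densities for every observation and state, shape (n, K).
    z = (y[:, None] - means[None, :]) / sigma
    dens = np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    alpha = pi * dens[0]
    loglik = 0.0
    for t in range(len(y)):
        if t > 0:
            # Propagate one step through the chain, then weight by the emission.
            alpha = (alpha @ A) * dens[t]
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha = alpha / scale  # rescale to avoid numerical underflow
    return loglik

def bic(loglik, K, n):
    """BIC = -2 log L + (number of free parameters) * log n.

    Free parameters counted here (an assumption): K means, one shared variance,
    K(K-1) transition probabilities, and K-1 initial probabilities.
    """
    n_params = K + 1 + K * (K - 1) + (K - 1)
    return -2.0 * loglik + n_params * np.log(n)
```

In use, one would fit the parameters by EM for each candidate $K$, evaluate `bic` at the fitted values, and keep the $K$ with the smallest score.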
A Bayesian Model for Inference
When we estimate model parameters,
confidence intervals are desirable!
1. Confidence bands on estimated copy number.
2. How certain are we that [i, j] contains a CNV?
3. Confidence intervals on the aberration boundaries.
4. Confidence intervals on global measures of “complexity”,
such as the total number of aberrations.
Observations
1. For array-CGH data, there is a known baseline at 0.
2. Due to mosaicism, the data is drawn from mixtures of
discrete copy number levels, and thus is continuous.
3. In some tumors the number of distinct levels is very high.
Fitted Levels
Heterogeneity of cancer samples
Image from: http://science.kennesaw.edu/ mhermes/cisplat/cisplat19.htm
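Stochastic Change Model
$S_t \in \{\text{baseline}, \text{changed}\}$
baseline state: $\theta_t = 0$,
changed state: $\theta_t \sim N(\mu, v)$.
If $S_t$ jumps, $\theta_t$ takes on a new value. Otherwise $\theta_t = \theta_{t-1}$.
\[
y_t = \theta_t + \sigma \epsilon_t, \qquad \epsilon_t \sim N(0, 1)
\]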
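\[
\begin{aligned}
P(S_t = \text{changed} \mid S_{t-1} = \text{baseline}) &= p \\
P(S_t = \text{different changed state} \mid S_{t-1} = \text{changed}) &= b \\
P(S_t = \text{baseline} \mid S_{t-1} = \text{changed}) &= c
\end{aligned}
\]
This can be modeled with a 3-state Markov model with transition
matrix:
\[
P = \begin{pmatrix}
1 - p & \tfrac{1}{2} p & \tfrac{1}{2} p \\
c & a & b \\
c & b & a
\end{pmatrix}.
\]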
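As an illustration of how data arise under this model, the following sketch simulates the state sequence, the segment means, and the observations. The function name `simulate_change_model`, the convention that the chain sits in the baseline state just before the first observation, and the identity $a = 1 - b - c$ (which follows only from requiring each row of $P$ to sum to one) are assumptions of this example.

```python
import numpy as np

def simulate_change_model(n, p, b, c, mu, v, sigma, seed=0):
    """Simulate (S_t, theta_t, y_t) for t = 1..n from the stochastic change model.

    States: 0 = baseline, 1 and 2 = the two "changed" states.
    """
    rng = np.random.default_rng(seed)
    a = 1.0 - b - c  # assumption: rows of the transition matrix sum to one
    P = np.array([[1.0 - p, p / 2.0, p / 2.0],
                  [c, a, b],
                  [c, b, a]])
    states = np.zeros(n, dtype=int)
    theta = np.zeros(n)
    s_prev, theta_prev = 0, 0.0  # assumption: baseline before the first observation
    for t in range(n):
        s = rng.choice(3, p=P[s_prev])
        if s == 0:
            th = 0.0                           # baseline: theta_t = 0
        elif s != s_prev:
            th = rng.normal(mu, np.sqrt(v))    # jump: draw a new level from N(mu, v)
        else:
            th = theta_prev                    # no jump: theta_t = theta_{t-1}
        states[t], theta[t] = s, th
        s_prev, theta_prev = s, th
    y = theta + sigma * rng.standard_normal(n)  # y_t = theta_t + sigma * eps_t
    return states, theta, y
```

For example, `simulate_change_model(500, p=0.02, b=0.01, c=0.05, mu=0.5, v=0.3, sigma=0.2)` produces a mostly flat baseline interrupted by a few piecewise-constant segments, with hypothetical parameter values chosen only for display.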
Estimating $\theta_t$, $S_t$
We can compute:
$E(\theta_t \mid y_{1:n})$: “smoothed” estimate of mean
$P(S_t = \text{changed} \mid y_{1:n})$: probability of CNV at $t$
$P(\text{CNV at } [i,j] \mid y_{1:n})$: probability of aberration at $[i, j]$
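The posterior distribution of $\theta_t$ given $\mathcal{Y}_n$ ($1 \le t \le n$) is a
mixture of normal distributions and a point mass at 0:
\[
\theta_t \mid \mathcal{Y}_n \sim \alpha_t \delta_0 + \sum_{1 \le i \le t \le j \le n} \beta_{ijt} N(\mu_{ij}, v_{ij}).
\]
The parameters of this distribution can be computed by
recursive formulas.
\[
\begin{aligned}
E(\theta_t \mid y_{1:n}) &= \sum_{1 \le i \le t \le j \le n} \beta_{ijt} \mu_{ij}, \\
P(S_t = \text{changed} \mid y_{1:n}) &= 1 - \alpha_t, \\
P(\text{CNV at } [i,j] \mid y_{1:n}) &= \beta_{ijt},
\end{aligned}
\]
since $\alpha_t$ is the weight of the point mass at 0, that is, of the baseline state. Here
\[
\alpha_t = \alpha_t^* / A_t, \qquad
\beta_{ijt} = \beta_{ijt}^* / A_t, \qquad
A_t = \alpha_t^* + \sum_{1 \le i \le t \le j \le n} \beta_{ijt}^*,
\]
\[
\alpha_t^* = p_t \left[ (1 - p)\tilde{p}_{t+1} + c\,\tilde{q}_{t+1} \right] / c,
\]
\[
\beta_{ijt}^* =
\begin{cases}
q_{i,t} \left( p\,\tilde{p}_{t+1} + b\,\tilde{q}_{t+1} \right) / p, & i \le t = j, \\
a\, q_{i,t}\, \tilde{q}_{j,t+1}\, \psi_{i,t}\, \psi_{t+1,j} / (p\, \psi\, \psi_{i,j}), & i \le t < j.
\end{cases}
\]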
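The normalization step can be checked with a small worked example. The sketch below takes unnormalized weights $\alpha_t^*$ and $\beta_{ijt}^*$ for a single position $t$, normalizes them by $A_t$, and returns the posterior mean together with the total weight on the normal components, which is the posterior probability that $\theta_t \neq 0$. The function name and the numbers are hypothetical; computing the starred weights themselves requires the recursions, which are not reproduced here.

```python
import numpy as np

def posterior_summary(alpha_star, beta_star, mu):
    """Normalize mixture weights at one position t and summarize the posterior.

    alpha_star : unnormalized weight of the point mass at 0
    beta_star  : unnormalized weights of the normal components, one per (i, j)
    mu         : component means mu_ij, in the same order as beta_star
    """
    beta_star = np.asarray(beta_star, dtype=float)
    mu = np.asarray(mu, dtype=float)
    A = alpha_star + beta_star.sum()     # A_t
    alpha = alpha_star / A               # alpha_t
    beta = beta_star / A                 # beta_ijt
    post_mean = float(beta @ mu)         # E(theta_t | y) = sum of beta_ijt * mu_ij
    p_nonzero = float(beta.sum())        # equals 1 - alpha_t
    return alpha, beta, post_mean, p_nonzero

# Hypothetical weights for two candidate segments covering t.
alpha, beta, post_mean, p_nonzero = posterior_summary(0.4, [1.0, 0.6], [1.0, 2.0])
```

With these inputs $A_t = 2$, so $\alpha_t = 0.2$, the component weights are $0.5$ and $0.3$, the posterior mean is $0.5 \cdot 1 + 0.3 \cdot 2 = 1.1$, and the probability of a nonzero level is $0.8$.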
Hyperparameter Estimation
The model was defined as:
\[
y_t = \theta_t + \sigma \epsilon_t, \qquad \epsilon_t \sim N(0, 1)
\]
baseline state: $\theta_t = 0$,
changed state: $\theta_t \sim N(\mu, v)$.
$S_t$ modeled by a 3-state Markov model with transition matrix:
\[
P = \begin{pmatrix}
1 - p & \tfrac{1}{2} p & \tfrac{1}{2} p \\
c & a & b \\
c & b & a
\end{pmatrix}.
\]
The hyperparameters of this model are $\sigma, \mu, v, a, b, c, p$.
The likelihood of the data as a function of these hyperparameters
can be expressed by recursive formulas. Maximum-likelihood
values, computed by the EM algorithm, are used.
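One way such a likelihood recursion can be written is sketched below. It is not necessarily the form used by the authors: it is a forward filter that tracks the weight of the baseline state and of every changed segment still open at time $t$, integrating out each segment's level $\theta \sim N(\mu, v)$. The function names, the assumption that the chain is in the baseline state before the first observation, and $a = 1 - b - c$ are choices of this example. The cost is $O(n^2)$, and only likelihood evaluation is shown, not the EM updates.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def change_model_loglik(y, p, b, c, mu, v, sigma):
    """Log-likelihood of y under the stochastic change model (forward recursion)."""
    a = 1.0 - b - c        # assumption: rows of the transition matrix sum to one
    s2 = sigma ** 2
    p_base = 1.0           # assumption: baseline state before the first observation
    q = np.zeros(0)        # filtering weights of the currently open changed segments
    m = np.zeros(0)        # number of observations in each open segment
    s = np.zeros(0)        # sum of the observations in each open segment
    loglik = 0.0
    for yt in np.asarray(y, dtype=float):
        # Posterior of theta within each open segment, then predictive density of yt.
        prec = 1.0 / v + m / s2
        post_mean = (mu / v + s / s2) / prec
        cont = a * q * normal_pdf(yt, post_mean, 1.0 / prec + s2)
        q_tot = q.sum()
        # A new segment starts at t: entered from baseline (p) or from another level (b).
        new = (p * p_base + b * q_tot) * normal_pdf(yt, mu, v + s2)
        # Baseline at t: stayed in baseline (1 - p) or returned from a changed state (c).
        base = ((1.0 - p) * p_base + c * q_tot) * normal_pdf(yt, 0.0, s2)
        total = base + new + cont.sum()
        loglik += np.log(total)
        # Normalize to obtain the filtering weights at time t.
        p_base = base / total
        q = np.append(cont, new) / total
        m = np.append(m + 1.0, 1.0)
        s = np.append(s + yt, yt)
    return loglik
```

A function like this could be evaluated at candidate hyperparameter values to monitor an EM run or to compare fits on the same data.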
Confidence Bands for BT474
Inference on Measures of Genome Complexity
