DOI: 10.1161/CIRCGENETICS.112.963421
Cardiac Structural and Sarcomere Genes Associated with Cardiomyopathy
Exhibit Marked Intolerance of Genetic Variation
Running title: Pan et al.; Cardiomyopathy Gene Variant Intolerance
Downloaded from http://circgenetics.ahajournals.org/ by guest on November 19, 2016
Stephen Pan, MD, MS1,2; Colleen A. Caleshu, ScM, CGC1; Kyla E. Dunn, MS, CGC1;
Marcia J. Foti1; Maura K. Moran, BA1; Oretunlewa Soyinka1; Euan A. Ashley, MRCP, DPhil1
1Stanford Center for Inherited Cardiovascular Disease, Stanford Hospital & Clinics; 2Biomedical
Informatics Training Program, Stanford University School of Medicine, Stanford, CA
Correspondence:
Euan A. Ashley MRCP DPhil
Stanford Center for Inherited Cardiovascular Disease
Falk Cardiovascular Research Center
300 Pasteur Drive
Stanford, CA 94305
Tel: (650) 498-4900
Fax: (650) 725-1599
E-mail: [email protected]
Journal Subject Codes: [16] Myocardial cardiomyopathy disease, [89] Genetics of
cardiovascular disease, [109] Clinical genetics, [146] Genomics
Abstract:
Background - The clinical significance of variants in genes associated with inherited
cardiomyopathies can be difficult to determine due to uncertainty regarding population genetic
variation and a surprising amount of tolerance of the genome even to loss of function variants.
We hypothesized that genes associated with cardiomyopathy might be particularly resistant to
the accumulation of genetic variation.
Methods and Results - We analyzed the rates of single nucleotide genetic variation in all known
genes from the exomes of >5,000 individuals from the National Heart, Lung, and Blood
Institute’s Exome Sequencing Project (ESP), as well as the rates of structural variation from the
Database of Genomic Variants. Most variants were rare, with over half unique to one individual.
Cardiomyopathy associated genes exhibited a rate of nonsense variants 96.1% lower than other
Mendelian disease genes. We tested the ability of in-silico algorithms to distinguish between a
set of variants in MYBPC3, MYH7, and TNNT2 with strong evidence for pathogenicity and
variants from the ESP data. Algorithms based on conservation at the nucleotide level (GERP,
PhastCons) did not perform as well as amino acid level prediction algorithms (Polyphen-2,
SIFT). Variants with strong evidence for disease causality were found in the ESP data at
prevalence higher than expected.
Conclusions - Genes associated with cardiomyopathy carry very low rates of population
variation. The existence in population data of variants with strong evidence for pathogenicity
suggests that even for Mendelian disease genetics, a probabilistic weighting of multiple variants
may be preferred over the ‘single gene’ causality model.
Key words: cardiomyopathy; genetic heart disease; genetic variation; genomics; genetic testing
Background
New DNA sequencing technologies are poised to transform the genetic evaluation of patients.
Soon the availability of genetic information will no longer be a barrier to our understanding of
the genetic basis of disease. Rather it will be our ability to understand and interpret the data that
will be paramount. The interpretation of clinical genetic testing is a complex process that
requires an appreciation of factors establishing causality as well as a detailed understanding of
the ‘tolerated’ genetic variation present in human genomes of different ethnicities. Until recently,
much of the genetic variation in human populations was unknown. With large scale population
sequencing projects such as the 1000 Genomes Project1, the true extent of this variation is now
becoming clear. Indeed, recent analyses indicate a surprising prevalence of tolerated genetic
variation.2-4
Clinical genetic testing is increasingly available for conditions such as hypertrophic
cardiomyopathy, where it is used for predictive family testing, and long QT syndrome, where it
may alter management as well as impact family screening.5-7 The yield from genetic testing,
however, can be variable. Evidence for or against a variant’s role is assembled from previous
reports in the literature, co-segregation, the likelihood that the variant disrupts the reading frame
(weighted more towards nonsense variants, small insertion-deletion variants, or splice site
variants) and algorithmic predictions based on conservation, constraint, or protein motif
disruption. Despite such resources, a large number of variants found through clinical genetic
testing remain of unclear significance. Greatly lacking is knowledge of the population genetic
variation in these and other genes, which is needed for the interpretation of variants not just in
Mendelian diseases, but also for common disease risk assessment8,9 and pharmacogenomics.10-12
One recent project to catalog population scale single nucleotide variant (SNV) data has
been the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project
(ESP)13,14. This large scale effort is aimed at sequencing the exome, consisting of the protein
coding regions (exons) of the human genome, from members of several different cohorts
followed throughout the country for the purpose of defining the genetic components of complex
diseases. In contrast with the 1000 Genomes study, which has low coverage of hundreds of
genomes, the NHLBI exomes study has high coverage (average >100x), high quality sequencing
data for >5000 individuals of Caucasian and African American ethnicity. Thus, it represents a
valuable comparison dataset for variants thought to cause monogenic Mendelian disease. One
limitation of both of these datasets, however, is the absence of structural variants. These may be
particularly important because of their tendency to disrupt the reading frame. The Database of
Genomic Variants (DGV)15 is a curated repository of structural variation (consisting of
insertions, deletions, and copy-number variants) which serves a similar purpose as the above for
structural variation.
Using these sources of population genetic variation, we sought to characterize the
tolerance of the human genome to variation in genes associated with Mendelian diseases with a
specific focus on those that have been associated with inherited cardiomyopathy.
Methods
NHLBI Exome Sequencing Project Data
Data from the NHLBI ESP5400 dataset was accessed on December 12th, 2011 and downloaded
for analysis. This data is the accumulation of called variants from the exomes of 5,379
individuals from multiple cohorts of the ESP, including the Women’s Health Initiative,
Framingham Heart Study, Jackson Heart Study, Multi-Ethnic Study of Atherosclerosis,
Atherosclerosis Risk in Communities, Coronary Artery Risk Development in Young Adults,
Cardiovascular Health Study, Genomic Research on Asthma in the African Diaspora, Lung
Health Study, Pulmonary Arterial Hypertension population, Acute Lung Injury cohort, and the
Cystic Fibrosis cohort. The primary purpose of the ESP is to sequence the exomes of a large
number of individuals selected for the extremes of primarily complex traits from these cohorts.
While these exomes may not represent a true sample of the general population, they do represent
a phenotyped cohort that is unlikely to be enriched for Mendelian disease, with the possible
exception of cystic fibrosis. Resulting SNV calls were filtered for depth and base call thresholds
and were annotated for quality using a support vector machine algorithm by the NHLBI ESP
data analysis group. Only calls that passed all quality filters were used for downstream analysis.
Further information regarding alignment, variant calls, and filtering, as well as the entirety of this
data is available at http://evs.gs.washington.edu/EVS/.
1000 Genomes Data
Data from the 1000 Genomes Phase 1 March 2012 release1 was retrieved and the subset of small
insertion/deletion (1-50 bp) calls were used for analysis (http://www.1000genomes.org/).
Database of Genomic Variants
For evaluation of structural variation, the November 2010 data release from the Database of
Genomic Variants (http://projects.tcag.ca/variation), aligned to the hg19 version of the human
genome, was accessed. This includes data from 42 separate studies evaluating for structural
variation involving segments of DNA greater than 1kb, as well as smaller insertions/deletions
(indels). The data is collected from small individual genome and population level studies without
known enrichment for disease.
Gene Annotation
Gene annotation data was accessed from the Online Mendelian Inheritance in Man database16
(http://www.ncbi.nlm.nih.gov/omim). All genes as annotated in the NCBI Reference Sequence
Database (RefSeq)17 via the University of California, Santa Cruz (UCSC) Genome Browser18
(including alternate isoforms) were divided into subgroups by OMIM annotation and/or literature
review according to their known association with: i) cardiomyopathy, specifically hypertrophic
cardiomyopathy (HCM) or dilated cardiomyopathy (DCM), ii) any other Mendelian disease, or
iii) neither of the above. After accounting for alternate
isoforms, there were 120 isoforms of 46 separate genes associated with inherited
cardiomyopathy, 5,764 isoforms of 2,831 separate genes with other Mendelian disease
association, and 25,437 isoforms of 16,102 separate genes without known Mendelian disease
association for which variant data from the ESP5400 dataset was available.
Analysis of Population Variation Data
Variants from the ESP5400 dataset were grouped by gene into the three categories previously
described and minor allele frequencies for each variant were extracted. Variant subtypes were
analyzed by predicted functional effect (synonymous, missense, nonsense, splice) and the sum of
minor allele frequencies across known isoforms was used to come up with a raw count of
expected number of variants per type per transcript. For synonymous, missense, and nonsense
variants, this number was then normalized for transcript length based on data from RefSeq. For
splice site variants, this number was normalized by number of known exons per transcript.
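The normalization described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the tuple layout of the variant records, the function name `variant_rate_per_megabase`, and the example values are hypothetical and are not taken from the ESP5400 release or from the analysis code used in this study. For splice site variants, the denominator would be the number of exons rather than the transcript length.

```python
from collections import defaultdict

def variant_rate_per_megabase(variants, transcript_lengths):
    """Sum minor allele frequencies per (transcript, variant type) and
    normalize by transcript length, giving an expected number of variants
    per megabase per chromosome.

    variants: iterable of (transcript_id, variant_type, maf) tuples
              (hypothetical record layout)
    transcript_lengths: dict mapping transcript_id -> length in bases
    """
    maf_sums = defaultdict(float)
    for transcript_id, variant_type, maf in variants:
        maf_sums[(transcript_id, variant_type)] += maf

    rates = {}
    for (transcript_id, variant_type), total in maf_sums.items():
        length = transcript_lengths[transcript_id]
        rates[(transcript_id, variant_type)] = total / length * 1_000_000
    return rates

# Hypothetical example: one 5,000-base transcript with three variants.
example_variants = [
    ("TX1", "missense", 0.001),
    ("TX1", "missense", 0.0005),
    ("TX1", "nonsense", 0.0001),
]
example_lengths = {"TX1": 5000}
rates = variant_rate_per_megabase(example_variants, example_lengths)
```

Summing allele frequencies rather than counting distinct variants weights each variant by how often it is expected to appear on a single chromosome, which matches the per-chromosome rates reported in the Results.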
In order to evaluate the distribution of small indels (1-50 bp) which were notably absent
from the public release of the ESP5400 dataset, the subset of called indels from the 1000
Genomes Phase 1 March 2012 release was retrieved and annotated using ANNOVAR19 software
against the NCBI RefSeq database17 to determine the subset in coding regions of genes with any
disease association as above.
Curation of Known Variants
We manually curated a set of variants in MYH7, MYBPC3, and TNNT2 with strong evidence for
causing cardiomyopathy. This set comprised missense variants seen in patients at the Stanford
Center for Inherited Cardiovascular Disease from September 2010 to December 2011 and
considered likely or very likely disease causing. To supplement this list, we selected variants
from a publicly available repository of sarcomeric variants20 with the highest number of
independent citations. These variants were then manually curated and any variants we considered
likely or very likely disease causing were included in our high confidence set. Curation relied on
published data, cases from our clinical cohort, and case or control data from commercial genetic
testing laboratories. Classification was based on segregation data, presence in multiple unrelated
cases, absence in controls, and availability of compelling animal model or in vitro data. Variants
were considered very likely disease causing only if strong segregation data and/or animal model
data was available.
Algorithmic Prediction of Variant Pathogenicity
All missense variants from the NHLBI ESP5400 dataset as well as variants from our curated list
of known pathogenic variants in HCM were scored using GERP21, a measure of evolutionary
constraint at a nucleotide base level utilizing a rejected substitution score, and PhastCons22,
another measure of evolutionary conservation at the nucleotide base level utilizing multiple
sequence alignment, using the SeattleSeq SNP Annotation server
(http://snp.gs.washington.edu/SeattleSeqAnnotation). Polyphen223
(http://genetics.bwh.harvard.edu/pph2/) and SIFT24 (http://sift.jcvi.org/) scores, both
predictions of pathogenicity of missense variants based on the effects of the predicted resulting
amino acid substitution, were obtained from their respective servers.
Structural Variation Analysis
Structural variants from DGV were grouped on the basis of Mendelian disease association. The
average number of structural variants per gene was computed. Due to the varying size of both
structural variants and the transcripts they affect, we normalized by evaluating only structural
variants affecting protein coding regions of genes and calculating the percent of each gene’s
coding region based on transcript length affected by a deletion in DGV.
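One way to compute the percent of a coding region affected by deletions is sketched below in Python. The interval representation (half-open `(start, end)` coordinates on a single chromosome), the function names, and the example coordinates are assumptions made for illustration. They do not reflect the DGV file format or the code used in this analysis.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]

def percent_coding_deleted(coding_exons, deletions):
    """Percent of coding bases overlapped by at least one deletion."""
    exons = merge_intervals(coding_exons)
    dels = merge_intervals(deletions)
    coding_length = sum(end - start for start, end in exons)
    if coding_length == 0:
        return 0.0
    covered = 0
    for ex_start, ex_end in exons:
        for del_start, del_end in dels:
            overlap = min(ex_end, del_end) - max(ex_start, del_start)
            if overlap > 0:
                covered += overlap
    return 100.0 * covered / coding_length

# Hypothetical gene with two 100-base coding exons and two overlapping deletions.
example_exons = [(100, 200), (300, 400)]
example_deletions = [(150, 350), (340, 360)]
pct = percent_coding_deleted(example_exons, example_deletions)
```

Merging the deletions first ensures that a base covered by several reported deletions is counted only once.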
Statistical Analysis
All data analysis was carried out using the R Statistical Programming Language. Tests for
statistical significance between groups were non-parametric tests without assumption of the
underlying distribution. These included the Wilcoxon rank-sum test for direct comparison
between two groups, the Kruskal-Wallis test for analysis of variance, and Spearman’s rank order
for correlation. Given that most genes are not in linkage with each other, linkage between genes
does not affect the results of the Kruskal-Wallis test significantly.
For the analysis of the exonic distribution of pathogenic and ESP variants, Fisher’s exact
test was used. While Fisher’s test does assume independence of events which may not
necessarily be true for the distribution of variants in a gene due to linkage disequilibrium, given
the overall rarity of most variants analyzed (almost all less than 1% minor allele frequency and
the majority being unique) it is unlikely that a rare variant in one exon significantly affects the
probability of a variant in another exon.
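As an illustration of one of the non-parametric statistics named above, the following Python sketch computes Spearman’s rank correlation as the Pearson correlation of tie-averaged ranks. The analysis itself was carried out in R; this is a minimal re-expression with hypothetical function names, not the code used in the study.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Because only ranks enter the calculation, the statistic makes no assumption about the underlying distribution of per-gene variant rates.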
Results
Most Genetic Variation is Rare
Most variants in the population data were not shared between many individuals. Private variants,
those that were found only in one person, were abundant. Out of the 9,974 total variants called in
the NHLBI exomes distributed amongst 46 separate cardiomyopathy associated genes, 9,103
(91%) had minor allele frequencies less than 1%. Of these rare variants, 5,448 (60%) were
private. This predominance of rare variants was almost identical in other genes, whether or not
they were associated with Mendelian disease. Common variants (minor allele frequency > 5%)
comprised only 5% of all genetic variation in the coding regions of human genes.
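The frequency classes used above can be expressed as a short Python sketch. The thresholds (rare below 1%, common above 5%) come from the text. The function name, the use of a single observed minor allele as a proxy for a private variant, and the "intermediate" label for the 1% to 5% range are assumptions made for this illustration.

```python
def classify_variant(minor_allele_count, total_alleles):
    """Label a variant by frequency class.

    "private" is approximated here as a single observed minor allele,
    which is an assumption of this sketch. "intermediate" is a
    hypothetical label for frequencies between 1% and 5%.
    """
    maf = minor_allele_count / total_alleles
    if minor_allele_count == 1:
        return "private"
    if maf < 0.01:
        return "rare"
    if maf > 0.05:
        return "common"
    return "intermediate"

# Counts reported in the text for cardiomyopathy associated genes.
total_variants = 9974
rare_variants = 9103
private_variants = 5448
pct_rare = round(100 * rare_variants / total_variants)
pct_private_of_rare = round(100 * private_variants / rare_variants)
```

With 5,379 individuals there are 10,758 autosomal alleles per site, so a private variant corresponds to a minor allele frequency of roughly 0.01%.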
We found many genes for which a large amount of genetic variation was not only
expected, but likely serves a critical purpose. Among Mendelian disease associated genes, the
five with the highest rates of missense variation were all HLA loci (Supplementary Table 2),
where high rates of polymorphism are thought to be selectively maintained.25 Another
well-recognized gene locus with very high missense variation was the ABO blood group locus.
Among non-Mendelian disease associated genes, those with the most variation included many of
the olfactory receptor genes, consistent with the survival advantage of a sophisticated sensing
system for environmental odorant molecules.26
Missense variant and nonsense variant rates did not appear correlated when looking
across all genes (Spearman’s rho=0.36). This remained true when looking at the subset of genes
with Mendelian disease association or the subset without Mendelian disease association.
Mendelian Disease Genes Exhibit Lower Rates of Genetic Variation
We found significantly lower levels of variation in genes associated with Mendelian disease as
compared to genes without a known association (Table 1). In general, this reduction was much
stronger for types of genetic variation that would be predicted to have more impact on the
resulting protein product, such as splice site or nonsense variants. Mendelian disease genes were
noted to have a 67.3% lower rate of nonsense variants as compared to genes without known
disease association (p=9.6x10^-6). These were even more rare in cardiomyopathy associated genes
(Figure 1), which exhibited a 98.7% lower nonsense variant rate as compared to non-disease
associated genes and a 96.1% lower rate as compared to the remaining Mendelian disease
associated genes (p=5.7x10^-7). Similarly lower variant rates were seen for both missense and
splice site variants as well. Interestingly, this was reversed with respect to synonymous variation,
with cardiomyopathy specific genes having slightly higher rates of variation (116.4 variants per
megabase of coding region per chromosome in cardiomyopathy genes vs. 90.8 and 95.1 variants
per megabase of coding region per chromosome for non-OMIM and OMIM genes, respectively,
p=2.7x10^-3).
Nonsense Variants are Extremely Rare in Cardiac Structural and Sarcomere Genes
Single nucleotide variants thought to have the most effect on protein function are ones that result
in a premature stop codon, i.e. nonsense variants. We looked at cardiomyopathy-associated genes
in the NHLBI exome data to evaluate for the overall prevalence of this type of variation in a
population without known inherited cardiomyopathy. Overall, we found that nonsense variants
were extremely rare in these genes. In fact, in the subset of genes that are routinely sequenced for
clinical purposes in HCM, we found only one nonsense variant each in MYH7 and MYBPC3.
Nonsense variants were completely absent in the sarcomeric genes ACTC1, TNNT2, TNNI3,
MYL2, MYL3, and TPM1. While the nonsense variant in MYH7 has not been reported previously,
the nonsense variant found in MYBPC3 (p.Trp1214Ter) has been associated with hypertrophic
cardiomyopathy in one published report in an Asian Indian population27.
Among cardiomyopathy-associated genes, the gene with the greatest number of nonsense
variants in the ESP5400 exomes data was the very large gene titin (TTN), which has been
implicated in familial DCM. This may be largely due to its immense size, as the coding region of
titin consists of upwards of 100 kilobases. In total we noted 23 predicted nonsense variants in
titin in the NHLBI exome data. The majority of these nonsense variants seemed to be distributed
evenly throughout the length of the gene, although there were two notable clusters of nonsense
variants near the 5’ end of the gene (Figure 2). This is in direct contrast to a recent report of a
high burden of variants in the A band of the titin protein (corresponding to a group of exons near
the 3’ end of the transcript) associated with dilated cardiomyopathy (DCM)28. Both clusters of
nonsense variants in our analysis were in exons that are specific to the novex alternate splice
isoforms of titin, the first in the terminal exon (exon 46) of the novex-3 isoform (NM_133379)
and the other in exon 44 of the novex-2 isoform (NM_133437). Neither of these are the major
cardiac isoforms of titin, which may explain why nonsense variants in these regions may be more
tolerated.
On the other hand, DMD, which has been implicated in Duchenne and Becker muscular
dystrophy as well as X-linked familial cardiomyopathy,29,30 was noted to manifest an extremely
low rate of nonsense variants despite its enormous size. Of all human genes, DMD spans the
largest region of the genome: encompassing 2.4 million bases, with a coding region consisting of
about 14 kb spread over greater than 70 exons. The NHLBI dataset, however, contained only one
predicted nonsense variant within this gene.
Prediction of Pathogenicity of Missense Variants Remains Challenging
We collected 46 variants, 40 of which were missense, with particularly strong evidence of
causality from three genes most often found to be causal in HCM (MYBPC3, MYH7, and
TNNT2) (Supplemental Table 3). Given a large amount of ambiguity over the effects of
missense variants in the genome, we compared the missense variants from this pathogenic list to
missense variants from the NHLBI exome data within the same genes. These 40 pathogenic
missense variants were generally located in regions within these three genes that were notable for
very low variant frequencies in the population data, suggesting that these are regions with vital
functions that do not tolerate high rates of variation (Figure 3).
Furthermore, 10/26 pathogenic missense variants in MYH7 and 6/10 of the pathogenic
missense variants in TNNT2 were found in exons that were notable for a complete absence of
non-synonymous likely benign variation (Supplemental Table 4). These exons in MYH7 (exons
6, 7, 9, 13, and 19 of NM_000257) and in TNNT2 (exon 10 of NM_000364) thus likely encode
critical functional domains in the resultant peptide. In support of this, the above noted exons in
MYH7 all encode for portions of the functional head and neck domains31. In addition, the above
mentioned exon in TNNT2 encodes a portion of a tropomyosin binding site, with induced
variants in this exon previously shown to strongly reduce binding efficacy32,33. In general, exonic
distribution was strikingly different between the pathogenic variants and ESP5400 variants in
MYH7 (p=.0059) and TNNT2 (p=.013). This difference was not statistically significant in
MYBPC3, which may be due to the low number of pathogenic missense variants in this gene in
our collection, consistent with reports that the majority of disease-causing variants in this gene
tend to be frameshift, splice, or nonsense variants rather than missense34,35.
Of note, 4 of the 46 variants with good evidence of pathogenicity were present in the
NHLBI exome data. The individual incidences of these variants were very low, with almost all
found in only 1 individual each, except for one variant in TNNT2, p.Arg278Cys that was found
in 6 individuals in the NHLBI exome cohort. No phenotype information was available to us for
these individuals. These variants were removed from the NHLBI ESP variant list for any further
analysis.
We used widely accepted variant classification algorithms to predict pathogenicity of
missense variants. We found the evolutionary constraint based algorithms GERP and PhastCons
to be poorly predictive of variant pathogenicity in this data. Notably, GERP scores appeared on
the whole to be higher in the NHLBI ESP variant set (Figure 4), the opposite of what would be
expected. While PhastCons predicted scores of > 0.95 (max score of 1) for all the variants in our
curated causative variant list, the majority of presumably tolerated missense variants (67%) from
the NHLBI exome data set were also noted to have a similarly high PhastCons score, resulting in
a c-statistic for classification of 0.52, akin to no discriminatory power (Figure 5).
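The c-statistic reported here can be read as the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen population variant. The Python sketch below computes it by direct pairwise comparison. The function name and the convention that higher scores indicate pathogenicity are assumptions of this illustration. For an algorithm such as SIFT, where lower scores predict a damaging substitution, the scores would need to be negated first.

```python
def c_statistic(pathogenic_scores, benign_scores):
    """Fraction of (pathogenic, benign) pairs in which the pathogenic
    variant scores higher, counting tied pairs as one half."""
    wins = 0.0
    for p in pathogenic_scores:
        for b in benign_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(pathogenic_scores) * len(benign_scores))
```

A value of 0.5 corresponds to no discrimination, which is why a score that is uniformly high in both variant sets yields a c-statistic close to 0.5 even though every pathogenic variant scores well.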
The use of algorithms based on amino acid substitution gave much better results. SIFT
had modest discriminatory power with a c-statistic of 0.70. Polyphen-2, which also utilizes
information about peptide structure and interaction, performed the best with a c-statistic of 0.77.
It should be noted however that Polyphen-2 is based on a machine-learning algorithm that was
trained on variants that may have included some of those from our curated list.
Cardiomyopathy Genes Exhibit Less Structural Variation
We attempted to recapitulate these findings in other types of genetic variation by evaluating the
distribution of small indels in data from the 1000 Genomes Project. There were notably only
5,969 indels from this dataset in coding regions, of which 868 were in Mendelian disease
associated genes and 26 were in cardiomyopathy associated genes. This gave total rates of 17
indels per 1,000 exons in non-Mendelian disease genes, 10 indels per 1,000 exons in Mendelian
disease genes, and 9 indels per 1,000 exons in cardiomyopathy genes. However the overall low
number of these types of variants in this data limited any further statistical analysis.
We then used data from DGV to query on a per gene basis the number of all structural
variants that have been reported as well as the overall extent of the coding region of genes that
are covered by known structural variants. We found that the total number, per gene, of all
structural variants and only structural variants affecting coding regions did not differ between
genes associated with Mendelian disease and those that are not (Table 2). However we did note
a 53% reduction of coding region covered by reported deletion type structural variants in genes
that are specifically associated with cardiomyopathy as compared to genes without Mendelian
disease association (p-value=0.02).
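As a quick arithmetic check, the relative reduction quoted above can be recomputed from the rounded percentages reported in Table 2 (32% for genes without Mendelian disease association, 15% for cardiomyopathy genes). This is only an illustrative sketch: the function name is ours, and it assumes the Table 2 row corresponds to the comparison described in the text. Because it uses rounded table values, it need not match a calculation on the unrounded per-gene data.

```python
def relative_reduction(reference, observed):
    """Fractional reduction of `observed` relative to `reference`."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return (reference - observed) / reference


# Rounded values taken from Table 2 (average % of coding region covered by SVs).
non_omim_pct = 32.0
cardiomyopathy_pct = 15.0

reduction = relative_reduction(non_omim_pct, cardiomyopathy_pct)
print(f"Relative reduction: {reduction:.0%}")
```

The same helper applies to any pair of group averages, but it says nothing about statistical significance, which the text assesses separately with a rank-based test.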
Discussion
Recent studies have suggested a surprising rate of tolerance to genetic variation within the
human genome. Here, we show that this tolerance does not extend to genes associated with
cardiomyopathy, especially structural and sarcomere genes. This observation fits with a systems
model of organism function where some genes are disproportionately intolerant of variation
because their function has less redundancy. In addition, in describing population variation data
for these genes, we note the presence of a surprising number of disease-associated variants in a
population without enrichment for cardiomyopathy.
In contrast to the high rate of genetic variation found in genes dependent on diversity for
effective function such as the olfactory receptor loci, we found that population genetic variation,
especially variation expected to affect protein function, was rare in Mendelian disease associated
genes. We hypothesized that genes essential for cardiac function might be among the genes most
intolerant of variation. Not only was this the case, but the strength of these associations was also
found to be dependent on the severity of the predicted alteration of protein function, exemplified
by the extreme rarity of nonsense variants in cardiomyopathy-specific genes. These findings
extended to structural variants as well, specifically with regard to the percentage of the coding
transcript that is involved in deletion type structural variants in individuals without disease.
One strength of our study is in the practical application to clinical genetic testing, which
relies on data from unaffected individuals to judge the likely pathogenicity of novel variants. As
our understanding of human genetic variation has improved, it has become clear that even rare
genetic variation can be normal and well tolerated, representing a challenge in linking genotypes
to phenotypes. One recent study has estimated, using 1000 Genomes data, that the average
person has as many as 100 loss of function variants per genome2. This population level of
variation has implications for the interpretation of results of clinical genetic testing. However, our
results indicate that this variation is not evenly distributed and genes for which associations with
Mendelian disease have been established have much lower levels of such variation, likely
representing the effects of purifying selection.
Why genes associated with cardiomyopathy show even lower rates of genetic variation
than other Mendelian disease associated genes is not self-evident but there are many possibilities
as to why this may be the case. One study has suggested that Mendelian disease genes may not
necessarily be the hubs of gene networks36 (because to manifest disease a variant cannot be
fatal). However, genes associated with cardiomyopathy may be an exception given their essential
functions within the sarcomere and the heart’s unique position in serving all other organs.
Variants in these highly structured peptides with molecular motor functions that operate
constantly throughout life would be expected to be heavily selected against in the general
population. The finding of a slight increase in synonymous variants in cardiomyopathy-associated
genes is unexpected. It is possible that this represents a decrease in codon use bias in
these genes relative to others, which may in turn reflect a decreased need of efficiency of
translation of these structural proteins, but why this may be the case is not evident.
One intriguing finding in cardiomyopathy genetics is the contrast between disease
causing variants found in MYBPC3 and those in MYH7, the two genes with the highest number
of HCM-causing variants. Indeed, the high rate of nonsense pathogenic variants found in
MYBPC3 (ref. 37) is in contrast with the almost universal missense nature of those found in MYH7. The
extreme rarity of nonsense variants in cardiomyopathy genes in the data presented here suggests
that a high probability for pathogenicity for such variants found in MYBPC3 in patients would be
appropriate. The absence of disease-causing nonsense variants in MYH7 is curious. It may be that
MYH7 haploinsufficiency is not tolerated at all. We do note that predisposition of genes
towards one type of variation versus another is not uncommon given the poor correlation
between rates of different types of variation noted in our data, which may be driven by the
resulting effects of such variants (dominant-negative effects in missense variants versus
haploinsufficiency states in nonsense variants).
Missense variants remain among the most difficult to interpret in a clinical context.
Without a large number of affected and unaffected family members to show co-segregation of
variant with disease, it is often difficult to determine if a missense variant truly is pathogenic.
Much has been made of the use of measures of evolutionary conservation to prioritize missense
variants. Our analysis shows that while these measures can help exclude variants at positions in
the genome that do not show conservation, they are unable to efficiently discriminate between
likely causative and non-causative variants. While evolutionary conservation at the nucleotide
base level appears to be a necessary characteristic of a pathogenic variant, it is not sufficient in
and of itself to classify a variant as causative. Algorithms using the predicted effects of the
resulting amino acid substitution showed much better classification potential although this may
in part reflect the use of cardiomyopathy causative variants as training data for these classifiers.
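The distinction drawn above between a score that can exclude variants and one that can discriminate between classes can be made concrete with a small sketch. The scores below are invented purely for illustration and are not taken from the study; the function name `roc_auc` and the cutoff of 4.0 are likewise our own assumptions. The function computes the area under the ROC curve as the probability that a randomly chosen pathogenic variant scores higher than a randomly chosen background variant, which is a standard equivalent definition of the AUC.

```python
def roc_auc(pathogenic_scores, background_scores):
    """Area under the ROC curve, computed as the fraction of
    (pathogenic, background) pairs in which the pathogenic variant
    has the higher score; ties count as one half."""
    if not pathogenic_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pathogenic_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(pathogenic_scores) * len(background_scores))


# Hypothetical conservation scores, invented for illustration only.
pathogenic = [4.9, 5.2, 5.6, 5.8]
background = [-1.0, 0.3, 4.8, 5.1, 5.5, 5.7]

# A low cutoff removes some background variants without losing any pathogenic ones...
cutoff = 4.0
excluded_background = sum(score < cutoff for score in background)
lost_pathogenic = sum(score < cutoff for score in pathogenic)

# ...but above the cutoff the two classes overlap, so discrimination is imperfect.
auc = roc_auc(pathogenic, background)
print(excluded_background, lost_pathogenic, auc)
```

In this toy data set every pathogenic variant is conserved, so a conservation filter is safe to apply, yet many conserved background variants remain and the AUC stays well short of 1. That is the pattern described in the text as necessary but not sufficient.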
Our analysis also confirms recent evidence that the overwhelming majority of variation in
the human genome is rare (i.e. affecting < 1% of the population). Interestingly, more than half of
variants analyzed were private (found in only one person). In fact, taking all 8 commonly
sequenced genes for HCM together (ACTC1, TNNT2, TNNI3, MYL2, MYL3, TPM1, MYH7,
MYBPC3), we found 159 private missense variants, 3 private splice site variants, and 2 private
nonsense variants for a total of 164 private variants that would have the potential to affect the
resulting protein. Assuming that none of these variants was found in the same person, this would
imply that 3% of a general population sample who were to be sequenced today would have
candidate variants not seen previously on a small HCM disease genetic testing panel. This
highlights the continued importance of co-segregation and other supporting data in deciding
whether or not a novel variant is causative of disease.
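The 3% figure follows directly from the counts given above, and the arithmetic can be sketched as below. The variable names are ours. The calculation relies on the same simplifying assumption stated in the text, namely that no individual carries more than one of these private variants, so it is an upper bound on the fraction of carriers.

```python
# Private variant counts across the 8 commonly sequenced HCM genes (from the text).
private_missense = 159
private_splice_site = 3
private_nonsense = 2
individuals = 5379  # exomes in the NHLBI ESP data set

private_total = private_missense + private_splice_site + private_nonsense

# Upper bound on carriers: assumes each private variant is in a different person.
carrier_fraction = private_total / individuals
print(f"{private_total} private variants, about {carrier_fraction:.0%} of individuals")
```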
It was also surprising to find 4 of 46 “gold standard” pathogenic variants present in this
population sample, with a total pathogenic allele count of 9 among 5,379 individuals. These data
would imply a background prevalence of variants believed causative of HCM of approximately
0.2% (based on 46 variants in 3 genes, and thus likely a substantial underestimate). However,
this is much higher than expected in a general population where the prevalence of HCM is
estimated to be 0.2% in multiple populations,38-40 when considering that the yield of genetic
testing is far from 100%. This is consistent with other recently published studies finding higher
than expected prevalence of genetic variants associated with other Mendelian cardiovascular
diseases such as familial DCM14 and long QT syndrome41, though the burden of evidence of
pathogenicity for variants in these studies was variable.
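The comparison between observed and expected carrier rates can be illustrated with a short calculation. The observed rate uses the counts reported above (9 pathogenic alleles among 5,379 individuals) and assumes each allele is carried by a different person. The expected rate is a sketch under a fully penetrant single-gene model; the testing yield and the share of positive tests explained by the curated variants are hypothetical inputs chosen for illustration, not values from this study.

```python
pathogenic_alleles = 9
individuals = 5379
hcm_prevalence = 0.002  # 0.2%, the population prevalence cited in the text

# Assumes each pathogenic allele is carried by a different individual.
observed_carrier_rate = pathogenic_alleles / individuals


def expected_carrier_rate(prevalence, testing_yield, panel_share):
    """Carrier rate expected if every carrier were affected (full penetrance).

    testing_yield: fraction of affected people with an identifiable variant.
    panel_share:   fraction of those variants that are on the curated list.
    """
    return prevalence * testing_yield * panel_share


# Hypothetical inputs for illustration only.
assumed_yield = 0.5
assumed_panel_share = 1.0  # deliberately generous; the true share is lower

expected = expected_carrier_rate(hcm_prevalence, assumed_yield, assumed_panel_share)
excess = observed_carrier_rate / expected
print(f"observed {observed_carrier_rate:.2%}, expected {expected:.2%}, ratio {excess:.2f}")
```

Because the curated list covers only 46 variants in 3 genes, a realistic `assumed_panel_share` would be well below 1, which lowers the expected rate and widens the gap between observation and expectation.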
While it remains possible that some individuals within these cohorts may harbor
undiagnosed HCM given that phenotype data for these individuals is not publicly available, the
genetic prevalence rate would still be expected to be much lower than that observed in this data.
Based on this genetic variant prevalence data, estimates of the incidence of HCM would have to
be underestimated by a factor of at least 2 for our current models of HCM disease inheritance to
be true. Given that these estimates of HCM disease prevalence were based on multi-modality
screening in diverse populations, it seems likely that some proportion of the variants thought to
be causal of HCM under a single gene model cannot be. Alternatively, we posit that the idea of a
single gene disorder with variable penetrance is likely an artifact of a limited genomic window,
and that what has commonly been perceived as a single gene disorder may in fact be the result of
a combination of multiple genetic variants each contributing a portion of the variance, with
variants contributing differently in different individuals. Just as some have suggested that a
number of rare variants with strong effect size may be the driver of the inherited component in
many common diseases8,42,43, so too might this be the case for what have historically been
perceived as monogenic disorders.
Limitations
Our study has limitations. No individual phenotype data for the cohorts in NHLBI-ESP, 1000
Genomes, or DGV is publicly available, so it is not possible for us at this time to determine if
those individuals with variants from our curated set may have features of an undiagnosed
cardiomyopathy. While the accumulated set of variants from these 5,379 individuals is available,
individual exomes cannot be reconstructed so it is not possible to determine which variants may
be shared on the same chromosome. The family structure of the individuals within the
NHLBI ESP data was also unknown. It is thus possible that a rare variant could be
overrepresented if many members of the same family were sequenced.
Conclusion
In conclusion, using publicly available exome-wide sequencing data from thousands of
individuals, we found that genes associated with Mendelian diseases show much lower rates of
protein-altering genetic variation, including missense, nonsense, and splice-site variation, with an
extreme intolerance of variation noted specifically in cardiomyopathy-associated genes.
Cardiomyopathy-associated genes specifically showed intolerance to structural variation as well.
Nonsense variants in genes that have been recurrently linked to hypertrophic cardiomyopathy
were extremely rare, and our results suggest that such variants in these genes found on clinical
testing have a very high likelihood of being pathogenic. In contrast, novel missense variants were
present in at least 3% of individuals, and thus the careful interpretation of missense variants
found on clinical genetic testing is critical. Current in silico classification schemes for predicting
the pathogenicity of missense variants unfortunately have low power in classifying
cardiomyopathy variants. Finally, we note a much higher than expected prevalence of variants
with strong evidence for pathogenicity. This suggests that, using the power of genome
sequencing, a new framework for heterogeneous Mendelian disorders such as inherited
cardiomyopathies needs to be developed where variants found in patients and family members
are viewed probabilistically on a spectrum from unlikely to likely contributors of variable
individual magnitude. While this model challenges the classic ‘single variant in a single gene
disorder’ view, it may also begin to explain some of the significant variability in disease
expression found in family members with the same ‘causal’ variant.
Acknowledgments: The authors would like to thank the NHLBI GO Exome Sequencing Project
and its ongoing studies which produced and provided exome variant calls for comparison: the
Lung GO Sequencing Project (HL-102923), the WHI Sequencing Project (HL-102924), the
Broad GO Sequencing Project (HL-102925), the Seattle GO Sequencing Project (HL-102926)
and the Heart GO Sequencing Project (HL-103010).
Funding Sources: Stephen Pan is supported by NIH grant 5T15LM007033. This work was also
supported in part by NIH grants DP2OD004613, R01HL105993, UL1RR029890 (Euan Ashley).
Conflict of Interest Disclosures: Euan Ashley reports equity and consulting in relation to
Personalis Inc.
References:
1. 1000 Genomes Project Consortium. A map of human genome variation from population-scale
sequencing. Nature. 2010;467:1061–1073.
2. MacArthur DG, Balasubramanian S, Frankish A, Huang N, Morris J, Walter K, et al. A
systematic survey of loss-of-function variants in human protein-coding genes. Science.
2012;335:823–828.
3. Li Y, Vinckenbosch N, Tian G, Huerta-Sanchez E, Jiang T, Jiang H, et al. Resequencing of
200 human exomes identifies an excess of low-frequency non-synonymous coding variants. Nat
Genet. 2010;42:969–972.
4. Korbel JO, Urban AE, Affourtit JP, Godwin B, Grubert F, Simons JF, et al. Paired-end
mapping reveals extensive structural variation in the human genome. Science. 2007;318:420–426.
5. Gersh BJ, Maron BJ, Bonow RO, Dearani JA, Fifer MA, Link MS, et al. 2011 ACCF/AHA
guideline for the diagnosis and treatment of hypertrophic cardiomyopathy: executive summary: a
report of the American College of Cardiology Foundation/American Heart Association Task
Force on Practice Guidelines. Circulation. 2011;124:2761–2796.
6. Ackerman MJ, Priori SG, Willems S, Berul C, Brugada R, Calkins H, et al. HRS/EHRA
expert consensus statement on the state of genetic testing for the channelopathies and
cardiomyopathies this document was developed as a partnership between the Heart Rhythm
Society (HRS) and the European Heart Rhythm Association (EHRA). Heart Rhythm. 2011;
8:1308–1339.
7. Wheeler M, Pavlovic A, DeGoma E, Salisbury H, Brown C, Ashley EA. A New Era in
Clinical Genetic Testing for Hypertrophic Cardiomyopathy. J Cardiovasc Transl Res.
2009;2:381–391.
8. Cirulli ET, Goldstein DB. Uncovering the roles of rare variants in common disease through
whole-genome sequencing. Nat Rev Genet. 2010;11:415–425.
9. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the
missing heritability of complex diseases. Nature. 2009;461:747–753.
10. Ashley EA, Butte AJ, Wheeler MT, Chen R, Klein TE, Dewey FE, et al. Clinical assessment
incorporating a personal genome. Lancet. 2010;375:1525–1535.
11. Dewey FE, Chen R, Cordero SP, Ormond KE, Caleshu C, Karczewski KJ, et al. Phased
Whole-Genome Genetic Risk in a Family Quartet Using a Major Allele Reference Sequence.
PLoS Genet. 2011;7:e1002280.
12. Pan S, Dewey FE, Perez MV, Knowles JW, Chen R, Butte AJ, et al. Personalized Medicine
and Cardiovascular Disease: From Genome to Bedside. Curr Cardiovasc Risk Rep. 2011;5:542–
551.
13. Tennessen JA, Bigham AW, O'Connor TD, Fu W, Kenny EE, Gravel S, et al. Evolution and
Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes. Science.
2012;337:64–69.
14. Norton N, Robertson PD, Rieder MJ, Züchner S, Rampersaud E, Martin E, et al. Evaluating
Pathogenicity of Rare Variants From Dilated Cardiomyopathy in the Exome Era. Circ
Cardiovasc Genet. 2012;5:167–174.
15. Zhang J, Feuk L, Duggan GE, Khaja R, Scherer SW. Development of bioinformatics
resources for display and analysis of copy number and other structural variants in the human
genome. Cytogenet Genome Res. 2006;115:205–214.
16. Online Mendelian Inheritance in Man, OMIM®. Online Mendelian Inheritance in Man.
Available at: http://omim.org. Accessed December 11, 2011.
17. Pruitt KD, Tatusova T, Klimke W, Maglott DR. NCBI Reference Sequences: current status,
policy and new initiatives. Nucleic Acids Res. 2009;37:D32–D36.
18. Fujita PA, Rhead B, Zweig AS, Hinrichs AS, Karolchik D, Cline MS, et al. The UCSC
genome browser database: update 2011. Nucleic Acids Res. 2011;39:D876–D882.
19. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from
high-throughput sequencing data. Nucleic Acids Res. 2010;38:e164–e164.
20. Genomics of Cardiovascular Development, Adaptation, and Remodeling. NHLBI Program
for Genomic Applications, Harvard Medical School. Available at:
http://www.cardiogenomics.org. Accessed January 20, 2012.
21. Cooper GM. Distribution and intensity of constraint in mammalian genomic sequence.
Genome Res. 2005;15:901–913.
22. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, et al.
Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res.
2005;15:1034–1050.
23. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method
and server for predicting damaging missense mutations. Nat Methods. 2010;7:248–249.
24. Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on
protein function using the SIFT algorithm. Nat Protoc. 2009;4:1073–1081.
25. Hughes AL, Yeager M. Natural selection at major histocompatibility complex loci of
vertebrates. Annu Rev Genet. 1998;32:415–435.
26. Menashe I, Man O, Lancet D, Gilad Y. Different noses for different people. Nat Genet.
2003;34:143–144.
27. Bashyam MD, Purushotham G, Chaudhary AK, Rao KM, Acharya V, Mohammad TA, et al.
A low prevalence of MYH7/MYBPC3 mutations among familial hypertrophic cardiomyopathy
patients in India. Mol Cell Biochem. 2012;360:373–382.
28. Herman DS, Lam L, Taylor MRG, Wang L, Teekakirikul P, Christodoulou D, et al.
Truncations of titin causing dilated cardiomyopathy. N Engl J Med. 2012;366:619–628.
29. Politano L, Nigro V, Nigro G, Petretta VR, Passamano L, Papparella S, et al. Development of
cardiomyopathy in female carriers of Duchenne and Becker muscular dystrophies. JAMA.
1996;275:1335–1338.
30. Sylvius N, Tesson F, Gayet C, Charron P, Bénaïche A, Peuchmaurd M, et al. A new locus for
autosomal dominant dilated cardiomyopathy identified on chromosome 6q12-q16. Am J Hum
Genet. 2001;68:241–246.
31. Van Driest SL, Jaeger MA, Ommen SR, Will ML, Gersh BJ, Tajik AJ, et al. Comprehensive
Analysis of the Beta-Myosin Heavy Chain Gene in 389 Unrelated Patients With Hypertrophic
Cardiomyopathy. J Am Coll Cardiol. 2004;44:602–610.
32. Jin J-P, Chong SM. Localization of the two tropomyosin-binding sites of troponin T. Arch.
Biochem. Biophys. 2010;500:144–150.
33. Palm T, Graboski S, Hitchcock-DeGregori SE, Greenfield NJ. Disease-causing mutations in
cardiac troponin T: identification of a critical tropomyosin-binding region. Biophys J.
2001;81:2827–2837.
34. Andersen PS, Havndrup O, Hougs L, Sørensen KM, Jensen M, Larsen LA, et al. Diagnostic
yield, interpretation, and clinical utility of mutation screening of sarcomere encoding genes in
Danish hypertrophic cardiomyopathy patients and relatives. Hum. Mutat. 2009;30:363–370.
35. Richard P, Charron P, Carrier L, Ledeuil C, Cheav T, Pichereau C, et al. Hypertrophic
cardiomyopathy: distribution of disease genes, spectrum of mutations, and implications for a
molecular diagnosis strategy. Circulation. 2003;107:2227–2232.
36. Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabasi AL. The human disease
network. Proc Natl Acad Sci U S A. 2007;104:8685–8690.
37. Erdmann J, Daehmlow S, Wischke S, Senyuva M, Werner U, Raible J, et al. Mutation
spectrum in a large cohort of unrelated consecutive patients with hypertrophic
cardiomyopathy. Clin Genet. 2003;64:339–349.
38. Maron BJ, Gardin JM, Flack JM, Gidding SS, Kurosaki TT, Bild DE. Prevalence of
hypertrophic cardiomyopathy in a general population of young adults: echocardiographic
analysis of 4111 subjects in the CARDIA study. Circulation. 1995;92:785–789.
39. Zou Y, Song L, Wang Z, Ma A, Liu T, Gu H, et al. Prevalence of idiopathic hypertrophic
cardiomyopathy in China: a population-based echocardiographic analysis of 8080 adults. Am J
Med. 2004;116:14–18.
40. Maron BJ. Hypertrophic cardiomyopathy: a systematic review. JAMA. 2002;287:1308–1320.
41. Refsgaard L, Holst AG, Sadjadieh G, Haunsø S, Nielsen JB, Olesen MS. High prevalence of
genetic variants previously associated with LQT syndrome in new exome data. Eur J Hum
Genet. 2012;20:905–908.
42. Holm H, Gudbjartsson DF, Sulem P, Masson G, Helgadottir HT, Zanon C, et al. A rare
variant in MYH6 is associated with high risk of sick sinus syndrome. Nat Genet. 2011;43:316–320.
43. Lupski JR, Belmont JW, Boerwinkle E, Gibbs RA. Clan genomics and the complex
architecture of human disease. Cell. 2011;147:32–43.
Table 1. Average rates of variation by subtype across genes without Mendelian disease
association(non-OMIM), genes with annotated Mendelian disease association (OMIM), and
genes associated with inherited cardiomyopathies. For synonymous, missense, and nonsense
variant rates, units are counts per 1x10^6 base pairs of coding region per chromosome. For rates of
splice site variants, units are counts per exon per chromosome. 1st Qu. = 1st quartile, 3rd Qu. = 3rd
quartile. P-values were computed using non-parametric Kruskal-Wallis test for analysis of
variance.
Synonymous (p-value = 2.7x10^-3)
            Non-OMIM Genes    OMIM Genes    Cardiomyopathy Genes
1st Qu.           4.2              5.2              4.8
Median           30.1             35.2             44.8
Mean             90.8             95.1            116.4
3rd Qu.         122.5            121.1            151.6

Missense (p-value = 1.8x10^-2)
            Non-OMIM Genes    OMIM Genes    Cardiomyopathy Genes
1st Qu.           2.2              2.5              1.5
Median           13.5             16.2             11.4
Mean             85.5             76.6             27.6
3rd Qu.          89.9             86.2             46.8

Nonsense (p-value = 5.7x10^-7)
            Non-OMIM Genes    OMIM Genes    Cardiomyopathy Genes
1st Qu.          0.000            0.000            0.000
Median           0.000            0.000            0.000
Mean             0.794            0.266            0.011
3rd Qu.          0.048            0.037            0.012

Splice (p-value = 8.8x10^-8)
            Non-OMIM Genes    OMIM Genes    Cardiomyopathy Genes
1st Qu.          0.000            0.000            0.000
Median           0.000            0.000            0.000
Mean             0.094            0.072            0.004
3rd Qu.          0.000            0.003            0.000
Table 2. Average counts of structural variants (SVs) and percent of transcript affected by known
SVs in the Database of Genomic Variants. Non-OMIM – genes without Mendelian disease
association. OMIM – genes with known Mendelian disease association. Numbers are averages of
per gene counts or percents for all genes within that classification. *Denotes statistically
significant difference between cardiomyopathy associated genes and genes without Mendelian
disease association (p=.02 by Wilcoxon rank-sum test).
                                                          Non-OMIM    OMIM    Cardiomyopathy
Average Number of SVs Affecting Gene                         2.8       3.1         3.2
Average Number of SVs Affecting Gene in Coding Regions       1.5       1.6         1.2
Average % of Coding Regions Affected by Known SVs            32%       31%         15%*
Figure Legends:
Figure 1. Plot of missense and nonsense variant rates for all known human gene transcripts
calculated from the exomes of 5,379 persons in the NHLBI Exome Sequencing Project.
Non-OMIM = genes without known association with a Mendelian disease. OMIM = genes with a
known association with a Mendelian disease in the Online Mendelian Inheritance in Man
(OMIM) database. Cardiomyopathy = genes with known association with a familial
cardiomyopathy. Variant rates are in units of counts per 1,000 base pairs of coding region per
transcript per chromosome.
Figure 2. Location of nonsense variants found in the large sarcomeric gene titin (TTN). The
structure of 5 known isoforms is displayed at the top of the figure oriented by location on
chromosome 2, with the 5’ end of the transcript on the right and the 3’ end on the left. Red
arrows depict exons in which clusters of nonsense variants were noted.
Figure 3. Plot of minor allele frequency of non-synonymous coding variants from the NHLBI
ESP data set over the distribution of the known exons of A) MYH7 (chromosome 14), B)
MYBPC3 (chromosome 11), and C) TNNT2 (chromosome 1). Red arrows denote locations of
pathogenic variants from a curated list from clinical experience at our institution and literature
reports. X-axis is genomic coordinates in megabases. 5’ and 3’ refer to start and end of
transcript, respectively.
Figure 4. Relative distribution of missense variants from the NHLBI ESP and a curated
pathogenic variant list in the genes MYH7, MYBPC3, and TNNT2, as scored by A) GERP, B)
PhastCons, C) SIFT, and D) Polyphen-2. Grey bars denote variants from NHLBI ESP data, black
bars denote variants from the pathogenic list. For panel C, 1 – SIFT score was used to preserve
consistency between panels, with far right predicted to be more pathogenic and far left predicted
to be less pathogenic.
Figure 5. Receiver operator curves for A) GERP, B) PhastCons, C) SIFT, and D) Polyphen-2 for
the classification of collected missense variants from the NHLBI ESP data set and a curated
pathogenic missense variant list in the genes MYH7, MYBPC3, and TNNT2. AUC = area under
the curve.
Cardiac Structural and Sarcomere Genes Associated with Cardiomyopathy Exhibit Marked
Intolerance of Genetic Variation
Stephen Pan, Colleen A. Caleshu, Kyla E. Dunn, Marcia J. Foti, Maura K. Moran, Oretunlewa
Soyinka and Euan A. Ashley
Circ Cardiovasc Genet. published online October 16, 2012;
Circulation: Cardiovascular Genetics is published by the American Heart Association, 7272 Greenville Avenue, Dallas,
TX 75231
Copyright © 2012 American Heart Association, Inc. All rights reserved.
Print ISSN: 1942-325X. Online ISSN: 1942-3268
The online version of this article, along with updated information and services, is located on the
World Wide Web at:
http://circgenetics.ahajournals.org/content/early/2012/10/16/CIRCGENETICS.112.963421
Data Supplement (unedited) at:
http://circgenetics.ahajournals.org/content/suppl/2012/10/16/CIRCGENETICS.112.963421.DC1.html
Cardiac Structural and Sarcomere Genes Associated with
Cardiomyopathy Exhibit Marked Intolerance of Genetic Variation
Stephen Pan MD MS1,2, Colleen A. Caleshu ScM CGC1, Kyla E. Dunn MS CGC1, Marcia J. Foti1, Maura K. Moran BA1, Oretunlewa Soyinka1, Euan A. Ashley MRCP DPhil1
Supplemental Material
CIRCCVG/2012/963421

ABCC9  DSG2    MYH7    PSEN2    TNNI3
ACTC1  DSP     MYL2    RBM20    TNNT2
ACTN2  EYA4    MYL3    SCN5A    TPM1
BAG3   FKTN    MYLK2   SDHA     TTN
CALR3  JPH2    MYO6    SGCD     TTR
CAV3   LAMP2   MYOZ2   SLC25A4  VCL
COX15  LDB3    NEXN    TAZ
CSRP3  LMNA    PLN     TCAP
DES    MYBPC3  PRKAG2  TMPO
DMD    MYH6    PSEN1   TNNC1

Supplemental Table I: List of genes determined to have association with cardiomyopathy. Associations were noted from the Online Mendelian Inheritance in Man (OMIM) database or through literature review.

Rank  Missense             Nonsense              Splice
      OMIM      Non-OMIM   OMIM       Non-OMIM   OMIM      Non-OMIM
1     HLA-DQB1  DEFB108B   FUT2       OR10X1     OAS1      PATE4
2     HLA-A     C6orf10    NPSR1      KRTAP13-2  MUC7      ZNF419
3     HLA-B     OR51B6     DYX1C1     OR5AR1     TMEM216   KLK12
4     HLA-C     OR52E6     C17orf107  MS4A12     APOL4     GUCA1C
5     HLA-DQA1  KRTAP12-2  FCGR2A     PLA2G2C    CYP2D6    RNASE9
6     PRR4      OR2W3      AMPD1      CENPM      AGL       C14orf105
7     KIR3DL1   OR11H6     CC2D2A     OR4X1      NPHP4     HTR3D
8     TAS2R38   APOBEC3H   POMT1      OR51Q1     UCP3      AVPI1
9     ABO       OR5B3      PRODH      FAM187B    DTNBP1    CEACAM21
10    BTNL2     OR13C5     DNAH11     OR6C74     XPNPEP3   UGT2B10
11    CYBA      PTX4       CLEC7A     UBE2NL     CFHR1     CFLAR
12    HLA-DRB1  TAS2R42    LPL        OVCH2      DAOA      NIPAL2
13    SPINK5    OR5H6      CYP2A6     OR2L8      TRPV4     C13orf26
14    NAT2      OR51Q1     CD36       MAGEB16    NPC2      GREB1
15    GYPB      OR12D2     CDH15      TAS2R46    BTNL2     GSDMB
16    APOL4     RAET1E     TRPM1      OR1B1      ANKK1     C15orf57
17    GP6       OR5R1      COL9A2     OR4C16     LRTOMT    XRCC4
18    SLC39A4   OR52B6     TLR5       SEC22B     PGM1      GSTT2
19    HRG       OR9G1      COQ2       CSAG1      LPA       TCTEX1D1
20    CYP2D6    OR8D4      KNG1       OR52N4     CES1      SLC22A24

Supplemental Table 2: Top 20 genes in each category with highest variant rate by subtype.

Gene    Variant                              Classification               Cases(a)  Segregation(b)  Controls(c)  Experimental data  Frequency in NHLBI ESP
MYBPC3  IVS8+1G>A                            likely disease causing       2         moderate        200          moderate
MYBPC3  IVS11-2A>G (c.927-2A>G)              very likely disease causing  7         strong          300          moderate
MYBPC3  IVS27+1G>A (c.2905+1G>A)             likely disease causing       4         weak            250          moderate
MYBPC3  IVS30+2T>G (c.3330+2T>G)             likely disease causing       ≥10       weak            250          moderate
MYBPC3  p.Val219Leu (c.655G>C)               likely disease causing       6         weak            1200         n/a
MYBPC3  p.Arg502Gln (c.1505G>A)              very likely disease causing  9         moderate        418          n/a
MYBPC3  p.Arg502Trp (c.1504C>T)              very likely disease causing  37        strong          395          n/a                1
MYBPC3  p.Glu542Gln (c.1624G>C)              likely disease causing       11        weak            650          moderate           1
MYBPC3  p.Trp792Arg (c.2374T>C)              likely disease causing       9         weak            400          n/a
MYBPC3  p.Trp792ValfsX41 (c.2373dupG)        very likely disease causing  ≥14       strong          700          weak
MYBPC3  p.Pro955ArgfsX95 (c.2864_2865delCT)  very likely disease causing  4         strong          300          n/a
MYH7    p.Arg169Gly (c.505A>G)               likely disease causing       1         strong          n/a          n/a
MYH7    p.Ala199Val (c.596C>T)               likely disease causing       1         strong          400          n/a
MYH7    p.Arg204His (c.611G>A)               likely disease causing       4         n/a             300          n/a
MYH7    p.Arg249Gln (c.746A>G)               very likely disease causing  11        strong          211          moderate
MYH7    p.Ile263Thr (c.788T>C)               likely disease causing       5         weak            200          n/a
MYH7    p.Arg403Gln (c.1208G>A)              very likely disease causing  12        strong          100          strong
MYH7    p.Arg403Leu (c.1208G>T)              likely disease causing       3         strong          150          n/a
MYH7    p.Arg403Trp (c.1207C>T)              very likely disease causing  11        strong          300          n/a
MYH7    p.Arg453Cys (c.1357C>T)              very likely disease causing  14        strong          502          n/a
MYH7    p.Arg453His (c.1358G>A)              likely disease causing       3         n/a             n/a          n/a
MYH7    p.Val606Met (c.1816G>A)              very likely disease causing  ≥17       strong          470          strong
MYH7    p.Arg663His (c.1988G>A)              very likely disease causing  19        strong          420          n/a
MYH7    p.Arg719Gln (c.2156G>A)              likely disease causing       11        moderate        1132         n/a
MYH7    p.Gly716Arg (c.2146G>A)              very likely disease causing  9         strong          400          n/a
MYH7    p.Arg723Cys (c.2167C>T)              very likely disease causing  5         strong          440          n/a
MYH7    p.Ile736Thr (c.2207T>C)              likely disease causing       8         weak            496          weak
MYH7    p.Gly741Arg (c.2221G>A)              likely disease causing       8         weak            220          n/a
MYH7    p.Gly741Trp (c.2221G>T)              likely disease causing       3         moderate        96           weak
MYH7    p.Arg870His (c.2609G>A)              very likely disease causing  11        strong          370          moderate
MYH7    p.Leu908Val (c.2722C>G)              very likely disease causing  16        strong          841          moderate
MYH7    p.Glu924Lys (c.2770G>A)              likely disease causing       6         weak            890          moderate
MYH7    p.Glu1356Lys (c.4066G>A)             likely disease causing       5         weak            1096         moderate
MYH7    p.Arg1712Gln (c.5135G>A)             likely disease causing       4         weak            200          n/a                1
MYH7    p.Ser1776Gly (c.5326A>G)             likely disease causing       6         weak            200          n/a
MYH7    p.Lys1459Asn (c.4377G>T)             likely disease causing       4         weak            990          n/a
TNNT2   p.Ile79Asn (c.236T>A)                very likely disease causing  4         strong          390          strong
TNNT2   p.Arg92Gln (c.275G>A)                very likely disease causing  6         strong          530          strong
TNNT2   p.Arg92Leu (c.275G>T)                very likely disease causing  3         moderate        240          strong
TNNT2   p.Arg92Trp (c.274C>T)                very likely disease causing  16        strong          690          strong
TNNT2   p.Arg94Leu (c.281G>T)                likely disease causing       3         weak            890          weak
TNNT2   p.Phe110Ile (c.328T>A)               very likely disease causing  14        strong          460          strong
TNNT2   p.Phe110Leu (c.328T>C)               likely disease causing       2         weak            250          n/a
TNNT2   p.Arg130Cys (c.388C>T)               likely disease causing       5         moderate        370          weak
TNNT2   p.Arg173Trp (c.517C>T)               likely disease causing       3         moderate        335          n/a
TNNT2   p.Arg278Cys (c.832C>T)               likely disease causing       13        weak            600          moderate           6

Supplemental Table 3. Manually curated high confidence pathogenic variants. a: total number of unrelated individuals with the variant from published data, our clinical cohort, and clinical laboratory data provided in genetic test report. b: strength of segregation data based on largest number of affected individuals with the variant within a single kindred. >5 – strong, 4-5 – moderate, 2-3 – weak. c: total number of controls the variant was not observed in from published data and clinical laboratory data.

MYBPC3 (NM_000256)
Exon   Pathogenic  ESP
1      0           1
2      0           7
4      0           3
5      0           8
6      1           4
7      0           1
8      0           3
11     0           1
12     0           2
13     0           1
14     0           5
15     0           4
16     3           7
17     0           2
18     0           5
20     0           2
21     0           1
22     0           3
23     1           1
24     0           4
25     0           5
26     0           5
27     0           1
28     0           9
29     0           2
30     0           9
31     0           1
32     0           4
33     0           1
Total  5           102
p-value: 0.3833

MYH7 (NM_000257)
Exon   Pathogenic  ESP
3      0           2
6      1           0
7      2           0
9      2           0
11     0           2
13     3           0
14     2           2
16     1           3
17     0           1
18     1           1
19     2           0
20     4           2
21     0           4
22     1           3
23     2           3
24     0           2
26     0           2
30     1           5
31     0           4
32     1           2
34     0           6
35     1           2
36     0           3
37     1           7
38     0           3
39     0           1
40     0           1
Total  25          61
p-value: 0.001794

TNNT2 (NM_000364)
Exon   Pathogenic  ESP
2      0           1
6      0           2
8      0           1
9      1           2
10     6           0
11     1           2
12     1           1
13     0           3
14     0           4
15     0           1
16     1           2
Total  10          19
p-value: 0.01289

Supplemental Table 4: Distribution of missense variants amongst the exons of MYBPC3, MYH7, and TNNT2. The canonical isoform was used in the case of multiple isoforms. P-values represent results of a Fisher’s Exact Test for independence of distributions between pathogenic variants and the variants from the Exome Sequencing Project (ESP) dataset.
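
The exon-level comparison in Supplemental Table 4 can be sketched as an exact test on an r x 2 table (exons by variant source). The sketch below enumerates every table with the same row and column totals and sums the probabilities of those no more likely than the observed one, which is the usual generalization of Fisher's exact test beyond 2 x 2. It is an illustrative assumption about how such a test can be computed, not the software or settings used in the study, so its output need not match the reported p-values exactly. The function name `fisher_exact_rx2` is hypothetical. Full enumeration is practical for TNNT2 (11 exons, 10 pathogenic variants) but grows quickly for larger tables such as MYH7.

```python
from math import comb, prod


def fisher_exact_rx2(rows):
    """Exact test of independence for an r x 2 contingency table.

    rows: list of (pathogenic, esp) counts, one pair per exon.
    Returns the summed probability of all tables with the observed
    margins that are no more probable than the observed table.
    """
    row_totals = [a + b for a, b in rows]
    n_first = sum(a for a, _ in rows)  # total of the first column
    # Unnormalized probability of a table: product of C(row_total, count).
    observed = prod(comb(n, a) for n, (a, _) in zip(row_totals, rows))

    # capacity[i] = variants that rows i..end can still hold (for pruning).
    capacity = [0] * (len(rows) + 1)
    for i in range(len(rows) - 1, -1, -1):
        capacity[i] = capacity[i + 1] + row_totals[i]

    def walk(i, remaining, weight):
        if remaining > capacity[i]:
            return 0  # cannot place the remaining counts
        if i == len(rows):
            return weight if weight <= observed else 0
        total = 0
        for k in range(min(row_totals[i], remaining) + 1):
            total += walk(i + 1, remaining - k, weight * comb(row_totals[i], k))
        return total

    return walk(0, n_first, 1) / comb(sum(row_totals), n_first)


# TNNT2 counts per exon (pathogenic, ESP) as listed in Supplemental Table 4.
tnnt2 = [(0, 1), (0, 2), (0, 1), (1, 2), (6, 0), (1, 2),
         (1, 1), (0, 3), (0, 4), (0, 1), (1, 2)]
print(f"TNNT2 exact p-value: {fisher_exact_rx2(tnnt2):.5f}")
```

Working with integer weights and dividing only once at the end keeps the comparison against the observed table exact, avoiding floating-point ties between equally probable tables.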