Herbert KM, La Porta A, Wong BJ, Mooney RA, Neuman

Transcription

Sequence-Resolved Detection of Pausing
by Single RNA Polymerase Molecules
Kristina M. Herbert,1 Arthur La Porta,2 Becky J. Wong,2 Rachel A. Mooney,3 Keir C. Neuman,2,5
Robert Landick,3 and Steven M. Block2,4,*
1 Biophysics Program, Stanford University, Stanford, CA 94305, USA
2 Department of Biological Sciences, Stanford University, Stanford, CA 94305, USA
3 Department of Bacteriology, University of Wisconsin–Madison, Madison, WI 53706, USA
4 Department of Applied Physics, Stanford University, Stanford, CA 94305, USA
5 Present Address: Laboratoire de Physique Statistique, École Normale Supérieure, 75231 Paris, France.
*Contact: [email protected]
DOI 10.1016/j.cell.2006.04.032
SUMMARY
Transcriptional pausing by RNA polymerase
(RNAP) plays an important role in the regulation
of gene expression. Defined, sequence-specific
pause sites have been identified biochemically.
Single-molecule studies have also shown that
bacterial RNAP pauses frequently during transcriptional elongation, but the relationship of
these “ubiquitous” pauses to the underlying
DNA sequence has been uncertain. We employed an ultrastable optical-trapping assay to
follow the motion of individual molecules of
RNAP transcribing templates engineered with
repeated sequences carrying imbedded, sequence-specific pause sites of known regulatory function. Both the known and ubiquitous
pauses appeared at reproducible locations,
identified with base-pair accuracy. Ubiquitous
pauses were associated with DNA sequences
that show similarities to regulatory pause sequences. Data obtained for the lifetimes and efficiencies of pauses support a model where the
transition to pausing branches off of the normal
elongation pathway and is mediated by a common elemental state, which corresponds to the
ubiquitous pause.
INTRODUCTION
Transcription by RNA polymerase (RNAP) is one of the
most exquisitely controlled processes in the cell. Although
much regulation occurs during the initiation phase of transcription, elongation in prokaryotes and eukaryotes is
frequently interrupted by sequence-specific pauses that
are thought to play important roles in this process, either
in aggregate or at specific locations. Pauses at specific
sites allow for the recruitment of regulatory factors that
modify subsequent transcription (Artsimovitch and Landick, 2002; Bailey et al., 1997; Palangat et al., 1998; Ring
et al., 1996; Tang et al., 2000) or serve as a precursor state
for transcriptional arrest and termination (Kireeva et al.,
2005; Richardson and Greenblatt, 1996). In aggregate,
pausing allows coupling of transcription with translation
in prokaryotes (Landick et al., 1996) and splicing and
polyadenylation in eukaryotes (de la Mata et al., 2003; Yonaha and Proudfoot, 1999). Elongation regulators also
modulate pausing to control rates of RNA chain synthesis
in all organisms (Artsimovitch and Landick, 2000; Renner
et al., 2001; Tang et al., 2000).
Pausing has been studied for more than two decades,
but no unique consensus pause sequence has emerged.
Instead, pause signals appear to be multipartite, with potential contributions from all DNA and RNA segments in
contact with RNAP (Artsimovitch and Landick, 2000;
Chan and Landick, 1993; Palangat and Landick, 2001).
Two general classes of sequence-dependent pauses
have been characterized biochemically, which we collectively term “defined” pauses (Artsimovitch and Landick,
2000). One class of defined pause is stabilized by a hairpin
that forms in the nascent RNA transcript. These hairpin-stabilized pauses are found, for example, in leader regions
of biosynthetic operons in bacteria, where they serve to
synchronize the progress of RNAP with ribosomes during
transcriptional attenuation (Henkin and Yanofsky, 2002).
One example is the his pause element found near the beginning of the histidine operon in E. coli (Artsimovitch and
Landick, 2000). Interactions between RNAP and the his
pause hairpin are thought to stabilize RNAP in its pretranslocated state (Toulokhonov and Landick, 2003).
A second class of defined pause is stabilized by an upstream motion of RNAP, leading to extrusion of the RNA 3′
end from the nucleotide triphosphate (NTP) entry channel
(Artsimovitch and Landick, 2000; Komissarova and
Kashlev, 1997; Palangat and Landick, 2001). This motion,
termed backtracking, is thought to be a consequence of
a comparatively weak RNA:DNA hybrid, which favors
rearward enzyme motion to a more energetically stable
position. Backtracking prevents elongation by displacing
Cell 125, 1083–1094, June 16, 2006 ©2006 Elsevier Inc. 1083
the 3′ end of the RNA from the active site. It is resolved by
reversal of the upstream motion or by endonucleolytic
cleavage of the extruded RNA. Backtracking pauses occur in prokaryotes and eukaryotes, sometimes allowing
for the recruitment of transcription factors (Adelman
et al., 2005; Artsimovitch and Landick, 2000; Palangat
and Landick, 2001). One example is the ops pause in
E. coli, where RNAP backtracks by a few base pairs, allowing the binding of RfaH, a factor that suppresses early
termination (Artsimovitch and Landick, 2000, 2002). The
his and ops pauses represent well-characterized cases
from a spectrum of possible pause signals.
Single-molecule studies of transcription by E. coli RNAP
performed at physiological NTP concentrations have
identified two classes of pauses, distinguished on the basis of their lifetimes, with an as yet uncertain relationship to
the defined pauses just described (Neuman et al., 2003;
Shaevitz et al., 2003). A small fraction of single-molecule
pauses, representing 5% of the population, have lifetimes
in excess of 20 s. These long-lived pauses are products of
enzyme backtracking associated with nucleotide misincorporation: They appear to play a role in transcriptional
proofreading, allowing RNAP to briefly reverse and then
cleave misincorporated bases before resuming RNA synthesis (Shaevitz et al., 2003).
The remaining short-lifetime pauses, representing 95%
of pauses in single-molecule data, occur at a roughly constant density of 1 pause per 100 bp; these have been
termed ubiquitous pauses (Neuman et al., 2003). The relationship of ubiquitous to defined pauses has been difficult
to establish for two reasons. First, single-molecule experiments have lacked the resolution to determine whether
ubiquitous pauses are caused by specific DNA sequences
(Neuman et al., 2003). Ubiquitous pausing could result
from efficient pausing at a frequently occurring sequence
or from a sequence-independent, stochastic behavior of
RNAP. Frequent pausing may synchronize transcription
with translation to prevent premature rho-dependent termination (Richardson and Greenblatt, 1996) and could,
in principle, be achieved by either mechanism. Second,
defined pauses have been studied under drastically different conditions from ubiquitous pauses, with subsaturating
nucleotides at 37°C versus saturating nucleotides at
21.5°C, respectively (Artsimovitch and Landick, 2000;
Neuman et al., 2003).
Single-molecule assays have recently achieved base-pair resolution for relative motions of RNAP molecules
(Abbondanzieri et al., 2005), but larger uncertainties in
the absolute position of the enzyme along DNA persist,
making it difficult to assign individual translocation events
to underlying sequence. To overcome this limitation, we
produced a pair of periodic DNA templates, designed to
supply signals that could serve as registration marks during elongation. Templates were constructed carrying
repeats of a motif containing a defined pause signal (see
Experimental Procedures). Two variants of the template
were prepared: one with the his pause element (a “his
repeat” template) and one with the ops pause element
(an “ops repeat” template). The pausing behavior of
RNAP on periodic templates, taken together with the release of DNA at a termination site shortly after the repeats,
permitted us to bring multiple records of transcription into
register. By collecting data using a low-drift “dumbbell”
assay (Shaevitz et al., 2003) and performing an alignment
procedure on an ensemble of records, we were able to
localize pause positions with near base-pair accuracy
over the 2000 bp long repeat region of the template.
Aligned records not only supply sequences associated
with pause events but also can be used to determine lifetime distributions and efficiencies for individual pause
sites, facilitating direct comparisons among ubiquitous
and defined pauses. In addition, the repetitive character
of the templates allows us to address longstanding questions about enzyme “memory,” i.e., whether RNAP exists
in stable states with different intrinsic probabilities for
pausing and whether it can switch among such states,
possibly in response to conditions or sequences previously encountered (de Mercoyrol et al., 1990; Foster
et al., 2001; Harrington et al., 2001). Evidence for long-lived, heterogeneous velocity states in RNAP has been
reported in previous single-molecule studies on nonrepeating templates (Neuman et al., 2003; Tolic-Norrelykke
et al., 2004), although the basis of the heterogeneity is unknown and its magnitude varies. Some studies have reported that individual wild-type (Davenport et al., 2000)
or mutant (Adelman et al., 2002) RNAPs can switch velocity states, but others have failed to detect switching (Adelman et al., 2002; Neuman et al., 2003; Tolic-Norrelykke
et al., 2004). Regulator binding is known to switch RNAP
into different persistent states (Artsimovitch and Landick,
2000; Yarnell and Roberts, 1999), but the existence of
spontaneous switching behavior is disputed (Pasman
and von Hippel, 2002).
RESULTS
Correlation-Based Alignments Give
Base-Pair Accuracy
Stalled transcription elongation complexes (TECs) formed
on the his and ops repeat templates (Figure 1A) were tethered between two polystyrene beads, creating bead:
RNAP:DNA:bead “dumbbells.” After transcription reinitiation by the introduction of NTPs (1 mM ATP, CTP, UTP;
250 µM GTP), the two beads were each captured by one
of a pair of optical traps (Figure 1B). Constant tension
was maintained on the upstream portion of the template
by feedback, supplying a moderate assisting load during
transcription (Shaevitz et al., 2003). Seven representative
records (of 114 collected) illustrate motion on the periodic
templates (Figure 1C).
Although the position of RNAP on the template can be
determined from the changing length of the DNA tether
between the beads, several factors, such as minor variations in the diameter of a bead, generate uncertainties in
calculations of absolute position. These factors lead to
a rescaling of the ordinate for each record, so that the
Figure 1. Single-Molecule Transcription on Engineered
Templates
(A) Engineered transcription templates. Single 230 bp repeat motifs
(red arrow) consist of a leader sequence (pink) with an associated defined pause (consisting of a his or an ops element; gray), along with
rpoB gene sequences (light green) and flanking DNA corresponding
to restriction sites used in cloning (blue). Transcription templates consist of eight repeat motifs located 1100 bp beyond a T7 A1 promoter
(dark green), from which transcription was initiated, and 80 bp in front
of the rrnB T1 terminator (yellow). Templates were labeled on the transcriptionally upstream end with digoxigenin (orange).
(B) Cartoon of the experimental geometry (not to scale). Two polystyrene beads (light blue) are held in optical traps (pink) above the surface
of a coverglass. A biotin label (black) on RNAP (green) is used to attach
RNAP to the smaller bead by an avidin linkage (yellow). The 3′ upstream end of the DNA (dark blue), labeled with digoxigenin (orange),
is bound to a slightly larger bead by an anti-digoxigenin linkage (purple). Transcription proceeds in the direction shown (green arrow),
and polymerase experiences an assisting load.
true transcriptional displacement, x, is linearly related to
its measured value, x′, through x = ax′ + b, where a is
a scale factor close to one and b is a small offset. In practice, these uncertainties are quite modest: Dispersion in
bead size produces shifts of just 30 nm, and the scale
factor departs from unity by at most a few percent. However, even such small discrepancies can mask evidence
of sequence-dependent pausing. A histogram of the logarithm of the dwell time compiled from multiple records
illustrates the problem (Figure 2A). Pause locations fail to
manifest themselves as peaks in the graph because favored dwell locations of individual traces fall slightly out
of register and fail to add coherently at corresponding
template positions.
Proper alignment and scaling of the records (Figures 2B
and 2C) were achieved using an algorithm consisting of
two stages. First, an initial alignment was performed on
the subset of records where the TEC dissociated within
experimental uncertainty (±20 nm) of the nominal termination site. The scale factor, a, for each of these records was
adjusted until the distances at which pauses were most
likely to recur coincided with the known length of the repeat motif, and the offset, b, was adjusted to bring the
dissociation position into coincidence with the terminator.
After this stage, dwell-time histograms for terminating records exhibit excellent registration out to 500 bp in advance of the terminator (Figure 2B). In the second stage,
these adjusted records were used to seed a cross-correlation procedure. Here, the (a,b) parameters for all records,
including those that failed to terminate, were varied in
a narrow range (0.95 < a < 1.01; −35 nm < b < 35 nm) to
maximize the cross-correlation of each individual dwell-time histogram with the combined dataset from the terminating records. This procedure generates a globally optimized (a,b) for each record, exhibiting excellent overall
alignment (Figure 2C). (A full description with sources of
error is found in the Supplemental Data available with
this article online.)
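The second stage of the alignment can be illustrated with a minimal sketch. It is not the procedure used for the published analysis (that is described in the Supplemental Data); it assumes a plain grid search over (a, b), a zero-lag correlation score against a reference histogram, positions expressed in base pairs rather than nanometers, and two hypothetical helper names, `dwell_histogram` and `align_record`.

```python
import math

def dwell_histogram(positions, n_bins, bin_width=1.0):
    """Log dwell-time histogram: log(1 + number of samples falling in each bin)."""
    counts = [0] * n_bins
    for x in positions:
        i = int(math.floor(x / bin_width))
        if 0 <= i < n_bins:
            counts[i] += 1
    return [math.log1p(c) for c in counts]

def align_record(measured, reference_hist, a_grid, b_grid, bin_width=1.0):
    """Grid-search the (a, b) in x = a*x' + b that best matches the reference.

    `measured` holds the uncorrected positions x' (one per time sample), and
    `reference_hist` plays the role of the combined histogram from the
    terminating records. Returns (a, b, score) for the best grid point.
    """
    n_bins = len(reference_hist)
    best = (None, None, -math.inf)
    for a in a_grid:
        for b in b_grid:
            rescaled = [a * x + b for x in measured]
            hist = dwell_histogram(rescaled, n_bins, bin_width)
            # Zero-lag correlation: large when dwell peaks fall in the same bins.
            score = sum(p * q for p, q in zip(hist, reference_hist))
            if score > best[2]:
                best = (a, b, score)
    return best
```

In this sketch each record is scored independently against a fixed reference, so the result depends on how well the seed records were aligned in the first stage; a finer grid or an interpolated histogram would be needed to approach base-pair precision.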
The dwell-time histogram derived from records of the
his repeat template exhibits periodic narrow peaks, with
half-maximal widths of just 2–3 bp, up to 2000 bp from
the terminator (Figure 2C). Equivalent results were obtained for records derived from the ops repeat template
(data not shown). These results represent a dramatic improvement in range and accuracy over a previous study
employing a surface-based assay to align records within
100 bp of a transcriptional runoff site (Shundrovsky
et al., 2004). We attribute the improvement to the stability
of the dumbbell assay and the use of periodic templates.
(C) Seven (of n = 61) representative records of transcription along the
3 kb ops repeat template versus time for single transcribing RNAP
molecules. Records are shown after alignment, as described: Most records display distinct pauses at locations corresponding to ops sites
(gray lines) and elsewhere. Four (of n = 25) records that dissociated
at the location of the rrnB T1 terminator (yellow) are displayed (red
traces); three (of n = 36) records that read through or dissociate prior
to the terminator are shown (blue traces).
Figure 2. Record Alignment and Pause
Locations Identified
(A–C) Dwell-time histograms were compiled for
each transcriptional record as a function of position. The logarithms of these histograms were
then averaged for groups of records, as follows:
(A) Average log dwell-time histogram for his
records (n = 53) before any rescaling or offsets
applied.
(B) Average log dwell-time histogram for terminating his traces (n = 27) after initial rescaling
and alignment of records at the termination site.
(C) Average log dwell-time histogram for all his
traces (n = 53) after final alignment.
(D) Average log dwell-time histogram for
aligned data computed from all eight repeats
for the his repeat motif (red; n = 53 molecules,
310 records) and the ops repeat motif (magenta; n = 61 molecules, 419 records), shown
with the bootstrapped standard deviations
(white). Background color indicates origin of
the underlying sequences: rpoB gene (green),
restriction sites used for cloning (light blue), regulatory pause region (pink), ops pause site (dark
gray), his pause site (light gray). Major pause
sites are labeled.
(E) Comparison of single-molecule and bulk
transcription data. A simulated transcription
gel was created from the dataset in (D) using
a grayscale proportional to the peak height
and scaling the position logarithmically to approximate RNA gel mobility. [α-32P]GMP-labeled transcription complexes were incubated with 250 µM NTPs, quenched at times
between 0 s and 180 s, and run on a denaturing
polyacrylamide gel. Lane L shows the MspI
pBR322 ladder; lane C is a chase. Lines are
drawn between corresponding bands identified
in single-molecule and gel data, color coded as
in (D).
To identify pause sites, data from each of the eight repeat regions were combined to generate average dwell-time histograms for his and ops repeat motifs separately
(Figure 2D). This 8-fold averaging procedure assumes
that the behavior of an enzyme encountering each successive motif is statistically equivalent. Distinct peaks
are evident not only at the locations of the imbedded his
and ops sites, as anticipated, but also at four additional
sites found in the flanking regions (labeled a through d).
The similarity of the histograms throughout the common
flanking regions of the two templates illustrates the reproducibility of this technique. In particular, the relative distances computed between distal pairs of pauses (a–d)
on the two templates agreed to within 0.14 bp, indicative
of the level of precision attained. To compare single-molecule pauses with traditional gel-based assays of transcription, a simulated gel image was computed for the
his repeat region (Figure 2E, left). The simulated image displayed excellent agreement with an actual gel from a conventional transcription assay carried out on a corresponding sequence from the his repeat template (Figure 2E,
right).
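The 8-fold averaging over repeat motifs amounts to folding the aligned histogram modulo the repeat length. The sketch below assumes a per-base-pair histogram stored as a list, a 230 bp motif, and eight motifs starting 1100 bp from the promoter, as given in the Figure 1 legend; the function name `fold_by_repeat` is hypothetical.

```python
def fold_by_repeat(hist, start, period, n_repeats):
    """Average a per-bp histogram over n_repeats tandem motifs of length `period`.

    `start` is the index of the first base pair of the first motif.
    Returns a list of length `period` with the mean value at each motif offset.
    """
    folded = [0.0] * period
    for r in range(n_repeats):
        for j in range(period):
            folded[j] += hist[start + r * period + j]
    return [v / n_repeats for v in folded]
```

Folding in this way is only meaningful if successive motifs are statistically equivalent, which is the assumption stated above.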
RNAP Pause Positions with Respect to DNA
Translocation and RNA Elongation
These high-resolution data allowed us for the first time to
correlate positions of paused RNAPs on DNA with the
lengths and sequences of RNA transcripts. We determined the sequences of the RNA transcripts at pause sites
using transcription gels and RNA sequencing ladders
(Supplemental Data; Figure 3). The his and major ops
pause positions were identical to those found previously
(Artsimovitch and Landick, 2000). However, only one of
the two previously observed minor ops pause sites was
Figure 3. Sequence Similarities for Identified Pauses
Table of DNA sequences underlying each pause, as mapped by transcription gels, along with the corresponding positions identified in single-molecule records. Top row: Consensus sequence generated from
alignment of all pause regions: a–d sites, primary and secondary ops
sites (ops1 and ops2, respectively), and his site. Also shown is the associated information, in bits, for each consensus base (Gorodkin et al.,
1997). Lower rows: Downstream DNA sequences in advance of each
pause are displayed (blue letters), along with the trailing sequence corresponding to the nascent RNA (red; with T substituted for U), with the
region subtended by the RNA:DNA hybrid identified (red underline).
The translocation state of RNAP at each pause site is indicated (green
bars; the widths of these bars show estimated errors in localizing the
position from single-molecule data).
seen (ops2, 2 bp downstream of the major site); this discrepancy may be attributable to differences in reaction
conditions. Direct comparisons of gel data (which determine lengths of the RNAs) and single-molecule data
(which determine enzyme positions on DNA) can be
used to compute a numerical value for the “translocation
state” of RNAP, i.e., the difference, measured in base
pairs, between where pauses map along DNA and where
they map along RNA (Figure 3; Supplemental Data).
Hypotranslocation (backtracking) corresponds to negative values of this quantity; translocation and hypertranslocation correspond to positive values.
To tabulate translocation states, his or ops pause positions were measured relative to the a–d positions; locations of the latter were assumed identical for both templates. Due to small uncertainties arising from (1) the
exact positions of complexes dissociating at the terminator and (2) small, sequence-dependent variations in the
pitch of DNA, we estimate that the entire set of green
bars in Figure 3 could be moved as a group upstream or
downstream by as much as 1.3 bp from their assigned
values. This ambiguity may be resolved, however, since
the his pause is known to trap RNAP in a pretranslocated
state (Toulokhonov et al., 2001; Toulokhonov and Landick,
2003). Assigning the his pause to the pretranslocated
state, we find that the absolute values of translocation
states at the a–d pause positions were all below 1 bp, statistically consistent with zero. At the ops site, RNAP halted
0.75 ± 0.25 bp downstream of ops1, which is 1.25 ± 0.25 bp
upstream of ops2. We cannot be certain, however, that
both pauses occur in the single-molecule experiment even
though they were evident in bulk experiments performed
in the same solution conditions (Figure S6). The narrowness of the ops peak (Figure 2D) indicates that the enzyme
resolves to a single position on the template after pausing
at the ops sites. This could be achieved by pausing at ops1
followed by 0.75 bp forward translocation, pausing at
ops2 followed by 1.25 bp backtracking, or a combination
of the two. We are unable to distinguish among these possibilities. Pausing at ops2 seems less likely because the
assisting force applied is expected to inhibit backtracking.
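The arithmetic behind the two ops scenarios can be written out explicitly. The sketch below takes the translocation state to be the enzyme position on DNA minus the position implied by the RNA length, places ops1 at an arbitrary origin, and uses the 2 bp spacing between ops1 and ops2 quoted above; the function name `translocation_state` is hypothetical.

```python
def translocation_state(dna_position_bp, rna_length_nt):
    """Enzyme position on DNA minus the position implied by the RNA length.

    Negative values correspond to backtracking, positive values to forward
    translocation.
    """
    return dna_position_bp - rna_length_nt

ops1 = 0.0            # arbitrary origin at the ops1 site
ops2 = ops1 + 2.0     # ops2 lies 2 bp downstream of ops1
enzyme = ops1 + 0.75  # paused enzyme position from single-molecule records

state_vs_ops1 = translocation_state(enzyme, ops1)  # forward of ops1
state_vs_ops2 = translocation_state(enzyme, ops2)  # behind ops2
```

The same measured position therefore reads as +0.75 bp relative to ops1 or −1.25 bp relative to ops2, which is why the two scenarios cannot be separated from position alone.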
There are noteworthy similarities among the sequences
triggering pauses (Figure 3). G or C is present at the −11
position of the RNA for all seven sequences. G is present
at −10 for all but ops2. With the exception of the c pause
site, pauses occurred at positions where a purine was
being added to a 3′ pyrimidine.
Pause Densities and Lifetimes Vary
in a Sequence-Dependent Manner
To study the kinetics of pause entry and escape, we used
an automated algorithm to identify pauses, scoring these
whenever velocity fell below half the average active elongation rate in records (Neuman et al., 2003; Shaevitz et al.,
2003). Within the initial 1000 bp segment encountered
prior to the tandem repeats (consisting of sequences derived from the rpoB gene), we recorded the same density
of pauses as previously (0.9 pauses per 100 bp). The
global distribution of pause lifetimes from the entire template was fit between 1 s and 25 s by a sum of two exponentials with time constants of 1.4 ± 0.1 and 6.3 ± 0.5 s
(amplitudes 66% and 34%, respectively), in agreement
with prior reports (Neuman et al., 2003; Shaevitz et al.,
2003), despite a lower GTP level (250 µM versus 1 mM).
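A velocity-threshold pause finder of the kind described can be sketched as follows. This is an illustration only, not the published algorithm: it assumes a noise-free record, point-to-point velocities without smoothing, and it uses the overall mean velocity as a crude stand-in for the average active elongation rate. The function name `find_pauses` and the 1 s minimum duration argument are choices made for this example.

```python
def find_pauses(times, positions, min_duration=1.0):
    """Return (start_time, duration) for runs where velocity < half the mean.

    `times` and `positions` are equal-length sequences describing one record.
    """
    vel = [(positions[i + 1] - positions[i]) / (times[i + 1] - times[i])
           for i in range(len(times) - 1)]
    # Stand-in for half the average active elongation rate (an assumption).
    threshold = 0.5 * (positions[-1] - positions[0]) / (times[-1] - times[0])
    pauses, start = [], None
    for i, v in enumerate(vel):
        if v < threshold and start is None:
            start = i                      # slow interval begins at times[i]
        elif v >= threshold and start is not None:
            duration = times[i] - times[start]
            if duration >= min_duration:
                pauses.append((times[start], duration))
            start = None
    # Close a slow run that extends to the end of the record.
    if start is not None and times[-1] - times[start] >= min_duration:
        pauses.append((times[start], times[-1] - times[start]))
    return pauses
```

Real records would need low-pass filtering before thresholding, and the threshold should be computed from the pause-free portions of the trace.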
Our aligned data clearly indicate that ubiquitous pauses
are sequence dependent. Within the tandem repeat motifs, roughly 65% of all pauses occurred within five narrow
zones, subtending just 21% of the region (Figure 4). Moreover, the residual pause density scored in the remaining
79% of the same region deviated significantly from a binomial distribution, which is the form expected for a uniform
background pause rate (Figure S8; p(χ²) < 0.004). This
suggests that even in regions of low pause density where
peaks are not evident, the pause probability continues to
vary with underlying sequence, just as elsewhere. The
pause lifetime distributions for all six high-efficiency,
sequence-dependent pauses were well fit by single exponentials (Figure 5). Therefore, it seems likely that the
apparently double-exponential character of the global lifetime distribution (above; see also Neuman et al., 2003;
Shaevitz et al., 2003) results from a superposition of
many single-exponential distributions arising from individual pauses, among which long and short characteristic
Figure 4. Pause Efficiency Is Sequence Dependent
(A) Histogram of the mean density of pauses (t > 1 s) versus template
position for the ops repeat template motif (1 bp bins). White bars show
the running density + SD based on bootstrapped error estimates.
Events within ±5 bp of identified pause sites are color coded according
to the scheme of Figure 2. The percentages of all events associated
with each labeled pause are shown.
(B) Histogram of the mean density of pauses versus template position
for the his repeat template motif.
time constants dominate. Consistent with this interpretation, the longest and shortest time constants scored for
the four ubiquitous pauses (6.4 s and 1.3 s; Figure 5)
match those obtained by the double-exponential fit.
Pausing Is a Nonobligate State
in the Elongation Pathway
The pause-finding algorithm reliably detects events lasting ≥1 s. The apparent pause efficiency (percentage of
molecules pausing at a site) therefore underestimates
the true value due to missing events. A corrected pause
efficiency, ε, was determined by adding the estimated
number of short undetected pauses to the detected number, assuming that the exponential distribution of pause
durations can be extrapolated to 0 s. Even with correction,
all identified pause sites exhibited efficiencies well below
100%. Biochemical states situated on the main reaction
pathway for transcription are visited, by definition, in an
obligatory fashion during each nucleotide addition cycle.
Measurements of corrected efficiency less than 100%,
therefore, imply that pauses represent “off-pathway”
states that exit and return to the main pathway, consistent
with prior ensemble and single-molecule experiments
(Artsimovitch and Landick, 2000; Davenport et al., 2000;
Erie et al., 1993; Kassavetis and Chamberlin, 1981; Wang
et al., 1995).
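The correction for missed short pauses can be made concrete with a small sketch. It assumes that pause durations at a site are exponentially distributed with the measured lifetime, so that the fraction of pauses longer than the 1 s detection limit is exp(−t_min/lifetime); the function name `corrected_efficiency` and its arguments are hypothetical, and the exact procedure used for the paper may differ in detail.

```python
import math

def corrected_efficiency(n_detected, n_visits, lifetime, t_min=1.0):
    """Pause efficiency after adding the estimated number of undetected pauses.

    n_detected : pauses scored at the site (all lasting at least t_min)
    n_visits   : number of times molecules passed through the site
    lifetime   : fitted exponential lifetime of pauses at the site (s)
    """
    # Detected pauses are the fraction exp(-t_min/lifetime) of all pauses,
    # so the missed ones number n_detected * (exp(t_min/lifetime) - 1).
    n_missed = n_detected * (math.exp(t_min / lifetime) - 1.0)
    return (n_detected + n_missed) / n_visits
```

For example, 30 detected pauses in 100 visits with a 2 s lifetime give a corrected efficiency of 0.3 × exp(0.5), roughly 0.49, still well below 100%.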
In the simplest such branched scheme (Figure 6A),
pause efficiency is a consequence of kinetic competition
between the rate of nucleotide addition, kn, and the rate
of entry into a pause, kp. This model makes testable predictions. All else being equal, enzymes with slower elongation rates stand a greater chance of falling into the
pause state per cycle, resulting in a positive correlation
between the reciprocal of kn and the pause density
(pauses per bp). Furthermore, assuming a constant rate
of entry into the pause state at each position, the pause
frequency, kp (pauses per unit time), is predicted to be
independent of the elongation rate, kn. (Conversely, on-pathway pausing predicts no correlation between the
pause density and reciprocal elongation rate and positive
correlation between pause frequency and kn.) The wide
distribution of velocities for elongating polymerase molecules (equivalent to kn averaged over all template positions, Figure S9; consistent with Neuman et al., 2003)
allows us to test these models. As predicted by the off-pathway scheme, a positive correlation was found
between the pause density and the inverse elongation velocity (Figure 6B), and no correlation was found between
the pause frequency and inverse elongation velocity (Figure 6C). We also computed the correlation between the
reciprocal elongation rate and apparent pause efficiency
for each of the six pauses in the repeat motif (Figure 6D).
Interestingly, no significant correlation was found for
pauses a, d, ops, and his, where the next nucleotide
added is G, but a stronger correlation was found for
pauses b and c, where the next nucleotide added is A or
C. For these assays, NTP concentrations were saturating
for A, C, and U but comparable to the apparent dissociation rate for G (Abbondanzieri et al., 2005). It is possible
that a slower rate of nucleotide binding at G sites complicates the kinetics at these positions, so that the site-specific value of kn no longer reflects the average elongation rate.
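The two predictions of the branched scheme can be checked with a short stochastic simulation. The sketch below assumes uniform rates kn and kp at every template position, treats pause frequency as pauses per unit of active (non-paused) time, and ignores the pause duration itself; the function name `simulate_branched` is hypothetical. Under these assumptions the expected pause density is kp/kn pauses per bp, while the pause frequency is simply kp.

```python
import random

def simulate_branched(kn, kp, n_bp, rng):
    """Simulate off-pathway pausing during elongation over n_bp base pairs.

    At each visit to the active state the enzyme either adds a nucleotide
    (rate kn) or branches into a pause (rate kp) and later returns to the
    same position. Returns (pauses per bp, pauses per unit active time).
    """
    pauses, added, active_time = 0, 0, 0.0
    while added < n_bp:
        # Waiting time in the active state before either event occurs.
        active_time += rng.expovariate(kn + kp)
        if rng.random() < kp / (kn + kp):
            pauses += 1
        else:
            added += 1
    return pauses / n_bp, pauses / active_time
```

Halving kn in such a simulation doubles the pause density but leaves the pause frequency unchanged, which is the pattern reported for the off-pathway case.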
The Intrinsic Lifetime of the Pause State
Is Conserved
The pause lifetime, τ, is the reciprocal of the exit rate from
the pause state, 1/k−p. However, molecules escaping this
state immediately re-encounter a kinetic competition
between further pausing (kp) and elongation (kn). For high-efficiency pauses, it is likely that the molecule drops back
into the pause state. Single-molecule transcription assays
do not monitor exit from and reentry into the pause state
but instead detect the resumption of elongation. Assuming that the amount of time spent in the active state before
entering a pause is small compared with the time spent in
the pause state, then the apparent lifetime of a pause, τ*,
will be nearly exponentially distributed. The measured lifetime, however, will be given by τ* = τ/(1 − ε), where τ is the
intrinsic lifetime (Supplemental Data). This apparent lifetime is positively correlated with pause efficiency (r =
0.86; p = 0.03). After correction of apparent lifetimes at
all six pause sites for this effect, which vary by a factor
of 5, we found that all the intrinsic lifetimes fell within a narrow range, averaging 1.1 s ± 0.4 s (mean ± SD) (Figure 6D).
This model also predicts that low-efficiency pause sites
Figure 5. Lifetimes and Efficiencies for
Individual Pauses
Histograms of identified pause dwell times,
color coded according to the labeling scheme
of Figure 2, with exponential fits. Measured apparent pause lifetimes (τ*) and corrected pause
efficiencies (ε) are shown with estimated errors.
(those outside the six pause regions), where the molecule
is unlikely to reenter a pause state after escaping, should
exhibit the intrinsic lifetime (Figure S10).
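The relation between apparent and intrinsic lifetime follows from repeated re-entry into the pause state, and can be verified numerically. The sketch below assumes that each visit to the pause state lasts an exponentially distributed time with mean equal to the intrinsic lifetime, that the enzyme re-enters the pause with probability equal to the corrected efficiency after each exit, and that time spent in the active state between visits is negligible; the function name `apparent_pause_duration` is hypothetical.

```python
import random

def apparent_pause_duration(tau, eps, rng):
    """Total time before elongation resumes at a site with pause efficiency eps.

    tau : intrinsic pause lifetime (mean of one visit to the pause state)
    eps : probability of dropping back into the pause after each exit
    """
    total = rng.expovariate(1.0 / tau)      # first visit to the pause state
    while rng.random() < eps:               # re-entry before a nucleotide is added
        total += rng.expovariate(1.0 / tau)
    return total
```

The number of consecutive visits is geometrically distributed with mean 1/(1 − eps), so the mean apparent duration is tau/(1 − eps); with an intrinsic lifetime of 1.1 s and an efficiency of 0.5, the simulated mean comes out close to 2.2 s.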
Molecules Exist in Stable States with Different
Pause Efficiencies
The fact that RNAP molecules transcribe the same motif
up to eight times presents an opportunity to look for direct
experimental evidence of molecular memory or long-range sequence effects. The apparent efficiency of molecules pausing at a given site was plotted as a function of
the repeat number (Figure 7A and Figure S7). A statistically
significant variation of apparent pause efficiency across
the template was not seen, suggesting that the propensity
to pause is not influenced by such factors as the growing
size of the RNA or proximity to distal, nonrepeating sequences. This justified the pooling of pause statistics for
all repeats (e.g., Figure 2D).
Although the average amount of pausing does not vary
from one repeat to the next, this does not exclude the possibility that enzymes vary in their individual propensity to
pause. We consider three possible cases: (1) individual
molecules exist in long-lived states with different intrinsic
pause propensities (a heterogeneous population of stable
states); (2) individual molecules can switch among states
with different intrinsic pause propensities (a homogeneous population of unstable states); or (3) pause propensity is constant (a homogeneous population with respect
to pausing). However, even in the latter case, the probability
of pausing at a given position will depend on the elongation velocity since entry to the paused state is in kinetic
competition with elongation in a branched pathway (Figure 6A). Heterogeneity of the population with respect to
elongation velocity therefore makes it difficult to distinguish case 1 from case 3. To minimize the influence of
this inhomogeneity, we restricted our attention to the
four pause sites where pausing was uncorrelated with
velocity (a, d, his, and ops). We defined a function that indicates whether pausing observed at a site within a given
repeat motif is correlated with pausing at the corresponding site on a subsequent repeat. This correlation is plotted
for the case where the second pause site is separated
from the first by 1–4 repeat motifs (Figure 7B). In all cases,
there was positive correlation, indicating that molecules
tend to repeat the same behavior at subsequent sites
(e.g., if a molecule paused at the first site, it is more likely
to pause again). Molecules therefore occupy states of
varying pause propensity, eliminating case 3. We also
found that the correlation did not diminish with the distance between sites (Δ repeat), indicating that individual
pause rates are conserved over at least 1000 bp. Constant correlations fail to support case 2 and lend support
for case 1, where the elongation-competent state to which
a molecule returns after pausing is always the same, although the kinetics of that state may differ from one molecule to the next. This is consistent with the conclusions of
another study of transcription-complex inhomogeneity
(Tolic-Norrelykke et al., 2004).
Cell 125, 1083–1094, June 16, 2006 ©2006 Elsevier Inc. 1089
Figure 6. Pause Pathway and Pause Correlations
(A) Simple off-pathway model for transcriptional pausing where the
pause state competes kinetically with elongation. Steps in the normal
elongation cycle are represented by a single transition with rate kn. The
rate of entering or exiting a pause is kp or k−p, respectively. Corrected
pause efficiency (ε), pause lifetime (τ), and apparent pause lifetime (τ*)
are shown in terms of individual rate constants.
(B) Pause density (pauses/100 bp) versus inverse elongation velocity.
Each point corresponds to 1 of n = 114 individual molecules. The correlation coefficient is r = 0.71 (p = 6 × 10⁻¹⁹).
(C) Pause frequency (pauses/s) versus inverse elongation velocity.
Each point corresponds to 1 of n = 114 individual molecules. The correlation coefficient is r = 0.16 (p = 0.1).
(D) Corrected pause efficiencies (ε), pause lifetimes (τ), and apparent
pause lifetimes (t*) for each of the identified pause sites. Correlations
(and corresponding p values) between inverse velocity and apparent
pause efficiency are shown.
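The kinetic competition in panel (A) can be illustrated with a minimal sketch. The expressions used here, ε = kp/(kp + kn) for the pause efficiency and τ = 1/k−p for the pause lifetime, are the standard results for a branched pathway and are assumed rather than taken from the figure; the function names and the numerical rates are hypothetical.

```python
def pause_efficiency(k_p: float, k_n: float) -> float:
    """Probability of entering the pause instead of elongating.

    Assumes simple kinetic competition between pause entry (rate k_p)
    and elongation (rate k_n) from the same state.
    """
    return k_p / (k_p + k_n)


def pause_lifetime(k_minus_p: float) -> float:
    """Mean lifetime of a pause that is exited at a single rate k_minus_p."""
    return 1.0 / k_minus_p


if __name__ == "__main__":
    k_p = 2.0  # hypothetical pause-entry rate (s^-1)
    for k_n in (5.0, 10.0, 20.0):  # hypothetical elongation rates (s^-1)
        eff = pause_efficiency(k_p, k_n)
        print(f"k_n = {k_n:5.1f} s^-1 -> efficiency = {eff:.2f}")
```

Under these assumptions a slower molecule (smaller kn) pauses more often at the same site even when kp is unchanged, which is why velocity heterogeneity can mimic heterogeneity in pause propensity.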
DISCUSSION
Ubiquitous Pauses Are Sequence Dependent
Previous single-molecule experiments established that
transcriptional pausing is ubiquitous, occurring at an approximately constant rate (Neuman et al., 2003). By using
periodic templates to localize ubiquitous pauses, we now
show that they are explained by efficient, sequence-dependent pausing at a small fraction of available template locations. Ubiquitous pauses are therefore triggered
by common sequence signals rather than random events.
Furthermore, the kinetic properties of ubiquitous pauses
were indistinguishable from those at the his and ops sites,
whose multipartite sequence components are relatively
well established. This suggested that sequence alignment
of all six pauses could be informative.
Interestingly, this alignment revealed sequence similarities (Figure 3), consistent with the idea that certain sequence components may occur frequently in pause signals. The a priori probability that at least six of seven
sequences match at one or more of the 14 base positions
is 7%, suggesting that the nearly conserved −10G is significant. The strong GC-bias at positions −10 and −11 has
received little attention previously, and its mechanistic role
remains unclear, although changes of −10G to U or C were
found to weaken some pauses (Chan and Landick, 1993;
Palangat and Landick, 2001). In principle, it could favor
backtracking, induce transient overextension of the RNA:DNA
hybrid to generate strain in the enzyme, or reflect
some pause-favoring interactions of RNAP with nucleic
acids at these positions. Although GC-rich sequences upstream of hybrids stabilize backtracking (Nudler et al.,
1995; Reeder and Hawley, 1996), our results, together
with others (Neuman et al., 2003; Toulokhonov and Landick, 2003), suggest that RNAP does not backtrack at these
pause sites. Interestingly, inhibition of upstream hybrid
melting is the first mechanism of transcriptional pausing
to have been proposed (Gilbert, 1976), and it also has
been suggested to influence abortive initiation (Kireeva
et al., 2000). However, further work will be required to distinguish among possible mechanisms. All but one of the
pauses occurred where a purine is added to a pyrimidine
nucleotide, consistent with previous reports (Aivazashvili
et al., 1981) and with the idea that this addition is either unusually slow and promotes pausing, or directly contributes
to an elemental pause rearrangement.
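The a priori probability quoted above can be checked with a short calculation. This is a sketch under stated assumptions, not necessarily the authors' procedure: each of the four bases is taken to be equally likely and independent at every position, and the function name and parameters are hypothetical.

```python
from math import comb


def p_near_conserved(n_seqs: int = 7, min_match: int = 6,
                     n_positions: int = 14, n_bases: int = 4) -> float:
    """Chance that at least one position shows >= min_match identical bases.

    Assumes uniform, independent base frequencies (an assumption, not a
    statement about the real template composition).
    """
    q = 1.0 / n_bases
    # Probability that one specified base occurs in at least min_match sequences.
    p_base = sum(comb(n_seqs, k) * q**k * (1 - q)**(n_seqs - k)
                 for k in range(min_match, n_seqs + 1))
    # The events for different bases are mutually exclusive when
    # min_match > n_seqs / 2, so their probabilities simply add.
    p_position = n_bases * p_base
    # At least one of n_positions independent positions qualifies.
    return 1.0 - (1.0 - p_position) ** n_positions


if __name__ == "__main__":
    print(f"{p_near_conserved():.3f}")
```

With the defaults (seven sequences, six or more matches, 14 positions) the result is close to 0.07, in line with the value given in the text.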
Algorithms for predicting pausing and elongation kinetics have been devised based on calculations of the energetic stability of the TEC as a function of position along
a given sequence (Bai et al., 2004; von Hippel, 1998).
One of these models predicts a class of short-lived pauses,
situated on the main elongation pathway, that results from
an exploration of both forward- and back-tracked translocation states, leading to a sequence-dependent transcription rate (Bai et al., 2004). The authors argued that the
nucleotide addition rate would decrease for sequences
where the energy of the pretranslocated state was significantly more favorable than the posttranslocated state and
thereby produce a new type of pause, termed a ‘‘pretranslocation pause,’’ which they proposed to explain ubiquitous pauses. In the present study, however, we found no
evidence to support such an on-pathway pause, even after
correction for missed events. In particular, all mapped
pauses appear to be pretranslocated (Figure 3) yet ranged
in efficiency from 30% to 82%, implying that pretranslocated pauses occur off pathway, as nonobligate states
(Figure 4).
Figure 7. RNAP Heterogeneity
(A) Apparent pause efficiency versus repeat number of the pause site (numbered from 1 to 8) for sites indicated.
(B) Correlation between pausing at a given site and during a subsequent visit to an equivalent site, plotted versus the distance between the pair of
sites, measured in units of the motif repeat number (see Experimental Procedures). Probable errors (SD) were estimated from simulations.
Although pausing is determined by sequence, we present direct evidence that the tendency to pause at a given
site exhibits molecule-to-molecule variation, as does the
elongation velocity between pauses. This inhomogeneity
is demonstrated by the positive correlation between pausing at a given site among different repeats (Figure 7). The
fact that the correlation does not decay as the molecule
traverses up to four repeats excludes internal state
switching on the corresponding time scale as the source
of any inhomogeneity and implies that the tendency to
pause may be an inherent characteristic of each enzyme.
The level of velocity inhomogeneity observed in our assay
is consistent with that seen in previous studies of transcription (Neuman et al., 2003; Tolic-Norrelykke et al.,
2004) but greater than in another assay (Adelman et al.,
2002). We failed to detect any evidence for velocity-state
switching, contrary to one report (Davenport et al., 2000).
The basis of molecular variation in velocities and pause
rates remains a subject for future study. We note, however, that our fundamental conclusions do not depend
on the source of this inhomogeneity.
A Common, Elemental Pause State
It has been previously proposed that longer-lived transcriptional pauses (i.e., hairpin-stabilized and backtracking pauses) may arise from a common nonbacktracked
precursor state (the elemental pause) (Artsimovitch and
Landick, 2000; Neuman et al., 2003; Palangat and Landick, 2001) that is related to ubiquitous pausing (Neuman
et al., 2003) and the ‘‘unactivated’’ state (Erie, 2002). Our
results support this model and allow us to elaborate
upon the original proposal. The fact that ubiquitous
pauses are sequence dependent is consistent with the
idea that this state represents a precursor to backtracked
and hairpin-stabilized pauses, which are known to be
sequence dependent. In support of this assignment, all
pause lifetimes followed exponential distributions, consistent with a transition from a single state. The intrinsic rates
for escaping the elemental pause state were similar for all
sites, despite a 5-fold variation in apparent lifetime. Taken
together, these observations suggest that the return to the
main elongation pathway may represent the exit from an
elemental pause state that forms under our experimental
conditions. Interestingly, we observed no backtracking
at the major ops regulatory site. Furthermore, in single-molecule assays performed in the presence of DNA oligomers complementary to the his hairpin sequence that disrupt hairpin formation and prevented pause stabilization in
previous experiments (Artsimovitch and Landick, 2000),
we observed no decrease in the dwell time at the his site
(data not shown). Indeed, we found no clear evidence
for the formation of alternative, stabilized pause states at
either the ops or his pause sites, suggesting that the elemental pause may represent the only state populated under the conditions studied here—namely, high NTP
levels, moderate assisting loads that tend to inhibit backtracking, and lower temperature (21.5°C), which is known
to inhibit arrest or backtracking (Gu and Reines, 1995; Kulish and Struhl, 2001). This is consistent with the fact that
backtracking at the ops site was suggested to occur in experiments conducted at 37°C (Artsimovitch and Landick,
2000). Thus, in our single-molecule assay conditions, we
appear to observe only an elemental pause state that is
able to equilibrate with the online state from which it
formed.
A Two-Tiered Pause Mechanism
Taken together, our data support the view that a ubiquitous pause is generated by a sequence-dependent
mechanism that induces RNAP to enter temporarily into
an inactive state where elongation is inhibited. This leads
to a brief transcriptional pause with little, if any, associated
motion of the enzyme along the DNA template. Once
RNAP pauses, secondary pause mechanisms, such as
RNA hairpin formation or enzyme backtracking, compete
with the slow rate of escape from the elemental pause
(1 s⁻¹), which is more than 10-fold slower than the normal elongation rate. Such ‘‘two-tiered’’ mechanisms for
the regulation of transcription have been previously suggested in studies of misincorporation, pausing, elongation, and termination (Artsimovitch and Landick, 2000;
Erie et al., 1993; Foster et al., 2001; Palangat and Landick,
2001; von Hippel and Yager, 1991, 1992). In this two-tiered mechanism, a long-lived regulatory pause would
be comprised of two components acting in succession:
(1) a common sequence element that triggers a temporary
(elemental) pause state, followed by (2) additional sequence elements that convert the elemental pause into
a long-lived pause. The pause-inducing and pause-stabilizing sequence elements might be distinct, but more likely
they overlap or share common motifs. It follows from this
model that there are two potential points of regulation: formation of the elemental pause and subsequent steps that
stabilize pauses. Future work will entail discovering how
regulators such as NusA, NusG, and Gre factors function
in the context of the two-tiered pause mechanism.
EXPERIMENTAL PROCEDURES
Cloning of a Single Repeat Motif
his and ops-pheP pause sequences have been described (Artsimovitch and Landick, 2000). Flanking sequences in the tandem repeat
regions, consisting of DNA derived from the rpoB gene, were from a region of pRL732 (1638–1800; Neuman et al., 2003; Shaevitz et al., 2003)
with little propensity to form RNA secondary structure, based on mfold
(Zuker, 2003). DNA templates were constructed from eight oligonucleotides of 60 bp with complementary overhanging ends for ligation in
a repetitive and directed fashion. Equimolar amounts of adjacent double-stranded segments were ligated. Overhanging ends were filled in
with the large Klenow fragment of DNA polymerase I (NEB). Blunt-ended products were ligated into pCR-Blunt vector (Zero Blunt Cloning Kit, Invitrogen). The two templates were amplified using PCR
primers designed to add flanking sequences and were ligated into
pCR-Script (Stratagene) vector.
Cloning Concatenated Pause Sequences
A BglII site was appended to the repeat motifs by PCR. PCR products
were subsequently digested with BamHI and ligated into BamHI/SmaI-digested pUC19. The resulting repeat motifs were 227 bp and 239 bp
for the ops and his motifs, respectively. Subsequent rounds of cloning
to create 2-mers, 4-mers, and finally 8-mers in the pUC19-derived
plasmid were done as described (Carrion-Vazquez et al., 1999).
Cloning Pauses behind the T7 A1 Promoter for Optical Trapping
The concatenated motifs were cloned into pKH1, an ~4800 bp long
derivative of pRL732 previously used as a source of DNA templates
(Neuman et al., 2003; Shaevitz et al., 2003). pKH1 was constructed
by digesting pRL732 with SphI and ClaI to remove 3000 bp. Short
oligonucleotides containing BamHI sequence were annealed and
ligated into the digested plasmid. The repeated his and ops pause
sequences were released from their respective pUC19 plasmids by
digestion with BamHI and BglII. These sequences were gel purified
and ligated into pKH1 plasmids that had been linearized with BamHI
and transformed into XL-1 Blue cells (Stratagene). These resulting
plasmids, pKH2 (ops) and pBW1 (his), were used to produce DNA
templates for optical trapping.
Transcription Templates for Optical-Trapping Assays
Linear labeled templates were constructed from pKH2 and pBW1
plasmids by digesting each plasmid at a unique AlwNI site. The resulting 3′ ends were labeled using DIG-ddUTP and terminal transferase
(Roche DIG Oligonucleotide 3′-End Labeling Kit). Digestion at a unique
SapI site removed the label at the transcriptionally downstream end,
leaving a single label at the upstream end for tethering to anti-digoxigenin-coated polystyrene beads.
Optical-Trapping Assay
Biotin-labeled RNAP was stalled 29 bp after the T7A1 promoter (Neuman et al., 2003) on labeled templates (constructed as described
above). Six hundred nanometer diameter avidin-labeled and seven
hundred nanometer diameter anti-digoxigenin-labeled beads were
prepared, and stalled TECs were bound to the beads to create
bead:RNAP:DNA:bead dumbbells (Shaevitz et al., 2003). Experiments
were performed as described previously (Shaevitz et al., 2003), but
without heparin in the buffer and in the presence of 1 mM ATP, CTP,
UTP and 250 μM GTP. The experimental room was maintained at
21.5 ± 0.1°C. The resting tension in the DNA was maintained by a force
clamp at 7.3 ± 2.4 pN (mean ± SD) by moving the 700 nm bead in 50 nm
increments whenever the tension on the DNA fell below 5 pN (Shaevitz
et al., 2003).
Data Analysis
The contour length of DNA between beads was calculated as described (Shaevitz et al., 2003). Template position was determined
from the measured DNA contour length during transcription by subtracting the fixed contour length of the segment of DNA template in advance of the transcription initiation site (2607 bp, based on a pitch of
0.338 nm/bp). After traces were aligned (Supplemental Experimental
Procedures), pauses were identified by an algorithm similar to Shaevitz
et al. (2003) except that, to avoid scoring a single pause multiple times
(due to drift), extra pauses found within 1 bp downstream of a prior
pause were concatenated. Pause correlations for a given site were calculated by scoring each visit to the site with 1 or 0, according to
whether a pause of ≥1 s was observed. The correlation was evaluated
for all pairs of visits separated by the prescribed number of repeats,
after preprocessing to subtract the mean value and renormalize the
variance to unity. Analysis was performed in Igor Pro (Wavemetrics)
and C.
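The pause-correlation calculation described above can be sketched as follows. This is a minimal illustration, not the authors' Igor Pro or C code: the data layout (one list of 1/0 scores per molecule, with `None` for repeats that were not visited) and the function name are assumptions.

```python
from statistics import fmean, pstdev
from typing import Optional, Sequence


def pause_correlation(records: Sequence[Sequence[Optional[int]]],
                      separation: int) -> float:
    """Correlation between pausing at one site in repeats i and i + separation.

    Each record holds one score per repeat for a single molecule:
    1 if a pause was scored, 0 if not, None if the repeat was not visited.
    """
    scores = [s for rec in records for s in rec if s is not None]
    mean = fmean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("all visits have the same score; correlation undefined")
    products = []
    for rec in records:
        for i in range(len(rec) - separation):
            a, b = rec[i], rec[i + separation]
            if a is not None and b is not None:
                # Subtract the mean and renormalize the variance to unity.
                products.append(((a - mean) / sd) * ((b - mean) / sd))
    if not products:
        raise ValueError("no pairs of visits at this separation")
    return fmean(products)


if __name__ == "__main__":
    # Hypothetical population: one molecule always pauses, one never does.
    stable = [[1] * 8, [0] * 8]
    print([pause_correlation(stable, d) for d in (1, 2, 3, 4)])
```

In this toy population of stable states the correlation stays at 1 for every separation, whereas a molecule that alternated between pausing and not pausing would give a correlation that changes sign with separation.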
Supplemental Data
Supplemental Data include Supplemental Experimental Procedures,
ten figures, and Supplemental References and can be found with this
article online at http://www.cell.com/cgi/content/full/125/6/1083/
DC1/.
ACKNOWLEDGMENTS
We thank J. Gelles and past and present members of the Block lab,
particularly E. Abbondanzieri, J. Shaevitz, R. Dalal, and W. Greenleaf,
for discussions and technical assistance. We thank P. Fordyce and M.
Woodside for comments on the manuscript. K.M.H. thanks M. Carrion-Vazquez for cloning advice. B.J.W. received support from a Stanford
Undergraduate Research Grant and an HHMI Summer Fellowship.
K.M.H. and R.A.M. acknowledge support from an HHMI Predoctoral
Fellowship and an NIH Biotechnology Training Grant, respectively.
This work was supported by grants from the NIH to S.M.B. and R.L.
Received: January 24, 2006
Revised: March 18, 2006
Accepted: April 13, 2006
Published: June 15, 2006
REFERENCES
Abbondanzieri, E.A., Greenleaf, W.J., Shaevitz, J.W., Landick, R., and
Block, S.M. (2005). Direct observation of base-pair stepping by RNA
polymerase. Nature 438, 460–465.
Adelman, K., La Porta, A., Santangelo, T.J., Lis, J.T., Roberts, J.W.,
and Wang, M.D. (2002). Single molecule analysis of RNA polymerase
elongation reveals uniform kinetic behavior. Proc. Natl. Acad. Sci.
USA 99, 13538–13543.
Adelman, K., Marr, M.T., Werner, J., Saunders, A., Ni, Z., Andrulis,
E.D., and Lis, J.T. (2005). Efficient release from promoter-proximal stall
sites requires transcript cleavage factor TFIIS. Mol. Cell 17, 103–112.
Aivazashvili, V.A., Bibilashvili, R., Vartikian, R.M., and Kutateladze, T.V.
(1981). Factors influencing the pulse character of RNA elongation
in vitro by E. coli RNA polymerase. Mol. Biol. (Mosk.) 15, 653–667.
Artsimovitch, I., and Landick, R. (2000). Pausing by bacterial RNA polymerase is mediated by mechanistically distinct classes of signals.
Proc. Natl. Acad. Sci. USA 97, 7090–7095.
Artsimovitch, I., and Landick, R. (2002). The transcriptional regulator
RfaH stimulates RNA chain synthesis after recruitment to elongation
complexes by the exposed nontemplate DNA strand. Cell 109, 193–
203.
Bai, L., Shundrovsky, A., and Wang, M.D. (2004). Sequence-dependent kinetic model for transcription elongation by RNA polymerase.
J. Mol. Biol. 344, 335–349.
Bailey, M.J., Hughes, C., and Koronakis, V. (1997). RfaH and the ops
element, components of a novel system controlling bacterial transcription elongation. Mol. Microbiol. 26, 845–851.
Carrion-Vazquez, M., Oberhauser, A.F., Fowler, S.B., Marszalek, P.E.,
Broedel, S.E., Clarke, J., and Fernandez, J.M. (1999). Mechanical and
chemical unfolding of a single protein: a comparison. Proc. Natl. Acad.
Sci. USA 96, 3694–3699.
Chan, C.L., and Landick, R. (1993). Dissection of the his leader pause
site by base substitution reveals a multipartite signal that includes
a pause RNA hairpin. J. Mol. Biol. 233, 25–42.
Davenport, R.J., Wuite, G.J., Landick, R., and Bustamante, C. (2000).
Single-molecule study of transcriptional pausing and arrest by E. coli
RNA polymerase. Science 287, 2497–2500.
de la Mata, M., Alonso, C.R., Kadener, S., Fededa, J.P., Blaustein, M.,
Pelisch, F., Cramer, P., Bentley, D., and Kornblihtt, A.R. (2003). A slow
RNA polymerase II affects alternative splicing in vivo. Mol. Cell 12,
525–532.
de Mercoyrol, L., Soulie, J.M., Job, C., Job, D., Dussert, C., Palmari, J.,
Rasigni, M., and Rasigni, G. (1990). Abortive intermediates in transcription by wheat-germ RNA polymerase II. Dynamic aspects of enzyme/
template interactions in selection of the enzyme synthetic mode.
Biochem. J. 269, 651–658.
Erie, D.A. (2002). The many conformational states of RNA polymerase
elongation complexes and their roles in the regulation of transcription.
Biochim. Biophys. Acta 1577, 224–239.
Erie, D.A., Hajiseyedjavadi, O., Young, M.C., and von Hippel, P.H.
(1993). Multiple RNA polymerase conformations and GreA: control of
the fidelity of transcription. Science 262, 867–873.
Foster, J.E., Holmes, S.F., and Erie, D.A. (2001). Allosteric binding of
nucleoside triphosphates to RNA polymerase regulates transcription
elongation. Cell 106, 243–252.
Gilbert, W.J. (1976). Starting and stopping sequences of the RNA polymerase. In RNA Polymerase, R. Losick, and M.J. Chamberlin, eds.
(Cold Spring Harbor, NY, USA: Cold Spring Harbor Laboratory), pp.
193–205.
Gorodkin, J., Heyer, L.J., Brunak, S., and Stormo, G.D. (1997). Displaying the information contents of structural RNA alignments: the structure logos. Comput. Appl. Biosci. 13, 583–586.
Gu, W., and Reines, D. (1995). Identification of a decay in transcription
potential that results in elongation factor dependence of RNA polymerase II. J. Biol. Chem. 270, 11238–11244.
Harrington, K.J., Laughlin, R.B., and Liang, S. (2001). Balanced
branching in transcription termination. Proc. Natl. Acad. Sci. USA 98,
5019–5024.
Henkin, T.M., and Yanofsky, C. (2002). Regulation by transcription attenuation in bacteria: how RNA provides instructions for transcription
termination/antitermination decisions. Bioessays 24, 700–707.
Kassavetis, G.A., and Chamberlin, M.J. (1981). Pausing and termination of transcription within the early region of bacteriophage T7 DNA
in vitro. J. Biol. Chem. 256, 2777–2786.
Kireeva, M.L., Komissarova, N., and Kashlev, M. (2000). Overextended
RNA:DNA hybrid as a negative regulator of RNA polymerase II processivity. J. Mol. Biol. 299, 325–335.
Kireeva, M.L., Hancock, B., Cremona, G.H., Walter, W., Studitsky,
V.M., and Kashlev, M. (2005). Nature of the nucleosomal barrier to
RNA polymerase II. Mol. Cell 18, 97–108.
Komissarova, N., and Kashlev, M. (1997). Transcriptional arrest:
Escherichia coli RNA polymerase translocates backward, leaving the
3′ end of the RNA intact and extruded. Proc. Natl. Acad. Sci. USA
94, 1755–1760.
Kulish, D., and Struhl, K. (2001). TFIIS enhances transcriptional elongation through an artificial arrest site in vivo. Mol. Cell. Biol. 21, 4162–4168.
Landick, R., Turnbough, C.J., and Yanofsky, C. (1996). Transcription
attenuation. In Escherichia coli and Salmonella: Cellular and Molecular
Biology, F. Neidhardt, R. Curtiss, III, J.L. Ingraham, E.C.C. Lin, K.B.
Low, B. Magasanik, W.S. Reznikoff, M. Riley, M. Schaechter, and
H.E. Umbarger, eds. (Washington, DC: ASM Press), pp. 1263–1286.
Neuman, K.C., Abbondanzieri, E.A., Landick, R., Gelles, J., and Block,
S.M. (2003). Ubiquitous transcriptional pausing is independent of RNA
polymerase backtracking. Cell 115, 437–447.
Nudler, E., Kashlev, M., Nikiforov, V., and Goldfarb, A. (1995). Coupling
between transcription termination and RNA polymerase inchworming.
Cell 81, 351–357.
Palangat, M., and Landick, R. (2001). Roles of RNA:DNA hybrid stability, RNA structure, and active site conformation in pausing by human
RNA polymerase II. J. Mol. Biol. 311, 265–282.
Palangat, M., Meier, T.I., Keene, R.G., and Landick, R. (1998). Transcriptional pausing at +62 of the HIV-1 nascent RNA modulates formation of the TAR RNA structure. Mol. Cell 1, 1033–1042.
Pasman, Z., and von Hippel, P.H. (2002). Active Escherichia coli transcription elongation complexes are functionally homogeneous. J. Mol.
Biol. 322, 505–519.
Reeder, T.C., and Hawley, D.K. (1996). Promoter proximal sequences
modulate RNA polymerase II elongation by a novel mechanism. Cell
87, 767–777.
Renner, D.B., Yamaguchi, Y., Wada, T., Handa, H., and Price, D.H.
(2001). A highly purified RNA polymerase II elongation control system.
J. Biol. Chem. 276, 42601–42609.
Richardson, J.P., and Greenblatt, J. (1996). Control of RNA chain elongation and termination. In Escherichia coli and Salmonella: Cellular and
Molecular Biology, F. Neidhardt, R. Curtiss, III, J.L. Ingraham, E.C.C.
Lin, K.B. Low, B. Magasanik, W.S. Reznikoff, M. Riley, M. Schaechter,
and H.E. Umbarger, eds. (Washington, DC: ASM Press), pp. 822–848.
Ring, B.Z., Yarnell, W.S., and Roberts, J.W. (1996). Function of E. coli
RNA polymerase sigma factor sigma 70 in promoter-proximal pausing.
Cell 86, 485–493.
Shaevitz, J.W., Abbondanzieri, E.A., Landick, R., and Block, S.M.
(2003). Backtracking by single RNA polymerase molecules observed
at near-base-pair resolution. Nature 426, 684–687.
Shundrovsky, A., Santangelo, T.J., Roberts, J.W., and Wang, M.D.
(2004). A single-molecule technique to study sequence-dependent
transcription pausing. Biophys. J. 87, 3945–3953.
Tang, H., Liu, Y., Madabusi, L., and Gilmour, D.S. (2000). Promoter-proximal pausing on the hsp70 promoter in Drosophila melanogaster
depends on the upstream regulator. Mol. Cell. Biol. 20, 2569–2580.
Tolic-Norrelykke, S.F., Engh, A.M., Landick, R., and Gelles, J. (2004).
Diversity in the rates of transcript elongation by single RNA polymerase
molecules. J. Biol. Chem. 279, 3292–3299.
Toulokhonov, I., and Landick, R. (2003). The flap domain is required for
pause RNA hairpin inhibition of catalysis by RNA polymerase and can
modulate intrinsic termination. Mol. Cell 12, 1125–1136.
Toulokhonov, I., Artsimovitch, I., and Landick, R. (2001). Allosteric control of RNA polymerase by a site that contacts nascent RNA hairpins.
Science 292, 730–733.
von Hippel, P.H. (1998). An integrated model of the transcription complex in elongation, termination, and editing. Science 281, 660–665.
von Hippel, P.H., and Yager, T.D. (1991). Transcript elongation and termination are competitive kinetic processes. Proc. Natl. Acad. Sci. USA
88, 2307–2311.
von Hippel, P.H., and Yager, T.D. (1992). The elongation-termination
decision in transcription. Science 255, 809–812.
Wang, D., Meier, T.I., Chan, C.L., Feng, G., Lee, D.N., and Landick, R.
(1995). Discontinuous movements of DNA and RNA in RNA polymerase accompany formation of a paused transcription complex. Cell
81, 341–350.
Yarnell, W.S., and Roberts, J.W. (1999). Mechanism of intrinsic transcription termination and antitermination. Science 284, 611–615.
Yonaha, M., and Proudfoot, N.J. (1999). Specific transcriptional pausing activates polyadenylation in a coupled in vitro system. Mol. Cell 3,
593–600.
Zuker, M. (2003). Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31, 3406–3415.
Note Added in Proof
We have recently become aware of an improved model for transcriptional pausing that incorporates the energetics of RNA folding and
long-lived kinetic barriers that depend upon the DNA sequence. The
reference for this model is as follows:
Tadigotla, V.R., O’Maoileidigh, D., Sengupta, A.M., Epshtein, V.,
Ebright, R.H., Nudler, E., and Ruckenstein, A.E. (2006). Thermodynamic and kinetic modeling of transcriptional pausing. Proc. Natl.
Acad. Sci. USA 103, 4439–4444.
This model predicts pauses that correlate with a high fraction of the
locations reported here (A.E. Ruckenstein, personal communication).
Herbert et al. 2006
SUPPLEMENTAL DATA
RNAP pausing on repeat templates
Supplemental Data
Sequence-resolved Detection of Pausing by
Single RNA Polymerase Molecules on Periodic Templates
Reveals an Elemental Pause State
Kristina M. Herbert, Arthur La Porta, Becky J. Wong, Rachel A. Mooney, Keir C.
Neuman, Robert Landick, and Steven M. Block
Supplemental Experimental Procedures
Detailed Description of the Alignment Algorithm
Raw data records obtained with the optical trapping apparatus using the dumbbell assay contain
small calibration uncertainties that are attributable, in part, to dispersion in the sizes of the
polystyrene beads to which transcription complexes were attached, as well as to other technical
limitations of the apparatus. The uncertainty in bead radius shifts the apparent position of the
transcription record by a corresponding amount. This same positional uncertainty also leads to a
calibration error in the applied force, which introduces a variation in the elastic extension of the
DNA tether, in effect stretching or compressing records. As a consequence, the true position, x,
of RNAP on the tether for any given record is related to its apparent position, x′, by x = αx′ + β,
where α is the stretch factor and β is the shift (offset) distance. For convenience, the origin of
our coordinate system is taken to be the position of the terminator. The purpose of the alignment
algorithm is to determine the optimal α and β for each individual record, so that features such as
pauses or dissociation events can be related directly to the underlying DNA sequence. That
algorithm is described here.
First Stage: Alignment of Primary Records
1. Before processing, all data records, expressed as position (in base pairs) vs. time (in
seconds), were low-pass filtered by convolution with a Gaussian of std. dev. 0.5 s. Data
were initially sampled at 2000 s⁻¹ and the filtered records were decimated by a factor of
25. A histogram for the position of each molecule along the template was generated
from the records with bins of width 0.5 bp, and the logarithm of the histogram was
calculated. The resulting dwell histogram indicates the logarithm of the amount of time
the molecule spends at each position on the template.
2. For each record, the factor α was scanned between 0.95 and 1.01 in increments of
0.0005. For each trial α, the log-dwell histogram of the record was recalculated, and
sections corresponding to the repeat motifs were extracted and averaged together. When
the choice of α is optimal, any sequence-dependent pauses will tend to fall into register,
and a minimum number of maximally sharp peaks will be obtained in the 8-fold
averaged log-dwell histogram (Figure S1a), causing the overall distribution to become
highly skewed (where the measure of skew is computed from the normalized third
moment of the distribution after subtraction of the mean). The value of α producing the
largest skewness value was selected (Figure S1b).
3. Once the value for α was optimized, the position at which each complex dissociates was
compared to the position of the expected termination site. Records terminating within
60 bp of the expected position and which traversed at least 5 repeat motifs were retained
for analysis. For each member of this ensemble of records, the value for β was selected
to shift the dissociation position to coincide with the termination site.
4. The log-dwell plots for all terminating records were averaged together after the first-stage alignment to create a combined log-dwell plot (Figure 2b). The log-dwell plots for
all 8 repeat motifs could be averaged together to determine the log-dwell plot for a
single repeat.
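The scan over α in step 2 above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the record is taken to be already filtered and decimated, the repeat region is assumed to start at position 0, empty bins are floored at one sample interval before taking the logarithm, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def log_dwell_histogram(positions: Sequence[float], dt: float, n_bins: int,
                        bin_width: float = 0.5) -> List[float]:
    """Log of the time spent in each position bin (empty bins floored at dt)."""
    dwell = [0.0] * n_bins
    for x in positions:
        i = math.floor(x / bin_width)
        if 0 <= i < n_bins:
            dwell[i] += dt
    return [math.log(max(t, dt)) for t in dwell]


def fold_repeats(hist: Sequence[float], bins_per_repeat: int,
                 n_repeats: int) -> List[float]:
    """Average the histogram sections that belong to successive repeat motifs."""
    return [sum(hist[r * bins_per_repeat + j] for r in range(n_repeats)) / n_repeats
            for j in range(bins_per_repeat)]


def skewness(values: Sequence[float]) -> float:
    """Normalized third central moment of the values."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def best_alpha(apparent: Sequence[float], dt: float, repeat_bp: float,
               n_repeats: int, bin_width: float = 0.5) -> Tuple[float, float]:
    """Scan alpha from 0.95 to 1.01 in steps of 0.0005; return (alpha, skewness)."""
    bins_per_repeat = round(repeat_bp / bin_width)
    n_bins = bins_per_repeat * n_repeats
    best = (float("nan"), -math.inf)
    for step in range(121):
        alpha = 0.95 + 0.0005 * step
        hist = log_dwell_histogram([alpha * x for x in apparent], dt, n_bins, bin_width)
        s = skewness(fold_repeats(hist, bins_per_repeat, n_repeats))
        if s > best[1]:
            best = (alpha, s)
    return best
```

A call such as `best_alpha(record, dt=0.0125, repeat_bp=239, n_repeats=8)` would return the trial α with the most skewed folded histogram; the sample interval shown follows from decimating 2000 s⁻¹ data by a factor of 25, and the repeat length is that of the his motif.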
Approximately half of the data records obtained satisfied the conditions necessary to be included
in the first stage of alignment, above. The resulting alignment exhibited high fidelity near the
terminator position, but this alignment tended to degrade somewhat as the distance from the
terminator increased, presumably due to insufficient resolution in the determination of the α’s.
The purpose of the second stage of analysis was to align all records (including those that read
through the terminator region and those that failed to reach the end of the template) and to
improve the global alignment far from the termination site.
[Figure S1 plots: (a) log dwell versus position (bp), before and after rescaling; (b) skewness versus scale factor.]
Figure S1. Optimization of the scale factor. (a) Increased sharpness of individual peaks in the
overlaid log-dwell function after the data were rescaled for the correct periodicity. (b) Computed skewness
for the ensemble distribution of log-dwell times as a function of the scale factor, α, showing the optimum
near 0.982.
Second Stage: Alignment of All Records
1. A template for aligning all records was created from the first stage of alignment. The
three terminator-proximal repeats of the log-dwell histogram were averaged together, and
this function was concatenated 8 times to represent the log-dwell function expected for a
full series of 8 repeats, producing an alignment template.
2. For each data record, the correlation function was computed between its particular log-dwell histogram and the histogram of the alignment template, above, as a function of both
α and β. (Curves were offset to produce zero mean before correlations were calculated.)
An (α,β) pair for each record was selected that produced maximal correlation, subject to
the constraints (−110 bp < β < 110 bp) and (0.95 < α < 1.01) (see Figure S2a). Records
were retained if they overlapped with at least 50% of the repeat region and if the
computed correlation surpassed a threshold value of 25%. The (α,β) values obtained
from this second stage alignment were considered final and overrode any preliminary
values assigned during the first stage of alignment (if applicable).
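The second-stage search over (α,β) can be sketched as a grid search that maximizes a zero-mean correlation. This is a sketch under stated assumptions rather than the authors' implementation: the mapping "true position = α × measured position + β", the 1-bp sampling, the grid spacing, and the synthetic four-peak motif are all assumptions, and the names `motif_profile` and `best_alignment` are hypothetical.

```python
import numpy as np

def motif_profile(pos, repeat_len=239.0):
    """Hypothetical periodic log-dwell profile with four peaks per repeat."""
    phase = np.mod(pos, repeat_len)
    centers = (30.0, 90.0, 150.0, 200.0)
    heights = (1.0, 0.6, 0.8, 0.5)
    return sum(h * np.exp(-0.5 * ((phase - c) / 3.0) ** 2)
               for c, h in zip(centers, heights))

def best_alignment(record, template, alphas, betas, min_overlap=0.5):
    """Return (correlation, alpha, beta) maximizing the correlation."""
    x = np.arange(record.size, dtype=float)        # measured position (bp)
    grid = np.arange(template.size, dtype=float)   # template position (bp)
    best = (-np.inf, None, None)
    for a in alphas:
        for b in betas:
            mapped = a * x + b                     # assumed calibration model
            resampled = np.interp(grid, mapped, record,
                                  left=np.nan, right=np.nan)
            ok = ~np.isnan(resampled)
            if ok.sum() < min_overlap * template.size:
                continue                           # too little overlap
            r = resampled[ok] - resampled[ok].mean()   # offset to zero mean
            t = template[ok] - template[ok].mean()
            denom = np.sqrt((r * r).sum() * (t * t).sum())
            if denom == 0.0:
                continue
            c = (r * t).sum() / denom
            if c > best[0]:
                best = (c, a, b)
    return best

# Synthetic record with a known calibration error (alpha = 0.98, beta = 20 bp).
positions = np.arange(8 * 239, dtype=float)
template = motif_profile(positions)
record = motif_profile(0.98 * positions + 20.0)

alphas = np.linspace(0.95, 1.01, 7)
betas = np.arange(-100.0, 101.0, 10.0)
corr, alpha_hat, beta_hat = best_alignment(record, template, alphas, betas)
```

Restricting β to less than half a repeat length on either side, as in the constraints above, keeps the search from locking onto an alignment displaced by a whole repeat. A record would then be kept only if `corr` exceeded the 25% threshold.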
[Figure S2 graphic: panels (a) and (b), shift (bp) vs. scale.]
Figure S2. A 2D representation of optimal (α,β) values. (a) The level of brightness indicates the
overall correlation between the histograms of the alignment template and a single transcription record for
the his repeat template as a function of both shift (vertical axis) and scale factor (horizontal axis). The
global peak may be identified as the bright spot near (0.98, 0). (b) The same, but for a simulated data
record. Here, the peak lies near (0.99, 103).
Assessment of Alignment Quality by Numerical Simulation
After all stages of alignment, global histograms of dwell positions along the template displayed a
series of well-defined peaks indicative of sequence-dependent pauses. However, it is thought that
RNAP may also have a small but finite probability of pausing at any position on the template. It
was therefore necessary to consider whether a background of pauses falling outside the strong,
identified pause regions could introduce artifacts that might lead to misalignment of records. We
believe the algorithm remains robust for the following reasons: (1) The template used for final
alignment was generated directly from the terminating records, which are aligned a priori. As a
consequence, the algorithm does not have the freedom to create new features in the aligned data
or to drift away from the termination site; (2) The non-terminating records (for which a priori
alignment was not possible) visit an array of 5 high-efficiency pause sites repeated up to 8 times.
These high-efficiency sites account for 2/3 of all pauses, and therefore produce, on average, 20
sequence-dependent pauses per record that correlate with the alignment template. It seems
improbable that a small population of background pauses could out-compete this strong
repetitive signal by chance. However, to verify the fidelity of the alignment algorithm, we
created a numerical simulation of RNAP transcription for the his repeat template with
programmable sequence dependence. Simulated data records were generated and analyzed with
the identical software used to analyze actual data, and the measured alignment was compared
with the correct alignment, which is known for simulated data.
A parameter file was used by the simulation to specify the kinetic properties at every base
position along the template. These sequence-dependent properties included the rate of
elongation, the rates for entering and escaping each of two pause states [to facilitate modeling of
the dual exponential pause duration previously reported for ubiquitous pausing (Neuman et al.,
2003)], and a termination rate. For generic positions in the repeat motif, and in other regions of
the template, the elongation rate was selected from a normal distribution centered at 14 bp/s with
std. dev. 7 bp/s, and the pause entry and escape rates were selected to match the experimentally
established ubiquitous pause properties. At the high-efficiency pause sites, a, b, c, d and his, the
pause entry and escape rates were chosen to reproduce the measured pause efficiencies and
apparent lifetimes. The dissociation rate was set to 10⁻⁵ bp⁻¹ at all positions on the template except
at the termination site, where it was 0.5 bp⁻¹. The algorithm is a stochastic simulation: at each
basepair position, the program uses a random number generator to decide the kinetic competition
between elongation and pausing. If elongation is selected, the dwell time at the base is drawn
from an exponential distribution with time constant corresponding to the inverse elongation rate.
If the pause state is entered, the dwell time is generated from an exponential distribution based
on the escape rate from the pause state. The algorithm then simulates the competition between
dissociation and advancing to the next base (reentry to the pause state is not included in the
simulation, so the pause escape rate is based on the apparent pause lifetime, τ*, rather than the
true lifetime). The output is a specification of how much time the simulated enzyme dwelled at
each base, which is subsequently converted to a position-vs.-time record sampled at 2 kHz.
Finally, simulated Brownian motion (white noise) and instrument drift (a random walk) are
superimposed on every record. This algorithm was run 60 times to generate a set of 60 records.
For each record, the algorithm then imposed a random calibration error, with a small shift drawn
from a zero-mean Gaussian distribution with std. dev. 60 bp and a stretch factor drawn from a
unity-mean Gaussian distribution with a std. dev. of 2%. The resulting records bore a strong
resemblance to actual data, exhibiting similar velocity fluctuations and sequence-dependent
pausing interspersed with apparently random pauses and realistic levels of noise.
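The core of the simulation described above can be sketched as follows. This is a simplified illustration, not the authors' program: it uses a single pause state rather than two, omits the added noise, drift, and calibration error, and floors the elongation rate at 1 bp/s to keep it positive (an assumption). The template length, background pause efficiency, pause lifetimes, and pause-site positions are hypothetical; only the elongation-rate distribution and the dissociation rates are taken from the text.

```python
import numpy as np

def simulate_dwells(k_elong, p_pause, tau_pause, p_dissoc, rng):
    """Dwell time (s) at each base visited, until dissociation or run-off."""
    dwells = []
    for k, p, tau, q in zip(k_elong, p_pause, tau_pause, p_dissoc):
        if rng.random() < p:                 # pause entry wins the competition
            dwells.append(rng.exponential(tau))
        else:                                # elongation wins
            dwells.append(rng.exponential(1.0 / k))
        if rng.random() < q:                 # dissociate instead of advancing
            break
    return np.array(dwells)

def to_trace(dwells, rate_hz=2000.0):
    """Convert per-base dwell times to a position-vs-time record."""
    t_exit = np.cumsum(dwells)               # time at which each base is left
    t = np.arange(0.0, t_exit[-1], 1.0 / rate_hz)
    return t, np.searchsorted(t_exit, t, side="right")

rng = np.random.default_rng(1)
n_bases = 2000                               # hypothetical template length
# Elongation rates: normal(14, 7) bp/s, floored at 1 bp/s (floor is assumed).
k_elong = np.clip(rng.normal(14.0, 7.0, n_bases), 1.0, None)
p_pause = np.full(n_bases, 0.01)             # hypothetical background efficiency
tau_pause = np.full(n_bases, 1.2)            # hypothetical apparent lifetime (s)
p_pause[500::239] = 0.5                      # hypothetical high-efficiency sites
p_dissoc = np.full(n_bases, 1e-5)            # per-bp dissociation (from the text)
p_dissoc[-1] = 0.5                           # termination site (from the text)

dwells = simulate_dwells(k_elong, p_pause, tau_pause, p_dissoc, rng)
t, position = to_trace(dwells)
```

Because the true dwell at every base is known, a record produced this way can be distorted with a known shift and stretch and then passed to the alignment procedure to check how well those values are recovered.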
The simulated runs were then analyzed using the alignment algorithm, including both the first
and second phases of alignment. A typical correlation map obtained using simulated data was
similar to that obtained from actual data (Figure S2). The (α,β) pairs returned by the alignment
program were then compared to the values actually used to shift and rescale the simulated
records. The fidelity of the alignment procedure is best appreciated by plotting the measured
calibration parameters against the true calibration parameters used in the simulation (Figure S3).
[Figure S3 graphic: (a) measured shift (bp) vs. generated shift (bp); (b) measured scale factor vs. generated scale factor; first-stage and second-stage results.]
Figure S3. Accuracy of the alignment algorithm based on simulations. (a) The measured (optimal)
shift, β, returned by the alignment procedure vs. the shift actually generated by the simulation. (b) The
measured (optimal) scale factor, α, returned by the alignment procedure vs. the scale factor actually
generated by the simulation. Results from both first (red open triangles) and second (blue filled circles)
stages of the algorithm are shown. Note the improvement in fidelity after the second stage.
Only 2 of 60 simulated records were aligned with a shift that deviated appreciably from the
correct value. Two additional records were shifted by the full repeat distance, which brings the
sequence in the overall repeat region into correct alignment except for the initial or final repeat
motif. In all, 96% of all simulated data were aligned correctly by the algorithm.
Uncertainties Resulting from Sequence-dependent Variations in the Helical Pitch of dsDNA
One assumption implicit in the alignment procedure is that physical distance along the DNA
tether (measured in nm) can be mapped directly to base pairs of sequence by applying a constant
scale factor (0.34 nm/bp), which represents the average value for the helical pitch of DNA.
However, it is known from crystallographic data that the rise per basepair is somewhat sequence-dependent, varying with std. dev. equal to ~10% of the mean. Insufficient data are available to
calculate the rise of an arbitrary sequence (Yanagi et al., 1991). Our alignment algorithm
measures the repeat interval for the repetitive portion of the DNA template and equates this
distance to the known length of the his or ops templates in basepairs. Nevertheless, the actual
sequence distance between any two identified pause sites within a single repeat can differ from
the nominal value if the physical length of sequence separating them differs from the expected
value. Making the approximation that the variation in rise per basepair is normally distributed
and that the sequence separating any two locations can be approximated as random, the
uncertainty in sequence distance between two pause sites is $0.1\ \mathrm{bp} \times \sqrt{N}$, where N is the number of
base pairs separating the sites. The worst-case scenario occurs when comparing pause sites
separated by half the repeat distance, or ~120 bp. In this instance, the uncertainty is
$0.1\ \mathrm{bp} \times \sqrt{120} \approx 1\ \mathrm{bp}$. When pause sites are compared with nearby sites, the uncertainty is
correspondingly smaller.
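The square-root scaling follows from treating the rise of each base pair as an independent deviation of about 0.1 bp, so that the deviations add in quadrature over N base pairs. A short numerical check, with the hypothetical helper name `spacing_uncertainty_bp`, evaluates the expression for the separations quoted in this section:

```python
import math

def spacing_uncertainty_bp(n_bp, per_bp_sd=0.1):
    """Uncertainty (bp) accumulated over n_bp independent base-pair rises."""
    return per_bp_sd * math.sqrt(n_bp)

# Separations discussed in this section (bp).
for n_bp in (7, 25, 120, 160):
    print(n_bp, round(spacing_uncertainty_bp(n_bp), 2))
```

The independence of neighboring base-pair rises is the approximation stated in the text; correlated sequence effects would change the scaling.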
The natural variation in base rise will also lead to an uncertainty in the absolute position of the
pause sites with respect to the terminator, which serves as the ultimate reference point for
sequence alignments. The sequence distance between the terminator and the his pause sequence
is exactly 160 bp, assuming that dissociation occurs for RNAP in its pre-translocated state. The
alignment algorithm places the his pause site 161.7 bp upstream of the terminator, indicating that
the difference in translocation state of RNAP at the terminator and at the his site is 1.7 bp. This
difference could be explained by backtracking from the his site, by forward translocation at the
terminator, or by some combination of the two. However, the base rise uncertainty over this
same distance comes to $0.1\ \mathrm{bp} \times \sqrt{160} \approx 1.3\ \mathrm{bp}$. The discrepancy in translocation state of 1.7 bp is
therefore comparable to the statistical component of measurement uncertainty, and
experimentally consistent with zero.
Although variations in helical pitch limit the resolution of absolute measurements of position, the
translocation states of RNAP at the his and ops pause sites can nevertheless be measured relative
to one another with far greater precision: this is because the his and ops sequences have been
inserted into a nearly identical DNA context in the two repeat templates constructed. Excluding
the transcription bubble, the DNA sequence present in the tether when RNAP reaches the his
pause site differs by only 7 bp from the analogous configuration with RNAP at the ops site.
Using the a-d pause sites to define the coordinate system, only a variation in pitch for these 7 bp
can contribute to the relative position uncertainty for the his vs. ops pause sites. The uncertainty
in the relative translocation state is therefore $0.1\ \mathrm{bp} \times \sqrt{7} \approx 0.26\ \mathrm{bp}$. We estimate the uncertainty in
the relative translocation state for any of the a-d pauses with respect to neighboring pauses to be
on the order of $0.1\ \mathrm{bp} \times \sqrt{25} \approx 0.5\ \mathrm{bp}$.
The variation in helical pitch of the his and ops inserts due to sequence effects may also be
estimated using the alignment algorithm itself. These two repeat segments have the same 207 bp
frame into which either the 20 bp ops or the 32 bp his pause element is inserted. Since the
sequence length of the repeat segment is computed experimentally by assuming constant helical
pitch, if the his sequence produced an anomalously large rise, then the remaining portion of the
record would appear to be compressed by the corresponding amount to make up this difference,
and vice versa (Figure S4a). The same argument holds true for any anomalous variations in the
ops insert. By comparing the spacing of the a through d pause sites, determined separately on the
his and ops repeat templates, it is possible to detect any such compression or expansion, and
thereby to measure any deviations from normal pitch in the his or ops inserts. The data
(Figure S4b) show the difference between the spacing of pauses on the his and ops templates as a
function of the spacing observed in the his template. The slope of the graph indicates a fractional
expansion of 0.0014, and an absolute expansion of 0.3 bp in the his template relative to the ops.
This means that the 32 bp his insert has an apparent length deficit corresponding to just 0.3 bp
with respect to the ops insert. This is less than the uncertainty of 0.7 bp predicted from random
sequence based on crystallographic data, suggesting that the sequence-dependent variation in
DNA helical pitch may be somewhat smaller in single-molecule studies, where the DNA is held
under tension. Pause locations reported in the paper were corrected for the measured
discrepancy.
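The conversion from the fitted slope to an absolute expansion can be illustrated with a short sketch. It assumes a least-squares line constrained through the origin, which the text does not specify, and the spacing and deviation values below are hypothetical, noise-free points generated at the reported slope of 0.0014; they are not the measured data.

```python
import numpy as np

def slope_through_origin(x, y):
    """Least-squares slope of y = m * x (no intercept term)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float((x * y).sum() / (x * x).sum())

spacing = np.array([10.0, 25.0, 40.0, 60.0, 85.0])   # hypothetical spacings (bp)
deviation = 0.0014 * spacing                          # hypothetical deviations (bp)

slope = slope_through_origin(spacing, deviation)      # fractional expansion
expansion_bp = slope * 207                            # over the 207 bp frame
```

With a fractional expansion of 0.0014, the absolute expansion over the 207 bp frame is about 0.29 bp, which rounds to the 0.3 bp quoted above.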
[Figure S4 graphic: (left) schematic of the 239 bp repeat showing the his insert and pause sites a-d; (right) his-ops deviation (bp) vs. pause spacing (bp), data and fit.]
Figure S4. Estimating the measurement uncertainty associated with variations in helical pitch.
(Left) Since the length of the repeat segment (in bp) is fixed, any anomalous rise in the his insert (red) will
cause features in the remaining sequence (blue) to appear to expand or contract. (Right) The difference
in interpause distances among the a-d pauses measured on his vs. ops templates is consistent with a
0.0014 relative expansion of the his insert, or 0.3 bp over the 207 bp sequence length.
Relationship Between Pause Lifetime and Pause Efficiency in the Re-entry Model
The relationship between pause lifetime and efficiency may be derived as follows. We postulate
the states shown in Figure 6A and assume reversible entry into the pause state, as well as nonreversible elongation of the nascent RNA. In single-molecule assays, we can measure two
quantities at a pause site. The first is pause efficiency, ε, which we define as the percentage of all
RNAP molecules pausing at the site, corrected for any missed pause events, as described in the
main text. The second is the apparent pause lifetime, τ*, which represents the amount of time
elapsed before normal transcriptional elongation resumes. The apparent pause lifetimes at a
given site are well approximated by an exponential distribution, from which the time constant τ*
may be calculated, defined through $P(t) = \exp(-t/\tau^*)$. If we assume that the pause efficiency is
due to a competition between elongation ($k_n$) and pause entry ($k_p$), then the pause efficiency is
defined by:
$$\varepsilon = \frac{k_p}{k_p + k_n}$$
Strictly speaking, the lifetime of the pause state is $\tau = 1/k_{-p}$. However, in single-molecule
experiments, we do not observe the escape from the pause state, but rather the resumption of
elongation. We assume that the escape from the pause state represents a direct reversal of the
entry into this same state, and therefore that the enzyme returns to a kinetic competition between
further elongation or pause entry. If the rate of returning to the pause state were comparable to
the pause escape rate, this would result in a complex (i.e., non-exponential) distribution of
waiting times before the resumption of elongation. However, for high-efficiency pauses at
saturating NTP concentrations, the rate of nucleotide addition — and therefore the rate for
returning to the pause state (the ratio of these two rates being set by the efficiency) — will be
high compared to the rate of escape from the pause state. In that event, we can approximate reentry to the pause state as instantaneous. Put another way, escape from the pause state and
subsequent reentry can be regarded as a failure to escape from the paused state. The rate of failed
escapes is given by $k_{-p}\varepsilon$ and the rate of successful escapes by $k_{-p}(1-\varepsilon)$ (e.g., if a pause is 75%
efficient, only 25% of pause escapes result in resumption of elongation, and the lifetime of the
nonproductive state will be 4-fold longer). The time constant for resumption of elongation may
therefore be expressed as:
$$\tau^* = \frac{1}{k_{-p}(1-\varepsilon)} = \frac{\tau}{1-\varepsilon} = \frac{k_n + k_p}{k_{-p}\,k_n}$$
However, one cannot take for granted that this relationship will continue to hold as the
nucleotide conditions are reduced, or as other factors influencing elongation and pausing are
varied.
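The re-entry relationship can be checked numerically under the same approximation of instantaneous re-entry. In the sketch below, each pause requires a geometrically distributed number of escape attempts (success probability 1 − ε), and each attempt takes an exponentially distributed time with mean 1/k₋ₚ. The escape rate of 1 s⁻¹ and the sample size are hypothetical values chosen for illustration, and `apparent_lifetimes` is a hypothetical name.

```python
import numpy as np

def apparent_lifetimes(k_escape, efficiency, n_pauses, rng):
    """Times to resume elongation when re-entry to the pause is instantaneous."""
    # Escape attempts needed until one leads to elongation (always >= 1).
    attempts = rng.geometric(1.0 - efficiency, size=n_pauses)
    # A sum of `attempts` exponential waits, each with mean 1/k_escape,
    # is gamma-distributed with shape `attempts`.
    return rng.gamma(shape=attempts, scale=1.0 / k_escape)

rng = np.random.default_rng(2)
k_escape = 1.0        # hypothetical escape rate (1/s), so tau = 1 s
efficiency = 0.75     # the 75% example from the text
times = apparent_lifetimes(k_escape, efficiency, 200_000, rng)

tau_star_expected = 1.0 / (k_escape * (1.0 - efficiency))   # 4 s
tau_star_simulated = times.mean()
```

A geometric sum of identical exponential waits is itself exponentially distributed, so the simulated standard deviation should be close to the simulated mean, consistent with the single-exponential distribution of apparent lifetimes described above.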
For example, we consider more generally the modified Brownian ratchet mechanism described
by Abbondanzieri et al. (2005), where the reaction cycle consists of multiple steps. Simple
(exponential) behavior would be recovered if the overall rate of elongation were limited by a
single, rate-limiting transition, such that pausing would occur mainly prior to the corresponding
state. The diagram below (Figure S5), where the grayed-out portion of the pathway can be
neglected at high [NTP], represents one such possibility.
Figure S5. A scheme for reconciling kinetic competition at a pause site with the Brownian ratchet
mechanism of Abbondanzieri et al. (2005). The model shown incorporates a secondary NTP binding
site but assumes a fixed order of translocation and NTP shift. The grayed-out state (lower left) may be
neglected at saturating NTP concentrations because the reaction will be driven strongly towards the NTP-bound state. The box (cyan) indicates the kinetic competition between pausing and resumption of
elongation.
Here, the translocation step is rate-limiting and pausing branches from the pre-translocated state.
This configuration reduces directly to the simple three-state model of Figure 6. Similar kinetics
would also be obtained if the NTP shift from the secondary site or the condensation step were
rate-limiting, and all other transitions prior to phosphate release were rapid, so that the series of
states following the translocation step is effectively in equilibrium. Under these circumstances,
the pause re-entry model would hold, although the rate in kinetic competition with pausing
would be the rate associated with the sub-step (i.e., NTP shift or condensation), rather than the
overall elongation rate. At reduced NTP concentrations (or in the presence of certain
transcriptional co-factors), the probability of being in the NTP-bound state may become
significantly lower, and the grayed-out portion of the scheme may come into play, leading to
more complex pause kinetics.
Gel Mapping of Transcriptional Pause Sites
Plasmid construction
To create a plasmid with a repeat motif situated close to the promoter, which leads to more
synchronous transcription and to improved mapping of pause sequences, we engineered two
derivatives of pRL418 (Chan and Landick, 1989). The T7 A1 promoter, followed by a T-less
cassette from pRL418, was moved into pKH2(ops) or pBW1(his) on a DraIII-BamHI fragment
to generate plasmids pKH3(ops) and pBW2(his), respectively.
Bulk Transcription Assays
Templates were prepared by digestion of pKH3 and pBW2 with MaeIII, resulting in a ~3300 bp-long segment containing the T7 A1 promoter and repeat sequence; this was subsequently
purified from low-melting agarose followed by phenol extraction and precipitation, or by using
the Qiagen PCR purification kit. Halted transcription complexes were formed by mixing 10 nM
template, 25 nM RNAP, 150 μM ApU dinucleotide, and 10 μM ATP, GTP, and [α-³²P]CTP in
transcription buffer (20 mM Tris-HCl, pH 7.9; 20 mM NaCl; 3 mM MgCl₂; 14 mM 2-mercaptoethanol; 0.1 mM EDTA) and incubated for 10 min at 37°C. After equilibration to room
temperature (~22°C), transcription was restarted by addition of NTPs. Samples were taken at 0,
15 s, 30 s, 1 min, 2 min, 4 min, and 8 min and separated by 8% PAGE alongside RNA and DNA sequencing
ladders and ³²P-labeled pBR322 MspI digest fragments.
Figure S6. Mapping of the 3′ ends of RNA transcripts at pause sites.
Transcription was initiated with 10 µM ATP, GTP, and CTP and incubated for 15 min at 37°C. Halted
complexes were equilibrated to room temperature and elongation restarted by adjustment to 1 mM each
of ATP, CTP, UTP, and 250 µM GTP in the presence of 100 µg/ml Rifampicin. Samples were taken at
the time points indicated and separated by 8% PAGE. RNA markers were generated by adjusting halted
complexes to 40 µM for all 4 NTPs in the presence of 50 µM 3’-deoxy-NTP (G, A, U, or C). Reactions
were loaded either directly with buffer or mixed with a sample containing paused RNA to verify mapping.
Panels A through F each map one of the six identified pause sites. The labels above lanes indicate either
the times when samples were taken after restarting transcription or the identities of the 3’-deoxy-NTP
present in the reaction buffer. Open circles label positions of paused RNAs. The sequence of the pause
region is shown beside each gel, with lines drawn from individual bases to the corresponding bands.
Each image has been contrast-enhanced to increase clarity, in some cases with different adjustments for
different lanes to compensate for differences in gel loading. Beneath each gel panel is the RNA
sequence, numbered according to positions in the his pause-containing repeat (except panel F, which is
numbered according to the ops repeat sequence). The line above each sequence represents the RNA
transcript, with the open circle identifying the 3’ end of the paused RNA(s).
(A) Mapping of pause a. GAUC sequencing reactions (lanes 1-4) were electrophoresed next to
transcription reactions halted at 15 and 30 s (lanes 5 & 6). The major pause is evident in the 30 s reaction
aligned with the top of the compressed bands corresponding to tandem Cs 4 and 5.
(B) Mapping of pause b. Transcription reactions halted at 15 s, 30 s, 1 min, and 2 min (lanes 1-4) were
electrophoresed adjacent to a G sequencing reaction (lane 5), G and C sequencing reactions mixed with
the 30 s transcription reaction (lanes 6 & 7), and GAUC sequencing reactions (lanes 8-11).
(C) Mapping of pause c. Pause bands formed after 20 s were electrophoresed with G or C sequencing
reactions (lanes 1 & 2), alone (lanes 3 & 4), and adjacent to GAUC sequencing reactions (lanes 5-8).
(D) Mapping of pause d. Pause bands formed after 30 s (lane 1) were electrophoresed adjacent to
GAAUC sequencing reactions (lanes 3-7; lane 2 is empty) and mixed with G or C sequencing reactions
(lanes 8 & 9). The pause band formed during all of the sequencing reactions.
(E) Mapping of the his pause. The pause band formed after 30 s (lane 4) was electrophoresed next to
AUC sequencing reactions (lanes 1-3). The G sequencing reaction failed in this experiment (not shown).
The his pause band is faintly evident in the A sequencing reaction.
(F) Mapping of the ops pause. Four time points from a pause reaction (lanes 1-4) were electrophoresed
next to the GAUC sequencing reactions (lanes 5-8). The pause bands, especially the C 153 pause band,
are evident in all four sequencing reaction lanes.
(G) Sequence of the nascent RNA corresponding to one his template repeat (underlined sequence
corresponds to the inserted pause element; sequence not underlined corresponds to flanking regions
derived from rpoB sequences; see Experimental Procedures). The RNA 3′ ends of the pause are
indicated by open circles. Portions of the sequence that differ in the ops repeat template are shown
beneath the his sequence with dashed lines to indicate the region that is replaced.
Molecules Exist in Stable States with Different Characteristic Pause Efficiencies
Figure S7. RNAP heterogeneity. (A) Apparent pause efficiency vs. the repeat number for the pause site
(numbered from 1 to 8) for the sites indicated. (B) Correlation between the probability of pausing at a
given site and for a subsequent visit to the same site, plotted vs. the distance between the pair of sites,
measured in units of the motif repeat number (see Experimental Procedures). Probable errors were
estimated from simulations.
Pause Densities and Lifetimes Vary in a Sequence-dependent Manner
Figure S8. Distribution of pause densities outside high-density regions. Histogram of pause
densities (red) derived from flanking regions outside the labeled pauses: these template regions are
indicated as black bars in Figure 4 of the main text, and constitute 79% of the overall sequence of the
repeat region. For comparison, we have plotted a binomial distribution (black) with a mean identical to
that of the observed data, with associated statistical errors. These two distributions are statistically
different (see main text).
Pausing is a Non-obligate State in the Elongation Pathway
Figure S9. Enzyme velocities between pauses. Histograms of the velocity in a single-molecule record
display two distinct peaks: one centered at a characteristic run velocity, and one centered at 0 bp/s,
indicative of the paused state (see Neuman et al., 2004). Each enzyme exhibited its own characteristic
run velocity and was never observed to switch to another velocity state. The figure shows the distribution
of all individual enzyme run velocities obtained from single-record histograms, as just described (N =
114), with statistical errors indicated. The distribution was fit by a Gaussian with 11.9 ± 6.2 bp/s (mean ±
std. dev.). This heterogeneity is consistent with previous single-molecule studies (Neuman et al., 2003).
The Intrinsic Lifetime of the Pause State is Conserved
Figure S10. Pooled dwell-time distribution for all low-efficiency pauses. Histogram of dwell times for
all pauses derived from flanking regions outside the labeled pauses: these template regions are indicated
as black bars in Figure 4 of the main text, and constitute 79% of the overall sequence of the repeat
region. The distribution was fit by a double exponential with time constants, τ1 and τ2, accounting for
80% and 20% of the pauses, respectively (see legend). The distribution is dominated by the short, 1.2 s
time constant, in contrast to the histogram compiled from all pauses in the entire template (see, for
example, Neuman et al., 2003). The preponderance of short pauses is consistent with a prediction of the
model (Figure 6A, main text) that low-efficiency pause sites would be comparatively less prone to reentry
into the pause state, and therefore display apparent lifetimes that closely approximate the true lifetime.
The residual number of long pauses may arise from a small fraction of pauses at labeled sites (colored
bars in Figure 4, main text) being misidentified, due to incorrect sequence alignment.
Supplemental References
Abbondanzieri, E. A., Greenleaf, W. J., Shaevitz, J. W., Landick, R., and Block, S. M. (2005).
Direct observation of base-pair stepping by RNA polymerase. Nature 438, 460-465.
Chan, C. L., and Landick, R. (1989). The Salmonella typhimurium his operon leader region
contains an RNA hairpin-dependent transcription pause site. Mechanistic implications of the
effect on pausing of altered RNA hairpins. J Biol Chem 264, 20796-20804.
Neuman, K. C., Abbondanzieri, E. A., Landick, R., Gelles, J., and Block, S. M. (2003).
Ubiquitous transcriptional pausing is independent of RNA polymerase backtracking. Cell 115,
437-447.
Yanagi, K., Prive, G. G., and Dickerson, R. E. (1991). Analysis of local helix geometry in three
B-DNA decamers and eight dodecamers. J Mol Biol 217, 201-214.