Herbert KM, La Porta A, Wong BJ, Mooney RA, Neuman
Transcription
Sequence-Resolved Detection of Pausing by Single RNA Polymerase Molecules
Kristina M. Herbert,1 Arthur La Porta,2 Becky J. Wong,2 Rachel A. Mooney,3 Keir C. Neuman,2,5 Robert Landick,3 and Steven M. Block2,4,*
1 Biophysics Program, Stanford University, Stanford, CA 94305, USA
2 Department of Biological Sciences, Stanford University, Stanford, CA 94305, USA
3 Department of Bacteriology, University of Wisconsin–Madison, Madison, WI 53706, USA
4 Department of Applied Physics, Stanford University, Stanford, CA 94305, USA
5 Present Address: Laboratoire de Physique Statistique, École Normale Supérieure, 75231 Paris, France.
*Contact: [email protected]
DOI 10.1016/j.cell.2006.04.032

SUMMARY
Transcriptional pausing by RNA polymerase (RNAP) plays an important role in the regulation of gene expression. Defined, sequence-specific pause sites have been identified biochemically. Single-molecule studies have also shown that bacterial RNAP pauses frequently during transcriptional elongation, but the relationship of these “ubiquitous” pauses to the underlying DNA sequence has been uncertain. We employed an ultrastable optical-trapping assay to follow the motion of individual molecules of RNAP transcribing templates engineered with repeated sequences carrying imbedded, sequence-specific pause sites of known regulatory function. Both the known and ubiquitous pauses appeared at reproducible locations, identified with base-pair accuracy. Ubiquitous pauses were associated with DNA sequences that show similarities to regulatory pause sequences. Data obtained for the lifetimes and efficiencies of pauses support a model where the transition to pausing branches off of the normal elongation pathway and is mediated by a common elemental state, which corresponds to the ubiquitous pause.

INTRODUCTION
Transcription by RNA polymerase (RNAP) is one of the most exquisitely controlled processes in the cell.
Although much regulation occurs during the initiation phase of transcription, elongation in prokaryotes and eukaryotes is frequently interrupted by sequence-specific pauses that are thought to play important roles in this process, either in aggregate or at specific locations. Pauses at specific sites allow for the recruitment of regulatory factors that modify subsequent transcription (Artsimovitch and Landick, 2002; Bailey et al., 1997; Palangat et al., 1998; Ring et al., 1996; Tang et al., 2000) or serve as a precursor state for transcriptional arrest and termination (Kireeva et al., 2005; Richardson and Greenblatt, 1996). In aggregate, pausing allows coupling of transcription with translation in prokaryotes (Landick et al., 1996) and with splicing and polyadenylation in eukaryotes (de la Mata et al., 2003; Yonaha and Proudfoot, 1999). Elongation regulators also modulate pausing to control rates of RNA chain synthesis in all organisms (Artsimovitch and Landick, 2000; Renner et al., 2001; Tang et al., 2000). Pausing has been studied for more than two decades, but no unique consensus pause sequence has emerged. Instead, pause signals appear to be multipartite, with potential contributions from all DNA and RNA segments in contact with RNAP (Artsimovitch and Landick, 2000; Chan and Landick, 1993; Palangat and Landick, 2001). Two general classes of sequence-dependent pauses have been characterized biochemically, which we collectively term “defined” pauses (Artsimovitch and Landick, 2000). One class of defined pause is stabilized by a hairpin that forms in the nascent RNA transcript. These hairpin-stabilized pauses are found, for example, in leader regions of biosynthetic operons in bacteria, where they serve to synchronize the progress of RNAP with ribosomes during transcriptional attenuation (Henkin and Yanofsky, 2002). One example is the his pause element found near the beginning of the histidine operon in E. coli (Artsimovitch and Landick, 2000).
Interactions between RNAP and the his pause hairpin are thought to stabilize RNAP in its pretranslocated state (Toulokhonov and Landick, 2003). A second class of defined pause is stabilized by an upstream motion of RNAP, leading to extrusion of the RNA 3′ end from the nucleotide triphosphate (NTP) entry channel (Artsimovitch and Landick, 2000; Komissarova and Kashlev, 1997; Palangat and Landick, 2001). This motion, termed backtracking, is thought to be a consequence of a comparatively weak RNA:DNA hybrid, which favors rearward enzyme motion to a more energetically stable position. Backtracking prevents elongation by displacing the 3′ end of the RNA from the active site. It is resolved by reversal of the upstream motion or by endonucleolytic cleavage of the extruded RNA. Backtracking pauses occur in prokaryotes and eukaryotes, sometimes allowing for the recruitment of transcription factors (Adelman et al., 2005; Artsimovitch and Landick, 2000; Palangat and Landick, 2001). One example is the ops pause in E. coli, where RNAP backtracks by a few base pairs, allowing the binding of RfaH, a factor that suppresses early termination (Artsimovitch and Landick, 2000, 2002). The his and ops pauses represent well-characterized cases from a spectrum of possible pause signals. Single-molecule studies of transcription by E. coli RNAP performed at physiological NTP concentrations have identified two classes of pauses, distinguished on the basis of their lifetimes, with an as yet uncertain relationship to the defined pauses just described (Neuman et al., 2003; Shaevitz et al., 2003). A small fraction of single-molecule pauses, representing 5% of the population, have lifetimes in excess of 20 s.
Cell 125, 1083–1094, June 16, 2006 ©2006 Elsevier Inc.
These long-lived pauses are products of enzyme backtracking associated with nucleotide misincorporation: they appear to play a role in transcriptional proofreading, allowing RNAP to briefly reverse and then cleave misincorporated bases before resuming RNA synthesis (Shaevitz et al., 2003). The remaining short-lifetime pauses, representing 95% of pauses in single-molecule data, occur at a roughly constant density of 1 pause per 100 bp; these have been termed ubiquitous pauses (Neuman et al., 2003). The relationship of ubiquitous to defined pauses has been difficult to establish for two reasons. First, single-molecule experiments have lacked the resolution to determine whether ubiquitous pauses are caused by specific DNA sequences (Neuman et al., 2003). Ubiquitous pausing could result from efficient pausing at a frequently occurring sequence or from a sequence-independent, stochastic behavior of RNAP. Frequent pausing may synchronize transcription with translation to prevent premature rho-dependent termination (Richardson and Greenblatt, 1996) and could, in principle, be achieved by either mechanism. Second, defined pauses have been studied under drastically different conditions from ubiquitous pauses: subsaturating nucleotides at 37°C versus saturating nucleotides at 21.5°C, respectively (Artsimovitch and Landick, 2000; Neuman et al., 2003). Single-molecule assays have recently achieved base-pair resolution for relative motions of RNAP molecules (Abbondanzieri et al., 2005), but larger uncertainties in the absolute position of the enzyme along DNA persist, making it difficult to assign individual translocation events to the underlying sequence. To overcome this limitation, we produced a pair of periodic DNA templates, designed to supply signals that could serve as registration marks during elongation. Templates were constructed carrying repeats of a motif containing a defined pause signal (see Experimental Procedures).
Two variants of the template were prepared: one with the his pause element (a “his repeat” template) and one with the ops pause element (an “ops repeat” template). The pausing behavior of RNAP on periodic templates, taken together with the release of DNA at a termination site shortly after the repeats, permitted us to bring multiple records of transcription into register. By collecting data using a low-drift “dumbbell” assay (Shaevitz et al., 2003) and performing an alignment procedure on an ensemble of records, we were able to localize pause positions with near base-pair accuracy over the 2000-bp-long repeat region of the template. Aligned records not only supply sequences associated with pause events but also can be used to determine lifetime distributions and efficiencies for individual pause sites, facilitating direct comparisons between ubiquitous and defined pauses. In addition, the repetitive character of the templates allows us to address longstanding questions about enzyme “memory,” i.e., whether RNAP exists in stable states with different intrinsic probabilities for pausing and whether it can switch among such states, possibly in response to conditions or sequences previously encountered (de Mercoyrol et al., 1990; Foster et al., 2001; Harrington et al., 2001). Evidence for long-lived, heterogeneous velocity states in RNAP has been reported in previous single-molecule studies on nonrepeating templates (Neuman et al., 2003; Tolic-Norrelykke et al., 2004), although the basis of the heterogeneity is unknown and its magnitude varies. Some studies have reported that individual wild-type (Davenport et al., 2000) or mutant (Adelman et al., 2002) RNAPs can switch velocity states, but others have failed to detect switching (Adelman et al., 2002; Neuman et al., 2003; Tolic-Norrelykke et al., 2004).
Regulator binding is known to switch RNAP into different persistent states (Artsimovitch and Landick, 2000; Yarnell and Roberts, 1999), but the existence of spontaneous switching behavior is disputed (Pasman and von Hippel, 2002).

RESULTS

Correlation-Based Alignments Give Base-Pair Accuracy
Stalled transcription elongation complexes (TECs) formed on the his and ops repeat templates (Figure 1A) were tethered between two polystyrene beads, creating bead:RNAP:DNA:bead “dumbbells.” After transcription was reinitiated by the introduction of NTPs (1 mM ATP, CTP, and UTP; 250 µM GTP), the two beads were each captured by one of a pair of optical traps (Figure 1B). Constant tension was maintained on the upstream portion of the template by feedback, supplying a moderate assisting load during transcription (Shaevitz et al., 2003). Seven representative records (of 114 collected) illustrate motion on the periodic templates (Figure 1C). The position of RNAP on the template can be determined from the changing length of the DNA tether between the beads. However, several factors, such as minor variations in bead diameter, generate uncertainties in calculations of absolute position. These factors lead to a rescaling of the ordinate for each record, so that the true transcriptional displacement, x, is linearly related to its measured value, x′, through x = ax′ + b, where a is a scale factor close to one and b is a small offset. In practice, these uncertainties are quite modest: dispersion in bead size produces shifts of just 30 nm, and the scale factor departs from unity by at most a few percent. However, even such small discrepancies can mask evidence of sequence-dependent pausing.

Figure 1. Single-Molecule Transcription on Engineered Templates
(A) Engineered transcription templates. Single 230 bp repeat motifs (red arrow) consist of a leader sequence (pink) with an associated defined pause (consisting of a his or an ops element; gray), along with rpoB gene sequences (light green) and flanking DNA corresponding to restriction sites used in cloning (blue). Transcription templates consist of eight repeat motifs located 1100 bp beyond a T7 A1 promoter (dark green), from which transcription was initiated, and 80 bp in front of the rrnB T1 terminator (yellow). Templates were labeled on the transcriptionally upstream end with digoxigenin (orange).
(B) Cartoon of the experimental geometry (not to scale). Two polystyrene beads (light blue) are held in optical traps (pink) above the surface of a coverglass. A biotin label (black) on RNAP (green) is used to attach RNAP to the smaller bead by an avidin linkage (yellow). The 3′ upstream end of the DNA (dark blue), labeled with digoxigenin (orange), is bound to a slightly larger bead by an anti-digoxigenin linkage (purple). Transcription proceeds in the direction shown (green arrow), and polymerase experiences an assisting load.

A histogram of the logarithm of the dwell time compiled from multiple records illustrates the problem (Figure 2A). Pause locations fail to appear as peaks in the graph because the favored dwell locations of individual traces fall slightly out of register and fail to add coherently at corresponding template positions. Proper alignment and scaling of the records (Figures 2B and 2C) were achieved using a two-stage algorithm. First, an initial alignment was performed on the subset of records in which the TEC dissociated within experimental uncertainty (±20 nm) of the nominal termination site. The scale factor, a, for each of these records was adjusted until the distances at which pauses were most likely to recur coincided with the known length of the repeat motif, and the offset, b, was adjusted to bring the dissociation position into coincidence with the terminator. After this stage, dwell-time histograms for terminating records exhibit excellent registration out to 500 bp in advance of the terminator (Figure 2B).
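The linear mapping x = ax′ + b and a correlation-maximizing search over (a, b) can be sketched in a few lines. This is a minimal sketch, assuming each record is a list of uniformly sampled measured positions x′ in base pairs and that a reference log dwell-time histogram is already available. The function names, the 1 bp bins, the pseudo-count, the 0.005 step in a, and the conversion of the ±35 nm offset bound at an assumed 0.34 nm per bp are illustrative assumptions, not the authors' implementation.

```python
import math

NM_PER_BP = 0.34  # assumed B-form DNA rise; only used to convert the +/-35 nm bound


def log_dwell_histogram(positions, a, b, n_bins, dt=1.0, pseudo=1e-3):
    """Map each sampled position x' to x = a*x' + b and return log10 dwell time per 1 bp bin."""
    dwell = [0.0] * n_bins
    for xp in positions:
        k = math.floor(a * xp + b)
        if 0 <= k < n_bins:              # samples mapped outside the window are ignored
            dwell[k] += dt
    return [math.log10(d + pseudo) for d in dwell]  # pseudo-count avoids log10(0)


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    su = math.sqrt(sum((x - mu) ** 2 for x in u))
    sv = math.sqrt(sum((y - mv) ** 2 for y in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def align_record(positions, reference, a_grid=None, b_grid=None):
    """Grid-search (a, b) to maximize correlation with a reference log dwell-time histogram."""
    if a_grid is None:
        a_grid = [0.95 + 0.005 * i for i in range(13)]   # 0.95 ... 1.01
    if b_grid is None:
        b_max = round(35 / NM_PER_BP)                     # about 103 bp
        b_grid = range(-b_max, b_max + 1)
    best = (-2.0, None, None)
    for a in a_grid:
        for b in b_grid:
            hist = log_dwell_histogram(positions, a, b, len(reference))
            r = pearson(hist, reference)
            if r > best[0]:
                best = (r, a, b)
    return best  # (correlation, a, b)
```

A brute-force grid is adequate because the permitted ranges of a and b are narrow. Distorting a synthetic record with known a and b and checking that the search recovers them is a simple sanity test.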
In the second stage, these adjusted records were used to seed a cross-correlation procedure. Here, the (a, b) parameters for all records, including those that failed to terminate, were varied within a narrow range (0.95 < a < 1.01; −35 nm < b < 35 nm) to maximize the cross-correlation of each individual dwell-time histogram with the combined dataset from the terminating records. This procedure generates a globally optimized (a, b) pair for each record and yields excellent overall alignment (Figure 2C). (A full description, with sources of error, is given in the Supplemental Data available with this article online.) The dwell-time histogram derived from records of the his repeat template exhibits periodic narrow peaks, with half-maximal widths of just 2–3 bp, up to 2000 bp from the terminator (Figure 2C). Equivalent results were obtained for records derived from the ops repeat template (data not shown). These results represent a dramatic improvement in range and accuracy over a previous study employing a surface-based assay to align records within 100 bp of a transcriptional runoff site (Shundrovsky et al., 2004). We attribute the improvement to the stability of the dumbbell assay and the use of periodic templates.

Figure 1 (continued)
(C) Seven (of n = 61) representative records of transcription along the 3 kb ops repeat template versus time for single transcribing RNAP molecules. Records are shown after alignment, as described in the text. Most records display distinct pauses at locations corresponding to ops sites (gray lines) and elsewhere. Four (of n = 25) records that dissociated at the location of the rrnB T1 terminator (yellow) are displayed (red traces); three (of n = 36) records that read through or dissociated prior to the terminator are shown (blue traces).

Figure 2. Record Alignment and Pause Locations Identified
(A–C) Dwell-time histograms were compiled for each transcriptional record as a function of position. The logarithms of these histograms were then averaged for groups of records, as follows:
(A) Average log dwell-time histogram for his records (n = 53) before any rescaling or offset was applied.
(B) Average log dwell-time histogram for terminating his traces (n = 27) after initial rescaling and alignment of records at the termination site.
(C) Average log dwell-time histogram for all his traces (n = 53) after final alignment.
(D) Average log dwell-time histogram for aligned data computed from all eight repeats for the his repeat motif (red; n = 53 molecules, 310 records) and the ops repeat motif (magenta; n = 61 molecules, 419 records), shown with the bootstrapped standard deviations (white). Background color indicates the origin of the underlying sequences: rpoB gene (green), restriction sites used for cloning (light blue), regulatory pause region (pink), ops pause site (dark gray), his pause site (light gray). Major pause sites are labeled.
(E) Comparison of single-molecule and bulk transcription data. A simulated transcription gel was created from the dataset in (D) using a grayscale proportional to the peak height and scaling the position logarithmically to approximate RNA gel mobility. [α-32P]GMP-labeled transcription complexes were incubated with 250 µM NTPs, quenched at times between 0 s and 180 s, and run on a denaturing polyacrylamide gel. Lane L shows the MspI pBR322 ladder; lane C is a chase. Lines are drawn between corresponding bands identified in single-molecule and gel data, color coded as in (D).

To identify pause sites, data from each of the eight repeat regions were combined to generate average dwell-time histograms for the his and ops repeat motifs separately (Figure 2D). This 8-fold averaging procedure assumes that the behavior of an enzyme encountering each successive motif is statistically equivalent.
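The 8-fold averaging can be sketched as follows. The sketch assumes that aligned positions are already in base pairs and that the first motif begins at a known template coordinate (`motif_start`, a hypothetical parameter). Each fully traversed repeat becomes one log dwell-time profile, and the pooled profiles are averaged bin by bin, as in Figure 2D. Discarding partially traversed repeats is an assumption of this sketch, not something the text specifies.

```python
import math

MOTIF_BP = 230  # repeat-motif length given for the engineered templates


def motif_profiles(positions, motif_start, n_repeats=8, motif_bp=MOTIF_BP, dt=1.0, pseudo=1e-3):
    """Fold one aligned record into per-repeat log10 dwell-time profiles (1 bp bins).

    Only repeats the molecule fully traversed are returned, so a record that stops
    part-way through a motif does not contribute an artificially empty profile.
    """
    dwell = [[0.0] * motif_bp for _ in range(n_repeats)]
    furthest = -1
    for x in positions:
        offset = math.floor(x - motif_start)      # bp from the start of the first motif
        furthest = max(furthest, offset)
        if 0 <= offset < n_repeats * motif_bp:
            rep, k = divmod(offset, motif_bp)     # repeat index and position within it
            dwell[rep][k] += dt
    complete = min(n_repeats, (furthest + 1) // motif_bp)
    return [[math.log10(d + pseudo) for d in dwell[r]] for r in range(complete)]


def average_profile(profiles):
    """Average log dwell-time profiles bin by bin across repeats (and records)."""
    n = len(profiles)
    return [sum(p[k] for p in profiles) / n for k in range(len(profiles[0]))]
```

Averaging on a log scale, as the Figure 2 caption describes, keeps a few very long dwells in one repeat from dominating the pooled profile.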
Distinct peaks are evident not only at the locations of the imbedded his and ops sites, as anticipated, but also at four additional sites found in the flanking regions (labeled a through d). The similarity of the histograms throughout the common flanking regions of the two templates illustrates the reproducibility of this technique. In particular, the relative distances computed between distal pairs of pauses (a–d) on the two templates agreed to within 0.14 bp, indicating the level of precision attained. To compare single-molecule pauses with traditional gel-based assays of transcription, a simulated gel image was computed for the his repeat region (Figure 2E, left). The simulated image displayed excellent agreement with an actual gel from a conventional transcription assay carried out on a corresponding sequence from the his repeat template (Figure 2E, right).

RNAP Pause Positions with Respect to DNA Translocation and RNA Elongation
These high-resolution data allowed us, for the first time, to correlate positions of paused RNAPs on DNA with the lengths and sequences of RNA transcripts. We determined the sequences of the RNA transcripts at pause sites using transcription gels and RNA sequencing ladders (Supplemental Data; Figure 3). The his and major ops pause positions were identical to those found previously (Artsimovitch and Landick, 2000). However, only one of the two previously observed minor ops pause sites was seen (ops2, 2 bp downstream of the major site); this discrepancy may be attributable to differences in reaction conditions.

Figure 3. Sequence Similarities for Identified Pauses
Table of DNA sequences underlying each pause, as mapped by transcription gels, along with the corresponding positions identified in single-molecule records. Top row: consensus sequence generated from alignment of all pause regions: a–d sites, primary and secondary ops sites (ops1 and ops2, respectively), and his site. Also shown is the associated information, in bits, for each consensus base (Gorodkin et al., 1997). Lower rows: downstream DNA sequences in advance of each pause are displayed (blue letters), along with the trailing sequence corresponding to the nascent RNA (red; with T substituted for U), with the region subtended by the RNA:DNA hybrid identified (red underline). The translocation state of RNAP at each pause site is indicated (green bars; the widths of these bars show estimated errors in localizing the position from single-molecule data).

Direct comparisons of gel data (which determine lengths of the RNAs) and single-molecule data (which determine enzyme positions on DNA) can be used to compute a numerical value for the “translocation state” of RNAP, i.e., the difference, measured in base pairs, between where pauses map along DNA and where they map along RNA (Figure 3; Supplemental Data). Hypotranslocation (backtracking) corresponds to negative values of this quantity; translocation and hypertranslocation correspond to positive values. To tabulate translocation states, his or ops pause positions were measured relative to the a–d positions; locations of the latter were assumed identical for both templates. Because of small uncertainties arising from (1) the exact positions of complexes dissociating at the terminator and (2) small, sequence-dependent variations in the pitch of DNA, we estimate that the entire set of green bars in Figure 3 could be shifted as a group upstream or downstream by as much as 1.3 bp from their assigned values. This ambiguity can be resolved, however, because the his pause is known to trap RNAP in a pretranslocated state (Toulokhonov et al., 2001; Toulokhonov and Landick, 2003). Assigning the his pause to the pretranslocated state, we find that the absolute values of translocation states at the a–d pause positions were all below 1 bp, statistically consistent with zero.
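The bookkeeping behind the translocation states in Figure 3 is a difference followed by a common shift. The sketch below assumes that pause positions on DNA (from aligned single-molecule records) and on RNA (from gels) are expressed in the same base-pair coordinate. The dictionary layout and the site labels used as keys are hypothetical.

```python
def translocation_states(dna_pos, rna_pos, reference="his", reference_state=0.0):
    """Translocation state (bp) = pause position on DNA minus pause position on RNA.

    The whole set is shifted as a group so that `reference` takes `reference_state`
    (0 for a pretranslocated complex). This removes the common offset of up to ~1.3 bp.
    Negative values indicate backtracking; positive values, forward translocation.
    """
    raw = {site: dna_pos[site] - rna_pos[site] for site in dna_pos}
    shift = reference_state - raw[reference]
    return {site: value + shift for site, value in raw.items()}
```

Because only a single shift is applied to every site, the choice of reference moves all values together. Differences between sites are unaffected, which is why anchoring on the his pause removes the group ambiguity without distorting the other assignments.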
At the ops site, RNAP halted 0.75 ± 0.25 bp downstream of ops1, which is 1.25 ± 0.25 bp upstream of ops2. We cannot be certain, however, that both pauses occur in the single-molecule experiment, even though both were evident in bulk experiments performed under the same solution conditions (Figure S6). The narrowness of the ops peak (Figure 2D) indicates that the enzyme resolves to a single position on the template after pausing at the ops sites. This could be achieved by pausing at ops1 followed by 0.75 bp of forward translocation, by pausing at ops2 followed by 1.25 bp of backtracking, or by a combination of the two. We are unable to distinguish among these possibilities, although pausing at ops2 seems less likely because the applied assisting force is expected to inhibit backtracking. There are noteworthy similarities among the sequences triggering pauses (Figure 3). G or C is present at the −11 position of the RNA for all seven sequences. G is present at −10 for all but ops2. With the exception of the c pause site, pauses occurred at positions where a purine was being added to a 3′ pyrimidine.

Pause Densities and Lifetimes Vary in a Sequence-Dependent Manner
To study the kinetics of pause entry and escape, we used an automated algorithm to identify pauses, scoring a pause whenever velocity fell below half the average active elongation rate of the record (Neuman et al., 2003; Shaevitz et al., 2003). Within the initial 1000 bp segment encountered prior to the tandem repeats (consisting of sequences derived from the rpoB gene), we recorded the same pause density as previously reported (0.9 pauses per 100 bp). The global distribution of pause lifetimes from the entire template was fit between 1 s and 25 s by a sum of two exponentials with time constants of 1.4 ± 0.1 s and 6.3 ± 0.5 s (amplitudes 66% and 34%, respectively), in agreement with prior reports (Neuman et al., 2003; Shaevitz et al., 2003), despite a lower GTP concentration (250 µM versus 1 mM).
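A minimal pause-scoring sketch, under stated assumptions:

- Records are uniformly sampled.
- Velocity is a finite difference over a short window.
- The median windowed velocity stands in for the average active elongation rate.

These choices, and the function name `find_pauses`, are illustrative assumptions. The published algorithm (Neuman et al., 2003; Shaevitz et al., 2003) may smooth and threshold differently.

```python
import statistics


def find_pauses(times, positions, window=5, min_duration=1.0):
    """Score pauses where windowed velocity drops below half the active elongation rate.

    Returns (start_time, duration, position) tuples for pauses lasting >= min_duration.
    """
    vel = [(positions[i + window] - positions[i]) / (times[i + window] - times[i])
           for i in range(len(times) - window)]
    threshold = 0.5 * statistics.median(vel)   # median used as a proxy for the active rate
    pauses, i, half = [], 0, window // 2
    while i < len(vel):
        if vel[i] >= threshold:
            i += 1
            continue
        j = i
        while j < len(vel) and vel[j] < threshold:   # extend the run of slow windows
            j += 1
        # with uniform sampling and an odd window, the run holds one window per paused sample
        duration = times[j] - times[i]
        if duration >= min_duration:
            pauses.append((times[i + half], duration, positions[i + half]))
        i = j
    return pauses
```

The `min_duration` cutoff mirrors the ≥1 s detection limit discussed in the text. Shorter dwells are dropped here and must be corrected for separately when computing efficiencies.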
Our aligned data clearly indicate that ubiquitous pauses are sequence dependent. Within the tandem repeat motifs, roughly 65% of all pauses occurred within five narrow zones that subtend just 21% of the region (Figure 4). Moreover, the residual pause density scored in the remaining 79% of the same region deviated significantly from a binomial distribution, which is the form expected for a uniform background pause rate (Figure S8; p(χ²) < 0.004). This suggests that even in regions of low pause density, where peaks are not evident, the pause probability continues to vary with the underlying sequence, just as elsewhere.

Figure 4. Pause Efficiency Is Sequence Dependent
(A) Histogram of the mean density of pauses lasting > 1 s versus template position for the ops repeat template motif (1 bp bins). White bars show the running density + SD based on bootstrapped error estimates. Events within ±5 bp of identified pause sites are color coded according to the scheme of Figure 2. The percentages of all events associated with each labeled pause are shown.
(B) Histogram of the mean density of pauses versus template position for the his repeat template motif.

The pause lifetime distributions for all six high-efficiency, sequence-dependent pauses were well fit by single exponentials (Figure 5). Therefore, it seems likely that the apparently double-exponential character of the global lifetime distribution (above; see also Neuman et al., 2003; Shaevitz et al., 2003) results from a superposition of many single-exponential distributions arising from individual pauses, among which long and short characteristic time constants dominate. Consistent with this interpretation, the longest and shortest time constants scored for the four ubiquitous pauses (6.4 s and 1.3 s; Figure 5) match those obtained by the double-exponential fit.
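Single-exponential lifetimes of this kind can be estimated directly from detected pause durations. The sketch below assumes durations are recorded only above a detection limit `t_min` (1 s here). It uses the maximum-likelihood estimator for a left-truncated exponential and ignores any upper cutoff. The fitting method behind Figure 5 is not specified in this section, so this is a swapped-in technique, not the authors' procedure.

```python
import math


def fit_truncated_exponential(durations, t_min=1.0):
    """Maximum-likelihood lifetime for exponential dwell times detected only above t_min.

    By memorylessness, t - t_min is exponential with the same lifetime, so the MLE is
    the mean excess duration; its standard error is tau / sqrt(n).
    """
    excess = [t - t_min for t in durations if t >= t_min]
    n = len(excess)
    if n == 0:
        raise ValueError("no durations at or above t_min")
    tau = sum(excess) / n
    return tau, tau / math.sqrt(n)
```

Averaging the detected durations directly would overestimate an exponential lifetime by about `t_min`. For that reason the detection limit is subtracted before averaging.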
Pausing Is a Nonobligate State in the Elongation Pathway
The pause-finding algorithm reliably detects events lasting ≥1 s. The apparent pause efficiency (percentage of molecules pausing at a site) therefore underestimates the true value because shorter events are missed. A corrected pause efficiency, ε, was determined by adding the estimated number of short, undetected pauses to the detected number, assuming that the exponential distribution of pause durations can be extrapolated to 0 s. Even with correction, all identified pause sites exhibited efficiencies well below 100%. By definition, biochemical states situated on the main reaction pathway for transcription are visited obligatorily during each nucleotide addition cycle. Corrected efficiencies below 100% therefore imply that pauses represent “off-pathway” states that branch off from and return to the main pathway, consistent with prior ensemble and single-molecule experiments (Artsimovitch and Landick, 2000; Davenport et al., 2000; Erie et al., 1993; Kassavetis and Chamberlin, 1981; Wang et al., 1995). In the simplest such branched scheme (Figure 6A), pause efficiency is a consequence of kinetic competition between the rate of nucleotide addition, kn, and the rate of entry into a pause, kp. This model makes testable predictions. All else being equal, enzymes with slower elongation rates stand a greater chance of falling into the pause state per cycle. This results in a positive correlation between the reciprocal of kn and the pause density (pauses per bp). Furthermore, assuming a constant rate of entry into the pause state at each position, the pause frequency (pauses per unit time), which is set by kp, is predicted to be independent of the elongation rate, kn. (Conversely, on-pathway pausing predicts no correlation between the pause density and the reciprocal elongation rate, and a positive correlation between pause frequency and kn.)
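The two predictions of the branched scheme can be checked with a small Monte Carlo simulation. The sketch makes three simplifying assumptions:

- A single effective nucleotide-addition rate kn and a single pause-entry rate kp apply at every position.
- Each re-entry into the pause state counts as a separate pause.
- Frequency is measured per unit of active (non-paused) elongation time.

The function name is hypothetical.

```python
import random


def simulate_branched(kn, kp, n_bp, seed=0):
    """Monte Carlo of the off-pathway scheme.

    At every template position the active complex either adds a nucleotide (rate kn)
    or branches into a pause (rate kp). Returns (pauses per bp, pauses per second of
    active elongation). Pause durations enter neither quantity, so they are not drawn.
    """
    rng = random.Random(seed)
    p_pause = kp / (kn + kp)           # branching probability per visit to the active state
    entries, active_time = 0, 0.0
    for _ in range(n_bp):
        while True:
            active_time += rng.expovariate(kn + kp)   # dwell in the active state
            if rng.random() < p_pause:
                entries += 1                          # pause, then compete again at this bp
            else:
                break                                 # nucleotide added: next position
    return entries / n_bp, entries / active_time
```

In this idealized scheme the density converges to kp/kn pauses per bp and the frequency to kp pauses per second of active elongation. Halving kn therefore doubles the density but leaves the frequency unchanged, which is the pattern tested against Figures 6B and 6C.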
The wide distribution of velocities among elongating polymerase molecules (equivalent to kn averaged over all template positions; Figure S9; consistent with Neuman et al., 2003) allows us to test these models. As predicted by the off-pathway scheme, a positive correlation was found between the pause density and the inverse elongation velocity (Figure 6B), and no correlation was found between the pause frequency and the inverse elongation velocity (Figure 6C). We also computed the correlation between the reciprocal elongation rate and the apparent pause efficiency for each of the six pauses in the repeat motif (Figure 6D). Interestingly, no significant correlation was found for pauses a, d, ops, and his, where the next nucleotide added is G, but a stronger correlation was found for pauses b and c, where the next nucleotide added is A or C. For these assays, NTP concentrations were saturating for A, C, and U but comparable to the apparent dissociation constant for GTP (Abbondanzieri et al., 2005). It is possible that a slower rate of nucleotide binding at G sites complicates the kinetics at these positions, so that the site-specific value of kn no longer reflects the average elongation rate.

The Intrinsic Lifetime of the Pause State Is Conserved
The intrinsic pause lifetime, τ, is the reciprocal of the exit rate from the pause state: τ = 1/k−p. However, molecules escaping this state immediately re-encounter a kinetic competition between further pausing (kp) and elongation (kn). For high-efficiency pauses, it is likely that the molecule drops back into the pause state. Single-molecule transcription assays do not monitor exit from and re-entry into the pause state; instead, they detect the resumption of elongation. Provided that the time spent in the active state before re-entering a pause is small compared with the time spent in the pause state, the apparent lifetime of a pause, τ*, will be nearly exponentially distributed.
The measured lifetime, however, will be given by τ* = τ/(1 − ε), where τ is the intrinsic lifetime (Supplemental Data). This apparent lifetime is positively correlated with pause efficiency (r = 0.86; p = 0.03). The apparent lifetimes at the six pause sites vary by a factor of 5. After correcting them for this effect, we found that all the intrinsic lifetimes fell within a narrow range, averaging 1.1 ± 0.4 s (mean ± SD) (Figure 6D). This model also predicts that low-efficiency pause sites (those outside the six pause regions) should exhibit the intrinsic lifetime, because there the molecule is unlikely to re-enter a pause state after escaping (Figure S10).

Figure 5. Lifetimes and Efficiencies for Individual Pauses
Histograms of identified pause dwell times, color coded according to the labeling scheme of Figure 2, with exponential fits. Measured apparent pause lifetimes (τ*) and corrected pause efficiencies (ε) are shown with estimated errors.

Molecules Exist in Stable States with Different Pause Efficiencies
Because RNAP molecules transcribe the same motif up to eight times, these records offer an opportunity to look for direct experimental evidence of molecular memory or long-range sequence effects. The apparent efficiency of molecules pausing at a given site was plotted as a function of the repeat number (Figure 7A and Figure S7). No statistically significant variation of apparent pause efficiency across the template was seen. This suggests that the propensity to pause is not influenced by such factors as the growing length of the RNA or proximity to distal, nonrepeating sequences, and it justified pooling the pause statistics for all repeats (e.g., Figure 2D). Although the average amount of pausing does not vary from one repeat to the next, this does not exclude the possibility that enzymes vary in their individual propensity to pause.
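The two corrections used above can be sketched as follows: adding back undetected short pauses to obtain ε, then converting τ* to τ. The sketch assumes exponentially distributed apparent dwell times and a 1 s detection limit. It also neglects active-state time between successive visits to the pause state, the same approximation stated in the text. The function names are hypothetical.

```python
import math
import random


def corrected_efficiency(n_detected, n_molecules, tau_apparent, t_min=1.0):
    """Pause efficiency with undetected short pauses added back.

    If apparent dwell times are exponential with lifetime tau_apparent, only a fraction
    exp(-t_min / tau_apparent) of pauses exceeds the detection limit t_min.
    """
    n_total = n_detected * math.exp(t_min / tau_apparent)
    return min(1.0, n_total / n_molecules)   # an efficiency cannot exceed 100%


def intrinsic_lifetime(tau_apparent, efficiency):
    """Invert tau* = tau / (1 - eps) to recover the intrinsic lifetime tau."""
    return tau_apparent * (1.0 - efficiency)


def simulate_apparent_pause(tau, eps, rng=None):
    """Duration of one observed pause: after each escape the complex re-enters the
    pause state with probability eps, and every visit lasts Exp(tau)."""
    if rng is None:
        rng = random.Random()
    total = rng.expovariate(1.0 / tau)
    while rng.random() < eps:
        total += rng.expovariate(1.0 / tau)
    return total
```

The simulated durations have mean τ/(1 − ε), which provides a quick consistency check on the correction formula under these assumptions.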
We consider three possible cases: (1) individual molecules exist in long-lived states with different intrinsic pause propensities (a heterogeneous population of stable states); (2) individual molecules can switch among states with different intrinsic pause propensities (a homogeneous population of unstable states); or (3) pause propensity is constant (a population homogeneous with respect to pausing). However, even in the latter case, the probability of pausing at a given position will depend on the elongation velocity, since entry into the paused state is in kinetic competition with elongation in a branched pathway (Figure 6A). Heterogeneity of the population with respect to elongation velocity therefore makes it difficult to distinguish case 1 from case 3. To minimize the influence of this inhomogeneity, we restricted our attention to the four pause sites where pausing was uncorrelated with velocity (a, d, his, and ops). We defined a function that indicates whether pausing observed at a site within a given repeat motif is correlated with pausing at the corresponding site on a subsequent repeat. This correlation is plotted for the case where the second pause site is separated from the first by 1–4 repeat motifs (Figure 7B). In all cases, the correlation was positive, indicating that molecules tend to repeat the same behavior at subsequent sites (e.g., a molecule that paused at the first site is more likely to pause again). Molecules therefore occupy states of varying pause propensity, which eliminates case 3. We also found that the correlation did not diminish with the distance between sites (Δ repeat), indicating that individual pause rates are conserved over at least 1000 bp. Constant correlations fail to support case 2 and instead support case 1, in which the elongation-competent state to which a molecule returns after pausing is always the same, although the kinetics of that state may differ from one molecule to the next.
This is consistent with the conclusions of another study of transcription-complex inhomogeneity (Tolic-Norrelykke et al., 2004).
Cell 125, 1083–1094, June 16, 2006 ©2006 Elsevier Inc. 1089
Figure 6. Pause Pathway and Pause Correlations
(A) Simple off-pathway model for transcriptional pausing in which the pause state competes kinetically with elongation. Steps in the normal elongation cycle are represented by a single transition with rate kn. The rates of entering and exiting a pause are kp and k−p, respectively. Corrected pause efficiency (ε), pause lifetime (τ), and apparent pause lifetime (τ*) are shown in terms of the individual rate constants.
(B) Pause density (pauses/100 bp) versus inverse elongation velocity. Each point corresponds to 1 of n = 114 individual molecules. The correlation coefficient is r = 0.71 (p = 6 × 10⁻¹⁹).
(C) Pause frequency (pauses/s) versus inverse elongation velocity. Each point corresponds to 1 of n = 114 individual molecules. The correlation coefficient is r = 0.16 (p = 0.1).
(D) Corrected pause efficiencies (ε), pause lifetimes (τ), and apparent pause lifetimes (τ*) for each of the identified pause sites. Correlations (and corresponding p values) between inverse velocity and apparent pause efficiency are shown.
DISCUSSION
Ubiquitous Pauses Are Sequence Dependent
Previous single-molecule experiments established that transcriptional pausing is ubiquitous, occurring at an approximately constant rate (Neuman et al., 2003). By using periodic templates to localize ubiquitous pauses, we now show that they are explained by efficient, sequence-dependent pausing at a small fraction of available template locations. Ubiquitous pauses are therefore triggered by common sequence signals rather than by random events. Furthermore, the kinetic properties of ubiquitous pauses were indistinguishable from those at the his and ops sites, whose multipartite sequence components are relatively well established.
This suggested that a sequence alignment of all six pauses could be informative. Interestingly, this alignment revealed sequence similarities (Figure 3), consistent with the idea that certain sequence components may occur frequently in pause signals. The a priori probability that at least six of seven sequences match at one or more of the 14 base positions is 7%, suggesting that the nearly conserved −10G is significant. The strong GC bias at positions −10 and −11 has received little attention previously, and its mechanistic role remains unclear, although changing −10G to U or C was found to weaken some pauses (Chan and Landick, 1993; Palangat and Landick, 2001). In principle, this bias could favor backtracking, induce a transient overextension of the RNA:DNA hybrid that generates strain in the enzyme, or reflect pause-favoring interactions of RNAP with nucleic acids at these positions. Although GC-rich sequences upstream of hybrids stabilize backtracking (Nudler et al., 1995; Reeder and Hawley, 1996), our results, together with those of others (Neuman et al., 2003; Toulokhonov and Landick, 2003), suggest that RNAP does not backtrack at these pause sites. Interestingly, inhibition of upstream hybrid melting was the first mechanism of transcriptional pausing to be proposed (Gilbert, 1976), and it has also been suggested to influence abortive initiation (Kireeva et al., 2000). Further work will be required to distinguish among these possible mechanisms. All but one of the pauses occurred where a purine is added to a pyrimidine nucleotide, consistent with previous reports (Aivazashvili et al., 1981) and with the idea that this addition is either unusually slow and thereby promotes pausing, or contributes directly to an elemental pause rearrangement. Algorithms for predicting pausing and elongation kinetics have been devised based on calculations of the energetic stability of the TEC as a function of position along a given sequence (Bai et al., 2004; von Hippel, 1998).
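The text does not state the null model behind the 7% probability for the sequence alignment. One simple assumption, that the 14 positions are independent and each base is equally likely (probability 1/4) at every position, gives a value close to it, as the following Python calculation shows. The function name and its defaults are illustrative, not taken from the study.

```python
from math import comb


def prob_conserved_position(n_seqs: int = 7, min_match: int = 6, n_positions: int = 14,
                            base_probs=(0.25, 0.25, 0.25, 0.25)) -> float:
    """P(at least `min_match` of `n_seqs` aligned sequences share the same base at
    one or more of `n_positions` positions), assuming independent random bases."""
    # Summing over bases is valid only if two different bases cannot both reach min_match.
    if 2 * min_match <= n_seqs:
        raise ValueError("min_match must exceed n_seqs / 2")
    per_position = sum(
        comb(n_seqs, k) * p**k * (1 - p) ** (n_seqs - k)
        for p in base_probs
        for k in range(min_match, n_seqs + 1)
    )
    return 1.0 - (1.0 - per_position) ** n_positions


print(f"{prob_conserved_position():.3f}")  # prints 0.073
```

Under these assumptions the chance per position is 4 × 22/4⁷ ≈ 0.0054, and 1 − (1 − 0.0054)¹⁴ ≈ 0.073. A GC-biased composition would change the result and can be explored by passing a different `base_probs`.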
One of these models predicts a class of short-lived pauses, situated on the main elongation pathway, that results from an exploration of both forward- and back-tracked translocation states, leading to a sequence-dependent transcription rate (Bai et al., 2004). The authors argued that the nucleotide addition rate would decrease for sequences where the energy of the pretranslocated state was significantly more favorable than that of the posttranslocated state, thereby producing a new type of pause, termed a “pretranslocation pause,” which they proposed to explain ubiquitous pauses. In the present study, however, we found no evidence to support such an on-pathway pause, even after correction for missed events. In particular, all mapped pauses appear to be pretranslocated (Figure 3) yet ranged in efficiency from 30% to 82%, implying that pretranslocated pauses occur off pathway, as nonobligate states (Figure 4).
Figure 7. RNAP Heterogeneity
(A) Apparent pause efficiency versus repeat number of the pause site (numbered from 1 to 8) for the sites indicated.
(B) Correlation between pausing at a given site and during a subsequent visit to an equivalent site, plotted versus the distance between the pair of sites, measured in units of the motif repeat number (see Experimental Procedures). Probable errors (SD) were estimated from simulations.
Although pausing is determined by sequence, we present direct evidence that the tendency to pause at a given site exhibits molecule-to-molecule variation, as does the elongation velocity between pauses. This inhomogeneity is demonstrated by the positive correlation between pausing at a given site on different repeats (Figure 7). The fact that the correlation does not decay as the molecule traverses up to four repeats excludes internal state switching on the corresponding time scale as the source of any inhomogeneity and implies that the tendency to pause may be an inherent characteristic of each enzyme.
The level of velocity inhomogeneity observed in our assay is consistent with that seen in previous studies of transcription (Neuman et al., 2003; Tolic-Norrelykke et al., 2004) but greater than in another assay (Adelman et al., 2002). We failed to detect any evidence for velocity-state switching, contrary to one report (Davenport et al., 2000). The basis of molecular variation in velocities and pause rates remains a subject for future study. We note, however, that our fundamental conclusions do not depend on the source of this inhomogeneity.
A Common, Elemental Pause State
It has been previously proposed that longer-lived transcriptional pauses (i.e., hairpin-stabilized and backtracking pauses) may arise from a common nonbacktracked precursor state (the elemental pause) (Artsimovitch and Landick, 2000; Neuman et al., 2003; Palangat and Landick, 2001) that is related to ubiquitous pausing (Neuman et al., 2003) and to the “unactivated” state (Erie, 2002). Our results support this model and allow us to elaborate upon the original proposal. The fact that ubiquitous pauses are sequence dependent is consistent with the idea that this state represents a precursor to backtracked and hairpin-stabilized pauses, which are known to be sequence dependent. In support of this assignment, all pause lifetimes followed exponential distributions, consistent with a transition from a single state. The intrinsic rates for escaping the elemental pause state were similar for all sites, despite a 5-fold variation in apparent lifetime. Taken together, these observations suggest that the return to the main elongation pathway may represent the exit from an elemental pause state that forms under our experimental conditions. Interestingly, we observed no backtracking at the major ops regulatory site.
Furthermore, in single-molecule assays performed in the presence of DNA oligomers complementary to the his hairpin sequence, which disrupt hairpin formation and prevented pause stabilization in previous experiments (Artsimovitch and Landick, 2000), we observed no decrease in the dwell time at the his site (data not shown). Indeed, we found no clear evidence for the formation of alternative, stabilized pause states at either the ops or his pause sites, suggesting that the elemental pause may be the only state populated under the conditions studied here, namely high NTP levels, moderate assisting loads that tend to inhibit backtracking, and a lower temperature (21.5 °C), which is known to inhibit arrest or backtracking (Gu and Reines, 1995; Kulish and Struhl, 2001). This is consistent with the fact that backtracking at the ops site was suggested to occur in experiments conducted at 37 °C (Artsimovitch and Landick, 2000). Thus, under our single-molecule assay conditions, we appear to observe only an elemental pause state that is able to equilibrate with the on-line state from which it formed.
A Two-Tiered Pause Mechanism
Taken together, our data support the view that a ubiquitous pause is generated by a sequence-dependent mechanism that induces RNAP to enter temporarily into an inactive state in which elongation is inhibited. This leads to a brief transcriptional pause with little, if any, associated motion of the enzyme along the DNA template. Once RNAP pauses, secondary pause mechanisms, such as RNA hairpin formation or enzyme backtracking, compete with the slow rate of escape from the elemental pause (~1 s⁻¹), which is more than 10-fold slower than the normal elongation rate.
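Why a slow escape from the elemental pause matters for the two-tiered view can be illustrated with a minimal first-order competition, sketched here in Python. Only the ~1 s⁻¹ escape rate comes from the text; the on-pathway rate of 14 s⁻¹ (the mean elongation rate used in the supplemental simulations) and the stabilization rate of 0.5 s⁻¹ for hairpin formation or backtracking are assumptions chosen for illustration.

```python
def branch_probability(k_branch: float, k_competing: float) -> float:
    """Probability that a first-order step with rate k_branch happens before a
    competing first-order step with rate k_competing."""
    return k_branch / (k_branch + k_competing)


K_ESCAPE = 1.0   # s^-1, escape from the elemental pause (~1 s^-1 in the text)
K_ELONG = 14.0   # s^-1, assumed on-pathway stepping rate (illustrative)
K_STAB = 0.5     # s^-1, hypothetical rate of hairpin formation or backtracking

# Tier 2 only has to outpace the slow escape from the elemental pause...
from_elemental = branch_probability(K_STAB, K_ESCAPE)
# ...whereas without tier 1 it would have to outpace ordinary elongation.
from_elongation = branch_probability(K_STAB, K_ELONG)

print(f"{from_elemental:.3f} vs {from_elongation:.3f}")  # prints 0.333 vs 0.034
```

With these numbers a stabilizing step is roughly ten times more likely to win when it competes only with escape from the elemental pause than when it must compete directly with elongation.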
Such “two-tiered” mechanisms for the regulation of transcription have been suggested previously in studies of misincorporation, pausing, elongation, and termination (Artsimovitch and Landick, 2000; Erie et al., 1993; Foster et al., 2001; Palangat and Landick, 2001; von Hippel and Yager, 1991, 1992). In this two-tiered mechanism, a long-lived regulatory pause would comprise two components acting in succession: (1) a common sequence element that triggers a temporary (elemental) pause state, followed by (2) additional sequence elements that convert the elemental pause into a long-lived pause. The pause-inducing and pause-stabilizing sequence elements might be distinct, but more likely they overlap or share common motifs. It follows from this model that there are two potential points of regulation: formation of the elemental pause and the subsequent steps that stabilize pauses. Future work will entail discovering how regulators such as NusA, NusG, and Gre factors function in the context of the two-tiered pause mechanism.
EXPERIMENTAL PROCEDURES
Cloning of a Single Repeat Motif
The his and ops-pheP pause sequences have been described (Artsimovitch and Landick, 2000). Flanking sequences in the tandem repeat regions, consisting of DNA derived from the rpoB gene, were taken from a region of pRL732 (1638–1800; Neuman et al., 2003; Shaevitz et al., 2003) with little propensity to form RNA secondary structure, based on mfold (Zuker, 2003). DNA templates were constructed from eight oligonucleotides of 60 bp with complementary overhanging ends for ligation in a repetitive and directed fashion. Equimolar amounts of adjacent double-stranded segments were ligated. Overhanging ends were filled in with the large Klenow fragment of DNA polymerase I (NEB). Blunt-ended products were ligated into the pCR-Blunt vector (Zero Blunt Cloning Kit, Invitrogen). The two templates were amplified using PCR primers designed to add flanking sequences and were ligated into the pCR-Script (Stratagene) vector.
Cloning Concatenated Pause Sequences
A BglII site was appended to the repeat motifs by PCR. PCR products were subsequently digested with BamHI and ligated into BamHI/SmaI-digested pUC19. The resulting repeat motifs were 227 bp and 239 bp for the ops and his motifs, respectively. Subsequent rounds of cloning to create 2-mers, 4-mers, and finally 8-mers in the pUC19-derived plasmid were done as described (Carrion-Vazquez et al., 1999).
Cloning Pauses behind the T7 A1 Promoter for Optical Trapping
The concatenated motifs were cloned into pKH1, a ~4800 bp derivative of pRL732 previously used as a source of DNA templates (Neuman et al., 2003; Shaevitz et al., 2003). pKH1 was constructed by digesting pRL732 with SphI and ClaI to remove ~3000 bp. Short oligonucleotides containing a BamHI sequence were annealed and ligated into the digested plasmid. The repeated his and ops pause sequences were released from their respective pUC19 plasmids by digestion with BamHI and BglII. These sequences were gel purified, ligated into pKH1 plasmids that had been linearized with BamHI, and transformed into XL-1 Blue cells (Stratagene). The resulting plasmids, pKH2 (ops) and pBW1 (his), were used to produce DNA templates for optical trapping.
Transcription Templates for Optical-Trapping Assays
Linear labeled templates were constructed from pKH2 and pBW1 plasmids by digesting each plasmid at a unique AlwNI site. The resulting 3′ ends were labeled using DIG-ddUTP and terminal transferase (Roche DIG Oligonucleotide 3′-End Labeling Kit). Digestion at a unique SapI site removed the label at the transcriptionally downstream end, leaving a single label at the upstream end for tethering to anti-digoxigenin-coated polystyrene beads.
Optical-Trapping Assay
Biotin-labeled RNAP was stalled 29 bp after the T7A1 promoter (Neuman et al., 2003) on labeled templates (constructed as described above).
Avidin-labeled beads (600 nm diameter) and anti-digoxigenin-labeled beads (700 nm diameter) were prepared, and stalled TECs were bound to the beads to create bead:RNAP:DNA:bead dumbbells (Shaevitz et al., 2003). Experiments were performed as described previously (Shaevitz et al., 2003), but without heparin in the buffer and in the presence of 1 mM each of ATP, CTP, and UTP and 250 μM GTP. The experimental room was maintained at 21.5 ± 0.1 °C. The resting tension in the DNA was maintained by a force clamp at 7.3 ± 2.4 pN (mean ± SD) by moving the 700 nm bead in 50 nm increments whenever the tension on the DNA fell below 5 pN (Shaevitz et al., 2003).
Data Analysis
The contour length of DNA between beads was calculated as described (Shaevitz et al., 2003). Template position was determined from the measured DNA contour length during transcription by subtracting the fixed contour length of the segment of DNA template in advance of the transcription initiation site (2607 bp, based on a pitch of 0.338 nm/bp). After traces were aligned (Supplemental Experimental Procedures), pauses were identified by an algorithm similar to that of Shaevitz et al. (2003), except that, to avoid scoring a single pause multiple times (due to drift), extra pauses found within 1 bp downstream of a prior pause were concatenated. Pause correlations for a given site were calculated by scoring each visit to the site with 1 or 0, according to whether a pause of ≥1 s was observed. The correlation was evaluated for all pairs of visits separated by the prescribed number of repeats, after preprocessing to subtract the mean value and renormalize the variance to unity. Analysis was performed in Igor Pro (WaveMetrics) and C.
Supplemental Data
Supplemental Data include Supplemental Experimental Procedures, ten figures, and Supplemental References and can be found with this article online at http://www.cell.com/cgi/content/full/125/6/1083/DC1/.
ACKNOWLEDGMENTS
We thank J.
Gelles and past and present members of the Block lab, particularly E. Abbondanzieri, J. Shaevitz, R. Dalal, and W. Greenleaf, for discussions and technical assistance. We thank P. Fordyce and M. Woodside for comments on the manuscript. K.M.H. thanks M. Carrion-Vazquez for cloning advice. B.J.W. received support from a Stanford Undergraduate Research Grant and an HHMI Summer Fellowship. K.M.H. and R.A.M. acknowledge support from an HHMI Predoctoral Fellowship and an NIH Biotechnology Training Grant, respectively. This work was supported by grants from the NIH to S.M.B. and R.L.
Received: January 24, 2006
Revised: March 18, 2006
Accepted: April 13, 2006
Published: June 15, 2006
REFERENCES
Abbondanzieri, E.A., Greenleaf, W.J., Shaevitz, J.W., Landick, R., and Block, S.M. (2005). Direct observation of base-pair stepping by RNA polymerase. Nature 438, 460–465.
Adelman, K., La Porta, A., Santangelo, T.J., Lis, J.T., Roberts, J.W., and Wang, M.D. (2002). Single molecule analysis of RNA polymerase elongation reveals uniform kinetic behavior. Proc. Natl. Acad. Sci. USA 99, 13538–13543.
Adelman, K., Marr, M.T., Werner, J., Saunders, A., Ni, Z., Andrulis, E.D., and Lis, J.T. (2005). Efficient release from promoter-proximal stall sites requires transcript cleavage factor TFIIS. Mol. Cell 17, 103–112.
Aivazashvili, V.A., Bibilashvili, R., Vartikian, R.M., and Kutateladze, T.V. (1981). Factors influencing the pulse character of RNA elongation in vitro by E. coli RNA polymerase. Mol. Biol. (Mosk.) 15, 653–667.
Artsimovitch, I., and Landick, R. (2000). Pausing by bacterial RNA polymerase is mediated by mechanistically distinct classes of signals. Proc. Natl. Acad. Sci. USA 97, 7090–7095.
Artsimovitch, I., and Landick, R. (2002). The transcriptional regulator RfaH stimulates RNA chain synthesis after recruitment to elongation complexes by the exposed nontemplate DNA strand. Cell 109, 193–203.
Bai, L., Shundrovsky, A., and Wang, M.D. (2004). Sequence-dependent kinetic model for transcription elongation by RNA polymerase. J. Mol. Biol. 344, 335–349.
Bailey, M.J., Hughes, C., and Koronakis, V. (1997). RfaH and the ops element, components of a novel system controlling bacterial transcription elongation. Mol. Microbiol. 26, 845–851.
Carrion-Vazquez, M., Oberhauser, A.F., Fowler, S.B., Marszalek, P.E., Broedel, S.E., Clarke, J., and Fernandez, J.M. (1999). Mechanical and chemical unfolding of a single protein: a comparison. Proc. Natl. Acad. Sci. USA 96, 3694–3699.
Chan, C.L., and Landick, R. (1993). Dissection of the his leader pause site by base substitution reveals a multipartite signal that includes a pause RNA hairpin. J. Mol. Biol. 233, 25–42.
Davenport, R.J., Wuite, G.J., Landick, R., and Bustamante, C. (2000). Single-molecule study of transcriptional pausing and arrest by E. coli RNA polymerase. Science 287, 2497–2500.
de la Mata, M., Alonso, C.R., Kadener, S., Fededa, J.P., Blaustein, M., Pelisch, F., Cramer, P., Bentley, D., and Kornblihtt, A.R. (2003). A slow RNA polymerase II affects alternative splicing in vivo. Mol. Cell 12, 525–532.
de Mercoyrol, L., Soulie, J.M., Job, C., Job, D., Dussert, C., Palmari, J., Rasigni, M., and Rasigni, G. (1990). Abortive intermediates in transcription by wheat-germ RNA polymerase II. Dynamic aspects of enzyme/template interactions in selection of the enzyme synthetic mode. Biochem. J. 269, 651–658.
Erie, D.A. (2002). The many conformational states of RNA polymerase elongation complexes and their roles in the regulation of transcription. Biochim. Biophys. Acta 1577, 224–239.
Erie, D.A., Hajiseyedjavadi, O., Young, M.C., and von Hippel, P.H. (1993). Multiple RNA polymerase conformations and GreA: control of the fidelity of transcription. Science 262, 867–873.
Foster, J.E., Holmes, S.F., and Erie, D.A. (2001). Allosteric binding of nucleoside triphosphates to RNA polymerase regulates transcription elongation. Cell 106, 243–252.
Gilbert, W.J. (1976). Starting and stopping sequences of the RNA polymerase. In RNA Polymerase, R. Losick and M.J. Chamberlin, eds. (Cold Spring Harbor, NY, USA: Cold Spring Harbor Laboratory), pp. 193–205.
Gorodkin, J., Heyer, L.J., Brunak, S., and Stormo, G.D. (1997). Displaying the information contents of structural RNA alignments: the structure logos. Comput. Appl. Biosci. 13, 583–586.
Gu, W., and Reines, D. (1995). Identification of a decay in transcription potential that results in elongation factor dependence of RNA polymerase II. J. Biol. Chem. 270, 11238–11244.
Harrington, K.J., Laughlin, R.B., and Liang, S. (2001). Balanced branching in transcription termination. Proc. Natl. Acad. Sci. USA 98, 5019–5024.
Henkin, T.M., and Yanofsky, C. (2002). Regulation by transcription attenuation in bacteria: how RNA provides instructions for transcription termination/antitermination decisions. Bioessays 24, 700–707.
Kassavetis, G.A., and Chamberlin, M.J. (1981). Pausing and termination of transcription within the early region of bacteriophage T7 DNA in vitro. J. Biol. Chem. 256, 2777–2786.
Kireeva, M.L., Komissarova, N., and Kashlev, M. (2000). Overextended RNA:DNA hybrid as a negative regulator of RNA polymerase II processivity. J. Mol. Biol. 299, 325–335.
Kireeva, M.L., Hancock, B., Cremona, G.H., Walter, W., Studitsky, V.M., and Kashlev, M. (2005). Nature of the nucleosomal barrier to RNA polymerase II. Mol. Cell 18, 97–108.
Komissarova, N., and Kashlev, M. (1997). Transcriptional arrest: Escherichia coli RNA polymerase translocates backward, leaving the 3′ end of the RNA intact and extruded. Proc. Natl. Acad. Sci. USA 94, 1755–1760.
Kulish, D., and Struhl, K. (2001). TFIIS enhances transcriptional elongation through an artificial arrest site in vivo. Mol. Cell. Biol. 21, 4162–4168.
Landick, R., Turnbough, C.J., and Yanofsky, C. (1996). Transcription attenuation. In Escherichia coli and Salmonella: Cellular and Molecular Biology, F. Neidhardt, R. Curtiss, III, J.L. Ingraham, E.C.C. Lin, K.B. Low, B. Magasanik, W.S. Reznikoff, M. Riley, M. Schaechter, and H.E. Umbarger, eds. (Washington, DC: ASM Press), pp. 1263–1286.
Neuman, K.C., Abbondanzieri, E.A., Landick, R., Gelles, J., and Block, S.M. (2003). Ubiquitous transcriptional pausing is independent of RNA polymerase backtracking. Cell 115, 437–447.
Nudler, E., Kashlev, M., Nikiforov, V., and Goldfarb, A. (1995). Coupling between transcription termination and RNA polymerase inchworming. Cell 81, 351–357.
Palangat, M., and Landick, R. (2001). Roles of RNA:DNA hybrid stability, RNA structure, and active site conformation in pausing by human RNA polymerase II. J. Mol. Biol. 311, 265–282.
Palangat, M., Meier, T.I., Keene, R.G., and Landick, R. (1998). Transcriptional pausing at +62 of the HIV-1 nascent RNA modulates formation of the TAR RNA structure. Mol. Cell 1, 1033–1042.
Pasman, Z., and von Hippel, P.H. (2002). Active Escherichia coli transcription elongation complexes are functionally homogeneous. J. Mol. Biol. 322, 505–519.
Reeder, T.C., and Hawley, D.K. (1996). Promoter proximal sequences modulate RNA polymerase II elongation by a novel mechanism. Cell 87, 767–777.
Renner, D.B., Yamaguchi, Y., Wada, T., Handa, H., and Price, D.H. (2001). A highly purified RNA polymerase II elongation control system. J. Biol. Chem. 276, 42601–42609.
Richardson, J.P., and Greenblatt, J. (1996). Control of RNA chain elongation and termination. In Escherichia coli and Salmonella: Cellular and Molecular Biology, F. Neidhardt, R. Curtiss, III, J.L. Ingraham, E.C.C. Lin, K.B. Low, B. Magasanik, W.S. Reznikoff, M. Riley, M. Schaechter, and H.E. Umbarger, eds. (Washington, DC: ASM Press), pp. 822–848.
Ring, B.Z., Yarnell, W.S., and Roberts, J.W. (1996). Function of E. coli RNA polymerase sigma factor sigma 70 in promoter-proximal pausing. Cell 86, 485–493.
Shaevitz, J.W., Abbondanzieri, E.A., Landick, R., and Block, S.M. (2003). Backtracking by single RNA polymerase molecules observed at near-base-pair resolution. Nature 426, 684–687.
Shundrovsky, A., Santangelo, T.J., Roberts, J.W., and Wang, M.D. (2004). A single-molecule technique to study sequence-dependent transcription pausing. Biophys. J. 87, 3945–3953.
Tang, H., Liu, Y., Madabusi, L., and Gilmour, D.S. (2000). Promoter-proximal pausing on the hsp70 promoter in Drosophila melanogaster depends on the upstream regulator. Mol. Cell. Biol. 20, 2569–2580.
Tolic-Norrelykke, S.F., Engh, A.M., Landick, R., and Gelles, J. (2004). Diversity in the rates of transcript elongation by single RNA polymerase molecules. J. Biol. Chem. 279, 3292–3299.
Toulokhonov, I., and Landick, R. (2003). The flap domain is required for pause RNA hairpin inhibition of catalysis by RNA polymerase and can modulate intrinsic termination. Mol. Cell 12, 1125–1136.
Toulokhonov, I., Artsimovitch, I., and Landick, R. (2001). Allosteric control of RNA polymerase by a site that contacts nascent RNA hairpins. Science 292, 730–733.
von Hippel, P.H. (1998). An integrated model of the transcription complex in elongation, termination, and editing. Science 281, 660–665.
von Hippel, P.H., and Yager, T.D. (1991). Transcript elongation and termination are competitive kinetic processes. Proc. Natl. Acad. Sci. USA 88, 2307–2311.
von Hippel, P.H., and Yager, T.D. (1992). The elongation-termination decision in transcription. Science 255, 809–812.
Wang, D., Meier, T.I., Chan, C.L., Feng, G., Lee, D.N., and Landick, R. (1995). Discontinuous movements of DNA and RNA in RNA polymerase accompany formation of a paused transcription complex. Cell 81, 341–350.
Yarnell, W.S., and Roberts, J.W. (1999). Mechanism of intrinsic transcription termination and antitermination. Science 284, 611–615.
Yonaha, M., and Proudfoot, N.J. (1999). Specific transcriptional pausing activates polyadenylation in a coupled in vitro system. Mol. Cell 3, 593–600.
Zuker, M. (2003). Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31, 3406–3415.
Note Added in Proof
We have recently become aware of an improved model for transcriptional pausing that incorporates the energetics of RNA folding and long-lived kinetic barriers that depend upon the DNA sequence. The reference for this model is as follows: Tadigotla, V.R., O’Maoileidigh, D., Sengupta, A.M., Epshtein, V., Ebright, R.H., Nudler, E., and Ruckenstein, A.E. (2006). Thermodynamic and kinetic modeling of transcriptional pausing. Proc. Natl. Acad. Sci. USA 103, 4439–4444. This model predicts pauses that correlate with a high fraction of the locations reported here (A.E. Ruckenstein, personal communication).
Herbert et al. 2006 SUPPLEMENTAL DATA RNAP pausing on repeat templates
Supplemental Data
Sequence-resolved Detection of Pausing by Single RNA Polymerase Molecules on Periodic Templates Reveals an Elemental Pause State
Kristina M. Herbert, Arthur La Porta, Becky J. Wong, Rachel A. Mooney, Keir C. Neuman, Robert Landick, and Steven M. Block
Supplemental Experimental Procedures
Detailed Description of the Alignment Algorithm
Raw data records obtained with the optical trapping apparatus using the dumbbell assay contain small calibration uncertainties that are attributable, in part, to dispersion in the sizes of the polystyrene beads to which transcription complexes were attached, as well as to other technical limitations of the apparatus. The uncertainty in bead radius shifts the apparent position of the transcription record by a corresponding amount. This same positional uncertainty also leads to a calibration error in the applied force, which introduces a variation in the elastic extension of the DNA tether, in effect stretching or compressing records.
As a consequence, the true position, x, of RNAP on the tether for any given record is related to its apparent position, x′, by x = αx′ + β, where α is the stretch factor and β is the shift (offset) distance. For convenience, the origin of our coordinate system is taken to be the position of the terminator. The purpose of the alignment algorithm is to determine the optimal α and β for each individual record, so that features such as pauses or dissociation events can be related directly to the underlying DNA sequence. That algorithm is described here.
First Stage: Alignment of Primary Records
1. Before processing, all data records, expressed as position (in base pairs) vs. time (in seconds), were low-pass filtered by convolution with a Gaussian of std. dev. 0.5 s. Data were initially sampled at 2000 s⁻¹, and the filtered records were decimated by a factor of 25. A histogram of the position of each molecule along the template was generated from the records with bins of width 0.5 bp, and the logarithm of the histogram was calculated. The resulting log-dwell histogram indicates the logarithm of the amount of time the molecule spends at each position on the template.
2. For each record, the factor α was scanned between 0.95 and 1.01 in increments of 0.0005. For each trial α, the log-dwell histogram of the record was recalculated, and the sections corresponding to the repeat motifs were extracted and averaged together. When the choice of α is optimal, any sequence-dependent pauses tend to fall into register, and a minimum number of maximally sharp peaks is obtained in the 8-fold averaged log-dwell histogram (Figure S1a), causing the overall distribution to become highly skewed (the measure of skew is computed from the normalized third moment of the distribution after subtraction of the mean). The value of α producing the largest skewness was selected (Figure S1b).
3. Once the value for α was optimized, the position at which each complex dissociates was compared to the position of the expected termination site. Records terminating within 60 bp of the expected position and which traversed at least 5 repeat motifs were retained for analysis. For each member of this ensemble of records, the value for β was selected to shift the dissociation position to coincide with the termination site.
4. The log-dwell plots for all terminating records were averaged together after the first-stage alignment to create a combined log-dwell plot (Figure 2b). The log-dwell plots for all 8 repeat motifs could be averaged together to determine the log-dwell plot for a single repeat.
Approximately half of the data records obtained satisfied the conditions necessary to be included in the first stage of alignment, above. The resulting alignment exhibited high fidelity near the terminator position, but this alignment tended to degrade somewhat as the distance from the terminator increased, presumably due to insufficient resolution in the determination of the α values. The purpose of the second stage of analysis was to align all records, including those that read through the terminator region and those that failed to reach the end of the template, and to improve the global alignment far from the termination site.
[Figure S1 plot: (a) log dwell vs. position (bp), before and after rescaling; (b) skewness vs. scale factor]
Figure S1. Optimization of the scale factor. (a) Increased sharpness of individual peaks in the overlaid log-dwell function after the data were rescaled for the correct periodicity. (b) Computed skewness for the ensemble distribution of log-dwell times as a function of the scale factor, α, showing the optimum near 0.982.
Second Stage: Alignment of All Records
1. A template for aligning all records was created from the first stage of alignment. The three terminator-proximal repeats of the log-dwell histogram were averaged together, and this function was concatenated 8 times to represent the log-dwell function expected for a full series of 8 repeats, producing an alignment template.
2. For each data record, the correlation function between its log-dwell histogram and the histogram of the alignment template was computed as a function of both α and β. (Curves were offset to zero mean before correlations were calculated.) The (α, β) pair that produced the maximal correlation was selected for each record, subject to the constraints −110 bp < β < 110 bp and 0.95 < α < 1.01 (see Figure S2a). Records were retained if they overlapped with at least 50% of the repeat region and if the computed correlation surpassed a threshold value of 25%. The (α, β) values obtained from this second-stage alignment were considered final and overrode any preliminary values assigned during the first stage of alignment (if applicable).
[Figure S2 plot: panels (a) and (b), shift (bp) vs. scale factor]
Figure S2. A 2D representation of optimal (α, β) values. (a) The level of brightness indicates the overall correlation between the histograms of the alignment template and a single transcription record for the his repeat template as a function of both shift (vertical axis) and scale factor (horizontal axis). The global peak may be identified as the bright spot near (0.98, 0). (b) The same, but for a simulated data record. Here, the peak lies near (0.99, 103).
Assessment of Alignment Quality by Numerical Simulation
After all stages of alignment, global histograms of dwell positions along the template displayed a series of well-defined peaks indicative of sequence-dependent pauses. However, it is thought that RNAP may also have a small but finite probability of pausing at any position on the template.
It was therefore necessary to consider whether a background of pauses falling outside the strong, identified pause regions could introduce artifacts that might lead to misalignment of records. We believe the algorithm remains robust for the following reasons: (1) The template used for final alignment was generated directly from the terminating records, which are aligned a priori. As a consequence, the algorithm does not have the freedom to create new features in the aligned data or to drift away from the termination site; (2) The non-terminating records (for which a priori alignment was not possible) visit an array of 5 high-efficiency pause sites repeated up to 8 times. These high-efficiency sites account for 2/3 of all pauses, and therefore produce, on average, 20 sequence-dependent pauses per record that correlate with the alignment template. It seems improbable that a small population of background pauses could out-compete this strong repetitive signal by chance. However, to verify the fidelity of the alignment algorithm, we created a numerical simulation of RNAP transcription for the his repeat template with programmable sequence dependence. Simulated data records were generated and analyzed with the identical software used to analyze actual data, and the measured alignment was compared with the correct alignment, which is known for simulated data. A parameter file was used by the simulation to specify the kinetic properties at every base position along the template. These sequence-dependent properties included the rate of elongation, the rates for entering and escaping each of two pause states [to facilitate modeling of the dual exponential pause duration previously reported for ubiquitous pausing (Neuman et al., 2003)], and a termination rate. For generic positions in the repeat motif, and in other regions of the template, the elongation rate was selected from a normal distribution centered at 14 bp/s with std. dev. 
7 bp/s, and the pause entry and escape rates were selected to match the experimentally established ubiquitous pause properties. At the high-efficiency pause sites (a, b, c, d, and his), the pause entry and escape rates were chosen to reproduce the measured pause efficiencies and apparent lifetimes. The dissociation rate was set to 10⁻⁵ bp⁻¹ at all positions on the template except the termination site, where it was 0.5 bp⁻¹. The algorithm is a stochastic simulation: at each base-pair position, the program uses a random number generator to resolve the kinetic competition between elongation and pausing. If elongation is selected, the dwell time at that base is drawn from an exponential distribution whose time constant is the inverse of the elongation rate. If the pause state is entered, the dwell time is drawn from an exponential distribution based on the escape rate from the pause state. The algorithm then simulates the competition between dissociation and advancing to the next base. (Reentry into the pause state is not included in the simulation, so the pause escape rate is based on the apparent pause lifetime, τ*, rather than the true lifetime.) The output specifies how long the simulated enzyme dwelled at each base and is subsequently converted to a position-vs.-time record sampled at 2 kHz. Finally, simulated Brownian motion (white noise) and instrument drift (a random walk) are superimposed on every record. This algorithm was run 60 times to generate a set of 60 records. For each record, the algorithm then imposed a random calibration error, consisting of a small shift drawn from a zero-mean Gaussian distribution with std. dev. 60 bp and a stretch factor drawn from a unity-mean Gaussian distribution with std. dev. 2%.
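The simulation procedure described above can be summarized in a short sketch. The Python below is a minimal, hypothetical reconstruction, not the authors' software: the function names (`simulate_dwells`, `to_trace`, `add_noise_and_calibration`), the per-base rate lists, the 1 bp/s floor on sampled elongation rates, and the white-noise and drift amplitudes are all assumptions for illustration. Only the overall structure follows the text: an elongation-vs.-pause competition at each base, exponential dwells, a dissociation check, 2 kHz sampling, and a random shift (std. dev. 60 bp) and stretch (std. dev. 2%).

```python
import bisect
import random


def simulate_dwells(k_elong, k_pause, k_escape, p_dissoc, rng):
    """Per-base dwell times (s) for one simulated transcription record.

    All inputs are per-base lists of equal length (a hypothetical layout):
    k_elong, k_pause and k_escape in 1/s, p_dissoc as probability per bp.
    Re-entry into the pause state is not modeled, so k_escape stands for
    1 / tau* (the apparent pause lifetime).
    """
    dwells = []
    for ke, kp, kesc, pd in zip(k_elong, k_pause, k_escape, p_dissoc):
        # Kinetic competition: pause entry wins with probability kp / (kp + ke).
        if rng.random() < kp / (kp + ke):
            dwells.append(rng.expovariate(kesc))  # dwell in the pause state
        else:
            dwells.append(rng.expovariate(ke))    # ordinary elongation dwell
        # Competition between dissociation and advancing to the next base.
        if rng.random() < pd:
            break
    return dwells


def to_trace(dwells, fs=2000.0):
    """Convert dwell times to a position-vs.-time record (bp) sampled at fs Hz."""
    exit_times = []
    t = 0.0
    for d in dwells:
        t += d
        exit_times.append(t)
    n_samples = int(t * fs) + 1
    # Position at each sample = number of bases already completed.
    return [bisect.bisect_right(exit_times, j / fs) for j in range(n_samples)]


def add_noise_and_calibration(trace, rng, sigma_white=5.0, sigma_drift=0.01,
                              sigma_shift=60.0, sigma_stretch=0.02):
    """Superimpose white noise, random-walk drift and a random shift/stretch.

    sigma_white and sigma_drift (bp) are placeholders, not published values;
    the 60 bp shift and 2% stretch widths follow the text.
    """
    shift = rng.gauss(0.0, sigma_shift)
    stretch = rng.gauss(1.0, sigma_stretch)
    drift = 0.0
    noisy = []
    for x in trace:
        drift += rng.gauss(0.0, sigma_drift)  # random-walk instrument drift
        noisy.append(stretch * (x + drift + rng.gauss(0.0, sigma_white)) + shift)
    return noisy, shift, stretch


# Usage with hypothetical parameters: 240 bases, terminator at the last base.
rng = random.Random(1)
n = 240
k_elong = [max(1.0, rng.gauss(14.0, 7.0)) for _ in range(n)]  # floor is an assumption
k_pause = [0.3] * n                   # hypothetical pause-entry rate (1/s)
k_escape = [1.0] * n                  # hypothetical escape rate (1/s)
p_dissoc = [1e-5] * (n - 1) + [0.5]   # 1e-5 per bp, 0.5 at the terminator
dwells = simulate_dwells(k_elong, k_pause, k_escape, p_dissoc, rng)
record, shift, stretch = add_noise_and_calibration(to_trace(dwells), rng)
```

Because re-entry is not modeled here, `k_escape` should be set from the apparent lifetime τ*, as the text notes; a fuller model would also draw the pause-site rates to reproduce the measured efficiencies.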
The resulting records bore a strong resemblance to actual data, exhibiting similar velocity fluctuations and sequence-dependent pausing interspersed with apparently random pauses, together with realistic levels of noise. The simulated runs were then analyzed using the alignment algorithm, including both the first and second stages of alignment. A typical correlation map obtained from simulated data was similar to that obtained from actual data (Figure S2). The (α,β) pairs returned by the alignment program were then compared with the values actually used to shift and rescale the simulated records. The fidelity of the alignment procedure is best appreciated by plotting the measured calibration parameters against the true calibration parameters used in the simulation (Figure S3).
[Figure S3 plot: (a) measured shift vs. generated shift (bp, −150 to 150); (b) measured scale factor vs. generated scale factor, first and second stages]
Figure S3. Accuracy of the alignment algorithm based on simulations. (a) The measured (optimal) shift, β, returned by the alignment procedure vs. the shift actually generated by the simulation. (b) The measured (optimal) scale factor, α, returned by the alignment procedure vs. the scale factor actually generated by the simulation. Results from both the first (red open triangles) and second (blue filled circles) stages of the algorithm are shown. Note the improvement in fidelity after the second stage.
Only 2 of 60 simulated records were aligned with a shift that deviated appreciably from the correct value. Two additional records were shifted by the full repeat distance, which brings the sequence in the overall repeat region into correct alignment except for the initial or final repeat motif. In all, 96% of all simulated data were aligned correctly by the algorithm.
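The second-stage alignment and the simulated fidelity check can be illustrated together. The sketch below is a simplified stand-in for the published software, under several assumptions: log-dwell profiles are binned at 1 bp, a template position x is taken to appear at record position αx + β, profiles are compared by linear interpolation, and the (α,β) grid is searched exhaustively. The helper names and the synthetic two-peak, 40 bp repeat are hypothetical. The code builds an alignment template by averaging the three terminator-proximal repeats and tiling them 8 times, then recovers a known stretch and shift by maximizing the zero-mean correlation, keeping the result only above the 25% threshold.

```python
import math


def build_template(profile, repeat_len, n_repeats=8, use_last=3):
    """Average the `use_last` terminator-proximal repeats and tile them.

    `profile` is a first-stage-aligned log-dwell histogram (1-bp bins) whose
    last use_last * repeat_len bins are assumed to be those repeats.
    """
    start = len(profile) - use_last * repeat_len
    unit = [sum(profile[start + r * repeat_len + i] for r in range(use_last)) / use_last
            for i in range(repeat_len)]
    return unit * n_repeats


def sample(profile, pos):
    """Linearly interpolate a 1-bp-binned profile; None outside its range."""
    i = math.floor(pos)
    if i < 0 or i + 1 >= len(profile):
        return None
    f = pos - i
    return (1.0 - f) * profile[i] + f * profile[i + 1]


def zero_mean_corr(a, b):
    """Correlation of two equal-length curves after offsetting each to zero mean."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / den if den > 0 else 0.0


def align(record, template, alphas, betas, min_overlap=0.5):
    """Grid search for the (alpha, beta) pair that maximizes correlation.

    Assumed convention: template position x appears at record position
    alpha * x + beta. Candidates overlapping less than min_overlap of the
    template are skipped. Returns (best_corr, alpha, beta).
    """
    best = (-1.0, None, None)
    for a in alphas:
        for b in betas:
            pairs = [(sample(record, a * x + b), t) for x, t in enumerate(template)]
            pairs = [(r, t) for r, t in pairs if r is not None]
            if len(pairs) < min_overlap * len(template):
                continue
            c = zero_mean_corr([r for r, _ in pairs], [t for _, t in pairs])
            if c > best[0]:
                best = (c, a, b)
    return best


# Synthetic check: two pause peaks per hypothetical 40-bp repeat,
# a known stretch (1.01) and shift (7 bp), then recovery by grid search.
repeat = 40
unit = [math.exp(-((i - 10) ** 2) / 8.0) + 0.6 * math.exp(-((i - 27) ** 2) / 8.0)
        for i in range(repeat)]
template = build_template(unit * 8, repeat)
true_alpha, true_beta = 1.01, 7.0
record = [sample(template, (r - true_beta) / true_alpha) or 0.0
          for r in range(len(template))]
alphas = [0.97 + 0.005 * k for k in range(13)]
betas = [float(b) for b in range(-15, 16)]
corr, alpha, beta = align(record, template, alphas, betas)
print(f"corr = {corr:.3f}, alpha = {alpha:.3f}, beta = {beta:.1f}, kept = {corr > 0.25}")
```

The β grid here spans less than one repeat. With a wider range, a shift by a full repeat can score nearly as well on a periodic template, which corresponds to the failure mode seen in two of the 60 simulated records.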
Uncertainties Resulting from Sequence-dependent Variations in the Helical Pitch of dsDNA
One assumption implicit in the alignment procedure is that physical distance along the DNA tether (measured in nm) can be mapped directly to base pairs of sequence by applying a constant scale factor (0.34 nm/bp), which represents the average rise per base pair of DNA. However, crystallographic data show that the rise per base pair is somewhat sequence-dependent, varying with a std. dev. of ~10% of the mean. Insufficient data are available to calculate the rise of an arbitrary sequence (Yanagi et al., 1991). Our alignment algorithm measures the repeat interval for the repetitive portion of the DNA template and equates this distance to the known length, in base pairs, of the repeat in the his or ops template. Nevertheless, the actual sequence distance between any two identified pause sites within a single repeat can differ from the nominal value if the physical length of the sequence separating them differs from the expected value. Assuming that the variation in rise per base pair is normally distributed and that the sequence separating any two locations can be treated as random, the uncertainty in sequence distance between two pause sites is $0.1\,\mathrm{bp}\times\sqrt{N}$, where N is the number of base pairs separating the sites. The worst case occurs when comparing pause sites separated by half the repeat distance, or ~120 bp. In this instance, the uncertainty is $0.1\,\mathrm{bp}\times\sqrt{120}\approx 1\,\mathrm{bp}$. When pause sites are compared with nearby sites, the uncertainty is correspondingly smaller.
The natural variation in base rise will also lead to an uncertainty in the absolute position of the pause sites with respect to the terminator, which serves as the ultimate reference point for sequence alignments. The sequence distance between the terminator and the his pause sequence is exactly 160 bp, assuming that dissociation occurs for RNAP in its pre-translocated state.
The alignment algorithm places the his pause site 161.7 bp upstream of the terminator, indicating that the translocation state of RNAP differs by 1.7 bp between the terminator and the his site. This difference could be explained by backtracking at the his site, by forward translocation at the terminator, or by some combination of the two. However, the base-rise uncertainty over this same distance comes to $0.1\,\mathrm{bp}\times\sqrt{160}\approx 1.3\,\mathrm{bp}$. The 1.7 bp discrepancy in translocation state is therefore comparable to the statistical component of the measurement uncertainty, and experimentally consistent with zero.
Although variations in helical pitch limit the resolution of absolute position measurements, the translocation states of RNAP at the his and ops pause sites can nevertheless be measured relative to one another with far greater precision, because the his and ops sequences were inserted into nearly identical DNA contexts in the two repeat templates constructed. Excluding the transcription bubble, the DNA sequence present in the tether when RNAP reaches the his pause site differs by only 7 bp from the analogous configuration with RNAP at the ops site. Using the a-d pause sites to define the coordinate system, only pitch variation within these 7 bp can contribute to the uncertainty in the relative positions of the his vs. ops pause sites. The uncertainty in the relative translocation state is therefore $0.1\,\mathrm{bp}\times\sqrt{7}\approx 0.26\,\mathrm{bp}$. We estimate the uncertainty in the relative translocation state for any of the a-d pauses with respect to neighboring pauses to be on the order of $0.1\,\mathrm{bp}\times\sqrt{25}\approx 0.5\,\mathrm{bp}$.
The variation in helical pitch of the his and ops inserts due to sequence effects may also be estimated using the alignment algorithm itself. These two repeat segments share the same 207 bp frame, into which either the 20 bp ops or the 32 bp his pause element is inserted.
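The √N scaling used in the estimates above can be checked numerically. This minimal sketch assumes, as the text does, independent and normally distributed per-base-pair rises with a 10% relative spread, so that the uncertainty in the apparent length of an N bp stretch is 0.1 bp × √N. The function name `rise_uncertainty_bp` is hypothetical.

```python
import math


def rise_uncertainty_bp(n_bp, rel_sd=0.10):
    """Std. dev. (bp) of the apparent length of an n_bp stretch of DNA.

    Assumes independent, normally distributed rises per base pair with a
    relative std. dev. rel_sd, so the error grows as rel_sd * sqrt(n_bp).
    """
    return rel_sd * math.sqrt(n_bp)


cases = [
    ("half repeat (worst case)", 120),
    ("terminator to his pause", 160),
    ("his vs. ops tether difference", 7),
    ("neighboring a-d pauses", 25),
]
for label, n in cases:
    print(f"{label}: N = {n} bp -> {rise_uncertainty_bp(n):.2f} bp")
# prints 1.10, 1.26, 0.26 and 0.50 bp for the four cases
```

These values match the rounded figures quoted in the text (≈1, ≈1.3, ≈0.26 and ≈0.5 bp).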
Since the sequence length of the repeat segment is computed experimentally by assuming constant helical pitch, if the his sequence produced an anomalously large rise, the remaining portion of the record would appear compressed by a corresponding amount to make up the difference, and vice versa (Figure S4a). The same argument holds for any anomalous variation in the ops insert. By comparing the spacing of the a through d pause sites, determined separately on the his and ops repeat templates, it is possible to detect any such compression or expansion, and thereby to measure any deviation from normal pitch in the his or ops inserts. The data (Figure S4b) show the difference between the spacing of pauses on the his and ops templates as a function of the spacing observed on the his template. The slope of the graph indicates a fractional expansion of 0.0014, corresponding to an absolute expansion of 0.3 bp in the his template relative to the ops template. This means that the 32 bp his insert has an apparent length deficit of just 0.3 bp with respect to the ops insert. This is less than the uncertainty of 0.7 bp predicted for random sequence from crystallographic data, suggesting that sequence-dependent variation in DNA helical pitch may be somewhat smaller in single-molecule studies, where the DNA is held under tension. Pause locations reported in the paper were corrected for the measured discrepancy.
[Figure S4 plot: (Left) schematic of one 239 bp repeat with the his insert and pause sites a-d; (Right) his-ops deviation (bp) vs. pause spacing (bp), data and fit]
Figure S4. Estimating the measurement uncertainty associated with variations in helical pitch. (Left) Since the length of the repeat segment (in bp) is fixed, any anomalous rise in the his insert (red) will cause features in the remaining sequence (blue) to appear to expand or contract. (Right) The difference in interpause distances among the a-d pauses measured on his vs. ops templates is consistent with a 0.0014 relative expansion of the his insert, or 0.3 bp over the 207 bp sequence length.
Relationship Between Pause Lifetime and Pause Efficiency in the Re-entry Model
The relationship between pause lifetime and efficiency may be derived as follows. We postulate the states shown in Figure 6A and assume reversible entry into the pause state, as well as irreversible elongation of the nascent RNA. In single-molecule assays, we can measure two quantities at a pause site. The first is the pause efficiency, ε, which we define as the percentage of all RNAP molecules pausing at the site, corrected for any missed pause events, as described in the main text. The second is the apparent pause lifetime, τ*, which represents the time elapsed before normal transcriptional elongation resumes. The apparent pause lifetimes at a given site are well approximated by an exponential distribution, from which the time constant τ* may be calculated, defined through $P(t) = \exp(-t/\tau^*)$. If we assume that the pause efficiency arises from a competition between elongation (rate $k_n$) and pause entry (rate $k_p$), then the pause efficiency is given by:
$$\varepsilon = \frac{k_p}{k_p + k_n}$$
Strictly speaking, the lifetime of the pause state is $\tau = 1/k_{-p}$. However, in single-molecule experiments, we do not observe the escape from the pause state, but rather the resumption of elongation. We assume that escape from the pause state represents a direct reversal of entry into this state, so that the enzyme returns to a kinetic competition between further elongation and pause entry. If the rate of returning to the pause state were comparable to the pause escape rate, this would result in a complex (i.e., non-exponential) distribution of waiting times before the resumption of elongation.
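The competition and re-entry loop described here can be explored with a small Monte Carlo sketch. It assumes the three-state picture above with hypothetical rates ($k_n$ = 3 s⁻¹ and $k_p$ = 9 s⁻¹, so ε = 0.75, and $k_{-p}$ = 0.5 s⁻¹) and treats re-entry as instantaneous, the limit discussed below for high-efficiency pauses at saturating NTP. Each stay in the pause state lasts an exponential time with rate $k_{-p}$, and each escape is followed by re-entry with probability ε. The function name `apparent_pause_time` is illustrative and not part of the original analysis.

```python
import math
import random


def apparent_pause_time(k_n, k_p, k_minus_p, rng):
    """Time from pause entry until elongation resumes (s), instantaneous re-entry.

    Each stay in the pause state lasts Exp(k_minus_p). After each escape the
    enzyme re-enters the pause with probability eps = k_p / (k_p + k_n),
    otherwise elongation resumes; time spent outside the pause between an
    escape and a re-entry is neglected.
    """
    eps = k_p / (k_p + k_n)
    t = rng.expovariate(k_minus_p)
    while rng.random() < eps:  # failed escape: immediate return to the pause
        t += rng.expovariate(k_minus_p)
    return t


rng = random.Random(42)
k_n, k_p, k_minus_p = 3.0, 9.0, 0.5  # hypothetical rates (1/s)
eps = k_p / (k_p + k_n)              # pause efficiency, 0.75 here
tau = 1.0 / k_minus_p                # true pause lifetime
samples = [apparent_pause_time(k_n, k_p, k_minus_p, rng) for _ in range(20000)]
mean_tau_star = sum(samples) / len(samples)
predicted = tau / (1.0 - eps)
# For an exponential, the fraction of waits longer than the mean is exp(-1).
frac_long = sum(1 for s in samples if s > predicted) / len(samples)
print(f"eps = {eps:.2f}, tau = {tau:.1f} s, predicted tau* = {predicted:.1f} s")
print(f"simulated tau* = {mean_tau_star:.2f} s, "
      f"P(t > tau*) = {frac_long:.3f} vs exp(-1) = {math.exp(-1):.3f}")
```

In this limit the waiting times remain exponential, with a mean close to τ/(1 − ε), i.e., 4-fold longer than τ for a 75% efficient pause. Adding a finite delay before re-entry would be one way to probe the non-exponential regime mentioned above.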
However, for high-efficiency pauses at saturating NTP concentrations, the rate of nucleotide addition, and therefore the rate of returning to the pause state (the ratio of these two rates being set by the efficiency), will be high compared with the rate of escape from the pause state. In that event, we can approximate reentry into the pause state as instantaneous. Put another way, escape from the pause state followed by immediate reentry can be regarded as a failure to escape from the paused state. The rate of failed escapes is given by $k_{-p}\varepsilon$ and the rate of successful escapes by $k_{-p}(1-\varepsilon)$. (For example, if a pause is 75% efficient, only 25% of pause escapes result in resumption of elongation, and the lifetime of the nonproductive state will be 4-fold longer.) The time constant for resumption of elongation may therefore be expressed as:
$$\tau^* = \frac{1}{k_{-p}(1-\varepsilon)} = \frac{\tau}{1-\varepsilon} = \frac{k_n + k_p}{k_{-p}\,k_n}$$
However, one cannot take for granted that this relationship will continue to hold as nucleotide concentrations are reduced, or as other factors influencing elongation and pausing are varied. For example, consider more generally the modified Brownian ratchet mechanism described by Abbondanzieri et al. (2005), in which the reaction cycle consists of multiple steps. Simple (exponential) behavior would be recovered if the overall rate of elongation were limited by a single rate-limiting transition, such that pausing occurred mainly prior to the corresponding state. The diagram in Figure S5, where the grayed-out portion of the pathway can be neglected at high [NTP], represents one such possibility.
Figure S5. A scheme for reconciling kinetic competition at a pause site with the Brownian ratchet mechanism of Abbondanzieri et al. (2005). The model shown incorporates a secondary NTP binding site but assumes a fixed order of translocation and NTP shift.
The grayed-out state (lower left) may be neglected at saturating NTP concentrations because the reaction will be driven strongly toward the NTP-bound state. The box (cyan) indicates the kinetic competition between pausing and resumption of elongation. Here, the translocation step is rate-limiting and pausing branches from the pre-translocated state. This configuration reduces directly to the simple three-state model of Figure 6.
Similar kinetics would also be obtained if the NTP shift from the secondary site or the condensation step were rate-limiting, and all other transitions prior to phosphate release were rapid, so that the series of states following the translocation step is effectively in equilibrium. Under these circumstances, the pause re-entry model would hold, although the rate in kinetic competition with pausing would be the rate associated with the sub-step (i.e., NTP shift or condensation), rather than the overall elongation rate. At reduced NTP concentrations (or in the presence of certain transcriptional cofactors), the probability of being in the NTP-bound state may become significantly lower, and the grayed-out portion of the scheme may come into play, leading to more complex pause kinetics.
Gel Mapping of Transcriptional Pause Sites
Plasmid construction
To create a plasmid with a repeat motif situated close to the promoter, which leads to more synchronous transcription and improved mapping of pause sequences, we engineered two derivatives of pRL418 (Chan and Landick, 1989). The T7 A1 promoter, followed by a T-less cassette from pRL418, was moved into pKH2(ops) or pBW1(his) on a DraIII-BamHI fragment to generate plasmids pKH3(ops) and pBW2(his), respectively.
Bulk Transcription Assays
Templates were prepared by digesting pKH3 and pBW2 with MaeIII, yielding a ~3300 bp segment containing the T7 A1 promoter and repeat sequence; this segment was purified from low-melting agarose followed by phenol extraction and precipitation, or by using the Qiagen PCR purification kit. Halted transcription complexes were formed by mixing 10 nM template, 25 nM RNAP, 150 μM ApU dinucleotide, and 10 μM ATP, GTP, and α-32P CTP in transcription buffer (20 mM Tris-HCl, pH 7.9; 20 mM NaCl; 3 mM MgCl2; 14 mM 2-mercaptoethanol; 0.1 mM EDTA) and incubating for 10 min at 37°C. After equilibration to room temperature (~22°C), transcription was restarted by addition of NTPs. Samples were taken at 0 s, 15 s, 30 s, 1 min, 2 min, 4 min, and 8 min and separated by 8% PAGE alongside RNA and DNA sequencing ladders and 32P-labeled pBR322 MspI digest fragments.
Figure S6. Mapping of the 3′ ends of RNA transcripts at pause sites. Transcription was initiated with 10 µM ATP, GTP, and CTP and incubated for 15 min at 37°C. Halted complexes were equilibrated to room temperature, and elongation was restarted by adjustment to 1 mM each of ATP, CTP, and UTP and 250 µM GTP in the presence of 100 µg/ml rifampicin. Samples were taken at the time points indicated and separated by 8% PAGE. RNA markers were generated by adjusting halted complexes to 40 µM of all 4 NTPs in the presence of 50 µM 3′-deoxy-NTP (G, A, U, or C). Reactions were loaded either directly with buffer or mixed with a sample containing paused RNA to verify mapping. Panels A through F each map one of the six identified pause sites. The labels above lanes indicate either the times when samples were taken after restarting transcription or the identities of the 3′-deoxy-NTP present in the reaction buffer. Open circles label positions of paused RNAs.
The sequence of the pause region is shown beside each gel, with lines drawn from individual bases to the corresponding bands. Each image has been contrast-enhanced to increase clarity, in some cases with different adjustments for different lanes to compensate for differences in gel loading. Beneath each gel panel is the RNA sequence, numbered according to positions in the his pause-containing repeat (except panel F, which is numbered according to the ops repeat sequence). The line above each sequence represents the RNA transcript, with the open circle identifying the 3’ end of the paused RNA(s). (A) Mapping of pause a. GAUC sequencing reactions (lanes 1-4) were electrophoresed next to transcription reactions halted at 15 and 30 s (lanes 5 & 6). The major pause is evident in the 30 s reaction aligned with the top of the compressed bands corresponding to tandem Cs 4 and 5. (B) Mapping of pause b. Transcription reactions halted at 15 s, 30 s, 1 min, and 2 min (lanes 1-4) were electrophoresed adjacent to a G sequencing reaction (lane 5), G and C sequencing reactions mixed with the 30 s transcription reaction (lanes 6 & 7), and GAUC sequencing reactions (lanes 8-11). (C) Mapping of pause c. Pause bands formed after 20 s were electrophoresed with G or C sequencing reactions (lanes 1 & 2), alone (lanes 3 & 4), and adjacent to GAUC sequencing reactions (lanes 5-8). (D) Mapping of pause d. Pause bands formed after 30 s (lane 1) were electrophoresed adjacent to GAAUC sequencing reactions (lanes 3-7; lane 2 is empty) and mixed with G or C sequencing reactions (lanes 8 & 9). The pause band formed during all of the sequencing reactions. (E) Mapping of the his pause. The pause band formed after 30 s (lane 4) was electrophoresed next to AUC sequencing reactions (lanes 1-3). The G sequencing reaction failed in this experiment (not shown). The his pause band is faintly evident in the A sequencing reaction. (F) Mapping of the ops pause. 
Four time points from a pause reaction (lanes 1-4) were electrophoresed next to the GAUC sequencing reactions (lanes 5-8). The pause bands, especially the C153 pause band, are evident in all four sequencing-reaction lanes. (G) Sequence of the nascent RNA corresponding to one his template repeat (the underlined sequence corresponds to the inserted pause element; the sequence not underlined corresponds to flanking regions derived from rpoB sequences; see Experimental Procedures). The RNA 3′ ends at the pauses are indicated by open circles. Portions of the sequence that differ in the ops repeat template are shown beneath the his sequence, with dashed lines indicating the region that is replaced.
Molecules Exist in Stable States with Different Characteristic Pause Efficiencies
Figure S7. RNAP heterogeneity. (A) Apparent pause efficiency vs. the repeat number of the pause site (numbered from 1 to 8) for the sites indicated. (B) Correlation between the probability of pausing at a given site and that for a subsequent visit to the same site, plotted vs. the distance between the pair of sites, measured in units of the motif repeat number (see Experimental Procedures). Probable errors were estimated from simulations.
Pause Densities and Lifetimes Vary in a Sequence-dependent Manner
Figure S8. Distribution of pause densities outside high-density regions. Histogram of pause densities (red) derived from flanking regions outside the labeled pauses: these template regions are indicated as black bars in Figure 4 of the main text, and constitute 79% of the overall sequence of the repeat region. For comparison, we have plotted a binomial distribution (black) with a mean identical to that of the observed data, with associated statistical errors. These two distributions are statistically different (see main text).
Pausing is a Non-obligate State in the Elongation Pathway
Figure S9. Enzyme velocities between pauses.
Histograms of the velocity in a single-molecule record display two distinct peaks: one centered at a characteristic run velocity, and one centered at 0 bp/s, indicative of the paused state (see Neuman et al., 2004). Each enzyme exhibited its own characteristic run velocity and was never observed to switch to another velocity state. The figure shows the distribution of all individual enzyme run velocities obtained from single-record histograms, as just described (N = 114), with statistical errors indicated. The distribution was fit by a Gaussian with 11.9 ± 6.2 bp/s (mean ± std. dev.). This heterogeneity is consistent with previous single-molecule studies (Neuman et al., 2003).
The Intrinsic Lifetime of the Pause State is Conserved
Figure S10. Pooled dwell-time distribution for all low-efficiency pauses. Histogram of dwell times for all pauses derived from flanking regions outside the labeled pauses: these template regions are indicated as black bars in Figure 4 of the main text, and constitute 79% of the overall sequence of the repeat region. The distribution was fit by a double exponential with time constants τ1 and τ2, accounting for 80% and 20% of the pauses, respectively (see legend). The distribution is dominated by the short, 1.2 s time constant, in contrast to the histogram compiled from all pauses on the entire template (see, for example, Neuman et al., 2003). The preponderance of short pauses is consistent with a prediction of the model (Figure 6A, main text) that low-efficiency pause sites would be comparatively less prone to reentry into the pause state, and would therefore display apparent lifetimes that closely approximate the true lifetime. The residual number of long pauses may arise from a small fraction of pauses at labeled sites (colored bars in Figure 4, main text) being misidentified because of incorrect sequence alignment.
Supplemental References
Abbondanzieri, E. A., Greenleaf, W. J., Shaevitz, J. W., Landick, R., and Block, S. M. (2005). Direct observation of base-pair stepping by RNA polymerase. Nature 438, 460-465.
Chan, C. L., and Landick, R. (1989). The Salmonella typhimurium his operon leader region contains an RNA hairpin-dependent transcription pause site. Mechanistic implications of the effect on pausing of altered RNA hairpins. J Biol Chem 264, 20796-20804.
Neuman, K. C., Abbondanzieri, E. A., Landick, R., Gelles, J., and Block, S. M. (2003). Ubiquitous transcriptional pausing is independent of RNA polymerase backtracking. Cell 115, 437-447.
Yanagi, K., Prive, G. G., and Dickerson, R. E. (1991). Analysis of local helix geometry in three B-DNA decamers and eight dodecamers. J Mol Biol 217, 201-214.