- PINE Server

ARECA@NMRFAM
sftp://chianina/raid/data/mani/website/ARECA/m...
ARECA Server
Validation of Protein NMR Chemical Shift
Assignments against NOE Data
Contents:
Section 0. Applications of ARECA
Section 1. Inputs
  1. Input files
  2. How to prepare the input files
    1. Option A: Peak lists
      1. Chemical shift assignments
      2. NOESY peak lists
      3. Optimizing ARECA's calculation
    2. Option B: NOESY spectra
      1. Packed input file
      2. Optimizing ARECA's calculation
Section 2. Outputs
  2.1. Short report via email
  2.2. Simple report
  2.3. Comprehensive report
  2.4. NOESY peak lists
Section 3. FAQ
  1. How to interpret ARECA's probabilities
    1. Truth model
    2. Overall Probability of a proton
    3. Overall Probability of a heavy atom
  2. What are these erroneous atoms?
  3. What does it mean when the percentage of erroneous atoms is higher than 5%?
  4. How to find the atoms with low probabilities?
  5. How to find the reasoning behind a probability?
  6. Who to contact for further questions and comments?
Section 0. Applications of ARECA
04/27/2015 04:34 PM
Section 1. Inputs
1.1. Input files
To validate the chemical shift assignments, ARECA uses:
1. Chemical shift (CS) assignments.
2. At least one of the common NOESY experiments:
1. 15N-NOESY
2. 13C-NOESY
3. 13C-NOESY (Aromatic)
4. 13C-NOESY (D2O)
1.2. How to prepare the input files
There are two options for preparing the input files:
1.2.1. Peak lists
To use NOESY peak lists, you need to prepare the chemical shift assignments and NOESY peak list in one of the
following ways.
1.2.1.1. Chemical shift assignments:
You can prepare the chemical shift assignments in either the BMRB NMR-STAR or the XEASY format.
1. BMRB NMR-STAR formats: extensive descriptions of NMR-STAR 2.1 and 3.1 are available from the BMRB.
1. NMR-STAR 2.1:
The assignment file should start with a header that describes the format in which the assignments are stored.
ARECA explicitly looks for the following tags:
_Residue_seq_code
_Residue_label
_Atom_name
_Chem_shift_value
2. NMR-STAR 3.1:
The assignment file should start with a header that describes the format in which the assignments are stored.
ARECA explicitly looks for the following tags:
_Atom_chem_shift.Seq_ID
_Atom_chem_shift.Comp_ID
_Atom_chem_shift.Atom_ID
_Atom_chem_shift.Val
2. XEASY format: an XEASY .prot file requires an additional file that gives the sequence of the protein. ARECA needs
the following two files:
1. Assignment .prot file:
Col1: assignment index
Col2: chemical shift
Col3: error estimate
Col4: atom name
Col5: residue index
example:
1 62.755 0.000 CA 2
2 34.367 0.000 CB 3
3 7.689 0.000 H 3
4 107.846 0.000 N 3
2. Sequence (3-letter-code with indices):
Col1: residue three letters
Col2: residue index
example:
GLY 1
SER 2
LYS 3
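The two XEASY files described above can be read and cross-checked with a few lines of Python before submission. This is only a sketch of the column layout listed in this section; the function names, the `ProtEntry` type, the label format and the decision to skip blank lines and lines starting with `#` are assumptions made for illustration, not part of ARECA.

```python
from typing import NamedTuple


class ProtEntry(NamedTuple):
    index: int          # Col1: assignment index
    shift: float        # Col2: chemical shift (ppm)
    error: float        # Col3: error estimate
    atom: str           # Col4: atom name
    residue_index: int  # Col5: residue index


def parse_prot(text):
    """Parse the five whitespace-separated columns of a .prot file."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comment lines carry no assignments.
        if not line or line.startswith("#"):
            continue
        f = line.split()
        if len(f) < 5:
            continue
        entries.append(ProtEntry(int(f[0]), float(f[1]), float(f[2]), f[3], int(f[4])))
    return entries


def parse_sequence(text):
    """Parse 'RES index' lines into {residue index: three-letter code}."""
    sequence = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) >= 2:
            sequence[int(f[1])] = f[0].upper()
    return sequence


def label_assignments(entries, sequence):
    """Join both files into {'LYS3:CB': 34.367, ...} (hypothetical label format)."""
    return {
        f"{sequence[e.residue_index]}{e.residue_index}:{e.atom}": e.shift
        for e in entries
        if e.residue_index in sequence
    }


prot_text = """1 62.755 0.000 CA 2
2 34.367 0.000 CB 3
3 7.689 0.000 H 3
4 107.846 0.000 N 3"""
seq_text = """GLY 1
SER 2
LYS 3"""

entries = parse_prot(prot_text)
sequence = parse_sequence(seq_text)
labeled = label_assignments(entries, sequence)
```

With the example lines from this section, `labeled` maps `SER2:CA` to 62.755 and `LYS3:N` to 107.846. An entry whose residue index is absent from the sequence file is silently dropped here; a stricter check could raise an error instead.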
1.2.1.2. NOESY peak lists:
Peak lists can be in SPARKY or XEASY format. A SPARKY file starts with a header, usually 'Assignment w1 w2 w3 Data
Height'. An XEASY file contains the peak information as follows:
# Number of dimensions 3
#INAME 1 13C
#INAME 2 H1
#INAME 3 1H
4525 119.578 4.684 7.067 2 U 8.400e+02 0.00e+00 m 0 0 0 0 0
4526 119.578 3.206 7.067 2 U 1.201e+03 0.00e+00 m 0 0 0 0 0
4527 119.578 3.092 7.067 2 U 1.399e+03 0.00e+00 m 0 0 0 0 0
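As a quick sanity check before uploading, an XEASY peak list like the one above can be read with a short script. The sketch below assumes, based only on the example lines, that each peak line holds the peak number, one ppm value per dimension, two flag columns and then the peak volume; the function name and the returned dictionary keys are hypothetical.

```python
import re


def parse_xeasy_peaks(text):
    """Return (number of dimensions, {dimension: label}, list of peaks)."""
    ndim = None
    dim_names = {}
    peaks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            m = re.match(r"#\s*Number of dimensions\s+(\d+)", line)
            if m:
                ndim = int(m.group(1))
                continue
            m = re.match(r"#\s*INAME\s+(\d+)\s+(\S+)", line)
            if m:
                dim_names[int(m.group(1))] = m.group(2)
            continue
        if ndim is None:
            raise ValueError("missing '# Number of dimensions' header")
        f = line.split()
        # Assumed layout: number, ndim positions, 2 flag columns, volume, ...
        if len(f) < ndim + 4:
            continue
        peaks.append({
            "number": int(f[0]),
            "ppm": tuple(float(x) for x in f[1:1 + ndim]),
            "volume": float(f[ndim + 3]),
        })
    return ndim, dim_names, peaks


example = """# Number of dimensions 3
#INAME 1 13C
#INAME 2 H1
#INAME 3 1H
4525 119.578 4.684 7.067 2 U 8.400e+02 0.00e+00 m 0 0 0 0 0
4526 119.578 3.206 7.067 2 U 1.201e+03 0.00e+00 m 0 0 0 0 0
4527 119.578 3.092 7.067 2 U 1.399e+03 0.00e+00 m 0 0 0 0 0"""

ndim, dim_names, peaks = parse_xeasy_peaks(example)
```

For the three example peaks this yields `ndim == 3` and three entries whose `ppm` tuples follow the order of the `#INAME` lines.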
1.2.1.3. Optimizing ARECA's calculation
As indicated in Section 0 (Applications of ARECA), the calculation of probabilities in ARECA can be narrowed down to
specific NOE contacts. In this section you can optimize the calculation according to your NOESY data.
1.2.2. NOESY spectra
Packed input file
Use the PONDEROSA Client program for preparing a 'packed-input.txt' file.
This client performs an advanced 'restricted' peak picking and generates one compact file (packed-input.txt). This file
contains the CS assignments and the NOESY peak lists and is ready to be submitted to ARECA's website.
Section 2. Outputs
For the complete list of output files, see ARECA's publication. Here we describe some of them:
2.1. Short report via email
As soon as ARECA finishes the validation process, an email will be sent to you with the following information:
1. A short report, which shows:
1. Number of missing assignments (residues): the number of residues without any chemical shift assignments.
2. Number of missing assignments (atoms): the number of atoms with missing chemical shift assignments.
3. Number of missing NOESY strips: for every heavy atom and its directly attached proton, ARECA looks
for peaks in the given NOESY peak lists within a tolerance (heavy atom 0.4 ppm, proton 0.03 ppm, entire
HNOE dimension). When ARECA cannot find any peak, the heavy atom is marked as
'missing NOESY strip'. The 'Number of missing NOESY strips' is the number of such heavy atoms.
4. Number of erroneous atoms: ARECA calculates a probability of correctness for every atom. Atoms
with a probability lower than 50% are considered erroneous. See Section 3.1 for more information about
these probabilities.
5. Percentage of erroneous atoms: the number of erroneous atoms (item 4) divided by the total number of
assigned atoms, expressed as a percentage.
Note: when this percentage is more than 5%, the assignments are not consistent with the NOESY
spectra and the spectra should be investigated manually. When it is less than 5%, there
are some incorrect assignments that should be reconsidered. See Section 3.4 for how to find them.
2. A URL to a compressed file that contains the complete report.
3. A URL to a figure that shows the 'Overall backbone heavy atoms probabilities' as a bar plot. In this plot
the residues are distributed along the x-axis and the y-axis shows the probabilities.
4. A URL to a figure that shows the 'Overall protons probabilities' as a bar plot. In this plot the residues are
distributed along the x-axis and the y-axis shows the probabilities.
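The counts in the short report follow directly from the per-atom probabilities. The sketch below illustrates the arithmetic behind the number and percentage of erroneous atoms; the input dictionary, the atom labels and their values are hypothetical, and `None` is used here (as an assumption) to mark an atom without an assignment.

```python
def summarize(probabilities, threshold=0.5, warning_percent=5.0):
    """Summarize overall probabilities the way the short report describes.

    probabilities: {atom label: overall probability, or None if unassigned}
    """
    assigned = {a: p for a, p in probabilities.items() if p is not None}
    missing = sorted(a for a, p in probabilities.items() if p is None)
    # Atoms with a probability below 50% are counted as erroneous.
    erroneous = sorted(a for a, p in assigned.items() if p < threshold)
    percent = 100.0 * len(erroneous) / len(assigned) if assigned else 0.0
    return {
        "missing_atoms": len(missing),
        "erroneous_atoms": erroneous,
        "percent_erroneous": percent,
        # Above 5% the manual suggests inspecting the spectra themselves.
        "check_spectra": percent > warning_percent,
    }


# Hypothetical values, for illustration only.
example = {"atom_a": 0.92, "atom_b": 0.65, "atom_c": 0.30, "atom_d": None}
report = summarize(example)
```

In this toy input one of the three assigned atoms falls below 0.5, so the percentage is about 33% and `check_spectra` is true.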
2.2. Simple report
This report is in XML format. For every residue it shows the atoms with a low probability (less than 0.5) or missing
assignments.
2.3. Comprehensive report
This report, in PDF format, contains all the information needed to recalculate the probabilities and to investigate
the reasoning behind them.
2.4. NOESY peak lists
ARECA maps the given assignments onto the NOESY peak lists, which are reported along with the calculated
probability for every assignment. These probabilities are listed under the 'Note' column of the Sparky peak list table
(two-letter code 'lt'). When ARECA's extension in NMRFAM-SPARKY is used to load the peak lists, it
automatically colors the peaks (acceptable assignments: green and blue; suspicious assignments: red and yellow).
Section 3. FAQ
3.1. How to interpret ARECA's probabilities
In this section we explain the meaning of ARECA's probabilities, and then different representations of these
probabilities are discussed.
3.1.1. Truth model
The truth model gives the probability of observing a NOESY contact (a cross-peak) between every two protons of
an amino acid (intra-residue contacts), and between two protons of two sequentially adjacent
amino acids (inter-residue contacts). These probabilities are explained in ARECA's manuscript and can be
found on the "NOESY Contacts Probabilities (NCP)" server.
3.1.2. Overall Probability of a proton
For a proton, the truth model provides a list of heavy atoms whose directly attached protons are expected to have
a NOESY contact with the proton. These expected NOESY contacts have a probability higher than 95% in the truth
model. For each of these expected NOESY contacts, ARECA looks for a peak in the given NOESY peak lists and
assigns a probability to the contact. This probability is 1 if there is a peak with a ppm difference of less than 0.03 ppm and
0 for differences higher than 0.04 ppm. The probabilities for ppm differences between these cutoffs are
calculated with a linear function, shown in Fig 3.1.f1.
Fig 3.1.f1. Probability function for observing an expected NOESY contact in the given NOESY peak list.
The x-axis shows the minimum ppm difference between the expected position of a NOESY peak and the
peaks in the NOESY peak lists.
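The probability function can be written down in a few lines. The sketch below assumes a straight line from 1 at 0.03 ppm to 0 at 0.04 ppm, which is consistent with the value 0.4 listed for a difference of 0.036 ppm in the Arg70 table later in this section; the function name and the handling of the exact cutoff values are assumptions.

```python
def contact_probability(min_ppm_diff, full=0.03, zero=0.04):
    """Probability that an expected NOESY contact is present in the peak list.

    min_ppm_diff: smallest ppm difference between the expected peak position
    and any peak in the relevant strip.
    """
    if min_ppm_diff <= full:
        return 1.0
    if min_ppm_diff >= zero:
        return 0.0
    # Linear ramp between the two cutoffs.
    return (zero - min_ppm_diff) / (zero - full)


p_close = contact_probability(0.010)   # well inside the 0.03 ppm cutoff
p_ramp = contact_probability(0.036)    # between the cutoffs
p_far = contact_probability(0.442)     # far outside the 0.04 ppm cutoff
```

Here `p_close` is 1, `p_far` is 0, and `p_ramp` evaluates to approximately 0.4.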
Therefore, to calculate the overall probability of a proton, ARECA performs the following steps.
1. For a proton (H), extract all the intra- and inter-residue protons that are expected to form a NOESY contact with
it. Call the set of these protons h.
2. For every proton in h:
1. Find its directly attached heavy atom.
2. Find the set of peaks (from the given peak list) that are within a tolerance of 0.4 ppm of the heavy atom and
0.03 ppm of the proton.
3. Calculate the minimum ppm difference between H and the peaks in this set.
4. Assign a probability according to Fig 3.1.f1.
3. Take the average of the probabilities to calculate the overall intra- or inter-residue probabilities.
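The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not ARECA's implementation: peaks are represented as hypothetical tuples of (heavy atom ppm, attached proton ppm, NOE-dimension proton ppm), an empty strip is scored as 0, and all shifts in the example are invented.

```python
from statistics import mean

HEAVY_TOL = 0.4    # ppm tolerance for the heavy atom
PROTON_TOL = 0.03  # ppm tolerance for its directly attached proton


def contact_probability(diff, full=0.03, zero=0.04):
    """Assumed linear function: 1 below 0.03 ppm, 0 above 0.04 ppm."""
    if diff <= full:
        return 1.0
    if diff >= zero:
        return 0.0
    return (zero - diff) / (zero - full)


def min_diff_in_strip(peaks, heavy_shift, attached_shift, target_shift):
    """Steps 2.2 and 2.3: select the strip, then find the closest peak to H."""
    strip = [
        noe for heavy, attached, noe in peaks
        if abs(heavy - heavy_shift) <= HEAVY_TOL
        and abs(attached - attached_shift) <= PROTON_TOL
    ]
    if not strip:
        return None
    return min(abs(noe - target_shift) for noe in strip)


def residue_probability(peaks, target_shift, partners):
    """Steps 2 and 3 for one residue.

    partners: (heavy atom shift, attached proton shift) for each proton in h.
    """
    probs = []
    for heavy_shift, attached_shift in partners:
        diff = min_diff_in_strip(peaks, heavy_shift, attached_shift, target_shift)
        # Assumption: a strip with no peaks counts as an unconfirmed contact.
        probs.append(0.0 if diff is None else contact_probability(diff))
    # With no expected contacts, the probability is 1.
    return mean(probs) if probs else 1.0


# Invented data: target proton H at 2.34 ppm, two expected partner protons.
peaks = [(58.1, 3.93, 2.345), (30.2, 2.01, 2.40), (30.2, 2.01, 4.50)]
partners = [(58.0, 3.92), (30.0, 2.00)]
p_intra = residue_probability(peaks, 2.34, partners)
p_none = residue_probability(peaks, 2.34, [])
```

In this toy case the first strip contains a peak 0.005 ppm from H (probability 1) and the second strip's closest peak is 0.06 ppm away (probability 0), so `p_intra` is 0.5.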
As an example, we calculate the overall probability of E10HG2 of the protein HR8254A (CASD-NMR
[https://www.wenmr.eu/wenmr/casd-nmr-data-sets]). For this example the raw peak list is used. ARECA reported an
overall probability of 0.918 for this proton; here we go through the process step by step.
To calculate the intra- and inter-residue probabilities we consider the triplet Glu9-Glu10-Gln11.
1. Overall inter-residue probability of E10HG2 being observed with heavy atoms of Glu9
According to the truth model there is no expected NOESY contact between the protons of Glu9 and E10HG2.
The probabilities of the truth model can be found at [http://pine.nmrfam.wisc.edu/NCP]. Therefore, ARECA
expects no NOESY contacts, and the overall inter-residue probability between E10HG2 and the protons of Glu9 is 1.
2. Overall intra-residue probability of E10HG2 being observed with intra-residue heavy atoms
The probabilities of expecting NOESY contacts between intra-residue protons of Glu are shown in Table 3.1.T1.
Intra-residue   CS Assignments   Min. ppm differences   Truth model probability   ARECA's probability
H               7.83             0.010                  0.98                      1
HA              3.92             0.017                  0.98                      1
HB2             2.00             0.019                  1.00                      1
HB3             2.32             0.442                  0.98                      0
HG2             2.34             0.001                  1.00                      1
HG3             1.92             0.003                  0.98                      1
HE              NA               -                      0.03                      -
Therefore, 5 of the 6 expected NOESY contacts could be confirmed with the given NOESY peak lists. Thus, the
overall intra-residue probability for E10HG2 is 0.83.
3. Overall inter-residue probability of E10HG2 being observed with heavy atoms of Gln11
Table 3.1.T2 shows the expected probabilities and the minimum ppm differences between the chemical shift of
E10HG2 and the peaks in the strip plots of the heavy atoms of Gln11.
Protons of Gln(i+1)   CS Assignments   Min. ppm differences   Truth model probability   ARECA's probability
H                     8.608            0.016                  0.99                      1
HA                    4.117            -                      0.42                      -
HB2                   2.227            -                      0.23                      -
HB3                   2.370            -                      0.14                      -
HE21                  7.990            -                      0.14                      -
HE22                  6.243            -                      0.09                      -
HG2                   2.406            -                      0.22                      -
HG3                   2.303            -                      0.19                      -
According to the truth model, ARECA expects to observe a NOESY cross-peak between the amide of Gln11 and
E10HG2. Since the minimum ppm difference between E10HG2 and the peaks in the strip plot of the amide of Gln11 is
less than 0.03, ARECA assigns 1.00 as the overall inter-residue probability of E10HG2 being observed by the heavy
atoms of Gln11.
The overall probability of E10HG2 is a weighted sum of these overall intra- and inter-residue probabilities. In this
sum the weight of the intra-residue probability is 50% and the weight of each inter-residue probability is 25%.
Therefore, the overall probability of E10HG2 is
0.25*1 + 0.5*0.83 + 0.25*1 = 0.915.
If this overall probability is less than 0.50, ARECA flags the assignment of this atom as suspicious.
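The weighted sum and the 0.50 cutoff can be checked with a short sketch. The function names are hypothetical, and how ARECA treats terminal residues (which have only one neighbor) is not described here, so this sketch requires all three values.

```python
def overall_probability(prev_inter, intra, next_inter, w_intra=0.5, w_inter=0.25):
    """Weighted sum: 25% previous residue, 50% intra-residue, 25% next residue."""
    return w_inter * prev_inter + w_intra * intra + w_inter * next_inter


def is_suspicious(probability, threshold=0.5):
    """An assignment is flagged when its overall probability is below 0.50."""
    return probability < threshold


# E10HG2: Glu9 (no expected contacts) = 1, intra-residue = 0.83, Gln11 = 1.
e10hg2 = overall_probability(1.0, 0.83, 1.0)
flagged = is_suspicious(e10hg2)
```

With the rounded intra-residue value 0.83 this gives 0.915, so the assignment is not flagged.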
This overall probability is reported in the comprehensive report (pdf file), chapter 3, in the table "Chemical Shift
Distributions and Assignment Probabilities". The overall intra- and inter-residue probabilities are reported in the pdf
file in the table "Atom overall intra/inter-residue assignment probabilities".
The ppm differences can be found in the strip plots in chapter 3 of the comprehensive report.
3.1.3. Overall Probability of a heavy atom
To explain these probabilities, let's assume we want to calculate the overall intra-residue probability of the amide
nitrogen of an amino acid. The following steps show the process.
1. For a heavy atom (i.e. N), find its directly attached proton (i.e. H).
2. From the given chemical shift assignments, find the chemical shift of the directly attached proton.
3. From the truth model, find all the intra-residue protons that are expected to form a NOESY contact with the
directly attached proton of the heavy atom. Call the set of these protons h*.
4. From the given NOESY peak lists, extract the peaks that are within a tolerance of 0.4 ppm and 0.03 ppm of the
chemical shifts of the heavy atom and its directly attached proton, respectively. Call the set of these peaks P.
5. For every proton p in h*, calculate the minimum ppm difference between p and the peaks in P.
6. Calculate the probabilities of these differences using the probability function (Fig 3.1.f1).
7. Assign the average of these probabilities as the overall intra-residue probability of the heavy atom.
The same steps are used for other heavy atoms and for inter-residue probabilities.
For example, let us calculate the overall probability of the amide nitrogen of Arg70 from the triplet Ser69-Arg70-Ala71
of the protein HR8254A (CASD-NMR [https://www.wenmr.eu/wenmr/casd-nmr-data-sets]). In this example the raw
peak lists of the protein are used.
1. Amide nitrogen of Arg70 observing protons of Ser69 (overall inter-residue probability)
The probabilities of expecting a NOESY contact between the protons of Ser69 and the amide proton of Arg70 are
shown in Table 3.2.T1 ("Truth model probability"). The proton S69HG is not assigned; therefore ARECA
does not consider it in the calculation of the probabilities. However, this atom will be reported as "a
missing assignment".
The assigned chemical shifts of the other three protons are shown in the table. The minimum ppm differences
(from step 5) are calculated and reported in the table as well.
The probability function (Fig 3.1.f1) was used to assign a probability to each of the differences (ARECA's
probability). The overall probability of the amide nitrogen observing protons of Ser69 is calculated by averaging
these probabilities (0.66).
Protons of Ser(i-1)   CS assignments   Min ppm differences   Truth model probability   ARECA's probability
H                     8.10             0.072                 1.000                     0
HA                    4.34             0.015                 1.000                     1
HB                    3.86             0.011                 1.000                     1
HG                    NA               -                     0.969                     -
2. Amide nitrogen of Arg70 observing protons of Arg70 (overall intra-residue probability)
Table 3.2.T2 reports the values needed to calculate the overall intra-residue probability of the amide nitrogen of
Arg70. Since the expectation probability of HD is less than 95%, ARECA does not consider this proton in its
probability calculation. Of the other assigned and expected atoms, two have a probability of 0 (ppm
difference greater than 0.04). The average of the assigned probabilities is (1 + 0 + 0.4 + 0 + 1)/5 = 0.48.
Intra-residue   CS Assignments   Min. ppm differences   Truth model probability   ARECA's probability
H               8.01             0.014                  1.000                     1
HA              4.27             0.050                  0.995                     0
HB2             1.84             0.036                  0.977                     0.4
HB3             1.72             0.090                  0.993                     0
HD              3.14             0.258                  0.849                     -
HE              NA               -                      0.591                     -
HG              1.60             0.007                  0.977                     1
HH#             NA               -                      0.240                     -
3. Amide nitrogen of Arg70 observing protons of Ala71 (overall inter-residue probability)
The same steps are followed to calculate the overall probability of the amide nitrogen observing the protons of
Ala71.
Table 3.2.T3 shows the information needed to calculate this probability. As indicated in the table, the only
Ala71 proton that is expected (truth model probability > 95%) to form a NOESY contact with the amide proton
of Arg70 is A71H. Since the minimum ppm difference for A71H is less than 0.03 ppm, ARECA's
probability is 1.
Protons of Ala(i+1)   CS Assignments   Min ppm differences   Truth model probability   ARECA's probability
H                     8.04             0.013                 1.000                     1
HA                    4.23             0.072                 0.888                     -
HB                    1.33             0.054                 0.472                     -
Next, to calculate the overall probability of R70N, we take a weighted sum of the calculated overall intra- and
inter-residue probabilities. For this sum, the weight for the intra-residue probability is 50% and the weight for each
inter-residue probability is 25%. Therefore, the overall probability of R70N is equal to
0.25*0.66 + 0.50*0.48 + 0.25*1 = 0.65
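The R70N example can be reproduced from the three tables alone. The sketch below takes the minimum ppm differences and truth model probabilities listed above, keeps only protons that are assigned and expected (truth model probability above 95%), scores each one with an assumed linear function between 0.03 and 0.04 ppm, and combines the averages. The helper names and the row layout are hypothetical.

```python
from statistics import mean


def contact_probability(diff, full=0.03, zero=0.04):
    """Assumed linear function: 1 below 0.03 ppm, 0 above 0.04 ppm."""
    if diff <= full:
        return 1.0
    if diff >= zero:
        return 0.0
    return (zero - diff) / (zero - full)


def residue_probability(rows, expected_cutoff=0.95):
    """Average over protons that are expected (> 95%) and assigned.

    rows: (atom, min ppm difference or None if unassigned, truth model probability)
    """
    probs = [
        contact_probability(diff)
        for _atom, diff, truth in rows
        if truth > expected_cutoff and diff is not None
    ]
    return mean(probs) if probs else 1.0


# Values copied from Tables 3.2.T1, 3.2.T2 and 3.2.T3.
ser69 = [("H", 0.072, 1.000), ("HA", 0.015, 1.000), ("HB", 0.011, 1.000),
         ("HG", None, 0.969)]
arg70 = [("H", 0.014, 1.000), ("HA", 0.050, 0.995), ("HB2", 0.036, 0.977),
         ("HB3", 0.090, 0.993), ("HD", 0.258, 0.849), ("HE", None, 0.591),
         ("HG", 0.007, 0.977), ("HH#", None, 0.240)]
ala71 = [("H", 0.013, 1.000), ("HA", 0.072, 0.888), ("HB", 0.054, 0.472)]

p_prev = residue_probability(ser69)
p_intra = residue_probability(arg70)
p_next = residue_probability(ala71)
r70n = 0.25 * p_prev + 0.50 * p_intra + 0.25 * p_next
```

The three averages come out as about 0.667, 0.48 and 1. Without rounding the intermediate values, `r70n` is about 0.657; the hand calculation above uses the rounded value 0.66 and reports 0.65.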
3.2. What are these erroneous atoms?
Flagged atoms are the ones with an overall probability lower than 0.50. When the overall probability is less than 50%,
it means more than half of the expected peaks could not be found in the NOESY spectra.
3.3. What does it mean when the percentage of erroneous atoms is higher than 5%?
Most probably there is something wrong with the NOESY peak lists, such as bad peak picking or low-resolution
spectra, so you should check the quality of your spectra and peak lists. When the percentage is less than 5%, however,
there are probably some incorrect assignments, and you should check the assignments.
3.4. How to find an atom's overall probability?
If the overall probability is less than 50%, ARECA considers the assignment erroneous. The overall
probability is reported in:
1. The comprehensive report (pdf file), Section 3, table "Chemical Shift Distributions and
Assignment Probabilities".
2. The short summary (xml file), if the probability is lower than 50%.
3. The 15N- or 13C-NOESY peak list in the "Assigned NOESY Peaks" folder.
The overall intra- and inter-residue probabilities are reported in the comprehensive report (pdf file) in the table
"Atom overall intra/inter-residue assignment probabilities".
The ppm differences are shown in the NOESY plots in the comprehensive report (pdf file), Section 3, simulated strip
plots.
3.5. How to find the atoms with low probabilities?
There are several ways to find these atoms. An easy way is to open the summary.xml file under Output/txt. If you
want to check them on the spectra, open the provided peak lists using the two-letter code 'ar' in
NMRFAM-SPARKY. This loads the peaks onto the NOESY spectra and colors them; the low-probability assignments
are colored yellow and red. Alternatively, you can load ARECA's peak lists with the 'rp' command (in NMRFAM-SPARKY)
and open the peak list with the 'lt' command. The probabilities of the assignments are shown under the
'Note' column.
3.6. How to find the reasoning behind a probability?
Follow the steps in Section 3.1. The necessary information is provided in the comprehensive report (pdf file) and on
the NCP website.
3.7. Who to contact for further questions and comments?
Feel free to contact us: areca.nmrfam @ gmail.com
NMRFAM©