PDF - WashU Epigenome Browser

Everything can be found at
epigenomegateway.wustl.edu
REFERENCES
1. Zhou X, et al., Nature Methods 8, 989-990 (2011)
2. Zhou X & Wang T, Current Protocols in Bioinformatics Unit 10.10 (2012)
3. Zhou X, et al., Nature Methods 10, 375-376 (2013)
4. Zhou X, et al., Bioinformatics 30, 2206-2207 (2014)
5. Zhou X, et al., Nature Biotechnology 33, 345-346 (2015)
FUNDING
NIH 5U01ES017154, NIH R01ES024992, NIH R01HG007175,
NIH R01HG007354, NIDA DA027995, RSG-14-049-01-DMC,
U01CA200060
LATEST DEVELOPMENT
Google+: epigenomegateway.wustl.edu/+
Facebook: epigenomegateway.wustl.edu/fb
Twitter: @wuepgg
SUPPORT
epigenomegateway.wustl.edu/support/
CONTACT US
Lab: wang.wustl.edu
V: 1/2016
Authors: Xin Zhou, Daofeng Li, Deepak Purushotham, Nicole Rockweiler, Renee Sears,
Joseph Costello, Ting Wang
Cover art: Ting Wang
Copyright © WashU EpiGenome Browser 2010-2016
WASHU EPIGENOME BROWSER
2016
epigenomegateway.wustl.edu
BROWSER MAP
[Figure: annotated browser screenshot. Each numbered marker (1 to 24) gives the page that describes the marked browser feature.]
TABLE OF CONTENTS
BROWSER FEATURES
1. Navigation
2. Tracks
3. Apps
4. Metadata
5. Metadata Heatmap
BROWSER TRACKS
6. Genes
7. Repetitive Elements
8. Numerical Tracks
9. Matplots
10. MethylC Track
11. Genome Comparison
12. Long-Range Interaction
13. SNPs and LD
DATA MANAGEMENT
14. File Upload
15. Datahub
16. Screenshot
17. Session
APPS & FUNCTIONS
18. Gene & Region Set View
19. Split Panel
20. Juxtaposition
21. Genome Snapshot
22. Find Orthologs
23. Scatter Plot
24. Gene Plot
25. Roadmap EpiGenome Browser
Browser Features: Navigation
1
Click to zoom in.
Click to zoom out.
Click to show options.
Click to scroll.
Chromosome ideogram.
Scale.
Coordinate ruler. Drag on the ruler to zoom in.
Drag on a track to scroll.
Chromosome ideogram of the region.
Enter coordinates to jump to a region.
Enter a gene name to jump to a gene.
Multiple gene models may be shown for a gene. Choose one gene model to jump to its location.
Coordinate string can be one of:
1. chr9:1234-5678 (a region).
2. chr9:123456 (a single base).
3. chr9 (a chromosome, to jump to the middle of that chromosome).
4. 1234-5678 (coordinates without a chromosome name, to jump to this region on the current chromosome).
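The four coordinate forms above can be told apart with a few regular expressions. The following is a minimal sketch, not the browser's own parser; the `Coordinate` type and the `parse_coordinate` function are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple, Optional


class Coordinate(NamedTuple):
    chrom: Optional[str]  # None means "use the current chromosome"
    start: Optional[int]  # None means "whole chromosome"
    end: Optional[int]


def parse_coordinate(text: str) -> Coordinate:
    """Classify a coordinate string into one of the four listed forms."""
    text = text.strip()
    # Form 1: chr9:1234-5678 (a region)
    m = re.fullmatch(r"(\w+):(\d+)-(\d+)", text)
    if m:
        return Coordinate(m.group(1), int(m.group(2)), int(m.group(3)))
    # Form 2: chr9:123456 (a single base)
    m = re.fullmatch(r"(\w+):(\d+)", text)
    if m:
        pos = int(m.group(2))
        return Coordinate(m.group(1), pos, pos)
    # Form 4: 1234-5678 (a region on the current chromosome)
    m = re.fullmatch(r"(\d+)-(\d+)", text)
    if m:
        return Coordinate(None, int(m.group(1)), int(m.group(2)))
    # Form 3: chr9 (a whole chromosome)
    if re.fullmatch(r"\w+", text):
        return Coordinate(text, None, None)
    raise ValueError(f"unrecognized coordinate string: {text!r}")
```

The region-without-chromosome form is tested before the bare chromosome name so that "1234-5678" is never mistaken for a name.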
Enter the reference SNP cluster ID
(rsID) to jump to a specific SNP.
At fine resolution, the
chromosome ideogram is
replaced by the DNA sequence.
Browser Features: Tracks
2
A browser track is a visualization of a dataset along a genome. Examples of browser
tracks include gene model annotation tracks and RNA-seq expression tracks.
Tracks
Click to find browser tracks.
Click the box labeled with total track count
to access all available experimental assay
tracks from the interactive facet table.
Search any track by
keywords. Join multiple
keywords by “AND."
Access annotation tracks such
as genes.
Click a button to submit a custom track.
See also pages 6, 8, 11, 12, and 15.
Click “Reference human epigenomes from
Roadmap Epigenomics Consortium” and
then the “Load” button to load the Roadmap
Epigenomics dataset.
The numbers indicate the tracks
available for each sample+assay
combination (green), and the tracks that
are currently shown in the browser (red).
Click a table cell to show a list of
available tracks for a sample+assay
combination.
Show available public track hubs to
load tracks from projects including
Roadmap Epigenomics and ENCODE.
Browser Features: Apps
3
A browser app is a self-contained program for executing a specific task. Examples of
browser apps include uploading files and taking screenshots.
Apps
Click to find browser apps.
Find apps by name.
Apps will appear as you type.
Show all apps.
Recently used apps (see pages 14, 18, 21, and 23).
Frequently used apps (see pages 16, 17, and 19).
Apps usually appear as transparent panels on top of the
browser and are used in the context of browser visualization.
You never have to leave the browser to use an app.
Drag the app name
banner to move the panel.
Close this app.
Get help on this app.
Browser Features: Metadata
4
Metadata are vocabularies for
annotating tracks with experimental and
sample information. Terms in a
vocabulary are organized in a
hierarchical structure. The same
vocabulary can be used across
datasets to facilitate data integration.
To load metadata vocabularies
available for the human genome,
load the public datahub for the
Roadmap Epigenomics project. The
metadata annotation for a track can
be viewed by right-clicking a track
and selecting “Information."
Internal metadata.
Metadata vocabularies.
To view loaded metadata vocabularies, right-click on the metadata heatmap header (see page 5).
Once a metadata vocabulary has been
loaded, its terms can be searched by
keyword. Results include the term id for
each found term. The term id can be
used to annotate tracks in a datahub.
Wiki
Learn more about how to define metadata vocabularies and annotate
tracks at http://wiki.wubrowse.org/Metadata.
Browser Features: Metadata Heatmap
5
A metadata heatmap with two metadata terms.
Tracks 1, 2, and 3 share the same “sample” attribute (IMR90 cells) and thus share the same color.
Tracks 1, 2, and 3 are each annotated by a different “assay” attribute (H3K4me3, H3K4me1, H3K27me3) and thus are colored differently.
Track 4 is not annotated by “sample” or “assay” attributes, so it is shown in gray.
Track 5 is below the chromosome ideogram and thus is not shown in the metadata heatmap.
To add or remove a track from the metadata heatmap, drag the track name above or
below the chromosome ideogram.
To search for new terms to be added to the
metadata heatmap, right-click the term name and
then open the Metadata term finder app by
clicking “Add metadata terms."
The source metadata vocabulary (see page 4).
Click to show this term in the heatmap.
Browser Tracks: Genes
6
UTR.
Exon.
Intron. Arrows indicate direction of transcription.
Transcription start site.
Gene symbol.
Link to NCBI Nucleotide database.
Gene body coordinates, orientation, and length.
Gene name.
Coordinates of exons and UTRs.
The human RefSeq gene track for HOXA3 is shown above. The tooltip bubble
displays information on the HOXA3 gene.
Multiple gene tracks are usually available
for a genome. To find other gene tracks,
go to “Tracks” > “Annotation tracks” >
“Genes” or search by keyword “gene” in
“Tracks."
When a gene is partially visible in the
browser, click this gene to display the
entire gene model in the tooltip bubble.
The visible section is marked by a yellow
box.
Right-click on the gene track (and any other tracks) for the configuration menu.
Display modes.
Configure rendering style.
View only genes (see page 20).
Show metadata.
Link to Wiki.
Wiki
The gene track is based on the “hammock” track format, which can be
displayed as a custom track.
Learn more at http://wiki.wubrowse.org/Hammock.
Browser Tracks: Repetitive Elements
7
The RepeatMasker and RepeatMasker slim tracks show all repetitive elements in the
genome. Repetitive elements and transposons are predicted by the RepeatMasker
software (http://www.repeatmasker.org/).
To add the RepeatMasker track, go to “Tracks” >
“Annotation tracks” > “RepeatMasker” >
“RepeatMasker." The track is also available as
RepeatMasker slim, a simplified version of the
RepeatMasker track.
Full mode
Elements are shown as boxes; transparency reflects the (1 - divergence%) score of
each element. More transparent elements have greater divergence.
Bar plot mode
Elements are packed tightly into a single row with bars on
top indicating 1-divergence% scores.
The elements are colored by class. To view
the list of classes, right-click the
RepeatMasker track and click “Configure.”
The user can choose which type of score to
show for the repetitive elements using the
configuration menu.
Hover over a specific element to display the
element’s score, class, name, and genomic
position.
The user can also choose to show elements
from a specific class or family in the
Annotation tracks menu.
Wiki
The repetitive element track is based on the “hammock” track format.
Learn more at http://wiki.wubrowse.org/Hammock.
Browser Tracks: Numerical Tracks
8
A numerical track displays a series of quantitative values along the genome as a
highly customizable graph. When the track height is small, the track is shown as a
heatmap; otherwise, it is shown as a bar plot.
Bar plot (track height ≥ 20 pixels)
Heatmap (track height < 20 pixels)
Positive and negative values are
rendered using different colors.
The default y-axis scale is an automatic
scale which can be changed into a fixed
or percentile scale using the configuration
menu. Bars with values beyond a set
threshold are indicated with a different
color on the peaks.
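The height rule and the fixed y-axis scale described above can be sketched as follows. The 20-pixel cutoff comes from this page; the function names are hypothetical, and the clipping rule is an assumption about how a fixed scale behaves rather than the browser's actual implementation.

```python
def display_mode(track_height_px: int) -> str:
    """Pick the rendering from the track height (bar plot at 20 px or more)."""
    return "bar plot" if track_height_px >= 20 else "heatmap"


def clip_to_fixed_scale(values, y_min, y_max):
    """Clip values to a fixed y-axis range and flag the clipped ones.

    Returns (clipped_value, exceeded) pairs. `exceeded` marks bars whose
    value lies beyond the scale and would get the differently colored tip.
    Assumption: values outside [y_min, y_max] are drawn at the nearest bound.
    """
    result = []
    for v in values:
        clipped = min(max(v, y_min), y_max)
        result.append((clipped, clipped != v))
    return result


# Example: a fixed scale of 0 to 10 applied to three values.
bars = clip_to_fixed_scale([1, 12, -3], y_min=0, y_max=10)
```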
Bar plot shape can be smoothed using the configuration menu.
A background can be applied to bar plots to distinguish regions with no data from
those with low data values.
No background
With background
Missing values are labelled as “No data” on
the tooltip for bedGraph format tracks
(not applicable for bigWig format tracks).
Wiki
Learn more about the supported numerical track formats bedGraph
(http://wiki.wubrowse.org/bedgraph) and bigWig (http://wiki.wubrowse.org/bigwig).
Browser Tracks: Matplots
9
A matplot (also called a line plot) displays multiple numerical tracks on the same X
and Y axes to easily compare datasets. Data is plotted as curves instead of bar plots.
Matplots can be created while browsing:
Method 1:
1. Hold shift and click on track names to select multiple numerical tracks. (Track names will be highlighted in yellow.)
2. Right-click on the selected tracks and select “Apply matplot.”
Method 2:
Right-click on a colored box in the metadata
heatmap and select “Apply matplot” to
convert a group of tracks sharing the same
metadata attributes into a matplot.
Colors of member tracks in a matplot can be
individually configured using the configuration
menu.
To cancel a matplot, right-click on the track and
select “Cancel matplot." The matplot will be
replaced by individually displayed member
tracks.
Wiki
Matplots can be defined in datahub.
Learn more at http://wiki.wubrowse.org/matplot.
Browser Tracks: MethylC Track
10
The methylC track¹ is designed to display DNA methylation data from whole-genome
bisulfite sequencing experiments. It distinguishes cytosine methylation levels (as bar
plots) on separate strands and in different sequence contexts and integrates
sequencing read depth (as curves) as a measure of confidence.
The color legend for a methylC track can be viewed
using its configuration menu. All colors are
configurable by clicking on the color boxes.
To filter methylation data by read depth, in the configuration menu, select “Filter by
read depth," enter a threshold, and click “Apply."
No filtering
Filtered by read depth value 5
To combine the forward and reverse strands, in the configuration menu, select
“Combine two strands."
To scale the methylation level bar plots by read depth, in the configuration menu,
select “Scale bar height by read depth." The y-axis value will now represent the
read depth.
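The read-depth filter and the strand-combining option can be illustrated with a small sketch. The site dictionaries, the function names, the inclusive threshold, and the depth-weighted averaging rule are all assumptions made for illustration; the browser's own rules may differ.

```python
def filter_by_read_depth(sites, min_depth):
    """Keep only sites whose read depth is at least `min_depth`.

    Assumption: the threshold is inclusive.
    """
    return [s for s in sites if s["depth"] >= min_depth]


def combine_strands(forward, reverse):
    """Merge one forward-strand and one reverse-strand site.

    Assumption: the combined level is the depth-weighted mean of the two
    strand levels, and the combined depth is the sum of the two depths.
    """
    depth = forward["depth"] + reverse["depth"]
    if depth == 0:
        return {"level": 0.0, "depth": 0}
    level = (forward["level"] * forward["depth"]
             + reverse["level"] * reverse["depth"]) / depth
    return {"level": level, "depth": depth}


# Example: filter three sites at depth 5, then merge a pair of strand sites.
sites = [
    {"level": 0.9, "depth": 3},
    {"level": 0.5, "depth": 5},
    {"level": 0.1, "depth": 12},
]
kept = filter_by_read_depth(sites, min_depth=5)
merged = combine_strands({"level": 1.0, "depth": 6}, {"level": 0.5, "depth": 2})
```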
Wiki
Learn more about MethylC tracks at http://wiki.wubrowse.org/MethylC_track.
¹ Zhou X, et al., Bioinformatics 30, 2206-2207 (2014)
Browser Tracks: Genome Comparison
11
The genome comparison track visualizes pairwise alignments of two genomes allowing
for comparison at fine (base pair) or large (megabase) scale. Alignment is unbiased with
gaps in both the query and target genomes.
To add the genome comparison track,
go to “Tracks” > “Annotation tracks” >
“Genome comparison."
Many pre-built genome comparison
tracks are available.
Annotation tracks for either species can
now be loaded using the tracks panel.
Human HOXC10. Human genome as target.
3 bp gap on the human genome.
Sequence alignment.
Mouse Hoxc10. Mouse genome as query.
2 bp gap on the mouse genome.
At 10 bp/pixel resolution, the browser will transition from individual alignment blocks to
a joined alignment block.
Individual alignment blocks.
Joined alignment block.
Complex genome rearrangements can be visualized by observing synteny blocks.
Wiki
Learn more at http://wiki.wubrowse.org/Genome_alignment.
Browser Tracks: Long-Range Interactions
12
Long-range chromatin interaction experiments can be accessed through public track hubs¹.
Human HOXA gene cluster.
Hi-C data from IMR90 cells shown as a heatmap.
ChIA-PET data from K562 cells shown as arcs.
Hi-C data from IMR90 cells shown as joined-boxes.
Highlights:
1. Supports pairwise chromatin interaction results from Hi-C, 5C, and ChIA-PET.
2. Multiple display modes: heatmaps, arcs, and joined-boxes (full display).
3. Visualizes interactions from distant regions and different chromosomes.
4. The Circlet view visualizes global interactions.
Wiki
Learn more at http://wiki.wubrowse.org/Long-range.
¹ Zhou X, et al., Nature Methods 10, 375-376 (2013)
Browser Tracks: SNPs and LD
13
Human dbSNP release 137.
Human HapMap LD, Han Chinese (CHB).
SNP and LD annotation tracks are available for human genomes. By default, the LD
scoring system is D′. The squared correlation coefficient (r²) or the LOD score can be
displayed instead using the configuration menu.
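For readers unfamiliar with these scores, the sketch below computes D′ and r² from haplotype and allele frequencies using the standard textbook definitions. It is an illustration only; how the values in the HapMap LD track were derived is not described here, and the function name is hypothetical.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float):
    """Return (D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    # Raw disequilibrium: observed minus expected haplotype frequency.
    d = p_ab - p_a * p_b
    # D' normalizes |D| by the largest value it could take.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 normalizes D^2 by the product of the allele variances.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d_prime, r_squared


# Perfectly linked loci versus independent loci.
linked = ld_measures(p_ab=0.5, p_a=0.5, p_b=0.5)
independent = ld_measures(p_ab=0.25, p_a=0.5, p_b=0.5)
```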
These tracks can be found in the “Population variation” group of the annotation track
panel. To search for a SNP, type the reference SNP cluster ID (rsID) into the search
bar and click “Find SNP." SNPs are colored by class.
Wiki
The SNP and LD tracks are based on the “hammock” track format.
Learn more at http://wiki.wubrowse.org/Hammock.
Data Management: File Upload
Use the File Upload app to upload data
from a text file.
Click the “Choose file(s)” button to
select one or more unzipped text files
from the computer for upload.
Selected files appear as
boxes.
To prepare a file for upload:
1. Click the “Setup” button.
2. Inspect the content of the first 10 lines.
3. Select the appropriate file format.
4. Click “add as Track” to load this file as a custom track or click “add as Set” to load this file as a gene set. The gene set option is limited to 100 items per set.
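Because a gene set holds at most 100 items, a longer list has to be split before upload. A minimal sketch of that preparation step follows; `split_gene_set` and the gene names are hypothetical and chosen only for illustration.

```python
def split_gene_set(names, max_items=100):
    """Split a list of names into consecutive chunks of at most `max_items`."""
    return [names[i:i + max_items] for i in range(0, len(names), max_items)]


# Example: 250 placeholder names become three sets of 100, 100, and 50 items.
genes = [f"GENE{i}" for i in range(250)]
chunks = split_gene_set(genes)
```

Each chunk can then be written to its own text file and loaded with “add as Set.”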
14
Data Management: Datahub
15
A datahub is a collection of data from multiple sources.
An example datahub.
[
  # this hub contains only one track
  {
    type:"bedgraph",
    url:"http://vizhub.wustl.edu/hubSample/hg19/GSM432686.gz",
    name:"my track",
    mode:"show",
    colorpositive:"#ff33cc",
    height:50,
  },
]
Highlights:
1.
Batch uploading of many tracks at the same time.
2.
Custom track information is preserved in a datahub.
3.
Tracks in a datahub can come from different servers.
4.
Track rendering style can be customized.
5.
Tracks can be annotated by metadata.
A datahub is written in JSON text. The JSON content of a datahub can be
validated by the browser. Search for the “Validate datahub” app to run validation.
Use the datahub app to upload a datahub to the browser.
A datahub file can be either hosted on the Web or saved locally.
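When a hub contains many tracks, the file can be generated rather than written by hand. The sketch below builds the same single-track hub as the example above and serializes it with Python's standard `json` module. It emits strict JSON (quoted keys, no comment, no trailing commas), on the assumption that the browser accepts that form; the helper name `make_bedgraph_track` is hypothetical.

```python
import json


def make_bedgraph_track(url, name, color="#ff33cc", height=50):
    """Describe one bedGraph track using the fields from the example hub."""
    return {
        "type": "bedgraph",
        "url": url,
        "name": name,
        "mode": "show",
        "colorpositive": color,
        "height": height,
    }


# A hub is a list of track descriptions.
hub = [
    make_bedgraph_track(
        "http://vizhub.wustl.edu/hubSample/hg19/GSM432686.gz", "my track"
    ),
]
hub_text = json.dumps(hub, indent=2)
```

The resulting text can be saved to a file and checked with the “Validate datahub” app before use.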
If the datahub is hosted on the Web, it can be referenced by the browser through
the URL parameter. In this way, you can bookmark the parameterized browser
link for quick reference or sharing.
http://epigenomegateway.wustl.edu/browser/?genome=hg19&datahub=http://vizhub.wustl.edu/hubSample/hg19/hub.json
Dissecting the browser URL parameters:
browser URL, then ?genome= followed by the genome identifier, then &datahub= followed by the datahub URL.
Wiki
Learn more about datahubs (http://wiki.wubrowse.org/Datahub) and
URL parameters (http://wiki.wubrowse.org/URL_parameter).
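A parameterized link of the kind dissected above can be assembled programmatically. This is a small sketch using Python's standard `urllib.parse`; the function name `browser_link` is hypothetical, and leaving ":" and "/" unescaped is a choice made here to match the link format shown on this page.

```python
from urllib.parse import urlencode

BROWSER_URL = "http://epigenomegateway.wustl.edu/browser/"


def browser_link(genome: str, datahub_url: str) -> str:
    """Build a browser link with the genome and datahub URL parameters."""
    # safe=":/" keeps the datahub URL readable, as in the example link.
    query = urlencode({"genome": genome, "datahub": datahub_url}, safe=":/")
    return f"{BROWSER_URL}?{query}"


link = browser_link("hg19", "http://vizhub.wustl.edu/hubSample/hg19/hub.json")
```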
Data Management: Screenshot
Use the Screenshot app in the Apps menu to save
images of the current genomic view.
The “Screenshot” app will convert the browser contents to an SVG file. The SVG
file is a high-quality vector-based graphics file. In addition, a PDF file will also be
created.
To take a screenshot, click the “Take screenshot” button. Links to both the SVG
and PDF files will be displayed.
Click either link and the file will be shown on the browser. From either page you
can save the file to your computer.
16
17
Data Management: Session
Use the Session app in the Apps menu to save the
current browser status including tracks, view range,
and customization for later viewing.
To save a session, click the “Save” button. Enter a
name for this session (optional). A link to the saved
session will appear. Alternatively, the user can
download the session as a JSON datahub file.
Multiple session names
can be saved under one
session ID.
A link is generated for each session name.
A session can be recovered in three ways:
1. Save the generated link and simply use this link to reload the session.
2. Upload a saved JSON datahub file by clicking the “Upload” button in the “Sessions” app.
3. Copy the unique session ID and paste this into the “Retrieve” box in the “Sessions” app.
Sessions and datahubs only record information about tracks; they do not save actual track data.
If the track file has been moved, the browser won’t be able to recover that track from the
session or datahub.
Apps: Gene & Region Set View
18
Use the Gene & region set app to show track data over a set of genes or regions. The
“Gene & region set” app enables track data to be displayed over regions that are not
adjacent on a chromosome or even on different chromosomes.
Red: 2.5 KB
downstream of TSS.
Green: 2.5 KB
upstream of TSS.
Middle: gene transcription start site (TSS).
The user can create a set of genes or regions of
interest.
Gene and region sets can be submitted in three
ways:
1. By pasting a list of gene names or genomic
coordinates in the “Gene & region set” app.
2. By file upload in the “Gene & region set” app.
3. By using predefined KEGG pathways.
The user can specify custom flanking regions (up to
5 KB on each side) surrounding the gene
transcriptional start sites to focus on the gene
promoters.
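The flanking-region idea can be made concrete with a short sketch that computes the window around a transcription start site. The 5 KB cap and the 2.5 KB defaults come from this page; the function name and the strand handling are assumptions for illustration, not the app's implementation.

```python
MAX_FLANK = 5000  # the app allows up to 5 KB on each side


def tss_window(tss: int, strand: str, upstream: int = 2500, downstream: int = 2500):
    """Return (start, end) of the window around a TSS, honoring strand.

    Upstream lies at lower coordinates on the "+" strand and at higher
    coordinates on the "-" strand.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    upstream = min(upstream, MAX_FLANK)
    downstream = min(downstream, MAX_FLANK)
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    else:
        start, end = tss - downstream, tss + upstream
    return max(start, 0), end
```

For example, `tss_window(10000, "+")` covers 2.5 KB on each side of position 10000, and a requested flank larger than 5 KB is reduced to 5 KB.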
“Gene set view” can be applied to see all regions in
one browser view. To quit the gene set view, click
the red button next to the zoom buttons near the top
of the browser.
Apps: Split Panel
19
Use the Split panel app to “split” the browser panel in two. The order of the tracks
remains the same in both of the panels but the panels can be separately scrolled and
zoomed. This allows the user to easily explore data patterns of the same set of tracks
across two different genomic locations.
Main panel navigation buttons.
Main panel: HOXA gene cluster.
Split panel navigation buttons.
Split panel: HOXB gene cluster.
When splitting the browser panel, the browser inserts a blank panel to the right of the
existing panel.
Click the “SELECT VIEW RANGE”
button to choose a view range for
the new panel.
Features: Juxtaposition
Use the juxtaposition function to focus on data over a subset of the genome.
After juxtaposing on RefSeq genes, intergenic regions are hidden, and only data
over gene bodies are shown. When running juxtaposition, the browser can be
zoomed and scrolled as normal. The juxtaposition function is applicable for other
types of positional annotation data in addition to genes.
To run juxtaposition, right-click on a gene or annotation track and click
“Juxtapose." To quit the juxtaposition view, right-click on any gene or annotation
track and click “Undo juxtaposition."
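Conceptually, juxtaposition lays the selected regions end to end and hides everything in between. The sketch below shows one way to map genomic positions into such a view; it assumes sorted, non-overlapping, half-open (start, end) regions on one chromosome, and both function names are hypothetical.

```python
def juxtapose(regions):
    """Lay non-overlapping (start, end) regions end to end.

    Returns (start, end, offset) triples, where `offset` is the position
    of the region's first base in the juxtaposed view.
    """
    layout, offset = [], 0
    for start, end in sorted(regions):
        layout.append((start, end, offset))
        offset += end - start
    return layout


def to_juxtaposed(pos, layout):
    """Map a genomic position into the juxtaposed view, or None if hidden."""
    for start, end, offset in layout:
        if start <= pos < end:
            return offset + (pos - start)
    return None


# Two gene bodies; the 3 KB between them disappears from the view.
layout = juxtapose([(5000, 5500), (1000, 2000)])
```

A position inside the second gene body, such as 5100, lands right after the first gene in the view, while an intergenic position such as 3000 has no place in it.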
Normal view
Juxtaposition view
Genomic view with intergenic regions included.
Genomic view with intergenic regions removed.
In the following example, juxtaposition reveals enhancer signatures (H3K4me1)
over several LTR elements that are otherwise hard to see.
20
21
Apps: Genome Snapshot
Use the Genome snapshot app in the Apps menu to visualize
the genome-wide profile for numerical tracks over all
chromosomes.
Numerical tracks from the browser can be added to the genome snapshot using the “Add
track” button. Above, H3K4me1 and H3K4me3 of IMR90 cells are shown. Just like browser
tracks, track styles can be customized by right-clicking a track. The global view can be
changed using the “Configure” button. Lastly, a snapshot of all chromosomes can be captured
and saved using the “Screenshot” button.
Apps: Find Orthologs
22
Use the Find orthologs app to identify highly similar genomic regions from the
query genome for a set of target genomic regions based on the information in a
genome-comparison track.
To find orthologs:
1. Display a genome-comparison track by selecting “Tracks” > “Annotation Tracks” > “Genome comparison.”
2. Create a gene set for the target genome.
3. Send this gene set to the “Find orthologs” app.
4. Click the “find orthologs” button.
5. View or export the result.
See also page 25.
For each target region, the most similar region is found from the query genome
based on the data in the genome alignment track. In the resulting output, the
target genome regions are ranked by length in descending order. Each pair of
aligned regions is graphically rendered.
Target gene name. Line width indicates relative region length.
Target genome gene model.
Sequence alignment.
Target genome coordinate.
Query genome gene name, gene model, and coordinate.
23
Apps: Scatter Plot
Use the Scatter plot app in the Apps menu to assess the
relationship between two numerical tracks over a gene set.
Choose a pre-made gene set. Choose two
numerical tracks for the x- and y- axes and
click “SUBMIT."
Anti-correlation between
H3K4me3 (active mark) and
H3K9me3 (repressive mark) in
human IMR90 cells over a list
of regions.
The plot is interactive. Mouse
over each datapoint for
information.
Click a datapoint to show its
corresponding region in the
browser.
Customization options.
In making the scatter plot, the average value over each item in the gene set is
calculated for both numerical tracks.
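The averaging step can be sketched as follows: each region contributes one (x, y) point, given by the mean of each track over that region. The toy per-base lists, the function names, and the Pearson correlation at the end are illustrative assumptions; the app itself is only described here as plotting the averaged values.

```python
from math import sqrt


def region_means(track, regions):
    """Average per-base values in `track` over each half-open (start, end)."""
    means = []
    for start, end in regions:
        window = track[start:end]
        means.append(sum(window) / len(window))
    return means


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Toy data: two tracks over eight bases and four two-base regions.
track_x = [0, 0, 4, 4, 1, 1, 8, 8]
track_y = [9, 9, 2, 2, 7, 7, 0, 0]
regions = [(0, 2), (2, 4), (4, 6), (6, 8)]
xs = region_means(track_x, regions)
ys = region_means(track_y, regions)
r = pearson(xs, ys)  # negative here, i.e. the two toy tracks are anti-correlated
```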
Apps: Gene Plot
24
Use the Gene plot app to explore the data variation and distribution of a numerical
track with respect to a group of genes or regions of interest. The gene set needs to be
loaded using the “Gene & region set” app before using the “Gene plot" app. Search
for the “Gene plot” app in the Apps menu.
Choose a gene set.
Select the data to be plotted.
Four plots (box plot, matplot, gene part
plot, and clustering) are available and
each is fully customizable.
Plots can be rendered in either R or
Google Charts.
Gene transcription start sites.
2.5 KB downstream of TSS.
2.5 KB upstream of TSS.
The above boxplot shows the IMR90 H3K4me3 signal distribution over 5 KB regions
centered on the transcription start site of 100 random human genes. Data from each
region is evenly summarized into 100 data points, and a boxplot is shown over each
summary point to indicate the data distribution. Outliers are hidden in the boxplot.
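The summarization described above can be approximated in a few lines: each region's signal is reduced to a fixed number of bin means, and quartiles are then taken across regions at each bin. This is a sketch under the assumption that "evenly summarized" means averaging equal-width bins; the function names are hypothetical, and `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics


def summarize(values, n_points=100):
    """Evenly summarize `values` into `n_points` bin means."""
    n = len(values)
    summary = []
    for i in range(n_points):
        lo = i * n // n_points
        hi = max((i + 1) * n // n_points, lo + 1)  # never an empty bin
        chunk = values[lo:hi]
        summary.append(sum(chunk) / len(chunk))
    return summary


def box_stats(column):
    """Return (first quartile, median, third quartile) for one summary point."""
    q1, median, q3 = statistics.quantiles(column, n=4)
    return q1, median, q3


# Toy signals for three 5 KB regions (placeholder values, not real data).
signals = [[(i + k) % 50 for i in range(5000)] for k in range(3)]
summaries = [summarize(s) for s in signals]
# One box per summary point, computed across the regions.
boxes = [box_stats(col) for col in zip(*summaries)]
```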
Individual curves for each item.
Profile over gene features.
Hierarchical clustering.
Roadmap EpiGenome Browser
25
The Roadmap EpiGenome Browser¹ is built on top of the WashU EpiGenome
Browser to serve as a point-of-access to explore and analyze comprehensive
epigenomics data generated by the Roadmap Project.
Highlights:
1. Provides access to tens of thousands of epigenomic assays with a few clicks.
2. Applies real-time data clustering to reveal cell type-specificity of epigenetic marks.
3. Reveals covariations of epigenomic profiles and gene expression.
4. Integrates datasets from Roadmap Epigenomics and ENCODE projects.
5. Supports cross-species epigenome comparison (human and mouse).
Visit the browser at: http://epigenomegateway.wustl.edu/browser/roadmap
Select a genome (human hg19 and/or mouse mm9) and click “Load” to continue. The browser starts loading information on the samples.
Click the Navigation box to choose a view range.
Select an assay type to launch the browser.
Browser options.
Navigation options.
Hierarchical clustering.
High H3K4me3 appears over the CHRNA7 promoter in tissues including brain, but not blood cells.
Sample names.
RefSeq genes.
¹ Zhou X, et al., Nature Biotechnology 33, 345-346 (2015)
1. Epigenetic annotation of genetic variants
Multiple sclerosis-associated noncoding SNPs are annotated using epigenomic and
expression data. rs307896 marks an enhancer common across all displayed samples,
whereas rs756699 is located in an enhancer specific to immune cells.
2. Cross-species epigenome comparison
Human and mouse epigenomes can be compared over orthologous regions.
Human and mouse homologous genes found by the “Find orthologs” app (see page 22).
Click an alignment to show epigenomes over the two regions.
Human epigenome samples over the human gene.
Mouse epigenome samples over the mouse gene.
25