NPS (Nucleosome
Positioning from Sequencing)
NPS is a python software package that
can identify nucleosome positions given histone-modification ChIP-seq or
nucleosome sequencing at the nucleosome level. NPS obtains continuous wave-form
that represents the enrichment of histone modifications (or nucleosomes) by
extending each tag (25nt, Solexa) to 150nt in the 3Õ direction and taking the
middle 75, and detects the positions of nucleosomes based on Laplacian of
Gaussian (LOG) edge detection. The p value of each detection was estimated
using Poisson approximation and the user can decide a cut-off for the final
selection of nucleosome positions. In case of histone modification, the
sequence tags are regrouped by different types of histone modification after
nucleosome positioning and then the p-value of a particular histone
modification at a positioned nucleosome was calculated based on the tag count
of that histone modification in the nucleosome region using Poisson
distribution, similar to the method mentioned above. The user also can select a
cut-off of p value in histone modification assignment.
Paper
Our paper has recently been accepted by
BMC Genomics. For details, please refer to
Zhang Y, Shin H, Song JS, Lei Y, Liu
XS. Identifying Positioned Nucleosomes with Epigenetic Marks in Human from
ChIP-Seq. BMC
Genomics 2008, 9:537. (see http://www.biomedcentral.com/1471-2164/9/537)
Our experimental results can be
obtained at http://liulab.dfci.harvard.edu/NPS/Result/.
Contact
If you have any further questions, please contact
Chenfei Wang (emailto: wcftongji@gmail.com).
Installation
The latest version of NPS is 1.3.2 and
it is available as an open source (download). NPS is
developed using Python 2.5 and three Python packages, NumPy and Pywavelets, must be installed.
Unlike the previous versions, NPS 1.3.2 does not need R and RPy to be
installed.
Usage
After downloading and unpacking the NPS package, go to
the folder and type on the command window as follows.
$ python SeqTag.py NPS.par
NPS.par is a text file that contains
the self-explaining parameters for NPS. The following is an example parameter
file, which was actually used in our study.
Note that minimum peak width
(MIN_WIDTH) is recommended to be larger than tag extension (EXTENSION);
otherwise, you will have a warning.
If you want to generate wiggle files of
nucleosome signals, type the following on the command window.
$ python SeqTagProcess.py NPS.par
However, you need to make sure that
your output file name must be changed from the NPS.par in peak detection;
otherwise, it will overwrite the peak file if it already exists in the same
folder. For example, you may want to use ÔHM_hg18.wigÕ instead of
ÔHM_hg18_peak.bedÕ for this purpose. Other parameters must be the same as the
NPS.par in the peak detection.
|
############################################################################# # Parameters
for NPS. # |
|
# # Input and output
files # INFILE =
HM_hg18.bed OUTFILE =
HM_hg18_peak.bed |
# input tag
file # output file
of identified nucleosome positions |
|
# # Preprocessing # SPENAME = hg18 EXTENSION = 75 SHIFT = 37 WANT_SORT = yes TAG_THR = 2 |
# species name # each Tag is
extended to 150nt and # the middle 75
nt is taken # yes: sorting
the tags, no: not sorting the tags # minimum tag
count to declare as a peak |
|
# # Wavelet
denoising # DECOMP_LEVEL =
2 WAVELET = coif4 THRESHOLD_EST =
heursure THRESHOLD_TYPE
= soft SCALE = mln |
# level 2
wavelet decomposition # use coflet4
for wavelet denoising # denoising
threshold selection using SURE # soft
thresholding # multi-level
denoising |
|
# # Peak finding # INTERVAL = 10 GENOME_LEN =
2.4e9 PVALUE = 1e-5 TAG_NUM = 186e6 LOG = 3 SLOPE = 2 MIN_WIDTH = 80 MAX_WIDTH = 250 MIN_HEIGHT = MAX_HEIGHT =
10000 PEAK_INFLECTION_RATIO
= 1.2 MED_WSIZE = 5 BIAS_RATIO = 4 |
# 10nt sampling
for efficient signal processing # the genome
length to estimate the background lambda # p value
cut-off for identifying nucleosomes # the total tag
numbers # level of
Gaussian smoothing in peak detection # minimum peak
width # maximum peak
width # minimum peak
height # maximum peak
height # minimum peak
to inflection point ratio # median filter
window size # allowable ratio between + tags and
– tags |
Identified Peaks (Nucleosomes)
The output file (OUTFILE in *.par file) is a tab-delimited
file and contains the list of identified peaks (nucleosomes) as follows. The
first line of the output file is a header line that indicates each of the 5
columns. Note that the p-values of the identified peaks (nucleosomes) are given
as log P values (-10*log10(pvalue)).
|
chr |
start |
end |
name |
-10*log10(pvalue) |
|
chr21 |
9719971 |
9720101 |
nucleosome1 |
1.22E+02 |
|
chr21 |
9720591 |
9720671 |
nucleosome2 |
5.36E+01 |
|
chr21 |
9721111 |
9721191 |
nucleosome3 |
6.99E+01 |
|
chr21 |
9721341 |
9721421 |
nucleosome4 |
1.87E+02 |
|
chr21 |
9722171 |
9722251 |
nucleosome5 |
6.43E+01 |
|
chr21 |
9722311 |
9722391 |
nucleosome6 |
6.99E+01 |
|
chr21 |
9723031 |
9723111 |
nucleosome7 |
2.44E+02 |
|
chr21 |
9723531 |
9723631 |
nucleosome8 |
2.31E+02 |
|
chr21 |
9723971 |
9724051 |
nucleosome9 |
2.70E+02 |
NPS Q&A - Please read this to understand how to use the
above parameters
1.
How do EXTENSION and SHIFT work? Do they depend on each other?
From the above example of parameters, SHIFT looks like the
half of EXTENSION, but this is just a coincidence.
Just getting to the point, the following is the equation to
calculate SHIFT given a DNA fragment size and EXTENSION.
SHIFT
= ( Fragment size - EXTENSION ) / 2
We will see how we obtained the above equation via an
illustration.
1) Suppose that the DNA fragment size is 150nt. The +/-
signs stand for sequence reads mapped on +/- strands, respectively. Ideally, +
and - reads from a single fragment form bimodal distributions, whose summits
are about 150nt apart.
Original
+
-
|----|
|----|
<-------------------------> 150nt (fragment size)
2) The next step is extending sequence reads in their 3` directions.
This increases the genome coverage, which helps to obtain the nucleosome
position signal. We recommend to use the half of the nucleosome size (75nt)
because, based on our analysis, this value provided the best nucleosome
positioning. Please refer to our paper for details.
Extension: extend tags to
75nt
|-------------|
|------------|
3) The final step is moving these extended +/- reads in their 3`
directions for them to overlap with each other, resulting in the nucleosom
position signal. According to the above equation, SHIFT = (150 - 75) / 2 ~= 37.
Shift: shift by 37
nt
|------------|
|------------|
Therefore, if your estimated fragment size is 200nt and you
still want to keep 75nt as EXTENSION, SHIFT will be (200 - 75) / 2 = 62nt.