NPS (Nucleosome Positioning from Sequencing)
NPS is a Python software package that identifies nucleosome positions from
histone-modification ChIP-seq or nucleosome sequencing data at nucleosome-level
resolution. NPS builds a continuous waveform that represents the enrichment of
histone modifications (or nucleosomes) by extending each tag (25 nt, Solexa) to
150 nt in the 3′ direction and taking the middle 75 nt, and then detects the
positions of nucleosomes using Laplacian of Gaussian (LoG) edge detection. The
p-value of each detection is estimated using a Poisson approximation, and the
user can choose a cut-off for the final selection of nucleosome positions. For
histone modification data, the sequence tags are regrouped by type of histone
modification after nucleosome positioning, and the p-value of a particular
histone modification at a positioned nucleosome is then calculated from the tag
count of that modification in the nucleosome region using a Poisson
distribution, similar to the method described above. The user can also select a
p-value cut-off for histone modification assignment.
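The tag extension step described above can be illustrated with a short sketch. This is not code from the NPS package: it is written in Python 3 with the standard library only, the function names middle_segment and pileup are hypothetical, and it assumes BED-style half-open coordinates and that the retained 75 nt segment starts 37 nt from the tag's 5' end (matching EXTENSION = 75 and SHIFT = 37 in the example parameter file).

```python
from itertools import accumulate

EXTENSION = 75  # width of the retained middle segment (EXTENSION in SeqTag.par)
SHIFT = 37      # assumed offset from the tag's 5' end to that segment (SHIFT)


def middle_segment(start, end, strand):
    """Return the half-open interval [lo, hi) kept for one tag.

    The tag is conceptually extended to 150 nt toward its 3' end and only
    the middle EXTENSION nt are kept. Coordinates are assumed to be
    BED-style: 0-based start, exclusive end.
    """
    if strand == "+":
        lo = start + SHIFT            # 5' end is `start`, fragment runs rightward
    else:
        lo = end - SHIFT - EXTENSION  # 5' end is `end`, fragment runs leftward
    return lo, lo + EXTENSION


def pileup(tags, length):
    """Count, for each position in [0, length), how many segments cover it."""
    diff = [0] * (length + 1)         # difference array: +1 at start, -1 at end
    for start, end, strand in tags:
        lo, hi = middle_segment(start, end, strand)
        lo, hi = max(lo, 0), min(hi, length)
        if lo < hi:
            diff[lo] += 1
            diff[hi] -= 1
    return list(accumulate(diff[:-1]))


# Two 25 nt tags on opposite strands that flank the same hypothetical nucleosome.
profile = pileup([(100, 125, "+"), (225, 250, "-")], 300)
print(max(profile), sum(profile))
```

Tags from both strands that flank one nucleosome end up contributing to the same central region, which is what makes the resulting profile suitable for peak detection.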
Paper
Our paper has recently been accepted by
BMC Genomics. For details, please refer to
Zhang Y, Shin H, Song JS, Lei Y, Liu XS. Identifying Positioned Nucleosomes
with Epigenetic Marks in Human from ChIP-Seq. BMC Genomics 2008, 9:537.
(see http://www.biomedcentral.com/1471-2164/9/537)
Our experimental results can be
obtained at http://liulab.dfci.harvard.edu/NPS/Result/.
Installation
The latest version of NPS is 1.0.3.2, and it is available as open source
software (download). NPS is developed using Python 2.5; three Python packages,
NumPy, RPy, and PyWavelets, must be installed along with R for NPS to run
properly. RPy2 and R 2.7.1 are not compatible with the current version of NPS.
NPS-1.0.3.2 provides 'hg19' and 'mm9' in addition to 'hg18' and 'mm8'.
Usage
After downloading and unpacking the NPS package, go to the folder and type
'python SeqTag.py SeqTag.par'
on the command line to run NPS.
SeqTag.par is a text file that contains the self-explanatory parameters that
must be set for NPS. The following is an example parameter file, which was
used in our study.
Note that the minimum peak width (MIN_WIDTH) should be larger than the tag
extension (EXTENSION) so that p-values are estimated accurately. Starting with
NPS-1.0.2, a warning is raised when a peak narrower than the extension is
encountered.
#############################################################################
# Parameters for NPS.
#

#
# Input and output files
#
INFILE = HM_hg18.bed             # input tag file
OUTFILE = HM_hg18_peak.bed       # output file of identified nucleosome positions

#
# Preprocessing
#
SPENAME = hg18                   # species name
EXTENSION = 75                   # each tag is extended to 150nt and
SHIFT = 37                       # the middle 75nt is taken
WANT_SORT = yes                  # yes: sorting the tags, no: not sorting the tags

#
# Wavelet denoising
#
WANT_DENOISE = yes               # yes: doing the wavelet denoising, no: skipping it
DECOMP_LEVEL = 2                 # level 2 wavelet decomposition
WAVELET = coif4                  # use coiflet4 for wavelet denoising
THRESHOLD_EST = heursure         # denoising threshold selection using SURE
THRESHOLD_TYPE = soft            # soft thresholding
SCALE = mln                      # multi-level denoising

#
# Peak finding
#
INTERVAL = 10                    # 10nt sampling for efficient signal processing
PVALUE = 1e-5                    # p value cut-off for identifying nucleosomes
TAG_NUM = 186e6                  # the total tag numbers
LOG = 3                          # level of Gaussian smoothing in peak detection
SLOPE = 2
MIN_WIDTH = 80                   # minimum peak width
MAX_WIDTH = 250                  # maximum peak width
MIN_HEIGHT =                     # minimum peak height
MAX_HEIGHT = 10000               # maximum peak height
PEAK_INFLECTION_RATIO = 1.2      # minimum peak to inflection point ratio
MED_WSIZE = 5                    # median filter window size
BIAS_RATIO = 4                   # allowable ratio between + tags and - tags
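As a rough illustration of the MIN_WIDTH and EXTENSION recommendation, a parameter file of this shape can be checked before NPS is run. The sketch below is not the parser used by NPS: parse_par and width_warnings are hypothetical helpers written in Python 3, and they assume that each setting is a KEY = value line and that anything after a '#' is a comment.

```python
def parse_par(text):
    """Parse KEY = value lines into a dict of strings.

    Blank lines and '#' comments (whole-line or trailing) are ignored.
    A key with nothing after '=' is kept with an empty string value.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop any comment
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        params[key] = value
    return params


def width_warnings(params):
    """Return warning messages if MIN_WIDTH is not larger than EXTENSION."""
    messages = []
    min_width = int(params["MIN_WIDTH"])
    extension = int(params["EXTENSION"])
    if min_width <= extension:
        messages.append(
            "MIN_WIDTH (%d) should be larger than EXTENSION (%d)"
            % (min_width, extension)
        )
    return messages


example = """
#
# Preprocessing
#
EXTENSION = 75
SHIFT = 37
MIN_WIDTH = 80       # minimum peak width
MIN_HEIGHT =
"""

params = parse_par(example)
for message in width_warnings(params):
    print("warning:", message)
```

With the values from the example file (MIN_WIDTH = 80, EXTENSION = 75) the check passes silently; lowering MIN_WIDTH to 75 or less would produce a warning.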
Identified Peaks (Nucleosomes)
The output file (OUTFILE in *.par file) is a tab-delimited
file and contains the list of identified peaks (nucleosomes) as follows. The
first line of the output file is a header line that indicates each of the 5
columns. Note that the p-values of the identified peaks (nucleosomes) are given
as log P values (-10*log10(pvalue)).
chr      start      end        name           -10*log10(pvalue)
chr21    9719971    9720101    nucleosome1    1.22E+02
chr21    9720591    9720671    nucleosome2    5.36E+01
chr21    9721111    9721191    nucleosome3    6.99E+01
chr21    9721341    9721421    nucleosome4    1.87E+02
chr21    9722171    9722251    nucleosome5    6.43E+01
chr21    9722311    9722391    nucleosome6    6.99E+01
chr21    9723031    9723111    nucleosome7    2.44E+02
chr21    9723531    9723631    nucleosome8    2.31E+02
chr21    9723971    9724051    nucleosome9    2.70E+02
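Because the last column stores -10*log10(pvalue), a score s corresponds to a p-value of 10^(-s/10); for example, the cut-off PVALUE = 1e-5 corresponds to a score of 50. The following Python 3 sketch shows one way to read the output file and recover p-values. The helpers score_to_pvalue and read_peaks are hypothetical names, not part of NPS, and the sketch assumes a single header line followed by five tab-delimited columns as described above.

```python
import csv
import io


def score_to_pvalue(score):
    """Invert score = -10*log10(pvalue)."""
    return 10.0 ** (-score / 10.0)


def read_peaks(handle, max_pvalue=1e-5):
    """Read peaks from a tab-delimited handle, keeping those with p <= max_pvalue.

    Returns a list of (chrom, start, end, name, pvalue) tuples.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader)                       # skip the header line
    peaks = []
    for fields in reader:
        if len(fields) < 5:
            continue                   # ignore blank or incomplete lines
        chrom, start, end, name, score = fields[:5]
        pvalue = score_to_pvalue(float(score))
        if pvalue <= max_pvalue:
            peaks.append((chrom, int(start), int(end), name, pvalue))
    return peaks


# The first two rows of the example output, as they would appear in the file.
sample = (
    "chr\tstart\tend\tname\t-10*log10(pvalue)\n"
    "chr21\t9719971\t9720101\tnucleosome1\t1.22E+02\n"
    "chr21\t9720591\t9720671\tnucleosome2\t5.36E+01\n"
)

peaks = read_peaks(io.StringIO(sample))
for chrom, start, end, name, pvalue in peaks:
    print(chrom, start, end, name, "%.3g" % pvalue)
```

To read a real result file, pass an open file object instead of the io.StringIO wrapper, and tighten max_pvalue to apply a stricter cut-off after the fact.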