NPS (Nucleosome Positioning from Sequencing)

 

Overview

NPS is a python software package that can identify nucleosome positions given histone-modification ChIP-seq or nucleosome sequencing at the nucleosome level. NPS obtains continuous wave-form that represents the enrichment of histone modifications (or nucleosomes) by extending each tag (25nt, Solexa) to 150nt in the 3’ direction and taking the middle 75, and detects the positions of nucleosomes based on Laplacian of Gaussian (LOG) edge detection. The p value of each detection was estimated using Poisson approximation and the user can decide a cut-off for the final selection of nucleosome positions. In case of histone modification, the sequence tags are regrouped by different types of histone modification after nucleosome positioning and then the p-value of a particular histone modification at a positioned nucleosome was calculated based on the tag count of that histone modification in the nucleosome region using Poisson distribution, similar to the method mentioned above. The user also can select a cut-off of p value in histone modification assignment.

 

Paper

Our paper has recently been accepted by BMC Genomics. For details, please refer to

 

Zhang Y, Shin H, Song JS, Lei Y, Liu XS. Identifying Positioned Nucleosomes with Epigenetic Marks in Human from ChIP-Seq. BMC Genomics 2008, 9:537. (see http://www.biomedcentral.com/1471-2164/9/537)

 

Our experimental results can be obtained at http://liulab.dfci.harvard.edu/NPS/Result/.

 

 

Contact

If you have any further questions, please contact Chenfei Wang (emailto: wcftongji@gmail.com).

 

 

Installation

The latest version of NPS is 1.3.2 and it is available as an open source (download). NPS is developed using Python 2.5 and three Python packages, NumPy and Pywavelets, must be installed. Unlike the previous versions, NPS 1.3.2 does not need R and RPy to be installed.

 

Usage

After downloading and unpacking the NPS package, go to the folder and type on the command window as follows.

 

$ python SeqTag.py NPS.par

 

NPS.par is a text file that contains the self-explaining parameters for NPS. The following is an example parameter file, which was actually used in our study.

 

Note that minimum peak width (MIN_WIDTH) is recommended to be larger than tag extension (EXTENSION); otherwise, you will have a warning.

 

If you want to generate wiggle files of nucleosome signals, type the following on the command window.

 

$ python SeqTagProcess.py NPS.par

 

However, you need to make sure that your output file name must be changed from the NPS.par in peak detection; otherwise, it will overwrite the peak file if it already exists in the same folder. For example, you may want to use ‘HM_hg18.wig’ instead of ‘HM_hg18_peak.bed’ for this purpose. Other parameters must be the same as the NPS.par in the peak detection.



#############################################################################

# Parameters for NPS.

#

 

#

# Input and output files

#

INFILE = HM_hg18.bed

OUTFILE = HM_hg18_peak.bed

 

 

 

# input tag file

# output file of identified nucleosome positions

#

# Preprocessing

#

SPENAME = hg18

EXTENSION = 75

SHIFT = 37

WANT_SORT = yes

TAG_THR = 2

 

 

 

 

# species name

# each Tag is extended to 150nt and

# the middle 75 nt is taken

# yes: sorting the tags, no: not sorting the tags

# minimum tag count to declare as a peak

#

# Wavelet denoising

#

DECOMP_LEVEL = 2

WAVELET = coif4

THRESHOLD_EST = heursure

THRESHOLD_TYPE = soft

SCALE = mln

 

 

 

 

# level 2 wavelet decomposition

# use coflet4 for wavelet denoising

# denoising threshold selection using SURE

# soft thresholding

# multi-level denoising

#

# Peak finding

#

INTERVAL = 10

GENOME_LEN = 2.4e9

PVALUE = 1e-5 

TAG_NUM = 186e6

LOG = 3

SLOPE = 2

MIN_WIDTH = 80

MAX_WIDTH = 250

MIN_HEIGHT =

MAX_HEIGHT = 10000

PEAK_INFLECTION_RATIO = 1.2

MED_WSIZE = 5

BIAS_RATIO = 4

 

 

 

 

# 10nt sampling for efficient signal processing

# the genome length to estimate the background lambda

# p value cut-off for identifying nucleosomes

# the total tag numbers

# level of Gaussian smoothing in peak detection

 

# minimum peak width

# maximum peak width

# minimum peak height

# maximum peak height

# minimum peak to inflection point ratio

# median filter window size

# allowable ratio between + tags and – tags

 

 

 

Identified Peaks (Nucleosomes)

The output file (OUTFILE in *.par file) is a tab-delimited file and contains the list of identified peaks (nucleosomes) as follows. The first line of the output file is a header line that indicates each of the 5 columns. Note that the p-values of the identified peaks (nucleosomes) are given as log P values (-10*log10(pvalue)).

 

chr

start

end

name

-10*log10(pvalue)

chr21

9719971

9720101

nucleosome1

1.22E+02

chr21

9720591

9720671

nucleosome2

5.36E+01

chr21

9721111

9721191

nucleosome3

6.99E+01

chr21

9721341

9721421

nucleosome4

1.87E+02

chr21

9722171

9722251

nucleosome5

6.43E+01

chr21

9722311

9722391

nucleosome6

6.99E+01

chr21

9723031

9723111

nucleosome7

2.44E+02

chr21

9723531

9723631

nucleosome8

2.31E+02

chr21

9723971

9724051

nucleosome9

2.70E+02

 

 

 

NPS Q&A - Please read this to understand how to use the above parameters

 

1. How do EXTENSION and SHIFT work? Do they depend on each other?

 

From the above example of parameters, SHIFT looks like the half of EXTENSION, but this is just a coincidence.

 

Just getting to the point, the following is the equation to calculate SHIFT given a DNA fragment size and EXTENSION.

 

SHIFT = ( Fragment size - EXTENSION ) / 2

 

We will see how we obtained the above equation via an illustration.

 

1) Suppose that the DNA fragment size is 150nt. The +/- signs stand for sequence reads mapped on +/- strands, respectively. Ideally, + and - reads from a single fragment form bimodal distributions, whose summits are about 150nt apart.

 

 

Original

+                      -

|----|              

                           |----|

<-------------------------> 150nt (fragment size)

 

 

2) The next step is extending sequence reads in their 3` directions. This increases the genome coverage, which helps to obtain the nucleosome position signal. We recommend to use the half of the nucleosome size (75nt) because, based on our analysis, this value provided the best nucleosome positioning. Please refer to our paper for details.

 

Extension: extend tags to 75nt

|-------------|

                |------------|

 

3) The final step is moving these extended +/- reads in their 3` directions for them to overlap with each other, resulting in the nucleosom position signal. According to the above equation, SHIFT = (150 - 75) / 2 ~= 37.

 

Shift: shift by 37 nt 

        |------------|

        |------------|

 

Therefore, if your estimated fragment size is 200nt and you still want to keep 75nt as EXTENSION, SHIFT will be (200 - 75) / 2 = 62nt.