HMMTiling

 

Introduction. 1

Application descriptions. 7

BPmapRemoveRedundance (Internal) 7

Qnormalize. 9

Cel2Matrix. 10

PBfileMaker (Internal) 12

HMMTiling. 16

BedCall 19

Installation: 20

Sample usage: 31

Introduction

HMMTiling is a comprehensive software package for tiling array data analysis. It includes command line python applications for filtering, mapping, quantile-normalizing and enriched-region identification from ChIP-chip experiments on tiling arrays. HMMTiling models the behavior of each individual probe as a baseline for each ChIP or control experiment to be compared with. It then uses a Hidden Markov Model to identify the enrichment probability at each probe location, thus can determine the exact enriched regions.

 

As input, HMMTiling expects a set of cel files from ChIP and control experiments as well as two internal files: 1) a text version non-redundant bpmap file (.bpmap.NR.txt); 2) Probe Behavior file (.pb). This package uses Affymetrix FileIO SDK to read the binary format cel files.

 

The algorithm used by HMMTiling is described in:

1.         Li W, Meyer CA and Liu XS: A hidden Markov model for analyzing ChIP-chip experiments on genome tiling arrays and its application to p53 binding sequences. To appear in Bioinformatics and ISMB2005.

2.         Carroll JS, Liu XS, Brodsky AS, Li W, Szary AJ, Meyer CA, Shao W, Hestermann EV, Geistlinger TR, Fox EA, Silver PA and Brown M: Chromosome-wide Mapping of Estrogen Receptor Binding Reveals Long-range Combinatorial Regulation. To appear in Cell.

 

 

 

Application descriptions

Users can run each program without any options for the command-line help.

 

BPmapRemoveRedundance (Internal)


Usage: %prog bpmapfile1 bpmapfile2 ...

          This program will make non-redundant text-format bpmap files from the original Affymetrix binary bpmap files. It uses two-stage Redundancy Removal to ensure unique probe measurement within 1 kb. It will add NR.txt suffix in the non-redundant bpmap files

 

Qnormalize

Usage:   %prog  *.cel ..

This program will take a set of input cel files and create quantile-normalized cel files. If you want to use the pre-computed probe behavior file in the HMM algorithm, you must choose a corresponding normalized cel file as a reference in the quantile normalization.

 

Cel2Matrix

Usage: %prog options cel1 cel2 ..

This program will take a set of normalized cel files and the text-version non-redundant bpmap files to create a matrix file. One matrix file per chromosome. The cel files and bpmap files should be in the same folder. In the matrix file, the first two columns are ChipID and Position, the rightwards columns are different array samples (experiments).

 

PBfileMaker (Internal)

Usage: %prog -i Matrixfile -o PBfile --pcutoff  --pcols –ecols

This program will take the matrix file generated from Cel2Matrix program and create a probe behavior file. The default output is the chromosome name from the matrix file + ".pb" suffix like chr21.gb. You may specify the experiment index as partial outlier (you have to specify the percentage cutoffs as well, the top percentage probes in this experiment are excluded in probe behavior estimation) or entire probe experiments (all probes in this experiment are included in estimation). So the rest of the experiments not mentioned are entire outliers (all probes in this experiment are excluded in estimation). The leftmost experiment is indexed as 0, the second left is indexed as 1 and so on.

 

 

HMMTiling

Usage: %prog -i MatrixFile [-p 300|-r 0.001] -t Treat –c Control  -b PBFile

This is the main program in this package. This program will take the matrix file generated from Cel2Matrix program to run HMM algorithm to get the log_odds of treatment and control groups for each probe. Users can run BedCall program afterwards to get the bed-format enriched region. It will add a ‘.hmm’ suffix to the matrix file as the output hmm file. You may specify the experiment index in the treatment and control group. The leftmost experiment is indexed as 0, the second left is indexed as 1 and so on.

 

BedCall

Usage: %prog options hmmfile1 hmmfile2 …

 

This program will take the a set of hmm files (one hmm file one chromosome) generated by HMMTiling program and create One ChIP-enriched regions in UCSC bed format according to user specified criteria

 

 

 

Installation:

The HMMTiling package is provided as python source code. We are going to provide windows and Linux binary release with GUI later.

1.    Download the HMMTiling source code.

2.    Check the dependencies listed below and install / upgrade if necessary

a.     Python >= 2.4 with modules: numarray >= 1.3.2;

b.     gcc/g++ >= 3.2.2

c.     SWIG >= 1.3.24

3.    cd into the HMMTiling/SDK directory

4.    Change the first two lines of install.sh to specify the python install directory in your machine
PYTHONHEAD=~/local/include/python2.4/
PYTHONLIB=~/local/lib/python2.4/config/

5.    run install.sh to wrap the c++ codes as python modules

6.    Now go back to the HMMTiling directory and test every program by simply type its name, like ‘python Qnormalize.py’

       
Sample usage:

 

·       Make a new directories as below in your linux machine
lab/
normalized             # normalized cel files and non-redundant bpmap files
lab/
HMMTiling            # HMMTiling software package
lab/
rawdata                   # raw cel files and Affymetrix bpmap files
lab/test                           # some parameter files used in testing

·       Download the raw cel files (A, B and C arrays) of Chr21/22 ChIP-chip experiments in Cawley et al. 2004. and Affymetrix original bpmap files (version 33 of the NCBI assembly of the Human Genome). Put all the files together into directory ‘rawdata’. To make the cel file names human-readable, try to rename them with rename.sh

##########################
Make non-redundant text-format bpmap files from the original Affymetrix binary bpmap files
################################

cd rawdata
python ../HMMTiling/BPmapRemoveRedundance.py P1_CHIP_A.Anti-Sense.hs.NCBIv33.sary.bpmap  P1_CHIP_C.Anti-Sense.hs.NCBIv33.sary.bpmap P1_CHIP_B.Anti-Sense.hs.NCBIv33.sary.bpmap

 

##############################
Quantile normalize on A, B and C arrays separately to make the probe distribution the same across all samples. Put the normalized cel files into ../normalized
###############################

python ../HMMTiling/Qnormalize.py --outdir ../normalized/ *_A_*CEL
python ../HMMTiling/Qnormalize.py --outdir ../normalized/ *_B_*CEL
python ../HMMTiling/Qnormalize.py --outdir ../normalized/ *_C_*CEL

 

##############################
Make matrix files
###############################

cd ..
#mv the non-redundant bpmap files to the normalized directory
mv rawdata/P1_CHIP_*.txt  normalized

python HMMTiling/Cel2Matrix.py -o CawleyAll -m 1 --dir ./normalized/ --outlier --cel-file test/Cawley.celfile

 

##############################
Make probe behavior file from the matrix files
###############################

python HMMTiling/PBfileMaker.py -i CawleyAll.chr21.matrix --pcutoff 0.005 \
--pcols 24,25,26,27,32,33,34,35,40,41,42,43 --ecols \
0,1,2,3,8,9,10,11,16,17,18,19,28,29,30,31,36,37,38,39,44,45,46,47 &

python HMMTiling/PBfileMaker.py -i CawleyAll.chr22.matrix --pcutoff 0.005 \
--pcols 24,25,26,27,32,33,34,35,40,41,42,43 --ecols \
0,1,2,3,8,9,10,11,16,17,18,19,28,29,30,31,36,37,38,39,44,45,46,47 &

 

##############################
Run HMM algorithm on the matrix file to find the p53-FL vs Input enriched regions.
If we assume 300 peaks in Chr21/22, the transition probability will be defined as 300/1,000,000(probes) = 0.0003
###############################

python HMMTiling/HMMTiling.py -i CawleyAll.chr21.matrix \
-r 0.0003 -t 6,7,14,15,22,23 -c 2,3,10,11,18,19 -b chr21.pb  &
python HMMTiling/HMMTiling.py -i CawleyAll.chr22.matrix \
-r 0.0003 -t 6,7,14,15,22,23 -c 2,3,10,11,18,19 -b chr22.pb &

 

##############################
Run bed calling these hmm files to get the bed-format enriched regions.
Set the cutoff as 6.
###############################

                                          
python HMMTiling/BedCall.py -o p53.bed  --tp 6 --tc 6 CawleyAll.*.hmm

 

##############################
Now you can use your probe behavior file and reference cel file (any normalized cel file in the normalized directory is ok, one reference cel file one array type) to help analyze other ChIP-chip experiments on human Chr21/22. Let’s use the same raw cel files in Cawley et al. 2004 as an example. Suppose we only have 6 p53-FL ChIP and 6 genomic input experiments.
###############################

 

##############################
Quantile normalize using the referece cel file. Put the normalized cel files into ../p53FL
###############################

python ../HMMTiling/Qnormalize.py \
--reference ../normalized/HCT116_A_1_controlGST_1.CEL  \
--outdir ../p53FL/  \
HCT116_A_1_p53-FL_1.CEL  HCT116_A_2_p53-FL_1.CEL  HCT116_A_3_p53-FL_1.CEL HCT116_A_1_p53-FL_2.CEL  HCT116_A_2_p53-FL_2.CEL  HCT116_A_3_p53-FL_2.CEL HCT116_A_1_controlInput_1.CEL  HCT116_A_2_controlInput_1.CEL  HCT116_A_3_controlInput_1.CEL HCT116_A_1_controlInput_2.CEL  HCT116_A_2_controlInput_2.CEL  HCT116_A_3_controlInput_2.CEL &

python ../HMMTiling/Qnormalize.py \
--reference ../normalized/HCT116_B_1_controlGST_1.CEL  \
--outdir ../p53FL/  \
HCT116_B_1_p53-FL_1.CEL  HCT116_B_2_p53-FL_1.CEL  HCT116_B_3_p53-FL_1.CEL HCT116_B_1_p53-FL_2.CEL  HCT116_B_2_p53-FL_2.CEL  HCT116_B_3_p53-FL_2.CEL  HCT116_B_1_controlInput_1.CEL  HCT116_B_2_controlInput_1.CEL  HCT116_B_3_controlInput_1.CEL   HCT116_B_1_controlInput_2.CEL  HCT116_B_2_controlInput_2.CEL  HCT116_B_3_controlInput_2.CEL &

python ../HMMTiling/Qnormalize.py \
--reference ../normalized/HCT116_C_1_controlGST_1.CEL  \
--outdir ../p53FL/  \
HCT116_C_1_p53-FL_1.CEL  HCT116_C_2_p53-FL_1.CEL  HCT116_C_3_p53-FL_1.CEL HCT116_C_1_p53-FL_2.CEL  HCT116_C_2_p53-FL_2.CEL  HCT116_C_3_p53-FL_2.CEL  HCT116_C_1_controlInput_1.CEL  HCT116_C_2_controlInput_1.CEL  HCT116_C_3_controlInput_1.CEL HCT116_C_1_controlInput_2.CEL  HCT116_C_2_controlInput_2.CEL  HCT116_C_3_controlInput_2.CEL &

##############################
Make matrix, hmm and bed files
###############################

cp rawdata/P1_CHIP_*.txt  p53FL/

python HMMTiling/Cel2Matrix.py -o p53FL -m 1 --dir ./p53FL/ --outlier --cel-file test/p53FL.celfile

python HMMTiling/HMMTiling.py -i p53FL.chr21.matrix \
-r 0.0003 -t 2,3,6,7,10,11 -c 0,1,4,5,8,9 -b chr21.pb  &
python HMMTiling/HMMTiling.py -i p53FL.chr22.matrix \
-r 0.0003 -t 2,3,6,7,10,11 -c 0,1,4,5,8,9 -b chr22.pb &

python HMMTiling/BedCall.py -o p53.1.bed  --tp 6 --tc 6 p53FL*.hmm

##############################
Now you have a new list of p53 enriched region p53.1.bed which should be almost the same as p53.bed
###############################