We propose a novel analysis algorithm, MAT (Model-based Analysis of Tiling-arrays), to reliably detect regions enriched by transcription factor Chromatin ImmunoPrecipitation (ChIP) on Affymetrix tiling arrays (chip). MAT models the baseline probe behavior by considering the probe sequence and copy number on each array. The correlation between the baseline probe model estimates and the observed measurements can be as high as 0.72. MAT standardizes each probe value via the probe model, eliminating the need for sample normalization. A novel scoring function is then applied to the standardized data to identify the ChIP-enriched regions, which allows robust p-value and false discovery rate (FDR) calculations. MAT can detect ChIP-regions from a single ChIP sample, from multiple ChIP samples, or from multiple ChIP samples with controls, with accuracy increasing in that order. On the mock ChIP samples provided by the ENCODE consortium, MAT achieved 100% accuracy (0 false positives and 0 false negatives) in detecting the spike-in plasmids, which are 2-, 4-, 8-, ..., 256-fold enriched compared with the genomic background. Quantitatively, MAT yielded a 0.95 correlation coefficient between the spike-in DNA concentration and the predicted score. Upon further analysis, MAT identified more than 70% of the true targets at a 5% FDR cutoff from a single ChIP sample. This is a valuable feature for quickly testing protocols and antibodies for ChIP-chip, and for easily identifying ChIP-chip samples of questionable quality.
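
The standardize-then-score idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MAT's actual code: the text above does not give the formulas, so the function names, the single spread value `sd`, and the choice of a trimmed mean scaled by the square root of the probe count are all assumptions made for the example.

```python
import math


def standardize(observed, predicted, sd):
    """Turn raw probe values into t-values relative to a baseline model.

    `predicted` stands in for the baseline probe model estimates and `sd`
    for the spread of comparable probes (both hypothetical inputs here).
    """
    return [(o - p) / sd for o, p in zip(observed, predicted)]


def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim)  # number of values dropped at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def window_score(t_values, trim=0.1):
    """Assumed scoring form: sqrt(n) times the trimmed mean of the t-values."""
    return math.sqrt(len(t_values)) * trimmed_mean(t_values, trim)


# Example: t-values for the probes in one window, with one outlier.
t = standardize([3.0, 3.2, 2.8, 9.0, 3.0], [1.0] * 5, sd=1.0)
score = window_score(t, trim=0.2)
```

Trimming makes the window score less sensitive to a single outlying probe, and because the probes were standardized against the model, no between-sample normalization step appears in the sketch.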

MAT requires four types of input files: the Affymetrix .cel files, which contain the signal value of every probe; the .bpmap library files, which contain the sequence, location (on the array and in the genome), and copy number of each probe; the repeat-library file, which contains the chromosome coordinates of RepeatMasker repeats, simple repeats, and segmental duplications; and a user-edited .tag (Tiling Array Group) file, which organizes the MAT parameters, including the grouping of the .cel and .bpmap files. MAT returns two types of output files: the .bar files, which contain the MAT score for each probe and can be imported into the Affymetrix Integrated Genome Browser (IGB) for visualization; and a .bed file, which lists the chromosomal coordinates of all the ChIP-regions with their MAT scores and a repeat flag (covering segmental duplications as well) and can be loaded into the UCSC Genome Browser.
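
For downstream work, the .bed output can be read with a few lines of Python. The sketch below assumes only the standard BED layout (chromosome, start, end, name, score in the first five columns); the exact columns MAT writes, including where the repeat flag is stored, are not described here, so the helper names `read_bed` and `top_regions` and the use of column 5 as the score are assumptions.

```python
def read_bed(lines):
    """Parse BED-style lines into (chrom, start, end, score) tuples.

    Assumes the standard BED column order; the score is None when a
    line has fewer than five columns.
    """
    regions = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and browser/track/comment header lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t") if "\t" in line else line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        score = float(fields[4]) if len(fields) > 4 else None
        regions.append((chrom, start, end, score))
    return regions


def top_regions(regions, min_score):
    """Keep regions scoring at least `min_score`, highest score first."""
    kept = [r for r in regions if r[3] is not None and r[3] >= min_score]
    return sorted(kept, key=lambda r: r[3], reverse=True)


# Example with made-up lines; a real file would be read with open(path).
example = [
    "track name=example",
    "chr1\t100\t200\tregion1\t12.5",
    "chr2\t50\t80\tregion2\t3.0",
]
strong = top_regions(read_bed(example), min_score=5.0)
```

To use it on a real result, pass an open file handle instead of the example list, for instance `read_bed(open(path))` with `path` pointing at the .bed file.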

References:
 
1) Johnson WE*, Li W*, Meyer CA*, Gottardo R, Carroll JS, Brown M and Liu XS: Model-based analysis of tiling-arrays for ChIP-chip. Proc. Natl. Acad. Sci. USA 103 (2006) 12457-12462. *Joint first authors

2) Li W, Carroll JS, Brown M and Liu XS: xMAN: extreme MApping of OligoNucleotides. Accepted, BIOCOMP'07, BMC Genomics.

Subscribe to the MAT.announce Google Group with your academic email address to receive the download password and notifications of new releases and bug fixes.

For questions, comments, suggestions, or bug reports, please contact mat.support at gmail.com.

Last updated by Wei Li on Thursday, May 17, 2007