2016-12-01 61 views
0

我正在尝试在GRanges对象中的meta列的某个窗口中查找平均值。窗口区域的分级平均值

所以我有一个数据对象:如果我再

gr.RleList <- mcolAsRleList(gr.data, varname = "value") 

gr.RleList 
RleList of length 3 
$chr1 
numeric-Rle of length 249239887 with 5816335 runs 
Lengths:    10467     1     1     1 ...     1     1    13     2 
Values :    NA     0     1    0.5 ...     0    NaN    NA    NaN 

> gr.data 
GRanges object with 10505026 ranges and 1 metadata column: 
     seqnames     ranges strand |  value 
      <Rle>    <IRanges> <Rle> | <numeric> 
    [1]  chr1   [10468, 10468]  + |   0 
    [2]  chr1   [10469, 10469]  - |   1 
    [3]  chr1   [10470, 10470]  + |  0.5 
    [4]  chr1   [10471, 10471]  - |   1 
    [5]  chr1   [10483, 10483]  + |  0.6 

和窗口对象:

gr.windows 
GRanges object with 6077 ranges and 0 metadata columns: 
    seqnames     ranges strand 
     <Rle>    <IRanges> <Rle> 
[1]  chr1  [  1, 100000]  * 
[2]  chr1  [100001, 200000]  * 
[3]  chr1  [200001, 300000]  * 
[4]  chr1  [300001, 400000]  * 
[5]  chr1  [400001, 500000]  * 

我然后将数据对象转换为RleList使用binnedAverage,我只是得到NA值。

gr.binnedAvg <- binnedAverage(gr.windows, gr.RleList, "value") 
gr.binnedAvg 
GRanges object with 6077 ranges and 1 metadata column: 
    seqnames     ranges strand |  value 
     <Rle>    <IRanges> <Rle> | <numeric> 
[1]  chr1  [  1, 100000]  * |  <NA> 
[2]  chr1  [100001, 200000]  * |  <NA> 
[3]  chr1  [200001, 300000]  * |  <NA> 
[4]  chr1  [300001, 400000]  * |  <NA> 
[5]  chr1  [400001, 500000]  * |  <NA> 

unique(mcols(gr.binnedAvg)$value) 
[1] NA 

我也尝试总结这些值,结果相同。 NA列表中的NA值是否会让我感到困惑,或者我的问题出现了什么问题?

小例子:

library(BSgenome.Hsapiens.UCSC.hg19) 
library(data.table) 

gr.windows <- tileGenome(seqinfo(Hsapiens), tilewidth=100000,cut.last.tile.in.chrom=TRUE) 

gr.data <- GRanges(c("chr1", "chr2"), IRanges(c(10, 50), c(15, 55)), value = c(20, 10)) 
gr.data.RleList <- mcolAsRleList(gr.data, varname = "value") 
seqlevels(gr.windows, force=TRUE) <- names(gr.data.RleList) 
gr.data.binnedAvg <- binnedAverage(gr.windows, gr.data.RleList, "value") 

gr.data.binnedAvg 
GRanges object with 4925 ranges and 1 metadata column: 
    seqnames     ranges strand |  value 
     <Rle>    <IRanges> <Rle> | <numeric> 
[1]  chr1  [  1, 100000]  * |  <NA> 
[2]  chr1  [100001, 200000]  * |   0 
[3]  chr1  [200001, 300000]  * |   0 
[4]  chr1  [300001, 400000]  * |   0 

非常感谢您的帮助!

+0

您可以请发布一个最小可重现的例子。 http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – emilliman5

+0

对不起,我忘了。所以我现在附上它。我猜这是因为窗口中缺少值?但我无法弄清楚如何避免这种情况。 –

+0

'mcolAsRleList'和'binnedAverage'不是您加载的库的一部分,它们也不在'GenomicRanges'包中 – emilliman5

回答

0

使用“小例子”,以下工作:

library(BSgenome.Hsapiens.UCSC.hg19) 
library(GenomicRanges) 

gr.windows <- tileGenome(seqinfo(Hsapiens), tilewidth=100000, cut.last.tile.in.chrom=TRUE) 
gr.data <- GRanges(c("chr1", "chr2"), IRanges(c(10, 50), c(15, 55)), value=c(20, 10)) 

gr.data.cov <- GenomicRanges::coverage(gr.data, weight="value") 

seqlevels(gr.windows, pruning.mode="coarse") <- names(gr.data.cov) 

gr.data.binnedAvg <- binnedAverage(gr.windows, gr.data.cov, "value") 

反应迟缓,但可能有助于将来参照。