2012-05-28
1

I created the following matrix in R so that I can extract specific substrings from a sequence:

positions = cbind(seq(from = 20, to = 68, by = 4),seq(from = 22, to = 70, by = 4)) 

I also have the following string:

input <- "SEQRES   1 L   36  THR PHE GLY SER GLY GLU ALA ASP CYS GLY LEU ARG PRO   "

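I am trying to use an apply function to build a list of substrings, substr(mystring, start.position, end.position), where the start index comes from positions[,1] and the end index comes from positions[,2]. I can do this easily with a for loop, but I thought apply would be faster.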

I got it working as follows, but I wonder whether there is a cleaner way:

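# `input` is the SEQRES string shown above.
# Define the helper before calling apply(), otherwise R cannot find it.
get.AA.seqres <- function(items) {
  # cbind() with a string coerces the whole matrix to character,
  # so the positions have to be converted back to numbers.
  start.position <- as.numeric(items[1])
  end.position <- as.numeric(items[2])
  string <- items[3]
  substr(string, start.position, end.position)
}

parse.me <- cbind(seq(from = 20, to = 68, by = 4), seq(from = 22, to = 70, by = 4), input)
apply(parse.me, MARGIN = 1, get.AA.seqres)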
+1

Why don't you split on whitespace and discard the first three elements? – Andrie

+0

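PDB file elements are defined by columns, not by whitespace. Since the specification refers explicitly to column numbers, I am hesitant to split on whitespace. Thanks though! – user1357015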
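The concern in the comment above can be illustrated with a small sketch. The helper `make_record()` and both records are made up for illustration; the `sprintf()` layout is an assumption that mirrors the column positions used in the question (residue names starting at column 20), not a full implementation of the PDB specification. When a field such as the chain identifier is blank, whitespace splitting shifts the tokens, while column-based extraction is unaffected.

```r
# Hypothetical helper: lays out a SEQRES-like record in fixed-width columns
# (assumed layout: serial in cols 8-10, chain in col 12, count in cols 14-17,
# residue names from col 20 onward).
make_record <- function(serial, chain, n_res, residues) {
  sprintf("SEQRES %3d %1s %4d  %s", serial, chain, n_res,
          paste(residues, collapse = " "))
}

with_chain    <- make_record(1L, "L", 36L, c("THR", "PHE", "GLY"))
without_chain <- make_record(1L, "",  36L, c("THR", "PHE", "GLY"))

# Splitting on whitespace: the token count changes when the chain is blank,
# so "discard the first N elements" no longer lines up.
tokens_with    <- strsplit(with_chain, " +")[[1]]
tokens_without <- strsplit(without_chain, " +")[[1]]
length(tokens_with)     # 7
length(tokens_without)  # 6

# Column-based extraction gives the same residues for both records.
starts <- seq(from = 20, to = 28, by = 4)
by_column_with    <- substring(with_chain, starts, starts + 2)
by_column_without <- substring(without_chain, starts, starts + 2)
```

Both `by_column_with` and `by_column_without` contain "THR", "PHE" and "GLY", whereas the fourth whitespace token is "36" in one record and "THR" in the other.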

Answers

3

Try this:

> substring(input, positions[, 1], positions[, 2]) 
[1] "THR" "PHE" "GLY" "SER" "GLY" "GLU" "ALA" "ASP" "CYS" "GLY" "LEU" "ARG" "PRO" 
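This works because substring() recycles its arguments: a single string combined with vectors of start and end positions yields one piece per position pair. The sketch below is an illustration under assumptions rather than part of the original answer: the records are built with `sprintf()` using an assumed fixed-width layout, and the short second record is made up to show one way of handling a line that holds fewer than 13 residues.

```r
# Assumed fixed-width layout; residue names start at column 20.
residues <- c("THR", "PHE", "GLY", "SER", "GLY", "GLU", "ALA",
              "ASP", "CYS", "GLY", "LEU", "ARG", "PRO")
input <- sprintf("SEQRES %3d %1s %4d  %s", 1L, "L", 36L,
                 paste(residues, collapse = " "))

# One start per residue slot; each residue name is three characters wide.
starts <- seq(from = 20, to = 68, by = 4)
aa <- substring(input, starts, starts + 2)

# Hypothetical short record: only two residues on the line.
short <- sprintf("SEQRES %3d %1s %4d  %s", 3L, "L", 36L, "GLY ALA")
pieces <- substring(short, starts, starts + 2)

# Positions past the end of the string come back as "", so drop blanks.
aa_short <- pieces[nzchar(trimws(pieces))]
```

Here `aa` reproduces `residues`, and `aa_short` keeps only "GLY" and "ALA".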
0

I like Andrie's practical suggestion, but if you need to go this route for some other reason, your problem sounds like it can be solved with Vectorize():

#Your data 
positions = cbind(seq(from = 20, to = 68, by = 4),seq(from = 22, to = 70, by = 4)) 
input <- "SEQRES   1 L   36  THR PHE GLY SER GLY GLU ALA ASP CYS GLY LEU ARG PRO   "

#Vectorize the function substr() 
vsubstr <- Vectorize(substr, USE.NAMES = FALSE) 
vsubstr(input, positions[,1], positions[,2]) 
#----- 
[1] "THR" "PHE" "GLY" "SER" "GLY" "GLU" "ALA" "ASP" "CYS" "GLY" "LEU" "ARG" "PRO" 

#Or, read the help page on ?substr about the bit for recycling in the first paragraph of details 

substr(rep(input, nrow(positions)), positions[,1], positions[,2]) 
#----- 
[1] "THR" "PHE" "GLY" "SER" "GLY" "GLU" "ALA" "ASP" "CYS" "GLY" "LEU" "ARG" "PRO"
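As a quick check, the sketch below compares the approaches from this thread side by side. It assumes the same fixed-width layout as the question (built here with `sprintf()` so the column positions are explicit), and the variable names are chosen only for illustration. It also shows why the rep() call is needed: substr() returns a result as long as its first argument, so a single string gives a single substring.

```r
positions <- cbind(seq(from = 20, to = 68, by = 4),
                   seq(from = 22, to = 70, by = 4))
residues <- c("THR", "PHE", "GLY", "SER", "GLY", "GLU", "ALA",
              "ASP", "CYS", "GLY", "LEU", "ARG", "PRO")
# Assumed layout: residue names start at column 20.
input <- sprintf("SEQRES %3d %1s %4d  %s", 1L, "L", 36L,
                 paste(residues, collapse = " "))

# substring() recycles all of its arguments.
via_substring <- substring(input, positions[, 1], positions[, 2])

# Vectorize() wraps substr() in mapply(), one call per position pair.
vsubstr <- Vectorize(substr, USE.NAMES = FALSE)
via_vectorize <- vsubstr(input, positions[, 1], positions[, 2])

# substr() with the string repeated once per row of positions.
via_rep <- substr(rep(input, nrow(positions)), positions[, 1], positions[, 2])

# Without rep(), the result has the length of the first argument.
single <- substr(input, positions[, 1], positions[, 2])
length(single)  # 1
```

All three of `via_substring`, `via_vectorize` and `via_rep` hold the same 13 residues, while `single` holds only "THR".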