R: chromosome frequency calculation

I need to count how many times each letter occurs in the chromosome.txt file: http://users.utu.fi/jjahol/chromosome.txt

So far I have managed to write this code:

cromo <- read.table("http://users.utu.fi/jjahol/chromosome.txt", header=FALSE) 
cromo2 <- as.character(unlist(cromo))

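This code creates a vector of 1000 elements, each a string 60 characters long. How can I convert it into a vector in which each element is a single character?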

Source

2014-04-04 jaw325

Is this homework? –

You can use the str_split function from the stringr library like this: `sapply(cromo2, str_split, "")` – dorvak

This should give you the desired result:

cromo <- read.table("http://users.utu.fi/jjahol/chromosome.txt", header=FALSE) 
cromo2 <- unlist(strsplit(as.character(cromo$V1),"")) 
table(cromo2)

It gives you:

    A     C     G     T 
15520 13843 14215 16422
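
If you want to try the idea without downloading the file, here is a minimal sketch on made-up data. The vector `lines_demo` and its contents are hypothetical stand-ins for the 1000 lines of chromosome.txt. The `prop.table` step is an addition that is not part of the answer above; it assumes you also want relative frequencies rather than only raw counts.

```r
# Hypothetical sample: three short sequence lines instead of the real file
lines_demo <- c("ACGT", "AACC", "GGTA")

# Split every line into single characters and flatten into one vector
chars <- unlist(strsplit(lines_demo, ""))

# Absolute counts per letter
counts <- table(chars)
print(counts)

# Relative frequencies: each count divided by the total number of letters
freqs <- prop.table(counts)
print(round(freqs, 3))
```

On the real data, the same two steps applied to `cromo2` should work the same way.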

Source

2014-04-04 11:51:22 Jaap

strsplit does this:

> strsplit('text', '') 
[[1]] 
[1] "t" "e" "x" "t"
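
Note the `[[1]]` in that output: strsplit returns a list, with one character vector per input string. A small sketch of how that plays out for a vector of several strings follows; the input `x` is a hypothetical stand-in for the lines of the file.

```r
# Hypothetical two-element input, standing in for the lines of the file
x <- c("ACG", "TT")

# strsplit returns a list with one character vector per input string
parts <- strsplit(x, "")
str(parts)

# unlist flattens the list into a single vector of one-character elements
flat <- unlist(parts)
print(flat)
```

So to get one element per character across all lines, wrap the call in unlist().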

Source

2014-04-04 11:44:19

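This is a somewhat unorthodox approach (and unlist(strsplit(...)) will be very fast anyway), but you could use one of the string-search packages that offer vectorized search patterns, such as "stringi":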

## Read the data in. Since it's not a data.frame, just use readLines 
X <- readLines("http://users.utu.fi/jjahol/chromosome.txt") 

## Paste the lines together into a single block of text 
Y <- paste(X, collapse = "") 

library(stringi) 
Strings <- c("A", "C", "G", "T") 
stri_count_fixed(Y, Strings) 
# [1] 15520 13843 14215 16422 

## Named output.... 
setNames(stri_count_fixed(Y, Strings), Strings) 
#     A     C     G     T 
# 15520 13843 14215 16422
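
If stringi is not installed, roughly the same count-without-splitting idea can be sketched in base R. This is not the method from the answer above, only an approximation of it: `Y_demo` is hypothetical sample text, and `count_fixed` is a helper name made up for this illustration.

```r
# Hypothetical sample text standing in for the pasted-together file contents
Y_demo <- paste(c("ACGT", "AACC", "GGTA"), collapse = "")
Strings <- c("A", "C", "G", "T")

# Hypothetical helper: count fixed-pattern matches without splitting the text
count_fixed <- function(text, pattern) {
  hits <- gregexpr(pattern, text, fixed = TRUE)
  lengths(regmatches(text, hits))
}

# One count per letter; vapply names the result after the letters
counts <- vapply(Strings, function(s) count_fixed(Y_demo, s), integer(1))
print(counts)
```

This loops over the patterns in R, so it is likely slower than stri_count_fixed on large inputs.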

Source

2014-04-05 03:27:06 A5C1D2H2I1M1N2O1R2T1
