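Well, I tried to put together a workable solution for what you need. There may well be better ways to do this, perhaps with packages such as data.table and/or stringr. In any case, this snippet should serve as a working starting point. Oh, and I modified the Ad_title data so that the species names actually appear in the titles.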
# Re-create data
Ad_title <- c("1 year old Ball Python", "Young Red Blood Python. - For Sale",
              "1 Year Old Male Bearded Dragon - For Sale")
df2 <- data.frame(Latin_name = c("Python regius", "Python brongersmai", "Pogona barbata"),
                  Common_name = c("E: Ball Python, Royal Python G: Königspython",
                                  "E: Red Blood Python, Malaysian Blood Python",
                                  "E: Eastern Bearded Dragon, Bearded Dragon"),
                  stringsAsFactors = FALSE)
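# Aggregate common names: split on the "E:"/"G:" language tags and on commas,
# absorbing the surrounding whitespace so no padded or blank names remain
Common_name <- paste(df2$Common_name, collapse = ", ")
Common_name <- unlist(strsplit(Common_name, "\\s*(E:|G:|,)\\s*"))
Common_name <- Common_name[Common_name != ""]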
# Data frame latin names vs common names
# (Latin_name first holds the matching row index in df2, then the actual name)
df3 <- data.frame(Common_name,
                  Latin_name = sapply(Common_name, grep, df2$Common_name),
                  row.names = NULL, stringsAsFactors = FALSE)
df3$Latin_name <- df2$Latin_name[df3$Latin_name]
# Data frame Ad vs common names
# (assumes every ad title matches exactly one common name)
Ad_Common_name <- unlist(sapply(Common_name, grep, Ad_title))
df4 <- data.frame(Ad_title,
                  Common_name = sapply(seq_along(Ad_title),
                                       function(i) names(Ad_Common_name[Ad_Common_name == i])),
                  stringsAsFactors = FALSE)
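If the end goal is to attach a Latin name to each ad (that is my assumption about what you are after), one possible last step is a lookup from df4 into df3. The sketch below is self-contained, so it rebuilds small stand-ins for df3 and df4 holding only the rows relevant to the three example ads. It uses base R's match() rather than merge(), so that the ads stay in their original order.

```r
# Stand-ins for df3 and df4 as built above (values copied from the example data)
df3 <- data.frame(Common_name = c("Ball Python", "Red Blood Python", "Bearded Dragon"),
                  Latin_name = c("Python regius", "Python brongersmai", "Pogona barbata"),
                  stringsAsFactors = FALSE)
df4 <- data.frame(Ad_title = c("1 year old Ball Python",
                               "Young Red Blood Python. - For Sale",
                               "1 Year Old Male Bearded Dragon - For Sale"),
                  Common_name = c("Ball Python", "Red Blood Python", "Bearded Dragon"),
                  stringsAsFactors = FALSE)

# Look up each ad's common name in df3 and copy over the Latin name;
# match() returns NA for a common name that is not found in df3
df4$Latin_name <- df3$Latin_name[match(df4$Common_name, df3$Common_name)]
```

Ads whose common name has no entry in df3 would end up with NA in Latin_name, which makes unmatched titles easy to spot.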
Your input files are all strings, right? Have you tried modifying the second data frame so that it becomes a list/vector of all the common names? – zyurnaidi