2017-03-04 35 views

Variable string matching between two columns

I have a dataset of 20,000 rows which, in its purest form, looks like this:

v1     v2 
1 Case 1 (A v. B)  A v. B 
2 Case 2 (A v. C)  A v. B 
3 Case 2 (A v. C)  C v. B 
4 Case 4 (X v. Z)  X v. Z 
5 Case 5 (B v. A)  A v. B 
6 Case 6 (X v. A)  X v. A 
7 Case 6 (X v. A)  A v. X 
... 

...except that there are many more variations of v1 and v2 (about 150 in practice, but still too many to list).

I would like to return a third column, v3, containing a logical indicator of whether any part of the v1 string matches the v2 string.

library(stringr) 
x$v3 <- with(x, str_detect(v1, v2)) 

I would be grateful if someone could point me in the right direction. The output I am after looks like this:

v1     v2   v3 
1 Case 1 (A v. B)  A v. B  TRUE 
2 Case 2 (A v. C)  A v. B  FALSE 
3 Case 2 (A v. C)  C v. B  FALSE 
4 Case 4 (X v. Z)  X v. Z  TRUE 
5 Case 5 (B v. A)  A v. B  FALSE 
6 Case 6 (X v. A)  X v. A  TRUE 
7 Case 6 (X v. A)  A v. X  FALSE 

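I have been playing with something like the str_detect() call above, which I think is on the right track, and I am looking for a solution or workaround.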

An MWE showing that my str_detect() technique does not work:

x <- structure(list(v1 = c("Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation", 
          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation" 
), v2 = c("Georgia v Russian Federation", " Ethiopia v South Africa Liberia v South Africa", 
      " Cameroon v United Kingdom", " New Zealand v France", " Australia v France", 
      " Nicaragua v United States of America", " Nicaragua v Honduras", 
      " Nauru v Anustralia", " Nnew Zealand v France", " Islamic Republic of Iran v United States of America", 
      " Bosnia and Herzegovina v Serbia and Montenegro", " Spain v Cananda", 
      " Libyan Arab Jamahiriya v United States of America", " Libyan Arab Jamahiriya v United Kingdom", 
      " Democratic Republic of the Congo v Burundi", " Germany v United States of America", 
      " Democratic Republic of the Congo v Belgium", " Liechtenstein v Germany", 
      " Democratic Republic of the Congo v Ugandan", " Democratic Republic of the Congo v Rwandan", 
      " Nicaragua v Colombia", " Djibouti v France", " Georgia v Russian Federation", 
      " Croatia v Serbia", " Mexico v United States of American", " Democratic Republic of the Congo v Rwanda", 
      " Spain v Canada", " Australia v France", " New Zealand v France", 
      " New Zealand v France")), .Names = c("v1", "v2" 
      ), row.names = c(NA, 30L), class = "data.frame") 

Answers

grepl can be used to test whether a single value from v2 is a substring of a value in v1.

You need to apply it to every row, so a quick solution could be: apply(data.frame(v1,v2),MARGIN=1, FUN=function(x) {grepl(x[2],x[1])})
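As a minimal sketch of the same row-wise idea, the snippet below rebuilds the toy table from the question and pairs each v2 with its own v1 using mapply. Using mapply and fixed = TRUE is a swapped-in variation on the one-liner above, not part of the original answer; fixed = TRUE is an assumption that v2 should be read as literal text rather than as a regular expression.

```r
# Toy data copied from the first table in the question.
x <- data.frame(
  v1 = c("Case 1 (A v. B)", "Case 2 (A v. C)", "Case 2 (A v. C)",
         "Case 4 (X v. Z)", "Case 5 (B v. A)", "Case 6 (X v. A)",
         "Case 6 (X v. A)"),
  v2 = c("A v. B", "A v. B", "C v. B", "X v. Z", "A v. B", "X v. A", "A v. X"),
  stringsAsFactors = FALSE
)

# grepl() takes one pattern at a time, so pair each v2 with its own v1.
# fixed = TRUE treats v2 as a literal string, so "." is not a wildcard.
x$v3 <- unname(mapply(
  function(pattern, text) grepl(pattern, text, fixed = TRUE),
  x$v2, x$v1
))

print(x)
```

One reason to prefer mapply here is that apply() first coerces the data frame to a character matrix, which is harmless for two character columns but can surprise you once other column types are present.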

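If you want to ignore differences in the number of spaces (as in row #1), you can use gsub to turn the value in x[2] into a suitable regular expression, replacing " " with " *" so that multiple spaces are allowed.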

In that case, this apply call will work:

apply(x,MARGIN=1, FUN=function(x) {grepl(gsub(" "," *",x[2]),x[1])})
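Note that " *" matches zero or more spaces, so the pattern above also tolerates a missing space. A stricter alternative, sketched below, is to normalise whitespace in both columns before comparing. The helper name normalize_ws and the sample strings are hypothetical and only imitate the spacing problem discussed here; this is not the method used in the answer.

```r
# Hypothetical helper (not from the original answer): trim both ends and
# collapse every run of whitespace into a single space.
normalize_ws <- function(s) gsub("[[:space:]]+", " ", trimws(s))

# Made-up strings imitating the double-space and leading-space problem.
v1 <- c("Discrimination Georgia  v  Russian Federation",
        "Discrimination Georgia  v  Russian Federation")
v2 <- c(" Georgia v Russian Federation", " Spain v Canada")

# Compare the cleaned strings pairwise, treating v2 as literal text.
v3 <- unname(mapply(
  function(pattern, text) grepl(pattern, text, fixed = TRUE),
  normalize_ws(v2), normalize_ws(v1)
))

print(v3)
```

Cleaning the data once also means the pattern never has to be rewritten as a regular expression, so characters such as "." or "(" in a case name cannot change the result.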

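I don't think you are right. In rows 1 and 23, v1 contains two spaces after "Georgia" and after "v", whereas v2 does not contain those double spaces. I will add an explanation of the spaces, and how to deal with them, to the answer –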

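Could you post the function you used here? Maybe recheck the data you posted? I created the data frame you posted in the question and applied the same function, and I got TRUE for rows 1 and 23 and FALSE for all the others –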

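I cleared my memory and it works - thanks! You saved me a lot of time. I'm glad the answer is so simple. I was also able to use the agrepl() function for fuzzy string matching: apply(x, MARGIN = 1, FUN = function(x) {agrepl(gsub(" ", " *", x[2]), x[1], max.distance = .25)}) – beddotcom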
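The fuzzy-matching idea in the comment above can be sketched as follows. One caveat: agrepl() matches its pattern literally by default (fixed = TRUE), so an inserted " *" is treated as ordinary characters rather than as a regular expression. This sketch therefore trims the pattern instead. The two sample rows are illustrative, written in the style of the question's data, and max.distance = 0.1 is an assumed tolerance, not a recommendation.

```r
# Illustrative rows in the style of the question's data; "Cananda" mimics
# the kind of typo present in v2.
v1 <- c("Fisheries Jurisdiction Spain v Canada",
        "Fisheries Jurisdiction Spain v Canada")
v2 <- c(" Spain v Cananda", " Nicaragua v Honduras")

# Trim the stray leading space so that it does not count as an edit.
pattern <- trimws(v2)

# max.distance below 1 is read as a fraction of the pattern length,
# rounded up to a whole number of allowed edits.
v3 <- unname(mapply(
  function(p, text) agrepl(p, text, max.distance = 0.1),
  pattern, v1
))

print(v3)
```

A larger max.distance, such as the .25 used in the comment, accepts more typos but also raises the chance that short party names match the wrong case, so it is worth spot-checking the rows that only match approximately.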
