Variable string matching between two columns
I have a dataset of 20,000 rows that, in its simplest form, looks like this:
v1 v2
1 Case 1 (A v. B) A v. B
2 Case 2 (A v. C) A v. B
3 Case 2 (A v. C) C v. B
4 Case 4 (X v. Z) X v. Z
5 Case 5 (B v. A) A v. B
6 Case 6 (X v. A) X v. A
7 Case 6 (X v. A) A v. X
...
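...except that v1 and v2 take many more distinct values (about 150 in practice, still too many to list).
I would like to add a third column, v3, containing a logical indicator of whether the v2 string occurs anywhere in the v1 string. My attempt so far:
library(stringr)
x$v3 <- with(x, str_detect(v1, v2))
I would be grateful if someone could point me in the right direction. The desired output is: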
v1 v2 v3
1 Case 1 (A v. B) A v. B TRUE
2 Case 2 (A v. C) A v. B FALSE
3 Case 2 (A v. C) C v. B FALSE
4 Case 4 (X v. Z) X v. Z TRUE
5 Case 5 (B v. A) A v. B FALSE
6 Case 6 (X v. A) X v. A TRUE
7 Case 6 (X v. A) A v. X FALSE
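I have been experimenting with the str_detect() call shown above, which I think is on the right track, but I am open to any solution or workaround.
An MWE showing that my str_detect() approach does not work: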
x <- structure(list(v1 = c("Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation"
), v2 = c("Georgia v Russian Federation", " Ethiopia v South Africa Liberia v South Africa",
" Cameroon v United Kingdom", " New Zealand v France", " Australia v France",
" Nicaragua v United States of America", " Nicaragua v Honduras",
" Nauru v Anustralia", " Nnew Zealand v France", " Islamic Republic of Iran v United States of America",
" Bosnia and Herzegovina v Serbia and Montenegro", " Spain v Cananda",
" Libyan Arab Jamahiriya v United States of America", " Libyan Arab Jamahiriya v United Kingdom",
" Democratic Republic of the Congo v Burundi", " Germany v United States of America",
" Democratic Republic of the Congo v Belgium", " Liechtenstein v Germany",
" Democratic Republic of the Congo v Ugandan", " Democratic Republic of the Congo v Rwandan",
" Nicaragua v Colombia", " Djibouti v France", " Georgia v Russian Federation",
" Croatia v Serbia", " Mexico v United States of American", " Democratic Republic of the Congo v Rwanda",
" Spain v Canada", " Australia v France", " New Zealand v France",
" New Zealand v France")), .Names = c("v1", "v2"
), row.names = c(NA, 30L), class = "data.frame")
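For reference, the row-wise match described in the question can be sketched in base R on the toy table from the top of the post. This is only a minimal sketch, not necessarily what any answer used: it assumes the match should be literal (fixed = TRUE), so that characters such as "." and "(" in the case names are not read as regular-expression syntax, and it swaps str_detect() for grepl() paired up with mapply() to avoid the package dependency.

```r
# Toy data copied from the example table in the question.
x <- data.frame(
  v1 = c("Case 1 (A v. B)", "Case 2 (A v. C)", "Case 2 (A v. C)",
         "Case 4 (X v. Z)", "Case 5 (B v. A)", "Case 6 (X v. A)",
         "Case 6 (X v. A)"),
  v2 = c("A v. B", "A v. B", "C v. B", "X v. Z", "A v. B", "X v. A", "A v. X"),
  stringsAsFactors = FALSE
)

# grepl() is not vectorised over its pattern, so pair the rows up with mapply().
# fixed = TRUE (an assumption about the intended semantics) matches v2 literally.
x$v3 <- unname(mapply(
  function(pattern, text) grepl(pattern, text, fixed = TRUE),
  x$v2, x$v1
))

print(x)
```

On this toy table the v3 column agrees with the desired output listed in the question.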
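I don't think you are right. In rows 1 and 23, v1 contains two spaces after "Georgia" and after "v", whereas v2 has no double spaces there. I will add an explanation of the spaces and how to deal with them to the answer.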
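One way to take stray spaces out of the picture, sketched here under assumptions: squish() is a hypothetical helper name (not from the thread), and the strings are shortened, made-up stand-ins rather than the question's data. The idea is to trim both columns and collapse runs of whitespace before matching.

```r
# Hypothetical helper: trim the ends and collapse internal whitespace runs
# into a single space.
squish <- function(s) gsub("[[:space:]]+", " ", trimws(s))

# Illustrative strings only: v1 has doubled spaces, v2 has a leading space.
v1 <- c("Dispute  Georgia v  Russian Federation",
        "Dispute  Georgia v  Russian Federation")
v2 <- c("Georgia v Russian Federation", " Spain v Canada")

# Literal row-wise match (hypothetical helper name).
match_rows <- function(pattern, text) {
  unname(mapply(function(p, s) grepl(p, s, fixed = TRUE), pattern, text))
}

raw_match   <- match_rows(v2, v1)                  # doubled spaces block row 1
clean_match <- match_rows(squish(v2), squish(v1))  # row 1 matches after cleanup

print(data.frame(raw_match, clean_match))
```

Whether whitespace should be normalised in place or only for the comparison depends on how the columns are used later.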
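Could you post the function you are using here? Maybe recheck the data you posted? I created the data frame from your question and applied the same function, and I got TRUE for rows 1 and 23 and FALSE for all the others.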
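I cleared my workspace and now it works - thank you! You saved me a lot of time. Glad the answer was so simple. I was also able to use the agrepl() function for fuzzy string matching: apply(x, MARGIN = 1, FUN = function(x) {agrepl(gsub("", "*", x[2]), x[1], max.distance = .25)}) - beddotcom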
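The fuzzy-matching idea in the last comment can also be written with mapply(), which keeps the two columns as character vectors instead of passing the whole data frame through apply(). This is a rough sketch under assumptions: the case title is a made-up placeholder, the misspelling is taken from the MWE, and max.distance = 0.25 is simply the value from the comment, not a recommendation.

```r
# Placeholder title (hypothetical) paired with two v2-style strings.
v1 <- c("Some case title Spain v Canada", "Some case title Spain v Canada")
v2 <- c(" Spain v Cananda", " Nicaragua v Honduras")

# Exact literal matching fails on the misspelled "Cananda".
exact <- unname(mapply(
  function(p, s) grepl(p, s, fixed = TRUE),
  trimws(v2), v1
))

# agrepl() does approximate matching; a fractional max.distance is read as a
# fraction of the pattern length.
fuzzy <- unname(mapply(
  function(p, s) agrepl(p, s, max.distance = 0.25),
  trimws(v2), v1
))

print(data.frame(v2 = trimws(v2), exact, fuzzy))
```

A looser max.distance catches more typos but also raises the risk of false positives between similar party names, so the threshold is worth checking against a hand-labelled sample.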