2012-08-14 38 views
3

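gsub every other occurrence of a condition

Sometimes I use R to parse text from PDFs when writing articles (I use LaTeX). One thing I'd like to do is convert straight left and right quotes to LaTeX-style left and right quotes.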

LaTeX would turn "dog" into ``dog'' (so two ` for the left quote and two ' for the right).

Here is an example of what I have and what I'd like to get.

#currently 
x <- c('I like "proper" cooking.', 'I heard him say, "I want some too" and "nice".') 

[1] "I like \"proper\" cooking." "I heard him say, \"I want some too\" and \"nice\"." 

#desired outcome 
[1] "I like ``proper'' cooking." "I heard him say, ``I want some too'' and ``nice''." 

EDIT: Thought I'd share the context of the actual use. Using ttmaccer's solution (works on a Windows machine):

g <- function(){ 
    require(qdap) 
    x <- readClipboard() 
    x <- clean(paste2(x, " ")) 
    zz <- mgsub(c("- ", "“", "”"), c("", "``", "''"), x) 
    zz <- gsub("\"([^\"].*?)\"","``\\1''", zz) 
    writeClipboard(noquote(zz), format = 1) 
} 

NOTE: qdap can be downloaded HERE

Answers

3

A naive solution would be:

> gsub("\"([^\"].*?)\"","``\\1''",x) 

[1] "I like ``proper'' cooking."       
[2] "I heard him say, ``I want some too'' and ``nice''." 

but I'm not sure how you would want to handle "some \"text\" with one \"".
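
One way to at least notice the unpaired-quote case is to count the quotes before replacing. The following is only a sketch: `latex_quotes` is a hypothetical helper name (not from the question or from qdap), and it uses the slightly different pattern `"([^"]*)"` so that an empty `""` is also converted. Quotes are paired left to right, so a trailing unpaired quote is simply left as it is, with a warning.

```r
# Hypothetical helper (an assumption, not part of the original answer):
# convert straight double quotes to LaTeX quotes and warn when an
# element contains an odd number of quotes.
latex_quotes <- function(x) {
  # Count the straight double quotes in each element.
  n_quotes <- lengths(regmatches(x, gregexpr('"', x, fixed = TRUE)))
  odd <- n_quotes %% 2 == 1
  if (any(odd)) {
    warning("unbalanced double quotes in element(s): ",
            paste(which(odd), collapse = ", "))
  }
  # Pair quotes left to right: "..." becomes ``...''
  gsub('"([^"]*)"', "``\\1''", x)
}

latex_quotes('I like "proper" cooking.')
# [1] "I like ``proper'' cooking."
```

Since the output gets checked by hand anyway, the warning only serves to point at the elements that deserve a closer look.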

+0

Works. I'll be checking over the output anyway, so I'm just looking for something that works most of the time (a time saver). – 2012-08-14 02:10:49

1

A two-stage solution:

Stage 1: use "((?:[^\\"]|\\.)*)" to match a double-quoted string
Stage 2: use \\"([^\\"]*)\\" to replace the \" within group 1 from stage 1
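
The two stages could be sketched in R roughly as follows. This assumes the text contains literal backslash-escaped quotes (`\"`) inside quoted strings, which is the case these patterns target. The function name `two_stage_quotes` is hypothetical, and rendering the inner pair as LaTeX single quotes (`` `...' ``) is my assumption, since the answer does not say what the replacement should be. The patterns are the ones above, written as R string literals (so every backslash is doubled once more) and run with `perl = TRUE` because of the `(?:...)` group.

```r
# Hypothetical sketch of the two-stage idea (names and the choice of
# inner replacement are assumptions, not from the answer).
two_stage_quotes <- function(x) {
  outer_pat <- '"((?:[^\\\\"]|\\\\.)*)"'   # stage 1: a whole quoted string
  inner_pat <- '\\\\"([^\\\\"]*)\\\\"'     # stage 2: an escaped pair \"...\"
  m <- gregexpr(outer_pat, x, perl = TRUE)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    if (length(hits) == 0) return(hits)    # element had no quoted string
    # Drop the surrounding quotes to get group 1.
    body <- substring(hits, 2, nchar(hits) - 1)
    # Assumed replacement: inner escaped quotes become `...'
    body <- gsub(inner_pat, "`\\1'", body, perl = TRUE)
    paste0("``", body, "''")
  })
  x
}

two_stage_quotes('He said "she said \\"hi\\" to me" today')
# [1] "He said ``she said `hi' to me'' today"
```

On input without escaped quotes, such as the vector `x` from the question, this gives the same result as the single `gsub` call.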