2011-09-20 43 views
3

Using "cat" to write non-English characters into a .html file (in R)

Here is the code showing the problem:

myPath = getwd() 
cat("abcd", append = T, file =paste(myPath,"temp1.html", sep = "\\")) # This is fine 
cat("<BR/><BR/><BR/>", append = T, file =paste(myPath,"temp1.html", sep = "\\")) # This is fine 
cat("שלום", append = F, file =paste(myPath,"temp1.html", sep = "\\")) # This text gets garbled when the html is opened using google chrome on windows 7. 
cat("שלום", append = F, file =paste(myPath,"temp1.txt", sep = "\\")) # but if I open this file in a text editor - the text looks fine 

# The text in the HTML file looks as if I were to run this in R: 
(x <- iconv("שלום", from = "CP1252", to = "UTF8")) 
# But if I were to try and put it into the file, it wouldn't put anything in: 
cat(x, append = T, file =paste(myPath,"temp1.html", sep = "\\")) # empty 

Edit: I also tried setting the encoding as shown below (without success):

ff <-file(paste(myPath,"temp1.html", sep = "\\"), encoding="CP1252") 
cat("שלום", append = F, file =ff) 
ff<-file(paste(myPath,"temp1.html", sep = "\\"), encoding="utf-8") 
cat("שלום", append = F, file =ff) 
ff<-file(paste(myPath,"temp1.html", sep = "\\"), encoding="ANSI_X3.4-1986") 
cat("שלום", append = F, file =ff) 
ff<-file(paste(myPath,"temp1.html", sep = "\\"), encoding="iso8859-8") 
cat("שלום", append = F, file =ff) 

Any suggestions? Thanks.

It looks like you need some sleep... =) `Sys.sleep(sample(3600 * 1.5:8.5, 1))` – aL3xa

Take a look at this question [about saving a csv with UTF-8 encoding](http://stackoverflow.com/q/7402307/168747). – Marek

Hi Marek, when I try using it, the text I get turns into "\xf9\xec\xe5\xed" –
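
For what it's worth, the escaped bytes in the comment above look like the Windows-1255 (Hebrew) encoding of שלום rather than CP1252, which may be why converting with `from = "CP1252"` garbles the text. A minimal sketch, assuming the session's native code page is CP1255 and that the local `iconv` build recognises that name (neither is confirmed in the thread; the variable names are made up for illustration):

```r
# The four bytes reported in the comment, built explicitly so the
# example does not depend on the encoding of this source file.
native_bytes <- rawToChar(as.raw(c(0xf9, 0xec, 0xe5, 0xed)))

# Assumption: the bytes are Windows-1255 (Hebrew), not CP1252.
utf8_txt <- iconv(native_bytes, from = "CP1255", to = "UTF-8")

# Inspect the converted string as raw bytes; in UTF-8 each Hebrew
# letter takes two bytes, so four letters give eight bytes.
print(charToRaw(utf8_txt))
```

If `iconv` returns `NA` here, the code page name is probably not supported on that platform; `iconvlist()` shows the names that are.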

Answers

1

Your code is a bit redundant. Is temp1.txt on line 5 a typo (for .html)? Anyway, maybe you should set the charset inside a <meta> tag.

Take this as an example:

<html> 
<head> 
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> 
</head> 
<body> 
<% 
cat("abcd") 
cat("<BR/><BR/><BR/>") 
cat("שלום") 
cat("שלום") 
(x <- iconv("שלום", from = "CP1252", to = "UTF8")) 
cat(x) 
-%> 
</body> 
</html> 

This is brew code, so if you go ahead and brew it, you'll get the correct result. Long story short, the keyword is charset.

1

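The problem is not with R (R correctly produces UTF-8 encoded output)... it's just that your web browser assumes the wrong encoding when none is explicitly specified. Just use the following snippet (from inside R) instead: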

<html> 
    <head> 
     <meta http-equiv="content-type" content="text/html; charset=utf-8"> 
    </head> 
    <body> 
     שלום 
    </body> 
</html> 

This specifies the correct encoding (UTF-8), and hence causes the browser to treat the text that follows correctly.
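
One way to produce such a page from R, as a rough sketch rather than a definitive fix: convert the text to UTF-8 explicitly and write the bytes unchanged, so that the `charset=utf-8` declaration matches what is actually in the file. The output path `out` and the variable names are made up for illustration, and using `useBytes = TRUE` to stop R from re-encoding the text on Windows is an assumption about the asker's setup.

```r
# "shalom" written with Unicode escapes, so the encoding of the
# script file itself does not matter.
txt <- "\u05e9\u05dc\u05d5\u05dd"

html <- c(
  "<html>",
  "<head>",
  "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">",
  "</head>",
  "<body>",
  enc2utf8(txt),   # make sure the string really is UTF-8
  "</body>",
  "</html>"
)

# Hypothetical output location; swap in your own path.
out <- file.path(tempdir(), "temp1.html")

# useBytes = TRUE writes the UTF-8 bytes as they are, without
# translating them to the native code page first.
writeLines(html, out, useBytes = TRUE)
```

Opening `out` in a browser should then show the Hebrew text, provided the browser honours the meta tag.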

Damn, I was 2 minutes late! =/ – aL3xa

1

Try it this way:

cat("abcd", file = (con <- file("temp1.html", "w", encoding="UTF-8"))); close(con) 

Thanks gd047, but it doesn't work. It leaves me with this: ש××××. instead of שלום –