2016-04-04 213 views
1

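I am trying to build a data frame of Brazilian addresses by querying a web service with postal codes. I can retrieve a single result and store it in a data frame, but when I search for several postal codes (for example, from a vector), my data frame keeps only the last element. Can anyone help me? R: How do I append the results of a (for) loop to a data frame?

See the code below: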

############### 
library(httr) 
library(RCurl) 
library(XML) 
library(dplyr) 
############### 

# ZIPs I want to search for: 
vectorzip <- c("71938360", "70673052", "71020510") 
j <- length(vectorzip) 

# loop: 
for(i in 1:j) { 

# Save the URL of the xml file in a variable: 
xml.url <- getURL(paste("http://cep.republicavirtual.com.br/web_cep.php?cep=",vectorzip[i], sep = ""), encoding = "ISO-8859-1") 
xml.url 

# Use the xmlTreeParse-function to parse xml file directly from the web: 
xmlfile <- xmlTreeParse(xml.url) 
xmlfile 
# the xml file is now saved as an object you can easily work with in R: 
class(xmlfile) 

# Use the xmlRoot-function to access the top node: 
xmltop = xmlRoot(xmlfile) 

# have a look at the XML-code of the first subnodes: 
print(xmltop) 

# To extract the XML-values from the document, use xmlSApply: 
zips <- xmlSApply(xmlfile, function(x) xmlSApply(x, xmlValue)) 
zips 
# Finally, get the data in a data-frame and have a look at the first rows and columns: 
zips <- NULL 
zips <- rbind(zips_df, data.frame(t(zips),row.names=NULL)) 

View(zips_df)} 
+1

What does the zips <- NULL line do, and where is zips_df defined?

+0

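Growing an object with rbind is generally not a good idea. A better approach is to define an empty data frame of a specific size (thereby allocating the necessary memory) and then fill in the rows. – RHertel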
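The preallocation idea in this comment could be sketched as follows. This is only an illustration under stated assumptions: fake_lookup is a hypothetical stand-in for the web-service call (it needs no network access), and the column names cep and uf are invented for the example.

```r
# Hypothetical stand-in for one web-service lookup: returns a one-row
# data frame for a postal code (assumption, not part of the original code).
fake_lookup <- function(zip) {
  data.frame(cep = zip, uf = "DF", stringsAsFactors = FALSE)
}

vectorzip <- c("71938360", "70673052", "71020510")
n <- length(vectorzip)

# Allocate all rows up front, with the column types fixed in advance.
zips_df <- data.frame(cep = character(n), uf = character(n),
                      stringsAsFactors = FALSE)

# Fill one row per iteration instead of calling rbind() each time.
for (i in seq_len(n)) {
  zips_df[i, ] <- fake_lookup(vectorzip[i])
}

print(zips_df)
```

This only works when the number of rows and the column layout are known before the loop starts, which is the case here because there is one row per postal code.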

Answers

0

You want to:

a) define zips_df
b) define zips_df outside the loop
c) not set zips_df to NULL inside the loop :)

############### 
library(httr) 
library(RCurl) 
library(XML) 
library(dplyr) 
############### 

# ZIPs I want to search for: 
vectorzip <- c("71938360", "70673052", "71020510") 
j <- length(vectorzip) 
zips_df <- data.frame() 

i<-1 
# loop: 
for(i in 1:j) { 

    # Save the URL of the xml file in a variable: 
    xml.url <- getURL(paste("http://cep.republicavirtual.com.br/web_cep.php?cep=",vectorzip[i], sep = ""), encoding = "ISO-8859-1") 
    xml.url 

    # Use the xmlTreeParse-function to parse xml file directly from the web: 
    xmlfile <- xmlTreeParse(xml.url) 
    xmlfile 
    # the xml file is now saved as an object you can easily work with in R: 
    class(xmlfile) 

    # Use the xmlRoot-function to access the top node: 
    xmltop = xmlRoot(xmlfile) 

    # have a look at the XML-code of the first subnodes: 
    print(xmltop) 

    # To extract the XML-values from the document, use xmlSApply: 
    zips <- xmlSApply(xmlfile, function(x) xmlSApply(x, xmlValue)) 
    zips 
    # Finally, get the data in a data-frame and have a look at the first rows and columns: 

    zips_df <- rbind(zips_df, data.frame(t(zips),row.names=NULL)) 
} 

    View(zips_df) 

You get this:

> zips_df 
  resultado.text     resultado_txt.text uf.text cidade.text        bairro.text tipo_logradouro.text  logradouro.text
1              1 sucesso - cep completo      DF  Taguatinga Sul (Ãguas Claras)                  Rua               09
2              1 sucesso - cep completo      DF    Cruzeiro     Setor Sudoeste               Quadra      300 Bloco O
3              1 sucesso - cep completo      DF       Guará            Guará I               Quadra QI 11 Conjunto U
+0

Thank you very much, Serban!

0

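Please try to provide a minimal working example. Your example contains many lines of code that have nothing to do with your actual problem. Had you tried to remove that unnecessary code, you would probably have noticed that the zips <- NULL line erases the zip information before it is saved. Second, you reference a zips_df object, but it is never created in your code.

To answer your question: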
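A minimal example of this kind might look like the sketch below. It is a simplified version of the mistake, not the question's exact code: fake_values is a hypothetical stand-in for the parsed XML values, the names acc, cep and uf are invented for the illustration, and no network access is needed.

```r
# Hypothetical stand-in for the parsed XML values of one postal code.
fake_values <- function(zip) c(cep = zip, uf = "DF")

vectorzip <- c("71938360", "70673052", "71020510")

# Simplified form of the mistake: the accumulator is reset on every pass,
# so rows from earlier iterations are lost.
for (zip in vectorzip) {
  zips <- fake_values(zip)
  acc <- NULL
  acc <- rbind(acc, data.frame(t(zips), row.names = NULL,
                               stringsAsFactors = FALSE))
}
nrow(acc)      # 1: only the last postal code survives

# Corrected pattern: create the accumulator once, before the loop.
zips_df <- data.frame()
for (zip in vectorzip) {
  zips <- fake_values(zip)
  zips_df <- rbind(zips_df, data.frame(t(zips), row.names = NULL,
                                       stringsAsFactors = FALSE))
}
nrow(zips_df)  # 3: one row per postal code
```

Reducing the problem to a few lines like this makes the reset inside the loop easy to spot.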

  • Add a line that creates zips_df as an empty data frame object before the loop starts:

    vectorzip <- c("71938360", "70673052", "71020510") 
    j <- length(vectorzip) 
    zips_df <- data.frame() 
    
  • Remove the line where you erase the zips object (zips <- NULL)

  • Change the line that grows the zips_df data.frame so that the complete data is saved to the data.frame object instead of the temporary "zips" variable:

    zips_df <- rbind(zips_df, data.frame(t(zips),row.names=NULL)) 
    

I suggest removing the "View" line and inspecting the data.frame with print:

print(zips_df) 
  resultado.text     resultado_txt.text uf.text cidade.text              bairro.text tipo_logradouro.text  logradouro.text
1              1 sucesso - cep completo      DF  Taguatinga Sul (Ã\u0081guas Claras)                  Rua               09
2              1 sucesso - cep completo      DF    Cruzeiro           Setor Sudoeste               Quadra      300 Bloco O
3              1 sucesso - cep completo      DF       Guará                  Guará I               Quadra QI 11 Conjunto U
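As an alternative to growing the data frame inside the loop (not part of the answer above, just a common base R pattern), the rows can be collected in a list and bound once at the end. In this sketch, fake_lookup is a hypothetical stand-in for the web-service call and the column names are invented.

```r
# Hypothetical stand-in for one web-service lookup (assumption).
fake_lookup <- function(zip) {
  data.frame(cep = zip, uf = "DF", stringsAsFactors = FALSE)
}

vectorzip <- c("71938360", "70673052", "71020510")

# Build one single-row data frame per postal code...
rows <- lapply(vectorzip, fake_lookup)

# ...and bind them all together in a single rbind() call.
zips_df <- do.call(rbind, rows)

print(zips_df)
```

Binding once at the end avoids copying the accumulated data frame on every iteration, which may matter for long vectors of postal codes.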
+0

Thank you very much, Andre. I appreciate your suggestions and your answer!
