2017-09-01 71 views

I have been running the excellent code from the following post on geocoding with R and Google Maps, and I have run into a problem.

https://www.shanelynn.ie/massive-geocoding-with-r-and-google-maps/

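It works like a dream, but it randomly stops partway through and throws an error. This happens at different points when using the same data set. I have taken one of the addresses that threw an error and run it through the code manually, and it worked fine. I think a server or timeout problem may be causing this. Has anyone else used this code and had a similar problem? Did you find a solution?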

The error always looks like this:

contacting http://maps.googleapis.com/maps/api/geocode/json?address=NICHOLS,%20ACT,%202613,%20AUSTRALIA&sensor=false...Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=NICHOLS,%20ACT,%202613,%20AUSTRALIA&sensor=false 
Error in geo_reply$status : $ operator is invalid for atomic vectors 
In addition: Warning messages: 
1: In readLines(connect, warn = FALSE) : 
    cannot open URL 'http://maps.googleapis.com/maps/api/geocode/json?address=NICHOLS,%20ACT,%202613,%20AUSTRALIA&sensor=false': HTTP status was '500 Internal Server Error' 
2: In geocode(address, output = "all", messaging = TRUE, override_limit = TRUE) : 
geocoding failed for "NICHOLS, ACT, 2613, AUSTRALIA". 
if accompanied by 500 Internal Server Error with using dsk, try google. 
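The message `$ operator is invalid for atomic vectors` indicates that `geo_reply` was not a list when `geo_reply$status` was evaluated, which is consistent with the request having failed (the HTTP 500 warning above) rather than with a bad address. The following is a minimal sketch of how the status lookup could be guarded. It runs without a network connection; `extract_status` and the `"REQUEST_FAILED"` label are hypothetical names introduced here, and using `NA` as the failed reply is an assumption about what `geocode()` returned.

```r
# Hypothetical helper: return the status string, or a placeholder when the
# reply is not the list that a successful geocode(output = "all") call returns.
extract_status <- function(geo_reply) {
  if (is.list(geo_reply) && !is.null(geo_reply$status)) {
    return(geo_reply$status)
  }
  "REQUEST_FAILED"  # assumed label, not a Google status code
}

failed_reply <- NA               # assumed stand-in for a failed request
ok_reply <- list(status = "OK")  # stand-in for a successful reply

# Using `$` on an atomic vector raises the error seen in the log above.
err <- tryCatch(failed_reply$status, error = function(e) conditionMessage(e))
print(err)

# The guarded lookup never calls `$` on a non-list value.
print(extract_status(failed_reply))
print(extract_status(ok_reply))
```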

My addresses are in a data table like this (about 2000 records):

| MAIL_STATE | MAIL_SUBBURB | MAIL_POSTCODE |
| ---------- | ------------ | ------------- |
| ACT | NICHOLLS | 2613 |

The addresses are created using the code below:

addresses = paste0(data$MAIL_SUBURB,", ",data$MAIL_STATE,", ",data$MAIL_POSTCODE,", AUSTRALIA", sep = "") 

The full code, which uses `addresses`, is below:

# define a function that will process Google's server responses for us.
getGeoDetails <- function(address){
  # use the geocode function to query Google's servers
  geo_reply = geocode(address, output='all', messaging=TRUE, override_limit=TRUE)
  # now extract the bits that we need from the returned list
  answer <- data.frame(lat=NA, long=NA, accuracy=NA, formatted_address=NA, address_type=NA, status=NA)
  answer$status <- geo_reply$status

  # if we are over the query limit - want to pause for 24 hours
  while(geo_reply$status == "OVER_QUERY_LIMIT"){
    print("OVER QUERY LIMIT - Pausing for 24 hours at:")
    time <- Sys.time()
    print(as.character(time))
    Sys.sleep(60*60*24)
    geo_reply = geocode(address, output='all', messaging=TRUE, override_limit=TRUE)
    answer$status <- geo_reply$status
  }

  # return NAs if we didn't get a match:
  if (geo_reply$status != "OK"){
    return(answer)
  }
  # else, extract what we need from the Google server reply into a dataframe:
  answer$lat <- geo_reply$results[[1]]$geometry$location$lat
  answer$long <- geo_reply$results[[1]]$geometry$location$lng
  if (length(geo_reply$results[[1]]$types) > 0){
    answer$accuracy <- geo_reply$results[[1]]$types[[1]]
  }
  answer$address_type <- paste(geo_reply$results[[1]]$types, collapse=',')
  answer$formatted_address <- geo_reply$results[[1]]$formatted_address

  return(answer)
}

# initialise a dataframe to hold the results
geocoded <- data.frame()
# find out where to start in the address list (if the script was interrupted before):
startindex <- 1
# if a temp file exists - load it up and count the rows!
tempfilename <- paste0(infile, '_temp_geocoded.rds')
if (file.exists(tempfilename)){
  print("Found temp file - resuming from index:")
  geocoded <- readRDS(tempfilename)
  startindex <- nrow(geocoded)
  print(startindex)
}

# Start the geocoding process - address by address. geocode() function takes care of query speed limit.
for (ii in seq(startindex, length(addresses))){
  print(paste("Working on index", ii, "of", length(addresses)))
  # query the google geocoder - this will pause here if we are over the limit.
  result = getGeoDetails(addresses[ii])
  print(result$status)
  result$index <- ii
  # append the answer to the results file.
  geocoded <- rbind(geocoded, result)
  # save temporary results as we are going along
  saveRDS(geocoded, tempfilename)
}

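This has nothing to do with the code. I just tried http://maps.googleapis.com/maps/api/geocode/json?address=NICHOLS,%20ACT,%202613,%20AUSTRALIA&sensor=false and it works fine. I suspect a limit on Google's servers (a limited number of calls per second or per minute).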


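@EricLecoutre, thanks. As I said, the code works fine... right up to the point where it fails! There is no pattern to the failures. They are random. Is there a way to build a throttle into the code to slow down the number of requests per minute, or is it more likely a network problem that delays the results?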
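One way to approach the throttle asked about above is to retry a failed request after a pause instead of letting the loop stop. The sketch below is illustrative rather than a tested fix for this data set: `retry_geocode`, its `max_tries` and `wait_seconds` parameters, and the `"REQUEST_FAILED"` label are hypothetical, and the geocoder is passed in as a function so the example can run with a stub and no network access.

```r
# Hypothetical wrapper: call geocode_fn(address) up to max_tries times,
# pausing longer after each failed attempt.
retry_geocode <- function(address, geocode_fn, max_tries = 3, wait_seconds = 2) {
  for (attempt in seq_len(max_tries)) {
    # Treat an R error the same way as an unusable reply.
    reply <- tryCatch(geocode_fn(address), error = function(e) NULL)
    # A usable reply is a list that carries a status field.
    if (is.list(reply) && !is.null(reply$status)) {
      return(reply)
    }
    if (attempt < max_tries) {
      Sys.sleep(wait_seconds * attempt)  # simple linear back-off
    }
  }
  # Give up with a placeholder, so callers can still read $status.
  list(status = "REQUEST_FAILED", results = list())  # assumed label
}

# Stub geocoder that fails twice and then succeeds, to exercise the retries.
stub_calls <- 0
flaky_geocoder <- function(address) {
  stub_calls <<- stub_calls + 1
  if (stub_calls < 3) NA else list(status = "OK", results = list())
}

reply <- retry_geocode("NICHOLLS, ACT, 2613, AUSTRALIA", flaky_geocoder,
                       max_tries = 3, wait_seconds = 0)
print(reply$status)
```

With ggmap loaded, the first line of `getGeoDetails` could pass `function(a) geocode(a, output = 'all', messaging = TRUE, override_limit = TRUE)` as `geocode_fn`. Because the placeholder reply has a `status` other than `"OK"`, the existing "return NAs" branch would then handle an address that still fails after all attempts.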

Answers


Personally, I like this version.

# Geocoding a csv column of "addresses" in R 

#load ggmap 
library(ggmap) 

# Select the file from the file chooser 
fileToLoad <- file.choose(new = TRUE) 

# Read in the CSV data and store it in a variable 
origAddress <- read.csv(fileToLoad, stringsAsFactors = FALSE) 

# Initialize the data frame 
geocoded <- data.frame(stringsAsFactors = FALSE) 

# Loop through the addresses to get the latitude and longitude of each address and add it to the 
# origAddress data frame in new columns lat and lon 
for(i in 1:nrow(origAddress)) 
{ 
    # Print("Working...") 
    result <- geocode(origAddress$addresses[i], output = "latlona", source = "google") 
    origAddress$lon[i] <- as.numeric(result[1]) 
    origAddress$lat[i] <- as.numeric(result[2]) 
    origAddress$geoAddress[i] <- as.character(result[3]) 
} 
# Write a CSV file containing origAddress to the working directory 
write.csv(origAddress, "geocoded.csv", row.names=FALSE) 



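I like the simplicity of your approach. That said, the data I have to work with is very rough and contains many incorrect addresses. One of them forces the loop to fail, as shown below:

Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=CRAIGIE,%20ACT,%202632,%20AUSTRALIA&sensor=false
Error in `[.data.frame`(result, 3) : undefined columns selected
In addition: Warning message:
geocode failed with status ZERO_RESULTS, location = "CRAIGIE, ACT, 2632, AUSTRALIA"

I need to figure out a way to handle these errors to make this workable.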


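Well, I'm not an R expert by any means, but I think you will have to handle the errors somehow and clean up the data set. Maybe this will help, at least for setting up some try...catch blocks: https://www.r-bloggers.com/error-handling-in-r/ – ryguy72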
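As a rough illustration of the try...catch idea, the loop from the answer could wrap each lookup in `tryCatch()` and check that three values came back before indexing them. This is a sketch under assumptions, not the answer author's code: `geocode_all` is a hypothetical helper, the geocoder is passed in as `geocode_fn` so the example runs offline, and the stub's return shapes are only assumed to imitate what `geocode(..., output = "latlona")` gives for good and bad addresses.

```r
# Hypothetical helper: geocode every address, leaving NA where a lookup fails.
geocode_all <- function(addresses, geocode_fn) {
  out <- data.frame(address = addresses, lon = NA_real_, lat = NA_real_,
                    geoAddress = NA_character_, stringsAsFactors = FALSE)
  for (i in seq_along(addresses)) {
    # An R error (for example a failed HTTP request) becomes NULL.
    result <- tryCatch(geocode_fn(addresses[i]), error = function(e) NULL)
    # Only index result[[3]] when three columns exist and lon is not NA.
    ok <- !is.null(result) && length(result) >= 3 && !is.na(result[[1]][1])
    if (ok) {
      out$lon[i] <- as.numeric(result[[1]][1])
      out$lat[i] <- as.numeric(result[[2]][1])
      out$geoAddress[i] <- as.character(result[[3]][1])
    }
  }
  out
}

# Stub geocoder: one good address, one with no match, one request error.
stub_geocoder <- function(address) {
  if (address == "boom") stop("HTTP 500")
  if (address == "bad") return(data.frame(lon = NA_real_, lat = NA_real_))
  data.frame(lon = 149.1, lat = -35.2, address = "nicholls act 2613, australia",
             stringsAsFactors = FALSE)
}

geocoded <- geocode_all(c("good", "bad", "boom"), stub_geocoder)
print(geocoded)
```

Leaving failures as `NA` rather than a sentinel value keeps the rows easy to filter afterwards with `is.na(geocoded$lon)`, so the bad addresses can be cleaned up and re-run.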


This seems to work well...

for(i in 1:nrow(origAddress)) {
    # Print("Working...")
    result <- geocode(origAddress$addresses[i], output = "latlona", source = "google")
    if (is.na(result$lon)) {
        origAddress$lon[i] <- 99
        origAddress$lat[i] <- 99
        origAddress$geoAddress[i] <- "error"
    } else {
        result <- geocode(origAddress$addresses[i], output = "latlona", source = "google")
        origAddress$lon[i] <- as.numeric(result[1])
        origAddress$lat[i] <- as.numeric(result[2])
        origAddress$geoAddress[i] <- as.character(result[3])
    }
}