2014-10-17 25 views
2

我有下面列出的SPARQL查询(对长度表示歉意)我想将此查询的结果转换为R数据框,类似于预览here中可用的数据框将查询内容粘贴到中输入查询窗口。在一个句子中,我只对下载数字,列标题和第一列标识地理区域感兴趣。当运行当前查询并试图强制数据框中的结果并在gggplot中使用它时,我一直得到一个错误ggplot2不知道如何处理班级列表的数据,这是因为返回的数据没有与测试查询内容时在预览窗口中返回的CSV文件相似。我的问题是我应该在下面的代码中更改什么,它会生成一个R数据框对象,其值和结构对应于下面的预览表。 query results将SPARQL结果作为CSV获取到R

代码导入数据

# Libs 
    library(SPARQL) 

    # Source the data 
    ## Define endpoint URL. 
    endpoint <- "http://data.opendatascotland.org/sparql?query" 

    ### Create Query and download table for the SIMD rank 
    query.simd <- "PREFIX stats: <http://statistics.data.gov.uk/id/statistical-geography/> 
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
    PREFIX simd: <http://data.opendatascotland.org/def/simd/> 
    PREFIX cube: <http://purl.org/linked-data/cube#> 
    PREFIX stats_dim: <http://data.opendatascotland.org/def/statistical-dimensions/> 
    PREFIX year: <http://reference.data.gov.uk/id/year/> 

    SELECT DISTINCT 
    ?dz_label 
    ?overall_rank 
    ?income_deprivation_rank 
    ?employment_deprivation_rank 
    ?health_deprivation_rank 
    ?education_deprivation_rank 
    ?access_deprivation_rank 
    ?housing_deprivation_rank 
    ?crime_deprivation_rank 

    WHERE { 

    GRAPH <http://data.opendatascotland.org/graph/simd/rank> { 
    ?overall_rank_observation stats_dim:refArea ?dz . 
    ?overall_rank_observation stats_dim:refPeriod year:2012 . 
    ?overall_rank_observation simd:rank ?overall_rank . 
    } 

    GRAPH <http://data.opendatascotland.org/graph/simd/income-rank> { 
    ?income_rank_observation stats_dim:refArea ?dz . 
    ?income_rank_observation stats_dim:refPeriod year:2012 . 
    ?income_rank_observation simd:incomeRank ?income_deprivation_rank . 
    } 

    GRAPH <http://data.opendatascotland.org/graph/simd/employment-rank> { 
    ?employment_rank_observation stats_dim:refArea ?dz . 
    ?employment_rank_observation stats_dim:refPeriod year:2012 . 
    ?employment_rank_observation simd:employmentRank ?employment_deprivation_rank . 
    } 

    GRAPH <http://data.opendatascotland.org/graph/simd/health-rank> { 
    ?health_rank_observation stats_dim:refArea ?dz . 
    ?health_rank_observation stats_dim:refPeriod year:2012 . 
    ?health_rank_observation simd:healthRank ?health_deprivation_rank . 
    } 

    GRAPH <http://data.opendatascotland.org/graph/simd/education-rank> { 
    ?education_rank_observation stats_dim:refArea ?dz . 
    ?education_rank_observation stats_dim:refPeriod year:2012 . 
    ?education_rank_observation simd:educationRank ?education_deprivation_rank . 
    } 

    GRAPH <http://data.opendatascotland.org/graph/simd/geographic-access-rank> { 
    ?access_rank_observation stats_dim:refArea ?dz . 
    ?access_rank_observation stats_dim:refPeriod year:2012 . 
    ?access_rank_observation simd:geographicAccessRank ?access_deprivation_rank . 
    } 

    GRAPH <http://data.opendatascotland.org/graph/simd/housing-rank> { 
    ?housing_rank_observation stats_dim:refArea ?dz . 
    ?housing_rank_observation stats_dim:refPeriod year:2012 . 
    ?housing_rank_observation simd:housingRank ?housing_deprivation_rank . 
    } 

    GRAPH <http://data.opendatascotland.org/graph/simd/crime-rank> { 
    ?crime_rank_observation stats_dim:refArea ?dz . 
    ?crime_rank_observation stats_dim:refPeriod year:2012 . 
    ?crime_rank_observation simd:crimeRank ?crime_deprivation_rank . 
    } 

    { 
    SELECT ?dz ?dz_label WHERE 
    { 
    ?dz a <http://data.opendatascotland.org/def/geography/DataZone> . 
    ?dz rdfs:label ?dz_label . 
    } 
    } 
    }" 

    # Make the data 
    dta.main <- SPARQL(endpoint, query.simd, format="csv") 
+0

什么是服务器托管的数据? “format = csv”是常见的但不是标准。它可能拼写为“输出”。理想情况下,使用HTTP请求的“Accept”头来询问服务器。 – AndyS 2014-10-19 15:12:47

+0

感谢您表示兴趣,服务器是http://www.opendatascotland.org/。据推测,我应该能够通过在端点地址中提供.csv扩展名来获取CSV表,但我一直在获取返回的XML内容不可读的信息。这很奇怪,因为通过网站进行测试时查询起作用。 – Konrad 2014-10-19 20:45:26

+0

它运行的是Apache Jena Fuseki 1.0.0(如果他们认为Fuseki是免费和开放源代码的,并且他们对此没有贡献)。他们已经使用原始SPARQL协议之上的某种处理器对其进行了分层。有一个联系人的电子邮件地址 - 你需要问问他们。 – AndyS 2014-10-20 09:35:08

回答

0

对于那些谁可能有兴趣在这个问题上,一组试验和错误,并检查与它出现在网站开发之后工作的解决办法请使用以下命令执行查询:

dta.simd<- SPARQL(url = endpoint, query = query.simd, format = "csv")$results 

和下面的查询来源数据:

## Define endpoint URL. 
    endpoint <- "http://data.opendatascotland.org/sparql.csv" 

    ### Create Query and download table for the SIMD rank 
    query.simd <- "PREFIX stats: <http://statistics.data.gov.uk/id/statistical-geography/> 
     PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
     PREFIX simd: <http://data.opendatascotland.org/def/simd/> 
     PREFIX cube: <http://purl.org/linked-data/cube#> 
     PREFIX stats_dim: <http://data.opendatascotland.org/def/statistical-dimensions/> 
     PREFIX year: <http://reference.data.gov.uk/id/year/> 

     SELECT DISTINCT 
     ?dz_label 
     ?overall_rank 
     ?income_deprivation_rank 
     ?employment_deprivation_rank 
     ?health_deprivation_rank 
     ?education_deprivation_rank 
     ?access_deprivation_rank 
     ?housing_deprivation_rank 
     ?crime_deprivation_rank 

     WHERE { 

     GRAPH <http://data.opendatascotland.org/graph/simd/rank> { 
     ?overall_rank_observation stats_dim:refArea ?dz . 
     ?overall_rank_observation stats_dim:refPeriod year:2012 . 
     ?overall_rank_observation simd:rank ?overall_rank . 
     } 

     GRAPH <http://data.opendatascotland.org/graph/simd/income-rank> { 
     ?income_rank_observation stats_dim:refArea ?dz . 
     ?income_rank_observation stats_dim:refPeriod year:2012 . 
     ?income_rank_observation simd:incomeRank ?income_deprivation_rank . 
     } 

     GRAPH <http://data.opendatascotland.org/graph/simd/employment-rank> { 
     ?employment_rank_observation stats_dim:refArea ?dz . 
     ?employment_rank_observation stats_dim:refPeriod year:2012 . 
     ?employment_rank_observation simd:employmentRank ?employment_deprivation_rank . 
     } 

     GRAPH <http://data.opendatascotland.org/graph/simd/health-rank> { 
     ?health_rank_observation stats_dim:refArea ?dz . 
     ?health_rank_observation stats_dim:refPeriod year:2012 . 
     ?health_rank_observation simd:healthRank ?health_deprivation_rank . 
     } 

     GRAPH <http://data.opendatascotland.org/graph/simd/education-rank> { 
     ?education_rank_observation stats_dim:refArea ?dz . 
     ?education_rank_observation stats_dim:refPeriod year:2012 . 
     ?education_rank_observation simd:educationRank ?education_deprivation_rank . 
     } 

     GRAPH <http://data.opendatascotland.org/graph/simd/geographic-access-rank> { 
     ?access_rank_observation stats_dim:refArea ?dz . 
     ?access_rank_observation stats_dim:refPeriod year:2012 . 
     ?access_rank_observation simd:geographicAccessRank ?access_deprivation_rank . 
     } 

     GRAPH <http://data.opendatascotland.org/graph/simd/housing-rank> { 
     ?housing_rank_observation stats_dim:refArea ?dz . 
     ?housing_rank_observation stats_dim:refPeriod year:2012 . 
     ?housing_rank_observation simd:housingRank ?housing_deprivation_rank . 
     } 

     GRAPH <http://data.opendatascotland.org/graph/simd/crime-rank> { 
     ?crime_rank_observation stats_dim:refArea ?dz . 
     ?crime_rank_observation stats_dim:refPeriod year:2012 . 
     ?crime_rank_observation simd:crimeRank ?crime_deprivation_rank . 
     } 

    { 
     SELECT ?dz ?dz_label WHERE 
    { 
     ?dz a <http://data.opendatascotland.org/def/geography/DataZone> . 
     ?dz rdfs:label ?dz_label . 
    } 
    } 
     }" 

这为6505个地域和所有指标类似于下面的例子中所需的数据帧:

datazone   overall_rank income_deprivation rank 
Data zone S000001 2   4 
Data zone S000002 5   3