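Extracting URIs from an RDF web page in Java using the Jena library

I have written the following code to extract URIs from web pages served with content type application/rdf+xml, for a linked data application: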
public static void test(String url) {
    try {
        // Fetch and parse the RDF document at the given URL
        Model read = ModelFactory.createDefaultModel().read(url);
        System.out.println("to go");
        StmtIterator si;
        si = read.listStatements();
        System.out.println("to go");
        // Print the subject, predicate and object URI of every statement
        while (si.hasNext()) {
            Statement s = si.nextStatement();
            Resource r = s.getSubject();
            Property p = s.getPredicate();
            RDFNode o = s.getObject();
            System.out.println(r.getURI());
            System.out.println(p.getURI());
            System.out.println(o.asResource().getURI());
        }
    }
    // Any exception raised above is silently ignored
    catch (JenaException | NoSuchElementException c) {}
}
But for the input
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:ex="http://example.org/stuff/1.0/">
  <rdf:Description rdf:about="http://www.w3.org/TR/rdf-syntax-grammar"
                   dc:title="RDF/XML Syntax Specification (Revised)">
    <ex:editor>
      <rdf:Description ex:fullName="Dave Beckett">
        <ex:homePage rdf:resource="http://purl.org/net/dajobe/" />
      </rdf:Description>
    </ex:editor>
  </rdf:Description>
</rdf:RDF>
the output is:
Subject URI is http://www.w3.org/TR/rdf-syntax-grammar
Predicate URI is http://example.org/stuff/1.0/editor
Object URI is null
Subject URI is http://www.w3.org/TR/rdf-syntax-grammar
Predicate URI is http://purl.org/dc/elements/1.1/title
Website is read
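I need to output all the URIs present on that web page, because I am building a web crawler for RDF pages. I need the output to include all of the following links: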
http://www.w3.org/TR/rdf-syntax-grammar
http://example.org/stuff/1.0/editor
http://purl.org/net/dajobe
http://example.org/stuff/1.0/fullName
http://www.w3.org/TR/rdf-syntax-grammar
http://purl.org/dc/elements/1.1/title
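
A minimal sketch of one way to get closer to that list, under some assumptions: it uses the `org.apache.jena` package names of Jena 3 and later (older releases used `com.hp.hpl.jena`), and the class name `UriExtractor`, the method `collectUris` and the constant `SAMPLE` are hypothetical names chosen for illustration. The idea is to test each node with `isURIResource()` before asking for its URI, so that literals such as the `dc:title` value and blank nodes such as the editor are skipped instead of yielding `null` or raising an exception. The sample document is the input above, rewritten with single-quoted attributes so it fits in a Java string.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.rdf.model.Statement;
import org.apache.jena.rdf.model.StmtIterator;

class UriExtractor {

    // The input document from the question, with single-quoted attributes.
    static final String SAMPLE =
          "<?xml version='1.0'?>"
        + "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
        + " xmlns:dc='http://purl.org/dc/elements/1.1/'"
        + " xmlns:ex='http://example.org/stuff/1.0/'>"
        + "<rdf:Description rdf:about='http://www.w3.org/TR/rdf-syntax-grammar'"
        + " dc:title='RDF/XML Syntax Specification (Revised)'>"
        + "<ex:editor><rdf:Description ex:fullName='Dave Beckett'>"
        + "<ex:homePage rdf:resource='http://purl.org/net/dajobe/'/>"
        + "</rdf:Description></ex:editor>"
        + "</rdf:Description></rdf:RDF>";

    // Collects every URI used as a subject, predicate or object in the model.
    static Set<String> collectUris(Model model) {
        Set<String> uris = new TreeSet<>();
        StmtIterator it = model.listStatements();
        while (it.hasNext()) {
            Statement s = it.nextStatement();
            addIfUri(uris, s.getSubject());      // may be a blank node
            uris.add(s.getPredicate().getURI()); // predicates always have a URI
            addIfUri(uris, s.getObject());       // may be a literal or blank node
        }
        return uris;
    }

    // Adds the node's URI only when the node is a URI resource.
    private static void addIfUri(Set<String> uris, RDFNode node) {
        if (node.isURIResource()) {
            uris.add(node.asResource().getURI());
        }
    }

    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        // Parse from memory here; model.read(url) would fetch a live page instead.
        model.read(new StringReader(SAMPLE), null, "RDF/XML");
        for (String uri : collectUris(model)) {
            System.out.println(uri);
        }
    }
}
```

Note that this sketch also reports the `http://example.org/stuff/1.0/homePage` predicate, which is not in the list above, and it keeps the trailing slash of `http://purl.org/net/dajobe/` exactly as written in the document. Each URI is reported once, since the results are collected in a set.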
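Put the XML online and give us the URL. Also, you shouldn't iterate manually over all the triples – Raffaele
See [this old answer](http://stackoverflow.com/a/12236809/315306) for a brief introduction to the query language you should use in Jena to extract information from a serialized model – Raffaele
Remove these two useless comments and edit your question to provide the desired output, because I can't fully understand your question – Raffaele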