DOM parser freezes on HTML that has a DOCTYPE declaration

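The program reads two HTML files from my site and then parses each of them. The first file (pass.html) has no DOCTYPE declaration, and it parses normally.

The second file (freeze.html) has a DOCTYPE declaration. The W3C validation service reports freeze.html as fully valid. However, when I try to parse freeze.html, the program freezes at .parse(is).

What is wrong?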

pass.html

<?xml version="1.0" encoding="US-ASCII"?> 
<html xmlns="http://www.w3.org/1999/xhtml"> 
<head> 
<title>pass</title> 
</head> 
<body> 
    <h1>no DOCTYPE declaration</h1> 
    </body> 
</html>

freeze.html

<?xml version="1.0" encoding="US-ASCII"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"> 
<head> 
<title>freeze</title> 
</head> 
<body> 
    <h1>has DOCTYPE declaration</h1> 
</body> 
</html>

Source

2016-08-28 Hiroki Horiuchi

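The content of your question must be **in your question, not just a link**. Include 'freeze.html' in the question. (If it is more than a few lines, cut it down until you have data small enough to still show the problem; see [mcve].) –

You must call setEntityResolver() on the DocumentBuilder and supply a resolver that resolves the DTD locally. Otherwise the parser tries to download the DTD from the website location, and that site deliberately responds very slowly, which causes your freeze. See http://stackoverflow.com/questions/2640825/how-to-parse-a-xhtml-ignoring-the-doctype-declaration-using-dom-parser?rq=1 – Alohci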
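The setEntityResolver() suggestion in the comment above could look roughly like the sketch below. It makes two assumptions that are not in the original thread: instead of resolving to a real local copy of the DTD, the resolver hands back an empty stand-in for every external entity, and the class name `EntityResolverSketch` and the constant `FREEZE_HTML` are hypothetical names used only to keep the example self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EntityResolverSketch {
    // Hypothetical in-memory copy of freeze.html from the question.
    static final String FREEZE_HTML =
        "<?xml version=\"1.0\" encoding=\"US-ASCII\"?>\n"
        + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" "
        + "\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">\n"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\">\n"
        + "<head><title>freeze</title></head>\n"
        + "<body><h1>has DOCTYPE declaration</h1></body>\n"
        + "</html>";

    static DocumentBuilder newDocumentBuilder() throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setValidating(false);
        DocumentBuilder b = f.newDocumentBuilder();
        // Assumption: answer every external entity request (including the
        // XHTML DTD) with empty content, so nothing is fetched over the network.
        // A fuller resolver would map the public ID to a local DTD file.
        b.setEntityResolver((publicId, systemId) ->
            new InputSource(new StringReader("")));
        return b;
    }

    static Document parse(String xml) throws Exception {
        return newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document d = parse(FREEZE_HTML);
        System.out.println(d.getDocumentElement().getTagName());
    }
}
```

An empty stand-in is only adequate when the document uses no entities defined by the DTD (such as `&nbsp;`); if it does, the resolver would need to return the actual DTD content from a local file.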

The following setting tells the parser not to load the external DTD named in the DOCTYPE declaration. Change the method newDocumentBuilder():

DocumentBuilder newDocumentBuilder() throws Exception {
    final DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
    // DTD validation is not needed just to build the DOM.
    f.setValidating(false);
    // Xerces feature: do not fetch the external DTD referenced by the DOCTYPE.
    f.setAttribute("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
    return f.newDocumentBuilder();
}
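As a usage sketch, the method above can be dropped into a small self-contained class and run against the freeze.html markup from the question. The class name `LoadExternalDtdSketch`, the constant `FREEZE_HTML`, and the `parse` helper are hypothetical and only serve the example. The feature URI is understood by Xerces-based parsers, which includes the default parser bundled with the JDK; other JAXP implementations may reject it.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class LoadExternalDtdSketch {
    // Hypothetical in-memory copy of freeze.html from the question.
    static final String FREEZE_HTML =
        "<?xml version=\"1.0\" encoding=\"US-ASCII\"?>\n"
        + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" "
        + "\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">\n"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\">\n"
        + "<head><title>freeze</title></head>\n"
        + "<body><h1>has DOCTYPE declaration</h1></body>\n"
        + "</html>";

    static DocumentBuilder newDocumentBuilder() throws Exception {
        final DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setValidating(false);
        // Skip loading the external DTD, so no request goes to www.w3.org.
        f.setAttribute(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return f.newDocumentBuilder();
    }

    // Parses from a byte stream, as the .parse(is) call in the question does.
    static Document parse(String xml) throws Exception {
        try (InputStream is =
                 new ByteArrayInputStream(xml.getBytes(StandardCharsets.US_ASCII))) {
            return newDocumentBuilder().parse(is);
        }
    }

    public static void main(String[] args) throws Exception {
        Document d = parse(FREEZE_HTML);
        System.out.println(d.getElementsByTagName("h1").item(0).getTextContent());
    }
}
```

With the external DTD skipped, the parser still reads the DOCTYPE line but never opens a connection for the DTD, so the parse returns promptly even when the remote server is slow.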

Source

2016-09-18 12:01:19 Djoerd

For more information, see: https://jaxp.java.net/1.5/JAXP1.5Guide.html – Djoerd
