2017-09-06

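Extracting information from XML into a data frame

Yesterday I scraped a website that requires login; the page is in XML format, as shown below. Some teachers belong to two departments, which is the part I am stuck on, and I do not need the first three lines. So far I have only managed to log in successfully. I need to turn the XML into a data frame (or a list, or JSON).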

My code:

ID <- xpathApply(xml, "//teacher[@id]") 
ID_unlist <- unlist(ID) 
matrix <- as.data.frame(matrix(ID_unlist),nrow= 2, byrow=TRUE) 

Error in prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3L, : 
    first argument must be atomic 

XML:

<result status="success"> 
    <code>1</code> 
    <note>success</note> 
    <teacherList> 
    <teacher id="D95"> 
     <name>Mary</name> 
     <department id="420"> 
     <name>Math</name> 
     </department> 
     <department id="421"> 
     <name>Statistics</name> 
     </department> 
    </teacher> 
    <teacher id="D73"> 
     <name>Adam</name> 
     <department id="412"> 
     <name>English</name> 
     </department> 
    </teacher> 
    </teacherList> 
</result> 

The result I expect is:

t_id  teacher  d_id department 
D95   Mary  420   Math 
D95   Mary  421 statistics 
D73   Adam  412  English 

Answer

Probably not the most efficient way, but it works.

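library(XML)

# `content` is assumed to hold the parsed XML shown in the question
content_list <- XML::xmlToList(content)

# Build one block of rows per teacher (one row per department), then stack them
df <- as.data.frame(do.call(rbind,
    lapply(content_list$teacherList, function(teacher) {
        departments <- do.call(rbind, teacher[names(teacher) == "department"])
        unname(cbind(teacher$.attrs, teacher$name, departments))
    })
))
colnames(df) <- c("id", "teacher", "department", "did")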


    id teacher department did 
1 D95 Mary  Math 420 
2 D95 Mary Statistics 421 
3 D73 Adam English 412
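
An alternative, offered only as a sketch: instead of converting the whole document to a list, select every department node with XPath and look up its parent teacher. That way a teacher with two departments naturally produces two rows, and the first lines of the document (code, note) are never touched. The variable `xml_text` is a stand-in for the scraped page and simply repeats the sample from the question; the column names follow the layout the question asks for.

```r
library(XML)

# Assumption: `xml_text` stands in for the scraped page (copied from the question).
xml_text <- '<result status="success">
  <code>1</code>
  <note>success</note>
  <teacherList>
    <teacher id="D95">
      <name>Mary</name>
      <department id="420"><name>Math</name></department>
      <department id="421"><name>Statistics</name></department>
    </teacher>
    <teacher id="D73">
      <name>Adam</name>
      <department id="412"><name>English</name></department>
    </teacher>
  </teacherList>
</result>'

doc <- xmlParse(xml_text, asText = TRUE)

# One node per <department>, so each department becomes one output row.
departments <- getNodeSet(doc, "//teacher/department")

rows <- lapply(departments, function(dep) {
  teacher <- xmlParent(dep)  # the <teacher> element that owns this department
  data.frame(
    t_id       = xmlGetAttr(teacher, "id"),
    teacher    = xmlValue(teacher[["name"]]),
    d_id       = xmlGetAttr(dep, "id"),
    department = xmlValue(dep[["name"]]),
    stringsAsFactors = FALSE
  )
})

# Stack the one-row data frames into the final table.
result <- do.call(rbind, rows)
print(result)
```

Because every column is a plain character vector here, the result can be passed on to other tools (for example a JSON writer) without first unpacking list columns.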