2017-07-12

Parsing the word between tags

I have the following dataset:

<?xml version='1.0' encoding='utf-8'?> 
<AudioDoc name="x" path="x"><SpeakerList><Speaker ch="1" gender="1" lang="dut" spkid="S0" /><Speaker ch="1" gender="1" lang="dut" spkid="S1" /><Speaker ch="1" gender="1" lang="dut" spkid="S100" /><Speaker ch="1" gender="1" lang="dut" spkid="S103" /><Speaker ch="1" gender="1" lang="dut" spkid="S105" /><Speaker ch="1" gender="1" lang="dut" spkid="S12" /><Speaker ch="1" gender="1" lang="dut" spkid="S135" /><Speaker ch="1" gender="1" lang="dut" spkid="S306" /></SpeakerList><SegmentList><SpeechSegment align="left" spkid="S0"><Word dur="0.26" stime="0.54"> nou </Word><Word dur="0.18" stime="0.8"> mijn </Word><Word dur="0.24" stime="0.99"> hele </Word><Word dur="0.27" stime="1.23"> jeugd </Word><Word dur="0.27" stime="1.76"> was </Word><Word dur="0.21" stime="2.45"> lekker </Word><Word dur="0.11" stime="2.68"> mee </Word></SpeechSegment><SpeechSegment align="left" spkid="S1"><Word dur="0.12" stime="2.88"> en </Word><Word dur="0.1" stime="3.02"> ben </Word><Word dur="0.18" stime="3.18"> heel </Word><Word dur="0.23" stime="3.59"> pak </Word><Word dur="0.09" stime="3.83"> ik </Word><Word dur="0.11" stime="4.26"> ik </Word><Word dur="0.22" stime="4.46"> speel </Word><Word dur="0.36" stime="4.68"> zelf </Word><Word dur="0.26" stime="5.27"> koken </Word><Word dur="0.27" stime="6.9"> paar </Word><Word dur="0.55" stime="7.17"> vraag </Word><Word dur="0.17" stime="7.72"> is </Word><Word dur="0.27" stime="9.81"> eerst </Word><Word dur="0.36" stime="10.08"> over </Word><Word dur="0.33" stime="11.01"> bedrijf </Word><Word dur="0.32" stime="11.34"> zelf </Word><Word dur="0.11" stime="11.66"> maar </Word><Word dur="0.07" stime="11.77"> daar </Word><Word dur="0.16" stime="11.84"> ga </Word><Word dur="0.09" stime="12.0"> ik </Word><Word dur="0.27" stime="12.09"> dan </Word><Word dur="0.17" stime="12.43"> iets </Word><Word dur="0.15" stime="12.6"> meer </Word><Word dur="0.38" stime="12.75"> over </Word><Word dur="0.15" stime="13.67"> wat </Word><Word dur="0.07" stime="13.83"> ik 
</Word><Word dur="0.27" stime="13.9"> zelf </Word><Word dur="0.19" stime="14.18"> van </Word><Word dur="0.09" stime="14.37"> het </Word><Word dur="0.42" stime="14.46"> onderzoek </Word><Word dur="0.3" stime="14.91"> ben </Word><Word dur="0.33" stime="16.89"> functie </Word><Word dur="0.15" stime="17.22"> binnen </Word><Word dur="0.06" stime="17.37"> het </Word><Word dur="0.36" stime="17.43"> bedrijf </Word><Word dur="0.51" stime="18.03"> founder </Word><Word dur="0.39" stime="19.2"> klopt </Word><Word dur="0.39" stime="20.25"> zelf </Word><Word dur="0.45" stime="20.64"> opleiding </Word><Word dur="0.39" stime="21.09"> gedaan </Word></SpeechSegment><SpeechSegment align="right" spkid="S100"><Word dur="0.53" stime="21.59"> ja </Word><Word dur="0.29" stime="22.19"> ik </Word><Word dur="0.15" stime="22.49"> of </Word><Word dur="0.11" stime="22.64"> ik </Word><Word dur="0.11" stime="22.76"> zo </Word><Word dur="0.14" stime="22.91"> een </Word><Word dur="0.32" stime="23.67"> ja </Word><Word dur="0.09" stime="23.99"> ik </Word><Word dur="0.18" stime="24.08"> heb </Word><Word dur="0.12" stime="24.26"> waar </Word><Word dur="0.16" stime="24.43"> ik </Word><Word dur="0.21" stime="24.62"> mee </Word><Word dur="0.26" stime="24.84"> aan </Word><Word dur="0.23" stime="26.25"> waren </Word><Word dur="0.12" stime="28.51"> het </Word><Word dur="0.25" stime="28.75"> eerste </Word><Word dur="0.22" stime="29.15"> wordt </Word></SpeechSegment><SpeechSegment align="right" spkid="S105"><Word dur="0.14" stime="29.76"> en </Word><Word dur="0.1" stime="30.01"> dan </Word><Word dur="0.17" stime="30.15"> mensen </Word><Word dur="0.15" stime="1148.31"> joh </Word><Word dur="0.12" stime="1148.49"> ik </Word><Word dur="0.12" stime="1148.61"> ben </Word><Word dur="0.21" stime="1148.73"> goed </Word><Word dur="0.09" stime="1148.94"> in </Word><Word dur="0.57" stime="1149.03"> wiskunde </Word><Word dur="0.24" stime="1149.63"> en </Word><Word dur="0.11" stime="1149.87"> ik </Word><Word dur="0.14" 
stime="1149.98"> kan </Word><Word dur="0.13" stime="1150.12"> ook </Word><Word dur="0.18" stime="1150.59"> die </Word><Word dur="0.19" stime="1150.78"> met </Word><Word dur="0.44" stime="1150.99"> nederlands </Word><Word dur="0.28" stime="1151.44"> engels </Word><Word dur="0.2" stime="1151.72"> komt </Word><Word dur="0.13" stime="1151.93"> wil </Word><Word dur="0.1" stime="1152.08"> ik </Word><Word dur="0.09" stime="1152.21"> ook </Word><Word dur="0.19" stime="1152.3"> gewoon </Word><Word dur="0.18" stime="1152.48"> kunnen </Word><Word dur="0.42" stime="1152.66"> maken </Word></SpeechSegment><SpeechSegment align="left" spkid="S0"><Word dur="0.36" stime="1154.62"> zie </Word><Word dur="0.48" stime="1154.99"> je </Word><Word dur="0.21" stime="1155.5"> zie </Word><Word dur="0.08" stime="1155.72"> de </Word><Word dur="0.57" stime="1155.8"> mogelijkheid </Word><Word dur="0.09" stime="1156.37"> om </Word><Word dur="0.09" stime="1156.46"> wat </Word><Word dur="0.09" stime="1156.55"> je </Word><Word dur="0.18" stime="1156.64"> nu </Word><Word dur="0.27" stime="1156.82"> dus </Word><Word dur="0.21" stime="1157.09"> met </Word><Word dur="0.39" stime="1157.36"> wiskunde </Word><Word dur="0.48" stime="1157.81"> bijvoorbeeld </Word><Word dur="0.51" stime="1158.38"> betrekt </Word><Word dur="0.24" stime="1159.37"> zou </Word><Word dur="0.06" stime="1159.61"> je </Word><Word dur="0.15" stime="1159.67"> dat </Word><Word dur="0.12" stime="1159.82"> met </Word><Word dur="0.45" stime="1159.94"> engels </Word><Word dur="0.15" stime="1160.42"> en </Word><Word dur="0.27" stime="1160.57"> andere </Word><Word dur="0.54" stime="1160.84"> vakken </Word><Word dur="0.23" stime="1161.71"> kunnen </Word><Word dur="0.24" stime="1161.95"> pas </Word><Word dur="0.14" stime="1162.2"> als </Word><Word dur="0.08" stime="1162.37"> je </Word><Word dur="0.3" stime="1162.46"> daar </Word><Word dur="0.33" stime="1162.79"> nog </Word><Word dur="0.5" stime="1163.25"> moeilijk 
</Word></SpeechSegment><SpeechSegment align="right" spkid="S100"><Word dur="0.18" stime="1164.13"> kan </Word><Word dur="0.05" stime="1164.31"> er </Word><Word dur="0.16" stime="1164.36"> ook </Word><Word dur="0.3" stime="1164.52"> maar </Word><Word dur="0.34" stime="1164.82"> wel </Word><Word dur="0.18" stime="1165.21"> wel </Word><Word dur="0.35" stime="1165.4"> minder </Word><Word dur="0.33" stime="1165.75"> goed </Word><Word dur="0.3" stime="1166.08"> omdat </Word><Word dur="0.39" stime="1166.38"> je </Word></SpeechSegment><SpeechSegment align="right" spkid="S100"><Word dur="0.15" stime="1167.39"> je </Word><Word dur="0.15" stime="1167.54"> kan </Word><Word dur="0.09" stime="1167.69"> in </Word><Word dur="0.15" stime="1167.78"> ieder </Word><Word dur="0.15" stime="1167.93"> geval </Word><Word dur="0.09" stime="2221.4"> ik </Word><Word dur="0.12" stime="2221.49"> was </Word><Word dur="0.44" stime="2221.62"> moeilijk </Word></SpeechSegment><SpeechSegment align="left" spkid="S0"><Word dur="0.21" stime="2223.29"> ik </Word><Word dur="0.15" stime="2223.98"> ik </Word><Word dur="0.18" stime="2224.13"> weet </Word><Word dur="0.27" stime="2224.31"> genoeg </Word><Word dur="0.12" stime="2224.61"> ik </Word><Word dur="0.15" stime="2224.73"> zeg </Word><Word dur="0.16" stime="2225.03"> dat </Word><Word dur="0.26" stime="2225.86"> voor </Word><Word dur="0.22" stime="2226.13"> is </Word><Word dur="0.15" stime="2226.38"> en </Word><Word dur="0.06" stime="2226.53"> je </Word><Word dur="0.18" stime="2226.59"> hebt </Word><Word dur="0.41" stime="2227.1"> alles </Word><Word dur="0.14" stime="2227.99"> voor </Word><Word dur="0.12" stime="2228.15"> mij </Word><Word dur="0.23" stime="2228.27"> wordt </Word></SpeechSegment></SegmentList></AudioDoc> 

What I want to do now is extract each word and its duration. So, for this example:

<Word dur="0.21" stime="2.45"> lekker </Word> 

I am looking for:

0.21 
lekker 

I did this:

from xml.dom import minidom 
xmldoc = minidom.parse('x.xml') 
itemlist = xmldoc.getElementsByTagName('Word') 
print(itemlist[0].attributes['dur'].value)

This gives me the duration. But I am looking for a way to get the word ("lekker") as well. Any ideas?

Answer


You want:

itemlist[0].firstChild.nodeValue 

It helps to know that the XML DOM access rules are the same as those of the HTML DOM as used in JS (if you know JS).
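
To get every word together with its duration rather than only the first one, something along these lines should work. It is a minimal sketch, not the only way to do it: the helper name words_with_durations and the shortened SAMPLE string are made up for illustration, and parseString is used instead of parse('x.xml') so the snippet runs without the file. Note that nodeValue returns the text with its surrounding spaces (" lekker "), so the sketch calls strip() on it.

```python
from xml.dom import minidom

# Hypothetical, shortened sample in the same shape as the question's data.
SAMPLE = (
    '<SpeechSegment align="left" spkid="S0">'
    '<Word dur="0.21" stime="2.45"> lekker </Word>'
    '<Word dur="0.11" stime="2.68"> mee </Word>'
    '</SpeechSegment>'
)


def words_with_durations(xml_text):
    """Return a list of (duration, word) pairs, one per <Word> element."""
    doc = minidom.parseString(xml_text)
    pairs = []
    for node in doc.getElementsByTagName('Word'):
        # The duration is stored as a string attribute; convert it to float.
        dur = float(node.attributes['dur'].value)
        # firstChild is the text node inside <Word>; strip the padding spaces.
        word = node.firstChild.nodeValue.strip()
        pairs.append((dur, word))
    return pairs


for dur, word in words_with_durations(SAMPLE):
    print(dur, word)
```

For the real file, passing the contents of x.xml to the same function (or swapping parseString for minidom.parse('x.xml')) should give one pair per Word element across all speech segments.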