Parsing XML with Python

I have a few large .xml files that I want to parse in order to do several things.
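Specifically, I only want to extract:
- the contents of XML/TITLE1 and save them to list A (for example)
- the contents of XML/TITLE2 and save them to list B
- the contents of XML/TITLE3 and save them to list C
- and so on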
Which Python 2.x library would be best to import and use for this, and how would I set it up? Any suggestions?
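Because the files are large, one option worth considering (a suggestion, not something from the original thread) is the standard library's `xml.etree.ElementTree.iterparse`, available in Python 2.7 and 3.x, which streams through a file instead of building the whole tree in memory. The sketch below is hypothetical: `stream_texts` is a made-up helper, it matches elements by tag name rather than by full path, and it assumes each record is a `<PubmedArticle>` inside a `<PubmedArticleSet>` root. On Python 2, `xml.etree.cElementTree` offers a faster C implementation with the same interface.

```python
import io
import xml.etree.ElementTree as ET


def stream_texts(source, tags):
    """Collect the text of every element whose tag is in `tags`,
    streaming through `source` rather than loading the full tree."""
    results = dict((tag, []) for tag in tags)
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in results:
            results[elem.tag].append(elem.text)
        # Assumption: records are <PubmedArticle> elements. Clearing each one
        # after its end event releases its children, which were already read.
        if elem.tag == "PubmedArticle":
            elem.clear()
    return results


# Hypothetical two-record input; a real file would be opened with open(path, "rb").
sample = b"""<PubmedArticleSet>
<PubmedArticle><MedlineCitation><Article>
<ArticleTitle>First title</ArticleTitle>
</Article></MedlineCitation></PubmedArticle>
<PubmedArticle><MedlineCitation><Article>
<ArticleTitle>Second title</ArticleTitle>
</Article></MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""

lists = stream_texts(io.BytesIO(sample), ["ArticleTitle"])
```

Matching by tag name keeps the streaming loop simple; if two different paths share a tag name (for example several kinds of `<Title>`), you would need to track the current path yourself.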
For example:
<PubmedArticle>
<MedlineCitation Owner="NLM" Status="MEDLINE">
<PMID Version="1">8981971</PMID>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Print">0002-9297</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>60</Volume>
<Issue>1</Issue>
<PubDate>
<Year>1997</Year>
<Month>Jan</Month>
</PubDate>
</JournalIssue>
<Title>American journal of human genetics</Title>
<ISOAbbreviation>Am. J. Hum. Genet.</ISOAbbreviation>
</Journal>
<ArticleTitle>mtDNA and Y chromosome-specific polymorphisms in modern Ojibwa: implications about the origin of their gene pool.</ArticleTitle>
<Pagination>
<MedlinePgn>241-4</MedlinePgn>
</Pagination>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Scozzari</LastName>
<ForeName>R</ForeName>
<Initials>R</Initials>
</Author>
</AuthorList>
<MeshHeadingList>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Alleles</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="Y">Y Chromosome</DescriptorName>
</MeshHeading>
</MeshHeadingList>
<OtherID Source="NLM">PMC1712541</OtherID>
</MedlineCitation>
</PubmedArticle>
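For a record like the one above, "one list per path" maps fairly directly onto `findall` in the standard library's `xml.etree.ElementTree`, which accepts a limited XPath-like syntax (including `[@attr='value']` predicates in Python 2.7 and 3.x). This is a minimal sketch, not code from the thread: the `extract` helper and the dictionary keys are hypothetical, the paths are written relative to `<PubmedArticle>`, and the sample is a trimmed copy of the record in the question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <PubmedArticle> record from the question.
SAMPLE = """<PubmedArticle>
<MedlineCitation Owner="NLM" Status="MEDLINE">
<PMID Version="1">8981971</PMID>
<Article PubModel="Print">
<Journal>
<Title>American journal of human genetics</Title>
</Journal>
<ArticleTitle>mtDNA and Y chromosome-specific polymorphisms in modern Ojibwa: implications about the origin of their gene pool.</ArticleTitle>
</Article>
<MeshHeadingList>
<MeshHeading><DescriptorName MajorTopicYN="N">Alleles</DescriptorName></MeshHeading>
<MeshHeading><DescriptorName MajorTopicYN="Y">Y Chromosome</DescriptorName></MeshHeading>
</MeshHeadingList>
</MedlineCitation>
</PubmedArticle>"""

# One output list per path (hypothetical key names); paths are relative to the root.
PATHS = {
    "pmids": "MedlineCitation/PMID",
    "journal_titles": "MedlineCitation/Article/Journal/Title",
    "article_titles": "MedlineCitation/Article/ArticleTitle",
    "mesh_terms": "MedlineCitation/MeshHeadingList/MeshHeading/DescriptorName",
    # Attribute predicate: only headings marked as a major topic.
    "major_mesh_terms": "MedlineCitation/MeshHeadingList/MeshHeading/"
                        "DescriptorName[@MajorTopicYN='Y']",
}


def extract(root, paths):
    """Map each key to the texts of all elements found at its path."""
    return dict((key, [el.text for el in root.findall(path)])
                for key, path in paths.items())


root = ET.fromstring(SAMPLE)
lists = extract(root, PATHS)
```

Here `lists["mesh_terms"]` would hold both descriptor names, while `lists["major_mesh_terms"]` keeps only the one flagged `MajorTopicYN="Y"`.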
I would use 'xml.dom.minidom' for this; it ships with Python and works fine. 'lxml' is another good library, but you have to install it separately. – kindall 2012-02-28 18:49:11
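Following the comment's suggestion, a rough sketch with `xml.dom.minidom` might look like the following. Unlike ElementTree, minidom elements have no `.text` attribute, so the hypothetical `element_text` helper joins an element's text-node children itself (CDATA sections are ignored in this sketch). Note that minidom builds the whole document in memory, which can be heavy for very large files. The sample is again a trimmed copy of the record in the question, and the helper names are made up for illustration.

```python
from xml.dom import minidom

# Trimmed copy of the record from the question.
SAMPLE = """<PubmedArticle><MedlineCitation>
<Article><ArticleTitle>mtDNA and Y chromosome-specific polymorphisms in modern Ojibwa: implications about the origin of their gene pool.</ArticleTitle></Article>
<MeshHeadingList>
<MeshHeading><DescriptorName MajorTopicYN="N">Alleles</DescriptorName></MeshHeading>
<MeshHeading><DescriptorName MajorTopicYN="Y">Y Chromosome</DescriptorName></MeshHeading>
</MeshHeadingList>
</MedlineCitation></PubmedArticle>"""


def element_text(node):
    """Concatenate the direct text-node children of an element."""
    return "".join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE)


def texts_by_tag(doc, tag):
    """Return the text of every element named `tag`, in document order."""
    return [element_text(el) for el in doc.getElementsByTagName(tag)]


doc = minidom.parseString(SAMPLE)
article_titles = texts_by_tag(doc, "ArticleTitle")  # would be list A
mesh_terms = texts_by_tag(doc, "DescriptorName")    # would be list B
# Filtering on an attribute with getAttribute:
major_terms = [element_text(el) for el in doc.getElementsByTagName("DescriptorName")
               if el.getAttribute("MajorTopicYN") == "Y"]
```

`getElementsByTagName` searches the whole subtree by name, so, as with the streaming approach, it does not distinguish between elements that share a tag name at different paths.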