我从Nokogiri :: XML :: Reader上使用Xml :: Parser从XML文件中提取条目。我想只抓住“Property/PropertyID/Identification ['OrganizationName'=='northsteppe']”的标签,但无法找出正确的语法来完成此操作,这里是我一直在构建的整个耙子任务接下来是一个样本节点,其中包含所有信息和标签。任何指导将不胜感激。在特定节点上只抓取具有特定属性值的条目
================ UPDATE ===============
我解析该文件正在使用中的拉open-uri,因为它来自外部来源,我只是在本地机器上使用旧版本的硬拷贝,以便在开发过程中加快速度,因为文件大小为300MB +。我试图使用一个SAX解析器,但是这个逻辑似乎有点复杂,我真的能够掌握发生了什么,并且遇到了同样的问题,这限制了我只抓住那些'northsteppe'作为Identification标签中的OrganizationName,我说过,我选择使用当前的方法尝试相同的任务,我能够抓住几乎所有我需要的信息,我只是错过了上面提到的最后一部分。
===============抵达尽可能具体=============
所以,我觉得好像描述的确切我正在尝试预成型的任务将有助于消除任何缺失的空白。任务如下。
从<Identification>
标记中的OraganizationName ='northsteppe'的XML文件中抓取每个属性,然后分别获取与每个属性相关的所有相应信息并将其插入散列。在将单个财产的所有信息收集并放入该散列之后,需要将其作为单独条目上载到数据库,该数据库已按照其需要的方式构建。一旦该属性被插入到数据库中,则耙取任务将移动到Property
的下一个条目,该条目符合<Identification>
标记中具有OrganizationName ='northsteppe'的规范并重复该过程,直到满足上述列表中的所有属性规格已插入到数据库中。这样做的目的是为了让我可以快速搜索Northsteppe属性的数据,而无需使用XML文件中的每个属性将系统陷入困境。
最终,我将使用open-uri从该文件的外部源中提取文件,并运行一个cron作业,每6小时执行一次这个rake任务并替换数据库。
================= CODE =================
namespace :db do
# RAKE TASK DESCRIPTION
desc "Fetch property information and insert it into the database"
# RAKE TASK NAME
task :print_properties => :environment do
require 'rubygems'
require 'nokogiri'
module Xml
class Parser
def initialize(node, &block)
@node = node
@node.each do
self.instance_eval &block
end
end
def name
@node.name
end
def inner_xml
@node.inner_xml.strip
end
def is_start?
@node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
end
def is_end?
@node.node_type == Nokogiri::XML::Reader::TYPE_END_ELEMENT
end
def attribute(attribute)
@node.attribute(attribute)
end
def for_element(name, &block)
return unless self.name == name and is_start?
self.instance_eval &block
end
def inside_element(name=nil, &block)
return if @node.self_closing?
return unless name.nil? or (self.name == name and is_start?)
name = @node.name
depth = @node.depth
@node.each do
return if self.name == name and is_end? and @node.depth == depth
self.instance_eval &block
end
end
end
end
Xml::Parser.new(Nokogiri::XML::Reader(open("app/assets/xml/mits.xml"))) do
inside_element 'Property' do
# OPEN AND PARSE THE <PropertyID> TAG
inside_element 'PropertyID' do
inside_element 'Identification' do
puts attribute_nodes()
end
# OPEN AND PARSE THE <Address> TAG
inside_element 'Address' do
for_element 'AddressLine1' do puts "Street Address: #{inner_xml}" end
for_element 'City' do puts "City: #{inner_xml}" end
for_element 'PostalCode' do puts "Zipcode: #{inner_xml}" end
end
for_element 'MarketingName' do puts "Short Description: #{inner_xml}" end
end
# OPEN AND PARSE THE <Information> TAG
inside_element 'Information' do
for_element 'LongDescription' do puts "Long Description: #{inner_xml}" end
inside_element 'Rents' do
for_element 'StandardRent' do puts "Rent: #{inner_xml}" end
end
end
inside_element 'Fee' do
for_element 'ApplicationFee' do puts "Application Fee: #{inner_xml}" end
end
inside_element 'ILS_Identification' do
for_element 'Latitude' do puts "Latitude: #{inner_xml}" end
for_element 'Longitude' do puts "Longitude: #{inner_xml}" end
end
end
end
end #END INSERT_PROPERTIES TASK
end #END NAMESPACE
和样品该XML -
<Property IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
<PropertyID>
<Identification IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8" OrganizationName="northsteppe" IDType="property"/>
<Identification IDValue="6e1e61523972d5f0e260e3d38eb488337424f21e" OrganizationName="northsteppe" IDType="Company"/>
<MarketingName>Spacious House Central Campus OSU, available fall</MarketingName>
<WebSite>http://northsteppe.appfolio.com/listings/listings/642da00e-9be3-4a7c-bd50-66a4f0d70af8</WebSite>
<Address AddressType="property">
<Description>Address of Available Listing</Description>
<AddressLine1>1689 N 4th St </AddressLine1>
<City>Columbus</City>
<State>OH</State>
<PostalCode>43201</PostalCode>
<Country>US</Country>
</Address>
<Phone PhoneType="office">
<PhoneNumber>(614) 299-4110</PhoneNumber>
</Phone>
<Email>[email protected]</Email>
</PropertyID>
<ILS_Identification ILS_IdentificationType="Apartment" RentalType="Market Rate">
<Latitude>39.997694</Latitude>
<Longitude>-82.99903</Longitude>
<LastUpdate Month="11" Day="11" Year="2013"/>
</ILS_Identification>
<Information>
<StructureType>Standard</StructureType>
<UnitCount>1</UnitCount>
<ShortDescription>Spacious House Central Campus OSU, available fall</ShortDescription>
<LongDescription>One of our favorites! This great house is perfect for students or a single family. With huge living and sleeping rooms, there is plenty of space. The kitchen is totally modernized with new appliances, and the bathroom has been updated. Natural woodwork and brick accents are seen within the house, and the decorative mantles. Ceiling fans and mini-blinds are included, as well as a FREE stack washer and dryer. The front and side deck. On site parking available.</LongDescription>
<Rents>
<StandardRent>2000.00</StandardRent>
</Rents>
<PropertyAvailabilityURL>http://northsteppe.appfolio.com/listings/listings/642da00e-9be3-4a7c-bd50-66a4f0d70af8</PropertyAvailabilityURL>
</Information>
<Fee>
<ProrateType>Standard</ProrateType>
<LateType>Standard</LateType>
<LatePercent>0</LatePercent>
<LateMinFee>0</LateMinFee>
<LateFeePerDay>0</LateFeePerDay>
<NonRefundableHoldFee>0</NonRefundableHoldFee>
<AdminFee>0</AdminFee>
<ApplicationFee>30.00</ApplicationFee>
<BrokerFee>0</BrokerFee>
</Fee>
<Deposit DepositType="Security Deposit">
<Amount AmountType="Actual">
<ValueRange Exact="2000.00" Currency="USD"/>
</Amount>
</Deposit>
<Policy>
<Pet Allowed="false"/>
</Policy>
<Phase IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
<Name/>
<Description/>
<UnitCount>1</UnitCount>
<RentableUnits>1</RentableUnits>
<TotalSquareFeet>0</TotalSquareFeet>
<RentableSquareFeet>0</RentableSquareFeet>
</Phase>
<Building IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
<Name/>
<Description/>
<UnitCount>1</UnitCount>
<SquareFeet>0</SquareFeet>
</Building>
<Floorplan IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
<Name/>
<UnitCount>1</UnitCount>
<Room RoomType="Bedroom">
<Count>4</Count>
<Comment/>
</Room>
<Room RoomType="Bathroom">
<Count>1</Count>
<Comment/>
</Room>
<SquareFeet Min="0" Max="0"/>
<MarketRent Min="2000" Max="2000"/>
<EffectiveRent Min="2000" Max="2000"/>
</Floorplan>
<ILS_Unit IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
<Units>
<Unit>
<Identification IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8" OrganizationName="UL Portfolio"/>
<MarketingName>Spacious House Central Campus OSU, available fall</MarketingName>
<UnitBedrooms>4</UnitBedrooms>
<UnitBathrooms>1.0</UnitBathrooms>
<MinSquareFeet>0</MinSquareFeet>
<MaxSquareFeet>0</MaxSquareFeet>
<SquareFootType>internal</SquareFootType>
<UnitRent>2000.00</UnitRent>
<MarketRent>2000.00</MarketRent>
<Address AddressType="property">
<AddressLine1>1689 N 4th St </AddressLine1>
<City>Columbus</City>
<PostalCode>43201</PostalCode>
<Country>US</Country>
</Address>
</Unit>
</Units>
<Availability>
<VacateDate Month="7" Day="23" Year="2014"/>
<VacancyClass>Unoccupied</VacancyClass>
<MadeReadyDate Month="7" Day="23" Year="2014"/>
</Availability>
<Amenity AmenityType="Other">
<Description>All new stainless steel appliances! Refinished hardwood floors</Description>
</Amenity>
<Amenity AmenityType="Other">
<Description>Ceramic tile</Description>
</Amenity>
<Amenity AmenityType="Other">
<Description>Ceiling fans</Description>
</Amenity>
<Amenity AmenityType="Other">
<Description>Wrap-around porch</Description>
</Amenity>
<Amenity AmenityType="Dryer">
<Description>Free Washer and Dryer</Description>
</Amenity>
<Amenity AmenityType="Washer">
<Description>Free Washer and Dryer</Description>
</Amenity>
<Amenity AmenityType="Other">
<Description>off-street parking available</Description>
</Amenity>
</ILS_Unit>
<File Active="true" FileID="820982141">
<FileType>Photo</FileType>
<Description>Unit Photo</Description>
<Name/>
<Caption/>
<Format>image/jpeg</Format>
<Src>http://pa.cdn.appfolio.com/northsteppe/images/31077069-6e81-4373-8a89-508c57585543/medium.jpg</Src>
<Width>360</Width>
<Height>300</Height>
<Rank>1</Rank>
</File>
<File Active="true" FileID="820982145">
<FileType>Photo</FileType>
<Description>Unit Photo</Description>
<Name/>
<Caption/>
<Format>image/jpeg</Format>
<Src>http://pa.cdn.appfolio.com/northsteppe/images/84e1be40-96fd-4717-b75d-09b39231a762/medium.jpg</Src>
<Width>350</Width>
<Height>265</Height>
<Rank>2</Rank>
</File>
<File Active="true" FileID="820982149">
<FileType>Photo</FileType>
<Description>Unit Photo</Description>
<Name/>
<Caption/>
<Format>image/jpeg</Format>
<Src>http://pa.cdn.appfolio.com/northsteppe/images/cd419635-c37f-4676-a43e-c72671a2a748/medium.jpg</Src>
<Width>350</Width>
<Height>265</Height>
<Rank>3</Rank>
</File>
<File Active="true" FileID="820982152">
<FileType>Photo</FileType>
<Description>Unit Photo</Description>
<Name/>
<Caption/>
<Format>image/jpeg</Format>
<Src>http://pa.cdn.appfolio.com/northsteppe/images/6b68dbd5-2cde-477c-99d7-3ca33f03cce8/medium.jpg</Src>
<Width>350</Width>
<Height>265</Height>
<Rank>4</Rank>
</File>
<File Active="true" FileID="820982155">
<FileType>Photo</FileType>
<Description>Unit Photo</Description>
<Name/>
<Caption/>
<Format>image/jpeg</Format>
<Src>http://pa.cdn.appfolio.com/northsteppe/images/17b6c7c0-686c-4e46-865b-11d80744354a/medium.jpg</Src>
<Width>350</Width>
<Height>265</Height>
<Rank>5</Rank>
</File>
<File Active="true" FileID="820982157">
<FileType>Photo</FileType>
<Description>Unit Photo</Description>
<Name/>
<Caption/>
<Format>image/jpeg</Format>
<Src>http://pa.cdn.appfolio.com/northsteppe/images/3545ac8b-471f-404a-94b2-fcd00dd16e25/medium.jpg</Src>
<Width>350</Width>
<Height>265</Height>
<Rank>6</Rank>
</File>
<File Active="true" FileID="820982160">
<FileType>Photo</FileType>
<Description>Unit Photo</Description>
<Name/>
<Caption/>
<Format>image/jpeg</Format>
<Src>http://pa.cdn.appfolio.com/northsteppe/images/02471172-2183-4bf1-a3d7-33415f902c1c/medium.jpg</Src>
<Width>350</Width>
<Height>265</Height>
<Rank>7</Rank>
</File>
</Property>
http://amolnpujari.wordpress.com/2012/03/31/reading_huge_xml-rb/ 我还发现,在阅读大型XML时,牛比nokogiri快5倍。 另外我有一个包装器,它只是让你用ox来搜索大的xml,允许你迭代指定的元素。 https://gist.github.com/amolpujari/5966431 –