XML parsing with Hive

I am using the Hive XML SerDe (https://github.com/dvasilen/Hive-XML-SerDe/wiki/XML-data-sources) to parse XML and load it into Hive. Sample XML content:
<records>
<record customer_id="0000-JTALA">
<income>200000</income>
<address type="M">
<Flatno>345</Flatno>
<Street>ABS</Street>
<city>QWW</city>
<country>US</country>
<pin>3235</pin>
</address>
<address type="B">
<Street>ABS</Street>
<city>QWW</city>
<country>US</country>
<pin>3235</pin>
</address>
</record>
<record customer_id="0001-JTALA">
<income>200000</income>
<address type="M">
<Flatno>45</Flatno>
<Street>fgBS</Street>
<city>QWW</city>
<country>US</country>
<pin>3235</pin>
</address>
<address type="B">
<Street>ABS</Street>
<city>QWW</city>
<country>US</country>
<pin>325</pin>
</address>
<address type="P">
<Street>ABS</Street>
<city>QWW</city>
<country>UK</country>
<pin>325</pin>
</address>
</record>
</records>
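One row should be created for each address. Going by the sample above, the first customer should produce 2 records and the second customer 3 records, for 5 records in total. With my current code, however, two records are created for each single customer, and all of the addresses are concatenated in the address columns; for the first customer, the Street column holds the first address's street plus the second address's street. Sample query: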
CREATE external TABLE msg_details(customer_id STRING, income BIGINT, AType String,Flatno String, Street string,city string,country string,pin string)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.customer_id"="/record/@customer_id",
"column.xpath.income"="/record/income/text()",
"column.xpath.AType"="/record/address/@type",
"column.xpath.Flatno"="/record/address/Flatno/text()",
"column.xpath.Street"="/record/address/Street/text()",
"column.xpath.city"="/record/address/city/text()",
"column.xpath.country"="/record/address/country/text()",
"column.xpath.pin"="/record/address/pin/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
location '/user/root/serdeinput'
TBLPROPERTIES (
"xmlinput.start"="<record customer",
"xmlinput.end"="</record>"
);
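To make the expected result concrete, here is a minimal sketch, written in Python outside Hive, of the flattening described above: one row per `<address>`, with `customer_id` and `income` repeated on each row. It is only an illustration of the desired output, not a description of how the SerDe behaves. The function name `flatten_records` and the `SAMPLE` constant are hypothetical, and the sample is a compacted copy of the XML above with closing tags written to match their opening tags.

```python
import xml.etree.ElementTree as ET

# Compacted copy of the sample data (hypothetical constant, for illustration only).
# Closing tags are written to match their opening tags so the XML is well-formed.
SAMPLE = """
<records>
  <record customer_id="0000-JTALA">
    <income>200000</income>
    <address type="M"><Flatno>345</Flatno><Street>ABS</Street><city>QWW</city><country>US</country><pin>3235</pin></address>
    <address type="B"><Street>ABS</Street><city>QWW</city><country>US</country><pin>3235</pin></address>
  </record>
  <record customer_id="0001-JTALA">
    <income>200000</income>
    <address type="M"><Flatno>45</Flatno><Street>fgBS</Street><city>QWW</city><country>US</country><pin>3235</pin></address>
    <address type="B"><Street>ABS</Street><city>QWW</city><country>US</country><pin>325</pin></address>
    <address type="P"><Street>ABS</Street><city>QWW</city><country>UK</country><pin>325</pin></address>
  </record>
</records>
"""

# Child elements of <address> that become columns.
ADDRESS_FIELDS = ("Flatno", "Street", "city", "country", "pin")


def flatten_records(xml_text):
    """Return one dict per <address>, repeating the parent record's fields."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for record in root.iter("record"):
        customer_id = record.get("customer_id")
        income = int(record.findtext("income"))
        for address in record.findall("address"):
            row = {
                "customer_id": customer_id,
                "income": income,
                "AType": address.get("type"),
            }
            for field in ADDRESS_FIELDS:
                # findtext returns None when the element is absent (e.g. no Flatno).
                row[field] = address.findtext(field)
            rows.append(row)
    return rows


rows = flatten_records(SAMPLE)
for row in rows:
    print(row)
```

The sketch yields five rows for the sample, two for the first customer and three for the second, which is the shape the table is expected to have.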
Can anyone help me? – bhargavi