2012-03-09

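Split a string into arrays?

From the given string, $codes, I want all the languages in a languages array, all the codes in a codes array, and all the families in a families array. How can I do this in PHP? I tried using DOM, but it didn't work. Any other approach would be appreciated. Thanks in advance.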

<?php 
$codes = '<pre> 
LANGUAGE  CODE  LANGUAGE FAMILY 

AFAR   AA  HAMITIC 
ABKHAZIAN  AB  IBERO-CAUCASIAN 
AFRIKAANS  AF  GERMANIC 
AMHARIC   AM  SEMITIC 
ARABIC   AR  SEMITIC 
ASSAMESE  AS  INDIAN 
AYMARA   AY  AMERINDIAN 
AZERBAIJANI  AZ  TURKIC/ALTAIC 
BASHKIR   BA  TURKIC/ALTAIC 
BYELORUSSIAN BE  SLAVIC 
BULGARIAN  BG  SLAVIC 
BIHARI   BH  INDIAN 
BISLAMA   BI  [not given] 
BENGALI;BANGLA BN  INDIAN 
TIBETAN   BO  ASIAN 
BRETON   BR  CELTIC 
CATALAN   CA  ROMANCE 
CORSICAN  CO  ROMANCE 
CZECH   CS  SLAVIC 
WELSH   CY  CELTIC 
DANISH   DA  GERMANIC 
GERMAN   DE  GERMANIC 
BHUTANI   DZ  ASIAN 
GREEK   EL  LATIN/GREEK 
ENGLISH   EN  GERMANIC 
ESPERANTO  EO  INTERNATIONAL AUX. 
SPANISH   ES  ROMANCE 
ESTONIAN  ET  FINNO-UGRIC 
BASQUE   EU  BASQUE 
PERSIAN (farsi) FA  IRANIAN 
FINNISH   FI  FINNO-UGRIC 
FIJI   FJ  OCEANIC/INDONESIAN 
FAROESE   FO  GERMANIC 
FRENCH   FR  ROMANCE 
FRISIAN   FY  GERMANIC 
IRISH   GA  CELTIC 
SCOTS GAELIC GD  CELTIC 
GALICIAN  GL  ROMANCE 
GUARANI   GN  AMERINDIAN 
GUJARATI  GU  INDIAN 
HAUSA   HA  NEGRO-AFRICAN 
HEBREW   HE  SEMITIC [*Changed 1989 from original ISO 639:1988, IW] 
HINDI   HI  INDIAN 
CROATIAN  HR  SLAVIC 
HUNGARIAN  HU  FINNO-UGRIC 
ARMENIAN  HY  INDO-EUROPEAN (OTHER) 
INTERLINGUA  IA  INTERNATIONAL AUX. 
INTERLINGUE  IE  INTERNATIONAL AUX. 
INUPIAK   IK  ESKIMO 
INDONESIAN  ID  OCEANIC/INDONESIAN [*Changed 1989 from original ISO 639:1988, IN] 
ICELANDIC  IS  GERMANIC 
ITALIAN   IT  ROMANCE 
INUKTITUT  IU  [  ] 
JAPANESE  JA  ASIAN 
JAVANESE  JV  OCEANIC/INDONESIAN 
GEORGIAN  KA  IBERO-CAUCASIAN 
KAZAKH   KK  TURKIC/ALTAIC 
GREENLANDIC  KL  ESKIMO 
CAMBODIAN  KM  ASIAN 
KANNADA   KN  DRAVIDIAN 
KOREAN   KO  ASIAN 
KASHMIRI  KS  INDIAN 
KURDISH   KU  IRANIAN 
KIRGHIZ   KY  TURKIC/ALTAIC 
LATIN   LA  LATIN/GREEK 
LINGALA   LN  NEGRO-AFRICAN 
LAOTHIAN  LO  ASIAN 
LITHUANIAN  LT  BALTIC 
LATVIAN;LETTISH LV  BALTIC 
MALAGASY  MG  OCEANIC/INDONESIAN 
MAORI   MI  OCEANIC/INDONESIAN 
MACEDONIAN  MK  SLAVIC 
MALAYALAM  ML  DRAVIDIAN 
MONGOLIAN  MN  [not given] 
MOLDAVIAN  MO  ROMANCE 
MARATHI   MR  INDIAN 
MALAY   MS  OCEANIC/INDONESIAN 
MALTESE   MT  SEMITIC 
BURMESE   MY  ASIAN 
NAURU   NA  [not given] 
NEPALI   NE  INDIAN 
DUTCH   NL  GERMANIC 
NORWEGIAN  NO  GERMANIC 
OCCITAN   OC  ROMANCE 
AFAN (OROMO) OM  HAMITIC 
ORIYA   OR  INDIAN 
PUNJABI   PA  INDIAN 
POLISH   PL  SLAVIC 
PASHTO;PUSHTO PS  IRANIAN 
PORTUGUESE  PT  ROMANCE 
QUECHUA   QU  AMERINDIAN 
RHAETO-ROMANCE RM  ROMANCE 
KURUNDI   RN  NEGRO-AFRICAN 
ROMANIAN  RO  ROMANCE 
RUSSIAN   RU  SLAVIC 
KINYARWANDA  RW  NEGRO-AFRICAN 
SANSKRIT  SA  INDIAN 
SINDHI   SD  INDIAN 
SANGHO   SG  NEGRO-AFRICAN 
SERBO-CROATIAN SH  SLAVIC 
SINGHALESE  SI  INDIAN 
SLOVAK   SK  SLAVIC 
SLOVENIAN  SL  SLAVIC 
SAMOAN   SM  OCEANIC/INDONESIAN 
SHONA   SN  NEGRO-AFRICAN 
SOMALI   SO  HAMITIC 
ALBANIAN  SQ  INDO-EUROPEAN (OTHER) 
SERBIAN   SR  SLAVIC 
SISWATI   SS  NEGRO-AFRICAN 
SESOTHO   ST  NEGRO-AFRICAN 
SUNDANESE  SU  OCEANIC/INDONESIAN 
SWEDISH   SV  GERMANIC 
SWAHILI   SW  NEGRO-AFRICAN 
TAMIL   TA  DRAVIDIAN 
TELUGU   TE  DRAVIDIAN 
TAJIK   TG  IRANIAN 
THAI   TH  ASIAN 
TIGRINYA  TI  SEMITIC 
TURKMEN   TK  TURKIC/ALTAIC 
TAGALOG   TL  OCEANIC/INDONESIAN 
SETSWANA  TN  NEGRO-AFRICAN 
TONGA   TO  OCEANIC/INDONESIAN 
TURKISH   TR  TURKIC/ALTAIC 
TSONGA   TS  NEGRO-AFRICAN 
TATAR   TT  TURKIC/ALTAIC 
TWI    TW  NEGRO-AFRICAN 
UIGUR   UG  [  ] 
UKRAINIAN  UK  SLAVIC 
URDU   UR  INDIAN 
UZBEK   UZ  TURKIC/ALTAIC 
VIETNAMESE  VI  ASIAN 
VOLAPUK   VO  INTERNATIONAL AUX. 
WOLOF   WO  NEGRO-AFRICAN 
XHOSA   XH  NEGRO-AFRICAN 
YIDDISH   YI  GERMANIC [*Changed 1989 from original ISO 639:1988, JI] 
YORUBA   YO  NEGRO-AFRICAN 
ZHUANG   ZA  [  ] 
CHINESE   ZH  ASIAN 
ZULU   ZU  NEGRO-AFRICAN 
</pre>'; 

$doc= new DOMDocument(); 
$doc->loadHTML($codes); 

$xmlL = simplexml_import_dom($doc); 
$pathL = $xmlL->xpath('//pre'); 
print_r($pathL); 

?> 
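To see why DOM alone doesn't get far, here is a minimal sketch on a hypothetical two-row sample (not the full list): `DOMDocument` can hand back the text inside `<pre>` as one string, but splitting that text into columns is still ordinary string parsing.

```php
<?php
// Hypothetical two-row sample in the same layout as the question's list
$html = "<pre>\nAFAR   AA  HAMITIC\nABKHAZIAN  AB  IBERO-CAUCASIAN\n</pre>";

$doc = new DOMDocument();
$doc->loadHTML($html);

// textContent returns everything inside <pre> as one plain string
$text = $doc->getElementsByTagName('pre')->item(0)->textContent;

// From here on it is string work: one trimmed entry per non-empty line
$rows = array_values(array_filter(array_map('trim', explode("\n", $text))));
print_r($rows);
```

The rows still have to be split into language, code and family, which is what the answers below deal with.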

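Wherever this code comes from, I'd suggest reworking the function that builds it. Rather than converting stored HTML into an array, I'd store the data as an array and convert that to HTML. – Joseph 2012-03-09 09:13:30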
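A minimal sketch of that suggestion, assuming the data can live in a PHP array of rows with hypothetical `lang`/`code`/`family` keys; the `<pre>` block is then generated from the array with `str_pad()` instead of being parsed back:

```php
<?php
// Hypothetical source of truth: an array of rows, not HTML
$languages = array(
    array('lang' => 'AFAR',      'code' => 'AA', 'family' => 'HAMITIC'),
    array('lang' => 'ABKHAZIAN', 'code' => 'AB', 'family' => 'IBERO-CAUCASIAN'),
);

$lines = array();
foreach ($languages as $row) {
    // Pad the first two columns to fixed widths so the columns line up
    $lines[] = str_pad($row['lang'], 16) . str_pad($row['code'], 6) . $row['family'];
}

// Escape the text and wrap it in <pre> only at output time
$html = '<pre>' . htmlspecialchars(implode("\n", $lines)) . '</pre>';
echo $html;
```

The widths 16 and 6 are arbitrary; the point is that the array stays the data and the HTML is only a view of it.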


Take a look at http://www.php.net/manual/en/function.str-getcsv.php – 2012-03-09 09:14:05
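A small sketch of how `str_getcsv()` could apply here. It only works if every column is separated by one consistent character; a tab is assumed below, and the list in the question may actually be padded with spaces, in which case this will not split it correctly.

```php
<?php
// Hypothetical tab-separated line (an assumption about the data)
$line = "AFAR\tAA\tHAMITIC";

// Arguments: input, separator, enclosure, escape ('' disables the
// escape character; accepted since PHP 7.4)
$fields = str_getcsv($line, "\t", '"', '');
print_r($fields);
```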


Possible dup of [Split/explode a file with irregular spaces and tabs column-wise](http://stackoverflow.com/q/8349551/90527), [Split a string into parts in PHP](http://stackoverflow.com/q/715747/90527), [Split a string based on values in an array](http://stackoverflow.com/q/891204/90527), and many others. – outis 2012-03-09 09:27:48

Answers


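The list is obviously generated, so you'd have better luck fixing the generator. But if you're stuck with a list like this, the following should parse it the way you want: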

$langs_ar = array(); 
$codes_ar = array(); 
$families_ar = array(); 

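// $codes is the string from the question; lines that don't fit the
// pattern (the <pre> tag, the header, blank lines) are simply skipped
foreach (preg_split('/[\r\n]+/', $codes) as $line) 
{ 
    // language (one or two words), two-character code, family
    if (preg_match('/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/', $line, $matches)) 
    { 
        $langs_ar[] = $matches[1]; 
        $codes_ar[] = $matches[2]; 
        $families_ar[] = $matches[3]; 
    } 
} 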

Oh, and instead of 3 arrays I'd recommend a single array storing a hash of the 3 fields; or create your own object with 3 properties: lang, code and family.
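A minimal sketch of the single-array suggestion, reusing the answer's regex on a hypothetical two-line sample. Each row becomes an associative array with `lang`, `code` and `family` keys (the key names are just an illustrative choice), and `array_column()` (PHP 5.5+) can then index the rows by code:

```php
<?php
// Hypothetical two-line sample; in the question $codes holds the full list
$codes = "AFAR   AA  HAMITIC\nSCOTS GAELIC GD  CELTIC";

$languages = array();
foreach (preg_split('/[\r\n]+/', $codes) as $line) {
    if (preg_match('/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/', $line, $m)) {
        // One associative array per language instead of three parallel arrays
        $languages[] = array('lang' => $m[1], 'code' => $m[2], 'family' => $m[3]);
    }
}

// Re-key the whole rows by their code for direct lookups
$byCode = array_column($languages, null, 'code');
echo $byCode['GD']['lang']; // SCOTS GAELIC
```

Keeping the three fields together means they can't drift out of sync the way three separate arrays can.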

Edit: a shorter way to do the same thing:

preg_match_all('/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/m', $codes, $matches, PREG_SET_ORDER); 
var_dump($matches); 

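$matches is now an array with one entry per matching line, where in each entry index:

  • 0 is the whole line
  • 1 is the language
  • 2 is the code
  • 3 is the family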

Just iterate over it and do whatever you need.
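One way to do that without writing the loop yourself, sketched on a hypothetical three-line sample: `array_column()` (PHP 5.5+) pulls index 1, 2 and 3 out of every `PREG_SET_ORDER` entry, which gives the three arrays the question asks for.

```php
<?php
// Hypothetical sample including the header and a blank line
$codes = "LANGUAGE  CODE  LANGUAGE FAMILY\n\nAFAR   AA  HAMITIC\nARABIC   AR  SEMITIC";

// The header and the blank line don't match, so only data rows are returned
preg_match_all('/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/m', $codes, $matches, PREG_SET_ORDER);

// Each $matches entry is [full line, language, code, family]
$langs_ar    = array_column($matches, 1);
$codes_ar    = array_column($matches, 2);
$families_ar = array_column($matches, 3);
```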


Yes, man, it works fine – 2012-03-09 10:36:10


Could you please explain what this means: /^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/ – 2012-03-09 10:37:32


It's just a regular expression; see the PHP docs here: http://www.php.net/manual/en/book.pcre.php – 2012-03-09 13:58:29
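A commented version of the same pattern may help. With the `x` (extended) modifier, whitespace in the pattern is ignored and `#` starts a comment, so this should match exactly the same lines as the one-line original:

```php
<?php
$pattern = '/
    ^
    (\S+ \s* \S+)   # 1: language, one or two "words" (e.g. SCOTS GAELIC)
    \s+             # gap between columns
    (\S{2})         # 2: the two-character code
    \s+             # gap between columns
    (\S .* \S)      # 3: family, from its first to its last non-space character
    \s* $           # ignore trailing whitespace at the end of the line
/x';

preg_match($pattern, 'PERSIAN (farsi) FA  IRANIAN', $m);
print_r($m);
```

Because the code must be exactly two non-space characters surrounded by whitespace, the header line (where the second column is `CODE`) does not match.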


I think you should look at PHP's explode function.

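That way you can first explode on "\n" (splitting the lines) to get the first array. Then, for each line, you can explode on \t (assuming tabs separate your data) to get an array with 3 separate entries, and push each of those into the arrays you want.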

Something like:

$codes_array = array(); 
foreach (explode("\n", $codes) as $line) { 
    // assumes the columns are separated by tabs
    $codes_array[] = explode("\t", $line); 
} 
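A slightly fuller sketch of the explode() idea that also does the last step, pushing each column into its own array. Like the answer, it assumes exactly one tab between columns; that is an assumption about the data, and the sample string below is hypothetical.

```php
<?php
// Hypothetical tab-separated sample with a trailing newline
$codes = "AFAR\tAA\tHAMITIC\nABKHAZIAN\tAB\tIBERO-CAUCASIAN\n";

$langs = array();
$langCodes = array();
$families = array();
foreach (explode("\n", $codes) as $line) {
    // Drop a stray \r from Windows line endings, then split on tabs
    $fields = explode("\t", rtrim($line, "\r"));
    if (count($fields) !== 3) {
        continue; // skip blank lines and anything not in three columns
    }
    $langs[] = $fields[0];
    $langCodes[] = $fields[1];
    $families[] = $fields[2];
}
```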

*For multi-line string definitions, use double quotes.* Why? – Yoshi 2012-03-09 09:22:07


Because even though it works now, it wasn't supported by the standard before. – kappa 2012-03-09 11:22:01


What? You might want to share a link for reference, because I've never heard of that in the past 8 years. – Yoshi 2012-03-09 11:30:31
