BibTeX PHP preg_match_all

I have a text file exported from BibTeX. It contains many entries in the following pattern:

@article{ls_leimeister, 
    added-at = {2013-01-18T11:14:11.000+0100}, 
    author = {Wegener, R. and Leimeister, J. M.}, 
    biburl = {http://www.bibsonomy.org/bibtex/27bb26b4b4858439f81aa0ec777944ac5/ls_leimeister}, 
    journal = {International Journal of Technology Enhanced Learning (to appear)}, 
    keywords = {Challenges Communities: Factors Learning Success VirtualCommunity and itegpub pub_jml pub_rwe}, 
    note = {JML_390}, 
    title = {Virtual Learning Communities: Success Factors and Challenges}, 
    year = 2013 
}

I want to parse these entries with PHP and preg_match_all.

Consider the following, which didn't get me anywhere:

preg_match_all('/@^.*}$/', file_get_contents($file_path),$results);

I wanted to start simple, but it really doesn't work. I'm fairly new to regular expressions in PHP.

The perfect final output would be:

Array 
    (
     [0] => Array 
      (
       ['type'] => article 
       ['unique_name'] => ls_leimeister 
       ['added-at'] => 2013-01-18T11:14:11.000+0100 
       ['author'] => Wegener, R. and Leimeister, J. M. 
       ['biburl'] => http://www.bibsonomy.org/bibtex/27bb26b4b4858439f81aa0ec777944ac5/ls_leimeister 
       ['journal'] => International Journal of Technology Enhanced Learning (to appear) 
       ['keywords'] => Challenges Communities: Factors Learning Success VirtualCommunity and itegpub pub_jml pub_rwe 
       ['note'] => JML_390 
       ['title'] => Virtual Learning Communities: Success Factors and Challenges 
       ['year'] => 2013 
      ) 

     [1] => Array 
      (
       [...] => … 
      ) 

    )
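Two details explain why `/@^.*}$/` matches nothing: `^` asserts the start of the subject (or of a line under the `m` modifier), so it can never succeed directly after a literal `@`, and `.` does not cross newlines unless the `s` modifier is set. A minimal sketch of a working first step that captures each whole entry, assuming (as in the sample) that every entry's closing `}` is the first character of its own line; the short `$text` sample is made up for illustration:

```php
<?php
// Made-up sample in the same shape as the export (two entries).
$text = "@article{a1,\n    title = {First},\n    year = 2013\n}\n\n@book{b2,\n    title = {Second}\n}\n";

// m: ^ matches at every line start; s: . also matches newlines.
// Group 1 = entry type; group 2 = everything lazily up to the first "}" that starts a line.
preg_match_all('/^@(\w+)\{(.*?)^\}/ms', $text, $results);

print_r($results[1]); // entry types, one per entry
```

Each element of `$results[2]` then holds one entry's body, which can be split into fields in a second step.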

2013-02-28 Spurious

@renanbr (https://stackoverflow.com/users/5249251/renanbr) recommends renanbr/bibtex-parser: https://github.com/renanbr/bibtex-parser (I believe it is his own creation). – mickmackusa 2017-12-12 03:15:08

Every BibTeX document I have seen wraps the year value in curly braces. Is this a typo in your post? – mickmackusa 2017-12-12 13:54:05

Try this: here I only fetch type and unique_name; by following the same approach, you can fetch all the others.

$str = '@article{ls_leimeister, 
    added-at = {2013-01-18T11:14:11.000+0100}, 
    author = {Wegener, R. and Leimeister, J. M.}, 
    biburl = {http://www.bibsonomy.org/bibtex/27bb26b4b4858439f81aa0ec777944ac5/ls_leimeister}, 
    journal = {International Journal of Technology Enhanced Learning (to appear)}, 
    keywords = {Challenges Communities: Factors Learning Success VirtualCommunity and itegpub pub_jml pub_rwe}, 
    note = {JML_390}, 
    title = {Virtual Learning Communities: Success Factors and Challenges}, 
    year = 2013 
}'; 

preg_match_all('/@(?P<type>\w+){(?P<unique_name>\w+),(.*)/',$str,$matches); 

echo $matches['type'][0]; 
echo "<br>"; 
echo $matches['unique_name'][0]; 
echo "<br>"; 

echo "<pre>"; 
print_r($matches);

The output array format is slightly different from yours, but you can convert it into your format.
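A sketch of that conversion, using `PREG_SET_ORDER` so each match becomes its own array, then keeping only the named keys. The shortened `$str`, the escaped `\{`, and dropping the trailing `(.*)` group are simplifications for illustration, not part of the answer above:

```php
<?php
$str = "@article{ls_leimeister,\n    note = {JML_390}\n}\n@book{ko,\n    year = {2011}\n}";

// PREG_SET_ORDER returns one array per match instead of one array per group.
preg_match_all('/@(?P<type>\w+)\{(?P<unique_name>\w+),/', $str, $matches, PREG_SET_ORDER);

$entries = [];
foreach ($matches as $m) {
    // Each set holds numeric keys (0, 1, 2) plus the named keys; keep only the names.
    $entries[] = array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}
print_r($entries);
```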

2013-02-28 12:13:14

Thanks, this works, but the other lines are harder. The number of lines varies, and some lines have "{...}," while others don't. – Spurious 2013-02-28 12:25:03
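One way to handle both field forms, `key = {value},` and a bare value such as `year = 2013`, is an alternation with one capture group per form. This sketch assumes every field fits on a single line (a value that wraps, like a long `subtitle`, would be missed) and that keys contain only word characters and hyphens:

```php
<?php
$entry = "    author = {Wegener, R. and Leimeister, J. M.},\n    note = {JML_390},\n    year = 2013\n";

// Each field line is "key = {value}," or "key = value" (bare).
// Group 2 captures a braced value, group 3 a bare one; only one participates.
preg_match_all('/^\s*([\w-]+)\s*=\s*(?:\{(.*)\}|([^,\s]+))\s*,?\s*$/m', $entry, $m, PREG_SET_ORDER);

$fields = [];
foreach ($m as $set) {
    // A bare value fills group 3; for a braced value, trailing group 3 is omitted.
    $fields[$set[1]] = isset($set[3]) ? $set[3] : $set[2];
}
print_r($fields);
```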

Yes, I know it's hard, but give it a try. – 2013-02-28 12:30:00

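preg_match_all('/@(\w+){(.+),\s+(\S+)\s+=\s+{(.*)},(.*)/', $file_content, $results); This produces the first line as well. How do I get the regex to retrieve an unlimited number of lines with the same format? Otherwise I would need to read out the matches for the entries and then run another preg_match on each match. – Spurious 2013-02-28 12:41:51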
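Running a second `preg_match_all` per entry is a workable answer to that question. A sketch of the two-pass idea, assuming each entry closes with a `}` at the start of a line and each field fits on one line; the sample `$fileContent` is shortened from the question's data:

```php
<?php
$fileContent = "@article{ls_leimeister,\n    added-at = {2013-01-18T11:14:11.000+0100},\n    note = {JML_390},\n    year = 2013\n}\n\n@book{ko,\n    title = {Wissenschaftlich schreiben leicht gemacht},\n    year = {2011}\n}\n";

$results = [];
// Pass 1: one match per entry: type, unique name, body up to the "}" that starts a line.
preg_match_all('/^@(\w+)\{([^,]+),(.*?)^\}/ms', $fileContent, $entries, PREG_SET_ORDER);
foreach ($entries as $entry) {
    $record = ['type' => $entry[1], 'unique_name' => $entry[2]];
    // Pass 2: any number of "key = {value}" or "key = value" lines inside the body.
    preg_match_all('/^\s*([\w-]+)\s*=\s*(?:\{(.*)\}|([^,\s]+))\s*,?\s*$/m', $entry[3], $fields, PREG_SET_ORDER);
    foreach ($fields as $f) {
        // Group 3 exists only for a bare value; otherwise use the braced group 2.
        $record[$f[1]] = isset($f[3]) ? $f[3] : $f[2];
    }
    $results[] = $record;
}
print_r($results);
```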

Pattern: /^@([^{]+)\{([^,]+),\s*$|^\s*([^\r\n@=]+) = \{(.*?)}/ms (Demo)

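This pattern has two alternatives, each containing two capture groups.

type and unique_name are captured and stored in elements [1] and [2].
All other key-value pairs are stored in elements [3] and [4].

Storing the two kinds of data in separate elements makes it reliable to tell header lines from field lines when building the desired output array structure.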
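The split is visible in what `PREG_SET_ORDER` returns: PHP omits trailing groups that did not participate, so a header match has only indices 0 to 2, while a field match has indices 0 to 4 with `[1]` and `[2]` set to empty strings. A small check of that behavior, where the `[^\r\n@=]` character class for the key is an assumed reading of that part of the pattern:

```php
<?php
// Assumed key class [^\r\n@=]: anything except line breaks, "@" and "=".
$pattern = '/^@([^{]+)\{([^,]+),\s*$|^\s*([^\r\n@=]+) = \{(.*?)}/ms';
$bibtex = "@BOOK{ko,\n    title = {Wissenschaftlich schreiben leicht gemacht},\n    year = {2011}\n}";

preg_match_all($pattern, $bibtex, $out, PREG_SET_ORDER);
foreach ($out as $set) {
    if (!isset($set[3])) {
        // Header line: only [0], [1] (type) and [2] (unique_name) exist.
        echo "header: {$set[1]} / {$set[2]}\n";
    } else {
        // Field line: [1] and [2] are empty strings; [3] is the key, [4] the value.
        echo "field: {$set[3]} => {$set[4]}\n";
    }
}
```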

Input:

$bibtex='@BOOK{ko, 
    title = {Wissenschaftlich schreiben leicht gemacht}, 
    publisher = {Haupt}, 
    year = {2011}, 
    author = {Kornmeier, M.}, 
    number = {3154}, 
    series = {UTB}, 
    address = {Bern}, 
    edition = {4}, 
    subtitle = {für Bachelor, Master und Dissertation} 
} 

@BOOK{nial, 
    title = {Wissenschaftliche Arbeiten schreiben mit Word 2010}, 
    publisher = {Addison Wesley}, 
    year = {2011}, 
    author = {Nicol, N. and Albrecht, R.}, 
    address = {München}, 
    edition = {7} 
} 

@ARTICLE{shome, 
    author = {Scholz, S. and Menzl, S.}, 
    title = {Alle Wege führen nach Rom}, 
    journal = {Medizin Produkte Journal}, 
    year = {2011}, 
    volume = {18}, 
    pages = {243-254}, 
    subtitle = {ein Vergleich der regulatorischen Anforderungen und Medizinprodukte 
    in Europa und den USA}, 
    issue = {4} 
} 

@INBOOK{shu, 
    author = {Schulz, C.}, 
    title = {Corporate Finance für den Mittelstand}, 
    booktitle = {Praxishandbuch Firmenkundengeschäft}, 
    year = {2010}, 
    editor = {Hilse, J. and Netzel, W and Simmert, D.B.}, 
    booksubtitle = {Geschäftsfelder Risikomanagement Marketing}, 
    publisher = {Gabler}, 
    pages = {97-107}, 
    location = {Wiesbaden} 
}';

Method: (Demo)

$pattern='/^@([^{]+)\{([^,]+),\s*$|^\s*([^\r\n@=]+) = \{(.*?)}/ms';
$result=[]; // defined up front so var_export() works even when nothing matches
if(preg_match_all($pattern,$bibtex,$out,PREG_SET_ORDER)){
    foreach($out as $line){
        // header lines (@type{unique_name,) leave the trailing groups [3] and [4] unset
        if(!isset($line[3])){ // this is the starting line of a new set
            if(isset($temp)){
                $result[]=$temp; // send $temp data to permanent storage
            }
            $temp=['type'=>$line[1],'unique_name'=>$line[2]]; // declare fresh new $temp
        }else{
            $temp[$line[3]]=$line[4]; // continue to store the $temp data
        }
    }
    $result[]=$temp; // store the final $temp data
}
var_export($result);
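The same loop can be wrapped in a reusable function. This sketch uses `PREG_UNMATCHED_AS_NULL` (PHP 7.2 or later) so every group is always present and an unused group is `null`, which makes the header check explicit instead of relying on omitted trailing groups. `parseBibtex` is a hypothetical name, and the `[^\r\n@=]` key class is an assumption:

```php
<?php
// Hypothetical helper; same two-alternative pattern, assumed key class [^\r\n@=].
function parseBibtex(string $bibtex): array
{
    $pattern = '/^@([^{]+)\{([^,]+),\s*$|^\s*([^\r\n@=]+) = \{(.*?)}/ms';
    $result = [];
    $temp = null;
    // PREG_UNMATCHED_AS_NULL reports every group; groups that did not match are null.
    preg_match_all($pattern, $bibtex, $out, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);
    foreach ($out as $line) {
        if ($line[3] === null) {            // header line: @type{unique_name,
            if ($temp !== null) {
                $result[] = $temp;          // store the previous entry
            }
            $temp = ['type' => $line[1], 'unique_name' => $line[2]];
        } elseif ($temp !== null) {         // field line of the current entry
            $temp[$line[3]] = $line[4];
        }
    }
    if ($temp !== null) {
        $result[] = $temp;                  // store the last entry
    }
    return $result;
}

$sample = "@BOOK{ko,\n    publisher = {Haupt},\n    year = {2011}\n}\n\n@BOOK{nial,\n    edition = {7}\n}";
var_export(parseBibtex($sample));
```

Fields that appear before any header line are skipped by the `elseif` guard, which is a small deviation from the loop above.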

Output:

array (
  0 =>
  array (
    'type' => 'BOOK',
    'unique_name' => 'ko',
    'title' => 'Wissenschaftlich schreiben leicht gemacht',
    'publisher' => 'Haupt',
    'year' => '2011',
    'author' => 'Kornmeier, M.',
    'number' => '3154',
    'series' => 'UTB',
    'address' => 'Bern',
    'edition' => '4',
    'subtitle' => 'für Bachelor, Master und Dissertation',
  ),
  1 =>
  array (
    'type' => 'BOOK',
    'unique_name' => 'nial',
    'title' => 'Wissenschaftliche Arbeiten schreiben mit Word 2010',
    'publisher' => 'Addison Wesley',
    'year' => '2011',
    'author' => 'Nicol, N. and Albrecht, R.',
    'address' => 'München',
    'edition' => '7',
  ),
  2 =>
  array (
    'type' => 'ARTICLE',
    'unique_name' => 'shome',
    'author' => 'Scholz, S. and Menzl, S.',
    'title' => 'Alle Wege führen nach Rom',
    'journal' => 'Medizin Produkte Journal',
    'year' => '2011',
    'volume' => '18',
    'pages' => '243-254',
    'subtitle' => 'ein Vergleich der regulatorischen Anforderungen und Medizinprodukte
    in Europa und den USA',
    'issue' => '4',
  ),
  3 =>
  array (
    'type' => 'INBOOK',
    'unique_name' => 'shu',
    'author' => 'Schulz, C.',
    'title' => 'Corporate Finance für den Mittelstand',
    'booktitle' => 'Praxishandbuch Firmenkundengeschäft',
    'year' => '2010',
    'editor' => 'Hilse, J. and Netzel, W and Simmert, D.B.',
    'booksubtitle' => 'Geschäftsfelder Risikomanagement Marketing',
    'publisher' => 'Gabler',
    'pages' => '97-107',
    'location' => 'Wiesbaden',
  ),
)

Here is the site I took the new sample input string from.

2017-12-12 14:32:32 mickmackusa
