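Create a DOM element when a substring is found (PHP)

I want to split a string with a regular expression and create a DOM element wherever I find a match, continuing until the end of the string. Given a string: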

$str="hi there! [1], how are you? [2]";

Expected result:

<sentence> 
hi there! <child1>1</child1>, how are you? <child2>2</child2> 
</sentence>

I am using PHP DOM -> $dom = new DOMDocument('1.0'); ...

Creating the root (this is probably irrelevant, but some people complain about a lack of effort and so on..):

$root = $dom->createElement('sentence');
$root = $dom->appendChild($root);
$root->setAttribute('attr-1', 'value-1');

I tried several approaches, some of them using preg_split:

$counter = 1;
$pos = preg_match('/\[([1-9][0-9]*)\]/', $str);
if ($pos == true) {
    $substr = $dom->createElement('child', $counter);
    $root->appendChild($substr);
    $counter++;
}

I know the code is not worth much, but again, I am showing it so it is clear that I made an attempt..

Any help is appreciated..

Source

2012-03-19 teutara

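Your original code was not far off. However, the regular expression also needs to match the text you want to add (and you need a text node for it). After each match you also need to advance the offset so that matching continues from there:

$str = "hi there! [1], how are you? [2]";

$dom = new DOMDocument('1.0');
$root = $dom->createElement('sentence');
$root = $dom->appendChild($root);
$root->setAttribute('attr-1', 'value-1'); # ...

$counter = 0;
$offset = 0;
// Group 1 is the text before the marker, group 2 the number inside the brackets.
// The flags argument is 0 (passing NULL is deprecated in current PHP versions).
while ($pos = preg_match('/(.*?)\[([1-9][0-9]*)\]/', $str, $matches, 0, $offset)) {
    list(, $text, $number) = $matches;
    if (strlen($text)) {
        $root->appendChild($dom->createTextNode($text));
    }
    if (strlen($number)) {
        $counter++;
        $root->appendChild($dom->createElement("child$counter", $number));
    }
    // Continue matching right after the marker that was just consumed.
    $offset += strlen($matches[0]);
}

echo $dom->saveXML();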

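The while loop is comparable to the if you had; it just turns it into a loop. Also, a text node is only added if some text was matched (for example, you could have "[1][2]" in your string, in which case the text would be empty). The output of this example: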

<?xml version="1.0"?> 
<sentence attr-1="value-1"> 
    hi there! <child1>1</child1>, how are you? <child2>2</child2> 
</sentence>
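One detail the loop above does not cover is text that follows the last [n] marker: once no further marker is found, the loop simply ends. A minimal sketch of one way to handle that, assuming a hypothetical helper named appendMarkedText (the name and the added "s" modifier are my assumptions, not part of the original answer), appends whatever remains after the final offset:

```php
<?php
// Hypothetical helper (not from the original answer): appends text nodes and
// <childN> elements to $parent for every "[n]" marker found in $str.
function appendMarkedText(DOMDocument $dom, DOMNode $parent, $str)
{
    $counter = 0;
    $offset  = 0;
    // The "s" modifier lets "." match newlines too, so each match starts exactly at $offset.
    while (preg_match('/(.*?)\[([1-9][0-9]*)\]/s', $str, $matches, 0, $offset)) {
        list($whole, $text, $number) = $matches;
        if (strlen($text)) {
            $parent->appendChild($dom->createTextNode($text));
        }
        $counter++;
        $parent->appendChild($dom->createElement("child$counter", $number));
        $offset += strlen($whole);
    }
    // Anything left after the last marker is plain text.
    $rest = (string) substr($str, $offset);
    if (strlen($rest)) {
        $parent->appendChild($dom->createTextNode($rest));
    }
}

$dom  = new DOMDocument('1.0');
$root = $dom->appendChild($dom->createElement('sentence'));
appendMarkedText($dom, $root, "hi there! [1], how are you? [2] test");
echo $dom->saveXML($root), "\n";
```

Passing the node to saveXML() prints only that element, without the XML declaration.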

Edit: After playing around with this, I came to the conclusion that you might want to break the problem apart. One part is parsing the string; the other is actually inserting the nodes (a text node for text, an element node for a number). Starting from the back, this immediately looks practical, so here is the second part first:

$dom = new DOMDocument('1.0');
$root = $dom->createElement('sentence');
$root = $dom->appendChild($root);
$root->setAttribute('attr-1', 'value-1'); # ...

$str = "hi there! [1], how are you? [2] test";

$it = new Tokenizer($str);
$counter = 0;
foreach ($it as $type => $string) {
    switch ($type) {
        case Tokenizer::TEXT:
            $root->appendChild($dom->createTextNode($string));
            break;

        case Tokenizer::NUMBER:
            $counter++;
            $root->appendChild($dom->createElement("child$counter", $string));
            break;

        default:
            throw new Exception(sprintf('Invalid type %s.', $type));
    }
}

echo $dom->saveXML();

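In this example we do not care about the parsing at all. We get either a text or a number ($type), and we decide whether to insert a text node or an element accordingly. So however the parsing of the string is done, this code will always work. If there is a problem with it (for example, $counter is no longer of interest), it has nothing to do with parsing/tokenizing the string.

The parsing itself has been encapsulated in an Iterator called Tokenizer. It contains everything needed to split the string into text and number elements. It deals with all the details, such as what happens if there is some text after the last number:

class Tokenizer implements Iterator
{
    const TEXT = 1;
    const NUMBER = 2;
    private $offset = 0;
    private $string;
    private $fetched = array(); // queue of array(type, value) pairs

    public function __construct($string)
    {
        $this->string = $string;
    }

    // The attribute silences the PHP 8.1+ return-type deprecation notice;
    // older PHP versions treat the line as a comment.
    #[\ReturnTypeWillChange]
    public function rewind()
    {
        $this->offset = 0;
        $this->fetched = array();
        $this->fetch();
    }

    // Reads up to the next "[n]" marker and queues the text before it and the number.
    private function fetch()
    {
        if ($this->offset >= strlen($this->string)) {
            return;
        }
        $result = preg_match('/\[([1-9][0-9]*)\]/', $this->string, $matches, PREG_OFFSET_CAPTURE, $this->offset);
        if (!$result) {
            // No further marker: the rest of the string is plain text.
            $this->fetched[] = array(self::TEXT, substr($this->string, $this->offset));
            $this->offset = strlen($this->string);
            return;
        }
        $pos = $matches[0][1];
        if ($pos != $this->offset) {
            $this->fetched[] = array(self::TEXT, substr($this->string, $this->offset, $pos - $this->offset));
        }
        $this->fetched[] = array(self::NUMBER, $matches[1][0]);
        $this->offset = $pos + strlen($matches[0][0]);
    }

    #[\ReturnTypeWillChange]
    public function current()
    {
        list(, $current) = current($this->fetched);
        return $current;
    }

    #[\ReturnTypeWillChange]
    public function key()
    {
        list($key) = current($this->fetched);
        return $key;
    }

    #[\ReturnTypeWillChange]
    public function next()
    {
        array_shift($this->fetched);
        if (!$this->fetched) $this->fetch();
    }

    #[\ReturnTypeWillChange]
    public function valid()
    {
        return (bool)$this->fetched;
    }
}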

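With that done, the two problems are separated from each other. Instead of an iterator class you could also create an array of arrays or something similar, but I find an iterator more useful, so I quickly wrote one.

This example again outputs the XML at the end, so here it is for illustration. Note that I added some text after the last element: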
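For comparison, the array-based variant mentioned above could look roughly like the following. This is only a sketch under my own assumptions: the function name tokenize and the 'text'/'number' type labels are hypothetical and do not come from the answer. It relies on preg_split() with PREG_SPLIT_DELIM_CAPTURE, which returns the captured numbers in between the text pieces:

```php
<?php
// Hypothetical helper (not from the original answer): returns a list of
// array(type, value) pairs, where type is 'text' or 'number'.
function tokenize($str)
{
    // With one capture group and PREG_SPLIT_DELIM_CAPTURE, even indexes hold
    // the text pieces and odd indexes hold the captured numbers.
    $parts = preg_split('/\[([1-9][0-9]*)\]/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
    $tokens = array();
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            $tokens[] = array('number', $part);
        } elseif ($part !== '') {
            $tokens[] = array('text', $part); // skip empty text between adjacent markers
        }
    }
    return $tokens;
}

$dom  = new DOMDocument('1.0');
$root = $dom->appendChild($dom->createElement('sentence'));
$counter = 0;
foreach (tokenize("hi there! [1], how are you? [2] test") as $token) {
    list($type, $value) = $token;
    if ($type === 'number') {
        $counter++;
        $root->appendChild($dom->createElement("child$counter", $value));
    } else {
        $root->appendChild($dom->createTextNode($value));
    }
}
echo $dom->saveXML($root), "\n";
```

The trade-off is that the whole string is tokenized up front, whereas the iterator produces tokens lazily as the loop asks for them.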

<?xml version="1.0"?> 
<sentence attr-1="value-1"> 
    hi there! <child1>1</child1>, how are you? <child2>2</child2> test 
</sentence>

Source

2012-03-19 15:44:16 hakre

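Thanks hakre, I am trying to implement your solution. Will post the result.. – teutara 2012-03-19 15:52:45

I am still missing something :( Will post. – teutara 2012-03-19 16:05:59

@teutara: It was not very stable; I have edited the answer with something that might be useful: breaking the problem apart. – hakre 2012-03-19 16:30:27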

-1

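First do the replacement with a regular expression, then parse the document.

// Escape &, < and > in the text first so that loadXML() receives well-formed input.
$xml = preg_replace('/\[(\d+)\]/', '<child$1>$1</child$1>', htmlspecialchars($str, ENT_NOQUOTES | ENT_XML1));
$doc = new DOMDocument('1.0');
$doc->loadXML("<sentence>$xml</sentence>");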

Here's a demo.
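If the sentence has to end up inside a document that is already being built with DOM calls, one possible variation is to load the replaced string into a DOMDocumentFragment and append that to the existing element. The following is a sketch under that assumption rather than part of the original answer; the variable names are illustrative:

```php
<?php
$str = "hi there! [1], how are you? [2]";

// An existing document that is being built with regular DOM calls.
$dom  = new DOMDocument('1.0');
$root = $dom->appendChild($dom->createElement('sentence'));

// Escape &, < and > first so that plain text cannot break the XML,
// then turn every "[n]" marker into a <childn> element.
$escaped = htmlspecialchars($str, ENT_NOQUOTES | ENT_XML1);
$xml = preg_replace('/\[(\d+)\]/', '<child$1>$1</child$1>', $escaped);

// A fragment can hold mixed text and element nodes and be appended in one step.
$fragment = $dom->createDocumentFragment();
if ($fragment->appendXML($xml)) {
    $root->appendChild($fragment);
}

echo $dom->saveXML($root), "\n";
```

After the append, the child elements are ordinary nodes of $dom and can be reached with the usual DOM methods.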

Source

2012-03-19 15:29:16 Ryan

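Thanks, but this won't help, because I would be parsing the XML after creating it. I will not be able to get the s .. – teutara 2012-03-19 15:33:45

@teutara: Sorry, could you rephrase that? – Ryan 2012-03-19 15:35:04

I mean, since the data comes from a database and there are many elements to create, at the same level or at upper levels etc., having the children as a string won't help – teutara 2012-03-19 15:39:05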
