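Large PHP for loop with SimpleXMLElement very slow: memory issue?

I currently have some PHP code that basically pulls data from an XML file and creates a SimpleXML object with $products = new SimpleXMLElement($xmlString); I then loop over it with a for loop, in which I set the product details for each product in the XML document. Each product is then saved to a MySQL database.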

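While the script runs, products are added less and less frequently, until they eventually stop before all of them have been added. I have tried running garbage collection at intervals, to no avail, and unsetting various variables does not seem to help either. The relevant part of the code is shown below: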

<?php 
$servername = "localhost"; 
$username = "database.database"; 
$password = "demwke"; 
$database = "databasename"; 
$conn = new mysqli($servername, $username, $password, $database); 

$file = "large.xml"; 
$xmlString = file_get_contents($file); 
$products = new SimpleXMLElement($xmlString); 
unset($xmlString, $file); 
$total = count($products->datafeed[0]); 

echo 'Starting<br><br>'; 

for($i=0;$i<$total;$i++){ 
    $id = $products->datafeed->prod[$i]['id']; 
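    // etc. etc. (the other variables are set the same way)
    // `desc` is a reserved word in MySQL, so it must be quoted with backticks
    $sql = "INSERT INTO products (id, name, uid, cat, prodName, brand, `desc`, link, imgurl, price, subcat) VALUES ('$id', '$store', '$storeuid', '$category', '$prodName', '$brand', '$prodDesc', '$link', '$image', '$price', '$subCategory')";
    $conn->query($sql);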
} 
echo '<br>Finished'; 
?>

The PHP variables are all defined with lines similar to the one for $id, but those were removed to make the code easier to read.

Any ideas on what I can do, or read up on, to get this done? The time it takes doesn't matter to me, as long as it eventually completes.

Source

2015-04-20 Adam Moseley

Can you explain "decreases in frequency until they eventually stop"? Maybe add a piece of the XML structure to illustrate? –

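I have another page that checks the total number of rows in the database. In the first 5 seconds about 4000 are added, then in the next 5 seconds about 2000 more. The rate keeps dropping from there until it is only about 10 per second. –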

Possible dupe: http://stackoverflow.com/questions/18518602/stream-parse-4-gb-xml-file-in-php –

Update: never use SimpleXML indexing unless you have really few objects. Use foreach instead:

// Before, with [index]:
for ($i = 0; $i < $total; $i++) {
    $id = $products->datafeed->prod[$i]['id'];
    // ...
}

// After, with foreach():
$i = 0;
foreach ($products->datafeed->prod as $prod) {
    $i++; // Remove if you don't actually need $i
    $id = $prod['id'];
    // ...
}

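In general, ...->node[$i] accesses the node[] array and reads through it up to the desired index, so iterating over the node array is not O(N) but O(N²). There is no workaround, because there is no guarantee that when you access item K you have just accessed item K-1 (and so on, recursively). foreach keeps a pointer, and therefore works in O(N).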
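To make the O(N) versus O(N²) difference concrete, here is a toy cost model. It is an assumption for illustration, not SimpleXML's actual implementation: if every [$i] lookup walks from the first sibling, reaching index k visits k+1 nodes, so a loop over N items visits N(N+1)/2 nodes, whereas an iterator that keeps its position visits N. The function names are hypothetical.

```php
<?php
// Toy cost model of the two access patterns. It does NOT use SimpleXML;
// it only counts node visits under the "every lookup starts at the head"
// assumption described above.

/** Node visits needed to reach index $k when each lookup starts at the head. */
function indexedLookupCost(int $k): int
{
    return $k + 1; // visit nodes 0..k
}

/** Total visits for the for ($i ...) pattern: one fresh walk per index. */
function indexedLoopCost(int $n): int
{
    $visits = 0;
    for ($i = 0; $i < $n; $i++) {
        $visits += indexedLookupCost($i);
    }
    return $visits; // n(n+1)/2, i.e. O(n^2)
}

/** Total visits for the foreach pattern: the iterator keeps its position. */
function iteratorLoopCost(int $n): int
{
    return $n; // each node visited once, i.e. O(n)
}

foreach ([1000, 10000, 100000] as $n) {
    printf("%6d items: indexed %d visits, foreach %d visits\n",
        $n, indexedLoopCost($n), iteratorLoopCost($n));
}
```

For 100000 items, the size used in the experiment further down, the model gives 5000050000 visits for indexed access against 100000 for foreach.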

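For the same reason, it can pay off to foreach over the whole array even if you really only need a few, known items (unless they are few and very close to the start of the array):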

// Before, with [index]:
$a[0] = $products->datafeed->prod[15]['id'];
// ...
$a[35] = $products->datafeed->prod[1293]['id'];

// After, with foreach():
$want = [15, /* ... */ 1293];
$i = 0;
foreach ($products->datafeed->prod as $prod) {
    // test the current position, then advance it (the first <prod> is position 0)
    if (!in_array($i++, $want)) {
        continue;
    }
    $a[] = $prod['id'];
}
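The foreach loop above calls in_array() for every element and keeps going after the last wanted item. A variant, sketched under the same assumption about the <datafeed><prod id="..."> layout, turns the wanted positions into a lookup table with array_flip() so each check is an isset(), and stops as soon as everything has been found. The function name and the sample XML are hypothetical; here the first <prod> counts as position 0.

```php
<?php
// Collect the id attributes at a few known positions in one forward pass.

function pickIdsByPosition(SimpleXMLElement $products, array $wanted): array
{
    $wantedSet = array_flip($wanted); // position => index, for O(1) isset() checks
    $remaining = count($wantedSet);
    $found = [];
    $i = 0;
    foreach ($products->datafeed->prod as $prod) {
        if (isset($wantedSet[$i])) {
            $found[$i] = (string) $prod['id'];
            if (--$remaining === 0) {
                break; // everything requested has been found, stop early
            }
        }
        $i++;
    }
    return $found; // keyed by position, in document order
}

// Made-up sample feed with four products.
$sample = new SimpleXMLElement(
    '<products><datafeed>'
    . '<prod id="a"/><prod id="b"/><prod id="c"/><prod id="d"/>'
    . '</datafeed></products>'
);
print_r(pickIdsByPosition($sample, [1, 3]));
```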

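You should first verify whether the growing delay is caused by MySQLi or by the XML processing. Remove (comment out) the SQL query execution from the loop, and nothing else, to see whether the speed (which you would now expect to be higher... :-) ) stays constant or shows the same decrease.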
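One way to run that check is to time the loop in fixed-size batches and compare the batch times with and without the INSERT. The sketch below is a generic harness under that assumption: timeInBatches() is a hypothetical helper, and in the real script $items would be the product iteration and $process the loop body, with or without the $conn->query() call. A no-op closure stands in for the commented-out query here.

```php
<?php
// Time a loop in batches: flat batch times mean the per-item cost is constant,
// growing batch times mean each item costs more than the one before.

/**
 * Runs $process on every item and returns the seconds spent per batch
 * of $batchSize items (the last batch may be smaller).
 */
function timeInBatches(iterable $items, callable $process, int $batchSize): array
{
    $durations = [];
    $count = 0;
    $start = microtime(true);
    foreach ($items as $item) {
        $process($item);
        if (++$count % $batchSize === 0) {
            $durations[] = microtime(true) - $start;
            $start = microtime(true);
        }
    }
    if ($count % $batchSize !== 0) {
        $durations[] = microtime(true) - $start; // partial last batch
    }
    return $durations;
}

// A no-op stands in for the INSERT, i.e. the "commented out" query.
$noInsert = function ($row): void {};
$times = timeInBatches(range(1, 25000), $noInsert, 5000);
foreach ($times as $n => $t) {
    printf("batch %d: %.4f s\n", $n + 1, $t);
}
```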

I suspect the XML processing is the culprit, here:

for($i=0;$i<$total;$i++){ 
    $id = $products->datafeed->prod[$i]['id'];

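...where you access an index that lies further and further into the SimpleXMLElement. This may be running into the Schlemiel the Painter problem: every lookup starts over from the beginning, so each one costs more than the last.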

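The direct answer to your question, "how can I get the loop to finish, regardless of time", is "increase the memory limit and the maximum execution time".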
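A sketch of raising both limits from inside the script, before the XML is loaded. The value '1024M' is only an example, not a recommendation; some hosts forbid changing these settings at runtime, which is why the return values are checked.

```php
<?php
// Raise the memory and time limits at the top of the import script.

$previous = ini_set('memory_limit', '1024M'); // old value on success, false on failure
if ($previous === false) {
    error_log('Could not change memory_limit');
}

// 0 removes the execution time limit; set_time_limit() returns false if not allowed.
if (!set_time_limit(0)) {
    error_log('Could not change the execution time limit');
}

echo 'memory_limit = ', ini_get('memory_limit'),
    ', max_execution_time = ', ini_get('max_execution_time'), "\n";
```

Note that this only buys time: with indexed access the loop still gets slower as it goes.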

To improve performance, iterate over the feed object with foreach instead of indexing into it:

$i = -1; 
foreach ($products->datafeed->prod as $prod) { 
    $i++; 
    $id = $prod['id']; 
    ... 
}

Experiment

I used this little program to build a large XML document and iterate over its contents:

// Stage 1. Create a large XML. 
$xmlString = '<?xml version="1.0" encoding="UTF-8" ?>'; 
$xmlString .= '<content><package>'; 
for ($i = 0; $i < 100000; $i++) { 
    $xmlString .= "<entry><id>{$i}</id><text>The quick brown fox did what you would expect</text></entry>"; 
} 
$xmlString .= '</package></content>'; 

// Stage 2. Load the XML. 
$xml = new SimpleXMLElement($xmlString); 

$tick = microtime(true); 
for ($i = 0; $i < 100000; $i++) { 
    $id = (int) $xml->package->entry[$i]->id; // cast so the % check below works on an int
    if (0 === ($id % 5000)) { 
     $t = microtime(true) - $tick; 
     print date("H:i:s") . " id = {$id} at {$t}\n"; 
     $tick = microtime(true); 
    } 
}

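After generating the XML, a loop walks through it and prints how many seconds it takes to iterate over each 5000 elements. To verify that this really is the time delta, the wall-clock time is printed too; the delta should approximately match the difference between consecutive timestamps.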

21:22:35 id = 0 at 2.7894973754883E-5 
21:22:35 id = 5000 at 0.38135695457458 
21:22:38 id = 10000 at 2.9452259540558 
21:22:44 id = 15000 at 5.7002019882202 
21:22:52 id = 20000 at 8.0867099761963 
21:23:02 id = 25000 at 10.477082967758 
21:23:15 id = 30000 at 12.81209897995 
21:23:30 id = 35000 at 15.120756149292

So this is what happens: accessing the XML array by index becomes slower and slower.

This is essentially the same program, using foreach:

// Stage 1. Create a large XML. 
$xmlString = '<?xml version="1.0" encoding="UTF-8" ?>'; 
$xmlString .= '<content><package>'; 
for ($i = 0; $i < 100000; $i++) { 
    $xmlString .= "<entry><id>{$i}</id><text>The quick brown fox did ENTRY {$i}.</text></entry>"; 
} 
$xmlString .= '</package></content>'; 

// Stage 2. Load the XML. 
$xml = new SimpleXMLElement($xmlString); 

$i  = 0; 
$tick = microtime(true); 
foreach ($xml->package->entry as $data) { 
    // $id = $xml->package->entry[$i]->id; 
    $id = (int) $data->id;
    $i++; 
    if (0 === ($id % 5000)) { 
     $t = microtime(true) - $tick; 
     print date("H:i:s") . " id = {$id} at {$t} ({$data->text})\n"; 
     $tick = microtime(true); 
    } 
}

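The times now appear to be constant... I say "appear" because they seem to have dropped by a factor of about ten thousand, and I am having some difficulty getting reliable measurements.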

(No, I didn't know this. I have probably never used indexing on large XML arrays.)

21:33:42 id = 0 at 3.0994415283203E-5 (The quick brown fox did ENTRY 0.) 
21:33:42 id = 5000 at 0.0065329074859619 (The quick brown fox did ENTRY 5000.) 
... 
21:33:42 id = 95000 at 0.0065121650695801 (The quick brown fox did ENTRY 95000.)

Source

2015-04-20 21:07:51 LSerni

Thanks for this, the problem was the for loop. Switching to foreach let me insert 55000 in under a second. –

Could you check the following two steps? They may help.

1) Increase the default PHP execution time from 30 seconds to a larger value.
    ini_set('max_execution_time', 300000);

2) If that fails, try running your code through a cron job / on the back end.
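A sketch of step 2, assuming the import is started from the command line or cron rather than through the web server: the script refuses to run under any other SAPI, and the CLI SAPI already defaults max_execution_time to 0 (no limit). The crontab line and its path are hypothetical.

```php
<?php
// Allow the import to run only from the command line / cron.
// Example crontab entry (hypothetical path):
//   0 3 * * * php /path/to/import.php >> /var/log/import.log 2>&1

function isCommandLine(): bool
{
    return PHP_SAPI === 'cli';
}

if (!isCommandLine()) {
    http_response_code(403);
    exit("Run this import from the command line.\n");
}

echo 'SAPI: ', PHP_SAPI, ', max_execution_time: ', ini_get('max_execution_time'), "\n";
// ... the import itself goes here ...
```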

Source

2015-04-20 19:00:58

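I have had the same problem before.

Split your large XML file into smaller files, such as file1, file2, file3, and then process them one by one.

You can split the XML file with a text editor that can open large files. Splitting the file does not take long.

Edit: I found an answer about huge XML files. I think it is the best answer for this purpose: Parsing Huge XML Files in PHP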
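Instead of a text editor, the split itself can be done with a streaming reader, so the big file is never loaded at once. A sketch, assuming the products are <prod> elements as in the question and that each part can simply wrap them in a <datafeed> root (so in a part you would iterate $part->prod). The function name and the file pattern are hypothetical.

```php
<?php
// Split a large feed into parts of $perFile <prod> elements each, using XMLReader.
// Example call (hypothetical names): splitFeed('large.xml', 'part-%03d.xml', 5000);

function splitFeed(string $source, string $outPattern, int $perFile): int
{
    $reader = new XMLReader();
    if (!$reader->open($source)) {
        throw new RuntimeException("Cannot open $source");
    }
    $fileNo = 0;
    $inFile = 0;
    $out = null;

    // Advance to the first <prod> element.
    while ($reader->read() && $reader->localName !== 'prod') {
    }
    while ($reader->localName === 'prod') {
        if ($out === null) {
            $out = fopen(sprintf($outPattern, ++$fileNo), 'w');
            if ($out === false) {
                throw new RuntimeException("Cannot write part $fileNo");
            }
            fwrite($out, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<datafeed>\n");
        }
        fwrite($out, $reader->readOuterXml() . "\n"); // copy this <prod> verbatim
        if (++$inFile === $perFile) {
            fwrite($out, "</datafeed>\n");
            fclose($out);
            $out = null;
            $inFile = 0;
        }
        $reader->next('prod'); // skip to the next <prod> sibling
    }
    if ($out !== null) { // close a partly filled last part
        fwrite($out, "</datafeed>\n");
        fclose($out);
    }
    $reader->close();
    return $fileNo; // number of part files written
}
```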

Source

2015-04-20 19:12:32 hakiko

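Depending on how complex the XML structure is, that may not be such a simple fix, which is why I suggested simply using PHP to keep track of your position and continue where you left off on a later page load. –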

@JonathanKuhn I edited my answer, please take a look. – hakiko

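Are you sure it is the XML file? There is no need to cut it up with a text editor: if the XML file is too large, you can use a parser such as XMLReader and process one main element at a time. (Judging from the error information given in the question, though, the XML may not be the problem here.) – hakre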
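The "track your position and continue where you left off" idea from the comments can also be done without page reloads, for example from the command line, by storing the number of products already imported in a small state file. A sketch under that assumption; the helper names, the state file and the batch size are hypothetical. Skipped items are still walked with foreach, which is linear, not repeated indexed access.

```php
<?php
// Resumable processing: remember how many items were done and skip them next run.

function readCheckpoint(string $stateFile): int
{
    return is_file($stateFile) ? (int) file_get_contents($stateFile) : 0;
}

function writeCheckpoint(string $stateFile, int $done): void
{
    file_put_contents($stateFile, (string) $done, LOCK_EX);
}

/**
 * Processes at most $batch items after the saved position, then saves the
 * new position. Returns how many items were processed in this run.
 */
function processNextBatch(iterable $items, callable $process, string $stateFile, int $batch): int
{
    $skip = readCheckpoint($stateFile);
    $position = 0;
    $handled = 0;
    foreach ($items as $item) {
        if ($position++ < $skip) {
            continue; // already imported in an earlier run
        }
        if ($handled === $batch) {
            break; // this run's quota is used up
        }
        $process($item);
        $handled++;
    }
    writeCheckpoint($stateFile, $skip + $handled);
    return $handled;
}
```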

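You could try increasing the memory limit. If that is not an option and you only need to do this once, I would personally just split the work into chunks and process 5k values at a time: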

<?php 
$servername = "localhost"; 
$username = "database.database"; 
$password = "demwke"; 
$database = "databasename"; 
$conn = new mysqli($servername, $username, $password, $database); 

$file = "large.xml"; 
$xmlString = file_get_contents($file); 
$products = new SimpleXMLElement($xmlString); 
unset($xmlString, $file); 

$total = count($products->datafeed[0]); 

//get your starting value for this iteration 
$start = isset($_GET['start'])?(int)$_GET['start']:0; 

//determine when to stop 
//process no more than 5k at a time 
$step = 5000; 
//where to stop, either after our step (max) or the end 
$limit = min($start+$step, $total); 

echo 'Starting<br><br>'; 

//modified loop so $i starts at our start value and stops at our $limit for this load. 
for($i=$start;$i<$limit;$i++){ 
    $id = $products->datafeed->prod[$i]['id']; 
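    // etc. etc. (the other variables are set the same way)
    // `desc` is a reserved word in MySQL, so it must be quoted with backticks
    $sql = "INSERT INTO products (id, name, uid, cat, prodName, brand, `desc`, link, imgurl, price, subcat) VALUES ('$id', '$store', '$storeuid', '$category', '$prodName', '$brand', '$prodDesc', '$link', '$image', '$price', '$subCategory')";
    $conn->query($sql);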
} 

if($limit >= $total){ 
    echo '<br>Finished'; 
} else { 
    echo<<<HTML 
<html><head> 
<meta http-equiv="refresh" content="2;URL=?start={$limit}"> 
</head><body> 
Done processing {$start} through {$limit}. Moving on to next set in 2 seconds. 
</body></html>
HTML; 
} 
?>

As long as this is not something with a user load on it (like the regular visitors to your site), this should not be a problem.

Another option: have you tried properly preparing and binding your queries?
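A minimal sketch of preparing the INSERT once and executing it per product with bound values, instead of building a new SQL string every time. It uses PDO rather than the question's mysqli so it can be tried against an in-memory SQLite database; against MySQL you would open PDO with a mysql: DSN, or use mysqli's prepare()/bind_param() in the same prepare-once pattern. The three columns are a hypothetical subset of the real table.

```php
<?php
// Prepare once, execute many: values travel separately from the SQL text,
// so quotes in product names cannot break the statement.

function insertProducts(PDO $pdo, iterable $rows): int
{
    $stmt = $pdo->prepare('INSERT INTO products (id, name, price) VALUES (?, ?, ?)');
    $count = 0;
    foreach ($rows as $row) {
        $stmt->execute([$row['id'], $row['name'], $row['price']]);
        $count++;
    }
    return $count; // number of rows inserted
}
```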

Source

2015-04-20 19:22:52

There are two problems to solve here:

Memory

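At the moment, you read the complete file into memory with file_get_contents() and parse it into an object structure with SimpleXML. Both operations load the entire file into memory.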

A better solution is to use XMLReader, which streams the file and lets you expand one element at a time into DOM:

$reader = new XMLReader; 
$reader->open($file); 
$dom = new DOMDocument; 
$xpath = new DOMXpath($dom); 

// look for the first product element 
while ($reader->read() && $reader->localName !== 'product') { 
    continue; 
} 

// while you have an product element 
while ($reader->localName === 'product') { 
    // expand product element to a DOM node 
    $node = $reader->expand($dom); 
    // use XPath to fetch values from the node 
    var_dump(
        $xpath->evaluate('string(@category)', $node),
        $xpath->evaluate('string(name)', $node),
        $xpath->evaluate('number(price)', $node)
    );
    // move to the next product sibling 
    $reader->next('product'); 
}
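The same streaming loop can be wrapped in a generator, so the import code stays a plain foreach while only one product is held in memory at a time. A sketch, assuming the products are <prod> elements with an id attribute as in the question (the answer above uses <product>; adjust the name to your feed). Each element's markup is parsed into its own small SimpleXMLElement; the function name is hypothetical.

```php
<?php
// Yield one small SimpleXMLElement per <prod>, reading the file as a stream.

function streamProducts(string $file): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($file)) {
        throw new RuntimeException("Cannot open $file");
    }
    try {
        // Skip ahead to the first <prod> element.
        while ($reader->read() && $reader->localName !== 'prod') {
        }
        while ($reader->localName === 'prod') {
            // Parse only this element's markup.
            yield new SimpleXMLElement($reader->readOuterXml());
            $reader->next('prod'); // jump to the next <prod> sibling
        }
    } finally {
        $reader->close();
    }
}

// Usage in the import script:
// foreach (streamProducts('large.xml') as $prod) {
//     $id = (string) $prod['id'];
//     // ... other fields and the INSERT, as before ...
// }
```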

Performance

Working with a lot of data takes time, and doing it serially takes even more.

Moving the script to the command line takes care of the timeout. You can also raise the limit with set_time_limit().

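Another option is to optimize the inserts: collect several records and combine them into a single INSERT. This reduces round trips and work on the database server, but uses more memory, so you will have to find a balance.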

INSERT INTO table 
    (field1, field2) 
VALUES 
    (value1_1, value1_2), 
    (value2_1, value2_2), ...
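A sketch of building such a multi-row INSERT for one chunk of rows, with placeholders and a flat list of values to bind, so the data never has to be escaped into the SQL string. The table and column names are hypothetical; they must come from your own code, never from the feed, because identifiers cannot be bound as parameters.

```php
<?php
// Build "INSERT ... VALUES (?, ?), (?, ?), ..." plus the matching parameter list.

function buildBatchInsert(string $table, array $columns, array $rows): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('Nothing to insert');
    }
    $rowPlaceholder = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES %s',
        $table,
        implode(', ', $columns),
        implode(', ', array_fill(0, count($rows), $rowPlaceholder))
    );
    $params = [];
    foreach ($rows as $row) {
        foreach ($columns as $col) {
            $params[] = $row[$col]; // same column order as the placeholders
        }
    }
    return [$sql, $params];
}

// Usage: accumulate rows and flush every few hundred (tune the chunk size, and
// keep each statement under MySQL's max_allowed_packet), e.g. with PDO:
//   [$sql, $params] = buildBatchInsert('products', ['id', 'name'], $chunk);
//   $pdo->prepare($sql)->execute($params);
```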

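You could even write the SQL to a file and insert the records with the mysql command-line tool. This is very fast, but has security implications, because you need to use exec().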

Source

2015-04-21 09:36:29 ThW
