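I wrote a script that sends large chunks of text to Google for translation, but sometimes the text (which is HTML source) ends up being split in the middle of an HTML tag, and Google returns the code incorrectly. In short, I need to split a large string into an array without the split points breaking a tag.
I already know how to split a string into an array, but is there a better way to do it that also ensures each output string is no longer than 5000 characters and is never split inside a tag?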
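To make the problem concrete, here is a minimal sketch of a naive fixed-width split. The sample markup and the chunk size of 12 are made up for illustration and are not from my real script; the point is only that `str_split()` cuts by length and knows nothing about tags.

```php
<?php
// Hypothetical sample input, chosen only to show the failure mode.
$html = '<p>Hello <strong>world</strong></p>';

// str_split() cuts every 12 bytes regardless of where tags begin or end.
$chunks = str_split($html, 12);

print_r($chunks);
// The first chunk is '<p>Hello <st', so the <strong> tag is cut in half.
```

Any chunk that ends inside a tag like this is what makes the translated markup come back broken.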
UPDATE: Thanks to the answer, this is the code I ended up using in my project, and it works great.
function handleTextHtmlSplit($text, $maxSize) {
    // our collection array, starting with one empty group
    $niceHtml = [''];
    // Splits on tags, but also includes each tag as an item in the result
    $pieces = preg_split('/(<[^>]*>)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    // the current position of the index
    $currentPiece = 0;
    // start assembling a group until it gets to max size
    foreach ($pieces as $piece) {
        // make sure string length of this piece will not exceed max size when inserted
        if (strlen($niceHtml[$currentPiece] . $piece) > $maxSize) {
            // advance current piece
            // will put overflow into next group
            $currentPiece += 1;
            // create empty string as value for next piece in the index
            $niceHtml[$currentPiece] = '';
        }
        // insert piece into our master array
        $niceHtml[$currentPiece] .= $piece;
    }
    // return array of nicely handled html
    return $niceHtml;
}
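One caveat with the function above: `strlen()` counts bytes, not characters, so text with multibyte characters may be split earlier than the 5000-character limit requires. Below is a sketch of a character-counting variant. The name `splitHtmlByChars` is hypothetical, and it assumes UTF-8 input and that the mbstring extension is available. It also skips empty pieces, so the result never starts with an empty group.

```php
<?php
// Hypothetical variant of handleTextHtmlSplit(); assumes UTF-8 input and mbstring.
function splitHtmlByChars(string $text, int $maxChars): array
{
    $chunks = [];
    $current = '';
    // Same tag-preserving split; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $pieces = preg_split(
        '/(<[^>]*>)/',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    foreach ($pieces as $piece) {
        // Count characters rather than bytes before appending the piece.
        if ($current !== '' && mb_strlen($current . $piece, 'UTF-8') > $maxChars) {
            // Close the current group and start a new one with the overflow.
            $chunks[] = $current;
            $current = '';
        }
        $current .= $piece;
    }
    // Keep whatever is left in the last group.
    if ($current !== '') {
        $chunks[] = $current;
    }
    return $chunks;
}

// Each paragraph is 12 characters but 13 bytes because of the accented letter.
$chunks = splitHtmlByChars('<p>héllo</p><p>wörld</p>', 12);
print_r($chunks);
```

As in the original function, a single tag or text run that is longer than the limit still ends up in a group of its own that exceeds it, so very long text nodes would need separate handling.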
Wow Amber, thank you. That should really get my wheels turning. I'll give it a try. – james 2010-07-21 01:57:49