2016-04-03 37 views
0

php - Converting a badly formatted txt file to csv

I have a badly formatted text file that I want to convert to CSV.

Here is an example:

100910 NA/1-2013-99636 VIA DEI PESCATORI 2/A LODI APR 8 2013 4:24PM DANNEGGIATO -10% 200 2700 0 0 NO 
148013 NA/1-2014-146194 CAVALLOTTI SNC LODI GEN 3 2014 3:37PM DANNEGGIATO -10% 0 0 2 0 NO 
160032 NA/1-2014-158129 PAOLO GORINI SNC LODI MAG 6 2014 11:51AM DANNEGGIATO -10% 2 0 2 0 NO 
54900 NA/1-2014-158070 STRADA VECCHIA CREMONESE SNC LODI MAG 6 2014 9:53AM DANNEGGIATO +10% 10 0 10 0 NO 
100910 NA/1-2013-99636 VIA DEI PESCATORI 2/A LODI APR 8 2013 4:24PM DANNEGGIATO -10% 200 2700 0 0 NO 
147959 NA/1-2014-146140 DOSSENA SNC LODI GEN 3 2014 10:45AM DANNEGGIATO -10% 200 0 200 0 NO 

The lines roughly follow this form:

[number] [id] [awfully formatted street] ['LODI'] [timestamp] [damaged or not] [percentage] [squaremeters] [squaremeters] [squaremeters] [squaremeters] [asbest-crumbled or not] 

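My problem is how to extract the third part, [awfully formatted street]. Basically it is the string after [id] and before the string ['LODI'] (but that ['LODI'] must be the one directly before [timestamp]).

Should I explode each line on spaces, then walk the array backwards, past [timestamp], past ['LODI'], and join the values down to the one after [id], i.e. array[1]? Or is there a smarter (more elegant) way to do this, maybe preg_match()?

Thanks for any hints!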

+0

By 'LODI' – splash58

Answers

0
<?php 
    // Read the file line by line; a single sample line is used here 
    $line = '148013 NA/1-2014-146194 CAVALLOTTI SNC LODI GEN 3 2014 3:37PM DANNEGGIATO -10% 0 0 2 0 NO'; 

    // Start by separating the string on LODI 
    $lodi_split = explode('LODI', $line); 

    // Now split the first part into an array on spaces; 
    // trim() drops the space left in front of LODI 
    $bits = explode(' ', trim($lodi_split[0])); 

    // Skip the first 2 fields (number and id) and join the rest 
    $address = implode(' ', array_slice($bits, 2)); 
    echo $address . PHP_EOL; 

The result is

CAVALLOTTI SNC 
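
One thing to watch with a plain split on LODI: it cuts at the first occurrence, so a street name that happens to contain that word would be truncated. Below is a minimal sketch of one way around it, assuming the city token is always the last " LODI " on the line. The function name `extractAddress` is hypothetical and not part of the answer above.

```php
<?php
// Hypothetical helper: returns the street part of one record line.
// Assumption: the city marker is the LAST " LODI " on the line.
function extractAddress(string $line): ?string
{
    $pos = strrpos($line, ' LODI ');
    if ($pos === false) {
        return null; // line does not follow the expected layout
    }
    // Everything before the marker: "[number] [id] [street ...]"
    $head = substr($line, 0, $pos);
    // A limit of 3 keeps the street, spaces included, in the third piece
    $parts = preg_split('/\s+/', trim($head), 3);
    return $parts[2] ?? null;
}

echo extractAddress('148013 NA/1-2014-146194 CAVALLOTTI SNC LODI GEN 3 2014 3:37PM DANNEGGIATO -10% 0 0 2 0 NO'), PHP_EOL;
```

Returning null for lines that do not fit makes it easy to log or skip them while looping over the file.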
0

This should be able to extract the address from a line.

<?php 
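$row = "100910 NA/1-2013-99636 VIA DEI PESCATORI 2/A LODI APR 8 2013 4:24PM DANNEGGIATO -10% 200 2700 0 0 NO"; 
// Split on runs of whitespace; trim() avoids an empty token from a trailing space 
$row_array = preg_split('/\s+/', trim($row)); 

// Drop the first two fields (number and id) 
array_shift($row_array); 
array_shift($row_array); 

// Drop the 12 fixed tokens after the street (city, 4 timestamp tokens, status, percentage, 4 areas, flag) 
for($i=0; $i<12; $i++){ 
    array_pop($row_array); 
} 

// Whatever is left is the street 
$address = implode(" ", $row_array); 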

?> 
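
The same token-counting idea can be stretched to return every field, not only the address. This is a rough sketch under the assumption that each line ends with exactly 12 tokens after the street, as in the sample data; `parseRow` is a hypothetical name used for illustration.

```php
<?php
// Hypothetical helper built on the same token-counting idea.
// Assumption: every line ends with exactly 12 tokens after the street
// (city, 4 timestamp tokens, status, percentage, 4 areas, flag).
function parseRow(string $row): ?array
{
    $tokens = preg_split('/\s+/', trim($row));
    if (count($tokens) < 15) {
        return null; // too short to hold number, id, a street and the tail
    }
    $tail   = array_slice($tokens, -12);    // fixed-size end of the line
    $street = array_slice($tokens, 2, -12); // whatever sits in between
    return array_merge(
        [$tokens[0], $tokens[1], implode(' ', $street), $tail[0]],
        [implode(' ', array_slice($tail, 1, 4))], // timestamp as one field
        array_slice($tail, 5)                     // status through flag
    );
}

print_r(parseRow('100910 NA/1-2013-99636 VIA DEI PESCATORI 2/A LODI APR 8 2013 4:24PM DANNEGGIATO -10% 200 2700 0 0 NO'));
```

Each returned array has 12 entries in the order given in the question, so it can be passed straight to fputcsv().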
0

I don't think explode will do the job here. I suggest using a regexp. For example, if you read the .txt file as a single string (data lines separated by \n):

$fname = "file.txt"; 
$f = fopen($fname, "rt"); 
$str = fread($f, filesize($fname)); 
fclose($f); 

Then use preg_match_all() like this:

$re = "/^(\\d+)\\s*(.*)(LODI)\\s*(.+(?:AM|PM))\\s*(\\w+)\\s+([+-]?\\d{1,3}%)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\w+)\\s*$/m"; 
preg_match_all($re, $str, $matches,PREG_SET_ORDER); 
echo "<pre>\n"; 
print_r($matches); 
echo "</pre>\n"; 

The output looks like this:

Array 
(
    [0] => Array 
     (
      [0] => 100910 NA/1-2013-99636 VIA DEI PESCATORI 2/A LODI APR 8 2013 4:24PM DANNEGGIATO -10% 200 2700 0 0 NO 
      [1] => 100910 
      [2] => NA/1-2013-99636 VIA DEI PESCATORI 2/A 
      [3] => LODI 
      [4] => APR 8 2013 4:24PM 
      [5] => DANNEGGIATO 
      [6] => -10% 
      [7] => 200 
      [8] => 2700 
      [9] => 0 
      [10] => 0 
      [11] => NO 
     ) 

    [1] => Array 
     (
      [0] => 148013 NA/1-2014-146194 CAVALLOTTI SNC LODI GEN 3 2014 3:37PM DANNEGGIATO -10% 0 0 2 0 NO 
      [1] => 148013 
      [2] => NA/1-2014-146194 CAVALLOTTI SNC 
      [3] => LODI 
      [4] => GEN 3 2014 3:37PM 
      [5] => DANNEGGIATO 
      [6] => -10% 
      [7] => 0 
      [8] => 0 
      [9] => 2 
      [10] => 0 
      [11] => NO 
    ) 
..........// And so on 

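I used the text provided above for this example. In the output, the data is formatted as a list of arrays, so you can do whatever you like with it. $matches[$i][0] stores the whole match; just skip it and use $matches[$i][1] ... $matches[$i][11] as the data.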
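
Since the goal in the question is a CSV file, here is a hedged sketch of how the matches could be written out with fputcsv(). It uses a variation on the pattern above that captures the id and the street separately and only accepts a LODI that is directly followed by something shaped like the timestamp. That variation assumes the id never contains spaces. The `php://memory` stream stands in for a real output file.

```php
<?php
// Sketch: turn matched records into CSV text.
// Assumptions: the id has no spaces; timestamps look like "APR 8 2013 4:24PM".
$str = "100910 NA/1-2013-99636 VIA DEI PESCATORI 2/A LODI APR 8 2013 4:24PM DANNEGGIATO -10% 200 2700 0 0 NO \n"
     . "54900 NA/1-2014-158070 STRADA VECCHIA CREMONESE SNC LODI MAG 6 2014 9:53AM DANNEGGIATO +10% 10 0 10 0 NO \n";

$re = '/^(\d+)\s+(\S+)\s+(.*?)\s+(LODI)\s+(\S+\s+\d+\s+\d{4}\s+\d+:\d+(?:AM|PM))'
    . '\s+(\w+)\s+([+-]?\d{1,3}%)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\w+)\s*$/m';
preg_match_all($re, $str, $matches, PREG_SET_ORDER);

$out = fopen('php://memory', 'w+');      // swap for a real file path when saving
foreach ($matches as $m) {
    array_shift($m);                     // drop [0], the whole match
    fputcsv($out, $m, ',', '"', '\\');   // one CSV row per record
}
rewind($out);
$csv = stream_get_contents($out);
fclose($out);
echo $csv;
```

The lazy (.*?) followed by the timestamp shape is what lets a street name contain the word LODI without confusing the match.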

+0

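Yeah, thanks. I think this is the way to go. It seems that the first capture group (.*) is rather greedy, though: it matches the whole file up to the last line, and preg_match_all() always returns only the last line, i.e. the pattern matches only once. Strange, considering the pattern is flagged with m, so I thought it should read line by line –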