2012-02-09 227 views
0

Importing CSV data into MySQL

Consider the following snippet of CSV data from "NASDAQ.csv":

"Symbol,""Name"",""LastSale"",""MarketCap"",""ADR TSO"",""IPOyear"",""Sector"",""industry"",""Summary Quote"",";; 
"FLWS,""1-800 FLOWERS.COM, Inc."",""2.9"",""81745200"",""n/a"",""1999"",""Consumer Services"",""Other Specialty Stores"",""http://www.nasdaq.com/symbol/flws"",";; 
"FCTY,""1st Century Bancshares, Inc"",""4"",""36172000"",""n/a"",""n/a"",""Finance"",""Major Banks"",""http://www.nasdaq.com/symbol/fcty"",";; 
"FCCY,""1st Constitution Bancorp (NJ)"",""8.8999"",""44908895.4"",""n/a"",""n/a"",""Finance"",""Savings Institutions"",""http://www.nasdaq.com/symbol/fccy"",";; 

I am trying to import the symbol, sector, and industry into the corresponding fields of a MySQL table:

$path = "NASDAQ.csv"; 
$row = 1; 
if (($handle = fopen($path, "r")) !== FALSE) { 
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) { 
        $row++; 
        $entries[] = $data; 
    } 
    fclose($handle); 
} 

foreach ($entries as $line) { 
    db_query(" 
        INSERT INTO us_stocks (symbol, name, sector, industry) 
        VALUES ('%s', '%s', '%s', '%s')", 
        $line[0], $line[1], $line[6], $line[7] 
    ); 
} 

However, the result is not what I expected. In the database, only the symbol field gets filled, and not even correctly:

symbol  name sector industry 
---------------------------------- 
Symbol,"Na 
FLWS,"1-80 
FCTY,"1st 
FCCY,"1st 

What am I doing wrong?

[EDIT]

If I print_r($entries), the output looks like:

Array (
    [0] => Array(
    [0] => Symbol,"Name","LastSale","MarketCap","ADR TSO","IPOyear","Sector","industry","Summary Quote",;; 
) 
    [1] => Array(
    [0] => FLWS,"1-800 FLOWERS.COM, Inc.","2.9","81745200","n/a","1999","Consumer Services","Other Specialty Stores","http://www.nasdaq.com/symbol/flws",;; 
) 
    [2] => Array(
    [0] => FCTY,"1st Century Bancshares, Inc","4","36172000","n/a","n/a","Finance","Major Banks","http://www.nasdaq.com/symbol/fcty",;; 
) 
) 

[EDIT2]

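I removed the first line of the CSV, as suggested. I now have a quick and dirty way to accomplish what I want. Basically, things get messed up whenever a company name contains ", Inc". So in that case I just glue that piece back onto the name with $data[1] = $data[1] . $data[2]: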

$path = "NASDAQ.csv"; 
$row = 1; 
if (($handle = fopen($path, "r")) !== FALSE) { 
    // fgetcsv() accepts only a single-character delimiter, so ";" stands in for ";;" 
    while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) { 
        if ($row < 100) { 
            $row++; 
            $data = explode(',', $data[0]); 
            // A leading space means the company name was split at ", Inc." 
            if (substr($data[2], 0, 1) == ' ') { 
                $data[1] = $data[1] . $data[2]; 
                unset($data[2]); 
            } 
            $entries[] = $data; 
        } 
    } 
    fclose($handle); 
} 

A print_r($entries) now gives:

[0] => Array 
    (
     [0] => FLWS 
     [1] => "1-800 FLOWERS.COM Inc." 
     [3] => "2.9" 
     [4] => "81745200" 
     [5] => "n/a" 
     [6] => "1999" 
     [7] => "Consumer Services" 
     [8] => "Other Specialty Stores" 
     [9] => "http://www.nasdaq.com/symbol/flws" 
     [10] => 
    ) 

One last question: I don't know how to renumber the keys, so that 3 becomes 2, 4 becomes 3, and so on, making the output look like:

[0] => Array 
    (
     [0] => FLWS 
     [1] => "1-800 FLOWERS.COM Inc." 
     [2] => "2.9" 
     [3] => "81745200" 
     [4] => "n/a" 
     [5] => "1999" 
     [6] => "Consumer Services" 
     [7] => "Other Specialty Stores" 
     [8] => "http://www.nasdaq.com/symbol/flws" 
     [9] => 
    ) 

Any help would be greatly appreciated!

+1

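I'm guessing it has to do with the double quotes used in the CSV file. The fourth parameter of `fgetcsv()` (`$enclosure`) could be set to `"\"\""` to see whether that is the case. – Crontab 2012-02-09 13:03:52

Answers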

1

As Crontab said, this is probably a quoting problem. Try:

foreach ($entries as $line) { 

    // Escape (see mysql_real_escape_string too) and remove double quotes. 
    // Note: the mysql_* functions were removed in PHP 7. 
    foreach ($line as $k => $v) { 
        $line[$k] = mysql_escape_string(trim($v, '"')); 
    } 

    // Rebuild array with consecutive keys 
    $line = array_values($line); 

    db_query(" 
        INSERT INTO us_stocks (symbol, name, sector, industry) 
        VALUES ('%s', '%s', '%s', '%s')", 
        $line[0], $line[1], $line[6], $line[7] 
    ); 

} 

PS: I don't know whether you are already escaping the strings in db_query().
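On the renumbering question: the `array_values()` call in the snippet above is what closes the gap left by `unset()`. Here is a minimal sketch, using a shortened copy of the FLWS row from the question as assumed input:

```php
<?php
// Row as produced by explode(',') on the FLWS line (shortened for the example).
$data = array('FLWS', '"1-800 FLOWERS.COM', ' Inc."', '"2.9"', '"81745200"');

// Glue the split company name back together, as in the question.
if (substr($data[2], 0, 1) == ' ') {
    $data[1] = $data[1] . $data[2];
    unset($data[2]);   // leaves a gap: the keys are now 0, 1, 3, 4
}

// array_values() returns the same values with fresh keys 0, 1, 2, ...
$data = array_values($data);

print_r($data);
```

Alternatively, `array_splice($data, 2, 1)` in place of `unset($data[2])` removes the element and renumbers the numeric keys in one step.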

+0

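I already did. However, it doesn't work. Neither does your code. It now just reads FLWS,\"1-8 etc., because of the double escaping. Maybe it would be best to use a regex to remove all single and double quotes from each $data row? – Pr0no 2012-02-09 13:18:51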

+0

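`trim($v, '"')` removes one or more double quotes from the beginning and end of the string, so I'm afraid `fgetcsv()` is failing to parse the CSV correctly. Could you look at the output of `print_r($line)` before the query and without my code? Are the fields split correctly? – 2012-02-09 13:25:26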

+0

See the edit :) – Pr0no 2012-02-09 13:27:21

2

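I would say the data is not "real" CSV.

"FLWS,""1-800 FLOWERS.COM, Inc."",""2.9"", should be: "FLWS","1-800 FLOWERS.COM, Inc.","2.9", i.e. the quotes should wrap each individual field, with the fields separated by commas. Numeric fields are usually not wrapped.

Depending on how you load the data, commas inside the data can cause confusion (e.g. "FLOWERS.COM, Inc").

By the way, if it really were CSV, see: http://dev.mysql.com/doc/refman/5.1/en/load-data.html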
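To make the point concrete: each line in the file is a single quoted field whose inner quotes are doubled, which is why `fgetcsv()` returns one element per row. One possible workaround is to parse each line twice with `str_getcsv()`. The helper name `parse_nasdaq_line()` is made up for this example, and the sketch assumes every line has the same shape as the sample rows in the question.

```php
<?php
// Hypothetical helper, not part of PHP or of the asker's code base.
function parse_nasdaq_line($rawLine)
{
    // Drop surrounding whitespace and the trailing ";;".
    $rawLine = rtrim(trim($rawLine), ';');

    // First pass: the outer quotes make the whole record one field,
    // and every doubled quote ("") collapses to a single quote.
    $outer = str_getcsv($rawLine, ',', '"', '\\');

    // Second pass: what remains is ordinary CSV, apart from a trailing comma.
    return str_getcsv(rtrim($outer[0], ','), ',', '"', '\\');
}

// Sample row copied from the question.
$sample = '"FLWS,""1-800 FLOWERS.COM, Inc."",""2.9"",""81745200"",""n/a"",""1999"",""Consumer Services"",""Other Specialty Stores"",""http://www.nasdaq.com/symbol/flws"",";; ';

$fields = parse_nasdaq_line($sample);
print_r($fields);
```

Because the second pass honours the quotes, the comma in "1-800 FLOWERS.COM, Inc." stays inside the name, and the keys come out consecutive, so `$fields[6]` and `$fields[7]` hold the sector and the industry.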

+0

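Well, it certainly isn't the best CSV file I've ever seen... but it is what nasdaq.com provides, and I can't find any other source from which to import the ticker symbols of all US stocks (I have other CSVs from the same website, such as AMEX and NYSE). Can't I just strip all the " and ' characters from all fields? – Pr0no 2012-02-09 13:16:37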

+0

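There must be a typo in the first line, because the separator between Symbol and Name is not inside the quotes. I would just replace "" with " (change or tr the 2x quotes to 1x quote), then use LOAD DATA INFILE, skipping line 1 and specifying the columns to load. *I guarantee* that if you use LOAD DATA INFILE, the inserts will be very fast. – FreudianSlip 2012-02-09 13:26:54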

+0

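Maybe, but for now I think hacking something together in PHP gets me there faster... almost there, by the way :) Please take a look at my last question, on how to renumber the keys, if you have time. – Pr0no 2012-02-09 14:15:22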
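For anyone who wants to try the LOAD DATA INFILE route suggested in the comments, here is a rough sketch of the cleanup step (2x quotes to 1x quote) followed by the load statement. The helper names `clean_nasdaq_line()` and `clean_nasdaq_file()` and the path `/path/to/NASDAQ_clean.csv` are assumptions made for this example. The table and column names come from the question, and the statement assumes the header row is still present in the file.

```php
<?php
// Hypothetical helper: turn one raw NASDAQ line into a conventional CSV line.
function clean_nasdaq_line($rawLine)
{
    $line = rtrim(trim($rawLine), ';');     // drop whitespace and the trailing ";;"
    $line = trim($line, '"');               // drop the quotes wrapping the whole record
    $line = str_replace('""', '"', $line);  // 2x quotes -> 1x quote
    return rtrim($line, ',');               // drop the empty trailing field
}

// Hypothetical helper: write a cleaned copy of the whole file.
function clean_nasdaq_file($inPath, $outPath)
{
    $in  = fopen($inPath, 'r');
    $out = fopen($outPath, 'w');
    if ($in === false || $out === false) {
        return false;
    }
    while (($raw = fgets($in)) !== false) {
        if (trim($raw) === '') {
            continue;                       // skip blank lines
        }
        fwrite($out, clean_nasdaq_line($raw) . "\n");
    }
    fclose($in);
    fclose($out);
    return true;
}

// Statement to run against MySQL once the cleaned file exists.
// @dummy swallows the columns that are not being imported.
$sql = <<<'SQL'
LOAD DATA INFILE '/path/to/NASDAQ_clean.csv'
INTO TABLE us_stocks
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(symbol, name, @dummy, @dummy, @dummy, @dummy, sector, industry, @dummy)
SQL;
```

Note that plain LOAD DATA INFILE reads the file on the database server, so the cleaned file has to be somewhere the MySQL server can read it. If the header row has already been deleted, drop the IGNORE 1 LINES clause.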