2016-03-26 94 views
2

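Exporting multiple rows of data to CSV using pandas

I have a matching algorithm that links students to projects. The algorithm works, but I can't export the data to a CSV file correctly: there should be 200 rows of output, yet only the last one ends up in the file.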

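The exported data also treats every character as a separate value. I want the whole of 's' in one cell, but the three digits that make up 's' are split across three columns. I have attached images below. Any help would be appreciated.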

What it looks like

What it should look like

#Imports for Pandas 

import pandas as pd 
from pandas import DataFrame 

SPA() 
for m in M: 
    s = m['student'] 
    l = m['lecturer'] 
    Lecturer[l]['limit'] = Lecturer[l]['limit'] - 1 
    id = m['projectid'] 
    p = Project[id]['title'] 
    c = Project[id]['sourceid'] 
    r = str(getRank("Single_Projects1copy.csv",s,c)) 


    print(s+","+l+","+p+","+c+","+r) 

    dataPack = (s+","+l+","+p+","+c+","+r) 

    df = pd.DataFrame.from_records([dataPack]) 
    df.to_csv('try.csv') 

Answer

1

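You overwrite the file on every pass through the loop, so you end up with only the last row of data. You need to either append to the CSV with df.to_csv('try.csv', mode="a", header=False), or create one DataFrame, append each row to it, and write it once outside the loop, like so: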

df = pd.DataFrame() 
for m in M: 
    s = m['student'] 
    l = m['lecturer'] 
    Lecturer[l]['limit'] = Lecturer[l]['limit'] - 1 
    id = m['projectid'] 
    p = Project[id]['title'] 
    c = Project[id]['sourceid'] 
    r = str(getRank("Single_Projects1copy.csv",s,c)) 


    print(s+","+l+","+p+","+c+","+r) 

    dataPack = (s+","+l+","+p+","+c+","+r) 

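    # DataFrame.append returned a new frame (and was removed in pandas 2.0), so reassign the result
    df = pd.concat([df, pd.DataFrame.from_records([dataPack])], ignore_index=True) 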
df.to_csv('try.csv') # write all data once outside the loop 
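The append-mode alternative mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the rows are made-up stand-ins for the values produced by the loop over M, each row is kept as a tuple rather than a joined string, and the file is written to a temporary directory instead of the working directory.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in rows; in the question these come from the loop over M.
rows = [
    ("101", "Dr A", "Robotics", "P1", "1"),
    ("102", "Dr B", "Compilers", "P2", "3"),
]

# Assumption: write to a temporary directory so the sketch leaves no files behind.
out_path = os.path.join(tempfile.mkdtemp(), "try.csv")

for i, row in enumerate(rows):
    # Wrap the tuple in a list so it becomes one row with five columns.
    frame = pd.DataFrame([row])
    # The first pass truncates any old file; later passes append to it.
    mode = "w" if i == 0 else "a"
    frame.to_csv(out_path, mode=mode, header=False, index=False)

# Read the file back to check that every row was kept.
with open(out_path) as f:
    written = f.read().splitlines()
print(written)
```

Reopening the file on every pass is simple but slow for many rows, which is one reason to prefer a single write.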

A better option is to open the file once and pass the file object to to_csv:

with open('try.csv', 'w') as f: 
    for m in M: 
        s = m['student'] 
        l = m['lecturer'] 
        Lecturer[l]['limit'] = Lecturer[l]['limit'] - 1 
        id = m['projectid'] 
        p = Project[id]['title'] 
        c = Project[id]['sourceid'] 
        r = str(getRank("Single_Projects1copy.csv",s,c)) 
        print(s+","+l+","+p+","+c+","+r) 

        dataPack = (s+","+l+","+p+","+c+","+r) 
        pd.DataFrame.from_records([dataPack]).to_csv(f, header=False) 

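You get individual characters because dataPack is a single string. When you pass it to from_records, pandas iterates over the string's characters and puts each one in its own column: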

In [18]: df = pd.DataFrame.from_records(["foobar,"+"bar"]) 

In [19]: df 
Out[19]: 
    0 1 2 3 4 5 6 7 8 9 
0 f o o b a r , b a r 

In [20]: df = pd.DataFrame(["foobar,"+"bar"]) 

In [21]: df 
Out[21]: 
      0 
0 foobar,bar 

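I think you basically want to leave dataPack as a tuple, dataPack = (s, l, p, c, r), and use pd.DataFrame([dataPack]); wrapping the tuple in a list makes it a single row rather than a single column. You don't really need pandas at all, though: the csv lib will do all of this for you without creating a DataFrame.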
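Both suggestions can be sketched side by side. This is a minimal illustration, not the asker's actual code: the column names and the sample rows are hypothetical, and the output goes to in-memory buffers rather than to try.csv.

```python
import csv
import io

import pandas as pd

# Hypothetical column names; the original post does not name the columns.
COLUMNS = ["student", "lecturer", "project", "sourceid", "rank"]

# Hypothetical stand-in for the tuples built inside the loop over M.
rows = [
    ("101", "Dr A", "Robotics", "P1", "1"),
    ("102", "Dr B", "Compilers", "P2", "3"),
]


def to_csv_with_pandas(rows, out):
    """Build one DataFrame from all rows and write it once."""
    df = pd.DataFrame(rows, columns=COLUMNS)
    # index=False omits the 0, 1, 2, ... row-index column.
    df.to_csv(out, index=False)


def to_csv_with_csv_module(rows, out):
    """Write the same rows with the standard-library csv module."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)


pandas_out = io.StringIO()
to_csv_with_pandas(rows, pandas_out)

csv_out = io.StringIO()
to_csv_with_csv_module(rows, csv_out)

print(pandas_out.getvalue())
```

Collecting the tuples in a list and building the DataFrame once avoids both the overwriting and the character splitting. Keeping the data in a DataFrame is also convenient if it later has to go somewhere other than a CSV file.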

+0

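Opening the file worked; the CSV now shows the data for all students. Thanks for your input, much appreciated. The CSV skips the header, but the first column consists of 0s. I will have to make changes to get the column structure right. – MrPool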

+0

I was instructed to use pandas so that it will be easier if the data needs to be exported to MySQL in the future. – MrPool

+0

Do you want to use the CSV header from the file or create your own? –