Formatting data from a CSV file in Python (calculating the average)

import csv 
with open('Class1scores.csv') as inf: 
    for line in inf:
        parts = line.split()
        if len(parts) > 1:
            print(parts[4])


f = open('Class1scores.csv') 
csv_f = csv.reader(f) 
newlist = [] 
for row in csv_f: 

    row[1] = int(row[1]) 
    row[2] = int(row[2]) 
    row[3] = int(row[3]) 

    maximum = max(row[1:3]) 
    row.append(maximum) 
    average = round(sum(row[1:3])/3) 
    row.append(average) 
    newlist.append(row[0:4]) 

averageScore = [[x[3], x[0]] for x in newlist] 
print('\nStudents Average Scores From Highest to Lowest\n')

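This code is meant to read the CSV file and, for each row, add up the three score columns (column 0 is the user's name) and divide by three. But it doesn't calculate a proper average; it just takes the score from the last column.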


2016-03-14 Billy

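Can you post the first few lines of your CSV file? – Igor

What is the point of opening the file twice? – Seekheart

Billy, take a look at my answer. You can cut out the parts you don't need and adapt the rest to your own needs. – Igor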

Basically, you want statistics for each row. In general, you should do it like this:

import csv 

with open('data.csv', 'r') as f:
    rows = csv.reader(f)
    for row in rows:
        name = row[0]
        # csv.reader yields strings, so convert the scores to int first
        scores = [int(score) for score in row[1:]]

        # calculate statistics of scores
        attributes = {
            'NAME': name,
            'MAX': max(scores),
            'MIN': min(scores),
            'AVE': 1.0 * sum(scores) / len(scores)
        }

        output_mesg = "name: {NAME:s} \t high: {MAX:d} \t low: {MIN:d} \t ave: {AVE:f}"
        print(output_mesg.format(**attributes))

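Try not to worry about whether doing a particular thing is locally inefficient. A good Pythonic script should be as readable as possible for everyone.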

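In your code, I spotted two errors:

Appending to row has no lasting effect: row is a local variable in the loop, and only the slice row[0:4] is saved to newlist, so the appended maximum and average are discarded.
row[1:3] gives only the second and third elements. row[1:4] gives what you want, as does row[1:]. Slices in Python are end-exclusive.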
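
To illustrate the slicing point, here is a minimal sketch. The row values are made up for illustration and are not taken from the asker's file.

```python
# Hypothetical row, shaped like one line of the CSV after int conversion:
# the name followed by three scores.
row = ['John', 10, 7, 9]

two_scores = row[1:3]    # stops before index 3, so the last score is missing
three_scores = row[1:4]  # indices 1, 2 and 3: all three scores
all_scores = row[1:]     # everything after the name

print(two_scores)    # [10, 7]
print(three_scores)  # [10, 7, 9]
print(all_scores)    # [10, 7, 9]

# Dividing by len() instead of a hard-coded 3 keeps the average correct
# even if the number of score columns changes.
average = sum(all_scores) / len(all_scores)
print(round(average, 2))  # 8.67
```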

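And some questions for you to think about:

If I can open the file in Excel and it is not that big, why not just do it in Excel? Can I use every tool available to get the job done as quickly as possible? Could I finish this task in 30 seconds?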


2016-03-14 15:39:04 Mai

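Your code contains an error, because it doesn't output anything. Does the print line not need parentheses, since this is Python 3? – Billy

@Billy I looked up the Python 3 print examples at https://docs.python.org/3.1/tutorial/inputoutput.html and added parentheses and a format function call. Hopefully it is right. – Mai

As I'm new to Python, I'm curious why you multiply sum(scores)/len(scores) by 1.0. Is it to prevent the use of an integer data type? –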
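
On the 1.0 question, a small sketch may help. In Python 3 the / operator always performs true division and returns a float, so the factor is not needed there. In Python 2, / on two integers truncated the result, which is presumably (an assumption about the answerer's intent) why the factor was added. The scores below are made up for illustration.

```python
# Hypothetical scores, used only to show the two division operators.
scores = [10, 7, 9]

true_average = sum(scores) / len(scores)    # true division: a float
floor_average = sum(scores) // len(scores)  # floor division: an int

print(type(true_average).__name__)   # float
print(type(floor_average).__name__)  # int
print(floor_average)                 # 8
```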

Here is one way to do it, in two parts. First, we create a dictionary with each name as the key and a list of results as the value.

import csv 


fileLineList = [] 
averageScoreDict = {} 

with open('Class1scores.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        fileLineList.append(row)

for row in fileLineList:
    highest = 0
    lowest = None  # None means "no score seen yet", so a real score of 0 still counts
    total = 0
    average = 0
    for column in row:
        if column.isdigit():
            column = int(column)
            if column > highest:
                highest = column
            if lowest is None or column < lowest:
                lowest = column
            total += column
    average = total / 3  # each row is expected to hold exactly three scores
    averageScoreDict[row[0]] = [highest, lowest, round(average)]

print(averageScoreDict)

Output:

{'Milky': [7, 4, 5], 'Billy': [6, 5, 6], 'Adam': [5, 2, 4], 'John': [10, 7, 9]}

Now that we have our dictionary, we can turn it into a list and sort it to create the final output you want. See this updated code:

import csv 
from operator import itemgetter 


fileLineList = [] 
averageScoreDict = {} # Creating an empty dictionary here. 

with open('Class1scores.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        fileLineList.append(row)

for row in fileLineList:
    highest = 0
    lowest = None  # None means "no score seen yet", so a real score of 0 still counts
    total = 0
    average = 0
    for column in row:
        if column.isdigit():
            column = int(column)
            if column > highest:
                highest = column
            if lowest is None or column < lowest:
                lowest = column
            total += column
    average = total / 3  # each row is expected to hold exactly three scores
    # Here is where we put the empty dictionary created earlier to good use.
    # We assign the key, in this case the contents of the first column of
    # the CSV, to the list of values.
    # For the first line of the file, the key would be 'John'.
    # We are assigning a list to John which is 3 integers:
    # highest, lowest and average (which is a float we round).
    averageScoreDict[row[0]] = [highest, lowest, round(average)]

averageScoreList = [] 

# Here we "unpack" the dictionary we have created and build a list of pairs:
# each key (the name) together with the single value we want, in this case the average.
for key, value in averageScoreDict.items(): 
    averageScoreList.append([key, value[2]]) 

# Sorting the list using the value instead of the name. 
averageScoreList.sort(key=itemgetter(1), reverse=True)  

print('\nStudents Average Scores From Highest to Lowest\n') 
print(averageScoreList)

Output:

Students Average Scores From Highest to Lowest

[['John', 9], ['Billy', 6], ['Milky', 5], ['Adam', 4]]
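
For comparison, the same ranking can be sketched more compactly by converting each row's scores up front and sorting with a key function. This is only one possible variant, not the answerer's code: the helper name rank_by_average and the sample data are hypothetical, and io.StringIO stands in for the real Class1scores.csv so the snippet runs on its own.

```python
import csv
import io

# Hypothetical sample data standing in for Class1scores.csv.
SAMPLE = "John,10,7,9\nAdam,5,2,4\nBilly,6,6,5\n"


def rank_by_average(lines):
    """Return [name, rounded average] pairs, highest average first."""
    ranking = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        scores = [int(value) for value in row[1:]]
        ranking.append([row[0], round(sum(scores) / len(scores))])
    # Sort on the second item of each pair (the average), descending.
    return sorted(ranking, key=lambda pair: pair[1], reverse=True)


print(rank_by_average(io.StringIO(SAMPLE)))
# [['John', 9], ['Billy', 6], ['Adam', 4]]
```

To use it on a real file, an open file object can be passed in place of the StringIO, for example one opened with open('Class1scores.csv', newline='').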


2016-03-14 15:17:55 Igor

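Could you comment each line, so that I get a better understanding of the dictionary concept? – Billy

I'll do you one better: read this. I think he explains the concept really well. [Hands-on Python Tutorial > 1.12. Dictionaries](http://anh.cs.luc.edu/python/hands-on/3.1/handsonHtml/dictionaries.html#dictionaries) – Igor

Also, is there any way to print the data line by line, given that it is stored in an array, since you can't just use "\n"? – Billy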
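
On printing line by line, a minimal sketch, assuming a list shaped like averageScoreList from the answer above (the variable names here are illustrative): either loop over the pairs, or join formatted strings with "\n".

```python
# Hypothetical list, shaped like averageScoreList in the answer above.
average_score_list = [['John', 9], ['Billy', 6], ['Milky', 5], ['Adam', 4]]

# Option 1: one entry per line, using a loop.
for name, average in average_score_list:
    print(name, average)

# Option 2: build a single string with "\n" between the entries.
report = "\n".join(
    "{}: {}".format(name, average) for name, average in average_score_list
)
print(report)
```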
