2017-09-22 280 views
0

我正在做一个Jupyter笔记本分析一些数据,看起来像这样:Jupyter笔记本 - Python代码

The Data I'm analyzing

我必须找出以下信息:

The Questions

这是我尝试过的,但它不起作用,而且我完全丧失了如何去做b部分。

# Import relevant packages/modules 
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
from scipy import stats 

# Import relevant csv data file 
data = pd.read_csv("C:/Users/Hanna/Desktop/Sheridan College/Statistics for Data Science/Assignment 1/MATH37198_Assignment1_Individual/IGN_game_ratings.csv") 

# Part a: Determine the z-score of "Super Mario Kart" and print out result 
superMarioKart_zscore = data[data['Game']=='Super Mario Kart'] ['Score'].stats.zscore() 
print("Z-score of Super Mario Kart: ", superMarioKart_zscore) 

# Part b: The top 20 (most common) platforms 

# Part c: The average score of all the Shooter games 
averageShooterScore = data[data['Group']=='Game']['Score'].mean() 
# Print output 
print("The average score of all the Shooter games is: ", averageShooterScore) 

# Part d: The top two platforms witht the most perfect scores (10) 

# Part e: The probability of a game randomly selected that is an RPG 
# First find the number of games in the list that is an RPG 
numOfRPGGames = 0 
for game in data['Game']: 
    if data['Genre'] == 'RPG': 
     numOfRPGGames += 1 
# Divide this by the total number of games to find the probablility of selecting one 
print("The probability of selecting a game that is an RPG is: ", numOFRPGGames/totalNumGames) 

# Part f: The probability of a game randomly selected with a score less than 5 
# First find the number of games in the list with a score less than 5 using a for loop: 
numScoresLessThan5 = 0 
for game in data['Game']: 
    if data['Score'] < 5: 
     numScoresLessThan5 += 1 
# Divide this by the total number of games to find the probablility of selecting one 
print("The probability of selecting a game with a score less than 5 is: ", numScoresLessThan5/totalNumGames) 
+2

您可能要分解这个问题到个人问题,你这样可能会得到更好的答案。如果你还没有注意到[MCVE](https://stackoverflow.com/help/mcve),并且真的关注一个特定的问题,你尝试了什么,为什么它不工作以及你期望的输出成为。 – johnchase

回答

0

熊猫有这种类型的问题的优秀的内置功能。以下是使用从CSV导入的一些测试数据解决b部分的建议。我使用的test.csv刚刚这些领域,但仍然在你的工作的情况下更改列名和导入新文件

Sample CSV structure

# Import relevant packages/modules 
import numpy as np 
import pandas as pd 

# Import a dummy csv data file 
data = pd.read_csv("./test.csv") 
# Visualize the file before the process 
print(data) 

# Extract the column you're interesting in counting 
initial_column = data['Name'] 

# Create object for receiving the output of the value_counts function 
count_object = pd.value_counts(initial_column) 

# Create an empty list for receiving the sorted values 
sorted_grouped_column = [] 

# You determine the number of items. In your exercise is 20. 
number_of_items = 3 
counter = 0 

for i in count_object.keys(): 
    if counter == number_of_items: 
     break 
    else: 
     sorted_grouped_column.append(i) 
     counter = counter + 1 

print(sorted_grouped_column)