2017-07-09 68 views

回答

3

您可以使用:

  • read_csv与省略第4行通过skiprows
  • 检查,如果最后一列由isnull拥有所有NaN值与all
  • 所有的值都是NaN,所以你可以忽略它,一种可能的解决方案是使用iloc用于选择第一61
  • 重塑通过set_indexunstack
  • 最后reset_indexrename柱由dict

df = pd.read_csv('API_NY.GDP.MKTP.KD.ZG_DS2_en_csv_v2.csv', skiprows=4) 
print (df.head()) 
    Country Name Country Code   Indicator Name  Indicator Code 1960 \ 
0  Aruba   ABW GDP growth (annual %) NY.GDP.MKTP.KD.ZG NaN 
1 Afghanistan   AFG GDP growth (annual %) NY.GDP.MKTP.KD.ZG NaN 
2  Angola   AGO GDP growth (annual %) NY.GDP.MKTP.KD.ZG NaN 
3  Albania   ALB GDP growth (annual %) NY.GDP.MKTP.KD.ZG NaN 
4  Andorra   AND GDP growth (annual %) NY.GDP.MKTP.KD.ZG NaN 

    1961 1962 1963 1964 1965  ...   2008  2009  2010 \ 
0 NaN NaN NaN NaN NaN  ...  -6.881302 -5.653502  NaN 
1 NaN NaN NaN NaN NaN  ...  3.611368 21.020649 8.433290 
2 NaN NaN NaN NaN NaN  ...  13.817146 2.412870 3.407655 
3 NaN NaN NaN NaN NaN  ...  7.530000 3.350000 3.710000 
4 NaN NaN NaN NaN NaN  ...  -8.594256 -3.817986 -5.347977 

     2011  2012  2013  2014  2015  2016 Unnamed: 61 
0  NaN  NaN  NaN  NaN  NaN  NaN   NaN 
1 6.113685 14.434741 1.959123 1.312531 1.112558 2.232272   NaN 
2 3.918597 5.155441 6.813586 4.804473 3.006981 0.000000   NaN 
3 2.550000 1.420000 1.110000 1.800000 2.590000 3.460000   NaN 
4 -4.802675 -1.760010 -0.063514  NaN  NaN  NaN   NaN 

cols = ['Country Name','Country Code','Indicator Name','Indicator Code'] 

print (df.iloc[:, 61].isnull().all()) 
True  

d = {'level_4':'year'} 
df = df.iloc[:, :60] 
     .set_index(cols) 
     .stack() 
     .reset_index(name='vals') 
     .rename(columns=d) 
print (df.head()) 
    Country Name Country Code   Indicator Name  Indicator Code year \ 
0  Aruba   ABW GDP growth (annual %) NY.GDP.MKTP.KD.ZG 1995 
1  Aruba   ABW GDP growth (annual %) NY.GDP.MKTP.KD.ZG 1996 
2  Aruba   ABW GDP growth (annual %) NY.GDP.MKTP.KD.ZG 1997 
3  Aruba   ABW GDP growth (annual %) NY.GDP.MKTP.KD.ZG 1998 
4  Aruba   ABW GDP growth (annual %) NY.GDP.MKTP.KD.ZG 1999 

     vals 
0 1.245086 
1 7.814432 
2 6.666622 
3 1.154469 
4 4.514062 

对于3列是非常溶液simila R,用于删除列使用drop

df = pd.read_csv('API_NY.GDP.MKTP.KD.ZG_DS2_en_csv_v2.csv', skiprows=4) 

d = {'level_1':'year'} 
df= df.drop(['Country Code','Indicator Name','Indicator Code', 'Unnamed: 61'], axis=1) 
df = df.set_index('Country Name').stack().reset_index(name='vals').rename(columns=d) 
print (df.head()) 
    Country Name year  vals 
0  Aruba 1995 1.245086 
1  Aruba 1996 7.814432 
2  Aruba 1997 6.666622 
3  Aruba 1998 1.154469 
4  Aruba 1999 4.514062 
2

你可以试试这个:

df = pd.read_csv('Your file path.csv', dtype = object) # read file 
df.drop(['Country Name','Indicator Name'], axis = 1, inplace = True) # delete unwanted columns 
dfT = pd.melt(df, id_vars=['Country Code','Indicator Code']) # transpose 
dfF = dfT.dropna(subset=['value']) # delete Nan