2017-08-14 40 views
0

我有一个.csv我试图读入一个有多列列标题的熊猫数据框,但第一行的标签是稀疏的。使用熊猫阅读带稀疏标签的列标题的CSV

例如:

Binned_average_and_predicted_H2O_spectra_sorted_by_RH-class.,,,,,,,, 
,RH=0.8,,,,RH=0.9,,, 
,n_=_60,,,,n_=_29,,, 
nat_freq,avrg_sp(T),avrg_sp(h2o),denoised_avrg_sp(h2o),pred_sp(h2o),avrg_sp(T),avrg_sp(h2o),denoised_avrg_sp(h2o),pred_sp(h2o) 
6.10E-04,8.40E-02,0.117551351,0.117550357,8.64E-02,0.128696811,0.163304381,0.163304015,0.127552704 
1.22E-03,7.49E-02,0.126467592,0.126465605,7.70E-02,9.05E-02,0.200350295,0.200349563,8.97E-02 
1.83E-03,7.54E-02,0.124370072,0.124367091,7.76E-02,8.54E-02,0.121274897,0.121273799,8.46E-02 
2.44E-03,7.76E-02,0.136590839,0.136586865,7.99E-02,5.45E-02,0.100995665,0.100994202,5.40E-02 
3.05E-03,8.73E-02,0.141422799,0.141417832,8.98E-02,7.57E-02,0.170033442,0.170031614,7.50E-02 
3.66E-03,7.29E-02,0.143599074,0.143593115,7.50E-02,0.10001777,0.165468366,0.165466173,9.91E-02 

当我读了CSV,

Cosp2 = pd.read_csv(DPath,index_col=0, header=[1,3]) 
print(Cosp2) 

我结束了无名:对所有的头第一级标头都没有明确标注#_level_0标签。

   RH=0.8 Unnamed: 2_level_0 Unnamed: 3_level_0 \ 
nat_freq avrg_sp(T)  avrg_sp(h2o) denoised_avrg_sp(h2o) 
0.00061  0.0840   0.117551    0.117550 
0.00122  0.0749   0.126468    0.126466 
0.00183  0.0754   0.124370    0.124367 
0.00244  0.0776   0.136591    0.136587 
0.00305  0.0873   0.141423    0.141418 
0.00366  0.0729   0.143599    0.143593 

     Unnamed: 4_level_0  RH=0.9 Unnamed: 6_level_0 \ 
nat_freq  pred_sp(h2o) avrg_sp(T)  avrg_sp(h2o) 
0.00061    0.0864 0.128697   0.163304 
0.00122    0.0770 0.090500   0.200350 
0.00183    0.0776 0.085400   0.121275 
0.00244    0.0799 0.054500   0.100996 
0.00305    0.0898 0.075700   0.170033 
0.00366    0.0750 0.100018   0.165468 

      Unnamed: 7_level_0 Unnamed: 8_level_0 
nat_freq denoised_avrg_sp(h2o)  pred_sp(h2o) 
0.00061    0.163304   0.127553 
0.00122    0.200350   0.089700 
0.00183    0.121274   0.084600 
0.00244    0.100994   0.054000 
0.00305    0.170032   0.075000 
0.00366    0.165466   0.099100 

有没有办法让熊猫在整个未标记的列上传播0级标签?我想的东西,看起来像这样:

   RH=0.8             \ 
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o) 
0.00061  0.0840  0.117551    0.117550  0.0864 
0.00122  0.0749  0.126468    0.126466  0.0770 
0.00183  0.0754  0.124370    0.124367  0.0776 
0.00244  0.0776  0.136591    0.136587  0.0799 
0.00305  0.0873  0.141423    0.141418  0.0898 
0.00366  0.0729  0.143599    0.143593  0.0750 

      RH=0.9             
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o) 
0.00061 0.128697  0.163304    0.163304  0.127553 
0.00122 0.090500  0.200350    0.200350  0.089700 
0.00183 0.085400  0.121275    0.121274  0.084600 
0.00244 0.054500  0.100996    0.100994  0.054000 
0.00305 0.075700  0.170033    0.170032  0.075000 
0.00366 0.100018  0.165468    0.165466  0.099100 

回答

1

您可以使用get_level_valuesto_seriesSeries第一:

a = Cosp2.columns.get_level_values(0).to_series() 
print (a) 
RH=0.8       RH=0.8 
Unnamed: 2_level_0 Unnamed: 2_level_0 
Unnamed: 3_level_0 Unnamed: 3_level_0 
Unnamed: 4_level_0 Unnamed: 4_level_0 
RH=0.9       RH=0.9 
Unnamed: 6_level_0 Unnamed: 6_level_0 
Unnamed: 7_level_0 Unnamed: 7_level_0 
Unnamed: 8_level_0 Unnamed: 8_level_0 
dtype: object 

如果startswithUnnamed然后使用maskNaN s,不ffill更换NaNfillnamethod='ffill'

b = a.mask(a.str.startswith('Unnamed')).ffill() 
print (b) 
RH=0.8    RH=0.8 
Unnamed: 2_level_0 RH=0.8 
Unnamed: 3_level_0 RH=0.8 
Unnamed: 4_level_0 RH=0.8 
RH=0.9    RH=0.9 
Unnamed: 6_level_0 RH=0.9 
Unnamed: 7_level_0 RH=0.9 
Unnamed: 8_level_0 RH=0.9 
dtype: object 

末创建新MultiIndex通过from_arrays

Cosp2.columns = pd.MultiIndex.from_arrays([b, Cosp2.columns.get_level_values(1)]) 
print (Cosp2) 
      RH=0.8             \ 
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o) 
0.00061  0.0840  0.117551    0.117550  0.0864 
0.00122  0.0749  0.126468    0.126466  0.0770 
0.00183  0.0754  0.124370    0.124367  0.0776 
0.00244  0.0776  0.136591    0.136587  0.0799 
0.00305  0.0873  0.141423    0.141418  0.0898 
0.00366  0.0729  0.143599    0.143593  0.0750 

      RH=0.9             
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o) 
0.00061 0.128697  0.163304    0.163304  0.127553 
0.00122 0.090500  0.200350    0.200350  0.089700 
0.00183 0.085400  0.121275    0.121274  0.084600 
0.00244 0.054500  0.100996    0.100994  0.054000 
0.00305 0.075700  0.170033    0.170032  0.075000 
0.00366 0.100018  0.165468    0.165466  0.099100