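I'm grouping a DataFrame by year (one level of the MultiIndex on its columns), applying a function that pads each group's df out to 11 columns (adding as many empty columns as needed), and then returning the padded df. However, this raises ValueError: cannot reindex from a duplicate axis, even though neither axis has duplicate values.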
finalFormat = (penultimateFormatNot11Columns
               .groupby(level='Year', axis=1)
               .apply(padDFToXColumns))
The traceback ends with:
raise ValueError("cannot reindex from a duplicate axis")
Inside the padding function, the df that gets returned has no duplicates on either axis:
>>> paddedDF.index.duplicated().any()
False
>>> paddedDF.columns.duplicated().any()
False
Any ideas where this error is coming from, given that paddedDF has no duplicates at any level?
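A plausible source, if I am reading pandas' groupby internals correctly: when the applied function returns a frame whose row index matches the group's, apply concatenates the per-group results side by side without the group keys and then reindexes to the original columns. The padded labels (EmptyColumn...) are plain strings that recur in every year, so the combined column axis contains duplicates even though each individual paddedDF does not. The sketch below reproduces just that collision with pd.concat; the toy frames and the pad helper are hypothetical, and only the EmptyColumnN naming comes from padDFToXColumns.

```python
import pandas as pd

# Hypothetical toy data: two year groups whose own column labels are distinct.
groups = {
    1996: pd.DataFrame({"1996_F": [1.0, 2.0], "1996_M": [3.0, 4.0]},
                       index=["1.White", "2.Black"]),
    1997: pd.DataFrame({"1997_F": [5.0, 6.0], "1997_M": [7.0, 8.0]},
                       index=["1.White", "2.Black"]),
}


def pad(df, tot_columns=4):
    """Append all-NaN 'EmptyColumnN' columns, mirroring padDFToXColumns' naming."""
    n = len(df.columns)
    extra = [f"EmptyColumn{k}" for k in range(n + 1, tot_columns + 1)]
    return df.reindex(columns=list(df.columns) + extra)


padded = {year: pad(df) for year, df in groups.items()}

# Each padded group is duplicate-free on its own...
per_group_ok = all(not p.columns.duplicated().any() for p in padded.values())

# ...but placed side by side without the year key, the EmptyColumnN labels repeat.
side_by_side = pd.concat(list(padded.values()), axis=1)
combined_has_dups = side_by_side.columns.duplicated().any()
duplicate_labels = side_by_side.columns[side_by_side.columns.duplicated()].tolist()

# Keeping the year as an extra column level makes every label unique again.
keyed = pd.concat(padded, axis=1, names=["Year"])
keyed_has_dups = keyed.columns.duplicated().any()

print(per_group_ok, combined_has_dups, duplicate_labels, keyed_has_dups)
```

If this is the cause, one workaround is to loop over the years yourself and combine the padded pieces with pd.concat(..., keys=...) (or a dict, as above) so the year level is kept.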
Padding function:
import numpy as np
import pandas as pd

def padDFToXColumns(df, TOT_COLUMNS = 11):
    """
    Pad out the number of columns in df to TOT_COLUMNS (add TOT_COLUMNS - len(df.columns) empty columns)
    """
    numColsInDF = len(df.columns)
    if numColsInDF > TOT_COLUMNS:
        print("ERROR: Number Of Columns (%s) Exceeds Max Columns (%s)" % (numColsInDF, TOT_COLUMNS))
        return  # returns None for this group
    ### Add Empty Columns ###
    numColsToAdd = TOT_COLUMNS - numColsInDF  # not used below
    # Plain string labels EmptyColumn<n+1> .. EmptyColumn<TOT_COLUMNS>; the same names can recur in every group
    columnsToAdd = [ 'EmptyColumn' + str(num) for num in range(numColsInDF + 1, TOT_COLUMNS + 1) ]
    # All-NaN frame indexed 0..len(df)-1 (not df's own index), so the left join fills the new columns with NaN
    emptyColumns = pd.DataFrame(columns = columnsToAdd, index = np.arange(len(df.index)))
    paddedDF = df.join(emptyColumns)
    #paddedDF.reset_index(drop = True, inplace = True)
    return paddedDF
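For comparison, here is a minimal sketch of the same padding done with DataFrame.reindex(columns=...), which keeps df's own row index instead of joining against a separate np.arange index. The name pad_to_x_columns is hypothetical, it assumes flat (single-level) column labels, and it raises ValueError instead of printing and returning None; it does not by itself address the duplicate-label issue across groups.

```python
import pandas as pd


def pad_to_x_columns(df: pd.DataFrame, tot_columns: int = 11) -> pd.DataFrame:
    """Return df padded with all-NaN 'EmptyColumnN' columns up to tot_columns.

    Hypothetical helper; assumes df has flat (single-level) column labels.
    """
    n_cols = len(df.columns)
    if n_cols > tot_columns:
        # Raise instead of printing and returning None.
        raise ValueError(
            f"Number of columns ({n_cols}) exceeds max columns ({tot_columns})"
        )
    new_labels = [f"EmptyColumn{num}" for num in range(n_cols + 1, tot_columns + 1)]
    # reindex keeps df's own row index, so no separate np.arange index is built
    # and no join is needed; columns that did not exist are filled with NaN.
    return df.reindex(columns=list(df.columns) + new_labels)


sample = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}, index=["1.White", "2.Black"])
padded = pad_to_x_columns(sample, 5)
print(padded.columns.tolist())
```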
The DataFrame:
>>> mydata.head()
SurveyYear Age Race Gender WeightAdjusted
0 1996 39 1.White 1.Female 1039.13
1 1996 9 1.White 2.Male 995.13
2 1996 8 1.White 2.Male 775.66
3 1996 39 1.White 2.Male 404.28
4 1996 33 3.Hispanic 1.Female 404.28
>>> groupbyKeys = ['SurveyYear', 'Age', 'Race', 'Gender']
>>> cellPopulations = mydata.groupby(groupbyKeys).agg({'WeightAdjusted':'sum'})
>>> cellPopulations.head(20)
WeightAdjusted
SurveyYear Age Race Gender
1996 0 1.White 1.Female 1204859.60
2.Male 1227666.34
2.Black 1.Female 307495.16
2.Male 263571.07
3.Hispanic 1.Female 320359.68
2.Male 392902.80
4.Asian 1.Female 78615.49
2.Male 82341.54
5.Other 1.Female 16134.33
2.Male 19365.76
1 1.White 1.Female 1195134.70
2.Male 1195659.14
2.Black 1.Female 328376.10
2.Male 383293.79
3.Hispanic 1.Female 322862.58
2.Male 404322.04
4.Asian 1.Female 79499.56
2.Male 73783.69
5.Other 1.Female 20647.55
2.Male 24222.52
>>> unstackKey = ['SurveyYear', 'Age', 'Gender']
>>> penultimateFormatNot11Columns = cellPopulations.unstack(unstackKey)
>>> penultimateFormatNot11Columns
WeightAdjusted ...
SurveyYear 1996 ... 1997
Age 0 1 2 3 4 ... 76 77 78 79 80
Gender 1.Female 2.Male 1.Female 2.Male 1.Female 2.Male 1.Female 2.Male 1.Female 2.Male ... 1.Female 2.Male 1.Female 2.Male 1.Female 2.Male 1.Female 2.Male 1.Female 2.Male
Race ...
1.White 1204859.60 1227666.34 1195134.70 1195659.14 1197386.21 1288700.89 1251324.65 1307458.14 1236790.33 1374989.75 ... 764103.31 506844.04 702775.64 425705.16 666705.33 423419.49 577674.82 366109.58 3898404.40 2283771.11
2.Black 307495.16 263571.07 328376.10 383293.79 291976.23 326400.85 310870.61 323344.13 301025.43 323199.08 ... 68272.99 43254.98 50082.98 34347.45 50788.70 36772.29 31393.21 20720.47 366569.11 180108.23
3.Hispanic 320359.68 392902.80 322862.58 404322.04 344564.20 340702.86 303325.65.53 382663.64 311911.38 ... 39084.04 17362.56 27507.45 18803.48 17619.95 24060.91 35665.78 23802.81 174972.00 105530.84
4.Asian 78615.49 82341.54 79499.56 73783.69 96289.08 88222.32 96411.97 92029.56 77070.10 90370.15 ... 30196.58 27745.90 18419.49 15406.79 7272.27 17891.33 18116.50 3606.67 57684.54 42662.74
5.Other 16134.33 19365.76 20647.55 24222.52 17469.53 27237.94 11220.90 6996.58 23640.43 14917.77 ... 4441.26 nan 1487.90 2845.89 522.43 2453.52 303.66 2982.57 18870.12 6232.88
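To make the setup reproducible, this sketch rebuilds the same groupby/unstack pipeline from only the five rows printed by mydata.head(); the full dataset is not shown, so with these rows each year has 5 columns rather than the many displayed above. The variable name penultimate stands in for penultimateFormatNot11Columns.

```python
import pandas as pd

# Only the five rows shown by mydata.head(); the full dataset is not available here.
mydata = pd.DataFrame({
    "SurveyYear": [1996, 1996, 1996, 1996, 1996],
    "Age": [39, 9, 8, 39, 33],
    "Race": ["1.White", "1.White", "1.White", "1.White", "3.Hispanic"],
    "Gender": ["1.Female", "2.Male", "2.Male", "2.Male", "1.Female"],
    "WeightAdjusted": [1039.13, 995.13, 775.66, 404.28, 404.28],
})

groupbyKeys = ["SurveyYear", "Age", "Race", "Gender"]
cellPopulations = mydata.groupby(groupbyKeys).agg({"WeightAdjusted": "sum"})

# Move year, age and gender to the columns; Race stays as the row index.
unstackKey = ["SurveyYear", "Age", "Gender"]
penultimate = cellPopulations.unstack(unstackKey)

# Columns per year: the count that padDFToXColumns would pad up to 11.
cols_per_year = penultimate.columns.get_level_values("SurveyYear").value_counts()
print(penultimate.shape, list(penultimate.columns.names))
print(cols_per_year)
```

Only (year, age, gender) combinations that actually occur become columns, which is why the number of columns differs by year and padding is needed in the first place.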
I think you could add a sample of data that reproduces the error, thanks. – jezrael
Added some information about the underlying data and how it was produced. – Nirvan