I have 2 pandas dataframes, df1 and df2, and want to compare them row by row, cell by cell, storing the results in df3:
for each row in df1:
    for each row in df2:
        create a new row in df3 (called "df1-1, df2-1" or whatever) to store results
        for each cell (column) in df1:
            for the cell in df2 whose column name is the same as for the cell in df1:
                compare the cells (using some comparing function func(a,b)) and,
                depending on the result of the comparison, write result into the
                appropriate column of the "df1-1, df2-1" row of df3
For example, something like:
df1
A    B     C       D
foo  bar   foobar  7
gee  whiz  herp    10

df2
A    B    C       D
zoo  car  foobar  8

df3
df1-df2  A              B               C                    D
foo-zoo  func(foo,zoo)  func(bar,car)   func(foobar,foobar)  func(7,8)
gee-zoo  func(gee,zoo)  func(whiz,car)  func(herp,foobar)    func(10,8)
I have started with this:
for r1 in df1.iterrows():
    for r2 in df2.iterrows():
        for c1 in r1:
            for c2 in r2:
but I don't know what to do next and would appreciate some help.
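Since you're applying func across like-named columns, could you iterate over just the columns and use vectorization, e.g. df3['A'] = func(df1['A'], df2['A']), and so on? – StarFox
@StarFox Interesting, so I could do something like: for column in df3: df3[column] = func(df1[column], df2[column])? – Zubo
Sure! That's the power of pandas/numpy (and of vectorization in general). I'll provide some examples below, and we can go from there – StarFox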
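One caveat with the column-wise idea above: df1 and df2 have different numbers of rows, and the desired df3 pairs every row of df1 with every row of df2, so func(df1[column], df2[column]) on its own would not line the rows up as intended. A minimal sketch of one way around that is to build all row pairs first with a cross join and then apply func one column at a time. Here func is a hypothetical stand-in (plain equality), the "_1"/"_2" suffixes and the choice of column A for the row labels are assumptions for illustration, and how="cross" needs pandas 1.2 or later.

```python
import pandas as pd


def func(a, b):
    # Hypothetical comparison: element-wise equality.
    # Any function that accepts two Series and returns a Series would fit here.
    return a == b


df1 = pd.DataFrame({
    "A": ["foo", "gee"],
    "B": ["bar", "whiz"],
    "C": ["foobar", "herp"],
    "D": [7, 10],
})
df2 = pd.DataFrame({"A": ["zoo"], "B": ["car"], "C": ["foobar"], "D": [8]})

# Pair every row of df1 with every row of df2 (pandas >= 1.2).
# Like-named columns come back as e.g. "A_1" (from df1) and "A_2" (from df2).
pairs = df1.merge(df2, how="cross", suffixes=("_1", "_2"))

# Only compare columns that exist in both frames, keeping df1's column order.
common = [col for col in df1.columns if col in df2.columns]

# One vectorized call to func per column instead of one call per cell.
df3 = pd.DataFrame({col: func(pairs[f"{col}_1"], pairs[f"{col}_2"]) for col in common})

# Label each row after the pair it came from, e.g. "foo-zoo".
labels = pairs["A_1"].astype(str) + "-" + pairs["A_2"].astype(str)
df3.index = pd.Index(labels, name="df1-df2")

print(df3)
```

With the sample data, only column C of the foo-zoo row compares equal. The cross join holds len(df1) * len(df2) rows in memory at once, so for large frames the pairs may need to be processed in chunks.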