2014-05-05 47 views
0

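Remove entire rows containing zeros from two pandas Series

I have a function that plots the logs of two columns of a pandas DataFrame. Zeros in the data cause errors, so they need to be removed. At the moment the inputs to the function are two columns of a DataFrame. Is there a way to remove every row that contains a zero, i.e. an equivalent of df = df[df.ColA != 0]?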

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def logscatfit(x,y,title):
    xvals2 = np.arange(-2,6,1)
    a = np.log(x) #These are what I want to remove the zeros from
    b = np.log(y)
    plt.scatter(a, b, c='g', marker='x', s=35)
    slope, intercept, r_value, p_value, std_err = stats.linregress(a,b)
    plt.plot(xvals2, (xvals2*slope + intercept), color='red')
    plt.title(title)
    plt.show()
    print("Slope is: {}. Intercept is: {}. R-value is: {}. P-value is: {}. Std_err is: {}".format(
        slope, intercept, r_value, p_value, std_err))

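I can't think of a way to remove the zeros from both a and b while keeping them the same length, so that I can still draw the scatter plot. Is my only option to rewrite the function to take a DataFrame and then remove the zeros, e.g. df1 = df[df.ColA != 0] followed by df2 = df1[df1.ColB != 0]?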

Answers

2

As I understand your question, you need to remove the rows where x or y (or both) is zero.

A simple way is

keepThese = (x > 0) & (y > 0) 
a = x[keepThese] 
b = y[keepThese] 

and then proceed with the rest of your code.
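To see why one shared mask keeps the two series the same length, here is a small sketch with made-up data. The sample values are illustrative only and are not taken from the question.

```python
import numpy as np
import pandas as pd

# Made-up sample data: the zeros sit at different positions in x and y.
x = pd.Series([1.0, 0.0, 10.0, 100.0, 1000.0])
y = pd.Series([2.0, 20.0, 0.0, 200.0, 2000.0])

# One boolean mask built from both series and applied to both,
# so a row is dropped from x and y together.
keep_these = (x > 0) & (y > 0)
a = np.log(x[keep_these])
b = np.log(y[keep_these])

print(len(a), len(b))   # both series keep the same number of rows
print(list(a.index))    # the surviving row labels
```

Note that `> 0` also drops negative values, which a logarithm cannot handle anyway; use `!= 0` instead if only exact zeros should be removed.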

0

Plugging FooBar's answer into your function gives:

def logscatfit(x,y,title): 
    xvals2 = np.arange(-2,6,1) 
    keepThese = (x > 0) & (y > 0) 
    a = x[keepThese] 
    b = y[keepThese]
    a = np.log(a) 
    b = np.log(b) 
    plt.scatter(a, b, c='g', marker='x', s=35) 
    slope, intercept, r_value, p_value, std_err = stats.linregress(a,b) 
    plt.plot(xvals2, (xvals2*slope + intercept), color='red') 
    plt.title(title) 
    plt.show() 
    print("Slope is: {}. Intercept is: {}. R-value is: {}. P-value is: {}. Std_err is: {}".format(
        slope, intercept, r_value, p_value, std_err))
1

I like the simplicity of FooBar's answer. A more general approach is to pass the DataFrame to your function and use the .any() method.

def logscatfit(df,x_col_name,y_col_name,title):
    two_cols = df[[x_col_name,y_col_name]]
    # True for every row that has a zero in either column
    mask = two_cols.apply(lambda row: (row==0).any(), axis = 1)
    # keep only the rows without a zero
    df_to_use = df[~mask]
    x = df_to_use[x_col_name]
    y = df_to_use[y_col_name]

    #your code
    a = np.log(x)
    # etc.
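
The row-wise `apply` can also be written as a vectorized comparison, which tends to be faster on large frames. A minimal sketch, assuming a helper named `drop_zero_rows` (hypothetical, not part of pandas) and made-up column names and data:

```python
import pandas as pd

def drop_zero_rows(df, cols):
    """Return the rows of df where none of the given columns equals zero."""
    # (df[cols] != 0) is a boolean frame; .all(axis=1) is True only
    # for rows in which every selected column is nonzero.
    return df[(df[cols] != 0).all(axis=1)]

# Illustrative data: ColC is all zeros but is not among the checked columns.
df = pd.DataFrame({"ColA": [1, 0, 3, 4], "ColB": [5, 6, 0, 8], "ColC": [0, 0, 0, 0]})
clean = drop_zero_rows(df, ["ColA", "ColB"])
print(clean)
```

Only the columns listed in `cols` are checked, so zeros elsewhere in the frame do not cause a row to be dropped.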