Single quote replacement, handling null integers in pandas/Python 2

I'm new to pandas/Python and have had to write some kludgy code. I'd appreciate hearing how you would do this, and how to speed it up (I'll be running it on gigabytes of data).

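I'm using pandas/Python for some ETL work. Calculations are done row by row, so I need the values as numeric types during processing (I've left that part out). I need to output some fields as arrays and get rid of the single quotes, the nan, and the ".0".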

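First question: is there a way to vectorize these if/else statements, à la ifelse in R? Second, surely there is a better way to remove the ".0". There seem to be major issues with how pandas/numpy handle nulls in numeric types.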

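Finally, .replace doesn't seem to work on a DataFrame for single quotes. Am I missing something? Here is the sample code; please let me know if you have any questions: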

import numpy as np
import pandas as pd

# have some nulls and need it in integers
d = {'one': [1.0, 2.0, 3.0, 4.0], 'two': [4.0, 3.0, np.nan, 1.0]}
dat = pd.DataFrame(d)

# helpers to get rid of the ".0"; this necessarily converts to strings
def removeforval(val):
    val = str(val)
    if val.endswith(".0"):
        val = val[:-2]
    return val

def removeforcol(col):
    return col.apply(removeforval)

dat = dat.apply(removeforcol, axis=0)
# remove the nan's (NaN became the string 'nan' above)
dat = dat.replace('nan', '')

# need some fields in arrays on a postgres database
quoted = ['{' + str(tuple(x))[1:-1] + '}' for x in dat.to_records(index=False)]
print("Before single quote removal")
print(quoted)

# try to replace single quotes using DataFrame's replace
quoted_df = pd.DataFrame(quoted).replace('\'', '')
quoted_df = quoted_df.replace('\'', '')
print("DataFrame does not seem to work")
print(quoted_df)

# use a loop
for item in range(len(quoted)):
    quoted[item] = quoted[item].replace('\'', '')
print("This Works")
print(quoted)

Thanks!

Source

2013-06-26 ideamotor

Can you show what you want your output to look like? – Jeff

[{4,1}, {2,3}, {3,}, {4,1}], like the last output – ideamotor

Correction: it is a list like this: ['{1, 4}', '{2, 3}', '{3, }', '{4, 1}'] – ideamotor

You do realize that building a string exactly like this is odd; it isn't valid Python at all. What are you trying to do? Why are you stringifying it?

Revised

In [144]: list([ "{%s , %s}" % tup[1:] for tup in df.replace(np.nan,0).astype(int).replace(0,'').itertuples() ]) 
Out[144]: ['{1 , 4}', '{2 , 3}', '{3 , }', '{4 , 1}']
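
The one-liner above hard-codes two columns and, because it routes NaN through 0, would also blank out any genuine zeros. Below is a minimal sketch of a variant that takes any number of columns and only blanks cells that were originally missing. The helper name `to_pg_array` and its `null_text` parameter are made up for illustration, and it assumes a pandas version that has `DataFrame.notna`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0, 3.0, 4.0],
                   'two': [4.0, 3.0, np.nan, 1.0]})

def to_pg_array(frame, null_text=''):
    """Hypothetical helper: one '{a,b,...}' string per row."""
    # Fill NaN only so the cast to int succeeds; str() of an int has no ".0"
    as_text = frame.fillna(0).astype(int).astype(str)
    # Blank out exactly the cells that were missing in the original frame,
    # so genuine zeros survive
    as_text = as_text.where(frame.notna(), null_text)
    # Join each row's cells with commas and wrap the result in braces
    return '{' + as_text.apply(','.join, axis=1) + '}'

rows = to_pg_array(df).tolist()
print(rows)
```

Whether the target database accepts an empty element is not something this thread establishes; `null_text` is there so a different placeholder (for example `'NULL'`) can be substituted.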

Source

2013-06-26 17:13:40 Jeff

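My faith in Python is restored. Two slight deviations. I don't actually know how many columns there will be before the script runs; I could just feed the required format string to your code. Also, I have a lot of other fields in the full DataFrame. Right now I'm subsetting from a list of column names determined by other logic. I'd still like to know whether there is a way to vectorize this kind of logic, i.e., if the column name is in the list then do this, otherwise do that, without using that construct. The whole point is to save the data and load it into postgres via psql COPY, and this is the format for SQL arrays. – ideamotor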
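
On the vectorization question: the closest analogue to R's `ifelse` that I know of is `numpy.where`, and "do this for the listed columns, something else for the rest" can be written by selecting columns with a boolean mask instead of branching per column. A rough sketch follows; the sample frame, the `array_cols` list, and the two transformations are invented purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0, 3.0, 4.0],
                   'two': [4.0, 3.0, np.nan, 1.0],
                   'other': [0.5, 1.5, 2.5, 3.5]})

# Element-wise if/else, comparable to R's ifelse(cond, yes, no)
labels = np.where(df['one'] > 2, 'big', 'small')

# Hypothetical list of column names decided by other logic at run time
array_cols = ['one', 'two']
in_list = df.columns.isin(array_cols)

doubled = df.loc[:, in_list] * 2    # "do this" for the listed columns
halved = df.loc[:, ~in_list] / 2    # "do that" for all the others

# Reassemble in the original column order
result = pd.concat([doubled, halved], axis=1)[df.columns]
print(labels)
print(result)
```

The mask approach keeps each branch a whole-column operation, so nothing is evaluated cell by cell in Python.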

Just use df.to_sql() (it is being renamed in the upcoming 0.12) and you will be much better off; see here: http://pandas.pydata.org/pandas-docs/dev/io.html#sql-queries – Jeff

You might find this useful: https://gist.github.com/catawbasam/3164289 – Jeff
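
As for why `.replace` left the single quotes in place in the question: as far as I know, `DataFrame.replace` with a plain string matches whole cell values rather than substrings, so a cell such as `{'1', '4'}` never equals `'` and is left untouched. Passing `regex=True`, or using the `.str.replace` accessor on a column, performs substring replacement instead. A small sketch using strings shaped like the ones in the question:

```python
import pandas as pd

quoted = ["{'1', '4'}", "{'2', '3'}", "{'3', ''}", "{'4', '1'}"]
quoted_df = pd.DataFrame(quoted)

# Whole-cell match: no cell is exactly "'", so nothing changes
unchanged = quoted_df.replace("'", "")

# Substring match via regex=True
stripped = quoted_df.replace("'", "", regex=True)

# Same idea on a single column through the string accessor
also_stripped = quoted_df[0].str.replace("'", "", regex=False)

print(stripped[0].tolist())
```

Both substring variants operate on the whole column at once, which avoids the explicit Python loop.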
