Pandas random sample with a 1:1 ratio of a specific column's entries

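I have a pandas DataFrame with the columns ['text', 'label'], where label is either 'pos' or 'neg'.

The problem is that I have far more rows labeled 'neg' than rows labeled 'pos'.

Is there a way to randomly select as many 'neg' sentences as there are 'pos' sentences, so that I get a new DataFrame with a 50:50 ratio of the two labels?

Do I have to count the 'pos' sentences, put them all in a new DataFrame, then do neg_df = dataframe.sample(n=pos_count) and append that to the previously created all-positive DataFrame, or is there a faster way?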
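For reference, the manual approach described above might look roughly like the sketch below. The sample data and the names pos_df, neg_df, and balanced are made up for illustration, and random_state is only there to make the draw repeatable. Note that the 'neg' rows have to be filtered out first; calling dataframe.sample(n=pos_count) on the whole frame would draw from both labels.

```python
import pandas as pd

# Hypothetical sample data with the same columns as in the question.
dataframe = pd.DataFrame({
    'text': ['a', 'b', 'c', 'd', 'e'],
    'label': ['pos'] * 2 + ['neg'] * 3,
})

# Keep every 'pos' row and count them.
pos_df = dataframe[dataframe['label'] == 'pos']
pos_count = len(pos_df)

# Draw exactly pos_count 'neg' rows, without replacement (the default).
neg_df = dataframe[dataframe['label'] == 'neg'].sample(n=pos_count, random_state=0)

# Stack the two frames row-wise, then shuffle so the labels are interleaved.
balanced = (
    pd.concat([pos_df, neg_df])
    .sample(frac=1, random_state=0)
    .reset_index(drop=True)
)

print(balanced['label'].value_counts())
```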

Thanks for your help.

2016-02-11 d.a.d.a

import pandas as pd

# Sample data.
df = pd.DataFrame({'text': ['a', 'b', 'c', 'd', 'e'],
                   'label': ['pos'] * 2 + ['neg'] * 3})
>>> df
  label text
0   pos    a
1   pos    b
2   neg    c
3   neg    d
4   neg    e

# Select the 'pos' and 'neg' text as separate Series.
neg_text = df.loc[df.label == 'neg', 'text']
pos_text = df.loc[df.label == 'pos', 'text']

# Sample 'pos' and 'neg' equally, with replacement, and concatenate
# the two samples side by side (axis=1) into a dataframe.
result = pd.concat([neg_text.sample(n=5, replace=True).reset_index(drop=True),
                    pos_text.sample(n=5, replace=True).reset_index(drop=True)], axis=1)

result.columns = ['neg', 'pos']

# The sample is random, so the rows shown here will vary from run to run.
>>> result
  neg pos
0   c   b
1   d   a
2   c   b
3   d   a
4   e   a


2016-02-11 18:01:51 Alexander

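Thanks, this led to the behavior I wanted. First, I can't use the same text row more than once, because I'm using the data to train a classifier, but removing 'replace=True' did the trick. Second, I needed to append the two new frames row-wise instead of using concat side by side, otherwise my classifier throws an error.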
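A possible one-step variant of what this comment describes (sampling without replacement, with the rows stacked rather than placed side by side) is sketched below. It is not taken from the answer above. It assumes pandas 1.1 or later, where DataFrameGroupBy.sample is available, and the sample data, the name balanced, and the random_state value are illustrative only.

```python
import pandas as pd

# Hypothetical sample data, same shape as in the answer above.
df = pd.DataFrame({
    'text': ['a', 'b', 'c', 'd', 'e'],
    'label': ['pos'] * 2 + ['neg'] * 3,
})

# Size of the smallest class; every label is downsampled to this count.
n = df['label'].value_counts().min()

# Sample n rows per label without replacement (the default), so no text
# row is used twice. The result keeps both the 'text' and 'label' columns.
balanced = df.groupby('label').sample(n=n, random_state=0)

print(balanced['label'].value_counts())
```

Because the size is taken from the smallest class, the same lines work unchanged if 'pos' happens to be the larger class in another dataset.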
