2016-01-30 198 views
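Creating a unique index from a data column

I have a pandas DataFrame with unique records, but I need to create a unique key based on one of its columns. Below is some sample data. I tried to create a second column by iterating over the data and incrementing a count by 1. My plan is to join the two columns to create a unique key.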

Questions: Is there a better way to do this? What is flawed in my approach?

import pandas as pd
import numpy as np

# Sample data: subid values stored as strings
d = {'subid': {0: '327598650129611740', 1: '327598650129611740', 2: '327559921352747760', 3: '327676431535405027', 4: '327676431535405027', 5: '327676431535405027', 6: '327662567602840733', 7: '327778468325442201', 8: '327777161261272775', 9: '327777161261272775'}}

df = pd.DataFrame(d)
old_index = 0
child_no = 1
# Attempt: walk the rows and increment a counter while the subid repeats
for subid, row in df.iterrows():
    if subid == old_index:
        df['child_no'] = child_no + 1
        old_index = subid
        child_no = child_no + 1
    else:
        child_no = 1
        df['child_no'] = child_no
        old_index = subid

df


                subid  child_no
0  327598650129611740         1
1  327598650129611740         1
2  327559921352747760         1
3  327676431535405027         1
4  327676431535405027         1
5  327676431535405027         1
6  327662567602840733         1
7  327778468325442201         1
8  327777161261272775         1
9  327777161261272775         1

Desired result:

                subid  child_no
0  327598650129611740         1
1  327598650129611740         2
2  327559921352747760         1
3  327676431535405027         1
4  327676431535405027         2
5  327676431535405027         3
6  327662567602840733         1
7  327778468325442201         1
8  327777161261272775         1
9  327777161261272775         2

Any help would be greatly appreciated.

d = {'subid': {0: '327598650129611740', '327598650129611740', '327559921352747760', '327676431535405027', '327676431535405027', '327676431535405027', '327662567602840733', '327778468325442201', '327777161261272775', '327777161261272775'}} This is not right. Can you provide the correct dictionary? –

Answer

You can groupby on 'subid', then call cumcount and add 1, since it starts counting from 0:

In [30]: 
df['child_no'] = df.groupby('subid').cumcount()+1 
df 
Out[30]: 
       subid child_no 
0 327598650129611740   1 
1 327598650129611740   2 
2 327559921352747760   1 
3 327676431535405027   1 
4 327676431535405027   2 
5 327676431535405027   3 
6 327662567602840733   1 
7 327778468325442201   1 
8 327777161261272775   1 
9 327777161261272775   2 
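
The question also mentions joining the two columns into a single unique key. A minimal, self-contained sketch of that step follows. The column name `key` and the `_` separator are assumptions made for illustration and are not part of the original question.

```python
import pandas as pd

# Same sample subids as in the question, kept as strings.
subids = [
    '327598650129611740', '327598650129611740', '327559921352747760',
    '327676431535405027', '327676431535405027', '327676431535405027',
    '327662567602840733', '327778468325442201', '327777161261272775',
    '327777161261272775',
]
df = pd.DataFrame({'subid': subids})

# Number the rows within each subid group; cumcount() is 0-based, so add 1.
df['child_no'] = df.groupby('subid').cumcount() + 1

# Hypothetical combined key "<subid>_<child_no>" (name and separator are assumptions).
df['key'] = df['subid'] + '_' + df['child_no'].astype(str)

print(df)
```

Because `child_no` is unique within each `subid`, the concatenated key should be unique across the whole frame, which `df['key'].is_unique` can confirm.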
This works perfectly, thank you! – Zymurgist66
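
On the second part of the question (what is flawed in the loop), two things seem to explain why every row ends up with 1. First, `df.iterrows()` yields `(index label, row)` pairs, so the loop variable named `subid` is actually the row index 0, 1, 2, and so on. Second, `df['child_no'] = value` with a scalar overwrites the entire column, so only the last assignment survives. The sketch below illustrates both points and shows one possible corrected loop. It uses short stand-in ids rather than the original data, and the names `counts` and `previous` are illustrative.

```python
import pandas as pd

# Short stand-in ids (assumption: used only to keep the example small).
df = pd.DataFrame({'subid': ['a', 'a', 'b', 'c', 'c', 'c']})

# iterrows() yields (index label, row): the first item is 0, 1, 2, ..., not the subid.
first_items = [label for label, _ in df.iterrows()]

# Assigning a scalar to a column sets every row to that value.
df['child_no'] = 1

# Corrected loop: compare real subid values and collect one counter per row.
counts = []
previous = None
child_no = 0
for subid in df['subid']:
    child_no = child_no + 1 if subid == previous else 1
    counts.append(child_no)
    previous = subid
df['child_no'] = counts

print(df)
```

This loop only counts consecutive repeats, so it relies on equal subids being adjacent. The `groupby('subid').cumcount()` approach does not have that restriction and avoids the explicit loop.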