Reset the index

I have deleted some rows from a pandas DataFrame, but the index of the new DataFrame is not renumbered, i.e.:

What I get:

What I want:
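The question's before/after tables did not survive, so here is a minimal reproduction of the problem. It assumes a hypothetical four-row frame: the id and marks values are borrowed from the answers below, and the row with id 189 is invented so there is something to delete. Dropping a row keeps the remaining rows' original labels, which leaves a gap in the index.

```python
import pandas as pd

# Hypothetical data: the row with id 189 is invented for illustration.
df = pd.DataFrame({"id": [124, 189, 257, 345], "marks": [67, 45, 10, 34]})

# Deleting the row labelled 1 keeps the other rows' original labels.
df = df.drop(index=1)

print(df)
print(df.index.tolist())  # [0, 2, 3], so the gap at label 1 remains
```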


2017-05-04 Bernheart

Use reset_index to get the default index, which starts at 0 and ends at len(df) - 1:

df = df.reset_index(drop=True)
print(df)
    id  marks
0  124     67
1  257     10
2  345     34

# if the index should start from 1
df.index = df.index + 1
print(df)
    id  marks
1  124     67
2  257     10
3  345     34
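To show what drop=True changes, here is a minimal sketch on a hypothetical frame whose index has a gap. By default, reset_index keeps the old labels as a new column named "index". With drop=True, it discards them.

```python
import pandas as pd

# Hypothetical frame with a gap in the index (label 1 was deleted).
df = pd.DataFrame({"id": [124, 257, 345], "marks": [67, 10, 34]}, index=[0, 2, 3])

# Default drop=False: the old labels become a new first column called "index".
kept = df.reset_index()
print(kept.columns.tolist())  # ['index', 'id', 'marks']

# drop=True: the old labels are thrown away and a fresh 0..n-1 index is used.
dropped = df.reset_index(drop=True)
print(dropped.columns.tolist())  # ['id', 'marks']
print(dropped.index.tolist())    # [0, 1, 2]
```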

Another solution is to assign the new index values directly:

df.index = range(1, len(df.index) + 1)
print(df)
    id  marks
1  124     67
2  257     10
3  345     34
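Two details of direct assignment can be checked with a minimal sketch on a hypothetical frame. First, pandas converts a plain Python range into a RangeIndex. Second, the number of new labels must equal the number of rows. If it does not, pandas raises ValueError and leaves the existing index in place.

```python
import pandas as pd

# Hypothetical frame whose index has a gap after a deleted row.
df = pd.DataFrame({"id": [124, 257, 345], "marks": [67, 10, 34]}, index=[0, 2, 3])

# Assigning a Python range: pandas stores it as a RangeIndex.
df.index = range(1, len(df.index) + 1)
print(type(df.index).__name__)  # RangeIndex
print(df.index.tolist())        # [1, 2, 3]

# A length mismatch raises ValueError and the current index is kept.
error = None
try:
    df.index = range(1, len(df.index))  # deliberately one label short
except ValueError as exc:
    error = exc
    print("ValueError:", exc)
```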

The fastest option is to assign a RangeIndex:

df.index = pd.RangeIndex(1, len(df.index) + 1)
print(df)
    id  marks
1  124     67
2  257     10
3  345     34
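One plausible reason RangeIndex is cheap is that it stores only start, stop and step and never materializes an array of labels. That is an interpretation, not something measured in the original answer. The sketch below compares it with an index built from np.arange. The size of 30,000 matches the larger benchmark below. The exact byte counts depend on the platform and pandas version, so only their relative order is meaningful.

```python
import numpy as np
import pandas as pd

n = 30_000
range_idx = pd.RangeIndex(1, n + 1)         # lazy: start/stop/step only
array_idx = pd.Index(np.arange(1, n + 1))   # holds n integers in memory

# Same labels...
print(range_idx.equals(array_idx))  # True
# ...but very different memory footprints.
print(range_idx.nbytes, array_idx.nbytes)
```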

The timings are quite interesting:

In [19]: %timeit df.reset_index(drop=True) 
The slowest run took 7.41 times longer than the fastest. This could mean that an intermediate result is being cached. 
10000 loops, best of 3: 83.3 µs per loop 

In [20]: %timeit df.set_index(np.arange(1, len(df)+1)) 
The slowest run took 7.06 times longer than the fastest. This could mean that an intermediate result is being cached. 
10000 loops, best of 3: 114 µs per loop 

In [21]: %timeit df.index = range(1, len(df.index) + 1) 
The slowest run took 13.12 times longer than the fastest. This could mean that an intermediate result is being cached. 
100000 loops, best of 3: 14.5 µs per loop 

In [22]: %timeit df.index = np.arange(1, len(df.index) + 1) 
The slowest run took 11.54 times longer than the fastest. This could mean that an intermediate result is being cached. 
10000 loops, best of 3: 26.9 µs per loop 

In [23]: %timeit df.index = pd.RangeIndex(1, len(df.index) + 1) 
The slowest run took 14.43 times longer than the fastest. This could mean that an intermediate result is being cached. 
100000 loops, best of 3: 8.07 µs per loop

Timings on a larger DataFrame (the same rows repeated 10000 times):

df = pd.concat([df]*10000)

In [26]: %timeit df.reset_index(drop=True) 
The slowest run took 4.71 times longer than the fastest. This could mean that an intermediate result is being cached. 
10000 loops, best of 3: 109 µs per loop 

In [27]: %timeit df.set_index(np.arange(1, len(df)+1)) 
The slowest run took 4.71 times longer than the fastest. This could mean that an intermediate result is being cached. 
1000 loops, best of 3: 238 µs per loop 

In [28]: %timeit df.index = range(1, len(df.index) + 1) 
The slowest run took 13.19 times longer than the fastest. This could mean that an intermediate result is being cached. 
100000 loops, best of 3: 14.8 µs per loop 

In [29]: %timeit df.index = np.arange(1, len(df.index) + 1) 
The slowest run took 11.29 times longer than the fastest. This could mean that an intermediate result is being cached. 
10000 loops, best of 3: 62.8 µs per loop 

In [30]: %timeit df.index = pd.RangeIndex(1, len(df.index) + 1) 
The slowest run took 14.33 times longer than the fastest. This could mean that an intermediate result is being cached. 
100000 loops, best of 3: 8.24 µs per loop
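To rerun the comparison outside IPython, here is a rough sketch that uses the standard timeit module instead of %timeit. It assumes a hypothetical 3-row frame repeated 10000 times, mirroring the larger benchmark above. Absolute numbers will differ by machine and pandas version, so only the ranking is worth comparing.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 3 rows repeated 10000 times (30000 rows).
base = pd.DataFrame({"id": [124, 257, 345], "marks": [67, 10, 34]})
df = pd.concat([base] * 10000)

candidates = {
    "reset_index(drop=True)": lambda: df.reset_index(drop=True),
    "set_index(np.arange)": lambda: df.set_index(np.arange(1, len(df) + 1)),
    "index = range": lambda: setattr(df, "index", range(1, len(df.index) + 1)),
    "index = np.arange": lambda: setattr(df, "index", np.arange(1, len(df.index) + 1)),
    "index = RangeIndex": lambda: setattr(df, "index", pd.RangeIndex(1, len(df.index) + 1)),
}

results = {}
for name, stmt in candidates.items():
    # Best of 3 repeats of 200 calls each, converted to microseconds per call.
    best = min(timeit.repeat(stmt, number=200, repeat=3))
    results[name] = best / 200 * 1e6
    print(f"{name:25s} {results[name]:8.1f} µs per call")
```

setattr is used only so that each assignment fits in a lambda. It calls the same index setter as df.index = ....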


2017-05-04 10:24:31 jezrael

import numpy as np

df = df.set_index(np.arange(1, len(df)+1))
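set_index accepts an array with the same length as the frame and returns a new DataFrame. That is why the answer assigns the result back to df. The minimal sketch below uses a hypothetical gapped frame and shows that the original is untouched until the result is reassigned.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a gap in the index (label 1 was deleted).
df = pd.DataFrame({"id": [124, 257, 345], "marks": [67, 10, 34]}, index=[0, 2, 3])

# set_index with an array returns a new frame; the columns are left as they were.
renumbered = df.set_index(np.arange(1, len(df) + 1))
print(renumbered.index.tolist())  # [1, 2, 3]
print(df.index.tolist())          # [0, 2, 3], unchanged until reassigned
```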


2017-05-04 10:25:33 MaxU
