Refreshing the index: I dropped some rows from a pandas DataFrame, but the index of the new DataFrame is not renumbered. Starting from:
id marks
1 123 45
2 124 67
3 127 89
4 257 10
5 345 34
I got:
id marks
2 124 67
4 257 10
5 345 34
whereas I want:
id marks
1 124 67
2 257 10
3 345 34
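A minimal reproduction, assuming the sample data above and assuming the rows labelled 1 and 3 were removed with drop (the question does not say how they were deleted). drop keeps the labels of the surviving rows, which is why the gaps appear:

```python
import pandas as pd

# Sample data from the question, with an index starting at 1
df = pd.DataFrame(
    {"id": [123, 124, 127, 257, 345], "marks": [45, 67, 89, 10, 34]},
    index=[1, 2, 3, 4, 5],
)

# Assumption: the rows were removed by label; drop() keeps the surviving labels
df = df.drop([1, 3])
print(df.index.tolist())  # [2, 4, 5]
```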
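Use reset_index to get the default index, which starts at 0 and ends at the length of the index minus 1; drop=True discards the old index instead of inserting it as a column: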
df = df.reset_index(drop=True)
print (df)
id marks
0 124 67
1 257 10
2 345 34
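A small sketch of what drop=True changes, built on the three remaining rows from the question (the DataFrame literal is just sample data):

```python
import pandas as pd

df = pd.DataFrame({"id": [124, 257, 345], "marks": [67, 10, 34]}, index=[2, 4, 5])

# Without drop=True, the old labels are kept as a new column named "index"
kept = df.reset_index()
print(kept.columns.tolist())  # ['index', 'id', 'marks']

# With drop=True, the old labels are discarded and a 0-based index is created
clean = df.reset_index(drop=True)
print(clean.index.tolist())  # [0, 1, 2]
```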
# if the index should start at 1 instead of 0, shift it
df.index = df.index + 1
print (df)
id marks
1 124 67
2 257 10
3 345 34
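The two steps can be wrapped in a small function. renumber and its start parameter are hypothetical names for illustration, not part of pandas; the sketch works on the copy returned by reset_index, so the input DataFrame is left alone:

```python
import pandas as pd

def renumber(df: pd.DataFrame, start: int = 1) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df whose index runs start, start+1, ..."""
    out = df.reset_index(drop=True)  # new DataFrame with index 0..n-1, old labels discarded
    out.index = out.index + start    # shift to start..start+n-1
    return out

df = pd.DataFrame({"id": [124, 257, 345], "marks": [67, 10, 34]}, index=[2, 4, 5])
print(renumber(df).index.tolist())  # [1, 2, 3]
print(df.index.tolist())            # [2, 4, 5], the input is not modified
```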
Another solution is to assign the new index values directly:
df.index = range(1, len(df.index) + 1)
print (df)
id marks
1 124 67
2 257 10
3 345 34
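One thing to watch with direct assignment: pandas checks that the number of new labels matches the number of rows. A short sketch with the same sample data, where the second assignment deliberately supplies one label too many:

```python
import pandas as pd

df = pd.DataFrame({"id": [124, 257, 345], "marks": [67, 10, 34]}, index=[2, 4, 5])

# The new labels must have exactly len(df) elements
df.index = range(1, len(df.index) + 1)
print(df.index.tolist())  # [1, 2, 3]

mismatch_error = None
try:
    df.index = range(1, len(df.index) + 2)  # one label too many
except ValueError as exc:
    mismatch_error = exc  # the index is left unchanged when this is raised
    print("ValueError:", exc)
```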
The fastest option is to assign a RangeIndex:
df.index = pd.RangeIndex(1, len(df.index) + 1)
print (df)
id marks
1 124 67
2 257 10
3 345 34
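A plausible reason RangeIndex is cheap (an explanation, not something the timings measure directly) is that it stores only start, stop and step instead of one label per row. A sketch with the sample data:

```python
import pandas as pd

df = pd.DataFrame({"id": [124, 257, 345], "marks": [67, 10, 34]}, index=[2, 4, 5])
df.index = pd.RangeIndex(1, len(df.index) + 1)

# A RangeIndex keeps only start, stop and step rather than an array of labels
print(df.index.start, df.index.stop, df.index.step)  # 1 4 1

# Label-based lookup works as usual: label 2 is now the second row
print(df.loc[2].tolist())  # [257, 10]
```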
The timings are interesting, first for the 3-row DataFrame and then for the same data repeated 10000 times:
In [19]: %timeit df.reset_index(drop=True)
The slowest run took 7.41 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 83.3 µs per loop
In [20]: %timeit df.set_index(np.arange(1, len(df)+1))
The slowest run took 7.06 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 114 µs per loop
In [21]: %timeit df.index = range(1, len(df.index) + 1)
The slowest run took 13.12 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 14.5 µs per loop
In [22]: %timeit df.index = np.arange(1, len(df.index) + 1)
The slowest run took 11.54 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 26.9 µs per loop
In [23]: %timeit df.index = pd.RangeIndex(1, len(df.index) + 1)
The slowest run took 14.43 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.07 µs per loop
df = pd.concat([df]*10000)
In [26]: %timeit df.reset_index(drop=True)
The slowest run took 4.71 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 109 µs per loop
In [27]: %timeit df.set_index(np.arange(1, len(df)+1))
The slowest run took 4.71 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 238 µs per loop
In [28]: %timeit df.index = range(1, len(df.index) + 1)
The slowest run took 13.19 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 14.8 µs per loop
In [29]: %timeit df.index = np.arange(1, len(df.index) + 1)
The slowest run took 11.29 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 62.8 µs per loop
In [30]: %timeit df.index = pd.RangeIndex(1, len(df.index) + 1)
The slowest run took 14.33 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.24 µs per loop
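The numbers above come from IPython's %timeit and will vary with the machine and pandas version. A rough, self-contained way to rerun the comparison with the standard-library timeit module, assuming the same 3-row data repeated 10000 times (the call and repeat counts are arbitrary choices):

```python
import timeit

import numpy as np
import pandas as pd

# Same shape as the second benchmark: 3 rows repeated 10000 times
base = pd.DataFrame({"id": [124, 257, 345], "marks": [67, 10, 34]})
df = pd.concat([base] * 10000)

# setattr(df, "index", ...) is equivalent to the statement df.index = ...
candidates = {
    "reset_index(drop=True)": lambda: df.reset_index(drop=True),
    "set_index(np.arange)": lambda: df.set_index(np.arange(1, len(df) + 1)),
    "index = range": lambda: setattr(df, "index", range(1, len(df.index) + 1)),
    "index = np.arange": lambda: setattr(df, "index", np.arange(1, len(df.index) + 1)),
    "index = RangeIndex": lambda: setattr(df, "index", pd.RangeIndex(1, len(df.index) + 1)),
}

calls = 200
results = {}
for name, stmt in candidates.items():
    # Best of 3 repeats, converted to microseconds per call
    best = min(timeit.repeat(stmt, number=calls, repeat=3)) / calls
    results[name] = best * 1e6
    print(f"{name:<24} {results[name]:8.1f} us per call")
```

Expect the absolute values to differ from the ones above; the relative order is what is worth checking.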
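For reference, the set_index variant from the timings can be assigned back like this (set_index returns a new DataFrame and leaves the original unchanged):
df = df.set_index(np.arange(1, len(df)+1))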