Identify multiple columns by name in Pandas

Is there a way to select a subset of columns using text matching or a regular expression?

In R it would look something like this:

attach(iris) #Load the 'Stairway to Heaven' of R's built-in data sets 
iris[grep(names(iris),pattern="Length")] #Prints only columns containing the word "Length"


2014-04-11 duber

You can use the `filter` method (with `axis=1` to filter on column names). It offers several options:

Equivalent to `if 'Length' in col`:
```
df.filter(like='Length', axis=1) 
```
Using a regular expression (note that it uses `re.search` rather than `re.match`, so you may need to adjust the regex accordingly):
```
df.filter(regex=r'\.Length$', axis=1) 
```
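
To make the two calls above concrete, here is a minimal sketch on a small, made-up frame. The column names mimic R's dotted iris header and are an assumption; a real DataFrame may name them differently.

```python
import pandas as pd

# Hypothetical stand-in for the iris data, using R-style dotted column names.
df = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9],
    "Sepal.Width": [3.5, 3.0],
    "Petal.Length": [1.4, 1.4],
    "Petal.Width": [0.2, 0.2],
    "Species": ["setosa", "setosa"],
})

# Substring match on the column names.
like_cols = list(df.filter(like="Length", axis=1).columns)

# Regex match with re.search semantics: names ending in ".Length".
regex_cols = list(df.filter(regex=r"\.Length$", axis=1).columns)

print(like_cols)   # ['Sepal.Length', 'Petal.Length']
print(regex_cols)  # ['Sepal.Length', 'Petal.Length']
```

On this frame both calls pick the same columns. They would differ if a name contained "Length" somewhere other than at the end.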


2014-04-11 14:45:47 joris

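Great info, @joris. But I also need to select columns whose names contain some other characters in addition to the base name. For example, my column names are "Length_1", "Length_2", "Width_1", "Width_2" and so on. My filter call looks like `df.filter(like=col + '_', axis=1)`, where `col` takes values such as "Length", "Width", etc., and it does not pick up the values. Any idea what I should correct? – JKC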

You should be able to do this with a regular expression, e.g. `regex=r"Length|Width"` – joris
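
A minimal sketch of that suggestion, assuming a frame with the column names mentioned in the comment. The `Id` column and the `prefixes` list are made up for illustration. The pattern is built from the list so the same call covers every prefix at once.

```python
import re

import pandas as pd

# Hypothetical frame with the column names from the comment, plus an "Id" column.
df = pd.DataFrame({
    "Id": [1, 2],
    "Length_1": [5.1, 4.9],
    "Length_2": [1.4, 1.4],
    "Width_1": [3.5, 3.0],
    "Width_2": [0.2, 0.2],
})

# Prefixes to keep (assumed for this example).
prefixes = ["Length", "Width"]

# Build "^(?:Length|Width)_"; re.escape guards against regex metacharacters.
pattern = "^(?:" + "|".join(re.escape(p) for p in prefixes) + ")_"

selected = df.filter(regex=pattern, axis=1)
print(list(selected.columns))
```

Anchoring with `^` and requiring the trailing underscore keeps the match from catching unrelated names that merely contain one of the prefixes.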

Using Python's `in` operator, it works like this:

# Assuming iris is already loaded as a DataFrame called 'iris' and has a proper header
iris = iris[[col for col in iris.columns if 'Length' in col]]
print(iris.head())

Or, using a regular expression:

import re
# re.search scans the whole name; re.match would only try the start and find nothing here
iris = iris[[col for col in iris.columns if re.search(r'\.Length$', col)]]
print(iris.head())

The first will run faster, but the second will be more precise.
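
One detail worth keeping in mind for the regex variant: `re.match` only tries to match at the beginning of the string, while `re.search` scans the whole name. The sketch below uses a made-up list of column names to show the difference for a pattern anchored at the end.

```python
import re

# Hypothetical column names in R's dotted style.
columns = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Species"]
pattern = r"\.Length$"

# re.match requires the pattern to start at position 0, so nothing matches.
with_match = [c for c in columns if re.match(pattern, c)]

# re.search looks anywhere in the string, so the suffix is found.
with_search = [c for c in columns if re.search(pattern, c)]

print(with_match)   # []
print(with_search)  # ['Sepal.Length', 'Petal.Length']
```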


2014-04-11 14:22:06 duber
