Specifying a boolean filter expression to a Python script

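I have a CSV (comma-separated values) file containing student information. The column headers look like StudentId, StudentFirstName, StudentLastName, StudentZipCode, StudentHeight, StudentCommuteMethod, etc., and the subsequent rows contain information about individual students. Now I want to write a Python 2.5 script that takes a filter condition as a command-line argument and returns the set of students (rows) matching that condition. For example, the filter condition could look like this (in pseudocode form):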

"StudentCommuteMethod = Bus AND StudentZipCode = 12345"

and the Python script could be invoked as:

MyPythonScript.py -filter "<above string>" -i input.csv

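This should return the list of all students (rows) who live in the area with zip code 12345 and commute by bus. The filter can also be arbitrarily complex and include any number of AND and OR operators.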

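Questions:

1. What is the best format in which the program can let the user specify the filter condition (as a command-line argument)? The format should be simple for simple expressions, yet powerful enough to express all kinds of conditions.
   - The formats I have thought of are (1) SQL and (2) the Python language itself. In either case, I don't know how to get Python to apply these filters at run time. That is, how do I take an expression entered on the command line and apply it to a row to get true or false?
2. I would like to have a UI for expressing filter conditions visually. Perhaps something that lets you enter one simple two-operand condition per line, plus some way of combining them with AND and OR. It should be able to emit a filter expression in the format decided in (1) above. Are there any open-source projects I could reuse?
3. If you think there is a better way to solve this problem than a command-line expression plus a UI, please feel free to mention it. Ultimately, the user (an electrical engineer who doesn't know much about programming) should be able to enter filter expressions easily.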

Thanks!

Note: I have no control over the input or output formats (including the CSV file).

Source

2011-09-09 LeoNeo

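You seem to be recreating a database in Python. Why not use a database? – jozzas

Install OpenOffice, import the file, and use the AutoFilter feature. – 2011-09-09 05:06:03

A very general question; a simple web search gives answers such as: 1. How to parse command-line arguments: http://docs.python.org/py3k/library/argparse.html 2. Several UI libraries have Python bindings available: http://wiki.python.org/moin/GuiProgramming 3. "An electrical engineer who doesn't know much about programming": what? Does programming have anything to do with using this? – steabert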

You are definitely trying to reimplement SQL in Python. I believe you would be better off using a relational database and running SQL queries.
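To make the database suggestion concrete, here is a minimal sketch (in Python 3, not the Python 2.5 of the question) that loads the CSV into an in-memory SQLite table and lets the user supply a SQL WHERE clause as the filter. The function name `filter_csv_with_sql`, the table name `students`, and the sample data are assumptions made for illustration. All values are stored as text, so literals in the filter need quotes, and the clause is spliced into the query verbatim, which is only reasonable for a trusted local user.

```python
import csv
import io
import sqlite3

def filter_csv_with_sql(csv_text, where_clause):
    """Load CSV text into an in-memory SQLite table and apply a WHERE clause."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader)  # first line holds the column names
    conn = sqlite3.connect(":memory:")
    # Quote column names so that any header text is accepted as an identifier.
    columns = ", ".join('"%s"' % h.replace('"', '""') for h in headers)
    conn.execute("CREATE TABLE students (%s)" % columns)
    placeholders = ", ".join("?" for _ in headers)
    conn.executemany("INSERT INTO students VALUES (%s)" % placeholders, reader)
    # The WHERE clause is inserted verbatim (assumes a trusted user).
    rows = conn.execute("SELECT * FROM students WHERE " + where_clause).fetchall()
    conn.close()
    return headers, rows

# Hypothetical sample data mirroring the columns in the question.
SAMPLE = """StudentId,StudentFirstName,StudentZipCode,StudentCommuteMethod
1,John,12345,Bus
2,Bob,12345,Bus
3,Jane,12345,Train
4,Dave,45467,Bus
"""

headers, rows = filter_csv_with_sql(
    SAMPLE, "StudentCommuteMethod = 'Bus' AND StudentZipCode = '12345'")
for row in rows:
    print(dict(zip(headers, row)))
```

With this approach the filter format is plain SQL, so AND, OR, and parentheses come for free.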

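However, regarding question 1, you can easily let the user enter a Python expression and eval() it on each row of data.

Here is a working example that uses exec to bind the column values to local variables (a nasty hack, I admit). CSV parsing is omitted for brevity.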

import optparse, sys 

# Assume your CSV data is read into a list of dictionaries 
sheet = [ 
    {'StudentId': 1, 'StudentFirstName': 'John', 'StudentLastName': 'Doe', 'StudentZipCode': '12345', 'StudentCommuteMethod': 'Bus'}, 
    {'StudentId': 2, 'StudentFirstName': 'Bob', 'StudentLastName': 'Chen', 'StudentZipCode': '12345', 'StudentCommuteMethod': 'Bus'}, 
    {'StudentId': 3, 'StudentFirstName': 'Jane', 'StudentLastName': 'Smith', 'StudentZipCode': '12345', 'StudentCommuteMethod': 'Train'}, 
    {'StudentId': 4, 'StudentFirstName': 'Dave', 'StudentLastName': 'Burns', 'StudentZipCode': '45467', 'StudentCommuteMethod': 'Bus'}, 
] 

# Options parsing 
parser = optparse.OptionParser() 
parser.add_option('--filter', type='string', dest='filter') 
options, args = parser.parse_args() 

# Filter option is required 
if options.filter is None: 
    print >> sys.stderr, 'error: no filter expression given' 
    sys.exit(1) 

# Process rows and build result set 
result = [] 
for row in sheet: 
    # Bind each column to a local variable (StudentId, StudentFirstName, etc.);
    # this allows evaluating Python expressions on a row, for example:
    # 'StudentCommuteMethod == "Bus" and StudentZipCode == "12345"'
    for col, val in row.iteritems():
        exec '%s = %s' % (col, repr(val))

    # Apply filter to the row
    if eval(options.filter):
        result.append(row)

# Print out result set 
for row in result: 
    print row

I tested it with the following filter expressions:

./MyPythonScript.py --filter 'StudentCommuteMethod == "Bus" and StudentZipCode == "12345"' 
./MyPythonScript.py --filter 'StudentCommuteMethod == "Bus" or StudentZipCode == "12345"'

(Beware of the shell's quoting rules when running the program from the command line.)
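If the user should be able to type the pseudocode format from the question rather than Python syntax, the filter can also be evaluated without eval. The following is a minimal Python 3 sketch under stated assumptions: only `Column = Value` conditions joined by ` AND ` and ` OR ` are supported, AND binds tighter than OR, there are no parentheses or quoted values, and all row values are strings. The name `parse_filter` and the sample rows are hypothetical.

```python
def parse_filter(text):
    """Turn 'Col = Value AND Col = Value OR ...' into a predicate on a row dict.

    AND binds tighter than OR; parentheses and quoted values are not supported.
    """
    alternatives = []
    for or_part in text.split(" OR "):
        conditions = []
        for and_part in or_part.split(" AND "):
            column, _, value = and_part.partition("=")
            conditions.append((column.strip(), value.strip()))
        alternatives.append(conditions)

    def matches(row):
        # True if all conditions of at least one OR-alternative hold.
        return any(all(row.get(col) == val for col, val in conds)
                   for conds in alternatives)
    return matches

# Hypothetical rows, shaped like what csv.DictReader would produce.
rows = [
    {"StudentId": "1", "StudentZipCode": "12345", "StudentCommuteMethod": "Bus"},
    {"StudentId": "2", "StudentZipCode": "12345", "StudentCommuteMethod": "Bus"},
    {"StudentId": "3", "StudentZipCode": "12345", "StudentCommuteMethod": "Train"},
    {"StudentId": "4", "StudentZipCode": "45467", "StudentCommuteMethod": "Bus"},
]

matches = parse_filter("StudentCommuteMethod = Bus AND StudentZipCode = 12345")
selected = [row["StudentId"] for row in rows if matches(row)]
print(selected)
```

This trades expressive power for a syntax that needs no quoting inside the filter; anything beyond equality tests would need a real parser.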

Source

2011-09-09 07:19:57

Thanks for your help! – LeoNeo

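This is a small variation on Danilo's suggestion. You can avoid the exec that binds variables for each row by passing the row dictionary to eval as the locals, and the dictionaries returned by csv.DictReader work nicely for this: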

import csv, optparse 
infile = open('datafile.csv') 
reader = csv.DictReader(infile) 

parser = optparse.OptionParser() 
parser.add_option('--filter', type='string', dest='filter') 
options, args = parser.parse_args() 

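for row in reader:
    # Pass the row as locals; passing it as globals would make eval
    # insert a '__builtins__' entry into the printed row.
    if eval(options.filter, {}, row):
        print row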

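This assumes that the first line of the input file contains the column headers, and any header you want to use in an expression must be a valid Python identifier.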
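For readers on Python 3, the same idea can be sketched as follows. The function name `filter_rows` and the sample data are assumptions for illustration. The sketch also reports headers that are not valid identifiers, since those cannot be referenced in a filter expression, and it compiles the expression once instead of re-parsing it for every row. As with the original, eval runs arbitrary code, so this assumes a trusted user.

```python
import csv
import io

def filter_rows(csv_file, expression):
    """Yield rows of csv_file for which the Python expression is true."""
    reader = csv.DictReader(csv_file)
    # Headers that are not valid identifiers cannot be referenced by name.
    unusable = [h for h in (reader.fieldnames or []) if not h.isidentifier()]
    if unusable:
        print("warning: not usable in filters:", unusable)
    code = compile(expression, "<filter>", "eval")  # compile once, reuse per row
    for row in reader:
        # Fresh empty globals, row as locals: column names resolve to this row.
        if eval(code, {}, row):
            yield row

# Hypothetical sample data mirroring the columns in the question.
SAMPLE = """StudentId,StudentFirstName,StudentZipCode,StudentCommuteMethod
1,John,12345,Bus
2,Bob,12345,Bus
3,Jane,12345,Train
4,Dave,45467,Bus
"""

expr = 'StudentCommuteMethod == "Bus" and StudentZipCode == "12345"'
for row in filter_rows(io.StringIO(SAMPLE), expr):
    print(dict(row))
```

Note that csv.DictReader returns every value as a string, so numeric comparisons need an explicit conversion in the expression, for example `int(StudentId) > 2`.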

Source

2011-09-09 07:35:57 babbageclunk

Thanks for the suggestion about 'eval'; I hadn't tried passing the appropriate dictionary as locals. –
