Confusion about string find?

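I have a list of data that I want to search. The data is structured like this:

Name, Address, DOB, Family Members, Age, Height, etc.

I want to search through the lines of data so that the search stops at the ',' that appears after the name, to optimize the search. I believe I want to use this command: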

str.find(sub[, start[, end]])

I am having trouble writing the code, though. Any tips on how to make string find work for me?

Here is some sample data:

Bennet, John, 17054099","5","156323558","-","0", 714 // 
Menendez, Juan,7730126","5","158662525" 11844 // 
Brown, Jamal,"9","22966592","+","0",,"4432 //

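The idea is that I want my program to search only up to the first ',' and not search through the rest of each long line.

EDIT: So here is my code.

I want to search the lines in completedataset only up to the first comma. I am still confused about how to work these suggestions into my existing code.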

counter = 1
for line in completedataset:
    print(counter)
    counter += 1
    for t in matchedLines:
        if t in line:
            smallerdataset.write(line)

Source

2010-08-04 Robert A. Fettikowski

Can you give an example of your data list? – 2010-08-04 15:09:17

What are those '//'? New lines? – kennytm 2010-08-04 15:16:41

If I understand your specs correctly,

# name is the string being looked for
for thestring in listdata:
    firstcomma = thestring.find(',')
    # search only the part of the line before the first comma
    havename = thestring.find(name, 0, firstcomma)
    if havename >= 0:
        print("found name:", thestring[:firstcomma])

Edit: given the OP's edit of the Q, this would become something like:

counter = 1
for line in completedataset:
    print(counter)
    counter += 1
    firstcomma = line.find(',')
    for t in matchedLines:
        havename = line.find(t, 0, firstcomma)
        if havename >= 0:
            smallerdataset.write(line)

Of course, using counter this way is unPythonically low-level, and a better equivalent is

for counter, line in enumerate(completedataset):
    print(counter + 1)
    firstcomma = line.find(',')
    for t in matchedLines:
        havename = line.find(t, 0, firstcomma)
        if havename >= 0:
            smallerdataset.write(line)

but this doesn't affect the question.
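For readers who want something runnable, here is a minimal sketch of the same idea. The sample lines come from the question; the function name `lines_matching` and the list of names are made up for illustration, and treating a line without a comma as "search the whole line" is an assumption about the desired behavior.

```python
def lines_matching(lines, names):
    """Return the lines whose text before the first comma contains any name."""
    matched = []
    for line in lines:
        firstcomma = line.find(',')
        if firstcomma == -1:
            # Assumption: with no comma, the whole line is the name field.
            firstcomma = len(line)
        for name in names:
            # The third argument stops the search before the first comma.
            if line.find(name, 0, firstcomma) >= 0:
                matched.append(line)
                break  # keep each line only once
    return matched


sample = [
    'Bennet, John, 17054099","5","156323558","-","0", 714 //',
    'Menendez, Juan,7730126","5","158662525" 11844 //',
    'Brown, Jamal,"9","22966592","+","0",,"4432 //',
]
# 'Jamal' only occurs after the first comma, so that line is not matched.
result = lines_matching(sample, ['Bennet', 'Jamal'])
print(result)
```

Unlike the loop above, this sketch stops at the first matching name, so a line is collected at most once.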

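Source

2010-08-04 15:21:13

So, how can I integrate this code into the code I already have... see my edited post. Thanks. – 2010-08-04 15:53:31

You will probably be searching every line, so you can split each one on ', ' and then search only the first element: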

for line in file:
    name = line.split(', ')[0]
    # find returns -1 when the substring is absent, so compare explicitly
    if name.find('smth') != -1:
        break
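One detail worth keeping in mind when testing the result of `find`: it returns an index rather than a boolean, so a match at position 0 is falsy and a miss (-1) is truthy. A short illustration, using a made-up name:

```python
name = 'Bennet'

print(name.find('Ben'))   # 0: found at the start, but 0 is falsy
print(name.find('xyz'))   # -1: not found, but -1 is truthy

# Two reliable ways to ask "does name contain 'Ben'?"
found_with_find = name.find('Ben') != -1
found_with_in = 'Ben' in name
print(found_with_find, found_with_in)
```

When the position itself is not needed, the `in` operator is usually the clearer choice.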

Source

2010-08-04 15:18:46

Why smth? I have multiple lines and multiple names that I want to search for. – 2010-08-04 15:39:25

You can do this quite directly:

s = 'Bennet, John, 17054099","5","156323558","-","0", 714 //'
print(s.find('John', 0, s.index(',')))  # find the index of ',' and stop there
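A small sketch of what this returns, and of one thing to watch for: `index` raises `ValueError` when the line has no comma, whereas `find` returns -1. The helper name `find_in_first_field` is hypothetical, and falling back to the whole line is an assumption.

```python
s = 'Bennet, John, 17054099","5","156323558","-","0", 714 //'

print(s.find('Bennet', 0, s.index(',')))  # 0: 'Bennet' is before the comma
print(s.find('John', 0, s.index(',')))    # -1: 'John' is after the comma


def find_in_first_field(line, name):
    """Search for name only in the text before the first comma."""
    try:
        end = line.index(',')
    except ValueError:
        # Assumption: a line with no comma is searched in full.
        end = len(line)
    return line.find(name, 0, end)
```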

Source

2010-08-04 15:19:40

Is there any reason you have to use find? Why not just do this:

if line.split(",", 1)[0] == search_string:
    ...
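To illustrate what the second argument does: with a `maxsplit` of 1, `split` cuts the line at the first comma only and returns at most two pieces. `str.partition` is a closely related alternative. The sample line is taken from the question.

```python
line = 'Bennet, John, 17054099","5","156323558","-","0", 714 //'

# maxsplit=1: split once, at the first comma
parts = line.split(',', 1)
print(len(parts))  # 2
print(parts[0])    # Bennet

# partition also cuts at the first separator and always returns a 3-tuple
first_field, sep, rest = line.partition(',')
print(first_field == parts[0])  # True
```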

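Edit: Thought I'd point this out: I just tested this, and the split method looks just as fast as find, if not faster. Use the timeit module to test the performance of both approaches and see what you get.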

Try:

python -m timeit -n 10000 -s "a='''Bennet, John, 17054099','5','156323558','-','0', 714'''" "a.find('Bennet', 0, a.find(','))"

Then compare with:

python -m timeit -n 10000 -s "a='''Bennet, John, 17054099','5','156323558','-','0', 714'''" "a.split(',',1)[0] == 'Bennet'"

Make the name longer (e.g. "BennetBennetBennetBennetBennetBennet") and you will see that find suffers more than split.

Note: I am using split with the maxsplit option.
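The same comparison can be run from a script with `timeit.timeit`. This is only a sketch: the function names are made up, wrapping each expression in a function adds call overhead that the command-line form does not have, and the numbers will vary by machine and Python version. Note also that the two tests are not strictly equivalent, since `find` checks that the name occurs in the first field while the `split` version checks that the field equals the name.

```python
import timeit

a = "Bennet, John, 17054099','5','156323558','-','0', 714"


def with_find():
    # containment test restricted to the text before the first comma
    return a.find('Bennet', 0, a.find(',')) >= 0


def with_split():
    # equality test on the first field, splitting only once
    return a.split(',', 1)[0] == 'Bennet'


find_time = timeit.timeit(with_find, number=10000)
split_time = timeit.timeit(with_split, number=10000)
print('find: ', find_time)
print('split:', split_time)
```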

Source

2010-08-04 15:22:01 domino

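The OP said he doesn't want to process the long lines; using split will examine the whole line and build an array of all the fields, when he explicitly wants to optimize by not processing the whole line. – 2010-08-04 15:24:09

Yes, that is exactly what I don't want to do. – 2010-08-04 15:29:15

If you are checking many names against each line, the biggest optimization may be to process the comma only once per line!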

for line in completedataset:
    i = line.index(',')
    first_field = line[:i]
    for name in matchedNames:
        if name in first_field:
            smalldataset.append(name)
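A possible variant, under an assumption the question does not confirm: if each name is expected to equal the whole first field rather than merely occur inside it, the names can be kept in a set so that each line needs one slice and one lookup. The helper `first_field` and the contents of `matched_names` are hypothetical, and this sketch collects the matching lines rather than the names.

```python
def first_field(line):
    """Return the text before the first comma (the whole line if none)."""
    i = line.find(',')
    return line if i == -1 else line[:i]


matched_names = {'Bennet', 'Brown'}  # hypothetical names to look for
completedataset = [
    'Bennet, John, 17054099","5","156323558","-","0", 714 //',
    'Menendez, Juan,7730126","5","158662525" 11844 //',
    'Brown, Jamal,"9","22966592","+","0",,"4432 //',
]

# One slice and one set lookup per line, regardless of how many names there are
smalldataset = [line for line in completedataset
                if first_field(line) in matched_names]
print(len(smalldataset))  # 2
```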

Source

2010-08-05 23:59:22 fholo
