2015-04-04 50 views
-2

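How to find a specific word in a list in Python

I have to check whether a word is in a list. If it is found in the list, the line should be written to a file with the label "1"; otherwise it should be written with the label "0". My Python code is below; it fails with the error: TypeError: can only concatenate list (not "str") to list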

f2 = open("C:/Python26/Semantics.txt",'w') 
sem = ["cells", "gene","factor","alpha", "receptor", "t","promotor"]; 
with open("C:/Python26/trigram.txt") as f: 
contents = f.readlines() 
for lines in contents: 
    tokens = lines.split('$') 
    for t in tokens: 
     if t.strip() in sem: 
      f2.write(tokens+"\t"+"1 \n"); 
     else: 
      f2.write(tokens+"\t"+"0 \n"); 
f2.close() 
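
For reference, the TypeError comes from `tokens+"\t"`: `tokens` is the list returned by `split('$')`, and `+` cannot concatenate a list with a string. Here is a minimal sketch of one way around it, joining the tokens into a string first and producing one labelled line per input line instead of one per token. The function name `label_line` is made up for illustration; the sample line is taken from the question.

```python
SEM = ["cells", "gene", "factor", "alpha", "receptor", "t", "promotor"]

def label_line(line, vocabulary=SEM):
    """Return 'tok1 tok2 ... flag', where flag is 1 if any token is in vocabulary."""
    tokens = [t.strip() for t in line.strip().split("$")]
    flag = 1 if any(t in vocabulary for t in tokens) else 0
    # " ".join(tokens) builds a string, so there is no list + str concatenation
    return "{0} {1}".format(" ".join(tokens), flag)

print(label_line("IL-2$gene$expression$and\n"))
```

Each returned string can then be passed to `f2.write(... + "\n")`.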

My file looks like this:

IL-2$gene$expression$and 
IL-2$gene$expression$and$NF-kappa 
IL-2$gene$expression$and$NF-kappa$B 
IL-2$gene$expression$and$NF-kappa$B$activation 
gene$expression$and$NF-kappa$B$activation$through 
expression$and$NF-kappa$B$activation$through$CD28 

My desired output:

IL-2 gene expression and 1 
IL-2 gene expression and NF-kappa 1 
IL-2 gene expression and NF-kappa B 1 
IL-2 gene expression and NF-kappa B activation 1 
gene expression and NF-kappa B activation through 1 
expression and NF-kappa B activation through CD28 0 

Alternatively, I would like to produce something like:

Token                                              cells  gene  factor  …  promoter
IL-2 gene expression and                           0      1     0       …  0
IL-2 gene expression and NF-kappa                  0      1     0       …  0
IL-2 gene expression and NF-kappa B                0      1     0       …  0
IL-2 gene expression and NF-kappa B activation     0      1     0       …  0
gene expression and NF-kappa B activation through  0      1     0       …  0
expression and NF-kappa B activation through CD28  0      0     0       …  0

I think this would only need a small change in the code.
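
A rough sketch of how the one-column-per-word layout could be produced, assuming tab-separated output is acceptable. The helper name `indicator_row` is hypothetical. Note that the list in the question spells the last word "promotor", so a token spelled "promoter" would not match it.

```python
SEM = ["cells", "gene", "factor", "alpha", "receptor", "t", "promotor"]

def indicator_row(line, vocabulary=SEM):
    """Return the phrase followed by one 0/1 flag per vocabulary word, tab-separated."""
    tokens = [t.strip() for t in line.strip().split("$")]
    present = set(tokens)
    flags = [1 if word in present else 0 for word in vocabulary]
    return "\t".join([" ".join(tokens)] + [str(f) for f in flags])

# Header row: "Token" plus one column per vocabulary word
print("\t".join(["Token"] + SEM))
print(indicator_row("IL-2$gene$expression$and"))
```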

+0

Why do you end the `sem` line with a semicolon? There is no need for semicolons in Python. – Hackaholic 2015-04-04 06:07:26

+0

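After pasting your code, select the whole block and then press Ctrl + K to indent **all** of it. Your program needs to run as displayed, and as it stands it has an indentation error. And why do some lines have a semicolon at the end and others not? – Anthon 2015-04-04 06:11:43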

Answers

1

Try it like this:

sem = ["cells", "gene","factor","alpha", "receptor", "t","promotor"]
with open("C:/Python26/trigram.txt") as f, open("C:/Python26/Semantics.txt",'w') as f2:
    for x in f:
        x = x.strip().split("$")
        print " ".join(x), len(set(sem) & set(x))
        f2.write("{} {}\n".format(" ".join(x), len(set(sem) & set(x))))

Or, to write to the file instead of printing to the console:

f2.write("{} {}\n".format(" ".join(x), len(set(sem) & set(x)))) 

Output:

IL-2 gene expression and 1 
IL-2 gene expression and NF-kappa 1 
IL-2 gene expression and NF-kappa B 1 
IL-2 gene expression and NF-kappa B activation 1 
gene expression and NF-kappa B activation through 1 
expression and NF-kappa B activation through CD28 0 

Explanation of " ".join(x), len(set(sem) & set(x))

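`" ".join(x)`: this joins the elements of the list into a single string, separated by spaces.

`len(set(sem) & set(x))`: `set` gives you a set rather than a list. `set(sem) & set(x)` is the same as the intersection (AND) operation on sets in mathematics and gives you only the elements that appear in both lists; I then take the length of that result.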
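
To make the set step concrete, here is a small illustration. The first line is from the question's file; the second (`y`) is a made-up line, included only to show that the length is a count of matching words and can exceed 1. If a strict 0/1 label is wanted, converting the intersection with `bool` is one option.

```python
sem = ["cells", "gene", "factor", "alpha", "receptor", "t", "promotor"]

x = "IL-2$gene$expression$and".split("$")
common = set(sem) & set(x)   # intersection: words present in both
print(common, len(common))

# Made-up line with three vocabulary words
y = "gene$alpha$factor".split("$")
count = len(set(sem) & set(y))       # number of distinct matching words
flag = int(bool(set(sem) & set(y)))  # 1 if there is at least one match, else 0
print(count, flag)
```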

+0

There was an error: `x = x.strip.split("$")` instead of `x = x.strip().split("$")`. Thanks for the help :) – 2015-04-04 06:14:14

+0

Can you explain the line `print " ".join(x), len(set(sem) & set(x))`? – 2015-04-04 06:26:58

+0

I get `ValueError: zero length field name in format` when writing to the file. – 2015-04-04 06:30:51
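
That ValueError is what Python 2.6 raises for empty `{}` fields: automatic field numbering in `str.format` was only added in Python 2.7 and 3.1. Assuming the interpreter here is 2.6 (the `C:/Python26` path suggests it, though the thread does not confirm it), numbering the fields explicitly is one way around it. Opening two files in a single `with` statement also requires 2.7 or later, so on 2.6 the two `with` statements would need to be nested.

```python
sem = ["cells", "gene", "factor", "alpha", "receptor", "t", "promotor"]

x = "IL-2$gene$expression$and\n".strip().split("$")
# Explicit indices {0} and {1} work on Python 2.6 as well as on later versions
line = "{0} {1}\n".format(" ".join(x), len(set(sem) & set(x)))
print(line)
```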
