2017-01-24 40 views
-1

AttributeError: 'list' object has no attribute 'lower' (gensim)

I have a text file containing a list of about 10K terms, like this:

G15 KDN C30A Action Standard Air Brush Air Dilution

I want to convert them to lowercase tokens for subsequent processing with gensim, using this code:

data = [line.strip() for line in open("C:\corpus\TermList.txt", 'r')] 
texts = [[word for word in data.lower().split()] for word in data] 

and I get the following traceback:

AttributeError                            Traceback (most recent call last) 
<ipython-input-84-33bbe380449e> in <module>() 
     1 data = [line.strip() for line in open("C:\corpus\TermList.txt", 'r')] 
----> 2 texts = [[word for word in data.lower().split()] for word in data] 
     3 
AttributeError: 'list' object has no attribute 'lower' 

Any suggestions about what I am doing wrong and how to correct it would be greatly appreciated! Thank you!!

Answers

4

Try:

data = [line.strip() for line in open(r"C:\corpus\TermList.txt", 'r')] 
texts = [[word.lower() for word in text.split()] for text in data] 

You were trying to apply .lower() to data, which is a list.
.lower() can only be applied to strings.
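
To see the difference in isolation, here is a minimal sketch that uses a small in-memory list in place of the file. The sample lines are made up for illustration and are not the contents of the asker's TermList.txt.

```python
# Hypothetical sample lines standing in for the contents of the term list file.
data = ["G15 KDN C30A", "Action Standard", "Air Brush"]

# A list has no .lower() method, so calling it on data raises AttributeError.
try:
    data.lower()
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# Each element of data is a str, so .lower() can be called per word.
texts = [[word.lower() for word in text.split()] for text in data]
print(texts)
```

The outer loop variable (text) is the string; data itself never gets a string method called on it.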

+0

Thanks!!! It works perfectly. Now I understand what I was doing wrong. I'm new to Python. – tom

+0

np mate, don't forget to upvote/mark the answer :) – epattaro

1

You need:

texts = [[word.lower() for word in line.split()] for line in data] 

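The outer part, [... for line in data], produces one item for each line, and that item is the list of lowercase words built by [word.lower() for word in line.split()]. Each line is a str containing a sequence of whitespace-separated words. line.split() turns that sequence into a list, and word.lower() converts each word to lowercase.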
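
The same steps can be written as explicit loops, which may be easier to follow if comprehensions are new to you. This is only an equivalent sketch; the two sample lines are invented for the example.

```python
# Hypothetical sample input; in the question this list is read from the file.
data = ["G15 KDN C30A", "Air Dilution"]

texts = []
for line in data:                  # each line is one str
    words = line.split()           # split on whitespace into a list of str
    lowered = []
    for word in words:
        lowered.append(word.lower())   # lowercase each individual word
    texts.append(lowered)          # one list of words per input line

print(texts)
```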

0

What you are doing wrong is calling a string method (lower()) on a list (in your case, data).

data = [line.strip() for line in open('corpus.txt', 'r')] 

Once you have the lines as list entries, what you should do is:

texts = [[words for words in sentences.lower().split()] for sentences in data] 
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^*********^^^^^^^^^^^^^^^^^^^^^^*********^^^^ 
#you should call lower on iter. value - in our case it is "sentences" 

This will give you a list of lists. Each inner list contains the lowercase words from one line.

$ tail -n 10 corpus.txt 
G15 KDN C30A Action Standard Air Brush Air Dilution 
G15 KDN C30A Action Standard Air Brush Air Dilution 
G15 KDN C30A Action Standard Air Brush Air Dilution 
G15 KDN C30A Action Standard Air Brush Air Dilution 
G15 KDN C30A Action Standard Air Brush Air Dilution 
G15 KDN C30A Action Standard Air Brush Air Dilution 
G15 KDN C30A Action Standard Air Brush Air Dilution 
G15 KDN C30A Action Standard Air Brush Air Dilution 
G15 KDN C30A Action Standard Air Brush Air Dilution 
G15 KDN C30A Action Standard Air Brush Air Dilution 


$ python 
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2 
Type "help", "copyright", "credits" or "license" for more information. 
>>> data = [line.strip() for line in open('corpus.txt', 'r')] 
>>> texts = [[words for words in sentences.lower().split()] for sentences in data] 
>>> texts[:5] 
[['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution'], ['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution'], ['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution'], ['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution'], ['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution']] 
>>> 

You can then decide whether to flatten it or keep it as is.

>>> flattened = reduce(lambda x,y: x+y, texts) 
>>> flattened[:30] 
['g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution', 'g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution', 'g15', 'kdn', 'c30a', 'action', 'standard', 'air', 'brush', 'air', 'dilution', 'g15', 'kdn', 'c30a'] 
>>> 
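
Note that the session above runs on Python 2.7, where reduce is a builtin. In Python 3 it lives in functools, and repeated list concatenation copies the growing list at every step. A possible alternative, sketched here with a made-up texts value, is itertools.chain.from_iterable from the standard library:

```python
from itertools import chain

# Hypothetical tokenized lines, shaped like the texts list built above.
texts = [["g15", "kdn", "c30a"], ["action", "standard"], ["air", "brush"]]

# chain.from_iterable walks each inner list in turn without building
# intermediate lists; list() materializes the flat result.
flattened = list(chain.from_iterable(texts))
print(flattened)
```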