How do I convert a numpy array to a regular Python list?

I'm using pandas to read input from a csv file and nltk to tokenize it, but I get the following error:

Traceback (most recent call last): 
    File "test.py", line 20, in <module> 
    word = nltk.word_tokenize(words) 
    File "/home/codelife/.local/lib/python3.5/site-packages/nltk/tokenize/__init__.py", line 109, in word_tokenize 
    return [token for sent in sent_tokenize(text, language) 
    File "/home/codelife/.local/lib/python3.5/site-packages/nltk/tokenize/__init__.py", line 94, in sent_tokenize 
    return tokenizer.tokenize(text) 
    File "/home/codelife/.local/lib/python3.5/site-packages/nltk/tokenize/punkt.py", line 1237, in tokenize 
    return list(self.sentences_from_text(text, realign_boundaries)) 
    File "/home/codelife/.local/lib/python3.5/site-packages/nltk/tokenize/punkt.py", line 1285, in sentences_from_text 
    return [text[s:e] for s, e in self.span_tokenize(text, realign_boundaries)] 
    File "/home/codelife/.local/lib/python3.5/site-packages/nltk/tokenize/punkt.py", line 1276, in span_tokenize 
    return [(sl.start, sl.stop) for sl in slices] 
    File "/home/codelife/.local/lib/python3.5/site-packages/nltk/tokenize/punkt.py", line 1276, in <listcomp> 
    return [(sl.start, sl.stop) for sl in slices] 
    File "/home/codelife/.local/lib/python3.5/site-packages/nltk/tokenize/punkt.py", line 1316, in _realign_boundaries 
    for sl1, sl2 in _pair_iter(slices): 
    File "/home/codelife/.local/lib/python3.5/site-packages/nltk/tokenize/punkt.py", line 310, in _pair_iter 
    prev = next(it) 
    File "/home/codelife/.local/lib/python3.5/site-packages/nltk/tokenize/punkt.py", line 1289, in _slices_from_text 
    for match in self._lang_vars.period_context_re().finditer(text): 
TypeError: expected string or bytes-like object

Here is the code:

from textblob import TextBlob 
import nltk  #for cleaning and stop wrds removal 
import pandas as pd  #csv 
import numpy 

data = pd.read_csv("Sample.csv", usecols=[0]) #reading from csv file 
num_rows = data.shape[0] 
#print(questions) 

# cleaning the data 

count_nan = data.isnull().sum() #counting no of null elements 
count_without_nan = count_nan[count_nan==0] #no of not null elements 
data = data[count_without_nan.keys()] # removing null columns 

data_mat = data.as_matrix(columns= None) #converting to numpy matrix 
print(data_mat) 
for question in data_mat: 
    words = question.tolist() 
    word = nltk.word_tokenize(words) 
    print(word)

I assumed this happens because I'm using a numpy array. How do I convert it to a regular Python list?

Source

2017-05-08 strawhatsai

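No, it's because `nltk.word_tokenize` "expected string or bytes-like object", as the error says. Converting to a `list` doesn't help, and you've already done that successfully anyway. You need a *string*.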

Look at `words`, or at part of it. Make sure you understand what it is and what it contains. – hpaulj

[`df.apply()`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html) is your friend =) – alvas
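To make the `apply` suggestion concrete, here is a minimal sketch that stays in pandas and never goes through a numpy matrix. It uses `Series.apply` on a single column rather than `DataFrame.apply`, since only one column is being tokenized. The column name `question`, the sample rows, and the regex-based `tokenize` helper are assumptions for illustration. The helper stands in for `nltk.word_tokenize` so the snippet runs without the NLTK tokenizer data.

```python
import re

import pandas as pd


def tokenize(text):
    # Hypothetical stand-in for nltk.word_tokenize: words and punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


# Assumed sample data; in the question this would come from pd.read_csv(...).
data = pd.DataFrame({"question": ["What is NumPy?", None, "Is pandas fast?"]})

# Drop missing rows first, then call the tokenizer once per string.
questions = data["question"].dropna()
tokens = questions.apply(tokenize)

print(tokens.tolist())
```

With real data you would pass `nltk.word_tokenize` to `apply` in place of `tokenize`.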

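nltk's word_tokenize() function must be given a single string. It returns a list of the tokens that string contains. To apply it to an entire Python list, numpy array, or pandas DataFrame, you need to iterate in Python (with a loop or a comprehension) or use a numpy or pandas method. For example, if words is an np.array of strings, you can iterate over it with the following comprehension.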

sentences = [ nltk.word_tokenize(string) for string in words ]

If words is something else, you'll need to adapt the code or show us what your data looks like.
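As a rough illustration of why the loop in the question fails: iterating over a one-column 2-D array yields one-element rows, so `question.tolist()` is a list such as `['What is NumPy?']`, not a string. The sketch below flattens the array so that each item is a single string. The sample array and the regex-based `tokenize` helper are assumptions. The helper stands in for `nltk.word_tokenize` and, like it, raises a TypeError when given a list.

```python
import re

import numpy as np


def tokenize(text):
    # Hypothetical stand-in for nltk.word_tokenize; re.findall also
    # raises TypeError if it is handed a list instead of a string.
    return re.findall(r"\w+|[^\w\s]", text)


# Assumed sample shaped like the one-column matrix in the question.
data_mat = np.array([["What is NumPy?"], ["How do I tokenize text?"]], dtype=object)

for row in data_mat:
    # Each row is a 1-D array; tolist() gives a list holding one string.
    print(type(row.tolist()), row.tolist())

# ravel() flattens the matrix so every element is one string.
tokens = [tokenize(str(question)) for question in data_mat.ravel()]
print(tokens)
```

Indexing the row (`question[0]`) inside the original loop would work just as well as flattening.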

Source

2017-05-08 21:11:56 alexis
