Word analysis and scoring from a file in Python

I am doing word-by-word analysis of a sentence such as
"Hey there!! This is a great movie???"

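I have many sentences like the one above. I also have a huge dataset file, shown below, in which I must quickly look up whether a word exists. If it does, I analyze it and store the results in a dictionary, for example the word's score from the file, the score of the last word of the sentence, whether the first word of the sentence is present, and so on.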

sentence[i] => Hey there!! This is a great movie??? sentence[0] = Hey, sentence[1] = there!!, sentence[2] = This, and so on.

Here is the code:

import re

# sentence (list of tokens), i (index of the current token) and
# wordanalysis (result dict) are assumed to be defined by the caller.
def unigrams_nrc(file):
    for line in file:
        term, score, numPos, numNeg = re.split("\t", line.strip())
        term = term.lower()
        # presence or absence of unigrams of a target term
        found = bool(re.match(sentence[i], term))
        if found:
            wordanalysis["unigram"] = found
            wordanalysis["trail_unigram"] = bool(re.match(sentence[-1], term))
            wordanalysis["lead_unigram"] = bool(re.match(sentence[0], term))
            wordanalysis["nonzero_sscore"] = float(score) if float(score) != 0 else 0
            wordanalysis["sscore>0"] = float(score) > 0
            wordanalysis["sscore"] = float(score) != 0

        if re.match(sentence[-1], term):
            wordanalysis["sscore !=0 last token"] = float(score) != 0

Here is the file (it contains more than 4,000 words):

#fabulous 7.526 2301 2 
#excellent 7.247 2612 3 
#superb 7.199 1660 2 
#perfection 7.099 3004 4 
#terrific 6.922 629 1 
#magnificent 6.672 490 1 
#sensational 6.529 849 2 
#heavenly 6.484 2841 7 
#ideal 6.461 3172 8 
#partytime 6.111 559 2 
#excellence 5.875 1325 6 
@thisisangel 5.858 217 1 
#wonderful 5.727 3428 18 
elegant 5.665 537 3 
#perfect 5.572 3749 23 
#fine 5.423 2389 17 
excellence 5.416 279 2 
#realestate 5.214 114 1 
bicycles 5.205 113 1

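Is there a better way to do the above? By "better" I mean faster, less code, and more elegant. I am new to Python, so I know this is not the best code. I have about 4 files whose scores I must check, so I would like to implement this in the best way possible.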

2013-12-12 fscore

Define "a better way"? Faster, less code, more elegant? Your solution looks reasonably well written. I assume it works? –

May I suggest storing the file as JSON, so that you can simply `json.loads(data)` the data file. –

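@JamesMills A better way would be faster, less code, and more elegant. My solution works fine, but I would like to see whether there is a better approach. – fscore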

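Here are my tips:

- Write your file out as JSON using json.dumps()
- Load your file back in as JSON using json.loads()
- Separate your data loading from your analysis into separate logical blocks of code, e.g. functions

Python dicts are much faster for lookups, with O(1) complexity on average, than iteration, which is O(n). So you will get some performance benefit there, as long as you load your data file once up front.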
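To make the O(1) versus O(n) point concrete, here is a minimal sketch. The `rows` sample and the helper names `scan_lookup` and `dict_lookup` are made up for illustration; only the terms and scores come from the question's file.

```python
# Hypothetical in-memory sample of (term, score) rows; values copied from the question's file.
rows = [("#fabulous", 7.526), ("#excellent", 7.247), ("elegant", 5.665), ("bicycles", 5.205)]


def scan_lookup(rows, word):
    """O(n): walk the rows until the term matches."""
    for term, score in rows:
        if term == word:
            return score
    return None


# Build the dict once (O(n)); every later lookup is O(1) on average.
scores = dict(rows)


def dict_lookup(scores, word):
    """O(1) on average: a single hash lookup."""
    return scores.get(word)


print(scan_lookup(rows, "elegant"))    # 5.665
print(dict_lookup(scores, "elegant"))  # 5.665
print(dict_lookup(scores, "missing"))  # None
```

Both functions return the same answer; the difference only shows up when the lookup is repeated for every word of many sentences against a file of several thousand terms.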

Example(s):

from json import dumps, loads


def load_data(filename):
    # Read the whole JSON file and parse it into a dict
    with open(filename, "r") as f:
        return loads(f.read())


def save_data(filename, data):
    # Serialize the dict to JSON and write it out
    with open(filename, "w") as f:
        f.write(dumps(data))


data = load_data("data.json")

foo = data["word"]  # O(1) lookup of "word"

I would probably store the data like this:

data = { 
    "fabulous": [7.526, 2301, 2], 
    ... 
}

Then you would do:

stats = data.get(word, None) 
if stats is not None: 
    score, x, y = stats 
    ...

NB: The ... is not real code; it is a placeholder where you should fill in the blanks.
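As a rough sketch of how the existing tab-separated file could be turned into that structure, something like the following might work. The function name `parse_lexicon` is hypothetical, the two sample rows are copied from the question's file, and the code assumes the columns really are separated by tabs, as the question's `re.split("\t", ...)` suggests.

```python
import json


def parse_lexicon(lines):
    """Turn 'term<TAB>score<TAB>numPos<TAB>numNeg' lines into {term: [score, numPos, numNeg]}."""
    data = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        term, score, num_pos, num_neg = line.split("\t")
        data[term.lower()] = [float(score), int(num_pos), int(num_neg)]
    return data


# Two sample rows copied from the question's file, assumed to be tab-separated.
sample = ["#fabulous\t7.526\t2301\t2\n", "elegant\t5.665\t537\t3\n"]
data = parse_lexicon(sample)

# Round-trip through JSON text, which is what writing to and reading from a file would do.
restored = json.loads(json.dumps(data))

stats = restored.get("elegant")
if stats is not None:
    score, num_pos, num_neg = stats
    print(score, num_pos, num_neg)  # 5.665 537 3
```

This sketch keeps each term exactly as it appears in the file, including a leading "#" or "@". Whether to strip those prefixes depends on how the sentence tokens are normalized.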

2013-12-12 11:10:20

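Could you please give an example? I have never used JSON before. – fscore

When you say a dict is better than iteration, do you mean that putting the file into a dict and doing lookups is better than using a for loop? – fscore

Updated. See above. –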

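Perhaps load the word/score file into memory once as a dict of dicts, then loop over each word of each sentence and check whether that word is a key in the dict built from the word file.

Would something like this work: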

import re

def load_words(file):
    # Build the lookup table once: term -> its scores
    word_lookup = {}
    for line in file:
        term, score, numPos, numNeg = re.split("\t", line.strip())
        if term not in word_lookup:
            word_lookup[term] = {'score': score, 'numPos': numPos, 'numNeg': numNeg}
    return word_lookup

def check_word(word):
    # Return the entry for word, or None if it is not in the file
    return word_lookup.get(word)

def run_sentence(s):
    s = standardize_sentence(s) # Assuming you want to strip punctuation, symbols, convert to lowercase, etc
    words = s.split(' ')
    first = words[0]
    last = words[-1]
    for word in words:
        word_info = check_word(word)
        if word_info:
            # Matched word, use your scores somehow (word_info['score'], etc)
            pass

# file, sentences and standardize_sentence are assumed to be defined elsewhere
word_lookup = load_words(file)
for s in sentences:
    run_sentence(s)
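For illustration only, here is one way the placeholder parts might be filled in. Both `standardize_sentence` and `analyze` below are hypothetical sketches rather than part of the answer above, and the feature names are borrowed from the question's `wordanalysis` dict.

```python
import string


def standardize_sentence(s):
    """Hypothetical helper: lowercase and strip punctuation from both ends of each token."""
    tokens = (tok.strip(string.punctuation) for tok in s.lower().split())
    return [tok for tok in tokens if tok]


def analyze(sentence, word_lookup):
    """Collect a few of the question's features using dict lookups only."""
    words = standardize_sentence(sentence)
    scores = [float(word_lookup[w]["score"]) for w in words if w in word_lookup]
    return {
        "unigram": bool(scores),
        "lead_unigram": bool(words) and words[0] in word_lookup,
        "trail_unigram": bool(words) and words[-1] in word_lookup,
        "sscore>0": any(s > 0 for s in scores),
    }


# Tiny stand-in for the dict built by load_words(); scores are kept as strings there.
word_lookup = {"elegant": {"score": "5.665", "numPos": "537", "numNeg": "3"}}
result = analyze("Hey there!! This is an elegant movie???", word_lookup)
print(result)
```

Note that `string.punctuation` includes "#" and "@", so this particular normalization would turn a token like "#fabulous" into "fabulous". The keys of the lookup dict would need the same treatment for such tokens to match.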

2013-12-12 11:46:17

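What is the output of this code? And how is it different? – fscore

What is the target output for a single sentence? One dict summarizing the scores of the various words, or one dict per word in the sentence? The main suggestion here is to store the word file as one big dict, so that you get the benefit of fast key lookups on word_lookup when checking each word of the sentence. –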
