2017-11-18
0

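Pocketsphinx in Python returns random words for keyword search

I copied code from a website to listen for a specific word using Python pocketsphinx. It runs, but it never outputs the keyword as expected. This is my code: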

import sys, os 
from pocketsphinx.pocketsphinx import * 
from sphinxbase.sphinxbase import * 
import pyaudio 

# modeldir = "../../../model" 
# datadir = "../../../test/data" 

modeldir="C://Users//hp//AppData//Local//Programs//Python//Python35//Lib//site-packages//pocketsphinx//model//en-us" 
dictdir="C://Users//hp//AppData//Local//Programs//Python//Python35//Lib//site-packages//pocketsphinx//model//cmudict-en-us.dict" 
lmdir="C://Users//hp//AppData//Local//Programs//Python//Python35//Lib//site-packages//pocketsphinx//model//en-us.lm.bin" 
# Create a decoder with certain model 
config = Decoder.default_config() 
config.set_string('-hmm', modeldir) 
config.set_string('-lm', lmdir) 
config.set_string('-dict', dictdir) 
config.set_string('-keyphrase', 'forward') 
config.set_float('-kws_threshold', 1e+20) 

p = pyaudio.PyAudio() 
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024) 
stream.start_stream() 

# Process audio chunk by chunk. On keyword detected perform action and restart search 
decoder = Decoder(config) 
decoder.start_utt() 
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
    else:
        break
    if decoder.hyp() != None:
        #print(decoder.hyp().hypstr)
        if decoder.hyp().hypstr == 'forward':
            print ([(seg.word, seg.prob, seg.start_frame, seg.end_frame) for seg in decoder.seg()])
            print ("Detected keyword, restarting search")
            decoder.end_utt()
            decoder.start_utt()

Also, when I use print(decoder.hyp().hypstr)

it just outputs random words whenever I say anything. For example, if I say a word or a line, it outputs:

the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the 
the da 
the head 
the bed 
the bedding 
the heading of 
the bedding and 
the bedding and 
the bedding and 
the bedding and 
the bedding and 
the bedding and 
the bedding and 
the bedding and 
the bedding and 
the bedding and 
the bedding and 
the bedding and 
the bedding and well 
the bedding and well 
the bedding and well 
the bedding and butler 
the bedding and what lingus 
the bedding and what lingus 
the bedding and what lingus 
the bedding and what lingus ha 
the bedding and blessed are 
the bedding and blessed are 
the bedding and what lingus on 
the bedding and what lingus want 
the bedding and what lingus want 
the bedding and what lingus want 
the bedding and what lingus want 
the bedding and what lingus want or 
the bedding and what lingus want to talk 
the bedding and what lingus current top 
the bedding and what lingus want to talk 
the bedding and what lingus want to talk 
the bedding and what lingus want to talk 
the bedding and what lingus want to talk 
the bedding and what lingus want to talk to her 
the bedding and what lingus want to talk to her 
the bedding and what lingus want to talk to her 
the bedding and what lingus want to talk to her 

Please help me get through this. I am just a Python newbie.

Answers

1

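First, I just want to clarify: your Pocketsphinx is working.

From my experience with pocketsphinx, it is hardly the most accurate speech recognition tool you can use, but it is probably your best option for an offline solution. Pocketsphinx can only transcribe your words (audio) as well as its model allows. These models still seem to be a work in progress, and most of them need improvement. There are a few things you can try to improve recognition accuracy, such as reducing noise and tuning the recognition, but that is outside the immediate scope of this question.

From my understanding of your code, you are listening for a specific keyword to be spoken (by the user) and using the pocketsphinx backend to recognize it. That keyword seems to be 'forward'. You can read further on how to do "hot word listening" properly.

You have the right idea, but the approach can be improved. Here is my "quick fix" version of the code: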

import os 
import pyaudio 
import pocketsphinx as ps 

modeldir = "C://Users//hp//AppData//Local//Programs//Python//Python35//Lib//site-packages//pocketsphinx//model//" 

# Create a decoder with certain model 
config = ps.Decoder.default_config() 
config.set_string('-hmm', os.path.join(modeldir, 'en-us')) 
config.set_string('-lm', os.path.join(modeldir, 'en-us.lm.bin')) 
config.set_string('-dict', os.path.join(modeldir, 'cmudict-en-us.dict')) 
config.set_string('-keyphrase', 'forward') 
config.set_float('-kws_threshold', 1e+20) 

p = pyaudio.PyAudio() 
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024) 
stream.start_stream() 

# Process audio chunk by chunk. On keyword detected perform action and restart search 
decoder = ps.Decoder(config) 
decoder.start_utt() 

while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
    else:
        break
    if decoder.hyp() is not None:
        print(decoder.hyp().hypstr)
        if 'forward' in decoder.hyp().hypstr:
            print([(seg.word, seg.prob, seg.start_frame, seg.end_frame) for seg in decoder.seg()])
            print("Detected keyword, restarting search")
            decoder.end_utt()
            decoder.start_utt()

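For any one pocketsphinx.Decoder() "session" (i.e. from a call to .start_utt() until the subsequent call to .end_utt()), decoder.hyp().hypstr effectively keeps appending words to itself whenever pocketsphinx decodes a "valid" recognition from the input audio stream.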

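You used if decoder.hyp().hypstr == 'forward':. This requires the entire string to be exactly 'forward' before the code enters the (I assume desired) conditional block. Since pocketsphinx is not very accurate by default, it usually takes several attempts before the correct word actually registers. For this reason, and because decoder.hyp().hypstr keeps growing (as explained above), I used the line if 'forward' in decoder.hyp().hypstr: instead. This looks for the desired keyword 'forward' anywhere in the whole string, so recognition errors are tolerated until the keyword is found.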
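To make the difference between the two checks concrete, here is a minimal sketch that replays a few partial hypotheses as plain strings. The sample strings and the helper names `exact_match` and `contains_keyword` are assumptions made for illustration; no pocketsphinx call is involved.

```python
# Hypothetical partial hypotheses, in the order a decoder might report them
# within one utterance (illustrative strings, not real pocketsphinx output).
partial_hypotheses = ["the", "the bed", "the bed forward", "the bed forward and"]

KEYWORD = "forward"


def exact_match(hypstr, keyword=KEYWORD):
    """The check from the question: the whole hypothesis must equal the keyword."""
    return hypstr == keyword


def contains_keyword(hypstr, keyword=KEYWORD):
    """The check from the answer: the keyword may appear anywhere in the hypothesis."""
    return keyword in hypstr


for hyp in partial_hypotheses:
    print(repr(hyp), exact_match(hyp), contains_keyword(hyp))
```

Note that a substring test also matches longer words such as 'straightforward'; comparing against hypstr.split() would be a stricter alternative.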

I hope this helps!

+0

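Thanks for your answer. But this code was not much help, bro. It never recognizes the word 'forward' in my speech and just prints random words when I talk to it. Is there something I am missing in the model? – TechieBoy101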

+0

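All this means is that pocketsphinx's "translation" is not very accurate for the audio you are feeding it. So, as I pointed out, you will have to try a few (many) times before pocketsphinx recognizes your word correctly. I understand how frustrating that is. You then need to look into **increasing the recognition accuracy** and doing "hot word listening" **properly**. The links for both were provided in my original answer. –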

0

You need to remove this line:

config.set_string('-lm', lmdir) 

Keyword search and LM search are mutually exclusive.
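A minimal sketch of a keyword-only configuration, assuming the same model layout and the older pocketsphinx Python API used in this thread. The helper names `keyword_options` and `make_keyword_decoder` are hypothetical; the option names and `Decoder` calls are the ones already used in the code above, with `-lm` simply left out.

```python
import os


def keyword_options(modeldir, keyphrase="forward", threshold=1e+20):
    """Return (string_options, float_options) for keyword spotting only.

    Deliberately contains no '-lm' entry, so the decoder is not asked to run
    a language-model search alongside the keyword search.
    """
    string_options = {
        "-hmm": os.path.join(modeldir, "en-us"),
        "-dict": os.path.join(modeldir, "cmudict-en-us.dict"),
        "-keyphrase": keyphrase,
    }
    float_options = {"-kws_threshold": threshold}
    return string_options, float_options


def make_keyword_decoder(modeldir):
    """Build a Decoder from the options above (requires pocketsphinx)."""
    # Imported here so keyword_options() can be used without pocketsphinx installed.
    import pocketsphinx as ps

    config = ps.Decoder.default_config()
    string_options, float_options = keyword_options(modeldir)
    for name, value in string_options.items():
        config.set_string(name, value)
    for name, value in float_options.items():
        config.set_float(name, value)
    return ps.Decoder(config)
```

The decoder returned by make_keyword_decoder(modeldir) could then replace the Decoder(config) call in the loop from the question; the threshold value is carried over from the question and may need tuning.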

+0

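Thank you very much, that was really useful. I need to ask whether there is a way to listen for more than one keyword or phrase in pocketsphinx. Is that possible? – TechieBoy101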

+0

Yes, you can use a keyword list, see http://cmusphinx.github.io/wiki/tutoriallm –
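As a rough sketch of the keyword-list approach: the list is a plain text file with one keyphrase per line, each followed by its detection threshold between slashes, and the decoder is pointed at that file with the `-kws` option instead of `-keyphrase`. The phrases, thresholds, file name and the helper `write_keyword_list` below are illustrative assumptions; thresholds generally have to be tuned per phrase.

```python
def write_keyword_list(path, keyphrases):
    """Write (phrase, threshold) pairs as lines like 'forward /1e-20/'."""
    with open(path, "w") as f:
        for phrase, threshold in keyphrases:
            f.write("{} /{:g}/\n".format(phrase, threshold))


# Hypothetical phrases and thresholds, for illustration only.
EXAMPLE_KEYPHRASES = [("forward", 1e-20), ("turn left", 1e-30)]

# Usage with the decoder configuration from this thread (not executed here):
#   write_keyword_list("keyphrase.list", EXAMPLE_KEYPHRASES)
#   config.set_string('-kws', 'keyphrase.list')  # replaces '-keyphrase' / '-kws_threshold'
```

Every word used in a keyphrase also has to be present in the pronunciation dictionary given with `-dict`.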