2014-12-29 21 views
2

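When comparing bytes, shouldn't an extracted audio sample be contained in its original source?

Let's say I have an audio wav file with the sentence: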

+-----------+----------------------------------------+ 
| meta data | 'Audio recognition sometimes is trick' |.wav 
+-----------+----------------------------------------+ 

Now consider opening this audio in Audacity, extracting the word "sometimes" based on its waveform, and saving it in another file.

+-----------+-------------+ 
| meta data | 'sometimes' |.wav 
+-----------+-------------+ 

Then I used this Java code to get only the audio data from the two files:

    //...
    Path source = Paths.get("source.wav");
    Path sample = Paths.get("sometimes.wav");
    int index = compare(transform(source), transform(sample));
    System.out.println("Shouldn't I be greater than -1!? " + (index > -1));
    //...

    private int compare(int[] source, int[] sample) {
        // Arrays.asList(int[]) yields a one-element List<int[]>, so box the samples first.
        List<Integer> sourceList = Arrays.stream(source).boxed().collect(Collectors.toList());
        List<Integer> sampleList = Arrays.stream(sample).boxed().collect(Collectors.toList());
        return Collections.indexOfSubList(sourceList, sampleList);
    }

    private int[] transform(Path audio) throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream ais = AudioSystem.getAudioInputStream(
                new ByteArrayInputStream(Files.readAllBytes(audio)))) {

            AudioFormat format = ais.getFormat();
            byte[] audioBytes = new byte[(int) (ais.getFrameLength() * format.getFrameSize())];
            // Actually read the PCM data; read() may return fewer bytes than requested.
            int offset = 0;
            int read;
            while (offset < audioBytes.length
                    && (read = ais.read(audioBytes, offset, audioBytes.length - offset)) > 0) {
                offset += read;
            }
            // Assumes 16-bit little-endian samples.
            int nlengthInSamples = audioBytes.length / 2;
            int[] audioData = new int[nlengthInSamples];
            for (int i = 0; i < nlengthInSamples; i++) {
                int LSB = audioBytes[2 * i];     /* First byte is LSB (low order) */
                int MSB = audioBytes[2 * i + 1]; /* Second byte is MSB (high order) */
                audioData[i] = (MSB << 8) | (255 & LSB);
            }
            return audioData;
        }
    }

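Now here comes my question.

Given the extraction described above, shouldn't this code be able to find the "sometimes" audio data bytes inside the original audio file?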

I also tried comparing the contents as strings, but had no luck:

new String(source).contains(new String(sample)); 

Can anyone point out what I'm missing here?

+0

Are these uncompressed (PCM) WAVs? Also, what is the frame size of your two files? – NPE

+2

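I'm confused by the method used to read the audio file. Why not use "AudioInputStream ais = AudioSystem.getAudioInputStream(url);"? This assumes you pass the URL of the file rather than a Path, which should work whether the resource is inside a jar or external to the program. Then, test by comparing the byte[] arrays first, before testing after the conversion to PCM. That is my suggestion for what I would do as a first step in troubleshooting. If the format of the original file and that of the Audacity clip are not the same, then the resulting PCM will almost certainly differ as well, even if the two sound the same. –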

+0

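@Phil. Actually, your first suggestion simplified a few lines, thanks. But even when comparing the bytes without the conversion, I can't find the sample in the source audio file. Considering that I used Audacity to extract the sample from the source, it should preserve the number of channels, the rate and so on, right? Even so, here is what the AudioFormat obtained from each AudioInputStream reports: PCM_SIGNED 22050.0 Hz, 16 bit, mono, 2 bytes/frame, little-endian; PCM_SIGNED 22050.0 Hz, 16 bit, mono, 2 bytes/frame, little-endian – zeh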

Answer

0

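@Phil, you're the man! Your tip led me to the solution!

  1. Audacity's audio extraction encodes the sample bytes in some different way;

  2. I wrote a Java program to identify the silences in the source audio, and then I split out some samples one by one;

  3. Comparing the source against the new, non-Audacity samples gives a match!

Here are the new transform and compare: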

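private int compare(byte[] captchaData, byte[] sampleData) throws IOException {
    // ISO-8859-1 maps every byte to exactly one char, so no audio byte is lost or merged
    // (the platform default charset may replace bytes it cannot decode).
    String source = new String(captchaData, StandardCharsets.ISO_8859_1);
    String sample = new String(sampleData, StandardCharsets.ISO_8859_1);
    return source.indexOf(sample);
}

private byte[] transform(Path audio) throws IOException, UnsupportedAudioFileException {
    // Close the audio stream as well as the buffer when done.
    try (AudioInputStream ais = AudioSystem.getAudioInputStream(audio.toFile());
         ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
        AudioFormat format = ais.getFormat();
        int nBufferSize = 1024 * format.getFrameSize();
        byte[] abBuffer = new byte[nBufferSize];
        int nBytesRead;
        while ((nBytesRead = ais.read(abBuffer)) > -1) {
            baos.write(abBuffer, 0, nBytesRead);
        }
        return baos.toByteArray();
    }
}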
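
As a side note, the comparison does not have to go through String at all. Below is a minimal sketch of a direct byte search; the class name `ByteSearch` and the `frameSize` parameter are my own names for illustration, not part of the program above. It only tries offsets that are multiples of the frame size (2 for the 16-bit mono format reported in the comments), so a match cannot start in the middle of a sample.

```java
final class ByteSearch {
    /**
     * Returns the first byte offset at which sample occurs in source, or -1.
     * Only offsets that are multiples of frameSize are tried, so a match
     * always starts on a frame boundary.
     */
    static int indexOf(byte[] source, byte[] sample, int frameSize) {
        if (frameSize <= 0) {
            throw new IllegalArgumentException("frameSize must be positive");
        }
        if (sample.length == 0) {
            return 0; // an empty sample trivially matches at the start
        }
        for (int start = 0; start + sample.length <= source.length; start += frameSize) {
            int k = 0;
            // Compare byte by byte until a mismatch or the end of the sample.
            while (k < sample.length && source[start + k] == sample[k]) {
                k++;
            }
            if (k == sample.length) {
                return start;
            }
        }
        return -1;
    }
}
```

With the methods above it could be called as `ByteSearch.indexOf(transform(source), transform(sample), 2)`. This is a plain quadratic scan, which should be fine for short clips; a faster string-search algorithm could be swapped in for long recordings.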

The splitter:

private List<byte[]> split(byte[] audioData) {
    System.out.println(audioData.length);
    List<byte[]> byteList = new ArrayList<>();
    int zeroCounter = 0;
    int lastPos = 0;
    for (int i = 0; i < audioData.length; i++) {
        if (audioData[i] >= -1 && audioData[i] <= 1) {
            zeroCounter++; //too many leading 'zeros' could indicate silence or very low noise...
        } else if (zeroCounter > 0) {
            if (zeroCounter > 2000) {
                int from = lastPos;
                int to = i - (zeroCounter / 2);
                byteList.add(Arrays.copyOfRange(audioData, from, to));
                System.out.println("split from: " + from + " to: " + to);
                lastPos = to;
            }
            zeroCounter = 0;
        }
    }
    return byteList;
}
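
One caveat about the splitter: it tests individual bytes, so a 16-bit sample such as 256 (bytes 0x00 0x01) counts as "silent" even though its amplitude is not small. Here is a rough sketch of a variant that works on decoded samples instead. It assumes 16-bit little-endian mono PCM as in my files; `SilenceSplitter`, `threshold` and `minSilentSamples` are hypothetical names chosen for illustration. Unlike the method above, it also keeps the trailing segment after the last silence, which is a design choice on my part.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SilenceSplitter {
    /** Decodes the 16-bit little-endian signed sample starting at byte offset i. */
    static int sampleAt(byte[] pcm, int i) {
        return (pcm[i + 1] << 8) | (pcm[i] & 0xFF);
    }

    /** Splits pcm in the middle of every quiet run of at least minSilentSamples samples. */
    static List<byte[]> split(byte[] pcm, int threshold, int minSilentSamples) {
        List<byte[]> parts = new ArrayList<>();
        int silent = 0;   // length of the current quiet run, in samples
        int lastPos = 0;  // byte offset where the current segment starts
        int usable = pcm.length - (pcm.length % 2); // ignore a dangling odd byte
        for (int i = 0; i < usable; i += 2) {
            if (Math.abs(sampleAt(pcm, i)) <= threshold) {
                silent++;
            } else {
                if (silent >= minSilentSamples) {
                    // Cut halfway through the quiet run, staying on a sample boundary.
                    int to = i - (silent / 2) * 2;
                    if (to > lastPos) {
                        parts.add(Arrays.copyOfRange(pcm, lastPos, to));
                    }
                    lastPos = to;
                }
                silent = 0;
            }
        }
        if (lastPos < usable) {
            parts.add(Arrays.copyOfRange(pcm, lastPos, usable)); // trailing segment
        }
        return parts;
    }
}
```

Because every cut lands on a sample boundary, each returned segment is an exact byte-for-byte slice of the input, which is what the byte comparison relies on. Suitable values for `threshold` and `minSilentSamples` depend on the recording and would need to be tuned by hand.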

Thanks!