2013-01-09 45 views
0

How can I handle language detection for an NSString on iOS when English is not detected accurately?

Method:

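- (NSString *)languageForString:(NSString *)text {
    // Bridge the NSString itself; a C-string buffer is not a valid CFStringRef.
    CFRange range = CFRangeMake(0, MIN(text.length, 100));
    // CFBridgingRelease hands the +1 "Copy" result over to ARC.
    return CFBridgingRelease(CFStringTokenizerCopyBestStringLanguage((__bridge CFStringRef)text, range));
}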

Usage:

NSLog(@"\"%@\" language is %@",@"Tokenizer",[self languageForString:@"tokenizer"]); 
NSLog(@"\"%@\" language is %@",@"Tokenizer detect",[self languageForString:@"Tokenizer detect"]); 
NSLog(@"\"%@\" language is %@",@"detect",[self languageForString:@"detect"]); 
NSLog(@"\"%@\" language is %@",@"我们",[self languageForString:@"我们"]); 
NSLog(@"\"%@\" language is %@",@"집안일",[self languageForString:@"집안일"]); 
NSLog(@"\"%@\" language is %@",@"Démocratie",[self languageForString:@"Démocratie"]); 
NSLog(@"\"%@\" language is %@",@"Tokenizer English",[self languageForString:@"Tokenizer English"]); 
NSLog(@"\"%@\" language is %@",@"ここはデパートです",[self languageForString:@"ここはデパートです"]); 

Output:

2013-01-09 16:12:28.582 TestCommandLine[6478:c07] "Tokenizer" language is tr
2013-01-09 16:12:28.586 TestCommandLine[6478:c07] "Tokenizer detect" language is tr
2013-01-09 16:12:28.586 TestCommandLine[6478:c07] "detect" language is cs
2013-01-09 16:12:28.587 TestCommandLine[6478:c07] "我们" language is zh-Hans
2013-01-09 16:12:28.560 TestCommandLine[6478:c07] "집안일" language is ko
2013-01-09 16:12:28.577 TestCommandLine[6478:c07] "Démocratie" language is fr
2013-01-09 16:12:28.590 TestCommandLine[6478:c07] "Tokenizer English" language is en
2013-01-09 16:12:28.591 TestCommandLine[6478:c07] "ここはデパートです" language is ja

How can I get this result instead:

2013-01-09 16:12:28.582 TestCommandLine[6478:c07] "Tokenizer" language is en
2013-01-09 16:12:28.586 TestCommandLine[6478:c07] "Tokenizer detect" language is en
2013-01-09 16:12:28.586 TestCommandLine[6478:c07] "detect" language is en
2013-01-09 16:12:28.587 TestCommandLine[6478:c07] "我们" language is zh-Hans
2013-01-09 16:12:28.560 TestCommandLine[6478:c07] "집안일" language is ko
2013-01-09 16:12:28.577 TestCommandLine[6478:c07] "Démocratie" language is fr
2013-01-09 16:12:28.590 TestCommandLine[6478:c07] "Tokenizer English" language is en
2013-01-09 16:12:28.591 TestCommandLine[6478:c07] "ここはデパートです" language is ja
+2

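From the documentation for `CFStringTokenizerCopyBestStringLanguage`: *The result is not guaranteed to be accurate. Typically, the function requires 200-400 characters to reliably guess the language of a string.* Getting good results with strings this short is not possible. – rmaddy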

+0

This [question/answer](https://stackoverflow.com/questions/47890747/how-to-detect-text-string-language-in-ios/47890753#47890753) may help. –

Answers

0

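You can't identify it that way, at least not with any decent accuracy. You have to provide longer strings.

The CFStringTokenizerCopyBestStringLanguage documentation says it needs at least 200-400 characters.

-> There is no better way. We tried our own solution as well, and it needs the same amount of text to be accurate.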

+0

I need to detect the language of a single word and then translate it, so the input is not a long string. –

+0

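Sorry, I don't follow. The documentation is clear, and there is no better way in the SDK or elsewhere (you may find third-party components that are slightly better, but overall they are just as bad). –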

+0

OK, I see. Thanks. –

1

My solution is to give the detector more text:

// Assumed to live in an NSString category, since it reads self.length.
- (NSString *)detectLanguage {

    // Nothing to detect in an empty string.
    if (self.length == 0) {
        return nil;
    }

    NSString *string = nil;

    // You can set a larger detect number here
    if (self.length > 30) {
        string = self;
    } else {
        // Pad short input by repeating it until it reaches the minimum length.
        NSMutableString *tempString = [NSMutableString stringWithString:self];

        while (tempString.length < 30) {
            [tempString appendFormat:@" %@", self];
        }

        string = tempString;
    }

    NSArray *tagschemes = @[NSLinguisticTagSchemeLanguage];
    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:tagschemes options:0];
    [tagger setString:string];
    NSString *language = [tagger tagAtIndex:0 scheme:NSLinguisticTagSchemeLanguage tokenRange:NULL sentenceRange:NULL];

    // "und" means undetermined; in that case fall back to CFStringTokenizer.
    if (language != nil && ![language isEqualToString:@"und"]) {
        return language;
    }

    // Bridge the NSString and let ARC take ownership of the "Copy" result.
    return CFBridgingRelease(CFStringTokenizerCopyBestStringLanguage((__bridge CFStringRef)string, CFRangeMake(0, MIN(string.length, 400))));
}
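
To make the padding step above concrete, here is a minimal sketch of the same repeat-until-long-enough loop in plain C (which also compiles inside an Objective-C file). The name `pad_by_repetition` and its `min_len` parameter are hypothetical and not part of any SDK. It counts bytes with `strlen`, whereas `NSString` `length` counts UTF-16 code units, so the two agree only for ASCII input. Repetition only lengthens the input; it adds no new words for the detector to work with.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: repeats `word`, separated by single spaces, until the
   result is at least `min_len` bytes long. Returns a malloc'd string that the
   caller must free, or NULL for empty input or allocation failure. */
char *pad_by_repetition(const char *word, size_t min_len) {
    assert(word != NULL);

    size_t word_len = strlen(word);
    if (word_len == 0) {
        return NULL; /* mirrors the empty-string guard in the answer */
    }

    /* Count copies the way the while loop does: start with one copy and
       add " word" for as long as the total is shorter than min_len. */
    size_t copies = 1;
    size_t total = word_len;
    while (total < min_len) {
        total += 1 + word_len;
        copies++;
    }

    char *out = malloc(total + 1);
    if (out == NULL) {
        return NULL;
    }

    /* Write the copies back to back with a space between them. */
    char *cursor = out;
    for (size_t i = 0; i < copies; i++) {
        if (i > 0) {
            *cursor++ = ' ';
        }
        memcpy(cursor, word, word_len);
        cursor += word_len;
    }
    *cursor = '\0';
    return out;
}
```

With a threshold of 30, `pad_by_repetition("detect", 30)` returns five copies of "detect" joined by spaces (34 characters), which is the string the answer's method would hand to the tagger for that input.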