Python: How to determine the language?

I want to get something like this:

Input text: "ру́сский язы́к" 
Output text: "Russian" 

Input text: "中文" 
Output text: "Chinese" 

Input text: "にほんご" 
Output text: "Japanese" 

Input text: "العَرَبِيَّة" 
Output text: "Arabic"

How can I do this in Python? Thanks.

Source

2016-08-25 Rita

What have you tried? – Raskayu

This might help: http://stackoverflow.com/questions/4545977/python-can-i-detect-unicode-string-language-code –

Have you had a look at langdetect?

from langdetect import detect

# detect() returns a language code rather than a language name
lang = detect("Ein, zwei, drei, vier")

print(lang)
# output: de
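
The question asks for names such as "Russian" rather than codes such as "ru". A minimal sketch of the missing step, assuming the detector returns codes like the ones langdetect produces ("ru", "ja", "ar", "zh-cn", "zh-tw"). The `LANGUAGE_NAMES` table and the `language_name` helper are hypothetical names made up for this illustration and are not part of langdetect; the table only covers a handful of languages.

```python
# Hypothetical lookup table: language code -> English name.
# Only a few entries are listed; extend it for the languages you expect.
LANGUAGE_NAMES = {
    "ru": "Russian",
    "zh-cn": "Chinese",
    "zh-tw": "Chinese",
    "ja": "Japanese",
    "ar": "Arabic",
    "de": "German",
}


def language_name(code):
    """Return the English name for a language code, or the code itself if unknown."""
    return LANGUAGE_NAMES.get(code.lower(), code)


print(language_name("de"))  # German
print(language_name("xx"))  # xx (unknown codes are passed through unchanged)
```

With langdetect installed, this could be combined as `language_name(detect(text))`.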

Source

2016-08-25 10:38:59 dhdavvie

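You can try to determine which Unicode group the characters in the input string belong to, which points to the type of language (Cyrillic for Russian, for example), and then search the text for symbols specific to a particular language.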
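
The first half of that idea can be sketched with the standard library alone. This is only an approximation under stated assumptions: it takes the first word of each letter's Unicode character name (for example "CYRILLIC" or "HIRAGANA") as the script, and the `dominant_script` helper is a hypothetical name chosen for this illustration. A script is not a language: Cyrillic is also used for Ukrainian and Bulgarian, and CJK ideographs appear in Japanese text too, so the second step (looking for language-specific symbols) is still needed.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most common script among the letters of text, or None."""
    counts = Counter()
    for ch in text:
        # Count letters only; spaces, digits, punctuation and combining
        # accents (category Mn) are skipped.
        if unicodedata.category(ch).startswith("L"):
            name = unicodedata.name(ch, "")
            if name:
                # e.g. "CYRILLIC SMALL LETTER ER" -> "CYRILLIC"
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None


print(dominant_script("ру́сский язы́к"))  # CYRILLIC
print(dominant_script("中文"))            # CJK
print(dominant_script("にほんご"))        # HIRAGANA
print(dominant_script("العَرَبِيَّة"))         # ARABIC
```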

Source

2016-08-25 11:10:34 Kerbiter

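TextBlob. Requires the NLTK package, uses Google.

from textblob import TextBlob

# detect_language() sends the text to Google; it was deprecated in later TextBlob releases
b = TextBlob("bonjour")
b.detect_language()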

pip install textblob

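Polyglot. Requires numpy and some arcane libraries, so it is unlikely to work on Windows. It can detect the languages in mixed-language text.

from polyglot.detect import Detector

mixed_text = u"""
China (simplified Chinese: 中国; traditional Chinese: 中國),
officially the People's Republic of China (PRC), is a sovereign state
located in East Asia.
"""
# Each detected language is printed in turn
for language in Detector(mixed_text).languages:
    print(language)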

pip install polyglot

chardet also has a feature for detecting the language if there are character bytes in the range (127-255]:

>>> import chardet
>>> chardet.detect("Я люблю вкусные пампушки".encode('cp1251'))
{'encoding': 'windows-1251', 'confidence': 0.9637267119204621, 'language': 'Russian'}

pip install chardet

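langdetect requires large portions of text. It uses a non-deterministic approach, which means you can get different results for the same text sample. The docs say you have to use the following code to make it deterministic: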
```
from langdetect import detect, DetectorFactory 
DetectorFactory.seed = 0 
detect('今一はお前さん') 
```

pip install langdetect

guess_language can detect very short samples by using a spell checker with dictionaries.

pip install guess_language-spirit

Source

2017-11-04 02:32:58 Rabash
