How to get a WordNet synset given an offset ID?

I have a WordNet synset offset (for example id="n#05576222"). Given this offset, how can I get the synset using Python?

2011-11-10 user1039457

For NLTK 3.2.3 or later, see donners45's answer.

For older versions of NLTK:

There is no built-in method for this in NLTK, but you can do it like this:

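from nltk.corpus import wordnet

# Build an offset -> Synset lookup once (this walks every synset, so it is slow).
# Offsets are only unique within one part of speech, so a noun and a verb
# that share an offset overwrite each other in this dict.
syns = list(wordnet.all_synsets())
offsets_list = [(s.offset(), s) for s in syns]
offsets_dict = dict(offsets_list)

offsets_dict[14204095]
# => Synset('heatstroke.n.01')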

You can then pickle the dictionary and load it whenever you need it.
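A minimal sketch of the pickling step, under two assumptions of mine: it stores synset names instead of `Synset` objects (plain strings keep the pickle small and avoid serializing the corpus reader), and it keys on `(pos, offset)` because offsets are only unique within one part of speech. `build_offset_index`, `save_index` and `load_index` are hypothetical helper names.

```python
import pickle


def build_offset_index(synsets):
    """Map (pos, offset) -> synset name, e.g. ("n", 14204095) -> "heatstroke.n.01".

    Offsets are byte positions inside the per-POS data files, so the same
    number can belong to a noun and a verb; keying on the pair avoids that.
    Note: adjective satellites report their POS as "s", not "a".
    """
    return {(s.pos(), s.offset()): s.name() for s in synsets}


def save_index(index, path):
    # Persist the plain dict of (str, int) -> str with pickle.
    with open(path, "wb") as f:
        pickle.dump(index, f)


def load_index(path):
    # Reload the dict written by save_index.
    with open(path, "rb") as f:
        return pickle.load(f)
```

With NLTK 3.x installed you would build the index once with `build_offset_index(wordnet.all_synsets())`, save it, and later turn a stored name back into a synset with `wordnet.synset(index[("n", 14204095)])`.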

For NLTK versions before 3.0, replace the line

offsets_list = [(s.offset(), s) for s in syns]

with

offsets_list = [(s.offset, s) for s in syns]

because before NLTK 3.0, offset was an attribute of the synset object, not a method.
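If the same script has to run on both sides of the 3.0 change, one option is to check whether `offset` is callable instead of editing the line by hand. A minimal sketch; `synset_offset` and `build_offsets_dict` are hypothetical helper names, not NLTK API:

```python
def synset_offset(synset):
    """Return a synset's offset under both old and new NLTK.

    Before NLTK 3.0 `offset` was a plain attribute; from 3.0 on it is a
    method, so it is only called when it is callable.
    """
    off = synset.offset
    return off() if callable(off) else off


def build_offsets_dict(synsets):
    # Same offset -> synset dict as in the answer, independent of NLTK version.
    return {synset_offset(s): s for s in synsets}
```

It keeps the answer's offset-only key, so the cross-POS collision caveat still applies.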


2012-09-11 21:53:53

Interestingly, this raises a KeyError with NLTK 3.0. – duhaime

'offset' is now a method. Try this: 'offsets_dict = {s.offset(): s for s in wordnet.all_synsets()}' – Omer

*"There is no built-in method in NLTK"*: now there is! See donners45's answer; this one is outdated. –

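Besides NLTK, another option is to use the .tab file for Princeton WordNet from the Open Multilingual WordNet (http://compling.hss.ntu.edu.sg/omw/). I usually use the following recipe to load the wordnet as a dictionary, with the synset offset as the key and a ;-separated string of words as the value: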

import codecs


# Return every key whose ";"-separated value list contains the given value.
def getKey(dic, value):
    return [k for k, v in dic.items() if value in v.split(";")]


# Read Open Multi WN's .tab file.
# option="ss": synset offset -> words; any other value: word -> synset offsets.
def readWNfile(wnfile, option="ss"):
    with codecs.open(wnfile, "r", "utf8") as f:
        reader = f.readlines()
    wn = {}
    for l in reader:
        if l[0] == "#":  # skip the header line
            continue
        cols = l.rstrip("\r\n").split("\t")
        if option == "ss":
            k = cols[0]  # synset offset as key, e.g. "05576222-n"
            v = cols[2]  # word as value
        else:
            v = cols[0]  # synset offset as value
            k = cols[2]  # word as key
        # Join repeated keys into one ";"-separated string
        if k in wn:
            wn[k] = wn[k] + ";" + v
        else:
            wn[k] = v
    return wn


princetonWN = readWNfile("wn-data-eng.tab")
offset = "n#05576222"
# Convert "n#05576222" into the .tab key format "05576222-n"
offset = offset.split("#")[1] + "-" + offset.split("#")[0]

print(princetonWN[offset].split(";"))
print(getKey(princetonWN, "heatstroke"))


2013-02-02 02:21:28 alvas

Since NLTK 3.2.3, there is a public method for this:

wordnet.synset_from_pos_and_offset(pos, offset)
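To call this with the ID from the question, "n#05576222" has to be split into a POS letter and an integer offset. A small sketch; `parse_offset_id` is a hypothetical helper, and the "pos#offset" format is simply the one shown in the question:

```python
def parse_offset_id(offset_id):
    """Split an ID such as "n#05576222" into ("n", 5576222).

    The offset is returned as an int, which is what
    synset_from_pos_and_offset(pos, offset) takes.
    """
    pos, _, offset = offset_id.partition("#")
    if not pos or not offset.isdigit():
        raise ValueError(f"unexpected offset ID: {offset_id!r}")
    return pos, int(offset)
```

The lookup would then be `wordnet.synset_from_pos_and_offset(*parse_offset_id("n#05576222"))`.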

In earlier versions you can use:

wordnet._synset_from_pos_and_offset(pos, offset)

This returns a synset based on its POS and offset ID. I think this method only works with NLTK 3.0, but I'm not sure.

Example:

from nltk.corpus import wordnet as wn 
wn._synset_from_pos_and_offset('n',4543158) 
# => Synset('wagon.n.01')
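If you don't know which NLTK version your code will run on, a small wrapper can prefer the public method and fall back to the private one. This is a sketch that assumes the older version you run actually has `_synset_from_pos_and_offset`; `synset_by_offset` is a hypothetical name:

```python
def synset_by_offset(reader, pos, offset):
    """Look up a synset by POS and offset on a WordNet corpus reader.

    Uses the public synset_from_pos_and_offset (NLTK >= 3.2.3) when it
    exists and otherwise falls back to the private _synset_from_pos_and_offset.
    """
    lookup = getattr(reader, "synset_from_pos_and_offset", None)
    if lookup is None:
        lookup = reader._synset_from_pos_and_offset
    return lookup(pos, offset)
```

For example, `synset_by_offset(wn, 'n', 4543158)` should give the same result as the example above.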


2014-11-26 09:37:04 donners45

You can use of2ss(), for example:

from nltk.corpus import wordnet as wn 
syn = wn.of2ss('01580050a')

which returns Synset('necessary.a.01').
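Going by the example, of2ss() takes the offset as an eight-digit, zero-padded number followed by the POS letter. A sketch that turns the question's "n#05576222" format into that string; `to_of2ss_id` is a hypothetical helper:

```python
def to_of2ss_id(offset_id):
    """Turn "n#05576222" into "05576222n", the form passed to wn.of2ss()."""
    pos, _, offset = offset_id.partition("#")
    # Zero-pad to eight digits in case the offset lost its leading zeros.
    return f"{int(offset):08d}{pos}"
```

`wn.of2ss(to_of2ss_id("n#05576222"))` would then look up the synset from the question.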


2017-03-20 14:36:28 carcar
