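This particular output format of the Stanford Parser is called a "bracketed parse (tree)". It is meant to be read as a graph with
- words as nodes (e.g. As, an, accountant)
- phrases/clauses as labels (e.g. S, NP, VP)
- edges that link the nodes hierarchically, and
- typically an artificial ROOT (or TOP) node added as the root of the parse

(In this case you can read it as a directed acyclic graph (DAG), since it is unidirectional and non-cyclic.)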
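To make the graph reading concrete, here is a minimal sketch in plain Python. The nested tuple `pp` is a hand-written stand-in for the PP subtree of the example sentence, and `edges` is a hypothetical helper written for illustration; neither is part of the Stanford Parser or of any library.

```python
# Hand-written stand-in for "(PP (IN As) (NP (DT an) (NN accountant)))".
# Assumption: a phrase is a (label, children) tuple and a word is a plain string.
pp = ("PP", [("IN", ["As"]),
             ("NP", [("DT", ["an"]), ("NN", ["accountant"])])])


def edges(node):
    """Yield (parent, child) pairs, walking from the root down to the words."""
    label, children = node
    for child in children:
        if isinstance(child, str):
            # A word: record the edge and stop descending.
            yield (label, child)
        else:
            # A phrase: record the edge to its label, then descend into it.
            yield (label, child[0])
            yield from edges(child)


edge_list = list(edges(pp))
for parent, child in edge_list:
    print(parent, "->", child)
```

Every edge points from a parent to a child and no node is reached twice, which is why the structure can be read as a DAG. In the full sentence labels such as NP occur more than once, so a real graph would need unique node identifiers rather than bare labels.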
There are libraries that can read bracketed parses, e.g. NLTK's nltk.tree.Tree (http://www.nltk.org/howto/tree.html):
>>> from nltk.tree import Tree
>>> output = '(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) (VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))'
>>> parsetree = Tree.fromstring(output)
>>> print(parsetree)
(ROOT
  (S
    (PP (IN As) (NP (DT an) (NN accountant)))
    (NP (PRP I))
    (VP
      (VBP want)
      (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))
>>> parsetree.pretty_print()
                          ROOT
                           |
                           S
     ______________________|________
    |                  |            VP
    |                  |    ________|____
    |                  |   |             S
    |                  |   |             |
    |                  |   |             VP
    |                  |   |     ________|___
    PP                 |   |    |            VP
 ___|___               |   |    |    ________|___
|       NP             NP  |    |   |            NP
|    ___|______        |   |    |   |         ___|_____
IN  DT         NN     PRP VBP   TO  VB       DT        NN
|   |          |       |   |    |   |        |         |
As  an     accountant  I  want  to make      a      payment
>>> parsetree.leaves()
['As', 'an', 'accountant', 'I', 'want', 'to', 'make', 'a', 'payment']
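FWIW, this is how nested lists are represented in Lisp. Imagine square brackets instead of round parentheses, and quotes around the tokens, if that helps. – tripleee
@tripleee Out of curiosity, is there a native Python regex or function to read Lisp-like nested lists as Python nested lists? – alvas
Definitely not a regex! I couldn't find a built-in parser, but see http://stackoverflow.com/questions/3182594/parsing-s-expressions-in-python and https://sexpdata.readthedocs.org/en/latest/ – tripleee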
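For illustration, reading the brackets into Python nested lists could look roughly like the sketch below. `read_brackets` is a hypothetical name, not a standard-library or NLTK function. A regular expression alone cannot track nesting, so it is used here only to split the text into tokens, and a stack does the nesting. The sketch assumes plain parser output and does not handle quoted strings or escapes the way a full S-expression reader would.

```python
import re


def read_brackets(text):
    """Read a bracketed parse into nested lists of strings.

    Returns a list of the top-level expressions found in `text`.
    """
    # Tokens are "(", ")" or any run of characters without spaces or parentheses.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    stack = [[]]  # stack[0] collects the top-level expressions
    for tok in tokens:
        if tok == "(":
            stack.append([])            # open a new nested list
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)  # attach the closed list to its parent
        else:
            stack[-1].append(tok)       # a label or a word
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


nested = read_brackets("(NP (DT a) (NN payment))")
print(nested)  # [['NP', ['DT', 'a'], ['NN', 'payment']]]
```

The first item of each inner list is the label and the remaining items are its children, which matches the "square brackets and quotes" picture described in the first comment.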