You'd create a dictionary of sets to capture the connections:
connections = {}
for topic, (conns, some_number) in data:
    for conn in conns:
        connections.setdefault(conn, set()).add(topic)
This maps each connection value to the set of topics that use it.
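As a rough sketch of what that mapping ends up holding, here is the same loop written with `collections.defaultdict` in place of `setdefault` (a substitution on my part, not something the question requires), run against the sample data from the demo in this answer:

```python
from collections import defaultdict

# Sample data in the shape used in this answer:
# (topic, (list of connection values, some number)).
data = [
    ('topic1', (['apples', 'oranges'], 0.14975108213820515)),
    ('topic2', (['oranges', 'raisins'], 0.14975108213820515)),
    ('topic3', (['grapes', 'raisins'], 0.14975108213820515)),
    ('topic4', (['trees', 'flowers'], 0.14975108213820515)),
]

# defaultdict(set) creates an empty set the first time a key is accessed,
# so it plays the same role as connections.setdefault(conn, set()).
connections = defaultdict(set)
for topic, (conns, _number) in data:
    for conn in conns:
        connections[conn].add(topic)

# Sort keys and values only to make the printout stable; sets are unordered.
for conn in sorted(connections):
    print(conn, sorted(connections[conn]))
# apples ['topic1']
# flowers ['topic4']
# grapes ['topic3']
# oranges ['topic1', 'topic2']
# raisins ['topic2', 'topic3']
# trees ['topic4']
```

Values shared by more than one topic (`oranges`, `raisins`) are what link topics together in the next step.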
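Now you can look up the reverse connections; if order doesn't matter, just take the union of the sets for each entry's connection values: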
output = [tuple(set().union(*(connections[c] for c in conns)))
          for topic, (conns, some_number) in data]
Demo:
>>> data = [('topic1', (['apples', 'oranges'], 0.14975108213820515)),
...         ('topic2', (['oranges', 'raisins'], 0.14975108213820515)),
...         ('topic3', (['grapes', 'raisins'], 0.14975108213820515)),
...         ('topic4', (['trees', 'flowers'], 0.14975108213820515))]
>>> connections = {}
>>> for topic, (conns, some_number) in data:
...     for conn in conns:
...         connections.setdefault(conn, set()).add(topic)
...
>>> [tuple(set().union(*(connections[c] for c in conns)))
...  for topic, (conns, some_number) in data]
[('topic1', 'topic2'), ('topic1', 'topic3', 'topic2'), ('topic3', 'topic2'), ('topic4',)]
>>> from pprint import pprint
>>> pprint(_)
[('topic1', 'topic2'),
 ('topic1', 'topic3', 'topic2'),
 ('topic3', 'topic2'),
 ('topic4',)]
To put topic at the front instead, remove it from the set first and then prepend it:
output = [(topic,) + tuple(set().union(*(connections[c] for c in conns)) - {topic})
          for topic, (conns, some_number) in data]
>>> [(topic,) + tuple(set().union(*(connections[c] for c in conns)) - {topic})
...  for topic, (conns, some_number) in data]
[('topic1', 'topic2'), ('topic2', 'topic1', 'topic3'), ('topic3', 'topic2'), ('topic4',)]
>>> pprint(_)
[('topic1', 'topic2'),
 ('topic2', 'topic1', 'topic3'),
 ('topic3', 'topic2'),
 ('topic4',)]
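Since sets are unordered, the order of the remaining topics in each tuple can differ between runs. If you need stable output, one option is to sort them. The following is only a sketch: `related_topics` and its `topic_first` flag are names I made up for illustration, and sorting alphabetically is an assumption about what order you would want.

```python
def related_topics(data, topic_first=True):
    """Return, per entry, a tuple of the topics sharing a connection value with it."""
    # Step 1: map each connection value to the set of topics that use it.
    connections = {}
    for topic, (conns, _number) in data:
        for conn in conns:
            connections.setdefault(conn, set()).add(topic)

    # Step 2: union the sets for each entry's connection values.
    output = []
    for topic, (conns, _number) in data:
        related = set().union(*(connections[c] for c in conns))
        if topic_first:
            # The entry's own topic leads, followed by the others, sorted.
            output.append((topic,) + tuple(sorted(related - {topic})))
        else:
            output.append(tuple(sorted(related)))
    return output


data = [
    ('topic1', (['apples', 'oranges'], 0.14975108213820515)),
    ('topic2', (['oranges', 'raisins'], 0.14975108213820515)),
    ('topic3', (['grapes', 'raisins'], 0.14975108213820515)),
    ('topic4', (['trees', 'flowers'], 0.14975108213820515)),
]

print(related_topics(data))
# [('topic1', 'topic2'), ('topic2', 'topic1', 'topic3'), ('topic3', 'topic2'), ('topic4',)]
```

Note that this only links topics that directly share a value: topic1 and topic3 do not appear in each other's tuples.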
Topic2 shares elements with both Topic1 and Topic3, but Topic1 and Topic3 share no elements with each other and are related only through Topic2. Does that matter? – MeetTitan 2014-12-02 18:14:47