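parse_edgelist with duplicate edges to a weighted network?

2014-07-10

Tags: python, networkx

I have written some code to import an edge list from a .txt file that holds some Twitter relationship data. The data is directed and contains duplicate rows. I would like to load it into a DiGraph() with edge weights, but I can't work out that part. I was thinking of using something like Counter() to count the duplicate edges, but I'm not sure how to do the counting and get the counts into the graph.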

I have included a sample of the .txt file to show what my data looks like.

Sample .txt file data

# twitter data 
# retrieved at: 07.08.2014 
# total number of records: 8 
# exported by: userXYZ 
# 
# fields: date, time, source, target 
10.12.2013; 02:00; tweeterA; tweeterB 
10.12.2013; 02:01; tweeterB; tweeterC 
10.13.2013; 02:04; tweeterC; tweeterA 
10.13.2013; 02:08; tweeterC; tweeterA 
10.13.2013; 02:10; tweeterD; tweeterB 
10.13.2013; 02:11; tweeterA; tweeterC 
10.13.2013; 02:13; tweeterC; tweeterB 
10.13.2013; 02:18; tweeterA; tweeterD 

Existing code

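import networkx as nx

header = ['date', 'time', 'source', 'target']

# Parse every non-comment, non-blank line into a dict keyed by field name.
with open('data.txt') as f:
    data = [dict(zip(header, line.strip().split('; ')))
            for line in f
            if line.strip() and not line.startswith('#')]

# parse_edgelist expects one "source target" string per edge.
edgelist = [' '.join([row['source'], row['target']]) for row in data]

# Duplicate edges collapse into a single edge here, so no weights are recorded.
G = nx.parse_edgelist(edgelist, create_using=nx.DiGraph())

nx.draw(G)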

Answer

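You are very close, I think, and yes, you can use collections.Counter(). You then need to associate the count with each edge by setting it as a data attribute (here, 'weight') on that edge.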

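from collections import Counter
from pprint import pprint

import networkx as nx

# Count each "source target" pair; fields 2 and 3 of a data line are the two users.
with open('data.txt') as f:
    edge_counts = Counter(' '.join(line.strip().split('; ')[2:])
                          for line in f
                          if line.strip() and not line.startswith('#'))

# Emit "source target count" lines and let parse_edgelist read the count as an int weight.
G = nx.parse_edgelist(('%s %d' % edge for edge in edge_counts.items()),
                      data=(('weight', int),),
                      create_using=nx.DiGraph())

# nx.draw(G)

pprint(sorted(G.edges(data=True)))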

That should give you output like this:

[('tweeterA', 'tweeterB', {'weight': 1}), 
('tweeterA', 'tweeterC', {'weight': 1}), 
('tweeterA', 'tweeterD', {'weight': 1}), 
('tweeterB', 'tweeterC', {'weight': 1}), 
('tweeterC', 'tweeterA', {'weight': 2}), 
('tweeterC', 'tweeterB', {'weight': 1}), 
('tweeterD', 'tweeterB', {'weight': 1})] 
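
As an alternative to formatting the counts back into strings for parse_edgelist, the counted pairs can be handed to the graph directly. The following is only a sketch under a few assumptions: the helper name read_weighted_digraph and the inline SAMPLE string are made up for illustration (they stand in for data.txt), and the file is assumed to use the same "; "-separated four-field layout as the sample in the question. It uses DiGraph.add_weighted_edges_from, which stores the third item of each tuple under the 'weight' attribute.

```python
from collections import Counter
import io

import networkx as nx

# Illustrative stand-in for data.txt, copied from the sample in the question.
SAMPLE = """\
# twitter data
# fields: date, time, source, target
10.12.2013; 02:00; tweeterA; tweeterB
10.12.2013; 02:01; tweeterB; tweeterC
10.13.2013; 02:04; tweeterC; tweeterA
10.13.2013; 02:08; tweeterC; tweeterA
10.13.2013; 02:10; tweeterD; tweeterB
10.13.2013; 02:11; tweeterA; tweeterC
10.13.2013; 02:13; tweeterC; tweeterB
10.13.2013; 02:18; tweeterA; tweeterD
"""


def read_weighted_digraph(lines):
    """Build a DiGraph whose 'weight' is the number of repeated (source, target) rows.

    Hypothetical helper; `lines` is any iterable of text lines (an open file works).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '#' comment header.
        if not line or line.startswith('#'):
            continue
        fields = [field.strip() for field in line.split(';')]
        # Assumed layout: date, time, source, target. Ignore malformed rows.
        if len(fields) != 4:
            continue
        _date, _time, source, target = fields
        counts[(source, target)] += 1

    G = nx.DiGraph()
    # Each (u, v, w) tuple becomes the edge u -> v with attribute weight=w.
    G.add_weighted_edges_from((u, v, w) for (u, v), w in counts.items())
    return G


G = read_weighted_digraph(io.StringIO(SAMPLE))
for u, v, attrs in sorted(G.edges(data=True)):
    print(u, v, attrs['weight'])
```

With a real file, the same function could be called as read_weighted_digraph(open('data.txt')), ideally inside a with block. Keeping the edges as (source, target) tuples also avoids problems if a user name ever contains a space, which the string-based approach would split incorrectly.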