2017-03-05 30 views
-2

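I have a data frame made up of Word (an English word), sentence_ID (the sentence number), and Flag (whether the word is part of the sentence): Flag = 1 means the word lies inside the sentence boundary, and Flag = 0 means the word sits at the edge of the sentence. How can I rank words according to their distance from the sentence boundary?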

I want to rank the words based on how far they are from the center of the sentence. So, given the input

Word sentence_ID Flag 
A 1 1 
B 1 1 
C 1 1 
D 1 1 
E 1 1 
A 1 0 
F 2 1 
G 2 1 
H 2 1 
I 2 1 
A 2 0 
J 0 0 
k 0 0 
M 0 0 
C 3 1 
D 3 1 
E 3 1 
A 3 1 
F 3 1 
G 3 1 
H 3 1 
I 3 1 
A 3 1 
J 3 1 
G 3 0 
H 0 0 
I 0 0 
L 4 1 

the output should be

Word sentence_ID Flag Rank 
A 1 1 1 
B 1 1 2 
C 1 1 3 
D 1 1 3 
E 1 1 2 
A 1 0 1 
F 2 1 1 
G 2 1 2 
H 2 1 3 
I 2 1 2 
A 2 0 1 
J 0 0 
k 0 0 
M 0 0 
C 3 1 1 
D 3 1 2 
E 3 1 3 
A 3 1 4 
F 3 1 5 
G 3 1 6 
H 3 1 5 
I 3 1 4 
A 3 1 3 
J 3 1 2 
G 3 0 1 
H 0 0 
I 0 0 
L 4 1 1 
+0

OK. What is your question? This is too broad. – Carcigenicate

+0

The question is: how do we find the Rank column? –

Answers

0

Try this, for example:

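sentence = [["foo", 0], ["bar", 0], ["baz", 0], ["foo", 0], ["bar", 0]]  # lists, so the rank slot can be assigned
words = len(sentence)
# Size of the first half; it includes the middle word when the count is odd
center = (words + 1) // 2

# First half: the rank climbs from 1 at the left edge toward the center
for rank, i in enumerate(range(0, center), 1):
    sentence[i][1] = rank

# Second half: the rank climbs from 1 at the right edge toward the center
for rank, i in enumerate(reversed(range(center, words)), 1):
    sentence[i][1] = rank

print(sentence)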
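The same ranking can also be written without tracking a center at all: a word at 0-based position k in a sentence of n words is min(k + 1, n - k) steps from the nearer edge. A minimal sketch of that idea, where `edge_ranks` is a hypothetical helper name and not something from the question:

```python
def edge_ranks(n):
    """Return the rank of every position in a sentence of n words.

    The rank is 1 at both edges and grows by one per step toward the middle.
    """
    return [min(k + 1, n - k) for k in range(n)]


print(edge_ranks(5))  # [1, 2, 3, 2, 1]
print(edge_ranks(6))  # [1, 2, 3, 3, 2, 1]
```

The two calls correspond to the five-row and six-row sentences in the question, and the results match the Rank column given there.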
0

After six hours of coding, I found the solution I was looking for:

import pandas as pd

# f_Name is the path to the semicolon-separated input file (defined elsewhere)
df = pd.read_csv(f_Name, sep=";", index_col=False)
# Length of each sentence, indexed by sentence_ID
sentence_lengths = df.groupby("sentence_ID").size()

j = 0
for index in range(len(df)):
    sentence_id = df.loc[index, 'sentence_ID']
    if sentence_id == 0:
        continue  # rows outside any sentence keep an empty rank
    if index == 0 or sentence_id != df.loc[index - 1, 'sentence_ID']:
        # A new sentence starts here: reset the counter and remember where it begins
        j = 0
        start = index
        CurrentSentensLength = sentence_lengths.loc[sentence_id]
    if index < start + (CurrentSentensLength + 1) // 2:
        j = j + 1  # first half: the rank climbs toward the center
    elif index > start + CurrentSentensLength // 2:
        j = j - 1  # second half: the rank falls toward the far edge
    # otherwise this is the second middle row of an even-length sentence, which keeps the same rank

    df.loc[index, 'Gloss_Rank_On_Sentense'] = j
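For larger files, the row-by-row loop could probably be replaced by grouped operations. The sketch below is one possible way to do it, not the method used above. It assumes that each non-zero sentence_ID labels exactly one sentence, it builds a small frame from the first rows of the question's sample instead of reading a file, and the names `forward`, `backward` and `nearest` are made up for illustration.

```python
import pandas as pd

# First rows of the sample data from the question (sentences 1 and 2, then three rows with ID 0)
df = pd.DataFrame({
    "Word": list("ABCDEA") + list("FGHIA") + list("JkM"),
    "sentence_ID": [1] * 6 + [2] * 5 + [0] * 3,
    "Flag": [1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0],
})

grouped = df.groupby("sentence_ID")
forward = grouped.cumcount()                  # 0-based distance from the first row of the sentence
backward = grouped.cumcount(ascending=False)  # 0-based distance from the last row of the sentence

# Distance to the nearer edge, shifted so that the edges get rank 1
nearest = forward.where(forward <= backward, backward)
# Rows with sentence_ID 0 belong to no sentence, so their rank is left empty (NaN)
df["Rank"] = (nearest + 1).where(df["sentence_ID"] != 0)

print(df)
```

Because the empty ranks are stored as NaN, the Rank column comes out as floats; that is a side effect of this sketch rather than a requirement of the task.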