2017-06-12
d_hsp={"1":"I","2":"II","3":"III","4":"IV","5":"V","6":"VI","7":"VII","8":"VIII", 
     "9":"IX","10":"X","11":"XI","12":"XII","13":"XIII","14":"XIV","15":"XV", 
     "16":"XVI","17":"XVII","18":"XVIII","19":"XIX","20":"XX","21":"XXI", 
     "22":"XXII","23":"XXIII","24":"XXIV","25":"XXV"} 
HSP_OLD['tryl'] = HSP_OLD['tryl'].replace(d_hsp, regex=True) 

Convert decimal numbers to Roman numerals

HSP_OLD is a DataFrame and tryl is one of its columns. Here are some example values from tryl:

SAF/HSP: Secondary diagnosis E code 1

SAF/HSP: Secondary diagnosis E code 11

I used a dictionary to do the replacement. It works for 1-10, but 11 becomes "II" and 12 becomes "III".

The title seems to be the opposite of the code. – stark

Answers

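You need to preserve the order of the items and start the search with the longest substrings. Otherwise "1" is replaced before "11" is ever tried, which is why 11 ends up as "II".

You can use an OrderedDict here. To initialize it, use a list of tuples. You could already reverse the list at initialization, but you can also do that afterwards.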

import collections 
import pandas as pd 
# My test data  
HSP_OLD = pd.DataFrame({'tryl':['1. Text', '11. New Text', '25. More here']}) 

d_hsp_lst=[("1","I"),("2","II"),("3","III"),("4","IV"),("5","V"),("6","VI"),("7","VII"),("8","VIII"), ("9","IX"),("10","X"),("11","XI"),("12","XII"),("13","XIII"),("14","XIV"),("15","XV"), ("16","XVI"),("17","XVII"),("18","XVIII"),("19","XIX"),("20","XX"),("21","XXI"), ("22","XXII"),("23","XXIII"),("24","XXIV"),("25","XXV")] 
d_hsp = collections.OrderedDict(d_hsp_lst) # Creating the OrderedDict 
d_hsp = collections.OrderedDict(reversed(d_hsp.items())) # Here, reversing 

>>> HSP_OLD['tryl'] = HSP_OLD['tryl'].replace(d_hsp, regex=True)
>>> HSP_OLD
             tryl
0         I. Text
1    XI. New Text
2  XXV. More here
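
The "longest substring first" idea above can also be sketched without pandas, using a single regular expression whose alternatives are sorted by length. This is only an illustration under stated assumptions: the names NUMERALS, PATTERN and replace_numbers are made up for this example, and the mapping covers 1-25 only, as in the question.

```python
import re

# Same 1-25 mapping as in the question (illustrative name, not from the original code)
NUMERALS = {"1": "I", "2": "II", "3": "III", "4": "IV", "5": "V", "6": "VI",
            "7": "VII", "8": "VIII", "9": "IX", "10": "X", "11": "XI",
            "12": "XII", "13": "XIII", "14": "XIV", "15": "XV", "16": "XVI",
            "17": "XVII", "18": "XVIII", "19": "XIX", "20": "XX", "21": "XXI",
            "22": "XXII", "23": "XXIII", "24": "XXIV", "25": "XXV"}

# Put the longest keys first so that "11" is tried before "1"
keys_longest_first = sorted(NUMERALS, key=len, reverse=True)
PATTERN = re.compile("|".join(re.escape(k) for k in keys_longest_first))

def replace_numbers(text):
    # One pass over the string: text that was already replaced is never rescanned
    return PATTERN.sub(lambda m: NUMERALS[m.group(0)], text)

print(replace_numbers("SAF/HSP: Secondary diagnosis E code 11"))
```

Because everything happens in one pass, the order of the dictionary itself no longer matters. Numbers outside the mapping (for example 26) would still be converted piecewise, so this shares the 1-25 limitation of the dictionary approach.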

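Sorry, I didn't notice that you're not just updating the whole field but actually want to replace the number at the end. Even so, it's much better to convert your numbers to Roman numerals properly than to map every possible occurrence (what would happen to your code if a number is larger than 25?). So, here is one way to do it: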

ROMAN_MAP = [(1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'), (100, 'C'), (90, 'XC'),
             (50, 'L'), (40, 'XL'), (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I')]

def romanize(data):
    if not data or not isinstance(data, str):  # we know how to work with strings only
        return data
    data = data.rstrip()  # remove potential extra whitespace at the end
    space_pos = data.rfind(" ")  # find the last space before the number
    if space_pos != -1:
        try:
            number = int(data[space_pos + 1:])  # get the number at the end
        except (TypeError, ValueError):
            return data  # couldn't extract a number
        if number <= 0:  # zero and negatives have no Roman form; leave them alone
            return data
        roman_number = ""
        for i, r in ROMAN_MAP:  # simple substitution based on the above ROMAN_MAP
            while number >= i:  # greedily subtract the largest value that still fits
                roman_number += r
                number -= i
        return data[:space_pos + 1] + roman_number  # put everything back together
    return data

So, if we create our own DataFrame:

HSP_OLD = pd.DataFrame({"tryl": ["SAF/HSP: Secondary diagnosis E code 1",
                                 None,
                                 "SAF/HSP: Secondary diagnosis E code 11",
                                 "Something else without a number at the end"]})

we can now easily apply our function over the whole column:

HSP_OLD['tryl'] = HSP_OLD['tryl'].apply(romanize)

Resulting in:

                                         tryl
0       SAF/HSP: Secondary diagnosis E code I
1                                        None
2      SAF/HSP: Secondary diagnosis E code XI
3  Something else without a number at the end

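Of course, you can adapt the romanize() function to your needs so that it searches for any number within the string and converts it to Roman numerals. This is just an example of how to quickly find a number at the end of a string.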
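
As a rough sketch of that adaptation, the variant below converts every standalone run of digits in a string rather than only the trailing one. The helper names to_roman and romanize_all are hypothetical and not part of the answer's code, and the choice to leave 0 untouched is an assumption, since it has no Roman form.

```python
import re

ROMAN_MAP = [(1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'), (100, 'C'), (90, 'XC'),
             (50, 'L'), (40, 'XL'), (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I')]

def to_roman(number):
    """Convert a positive integer to a Roman numeral string."""
    parts = []
    for value, symbol in ROMAN_MAP:
        count, number = divmod(number, value)  # how many times this symbol fits
        parts.append(symbol * count)
    return "".join(parts)

def romanize_all(text):
    """Replace every standalone run of digits in text with its Roman numeral."""
    def _convert(match):
        n = int(match.group())
        # Assumption: 0 is kept as-is because it cannot be written in Roman numerals
        return to_roman(n) if n > 0 else match.group()
    # \b keeps digits that are glued to letters (e.g. "E11") untouched
    return re.sub(r"\b\d+\b", _convert, text)

print(romanize_all("Chapter 4 of 12, code 11"))
```

When applied to a DataFrame column with .apply(), non-string values such as None would still need the same isinstance guard that romanize() uses.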