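I have a DataFrame data with two columns, ID and Text. The goal is to split the values in the Text column into multiple columns based on dates. In general, a date starts a run of string values that belong in one column, unless the date is at the end of the string (in which case it is treated as part of the string that began with the previous date). How can I use dates to split a DataFrame column into multiple columns in Python?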
data:
ID Text
10 6/26/06 begin tramadol, penicilin X 6 CYCLES. 1000mg tylenol X 1 YR after 11/2007
20 7/17/06-advil, qui;
10 7/19/06-ibuprofen. 8/31/06-penicilin, tramadol;
40 9/26/06-penicilin, tramadol;
91 5/23/06-penicilin, amoxicilin, tylenol;
84 10/20/06-ibuprofen, tramadol;
17 12/19/06-vit D, tramadol. 12/1/09 -6/18/10 vit D only for 5 months. 3/7/11 f/up
23 12/19/06-vit D, tramadol; 12/1/09 -6/18/10 vit D; 3/7/11 video follow-up
15 Follow up appt. scheduled
69 talk to care giver
32 12/15/06-2/16/07 everyday Follow-up; 6/8/16 discharged after 2 months
70 12/1/06?Follow up but no serious allergies
70 12/12/06-tylenol, vit D,advil; 1/26/07 scheduled surgery but had to cancel due to severe allergic reactions to advil
Expected output:
ID | Text | Text2 | Text3
10 | 6/26/06 begin tramadol, penicilin X 6 CYCLES. 1000mg tylenol X 1 YR after 11/2007 | |
20 | 7/17/06-advil, qui; | |
10 | 7/19/06-ibuprofen. | 8/31/06-penicilin, tramadol; |
40 | 9/26/06-penicilin, tramadol; | |
91 | 5/23/06-penicilin, amoxicilin, tylenol; | |
84 | 10/20/06-ibuprofen, tramadol; | |
17 | 12/19/06-vit D, tramadol. | 12/1/09 -6/18/10 vit D only for 5 months. | 3/7/11 f/up
23 | 12/19/06-vit D, tramadol; | 12/1/09 -6/18/10 vit D; | 3/7/11 video follow-up
15 | Follow up appt. scheduled | |
69 | talk to care giver | |
32 | 12/15/06-2/16/07 everyday Follow-up; | 6/8/16 discharged after 2 months |
70 | 12/1/06?Follow up but no serious allergies | |
70 | 12/12/06-tylenol, vit D,advil; | 1/26/07 scheduled surgery but had to cancel due to severe allergic reactions to advil |
My code so far:
import re
import datefinder

d = []
for i in data.Text:
    d = list(datefinder.find_dates(i))  # I can get the dates so far, but I still want to format the date values as %m/%d/%Y
    if len(d) > 1:  # check for every record that has more than one date
        for j in range(len(d)):
            i = " " + " ".join(re.split(r'[^a-z 0-9/-]', i.lower())) + " "  # clean the text strings of any special characters
            # data.Text[j] = d[j]r'[/^(.*?)]'d[j+1]'/'  # this is not working
# The goal is for the Text column to retain the string from the first date up to just before the second date. Then create a new Text2 column holding everything from the second date up to just before the third date. If there are more dates, create Textn and so on.
# Exception: if a date immediately follows a date (e.g. 12/1/09 -6/18/10) or a date ends a value string (e.g. 6/26/06 begin tramadol, penicilin X 6 CYCLES. 1000mg tylenol X 1 YR after 11/2007), they should be kept in the same column.
Any ideas on how to make this work would save my day. Thanks!
Are all the relevant dates in MM/DD/YY format? –
@Brad Solomon - preferably in mm/dd/yyyy format. Thanks! – CodeLearner
I meant in your input data –
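For what it's worth, the splitting rule described in the question could be sketched roughly as below. This is only a sketch under some assumptions: the dates in the input are written as M/D/YY or M/D/YYYY (so something like 11/2007 is not treated as a date), a second date separated from the previous one only by spaces or a hyphen is treated as the end of a range, and a date followed only by trailing punctuation is treated as ending the string. The names DATE and split_on_dates are made up for illustration, and the code uses a plain regular expression rather than datefinder.

```python
import re

# Assumed date shape: M/D/YY or M/D/YYYY (hypothetical pattern, adjust to the real data).
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")

def split_on_dates(text):
    """Split text into pieces, each starting at a date that opens a new entry."""
    text = text.strip()
    cut_points = []
    prev_end = None
    for m in DATE.finditer(text):
        # A date that follows another date with only spaces/hyphens in between
        # is the second half of a range such as "12/1/09 -6/18/10".
        closes_range = prev_end is not None and text[prev_end:m.start()].strip(" -") == ""
        # A date with nothing but punctuation after it ends the string.
        ends_string = text[m.end():].strip(" .;") == ""
        if m.start() > 0 and not closes_range and not ends_string:
            cut_points.append(m.start())
        prev_end = m.end()
    bounds = [0] + cut_points + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]

if __name__ == "__main__":
    samples = [
        "7/19/06-ibuprofen. 8/31/06-penicilin, tramadol;",
        "12/19/06-vit D, tramadol. 12/1/09 -6/18/10 vit D only for 5 months. 3/7/11 f/up",
        "Follow up appt. scheduled",
    ]
    for s in samples:
        print(split_on_dates(s))
```

With pandas, the lists returned by data["Text"].apply(split_on_dates) could then be spread into columns with something like pd.DataFrame(parts.tolist(), index=data.index), renaming the resulting columns to Text, Text2, Text3 and so on. Rows with fewer pieces are padded with missing values.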