2017-04-15 74 views

Merging similar descriptions in bank statement output

Below is a sample portion of a bank statement:

Description<-c(
"EXXONMOBIL 46344172 ", 
"EXXONMOBIL 97142239 ", 
"EXXONMOBIL 97523322 ", 
"EXXONMOBIL 99123183 ", 
"JIMMY JOHNS - 1236 ", 
"JIMMY JOHNS - 2453 ", 
"JIMMY JOHNS # 95612 ", 
"KWIK FILL 212 ", 
"KWIK TRIP 245000", 
"KWIK TRIP0002342 ", 
"KWIK TRIP 67200003453 ", 
"MCDONALD'S F11123 ", 
"MCDONALD'S F11234 ", 
"MCDONALD'S F25345 ", 
"MCDONALD'S F5349 " 
) 

Debit<-as.numeric(c(
"25.98", 
"24.54", 
"29.59", 
"31.85", 
"7.61", 
"17.82", 
"10.58", 
"26.5", 
"22.48", 
"146.62", 
"52.51", 
"2.57", 
"7.77", 
"9.59", 
"11.85" 
)) 

df<-data.frame(Description,Debit) 

which gives the following output:

Description             Debit
EXXONMOBIL 46946182     25.98
EXXONMOBIL 97302509     24.54
EXXONMOBIL 97585822     29.59
EXXONMOBIL 99374183     31.85
JIMMY JOHNS - 1476      7.61
JIMMY JOHNS - 2763      17.82
JIMMY JOHNS # 90012     10.58
KWIK FILL 228           26.5
KWIK TRIP 24500002451   22.48
KWIK TRIP               146.62
KWIK TRIP 67200006726   52.51
MCDONALD'S F11780       2.57
MCDONALD'S F11883       7.77
MCDONALD'S F25398       9.59
MCDONALD'S F4789        11.85

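I am wondering how I could aggregate the results by description, so that the unique codes are removed and I get the total amount spent at each company, such as Exxonmobil, Jimmy Johns, etc. I am not sure whether to remove everything after a blank space, remove all numeric characters, or (which in my opinion might be best) get rid of all digits and special characters and keep only the letters.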

Either way, the desired output would look like this:

Description   Debit
EXXONMOBIL    111.96
JIMMY JOHNS   36.01
KWIK FILL     26.5
KWIK TRIP     221.61
MCDONALD'S    31.78

Any suggestions?

Check out [OpenRefine](http://openrefine.org) –

@BenBolker Thanks. It is not ideal, but it is a decent option because of its clustering feature. – Oposum

Answers

1

This would be simple with a regular expression.

E.g.

EXXONMOBIL.* (\d*\.\d*) 

You can see it working here...

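Once you have that, you can use any language to sum up the captured values, or change the name at the start of the pattern to search for a different group.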
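
Since the data in the question already lives in an R data frame, the same idea can be sketched directly in R. This is only a rough sketch under a few assumptions: the helper `clean_description` and the `Company` column are made-up names for illustration, and the patterns are tuned to the sample rows shown in the question, so real statements will likely need adjusting.

```r
# Sample data from the question (with commas between the strings so c() parses).
df <- data.frame(
  Description = c(
    "EXXONMOBIL 46344172 ", "EXXONMOBIL 97142239 ",
    "EXXONMOBIL 97523322 ", "EXXONMOBIL 99123183 ",
    "JIMMY JOHNS - 1236 ", "JIMMY JOHNS - 2453 ", "JIMMY JOHNS # 95612 ",
    "KWIK FILL 212 ",
    "KWIK TRIP 245000", "KWIK TRIP0002342 ", "KWIK TRIP 67200003453 ",
    "MCDONALD'S F11123 ", "MCDONALD'S F11234 ",
    "MCDONALD'S F25345 ", "MCDONALD'S F5349 "
  ),
  Debit = c(25.98, 24.54, 29.59, 31.85, 7.61, 17.82, 10.58, 26.5,
            22.48, 146.62, 52.51, 2.57, 7.77, 9.59, 11.85),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from the thread): strip the store/transaction codes.
clean_description <- function(x) {
  x <- as.character(x)
  # Drop runs of digits, plus a single letter that starts the code (e.g. "F11123").
  x <- gsub("(?:\\b[A-Za-z])?[0-9]+", "", x, perl = TRUE)
  # Drop the "#" and "-" separators left behind.
  x <- gsub("[#-]", "", x)
  # Collapse repeated whitespace and trim both ends.
  trimws(gsub("[[:space:]]+", " ", x))
}

df$Company <- clean_description(df$Description)

# Sum the debits per cleaned company name.
totals <- aggregate(Debit ~ Company, data = df, FUN = sum)
print(totals)
```

On the sample data this should give one row per company with the totals listed in the question's desired output. Note that the digit-stripping step would also mangle merchant names that legitimately contain digits, so it is worth checking the cleaned names before trusting the sums.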

Thanks. Do you mean I have to copy and paste my data into the website you linked, or do I do this in R? – Oposum