Remove punctuation from formatted text - Spark
This is a sample of my data:
case time (especially it's purse), read manual care, follow care instructions make stays waterproof -- example, inspect rubber seals doors (especially battery/memory card door open time)
xm "life support" picture . flip part bit flimsy guessing won't long . sound great altec speaker dock it! chance back base (xm3020) . traveling bag connect laptop extra speaker . amount paid ($25).
I want to remove all punctuation except the dot, and also remove words with length <= 2. For example, my expected output is:
case time especially its purse read manual care follow care instructions . make stays waterproof example inspect rubber seals doors especially batterymemory card door open time
life support picture . flip part bit flimsy guessing wont long . sound great altec speaker dock chance back base xm3020 . traveling bag connect laptop extra speaker . amount paid $25 .
This should be implemented in Scala. I have tried:
replaceAll("""\\W\s""", "")
replaceAll(""""[^a-zA-Z\.]""", "")
But neither works properly. Can anyone help me?
'$25' contains a special character that you did not remove. – tuxdna
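One possible approach, offered as a sketch rather than a definitive answer. Inside a triple-quoted string, `\\W\s` matches a literal backslash followed by `W` and a whitespace character, so the first attempt does not strip punctuation. The second attempt has a stray fourth quote that becomes part of the pattern, and its character class would also delete digits and spaces. The version below assumes that `$` should be kept (the expected output keeps `$25`), that dots should stand alone as separate tokens, and that "length <= 2" applies to whitespace-separated words. The names `CleanText` and `clean` are made up for illustration.

```scala
object CleanText {
  // Assumption: keep letters, digits, whitespace, '.' and '$'; drop everything else.
  private val unwanted = """[^\p{L}\p{N}\s.$]""".r

  def clean(line: String): String =
    unwanted
      .replaceAllIn(line, "")   // "it's" -> "its", "(xm3020)" -> "xm3020"
      .replace(".", " . ")      // make every dot its own token
      .split("""\s+""")
      .filter(tok => tok == "." || tok.length > 2) // drop short words, keep dots
      .mkString(" ")
}
```

For the second sample line this yields the expected output shown in the question. For the first line it yields the same words, but without the extra " . " before "make", which does not appear in the input. If the data lives in a Spark `RDD[String]` (here a hypothetical `lines`), `lines.map(CleanText.clean)` would apply it per record.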