2013-07-07
2

Python: remove characters, then join strings

I am writing a program to convert standard SVG paths into a Raphael.js-friendly format.

The path data is in the format

d="M 62.678745, 
    259.31235 L 63.560745, 
    258.43135 L 64.220745, 
    257.99135 L 64.439745, 
    258.43135 L 64.000745 
    ... 
    ... 
    " 

What I want to do is first strip the decimal digits, then remove the whitespace. The final result should be in the format

d="M62, 
    259L63, 
    258L64, 
    257L64, 
    258L64 
    ... 
    ... 
    " 

I have about 2000 of these paths to parse and convert into a JSON file.

What I have so far is

from bs4 import BeautifulSoup 

svg = open("/path/to/file.svg", "r").read() 
soup = BeautifulSoup(svg) 
paths = soup.findAll("path") 

raphael = [] 

for p in paths: 
    splitData = p['d'].split(",") 
    tempList = [] 

    for s in splitData:
        #strip decimals from string
        #don't know how to do this

        #remove whitespace
        s.replace(" ", "")

        #add to templist
        tempList.append(s + ", ")

    tempList[-1].replace(", ", "") 
    raphael.append(tempList) 

Answers

1

Try this:

import re
from bs4 import BeautifulSoup

svg = open("/path/to/file.svg", "r").read()
soup = BeautifulSoup(svg)
paths = soup.findAll("path")

raphael = []

for p in paths:
    splitData = p['d'].split(",")
    for line in splitData:
        # Remove the fractional part, e.g. ".678745"
        line = re.sub(r"\.\d*", "", line)
        # Remove all whitespace, including newlines inside the attribute
        line = re.sub(r"\s+", "", line)
        raphael.append(line)

# Note: this joins the pieces of every path into a single string
d = ",\n".join(raphael)
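Since the question mentions writing the results to a JSON file, here is a minimal sketch of that last step. The helper names `clean_d` and `paths_to_json` and the flat-list JSON layout are assumptions for illustration, not something the question specifies. It also assumes the line breaks in the desired output are not significant, so it drops them along with the spaces.

```python
import json
import re

def clean_d(d):
    # Strip fractional parts first, then all whitespace (spaces and newlines).
    without_fractions = re.sub(r"\.\d+", "", d)
    return re.sub(r"\s+", "", without_fractions)

def paths_to_json(d_values):
    # One cleaned string per path; the flat-list layout is an assumption.
    return json.dumps([clean_d(d) for d in d_values], indent=2)

# Hypothetical sample data standing in for the "d" attributes read from the SVG.
sample = [
    "M 62.678745,\n    259.31235 L 63.560745",
    "M 1.5, 2.25 L 3.0, 4.75",
]
print(paths_to_json(sample))
```

With BeautifulSoup, the input list would be something like `[p['d'] for p in paths]`.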
3

You can use a regex:

>>> import re
>>> d = """M 62.678745, 
...     259.31235 L 63.560745, 
...     258.43135 L 64.220745, 
...     257.99135 L 64.439745, 
...     258.43135 L 64.000745"""
>>> for strs in d.splitlines():
...     # Delete runs of whitespace and any ".digits" fractional part
...     print(re.sub(r'(\s+)|(\.\d+)', '', strs))
...
M62,
259L63,
258L64,
257L64,
258L64
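The same substitution can be wrapped in a small helper so it can be applied to each path's `d` attribute in turn. This is only a sketch: the names `simplify_path` and `_STRIP` are made up for illustration. One caveat that follows from the pattern itself: a coordinate written without a leading digit, such as `.5`, would be removed entirely, so this assumes every number has digits before the decimal point.

```python
import re

# Matches a run of whitespace or a fractional part such as ".678745".
_STRIP = re.compile(r"\s+|\.\d+")

def simplify_path(d):
    """Drop fractional digits and whitespace from each line of a path string."""
    # Processing line by line keeps the original line breaks in the result.
    return "\n".join(_STRIP.sub("", line) for line in d.splitlines())

d = """M 62.678745, 
    259.31235 L 63.560745, 
    258.43135 L 64.220745"""
print(simplify_path(d))
```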
+0

+1, a simpler solution...

1

You can build a brute-force parser:

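def isint(x):
    # True if x parses as a number that can be truncated to an int
    try:
        int(float(x))
        return True
    except (ValueError, OverflowError):
        return False

def parser(s):
    mystr = lambda x: str(int(float(x)))
    # Protect newlines, because str.split() would otherwise discard them
    s = s.replace('\n', '##')
    tmp = ','.join([''.join([mystr(x) if isint(x) else x
                             for x in j.split()])
                    for j in s.split(',')])
    return tmp.replace('##', '\n')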

Test:

d="M 62.678745,\n 259.31235 L 63.560745,\n 258.43135 L 64.220745, \n 257.99135 L 64.439745, \n 258.43135 L 64.000745 " 
print(parser(d))
# M62, 
# 259L63, 
# 258L64, 
# 257L64, 
# 258L64 
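Note that `int(float(x))` truncates rather than rounds, which is why `257.99135` comes out as `257`, matching the output asked for in the question. A short sketch of the difference, assuming Python 3 (where `round()` with one argument returns an int); the variable names are only illustrative:

```python
values = ["257.99135", "64.000745", "62.678745"]

# int(float(v)) truncates toward zero, like deleting the ".digits" suffix.
truncated = [str(int(float(v))) for v in values]

# round() picks the nearest integer instead (returns an int in Python 3).
rounded = [str(round(float(v))) for v in values]

print(truncated)
print(rounded)
```

If rounded coordinates were preferred, `mystr` could call `round` instead of `int`, but that would no longer match the format requested in the question.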
+0

Don't you think this is overkill? This can be done easily with re.sub(r'(\s+)|(\.\d+)', '', line)

+0

I definitely have to learn regular expressions...