Processing an HTML file using Python

I don't know much about HTML. How do I pull just the text out of a page? For example, if the HTML page reads:
<meta name="title" content="How can I make money at home online? No gimmacks please? - Yahoo! Answers">
<title>How can I make money at home online? No gimmicks please? - Yahoo! Answers</title>
I just want to extract this:
How can I make money at home online? No gimmicks please? - Yahoo! Answers
I am using this function, built on the re module:
import re

def striphtml(data):
    # Replace every tag (shortest "<...>" match) with a single space.
    p = re.compile(r'<.*?>')
    return p.sub(' ', data)
But it still isn't doing what I want it to do.
The function above is called like this:
for lines in filehandle.readlines():
    # k = str(section[6].strip())
    myFile.write(lines)
    lines = striphtml(lines)
    content.append(lines)
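
For comparison, here is a minimal sketch of one possible way to get only the title text without a regex, using the standard library's html.parser module. It assumes Python 3. The names TitleExtractor, extract_title and sample are made up for illustration and are not part of the code in the question.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased.
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Keep text only while inside the <title> element.
        if self._in_title:
            self.title_parts.append(data)


def extract_title(html_text):
    """Return the title text of html_text, or '' if there is no <title>."""
    parser = TitleExtractor()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.title_parts).strip()


# Hypothetical input: the two lines quoted in the question.
sample = (
    '<meta name="title" content="How can I make money at home online? '
    'No gimmacks please? - Yahoo! Answers">\n'
    '<title>How can I make money at home online? No gimmicks please? '
    '- Yahoo! Answers</title>\n'
)
print(extract_title(sample))
```

Because the parser is fed the whole document rather than one line at a time, this should also cope with a title that spans several lines, which a line-by-line regex substitution would not.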
Possible duplicate of http://stackoverflow.com/questions/717541/parsing-html-in-python, [Processing a HTML file using Python](http://stackoverflow.com/q/7694637) – Sathya 2012-01-09 02:45:43
Check this question: http://stackoverflow.com/questions/328356/extracting-text-from-html-file-using-python – mgibsonbr 2012-01-09 02:47:15