Print all words after a specific word using Python

Suppose I have a file with the following data:

<td class="w"><a href="show.cgi?id=120012" title="[Title] &#64;Blue: Session_TIMEOUT after 60033 ms">[Title] &#64;Blue: Session_TIMEOUT after 60033 ms</a></td>' 
<td class="w"><a href="show.cgi?id=120012" title="[Title] &#64;Blue: Session_TIMEOUT after 60500 ms">[Title] &#64;Blue: Session_TIMEOUT after 60033 ms</a></td>'

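For each of the two HTML lines above, how can I retrieve the string that follows title=, i.e. "[Title] &#64;Blue: Session_TIMEOUT after 60033 ms", and write it back on the next line?

I want the output to look like this: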

<td class="w"><a href="show.cgi?id=120012" title="[Title] &#64;Blue: Session_TIMEOUT after 60033 ms">[Title] &#64;Blue: Session_TIMEOUT after 60033 ms</a></td>' 
&#64;Blue: Session_TIMEOUT after 60033 ms 
<td class="w"><a href="show.cgi?id=120012" title="[Title] &#64;Blue: Session_TIMEOUT after 60500 ms">[Title] &#64;Blue: Session_TIMEOUT after 60033 ms</a></td>' 
&#64;Blue: Session_TIMEOUT after 60500 ms

Please help me with this. Thanks in advance.

2012-11-28 Surya Gupta

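You can use a regular expression. If the string you are interested in always sits between title=" and the closing ms", i.e. it is anchored by those two markers, then you can do: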

import re  # regular expression module
g = re.compile('title="(.*?ms)"').search(line)  # search your string

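Your string will then be available through g.group(1). Note that search() returns None when the line does not match, so check for that before calling group(). You may find it useful to read about regular expressions in the Python documentation; they are an important programming tool in every language, especially for scripting.

You may also want to add the regex tag to your question.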
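To connect the regular expression above to the output the question asks for, here is a minimal sketch. It assumes every title starts with the literal prefix "[Title] ", which is dropped to match the desired output. The names TITLE_RE, PREFIX, title_after_prefix and annotate are made up for this illustration and are not part of the original answer.

```python
import re

# Non-greedy capture of everything between title=" and the next closing quote.
TITLE_RE = re.compile(r'title="(.*?)"')
PREFIX = "[Title] "  # assumption: constant prefix, dropped to match the desired output


def title_after_prefix(line):
    """Return the title attribute without the leading '[Title] ', or None."""
    match = TITLE_RE.search(line)
    if match is None:
        return None
    title = match.group(1)
    if title.startswith(PREFIX):
        title = title[len(PREFIX):]
    return title


def annotate(lines):
    """Yield each line, followed by its extracted title when one is found."""
    for line in lines:
        yield line
        extracted = title_after_prefix(line)
        if extracted is not None:
            yield extracted


sample = [
    '<td class="w"><a href="show.cgi?id=120012" title="[Title] &#64;Blue: Session_TIMEOUT after 60033 ms">[Title] &#64;Blue: Session_TIMEOUT after 60033 ms</a></td>',
    '<td class="w"><a href="show.cgi?id=120012" title="[Title] &#64;Blue: Session_TIMEOUT after 60500 ms">[Title] &#64;Blue: Session_TIMEOUT after 60033 ms</a></td>',
]

for out in annotate(sample):
    print(out)
```

Each input row is printed unchanged, followed by the text taken from its title attribute. The entity &#64; is left as-is because a regular expression only does text matching and does not decode HTML.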

2012-11-28 09:20:10

This was really useful, thank you. But I have one more question: please take another look at my edited question at the top of the page.

With the Beautiful Soup library you can do this quite easily:

from bs4 import BeautifulSoup
myHTML = '<td class="w"><a href="show.cgi?id=120012" title="[Title] &#64;Blue: Session_TIMEOUT after 60033 ms">[Title] &#64;BlueScreen: RCU_PCPU_TIMEOUT after 60033 ms</a></td>'
html_doc = BeautifulSoup(myHTML, 'html.parser')
print(html_doc.td.a.string)    # text inside the <a> tag
print(html_doc.td.a['title'])  # value of the title attribute

Beautiful Soup can be installed with pip or easy_install, or with apt-get if you are on a Debian-based operating system, whichever you prefer:

pip install beautifulsoup4
easy_install beautifulsoup4
apt-get install python3-bs4
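If installing a third-party package is not an option, roughly the same idea can be sketched with the standard library's html.parser module. This is a swapped-in alternative, not the Beautiful Soup API. The class name TitleCollector and the helper extract_titles are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collect the title attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "title" and value is not None:
                    self.titles.append(value)


def extract_titles(html_text):
    """Return the title attributes of all <a> tags in html_text."""
    collector = TitleCollector()
    collector.feed(html_text)
    collector.close()
    return collector.titles


row = '<td class="w"><a href="show.cgi?id=120012" title="[Title] &#64;Blue: Session_TIMEOUT after 60500 ms">[Title] &#64;Blue: Session_TIMEOUT after 60033 ms</a></td>'
print(extract_titles(row))
```

Unlike plain string matching, the parser decodes character references in attribute values, so &#64; comes back as @. Whether that is wanted depends on how the extracted text will be used.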

2012-11-28 09:28:02

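A simple way:

marker = '[Title]'
# Take everything after the first '[Title]', the one inside the title attribute
rest = line[line.index(marker) + len(marker):]
# Cut at the quote that closes the title attribute
text = rest[:rest.index('"')].strip()
# Print the untouched line, then the extracted text on the next line
print(line.rstrip() + '\n' + text)

That said, a better way to go about this would be to use regular expressions, as suggested by CodeChordsman.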
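Since the question talks about a file and about writing the string back on the next line, the plain string approach could be extended along the following lines. This is only a sketch: the helper names and the file names report.html and report_annotated.html are invented for the demo, and it assumes the wanted text always lies between the first "[Title] " and the next double quote.

```python
import os
import tempfile


def text_after_marker(line, marker="[Title] ", end='"'):
    """Return the text between the first `marker` and the next `end`, or None."""
    _, found, rest = line.partition(marker)
    if not found:
        return None
    text, found_end, _ = rest.partition(end)
    return text if found_end else None


def annotate_file(src_path, dst_path):
    """Copy src to dst, adding the extracted text on a new line after each match."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for raw in src:
            line = raw.rstrip("\n")
            dst.write(line + "\n")
            text = text_after_marker(line)
            if text is not None:
                dst.write(text + "\n")


# Demo on a throwaway file; the file names are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "report.html")
    dst_path = os.path.join(tmp, "report_annotated.html")
    with open(src_path, "w", encoding="utf-8") as f:
        f.write('<td class="w"><a href="show.cgi?id=120012" title="[Title] &#64;Blue: Session_TIMEOUT after 60500 ms">[Title] &#64;Blue: Session_TIMEOUT after 60033 ms</a></td>\n')
    annotate_file(src_path, dst_path)
    with open(dst_path, encoding="utf-8") as f:
        annotated = f.read().splitlines()

print(annotated[1])
```

Writing to a separate output file keeps the original intact, which makes it easier to recover if the marker assumption turns out to be wrong for some rows.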

2012-11-28 09:38:39 asheeshr

I have changed my question; please take a look and reply.
