2015-06-24 51 views

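Processing the USPTO website with Mechanize and Python

I just need to do some simple processing of the USPTO trademark website to pick out simple patterns.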

#!/usr/bin/python 

import mechanize 
import cookielib 
br=mechanize.Browser() 
cg = cookielib.LWPCookieJar() 
br.set_cookiejar(cg); 

#br.set_all_readonly(False) 
br.set_handle_robots(False) 
br.set_handle_refresh(False) 
br.addheaders=[('User-agent', 'Firefox')] 

response=br.open("http://uspto.gov/trademarks-application-process/search-trademark-database") 

tess = 'TESS' 
start_search = 'Basic Word Mark Search (New User)' 

assert br.viewing_html() 
print br.title() 

for l in br.links(url_regex='tmsearch'):
    if l.text == tess:
        print l.url
        break

br.follow_link(l) 
newlink=br.geturl() 
print newlink 

br.open(newlink) 
for link in br.links():
    if link.text == start_search:
        print "Found Basic Search"
        print link.text
        print link.url
        break

# Why do we need the concatenation? Without it, this doesn't generate a full URL.

newurl="http://tmsearch.uspto.gov" + link.url 
print newurl 
response1 = br.open(newurl); 

print response1.read() 

#for form in br.forms():
#    print "Form Name", form.name

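Three questions.

  1. Without manually concatenating the prefix, I don't get the full URL at this step.
  2. At the end of the program I get some warnings at the `for form in ...` loop.
  3. Finally, I want to enter some search text in "Search Term", which I assume is a form, but I can't reach it. Then I want to submit it, and after that follow up on the form displayed next.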

Answer


So:

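  1. Store the base URL in a variable and build the address as newurl = oldurl + link.url. You can also do it directly in the call: br.open(oldurl + "w/e goes here")

  2. List the forms from the browser object rather than from the response: for form in br.forms(): print "Form name:", form.name

  3. You need to select the form, fill in the text, and then submit it. Here are some hints: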

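    for form in br.forms():
        # 'search' is a guess at the form's id; check the page source for the real one
        if form.attrs.get('id') == 'search':
            br.form = form
            break
    # "search" is likewise a guess at the name of the text field
    br["search"] = "text_search"
    br.submit()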
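
On point 1, the concatenation is needed because link.url holds the href exactly as it appears in the page, and that href is relative. A less fragile way to get the full address than gluing strings together is to resolve it against the page's URL. Below is a minimal sketch in Python 3 using only the standard library (in Python 2 the same function lives in the urlparse module). The host is the one from the question; the relative path is a made-up placeholder, not a real TESS address.

```python
from urllib.parse import urljoin

# URL of the page the link was found on (host taken from the question).
base_url = "http://tmsearch.uspto.gov/"

# Hypothetical relative href, standing in for whatever link.url contains.
relative_href = "/bin/example?f=search"

# urljoin resolves the relative reference against the base URL, which is
# what the manual "prefix + link.url" concatenation does by hand.
full_url = urljoin(base_url, relative_href)
print(full_url)

# An href that is already absolute comes back unchanged, so the same
# call is safe for both kinds of links.
already_absolute = urljoin(base_url, "http://www.uspto.gov/trademarks")
print(already_absolute)
```

With mechanize you would pass br.geturl() as the base and link.url as the second argument. As far as I know, mechanize link objects also expose an absolute_url attribute that holds the already resolved address, but check that against the version you have installed.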