Getting links from a web page using BeautifulSoup and Selenium

I wrote this code to log in to my FB account and get all the group links on the page using Selenium and BeautifulSoup, but the BeautifulSoup part does not work.

I would like to know how to use Selenium and BeautifulSoup in the same code.

I don't want to use the Facebook API; I want to use Selenium and BeautifulSoup.

from selenium import webdriver 
from selenium.webdriver.common.keys import Keys 
from selenium.webdriver.common.by import By 
import httplib2 
from BeautifulSoup import BeautifulSoup, SoupStrainer 


usr = raw_input('--> ') 
pwd = raw_input('--> ') 
poo = raw_input('--> ') 

driver = webdriver.Firefox() 
# or you can use Chrome(executable_path="/usr/bin/chromedriver") 
driver.get("https://www.facebook.com/groups/?category=membership") 
assert "Facebook" in driver.title 
elem = driver.find_element_by_id("email") 
elem.send_keys(usr) 
elem = driver.find_element_by_id("pass") 
elem.send_keys(pwd) 
elem.send_keys(Keys.RETURN) 

scheight = .1 
while scheight < 9.9: 
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight/%s);" % scheight) 
    scheight += .01 
soup = BeautifulSoup(html) 
http = httplib2.Http() 
status, response = ('https://www.facebook.com/groups/?category=membership') 

count = 0 
for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')): 
    count = count + 1 
print 'Count: ', count 

for tag in BeautifulSoup(('a')): 
    if link.has_key('href'): 
     if '/groups/' in link['href']: 

      print link['href'] 


elem = driver.find_element_by_css_selector(".input.textInput") 
elem.send_keys(poo) 
elem = driver.find_element_by_css_selector(".selected") 
elem.send_keys(Keys.RETURN) 
elem.click() 
time.sleep(5)

Source

2015-03-18 elsharkawey

You need to clarify. What does "*BeautifulSoup does not work*" mean? What happened, and how does that differ from the expected behavior? – Celeo 2015-03-18 20:51:52

The result: Traceback (most recent call last): File "tk.py", line 28, in <module> soup = BeautifulSoup(html) NameError: name 'html' is not defined – elsharkawey 2015-03-18 20:53:21

You never declared html.

Selenium's WebDriver has a page_source attribute that you can use:

soup = BeautifulSoup(driver.page_source)
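To make that concrete, here is a minimal sketch of handing the browser's HTML to BeautifulSoup. It assumes Python 3 and the `bs4` package (the question uses Python 2 and the older `BeautifulSoup` import), and it names the standard-library `"html.parser"` explicitly. `soup_from_driver` and `FakeDriver` are hypothetical names used only for illustration; with a real session you would pass the Selenium `driver` from the question instead.

```python
from bs4 import BeautifulSoup


def soup_from_driver(driver):
    """Parse the page the browser currently has loaded.

    `driver` is assumed to be a Selenium WebDriver, or any object that
    exposes a `page_source` string.
    """
    # page_source is the HTML of the page currently loaded in the browser.
    return BeautifulSoup(driver.page_source, "html.parser")


class FakeDriver:
    """Hypothetical stand-in so the sketch runs without a browser."""
    page_source = '<a href="/groups/1/">One</a><a href="/help/">Help</a>'


soup = soup_from_driver(FakeDriver())
anchor_count = len(soup.find_all("a"))
print("Count:", anchor_count)
```

Because the function only reads `page_source`, it needs no Selenium import of its own. Call it after the login and scrolling steps so the parsed HTML is the page you actually want.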

Update for the second error

Your line,

status, response = ('https://www.facebook.com/groups/?category=membership')

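tries to unpack a single string into both status and response. No HTTP request is made, so response never receives the page content and is left undefined.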
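A small sketch of what that assignment does when run on its own (written for Python 3; the variable names `value` and `error_message` are only for illustration). Parentheses around a single string do not create a tuple, so Python tries to unpack the string's characters into the two names:

```python
url = 'https://www.facebook.com/groups/?category=membership'

# Parentheses alone do not create a tuple, so this is still a plain string.
value = (url)
print(type(value).__name__)

# Unpacking a string iterates over its characters. With more than two
# characters Python raises ValueError, and neither name is bound.
try:
    status, response = value
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)
```

The line looks like it was adapted from an httplib2 example that calls `http.request(url)`, which returns a `(response, content)` pair; that is a guess about the intent. Even with that call restored, httplib2 is a separate HTTP client and would not share the browser's logged-in session, so parsing `driver.page_source` is the simpler route here.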

Source

2015-03-18 21:12:53 Celeo

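Traceback (most recent call last): File "tk.py", line 33, in <module> for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')): NameError: name 'response' is not defined – elsharkawey 2015-03-18 21:40:00

Updated for the second error. Did you copy this code from somewhere, or are you writing it yourself? – Celeo 2015-03-18 21:49:19

I wrote some of this code and took some from Google – elsharkawey 2015-03-18 21:51:25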

I take it BeautifulSoup is not returning the right links?

I do find the links with BeautifulSoup using:

soup = BeautifulSoup(html)
for i in soup.find_all('a'):
    # Default to '' so anchors without an href do not raise a TypeError
    if '/groups/' in i.get('href', ''):
        print(i.get('href'))
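A slightly fuller sketch of the same filtering, assuming Python 3 and the `bs4` package. An `<a>` tag without an `href` makes `get('href')` return `None`, so passing `href=True` to `find_all` skips those tags up front. `group_links` and the sample markup are hypothetical; in the real script the HTML string would come from `driver.page_source`.

```python
from bs4 import BeautifulSoup


def group_links(html):
    """Return unique hrefs containing '/groups/', in document order."""
    soup = BeautifulSoup(html, "html.parser")
    seen = []
    # href=True keeps only <a> tags that actually carry an href attribute.
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"]
        if "/groups/" in href and href not in seen:
            seen.append(href)
    return seen


# Hypothetical sample markup standing in for driver.page_source.
sample = """
<a href="/groups/111/">First group</a>
<a name="no-href">Anchor without href</a>
<a href="/groups/111/">First group again</a>
<a href="/settings/">Settings</a>
<a href="/groups/222/">Second group</a>
"""

links = group_links(sample)
for link in links:
    print(link)
```

Dropping duplicates is a design choice made for this sketch, since a page may link to the same group more than once; remove the `href not in seen` check if every occurrence matters.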

Source

2015-03-21 00:48:08 Jajo

soup = BeautifulSoup(html) NameError: name 'html' is not defined – elsharkawey 2015-03-21 11:45:22

Then in your code you never defined html... – Jajo 2015-03-21 12:13:52

How do I define html? – elsharkawey 2015-03-21 12:24:26
