2017-08-09

I want to scrape the links at https://www.panpages.my/search_results?q= . How do I return only the href links that start with "listings" using Python?

I wrote a Python script that gets all the links on each page; I want to filter them so that only the links starting with "\Listings" remain.

Please find my script below and help me:

import csv
import requests
from bs4 import BeautifulSoup

def simple_web_scrapper(url):
    # Fetch one page and print every href inside the listing containers
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')
    for root in soup.find_all('div', {'class': 'mid_section col-xs-10 col-sm-7 tmargin xs-nomargin'}):
        for link in root.find_all('a'):
            print(link.get('href'))

f = open('paylinks.csv', 'w', newline='')    # output file; nothing is written to it yet
writer = csv.writer(f)

with open('D:/Mine/Python/Projects/Freelancer/seekProgramming/rootpages.csv', newline='') as dataFile:
    csvReader = csv.reader(dataFile)
    for row in csvReader:                    # one start URL per row
        simple_web_scrapper(row[0])
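
As a side note, str.startswith is case-sensitive and URL paths use forward slashes, so a prefix like "\Listings" will not match hrefs such as /listings/12345. A minimal illustration (the href value is hypothetical):

href = '/listings/12345'                      # hypothetical href scraped from the page
print(href.startswith('\\Listings'))          # False: wrong slash direction and case
print(href.lower().startswith('/listings'))   # True after normalizing the case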

It works fine. Thank you very much – Saranaone

Answer

def simple_web_scrapper(url):
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')
    for root in soup.find_all('div', {'class': 'mid_section col-xs-10 col-sm-7 tmargin xs-nomargin'}):
        for link in root.find_all('a'):
            href = link.get('href')
            # URL paths start with '/', not '\', and the comparison is case-sensitive
            if href and href.startswith('/listings'):    # that's the row you need
                print(href)

for row in csvReader:                        # csvReader from the question's script
    simple_web_scrapper(row[0])
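
The question's script also opens paylinks.csv with a csv.writer but never writes to it. Below is a minimal end-to-end sketch that saves the matched links, using urljoin to turn the site-relative hrefs into absolute URLs; the file names and the '/listings' prefix are assumptions carried over from the snippets above:

import csv
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def scrape_listing_links(url):
    # Return the hrefs under the mid_section divs that start with '/listings'
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')
    return [link['href']
            for root in soup.find_all('div', {'class': 'mid_section col-xs-10 col-sm-7 tmargin xs-nomargin'})
            for link in root.find_all('a', href=True)
            if link['href'].startswith('/listings')]

with open('rootpages.csv', newline='') as infile, \
     open('paylinks.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for row in csv.reader(infile):                       # one start URL per row
        base_url = row[0]
        for href in scrape_listing_links(base_url):
            writer.writerow([urljoin(base_url, href)])   # write the absolute URL

urljoin is used because the scraped hrefs are relative to the site root; joining them with the page URL produces links that can be requested directly.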

@Saranaone, any feedback? Was my answer helpful? –