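How to return href links that do not start with "listings" using Python
I want to scrape the links at https://www.panpages.my/search_results?q=
I have written a Python script that gets all the links on each page. I want to filter them so that only the links that do not start with "\Listings" are returned.
Please find my script below and help me: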
import requests
from bs4 import BeautifulSoup
import re
from io import StringIO
import csv

# Read the list of root page URLs (the URL is in the first column of each row).
data = open("D:/Mine/Python/Projects/Freelancer/seekProgramming/rootpages.csv").read()
dataFile = StringIO(data)
csvReader = csv.reader(dataFile)

# Output file for the collected links (the writer is not used yet).
f = open('paylinks.csv', 'w', newline='')
writer = csv.writer(f)

def simple_web_scrapper(url):
    # Download the page and print every href found in the results column.
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, 'html.parser')
    for root in soup.findAll('div', {"class": "mid_section col-xs-10 col-sm-7 tmargin xs-nomargin"}):
        for link in root.findAll('a'):
            href = link.get('href')
            print(href)

for row in csvReader:
    myurl = row[0]
    simple_web_scrapper(myurl)
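One possible way to do the filtering is sketched below. It is only an illustration under stated assumptions: the helper name `links_not_starting_with` is hypothetical, the default prefix `/listings` is a guess at how the "\Listings" links actually appear in the page's href values, and the comparison is made case-insensitive on purpose. The sample hrefs are invented for the example.

```python
def links_not_starting_with(hrefs, prefix="/listings"):
    """Return the hrefs that do not start with `prefix` (case-insensitive)."""
    # Assumption: the question writes the prefix as "\Listings"; check the
    # printed hrefs to see the exact form the site uses and adjust `prefix`.
    kept = []
    for href in hrefs:
        # link.get('href') returns None for <a> tags without an href attribute.
        if href is None:
            continue
        if not href.lower().startswith(prefix.lower()):
            kept.append(href)
    return kept


# Hypothetical sample values, only to illustrate the filter.
sample = [
    "/listings/abc",
    "/Listings/def",
    None,
    "https://example.com/page",
    "/search_results?q=x",
]
print(links_not_starting_with(sample))
# → ['https://example.com/page', '/search_results?q=x']
```

In the script above, the values returned by link.get('href') inside simple_web_scrapper could be collected into a list and passed through this helper, and each remaining link could then be written out with writer.writerow([href]) instead of being printed.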
It works fine. Thank you very much. – Saranaone