2017-01-18

Currently, I have the following code, which scrapes the tables and prints various data:

import requests, re, bs4
from urllib.parse import urljoin
start_url = 'http://www.racingaustralia.horse/'

def make_soup(url):
    # Download the page and parse it with the lxml parser.
    r = requests.get(url)
    soup = bs4.BeautifulSoup(r.text, "lxml")
    return soup

def get_links(url):
    # Collect every link whose href starts with /FreeFields/.
    soup = make_soup(url)
    a_tags = soup.find_all('a', href=re.compile(r"^/FreeFields/"))
    links = [urljoin(start_url, a['href']) for a in a_tags]
    return links

def get_tds(link):
    # Print the text of every <td class="horse"> cell on the page.
    soup = make_soup(link)
    tds = soup.find_all('td', class_="horse")
    for td in tds:
        print(td.text)

if __name__ == '__main__':
    links = get_links(start_url)
    for link in links:
        get_tds(link)

This scrapes all of the horse names from the tables for each meeting on racingaustralia.horse.

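That is exactly what I want, but I would also like to retrieve the meeting date, the meeting location, and each race number, with the horse names listed under each race.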

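Here is an example of what I would like, so that the date and location of each race meeting are printed, along with the race number for each group of horses: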

Date of Race Meet 
Location of Race Meet 
Race Number 
Horse.... 
... 
... 
... 
Race Number 
Horse 
... 
... 
etc 

Would someone be able to help me adjust the code?

I tried the following, but I am wondering whether there is a more efficient way to do this.

def get_tds(link):
    soup = make_soup(link)
    # Print the venue and date of the meeting.
    race_date = soup.find_all('span', class_="race-venue-date")
    for span in race_date:
        print(span.text)

    # Print every horse name on the page.
    tds = soup.find_all('td', class_="horse")
    for td in tds:
        print(td.text)

def get_info(link):
    # The soup has to be built here as well; it is not shared with get_tds.
    soup = make_soup(link)
    item = soup.find_all('div', class_="top")
    for div in item:
        print(div.text)

if __name__ == '__main__':
    links = get_links(start_url)
    for link in links:
        get_info(link)
        get_tds(link)

Thanks in advance.


I wrote the code above so that you could understand how it works; you should not have other people write your code for you. –


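Hi, as you may have noticed, I have in fact adapted the code you wrote for me. Before you changed what I had, I had also written a fair amount of code myself. I am only asking for help; if that is asking too much, I will delete this. – Kirsty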

Answer

import requests, re, bs4
from urllib.parse import urljoin


def make_soup(url):
    r = requests.get(url)
    soup = bs4.BeautifulSoup(r.text, "lxml")
    return soup

def get_links(url):
    soup = make_soup(url)
    a_tags = soup.find_all('a', href=re.compile(r"^/FreeFields/"))
    # start_url is the module-level name assigned in the __main__ block below.
    links = [urljoin(start_url, a['href']) for a in a_tags]
    return links

def get_info(link):
    soup = make_soup(link)
    tds = soup.find_all('td', class_="horse")
    if tds:
        # The <h2> inside the "top" element holds the venue and the date.
        top = soup.find(class_="top").h2
        for s in top.stripped_strings:
            print(s)
        # Number the horses from 1 and print each name on its own line.
        for index, td in enumerate(tds, 1):
            print(index, td.text, sep='\n')
    else:
        print('not found')

if __name__ == '__main__':
    start_url = 'http://www.racingaustralia.horse/'
    links = get_links(start_url)
    for link in links:
        get_info(link)

Output:

Warwick Farm: Australian Turf Club 
Wednesday, 18 January 2017 
1 
GAUGUIN (NZ) 
2 
DAHOOIL (NZ) 
3 
METAMORPHIC 
4 
MY KIND 
5 
CONCISELY 
6 
ARAZONA 
7 
APOLLO 
8 
IGNITE THE LIGHT 
9 
KRUPSKAYA 

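Many of the URLs do not contain the information you need. You should change the regular expression to filter them out; that way your code will run faster.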
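As a rough illustration of tightening the filter, the sketch below keeps only links that look like form pages and drops duplicates. The `Form.aspx` path, the `FORM_LINK` pattern, the `filter_form_links` helper, and the sample hrefs are all assumptions made for illustration; check the actual hrefs on the site before relying on them.

```python
import re
from urllib.parse import urljoin

START_URL = "http://www.racingaustralia.horse/"

# Assumption: the pages that list the horses have hrefs starting with
# /FreeFields/Form.aspx? (hypothetical pattern, verify against the site).
FORM_LINK = re.compile(r"^/FreeFields/Form\.aspx\?", re.IGNORECASE)

def filter_form_links(hrefs, base=START_URL, pattern=FORM_LINK):
    """Return absolute URLs for hrefs matching the pattern, without duplicates."""
    seen = set()
    links = []
    for href in hrefs:
        if pattern.match(href) and href not in seen:
            seen.add(href)
            links.append(urljoin(base, href))
    return links

# Made-up hrefs, only to show which ones survive the filter.
sample = [
    "/FreeFields/Calendar.aspx?State=NSW",
    "/FreeFields/Form.aspx?Key=EXAMPLE",
    "/FreeFields/Form.aspx?Key=EXAMPLE",
    "/FreeFields/Results.aspx?Key=EXAMPLE",
]
print(filter_form_links(sample))
```

The same compiled pattern could be passed as `href=FORM_LINK` to `soup.find_all('a', ...)` in `get_links`, so that fewer pages are downloaded in the first place.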


Hi, this result prints the horse number rather than the race number. I don't really want the horse number, just the race number. – Kirsty
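One possible way to get race numbers instead of a running horse index is to walk the race headings and horse cells in document order and start a new group at each heading. This is only a sketch: the `race-title` class name is an assumption about the page markup (only `horse` appears in the code above), and `rows_from_soup` and `group_by_race` are hypothetical helpers.

```python
def rows_from_soup(soup, race_class="race-title", horse_class="horse"):
    """Yield ('race', text) or ('horse', text) pairs in document order.

    Assumption: each race heading carries the class named by race_class.
    """
    def wanted(tag):
        classes = tag.get("class") or []
        return race_class in classes or horse_class in classes

    for tag in soup.find_all(wanted):
        classes = tag.get("class") or []
        kind = "race" if race_class in classes else "horse"
        yield kind, tag.get_text(strip=True)

def group_by_race(rows):
    """Group horse names under the most recent race heading."""
    races = []
    for kind, text in rows:
        if kind == "race":
            races.append((text, []))
        elif races:
            # Horses that appear before any race heading are skipped.
            races[-1][1].append(text)
    return races

# Placeholder rows, standing in for rows_from_soup(make_soup(link)).
example_rows = [
    ("race", "Race 1"),
    ("horse", "HORSE A"),
    ("horse", "HORSE B"),
    ("race", "Race 2"),
    ("horse", "HORSE C"),
]
for title, horses in group_by_race(example_rows):
    print(title)
    for name in horses:
        print(name)
```

Inside `get_info`, the loop over `enumerate(tds, 1)` could then be replaced by a loop over `group_by_race(rows_from_soup(soup))`, provided the assumed class name matches the real page.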