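Scraping ticket and fare data off trainline.com

Hello fellow programmers. I'm trying to run some Python code to scrape information from thetrainline.com. I only started programming recently, and I can't figure out how to extract the data from a POST request. Please see the details below.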

Here is my code so far:

import requests
from bs4 import BeautifulSoup

postURL = 'https://www.thetrainline.com/buytickets/'

# Search-form fields sent in the POST body
predata = {'OriginStation': 'Stockport',
           'DestinationStation': 'Birmingham New Street',
           'RouteRestriction': 'NULL',
           'ViaAvoidStation': '',
           'journeyTypeGroup': 'return',
           'outwardDate': '14-Apr-17',
           'OutwardLeaveAfterOrBefore': 'A',
           'OutwardHour': '15',
           'OutwardMinute': '15',
           'returnDate': '16-Apr-17',
           'InwardLeaveAfterOrBefore': 'A',
           'ReturnHour': '9',
           'ReturnMinute': '0',
           'AdultsTravelling': '1',
           'ChildrenTravelling': '0',
           'railCardsType_0': 'YNG',
           'railCardNumber_0': '1',
           'ExtendedSearch': 'Get times & tickets'}

# Browser-like User-Agent header
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

postform = requests.post(postURL, headers=headers, data=predata)
postform.raise_for_status()  # fail early on an HTTP error status

# Parse the returned HTML and locate the element with id="timetable"
soup = BeautifulSoup(postform.content, 'html.parser')
table = soup.find(id='timetable')

If I enter "table" in the Python shell, I get the following:

>>> table 
<form action="combinedmatrix.aspx" class="form matrix matrix-search-outdep matrix-search-returndep" data-defaults='{"adultPassengers":1,"canChangeJourney":true,"canPreselectTicket":true,"childPassengers":0,"destinationName":"Birmingham New Street","fullJourneys":[{"cheapestTickets":[{"label":"Cheapest Standard Single","tickets":[{"code":"MBS","departureTime":"15:16","groupIdentifier":"cheapest","isCheapest":true,"journeyId":1,"price":"9.30","value":"{\"ArrivalStationCode\":\"BHM\",\"DepartureStationCode\":\"SPT\",\"Id\":208,\"JourneyArrivalDate\":\"\\\/Date(1492183980000+0100)\\\/\",\"JourneyDepartureDate\":\"\\\/Date(1492179360000+0100)\\\/\",\"Price\":9.3,\"PriceInPounds\":\"£9.30\",\"Type\":2}"},{"code":"MBS","departureTime":"15:36","groupIdentifier":"cheapest","isCheapest":true,"journeyId":2,"price":"9.30","value":"{\"ArrivalStationCode\":\"BHM\",\"DepartureStationCode\":\"SPT\",\"Id\":208,\"JourneyArrivalDate\":\"\\\/Date(1492185480000+0100)\\\/\",\"JourneyDepartureDate\":\"\\\/Date(1492180560000+0100)\\\/\",\"Price\":9.3,\"PriceInPounds\":\"£9.30\",\"Type\":2}"},{"code":"SVS","departureTime":"15:40","groupIdentifier":"cheapest","journeyId":3,"price":"23.85","value":"{\"ArrivalStationCode\":\"BHM\",\"DepartureStationCode\":\"SPT\",\"Id\":253,\"JourneyArrivalDate\":\"\\\/Date(1492186680000+0100)\\\/\",\"JourneyDepartureDate\":\"\\\/Date(1492180800000+0100)\\\/\",\"Price\":23.85,\"PriceInPounds\":\"£23.85\",\"Type\":2}"},{"code":"MBS","departureTime":"16:16","groupIdentifier":"cheapest","isCheapest":true,"journeyId":4,"price":"9.30","value":"{\"ArrivalStationCode\":\"BHM\",\"DepartureStationCode\":\"SPT\",\"Id\":208,\"JourneyArrivalDate\":\"\\\/Date(1492187580000+0100)\\\/\",\"JourneyDepartureDate\":\"\\\/Date(1492182960000+0100)\\\/\",\"Price\":9.3,\"PriceInPounds\":\"£9.30\",\"Type\":2}"}],"ticketsType":"S"},{"label":"Cheapest First Class 
Single","tickets":[{"code":"MBF","departureTime":"15:16","groupIdentifier":"cheapest","journeyId":1,"price":"24.30","value":"{\"ArrivalStationCode\":\"BHM\",\"DepartureStationCode\":\"SPT\",\"Id\":210,\"JourneyArrivalDate\":\"\\\/Date(1492183980000+0100)\\\/\",\"JourneyDepartureDate\":\"\\\/Date(1492179360000+0100)\\\/\",\"Price\":24.3,\"PriceInPounds\":\"£24.30\",\"Type\":2}"},{"code":"MBF","departureTime":"15:36","groupIdentifier":"cheapest","journeyId":2,"price":"24.30","value":"{\"ArrivalStationCode\":\"BHM\",\"DepartureStationCode\":\"SPT\",\"Id\":210,\"JourneyArrivalDate\":\"\\\/Date(1492185480000+0100)\\\/\",\"JourneyDepartureDate\":\"\\\/Date(1492180560000+0100)\\\/\",\"Price\":24.3,\"PriceInPounds\":\"£24.30\",\"Type\":2}"}, 
... 

How would you suggest getting the data set out of the POST request?

Thanks a lot in advance for the help.

Isn't the data you want inside the 'data-defaults' attribute? You can parse that JSON. –

What data do you actually need? –

http://pastebin.com/tVPxQn96 I formatted your JSON a bit more readably, so it should make it clearer how you can access it. – Dillanm
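The pastebin is not reproduced here, but a similarly indented view of the attribute can be produced locally with the standard json module. A minimal sketch; the function name pretty_defaults and the two-key sample string (trimmed from the output in the question) are only for illustration, and in the real script the input would be table.get('data-defaults'):

```python
import json


def pretty_defaults(raw: str) -> str:
    """Re-indent the JSON text stored in the form's data-defaults attribute."""
    data = json.loads(raw)
    # indent=2 puts every key on its own line; ensure_ascii=False keeps "£" readable
    return json.dumps(data, indent=2, ensure_ascii=False)


# Hypothetical, heavily trimmed excerpt of the attribute from the question's output
raw = '{"adultPassengers":1,"destinationName":"Birmingham New Street"}'
print(pretty_defaults(raw))
```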

Answer

In [8]: import json 

In [9]: json.loads(table.get('data-defaults')) 
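Once decoded, the attribute is an ordinary dict of lists and dicts. The sketch below flattens it into (label, departure time, ticket code, price) rows. The nesting fullJourneys -> cheapestTickets -> tickets is inferred from the truncated output in the question, so treat the key names as assumptions that may not cover every field; the function name cheapest_fares is hypothetical.

```python
import json
from decimal import Decimal


def cheapest_fares(defaults: dict) -> list:
    """Flatten fullJourneys -> cheapestTickets -> tickets into simple rows.

    Key names come from the data-defaults output shown in the question.
    """
    rows = []
    for journey in defaults.get("fullJourneys", []):
        for group in journey.get("cheapestTickets", []):
            for ticket in group.get("tickets", []):
                rows.append((
                    group.get("label"),
                    ticket.get("departureTime"),
                    ticket.get("code"),
                    Decimal(ticket["price"]),  # prices arrive as strings such as "9.30"
                ))
    return rows


# Small sample trimmed from the question's output; in the real script use
# cheapest_fares(json.loads(table.get('data-defaults')))
raw = json.dumps({"fullJourneys": [{"cheapestTickets": [{
    "label": "Cheapest Standard Single",
    "ticketsType": "S",
    "tickets": [
        {"code": "MBS", "departureTime": "15:16", "price": "9.30"},
        {"code": "SVS", "departureTime": "15:40", "price": "23.85"},
    ],
}]}]})

for row in cheapest_fares(json.loads(raw)):
    print(row)
```

Decimal is used instead of float so that a price such as "9.30" keeps its exact pence value.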

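While this code may answer the question, providing additional context about how and/or why it solves the problem would improve the answer's long-term value. –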

Thanks a lot, that did the job. I have the information now; working on it some more. –
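A possible next step: each ticket's "value" field in the output is itself a JSON string, and its dates look like /Date(1492179360000+0100)/. The sketch below assumes the common ASP.NET convention that the number is milliseconds since the Unix epoch (UTC) and the suffix is the UTC offset; under that reading 1492179360000 becomes 15:16 at +01:00, which matches the departureTime shown for the same journey. The helper names parse_dotnet_date and decode_ticket_value are hypothetical.

```python
import json
import re
from datetime import datetime, timedelta, timezone

# ASP.NET-style date string: /Date(<ms since epoch>[+-hhmm])/ (assumed format)
DOTNET_DATE = re.compile(r"/Date\((-?\d+)(?:([+-])(\d{2})(\d{2}))?\)/")


def parse_dotnet_date(s: str) -> datetime:
    """Convert '/Date(1492179360000+0100)/' into an aware datetime."""
    m = DOTNET_DATE.fullmatch(s)
    if m is None:
        raise ValueError(f"not a /Date(...)/ string: {s!r}")
    millis, sign, hh, mm = m.groups()
    offset = timedelta(0)
    if sign:
        offset = timedelta(hours=int(hh), minutes=int(mm))
        if sign == "-":
            offset = -offset
    # The millisecond count is treated as UTC; the offset only sets the display zone
    return datetime.fromtimestamp(int(millis) / 1000, tz=timezone(offset))


def decode_ticket_value(value: str) -> dict:
    """Decode a ticket's nested 'value' JSON string and parse its journey dates."""
    info = json.loads(value)
    for key in ("JourneyDepartureDate", "JourneyArrivalDate"):
        if key in info:
            info[key] = parse_dotnet_date(info[key])
    return info


# The first ticket's "value" string, as it looks after the outer json.loads
value = r'{"ArrivalStationCode":"BHM","DepartureStationCode":"SPT","Id":208,"JourneyArrivalDate":"\/Date(1492183980000+0100)\/","JourneyDepartureDate":"\/Date(1492179360000+0100)\/","Price":9.3,"PriceInPounds":"£9.30","Type":2}'
info = decode_ticket_value(value)
print(info["JourneyDepartureDate"].isoformat(), info["PriceInPounds"])
```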