2010-05-02 29 views

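Python mechanize: can't avoid redirect when posting

I want to scrape a website using mechanize. The site serves its search results across several pages. When I POST to fetch the next set of results, something goes wrong: the server redirects me back to the first page and asks mechanize to update the SearchSession cookie.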

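I have been debugging by comparing against the requests Firefox sends. They look exactly the same to me, and I can't find the problem. Any suggestions? The requests follow: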

----------- FIRST THE CORRECT SEQUENCE, CAPTURED IN FIREFOX USING TAMPER DATA -----------
POST XXX/JobSearch/Results.aspx?Keywords=Python&LTxt=London%2c+South+East&Radius=0&LIds2=ZV&clid=1621&cltypeid=2&clName=London
Load Flags[LOAD_DOCUMENT_URI LOAD_INITIAL_DOCUMENT_URI] Content Size[-1] Mime Type[text/html]
Request Headers:
Host[www.cwjobs.co.uk]
User-Agent[Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100401 Ubuntu/9.10 (karmic) Firefox/3.5.9]
Accept[text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8]
Accept-Language[en-us,en;q=0.5]
Accept-Encoding[gzip,deflate]
Accept-Charset[ISO-8859-1,utf-8;q=0.7,*;q=0.7]
Keep-Alive[300]
Connection[keep-alive]
Referer[XXX/JobSearch/Results.aspx?Keywords=Python&LTxt=London%2c+South+East&Radius=0&LIds2=ZV&clid=1621&cltypeid=2&clName=London]
Cookie[ecos=774803468-0; AnonymousUser=MemberId=acc079dd-66b6-4081-9b07-60d6955ee8bf&IsAnonymous=True; PJBIPPOPUP=; WT_FPC=id=86.181.183.106-2262469600.30073025:lv=1272812851736:ss=1272812789362; SearchSession=SessionGuid=71de63de-3bd0-4787-895d-b6b9e7c93801&LogSource=NAT]
Post Data:
__EVENTTARGET[srpPager%24btnForward]
__EVENTARGUMENT[]
hdnSearchResults[BV%2CA%2CC0P5x%2COou-%2CB4S-%2CBuC-%2CDzx-%2CHwn-%2CKPP-%2CIVA-%2CC9D-%2CH6X-%2CH7x-%2CJ0x-%2CCvX-%2CCra-%2COHa-%2CHhP-%2CCoj-%2CBlM-%2CE9W-%2CIm8-%2CBqG-%2CPFy-%2CN%2Fm-%2Ceaa%2CCvj-%2CCtJ-%2CCr7-%2CBpu-%2Cmh%2CMb6-%2CJ%2Fk-%2CHY8-%2COJ7-%2CNtF-%2CEya-%2CErT-%2CEo4-%2CEKU-%2CDnL-%2CC5M-%2CCyB-%2CBsD-%2CBrc-%2CBpU-%2Col%2C30%2CC1%2Cd4N%2COo8-%2COi0-%2CLz%2F-%2CLxP-%2CFyp-%2CFVR-%2CEHL-%2CPrP-%2CLmE-%2CK3H-%2CKXJ-%2CFyn%2CIcq-%2CIco-%2CIK4-%2CIIg-%2CH2k-%2CH0N-%2CHwp-%2CHvF-%2CFij-%2CFhl-%2CCwj-%2CCb5-%2CCQj-%2CCQh-%2CB%2B2-%2CBc6-%2ChFo%2CNLq-%2CNI%2F-%2CFzM-%2Cdu-%2CHg2-%2CBug-%2CBse-%2CB9Q-]
__VIEWSTATE[%2FwEPDwUKLTkyMzI2ODA4Ng9kFgYCBA8WBB4EaHJlZgWJAWh0dHA6Ly93d3cuY3dqb2JzLmNvLnVrL0pvYlNlYXJjaC9SU1MuYXNweD9LZXl3b3Jkcz1QeXRob24mTFR4dD1Mb25kb24lMmMrU291dGgrRWFzdCZSYWRpdXM9MCZMSWRzMj1aViZjbGlkPTE2MjEmY2x0eXBlaWQ9MiZjbE5hbWU9TG9uZG9uHgV0aXRsZQUkTGF0ZXN0IFB5dGhvbiBqb2JzIGZyb20gQ1dKb2JzLmNvLnVrZAIGDxYCHgRUZXh0BV48bGluayByZWw9ImNhbm9uaWNhbCIgaHJlZj0iaHR0cDovL3d3dy5jd2pvYnMuY28udWsvSm9iU2Vla2luZy9QeXRob25fTG9uZG9uX2wxNjIxX3QyLmh0bWwiIC8%2BZAIIEGRkFg4CBw8WAh8CBV9Zb3VyIHNlYXJjaCBvbiA8Yj5LZXl3b3JkczogUHl0aG9uOyBMb2NhdGlvbjogTG9uZG9uLCBTb3V0aCBFYXN0OyA8L2I%2BIHJldHVybmVkIDxiPjg1PC9iPiBqb2JzLmQCCQ8WAh4HVmlzaWJsZWhkAgsPFgIfAgUoVGhlIG1vc3QgcmVsZXZhbnQgam9icyBhcmUgbGlzdGVkIGZpcnN0LmQCEw8PFgIeC05hdmlnYXRlVXJsBQF%2BZGQCFQ9kFgYCBQ8PFgYfAgUGUHl0aG9uHgtEZWZhdWx0VGV4dAUMZS5nLiBhbmFseXN0HhNEZWZhdWx0VGV4dENzc0NsYXNzZWRkAgsPDxYGHwIFEkxvbmRvbiwgU291dGggRWFzdB8FBQllLmcuIEJhdGgfBmVkZAIRDxAPFgYeDURhdGFUZXh0RmllbGQFClJhZGl1c05hbWUeDkRhdGFWYWx1ZUZpZWxkBQZSYWRpdXMeC18hRGF0YUJvdW5kZ2QQFREHMCBtaWxlcwcyIG1pbGVzBzUgbWlsZXMIMTAgbWlsZXMIMTUgbWlsZXMIMjAgbWlsZXMIMjUgbWlsZXMIMzAgbWlsZXMIMzUgbWlsZXMINDAgbWlsZXMINDUgbWlsZXMINTAgbWlsZXMINjAgbWlsZXMINzAgbWlsZXMIODAgbWlsZXMIOTAgbWlsZXMJMTAwIG1pbGVzFREBMAEyATUCMTACMTUCMjACMjUCMzACMzUCNDACNDUCNTACNjACNzACODACOTADMTAwFCsDEWdnZ2dnZ2dnZ2dnZ2dnZ2dnZGQCFw9kFgQCAQ9kFgQCBA8QZA8WA2YCAQICFgMQBQhBbGwgam9icwUBMGcQBRlEaXJlY3QgZW1wbG95ZXIgam9icyBvbmx5BQEyZxAFEEFnZW5jeSBqb2JzIG9ubHkFATFnZGQCBg8QZA8WA2YCAQICFgMQBQlSZWxldmFuY2UFATFnEAUERGF0ZQUBMmcQBQZTYWxhcnkFATNnZGQCBQ8PFgYeClBhZ2VOdW1iZXICAh4PTnVtYmVyT2ZSZXN1bHRzAlUeDlJlc3VsdHNQZXJQYWdlAhRkZAIZDxYCHwNoZGQ%3D]
Refinesearch%24txtKeywords[Python]
Refinesearch%24txtLocation[London%2C+South+East]
Refinesearch%24ddlRadius[0]
ddlCompanyType[0]
ddlSort[1]
Response Headers:
Cache-Control[private]
Date[Sun, 02 May 2010 16:09:27 GMT]
Content-Type[text/html; charset=utf-8]
X-Powered-By[ASP.NET]
X-Site-Host[P310]
X-Powered-By[NET]
X-AspNet-Version[2.0.50727]
Set-Cookie[SearchSession=SessionGuid=71de63de-3bd0-4787-895d-b6b9e7c93801&LogSource=NAT; path=/]
Content-Encoding[gzip]
Vary[Accept-Encoding]
Transfer-Encoding[chunked]

-------- WHAT I'M NOW SENDING USING MECHANIZE, WITH SOME HEADERS ADDED ETC. -----------
POST /JobSearch/Results.aspx?Keywords=Python&LTxt=London%2c+South+East&Radius=0&LIds2=ZV&clid=1621&cltypeid=2&clName=London HTTP/1.1
Content-Length: 2424
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip
Host: www.cwjobs.co.uk
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection: keep-alive
Cookie: AnonymousUser=MemberId=8fa5ddd7-17ed-425e-b189-82693bfbaa0c&IsAnonymous=True; SearchSession=SessionGuid=33e4e439-c2d6-423f-900f-574099310d5a&LogSource=NAT
Referer: XXX/JobSearch/Results.aspx?Keywords=Python&LTxt=London%2c+South+East&Radius=0&LIds2=ZV&clid=1621&cltypeid=2&clName=London
Content-Type: application/x-www-form-urlencoded

__EVENTTARGET=srpPager%24btnForward&__EVENTARGUMENT=&hdnSearchResults=BV%2CA%2CC0eif%2CMwc%2CM6s%2COou%2CK09%2CG4H%2CEZf%2CGTu%2CLrr%2CGuX%2CGs9%2CEz9%2CL5X%2CL9U%2ChU%2CHHf%2CMAL%2CNDi%2CJrY%2CGBy%2CM%2BO%2CdE-%2CpI%2CtDI%2CL5L%2CL7l%2CL8z%2CM%2FA%2CPPP%2CCM0%2CEpK%2CHPy%2Cez%2C7p%2CJ2U%2CJ9b%2CJ%2F2%2CKea%2CLBj%2CLvi%2CL2t%2CM8r%2CM9S%2CM%2FA%2CPRT%2CPgi%2Csg7%2CF6%2CI2F%2CJTd%2CO-%2CC0v%2CC3f%2CDCq%2CDxn%2CERl%2CUbV%2CGME%2CGMG%2CGd2%2CGgO%2CGyK%2CG0h%2CG4F%2CG5p%2CJGL%2CJHJ%2CKhj%2CL4L%2CMM1%2CMYL%2CMYN%2CMp4%2CNL0%2COrj%2CvuW%2CBdE%2CBfv%2CI1i%2CBCh-%2COLA%2CHH4%2CM6O%2CM8Q%2CMre&__VIEWSTATE=%2FwEPDwUKLTkyMzI2ODA4Ng9kFgYCBA8WBB4EaHJlZgWJAWh0dHA6Ly93d3cuY3dqb2JzLmNvLnVrL0pvYlNlYXJjaC9SU1MuYXNweD9LZXl3b3Jkcz1QeXRob24mTFR4dD1Mb25kb24lMmMrU291dGgrRWFzdCZSYWRpdXM9MCZMSWRzMj1aViZjbGlkPTE2MjEmY2x0eXBlaWQ9MiZjbE5hbWU9TG9uZG9uHgV0aXRsZQUkTGF0ZXN0IFB5dGhvbiBqb2JzIGZyb20gQ1dKb2JzLmNvLnVrZAIGDxYCHgRUZXh0BV48bGluayByZWw9ImNhbm9uaWNhbCIgaHJlZj0iaHR0cDovL3d3dy5jd2pvYnMuY28udWsvSm9iU2Vla2luZy9QeXRob25fTG9uZG9uX2wxNjIxX3QyLmh0bWwiIC8%2BZAIIEGRkFg4CBw8WAh8CBV9Zb3VyIHNlYXJjaCBvbiA8Yj5LZXl3b3JkczogUHl0aG9uOyBMb2NhdGlvbjogTG9uZG9uLCBTb3V0aCBFYXN0OyA8L2I%2BIHJldHVybmVkIDxiPjg1PC9iPiBqb2JzLmQCCQ8WAh4HVmlzaWJsZWhkAgsPFgIfAgUoVGhlIG1vc3QgcmVsZXZhbnQgam9icyBhcmUgbGlzdGVkIGZpcnN0LmQCEw8PFgIeC05hdmlnYXRlVXJsBQF%2BZGQCFQ9kFgYCBQ8PFgYfAgUGUHl0aG9uHgtEZWZhdWx0VGV4dAUMZS5nLiBhbmFseXN0HhNEZWZhdWx0VGV4dENzc0NsYXNzZWRkAgsPDxYGHwIFEkxvbmRvbiwgU291dGggRWFzdB8FBQllLmcuIEJhdGgfBmVkZAIRDxAPFgYeDURhdGFUZXh0RmllbGQFClJhZGl1c05hbWUeDkRhdGFWYWx1ZUZpZWxkBQZSYWRpdXMeC18hRGF0YUJvdW5kZ2QQFREHMCBtaWxlcwcyIG1pbGVzBzUgbWlsZXMIMTAgbWlsZXMIMTUgbWlsZXMIMjAgbWlsZXMIMjUgbWlsZXMIMzAgbWlsZXMIMzUgbWlsZXMINDAgbWlsZXMINDUgbWlsZXMINTAgbWlsZXMINjAgbWlsZXMINzAgbWlsZXMIODAgbWlsZXMIOTAgbWlsZXMJMTAwIG1pbGVzFREBMAEyATUCMTACMTUCMjACMjUCMzACMzUCNDACNDUCNTACNjACNzACODACOTADMTAwFCsDEWdnZ2dnZ2dnZ2dnZ2dnZ2dnZGQCFw9kFgQCAQ9kFgQCBA8QZA8WA2YCAQICFgMQBQhBbGwgam9icwUBMGcQBRlEaXJlY3QgZW1wbG95ZXIgam9icyBvbmx5BQEyZxAFEEFnZW5jeSBqb2JzIG9ubHkFATFnZGQCBg8QZA8WA2YCAQICFgMQBQlSZWxldmFuY2UFATFnEAUERGF0ZQUBMmcQBQZTYWxhcnkFATNnZGQCBQ8PFgYeClBhZ2VOdW1iZXICAR4PTnVtYmVyT2ZSZXN1bHRzAlUeDlJlc3VsdHNQZXJQYWdlAhRkZAIZDxYCHwNoZGQ%3D&Refinesearch%24txtKeywords=Python&Refinesearch%24txtLocation=London%2C+South+East&Refinesearch%24ddlRadius=0&Refinesearch%24btnSearch=Search&ddlCompanyType=0&ddlSort=1
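
Rather than comparing the two captures by eye, the POST bodies can be decoded and compared field by field. The following is a minimal sketch using only the Python standard library. The helper names `form_fields` and `diff_fields` are hypothetical, and the two bodies are abbreviated copies of the captures above: the long `hdnSearchResults` and `__VIEWSTATE` values are left out for brevity, which is an assumption of this sketch, so in practice the full bodies would be pasted in.

```python
from urllib.parse import parse_qsl


def form_fields(body):
    """Decode an application/x-www-form-urlencoded body into {name: value}."""
    return dict(parse_qsl(body, keep_blank_values=True))


def diff_fields(working_body, failing_body):
    """Return (names only in working, names only in failing, names whose values differ)."""
    working = form_fields(working_body)
    failing = form_fields(failing_body)
    only_working = sorted(set(working) - set(failing))
    only_failing = sorted(set(failing) - set(working))
    changed = sorted(
        name for name in set(working) & set(failing)
        if working[name] != failing[name]
    )
    return only_working, only_failing, changed


# Abbreviated body of the Firefox request (long opaque fields omitted).
firefox_body = (
    "__EVENTTARGET=srpPager%24btnForward&__EVENTARGUMENT="
    "&Refinesearch%24txtKeywords=Python"
    "&Refinesearch%24txtLocation=London%2C+South+East"
    "&Refinesearch%24ddlRadius=0&ddlCompanyType=0&ddlSort=1"
)

# Abbreviated body of the mechanize request (long opaque fields omitted).
mechanize_body = (
    "__EVENTTARGET=srpPager%24btnForward&__EVENTARGUMENT="
    "&Refinesearch%24txtKeywords=Python"
    "&Refinesearch%24txtLocation=London%2C+South+East"
    "&Refinesearch%24ddlRadius=0&Refinesearch%24btnSearch=Search"
    "&ddlCompanyType=0&ddlSort=1"
)

only_firefox, only_mechanize, changed = diff_fields(firefox_body, mechanize_body)
print("only in Firefox:  ", only_firefox)
print("only in mechanize:", only_mechanize)
print("different values: ", changed)
```

On these abbreviated bodies the only difference reported is the extra `Refinesearch$btnSearch` field in the mechanize request. Whether that field matters to the server is not established here; the sketch only shows where the two requests diverge.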

Answer

The SearchSession cookies are quite different: the working request has

SearchSession=SessionGuid=71de63de-3bd0-4787-895d-b6b9e7c93801 

and the non-working one has

SearchSession=SessionGuid=33e4e439-c2d6-423f-900f-574099310d5a 

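Do you have any way to independently verify why the second one might not be accepted by the server? (That may not be what is happening, but since the server is complaining about your SearchSession cookie, it seems like it should be the first line of inquiry.)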
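
One possible check, sketched below with only the standard library, is to extract the `SessionGuid` from the `Cookie` header a client sends and compare it with the one the server issued in `Set-Cookie`. The helper names `parse_cookie_header` and `session_guid` are hypothetical. The header values are copied from the captures in the question, with unrelated cookies trimmed from the Firefox one; the question does not show the `Set-Cookie` that mechanize received, so that comparison is left to the reader.

```python
def parse_cookie_header(header):
    """Split a Cookie or Set-Cookie value into {name: value}.

    Each part is split on its first '=' only, because the values
    themselves contain '=' and '&'.
    """
    cookies = {}
    for part in header.split(";"):
        name, _, value = part.strip().partition("=")
        if name:
            cookies[name] = value
    return cookies


def session_guid(header):
    """Return the SessionGuid inside the SearchSession cookie, or None."""
    search_session = parse_cookie_header(header).get("SearchSession", "")
    for pair in search_session.split("&"):
        key, _, value = pair.partition("=")
        if key == "SessionGuid":
            return value
    return None


# Values taken from the Firefox capture in the question.
firefox_sent = (
    "ecos=774803468-0; "
    "AnonymousUser=MemberId=acc079dd-66b6-4081-9b07-60d6955ee8bf&IsAnonymous=True; "
    "SearchSession=SessionGuid=71de63de-3bd0-4787-895d-b6b9e7c93801&LogSource=NAT"
)
firefox_set_cookie = (
    "SearchSession=SessionGuid=71de63de-3bd0-4787-895d-b6b9e7c93801"
    "&LogSource=NAT; path=/"
)

# Value taken from the mechanize capture in the question.
mechanize_sent = (
    "AnonymousUser=MemberId=8fa5ddd7-17ed-425e-b189-82693bfbaa0c&IsAnonymous=True; "
    "SearchSession=SessionGuid=33e4e439-c2d6-423f-900f-574099310d5a&LogSource=NAT"
)

# In the working capture, the browser echoes back the GUID the server issued.
print(session_guid(firefox_sent) == session_guid(firefox_set_cookie))

# The mechanize GUID to compare against whatever Set-Cookie mechanize received.
print(session_guid(mechanize_sent))
```

A different GUID between two separate sessions is expected on its own. The informative comparison is between the GUID mechanize sends and the GUID the server most recently set for that same mechanize session.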