def get_site(r): 
    from bs4 import BeautifulSoup 
    soup=BeautifulSoup(r, 'lxml') 

This is the code I am using. The r mentioned here is built from the URL in another function:

r = urllib2.Request(url)

When I run the Python code, the error I get is this:

    in __getattr__
        raise AttributeError, attr
    AttributeError: __len__

Could you please help me solve this? The URL is an https URL, and I am using Python 2.7.
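For context (inferred from the traceback rather than stated anywhere in the thread): urllib2.Request.__getattr__ in Python 2 ends in raise AttributeError, attr, so when BeautifulSoup probes the object for __len__ the lookup fails exactly like this; BeautifulSoup wants the page markup itself, not the request object. A minimal sketch of the intended flow, using the example GitHub URL that appears later in this thread:

# Minimal sketch, assuming the error comes from passing a urllib2.Request
# object to BeautifulSoup instead of the page source (an inference, not
# something confirmed in the thread).
import urllib2
from bs4 import BeautifulSoup

url = 'https://github.com/johnpapa?tab=followers'   # example URL from this thread

req = urllib2.Request(url)
# BeautifulSoup(req, 'lxml')          # fails with AttributeError: __len__ as above

html = urllib2.urlopen(req).read()    # actually fetch the page source (a string)
soup = BeautifulSoup(html, 'lxml')    # parsing the string works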

UPDATE: here is the whole code:

def web(url, a):
    def get_url(url):
        import urllib2
        base = url
        r = urllib2.urlopen(url).read()
        return r

    def get_site(r):
        from bs4 import BeautifulSoup
        soup = BeautifulSoup(r, 'html.parser')
        for lt in soup.find_all("a", class_="url"):
            if lt.get("href"):
                si = lt.get("href")
                name = soup.find_all("h4", itemprop="name")
                cat = "username-%s, site-%s\n" % (name, si)
                with open("/home/agneljeo/Desktop/url.txt", "w") as ul1:
                    ul1.write(cat)
                    ul1.close()
        return soup

    def get_fol(soup):
        global url1
        url1 = ""
        for item in soup.find_all("a", class_="d-inline-block no-underline mb-1"):
            if len(item.get("href")) < 2:
                url1 = item.get("href")
                break
        return url1

    def main():
        gu = get_url(url)
        gs = get_site(gu)
        gf = get_fol(gs)
        if a > 0:
            web(gf, a - 1)

    main()

I call the script from the terminal and pass the URL on the command line, as shown below.
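The update never shows how url and a actually reach web(). A hypothetical sketch of the command-line glue implied by the terminal call quoted further down (the sys.argv handling and the depth value 2 are assumptions, not part of the question):

# Hypothetical glue for sa.py (not shown in the question): read the start URL
# from the command line and kick off the crawl with an assumed depth of 2.
import sys

if __name__ == "__main__":
    start_url = sys.argv[1]   # e.g. http://johnpapa.net
    web(start_url, 2)         # 'web' is the function from the update above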


Answer


Try using this instead of Request:

from bs4 import BeautifulSoup
import urllib2

url = 'https://github.com/johnpapa?tab=followers'
content = urllib2.urlopen(url).read()          # fetch the page source as a string
soup = BeautifulSoup(content, 'html.parser')
for lt in soup.find_all("a", class_="url"):    # each follower's homepage link
    if lt.get("href"):
        si = lt.get("href")
        print si

~/Desktop$ python sa.py http://johnpapa.net

Here is how you can go to the followers page and grab his followers with BeautifulSoup:

from bs4 import BeautifulSoup
import urllib2

url = 'https://github.com/johnpapa?tab=followers'
content = urllib2.urlopen(url).read()
soup = BeautifulSoup(content, 'html.parser')

# homepage links of the followers shown on the first page
for lt in soup.find_all("a", class_="url"):
    if lt.get("href"):
        si = lt.get("href")
        print si

# visit each follower's own followers page and repeat one level deeper
for item in soup.find_all("a", class_="d-inline-block no-underline mb-1"):
    url1 = item.get("href")
    url1 = 'https://github.com' + url1 + '?tab=followers'
    print url1
    content = urllib2.urlopen(url1).read()
    soup1 = BeautifulSoup(content, 'html.parser')
    for lt in soup1.find_all("a", class_="url"):
        if lt.get("href"):
            si = lt.get("href")
            print si
    for item in soup1.find_all("a", class_="d-inline-block no-underline mb-1"):
        url1 = item.get("href")
        url1 = 'https://github.com' + url1 + '?tab=followers'
        print url1

Output:

~/Desktop$ python sa.py http://johnpapa.net
https://github.com/gu1ma?tab=followers
...
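Note that the block above hard-codes exactly two levels of follower pages, while the question's web(url, a) wanted an arbitrary depth a. A possible way to fold the same scraping logic into a depth-limited recursion (a sketch in the spirit of the question, not part of the original answer; crawl_followers is a made-up name, and the class names are simply the ones GitHub used at the time):

# Sketch only: same scraping logic as above, but as a depth-limited recursion
# like the question's web(url, a). Class names may change when GitHub changes
# its markup.
import urllib2
from bs4 import BeautifulSoup

def crawl_followers(url, depth):
    if depth <= 0 or not url:
        return
    content = urllib2.urlopen(url).read()
    soup = BeautifulSoup(content, 'html.parser')
    for lt in soup.find_all("a", class_="url"):
        if lt.get("href"):
            print lt.get("href")                 # follower's homepage URL
    for item in soup.find_all("a", class_="d-inline-block no-underline mb-1"):
        next_url = 'https://github.com' + item.get("href") + '?tab=followers'
        crawl_followers(next_url, depth - 1)     # recurse one level deeper

crawl_followers('https://github.com/johnpapa?tab=followers', 2)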

Comments:

After 'soup = BeautifulSoup(r, 'lxml')' the next line is 'for lt in soup.find_all("a", class_="url"):'. For that, is urlopen(url).read necessary, or .request ...?

urllib2.urlopen(url).read() will get the complete source of the page.

By the way, the new error is 'ValueError: unknown url type:', even though the URL is of the https: form and the URL used is (https://github.com/johnpapa?tab=followers).
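A guess at where that ValueError comes from, based only on the code in the update: urllib2.urlopen raises ValueError: unknown url type when given a string with no scheme, and get_fol returns an empty string whenever no matching link is found (a bare /username href has no scheme either). A defensive sketch, with to_absolute as a hypothetical helper:

# Hypothetical guard, assuming the ValueError comes from an empty or relative
# href reaching urlopen (not confirmed anywhere in the thread).
import urllib2
import urlparse

def to_absolute(href, base='https://github.com'):
    # Return an absolute followers-page URL, or None if the href is empty.
    if not href:
        return None
    return urlparse.urljoin(base, href) + '?tab=followers'

next_url = to_absolute(url1)          # url1 is the value returned by get_fol
if next_url:
    content = urllib2.urlopen(next_url).read()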