python
  • parsing
  • beautifulsoup
  • 2017-02-20 75 views 1 likes 

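    Getting a specific value from BeautifulSoup parsing

    I recently started learning more about Python and how to parse websites with BeautifulSoup.

    The problem I'm facing now is that I seem to be stuck.

    HTML code (after making the soup):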

    <div class="mod-3-piece-app__visual-container__chart"> 
        <div class="mod-ui-chart--dynamic" data-chart-config='{"chartData":{"periods":[{"year":2013,"period":null,"periodicity":"A","icon":null},{"year":2014,"period":null,"periodicity":"A","icon":null},{"year":2015,"period":null,"periodicity":"A","icon":null},{"year":2016,"period":null,"periodicity":"A","icon":null},{"year":2017,"period":null,"periodicity":"A","icon":null},{"year":2018,"period":null,"periodicity":"A","icon":null}],"forecastRange":{"from":3.5,"to":5.5},"actualValues":[5.6785,6.45,9.22,8.31,null,null],"consensusData":[{"y":5.6307,"toolTipData":{"low":5.5742,"high":5.7142,"analysts":34,"restatement":null}},{"y":6.3434,"toolTipData":{"low":6.25,"high":6.5714,"analysts":35,"restatement":null}},{"y":9.1265,"toolTipData":{"low":9.02,"high":9.28,"analysts":40,"restatement":null}},{"y":8.2734,"toolTipData":{"low":8.17,"high":8.335,"analysts":40,"restatement":null}},{"y":8.9304,"toolTipData":{"low":8.53,"high":9.63,"analysts":41,"restatement":null}},{"y":10.1252,"toolTipData":{"low":8.63,"high":11.61,"analysts":42,"restatement":null}}]}}'> 
         <noscript> 
          <div class="mod-ui-chart--static"> 
           <div class="mod-ui-chart--sprited" style="width:410px; height:135px; background:url('/data/Charts/EquityForecast?issueID=36276&amp;height=135&amp;width=410') 0px -270px no-repeat;"> 
           </div> 
          </div> 
         </noscript> 
        </div> 
    </div> 
    

    My code:

    from bs4 import BeautifulSoup 
    import urllib.request 
    
    
    data = [] 
    List = ['AAPL'] 
    
    # Iterates Through List 
    for i in List : 
        # The webpage which we wish to Parse 
        soup = BeautifulSoup(urllib.request.urlopen('https://markets.ft.com/data/equities/tearsheet/forecasts?s=AAPL:NSQ').read(), 'lxml') 
    
        # Gathering the data 
        Values = soup.find_all("div", {"class":"mod-3-piece-app__visual-container__chart"})[4] 
        print(Values) 
    
        # Getting desired values from data 
    

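    What I want to get are the values after {"y" ..., i.e. the numbers 5.6307, 6.3434, 9.1265, 8.2734, 8.9304 and 10.1252, but I can't for the life of me figure out how. I tried Values.get_text as well as Values.text, but those only return blank output (probably because all the data is inside a list or something similar).

    If I could get the data after "toolTipData", that would be fine too.

    Would anyone mind helping me?

    If I've missed anything, please give me feedback so that I can ask better questions in the future.

    Thanks

    Answers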


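    In short, you want to get some information that is stored inside an attribute of a tag.

    What I would do is:

    1. Open the page source to find out where your information is located
    2. Use find_all to look for elements with the right class attribute, mod-ui-chart--dynamic
    3. For each element returned by find_all, read its attribute content using .get()
    4. Search the attribute content string for the term 'actualValues'
    5. If 'actualValues' is found, load the string as JSON and navigate its values.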
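To try the steps offline before hitting the real page, here is a minimal sketch that runs them against a shortened copy of the HTML from the question. The `HTML` string and the helper name `extract_actual_values` are illustrative assumptions, and it uses the built-in `html.parser` instead of `lxml` so that nothing extra needs to be installed.

```python
import json
from bs4 import BeautifulSoup

# Shortened stand-in for the page source; the values come from the question's HTML.
HTML = """
<div class="mod-3-piece-app__visual-container__chart">
  <div class="mod-ui-chart--dynamic"
       data-chart-config='{"chartData":{"actualValues":[5.6785,6.45,9.22,8.31,null,null]}}'>
    <noscript><div class="mod-ui-chart--static"></div></noscript>
  </div>
</div>
"""


def extract_actual_values(html):
    """Return one actualValues list per chart element found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    found = []
    for elem in soup.find_all("div", {"class": "mod-ui-chart--dynamic"}):
        # The data is stored in an attribute, not in the element's text.
        config = elem.get("data-chart-config")
        if config is None or "actualValues" not in config:
            continue
        chart = json.loads(config)  # JSON null becomes Python None
        found.append(chart["chartData"]["actualValues"])
    return found


print(extract_actual_values(HTML))
```

This also suggests why `.text` came back blank in the question: the numbers are part of the `data-chart-config` attribute value rather than the text content of the `div`.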

    Please try the following piece of code. I've commented it, so you should be able to understand what it is doing.

    Code:

    from bs4 import BeautifulSoup 
    import urllib.request 
    import json 
    
    data = [] 
    List = ['AAPL'] 
    
    # Iterates Through List 
    for i in List: 
        # The webpage which we wish to Parse 
        soup = BeautifulSoup(urllib.request.urlopen('https://markets.ft.com/data/equities/tearsheet/forecasts?s=AAPL:NSQ').read(), 'lxml') 
    
        # Gathering the data 
        elemList = soup.find_all('div', {'class':'mod-ui-chart--dynamic'}) 
    
        # Read the `data-chart-config` attribute of each `div` with `class=mod-ui-chart--dynamic`
        for elem in elemList:

            elemID = elem.get('class')
            elemName = elem.get('data-chart-config')

            # If there's no value in elemName, skip this element
            if elemName is None:
                pass

            # If the term 'actualValues' exists in elemName
            elif 'actualValues' in elemName:
                #print('Extracting actualValues from:\n')
                #print("Attribute id = %s" % elemID)
                #print()
                #print("Attribute name = %s" % elemName)
                #print()

                # Parse the `data-chart-config` attribute as JSON
                data = json.loads(elemName)

                #print(json.dumps(data, indent=4, sort_keys=True))
                #print(data['chartData']['actualValues'])

                # Fetch the desired values
                val1 = data['chartData']['actualValues'][0]
                val2 = data['chartData']['actualValues'][1]
                val3 = data['chartData']['actualValues'][2]
                val4 = data['chartData']['actualValues'][3]

                # Print the desired values
                print(val1, val2, val3, val4)

                print('-'*15)
    

    Output:

    1.9 1.42 1.67 3.36 
    --------------- 
    5.6785 6.45 9.22 8.31 
    --------------- 
    50557000000 42358000000 46852000000 78351000000 
    --------------- 
    170910000000 182795000000 233715000000 215639000000 
    --------------- 
    

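    P.S. 1: If you like, you can uncomment the print() calls inside the elif block to understand the program.

    P.S. 2: If you like, you can replace 'actualValues' in val1 = data['chartData']['actualValues'][0] with 'consensusData'. Each entry there is a dict rather than a number, so the value itself sits under its "y" key.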
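For the "y" and "toolTipData" values the question asks about, a minimal sketch along these lines may help. The `CONFIG` string is a shortened copy of the `data-chart-config` value from the question (first two entries only), and the function name `consensus_points` is made up for illustration.

```python
import json

# Shortened copy of the attribute value from the question (first two entries only).
CONFIG = (
    '{"chartData":{"consensusData":['
    '{"y":5.6307,"toolTipData":{"low":5.5742,"high":5.7142,"analysts":34,"restatement":null}},'
    '{"y":6.3434,"toolTipData":{"low":6.25,"high":6.5714,"analysts":35,"restatement":null}}'
    ']}}'
)


def consensus_points(config_json):
    """Return (y, low, high) tuples from a data-chart-config JSON string."""
    chart = json.loads(config_json)
    points = []
    for entry in chart["chartData"]["consensusData"]:
        tip = entry["toolTipData"]
        points.append((entry["y"], tip["low"], tip["high"]))
    return points


# Keep only the "y" value of each entry.
print([y for y, _, _ in consensus_points(CONFIG)])  # prints [5.6307, 6.3434]
```

Looping over the list instead of indexing `[0]` to `[3]` by hand means the code keeps working if the number of periods on the page changes.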


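    Thank you. Changing 'actualValues' works perfectly for the single-asset case (AAPL only), but when I try to add other assets (IBM, for example), val1 to val4 get overwritten. I'll do my best to find a way to split this dictionary into a list and then append to it on each run.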
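One possible way to avoid the overwriting described in this comment is to append each chart's values to a per-symbol list instead of assigning them to fixed names. This is only a sketch: `collect_actual_values` and `configs_by_symbol` are hypothetical names, the AAPL strings are shortened from the output shown in the answer, and "DEMO" is a made-up symbol with dummy numbers, not real data.

```python
import json


def collect_actual_values(configs_by_symbol):
    """Map each symbol to a list holding one actualValues list per chart."""
    results = {}
    for symbol, configs in configs_by_symbol.items():
        for config in configs:
            chart = json.loads(config)
            values = chart["chartData"]["actualValues"]
            # Append rather than reassign, so earlier results are kept.
            results.setdefault(symbol, []).append(values)
    return results


# AAPL values are taken from the answer's output; "DEMO" holds dummy
# placeholder numbers purely to show a second symbol.
configs_by_symbol = {
    "AAPL": [
        '{"chartData":{"actualValues":[1.9,1.42,1.67,3.36]}}',
        '{"chartData":{"actualValues":[5.6785,6.45,9.22,8.31]}}',
    ],
    "DEMO": ['{"chartData":{"actualValues":[1.0,2.0,3.0,4.0]}}'],
}

results = collect_actual_values(configs_by_symbol)
print(results["AAPL"][1])  # prints [5.6785, 6.45, 9.22, 8.31]
```

In a real run, the strings in `configs_by_symbol` would come from the `data-chart-config` attributes fetched for each symbol's page.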
