Calculate the average height and average width of div tags

I need to get the average div height and width of an HTML document.

I tried this solution, but it doesn't work:

import numpy as np 
average_width = np.mean([div.attrs['width'] for div in my_doc.get_div() if 'width' in div.attrs]) 
average_height = np.mean([div.attrs['height'] for div in my_doc.get_div() if 'height' in div.attrs]) 
print average_height,average_width

The get_div method returns a list of all the divs, retrieved with BeautifulSoup's find_all method. Here is an example:

print my_doc.get_div()[1] 

<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:45px; top:81px; width:127px; height:9px;"> 
    <span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">Journal of  Infection (2015) 
    </span> 
    <span style="font-family: EICMDB+AdvTrebu-B; font-size:8px">xx</span> 
    <span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">, 1</span> 
    <span style="font-family: EICMDD+AdvPS44A44B; font-size:7px">e</span> 
    <span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">4 
    <br/> 
    </span> 
</div>

When I get the attributes, it works perfectly:

print my_doc.get_div()[1].attrs 

{u'style': u'position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:45px; top:81px; width:127px; height:9px;'}

But when I try to get the value:

print my_doc.get_div()[1].attrs['width']

I get an error:

KeyError: 'width'

but I don't understand why, because when I check the type:

print type(my_doc.get_div()[1].attrs)

it is a dictionary: <type 'dict'>
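The KeyError follows from the attrs output above: attrs maps HTML attribute names to their values, and this div has exactly one attribute, style. width and height are CSS declarations inside that string, not attributes of their own. A minimal stdlib-only sketch of pulling one declaration out of the style string (the helper name css_property is hypothetical, not part of BeautifulSoup):

```python
# attrs as printed above: 'style' is the only key, so attrs['width'] raises KeyError.
attrs = {u'style': u'position:absolute; border: textbox 1px solid; writing-mode:lr-tb; '
                   u'left:45px; top:81px; width:127px; height:9px;'}


def css_property(style, name):
    """Return the value of CSS property `name` from an inline style string, or None."""
    for declaration in style.split(';'):
        # partition() splits on the first ':' only, so values containing ':' survive.
        prop, sep, value = declaration.partition(':')
        if sep and prop.strip().lower() == name.lower():
            return value.strip()
    return None


print('width' in attrs)                        # False
print(css_property(attrs['style'], 'width'))   # 127px
print(css_property(attrs['style'], 'height'))  # 9px
```

With BeautifulSoup you would call it as css_property(div.get('style', ''), 'width'); using div.get rather than div.attrs['style'] avoids a second KeyError for divs that have no style attribute.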


2015-10-15 mazouu rahim

Can you give the web page, or more of the HTML page source? – SIslam

@SIslam, I edited my post –

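How do you compute the width of a 'div'? For example: I have a 'div' set to 100% width. If my window is fullscreen, that's about ~1900px. If my window is smaller, the 'div' is smaller. So what is its width? Where does the concept of an 'average' come from? –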
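The comment's point can be made concrete: without rendering the page, only absolute lengths written in the inline style can be averaged, since resolving 100% needs a layout engine (a real browser), which BeautifulSoup does not provide. A hedged sketch, assuming you are willing to skip relative values such as 100% or auto instead of resolving them; the helper names px_value and mean_px are hypothetical:

```python
import re

# An absolute pixel length such as "127px" or "12.5px" (nothing else).
PX = re.compile(r'^\s*(\d+(?:\.\d+)?)px\s*$')


def px_value(value):
    """Return `value` as a float number of pixels, or None if it is not a px length."""
    if value is None:
        return None
    match = PX.match(value)
    return float(match.group(1)) if match else None


def mean_px(values):
    """Average the px lengths in `values`, skipping relative or missing ones."""
    pixels = [p for p in (px_value(v) for v in values) if p is not None]
    return sum(pixels) / len(pixels) if pixels else None


print(mean_px(['127px', '100%', '73px', 'auto']))  # 100.0
```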

There may be better ways, but here are two:

Way 1

Below is the code I tested to extract the width and height.

from bs4 import BeautifulSoup 

html_doc = '''<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:45px; top:81px; width:127px; height:9px;"> 
    <span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">Journal of  Infection (2015) 
    </span> 
    <span style="font-family: EICMDB+AdvTrebu-B; font-size:8px">xx</span> 
    <span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">, 1</span> 
    <span style="font-family: EICMDD+AdvPS44A44B; font-size:7px">e</span> 
    <span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">4 
    <br/> 
    </span> 
</div>''' 

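soup = BeautifulSoup(html_doc, 'html.parser')
# The CSS lives in the div's "style" attribute, not in separate width/height attributes
my_att = [i.get('style', '') for i in soup.find_all("div")]
dd = ';'.join(my_att).split(";")
# Keep only real "property: value" fragments, stripped of surrounding whitespace
dd_cln = [i.strip() for i in dd if ':' in i]
# Split on the first ':' only, so values that contain ':' do not break dict()
my_dict = dict(i.split(':', 1) for i in dd_cln)
print(my_dict['width'])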
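Way 1 joins the style strings of every div into one string and builds a single dict from it, so with several divs each key keeps only the value of the last div that declares it. For an average you need one dict per div. A sketch under stated assumptions: it takes the style strings as plain input (with BeautifulSoup they would come from [d.get('style', '') for d in soup.find_all('div')]), it assumes widths and heights are written in px and ignores anything else, and the helper names and the second and third example styles are made up for illustration:

```python
def parse_style(style):
    """Turn an inline style string into a {property: value} dict."""
    declarations = {}
    for item in style.split(';'):
        prop, sep, value = item.partition(':')
        if sep:  # skip empty fragments such as the one after a trailing ';'
            declarations[prop.strip().lower()] = value.strip()
    return declarations


def _mean(xs):
    """Arithmetic mean, or None for an empty list."""
    return sum(xs) / len(xs) if xs else None


def average_size(styles):
    """Return (mean width, mean height) in px over the divs that declare each."""
    dicts = [parse_style(s) for s in styles]
    # '127px'[:-2] -> '127'; values not ending in 'px' are skipped
    widths = [float(d['width'][:-2]) for d in dicts if d.get('width', '').endswith('px')]
    heights = [float(d['height'][:-2]) for d in dicts if d.get('height', '').endswith('px')]
    return _mean(widths), _mean(heights)


styles = [
    'position:absolute; left:45px; top:81px; width:127px; height:9px;',
    'position:absolute; left:45px; top:95px; width:73px; height:11px;',  # made-up div
    'position:absolute; left:45px; top:110px;',  # made-up div with no size: ignored
]
print(average_size(styles))  # (100.0, 10.0)
```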

Way 2: Use a regular expression, as described here. Since you are using numpy's mean, the code below uses it too.

Working code:

import numpy as np 
import re 
from bs4 import BeautifulSoup 

html_doc = '''<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:45px; top:81px; width:127px; height:9px;"> 
    <span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">Journal of  Infection (2015) 
    </span> 
    <span style="font-family: EICMDB+AdvTrebu-B; font-size:8px">xx</span> 
    <span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">, 1</span> 
    <span style="font-family: EICMDD+AdvPS44A44B; font-size:7px">e</span> 
    <span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">4 
    <br/> 
    </span> 
</div>''' 

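soup = BeautifulSoup(html_doc, 'html.parser')
# Join the style strings with ';' so declarations from different divs stay separate
css = ';'.join(i.get('style', '') for i in soup.find_all("div"))
print(css)
# (?<![\w-]) stops "border-width:" or "line-height:" from matching; \s* allows "width: 127px"
width_list = [float(w) for w in re.findall(r'(?<![\w-])width:\s*(\d+(?:\.\d+)?)px', css)]
height_list = [float(h) for h in re.findall(r'(?<![\w-])height:\s*(\d+(?:\.\d+)?)px', css)]
# Real lists (not map objects) are needed for np.mean under Python 3
print(np.mean(height_list))
print(np.mean(width_list))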


2015-10-15 16:16:13 SIslam

Actually it works as it should, because the dictionary key is 'style', not 'width'. I tried this solution http://stackoverflow.com/questions/10401110/using-beautiful-soup-to-convert-css-attributes-to-individual-html-attributes: 'import cssutils a = cssutils.parseStyle(my_doc.get_div()[1].attrs['style']) print a['width']' but I get this error: 'ERROR\tProperty: Invalid value for "CSS Backgrounds and Borders Module Level 3" property: textbox 1px solid [1:20: border] WARNING\tProperty: Unknown Property name. [1:47: writing-mode]' –

Same here! Yes, these may be custom properties that the Python library doesn't support! – SIslam

Changed the answer! – SIslam


