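Returning a dynamically created function
I'm trying to create a function that takes several arguments and returns a callable lambda function. I pass these lambda functions to BeautifulSoup's find_all method to parse HTML.
Here is the function I wrote to generate the lambda functions: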
def tag_filter_function(self, name="", search_terms={}, attrs=[], **kwargs):
    # filter attrs that are in the search_terms keys out of attrs
    attrs = [attr for attr in attrs if attr not in search_terms.keys()]
    # array of strings to compile into a lambda function
    exec_strings = []
    # add name search into exec_strings
    if len(name) > 0:
        tag_search_name = "tag.name == \"{}\"".format(name)
        exec_strings.append(tag_search_name)
    # add generic search terms into exec_strings
    if len(search_terms) > 0:
        tag_search_terms = ' and '.join(["tag.has_attr(\"{}\") and tag[\"{}\"] == \"{}\"".format(k, k, v) for k, v in search_terms.items()])
        exec_strings.append(tag_search_terms)
    # add generic has_attr calls into exec_strings
    if len(attrs) > 0:
        tag_search_attrs = ' and '.join(["tag.has_attr(\"{}\")".format(item) for item in attrs])
        exec_strings.append(tag_search_attrs)
    # function string
    exec_string = "lambda tag: " + " and ".join(exec_strings)
    return exec(compile(exec_string, '<string>', 'exec'))
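The string built by calling
tag_filter_function(name="article", search_terms={"id" : "article"})
is
lambda tag: tag.name == "article" and tag.has_attr("id") and tag["id"] == "article"
but the function's return value is None. I'm not sure that exec() is what I want to use here. Is it possible to turn that string into an executable lambda function, and if so, how? I also don't know whether I'm going about this the right way.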
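For what it's worth, here is a minimal sketch of what seems to be going on, not a definitive fix. Python's built-in exec() runs statements and always returns None, while eval() evaluates an expression and returns its value, so eval() on the same string would hand back the lambda. The sketch also shows a closure that builds the same predicate without compiling a string at all. FakeTag and make_tag_filter are hypothetical names introduced only for illustration; FakeTag stands in for a BeautifulSoup tag so the example runs without bs4 installed.

```python
# Stand-in for a BeautifulSoup Tag so the sketch runs without bs4 (hypothetical helper).
class FakeTag:
    def __init__(self, name, attrs):
        self.name = name
        self.attrs = attrs

    def has_attr(self, key):
        return key in self.attrs

    def __getitem__(self, key):
        return self.attrs[key]


source = 'lambda tag: tag.name == "article" and tag.has_attr("id") and tag["id"] == "article"'

# exec() runs statements and always returns None, which matches the observed result.
exec_result = exec(compile(source, "<string>", "exec"))

# eval() evaluates an expression and returns its value, here the lambda itself.
from_string = eval(source)


def make_tag_filter(name="", search_terms=None, attrs=()):
    """Build the same predicate as a closure, with no string compilation."""
    search_terms = dict(search_terms or {})
    # Attributes already covered by search_terms need no separate has_attr check.
    attrs = [a for a in attrs if a not in search_terms]

    def tag_filter(tag):
        if name and tag.name != name:
            return False
        for key, value in search_terms.items():
            if not tag.has_attr(key) or tag[key] != value:
                return False
        return all(tag.has_attr(a) for a in attrs)

    return tag_filter


closure = make_tag_filter(name="article", search_terms={"id": "article"})
match = FakeTag("article", {"id": "article"})
miss = FakeTag("div", {"id": "article"})
print(exec_result, from_string(match), closure(match), closure(miss))
```

The closure version avoids quoting problems when a name or value contains a double quote, and it still returns a usable function when no criteria are given, where the string version would try to compile an empty lambda body.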
If you're using has_attr on the tag, shouldn't you be looking for 'tag.attr' rather than 'tag[attr]'?