Getting information from a Ghost.py file


I am working on a project where I need to get information from a web page. I am using Python and Ghost. I saw this code in the documentation:

links = gh.evaluate(""" 
        var linksobj = document.querySelectorAll("a"); 
        var links = []; 
        for (var i=0; i<linksobj.length; i++){ 
         links.push(linksobj[i].value); 
        } 
        links; 
       """)

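This code is definitely not Python. Which language is it, and where can I learn how to write it? How can I find a string inside a tag, for example in:

title>this is title of the webpage

How can I get

this is title of the page

Thanks.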

Source

2014-05-05 user1934948

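If the output you receive is a string, then I think you should look at the common string operations in Python. You can strip, split, and do many other things with strings. https://docs.python.org/2/library/string.html

Looks like JavaScript.

@PadraicCunningham Your analysis seems correct to me too.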

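ghost.py is a WebKit client. It lets you load a web page and interact with its DOM and JavaScript runtime.

This means that, once you have everything installed and running, you can simply do this: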

from ghost import Ghost 
ghost = Ghost() 
page, resources = ghost.open('http://stackoverflow.com/') 
if page.http_status == 200: 
    result, extra = ghost.evaluate('document.title;') 
    print('The title is: {}'.format(result))

Source

2014-05-05 10:28:22

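Edit: After seeing Padraic Cunningham's answer, it seems I unfortunately misunderstood your question. Anyway, I am leaving my answer here for future reference, or for downvotes. :P

If the output you receive is a string, then the common string operations in Python are enough to produce the output you describe in your question.

You receive: title>this is title of the webpage

You want: this is title of the webpage

Assuming the output you receive is always in the same format, you can use the following string operations to get the desired output. Using split: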

>>> s = 'title>this is title of the webpage' 
>>> p = s.split('>') 
>>> p 
['title', 'this is title of the webpage'] 
>>> p[1] 
'this is title of the webpage'

Here p is a list, so you have to index the element that holds the desired output (p[1] in this case).

Or, more simply, take a substring.

>>> s = 'title>this is title of the webpage' 
>>> p = s[6:] 
>>> p 
'this is title of the webpage'

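In the snippet above, p = s[6:] means you want everything in the string title>this is title of the webpage from the 7th character to the end. In other words, you skip the first 6 characters, which are exactly the prefix title>.

If the output you receive is not always in the same format, you may prefer to use regular expressions.

Your second question has already been answered in the comments. I hope I understood your question correctly.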
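As a rough sketch of the regular-expression route, the helper below pulls the text out whether or not the marker has its leading < and whether or not a closing tag follows. The function name extract_title and the exact pattern are my own assumptions for illustration, not something taken from Ghost.py or from the question.

```python
import re

def extract_title(text):
    """Return the text that follows a 'title>' marker, or None if there is no marker.

    Hypothetical helper: the name and the pattern are illustrative assumptions.
    """
    # Optional '<', then 'title>', optional whitespace, then capture up to the next '<' (or end of string).
    match = re.search(r"<?title>\s*([^<]*)", text, re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).strip()

print(extract_title('title>this is title of the webpage'))
print(extract_title('<title>this is title of the webpage</title>'))
```

Both calls should print the same title text. A pattern like this is only reasonable for short, predictable strings; for whole HTML documents a real parser is the safer choice.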

Source

2014-05-05 10:14:28

Use requests and BeautifulSoup:

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.google.com/")
# Name the parser explicitly; otherwise bs4 picks whichever parser is installed and warns about it.
soup = BeautifulSoup(r.text, "html.parser")

In [3]: soup.title.string
Out[3]: u'Google'
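If installing bs4 is not an option, roughly the same idea can be sketched with the standard library's html.parser module. This is a swapped-in technique, not what the answer above uses. The TitleParser class, the get_title helper, and the sample HTML string are all assumptions made up for illustration, and the code targets Python 3.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text of the first <title> element (hypothetical helper class)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Keep only the first title text encountered.
        if self._in_title and self.title is None:
            self.title = data.strip()

def get_title(html):
    """Return the page title from an HTML string, or None if there is no <title>."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.title

# Made-up sample document; in practice the HTML would come from requests or Ghost.
sample = "<html><head><title>this is title of the webpage</title></head><body></body></html>"
print(get_title(sample))
```

BeautifulSoup remains the more forgiving choice for messy real-world pages; this sketch only covers the simple case of a single well-formed title element.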

Source

2014-05-05 10:17:35

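+1 I think this is what the OP actually needs. :)

Short and sweet!

I don't see how this solves the asker's problem, since he is not using requests or BeautifulSoup.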
