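Extracting part of a web page

I am building an Android application.

I have the content of a web page (all of its HTML) in a string, and I need to extract all the text inside the paragraphs (p elements) with class="content".

Example: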

<p class="content">La la la</p> 
<p class="another">Le le le</p> 
<p class="content">Li li li</p>

Result:

La la la 
Li li li

What is the best way to do this?

Source

2010-07-30 pacopepe222

Regular expressions are your best option.

http://download-llnw.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html
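A minimal sketch of the regular-expression approach, applied to the example from the question. The class name ContentExtractor and the method extract are made up for illustration. The pattern assumes simple markup: the class attribute is double-quoted and is exactly "content", and the paragraphs are not nested. Real-world HTML often breaks these assumptions, so treat this as a starting point rather than a general HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentExtractor {
    // Matches <p ... class="content" ...>text</p>; group 1 captures the paragraph body.
    // Assumption: the attribute value is double-quoted and is exactly "content".
    private static final Pattern CONTENT_P = Pattern.compile(
            "<p\\b[^>]*\\bclass\\s*=\\s*\"content\"[^>]*>(.*?)</p>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the text of every matching paragraph, in document order.
    public static List<String> extract(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = CONTENT_P.matcher(html);
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<p class=\"content\">La la la</p>\n"
                + "<p class=\"another\">Le le le</p>\n"
                + "<p class=\"content\">Li li li</p>";
        for (String text : extract(html)) {
            System.out.println(text);
        }
    }
}
```

The reluctant quantifier (.*?) stops at the first closing </p>, so each paragraph is captured separately instead of everything between the first and last match.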

Source

2010-07-30 16:06:30 Paddy

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class Test {
    // Downloads the page from the server and prints it line by line.
    void readScreen() {
        try {
            // Open url
            URL url = new URL("http://somewebsite.com");

            // Note: a more portable URL:
            //url = new URL(getCodeBase().toString() + "/ToDoList/ToDoList.txt");

            URLConnection urlConn = url.openConnection();
            urlConn.setDoInput(true);
            urlConn.setUseCaches(false);

            // BufferedReader replaces the deprecated DataInputStream.readLine().
            // The platform default charset is used; pass one explicitly if the page encoding is known.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(urlConn.getInputStream()))) {
                String s;
                while ((s = reader.readLine()) != null) {
                    System.out.println(s); // one line of the downloaded page
                }
            }
        } catch (MalformedURLException mue) {
            System.err.println("Invalid URL: " + mue.getMessage());
        } catch (IOException ioe) {
            System.err.println("Could not read the page: " + ioe.getMessage());
        }
    }

    public static void main(String[] args) {
        Test thisTest = new Test();
        thisTest.readScreen();
    }
}

Source

2010-07-30 14:32:08 Mike

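First of all, thanks for your help :) I am already doing this; my problem is that I don't know how to extract only certain parts of the page (in my case, all the paragraphs with class="content"). I know I could search through every line manually, but there must be a better way to do it. – pacopepe222 2010-07-30 14:38:12

It would probably be better to download the HTML file and then parse the text from there. You could use some XML tools to find the tags you want. That is about as much as I have done with networking and Java; sorry I can't be of more help. – Mike 2010-07-30 17:00:46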
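One possible reading of the "XML tools" suggestion is the standard javax.xml.xpath API, sketched below. The class name XPathContentExtractor and the method extract are hypothetical. The big assumption is that the page is well-formed XML (for example XHTML) with a single root element; typical HTML with unclosed tags or entities such as &nbsp; will make the parser fail, in which case a lenient HTML parser would be needed instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathContentExtractor {
    // Returns the text of every <p class="content"> element, in document order.
    // Assumption: the input is well-formed XML with a single root element.
    public static List<String> extract(String xhtml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//p[@class='content']",
                    new InputSource(new StringReader(xhtml)),
                    XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (XPathExpressionException e) {
            // Raised when the expression is invalid or the input cannot be parsed as XML.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // The sample paragraphs are wrapped in <body> so the document has one root.
        String xhtml = "<body>"
                + "<p class=\"content\">La la la</p>"
                + "<p class=\"another\">Le le le</p>"
                + "<p class=\"content\">Li li li</p>"
                + "</body>";
        for (String text : extract(xhtml)) {
            System.out.println(text);
        }
    }
}
```

The XPath expression //p[@class='content'] selects only paragraphs whose class attribute is exactly "content", which matches the example in the question; a paragraph with several classes would need a different predicate.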
