2014-10-08 236 views
0

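Parsing a string: HTTP string

I want to do something like the following, so that I am left with only the URL part of the string. I am having trouble with the quotation marks in the string.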

    // This is what I read into a string:
    // <td width="118"><a href="research.html" class="navText style10 style12">
    // I want to parse this so that I am left with only research.html

    // I sometimes also get a string that contains:
    // <a href="http://www.ucalgary.ca" class="style18"><font size="3">University of Calgary</font></a></div>
    // From this string I want to keep http://www.ucalgary.ca

What I have so far does not work for every case. I would appreciate your help! My code is:

    public class Parse
    {
        public static void main(String[] args)
        {
            String h = "<a href=\"http://www.departmentofmedicine.com/policy.htm\">";
            // Index of the first double quote in the string
            int n = getIndexOf(h, '"', 0);

            // Take everything from that quote up to the first '>' and drop the quotes
            String[] a = h.substring(n).split(">");
            String url = a[0].replaceAll("\"", "");
            //String value = a[1].replaceAll("</a", "");

            System.out.println(url + " ");
        }

        // Returns the index of the (n+1)-th occurrence of c in str, or -1 if there is none
        public static int getIndexOf(String str, char c, int n)
        {
            int pos = str.indexOf(c, 0);
            while (n-- > 0 && pos != -1)
            {
                pos = str.indexOf(c, pos + 1);
            }
            return pos;
        }
    }
+0

Take a look at the Java String methods. They already have strip and the like. – jgr208 2014-10-08 14:51:28

+0

It is not clear: from your input, what do you want to keep/extract? – ToYonos 2014-10-08 14:53:44

+0

Only departmentofmedicine.com/policy.htm. This input works, but the other inputs I mentioned above don't seem to work. For example, if I use the University of Calgary link as input. – chillax786 2014-10-08 14:58:52

Answers

0

I would give Pattern and Matcher a try, like this:

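    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String s = "<a href=\"http://www.departmentofmedicine.com/policy.htm\">";

    // Group 1 captures everything between href=" and the next double quote
    Pattern p = Pattern.compile(".*href=\"([^\"]*).*");
    Matcher m = p.matcher(s);
    if (m.matches()) {
        System.out.println(m.group(1));
    }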
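As a variation on the Pattern and Matcher idea, here is a minimal sketch that uses `find()` rather than `matches()`, so the pattern does not need the leading and trailing `.*` and a string containing several links yields all of them. The class name `HrefFinder` and the method `findHrefs` are made up for this illustration, and the sketch assumes the href value is always wrapped in double quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class HrefFinder {
    // Matches href="..." and captures the text between the double quotes.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");

    // Returns every double-quoted href value found in the input, in order.
    static List<String> findHrefs(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // The three inputs mentioned in the question.
        String[] inputs = {
            "<td width=\"118\"><a href=\"research.html\" class=\"navText style10 style12\">",
            "<a href=\"http://www.ucalgary.ca\" class=\"style18\"><font size=\"3\">University of Calgary</font></a></div>",
            "<a href=\"http://www.departmentofmedicine.com/policy.htm\">"
        };
        for (String input : inputs) {
            System.out.println(findHrefs(input));
        }
    }
}
```

An empty list means no double-quoted href was found. Single-quoted or unquoted attributes are not handled by this sketch.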
0

A small piece of code:

    String h = "<a href=\"http://www.departmentofmedicine.com/policy.htm\">";
    String url = h.substring(h.indexOf("http")).replace("\">", "");
    System.out.println(url);

The output will be: http://www.departmentofmedicine.com/policy.htm

Tested on my machine.

Also, please post the possible cases so that I can suggest a better solution.

Solution for all three possibilities:

    //String h1 = "<a href=\"http://www.departmentofmedicine.com/policy.htm\">";
    //String h1 = "<a href=\"ucalgary.ca\" class=\"style18\"><font size=\"3\">University of Calgary</font></a>";
    String h1 = "<td width=\"118\"><a href=\"research.html\" class=\"navText style10 style12\">";

    // The URL runs from just after href=" up to the next double quote
    String marker = "href=\"";
    int start = h1.indexOf(marker) + marker.length();
    String url = h1.substring(start, h1.indexOf('"', start));

    System.out.println(url);

Uncomment the String h1 declarations one at a time and check the result against your requirements.

The above code gives the output:
research.html
http://www.departmentofmedicine.com/policy.htm
ucalgary.ca
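The substring approach above assumes that the input always contains `href="` followed by a closing quote. If the marker is missing, `indexOf` returns -1 and the result is silently wrong or an exception is thrown. A minimal sketch of the same idea wrapped in a method with those two checks added might look like this. The class name `HrefSubstring`, the method `extractHref`, and the choice to return `null` when nothing is found are assumptions made for illustration.

```java
// Hypothetical helper, not part of the original answer.
class HrefSubstring {
    // Returns the text between href=" and the next double quote,
    // or null if the input has no complete double-quoted href.
    static String extractHref(String html) {
        String marker = "href=\"";
        int markerPos = html.indexOf(marker);
        if (markerPos == -1) {
            return null; // no href attribute at all
        }
        int start = markerPos + marker.length();
        int end = html.indexOf('"', start);
        if (end == -1) {
            return null; // opening quote without a closing quote
        }
        return html.substring(start, end);
    }

    public static void main(String[] args) {
        // prints research.html
        System.out.println(extractHref(
            "<td width=\"118\"><a href=\"research.html\" class=\"navText style10 style12\">"));
        // prints http://www.departmentofmedicine.com/policy.htm
        System.out.println(extractHref(
            "<a href=\"http://www.departmentofmedicine.com/policy.htm\">"));
        // prints null because there is no href in this input
        System.out.println(extractHref("<td width=\"118\">"));
    }
}
```

Only the first href in the string is returned. For real-world HTML, a dedicated HTML parser is generally more robust than string searching.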

+0

The output will be: – 2014-10-08 15:12:12

+0

This is another case: – chillax786 2014-10-08 15:20:02

+0

This is also another case: University of Calgary – chillax786 2014-10-08 15:20:42