2016-04-10 118 views

How to parse an http.Get response in Golang

I am getting this type of response from a URL I am hitting, and I need to parse it to get the HTML I want.

this=ajax({"htmlInfo": "SOME-HTML", "otherInfo": "Blah Blah", "moreInfo": "Bleh Bleh"})

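As shown above, there are three key-value pairs, and I need to get "SOME-HTML" out of them. How can I do that? The main problem is that "SOME-HTML" contains escaped characters. Below is the kind of response that comes back.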

\u003Cdiv class=\u0022container columns-2\u0022\u003E\n\n \u003Csection class=\u0022col-main\u0022\u003E\n \r\n\u003Cdiv class=\u0027visor-article-list list list-view-recent\u0027 \u003E\r\n\u003Cdiv class=\u0027grid_item visor-article-teaser list_default\u0027 \u003E\n \u003Ca class=\u0027grid_img\u0027 href=\u0027/manUnited-is-the-best\u0027\u003E\n \u003Cimg src=\u0022http://www.xyz.com/sites//files/styles/w400h22

Can anyone please help me with this? I don't know how to solve it.

Thanks in advance.


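Please edit the question to make it clearer, so that it is easier for people to help you. What are the key-value pairs? Is it JavaScript? How is Go consuming it? Provide Go code and real information, not just "SOME-HTML", "Blah Blah" and "Bleh Bleh". – PieOhPah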

Answers


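The easiest way is to extract the JSON and then unmarshal it into a struct. The \uXXXX parts are Unicode escape sequences, which encoding/json decodes for you when unmarshalling: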

package main 

import (
    "encoding/json" 
    "fmt" 
    "regexp" 
) 

// Data follows the structure of the JSON data in the response 
type Data struct { 
    HTMLInfo string `json:"htmlInfo"` 
    OtherInfo string `json:"otherInfo"` 
    MoreInfo string `json:"moreInfo"` 
} 

func main() { 
    // input is an example of the raw response data. It's probably a []byte if
    // you got it from io.ReadAll(resp.Body) (ioutil.ReadAll before Go 1.16).
    input := []byte(`this=ajax({"htmlInfo":"\u003Cdiv class=\u0022container columns-2\u0022\u003E\n\n \u003Csection class=\u0022col-main\u0022\u003E\n \r\n\u003Cdiv class=\u0027visor-article-list list list-view-recent\u0027 \u003E\r\n\u003Cdiv class=\u0027grid_item visor-article-teaser list_default\u0027 \u003E\n \u003Ca class=\u0027grid_img\u0027 href=\u0027/manUnited-is-the-best\u0027\u003E\n \u003Cimg src=\u0022http://example.com/sites//files/styles/w400h22", "otherInfo": "Blah Blah", "moreInfo": "Bleh Bleh"})`) 

    // First we want to extract the JSON data using a regex with a capture group.
    // The pattern is a constant, so MustCompile is fine here. The (?s) flag lets
    // "." match newlines in case the real response spans several lines.
    dataRegex := regexp.MustCompile(`(?s)ajax\((.*)\)`)

    // FindSubmatch should return two matches: 
    // 0: The full match 
    // 1: The contents of the capture group (what we want) 
    matches := dataRegex.FindSubmatch(input) 
    if len(matches) != 2 { 
        fmt.Println("incorrect number of match results:", len(matches))
        return
    } 
    dataJSON := matches[1] 

    // Since the data is in JSON format, we can unmarshal it into a struct. If 
    // you don't care at all about the fields other than "htmlInfo", you can 
    // omit them from the struct. 
    data := &Data{} 
    if err := json.Unmarshal(dataJSON, data); err != nil {
        fmt.Println("failed to unmarshal data json:", err)
        return // don't go on to print an empty result
    }

    // You now have access to the "htmlInfo" property 
    fmt.Println("HTML INFO:", data.HTMLInfo) 
} 

This produces:

HTML INFO: <div class="container columns-2"> 

<section class="col-main"> 

<div class='visor-article-list list list-view-recent' > 
<div class='grid_item visor-article-teaser list_default' > 
<a class='grid_img' href='/manUnited-is-the-best'> 
<img src="http://example.com/sites//files/styles/w400h22