2017-10-09 85 views
1

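Converting an RSS feed from XML to JSON using Java is showing special characters

I created a Spring MVC based RESTful controller which takes a hardcoded RSS HTTP URL, fetches the XML and converts it to JSON: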

RssFeedController:

import java.io.IOException; 
import java.io.InputStream; 
import java.net.HttpURLConnection; 
import java.net.MalformedURLException; 
import java.net.URL; 
import java.net.URLConnection; 

import org.apache.commons.io.IOUtils; 
import org.apache.log4j.Logger; 
import org.json.JSONObject; 
import org.json.XML; 

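import org.springframework.http.HttpHeaders; 
import org.springframework.web.bind.annotation.RequestMapping; 
import org.springframework.web.bind.annotation.RequestMethod; 
import org.springframework.web.bind.annotation.RestController; 

import com.fasterxml.jackson.databind.ObjectMapper; 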

@RestController 
public class RssFeedController { 

    private HttpHeaders headers = null; 

    public RssFeedController() { 
     headers = new HttpHeaders(); 
     headers.add("Content-Type", "application/json"); 
    } 

    @RequestMapping(value = "/v2/convertToJson", method = RequestMethod.GET, produces = "application/json") 
    public String getRssFeedAsJson() throws IOException { 
     InputStream xml = getInputStreamForURLData("http://www.samplefeed.com/feed"); 
     String xmlString = IOUtils.toString(xml); 
     JSONObject jsonObject = XML.toJSONObject(xmlString); 
     ObjectMapper objectMapper = new ObjectMapper(); 
     Object json = objectMapper.readValue(jsonObject.toString(), Object.class); 
     String response = objectMapper.writeValueAsString(json); 
     return response; 
    } 

    public static InputStream getInputStreamForURLData(String targetUrl) { 
     URL url = null; 
     HttpURLConnection httpConnection = null; 
     InputStream content = null; 

     try { 
      url = new URL(targetUrl); 
      URLConnection conn = url.openConnection(); 
      conn.setRequestProperty("User-Agent", "Mozilla/5.0"); 
      httpConnection = (HttpURLConnection) conn; 
      int responseCode = httpConnection.getResponseCode(); 
      content = (InputStream) httpConnection.getInputStream(); 
     } 
     catch (MalformedURLException e) { 
      e.printStackTrace(); 
     } 
     catch (IOException e) { 
      e.printStackTrace(); 
     } 
     return content; 
    } 
} 

pom.xml:

<dependency> 
     <groupId>org.json</groupId> 
     <artifactId>json</artifactId> 
     <version>20170516</version> 
    </dependency> 

    <dependency> 
     <groupId>commons-io</groupId> 
     <artifactId>commons-io</artifactId> 
     <version>2.5</version> 
    </dependency> 

The original RSS feed contains the following:

<item> 
    <title>October Fest Weekend</title> 
    <link>http://www.samplefeed.com/feed/OctoberFestWeekend</link> 
    <comments>http://www.samplefeed.com/feed/OctoberFestWeekend/#comments</comments> 
    <pubDate>Wed, 04 Oct 2017 17:08:48 +0000</pubDate> 
    <dc:creator><![CDATA[John Doe]]></dc:creator> 
      <category><![CDATA[Uncategorized]]></category> 

    <guid isPermaLink="false">http://www.samplefeed.com/feed/?p=9227</guid> 
    <description><![CDATA[<p> 
</p> 
<p>Doors Open:6:30pm<br /> 
Show Begins: 7:30pm<br /> 
Show Ends (Estimated time): 11:00pm<br /> 
Location: Staples Center</p> 
<p>Directions</p> 
<p>Map of ...</p> 
<p>The post <a rel="nofollow" href="http://www.samplefeed.com/feed/OctoberFestWeekend/">OctoberFest Weekend</a> appeared first on <a rel="nofollow" href="http://www.samplefeed.com">SampleFeed</a>.</p> 
]]></description> 

This gets rendered as JSON like this:

{ 
    "guid": { 
     "content": "http://www.samplefeed.com/feed/?p=9227", 
     "isPermaLink": false 
    }, 
    "pubDate": "Wed, 04 Oct 2017 17:08:48 +0000", 
    "category": "Uncategorized", 
    "title": "October Fest Weekend", 
    "description": "<p>\n??</p>\n<p>Doors Open:6:30pm<br />\nShow Begins:?? 7:30pm<br />\nShow Ends (Estimated time):??11:00pm<br />\nLocation: Staples Center</p>\n<p>Directions</p>\n<p>Map of ...</p>\n<p>The post <a rel=\"nofollow\" href=\"http://www.samplefeed.com/feed/OctoberFestWeekend/\">OctoberFest Weekend</a> appeared first on <a rel=\"nofollow\" href=\"http://www.samplefeed.com\">Sample Feed</a>.</p>\n", 
    "dc:creator": "John Doe", 
    "link": "http://www.samplefeed.com/feed/OctoberFestWeekend", 
    "comments": "http://www.samplefeed.com/feed/OctoberFestWeekend/#comments" 
} 

Notice that in the rendered JSON there are two question marks ("??") inside the value of the "description" key, like this:

"description": "<p>\n??</p>\n 

There are also two question marks after "Show Begins:":

<br />\nShow Begins:?? 

And also before 11:00pm:

Show Ends (Estimated time):??11:00pm<br /> 

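This is not the only pattern of special characters. In some places three question marks (???) are generated, and in other places five (?????).

For example: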

<title>Today’s 20th Annual Karaoke</title> 

is rendered in JSON like this:

"title": "Today???s 20th Annual Karaoke" 

And also:

<content:encoded><![CDATA[(Monte Vista High School, NY.). </span></p>]]></content:encoded> 

is rendered in JSON like this:

"content:encoded": "(Monte Vista High School, NY.).????</span></p> 

There are also places where the XML contains a dash ("–"):

<strong>Welcome</strong> – Welcome to the Party! 

It gets rendered in JSON as:

<strong>Welcome</strong>????? Welcome to the Party! 

Does anyone know how to set the correct encoding in my code so that I can avoid these bad/special character rendering issues?

Answers

0

I got rid of the unknown characters like this:

@RequestMapping(value = "/v2/convertToJson", method = RequestMethod.GET, produces = "application/json;charset=UTF-8") 
public String getRssFeedAsJson() throws IOException, IllegalArgumentException { 
    String xmlString = readUrlToString("http://www.sample.com/feed"); 
    JSONObject xmlJSONObj = XML.toJSONObject(xmlString); 
    // ISO_8859_1 and UTF_8 are assumed to be static imports from java.nio.charset.StandardCharsets 
    byte[] ptext = xmlJSONObj.toString().getBytes(ISO_8859_1); 
    String jsonResponse = new String(ptext, UTF_8); 
    return jsonResponse; 
} 

public static String readUrlToString(String url) { 
    BufferedReader reader = null; 
    String result = null; 
    String retValue = null; 
    try { 
     URL u = new URL(url); 
     HttpURLConnection conn = (HttpURLConnection) u.openConnection(); 
     conn.setRequestProperty("User-Agent", "Mozilla/5.0"); 
     conn.setRequestMethod("GET"); 
     conn.setDoOutput(true); 
     conn.setReadTimeout(2 * 1000); 
     conn.connect(); 
     reader = new BufferedReader(new InputStreamReader(conn.getInputStream())); 
     StringBuilder builder = new StringBuilder(); 
     String line; 
     while ((line = reader.readLine()) != null) { 
      builder.append(line).append("\n"); 
     } 
     result = builder.toString(); 
     // Removes every non-ASCII character, so characters like ’ and – are dropped, not preserved 
     retValue = result.replaceAll("[^\\x00-\\x7F]", ""); 
    } 
    catch (IOException e) { 
     e.printStackTrace(); 
    } 
    finally { 
     if (reader != null) { 
      try { 
       reader.close(); 
      } 
      catch (IOException ignoreOnClose) { 
      } 
     } 
    } 
    return retValue; 
} 
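
Note that the replaceAll call above removes the non-ASCII characters rather than preserving them. If the feed is served as UTF-8 (an assumption; check the feed's XML declaration or Content-Type header), a minimal sketch of an alternative is to decode the bytes with an explicit charset instead of relying on the platform default. The class and method names below are hypothetical, and only JDK classes are used:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class FeedDecoding {

    // Reads the whole stream and decodes it as UTF-8 (the assumed feed encoding).
    public static String readAsUtf8(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int read;
        while ((read = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for conn.getInputStream(): the title from the question, encoded as UTF-8.
        byte[] feedBytes = "<title>Today\u2019s 20th Annual Karaoke</title>"
                .getBytes(StandardCharsets.UTF_8);
        String xml = readAsUtf8(new ByteArrayInputStream(feedBytes));
        // Checks that the right single quotation mark (U+2019) is still present.
        System.out.println(xml.indexOf('\u2019') >= 0);
    }
}
```

In the code above, the equivalent change would probably be new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8); in the question's code, commons-io offers the overload IOUtils.toString(InputStream, Charset).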

Frustrating that nobody other than SamDev tried to help...

0

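After reviewing the code line by line I found a solution, so I am updating my answer to your question "Converting RSS feed XML to JSON using Java is showing special characters".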

Change this line of code

@RequestMapping(value = "/v2/convertToJson", method = RequestMethod.GET, produces = "application/json") 

to

@RequestMapping(value = "/v2/convertToJson", method = RequestMethod.GET, produces = "application/json;charset=UTF-8") 

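You need to specify the UTF-8 charset in the value of the produces parameter so that the JSON is generated with that encoding. I apologize for the misunderstanding in my previous answer; I have updated it now.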
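
Why the charset of the response matters can be sketched with JDK classes alone. Characters such as ’ (U+2019) and – (U+2013) do not exist in ISO-8859-1, so encoding them with that charset replaces each one with a question mark, while UTF-8 keeps them. Whether ISO-8859-1 is what a given Spring setup uses when no charset is specified is an assumption here, and the class name ResponseCharsetDemo is hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ResponseCharsetDemo {

    // Encodes the text with the given charset and decodes it again with the same one,
    // mimicking a server writing bytes and a client reading them back.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String title = "Today\u2019s 20th Annual Karaoke";
        // U+2019 cannot be represented in ISO-8859-1 and is replaced during encoding.
        System.out.println(roundTrip(title, StandardCharsets.ISO_8859_1));
        // UTF-8 can represent it, so the round trip returns the original text.
        System.out.println(roundTrip(title, StandardCharsets.UTF_8).equals(title));
    }
}
```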

+0

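I'm confused... I'm not using emojis, so how would I use one of these libraries in my code? Can you provide an example? Also, what other encoding or charset would prevent this? Thanks for the response...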

+0

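Unicode is not only for emojis; it can be any character, and when the system cannot render it in the stream it shows **?**. If you want to check what is inside the **?**, use one of these Java libraries and parse it at runtime – 2017-10-10 00:13:44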

+0

Could you please show me some code that uses one of them?
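
A minimal sketch of how the characters could be inspected without any extra library, assuming the correctly decoded string is available: list each non-ASCII code point together with its UTF-8 byte length. The class name CodePointDump is hypothetical. A character that takes three UTF-8 bytes, such as ’ (U+2019), could plausibly show up as three question marks if each byte were decoded on its own, which would be consistent with "Today???s" in the question; the no-break space used in the sample input is only a guess at what the feed contains.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CodePointDump {

    // Returns one entry per non-ASCII code point, formatted like "U+2019 (3 UTF-8 bytes)".
    public static List<String> describeNonAscii(String text) {
        List<String> result = new ArrayList<>();
        text.codePoints()
            .filter(cp -> cp > 0x7F)
            .forEach(cp -> {
                int utf8Length = new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_8).length;
                result.add(String.format("U+%04X (%d UTF-8 bytes)", cp, utf8Length));
            });
        return result;
    }

    public static void main(String[] args) {
        // A no-break space followed by an en dash: one guess at what sits behind "?????".
        for (String line : describeNonAscii("Welcome\u00A0\u2013 Welcome to the Party!")) {
            System.out.println(line);
        }
    }
}
```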