2013-07-29 26 views
0

这实际上是我在几天前问的question的延续。我采用了应用函子的路线并创建了自己的实例。json在haskell中的解析第2部分 - lambda中的非穷举模式

我需要在文件中解析大量的json语句,一行接一行。一个例子JSON声明是这样的 -

{"question_text": "How can NBC defend tape delaying the Olympics when everyone has 
Twitter?", "context_topic": {"followers": 21, "name": "NBC Coverage of the London 
Olympics (July & August 2012)"}, "topics": [{"followers": 2705, 
"name": "NBC"},{"followers": 21, "name": "NBC Coverage of the London 
Olympics (July & August 2012)"}, 
{"followers": 17828, "name": "Olympic Games"}, 
{"followers": 11955, "name": "2012 Summer Olympics in London"}], 
"question_key": "AAEAABORnPCiXO94q0oSDqfCuMJ2jh0ThsH2dHy4ATgigZ5J", 
"__ans__": true, "anonymous": false} 

遗憾的JSON格式。它变得不好

我有大约10000个这样的json语句,我需要解析它们。我写的代码是 这样的事情 -

parseToRecord :: B.ByteString -> Question 
parseToRecord bstr = (\(Ok x) -> x) decodedObj where decodedObj = decode (B.unpack bstr) :: Result Question 

main :: IO() 
main = do 
    -- my first line in the file tells how many json statements 
    -- are there followed by a lot of other irrelevant info... 
    ts <- B.getContents >>= return . fst . fromJust . B.readInteger . head . B.lines 
    json_text <- B.getContents >>= return . tail . B.lines 
    let training_data = take (fromIntegral ts) json_text 
    let questions = map parseToRecord training_data 
    print $ questions !! 8922 

此代码给我一个运行时错误Non-exhaustive patterns in lambda。代码中的错误引用\(Ok x) -> x。通过命中和试用,我得出的结论是,该程序运行良好,直到第8921次索引并且在第8922次迭代中失败。

我检查了相应的json语句,试图通过调用它的函数来解析它,并且它可以工作。但是,当我拨打地图时,它不起作用。我真的不明白发生了什么事。在“学习Haskell为了一件好事”中学习了一点Haskell之后,我想深入一个真实世界的编程项目,但似乎陷入了困境。

编辑::完整的代码如下

{-# LANGUAGE BangPatterns #-} 
{-# OPTIONS_GHC -O2 -optc-O2 #-} 
{-# OPTIONS_GHC -fno-warn-incomplete-uni-patterns #-} 
import qualified Data.ByteString.Lazy.Char8 as B 
import Data.Maybe 
import NLP.Tokenize 

import   Control.Applicative 
import   Control.Monad 
import   Text.JSON 

data Topic = Topic 
    { followers :: Integer, 
    name :: String 
    } deriving (Show) 

data Question = Question 
    { question_text :: String, 
    context_topic :: Topic, 
    topics :: [Topic], 
    question_key :: String, 
    __ans__ :: Bool, 
    anonymous :: Bool 
    } deriving (Show) 

(!) :: (JSON a) => JSObject JSValue -> String -> Result a 
(!) = flip valFromObj 

instance JSON Topic where 
    -- Keep the compiler quiet 
    showJSON = undefined 

    readJSON (JSObject obj) = 
    Topic  <$> 
    obj ! "followers" <*> 
    obj ! "name" 
    readJSON _ = mzero 

instance JSON Question where 
    -- Keep the compiler quiet 
    showJSON = undefined 

    readJSON (JSObject obj) = 
    Question  <$> 
    obj ! "question_text" <*> 
    obj ! "context_topic" <*> 
    obj ! "topics" <*> 
    obj ! "question_key" <*> 
    obj ! "__ans__" <*> 
    obj ! "anonymous" 
    readJSON _ = mzero 

isAnswered (Question _ _ _ _ status _) = status 
isAnonymous (Question _ _ _ _ _ status) = status 

parseToRecord :: B.ByteString -> Question 
parseToRecord bstr = handle decodedObj 
         where handle (Ok k)  = k 
           handle (Error e) = error (e ++ "\n" ++ show bstr) 
           decodedObj = decode (B.unpack bstr) :: Result Question 
--parseToRecord bstr = (\(Ok x) -> x) decodedObj where decodedObj = decode (B.unpack bstr) :: Result Question 

main :: IO() 
main = do 
    ts <- B.getContents >>= return . fst . fromJust . B.readInteger . head . B.lines 
    json_text <- B.getContents >>= return . tail . B.lines 
    let training_data = take (fromIntegral ts) json_text 
    let questions = map parseToRecord training_data 
    let correlation = foldr (\x acc -> if (isAnonymous x == isAnswered x) then (fst acc + 1, snd acc + 1) else (fst acc, snd acc + 1)) (0,0) questions 
    print $ fst correlation 

这里是它可以作为输入的可执行data。我正在使用ghc 7.6.3。如果程序名称是ans.hs,我遵循这些步骤。

$ ghc --make ans.hs 
$ ./ans < path/to/the/file/sample/answered_data_10k.in 

非常感谢!

回答

2

lambda函数(\(Ok x) -> x)的部分之处在于,它只能匹配已成功解码的对象。如果您遇到此问题,则表示您的分析器由于某种原因无法解析记录。

使parseToRecord函数更多的信息帮助您查找错误。尝试实际报告错误,而不是报告失败的模式匹配。

parseToRecord :: B.ByteString -> Question 
parseToRecord bstr = handle decodedObj 
    where handle (Ok k) = k 
      handle (Error e) = error e 
      decodedObj = decode (B.unpack bstr) :: Result Question 

如果您需要更多帮助,可能需要包含解析器代码。

更新

基于您的代码和样本JSON,它看起来像你的代码是第一次失败 当它在你的JSONcontext_topic领域遇到null。 您当前的代码无法处理null,因此无法解析。我的修复会 是类似于以下内容,但您可以想出其他方式来处理它 。

data Nullable a = Null 
       | Full a 
    deriving (Show) 

instance JSON a => JSON (Nullable a) where 
    showJSON Null  = JSNull 
    showJSON (Full a) = showJSON a 

    readJSON JSNull = Ok Null 
    readJSON c  = Full `fmap` readJSON c 

data Question = Question 
    { question_text :: String, 
    context_topic :: Nullable Topic, 
    topics :: [Topic], 
    question_key :: String, 
    __ans__ :: Bool, 
    anonymous :: Bool 
    } deriving (Show) 

它似乎也未能就行9002,那里是 该行的“1000”裸价值,它似乎是该行经过数JSON值缺乏 '__ans__'领域。

+0

是的,我试过了。它报告 '结果:MonadPlus.empty' – shashydhar

+1

也许你应该在你的文章中完成代码,以便测试它。此外,它可能有助于知道哪部分字符串不解析。因此,而不是仅仅打印错误消息,您可以打印s.th.例如:'error(e ++“\ n”++ show bstr)' – ichistmeinname

+0

感谢您检查问题。我已经完成了代码供您测试。另外,这里是json文件的链接。 (zip文件中带.in类型的文件) - http://hr-testcases.s3.amazonaws.com/688/sample.zip – shashydhar

0

我会建议以解析空值使用Maybe

data Question = Question 
    { question_text :: String 
    , context_topic :: Maybe Topic 
    , topics :: [Topic] 
    , question_key :: String 
    , __ans__ :: Bool 
    , anonymous :: Bool 
    } deriving (Show) 

,然后改变readJSON功能如下(此外,缺少 -fields可以通过返回固定False上的不成功尝试解析):

instance JSON Question where 
    -- Keep the compiler quiet 
    showJSON = undefined 

    readJSON (JSObject obj) = Question <$> 
    obj ! "question_text" <*> 
    (fmap Just (obj ! "context_topic") <|> return Nothing) <*> 
    obj ! "topics" <*> 
    obj ! "question_key" <*> 
    (obj ! "__ans__" <|> return False) <*> 
    obj ! "anonymous" 
    readJSON _ = mzero 

符合9000出头摆脱1000后(如sabauma提到) ,结果我得到4358。所以也许这些微小的变化就足够了?

+0

通过利用'm >> = return的事实,你可以让你的'context_topic'行更具可读性。 f == fmap f m'。我会用'Maybe'去,但它的'JSON'实例并不是我所期望的(即不是我为'Nullable'实现它的方式)。 – sabauma

+0

啊,谢谢你的提示。 我不确定它的行为是否如所希望的那样,但整个数据都被成功分析了。所以我猜想它工作得很好。 – ichistmeinname