2017-03-22 130 views
0

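Efficiently deserialize dynamic JSON data into a DataTable

We are working on a program that retrieves slide image data from a set of servers with no consistent schema (I fear the JSON is invalid, but I am not proficient enough to make that call). As independent, unaffiliated researchers, we have no influence over the servers. The data (which goes back to the 1990s) was mostly entered by hand through a large number of forms (n > 50). Here is an example of a response: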

{ 
"form12873": [ 

    { 
     "id": "9202075838", 
     "timestamp": "2015-06-25 10:24:51", 
     "user_agent": "Mozilla\/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit\/600.6.3 (KHTML, like Gecko) Version\/8.0.6 Safari\/600.6.3", 
     "remote_addr": "[Re.dact.ed]", 
     "processed": "1", 
     "data": { 
      "33885124": { 
       "field": "33885124", 
       "value": "CDat Lab", 
       "flat_value": "CDat Lab", 
       "label": "Completed by:", 
       "type": "select" 
      }, 

      (Several more fields as above ...)

      "33884660": { 
       "field": "33884660", 
       "value": { 
        "slideX": "2456123", 
        "slideY": "456632", 
        "label": "K-20150322148", 
        "approved": "1", 
        "score": "30144" 
       }, 
       "flat_value": "slideX = 2456123\nslideY = 456632\nlabel = K-20150322148\napproved = 1\nscore = 30144", 
       "label": "Slide Stats:", 
       "type": "slidestats" 
      }, 

      (Some of the fields are as above ...)

      "31970564": { 
       "field": "31970564", 
       "value": [ 
        "System", 
        "Crated", 
        "Mirax", 
        "NanoZoomer", 
        "ThinPrep", 
        "Aperio", 
        "Intellisite" 

       ], 
       "flat_value": "System\nCrated\nMirax\nNanoZoomer\nThinPrep\nAperio\nIntellisite", 
       "label": "System Information", 
       "type": "checkbox" 
      }, 

      (Some of the values are arrays ...)

      "33883781": { 
       "field": "33883781", 
       "selection": "Retain", 
       "label": "4. Retain\/Remove\/Review", 
       "type": "selectdrop" 
      }, 

      (Some of the fields don't have the same children.)

      "52792890": { 
       "field": "52792890", 
       "image": "'A really large byte[], removed for ease of reading'", 
       "type": "image" 
      } 

      (Somewhere near the end of each response is the actual image ...)
     } 
    }, 

    { 
     "id": "33884681", 
      (Then it continues as above until the end.)
    } 
], "total": 170, "pages": 5, "pretty_id": "478125624983" } 

I know how to handle this when I can model the structure of the JSON with classes, as I have done in the past (a data class that defines field, value, and so on).

Attempted solutions such as:

var result = JsonConvert.DeserializeObject<List<Dictionary<string, 
          Dictionary<string, string>>>>(content); 

always result in array errors or casting problems (even when I add direct casts). I am able to get the actual first array using:

Public Shared Function Tabulate(json As String) As DataTable 
    Dim jsonLinq = Newtonsoft.Json.Linq.JObject.Parse(json) 

    ' Find the first array using Linq 

    Dim srcArray = jsonLinq.Descendants().Where(Function(d) TypeOf d Is JArray).First() 
    Dim trgArray = New Newtonsoft.Json.Linq.JArray() 
    For Each row As JObject In srcArray.Children(Of JObject)() 
     Dim cleanRow = New JObject() 
     For Each column As JProperty In row.Properties() 
      ' Only include JValue types 
      If TypeOf column.Value Is JValue Then 
       cleanRow.Add(column.Name, column.Value) 
      End If 
     Next 

     trgArray.Add(cleanRow) 
    Next 


    Return JsonConvert.DeserializeObject(Of DataTable)(trgArray.ToString()) 
End Function 

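My end goal is also a DataTable, but the loops and the image bytes make me wary of trying to recursively drill down through the children. I then tried deserializing starting from the first array, and that is where the errors appeared.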

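If there is a quick way to handle this, I would love to hear the solution. If the problem is that I am trying to process garbage JSON, I would love a reference to where it breaks the current standard (so I can at least try to get the other institutions to change their servers). That said, I will probably have to deal with it anyway, even if that means loops.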

*Note: the project was started in VB.NET, so we are keeping it that way, though I may decide to port it to C#. Code in either language would be great.

Below is an unannotated sample of the JSON that should be usable for testing. My end goal is to flatten something like this into a DataTable:

{ 
"form12873": [ 
    { 
     "id": "9202075838", 
     "timestamp": "2015-06-25 10:24:51", 
     "user_agent": "Mozilla\/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit\/600.6.3 (KHTML, like Gecko) Version\/8.0.6 Safari\/600.6.3", 
     "remote_addr": "[Re.dact.ed]", 
     "processed": "1", 
     "data": { 
      "33885124": { 
       "field": "33885124", 
       "value": "CDat Lab", 
       "flat_value": "CDat Lab", 
       "label": "Completed by:", 
       "type": "select" 
      }, 
      "33884660": { 
       "field": "33884660", 
       "value": { 
        "slideX": "2456123", 
        "slideY": "456632", 
        "label": "K-20150322148", 
        "approved": "1", 
        "score": "30144" 
       }, 
       "flat_value": "slideX = 2456123\nslideY = 456632\nlabel = K-20150322148\napproved = 1\nscore = 30144", 
       "label": "Slide Stats:", 
       "type": "slidestats" 
      }, 
      "31970564": { 
       "field": "31970564", 
       "value": [ 
        "System", 
        "Crated", 
        "Mirax", 
        "NanoZoomer", 
        "ThinPrep", 
        "Aperio", 
        "Intellisite" 
       ], 
       "flat_value": "System\nCrated\nMirax\nNanoZoomer\nThinPrep\nAperio\nIntellisite", 
       "label": "System Information", 
       "type": "checkbox" 
      }, 



      "33883781": { 
       "field": "33883781", 
       "selection": "Retain", 
       "label": "4. Retain\/Remove\/Review", 
       "type": "select" 
      } 
     } 
    } 
], "total": 170, "pages": 5, "pretty_id": "478125624983" } 
+0

Maybe the accepted answer here will help? http://stackoverflow.com/questions/947241/how-do-i-create-dynamic-properties-in-c – Muckeypuck

+0

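@Muckeypuck Wouldn't that only work if all the children under each node were uniform? That is, the items under "data" do not all have the same number or types of properties. I have been trying to implement dynamic properties, but so far deserialization still fails when I try the linked solution. That may be down to my lack of understanding, so I will keep trying. –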

+3

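After trying several approaches, the best solution I have found for reading bad or unpredictable JSON data is to parse it into a JToken object and use .SelectTokens with JSONPath to retrieve what I need, or to find out that it does not exist, without my code crashing. Is that an option for you? – VBobCat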

Answers

1

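The ugly contraption below does (roughly) what you want. Pass the JSON source string as the argument to DeserializeToDataTable and collect the resulting DataTable. It handles your sample. I cannot guarantee it will work with other data. The purpose here is to provide a working starter kit that you can study, understand, debug and adapt to your needs.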

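' Requires Imports Newtonsoft.Json.Linq (Json.NET); SetField needs a reference to System.Data.DataSetExtensions.
Private Function DeserializeToDataTable(ByVal jsource As String) As DataTable 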
    Dim JRootObject = JObject.Parse(jsource) 
    Dim Children = JRootObject.SelectTokens("$..data.*").ToArray 
    Dim Records = Children.OfType(Of JObject).ToArray 
    Dim dicList As New List(Of Dictionary(Of String, Object)) 
    For Each rec In Records 
     dicList.Add(DeserializeToDictionary(rec)) 
    Next 
    Dim fieldnames = dicList.SelectMany(Function(d) d.Keys).Distinct.ToArray 
    Dim dt As New DataTable 
    For Each fieldname In fieldnames 
     dt.Columns.Add(fieldname, GetType(Object)) 
    Next 
    Dim row As DataRow 
    For Each dic In dicList 
     row = dt.NewRow 
     For Each kvp In dic 
      row.SetField(kvp.Key, kvp.Value) 
     Next 
     dt.Rows.Add(row) 
    Next 
    Return dt 
End Function 

Private Function DeserializeToDictionary(ByVal json_object As JObject) As Dictionary(Of String, Object) 
    Dim dic = New Dictionary(Of String, Object) 
    For Each field In json_object.Properties 
     Select Case field.Value.Type 
      Case JTokenType.Array 
       Dim subobject = New JObject 
       Dim item = 0 
       For Each token In field.Value 
        subobject("item" & item) = token 
        item += 1 
       Next 
       Dim subdic = DeserializeToDictionary(subobject) 
       For Each kvp In subdic 
        dic(kvp.Key) = kvp.Value 
       Next 
      Case JTokenType.Boolean 
       dic(field.Name) = field.Value.ToObject(Of Boolean) 
      Case JTokenType.Bytes 
       dic(field.Name) = field.Value.ToObject(Of Byte()) 
      Case JTokenType.Date 
       dic(field.Name) = field.Value.ToObject(Of Date) 
      Case JTokenType.Float 
       dic(field.Name) = field.Value.ToObject(Of Double) 
      Case JTokenType.Guid 
       dic(field.Name) = field.Value.ToObject(Of Guid) 
      Case JTokenType.Integer 
       ' Json.NET stores integers as Long; reading them as Long avoids overflow on large values
       dic(field.Name) = field.Value.ToObject(Of Long) 
      Case JTokenType.Object 
       ' Explicit cast so this also compiles with Option Strict On
       Dim subdic = DeserializeToDictionary(DirectCast(field.Value, JObject)) 
       For Each kvp In subdic 
        dic(kvp.Key) = kvp.Value 
       Next 
      Case JTokenType.String 
       Try 
        dic(field.Name) = field.Value.ToObject(Of String) 
       Catch ex As Exception 
        dic(field.Name) = field.Value.ToObject(Of Object) 
       End Try 
      Case JTokenType.TimeSpan 
       dic(field.Name) = field.Value.ToObject(Of TimeSpan) 
      Case Else 
       dic(field.Name) = field.Value.ToString 
     End Select 
    Next 
    Return dic 
End Function 
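
To see what the JSONPath expression `$..data.*` used above actually selects, here is a minimal sketch. The module name, the variable names and the tiny inline JSON are made up for illustration; only `JObject.Parse`, `SelectTokens`, `SelectToken` and `Value(Of T)` are Json.NET calls. It lists each field object found under any `data` property, together with the token type of its `value`, which is the part that varies from field to field.

```vbnet
Imports System
Imports System.Linq
Imports Newtonsoft.Json.Linq

Module JsonPathDemo
    Sub Main()
        ' Hypothetical, cut-down response: one record with two fields under "data"
        Dim json = "{""form1"":[{""id"":""1"",""data"":{" &
                   """10"":{""field"":""10"",""value"":""x""}," &
                   """11"":{""field"":""11"",""value"":[""a"",""b""]}}}]}"
        Dim root = JObject.Parse(json)

        ' "$..data.*": recursive descent to every property named "data",
        ' then every direct child of that property
        For Each fieldObj In root.SelectTokens("$..data.*").OfType(Of JObject)()
            Dim valueToken = fieldObj.SelectToken("value")
            Dim kind = If(valueToken Is Nothing, "(no value)", valueToken.Type.ToString())
            Console.WriteLine(fieldObj.Value(Of String)("field") & ": " & kind)
        Next
    End Sub
End Module
```

Note that this selection yields one object per form field, not one per submission, so DeserializeToDataTable produces one row per field.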

When using DeserializeToDataTable, keep the following in mind:

  1. It uses recursion to flatten multi-branch structures. So,

    { 
        "A":"aaaa", 
        "B":"bbbb", 
        "C":{ 
          "D":"dddd", 
          "E":"eeee", 
          "F":"ffff" 
         } 
    } 
    

    becomes

    A |B |D |E |F 
    ----+----+----+----+---- 
    aaaa|bbbb|dddd|eeee|ffff 
    
  2. The way I wrote it assumes there will be no duplicate names after flattening; if there are, it keeps the last one. So,

    { 
        "A":"aaaa", 
        "B":"bbbb", 
        "C":{ 
          "D":"d1d1", 
          "E":"e1e1", 
          "F":"f1f1" 
         }, 
        "G":{ 
          "D":"d2d2", 
          "E":"e2e2", 
          "F":"f2f2" 
         } 
    } 
    

    becomes

    A |B |D |E |F 
    ----+----+----+----+---- 
    aaaa|bbbb|d2d2|e2e2|f2f2 
    

    This is an obvious flaw. Fixing it requires a more sophisticated approach, which I leave for you to build on top of this starting point.
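
One possible way around the duplicate-name problem, offered as a sketch and not as part of the code above: key each leaf value by its full JSON path instead of its bare property name. The module and function names below are hypothetical; `Descendants()` and the `Path` property are Json.NET members.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports Newtonsoft.Json.Linq

Module PathFlattening
    ' Hypothetical helper: flattens every leaf value under root into a dictionary
    ' keyed by the leaf's JSON path, so "C.D" and "G.D" stay distinct
    ' instead of overwriting each other under the shared name "D".
    Public Function FlattenWithPaths(root As JObject) As Dictionary(Of String, Object)
        Dim result As New Dictionary(Of String, Object)
        For Each leaf As JValue In root.Descendants().OfType(Of JValue)()
            result(leaf.Path) = leaf.Value
        Next
        Return result
    End Function

    Sub Main()
        Dim json = "{""A"":""aaaa"",""C"":{""D"":""d1d1""},""G"":{""D"":""d2d2""}}"
        For Each kvp In FlattenWithPaths(JObject.Parse(json))
            Console.WriteLine(kvp.Key & " = " & CStr(kvp.Value))
        Next
    End Sub
End Module
```

The trade-off is longer, less friendly column names; array elements get an index in their path, so they no longer collide either.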

+0

I see where you are going with this, and I can see it working. Just to be clear, this does not actually use 'JsonPath', does it? –

+0

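It uses JsonPath once, at the beginning. However, your example made me think that simple iteration could get us a collection of record-like objects. Keep the limitations of my code in mind, though. There is a lot of room for improvement. – VBobCat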

+0

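True, but it put me on the right track. Deserializing into a dictionary afterwards was the real trick. Because I hate myself, I am also trying to POST to these servers, which led to this [question](http://stackoverflow.com/questions/43097829/posting-an-array-parameter-using-restsharp), if you have time. –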

1

It is possible to add DataColumns to a DataTable even if it already contains DataRows.

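I don't do much JSON, but my general approach with dodgy XML is to break it down into a stream of key-value pairs, where the key is the XPath "address" and the value is the node's content (excluding child nodes), and then iterate through the stream to build the DataTable. Perhaps a similar approach could be taken here using JSONPath.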
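
A minimal sketch of that idea applied to JSON, assuming Json.NET and assuming the goal is one row per element of the top-level array. `RecordsToDataTable` and its `arrayPath` parameter are hypothetical names introduced for illustration. Each record is reduced to (relative path, value) pairs, and columns are added to the table as new paths turn up in later records.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq
Imports Newtonsoft.Json.Linq

Module KeyValueStream
    ' Hypothetical helper: one row per element of the array found at arrayPath,
    ' one column per leaf path inside that element.
    Public Function RecordsToDataTable(json As String, arrayPath As String) As DataTable
        Dim table As New DataTable()
        Dim records = JObject.Parse(json).SelectToken(arrayPath)
        If records Is Nothing Then Return table

        For Each record As JObject In records.Children(Of JObject)()
            ' Stream of (relative path, value) pairs for this record
            Dim pairs As New Dictionary(Of String, Object)
            For Each leaf As JValue In record.Descendants().OfType(Of JValue)()
                Dim relativePath = leaf.Path.Substring(record.Path.Length).TrimStart("."c)
                pairs(relativePath) = If(leaf.Value, DBNull.Value)
            Next

            ' Add any column not seen in earlier records, then fill the row
            For Each columnName In pairs.Keys
                If Not table.Columns.Contains(columnName) Then
                    table.Columns.Add(columnName, GetType(Object))
                End If
            Next
            Dim row = table.NewRow()
            For Each pair In pairs
                row(pair.Key) = pair.Value
            Next
            table.Rows.Add(row)
        Next
        Return table
    End Function
End Module
```

For the sample in the question this would be called as `RecordsToDataTable(json, "form12873")`, and the columns should come out with names like `id` and `data.33885124.value`. Rows that lack a given path simply hold DBNull in that column.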

+0

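Even if it uses XML, could you give an example of this? I am familiar with JSONPath now. Right now I am running into a problem getting past the third node (i.e. from 'form12873'->'data'->'value'->{values} above) when the values themselves are an array. –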

+0

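I have added a usable JSON example without the annotations, which should work for testing. Whether I treat that node as a JObject or a JArray, I keep running into "cannot deserialize" errors. If you can put together an example using the string above, I will accept. –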