2017-09-07

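I want to turn a DataFrame column containing nested dictionaries into columns and a list of values, but reading it fails with: TypeError: string indices must be integers, not str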

The DataFrame column looks like this:

{"id":"0","request":{"plantSearch":"true","maxResults":"51","caller":"WMS","companyCode":"GB54","purchOrg":"UPSO","Code":"5852","confidential":"false","flag":"true","service":"false","Item":"false","mastered":"true","copas":"false","pscmBlock":"false","descOperator":"CO","assocManuf":"PETK"},"response":{"hasMoreResults":"false","resultsCount":"0","execTime":"878 ms"}} 

I wrote this code:

s1.columns = ['data'] 
l2 = [] 
for idx, row in s1['data'].iteritems(): 
    tempdf = pd.DataFrame(row['request']['plantSearch']) 
    tempdf['maxResults'] = row['maxResults'] 
    l2.append(tempdf) 


pd.concat(l2,axis = 0) 

The problem is that Python treats `row` as a string, even though it is a dictionary.
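The error message suggests that each cell holds the JSON as text rather than as a parsed dict. A minimal sketch of that situation, assuming a shortened sample record (the `row` value below is illustrative, not the full data), shows the failure and how `json.loads` from the standard library resolves it. Note also that `maxResults` sits under `request`, not at the top level.

```python
import json

# Assumption: the cell is a str containing JSON, not an already-parsed dict.
row = '{"id":"0","request":{"plantSearch":"true","maxResults":"51"}}'

print(type(row).__name__)  # str

try:
    row['request']  # indexing a str with a str key raises TypeError
except TypeError as exc:
    error = str(exc)
    print('TypeError:', error)

# Parsing the text first gives a real dict that can be indexed by key.
record = json.loads(row)
plant_search = record['request']['plantSearch']
max_results = record['request']['maxResults']  # nested under 'request'
print(plant_search, max_results)
```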

Answers

I think you can use json.loads to convert each string to a dict, and then let the DataFrame constructor parse all the data under the request key:

import pandas as pd

# One sample record as a JSON string; the column holds text, not dicts
raw = '{"id":"0","request":{"plantSearch":"true","maxResults":"51","caller":"WMS","companyCode":"GB54","purchOrg":"UPSO","Code":"5852","confidential":"false","flag":"true","service":"false","Item":"false","mastered":"true","copas":"false","pscmBlock":"false","descOperator":"CO","assocManuf":"PETK"},"response":{"hasMoreResults":"false","resultsCount":"0","execTime":"878 ms"}}'
df = pd.DataFrame({'data': [raw, raw]})
print(df)

               data 
0 {"id":"0","request":{"plantSearch":"true","max... 
1 {"id":"0","request":{"plantSearch":"true","max... 

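import json

# Parse each string into a dict, keep only its 'request' part, then build a DataFrame from the list of dicts
df1 = pd.DataFrame(df['data'].apply(lambda x: json.loads(x)['request']).values.tolist())
print(df1)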

    Code Item assocManuf caller companyCode confidential copas descOperator \ 
0 5852 false  PETK WMS  GB54  false false   CO 
1 5852 false  PETK WMS  GB54  false false   CO 

    flag mastered maxResults plantSearch pscmBlock purchOrg service 
0 true  true   51  true  false  UPSO false 
1 true  true   51  true  false  UPSO false 

A similar solution:

import json

df = pd.DataFrame([json.loads(x)['request'] for x in df['data']])
print(df)

    Code Item assocManuf caller companyCode confidential copas descOperator \ 
0 5852 false  PETK WMS  GB54  false false   CO 
1 5852 false  PETK WMS  GB54  false false   CO 

    flag mastered maxResults plantSearch pscmBlock purchOrg service 
0 true  true   51  true  false  UPSO false 
1 true  true   51  true  false  UPSO false 

Finally, you can select a subset of the columns:

cols = ['plantSearch','maxResults'] 
df2 = df[cols] 
print(df2)

    plantSearch maxResults 
0  true   51 
1  true   51 

Thanks for the solution. However, I also need the id column. Could you tell me how to get it? – r228302

I think at the end you need `df3 = df2.join(df['id'])`. I'm only on my phone, so this is untested. – jezrael
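Because the answer's `df` is built only from the keys under `request`, it may not contain an `id` column to join on. One possible alternative, sketched here with a shortened sample record (an assumption for brevity), is to take the top-level `id` while building the rows. The names `parsed` and `df3` are illustrative.

```python
import json
import pandas as pd

# Assumption: shortened sample; the real records carry more keys.
raw = '{"id":"0","request":{"plantSearch":"true","maxResults":"51"},"response":{"resultsCount":"0"}}'
df = pd.DataFrame({'data': [raw, raw]})

# Parse each JSON string once, keeping the whole record.
parsed = df['data'].apply(json.loads)

# One row per record: the top-level "id" plus every key under "request".
df3 = pd.DataFrame([{'id': d['id'], **d['request']} for d in parsed])

print(df3[['id', 'plantSearch', 'maxResults']])
```

If a key under `request` were also named `id`, it would overwrite the top-level value in this sketch, so renaming one of them would be needed in that case.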

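I have another question. I have hundreds of rows like this, and one of them raises a ValueError because the data is malformed: {"key":"123,err,1",wersd"}. Is there any way to skip such rows and continue with the rest? – r228302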
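One way to skip bad rows is to wrap the parsing step in `try`/`except`. The sketch below uses only the standard library. `parse_requests` is a hypothetical helper name, and the sample cells are shortened for illustration. `json.JSONDecodeError` is a subclass of `ValueError`, so catching `ValueError` covers malformed JSON.

```python
import json

def parse_requests(cells):
    """Return (rows, bad_positions) for an iterable of JSON strings.

    rows holds the 'request' dict of every cell that parsed cleanly;
    bad_positions records the positions of the cells that were skipped.
    """
    rows, bad_positions = [], []
    for pos, cell in enumerate(cells):
        try:
            rows.append(json.loads(cell)['request'])
        except (ValueError, KeyError, TypeError):
            # ValueError: malformed JSON; KeyError: no 'request' key;
            # TypeError: the cell is not text, or the JSON is not an object.
            bad_positions.append(pos)
    return rows, bad_positions

cells = [
    '{"id":"0","request":{"plantSearch":"true","maxResults":"51"}}',
    '{"key":"123,err,1",wersd"}',  # malformed row from the comment
]
rows, bad_positions = parse_requests(cells)
print(rows)
print(bad_positions)
```

The resulting `rows` list can be passed to `pd.DataFrame(rows)`, and `bad_positions` makes it possible to inspect the skipped rows afterwards instead of losing them silently.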
