2017-03-07 56 views
-1

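None of the existing solutions work for me. I want to compare a CSV file with a JSON file to see whether the JSON file contains any of the strings from the CSV file. Error when converting JSON to a CSV file in Python.

I tried this (adapted from another Stack Overflow post):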

jsoned = json.loads(x) 

with open("test.csv", "wb+") as csv_file: 
    csv_writer = csv.writer(csv_file) 
    for i in jsoned:
        csv_writer.writerow([i[u'tag'],
                             i[u'newtag']])

But it doesn't work. Would I be better off going the other way and converting the CSV to JSON?

Edit

JSON file:

{"tag":["security architecture","systems security engineering","architecture","program protection planning (ppp)","system security engineering","security engineering"],"newtag":["security","architecture engineering & policy","certified ethical hacker","security policy and risk management","sse","enterprise transition plan","plan","tax","capacity analysis"]} 

CSV:

id tag 
88 systems engineering 
88 project management 
88 program management 
88 strategic planning 
88 requirements analysis 
88 acquisition 
88 enterprise architecture 
134 java 
134 software engineering 
134 software development 
134 xml 
134 c++ 
134 sql 
134 web services 
134 javascript 
134 linux 
134 html 
134 python 
134 c 
134 c# 
134 software architecture 
134 eclipse 
134 jquery 
134 oracle 
134 perl 
161 project management 
161 systems engineering 
161 requirements engineering 
161 requirements management 

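I want to see which ID has the most matches in the JSON file (so I want to know how many tags match for each ID), but I don't know how to go about comparing the JSON and the CSV.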

+1

Define "it doesn't work". – Cfreak

+2

Any chance we can see the format of the JSON and CSV files? – RichSmith

+2

No comparison is happening; you are only writing to a file. What exactly should be compared? – roganjosh

Answers

1

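I may have misunderstood your question, but hopefully this at least gets you started.

I'm pretty sure there must be a better way to do this, but here is one approach.

First, load your data: put the CSV data into a nested list and convert the JSON data into a dictionary. Then get all the unique IDs in the CSV file.

For each unique ID, go through the CSV rows and count how many of that ID's tags are present in the JSON tags.

If the count is greater than the current maximum, store that ID as the best one.

Once the loop finishes, you should have the ID with the most tags present in the JSON tags.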

import csv
import json

# load csv data (csv.reader assumes comma-separated values by default)
with open("csvdata.csv") as csvFile:
    reader = csv.reader(csvFile)
    loadedCSV = [row for row in reader]

# load json data and get list of tags
# (json.load needs an open file; the sample JSON uses the keys "tag" and "newtag")
with open("jsonFile.json") as jsonFile:
    jsonTags = json.load(jsonFile)["tag"]

# create a unique list of ids from csv file
uniqueIDs = list(set(row[0] for row in loadedCSV))

# best match so far
selectedID = None

# keep track of best count
maxCount = 0

# go through ids
for id in uniqueIDs:

    # count for specific ID
    idCount = 0

    # go through tags in csv and add one to count if in json tags
    for row in loadedCSV:
        if row[0] == id:
            if row[1] in jsonTags:
                idCount += 1

    # compare count to current max and remember the new best
    if idCount > maxCount:
        selectedID = id
        maxCount = idCount
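
Since the question asks how many tags match for each ID, a per-ID tally may be more useful than only the single best ID. The following is a minimal sketch of that idea, not a definitive implementation. It assumes a comma-separated CSV with an "id,tag" header row and matches against the JSON "tag" list. The function name count_matches and the small inline sample data are made up for illustration.

```python
import csv
import io
import json
from collections import Counter


def count_matches(csv_text, json_text, key="tag", delimiter=","):
    """Return a Counter mapping each CSV id to its number of matching tags."""
    # A set makes each membership test cheap.
    json_tags = set(json.loads(json_text)[key])
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    next(reader, None)  # skip the assumed "id,tag" header row
    counts = Counter()
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed rows
        row_id, tag = row[0].strip(), row[1].strip()
        # Adding a bool (0 or 1) also registers ids that have no matches.
        counts[row_id] += tag in json_tags
    return counts


# Hypothetical sample data, much smaller than the files in the question.
csv_text = (
    "id,tag\n"
    "88,architecture\n"
    "88,acquisition\n"
    "134,java\n"
    "161,security engineering\n"
    "161,architecture\n"
)
json_text = '{"tag": ["security engineering", "architecture"], "newtag": ["security"]}'

counts = count_matches(csv_text, json_text)
best_id, best_count = counts.most_common(1)[0]
print(dict(counts))
print(best_id, best_count)
```

To read from real files instead, pass the file contents in as text, and change the delimiter argument if the CSV is tab-separated rather than comma-separated.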
+0

Thanks RichSmith, I'll look into this. I had similar code but was struggling with the JSON. – user6754289

+0

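When I print selectedID or maxCount, though, nothing shows up. It seems to break after idCount += 1. Is that because it is comparing against the IDs in the CSV rather than the tags? – user6754289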

+0

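@karmesto It might be worth printing uniqueIDs, loadedCSV and jsonTags before the loop to make sure they are formatted and loaded correctly. I'll check the rest of the code in the morning :) – RichSmith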
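
One thing such a check can reveal is a delimiter mismatch. The sketch below assumes, purely for illustration, that the CSV is tab-separated; the real file's delimiter is not known from the question. In that case the default comma delimiter would leave every row as a single column, so row[1] would not exist. The helper name load_rows and the sample text are hypothetical.

```python
import csv
import io


def load_rows(text, delimiter=","):
    """Parse CSV text into a list of rows, skipping blank lines."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [row for row in reader if row]


# Hypothetical tab-separated sample resembling the data in the question.
sample = "id\ttag\n88\tsystems engineering\n134\tjava\n"

wrong = load_rows(sample)                  # default comma delimiter
right = load_rows(sample, delimiter="\t")  # tab delimiter

# Printing the first rows and the column counts shows whether the split worked.
print(wrong[:2], {len(row) for row in wrong})
print(right[:2], {len(row) for row in right})
```

If every row comes back with a single column, the delimiter passed to csv.reader is the first thing to check.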