Extract rows from a CSV file based on specific keywords

I wrote some code to help me extract rows from a CSV file based on specific keywords.

import re

keywords = {"metal", "energy", "team", "sheet", "solar", "financial", "transportation", "electrical", "scientists",
            "electronic", "workers"}  # all your keywords

# Currently only matches the single keyword "energy"
keyre = re.compile("energy", re.IGNORECASE)
with open("2006-data-8-8-2016.csv") as infile:
    with open("new_data.csv", "w") as outfile:
        outfile.write(infile.readline())  # Save the header
        for line in infile:
            if len(keyre.findall(line)) > 0:
                outfile.write(line)

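I need it to look for every keyword in the two main columns of the data, "position" and "Job description", and then write each whole row that contains any of these words to a new file. Any ideas on the simplest way to do this?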

Source

2017-08-27 Eng.Reem

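I need it to go through all the keywords. For example, it should look for rows that contain the word "metal" under "position" and "Job description", extract those whole rows and write them to the file, then look for the second word and do the same, and so on until the last word.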

Try this: loop over the DataFrame and write the new DataFrame back to a CSV file.

import pandas as pd

keywords = {"metal", "energy", "team", "sheet", "solar", "financial",
            "transportation", "electrical", "scientists",
            "electronic", "workers"}  # all your keywords

df = pd.read_csv("2006-data-8-8-2016.csv", sep=",")

listMatchPosition = []
listMatchDescription = []

# Keep a row when any keyword is a substring of either column
for i in range(len(df.index)):
    if any(x in df['position'][i] or x in df['Job description'][i] for x in keywords):
        listMatchPosition.append(df['position'][i])
        listMatchDescription.append(df['Job description'][i])

output = pd.DataFrame({'position': listMatchPosition, 'Job description': listMatchDescription})
output.to_csv("new_data.csv", index=False)

Edit: if you have many columns to include, the modified code below will do the job.

df = pd.read_csv("2006-data-8-8-2016.csv", sep=",")

# Empty frame with the same columns as the input
output = pd.DataFrame(columns=df.columns)

for i in range(len(df.index)):
    if any(x in df['position'][i] or x in df['Job description'][i] for x in keywords):
        # Copy the whole matching row
        output.loc[len(output)] = [df[j][i] for j in df.columns]

output.to_csv("new_data.csv", index=False)
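
The same filter can also be written without an explicit index loop, which keeps every column automatically. What follows is only a minimal sketch, assuming the column names `position` and `Job description` from the question; the helper name `filter_rows` and the sample data are hypothetical and made up for illustration.

```python
import pandas as pd

KEYWORDS = ["metal", "energy", "team", "sheet", "solar", "financial",
            "transportation", "electrical", "scientists",
            "electronic", "workers"]


def filter_rows(df, columns, keywords):
    """Return the rows of df where any keyword is a substring of any given column."""
    def has_keyword(cell):
        # Missing cells are read as NaN (a float), so only test real strings
        return isinstance(cell, str) and any(k in cell for k in keywords)

    # Start from an all-False mask and OR in the matches for each column
    mask = pd.Series(False, index=df.index)
    for col in columns:
        mask |= df[col].map(has_keyword)
    return df[mask]


# Hypothetical sample data standing in for 2006-data-8-8-2016.csv
sample = pd.DataFrame({
    "position": ["solar engineer", "chef", "driver", None],
    "Job description": ["installs panels", "cooks food", "works in a team", "unknown"],
    "salary": [50000, 30000, 25000, 0],
})

result = filter_rows(sample, ["position", "Job description"], KEYWORDS)
```

With the real file, the result of `filter_rows(pd.read_csv("2006-data-8-8-2016.csv"), ["position", "Job description"], KEYWORDS)` could then be saved with `to_csv("new_data.csv", index=False)`.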

Source

2017-08-27 11:47:48

Note that this works even if "Job description" is more than a single word, which I think is the case, unlike the DataFrame.isin method.

However, the CSV file also contains other columns that I need to extract and put into the new file. Any idea how? @Vincent K

Do you mean that columns such as "salary" and "location" need to be extracted as well? If so, and it is only a few more columns, just add more listMatchxxx lists.

You can do this with pandas as follows, if you are looking for rows whose cell is exactly one word from the list of keywords:

import pandas as pd

keywords = ["metal", "energy", "team", "sheet", "solar", "financial", "transportation", "electrical", "scientists",
            "electronic", "workers"]

# read the csv data into a dataframe
# change "," to the data separator in your csv file
df = pd.read_csv("2006-data-8-8-2016.csv", sep=",")
# filter the data: keep only the rows whose position or Job description
# cell is exactly equal to one of the keywords
df = df[df["position"].isin(keywords) | df["Job description"].isin(keywords)]
# write the data back to a csv file
df.to_csv("new_data.csv", sep=",", index=False)

If you are looking for rows that contain a keyword as a substring (for example, finding financial in financial engineering), then you can do the following:

import pandas as pd

keywords = ["metal", "energy", "team", "sheet", "solar", "financial", "transportation", "electrical", "scientists",
            "electronic", "workers"]
# build one regex alternation: "metal|energy|team|..."
searched_keywords = '|'.join(keywords)

# read the csv data into a dataframe
# change "," to the data separator in your csv file
df = pd.read_csv("2006-data-8-8-2016.csv", sep=",")
# filter the data: keep only the rows that contain one of the keywords
# in the position or the Job description columns
# (na=False treats missing cells as non-matching)
df = df[df["position"].str.contains(searched_keywords, na=False) | df["Job description"].str.contains(searched_keywords, na=False)]
# write the data back to a csv file
df.to_csv("new_data.csv", sep=",", index=False)
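
By default `str.contains` treats its argument as a case-sensitive regular expression, so `energy` would not match `Energy`. Below is a sketch of a case-insensitive variant, assuming the same two column names; the sample data is hypothetical. `re.escape` is used so that any keyword containing regex metacharacters would be matched literally.

```python
import re
import pandas as pd

keywords = ["metal", "energy", "team", "sheet", "solar", "financial",
            "transportation", "electrical", "scientists",
            "electronic", "workers"]

# Escape each keyword so it is matched literally, then join into one alternation
pattern = "|".join(re.escape(k) for k in keywords)

# Hypothetical sample data
df = pd.DataFrame({
    "position": ["Energy Analyst", "Chef", None, "Baker"],
    "Job description": ["Builds models", "Cooks for the TEAM",
                        "Financial reporting", "Bakes bread"],
})

# case=False ignores letter case; na=False counts missing cells as non-matches
mask = (df["position"].str.contains(pattern, case=False, na=False)
        | df["Job description"].str.contains(pattern, case=False, na=False))
matches = df[mask]
```

Here the first three rows are kept and the "Baker" row is dropped.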

Source

2017-08-27 11:56:39 MedAli

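This is simple and looks good, and I understand the code. But it does not save any data, only the header :( even though I am sure many of the keywords appear in the file, specifically in the position and job description columns. @MedAli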

@Eng.Reem Can you share a sample of your data? – MedAli

This will not work, because the "Job description" column is more than a single word.
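
The difference between the two approaches shows up on a single multi-word cell. A small illustration with made-up data: `isin` compares the whole cell against each keyword, while `str.contains` looks for the keyword anywhere inside the cell.

```python
import pandas as pd

# Hypothetical data: one single-word cell and one multi-word cell
s = pd.Series(["energy", "renewable energy consultant"])

exact = s.isin(["energy"])                        # whole-cell equality
partial = s.str.contains("energy", regex=False)   # plain substring search

print(exact.tolist())    # [True, False]
print(partial.tolist())  # [True, True]
```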
