Remove line breaks in a CSV file

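I have a CSV file in which every row starts with (@) and the fields within a row are separated by (;). One of the fields, the one containing the "Text" (""[ ]""), has line breaks in it, which causes errors when the file is imported into Excel or Access. The text after a line break is treated as a separate row instead of following the structure of the table.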

@4627289301; Lima, Peru; 490; 835551022915420161; Sat Feb 25 18:04:22 +0000 2017; ""[OJO! 
la premiacin de los #Oscar, nuestros amigos de @cinencuentro revisan las categoras. 
+info: co/plHcfSIfn8]""; 0 
@624974422; None; 114; 835551038581137416; Sat Feb 25 18:04:26 +0000 2017; ""[Porque nunca dejamos de amar]""; 0

Any help with a Python script? Or any other solution...

As output I would like to have these rows:

@4627289301; Lima, Peru; 490; 835551022915420161; Sat Feb 25 18:04:22 +0000 2017; ""[OJO! la premiacin de los #Oscar, nuestros amigos de @cinencuentro revisan las categoras. +info: co/plHcfSIfn8]""; 0 
@624974422; None; 114; 835551038581137416; Sat Feb 25 18:04:26 +0000 2017; ""[Porque nunca dejamos de amar]""; 0

Any help? My CSV file is large (54 MB) and has many rows with line breaks... other rows are fine...

Source

2017-03-08 luisec

Also, do the @ signs inside the text need to be taken into account? –

I want to get all rows with the same structure as the second row of the example (@624974422 ...) – luisec

Have you tried anything? There seems to be a fairly simple way to get started: read line by line, strip the '@' and split on ';', or use the csv module. –

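You should share your expected output.

Anyway, I suggest you first clean your file to remove the line breaks. Then you can read it as a CSV. One solution could be (I am sure someone will come up with something better :-))

Clean the file (on Linux):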

sed ':a;N;$!ba;s/\n/ /g' input_file | sed -E 's/ @([0-9]+;)/\n@\1/g' > output_file

Read the file as CSV (you can use any other method to read it):

import pandas as pd 
df = pd.read_csv('output_file', delimiter=';', header=None) 
df.to_csv('your_csv_file_name', index=False)

Let's see if it helps you :-)

Source

2017-03-08 05:47:41 Pintu

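Thanks @PaulRooney – Pintu

I am using Windows – luisec

You can search for line endings that are followed by a line which does not start with "@", like this: \r?\n+(?!@\d+;).

The following was generated from this regex101 demo. It replaces such line endings with a space. You can change that to anything you like.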

# coding=utf8 
# the above tag defines encoding for this document and is for Python 2.x compatibility 

import re 

regex = r"\r?\n+(?!@\d+;)"

test_str = ("@4627289301; Lima, Peru; 490; 835551022915420161; Sat Feb 25 18:04:22 +0000 2017; \"\"[OJO!\n" 
    "la premiacin de los #Oscar, nuestros amigos de @cinencuentro revisan las categoras.\n" 
    "+info: co/plHcfSIfn8]\"\"; 0\n" 
    "@624974422; None; 114; 835551038581137416; Sat Feb 25 18:04:26 +0000 2017; \"\"[Porque nunca dejamos de amar]\"\"; 0") 

subst = " " 

# count=0 replaces every match; set count to a positive number to limit the replacements
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result: 
    print (result) 

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

Source

2017-03-08 06:01:47

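It does not work; when the code is generated from the demo, it adds some characters to the real text lines... – luisec

@luisec I don't understand what that means. If it means that it adds a space, that is because of 'subst = " "'; you can change it to "" or to anything else you want to use as the replacement. –

In the first line, after the text "OJO!". When the code is exported with the example (from the demo, I have tried it), the example adds "\n"... the original lines do not have those characters to identify where the line breaks are... – luisec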
