If the amount of space between data items is fixed and a missing item is just a single space, you can do this:
>>> s="AAA  B  C     E  F  G  H     J "
>>> s.split("  ")
['AAA', 'B', 'C', '', ' E', 'F', 'G', 'H', '', ' J ']
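The stray spaces left in the result above (' E', ' J ') can be cleaned up with a small helper. This is only a sketch: the name `split_fixed` is hypothetical, and it assumes the separator is exactly two spaces and that a missing value is written as a single space.

```python
def split_fixed(line, sep="  "):
    """Split on a fixed-width separator; blank fields become None.

    Assumption (not from the original post): the separator is exactly
    two spaces and a missing value is a single space.
    """
    fields = [field.strip() for field in line.rstrip("\n").split(sep)]
    return [field or None for field in fields]


row = split_fixed("AAA  B  C     E  F  G  H     J ")
print(row)
```

Note that two missing values in a row would produce an extra empty field with this naive split, so it is only suitable when missing values are isolated.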
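EDIT
Assuming that the space between two consecutive data items is the same throughout the file, I offer you this.
Take this file as an example: missing.txt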
AAA  B  C  D  E  F  G  H  I  J
AAA  B  C  D  E  F  G  H  I  J
AAA  B  C     E  F  G  H     J
AAA  B  C     E  F  G  H

100  2  3  4  5  6  7  8  9  10
100  2  3     5  6  7  8  9  10
100  2  3     5  6  7  8     10
100  2  3     5  6  7  8      

100.1  2.1  3.1  4.1  5.1  6.1  7.1  8.1  9.1  10.1
100.1  2.1  3.1       5.1  6.1  7.1  8.1  9.1  10.1
100.1  2.1  3.1       5.1  6.1  7.1  8.1       10.1
100.1  2.1  3.1       5.1  6.1  7.1  8.1      

hello  this  is  a  example  of  a  normal  file  right?
hello  this  is     example  of  a  normal  file  right?
hello  this  is     example  of  a  normal     right?
hello  this  is     example  of  a  normal      
and this function:
def read_data_line(path_file, data_size=10, line_format=None, temp_char="@", ignore=True):
    """Generator to read data_size data from a file that may have some missing

    path_file:   path to the file
    line_format: list with the space between 2 consecutive data
    temp_char:   character that this function will use as placeholder for
                 the missing data during processing
    data_size:   amount of data expected per line of the file
    ignore:      in case that 'line_format' is not given, ignore all
                 lines that don't have the correct format, otherwise
                 it is expected that the first line has the correct
                 format, to use it as a model for the rest of the file

    Expected format of the content of the file:
    A B C D E F G H I J
    with A,B,...,J strings without space or 'temp_char' or numbers
    This function assumes that the space between 2 consecutive
    data is constant in all the file

    usage
    >>> datos = list(read_data_line("/some_folder/some_file.txt"))
    or
    >>> for line in read_data_line("/some_folder/some_file.txt"):
    ...     print(line)
    """
    with open(path_file, "r") as data_raw:  # this is the usual way of managing files
        for line in data_raw:  # here you read each line of the file one by one
            datos = line.split()
            if not line_format and len(datos) == data_size:
                # I have all the data, and I assume this structure is the norm
                line = line.strip()
                for d in datos:
                    # swap each datum for the placeholder, leaving only the gaps
                    line = line.replace(d, temp_char, 1)
                line_format = [len(x) for x in line.split(temp_char)[1:-1]]
            if len(datos) < data_size:  # missing data
                if line_format:
                    for t in line_format:
                        # mark each expected gap; leftover spaces are missing data
                        line = line.replace(" " * t, temp_char, 1)
                    datos = list(map(str.strip, line.split(temp_char)))
                else:
                    if ignore:
                        continue
                    raise RuntimeError("Impossible to determine the structure of the file")
            yield datos
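To make the two steps of the function easier to follow, here they are pulled apart and run on plain strings instead of a file. This is a sketch only: the helper names `infer_line_format` and `split_with_format` are hypothetical, and the sample lines assume a two-space gap between fields.

```python
def infer_line_format(line, temp_char="@"):
    """Return the widths of the gaps between fields of a complete line."""
    stripped = line.strip()
    for field in stripped.split():
        # Replace each field, left to right, with the placeholder.
        stripped = stripped.replace(field, temp_char, 1)
    # What is left between the placeholders are the separators.
    return [len(gap) for gap in stripped.split(temp_char)[1:-1]]


def split_with_format(line, line_format, temp_char="@"):
    """Split a line that may have missing fields, using known gap widths."""
    for width in line_format:
        # Mark one gap at a time; a lone leftover space is a missing field.
        line = line.replace(" " * width, temp_char, 1)
    return [field.strip() for field in line.split(temp_char)]


# Assumed sample lines: two spaces between fields, one space per missing field.
complete = "AAA  B  C  D  E  F  G  H  I  J\n"
partial = "AAA  B  C     E  F  G  H     J\n"

line_format = infer_line_format(complete)
print(line_format)
print(split_with_format(partial, line_format))
```

The first call learns the gap widths from a line known to be complete; the second reuses them on a line with holes, which is the same idea the generator applies line by line.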
Output
>>> for x in read_data_line("missing.txt"):
...     print(x)
['AAA', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
['AAA', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
['AAA', 'B', 'C', '', 'E', 'F', 'G', 'H', '', 'J']
['AAA', 'B', 'C', '', 'E', 'F', 'G', 'H']
['']
['100', '2', '3', '4', '5', '6', '7', '8', '9', '10']
['100', '2', '3', '', '5', '6', '7', '8', '9', '10']
['100', '2', '3', '', '5', '6', '7', '8', '', '10']
['100', '2', '3', '', '5', '6', '7', '8', '', '']
['']
['100.1', '2.1', '3.1', '4.1', '5.1', '6.1', '7.1', '8.1', '9.1', '10.1']
['100.1', '2.1', '3.1', '', '5.1', '6.1', '7.1', '8.1', '9.1', '10.1']
['100.1', '2.1', '3.1', '', '5.1', '6.1', '7.1', '8.1', '', '10.1']
['100.1', '2.1', '3.1', '', '5.1', '6.1', '7.1', '8.1', '', '']
['']
['hello', 'this', 'is', 'a', 'example', 'of', 'a', 'normal', 'file', 'right?']
['hello', 'this', 'is', '', 'example', 'of', 'a', 'normal', 'file', 'right?']
['hello', 'this', 'is', '', 'example', 'of', 'a', 'normal', '', 'right?']
['hello', 'this', 'is', '', 'example', 'of', 'a', 'normal', '', '']
>>>
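As the output shows, blank lines in the file come back as [''] and a line that simply ends early comes back shorter than the others. If that matters for your use, a small post-processing step can even things out. This is just a sketch; `clean_rows` is a hypothetical helper, and using None for a missing value is my own choice, not something the function above does.

```python
def clean_rows(rows, data_size=10):
    """Drop blank-line rows, pad short rows, and turn '' into None."""
    for row in rows:
        if row == [""]:  # produced by an empty line in the file
            continue
        # Pad rows that ended early so every row has data_size fields.
        padded = row + [""] * (data_size - len(row))
        yield [field if field != "" else None for field in padded]


# Rows copied from the output above.
rows = [
    ["AAA", "B", "C", "", "E", "F", "G", "H"],
    [""],
    ["100", "2", "3", "", "5", "6", "7", "8", "", "10"],
]
cleaned = list(clean_rows(rows))
for row in cleaned:
    print(row)
```

It can be chained directly onto the generator, e.g. `clean_rows(read_data_line("missing.txt"))`.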
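I hope this solves your problem, provided the spacing between your data is consistent and each missing item is replaced by a single space (as in the example).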
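So, do the lines with missing data have a space where the data should be, or no space at all, with the next column shifted to the left? –
What is the format of the file? Comma-separated or tab-separated, by the look of it? – Llopis
If whitespace is the delimiter, there is no way to do this. –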