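Python parallelization problem

My program builds the Cartesian product of URLs from the state and other variables, saves the zip files (from the generated URLs) to a specified location, checks each zip file for data (some zip files download with no data), records states with no data in a separate file, and then writes to a file when a state is complete. This is done in parallel by state, i.e. Alabama and Alaska run the steps above at the same time. However, I keep getting the following error: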
An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (179, 0))
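The error occurs when I start fresh, i.e. when no part of the process has been run before. If the process has already been partially run, it does not happen. More specifically, it occurs at random.
Here is my code:
Functions: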
import itertools
import os
import shutil
import urllib


def createURL(state, typ, geography, level, data, dictionary):
    # Every combination of the inputs becomes one download link.
    DATALIST = list(itertools.product(typ, geography, level, data))
    TXTLIST = list(itertools.product(typ, dictionary))
    DEFLIST = list(itertools.product(typ))  # yields 1-tuples
    DATALINKS = []
    for data in DATALIST:
        result = 'URL'  # URL pattern elided in the original post
        DATALINKS.append(result)
    TXTLINKS = []
    for txt in TXTLIST:
        links = 'URL'
        TXTLINKS.append(links)
    DEFLINKS = []
    for defl in DEFLIST:
        definitions = 'URL'
        DEFLINKS.append(definitions)
    URLLINKS = DATALINKS + TXTLINKS + DEFLINKS
    return URLLINKS


def downloadData(state, TYPE, GEOGRAPHY, LEVEL, DATA, \
                 DICTIONARY, YEAR, QUARTER, completedStates):
    print ('Working on state: ', state)
    URLLINKS = createURL(state, TYPE, GEOGRAPHY, LEVEL, DATA, DICTIONARY)
    DIRECTORY = '/home/justin/QWI/' + YEAR + 'Q' + QUARTER + '/' + state
    # DIRECTORY[:-2] strips the two-letter state code, leaving the parent
    # directory (with its trailing slash) that all states share.
    if not os.path.exists(DIRECTORY[:-2]):
        os.makedirs(DIRECTORY[:-2])
    if not os.path.exists(DIRECTORY):
        os.makedirs(DIRECTORY)
    # Log of URLs already fetched; this file is shared by every state.
    downLoadedURLs = DIRECTORY[:-2] + 'downLoadedURLs.txt'
    if not os.path.isfile(downLoadedURLs):
        with open(downLoadedURLs, 'a') as downloaded:
            downloaded.write('')
    with open(downLoadedURLs) as downloaded:
        URLcontent = downloaded.read().splitlines()
    # Skip anything that a previous run already downloaded.
    URLLINKS = [x for x in URLLINKS if x not in URLcontent]
    for url in URLLINKS:
        print ('Downloading data: ', url)
        save = DIRECTORY + '/' + os.path.basename(url)
        # Python 2 API; in Python 3 this is urllib.request.urlretrieve.
        urllib.urlretrieve(url, save)
        with open(downLoadedURLs, 'a') as downloaded:
            downloaded.write('{}\n'.format(url))
        # An empty file means the state has no data: drop its directory,
        # record the state, and stop processing it.
        if os.stat(save).st_size == 0:
            shutil.rmtree(DIRECTORY)
            with open(DIRECTORY[:-2] + '/zeroDataStates.txt', 'a') as zeroData:
                zeroData.write('{}\n'.format(state))
            break
    with open(completedStates, 'a') as completedState:
        completedState.write('{}\n'.format(state))
Here is the parallel code:
from joblib import Parallel, delayed
STATE = ['al', 'ak', etc...]
Parallel(n_jobs = CORES)(delayed(downloadData)\
    (state, TYPE, GEOGRAPHY, LEVEL, DATA, DICTIONARY, YEAR, QUARTER,
     completedStates) for state in STATE)
I believe the error occurs either when writing to a file or when fetching one of the URLs.
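One thing that would fit the "only on a fresh start, and only sometimes" pattern is the check-then-create sequence in downloadData: when several workers start at once, two of them can both see that the shared parent directory is missing, both call os.makedirs, and the slower one raises OSError. This is a guess at the cause, not a confirmed diagnosis. A minimal sketch of a helper that tolerates that race (ensure_dir is a hypothetical name, and the directory layout in the demonstration is made up):

```python
import errno
import os
import tempfile


def ensure_dir(path):
    """Create path (and parents) if missing; tolerate losing a creation race."""
    try:
        os.makedirs(path)
    except OSError as exc:
        # EEXIST on an existing directory means another process created it
        # first; any other failure (permissions, a file in the way) is re-raised.
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise


# Small demonstration in a throwaway directory (hypothetical layout).
base = tempfile.mkdtemp()
state_dir = os.path.join(base, '2014Q1', 'al')
ensure_dir(state_dir)
ensure_dir(state_dir)  # second call is a no-op instead of an error
```

On Python 3.2 and later, os.makedirs(path, exist_ok=True) covers the same case without a helper.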
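Thank you for your response. However, it did not solve the problem: I still get the error above roughly 1 time in 4. I have changed how I parallelize the code and now do it from the UNIX command line, i.e. I pass the state on the command line and run the program in parallel from there.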
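The command-line approach described in this comment could look roughly like the following. It is only a sketch under assumptions: the argument name, parse_args, and main are hypothetical, and the call to downloadData is left out so the example stands alone.

```python
import argparse


def parse_args(argv=None):
    """Parse the single state code given on the command line."""
    parser = argparse.ArgumentParser(description='Process one state per run.')
    parser.add_argument('state', help="two-letter state code, e.g. 'al'")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    state = args.state.lower()
    # The real script would call downloadData(state, ...) here with the
    # remaining settings; that call is omitted to keep the sketch standalone.
    return state
```

With the usual `if __name__ == '__main__': main()` guard added, a script like this can be launched once per state from the shell (for instance through `xargs -P` or GNU parallel), so each state runs in its own interpreter.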
So you probably have another multi-line statement near the end of the file. Search your code for triple quotes (''' or """).
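The "tokenizing input" text is usually printed by the code that formats a traceback for display rather than by the statement that failed, so the underlying exception may be something else entirely; that is an assumption about this case, not something the output confirms. One way to see the real error is to catch it inside the worker and hand it back as plain text. In this sketch safe_call and flaky are hypothetical names, and flaky merely stands in for downloadData:

```python
import traceback


def safe_call(func, *args, **kwargs):
    """Run func and return (result, None), or (None, traceback text) on failure."""
    try:
        return func(*args, **kwargs), None
    except Exception:
        # format_exc() renders the active exception as plain text, so the
        # original error survives being sent back from a worker process.
        return None, traceback.format_exc()


def flaky(state):
    # Stand-in for downloadData: fails for one made-up state code.
    if state == 'zz':
        raise OSError('simulated failure for ' + state)
    return state


results = [safe_call(flaky, s) for s in ['al', 'zz']]
```

With joblib, the same wrapper could be used as delayed(safe_call)(downloadData, state, ...), after which any non-None second element in the results shows which state failed and why.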