2015-05-05

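Python parallel problem

My program is set up to download the Cartesian product of URLs built from the state and other variables, save the zip files (from the created URLs) to a specified location, check whether each zip file contains data (some zip files have no data to download), write to a specific file to keep track of the state's data, and then write to a file when the state is complete. This is done in parallel by state, i.e. Alabama and Alaska go through the steps above in parallel. However, I keep getting the following error: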

An unexpected error occurred while tokenizing input 
The following traceback may be corrupted or invalid 
The error message is: ('EOF in multi-line statement', (179, 0)) 

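The error occurs when I start fresh, i.e. when no part of the process has been run before. If the process has already partially run, it does not happen. More specifically, it occurs randomly.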

Here is my code:

Functions:

import itertools
import os
import shutil
import urllib


def createURL(state, typ, geography, level, data, dictionary):
    # Build every combination of the inputs; each tuple becomes one URL.
    DATALIST = list(itertools.product(typ, geography, level, data))
    TXTLIST = list(itertools.product(typ, dictionary))
    DEFLIST = list(itertools.product(typ))

    DATALINKS = []
    for combo in DATALIST:
        result = 'URL'  # URL string elided in the original post
        DATALINKS.append(result)

    TXTLINKS = []
    for txt in TXTLIST:
        links = 'URL'
        TXTLINKS.append(links)

    DEFLINKS = []
    for defl in DEFLIST:
        definitions = 'URL'
        DEFLINKS.append(definitions)

    URLLINKS = DATALINKS + TXTLINKS + DEFLINKS
    return URLLINKS


def downloadData(state, TYPE, GEOGRAPHY, LEVEL, DATA,
                 DICTIONARY, YEAR, QUARTER, completedStates):
    print('Working on state: ', state)

    URLLINKS = createURL(state, TYPE, GEOGRAPHY, LEVEL, DATA, DICTIONARY)

    DIRECTORY = '/home/justin/QWI/' + YEAR + 'Q' + QUARTER + '/' + state
    # DIRECTORY[:-2] strips the two-letter state code, leaving the quarter directory.
    if not os.path.exists(DIRECTORY[:-2]):
        os.makedirs(DIRECTORY[:-2])

    if not os.path.exists(DIRECTORY):
        os.makedirs(DIRECTORY)

    # Log of URLs already fetched, shared by every state in this quarter.
    downLoadedURLs = DIRECTORY[:-2] + 'downLoadedURLs.txt'
    if not os.path.isfile(downLoadedURLs):
        with open(downLoadedURLs, 'a') as downloaded:
            downloaded.write('')

    with open(downLoadedURLs) as downloaded:
        URLcontent = downloaded.read().splitlines()

    # Skip anything that was fetched on a previous run.
    URLLINKS = [x for x in URLLINKS if x not in URLcontent]

    for url in URLLINKS:
        print('Downloading data: ', url)
        save = DIRECTORY + '/' + os.path.basename(url)

        # Python 2 API; on Python 3 this is urllib.request.urlretrieve.
        urllib.urlretrieve(url, save)
        with open(downLoadedURLs, 'a') as downloaded:
            downloaded.write('{}\n'.format(url))

        # An empty download means the state has no data: drop it and stop.
        if os.stat(save).st_size == 0:
            shutil.rmtree(DIRECTORY)
            with open(DIRECTORY[:-2] + '/zeroDataStates.txt', 'a') as zeroData:
                zeroData.write('{}\n'.format(state))
            break

    with open(completedStates, 'a') as completedState:
        completedState.write('{}\n'.format(state))

Here is the parallel code:

from joblib import Parallel, delayed 

STATE = ['al', 'ak', ...]  # remaining states elided in the original post

Parallel(n_jobs = CORES)(delayed(downloadData)\ 
    (state, TYPE, GEOGRAPHY, LEVEL, DATA, DICTIONARY, YEAR, QUARTER, 
    completedStates) for state in STATE) 

I believe the error occurs either when writing to the files or when fetching the URLs.

Answers

'EOF in multi-line statement' 

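A multi-line statement in Python is one that continues onto the next line, for example because a line ends with \. EOF means end of file. So you are looking for a multi-line statement that is not finished before the file ends. Your example code contains exactly that on the first line of this snippet: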

Parallel(n_jobs = CORES)(delayed(downloadData)\ 
    (state, TYPE, GEOGRAPHY, LEVEL, DATA, DICTIONARY, YEAR, QUARTER, 
    completedStates) for state in STATE) 

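The parentheses should already let the expression span several lines unambiguously, so you should be able to simply remove the stray \. You might also want to check your formatting: as written, it gives no clue about the structure of the code.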
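To illustrate the point about parentheses, here is a minimal sketch of the two continuation styles. The helper `total` and the numbers are hypothetical and only serve the illustration; they are not part of the question's code.

```python
def total(*values):
    """Hypothetical helper: add up the arguments."""
    return sum(values)


# Explicit continuation: the backslash must be the very last character
# on the line, and a following line has to complete the statement.
with_backslash = total(1, 2) + \
    total(3, 4)

# Implicit continuation: an open parenthesis keeps the statement going
# until it is closed, so no backslash is needed.
without_backslash = (total(1, 2) +
                     total(3, 4))

print(with_backslash, without_backslash)
```

Both forms evaluate to the same value; the parenthesized one has no trailing character that can be left dangling.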


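Thanks for your response. However, this did not solve the problem: I still get the error above about 1 time in 4. I have switched the way the code is parallelized to the UNIX command line, i.e. I pass the state on the command line and run the program in parallel from there.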
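For reference, passing the state on the command line might look roughly like the sketch below. The helper name `parse_state` and the use of `argparse` are assumptions for illustration only; the comment does not show the actual script.

```python
import argparse


def parse_state(argv=None):
    """Return the state code given as the single positional argument.

    Hypothetical helper: argv=None makes argparse read sys.argv[1:].
    """
    parser = argparse.ArgumentParser(description="Download data for one state.")
    parser.add_argument("state", help="two-letter state code, e.g. al")
    return parser.parse_args(argv).state


# Simulate a call such as `python script.py al`; downloadData(state, ...)
# from the question would then be invoked with this value.
state = parse_state(["al"])
print("Working on state:", state)
```

Each invocation then handles a single state, so a shell tool such as `xargs -P` can start several of them at once.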


So you probably have another multi-line statement near the end of the file. Search your code for unterminated triple quotes (''').
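A related check is to list every line that ends in a backslash, since each one starts an explicit continuation. This is only a sketch: `lines_ending_in_backslash` and the `sample` string are made up for illustration, and the scan is purely textual, so it will also flag backslashes inside strings or comments.

```python
def lines_ending_in_backslash(source):
    """Return 1-based numbers of lines whose last non-space character is a backslash."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if line.rstrip().endswith("\\")
    ]


# Hypothetical three-line source: only the first line is continued.
sample = "total = 1 + \\\n    2\nprint(total)\n"
print(lines_ending_in_backslash(sample))  # prints [1]
```

To scan a real file, read it with `open(path).read()` and pass the text to the function.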