蟒蛇中的高效张量收缩

我有张量的列表L（ndarray对象），每个都有几个索引。我需要根据连接图来收缩这些指数。蟒蛇中的高效张量收缩

的连接进行编码元组的列表形式((m,i),(n,j))表示“与张力L[n]的Ĵ个指数收缩张L[m]的我个指标。

哪有我处理非平凡的连通图？第一个问题是，只要我收缩一对索引，结果就是一个新的张量，它不属于列表L。但即使我解决了这个问题（例如通过赋予一个唯一的标识符到所有张量的所有指数），就可以选择任何顺序来执行收缩，以及一些选择在计算中期产量不必要的巨大野兽（即使最终结果很小）。建议？

来源

2017-02-03 Ziofil

除了内存方面的考虑，我相信你可以在einsum的单个调用中完成收缩，不过你需要一些预处理。我并不完全确定“”是什么意思，因为我收缩了一对指数，结果是一个新的张量，它不属于列表L“，但我认为在单一步骤中收缩会完全解决这个问题。

我建议使用可替代的einsum数字索引语法：

einsum(op0, sublist0, op1, sublist1, ..., [sublistout])

所以你需要做的是编码指数在整数序列收缩。首先，您需要首先设置一系列唯一索引，并将其他副本保留为sublistout。然后，遍历您的连接图，您需要在必要时将合同指数设置为相同的指数，并同时从sublistout中删除合同指数。

import numpy as np 

def contract_all(tensors,conns): 
    ''' 
    Contract the tensors inside the list tensors 
    according to the connectivities in conns 

    Example input: 
    tensors = [np.random.rand(2,3),np.random.rand(3,4,5),np.random.rand(3,4)] 
    conns = [((0,1),(2,0)), ((1,1),(2,1))] 
    returned shape in this case is (2,3,5) 
    ''' 

    ndims = [t.ndim for t in tensors] 
    totdims = sum(ndims) 
    dims0 = np.arange(totdims) 
    # keep track of sublistout throughout 
    sublistout = set(dims0.tolist()) 
    # cut up the index array according to tensors 
    # (throw away empty list at the end) 
    inds = np.split(dims0,np.cumsum(ndims))[:-1] 
    # we also need to convert to a list, otherwise einsum chokes 
    inds = [ind.tolist() for ind in inds] 

    # if there were no contractions, we'd call 
    # np.einsum(*zip(tensors,inds),sublistout) 

    # instead we need to loop over the connectivity graph 
    # and manipulate the indices 
    for (m,i),(n,j) in conns: 
     # tensors[m][i] contracted with tensors[n][j] 

     # remove the old indices from sublistout which is a set 
     sublistout -= {inds[m][i],inds[n][j]} 

     # contract the indices 
     inds[n][j] = inds[m][i] 

    # zip and flatten the tensors and indices 
    args = [subarg for arg in zip(tensors,inds) for subarg in arg] 

    # assuming there are no multiple contractions, we're done here 
    return np.einsum(*args,sublistout)

一个简单的例子：

>>> tensors = [np.random.rand(2,3), np.random.rand(4,3)] 
>>> conns = [((0,1),(1,1))] 
>>> contract_all(tensors,conns) 
array([[ 1.51970003, 1.06482209, 1.61478989, 1.86329518], 
     [ 1.16334367, 0.60125945, 1.00275992, 1.43578448]]) 
>>> np.einsum('ij,kj',tensors[0],tensors[1]) 
array([[ 1.51970003, 1.06482209, 1.61478989, 1.86329518], 
     [ 1.16334367, 0.60125945, 1.00275992, 1.43578448]])

如果有多个收缩，循环物流变得更加复杂一点，因为我们需要处理所有的副本。然而，逻辑是一样的。此外，上述显然缺少检查，以确保相应的指数可以约定。

在事后看来，我意识到默认的sublistout不必指定，einsum无论如何都使用该顺序。我决定在代码中保留这个变量，因为如果我们想要一个非平凡的输出索引顺序，我们必须适当地处理这个变量，它可能会派上用场。

至于收缩顺序的优化，可以在np.einsum影响内部优化为1.12版本（由@hpaulj在现在删除的注释说明）。该版本将optimize可选关键字参数引入np.einsum，允许选择缩减顺序，以内存为代价减少计算时间。由于optimize关键字会通过'greedy'或'optimal'关键字使numpy选择一个缩减顺序，大小顺序为尺寸的大小。

可用于optimize关键字来自显然无证（只要在线文档去; help()幸运作品）的选项功能np.einsum_path：

einsum_path(subscripts, *operands, optimize='greedy') 

Evaluates the lowest cost contraction order for an einsum expression by 
considering the creation of intermediate arrays.

从np.einsum_path输出收缩路径也可以用作输入参数np.einsum。在你的问题中，你担心使用的内存太多，所以我怀疑没有优化（默认情况下运行时间更长，内存占用更小）。

来源

2017-02-04 00:07:32

@hpaulj即使在[特定版本的文档]（https://docs.scipy.org/doc/numpy-1.12.0/reference/generated/numpy.einsum.html）我只能看到提及'np.einsum_path'，但没有链接...感谢'优化'提示，我完全错过了（即使我有1.12）！ –

我通过'Ipython'''操作符读取大部分文档。 – hpaulj

@hpaulj that figures :)我只是恢复到当我知道我正在寻找的功能... –

蟒蛇中的高效张量收缩

回答

相关问题