迷惑分区域列表中的覆盖区域

我有一个很长的值列表x和y，按x值排序。我想输出x和y值最长连续跨度的列表。这是有点难以付诸话但将有希望成为用下面的例子清楚：作为5768和6000之间的区域没有被任何条目的覆盖，上述应迷惑分区域列表中的覆盖区域

输出：

0, 5768 
6000, 7262

在我看来，这应该是一个简单的问题，但我一直没有解决方案的工作一段时间。我已经在下面发布了我的代码。我目前的努力存在的问题是，虽然对x值进行排序，但可能是第k行的x值超过第k-1行的y值，但不标记新连续字符串的开始。

lines = [line.strip('\n') for line in open('test')] 
myarray=[] 
for line in lines: 
    myarray.append(line.split(', ')) 

def findCoveredRegions(regionArray): 
    resultsContigs = [] 
    j = regionArray[0][1] 
    i = regionArray[0][0] 
    for line in regionArray: 
     last_i = i 
     i = line[0] 
     if i <= j: 
      if line[1] > j: 
       j = line[1] 
     else: 
      resultsContigs.append([last_i,j]) 
    resultsContigs.append([i,regionArray[len(regionArray)-1][1]]) 

    return resultsContigs 

print findCoveredRegions(myarray)

来源

2017-03-10 Sigurgeir

对不起，我不明白这个问题，即使是这个例子。如果你发现很难用文字表达，那么你几乎可以肯定地发现很难将其写入代码。也许首先就是这样做的。 – Denziloe

您能否详细说明您的意思：1）“x和y值的最长连续跨度”2）“5768和6000之间的区域未被覆盖” –

想象一下，您将连续序列中的0到7262之间的所有数字。我们可以将我的例子中的每一行看作0到148,0到145等所有数字的字符串。我想要生成的是0到7262之间的区域列表，其数目至少会出现一次，知道有些数字根本不会出现。 5768和6000之间的数字不是任何子序列的一部分，但0和5768以及6000和7262之间的所有数字至少被其中一个区域“覆盖”。这有道理吗？ – Sigurgeir

这里是一个numpy的解决方案

myarray = np.asanyarray(myarray) 
order = np.argsort(myarray.ravel()) 
coverage = np.add.accumulate(1 - 2*(order%2)) 
gaps = np.where(coverage==0)[0] 
left = order[np.r_[0, gaps[:-1] + 1]] 
right = order[gaps] 
result = myarray.ravel()[np.c_[left, right]]

它池和排序的所有边界。然后从左到右计算它遇到的左（+1）和右（-1）边界的数量。这个数字永远不会是负数，只有在存在差距时才会降到零。从间隙位置重建覆盖间隔。

来源

2017-03-10 21:43:28

这似乎并没有给我提供的测试用例提供正确的解决方案。我得到以下输出： [[ '0' '2374'] [ '2376' '2415'] [ '3000' '4315'] [ '5768' '600'] [ '6000'“7262 ']] – Sigurgeir

事实上，它会检查_adjacent_间隔，但它必须检查上次覆盖的间隔，其中可能包含完全跟随它的一些间隔。 – 9000

@Sigurgeir我确实得到了正确的结果，只要我将字符串转换为整数。它不适合你，因为'600'>'5768'是字符串等，当然，它完全跳出算法。 –

这不会特别快，但我认为这是相当Pythonic和可读性。它不需要或使用排序的间隔列表。

intervals = [(0, 148), 
(0, 145), 
(0, 186), 
(0, 5768), 
(600, 2374), 
(2376, 2415), 
(3000, 4315), 
(6000, 6616), 
(6000, 6799), 
(6000, 7262)] 


def intersect(interval_a, interval_b): 
    """Return whether two intervals intersect""" 
    (a_bottom, a_top), (b_bottom, b_top) = interval_a, interval_b 
    return a_bottom <= b_top and b_bottom <= a_top 


def union_one_one(interval_a, interval_b): 
    """Return the union of two intervals""" 
    (a_bottom, a_top), (b_bottom, b_top) = interval_a, interval_b 
    return min(a_bottom, b_bottom), max(a_top, b_top) 


def union_many_one(old_intervals, new_interval): 
    """Return the union of a new interval with several old intervals.""" 
    result = [] 
    for old_interval in old_intervals: 
     # If an old interval intersects with the new interval, merge the old interval into the new one. 
     if intersect(old_interval, new_interval): 
      new_interval = union_one_one(old_interval, new_interval)  
     # Otherwise, leave the old interval alone. 
     else: 
      result.append(old_interval) 
    result.append(new_interval) 
    return result 


def union_all(intervals): 
    """Return the union of a collection of intervals""" 
    result = [] 
    for interval in intervals: 
     result = union_many_one(result, interval) 
    return result    


print(union_all(intervals))

来源

2017-03-10 23:49:32 Denziloe

不错。当然，最坏的情况是O（n^2），但在某些情况下（大量重叠），它甚至可能比排序解决方案更快。 –

迷惑分区域列表中的覆盖区域

回答

相关问题