Splitting a string into lines of different lengths

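I'm trying to split a variable-length string across different but predefined line lengths. I've thrown together the code below, which fails with KeyError: 6 when I paste it into Python Tutor (I don't have access to a proper Python IDE right now). I guess this means my while loop isn't working properly and keeps incrementing lineNum, but I'm not quite sure why. Is there a better way of doing this, or is this easily fixable?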

Code:

import re 

#Dictionary containing the line number as key and the max line length 
lineLengths = { 
     1:9, 
     2:11, 
     3:12, 
     4:14, 
     5:14 
       } 

inputStr = "THIS IS A LONG DESC 7X7 NEEDS SPLITTING"  #Test string, should be split on the spaces and around the "X" 

splitted = re.split("(?:\s|((?<=\d)X(?=\d)))",inputStr)  #splits inputStr on white space and where X is surrounded by numbers eg. dimensions 

lineNum = 1       #initialises the line number at 1 

lineStr1 = ""       #initialises each line as a string 
lineStr2 = "" 
lineStr3 = "" 
lineStr4 = "" 
lineStr5 = "" 

#Dictionary creating dynamic line variables 
lineNumDict = { 
     1:lineStr1, 
     2:lineStr2, 
     3:lineStr3, 
     4:lineStr4, 
     5:lineStr5 
     } 

if len(inputStr) > 40: 
    print "The short description is longer than 40 characters" 
else: 
    while lineNum <= 5: 
     for word in splitted: 
      if word != None: 
       if len(lineNumDict[lineNum]+word) <= lineLengths[lineNum]: 
        lineNumDict[lineNum] += word 
       else: 
        lineNum += 1 
      else: 
       if len(lineNumDict[lineNum])+1 <= lineLengths[lineNum]: 
        lineNumDict[lineNum] += " " 
       else: 
        lineNum += 1 

lineOut1 = lineStr1.strip() 
lineOut2 = lineStr2.strip() 
lineOut3 = lineStr3.strip() 
lineOut4 = lineStr4.strip() 
lineOut5 = lineStr5.strip()

I've had a look at this answer, but I don't have any real understanding of C#: Split large text string into variable length strings without breaking words and keeping linebreaks and spaces

2013-05-20 ydaetskcoR

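What should the output be for the given example input?

In this case I should get: "THIS IS A" "LONG DESC 7" "X7 NEEDS" "SPLITTING" – ydaetskcoR

Is splitting '7X7' a hard requirement? If you only split on word boundaries, you could get away with a much simpler expression.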

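It doesn't work because the for loop over the words in splitted sits inside the while loop that carries the lineNum condition. That condition is only re-checked once the for loop has finished, so lineNum can run past 5 while the words are still being processed, which is where the KeyError: 6 comes from. You have to do this: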

if len(inputStr) > 40:
    print("The short description is longer than 40 characters")
else:
    for word in splitted:
        if lineNum > 5:
            # stop before looking up a line that does not exist
            break
        if word is not None:
            if len(lineNumDict[lineNum] + word) <= lineLengths[lineNum]:
                lineNumDict[lineNum] += word
            else:
                # word does not fit: move to the next line (the word itself is dropped)
                lineNum += 1
        else:
            # None marks a whitespace separator
            if len(lineNumDict[lineNum]) + 1 <= lineLengths[lineNum]:
                lineNumDict[lineNum] += " "
            else:
                lineNum += 1

Also, lineStr1, lineStr2 and so on will not change; you have to access the dictionary directly (strings are immutable). I tried it, and it gives a working result:

print("Lines: %s" % lineNumDict)

Gives:

Lines: {1: 'THIS IS A', 2: 'LONG DESC 7', 3: '7 NEEDS ', 4: '', 5: ''}
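To illustrate the point about immutability, here is a minimal sketch. The names line_str1 and line_num_dict are made up for the example and only mirror the question's setup.

```python
# Hypothetical names mirroring lineStr1 / lineNumDict from the question.
line_str1 = ""
line_num_dict = {1: line_str1}

# += on a str builds a new string and rebinds the dict entry only;
# the original variable still refers to the old (empty) string.
line_num_dict[1] += "THIS IS A"

print(repr(line_str1))         # ''
print(repr(line_num_dict[1]))  # 'THIS IS A'
```

This is why the final lineOut values have to be read from the dictionary rather than from the lineStr variables.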

2013-05-20 10:49:07 Chris

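That works well, but it seems to drop the "X" and "SPLITTING". I've changed the else part of the nested if so that it then tries to add the word to the next line and, if the word is too long for that line, prints that the word needs splitting, and that seems to work perfectly. Thanks for your help – ydaetskcoR

Yes, the else case just drops the word, so that's no surprise. Actually, I only looked at the loop. – Chris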
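The change described in these comments might look roughly like the sketch below. It is not the asker's actual code: the function name split_into_lines, the list of limits and the choice to raise ValueError instead of printing are all assumptions made for the example.

```python
import re

def split_into_lines(text, max_lengths):
    """Greedily fill lines; a token that does not fit is carried to the next line."""
    # Same split as in the question: words, None for whitespace, "X" between digits.
    tokens = re.split(r"(?:\s|((?<=\d)X(?=\d)))", text)
    lines = [""]
    need_space = False
    for token in tokens:
        if token is None:      # a whitespace separator was matched
            need_space = True
            continue
        if token == "":        # produced by consecutive separators
            continue
        sep = " " if need_space and lines[-1] else ""
        need_space = False
        if len(lines[-1]) + len(sep) + len(token) <= max_lengths[len(lines) - 1]:
            lines[-1] += sep + token
            continue
        # The token does not fit: start the next line with it instead of dropping it.
        if len(lines) == len(max_lengths):
            raise ValueError("ran out of lines at %r" % token)
        if len(token) > max_lengths[len(lines)]:
            raise ValueError("%r needs splitting" % token)
        lines.append(token)
    return lines

print(split_into_lines("THIS IS A LONG DESC 7X7 NEEDS SPLITTING", [9, 11, 12, 14, 14]))
# ['THIS IS A', 'LONG DESC 7', 'X7 NEEDS', 'SPLITTING']
```

Unused trailing lines are simply left out of the result, and the space that would have ended a line is never added, so no strip() is needed afterwards.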

for word in splitted: 
    ... 
    lineNum += 1

This code can increment lineNum once for every element of splitted, i.e. up to 17 times for this input.
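For reference, this is what the loop actually iterates over. A small sketch using the pattern from the question; only the snake_case variable names are mine.

```python
import re

input_str = "THIS IS A LONG DESC 7X7 NEEDS SPLITTING"

# Because the pattern contains a capturing group, re.split also returns the
# group's value for every separator: None for whitespace, "X" for the X.
splitted = re.split(r"(?:\s|((?<=\d)X(?=\d)))", input_str)

print(splitted)
print(len(splitted))  # 17
```

The None entries are the reason the question's loop has a separate branch for word != None.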

2013-05-20 10:39:02 xuanji

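I'm not sure I've understood you correctly, but I only want it to increment lineNum (and so move on to the next line) if adding a word would exceed the lineLengths limit. Unless I'm missing something, the if block should take care of that? – ydaetskcoR

Yes, but your code has nothing that stops lineNum from going above 5. – xuanji

I'm still not following you. lineNum should only increase if the word can't fit on the current line. Looking at it now, though, I don't think it adds that word to the next line; it skips to the next word instead. That needs changing too. – ydaetskcoR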
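The point under discussion can be reproduced in isolation. This is a minimal sketch with made-up, shortened data: limits, line_num and the word list are hypothetical stand-ins for the question's variables.

```python
# Hypothetical, shortened stand-in for lineLengths.
limits = {1: 9, 2: 11}
line_num = 1
failed_key = None

try:
    while line_num <= 2:  # checked only between passes of the for loop
        for word in ["AAAAAAAAAAAA", "BBBBBBBBBBBB", "CCCCCCCCCCCC"]:
            if len(word) > limits[line_num]:  # lookup happens for every word
                line_num += 1                 # nothing stops this at 2
except KeyError as err:
    failed_key = err.args[0]

print(failed_key)  # 3
```

The counter overshoots inside the for loop before the while condition gets another chance to run, which is the same mechanism that produces KeyError: 6 in the question.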

I wonder whether a properly commented regular expression wouldn't be easier to understand?

import re

lineLengths = {1:9,2:11,3:12,4:14,5:14}
inputStr = "THIS IS A LONG DESC 7X7 NEEDS SPLITTING"

# raw string, so that \s reaches the regex engine unchanged
pat = r"""
(?:                    # non-capturing group around the line, as we want to drop leading spaces
    \s*                # drop leading spaces
    (.{{1,{max_len}}}) # up to max_len characters, filled in through 'format'
    (?=[\sX]|$)        # whitespace, X and end of string act as terminators,
                       # but without consuming, as we need X to go into the next match
)?                     # and ignore missing matches if not all lines are needed
"""

# build a pattern matching up to 5 lines with the corresponding max lengths
pattern = ''.join(pat.format(max_len=x) for x in lineLengths.values())

re.match(pattern, inputStr, re.VERBOSE).groups()
# Out: ('THIS IS A', 'LONG DESC 7', 'X7 NEEDS', 'SPLITTING', None)

Also, there is no real point in using a dict for the line lengths; a list would do just as well.
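A list-based variant could look like the sketch below. It is an adaptation of the pattern above rather than part of the original answer: the names line_lengths, piece and lines are mine, and the lookahead is tightened so that X only ends a line when it sits between two digits.

```python
import re

line_lengths = [9, 11, 12, 14, 14]
input_str = "THIS IS A LONG DESC 7X7 NEEDS SPLITTING"

# One optional group per line; the doubled braces survive str.format.
# A line may end before whitespace, before an X between two digits, or at the end.
piece = r"(?:\s*(.{{1,{n}}})(?=\s|(?<=\d)X(?=\d)|$))?"
pattern = "".join(piece.format(n=n) for n in line_lengths)

match = re.match(pattern, input_str)
lines = [g for g in match.groups() if g is not None]

print(lines)
# ['THIS IS A', 'LONG DESC 7', 'X7 NEEDS', 'SPLITTING']

# If the text did not fit into the available lines, the tail is left unmatched.
print(match.end() == len(input_str))  # True
```

Comparing match.end() with the length of the input is one way to notice that the description was too long for the five lines, since the pattern itself never fails.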

2013-05-20 11:59:13
