2013-05-20 56 views
-1

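Splitting a string into different line lengths

I'm trying to split a variable-length string across different but predefined line lengths. I threw together the code below, and when I put it into Python Tutor (I don't have access to a proper Python IDE right now) it fails with KeyError: 6. I guess this means my while loop isn't working properly and keeps incrementing lineNum, but I'm not sure why. Is there a better way to do this? Or is this easily fixed?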

Code:

import re 

#Dictionary containing the line number as key and the max line length 
lineLengths = { 
     1:9, 
     2:11, 
     3:12, 
     4:14, 
     5:14 
       } 

inputStr = "THIS IS A LONG DESC 7X7 NEEDS SPLITTING"  #Test string, should be split on the spaces and around the "X" 

splitted = re.split("(?:\s|((?<=\d)X(?=\d)))",inputStr)  #splits inputStr on white space and where X is surrounded by numbers eg. dimensions 

lineNum = 1       #initialises the line number at 1 

lineStr1 = ""       #initialises each line as a string 
lineStr2 = "" 
lineStr3 = "" 
lineStr4 = "" 
lineStr5 = "" 

#Dictionary creating dynamic line variables 
lineNumDict = { 
     1:lineStr1, 
     2:lineStr2, 
     3:lineStr3, 
     4:lineStr4, 
     5:lineStr5 
     } 

if len(inputStr) > 40: 
    print "The short description is longer than 40 characters" 
else: 
    while lineNum <= 5: 
     for word in splitted: 
      if word != None: 
       if len(lineNumDict[lineNum]+word) <= lineLengths[lineNum]: 
        lineNumDict[lineNum] += word 
       else: 
        lineNum += 1 
      else: 
       if len(lineNumDict[lineNum])+1 <= lineLengths[lineNum]: 
        lineNumDict[lineNum] += " " 
       else: 
        lineNum += 1 

lineOut1 = lineStr1.strip() 
lineOut2 = lineStr2.strip() 
lineOut3 = lineStr3.strip() 
lineOut4 = lineStr4.strip() 
lineOut5 = lineStr5.strip() 

I have taken a look at this answer, but I don't have any real understanding of C#: Split large text string into variable length strings without breaking words and keeping linebreaks and spaces

+0

What should the output be for the given example input? –

+0

In this case I should be getting: "THIS IS A" "LONG DESC 7" "X7 NEEDS" "SPLITTING" – ydaetskcoR

+0

Is splitting '7X7' a hard requirement? If you only split on word boundaries, you could get away with a much simpler expression. –

Answers

1

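It doesn't work because your "for word in splitted" loop sits inside the loop with the line-number condition, so that condition is only re-checked after the for loop has run through every word. You have to do this: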

if len(inputStr) > 40:
    print "The short description is longer than 40 characters"
else:
    for word in splitted:
        if lineNum > 5:    # stop before lineNum can index past the last line
            break
        if word != None:
            if len(lineNumDict[lineNum]+word) <= lineLengths[lineNum]:
                lineNumDict[lineNum] += word
            else:
                lineNum += 1
        else:
            if len(lineNumDict[lineNum])+1 <= lineLengths[lineNum]:
                lineNumDict[lineNum] += " "
            else:
                lineNum += 1

Also, lineStr1, lineStr2, etc. will not change; you have to read the results from the dictionary directly (strings are immutable). I tried it and got it working:

print("Lines: %s" % lineNumDict) 

which gives:

Lines: {1: 'THIS IS A', 2: 'LONG DESC 7', 3: '7 NEEDS ', 4: '', 5: ''} 
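The point about immutable strings can be checked in isolation. The following is a minimal sketch; the names line_str1 and lines are made up for the illustration and only stand in for lineStr1 and lineNumDict from the question.

```python
# "+=" on a dict value builds a new string and stores it under the key;
# the variable that supplied the initial value is left untouched.
line_str1 = ""
lines = {1: line_str1}

lines[1] += "THIS"

print(repr(line_str1))  # the original name still refers to the empty string
print(repr(lines[1]))   # the dictionary holds the updated text
```

This is why stripping lineStr1 to lineStr5 after the loop yields empty strings, while the dictionary entries hold the text.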
+0

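That works well, but it seems to drop the "X" and "SPLITTING". I have changed the 'else' part of the nested 'if' so that it then tries to add the word to the next line, and to 'print' that the word needs splitting if the word is too long for it, and that seems to work perfectly. Thanks for your help – ydaetskcoR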

+0

Yes, the else case just drops the word, so that is no surprise. I actually only looked at the loop. – Chris
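The change described in the comments (carry a word that does not fit over to the next line instead of dropping it) might look roughly like the sketch below. It is not the asker's actual code: the function name pack_tokens, the list-based line_lengths argument, and raising ValueError instead of printing are all assumptions made for the illustration. Only the split expression is taken from the question.

```python
import re

def pack_tokens(text, line_lengths):
    """Greedily pack tokens into lines; a token that does not fit moves to the next line."""
    # Same split as in the question: whitespace yields None, an X between digits yields "X".
    tokens = re.split(r"(?:\s|((?<=\d)X(?=\d)))", text.strip())
    lines = [""]
    for token in tokens:
        piece = " " if token is None else token
        candidate = lines[-1] + piece
        # Trailing spaces are stripped later, so they do not count towards the limit.
        if len(candidate.rstrip()) <= line_lengths[len(lines) - 1]:
            lines[-1] = candidate
            continue
        # The piece does not fit: move on to the next line, if there is one.
        if len(lines) == len(line_lengths):
            raise ValueError("text does not fit in the available lines")
        lines[-1] = lines[-1].rstrip()
        if len(piece) > line_lengths[len(lines)]:
            raise ValueError("token %r needs splitting" % piece)
        lines.append(piece)
    lines[-1] = lines[-1].rstrip()
    return lines

print(pack_tokens("THIS IS A LONG DESC 7X7 NEEDS SPLITTING", [9, 11, 12, 14, 14]))
# ['THIS IS A', 'LONG DESC 7', 'X7 NEEDS', 'SPLITTING']
```

Returning a list also sidesteps the lineStr1 to lineStr5 variables, which never receive the text anyway.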

0
for word in splitted: 
    ... 
    lineNum += 1 

The code can increment lineNum once for every word in splitted, i.e. up to 16 times.

+0

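I'm not sure I understood you correctly, but I only want it to increment 'lineNum' (and so move on to the next line) if adding a word would exceed the 'lineLength' limit. Unless I'm missing something, the 'if' block should take care of that? – ydaetskcoR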

+0

Yes, but nothing in your code prevents 'lineNum' from exceeding 5 while the for loop is running. – xuanji

+0

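I'm still not following you. lineNum should only increase if the word cannot fit on the current line. Looking at it now, though, I don't think it adds that word to the next line; it skips ahead to the next word instead. That needs changing as well. – ydaetskcoR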
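The misunderstanding in this thread comes down to when the while condition is evaluated. A stripped-down sketch, using a hypothetical token list in which every token forces a line break:

```python
line_num = 1
tokens = ["a"] * 8          # hypothetical: eight tokens, each one forcing a line break

while line_num <= 5:        # evaluated once before the for loop starts ...
    for _ in tokens:
        line_num += 1       # ... but never between iterations of the for loop

print(line_num)             # 9: the counter went well past 5 before the check ran again
```

In the question's code, the lookup lineLengths[lineNum] sits inside the for loop, so it raises KeyError: 6 as soon as the counter passes 5, long before the while condition is tested again.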

0

I wonder whether a properly commented regular expression wouldn't be easier to understand?

import re

lineLengths = {1:9,2:11,3:12,4:14,5:14}
inputStr = "THIS IS A LONG DESC 7X7 NEEDS SPLITTING"

# raw string, so the regex escapes reach the re module unchanged
pat = r"""
(?:                     # non-capturing group around the line as we want to drop leading spaces
    \s*                 # drop leading spaces
    (.{{1,{max_len}}})  # up to max_len characters, filled in through 'format'
    (?=[\b\sX]|$)       # whitespace, X and string ending act as terminators
                        # (inside [...] \b means backspace, not a word boundary),
                        # matched in a lookahead as we need X to go into the next match
)?                      # ignore missing matches if not all lines are necessary
"""

# build a pattern matching up to 5 lines with the corresponding max lengths,
# taking the keys in sorted order rather than relying on dict ordering
pattern = ''.join(pat.format(max_len=lineLengths[k]) for k in sorted(lineLengths))

re.match(pattern, inputStr, re.VERBOSE).groups()
# Out: ('THIS IS A', 'LONG DESC 7', 'X7 NEEDS', 'SPLITTING', None)

Also, there is no real point in using a dict for line_lengths; a list would do just as well.
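As a rough sketch of that suggestion, the same pattern can be built from a plain list. The function name split_lines and the leftover check are additions for the illustration, not part of the answer. Note that, like the expression above, the lookahead treats every X as a possible break point, not only an X between digits.

```python
import re

# One optional group per output line; doubled braces survive str.format as literal braces.
LINE_TEMPLATE = r"(?:\s*(.{{1,{max_len}}})(?=[\sX]|$))?"

def split_lines(text, line_lengths):
    """Split text into at most len(line_lengths) lines with the given maximum lengths."""
    pattern = "".join(LINE_TEMPLATE.format(max_len=n) for n in line_lengths)
    match = re.match(pattern, text)   # always matches, since every group is optional
    if text[match.end():].strip():
        raise ValueError("text does not fit in the available lines")
    return [group for group in match.groups() if group is not None]

print(split_lines("THIS IS A LONG DESC 7X7 NEEDS SPLITTING", [9, 11, 12, 14, 14]))
# ['THIS IS A', 'LONG DESC 7', 'X7 NEEDS', 'SPLITTING']
```

A list keeps the line order explicit, so the pattern no longer depends on the iteration order of a dict.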
