Finding the average word length in a string

def word_count(x: str) -> None:
    char = len(x)                  # counts every character, including spaces and punctuation
    words = x.split()              # whitespace-separated tokens
    word = len(words)
    avg = sum(len(w) for w in words) / word
    print('Characters: ' + str(char) + '\n' + 'Words: ' + str(word) + '\n' + 'Avg word length: ' + str(avg) + '\n')

This code works fine for ordinary strings, but not for a string like:

'***The ?! quick brown cat: leaps over the sad boy.'

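How can I modify the code so that characters like "***" and "?!" are not counted? The average word length for the sentence above should be 3.888889, but my code gives me a different number.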
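To see where the different number comes from, here is a quick sketch (using the example sentence from the question) that lists what `split()` actually returns: punctuation stays attached to the tokens, and "?!" counts as a word of its own.

```python
# Show the raw tokens produced by str.split() for the example sentence
phrase = '***The ?! quick brown cat: leaps over the sad boy.'
tokens = phrase.split()
lengths = [len(t) for t in tokens]
print(tokens)
print(lengths)                       # [6, 2, 5, 5, 4, 5, 4, 3, 3, 4]
print(sum(lengths) / len(tokens))    # 4.1 (41 characters over 10 tokens)
```

With the punctuation removed, the nine remaining words (The, quick, brown, cat, leaps, over, the, sad, boy) contain 34 letters, so the cleaned average is 34 / 9, about 3.78, which is also what the answers below arrive at.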

Source: 2015-10-31 Ramon Hallan

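You have to pin down more precisely what you want to filter out. But the basic idea is to remove the rejected "words" from x.split() and work with that reduced list. –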
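A minimal sketch of this comment's idea, assuming a "rejected word" means a token that contains no letters or digits at all. The helper name `keep_real_words` is hypothetical. On its own this only drops "?!"; "***The", "cat:" and "boy." still carry their punctuation, which is the separate problem raised in the next comment.

```python
from typing import List

def keep_real_words(text: str) -> List[str]:
    """Hypothetical helper: drop tokens that contain no alphanumeric character."""
    return [tok for tok in text.split() if any(ch.isalnum() for ch in tok)]

words = keep_real_words('***The ?! quick brown cat: leaps over the sad boy.')
print(words)       # '?!' is gone, but '***The', 'cat:' and 'boy.' keep their punctuation
print(len(words))  # 9
```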

If the problem is removing unwanted characters from inside certain words, you have to spell that out. –

Using 're' to filter out what you don't want to include (double spaces, special characters, etc.) would be a relatively simple way to do this. –

Try this:

import re 

def avrg_count(x):
    # Count only letters and digits; everything else, including spaces, is removed
    total_chars = len(re.sub(r'[^a-zA-Z0-9]', '', x))
    # Keep spaces this time so the cleaned text can still be split into words
    num_words = len(re.sub(r'[^a-zA-Z0-9 ]', '', x).split())
    print("Characters:{0}\nWords:{1}\nAverage word length: {2}".format(total_chars, num_words, total_chars / num_words))


phrase = '***The ?! quick brown cat: leaps over the sad boy.'

avrg_count(phrase)

Output:

Characters:34
Words:9
Average word length: 3.7777777777777777
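If you want to reuse the numbers rather than only print them, one option is to return them. This is a sketch assuming the same regular expressions as avrg_count above; the name `avrg_stats` and the choice to return 0.0 for text with no words (instead of raising ZeroDivisionError) are assumptions, not part of the answer.

```python
import re

def avrg_stats(x):
    """Hypothetical variant of avrg_count that returns (chars, words, average)."""
    total_chars = len(re.sub(r'[^a-zA-Z0-9]', '', x))
    num_words = len(re.sub(r'[^a-zA-Z0-9 ]', '', x).split())
    # Guard against input such as '?!' that contains no words at all
    average = total_chars / num_words if num_words else 0.0
    return total_chars, num_words, average

chars, words, avg = avrg_stats('***The ?! quick brown cat: leaps over the sad boy.')
print(chars, words, round(avg, 6))  # 34 9 3.777778
print(avrg_stats('?!'))             # (0, 0, 0.0)
```

One caveat inherited from the original pattern: `[^a-zA-Z0-9 ]` keeps only the space character, so words separated by tabs or newlines get glued together; putting `\s` in the character class instead of a literal space would avoid that.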

Source: 2015-10-31 02:45:28 flamenco

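You should be able to strip all non-alphanumeric characters from each word, and then use the word only if its length is still greater than 0. The first solution I found was a regex one, but you may be able to find other ways to do it.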

Stripping everything but alphanumeric chars from a string in Python
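A regex-free sketch of this per-word approach, assuming "alphanumeric" means whatever `str.isalnum()` accepts (which also includes non-ASCII letters such as "é"). The function name `average_word_length` and the 0.0 result for empty input are hypothetical choices.

```python
def average_word_length(text: str) -> float:
    """Hypothetical helper: average word length after stripping non-alphanumerics."""
    cleaned = []
    for token in text.split():
        # Keep only the letters and digits of each token
        word = ''.join(ch for ch in token if ch.isalnum())
        # Use the word only if something is left after stripping
        if len(word) > 0:
            cleaned.append(word)
    if not cleaned:
        return 0.0  # assumption: no words means an average of 0
    return sum(len(w) for w in cleaned) / len(cleaned)

print(average_word_length('***The ?! quick brown cat: leaps over the sad boy.'))
```

Because characters are stripped from inside tokens too, a contraction such as "don't" is counted as the 4-letter word "dont".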

Source: 2015-10-31 01:15:56

import re 

full_sent = '***The ?! quick brown cat: leaps over the sad boy.' 
alpha_sent = re.findall(r'\w+',full_sent) 
print(alpha_sent)

This will output:

['The', 'quick', 'brown', 'cat', 'leaps', 'over', 'the', 'sad', 'boy']

To get the average, you can do:

average = sum(len(word) for word in alpha_sent)/len(alpha_sent)

which gives 3.777... (34 characters over 9 words).
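One detail worth knowing about `\w`: for str patterns in Python 3 it matches Unicode letters, digits and the underscore, so "snake_case" stays one word and "42" counts as a word. If only letters should count, a common idiom is the class `[^\W\d_]` (word characters minus digits and underscore). A small comparison, using an illustrative input string that is not from the question:

```python
import re

sample = 'snake_case 42 words'
print(re.findall(r'\w+', sample))        # ['snake_case', '42', 'words']
print(re.findall(r'[^\W\d_]+', sample))  # ['snake', 'case', 'words']

# On the sentence from the question both patterns give the same nine words
full_sent = '***The ?! quick brown cat: leaps over the sad boy.'
print(re.findall(r'[^\W\d_]+', full_sent) == re.findall(r'\w+', full_sent))  # True
```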

Source: 2015-10-31 02:49:44 Leb

I'm having trouble fitting this into my function. Would you mind plugging it into my code above in a simple way? –

If you're talking about the other prints, you don't need to merge anything: 'word' would be 'len(alpha_sent)' and 'char' would be 'sum(len(word) for word in alpha_sent)'. – Leb
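Putting Leb's suggestion into the question's function might look like the sketch below. It keeps the question's print format and variable names (char, word, avg); also returning the three values is an added assumption so the result can be reused. It assumes the input contains at least one word.

```python
import re

def word_count(x: str):
    alpha_sent = re.findall(r'\w+', x)       # words with surrounding punctuation removed
    word = len(alpha_sent)                   # number of words
    char = sum(len(w) for w in alpha_sent)   # letters/digits inside those words only
    avg = char / word                        # assumes x contains at least one word
    print('Characters: ' + str(char) + '\n' + 'Words: ' + str(word) + '\n' + 'Avg word length: ' + str(avg) + '\n')
    return char, word, avg

word_count('***The ?! quick brown cat: leaps over the sad boy.')
```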

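Strings have a .translate() method that you can use for this (if you know all the characters you want to remove). In Python 3 the deletion table is built with str.maketrans (the Python 2 form was .translate(None, "*?!")):

>>> "***foo ?! bar".translate(str.maketrans("", "", "*?!"))
'foo  bar'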
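A sketch that extends the translate idea to the whole problem, assuming every character to delete is ASCII punctuation (`string.punctuation`, which contains *, ?, !, : and . among others). The names `DELETE_PUNCT` and `avg_word_length` are hypothetical. This also removes apostrophes and hyphens inside words, so "don't" would count as 4 letters.

```python
import string

# Deletion table: map every ASCII punctuation character to None
DELETE_PUNCT = str.maketrans('', '', string.punctuation)

def avg_word_length(text: str) -> float:
    # split() ignores the extra spaces left behind where '?!' used to be
    words = text.translate(DELETE_PUNCT).split()
    return sum(len(w) for w in words) / len(words) if words else 0.0

phrase = '***The ?! quick brown cat: leaps over the sad boy.'
print(phrase.translate(DELETE_PUNCT).split())
print(avg_word_length(phrase))
```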

Source: 2015-10-31 02:57:12 thebjorn
