This site has a task: count the words. A word is a contiguous sequence of English alphabetic characters (A-Z and a-z).
Here is an example:
Hello there, little user5453 374 ())$. I’d been using my sphere as a stool. Slow-moving target 839342 was hit by OMGd-63 or K4mp.
contains the "words"
['Hello', 'there', 'little', 'user', 'I', 'd', 'been', 'using', 'my', 'sphere', 'as', 'a', 'stool', 'Slow', 'moving', 'target', 'was', 'hit', 'by', 'OMGd', 'or', 'K', 'mp']
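The definition above can be checked directly against the example. This is only a minimal sketch: it assumes that "alphabetic characters" means the ASCII letters A-Z and a-z, and the variable names `sample` and `words` are illustrative, not part of the task.

```python
import re

sample = (
    "Hello there, little user5453 374 ())$. I’d been using my sphere as a stool. "
    "Slow-moving target 839342 was hit by OMGd-63 or K4mp."
)

# Every maximal run of ASCII letters is one "word"; digits, punctuation
# and whitespace all act as separators.
words = re.findall(r"[A-Za-z]+", sample)
print(words)
```

Unlike `re.sub(...)` followed by `split(' ')`, `re.findall` never yields empty strings, so no separate check for `''` is needed.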
Some words should be excluded. These words are:
"a", "the", "on", "at", "of", "upon", "in" and "as", case-insensitive.
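My solution is as follows.
I split the text into words by replacing everything that is not a letter with a space. Then, in a loop, I check whether each word is excluded or is an empty string; if it is neither, I increment the counter.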
import codewars_test as test
import re

except_word = ["a", "the", "on", "at", "of", "upon", "in", "as"]


def word_count(s):
    # Replace every run of non-letters with a space, then split on spaces.
    all_word = re.sub(r'([^A-Za-z]+)', r' ', s).split(' ')
    print(all_word)
    cnt = 0
    for word in all_word:
        # Skip excluded words and the empty strings produced by split(' ').
        if word in except_word or word == '':
            continue
        else:
            cnt += 1
    return cnt


if __name__ == '__main__':
    test.assert_equals(word_count("hello there"), 2)
    test.assert_equals(word_count("hello there and a hi"), 4)
    test.assert_equals(word_count("I'd like to say goodbye"), 6)
    test.assert_equals(word_count("Slow-moving user6463 has been here"), 6)
    test.assert_equals(word_count("%^&abc!@# wer45tre"), 3)
    test.assert_equals(word_count("abc123abc123abc"), 3)
    test.assert_equals(word_count("Really2374239847 long ^&#$&(*@# sequence"), 3)
    long_text = r"""
I’d been using my sphere as a stool. I traced counterclockwise circles on it with my fingertips and it shrank until I could palm it. My bolt had shifted while I’d been sitting. I pulled it up and yanked the pleats straight as I careered around tables, chairs, globes, and slow-moving fraas. I passed under a stone arch into the Scriptorium. The place smelled richly of ink. Maybe it was because an ancient fraa and his two fids were copying out books there. But I wondered how long it would take to stop smelling that way if no one ever used it at all; a lot of ink had been spent there, and the wet smell of it must be deep into everything.
"""
    test.assert_equals(word_count(long_text), 112)
Basic tests:
test.assert_equals(word_count("hello there"), 2)
test.assert_equals(word_count("hello there and a hi"), 4)
test.assert_equals(word_count("I'd like to say goodbye"), 6)
test.assert_equals(word_count("Slow-moving user6463 has been here"), 6)
test.assert_equals(word_count("%^&abc!@# wer45tre"), 3)
test.assert_equals(word_count("abc123abc123abc"), 3)
test.assert_equals(word_count("Really2374239847 long ^&#$&(*@# sequence"), 3)
The function passes all the basic tests but does not produce the correct answer on a long text.
Here is a basic example:
Example Input 2
I’d been using my sphere as a stool. I traced counterclockwise circles on it with my fingertips and it shrank until I could palm it. My bolt had shifted while I’d been sitting. I pulled it up and yanked the pleats straight as I careered around tables, chairs, globes, and slow-moving fraas. I passed under a stone arch into the Scriptorium. The place smelled richly of ink. Maybe it was because an ancient fraa and his two fids were copying out books there. But I wondered how long it would take to stop smelling that way if no one ever used it at all; a lot of ink had been spent there, and the wet smell of it must be deep into everything.
Example Output 2
112
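My function returns 113. Perhaps I have misunderstood the task (I read it through Google Translate)? In the random tests on the server, my solution always produces 1 to 3 extra words.
Please explain what causes this. It is as if I forgot to remove something, but I cannot tell what.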
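One way to look for the leftover words is to list every word that an exact-match test keeps even though its lowercase form is on the exclusion list. This is a diagnostic sketch only; it assumes the exclusion list is meant to apply regardless of case, and the names `EXCLUDED`, `text` and `slipped_through` are illustrative. The sample is a two-sentence excerpt of the long text.

```python
import re

EXCLUDED = ["a", "the", "on", "at", "of", "upon", "in", "as"]

text = (
    "I passed under a stone arch into the Scriptorium. "
    "The place smelled richly of ink."
)

words = re.findall(r"[A-Za-z]+", text)

# Words that an exact-match test keeps, although their lowercase form
# is on the exclusion list.
slipped_through = [w for w in words if w not in EXCLUDED and w.lower() in EXCLUDED]
print(slipped_through)  # ['The']
```

On this excerpt the only such word is the capitalized "The" at the start of a sentence, which would account for one extra word in the long text.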
The task says: case-insensitive.
But a word can start with a capital letter, so, for example, "and" and "And" are different words.
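If "case-insensitive" refers to the exclusion list, then "The" and "AS" should be dropped just like "the" and "as". A minimal sketch under that assumption follows; the names `word_count_ci` and `EXCLUDED` are hypothetical and not taken from the kata, and the only real change from the code above is lowercasing each word before the membership test.

```python
import re

# Same exclusion list as in the question, stored as a set for fast lookup.
EXCLUDED = {"a", "the", "on", "at", "of", "upon", "in", "as"}


def word_count_ci(s):
    """Count letter-only words, skipping excluded words regardless of case."""
    words = re.findall(r"[A-Za-z]+", s)
    # Lowercase each word only for the comparison with the exclusion list.
    return sum(1 for w in words if w.lower() not in EXCLUDED)


print(word_count_ci("The place smelled richly of ink."))  # 4
print(word_count_ci("hello there and a hi"))  # 4
```

With the exact-match test `word in except_word`, the first call would give 5, because "The" is not equal to "the".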