Hello, I am new to this forum and need some help. I have written Python code that counts word frequencies from a file into a dictionary. However, for some reason it seems to count every word more often than it appears in the file.
For example, it reports “creating”: 4, but its frequency in the file is 3. Below is my code. I would appreciate it if someone could help me point out the possible error.
def word_frequencies(filename="src/alice.txt"):
    d = {}
    with open(filename, 'r') as f:
        for line in f:
            line = line.lower()
            line = line.split()
            stripped = [x.strip('''!"#$%&'()*,-./:;?@[]_''') for x in line]
            for word in stripped:
                try:
                    d[word] += 1
                except KeyError:
                    d[word] = 1
    return d
It works for me. Could you post the contents of the text file?
The Project Gutenberg EBook of Alice in Wonderland, by Lewis Carroll
This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever. You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.org
Title: Alice in Wonderland
Author: Lewis Carroll
Illustrator: Gordon Robinson
Release Date: August 12, 2006 [EBook #19033]
Language: English
Character set encoding: ASCII
*** START OF THIS PROJECT GUTENBERG EBOOK ALICE IN WONDERLAND ***
Produced by Jason Isbell, Irma Spehar, and the Online
Distributed Proofreading Team at http://www.pgdp.net
[Illustration: Alice in the Room of the Duchess.]
_THE "STORYLAND" SERIES_
ALICE'S ADVENTURES IN WONDERLAND
SAM'L GABRIEL SONS & COMPANY
NEW YORK
Copyright, 1916,
by SAM'L GABRIEL SONS & COMPANY
NEW YORK
ALICE'S ADVENTURES IN WONDERLAND
[Illustration]
I--DOWN THE RABBIT-HOLE
This is a part of the file’s content.
“creating” is indeed present 4 times in the txt file. Notice that the first occurrence has an upper-case initial: “Creating”. Since your code lowercases every line before counting, both spellings are counted under the same key. Maybe you are checking by searching manually, but with a case-sensitive search?
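If you want to double-check a count without relying on an editor's search, a small sketch like the one below may help. The helper name `count_occurrences` and the sample sentence are made up for illustration (they are not from your code or from the Alice file); the idea is only to compare a case-sensitive count with a case-insensitive one.

```python
import re


def count_occurrences(text, word, case_sensitive=False):
    """Count whole-word occurrences of `word` in `text`.

    Hypothetical helper for illustration only.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    # \b keeps "creating" from matching inside a longer word.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags))


# Made-up sample text: one "Creating" and two "creating".
sample = "Creating the works. By creating derivative works, and creating more."

print(count_occurrences(sample, "creating", case_sensitive=True))  # lower-case only
print(count_occurrences(sample, "creating"))                       # any case
```

On the sample string the case-sensitive count is one lower than the case-insensitive one, which is the same kind of gap you saw between your manual search and the dictionary.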
Got it. Thanks for pointing out this minor confusion. It works great now!