I have two txt files I’m comparing, and I’m trying to see what words they have in common. I think I need an if statement and to split the files, but I’m not sure how to tell Python to pull out the words that are the same. This is what I have so far:
Unique_Dracula = ()
with open(r'Dracula.txt', 'r') as Dracula, open(r'great_expectations.txt', 'r', encoding='utf8') as Great:
    Dracula_word = Dracula.split()
    Great_word = Great.split()
    if Dracula_word in Great_word:
        #Unique_Dracula( I know I need something here to tell python to grab the values and store them here)
Any tips will help! I’m super new to python so I’m still working out the basics. Thank you in advance!
Dracula_word and Great_word are lists, and asking if a list is in another list will not work with this syntax: the in operator checks whether the whole list is a single element of the other list. Plus, you do not want to check if all the words in Dracula_word are in the Great_word list. You only want to know which words are the same.
In my opinion, the easiest way to approach this problem is to create a set for each of the lists and then find where the sets intersect. There is a set method named intersection that you can use to do exactly this. It will return a set of only the items that match.
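Here is a minimal sketch of the set approach. The function name common_words and the short sample strings are made up for illustration; with the real files you would pass in the text returned by each file's read() method, since a file object itself has no split method.

```python
def common_words(text_a, text_b):
    """Return the set of words that appear in both texts."""
    words_a = set(text_a.split())  # a set keeps one copy of each word
    words_b = set(text_b.split())
    return words_a.intersection(words_b)


# Hypothetical stand-in strings; for the real files, use file.read() instead.
sample_a = "the count left the castle at night"
sample_b = "the convict left the marshes at dawn"

shared = common_words(sample_a, sample_b)
print(sorted(shared))  # sorted only to get a stable display order
```

The & operator between two sets gives the same result as calling intersection.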
Note: This approach counts The and the as different words, so if you want them to count as the same word, you will first need to convert all the words to the same case (uppercase or lowercase) when you pull them into each list or set.
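As a rough illustration of the case issue, the sketch below compares the intersection with and without lowercasing. The helper name normalized_words and the sample strings are assumptions made up for this example.

```python
def normalized_words(text):
    """Split on whitespace and lowercase every word."""
    return {word.lower() for word in text.split()}


# Hypothetical sample strings, not taken from the actual books.
first = "The Count smiled"
second = "the convict smiled"

# Without lowercasing, "The" and "the" are treated as different words.
case_sensitive = set(first.split()) & set(second.split())

# After lowercasing, they match.
case_insensitive = normalized_words(first) & normalized_words(second)

print(sorted(case_sensitive))
print(sorted(case_insensitive))
```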
One more suggestion: you might want to consider using regular expressions to create the list of words, because split only breaks the text on whitespace and leaves the punctuation marks attached to the words. Consider the difference between the two outputs below, where the first comes from split and the second simply matches a pattern of lowercase letters (or apostrophes). You might need to think about hyphenated words as well somehow, but without going too deep, this very simple approach is already a huge improvement over split. If you aren’t familiar with regex stuff, it is covered in the Scientific Computing Python course with Charles Severance on this site.
text = "A rather… simple example: what happens to, for example, the punctuation?"
['a', 'rather…', 'simple', 'example:', 'what', 'happens', 'to,', 'for', 'example,', 'the', 'punctuation?']
['a', 'rather', 'simple', 'example', 'what', 'happens', 'to', 'for', 'example', 'the', 'punctuation']
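The two outputs above can be reproduced with something like the following sketch. The exact pattern used in the post is not shown, so the pattern [a-z']+ here is an assumption based on the description "lower case (or apostrophe)".

```python
import re

text = "A rather… simple example: what happens to, for example, the punctuation?"

# Plain split: punctuation stays attached to the words.
split_words = text.lower().split()

# Regex: keep only runs of lowercase letters and apostrophes (assumed pattern).
regex_words = re.findall(r"[a-z']+", text.lower())

print(split_words)
print(regex_words)
```

Note that this pattern breaks a hyphenated word into its parts and skips digits and accented letters, so it may need adjusting for real books.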