Keeping count of new keys as they are generated in a dictionary comprehension

This code (for chapter 10, exercise 1 of the Py4E book) scrapes senders’ email addresses from a text file. It uses code3/mbox.txt:

import re
address_book = {}
fle = input('enter a file ')
with open(fle) as fhandle:
    for line in fhandle:
        if line.startswith('From '):
            # raw string, so \S reaches the regex engine as written
            address = re.findall(r'\S+@\S+', line)[0]
            # count one more email for this sender, starting from 0
            address_book[address] = address_book.get(address, 0) + 1


llist = sorted([(v, k) for (k, v) in address_book.items()], reverse=True)
v, k = llist[0]
print(f"this address {k}, sent the most emails. It sent {v} emails")

It works as it should, but I’m trying out dictionary comprehensions, and testing my knowledge of how dictionaries work in Python, by trying this:

import re
fle = input('enter a file ')
with open(fle) as fhandle:
    address_book = {re.findall(r'\S+@\S+', line)[0]: 0 for line in fhandle if line.startswith('From ')}
# more code to go here

llist = sorted([(v, k) for (k, v) in address_book.items()], reverse=True)
v, k = llist[0]
print(f"this address {k}, sent the most emails. It sent {v} emails")

This doesn’t work: the count for every address stays at 0, so there is no ‘most prolific sender’.

I was surprised that the ‘address_book =’ line worked. Dictionaries only hold unique keys, and some of the addresses in the txt file are repeated, so I’d expected an error when the comprehension tried to create a duplicate key. Instead, a repeated key just has its value assigned again (here, 0 each time), and I end up with a dictionary whose keys are the set of all senders’ addresses. So I may have learned something there, at least.
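
A small sketch of that behaviour, using made-up addresses rather than the mbox.txt data: a repeated key in a dict comprehension does not raise an error. The later value simply overwrites the earlier one, and the key keeps the position where it was first inserted.

```python
# Hypothetical sample data, not taken from mbox.txt
senders = ['a@example.com', 'b@example.com', 'a@example.com']

# Each key maps to the index where it was seen; repeats overwrite silently
last_seen = {addr: i for i, addr in enumerate(senders)}
print(last_seen)  # {'a@example.com': 2, 'b@example.com': 1}
```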

If there were a way to count how many times each email address is seen during that process, that would be great, but I can’t work one out. Those counts could then be used as the number of emails sent by each sender.

Is there a way, or is this just a silly idea? Bearing in mind that I’m only doing this to feel out the limits of how dictionaries work, not trying to create good code per se.
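
One possible answer, sketched under assumptions: a comprehension cannot refer to the dictionary it is still building, so it cannot increment a running count, but collections.Counter from the standard library accepts a generator expression and does the tallying. The sample lines below are invented stand-ins for the file; the regex and variable names are the ones from the question.

```python
import re
from collections import Counter

# Invented stand-in for the lines of the mbox file
lines = [
    'From alice@example.com Sat Jan  5 09:14:16 2008',
    'Subject: hello',
    'From bob@example.com Sat Jan  5 09:20:00 2008',
    'From alice@example.com Sat Jan  5 10:01:02 2008',
]

# Counter tallies every address yielded by the generator expression
address_book = Counter(
    re.findall(r'\S+@\S+', line)[0]
    for line in lines
    if line.startswith('From ')
)

# most_common(1) returns a list holding the single top (address, count) pair
k, v = address_book.most_common(1)[0]
print(f"this address {k}, sent the most emails. It sent {v} emails")
```

To run it against the real file, the `lines` list would be replaced by the open file handle inside the `with` block.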

You can build a list of the email addresses and then call the list’s count() method.

email = ['brandon@hmail.com', 'brandon@hmail.com', 'brandon@hmail.com', 'brandon@Zmail.com']

# count() matches whole list elements, so pass the full address
count_hmail = email.count('brandon@hmail.com')
print(count_hmail)

# Output

3

Thanks Brandon. I had hoped there was a built-in way for Python to flag attempted duplicate dict keys, but there doesn’t appear to be, so something like your idea is probably the only workable way. But it would mean making another list of all the counts to find the max value, which would mean more code rather than less. I’ll stick to the more standard way of doing it in a loop; I think that’s as compact as it’s going to get.
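
For what it’s worth, the count() idea can be folded into a dict comprehension, and max() with a key function finds the top sender without a second list. This is a rough sketch with invented addresses. Note that count() rescans the whole list once per unique address, so it gets slow on large files.

```python
# Invented sample data standing in for the addresses pulled from the file
emails = ['a@example.com', 'b@example.com', 'a@example.com', 'a@example.com']

# One count() call per unique address; set() removes the repeats first
address_book = {addr: emails.count(addr) for addr in set(emails)}

# max() iterates over the keys and ranks them by their counts
top = max(address_book, key=address_book.get)
print(top, address_book[top])  # a@example.com 3
```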
