Keeping count of new keys as they are generated in dictionary comprehension

moorea21 · February 4, 2022, 5:13pm

This code (for chapter 10, exercise 1 of the Py4E book) scrapes senders’ email addresses from a text file. It uses code3/mbox.txt:

import re
address_book = {}
fle = input('enter a file ') 
with open(fle) as fhandle:
    for line in fhandle:
        if line.startswith('From '):
            address = re.findall(r'\S+@\S+', line)[0]
            address_book[address] = address_book.get(address, 0) + 1


llist = sorted([(v, k) for (k, v) in address_book.items()], reverse=True)
v, k = llist[0]
print(f"this address {k}, sent the most emails. It sent {v} emails")

It works as it should, but I’m trying out dictionary comprehensions, and testing my knowledge of how dictionaries work in Python, by trying this:

import re
fle = input('enter a file ') 
with open(fle) as fhandle:
    address_book = {re.findall(r'\S+@\S+', line)[0]: 0 for line in fhandle if line.startswith('From ')}
#	more code to go here

llist = sorted([(v, k) for (k, v) in address_book.items()], reverse=True)
v, k = llist[0]
print(f"this address {k}, sent the most emails. It sent {v} emails")

This doesn’t work: the count for every address stays at 0, so there isn’t a ‘most prolific sender’.

I was surprised that the ‘address_book =’ line worked. I’d expected an error: dictionaries only hold unique keys, and since some of the addresses in the txt file are repeated, the comprehension would be asked to create duplicate keys. Instead, it seems to resolve this by discarding the potential duplicates as it finds them, giving me a dictionary keyed by the set of all senders’ addresses. So I may have learned something there, at least.
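A small sketch of that behaviour, using made-up addresses rather than the contents of mbox.txt: a repeated key in a dict comprehension raises no error, and the later value quietly replaces the earlier one. The names `senders` and `positions` are only for illustration.

```python
# Hypothetical sample addresses, not taken from mbox.txt
senders = ["a@x.org", "b@y.org", "a@x.org"]

# A repeated key does not raise an error; the later value
# overwrites the earlier one and the key keeps its original slot.
positions = {addr: i for i, addr in enumerate(senders)}

print(positions)       # {'a@x.org': 2, 'b@y.org': 1}
print(len(positions))  # 2
```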

If there was a way of counting how many times it sees each email address during that process that would be great, but I can’t work out a way. That could be used to generate the number of sent emails for each sender.

Is there a way, or is this just a silly idea? Bearing in mind that I’m only doing this to feel out the limits of how dictionaries work, not trying to create good code per se.
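For what it’s worth, one possible way to count each address in a single pass is sketched below. It swaps the plain dict comprehension for `collections.Counter` from the standard library (a dict subclass), fed by a generator expression. The `lines` list is hypothetical sample data standing in for the open file handle.

```python
import re
from collections import Counter

# Hypothetical sample lines standing in for the contents of mbox.txt
lines = [
    "From alice@example.com Sat Jan  5 09:14:16 2008",
    "From: alice@example.com",
    "From bob@example.com Fri Jan  4 18:10:48 2008",
    "From alice@example.com Fri Jan  4 16:10:39 2008",
]

# Counter tallies every address the generator yields,
# so repeated senders are counted instead of discarded.
address_book = Counter(
    re.findall(r'\S+@\S+', line)[0]
    for line in lines
    if line.startswith('From ')
)

# most_common(1) returns a list holding the single top (key, count) pair
k, v = address_book.most_common(1)[0]
print(f"this address {k}, sent the most emails. It sent {v} emails")
```

With a real file, `lines` could be replaced by the file handle inside the `with open(fle) as fhandle:` block.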

brandon_wallace · February 4, 2022, 9:15pm

You can create a list of the email addresses and then use count().

email = ['brandon@hmail.com', 'brandon@hmail.com', 'brandon@hmail.com', 'brandon@Zmail.com']

# count() matches whole list elements, so pass the full address
count_hmail = email.count('brandon@hmail.com')
print(count_hmail)

# Output

3

moorea21 · February 5, 2022, 10:09am

Thanks Brandon. I had hoped there was a built-in way for Python to flag attempted duplicate dict keys, but there doesn’t appear to be, so something like your idea is probably the only workable way. But it would mean making another list of all the counts to find the max value, which would mean more code rather than less. I’ll stick to the more standard way of doing it in a loop; I think that’s as compact as it’s going to get.
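As a rough sketch of how the list-and-count() idea could be combined with a dict comprehension, assuming the addresses have already been collected into a list (the sample list from the reply above is reused here): `max()` with `key=counts.get` picks the top sender without building a second list. Note that `count()` rescans the whole list for every distinct address, so this is likely slower than the loop on a large file.

```python
# Sample list reused from the reply above (illustrative data only)
email = ['brandon@hmail.com', 'brandon@hmail.com', 'brandon@hmail.com', 'brandon@Zmail.com']

# One key per distinct address; count() supplies the tally
counts = {address: email.count(address) for address in set(email)}

# key=counts.get makes max() compare the tallies rather than the address strings
top = max(counts, key=counts.get)
print(f"this address {top}, sent the most emails. It sent {counts[top]} emails")
```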
