Writing elements from a list to a file - need a push in the right direction

I am working through the Python for Everybody course and am trying to write a program that extracts email addresses from a file and writes them to a new file, one address per line. The file "mbox-short.txt" is used as the source, since it has loads of (duplicate) email addresses in it.
I did succeed in getting them from the file into a list without all the other text from the file. When I have the program iterate through the list and print each element, every address appears on its own line, but I cannot get it to write each address to a new line in a new (.txt) file. Right now it writes all the email addresses to the file twice, run together with no spaces or newline characters between them. I have tried multiple variations, including defining a function that returns the result and writing that to a file (which did not work). With some variations only the last address is written to the file, and with others the result is the same as now. I did check type(emails) to confirm it is a list.

Below is the code. Can anyone push me in the right direction?
(I do realise the regex I used for finding email addresses is not perfect, but I will experiment more with that later, as well as with removing duplicates.)

import re

with open ('mbox-short.txt') as han :
    lines = han.readlines()

emails = re.findall(r"[\w\.-]+@[\w\.-]+", str(lines))

with open ("email.txt", "a+" ) as file1 : 
    for line in emails :
        file1.write(line)  ## when I use print(line) the output is exactly how I want it in the file, so I must be missing something (knowledge/understanding) here?

Thanks a lot!
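The print() versus write() difference noted in the code comment can be shown in a few lines: print() ends its output with end="\n" by default, whereas a file object's write() writes exactly the string it is given and nothing else. A minimal sketch, assuming io.StringIO as an in-memory stand-in for email.txt and two made-up addresses:

```python
import io

emails = ["alice@example.com", "bob@example.org"]  # hypothetical sample data

# print() appends end="\n" after each call by default
printed = io.StringIO()
for address in emails:
    print(address, file=printed)

# write() adds nothing, so the addresses run together on one line
written = io.StringIO()
for address in emails:
    written.write(address)

print(repr(printed.getvalue()))  # 'alice@example.com\nbob@example.org\n'
print(repr(written.getvalue()))  # 'alice@example.combob@example.org'
```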


If you open a file with "a+", writes are appended to the end of the file instead of replacing its contents. So if you run the program multiple times, you end up with multiple copies of the same list of addresses. Change the mode to "w" so the file is truncated and overwritten on each run.
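A small sketch of the two modes, assuming a throwaway temporary directory instead of the real email.txt and a hypothetical helper run_program() that stands in for one run of the script. The "program" is run twice with each mode and the resulting lines are compared:

```python
import os
import tempfile

emails = ["alice@example.com", "bob@example.org"]  # hypothetical sample data

def run_program(path, mode):
    """Hypothetical stand-in for one run: write every address on its own line."""
    with open(path, mode) as f:
        for address in emails:
            f.write(address + "\n")

with tempfile.TemporaryDirectory() as tmp:
    appended = os.path.join(tmp, "appended.txt")
    overwritten = os.path.join(tmp, "overwritten.txt")

    # Simulate running the program twice with each mode
    for _ in range(2):
        run_program(appended, "a+")    # append: keeps what earlier runs wrote
        run_program(overwritten, "w")  # write: truncates the file first

    with open(appended) as f:
        appended_lines = f.read().splitlines()
    with open(overwritten) as f:
        overwritten_lines = f.read().splitlines()

print(len(appended_lines))     # 4: the list appears twice
print(len(overwritten_lines))  # 2: only the latest run survives
```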

To put each address on its own line, add a newline character to the string: file1.write(line + "\n").
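For reference, here is a sketch of a few equivalent ways to get one address per line, again assuming io.StringIO in place of a real file and made-up addresses. All four produce the same text; which one to use is mostly a matter of taste:

```python
import io

emails = ["alice@example.com", "bob@example.org"]  # hypothetical sample data

# 1. Explicit newline on every write() call
out1 = io.StringIO()
for address in emails:
    out1.write(address + "\n")

# 2. writelines() with a generator; like write(), it adds no separators itself
out2 = io.StringIO()
out2.writelines(address + "\n" for address in emails)

# 3. Build one string with join() and write it in a single call
#    (note: with an empty list this writes a single blank line)
out3 = io.StringIO()
out3.write("\n".join(emails) + "\n")

# 4. print() with file=..., which supplies the trailing newline
out4 = io.StringIO()
for address in emails:
    print(address, file=out4)
```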


Thanks, it works! Now on with the rest of my wishlist for this small program :slight_smile:

I chose “a+” because of how I pictured it: while iterating through the list (read element, write to file, read, write, etc.), each address has to be added to the file. Since one of my faulty solutions wrote only the final item to the file, I figured each write was overwriting the previous one. Seems I was wrong there too… thanks for the extra tip!
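That earlier attempt isn't shown, so this is only an assumption about what happened, but one common way to end up with just the last address is to open the file with "w" inside the loop: every open truncates the file again. A sketch comparing that with opening once outside the loop, using a temporary directory and made-up addresses:

```python
import os
import tempfile

emails = ["alice@example.com", "bob@example.org", "carol@example.net"]  # hypothetical

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "email.txt")

    # Faulty: "w" truncates the file on every open, so each pass erases the previous one
    for address in emails:
        with open(path, "w") as f:
            f.write(address + "\n")
    with open(path) as f:
        inside_loop = f.read().splitlines()

    # Fixed: open once with "w", then write every address inside the loop
    with open(path, "w") as f:
        for address in emails:
            f.write(address + "\n")
    with open(path) as f:
        outside_loop = f.read().splitlines()

print(inside_loop)   # ['carol@example.net']
print(outside_loop)  # ['alice@example.com', 'bob@example.org', 'carol@example.net']
```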

I don't know your module specifically, but try:

for line in emails:
    file1.write(line + "\n")

Or, if you use pandas, you could put the list into a DataFrame and then print the DataFrame.

Thanks!
I am getting the hang of the basics now and will postpone using pandas for a little while, but I will try your suggestion then!


Why doesn't my Python look like yours?
That's weird.

Nailed it :smiley:

import re

with open ('mbox-short.txt') as han :
    lines = han.readlines()

# [.] matches exactly one literal "." (inside [] the dot is not a wildcard)
# {m} (a number between {}) makes the preceding item appear exactly m times; here the "." must appear once at this position ({1} is the default, so [.]{1} is the same as [.]). This keeps addresses without a complete domain (e.g. user@localhost) from showing up
# creates a list of matches of the form x@x.x (the domain may have more parts, e.g. x@x.x.x); the surrounding text (\n, <, >, etc.) is not included in the matches
# str() because re.findall() needs a single string, and lines is a list of strings
emails = re.findall(r"[\w\.-]+@[\w\.-]+[.]{1}[\w\.-]+", str(lines))
#print(emails)

# check & remove duplicates in list 'emails'
# dict.fromkeys(emails) builds a dictionary whose keys are the addresses (every value is None); a dictionary cannot hold duplicate keys, and since Python 3.7 it keeps insertion order, so list() turns it back into a list of unique addresses in first-seen order
# defining and calling a function with a return is not really necessary here, but it looks neater (other advantages?)
def single(emails):
    return list(dict.fromkeys(emails))
unique = single(emails) # call function

# write the unique email addresses to a new file ("w" creates it if it does not exist and overwrites it if it does)
with open ("email.txt", "w" ) as file1 :
    for line in unique :
        file1.write(line + "\n") # \n to write each item on a separate line
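As a possible variation on the program above: reading the file with han.read() hands re.findall() one string directly, so the str(lines) step (which searches the list's printed representation) isn't needed. The sketch below wraps the same pattern and the dict.fromkeys() de-duplication in one function. The name extract_unique_emails and the sample lines are illustrative only, not taken from the course:

```python
import re

# Same pattern as above, as a raw string so "\w" is not treated as a string escape
EMAIL_RE = re.compile(r"[\w\.-]+@[\w\.-]+[.]{1}[\w\.-]+")

def extract_unique_emails(text):
    """Hypothetical helper: addresses found in text, duplicates removed, first-seen order."""
    # dict keys are unique and (since Python 3.7) keep insertion order
    return list(dict.fromkeys(EMAIL_RE.findall(text)))

sample = (
    "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\n"
    "Author: stephen.marquard@uct.ac.za\n"
    "To: source@collab.sakaiproject.org\n"
    "Bad: user@localhost\n"
)
print(extract_unique_emails(sample))
# ['stephen.marquard@uct.ac.za', 'source@collab.sakaiproject.org']
```

To apply it to the real file, calling extract_unique_emails(han.read()) inside the with open('mbox-short.txt') as han: block should give the same list that unique holds above.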