Question about a .split() issue

I'm working on a Python problem that asks me to extract the hour of the day from each email message in a text file and count how many times each hour appears.

This is my code so far.

name = input("Enter file:")
if len(name) < 1:
    name = "mbox-short.txt"
handle = open(name)
for line in handle:
    words = line.split()
    if line.startswith("From"):
        date = words[5:6]
      
        print(date)
It returns this:
['09:14:16']
[]
['18:10:48']
[]
['16:10:39']
[]
['15:46:24']
[]
['15:03:18']
[]
['14:50:18']
[]
['11:37:30']
[]
['11:35:08']
[]
['11:12:37']
[]
['11:11:52']
[]
['11:11:03']
[]
['11:10:22']
[]
['10:38:42']
[]
['10:17:43']
[]
['10:04:14']
[]
['09:05:31']
[]
['07:02:32']
[]
['06:08:27']
[]
['04:49:08']
[]
['04:33:44']
[]
['04:07:34']
[]
['19:51:21']
[]
['17:18:23']
[]
['17:07:00']
[]
['16:34:40']
[]
['16:29:07']
[]
['16:23:48']
[]

I need to obtain the number before the first colon, but I'm lost about where to begin. Strings are immutable. I've tried .rstrip() as well as running .split(":"). Neither of these is working. I know I have to use keys and values in some way because the data needs to go into a dict(), but I'm unsure how.
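A minimal sketch of one way to get the text before the first colon, assuming the time is the sixth whitespace-separated field of a "From " line, as the output above suggests. The sample line is made up for illustration and is not taken from the file.

```python
# Hypothetical sample line in the same shape as the "From " lines in the file.
line = "From someone@example.com Sat Jan  5 09:14:16 2008"

words = line.split()            # split on any run of whitespace
time_str = words[5]             # index one element (a str), not a slice (a list)
hour = time_str.split(":")[0]   # keep only the text before the first colon

print(hour)
```

Note that words[5:6] is a slice, so it produces a list such as ['09:14:16'], and a list has no .split() method. Indexing with words[5] gives the string itself, which can then be split on ":".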

Any hints would be greatly appreciated. I'm attaching a link to the text file referenced in the code.

Thanks!

I am working toward counting the dates. Right now, I'm just trying to get a successful return of all months appearing after the email and the element "From ".

My code below returns a date:

ls = list()
name = input("Enter file:")
if len(name) < 1:
    name = "mbox-short.txt"
handle = open(name)
for line in handle:
    if line.startswith("From"):
      date = line.find(":")
      if int((line[date-2:date])):
        ls.append(line[date-2:date])   
    print(ls)    

However, it is the same one:

['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']
['09']

It's getting stuck.

I assume this has to do with the way I'm trying to identify the integer in the string:

      if int((line[date-2:date])):
        ls.append(line[date-2:date])   

I feel like the code is too convoluted, but I'm not sure how else to identify this, and I want to make sure I can draw this element out before I move on to counting it and organizing it in a data structure (I imagine a dictionary with a key of the month and a value of the counter).

Am I correct about any of this?

I'm working on assembling the list of strings that I will eventually iterate through, keeping a counter of their appearance, which I will eventually plug into a dictionary as a key/value.

That said, with this code,

# input text file into program create backstop in case line is too small
name = input("Enter file:")
if len(name) < 1:
    name = "mbox-short.txt"
#create dictionary and handle for input content
dic = dict()
handle = open(name)
#run for loop on line
for line in handle:
#if line starts with FROM, split elements at colon, separate off relevant parts of element
#in "relevant" variable, run for-in loop on "time" in order to eliminate errant "From" 
# element from "time", append to "lst"
    if line.startswith("From"):
        linespl = line.split(":")
        relevant = linespl[0].split( )
        time = relevant[-1]
        time = time.split(" ")
        print(time)

I am returning this:

['09']
['From']
['18']
['From']
['16']
['From']
['15']
['From']
['15']
['From']
['14']
['From']
['11']
['From']
['11']
['From']
['11']
['From']
['11']
['From']
['11']
['From']
['11']
['From']
['10']
['From']
['10']
['From']
['10']
['From']
['09']
['From']
['07']
['From']
['06']
['From']
['04']
['From']
['04']
['From']
['04']
['From']
['19']
['From']
['17']
['From']
['17']
['From']
['16']
['From']
['16']
['From']
['16']
['From']

So I added this to eliminate that pesky “From”…

        lst = list()
        for num in time:
            if int(time[num]):
                 lst.append(num)
        print(lst)

I intended this to squeeze out the “From” string: I expected int() to be false for it, so it would not qualify for append() to lst.

I can tell by the error message

line 43, in <module>
    if int(time[num]):
TypeError: list indices must be integers or slices, not str

that my assumptions were not based in logic, but I don't understand why it doesn't work. I'm calling the int() function on a string within a list… what am I missing?
Is, from a broad standpoint, my logic reasonable?
If not, where am I off?
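A short sketch of what the error message points at, using made-up values in the same shape as the output above. In a for loop over a list, the loop variable is the element itself rather than its position, so using it as an index fails when the elements are strings. The isdigit() filter is only one possible way to skip non-numeric strings, not necessarily the intended solution.

```python
time = ["From"]  # hypothetical value, as printed in the output above

for num in time:
    # num is the element ("From"), not an index, so time[num] raises TypeError
    print(repr(num))

# int() raises ValueError on non-numeric text instead of returning False,
# so one option is to test the string with str.isdigit() before using it.
candidates = ["09", "From", "18"]  # hypothetical mix of values
lst = [item for item in candidates if item.isdigit()]

print(lst)
```

A related caveat: even for numeric strings, `if int(value):` is false when the value is "00", because int("00") is 0, so a truthiness test on int() would silently drop midnight.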

Thanks for your patience and help.

I am posting what I'm working with so far. I am returning the correct key, but I am unsure about my ideas moving forward.
Can you tell me if what I've done so far is going to lead me in the right direction and my ideas about moving forward are valid?

My code and my comments explaining my intentions are below:

# input text file into program create backstop in case line is too small
name = input("Enter file:")
if len(name) < 1:
    name = "mbox-short.txt"
#create dictionary and handle for input content
dic = dict()
handle = open(name)
#run for loop on line
for line in handle:
#if line starts with FROM, split elements at colon, separate off relevant parts of element
#in "relevant" variable, run for-in loop on "time" in order to eliminate errant "From" 
# element from "time", append to "lst"
    if line.startswith("From "):
        linespl = line.split(":")
        relevant = linespl[0].split( )
        time = relevant[-1]
        #print(time)
        lst = list()
        print(lst)
        ##this is problematic, variable "time" is primed, 
        # access and count it in "dic" data type
        for word in dic:
            if dic[word] == time[word]:
                dic[word] = dic[word]+1
            else: 
                dic[word] = 1
# If the counts dictionary does not contain the hour as a key, then add it 
# and set its value to 1, otherwise increment the existing hour key value by 1.
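The rule stated in the last two comments can be sketched on its own as follows. The list of hours is made up for illustration; in the real program each hour would come from a "From " line. The point of the sketch is that the membership test is made on the hour just extracted, not by looping over the dictionary, since a loop over a dictionary that is still empty runs zero times and never adds anything.

```python
counts = dict()
hours = ["09", "18", "09"]  # hypothetical hours, one per "From " line

for hour in hours:
    if hour not in counts:
        counts[hour] = 1                  # first time this hour is seen
    else:
        counts[hour] = counts[hour] + 1   # hour already present: add one

print(counts)
```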

I'm stuck on 3.1.1 of your algorithm.
I applied your suggestions, but I get absolutely no output when I run the program and I'm completely lost. I don't understand.
My code is below. What am I doing wrong?

# input text file into program create backstop in case line is too small
name = input("Enter file:")
if len(name) < 1:
    name = "mbox-short.txt"
#create dictionary and handle for input content
lst = list()
dic = dict()

fhand = open(name)
for line in fhand:
    if line.startswith("From "):
        linespl = line.split(" ")
        if len(linespl) == 7:
            date = linespl[5].split(":")
            print(date)
            time = date[0]
            print(time)

This is what I've got now.

# input text file into program create backstop in case line is too small
name = input("Enter file:")
if len(name) < 1:
    name = "mbox-short.txt"
#create dictionary and handle for input content
lst_hours = list()
counts = dict()
fhand = open(name)
for line in fhand:
    if line.startswith("From "):
        email_info = line.split()
        #print(email_info)
        if len(email_info) == 7:
            time_arr = email_info[5].split(":")
            print(time_arr)
            hours = time_arr[0]
            #print(hours)
            lst_hours.append(hours)

Now I return the hours taken out of time_arr.
I append() the hours to lst_hours.

NOW, here is what I think I need to do:

With lst_hours, I can run a for-in loop to check the frequency of each hour in the dictionary, using the .get() method to add the hour if it doesn't already exist OR add one to its existing count.

Do I have it right?
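A minimal sketch of that plan, assuming lst_hours already holds the hour strings (the values here are made up). dict.get(key, default) returns the stored value when the key exists and the default otherwise, so one line covers both the "new hour" and "seen before" cases.

```python
lst_hours = ["09", "18", "09", "16"]  # hypothetical hours collected earlier
counts = dict()

for hour in lst_hours:
    # get() returns 0 for an hour not yet in counts, otherwise its current count
    counts[hour] = counts.get(hour, 0) + 1

print(counts)
```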

I solved the issue and I have some questions. First, I'll post my code below:

# input text file into program create backstop in case line is too small
name = input("Enter file:")
if len(name) < 1:
    name = "mbox-short.txt"
#open name,create dictionary and handle for input content

fhand = open(name)
lst = list()
counts = dict()

#search each line of handle for particular phrase
#split() this phrase when found
for line in fhand:
    if line.startswith("From "):
        email_info = line.split()

        #print(email_info)

#separate off the time component of phrase
        time_arr = email_info[5].split(":")
        
        #print(time_arr)

        hours = time_arr[0]
#iterate through dict() and add one if it appears
        counts[hours] = counts.get(hours, 0)+ 1
#loop through dict() and append k and v to list()
for key,val in counts.items():
    lst.append((key,val))
#Sort list in place
lst.sort()

#print(lst)

#Run loop in key values in list and print them out
for key,val in lst:
    print(key,val)
        

You had mentioned that, after my hours variable, I didn't need to use a list() and could just start testing for the existence of the hour in the counts dictionary: if it exists, add one, and if not, add it and give it a value.

I could not figure out how to do it without list().

The problem I ran into was that Python did not allow me to add things to a dictionary element. This was possible with a list().

I imagine that this is a result of ignorance on my part. Therefore, would you mind demonstrating a way of completing my project without using list()?
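One possible way to print the counts in hour order without building a separate list by hand is sketched below, assuming counts has already been filled. The values are made up for illustration. sorted() accepts any iterable, including counts.items(), and returns a new sorted list of (key, value) tuples rather than sorting anything in place, so its result has to be used directly or assigned to a name.

```python
counts = {"09": 2, "18": 1, "04": 3}  # hypothetical counts built earlier

# sorted() orders the (hour, count) pairs by hour, since tuples compare
# element by element and the hour comes first.
ordered = sorted(counts.items())

for key, val in ordered:
    print(key, val)
```

Because the hours are zero-padded two-character strings, sorting them as text gives the same order as sorting them as numbers.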

Thanks so much!
