Question about len() in problem

I am working on a Python problem for an online course. I need to extract floats from a .txt file and find their average; however, when I print the length of the floats, it gives me 7. There are more than 7 floats. This is messing with my code and returning the wrong value. Would someone look at my code and tell me why it returns a length of 7? Where is my mistake?

I'm posting my code below, followed by all the floats that are printed when I print only the floats.

fname = input("Enter file name: ")
fh = open("mbox-short.txt")
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:"):
        continue
    num = line.find(".")
    numstr = line[num-1:]
    sum = 0
    print(float(numstr))
    el = (float(numstr))
    if sum < el:
      sum+=el
    length = len(numstr)
    res = sum/length
print("Average spam confidence: ", res)

Output from printing the floats:

0.8475
0.6178
0.6961
0.7565
0.7626
0.7556
0.7002
0.7615
0.7601
0.7605
0.6959
0.7606
0.7559
0.7605
0.6932
0.7558
0.6526
0.6948
0.6528
0.7002
0.7554
0.6956
0.6959
0.7556
0.9846
0.8509
0.9907

Have you tried printing the string that gives that length? Try printing it with some specific characters added at the start and end, to mark where the string actually begins and ends. It might contain characters (like spaces or newline characters) that are not easy to notice.

I ran .split() to get rid of the whitespace and printed the result:

   print(numstr.split())

It returned this, and the length was still 7. I also ran rsplit(), with the same result. I don't understand why len() is returning something less than the number of floats there are.

['0.8475']
['0.6178']
['0.6961']
['0.7565']
['0.7626']
['0.7556']
['0.7002']
['0.7615']
['0.7601']
['0.7605']
['0.6959']
['0.7606']
['0.7559']
['0.7605']
['0.6932']
['0.7558']
['0.6526']
['0.6948']
['0.6528']
['0.7002']
['0.7554']
['0.6956']
['0.6959']
['0.7556']
['0.9846']
['0.8509']
['0.9907']

The content of the file exceeded the FCC content limit. A link to the txt file is below:
https://www.py4e.com/code3/mbox-short.txt?PHPSESSID=239cd7100f27f8ce0c0321fb5fa24666

The string might still contain a newline character; split() would not put it in an item of its own, it simply drops it:

>>> '0.8475\n'.split()
['0.8475']
>>> '0.8475'.split()
['0.8475']

Try printing it like this:

print('"{}"'.format(numstr))

There's one more way, print(repr(numstr)), which shows the newline character as a visible escape sequence:

>>> print(repr('0.8475\n'))
'0.8475\n'
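
To connect this to the length of 7, here is a minimal sketch. It assumes a line in the file looks like "X-DSPAM-Confidence: 0.8475" followed by a newline; the sample line is hard-coded for illustration rather than read from the file.

```python
# Hypothetical sample line, shaped like what the file loop would yield
line = "X-DSPAM-Confidence: 0.8475\n"

num = line.find(".")
numstr = line[num - 1:]

print(repr(numstr))         # shows the trailing newline explicitly
print(len(numstr))          # counts characters in the string, newline included
print(len(numstr.strip()))  # length once the surrounding whitespace is removed
```

In other words, len(numstr) is the number of characters in one string (six for the digits and the dot, plus the newline), not the number of floats found. Counting the floats needs a separate counter.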

I want to make sure I understand where I'm at and what I need to do from here.
WHERE I'M AT:
So far my code takes the float strings (which still carry a trailing \n) from the .txt file (code below):

fname = input("Enter file name: ")
fh = open("mbox-short.txt")
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:"):
        continue
    num = line.find(".")
    numstr = line[num-1:]

WHAT I NEED TO DO:
First, I need to find a way to iterate through the float strings being collected from the .txt file, eliminate the whitespace, and convert each one to a float. When that is done, I need to set up a counter to count the converted floats.

Second, I need to iterate through the converted floats and then add each one to the previous one, saving the total sum in a variable and dividing that by the counter for the floats.

I think this is a good plan. Where are the flaws?
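
The plan is sound, and both steps can happen in a single pass instead of two loops. Below is a rough sketch, assuming every matching line has only the number after the colon. The helper name average_confidence is made up for illustration, and the sample lines are hard-coded instead of being read from mbox-short.txt.

```python
def average_confidence(lines):
    """Average the floats on lines starting with 'X-DSPAM-Confidence:'."""
    prefix = "X-DSPAM-Confidence:"
    total = 0.0  # running sum, created once before the loop
    count = 0    # number of floats seen so far
    for line in lines:
        if not line.startswith(prefix):
            continue
        # Take everything after the prefix; strip() drops spaces and the newline
        value = float(line[len(prefix):].strip())
        total += value
        count += 1
    # Avoid dividing by zero when no line matched
    return total / count if count else None


# Hypothetical sample data standing in for the file contents
sample = [
    "From: someone@example.com\n",
    "X-DSPAM-Confidence: 0.8475\n",
    "X-DSPAM-Confidence: 0.6178\n",
]
print("Average spam confidence:", average_confidence(sample))
```

With a real file, passing the open file handle in place of sample should work the same way, since iterating over a file yields one line at a time.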

I am testing my code to see if I am accumulating the floats and the counter.

#Assigning file name user inputs to fname
fname = input("Enter file name: ")
#assigning action of opening file to fh
fh = open(fname) 
#looping through fh
for line in fh:
#for every line strip off the white space
    line = line.rstrip()
    #initialize counter and sum for accum. tot sum of floats and number of float instances
    counter = 0
    sum = 0
    #Include exception to insure loop focuses on specific instance of float
    if not line.startswith("X-DSPAM-Confidence:"):
        continue
    #identify decimal point
    num = line.find("0.")
    try:
    #use try to test if value is float through attempting to convert to float
        numstr = float(line[num:])
    #conversion is successful add current sum    
        sum = sum + numstr
    #note occurrence of float by adding to counter
        counter = counter + 1
    #if try fails return top of loop and run again
    except:
        continue  
    print(counter)  
    print(sum)      

It prints this:

1
0.8475
1
0.6178
1
0.6961
1
0.7565
1
0.7626
1
0.7556
1
0.7002
1
0.7615
1
0.7601
1
0.7605
1
0.6959
1
0.7606
1
0.7559
1
0.7605
1
0.6932
1
0.7558
1
0.6526
1
0.6948
1
0.6528
1
0.7002
1
0.7554
1
0.6956
1
0.6959
1
0.7556
1
0.9846
1
0.8509
1
0.9907

I know this means my counter is in the wrong place, as well as my sum variable,
but I don't get it. I put them in the try block because if the try succeeds (and the value is a float), one gets added to the counter.

This is incorrect. My logic is that if the try completes successfully during an iteration of the loop, then one is added to the counter, the current converted value is added to sum, and the new total is assigned back to the variable sum.

Where am I wrong here?

Okay! I got it. I can't initialize my sum and counter variables inside the loop, because they get reset on every iteration. They need to be created before the loop and accumulate within it. Thanks for your help!
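
For anyone landing here later, a small sketch of the difference, using a hard-coded list in place of the floats parsed from the file (all variable names are illustrative only):

```python
values = [0.8475, 0.6178, 0.6961]  # stand-in for the floats parsed from the file

# Accumulators created inside the loop: every pass starts over from zero
for v in values:
    reset_counter = 0
    reset_total = 0
    reset_counter += 1
    reset_total += v
print(reset_counter, reset_total)  # 1 0.6961 (only the last value survives)

# Accumulators created once, before the loop: they keep growing
counter = 0
total = 0
for v in values:
    counter += 1
    total += v
print(counter, total / counter)  # count of values and their average
```

Naming the accumulator total rather than sum also avoids shadowing Python's built-in sum() function.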
