I am getting all the values as nan after labelling rows in a data frame

lata007 · October 15, 2021, 11:16am

pd.DataFrame(df3, index = indx).head()

I tried to label the rows of a bag-of-words dataset that contains 0s and 1s, but now I am getting all nan values. Why?

Jagaya · October 15, 2021, 11:26am

Can you like share a bit more code or something?

lata007 · October 15, 2021, 1:28pm



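import csv
import pandas as pd

# Shared state used by the functions below.
# These initial values are assumed; they were not shown in the original post.
bag_of_words = {}  # word -> total occurrences across all tweets
fields = []        # distinct words in first-seen order, used as CSV column names
count = 0          # number of distinct words seen so far

def remove_stop_words(a, b):
    # Lowercase the tokens in a, drop those found in b, update the vocabulary
    global count
    c = [word.lower() for word in a if word.lower() not in b]  # tokens without stop words
    for word in c:  # used to create dictionary
        if word not in bag_of_words:
            bag_of_words[word] = 1
            fields.append(word)
            count += 1
        else:
            bag_of_words[word] += 1
    return ' '.join(c)

def bagwords(twt, df, count):
    # Add 1 to the column of every word of the tweet, in row `count` of df
    for word in twt.split():
        df.at[count, word] = df.at[count, word] + 1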



filepath = r"Apple Sentiment Tweets.csv"
filepath2 = r"Cleanedtweets.csv"
df1 = pd.read_csv(filepath)
df2 = pd.read_csv(filepath2)
with open(r"stop_words.txt", "r") as f1:  # closes the file once it has been read
    li = f1.read().split()
rows = len(df1)
for i in range(rows):
    tweet = cleann(df1.iloc[i]['text'])  # cleann is defined elsewhere (not shown)
    temp = remove_stop_words(tweet, li)
    df2.at[i, 'text'] = temp  # contains tweets in form of dataframe
# one row of zeros per tweet, one column per distinct word
values = [[0] * count for _ in range(rows)]
with open(filename, 'w', newline='') as csvfile:  # filename is defined elsewhere (not shown)
    writer = csv.DictWriter(csvfile, fieldnames=fields)
    writer.writeheader()
    csvwriter = csv.writer(csvfile)
    csvwriter.writerows(values)
df3 = pd.read_csv(filename)
for i in range(rows):
    tweet = df2.iloc[i]['text']
    bagwords(tweet, df3, i)

print(df3.head())

n = len(df3)
indx = []
for i in range(n):
    st = "T"
    print(i)
    st += str(i)
    indx.append(st)
print(indx)

pd.DataFrame(df3, index = indx).head()

Jagaya · October 15, 2021, 2:03pm

Ok that’s not what I meant.
I meant showing the specific piece of code which doesn’t do what you want - maybe including an example on what you WANT as a result compared to what the ACTUAL result is.

lata007 · October 15, 2021, 2:20pm

Hi, you can just look at the last part of the code, where I build the index labels with a for loop. After that I got nan values. That is the same thing I had posted before you asked for more code.

Jagaya · October 15, 2021, 3:06pm

Hm… OK, maybe I phrased that badly. Anyway, I copied the last part and tested it with a basic DataFrame I made myself.
When you pass an existing DataFrame together with index=, the constructor SELECTS the rows of df3 whose index labels match indx. Your df3 still has the default index 0, 1, 2, … so none of the labels T0, T1, … exist in it. Instead of throwing an error, pandas creates a row for each requested label and fills it with nan.

Anyway, you need either df3.index = indx to replace the index in place, OR pd.DataFrame(df3.values, index=indx, columns=df3.columns) to create a new DataFrame (without columns= the column names are lost).
Because you plugged in the entire df3 DataFrame, it included its index, and hence the constructor assumed you wanted to work WITH that index instead of replacing it.
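A minimal sketch of that behaviour, using a small made-up DataFrame in place of df3 (the column names and values are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical stand-in for df3: a tiny bag-of-words table with the default 0..n-1 index
df3 = pd.DataFrame({"apple": [1, 0, 1], "phone": [0, 1, 1]})
indx = [f"T{i}" for i in range(len(df3))]  # ['T0', 'T1', 'T2']

# Passing a DataFrame together with index= reindexes it: none of the labels
# 'T0', 'T1', 'T2' exist in df3's index, so every cell comes back as NaN
selected = pd.DataFrame(df3, index=indx)

# Option 1: build a new DataFrame from the raw values, keeping the column names
relabelled = pd.DataFrame(df3.values, index=indx, columns=df3.columns)

# Option 2: overwrite the index of df3 itself
df3.index = indx
```

Option 2 is the shorter one if you don't need to keep the original numeric index around.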

lata007 · October 15, 2021, 3:48pm

So if I just use pd.DataFrame(df3.values, index=indx, inplace = True), will it work?
Also, could you tell me how I can access the “c” in the function remove_stop_words and create a data frame from it? I would really appreciate it.

Jagaya · October 15, 2021, 8:52pm

pd.DataFrame doesn’t have an “inplace” argument. Use the other method instead: df3.index = indx
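As a rough check, again on a made-up stand-in for df3, the constructor rejects the extra keyword while the plain assignment relabels the rows:

```python
import pandas as pd

df3 = pd.DataFrame({"apple": [1, 0], "phone": [0, 1]})  # hypothetical stand-in for df3
indx = ["T" + str(i) for i in range(len(df3))]

try:
    pd.DataFrame(df3.values, index=indx, inplace=True)
    constructor_accepts_inplace = True
except TypeError:
    # The constructor rejects unknown keyword arguments such as inplace
    constructor_accepts_inplace = False

df3.index = indx  # relabels the rows of df3 itself; values and columns are untouched
```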

What do you mean? At the end, the function returns c. The problem is that I have no idea what it does or what c looks like. I think you turn it into a string at the end. If you kept it as a list (or any other iterable), you should be able to pass it to pd.DataFrame(c) and get a DataFrame.

However, unlike the first case, I cannot test this one, as I don’t know what c looks like in the end.
That said, it’s Python - just try out what happens
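For what it's worth, here is a sketch of the list-to-DataFrame idea. The contents of c are invented, since the real tokens aren't shown, and the column name word is just a label picked for the example:

```python
import pandas as pd

# Hypothetical example of what c might hold before ' '.join(c) is applied
c = ["love", "new", "iphone", "love"]

tokens_df = pd.DataFrame(c, columns=["word"])   # one row per token
word_counts = tokens_df["word"].value_counts()  # occurrences of each distinct word
```

If you only have the joined string that the function returns, calling .split() on it gives you the list back.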

berkninan · March 23, 2022, 4:49am

First consider whether you really need to iterate over the rows of a DataFrame. Iterating through pandas DataFrame objects is generally slow and defeats the purpose of using a DataFrame. It is an anti-pattern that you should only fall back on when you have exhausted every other option. It is better to look for a vectorized solution, a list comprehension, or the DataFrame.apply() method.

Pandas DataFrame loop using list comprehension

result = [(x, y, z) for x, y, z in zip(df['Name'], df['Promoted'], df['Grade'])]

Pandas DataFrame loop using DataFrame.apply()

result = df.apply(lambda row: row["Name"] + " , " + str(row["TotalMarks"]) + " , " + row["Grade"], axis = 1)
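For comparison, a vectorized version of the same string-building might look like the sketch below. The data is made up; only the column names are taken from the apply() example above:

```python
import pandas as pd

# Made-up data using the column names from the apply() example
df = pd.DataFrame({"Name": ["Ann", "Bob"], "TotalMarks": [91, 78], "Grade": ["A", "B"]})

# Column-wise string concatenation; no Python-level loop over rows
result = df["Name"] + " , " + df["TotalMarks"].astype(str) + " , " + df["Grade"]
```

This works on whole columns at once, which is usually faster than apply() with axis=1 on larger frames.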
