Pandas: df.var vs df['var']?

Hi y’all! I keep getting hung up on this. When do you use df.var instead of df['var']? For example, df.var.mean() and df['var'].mean() produce the same results, but I assume that’s not always the case.

As a general rule, you use dot notation when you have the attribute name and you use bracket notation when you have a variable containing the attribute name.

name = 'var'

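# These are all the same (var is a placeholder column name here;
# a column literally named "var" would clash with the DataFrame.var method)
df.var.mean()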
df['var'].mean()
df[name].mean()

Sorry, some Googling later and I’m still struggling to understand what an attribute name is. What does it mean to “have the attribute name” vs a variable that contains it? Thank you so much!

Whoops, there was a typo in my answer. Does the edit help?

Sorry, still confused :( I’m struggling to figure out the difference between having the attribute name and having a variable containing the attribute name.

The only difference is the syntax for accessing the data. Sometimes you will be able to use the attribute name directly when writing the code and sometimes you want to write code that will work for any attribute name provided at runtime (so, stored in a variable).
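To make that concrete, here is a minimal sketch. The DataFrame, its column names (price, quantity) and the helper column_mean are made up for illustration; they are not from anything earlier in the thread.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({"price": [1.0, 2.0, 3.0], "quantity": [10, 20, 30]})

# Name known while writing the code: either syntax works
fixed_a = df.price.mean()
fixed_b = df["price"].mean()

# Name only known at runtime: it lives in a variable, so use brackets
def column_mean(frame, column_name):
    # frame.column_name would look for a column literally called "column_name"
    return frame[column_name].mean()

# Works for whatever columns the DataFrame happens to have
means = {name: column_mean(df, name) for name in df.columns}
print(means)
```

The point of column_mean is that the same code handles any column, which dot notation cannot do because the name after the dot is fixed when you type it.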

Here’s what I’ve picked up from using it:

You generally use df.var when the column name follows Python’s naming rules (letters, digits and underscores only, not starting with a digit).
And you use df['var'] when it doesn’t.

In practice, it works like this.

If your column is named Number of Sales
You should use df['Number of Sales']

If it’s named number_of_sales
Then go for df.number_of_sales
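A small sketch of those two cases, assuming a toy DataFrame that has both spellings as columns (the data is invented for illustration):

```python
import pandas as pd

# Hypothetical data: one column name with spaces, one valid Python identifier
df = pd.DataFrame({"Number of Sales": [3, 5], "number_of_sales": [3, 5]})

with_space = df["Number of Sales"].sum()     # brackets handle any column name
as_attribute = df.number_of_sales.sum()      # dot works: the name is a valid identifier
also_brackets = df["number_of_sales"].sum()  # brackets work here too

# str.isidentifier() tells you whether dot notation is even syntactically possible
identifier_check = {col: col.isidentifier() for col in df.columns}
print(identifier_check)
```

Note that df.Number of Sales is not just wrong, it is a syntax error, so brackets are the only option for that column.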

I don’t know for sure if this is the only time you’d do that.
But I read about it on Codecademy, and that’s how I do it.
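As far as I know there are at least two more cases where brackets are needed even though the name is a valid identifier: when the column name is also a DataFrame method or attribute (var itself is one, since DataFrame.var computes variance), and when you are creating a new column. A rough sketch, with made-up data and column names:

```python
import warnings
import pandas as pd

# Hypothetical data; the column is literally named "var"
df = pd.DataFrame({"var": [1.0, 2.0, 3.0], "sales": [10, 20, 30]})

# The method takes priority over the column, so df.var is DataFrame.var here
dot_is_method = callable(df.var)
# Brackets always reach the column
bracket_mean = df["var"].mean()

# Assigning through a new attribute name does not create a column;
# pandas emits a UserWarning, silenced here to keep the output clean
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df.total = df["sales"] * 2
attribute_made_column = "total" in df.columns

# Bracket assignment does create the column
df["total"] = df["sales"] * 2
bracket_made_column = "total" in df.columns
```

So with a column really called var, df.var.mean() would fail because a method has no mean(), while df['var'].mean() keeps working. That is one reason some people just use brackets everywhere.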