How do I check if values in Excel are correct with python?

Hello Everyone,

I am new to python, just getting started with it and I was trying to do some task with python. I have an excel file that I want to automatically check if it meets the criteria that I want to set. I know that I have to use openpyxl to open and read/write excel sheets but I do not know how to set criteria for checking.

For example, I want that the python script to check if the first column has values only text then the second column has only double values and the third column has value integer, but, avoid the first row (since its title) and if so then it gives a message that is fine and if not then give a message where the error appears.

Thank you for your time and help

Well you open the file in Python and have some for-loops check these condision.
You need to look at “how” Python is importing the file (propably as 2D-Array) and then do the type checking - which is just:

for text in excel[0]:
  if type(text) != str:
    print("There was some none-text in the first row")

# repeat for 
# type(number) != float
# type(integer) != int

Keep in mind, Python has no strict typing and also automatically adjusts space.
Meaning there is no “double”, only “float”. Same with integer. If a number becomes to big for the datatype, Python adjusts the type under the hood and still shows it as float/int/whatever.

Also in the example, you will get a generic error message for EVERY TIME types don’t match.

Thank you for your reply sir, I will try this out and let you know.

Best Regards

Sir I just checked the code and I have an error in excel which says undefined name ‘excel’. Sorry but I am a newbie in this matter.

Thank you

It is difficult to help without seeing the code you have written.

Thank you for your reply, All I have done is read the information from Excel file with pandas.
I.e

Import pandas as pd
dfs = pd.read:excel(" excel/file", sheet_name = "sheet name", engine= "openpyxl")
print(dfs.head(10))

I know its nothing but that how i got started.
Thank you for your help and patience.

“excel” was just meant as a placeholder :wink:
In your code you named it dfs. Also with Pandas it’s a dataframe which is a more complicated data-structure and I’d recommend looking into Pandas, by going through the FCC-tutorials on that.
My method was for a basic array, I couldn’t tell you how for-loops would work on a dataframe. Also the dataframes actually do have implemented strict typing because they are optimized to deal with HUGE datasets which benefit greatly from that.

Though from the top of my head dfs.describe() should give you some decent intormation on the datatypes in one go - because pandas assumes each column is a specific attribute so it ascribes one datatype per column which applies to all entries. And the read command will take in the first row as titles for the columns.

Thank you for your reply, I somehow achieved on opening with with pandas the files and then going into details of the file and checking for what type of data is but the only problem now is I do not know how to create if-else statement out of it.

for example :

Import pandas as pd
dfs = pd.read:excel(" excel/file", sheet_name = "sheet name", engine= "openpyxl")
print(dfs.head(10))
dfs.iloc[2:11 , 1:2].dtypes

which in this case the result is object and I would like to make and if-else statement out of it. like

if this = dfs.iloc[2:11 , 1:2].dtypes is object then print ("value is okay")
else:
print("value is not okay")

Thank you for your help and patience.

Then you should definitly look into a course / tutorial on getting started with Python :wink:
For example “x = y” is an assignment, writing y into x.
Comparison is done with “x == y”.
If-statements look like this:

if condition:
    do something
else:
    do something else

Thank you for your reply and help, I will look into it. but in this case that I have written do you have any idea how could be ? with if -else i mean ?

Thank you once again.

Well I did a bit of testing and this works:

if dfs.iloc[2:11 , 1:2].dtypes[0] == object:
  print ("value is okay")
else:
  print("value is not okay")

.dtypes creates a pandas-series, even if it only is one column. So you have to adress the first entry via brackets, otherwise pandas will again create a series-object out of the comparison. But the if-condition will throw an error in that case.

Alternatively you could take the first entry after the comparison for the same result:

if (dfs.iloc[2:11 , 1:2].dtypes == object)[0]:
  print ("value is okay")
else:
  print("value is not okay")

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.