I am new to python, just getting started with it and I was trying to do some task with python. I have an excel file that I want to automatically check if it meets the criteria that I want to set. I know that I have to use openpyxl to open and read/write excel sheets but I do not know how to set criteria for checking.
For example, I want that the python script to check if the first column has values only text then the second column has only double values and the third column has value integer, but, avoid the first row (since its title) and if so then it gives a message that is fine and if not then give a message where the error appears.
Well you open the file in Python and have some for-loops check these condision.
You need to look at “how” Python is importing the file (propably as 2D-Array) and then do the type checking - which is just:
for text in excel[0]:
if type(text) != str:
print("There was some none-text in the first row")
# repeat for
# type(number) != float
# type(integer) != int
Keep in mind, Python has no strict typing and also automatically adjusts space.
Meaning there is no “double”, only “float”. Same with integer. If a number becomes to big for the datatype, Python adjusts the type under the hood and still shows it as float/int/whatever.
Also in the example, you will get a generic error message for EVERY TIME types don’t match.
“excel” was just meant as a placeholder
In your code you named it dfs. Also with Pandas it’s a dataframe which is a more complicated data-structure and I’d recommend looking into Pandas, by going through the FCC-tutorials on that.
My method was for a basic array, I couldn’t tell you how for-loops would work on a dataframe. Also the dataframes actually do have implemented strict typing because they are optimized to deal with HUGE datasets which benefit greatly from that.
Though from the top of my head dfs.describe() should give you some decent intormation on the datatypes in one go - because pandas assumes each column is a specific attribute so it ascribes one datatype per column which applies to all entries. And the read command will take in the first row as titles for the columns.
Thank you for your reply, I somehow achieved on opening with with pandas the files and then going into details of the file and checking for what type of data is but the only problem now is I do not know how to create if-else statement out of it.
Then you should definitly look into a course / tutorial on getting started with Python
For example “x = y” is an assignment, writing y into x.
Comparison is done with “x == y”.
If-statements look like this:
if condition:
do something
else:
do something else
Thank you for your reply and help, I will look into it. but in this case that I have written do you have any idea how could be ? with if -else i mean ?
if dfs.iloc[2:11 , 1:2].dtypes[0] == object:
print ("value is okay")
else:
print("value is not okay")
.dtypes creates a pandas-series, even if it only is one column. So you have to adress the first entry via brackets, otherwise pandas will again create a series-object out of the comparison. But the if-condition will throw an error in that case.
Alternatively you could take the first entry after the comparison for the same result:
if (dfs.iloc[2:11 , 1:2].dtypes == object)[0]:
print ("value is okay")
else:
print("value is not okay")