Iterate over dictionary with multiple keys and values and match content with dataframe column

I am making a dashboard of a (financial) book keeping by labelling transactions in a .csv dump from a bank account. I go to the point of making a dataframe and being able to create a new column with a label based on the partial contents of anotther column with a description in it. To do this, lists with containing certain phrases were made (store namens, expressions like ‘ice cream’, DIY-stores, etc.). When finding a phrase in the Descrition column, the name of the list is used as a label for that transaction. In the end, the labels and amounts are imported in MS Power bi and a nice overview is being produced.

The problem I run into is that I need 20 lists so far (one for each category) and 20 df.loc-code lines.
What I would like to do is iterate over each list and use only 1 piece of code. I made a dictionary with multiple values to each key and can iterate over each key-value pair, but cannot get it combined with the pandas dataframe code…

Does anybody know a solution?

Below a simple extraction of the code with two lists and two lines of code:

# lists with labels and expressions to look for
Groceries_list = ['Albert Heijn', 'Lidl', 'Hoogvliet', 'Slagerij', 'Bayrakdar']
Maintenance_list = ['Karwei', 'Gamma', 'Praxis', 'Hornbach']

# Match expression in column 'Description' in data frame and add the correct label in column 'Label'
df.loc[df['Description'].str.contains('|'.join(Groceries_list), case=False), "Label"] = 'Groceries'
df.loc[df['Description'].str.contains('|'.join(Maintenance_list), case=False), "Label"] = 'Maintenance'

And here a short version of the dictionary of the 2 lists (pretty straightforward):

Labels = {
    'Groceries': ['Albert Heijn', 'Lidl', 'Hoogvliet', 'Slagerij', 'Bayrakdar'],
    'Maintenance': ['Karwei', 'Gamma', 'Praxis', 'Hornbach']}

Thanks a lot for pointing me in the right direction (again :wink: )!

You can iterate over Labels like this:

for label in Labels:
  print(Labels[label])

The following line of pandas code:

df.loc[df['Description'].str.contains('|'.join(Groceries_list), case=False), "Label"] = 'Groceries'

only requires Groceries_list and 'Groceries' both of which are available when you iterate over the Labels:

  • Groceries_listLabels[label]
  • 'Groceries'label

So I suppose (haven’t tested it) that the following code should work:

for label in labels: 
  df.loc[df['Description'].str.contains('|'.join(labels[label]), case=False), "Label"] = label
1 Like

Thank you!!!

I was making it unnecessary complicated with my iteration over the Labels, the method suggested by you works very well!
So, code reduced with 60%, easier to maintain and add more categories: happy me :smiley:

1 Like

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.