I have to categorize 22k account names. I decided to try out this NLP model
(just the baseline model, not the ones below it).
The train/test model results were satisfactory for the 33k other accounts already categorized. But I’m stuck trying to figure out how to apply it towards the .csv file that holds the list of uncategorized accounts. I don’t know how to go about writing a function for this (I’m used to straightforward numerical stuff like forecasting/time series analysis).
I played around with the dataframes but vectorizing doesn’t seem to like a CSV file consisting solely of account names. But like I said, even if I kinda work around it I still don’t have an applicable function.
Aside from file names/data differences, I’m using .csv instead of .txt and splitting on ‘,’ instead of ‘\t’
For the record, I have also posted this on Reddit