I have a large dataset and I need to match some documents to it. At the moment I have a function with several if/else statements that tries to cover all the combinations in which a document could match the dataset, but I am only achieving a 70% match rate. I’m trying to work out whether a machine learning or neural network model is the right approach to get a better match rate, or whether there is another way that I am unfamiliar with. Below is a typical example of the ways these can match:
In this example, the document ‘Title’ may exactly match the dataset’s ‘Title’ or ‘Sub-Title’. It could also be that the document ‘Title’ is contained in the dataset’s ‘Title’ without matching it exactly. Alternatively, it may not be contained in either property on its own, but may be a combination of the dataset’s ‘Title’ and ‘Sub-Title’ merged in some way, for example separated by a comma.
Finally, the document ‘Title’ may match many ‘Title’ values in the dataset, in which case the ‘Sub-Title’ must also be matched to find the correct record.
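As a rough illustration of the rules above (a sketch under assumptions, not my actual function), the cases can be expressed as a single similarity score instead of a chain of if/else branches. The names `normalize`, `similarity`, `score` and `best_match`, the record keys `title` and `subtitle`, the optional `doc_subtitle` argument, and the 0.9 and 0.8 values are all hypothetical choices made for the example. It uses only `difflib` from the Python standard library.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Fuzzy similarity between two strings, from 0.0 (nothing shared) to 1.0 (identical)."""
    return SequenceMatcher(None, a, b).ratio()


def score(doc_title, record):
    """Score one dataset record against a document title.

    `record` is assumed to be a dict with the hypothetical keys 'title' and 'subtitle'.
    """
    doc = normalize(doc_title)
    title = normalize(record["title"])
    subtitle = normalize(record.get("subtitle", ""))

    # Strings the document title might correspond to: the title alone, the
    # sub-title alone, or the two merged with a comma or a space.
    candidates = [title]
    if subtitle:
        candidates += [subtitle, f"{title}, {subtitle}", f"{title} {subtitle}"]

    best = max(similarity(doc, c) for c in candidates)

    # Document title contained in the dataset title without matching exactly.
    # The 0.9 value is an arbitrary assumption, not a tuned number.
    if doc and doc in title:
        best = max(best, 0.9)
    return best


def best_match(doc_title, records, doc_subtitle=None, threshold=0.8):
    """Return the best-scoring record, or None if nothing reaches `threshold`.

    When several records tie on the title, `doc_subtitle` (assumed to be
    available from the document) is used to pick between them.
    """
    scored = [(score(doc_title, r), r) for r in records]
    if not scored:
        return None

    top = max(s for s, _ in scored)
    if top < threshold:
        return None

    tied = [r for s, r in scored if top - s < 1e-9]
    if len(tied) > 1 and doc_subtitle:
        wanted = normalize(doc_subtitle)
        return max(tied, key=lambda r: similarity(wanted, normalize(r.get("subtitle", ""))))
    return tied[0]
```

Calling `best_match("Annual Report", records, doc_subtitle="2020")` on a list of such dicts would return the record whose sub-title is closest to "2020" among those tied on title. The threshold would need tuning against documents with known correct matches, and the scores from a function like `score` could later serve as input features if a trained model turns out to be necessary.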
Apologies for the long-winded question. I’m hoping someone can point me towards a tutorial that would help me train a model so I can attempt to reach a 90-99% match rate.