Using AI/Machine Learning for Keyword, Hashtag, and Key Data Extraction and Text Analysis of News Articles

Hey all, I’m just going to throw this out there and see if other experienced developers here have done similar work before and could offer ideas/pointers/solutions.

I’m a freelance/self-employed web developer/designer and one of my clients (a news organization) has asked for help/ideas on how they can more effectively manage their news stories.

I built their existing website, including a custom CMS/database solution. The problem they’d like to address is that they have multiple news editors on staff who may assign different variations of a keyword to the same news story/subject.

Obviously, they want to avoid/minimize this scenario, so they usually have to go back to the archive and manually review the exact keywords they assigned to a previous related story, which is time consuming.

For example, a few weeks ago the big news was the Sutherland Springs, Texas church shooting. It’s a news story that has a lot of moving parts… the tragedy itself, then the segue into gun control, mental health, church security, background checks, etc… One news editor may initially assign it the keyword “Sutherland”, while another may use “Sutherland Texas”, and another may use “Sutherland Baptist Church”. A news article can also have multiple keywords, so one editor may add “background check” while another adds “mental health”.

A popular news story like this can quickly grow to use multiple keywords, and multiple variations of each keyword, and you can see how it can become big and unwieldy. That’s the problem.
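To make the problem concrete, here is a rough sketch of the kind of check that could replace the manual archive lookup: when an editor types a keyword, show the existing keywords that share a word with it. This is only an illustration in Python (the CMS itself could be in any language), and the function names and the sample archive list are made up for the example.

```python
import re


def tokens(keyword: str) -> set[str]:
    """Lowercase a keyword and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", keyword.lower()))


def related_keywords(candidate: str, existing: list[str]) -> list[str]:
    """Return the existing keywords that share at least one word with the candidate."""
    wanted = tokens(candidate)
    return [kw for kw in existing if wanted & tokens(kw)]


# Hypothetical keywords already stored in the archive.
archive = [
    "Sutherland Texas",
    "Sutherland Baptist Church",
    "background check",
    "mental health",
]

print(related_keywords("Sutherland", archive))
# ['Sutherland Texas', 'Sutherland Baptist Church']
```

Word overlap is a blunt instrument (it would also match unrelated keywords that happen to share a common word), so it only surfaces candidates for an editor to look at; it doesn't decide anything.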

During our planning meeting, they suggested that I create a new database of pre-defined keywords that editors can pick from when assigning keywords to news stories, so usage stays consistent. They’ll manually manage this new database (adding and deleting pre-defined keywords).
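For reference, the pre-defined list they have in mind boils down to something like the sketch below: each approved keyword plus the variations that should map to it. The class and method names are hypothetical, and in practice this would be a database table rather than an in-memory dict.

```python
class KeywordVocabulary:
    """A tiny controlled vocabulary: approved keywords plus known variations."""

    def __init__(self):
        # Maps a normalized variation to its approved (canonical) keyword.
        self._canonical_by_alias = {}

    @staticmethod
    def _normalize(text):
        # Ignore case and extra whitespace when comparing keywords.
        return " ".join(text.lower().split())

    def add(self, canonical, aliases=()):
        """Register an approved keyword and any variations that map to it."""
        for name in (canonical, *aliases):
            self._canonical_by_alias[self._normalize(name)] = canonical

    def resolve(self, keyword):
        """Return the approved keyword, or None if the keyword is unknown."""
        return self._canonical_by_alias.get(self._normalize(keyword))


vocab = KeywordVocabulary()
vocab.add("Sutherland Baptist Church", aliases=["Sutherland", "Sutherland Texas"])
vocab.add("background check")

print(vocab.resolve("sutherland texas"))  # Sutherland Baptist Church
print(vocab.resolve("gun control"))       # None
```

The catch is visible even in this toy version: somebody has to keep adding every new variation by hand, which is exactly the maintenance burden I’m worried about.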

Now, we’ve done something similar before (but for a different piece of data), and I reminded them that in the long run we found this approach was not flexible, and that the list became more of a nightmare to manage the longer it grew. They remembered and agreed. They publish several news stories every day, five days a week. So, based on our previous experience, I am not completely sold that this is the right approach.

While googling around, the idea came to me of using AI/machine learning through an online API provider that does smart text analysis of the news story, so it can extract the relevant keywords, data, locations, person names, suggested hashtags, and even a condensed summary for each news article. The API provider would then give me JSON output, which I can format for display in their CMS admin. News staffers can then either accept the suggested keywords, or just pick and choose the most important keywords and #hashtags they want to assign to a particular news story.
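Here is a minimal sketch of the CMS side of that idea, assuming the provider returns keywords with a relevance score and entities with a type. The JSON shape, the field names, and the sample values are all made up for illustration; every real provider has its own response format, so this part would need to be adapted.

```python
import json
import re

# Made-up sample response; a real provider's field names will differ.
SAMPLE_RESPONSE = """
{
  "keywords": [
    {"text": "background check", "relevance": 0.91},
    {"text": "mental health", "relevance": 0.84},
    {"text": "press conference", "relevance": 0.22}
  ],
  "entities": [
    {"text": "Texas", "type": "Location"}
  ]
}
"""


def to_hashtag(text):
    """Turn a phrase like 'background check' into '#BackgroundCheck'."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return "#" + "".join(word.capitalize() for word in words)


def build_suggestions(raw_json, min_relevance=0.5):
    """Reduce the (assumed) API response to what the CMS admin would display."""
    data = json.loads(raw_json)
    keywords = [
        item["text"]
        for item in data.get("keywords", [])
        if item.get("relevance", 0) >= min_relevance
    ]
    locations = [
        item["text"]
        for item in data.get("entities", [])
        if item.get("type") == "Location"
    ]
    return {
        "keywords": keywords,
        "locations": locations,
        "hashtags": [to_hashtag(keyword) for keyword in keywords],
    }


suggestions = build_suggestions(SAMPLE_RESPONSE)
print(suggestions["keywords"])  # ['background check', 'mental health']
print(suggestions["hashtags"])  # ['#BackgroundCheck', '#MentalHealth']
```

The relevance cutoff of 0.5 is an arbitrary placeholder; the point is just that the editors would see a short, filtered list to accept or reject rather than the raw API output.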

That’s my thought, and yes, it will cost some $$$ to pay the API provider (but not too bad), but my thinking is that this will at least standardize which keywords/tags they use for a news story.

So anyway, how would you approach this? Interested to hear other ideas. Thanks!