How to Extract a Grammatically Correct Word from a String

AgentLoneStar007 · March 6, 2024, 7:50pm

Hi. I’m working on a moderation system for a Discord bot, and I’m trying to add another stage that extracts all words (English only) from a string. For example, an input of “aaaatestdd221” should output “test”. I’m using spaCy, a natural language processing library for Python. Here’s my current code:

import spacy

nlp = spacy.load("en_core_web_sm")

doc = nlp("text input to process")

words_in_input: list[str] = []
for token in doc:
    # Only consider tokens that are purely alphabetical
    if token.is_alpha:
        token_text = token.text.lower()
        # Go through every string in the vocabulary's string store
        for word in nlp.vocab.strings:
            # If the string occurs anywhere in the token, append it to the list
            if word in token_text:
                words_in_input.append(word.lower())

But this includes garbage words as well. For example, an input of “test” outputs “e es est s st t te tes test”. The only item I care about is “test”. (I know NLP might be overkill for this scenario, but I don’t currently know a better way of going about it.)

Is there a way to limit spaCy to only include words from its vocabulary that are full, grammatically correct words?
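
One possible direction, as a rough sketch only: the fragments most likely appear because the check is a plain substring test against every string spaCy has stored, and that store is not limited to dictionary words. The snippet below swaps in a different technique that does not use spaCy at all. It matches against a curated word list, ignores very short entries, and drops any match that sits inside a longer match. The names `extract_words`, `ENGLISH_WORDS`, and `min_length` are hypothetical, and the tiny word set stands in for a real English word list that would have to be loaded from somewhere.

```python
def extract_words(text: str, dictionary: set[str], min_length: int = 3) -> list[str]:
    """Return dictionary words found in text, dropping matches nested in a longer match."""
    lowered = text.lower()
    # Keep only dictionary entries that are long enough and occur in the input
    found = {w for w in dictionary if len(w) >= min_length and w in lowered}
    # Discard any match that is a substring of another, longer match
    return sorted(
        w for w in found
        if not any(w != other and w in other for other in found)
    )


# Hypothetical stand-in for a real English word list (an assumption for illustration)
ENGLISH_WORDS = {"test", "es", "est", "te", "tes", "t", "word", "a"}

print(extract_words("aaaatestdd221", ENGLISH_WORDS))
```

With this word set, only “test” should survive for the example input. The trade-off is that a short word is dropped whenever it also appears inside a longer match, even if it occurs separately elsewhere in the string, so a real moderation filter may need position-aware matching instead.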

system · September 5, 2024, 7:50am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.