Hi. So I’m currently working on a moderation system for my Discord bot. The system has a function that compares each word in a message against a list containing all the possible Leetspeak variants of the blocked words. The problem I’m having is that I need to compare each word in the message against well over 42 million variants in the list in three seconds at most. How would I go about optimizing my current system (which takes multiple minutes)? What tools could I use to do this quicker?
Checking whether something exists in a list gets slower as the list grows. Each time, the whole list has to be traversed to confirm that something isn’t in it. Using a different data structure, one in which that check is faster, will surely speed things up.
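To illustrate the point above, here is a minimal sketch, assuming the variants fit in memory as plain Python strings. The sample words and the names `variant_list`, `blocked_variants` and `contains_blocked_word` are made up for this example and are not taken from the original bot.

```python
# Hypothetical sample data; the real list would hold the Leetspeak variants.
variant_list = ["b4d", "bad", "b@d", "8ad"]

# Building the set is a one-time cost; each later lookup is a hash lookup
# instead of a scan over every element.
blocked_variants = set(variant_list)


def contains_blocked_word(message: str) -> bool:
    """Return True if any whitespace-separated word is a blocked variant."""
    return any(word in blocked_variants for word in message.lower().split())


print(contains_blocked_word("that is B4D"))    # True
print(contains_blocked_word("all good here"))  # False
```

With a set, the time per word stays roughly constant no matter how many variants are stored, so the cost of scanning a message depends mainly on the number of words in it.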
Could you recommend said data structure(s)?
Binary search tree
https://stackoverflow.com/questions/21704357/datastructure-for-fast-and-efficient-search
https://www.geeksforgeeks.org/binary-search-tree-data-structure/
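As a rough sketch of the same idea without writing a tree by hand: binary search over a sorted list, using Python's standard `bisect` module, gives the same logarithmic lookup that a balanced binary search tree would. This is a swapped-in technique, not a tree implementation, and the sample words and the names `sorted_variants` and `is_blocked` are assumptions for illustration only.

```python
import bisect

# Hypothetical sample data, sorted once up front.
sorted_variants = sorted(["b4d", "bad", "b@d", "8ad"])


def is_blocked(word: str) -> bool:
    """Binary search: about log2(n) comparisons instead of n."""
    i = bisect.bisect_left(sorted_variants, word)
    return i < len(sorted_variants) and sorted_variants[i] == word


print(is_blocked("b@d"))   # True
print(is_blocked("good"))  # False
```

For a list of 42 million entries, that works out to roughly 26 comparisons per word rather than up to 42 million.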
Oh I just noticed this, perfect!
I’ll look more into binary search trees because it sounds like a very interesting topic, but in the end I fixed the issue using multi-threading.
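import concurrent.futures

# Placeholder for the list of blocked variants (named "list" in my first
# draft, which shadowed the built-in list() used further down).
blocked_variants: list = ["item 1", "item 2"]

def runScan(string_input: str) -> bool:
    # One membership test is enough; looping over the list as well just
    # repeats the same check once per element.
    return string_input in blocked_variants

with concurrent.futures.ThreadPoolExecutor() as executor:
    items_to_check: list = ["item 1", "item 2", "item 3"]
    # Run runScan once per item, spread across the worker threads.
    results: list = list(executor.map(runScan, items_to_check))
    if True in results:
        print("Item found in list!")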
In a nutshell, the code takes the given function (runScan()) and runs it in multiple instances, using the items from the list as the input for the function.
And then when I created the list, I created it as a set, then copied it to a list. It’s a redneck-engineering method of preventing duplicates. (I would just leave it as a set, but I need the indexes of the items in the list for my code.)
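If the indexes are the only reason for keeping a list, one possible alternative is a dict that maps each variant to its position. The sketch below is only an illustration under that assumption: the sample data and the names `raw_variants`, `variant_list`, `variant_index` and `find_index` are hypothetical and not part of the bot's code.

```python
# Hypothetical sample data containing a duplicate.
raw_variants = ["b4d", "bad", "b4d", "8ad"]

# dict.fromkeys drops duplicates while keeping first-seen order.
variant_list = list(dict.fromkeys(raw_variants))

# Map each variant to its position so a lookup returns the index directly.
variant_index = {variant: i for i, variant in enumerate(variant_list)}


def find_index(word: str) -> int:
    """Return the word's index in variant_list, or -1 if it is not blocked."""
    return variant_index.get(word, -1)


print(find_index("8ad"))  # 2
print(find_index("ok"))   # -1
```

This keeps the list for anything that needs positions while the dict answers both "is it blocked?" and "where is it?" in a single hash lookup.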