I have a sequence of characters in chronological order. I am looking for an algorithm that groups the sequence and removes errors in the data. I am having a hard time explaining in words or maths what the requirement for fixing the sequence is, but it is something like “the outcome should group as many characters as possible into runs of a constant letter”, with the minimum run length as a setting. Below I try to illustrate what I mean with some examples (minimum of 3 characters):

AAACCAACA => AAAAAAAAA

ACACAAAAA => AAAAAAAAA

AYYYYYYYA => YYYYYYYYY

YYYYAAYYY => YYYYYYYYY

AYAYAYYYY => YYYYYYYYY
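To make the "minimum run length" part concrete, here is a small Python sketch of how I would check whether a string already satisfies it. The helper names `run_lengths` and `satisfies_min_run` are made up for illustration; this only checks a candidate output, it does not do the correction itself.

```python
from itertools import groupby


def run_lengths(seq):
    """Return (letter, length) pairs for each run of identical letters."""
    return [(letter, len(list(group))) for letter, group in groupby(seq)]


def satisfies_min_run(seq, min_len=3):
    """True if every run of identical letters has at least min_len characters."""
    return all(length >= min_len for _, length in run_lengths(seq))


print(run_lengths("AAACCAACA"))        # [('A', 3), ('C', 2), ('A', 2), ('C', 1), ('A', 1)]
print(satisfies_min_run("AAACCAACA"))  # False: the raw input has short runs
print(satisfies_min_run("AAAAAAAAA"))  # True: the desired output is one long run
```

Every desired output in my examples should pass this check, while the raw inputs generally do not.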

For longer sequences it becomes a little more tricky:

AYAYAYAYAAAAAAAAAAYAYYYYYYYYYYYYYY => AAAAAAAAAAAAAAAAAAYYYYYYYYYYYYYYYY

HTHTTHHHTTHHH => TTTTTHHHHHHHH

TTOAOAOOAATTA => OOOOOOOOAAAAA

This is a bit tricky because, even though the O’s aren’t all directly next to each other, they are still the most probable uninterrupted sequence of characters.
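One way I could imagine formalizing this, purely as a sketch and not something I know to be the right model: choose an output of the same length in which every run has at least `min_len` characters, minimizing the number of changed characters plus a penalty for each change of letter. The function name `smooth` and the `switch_penalty` parameter are my own assumptions. This cost is not guaranteed to reproduce every example above (for instance, I have not confirmed what it gives for the H/T one), so the cost function itself may be what I am missing.

```python
def smooth(seq, min_len=3, switch_penalty=1.0):
    """Dynamic-programming sketch: split seq into runs of length >= min_len,
    minimizing (changed characters) + switch_penalty * (number of run changes)."""
    n = len(seq)
    if n == 0:
        return ""
    if n < min_len:
        raise ValueError("sequence is shorter than min_len")

    letters = sorted(set(seq))
    # prefix[c][i] = number of occurrences of letter c in seq[:i]
    prefix = {c: [0] * (n + 1) for c in letters}
    for i, ch in enumerate(seq):
        for c in letters:
            prefix[c][i + 1] = prefix[c][i] + (ch == c)

    inf = float("inf")
    best = [inf] * (n + 1)   # best[i] = minimal cost of a valid split of seq[:i]
    back = [None] * (n + 1)  # back[i] = (start of last run, letter of last run)
    best[0] = 0.0

    for end in range(min_len, n + 1):
        for start in range(0, end - min_len + 1):
            if best[start] == inf:
                continue  # seq[:start] cannot be split into valid runs
            for c in letters:
                # characters in seq[start:end] that would have to change to c
                mismatches = (end - start) - (prefix[c][end] - prefix[c][start])
                penalty = switch_penalty if start > 0 else 0.0
                cost = best[start] + mismatches + penalty
                if cost < best[end]:
                    best[end] = cost
                    back[end] = (start, c)

    # Walk the back-pointers from the end to rebuild the runs.
    runs = []
    end = n
    while end > 0:
        start, c = back[end]
        runs.append(c * (end - start))
        end = start
    return "".join(reversed(runs))


print(smooth("AAACCAACA", min_len=3, switch_penalty=2.0))  # AAAAAAAAA
print(smooth("YYYYAAYYY", min_len=3, switch_penalty=2.0))  # YYYYYYYYY
```

The sketch runs in roughly O(n² · k) time for n characters and k distinct letters, which is fine for short strings. The result depends heavily on `switch_penalty`, and I do not know how to choose it in a principled way.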

Does anyone know of an algorithm (machine learning?) or something similar, or the name of this type of problem, that can do this kind of error correction?

Thank you in advance