Hello,
I am working on a project in which I have to find the consecutive repeats of a DNA STR (short tandem repeat) in order to match a sequence with its owner. For that I want to use a regular expression (regex).
What I did is:
import re
string = "AATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGGTTAAAGCCAAGTGGAAGTTGACGAGCTACGGCACAGGTACCCTATACATACGGTAAATGAGTCGGAGGTTGTGGGTTTAAAGTAAGTCCCCGCTCAACATTCAGCAGACCCTCGAAGTGGGCCCTAAAATCGTGTTGCTAACGCTCCGGACCTGACCCCGAGCTTGGCTCCTAATTGTGTACTCTCTCCAACCAAGCAGCGTACCAACGCGGCAACCAGAGCGAAGCTGTACACGTCGATCATCGTTACGCCTCTACTCGATAGTCGTAGAAACTTGTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTGCGGTTGGTAGCTCTAACTGTCATCGTATTCGCGAATACCTCAGATATAAGCTCCAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAAGTGAATGCACGAGAGTGTTATAGCAGATATCCCCGCTGATCCGGCTGCCGAGGAGGTGGGCATGTGACGTTATGCACTACACAGCTACTACCAAGGTCTTCTGCGGGAAAGGATAGACAAACCGGCAACTCCGCGAGGTCGCGGACTTAGTATTGCGACGGCGTCCTAATCGGCTGGATTTGCGGTTTGTTGGCGTTAGTCCAAAGGTGCCGCTAATGTGGCCATATTTACGATCCACCCTATAGGGCTCCAGGTCGTTTTAAGTCGAGTCGTGTCTAGGGGCCATTCCTGGCCTTGAACGAAAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCGGCCTTAATGCTCAGATTCATATGCTGTGAGGCCGAGGGTGGCGTCATATCTTCGATGATGTTGAACATACGGTCCGGTATTTCGACTTGCCACCTGGTACTGCTTTAAAAGATGATACCATCAACAAAAGGGCACGGCGTGCCTCATGCAGGACGGGACGTTGCCTGCCTACAGCGCTCTACGTAGCAATGTCCGTCTTTCTTCATACACGTATGCTCCTAAAGAAATTGTAGTCTAACAGCTTCCAAACTGTAATCGCCGTTAGGTTCGTCTAAAGTAAAAATGATTGCAAGACGCAATCGAAGGAGCCATCGTTTCGAGGTGACTTCTAATATAACTACCTATGGTCATAGCATGCCCCAACATTGAACGAGGTAAGATCACGGGATCGACTGTCCTGGCGAGGGCCTTACGTTAGTCGTGTAATGCTCCGCGCGTCCCAAATATATGAAAGGCACGACACTCCCCACAATTTAACCCTCCCGCCAAATAAGTACCTAGCGGAGATAAGAATCTGGTCGGTCAGAAAAGGGTCTATGTCCTACAGAGTAGGGCGAAGTCCGCATACCGCAACAGTGCGGTGGCAAACGCTTTAATGACCAGGATCGTGCTAGGCAGTGGAATTTCATGTGGATTGGCCCGCGAATGGACAGGGAGCTATGTCTGAACTCTGTTGACGCTGAACT
GTATCCGGATCGTCATGTGAATCGTAGCTATGGGAGTGGTGGTACTGTAAGTCAGGGCTACTTACTGCGGGGTATCTATCTATCTATCTATCTATCCTCACAGTTCATGATTATACGGATGTAATTTGCCGCTGGCTCACGATACGGCTATACAGCGTTGGCTCCTAACGTTGCCACCTACAGTCTGCACTTGGGCACTCGGTATGGTATAAAATATATGACGGCAGACGTTGCGATAAGTAAAAGATCGAACAATCTCGCAGCAAATCTTAAAGCGCATCTAACATCGGGCGTGCGAATGGACCGTTCCGAGGGACACTAGTCGAGCCCCTCTTACAGCTCACAGGTAAATCGATTATCGTACGTAAGTCAAGTCGGCACTGCTTTACGGCAGGTAGTAATGGCTGCGTGCTGCGCAGACCTTCTGCCCCTCAGTTAGTCACGGCCACTAGCCCGGGAAAATATAGTTCGGACAGAAAAATCAGTACCCAGCACCCAACTAAAACAAGTTCTATTCCGAGACGCCTGCGGAGAGCCTCACTCGTTATAACTATGTACGGCGGATGGGGGTAGGGTATAAAGGGCATGCGTCTACACCGATTTCCTGGTTAATGATAATCTAGTTCTTAAAGCACTACTAGGCGCTGCGAATAGGGGTATTGGGCAATAGGCCCTGAATTAACCTTGTTTAGGGTTAGCCTATGCAGCGACCGTAGTACAATAATATCTATAAACGGGTACTCTCCAGACGTATTCATTAACTTCTCAATGAGGAACTATCTACAAAATCAATGAGTGATAACAGCGCATATGAAAAGTATGCAGTTGTTTCAAGCTGTTAACGGCCATTTCCACGAACGTGTTCACAGAGTAGAAGAAACGTAAAGCGTTACTCATCTCCGATACGGTGCGTGCGATGGGGCGTATTGCTTGTAATGTCGAGGGACGGGCATTGAAAAGAGTGCCACAGCATATCGGAGCAATTCACTAGTGAGCGTACCTTGATAAAGCAAAAGGATTACCTATTTTGCACACGTGTGCTAACCCCCAAGACCTGTTGAAACCGCCGAGCATCCGCCAATTTCTAGCACAACATTTCCATCTGCAACTAGCCGTAGAGCACTCAGGAATTTGATCTTAACATGATCGTGGAGGCAAGAAAAAAGGATGCAACAGCACCTTAGAGCACGAGATCATTCCTGGTTAATATTATGCTGTACGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGATCGCCATCATTAAGTACTTATATCTGCATAGAACATTAAGCGAGACGTTTGTGAGATATTCCCCTCTGGGCCCTTAGCTTCGCAGTTCCTCAGCGCCCTAAGATAAACGGGTGTAGCAGAAGAATCGGCGTGCTTTTTACAAGTCCTGCCGGCGATTCAGCATCAATTATAAACGGCCCCTAATAGAAATAGGGCGGCAGGAGTCAATTGGTATCGTTTTGGAGCCATTCACCGCCAAGGGTCAGATAACCCGGCATTCACTGCTGTATTCCCGGATTAACGGATCTCGGATCCAATGGCCCTCTGTGCCGATCTAATACTGCACGCTTAGTGGGCGGGATCAGATGAATGGCACCTCAGCCCCCCGAATTCAGTTGCTGGCCAGACGAGGGCGGGGACTGTTTGGAATTATTTGCTCAGTCCTTTTATCATCCCGATGCTATGACTCAATCCTCTAGATCCTTGGATGTCTCAGGAAATCTCACACATCATAGTCAACAAGAAACGAGACAAACTCGACTTGAGACTTCATCGCCTACAGTGTTTTATTGTAACGGGCACCTCTATATGTCGTCTTGATGGCATCAACAGCGCATGGTGATACATCGCTAGCGGCATTAGGCTTGATTGGTGCTTGC
CGGGCGGGAGGCCATTTGGAGAGAGGCAGACTATCGTGGCATGCCGTAGCGCTTTGCATGCAGGTGGCGCGACCGTAAGGAGTGCAAGATGTAGATTGTCACGCTAAAGTTTATCACGTGATACTAGCTGACGTGTCCATAAGGCACGCAACAGCCTGCTCTAGGTTACTGTAGGGCTTGGCGATAGCATAGATAGGCCTGAGGGAGTTCTGGCGTAATAGTTGTTAGATAAAGCTGCCCAAATCCAACAGCTGGATTTCATGTGTGTTTGATAGCGCAATGCACTCATACTCAGTCCTTGCCAGCATGCTGTCACACGATGTACATCGTTAGCCCTAAGAGCCCCGTCGAGTAGCTAGTAAGCCTCATGAATGATACTCGGGGCCTCCCGACATAGACGCAGCTTGAGTGTCGGACGAGTATAAGCCATCCCAATGATTTGCCACTTAGAGAGTAGCGCCGTTTGGGATTGAGTCGAAGAGCGTGGCCTTAGACCACATATGATTTGCTTGCGCCTCCGTATCGCTTGCATTTGAGATGGAGCCTCATTTCTCTACCATCGCCGACTAGCAAGTTACCGATGGACAAGCCTAGCTTGTGTACTTTGAGAGTGGCTTCGTCACCAAAGGGTAGCCATAACCTCAATGGCTGTGATCTCTTACCCCCGGGGTCGGGCGAGATCTGGGCGAGAAGACTGCACGAGCCCTAGAAACTGCAAGTGGCACGGCTTCTTGTCCCATAGGCTATTGAGGGCATTGTTGAGTCGAAGTTTCTCCTAAAAATGTGAACATAGTTTCCCGCTCAGAGATACTCGCTTAAAACTCATACCATGGATGGCTGGAATGGACAAGCGGTATTCGTGCTGTGTAGGGATCCGCGTTGGTCTATTAACCACTGAGCGGATGCGGATTAAAGGGACAGACGATTACACGCCACGGAAGTCCTCGTCTGTGACGGGTCCCTCGCGTCTCCCCCAGAGGACCTTCATTCCCCGGTGGAGCGTCCATACGGTCTAGCTTGTACGCTTCGGGGTCGGGTATCGGACTGACCTATACGACAGACATATCCTAGAGAGGCCTAGATGGACCGGGAGCACGCGAGGGCAAACTCCCTCGCTATCCCACTTCGATTTCCCGGGGAGGGCGGCGTTTTAACACGTAAGGCACGTCTATTAGATGAGCTTATATATATGCGAACTTTGATCCAATTGGCACAGAACGTCAATTAAGAAAAATAATACGGAGATAGTGCCGCAATTGTCCATTTATACGCACCCTCTTTCTAGTATCTAACGTTCTTGGTACGCGGTCCACTAGACCCGACTCATAGCGTTATAATTTCCTGGTATCTATTAAATCGTCGGCCGTCTTTTCCACTAGTAACCTGCTCTTAGGCCGCAGGCACGGGCGTACGATACCCCCCGTACGGTGTAACATCAGTGCGAAGTAAATACGGGGCCAGCGTGTAGACGATAGTCATGTTAGCTGGAAGGGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATATTCTGAGTATGCCGATCCAGGTTTGGCAGCAACGGAAAATATCTTCTACTTGGGCCCCTATAACGAAATGTCTGCCTAACCACCTTTTTTCTGGACCCTCAACATGCCAGTTAACCCCGCGCGGGAAAAGCGTCTGGCGCGGGCGTCGGGATATACTGACCAGTAGAGCACTGATTAAAGTATTTGTGGTTAAAAATTCACAACGTATTCCATGCGGGACACCGACACGCACGTCAGTTGCTCGCAGGTGATGGTAGAGGGGTGGATCGACCGAGGTCGGGTTGGTGGGTAAAGGTTAGCCTGCACCACGCGAATGTGCTCCATTCAATTTTGGGGGTGCGATTCTCCGTTGCGGGATCCAAGAGGAGTTAAGATGGCCTTGTCCAGTTGAAACTTGGCTGTGGCATGGGCGACAAGATAAAAGGGTTATTACTGATCTA
GTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGCACTGAGGTCTAGTACGTACGATGAGTGAGCATCGTTATTGGAAAAAGTCATGAACCGG "
x = re.findall(r"(AATG)+", string)
What I am trying to do is find the consecutive repeats of AATG.
The problem is in my regex: it should return 13, but it gives me 27 back!
Here is a picture of the correct values that I should get for that DNA sequence.
The DNA sequence here is for Ron, whose row is highlighted in blue, so I should get 13 back instead of 27.
The regex and the code are in Python.
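To make the goal concrete, here is a minimal sketch of the result I am after, assuming that "13" means the length of the longest unbroken run of AATG. The helper name `longest_run` and the short `sample` sequence are made up for illustration and are not part of my project. One thing that is certain about `re.findall` is that with a single capturing group it returns only that group's text for each match, so the length of the resulting list is the number of separate runs rather than the length of any one run; that may be where the 27 comes from.

```python
import re

def longest_run(sequence, unit):
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    # (?:...) is a non-capturing group, so m.group() is the whole run,
    # not just the last repeated copy of the unit.
    pattern = re.compile(r"(?:" + re.escape(unit) + r")+")
    # Each match is one run; its length divided by the unit length
    # is the number of repeats in that run.
    runs = [len(m.group()) // len(unit) for m in pattern.finditer(sequence)]
    return max(runs, default=0)

# Hypothetical short sequence: a run of three AATG, then a single AATG.
sample = "AATGAATGAATGCCAATGTT"
print(longest_run(sample, "AATG"))  # 3
```

This scans for non-overlapping runs from left to right, which is fine for a unit like AATG that cannot overlap with itself; a unit that can overlap itself might need a different approach.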
I would really appreciate it if anyone could help. Thanks,