Regex Matching in Python

Hi! I have two pieces of data. The first is a list of strings that I want to trim down to a smaller list (call it “A”). The second is a set of strings (call them “B”) that may appear as partial matches inside the entries of A. For instance, if “abra” is in B, then the entry “abracadabra” in A should count as a match.
B currently takes the form “a|b|c|d|…” and is used with REGEXP_CONTAINS in Google BigQuery, which I believe corresponds to Python’s re.search(): it matches anywhere in the string, whereas re.match() only matches at the start.
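A quick way to see the search/match difference with an alternation pattern like B. This is a small sketch; the pattern "abra|cad" is just an illustrative stand-in for the real B string.

```python
import re

# B written as one alternation, like "a|b|c|..." (illustrative entries only)
pattern = re.compile("abra|cad")

print(bool(pattern.search("abracadabra")))  # True: "abra" found at position 0
print(bool(pattern.search("zzcad")))        # True: search scans the whole string
print(bool(pattern.match("zzcad")))         # False: match only tries position 0
```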
Using this method, I can get the entries in A that match something in B. But is there a way to find out exactly which string in B each one matched? Doing it with manual code makes the runtime extraordinarily long, and the regex engine must already be running through the options in B behind the scenes, so perhaps there’s a feature that tells you which alternative the match stopped at? If not, what’s an efficient way to implement this?
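On the Python side, one way to get this from a single regex pass is to wrap each B entry in its own capturing group and read `Match.lastindex`, which reports which group matched. This is a minimal sketch, assuming the B entries are plain substrings (so they are passed through `re.escape`); `build_matcher` and `which_b` are hypothetical helper names, and the sample data is made up.

```python
import re

def build_matcher(b_entries):
    """Compile B into one alternation with each entry in its own capturing group.

    Assumes B holds plain substrings, so each entry is escaped; if the entries
    are real regexes, they must not contain capturing groups of their own.
    """
    return re.compile("|".join(f"({re.escape(b)})" for b in b_entries))

def which_b(matcher, b_entries, text):
    """Return the B entry behind the first (leftmost) match in text, or None."""
    m = matcher.search(text)
    if m is None:
        return None
    # Only one group of the alternation takes part in a match,
    # so lastindex is that entry's 1-based position in b_entries.
    return b_entries[m.lastindex - 1]

B = ["abra", "zebra", "cad"]                                      # illustrative data
A = ["abracadabra", "horse", "zebra crossing", "escadrille"]
matcher = build_matcher(B)
matches = {a: which_b(matcher, B, a) for a in A}
trimmed = [a for a in A if matches[a] is not None]              # the smaller list
print(matches)
# {'abracadabra': 'abra', 'horse': None, 'zebra crossing': 'zebra', 'escadrille': 'cad'}
```

Note that the engine reports only the leftmost match, and at that position the first listed alternative wins, so “abracadabra” is attributed to “abra” even though it also contains “cad”. If every B entry contained in each A entry is needed, a per-entry check (or a multi-pattern matcher such as Aho-Corasick) is required instead.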
Thanks!

Please post your code so far!


I am not sure if it will help much, as it is an SQL query at the moment, but here it is:

    # comorbidities_str holds the B alternation, e.g. "a|b|c|d|..."
    query = f"""
    SELECT
        co.person_id,
        co.condition_source_value,
        REGEXP_CONTAINS(co.condition_source_value, r'{comorbidities_str}') AS is_comorbidity
    FROM
        `{path}.condition_occurrence` co
    """
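Since the matching already happens in BigQuery, one option is to also select REGEXP_EXTRACT with the same pattern. When the pattern has no capturing groups, it returns the first substring that matches (NULL when nothing matches), and for plain-text alternatives that substring is the B entry itself. A minimal sketch, assuming the B entries contain no regex metacharacters or quotes; `build_query` and the `matched_comorbidity` column name are hypothetical.

```python
def build_query(path, comorbidities_str):
    """Build the query with an extra column holding the matched B entry.

    Hypothetical helper; assumes comorbidities_str is a plain "a|b|c|..."
    alternation with no capturing groups and no single quotes.
    """
    return f"""
SELECT
    co.person_id,
    co.condition_source_value,
    REGEXP_CONTAINS(co.condition_source_value, r'{comorbidities_str}') AS is_comorbidity,
    -- first substring matching the pattern, or NULL if there is no match
    REGEXP_EXTRACT(co.condition_source_value, r'{comorbidities_str}') AS matched_comorbidity
FROM
    `{path}.condition_occurrence` co
"""

print(build_query("my_project.my_dataset", "abra|cad"))
```

As with the Python version, only the first match is returned per row, so a value containing several B entries is reported under just one of them.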