Pythonic problem with lists

I have a list of IDs:

list_of_ids = ['abc','xyz','123']

And another very big list of long strings, each of which contains an ID:

big_list = ['>abc\nssssssdadadddddddddddda','>qwe\nsaddwwwwwwwwwwwww',...,'>uio\nasdf','123\ndaswdwwwwwwwwww']

I need to pull out the long strings that contain one of the IDs from list_of_ids.

I’ve tried this:

new_list = []
for ID in list_of_ids:
    for i in big_list:
        if ID in i:
            new_list.append(i)

But something is wrong with my logic and I can’t see what. I’m a biologist, not a programmer.

I’d appreciate any help with this.
All the best

Can you explain what is going wrong? I’m a bit rusty on my Python, but that should be working.

Could you make a minimal example on https://repl.it/ that demonstrates the issue?

A copy-paste of your code seems to work fine for me, though you might have issues with duplicate entries:
my repl link

@ArielLeslie @JeremyLT https://repl.it/@biomg/problem-with-list#main.py
I don’t know what is wrong… It works fine on small data, but not on big data. :exploding_head:

There are 477154 strings in really_big_list and 29219 names… and the script gives me back 477154 strings in new_list, not 29219.

If a string in really_big_list contains multiple IDs, then it will get added to the new_list more than once.

For example, if you had '>abc\nlkjoihohohojohnohn123lkjohhohopihnohjo' in your example, then it would get added to new_list once when 'abc' was found in it and once when '123' was found in it.
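Here is a minimal sketch of that duplicate effect, using the lists from the original post plus the example string above. The name `deduplicated` and the reordered loop are only an illustration of one possible fix, not necessarily what the original script needs:

```python
list_of_ids = ['abc', 'xyz', '123']
big_list = [
    '>abc\nlkjoihohohojohnohn123lkjohhohopihnohjo',  # contains both 'abc' and '123'
    '>qwe\nsaddwwwwwwwwwwwww',
]

# Original approach: the outer loop runs over the IDs, so a string that
# contains two IDs is appended twice.
new_list = []
for ID in list_of_ids:
    for i in big_list:
        if ID in i:
            new_list.append(i)

print(len(new_list))  # 2, although only one string matched

# Possible fix (illustrative): loop over the strings instead, so each
# string is kept at most once no matter how many IDs it contains.
deduplicated = [s for s in big_list if any(ID in s for ID in list_of_ids)]

print(len(deduplicated))  # 1
```

Looping over the strings first also keeps the output in the same order as big_list.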

My suspicion is that in order to do what you really want to, you’re going to want to use regular expressions.
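A rough sketch of what the regex approach could look like. It assumes each string starts with an optional '>' followed by the ID, and that the ID ends at the first whitespace (the newline in the examples). That layout is a guess from the sample data, and `HEADER` and `select_by_header_id` are made-up names for illustration. The elided `...` entry from the original big_list is left out here.

```python
import re

# Assumed layout: optional '>', then the ID, which ends at the first whitespace.
HEADER = re.compile(r'^>?(\S+)')


def select_by_header_id(records, ids):
    """Keep the records whose header ID is exactly one of the wanted IDs."""
    wanted = set(ids)  # set membership is fast, even with tens of thousands of IDs
    selected = []
    for record in records:
        match = HEADER.match(record)
        if match and match.group(1) in wanted:
            selected.append(record)
    return selected


list_of_ids = ['abc', 'xyz', '123']
big_list = [
    '>abc\nssssssdadadddddddddddda',
    '>qwe\nsaddwwwwwwwwwwwww',
    '>uio\nasdf',
    '123\ndaswdwwwwwwwwww',
]

result = select_by_header_id(big_list, list_of_ids)
print(len(result))  # 2
```

Because only the header is compared, an ID that happens to appear inside the long sequence part no longer counts as a match, and each string is added at most once.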

OK thanks I will consider regex :slightly_smiling_face:

Good luck. Happy coding!

Is it possible that every string accidentally matches?
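One way that could happen, purely as a guess: if list_of_ids contains an empty string (for example from a blank line in the file the names were read from), then every string matches, because `'' in s` is always True in Python. A small sketch with made-up data:

```python
# Hypothetical data: the '' entry stands in for a blank line that was read
# along with the real IDs.
list_of_ids = ['abc', '']
big_list = ['>abc\nssss', '>qwe\nsadd', '>uio\nasdf']

matches = [s for s in big_list if any(ID in s for ID in list_of_ids)]
print(len(matches) == len(big_list))  # True: the empty ID matches everything

# Dropping empty or whitespace-only IDs before matching avoids this.
cleaned_ids = [ID for ID in list_of_ids if ID.strip()]
matches_cleaned = [s for s in big_list if any(ID in s for ID in cleaned_ids)]
print(len(matches_cleaned))  # 1
```

Very short IDs can have a similar effect, since a few characters are likely to turn up somewhere inside a long sequence string.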

Your code doesn’t seem to run:
repl.it link