Let's say I want to parse a list of links and return the result in dictionary format.
I am following along with an online course and want to write a solution that can validate multiple links (a list of links). How do I go about that?
import re

def url_validate(input):
    # The domain class is [A-Za-z-]; the original [A-za-z-] also matched
    # the punctuation characters that sit between 'Z' and 'a' in ASCII.
    url_reg = re.compile(r'(https?)://(www\.[A-Za-z-]{2,256}\.[a-z]{2,6})([-a-zA-Z0-9@:%_\+.~#?&//=]*)')
    match = url_reg.search(input)
    if match:
        return {
            'Protocol': match.group(1),
            'Domain': match.group(2),
            'Remaining': match.group(3)
        }
    return f"This is not url {input}"

url_validate("https://www.youtube.com/watch?v=emHAoQGoQic&list=LLEvmU2o3RMbp4lpXdKgfCnw&index=5&t=0s")
This works fine with a single link:
Out:
{'Protocol': 'https',
'Domain': 'www.youtube.com',
'Remaining': '/watch?v=emHAoQGoQic&list=LLEvmU2o3RMbp4lpXdKgfCnw&index=5&t=0s'}
As I understand it, you have a list of links and you want to parse each one of them. Based on that, I edited your function so that it takes a list of links and returns a list of dictionaries (one dict per link). Here's my solution:
import re

# Compile the pattern once instead of on every loop iteration.
# The domain class is [A-Za-z-] (the original [A-za-z-] was a typo).
url_reg = re.compile(r'(https?)://(www\.[A-Za-z-]{2,256}\.[a-z]{2,6})([-a-zA-Z0-9@:%_\+.~#?&//=]*)')

def url_validate(input_links):
    output_links = []
    for link in input_links:
        match = url_reg.search(link)
        if match:
            output_links.append({
                'Protocol': match.group(1),
                'Domain': match.group(2),
                'Remaining': match.group(3)
            })
        # Links that do not match the pattern are skipped.
    return output_links

links = ["https://www.youtube.com/watch?v=emHAoQGoQic&list=LLEvmU2o3RMbp4lpXdKgfCnw&index=5&t=0s", "https://www.google.com/", "blah blah blah"]
print(url_validate(links))
Note: the code returns only the valid links as dicts; if it finds an invalid link, it simply skips it.
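If you would rather keep track of the links that were skipped, here is a minimal sketch of one way to do it. The function name `url_validate_all` and the `'valid'` / `'invalid'` keys are made up for illustration, and the pattern is the one from the question with the domain character class written as `A-Za-z`:

```python
import re

# Pattern from the question, compiled once.
# Assumption: the domain class is meant to be A-Za-z.
URL_REG = re.compile(
    r'(https?)://(www\.[A-Za-z-]{2,256}\.[a-z]{2,6})([-a-zA-Z0-9@:%_\+.~#?&//=]*)'
)

def url_validate_all(input_links):
    """Hypothetical variant: return parsed links and rejected links separately."""
    parsed, rejected = [], []
    for link in input_links:
        match = URL_REG.search(link)
        if match:
            parsed.append({
                'Protocol': match.group(1),
                'Domain': match.group(2),
                'Remaining': match.group(3),
            })
        else:
            # Keep the raw string so the caller can report it.
            rejected.append(link)
    return {'valid': parsed, 'invalid': rejected}

result = url_validate_all(["https://www.google.com/", "blah blah blah"])
print(result['valid'])
print(result['invalid'])
```

With this shape the caller decides what to do with bad input (log it, raise an error, show a message) instead of the function silently dropping it.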
Hope this helps.