MY GOAL:
- Extract the mobile numbers from images in Python. I have 500 images and can't open them one by one.
- All the images are in one folder. The program should remove duplicates and strip any alphabetical characters as well.
- Store all the mobile numbers in a single text file after removing duplicates.
- Support the country codes and mobile number formats used around the world. Currently I have added only the Indian mobile number format.
- PROBLEM: this code was working a year ago, but now it's not working. I also don't know how to give the path in Python. Can someone help me get the desired output?
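On the path question, here is a minimal sketch of one way to point Python at a folder and collect the image files in it. The helper name `list_image_files` and the set of file extensions are assumptions made for illustration, not part of the original script.

```python
from pathlib import Path

# Assumption: these are the image types in the folder; adjust as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def list_image_files(folder):
    """Return every image file under `folder` (subfolders included), sorted."""
    folder = Path(folder)
    if not folder.is_dir():
        # Fail loudly so a wrong path is noticed immediately.
        raise FileNotFoundError(f"Image folder not found: {folder}")
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

It could be called with an absolute path, for example `list_image_files(r"C:\Users\me\Pictures\Input")` (a hypothetical location), or with `Path(__file__).resolve().parent / "Input"` to mean an `Input` folder next to the script. Raising an error for a missing folder avoids the situation where the loop silently processes nothing.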
SOURCE CODE: Extracting phone numbers from multiple images using Python | by Ankit Gupta | Medium
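import os
import re

import pyperclip
import send2trash
from PIL import Image  # missing in the original, so Image.open raised NameError
# pytesseract also needs the Tesseract OCR engine installed and on PATH
from pytesseract import image_to_string

# The images are expected in an 'Input' folder next to this script
path = os.path.dirname(os.path.realpath(__file__))
input_path = os.path.join(path, 'Input')

all_text = []
for root, dirs, filenames in os.walk(input_path):
    for filename in filenames:
        # Join with root so files in subfolders are found too
        file_path = os.path.join(root, filename)
        try:
            with Image.open(file_path) as img:
                all_text.append(image_to_string(img))
            # Deleting the files scanned
            send2trash.send2trash(file_path)
        except Exception as error:
            # Report the failure; a bare except hid the real error before
            print('Skipped ' + filename + ': ' + str(error))
            continue

# +91 95959 59595 or 07557575575 or 0-99555 55999
phone_regex = re.compile(r'''(
    (\+91|0)?      # country code
    (\s|-|\.)?     # separator
    (\d{5})        # 5 digits
    (\s|-|\.)?     # separator
    (\d{5})        # 5 digits
    )''', re.VERBOSE)

text = '\n'.join(all_text)
matches = []
for groups in phone_regex.findall(text):
    # groups[3] and groups[5] are the two 5-digit halves of the number
    phone_num = ''.join([groups[3], groups[5]])
    matches.append(phone_num)

if len(matches) > 0:
    # dict.fromkeys removes duplicates and keeps the original order
    distinct_matches = list(dict.fromkeys(matches))
    if len(matches) != len(distinct_matches):
        print(str(len(matches) - len(distinct_matches)) + ' Duplicates Removed')
    else:
        print('No duplicates found!')
    pyperclip.copy('\n'.join(distinct_matches))
    print('Copied to clipboard')
else:
    print('No Phone no. was found!!')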
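For the remaining goals (numbers with other country codes, and a single text file instead of the clipboard), the following is a rough sketch using only the standard library. The names `extract_numbers` and `save_numbers`, the default country code `"91"`, and the length limits are assumptions for illustration. The pattern is deliberately loose and does not validate per-country formats; a dedicated library such as `phonenumbers` would likely be more robust for that.

```python
import re
from pathlib import Path

# Assumption: a candidate is an optional "+" and then digits that may be
# separated by spaces, dots, hyphens, or parentheses (never a newline).
CANDIDATE = re.compile(r"\+?\d[\d ().-]{8,}\d")


def extract_numbers(text, default_country_code="91"):
    """Return unique numbers from `text` as '+<digits>', in first-seen order."""
    seen = {}
    for candidate in CANDIDATE.findall(text):
        digits = re.sub(r"\D", "", candidate)  # drop every non-digit
        if candidate.startswith("+"):
            # Already international; 15 digits is the E.164 maximum,
            # the lower bound of 8 is an assumption.
            if not 8 <= len(digits) <= 15:
                continue
            number = "+" + digits
        else:
            # Assumption: a local number is 10 digits after leading zeros.
            national = digits.lstrip("0")
            if len(national) != 10:
                continue
            number = "+" + default_country_code + national
        seen[number] = None  # dict keys keep insertion order, no duplicates
    return list(seen)


def save_numbers(numbers, out_path):
    """Write one number per line to a UTF-8 text file and return its path."""
    out_path = Path(out_path)
    out_path.write_text("".join(n + "\n" for n in numbers), encoding="utf-8")
    return out_path
```

With the script above, the OCR output would be passed in as `extract_numbers('\n'.join(all_text))`, and the result written with `save_numbers(numbers, "numbers.txt")` (a hypothetical file name). Normalising every number to the same `+<digits>` form before deduplicating means `95959-59595` and `+91 95959 59595` count as one entry.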