Chinese word handling on encode/decode

Dr_Strange · May 20, 2021, 5:00am

I am trying to scrape data from a website, but the Chinese text is missing from the extracted HTML. Do you have any suggestions on how to handle this?

import re
from datetime import datetime
from time import sleep

import numpy as np
import pandas as pd
import requests as rq
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import ElementClickInterceptedException
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager


# Configure Chrome to run headless
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')

# Start a single browser instance and pass the options to it
driver = webdriver.Chrome("chromedriver.exe", options=chrome_options)

base_url = "https://ps.hket.com/srde001/%E4%B8%80%E6%89%8B%E6%88%90%E4%BA%A4%E8%A8%98%E9%8C%84"

driver.get(base_url)

sleep(30)

# Get the source of the current page
html = driver.page_source

# decode to work around error
html_dec = html.encode('utf-8').decode('ascii', 'ignore')
print("Extract the whole html for checking")
print(html_dec)

Brain150 · May 21, 2021, 5:15pm

It all looks Chinese to me.

Could this help?
python - How to decode unicode in a Chinese text - Stack Overflow
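
One thing that stands out in the posted code: html.encode('utf-8').decode('ascii', 'ignore') silently drops every byte that is not ASCII. In UTF-8, Chinese characters consist entirely of non-ASCII bytes, so they vanish at that step. Here is a minimal sketch of the effect. The sample string is made up for illustration and is not taken from the site.

```python
# Hypothetical snippet standing in for driver.page_source
sample = '<div class="rt-th">成交日期</div>'

# What the posted code does: every non-ASCII byte is discarded
stripped = sample.encode('utf-8').decode('ascii', 'ignore')

# Round-tripping through UTF-8 keeps the text intact
kept = sample.encode('utf-8').decode('utf-8')

print(stripped)
print(kept)
```

In Python 3, driver.page_source is already a str, so it can usually be used directly without any encode/decode step.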

Dr_Strange · May 24, 2021, 6:30am

Dear Brain,

Thank you for the information. I need some time to test it. I am using 'utf-8', but it still fails.
The Chinese text in the div with class "rt-th" is missing from the extracted HTML, although I can see it on the website.
I still don't know why it is missing.
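
For anyone landing here with the same symptom: the thread does not show the original traceback, so this is an assumption, but if the error being worked around was a UnicodeEncodeError raised by print() on a console that cannot display Chinese, one option is to write the page source to a UTF-8 file and inspect it there instead of stripping characters. A rough sketch follows. The save_html helper, the file name, and the sample HTML are all hypothetical.

```python
import os
import tempfile


def save_html(html, path):
    """Write the page source to disk as UTF-8 without dropping any characters."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write(html)


# Hypothetical stand-in for driver.page_source
html = '<div class="rt-th">成交日期</div>'

path = os.path.join(tempfile.gettempdir(), 'page_source_check.html')
save_html(html, path)

# Read the file back to confirm the Chinese text survived
with open(path, encoding='utf-8') as f:
    restored = f.read()

print(restored == html)
```

Opening the saved file in a browser or an editor that understands UTF-8 should then show whether the "rt-th" headers were present in the page source.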

system · November 22, 2021, 6:30pm

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.
