Handling Chinese characters on encode/decode

I am trying to scrape data from a website, but the Chinese text is missing from the extracted HTML. Do you have any suggestions on how to handle this?

import re
from datetime import datetime
from time import sleep

import numpy as np
import pandas as pd
import requests as rq
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import (
    ElementClickInterceptedException,
    NoSuchElementException,
    TimeoutException,
)
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager


chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
# Start a single browser session and pass in the options defined above
driver = webdriver.Chrome("chromedriver.exe", options=chrome_options)

base_url = "https://ps.hket.com/srde001/%E4%B8%80%E6%89%8B%E6%88%90%E4%BA%A4%E8%A8%98%E9%8C%84"

driver.get(base_url)

sleep(30)

# Get the source of the current page
html = driver.page_source

# decode to work around error
html_dec = html.encode('utf-8').decode('ascii', 'ignore')
print("Extract the whole html for checking")
print(html_dec)
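
A note on the decode step in the script above: encoding a string to UTF-8 and then decoding the bytes as ASCII with errors set to 'ignore' discards every non-ASCII byte, so no Chinese character can survive it. Here is a minimal sketch of that effect. The HTML fragment is made up for illustration (the text 成交日期 is an assumed example, not copied from the page); only the class name "rt-th" comes from this thread.

```python
# Hypothetical fragment standing in for driver.page_source.
sample = '<div class="rt-th">成交日期</div>'

# What the script does: non-ASCII bytes are silently dropped.
stripped = sample.encode('utf-8').decode('ascii', 'ignore')

# A lossless round trip decodes with the same codec used to encode.
kept = sample.encode('utf-8').decode('utf-8')

print(repr(stripped))   # the div is now empty
print(kept == sample)   # the UTF-8 round trip preserves the text
```

Since driver.page_source is already a Python str, it may be enough to use it directly and skip the encode/decode step.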

It all looks Chinese to me :flushed:

Can this help you?
python - How to decode unicode in a Chinese text - Stack Overflow

Dear Brain,

Thank you for the information. I need some time to test it. I am using ‘utf-8’, but it still fails.
The Chinese text in the div with class “rt-th” is missing from the output, although I can see it on the website.
I still don’t know why it is missing.
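
One thing that might be worth trying, assuming the ASCII decode was added because print raised a UnicodeEncodeError on the console (the thread does not say which error it was): keep the page source as a normal string and write it to a file with an explicit UTF-8 encoding, then inspect the file. A rough sketch follows. The helper name save_html, the file name, and the sample fragment are all hypothetical.

```python
import os
import tempfile


def save_html(html: str, path: str) -> None:
    """Write the HTML to disk as UTF-8 so non-ASCII text is preserved."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write(html)


# Hypothetical fragment standing in for driver.page_source.
sample = '<div class="rt-th">成交日期</div>'

path = os.path.join(tempfile.mkdtemp(), 'page.html')
save_html(sample, path)

# Read the file back with the same encoding to confirm nothing was lost.
with open(path, encoding='utf-8') as f:
    restored = f.read()

print(restored == sample)
```

In the real script, html from driver.page_source would take the place of sample.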
