
[Python] Crawling with Selenium in Colab (Fixing a WebDriverException)

xoghks_h 2024. 5. 7. 17:47

Crawling with Selenium's WebDriver

# Install additional packages
!pip install supabase # Install the Supabase SDK
!pip install selenium # Test automation tool used to drive a headless browser
!pip install beautifulsoup4 # HTML parsing tool

 

  • In Chrome, click the ... menu button at the top right → Settings → About Chrome at the bottom to check your Chrome version
  • Download the ChromeDriver build for that version
 

Chrome for Testing availability (googlechromelabs.github.io)

import platform
import sys, os, requests, zipfile

# Check the operating system and architecture
os_name = platform.system().lower()
architecture = platform.machine()

if os_name == 'darwin':
    if architecture == 'arm64':
        print("OS: macOS, architecture: ARM64")
    elif architecture == 'x86_64':
        print("OS: macOS, architecture: x64")
elif os_name == 'windows':
    if sys.maxsize > 2**32:
        print("OS: Windows, architecture: 64-bit")
    else:
        print("OS: Windows, architecture: 32-bit")
else:
    print(f"OS: {os_name}, architecture: {architecture}")
chrome_driver_url = 'https://storage.googleapis.com/chrome-for-testing-public/124.0.6367.91/linux64/chromedriver-linux64.zip'
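
The download URL above looks like it follows a fixed pattern: version, then platform, then an archive named after the platform. Here is a minimal sketch of building it from the values detected earlier, assuming that pattern holds for other builds. `cft_platform` and `chromedriver_url` are hypothetical helper names that I made up for illustration; the platform labels are the ones listed on the Chrome for Testing page.

```python
import platform
import sys

# Base path taken from the URL used in this post.
BASE_URL = "https://storage.googleapis.com/chrome-for-testing-public"


def cft_platform(system=None, machine=None, is_64bit=None):
    """Map the local OS/architecture to a Chrome for Testing platform label."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    if is_64bit is None:
        is_64bit = sys.maxsize > 2**32
    if system == "darwin":
        return "mac-arm64" if machine == "arm64" else "mac-x64"
    if system == "windows":
        return "win64" if is_64bit else "win32"
    if system == "linux":
        # Assumption: an x86_64 Linux runtime such as Colab.
        return "linux64"
    raise ValueError(f"unsupported platform: {system}/{machine}")


def chromedriver_url(version, plat):
    """Build the ChromeDriver zip URL, assuming the pattern seen above."""
    return f"{BASE_URL}/{version}/{plat}/chromedriver-{plat}.zip"
```

For example, `chromedriver_url('124.0.6367.91', cft_platform())` on a Linux runtime gives the same URL as the one assigned to `chrome_driver_url` above.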
  • Install ChromeDriver and check that it works
# Download
os.makedirs('./driver', exist_ok=True)
with requests.get(chrome_driver_url) as response:
    response.raise_for_status()  # Stop here if the download failed instead of saving an error page
    with open('./driver/chromedriver.zip', 'wb') as file:
        file.write(response.content)


# Extract
with zipfile.ZipFile('./driver/chromedriver.zip') as zip_ref:
    zip_ref.extractall('./driver')
os.remove('./driver/chromedriver.zip')
from glob import glob

# Locate the extracted driver binary (the archive unpacks into a subfolder).
# Windows builds ship chromedriver.exe; macOS and Linux (Colab) have no extension.
driver_name = 'chromedriver.exe' if os_name == 'windows' else 'chromedriver'
driver_path = glob(f'./driver/**/{driver_name}', recursive=True)[0]
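
One thing to watch for on Linux runtimes like Colab: `zipfile.extractall` does not restore Unix permission bits, so the extracted driver may not be executable. The sketch below combines the extract and locate steps and sets the execute bits explicitly. `extract_driver` is a hypothetical helper name, and this is only one way to do it, not something the original steps included.

```python
import os
import stat
import zipfile
from pathlib import Path


def extract_driver(zip_path, dest_dir, binary_name="chromedriver"):
    """Extract the archive, find the driver binary, and mark it executable."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)

    # The archive unpacks into a subfolder, so search recursively.
    candidates = sorted(
        p for p in Path(dest_dir).rglob(binary_name) if p.is_file()
    )
    if not candidates:
        raise FileNotFoundError(f"{binary_name} not found under {dest_dir}")

    binary = candidates[0]
    # Add execute permission for user, group, and others.
    mode = binary.stat().st_mode
    binary.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return str(binary)
```

With this helper, `driver_path = extract_driver('./driver/chromedriver.zip', './driver')` would replace the separate extract and glob steps.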
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
service = Service(executable_path=driver_path)
chrome_options =  webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--blink-settings=imagesEnabled=false')

driver = webdriver.Chrome(service=service, options=chrome_options)
url = 'URL to crawl'
driver.get(url)
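
For reference, here is what each of the flags above does, written as a small helper that returns the argument list. `headless_chrome_args` is a hypothetical name used only for this sketch; the flags themselves are the ones already used above.

```python
def headless_chrome_args(block_images=True):
    """Return the Chrome flags used in this post, with a note on each."""
    args = [
        # Run Chrome without opening a visible window.
        "--headless",
        # Historically recommended for headless mode; mostly harmless today.
        "--disable-gpu",
        # Avoid /dev/shm, which can be too small inside containers.
        "--disable-dev-shm-usage",
        # Chrome refuses to start as root (as in Colab) with the sandbox on.
        "--no-sandbox",
    ]
    if block_images:
        # Skip image loading to speed up page loads when only HTML is needed.
        args.append("--blink-settings=imagesEnabled=false")
    return args


# Usage with Selenium (not executed here):
#   chrome_options = webdriver.ChromeOptions()
#   for arg in headless_chrome_args():
#       chrome_options.add_argument(arg)
```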

If everything above runs without problems, the crawl succeeds. In my case, though, creating the WebDriver object failed with an error saying that the Chrome binary could not be found. I could not tell whether the Selenium version or the Chrome browser was to blame.

Error details
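
I did not confirm the root cause, but one plausible explanation is that only ChromeDriver was downloaded while the runtime has no Chrome browser installed. A quick way to check that is to look for a browser executable on the PATH. The sketch below is a rough check under that assumption; `find_chrome_binary` and the list of candidate names are hypothetical and may not cover every install.

```python
import shutil

# Common executable names for Chrome/Chromium on Linux (an assumed, non-exhaustive list).
CHROME_CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium-browser",
    "chromium",
    "chrome",
)


def find_chrome_binary(candidates=CHROME_CANDIDATES):
    """Return the path of the first browser executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None
```

If this returns `None`, the driver has no browser to launch. If a browser exists in a non-standard location, its path can be passed to Selenium through `chrome_options.binary_location`.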

 

When a WebDriverException occurs: SeleniumBase

 From what I found, since Selenium was updated to its latest version, you reportedly have to use the Driver class from the seleniumbase library for this to work properly.

# Install the package
!pip install seleniumbase
from seleniumbase import Driver
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup

driver = Driver(browser="chrome", headless=True)
url = 'URL to crawl'
driver.get(url)

print(driver) # Check that the driver was created
# Parse
soup = BeautifulSoup(driver.page_source, 'html.parser')

# Always release the browser resources once crawling is finished.
driver.close()
driver.quit()
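
If the crawl raises an exception partway through, the cleanup lines above never run and the browser process stays alive. A minimal sketch of one way to guard against that is a small context manager that always calls `quit()`. `managed_driver` is a hypothetical helper, not part of Selenium or SeleniumBase; it only assumes the driver object has a `quit()` method.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(make_driver):
    """Create a driver with make_driver() and always quit it afterwards."""
    driver = make_driver()
    try:
        yield driver
    finally:
        # quit() closes every window and ends the session,
        # so a separate close() call is not needed here.
        driver.quit()


# Usage with SeleniumBase (not executed here):
#   with managed_driver(lambda: Driver(browser="chrome", headless=True)) as driver:
#       driver.get(url)
#       soup = BeautifulSoup(driver.page_source, 'html.parser')
```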