May 2021 – Page 2 – moware's blog

[Python] 253 Writing a list to a CSV file

This post is about writing an ordinary (non-nested) list to a CSV file.

I often wrote nested lists to CSV, but surprisingly I had rarely done it with a flat list.

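Use writerow for a flat list. If you switch to writerows, each string gets split into single characters: writerows treats every element of the list as a row, and a string is iterated character by character.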

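import csv
import datetime

# Timestamped file name, e.g. "2105121530_<key>.csv" (key is assumed to be defined earlier)
datetime_now = datetime.datetime.now()
datetime_now_str = datetime_now.strftime('%y%m%d%H%M')
filename = f"/{datetime_now_str}_{key}.csv"

# data_list is the flat list to write, e.g. ["apple", "banana"]
# newline="" is what the csv module docs recommend when opening the file
with open(filename, "w", encoding="shift_jis", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(data_list)  # the whole list becomes one row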
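To see the difference without creating a file, here is a minimal sketch that writes the same flat list both ways into an in-memory io.StringIO buffer. The function names and the sample list are hypothetical, chosen only for illustration.

```python
import csv
import io


def write_with_writerow(items):
    """Write a flat list with writerow and return the CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(items)
    return buf.getvalue()


def write_with_writerows(items):
    """Write the same flat list with writerows and return the CSV text."""
    buf = io.StringIO()
    # Each string is treated as a row, so its characters become separate cells
    csv.writer(buf, lineterminator="\n").writerows(items)
    return buf.getvalue()


fruits = ["apple", "banana"]  # hypothetical sample data
print(repr(write_with_writerow(fruits)))   # 'apple,banana\n'
print(repr(write_with_writerows(fruits)))  # 'a,p,p,l,e\nb,a,n,a,n,a\n'
```

The first call gives one row with two cells; the second gives two rows of single letters, which is the "split into characters" symptom.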

For a nested list, it looks like this:

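with open(filename, "w", encoding="shift_jis") as f:
    # lineterminator='\n' avoids the blank line between rows that the default '\r\n' causes on Windows
    writer = csv.writer(f, lineterminator='\n')
    writer.writerows(nested_list)  # each inner list becomes one row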
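A minimal sketch of the nested case, again using an in-memory buffer instead of a file and reading the result back with csv.reader. The helper names and the sample table are hypothetical. The last line shows one way to write a flat list vertically (one value per row): wrap each element in its own list before calling writerows.

```python
import csv
import io


def write_nested(rows):
    """Write a nested list (one inner list per row) and return the CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def read_back(text):
    """Parse CSV text back into a list of rows; every cell comes back as str."""
    return list(csv.reader(io.StringIO(text)))


table = [["name", "score"], ["A", 1], ["B", 2]]  # hypothetical sample data
text = write_nested(table)
print(repr(text))       # 'name,score\nA,1\nB,2\n'
print(read_back(text))  # [['name', 'score'], ['A', '1'], ['B', '2']]

# A flat list written one value per row: wrap each element in its own list
print(repr(write_nested([[x] for x in ["apple", "banana"]])))  # 'apple\nbanana\n'
```

Note that numbers come back as strings, so convert them with int() or float() after reading.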

[Python] 252 A collection of scraping idioms

I put these together partly to organize what I know.

Loading a URL with selenium

from selenium import webdriver

driver = webdriver.Chrome()

driver.get("URL")

selenium: headless mode

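from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

option = Options()
option.add_argument('--headless')
# Selenium 4 passes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('/usr/local/bin/chromedriver'), options=option)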

selenium: clicking an element

from selenium.webdriver.common.by import By
driver.find_element(By.XPATH, "XPATH").click()

selenium: waiting up to 30 seconds for an element to be present

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

WebDriverWait(driver,30).until(EC.presence_of_element_located((By.ID, "ID")))

selenium: waiting up to 30 seconds for an element to become clickable

WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.ID, "ID")))

selenium: typing a string into an element

from selenium.webdriver.common.by import By
driver.find_element(By.XPATH, "XPATH").send_keys("STRING")

selenium: getting an element's text

from selenium.webdriver.common.by import By
textA = driver.find_element(By.XPATH, "XPATH").text

selenium: switching to another tab

handle_array = driver.window_handles
driver.switch_to.window(handle_array[NUMBER])

selenium: hovering the cursor over an element

from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

actions = ActionChains(driver)
actions.move_to_element(driver.find_element(By.XPATH, "XPATH")).perform()

[Python] 251 Choosing a parser for BeautifulSoup

A scraping topic, for the first time in a while.

A quick note, because lxml, the parser I had assumed could handle anything, turned out not to.

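from bs4 import BeautifulSoup

# ...(omitted)...

# Get the page source from the webdriver
html = driver.page_source.encode('utf-8')

# I normally use lxml
try:
    soup = BeautifulSoup(html, "lxml")
# If that fails, fall back to Python's built-in html.parser.
# This branch only runs when lxml raises (e.g. bs4.FeatureNotFound if lxml is not installed);
# a page that lxml parses without error but incompletely will not trigger it.
except Exception:
    soup = BeautifulSoup(html, "html.parser")

# Extract elements (example)
elements = soup.find_all("td", {"class": "txt_l"})
href_l = [str(e) for e in elements if 'href="/race' in str(e)]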
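The try/except around the BeautifulSoup constructor only switches parsers when an exception is raised. A minimal sketch of another approach, assuming you can tell a good parse from a bad one by whether the elements you need are found: try each parser in order and keep the first result that passes a check. The helper name parse_with_check, the sample HTML, and the check function are hypothetical; FeatureNotFound is the exception bs4 raises when the requested parser library is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def parse_with_check(html, check, parsers=("lxml", "html.parser")):
    """Hypothetical helper: try each parser in order and return
    (parser_name, soup) for the first soup where check(soup) is truthy.
    Returns (None, None) if no parser passes the check."""
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # The parser library (e.g. lxml) is not installed; try the next one
            continue
        if check(soup):
            return parser, soup
    return None, None


html = '<table><tr><td class="txt_l"><a href="/race/1">R1</a></td></tr></table>'
parser, soup = parse_with_check(html, lambda s: s.find_all("td", {"class": "txt_l"}))
print(parser)  # "lxml" if it is installed, otherwise "html.parser"

# Pull out the href values themselves rather than the whole tag as a string
hrefs = [
    a["href"]
    for td in soup.find_all("td", {"class": "txt_l"})
    for a in td.find_all("a", href=True)
    if a["href"].startswith("/race")
]
print(hrefs)  # ['/race/1']
```

Because lxml and html.parser can build different trees from the same malformed HTML, checking the result for the elements you actually need is more reliable than waiting for an exception.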
