[Python] 251 Choosing a parser for BeautifulSoup

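It has been a while since my last scraping post.

I used to think of lxml as a parser that handles everything, but it turned out not to be, so here is a quick note.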
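To illustrate why the parser choice matters, here is a minimal sketch that feeds the same broken markup to several parsers and prints the tree each one builds. The input string is a made-up example, not taken from the page I was scraping, and parsers that are not installed are simply reported as None.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical broken input: a stray closing tag with no matching opener
broken = "<a></p>"

results = {}
for name in ("lxml", "html.parser", "html5lib"):
    try:
        results[name] = str(BeautifulSoup(broken, name))
    except FeatureNotFound:
        # bs4 raises this when the named parser is not installed
        results[name] = None

# Each parser repairs the markup in its own way, so the trees can differ
for name, tree in results.items():
    print(f"{name:12} {tree}")
```

Since each parser repairs invalid HTML differently, a selector that matches under one parser may find nothing under another.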

from bs4 import BeautifulSoup

(omitted)

# Get the source of the web page
html = driver.page_source.encode('utf-8')

# I normally use lxml
try:
    soup = BeautifulSoup(html, "lxml")
except Exception:
    # If that fails, try the default html.parser instead
    soup = BeautifulSoup(html, "html.parser")

# Extract elements (example)
elements = soup.find_all("td", {"class": "txt_l"})
href_l = [str(e) for e in elements if 'href="/race' in str(e)]
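
One caveat with the try/except above: as far as I can tell, lxml usually returns some tree even for malformed markup, so the fallback may fire only when lxml is not installed. The sketch below is one possible alternative, not what the original script does: it falls back when the parsed result does not contain the expected elements. The helper name `parse_with_fallback`, the `is_usable` check, and the sample HTML are all hypothetical.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def parse_with_fallback(html, is_usable, parsers=("lxml", "html.parser")):
    """Return (soup, parser_name) for the first parser whose result passes is_usable.

    If no result passes the check, the last successfully built soup is returned.
    """
    last_soup, last_name = None, None
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # The requested parser is not installed; try the next one
            continue
        last_soup, last_name = soup, name
        if is_usable(soup):
            return soup, name
    if last_soup is None:
        raise RuntimeError("none of the requested parsers is installed")
    return last_soup, last_name


# Hypothetical sample page; the real page structure is an assumption
sample = """
<table>
  <tr><td class="txt_l"><a href="/race/1/">Race A</a></td></tr>
  <tr><td class="txt_l"><a href="/horse/2/">Horse B</a></td></tr>
</table>
"""

# CSS selector: links under td.txt_l whose href starts with "/race"
selector = 'td.txt_l a[href^="/race"]'

soup, used = parse_with_fallback(sample, lambda s: bool(s.select(selector)))
race_links = [a["href"] for a in soup.select(selector)]
print(used, race_links)
```

Selecting the `a` tags directly also yields the href values themselves rather than the string form of the whole `td` element.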
