[Python] PTT crawler in Python: Crawling PTT with Python (Web Crawler), Part 2 - Jialin

A follow-up to this post: [Python] PTT crawler in Python: Crawling PTT with Python (Web Crawler)

This time the crawler gains the ability to move from one page to the next ^^

BeautifulSoup reads the page content, and Selenium drives the browser to do the crawling.

The target is PTT's movie board (movie):

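Line 11: the user enters how many pages to fetch, and the value is stored in the num_page variable.

Line 14: a while loop runs once for each page to fetch (num_page times).

Line 20: inspecting the page source shows that every post is wrapped in a tag whose class attribute is r-ent.

Line 26: after each page is fetched, the number of pages still to fetch drops by one (num_page is decremented by 1).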
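The steps described above can be sketched roughly as follows. This is not the original listing, so the line numbers in the notes do not map onto it. It is a minimal sketch that assumes the board index lives at https://www.ptt.cc/bbs/movie/index.html, that each post title is a link inside a class="title" element within the r-ent tag, and that the link to the previous (older) page is labelled "‹ 上頁". The function names extract_titles, crawl and main are made up for illustration.

```python
from bs4 import BeautifulSoup

BOARD_URL = "https://www.ptt.cc/bbs/movie/index.html"  # assumed entry page
PREV_PAGE_TEXT = "‹ 上頁"  # assumed label of the "previous page" link


def extract_titles(html):
    """Return the title of every post wrapped in a class="r-ent" tag."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for entry in soup.find_all(class_="r-ent"):
        # Assumption: the title link sits inside a class="title" element.
        link = entry.select_one("div.title a")
        if link is not None:  # entries without a title link are skipped
            titles.append(link.get_text(strip=True))
    return titles


def crawl(driver, num_page):
    """Collect titles from num_page consecutive index pages."""
    titles = []
    driver.get(BOARD_URL)
    while num_page > 0:
        # BeautifulSoup parses whatever page the browser currently shows.
        titles.extend(extract_titles(driver.page_source))
        num_page -= 1  # one page fewer left to fetch
        if num_page > 0:
            # "link text" is the string value of selenium's By.LINK_TEXT.
            driver.find_element("link text", PREV_PAGE_TEXT).click()
    return titles


def main():
    # Imported here so extract_titles can be reused without a browser.
    from selenium import webdriver

    num_page = int(input("How many pages to fetch? "))
    driver = webdriver.Chrome()
    try:
        for title in crawl(driver, num_page):
            print(title)
    finally:
        driver.quit()
```

Calling main() opens Chrome, asks for the page count, and prints the collected titles. Keeping the parsing in extract_titles, separate from the browser control in crawl, makes it possible to try the parser on a saved HTML file first.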

Code

Output

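This post uses BeautifulSoup together with Selenium to crawl posts across multiple pages.

It looks as though Selenium alone might be enough, though. I will look into that next =)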
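As a rough idea of what a Selenium-only version might look like, the sketch below reads the titles straight from the live page with Selenium's own element lookup instead of handing the HTML to BeautifulSoup. It is untested against the real site and rests on the same assumptions about PTT's markup (title links inside div.r-ent div.title, a previous-page link labelled "‹ 上頁"). The function name crawl_selenium_only and its parameters are hypothetical.

```python
def crawl_selenium_only(
    driver,
    num_page,
    start_url="https://www.ptt.cc/bbs/movie/index.html",  # assumed entry page
    prev_text="‹ 上頁",  # assumed label of the "previous page" link
):
    """Collect post titles from num_page pages using only Selenium lookups."""
    titles = []
    driver.get(start_url)
    while num_page > 0:
        # "css selector" is the string value of selenium's By.CSS_SELECTOR.
        links = driver.find_elements("css selector", "div.r-ent div.title a")
        titles.extend(link.text.strip() for link in links)
        num_page -= 1
        if num_page > 0:
            # "link text" is the string value of selenium's By.LINK_TEXT.
            driver.find_element("link text", prev_text).click()
    return titles


# Possible usage (requires selenium and a Chrome driver):
#     from selenium import webdriver
#     driver = webdriver.Chrome()
#     print(crawl_selenium_only(driver, 2))
#     driver.quit()
```

The trade-off is that every title read becomes a round trip to the browser, whereas the BeautifulSoup approach parses one snapshot of the page locally.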

For how to work with Selenium, see this post: [Python] Using Selenium with the Google Chrome browser

References:

Stack Overflow, http://stackoverflow.com/questions/15985339/how-do-i-get-current-url-in-selenium-webdriver-2-python

Comments and corrections are welcome =)

Jialin

Posted by Jialin on PIXNET


Jialin

It's more fun to be a pirate than to join the navy. (Steve Jobs)
email: jialin9112@gmail.com, my CodePen
