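Example: Baidu search, paging through the result list
Problem addressed: element references going stale when the selenium driver fetches elements from a web page
Approach 1: collect all of the page-navigation link elements, then iterate over them and click each one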
# -*- coding: utf-8 -*-
from selenium import webdriver
import time

if __name__ == "__main__":
    driver = webdriver.Firefox()
    driver.maximize_window()
    driver.get('http://www.baidu.com')
    driver.implicitly_wait(5)
    driver.find_element_by_id('kw1').send_keys('selenium')
    driver.find_element_by_id('su1').click()
    page = driver.find_element_by_id('page')
    pages = page.find_elements_by_tag_name('a')  # find all page-navigation links
    # script that moves the scroll bar to the bottom of the page
    js = 'document.documentElement.scrollTop=10000'
    for each in pages:
        driver.execute_script(js)  # scroll to the bottom so the links are visible
        each.click()
        driver.execute_script(js)
        time.sleep(3)
    driver.quit()
Result: the program fails when it clicks page 3:
selenium.common.exceptions.StaleElementReferenceException: Message: u'Element not found in the cache - perhaps the page has changed since it was looked up' ; Stacktrace:
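That is, the element cannot be found in the cache, probably because the page changed after the element was looked up. This shows that once the current page navigates to a new one, the cached elements that belonged to the old page are cleared as well, so references obtained before the navigation can no longer be used.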
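The failure can be illustrated without a browser. The sketch below is a toy model only, not selenium's actual implementation: the names ToyPage, ToyLink, StaleReference, click_cached and click_relocated are hypothetical, and the "generation" counter is an assumed stand-in for the driver's element cache. It shows why links looked up once break after the first navigation, while links looked up again before each click keep working.

```python
class StaleReference(Exception):
    """Stand-in for selenium's StaleElementReferenceException (toy model)."""


class ToyPage:
    """Toy browser page: every navigation starts a new 'generation'."""

    def __init__(self, link_count):
        self.link_count = link_count
        self.generation = 0

    def find_links(self):
        # Each returned reference remembers the generation it was looked up in.
        return [ToyLink(self, i) for i in range(self.link_count)]


class ToyLink:
    def __init__(self, page, index):
        self.page = page
        self.index = index
        self.generation = page.generation

    def click(self):
        if self.generation != self.page.generation:
            raise StaleReference(
                "link %d was looked up on an older page" % self.index)
        # A successful click loads a new page, invalidating older references.
        self.page.generation += 1


def click_cached(page):
    """Approach 1: look up all links once, then click them in turn."""
    clicked = 0
    for link in page.find_links():
        link.click()
        clicked += 1
    return clicked


def click_relocated(page):
    """Look the links up again before every click."""
    clicked = 0
    for i in range(page.link_count):
        page.find_links()[i].click()
        clicked += 1
    return clicked
```

In this model click_cached raises StaleReference on its second click, which mirrors the failure at page 3 above (page 1 is already displayed, so the first link leads to page 2), whereas click_relocated clicks every link.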
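Approach 2: building on the analysis of why approach 1 failed, first count the page-navigation links, then for each click locate the next page link again on the newly loaded page, click it, and repeat.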
# -*- coding: utf-8 -*-
from selenium import webdriver
import time

if __name__ == "__main__":
    driver = webdriver.Firefox()
    driver.maximize_window()
    driver.get('http://www.baidu.com')
    driver.implicitly_wait(5)
    driver.find_element_by_id('kw1').send_keys('selenium')
    driver.find_element_by_id('su1').click()
    page = driver.find_element_by_id('page')
    pages = page.find_elements_by_tag_name('a')  # used only to count the links
    js = 'document.documentElement.scrollTop=10000'
    total = len(pages)
    has_pre_page = False
    page_num = 0
    for i in range(total):
        driver.execute_script(js)  # scroll to the bottom so the links are visible
        pn = 10
        page_num = page_num + 1  # set the page number (1-based index of the link)
        # locate the link again on the current page instead of reusing an old reference
        one_page = driver.find_element_by_css_selector('p[id="page"]>a:nth-of-type(' + str(page_num) + ')')
        one_page.click()