Simulating Browser Click Events with Selenium

Author: reposted from the web   Published: [ 2017/4/25 16:18:06 ]   Recommended tags: functional testing tools, Selenium

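When I first started writing crawlers, anything involving JavaScript was something I could only work around: dig through the page source as far as possible, or simulate POST requests and the like to get the content I wanted.
But not every POST can be simulated. More precisely, for POSTs that carry long encrypted payloads, a bewildering number of parameters, and responses that dump a pile of Java backend interface details on you, the cost of analysis is simply too high.
So simulating click events in a real browser is another workable strategy.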
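Why Selenium
Well, Selenium is an excellent suite of (browser) automated testing tools and components.
PhantomJS and Ghost.py might also do the job, but once I found that Ghost.py is very headless, and learned that a fairly good way to use PhantomJS from Python is through Selenium anyway, I simply went with Selenium.
Selenium with Python also has a carefully maintained unofficial documentation that is better written than even Ghost.py's official docs.
Speaking of documentation: a while ago I was using the grab library. Its author is Russian, so much of the troubleshooting material and many of the issues are in Russian. I very nearly learned Russian.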
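A short code example
selenium.webdriver has several implementations. The one below uses ChromeDriver and includes settings such as disabling image loading.
Overall, it is quite convenient to use.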
```python
import time

import pymongo
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By


class Crawler:
    def __init__(self):
        # selenium and webdriver settings
        # ignore the img: a content-setting value of 2 blocks image loading
        chrome_options = webdriver.ChromeOptions()
        prefs = {"profile.managed_default_content_settings.images": 2}
        chrome_options.add_experimental_option("prefs", prefs)
        website = 'your website'
        # Selenium 4 takes the chromedriver path via a Service object and the
        # options via `options=`; the old positional path and `chrome_options=`
        # keyword are gone in recent releases
        service = Service('your chromedriver location')
        self.driver = webdriver.Chrome(service=service, options=chrome_options)

        # mongo settings
        client = pymongo.MongoClient("localhost", 27017)
        db = client.your_db
        self.table = db.your_table

        # start
        self.driver.get(website)

    def crawl(self):
        while True:
            # refresh with a clean session
            self.driver.delete_all_cookies()
            self.driver.refresh()
            # the page may or may not show a decoy element; click it if present
            # (find_element(By..., ...) replaces the removed find_element_by_* helpers)
            try:
                self.driver.find_element(By.CLASS_NAME, 'decoy').click()
            except NoSuchElementException:
                pass
            time.sleep(1)
            # question
            question = self.driver.find_element(By.CLASS_NAME, 'wrapper').text
            # current word: the second space-separated token of the question
            word = question.split(' ')[1]
            # choices: the image URL inside each link's CSS background-image, e.g. url("...")
            links = self.driver.find_element(By.CLASS_NAME, 'choices').find_elements(By.TAG_NAME, 'a')
            choices = [
                a.value_of_css_property('background-image').split('(')[1].split(')')[0].strip('"\'')
                for a in links
            ]
            # .......


if __name__ == '__main__':
    Crawler().crawl()
```
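The `choices` comprehension in the crawler pulls an image URL out of the computed `background-image` value with chained `split` calls, which can go wrong if the URL itself contains parentheses. A somewhat sturdier alternative is sketched below as a hypothetical helper, `css_background_url` (it is not part of Selenium), that uses a regular expression accepting both quoted and unquoted `url(...)` values, since browsers may serialize the computed value either way. It assumes only the first `url(...)` in the value matters.

```python
import re
from typing import Optional

# Matches url(...) in a computed CSS value; the optional quote captured in
# group 1 must also close the URL, so quoted URLs may contain parentheses.
_CSS_URL = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""")


def css_background_url(value: str) -> Optional[str]:
    """Return the first URL inside a CSS background-image value, or None."""
    match = _CSS_URL.search(value)
    return match.group(2) if match else None


if __name__ == "__main__":
    print(css_background_url('url("http://example.com/a.png")'))
    print(css_background_url("url(http://example.com/b.png)"))
    print(css_background_url("none"))
```

Inside the crawler it would replace the split chain, e.g. `choices = [css_background_url(a.value_of_css_property('background-image')) for a in links]`; elements without a background image then yield `None` instead of raising an `IndexError`.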
