Scraping JavaScript pages with Scrapy and Splash



This video is part of the “Learn Scrapy” series. It shows how to use Splash to render JavaScript-based pages for your Scrapy spiders.

Have a look at the companion website: https://learn.scrapinghub.com/scrapy/

– Splash docs: https://splash.readthedocs.io/en/stable/
– Scrapy-Splash plugin: https://github.com/scrapy-plugins/scrapy-splash

Settings for scrapy-splash (these go in your project's settings.py):

SPLASH_URL = 'http://localhost:8050'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
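
The SPLASH_URL setting points at the Splash HTTP API. As a rough illustration of what a rendering request looks like, the sketch below builds a URL for Splash's render.html endpoint using only the standard library. The helper name splash_render_url, the default wait of 0.5 seconds, and the example target page are assumptions made for this sketch and do not come from the video. In a real spider you would normally let scrapy-splash do this for you by yielding a SplashRequest instead of building URLs by hand.

```python
from urllib.parse import urlencode

# Same value as SPLASH_URL in the settings above.
SPLASH_URL = "http://localhost:8050"


def splash_render_url(page_url, wait=0.5, splash_url=SPLASH_URL):
    """Build a URL for Splash's render.html endpoint (hypothetical helper).

    page_url: the JavaScript-heavy page you want rendered.
    wait: seconds Splash should wait after loading so scripts can run.
    """
    # urlencode percent-escapes the target URL so its own query string
    # does not get mixed up with Splash's arguments.
    query = urlencode({"url": page_url, "wait": wait})
    return f"{splash_url.rstrip('/')}/render.html?{query}"


if __name__ == "__main__":
    # Illustrative target only; any JavaScript-rendered page would do.
    print(splash_render_url("http://quotes.toscrape.com/js/"))
```

Fetching the resulting URL from a running Splash instance should return the page's HTML after its JavaScript has executed, which is the same kind of response a spider would receive through the middlewares configured above.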
