Dynamic Javascript Scraping – Web scraping with Beautiful Soup 4 p.4




Welcome to part 4 of the web scraping with Beautiful Soup 4 tutorial mini-series. Here, we’re going to discuss how to parse data that is dynamically updated via JavaScript.

Many websites supply data that is loaded dynamically via JavaScript. In Python, you can use Jinja templating to populate a page on the server without any JavaScript, but many websites fill in their data with JavaScript in the browser. To simulate this, I have added some JavaScript to the sample page: https://pythonprogramming.net/parsememcparseface/
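To see why this matters for scraping, here is a minimal sketch of what a plain HTML parser sees on a page like that. The HTML string is a hypothetical stand-in written for illustration, not a copy of the sample page, and the placeholder wording is an assumption. It uses only the standard library's html.parser so it runs without a network connection; Beautiful Soup should behave the same way here, since it also parses the markup as delivered and never executes scripts.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for a page whose paragraph is filled in by JavaScript.
# The markup and placeholder text are assumptions made for this illustration.
SAMPLE_HTML = """
<html><body>
<p class="jstest" id="yesnojs">placeholder before JavaScript runs</p>
<script>
document.getElementById('yesnojs').innerHTML = 'Look at you shinin!';
</script>
</body></html>
"""


class ParagraphText(HTMLParser):
    """Collect the text inside <p class="jstest"> exactly as it was served."""

    def __init__(self):
        super().__init__()
        self.in_target = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and dict(attrs).get("class") == "jstest":
            self.in_target = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_target = False

    def handle_data(self, data):
        # Script contents also arrive here, but only while in_target is False.
        if self.in_target:
            self.chunks.append(data)


def static_paragraph_text(html):
    """Return the paragraph text a non-browser client would see."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


print(static_paragraph_text(SAMPLE_HTML))
```

The parser only ever sees the placeholder, because nothing ran the script that replaces it. Getting the updated text requires something that actually executes the page's JavaScript before the HTML is handed to the parser.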



40 responses to “Dynamic Javascript Scraping – Web scraping with Beautiful Soup 4 p.4”

  1. Hi, thanks for making this tutorial. Can you also provide the code for PyQt5? I've tried installing PyQt4 but I just couldn't get it to install. I have no other choice but to work with the PyQt5 that comes with Python 3.6.

  2. Hi sentdex, thank you very much for sharing your Python programming experience. May I ask a question? Is it possible to extract the text "Look at you shinin!" from inside the <script> tag without mimicking the browser?

  3. At 2:20, when I run the code it shows this error:
    Traceback (most recent call last):
      File "C:\Users\username\Desktop\a.py", line 9, in <module>
        print(js_test.text)
    AttributeError: 'NoneType' object has no attribute 'text'

  4. Hi,
    I've just seen your video and it helped me understand the principle behind scraping dynamic pages. I tried the code on your page and it worked fine, but I ran into a problem: I tried it on another website, and after about 15 minutes the line "client_response = Client(url)" is still executing. Does scraping like this take an eternity for bigger sites? Or is something wrong with the code?
    I am using Python 3.6 and PyQt 4.11.
    Regards

  5. Can you make a tutorial explaining how to import data from a website that contains a list of links, where each link points to a different dataset? I wonder how to import those datasets from the links on the same webpage and combine them in a dataframe. Thanks!

  6. Working code:
    import sys
    from PyQt5.QtWidgets import QApplication
    from PyQt5.QtCore import QUrl
    # QtWebKitWidgets is only present in PyQt5 builds that still ship QtWebKit
    from PyQt5.QtWebKitWidgets import QWebPage
    import bs4 as bs
    import urllib.request

    class Client(QWebPage):
        def __init__(self, url):
            # A QApplication must exist before the page object is set up
            self.app = QApplication(sys.argv)
            QWebPage.__init__(self)
            # Leave the event loop once the page has finished loading
            self.loadFinished.connect(self.on_page_load)
            self.mainFrame().load(QUrl(url))
            self.app.exec_()

        def on_page_load(self):
            self.app.quit()

    url = 'https://pythonprogramming.net/parsememcparseface/'
    client_response = Client(url)
    # HTML after the page's JavaScript has run
    source = client_response.mainFrame().toHtml()

    soup = bs.BeautifulSoup(source, 'lxml')
    js_test = soup.find('p', class_='jstest')
    print(js_test.text)

  7. Nice tutorial. I am trying to scrape some data from the website http://www.airlinequality.com but I don't know why the code below is not working. Can you help me?

    from bs4 import BeautifulSoup
    import os
    import urllib.request
    import re

    thepage = urllib.request.urlopen("http://www.airlinequality.com/airline-reviews/aegean-airlines")
    soup = BeautifulSoup(thepage, "lxml")
    #print(soup)
    for profile in soup.findAll('article',{"itemprop":"review"}):
        image = profile.text
        print(image)

  8. Thanks a lot for this great tutorial! It works really nicely for scraping a single page, but when looping through multiple pages it retrieves all the HTML and then throws this error at the end:
    QObject::connect: Cannot connect (null)::configurationAdded(QNetworkConfiguration) to QNetworkConfigurationManager::configurationAdded(QNetworkConfiguration)
    QObject::connect: Cannot connect (null)::configurationRemoved(QNetworkConfiguration) to QNetworkConfigurationManager::configurationRemoved(QNetworkConfiguration)
    QObject::connect: Cannot connect (null)::configurationChanged(QNetworkConfiguration) to QNetworkConfigurationManager::configurationChanged(QNetworkConfiguration)
    QObject::connect: Cannot connect (null)::onlineStateChanged(bool) to QNetworkConfigurationManager::onlineStateChanged(bool)
    QObject::connect: Cannot connect (null)::configurationUpdateComplete() to QNetworkConfigurationManager::updateCompleted()
    Any idea on how to fix this?

  9. Hey. Is there any way I can use Beautiful Soup to fill out forms, click a button, then scrape information off of a page?

    I want to create a web scraper/crawler that will scrape textbook information off of an online textbook store. To search for the textbook, I need to fill out a form and pick several options (department, term, course, section, etc), click a submit button, and wait for the page to load.

    Any ideas?

    Thanks.

  10. It's amazing how every time I have a problem in Python I run into one of your tutorials and solve it XD. Just thank you. But I still have a question:
    To make the program lighter in case there are several scripts, can you somehow run only one of them?
    Thanks again for the tutorials :p

  11. Hi,
    Can we install PyQt4 on CentOS 6?
    Alternatively, I want to develop a web app and upload it to a VPS host for extracting data. PhantomJS causes so many problems in cgi-bin, so I thought QtWebKit could be better.

  12. I'm getting an error "AttributeError: 'Client' object has no attribute 'mainFrame'" any thoughts on how to fix this? I'm using Python 3 and PyQt5.

    For PyQt5 I used:
    from PyQt5.QtWidgets import QApplication
    from PyQt5.QtCore import QUrl
    from PyQt5.QtWebKitWidgets import QWebEnginePage

    I can't figure out what's causing that.

  13. Hello! Thank you for these lessons! What did I do wrong?
    Traceback (most recent call last):
      File "C:\Python\parse1.py", line 2, in <module>
        from PyQt4.QtGui import QApplication
    ImportError: DLL load failed: The specified module could not be found.

  14. @sentdex Bro, I've been watching your tutorials for a long time and they've helped me loads! <3 Love it! You make the hardest stuff easy, and you also show implementations! Can you please make some more tutorials on A.I. for beginners? Would love that, mate! Best wishes!

  15. I have a question! How can I make a new window in matplotlib? When I run plt.show(), it just shows the graph in the IPython console instead of opening a new window. I use the Anaconda Spyder Python IDE. Please tell me how to open a new window!
