• Friday , 22 September 2017

Dynamic Javascript Scraping – Web scraping with Beautiful Soup 4 p.4




Welcome to part 4 of the web scraping with Beautiful Soup 4 tutorial mini-series. Here, we're going to discuss how to parse data that is dynamically updated via JavaScript.

Many websites supply data that is loaded dynamically via JavaScript. In Python, you can use Jinja templating to fill in data on the server without any JavaScript, but many websites still rely on JavaScript to populate the page. To simulate this, I have added some JavaScript to the sample page: https://pythonprogramming.net/parsememcparseface/
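To see why this matters, here is a minimal sketch of what a plain parse gives you when the page relies on JavaScript. The HTML string below is a hypothetical stand-in, not the real markup of the sample page, and the `static_text` helper name is made up for illustration. Beautiful Soup only reads the markup it is handed and never runs the script, so you get the placeholder text rather than the updated value.

```python
import bs4 as bs

# Hypothetical stand-in for a page whose paragraph is rewritten by JavaScript.
# The real sample page's markup may differ; only the 'jstest' class is assumed.
SAMPLE_HTML = """
<html>
  <body>
    <p class="jstest" id="target">placeholder text</p>
    <script>
      document.getElementById('target').innerHTML = 'text set by javascript';
    </script>
  </body>
</html>
"""

def static_text(html):
    """Return the text of the first <p class="jstest">, or None if it is missing."""
    soup = bs.BeautifulSoup(html, 'html.parser')
    js_test = soup.find('p', class_='jstest')
    # find() returns None when nothing matches, so guard before using .text
    return js_test.text if js_test is not None else None

# The script is never executed, so this prints the placeholder.
print(static_text(SAMPLE_HTML))
```

The `None` check is worth keeping in real code too: `soup.find` returns `None` when the tag is not in the markup, and calling `.text` on that raises an `AttributeError`.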

https://pythonprogramming.net


40 Comments

  1. Simon Chan
    August 21, 2017 at 07:29

    Hello Harrison. Did you eventually make that tutorial on multi-processing / multithreading with PyQt?

  2. Eric Choi
    August 21, 2017 at 07:29

    Hi, thanks for making this tutorial. Can you also provide the code for PyQt5? I've tried installing PyQt4 but I just couldn't get it to install. I have no other choice but to work with PyQt5 that comes with Python 3.6.

  3. segun oyebode
    August 21, 2017 at 07:29

    I am scraping a page that requires a login. I have logged in with my code, but I can't scrape the data from the table because it is dynamic. How can I do that with PyQt and the login?

  4. Product Dutt
    August 21, 2017 at 07:29

    "Cannot connect to X server" what is the issue?

  5. Harsha's Python Guide
    August 21, 2017 at 07:29

    Thanks for the playlist..

  6. Samir Saci
    August 21, 2017 at 07:29

    If you got the error for js_test.text : be sure to have urllib.request.urlopen(link) and not urllib.request.urlopen(link).read()

  7. Huan Wang
    August 21, 2017 at 07:29

    Hi sentdex,  thank you very much for sharing your Python programming experience. May I ask a question? Is it possible to extract the information "Look at you shinin!" between the <script> tag without mimicking the browser?

  8. Md Sarwar
    August 21, 2017 at 07:29

    2:20
    when i run the code showing this error:
    Traceback (most recent call last):
      File "C:\Users\username\Desktop\a.py", line 9, in <module>
        print(js_test.text)
    AttributeError: 'NoneType' object has no attribute 'text'

  9. Naimur Rahman
    August 21, 2017 at 07:29

    is there any way to use it in a py 'Qt designer' Gui app?
    as QApplication(sys.argv) is called twice then and so new event loop is created and function fails to execute..

    any solution? :/

  10. Еркін Абдукаримов
    August 21, 2017 at 07:29

    Hello, I want to parse a website where new information is added (in a table) as you scroll down. How can I parse all of the 'td.text' values?

  11. Yawgmoth1806
    August 21, 2017 at 07:29

    Hi,
    I've just seen your video and it helped me understand the principle behind scraping dynamic pages. I tried the code on your page and it worked fine, but I ran into a problem: I tried it on another website and after about 15 minutes the line "client_response = Client(url)" is still being executed. Does scraping like this take an eternity for bigger sites? Or is something wrong with the code?
    I am using Python 3.6 and PyQt 4.11.
    Regards

  12. Sohil Luhar
    August 21, 2017 at 07:29

    I get this error when I try to run it, please help

    File "D:/Python/test.py", line 20
    url = 'https://pythonprogramming.net/parsememcparseface/'
    ^
    IndentationError: unindent does not match any outer indentation level

  13. ekbastu
    August 21, 2017 at 07:29

    write a book dude…..

  14. Qian Li
    August 21, 2017 at 07:29

    Can you make a tutorial of explaining how to import from a website that contains a list of links, and each link points to a different dataset. I wonder how to import those datasets from the links in the same webpage and combine them in a dataframe. Thaaaaaanksssss……

  15. PRASHANT GOYAL 4-Yr B.Tech. Chemical Engg.
    August 21, 2017 at 07:29

    I wanted to know how I can use this to scrape the titles of all the videos in a YouTube playlist of more than 100 videos. Can anyone help?

  16. subhrajit mohanty
    August 21, 2017 at 07:29

    I want to scrape a website where the review comments load when you click "read more". Could you please suggest what I have to do? I am new to web scraping.

  17. CariagaXIII
    August 21, 2017 at 07:29

    working code
    import sys
    from PyQt5.QtWidgets import QApplication
    from PyQt5.QtCore import QUrl
    # QtWebKitWidgets is only available in PyQt5 builds that still ship QtWebKit
    from PyQt5.QtWebKitWidgets import QWebPage
    import bs4 as bs
    import urllib.request

    class Client(QWebPage):
        def __init__(self, url):
            # The QApplication must exist before any other Qt object is created
            self.app = QApplication(sys.argv)
            QWebPage.__init__(self)
            self.loadFinished.connect(self.on_page_load)
            self.mainFrame().load(QUrl(url))
            # Block here until on_page_load() quits the event loop
            self.app.exec_()

        def on_page_load(self):
            self.app.quit()

    url = 'https://pythonprogramming.net/parsememcparseface/'
    client_response = Client(url)
    # HTML after the page's javascript has run
    source = client_response.mainFrame().toHtml()

    soup = bs.BeautifulSoup(source, 'lxml')
    js_test = soup.find('p', class_='jstest')
    print(js_test.text)

  18. Mark Jay
    August 21, 2017 at 07:29

    dang I wanted to see a Qt browser

  19. Efthimis Ath
    August 21, 2017 at 07:29

    Nice tutorial. I am trying to scrape some data from the website http://www.airlinequality.com but I don't know why the code below is not working. Can you help me?

    from bs4 import BeautifulSoup
    import os
    import urllib.request
    import re

    thepage = urllib.request.urlopen("http://www.airlinequality.com/airline-reviews/aegean-airlines")
    soup = BeautifulSoup(thepage, "lxml")
    #print(soup)
    for profile in soup.findAll('article',{"itemprop":"review"}):
        image = profile.text
        print(image)

  20. aeroplaneman747
    August 21, 2017 at 07:29

    Thanks a lot for this great tutorial! It works really nicely for scraping a single page, but when looping through multiple pages it retrieves all the HTML and then throws this error at the end:
    QObject::connect: Cannot connect (null)::configurationAdded(QNetworkConfiguration) to QNetworkConfigurationManager::configurationAdded(QNetworkConfiguration)
    QObject::connect: Cannot connect (null)::configurationRemoved(QNetworkConfiguration) to QNetworkConfigurationManager::configurationRemoved(QNetworkConfiguration)
    QObject::connect: Cannot connect (null)::configurationChanged(QNetworkConfiguration) to QNetworkConfigurationManager::configurationChanged(QNetworkConfiguration)
    QObject::connect: Cannot connect (null)::onlineStateChanged(bool) to QNetworkConfigurationManager::onlineStateChanged(bool)
    QObject::connect: Cannot connect (null)::configurationUpdateComplete() to QNetworkConfigurationManager::updateCompleted()
    Any idea on how to fix this?

  21. Mitchell Woodin
    August 21, 2017 at 07:29

    Is there any way to scrape comments from html to be able to manipulate that text?

    I can't seem to use soup.find_all('<!--') to pull it out.

  22. West Jr
    August 21, 2017 at 07:29

    would this work with data generated from react.js??

  23. Eric Roque
    August 21, 2017 at 07:29

    Hey. Is there any way I can use Beautiful Soup to fill out forms, click a button, then scrape information off of a page?

    I want to create a web scraper/crawler that will scrape textbook information off of an online textbook store. To search for the textbook, I need to fill out a form and pick several options (department, term, course, section, etc), click a submit button, and wait for the page to load.

    Any ideas?

    Thanks.

  24. chari Muvilla
    August 21, 2017 at 07:29

    It's amazing how every time I have a problem in Python I run into one of your tutorials and solve it XD. Just thank you. But I still have a question:
    To make the program lighter when there are several scripts, can you somehow run only one of them?
    Thanks again for the tutorials :p

  25. Mahmoud Talebi
    August 21, 2017 at 07:29

    Hi,
    can we install PyQt4 on CentOS 6?
    Alternatively, I want to develop a web app and upload it to a VPS host for extracting data. PhantomJS causes so many problems in cgi-bin, so I thought QtWebKit could be better.

  26. Chris Grippo
    August 21, 2017 at 07:29

    I'm getting an error "AttributeError: 'Client' object has no attribute 'mainFrame'" any thoughts on how to fix this? I'm using Python 3 and PyQt5.

    For PyQt5 I used:
    from PyQt5.QtWidgets import QApplication
    from PyQt5.QtCore import QUrl
    from PyQt5.QtWebKitWidgets import QWebEnginePage

    I can't figure out what's causing that.

  27. Sora Amm Keyheart
    August 21, 2017 at 07:29

    How do I link that code to an HTML <input> tag,
    so that when the user pastes a link it scrapes and displays the data in HTML?

  28. Anastasia Lee
    August 21, 2017 at 07:29

    Hello! Thank you for these lessons! What did I do wrong?
    Traceback (most recent call last):
      File "C:\Python\parse1.py", line 2, in <module>
        from PyQt4.QtGui import QApplication
    ImportError: DLL load failed: no found this module

  29. Ratul Shams
    August 21, 2017 at 07:29

    @sentdex Bro, I've been watching your tutes for a long time and they've helped me loads! <3 Love it! You make the hardest stuff easy! And also show implementations! Can you please give some more tutorials on A.I. for beginners? Would love that mate! Best wishes!

  30. Zhuchang Zhan
    August 21, 2017 at 07:29

    Oh sentdex thank you so much again for making me level up in programming grind. What makes you keep going with all the programming? Too much coding often drives me nuts.

  31. G-FORCE GAMING
    August 21, 2017 at 07:29

    What about XHR? I'm a beginner, by the way. I'm getting None for some sites.

  32. panzach
    August 21, 2017 at 07:29

    for PyQt5:
    from PyQt5.QtWidgets import QApplication
    from PyQt5.QtCore import QUrl
    from PyQt5.QtWebKitWidgets import QWebPage

  33. chemhong
    August 21, 2017 at 07:29

    How can I add a header, just User-Agent, to the URL request, sentdex?
    thanks

  34. გიორგი კაკულაშვილი
    August 21, 2017 at 07:29

    Can we get the 'inspect element' view of the HTML with Python, instead of the 'source code'?

  35. Nevil Dsouza
    August 21, 2017 at 07:29

    Hey guys. New to scraping. I used Scrapy for web scraping. It worked well until an issue arose, maybe because of a Google tag or AJAX. Need help. Here's the project and the issue: https://github.com/ZNClub-PA-ML-AI/Scrapy-Spiders/issues/2

  36. Wojciech Orzechowski
    August 21, 2017 at 07:29

    QtWebKit does not work anymore 🙁 can you update the video for Python 3? I would really appreciate that. You are great!

  37. 宏杰李
    August 21, 2017 at 07:29

    You should try Selenium. It takes less typing and is user-friendly, and it's more approachable for beginners.

  38. Logan Lee
    August 21, 2017 at 07:29

    I have a question~! How can I make a new window in matplotlib? When I run plt.show(), it just shows the graph in the IPython console instead of opening a new window. I use the Anaconda Spyder Python IDE. Please… tell me how to open a new window~!

  39. Hugo Peralta
    August 21, 2017 at 07:29

    How does QWebPage work behind a proxy?

  40. Miguel Serrano
    August 21, 2017 at 07:29

    Thank you Harrison.
    I'm a fan of your python tutorials, I love python.
    Could you please make some tutorials about web scraping using Selenium to log in through forms and scrape dynamic data?
