Web Scraping
There are two ways to extract data from a website:
- We can use an API to retrieve the data. For example, the Facebook API retrieves data from Facebook.
- We can use web scraping (also called web harvesting) to retrieve the data directly from the website's pages.
Steps Required
- Send an HTTP request to the server of the webpage you want to access. The server responds with the HTML content of the page.
- Once we have the HTML, we parse it. The response is a single string, which is hard to extract data from directly, so we parse it into a tree structure.
- Search and navigate the parse tree we have created. For this task we use a third-party library such as Beautiful Soup.
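The three steps above can be sketched using only the Python standard library. This is a minimal illustration, not the method used in the programs of this tutorial: fetch_html, ParagraphCollector, the User-Agent value, and the sample HTML string are hypothetical names introduced here, and the standard html.parser module stands in for Beautiful Soup so that the parsing step is visible.

```python
import urllib.request
from html.parser import HTMLParser


def fetch_html(url, timeout=10):
    """Step 1: send an HTTP request and return the response body as text."""
    # Hypothetical helper; the User-Agent value is an arbitrary example.
    request = urllib.request.Request(url, headers={"User-Agent": "tutorial-example"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


class ParagraphCollector(HTMLParser):
    """Steps 2 and 3: parse the HTML and collect the text inside <p> tags."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        # Only keep text that appears between <p> and </p>.
        if self.in_paragraph:
            self.paragraphs[-1] += data


# An inline sample stands in for fetch_html(...) so the sketch runs offline.
sample_html = "<html><body><h1>Title</h1><p>First</p><p>Second</p></body></html>"
collector = ParagraphCollector()
collector.feed(sample_html)
print(collector.paragraphs)
```

Beautiful Soup replaces the hand-written collector class with a searchable tree, which is why the rest of this tutorial uses it.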
Installing Beautiful Soup
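pip install requests
pip install html5lib
pip install bs4
pip install lxml
Now we will import Beautiful Soup (the bs4 package) and urllib.request. The lxml package is not imported directly; it is passed by name as the parser.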
Program 1
import bs4 as bs
import urllib.request

# Download the raw HTML of the page
sour = urllib.request.urlopen('https://salesforcedrillers.com/learn-salesforce/paas/').read()
Printing all the information on the salesforcedrillers page:
# Parse the HTML with the lxml parser and print the whole document
soup = bs.BeautifulSoup(sour, 'lxml')
print(soup)
Output
Note: the complete output is not shown because it is too long.
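Because printing a whole page produces a lot of output, it can help to look at an indented view of a small document first. The following is a minimal sketch, assuming a hypothetical inline HTML string and Python's built-in "html.parser" backend instead of lxml, so that it runs without a network connection.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, not taken from the site above.
html = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# prettify() returns the parse tree as a string with one tag per line,
# indented according to nesting depth.
pretty = soup.prettify()
print(pretty)
```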
Now we will print the title of the salesforcedrillers page.
Program 2
soup = bs.BeautifulSoup(sour, 'lxml')
print(soup.title)
Output
====== RESTART: C:/Users/ASUS/AppData/Local/Programs/Python/Python38-32/soup.py =====
<title>PaaS | salesforcetutorial</title>
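The title object is a tag, not a plain string, so it can be inspected further. A small sketch, assuming an inline HTML string (a hypothetical stand-in for the downloaded page) and the built-in "html.parser" backend:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page.
html = "<html><head><title>PaaS | salesforcetutorial</title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")

title_tag = soup.title      # the whole <title> element
print(title_tag)            # the tag including its markup
print(title_tag.name)       # the tag name
print(title_tag.string)     # only the text inside the tag
```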
Now we will find all the paragraph tags.
Program 3
soup = bs.BeautifulSoup(sour, 'lxml')
print(soup.find_all('p'))
Output
====== RESTART: C:/Users/ASUS/AppData/Local/Programs/Python/Python38-32/soup.py =====
[ ,Copyright (c) 2020 salesforcedrillers.com | All rights reserved. ]
>>>
Now we will use a loop to print each paragraph tag.
Program 4
import bs4 as bs
import urllib.request

# Download the page and parse it with the lxml parser
sour = urllib.request.urlopen('https://salesforcedrillers.com/learn-salesforce/paas/').read()
soup = bs.BeautifulSoup(sour, 'lxml')

# Print each <p> tag on its own line
for pra in soup.find_all('p'):
    print(pra)
Output
Copyright (c) 2020 salesforcedrillers.com | All rights reserved.
Now we will print only the text of each paragraph on a salesforcedrillers page.
Program 5
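import bs4 as bs
import urllib.request

# Download the page and parse it with the lxml parser
sour = urllib.request.urlopen('https://salesforcedrillers.com/learn-andriod/android-app-development/').read()
soup = bs.BeautifulSoup(sour, 'lxml')

# Print only the text of each <p> tag, without the markup
for pra in soup.find_all('p'):
    print(pra.text)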
Output
====== RESTART: C:/Users/ASUS/AppData/Local/Programs/Python/Python38-32/soup.py =====
Android is open-source which means free to use and a Linux-based operating system which is an android kernel is based on Linux os and it is used for mobile devices such as smartphones and tablet computers. Android was developed by Google and other companies. Benefits of learning Mobile Application Development. Everyone refers to the mobile App to fetch the data and used for sending email and communication, to inquire, to search for anything. Every business is trying to promote their products and services through mobile apps. The nature of the company or the quality of their products is judged by their App and apps are generally used in 2 platforms ie. Android and IOS apple OS. Well designed Mobile App plays an important role to enhance the reach of your products and services. It is for sure; Mobile Application developers are in demand in the market nowadays. There are different types of Mobile applications:
Note: the complete output is not shown because it is too long.
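Now we will print the child string of each paragraph on the salesforcedrillers page. The .string attribute returns None when a tag does not have exactly one child.
Program 6
import bs4 as bs
import urllib.request

# Download the page and parse it with the lxml parser
sour = urllib.request.urlopen('https://salesforcedrillers.com/learn-andriod/android-app-development/').read()
soup = bs.BeautifulSoup(sour, 'lxml')

# Print the single child string of each <p> tag (None if there is not exactly one child)
for pra in soup.find_all('p'):
    print(pra.string)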
Output
====== RESTART: C:/Users/ASUS/AppData/Local/Programs/Python/Python38-32/soup.py =====
None
Android is open-source which means free to use and a Linux-based operating system which is an android kernel is based on Linux os and it is used for mobile devices such as smartphones and tablet computers. Android was developed by Google and other companies. programming language for automaton Apps and XCode for iOS Apps that use it's rather like an internet browser in a very mobile. each uses a mix of technologies like hypertext mark-up language, CSS, JavaScript. However, rather than targeting a mobile browser, hybrid applications target a WebView based mostly within a native app. this permits them to try and do things like access hardware capabilities of the mobile devices. Progressive Web Applications (PWAs): A Progressive internet App (PWA) could be an internet app. It means that an easy hypertext mark-up language associates degreed CSS based mostly online page that uses fashionable internet capabilities to deliver associate degree app-like expertise to users while not requiring them to put in an app from the AppStore/PlayStore, you'll be able to open easy internet computer address to access them. they're typically accessible by an internet computer address which might continually be stapled or saved on your phone's home browser.
None
None
>>>
Note: the complete output is not shown because it is too long.
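The None values in the output above come from how Beautiful Soup's .string attribute behaves: it returns a value only when a tag has exactly one child, while .text (or get_text()) joins all nested text. A minimal sketch of the difference, assuming a hypothetical inline HTML string and the built-in "html.parser" backend:

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one plain paragraph, one with a nested tag, one empty.
html = "<p>Plain text</p><p>Hello <b>world</b></p><p></p>"
soup = BeautifulSoup(html, "html.parser")

# Pair up .string and get_text() for each paragraph to compare them.
results = [(p.string, p.get_text()) for p in soup.find_all("p")]
for string_value, text_value in results:
    print(repr(string_value), repr(text_value))
```

The second paragraph has two children (a text node and a b tag), so .string is None there, and the empty paragraph has no children at all.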
Finding all hyperlink tags
Program 7
import bs4 as bs
import urllib.request

# Download the page and parse it with the lxml parser
sour = urllib.request.urlopen('https://salesforcedrillers.com/learn-andriod/android-app-development/').read()
soup = bs.BeautifulSoup(sour, 'lxml')

# Print every <a> (hyperlink) tag on the page
for urls in soup.find_all('a'):
    print(urls)
Output
Home Learn Salesforce Learn Android Learn DevOps Contact Us What is Android History of Android Android Architecture Android - Environment Setup Setting up Android Studio Creating Android Virtual Device with AVD Manager How to make android apps Android App Project Package Structure (Android Studio) Introduction To Android Views And ViewGroups
Note: the complete output is not shown because it is too long.
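Often the link target is more useful than the tag itself. The sketch below shows one way to collect the link text together with an absolute URL. The inline HTML, the paths in it, and the base_url value are hypothetical samples, and the built-in "html.parser" backend is assumed instead of lxml.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical base address and sample markup for illustration only.
base_url = "https://salesforcedrillers.com/"
html = (
    '<a href="/learn-salesforce/">Learn Salesforce</a>'
    '<a href="https://example.com/x">External</a>'
    "<a>No link</a>"
)
soup = BeautifulSoup(html, "html.parser")

links = []
for a in soup.find_all("a"):
    href = a.get("href")        # None when the tag has no href attribute
    if href is None:
        continue
    # urljoin turns relative paths into absolute URLs and leaves absolute ones alone.
    links.append((a.get_text(strip=True), urljoin(base_url, href)))

for text, url in links:
    print(text, url)
```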