Web Scraping

There are two way to extract the data from the website,

  1. We can use the API to retrieve the data from the web, like facebook API which helps to retrieve data from facebook.
  2. We can also use web scraping or we harvesting to retrieve the data from the website.

Step Required

  1. Sending the HTTP request to the server of the webpage you want to access, the server will respond in the HTML content.
  2. Once we get the html file,now we are starting the parsing the data, we can not extract the data in the string format so we have to used parsing.
  3. Now we have to search for the Parse tree which we have created, for this task we have used a third-party app like Beautiful Soup.

Installing Beautiful Soup

pip install requests
pip install html5lib
pip install bs4
Pip install lxml

Now we will import the beautifulsoup and lxml.

Program 1

import bs4 as bs
import urllib.request

sour=urllib.request.urlopen('https://salesforcedrillers.com/learn-salesforce/paas/').read()

Printing the all the information about salesforcedrillers

soup=bs.BeautifulSoup(sour,'lxml')
print(soup)

Output

Note: complete output is not shown because of the high character output being present.

If we print the title of the salesforcedrillers.

Program 2

soup=bs.BeautifulSoup(sour,'lxml')
print(soup.title)

Output

====== RESTART: C:/Users/ASUS/AppData/Local/Programs/Python/Python 38-32/soup.py =====
PaaS | salesforcetutorial

Now we will find the all the paragraph tag

Program 3

soup=bs.BeautifulSoup(sour,'lxml')
print(soup.find_all('p'))

Output

====== RESTART: C:/Users/ASUS/AppData/Local/Programs/Python/Python38-32/soup.py =====
[

,

Copyright (c) 2020 salesforcedrillers.com | All rights reserved.

] >>>

Now we will using the loop for print the all paragraph tag

Program 4

import bs4 as bs
import urllib.request

sour=urllib.request.urlopen('https://salesforcedrillers.com/learn-salesforce/paas/').read()

soup=bs.BeautifulSoup(sour,'lxml')

for pra in soup.find_all('p'):
    print(pra)

Output


Copyright (c) 2020 salesforcedrillers.com | All rights reserved.

Now we will print the text of the salesforcedrillers.

Program 5

import bs4 as bs
import urllib.request

sour=urllib.request.urlopen('https://salesforcedrillers.com/learn-andriod/android-app-development/').read()

soup=bs.BeautifulSoup(sour,'lxml')

for pra in soup.find_all('p'):
    print(pra.text)

Output

====== RESTART: C:/Users/ASUS/AppData/Local/Programs/Python/Python38-32/soup.py =====

Android is open-source which means free to use and a Linux-based operating system which is an android kernel is based on Linux os and it is used for mobile devices such as smartphones and tablet computers. Android was developed by Google and other companies.
Benefits of learning Mobile Application Development.
Everyone refers to the mobile App to fetch the data and used for sending email and communication, to inquire, to search for anything. Every business is trying to promote their products and services through mobile apps. The nature of the company or the quality of their products is judged by their App and apps are generally used in 2 platforms ie. Android and IOS apple OS.
Well designed Mobile App plays an important role to enhance the reach of your products and services.
It is for sure; Mobile Application developers are in demand in the market nowadays.
There are different types of Mobile applications:

Note: complete output is not present due to the high character output is present.
Now will the child text of salesforcedriilers.

Program 6

import bs4 as bs
import urllib.request

sour=urllib.request.urlopen('https://salesforcedrillers.com/learn-andriod/android-app-development/').read()

soup=bs.BeautifulSoup(sour,'lxml')

for pra in soup.find_all('p'):
    print(pra.text)

Output

====== RESTART: C:/Users/ASUS/AppData/Local/Programs/Python/Python38-32/soup.py =====
None
Android is open-source which means free to use and a Linux-based operating system which is an android kernel is based on Linux os and it is used for mobile devices such as smartphones and tablet computers. Android was developed by Google and other companies.
programming language for automaton Apps and XCode for iOS Apps that use it's rather like an internet browser in a very mobile. each uses a mix of technologies like hypertext mark-up language, CSS, JavaScript. However, rather than targeting a mobile browser, hybrid applications target a WebView based mostly within a native app. this permits them to try and do things like access hardware capabilities of the mobile devices.
Progressive Web Applications (PWAs):
A Progressive internet App (PWA) could be an internet app. It means that an easy hypertext mark-up language associates degreed CSS based mostly online page that uses fashionable internet capabilities to deliver associate degree app-like expertise to users while not requiring them to put in an app from the AppStore/PlayStore, you'll be able to open easy internet computer address to access them. they're typically accessible by an internet computer address which might continually be stapled or saved on your phone's home browser.
None
None
>>> 

Note: complete output is not present due to the high character output is present.

Finding all hyperlinking tag
Program 7

import bs4 as bs
import urllib.request

sour=urllib.request.urlopen('https://salesforcedrillers.com/learn-andriod/android-app-development/').read()

soup=bs.BeautifulSoup(sour,'lxml')

for urls in soup.find_all('a'):
    print(urls)

Output

Note: complete output is not present due to the high character output is present.

Subscribe Now