How to Extract Data From a Website Using BeautifulSoup

Extracting data from web pages is the core task of web scraping. BeautifulSoup is a Python library for parsing HTML that makes it easy to pull data out of a page's underlying markup. In this tutorial, we'll learn how to extract data from a website using BeautifulSoup.

There are two main ways to extract data from a website:

  • Use an API (if one is available) to retrieve the data.
  • Access the HTML of the web page and extract the useful data from it.

In this article, we will take the second approach and extract Billboard magazine's Year-End Hot 100 songs of 1970 from the Wikipedia page "Billboard Year-End Hot 100 singles of 1970".

Task:

  • Scrape the page and extract all 100 songs with their artists.
  • Create a Python dictionary whose keys are the titles of the singles and whose values are lists of artists.
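
The target structure from the task can be sketched in plain Python before any scraping is involved. This is only an illustration: the helper name build_song_dict and the sample titles and artists are placeholders made up for this example, not real chart entries.

```python
def build_song_dict(pairs):
    """Group (title, artist) pairs into {title: [artist, ...]}."""
    songs = {}
    for title, artist in pairs:
        # setdefault creates the empty list the first time a title is seen
        songs.setdefault(title, []).append(artist)
    return songs


# Placeholder data for illustration only, not real chart entries
sample_pairs = [
    ("Song A", "Artist X"),
    ("Song B", "Artist Y"),
    ("Song B", "Artist Z"),  # a second artist credited on the same single
]

song_dict = build_song_dict(sample_pairs)
print(song_dict)
```

Using a list as the value keeps the structure the same whether a single has one credited artist or several.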

Installation

We need to install requests and Beautiful Soup. The requests module lets you send HTTP requests from Python. Beautiful Soup is a Python library for pulling data out of HTML and XML files; it is published on PyPI as beautifulsoup4 and imported as bs4.

pip install requests
pip install beautifulsoup4

Import the libraries

import requests
from bs4 import BeautifulSoup

Sending the request and parsing the response

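# Placeholder User-Agent value; Wikimedia's policy asks scripts to identify themselves
headers = {"User-Agent": "billboard-1970-tutorial/0.1 (example script)"}
url = "https://en.wikipedia.org/wiki/Billboard_Year-End_Hot_100_singles_of_1970"
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status() # stop early on a 4xx/5xx response
print(response.url) # final URL after any redirects
print(response.status_code) # 200 means the request succeeded
songSoup = BeautifulSoup(response.text, "html.parser") # parse with Python's built-in HTML parser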

data_dictionary = {}

# Row 0 is the table header, so rows 1 to 100 hold the 100 songs.
# This assumes the chart is the first table on the page.
for song in songSoup.find_all('tr')[1:101]:
    links = song.find_all('a')
    title = links[0].get_text()   # first link in the row: the song title
    artist = links[1].get_text()  # second link in the row: the first credited artist
    print(title, ',', artist)

    # Map each title to a list of artists
    data_dictionary[title] = [artist]

print(data_dictionary)
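
The loop above relies on the first two links in each row and keeps only one artist per song. A slightly more defensive variant might look like the sketch below. It is not the method used in this article: it assumes the chart lives in a table with the class wikitable whose last two cells in each row are the title and the artist(s). The function name parse_chart and the inline SAMPLE_HTML (with placeholder titles and artists) are made up so the example runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the chart table; titles and artists are placeholders
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>No.</th><th>Title</th><th>Artist(s)</th></tr>
  <tr><td>1</td><td>"<a href="/wiki/Song_A">Song A</a>"</td>
      <td><a href="/wiki/Artist_X">Artist X</a></td></tr>
  <tr><td>2</td><td>"<a href="/wiki/Song_B">Song B</a>"</td>
      <td><a href="/wiki/Artist_Y">Artist Y</a> and
          <a href="/wiki/Artist_Z">Artist Z</a></td></tr>
  <tr><td>3</td><td>"Song C"</td><td>Artist W</td></tr>
</table>
"""


def parse_chart(html):
    """Return {title: [artists]} from the first 'wikitable' table in html."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    songs = {}
    if table is None:
        return songs  # no matching table found
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < 2:
            continue  # the header row has only <th> cells
        # Assumption: the last two cells are the title and the artist(s)
        title_cell, artist_cell = cells[-2], cells[-1]
        title = title_cell.get_text(strip=True).strip('"')
        # Collect every linked artist; fall back to the cell text if none are linked
        artists = [a.get_text(strip=True) for a in artist_cell.find_all("a")]
        if not artists:
            artists = [artist_cell.get_text(strip=True)]
        songs[title] = artists
    return songs


chart = parse_chart(SAMPLE_HTML)
print(chart)
```

To try it on the real page, pass response.text to parse_chart instead of SAMPLE_HTML. The actual table markup may differ from this sample, so the cell positions may need adjusting.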