
If you plan to browse the listings on a rental website manually, that is a bad idea: it is a very time-consuming process. A better option is to scrape the rental website with Python. This blog gives a brief overview of how to do that.

The scraping is done with BeautifulSoup and the data wrangling with Pandas, followed by a discussion of the insights generated.
Suppose you want to know whether renting an apartment or condo in New York, Etobicoke or Mississauga is considerably cheaper than renting one in Toronto, along with questions such as:

  • Which suburb has the lowest rents?
  • How much can you save by renting a basement unit?
  • How do rents in the city of Toronto compare with rents in the suburbs?

Browsing the listings by hand is not a good way to answer these questions; scraping them with Python is. It gives you results faster and lets you run a proper analysis that answers each question with data.

Scraping Rental Website Data using Python & BeautifulSoup

We extract the data from TorontoRentals.com using BeautifulSoup and Python. The website lists rentals for Toronto and many suburbs such as Vaughan, Scarborough, Brampton and Mississauga. It covers different listing types: houses, condos, apartments and basements.

First, we import the required Python libraries:

# Import Python libraries
# For HTML parsing
from bs4 import BeautifulSoup
# For website connections
import requests
# To prevent overwhelming the server between connections
from time import sleep
# Display the progress bar
from tqdm import tqdm
# For data wrangling
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
# For creating plots
import matplotlib.pyplot as plt
import plotly.graph_objects as go

Next, we create a function named get_page that accepts four arguments: city, type, beds and page. It checks the HTTP response status code to find out whether the request completed successfully, and returns the parsed page. get_page is called from the main loop, once for each value of page_num.

def get_page(city, type, beds, page):

    url = f'https://www.torontorentals.com/{city}/{type}?beds={beds}%20&p={page}'
    # https://www.torontorentals.com/toronto/condos?beds=1%20&p=2

    result = requests.get(url)

    # stays None unless the request succeeds, so the return below never fails
    soup = None

    # check HTTP response status codes to find if HTTP request has been successfully completed
    if result.status_code >= 100 and result.status_code <= 199:
        print('Informational response')
    if result.status_code >= 200 and result.status_code <= 299:
        print('Successful response')
        soup = BeautifulSoup(result.content, "lxml")
    if result.status_code >= 300 and result.status_code <= 399:
        print('Redirect')
    if result.status_code >= 400 and result.status_code <= 499:
        print('Client error')
    if result.status_code >= 500 and result.status_code <= 599:
        print('Server error')

    return soup
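As a possible variation, the URL construction and the status check can be pulled out into small helpers so they can be tried without contacting the site. This is only a sketch: the helper names build_url and classify_status are hypothetical and do not appear in the original script, and the URL pattern is simply copied from get_page.

```python
def build_url(city, listing_type, beds, page):
    """Build a listing URL with the same pattern that get_page uses."""
    return f"https://www.torontorentals.com/{city}/{listing_type}?beds={beds}%20&p={page}"


def classify_status(status_code):
    """Return the label that get_page prints for an HTTP status code."""
    if 100 <= status_code <= 199:
        return "Informational response"
    if 200 <= status_code <= 299:
        return "Successful response"
    if 300 <= status_code <= 399:
        return "Redirect"
    if 400 <= status_code <= 499:
        return "Client error"
    if 500 <= status_code <= 599:
        return "Server error"
    # Anything outside 100-599 is not a standard HTTP status class
    return "Unknown status"


example_url = build_url("toronto", "condos", 1, 2)
example_label = classify_status(404)
```

Keeping these two pieces separate from the network call makes it easier to check the URL for a new city or listing type before starting a long scraping run.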

We scrape the following information from every listing: city, zip, dimensions, street, rent, bed and bath. We create seven empty lists, one for each variable being scraped. The script then collects these values for all listings using nested for loops and the HTML tags and classes that hold them.

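# one of the seven empty lists described above
listingStreet = []

for page_num in tqdm(range(1, 250)):
    sleep(2)
    # get soup object of the page
    soup_page = get_page('toronto', 'condos', '1', page_num)
    # skip the page if no soup object came back
    if soup_page is None:
        continue
    # grab listing street
    for tag in soup_page.find_all('div', class_='listing-brief'):
        for tag2 in tag.find_all('span', class_='replace street'):
            # to check if data point is missing
            if not tag2.get_text(strip=True):
                listingStreet.append("empty")
            else:
                listingStreet.append(tag2.get_text(strip=True))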
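To see what the nested loops do without hitting the website, the same extraction can be run on a small piece of HTML. This is a sketch under assumptions: the markup in SAMPLE_HTML is made up for illustration and is not the real TorontoRentals.com page, the function name extract_streets is hypothetical, and the built-in "html.parser" is used instead of lxml only to avoid an extra dependency. The class names listing-brief and replace street are the ones used in the script above.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; the real page structure may differ
SAMPLE_HTML = """
<div class="listing-brief">
  <span class="replace street">123 Example St</span>
</div>
<div class="listing-brief">
  <span class="replace street"></span>
</div>
"""


def extract_streets(soup):
    """Collect the street of every listing, using "empty" for missing values."""
    streets = []
    for listing in soup.find_all("div", class_="listing-brief"):
        for span in listing.find_all("span", class_="replace street"):
            text = span.get_text(strip=True)
            streets.append(text if text else "empty")
    return streets


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
streets = extract_streets(soup)
```

Writing "empty" for a missing value keeps every list the same length, which matters later when the lists become DataFrame columns.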

Finally, when the script finishes running, we check that all the lists have the same length. The lists are then used to build a Pandas DataFrame, which is saved to a CSV file.

# create the dataframe

df_Toronto_Condo = pd.DataFrame({'city_main': 'Toronto', 'listing_type': 'Condo', 'street': listingStreet, 'city': listingCity, 'zip': listingZip, 'rent': listingRent, 'bed': listingBed, 'bath': listingBath, 'dimensions': listingDim})
# saving the dataframe to csv file
df_Toronto_Condo.to_csv('df_Toronto_Condo.csv')
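The length check mentioned above could look something like the following. It is a minimal sketch, not the original author's code: the helper name check_equal_lengths is hypothetical, and the list values are made up to stand in for scraped data.

```python
def check_equal_lengths(**columns):
    """Raise ValueError if the given lists do not all have the same length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    return lengths


# Made-up values standing in for two of the scraped lists
listingStreet = ["123 Example St", "empty"]
listingRent = ["$2,100", "$1,950"]

lengths = check_equal_lengths(street=listingStreet, rent=listingRent)
```

Running a check like this before calling pd.DataFrame gives a clearer error message, because it names the column that came up short.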

By changing the arguments passed to get_page inside the page_num loop, we gathered data for the different kinds of housing (condos, apartments, houses and basements) for Toronto and the suburb cities. A Pandas DataFrame is created for each housing type and saved to a CSV file.
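One way to organise these repeated runs is to generate every combination of city, listing type and page up front. The sketch below makes several assumptions: only the URL parts 'toronto' and 'condos' appear in the script above, so 'mississauga' and 'apartments' are guesses at how the site names them, and the function name scrape_jobs and the small page count are hypothetical.

```python
from itertools import product

# 'toronto' and 'condos' come from the script; the other values are assumed
CITIES = ["toronto", "mississauga"]
LISTING_TYPES = ["condos", "apartments"]


def scrape_jobs(cities, listing_types, beds="1", last_page=2):
    """Yield one (city, listing_type, beds, page_num) tuple per page to fetch."""
    for city, listing_type in product(cities, listing_types):
        for page_num in range(1, last_page + 1):
            yield city, listing_type, beds, page_num


jobs = list(scrape_jobs(CITIES, LISTING_TYPES))
```

Each tuple in jobs holds the four arguments for one get_page call, so the main loop can iterate over it instead of being copied once per city and housing type.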

Data Preparation & Cleaning with Pandas

After gathering the data, the next step is cleaning and preparation. We combined the DataFrames produced by the web scraping into a single DataFrame that contains all listings. Then the data wrangling starts:

  • Finding the missing values in the DataFrame
  • Handling the missing data
  • Finding and fixing data issues
  • Finishing the data transformation
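The steps above can be sketched with Pandas roughly as follows. The data is made up, and the rent format ("$2,100") is an assumption about what the scraped strings look like; the real cleaning rules depend on what the site returns. The "empty" marker is the one the scraping loop writes for missing values.

```python
import numpy as np
import pandas as pd

# Made-up frames standing in for two of the scraped DataFrames
df_condo = pd.DataFrame({
    "city_main": "Toronto",
    "listing_type": "Condo",
    "rent": ["$2,100", "empty"],
    "bed": ["1 Bed", "2 Bed"],
})
df_basement = pd.DataFrame({
    "city_main": "Mississauga",
    "listing_type": "Basement",
    "rent": ["$1,300", "$1,450"],
    "bed": ["1 Bed", "empty"],
})

# Combine the per-type frames into one DataFrame with all listings
df_all = pd.concat([df_condo, df_basement], ignore_index=True)

# Turn the "empty" marker into real missing values and count them per column
df_all = df_all.replace("empty", np.nan)
missing_per_column = df_all.isna().sum()

# Fix the rent column: strip "$" and "," and convert to a number
df_all["rent"] = pd.to_numeric(
    df_all["rent"].str.replace(r"[$,]", "", regex=True), errors="coerce"
)

# Handle missing data: here we simply drop listings without a rent
df_clean = df_all.dropna(subset=["rent"]).reset_index(drop=True)
```

Dropping rows is only one way to handle missing data; filling in a default or keeping the row for other analyses may suit a given question better.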

Insight Produced from Data

From the gathered data you can generate graphical insights, both for the data set as a whole and for individual fields. These insights can then support decisions for your business or service.
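As an illustration of one such insight, the median rent per city can be computed and drawn as a bar chart. This is a sketch with made-up numbers, not results from the scraped data, and the function name plot_median_rent and the output file name are hypothetical.

```python
import pandas as pd

# Made-up rents for illustration only
listings = pd.DataFrame({
    "city_main": ["Toronto", "Toronto", "Mississauga", "Mississauga", "Etobicoke"],
    "rent": [2300, 2100, 1800, 1700, 1900],
})

# Median rent per city, cheapest first
median_rent = listings.groupby("city_main")["rent"].median().sort_values()
cheapest_city = median_rent.index[0]


def plot_median_rent(series, path="median_rent.png"):
    """Save a bar chart of median rent per city to an image file."""
    # Imported here so the summary above works even without matplotlib
    import matplotlib
    matplotlib.use("Agg")  # draw to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    series.plot(kind="bar", ax=ax)
    ax.set_xlabel("City")
    ax.set_ylabel("Median monthly rent")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

Calling plot_median_rent(median_rent) writes the chart to disk. The median is used rather than the mean so that a few very expensive listings do not distort the comparison.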

Investigation of Relation Between Several Fields

With these results in hand, you can compare listings across fields such as city, housing type and rent, and choose the one that best fits your requirements.
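For example, the question of how much a basement unit saves can be approached by putting city and listing type side by side in a pivot table. The numbers below are made up for illustration, and the column name basement_saving is hypothetical.

```python
import pandas as pd

# Made-up rents for illustration only
listings = pd.DataFrame({
    "city_main": ["Toronto", "Toronto", "Toronto",
                  "Mississauga", "Mississauga", "Mississauga"],
    "listing_type": ["Condo", "Condo", "Basement",
                     "Condo", "Basement", "Basement"],
    "rent": [2200, 2400, 1500, 1900, 1300, 1400],
})

# One row per city, one column per listing type, median rent in each cell
rent_table = listings.pivot_table(
    index="city_main", columns="listing_type", values="rent", aggfunc="median"
)

# Difference between the median condo and the median basement in each city
rent_table["basement_saving"] = rent_table["Condo"] - rent_table["Basement"]
```

The same table also answers the city versus suburb question: comparing the rows for a single listing type shows how rents differ between Toronto and each suburb.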

If you would like more information about web scraping services using BeautifulSoup and Python, contact LogicWis for a complete web scraping solution for your requirements.