Project Title: Scrape Farm Prices from OLX.com.br

Project Description:

I’d like to get farm prices and their areas from the website OLX.com.br.

This is a two-part job, and each part should work independently.

First, we need to fetch all the regions within each state. I’d like to start with just the state of São Paulo (https://sp.olx.com.br/). For example:
SP > DDD 14 > Bauru e Marília > Avaré
https://sp.olx.com.br/regiao-de-bauru-e-marilia/regiao-de-avare

Second, we’ll need to get all the farm data and check the area size, as explained below.

The filter for this is shown in this link: https://sp.olx.com.br/regiao-de-bauru-e-marilia/regiao-de-avare/imoveis/terrenos/fazendas?q=fazenda

It SEEMS that we just need to append /imoveis/terrenos/fazendas?q=fazenda to the region URL, so now we can start scraping.
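A minimal Python sketch of that URL step, assuming the suffix really is all that is needed (this has not been checked against every region). The function name `farm_listing_url` is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Path and query taken from the example filter link in this brief.
FARM_FILTER_PATH = "/imoveis/terrenos/fazendas"
FARM_FILTER_QUERY = "q=fazenda"


def farm_listing_url(region_url: str) -> str:
    """Append the farm filter to a region URL (hypothetical helper)."""
    parts = urlsplit(region_url)
    # Drop a trailing slash so we never produce '//' in the path.
    path = parts.path.rstrip("/") + FARM_FILTER_PATH
    return urlunsplit((parts.scheme, parts.netloc, path, FARM_FILTER_QUERY, ""))


print(farm_listing_url(
    "https://sp.olx.com.br/regiao-de-bauru-e-marilia/regiao-de-avare"
))
```

For the Avaré region this prints the same filter link shown above.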

The first classified ad is this:
https://sp.olx.com.br/regiao-de-bauru-e-marilia/terrenos/fazenda-a-venda-1355200-m-por-r-18-000-000-itai-avare-sp-894585131

Title
Id
Price
Category
Size
Description

Now, we’ll go to classified number 2:
https://sp.olx.com.br/regiao-de-bauru-e-marilia/terrenos/venda-fazenda-itatinga-avare-861672889

Title
Id
Price
Size
Description
Area Total

Now, this is the tricky part! There is no way a farm has only 60 m². So we’ll need to scrape the listing for other area values.

The keywords should be hectares or ha, and alqueires or alq. In this case, both measures are provided, so let’s test it! 145,4728 hectares = 145,4728 * 10.000 = 1454728 m², and 60,11 alqueires = 60,11 * 2,42 = 145,4662 hectares = 1454662 m². The difference is minimal! The scraper should take the largest value, which is 1454728 m².
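The conversion above could be sketched in Python roughly like this. It assumes numbers are written in Brazilian notation (dot for thousands, comma for decimals) and that the unit keyword comes right after the number. The names `AREA_PATTERN`, `parse_br_number` and `largest_area_m2` are made up for illustration, and the sample text is invented, not copied from the ad.

```python
import re
from typing import Optional

M2_PER_HECTARE = 10_000
HECTARES_PER_ALQUEIRE = 2.42  # factor used in this brief

# A Brazilian-format number followed by one of the unit keywords.
# The lookbehind avoids matching the tail of a longer number.
AREA_PATTERN = re.compile(
    r"(?<![\d.,])(\d+(?:\.\d{3})*(?:,\d+)?)\s*(hectares?|ha|alqueires?|alq)\b",
    re.IGNORECASE,
)


def parse_br_number(text: str) -> float:
    """Convert '1.454,72' to 1454.72 (assumes Brazilian notation)."""
    return float(text.replace(".", "").replace(",", "."))


def largest_area_m2(text: str) -> Optional[int]:
    """Return the largest area found, in whole m², or None if none is found."""
    areas = []
    for number, unit in AREA_PATTERN.findall(text):
        hectares = parse_br_number(number)
        if unit.lower().startswith("alq"):
            hectares *= HECTARES_PER_ALQUEIRE
        areas.append(round(hectares * M2_PER_HECTARE))
    return max(areas) if areas else None


sample = "Area total: 145,4728 hectares ou 60,11 alqueires"
print(largest_area_m2(sample))  # 1454728
```

Text with no hectare or alqueire figure returns `None`, which the script can use to decide what needs a manual check.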

There is also some garbage, like this:
https://sp.olx.com.br/regiao-de-bauru-e-marilia/terrenos/21-imovel-rural-no-boleto-892219223#

There is no area available, so the script should just flag the ID 892219223 as “check manually” in the exported results.
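A rough Python sketch of the flagging step. It assumes the ad ID is always the trailing number in the ad URL (true for the three links in this brief, not verified beyond that) and that the export is a CSV. The column names and the helpers `ad_id_from_url` and `export_rows` are made up for illustration.

```python
import csv
import io
import re
from typing import Iterable, Optional, TextIO, Tuple


def ad_id_from_url(url: str) -> Optional[str]:
    """Return the trailing digits of an ad URL, ignoring '#...' and '?...'."""
    path = url.split("#", 1)[0].split("?", 1)[0]
    match = re.search(r"-(\d+)/?$", path)
    return match.group(1) if match else None


def export_rows(ads: Iterable[Tuple[str, Optional[int]]], out: TextIO) -> None:
    """Write one CSV row per ad; ads without an area are flagged."""
    writer = csv.writer(out)
    writer.writerow(["id", "area_m2", "status"])  # hypothetical columns
    for url, area_m2 in ads:
        status = "ok" if area_m2 is not None else "check manually"
        writer.writerow([ad_id_from_url(url), "" if area_m2 is None else area_m2, status])


base = "https://sp.olx.com.br/regiao-de-bauru-e-marilia/terrenos/"
buffer = io.StringIO()
export_rows(
    [
        (base + "venda-fazenda-itatinga-avare-861672889", 1454728),
        (base + "21-imovel-rural-no-boleto-892219223#", None),
    ],
    buffer,
)
print(buffer.getvalue())
```

The second row comes out as `892219223,,check manually`, so it is easy to filter in a spreadsheet.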

Let me know if you can do this. Looking forward to your reply!

For similar work requirements, feel free to email us at info@logicwis.com.