Project Title: Scrape Farm Prices from OLX.com.br
I’d like to get farm prices and their areas from the website OLX.com.br.
This is a two-part job, and each part should work independently.
First, we need to fetch all the regions for each state. I’d like to start with just the state of São Paulo (https://sp.olx.com.br/). The region hierarchy looks like this:
SP > DDD 14 > Bauru e Marília > Avaré
Then, we’ll need to get all the farm data and check the area size, as explained below.
The filter for this is shown in this link: https://sp.olx.com.br/regiao-de-bauru-e-marilia/regiao-de-avare/imoveis/terrenos/fazendas?q=fazenda
It SEEMS that we just need to add /imoveis/terrenos/fazendas?q=fazenda to the region URL, so now we’ll start scraping.
The first listing is this:
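As a minimal sketch of the URL rule described above, assuming the suffix really can be appended to any region URL (this has only been observed for the Avaré region), the listing URL could be built like this. The function name `farm_listing_url` is hypothetical.

```python
# Suffix taken from the filter link in this brief.
FARM_FILTER = "/imoveis/terrenos/fazendas?q=fazenda"


def farm_listing_url(region_url: str) -> str:
    """Append the farm filter to a region URL (assumed rule, not confirmed by OLX)."""
    # Drop a trailing slash so the result never contains "//" before "imoveis".
    return region_url.rstrip("/") + FARM_FILTER


print(farm_listing_url("https://sp.olx.com.br/regiao-de-bauru-e-marilia/regiao-de-avare/"))
```

If some regions turn out to use a different path, the scraper should log the failing URL rather than silently skip the region.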
Now, we’ll go to classified number 2:
Now, this is the tricky part! There is no way a farm has only 60 m², so we’ll need to scrape the listing text for other area values.
The keywords should be hectares or ha; alqueires or alq. In this case, both measures are provided, so let’s test it! 145,4728 hectares = 145,4728 * 10.000 = 1454728 m², while 60,11 alqueires = 60,11 * 2,42 = 145,4662 hectares = 1454662 m². The difference is minimal! So the scraper should take the biggest value, which is 1454728 m².
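The conversion above could be sketched roughly as follows. This is only an illustration: it assumes numbers are written in Brazilian format (decimal comma, optional thousands dots) directly before the unit keyword, and the function names and regular expression are hypothetical, not tested against real OLX listings.

```python
import re

M2_PER_HECTARE = 10_000
HECTARES_PER_ALQUEIRE = 2.42  # factor used in the example above

# Brazilian-style number: "1.454.728", "145,4728" or "60".
NUMBER = r"(\d{1,3}(?:\.\d{3})+(?:,\d+)?|\d+(?:,\d+)?)"
AREA_RE = re.compile(NUMBER + r"\s*(hectares?|ha|alqueires?|alq)\b", re.IGNORECASE)


def parse_br_number(text: str) -> float:
    """Convert '1.454,5' style text to a float."""
    return float(text.replace(".", "").replace(",", "."))


def areas_in_m2(description: str) -> list:
    """Return every hectare/alqueire area found in the text, converted to m²."""
    areas = []
    for number, unit in AREA_RE.findall(description):
        value = parse_br_number(number)
        if unit.lower().startswith("alq"):
            value *= HECTARES_PER_ALQUEIRE  # alqueires -> hectares
        areas.append(value * M2_PER_HECTARE)  # hectares -> m²
    return areas


def best_area_m2(description: str):
    """Pick the biggest area found, or None when the text mentions none."""
    areas = areas_in_m2(description)
    return max(areas) if areas else None


print(round(best_area_m2("Fazenda com 145,4728 hectares ou 60,11 alqueires")))  # 1454728
```

Returning `None` when nothing matches keeps the "no area found" case separate from a real value, so those listings can be flagged instead of exported with a wrong number.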
There is also some garbage, like this:
There is no area available, so the script should just flag the ID 892219223 as “check manually” in the exported results.
Let me know if you can do this. Looking forward to your reply!
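A possible shape for the exported results, assuming a CSV export and hypothetical column names (`id`, `area_m2`, `status`); the actual export format is still open.

```python
import csv
import io


def result_row(ad_id: str, area_m2):
    """Build one export row; a missing area is flagged for manual review."""
    if area_m2 is None:
        return {"id": ad_id, "area_m2": "", "status": "check manually"}
    return {"id": ad_id, "area_m2": round(area_m2), "status": "ok"}


def export_csv(rows, out) -> None:
    """Write the rows as CSV to any file-like object."""
    writer = csv.DictWriter(out, fieldnames=["id", "area_m2", "status"])
    writer.writeheader()
    writer.writerows(rows)


# In-memory buffer for the example; a real run would open a file instead.
buffer = io.StringIO()
export_csv([result_row("892219223", None)], buffer)
print(buffer.getvalue())
```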