Project Title: Product Detail Extraction from Tesco and Sainsbury's
Build and run a scraper for the Tesco (https://www.tesco.com/) and Sainsbury's (https://www.sainsburys.co.uk/) websites to extract the following columns for each product:
unit_price.price (e.g. 1.48)
unit_price.measure (e.g. ml)
unit_price.measure_amount (e.g. 100)
Product Description HTML Block
Product Ingredients HTML Block
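As a rough illustration of how the three unit_price columns could be filled, the sketch below splits a unit-price label into price, measure and measure_amount. The label shape "£1.48/100ml" is an assumption, as are the names `UnitPrice` and `parse_unit_price`; the real pages may present this differently (for example in pence), so the pattern would need adjusting after inspecting them.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed label shape: "£1.48/100ml", "£1.48 / 100 ml" or "£2.50/kg".
# This is a guess at the format, not something confirmed from either site.
_UNIT_PRICE_RE = re.compile(
    r"£\s*(?P<price>\d+(?:\.\d+)?)\s*/\s*"
    r"(?P<amount>\d+(?:\.\d+)?)?\s*(?P<measure>[A-Za-z]+)"
)


@dataclass(frozen=True)
class UnitPrice:
    price: float           # unit_price.price, e.g. 1.48
    measure: str           # unit_price.measure, e.g. "ml"
    measure_amount: float  # unit_price.measure_amount, e.g. 100


def parse_unit_price(label: str) -> Optional[UnitPrice]:
    """Return the parsed unit price, or None if the label does not match."""
    match = _UNIT_PRICE_RE.search(label)
    if match is None:
        return None
    # A label such as "£2.50/kg" has no explicit amount; treat it as 1.
    amount = float(match.group("amount")) if match.group("amount") else 1.0
    return UnitPrice(
        price=float(match.group("price")),
        measure=match.group("measure").lower(),
        measure_amount=amount,
    )
```

Returning None for unmatched labels, rather than raising, lets a crawl log the odd cases and continue.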
Extract the following about offers:
Type (e.g. buy x for £y, buy any x for £y, reduced price, …)
Applicable items (make sure they can be looked up from the Products list above)
Unit (if applicable)
Price per unit
For “Any x for £y” and similar offers, a way to link the items in the offer (for example, a unique offer ID)
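The offer fields above could be derived along the following lines. This is only a sketch: the offer wording ("Buy 2 for £3", "Any 3 for £5"), the `Offer` record and the helper names are assumptions, and other offer types such as reduced prices are left as "other". The hashed offer ID is a fallback that treats identical offer text at one retailer as the same offer; if a site exposes its own promotion ID, that would be the safer link.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed wording: "Buy 2 for £3", "Any 3 for £5", "Buy any 3 for £5".
_MULTIBUY_RE = re.compile(
    r"^(?:buy\s+)?(?P<any>any\s+)?(?P<qty>\d+)\s+for\s+£(?P<total>\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


@dataclass
class Offer:
    offer_id: str                    # links every item in the same offer
    offer_type: str                  # e.g. "buy x for £y"
    quantity: Optional[int]
    total_price: Optional[float]
    price_per_unit: Optional[float]
    product_ids: List[str]           # IDs that must exist in the products list


def make_offer_id(retailer: str, text: str) -> str:
    """Stable ID from retailer plus normalised offer text (hypothetical scheme)."""
    key = f"{retailer}|{' '.join(text.lower().split())}"
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]


def parse_offer(retailer: str, text: str, product_ids: List[str]) -> Offer:
    offer_id = make_offer_id(retailer, text)
    match = _MULTIBUY_RE.match(text.strip())
    if match is None:
        # Unrecognised wording: keep the link, leave the numbers empty.
        return Offer(offer_id, "other", None, None, None, list(product_ids))
    qty = int(match.group("qty"))
    total = float(match.group("total"))
    offer_type = "buy any x for £y" if match.group("any") else "buy x for £y"
    per_unit = round(total / qty, 2) if qty else None
    return Offer(offer_id, offer_type, qty, total, per_unit, list(product_ids))
```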
– It is very important that you assign or use a unique ID for each product so that they can be linked.
– No duplicate products, please. A product that is exactly the same should have only one entry.
– The results should be reproducible. I should be able to run the code without much manual intervention.
– The scraper should be able to run uninterrupted and complete a full crawl of each website within a day on every run.
– You’ll need to provide the data in Excel or a similar format, along with the cleaned-up source code. Both the data and functioning source code are needed.
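The uniqueness and delivery requirements above could be handled together as in this sketch, which keeps the first row seen for each product ID and writes the result to a CSV file that Excel can open. The column names in `FIELDS` and the `product_id` key are assumptions made for illustration; only the unit_price names come from the brief.

```python
import csv
from typing import Dict, Iterable, List

# Hypothetical column names; only the unit_price.* names come from the brief.
FIELDS = [
    "product_id",
    "unit_price.price",
    "unit_price.measure",
    "unit_price.measure_amount",
    "description_html",
    "ingredients_html",
]


def dedupe_by_id(rows: Iterable[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep the first row seen for each product_id, preserving order."""
    unique: Dict[object, Dict[str, object]] = {}
    for row in rows:
        unique.setdefault(row["product_id"], row)
    return list(unique.values())


def write_products_csv(rows: Iterable[Dict[str, object]], path: str) -> int:
    """Write de-duplicated rows to a CSV file and return the row count."""
    unique_rows = dedupe_by_id(rows)
    # utf-8-sig adds a BOM so Excel detects UTF-8 (the £ sign, for example).
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(unique_rows)
    return len(unique_rows)
```

This only removes repeats that share an ID; deciding that two listings are "exactly the same" product still depends on choosing a reliable ID in the first place.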
If things go well, I would like to work with you on multiple follow-up jobs similar to this one.
Please note: Like most retailers, these sites might block you after a certain number of requests. You will need to know how to address such issues. If you don’t know how to get unblocked, please DON’T take this project.
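One common piece of handling request limits is to retry with exponential backoff when a response looks like a block. The sketch below is a minimal version under stated assumptions: the status codes in `BLOCK_STATUSES`, the delays, the User-Agent string and the function names are all illustrative, and it does not cover other measures such as proxy rotation.

```python
import random
import time
import urllib.error
import urllib.request
from typing import Callable

# Assumed signs of throttling or blocking; the sites may use others.
BLOCK_STATUSES = {403, 429, 503}


def http_get(url: str, timeout: float = 30.0) -> bytes:
    """Plain GET using the standard library (placeholder User-Agent)."""
    request = urllib.request.Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def fetch_with_backoff(
    url: str,
    get: Callable[[str], bytes] = http_get,
    max_attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Retry blocked requests, doubling the wait each time plus jitter."""
    for attempt in range(max_attempts):
        try:
            return get(url)
        except urllib.error.HTTPError as exc:
            last_attempt = attempt == max_attempts - 1
            if exc.code not in BLOCK_STATUSES or last_attempt:
                raise
            # 2 s, 4 s, 8 s, ... plus up to 1 s of random jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    raise ValueError("max_attempts must be at least 1")
```

The `get` and `sleep` parameters exist so the retry logic can be exercised without touching the network or actually waiting.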
For similar work requirements, feel free to email us at email@example.com.