Project Title: Product Detail Extraction from Tesco.com and Sainsburys.co.uk

Project Description:

Build and run a scraper for the Tesco (https://www.tesco.com/) and Sainsbury’s (https://www.sainsburys.co.uk/) websites to extract the following fields for each product:

Title
Review
Item Code
Currency
Price
unit_price.price (e.g. 1.48)
unit_price.measure (e.g. ml)
unit_price.measure_amount (e.g. 100)
Product Description HTML Block
Product Ingredients HTML Block
Product URL
Image URLs
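The three unit_price columns are meant to be split out of the per-unit price that a product page displays. Below is a minimal sketch, assuming that text looks like “£1.48/100ml”. That format is an assumption and should be checked against real pages from both sites. The regex and the parse_unit_price helper are hypothetical:

```python
import re
from decimal import Decimal

# Assumed shape of the unit-price text on a product page, e.g. "£1.48/100ml",
# "£2.50/kg" or "£0.25/each". Real site markup may differ.
UNIT_PRICE_RE = re.compile(
    r"^[£$€]?\s*(?P<price>\d+(?:\.\d+)?)\s*/\s*"
    r"(?P<amount>\d+(?:\.\d+)?)?\s*(?P<measure>[a-zA-Z]+)$"
)


def parse_unit_price(text: str) -> dict:
    """Split a unit-price string into price, measure and measure_amount."""
    match = UNIT_PRICE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognised unit price: {text!r}")
    amount = match.group("amount")
    return {
        # Decimal avoids float rounding errors on money values.
        "unit_price.price": Decimal(match.group("price")),
        "unit_price.measure": match.group("measure").lower(),
        # No explicit amount (e.g. "/kg") is treated as a measure amount of 1.
        "unit_price.measure_amount": Decimal(amount) if amount else Decimal(1),
    }
```

With this sketch, parse_unit_price("£1.48/100ml") gives a price of 1.48, a measure of "ml" and a measure amount of 100, matching the examples in the list above.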

Extract the following fields for each offer:

Offer title/name
Type (e.g. buy x for £y, buy any x for £y, reduced price, …)
Applicable items (each must be traceable to an entry in the product list above)
Description
Dates
Original price
New price
Unit (if applicable)
Price per unit
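To fill the Type column consistently, offer titles can be mapped to a small fixed set of types. Below is a minimal sketch, assuming offers are worded like “Any 3 for £10” or “2 for £5”. The patterns and classify_offer are hypothetical and would need tuning to each site’s actual wording:

```python
import re

# Hypothetical patterns for common offer wordings. The exact phrasing used
# on each site is an assumption and should be checked against real pages.
OFFER_PATTERNS = [
    ("buy any x for £y", re.compile(r"^any\s+(\d+)\s+for\s+£(\d+(?:\.\d{2})?)", re.I)),
    ("buy x for £y", re.compile(r"^(\d+)\s+for\s+£(\d+(?:\.\d{2})?)", re.I)),
    ("reduced price", re.compile(r"\b(?:was|reduced)\b", re.I)),
]


def classify_offer(title: str) -> str:
    """Return the first matching offer type, or 'other' if none match."""
    cleaned = title.strip()
    for offer_type, pattern in OFFER_PATTERNS:
        if pattern.search(cleaned):
            return offer_type
    return "other"
```

Titles that match no pattern are labelled "other", so they can be reviewed by hand rather than silently filed under the wrong type.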

For “Any x for £y” and similar offers, provide a way to link the items covered by the same offer (for example, a unique offer ID).
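One way to link the items in a multi-buy offer is to give every offer a deterministic ID and record which product IDs carry it. The sketch below assumes an offer is identified by its retailer, title and start/end dates. This is an assumption: a promotion ID exposed by the site would be a better key if one is available. The make_offer_id and link_offer_items functions are hypothetical:

```python
import hashlib
from collections import defaultdict


def make_offer_id(retailer: str, title: str, start: str, end: str) -> str:
    """Derive a stable offer ID from fields assumed to identify one offer.

    Hashing the same (normalised) inputs always yields the same ID.
    """
    key = "|".join([retailer, title.strip().lower(), start, end])
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]


def link_offer_items(rows):
    """Group product IDs by offer ID.

    `rows` is an iterable of (product_id, retailer, offer_title, start, end)
    tuples collected while scraping product pages (a hypothetical layout).
    """
    offers = defaultdict(set)
    for product_id, retailer, title, start, end in rows:
        offers[make_offer_id(retailer, title, start, end)].add(product_id)
    return dict(offers)
```

Because the ID is a hash of its inputs, the same offer receives the same ID on every run. This lets offers and products be stored as separate tables joined on product ID.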

*Special notes*

– It is very important that you assign or use a unique ID for each product so that products and offers can be linked.

– No duplicate products, please: if the exact same product is found more than once, it should have only one entry.
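Below is a minimal sketch covering both the unique ID and the no-duplicates requirement. It assumes the retailer’s own item code is stable and unique within that site. Prefixing the code with the retailer name gives an ID that cannot collide across the two sites. Keeping only the first record seen for each ID then removes products that appear on several category or search pages. The function names and record keys are hypothetical:

```python
def product_id(retailer: str, item_code: str) -> str:
    """Namespace the retailer's item code so IDs never clash across sites."""
    return f"{retailer.strip().lower()}:{item_code.strip()}"


def dedupe_products(products):
    """Keep the first record seen for each product ID, preserving order.

    Assumes each record is a dict with 'retailer' and 'item_code' keys.
    """
    seen = {}
    for record in products:
        key = product_id(record["retailer"], record["item_code"])
        if key not in seen:
            # Store the ID on the record so offers can reference it later.
            seen[key] = {**record, "product_id": key}
    return list(seen.values())
```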

– The results should be reproducible: I should be able to run the code again with little or no manual intervention.

– The scraper should run uninterrupted and complete a full crawl of each website within one day on every run.
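A rough way to check whether a one-day run is feasible is to work out how long each request may take on average. This sketch assumes you know, or have measured, how many page fetches one full run needs. No page count for either site is given here, and the function name is hypothetical:

```python
def max_seconds_per_request(pages: int, workers: int = 1, hours: float = 24.0) -> float:
    """Average time each worker may spend per page to finish within `hours`.

    `pages` is the number of page fetches one full run needs. It is an
    input you measure, not a known figure for either site.
    """
    if pages <= 0 or workers <= 0:
        raise ValueError("pages and workers must be positive")
    # Total worker-seconds available, spread evenly over all pages.
    return hours * 3600 * workers / pages
```

For example, with a hypothetical 50,000 pages and 2 workers, each worker has about 3.5 seconds per page. Retries and pauses after rate limiting come out of that same budget, so some margin is needed.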

– You’ll need to provide both the data, in Excel or a similar format, and the cleaned-up, functioning source code.
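CSV is one “similar format” that Excel opens directly. The sketch below uses only the Python standard library. The snake_case column names and the product_id column are assumptions, not names fixed by this brief:

```python
import csv
import io

# Hypothetical column names mirroring the product field list.
PRODUCT_COLUMNS = [
    "product_id", "title", "review", "item_code", "currency", "price",
    "unit_price.price", "unit_price.measure", "unit_price.measure_amount",
    "description_html", "ingredients_html", "product_url", "image_urls",
]


def products_to_csv(products) -> str:
    """Render product dicts as CSV text; missing fields become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=PRODUCT_COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for record in products:
        row = dict(record)
        # Several image URLs are stored in one cell, separated by spaces.
        row["image_urls"] = " ".join(record.get("image_urls", []))
        writer.writerow(row)
    return buffer.getvalue()
```

To save the output, write the returned string to a file opened with newline="" and encoding="utf-8-sig". The byte-order mark helps Excel recognise the file as UTF-8, so the £ sign displays correctly.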

If things go well, I would like to work with you on several similar follow-up jobs.

Please note: like most retailers, these sites may block you after a certain number of requests. You will need to know how to deal with this. If you don’t know how to avoid or recover from being blocked, please DON’T take this project.
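One common part of handling this is to back off when a site signals rate limiting (for example, with HTTP 429 or 503) instead of retrying immediately. Below is a sketch of exponential backoff with full jitter. It assumes the fetch function you pass in raises the hypothetical RetryableError for such responses:

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical error raised by `fetch` for responses worth retrying."""


def fetch_with_backoff(fetch, url, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # Give up after the final attempt.
            # Wait a random time up to base_delay * 2**attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

The sleep parameter is injectable, so the retry logic can be tested without actually waiting.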

For similar work requirements, feel free to email us at info@logicwis.com.