Project Title: Extract Data from Large Fashion E-commerce Websites

Project Description:

I will need you to scrape products and their details from large fashion e-commerce websites.

1) For each e-commerce website, I will provide a list of start URLs (see “abc_urls.csv” below; 40 to 60 rows), each pointing to a product listing to scrape.

2) Each start URL leads to several pages of products in a standard format specific to that e-commerce website.

3) Begin at each start URL, iterate through all of its listing pages, and open each product to extract its details.

4) On each product page, extract the product details (see “Extraction details on product page” below).
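
As a rough sketch of steps 3 and 4 (not a required implementation), the crawl loop might look like the Python below. The helpers `get_product_urls` and `get_next_page` are hypothetical callables that you would write per website, since the page format differs from site to site; `max_pages` is an assumed safety limit, not part of the requirements.

```python
from typing import Callable, Iterable, Iterator, Optional


def crawl_listing(
    start_url: str,
    get_product_urls: Callable[[str], Iterable[str]],
    get_next_page: Callable[[str], Optional[str]],
    max_pages: int = 1000,
) -> Iterator[str]:
    """Yield every product URL reachable from one start URL, page by page."""
    seen_pages = set()
    seen_products = set()
    page_url: Optional[str] = start_url
    # Stop when there is no next page, when a page repeats, or at the limit.
    while (
        page_url is not None
        and page_url not in seen_pages
        and len(seen_pages) < max_pages
    ):
        seen_pages.add(page_url)
        for product_url in get_product_urls(page_url):
            # The same product may appear on more than one listing page.
            if product_url not in seen_products:
                seen_products.add(product_url)
                yield product_url
        page_url = get_next_page(page_url)
```

Each yielded product URL would then be opened and passed to the site-specific extraction code.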

Example: abc_urls.csv (provided)
Row 1: brandname_women_dresses,https://www.abc.com/dresses?sortby=1
Row 2: brandname_women_tops,https://www.abc.com/tops?sortby=1
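
For illustration, the start URL file could be read as follows. This sketch assumes the CSV has no header row and exactly two columns (folder name, start URL), and that the “Row 1:” labels above are not part of the file; `read_start_urls` is a hypothetical helper name.

```python
import csv
import io
from typing import List, Tuple


def read_start_urls(csv_text: str) -> List[Tuple[str, str]]:
    """Parse (folder_name, start_url) pairs from the CSV text."""
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank or incomplete rows.
        if len(row) < 2 or not row[0].strip():
            continue
        rows.append((row[0].strip(), row[1].strip()))
    return rows


sample = (
    "brandname_women_dresses,https://www.abc.com/dresses?sortby=1\n"
    "brandname_women_tops,https://www.abc.com/tops?sortby=1\n"
)
start_urls = read_start_urls(sample)
```

For a file on disk, the text can be read with `open(path, newline="", encoding="utf-8").read()` and passed to the same function.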

File Structure:
1)
brandname_women_dresses/
    data.json
    images/
        0000000001.webp
        0000000002.webp

2)
brandname_women_tops/
    data.json
    images/
        0000000001.webp
        0000000002.webp

3) …
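
A minimal sketch of how this layout could be produced is shown below. It assumes the unique image id is a counter zero-padded to 10 digits, as the example file names suggest; the function names are hypothetical.

```python
from pathlib import Path


def image_id(n: int) -> str:
    """Format a counter as a 10-digit, zero-padded id (assumed convention)."""
    return f"{n:010d}"


def image_relative_path(n: int) -> str:
    """Relative path of image n, as it would be stored in data.json."""
    return f"images/{image_id(n)}.webp"


def prepare_output_dir(root: Path, name: str) -> Path:
    """Create <root>/<name>/images/ if needed and return the images folder."""
    images_dir = root / name / "images"
    images_dir.mkdir(parents=True, exist_ok=True)
    return images_dir
```

For example, `prepare_output_dir(Path("."), "brandname_women_dresses")` would create the first folder above, with `data.json` written next to the `images/` folder.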

Extraction details on product page:
1) A product sometimes comes in multiple colors. Click on each color and extract the items below for each one.
– product_url : URL of the product page
– product_name : Name of the product
– product_description : Description of the product
– product_categories : Category of the product (breadcrumb on top)
– product_care : Product care details, if they exist
– product_details : Specifications of the product

2) Each product page has multiple product images. Save each image, and record it together with its image URL as a new entry in the “data.json” file. If you are working on “brandname_women_dresses” (from “abc_urls.csv”), then images should be saved at “brandname_women_dresses/images/X.webp”, where “X” is a unique id.

– product_image : Save the image in WebP format, then store its relative path (“images/X.webp”) here.
– image_url : URL from which the image is downloaded

Example entries in “data.json” (suppose there is a “Happy Dress” with 2 product images):
[
  {
    "product_url": "https://www.abc.com/dresses/happy_dress",
    "product_name": "Happy Dress",
    "product_description": "This is a happy dress",
    "product_categories": "Home/Women/Dresses",
    "product_care": "Do not tumble dry",
    "product_image": "images/0000000001.webp",
    "image_url": "abc.images.cdn/happy_dress_image1.webp"
  },
  {
    "product_url": "https://www.abc.com/dresses/happy_dress",
    "product_name": "Happy Dress",
    "product_description": "This is a happy dress",
    "product_categories": "Home/Women/Dresses",
    "product_care": "Do not tumble dry",
    "product_image": "images/0000000002.webp",
    "image_url": "abc.images.cdn/happy_dress_image2.webp"
  },
  …
]
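
The one-entry-per-image rule can be sketched as below: the product fields are repeated in every entry, and only `product_image` and `image_url` change. `build_entries` and its `first_id` parameter are hypothetical names, and downloading and converting the images to WebP is left out of this sketch.

```python
import json
from typing import Dict, List


def build_entries(
    product: Dict[str, str], image_urls: List[str], first_id: int
) -> List[Dict[str, str]]:
    """Return one data.json entry per product image."""
    entries = []
    for offset, url in enumerate(image_urls):
        entry = dict(product)  # copy the shared product fields
        entry["product_image"] = f"images/{first_id + offset:010d}.webp"
        entry["image_url"] = url
        entries.append(entry)
    return entries


product = {
    "product_url": "https://www.abc.com/dresses/happy_dress",
    "product_name": "Happy Dress",
    "product_description": "This is a happy dress",
    "product_categories": "Home/Women/Dresses",
    "product_care": "Do not tumble dry",
}
image_urls = [
    "abc.images.cdn/happy_dress_image1.webp",
    "abc.images.cdn/happy_dress_image2.webp",
]
entries = build_entries(product, image_urls, first_id=1)
# ensure_ascii=False keeps any non-ASCII product text readable in the file.
data_json = json.dumps(entries, indent=2, ensure_ascii=False)
```

The next product in the same folder would continue from `first_id=3`, so that image ids stay unique within that folder.
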
For similar work requirements, feel free to email us at info@logicwis.com.