Project Title: Scrape Business Data from D&B

Project Description:

I am looking to extract about 140 million company records from this website: dnb.com

Here is the sitemap index: dnb.com/business-directory-sitemapindex.xml. It lists a total of 2,875 sitemaps, and each sitemap contains 50,000 company records, for a total of roughly 140 million records (2,875 × 50,000 = 143,750,000). Alternatively, you may use the business directory to extract the data: dnb.com/business-directory.html
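As a rough sketch of how the sitemap index could be read, the snippet below parses a sitemap index document and lists the child sitemap URLs. It assumes the D&B index follows the standard sitemaps.org protocol (the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace, with at most 50,000 URLs per sitemap). The function names and the `example.com` sample XML are hypothetical and used only for illustration. The code works on an XML string, so it covers only the parsing step, not the download.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol; assumed (not confirmed)
# to be the one used by the D&B sitemap index.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# The sitemaps.org protocol caps a single sitemap at 50,000 URLs.
MAX_URLS_PER_SITEMAP = 50_000


def sitemap_urls(index_xml: str) -> list[str]:
    """Return the <loc> value of every <sitemap> entry in a sitemap index."""
    root = ET.fromstring(index_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def max_records(num_sitemaps: int) -> int:
    """Upper bound on company records if every sitemap is full."""
    return num_sitemaps * MAX_URLS_PER_SITEMAP


if __name__ == "__main__":
    # Hypothetical two-entry index, for illustration only.
    sample_index = (
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<sitemap><loc>https://example.com/sitemap-1.xml</loc></sitemap>"
        "<sitemap><loc>https://example.com/sitemap-2.xml</loc></sitemap>"
        "</sitemapindex>"
    )
    print(sitemap_urls(sample_index))
    print(max_records(2875))  # 143750000
```

The upper bound of 143,750,000 is consistent with the "roughly 140 million" figure above. If some sitemaps are only partly full, the real count will be lower.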

Fields required:
1. Company Name
2. Company Description
3. Company Website
4. DNB URL
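The four required fields map naturally onto a small record type. The sketch below is one possible shape, not a prescribed format. The class name `CompanyRecord`, its attribute names, and the `to_row` helper are hypothetical. The column headers follow the list above.

```python
from dataclasses import astuple, dataclass

# Column headers in the order listed under "Fields required".
COLUMNS = ("Company Name", "Company Description", "Company Website", "DNB URL")


@dataclass(frozen=True)
class CompanyRecord:
    name: str
    description: str
    website: str  # assumption: may be an empty string if a profile lists no website
    dnb_url: str

    def to_row(self) -> tuple:
        """Return the field values in the same order as COLUMNS."""
        return astuple(self)
```

Keeping the row order tied to `COLUMNS` keeps the header row and the data rows aligned when the records are later written to spreadsheet files.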

For example, here is the data extracted from this sample company record URL: dnb.com/business-directory/company-profiles.pfizer_inc.140f48fa0b37556f925afcaec7b5c566.html

1. Company Name: Pfizer Inc.

2. Company Description: Pfizer Inc. is one of the world’s largest research-based pharmaceuticals firm, producing medicines for cardiovascular health, metabolism, oncology, inflammation and immunology, and other areas, with about 10 products that fetch approximately $1 billion or more in annual revenue. Its top prescription products include cholesterol-lowering Lipitor, pain management drugs Celebrex and Lyrica, pneumonia vaccine Prevnar, and erectile dysfunction treatment Viagra, as well as arthritis drug Enbrel, antibiotic Zyvox, and blood-thinner Eliquis. The company also makes and sells generic drugs and consumer health products. Pfizer operates around the world and gets about 55% of its revenue from international customers.

3. Company Website: pfizer.com

4. DNB URL: dnb.com/business-directory/company-profiles.pfizer_inc.140f48fa0b37556f925afcaec7b5c566.html
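The sample DNB URL appears to follow the pattern `company-profiles.<slug>.<32 hex characters>.html`. The sketch below parses that pattern so the hex part could serve as a de-duplication key if the same profile shows up more than once. The pattern is inferred from this single sample and may not hold for every profile. The names `PROFILE_RE`, `ProfileKey`, and `parse_profile_url` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the one sample URL above; other profiles may differ.
PROFILE_RE = re.compile(
    r"(?:https?://)?(?:www\.)?dnb\.com/business-directory/"
    r"company-profiles\.(?P<slug>[^/]+?)\.(?P<profile_id>[0-9a-f]{32})\.html$"
)


class ProfileKey(NamedTuple):
    slug: str        # e.g. "pfizer_inc"
    profile_id: str  # 32 hex characters, assumed unique per company


def parse_profile_url(url: str) -> Optional[ProfileKey]:
    """Split a company-profile URL into slug and ID, or return None if it doesn't match."""
    m = PROFILE_RE.match(url.strip())
    if m is None:
        return None
    return ProfileKey(m.group("slug"), m.group("profile_id"))
```

For the Pfizer URL this yields the slug `pfizer_inc` and the ID `140f48fa0b37556f925afcaec7b5c566`. Directory pages such as `dnb.com/business-directory.html` return `None`.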

You will need the following:
1. IP rotation: Multiple IPs are required because the D&B system is very strict about scraping and keeps blocking IPs.
2. You should use multiple scrapers running at the same time to extract this large volume of data.

Deliverable: MS Excel files with 1 million records per file, 140 files in total.
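Splitting the output into files of 1 million records each can be done with a simple batching step. The sketch below only groups records and assigns file names. The `dnb_companies_part_NNN.xlsx` naming scheme and the `chunked` helper are hypothetical. It relies on the fact that an .xlsx worksheet holds at most 1,048,576 rows, so 1,000,000 records plus a header row fit on one sheet.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")

RECORDS_PER_FILE = 1_000_000
# An .xlsx worksheet holds at most 1,048,576 rows, so one header row plus
# 1,000,000 data rows fit on a single sheet.
EXCEL_MAX_ROWS = 1_048_576


def chunked(records: Iterable[T], size: int = RECORDS_PER_FILE) -> Iterator[Tuple[str, List[T]]]:
    """Yield (file_name, batch) pairs with at most `size` records per batch."""
    it = iter(records)
    part = 1
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        # Hypothetical naming scheme: zero-padded part number per file.
        yield f"dnb_companies_part_{part:03d}.xlsx", batch
        part += 1
```

Each batch could then be written with a spreadsheet library, for example openpyxl's write-only mode (`Workbook(write_only=True)`), which streams rows instead of holding the whole sheet in memory. Note that if all 2,875 sitemaps are full, 143,750,000 records would need 144 files rather than 140.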

For similar work requirements, feel free to email us at info@logicwis.com.