Project Title: Data Extraction from Wikipedia
I need to collect the history of interlanguage links from Wikipedia for a set of Wikipedia pages.
In particular, I have a set of pages in various language editions of Wikipedia, about 6000 pages in total.
I will provide the list (page IDs, titles, and URLs).
The task would be:
1. For each page, download the revision history.
2. In each revision, check whether the wikitext contains interlanguage links (more info here: https://en.wikipedia.org/wiki/Help:Interlanguage_links).
3. Whenever there is such a link in the revision, record: page ID, revision ID, timestamp, language, and title.
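One possible way to approach the three steps above, as a rough sketch in Python rather than a required implementation. It assumes the revisions are fetched through the MediaWiki Action API (action=query, prop=revisions) and that interlanguage links appear in the wikitext in the [[xx:Title]] form. The names extract_interlanguage_links, iter_revisions, KNOWN_LANGS, and the User-Agent string are made up for illustration, and KNOWN_LANGS is only a small sample of language codes that would need to be replaced with the full list.

```python
import json
import re
import urllib.parse
import urllib.request

# Matches [[prefix:Title]] without a leading colon ([[:de:Title]] is an
# ordinary inline link, not an interlanguage link) and without a pipe.
LINK_RE = re.compile(r"\[\[\s*([a-z][a-z0-9-]*)\s*:\s*([^\[\]|]+?)\s*\]\]")

# Assumption: illustrative subset only; a real run needs the full code list.
KNOWN_LANGS = {"en", "de", "fr", "es", "it", "simple", "zh-min-nan"}


def extract_interlanguage_links(wikitext):
    """Return (language, title) pairs found in one revision's wikitext."""
    return [
        (m.group(1), m.group(2))
        for m in LINK_RE.finditer(wikitext)
        if m.group(1) in KNOWN_LANGS
    ]


def iter_revisions(lang, page_id, user_agent="interlang-history-sketch/0.1"):
    """Yield (revision id, timestamp, wikitext) for each revision, oldest first."""
    api = f"https://{lang}.wikipedia.org/w/api.php"
    params = {
        "action": "query", "format": "json", "formatversion": "2",
        "prop": "revisions", "pageids": str(page_id),
        "rvprop": "ids|timestamp|content", "rvslots": "main",
        "rvlimit": "50", "rvdir": "newer",
    }
    while True:
        url = api + "?" + urllib.parse.urlencode(params)
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        for page in data["query"]["pages"]:
            for rev in page.get("revisions", []):
                # Hidden or deleted revisions may carry no content at all.
                text = rev.get("slots", {}).get("main", {}).get("content", "")
                yield rev["revid"], rev["timestamp"], text
        if "continue" not in data:
            break
        # The API returns the parameters needed to request the next batch.
        params.update(data["continue"])
```

For example, extract_interlanguage_links("[[de:Berlin]] [[:fr:Paris]]") keeps only the de link, because the second one starts with a colon. With about 6000 pages, it would be sensible to pause between requests and to consider the public database dumps if the API turns out to be too slow.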
The output should be a CSV file, which collects all such revisions.
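As an illustration of the output format, here is a minimal sketch using Python's standard csv module. The column names and the write_link_rows helper are assumptions chosen to mirror the fields listed in the task; the actual header can be whatever is agreed on.

```python
import csv

# Assumed column names, mirroring the fields requested in the task.
FIELDS = ["page_id", "revision_id", "timestamp", "language", "title"]


def write_link_rows(rows, out):
    """Write one CSV line per recorded link to an open text file; return the row count."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for row in rows:
        # Titles containing commas or quotes are escaped by the csv module.
        writer.writerow(row)
        count += 1
    return count
```

When writing to disk, the file would typically be opened with open(path, "w", newline="", encoding="utf-8") so that non-Latin titles and line endings are handled correctly.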
I would also need the script that is used to collect the data and generate the output.
I would need the results within three days.
In your proposal, please mention (1) which tools (programming languages, approaches) you would use, and (2) whether you have any experience with Wikipedia scraping.
For similar work requirements, feel free to email us at email@example.com.