Web Scraper | Search Engine for Scraped Results | Python | Requests | BeautifulSoup
This passion project scrapes articles from the web archives of the national newspaper “The Hindu” and provides a search engine for exploring the scraped content. Here’s what it does:
1. Data Scraping: I collected all the articles from The Hindu’s archives for the entire year 2010. Using the Requests and BeautifulSoup libraries, I extracted clean article data from the site and stored it in the "newsarticles.txt" file.
2. Search Engine: From the command line, enter the keywords you want to search for. The program matches them (for example, a person’s name) against the article headlines and returns the matching articles. Whether you’re researching historical events or curious about specific individuals, Project NASS has you covered.
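The storage and search steps can be sketched in a few lines of Python. This is a minimal sketch, not the project's actual code: it assumes "newsarticles.txt" holds one tab-separated record per line (date, headline, URL), which is a hypothetical format, and it treats an article as a match when every keyword appears in its headline, ignoring case. The fetching and parsing with Requests and BeautifulSoup is left out; the hypothetical `format_record` helper only shows how a scraped article could be written to the file.

```python
import sys
from typing import Iterable, List, NamedTuple

DATA_FILE = "newsarticles.txt"  # file name from the project description


class Article(NamedTuple):
    date: str      # assumed date string, e.g. "2010-03-15"
    headline: str
    url: str


def format_record(article: Article) -> str:
    """Serialize one article as a tab-separated line (assumed file format)."""
    # Replace tabs/newlines inside fields so each record stays on one line.
    fields = (f.replace("\t", " ").replace("\n", " ") for f in article)
    return "\t".join(fields) + "\n"


def parse_records(lines: Iterable[str]) -> List[Article]:
    """Parse tab-separated lines back into Article tuples, skipping malformed ones."""
    articles = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 3:
            articles.append(Article(*parts))
    return articles


def search(articles: Iterable[Article], query: str) -> List[Article]:
    """Return articles whose headline contains every keyword (case-insensitive substring match)."""
    keywords = query.lower().split()
    return [a for a in articles if all(k in a.headline.lower() for k in keywords)]


def main(argv: List[str]) -> int:
    # Keywords come from the command line, e.g. `python search.py tendulkar`.
    if len(argv) < 2:
        print(f"usage: {argv[0]} KEYWORD [KEYWORD ...]")
        return 1
    with open(DATA_FILE, encoding="utf-8") as f:
        articles = parse_records(f)
    matches = search(articles, " ".join(argv[1:]))
    for a in matches:
        print(f"{a.date}  {a.headline}  {a.url}")
    print(f"{len(matches)} matching article(s)")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Plain substring matching keeps the sketch short, but it also matches partial words (for example, "india" matches "Indian"); stricter matching would need tokenized headlines or an inverted index.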