Performed text preprocessing on a corpus of 2,000+ web documents.
Preprocessed each document (tokenization, stop-word removal, and related normalization steps) to produce clean token streams for indexing.
For each term, built a postings dictionary (key-value pairs) that maps each DocID to the term's frequency in that document.
Built an inverted index and implemented Boolean and phrase queries over it to retrieve matching documents.
Technologies Used: Python, pandas, NLTK
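The preprocessing, postings, and query steps above can be sketched as follows. This is a minimal illustration, not the original project code. It assumes a small hard-coded stop-word set and a regex tokenizer in place of NLTK's `stopwords` corpus and `word_tokenize` (which need downloaded data), and it stores token positions per DocID so that phrase queries can be answered; the DocID-to-frequency dictionary is then the length of each position list. All function names and the sample documents are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for nltk.corpus.stopwords.words("english");
# a small set keeps the sketch runnable without downloading NLTK data.
STOP_WORDS = {"a", "an", "and", "are", "as", "in", "is", "of", "on", "the", "to"}


def preprocess(text):
    """Lowercase, tokenize on alphanumeric runs, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    """Positional inverted index: term -> {doc_id: [positions]}.

    Positions are counted over the preprocessed token stream.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(preprocess(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def term_frequencies(index, term):
    """DocID -> frequency of `term` in that document."""
    return {doc_id: len(p) for doc_id, p in index.get(term, {}).items()}


def docs_for(index, term):
    """Set of DocIDs whose postings contain `term` (no entry is created)."""
    return set(index.get(term.lower(), {}))


def and_query(index, a, b):
    return docs_for(index, a) & docs_for(index, b)


def or_query(index, a, b):
    return docs_for(index, a) | docs_for(index, b)


def not_query(index, a, all_docs):
    return set(all_docs) - docs_for(index, a)


def phrase_query(index, phrase):
    """Documents where the preprocessed phrase terms occur at consecutive positions."""
    terms = preprocess(phrase)
    if not terms:
        return set()
    # Only documents containing every term can contain the phrase.
    candidates = set.intersection(*(docs_for(index, t) for t in terms))
    result = set()
    for doc_id in candidates:
        positions = [set(index[t][doc_id]) for t in terms]
        for start in positions[0]:
            if all(start + i in positions[i] for i in range(len(terms))):
                result.add(doc_id)
                break
    return result


docs = {
    1: "The quick brown fox jumps over the lazy dog.",
    2: "A quick brown dog outpaces a quick fox.",
    3: "Brown bears and red foxes live in the forest.",
}
index = build_index(docs)

print(term_frequencies(index, "quick"))       # {1: 1, 2: 2}
print(phrase_query(index, "quick brown fox"))  # {1}
```

Storing positions rather than bare counts is what makes phrase queries possible; a plain DocID-to-frequency dictionary is enough for Boolean queries alone. Because positions are counted after stop-word removal, queries go through the same `preprocess` function so that positions stay aligned.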