This project began as a business idea to provide small political campaigns, media outlets, and advocacy groups with a cost-effective solution for understanding public sentiment in real time. Traditional polling methods are often expensive, slow, and static, making it hard for smaller campaigns to adapt to rapid shifts in public opinion. Our platform addressed this challenge by leveraging machine learning and natural language processing to deliver dynamic, actionable insights.
Solution and Features:
• Data Integration: Combined real-time discussions from Reddit with structured news content from the New York Times to create a comprehensive dataset.
• Sentiment Analysis: Used VADER to assign positive, neutral, and negative sentiment scores to the collected text, giving a quantitative read on public opinion.
• Predictive Modeling: Applied XGBoost to classify topics into traction categories (High, Medium, Low), using SMOTE to correct class imbalance and Grid Search to tune hyperparameters.
• Topic Modeling: Used Latent Dirichlet Allocation (LDA) to identify key themes and their evolution over time.
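As a rough illustration of the sentiment analysis step above, the sketch below turns VADER-style compound scores into positive, neutral, and negative labels and summarizes their shares. It assumes the scores were already produced by VADER (for example, the `compound` value returned by `SentimentIntensityAnalyzer.polarity_scores`) and uses the cutoffs of ±0.05 that the VADER documentation suggests. The function names and the example scores are hypothetical and are not taken from the project itself.

```python
from collections import Counter

# Conventional cutoffs suggested in the VADER documentation for its compound
# score, which ranges from -1 (most negative) to +1 (most positive).
POSITIVE_CUTOFF = 0.05
NEGATIVE_CUTOFF = -0.05

LABELS = ("positive", "neutral", "negative")


def label_sentiment(compound: float) -> str:
    """Map a single compound score to a sentiment label."""
    if compound >= POSITIVE_CUTOFF:
        return "positive"
    if compound <= NEGATIVE_CUTOFF:
        return "negative"
    return "neutral"


def sentiment_shares(compound_scores) -> dict:
    """Return the fraction of texts that fall into each sentiment class."""
    scores = list(compound_scores)
    if not scores:
        # No texts collected: report zero share for every class.
        return {label: 0.0 for label in LABELS}
    counts = Counter(label_sentiment(score) for score in scores)
    return {label: counts[label] / len(scores) for label in LABELS}


# Hypothetical compound scores for four posts about one topic.
example_scores = [0.62, -0.41, 0.0, 0.08]
shares = sentiment_shares(example_scores)
print(shares)
```

Recomputing shares like these over successive time windows is one possible way to follow shifts in opinion, though the project's actual aggregation may differ.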