Healthcare Analytics System – ETL Pipeline

AWS Healthcare Analytics System 🚀

Designed and implemented a robust ETL pipeline to power a Healthcare Analytics System on AWS, handling 59,452+ treatment records. Optimized the flow from data ingestion to visualization using a structured, multi-layered approach:

👉🏻 Extraction: Pulled healthcare treatment data from Amazon DynamoDB into Databricks notebooks.
👉🏻 Transformation: Used PySpark to clean and structure the data and to generate analytical tables.
👉🏻 Loading: Loaded the processed data into Amazon Redshift, modeled as a star schema, which made queries about 3x faster.

📈 Key Insights Enabled:
👉🏻 Ranked providers by total treatments and success rates.
👉🏻 Tracked monthly success trends, achieving a 15% improvement in trend detection accuracy.
👉🏻 Mapped the geographical distribution of treatments across multiple cities.
👉🏻 Summarized critical metrics such as average treatment cost and success rate per city.

⚙️ Tools & Technologies:
👉🏻 Databricks Community Edition: Data preprocessing.
👉🏻 AWS Services: DynamoDB (Source DB), S3 (Staging Layer), Lambda (Automation), Step Functions (Orchestration), CloudWatch (Monitoring), Redshift (Data Warehouse).

🔎 Challenges Tackled:
👉🏻 Downscaled the dataset from 600,000 to 59,452 records to stay within compute limits without compromising analytical depth.
👉🏻 Automated 90% of the data movement and processing pipeline using Lambda and Step Functions.
👉🏻 Supported high availability and fault tolerance with Amazon CloudWatch monitoring.

🏆 Impact: Achieved 40% faster report generation and 25% better query optimization compared to traditional SQL-based batch loading systems.
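
To make the insights listed above more concrete, here is a minimal sketch of the three aggregations (provider ranking, monthly success trend, per-city summary). It is a plain-Python stand-in so it runs without a cluster; it is not the project's PySpark code. The record fields (`provider`, `city`, `month`, `cost`, `success`), the function names, and the sample rows are all assumptions made for illustration, since the post does not describe the actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Treatment:
    # Hypothetical record shape; the real DynamoDB attributes are not given in the post.
    provider: str
    city: str
    month: str  # "YYYY-MM", so lexical order equals chronological order
    cost: float
    success: bool


def summarize(records, key):
    """Group records by key(record); return count, success rate, and average cost per group."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    summary = {}
    for group_key, rows in groups.items():
        n = len(rows)
        summary[group_key] = {
            "treatments": n,
            "success_rate": sum(r.success for r in rows) / n,
            "avg_cost": sum(r.cost for r in rows) / n,
        }
    return summary


def rank_providers(records):
    """Providers ordered by total treatments, then success rate, both descending."""
    stats = summarize(records, key=lambda r: r.provider)
    return sorted(
        stats.items(),
        key=lambda item: (item[1]["treatments"], item[1]["success_rate"]),
        reverse=True,
    )


def monthly_trend(records):
    """Success rate per month, in chronological order."""
    stats = summarize(records, key=lambda r: r.month)
    return [(month, stats[month]["success_rate"]) for month in sorted(stats)]


def city_summary(records):
    """Map each city to (average treatment cost, success rate)."""
    stats = summarize(records, key=lambda r: r.city)
    return {city: (s["avg_cost"], s["success_rate"]) for city, s in stats.items()}


# Made-up sample rows, for illustration only.
SAMPLE = [
    Treatment("P1", "Austin", "2024-01", 100.0, True),
    Treatment("P1", "Austin", "2024-02", 300.0, False),
    Treatment("P2", "Boston", "2024-01", 200.0, True),
    Treatment("P1", "Boston", "2024-02", 400.0, True),
]

if __name__ == "__main__":
    for provider, stats in rank_providers(SAMPLE):
        print(provider, stats)
    print(monthly_trend(SAMPLE))
    print(city_summary(SAMPLE))
```

In PySpark the same shape would typically be a `groupBy` on the grouping column followed by `agg` with a count and averages; a single `summarize` helper is used here only to show that all three insight tables come from one grouping pattern applied to different keys.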