AWS Healthcare Analytics System š
Designed and implemented a robust ETL pipeline to power a Healthcare Analytics System on AWS, handling 59,452+ treatment records. Optimized the flow from data ingestion to visualization using a structured, multi-layered approach:
šš» Extraction: Pulled healthcare treatment data from Amazon DynamoDB into Databricks Notebooks.
šš» Transformation: Leveraged PySpark to clean, structure, and generate analytical tables.
šš» Loading: Stored processed data into Amazon Redshift, modeling a Star Schema to accelerate querying by 3x.
š Key Insights Enabled:
šš» Ranked providers by total treatments and success rates.
šš» Tracked monthly success trends, achieving a 15% improvement in trend detection accuracy.
šš» Mapped geographical distribution of treatments across multiple cities.
šš» Summarized critical metrics such as average treatment costs and success rates city-wise.
āļø Tools & Technologies:
šš» Azure Databricks Community Edition: Data preprocessing.
šš» AWS Services: DynamoDB (Source DB), S3 (Staging Layer), Lambda (Automation), Step Functions (Orchestration), CloudWatch (Monitoring), Redshift (Data Warehouse).
š Challenges Tackled:
šš» Downscaled from 600,000 to 59,452 records to balance compute limits without compromising analytical depth.
šš» Automated 90% of the data movement and processing pipeline using Lambda and Step Functions.
šš» Ensured high availability and fault tolerance through AWS CloudWatch monitoring.
š Impact: Achieved 40% faster report generation and 25% better query optimization compared to traditional SQL-based batch loading systems.