• Built and deployed a scalable FastAPI backend on AWS using ECS and EC2 auto-scaling, optimized for high concurrency and low latency via connection pooling and rate limiting.
• Implemented centralized logging with AWS CloudWatch across ECS/EC2 services for real-time alerts, debugging, and monitoring.
• Added WebSocket-based streaming and real-time analysis to improve AI responsiveness and the user experience.
• Deployed machine learning models with Ray Serve to improve GPU utilization; integrated Prometheus and Grafana for observability.
• Built RAG pipelines using real-time and historical data, increasing average session duration by 15%.
• Fine-tuned GPT-2 and BERT models for CEFR vocabulary scoring; developed feedback systems to boost user engagement and language learning.
• Created automated LLM pipelines for prompt generation, content creation, and performance evaluation, supporting continuous model improvement.