Amazon Machine Learning Challenge: Multi-Moda

This project presents a comprehensive solution for the Amazon Machine Learning Challenge, focusing on extracting product attributes from images and text through advanced techniques in natural language processing and computer vision. Key highlights include:

- Technical Architecture:
  - Developed a custom data pipeline using a "ProductDataset" class for efficient dataset handling.
  - Implemented asynchronous image downloading and flexible CSV parsing for improved performance.
- Model Implementation:
  - Utilized the "vikhyatk/moondream2" model for seamless integration of image and text inputs, enabling high-performance visual question answering capabilities.
- Inference Optimization:
  - Optimized the inference process for GPU acceleration and batch processing to ensure rapid predictions.
- Post-Processing Algorithms:
  - Designed sophisticated algorithms for accurately extracting various attributes, including:
    - Weight: Used regex for precise numerical value extraction.
    - Voltage: Handled multiple voltage units and formats.
    - Wattage: Processed complex representations and managed unit prefixes.
    - Dimensional Attributes: Parsed multi-dimensional formats effectively.
    - Volume: Managed compound units with intelligent defaulting.
- Challenges Addressed:
  - Overcame multi-modal input handling through a custom dataset class.
  - Optimized model deployment for improved inference speed.
  - Developed regex patterns for complex string parsing.
  - Established comprehensive unit mapping for consistency across predictions.
  - Implemented robust error handling to maintain pipeline stability.
  - Formulated adaptive questions based on target attributes for optimized model responses.
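
The regex-based extraction and unit mapping described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the `UNIT_MAP` table, the `VALUE_UNIT` pattern, the `normalize_answer` function, and the `"<value> <unit>"` output format are all assumptions made for illustration.

```python
import re

# Hypothetical unit map (an assumption, not the project's table):
# spellings a model might emit -> one canonical unit name.
UNIT_MAP = {
    "g": "gram", "gram": "gram", "grams": "gram",
    "kg": "kilogram", "kilogram": "kilogram", "kilograms": "kilogram",
    "v": "volt", "volt": "volt", "volts": "volt",
    "kv": "kilovolt",
    "w": "watt", "watt": "watt", "watts": "watt",
    "kw": "kilowatt",
}

# A number (integer or decimal), optional whitespace, then a run of letters.
VALUE_UNIT = re.compile(r"(\d+(?:\.\d+)?)\s*([a-zA-Z]+)")


def normalize_answer(answer: str) -> str:
    """Return '<value> <canonical unit>' for the first recognised pair.

    Returns an empty string when no known unit is found, so a malformed
    model answer does not raise and stop the rest of the pipeline.
    """
    for match in VALUE_UNIT.finditer(answer):
        value, unit = match.group(1), match.group(2).lower()
        canonical = UNIT_MAP.get(unit)
        if canonical is not None:
            return f"{float(value)} {canonical}"
    return ""


if __name__ == "__main__":
    for raw in ["The weight is 2.5 kg", "500W", "12 pieces, 230 V", "unknown"]:
        print(repr(raw), "->", repr(normalize_answer(raw)))
```

Lower-casing the unit keeps the lookup table small, but it would merge prefixes that differ only by case (for example milli and mega), so a fuller version would need case-sensitive handling for those.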