This project is a solution for the Amazon Machine Learning Challenge. It extracts product attributes (weight, voltage, wattage, dimensions, and volume) from product images and text, combining natural language processing and computer vision techniques. Key highlights:
- Technical Architecture:
  - Built a custom data pipeline around a "ProductDataset" class for dataset handling.
  - Implemented asynchronous image downloading and flexible CSV parsing to improve data-loading performance.
- Model Implementation:
  - Used the "vikhyatk/moondream2" model, which accepts an image and a text prompt together, for visual question answering.
- Inference Optimization:
  - Ran inference with GPU acceleration and batch processing to speed up predictions.
- Post-Processing Algorithms:
  - Designed algorithms to extract each attribute from the model output, including:
    - Weight: Used regex to extract numerical values.
    - Voltage: Handled multiple voltage units and formats.
    - Wattage: Processed complex representations and unit prefixes.
    - Dimensional Attributes: Parsed multi-dimensional formats.
    - Volume: Handled compound units, with fallback defaults.
- Challenges Addressed:
  - Handled multi-modal (image and text) inputs through a custom dataset class.
  - Optimized model deployment to improve inference speed.
  - Developed regex patterns for complex string parsing.
  - Established a comprehensive unit mapping for consistency across predictions.
  - Implemented error handling to keep the pipeline stable.
  - Formulated questions adapted to each target attribute to improve model responses.
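
The regex parsing and unit mapping described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `UNIT_MAP`, `extract`, and `format_prediction` are hypothetical, the unit table is an assumed subset, and the output format (`"<value> <unit>"`) is an assumption.

```python
import re
from typing import Optional, Tuple

# Hypothetical unit table (assumed subset): maps the spellings a model
# might produce to one canonical unit name per attribute.
UNIT_MAP = {
    "weight": {
        "g": "gram", "gram": "gram", "grams": "gram",
        "kg": "kilogram", "kilogram": "kilogram", "kilograms": "kilogram",
        "mg": "milligram", "lb": "pound", "lbs": "pound", "oz": "ounce",
    },
    "voltage": {
        "v": "volt", "volt": "volt", "volts": "volt",
        "kv": "kilovolt", "mv": "millivolt",
    },
    "wattage": {
        "w": "watt", "watt": "watt", "watts": "watt",
        "kw": "kilowatt", "kilowatt": "kilowatt", "kilowatts": "kilowatt",
    },
}

# A number (optional thousands commas, optional decimals) followed by an
# alphabetic unit token, with optional whitespace in between.
_VALUE_UNIT = re.compile(r"(\d[\d,]*(?:\.\d+)?|\.\d+)\s*([a-zA-Z]+)")


def extract(attribute: str, text: str) -> Optional[Tuple[float, str]]:
    """Return the first (value, canonical_unit) pair valid for `attribute`."""
    units = UNIT_MAP.get(attribute, {})
    for number, unit in _VALUE_UNIT.findall(text):
        canonical = units.get(unit.lower())
        if canonical is None:
            # Token after the number is not a known unit for this attribute.
            continue
        return float(number.replace(",", "")), canonical
    return None


def format_prediction(attribute: str, text: str) -> str:
    """Format a prediction string, or return "" when nothing parses."""
    parsed = extract(attribute, text)
    if parsed is None:
        return ""
    value, unit = parsed
    return f"{value} {unit}"
```

For example, `extract("weight", "The weight is 1.5 kg")` returns `(1.5, "kilogram")`, while text with no recognizable value and unit yields `None`, so a single unparseable response produces an empty prediction instead of stopping the pipeline.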