RustSight is a fast, safe, and extensible dataset analysis CLI tool written in Rust. It performs data validation and exploratory analysis, two crucial steps before AI/ML model training. It streams CSV files, so large inputs are handled efficiently, and it can also inspect text or binary files to report basic properties such as byte count and UTF-8 validity.
CSV Dataset Analysis: Detects numeric vs categorical columns, counts missing values per column, computes basic statistics (min, max, mean) for numeric columns, handles large CSV files efficiently through streaming, and generates a clean, readable analysis report.
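As an illustration of how such a streaming pass can work, here is a minimal sketch that keeps only a small running summary per column and never holds the whole file in memory. It is not RustSight's actual code: the names `ColumnStats` and `summarize` are hypothetical, and it uses only the standard library with a naive comma split (no quoted fields), whereas the tool itself relies on the csv crate for parsing.

```rust
use std::io::BufRead;

/// Running summary for one column, updated one cell at a time.
/// (Hypothetical type, for illustration only.)
#[derive(Debug)]
struct ColumnStats {
    missing: usize,
    numeric: usize,
    non_numeric: usize,
    min: f64,
    max: f64,
    sum: f64,
}

impl ColumnStats {
    fn new() -> Self {
        ColumnStats {
            missing: 0,
            numeric: 0,
            non_numeric: 0,
            min: f64::INFINITY,
            max: f64::NEG_INFINITY,
            sum: 0.0,
        }
    }

    /// Classify one cell as missing, numeric, or non-numeric.
    fn update(&mut self, cell: &str) {
        let cell = cell.trim();
        if cell.is_empty() {
            self.missing += 1;
            return;
        }
        match cell.parse::<f64>() {
            // "NaN" and "inf" parse as f64, so only finite values count as numeric.
            Ok(v) if v.is_finite() => {
                self.numeric += 1;
                self.min = self.min.min(v);
                self.max = self.max.max(v);
                self.sum += v;
            }
            _ => self.non_numeric += 1,
        }
    }

    /// A column is treated as numeric if every non-missing cell parsed as a number.
    fn is_numeric(&self) -> bool {
        self.numeric > 0 && self.non_numeric == 0
    }

    fn mean(&self) -> Option<f64> {
        if self.numeric == 0 {
            None
        } else {
            Some(self.sum / self.numeric as f64)
        }
    }
}

/// Read a header line, then fold every following line into per-column stats.
/// Assumption: fields are separated by plain commas and contain no quotes.
fn summarize<R: BufRead>(reader: R) -> std::io::Result<(Vec<String>, Vec<ColumnStats>)> {
    let mut lines = reader.lines();
    let header_line = match lines.next() {
        Some(line) => line?,
        None => return Ok((Vec::new(), Vec::new())),
    };
    let headers: Vec<String> = header_line
        .split(',')
        .map(|h| h.trim().to_string())
        .collect();
    let mut stats: Vec<ColumnStats> = headers.iter().map(|_| ColumnStats::new()).collect();

    for line in lines {
        let line = line?;
        if line.trim().is_empty() {
            continue; // skip blank lines
        }
        let mut cells = line.split(',');
        for col in stats.iter_mut() {
            // Short rows count as missing values for the trailing columns.
            col.update(cells.next().unwrap_or(""));
        }
    }
    Ok((headers, stats))
}
```

To try it on a file, one could wrap `std::fs::File::open(path)?` in a `std::io::BufReader` and pass it to `summarize`. Memory use then grows with the number of columns rather than the number of rows.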
Data Validation: Detects columns with high missing value ratios, flags no-variance columns, detects potential outliers, identifies mixed-type columns, and prints clear validation warnings before ML usage.
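The checks listed above could be sketched roughly as follows. This is an assumption-laden illustration, not the tool's implementation: the `Warning` enum, the `validate_column` function, the missing-ratio threshold, and the z-score rule for outliers are all hypothetical choices, since the description does not say which thresholds or outlier method RustSight uses. For brevity the sketch works on one column already held in memory.

```rust
use std::collections::HashSet;

/// Hypothetical warning kinds mirroring the checks described in the text.
#[derive(Debug, PartialEq)]
enum Warning {
    HighMissing { ratio: f64 },
    NoVariance,
    MixedTypes { numeric: usize, text: usize },
    Outliers { count: usize },
}

/// Validate one column given its raw cells.
/// `max_missing_ratio` and `z_threshold` are illustrative parameters.
fn validate_column(cells: &[&str], max_missing_ratio: f64, z_threshold: f64) -> Vec<Warning> {
    let mut warnings = Vec::new();
    if cells.is_empty() {
        return warnings;
    }

    let mut missing = 0usize;
    let mut text = 0usize;
    let mut numbers: Vec<f64> = Vec::new();
    let mut distinct: HashSet<&str> = HashSet::new();

    for cell in cells.iter().map(|c| c.trim()) {
        if cell.is_empty() {
            missing += 1;
            continue;
        }
        distinct.insert(cell);
        match cell.parse::<f64>() {
            Ok(v) if v.is_finite() => numbers.push(v),
            _ => text += 1,
        }
    }

    // High missing-value ratio.
    let ratio = missing as f64 / cells.len() as f64;
    if ratio > max_missing_ratio {
        warnings.push(Warning::HighMissing { ratio });
    }

    // No variance: exactly one distinct non-missing value (compared as text).
    if distinct.len() == 1 {
        warnings.push(Warning::NoVariance);
    }

    // Mixed types: both numeric and non-numeric cells are present.
    if !numbers.is_empty() && text > 0 {
        warnings.push(Warning::MixedTypes { numeric: numbers.len(), text });
    }

    // Potential outliers via z-score (assumed method), purely numeric columns only.
    if text == 0 && numbers.len() >= 2 {
        let n = numbers.len() as f64;
        let mean = numbers.iter().sum::<f64>() / n;
        let variance = numbers.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
        let std_dev = variance.sqrt();
        if std_dev > 0.0 {
            let count = numbers
                .iter()
                .filter(|v| ((**v - mean) / std_dev).abs() > z_threshold)
                .count();
            if count > 0 {
                warnings.push(Warning::Outliers { count });
            }
        }
    }

    warnings
}
```

A call such as `validate_column(&["1", "two", "3"], 0.5, 3.0)` would report a mixed-type column. The z-score rule is only one option; an interquartile-range rule is a common alternative that is less sensitive to the outliers themselves.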
File Analysis: Counts total bytes, detects UTF-8 validity, counts lines and words for text files, and counts non-ASCII bytes for binaries.
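A small sketch of this kind of file inspection, using only the standard library, might look like the following. The `FileReport` enum and `analyze_bytes` function are hypothetical names, and the sketch assumes the file contents are already loaded as a byte slice; it treats any valid UTF-8 input as text and everything else as binary.

```rust
/// Hypothetical report type: text files get line and word counts,
/// binary files get a count of non-ASCII bytes.
#[derive(Debug, PartialEq)]
enum FileReport {
    Text { bytes: usize, lines: usize, words: usize },
    Binary { bytes: usize, non_ascii: usize },
}

/// Classify a byte buffer by UTF-8 validity and compute the matching counts.
fn analyze_bytes(data: &[u8]) -> FileReport {
    match std::str::from_utf8(data) {
        // Valid UTF-8: count lines and whitespace-separated words.
        Ok(text) => FileReport::Text {
            bytes: data.len(),
            lines: text.lines().count(),
            words: text.split_whitespace().count(),
        },
        // Invalid UTF-8: treat as binary and count bytes outside the ASCII range.
        Err(_) => FileReport::Binary {
            bytes: data.len(),
            non_ascii: data.iter().filter(|b| !b.is_ascii()).count(),
        },
    }
}
```

For a file on disk, the bytes could come from `std::fs::read(path)?`. That call loads the whole file into memory, so a real tool processing very large files would more likely read in chunks.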
The tool is open source, licensed under the MIT License, and contributions are welcome. It is designed with a CLI-first approach, making it ideal for automation and scripting. The tech stack includes Rust for performance and memory safety, and the csv crate for efficient CSV parsing.
Portfolio: https://omarnahdi.dev