• Developed a machine learning model to predict gene-disease associations from the PharmGKB database, in support of personalized medicine.
• Preprocessed data with wavelet denoising and KNN imputation, reduced dimensionality with PCA, and applied clustering and classification algorithms, reaching 70.2% classification accuracy and a clustering silhouette score of 0.918.
• Tools and methods: Python, scikit-learn, PCA, Gaussian Mixture Model, Logistic Regression
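
The workflow described above can be illustrated with a minimal scikit-learn sketch. It is not the project's actual code: the data is synthetic (generated with `make_classification`, not PharmGKB), the function name `run_sketch`, the component counts, and the split ratio are assumptions chosen for illustration, and the wavelet denoising step is omitted because it would need an extra library. The pairing of the Gaussian Mixture Model with clustering and Logistic Regression with classification is also an assumption based on the tools listed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def run_sketch(seed=0):
    """Impute, reduce, cluster, and classify on synthetic data (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Synthetic stand-in for the real feature matrix (assumption, not PharmGKB data).
    X, y = make_classification(
        n_samples=300, n_features=20, n_informative=5, random_state=seed
    )

    # Blank out roughly 5% of entries to mimic missing values.
    X = X.copy()
    X[rng.random(X.shape) < 0.05] = np.nan

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )

    # KNN imputation, scaling, then PCA; fitted on the training split only.
    preprocess = make_pipeline(
        KNNImputer(n_neighbors=5),
        StandardScaler(),
        PCA(n_components=5, random_state=seed),
    )
    Z_train = preprocess.fit_transform(X_train)
    Z_test = preprocess.transform(X_test)

    # Clustering: Gaussian mixture, scored with the silhouette coefficient.
    clusters = GaussianMixture(n_components=2, random_state=seed).fit_predict(Z_train)
    silhouette = silhouette_score(Z_train, clusters)

    # Classification: logistic regression, scored on the held-out split.
    clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
    accuracy = accuracy_score(y_test, clf.predict(Z_test))

    return silhouette, accuracy


if __name__ == "__main__":
    sil, acc = run_sketch()
    print(f"silhouette={sil:.3f} accuracy={acc:.3f}")
```

Fitting the imputer, scaler, and PCA on the training split alone keeps information from the held-out samples out of the preprocessing, so the reported accuracy is not inflated by leakage. The scores printed by this sketch reflect the synthetic data only and are unrelated to the figures reported above.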