• Developed a data-driven Instance Selection approach from scratch that will go into production and is designed to improve the generalizability of all Data Science models implemented by the company.
• Tackled this complex problem with an efficient online clustering model in Spark that grows its number of clusters as needed, which keeps Big Data volumes manageable and produces a representative summary of the data.