This project adapts the parallel k-means algorithm to the MapReduce framework
using Hadoop, making the clustering method applicable to large-scale data.
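
To illustrate how a single k-means iteration can be split into map and reduce steps, here is a minimal in-memory sketch in plain Python. It is not the project's Hadoop code and does not use the Hadoop API. The function names (`nearest_centroid`, `map_phase`, `reduce_phase`, `kmeans_iteration`) and the choice of squared Euclidean distance are assumptions made for illustration.

```python
from collections import defaultdict


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Map: emit (centroid index, (point, 1)) for every input point."""
    for point in points:
        yield nearest_centroid(point, centroids), (point, 1)


def reduce_phase(pairs, centroids):
    """Reduce: sum the points assigned to each centroid and divide by their count."""
    sums = {}
    counts = defaultdict(int)
    for idx, (point, n) in pairs:
        if idx in sums:
            sums[idx] = [s + p for s, p in zip(sums[idx], point)]
        else:
            sums[idx] = list(point)
        counts[idx] += n
    # Assumption: a centroid with no assigned points keeps its old position.
    return [
        [s / counts[i] for s in sums[i]] if i in sums else list(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans_iteration(points, centroids):
    """Run one map + reduce pass and return the updated centroids."""
    return reduce_phase(map_phase(points, centroids), centroids)
```

In a real MapReduce job the map calls would run in parallel over input splits, and the framework would group the emitted pairs by centroid index before the reduce step. The driver would repeat `kmeans_iteration` until the centroids stop changing or an iteration limit is reached.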
We compare parallel k-means with serial k-means and use speedup, scaleup, and sizeup to evaluate the performance of the proposed algorithm.
The results show that the proposed algorithm can effectively process large datasets on commodity hardware.
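
The three evaluation metrics can be sketched as simple ratios of measured running times. The definitions below are the ones commonly used for parallel clustering algorithms and are an assumption here, since this document does not spell them out. The function and parameter names are hypothetical.

```python
def speedup(time_one_node, time_m_nodes):
    """Fixed dataset, more nodes: time on 1 node divided by time on m nodes.

    Ideal (linear) speedup equals m.
    """
    return time_one_node / time_m_nodes


def scaleup(time_one_node_base_data, time_m_nodes_m_times_data):
    """Data and nodes grow together by a factor m.

    Time for the base dataset on 1 node divided by time for an m-times
    larger dataset on m nodes. Ideal scaleup equals 1.
    """
    return time_one_node_base_data / time_m_nodes_m_times_data


def sizeup(time_m_times_data, time_base_data):
    """Fixed number of nodes, dataset grows by a factor m.

    Time for the m-times larger dataset divided by time for the base
    dataset. Values close to m or lower indicate good sizeup.
    """
    return time_m_times_data / time_base_data
```

For example, with made-up timings (not measurements from this project), `speedup(120.0, 40.0)` gives `3.0`, meaning the job ran three times faster on m nodes than on one.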
Tools Used: Apache Hadoop version 1.2.1.