An Optimized Fuzzy C-Means Clustering Algorithm for Efficient Social Media Text Analysis
Keywords:
Social media data, fuzzy clustering, optimized fuzzy means, data mining, text analysis
Abstract
Social media platforms produce a staggering volume of text every day through posts, comments, and other user interactions. Analysing this data is demanding, in part because it is unstructured, noisy, and spans many overlapping topics. Clustering techniques help organize social media data by grouping similar items together. However, traditional hard clustering methods such as k-means assign each data point to exactly one cluster and are highly sensitive to initialization, which limits their ability to characterize social media data. In this research we propose an optimized fuzzy c-means clustering algorithm for social media text data. The algorithm allows each data point to belong to multiple clusters with graded membership values, which better reflects how individuals participate and interact online. Optimization techniques are employed to reduce time complexity, speed up convergence, and improve clustering accuracy. The social media text is pre-processed with standard text mining methods before it is clustered with the proposed algorithm. Experimental results show that the optimized fuzzy c-means algorithm produces more meaningful clusters than the traditional fuzzy c-means method and can handle large quantities of real-time social media data for analytical purposes.
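
To make the idea of graded membership concrete, the following is a minimal sketch of the standard (unoptimized) fuzzy c-means baseline, not the optimized algorithm proposed in this paper. The function name fuzzy_c_means, the fuzzifier value m = 2, the random initialization, the stopping tolerance, and the toy two-dimensional points are all assumptions made for illustration. In practice each row of X would be a feature vector derived from a pre-processed post.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Baseline fuzzy c-means (illustrative); returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    # Random initial memberships, each row normalised to sum to 1.
    U = rng.random((n_samples, n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Each center is the membership-weighted mean of all points.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Euclidean distance from every point to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        # Membership update: u_ij is proportional to d_ij^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Toy data (assumed): two tight groups plus one point halfway between them.
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0], [0.5, 0.5]])
centers, U = fuzzy_c_means(X, n_clusters=2)
print(U.round(2))  # one row per point, one membership value per cluster
```

Each row of U sums to 1. The point lying between the two groups receives substantial membership in both clusters, whereas a hard method such as k-means would force it into exactly one.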