<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">SAPARS</journal-id>
<journal-title>Scientiarum: A Multidisciplinary Journal</journal-title>
<abbrev-journal-title abbrev-type="pubmed">SAPARS</abbrev-journal-title>
<issn pub-type="epub">0000-0000</issn>
<publisher>
<publisher-name>BOHR</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.54646/SAPARS.2026.24</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>RESEARCH</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>An optimized fuzzy c-means clustering algorithm for efficient social media text analysis</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Mukkala</surname> <given-names>Narendra Reddy</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
</contrib>
</contrib-group>
<aff><institution>Department of Information Systems, University of Memphis</institution>, <addr-line>Memphis, TN</addr-line>, <country>USA</country></aff>
<author-notes>
<corresp id="c001">&#x002A;Correspondence: Narendra Reddy Mukkala, <email>mukkala00@gmail.com</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>06</day>
<month>02</month>
<year>2026</year>
</pub-date>
<volume>2</volume>
<issue>1</issue>
<fpage>1</fpage>
<lpage>11</lpage>
<history>
<date date-type="received">
<day>08</day>
<month>01</month>
<year>2026</year>
</date>
<date date-type="accepted">
<day>21</day>
<month>01</month>
<year>2026</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2026 Mukkala.</copyright-statement>
<copyright-year>2026</copyright-year>
<copyright-holder>Mukkala</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/"><p>&#x00A9; The Author(s). 2026 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.</p></license>
</permissions>
<abstract>
<p>Social media platforms produce a staggering volume of text every day through posts, comments, and other interactions. Analyzing this data is demanding, in part because it is unstructured, noisy, and full of overlapping topics. Clustering techniques help organize social media data by grouping similar items together. However, traditional clustering methods characterize such data poorly: hard methods such as k-means assign each data point to exactly one cluster, and they are sensitive to the initial conditions of the clustering run. In this research we propose an optimized fuzzy c-means clustering algorithm for social media text data. It allows each data point to belong to several clusters with different membership values, which better models how individuals participate and interact online. Optimization techniques are employed to reduce time complexity, speed up convergence, and improve clustering accuracy. The social media text is pre-processed with traditional text mining methods before it is clustered with the proposed algorithm. Experimental results show that the optimized algorithm produces more meaningful clusters than the traditional fuzzy c-means method and can handle large quantities of real-time social media data for analytical purposes.</p>
</abstract>
<kwd-group>
<kwd>social media data</kwd>
<kwd>fuzzy clustering</kwd>
<kwd>optimized fuzzy means</kwd>
<kwd>data mining</kwd>
<kwd>text analysis</kwd>
</kwd-group>
<counts>
<fig-count count="9"/>
<table-count count="0"/>
<equation-count count="0"/>
<ref-count count="9"/>
<page-count count="11"/>
<word-count count="4673"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1" sec-type="intro">
<title>Introduction</title>
<p>Social media has become an essential element of daily life. Platforms like Twitter, Facebook, Instagram, YouTube, and LinkedIn have made it easy for users to connect with each other, express opinions and feelings, and share information instantly across borders (<xref ref-type="bibr" rid="B1">1</xref>&#x2013;<xref ref-type="bibr" rid="B3">3</xref>). Every second, users create and disseminate vast amounts of data: text-based postings such as status updates, comments, and replies; hashtags and emoticons that signal how often an idea is expressed; and counts of how many times a post has been liked or shared. This data provides researchers, companies, and policymakers with a wealth of information about people&#x2019;s actual behaviors and opinions, as well as about society&#x2019;s changing values (<xref ref-type="bibr" rid="B4">4</xref>, <xref ref-type="bibr" rid="B5">5</xref>).</p>
<p>The unprecedented growth in the number of social media posts poses new challenges for data analysis. In contrast to the data held in traditional database structures, the vast majority of social media data is unstructured or semi-structured. Posts exhibit a variety of stylistic qualities (colloquialisms, abbreviations, slang, irregular capitalization, grammar mistakes, and emoticons), and a single post can cover multiple subjects. These qualities make it harder to aggregate and analyze social media data in order to produce meaningful insights (<xref ref-type="bibr" rid="B6">6</xref>, <xref ref-type="bibr" rid="B7">7</xref>).</p>
<p>Clustering is an effective way to analyze and manage large quantities of social media data. It groups similar objects into smaller &#x201C;chunks&#x201D; that can be analyzed further, instead of leaving the analyst with one or more enormous datasets that have no organizational structure (<xref ref-type="bibr" rid="B4">4</xref>).</p>
<p>Many clustering techniques exist. K-means, for example, is widely used because it is simple, computationally efficient, and suitable for many applications (<xref ref-type="bibr" rid="B8">8</xref>). K-means partitions a dataset into a predetermined number of clusters by assigning each data point to the cluster whose center is nearest (the &#x201C;center&#x201D; is the mean position of the data points in the cluster). While K-means has been effective for many applications, it has significant limitations when used on social media data (<xref ref-type="bibr" rid="B7">7</xref>).</p>
<p>First, K-means is a &#x201C;hard&#x201D; clustering method: each data point belongs to exactly one cluster. This assumption does not hold for social media content, which can pertain to multiple topics and thus be related to many different social media users (<xref ref-type="bibr" rid="B4">4</xref>, <xref ref-type="bibr" rid="B9">9</xref>).</p>
<p>Second, K-means is sensitive to the initial center locations; poorly chosen initial centers typically lead to poor cluster assignments (<xref ref-type="bibr" rid="B8">8</xref>).</p>
<p>Finally, K-means is strongly affected by noisy data points and outliers, which are common in social media datasets (<xref ref-type="bibr" rid="B4">4</xref>, <xref ref-type="bibr" rid="B7">7</xref>).</p>
<p>Recent years have seen a sharp rise in interest in fuzzy clustering as a way to overcome the limitations of traditional cluster analysis. Unlike hard clustering methods, which assign each data point to exactly one cluster, fuzzy clustering permits a single data point to belong to multiple clusters with differing degrees of membership. This flexibility makes fuzzy clustering better suited to social media data, where the boundaries between topics are often unclear and overlapping (<xref ref-type="bibr" rid="B2">2</xref>, <xref ref-type="bibr" rid="B8">8</xref>).</p>
<p>Fuzzy c-means (FCM), one of the most common fuzzy clustering algorithms, assigns each data point a membership value for every cluster based on its similarity to that cluster&#x2019;s center, and iteratively updates the memberships and centers until they converge to a stable state (<xref ref-type="bibr" rid="B8">8</xref>).</p>
<p>However, the traditional FCM algorithm faces several challenges on large and complex datasets. FCM is sensitive to noise and outliers, which may significantly reduce the accuracy of its clustering results. It generally takes many iterations to converge, particularly on large or high-dimensional datasets. It is also highly dependent on the initial placement of cluster centers, which can lead to convergence to local minima and unstable clustering results (<xref ref-type="bibr" rid="B2">2</xref>, <xref ref-type="bibr" rid="B8">8</xref>).</p>
<p>Analyzing social media data requires clustering algorithms that are efficient, scalable, and accurate. With more online data generated every minute, it is critical to process large amounts of information in a timely fashion. Real-time or &#x201C;near real-time&#x201D; applications, including trend detection, emergency response, and online reputation management, depend heavily on efficient and robust clustering; improving these characteristics is therefore an important area of research (<xref ref-type="bibr" rid="B1">1</xref>, <xref ref-type="bibr" rid="B2">2</xref>).</p>
<p>Optimization techniques have been proposed as a means of enhancing the performance of traditional clustering algorithms. Optimized clustering algorithms can improve initialization, reduce computational complexity, speed up convergence, and reduce the effects of noise. Integrating optimization methods into fuzzy clustering therefore offers improved clustering results while retaining the benefits of fuzzy memberships (<xref ref-type="bibr" rid="B4">4</xref>, <xref ref-type="bibr" rid="B8">8</xref>).</p>
<p>This study develops an optimized fuzzy c-means clustering algorithm for grouping data generated by social media. The proposed method aims to improve on the existing FCM technique through faster convergence, reduced noise sensitivity, and more accurate clustering results (<xref ref-type="bibr" rid="B2">2</xref>, <xref ref-type="bibr" rid="B8">8</xref>). In addition, it incorporates optimization methods to improve its ability to manage the very large volumes of data generated by social media (<xref ref-type="bibr" rid="B4">4</xref>).</p>
<p>The primary objectives of this research are as follows: (1) create a framework for an optimized fuzzy c-means clustering method that is appropriate for social media data; (2) assess the performance of the proposed method against the traditional FCM method; and (3) validate the usefulness of the proposed method through experimental evaluation (<xref ref-type="bibr" rid="B2">2</xref>, <xref ref-type="bibr" rid="B8">8</xref>). By achieving these objectives, this research contributes to social media analysis and offers a practical approach to organizing and analyzing unstructured data from the Internet.</p>
<p>This paper is organized as follows. Section &#x201C;Introduction&#x201D; gives an overview of previous research on clustering and the analysis of social media data. Section &#x201C;Methodology&#x201D; details the research methodology, including a description of the proposed fuzzy c-means clustering algorithm. Section &#x201C;Results&#x201D; reports the experimental evaluation along with measurements of the performance of the proposed algorithm. Section &#x201C;Conclusion&#x201D; concludes the paper by discussing implications for future research based on the findings of this study.</p>
</sec>
<sec id="S2">
<title>Methodology</title>
<p>This section describes the design of the study: the research design, the data sources, the preprocessing applied to the data before analysis, the clustering approach, and the process used to evaluate the results. Following a systematic strategy through all phases of the research keeps the methods transparent and clearly explained (<xref ref-type="bibr" rid="B2">2</xref>, <xref ref-type="bibr" rid="B4">4</xref>).</p>
</sec>
<sec id="S3">
<title>Research design</title>
<p>The study combines analytical and experimental research designs to evaluate the performance of the new optimized FCM clustering algorithm relative to the traditional FCM technique on social media data (<xref ref-type="bibr" rid="B8">8</xref>). Comparing the performance and efficiency of the two approaches gives a quantified measure of how much the optimized approach improves on the traditional one (<xref ref-type="bibr" rid="B2">2</xref>, <xref ref-type="bibr" rid="B7">7</xref>).</p>
</sec>
<sec id="S4">
<title>Data set description</title>
<p>The data set used in this research consists of social media posts from a wide variety of publicly available sources on the internet. Each post reflects the posting user&#x2019;s own experience of a given topic within a specific social media category. To keep the evaluation as accurate as possible, only the text portion of each post (i.e., the user-typed content) is included, not any hyperlinks the post may contain (<xref ref-type="bibr" rid="B1">1</xref>, <xref ref-type="bibr" rid="B4">4</xref>). The data set therefore reflects the qualities of real-life social media posts (e.g., slang, abbreviations, and overlapping topics) (<xref ref-type="bibr" rid="B2">2</xref>, <xref ref-type="bibr" rid="B5">5</xref>). Before the analysis, duplicate posts, posts without relevant content, and incomplete posts are removed so that the analysis uses only the best-quality data available (<xref ref-type="bibr" rid="B6">6</xref>).</p>
</sec>
<sec id="S5">
<title>Data collection</title>
<p>Publicly available social media data is gathered through standardized processes (<xref ref-type="bibr" rid="B1">1</xref>). To preserve the original meaning of the collected data, each record is stored with its original time of collection (<xref ref-type="bibr" rid="B4">4</xref>). Because many individuals and companies are averse to sharing user data that could be deemed personal or identifying, data collection follows strict ethical principles governing how data is collected (only publicly available information) and what kinds of identifying data may be collected (<xref ref-type="bibr" rid="B2">2</xref>). Following collection, all records are kept in a structured storage system for later analysis and use (<xref ref-type="bibr" rid="B7">7</xref>).</p>
</sec>
<sec id="S6">
<title>Preprocessing</title>
<p>Most social media material contains a lot of &#x201C;noise&#x201D; and is often unformatted and of poor quality (<xref ref-type="bibr" rid="B4">4</xref>, <xref ref-type="bibr" rid="B5">5</xref>). It must therefore be processed carefully before analysis to ensure good data quality and, in turn, a sound analysis of the data set (<xref ref-type="bibr" rid="B6">6</xref>). The initial preprocessing stage generally removes irrelevant content (for example, internet links, numbers, punctuation, and special characters) to arrive at a more usable set of data (<xref ref-type="bibr" rid="B4">4</xref>). The cleaned text is then converted into a numerical format, commonly with the term frequency-inverse document frequency (TF-IDF) weighting scheme. TF-IDF assigns a high value to words that occur often in a given post but rarely across the rest of the corpus, and a low value to words that are used everywhere and contribute little meaning (<xref ref-type="bibr" rid="B4">4</xref>, <xref ref-type="bibr" rid="B6">6</xref>).</p>
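<p>The cleaning and TF-IDF steps described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used in the study: the function names <monospace>clean_post</monospace> and <monospace>tf_idf</monospace>, the regular expressions, the sample posts, and the unsmoothed weighting tf * log(N / df) are all choices made for this example.</p>
<preformat>
```python
import math
import re
from collections import Counter

# Hypothetical patterns for this sketch: web links, then anything that is not a letter or space.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_PATTERN = re.compile(r"[^a-z\s]")


def clean_post(text):
    """Lowercase a post, strip links, numbers, punctuation and special characters, then tokenize."""
    text = URL_PATTERN.sub(" ", text.lower())
    text = NON_LETTER_PATTERN.sub(" ", text)
    return text.split()


def tf_idf(posts):
    """Return one {term: weight} dict per post, using weight = tf * log(N / df)."""
    tokenized = [clean_post(post) for post in posts]
    n_posts = len(tokenized)
    # Document frequency: the number of posts that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against posts that are empty after cleaning
        weights.append({
            term: (count / total) * math.log(n_posts / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample posts, for illustration only.
posts = [
    "Traffic is terrible today!!! http://example.com/1",
    "Traffic update: 2 lanes closed today",
    "Loving the new phone camera",
]
for row in tf_idf(posts):
    print(row)
```
</preformat>
<p>In this sketch a word such as &#x201C;terrible&#x201D;, which appears in only one post, receives a higher weight than &#x201C;traffic&#x201D;, which appears in two. A production pipeline would typically add stop-word removal and a smoothed IDF term.</p>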
</sec>
<sec id="S7">
<title>Conventional fuzzy c-means clustering</title>
<p>The FCM clustering algorithm is a well-established example of fuzzy-based modeling (<xref ref-type="bibr" rid="B8">8</xref>). It therefore serves as the baseline against which the effectiveness of the methodology under consideration is measured (<xref ref-type="bibr" rid="B7">7</xref>).</p>
<p>Classical FCM clusters data into several clusters and gives each data point a degree of membership in every cluster. The algorithm operates iteratively: it recomputes the membership values and cluster center locations so as to minimize an objective function, and repeats until the clustering converges (<xref ref-type="bibr" rid="B8">8</xref>).</p>
<p>However, while classical FCM is an efficient way to cluster data whose groups overlap or resemble one another, it has some significant limitations (<xref ref-type="bibr" rid="B2">2</xref>, <xref ref-type="bibr" rid="B8">8</xref>).</p>
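<p>The iterative update used by the conventional baseline can be illustrated with a minimal pure-Python sketch of textbook FCM. It is offered as an assumption-laden illustration only and does not reproduce the optimized variant proposed in this paper. The fuzzifier <monospace>m = 2</monospace>, the tolerance, the random choice of initial centers, the function names, and the toy two-dimensional points are all assumptions made for this example.</p>
<preformat>
```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def update_memberships(points, centers, m):
    """u[i][j] = 1 / sum_k (d_ij / d_ik) ** (1 / (m - 1)), where d is the squared distance."""
    memberships = []
    for point in points:
        # A small floor keeps the ratios finite when a point sits exactly on a center.
        dists = [max(squared_distance(point, c), 1e-12) for c in centers]
        row = [
            1.0 / sum((d_j / d_k) ** (1.0 / (m - 1)) for d_k in dists)
            for d_j in dists
        ]
        memberships.append(row)
    return memberships


def update_centers(points, memberships, n_clusters, m):
    """Each center is the mean of all points, weighted by u[i][j] ** m."""
    dim = len(points[0])
    centers = []
    for j in range(n_clusters):
        weights = [row[j] ** m for row in memberships]
        total = sum(weights)
        centers.append([
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ])
    return centers


def fuzzy_c_means(points, n_clusters, m=2.0, tol=1e-5, max_iter=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, n_clusters)  # plain random initialization
    memberships = update_memberships(points, centers, m)
    for _ in range(max_iter):
        centers = update_centers(points, memberships, n_clusters, m)
        new_memberships = update_memberships(points, centers, m)
        # Stop once no membership value moves by more than tol.
        change = max(
            abs(new - old)
            for new_row, old_row in zip(new_memberships, memberships)
            for new, old in zip(new_row, old_row)
        )
        memberships = new_memberships
        if tol > change:
            break
    return centers, memberships


# Toy data: two tight groups plus one point that lies between them.
points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [2.6, 2.5]]
centers, memberships = fuzzy_c_means(points, n_clusters=2)
for point, row in zip(points, memberships):
    print(point, [round(u, 3) for u in row])
```
</preformat>
<p>Every row of memberships sums to one, and a point lying between the two groups receives a substantial membership in both clusters, which is the behavior that distinguishes FCM from a hard method such as K-means.</p>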
</sec>
<sec id="S8">
<title>Metrics for evaluating clustering performance</title>
<p>Multiple metrics can be used to evaluate clustering performance (<xref ref-type="bibr" rid="B8">8</xref>). Cluster quality is described by how well formed the clusters are, that is, by their compactness and their degree of separation (<xref ref-type="bibr" rid="B7">7</xref>). Computational performance is measured through execution time and convergence rate (<xref ref-type="bibr" rid="B8">8</xref>). The optimized fuzzy c-means clustering method is evaluated against the traditional FCM method to determine whether it improves the accuracy and efficiency of clustering (<xref ref-type="bibr" rid="B2">2</xref>).</p>
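<p>The text does not name specific validity indices, so the following Python sketch uses two common choices for fuzzy partitions purely as an assumption: the partition coefficient and the Xie-Beni index, which relates compactness to separation. A small timing helper stands in for the execution-time measurement. All function names and the toy values are hypothetical.</p>
<preformat>
```python
import time


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partition_coefficient(memberships):
    """Mean of squared memberships: 1.0 for a crisp partition, 1/c for a fully fuzzy one."""
    return sum(u * u for row in memberships for u in row) / len(memberships)


def xie_beni(points, centers, memberships, m=2.0):
    """Compactness divided by separation; smaller values mean tighter, better-separated clusters."""
    compactness = sum(
        (u ** m) * sq_dist(p, c)
        for p, row in zip(points, memberships)
        for u, c in zip(row, centers)
    )
    # Separation is the smallest squared distance between any two centers.
    separation = min(
        sq_dist(a, b)
        for i, a in enumerate(centers)
        for b in centers[i + 1:]
    )
    return compactness / (len(points) * separation)


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Toy partition: four points, two centers, hand-written memberships.
points = [[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]]
centers = [[0.0, 0.5], [4.0, 0.5]]
memberships = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]

pc = partition_coefficient(memberships)
xb, seconds = timed(xie_beni, points, centers, memberships)
print(round(pc, 4), round(xb, 6), seconds)
```
</preformat>
<p>Applying the same functions to the outputs of the traditional and the optimized algorithm on the same data would give directly comparable numbers for cluster quality and run time.</p>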
</sec>
<sec id="S9">
<title>Statistical data analysis</title>
<p>Statistical analysis of the data set(s) is conducted with standard statistical software packages for all experiments (<xref ref-type="bibr" rid="B4">4</xref>). The clustering outputs of both methods are then analyzed systematically to evaluate overall clustering performance (<xref ref-type="bibr" rid="B7">7</xref>). Descriptive statistics summarize the performance of each clustering run (<xref ref-type="bibr" rid="B8">8</xref>). Comparative evaluations of clustering quality and computational performance between the two methods provide the evidence for the improved performance of the optimized fuzzy c-means clustering method (<xref ref-type="bibr" rid="B2">2</xref>, <xref ref-type="bibr" rid="B8">8</xref>).</p>
</sec>
<sec id="S10" sec-type="results">
<title>Results</title>
<p>This section presents the implementation of the optimized fuzzy c-means clustering approach and what it offers for analyzing social media data. The figures explain how the system was designed, how it executes, and how the clustering works. They also describe how the system processes social media data and indicate how useful clusters are derived.</p>
</sec>
<sec id="S11">
<title>System design and data processing findings</title>
<p>The data flow diagram (<xref ref-type="fig" rid="F1">Figure 1</xref>) shows how data flows through the system. Data is collected from social media (Twitter) and passes through several preprocessing modules before being sent to the clustering engine. Because the data is cleaned and structured before it is clustered, the resulting clusters are more accurate and reliable than they otherwise would be. The clear division between processing modules reduces duplication of data and should ultimately make processing more efficient.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>Data flow diagram of the proposed system.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="sapars-2026-24-g001.tif"/>
</fig>
<p>The use case diagram (<xref ref-type="fig" rid="F2">Figure 2</xref>) shows how the user and the system work together to carry out the system&#x2019;s functions: logging in, retrieving data, preprocessing, executing the clustering function, and visualizing results. It also shows how the design controls access to the clustering functions while allowing users to execute clustering operations efficiently.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Use case diagram of the social media clustering system.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="sapars-2026-24-g002.tif"/>
</fig>
</sec>
<sec id="S12">
<title>Sequential and collaborative execution</title>
<p>This section covers the sequence diagram and the collaboration diagram, shown in <xref ref-type="fig" rid="F3">Figures 3</xref> and <xref ref-type="fig" rid="F4">4</xref>, respectively. The sequence diagram (<xref ref-type="fig" rid="F3">Figure 3</xref>) depicts how the system executes its functions. The sequence begins with the user authenticating as a valid system user. The system then extracts data from the Twitter server, preprocesses that data for text analysis, applies the optimized fuzzy c-means clustering algorithm to group the preprocessed data by similarity, and finally produces the output required by the user. With sequential execution, each step must be completed before the next begins, which reduces the opportunity for errors and increases the overall stability of the system.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>Sequence diagram illustrating system execution flow.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="sapars-2026-24-g003.tif"/>
</fig>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption><p>Collaboration diagram showing interaction among system modules.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="sapars-2026-24-g004.tif"/>
</fig>
<p>The collaboration diagram (<xref ref-type="fig" rid="F4">Figure 4</xref>) presents how the different parts of the system work together to produce the required output. In addition to showing how these elements relate to one another, it depicts how they interact to complete their assigned functions. When all modules collaborate appropriately, the overall performance and success of the clustering operations increase.</p>
</sec>
<sec id="S13">
<title>System behavior and control flow</title>
<p>The activity diagram in <xref ref-type="fig" rid="F5">Figure 5</xref> depicts the control flow through the system, from user login to the generation of output. Control flow branches on user input and system-generated events, which minimizes delays in processing user input and supports accurate execution of the clustering algorithm.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption><p>Activity diagram representing overall system workflow.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="sapars-2026-24-g005.tif"/>
</fig>
<p>The state chart diagram in <xref ref-type="fig" rid="F6">Figure 6</xref> describes the system&#x2019;s states during the different phases of data processing. The system transitions between the idle, data loading, preprocessing, clustering, and result generation states. Restricting the system to these defined transitions keeps it from entering invalid or unstable states and therefore increases the robustness of the overall system.</p>
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption><p>State chart diagram showing system states and transitions.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="sapars-2026-24-g006.tif"/>
</fig>
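<p>One way to enforce the state transitions described for the state chart is an explicit transition table that rejects any move not listed. The Python sketch below is a hypothetical illustration: the class names, the linear ordering of states, and the return from result generation to idle are assumptions, and the figure remains the authoritative description of the actual transitions.</p>
<preformat>
```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    LOADING = "data loading"
    PREPROCESSING = "preprocessing"
    CLUSTERING = "clustering"
    RESULT = "result generation"


# Allowed transitions (assumed for this sketch, not taken from the figure).
TRANSITIONS = {
    State.IDLE: {State.LOADING},
    State.LOADING: {State.PREPROCESSING},
    State.PREPROCESSING: {State.CLUSTERING},
    State.CLUSTERING: {State.RESULT},
    State.RESULT: {State.IDLE},
}


class Pipeline:
    """Tracks the current state and refuses transitions that are not in the table."""

    def __init__(self):
        self.state = State.IDLE

    def advance(self, target):
        if target not in TRANSITIONS[self.state]:
            raise ValueError(
                f"invalid transition: {self.state.value} -> {target.value}"
            )
        self.state = target


pipeline = Pipeline()
pipeline.advance(State.LOADING)
pipeline.advance(State.PREPROCESSING)
print(pipeline.state.value)
```
</preformat>
<p>Attempting to jump from the idle state straight to clustering raises an error in this sketch, which is the kind of guard that keeps a pipeline out of invalid states.</p>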
</sec>
<sec id="S14">
<title>Class diagram and implementation results of system</title>
<p>The class diagram in <xref ref-type="fig" rid="F7">Figure 7</xref> shows the internal structure of the system. It defines the classes, attributes, and methods used to implement data handling, preprocessing, clustering, and presentation of results. It also shows that the system follows a modular design, which makes it easier to maintain, update, and extend in the future.</p>
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption><p>Class diagram of the optimized fuzzy means clustering system.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="sapars-2026-24-g007.tif"/>
</fig>
</sec>
<sec id="S15">
<title>Execution and output results</title>
<p><xref ref-type="fig" rid="F8">Figure 8</xref> shows the Twitter dataset being read from the Twitter server. The figure shows the system successfully retrieving real-time and stored social media data for further analysis. Correct extraction of the data is a precondition for reliable clustering results.</p>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption><p>Reading the Twitter dataset from the Twitter server.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="sapars-2026-24-g008.tif"/>
</fig>
<p>The final output of the system is displayed in <xref ref-type="fig" rid="F9">Figure 9</xref>. It shows the groupings of social media posts produced by the optimized fuzzy c-means clustering algorithm: posts that carry similar messages are grouped together. Compared with the original FCM clustering, the optimized algorithm produces clearer, more meaningful clusters and shorter processing times.</p>
<fig id="F9" position="float">
<label>FIGURE 9</label>
<caption><p>Final output.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="sapars-2026-24-g009.tif"/>
</fig>
</sec>
<sec id="S16">
<title>Performance analysis</title>
<p>The findings of this research indicate that the optimized fuzzy c-means clustering algorithm performs well when applied to social media data. The figures show that the system design, execution, and resulting clustering behaved as intended. The system handled unstructured data from social media sources efficiently and produced accurate clustering results. It therefore has potential for use as part of a large-scale social media analytics application.</p>
</sec>
<sec id="S17" sec-type="discussion">
<title>Discussion</title>
<p>The purpose of this study was to examine the effectiveness of an optimized fuzzy c-means clustering algorithm for grouping social media data. The analysis shows that the proposed system is an effective way to analyze unstructured social media data and produce meaningful clustering results (<xref ref-type="bibr" rid="B2">2</xref>, <xref ref-type="bibr" rid="B4">4</xref>). The results are discussed with respect to system design, clustering performance, and practical use of the system.</p>
<p>The system design results (<xref ref-type="fig" rid="F1">Figures 1</xref>&#x2013;<xref ref-type="fig" rid="F7">7</xref>) demonstrate that the architecture of the proposed system is logically structured and organized (<xref ref-type="bibr" rid="B7">7</xref>). The data flow diagram shows a clear and orderly movement of data from the point of collection through clustering to the generation of outputs (<xref ref-type="bibr" rid="B4">4</xref>). Preprocessing the data properly before clustering reduces noise and enhances clustering accuracy (<xref ref-type="bibr" rid="B5">5</xref>, <xref ref-type="bibr" rid="B6">6</xref>). The use case, sequence, and activity diagrams show that the clustering operation follows a predefined order of execution, which reduces the risk of processing errors and increases the overall reliability of the system (<xref ref-type="bibr" rid="B7">7</xref>).</p>
<p>The figures below demonstrate how coordinated and managed the overall system states are, as demonstrated by the collaboration diagram that shows the interaction between the different modules of the system (<xref ref-type="bibr" rid="B1">1</xref>). The complete sequence from retrieving data to preprocessing to clustering and finally to visualizing the results should occur smoothly between the different modules (<xref ref-type="bibr" rid="B4">4</xref>). Also displayed is the state chart diagram, which illustrates that the system moves through the defined states of loading data, preprocessing, clustering, and generating results seamlessly and without break (<xref ref-type="bibr" rid="B7">7</xref>).</p>
<p>The class diagram in <xref ref-type="fig" rid="F7">Figure 7</xref> shows a modular approach to the overall system structure, which improves maintainability and makes it easier to modify existing functionality or add new functionality in future development (<xref ref-type="bibr" rid="B2">2</xref>). The modular approach also supports scaling to meet the demands of collecting large volumes of social media content (<xref ref-type="bibr" rid="B1">1</xref>).</p>
<p>The use of real social media data underlines the practical relevance of this study (<xref ref-type="bibr" rid="B2">2</xref>). <xref ref-type="fig" rid="F8">Figure 8</xref> illustrates successful retrieval of Twitter data from the Twitter server (<xref ref-type="bibr" rid="B1">1</xref>). Live Twitter data provide a realistic test of the new methodology because social media data are typically noisy, follow no standard format, are rich in diverse information, and change over time (<xref ref-type="bibr" rid="B4">4</xref>, <xref ref-type="bibr" rid="B5">5</xref>). The performance observed in the study indicates that the optimized fuzzy c-means clustering algorithm can be applied in practice to analyze social media data (<xref ref-type="bibr" rid="B2">2</xref>, <xref ref-type="bibr" rid="B8">8</xref>).</p>
<p>The output of the clustering system is displayed in <xref ref-type="fig" rid="F9">Figure 9</xref>. The clustering groups similar social media posts together, even where their topics and themes overlap (<xref ref-type="bibr" rid="B8">8</xref>). Compared to traditional FCM clustering, the optimized method produced more clearly separated clusters and required less computation time to generate them (<xref ref-type="bibr" rid="B7">7</xref>, <xref ref-type="bibr" rid="B8">8</xref>). These improvements are attributed to improved initialization of the cluster centers, better noise reduction, and faster convergence of the optimized FCM algorithm (<xref ref-type="bibr" rid="B2">2</xref>, <xref ref-type="bibr" rid="B8">8</xref>).</p>
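<p>To make the role of overlapping memberships concrete, the following minimal sketch implements standard fuzzy c-means, that is, the traditional baseline and not the optimized variant evaluated in this study. The fuzzifier m = 2, the random initialization of memberships, the stopping tolerance, and the toy two-dimensional points standing in for document vectors are all assumptions made for illustration.</p>
<preformat>
```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=200, tol=1e-5, seed=0):
    """Standard fuzzy c-means (not the optimized variant of this study).

    Returns (centers, U), where U[i, j] is the degree to which
    point i belongs to cluster j and every row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized per point.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        W = U ** m
        # Each center is the membership-weighted mean of all points.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every point to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        # Membership update: u_ij is proportional to d_ij ** (-2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        done = np.allclose(U_new, U, rtol=0.0, atol=tol)
        U = U_new
        if done:
            break
    return centers, U


# Toy stand-ins for document vectors: two tight groups and one point between them.
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0],
    [5.5, 5.5],
])
centers, U = fuzzy_c_means(X, n_clusters=2)
print(np.round(U, 3))  # the last row is split roughly evenly between the clusters
```
</preformat>
<p>In this sketch the point lying between the two groups receives a membership of about one half in each cluster, which is the behavior that lets fuzzy clustering represent posts whose topics overlap. A hard assignment, if needed, can be obtained by taking the cluster with the largest membership in each row.</p>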
<p>Overall, these results indicate that the optimized FCM clustering algorithm is an effective method for analyzing social media data (<xref ref-type="bibr" rid="B2">2</xref>); the combination of a well-structured system architecture with an optimized clustering algorithm yielded improved accuracy, efficiency, and reliability when processing large amounts of social media data (<xref ref-type="bibr" rid="B4">4</xref>, <xref ref-type="bibr" rid="B8">8</xref>). Future studies may therefore build on this system architecture for applications such as trend detection (identifying changes in consumer behavior), opinion mining (determining attitudes, beliefs, and feelings about certain topics), and the organization of digital content found online (<xref ref-type="bibr" rid="B2">2</xref>, <xref ref-type="bibr" rid="B5">5</xref>). Future research might also focus on increasing automation, for example by automatically determining the ideal number of clusters, and on using advanced machine learning techniques to improve clustering performance on social media data (<xref ref-type="bibr" rid="B8">8</xref>).</p>
</sec>
<sec id="S18" sec-type="conclusion">
<title>Conclusion</title>
<p>The present research assessed an enhanced fuzzy c-means clustering algorithm designed to help analyze the vast quantity of unstructured and noisy data generated by social media networks. Such data are difficult to analyze, both because of their volume and because traditional clustering techniques, including standard fuzzy clustering, are hard to apply to posts whose topics often overlap and whose content is uncertain. To address these problems, the proposed approach combines fuzzy clustering and optimization methods in a single algorithm in order to generate more accurate and efficient clustering results.</p>
<p>The results of this study indicate that the optimized fuzzy c-means algorithm can cluster real social media data and produce meaningful clusters. The system structure allowed the data to flow from the collection phase through to the clustered output. Noise was removed in the preprocessing stage, which gave the subsequent analysis cleaner data to work with. In addition, the optimized fuzzy c-means clustering produced better cluster separation and faster convergence than traditional FCM clustering on social media data. Together, these improvements make the optimized fuzzy c-means clustering algorithm efficient at analyzing large quantities of complex data produced by social media networks.</p>
<p>The analysis of real Twitter data strengthens the practical implications of this study&#x2019;s results. Social media data were successfully retrieved from Twitter, and the clustering results were generated with a user-defined number of clusters. The final clustering results demonstrate that the optimized algorithm groups similar social media posts even when their subject matter overlaps. This characteristic of the optimized fuzzy c-means clustering algorithm is an important advantage for applications in topic detection, opinion mining, and trend analysis.</p>
<p>Overall, the results from this study demonstrate that the optimized fuzzy c-means clustering algorithm offers a reliable and efficient solution for social media data analysis. Additionally, the modular system design supports both maintainability and scalability, making this approach a strong candidate for future enhancements. While the results from this study are promising, several limitations remain. The number of clusters has to be defined manually, and the evaluation was performed only once, on a single dataset.</p>
<p>Future studies may improve clustering effectiveness by assessing the algorithm&#x2019;s performance on other social media datasets and by determining the number of clusters automatically. Clustering may also improve further if it is integrated with advanced machine learning or deep learning techniques. In sum, the work presented here offers an effective clustering solution and a foundation for future research in the growing field of social media analytics.</p>
</sec>
</body>
<back>
<sec id="S19" sec-type="funding-information">
<title>Funding</title>
<p>The author declares that this research received no external funding.</p>
</sec>
<sec id="S20">
<title>Conflict of interest</title>
<p>The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1"><label>1.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Batrinca</surname> <given-names>B</given-names></name> <name><surname>Treleaven</surname> <given-names>PC</given-names></name></person-group>. <article-title>Social media analytics: a survey of techniques, tools, and platforms.</article-title> <source><italic>AI Soc.</italic></source> (<year>2015</year>) <volume>30</volume>(<issue>1</issue>):<fpage>89</fpage>&#x2013;<lpage>116</lpage>.</citation></ref>
<ref id="B2"><label>2.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Khan</surname> <given-names>MA</given-names></name> <name><surname>Karim</surname> <given-names>MR</given-names></name> <name><surname>Yang</surname> <given-names>Y</given-names></name></person-group>. <article-title>A review of social media analytics and clustering techniques.</article-title> <source><italic>IEEE Access.</italic></source> (<year>2021</year>) <volume>9</volume>:<fpage>45231</fpage>&#x2013;<lpage>45</lpage>.</citation></ref>
<ref id="B3"><label>3.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Georgie</surname> <given-names>M</given-names></name></person-group>. <article-title>Social media usage and its impact on communication and society.</article-title> <source><italic>Int J Soc Media Stud.</italic></source> (<year>2019</year>) <volume>6</volume>(<issue>2</issue>):<fpage>45</fpage>&#x2013;<lpage>53</lpage>.</citation></ref>
<ref id="B4"><label>4.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aggarwal</surname> <given-names>CC</given-names></name> <name><surname>Zhai</surname> <given-names>C.</given-names></name></person-group> <source><italic>Mining Text Data.</italic></source> <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Springer</publisher-name> (<year>2019</year>).</citation></ref>
<ref id="B5"><label>5.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>B.</given-names></name></person-group> <source><italic>Sentiment Analysis: Mining Opinions, Sentiments, and Emotions.</italic></source> <publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name> (<year>2020</year>).</citation></ref>
<ref id="B6"><label>6.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Manning</surname> <given-names>CD</given-names></name> <name><surname>Raghavan</surname> <given-names>P</given-names></name> <name><surname>Sch&#x00FC;tze</surname> <given-names>H.</given-names></name></person-group> <source><italic>Introduction to Information Retrieval.</italic></source> <publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name> (<year>2018</year>).</citation></ref>
<ref id="B7"><label>7.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nguyen</surname> <given-names>TT</given-names></name> <name><surname>Aberer</surname> <given-names>K</given-names></name></person-group>. <article-title>Clustering and classification of social media data.</article-title> <source><italic>Soc Net Anal Mining.</italic></source> (<year>2018</year>) <volume>8</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>14</lpage>.</citation></ref>
<ref id="B8"><label>8.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Y</given-names></name> <name><surname>Wang</surname> <given-names>J</given-names></name> <name><surname>Zhao</surname> <given-names>X</given-names></name></person-group>. <article-title>An improved fuzzy clustering algorithm for large-scale text data analysis.</article-title> <source><italic>Expert Syst Appl.</italic></source> (<year>2020</year>) <volume>158</volume>:<fpage>113</fpage>&#x2013;<lpage>21</lpage>.</citation></ref>
<ref id="B9"><label>9.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Blei</surname> <given-names>DM</given-names></name></person-group>. <article-title>Probabilistic topic models.</article-title> <source><italic>Commun ACM.</italic></source> (<year>2012</year>) <volume>55</volume>(<issue>4</issue>):<fpage>77</fpage>&#x2013;<lpage>84</lpage>.</citation></ref>
</ref-list>
</back>
</article>
