The aim of the study is to propose a technique for visualizing the results of short text clustering with the Gibbs Sampling Dirichlet Multinomial Mixture (GSDMM) algorithm, in order to facilitate analysis of the results and the selection of both the algorithm's hyperparameters and the dictionary. GSDMM was selected because it is the most popular short text clustering algorithm on GitHub. The Rust implementation of the algorithm by Ryan Walker was used, and bar charts were created with the program Scimago Graphica. The source of short texts was a set of 16,486 bibliometric records on the topic “Visualization” exported from the Scopus database on November 12, 2024. Only author keywords are used as short texts in this paper. The proposed visualization technique is based on comparing the occurrence of keywords in a given cluster with their occurrence in each of the other clusters. It is shown that the cluster topics obtained with GSDMM can be compared with the results of author keyword clustering performed with the VOSviewer program. This result can be interpreted as evidence of a certain stability of cluster themes across essentially different methods. The author suggests extending the study by creating a thematic dictionary of abbreviations, analyzing the influence of the dictionary on the GSDMM clustering results, and applying the visualization method to other short texts such as titles and abstracts.
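The comparison that underlies the proposed visualization can be illustrated with a minimal Python sketch. It assumes that each record is a list of author keywords and that cluster labels have already been produced by a GSDMM run. The function names `keyword_occurrence_by_cluster` and `compare_cluster` and the toy records are hypothetical and are not taken from the study; the sketch only prepares the per-cluster counts that a bar chart would display.

```python
from collections import Counter


def keyword_occurrence_by_cluster(docs, labels):
    """Count, for every cluster, in how many of its records each keyword occurs."""
    counts = {}
    for keywords, label in zip(docs, labels):
        # set() ensures a keyword is counted at most once per record
        counts.setdefault(label, Counter()).update(set(keywords))
    return counts


def compare_cluster(counts, cluster, top_n=3):
    """For the top keywords of `cluster`, report their occurrence in every cluster."""
    clusters = sorted(counts)
    # Rank by descending count, then alphabetically to keep the order deterministic
    top = sorted(counts[cluster].items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return [(kw, {c: counts[c][kw] for c in clusters}) for kw, _ in top]


# Toy records (illustrative only); labels stand in for GSDMM output
docs = [
    ["visualization", "virtual reality"],
    ["visualization", "virtual reality", "education"],
    ["visualization", "deep learning"],
    ["deep learning", "explainable ai"],
]
labels = [0, 0, 1, 1]

counts = keyword_occurrence_by_cluster(docs, labels)
rows = compare_cluster(counts, cluster=0, top_n=2)
for keyword, per_cluster in rows:
    print(keyword, per_cluster)
```

Each row pairs a keyword with its occurrence in the chosen cluster and in every other cluster, which is the shape a grouped bar chart needs. A keyword that is frequent in one cluster and rare elsewhere characterizes that cluster, whereas one that is frequent everywhere (here, "visualization") is a candidate for removal from the dictionary.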