Author keywords, unlike terms assigned by professional indexers, are not regulated by normative documents or controlled by special dictionaries. The aim of this study is to identify statistical differences between two sets of keywords (KWs): those assigned by authors and those assigned by the editors of the VINITI RAS abstract database. We believe that confirming and understanding these differences may help make more rational use of keywords obtained from different sources. This study is the first to compare quantitative indicators of the novelty and lexical diversity of author and editorial KWs. It is also the first to compare, on several independent thematic samples, the degree to which author and editorial KWs are included in other metadata elements. The methodological basis of the study is generalization: the identification and quantitative analysis of common features shared by the data arrays under study. The empirical base consisted of five independent statistical samples ranging in size from 10.40 thousand to 18.97 thousand articles. The topics of the samples corresponded to five headings of the State Rubricator of Scientific and Technical Information: 52. Mining; 53. Metallurgy; 55. Mechanical Engineering; 61. Chemical Technology. Chemical Industry; 73. Transport. We selected Russian-language articles uploaded to the VINITI abstract database in 2021–2024 that contained all of the following non-empty metadata elements: title, author keywords, author abstract, editor keywords, and an abstract prepared specifically for the VINITI abstract database. For each sample, and separately for author and editor KWs, we obtained point statistical estimates of the identified common features: lexical diversity, novelty, and inclusion of keywords in other metadata elements (title and abstract).
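The abstract names three indicators (lexical diversity, novelty, and inclusion of keywords in other metadata) but does not give their formulas. The sketch below is a hypothetical illustration under simple assumed definitions, not the authors' method. Lexical diversity is taken as distinct keywords divided by total keyword occurrences (a type-token ratio). Novelty for a given year is taken as the share of that year's distinct keywords that never appeared in earlier years. Title inclusion is taken as the share of keyword occurrences found verbatim in their article's title. The `Article` record and all function names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical record: publication year, title, and one set of keywords
    (author KWs or editor KWs would be analysed as separate lists)."""
    year: int
    title: str
    keywords: list[str]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace. Assumption: no lemmatization."""
    return " ".join(text.lower().split())


def lexical_diversity(articles: list[Article]) -> float:
    """Assumed definition: distinct keywords / total keyword occurrences."""
    occurrences = [normalize(kw) for a in articles for kw in a.keywords]
    return len(set(occurrences)) / len(occurrences) if occurrences else 0.0


def novelty(articles: list[Article], year: int) -> float:
    """Assumed definition: share of distinct keywords used in `year`
    that do not occur in any earlier year of the sample."""
    earlier = {normalize(kw) for a in articles if a.year < year for kw in a.keywords}
    current = {normalize(kw) for a in articles if a.year == year for kw in a.keywords}
    return len(current - earlier) / len(current) if current else 0.0


def title_inclusion(articles: list[Article]) -> float:
    """Assumed definition: share of keyword occurrences that appear as a
    substring of their own article's normalized title."""
    total = hits = 0
    for a in articles:
        title = normalize(a.title)
        for kw in a.keywords:
            total += 1
            hits += int(normalize(kw) in title)
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    sample = [
        Article(2021, "Flotation of copper ores", ["flotation", "copper ores", "reagents"]),
        Article(2022, "Heap leaching of gold", ["heap leaching", "gold", "Flotation"]),
    ]
    print(round(lexical_diversity(sample), 3))  # 0.833 (5 distinct / 6 occurrences)
    print(round(novelty(sample, 2022), 3))      # 0.667 (2 of 3 keywords are new in 2022)
    print(round(title_inclusion(sample), 3))    # 0.667 (4 of 6 occurrences found in titles)
```

Plain substring matching is a deliberate simplification. It would miss inflected Russian word forms and could over-match short keywords (for example, "gold" inside "golden"), so a real analysis would likely need lemmatization or token-level matching. Inclusion in abstracts would follow the same pattern, with the abstract text used in place of the title.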
Similar statistical differences between author and editor KWs were observed in all five thematic samples. Lexical diversity is higher for author KWs than for editor KWs. The novelty coefficient is higher for author KWs than for editor KWs, and it is also higher for author abstracts than for the abstracts prepared for the VINITI database. Finally, author KWs are included in article titles to a lesser degree than editor KWs. Because these differences replicate across five independent thematic samples corresponding to randomly selected fields of knowledge, they appear to be statistically stable. The vocabulary of author KWs changes more over time than the more stable vocabulary of editor KWs, which may be useful for rapidly identifying new terminology and research frontiers. Unlike editor KWs, author KWs cannot on their own express the main topics and concepts of a document, because they supplement the terms that can be extracted from publication titles rather than repeat them.