Post-clustering merging with novel metrics for multi-label image collections

2025 · 0 citations · Journal Article · Hybrid Open Access

Authors

Floris Gisolf · Stichting Kinderoncologie Nederland

Marcel Worring · University of Amsterdam

Abstract

This study addresses the task of clustering multi-label image collections, which is increasingly important in fields such as forensics, social media, and intelligence. Traditional classification models fall short in real-world scenarios where labeled data may not be available; unsupervised clustering offers a way forward in such cases. A clustering of multi-label data should minimize the number of clusters an analyst must inspect to identify all instances of a specific label (cluster efficiency), while also reducing misplaced data within each cluster (cluster quality). Existing clustering algorithms applied to multi-label image collections generally place a strong emphasis on either cluster efficiency or cluster quality. We propose a Post-Clustering Merging algorithm that provides greater control over the trade-off between cluster efficiency and quality in multi-label image collections and that can be applied to the results of existing clustering algorithms. We also introduce two external metrics designed for multi-label clustering: the Pairwise Jaccard Similarity Score and the Label Distribution Score. These metrics enable a nuanced evaluation of clustering quality and efficiency, respectively, in scenarios where single-label metrics are inadequate. We demonstrate the effectiveness of the merging algorithm on various multi-label image collections. The results indicate significant improvements: the algorithm not only gives more control, but also reduces the trade-off between cluster quality and efficiency. This study fills a gap in multi-label data collection analysis and sets a foundation for future exploration in this domain.

Highlights

• Two novel metrics allow evaluation of multi-label image collection clustering.
• Clusters are merged based on similarities and dissimilarities to increase performance.
• Our method proves effective on several multi-label image collections.
• Our merge method with k-means outperforms state-of-the-art deep clustering.
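The abstract names a Pairwise Jaccard Similarity Score but does not give its formula. As a rough illustration only, the sketch below computes one plausible reading: the mean Jaccard similarity between the label sets of every pair of images that share a cluster. The function names (`jaccard`, `mean_pairwise_jaccard`), the input layout, and the averaging scheme are assumptions made for this example and may differ from the definition in the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two label sets; two empty sets count as identical."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(clusters):
    """Average Jaccard similarity over all image pairs that share a cluster.

    ASSUMPTION: this is an illustrative stand-in, not the paper's metric.
    `clusters` is a list of clusters; each cluster is a list of label sets,
    one per image. Returns None when no cluster holds at least two images.
    """
    scores = [
        jaccard(a, b)
        for cluster in clusters
        for a, b in combinations(cluster, 2)
    ]
    return sum(scores) / len(scores) if scores else None


if __name__ == "__main__":
    # Two hypothetical clusters with ground-truth label sets per image.
    example = [
        [{"cat", "dog"}, {"cat"}],  # one pair, similarity 0.5
        [{"car"}, {"car"}],         # one pair, similarity 1.0
    ]
    print(mean_pairwise_jaccard(example))
```

Under this reading, a higher value means images grouped together tend to share more of their labels, which matches the abstract's use of the score as a quality measure.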

Topics & Keywords

Image Retrieval and Classification Techniques · Text and Document Classification Technologies · Advanced Image and Video Retrieval Techniques

Publication Details

Published in: Expert Systems with Applications

Volume 288, Article 127875

DOI: 10.1016/j.eswa.2025.127875

Field-Weighted Citation Impact: 0.00
