This dataset presents a comprehensive collection of 1,338 sputum smear microscopy images containing 1,628 ground truth bounding box labels for the detection of Mycobacterium tuberculosis. The clinical specimens were sourced from Dr. Mohamad Soewandhi Regional General Hospital and Surabaya Pulmonary Hospital, Indonesia, and captured using a standard optical microscope equipped with a Hayear digital microscope camera.

To address common challenges in microscopic imaging, such as uneven background illumination, dust artifacts, and spatial noise, the dataset is provided in two distinct versions to facilitate diverse experimental setups:

Raw Dataset: Contains the original images captured directly from the microscope, preserving the raw illumination and color characteristics of the stained slides.

Processed CIELAB (Enhanced) Dataset: Contains images that have undergone a specific computational enhancement pipeline. This pipeline includes spatial noise reduction using a Median Filter (3x3 kernel), illumination equalization via Contrast Limited Adaptive Histogram Equalization (CLAHE) applied to the 'L' channel, and a color space transformation where the original 'Blue' channel is synthetically replaced by the 'a' channel from the CIELAB color space. This synthesis maximizes the visual separation between the bacilli pigments and the background.

Data Structure & Format: Both the raw and enhanced datasets are explicitly divided into train and val (validation) subfolders to facilitate immediate machine learning model training. All image annotations are provided as text files (.txt) strictly following the standard YOLO bounding box format (normalized coordinates: class_id x_center y_center width height). The object class ID for Mycobacterium tuberculosis is set to 0.

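The YOLO annotation format described above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset itself: the function names (parse_yolo_line, yolo_to_pixel_box, read_labels) are hypothetical, and the image width and height must be supplied by the caller because the label files store only normalized coordinates.

```python
from pathlib import Path


def parse_yolo_line(line):
    """Parse one 'class_id x_center y_center width height' label line."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    x_center, y_center, width, height = (float(p) for p in parts[1:])
    return class_id, x_center, y_center, width, height


def yolo_to_pixel_box(x_center, y_center, width, height, img_w, img_h):
    """Convert a normalized YOLO box to (x_min, y_min, x_max, y_max) in pixels."""
    x_min = (x_center - width / 2) * img_w
    y_min = (y_center - height / 2) * img_h
    x_max = (x_center + width / 2) * img_w
    y_max = (y_center + height / 2) * img_h
    return x_min, y_min, x_max, y_max


def read_labels(label_path):
    """Read every non-empty line of a .txt label file into a list of tuples."""
    text = Path(label_path).read_text()
    return [parse_yolo_line(ln) for ln in text.splitlines() if ln.strip()]


# Example: one bacillus (class 0) centered in a hypothetical 800x600 image.
cls, xc, yc, w, h = parse_yolo_line("0 0.5 0.5 0.25 0.5")
print(yolo_to_pixel_box(xc, yc, w, h, img_w=800, img_h=600))
# → (300.0, 150.0, 500.0, 450.0)
```

Since the dataset contains a single object class, every parsed class_id is expected to be 0; a check on that value is a cheap way to catch malformed label files.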
Potential Use Cases: Researchers and developers in computer vision and healthcare diagnostics can utilize this dual-version dataset to build, benchmark, and improve object detection algorithms (such as the YOLO family) for automated tuberculosis screening. Furthermore, it serves as a ready-to-use resource for evaluating how color space transformations affect model robustness against common microscopic imaging artifacts.