Bacterial Colony Images on Multiple Media (10-class dataset)

2026 · 0 citations · Dataset · Green Open Access

Authors

Tassadaq Hussain · Pawsey Supercomputing Research Centre

Abstract

# Bacterial Colony Images on Multiple Media (10-class dataset)

This repository provides **documentation, metadata, scripts, and a lightweight GitHub subset** of a bacterial colony image dataset organized by **species × culture medium** (10 classes).

✅ **Full dataset (recommended for research use):** hosted on **Zenodo** (~9.2 GB)

✅ **GitHub subset:** a reduced version (~1.9 GB) containing images from **all 10 classes**, intended for quick access, preview, and a reproducible folder structure (not the complete set).

---

### Zenodo (full archival dataset)

- Full-resolution dataset (**~9.2 GB**) intended for experiments and publications.
- DOI-based, citable, versioned archive.

---

## Quick facts (full Zenodo release)

- **Total images (full):** 2317
- **File formats:** JPG, PNG
- **Classes:** 10 (4 species across multiple media)
- **Full dataset size:** ~9.2 GB (Zenodo)
- **GitHub subset size:** ~1.9 GB (this repo; not complete)
- **Imaging protocol (paper):** fixed ≥16 MP camera, controlled lighting, ~30 cm standoff; consistent framing and preprocessing.

---

## Class taxonomy (species × medium): full dataset counts (Zenodo)

| Species | Medium (folder name) | Images (full) |
|---|---|---:|
| E_Coli | E_Coli on EMB agar medium | 203 |
| E_Coli | E_Coli on MacConkey_Agar medium | 203 |
| E_Coli | E_Coli on Nutrients agar medium | 216 |
| Salmonella | Salmonella on XLD agar medium | 211 |
| Salmonella | Salmonella on MacConkey agar medium | 215 |
| Salmonella | Salmonella on Nutrients agar medium | 294 |
| Enterococcus | Enterococcus on Slantz_and_Bartley agar medium | 208 |
| Enterococcus | Enterococcus on Nutrients agar medium | 285 |
| Staphylococcus | Staphylococcus on MSA agar medium | 175 |
| Staphylococcus | Staphylococcus on Nutrient Agar | 307 |
| **Total** | | **2317** |

> Note: The **GitHub subset contains fewer images than the table above** (the subset is ~1.9 GB).
> The table reflects the **full Zenodo release**.
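A local copy can be checked against the full-release counts in the class taxonomy table. The sketch below counts JPG/PNG files per class folder and compares them with those counts. It assumes a hypothetical layout with one subfolder per class under an images root, each named exactly as in the "Medium (folder name)" column. The names `count_images`, `compare_counts` and `FULL_COUNTS` are illustrative and are not part of the repository's scripts.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path

# Per-class image counts for the full Zenodo release, copied from the class
# taxonomy table. Keys are the "Medium (folder name)" values.
FULL_COUNTS: dict[str, int] = {
    "E_Coli on EMB agar medium": 203,
    "E_Coli on MacConkey_Agar medium": 203,
    "E_Coli on Nutrients agar medium": 216,
    "Salmonella on XLD agar medium": 211,
    "Salmonella on MacConkey agar medium": 215,
    "Salmonella on Nutrients agar medium": 294,
    "Enterococcus on Slantz_and_Bartley agar medium": 208,
    "Enterococcus on Nutrients agar medium": 285,
    "Staphylococcus on MSA agar medium": 175,
    "Staphylococcus on Nutrient Agar": 307,
}

# The README lists JPG and PNG; ".jpeg" is accepted as a harmless extra.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(images_root: Path) -> Counter:
    """Count JPG/PNG files anywhere below each immediate subfolder of images_root.

    Assumes (hypothetically) one subfolder per class, named as in the table.
    """
    counts: Counter = Counter()
    for class_dir in sorted(p for p in images_root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def compare_counts(
    found: Counter, expected: dict[str, int] = FULL_COUNTS
) -> dict[str, tuple[int, int]]:
    """Return {class folder: (images found locally, images in the full release)}."""
    return {name: (found.get(name, 0), n) for name, n in expected.items()}
```

For example, `compare_counts(count_images(Path("Images")))` reports found versus expected counts for every class. Against the GitHub subset, each class should show fewer images than the full release. Against the full Zenodo release, the counts would be expected to match the table, provided the folder layout assumed above holds.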
---

## Q/A annotations (`Description_file/conversations.jsonl`)

`Description_file/conversations.jsonl` is a JSON Lines annotation file (one JSON object per line). It links each colony image to multiple natural-language question–answer pairs in an instruction/VQA format. Each record contains:

- an id;
- an image filename, matching the corresponding file under `Images/...`;
- a `conversations` array with a human question and an assistant answer (keys like `{"from":"human","value":...}` and `{"from":"gpt","value":...}`).

In the current subset, the file includes 16,501 Q/A items covering 592 images. The questions are designed to capture:

- colony morphology and diagnostic cues (e.g. colour, size range, margin, elevation, surface texture, opacity);
- growth and distribution patterns (crowding, suitability for isolation, approximate count, purity/contamination hints);
- medium-specific indicators (e.g. lactose fermentation cues, EMB sheen, swarming/motility, hemolysis).

The file can be used to:

1. fine-tune or benchmark vision-language models for colony interpretation;
2. build an interactive teaching/QA assistant for microbiology lab training;
3. generate standardized captions or attribute labels for weak supervision, retrieval, or dataset search;
4. reproduce the exact prompts and answers used in downstream experiments, by grouping records by image and pairing them with the corresponding plate images in the GitHub subset or the Zenodo archive.
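The grouping step in use case 4 can be sketched with the Python standard library alone. This is a minimal illustration, and several details are assumptions:

- the image filename is stored under an `"image"` key (a common convention for conversation-style VQA files, but not confirmed here; adjust `image_key` if needed);
- each `"human"` turn is followed by its `"gpt"` answer;
- all helper names are hypothetical.

```python
from __future__ import annotations

import json
from collections import defaultdict
from pathlib import Path
from typing import Iterator


def iter_records(path: str | Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-blank line of a JSON Lines file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def qa_pairs(record: dict) -> Iterator[tuple[str, str]]:
    """Pair each "human" turn with the "gpt" turn that follows it."""
    question = None
    for turn in record.get("conversations", []):
        if turn.get("from") == "human":
            question = turn.get("value")
        elif turn.get("from") == "gpt" and question is not None:
            yield question, turn.get("value")
            question = None


def group_by_image(records, image_key: str = "image") -> dict[str, list[tuple[str, str]]]:
    """Collect all (question, answer) pairs per image filename.

    image_key is an assumption: set it to the field name actually used in the file.
    """
    grouped: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for record in records:
        grouped[record[image_key]].extend(qa_pairs(record))
    return dict(grouped)


def index_images(images_root: str | Path) -> dict[str, Path]:
    """Map each file's basename to its path anywhere under images_root.

    If two folders share a basename, the path that sorts last wins.
    """
    return {p.name: p for p in sorted(Path(images_root).rglob("*")) if p.is_file()}


def summarize(path: str | Path, image_key: str = "image") -> tuple[int, int]:
    """Return (number of Q/A pairs, number of distinct images) for a JSONL file."""
    grouped = group_by_image(iter_records(path), image_key)
    return sum(len(pairs) for pairs in grouped.values()), len(grouped)
```

For example, `summarize("Description_file/conversations.jsonl")` returns counts that can be compared with the 16,501 Q/A items and 592 images reported for the current subset. To pair records with plate images, `index_images("Images")[Path(name).name]` looks up each record's filename by basename, since the exact subpath under `Images/` is not specified here.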


Publication Details

Published in: Zenodo (CERN European Organization for Nuclear Research)

DOI: 10.5281/zenodo.18620619
