📊 Dataset Description

Title: NHIF Bulgaria: Outpatient Pharmacy Reimbursement Expenditures and Patient Counts for Home Treatment (🇧🇬 Лекарствени продукти за домашно лечение)

Summary: This dataset contains detailed records of pharmaceutical expenditures and patient counts for home treatment in Bulgaria, as reported by the National Health Insurance Fund (NHIF). The data covers all reimbursed medicinal products, medical devices, and dietary foods for special medical purposes dispensed through community pharmacies. Records are aggregated by Regional Health Insurance Office (RZOK) and grouped by NHIF reimbursement code and ICD-10 diagnosis code.

Key Characteristics:
- Aggregated by Regional Health Insurance Office (RZOK) across all 28 regions.
- Linked to NHIF reimbursement codes, ATC classification codes, and ICD-10 diagnosis codes.
- Covers the full spectrum of NHIF-reimbursed outpatient pharmaceuticals for home treatment.
- Sub-monthly reporting granularity preserved via the part variable.
- Dual-currency columns (BGN and EUR) derived using the official fixed exchange rate of 1 EUR = 1.95583 BGN, with a currency flag indicating the original denomination (BGN for pre-2026 records, EUR from 2026 onward).

Temporal Granularity: Each calendar month is reported in three sub-periods, preserved in the part variable:
- Part 01: days 1–10 of the month
- Part 02: days 11–20 of the month
- Part 03: days 21–end of the month

The period variable always represents the first day of the calendar month regardless of sub-period. Grouping on period therefore gives monthly aggregates, while part remains available for analysing intra-month dispensing patterns such as trends, delays, or spikes in utilisation within reporting cycles.
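The three-part reporting rule can be written down as a small function. The sketch below is illustrative only: `period_and_part` is a hypothetical helper, and the dates passed to it are invented (the dataset itself carries only period and part, not individual dispensing dates).

```python
from datetime import date


def period_and_part(day: date) -> tuple[date, str]:
    """Map a calendar date to (period, part).

    period is the first day of the month; part is "01" for days 1-10,
    "02" for days 11-20 and "03" for day 21 through the end of the month.
    """
    if day.day <= 10:
        part = "01"
    elif day.day <= 20:
        part = "02"
    else:
        part = "03"
    return day.replace(day=1), part


# Hypothetical dates, used only to show the boundaries between parts.
for d in (date(2024, 2, 10), date(2024, 2, 11), date(2024, 2, 29)):
    period, part = period_and_part(d)
    print(d.isoformat(), "->", period.isoformat(), part)
```

Because Part 03 runs to the end of the month, its length varies between 8 and 11 days, which is worth keeping in mind when comparing parts directly.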
Data Cleaning and Preprocessing: The raw XLS files (198 source files, three per calendar month, named costs_part_NN_mmm_YYYY.xls) were processed using R (version 4.5.1; tidyverse, readxl, janitor, lubridate) with a standardised pipeline that:
- Harmonised Bulgarian column names to English, R-compatible identifiers via a defined column mapping dictionary
- Padded region codes to a uniform 2-digit zero-padded format
- Standardised region names using a canonical 28-region mapping
- Parsed temporal identifiers (month, year) and sub-period part numbers from filenames
- Added dual-currency columns (costs_bgn, costs_eur) and a currency flag based on the reporting period
- Tracked all file-level issues (read failures, missing columns, unmatched headers) and reported them in a structured summary

All character variables were preprocessed with the following transformations:
- Lowercasing all strings (except ICD-10 disease names, which preserve original case)
- Trimming and squishing whitespace
- Removing quotes (" and ')
- Standardising " - " to "-"
- Removing trailing punctuation

All 198 source files were processed successfully with no file-level issues. No records were imputed, modified, or excluded beyond the transformations described above.
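The filename parsing and string clean-up steps listed above might look roughly like the following. This is a Python sketch, not the original R pipeline. It assumes that "mmm" in the filename pattern is a three-letter English month abbreviation, that the transformations run in the order shown, and that "trailing punctuation" means the characters `.,;:`. None of these details are stated in the description, and all function names are hypothetical.

```python
import re
from datetime import date

# Assumption: "mmm" is a three-letter English month abbreviation.
MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

FILENAME_RE = re.compile(
    r"^costs_part_(\d{2})_([a-z]{3})_(\d{4})\.xls$", re.IGNORECASE)


def parse_filename(name: str) -> tuple[str, date]:
    """Return (part, period) parsed from a source filename."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name!r}")
    part, month, year = match.groups()
    if month.lower() not in MONTHS:
        raise ValueError(f"unknown month in filename: {name!r}")
    return part, date(int(year), MONTHS[month.lower()], 1)


def pad_region(code) -> str:
    """Zero-pad a region code to two characters, e.g. 1 -> "01"."""
    return str(code).strip().zfill(2)


def clean_text(value: str, keep_case: bool = False) -> str:
    """Apply the listed string transformations in one possible order."""
    text = value.replace('"', "").replace("'", "")  # remove quotes
    text = " ".join(text.split())                    # trim and squish whitespace
    text = text.replace(" - ", "-")                  # standardise " - " to "-"
    text = text.rstrip(".,;:").strip()               # trailing punctuation (assumed set)
    if not keep_case:                                # ICD-10 names keep their case
        text = text.lower()
    return text


print(parse_filename("costs_part_02_jul_2020.xls"))
print(clean_text('  "Tabl.  Film" - 30 mg. '))
```

Passing `keep_case=True` corresponds to the exception made for ICD-10 disease names.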
Data quality validation identified:
- 2,474 extreme cost outliers exceeding 10 times the 99th percentile of the reimbursement distribution
- 95 records with negative costs (refunds/corrections), totalling 40,307.85 BGN
- No missing cost or num_in_pack data (0.0%)
- No duplicate records, no invalid region codes, no missing or invalid part values
- All 66 months have complete 3-part sub-period coverage

Structure:
- 📦 Rows: 7,266,074
- 📁 Columns: 19
- 📆 Temporal coverage: July 2020 – December 2025 (66 months, 198 source files in 3 sub-periods each)
- 🌍 Geographical scope: All 28 NHIF regions in Bulgaria
- 💊 Distinct NHIF medication codes: 3,367
- 🧪 Distinct ATC codes: 526
- 💰 Total reimbursement: 6,686,965,176 BGN (3,418,991,004 EUR)

Sub-period (part) distribution:
- Part 01 (days 1–10): 2,564,109 records (35.3%)
- Part 02 (days 11–20): 2,530,737 records (34.8%)
- Part 03 (days 21–end): 2,171,228 records (29.9%)

Records by year:
- 2020: 651,667 (9.0%)
- 2021: 1,287,060 (17.7%)
- 2022: 1,299,797 (17.9%)
- 2023: 1,313,999 (18.1%)
- 2024: 1,351,516 (18.6%)
- 2025: 1,362,035 (18.7%)

Key Variables:

| Variable | Description |
|---|---|
| region_num | NHIF regional code (2-digit, zero-padded, e.g. 01) |
| region_name | Name of the NHIF regional office (lowercase Cyrillic) |
| atc_code | Anatomical Therapeutic Chemical (ATC) classification code |
| atc_name | International nonproprietary name (INN) of the active substance |
| nhif_code | NHIF-specific reimbursement product code |
| market_name | Marketed product name (brand name) |
| packaging | Dosage form and packaging format |
| concentration | Strength or concentration per unit |
| num_in_pack | Number of units per package |
| icd_code | ICD-10 code of the diagnosed disease |
| icd_name | Diagnosis name (in Bulgarian, original case preserved) |
| patients_num | Number of insured persons (ЗОЛ) reimbursed for the product during the period |
| pack_num | Number of reimbursed packages |
| costs | Reimbursement amount in original currency (BGN pre-2026; EUR from 2026) |
| period | First day of the reporting month (YYYY-MM-DD); identical across all three sub-periods of the same month |
| part | Sub-period indicator: 01 = days 1–10, 02 = days 11–20, 03 = days 21–end |
| currency | Original currency denomination (BGN or EUR) |
| costs_bgn | Reimbursement amount standardised to BGN (1 EUR = 1.95583 BGN) |
| costs_eur | Reimbursement amount standardised to EUR (1 EUR = 1.95583 BGN) |

Use Cases: This dataset is suitable for:
- Time series analysis of outpatient pharmaceutical expenditure
- Pharmacoepidemiology and drug utilisation research
- Regional inequality studies in access to reimbursed medicines
- Health economics research and budget impact analyses
- Pharmaceutical policy evaluation at national and regional levels
- Intra-month dispensing pattern analysis (via the three-part reporting structure)

Note: The unit of observation is an administrative reimbursement record aggregated at the region–product–diagnosis–period–part level. The patients_num field counts insured persons reimbursed for the given product within each stratum and does not represent unique patient identifiers across strata, parts, or periods. To obtain monthly totals, group by period and aggregate across parts.
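As a rough illustration of the monthly aggregation described in the note, the sketch below sums pack_num and costs_bgn across the three parts for each period and region. The rows are invented for the example, and `monthly_totals` is a hypothetical helper. It deliberately leaves patients_num out, since that field does not identify unique patients across parts and adding it up could overcount.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical rows; the values are invented for illustration only.
rows = [
    {"period": "2024-03-01", "part": "01", "region_num": "01",
     "pack_num": 120, "costs_bgn": Decimal("1500.25")},
    {"period": "2024-03-01", "part": "02", "region_num": "01",
     "pack_num": 80, "costs_bgn": Decimal("999.75")},
    {"period": "2024-03-01", "part": "03", "region_num": "01",
     "pack_num": 50, "costs_bgn": Decimal("600.00")},
    {"period": "2024-04-01", "part": "01", "region_num": "01",
     "pack_num": 10, "costs_bgn": Decimal("100.00")},
    {"period": "2024-03-01", "part": "01", "region_num": "02",
     "pack_num": 5, "costs_bgn": Decimal("50.00")},
]


def monthly_totals(records):
    """Sum pack_num and costs_bgn over parts per (period, region_num)."""
    totals = defaultdict(lambda: {"pack_num": 0, "costs_bgn": Decimal("0")})
    for rec in records:
        key = (rec["period"], rec["region_num"])
        totals[key]["pack_num"] += rec["pack_num"]
        totals[key]["costs_bgn"] += rec["costs_bgn"]
    return dict(totals)


totals = monthly_totals(rows)
for (period, region), values in sorted(totals.items()):
    print(period, region, values["pack_num"], values["costs_bgn"])
```

Adding further keys such as nhif_code or icd_code to the grouping tuple gives finer-grained monthly series in the same way.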
Product names and diagnoses are in Bulgarian; ATC codes follow the WHO international classification.

Source: National Health Insurance Fund (NHIF), Bulgaria, https://www.nhif.bg/

License: Unless otherwise restricted by NHIF, this dataset is shared under Creative Commons Attribution 4.0 International (CC BY 4.0)

Files included:
- nhif_outpatient_pharmacy_combined.csv: merged analytical dataset (UTF-8)
- nhif_outpatient_pharmacy_combined_metadata.csv: variable-level data dictionary with English and Bulgarian descriptions, source column mappings, data types, and value formats
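A reader working with the combined CSV may want to re-derive the dual-currency columns and repeat the negative-cost and outlier checks reported under data quality validation. The sketch below shows one way to do that with the Python standard library. Several details are assumptions rather than documented facts: a comma delimiter, "." as the decimal separator, and a linear-interpolation 99th percentile (the description does not say which percentile definition was used). The in-memory sample stands in for the real file and contains invented values; all function names are hypothetical.

```python
import csv
import io
import statistics
from decimal import Decimal

BGN_PER_EUR = Decimal("1.95583")  # fixed rate stated in the description


def read_rows(handle):
    """Parse CSV text; codes stay strings so "01" keeps its leading zero."""
    rows = list(csv.DictReader(handle))
    for row in rows:
        row["costs"] = Decimal(row["costs"])
    return rows


def to_bgn_eur(costs, currency):
    """Return (costs_bgn, costs_eur) for an amount in its original currency."""
    if currency == "BGN":
        return costs, costs / BGN_PER_EUR
    if currency == "EUR":
        return costs * BGN_PER_EUR, costs
    raise ValueError(f"unknown currency: {currency!r}")


def quality_summary(amounts):
    """Count negative amounts and amounts above 10x the 99th percentile."""
    values = [float(a) for a in amounts]
    # Assumption: linear-interpolation percentile ("inclusive" method).
    p99 = statistics.quantiles(values, n=100, method="inclusive")[98]
    return {
        "negative": sum(v < 0 for v in values),
        "outliers": sum(v > 10 * p99 for v in values),
    }


# Hypothetical three-column excerpt. The real file would be opened with
# open(path, newline="", encoding="utf-8") and has 19 columns.
sample = io.StringIO(
    "region_num,costs,currency\n"
    "01,195.583,BGN\n"
    "02,-12.50,BGN\n"
    "03,100,EUR\n"
)
rows = read_rows(sample)
converted = [to_bgn_eur(r["costs"], r["currency"]) for r in rows]
summary = quality_summary(bgn for bgn, _ in converted)
print(summary)
```

Using `Decimal` for the conversion avoids binary floating-point drift in the monetary columns; any rounding applied in the published costs_bgn and costs_eur columns is not specified, so small differences from re-derived values are possible.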