Background: Integrating HIV clinical records with population-based surveillance data allows the study of health care seeking behaviours, access to care, and predictors of patient outcomes. We implemented a graph-based record linkage algorithm to deduplicate and link HIV clinical and population-based surveillance records in an HIV-endemic setting in rural South Africa.

Methods: We linked four data sources to create the Africa Health Research Institute (AHRI) Unified Data Platform: AHRI's Health and Demographic Surveillance System (HDSS), the AHRI Clinic and Hospital Information System (AHRILink), the National Health Laboratory Service (NHLS), and Three Integrated Electronic Registers (TIER.Net) HIV care and treatment records. HDSS data were collected between January 1, 2000, and July 31, 2024, through repeated household surveys of over 140,000 individuals. Clinical and laboratory data were obtained for one hospital and 17 clinics in Hlabisa, KwaZulu-Natal, covering the HDSS surveillance area. We implemented a probabilistic record linkage algorithm trained and validated on a subset of records with national identity numbers. We assessed linkage accuracy, computed descriptive statistics for the linked database, and estimated the HIV care cascade for this population.

Results: A total of 986,832 records were linked across the four databases, with a sensitivity of 92.7% and a positive predictive value of 96.5% (F-score = 0.95). The mean number of records per individual (standard deviation, SD) in TIER.Net, HDSS, AHRILink, and NHLS was 1.18 (0.44), 1.05 (0.23), 1.13 (0.40), and 5.21 (4.24), respectively. The linked data indicated that 12,293 HDSS resident adults (≥15 years) were living with HIV at some point during the 2022 and 2024 surveillance rounds. Of these, 10,622 (86.4%) had ever sought HIV care in the public sector. Among those who had sought care, 10,492 (98.8%) had ever started antiretroviral therapy (ART) and 7,065 (66.5%) were currently on ART; of those currently on ART, 6,301 (89.2%) were virally suppressed (viral load <200 copies/mL).

Conclusion: HIV care and population surveillance records from four data sources were deduplicated and linked with high accuracy, revealing persistent gaps in retention in care and viral suppression in an HIV-endemic region in rural South Africa. The AHRI Unified Data Platform offers the potential to deepen our understanding of HIV epidemiology in a well-described population and to improve services for HIV.

Trial Registration: Not applicable.
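As a consistency check on the reported figures, the following Python sketch recomputes the F-score and the care cascade percentages from the numbers given in the Results. It is an illustration only, not the study's analysis code: the function and variable names are hypothetical, and the F-score is assumed to be the standard F1 (the harmonic mean of sensitivity and positive predictive value), which the abstract does not state explicitly.

```python
def f_score(sensitivity: float, ppv: float) -> float:
    """Harmonic mean of sensitivity (recall) and PPV (precision).

    Assumption: the reported F-score is the standard F1 measure.
    """
    return 2 * sensitivity * ppv / (sensitivity + ppv)


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * numerator / denominator, 1)


# Counts taken from the Results section.
living_with_hiv = 12_293      # HDSS resident adults (>=15 years) living with HIV
ever_sought_care = 10_622     # ever sought HIV care in the public sector
ever_started_art = 10_492     # ever started ART
currently_on_art = 7_065      # currently on ART
virally_suppressed = 6_301    # viral load < 200 copies/mL

# Each stage is expressed against the denominator implied by the text.
cascade = {
    "sought care / living with HIV": pct(ever_sought_care, living_with_hiv),
    "ever started ART / sought care": pct(ever_started_art, ever_sought_care),
    "currently on ART / sought care": pct(currently_on_art, ever_sought_care),
    "suppressed / currently on ART": pct(virally_suppressed, currently_on_art),
}

# Linkage accuracy from the reported sensitivity (92.7%) and PPV (96.5%).
linkage_f = round(f_score(0.927, 0.965), 2)

if __name__ == "__main__":
    print(f"F-score: {linkage_f}")
    for stage, value in cascade.items():
        print(f"{stage}: {value}%")
```

Under these assumptions, the recomputed values agree with the abstract. The denominators show that both the "ever started ART" and "currently on ART" percentages are taken among the 10,622 adults who had sought care, whereas viral suppression is taken among the 7,065 currently on ART.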