AI-based automated SQL query generation for SQLite databases in mobile forensics

2026 · 0 citations · Journal Article · Hybrid Open Access

Authors

Dirk Pawlaszczyk · Hochschule Mittweida

Ronny Bodach · Hochschule Mittweida

Philipp Engler · Hochschule Mittweida

Jan Kolouch · University of Finance and Administration

Michael Spranger · Hochschule Mittweida

Christian Hummert · Hochschule Mittweida

Dirk Labudde ·

Abstract

SQLite databases play a central role in mobile phone forensics. Mobile applications frequently use them for data storage. Efficient extraction and interpretation of SQLite data are crucial for reconstructing device usage and user activities. In practice, digital forensic investigators must formulate and analyse complex SQL queries to retrieve evidence from various heterogeneous databases. This task requires extensive expertise in SQL, database schemas, and application-specific data logic. In this paper, we investigate an LLM-based approach to assist digital forensic investigators by automating the generation of SQL queries for forensic analysis. This enables investigators to query SQLite databases more efficiently and with less technical effort. First, we propose a mobile forensic dataset that captures typical investigative questions and database structures. We then use this dataset to fine-tune a local LLM. We introduce ForSQLiteLM, a Llama 3.2-3B bf16 model. It is optimized on a self-defined, domain-specific dataset tailored to mobile forensic scenarios. We compare ForSQLiteLM with several state-of-the-art LLMs to evaluate its effectiveness in generating forensic queries. We show that effective forensic Text-to-SQL generation can be achieved with a locally deployable 3B-parameter LLM by combining realistic SQLite schemas, execution-based evaluation, and domain-specific fine-tuning. Finally, as a proof of concept, we demonstrate how the proposed model can be integrated into the FQLite data retrieval tool via a retrieval-augmented generation (RAG) pipeline.

• LLM-based approach to assist investigators by automating the generation of SQL queries for forensic analysis.
• Introduction of a novel and unique dataset for mobile forensics.
• Fine-tuning of an LLM with a domain dataset.
• Benchmark study of the fine-tuned model against other LLMs.
• Proof-of-concept study.
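The abstract refers to execution-based evaluation of generated queries. The general idea can be sketched as follows: run the generated query and a reference query against the same SQLite database and compare their result sets. This is a minimal illustration under stated assumptions, not the authors' evaluation harness; the `messages` table, the sample rows, and both candidate queries are hypothetical.

```python
import sqlite3


def run_query(conn, sql):
    """Execute sql and return its rows in a canonical order, or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None  # invalid SQL, missing table, etc. counts as a failure
    # Sort by repr so row order does not matter and mixed types (e.g. NULL
    # next to integers) cannot raise a comparison error.
    return sorted(rows, key=repr)


def execution_match(conn, generated_sql, reference_sql):
    """True if both queries execute and return the same multiset of rows."""
    generated = run_query(conn, generated_sql)
    reference = run_query(conn, reference_sql)
    return generated is not None and generated == reference


def build_demo_db():
    """Create a small in-memory database with a hypothetical chat schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        "id INTEGER PRIMARY KEY, sender TEXT, body TEXT, ts INTEGER)"
    )
    conn.executemany(
        "INSERT INTO messages (sender, body, ts) VALUES (?, ?, ?)",
        [
            ("alice", "hi", 1700000000),
            ("bob", "hello", 1700000060),
            ("alice", "bye", 1700000120),
        ],
    )
    return conn


conn = build_demo_db()
reference = "SELECT sender, COUNT(*) FROM messages GROUP BY sender"
# Written differently from the reference, but returns the same rows.
candidate_ok = (
    "SELECT sender, COUNT(id) AS n FROM messages GROUP BY sender ORDER BY n DESC"
)
# Refers to a table that does not exist, so execution fails.
candidate_bad = "SELECT sender FROM message"

print(execution_match(conn, candidate_ok, reference))
print(execution_match(conn, candidate_bad, reference))
```

Comparing results rather than query text accepts syntactically different but equivalent queries. This sketch ignores row order, so a question whose answer depends on `ORDER BY` would need an order-sensitive comparison instead.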

Topics & Keywords

Digital and Cyber Forensics · Digital Media Forensic Detection · Forensic and Genetic Research

UN Sustainable Development Goals

Peace, Justice and Strong Institutions

Publication Details

Published in: Forensic Science International Digital Investigation

Volume 57, Article 302100

DOI: 10.1016/j.fsidi.2026.302100

Field-Weighted Citation Impact: 0.00