SQLite databases play a central role in mobile phone forensics because mobile applications frequently use them for data storage. Efficient extraction and interpretation of SQLite data are crucial for reconstructing device usage and user activities. In practice, digital forensic investigators must formulate and analyse complex SQL queries to retrieve evidence from heterogeneous databases. This task requires extensive expertise in SQL, database schemas, and application-specific data logic. In this paper, we investigate an LLM-based approach that assists digital forensic investigators by automating the generation of SQL queries for forensic analysis. This enables investigators to query SQLite databases more efficiently and with less technical effort. First, we propose a mobile forensic dataset that captures typical investigative questions and database structures. We then use this dataset to fine-tune a local LLM. We introduce ForSQLiteLM, a Llama 3.2-3B bf16 model optimized on a self-defined, domain-specific dataset tailored to mobile forensic scenarios. We compare ForSQLiteLM with several state-of-the-art LLMs to evaluate its effectiveness in generating forensic queries. We show that effective forensic Text-to-SQL generation can be achieved with a locally deployable 3B-parameter LLM by combining realistic SQLite schemas, execution-based evaluation, and domain-specific fine-tuning. Finally, as a proof of concept, we demonstrate how the proposed model can be integrated into the FQLite data retrieval tool via a retrieval-augmented generation (RAG) pipeline.

• LLM-based approach to assist investigators by automating the generation of SQL queries for forensic analysis.
• Introduction of a novel and unique dataset for mobile forensics.
• Fine-tuning of an LLM with a domain dataset.
• Benchmark study of the fine-tuned model against other LLMs.
• Proof-of-concept study.
Published in: Forensic Science International Digital Investigation
Volume 57, pp. 302100-302100
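
The abstract mentions execution-based evaluation of generated queries. As a rough illustration of the general idea, the following minimal sketch runs a model-generated query and a reference query against the same SQLite database and compares their result sets. The `messages` schema, the example queries, and the helper names `run_query` and `execution_match` are all hypothetical and are not taken from the paper; the paper's actual metric and dataset may differ.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails to execute."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_match(conn, predicted_sql, reference_sql):
    """True if both queries execute and return the same rows, ignoring row order."""
    reference = run_query(conn, reference_sql)
    if reference is None:
        raise ValueError("reference query must be executable")
    predicted = run_query(conn, predicted_sql)
    return predicted is not None and predicted == reference


# Hypothetical schema standing in for a messaging app database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE messages (id INTEGER PRIMARY KEY, sender TEXT, body TEXT, ts INTEGER);
    INSERT INTO messages (sender, body, ts) VALUES
        ('alice', 'hi',    1700000000),
        ('bob',   'hello', 1700000060),
        ('alice', 'bye',   1700000120);
    """
)

# Reference answer for "Which messages did alice send?" (illustrative only).
reference_sql = "SELECT body FROM messages WHERE sender = 'alice' ORDER BY ts"
# Syntactically different but semantically equivalent prediction.
good_sql = "SELECT m.body FROM messages AS m WHERE m.sender = 'alice'"
# Prediction that references a non-existent table and therefore fails to execute.
bad_sql = "SELECT body FROM message WHERE sender = 'alice'"

results = {
    "good": execution_match(conn, good_sql, reference_sql),
    "bad": execution_match(conn, bad_sql, reference_sql),
}
print(results)
```

Comparing result sets rather than query strings means that two differently written but equivalent queries count as a match. This sketch ignores row order, which would be too lenient for questions where ordering matters. When working on real evidence, one would presumably open the database read-only, for example with `sqlite3.connect("file:evidence.db?mode=ro", uri=True)`, where `evidence.db` is a placeholder path.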