SLMs Meet GraphRAG: A Structured Approach to Context-Aware Cybersecurity Hint Generation

2026 · 0 citations · Journal Article · Diamond Open Access

Authors

Ishan Abraham · Lewis & Clark College

Jens Mache · Lewis & Clark College

Taylor Wolff · The Evergreen State College

Jack Cook · The Evergreen State College

Richard Weiss · The Evergreen State College

Justin Wang · Universidad del Noreste

Abstract

The goal of our research is to generate hints for learners engaged in hands-on cybersecurity exercises. Learners sometimes get stuck or frustrated: they head in the wrong direction or are missing information that is necessary for solving an exercise. While using large language models (LLMs) is an option, LLMs typically require sharing student data with third-party AI providers. To improve privacy and minimize cost and computational overhead, previous research has explored using locally deployed small language models (SLMs) with retrieval-augmented generation (RAG). However, while RAG has been shown to enhance SLM capabilities without the need for fine-tuning, it falls short when answering open-ended or multi-step questions that require reasoning across interconnected concepts. This limitation is particularly evident in cybersecurity education, where students often need help understanding how threats, tools, and strategies relate to one another. The cybersecurity hint system EDUHints (Wolff et al., 2025) currently relies on a standard RAG pipeline. In classroom testing, students were unsure whether generated hints meaningfully answered their questions. To address this challenge, we present a custom GraphRAG approach that builds on AISecKG, a proposed ontology and knowledge graph focused on cybersecurity education. We extend the ontology to incorporate natural-language-to-bash command mappings, a valuable feature because students tend to ask questions about command-line use. Graph data is extracted using multiple methods and semantically scored to prioritize the most relevant results. Our pipeline currently employs Microsoft’s Phi-3-mini-4k-instruct SLM, integrates LangChain for modular orchestration, and uses Neo4j as the graph database. We survey cybersecurity instructors, who rate responses generated by EDUHints and by our GraphRAG system. Results show that hints generated using GraphRAG are preferred almost three times more often by cybersecurity instructors. This suggests that an SLM’s educational hint generation abilities can be improved through our GraphRAG architecture.
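
The retrieve, score, and prompt steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scoring method, so a simple bag-of-words cosine similarity stands in for whatever semantic scoring the pipeline uses, and the triples, relation labels, threshold, and function names below are hypothetical. In a real system the candidate triples would come from Neo4j queries and the prompt would be passed to the SLM.

```python
import math
import re
from collections import Counter

# Hypothetical triples of the kind a knowledge graph query might return.
# Node names and relation labels are invented for illustration only.
CANDIDATE_TRIPLES = [
    ("nmap", "CAN_ANALYZE", "open ports"),
    ("list files in a directory", "MAPS_TO_COMMAND", "ls -la"),
    ("wireshark", "CAN_ANALYZE", "network traffic"),
    ("scan a host for open ports", "MAPS_TO_COMMAND", "nmap -sV <host>"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_triples(question, triples, threshold=0.2, top_k=2):
    """Score each triple against the question and keep the best matches.

    Bag-of-words cosine is an assumed stand-in for embedding similarity.
    """
    q_vec = Counter(tokenize(question))
    scored = []
    for subj, rel, obj in triples:
        text = f"{subj} {rel.replace('_', ' ')} {obj}"
        score = cosine(q_vec, Counter(tokenize(text)))
        if score >= threshold:
            scored.append((score, (subj, rel, obj)))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [triple for _, triple in scored[:top_k]]


def build_prompt(question, triples):
    """Assemble a hint prompt from the question and the retained triples."""
    facts = "\n".join(f"- {s} {r} {o}" for s, r, o in triples)
    return (
        "Use the facts below to give the student a short hint.\n"
        f"Facts:\n{facts}\n"
        f"Question: {question}\n"
    )


if __name__ == "__main__":
    question = "How do I scan for open ports?"
    top = rank_triples(question, CANDIDATE_TRIPLES)
    print(build_prompt(question, top))
```

Filtering by a score threshold before building the prompt keeps the context short, which matters for a model with a 4k-token window such as Phi-3-mini-4k-instruct.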

Topics & Keywords

Advanced Graph Neural Networks

Intelligent Tutoring Systems and Adaptive Learning

Online Learning and Analytics

UN Sustainable Development Goals

Quality Education

Publication Details

Published in: International Conference on Cyber Warfare and Security

Volume 21, Issue 1, pp. 1-8

DOI: 10.34190/iccws.21.1.4434

Field-Weighted Citation Impact: 0.00