Enhancing movie script creation through retrieval-augmented LLMs and stable diffusion scene modeling

2026 · 0 citations · Journal Article · Gold Open Access

Authors

Ansh Lulla · Symbiosis International University

Aayush Koul · Symbiosis International University

Rampalli Agni Mithra · Symbiosis International University

Aniket K. Shahade · Symbiosis International University

Mayur Gaikwad · Symbiosis International University

Abstract

Script writing is a labor-intensive process, especially when a script must be tailored to a specific setting, culture, or character. Recent advances in Natural Language Processing and Deep Learning have made it possible to automate script writing. With a simple prompt, it is possible to generate the entire script for a movie, which can be customized according to the user's creativity and imagination. Retrieval-Augmented Generation (RAG) can facilitate rapid prototyping of movie scripts by ingesting script data into a vector database and retrieving the scripts most similar to the input prompt. These retrieved scripts serve as part of the context for the Large Language Model (LLM) and help prevent it from hallucinating. This approach enhances the ability of LLMs to generate context-specific output, which is essential for honoring the fine details mentioned in the input prompt. Fine-tuning is another approach, which adapts LLMs to the downstream task of generating movie scripts. By also visualizing script elements, users can conveniently turn their ideas into both a script and scenes. Combining Stable Diffusion with LLMs extends script generation to scene generation. In experiments with multiple models on a dataset of movie scripts, Gemini-Pro (for RAG) was very effective, with a cosine similarity of 0.5713 between the input prompt and the generated script, whereas GPT-2 and Bloom (for fine-tuning) achieved cosine similarities of 0.5011 and 0.5058, respectively. The perplexity was 1.7443 for GPT-2 and 1.6892 for Bloom, showing that GPT-2 is able to generate scripts that are coherent and relevant to the input prompt and has thus learned the language structure and patterns well. A CLIP score of 0.3061 was achieved using CompVis for the generation of movie scenes.
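The retrieval step and the two text metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the bag-of-words `embed` function is a toy stand-in for a real embedding model and vector database, and every function name here (`embed`, `retrieve`, `build_context`, `perplexity`) is hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumption: stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def retrieve(prompt, scripts, k=2):
    """Return the k (score, script) pairs most similar to the prompt."""
    query = embed(prompt)
    scored = [(cosine_similarity(query, embed(s)), s) for s in scripts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_context(prompt, scripts, k=2):
    """Assemble an LLM prompt that places the retrieved scripts before the request."""
    references = "\n---\n".join(s for _, s in retrieve(prompt, scripts, k))
    return f"Reference scripts:\n{references}\n\nWrite a movie script for: {prompt}"


def perplexity(token_log_probs):
    """Perplexity as the exponential of the mean negative log-probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

In a real system, `embed` would call an embedding model and `retrieve` would query the vector database; the ranking logic and the way retrieved scripts are prepended to the prompt would keep the same shape.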

Topics & Keywords

Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Artificial Intelligence in Games

Publication Details

Published in: Scientific Reports

DOI: 10.1038/s41598-026-45852-z

Field-Weighted Citation Impact: 0.00
