Synthetic Pseudo-CIBIL Dataset Creation and Predictive Modelling

2025 · 0 citations · Journal Article

Authors

Rohan Ganesh · Vilnius College of Design

K S Sudip Aniruddh · Victoria College

Palash Tamrakar · Victoria College

Rithvik Rajshekaran · Meirin College

Rohan Kurup · Vilnius College of Design

Bhaskar M G · East-Siberian Institute of Economics and Management

Rajeswara Rao K V S · East-Siberian Institute of Economics and Management

Abstract

Credit, a fundamental pillar of modern commerce, rests on trust-based agreements between lenders and borrowers. To illustrate how creditworthiness can be assessed computationally, this study synthesizes a credit scoring dataset from multiple public sources. The simulated dataset serves as a proxy for real-world credit data, enabling exploration of analytical methods without regulatory constraints. The primary objective of this work is to provide a hands-on framework for analyzing and interpreting credit information using machine learning. It demonstrates key preprocessing steps, including feature engineering, median imputation of missing values, and data normalization through Min-Max scaling. It then constructs pseudo-CIBIL scores as a weighted aggregation of components relevant to credit scoring, providing a structured basis for predictive modeling. The paper also applies the unsupervised clustering algorithm DBSCAN to identify potential customer segments from behavioral patterns. The synthetic credit scores are compared with proxy and theoretical distributions using Kullback-Leibler (KL) divergence and Wasserstein distance. Finally, percentile-based benchmarking validates the system's fidelity against known industry standards, highlighting the effectiveness of simulation-based approaches in credit risk modeling.
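A minimal sketch of the kind of pipeline the abstract outlines, using only the Python standard library: median imputation, Min-Max scaling, a weighted aggregate mapped onto the 300 to 900 range used by CIBIL scores, and then KL divergence, a one-dimensional Wasserstein distance, and percentiles for comparing the resulting score distribution with a reference. The feature names, weights, reference sample, bin count, smoothing constant, and all function names are hypothetical. The abstract does not give the paper's actual components or weights, and the sketch does not reproduce the DBSCAN segmentation step.

```python
import math
import statistics

# HYPOTHETICAL components and weights for illustration only; the paper's
# actual features and weighting scheme are not given in the abstract.
WEIGHTS = {
    "payment_history": 0.35,
    "credit_utilization": 0.30,
    "credit_age": 0.15,
    "credit_mix": 0.10,
    "recent_enquiries": 0.10,
}
# Components where a larger raw value signals more risk; their scaled value is flipped.
INVERTED = {"credit_utilization", "recent_enquiries"}


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    median = statistics.median([v for v in values if v is not None])
    return [median if v is None else v for v in values]


def min_max_scale(values):
    """Linearly rescale values to [0, 1]; a constant column maps to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def pseudo_cibil(records):
    """Weighted sum of scaled components, mapped onto the 300-900 CIBIL range."""
    composite = [0.0] * len(records)
    for name, weight in WEIGHTS.items():
        column = min_max_scale(impute_median([r.get(name) for r in records]))
        if name in INVERTED:
            column = [1.0 - v for v in column]
        for i, v in enumerate(column):
            composite[i] += weight * v
    # Weights sum to 1, so composite lies in [0, 1] and the score in [300, 900].
    return [300 + 600 * c for c in composite]


def histogram(values, lo=300, hi=900, bins=12, eps=1e-9):
    """Normalized histogram; eps keeps empty bins from producing log(0) in KL."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    total = sum(counts) + eps * bins
    return [(c + eps) / total for c in counts]


def kl_divergence(p, q):
    """D_KL(P || Q) for discrete distributions defined over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def wasserstein_1d(a, b):
    """W1 distance between equal-size 1-D samples: mean gap of sorted values."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def percentile(values, q):
    """q-th percentile (0-100), interpolating linearly between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


# Usage with made-up customers (None marks a missing value).
customers = [
    {"payment_history": 0.98, "credit_utilization": 0.20, "credit_age": 12, "credit_mix": 3, "recent_enquiries": 1},
    {"payment_history": 0.75, "credit_utilization": 0.85, "credit_age": 2, "credit_mix": 1, "recent_enquiries": 6},
    {"payment_history": None, "credit_utilization": 0.50, "credit_age": 6, "credit_mix": 2, "recent_enquiries": 3},
    {"payment_history": 0.90, "credit_utilization": None, "credit_age": 9, "credit_mix": 2, "recent_enquiries": 2},
]
scores = pseudo_cibil(customers)
reference = [620.0, 690.0, 740.0, 800.0]  # hypothetical reference sample
print("scores:", [round(s) for s in scores])
print("KL:", kl_divergence(histogram(scores), histogram(reference)))
print("W1:", wasserstein_1d(scores, reference))
print("median score:", percentile(scores, 50))
```

Two caveats apply to this sketch. First, the median and the Min-Max bounds are computed on the same records they transform, whereas a real workflow would fit them on training data only. Second, KL divergence on sparse histograms is dominated by the smoothing constant `eps` whenever one distribution fills bins the other leaves empty. That sensitivity is one reason to report a Wasserstein distance alongside it.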

Topics & Keywords

Financial Distress and Bankruptcy Prediction

Credit Risk and Financial Regulations

Imbalanced Data Classification Techniques

Publication Details

DOI: 10.1109/csitss67709.2025.11295542

Field-Weighted Citation Impact: 0.00
