Date of Award
2026
Document Type
Thesis
Degree Name
Master of Science in Artificial Intelligence
Department
Digital Engineering
Committee Chair and Members
Naoual Amrouche
Keywords
AI-enabled medical auditing; Healthcare fraud detection; Large language models; RoBERTa; SHAP interpretability and stability; Synthetic data
Abstract
The rapid evolution of Large Language Models (LLMs) has introduced a sophisticated new vector for healthcare fraud: the generation of high-fidelity synthetic medical prescriptions. Traditional fraud detection systems, which rely on rule-based engines and basic statistical anomaly checks, are increasingly ill-equipped to identify these AI-generated forgeries that mimic the structural and clinical logic of authentic records. This thesis presents a robust detection framework based on Transformer architectures that distinguishes human-authored Medicare Part D prescriptions from fully synthetic records generated by GPT-4.
The research was conducted in two phases: an initial pilot study with 4,000 samples and a larger validation stress test with 10,000 samples sourced from the 2023 Centers for Medicare & Medicaid Services (CMS) Prescriber Summary. A specialized semicolon-delimited string format was developed to strip away metadata, forcing the model to focus exclusively on the linguistic and clinical consistency of the prescription data.
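The abstract does not specify which fields the semicolon-delimited format keeps or in what order. The minimal Python sketch below illustrates the general idea under stated assumptions: the field names (`specialty`, `total_claims`, and so on) and the helper `to_semicolon_string` are hypothetical and are not the CMS column names or the thesis's code. Metadata such as an NPI or prescriber name is dropped by construction, because only whitelisted fields are ever emitted.

```python
from typing import Mapping, Sequence

# Hypothetical whitelist of clinical/prescribing fields, in a fixed order so
# every record serializes identically. Anything not listed (NPI, names,
# record IDs, provenance tags) is treated as metadata and never emitted.
CONTENT_FIELDS: Sequence[str] = (
    "specialty",
    "state",
    "total_claims",
    "total_30day_fills",
    "total_drug_cost",
    "total_beneficiaries",
)


def to_semicolon_string(record: Mapping[str, object]) -> str:
    """Serialize one record as 'field: value; field: value; ...'.

    Only whitelisted fields are kept; a missing field becomes 'NA'.
    Semicolons inside values are replaced so the delimiter stays unambiguous.
    """
    parts = []
    for field in CONTENT_FIELDS:
        value = str(record.get(field, "NA")).replace(";", ",")
        parts.append(f"{field}: {value}")
    return "; ".join(parts)


if __name__ == "__main__":
    example = {
        "npi": "1234567890",            # metadata: dropped
        "prescriber_name": "Jane Doe",  # metadata: dropped
        "specialty": "Cardiology",
        "state": "NY",
        "total_claims": 412,
        "total_30day_fills": 530.5,
        "total_drug_cost": 18250.75,
    }
    print(to_semicolon_string(example))
    # → specialty: Cardiology; state: NY; total_claims: 412; total_30day_fills: 530.5; total_drug_cost: 18250.75; total_beneficiaries: NA
```

A whitelist, rather than a blacklist of metadata keys, is the safer choice here: a new identifier column added upstream cannot leak into the model input by accident.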
Results from the pilot study showed near-perfect classification (99.92% accuracy), but SHAP (SHapley Additive exPlanations) analysis revealed that the model relied on lexical shortcuts. In the 10,000-sample validation phase, accuracy settled at a more realistic 85.09%, while the SHAP Stability Score rose from 0.67 to 0.75. This gain in stability indicates that, as the dataset grew, the model's reasoning shifted from superficial lexical shortcuts toward deeper clinical stylometry characterized by variance compression and hyper-regularity. These findings provide a scalable, privacy-preserving blueprint for integrating Transformer-based auditing into existing healthcare ecosystems to safeguard against generative AI-enabled fraud.
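The abstract reports a SHAP Stability Score without defining it. One common formulation, used here purely as an assumption, is the mean pairwise Jaccard overlap of the top-k most important features across repeated runs (for example, different random seeds or bootstrap resamples). The sketch below uses a hypothetical `shap_stability_score` function whose input rows are mean absolute SHAP values over a shared token vocabulary; it shows how a score in [0, 1] such as 0.67 or 0.75 could arise, and is not necessarily the metric used in the thesis.

```python
from itertools import combinations

import numpy as np


def top_k_features(importance: np.ndarray, k: int) -> set[int]:
    """Indices of the k features with the largest absolute importance."""
    return set(np.argsort(-np.abs(importance))[:k].tolist())


def shap_stability_score(importances: np.ndarray, k: int = 10) -> float:
    """Mean pairwise Jaccard overlap of top-k feature sets across runs.

    importances: array of shape (n_runs, n_features); each row holds the
    mean absolute SHAP value per feature (e.g. per vocabulary token) from
    one run. Returns a value in [0, 1]; 1.0 means every run ranks the same
    k features highest, 0.0 means no two runs share any top-k feature.
    """
    if importances.ndim != 2 or importances.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two runs")
    sets = [top_k_features(row, k) for row in importances]
    scores = [len(a & b) / len(a | b) for a, b in combinations(sets, 2)]
    return float(np.mean(scores))


if __name__ == "__main__":
    # Three hypothetical runs over four features; runs 1 and 3 agree on
    # their top-2 features, run 2 swaps one of them.
    runs = np.array([
        [0.9, 0.5, 0.1, 0.0],
        [0.9, 0.1, 0.5, 0.0],
        [0.8, 0.6, 0.05, 0.01],
    ])
    print(round(shap_stability_score(runs, k=2), 4))  # → 0.5556
```

Under this reading, a rise from 0.67 to 0.75 would mean that independent runs agree more often on which tokens drive the classification, which is consistent with the abstract's interpretation of more stable reasoning.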
Keywords: Healthcare Fraud Detection; Large Language Models; Logistic Regression; RoBERTa; Clinical Natural Language Processing; AI-Enabled Medical Auditing; Health Informatics Security; SHAP Interpretability and Stability; Synthetic Data
Recommended Citation
Vokkaleri Shankarappa, Ankitha, "AI vs. GenAI: Combating LLM generated prescription fraud with transformer based detection models" (2026). Selected Full-Text Master Theses 2021-. 53.
https://digitalcommons.liu.edu/brooklyn_fulltext_master_theses/53