Date of Award

2026

Document Type

Thesis

Degree Name

Master of Science in Artificial Intelligence

Department

Digital Engineering

Committee Chair and Members

Naoual Amrouche

Keywords

AI-enabled medical auditing; Healthcare fraud detection; Large language models; RoBERTa; SHAP interpretability and stability; Synthetic data

Abstract

The rapid evolution of Large Language Models (LLMs) has introduced a sophisticated new vector for healthcare fraud: the generation of high-fidelity, synthetic medical prescriptions. Traditional fraud detection systems, which rely on rule-based engines and basic statistical anomalies, are increasingly ill-equipped to identify these AI-generated forgeries that mimic the structural and clinical logic of authentic records. This thesis presents a robust detection framework using Transformer-based architectures to distinguish between human-authored Medicare Part D prescriptions and fully synthetic records generated by GPT-4.

The research was conducted across two distinct phases: an initial pilot study using 4,000 samples and a rigorous validation stress test using 10,000 samples sourced from the 2023 Centers for Medicare & Medicaid Services (CMS) Prescriber Summary. A specialized semicolon-delimited string format was developed to strip away metadata and force the model to focus exclusively on the linguistic and clinical consistency of the prescription data.
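As a rough illustration of the semicolon-delimited format described above, the following minimal sketch serializes one record while dropping metadata. The field names (`specialty`, `drug_name`, and so on), the `name=value` layout, and the example values are assumptions made for illustration; the abstract does not specify the actual columns or string layout used in the thesis.

```python
# Hypothetical clinical fields; the thesis abstract does not list the real columns.
CLINICAL_FIELDS = (
    "specialty",
    "drug_name",
    "total_claims",
    "total_day_supply",
    "total_drug_cost",
)


def serialize_record(record: dict, fields=CLINICAL_FIELDS) -> str:
    """Join the selected fields into one semicolon-delimited string.

    Any key not listed in `fields` (identifiers, names, other metadata)
    is dropped, so a classifier only sees the prescription content.
    """
    parts = []
    for name in fields:
        value = str(record.get(name, "")).strip()
        # Replace embedded semicolons so the delimiter stays unambiguous.
        value = value.replace(";", ",")
        parts.append(f"{name}={value}")
    return "; ".join(parts)


# Illustrative record with made-up values, not real CMS data.
example = {
    "npi": "0000000000",            # metadata: dropped
    "prescriber_name": "REDACTED",  # metadata: dropped
    "specialty": "Internal Medicine",
    "drug_name": "Atorvastatin Calcium",
    "total_claims": 42,
    "total_day_supply": 1260,
    "total_drug_cost": 318.5,
}

print(serialize_record(example))
```

The resulting string could then be passed to a tokenizer as ordinary text, which is one plausible reason for choosing a flat delimited format over structured input.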

Results from the pilot study showed near-perfect classification (99.92% accuracy), though SHAP (SHapley Additive exPlanations) analysis revealed reliance on lexical shortcuts. In the 10,000-sample validation phase, accuracy settled at a more robust 85.09%, while the SHAP Stability Score improved from 0.67 to 0.75. This increase in stability indicates that as the dataset grew, the model’s reasoning became more robust, shifting from superficial lexical shortcuts to the identification of deep clinical stylometry characterized by variance compression and hyper-regularity. These findings provide a scalable, privacy-preserving blueprint for integrating Transformer-based auditing into existing healthcare ecosystems to safeguard against generative AI-enabled fraud.

Keywords: Healthcare Fraud Detection; Large Language Models; Logistic Regression; RoBERTa; Clinical Natural Language Processing; AI-Enabled Medical Auditing; Health Informatics Security; SHAP Interpretability and Stability; Synthetic Data
