AN EMPIRICAL EVALUATION OF MACHINE LEARNING TECHNIQUES FOR FINANCIAL FRAUD DETECTION IN TRANSACTION-LEVEL DATA
DOI: https://doi.org/10.63125/60amyk26

Keywords: Transaction-level fraud detection, Data quality, Model interpretability, XGBoost, Compliance readiness

Abstract
Financial institutions increasingly rely on cloud-hosted, data-driven transaction monitoring, yet many fraud programs still struggle to balance fraud capture with false-alert workload. This study tested how readiness determinants shape fraud detection effectiveness and compared machine learning classifiers on transaction-level data in a quantitative, cross-sectional, case-based design. Survey data were collected from 180 practitioners in one enterprise fraud platform case, spanning fraud and risk analysts, compliance and audit, IT and data engineering, and operations management, complemented by transaction records from the same environment. Independent variables were Data Quality, System Integration, Analytics Competency, Model Interpretability, Management Support, and Compliance Readiness; the dependent variable was Fraud Detection Effectiveness. Analysis included reliability testing, descriptive statistics, Pearson correlations, multiple regression, and an ML benchmark using precision, recall, F1, and ROC AUC with cross-validation and threshold sensitivity. All constructs were reliable (Cronbach's alpha = .81 to .89). Fraud Detection Effectiveness was moderately high (M = 3.74, SD = 0.62), while System Integration was the weakest area (M = 3.41, SD = 0.71). Correlations were positive and strongest for Data Quality (r = .62) and Model Interpretability (r = .49). The regression model explained 53 percent of the variance (F(6, 173) = 32.41, R² = .53); Data Quality (β = .36, p < .001), Model Interpretability (β = .19, p = .002), Management Support (β = .17, p = .004), and Analytics Competency (β = .14, p = .017) were significant predictors, while System Integration was not (β = .07, p = .192). On transaction-level evaluation, XGBoost achieved the best balance (Precision = 0.84, Recall = 0.79, F1 = 0.81, ROC AUC = 0.93) and remained stable under cross-validation (F1 = 0.80 ± 0.03). Profiling showed higher fraud rates at night (2.8 percent) and in high-velocity bursts (4.6 percent). Compliance Readiness showed a borderline influence (β = .09, p = .052), and mid-range amounts of $120 to $500 contained 46.9 percent of fraud cases. The implications are that data governance and explainability should be treated as core controls alongside model selection, improving performance, auditability, and the ability to tune alert thresholds to match investigation capacity.
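The classifier comparison above reports precision, recall, and F1 on transaction-level labels. As an illustration only (not the study's evaluation pipeline), these metrics reduce to simple counts over true and predicted fraud flags; a minimal pure-Python sketch:

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = fraud).

    Precision = TP / (TP + FP): share of alerts that are real fraud.
    Recall    = TP / (TP + FN): share of fraud cases that are caught.
    F1 is their harmonic mean, balancing alert workload vs. fraud capture.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Hypothetical example: 10 transactions, 4 truly fraudulent,
# a model that flags 3 of them plus 1 false alert.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

Raising the alert threshold typically trades recall for precision, which is why the study pairs these metrics with threshold-sensitivity analysis calibrated to investigation capacity.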
