USING MACHINE LEARNING TO ASSESS WORKPLACE ACCIDENT COMPENSATION CASES VIA HEALTH RECORDS
DOI: https://doi.org/10.63125/fqs8ca04
Keywords: Workplace Accident Compensation, Electronic Health Records, Administrative Claims, Quantitative Cross-Sectional
Abstract
This study employs a quantitative, cross-sectional, multi-case design to evaluate determinants of workplace accident compensation outcomes using harmonized electronic health records (EHRs), administrative claims, and a brief Likert-scale survey capturing process and communication constructs. The investigation spans multiple compensation jurisdictions and health systems, enabling comparative analysis and supporting external validity. Closed adjudicated cases serve as the analytic unit, providing consistent labels for four key outcomes: approval status, benefit magnitude, processing time, and appeal occurrence, each reflecting a distinct stage of the adjudicative lifecycle. Data are integrated under a privacy-preserving linkage protocol across three coordinated sources: structured EHR extracts containing diagnostic codes, procedures, medications, laboratory results, and encounter timestamps; claims system data containing lodgment and decision dates, indemnity and medical payments, attorney involvement, and prior claims indicators; and survey data capturing process-level constructs such as documentation completeness, communication quality, transparency, and adjudication clarity. Rigorous data quality assessments, reproducible phenotyping of clinical and administrative variables, and standardized reporting protocols support methodological transparency. Missingness is characterized and handled through multiple imputation or missing-indicator approaches, outliers are managed via clinically justified bounds, and all variables are timestamped relative to the index event to maintain temporal coherence and prevent information leakage. The analytical framework combines explanatory and predictive modeling to investigate the interplay between clinical burden, process efficiency, and adjudicative outcomes.
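The temporal-coherence rule mentioned above (timestamping every variable relative to the index event) can be illustrated with a minimal sketch. The record layout, the function names `relative_days` and `leakage_safe`, the `cutoff_days` parameter, and the toy codes are all assumptions made for illustration; the study does not specify its implementation or its cutoff.

```python
from datetime import date

def relative_days(event_date, index_date):
    """Signed offset in days from the index event (negative = before it)."""
    return (event_date - index_date).days

def leakage_safe(records, index_date, cutoff_days=0):
    """Keep only records observed on or before index_date + cutoff_days.

    Each kept record is tagged with its offset so downstream features are
    expressed relative to the index event rather than as calendar dates.
    The cutoff is a hypothetical prediction time, not a value from the study.
    """
    kept = []
    for rec in records:
        offset = relative_days(rec["date"], index_date)
        if offset <= cutoff_days:
            kept.append({**rec, "offset_days": offset})
    return kept

# Toy records with placeholder codes (not real diagnostic codes).
records = [
    {"code": "DX_INDEX", "date": date(2023, 3, 1)},    # recorded at the index event
    {"code": "DX_PRIOR", "date": date(2022, 11, 15)},  # pre-existing condition
    {"code": "DX_LATE", "date": date(2023, 6, 20)},    # recorded after the cutoff
]
index_date = date(2023, 3, 1)
safe = leakage_safe(records, index_date, cutoff_days=30)
```

With a 30-day cutoff, the late record is dropped because it would not have been available at the assumed prediction time, while the two earlier records are retained with offsets relative to the index event.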
Logistic regression models estimate approval and appeal likelihoods, generalized linear models with a Gamma family and log link address right-skewed benefit magnitudes, and Cox proportional hazards models quantify time to decision. Model performance is evaluated using discrimination and classification metrics (AUC, precision–recall AUC, F1 score), calibration (Brier score, calibration slope), and equity diagnostics across subgroups defined by age, sex, industry, and language. Predictor domains include clinical severity, comorbidities, medication exposure, care intensity, process metrics, prior claims, and demographic–occupational context, complemented by the four Likert-based process indices. Reproducibility is supported by version-controlled scripts, documented variable definitions, and auditable preprocessing pipelines that promote transparency and external validation. Collectively, the study advances an equity-aware methodological framework for assessing and predicting compensation adjudication outcomes using EHR-linked data, bridging clinical analytics with administrative decision-making to enhance fairness, interpretability, and accountability in occupational health research.
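As a rough illustration of the evaluation step, the sketch below computes one discrimination metric (AUC) and one calibration-related metric (Brier score) overall and per subgroup. It is a pure-Python approximation under stated assumptions: the helper names (`brier_score`, `roc_auc`, `by_subgroup`), the toy labels, predicted probabilities, and subgroup tags are hypothetical and do not come from the study's data or code.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def roc_auc(y_true, y_prob):
    """AUC as the share of positive/negative pairs ranked correctly.

    Ties count as half a correct pair. Returns None when a group has
    only one outcome class, because AUC is undefined in that case.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def by_subgroup(y_true, y_prob, groups):
    """Report sample size, AUC, and Brier score for each subgroup label."""
    report = {}
    for g in sorted(set(groups)):
        idx = [i for i, v in enumerate(groups) if v == g]
        yt = [y_true[i] for i in idx]
        yp = [y_prob[i] for i in idx]
        report[g] = {"n": len(idx), "auc": roc_auc(yt, yp),
                     "brier": brier_score(yt, yp)}
    return report

# Toy data: 1 = approved, 0 = denied; groups are placeholder subgroup tags.
y_true = [1, 0, 1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]
groups = ["A", "A", "A", "B", "B", "B"]

overall_auc = roc_auc(y_true, y_prob)
overall_brier = brier_score(y_true, y_prob)
report = by_subgroup(y_true, y_prob, groups)
```

Comparing the per-subgroup entries in `report` against the overall values is one simple way to surface performance gaps; in practice, small subgroups would also need confidence intervals before any gap is interpreted.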
