A Neural Network Model Using Pain Score Patterns to Predict the Need for Outpatient Opioid Refills Following Ambulatory Surgery: Algorithm Development and Validation

Background: Expansion of clinical guidance tools is crucial to identify patients at risk of requiring an opioid refill after outpatient surgery.
Objective: The objective of this study was to develop machine learning algorithms incorporating pain and opioid features to predict the need for outpatient opioid refills following ambulatory surgery.
Methods: Neural networks, regression, random forest, and a support vector machine were used to evaluate the data set. For each model, oversampling and undersampling techniques were implemented to balance the data set. Hyperparameter tuning based on k-fold cross-validation was performed, and feature importance was ranked based on a Shapley Additive Explanations (SHAP) explainer model. To assess performance, we calculated the average area under the receiver operating characteristics curve (AUC), F1-score, sensitivity, and specificity for each model.
Results: There were 1333 patients, of whom 144 (10.8%) refilled their opioid prescription within 2 weeks after outpatient surgery. The average AUC calculated from k-fold cross-validation was 0.71 for the neural network model. When the model was validated on the test set, the AUC was 0.75. The features with the highest impact on model output were performance of a regional nerve block, postanesthesia care unit maximum pain score, postanesthesia care unit median pain score, active smoking history, and total perioperative opioid consumption.
Conclusions: Applying machine learning algorithms allows providers to better predict outcomes that require specialized health care resources such as transitional pain clinics. This model can aid as a clinical decision support for early identification of at-risk patients who may benefit from transitional pain clinic care perioperatively in ambulatory surgery.


Introduction
Opioids play an essential role in acute perioperative pain management. Increased attention to pain management as a quality metric has brought to light an overuse of prescription opioids contributing to an epidemic across the United States. The United States has had increased opioid prescriptions filled in the immediate postoperative period; a study reported that the mean dose of opioids prescribed for most surgical procedures in the United States was higher than that prescribed in other countries. Persistent opioid prescribing is often postsurgical [4], with as many as 3% of opioid-naive patients requiring opioids for more than 90 days after a major elective surgery [5]. One potential service that may help curb outpatient opioid use after surgery is the transitional pain clinic, which consists of a team of providers who implement multidisciplinary opioid-sparing approaches such as pharmacological, nonpharmacological, and psychological interventions with the goal of weaning patients from opioids postoperatively as outpatients [6,7]. Transitional pain clinics have been shown to reduce opioid use postoperatively, symptoms of anxiety and depression, pain catastrophizing, and postsurgical pain [8,9]. Given the increased resources required to provide this type of service, not all surgical patients may realistically receive postoperative care from transitional clinics. Currently, the criteria for recommendation to transitional pain services for surgical patients are not uniformly defined; thus, accurate predictive methods for patients who may benefit from transitional pain clinics are needed. Less work has been done on patients undergoing ambulatory surgery and on the identification of patients who may likely require more opioids as an outpatient. In such populations, machine learning may be used to identify postoperative opioid use in the recovery room [10].
In addition, some studies have described the risk factors for using outpatient opioids after ambulatory surgery [11][12][13].
The objective of this study was to develop machine learning-based predictive models that may aid in the identification of patients likely to require opioid refills after their initial discharge prescription. Specifically, pain score patterns (ie, trends in reported pain scores in the recovery room) were incorporated into the models. We focused on patients who underwent ambulatory surgery, which included orthopedic surgery (eg, joint arthroscopy, forearm/hand surgery), nonmastectomy breast surgery, urology (eg, cystoscopy), minimally invasive surgery (eg, cholecystectomy, hernia repair), colorectal surgery (eg, hemorrhoidectomy), and gynecology (dilation and curettage/evacuation, hysteroscopy). We hypothesized that the use of neural networks that incorporate various features, including recovery room pain phenotypes, may identify patients at higher risk. The pain phenotypes included patterns in patient-reported pain scores in the recovery room, including trajectory of pain and median and maximum pain scores.

Study Population
Data were retrospectively collected from the electronic medical records of patients who underwent outpatient surgery from January to July 2020 at a single ambulatory surgery center. The surgeries included in the analysis were orthopedic surgery (eg, joint arthroscopy, forearm/hand surgery), nonmastectomy breast surgery, urology (eg, cystoscopy), minimally invasive surgery (eg, cholecystectomy, hernia repair), colorectal surgery (eg, hemorrhoidectomy), and gynecology (dilation and curettage/evacuation, hysteroscopy).

Ethics Approval
Our institutional review board (Human Research Protections Program) waived the consent requirement and approved this retrospective study (protocol 210099).

Primary Outcome and Features
The primary outcome of interest was a binary variable (response range, 0 or 1), in which 0 was defined as "no opioid refill" and 1 was defined as the patient "needed to refill their outpatient opioid prescriptions" within 2 weeks after surgery (no opioid refill vs opioid refill). This was captured retrospectively from the electronic medical record by review of the following: (1) any telephone note describing patient calling in for opioid renewal; (2) any progress/office visit note from primary care provider, pain medicine specialist, or surgical provider describing the need for opioid refill; or (3) any renewal order in the medication list for opioids within this time frame. On postanesthesia care unit (PACU) discharge, all patients were prescribed up to 5 days of opioids. For perioperative multimodal analgesia, all patients received preoperative acetaminophen unless contraindicated. For a subset of surgical procedures, regional nerve blocks were routinely offered preoperatively (ie, shoulder, hand/forearm, and knee surgeries). Intraoperatively, patients may have received fentanyl, hydromorphone, ketamine, ketorolac, and dexmedetomidine at the discretion of the anesthesiologist. In the PACU, patients were given oxycodone, fentanyl, and hydromorphone, as needed.
Features that were integrated into the model were collected retrospectively from the electronic medical record system. The data included age (years), sex (male vs female), body mass index (kg/m²), English-speaking, comorbidities, regional nerve block performance, general anesthesia, intraoperative ketamine, intraoperative total intravenous anesthesia, opioid consumption, and pain scores (11-point numeric rating scale [NRS] from 0 to 10). These features were included, as they were determined to be relevant to postoperative opioid use based on clinical judgement and previous research [14,15]. Opioid consumption, defined as total opioids consumed intraoperatively and in the PACU, was measured in intravenous morphine equivalents (MEQ). Pain scores were captured as preoperative pain score, median pain score in the PACU, maximum pain score in the PACU, and slope of pain score trajectory in the PACU. Preoperative pain scores were collected by nurses upon arrival for preoperative check-in. PACU pain scores were captured every 5-15 minutes and recorded in the electronic medical record. A negative value for the pain score slope was defined as an overall decrease in pain scores throughout the PACU stay. A positive value for the pain score slope was defined as an overall increase in the pain scores throughout the PACU stay. A zero value of the pain score slope was defined as no change in the overall trend of pain scores throughout the PACU stay.
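The three PACU pain features described above can be illustrated with a short sketch. The text does not state how the slope was computed, so the ordinary least-squares fit of pain score against time used here is an assumption, and the function name and the example patient are hypothetical.

```python
from statistics import mean, median


def pacu_pain_features(minutes, scores):
    """Return (median, maximum, slope) for one patient's PACU pain scores.

    minutes: time since PACU arrival at each assessment
    scores:  NRS pain score (0-10) at each assessment
    The slope is an ordinary least-squares estimate (an assumption).
    """
    if len(minutes) != len(scores) or len(scores) < 2:
        raise ValueError("need at least two paired observations")
    t_bar, s_bar = mean(minutes), mean(scores)
    sxx = sum((t - t_bar) ** 2 for t in minutes)
    if sxx == 0:
        raise ValueError("assessment times must not all be equal")
    sxy = sum((t - t_bar) * (s - s_bar) for t, s in zip(minutes, scores))
    slope = sxy / sxx  # NRS points per minute; negative means pain decreased
    return median(scores), max(scores), slope


# Hypothetical patient assessed every 15 minutes, with pain falling over the stay.
med, peak, slope = pacu_pain_features([0, 15, 30, 45], [8, 6, 5, 3])
```

For this made-up patient the slope is negative, which corresponds to the "overall decrease" category in the definition above.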

Statistical Analysis
Python (v3.10.1) was used for all statistical analyses. Patient and surgical characteristics were compared with chi-squared test (categorical) and Wilcoxon rank sum test (continuous). A generalized linear mixed model fit by maximum likelihood was implemented to model the features to the primary outcome of opioid refill. The random effect in this model was the surgical procedure. All features were included in the model, and their association with the outcome was reported by their respective odds ratios (ORs), 95% CI, and P values. A neural network model to predict the need for opioid refills following surgery was constructed. Logistic regression, random forest, and support vector machine classifiers were implemented for comparison. For all models, patient data were divided into training and test data sets with a 70:30 split by using a stratified randomized splitter (the train_test_split method from the scikit-learn library). K-fold cross-validation on the training set was used to tune the hyperparameters and to optimize oversampling techniques as well as to calculate the average sensitivity, specificity, F1-score, and area under the curve (AUC) for the receiver operating characteristic curve. The final version of each model was then validated on the test set and the AUC was reported. Feature importance from the neural network was ranked based on Shapley Additive Explanations (SHAP).
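As a reminder of how ORs and 95% CIs relate to a fitted logistic model, the sketch below converts a log-odds coefficient and its standard error into an OR with a Wald-type interval. The Wald interval is an assumption (the text does not say which interval was used), and the coefficient values are placeholders rather than study results.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error to (OR, lower, upper).

    Uses a Wald interval, exp(beta +/- z * se), with z=1.96 for 95% coverage.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient, not a result from the study.
odds_ratio, ci_low, ci_high = odds_ratio_with_ci(beta=0.50, se=0.20)
```

An interval whose lower bound stays above 1 corresponds to a feature positively associated with refills at the 5% level under these assumptions.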

Data Balancing
The Synthetic Minority Oversampling Technique (SMOTE) for Nominal and Continuous algorithm and random undersampling were both implemented using the imblearn library [16]. These tools were used to achieve a balanced class distribution with minimal difference between the numbers of positive and negative outcomes. A data set with a large difference between positive and negative outcomes is considered unbalanced, which may make it difficult for predictive machine learning models to draw useful conclusions.
Random undersampling of the majority outcome is frequently used to reduce the impact of imbalanced data sets, whereas SMOTE creates synthetic minority class examples to balance the minority class with the majority class. SMOTE uses samples from the minority class and a set number of nearest neighbors (in this case, 5) to generate synthetic cases from the same class. Combining the 2 techniques as outlined improved model performance. Both techniques were only applied on the training set. Different combinations for proportions of minority to majority class were analyzed, ranging from 0.25 to 1.00. After performing k-fold cross-validation, the parameter "sampling_strategy" for the SMOTE class from imblearn was optimal when set to 0.24 and the parameter "sampling_strategy" for the RandomUnderSampler class from imblearn was optimal when set to 0.94. Optimal results were based on which hyperparameters produced the highest performance metrics for the model (eg, AUC, F1-score, sensitivity, specificity).
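The core idea of SMOTE and the effect of the two sampling ratios can be sketched in plain Python. This is not the imblearn implementation: it handles continuous features only (the nominal handling of the Nominal and Continuous variant is omitted), it assumes oversampling is applied before undersampling, and its rounding may differ from the library. All function names and example rows are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbors (continuous features only)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbors of the base row, excluding the row itself.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: math.dist(base, row))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


def resampled_counts(n_minority, n_majority, over_ratio, under_ratio):
    """Class counts after oversampling to over_ratio, then undersampling to
    under_ratio. Both ratios are minority/majority targets."""
    n_min_after = max(n_minority, int(n_majority * over_ratio))
    n_maj_after = min(n_majority, int(n_min_after / under_ratio))
    return n_min_after, n_maj_after


# Hypothetical minority rows with two continuous features each.
minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [2.5, 2.5], [1.5, 1.0], [3.0, 3.0]]
synthetic = smote_sketch(minority, n_new=4)

# Assumed training counts: roughly 70% of 144 refill and 1189 no-refill patients.
counts = resampled_counts(101, 832, over_ratio=0.24, under_ratio=0.94)
```

Under these assumed counts, the reported ratios would leave about 199 minority and 211 majority training examples, which shows how the two steps together bring the classes close to parity.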

Machine Learning Models
Four different machine learning classification models were evaluated: neural network, logistic regression, random forest classifier, and support vector classifier. For each model, the following sampling methods were compared: oversampling the training set via SMOTE, undersampling the majority class in the training set, a combination of SMOTE and undersampling of the majority classes, and no oversampling or undersampling technique. The results from the sampling method that provided the optimal results were reported. For each model, all features were included as inputs. One-hot encoding was used for categorical features.
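One-hot encoding, as mentioned above, replaces a categorical feature with one 0/1 indicator column per category. A minimal sketch follows; the helper name and the example categories are illustrative and do not reflect the study's actual encoding code.

```python
def one_hot(values, categories=None):
    """Encode categorical values as 0/1 indicator columns, one per category."""
    if categories is None:
        categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one indicator is set per row
        rows.append(row)
    return categories, rows


# Illustrative surgical service labels.
cats, encoded = one_hot(["orthopedic", "urology", "orthopedic", "gynecology"])
```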

Multilayer Perceptron Neural Network
Using the Keras interface for the TensorFlow library, a shallow feedforward neural network was constructed. The rectified linear unit function was used as the activation function. The final output layer used the sigmoid activation function, and the overall model used the Adam optimizer. Repeated k-fold cross-validation was used to tune the hyperparameters to find the optimal parameter values for the number of hidden layers (1), number of neurons per hidden layer (100), maximum number of iterations (300), batch size (16), and learning rate (0.0001).
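The architecture described above (one hidden layer of 100 rectified linear units feeding a single sigmoid output) can be illustrated with a dependency-free forward pass. This sketch is not the Keras model: training with the Adam optimizer, the batch size, and the learning rate are omitted, the weights are random, and the input width of 20 is an assumption because the number of encoded features is not stated.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, hidden, output):
    """One hidden ReLU layer followed by a single sigmoid output unit."""
    w_h, b_h = hidden
    h = [relu(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w_h, b_h)]
    w_o, b_o = output
    return sigmoid(sum(w * hi for w, hi in zip(w_o[0], h)) + b_o[0])


rng = random.Random(0)
n_features = 20  # assumed input width; not stated in the text
hidden = init_layer(n_features, 100, rng)
output = init_layer(100, 1, rng)
p_refill = forward([0.5] * n_features, hidden, output)
```

The sigmoid output lies strictly between 0 and 1, so it can be read as a predicted probability of refill once the weights have been trained.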

Logistic Regression
The logistic regression classifier predicts the probabilities of the different outcome possibilities based on the input. A newton-cg solver regression model was implemented without specifying individual class weights. This model provided a baseline score and helped make the case for improvement over the evaluation metrics. Repeated k-fold cross-validation was used to tune the hyperparameters to find the optimal parameter value for C (the strength of the regularization is inversely proportional to C), which was 0.3.

Random Forest Classifier
The random forest is an ensemble approach, which has been proven effective for a variety of classification problems. To tune the hyperparameters, we performed repeated k-fold cross-validation to find the optimal parameter values for maximum depth (75), minimal samples required to be at the leaf node (4), minimal samples required to split an internal node (4), and number of estimators (100) (ie, number of trees).

Support Vector Classifier
A support vector classifier maps the data onto an n-dimensional space (n being the number of features) and then identifies the hyperplane decision boundary that best separates the data into 2 classes by maximizing the distance between the hyperplane and the nearest data point in either class. K-fold cross-validation was used to tune the hyperparameters to identify the optimal parameter value for C, which was 130.

Performance Metrics
The primary performance metric of interest was the AUC for the receiver operating characteristic curve. In addition, we reported F1-scores, sensitivity, and specificity.
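For reference, the four metrics can be computed from labels, predicted classes, and predicted scores as in the sketch below. The AUC is computed through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half). The labels, scores, and the 0.5 threshold are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded 0/1."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, tn, fn


def auc(y_true, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = refill) and model scores.
y_true = [1, 1, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.1, 0.7, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)          # true positive rate
specificity = tn / (tn + fp)          # true negative rate
f1_score = 2 * tp / (2 * tp + fp + fn)
auc_value = auc(y_true, scores)
```

Unlike sensitivity, specificity, and the F1-score, the AUC does not depend on the classification threshold, which is one reason it suits imbalanced outcomes such as this one.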

K-Fold Cross-Validation
To effectively tune the hyperparameters of the models, stratified k-fold cross-validation was implemented on the training set to observe the sensitivity, specificity, F1-score, and AUC scores, for 10 splits. For each iteration, the data set was split into 10 groups (folds). One fold served as the test set with the other 9 serving as the training set. When assessing the effectiveness of SMOTE and random undersampling, only the training folds were changed. This process was repeated until each fold served as the test set once. Every model exhibited improved performance metrics when SMOTE and random undersampling were applied.
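The fold logic above can be sketched without any library. The helper names are hypothetical, and the round-robin assignment is a simplification of what a library stratified splitter does; the point is that resampling is fit on the 9 training folds only, while the held-out fold keeps its original class distribution.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % k].append(i)  # deal indices round-robin within each class
    return folds


def train_test_pairs(labels, k=10, seed=0):
    """Yield (train_indices, test_indices); each fold is held out exactly once."""
    folds = stratified_folds(labels, k, seed)
    for held_out in folds:
        train = [i for fold in folds if fold is not held_out for i in fold]
        yield train, held_out


# Hypothetical labels with about 10% positives, similar to the refill rate.
labels = [1] * 20 + [0] * 180
pairs = list(train_test_pairs(labels))
for train_idx, test_idx in pairs:
    # SMOTE and undersampling would be applied to rows in train_idx only;
    # rows in test_idx stay untouched so the fold score reflects real data.
    pass
```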

Study Cohort Characteristics
There were 1333 patients and 28 unique surgical procedures included in the final analysis, and 144 (10.8%) patients refilled their opioid prescription within 2 weeks after outpatient surgery. Univariate analysis revealed that patients who required opioid refills were more likely to be smokers (32/144, 22.2% vs 156/1189, 13.1%, respectively; P=.005) and to have had a regional nerve block performed (87/144, 60.4% vs 440/1189, 37.0%, respectively; P<.001). Those who required opioid refills had higher total perioperative opioid consumption (intraoperative and PACU opioid use; P<.001), preoperative pain scores (P<.001), maximum PACU pain scores (P<.001), and median PACU pain scores (P<.001). Table 1 lists the differences between the opioid refill and non-opioid refill cohorts represented in the study population in order to provide information regarding the baseline characteristics. All surgical procedures included in our analysis are listed in Table 1.
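The smoking comparison above can be rechecked from the reported counts with a chi-squared test on the 2x2 table. The sketch below applies the Yates continuity correction, which is an assumption: the text does not say whether a correction was used. The P value for 1 degree of freedom is obtained from the complementary error function.

```python
import math


def chi2_2x2(a, b, c, d, yates=True):
    """Chi-squared test (1 df) for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value). The continuity correction is optional.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-squared, 1 df
    return stat, p_value


# Smokers vs nonsmokers in the refill (32/144) and no-refill (156/1189) groups.
stat, p_value = chi2_2x2(32, 144 - 32, 156, 1189 - 156)
```

With the correction, the statistic is roughly 8.0 and the P value is just under .005, consistent with the reported P=.005; without the correction, the P value would be somewhat smaller.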

Mixed Effects Logistic Regression Model
The need for opioid refill within 2 weeks after ambulatory surgery was modeled utilizing a mixed effects logistic regression analysis fit by maximum likelihood, in which the random effect was the surgical procedure (

Neural Network Approach to Predicting Opioid Refills
Hyperparameter tuning via grid search cross-validation was implemented to identify the best architecture of the multilayer perceptron neural network, which consisted of 1 hidden layer, 100 neurons within the hidden layer, 300 maximum iterations for learning, a batch size of 16, and a learning rate of 0.0001. Based on this architecture, the average AUC calculated from k-fold cross-validation was 0.71 (95% CI 0.68-0.74). The final model was then validated on the test set, which yielded an AUC of 0.75 (Figure 1). The features with the highest impact on model output for the neural network based on the absolute SHAP values were performance of a regional nerve block, maximum pain score in the PACU, median pain score in the PACU, active smoking history, and total opioid consumption (intraoperative and PACU) (Figure 2). Next, various other machine learning-based models were implemented to predict the need for opioid refills after ambulatory surgery. Based on k-fold cross-validation, the average AUCs from models with optimized hyperparameters were identified for support vector machine (0.64, 95% CI 0.57-0.71), random forest (0.66, 95% CI 0.60-0.71), and logistic regression (0.69, 95% CI 0.66-0.74) (Table 3). The final models for each machine learning approach were then validated on a separate test set when SMOTE was not applied versus when SMOTE was applied; the calculated AUCs were identified for support vector machine (0.65), random forest (0.68), and logistic regression (0.73). SMOTE improved the performance of each model.
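Ranking features by absolute SHAP values typically means averaging the magnitude of each feature's contribution across patients and sorting in descending order. The sketch below shows that aggregation step only; it does not compute SHAP values, and the feature names and numbers are placeholders rather than study output.

```python
def rank_by_mean_abs(shap_rows, feature_names):
    """Order features by mean absolute contribution across patients."""
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Placeholder contributions: one row per patient, one column per feature.
names = ["nerve_block", "pacu_max_pain", "age"]
rows = [[0.30, -0.10, 0.02], [-0.20, 0.25, -0.01], [0.25, 0.15, 0.03]]
ranking = rank_by_mean_abs(rows, names)
```

Taking the absolute value first means a feature that pushes predictions strongly in either direction ranks highly, so the ranking reflects impact on model output rather than direction of effect.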

Principal Findings
We demonstrated that a shallow feedforward neural network and other machine learning approaches that integrated pain score patterns had adequate performance to predict the need for opioid refills within 2 weeks following ambulatory surgery. The features with the highest impact on model output were active smoking history, intraoperative opioid consumption, PACU opioid consumption, regional nerve block utilization, as well as maximum and median pain scores in the PACU. The importance of pain score patterns (ie, median and maximum pain scores) in predicting opioid refills is interesting and highlights the association of PACU analgesia and opioid consumption with the requirement for more opioids following the initial prescription. This neural network may be useful in identifying patients at risk who require a longer duration of opioid use so that the limited hospital resources can be better utilized in a precise manner.

Comparison to Prior Work
Previous studies have reported the utilization of machine learning for predicting postoperative opioid use in ambulatory surgery [10,17]. Nair et al [10] reported the accuracies of regression, naïve Bayes, neural networks, random forest, and extreme gradient boosting in predicting postoperative opioid use in the recovery room and showed that random forest performed best when using only preoperative features. Anderson et al [17] utilized models to predict prolonged opioid use specifically in patients who underwent anterior cruciate ligament repair by using regression, Bayesian belief network, gradient boosting, and random forest. They found that gradient boosting was able to achieve an AUC of 0.77. In our study, we reviewed multiple types of ambulatory surgeries and focused on predicting the need for additional outpatient opioid refills weeks after surgery. Four computational approaches were used to determine the best model for our data set, and all had similar performances. Random forest, logistic regression, and support vector machine tools did not perform as well as the neural network, though the random forest model had increased specificity compared to the neural network. Both the support vector classifier and neural network can increase the dimensionality of the data to find a solution, but given the time and training, neural networks usually outperform support vector classifiers. Random forest and neural networks approach data inversely, as random forest decision trees are independent and neural network neurons are dependent on other neurons. Logistic regression is the standard approach but often does not perform well in multidimensional data sets. By surveying multiple models, the benefits of each can be identified and evaluated to improve the validity of the predicted features [18].
Opioids remain the cornerstone for acute postoperative pain management, and the perioperative period is often the patients' first introduction to prescription opioids. Our study's patient cohort was primarily opioid-naïve; only 4.6% (55/1189) of the patients in the non-refill group and 5.6% (8/144) of the patients in the refill group reported preoperative opioid use. Studies have shown surgical procedure as an independent risk factor for prolonged opioid use [4,19,20]. Other risk factors include preoperative opioids, tobacco use, gender, and mood disorders [21-26]. Although efforts are in place to standardize postoperative opioid prescriptions per surgical procedure [27], there continues to be a wide variety in the amount and duration of opioids prescribed and often in excess [1,28-31].
An estimated 67%-92% of the prescribed opioids for postoperative pain remain unused [1,32], leaving great potential for diversion and misuse. An increasing number of heroin users reported first being introduced to opioids via prescription and then resorting to heroin for cost and availability factors [33]. Likewise, Bartels et al [34] report that 80% of the opioid prescriptions remain unused with limited and challenging disposal options. Similarly, a 2017 systematic review reports that patients took only 29%-58% of the prescribed opioid pills [32]. Over the decades, we have learned that excess opioids do not necessarily reduce persistent postsurgical pain or any other pain-related outcomes [35]. As Porter and colleagues [36] demonstrated, sometimes less is more: patient-centered opioid discharge prescription guidelines satisfied 93% of the patients, with 99% in the 0 morphine milligram equivalents group. Although there may be some procedures that do not require postoperative opioids, we must also find a balance and prescribe opioids as necessary to meet individual patients' pain needs [37]. For these reasons, risk stratification can be a helpful tool for guiding the process of postoperative opioid prescribing.
The use of regional anesthesia was associated with opioid refilling. It is important to note that there is no causality that may be drawn from these results but rather an association. It may be that the use of regional anesthesia was associated with surgeries that were more painful in nature, and despite pain scores being likely lower in the PACU, this group would more likely require additional opioids as outpatients when compared to other surgical procedures that are less likely to receive regional anesthesia for pain management. Other potential limitations include the variability in surgery type, which may range in pain level, both during surgery and during recovery, as well as the subjective nature of pain scores. Despite these limitations, the features that have been identified are actionable and trackable in future studies.

Limitations
There are several limitations in this study, mainly due to the inherent limitations of a retrospective analysis. First, the primary outcome (opioid refills) may potentially be underestimated, as we captured this data based on clinical notes and orders in the electronic medical record system. It is possible that the need for opioid refills was missed in some patients who sought care outside of our health care system (and thus not recorded in our records). However, we extracted the data via a manual clinician review to optimize accuracy as much as possible. A prospective study would be needed to assess the incidence of postsurgical opioid refills more accurately. Second, an issue of generalizability is also of concern, as this is single-institution data. Model performance (eg, AUC) could decrease in a surgical population outside of this institution. To avoid the issue of overfitting and, thus, limited generalizability, we calculated the metrics from k-fold cross-validation and furthermore used a holdout data set for validation. What is needed is a high-quality prospective study that can more accurately capture the features and outcomes from each patient and, subsequently, be validated at external institutions.

Future Directions
Early identification of at-risk patients prior to their elective surgical procedure is the key. These patients can then be referred to establish care with a dedicated and comprehensive transitional pain program. Built on solid evidence-based medicine, this multidisciplinary transitional pain service includes anesthesiology, pharmacy, psychiatry, and physical therapy.
Patients are often evaluated preoperatively to help manage expectations regarding anticipated postoperative pain and offer preoperative weaning when appropriate. This anesthesiologist-led team makes recommendations about intraoperative and immediate postoperative pain management [38], including predischarge and postdischarge tapering plans, if applicable. After the discharge, the transitional pain service can continue to manage these patients by using a multimodal approach with nonopioid medication, interventional procedures such as regional peripheral nerve blocks [39] or cryoanalgesia [40], as well as provide necessary psychological support [41]. Transitional pain clinics have been shown to reduce opioid use postoperatively, symptoms of anxiety and depression, pain catastrophizing, and pain [7,8]. Early identification of these clinical predictors, in conjunction with knowing the typical pain trajectories and patterns of common surgical procedures [42], can serve as the foundation for the basis of prescribing the right regimen and duration for the opioid prescription. Anesthesiology as a specialty, and especially in the setting of a dedicated acute pain service, is well positioned to take the lead in defining personalized pain medicine through all 3 phases of perioperative care [43].

Conclusions
Applying machine learning algorithms to electronic health data allows providers to develop models that predict outcomes more accurately and therefore to allocate limited health care resources (eg, transitional pain clinics) more appropriately. In this study, we showed that the need for regional anesthesia, high intraoperative opioid consumption, increased PACU pain scores, and PACU opioid consumption were important features in models predicting outpatient opioid refills. Although providers are aware of the potential risk factors of opioid misuse, it remains challenging to accurately predict which patients will benefit from such services as outpatients. This prediction model serves as an example of a model that could be formalized into clinical decision support tools to help us better understand which patients will benefit from transitional pain clinics following ambulatory surgery.