Usability of Mobile Health Apps for Postoperative Care: Systematic Review

Background: Mobile health (mHealth) apps are increasingly used postoperatively to monitor, educate, and rehabilitate. The usability of mHealth apps is critical to their implementation.

Objective: This systematic review evaluates the (1) methodology of usability analyses, (2) domains of usability being assessed, and (3) results of usability analyses.

Methods: The A Measurement Tool to Assess Systematic Reviews checklist was consulted. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses reporting guideline was adhered to. Screening was undertaken by 2 independent reviewers. All included studies were assessed for risk of bias. Domains of usability were compared with the gold-standard mHealth App Usability Questionnaire (MAUQ).

Results: A total of 33 of 721 identified studies were included for data extraction. Of the 5 included randomized controlled trials (RCTs), usability was never the primary end point. Methodology of usability analyses included interview (10/33), self-created questionnaire (18/33), and validated questionnaire (9/33). Of the 3 domains of usability proposed in the MAUQ, satisfaction was assessed in 28 of the 33 studies, system information arrangement was assessed in 11 of the 33 studies, and usefulness was assessed in 18 of the 33 studies. Usability of mHealth apps was above industry average, with median System Usability Scale scores ranging from 76 to 95 out of 100.

Conclusions: Current analyses of mHealth app usability are substandard. RCTs are rare, and validated questionnaires are infrequently consulted. Of the 3 domains of usability, only satisfaction is regularly assessed. There is significant bias throughout the literature, particularly with regards to conflicts of interest. Future studies should adhere to the MAUQ to assess usability and improve the utility of mHealth apps.


Introduction
Industry experts have forecasted significant growth in mobile app users [1]. Given this projected surge, mobile health (mHealth) apps offer a unique and readily accessible platform to the patient, surgeon, and innovator. mHealth apps are now being integrated into various sectors of health care, with over 318,000 apps [2] currently helping to track, educate, and diagnose [3].
One area of particular growth is the use of mHealth apps as a means of monitoring patients in the important postoperative period. Well-designed apps have the potential to encourage earlier discharge, reduce in-person follow-ups [4,5], rehabilitate [6], aid clinicians in picking up surgical complications [7], and improve communication between patient and health care professional [8]. In addition to the economic and medical benefit of early discharge, postoperative monitoring apps have the potential to empower patients, giving them autonomy over their own health, which in turn might improve patient satisfaction and motivation for recovery [9].
The usability of mHealth apps is important [10,11] because those with poor usability will be less commonly used [12,13]. This is particularly significant in the postoperative period, given the focus of mHealth apps on rehabilitation, for which patient engagement is critical. One study revealed that around half of all mHealth app users stop engaging for various reasons, including loss of interest [14]. Despite this, little empirical research is undertaken to analyze the usability of mHealth apps before they are launched [15].
Several definitions and domains of usability have been previously defined without clear unification [11,16,17], but with several recurring themes. For example, the International Organization for Standardization (ISO) 3-pronged definition includes effectiveness (ie, whether users can use the product to complete their goals), efficiency (ie, the extent to which individuals expend resources in achieving their goals), and satisfaction [18]. Another definition [19] has been designed specifically for mHealth apps and includes factors such as mobility, connectivity, and additional cognitive load.
Different methods have been proposed for assessing domains of usability, such as the Post-Study System Usability Questionnaire [20] and the System Usability Scale (SUS) [21]. However, these tools were not originally created to evaluate mHealth apps. The Mobile App Rating Scale [22] was recently created for researchers and clinicians to assess the quality of mHealth apps, with the simpler user version of the Mobile App Rating Scale (uMARS) [23] being proposed shortly after. While quality of an mHealth app shares several components with usability, there are important differences.
Given the heterogeneity in definitions and methods used for assessing the usability of mHealth apps, one group has recently developed and validated the 21-item mHealth App Usability Questionnaire (MAUQ) [24]. This tool explores 3 domains of usability, which are in line with the ISO definition: (1) ease of use and satisfaction, akin to ISO satisfaction; (2) system information arrangement, akin to ISO efficiency; and (3) usefulness, akin to ISO effectiveness. This systematic literature review aims to determine whether the usability of postoperative mHealth apps is being rigorously assessed, using the validated MAUQ as the gold-standard reference. We consider which empirical methods are being used and analyze whether postoperative mHealth apps are indeed usable.

Database Search
The A Measurement Tool to Assess Systematic Reviews checklist [25] was consulted before this review, and all methodology was established in advance. A university librarian experienced in the field of systematic literature review methodology was consulted. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [26] reporting guideline was adhered to for this review. Rayyan (Qatar Computing Research Institute) [27] software was used for the search.
Textbox 1 shows the questions that were defined.
The Medline, Embase, and Association for Computing Machinery Digital Library databases were searched. The search string was designed to provide maximum coverage while remaining manageable. We defined 4 broad themes for our search. Terms within a theme were combined using the Boolean operator OR, as seen in Table 1. Themes were then combined using the Boolean operator AND.

Screening of Papers for Inclusion and Exclusion
Each study retrieved from the initial search was evaluated to determine whether it should be admitted for analysis. The inclusion and exclusion criteria are shown in Textbox 2.
Screening of article titles and abstracts was performed by 2 authors independently. In situations where eligibility of a study could not be determined based on abstract alone, the full-text article was retrieved. We executed a full-text review of the remaining studies after title and abstract screening to further analyze appropriateness for inclusion. We analyzed all review articles to identify any other appropriate studies. We also reviewed the reference list of included papers.

Inclusion Criteria
• The paper uses a mobile health app, defined as an application (rather than a web-based tool) on a portable device (including smartphones and tablets). We include apps designed both for the patient and for the health care professional. We include all types of apps, including monitoring, educational, and rehabilitation apps

• The paper analyzes the postoperative period, defined as the point at which the patient leaves the operating theater, having undergone a surgical procedure

• The paper studies usability of the mobile health app. Any level of assessment is included, from structured questionnaire to analysis of engagement or time spent on the app

• The paper must be a full paper (not an abstract)

Exclusion Criteria
• The paper is not written in English

• The paper was published before 2000, in keeping with the launch of the first smartphone, the Ericsson R380 (Ericsson Mobile Communications)

• The paper only uses web-based, text-based, or email-based technologies (no mobile health app). We want to concentrate on mobile health apps, given their traction in the market

Database Search Results
The initial search and reference list screening identified 721 studies. After title and abstract screening, 660 were excluded, leaving 61 full-text studies to be assessed. Of these, 28 were excluded, leaving 33 studies included for data extraction. The PRISMA summary of the database search is presented in Figure 1.

Study Characteristics
A total of 33 studies were included. Of the 33 studies, 21 were from North America (14 from the United States and 6 from Canada), 9 were from Europe, 2 were from Asia, and 1 was from South America. Most studies specified the type of mobile device used by participants. Smartphones were used in 22 studies, tablets were used in 9, smartwatches were used in 1, iPod touch (Apple Inc) devices were used in 2, and 3 studies did not specify. Regarding the operating system, 11 studies used iOS (Apple Inc), 5 used Android, 1 used Windows (Microsoft Corp), and 17 did not specify.
Functionality was divided into 5 clear categories: 26 studies included monitoring of symptoms or wounds, 8 included educational content, 5 provided a communication platform, 5 included physiotherapy and rehabilitation, and 2 enabled medication management. App details are presented in Table 2.
Study characteristics are presented in Table 3. With regards to study design, 5 studies were randomized controlled trials (RCTs), 25 were prospective noncontrolled studies, and 3 were retrospective reviews. Sample sizes ranged from 4 to 494, with a median of 39 patients and a mean of 81 patients. Follow-up ranged from 30 minutes postoperation to 12 months postdischarge. The follow-up period was less than 7 days in 4 studies, between 1 week and 1 month in 15 studies, greater than 1 month in 9 studies, and not declared in 5 studies.

Usability Analysis
Regarding the method of usability analysis, usage (ie, monitoring of user engagement with the app) was used in 15 studies and was the only usability analysis employed in 4 studies. Interviews were used in 10 studies. Self-created questionnaires were used in 18 studies. Validated questionnaires were used in 9 studies. Of these, 7 used the SUS questionnaire, 1 used the uMARS questionnaire, 1 used the technology acceptance subscale, and 1 used the Computer System Usability Questionnaire (CSUQ).
We have categorized the domains of usability according to the MAUQ. A total of 28 studies covered ease of use and satisfaction, 11 studies covered system information arrangement, and 18 studies covered usefulness.
Average SUS scores ranged from 76 to 95 out of 100, with a median score of 87. The uMARS score was 4.1 out of 5. The CSUQ score was 2 out of 7 (whereby a score of 1 would indicate greatest usability).

Bias
There is significant potential for bias in studies evaluating the usability of mHealth apps. Hidden agenda bias and secondary gains bias were common and seemingly underreported in the literature. Of the 33 included studies, 8 officially reported authors' conflicts of interest, stating that they held shares in the app. Furthermore, several of the study groups were provided with the apps free of charge [28], which has clear implications on the usability domain of satisfaction; users who have paid for an app might be expected to have higher expectations than those who have been given an app for free. Perhaps more worryingly, a number of groups [38] declared no conflict of interest, despite seemingly being founders of their app.
Nonresponse bias is a further concern. Some studies, such as Pecorelli et al [44], had high response rates (96%) to usability analyses. However, others, such as Nilsson et al [43], had much lower rates (57.5% on day 14), and some [51] did not disclose the proportion of responders. Nonresponders to usability analyses are more likely to have found the app difficult to use. Therefore, studies with high rates of nonresponders are likely to report inflated usability results.
Population bias is a further issue. Younger audiences are likely to be more adept at using mobile technologies. Therefore, studies that include a younger demographic are likely to demonstrate inflated usability results. Conversely, the generalizability of results from studies [44] that included patients who were not used to mobile technologies may be limited and may change in the future, when greater numbers of older patients are familiar with mobile technologies.

Principal Findings
To our knowledge, this is the first comprehensive systematic review to assess usability of mHealth apps in postoperative management. This review identified 33 studies evaluating the usability of mHealth apps in the postoperative period across a broad range of surgical subspecialties, demonstrating the growing interest in this area. Most of the included studies were derived from the United States and Europe, which appear to be hubs of innovation in the field. Unsurprisingly, smartphones were the most commonly used devices. However, we suspect that wearable devices such as smartwatches, which have additional monitoring capabilities such as electrocardiogram monitors, will play an increasingly important role in the future [61].
With respect to study designs, 25 of 33 studies were prospective noncontrolled trials. There were 5 RCTs, but usability was never a primary end point in these studies. We feel RCTs comparing mHealth apps to normal practice (eg, in-person follow-up, telephone follow-up, or no follow-up) would be particularly beneficial in assessing the domains of satisfaction and usefulness. It has also been suggested that mHealth app interventions are associated with a falsely heightened level of user satisfaction due to patients' affinities for their digital devices [62]. This could be minimized by comparing postoperative mHealth apps to a sham app. However, we also acknowledge that RCTs have previously been described as an impractical evaluation methodology for mHealth apps, due to their prolonged duration from recruitment to results and their high costs [63].
The methodology for assessing usability was generally poor. The majority of analyses used simplistic self-created questionnaires that asked rudimentary questions focusing on the domain of satisfaction (28/33 studies) rather than other domains of usability. Indeed, only 11 of the 33 usability analyses assessed the domain of system information arrangement. We would argue that formal usability analyses should cover all 3 common domains of (1) satisfaction, (2) usefulness, and (3) system arrangement, according to the ISO definition of usability [18]. Validated questionnaires are helpful in assessing these areas reliably. Only 9 of the 33 included studies used validated questionnaires, most of which used the SUS. The SUS is a Likert scale made up of 10 questions. The average SUS score is 68 out of 100, meaning that all 7 studies that used the SUS scored above average in terms of usability. Although the SUS is a quick and cheap means of assessing usability, it was created in 1986, before the first smartphone or the concept of an app was realized. The SUS has not been validated for assessing mHealth apps. In comparison, the MAUQ was recently proposed and validated for use in mHealth apps in a population of English-speaking adults [64]. This is the gold-standard reference for analysis of mHealth app usability. While scores on the MAUQ have previously been shown to correlate with the SUS, this is not a strong correlation (r=0.643), thereby highlighting the inadequacy of studies that have only used the SUS.
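For readers unfamiliar with how the SUS scores discussed above are derived, the standard scoring arithmetic can be sketched as follows. Each of the 10 items is answered on a 1-5 Likert scale; odd-numbered (positively worded) items contribute (response − 1), even-numbered (negatively worded) items contribute (5 − response), and the sum is multiplied by 2.5 to yield a 0-100 score. The example responses below are hypothetical:

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten 1-5 Likert responses.

    Odd-numbered items (positively worded) contribute (response - 1);
    even-numbered items (negatively worded) contribute (5 - response).
    The summed contributions (0-40) are multiplied by 2.5 to give 0-100.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS requires ten responses on a 1-5 scale")
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))
    return total * 2.5

# Hypothetical respondent who strongly agrees with every positive item
# and strongly disagrees with every negative item:
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # -> 100.0
# A more moderate hypothetical respondent:
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # -> 75.0
```

Note that the resulting number is not a percentage; a score of 68 corresponds roughly to the 50th percentile of the industry benchmark, which is why the 76-95 range reported in the included studies sits above average.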
A major concern in these studies is the risk of bias. A number of the studies' authors have a financial interest in the usability of their apps, with high user satisfaction making adoption by hospitals and investors more likely. Furthermore, devices were sometimes provided free of charge, which could influence the feedback from users.

Conclusions
mHealth apps have significant potential during the postoperative period for encouraging earlier discharge, improving patient engagement, and offering a safety net for early identification of complications. Thorough analysis of usability is critical to the adoption of these novel technologies in the postoperative period; those with poor usability will have little impact in health care. According to this review, usability analyses to date have been substandard. They have focused on satisfaction, a narrow dimension of usability, with simplistic self-created questionnaires. Furthermore, there is a significant risk of bias, given the common conflicts of interest among authors of published studies. We hope this review changes future practice, with researchers undertaking more robust assessments of usability by employing validated questionnaires, such as the MAUQ, in blinded RCTs.