Quality improvements of healthcare trajectories by learning from aggregated patient-reported outcomes: a mixed-methods systematic literature review

Background: In healthcare, analysing patient-reported outcome measures (PROMs) at an aggregated level can improve and regulate healthcare for specific patient populations (meso level). This mixed-methods systematic review aimed to summarize and describe the effectiveness of quality improvement methods based on aggregated PROMs. Additionally, it aimed to describe barriers, facilitators and lessons learned when using these quality improvement methods.

Methods: A mixed-methods systematic review was conducted. Embase, MEDLINE, CINAHL and the Cochrane Library were searched for studies that described, implemented or evaluated a quality improvement method based on aggregated PROMs in the curative hospital setting. Quality assessment was conducted with the Mixed Methods Appraisal Tool. Quantitative data were synthesized into a narrative summary of the characteristics and findings. For the qualitative analysis, a thematic synthesis was conducted.

Results: From 2360 unique search records, 13 quantitative and three qualitative studies were included. Four quality improvement methods were identified: benchmarking, plan-do-study-act cycles, dashboards and internal statistical analysis. Five studies reported on the effectiveness of the use of aggregated PROMs, of which four identified no effect and one a positive effect. The qualitative analysis identified the following themes for facilitators and barriers: (1) conceptual (i.e. stakeholders, subjectivity of PROMs, aligning PROMs with clinical data, PROMs versus patient-reported experience measures [PREMs]); (2a) methodological, data collection (i.e. choice, timing, response rate and focus); (2b) methodological, data processing (i.e. representativeness, responsibility, case-mix control, interpretation); (3) practical (i.e. resources).

Conclusion: The results showed little to no effect of quality improvement methods based on aggregated PROMs, but more empirical research is needed to investigate different quality improvement methods. A shared stakeholder vision, selection of PROMs, timing of measurement and feedback, information on interpretation of data, reduction of missing data, and resources for data collection and feedback infrastructure are important to consider when implementing and evaluating quality improvement methods in future research.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12961-022-00893-4.


Background
Since the introduction of value-based healthcare by Porter [1] in 2006, an increase in the use of patients' perspectives on health outcomes for quality and safety improvement in healthcare has been observed [2], in addition to process and clinical outcomes [3][4][5]. These so-called patient-reported outcome measures (PROMs) capture a person's perception of their own health through standardized, validated questionnaires [6]. The main purpose of PROMs is to improve quality of care and provide more patient-centred care by quantifying important subjective outcomes, such as perceived quality of life and physical and psychosocial functioning.
For the purpose of quality improvement in healthcare, PROMs are used at the micro, meso and macro levels. At the micro level, PROMs are useful screening and monitoring tools to facilitate shared decision-making and patient-centred care [7][8][9]. At the meso level, aggregated PROMs (i.e. PROM outcomes at the group level) provide analytical and organizational angles for improving and regulating healthcare in specific populations as a result of enhanced understanding, self-reflection, benchmarking and comparison between healthcare professionals and practices [10][11][12]. At the macro level, PROMs are used for overall population surveillance and policy [2,13,14]. The use of structurally collected PROMs is increasingly adopted in national quality registries [15,16], and adoption increased even further after the Organisation for Economic Co-operation and Development (OECD) recommended the collection of aggregated PROMs to obtain insight into system performance and to enable comparative analysis between practices [17].
The use of aggregated PROMs is a relatively young field. In 2018, Greenhalgh et al. showed that there was little empirical evidence that PROMs, at a meso level, led to sustained improvements in quality of care [18]. However, since then, there has been growing interest in this field, with a plethora of quantitative and qualitative research currently available. Therefore, the aim of this mixed-methods systematic review was threefold: (1) to summarize quality improvement methods based on aggregated PROMs at the meso level in hospital care; (2) to describe the effectiveness of quality improvement methods; and (3) to describe barriers, facilitators and lessons learned when using aggregated PROMs for quality improvement in healthcare.

Methods
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were used to design and report this review [19]. The review was prospectively registered in the International Prospective Register of Systematic Reviews (PROSPERO) on 7 December 2020 (PROSPERO 2020: CRD42020219408).

Search strategy
Embase, MEDLINE, CINAHL and the Cochrane Library were searched for studies published up to May 2021. The search strategy (Additional file 1: Appendix I) included terms related to outcome measurements, quality management and quality improvement. Search terms consisted of Medical Subject Headings (MeSH) and free-text words, wherein for most terms, synonyms and closely related words were included. The search was performed without date or language restriction. Additional references were obtained by hand-searching reference lists of included studies and systematic reviews (backwards selection) and by identifying studies that cited the original included studies (forward selection). Duplicate studies were removed.

Eligibility criteria
Studies were considered eligible for inclusion if they described, implemented or evaluated a quality improvement method based on aggregated PROMs in the curative hospital setting. Both quantitative and qualitative studies were included in this review. Quantitative studies included experimental study designs, such as randomized controlled trials, controlled trials, cluster trials, controlled before-after studies and time-series studies. Qualitative studies included semi-structured interviews, focus groups or studies with a mixed-methods approach (e.g. process evaluation studies). Studies were excluded for the following: (1) the quality improvement was based on the use of PROMs in the individual setting only (e.g. in the consultation room); (2) written in a language other than English; (3) not peer-reviewed; (4) conference and editorial papers and reviews; or (5) the full text could not be obtained.

Study selection
All records found were uploaded to Rayyan, an online web application that supports independent selection of abstracts [20]. Two researchers (KvH and MD) independently screened the titles and abstracts of the identified studies for eligibility. Discrepancies were resolved by discussion with the involvement of a third researcher (JJ) when necessary. Subsequently, full texts were screened against the eligibility criteria independently by two researchers (KvH and MD).

Data extraction and synthesis
Due to the mixed-methods design of this review, two researchers (KvH and MD) extracted data from qualitative and quantitative studies separately [21] using a standardized form. Details on the study design, aims, setting, sample size, quality improvement method, PROMs and outcomes were extracted and synthesized into a narrative summary. The described quality improvement methods were summarized, and when available, the effect of these methods was reported.
For the qualitative synthesis, the approach outlined by Thomas and Harden [22] was followed, which involved a thematic synthesis in the form of three stages: (1) free line-by-line coding of the findings performed by three researchers; (2) organization of these codes into related areas to construct descriptive themes; and (3) the development of analytical themes. A fourth researcher (MO) was consulted for verification and consensus. The qualitative synthesis was structured around facilitators, barriers and lessons learned for the implementation of quality improvement interventions based on PROM data. Finally, both quantitative and qualitative synthesis were combined in the discussion section.

Quality assessment
Study quality was assessed independently by two researchers (KvH and MD) using the validated Mixed Methods Appraisal Tool (MMAT) [23]; the appraisal informed the interpretation of findings rather than determining study eligibility. The MMAT is a critical appraisal tool designed for mixed-methods systematic reviews and permits appraisal of the methodological quality of five study designs: qualitative research, randomized studies, non-randomized studies, descriptive studies and mixed-methods studies. Depending on the study design, the aspects covered included quality of study design, randomization, blinding, selection bias, confounding, adherence and completeness of data. The MMAT does not provide a threshold for the acceptability of study quality [23].

Effect and impact
Only five out of 13 quantitative studies reported on the effect of quality improvement methods based on aggregated PROMs [24,28,32,34,36]. Four of these studies, including both randomized controlled trials, showed no effect [24,28,36] to a minimal effect [34] on patient-reported outcomes after the use of individual benchmarking as a quality improvement method (Table 1). One study showed a significant improvement in the Oxford Knee Score after a plan-do-study-act (PDSA) cycle in a cross-sectional post-intervention cohort [32]. The other eight studies described the method of implementation without effect measurement [25,26,27,33,35] or discussed (statistical) models for using aggregated outcomes as performance indicators [29][30][31].

PDSA cycle
Two studies used a PDSA cycle to improve the quality of care [32,33]. Van Veghel et al. (2014) reported on the establishment of an online, transparent publication service for aggregated patient-relevant outcomes, which subsequently enabled benchmarking between Dutch heart centres to improve quality and efficiency. However, this study was not able to provide benchmarked patient-reported data due to a low response rate and a lack of data [33]. The study from Partridge et al. was a cross-sectional post-intervention study that compared its outcomes with a previously published report from the Health and Social Care Information Centre (HSCIC) from August 2011. A significant improvement in the Oxford Knee Score was found after changing the practice of care [32].

Dashboard as a feedback tool
Two studies used a web-based dashboard as a feedback tool [25,26]. In the study by van Zijl et al. (2021), feedback was available through graphical analysis of patient characteristics and PROMs for individual rhinoplastic surgeons. The purpose of this dashboard was to identify learning and improvement needs or provide data-driven motivation to change concepts or surgical techniques [25]. In Reilly et al., a dashboard was established to consistently measure the value of total hip and total knee arthroplasty by combining surgeon-weighted PROMs, clinical outcomes and direct costs [26]. Neither study reported on the impact of these methods.

Aggregated statistical analysis
One study investigated how clinical outcome measures can be linked to PROMs and concluded that the following methods were most appropriate: (1) analysing the factors related to a good or poor patient-reported outcome, and (2) analysing the factors related to agreement or disagreement between clinical and patient-reported outcomes [31].

Qualitative studies
Table 3 shows the study characteristics of the included qualitative studies. All three studies comprised semi-structured interviews [10,11,37]. Interviews were conducted amongst experts from the United Kingdom [10,11], US [11], Ireland [37], Sweden [10] and the Netherlands [11]. The study from Boyce et al. is the qualitative evaluation [37] of a randomized controlled trial that is discussed in the quantitative section [28].

Barriers, facilitators and lessons learned
In the qualitative analysis, barriers, facilitators and lessons learned/neutral statements were derived and grouped into three themes: (1) conceptual, (2) methodological and (3) practical. The overview and description of the themes (i.e. the codebook), with the occurrence of facilitators, barriers and lessons learned, can be found in Table 4. The most important lessons learned for future implementation and research can be found in Table 5.

(1) Conceptual
The following four themes were derived: stakeholders, subjectivity of PROMs, aligning PROMs with clinical data, and PROMs versus patient-reported experience measures (PREMs). One facilitator for success was the engagement and commitment of stakeholders at both the meso and macro levels from the beginning [10,11,37]. Champions can advocate the added value of collecting PROMs, and governance and political will can be decisive for success and sustainability [10,37]. Healthcare providers differ in their attitudes towards the use of PROMs for quality improvement, ranging from advocates to sceptics [37]. Starting with small-scale projects involving willing clinicians is therefore recommended, rather than working with teams with limited interest or readiness [11]. Advocates often need to convince other healthcare professionals, who have concerns about the scientific properties of PROMs, in particular the subjective character of these measures, and underlying doubts about patients' ability to answer PROM questionnaires [10,37]. Furthermore, discrepancies between PROM outcomes and healthcare professionals' clinical impressions were difficult to accept, since professionals expected these two outcome measures to align [37]. Moreover, Boyce et al. (2018) found that healthcare professionals were not able to distinguish between PROMs and PREMs [37].

(2) Methodological
Within this main theme, a distinction was made between data collection (2a) and data processing (2b).

(2a) Data collection
The following four themes were derived: choice of measure, timing of data collection, response rate of measurement and focus of measurement. Patient-reported measures should be selected carefully to be appropriate for the targeted population [37], to ensure comparability and to prevent burdening the patient [10,11]. The combination of generic and disease-specific measures was seen as feasible and complementary [10,11,37], especially since generic measures facilitate good comparison but are less able to detect variation [10]. Moreover, standardization of time points for data collection is advocated, as timing may influence the results [10]. For example, outcomes were measured during short-term follow-up, when patients were not yet fully recovered [37]. Furthermore, to obtain high response rates, it is important to discuss the results of PROMs with the patient during consultation, especially during long-term follow-up [11]. Another reported barrier concerned the clinical value of performance measurement for interventions in a field where little variability could be expected a priori [37].

(2b) Data processing
Four themes were derived: representativeness of collected data, responsibility of healthcare professionals, inadequate case-mix control and interpretation of feedback.
It was mentioned that some healthcare professionals mistrusted quality improvement measures based on aggregated PROMs. First, the representativeness of the data used for benchmarking or quality improvement was seen as a barrier. Healthcare professionals expressed concern that the data would not reflect practice, the individual practitioner or the patient population [10,11,37]. Furthermore, some patient groups were identified as a possible source of information and recall bias, such as patients with low health literacy or those with comorbidities who might confuse problems from one condition with another [37]. Additionally, patients' answers might be influenced by their care expectations, by the belief that this information is used to rate care, or by the need to justify their decision to have an operation [10,37]. Healthcare professionals may also be tempted to manipulate data to obtain good performance rates by recruiting patients who are more likely to have good outcomes (i.e. selection bias) [10,11,37]. Second, healthcare professionals were afraid of being held unfairly responsible for outcome data that could be biased by differences in resources across hospitals [37], differences in support services at the community level [37] or factors that occurred outside of their control [10,11].
Third, healthcare professionals worried that inadequate case-mix control of confounders would bias comparisons of healthcare providers. In addition, the lack of transparency of the statistical analysis made it difficult to engage with the data. Two solutions were provided to address these barriers: (1) providing aggregated data for quality improvement only at a very generic level, or (2) presenting results stratified into subgroups instead of applying risk- or case-mix adjustment [11]. Furthermore, healthcare professionals expressed difficulty in understanding the data, a lack of norms for good or poor performance [11], and a need for training or guided sessions to correctly interpret aggregated PROM data [10,37]. Quality improvement reports were able to show how hospitals and healthcare professionals stand relative to one another, but they were often general and lacked the ability to identify opportunities for real quality improvement or action [10], which is key for clinicians in engaging with data and processes [11].

Table 5 Lessons learned for future implementation and research
• Involve stakeholders from the very start and create a shared vision between stakeholders
• Use generic and disease-specific patient-reported outcome measures
• Ensure that PROMs are administered at the right time during the health process
• Provide feedback on performance to individual healthcare professionals
• Ensure that the data are representative and that the statistical analysis is comprehensible
• Provide healthcare professionals with training for adequate interpretation of aggregated PROM data
• Enable a good infrastructure for adequate data collection and analysis by trained and qualified staff

(3) Practical
Statements related to practical implementation were grouped under "practical".
One theme, resources, was derived. Funding to get programmes started was seen as a key facilitator for further development and structural embedding in routine care. Overall, commitment and support from the government and healthcare organizations were seen as facilitators [10,37]. The availability of resources for routine data collection and monitoring without disruption of workflow or additional workload was considered important [10,11,37]. For example, sufficient IT capacity and software to analyse the data were needed to make the data available quickly to healthcare professionals [10,11,37]. Additionally, the availability of tablets and assistance in the waiting room for completing questionnaires, the establishment of infrastructure for developing and disseminating annual reports [10], and the opportunity for data linkage and integration in hospital records were mentioned.

Discussion
The aim of this mixed-methods systematic review was to describe and investigate the experience and effectiveness of quality improvement methods based on aggregated PROMs. Four quality improvement methods were identified: benchmarking, PDSA cycles, web-based dashboards as feedback tools, and the provision of aggregated statistical analysis reports. In total, 13 quantitative and three qualitative studies revealed that there is limited empirical evidence concerning quality improvement based on the aggregated use of available PROMs. Only five studies reported on the effectiveness of the applied quality improvement method, and only one descriptive study reported a significant improvement of PROMs after implementation of aggregated PROM feedback. The qualitative studies identified that stakeholder engagement and belief, the use of generic and disease-specific PROMs, and the availability of funding and resources were important facilitators for success. One reported barrier was that sceptical healthcare professionals mistrusted the use of aggregated PROMs due to the subjectivity of PROMs and contradictory results between PROMs and clinical outcomes. Furthermore, they were afraid of being held unfairly accountable for results biased by case mix, differences in resources across hospitals, differences in support services at the community level or factors that occurred outside of their control. Lessons learned from the qualitative studies included creating a shared stakeholder vision and directing feedback on individual performance to individual healthcare professionals so that they can learn from the outcomes of their own patients.
One quantitative study did find an effect of using aggregated PROMs in a PDSA cycle [32]; it applied specific facilitating factors to generate representative data, such as engagement of all stakeholders, the combination of generic and disease-specific questionnaires, and attainment of a high response rate. However, the results of this methodologically inferior cross-sectional post-intervention study should be interpreted cautiously.
Methodological and practical barriers were considered a reason for not finding an effect of benchmarking. Weingarten et al. suggested that no effect of peer-benchmarked feedback was found due to the choice of measure, since only one generic outcome measure (functional status) was used [24]. The themes timing of data collection and timing of feedback were mentioned as important barriers in the included quantitative studies as well: a follow-up measurement was taken too early after providing peer-benchmarked feedback [28], provision of feedback started too late in the study [34], or the duration of the intervention was too short for it to be fully adopted by all participating healthcare professionals [36]. Multiple studies had shortcomings in reporting on bias arising from insufficient response rates. As PROMs are prone to missing data, it is important that studies adequately report on the completeness of data and take possible bias into account when drawing conclusions.
Another issue mentioned was the representativeness of the collected data, as some outcomes could not be linked to one specific surgeon, or low-volume surgeons were excluded from the analysis, which caused less variation [34]. Kumar et al. (2021) mentioned that the difficulty in feedback interpretation for healthcare professionals caused a lack of effect [36]. To improve understanding and interpretation, the use of training (e.g. statistics and visualization) and educational interventions was mentioned explicitly within the two randomized controlled trials addressing the quality improvement method of peer-benchmarked feedback [24,28]. The importance of training was also addressed by the qualitative findings [10,11,37]. Previous research indicates that educational support is an important contextual factor for success in quality improvement strategies [38].
Additionally, the importance of good resources was mentioned in the discussion of the quantitative studies [24,28,34]. The importance of structural implementation was underlined by Varagunam et al. (2014), who stated that the small effect of the national PROMs programme was partly caused by the delay in the representation of the collected data.

Strengths and limitations
A major strength of this review is the mixed-methods design with the inclusion of studies of overall moderate to good quality, which enabled a comprehensive overview of all available quantitative and qualitative research within this field. Furthermore, due to the mixed-methods design, the quantitative findings were discussed in light of the derived qualitative barriers, facilitators and lessons learned. Because of the lack of empirical research concerning quality improvement methods based on the aggregated use of PROMs, a meta-analysis was not performed. Additionally, it was deliberately decided to include only peer-reviewed studies, and it is acknowledged that important studies from the grey literature may have been missed.

Future perspective
Future implementation of aggregated PROM feedback can be substantiated with the reported facilitators, barriers and lessons learned from the current review (Tables 4, 5). It is important that every institution using aggregated PROMs make their results available, including possible biases and completeness of outcome data. Furthermore, the strength of combining PROMs, clinical data and PREMs should be recognized. The use of aggregated clinical data and PREMs has already been shown to be effective in quality improvement [5,[39][40][41], while using aggregated PROMs for quality improvement is still in its infancy.
As qualitative outcomes mainly addressed the issue of obtaining accurate data and consequently gaining professionals' trust in the concept and relevance of quality improvement, this research did not find best practices on how to learn and improve based on aggregated PROM data. Future research should focus on organizational and individual aspects that contribute to the optimal use of the obtained aggregated PROMs for quality improvement [42].

Conclusion
This review synthesized the evidence on the methods used for, and the effectiveness of, quality improvement in healthcare based on aggregated PROMs. The findings demonstrate that four quality improvement methods are used: benchmarking, PDSA cycles, dashboards and aggregated analysis. These methods showed little to no effect, which may be due to methodological flaws, as indicated by the qualitative results. In conclusion, this field of research is in its infancy, and more empirical research is needed. However, the descriptive and effectiveness findings provide useful information for the future implementation of value-based healthcare at the meso level and for further quality improvement research. In future studies, it is important that a shared stakeholder vision is created, that PROMs and the timing of measurement and feedback are appropriately chosen, that interpretation of the feedback is optimal, that every effort is made to reduce missing data, and finally, that practical resources for data collection and feedback infrastructure are available.

Additional file 1: Appendix I: Search strategy.
Additional file 2: Appendix II: Quality appraisal of included studies.