
Tools for assessing health research partnership outcomes and impacts: a systematic review



To identify and assess the globally available valid, reliable and acceptable tools for assessing health research partnership outcomes and impacts.


We searched Ovid MEDLINE, Embase, CINAHL Plus and PsycINFO from inception to 2 June 2021, without limits, using an a priori strategy and registered protocol. We screened citations independently and in duplicate, resolving discrepancies by consensus and retaining studies involving health research partnerships, the development, use and/or assessment of tools to evaluate partnership outcomes and impacts, and reporting empirical psychometric evidence. Study, tool, psychometric and pragmatic characteristics were abstracted using a hybrid approach, then synthesized using descriptive statistics and thematic analysis. Study quality was assessed using the quality of survey studies in psychology (Q-SSP) checklist.


From 56 123 total citations, we screened 36 027 citations, assessed 2784 full-text papers, abstracted data from 48 studies and one companion report, and identified 58 tools. Most tools comprised surveys, questionnaires and scales. Studies used cross-sectional or mixed-method/embedded survey designs and employed quantitative and mixed methods. Both studies and tools were conceptually well grounded, focusing mainly on outcomes, then process, and less frequently on impact measurement. Multiple forms of empirical validity and reliability evidence were present for most tools; however, psychometric characteristics were inconsistently assessed and reported. We identified a subset of studies (22) and accompanying tools distinguished by their empirical psychometric, pragmatic and study quality characteristics. While our review demonstrated psychometric and pragmatic improvements over previous reviews, challenges related to health research partnership assessment and the nascency of partnership science persist.


This systematic review identified multiple tools demonstrating empirical psychometric evidence, pragmatic strength and moderate study quality. Increased attention to psychometric and pragmatic requirements in tool development, testing and reporting is key to advancing health research partnership assessment and partnership science.

PROSPERO CRD42021137932


Scoping review and Coordinated Multicentre Team Protocol registrations

  1. PROSPERO Protocol Registration: CRD42021137932

  2. Open Science Framework (Coordinated Multicentre Team Protocol):

  3. Coordinated Multicentre Team Protocol publication:


The emphasis on and number of studies involving health research partnerships has grown substantially over the last decade [1]. Despite this evolving popularity and mounting demand for the systematic quantification of partnership outcomes and impacts, the assessment of health research partnerships has not kept pace [2]. Here, we refer to health research partnerships as those involving “individuals, groups or organizations engaged in collaborative, health research activity involving at least one researcher (e.g. individual affiliated with an academic department, hospital or medical centre), and any partner actively engaged in any part of the research process (e.g. decision or policy maker, healthcare administrator or leader, community agency, charities, network, patients, industry partner, etc.)”(p 4) [3].

Although quantitative tools for assessing the outcomes and impacts of health research partnerships emerged in the late 1980s to early 1990s [5,6,7], available tools are largely simplistic and the assessment of outcomes and impacts in the health research partnerships domain remains nascent [5, 7,8,9,10,11,12,13]. Available studies are often hampered by a lack of rigorous measurement, including tool psychometric testing to establish evidence of validity and reliability. The limitations of existing studies fall into three categories. First, many primary studies select single-use and locally relevant tools as a core part of the partnership process, with a focus on monitoring their partnerships’ progress and on bespoke outcomes and impacts of highest relevance to them [5, 9]. Although most tool studies aim to incorporate partner views, track individual partnership progression and capture partner perspectives, few aim to create more universally applicable, standardized tools that can be used more broadly or for replication studies [10]. Second, many such studies are limited by small sample sizes and lack of iterative tool testing, which in turn contributes to the lack of psychometric evidence and a lack of evidence across a broader range of contexts. Third, primary studies in this domain are often limited by interchanging terminology, a lack of discrete concept definitions, problems associated with literature indexing, location and retrieval [3, 14, 15], and multiple tool-specific challenges including construct identification, definition, refinement and application [5,6,7,8,9,10, 12].

Cumulatively, these challenges inhibit the evolution of partnership assessment and ultimately slow the advancement of partnership science [9, 10]. A recent overview of reviews examining quantitative measures to evaluate impact in research coproduction suggests that investigators must “engage more openly and critically with psychometric and pragmatic considerations when designing, implementing, [evaluating] and reporting on measurement tools” (p. 163) [8]. There is an established rationale for developing robust, pragmatic measures that are both relevant to partners and usable in real-world settings; pragmatic tools are viewed as a critical accompaniment to pragmatic designs [16,17,18]. In this light, health research partnership tools should be relevant to partners, be actionable, have a low completion burden, and demonstrate adequate validity and reliability. Importantly, there is a need for tools that are broadly applicable, can be used for benchmarks with accompanying norms to aid interpretation, and that demonstrate strong psychometric and theoretical underpinnings, without causing harm [16]. Closing these gaps would help to facilitate tool use, advance the systematic measurement of partnerships and drive improvements in partnership science [8].

Numerous tools for assessing health partnership outcomes and impacts have been identified in previous reviews focused on specific partnership domains, partner groups or contexts [5,6,7,8,9,10,11,12]; however, scope restrictions in these reviews preclude our understanding of tools across health research partnership traditions. These reviews also reveal that information about tool psychometric and pragmatic properties remains lacking. This study reviewed and systematically assessed globally available tools for the assessment of health research partnership outcomes and impacts to address documented gaps in both the psychometric and pragmatic characteristics of these assessment tools.

Our primary research question was as follows: what are the globally available, valid, reliable and acceptable tools for assessing the outcomes and impacts of health research partnerships? Our secondary research questions pertained to tool characteristics, including the following: what are the reported purposes of the tools, are outcomes and/or impacts measured, and what are the reported theoretical underpinnings and psychometric and pragmatic properties of the tools? (Additional file 1: Appendix S1). Secondary research questions pertaining to partnership characteristics were captured and will be reported in a forthcoming publication to preserve manuscript clarity.


This review is part of a comprehensive, multisite synthesis effort by the Integrated Knowledge Translation Research Network (IKTRN) [3, 19] and was guided by a collaboratively built conceptual framework [3]. In this review, we define tools as “instruments (e.g. survey, measures, assessments, inventory, checklist, questionnaires, list of factors, subscales or similar) that can be used to assess the outcome or impact elements or domains of a health research partnership” (p 5)[3, 20].

The overall approach to the review was guided by the steps outlined by Arksey and O’Malley [21], with refinements [22,23,24], and additional guidance from the Centre for Reviews and Dissemination (CRD) guidance for undertaking reviews in healthcare [25], the Cochrane Handbook for Systematic Reviews [26] and the Joanna Briggs Institute Reviewers’ Manual [27]. This manuscript was structured and reported using the newly updated Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) reporting standards [28]. Operational terms and definitions were published a priori as part of the multicentre approach [3]; additional definitions are provided in Additional file 1: Appendix S2 and detailed in the PROSPERO registered protocol, including key questions, inclusion–exclusion criteria and a priori specified methods [29]. All protocol deviations and accompanying rationale are detailed in Additional file 1: Appendix S1.

Search strategy and data sources

In consultation with an academic medical librarian (MVD), we iteratively developed a comprehensive search strategy using key papers and audit-improvement rounds to refine study catchment and feasibility [30]. The resulting health research partnership term clusters and the search strategy development methods have been applied to subsequent, parallel reviews [2, 3, 14, 15, 31]. We tested the strategy in Ovid MEDLINE to balance search sensitivity and scope [32]. The partnership search term cluster underwent peer review [33, 34] by an academic librarian to test for conceptual clarity across multiple partnership approaches. The overall strategy was subjected to the Peer Review of Electronic Search Strategies (PRESS) checklist review by a second academic network librarian, resulting in the spelling correction of a single term. No restrictions for date, design, language or data type were applied. The search strategy was translated for all four databases (Additional file 1: Appendix S3).

Electronic databases

Using the a priori, unrestricted strategy, we searched MEDLINE (Ovid), Embase, CINAHL Plus and PsycINFO from inception through 2 June 2021, including two updates. The search generated a total of 56 123 citations, resulting in the screening of 36 027 de-duplicated records [35] and 2784 full-text papers, managed with EndNote™ X7.8.

Eligibility and screening

We kept studies involving health research partnerships that (i) developed, used and/or assessed tools (or an element or property of a tool) to evaluate partnership outcomes or impacts [5, 36] as an aim of the study, and (ii) that also reported empirical evidence of tool psychometrics (e.g. validity, reliability). We excluded studies in which the main purpose of the partnership was recruitment and retention of study participants. Conference abstracts were excluded from the eligible literature only after full-text assessment or confirmation that the citations were preliminary or duplicate records, or were lacking sufficient abstraction detail [37]. Abstracts in languages other than English were passed through title/abstract (level 1 [L1]) screening but translated prior to full-text assessment (Table 1).

Table 1 Study inclusion–exclusion criteria

All titles/abstracts (L1) and eligible full-text studies (L2) were screened and assessed independently, in duplicate (KJM with JB, LP, LN, SS, SM, MK, CM, AG, LS, KA), and tracked in a Microsoft (MS) Excel [38] citation database and screening spreadsheets. We tested and revised screening tools at each stage of the review and employed a minimum calibration rule (Cohen’s κ ≥ 0.60) [39] to align team members’ shared understanding of concepts and the application of eligibility criteria [40,41,42,43]. To balance abstraction burden with data availability and complexity, full-text abstraction (study and tool characteristics) was undertaken using a hybrid strategy [22, 44]. Eligible papers were independently abstracted by KJM and independently validated (MK, SS, SM, KP) [45] using a predefined coding manual. We resolved all discrepancies by consensus discussion [21, 41]. Investigators were contacted only to locate missing tools or to help differentiate linked citations [43]. At least two attempts were made to locate corresponding authors and tools when contact details or tools were incorrect or missing [3, 5, 14]. The assessment and abstraction/scoring of psychometric, pragmatic tool evaluation and study quality characteristics were also undertaken independently and in duplicate, with discrepancies resolved the same way.
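The calibration rule can be illustrated with a minimal sketch of Cohen’s κ for two reviewers screening the same citations. The function name, the include/exclude labels and the ten-citation pilot batch are hypothetical and are not data from this review; only the 0.60 threshold comes from the text.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical pilot batch of 10 citations (illustrative only).
reviewer_1 = ["include", "exclude", "exclude", "include", "exclude",
              "exclude", "include", "exclude", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "exclude",
              "exclude", "include", "exclude", "include", "exclude"]

MIN_KAPPA = 0.60  # minimum calibration threshold stated in the text
kappa = cohens_kappa(reviewer_1, reviewer_2)
calibrated = kappa >= MIN_KAPPA
print(f"kappa = {kappa:.2f}, calibrated = {calibrated}")
```

In this made-up batch the reviewers agree on 8 of 10 citations, yet κ falls below the threshold because much of that agreement is expected by chance. Under a rule like the one described, such a pair would revisit the eligibility criteria before screening further.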

Study and tool characteristics

Data pertaining to study and tool characteristics were abstracted per the protocol [29]. We anticipated challenges associated with consistent use of terminology as are commonly reported in this research domain (e.g. outcomes/impacts, partnership approaches, tool type) [3, 8, 14, 15]. When this occurred, we used the terms most prominent in methodological descriptions. We coded health subdomains inductively based on key words and study purposes [46]. More than one code per study was used to describe the study subdomain, as required.

Empirical evidence of tool psychometrics

The empirical psychometric evidence for tools was evaluated for each identified tool. Informed by previous studies [6,7,8,9,10,11,12] and best-practice recommendations [17, 18, 36, 47, 48], we created an initial list of psychometric evidence types, and expanded this list iteratively when new sources were identified by included studies (Additional file 1: Appendix S3). Only studies reporting empirical psychometric evidence were retained in this review to (i) address the documented lack of research reporting psychometric evidence for health research partnership outcomes and impacts assessment tools, and (ii) advance our understanding about the presence and types of psychometric evidence available in existing literature beyond simple dichotomous labels (e.g. valid/not valid or reliable/not reliable). By synthesizing the presence of psychometric evidence across studies, we also aimed to highlight areas in which the nature and type of psychometric evidence could be improved and advance the science of partnership assessment. This approach necessarily focused on later testing and evaluation stages of tool development [49] but does not diminish the importance of conceptual and theoretical sources of evidence to establish tool reliability and validity as important precursor evidence sources. As previously reported, the identification and reporting of psychometric data was complex and varied substantially in level of detail. This was mitigated through iterative review, piloting and calibration; all abstraction discrepancies were independently, then collectively considered, then resolved to consensus through recurrent discussion.

Pragmatic tool evaluation criteria

We modified a set of consensus-built criteria developed by Boivin et al. [7, 50] as an alternative to applying the Psychometric and Pragmatic Evidence Rating Scale (PAPERS) criteria [17, 18] due to the quality of reported data. The main purpose of the criteria checklist was to appraise the tools from the perspective of those intended to use the tools [7]. Team members iteratively modified and piloted the revised items. A final set of 20 criteria (five questions in each of four domains: Scientific Rigour, Partner Perspective, Comprehensiveness and Usability) was generated. Piloting confirmed that these criteria were a better fit for the level and detail present in the literature under examination, and provided a comprehensive, easily interpretable (single score) evaluation of scientific, partner, comprehensiveness and usability/accessibility properties for each tool (Additional file 1: Appendix S4). It is important to note that the original criteria were intended for use as a checklist, not a quality assessment [7]; we used them this way in our review. The modified criteria were applied independently and in duplicate to all tools [51], with discrepancies resolved by consensus. Tools were coded as toolkits in studies where multiple tools were described and intended for collective use; in these cases, tool characteristics were scored cumulatively and reported as a single tool.
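A minimal sketch of how a 20-item checklist of this shape can be rolled up into domain scores and a single percentage follows. It assumes each criterion is scored 1 if met and 0 otherwise, which is consistent with domain scores out of 5 and a total expressed as a percentage but is not confirmed here as the exact scoring rule (see Appendix S4). The function name and the example tool are hypothetical.

```python
DOMAINS = ("Scientific Rigour", "Partner Perspective", "Comprehensiveness", "Usability")
CRITERIA_PER_DOMAIN = 5


def score_tool(criteria_met):
    """Return (domain scores out of 5, total score as a percentage of 20 criteria).

    criteria_met maps each domain name to five booleans (criterion met or not).
    Binary scoring is an assumption made for illustration.
    """
    domain_scores = {}
    for domain in DOMAINS:
        answers = criteria_met[domain]
        if len(answers) != CRITERIA_PER_DOMAIN:
            raise ValueError(f"{domain} needs {CRITERIA_PER_DOMAIN} answers")
        domain_scores[domain] = sum(bool(a) for a in answers)
    max_points = len(DOMAINS) * CRITERIA_PER_DOMAIN
    total_percent = 100 * sum(domain_scores.values()) / max_points
    return domain_scores, total_percent


# Hypothetical tool: strong overall, weaker on Partner Perspective.
example_tool = {
    "Scientific Rigour": [True, True, True, True, False],
    "Partner Perspective": [True, True, True, False, False],
    "Comprehensiveness": [True, True, True, True, True],
    "Usability": [True, True, True, True, True],
}
domain_scores, total_percent = score_tool(example_tool)
print(domain_scores, f"{total_percent:.0f}%")
```

Under this assumption, a tool meeting 17 of 20 criteria receives 85%, even when one domain scores only 3 of 5.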

Study quality assessment: the quality of survey studies in psychology (Q-SSP) checklist [52]

Study quality assessments typically assess the degree to which adequate measures were taken to minimize bias and avoid errors throughout the research process [53], and are hence design-focused. After piloting several quality appraisal tools with the eligible literature, we found that the best-fitting tool was an assessment of survey methods, namely the Q-SSP appraisal checklist and guide (Additional file 1: Appendix S5). The Q-SSP checklist was developed to address a wide variety of research and to help investigators differentiate broadly acceptable from lower-quality studies [52] using a four-stage process comprising evidence review, expert consensus, checklist refinement and criterion validity testing [52]. Q-SSP assessments were undertaken independently, in duplicate, and we resolved discrepancies by consensus.


Basic descriptive statistics including means, standard deviations and frequencies were calculated to synthesize quantitative study, tool, psychometric and pragmatic characteristics in MS Excel [38] and Stata v13.1 software [54]. The synthesized data were consolidated into tables. Scores for each of the pragmatic and tool evaluation criteria (mean/standard deviation) were synthesized and reported by criterion, domain and overall sample. We synthesized qualitative variables using thematic analysis [46] in NVivo v12.7 [55], in keeping with the overarching descriptive-analytical approach for the review [56], and used existing reporting guidelines to organize the findings [57,58,59]. Finally, study quality assessments (Q-SSP) [52] were documented by calculating an overall quality (%) and four domain-specific scores (ratios) for each study.


The search generated 36 027 de-duplicated records and 49 full-text studies (48 studies and one companion report), as depicted in the study flow diagram (Fig. 1).

Fig. 1 PRISMA systematic review study flow diagram

The team Cohen’s kappa was 0.66 [95% CI (0.64–0.67)] at L1 title/abstract screening and 0.74 [95% CI (0.72–0.76)] at L2 full-text review; these results were categorized as “substantial” [39, 42].

Study characteristics

Eligible studies comprised English-language and a single French-language report originating mostly in North America (39) and Europe (9), with a small remainder from South Africa (3), Australia (1) and Taiwan (1). Five dual-site studies involved the United Kingdom and South Africa (3), Canada and Australia (1), and Mexico and the United States (1) (Table 2).

Table 2 Characteristics of included studies (n = 48 studies, 1 companion report)

The eligible literature was widely dispersed, with exactly half of the publications (24, 50%) published in the same number of journals. Several small publication clusters were identified, including seven studies in Health Education & Behaviour (15%), three each in the American Journal of Community Psychology, Global Health Promotion and theses (each 6%), and two each in Health Promotion International, Public Health Nursing, Evaluation and Program Planning and Health Promotion Practice (each 4%). As shown in Fig. 2, about half of the identified literature was published after 2014 (20, 42%), and the earliest study was published in 1996.

Fig. 2 Year of publication for included studies (n = 48 studies)

Most studies involved cross-sectional (28, 58%) and mixed methods with embedded survey (14, 29%) designs, case/multi-case (3, 6%), post- and pre-post designs (2, 4%), and a single nested longitudinal study (1, 2%) (Table 2). Studies employed quantitative (31, 65%) or mixed methods (17, 35%), and of the mixed-methods studies (17), most were true mixed quantitative–qualitative methods (14, 82%), and the remainder were mixed qualitative (2, 12%) and mixed quantitative (1, 6%) methods.

The studies were conducted in multiple health subdomains (Fig. 3), including health promotion, prevention and public health (19), and disease-specific domains [i.e. cancer, mental health and substance use/harm reduction, and sexually transmitted/blood-borne infections and sexual health (12)]. The smaller subdomains included community health and development (7), special populations (e.g. primary care, paediatric/adolescent health, and immigrant and geriatric health) (6), partnerships (6), health equity (4) and health services research (3).

Fig. 3 Health subdomain clusters

Most studies reported explicit conceptual underpinnings (44, 91%). Methodologically, studies were multifocal, contributing to the health research partnership assessment literature through tool validation (44, 92%), development (25, 52%), modification (21, 44%) and evaluation (13, 27%), and measured outcomes (25, 52%), impacts (2, 4%) or both outcomes and impacts simultaneously (21, 44%). Explicit definitions for the terms outcome and impact were available in less than half of studies (20, 42%), and terms were frequently switched.

Tool characteristics

Included studies yielded 58 tools. The characteristics of the included tools are summarized in Table 3. With one exception, studies were exclusively English-language, and six contained non-English-language tools (English–Spanish, 3 [60,61,62]; English–French, 2 [63,64,65]; and Dutch, 1 [66]). Tools targeted multiple partner groups including partnership members (28, 43%), community members (11, 17%), researchers (10, 15%), patients and the public, and coalition staff (4, 6% each), and to a lesser extent targeted research staff (3, 5%), healthcare staff and partner organizations (2, 3% each), and education staff members (1, 2%). Surveys (21, 36%), questionnaires (17, 29%) and scales (12, 21%) were the most common tool types identified, although these categories were complicated by frequent interchanging of terms (survey, questionnaire, scale) and variable categorization across reports. We also identified several toolkits (3, 5%), indices and rubrics (2, 3% each), and a single checklist (2%).

Table 3 Tool characteristics (n = 58 tools)

Almost all tools assessed process (55, 95%). About half assessed outcomes alone (30, 52%), slightly fewer assessed both outcomes and impacts (26, 45%), and very few focused on impact assessment alone (2, 3%); however, we observed inconsistencies in the use and definition of these terms. We identified multiple forms of empirical evidence for validity (50, 86%) and reliability (55, 95%) in the tools. The presence of conceptual underpinnings (52, 90%) was similar to study-level conceptualization.

Pragmatic tool evaluation scores

Tables 4 and 5 present a synthesis of pragmatic tool evaluation criteria [7] (Additional file 1: Appendix S4). Mean domain scores were highest for Comprehensiveness (3.79, SD 0.75) and Scientific Rigour (3.58, SD 0.87), followed by Usability (3.19, SD 1.38). The lowest mean domain score was for Partner Perspective (2.84, SD 1.04), which was a surprising finding given the review focus on health research partnership assessment.

Table 4 Pragmatic tool evaluation consolidated scores (n = 58 tools)
Table 5 Health research partnership tool evaluation—study scores (n = 48 with 1 companion report; n = 58 tools)

Tool comprehensiveness was high in terms of documenting outcomes and/or impacts (100%), partnership process (95%) and context (97%); however, tools lacked deliberate design for recurrent monitoring of partnerships (33%).

In terms of Scientific Rigour, tools were not typically informed by systematic evidence (17%) but were conceptually grounded (90%) and presented evidence for both validity and reliability (90% and 93%, respectively, inclusive of both empirical and theoretical/conceptual sources). Only half of the tools were explicitly based on the experiences and expertise of partners (55%).

Overall, tool Usability was mixed. Tool purpose was always present (100%), but only half of the tools were freely accessible (50%), considered easy to read and understand (53%), accompanied by instructions (57%) and available in a readily usable format (62%).

Tools were generally designed to be self-administered (97%), but not for reporting back to partners (28%). The level of partner involvement was not commonly included (28%), and partners were deliberately involved as co-designers in only 59% of studies, despite frequent capture of partner influence (76%).

The overall tool evaluation mean score was 66.64% (SD 15.54), with scores ranging from 35% to 90% (Fig. 4).

Fig. 4 Pragmatic tool assessment: criteria total scores (n = 58 tool scores)

The domains and total score analysis highlighted strengths for several tools. Twelve tools scored high (4 or 5) across all four domains (≥ 85%) [61,62,63,64, 67,68,69,70,71,72], and an additional two tools [73, 74] had lower Partner Perspective domain scores (3) but still achieved a high total score (85%) across the remaining three domains. Several tools demonstrated top scores for Comprehensiveness [69, 73, 75,76,77,78,79] while others scored higher in Scientific Rigour [61, 66, 70,71,72, 74, 80] and Usability [61,62,63,64, 67, 68, 71,72,73,74, 77, 81,82,83]. Few achieved top scores in the Partner Perspective domain [62, 70] (Tables 4 and 5).

Psychometric assessment

Psychometric testing and reporting were widely variable and challenging to assess, primarily due to inconsistent or incomplete testing, reporting and level of reporting detail. Almost three quarters of studies presented two or more forms of psychometric evidence for validity (35, 73%); eight studies (17%) presented two forms of evidence for reliability. Iterative assessment and abstraction of psychometric evidence revealed reliability evidence in four categories (internal consistency, test–retest reliability, inter-rater reliability and other). The most frequently occurring form of reliability evidence was internal consistency (83%). Validity evidence was found in 11 categories [construct validity (convergent, factorial, discriminant, known groups, other), criterion validity (predictive, concurrent), structural validity (dimensionality), responsiveness, face validity, and content validity] (Table 6). The most frequent forms of validity evidence were convergent construct validity (43, 27%) and predictive criterion validity (31, 20%). We observed norms and abstracted two forms of evidence for interpretability (ceiling/floor effects and interpretability); however, both evidence forms were rare.

Table 6 Consolidated tool psychometric evidence (n = 58 tools)

We identified 18 studies with more advanced and comprehensive assessment and reporting of psychometric evidence for validity and reliability [60, 61, 65, 68, 69, 71, 72, 74, 78,79,80, 82,83,84,85,86,87,88]; several of these studies overlapped with high-scoring tools identified using pragmatic tool evaluation criteria [61, 68, 69, 71, 72, 74].

Study quality assessment (Q-SSP)

The Q-SSP assessment revealed an overall mean study quality score of 58.02% (SD 12.32%), with scores ranging from 25 to 80%. Most studies (42, 88%) scored < 75%, and thus were categorized as having “questionable” quality by convention; very few studies (6, 12%) scored ≥ 75% or within the “acceptable” range [61, 65, 71, 81, 88, 89] (Table 7).

Table 7 Q-SSP assessments by item, domain and total score for included studies (n = 48 studies, 1 companion report)

Across studies, the Introduction domain mean score was 3.04/4.00 points (SD 0.82), the Participant domain mean score was 1.77/3.00 points (SD 0.78), the Data domain mean score was 5.27/10.00 points (SD 1.62), and the Ethics domain mean score was 1.52/3.00 points (SD 0.71).
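As a rough consistency check, the four reported domain means can be combined into an overall percentage. This is a sketch that assumes the overall Q-SSP score is the sum of domain points divided by the 20 available points, using the domain maxima given above. The function names are hypothetical; the 75% cut-off is the one used in this review.

```python
# Maximum points per Q-SSP domain, as reported in the text (20 points in total).
QSSP_DOMAIN_MAX = {"Introduction": 4, "Participant": 3, "Data": 10, "Ethics": 3}
ACCEPTABLE_THRESHOLD = 75.0  # percent; studies below this are "questionable"


def qssp_overall_percent(domain_scores):
    """Overall quality as a percentage of all available points (assumed rule)."""
    total = sum(domain_scores[d] for d in QSSP_DOMAIN_MAX)
    return 100 * total / sum(QSSP_DOMAIN_MAX.values())


def qssp_category(percent):
    return "acceptable" if percent >= ACCEPTABLE_THRESHOLD else "questionable"


# Domain means reported across the included studies.
reported_means = {"Introduction": 3.04, "Participant": 1.77, "Data": 5.27, "Ethics": 1.52}
overall = qssp_overall_percent(reported_means)
print(f"{overall:.1f}% -> {qssp_category(overall)}")
```

The domain means sum to 11.60 of 20 points, or about 58.0%, which agrees with the reported overall mean of 58.02% up to rounding of the domain means.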

The problem and target population were generally well described and participant sampling and recruitment details present, but operational definitions (32, 67%), research questions and hypotheses (24, 50%) and sample size justification were often lacking (35, 75%). There were strong links between the proposed and presented analyses (46, 96%), but the study measures themselves were frequently missing from reports or supplements (17, 35%). The provision of validity evidence for included measures was found lacking in almost a third of studies (14, 29%), and most studies lacked detail about those collecting data (42, 88%), the duration of data collection (29, 60%) and the study context (25, 52%). Explicit reference to informed consent/assent and the inclusion of participants in post-data-collection debriefing was largely absent or unclear across included studies (29, 60% and 37, 77%, respectively).

Overall, four of the six studies with “acceptable” quality overlapped with studies reporting more comprehensive psychometrics [61, 65, 71, 88], but only two overlapped with those reporting higher pragmatic tool criteria scores [61, 71].

Evidence summary: tool validity, reliability, pragmatics and study quality

This review identified 58 tools underpinned by empirical psychometric evidence in the assessment of health research partnership outcomes and impacts. When considered with pragmatic tool evaluation criteria and study quality score findings, four noteworthy groups of studies and accompanying tools emerged (22, 46%). First, only two studies (2, 4%) reported more comprehensive psychometrics and had both high pragmatic tool criteria and Q-SSP study quality scores [61, 71]. A second group of studies (7, 15%) reported more comprehensive psychometrics and either high pragmatic tool criteria scores [68, 69, 72, 74, 80] or high study quality scores [65, 88]. The third group (8, 17%) had more comprehensive psychometrics [60, 78, 79, 82,83,84,85, 87], and the last set of studies (5 plus companion report, 10%) scored high on pragmatic tool evaluation criteria [62,63,64, 67, 70, 73].
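The group counts and percentages above can be re-derived with simple arithmetic, taking the 48 included studies as the denominator. The group labels in this sketch are shorthand introduced for illustration, not terms from the review.

```python
N_STUDIES = 48  # included studies (the companion report is not counted)

# Study counts per noteworthy group, as reported in the text.
groups = {
    "psychometrics + pragmatic + quality": 2,
    "psychometrics + pragmatic or quality": 7,
    "psychometrics only": 8,
    "pragmatic only": 5,
}

group_percent = {name: round(100 * n / N_STUDIES) for name, n in groups.items()}
total_noteworthy = sum(groups.values())
total_percent = round(100 * total_noteworthy / N_STUDIES)

for name, n in groups.items():
    print(f"{name}: {n} ({group_percent[name]}%)")
print(f"all noteworthy studies: {total_noteworthy} ({total_percent}%)")
```

The four groups sum to 22 studies, or 46% of the 48 included studies, matching the totals reported in the text.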


This systematic review identified 58 tools for assessing health research partnership outcomes and impacts with tool psychometric evidence and pragmatic characteristics. We were able to identify a group of noteworthy tools, distinguished by their psychometric evidence, tool pragmatic characteristics and study quality scores.

Key study-level comparative findings

Overall, the presence and reporting of empirical psychometric evidence and pragmatic characteristics appeared improved in our study compared with previous reviews, yet several challenges related to the nascency of this research field remain (e.g. lack of key term definitions and measurement clarity, term switching, a lack of studies with deliberate focus on tool development, testing, evaluation and improvement, variable and inconsistent reporting). Future research to advance partnership measurement and science should consider both psychometric improvements (with specific emphasis on increased consistency, level of tested and reported detail, and dedicated study) and pragmatic considerations (specifically on accessible tools that are better informed by partner experiences and expertise, designed for partnership monitoring, and quantifiably readable). In examining tools with empirical psychometric evidence, this study contributes to our understanding of existing partnership tool measurement strengths and gaps. Our review provides practical ways to advance partnership measurement and, ultimately, partnership science.

At the study level, our findings aligned with previous reviews in that most included studies were North American- and English-centric, with a wide publication dispersion pattern and mid-2010 emergence [2, 7, 8, 11]. We also experienced previously reported challenges in the location of tools and author responsiveness [5, 7]. Our study differed from others documenting a predominance of qualitative methods and relative rarity of quantitative tools, designs and methods [9, 12, 70, 90,91,92]. By contrast, our review deliberately sought and identified tools with empirical psychometric and pragmatic characteristics encompassing diverse health research approaches. This review identified studies employing cross-sectional and mixed-method/embedded survey designs and quantitative and mixed methods; this catchment is likely a function of our study inclusion criteria but may also reflect an increasing overall trend towards the quantification of partnership assessment [1, 7, 11,12,13, 92, 93].

Key tool-level comparative findings

On a tool level, we found similarities and differences between our study and previous, related reviews, but these studies differed in scope (e.g. literature, search period, research domains other than health, focus of measurement) and definitions of partnership, generating very different samples and eligible primary literature [2].

Our findings demonstrate the need for research deliberately focused on tool development, testing and evaluation. Like other related health research partnership reviews [7, 8, 10, 94], we found that while tool purpose was universally reported, investigators focused almost exclusively on assessing and understanding the characteristics of bespoke partnerships. This was a consistent finding, despite the diverse scope and focus of these reviews (i.e. patient/public evaluation tools, community coalitions, coproduction impacts, and research collaboration quality and outcomes, respectively). Very few primary studies in our review focused specifically on tool validation or psychometric testing, although most involved one or more such activities. Furthermore, most studies were multifocal, that is, encompassing one or more tool development, modification, use, evaluation or validation activities simultaneously. These findings support previous reports regarding the paucity of focused health research partnership tool evaluation research [10, 94]. Our findings strengthen existing recommendations targeting the systematic assessment of psychometric and pragmatic tool properties [8], and more deliberate funding of research on tool design, testing, improvement and evolvement in general [49]. These aspects are considered key to advancing partnership science measurement and partnership science as a field [8, 9, 70, 95].

Conceptually, our study revealed a much higher presence of theoretical underpinnings at both the study and tool levels (91% at each level) than the levels reported in other partnership tool reviews of patient/public and community coalition evaluation tools [7, 94]. However, the implications of this finding remain unclear. Some authors have observed that theoretical/conceptual connections to both partnership and measurement theory rarely translate into operationalized tool elements [8, 17]; this is an important area of future inquiry.

The tools we reviewed measured outcomes at a rate similar to that in a recent review of patient/public partnership evaluation tools (52% vs 56%) [7]; however, in our study, we found that explicit definitions for outcome and impact terms were present only intermittently and that the terms were often used interchangeably. Terminology challenges have been reported in other systematic studies in the health research partnerships domain, noting the significant variance, overlap and omission of key term definitions from reports (i.e. terms for outcomes/impacts, partnership approaches and tool types) [9, 14, 15, 96]. While comparative research and crosstalk among research partnership traditions is a relatively recent phenomenon [4, 6, 96, 97, 98, 99], clarity on key concepts, terminology, definitions, core measures and tools is fundamental to advancing partnership measurement and scientific inquiry [8, 9, 49, 70].

Comparative findings: tool pragmatic characteristics, validity and reliability

Pragmatic tool evaluation scores were generally higher in our review than in Boivin and colleagues’ review of patient partnership evaluation tools [7]. In our study, the highest mean domain scores were Comprehensiveness and Scientific Rigour, whereas Scientific Rigour was the lowest domain score in the Boivin review [7]. Importantly, we found that only a single tool overlapped between the reviews. This lack of overlap can be accounted for by differences in review scope, targets and inclusion criteria (i.e. the Boivin review focused on patient and public involvement evaluation tools and included tools for assessing engagement in both health system decision-making and health research, with narrower search terms over a shorter time span; and our review deliberately selected studies reporting empirical tool validity and reliability evidence).

Tool validity (86%) and reliability (95%) evidence in our study was markedly higher than in prior work [7, 8], in which evidence for validity was found in only 48% and 7% of studies, respectively, and evidence for reliability was found in 45% and 35% of studies, respectively. As noted previously, there was little to no overlap in captured tools between these reviews and ours (n = 1 [7] and n = 13 [8], respectively), which can be similarly accounted for by differences in scope that generated different primary and secondary literature sets. The MacGregor overview of reviews [8] focused solely on reviews of tools to assess the impacts of research coproduction, differing by time span, key partnership terminology and key domains. As a result, only four of the eight identified reviews were considered in-scope; thus, the number of overlapping tools was limited (n = 13).

Future research

Boateng et al. [49] describe the requisite steps, activities and key precursors and concurrent factors required for robust tool development, testing and evaluation in the future. Specific attention to such steps and components could enable more deliberate tool evolvement in the health research partnership assessment domain. Specifically, the authors call for graduate-level training in the development and evaluation of tools, to create expertise in graduate students and research teams. Furthermore, the authors caution that this research can be “onerous, jargon-filled, unfamiliar, and resource intensive” (p. 1) [49]. Specific accommodations to offset resource and time intensity and higher participant burden due to larger sample sizes may be required. Health research partnership assessments must meet the needs of both researchers and end-users by balancing rigour and resource intensity in a way that remains fit for purpose. Both deliberate funding and the use of hybrid study designs will help future research provide the required focus and generate robust evidence to address persistent psychometric and pragmatic gaps.

Study limitations

We noted several key limitations of this review. First, we observed challenges with respect to the evidence for and the testing of tool psychometric properties. Like Sandoval et al. [5], we experienced challenges related to the reporting of psychometrics on multiple levels (e.g. scale, index, subscale, item and tool), as well as mismatched use of psychometric evidence (e.g. justification or application of previous scale, subscale or item-specific psychometrics to other levels of testing). To mitigate the risk of misinterpretation, we approached psychometric evidence in eligible studies with these issues in mind, and relied on strict methodological processes (independent, duplicate abstraction and review and resolution of all discrepancies through consensus discussions) to ensure accurate interpretation and representation of abstracted data.

As mentioned previously, the variable use of terminology may have compromised our ability to clearly describe and assess health research partnership tools. Further efforts to consolidate terms and definitions across health research partnership traditions will help resolve these issues in future work.

This study was limited in several ways by the accessibility and reporting concerns documented in previous reviews [3, 5, 7, 14, 15]. Most included studies were multimodal and did not often explicitly refer to tool development, testing or evaluation in their purpose statements. To mitigate the risk of missing potentially relevant studies in our review, we deliberately kept our inclusion criteria broad at the title and abstract (L1) screening phase. However, this strategy also produced a large set of L2 full-text assessments, negatively impacting study feasibility. Consensus and consolidation of evidence in this research domain, as well as more focused, explicit reporting of health research partnership assessment, tools and psychometric and pragmatic characteristics, will facilitate more efficient literature location, retrieval and assessment in the future.

Finally, we noted a potential gap in the scope of a question modified as part of the pragmatic tool evaluation criteria: Was the tool informed by literature generated from a systematic literature search? In retrospect, we surmise that this question was too narrow to capture evidence derived from historical hypothesis testing generated by theoretically driven research (i.e. dimensionality tests) [49]. In addition to synthesis-level evidence for relevant components, tools or tool components that are informed by iterative tests of components derived from conceptual framework testing could play an equal or more important role in identifying and refining key tool constructs. Theoretically grounded components may also progressively improve the psychometric quality of health research partnership outcome and impact assessment tools. We recommend amending this question for use in future tool evaluation studies to better capture the full scope of relevant evidence underlying assessment tools.


This large-volume systematic review successfully identified empirically evidenced tools for the assessment of health research partnership outcomes and impacts. Our findings signal some promising improvements in the presence of conceptual, methodological and psychometric characteristics in measurement tools, and the availability of pragmatic tool characteristics. Persistent challenges linked to the nascency of the research partnership field and its measurement remain. Practically, the comprehensive tool characteristics presented here can help researchers and partners choose assessment tools that best fit their purposes and needs. Finally, our findings further strengthen calls for more deliberate and comprehensive tool development, testing, evaluation and reporting of psychometric and pragmatic characteristics to advance research partnership assessment and research partnership science domains.

Advancing knowledge of health research partnership outcomes and impacts assessment and partnership science are mandated aims of the IKTRN [100]. The IKTRN is a research network based at the Centre for Practice-Changing Research at the Ottawa Hospital and supported by the Canadian Institutes of Health Research. The IKTRN comprises researchers from more than 30 universities and research centres and research users from over 20 organizations, with a broad research agenda focused on best practices and their routine application to ensure effective, efficient and appropriate healthcare [101, 102].

Availability of data and materials

The study search strategy, abstraction tools and bibliographic tool index will be available through the Open Science Framework upon completion of the research and publication of findings. Data generated and/or analysed during the current study will be made available upon reasonable request, after completion of the dissertation research and publication of findings, from the first author.


  1. Goodman MS, Sanders Thompson VL. The science of stakeholder engagement in research: classification, implementation and evaluation. Transl Behav Med. 2017;7(3):486–91.

  2. Mrklas KJ, Boyd JM, Shergill S, Merali SM, Khan M, Moser C, Nowell L, Goertzen A, Swain L, Pfadenhauer LM, Sibley KM, Vis-Dunbar M, Hill MD, Raffin-Bouchal S, Tonelli M, Graham ID. A scoping review of the globally available tools for assessing health research partnership outcomes and impacts. Health Res Policy Syst. 2022. (under review).

  3. Hoekstra F, Mrklas KJ, Sibley K, Nguyen T, Vis-Dunbar M, Neilson CJ, Crockett LK, Gainforth HL, Graham ID. A review protocol on research partnerships: a coordinated multicenter team approach. Syst Rev. 2018;7(217):1–14.

  4. Drahota A, Meza RD, Brikho B, Naaf M, Estabillo JA, Gomez ED, Vejnoska SF, Dufek S, Stahmer AC, Aarons GA. Community-academic partnerships: a systematic review of the state of the literature and recommendations for future research. Milbank Q. 2016;94(1):163–214.

  5. Sandoval JA, Lucero J, Oetzel J, Avila M, Belone L, Mau M, Pearson C, Tafoya G, Duran B, Iglesias Rios L, Wallerstein N. Process and outcome constructs for evaluating community-based participatory research projects: a matrix of existing measures. Health Educ Res. 2012;27(4):680–90.

  6. Hamzeh J, Pluye P, Bush PL, Ruchon C, Vedel I, Hudon C. Towards assessment for organizational participatory research health partnerships: a systematic mixed studies review with framework synthesis. Eval Program Plan. 2018;73:116–28.

  7. Boivin A, L’Esperance A, Gauvin FP, Dumez V, Macaulay AC, Lehoux P, Abelson J. Patient and public engagement in research and health system decision making: a systematic review of evaluation tools. Health Expect. 2018;21(6):1075–84.

  8. MacGregor S. An overview of quantitative instruments and measures for impact in co-production. J Prof Cap Community. 2020;6(2):163–83.

  9. Luger TM, Hamilton AB, True G. Measuring community-engaged research contexts, processes and outcomes: a mapping review. Milbank Q. 2020;98(2):493–553.

  10. Tigges BB, Miller D, Dudding KM, Balls-Berry JE, et al. Measuring quality and outcomes of research collaborations: an integrative review. J Clin Transl Sci. 2019;3:261–89.

  11. Brush BL, Mentz G, Jensen M, Jacobs B, Saylor KM, Rowe Z, Israel BA, Lachance L. Success in longstanding community based participatory research (CBPR) partnerships: a scoping literature review. Health Educ Behav. 2019;47(4):556–68.

  12. Bowen DJ, Hyams T, Goodman M, West KM, Harris-Wai J, Yu JH. Systematic review of quantitative measures of stakeholder engagement. Clin Transl Sci. 2017;10:314–36.

  13. Vat LE, Finlay T, Schuitmaker-Warnaar TJ, et al. Evaluating the ‘return on patient engagement initiatives’ in medicines research and development: a literature review. Health Expect. 2020;23:5–18.

  14. Hoekstra F, Mrklas KJ, Khan M, McKay RC, Vis-Dunbar M, Sibley K, Nguyen T, Graham ID, SCI Guiding Principles Consensus Panel, Gainforth HL. A review of reviews on principles, strategies, outcomes and impacts of research partnerships approaches: a first step in synthesising the research partnership literature. Health Res Policy Syst. 2020;18(51):1–23.

  15. Hoekstra F, Trigo F, Sibley K, Graham ID, Kennefick M, Mrklas KJ, Nguyen T, Vis-Dunbar M, Gainforth HL. Systematic overviews of partnership principles and strategies identified from health research about spinal cord injury and related health conditions: a scoping review. J Spinal Cord Med. 2021.

  16. Glasgow RE, Riley WT. Pragmatic measures: what they are and why we need them. Am J Prev Med. 2013;45(2):237–43.

  17. Stanick CF, Halko HM, Nolen EA, Powell BJ, Dorsey CN, Mettert KD, Weiner BJ, Barwick M, Wolfenden L, Damschroder LJ, Lewis CC. Pragmatic measures for implementation research: development of the psychometric and pragmatic evidence rating scale (PAPERS). Transl Behav Med. 2021;11(1):11–20.

  18. Lewis CC, Mettert KD, Stanick CF, Halko HM, Nolen EA, Powell BJ, Weiner BJ. The psychometric and pragmatic evidence rating scale (PAPERS) for measure development and evaluation. Implement Res Pract. 2021.

  19. IKTRN (Integrated Knowledge Translation Research Network). Resources: our publications. 2021. Accessed 23 Nov 2021.

  20. Mrklas KJ. Towards the development of a valid, reliable and acceptable tool for assessing the impact of health research partnerships (PhD dissertation thesis proposal). Calgary: University of Calgary; 2018. p. 119pp.

  21. Arksey H, O’Malley L. Scoping studies: towards a methodological framework. Int J Soc Res Methodol Theory Pract. 2005;8(1):19–32.

  22. Levac D, Colquhoun H, O’Brien KK. Scoping studies: advancing the methodology. Implement Sci. 2010;5(69):1–9.

  23. Daudt HM, van Mossel C, Scott SJ. Enhancing the scoping study methodology: a large, inter-professional team’s experience with Arksey and O’Malley’s framework. BMC Med Res Methodol. 2013;13(48):1–9.

  24. Colquhoun HI, Levac D, O’Brien KK, Straus S, Tricco AC, Perrier L, Kastner M, Moher D. Scoping reviews: time for clarity in definition, methods and reporting. J Clin Epidemiol. 2014;67(12):1291–4.

  25. Centre for Reviews and Dissemination (CRD), University of York. Systematic reviews: CRD’s guidance for undertaking reviews in health care. Layerthorpe, York: CRD, University of York; 2009.

  26. Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA. Cochrane handbook for systematic reviews of interventions, version 6.2. Cochrane. 2021.

  27. Joanna Briggs Institute. The Joanna Briggs Institute Reviewers’ Manual 2015. Adelaide: Joanna Briggs Institute; 2015. p. 24.

  28. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.

  29. Mrklas KJ, et al. Open science framework file: towards the development of a valid, reliable and acceptable tool for assessing the impact of health research partnerships (protocols). 2021. Accessed 23 Nov 2021.

  30. Bidwell S, Jensen MF. Etext on health technology assessment (HTA) information resources. Chapter 3: Using a search protocol to identify sources of information: the COSI model. 2000. Accessed 2 July 2019.

  31. Mrklas KJ, Merali S, Khan M, Shergill S, Boyd JM, Nowell L, Pfadenhauer LM, Paul K, Goertzen A, Swain L, Sibley KM, Vis-Dunbar M, Hill MD, Raffin-Bouchal S, Tonelli M, Graham ID. How are health research partnerships assessed? A systematic review of outcomes, impacts, terminology and the use of theories, models and frameworks. Health Res Policy Syst. 2022 (accepted).

  32. Sampson M. Should we change how we do our searches? Objectively derived search strategies or ‘exhaustive search method’ as performed by Bramer. Ottawa: Children’s Hospital of Eastern Ontario (CHEO) Research Institute, University of Ottawa; 2016. p. 1–34.

  33. Sampson M, McGowan J, Cogo E, Grimshaw J, Moher D, Lefebvre C. An evidence-based practice guideline for the peer review of electronic search strategies. J Clin Epidemiol. 2009;62:944–52.

  34. McGowan J, Sampson M, Salzwedel D, Cogo E, Foerster V, Lefebvre C. Guideline statement: PRESS peer review of electronic search strategies 2015 guideline statement. J Clin Epidemiol. 2016;75:40–6.

  35. Bramer WM, Giustini D, de Jonge GB, Holland L, Bekhuis T. De-duplication of database search results for systematic reviews in EndNote. J Med Libr Assoc. 2016;104(3):240–3.

  36. Terwee CB, de Vet HCW, Prinsen CAC, Mokkink LB. Protocol for systematic reviews of measurement properties. 2011. Accessed 24 Feb 2022.

  37. Scherer RW, Saldanha IJ. How should systematic reviewers handle conference abstracts? A view from the trenches. Syst Rev. 2019;8(264):1–6.

  38. Microsoft Corporation. Microsoft Excel for Mac 2021 (version 21101001). Microsoft Corporation; 2021.

  39. Altman DG. Practical statistics for medical research: measuring agreement. London: Chapman and Hall; 1991.

  40. Armstrong R, Hall BJ, Doyle J, Waters E. ‘Scoping the scope’ of a Cochrane review. J Public Health. 2011;33(1):147–50.

  41. Valaitis R, Martin-Misener R, Wong ST, et al. Methods, strategies and technologies used to conduct a scoping literature review of collaboration between primary care and public health. Prim Health Care Res Dev. 2012;13(3):219–36.

  42. McHugh ML. Interrater reliability: the kappa statistic. Biochemia Medica. 2012;22(3):276–82.

  43. Polanin JR, Pigott TD, Espelage DL, Grotpeter JK. Best practice guidelines for abstract screening large-evidence systematic reviews and meta-analyses. Res Synthesis Methods. 2019;10(3):330–42.

  44. O’Blenis P. Data extraction: weighing your options. In: Evidence partners. Evidence Partners Inc.; 2016.

  45. Tricco AC, Lillie E, Zarin W, O’Brien K, Colquhoun H, Kastner M, Levac D, Ng C, Pearson Sharpe J, Wilson K, Kenny M, Warren R, Wilson C, Stelfox HT, Straus SE. A scoping review on the conduct and reporting of scoping reviews. BMC Med Res Methodol. 2016;16(15):1–10.

  46. Braun V, Clarke V. Using thematic analysis in psychology. Qual Res Psychol. 2006;3(2):77–101.

  47. Terwee CB, Bot S, de Boer MR, van der Windt D, Knol DL, Dekker J, Bouter LM, de Vet HCW. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60:34–42.

  48. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, Bouter LM, de Vet HCW. COSMIN checklist manual. Amsterdam: University Medical Center; 2012.

  49. Boateng GO, Neilands TB, Frongillo EA, Melgar-Quinonez HR, Young SL. Best practices for developing and validating scales for health, social, and behavioural research. Front Public Health. 2018;6:149.

  50. Centre of Excellence on Partnership with Patients and the Public (CEPPP). Patient and public engagement evaluation toolkit. 2021. Accessed 15 Dec 2022.

  51. Research and Evaluation Unit, W.R.H.A. Random sample calculator. 2014. Accessed 03 Nov 2021.

  52. Protogerou C, Hagger MS. A checklist to assess the quality of survey studies in psychology. Methods Psychol. 2020;3: 100031.

  53. Khan K, Kunz R, Kleijnen J, Antes G. Systematic reviews to support evidence-based medicine. 2nd ed. London: Hodder Arnold; 2011.

  54. StataCorp LP. Stata 13.1 statistics/data analysis special edition. College Station: StataCorp LP; 2013.

  55. QSR International. NVivo 12 for Mac. New York: QSR International; 2019.

  56. Pawson R. Evidence-based policy: in search of a method. Evaluation. 2002;8(2):157–81.

  57. Tong A, Sainsbury P, Craig J. Consolidated criteria for reporting qualitative research (COREQ): a 32-item checklist for interviews and focus groups. Int J Qual Health Care. 2007;19(6):349–57.

  58. Tong A, Flemming K, McInnes E, Oliver S, Craig J. Enhancing transparency in reporting the synthesis of qualitative research: ENTREQ. BMC Med Res Methodol. 2012;12(181):1–8.

  59. O’Brien BC, Harris IB, Beckman TJ, Reed DA, Cook DA. Standards for reporting qualitative research: a synthesis of recommendations. Acad Med. 2014;89(9):1245–51.

  60. Brown LD, Chilenski SM, Ramos R, Gallegos N, Feinberg ME. Community prevention coalition context and capacity assessment: comparing the United States and Mexico. Health Educ Behav. 2016;43(2):145–55.

  61. Duran B, Oetzel J, Magarati M, et al. Toward health equity: a national study of promising practices in community-based participatory research. Progr Community Health Partnersh Res Educ Action. 2019;13(4):337–52.

  62. Dickson E, Magarati M, Boursaw B, Oetzel J, Devia C, Ortiz K, Wallerstein N. Characteristics and practices within research partnerships for health and social equity. Nurs Res. 2020;69(1):51–61.

  63. Bilodeau A, et al. L’Outil diagnostique de l’action en partenariat: fondements, élaboration et validation. Can J Public Health. 2011;102(4):298–302.

  64. Bilodeau A, Kranias G. Self-evaluation tool for action in partnership: translation and cultural adaptation of the original Quebec French tool to Canadian English. Can J Program Eval. 2019;34(2):192–206.

  65. Loban E, Scott C, Lewis V, Haggerty J. Measuring partnership synergy and functioning: multi-stakeholder collaboration in primary health care. PLoS ONE. 2021;16: e0252299.

  66. Wagemakers MA, Koelen MA, Lezwijn J, Vaandrager L. Coordinated action checklist: a tool for partnerships to facilitate and evaluate community health promotion. Glob Health Promot. 2010;17(3):17–28.

  67. Oetzel JG, Villegas M, Zenone H, White Hat ER, Wallerstein N, Duran B. Enhancing stewardship of community-engaged research through governance. Am J Public Health. 2015;105:1161–7.

  68. Oetzel JG, Zhou C, Duran B, et al. Establishing the psychometric properties of constructs in a community-based participatory research conceptual model. Am J Health Promot. 2015;29(5):e188-202.

  69. Stocks SJ, Giles SJ, Cheraghi-Sohi S, Campbell S. Application of a tool for the evaluation of public and patient involvement in research. BMJ Open. 2015;5: e006390.

  70. Goodman MS, Sanders Thompson VL, Arroyo Johnson C, Gennarelli R, Drake BF, Bajwa P, Witherspoon M, Bowen D. Evaluating community engagement in research: quantitative measure development. J Community Psychol. 2017;45(1):17–32.

  71. Oetzel JG, Wallerstein N, Duran B, Sanchez-Youngman T, Woo K, Wang J, et al. Impact of participatory health research: a test of the community-based participatory research conceptual model. Biomed Res Int. 2018;1:7281405.

  72. Rodriguez Espinosa P, Sussman A, Pearson CR, Oetzel J, Wallerstein N. Personal outcomes in community-based participatory research partnerships: a cross-site mixed methods study. Am J Community Psychol. 2020;66:439–49.

  73. Lucero JE, Boursaw B, Eder M, Greene-Moton E, Wallerstein N, Oetzel JG. Engage for equity: the role of trust and synergy in community-based participatory research. Health Educ Behav. 2020;47(3):372–9.

  74. Boursaw B, Oetzel JG, Dickson E, et al. Scales of practices and outcomes for community-engaged research. Am J Community Psychol. 2021;67(3–4):1–15.

  75. Feinberg ME, Bontempo DE, Greenberg MT. Predictors and level of sustainability of community prevention coalitions. Am J Prev Med. 2008;34(6):495–501.

  76. Feinberg ME, Gomez G, Puddy RW, Greenberg MT. Evaluation and community prevention coalitions: validation of an integrated web-based/technical assistance consultant model. Health Educ Behav. 2008;35(1):9–21.

  77. King G, Servais M, Forchuk C, Chalmers H, Currie M, Law M, Specht J, Rosenbaum P, Willoughby T, Kertoy M. Features and impacts of five multidisciplinary community-university research partnerships. Health Soc Care Community. 2010;18(1):59–69.

  78. Brown LD, Feinberg ME, Greenberg MT. Measuring coalition functioning: refining constructs through factor analysis. Health Educ Behav. 2012;39(4):486–97.

  79. Brown LD, Feinberg ME, Shapiro VB, Greenberg MT. Reciprocal relations between coalition functioning and provision of implementation support. Prev Sci. 2015;16(1):101–9.

  80. Weiss ES, Anderson RM, Lasker RD. Making the most of collaboration: exploring the relationship between partnership synergy and partnership functioning. Health Educ Behav. 2002;29(6):683–98.

  81. Orr Brawer CR. Replication of the value template process in a community coalition: implications for social capital and sustainability. Philadelphia: Temple University; 2008.

  82. King G, Servais M, Kertoy M, Specht J, Currie M, Rosenbaum P, Law M, Forchuk C, Chalmers H, Willoughby T. A measure of community members’ perceptions of the impacts of research partnerships in health and social services. Eval Program Plan. 2009;32:289–99.

  83. Hamilton CB, Hoens AM, McKinnon AM, McQuitty S, English K, Hawke LD, Li LC. Shortening and validation of the patient engagement in research scale (PEIRS) for measuring meaningful patient and family caregiver engagement. Health Expect. 2021;24:863–79.

  84. El Ansari W, Phillips CJ. The costs and benefits to participants in community partnerships: a paradox? Health Promot Pract. 2004;5(1):35–48.

  85. Cramer ME, Atwood JR, Stoner JA. Measuring community coalition effectiveness using the ICE instrument. Public Health Nurs. 2006;23(1):74–87.

  86. Jones J, Barry MM. Developing a scale to measure synergy in health promotion partnerships. Glob Health Promot. 2011;18(2):36–44.

  87. Jones J, Barry MM. Developing a scale to measure trust in health promotion partnerships. Health Promot Int. 2011;26(4):484–91.

  88. West KM. Researcher trustworthiness in community-academic research partnerships: implications for genomic research. In: Public health genetics. Seattle: University of Washington; 2018.

  89. Perkins DF, Feinberg ME, Greenberg MT, Johnson LE, Chilenski SM, Mincemoyer CC, Spoth RL. Team factors that predict to sustainability indicators for community-based prevention teams. Eval Program Plan. 2011;34:283–91.

  90. Staniszewska S, Herron-Marx S, Mockford C. Measuring the impact of patient and public involvement: the need for an evidence base. Int J Qual Health Care. 2008;20(6):373–4.

  91. Jagosh J, Macaulay AC, Pluye P, Salsberg J, Bush PL, Henderson J, Greenhalgh T. Uncovering the benefits of participatory research: implications of a realist review for health research and practice. Milbank Q. 2012;90(2):311–46.

  92. Goodman MS, Ackermann N, Bowen DJ, Thompson V. Content validation of a quantitative stakeholder engagement measure. J Community Psychol. 2019;47:1937–51.

  93. Wallerstein N, Oetzel J, Sanchez-Youngman S, et al. Engage for equity: a long-term study of community-based participatory research and community-engaged research practices and outcomes. Health Educ Behav. 2020;47(3):380–90.

  94. Granner ML, Sharpe PA. Evaluating community coalition characteristics and functioning: a summary of measurement tools. Health Educ Res Theory Pract. 2004;19(5):514–32.

  95. Zuckerman HS, Kaluzny AD, Ricketts TC. Alliances in health care: what we know, what we think we know and what we should know. Health Care Manag Rev. 1995;20:54–64.

  96. Nguyen T, et al. How does integrated knowledge translation (IKT) compare to other collaborative research approaches to generating and translating knowledge? Learning from experts in the field. Health Res Policy Syst. 2020;18(1):35.

  97. Jull J, Giles A, Graham ID. Community-based participatory research and integrated knowledge translation: advancing the co-creation of knowledge. Implement Sci. 2017;12(150):1–9.

  98. Bowen S. The relationship between engaged scholarship, knowledge translation and participatory research. In: Participatory qualitative research methodologies in health. Los Angeles: SAGE; 2015. p. 183–99.


  99. Voorberg WH, Bekkers VJ, Tummers LG. A systematic review of co-creation and co-production: embarking on the social innovation journey. Public Manag Rev. 2015;17(9):1333–57.


  100. IKTRN (Integrated Knowledge Translation Research Network). IKTRN: about us—vision and mission. 2022. Accessed 26 Oct 2022.

  101. Graham ID, Kothari A, McCutcheon C. Moving knowledge into action for more effective practice, programmes and policy: protocol for a research programme on integrated knowledge translation. Implement Sci. 2018;13(22):1–15.


  102. IKTRN (Integrated Knowledge Translation Research Network). Research projects. 2022. Accessed 26 Oct 2022.

  103. Health Research and Educational Trust. Partnership Self-Assessment Survey, Community Care Network Evaluation, Chicago, 1997.

  104. Provan KG, Nakama L, Veazie MA, Teufel-Shone NI, Huddleston C. Building community capacity around chronic disease services through a collaborative interorganizational network. Health Educ Behav. 2003;30:646–62.

  105. Israel BA, Checkoway B, Schulz A, Zimmerman M. Health education and community empowerment: Conceptualizing and measuring perceptions of individual, organizational, and community control. Health Educ Q. 1994;21:149–70.

  106. Bullen P, Onyx J. Measuring social capital in five communities in NSW, A practitioner’s guide. 1998. Accessed 11 Dec 2022.

  107. Chrislip DD, Larson CE. Collaborative leadership: How citizens and civic leaders can make a difference. San Francisco: Jossey-Bass; 1994.

  108. Mattessich PW, Murray-Close M, Monsey BR. The Wilder collaboration factors inventory: assessing your collaboration's strengths and weaknesses. Amherst H. Wilder Foundation; 2001.

  109. Bilodeau A, Galarneau M, Fournier M, Potvin L, Senecal G, Bernier J. Outil Diagnostique De L’Action en Partenariat, 1st Edn. Direction de sante publique de l’Agence de la sante et des services sociaux de Montreal. 2008. ISBN 978-2-89673-450-4.

  110. Bilodeau A, Galarneau M, Fournier M, Potvin L, Senecal G, Bernier J. Outil Diagnostique De L’Action en Partenariat, 2nd Edn. Direction de sante publique de l’Agence de la sante et des services sociaux de Montreal. 2014. ISBN 978-2-89673-450-4.

  111. Bilodeau A, Galarneau M, Fournier M, Potvin L, Senecal G, Bernier J. Self-Evaluation Tool for Action in Partnership. Health Nexus, Library and Archives Canada. 2017. ISBN 978-0-9866907-5-4.

  112. Cramm JM, Strating MM, Nieboer AP. Development and validation of a short version of the Partnership Synergy Assessment Tool (PSAT) among professionals in Dutch disease-management partnerships. BMC Res Notes. 2011;4:224.

  113. Cramm JM, Strating MM, Nieboer AP. The role of partnership functioning and synergy in achieving sustainability of innovative programmes in community care. Health Soc Care Commun. 2013;21(2):209–15.

  114. Slaghuis SS, Strating MM, Bal RA, Nieboer AP. A framework and a measurement instrument for sustainability of work practices in long-term care. BMC Health Serv Res. 2011;11:314.

  115. Morrow E, Ross F, Grocott P, Bennett J. A model and measure for quality service user involvement in health research. Int J Consum Stud. 2010;34(5):532–9.

  116. Moore A, Wu Y, Kwakkenbos L, Silveira K, Straus S, Brouwers M, Grad R, Thombs BD. The patient engagement evaluation tool was valid for clinical practice guideline development. J Clin Epidemiol. 2022;143:61–72.

  117. Hamilton CB, Hoens AM, McQuitty S, McKinnon AM, English K, Backman CL, et al. Development and pre-testing of the Patient Engagement In Research Scale (PEIRS) to assess the quality of engagement from a patient perspective. PLoS ONE. 2018;13(11):e0206588.



Many thanks to Christie Hurrell, University of Calgary, for consultative advice regarding the refinement of search term clusters, and Christine Neilson (CN) at the University of Manitoba for her assistance with PRESS assessments. Sincere gratitude to Dr Aziz Shaheen, Department of Gastroenterology, Cumming School of Medicine, University of Calgary, for providing summer student support for Liam Swain (LS), Kevin Paul (KP) and Kate Aspinall (KA). Warm thanks to Cheryl Moser (CM) for her contributions during the full-text screening phase. We thank Dr Audrey L’Esperance at the Centre of Excellence on Partnership with Patients and the Public (CEPPP), who introduced us to the Patient and Public Engagement Evaluation Toolkit assessment grid, permitting us to modify it for our study purposes. We would like to recognize our colleagues in the IKTRN and the Multicentre Collaborative Team for their iterative feedback over the course of the research.


Dissertation research support was provided by Dr Ian Graham through a Canadian Institutes of Health Research (CIHR) Foundation Scheme Grant (FDN#143237) Moving Knowledge Into Action for More Effective Practice, Programs and Policy: A Research Program Focusing on Integrated Knowledge Translation. Dr Kate Sibley provided research assistant support with a CIHR Project Grant (#FRN156372) Advancing the Science of Integrated Knowledge Translation with Health Researchers and Knowledge Users: Understanding Current & Developing Recommendations for iKT Practice. University of Calgary Summer studentships (L. Swain, K. Paul and K. Aspinall) were provided by Dr Aziz Shaheen, Department of Gastroenterology, Cumming School of Medicine, University of Calgary. Funding agencies were not involved in the study design, collection, analysis or interpretation of the data, or in the writing of the manuscript and its dissemination.

Author information

Authors and Affiliations



Conceptualization, study design—KJM with Doctoral Supervisory Committee: MDH, SRB, MT, IDG. Formal analysis: KJM. Funding acquisition: KJM, KMS, IDG. Investigation: KJM, JMB, SS, SM, MK, LN, AG, KP, LS, LMP, KMS, MVD. Methodology: KJM, KMS, MVD, MDH, SRB, MT, IDG. Project administration: KJM, KMS, IDG. Supervision: IDG, MDH, SRB, CT. Validation: KJM, SS, SM, MK, KP. Writing—original draft: KJM. Writing—review, editing and approval of final manuscript: KJM, JMB, SS, SM, MK, LN, AG, KP, LS, LMP, KMS, MVD, MDH, SRB, CT, IDG. Guarantor: IDG. All authors read and approved the final manuscript.

Authors’ information

KM is a Doctoral Candidate at the University of Calgary in the Department of Community Health Sciences—Health Services Research stream. She is employed by the Strategic Clinical Networks™ at Alberta Health Services as a Knowledge Translation Implementation Scientist.

JMB is employed by the Knowledge Translation Program, St. Michael’s Hospital, Unity Health Toronto, as a Research Manager.

SS is a BSc Health Sciences major (Biomedical Stream) at the University of Calgary.

SM is a BSc Kinesiology major in the Faculty of Kinesiology at the University of Calgary.

MK is a research coordinator in the Department of Community Health Sciences, University of Manitoba.

LN is an Assistant Professor at the University of Calgary in the Faculty of Nursing. She holds a Teaching and Learning Research Professorship and is a University of Calgary Teaching Scholar.

AG is a BSc Physiology major in the Faculty of Science at the University of Alberta.

LMP is a senior research fellow at the Pettenkofer School of Public Health, University of Munich (LMU), Germany.

KP is a student funded by the University of Calgary Summer Studentships Program (2021).

KMS is an Associate Professor, Department of Community Health Sciences; Director, Knowledge Translation, Centre for Healthcare Innovation; University of Manitoba.

LS is an MSc student (Epidemiology Stream) in the Department of Community Health Sciences, Cumming School of Medicine, University of Calgary.

MVD is the Data and Digital Scholarship Librarian at the University of British Columbia’s Okanagan campus.

MDH is the Medical Director for the Cardiovascular and Stroke Strategic Clinical Network™ at Alberta Health Services, with a primary appointment as Professor in the Department of Clinical Neuroscience and Hotchkiss Brain Institute, Cumming School of Medicine, University of Calgary and Foothills Medical Centre.

SRB is an Associate Professor, Faculty of Nursing, University of Calgary.

CT is the Associate Vice President, Research, University of Calgary.

IDG is a Distinguished University Professor in the Schools of Epidemiology and Public Health and Nursing at the University of Ottawa and Senior Scientist at the Ottawa Hospital Research Institute.

Corresponding author

Correspondence to K. J. Mrklas.

Ethics declarations

Ethics approval and consent to participate

This study was reviewed and approved by the Conjoint Health Research Ethics Board (CHREB) at the University of Calgary (REB180174).

Consent for publication

Not applicable.

Competing interests

KM, JMB, SM, SS, MVD, KP, LS, MK, LN, AG, LMP, KMS, SRB and CT have no competing interests to declare. MDH is the Medical Director (Stroke) for the Cardiovascular and Stroke Strategic Clinical Network™ at Alberta Health Services. IDG holds the position of Scientific Director for the IKTRN.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Appendix S1. Systematic review protocol deviations and rationale. Appendix S2. Glossary of terms. Appendix S3. Translated search strategy. Appendix S4. Health research partnership pragmatic tool evaluation criteria. Appendix S5. Quality assessment checklist for survey studies in psychology (Q-SSP) criteria. Appendix S6. Bibliography of included studies. Appendix S7. PRISMA-systematic review checklist.

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Mrklas, K.J., Boyd, J.M., Shergill, S. et al. Tools for assessing health research partnership outcomes and impacts: a systematic review. Health Res Policy Sys 21, 3 (2023).




  • Health research partnerships
  • Evaluation tools
  • Psychometrics
  • Acceptability
  • Systematic review