Mobilising Evidence for Good Governance
Annex B. Mapping of Existing Standards of Evidence across a Range of Jurisdictions
Table B.1. Mapping of Standards of Evidence
| # | Country | Organisation | Framework | URL | URL for evidence standard |
|---|---|---|---|---|---|
| 1 | Australia | Be You | The Be You Programs Directory | https://beyou.edu.au/resources/tools-and-guides/about-programs-directory | |
| 2 | Australia | ARACY (Australian Research Alliance for Children and Youth) | What Works for Kids (WW4K) | | |
| 3 | Canada | Public Health Agency of Canada | Canadian Best Practices Portal | http://cbpp-pcpe.phac-aspc.gc.ca/resources/evidence-informed-decision-making/ | |
| 4 | Canada | McMaster University | Health Evidence | https://www.healthevidence.org/search.aspx | https://www.healthevidence.org/documents/our-appraisal-tools/quality-assessment-tool-dictionary-en.pdf |
| 5 | EU | European Commission | EU-Compass for Action on Mental Health and Well-being | https://ec.europa.eu/health/non_communicable_diseases/mental_health/eu_compass_en | |
| 6 | EU | European Platform for Investing in Children | Evidence Based Practices | https://ec.europa.eu/social/main.jsp?catId=1246&intPageId=4286&langId=en | |
| 7 | EU | EMCDDA (European Monitoring Centre for Drugs and Drug Addiction) | European drug prevention quality standards | http://www.emcdda.europa.eu/system/files/publications/646/TD3111250ENC_318193.pdf | |
| 8 | EU | EMCDDA (European Monitoring Centre for Drugs and Drug Addiction) | Best practice portal | http://www.emcdda.europa.eu/best-practice_en | http://www.emcdda.europa.eu/best-practice/evidence/about |
| 9 | Germany | Crime Prevention Council of Lower Saxony | Green List Prevention | | |
| 10 | New Zealand | SUPERU | An Evidence Rating Scale for New Zealand | https://www.superu.govt.nz/sites/default/files/Publications/Evidence%20Rating%20Scale.pdf | |
| 11 | New Zealand | Education Counts | Best Evidence Synthesis Iteration | | |
| 12 | Spain | Prevención basada en la evidencia | Criterios de selección de programas (programme selection criteria) | http://www.prevencionbasadaenlaevidencia.net/index.php?page=Criterios | |
| 13 | UK | Darlington Service Design Lab | Standards of evidence | | |
| 14 | UK | Project Oracle | Standards of evidence | https://project-oracle.com/uploads/files/Validation_Guidebook.pdf | |
| 15 | UK | Early Intervention Foundation | The Guidebook | | |
| 16 | UK | Nesta | Standards of evidence | http://www.alliance4usefulevidence.org/assets/What-Counts-as-Good-Evidence-WEB.pdf | |
| 17 | UK | Bond | Evidence Principles | https://www.bond.org.uk/ngo-support/evidence-principles-download | |
| 18 | UK | Centre for Analysis of Youth Transitions (CAYT) | Standards of evidence (CAYT) | http://cayt.mentor-adepis.org/wp-content/uploads/2017/06/CAYT-Scoring-Application-Form-2017-FINAL.pdf | |
| 19 | UK | What Works Centre for Local Economic Growth | The Maryland Scientific Methods Scale (SMS) | https://whatworksgrowth.org/public/files/Methodology/16-06-28_Scoring_Guide.pdf | |
| 20 | UK | Big Lottery Fund's Realising Ambition Programme | The confidence review | | |
| 21 | UK | Education Endowment Foundation | Teaching and Learning Toolkit | https://educationendowmentfoundation.org.uk/public/files/Toolkit/Toolkit_Manual_2018.pdf | |
| 22 | UK | HACT (Ideas and Innovation in Housing) | Standards for producing evidence | https://www.hact.org.uk/sites/default/files/StEv2-1-2016%20Effectiveness-Specification.pdf | |
| 23 | UK | Conservation Evidence | What Works in Conservation | | |
| 24 | UK | What Works Centre for Children’s Social Care | Evidence Standards | https://wwc-evidence.herokuapp.com/pages/our-ratings-explained | |
| 25 | UK | What Works Centre for Wellbeing | GRADE (Grading of Recommendations Assessment, Development and Evaluation) | https://whatworkswellbeing.org/ | https://whatworkswellbeing.org/product/a-guide-to-our-evidence-review-methods/ |
| 26 | UK | What Works Centre for Crime Reduction | EMMIE Framework | https://whatworks.college.police.uk/toolkit/Pages/About_the_CRT.aspx | https://whatworks.college.police.uk/toolkit/Pages/Quality-Scale.aspx |
| 27 | International | Campbell Collaboration | Campbell Collaboration Systematic Reviews: Policies and Guidelines | | |
| 28 | USA | Community Preventive Services Task Force (CPSTF) | The Community Guide | | |
| 29 | USA | U.S. Department of Health & Human Services | Home Visiting Evidence of Effectiveness | | |
| 30 | USA | What Works Clearinghouse | Find What Works from Systematic Reviews | | |
| 31 | USA | Center for the Study and Prevention of Violence | Blueprints | https://www.blueprintsprograms.org/resources/Blueprints_Standards_full.pdf | |
| 32 | USA | California Department of Social Services | California Evidence-Based Clearinghouse for Child Welfare | | |
| 33 | USA | Center for Research and Reform in Education (CRRE) at Johns Hopkins University School of Education | Best Evidence Encyclopedia | | |
| 34 | USA | Center for Research and Reform in Education (CRRE) at Johns Hopkins University School of Education | Evidence for ESSA (Every Student Succeeds Act) | https://content.evidenceforessa.org/sites/default/files/On%20clean%20Word%20doc.pdf | |
| 35 | USA | Society for Prevention Research | Standards of Evidence for Efficacy, Effectiveness, and Scale-up Research in Prevention Science | http://www.preventionresearch.org/wp-content/uploads/2011/12/Standards-of-Evidence_2015.pdf | |
| 36 | USA | U.S. Department of Health and Human Services | Evidence Based Teen Pregnancy Programs | https://tppevidencereview.aspe.hhs.gov/pdfs/TPPER_Review%20Protocol_v5.pdf | |
| 37 | USA | National Institute of Justice | CrimeSolutions | | |
| 38 | USA | Arnold Ventures | Social Programs That Work | | |
| 39 | USA | Child Trends | What Works for Child and Youth Development | | |
| 40 | USA | Washington State Institute for Public Policy (WSIPP) | WSIPP Benefit-Cost Results | http://www.wsipp.wa.gov/TechnicalDocumentation/WsippBenefitCostTechnicalDocumentation.pdf | |
| 41 | USA | U.S. Department of Justice | Office of Juvenile Justice and Delinquency Prevention (OJJDP) Model Programs Guide | | |
| 42 | USA | University of Wisconsin Population Health Institute | What Works for Health | | |
| 43 | USA | U.S. Department of Health and Human Services | Agency for Healthcare Research and Quality (AHRQ) | | |
| 44 | USA | U.S. Department of Labor | Clearinghouse for Labor Evaluation and Research (CLEAR) | | |
| 45 | USA | Clearinghouse for Military Family Readiness | Continuum of evidence | https://lion.militaryfamilies.psu.edu/programs/find-programs | https://militaryfamilies.psu.edu/wp-content/uploads/2017/08/continuum.pdf |
| 46 | USA | Suicide Prevention Resource Center | Evidence-Based Practices Project | http://www.sprc.org/sites/default/files/ebpp_proj_descrip%20revised.pdf | |
| 47 | USA | National Implementation Research Network | The Hexagon: An Exploration Tool | | |
| 48 | USA | National Dropout Prevention Center | Model Programs Database | | |
| 49 | USA | National Cancer Institute | Research-Tested Intervention Programs (RTIPs) | | |
| 50 | USA | Strengthening Families Evidence Review | Standards of evidence | | |
Table B.2. Type of Evidence Assessed and Key Standards of Evidence by Approach
Note: The "Type of evidence assessed" column marks with an X how many of the four types of evidence a framework assesses (impact evaluation, systematic review, quantitative methods, qualitative methods). "Cost information" and "Cost-benefit evaluation" are the Cost sub-dimensions; "Requirements", "Intervention readiness", "System readiness" and "Experiences" are the Implementation sub-dimensions. Blank cells were empty in the source.

| # | Country | Organisation | Framework | Type of evidence assessed | Theory of change / Logic model | Design and development | Efficacy | Effectiveness | Cost information | Cost-benefit evaluation | Requirements | Intervention readiness | System readiness | Experiences |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Australia | Be You | The Be You Programs Directory | X X X | YES | YES | YES | NO | YES | NO | YES | NO | YES | NO |
| 2 | Australia | ARACY (Australian Research Alliance for Children and Youth) | What Works for Kids (WW4K) | X X | NO | YES | YES | YES | YES | YES | YES | NO | NO | NO |
| 3 | Canada | Public Health Agency of Canada | Canadian Best Practices Portal | X | NO | YES | YES | YES | NO | NO | YES | NO | YES | NO |
| 4 | Canada | McMaster University | Health Evidence | X X X X | NO | NO | YES | YES | YES | YES | NO | NO | NO | NO |
| 5 | EU | European Commission | EU-Compass for Action on Mental Health and Well-being | X | YES | NO | YES | YES | YES | YES | YES | YES | YES | NO |
| 6 | EU | European Platform for Investing in Children | Evidence Based Practices | X X | NO | YES | YES | YES | YES | YES | YES | NO | NO | NO |
| 7 | EU | EMCDDA (European Monitoring Centre for Drugs and Drug Addiction) | European drug prevention quality standards | X X X | YES | YES | YES | YES | YES | YES | YES | YES | NO | NO |
| 8 | EU | EMCDDA (European Monitoring Centre for Drugs and Drug Addiction) | Best practice portal | X X X X | NO | NO | YES | YES | NO | NO | NO | NO | NO | YES |
| 9 | Germany | Crime Prevention Council of Lower Saxony | Green List Prevention | X | YES | YES | YES | YES | YES | NO | YES | NO | NO | NO |
| 10 | New Zealand | SUPERU | An Evidence Rating Scale for New Zealand | X X | YES | YES | YES | YES | YES | YES | YES | NO | NO | YES |
| 11 | New Zealand | Education Counts | Best Evidence Synthesis Iteration | X X X X | YES | YES | YES | YES | NO | YES | YES | NO | NO | YES |
| 12 | Spain | Prevención basada en la evidencia | Criterios de selección de programas | X X | YES | NO | YES | YES | YES | NO | NO | YES | | |
| 13 | UK | Darlington Service Design Lab | Standards of evidence | X X | YES | YES | YES | YES | YES | NO | YES | YES | YES | NO |
| 14 | UK | Project Oracle | Standards of evidence | X X X | YES | YES | YES | YES | NO | YES | NO | YES | NO | NO |
| 15 | UK | Early Intervention Foundation | The Guidebook | X | YES | YES | YES | YES | YES | NO | YES | NO | NO | NO |
| 16 | UK | Nesta | Standards of evidence | X X X | YES | YES | YES | YES | YES | NO | YES | YES | NO | NO |
| 17 | UK | Bond | Evidence Principles | X X X | YES | NO | YES | NO | NO | NO | NO | NO | YES | YES |
| 18 | UK | Centre for Analysis of Youth Transitions (CAYT) | Standards of evidence (CAYT) | X X X | NO | YES | YES | YES | YES | NO | NO | NO | NO | NO |
| 19 | UK | What Works Centre for Local Economic Growth | The Maryland Scientific Methods Scale (SMS) | X X | NO | YES | YES | NO | NO | NO | NO | NO | NO | NO |
| 20 | UK | Big Lottery Fund's Realising Ambition Programme | The confidence review | X X X | YES | YES | NO | NO | YES | YES | YES | YES | YES | NO |
| 21 | UK | Education Endowment Foundation | Teaching and Learning Toolkit | X X X | NO | YES | YES | NO | YES | YES | NO | NO | NO | NO |
| 22 | UK | HACT (Ideas and Innovation in Housing) | Standards for producing evidence | X X | YES | YES | YES | YES | NO | YES | YES | YES | YES | YES |
| 23 | UK | Conservation Evidence | What Works in Conservation | X X | NO | NO | YES | NO | NO | NO | NO | NO | NO | NO |
| 24 | UK | What Works Centre for Children’s Social Care | Evidence Standards | X X | NO | NO | YES | YES | YES | YES | YES | YES | NO | YES |
| 25 | UK | What Works Centre for Wellbeing | GRADE (Grading of Recommendations Assessment, Development and Evaluation) | X X X | YES | YES | YES | YES | YES | YES | NO | NO | NO | YES |
| 26 | UK | What Works Centre for Crime Reduction | EMMIE Framework | X | YES | NO | YES | YES | YES | YES | YES | NO | NO | YES |
| 27 | International | Campbell Collaboration | Campbell Collaboration Systematic Reviews: Policies and Guidelines | X X | YES | NO | YES | YES | YES | YES | YES | NO | NO | YES |
| 28 | USA | Community Preventive Services Task Force (CPSTF) | The Community Guide | X X | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES |
| 29 | USA | U.S. Department of Health & Human Services | Home Visiting Evidence of Effectiveness | X X | NO | NO | YES | YES | YES | NO | YES | NO | NO | YES |
| 30 | USA | What Works Clearinghouse | Find What Works from Systematic Reviews | X X | NO | YES | YES | NO | NO | NO | NO | NO | NO | NO |
| 31 | USA | Center for the Study and Prevention of Violence | Blueprints | X | YES | NO | YES | YES | NO | NO | YES | YES | NO | YES |
| 32 | USA | California Department of Social Services | California Evidence-Based Clearinghouse for Child Welfare | X X | NO | NO | YES | YES | NO | NO | YES | YES | YES | YES |
| 33 | USA | Center for Research and Reform in Education (CRRE) at Johns Hopkins University School of Education | Best Evidence Encyclopedia | X | NO | NO | YES | NO | NO | NO | YES | NO | NO | NO |
| 34 | USA | Center for Research and Reform in Education (CRRE) at Johns Hopkins University School of Education | Evidence for ESSA (Every Student Succeeds Act) | X X | NO | YES | YES | NO | YES | NO | YES | NO | NO | NO |
| 35 | USA | Society for Prevention Research | Standards of Evidence for Efficacy, Effectiveness, and Scale-up Research in Prevention Science | X | YES | NO | YES | YES | YES | YES | YES | YES | YES | YES |
| 36 | USA | U.S. Department of Health and Human Services | Evidence Based Teen Pregnancy Programs | X | NO | NO | YES | NO | NO | NO | YES | YES | NO | NO |
| 37 | USA | National Institute of Justice | CrimeSolutions | X X | YES | YES | YES | YES | NO | NO | YES | NO | NO | NO |
| 38 | USA | Arnold Ventures | Social Programs That Work | X X | NO | YES | YES | YES | NO | NO | NO | NO | NO | NO |
| 39 | USA | Child Trends | What Works for Child and Youth Development | X | NO | NO | YES | YES | NO | NO | NO | NO | NO | NO |
| 40 | USA | Washington State Institute for Public Policy (WSIPP) | WSIPP Benefit-Cost Results | X | NO | NO | YES | YES | YES | YES | NO | NO | NO | NO |
| 41 | USA | U.S. Department of Justice | Office of Juvenile Justice and Delinquency Prevention (OJJDP) Model Programs Guide | X X | YES | NO | YES | YES | YES | NO | YES | NO | YES | NO |
| 42 | USA | University of Wisconsin Population Health Institute | What Works for Health | X X | NO | YES | YES | NO | NO | NO | NO | NO | NO | NO |
| 43 | USA | U.S. Department of Health and Human Services | Agency for Healthcare Research and Quality (AHRQ) | X X X | NO | YES | YES | NO | YES | NO | YES | NO | NO | YES |
| 44 | USA | U.S. Department of Labor | Clearinghouse for Labor Evaluation and Research (CLEAR) | X X | NO | YES | YES | NO | YES | NO | YES | NO | NO | YES |
| 45 | USA | Clearinghouse for Military Family Readiness | Continuum of evidence | X X | NO | YES | YES | YES | YES | NO | YES | NO | NO | NO |
| 46 | USA | Suicide Prevention Resource Center | Evidence-Based Practices Project | X X | YES | NO | YES | NO | YES | NO | YES | YES | NO | NO |
| 47 | USA | National Implementation Research Network | The Hexagon: An Exploration Tool | X X | YES | YES | YES | YES | YES | NO | YES | YES | YES | NO |
| 48 | USA | National Dropout Prevention Center | Model Programs Database | X X | NO | YES | YES | YES | YES | NO | YES | NO | NO | NO |
| 49 | USA | National Cancer Institute | Research-Tested Intervention Programs (RTIPs) | X X | YES | YES | YES | NO | YES | YES | YES | YES | NO | NO |
| 50 | USA | Strengthening Families Evidence Review | Standards of evidence | X X | NO | NO | YES | NO | NO | NO | NO | NO | NO | YES |
Table B.3. Rating and Ranking of Quality of Evidence by Approach
# |
Organisation |
Framework |
Use of an assessment approach or scale |
---|---|---|---|
1 |
Be you |
The Be You Programs Directory |
No. Programs must: align with one or more of the five professional learning domains (Mentally Healthy Communities, Family Partnerships, Learning Resilience, Early Support, Responding Together) align with the Australian Curriculum or National Quality Framework be supported by a training/delivery/implementation manual or guide be offered as more than a one-off session (i.e., offer multiple, sequential sessions which, either as a set series of sessions or on an as-needs basis) be targeted at one of the following audiences as the intended beneficiary, for example: children, young people, parents, carers or families; early childhood educators, Out of Hours School Care have at least one research or evaluation study which demonstrates: a positive impact on mental health outcomes for children or young people a minimum of 20 participants in the study who received the program at least pre and post testing conducted on the group that received the program. |
2 |
ARACY Australian Research Alliance for Children and Youth |
Nest What Works for Kids (WW4K) |
Yes. Well supported
Supported
Promising
Emerging
◦the results of rigorous studies are not yet available. |
3 |
Public Health Agency of Canada |
Canadian Best Practices Portal |
No. Promising Practices: A Promising Practice is defined as an intervention, program, service, or strategy that shows potential (or “promise”) for developing into a best practice. Promising practices are often in the earlier stages of implementation, and as such, do not show the high level of impact, adaptability, and quality of evidence as best practices. However, their potential is based on a strong theoretical underpinning to the intervention. Aboriginal Ways Tried and True: Aboriginal ‘Ways Tried and True’ (WTT) refers to successful practices implemented in First Nations, Inuit, and Métis contexts to address local challenges. Success is measured not only by effectiveness, but also by how the intervention was designed and carried out. Interventions are intended to inspire and support public health practitioners, program developers, evaluators, and others by sharing information on programs and processes that have worked in Aboriginal contexts. Best Practices: A Best Practice is defined as an intervention, program, or initiative that has, through multiple implementations, demonstrated: high impact (positive changes related to the desired goals), high adaptability (successful adaptation and transferability to different settings), and high quality of evidence (excellent quality of research/evaluation methodology, confirming the intervention’s high impact and adaptability evidence). |
4 |
McMaster University |
Health Evidence |
Yes. Strong: Reviews with a score of 8 or higher in the Yes column Moderate: Reviews with a score between 5-7 in the Yes column Weak: Reviews with a score of 4 or less in the Yes column |
5 |
European Commission |
EU-Compass for Action on Mental Health and Well-being |
Yes, although not numbered. Detailed criteria around three issues: Exclusion Criteria assess the following aspects: Relevance Intervention Characteristics Evidence and Theory base Ethical aspects Core criteria assess the following aspects: Effectiveness Efficiency Equity Qualifier criteria assess the following aspects: Transferability Sustainability Participation Intersectoral collaboration |
6 |
European Platform for Investing in Children |
Evidence Based Practices |
Yes. Criteria to determine the evidence level are organised according to three categories:
Comparison group + Evaluation utilises at the minimum pre/post design with appropriate statistical adjustments employed in order to control for selection + +: Study design uses a convincing comparison group to identify practice impacts, including randomised-control trial (experimental design) or some quasi-experimental designs Statistical significance + Significant (p<0.1), positive results are shown on at least one relevant outcome + + Significant (p<0.05), positive results are shown on at least one relevant outcome Effect size + No requirement + + Effect size of at least 10% of a standard deviation. Sample size + Sample size of at least 20 in each group + + Sample size of at least 50 in each group Outcomes + Outcomes are directly or indirectly related to outcomes identified in topic definitions + No significant negative outcomes reported (excluding those negative outcomes that might be due to chance) + + No significant negative outcomes reported (excluding those negative outcomes that might be due to chance) + + Outcomes are directly related to outcomes identified in topic definitions + + Outcome assessments have been validated, where applicable + + Outcome assessments conducted at baseline and follow-up, where applicable Attrition + No requirement + + Attrition is less than 25% or has been accounted for using an acceptable procedure, where applicable Location: At least one evaluation that meets the above criteria must have been conducted within EU member state(s)
Replication + Practice has been evaluated in at least one additional population beyond the original study population** (broadly defined) in such a way that at least meets the basic criteria for internal validity as specified in the evidence of effectiveness criteria (e.g. significant positive results for at least one outcome are found, uses a comparison group, etc.) + +Same requirements as for + but in addition the practice has been found to be cost-effective/cost-beneficial (i.e. the practice can deliver positive impact at a reasonable cost) Practice materials: Practice materials (curriculum, etc.) are available, or documentation is sufficient, such that program can be replicated
Follow-up conducted An evaluation of the practice which meets the basic criteria for inclusion has conducted a follow-up of at least 2 years, and continues to find positive (p<0.1) and direct impact on at least one outcome Evidence-based practices on this site are assigned one of three evidence levels: •Emergent Practice: An “emergent practice” has achieved at least a + in “evidence of effectiveness.” •Promising Practice: A “promising practice” has achieved at least a + in “evidence of effectiveness” and a + in at least one of the other two categories, “transferability” and “enduring impact.” •Best Practice: A “best practice” has achieved at least a + in each of the three evidence categories, including “evidence of effectiveness”, “transferability” and “enduring impact.” |
7 |
EMCDDA European Monitoring Centre for Drugs and Drug Addiction |
European drug prevention quality standards |
No, it is an eight-stage project cycle with cross-cutting considerations. Organised in an eight-stage project cycle, the Standards cover the following areas: Stage 1: Needs assessment Stage 2: Resource assessment Stage 3: Programme formulation Stage 4: Intervention design Stage 5: Management and mobilisation of resources Stage 6: Delivery and monitoring Stage 7: Final evaluations Stage 8: Dissemination and improvement Cross-cutting considerations are relevant for each project stage and are therefore placed in the centre of the project cycle. These Standards relate to: (A) sustainability and funding, (B) communication and stakeholder involvement, (C) staff development, (D) the ethics of drug prevention. |
8 |
EMCDDA European Monitoring Centre for Drugs and Drug Addiction |
Best practice portal |
Yes, Evidence ratings The available information on the effects of specific interventions are examined and then ranked them as described below. Beneficial: Interventions for which precise measures of the effects in favour of the intervention were found in the systematic reviews of randomised controlled trials (RCTs), and that were recommended in guidelines with reliable methods for assessing evidence (such as GRADE*). An intervention ranked as ‘beneficial’ is suitable for most contexts. Likely to be beneficial: Interventions that were shown to have limited measures of effect, that are likely to be effective but for which evidence is limited, and/or those that are recommended with some caution in guidelines with reliable methods for assessing evidence (such as GRADE). An intervention ranked as ‘likely to be beneficial’ is suitable for most contexts, with some discretion. Trade-off between benefits and harms: Interventions that obtained measures of effects in favour of harm reduction and/or are recommended in guidelines with reliable methods for assessing evidence (such as GRADE), but that showed some limitations or unintended effects that need to be assessed before providing them. Unknown effectiveness: Interventions for which there are not enough studies or where available studies are of low quality (with few patients or with uncertain methodological rigour), making it difficult to assess if they are effective or not. Interventions for which more research should be undertaken are also grouped in this category. Evidence of ineffectiveness: Interventions that gave negative results if compared with a standard intervention, for example. Quality of evidence: High quality evidence— one or more up-to-date systematic reviews that include high-quality primary studies with consistent results. The evidence supports the use of the intervention within the context in which it was evaluated. Moderate quality evidence— one or more up-to-date reviews that include a number of primary studies of at least moderate quality with generally consistent results. The evidence suggests these interventions are likely to be useful in the context in which they have been evaluated but further evaluations are recommended. Low quality evidence— where there are some high or moderate quality primary studies but no reviews available OR there are reviews giving inconsistent results. The evidence is currently limited, but what there is shows promise. This suggests these interventions may be worth considering, particularly in the context of extending services to address new or unmet needs, but should be evaluated. |
9 |
Crime Prevention Council of Lower Saxony |
Green List Prevention |
Yes. Ratings of both programmes and evaluations. Programme ratings Level 1: Theoretically well grounded. Detailed criteria on the Conceptual Quality, Implementation Quality and Evaluation Quality Level 2: Probable Effectiveness Level1 and at least one evaluation study 1 to 3 stars with (predominantly) positive results. Level 3: Proven Effectiveness Level 1 and at least one evaluation study 4 or 5 stars with (predominantly) positive results and at least sufficient conclusiveness. Ratings of evaluations ***** Five Stars Randomized Controlled Trial (RCT) with follow-up (not less than 6 month, also below) **** Four Stars Quasi-Experimental Design (QED) with follow-up *** Three Stars RCT without follow-up, QED without follow-up. ** Two Stars “Clinical” RCT or QED with or without follow-up (not in routine context). Pre-post assessment with control-group(s) in routine context * One Star Benchmark / Norm-reference-study, Theory of Change – study No stars Participant-satisfaction assessment, Pre-post assessment without control-group, Goal-attainment study, Quality-assurance-study. |
10 |
SUPERU |
A Rating Scale for New Zealand |
Yes, there are two scales. The strength of evidence scale: Level 0 - a pilot of a new initiative. Level 1 - Intervention is in its early stages of implementation, or planned but not yet implemented. This intervention’s evidence base will be built over time. Level 2 - Typically, this intervention has been in operation for around one to three years. It has met all level 1 criteria and has been evaluated at least once. The evaluation indicates some effect, but it may not yet be possible to directly attribute outcomes to it. This intervention’s evidence base will continue to be built over time. Level 3 - Typically, this intervention has been in operation for around three to 10 years. It has an established design which is consistently implemented, and quality assurance procedures are in place. It has met all the level 2 criteria, plus it has at least one evaluation that provides evidence about impact. It also has some information available that will help with implementation in new contexts. Level 4 - Typically, this intervention has been in operation for around eight years or longer and is large scale or high risk, justifying extra evaluation effort. It has met all the level 3 criteria, plus it has been replicated at least once. It has been evaluated at least twice and the evaluations provide strong evidence about effectiveness and impact, insights into how the intervention causes change, what works well or less well for different participants, and cost-benefit. There is support for implementation in new contexts. The effectiveness scale: Beneficial Mixed effects No effect Harmful Not applicable |
11 |
Education counts |
Best Evidence Synthesis Iteration |
Yes, although not numbered. To evaluate the evidence, they consider:
|
12 |
Prevención basada en la evidencia |
Criterios de selección de programas |
Yes. (Original in Spanish)) ****Strong: Well-evaluated programs whose effect have been demonstrated through different studies. ***Moderate: Programs that having proven to be effective require more research to show that their effects maintain at long term. **Low: Programs whose effectiveness is not sufficiently demonstrated and it is necessary to investigate more about it to know the usefulness of the program. *Very Low/Not evidence: There is no evidence, or it is indirect, insufficient or contradictory according to the studies carried out with the program. |
13 |
Darlington Service Design Lab |
Standards of evidence |
Yes, although the scale is not numbered. Within each of the four dimensions there are sub-categories which rank the intervention's evidence as "good enough" or "best". 1) Evaluation Quality;• Have been subjected to an evaluation that compares outcomes for children receiving the intervention with children with the same needs who do not receive the intervention;• Ideally, have been independently evaluated using a well–executed randomised controlled trial. 2) Impact;
3) Intervention Specificity;
4) System Readiness
|
14 |
Project Oracle |
Standards of evidence |
Yes. Standard 1: We know what we want to achieve- Theory of Change and Evaluation Plan Standard 2: We have seen there is a change - Indication of impact Standard 3: We believe the change is caused by us - Evidence of impact Standard 4: We know why and how the change happened, this works elsewhere - Model ready Standard 5: We know why and how the change happened, this works everywhere - System ready |
15 |
Early Intervention Foundation |
The Guidebook |
Yes.
|
16 |
Nesta |
Standards of evidence |
Yes. A 1 to 5 scale. 1. You can give an account of impact. 2. You are gathering data that shows some change amongst those using or receiving your intervention. 3. You can demonstrate that your intervention is causing the impact, by showing less impact amongst those who don’t receive the product/service. 4. You can explain why and how your intervention is having the impact that you have observed and evidenced so far. An independent evaluation validates the impact. In addition, the intervention can deliver impact at a reasonable cost, suggesting that it could be replicated and purchased in multiple locations. 5. You can show that your intervention could be operated by someone else, somewhere else, whilst continuing to have positive and direct impact on the outcome, and whilst remaining a financially viable proposition. |
17 |
Bond |
Evidence Principles |
Yes. Principles 1. Voice and inclusion: the perspectives of people living in poverty, including the most marginalised, are included in the evidence, and a clear picture is provided of who is affected and how: 2. Appropriateness: the evidence is generated through methods that are justifiable given the nature of the purpose of the enquiry 3. Triangulation: the evidence has been generated using a mix of methods, data sources, and perspectives. 4. Contribution: the evidence explores how change happens, the contribution of the intervention and factors outside the intervention in explaining change: 5. and Transparency: the evidence discloses the details of the data sources and methods used, the results achieved, and any limitations in the data or conclusions. Each of the five principles has four questions and each question can be answered on a scale of 1-4. Scores for each of the questions are then added up and an overall score for the principles out of 16 is provided. Depending on the score, the principle is then assigned to a scale: 1) weak, 2) minimum standard, 3) good standard 4) gold standard. |
18 |
Centre for Analysis of Youth Transitions (CAYT) |
Standards of evidence |
Yes. Assessing impact grades (Score 0-4), they consider: a) Reach: the extent to which the programme attracts its intended audience and b) Significance: the effect that the programme is having on young people to influence health and wellbeing. Level of evidence grades (Score 0-7) 0. Basic 1. Descriptive, anecdotal, expert opinion 2. Study where a statistical relationship (correlation) between the outcome and receiving services is established 3. Study which accounts for when the services were delivered by surveying before and after 4. Study where there is both a before and after evaluation strategy and a clear comparison between groups who do and do not receive the youth services 5. As above but in addition includes statistical modelling to produce better comparison groups and of outcomes to allow for other differences across groups 6. Study where intervention is provided on the basis of individuals being randomly assigned to either 7. the treatment or the control group. 8. Various studies that evaluate an intervention which has been provided through random allocation at the individual level. Overall Programme Performance (Score 0-4) |
19 |
What Works Centre for Local Economic Growth |
The Maryland Scientific Methods Scale (SMS) |
Yes. Level 1: Either (a) a cross-sectional comparison of treated groups with untreated groups, or (b) a before-and-after comparison of treated group, without an untreated comparison group. No use of control variables in statistical analysis to adjust for differences between treated and untreated groups or periods. Level 2: Use of adequate control variables and either (a) a cross-sectional comparison of treated groups with untreated groups, or (b) a before-and-after comparison of treated group, without an untreated comparison group. In (a), control variables or matching techniques used to account for cross-sectional differences between treated and control groups. In (b), control variables are used to account for before-and-after changes in macro-level factors. Level 3: Comparison of outcomes in treated group after an intervention, with outcomes in the treated group before the intervention, and a comparison group used to provide a counterfactual (e.g. difference in difference). Justification given to choice of comparator group that is argued to be similar to the treatment group. Evidence presented on comparability of treatment and control groups. Techniques such as regression and (propensity score matching may be used to adjust for difference between treated and untreated groups, but there are likely to be important unobserved differences remaining. Level 4: Quasi-randomness in treatment is exploited, so that it can be credibly held that treatment and control groups differ only in their exposure to the random allocation of treatment. This often entails the use of an instrument or discontinuity in treatment, the suitability of which should be adequately demonstrated and defended. Level 5: Reserved for research designs that involve explicit randomisation into treatment and control groups, with Randomised Control Trials (RCTs) providing the definitive example. Extensive evidence provided on comparability of treatment and control groups, showing no significant differences in terms of levels or trends. Control variables may be used to adjust for treatment and control group differences, but this adjustment should not have a large impact on the main results. Attention paid to problems of selective attrition from randomly assigned groups, which is shown to be of negligible importance. There should be limited or, ideally, no occurrence of ‘contamination’ of the control group with the treatment. |
20 |
Big Lottery Fund's Realising Ambition Programme |
The confidence review |
No. "The Confidence Framework addresses the five dimensions that Realising Ambition assessed as being essential for effective replication – service design, service delivery, ability to monitor impact, ability to determine benefit and the prospects for sustainability"' |
21 |
Education Endowment Foundation |
Teaching and Learning Toolkit |
Yes. Security of evidence criteria: In term of : Quantity and type of study; Outcomes,; Causal inference; Consistency requirements; Effect Size requirements (from Four padlocks) Ranking 1. Very limited: One padlock: Single studies with quantitative evidence of impact with effect size data reported or calculable (such as from randomised controlled trials, well-matched experimental designs, regression discontinuity designs, natural experiments with appropriate analysis); and/or observational studies with correlational estimates of effect related to the intervention or approach; but no publically available meta-analyses. 2. Limited: Two padlocks: At least one publically available meta-analysis 3. Moderate: Three padlocks: Two or more publically available meta-analyses which meet the following criteria: they have explicit inclusion and search criteria, risk of bias discussed, and tests for heterogeneity reported. They include some exploration of methodological features such as research design effects or sample size. 4. Extensive: Four padlocks: Three or more meta-analyses which meet the following criteria: they have explicit inclusion and search criteria, risk of bias discussed, and tests for heterogeneity reported. They include some exploration of the influence of methodological features such as research design effects or sample size on effect size. The majority of included studies should be from school or other usual settings. 5. Very Extensive: Five padlocks: Three or more meta-analyses which meet the following criteria: They have explicit inclusion and search criteria, risk of bias discussed, and tests for heterogeneity reported. They include some exploration of the influence of methodological features such as research design effects or sample size on effect size. The majority of included studies should be from school or other usual settings. |
22 |
HACT-Ideas and Innovation in Housing |
Standards of evidence |
No, evidence should be assessed in a seven-step process. 1) Describe; 2) Design; 3) Proceed; 4) Plan; 5) Protocol; 6) Study; 7) Findings They establish the Purpose, limitations and intended usage evidence at different levels (Standard for Producing Evidence – Effectiveness of Interventions –Part 1: Specification, page 18) Level 1:Exploration and Development Level 2: Effectiveness Level 3: Scaling-up |
23 |
Conservation evidence |
What Works in Conservation |
Yes, although not numbered 1. Experts are asked to read the summarized evidence in the synopsis and then score to indicate their assessment of the following: 2. Effectiveness: 0 = no effect, 100% = always effective. 3. Certainty of the evidence: 0 = no evidence, 100% = high quality evidence; complete certainty. This is certainty of effectiveness of intervention, not of harms. 4. Harms: 0 = none, 100% = major negative side-effects to the group of species/habitat of concern. 5. The median score from all the experts’ assessments is calculated for the effectiveness, certainty and harms for each intervention. 6. Effectiveness categorization is based on these median values (i.e. on a combination of the size of the benefit and harm and the strength of the evidence), as and listed as follow: a) Beneficial b) Likely to be beneficial c) Trade-offs between benefits & harms d) Unknown effectiveness e) Unlikely to be beneficial f) f. Likely to be ineffective or harmful |
24 |
What Works Centre for Children’s Social Care |
Evidence Standards |
Yes. Overall effectiveness: looking at the consistency of effect across different research studies Negative effect: The balance of evidence suggests that the intervention has a negative effect (meta-analysis OR most of the studies) Mixed or no effect: The balance of evidence (including the pooled effect size from meta-analysis where available) suggests that the intervention has no effect overall, or studies show a mixture of effects. Tends to positive effect: The balance of evidence suggests that the intervention has a positive effect. There are one or more studies showing a negative effect, but either there was a meta-analysis OR most of the studies that showed a positive effect. Consistently positive effect: Most published studies have positive effects and none have negative effects for this outcome. Some individual studies may show no effect. However, either the pooled effect (in a meta-analysis) or most studies AND the studies involving most of the participants have a positive effect. Strength of evidence: looking at how confident we can be about a finding, based on how the research was designed and carried out. They overall framework is provided by the EMMIE system. They have adapted the EMMIE-Q to provide a four-point rating for strength of evidence. 0. Very low strength evidence: No acceptable quality studies 1. Low strength evidence: One or two acceptable quality studies 2. Moderate strength evidence: Three or more acceptable quality studies. High quality review therefore possible. Between 0-3 EMMIE-Q requirements are met. 3. High strength evidence: Three or more acceptable quality studies. High quality review therefore possible. Between 4-6 EMMIE-Q requirements are met including all themes marked* (see below). An acceptable quality study must have the following characteristics (definition used by the Early Intervention Foundation-EIF): 1. The sample is sufficiently large to test for the desired impact (e.g. a minimum of 20 participants in the treatment group AND comparison group). 2. The study must use valid measures. These measures should reliable, standardised and validated independently of the study. 3. Comparability of groups is addressed in selection and/ or analysis. 4. An ‘intent-to-treat’ design is used. 5. The study should report on overall and differential attrition. EMMIE-Q requirements 1. A transparent and well-designed search strategy* 2. High statistical conclusion validity (at least four of the following are necessary for a study to be considered sufficient)* (a) Calculation of appropriate effect sizes (b) The analysis of heterogeneity (c) Use of a random effects model where appropriate (d) Attention to the issue of dependency (e) Appropriate weighting of individual effect sizes in the calculation of mean effect sizes 3. Sufficient assessment of the risk of bias 4. Attention to the validity of the constructs, with only comparable outcomes combined and/or exploration of the implications of combining outcome constructs* 5. Assessment of the influence of study design (e.g. separate overall effect sizes for experimental and quasi-experimental design) 6. Assessment of the influence of unanticipated outcomes or spin-offs on the size of the effect (e.g. quantification of displacement or diffusion of benefit) Requirements 1-4 (highlighted by *) are considered particularly important, and are required for any review to achieve a rating of 3, which is the highest rating in the scale. |
25 |
What Works Centre for Wellbeing |
GRADE (Grading of Recommendations Assessment, Development and Evaluation) |
Yes. High quality: Further research is very unlikely to change our confidence in the estimate of effect Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate Very low quality: Any estimate of effect is very uncertain |
26 |
What Works Centre for Crime Reduction |
EMMIE Framework |
Yes. EMMIE rates systematic review evidence against five dimensions: Effect: focuses on whether the evidence suggests the intervention led to an increase, decrease or had no impact on crime.; Mechanisms: focuses on what it is about the intervention that could explain its effect; Moderators: focuses on the circumstances and contexts in which the intervention is likely (or unlikely) to work ; Implementation: focuses on the conditions that should be considered when implementing the intervention ; Economic cost: focuses on the costs associated with the intervention, both direct and indirect, and whether there is any evidence of cost-benefit. |
27 |
Campbell Collaboration |
Campbell Collaboration Systematic Reviews: Policies and Guidelines |
No |
28 |
Community Preventive Services Task Force (CPSTF) |
The Community Guide |
Yes, although not numbered. The CPSTF uses the terms below to describe its findings. Recommended The systematic review of available studies provides strong or sufficient evidence that the intervention is effective. The categories of "strong" and "sufficient" evidence reflect the degree of confidence the CPSTF has that an intervention has beneficial effects. They do not directly relate to the expected magnitude of benefits. The categorization is based on several factors, such as study design, number of studies, and consistency of the effect across studies. Recommended Against The systematic review of available studies provides strong or sufficient evidence that the intervention is harmful or not effective. Insufficient Evidence The available studies do not provide sufficient evidence to determine if the intervention is, or is not, effective. This does NOT mean that the intervention does not work. It means that additional research is needed to determine whether or not the intervention is effective. |
29 |
U.S. Department of Health & Human Services |
Home Visiting Evidence of Effectiveness |
Yes. HomVEE assigns a rating of high, moderate, or low to each effectiveness study according to the quality of causal evidence it provides.
Assessing evidence of effectiveness To meet HHS’ criteria for an “evidence-based early childhood home visiting service delivery model,” models must meet at least one of the following criteria:
In both cases, the impacts considered must either (1) be found for the full sample or (2) if found for subgroups but not for the full sample, be replicated in the same domain in two or more studies using non-overlapping analytic study samples. For results from single-case designs to be considered toward the HHS criteria, three additional requirements must be met:
The HomVEE team examined and reported other aspects of the evidence for each model based on all high- and moderate-quality studies available, including the following:
|
30 |
What Works Clearinghouse |
Find What Works from Systematic Reviews |
Yes. a) The results are sorted by evidence of effectiveness: Effectiveness Rating Key: it is based on the quality of research, the statistical significance of findings, the magnitude of findings, and the consistency of findings across studies Positive: strong evidence that intervention had a positive effect on outcomes. Potentially Positive: evidence that intervention had a positive effect on outcomes with no overriding contrary evidence. Mixed: evidence that intervention’s effect on outcomes is inconsistent. No Discernible: no evidence that intervention had an effect on outcomes. Negative: strong evidence that intervention had a negative effect on outcomes b) The program also lets you compare interventions (Max. 5 interventions). It will allow you to see basic information for each intervention, such as grades examined, program type, delivery method, and the effectiveness rating. c) For single-case design research, the WWC rates the effectiveness of an intervention in each domain based on the quality of the research design and the consistency of demonstrated effects. |
31 |
Center for the Study and Prevention of Violence |
Blueprints |
Yes. Blueprints considers four criteria:
Program Criteria: Promising Programs meet the following standards:
Model Programs meet these additional standards:
Model Plus Programs meet one additional standard:
|
32 |
California Department of Social Services (CDSS) |
California Evidence-Based Clearinghouse for Child Welfare |
Yes. Scientific Rating Scale 1. Well-Supported by Research Evidence Multiple Site Replication and Follow-up 2. Supported by Research Evidence Randomized Controlled Trial and Follow-up: 3. Promising Research Evidence At least one study utilizing some form of control (e.g., untreated group, placebo group, matched wait list study) has been established 4. Evidence Fails to Demonstrate Effect Two or more randomized controlled trials (RCTs) have found the practice has not resulted in improved outcomes, when compared to usual care. The studies have been reported in published, peer-reviewed literature. 5. Concerning Practice If multiple outcome studies have been conducted, the overall weight of evidence suggests the intervention has a negative effect upon clients served; and/or NR. Not able to be Rated on the CEBC Scientific Rating Scale Measurement Tools Rating Scale: based on the level of psychometrics (e.g., sensitivity and specificity, reliability and validity) found in published, peer-reviewed journals A - Psychometrics Well-Demonstrated: 2 or more published, peer-reviewed studies have established the measure’s psychometrics. B - Psychometrics Demonstrated: 1 published, peer-reviewed study has established the measure’s psychometrics. C - Does Not Reach Acceptable Levels of Psychometrics: A preponderance of published, peer-reviewed studies have shown that the measure does not reach acceptable levels of psychometrics. NR - Not Able to Be Rated: Published peer-reviewed studies demonstrating the measure’s psychometrics are not available. *Sensitivity: a measure of how well a test identifies people with a specific disease or problem *Specificity: a measure of how well a test excludes people without a specific disease or problem *Reliability: the extent to which the same result will be achieved when repeating the same measure or study again *Validity: the degree to which a result is likely to be true and free of bias. |
33 |
Center for Research and Reform in Education (CRRE) at Johns Hopkins University School of Education |
Best Evidence Encyclopedia |
Yes. Strong Evidence of Effectiveness: At least one large randomized or randomized quasi-experimental study and one additional large qualifying study, or multiple smaller studies, with a combined sample size of 500 and an overall weighted mean effect size of at least +0.20. Moderate Evidence of Effectiveness: Two large matched studies, or multiple smaller studies with a collective sample size of 500 students, with a weighted mean effect size of at least +0.20. Limited Evidence of Effectiveness: Strong Evidence of Modest Effects: Studies meet the criteria for “Moderate Evidence of Effectiveness” except that the weighted mean effect size is +0.10 to +0.19. Limited Evidence of Effectiveness: Weak Evidence with Notable Effect: A weighted mean effect size of at least +0.20 based on one or more qualifying studies insufficient in number or sample size to meet the criteria for “Moderate Evidence of Effectiveness”. |
34 |
Center for Research and Reform in Education (CRRE) at Johns Hopkins University School of Education |
Evidence for ESSA (Every Student Succeeds Act ) |
Yes. The organization recognizes four levels of evidence. The top three levels require findings of a statistically significant effect on improving student outcomes or other relevant outcomes. Strong evidence: At least one well-designed and well-implemented experimental (i.e., randomized) study. Moderate evidence: At least one well-designed and well-implemented quasi-experimental (i.e., matched) study. Promising evidence: At least one well-designed and well-implemented correlational study with statistical controls for selection bias. The fourth level is a program or practice that does not yet have evidence qualifying for the top 3 levels, and can be considered evidence-building and under evaluation. |
35 |
Society for Prevention Research |
Standards of Evidence for Efficacy, Effectiveness, and Scale-up Research in Prevention Science. |
Yes, although not numbered. Standards for Efficacy Standards for Effectiveness Standards for Scaling Up of Evidence-Based Interventions |
36 |
U.S. Department of Health and Human Services |
Evidence Based Teen Pregnancy Programs |
Yes. Study quality rating: In terms of study design, Attrition, Baseline equivalence, Reassignment, Confounding factors. 1. High 2. Moderate 3. Low All impact studies meeting the criteria for a high or moderate study quality rating are considered eligible for providing credible evidence of program impacts. The program’s evidence of effectiveness (by domain) is classified as 1. Positive impacts: Evidence of uniformly favourable impacts across one or more outcome measures, analytic samples (full sample or subgroups), and/or studies. 2. Mixed impacts: Evidence of a mix of favourable, null, and/or adverse impacts across one or more outcome measures, analytic samples (full sample or subgroups), and/or studies. 3. Indeterminate impacts: Evidence of uniformly null impacts across one or more outcome measures, analytic samples (full sample or subgroups), and/or studies. 4. Negative impacts: Evidence of uniformly adverse impacts across one or more outcome measures, analytic samples (full sample or subgroups), and/or studies. |
37 |
National Institute of Justice |
Crime solutions |
Yes. a) Programs undergo an eight-step review and evidence-rating process. b) Practices undergo a seven-step review and evidence-rating process c) Then they address program and practice evaluations in an evidence continuum with two axes: (1) Effectiveness and (2) Strength of Evidence. Effectiveness is determined by the outcomes of an evaluation in relation to the goals of the program or practice. Strength of evidence for programs is determined by the rigor and design of the outcome evaluation, and by the number of evaluations. Rated as Effective: Programs and practices have strong evidence to indicate they achieve criminal justice, juvenile justice, and victim services outcomes when implemented with fidelity. Rated as Promising: Programs and practices have some evidence to indicate they achieve criminal justice, juvenile justice, and victim services outcomes. Included within the promising category are new, or emerging, programs for which there is some evidence of effectiveness. Inconclusive Evidence: Programs and practices that made it past the initial review but, during the full review process, were determined to have inconclusive evidence for a rating to be assigned. Rated as No Effects: Programs have strong evidence indicating that they had no effects or had harmful effects when implemented with fidelity |
38 |
Arnold Ventures |
Social Programmes that Work |
Yes. Suggestive tier: Programs that have been evaluated in one or more well-conducted RCTs (or studies that closely approximate random assignment) and found to produce sizable positive effects, but whose evidence is limited by only short-term follow-up, effects that fall short of statistical significance, or other factors. Such evidence suggests the program may be an especially strong candidate for further research, but does not yet provide confidence that the program would produce important effects if implemented in new settings. Near top tier: Programs shown to meet almost all elements of the Top Tier standard, and which only need one additional step to qualify. This category primarily includes programs that meet all elements of the Top Tier standard in a single study site, but need a replication RCT to confirm the initial findings and establish that they generalize to other sites. This is best viewed as tentative evidence that the program would produce important effects if implemented faithfully in settings and populations similar to those in the original study. Top tier: Programs shown in well-conducted RCTs, carried out in typical community settings, to produce sizable, sustained effects on important outcomes. Top Tier evidence includes a requirement for replication – specifically, the demonstration of such effects in two or more RCTs conducted in different implementation sites, or, alternatively, in one large multi-site RCT (Is this equivalent to effectiveness?). Such evidence provides confidence that the program would produce important effects if implemented faithfully in settings and populations similar to those in the original studies. |
39 |
Childs Trends |
What Works for Child and Youth Development |
No, only Eligibility Criteria for Analysis |
40 |
Washington State Institute for Public Policy (WSIPP) |
Washington State Institute for Public Policy Benefit-Cost Results |
No |
41 |
U.S. Department of Justice |
Office of Juvenile Justice and Delinquency Prevention Model Programs Guide |
Yes. Based on the reviewers' assessment of the evidence, programs included in the Model Programs Guide and CrimeSolutions.gov receive one of the following evidence ratings:
Effective: programs have strong evidence indicating they achieve their intended outcomes when implemented with fidelity.
Promising: programs have some evidence indicating they achieve their intended outcomes. Additional research is recommended.
No Effects: programs have strong evidence indicating that they did not achieve their intended outcomes when implemented with fidelity.
A rating may be based on a single study or on more than one study: a single-study icon identifies programs that have been evaluated with only one study, while a multiple-studies icon represents a greater extent of evidence supporting the evidence rating. |
42 |
University of Wisconsin Population Health Institute |
What Works for Health |
Yes. Evidence ratings:
Scientifically Supported
Some Evidence
Expert Opinion
Insufficient Evidence
Mixed Evidence
Evidence of Ineffectiveness |
43 |
U.S. Department of Health and Human Services |
The Agency for Healthcare Research and Quality (AHRQ) |
Yes.
Strong: the evidence is based on one or more evaluations using experimental designs based on random allocation of individuals or groups of individuals. The results of the evaluation(s) show consistent direct evidence of effectiveness.
Moderate: while there are no randomized, controlled experiments, the evidence includes at least one systematic evaluation of the impact of the innovation using a quasi-experimental design, which could include the non-random assignment of individuals to comparison groups, before-and-after comparisons in one group, and/or comparisons with a historical baseline or control. The results of the evaluation(s) show consistent direct or indirect evidence of effectiveness. However, the strength of the evidence is limited by the size, quality, or generalizability of the evaluations, and thus alternative explanations cannot be ruled out.
Suggestive: while there are no systematic experimental or quasi-experimental evaluations, the evidence includes non-experimental or qualitative support for an association between the innovation and targeted health care outcomes or processes (or structures, in the case of health care policy innovations). This evidence may include non-comparative case studies, correlation analysis, or anecdotal reports. As with the category above, alternative explanations for the results achieved cannot be ruled out. |
44 |
U.S. Department of Labor |
Clearinghouse for Labor Evaluation and Research (CLEAR) |
Yes, although only for causal studies.
High Causal Evidence: there is strong evidence that the effects estimated in the study are solely attributable to the intervention being examined. This does not necessarily mean that the study found positive impacts, only that the analysis meets high methodological standards and the causal impacts estimated, whether positive, negative, or null, are credible. Currently, only well-implemented randomized controlled trials can receive this rating.
Moderate Causal Evidence: there is evidence that the effects estimated in the study are attributable at least in part to the intervention being examined. However, there may be other factors that were not accounted for in the study that might also have contributed. Causal studies that meet CLEAR evidence guidelines for non-experimental designs (including randomized controlled trials with high attrition) can receive this rating.
Low Causal Evidence: there is little evidence that the effects estimated in the study are attributable to the intervention being examined, and other factors are likely to have contributed to the results. This does not imply that the study's results are not useful for some purposes, but they should be interpreted with caution. Causal studies that do not meet the criteria for a high or moderate evidence rating receive this rating.
CLEAR presents separate guidelines for reviewing quantitative descriptive studies and implementation studies. A minimal decision sketch follows this entry. |
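The three causal ratings reduce to a short decision rule over study design. A minimal Python sketch, assuming two illustrative boolean attributes; CLEAR's actual guidelines involve detailed, design-specific criteria:

```python
# Illustrative sketch only: assigns a CLEAR-style causal evidence rating from
# two study attributes. Attribute names are assumptions made for illustration.

def causal_evidence_rating(is_well_implemented_rct: bool,
                           meets_nonexperimental_guidelines: bool) -> str:
    if is_well_implemented_rct:
        return "High Causal Evidence"
    # Includes RCTs with high attrition that still meet the guidelines
    # CLEAR applies to non-experimental designs.
    if meets_nonexperimental_guidelines:
        return "Moderate Causal Evidence"
    return "Low Causal Evidence"
```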
45 |
Clearinghouse for Military Family Readiness |
Continuum of evidence |
Yes. Criteria used to evaluate evidence: Significant Effect, Sustained Effect, Successful External Replication, Study Design, and Additional Criteria Regarding Study Execution.
Continuum of Evidence: 1. Effective 2. Promising 3. Unclear 4. Ineffective |
46 |
Suicide Prevention Resource Center |
Evidence-Based Practices Project |
Yes. Scoring criteria: reviewers rated the quality of program evaluations using 10 items, each scored on a scale of 1-5 or 0-5:
1. Theory 2. Intervention fidelity 3. Design 4. Attrition 5. Psychometric properties of measures 6. Analysis 7. Threats to validity 8. Safety 9. Integrity 10. Utility
Classification criteria: classification of programs as insufficient current support, promising, or effective is based solely on the average scores for two items, integrity and utility. After averaging the reviewers' scores for each item, the lower of the two averages determines the classification level (a minimal sketch of this rule follows this entry):
Insufficient current support: < 3.5
Promising: 3.5 - 3.9
Effective: 4.0 - 5.0 |
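The classification rule is fully arithmetic: average the reviewers' scores for integrity and utility, take the lower of the two averages, and compare it against the published cut-offs. A minimal Python sketch; the function and parameter names are illustrative assumptions, not part of the published standard:

```python
# Illustrative sketch only: applies the classification rule above, given
# lists of reviewer scores for the integrity and utility items.

def classify_program(integrity_scores: list[float],
                     utility_scores: list[float]) -> str:
    integrity_avg = sum(integrity_scores) / len(integrity_scores)
    utility_avg = sum(utility_scores) / len(utility_scores)
    # The lower of the two averaged items determines the classification.
    score = min(integrity_avg, utility_avg)
    if score >= 4.0:
        return "Effective"
    if score >= 3.5:
        return "Promising"
    return "Insufficient current support"

# Example: integrity averages 4.2, utility 3.6 -> min is 3.6 -> "Promising"
print(classify_program([4.0, 4.4], [3.4, 3.8]))
```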
47 |
National Implementation Research Network |
The Hexagon: An Exploration Tool |
Yes. The rating criteria are applied to each of the following indicators on a scale from 1 to 5, with 5 being the best.
Implementing site indicators:
Fit with current initiatives: 1. Alignment with community, regional, and state priorities. 2. Fit with family and community values, culture, and history. 3. Impact on other interventions and initiatives. 4. Alignment with organizational structure.
Need: 1. Target population identified. 2. Disaggregated data indicating population needs. 3. Parent and community perceptions of need. 4. Addresses service or system gaps.
Capacity to implement: 1. Staff meet minimum qualifications. 2. Able to sustain staffing, coaching, training, data systems, performance assessment, and administration (financial capacity, structural capacity, and cultural responsivity capacity). 3. Buy-in process operationalized (practitioners and families).
Program indicators:
Evidence: 1. Strength of evidence, for whom and in what conditions (number of studies, population similarities, diverse cultural groups, and efficacy or effectiveness). 2. Outcomes: is it worth it? 3. Fidelity data. 4. Cost-effectiveness data.
Usability: 1. Well-defined program. 2. Mature sites to observe. 3. Several replications. 4. Adaptations for context.
Supports: 1. Expert assistance. 2. Staffing. 3. Training. 4. Coaching and supervision. 5. Racial equity impact assessment. 6. Data systems technology supports (IT). 7. Administration and system.
A minimal data-structure sketch follows this entry. |
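One possible way to hold Hexagon ratings in code is a mapping from each indicator to its 1-5 score, grouped by the two indicator families above. The layout and function name below are illustrative assumptions, not part of the NIRN tool itself:

```python
# Illustrative sketch only: records Hexagon indicator ratings (1-5, with 5
# best) and summarises them by indicator family.

HEXAGON_INDICATORS = {
    "implementing site": ["fit with current initiatives", "need",
                          "capacity to implement"],
    "program": ["evidence", "usability", "supports"],
}

def summarize_ratings(ratings: dict[str, int]) -> None:
    """ratings: indicator name -> score on the 1-5 scale."""
    for group, indicators in HEXAGON_INDICATORS.items():
        print(f"{group} indicators:")
        for name in indicators:
            score = ratings.get(name)
            if score is not None and not 1 <= score <= 5:
                raise ValueError(f"{name}: scores must be between 1 and 5")
            print(f"  {name}: {score if score is not None else 'unrated'}")

summarize_ratings({"evidence": 4, "need": 5, "usability": 3})
```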
48 |
National Dropout Prevention Center |
Model Programs Database |
Yes.
Strong Evidence of Effectiveness: these programs have been in existence for three years or more. They were evaluated using an experimental or strong quasi-experimental design conducted by an external evaluation team, and have strong empirical evidence demonstrating program effectiveness in reducing dropout and/or increasing graduation rates and/or having a significant impact on dropout-related risk factors.
Moderate Evidence of Effectiveness: these programs have been in existence for three years or more. They were evaluated using a quasi-experimental design conducted by an external or internal evaluation team, and have adequate empirical evidence demonstrating program effectiveness in reducing dropout and/or increasing graduation rates and/or having a significant impact on dropout-related risk factors.
Limited Evidence of Effectiveness: these programs may be relatively new. They were evaluated using a limited evaluation design (single-group pre- and post-test) conducted by an external or internal evaluation team. They have promising empirical evidence demonstrating program effectiveness in reducing dropout and/or increasing graduation rates and/or having a significant impact on dropout-related risk factors, which requires confirmation using more appropriate experimental techniques.
Insufficient Evidence of Effectiveness: these programs require additional information before a rating category can be determined.
A minimal decision sketch follows this entry. |
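The four tiers read as a decision rule over program age, evaluation design, and who conducted the evaluation. A minimal Python sketch under those assumptions; the parameter names and design categories are illustrative, and the Center's actual review weighs the evidence qualitatively:

```python
# Illustrative sketch only: a decision rule matching the rating descriptions
# above. Parameter names and design categories are assumptions.

def dropout_program_rating(years_in_existence: int,
                           design: str,
                           external_evaluators: bool) -> str:
    """design: "experimental", "strong quasi-experimental",
    "quasi-experimental", or "limited"."""
    mature = years_in_existence >= 3
    if (mature and external_evaluators
            and design in ("experimental", "strong quasi-experimental")):
        return "Strong Evidence of Effectiveness"
    if mature and design in ("quasi-experimental",
                             "strong quasi-experimental"):
        return "Moderate Evidence of Effectiveness"
    if design == "limited":
        return "Limited Evidence of Effectiveness"
    return "Insufficient Evidence of Effectiveness"
```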
49 |
National Cancer Institute |
Research-Tested Intervention Programs (RTIPs) |
No. Intervention evaluation and program materials are evaluated in four areas for the RTIPs review:
Research Integrity: reflects the overall confidence reviewers can place in the findings of a program's evaluation based on its scientific rigor. The Research Integrity rating system comprises 16 criteria scored by independent experts. Scores on each criterion are given on a 5-point scale ranging from low quality to high quality. The overall integrity score is the average of the 16 criteria, reflecting the merits of the science that went into the program evaluation.
Intervention Impact: describes whether, and to what degree, a program is usable and appropriate for widespread application and dissemination. This rating is determined by the Review Coordinators. Population reach and effect sizes are rated separately on a 5-point scale; these ratings are then combined using the RTIPs Intervention Impact rating table to determine the impact score.
Dissemination Capability: refers to the readiness of program materials for use by others, as well as a program's capability to offer services and resources to facilitate dissemination. The rating is given on a 5-point scale ranging from low quality (1.0) to high quality (5.0). Dissemination capability is measured through the assessment of three areas.
RE-AIM: a five-step framework designed to enhance the quality, speed, and public health impact of efforts to translate research into practice: Reach, Effectiveness, Adoption, Implementation, and Maintenance. |
50 |
Strengthening Families Evidence Review |
Standards of evidence |
Yes. High Rating Randomized controlled trials received a high rating if:
Moderate Rating Randomized controlled trials received a moderate rating if:
OR
Quasi-experimental designs received a moderate rating if:
Pre/post or other designs received a moderate rating if:
Low Rating
Unrated |