AGREE II Instrument

Scoring the AGREE II

A quality score is calculated for each of the six AGREE II domains. The six domain scores are independent and should not be aggregated into a single quality score.

i) Calculating Domain Scores

Domain scores are calculated by summing the scores of all individual items in a domain, across all appraisers, and scaling the total as a percentage of the maximum possible score for that domain, after subtracting the minimum possible score from both.

For example, suppose four appraisers give the following scores for Domain 1 (Scope & Purpose):

              Item 1   Item 2   Item 3   Total
Appraiser 1      5        6        6       17
Appraiser 2      6        6        7       19
Appraiser 3      2        4        3        9
Appraiser 4      3        3        2        8
Total           16       19       18       53


Maximum possible score = 7 (strongly agree) x 3 (items) x 4 (appraisers) = 84
Minimum possible score = 1 (strongly disagree) x 3 (items) x 4 (appraisers) = 12

The scaled domain score will be:

(Obtained score - Minimum possible score) / (Maximum possible score - Minimum possible score) x 100

= (53 - 12) / (84 - 12) x 100 = 41 / 72 x 100 = 0.5694 x 100 = 57%

If items are not included, appropriate modifications to the calculations of maximum and minimum possible scores are required.
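
The calculation above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the AGREE II instrument: the function name `scaled_domain_score` and its parameters are hypothetical, and it assumes every appraiser rated the same set of items on the 1 to 7 scale. Under that assumption, an item that is not included is simply left out of every appraiser's row, so the maximum and minimum possible scores shrink accordingly.

```python
from typing import Sequence


def scaled_domain_score(
    ratings: Sequence[Sequence[int]],
    min_rating: int = 1,   # 1 = strongly disagree
    max_rating: int = 7,   # 7 = strongly agree
) -> float:
    """Return the scaled domain score as a percentage.

    ratings[a][i] is the rating appraiser `a` gave to item `i`.
    Hypothetical helper: assumes every appraiser rated the same items.
    """
    n_appraisers = len(ratings)
    n_items = len(ratings[0])
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every appraiser must rate the same number of items")

    obtained = sum(sum(row) for row in ratings)
    max_possible = max_rating * n_items * n_appraisers
    min_possible = min_rating * n_items * n_appraisers
    return (obtained - min_possible) / (max_possible - min_possible) * 100


# Domain 1 ratings from the worked example: one row per appraiser.
domain_1 = [
    [5, 6, 6],
    [6, 6, 7],
    [2, 4, 3],
    [3, 3, 2],
]

print(round(scaled_domain_score(domain_1)))  # 57

# Assumed handling of an excluded item: drop Item 3 for all appraisers.
domain_1_without_item_3 = [row[:2] for row in domain_1]
print(scaled_domain_score(domain_1_without_item_3))  # 56.25
```

With Item 3 dropped, the obtained score is 35, the maximum is 7 x 2 x 4 = 56, and the minimum is 1 x 2 x 4 = 8, giving (35 - 8) / (56 - 8) x 100 = 56.25%.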

ii) Interpreting Domain Scores

Domain scores can be used to identify strengths and limitations of guidelines, to compare methodological quality between guidelines, or to select high quality guidelines for adaptation, endorsement, or implementation. At present, there are no empirical data to link specific quality scores with specific implementation outcomes (e.g., speed of adoption, spread of adoption) or specific clinical outcomes; this makes selection of quality thresholds to differentiate between high, moderate, and low quality guidelines a challenge. In the absence of these data, we provide examples of approaches that can be used to set quality thresholds:

• Prioritizing one domain: Through consensus or based on decisions by leadership, one quality domain may be prioritized over the others. Thus, thresholds can be created based on scores for the prioritized domain (e.g., high quality guidelines are those with a Domain 3 score >70%).

• Staged AGREE II appraisal: If users value one domain over the others, they can first appraise the guidelines using that domain only. Only those guidelines that meet a quality threshold for that domain (e.g., >70%) are then appraised using the other five AGREE II domains.

• Considering all domain scores: Users can create a threshold across all six domain scores based on consensus or decisions by leadership (e.g., high quality guidelines are those with domain scores that are all >70%). Alternatively, users might create different thresholds for each of the domains.

• Thresholds for improvement over time: If evaluating changes in scores for guidelines over time, users can create thresholds for improvement (e.g., at least 10% improvement in each domain score for guidelines by a particular developer over a period of five years).
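
The threshold approaches listed above could be expressed as simple checks on a set of scaled domain scores. The sketch below is only an illustration: the function names and the example scores are made up, the 70% cut-off is taken from the examples in the text rather than from any recommended standard, and "10% improvement" is assumed to mean a gain of 10 percentage points.

```python
from typing import Mapping


def passes_prioritized_domain(
    scores: Mapping[int, float], domain: int, threshold: float = 70.0
) -> bool:
    """Prioritizing one domain: only the chosen domain's score is checked."""
    return scores[domain] > threshold


def passes_all_domains(
    scores: Mapping[int, float], thresholds: Mapping[int, float]
) -> bool:
    """Considering all domains: each score must exceed its own threshold."""
    return all(scores[d] > t for d, t in thresholds.items())


def improved_in_every_domain(
    earlier: Mapping[int, float], later: Mapping[int, float], min_gain: float = 10.0
) -> bool:
    """Improvement over time, assuming min_gain is in percentage points."""
    return all(later[d] - earlier[d] >= min_gain for d in earlier)


# Made-up scaled domain scores (percent) for one hypothetical guideline,
# keyed by AGREE II domain number.
example_scores = {1: 57.0, 2: 80.0, 3: 75.0, 4: 90.0, 5: 65.0, 6: 72.0}
uniform_70 = {d: 70.0 for d in range(1, 7)}

print(passes_prioritized_domain(example_scores, domain=3))  # True
print(passes_all_domains(example_scores, uniform_70))       # False
```

For a staged appraisal, `passes_prioritized_domain` could serve as the first-stage filter, with the remaining five domains appraised only for guidelines that pass it. Passing a different value per domain in `thresholds` covers the case where users set separate thresholds for each domain.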

Any decisions about how to define quality thresholds should be made by a panel of all relevant stakeholders before beginning the AGREE II appraisals. Decisions should be guided by the context in which the guideline is to be used and by evaluating the importance of the different domains and items in that context.