INTRODUCTION

The massive volume of evidence generated through medical research worldwide continues to expand the scientific literature. Readers must, however, be mindful of the variation in quality and design across research studies. When reading a research paper, especially an original study that analyzes data, it is important not to begin directly with the results or discussion sections. Instead, the reader should first critically appraise the study by carefully examining the objectives and the methods. The objectives clarify what the researchers intended to investigate, while the methods explain how the study was conducted, including the study design, participants, measurements, and analyses. Standard, objective tools have been developed to appraise research papers. Evidence from multiple studies may be synthesized through systematic reviews and meta-analyses and used to inform clinical guidelines. However, a proper system was needed to assess the quality, or level, of evidence and to establish a hierarchy of evidence. This led to the development of the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) approach. The GRADE approach began around 2003 and is part of the ongoing evolution of evidence-based medicine, which started in the early 1980s. Its origins can be traced to a report published in the BMJ in 2004, and it has been further refined over time.[1] This paper discusses the basis of determining the certainty of evidence in evidence-based medicine, with special focus on the GRADE approach and its role in improving the reliability of guideline recommendations.
EVIDENCE-BASED MEDICINE

For many years, medical practice was guided by the Bolam test, which relied on the principle of accepted common practice.[2] This meant that if a responsible group of clinicians followed a particular practice, for example, prescribing a new drug, then it was considered an acceptable treatment, even if strong research evidence was not available. In simple terms, practice was justified by the fact that other clinicians were doing the same. This defense against negligence claims, grounded in customary professional practice, drew criticism, and in the late 1990s another ruling, known as the Bolitho test, established that clinical decisions must be supported by logic and reasoning.[3] The court emphasized that professional opinion must be reasonable and scientifically defensible, not just widely followed. During the 1980s, medicine was rapidly evolving. Medical research output was increasing sharply, and at the same time medico-legal cases against doctors were also rising. As a result, it was no longer sufficient to justify treatment decisions based only on tradition or common practice. Clinicians were increasingly expected to support their decisions with scientific evidence as part of sound logical reasoning. Meanwhile, research methods were improving and the volume of published research was growing enormously, yet not all studies were of good quality. This created the need for a structured, objective, and reproducible method to evaluate and interpret research findings, which gradually led to the development of evidence-based medicine (EBM), emphasizing the use of the best available scientific evidence for clinical decision-making.

CLINICAL PRACTICE GUIDELINES

Clinical practice guidelines are systematically developed statements that assist clinicians in making appropriate clinical decisions, usually regarding the treatment of a specific condition.
They not only recommend treatments but also consider the balance between benefits and harms of different options. Before the concept of EBM emerged, clinicians mainly relied on expert opinion and consensus to guide their practice. While expert experience was valuable, it was not always based on a systematic assessment of scientific evidence. With the growth of medical research, more rigorous and transparent methods were developed to combine and evaluate research findings, leading to systematic reviews and meta-analyses that collect, critically appraise, and synthesize all relevant studies on a particular topic in a structured manner.[4] Over time, the concept of a hierarchy of evidence also emerged, recognizing that some types of research, such as randomized controlled trials and systematic reviews, provide more reliable evidence than others.[5] Rating evidence in this way allowed it to be differentiated by reliability and fed directly into guideline development. The National Academy of Medicine worked in this area and set out criteria for the development of clinical practice guidelines.[6] According to these standards, guidelines should be based on a systematic review of the literature, clearly rate the quality of evidence and the strength of recommendations, and be developed by a multidisciplinary panel. The process should be explicit, transparent, and minimize bias and conflicts of interest. Guidelines should also present alternative options when appropriate. However, key features such as a transparent presentation of review results and a clear rating of the strength of recommendations were largely lacking in traditional approaches. The GRADE approach was developed to address these limitations by providing a transparent system to rate the certainty of evidence and the strength of recommendations, and it now forms the core process in developing evidence-based clinical practice guidelines.
Earlier systems often provided recommendations without clearly explaining how the evidence was judged or how confident users could be in the findings. This made it difficult for clinicians and guideline users to know how much trust to place in the recommendations. To address this gap, the GRADE approach introduced the concept of certainty of evidence, which indicates “the confidence that the truth lies on one side of a specified threshold or within a specified range”.[7] In this context, the “truth,” or parameter, refers to the answer to the research question being studied. This may include questions about the presence of an association between factors, the effect of a treatment or intervention, the prevalence of a condition, or any other question examined in a systematic review. Thus, the certainty of evidence reflects how confident we are that the study findings represent the true effect or true situation. Based on this assessment, the certainty of evidence is categorized into four levels: high, moderate, low, and very low, which evolved from earlier A, B, C or I, II, III classifications [Table 1].[8]

Table 1: Four categories of certainty of evidence

The core steps in performing a GRADE assessment include:
- Define the intent of the question and the precise target for the certainty rating (outcome or parameter).
- Determine the starting point: randomized controlled trials begin as high certainty, observational studies as low certainty.
- Assess domains for downgrading or upgrading certainty:
  - Downgrade for (i) risk of bias, (ii) inconsistency, (iii) imprecision, (iv) indirectness, and (v) publication bias.
  - Upgrade observational evidence when there is (i) a large effect, (ii) a clear dose–response relationship, or (iii) when all plausible residual confounding would reduce an apparent effect (i.e., strengthen confidence in the direction of the effect).
- Arrive at a final certainty rating (high, moderate, low, or very low).
- Present findings in narrative and tabular formats (e.g., summary of findings tables).

Findings from the certainty assessment are incorporated mainly into narrative statements of conclusions. For example, if the certainty of evidence in a causation study is high, the conclusion would be “this exposure increases harm,” whereas if it is moderate, it would be “this exposure probably increases harm.” Recommendations are made based on the balance of benefits and harms and the values and preferences of the target population. Resource implications, cost-effectiveness, and feasibility are also taken into account. On this basis, the certainty of evidence is used to classify recommendations as either strong or weak/conditional. The GRADE Working Group provides detailed guidance on conducting assessments, and reviewers can benefit from online software such as GRADEpro (GRADEpro GDT: GRADEpro Guideline Development Tool, available at https://gdt.gradepro.org),[9] which guides the reviewer through the assessment and helps generate summary of findings tables. Many organizations across the globe provide training in the GRADE approach. In India, organizations such as the Department of Health Research support capacity building in evidence-based guideline development.

PSYCHIATRY RESEARCH

Imprecision can be particularly prominent in psychiatric research due to small sample sizes or variability in outcome measurements. For example, trials evaluating treatments for rare psychiatric conditions such as treatment-resistant obsessive–compulsive disorder or early-onset psychosis may include relatively few participants, leading to wide confidence intervals around effect estimates. Such uncertainty can reduce confidence in the estimated treatment effects even when the direction of benefit appears favorable.
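For readers who find a schematic view helpful, the certainty-rating steps listed earlier can be sketched as a small, hypothetical Python function. The function name, domain labels, and numeric levels below are illustrative only; actual GRADE ratings are structured qualitative judgments, not arithmetic.

```python
# Illustrative sketch of GRADE certainty rating (not an official algorithm):
# start at "high" for randomized trials and "low" for observational studies,
# then move down one level per serious concern and, for observational
# evidence, up one level per upgrading factor, clamped to the four levels.

LEVELS = ["very low", "low", "moderate", "high"]

DOWNGRADE_DOMAINS = {"risk_of_bias", "inconsistency", "imprecision",
                     "indirectness", "publication_bias"}
UPGRADE_FACTORS = {"large_effect", "dose_response", "plausible_confounding"}

def rate_certainty(design, concerns=(), upgrades=()):
    """Return a GRADE-style certainty level for one outcome.

    design: "rct" or "observational"; concerns and upgrades are iterables
    of domain names drawn from the sets above.
    """
    level = 3 if design == "rct" else 1  # index into LEVELS
    for concern in concerns:
        if concern not in DOWNGRADE_DOMAINS:
            raise ValueError(f"unknown downgrading domain: {concern}")
        level -= 1
    # Upgrading is normally considered only for observational evidence.
    if design == "observational":
        for factor in upgrades:
            if factor not in UPGRADE_FACTORS:
                raise ValueError(f"unknown upgrading factor: {factor}")
            level += 1
    return LEVELS[max(0, min(level, len(LEVELS) - 1))]

# Example: a randomized trial downgraded once for serious imprecision.
print(rate_certainty("rct", concerns=["imprecision"]))  # moderate
```

In this sketch, a randomized trial with one serious concern lands at moderate certainty, while an observational study upgraded for a large effect also reaches moderate, mirroring the starting points and one-level moves described above.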
Indirectness often arises in psychiatry because study populations, interventions, or outcomes may not fully reflect real-world clinical settings; trial populations frequently differ from the patients seen in routine mental health care. Indirectness may also occur when pharmacological interventions are compared with a placebo, whereas clinicians often need to choose between pharmacological and psychological therapies, such as cognitive behavioral therapy. Although GRADE provides a structured framework for evaluating certainty of evidence, elements of subjectivity remain, particularly when judging domains such as indirectness or imprecision. This subjectivity may be more pronounced in psychiatric research because outcomes frequently rely on symptom rating scales and patient-reported measures rather than objective physiological markers. Furthermore, methodological challenges, such as difficulties in blinding psychotherapy trials or heterogeneity in diagnostic criteria across studies, may complicate the assessment of risk of bias and certainty of evidence. Given these challenges in applying the GRADE approach to psychiatric research, any downgrading of the certainty of evidence requires clear explanatory notes. These explanations improve transparency and help readers understand the reasons behind the judgments, allowing for more careful and informed interpretation of the evidence.

LIMITATIONS AND QUALITY ASSURANCE

GRADE assessments retain elements of subjectivity, particularly when reviewers judge domains such as indirectness or imprecision. To enhance reliability, assessments should be carried out independently by at least two reviewers, with discrepancies resolved by discussion or by involving a third reviewer.
GRADE can be time-intensive and requires training, which may limit its uptake without institutional support.

CONCLUSION

The GRADE framework advances rigorous, transparent appraisal of the certainty of evidence and provides a direct link between evidence and recommendations. Its structured approach improves the interpretability of systematic reviews and the credibility of clinical practice guidelines, thereby bridging research and clinical practice to enhance patient care. Systematic reviewers and guideline authors should be proficient in GRADE or collaborate with GRADE-trained methodologists to ensure that recommendations reflect both the evidence and the confidence we can place in it.

Author contribution statement
RH: Conceptualization, review, and editing. RSP: Conceptualization, first draft, and literature review. VH: Conceptualization, literature review, review of draft, and editing.

Disclosure of use of generative AI
No generative AI was used to generate content for the manuscript; AI assistance was used only for improving language and readability.

Declaration on use of copyrighted tools
No copyrighted tools, instruments, or proprietary materials were used.

Financial support and sponsorship
Nil.

Conflicts of interest
There are no conflicts of interest.
Published in: Journal of Psychiatry Spectrum
Volume 5, Issue 2, pp. 75-77