Cervical cerclage is among the few obstetric therapies with demonstrated efficacy, but only within narrowly defined indications. The most recent FIGO Good Practice Recommendations deliberately restrict endorsement to the areas of strongest consensus [1]. These include most notably women with a history of three late second-trimester losses or preterm births, women with a cervical length <25 mm if they have had one or more spontaneous preterm births and/or mid-trimester losses, as well as those presenting with painless cervical dilation in the second trimester, classically defined as cervical insufficiency [1]. When effective, the benefit for mother, child, and family can be profound and enduring. Beyond these indications, however, certainty rapidly dissipates. The evidentiary foundation for cervical cerclage remains limited and heterogeneous, despite its widespread adoption. Although clinical trials and meta-analyses demonstrate overall benefit for pregnancy and child outcomes, they provide far less clarity regarding which women truly benefit [2]. As a field, we have prioritized implementation based on broad and simplified indications rather than systematically interrogating the biological and clinical heterogeneity that likely determines individual treatment response. This gap does not necessarily reflect failure of the intervention itself, but rather limitations in how we conceptualize and apply it. In this issue of Pregnancy, a multi-site retrospective observational study enters this uneasy space, offering data where randomized evidence remains conspicuously absent [3]. The authors examine whether cerclage placement, added to standard vaginal progesterone therapy, is associated with prolonged pregnancy latency among asymptomatic women with singleton pregnancies with a transvaginal cervical length of 10 mm or less and no prior spontaneous preterm birth. 
Their findings are directionally consistent with prior retrospective cohorts and subgroup analyses, reinforcing a growing clinical intuition: when cervical shortening is extreme, progesterone alone may not be enough [4, 5]. Yet intuition is not evidence, and the study cannot resolve the central question it raises. The investigators conducted a retrospective cohort study across three academic centers between January 2016 and June 2024. Among women undergoing universal mid-trimester transvaginal cervical length screening between 16 and 23+6 weeks’ gestation, those with a cervical length ≤10 mm were identified. To isolate a narrowly defined population without overlapping indications for cerclage, the authors applied strict exclusion criteria: any prior spontaneous preterm birth <34 weeks, prior cerclage, cervical surgery such as LEEP or cone biopsy, suspected Müllerian anomalies, major fetal anomalies, symptoms of preterm labor, cervical dilation greater than 1.5 cm, incomplete in-system follow-up, and some additional rare conditions [3]. All included patients were prescribed vaginal progesterone at diagnosis, reflecting contemporary standard of care. Of the 247 patients initially identified with a cervical length ≤10 mm, only 87 met criteria for analysis, highlighting how rapidly real-world screening cohorts shrink once strict phenotyping is applied. Among these 87 women, 55 (63%) underwent cerclage placement, while 32 (37%) were managed expectantly with progesterone alone. Cerclage technique (McDonald or Shirodkar), perioperative antibiotics, tocolytics, and follow-up protocols were provider dependent. Importantly, the cerclage group appeared higher risk at baseline in ways that matter clinically. These patients were diagnosed earlier in gestation, had significantly shorter cervical lengths, higher mean body mass index, and a greater proportion identified as Black, characteristics consistently associated with increased risk of preterm birth. 
In most observational settings, such baseline differences would be expected to bias outcomes against the cerclage group. Despite this unfavorable risk profile, cerclage placement was associated with longer pregnancy latency and higher gestational age at delivery (34 vs. 32 weeks). Mean latency was approximately 2 weeks longer in the cerclage group (13 vs. 11 weeks), and term delivery occurred more frequently. Kaplan–Meier analyses suggested a longer time to delivery among cerclage recipients. However, confidence intervals were wide across all outcomes and most often crossed the null. Although point estimates favored cerclage, the data remain statistically compatible with no effect and, for some outcomes, harm cannot be excluded. Neonatal outcomes, including neonatal intensive care unit admission, fetal demise, and neonatal death, did not differ meaningfully between groups, though event counts were small. No immediate procedural complications such as preterm prelabor rupture of the membranes or hemorrhage at placement were reported. These findings align with a growing body of observational literature suggesting that when cervical shortening is extreme, progesterone alone may be insufficient, and cerclage may offer incremental benefit. They also suggest that cerclage can be offered without obvious short-term procedural harm in carefully selected, asymptomatic patients. These are important contributions. But they are not answers. The principal limitation of this study is its observational design, which precludes causal inference and leaves the findings vulnerable to confounding by indication. Cerclage was not assigned randomly but selected through clinician judgment and patient preference, both informed by nuanced clinical features that are difficult or impossible to measure retrospectively. 
Providers do not choose cerclage at random; they choose it for patients who appear riskier based on a complex synthesis of sonographic findings, clinical trajectory, and intuition. Statistical adjustment can blunt but never eliminate this bias. Baseline imbalances complicate interpretation, but their direction is important. Differences in gestational age at diagnosis, cervical length, body mass index, and racial distribution suggest that the cerclage group may have carried a higher baseline risk profile. If so, confounding by indication would be expected to bias results against cerclage rather than in its favor. The fact that outcomes nonetheless trend toward benefit makes simple selection bias an incomplete explanation, though residual confounding cannot be excluded. In small cohorts, the trade-off between overfitting and residual confounding becomes unavoidable and ultimately defines the study's principal limitation. Other methodological considerations also warrant attention. Prescription of vaginal progesterone does not equate to adherence, which may vary systematically between groups and correlate with other determinants of outcome such as health literacy, access to care, and perceived risk. The cerclage intervention itself was heterogeneous, encompassing different surgical techniques, perioperative regimens, and follow-up practices. The observed “cerclage effect” therefore represents an average across multiple practice patterns, complicating translation into a specific protocol. Outcome definitions further blur interpretation. The analysis did not distinguish spontaneous preterm birth from medically indicated preterm birth. If cerclage primarily affects spontaneous preterm birth, inclusion of indicated deliveries related to preeclampsia or fetal growth restriction may dilute or distort treatment effects. Similarly, the finding of higher cesarean delivery rates among cerclage recipients cannot be interpreted without information on delivery indications. 
Strict exclusion criteria, while enhancing internal validity, also narrow generalizability. Many patients encountered in routine practice (those with prior cervical procedures, borderline symptoms, incomplete adherence, or fragmented care) would not resemble the study population. Future trials must move beyond sonographic thresholds alone. Cervical length is a marker of risk, not a diagnosis, and it cannot serve as the sole determinant of intervention in a condition driven by heterogeneous biological pathways. The central unresolved question is not merely whether cerclage prolongs pregnancy on average, but whether it should be offered at all in the presence of intra-amniotic inflammation or infection, contexts in which the underlying causal pathway may render a mechanical intervention ineffective or potentially harmful. A suture cannot correct an inflammatory cascade, nor can it sterilize an infected amniotic cavity. This is not a marginal issue; it is arguably the strongest confounding factor in the entire cerclage literature. Subclinical intra-amniotic inflammation and infection fundamentally alter prognosis, yet they are rarely incorporated into cerclage trial design, stratification, or clinical algorithms. As a result, treatment effects are averaged across biologically distinct disease states (mechanical cervical insufficiency, sterile intra-amniotic inflammation, and microbial invasion of the amniotic cavity), thereby diluting true benefit in some women while masking harm in others. What appears as a modest or inconsistent treatment effect may in fact reflect the arithmetic consequence of mixing incompatible pathophysiology. This issue needs to be approached with either invasive amniocentesis or non-invasive biomarker tests to assess the intra-amniotic environment [6, 7]. Without biologic stratification, particularly of the intra-amniotic environment, confounding by mechanism remains embedded in our evidence base. 
Trials that fail to account for this are not simply underpowered; they are conceptually incomplete. If we are serious about advancing cerclage care, future studies must incorporate meaningful phenotyping and mechanistic differentiation. Otherwise, we will continue to generate averaged answers to the wrong question, and precision in patient selection will remain aspirational rather than real. Observational studies like this one can illuminate the contours of uncertainty, but they cannot resolve it. Cerclage is an intervention we know can work, yet outside its classic indications, we still struggle to deploy it with precision. To help the greatest number of women and their families in the best possible way, belief is not enough. The field does not need more retrospective reassurance. It needs adequately powered, pragmatic randomized controlled trials designed to reflect real-world practice and biologic heterogeneity. Taken together, this study sharpens clinical equipoise rather than resolving it. It strengthens the signal that cerclage may prolong pregnancy in women with an extremely short cervix even without a prior spontaneous preterm birth, but it also underscores how little we truly know about who benefits, how much, and at what cost. This persistent uncertainty is not merely methodological; it is ethical and epistemological. One is therefore compelled to ask: why were these women not randomized? A common response is pragmatic: limited staffing and funding, insufficient infrastructure, and competing clinical demands. Yet from a broader perspective, this explanation exposes a structural failure rather than a justification. Health care systems should be organized to generate knowledge as an integral part of care, not as an optional academic add-on. When uncertainty directly affects patient outcomes, the production of high-quality evidence is itself a clinical responsibility. 
Clinical trial units and embedded research pathways should not be luxuries confined to major academic centers, but standard components of modern obstetric care and part of the cost of care. Until such structural reform occurs, we remain dependent on the traditional model: randomizing one patient at a time in an academic add-on setting, study by study, institution by institution. That model is slower and more fragile, but it remains ethically preferable to perpetuating uncertainty through observational inference alone. If equipoise is real, then systematic randomization is not an inconvenience; it is an obligation. Without it, we continue to practice in uncertainty while postponing the very evidence our patients need. If we accept anything less than the best possible science, we institutionalize mediocrity, which, once embedded, becomes self-perpetuating. Weak evidence has a lock-in effect: it shapes guidelines, normalizes practice patterns, lowers the evidentiary bar for future research, and might hinder additional studies in the field. Over time, provisional assumptions harden into doctrine. That is how uncertainty becomes tradition. We cannot reframe insufficient evidence as “good enough” science, nor can we compensate for methodological weakness by leaning disproportionately on shared decision-making. Shared decision-making is ethically essential, but it cannot substitute for rigorous evidence. When invoked to fill evidentiary gaps, it risks transferring the burden of scientific uncertainty onto patients rather than resolving it. It is our responsibility to generate the strongest science possible and to resist settling for less. Until we do, clinicians will continue to practice in the gray zone, where shared decision-making compensates for fragile evidence instead of complementing robust knowledge. That may feel pragmatic in the short term, but in the long run, it delays the very clarity our patients deserve. A day without randomization is a lost day! 
This editorial has been edited with the assistance of an artificial intelligence (AI) tool for language clarity, grammar, and stylistic improvement. The authors take full responsibility for the content of the review. The authors declare no conflicts of interest. Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.