"Before operating on a patient’s brain, I realized, I must first understand his mind: his identity, his values, what makes his life worth living, and what devastation makes it reasonable to let that life end."

In his seminal work When Breath Becomes Air, Paul Kalanithi presents this simple yet powerful message about the balance medical professionals constantly strive to achieve when caring for their patients. Inherent in this message is the importance of a patient’s values and preferences, a cornerstone of true evidence-based practice.

But even after eliciting a patient’s values and preferences, clinicians and patients face the task of weighing those values and preferences as part of decision-making to ensure that the possible benefits of an intervention outweigh the possible harms. For instance, consider a situation where a person has an estimated 12% risk of having a heart attack over the next 10 years, and there is a medication that can reduce that risk by a relative 25% (95% CI 19% to 30%), for an absolute risk of 9% (95% CI 8.4% to 9.7%) and an absolute risk difference of -3% (95% CI -2.3% to -3.6%) over that 10-year period. Suppose the medication is also known to cause persistent fatigue in some people, with an absolute increase in the risk of persistent fatigue of 4.8% (95% CI 0.2% to 10%) over the same period. Surely, it would be overly simplistic to say that 4.8% is larger than 3% and conclude that the harms of this medication outweigh the benefits. But it would be equally simplistic to assume all people place enough weight on avoiding a heart attack relative to persistent fatigue that the possible benefits clearly outweigh the possible harms. So we come now to a pivotal question in medicine: How can we reconcile the values people may place on the possible outcomes of an intervention?
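As a quick check of the arithmetic above, the relative-to-absolute conversion can be sketched in a few lines of Python. This is simply a scaling of the CI endpoints by the baseline risk (an illustration of how the figures above line up, not code from any paper):

```python
baseline = 0.12  # estimated 10-year baseline risk of heart attack

# Relative risk reduction: point estimate and 95% CI
rrr, rrr_lo, rrr_hi = 0.25, 0.19, 0.30

# Absolute risk on treatment = baseline * (1 - RRR) -> 9% (8.4% to 9.7%)
abs_risk = baseline * (1 - rrr)
abs_risk_ci = (baseline * (1 - rrr_hi), baseline * (1 - rrr_lo))

# Absolute risk difference = -baseline * RRR -> -3% (-3.6% to -2.3%)
ard = -baseline * rrr
ard_ci = (-baseline * rrr_hi, -baseline * rrr_lo)

print(f"absolute risk: {abs_risk:.1%} ({abs_risk_ci[0]:.1%} to {abs_risk_ci[1]:.1%})")
print(f"risk difference: {ard:.1%} ({ard_ci[0]:.1%} to {ard_ci[1]:.1%})")
```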

Until recently, weighing values and preferences was something clinicians and patients largely had to confront qualitatively as part of the art of medicine. There is nothing inherently wrong with such qualitative approaches, and there is certainly nothing wrong with remembering excellent medical practice is both a science and an art. However, the Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group recently published a concept paper in BMJ Open that offers a decided step forward in navigating medical decision-making via the concept of certainty of net benefit.

“Defining certainty of net benefit: a GRADE concept paper” gives a detailed description of how one can generate and rate the certainty in a net effect estimate that combines different patient-important outcomes and the relative importance of those outcomes. The paper represents the culmination of over two years of deliberation and refinement within the GRADE Working Group, spearheaded by Brian Alper (founder of DynaMed and Vice President of Innovations and Evidence-Based Medicine Development at EBSCO Information Services). This effort pairs well with the GRADE Working Group’s 2017 paper on the construct of certainty of evidence, which clarified that certainty statements pertain to the certainty that an effect estimate lies on one side of a specified threshold or within a specified range. This threshold or range can be decided with noncontextualized, partially contextualized, or fully contextualized approaches. However, contextualization – especially full contextualization – sometimes presents practical obstacles. Indeed, the certainty of net benefit paper explicitly acknowledges this, stating that its motivation is to introduce certainty of net benefit as an approach that might help in developing fully contextualized certainty ratings (something hinted at in the 2017 paper). Although the present writing will not detail all the steps and nuances involved in evaluating certainty of net benefit, the following provides a high-level overview.

The assumptions of the net effect estimate model are that the outcomes to be combined have effect estimates that: (1) represent normally distributed data, (2) are independent from and not correlated with one another, and (3) are expressed using the same unit of measure. Additionally, the model assumes the point estimate for the net effect estimate is the simple sum of the point estimates for the component outcomes.
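Under these assumptions, combining outcomes reduces to summing point estimates and summing variances, with each standard error recoverable from the half-width of its 95% CI. A minimal sketch of that combination in Python (the function name is mine, not the paper's):

```python
import math

def combine_estimates(estimates, z=1.96):
    """Combine independent, normally distributed effect estimates
    expressed on a common scale.

    estimates: iterable of (point, ci_lower, ci_upper) tuples.
    Returns (net_point, net_ci_lower, net_ci_upper).
    """
    # The net point estimate is the simple sum of the component points.
    point = sum(p for p, _, _ in estimates)
    # Under independence, variances add; each SE is the CI half-width / z.
    variance = sum(((hi - lo) / (2 * z)) ** 2 for _, lo, hi in estimates)
    se = math.sqrt(variance)
    return point, point - z * se, point + z * se
```

With two component estimates of equal spread, the net interval comes out √2 times as wide as either component, reflecting the accumulated uncertainty.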


The assumption of normality is common in parametric methods and will not be considered further here, but the other assumptions deserve some scrutiny. The assumptions of independence and no correlation may not hold (e.g., might hospitalization for heart failure be correlated with all-cause mortality?), and the paper accordingly discusses methods for approaching correlations that might exist between effect estimates. It reasonably suggests that minor violations of these assumptions may not result in substantial error, and such allowance may be preferable to less explicit methods for attempting to balance benefit and harm. However, one must still be meticulous in selecting outcomes for combination: it would be inappropriate to combine outcomes that overlap substantially or where one outcome subsumes the other. For instance, it would be inappropriate to combine all-cause mortality and cardiovascular mortality, because cardiovascular mortality is a subset of all-cause mortality.
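For intuition on why correlation matters: the variance of a sum of two correlated quantities includes a covariance term, so ignoring a positive correlation understates the uncertainty of the net estimate. A small illustration (the correlation value here is invented for demonstration and is not a method from the paper):

```python
import math

def combined_se(se_a, se_b, rho=0.0):
    """Standard error of (a + b) when a and b have correlation rho.

    rho = 0 recovers the independence assumption; a positive rho adds
    a 2*rho*se_a*se_b covariance term and widens the combined SE.
    """
    return math.sqrt(se_a ** 2 + se_b ** 2 + 2 * rho * se_a * se_b)

print(combined_se(0.5, 0.5))       # independence assumed
print(combined_se(0.5, 0.5, 0.5))  # positive correlation: wider SE
```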

The assumption of a common unit of measure will rarely (if ever) hold. Fortunately, the use of a multiplicative conversion factor (the “relative importance value”) that expresses how important outcomes are relative to each other helps resolve this issue. Of note, the relative importance value not only results in a common unit of measurement via a reference outcome, but also allows for direct incorporation of patients’ values and preferences in a quantifiable manner. For instance, if we return to the hypothetical example and set heart attack as the reference outcome, heart attack receives a relative importance value of 1, and the net effect estimate is expressed in heart attack-equivalent events. Now, suppose the hypothetical patient considers a heart attack to be twice as important as persistent fatigue, so the relative importance value for persistent fatigue would be 0.5. With an absolute risk difference of -3% (95% CI -2.3% to -3.6%) for heart attack over a 10-year period and an absolute risk difference of 4.8% (95% CI 0.2% to 10%) for persistent fatigue over the same period, one can calculate the net effect estimate as 6 fewer (95% CI 31 fewer to 19 more) heart attack-equivalent events per 1,000 people treated for 10 years. That is, if 1,000 people were treated with this medication for 10 years, we would expect about 6 fewer heart attack-equivalent events, but the confidence interval suggests the results are reasonably compatible with anything from 31 people benefiting to 19 people being harmed. We are thus left with considerable uncertainty about the net effect of this medication, and the patient might understandably opt not to take it.
However, a different patient with a smaller relative importance value for persistent fatigue (say, 0.1) would see a net effect estimate of 25 fewer (95% CI 17 to 33 fewer) heart attack-equivalent events per 1,000 treated for 10 years, so certainty in net benefit is clearly higher in this case. Importantly, however, whether the magnitude of benefit is enough for the patient to take the medication is still a preference-sensitive decision (and may involve other considerations such as cost and burden/inconvenience). And simply quantifying the net effect estimate is not enough; one must also rate the certainty in that estimate.
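The two calculations above can be reproduced in a few lines. The sketch below (my own illustration of the arithmetic, not code from the paper) rescales each outcome by its relative importance value, sums the point estimates, and combines the variances under the independence assumption:

```python
import math

Z = 1.96  # normal quantile for a 95% confidence interval

def net_effect_per_1000(outcomes):
    """outcomes: list of (ard, ci_lo, ci_hi, relative_importance) with
    absolute risk differences expressed as proportions. Returns the net
    effect estimate and 95% CI per 1,000 treated, in reference-outcome-
    equivalent events (negative = fewer events, i.e., benefit)."""
    point = sum(w * ard for ard, _, _, w in outcomes)
    var = sum((w * (hi - lo) / (2 * Z)) ** 2 for _, lo, hi, w in outcomes)
    half_width = Z * math.sqrt(var)
    return tuple(round(1000 * x) for x in
                 (point, point - half_width, point + half_width))

# Heart attack is the reference outcome (relative importance 1.0);
# persistent fatigue is weighted 0.5, as for the first patient above.
outcomes = [
    (-0.030, -0.036, -0.023, 1.0),  # heart attack ARD over 10 years
    ( 0.048,  0.002,  0.100, 0.5),  # persistent fatigue ARD
]
print(net_effect_per_1000(outcomes))  # 6 fewer (31 fewer to 19 more)
```

Swapping the fatigue weight to 0.1 reproduces the second patient's estimate of 25 fewer (95% CI 33 to 17 fewer) heart attack-equivalent events per 1,000 treated.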

Rating the certainty of the net effect estimate involves assessing its precision and classifying the estimate accordingly (e.g., net benefit, net harm, or something in between; Table 1 and Figure 3 of the paper provide suggestions for making these judgments), assessing the certainty of the outcomes critical to the net effect estimate, and assessing how the net effect estimate might vary across the range of plausible relative importance values.
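The last of these assessments, how the net effect estimate varies across plausible relative importance values, lends itself to a simple sensitivity sweep. The sketch below (my own illustration, reusing the hypothetical numbers from the example above) recomputes the net estimate over a range of fatigue weights and notes whether the 95% CI still falls entirely on the benefit side:

```python
import math

Z = 1.96  # normal quantile for a 95% confidence interval

def net_ci(w_fatigue):
    """Net effect per 1,000 (point, lo, hi) in heart attack-equivalent
    events, for a given relative importance value for fatigue."""
    point = -0.030 + w_fatigue * 0.048
    se = math.sqrt((0.013 / (2 * Z)) ** 2 + (w_fatigue * 0.098 / (2 * Z)) ** 2)
    return tuple(1000 * x for x in (point, point - Z * se, point + Z * se))

for w in (0.1, 0.2, 0.3, 0.4, 0.5):
    point, lo, hi = net_ci(w)
    verdict = "net benefit" if hi < 0 else "uncertain"
    print(f"w={w:.1f}: {point:+.0f} ({lo:+.0f} to {hi:+.0f}) -> {verdict}")
```

With these particular numbers, certainty in net benefit erodes as the weight on fatigue grows: the interval crosses zero somewhere between weights of 0.2 and 0.3.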

The ability to quantify a net effect estimate is an exciting notion for medical decision-making, and the GRADE Working Group hopes the paper will help stimulate further discussion. To that end, one should note the GRADE Working Group presented the paper mainly with an intended audience of those involved in developing clinical practice guidelines. However, they did note that this concept might at some point be extended to the individual patient level. And indeed, to limit this idea to clinical practice guidelines would arguably be to short-sell the concept’s potential utility, because the most compelling use of this concept seems to be at the individual patient level as a part of shared decision-making. One can certainly try to establish average relative importance values for a given population, and the paper discusses means to accomplish this. However, such relative importance values are averages, not individual values and preferences; this distinction is critical, especially if one were to apply this concept as a part of shared decision-making. If no better estimate for relative importance exists, having population averages is certainly desirable, but tuning in to a given person’s values and preferences seems far preferable for bedside decision-making. This is much like how we prefer to have individualized absolute estimates of benefit and harm whenever possible, but we will ultimately rely (even if somewhat reluctantly) on absolute metrics from clinical trials or systematic reviews of clinical trials when more individualized estimation is not possible.

In sum, “Defining certainty of net benefit: a GRADE concept paper” has noteworthy potential to impact how clinicians and patients think about and make decisions in medicine. It is sure to stimulate interest and discussion surrounding how this concept might improve everything from clinical practice guidelines to shared decision-making, and also how we might be able to build on or improve the method itself.