Rachel Oughton
22/02/2023
In this lecture we will cover decision theory, decision trees, diagnostic testing, and ROC analysis.
An important aspect of health economics problems is that a decision needs to be made in the presence of uncertainty.
This is often done using decision theory.
A decision analytic model uses the probabilities of random events, together with the outcomes and costs that follow them, to compare the decisions available.
As well as the decisions we make, there are events that are random
This information is combined to deduce the best decision.
By combining the probabilities of events with their outcomes and costs, we can calculate the expected outcome and cost associated with each sequence of decisions.
In general in decision theory we express the outcome in terms of utility.
You can read about the link between utility and the QALY in Whitehead and Ali (2010).
In health economics, the decisions relate to efficacy and resource allocation.
There is uncertainty inherent in these decisions:
In health economic decision analysis there are two basic questions:
1. Should this treatment / technology / intervention etc. be adopted, given the existing evidence and the uncertainty surrounding its outcomes? If so, which strategy should be adopted, and for which cohort(s)?
2. Is more evidence / information required before question 1 can be answered?
We will look at some aspects of each of these.
Decision theory addresses the question
“How should one decide what to do when things are uncertain?”.
In decision theory, we formalise the decisions available, the events that might follow them, the probabilities of those events, and the resulting outcomes and costs.
We can also use decision theory to answer questions like “How much should we be willing to pay for more information?”
The ingredients of a decision problem are:
We will illustrate this throughout with the simple example of a lottery ticket.
The first task is to list all the possible decisions you can make, often written \(d_1,\ldots,d_k\).
For example, let’s say you’re deciding whether to buy a lottery ticket. Then we might have \[ \begin{align*} d_1: & \text{ Buy a ticket} \\ d_2: & \text{ Don't buy a ticket} \end{align*} \]
In this example, there is only one ‘phase’ of decision-making.
In many problems there might be several points at which a decision needs to be made, and options might depend on what has happened before.
The second set of ingredients is the events, often labelled \(E_1,\ldots,E_m\).
We use this in the probabilistic sense, to mean the set of possible things that might happen after you have made your decision (or between subsequent decisions, if there are multiple phases).
The events must be defined so that exactly one of them will happen: they are exhaustive and mutually exclusive.
In our example about buying a lottery ticket, we might have:
\[ \begin{align*} E_1: & \text{ You win} \\ E_2: & \text{ You don't win} \end{align*} \]
When we make the decision, we don’t know which event will occur.
At each point where one of a set of events will occur, we need the probability of each event.
These probabilities are often based on information from studies, trials, or other research.
In our lottery example, we might have:
\[ \begin{align*} p\left(E_1\mid{d_1}\right) & = 0.0001 \text{ (winning given you bought a ticket)} \\ p\left(E_1\mid{d_2}\right) & = 0 \text{ (winning given you didn't buy a ticket)} \end{align*} \]
The rewards (or payoffs) are the consequences following each combination of decisions and events.
Note that although these are called ‘rewards’, they can sometimes be bad!
In health economics, these will often be expressed in QALYs.
Linked to rewards is the idea of costs.
Usually, each decision will incur a cost.
In our example let’s say the cost of buying a lottery ticket is £1, and the reward if you win is £500.
In health economics, there may be other costs associated with outcomes, eg. ongoing treatment.
We can put everything together into a decision tree.
A decision tree is made of nodes and branches, arranged to show the order in which decisions are made and events occur.
There are two types of node: decision nodes, where we choose what to do, and chance nodes, where one of a set of events occurs.
At each node there are then a number of branches, depending on the number of possible events or decisions.
For any combination of decisions and events, we end up at a reward / outcome.
It is important that the pathways are mutually exclusive.
The decision tree for our lottery example.
Time goes from left to right
The point of the decision tree was to help us to make a decision.
So, how do we use it to do that?
To solve a decision tree we:

- Combine the probabilities and outcomes to find the expected value of each decision.
- Choose the decision(s) that lead to the best expected outcome.
At each decision node, we rule out the options with lower expected values, leaving one path: the path with the best expected outcome.
We construct the tree from left (the first decision node) to right (the final outcomes)
We solve the tree from right to left.
We work through the tree, removing each node as we go, from right to left.
- For each chance node:
  - calculate the expectation of its branches (using the probabilities and outcomes),
  - write this value at the chance node and remove the part of the tree to its right.
- For each decision node:
  - choose the option that leads to the highest expected outcome,
  - then cross out all other branches.
When we have reached the root decision node, we will have found the optimal decision and its expected outcome.
For our lottery ticket example, we have
The chance node is the furthest to the right, so we will start there.
At the chance node, we have
- probability \(0.0001\) of an outcome of £499
- probability \(0.9999\) of an outcome of -£1.
Therefore our expected outcome (in £) is
\[0.0001 \times{499} + 0.9999 \times{-1} = -0.95 \]
Our tree therefore becomes
The decision tree for our lottery example, with the chance node solved.
We can now compare the two branches from the decision node: \(d_1\) has expected outcome \(-£0.95\), while \(d_2\) has expected outcome £0.
Therefore we cross out \(d_1\) and our optimal decision is not to buy a ticket.
The solved decision tree for our lottery example, with \(d_1\) crossed out.
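This backward induction is easy to mechanise. The sketch below is not from the lecture: `expected_value` is a hypothetical helper, and the branch lists simply encode the lottery tree above.

```python
# Solve the lottery decision tree by backward induction.

def expected_value(branches):
    """Expectation at a chance node: branches are (probability, outcome) pairs."""
    return sum(p * outcome for p, outcome in branches)

# Chance node for d1 (buy a ticket: cost £1, prize £500):
ev_buy = expected_value([(0.0001, 499), (0.9999, -1)])  # ≈ -0.95

# d2 (don't buy) has a certain outcome of £0.
ev_dont_buy = 0.0

# At the decision node, keep the option with the highest expected outcome.
best = max({"buy": ev_buy, "don't buy": ev_dont_buy}.items(), key=lambda kv: kv[1])
print(best[0])  # don't buy
```

The same pattern extends to larger trees: chance nodes become expectations and decision nodes become maxima, evaluated from the leaves back to the root.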
We can make decisions, but we don’t know what will happen at each chance node.
However, suppose we had the option to find out beforehand what will happen at each chance node; this would almost certainly have a big impact on our decision-making!
The expected value of perfect information (EVPI) is the difference between
- the expected outcome when we know beforehand what will happen at each chance node (ie. we have perfect information) and
- the expected outcome when we don’t know what will happen.
EVPI is calculated from the perspective of deciding whether or not to pay to gain the perfect information.
For our lottery example:
The probabilities of the information revealing each event are the same as the probabilities of the event.
So our expected outcome with perfect information is:
\[ 0.0001\times{£499} + 0.9999\times{£0} = £0.0499.\] Our optimal decision before (not to buy a ticket) had expected outcome £0.
So our expected value of perfect information is
\[ EVPI = £0.0499 - £0 = £0.0499. \]
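As a quick check, the EVPI calculation above can be sketched in a few lines (hypothetical code, using the figures from the example):

```python
# EVPI for the lottery: learn the event first, then decide optimally.

p_win = 0.0001  # probability a bought ticket wins

# If told we would win, buy (net £499); otherwise don't buy (£0).
ev_with_info = p_win * 499 + (1 - p_win) * 0

# Best expected outcome without the information: don't buy, £0.
ev_without_info = 0.0

evpi = ev_with_info - ev_without_info
print(round(evpi, 4))  # 0.0499
```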
In this section:
Williams (1985) presents several scenarios for a certain class of angina patients.
We will go through a simplified version, but the paper is a useful reference.
Possible outcomes from the operation in QALYs, taken from Williams (1985).
These are the probabilities of those outcomes if the operation is performed:
The decision tree for our angina example. The QALY values have been estimated by eye from Williams (1985).
We can solve this with backward induction as before.
For the chance node following the operation, we have an expected outcome of
\[ \underbrace{0.67\times{9}}_{\text{Improvement}} + \underbrace{0.3\times{4.4}}_{\text{No change}} + \underbrace{0.03 \times{0}}_{\text{Operative mortality}} = 7.35 \text{ QALYs.}\]
The expected outcome of staying on the medical management plan is 4.4 QALYs.
Therefore the optimal decision is to have the operation.
So far we have just considered QALYs.
In fact, each decision option has associated with it a cost.
In this case, the operation costs £3000 and medical management costs £500.
Therefore our incremental cost-effectiveness ratio is
\[ ICER = \frac{3000-500}{7.35 - 4.4} = \frac{2500}{2.95} = 847.46, \] and if our willingness-to-pay threshold is above £847.46 per QALY, we will perform the operation.
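The ICER arithmetic can be checked with a short sketch (hypothetical code, using the figures above):

```python
# ICER for the angina example: incremental cost per incremental QALY.

cost_op, cost_med = 3000, 500  # £: operation vs medical management

# Expected QALYs for each option (from the decision tree).
q_op = 0.67 * 9 + 0.3 * 4.4 + 0.03 * 0   # operation: 7.35
q_med = 4.4                               # medical management

icer = (cost_op - cost_med) / (q_op - q_med)
print(round(icer, 2))  # 847.46
```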
How much would we pay in this case for perfect information?
Our outcome with perfect information would be
\[\underbrace{0.67\times{9}}_{\text{Operation successful}} + \underbrace{0.3\times{4.4}}_{\text{No change}} + \underbrace{0.03\times{4.4}}_{\text{Operative mortality}} = 7.482.\]
This is \(7.482 - 7.35 = 0.132\) QALYs more than our expected outcome.
Therefore we will pay up to 0.132 times our willingness-to-pay-threshold for perfect information.
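This EVPI calculation can also be sketched in code; the willingness-to-pay threshold `wtp` below is a hypothetical value, not one from the lecture.

```python
# EVPI in QALYs for the operation decision: with perfect information,
# an "operative mortality" result would lead us to choose medical
# management (4.4 QALYs) instead of the operation.

ev_operation = 0.67 * 9 + 0.3 * 4.4 + 0.03 * 0    # 7.35 QALYs
ev_perfect = 0.67 * 9 + 0.3 * 4.4 + 0.03 * 4.4    # 7.482 QALYs

evpi_qalys = ev_perfect - ev_operation            # 0.132 QALYs

wtp = 20000  # hypothetical willingness-to-pay threshold, £ per QALY
print(round(evpi_qalys, 3), round(evpi_qalys * wtp, 2))
```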
A much more realistic (complex!) EVPI example in McCullagh, Walsh, and Barry (2012).
In this section:
The aim of performing diagnostic tests is to determine whether or not a patient has a particular disease or condition.
We want to avoid false positives (diagnosing the disease when it is absent) and false negatives (missing the disease when it is present).
For a given condition there may be a range of different diagnostic tests, each with different costs and accuracies.
Think of the more costly but more accurate PCR test compared to the less expensive but less accurate LFD test.
Rautenberg, Gerritsen, and Downes (2020) give a review of the use of decision theory in diagnostic testing, as well as setting forward good practice.
Notation
We will use
- \(D\) to denote a disease (or condition)
- \(T\) to denote a test.
- Superscript \(+\) or \(-\) to denote the outcome.
For example, \(T^-\) means that a test is negative, whereas \(D^+\) means the disease/condition is present.
Two measures are important when considering diagnostic test accuracy: the sensitivity, \(p\left(T^+\mid{D^+}\right)\), and the specificity, \(p\left(T^-\mid{D^-}\right)\).
As sensitivity increases, specificity often decreases
At the extreme, a test that declares everyone positive has sensitivity 1 but specificity 0.
Clearly this would not be a very useful test.
We can visualise these measures using a confusion matrix.
|  | Test Positive | Test Negative |
|---|---|---|
| Disease | True positives (TP) | False negatives (FN) |
| No Disease | False positives (FP) | True negatives (TN) |
What we really want to know is
\[p\left(D\mid{T}\right)\] The probability someone has the disease, given their test result.
We use Bayes’ theorem.
For two events \(A\) and \(B\), with \(p\left(B\right)>0\),
\[ p\left(A\mid{B}\right) = \frac{p\left(B\mid{A}\right)p\left(A\right)}{p\left(B\right)}.\]
For us, this becomes (for example)
\[p\left(D^+\mid{T^+}\right) = \frac{p\left(T^+\mid{D^+}\right)p\left(D^+\right)}{p\left(T^+\right)}.\] Therefore we need the sensitivity \(p\left(T^+\mid{D^+}\right)\), the prevalence \(p\left(D^+\right)\), and \(p\left(T^+\right)\).
Two more measures of diagnostic accuracy are the positive predictive value \(p\left(D^+\mid{T^+}\right)\) and the negative predictive value \(p\left(D^-\mid{T^-}\right)\).
Because these depend on the prevalence, they may need to be re-calculated often.
Suppose a diagnostic test for a particular disease has sensitivity 0.99 and specificity 0.8. That is, \[p\left(T^+\mid{D^+}\right) = 0.99\] and \[p\left(T^-\mid{D^-}\right)=0.8.\] The prevalence of the disease in the population is 1%, that is \[ p\left(D^+\right) = 0.01.\]
We first need to calculate \(p\left(T^+\right)\), using the partition theorem (law of total probability):
\[\begin{align*} p\left(T^+\right) & = p\left(T^+\mid{D^+}\right)p\left(D^+\right) + p\left(T^+\mid{D^-}\right)p\left(D^-\right)\\ &= 0.99 \times{0.01} + \left(1-p\left(T^-\mid{D^-}\right)\right)p\left(D^-\right)\\ & = 0.99 \times{0.01} + 0.2 \times{0.99}\\ & = 0.0099 + 0.198\\ & = 0.2079. \end{align*}\]
Reminder
\[\begin{align*} p\left(T^+\mid{D^+}\right) & = 0.99\\ p\left(T^-\mid{D^-}\right)& =0.8\\ p\left(T^+\right) & = 0.2079 \\ p\left(D^+\right) & = 0.01 \end{align*}\]
Bayes Theorem: \[p\left(D^+\mid{T^+}\right) = \frac{p\left(T^+\mid{D^+}\right)p\left(D^+\right)}{p\left(T^+\right)}\]
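These numbers are quick to check in code; the sketch below simply re-implements the partition theorem and Bayes’ theorem for the values above.

```python
# Positive predictive value for the worked example via Bayes' theorem.

sens = 0.99   # p(T+ | D+)
spec = 0.80   # p(T- | D-)
prev = 0.01   # p(D+)

# Partition theorem: p(T+) = p(T+|D+)p(D+) + p(T+|D-)p(D-)
p_pos = sens * prev + (1 - spec) * (1 - prev)

# Bayes' theorem: p(D+|T+) = p(T+|D+)p(D+) / p(T+)
ppv = sens * prev / p_pos

print(round(p_pos, 4), round(ppv, 3))  # 0.2079 0.048
```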
In this example, even though \(p\left(T^+\mid{D^+}\right)\) is high, \(p\left(D^+\mid{T^+}\right) = 0.0099/0.2079 \approx 0.048\) is low.
To see this:
\[\begin{align*} p\left(T^+\right) & = p\left(T^+\mid{D^+}\right)p\left(D^+\right) + p\left(T^+\mid{D^-}\right)p\left(D^-\right)\\ & = \underbrace{0.99 \times{0.01}}_{\text{Very few true positives}} + \underbrace{0.2 \times{0.99}}_{\text{Many false positives}} \end{align*}\]
More generally
The sensitivity \(p\left(T^+\mid{D^+}\right)\) is often confused with the predictive power \(p\left(D^+\mid{T^+}\right)\). This is known as the Prosecutor’s fallacy.
The decision tree structure is often used as a calculation tool (ie. with no decisions to be made).
We will use this to think about diagnostic tests
A ‘decision’ tree for diagnostics.
Suppose some disease has prevalence \(p\left(D^+\right)= 0.2\), and we know that the test has sensitivity 0.86 and specificity 0.7.
We can then fill in the tree as follows:
A disease-based approach.
These diagnostics results can be used in two (probably more) ways:
The resulting probabilities can feed into another decision analysis.
For example, a gold standard test could be a source of [not quite] perfect information
Suppose we have two tests:
In sequential diagnostic testing, a cheaper (but less accurate) first test is given to everyone, and only those who test positive go on to a more accurate second test.
Let’s imagine that the diagnostic test from our example is the first test (\(T_1\)) in a sequential testing plan.
If 1000 people are tested, we expect:
|  | Test Positive | Test Negative |
|---|---|---|
| Disease | 172 | 28 |
| No Disease | 240 | 560 |
Only the \(172 + 240 = 412\) with a positive result would be sent for the second test \(T_2\).
Note that even with this rather high disease prevalence of 0.2, less than half of the positive tests are correct.
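These expected counts follow directly from the prevalence, sensitivity and specificity. A sketch (the sensitivity 0.86 and specificity 0.7 are the values implied by the table above):

```python
# Expected counts when 1000 people take the first test T1.

n, prev = 1000, 0.2
sens1, spec1 = 0.86, 0.7

diseased = n * prev               # 200 people with the disease
healthy = n * (1 - prev)          # 800 people without

tp = sens1 * diseased             # 172 true positives
fn = diseased - tp                # 28 false negatives
fp = (1 - spec1) * healthy        # 240 false positives
tn = healthy - fp                 # 560 true negatives

positives = tp + fp               # 412 people sent on to T2
print(round(tp / positives, 3))   # 0.417: under half the positives are correct
```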
Let’s say that for the second test \(T_2\) we have sensitivity 0.95 and specificity 0.8.
We assume here that the results of \(T_1\) and \(T_2\) are conditionally independent given the disease state
- \(T_1\perp{T_2}\mid{D}\),
- therefore \(T_2\mid{D,T_1}\sim T_2\mid{D}\).
- This means we can use the sensitivity and specificity stated, even though the patients will have already been tested with \(T_1\).
Sequential testing.
From this we can compare the results after just the first test with the results following both tests.
After the first test \(T_1\):

|  | Disease | No Disease |
|---|---|---|
| Test positive | 172 | 240 |
| Test negative | 28 | 560 |

After both tests:

|  | Disease | No Disease |
|---|---|---|
| Test positive | 163.4 | 48 |
| Test negative | 36.6 | 752 |
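The second table follows from applying \(T_2\) only to the 412 people who were positive on \(T_1\). A sketch (the \(T_2\) sensitivity 0.95 and specificity 0.8 are the values implied by the table):

```python
# Expected counts after the second test T2, applied only to the
# 172 diseased and 240 healthy people who were positive on T1.

sens2, spec2 = 0.95, 0.8

tp2 = sens2 * 172              # diseased, positive on both tests: 163.4
fp2 = (1 - spec2) * 240        # healthy, positive on both tests: 48
fn2 = 28 + (1 - sens2) * 172   # diseased, negative on T1 or T2: 36.6
tn2 = 560 + spec2 * 240        # healthy, negative on T1 or T2: 752

print(round(tp2, 1), round(fp2, 1), round(fn2, 1), round(tn2, 1))  # 163.4 48.0 36.6 752.0
```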
Whether this is acceptable would depend on:
These calculations often feed into health economic models.
More of this, and some of the difficulties surrounding decision making with diagnostic tests, in Sutton et al. (2008).
An important topic in diagnostics (and classification generally)!
In this section:
So far we have assumed a diagnostic test outputs positive or negative.
In most cases, a measurement (from which a test is derived) gives a continuous value.
ROC (receiver operating characteristic) analysis was developed during WW2 to assess the accuracy of radar operators.
Suppose we take some measurement from a number of people.
Some of the people have disease D, the others don’t.
Probability distributions of a measurement for people with (D) and without (No D) a disease.
Suppose we set our decision threshold \(T=0\)
We see
Now we set \(T=0.5\)
Any value of \(T\) produces a pair (Sensitivity, Specificity), which we can plot
If we vary the decision threshold continuously, we can produce a ROC curve:
The ROC curve for the measurement shown in Figures 5.1-5.3. AUC is ‘area under the curve’, an idea we will explore shortly.
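The exact distributions behind the figures aren’t given, so the sketch below assumes the measurement is Normal in each group (diseased \(\sim N(1,1)\), non-diseased \(\sim N(-1,1)\)) purely for illustration, and traces out (sensitivity, specificity) pairs as the decision threshold varies.

```python
import math

def norm_cdf(x, mu, sigma=1.0):
    """CDF of a Normal(mu, sigma) distribution."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def roc_point(t, mu_d=1.0, mu_n=-1.0):
    """(sensitivity, specificity) at threshold t; 'positive' means value > t."""
    sensitivity = 1 - norm_cdf(t, mu_d)   # p(measurement > t | D)
    specificity = norm_cdf(t, mu_n)       # p(measurement <= t | no D)
    return sensitivity, specificity

# Sweep the threshold from -4 to 4 to trace out the ROC curve.
curve = [roc_point(t / 10) for t in range(-40, 41)]

sens0, spec0 = roc_point(0.0)
print(round(sens0, 3), round(spec0, 3))  # 0.841 0.841 at T = 0
```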
Each point on the ROC curve corresponds to a value of the decision threshold
We will explore two main aspects of ROC analysis:
The shape of the ROC curve is determined by the degree of separation in the measurements for the two groups.
This shape of ROC curve indicates an ideal diagnostic, and the area under the curve (AUC) is 1 (the best it can be).
Good separation in distributions of a measurement for people with (D) and without (No D) a disease.
The ROC curve for the measurement with good separation.
The diagonal line with \(\text{AUC}=0.5\) corresponds to random guessing, i.e. complete overlap of the two distributions.
Overlapping measurement distributions - much worse classifier
Poor separation in distributions of a measurement for people with (D) and without (No D) a disease.
The ROC curve for the measurement with poor separation.
Some values of \(T\) are better than others, but none is perfect.
In summary:
We will often still want to classify someone as ‘positive’ or ‘negative’.
This means we need a value for the decision threshold.
The optimal value will depend on the relative importance of sensitivity and specificity.
We will assume an equal balance between sensitivity and specificity, but these methods can be weighted.
Youden’s index:
The best decision threshold according to Youden’s index is the value of \(T\) that maximises \(J(T)\):
\[J\left(T\right) = \operatorname{sensitivity}\left(T\right) + \operatorname{specificity}\left(T\right) - 1\]
Distance from (0,1):
The value of \(T\) that minimises the distance to the top left corner (the ideal point):
\[D\left(T\right) = \sqrt{\left(1-\text{specificity}\left(T\right)\right)^2 + \left(1-\text{sensitivity}\left(T\right)\right)^2}.\]
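Both criteria are simple to apply once we have (sensitivity, specificity) pairs. In the sketch below the candidate thresholds and their pairs are made-up values for illustration, not from the lecture:

```python
import math

def youden(sens, spec):
    """Youden's index J(T) = sensitivity + specificity - 1."""
    return sens + spec - 1

def dist_to_ideal(sens, spec):
    """Distance from the ROC point (1 - specificity, sensitivity) to (0, 1)."""
    return math.sqrt((1 - spec) ** 2 + (1 - sens) ** 2)

# Hypothetical (sensitivity, specificity) pairs for three thresholds T.
candidates = {-0.5: (0.93, 0.69), 0.0: (0.84, 0.84), 0.5: (0.69, 0.93)}

best_j = max(candidates, key=lambda t: youden(*candidates[t]))
best_d = min(candidates, key=lambda t: dist_to_ideal(*candidates[t]))
print(best_j, best_d)  # 0.0 0.0: both criteria agree for this symmetric example
```

The two criteria can disagree when the ROC curve is asymmetric; both can also be weighted to favour sensitivity or specificity.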
So far we have been very theoretical. However:
ROC analysis doesn’t rely on any distributional assumptions!
100 measurements - coloured by whether each person has the disease.
ROC analysis is not affected by uneven numbers of diseased and non-diseased people.
The ROC curve for our empirical data.
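Since the empirical ROC curve needs no distributional assumptions, it can be computed directly from the data. The sketch below uses a handful of made-up measurements in place of the 100 in the figure:

```python
def empirical_roc(measurements, labels):
    """Return (1 - specificity, sensitivity) points, one per threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(measurements)):
        # Classify 'positive' when the measurement exceeds the threshold t.
        sens = sum(1 for x, y in zip(measurements, labels) if y == 1 and x > t) / n_pos
        spec = sum(1 for x, y in zip(measurements, labels) if y == 0 and x <= t) / n_neg
        points.append((1 - spec, sens))
    return points

# Hypothetical measurements, with label 1 marking people who have the disease.
xs = [0.2, 1.4, -0.3, 2.1, 0.9, -1.2, 1.8, 0.1]
ys = [0, 1, 0, 1, 1, 0, 1, 0]

pts = empirical_roc(xs, ys)
print(pts)
```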
We have only skimmed the surface of ROC analysis, but you can read more about it in Zou, O’Malley, and Mauri (2007) and Fawcett (2006).
In this lecture we have studied:
Decision trees
However, there is no natural way to build time into decision trees…
Diagnostic testing