# # PRACTICAL 1-4: Bayesian analysis of Poisson data
# ---------------------------------
#
#
# In this practical we use R to investigate the conjugate Bayesian analysis for
# Poisson data. We will also investigate the effects of the prior distribution
# on the posterior, and the situation where our sample size grows large.
#
#
# * You will need the following skills from previous practicals:
#     * Creating vectors with `seq`
#     * Drawing a plot of a known function using `curve`
#     * Using additional graphical parameters to customise plots with colour (`col`),
#       line type (`lty`), axis range (`xlim` and `ylim`), etc.
#     * Adding simple straight lines to plots with `abline`
#     * Writing your own functions with `function`
#
# * New R techniques:
#     * Using `plot` to draw plots of general data, and `lines` and `points` to add
#       additional data
#     * Using R's built-in functions to evaluate a standard pdf, e.g. `dgamma` and `dnorm`
#     * Using the quantile functions, e.g. `qgamma`, to produce exact credible intervals
#       using standard distributions
#     * Using `sapply` to evaluate a function with different arguments specified by
#       the elements of a vector
#
#
# 1. The Data
# ==================================================================================
#
#
# Our dataset concerns the number of volcanoes that erupted each year from 2000 to
# 2018, taken from here: https://volcano.si.edu/faq/index.cfm?question=eruptionsbyyear.
# The data are given below, where each volcanic eruption is counted in the year that
# the eruption started. Volcanic eruptions are split into groups depending on the
# severity of the eruption, as measured on the logarithmically-scaled
# _Volcanic Explosivity Index_ (VEI). For example, the 2010 eruption of Eyjafjallajökull
# in Iceland had a VEI of 4, whereas the eruption of Krakatoa in 1883 was VEI 6.
#
# The eruptions are loosely classified as `Small` (VEI $\leq 2$), `Medium` ($2<$ VEI $<4$),
# and `Large` (VEI $\geq 4$).
#
# Exercise:
# ~~~~~~~~~~~~~
# * Run the `R` code below to input the data, and create a new data frame called
#   `volcano`, with columns `Year`, `Small`, `Medium`, and `Large`.

volcano <- data.frame(Year = 2000:2018,
                      Small = c(30, 25, 30, 32, 29, 29, 34, 29, 25, 28, 39, 30, 40,
                                40, 41, 26, 37, 30, 34),
                      Medium = c(6, 6, 5, 5, 8, 4, 3, 3, 7, 2, 6, 6, 4, 7, 7, 5, 4, 6, 4),
                      Large = c(0, 0, 0, 2, 1, 1, 1, 3, 2, 1, 3, 0, 1, 0, 1, 0, 2, 0, 1))

# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# ^ TECHNIQUE:
# ^
# ^ **Plot, points, and lines**
# ^
# ^ The `plot` function produces a scatterplot of its two arguments. Suppose we have
# ^ saved our $x$ coordinates in a vector `a`, and our $y$ coordinates in a vector
# ^ `b`; then to draw a scatterplot of $(x,y)$ we type
# ^
# ^ `plot(x=a, y=b)`
# ^
# ^ If the argument labels `x` and `y` are not supplied, _R_ will assume the first
# ^ argument is `x` and the second is `y`. If only one vector of data is supplied,
# ^ this will be taken as the $y$ values and will be plotted against the integers
# ^ `1:length(y)`, i.e. in the sequence in which they appear in the data.
# ^
# ^ All of the standard plot functions can be customised by passing additional
# ^ arguments to the function. For instance, we can control the plot type, axis
# ^ labels, and axis limits by supplying optional arguments:
# ^
# ^ + `type` - determines the type of plot to draw. Possible types are:
# ^     * "p" for **p**oints
# ^     * "l" for **l**ines
# ^     * "b" for **b**oth (points connected by lines)
# ^     * "h" for **h**istogram-like vertical lines
# ^     * "s" for **s**tep lines
# ^ + `xlab`, `ylab` - provides a label for the horizontal and vertical axes
# ^ + `xlim`, `ylim` - allows for specification of a minimum and maximum value of the
# ^   corresponding axis limits, e.g. `xlim=c(0,10)` will set the horizontal axis
# ^   limits to be $[0,10]$.
# ^ + `col` - can be specified to [use colour](r_8_advplots.html#col) for drawing
# ^
# ^ For example,
# ^
# ^ `plot(x=-10:10, y=sin((-10:10)/(2*pi)), type="b", xlab='A', ylim=c(-1.5,1))`
# ^
# ^ *Note*: Once a plot has been drawn, it is not possible to erase any features from
# ^ it - we can only add extra lines or points to it. So, if you make a mistake
# ^ drawing your plot then you'll need to start over with a fresh one by calling `plot`
# ^ again.
# ^
# ^ After creating a `plot`, we can _add_ additional points to the plot by calling the
# ^ `points` function, which draws additional points at the specified `x` and `y` values.
# ^ Similarly, the `lines` function will draw connected lines between its `x` and `y`
# ^ arguments.
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
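# As a quick illustration of `plot`, `lines`, and `points` together, here is a minimal
# sketch using made-up toy data (the exercise below asks you to do the same for the
# volcano data):

toyX <- 1:10                                  # some example x coordinates
toyY <- toyX^2                                # a first set of y values
plot(x=toyX, y=toyY, type="l",                # draw the first series as a line
     xlab="x", ylab="y", ylim=c(0, 120))
lines(x=toyX, y=toyY/2, col="blue")           # add a second series as a blue line
points(x=toyX, y=toyY/4, col="red")           # add a third series as red points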
#
# Exercise:
# ~~~~~~~~~~~~~
# * `plot` the number of `Small` eruptions against the year using a vertical axis
#   range (`ylim`) of $[0,50]$.
# * Experiment with the plot `type`s to see how the different plots represent the
#   data.
# * Redraw the data as a `l`ine `plot`, and add `lines` of different colours to the
#   same plot for the other two groups of eruptions.
# * Do you see any apparent relationship between time and number of eruptions?
#
# Exercise:
# ~~~~~~~~~~~~~
# * Now plot the `hist`ogram of the number of `Medium` eruptions - does it look
#   like it could follow a Poisson distribution? How else could you quickly assess this?
#
#
# 2. Conjugate analysis with the Gamma-Poisson model
# ==================================================================================
#
#
#
# 2.1 The Theory
# ----------------------------------------------------------------------------------
#
#
# A Poisson distribution is often used to model the counts of random events occurring
# within a fixed period of time at some rate $\lambda$. For such a random variable
# $X$, where $X \sim \text{Po}(\lambda)$, a Bayesian analysis will require a prior
# distribution for the unknown Poisson parameter $\lambda$. We have seen that the
# Gamma distribution provides a conjugate prior for this particular problem.
# Therefore, if our data $X_1,\dots,X_n$ are iid $\text{Po}(\lambda)$, and our prior
# distribution for $\lambda$ is $\text{Ga}(a,b)$, then the posterior distribution for
# $\lambda \mid X_1,\dots,X_n$ is given by
#
# $$\lambda \mid X_1,\dots,X_n \sim \text{Ga}(a+T,\, b+n),$$
#
# where $T=\sum_{i} X_i = n\bar{X}$.
#
# 2.2 The Prior
# ----------------------------------------------------------------------------------
#
#
# First, let's use the Gamma distribution to identify an appropriate prior for our
# Poisson count data.

# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# ^ TECHNIQUE:
# ^
# ^ **Standard density functions in R**
# ^
# ^ `R` provides built-in functions to evaluate the PDF of standard distributions.
# ^ The density function for a Gamma distribution $\text{Ga}(a,b)$ is used as follows
# ^
# ^ `dgamma(x, shape=a, rate=b)`
# ^
# ^ where `x` is the point (or vector of points) where we want to evaluate the PDF, and
# ^ `a` and `b` are the usual Gamma parameter values.
# ^
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
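# For example, to evaluate the $\text{Ga}(2,1)$ pdf (an arbitrary choice of parameters,
# purely to illustrate the call) at a few points, or over a grid for plotting:

dgamma(c(0.5, 1, 2), shape=2, rate=1)          # pdf values at x = 0.5, 1, 2
xgrid <- seq(from=0, to=10, length.out=200)    # a grid of points on [0,10]
plot(x=xgrid, y=dgamma(xgrid, shape=2, rate=1),
     type="l", xlab="x", ylab="Ga(2,1) pdf")   # draw the pdf as a line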
#
# Exercise:
# ~~~~~~~~~~~~~
# * Create a vector containing a `seq`uence of 500 equally-spaced values over the
#   interval $[0,10]$. Call this `lambdavals`.
# * Evaluate the $\text{Ga}(1,1)$ pdf at each of the values of $\lambda$ in `lambdavals`
#   using the Gamma PDF function `dgamma`.
# * Make a `plot` of the pdf against $\lambda$ as a solid black line.

# An expert vulcanologist believes that the unknown rate of `Medium` eruptions, $\lambda$,
# is such that a range of $[1,6]$ would be plausible.
#
# Exercise:
# ~~~~~~~~~~~~~
# * Interpret the expert's given interval as corresponding to a statement of
#   $E(\lambda) \pm 2\,\text{sd}(\lambda)$. Hence, find the mean and variance of the prior.
# * Equate the prior mean and variance to the theoretical values for the expectation
#   ($a/b$) and variance ($a/b^2$) of a Gamma pdf, and hence find the appropriate values
#   of its parameters $a$ and $b$.
# * Evaluate the expert's prior pdf for $\lambda$ at each of `lambdavals`, and save
#   it as `cnjPrior`.
# * For plotting purposes, we need to normalise the values of `cnjPrior` so that
#   they sum (integrate) to 1.
#   Divide the values of `cnjPrior` by the `sum` of the values in `cnjPrior`, and
#   replace `cnjPrior` by these normalised values.
# * Produce a fresh plot of the expert's prior pdf for $\lambda$ as a solid red curve.
#   Ensure your vertical axis covers $[0,0.02]$.
# * Add a vertical, red, dashed line (`abline` using the `v` argument) at the
#   location of the prior expectation for $\lambda$.

# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# ^ TECHNIQUE:
# ^
# ^ **Quantiles of standard densities**
# ^
# ^ In addition to built-in functions for PDFs of standard distributions, `R` also
# ^ provides the _quantile function_ for a pdf. The quantile function evaluates the
# ^ inverse of the cumulative distribution function, $F_X^{-1}(u)$. Given a probability
# ^ value $u \in [0,1]$, the quantile function returns the value $x$ of $X$ for which
# ^ $P[X \leq x]=u$, and so the quantile functions are particularly useful for finding
# ^ critical values of distributions and for finding exact credible intervals.
# ^
# ^ The quantile function for a Gamma density $X \sim \text{Ga}(a,b)$ is used
# ^ as follows
# ^
# ^ `qgamma(alpha, shape=a, rate=b)`
# ^
# ^ where `alpha` is the lower-tail probability (or vector of lower-tail probabilities)
# ^ for which we want the corresponding value(s) of $X$, and `a` and `b` are the usual
# ^ Gamma parameter values.
# ^
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
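# For instance, to find the values cutting off the lower and upper 2.5% tails of a
# $\text{Ga}(2,1)$ distribution (again an arbitrary choice, purely to illustrate the call):

qgamma(c(0.025, 0.975), shape=2, rate=1)   # 2.5% and 97.5% quantiles of Ga(2,1)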
#
# Exercise:
# ~~~~~~~~~~~~~
# * Use the `qgamma` function to find a 95% equal-tailed prior credible interval for
#   $\lambda$ using the expert's prior distribution above.
#   _Hint:_ find the values of $\lambda$ with lower and upper tail probabilities of 2.5%.
# * How does this compare to the expert's original interval?

# 2.3 The Likelihood
# ----------------------------------------------------------------------------------
#
# The next ingredient in the Bayesian calculation requires us to capture the
# information contained in the data via the likelihood. In a Bayesian context, the
# likelihood is the conditional distribution of the data $X_1,\dots,X_n$ given the
# parameter $\lambda$.
#
# Let's compute the likelihood given our data on `Medium` volcanic eruptions, and
# add it to the plot.
#
# Exercise:
# ~~~~~~~~~~~~~
# * Write a function `poisLike` that computes the Poisson likelihood for the sample
#   of `Medium` volcano eruptions. Your function should:
#     * Take one argument, `lambda`.
#     * Compute the Poisson probability of each data point in `volcano$Medium` given
#       the value of `lambda`.
#       _Hint_: the `dpois` function evaluates Poisson probabilities for a vector of
#       values `x` and parameter `lambda`. Or you can use your own function from last time.
#     * Combine the probabilities to find the likelihood of the entire sample, and `return` it.
# * If your function is working correctly, you should agree with the output below.
#
poisLike(5)
## [1] 4.225604e-17

# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# ^ TECHNIQUE:
# ^
# ^
# ^ **Using `sapply` to repeat calculations over a vector**
# ^
# ^ We have seen previously that we can use the `replicate` function to repeatedly call a
# ^ function with no arguments. For functions which do take an argument, we often want
# ^ to call that function at many different values. To do this, we use `sapply` to
# ^ apply a specified function to every element of a vector as its argument, and then
# ^ return a vector formed from the results.
# ^
# ^ `sapply(x, fun)`
# ^
# ^ applies the function `fun` to every element of the vector `x`, and then returns a
# ^ vector containing the values of `fun(x[1])`, `fun(x[2])`, and so on. So, to compute
# ^ the square root of the integers 1 to 10, we would write
# ^
# ^ `sapply(1:10, FUN=sqrt)`
# ^
# ^ which applies the square root function to each of the integers 1 to 10.
# ^
# ^ There are other variations of `apply` which work with matrices and other data
# ^ structures, and we will see those in later practicals.
# ^
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
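# The function passed to `sapply` can also be one you have written yourself, provided
# it takes a single argument - for example (a toy function, purely for illustration):

squarePlusOne <- function(z) {     # a simple one-argument function
  return(z^2 + 1)
}
sapply(1:5, FUN=squarePlusOne)     # returns c(2, 5, 10, 17, 26)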
#
# Exercise:
# ~~~~~~~~~~~~~
# * Use `sapply` to evaluate the Poisson likelihood (`poisLike`) for each of the
#   values of $\lambda$ in `lambdavals`. Save this as `like`.
# * Normalise `like` so that its values sum to $1$.
# * Add the likelihood to your plot of the prior distribution as connected blue `lines`.
# * Add the maximum likelihood estimate of $\lambda$ as a vertical dashed blue line.
# * Your plot should look like the one below.

# 2.4 The Posterior
# ----------------------------------------------------------------------------------
#
# Given the prior and likelihood, we can apply Bayes' rule directly and numerically
# compute the posterior density at each of the values of $\lambda$.
#
# Exercise:
# ~~~~~~~~~~~~~
# * Directly compute the posterior for $\lambda$ by Bayes' theorem, and save these values
#   as `postDirect`. Normalise `postDirect` so that it sums to 1.
#   _Hint:_ the posterior is proportional to the likelihood times the prior.
# * Add the posterior to your plot as a solid green curve.

# However, we also know that we're using the conjugate Gamma prior with a Poisson
# likelihood, so our posterior distribution for $\lambda$ will also be a Gamma with
# parameter values as above. Let's verify that this is the case, and add the
# resulting density function to our plot.
#
# Exercise:
# ~~~~~~~~~~~~~
# * Now use conjugacy to evaluate the conjugate Gamma posterior distribution.
#     * First, find the posterior values of the Gamma parameters $a$ and $b$; save these
#       as `aPost` and `bPost`.
#     * Now use `dgamma` with the posterior parameter values to evaluate the posterior
#       Gamma density at each of `lambdavals`. Save these values as `postConjugate`,
#       and normalise them to sum to $1$.
#     * Draw the conjugate posterior on your plot as a dashed `purple` curve.
# * What do you see? Do the results agree with each other? How has the prior changed?
# * Add the posterior mean as a vertical green line.
# * What do you notice about the relationship between the posterior mean, the prior
#   mean, and the maximum likelihood estimate?
#
# Exercise:
# ~~~~~~~~~~~~~
# * Use the exact Gamma posterior to find a 95% posterior credible interval for $\lambda$.
# * Compare the posterior interval with the prior interval you found earlier.
#   What has changed?
#
#
# 3. Further Exercises: Exploring the effects of prior choice and sample size
# ==================================================================================
#
# 3.1 Changing the prior parameters
# ----------------------------------------------------------------------------------
#
# To investigate how sensitive our results are to our choices of $a$ and $b$ in the prior
# Gamma distribution, we will want to repeat our previous calculations for different
# values of the prior parameters. To make this easier, let's wrap those calculations
# and plots in a new function that finds and draws the Gamma posterior for a given
# prior and data set:
#
# Exercise:
# ~~~~~~~~~~~~~
# * Write a function called `gammaPoisson` which:
#     * Takes four arguments: the prior Gamma parameters `a` and `b`, and the
#       summary statistics `T` and `n`.
#     * Computes the $\text{Ga}(a,b)$ prior and the $\text{Ga}(a+T,\,b+n)$ posterior over
#       the sequence of values in `lambdavals`.
#     * Plots the (normalised) prior and posterior Gamma distributions as coloured
#       curves on one plot.
#     * Adds the MLE as a dashed vertical line.
# * Check your function by evaluating it with the values of `a`, `b`, `T`, and `n`
#   we used for the `Medium` volcano data above.
#
# Exercise:
# ~~~~~~~~~~~~~
# * Investigate what happens as you change `a` and `b`.
# * How does the shape of the prior change? What impact does this have on the posterior
#   distribution?
# * What happens for large values of `a` and `b`? How does this affect your results?

# 3.2 Updating beliefs with large samples
# ----------------------------------------------------------------------------------
#
#
# In Lecture 26, we will see that as the sample size grows large, the posterior
# distribution tends towards a Normal distribution and becomes progressively less
# affected by the choice of prior distribution.
#
# Let's expand our data with the records of volcanic eruptions for the entire 20th century.
# As records are slightly incomplete, only the counts of `Large` eruptions over this period
# can be considered trustworthy. For the entire 100 years, a total of 64 `Large`
# (VEI $\geq 4$) eruptions were recorded.
#
# Exercise:
# ~~~~~~~~~~~~~
# * Combine the 20th century data with the data from 2000-2018, and compute new summary
#   statistics for the `Large` volcanic eruptions from 1900-2018. Save these as `Tbig` and `Nbig`.
# * Use a prior of $\lambda\sim\text{Ga}(2,2)$, and plot the posterior given the combined
#   data. What happens to the posterior?
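# One possible way to set up the exercise above (a sketch only, assuming you have already
# written a `gammaPoisson` function as in Section 3.1, taking arguments `a`, `b`, `T`, `n`):

Tbig <- 64 + sum(volcano$Large)          # total Large eruptions: 1900-1999 plus 2000-2018
Nbig <- 100 + nrow(volcano)              # total number of years, 1900-2018
gammaPoisson(a=2, b=2, T=Tbig, n=Nbig)   # Ga(2,2) prior updated with the combined data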
#
# The large sample of data is clearly very informative for $\lambda$, resulting in a
# posterior distribution that is concentrated over a small range of possible $\lambda$
# values. Let's refocus our plot on that sub-interval:
#
# Exercise:
# ~~~~~~~~~~~~~
# * By modifying your Gamma-Poisson function, redraw the same plot to zoom in on the
#   interval $\lambda \in [0.25, 1.25]$.
#   _Hint:_ you'll need to set the `xlim` argument to `plot` (or add it as an argument to
#   your function).
# * What do you notice about the shape of the Gamma posterior distribution?

# 3.2.1 Theory: Limiting posterior distribution
#
#
# If we suppose that $X=(X_1,\dots,X_n)$ is an i.i.d. sample of size $n$ from a
# distribution with pdf $f(x\mid\theta)$, where $X_i$ is independent of $X_j$ given
# $\theta$ and $f(x\mid\theta)$ is regular, then for $n$ 'large enough' the posterior
# distribution for $\theta \mid x$ is approximately
#
# $$f(\theta \mid x) \approx \text{N}\!\left(\hat{\theta},\, \frac{1}{I(\hat{\theta})}\right),$$
#
# where $\hat{\theta}$ is the MLE of the parameter $\theta$, and
# $I(\hat{\theta}) = -L''(\hat{\theta})$ is the observed Fisher information for the
# sample, evaluated at $\hat{\theta}$.
#
# For a Poisson distribution, we have:
#
# * $\hat{\lambda} = \frac{\sum_i x_i}{n}$
# * $L''(\lambda) = -\frac{\sum_i x_i}{\lambda^2}$.
#
# Exercise:
# ~~~~~~~~~~~~~
# * Using the results above, compute the mean and variance of the limiting posterior
#   distribution for the `Large` volcano eruption data.
# * Using `dnorm`, evaluate the Normal approximation to the posterior pdf for $\lambda$
#   over `lambdavals`. Normalise it again to sum to one, and add it to your plot using
#   a thick dashed line.
# * Is the limiting Normal distribution in agreement with the Gamma posterior? If it
#   differs, how does it differ and to what extent?
# * Do the values of the Gamma prior parameters $a$ and $b$ affect the quality of
#   the approximation?
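# A minimal sketch of one way to add the Normal approximation (this assumes `Tbig`, `Nbig`,
# `lambdavals`, and your zoomed-in posterior plot from above are already in your workspace):

lambdaHat <- Tbig / Nbig                       # MLE of lambda for the Large eruption data
postVar   <- lambdaHat^2 / Tbig                # 1/I(lambdaHat) = lambdaHat^2 / sum(x)
normApprox <- dnorm(lambdavals, mean=lambdaHat, sd=sqrt(postVar))
normApprox <- normApprox / sum(normApprox)     # normalise to sum to 1
lines(lambdavals, normApprox, lty=2, lwd=3)    # add as a thick dashed line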