EM                      package:UEM                      R Documentation
Description:

The function EM implements the usual EM algorithm for multivariate
Gaussian mixtures or Poisson mixtures. The function updateEM updates
the result of a fitted mixture model after new observations have come
in. The function UEM splits a data set (from the start) into several
batches, and applies and updates EM sequentially on those batches.
Usage:

EM(y, K, init = "quantile", family = "Gaussian", iter = -1,
   threshold = 0.0001, lambda = 0.999, tol = 0.5, verbose = FALSE,
   plot = FALSE, ...)

UEM(y, K, init = "quantile", split = NULL, randomize = FALSE,
    family = "Gaussian", iter = -1, threshold = 0.001, max.time = NULL,
    lambda = 0.999, verbose = FALSE, plot = FALSE, ...)

updateEM(z = NULL, theta, iter = -1, threshold = 0.001, max.time = NULL,
    lambda = 0.999, verbose = FALSE, plot = FALSE, ...)
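A minimal sketch of the intended workflow (simulated data; all numbers, seeds and mixture parameters below are illustrative only, not from the package):

```r
set.seed(1)
y.old <- c(rnorm(80, 0, 1), rnorm(120, 5, 1))  # initial batch of 200 observations
y.new <- c(rnorm(20, 0, 1), rnorm(30, 5, 1))   # newly arriving observations

fit  <- EM(y.old, K = 2)       # usual EM on the data available so far
fit2 <- updateEM(y.new, fit)   # update the fitted mixture with the new data

# UEM achieves the same in one call, splitting after observation 200:
fit3 <- UEM(c(y.old, y.new), K = 2, split = 200)
```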
Arguments:

y: a univariate or multivariate data set.

z: new data to be added for updating the estimate.

theta: the value of the parameter vector; typically a fitted mixture
object as produced by EM or UEM.

K: the number of components.

init: the type of initialization used. Options include "random"
(randomly chosen from the data), "scatter" (uniformly sampled from the
support of the data), "quantile" (quantile-based), "shortruns" (short
runs of EM), and "gq" (based on Gauss quadrature points).

split: for use in UEM; a vector of observation indices at which the
data set is split into batches (see Examples).

randomize: Boolean; for use in UEM. If TRUE, the order of the
observations is randomized before the data set is split into batches.

family: response family. At present, "Gaussian" (default) and
"Poisson" are supported; "Cauchy" is in preparation.

iter: number of EM iterations; the value -1 means iterate until
convergence (as judged by threshold). For UEM, a vector of per-batch
iteration counts may be supplied (see Examples).

threshold: convergence threshold (in terms of a log-likelihood
difference).

lambda: calibrates between globally equal component variances and
component-specific variances.

tol: tuning parameter which scales the EM starting points inwards or
outwards.

max.time: time limit (in seconds) after which execution stops.

verbose: Boolean. If TRUE, progress information is printed during the
iterations.

plot: Boolean. If TRUE, the fitted mixture is plotted.

...: further arguments to be passed to other functions.
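For batchwise fitting via UEM, the split and iter arguments work together; a sketch (the split points and iteration counts here are illustrative, mirroring the pattern used in the Examples):

```r
set.seed(2)
y <- matrix(rnorm(300), ncol = 1)
# Batches: observations 1-150, 151-225, 226-300.
# Iterate 10 times on the first batch, twice after each of the two
# further batches, and finally (-1) until convergence:
fit <- UEM(y, K = 2, split = c(150, 225, 300), iter = c(10, 2, 2, -1))
```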
Value:

A fitted mixture object, of class umix.
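The fitted parameters can be inspected directly from the returned object; assuming the components are named mu and pi, as in the theta lists of the Poisson example (where poisup$mu is accessed in the same way):

```r
# faithful is a standard base-R data set, used here for illustration only
fit <- EM(faithful$waiting, K = 2)  # two-component mixture of waiting times
fit$mu   # estimated component means
fit$pi   # estimated mixing proportions
```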
Author(s):

J. Einbeck, D. Bonetti
References:

Einbeck, J. and Bonetti, D. (2014). A study of online and blockwise
updating of the EM algorithm for Gaussian mixtures. In: Kneib, T.,
Sobotka, F., Fahrenholz, J. and Irmer, H. (eds), Proceedings of the
29th International Workshop on Statistical Modelling, Goettingen,
Germany, 14-18 July 2014, Vol. II. University of Goettingen, 35-38.
Examples:

### Univariate Gaussian Example:
data(pistonrings, package = "qcc")
boxplot(diameter ~ sample, data = pistonrings)
dm <- as.matrix(pistonrings$diameter)

# EM all at once:
fit <- EM(dm, 2, threshold = 0.005)

# Now via update EM:
fit2 <- UEM(dm, 2, split = seq(100, 200, by = 5),
            iter = c(10, rep(2, 20), -1), plot = TRUE)
# (This gives 100 data points first and iterates 10 times. Then it gives
# 20 batches of size five and iterates twice after each batch. Finally
# it iterates until convergence.)

# Compare log-likelihoods:
logLike(fit, dm)
logLike(fit2, dm)

### Bivariate Gaussian Example:
require(mvtnorm)
s1 <- matrix(c(10, 3, 3, 2), 2, 2)
s2 <- matrix(c(1, 3, 3, 16), 2, 2)
m1 <- rmvnorm(n = 40, c(4, 2), s1)
m2 <- rmvnorm(n = 60, c(9, 4), s2)
x  <- rbind(m1, m2)
par(mfrow = c(2, 2))
plot(x)
thetar <- EM(x, 2, iter = 10)    # standard EM
plot(thetar, x, main = "EM")
i <- sample(100, 50)
theta0 <- EM(x[-i, ], 2, iter = 10)            # remove 50 points, fit EM to the rest
theta1 <- updateEM(x[i, ], theta0, iter = 10)  # put points back, update EM
plot(theta0, x[-i, ], col = 1, main = "EM (subset)")
plot(theta1, x, col = 1 + (1:100) %in% i, main = "update EM")

### Poisson Example:
theta  <- list("mu" = c(1, 8, 30),   "pi" = c(0.2, 0.5, 0.3))
theta2 <- list("mu" = c(5, 10, 100), "pi" = c(0.2, 0.2, 0.6))
pdat   <- poisSimN(100, theta)
pdat.z <- poisSimN(20, theta2)
poisfit <- EM(pdat, 3, iter = 100, family = "Poisson")
plot.umix(poisfit, pdat)
poisup <- updateEM(pdat.z, poisfit, iter = 100, dist = "Poisson", plot = TRUE)
# Equivalently, at once:
poisall <- UEM(c(pdat, pdat.z), 3, split = 100, iter = 100,
               family = "Poisson", plot = TRUE)
poisup$mu
poisall$mu   # identical!