EM {UEM}                                R Documentation
Description:

The function EM implements the standard EM algorithm for multivariate
Gaussian mixtures or Poisson mixtures. The function updateEM updates a
fitted mixture model after new observations have come in. The function
UEM splits a data set (from the start) into several batches, and applies
and updates EM sequentially on those batches.

Usage:
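The batchwise idea can be sketched in a few lines of plain R. This is only an illustration of warm-starting EM on newly arrived data, not the package's implementation; the function em1d and its parameter layout are made up for this sketch.

```r
# Hand-rolled univariate two-component Gaussian EM that accepts a
# starting value, mimicking the warm-start idea behind updateEM().
em1d <- function(y, theta, iter = 50) {
  for (it in seq_len(iter)) {
    # E-step: responsibilities (posterior probability of component 2)
    d1 <- theta$pi[1] * dnorm(y, theta$mu[1], theta$sigma[1])
    d2 <- theta$pi[2] * dnorm(y, theta$mu[2], theta$sigma[2])
    r  <- d2 / (d1 + d2)
    # M-step: weighted updates of mixing proportions, means, variances
    theta$pi    <- c(mean(1 - r), mean(r))
    theta$mu    <- c(weighted.mean(y, 1 - r), weighted.mean(y, r))
    theta$sigma <- c(sqrt(weighted.mean((y - theta$mu[1])^2, 1 - r)),
                     sqrt(weighted.mean((y - theta$mu[2])^2, r)))
  }
  theta
}

set.seed(1)
batch1 <- c(rnorm(100, 0, 1), rnorm(100, 5, 1))
start  <- list(pi = c(0.5, 0.5), mu = c(-1, 6), sigma = c(1.5, 1.5))
fit1   <- em1d(batch1, start, iter = 30)       # fit on the first batch
# new observations arrive; warm-start from fit1 instead of a cold restart
batch2 <- c(rnorm(20, 0, 1), rnorm(20, 5, 1))
fit2   <- em1d(c(batch1, batch2), fit1, iter = 5)
```

Because the update starts from an already good parameter value, it typically needs far fewer iterations than refitting from scratch.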
EM(y, K, init = "quantile", family = "Gaussian", iter = -1, threshold =
0.0001, lambda = 0.999, tol=0.5, verbose = FALSE, plot = FALSE, ...)
UEM(y, K, init = "quantile", split = NULL, randomize = FALSE,
family = "Gaussian", iter = -1, threshold = 0.001, max.time = NULL,
lambda = 0.999, verbose = FALSE, plot = FALSE, ...)
updateEM(z = NULL, theta, iter = -1, threshold = 0.001, max.time = NULL,
lambda = 0.999, verbose = FALSE, plot = FALSE, ...)
Arguments:

y
    a univariate or multivariate data set.

z
    new data to be added for updating the estimate.

theta
    the value of the parameter vector; in practice, a fitted mixture
    object as returned by EM or UEM.

K
    the number of components.

init
    the type of initialization used. Options include "random" (randomly
    chosen from the data), "scatter" (uniformly sampled from the support
    of the data), "quantile" (quantile-based), "shortruns" (short runs of
    EM), and "gq" (based on Gauss quadrature points).

split
    for use in UEM: the positions at which the data set is split into
    sequential batches (see Examples).

randomize
    Boolean; for use in UEM. If TRUE, the order of the observations is
    randomized before the data are split into batches.

family
    the response family. At present, "Gaussian" (default) and "Poisson"
    are supported; "Cauchy" is in preparation.

iter
    the number of EM iterations, with iter = -1 meaning iteration until
    convergence. For UEM, a vector of iteration counts (one per batch)
    can be supplied (see Examples).

threshold
    convergence threshold (in terms of a log-likelihood difference).

lambda
    calibrates between globally equal and component-specific variances.

tol
    tuning parameter which scales the EM starting points inwards or
    outwards.

max.time
    time limit (in seconds) after which execution stops.

verbose
    Boolean. If TRUE, progress information is printed during the
    iterations.

plot
    Boolean. If TRUE, the fitted mixture is plotted.

...
    further arguments to be passed on.
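As an aside, the "scatter" initialization described above can be mimicked in one line of base R; this is an illustrative sketch only, not necessarily the package's routine.

```r
# "Scatter"-style starting means (sketch): K values sampled uniformly
# from the support (range) of a univariate data set.
set.seed(2)
y   <- c(rnorm(50, 0, 1), rnorm(50, 6, 1))
K   <- 2
mu0 <- runif(K, min = min(y), max = max(y))
mu0  # starting means, guaranteed to lie inside the data range
```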
Value:

A fitted mixture object, of class "umix".

Author(s):

J. Einbeck, D. Bonetti

References:

Einbeck, J. and Bonetti, D. (2014). A study of online and blockwise
updating of the EM algorithm for Gaussian mixtures. In: Kneib, T.,
Sobotka, F., Fahrenholz, J. and Irmer, H. (eds.), Proceedings of the
29th International Workshop on Statistical Modelling, Goettingen,
Germany, 14-18 July 2014, Vol. II, pp. 35-38. Goettingen: University of
Goettingen.
Examples:

### Univariate Gaussian Example:
data(pistonrings, package="qcc")
boxplot(diameter ~ sample, data=pistonrings)
dm <- as.matrix(pistonrings$diameter)
# EM all at once:
fit <- EM(dm, 2, threshold=0.005)
# Now via updating EM:
fit2 <- UEM(dm, 2, split=seq(100,200,by=5), iter=c(10, rep(2,20), -1), plot=TRUE)
# (this gives 100 data points first and iterates 10 times. Then it gives 20 batches
# of size five and iterates twice after each batch. Finally it iterates until convergence).
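# The batch structure implied by split = seq(100, 200, by = 5) can be
# verified directly: batch sizes are differences of consecutive split points.
sizes <- diff(c(0, seq(100, 200, by = 5)))
sizes       # one batch of 100, then twenty batches of 5
sum(sizes)  # 200, the number of diameters in dm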
# Compare log-likelihoods:
logLike(fit,dm)
logLike(fit2,dm)
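# For intuition, a Gaussian mixture log-likelihood of the kind compared
# above can be computed by hand (illustrative sketch; mix_loglik is not
# a package function):
mix_loglik <- function(y, pi, mu, sigma) {
  dens <- sapply(seq_along(pi), function(k) pi[k] * dnorm(y, mu[k], sigma[k]))
  sum(log(rowSums(dens)))
}
mix_loglik(c(-0.1, 0.2, 4.9, 5.3), c(0.5, 0.5), c(0, 5), c(1, 1))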
### Bivariate Gaussian Example:
require(mvtnorm)
s1 <- matrix(c(10,3,3,2),2,2)
s2 <- matrix(c(1,3,3,16),2,2)
m1 <- rmvnorm(n=40, c(4,2), s1)
m2 <- rmvnorm(n=60, c(9,4), s2)
x  <- rbind(m1, m2)
par(mfrow=c(2,2))
plot(x)
thetar <- EM(x, 2, iter=10)  # standard EM
plot(thetar, x, main="EM")
i <- sample(100, 50)
theta0 <- EM(x[-i, ], 2, iter=10)            # remove 50 points, fit EM to the remaining points
theta1 <- updateEM(x[i, ], theta0, iter=10)  # put the points back, update EM
plot(theta0, x[-i, ], col=1, main="EM (subset)")
plot(theta1, x, col=1+(1:100)%in%i, main="update EM")
### Poisson Example:
theta <- list("mu"=c(1,8,30),"pi"=c(0.2,0.5,0.3))
theta2 <- list("mu"=c(5,10,100),"pi"=c(0.2,0.2,0.6))
pdat <- poisSimN(100, theta)
pdat.z <- poisSimN(20, theta2)
poisfit <- EM(pdat, 3, iter=100, family="Poisson")
plot(poisfit, pdat)
poisup <- updateEM(pdat.z, poisfit, iter=100, family="Poisson", plot=TRUE)
# equivalently, at once:
poisall <- UEM(c(pdat, pdat.z), 3, split=100, iter=100, family="Poisson", plot=TRUE)
poisup$mu
poisall$mu
# identical!
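# To see why the update lands where it does, the E-step responsibilities
# for a Poisson mixture can be computed by hand (illustrative sketch,
# not the package code):
theta <- list(mu = c(1, 8, 30), pi = c(0.2, 0.5, 0.3))  # as defined above
num  <- theta$pi * dpois(8, theta$mu)  # joint density of y = 8 and each component
resp <- num / sum(num)                 # posterior component probabilities
resp       # dominated by the component with mean 8
sum(resp)  # responsibilities sum to 1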