LETTER Communicated by Erkki Oja

Variational Bayesian Learning of ICA with Missing Data

Kwokleung Chan


Computational Neurobiology Laboratory, Salk Institute, La Jolla, CA 92037, U.S.A.

Te-Won Lee


Institute for Neural Computation, University of California at San Diego, La Jolla, CA 92093, U.S.A.

Terrence J. Sejnowski


Computational Neurobiology Laboratory, Salk Institute, La Jolla, CA 92037, U.S.A., and Department of Biology, University of California at San Diego, La Jolla, CA 92093, U.S.A.

Neural Computation 15, 1991–2011 (2003) © 2003 Massachusetts Institute of Technology

Missing data are common in real-world data sets and are a problem for many estimation techniques. We have developed a variational Bayesian method to perform independent component analysis (ICA) on high-dimensional data containing missing entries. Missing data are handled naturally in the Bayesian framework by integrating the generative density model. Modeling the distributions of the independent sources with mixtures of Gaussians allows sources to be estimated with different kurtosis and skewness. Unlike the maximum likelihood approach, the variational Bayesian method automatically determines the dimensionality of the data and yields an accurate density model for the observed data without overfitting problems. The technique is also extended to the clusters of ICA and supervised classification frameworks.

1 Introduction

Data density estimation is an important step in many machine learning problems. Often we are faced with data containing incomplete entries. The data may be missing due to measurement or recording failure. Another frequent cause is difficulty in collecting complete data; for example, it could be expensive and time-consuming to perform some biomedical tests. Data scarcity is not uncommon, and it would be very undesirable to discard those data points with missing entries when we already have a small data set. Traditionally, missing data are filled in by mean imputation or regression imputation during preprocessing. This could introduce biases into the data cloud density and adversely affect subsequent analysis. A more principled way would be to use probability density estimates of the missing entries instead of point estimates. A well-known example of this approach is the use of the expectation-maximization (EM) algorithm in fitting incomplete data with a single Gaussian density (Little & Rubin, 1987).

Independent component analysis (ICA; Hyvärinen, Karhunen, & Oja, 2001) assumes the observed data x are generated from a linear combination of independent sources s:

$$ x = A s + \nu, \tag{1.1} $$

where A is the mixing matrix, which can be nonsquare. The sources s have non-Gaussian density, such as $p(s_l) \propto \exp(-|s_l|^q)$. The noise term $\nu$ can have nonzero mean. ICA tries to locate independent axes within the data cloud and was developed for blind source separation. It has been applied to speech separation and to analyzing fMRI and EEG data (Jung et al., 2001). ICA is also used to model data density, describing data as linear mixtures of independent features and finding projections that may uncover interesting structure in the data. Maximum likelihood learning of ICA with incomplete data has been studied by Welling and Weber (1999) in the limited case of a square mixing matrix and predefined source densities.
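To make the generative model concrete, the following minimal sketch samples sources with density $p(s_l) \propto \exp(-|s_l|^q)$ and mixes them through a nonsquare matrix. This is an illustration, not the paper's code: all sizes and parameter values are hypothetical, and the gamma-power draw is one standard recipe for this generalized Gaussian density.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration; none of these values come from the paper.
N, L, T, q = 7, 4, 200, 0.7

# Draw sources with density p(s) ~ exp(-|s|^q): if w ~ Gamma(1/q, 1),
# then sign * w**(1/q) has exactly this generalized Gaussian density.
w = rng.gamma(shape=1.0 / q, scale=1.0, size=(L, T))
s = rng.choice([-1.0, 1.0], size=(L, T)) * w ** (1.0 / q)

A = rng.normal(size=(N, L))                  # mixing matrix, possibly nonsquare
noise = 0.1 * rng.normal(size=(N, T)) + 0.5  # noise term may have nonzero mean
x = A @ s + noise                            # equation 1.1
```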

Many real-world data sets have intrinsic dimensionality smaller than that of the observed data. With missing data, principal component analysis cannot be used to perform dimension reduction as preprocessing for ICA. Instead, the variational Bayesian method applied to ICA can handle small data sets with high observed dimension (Chan, Lee, & Sejnowski, 2002; Choudrey & Roberts, 2001; Miskin, 2000). The Bayesian method prevents overfitting and performs automatic dimension reduction. In this article, we extend the variational Bayesian ICA method to problems with missing data. More important, the probability density estimate of the missing entries can be used to fill in the missing values. This allows the density model to be refined and made more accurate.

2 Model and Theory

2.1 ICA Generative Model with Missing Data. Consider a data set of T data points in an N-dimensional space: $X = \{x_t \in \mathbb{R}^N\}$, $t \in \{1, \ldots, T\}$. Assume a noisy ICA generative model for the data,

$$ P(x_t \mid \theta) = \int \mathcal{N}(x_t \mid A s_t + \nu, \Psi)\, P(s_t \mid \theta_s)\, ds_t, \tag{2.1} $$

where A is the mixing matrix and $\nu$ and $[\Psi]^{-1}$ are the observation mean and diagonal noise variance, respectively. The hidden source $s_t$ is assumed to have L dimensions. Similar to the independent factor analysis of Attias (1999), each component of $s_t$ will be modeled by a mixture of K Gaussians to allow for source densities of various kurtosis and skewness,

$$ P(s_t \mid \theta_s) = \prod_{l=1}^{L} \left( \sum_{k_l=1}^{K} \pi_{l k_l}\, \mathcal{N}\!\left( s_t(l) \mid \phi_{l k_l}, \beta_{l k_l} \right) \right). \tag{2.2} $$
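A short sketch of how equation 2.2 evaluates one source density as a mixture of K precision-parameterized Gaussians may help; the weights, means, and precisions below are arbitrary illustrative values, not from the paper.

```python
import numpy as np

def source_density(s, pi_l, phi_l, beta_l):
    """p(s) = sum_k pi_lk N(s | phi_lk, beta_lk), with beta a precision (eq. 2.2),
    for a single source dimension l."""
    s = np.atleast_1d(s)[:, None]                      # shape (points, 1)
    comp = np.sqrt(beta_l / (2.0 * np.pi)) \
         * np.exp(-0.5 * beta_l * (s - phi_l) ** 2)    # (points, K) components
    return comp @ pi_l                                 # mix over the K Gaussians

# Two components already suffice to express skewness and excess kurtosis:
p = source_density(np.linspace(-4.0, 4.0, 9),
                   pi_l=np.array([0.7, 0.3]),
                   phi_l=np.array([-0.5, 1.2]),
                   beta_l=np.array([1.0, 4.0]))
```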

Split each data point into a missing part and an observed part: $x_t^\top = (x_t^{o\top}, x_t^{m\top})$. In this article, we consider only the random missing case (Ghahramani & Jordan, 1994); that is, the probability for the missing entries $x_t^m$ is independent of the value of $x_t^m$, but could depend on the value of $x_t^o$. The likelihood of the data set is then defined to be

$$ \mathcal{L}(\theta; X) = \prod_t P(x_t^o \mid \theta), \tag{2.3} $$

where

$$ \begin{aligned} P(x_t^o \mid \theta) &= \int P(x_t \mid \theta)\, dx_t^m \\ &= \int \left[ \int \mathcal{N}(x_t \mid A s_t + \nu, \Psi)\, dx_t^m \right] P(s_t \mid \theta_s)\, ds_t \\ &= \int \mathcal{N}\!\left( x_t^o \mid [A s_t + \nu]_t^o, [\Psi]_t^o \right) P(s_t \mid \theta_s)\, ds_t. \end{aligned} \tag{2.4} $$

Here we have introduced the notation $[\cdot]_t^o$, which means taking only the observed dimensions (corresponding to the t-th data point) of whatever is inside the square brackets. Since equation 2.4 is similar to equation 2.1, the variational Bayesian ICA (Chan et al., 2002; Choudrey & Roberts, 2001; Miskin, 2000) can be extended naturally to handle missing data, provided that care is taken in discounting missing entries in the learning rules.
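In code, the $[\cdot]_t^o$ notation amounts to boolean-mask indexing. The sketch below is a hypothetical illustration of restricting a likelihood term to the observed dimensions; all variable names and values are ours.

```python
import numpy as np

rng = np.random.default_rng(1)

x_t = np.array([0.3, np.nan, 1.7, np.nan, -0.2])  # NaN marks missing entries
o_t = ~np.isnan(x_t)                              # indicator o_t of eq. 3.10

A = rng.normal(size=(5, 2))                       # hypothetical model parameters
nu = np.zeros(5)
s_t = np.array([0.4, -1.1])

mean_obs = (A @ s_t + nu)[o_t]                    # [A s_t + nu]^o_t
x_obs = x_t[o_t]                                  # x_t^o
resid = x_obs - mean_obs                          # the likelihood only ever sees
                                                  # the observed dimensions
```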

2.2 Variational Bayesian Method. In a full Bayesian treatment, the posterior distribution of the parameters $\theta$ is obtained by

$$ P(\theta \mid X) = \frac{P(X \mid \theta)\, P(\theta)}{P(X)} = \frac{\prod_t P(x_t^o \mid \theta)\, P(\theta)}{P(X)}, \tag{2.5} $$

where $P(X)$ is the marginal likelihood, given as

$$ P(X) = \int \prod_t P(x_t^o \mid \theta)\, P(\theta)\, d\theta. \tag{2.6} $$

The ICA model for $P(X)$ is defined with the following priors on the parameters $P(\theta)$:

$$ \begin{aligned} P(A_{nl}) &= \mathcal{N}(A_{nl} \mid 0, \alpha_l), & P(\pi_l) &= \mathcal{D}(\pi_l \mid d_o(\pi_l)), \\ P(\alpha_l) &= \mathcal{G}(\alpha_l \mid a_o(\alpha_l), b_o(\alpha_l)), & P(\phi_{l k_l}) &= \mathcal{N}(\phi_{l k_l} \mid \mu_o(\phi_{l k_l}), \Lambda_o(\phi_{l k_l})), \end{aligned} \tag{2.7} $$

$$ \begin{aligned} P(\beta_{l k_l}) &= \mathcal{G}(\beta_{l k_l} \mid a_o(\beta_{l k_l}), b_o(\beta_{l k_l})), \\ P(\nu_n) &= \mathcal{N}(\nu_n \mid \mu_o(\nu_n), \Lambda_o(\nu_n)), \qquad P(\Psi_n) = \mathcal{G}(\Psi_n \mid a_o(\Psi_n), b_o(\Psi_n)), \end{aligned} \tag{2.8} $$

where $\mathcal{N}(\cdot)$, $\mathcal{G}(\cdot)$, and $\mathcal{D}(\cdot)$ are the normal, gamma, and Dirichlet distributions, respectively:

$$ \mathcal{N}(x \mid \mu, \Lambda) = \sqrt{\left| \frac{\Lambda}{2\pi} \right|}\; e^{-\frac{1}{2}(x - \mu)^\top \Lambda (x - \mu)}; \tag{2.9} $$

$$ \mathcal{G}(x \mid a, b) = \frac{b^a}{\Gamma(a)}\, x^{a-1} e^{-b x}; \tag{2.10} $$

$$ \mathcal{D}(\pi \mid d) = \frac{\Gamma\!\left( \sum_k d_k \right)}{\prod_k \Gamma(d_k)}\; \pi_1^{d_1 - 1} \times \cdots \times \pi_K^{d_K - 1}. \tag{2.11} $$

Here $a_o(\cdot)$, $b_o(\cdot)$, $d_o(\cdot)$, $\mu_o(\cdot)$, and $\Lambda_o(\cdot)$ are prechosen hyperparameters for the priors. Notice that $\Lambda$ in the normal distribution is an inverse covariance parameter.

Under the variational Bayesian treatment, instead of performing the integration in equation 2.6 to solve for $P(\theta \mid X)$ directly, we approximate it by $Q(\theta)$ and opt to minimize the Kullback-Leibler divergence between them (MacKay, 1995; Jordan, Ghahramani, Jaakkola, & Saul, 1999):

$$ \begin{aligned} -\mathrm{KL}\!\left( Q(\theta) \,\|\, P(\theta \mid X) \right) &= \int Q(\theta) \log \frac{P(\theta \mid X)}{Q(\theta)}\, d\theta \\ &= \int Q(\theta) \left[ \sum_t \log P(x_t^o \mid \theta) + \log \frac{P(\theta)}{Q(\theta)} \right] d\theta - \log P(X). \end{aligned} \tag{2.12} $$

Since $-\mathrm{KL}(Q(\theta) \,\|\, P(\theta \mid X)) \le 0$, we get a lower bound for the log marginal likelihood,

$$ \log P(X) \ge \int Q(\theta) \sum_t \log P(x_t^o \mid \theta)\, d\theta + \int Q(\theta) \log \frac{P(\theta)}{Q(\theta)}\, d\theta, \tag{2.13} $$

which can also be obtained by applying Jensen's inequality to equation 2.6. $Q(\theta)$ is then solved by functional maximization of the lower bound. A separable approximate posterior $Q(\theta)$ will be assumed:

$$ Q(\theta) = Q(\nu)\, Q(\Psi) \times Q(A)\, Q(\alpha) \times \prod_l \left[ Q(\pi_l) \prod_{k_l} Q(\phi_{l k_l})\, Q(\beta_{l k_l}) \right]. \tag{2.14} $$

The second term in equation 2.13, which is the negative Kullback-Leibler divergence between the approximate posterior $Q(\theta)$ and the prior $P(\theta)$, is then expanded as

$$ \begin{aligned} \int Q(\theta) \log \frac{P(\theta)}{Q(\theta)}\, d\theta = {} & \sum_l \int Q(\pi_l) \log \frac{P(\pi_l)}{Q(\pi_l)}\, d\pi_l \\ & + \sum_l \sum_{k_l} \int Q(\phi_{l k_l}) \log \frac{P(\phi_{l k_l})}{Q(\phi_{l k_l})}\, d\phi_{l k_l} + \sum_l \sum_{k_l} \int Q(\beta_{l k_l}) \log \frac{P(\beta_{l k_l})}{Q(\beta_{l k_l})}\, d\beta_{l k_l} \\ & + \iint Q(A)\, Q(\alpha) \log \frac{P(A \mid \alpha)}{Q(A)}\, dA\, d\alpha + \int Q(\alpha) \log \frac{P(\alpha)}{Q(\alpha)}\, d\alpha \\ & + \int Q(\nu) \log \frac{P(\nu)}{Q(\nu)}\, d\nu + \int Q(\Psi) \log \frac{P(\Psi)}{Q(\Psi)}\, d\Psi. \end{aligned} \tag{2.15} $$

2.3 Special Treatment for Missing Data. Thus far, the analysis follows

almost exactly that of the variational Bayesian ICA on complete data, except that $P(x_t \mid \theta)$ is replaced by $P(x_t^o \mid \theta)$ in equation 2.6, and consequently the missing entries are discounted in the learning rules. However, it would be useful to obtain $Q(x_t^m \mid x_t^o)$, that is, the approximate distribution on the missing entries, which is given by

$$ Q(x_t^m \mid x_t^o) = \int Q(\theta) \int \mathcal{N}\!\left( x_t^m \mid [A s_t + \nu]_t^m, [\Psi]_t^m \right) Q(s_t)\, ds_t\, d\theta. \tag{2.16} $$

As noted by Welling and Weber (1999), the elements of $s_t$ given $x_t^o$ are dependent. More important, under the ICA model, $Q(s_t)$ is unlikely to be a single Gaussian. This is evident from Figure 1, which shows the probability density functions of the data x and the hidden variable s. The inserts show the sample data in the two spaces. Here the hidden sources assume a density of $P(s_l) \propto \exp(-|s_l|^{0.7})$. They are mixed noiselessly to give $P(x)$ in the upper graph. The cut in the upper graph represents $P(x_1 \mid x_2 = -0.5)$, which transforms into a highly correlated and non-Gaussian $P(s \mid x_2 = -0.5)$.

Figure 1: Probability density functions for the data x (top) and hidden sources s (bottom). Inserts show the sample data in the two spaces. The “cuts” show $P(x_1 \mid x_2 = -0.5)$ and $P(s \mid x_2 = -0.5)$.

Unless we are interested in only the first- and second-order statistics of $Q(x_t^m \mid x_t^o)$, we should try to capture as much structure as possible of $P(s_t \mid x_t^o)$ in $Q(s_t)$. In this article, we take a slightly different route from Chan et al. (2002) or Choudrey and Roberts (2001) when performing variational Bayesian learning. First, we break down $P(s_t)$ into a mixture of $K^L$ Gaussians in the L-dimensional s space:

$$ \begin{aligned} P(s_t) &= \prod_{l=1}^{L} \left( \sum_{k_l} \pi_{l k_l}\, \mathcal{N}(s_t(l) \mid \phi_{l k_l}, \beta_{l k_l}) \right) \\ &= \sum_{k_1} \cdots \sum_{k_L} \left[ \pi_{1 k_1} \times \cdots \times \pi_{L k_L} \times \mathcal{N}(s_t(1) \mid \phi_{1 k_1}, \beta_{1 k_1}) \times \cdots \times \mathcal{N}(s_t(L) \mid \phi_{L k_L}, \beta_{L k_L}) \right] \\ &= \sum_{\mathbf{k}} \pi_{\mathbf{k}}\, \mathcal{N}(s_t \mid \phi_{\mathbf{k}}, \beta_{\mathbf{k}}). \end{aligned} \tag{2.17} $$

Here we have defined $\mathbf{k}$ to be a vector index. The “$\mathbf{k}$-th” Gaussian is centered at $\phi_{\mathbf{k}}$, with inverse covariance $\beta_{\mathbf{k}}$, in the source s space:

$$ \begin{aligned} \mathbf{k} &= (k_1, \ldots, k_l, \ldots, k_L)^\top, \qquad k_l = 1, \ldots, K, \\ \phi_{\mathbf{k}} &= (\phi_{1 k_1}, \ldots, \phi_{l k_l}, \ldots, \phi_{L k_L})^\top, \\ \beta_{\mathbf{k}} &= \operatorname{diag}(\beta_{1 k_1}, \ldots, \beta_{L k_L}), \\ \pi_{\mathbf{k}} &= \pi_{1 k_1} \times \cdots \times \pi_{L k_L}. \end{aligned} \tag{2.18} $$
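The expansion in equations 2.17 and 2.18 can be enumerated directly. The sketch below (illustrative sizes, random parameters, our own naming) builds all $K^L$ expanded Gaussians and checks that the expanded weights $\pi_{\mathbf{k}}$ still sum to one.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
L, K = 3, 2                                  # illustrative sizes: K**L = 8

pi = rng.dirichlet(np.ones(K), size=L)       # pi[l, k]   mixture weights
phi = rng.normal(size=(L, K))                # phi[l, k]  component means
beta = rng.gamma(2.0, size=(L, K))           # beta[l, k] component precisions

# Expand the factorized source prior into K**L joint Gaussians (eqs. 2.17-2.18).
idx, expanded = np.arange(L), []
for k in itertools.product(range(K), repeat=L):   # vector index k
    kk = np.asarray(k)
    expanded.append({
        "pi_k": pi[idx, kk].prod(),          # pi_k  = prod_l pi_{l k_l}
        "phi_k": phi[idx, kk],               # mean vector phi_k
        "beta_k": np.diag(beta[idx, kk]),    # diagonal precision beta_k
    })

assert np.isclose(sum(g["pi_k"] for g in expanded), 1.0)
```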

The log likelihood for $x_t^o$ is then expanded using Jensen's inequality,

$$ \begin{aligned} \log P(x_t^o \mid \theta) &= \log \int P(x_t^o \mid s_t, \theta) \sum_{\mathbf{k}} \pi_{\mathbf{k}}\, \mathcal{N}(s_t \mid \phi_{\mathbf{k}}, \beta_{\mathbf{k}})\, ds_t \\ &= \log \sum_{\mathbf{k}} \pi_{\mathbf{k}} \int P(x_t^o \mid s_t, \theta)\, \mathcal{N}(s_t \mid \phi_{\mathbf{k}}, \beta_{\mathbf{k}})\, ds_t \\ &\ge \sum_{\mathbf{k}} Q(\mathbf{k}_t) \log \int P(x_t^o \mid s_t, \theta)\, \mathcal{N}(s_t \mid \phi_{\mathbf{k}}, \beta_{\mathbf{k}})\, ds_t + \sum_{\mathbf{k}} Q(\mathbf{k}_t) \log \frac{\pi_{\mathbf{k}}}{Q(\mathbf{k}_t)}. \end{aligned} \tag{2.19} $$

Figure 2: A simplified directed graph for the generative model of variational ICA. $x_t$ is the observed variable, $\mathbf{k}_t$ and $s_t$ are hidden variables, and the rest are model parameters. The $\mathbf{k}_t$ indicates which of the $K^L$ expanded Gaussians generated $s_t$.

Here, $Q(\mathbf{k}_t)$ is a short form for $Q(\mathbf{k}_t = \mathbf{k})$. $\mathbf{k}_t$ is a discrete hidden variable, and $Q(\mathbf{k}_t = \mathbf{k})$ is the probability that the t-th data point belongs to the $\mathbf{k}$-th Gaussian. Recognizing that $s_t$ is just a dummy variable, we introduce $Q(s_{\mathbf{k}t})$, apply Jensen's inequality again, and get

$$ \begin{aligned} \log P(x_t^o \mid \theta) \ge {} & \sum_{\mathbf{k}} Q(\mathbf{k}_t) \left[ \int Q(s_{\mathbf{k}t}) \log P(x_t^o \mid s_{\mathbf{k}t}, \theta)\, ds_{\mathbf{k}t} + \int Q(s_{\mathbf{k}t}) \log \frac{\mathcal{N}(s_{\mathbf{k}t} \mid \phi_{\mathbf{k}}, \beta_{\mathbf{k}})}{Q(s_{\mathbf{k}t})}\, ds_{\mathbf{k}t} \right] \\ & + \sum_{\mathbf{k}} Q(\mathbf{k}_t) \log \frac{\pi_{\mathbf{k}}}{Q(\mathbf{k}_t)}. \end{aligned} \tag{2.20} $$

Substituting $\log P(x_t^o \mid \theta)$ back into equation 2.13, the variational Bayesian method can be continued as usual. We have drawn in Figure 2 a simplified graphical representation for the generative model of variational ICA: $x_t$ is the observed variable, $\mathbf{k}_t$ and $s_t$ are hidden variables, and the rest are model parameters, where $\mathbf{k}_t$ indicates which of the $K^L$ expanded Gaussians generated $s_t$.

3 Learning Rules

Combining equations 2.13, 2.15, and 2.20, we perform functional maximization on the lower bound of the log marginal likelihood, $\log P(X)$, with regard to $Q(\theta)$ (see equation 2.14) and to $Q(\mathbf{k}_t)$ and $Q(s_{\mathbf{k}t})$ (see equation 2.20); for example,

$$ \log Q(\nu) = \log P(\nu) + \int Q(\theta_{\setminus \nu}) \sum_t \log P(x_t^o \mid \theta)\, d\theta_{\setminus \nu} + \text{const.}, \tag{3.1} $$

where $\theta_{\setminus \nu}$ is the set of parameters excluding $\nu$. This gives

$$ \begin{aligned} Q(\nu) &= \prod_n \mathcal{N}(\nu_n \mid \mu(\nu_n), \Lambda(\nu_n)) \\ \Lambda(\nu_n) &= \Lambda_o(\nu_n) + \langle \Psi_n \rangle \sum_t o_{nt} \\ \mu(\nu_n) &= \frac{ \Lambda_o(\nu_n)\, \mu_o(\nu_n) + \langle \Psi_n \rangle \sum_t o_{nt} \sum_{\mathbf{k}} Q(\mathbf{k}_t)\, \langle x_{nt} - A_{n\cdot} s_{\mathbf{k}t} \rangle }{ \Lambda(\nu_n) }. \end{aligned} \tag{3.2} $$

Similarly,

$$ \begin{aligned} Q(\Psi) &= \prod_n \mathcal{G}(\Psi_n \mid a(\Psi_n), b(\Psi_n)) \\ a(\Psi_n) &= a_o(\Psi_n) + \tfrac{1}{2} \sum_t o_{nt} \\ b(\Psi_n) &= b_o(\Psi_n) + \tfrac{1}{2} \sum_t o_{nt} \sum_{\mathbf{k}} Q(\mathbf{k}_t) \left\langle (x_{nt} - A_{n\cdot} s_{\mathbf{k}t} - \nu_n)^2 \right\rangle. \end{aligned} \tag{3.3} $$

$$ \begin{aligned} Q(A) &= \prod_n \mathcal{N}(A_{n\cdot} \mid \mu(A_{n\cdot}), \Lambda(A_{n\cdot})) \\ \Lambda(A_{n\cdot}) &= \operatorname{diag}(\langle \alpha_1 \rangle, \ldots, \langle \alpha_L \rangle) + \langle \Psi_n \rangle \sum_t o_{nt} \sum_{\mathbf{k}} Q(\mathbf{k}_t)\, \langle s_{\mathbf{k}t} s_{\mathbf{k}t}^\top \rangle \\ \mu(A_{n\cdot}) &= \left( \langle \Psi_n \rangle \sum_t o_{nt}\, (x_{nt} - \langle \nu_n \rangle) \sum_{\mathbf{k}} Q(\mathbf{k}_t)\, \langle s_{\mathbf{k}t}^\top \rangle \right) \Lambda(A_{n\cdot})^{-1}. \end{aligned} \tag{3.4} $$

$$ \begin{aligned} Q(\alpha) &= \prod_l \mathcal{G}(\alpha_l \mid a(\alpha_l), b(\alpha_l)) \\ a(\alpha_l) &= a_o(\alpha_l) + \frac{N}{2} \\ b(\alpha_l) &= b_o(\alpha_l) + \tfrac{1}{2} \sum_n \langle A_{nl}^2 \rangle. \end{aligned} \tag{3.5} $$

$$ \begin{aligned} Q(\pi_l) &= \mathcal{D}(\pi_l \mid d(\pi_l)) \\ d(\pi_{lk}) &= d_o(\pi_{lk}) + \sum_t \sum_{\mathbf{k}: k_l = k} Q(\mathbf{k}_t). \end{aligned} \tag{3.6} $$

$$ \begin{aligned} Q(\phi_{l k_l}) &= \mathcal{N}(\phi_{l k_l} \mid \mu(\phi_{l k_l}), \Lambda(\phi_{l k_l})) \\ \Lambda(\phi_{l k_l}) &= \Lambda_o(\phi_{l k_l}) + \langle \beta_{l k_l} \rangle \sum_t \sum_{\mathbf{k}: k_l = k} Q(\mathbf{k}_t) \\ \mu(\phi_{l k_l}) &= \frac{ \Lambda_o(\phi_{l k_l})\, \mu_o(\phi_{l k_l}) + \langle \beta_{l k_l} \rangle \sum_t \sum_{\mathbf{k}: k_l = k} Q(\mathbf{k}_t)\, \langle s_{\mathbf{k}t}(l) \rangle }{ \Lambda(\phi_{l k_l}) }. \end{aligned} \tag{3.7} $$

$$ \begin{aligned} Q(\beta_{l k_l}) &= \mathcal{G}(\beta_{l k_l} \mid a(\beta_{l k_l}), b(\beta_{l k_l})) \\ a(\beta_{l k_l}) &= a_o(\beta_{l k_l}) + \tfrac{1}{2} \sum_t \sum_{\mathbf{k}: k_l = k} Q(\mathbf{k}_t) \\ b(\beta_{l k_l}) &= b_o(\beta_{l k_l}) + \tfrac{1}{2} \sum_t \sum_{\mathbf{k}: k_l = k} Q(\mathbf{k}_t) \left\langle (s_{\mathbf{k}t}(l) - \phi_{l k_l})^2 \right\rangle. \end{aligned} \tag{3.8} $$

$$ \begin{aligned} Q(s_{\mathbf{k}t}) &= \mathcal{N}(s_{\mathbf{k}t} \mid \mu(s_{\mathbf{k}t}), \Lambda(s_{\mathbf{k}t})) \\ \Lambda(s_{\mathbf{k}t}) &= \operatorname{diag}(\langle \beta_{1 k_1} \rangle, \ldots, \langle \beta_{L k_L} \rangle) + \left\langle A^\top \operatorname{diag}(o_{1t} \Psi_1, \ldots, o_{Nt} \Psi_N)\, A \right\rangle \\ \Lambda(s_{\mathbf{k}t})\, \mu(s_{\mathbf{k}t}) &= \left( \langle \beta_{1 k_1} \phi_{1 k_1} \rangle, \ldots, \langle \beta_{L k_L} \phi_{L k_L} \rangle \right)^\top + \left\langle A^\top \operatorname{diag}(o_{1t} \Psi_1, \ldots, o_{Nt} \Psi_N)\, (x_t - \nu) \right\rangle. \end{aligned} \tag{3.9} $$

In the above equations, $\langle \cdot \rangle$ denotes the expectation over the posterior distributions $Q(\cdot)$, $A_{n\cdot}$ is the n-th row of the mixing matrix A, $\sum_{\mathbf{k}: k_l = k}$ means picking out those Gaussians whose index vector $\mathbf{k}$ has the value k for its l-th element, and $o_t$ is an indicator variable for observed entries in $x_t$:

$$ o_{nt} = \begin{cases} 1, & \text{if } x_{nt} \text{ is observed} \\ 0, & \text{if } x_{nt} \text{ is missing}. \end{cases} \tag{3.10} $$

For a model of equal noise variance among all the observation dimensions, the summation in the learning rules for $Q(\Psi)$ would be over both t and n. Note that there exists scale and translational degeneracy in the model, as given by equations 2.1 and 2.2. After each update of $Q(\pi_l)$, $Q(\phi_{l k_l})$, and $Q(\beta_{l k_l})$, it is better to rescale $P(s_t(l))$ to have zero mean and unit variance; $Q(s_{\mathbf{k}t})$, $Q(A)$, $Q(\alpha)$, $Q(\nu)$, and $Q(\Psi)$ have to be adjusted correspondingly. Finally, $Q(\mathbf{k}_t)$ is given by

$$ \log Q(\mathbf{k}_t) = \langle \log P(x_t^o \mid s_{\mathbf{k}t}, \theta) \rangle + \langle \log \mathcal{N}(s_{\mathbf{k}t} \mid \phi_{\mathbf{k}}, \beta_{\mathbf{k}}) \rangle - \langle \log Q(s_{\mathbf{k}t}) \rangle + \langle \log \pi_{\mathbf{k}} \rangle - \log z_t, \tag{3.11} $$

where $z_t$ is a normalization constant. The lower bound $E(X, Q(\theta))$ for the log marginal likelihood, computed using equations 2.13, 2.15, and 2.20, can be monitored during learning and used for comparison of different solutions or models. After some manipulation, $E(X, Q(\theta))$ can be expressed as

$$ E(X, Q(\theta)) = \sum_t \log z_t + \int Q(\theta) \log \frac{P(\theta)}{Q(\theta)}\, d\theta. \tag{3.12} $$
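As one illustration of how the mask $o_{nt}$ discounts missing entries in these rules, here is a sketch of the $Q(\nu)$ update of equation 3.2. For brevity it assumes the mixture over $\mathbf{k}$ has already been collapsed into a responsibility-averaged source mean; the function name and signature are our own, not from the paper's code.

```python
import numpy as np

def update_q_nu(x, o, E_psi, E_A, E_s, lam0, mu0):
    """Sketch of the Q(nu) update (eq. 3.2) with missing entries discounted.

    x:     (N, T) data, arbitrary values (e.g., NaN) at missing entries
    o:     (N, T) boolean indicator of eq. 3.10 (True = observed)
    E_psi: (N,)   posterior mean noise precisions <Psi_n>
    E_A:   (N, L) posterior mean mixing matrix <A>
    E_s:   (L, T) responsibility-averaged posterior source means
    lam0, mu0:    prior precision and mean hyperparameters for nu
    """
    resid = np.where(o, x - E_A @ E_s, 0.0)  # <x_nt - A_n. s_t>, zeroed if missing
    lam = lam0 + E_psi * o.sum(axis=1)       # Lambda(nu_n): observed entries only
    mu = (lam0 * mu0 + E_psi * resid.sum(axis=1)) / lam
    return mu, lam
```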

4 Missing Data

4.1 Filling in Missing Entries. Recovering missing values while performing demixing is possible if we have N > L. More specifically, if the number of observed dimensions in $x_t$ is greater than L, the equation

$$ x_t^o = [A]_t^o \cdot s_t \tag{4.1} $$

would be overdetermined in $s_t$ unless $[A]_t^o$ has rank smaller than L. In this case, $Q(s_t)$ is likely to be unimodal and peaked, point estimates of $s_t$ would be sufficient and reliable, and the learning rules of Chan et al. (2002), with a small modification to account for missing entries, would give a reasonable approximation. When $Q(s_t)$ is a single Gaussian, the exponential growth in complexity is avoided. However, if the number of observed dimensions in $x_t$ is less than L, equation 4.1 is now underdetermined in $s_t$, and $Q(s_t)$ would have a broad, multimodal structure. This corresponds to overcomplete ICA, where a single-Gaussian approximation of $Q(s_t)$ is undesirable and the formalism discussed in this article is needed to capture the higher-order statistics of $Q(s_t)$ and produce a more faithful $Q(x_t^m \mid x_t^o)$. The approximate distribution $Q(x_t^m \mid x_t^o)$ can be obtained by

$$ Q(x_t^m \mid x_t^o) = \sum_{\mathbf{k}} Q(\mathbf{k}_t) \int \delta(x_t^m - x_{\mathbf{k}t}^m)\, Q(x_{\mathbf{k}t}^m \mid x_t^o, \mathbf{k})\, dx_{\mathbf{k}t}^m, \tag{4.2} $$

where $\delta(\cdot)$ is the delta function, and

$$ \begin{aligned} Q(x_{\mathbf{k}t}^m \mid x_t^o, \mathbf{k}) &= \int Q(\theta) \int \mathcal{N}\!\left( x_{\mathbf{k}t}^m \mid [A s_{\mathbf{k}t} + \nu]_t^m, [\Psi]_t^m \right) Q(s_{\mathbf{k}t})\, ds_{\mathbf{k}t}\, d\theta \\ &= \iint Q(A)\, Q(\Psi)\, \mathcal{N}\!\left( x_{\mathbf{k}t}^m \mid \mu(x_{\mathbf{k}t}^m), \Lambda(x_{\mathbf{k}t}^m) \right) dA\, d\Psi \end{aligned} \tag{4.3} $$

$$ \mu(x_{\mathbf{k}t}^m) = \left[ A\, \mu(s_{\mathbf{k}t}) + \mu(\nu) \right]_t^m \tag{4.4} $$

$$ \Lambda(x_{\mathbf{k}t}^m)^{-1} = \left[ A\, \Lambda(s_{\mathbf{k}t})^{-1} A^\top + \Lambda(\nu)^{-1} + \operatorname{diag}(\Psi)^{-1} \right]_t^m. \tag{4.5} $$

Unfortunately, the integration over $Q(A)$ and $Q(\Psi)$ cannot be carried out analytically, but we can substitute $\langle A \rangle$ and $\langle \Psi \rangle$ as an approximation. Estimation of $Q(x_t^m \mid x_t^o)$ using the above equations is demonstrated in Figure 3. The shaded area is the exact posterior $P(x_t^m \mid x_t^o)$ for the noiseless mixing in Figure 1 with observed $x_2 = -2$, and the solid line is the approximation by equations 4.2 through 4.5. We have also modified the variational ICA of Chan et al. (2002) by discounting missing entries; this is done by replacing $\sum_t$ with $\sum_t o_{nt}$ and $\Psi_n$ with $o_{nt} \Psi_n$ in their learning rules. The dashed line is the approximation $Q(x_t^m \mid x_t^o)$ from this modified method, which we refer to as polynomial missing ICA. The treatment of fully expanding the $K^L$ hidden source Gaussians discussed in section 2.3 is named full missing ICA. The full missing ICA gives a more accurate fit for $P(x_t^m \mid x_t^o)$ and a better estimate for $\langle x_t^m \mid x_t^o \rangle$.

Figure 3: The approximation of $Q(x_t^m \mid x_t^o)$ from the full missing ICA (solid line) and the polynomial missing ICA (dashed line). The shaded area is the exact posterior $P(x_t^m \mid x_t^o)$ corresponding to the noiseless mixture in Figure 1 with observed $x_2 = -2$. Dotted lines are the contributions from the individual $Q(x_{\mathbf{k}t}^m \mid x_t^o, \mathbf{k})$.

From equation 2.16,

$$ Q(x_t^m \mid x_t^o) = \int Q(\theta) \int \mathcal{N}\!\left( x_t^m \mid [A s_t + \nu]_t^m, [\Psi]_t^m \right) Q(s_t)\, ds_t\, d\theta, \tag{4.6} $$

and under the above formalism, $Q(s_t)$ becomes

$$ Q(s_t) = \sum_{\mathbf{k}} Q(\mathbf{k}_t) \int \delta(s_t - s_{\mathbf{k}t})\, Q(s_{\mathbf{k}t})\, ds_{\mathbf{k}t}, \tag{4.7} $$

which is a mixture of $K^L$ Gaussians. The missing values can then be filled in by

$$ \langle s_t \mid x_t^o \rangle = \int s_t\, Q(s_t)\, ds_t = \sum_{\mathbf{k}} Q(\mathbf{k}_t)\, \mu(s_{\mathbf{k}t}) \tag{4.8} $$

$$ \langle x_t^m \mid x_t^o \rangle = \int x_t^m\, Q(x_t^m \mid x_t^o)\, dx_t^m = \sum_{\mathbf{k}} Q(\mathbf{k}_t)\, \mu(x_{\mathbf{k}t}^m) = [A]_t^m\, \langle s_t \mid x_t^o \rangle + [\mu(\nu)]_t^m, \tag{4.9} $$

where $\mu(s_{\mathbf{k}t})$ and $\mu(x_{\mathbf{k}t}^m)$ are given in equations 3.9 and 4.4. Alternatively, a maximum a posteriori (MAP) estimate of $Q(s_t)$ and $Q(x_t^m \mid x_t^o)$ may be obtained, but then numerical methods are needed.
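A sketch of equations 4.8 and 4.9, with $\langle A \rangle$ and $\langle \nu \rangle$ plugged in as in the approximation above; the argument names and shapes are our own conventions, not the paper's.

```python
import numpy as np

def fill_missing(Qk, mu_s, E_A, E_nu, o_t):
    """Fill in x_t^m via eqs. 4.8-4.9, plugging in <A> and <nu>.

    Qk:   (Kexp,)   responsibilities Q(k_t) over the K**L expanded Gaussians
    mu_s: (Kexp, L) posterior means mu(s_kt) from eq. 3.9
    E_A:  (N, L)    <A>;  E_nu: (N,) <nu>;  o_t: (N,) boolean observed mask
    """
    E_s = Qk @ mu_s                # <s_t | x_t^o> = sum_k Q(k_t) mu(s_kt)  (4.8)
    x_hat = E_A @ E_s + E_nu       # mean reconstruction in data space      (4.9)
    return E_s, x_hat[~o_t]        # expected values of the missing entries
```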

4.2 The “Full” and “Polynomial” Missing ICA. The complexity of the full variational Bayesian ICA method is proportional to $T \times K^L$, where T is the number of data points, L is the number of hidden sources assumed, and K is the number of Gaussians used to model the density of each source. If we set K = 2, the five parameters in the source density model $P(s_t(l))$ are already enough to model the mean, variance, skewness, and kurtosis of the source distribution. The full missing ICA should always be preferred if memory and computational time permit. The polynomial missing ICA converges more slowly per epoch of the learning rules and suffers from many more local maxima; it attains an inferior marginal likelihood lower bound. The problems are more serious at high missing-data rates, where a local maximum solution is usually found instead. In the full missing ICA, $Q(s_t)$ is a mixture of Gaussians. In the extreme case when all entries of a data point are missing, that is, an empty $x_t^o$, $Q(s_t)$ is the same as $P(s_t \mid \theta)$ and would not interfere with the learning of $P(s_t \mid \theta)$ from other data points. On the other hand, the single-Gaussian $Q(s_t)$ in the polynomial missing ICA would drive $P(s_t \mid \theta)$ to become Gaussian too, which is very undesirable when learning ICA structure.

5 Clusters of ICA

The variational Bayesian ICA for missing data described above can easily be extended to model data density with C clusters of ICA. First, all parameters $\theta$ and hidden variables $\mathbf{k}_t$, $s_{\mathbf{k}t}$ for each cluster are given a superscript index c. A parameter $\kappa = \{\kappa^1, \ldots, \kappa^C\}$ is introduced to represent the weights on the clusters, and $\kappa$ has a Dirichlet prior (see equation 2.11). $\Theta = \{\kappa, \theta^1, \ldots, \theta^C\}$ is now the collection of all parameters. Our density model in equation 2.1 becomes

$$ \begin{aligned} P(x_t \mid \Theta) &= \sum_c P(c_t = c \mid \kappa)\, P(x_t \mid \theta^c) \\ &= \sum_c P(c_t = c \mid \kappa) \int \mathcal{N}(x_t \mid A^c s_t^c + \nu^c, \Psi^c)\, P(s_t^c \mid \theta_s^c)\, ds_t^c. \end{aligned} \tag{5.1} $$

The objective function in equation 2.13 remains the same, but with $\theta$ replaced by $\Theta$. The separable posterior $Q(\Theta)$ is given by

$$ Q(\Theta) = Q(\kappa) \prod_c Q(\theta^c), \tag{5.2} $$

and similar to equation 2.15,

$$ \int Q(\Theta) \log \frac{P(\Theta)}{Q(\Theta)}\, d\Theta = \int Q(\kappa) \log \frac{P(\kappa)}{Q(\kappa)}\, d\kappa + \sum_c \int Q(\theta^c) \log \frac{P(\theta^c)}{Q(\theta^c)}\, d\theta^c. \tag{5.3} $$

Equation 2.20 now becomes

$$ \begin{aligned} \log P(x_t^o \mid \Theta) \ge {} & \sum_c Q(c_t) \log \frac{P(c_t)}{Q(c_t)} + \sum_{c, \mathbf{k}} Q(c_t)\, Q(\mathbf{k}_t^c) \\ & \times \left[ \int Q(s_{\mathbf{k}t}^c) \log P(x_t^o \mid s_{\mathbf{k}t}^c, \theta^c)\, ds_{\mathbf{k}t}^c + \int Q(s_{\mathbf{k}t}^c) \log \frac{\mathcal{N}(s_{\mathbf{k}t}^c \mid \phi_{\mathbf{k}}^c, \beta_{\mathbf{k}}^c)}{Q(s_{\mathbf{k}t}^c)}\, ds_{\mathbf{k}t}^c \right] \\ & + \sum_{c, \mathbf{k}} Q(c_t)\, Q(\mathbf{k}_t^c) \log \frac{\pi_{\mathbf{k}}^c}{Q(\mathbf{k}_t^c)}. \end{aligned} \tag{5.4} $$

We have introduced one more hidden variable, $c_t$, and $Q(c_t)$ is to be interpreted in the same fashion as $Q(\mathbf{k}_t^c)$. All learning rules in section 3 remain the same, only with $\sum_t$ replaced by $\sum_t Q(c_t)$. Finally, we need two more learning rules,

$$ d(\kappa^c) = d_o(\kappa^c) + \sum_t Q(c_t) \tag{5.5} $$

$$ \log Q(c_t) = \langle \log \kappa^c \rangle + \log z_t^c - \log Z_t, \tag{5.6} $$

where $z_t^c$ is the normalization constant for $Q(\mathbf{k}_t^c)$ (see equation 3.11) and $Z_t$ is for normalizing $Q(c_t)$.
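Equation 5.6 normalizes the per-cluster evidence against the cluster weights. A numerically stable sketch (our own naming) computes $Q(c_t)$ in the log domain:

```python
import numpy as np

def cluster_responsibilities(log_kappa, log_z):
    """Eq. 5.6: Q(c_t) propto exp(<log kappa^c>) * z_t^c, normalized over c.

    log_kappa: (C,)   expectations <log kappa^c> under the Dirichlet posterior
    log_z:     (C, T) per-cluster log normalizers log z_t^c from eq. 3.11
    """
    log_q = log_kappa[:, None] + log_z             # unnormalized log Q(c_t)
    log_q -= log_q.max(axis=0, keepdims=True)      # stabilize before exp
    q = np.exp(log_q)
    return q / q.sum(axis=0, keepdims=True)        # divide by Z_t
```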

6 Supervised Classification

It is generally difficult for discriminative classifiers, such as the multilayer perceptron (Bishop, 1995) or the support vector machine (Vapnik, 1998), to handle missing data. In this section, we extend the variational Bayesian technique to supervised classification.

Consider a data set $(X_T, Y_T) = \{(x_t, y_t),\ t \in (1, \ldots, T)\}$. Here, $x_t$ contains the input attributes and may have missing entries, and $y_t \in \{1, \ldots, y, \ldots, Y\}$ indicates which of the Y classes $x_t$ is associated with. When given a new data point $x_{T+1}$, we would like to compute $P(y_{T+1} \mid x_{T+1}, X_T, Y_T, M)$:

$$ P(y_{T+1} \mid x_{T+1}, X_T, Y_T, M) = \frac{ P(x_{T+1} \mid y_{T+1}, X_T, Y_T, M)\, P(y_{T+1} \mid X_T, Y_T, M) }{ P(x_{T+1} \mid X_T, Y_T, M) }. \tag{6.1} $$

Here M denotes our generative model for the observations $\{x_t, y_t\}$:

$$ P(x_t, y_t \mid M) = P(x_t \mid y_t, M)\, P(y_t \mid M). \tag{6.2} $$

$P(x_t \mid y_t, M)$ could be a mixture model as given by equation 5.1.

6.1 Learning of Model Parameters. Let $P(x_t \mid y_t, M)$ be parameterized by $\Theta_y$ and $P(y_t \mid M)$ be parameterized by $\omega = (\omega_1, \ldots, \omega_Y)$:

$$ P(x_t \mid y_t = y, M) = P(x_t \mid \Theta_y) \tag{6.3} $$

$$ P(y_t \mid M) = P(y_t = y \mid \omega) = \omega_y. \tag{6.4} $$

If $\omega$ is given a Dirichlet prior, $P(\omega \mid M) = \mathcal{D}(\omega \mid d_o(\omega))$, its posterior is also a Dirichlet distribution:

$$ P(\omega \mid Y_T, M) = \mathcal{D}(\omega \mid d(\omega)) \tag{6.5} $$

$$ d(\omega_y) = d_o(\omega_y) + \sum_t I(y_t = y). \tag{6.6} $$

$I(\cdot)$ is an indicator function that equals 1 if its argument is true and 0 otherwise.

Under the generative model of equation 6.2, it can be shown that

$$ P(\Theta_y \mid X_T, Y_T, M) = P(\Theta_y \mid X_y), \tag{6.7} $$

where $X_y$ is the subset of $X_T$ containing only those $x_t$ whose training labels $y_t$ have value y. Hence, $P(\Theta_y \mid X_T, Y_T, M)$ can be approximated with $Q(\Theta_y)$ by applying the learning rules of sections 3 and 5 on the subset $X_y$.

6.2 Classification. First, $P(y_{T+1} \mid X_T, Y_T, M)$ in equation 6.1 can be computed by

$$ P(y_{T+1} = y \mid X_T, Y_T, M) = \int P(y_{T+1} = y \mid \omega_y)\, P(\omega_y \mid X_T, Y_T)\, d\omega_y = \frac{d(\omega_y)}{\sum_{y'} d(\omega_{y'})}. \tag{6.8} $$

The other term, $P(x_{T+1} \mid y_{T+1}, X_T, Y_T, M)$, can be computed as

$$ \begin{aligned} \log P(x_{T+1} \mid y_{T+1} = y, X_T, Y_T, M) &= \log P(x_{T+1} \mid X_y, M) \\ &= \log P(x_{T+1}, X_y \mid M) - \log P(X_y \mid M) \end{aligned} \tag{6.9} $$

$$ \approx E(\{x_{T+1}, X_y\}, Q'(\Theta_y)) - E(X_y, Q(\Theta_y)). \tag{6.10} $$

The above requires adding $x_{T+1}$ to $X_y$ and iterating the learning rules to obtain $Q'(\Theta_y)$ and $E(\{x_{T+1}, X_y\}, Q'(\Theta_y))$. The error in the approximation is the difference $\mathrm{KL}(Q'(\Theta_y), P(\Theta_y \mid \{x_{T+1}, X_y\})) - \mathrm{KL}(Q(\Theta_y), P(\Theta_y \mid X_y))$. If we assume further that $Q'(\Theta_y) \approx Q(\Theta_y)$,

$$ \log P(x_{T+1} \mid X_y, M) \approx \int Q(\Theta_y) \log P(x_{T+1} \mid \Theta_y)\, d\Theta_y = \log Z_{T+1}, \tag{6.11} $$

where $Z_{T+1}$ is the normalization constant in equation 5.6.

7 Experiment

7.1 Synthetic Data. In the first experiment, 200 data points were generated by mixing four sources randomly in a seven-dimensional space. The generalized Gaussian, gamma, and beta distributions were used to represent source densities of various skewness and kurtosis (see Figure 5). Noise at a $-26$ dB level was added to the data, and missing entries were created with a probability of 0.3. The data matrix for the first 100 data points is plotted in Figure 4; dark pixels represent missing entries. Notice that some data points have fewer than four observed dimensions. In Figure 5, we plot the histograms of the recovered sources and the probability density functions (pdfs) of the four sources. The dashed line is the exact pdf used to generate the data, and the solid line is the pdf modeled by a mixture of two one-dimensional Gaussians (see equation 2.2). The two Gaussians gave an adequate fit to the source histograms and densities.

Figure 4: In the first experiment, 30% of the entries in the seven-dimensional data set are missing, as indicated by the black entries. (The first 100 data points are shown.)

Figure 5: Source density modeling by variational missing ICA of the synthetic data. Histograms: recovered source distributions; dashed lines: original probability densities; solid line: mixture-of-Gaussians modeled probability densities; dotted lines: individual Gaussian contributions.
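A sketch of how such a synthetic set could be generated follows; the specific source distributions, centering constants, and noise calibration below are stand-ins chosen for illustration, not the exact choices used in the experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, L = 200, 7, 4

# Stand-in draws for sources of various skewness and kurtosis.
s = np.vstack([rng.laplace(size=T),                 # heavy-tailed
               rng.gamma(2.0, size=T) - 2.0,        # skewed
               rng.beta(0.5, 0.5, size=T) - 0.5,    # bimodal / sub-Gaussian
               rng.standard_t(5, size=T)])          # moderately heavy-tailed

A = rng.normal(size=(N, L))                         # random mixing matrix
x = A @ s
x += 10 ** (-26 / 20) * x.std() * rng.normal(size=x.shape)  # ~ -26 dB noise

mask = rng.uniform(size=x.shape) < 0.3              # 30% missing at random
x_missing = np.where(mask, np.nan, x)
```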

Figure 6: $E(X, Q(\theta))$ as a function of the number of hidden source dimensions (vertical axis: log marginal likelihood lower bound). Full missing ICA refers to the full expansion of Gaussians discussed in section 2.3, and polynomial missing ICA refers to the Chan et al. (2002) method with minor modification.

Figure 6 plots the lower bound of the log marginal likelihood (see equation 3.12) for models assuming different numbers of intrinsic dimensions. As expected, the Bayesian treatment allows us to infer the intrinsic dimension of the data cloud. In the figure, we also plot $E(X, Q(\theta))$ from the polynomial missing ICA. Since a less negative lower bound represents a smaller Kullback-Leibler divergence between $Q(\theta)$ and $P(\theta \mid X)$, it is clear from the figure that the full missing ICA gave a better fit to the data density.

7.2 Mixing Images. This experiment demonstrates the ability of the proposed method to fill in missing values while performing demixing, which is made possible when we have more mixtures than hidden sources, or N > L. The top row of Figure 7 shows the two original 380 × 380 pixel images. They were linearly mixed into three images, and $-20$ dB noise was added. Missing entries were then introduced at random with probability 0.2. The denoised mixtures are shown in the third row of Figure 7, and the recovered sources are in the bottom row. Only 0.8% of the pixels were missing from all three mixed images and could not be recovered; 38.4% of the pixels were missing from only one mixed image, and their values could be filled in with low uncertainty; and 9.6% of the pixels were missing from any two of the mixed images, for which estimation is possible but carries higher uncertainty. From Figure 7, we can see that the source images were well separated and the mixed images were nicely denoised. The signal-to-noise ratio (SNR) in the separated images was 14 dB. We also tried filling in the missing pixels by EM with a Gaussian model and then applying variational Bayesian ICA on the “completed” data; the SNR achieved in the unmixed images was only 5 dB. This supports that it is crucial to have the correct density model when filling in missing values and important to learn the density model and the missing values concurrently. The denoised mixed images in this example were meant only to illustrate the method visually. However, if $(x_1, x_2, x_3)$ represent cholesterol, blood sugar, and uric acid levels, for example, it would be possible to fill in the third when only two are available.

Figure 7: A demonstration of recovering missing values when N > L. The original images are in the top row. Twenty percent of the pixels in the mixed images (second row) are missing at random. Only 0.8% are missing from the denoised mixed images (third row) and the separated images (bottom).

7.3 Survival Prediction. We demonstrate the supervised classification discussed in section 6 with an echocardiogram data set downloaded from the UCI Machine Learning Repository (Blake & Merz, 1998). The input variables are age-at-heart-attack, fractional-shortening, epss, lvdd, and wall-motion-index. The goal is to predict the survival of the patient one year after heart attack. There are 24 positive and 50 negative examples, and the data matrix has a missing rate of 5.4%. We performed leave-one-out cross-validation to evaluate our classifier. Thresholding the output $P(y_{T+1} \mid x_{T+1}, X_T, Y_T, M)$, computed using equation 6.10, at 0.5, we obtained a true positive rate of 16/24 and a true negative rate of 42/50.
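For reference, the leave-one-out tallies reduce to thresholding the per-fold posterior outputs. A small bookkeeping sketch (our own naming, assuming the held-out posteriors have already been computed as above):

```python
import numpy as np

def loo_rates(y_true, p_pos, threshold=0.5):
    """True positive / true negative rates from leave-one-out outputs.

    y_true: (T,) binary labels (1 = survived)
    p_pos:  (T,) held-out posteriors P(y_{T+1}=1 | x_{T+1}, X_T, Y_T, M)
    """
    y_true = np.asarray(y_true)
    y_hat = (np.asarray(p_pos) >= threshold).astype(int)
    tpr = np.mean(y_hat[y_true == 1] == 1)   # e.g., 16/24 in section 7.3
    tnr = np.mean(y_hat[y_true == 0] == 0)   # e.g., 42/50 in section 7.3
    return tpr, tnr
```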

8 Conclusion

In this article, we derived the learning rules for variational Bayesian ICA with missing data. The complexity of the method is proportional to $T \times K^L$, where T is the number of data points, L is the number of hidden sources assumed, and K is the number of one-dimensional Gaussians used to model the density of each source. This exponential growth in complexity is nevertheless manageable and worthwhile for small data sets containing missing entries in a high-dimensional space. The proposed method shows promise in analyzing and identifying projections of data sets that have a very limited number of expensive data points yet contain missing entries due to data scarcity. The extension to model data density with clusters of ICA was discussed, and the application of the technique in a supervised classification setting was also covered. We have applied the variational Bayesian missing ICA to a primate brain volumetric data set containing 44 examples in 57 dimensions. Very encouraging results were obtained and will be reported in another article.
