
Transactions Letters

A Scheme for Spatial Scalability Using Nonscalable Encoders

Rakesh Dugad, Member, IEEE, and Narendra Ahuja, Fellow, IEEE

Abstract—We describe a scheme that achieves spatially scalable coding of video by employing nonscalable video encoders (e.g., MPEG-2 main profile), along with a downsampler and an upsampler. The scheme is illustrated for the case of coding video at two resolutions. The enhancement layer is coded in two steps, by first exploiting the spatial redundancy and then exploiting the temporal redundancy. Hence, the scheme has a separable implementation. Results are presented for five different sequences, coded for three different combinations of base and enhancement layer bit rates. When MPEG-2 main profile is used for the nonscalable encoders, the results obtained are comparable to the performance of the MPEG-2 spatial scalability profile.

Index Terms—Discrete cosine transform (DCT), HDTV, MPEG-2, multiresolution coding, scalable video compression, spatial scalability, subband decomposition.

I. INTRODUCTION

TO ACCOMMODATE the varied requirements on computational speed, bandwidth, and compatibility with existing equipment, many applications require that a compressed video stream be decodable at various resolutions and signal qualities. Of the various ways of achieving such scalable compression, we shall focus on spatial scalability.

In spatial scalability, the video is coded at a hierarchy of spatial resolutions, with each higher layer using the (decoded) lower layers for spatial prediction [1]. In the case of two resolutions, the lower layer is called the base layer and the higher layer is called the enhancement layer. Hence, to obtain the video at the lower resolution, only the base layer need be decoded, but to get the higher resolution video, both the base and enhancement layers need to be decoded and combined. A special case is simulcast, in which the video at each of the various resolutions is coded independently of the video at every other resolution. This is wasteful of bandwidth, because the bit rate can be reduced by exploiting the redundancy across various resolutions, as in spatial scalability. Spatial scalability has many applications. It is used in HDTV to maintain compatibility with standard-definition TV. For transmitting video over dual-priority networks, we can transmit a

Manuscript received October 4, 2001; revised October 25, 2002. This work was supported by the Office of Naval Research under Grant N00014-96-1-0502. This paper was recommended by Associate Editor J.-R. Ohm.

R. Dugad was with the Department of Electrical and Computer Engineering and the Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA. He is now with Flarion Technologies, Bedminster, NJ 07921 USA. N. Ahuja is with the Department of Electrical and Computer Engineering and the Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA.

Digital Object Identifier 10.1109/TCSVT.2003.816519

low-resolution version of the video over the high-priority channel and an enhancement layer over the low-priority channel. Also, one solution to transmitting video over bandwidth-constrained channels is to transmit a low-resolution version of the video. For browsing a remote video database, it would be more economical to send low-resolution versions of the video clips to the user and then, depending on his or her interest, progressively enhance the resolution.

In this paper, we propose a scheme that achieves spatial scalability by using two nonscalable encoders (e.g., MPEG-2 main profile), along with a downsampler and an upsampler. Achieving the functionality of spatial scalability with standard equipment that already contains a number of nonscalable encoders is economically and practically very attractive.

A. Overview of Previous Work

A scheme for scalable compression of images using the Laplacian pyramid was first proposed by Burt and Adelson [2]. Later, the subband decomposition of images [3], [4], along with the theory of wavelets [5], removed the redundancy present in the pyramid representation. Very efficient schemes for such scalable compression based on subband decomposition, such as the embedded zerotree wavelet (EZW) algorithm of Shapiro [6] and the set partitioning in hierarchical trees (SPIHT) procedure introduced by Said and Pearlman [7], have been devised for still image compression.

However, the extension of such schemes to video is not straightforward, because exploiting temporal redundancy usually involves recursive prediction (in the temporal direction). This implies that the encoder and the decoder have to maintain the same state (prediction value) to avoid error propagation. Hence, if the decoder is able to only partially decode the stream, its state will not match that of the encoder. This leads to error propagation, also called drift.
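To make the drift effect concrete, the following toy simulation (ours, not from the paper) runs the same recursive prediction loop at an encoder and at a decoder that can only partially decode each coded residual (here, by truncating its finest bitplane). Because the decoder's reference state no longer matches the encoder's, the mismatch accumulates from frame to frame.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, step):
    """Uniform scalar quantizer shared by encoder and decoder."""
    return np.round(x / step) * step

# Toy 1-D "video": each frame is the previous one plus small motion noise.
frames = [rng.normal(size=16)]
for _ in range(29):
    frames.append(frames[-1] + 0.1 * rng.normal(size=16))

step = 0.5
enc_ref = np.zeros(16)  # encoder's reconstructed reference (its state)
dec_ref = np.zeros(16)  # decoder's reference state
drift = []
for f in frames:
    residual = quantize(f - enc_ref, step)  # coded prediction error
    enc_ref = enc_ref + residual            # encoder closes the loop exactly
    # Partial decoding: the decoder truncates the residual (drops its finest
    # bitplane), so its state no longer matches the encoder's ...
    dec_ref = dec_ref + np.floor(residual / (2 * step)) * (2 * step)
    drift.append(float(np.abs(enc_ref - dec_ref).mean()))

# ... and the mismatch (drift) grows over time instead of staying bounded.
print(drift[0], drift[-1])
```

The truncation error has a consistent sign here, so the drift is monotonically nondecreasing; with random-signed losses it would instead grow like a random walk, but it would still grow.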

Various schemes based on two-dimensional (2-D) [8]–[14] and three-dimensional (3-D) [15]–[18] subband decompositions, with and without motion compensation, have been proposed. These schemes extend the ideas from scalable image compression (e.g., subband decomposition) and nonscalable video compression (e.g., motion compensation) to achieve scalable video compression and avoid the problem of drift. Drift can also be eliminated by coding the video explicitly at various resolutions, while exploiting the redundancies across the resolutions to reduce the bit rate. For example, in the case of two resolutions, a base layer is created (with any nonscalable scheme) containing the video at the lower resolution. To create the enhancement layer, the high-resolution frame is predicted

1051-8215/03$17.00 © 2003 IEEE

by predicting either its pixel values or its transform coefficients. The spatial scalability scheme adopted in MPEG-2 predicts the pixel values using a weighted combination (on a macroblock-by-macroblock basis) of an upsampled version of the low-resolution frame and a motion-compensated version of the previous reference frame. This allows for coding the video at two different bit rates. The fine granularity scalability (FGS) adopted in MPEG-4 allows for coding the video at a variety of bit rates [19]. The enhancement layer codes the difference between the original and the picture reconstructed from the base layer, using bit-plane coding of discrete cosine transform (DCT) coefficients. Unlike MPEG-2, recursive temporal prediction is not used in the enhancement layer, which can be truncated to any number of bits per picture after encoding is complete. The enhancement-layer quality is proportional to the number of bits decoded.

The transform coefficients of the enhancement layer can be predicted in the following manner while exploiting temporal redundancy [10], [11], [13]. A motion-compensated prediction of the current high-resolution frame is formed by replacing each block by its prediction (closest match in the MSE sense) in the previous decoded high-resolution reference frame. This motion-compensated prediction is decomposed with the discrete wavelet transform (DWT), and the high-frequency coefficients are used as predictions for the corresponding high-frequency coefficients of the subband decomposition of the current high-resolution frame. The residual (prediction error) thus obtained is quantized and coded directly [11], or first DCT transformed and then quantized and coded [10], [12]. This approach has the following two problems.

• The low-frequency components of a block play a crucial role in deciding its motion-compensated match in the previous reference frame. However, only the high-frequency components are predicted after motion compensation. Since most blocks have significant energy in the low-frequency components, motion compensation will not be very effective at minimizing the energy in the residual of the high-frequency components.

• If the predicted block contains parts which are shifted versions of the corresponding parts of the original block, then prediction of the high-frequency components will suffer (because the DWT is shift-variant).
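The second problem, the shift-variance of the critically sampled DWT, can be seen in a few lines. The sketch below (our illustration, using a one-level Haar split) shows that the detail coefficients of a shifted signal are not a shifted copy of the original detail coefficients, so even a perfect translational match in the pixel domain gives a poor match in the high-frequency subband.

```python
import numpy as np

def haar_highpass(x):
    """One-level Haar wavelet detail (high-frequency) coefficients of a 1-D signal."""
    return (x[0::2] - x[1::2]) / np.sqrt(2.0)

x = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 8.0, 4.0, 2.0])
x_shift = np.roll(x, 1)  # the same content, translated by one sample

d = haar_highpass(x)
d_shift = haar_highpass(x_shift)

# Although x_shift is just a shifted copy of x, its detail coefficients are not
# a shifted copy of d: the critically sampled DWT is shift-variant, so
# motion-compensated prediction of high-frequency subbands can fail even for
# pure translation.
print(np.allclose(np.roll(d, 1), d_shift))  # prints False
```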

B. Motivation for the Proposed Scheme

In the above section, we saw the problems associated with predicting the high-frequency coefficients of the 2-D subband decomposition of a frame by motion compensation on the spatial-domain frames (which have predominantly low-frequency content). In our scheme, we use motion compensation to directly match the high-frequency contents which are to be coded in the enhancement layer. This is achieved by performing motion compensation on a spatial representation of the high-frequency contents (e.g., edges in the high-resolution frame).
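The idea of matching high-frequency content directly can be sketched as follows. This is our illustration only: a crude box-blur highpass stands in for an actual edge representation, and a full-search sum-of-absolute-differences (SAD) match is run on the high-frequency images rather than on the original frames, so the search is driven by exactly the content the enhancement layer has to code.

```python
import numpy as np

def highpass(frame):
    """Crude spatial high-frequency representation: frame minus a 3x3 box blur
    (a stand-in for 'edges in the high-resolution frame')."""
    pad = np.pad(frame, 1, mode="edge")
    blur = sum(pad[i:i + frame.shape[0], j:j + frame.shape[1]]
               for i in range(3) for j in range(3)) / 9.0
    return frame - blur

def best_match(block, ref, top, left, radius=2):
    """Full search: displacement (dy, dx) minimizing SAD against `ref`."""
    h, w = block.shape
    best, best_sad = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                continue
            sad = np.abs(block - ref[y:y + h, x:x + w]).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best

rng = np.random.default_rng(1)
prev = rng.normal(size=(16, 16))
cur = np.roll(prev, shift=1, axis=1)  # current frame = previous shifted right by 1

# Matching is done on the high-frequency representations of both frames.
dy, dx = best_match(highpass(cur)[4:8, 4:8], highpass(prev), 4, 4)
print(dy, dx)  # recovers the true shift (0, -1)
```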

Another feature of our scheme is that it employs two nonscalable encoders, along with a downsampler and an upsampler. Achieving the functionality of spatial scalability with equipment containing a number of nonscalable encoders is economically and practically very attractive. Such functionality provides an affordable path to high-definition broadcasting while maintaining compatibility with existing standards and equipment. For example, various HDTV encoders in the market employ six SDTV encoders (which already exist in their standard equipment) to get the effect of an HDTV encoder [20], [21]. This makes the equipment for SDTV encoding more attractive to broadcasters, who can use the same equipment for HDTV broadcasting when they are ready for it. Such equipment also allows for switching between SDTV and HDTV transmissions.

Our scheme works with any nonscalable encoders. However, we will illustrate our scheme for the case of the MPEG-2 encoder, which is widely used for encoding high-definition video. The scheme does not require any special hardware apart from a downsampler, an upsampler, and two nonscalable encoders. The scheme works in a sequential fashion, by first exploiting the spatial redundancy and then exploiting the temporal redundancy on a frame-by-frame basis. Also, there are no weights to be chosen for combining the spatial and temporal predictions. The scheme is described in Section III.

C. Organization of the Paper

Section II describes the spatial scalability scheme used in MPEG-2. Section III describes our scheme for spatial scalability. Section IV describes the downsizing and upsizing schemes used in MPEG-2 and our DCT-based schemes. Section V presents our results for several sequences. Conclusions are presented in Section VI.

II. SPATIAL SCALABILITY IN MPEG-2

A block diagram of the MPEG-2 spatial scalability scheme [22] is shown in Fig. 1(a). We only consider the case in which the video is coded at two different spatial resolutions, the higher resolution being double the size of the lower resolution in each direction. Each frame is downsampled (the downsampling scheme need not be standardized) to produce the lower resolution frames. These frames are coded using a nonscalable scheme, e.g., the main profile of MPEG-2. The compressed stream containing the video at the lower resolution is called the base layer.

Now, consider how the enhancement layer is created. As shown in Fig. 1(b), the macroblock in the current frame is predicted using a convex linear combination of two macroblocks. The first macroblock is the motion-compensated macroblock of the current macroblock [the MC macroblock can be obtained from the most recently decoded full-resolution frame (P frames), or from a combination of past and future reference macroblocks (B frames), or it can simply be a uniform block of grayscale 128 (I frames)]. This macroblock serves to exploit temporal redundancy. The second macroblock is obtained by upsampling the corresponding macroblock of the decoded base-layer frame; it serves to exploit spatial redundancy.

Fig. 1. (a) Spatial scalability scheme in MPEG-2 [1]. (b) Forming the spatiotemporal prediction for the current macroblock in MPEG-2.
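The convex combination of Fig. 1(b) can be sketched in a few lines. This is our illustration: the function name, the toy macroblock values, and the example weight w = 0.5 are ours; the actual profile restricts the weight to a small standardized set of values chosen per macroblock.

```python
import numpy as np

def spatiotemporal_prediction(mc_mb, upsampled_mb, w):
    """Convex combination of the temporal (motion-compensated) macroblock and
    the spatial (upsampled base-layer) macroblock, as in Fig. 1(b)."""
    assert 0.0 <= w <= 1.0
    return w * upsampled_mb + (1.0 - w) * mc_mb

# Toy 16x16 macroblocks (values are illustrative, not real video data).
mc_mb = np.full((16, 16), 100.0)         # temporal prediction
upsampled_mb = np.full((16, 16), 120.0)  # spatial prediction from base layer

pred = spatiotemporal_prediction(mc_mb, upsampled_mb, w=0.5)
print(pred[0, 0])  # 110.0
```

With w = 1 the prediction is purely spatial (upsampled base layer) and with w = 0 purely temporal, the two extremes between which the encoder chooses.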

In the spatial scalability profile of MPEG-2, the linear combining weight is chosen on a macroblock-by-macroblock basis from a small set of values specified by the standard.

Fig. 3. Alternate representation of our scheme shown in Fig. 2(a).

4) The temporal correlation between these two residuals is exploited by predicting the current spatial residual from the previous one via motion compensation.

TABLE I
PSNR OF LUMINANCE COMPONENT WITH BASE AND ENHANCEMENT LAYERS CODED AT VARIOUS BIT RATES. EACH SEQUENCE HAS A FRAME SIZE OF 720×480 AND A FRAME RATE OF 30 FRAMES/S. THE PSNRS REPORTED ARE AVERAGE VALUES OVER 150 FRAMES. THE FIRST FOUR COLUMNS ARE FOR THE FULL-RESOLUTION FRAMES AND THE LAST TWO COLUMNS FOR THE BASE-LAYER FRAMES. THE NUMBERS IN BRACKETS DENOTE THE IMPROVEMENTS OVER THE PSNR VALUES FOR MPEG_WT (THE MPEG-2 SPATIAL SCALABILITY SCHEME) OR OVER THE BASE MPEG PSNR VALUES. A DETAILED EXPLANATION OF NOTATION IS GIVEN IN SECTION V. (a) BASE LAYER AT 2.0 Mbits/s AND ENHANCEMENT LAYER AT 3.0 Mbits/s. (b) BASE LAYER AT 2.5 Mbits/s AND ENHANCEMENT LAYER AT 3.5 Mbits/s. (c) BASE LAYER AT 4.0 Mbits/s AND ENHANCEMENT LAYER AT 6.0 Mbits/s.

The downsizing operates on blocks (we use 8×8 blocks). Each 8×8 block is independently downsized to a 4×4 block: the block is transformed using the DCT, its 4×4 low-frequency coefficients are retained, and an inverse 4×4 DCT yields the corresponding 4×4 block of the downsized frame [23]. The upsampling scheme is exactly the reverse of the downsampling scheme. A given small-size image is divided into 4×4 blocks; each block is transformed using the DCT, and its coefficients are placed as the low-frequency coefficients of an 8×8 DCT block whose high-frequency coefficients are made equal to zero, as shown in Fig. 2(c). Hence, the spatial prediction contains no high-frequency content: in this sense, the prediction is a purely low-frequency version of the frame, much like the bilinear interpolation, which is also used during upsizing in MPEG-2. Here, the tap 1/8 multiplies the row in the bottom field that is being interpolated in the top field. Horizontal downsampling (i.e., filtering followed by dropping every other column) of each row is then carried out using an odd-length filter. Vertical downsampling is implemented in two steps: first, downsample by the odd-length filter, and then downsample again by the same filter. The center of the odd-length filter is made to coincide with the position of the rows corresponding to the top field. Since the vertical filter lengths used are odd, this will give us a row situated at alternate locations, corresponding to the rows of the top field. A similar procedure is adopted for the bottom field, except that in the last step of vertical downsampling, we first downsample by the odd-length filter given before, but then downsample by an even-length filter. This way, each row corresponding to the downsized bottom field will lie at the center of the corresponding two rows of the top field. Hence, the downsized frame is also interlaced.
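The DCT-domain 2:1 block resizing described above (each 8×8 block exchanged with its 4×4 low-frequency counterpart) can be sketched as follows. This is our illustration of the idea in [23]: the orthonormal DCT convention and the scale factors (chosen so that a constant block keeps its intensity) are our assumptions.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)  # DC row of the orthonormal DCT
    return m

D8, D4 = dct_matrix(8), dct_matrix(4)

def downsize_block(b8):
    """8x8 -> 4x4: take the 8x8 DCT, keep the 4x4 low-frequency corner,
    inverse 4x4 DCT (scaled so intensity is preserved)."""
    c = D8 @ b8 @ D8.T
    return 0.5 * (D4.T @ c[:4, :4] @ D4)

def upsize_block(b4):
    """4x4 -> 8x8: take the 4x4 DCT, place it as the low-frequency corner of
    an 8x8 DCT block whose high-frequency coefficients are zero, inverse DCT."""
    c = np.zeros((8, 8))
    c[:4, :4] = 2.0 * (D4 @ b4 @ D4.T)
    return D8.T @ c @ D8

flat = np.full((8, 8), 50.0)  # a constant (purely low-frequency) block
down = downsize_block(flat)
up = upsize_block(down)
print(np.allclose(down, 50.0), np.allclose(up, 50.0))  # True True
```

Because the upsized block's high-frequency DCT coefficients are exactly zero, the round trip preserves a purely low-frequency block, which is what makes this pair suitable for forming the spatial prediction.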

Upsampling is standardized and is carried out as described next. First, consider upsampling the top field. This is first deinterlaced, as described above in downsampling, to double its vertical size. The horizontal size is doubled by simple averaging in the horizontal direction. The upsizing, which is part of the standard, is accomplished by bilinear interpolation. These choices can only improve the performance of MPEG-2 scalability, and hence, we would be comparing our results with the best performance possible with MPEG-2.
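A minimal sketch of the vertical half of such bilinear 2:1 upsampling is given below (our illustration; the bottom-border handling by row replication is our simplification, and the horizontal direction is handled analogously by averaging neighboring columns).

```python
import numpy as np

def upsample_rows_bilinear(img):
    """Double the number of rows: copy each input row to an even output row and
    interpolate each odd output row as the average of its two neighbors."""
    h, w = img.shape
    out = np.zeros((2 * h, w))
    out[0::2] = img                            # original rows
    out[1:-1:2] = 0.5 * (img[:-1] + img[1:])   # bilinear in-between rows
    out[-1] = img[-1]                          # replicate at the bottom border
    return out

img = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]])
up = upsample_rows_bilinear(img)
print(up[:, 0])  # 0, 5, 10, 15, 20, 20
```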

We have provided results on five sequences: Football, Cheers (Cheerleaders), Tt, Bike, and Susie. Each sequence has a frame size of 720×480 and a frame rate of 30 frames/s.

[11] T. Yoshida and K. Sawada, "Spatio-temporal scalable video coding using subband and adaptive field/frame interpolation," in Proc. IEEE Asia Pacific Conf. Circuits and Systems, 1996, pp. 145–148.

[12] M. Domanski, A. Luczak, and S. Mackowiak, "Spatio-temporal scalability for MPEG video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 10, pp. 1088–1093, Oct. 2000.

[13] T. Naveen and J. W. Woods, "Motion compensated multiresolution transmission of high definition video," IEEE Trans. Circuits Syst. Video Technol., vol. 4, pp. 29–41, Feb. 1994.

[14] J. W. Woods and G. Lilienfield, "A resolution and frame-rate scalable subband/wavelet video coder," IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 1035–1044, Sept. 2001.

[15] J.-R. Ohm, "Three-dimensional subband coding with motion compensation," IEEE Trans. Image Processing, vol. 3, pp. 559–571, Sept. 1994.

[16] D. Taubman and A. Zakhor, "Multirate 3-D subband coding of video," IEEE Trans. Image Processing, vol. 3, pp. 572–588, Sept. 1994.

[17] B.-J. Kim and W. A. Pearlman, "An embedded wavelet video coder using three-dimensional set partitioning in hierarchical trees," in Proc. IEEE Data Compression Conf., Mar. 1997, pp. 251–260.

[18] S.-J. Choi and J. W. Woods, "Motion-compensated 3-D subband coding of video," IEEE Trans. Image Processing, vol. 8, pp. 155–167, Feb. 1999.

[19] W. Li, "Overview of fine granularity scalability in MPEG-4 video standard," IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 301–317, Mar. 2001.

[20] NAB '98 – Lucent Demos New HDTV Encoder (1998, Apr.). [Online].

[21] NDS Demos Multiplex (1998, Nov.). [Online].

[22] A. Puri and A. Wong, "Spatial domain resolution scalable video coding," in Proc. SPIE Conf. Visual Communications and Image Processing, Boston, MA, Nov. 1993, pp. 718–729.

[23] R. Dugad and N. Ahuja, "A fast scheme for image size change in the compressed domain," IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 461–474, Apr. 2001.

[24] (1996) MPEG-2 Video Codec (With Source Code). MPEG Software Simulation Group (MSSG). [Online].

[25] C. A. Gonzales, H. Yeo, and C. J. Kuo, "Requirements for motion-estimation search range in MPEG-2 coded video," IBM J. Res. Develop., vol. 43, no. 4, pp. 453–470, July 1999.
