
ADOBE TECHNICAL REPORT

Light Field Camera Design for Integral View Photography

Todor Georgiev and Chintan Intwala

Adobe Systems Incorporated

345 Park Ave, San Jose, CA 95110

tgeorgie@adobe.com

Abstract

This paper introduces the matrix formalism of optics as a useful approach to the area of "light fields". It is capable of reproducing old results in Integral Photography, as well as generating new ones. Furthermore, we point out the equivalence between radiance density in optical phase space and the light field. We also show that linear transforms in matrix optics are applicable to light field rendering, and we extend them to affine transforms, which are of special importance to designing integral view cameras. Our main goal is to provide solutions to the problem of capturing the 4D light field with a 2D image sensor. From this perspective we present a unified affine optics view on all existing integral / light field cameras. Using this framework, different camera designs can be produced. Three new cameras are proposed.

Figure 1: Integral view of a seagull

Table of Contents

Abstract
1. Introduction
1.1 Radiance and Phase Space
1.2 Structure of this paper
2. Linear and Affine Optics
2.1. Ray transfer matrices
2.2. Affine optics: Shifts and Prisms
3. Light field conservation
4. Building blocks of optical system
4.1. "Camera"
4.2. "Eyepiece"
4.3. Combining eyepieces
5. The art of light field camera design
5.1. Integral view photography
5.2. Camera designs
6. Results from our light field cameras
Conclusion
References


1. Introduction

Linear (Gaussian) optics can be defined as the use of matrix methods from linear algebra in geometrical optics. Fundamentally, this area was developed (without the matrix notation) back in the 19th century by great minds like Gauss and Hamilton. Matrix methods became popular in optics during the 1950s, and are widely used today [1], [2]. In those old methods we recognize our new friend, the light field. We show that a slight extension of the above ideas to what we call affine optics, and then a transfer into the area of computer graphics, produces new and very useful practical results. Applications of the theory to designing "integral" or "light field" cameras are demonstrated.

1.1 Radiance and Phase Space

The radiance density function (or "light field", as it is often called) describes all light rays in space, each ray defined by 4 coordinates [3]. We use a slightly modified version of the popular "2-plane parameterization", which describes each ray by its intersection point with a predefined plane and the two angles (directions) of intersection. Thus, a ray is represented by space coordinates $q_1, q_2$ and direction coordinates $p_1, p_2$, which together span the phase space of optics (see Figure 2). In other words, at a given transversal plane in our optical system, a ray is defined by a 4D vector $(q_1, q_2, p_1, p_2)$, which we will call the light field vector.

These coordinates are very similar to the traditional $(s, t, u, v)$ coordinates used in the light field literature; in our formalism, however, a certain analogy with Hamiltonian mechanics is made explicit. Our variables $q$ and $p$ play the same role as the coordinate and momentum in Hamiltonian mechanics. In more detail, it can be shown that all admissible transformations of the light field preserve the so-called symplectic form

$$\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},$$

the same as in the case of canonical transforms in mechanics [4].

In other words, the phase space of mechanics and “light field space” have the same symplectic structure. For the light field one can derive the volume conservation law (Liouville's theorem) and other invariants of mechanics [4]. This observation is new to the area of light fields. Transformations of the light field in an optical system play a role analogous to canonical transforms in mechanics.

1.2 Structure of this paper

Section 2 shows that: (1) A thin lens transforms the light field linearly, by the appropriate ray transfer matrix. (2) Light traveling a certain distance in space is also described by a linear transformation (a shear), as first pointed out by Durand et al. [5]. (3) Shifting a lens off the optical axis, or inserting a prism, is described by the same affine transform. This extends linear optics into what we call affine optics. These transformations will be central to future "light field" image processing, which is coming to replace traditional image processing.

Section 3: The transformation of the light field in any optical device, such as a telescope or microscope, has to preserve the integral of the light field density. Any such transformation can be constructed as a product of only the above two types of matrices, and this is the most general linear transform for the light field.

Section 4 defines a set of optical devices based on the above three transforms. These optical devices do everything possible in affine optics, and they will be used as building blocks for our integral view cameras. The idea is that since those building blocks are the most general, everything that is possible in optics can be done using only those simple blocks.

Section 5 describes the main goal of Integral View Photography , and introduces several camera designs from the perspective of our theory. Three of those designs are new. Section 6 shows some of our results.

Figure 2: A ray intersecting a plane perpendicular to the optical axis. Directions (angles) of intersection are defined as derivatives of $q_1$ and $q_2$ with respect to $t$.

2. Linear and Affine Optics

This section introduces the simplest basic transforms of the light field. They may be viewed as the geometric primitives of image processing in light space, similar to rotate and resize in traditional imaging in the plane.

2.1. Ray transfer matrices

(1) Light field transformation by a lens:

Just before the lens the light field vector is $(q, p)$; just after the lens it is $(q', p')$. The lens does not shift the ray, so $q' = q$. Also, the transform is linear. The most general matrix representing this type of transform is

$$\begin{pmatrix} q' \\ p' \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ a & 1 \end{pmatrix} \begin{pmatrix} q \\ p \end{pmatrix}. \qquad (1)$$

Note: as a matter of notation, $a = -\frac{1}{f}$, where $f$ is called the focal length of the lens. A positive focal length produces a negative increment to the angle; see Figure 3.

(2) Light field before and after traveling a distance $T$ ($T$ stands for "travel", as in [5], which first introduced this "shear" transform of the light field traveling in space). The linear transform is

$$\begin{pmatrix} q' \\ p' \end{pmatrix} = \begin{pmatrix} 1 & T \\ 0 & 1 \end{pmatrix} \begin{pmatrix} q \\ p \end{pmatrix}, \qquad (2)$$

where the bottom left matrix element $0$ specifies that there is no change in the angle $p$ when a light ray travels through space. Also, a positive angle $p$ produces a positive change in $q$, proportional to the distance traveled, $T$.
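As a concrete illustration (ours, not part of the original report; the function names lens and travel are our own), the two ray transfer matrices of equations (1) and (2) can be written down and applied to a 2D light field vector $(q, p)$ in a few lines of NumPy:

```python
import numpy as np

def lens(f):
    """Thin-lens ray transfer matrix, equation (1) with a = -1/f."""
    return np.array([[1.0, 0.0],
                     [-1.0 / f, 1.0]])

def travel(T):
    """Free-space travel ("shear") matrix over a distance T, equation (2)."""
    return np.array([[1.0, T],
                     [0.0, 1.0]])

# A ray crossing the reference plane at q = 2.0 mm with direction p = 0.01 rad.
ray = np.array([2.0, 0.01])

print(lens(50.0) @ ray)     # position unchanged, angle decreased to -0.03
print(travel(100.0) @ ray)  # angle unchanged, position sheared by T*p to 3.0
```

Matrices for a composite system are obtained by multiplying the element matrices in the order the ray encounters them, rightmost first.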

2.2. Affine optics: Shifts and Prisms

In this paper we need to slightly extend traditional linear optics into what we call affine optics. This is done by using additive elements in the optical system, together with the above matrices.

Our motivation is that all known light field cameras and related systems have some sort of lens array, in which individual lenses are shifted from the main optical axis. This includes Integral Photography [6], the Hartmann-Shack sensor [7], Adelson's Plenoptic camera [8], 3D TV systems [9], [10], light-field-related work [3], and the camera of Ng [11]. We were not able to find our current theoretical approach anywhere in the literature.

One such "additive" element is the prism. By definition it tilts each ray by adding a fixed angle of deviation $\alpha$. Expressed in terms of the light field vector, the prism transform is

$$\begin{pmatrix} q' \\ p' \end{pmatrix} = \begin{pmatrix} q \\ p \end{pmatrix} + \begin{pmatrix} 0 \\ \alpha \end{pmatrix}. \qquad (3)$$

One interesting observation is that the same transform, in combination with lens refraction, can be achieved by simply shifting the lens from the optical axis. If the shift is $s$, formula (1) for lens refraction is modified as follows: convert to lens-centered coordinates by subtracting $s$, apply the linear lens transform, and convert back to the original coordinates by adding $s$:

$$\begin{pmatrix} q' \\ p' \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -\frac{1}{f} & 1 \end{pmatrix} \left[ \begin{pmatrix} q \\ p \end{pmatrix} - \begin{pmatrix} s \\ 0 \end{pmatrix} \right] + \begin{pmatrix} s \\ 0 \end{pmatrix}, \qquad (4)$$

which is simply

$$\begin{pmatrix} q' \\ p' \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -\frac{1}{f} & 1 \end{pmatrix} \begin{pmatrix} q \\ p \end{pmatrix} + \begin{pmatrix} 0 \\ \frac{s}{f} \end{pmatrix}. \qquad (5)$$

Figure 3: A lens transform of the light field.
Figure 4: Space transfer of light.

Final result: "shifted lens = lens + prism". This idea will be used later, in Section 5.2.

Figure 5 illustrates the above result by showing how one can build a prism with a variable angle of deviation $\alpha = \frac{s}{f}$ from two lenses of focal lengths $f$ and $-f$, shifted by a variable distance $s$ from one another.

Figure 5: A variable angle prism.
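The affine elements can be exercised in the same way. The sketch below (ours; the helper names prism, shifted_lens, and compose are assumptions, and the two lenses are treated as being in contact, with no spacing) represents each element as a pair $(M, b)$ acting as $x \mapsto Mx + b$, and checks that a lens of focal length $f$ followed by a lens of focal length $-f$ shifted by $s$ acts as a pure prism whose deviation has magnitude $s/f$, as in Figure 5:

```python
import numpy as np

def lens(f):
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

def prism(alpha):
    """Prism of deviation alpha, equation (3): identity matrix plus an angle offset."""
    return np.eye(2), np.array([0.0, alpha])

def shifted_lens(f, s):
    """Lens of focal length f, shifted off-axis by s, equation (5): M x + (0, s/f)."""
    return lens(f), np.array([0.0, s / f])

def compose(second, first):
    """Compose two affine maps (M, b): apply `first`, then `second`."""
    M2, b2 = second
    M1, b1 = first
    return M2 @ M1, M2 @ b1 + b2

f, s = 100.0, 5.0
on_axis_lens = (lens(f), np.zeros(2))
shifted_negative_lens = shifted_lens(-f, s)

M, b = compose(shifted_negative_lens, on_axis_lens)
print(M)                                   # identity matrix: the two lens powers cancel
print(b)                                   # [0, -s/f]: a pure prism, deviation magnitude 0.05
print(np.allclose(b, prism(-s / f)[1]))    # True
```

The sign of the deviation simply follows the chosen direction of the shift $s$.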

3. Light field conservation

The light field (radiance) density is constant along each ray. The integral of this density over any volume in 4D phase space (light field space) is preserved during the transformations in any optical device. This is a general fact that follows from the physics of refraction, and it has a nice formal representation in symplectic geometry (see [12]).

In our 2D representation of the light field, this fact is equivalent to area conservation in $(q, p)$-space, which we show next. Consider two rays, $(q_1, p_1)$ and $(q_2, p_2)$. After the transform in an optical system, the rays will be different. The signed area between those rays in light space (the space of rays) is defined by their cross product. In our matrix formalism, the expression for this area is

$$\begin{pmatrix} q_2 & p_2 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} q_1 \\ p_1 \end{pmatrix} = q_2 p_1 - p_2 q_1. \qquad (6)$$

After transformation in the optical device represented by matrix $M$, the area between the new rays is

$$\begin{pmatrix} q_2 & p_2 \end{pmatrix} M^{T} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} M \begin{pmatrix} q_1 \\ p_1 \end{pmatrix}, \qquad (7)$$

where $M^{T}$ is the transpose of $M$. The condition for expressions (6) and (7) to be equal for any pair of rays is

$$M^{T} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} M = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. \qquad (8)$$

This is the condition for area conservation. In the general case, a similar expression describes 4D volume conservation for the light field. The reader can check that (1) the matrix of a lens and (2) the matrix of a light ray traveling a distance $T$, both discussed above, satisfy this condition.

Further, any optical system, as a product of such transforms, has to satisfy it. It can be shown [12] that any linear transform satisfying (8) can be written as a product of matrices of types (1) and (2).

The last step of this section is to make use of the fact that, since the light field density for each ray is the same before and after the transform, the sum of all those densities times the infinitesimal area for each pair of rays must be constant. In other words, the integral of the light field over a given area (volume) in light space is conserved under the transforms of any optical device.
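A quick numerical check of condition (8) is easy to write down (our sketch; the matrix helpers are repeated from the earlier examples). It confirms that the lens matrix, the travel matrix, and any product of them preserve the symplectic form $J$, and therefore areas in $(q, p)$ space:

```python
import numpy as np

J = np.array([[0.0, 1.0], [-1.0, 0.0]])   # the symplectic form of Section 1.1

def lens(f):
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

def travel(T):
    return np.array([[1.0, T], [0.0, 1.0]])

def conserves_area(M):
    """Condition (8): M^T J M == J."""
    return np.allclose(M.T @ J @ M, J)

print(conserves_area(lens(50.0)))                                 # True
print(conserves_area(travel(120.0)))                              # True
print(conserves_area(travel(30.0) @ lens(50.0) @ travel(80.0)))   # True for any product
```

For 2x2 matrices, condition (8) is equivalent to having determinant 1, which is the two-dimensional form of the volume conservation statement.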

4. Building blocks of our optical system

We are looking for simple building blocks for optical systems (light field cameras) that are as general as possible. In other words, they should be easy to understand in terms of the mathematical transformations they perform, and at the same time general enough that they do not exclude useful optical transforms. According to the previous section, in the space of affine optical transforms everything can be achieved as products of the matrices of equations (1) and (2), together with prisms. However, those are not simple enough; that is why we define other building blocks, as follows.

4.1. "Camera"

This is not the conventional camera, but it is closely related to one, obtained by adding a field lens. With this lens the camera transform becomes simply

$$M = \begin{pmatrix} m & 0 \\ 0 & \frac{1}{m} \end{pmatrix}. \qquad (9)$$

First, light travels a distance $a$ from the object to the objective lens; this is described by a transfer matrix $M_a$. Then it is refracted by the objective lens of focal length $f$, represented by transfer matrix $M_f$. Finally, it travels a distance $b$ to the image plane, represented by $M_b$. The full transform, found by multiplying those three matrices, is

$$M_b M_f M_a = \begin{pmatrix} 1 - \frac{b}{f} & a + b - \frac{ab}{f} \\ -\frac{1}{f} & 1 - \frac{a}{f} \end{pmatrix}. \qquad (10)$$

The condition for focusing on the image plane is that the top right element of this matrix is 0, which is equivalent to the familiar lens equation

$$\frac{1}{a} + \frac{1}{b} = \frac{1}{f}. \qquad (11)$$

Using (11), our camera transfer matrix can be converted into the simpler form

$$\begin{pmatrix} -\frac{b}{a} & 0 \\ -\frac{1}{f} & -\frac{a}{b} \end{pmatrix}. \qquad (12)$$

We also make the bottom left element 0 by inserting a so-called "field lens" (of focal length $F = \frac{bf}{a}$) just before the image plane:

$$\begin{pmatrix} 1 & 0 \\ -\frac{a}{bf} & 1 \end{pmatrix} \begin{pmatrix} -\frac{b}{a} & 0 \\ -\frac{1}{f} & -\frac{a}{b} \end{pmatrix} = \begin{pmatrix} -\frac{b}{a} & 0 \\ 0 & -\frac{a}{b} \end{pmatrix}. \qquad (13)$$

This matrix is diagonal, which is the simple final form we wanted to achieve. It obviously satisfies our area conservation condition, as the reader can easily verify. The parameter $m = -\frac{b}{a}$ is called the "magnification" and is a negative number. (Cameras produce inverted images.)
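The derivation of equations (10)-(13) can be replayed numerically. The sketch below (ours; the values of $a$ and $f$ are arbitrary examples, and $b$ is computed from the lens equation (11)) composes travel-lens-travel and then adds the field lens of focal length $bf/a$, producing the diagonal matrix $\mathrm{diag}(m, 1/m)$ with $m = -b/a$:

```python
import numpy as np

def lens(f):
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

def travel(T):
    return np.array([[1.0, T], [0.0, 1.0]])

a, f = 300.0, 100.0
b = 1.0 / (1.0 / f - 1.0 / a)               # lens equation (11): 1/a + 1/b = 1/f

camera = travel(b) @ lens(f) @ travel(a)    # equation (10); focused, so top right is 0
with_field_lens = lens(b * f / a) @ camera  # field lens just before the image plane

m = -b / a                                  # magnification, here -0.5
print(np.round(camera, 6))                  # [[-b/a, 0], [-1/f, -a/b]], cf. equation (12)
print(np.round(with_field_lens, 6))         # [[m, 0], [0, 1/m]], cf. equations (9), (13)
```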

4.2. "Eyepiece"

This element has been used as an eyepiece (ocular) in optics, which is why we give it that name. It is made up of two space translations and a lens: light rays first travel a distance $f$, then they are refracted by a lens of focal length $f$, and finally they travel another distance $f$. The result is

$$\begin{pmatrix} 1 & f \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -\frac{1}{f} & 1 \end{pmatrix} \begin{pmatrix} 1 & f \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & f \\ -\frac{1}{f} & 0 \end{pmatrix}. \qquad (14)$$

This is an "inverse diagonal" (anti-diagonal) matrix, and it satisfies area conservation. It will be used in Section 5.2 for switching between $q$ and $p$ in a light field camera.

4.3. Combining eyepieces

Inversion: Two eyepieces together produce a "camera" with magnification $-1$:

$$\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}. \qquad (15)$$

An eyepiece before and after a variable space $T$, followed by inversion, produces a lens of variable focal length $F = \frac{f^2}{T}$:

$$\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 0 & f \\ -\frac{1}{f} & 0 \end{pmatrix} \begin{pmatrix} 1 & T \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & f \\ -\frac{1}{f} & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -\frac{T}{f^2} & 1 \end{pmatrix}. \qquad (16)$$

By symmetry, the same combination of eyepieces with a lens of focal length $F$ produces a space translation $T = \frac{f^2}{F}$ without using up real space! Devices corresponding to the matrices (9), (14), (15), and (16), together with shifts and prisms, are the elements that can be used as building blocks for our light field cameras. These operators are also useful as primitives for future optical image processing in software: they are the building blocks of the main transforms, corresponding to geometric transforms like Resize and Rotate in current image processing.
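The identities (14)-(16) are also easy to verify numerically. This sketch (ours) builds the eyepiece matrix from travel-lens-travel and checks that two eyepieces give the inversion (15), and that eyepiece-travel-eyepiece followed by inversion gives a lens of focal length $f^2/T$, as in (16):

```python
import numpy as np

def lens(f):
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

def travel(T):
    return np.array([[1.0, T], [0.0, 1.0]])

def eyepiece(f):
    """Travel f, lens f, travel f -- equation (14)."""
    return travel(f) @ lens(f) @ travel(f)

f, T = 10.0, 4.0
E = eyepiece(f)
print(E)                       # [[0, f], [-1/f, 0]]: swaps q and p, up to scaling

print(E @ E)                   # minus the identity: inversion, equation (15)

inversion = -np.eye(2)
variable_lens = inversion @ E @ travel(T) @ E
print(np.allclose(variable_lens, lens(f**2 / T)))   # True: equation (16), F = f^2 / T
```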

5. The art of light field camera design

5.1. Integral view photography

We define Integral View Photography as a generalization of several related areas of research. These include Integral Photography [6], [9] and related work, Adelson's "Plenoptic" camera [8], a number of 3D TV systems ([10] and others), and the "Light Field" camera of Ng et al. [11].

In our approach we see conventional cameras as "integration devices", which integrate the optical field over all points on the aperture into the final image. In this sense a conventional camera is already an integral view camera: it achieves effects like refocusing onto different planes and changing depth of field, commonly used by photographers. The idea of Integral View Photography is to capture some representation of that same optical field and be able to integrate it afterwards, in software. In this way the captured "light field", "plenoptic", or "holographic" image potentially contains the full optical information, and much greater flexibility can be achieved:

(1) Integration is done in software, not mechanically.

(2) Instead of fixing all parameters at capture time, the photographer can relax while taking the picture and defer focusing, and integration in general, to post-processing in the darkroom. Currently only color and lightness are handled this way in post-processing (in Aperture and Lightroom).

(3) Different methods of integrating the views can be applied or mixed together to achieve much more than what is possible with a conventional camera. Examples include focusing on a surface, "all in focus", and others.

(4) More power is gained in image processing because we now have access to the full 3D information about the scene. Difficult tasks like refocusing become amazingly easy. We expect tasks like deblurring, object extraction, painting on 3D surfaces, relighting, and many others to become much easier, too.

5.2. Camera designs

We are given the 4D light field (the radiance density function), and we want to sample it into a discrete representation with a 2D image sensor. The approach taken is to represent this 4D density as a 2D array of images. Different perspectives on the problem are possible, but for the current paper we choose to discuss it in the following framework. Traditional integral photography uses an array of cameras focused on the same plane, so that each point on that plane is imaged as one pixel in each camera. These pixels represent different rays passing at different angles through that same point. In this way the angular dimensions are sampled. Of course, each image itself samples the spatial dimensions, so we have a 2D array of 2D arrays.

The idea of compact light field camera design is to put all the optics and electronics into one single device. We want to make different parts of the main camera lens active separately, in the sense that their input is registered independently (but on the same sensor!), as if coming from different cameras. This makes the design compact and cheap to manufacture.

First design:

Consider formula (5). With it in mind, the design of Figure 6, in which each lens is shifted from the optical axis, is equivalent to adding prisms to a single main lens; see Figure 7. This optical device would be cheaper to manufacture because it is made up of one lens and multiple prisms, instead of multiple lenses. It is also more convenient for the photographer to use the common controls of one single lens, while effectively working with a big array of lenses.

We believe this design is new. Based on what we call affine optics (formula (5)), it can be considered a reformulation of the traditional "multiple cameras" design of integral photography.

Figure 6: An array of cameras used in integral photography for capturing the light field.
Figure 7: Array of prisms design.

In traditional cameras all rays from a far away point are focused into the same single point on the sensor. This is represented in Figure 7, where all rays coming from the lens are focused into one point. We want to split rays coming from different areas of the main lens. This is equivalent to a simple change of angle, so it can be done with prisms of different angles of deviation placed next to the main lens, at the aperture.
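As a rough sketch of how the prism angles in this design could be chosen (our illustration; the focal length and sub-aperture positions are made-up numbers, not values from the paper), the relation "shifted lens = lens + prism" of equation (5) suggests that the prism covering a sub-aperture centered a distance $s$ off the optical axis should have deviation $\alpha = s/f$:

```python
import numpy as np

f_main = 80.0                                    # main lens focal length, mm (illustrative)
centers = np.array([-15.0, -5.0, 5.0, 15.0])     # sub-aperture centers, mm (illustrative)

# By "shifted lens = lens + prism" (equation (5)), the prism over the sub-aperture
# centered at s deviates rays by alpha = s / f, mimicking a lens shifted by s.
alphas = centers / f_main
for s, alpha in zip(centers, alphas):
    print(f"sub-aperture at {s:+.1f} mm -> prism deviation {alpha:+.4f} rad")
```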

Second design:

This approach was invented by Adelson and Wang [8], and recently used by Ng et al. [11]. We would like to propose an interesting interpretation of their design. It is a traditional camera in which each pixel is replaced by an eyepiece (E), with a matrix of type

$$\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},$$

and a sensor (CCD matrix) behind it. The role of the eyepiece is to switch between coordinate and momentum (position and direction) in optical phase space (the light field). As a result, different directions of rays at a given eyepiece are recorded as different pixels on the sensor of that eyepiece: rays coming from each area of the main lens go into different pixels at a given eyepiece. See Figure 8, where we have dropped the field lens for clarity (it should be there, at the focal plane of the main camera lens, for the theory to be exact). In other words, this is the optical device "camera" of Section 4.1, followed by an array of eyepieces (Section 4.2).

Figure 9 shows two sets of rays and their paths in the simplified version of the system, without the field lens.
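A small ray trace (ours; the numbers are illustrative) makes the role of the eyepiece explicit: two rays arriving at the same eyepiece position $q$ but from different parts of the main lens aperture (different $p$) exit at different positions, and therefore land on different sensor pixels behind that eyepiece:

```python
import numpy as np

def lens(f):
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

def travel(T):
    return np.array([[1.0, T], [0.0, 1.0]])

def eyepiece(f):
    return travel(f) @ lens(f) @ travel(f)       # [[0, f], [-1/f, 0]]

E = eyepiece(2.0)                                # eyepiece focal length 2 mm (illustrative)

# Two rays hitting the same eyepiece at q = 0, coming from opposite sides of the
# main lens aperture (directions +0.05 and -0.05 rad).
ray_top    = np.array([0.0,  0.05])
ray_bottom = np.array([0.0, -0.05])

print(E @ ray_top)      # exits at q' = +0.1 mm -> one sensor position
print(E @ ray_bottom)   # exits at q' = -0.1 mm -> a different sensor position
```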

Our next step is to generalize designs (1) and (2) by building cameras equivalent to them in the optical sense. Using formula (5), we can replace the array of prisms with an array of lenses; see Figure 10. We get the same shift in angle as with prisms. The total inverse focal length is the sum of the inverse focal lengths of the main lens and the individual lenses.

A very interesting approach is to make the array of lenses or prisms external to the camera. With positive lenses we get an array of real images, which are captured by a camera focused on them (Figure 11).

Figure 8: Array of eyepieces generating multiple views.
Figure 9: More detail about Figure 8.
Figure 10: Lenses instead of prisms in Figure 7.
Figure 11: Multiple lenses creating real images.

With negative lenses we get virtual images on the other side of the main lens. Rays are shifted down in Figure 12, but the virtual images are shifted up. This design is not possible internally to the camera because the images are virtual. We believe it is new.

If all negative lenses had the same focal length as the main lens (but with opposite sign), we would get a device equivalent to an array of prisms (see Figure 5). This works perfectly well, but with a large number of lenses the field of view is too small. In order to increase it, we need the focal length of the lenses in the array to be small.

Another problem is the big main lens, which is heavy and expensive. The whole device can be replaced with an array of lens-prism pairs, shown in Figure 13. This is another new design. A picture of this array of 19 negative lenses and 18 prisms is shown in Figure 14.

Most of our results are produced with a camera of the design of Figure 12, with an array of 20 lenses cut into squares so they can be packed together with the least loss of pixels. A picture of our camera is shown in Figure 15. One of the datasets obtained with it is shown, reduced, in Figure 16.

6. Results from our light field cameras

In terms of results, in this paper our only goal is to show that our cameras work. Making full use of the advantages of the light field in image processing, and achieving all the improvements and amazing effects it enables, is too broad a task and is deferred to future work.

Figure 12: Multiple lenses creating virtual images.
Figure 13: Lenses and prisms.
Figure 14: Picture of our hexagonal array of lenses and prisms.
Figure 15: Working model of the camera in Figure 12, with 2 positive lenses and an array of 20 negative lenses.

The only effect that we are going to demonstrate is refocusing. It creates the sense of 3D and clearly shows the power of our new camera. It also gives us a chance to compare our results against the current state of the art in the field, the camera of Ng et al. [11].

With the square lens array we are able to obtain 20 images from 20 different viewpoints; see Figure 16. First, one image is chosen as the base image, to which all the other images are registered. Only a certain region of interest (ROI) in the base image is used for matching with the rest of the images; an example ROI is shown in Figure 17. This registration process yields a 2D shift $S_i = (dx_i, dy_i)$ for each image $I_i$ that aligns the region of interest in $I_i$ with the base image. Since only the region of interest is considered for registration, we compute a normalized cross-correlation coefficient $c_i(x, y)$ between the appropriate channel of the ROI and $I_i$ at every point of $I_i$. The location of the maximum value of $c_i(x, y)$ is used to compute the shift. This yields an array of shift vectors $(dx_1, dy_1), (dx_2, dy_2), \ldots, (dx_{19}, dy_{19})$, one per image $I_i$, for the given ROI.

In order to obtain an image like the one in Figure 18, we simply blend the 20 shifted images equally. Because objects at equal depth have equal shifts, this produces an image in which objects lying in the same depth plane as the ROI are in focus. Figures 18 and 20 show such mixtures of appropriately shifted images. Note that the edges of those images are left unprocessed in order to show the shifts more clearly. In order to obtain intermediate depth planes, as in Figure 19, we need two sets of shifts, one for the foreground, $S_f$, and one for the background, $S_b$. The shift for an intermediate depth plane is then obtained by linear interpolation

$$S_D = S_f + D\,(S_b - S_f), \qquad (17)$$

where the depth $D$ is between 0 and 1.
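For completeness, the refocusing step itself reduces to shifting and averaging. The sketch below (ours; the image data and the two shift tables are placeholders, and we use scipy.ndimage.shift for the subpixel shifts rather than the authors' own implementation) interpolates the per-image shifts with equation (17) and blends the shifted views equally:

```python
import numpy as np
from scipy.ndimage import shift as subpixel_shift

def refocus(images, shifts_fg, shifts_bg, depth):
    """Blend views shifted by S_D = S_f + depth * (S_b - S_f), equation (17)."""
    out = np.zeros_like(images[0], dtype=np.float64)
    for img, s_f, s_b in zip(images, shifts_fg, shifts_bg):
        s_d = s_f + depth * (s_b - s_f)          # interpolated (dy, dx) shift
        out += subpixel_shift(img.astype(np.float64), s_d, order=1)
    return out / len(images)

# Placeholder data: 20 views of a 100x100 scene and two made-up shift tables,
# one aligning the foreground (depth 0) and one aligning the background (depth 1).
views     = [np.random.rand(100, 100) for _ in range(20)]
shifts_fg = [np.array([2.0 * i, -1.0 * i]) for i in range(20)]
shifts_bg = [np.array([0.5 * i, -0.2 * i]) for i in range(20)]

mid_plane = refocus(views, shifts_fg, shifts_bg, depth=0.65)   # cf. Figure 22
```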

Another example of refocusing after taking the picture is demonstrated in our seagull images. A photographer would not have time to refocus on birds while they are flying, but with integral view photography this can be done later, in the darkroom. Figure 21 is obtained for D = 0, Figure 22 for D = 0.65, and Figure 23 for D = 1.

As seen from the results, the parallax among the 20 images and the overall range of depths in the scene determine the believability of the resulting images. The larger the parallax, the more artifacts are produced in the final image. For example, one can compare the two images focused on the foreground: Figure 18 and Figure 21. The problem in the latter case can be solved by generating images from virtual intermediate viewpoints using view morphing or other computer vision algorithms.

Using a different camera design (not described in this paper), which is able to capture 100 images corresponding to 100 slightly shifted viewpoints, we were able to refocus much more precisely and with minimal artifacts. Refer to Figure 24 and Figure 25 for the obtained results. Notice that the scene has a large variation in depth (comparable to Figure 21), and yet the results are much smoother.

Figure 26 shows one example of refocusing on the foreground using 3-view morphing, where 144 images have been generated from the original 20. Artifacts are greatly reduced, and there is no limit to the number of views (and level of improvement) that can be achieved with this method.

Figure 16: A set of 20 images obtained with our camera.
Figure 17: Rodin's Burghers (region of interest).
Figure 18: Rodin's Burghers at depth 0, closest to camera.
Figure 19: Rodin's Burghers at depth 0.125.
Figure 20: Rodin's Burghers at depth 1.
Figure 21: Seagulls at depth 0.
Figure 22: Seagulls at depth 0.65.
Figure 23: Seagulls at depth 1.
Figure 24: Our 100-view light field focused at depth 0.
Figure 25: Our 100-image light field focused at depth 1.
Figure 26: Our 20-image light field focused at the head of Scott. Result obtained from 144 synthetic images generated using view morphing.


Conclusion

In this paper we used a new approach to the light field, based on linear and affine optics. We constructed simple transforms, which can be used as building blocks of integral view cameras. Using this framework, we were able to construct the main types of existing light field / integral cameras and propose three new designs.

Our results show examples of refocusing the light field after the picture has been taken. With a 16-megapixel sensor, our 20-lens camera (Figure 12) produces a refocused output image of 700x700 pixels. A view morphing approach to improving the quality of the final images was discussed and shown to be practical.

We would like to thank Colin Zheng from the University of Washington for help with producing Figure 26. We would also like to thank Gavin Miller for discussions and for providing a very relevant reference [13].

References

1. E. Hecht and A. Zajac. Optics. Addison-Wesley, 1979.
2. A. Gerrard and J. M. Burch. Introduction to Matrix Methods in Optics. Dover Publications, 1994.
3. M. Levoy and P. Hanrahan. Light Field Rendering. In SIGGRAPH 96, 31-42.
4. R. Abraham and J. Marsden. Foundations of Mechanics. Perseus Publishing, 1978.
5. F. Durand, N. Holzschuch, C. Soler, E. Chan, F. Sillion. A Frequency Analysis of Light Transport. In SIGGRAPH 2005, 1115-1126.
6. M. Hutley. Microlens Arrays. Proceedings, IOP Short Meeting Series No 30, Institute of Physics, May 1991.
7. R. Tyson. Principles of Adaptive Optics. Academic Press, 1991.
8. T. Adelson and J. Wang. Single Lens Stereo with a Plenoptic Camera. IEEE Transactions on Pattern Analysis and Machine Intelligence 14, 2, 99-106, 1992.
9. F. Okano, H. Hoshino, J. Arai, I. Yuyama. Real-time pickup method for a three-dimensional image based on integral photography. Applied Optics, 36, 7, March 1997.
10. T. Naemura, T. Yoshida and H. Harashima. 3-D computer graphics based on integral photography. Optics Express, 8, 2, 255-262, 2001.
11. R. Ng, M. Levoy, M. Bredif, G. Duval, M. Horowitz, P. Hanrahan. Light Field Photography with a Hand-held Plenoptic Camera. Stanford Tech Report CTSR 2005-02.
12. V. Guillemin and S. Sternberg. Symplectic Techniques in Physics. Cambridge University Press.
13. L. Ahrenberg and M. Magnor. Light Field Rendering using Matrix Optics. WSCG 2006, Jan 30 - Feb 3, 2006, Plzen, Czech Republic.
