
Capturing 2½D Depth and Texture of Time-Varying Scenes Using Structured Infrared Light

Christian Frueh and Avideh Zakhor
Department of Computer Science and Electrical Engineering
University of California, Berkeley
{frueh,avz}@eecs.berkeley.edu

Abstract: In this paper, we describe an approach to simultaneously capture the visual appearance and depth of a time-varying scene. Our approach is based on projecting structured infrared (IR) light. Specifically, we project a combination of (a) a static vertical IR stripe pattern, and (b) a horizontal IR laser line sweeping up and down the scene; at the same time, the scene is captured with an IR-sensitive camera. Since IR light is invisible to the human eye, it does not disturb human subjects or interfere with human activities in the scene; in addition, it does not affect the scene's visual appearance as recorded by a color video camera. Vertical lines in the IR frames are identified using the horizontal line, intra-frame tracking, and inter-frame tracking; depth along these lines is reconstructed via triangulation. Interpolating these sparse depth lines within the foreground silhouette of the recorded video sequence, we obtain a dense depth map for every frame in the video sequence. Experimental results corresponding to a dynamic scene with a human subject in motion are presented to demonstrate the effectiveness of our proposed approach.

Keywords: dynamic scene, motion capture, structured light

I. INTRODUCTION

The 4D capture of time-varying, dynamic scenes is useful for applications such as scene analysis, motion analysis, human action recognition, and interactive scene viewing. There are several approaches to acquiring the appearance and/or geometry of a scene from multiple angles using a set of surrounding cameras. [5, 6] propose image-based approaches for visualization that do not explicitly capture geometry. Among others, [3, 10, 11, …] compute a 3D representation based on space carving, shape-from-silhouette, stereo matching, or combinations thereof. Used in conjunction with body models, these approaches have been successful for human motion estimation and analysis, aiming to replace the inconvenient but still widespread use of markers. However, they require (a) a large number of precisely calibrated and registered cameras, (b) the capability of recording and storing multiple video streams simultaneously, and (c) a special scene background; in addition, they make restrictive assumptions about the object in the scene.

Other approaches attempt to directly acquire depth from one viewpoint at video frame rate. [4] offers a commercial system, which uses the time-of-flight measurement principle by sending out an IR laser flash and detecting the fraction of the backscattered IR light that is returned within a certain shutter period. Using an additional color camera, this system provides both color and depth at video frame rate, but it is currently quite expensive. Another class of techniques for reconstructing dynamic 3D scenes is based on structured light. While structured light is a well-established technique for static 3D shape reconstruction in reverse engineering, where a series of different patterns is projected onto a static object in order to disambiguate individual light stripes, it is not easily applicable to changing scenes. [7, 9, 12] use structured light and color coding to distinguish individual projected lines and reconstruct depth in a one-shot fashion.
While the projector/camera configurations used are inexpensive, and the depth reconstruction is fast and sufficiently accurate, these systems have the inherent disadvantage that the projected visible pattern interferes with the texture/appearance acquisition; thus these approaches are not applicable to photo-realistic scene reconstruction. Moreover, the visible light pattern could potentially distract or disturb humans or animals in the scene or change their behavior; for some applications, e.g. military surveillance, it is desirable to capture the 3D motion of a scene in a covert manner.

In this paper, we propose an architecture and associated algorithms for a system that is capable of capturing the 3D depth of a time-varying scene using structured infrared (IR) light. Since IR light is invisible to the human eye, our capture process is done in a minimally disturbing fashion. At the same time, the visual appearance of the scene is captured by a visible-light (VIS) color camera insensitive to the IR light; we propose algorithms to enhance the recorded VIS video stream with a 2½D depth ("z") channel. In doing so, we combine the advantages of passive systems, i.e. minimal interference and proper texture acquisition, with the advantages of active systems, i.e. low calibration effort and robustness to the absence of features.

The outline of this paper is as follows: Section II introduces the setup of our acquisition system, and Section III describes the IR line detection and identification procedure. In Section IV, we describe the algorithms for reconstruction of a dense depth map, and in Section V, we show results for a video sequence of a moving person.

II. SYSTEM SETUP

We propose an active acquisition system in which the 3D depth estimation is performed entirely in the IR domain, and is thus invisible to the human eye. In this system, shown in Figure 1, an invisible static pattern of equally spaced vertical IR stripes is projected onto the scene and is captured by an IR-sensitive camera at a different angle. Our approach is to reconstruct the depth along the resulting vertical intensity discontinuities (V-lines) in the IR image via triangulation.

Since there are no off-the-shelf IR "color" cameras available commercially, i.e. cameras that distinguish between different IR wavelengths, it is not possible to exploit color-coding techniques to identify and distinguish individual lines from each other. Rather, we use an additional horizontal IR laser stripe sweeping the scene up and down in a vertical manner; this stripe is easy to identify because it is the only horizontal line in the captured IR image. Because we know the vertical displacement between the rotating mirror and the camera, and because we can determine the current plane equation of the sweeping light plane for each frame using a reference object as described below, we can compute the depth along this horizontal line (H-line) via triangulation. We then compute the plane equation for the perpendicular V-lines that intersect with the H-line by using the 3D coordinates of the intersection point, the center of projection of the IR light projector, and exploiting the fact that the light planes are vertical. Sweeping the H-stripe periodically up and down the scene along the vertical direction, we obtain the depth along a horizontal line at different locations in different IR frames, resulting in a different set of identified V-lines for each frame; in doing so, almost all V-lines are likely to be identified at some point in time. For the V-lines that do not intersect with the H-line in a given frame, we utilize the plane equation from a previous or future frame. Specifically, for each unidentified V-line, we search the previous or future frame for an identified line in close proximity. This is possible because the plane equations are constant, and the motion between two consecutive frames, i.e. within 33 ms, is assumed to be small. In this fashion, the plane equation for each V-line is either determined directly via intersection points with the H-line, i.e. intra-frame line tracking, or indirectly by carrying over the plane equation across frames, i.e. inter-frame line tracking. In addition, as will be explained shortly, some of the remaining V-lines can be identified via line counting, i.e. by using the light plane equations of neighboring lines.

Figures 1 and 2 show the main components of our system: We create the static vertical stripe (V-stripe) IR pattern using a 500 W light bulb and a stripe screen in front of it. For the IR camera, we use an off-the-shelf digital camcorder in 0-lux ("nightvision") mode with an additional infrared filter, and record the IR video stream at 30 Hz in progressive scan mode. The visible camera is mounted next to the IR camera and is connected to the PC via a FireWire cable. It is triggered by every third pulse of the camcorder's video sync signal; thus, it captures the visible appearance of the scene at 10 Hz. The horizontal stripe is generated by a 30 mW IR line laser with a 90 degree fan angle, and is swept across the scene at about 2 Hz using a rotating polygonal mirror.
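All of the depth computations in this system reduce to intersecting a calibrated pixel ray with a known light plane. The following is a minimal sketch of that triangulation step in Python/NumPy; the function name, the plane parameters, and the example numbers are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def intersect_ray_plane(ray_dir, n, d):
    """Intersect the camera ray X = t * ray_dir (camera center at the origin)
    with the light plane n . X + d = 0, and return the 3D intersection point.

    ray_dir : (3,) viewing direction of an image pixel, from camera calibration
    n, d    : normal vector and offset of the projected light plane
    """
    ray_dir = np.asarray(ray_dir, dtype=float)
    n = np.asarray(n, dtype=float)
    denom = n.dot(ray_dir)
    if abs(denom) < 1e-9:
        raise ValueError("ray is (nearly) parallel to the light plane")
    t = -d / denom          # n . (t * ray_dir) + d = 0  =>  t = -d / (n . ray_dir)
    return t * ray_dir      # 3D point; its z component is the depth

# Example: a pixel ray intersected with a vertical light plane x + 0.5*z - 1.0 = 0
point = intersect_ray_plane([0.1, 0.0, 1.0], n=[1.0, 0.0, 0.5], d=-1.0)
print(point)                # depth along this pixel's ray is point[2]
```

Both the H-line depth (plane swept by the laser) and the V-line depth (vertical stripe planes) discussed in the following sections come down to this intersection, which is why the only required pre-calibration is the camera parameters and the two baselines.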
For simplicity, in our current experimental setup we do not use an angular encoder to obtain the precise orientation of this horizontal light plane; rather, we compute its plane equation by 'reversing' the triangulation, i.e. by using the pre-calibrated depth of a vertical reference strip at the left side of the scene.

Due to its off-the-shelf components, the overall cost of our system is low, and the required pre-calibration besides the reference strip is limited to the camera parameters and the two baselines, namely the vertical displacement between the IR camera and the polygonal mirror, and the horizontal displacement between the IR camera and the light source for the V-stripes. As we will see shortly, while the stripe pattern does not need to be calibrated, we do assume it to be vertical; we achieve this by mounting the screen using a spirit level and gravity.

Figure 1: System setup.
Figure 2: Acquisition system (IR point light source; IR line laser and polygonal mirror; IR camcorder and color camera; stripe pattern; PC).
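A minimal sketch of this 'reversed' triangulation under the assumptions stated above, i.e. the sweeping plane contains the polygonal mirror and is parallel to the horizontal axis; the function name, argument names, and the way the calibration data is passed in are illustrative placeholders, not the paper's implementation.

```python
import numpy as np

def h_plane_from_reference(ref_ray_dir, ref_depth, mirror_pos):
    """Recover the plane (n, d) of the sweeping horizontal laser line.

    ref_ray_dir : (3,) ray direction of an H-line pixel on the reference strip
    ref_depth   : pre-calibrated depth (z) of the reference strip at that pixel
    mirror_pos  : (3,) position of the polygonal mirror in IR-camera coordinates

    The plane contains the mirror, contains the reference point, and is parallel
    to the horizontal (x) axis, which fixes it completely.
    """
    ref_ray_dir = np.asarray(ref_ray_dir, dtype=float)
    mirror_pos = np.asarray(mirror_pos, dtype=float)

    # 3D point on the reference strip: scale the pixel ray so its z equals the known depth
    p_ref = ref_ray_dir * (ref_depth / ref_ray_dir[2])

    # normal is perpendicular to the x axis and to the in-plane direction (p_ref - mirror)
    n = np.cross(np.array([1.0, 0.0, 0.0]), p_ref - mirror_pos)
    n /= np.linalg.norm(n)
    d = -n.dot(mirror_pos)          # plane equation: n . X + d = 0
    return n, d
```

Every other H-line pixel in the same frame can then be triangulated against the returned (n, d) with the ray-plane intersection sketched above.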

III. IR-LINE DETECTION AND IDENTIFICATION

In this section, we describe the extraction of basic features such as the horizontal line, the vertical lines, and the silhouette from the IR video sequence. The goal is to identify the individual patterns and distinguish them from each other.

A. H-Line Detection

The horizontal line could potentially be detected by applying a standard horizontal edge filter on each IR frame. However, there are two main problems: (1) the H-line is typically 7 to 8 pixels wide, due to the finite exposure time and its fast sweeping motion up and down the scene. This problem can be solved by designing a filter with maximal response to an 8-pixel-wide horizontal edge. (2) On thin horizontal objects such as bookshelves and fingers, and on complicated surfaces such as the wrinkles of a shirt, vertical stripes can appear as horizontal edges and result in a significant response to the horizontal filter, as seen in Figure 3 for wrinkles in the shirt and on the collar.

Figure 3: Vertical stripes can result in a response to the horizontal edge filter.

However, "true" H-line pixels do not appear at the same image location in consecutive frames due to the sweeping motion of the H-line, while "false" ones tend to stay at the same location due to the limited motion of objects in the scene; thus we apply an XOR operation between the H-pixels found in consecutive frames to eliminate false edges. Furthermore, since false H-lines are typically short, we remove all H-line segments below a threshold length to obtain the final H-line pixels, as shown in Figure 4.

The plane equation for the H-line is determined using the reference strip. Specifically, the 3D coordinates of the H-line points on the reference strip can be computed from the ray of the corresponding IR camera pixel and the known depth; this point and the location of the polygonal mirror are enough to determine the plane, because it is parallel to the horizontal axis. Then, the 3D coordinates and depth of each H-line pixel are computed as the intersection point between the computed light plane and each pixel's ray.

B. V-Line Detection

V-lines are detected by applying a standard vertical edge filter on each IR frame, thinning the obtained edges, and tracking them within the frame. However, it is conceivable for two separate V-lines, one on the background and one on the foreground, to align by coincidence and hence be tracked as a single one. To avoid this, and also to reduce processing time, we clip the obtained edges to the IR foreground silhouette. We assume a fixed but possibly complex scene background, rather than imposing severe restrictions such as a special background color as is commonly done in silhouette-based techniques. To determine the silhouette, we compute the difference between each frame and the original background image acquired without any objects, and then threshold it; in order to cope with noise and with objects that by coincidence have a similar gray value, we apply median filtering, segment the marked areas, and remove isolated small segments. Since the background difference is small for dark stripes, the outline of the silhouette appears somewhat jagged; as it turns out, this does not cause any problems for clipping the V-lines, because it essentially occurs between the V-lines. Furthermore, due to its difference from the original background, the shadow created by a person moving in the stripe pattern is typically also detected as foreground; this is not of concern either, because it does not contain any V-lines. Figure 5 shows the steps of the V-line detection and clipping procedure.
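The silhouette computation and V-line clipping just described can be sketched with standard image-processing building blocks; the threshold values, kernel sizes, and function names below are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np
from scipy import ndimage

def foreground_silhouette(ir_frame, background, diff_thresh=25,
                          median_size=5, min_segment_px=200):
    """Foreground mask from background differencing, as described in Section III.B.

    ir_frame, background : 2D gray-value images of equal size
    """
    diff = np.abs(ir_frame.astype(float) - background.astype(float))
    mask = diff > diff_thresh                               # threshold the difference image
    mask = ndimage.median_filter(mask.astype(np.uint8),     # suppress isolated noise pixels
                                 size=median_size).astype(bool)
    labels, num = ndimage.label(mask)                       # segment the marked areas
    sizes = ndimage.sum(mask, labels, index=range(1, num + 1))
    keep_labels = 1 + np.flatnonzero(np.asarray(sizes) >= min_segment_px)
    return np.isin(labels, keep_labels)                     # boolean silhouette mask

def clip_v_lines(v_line_mask, silhouette):
    """Keep only V-line pixels that fall inside the foreground silhouette."""
    return v_line_mask & silhouette
```

The same building blocks reappear in Section IV for the VIS-frame silhouette, where the object's shadow has to be treated separately.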
C. V-Line Identification

Rather than using calibrated vertical light planes and identifying them with an ordinal number, we identify a V-line in the image directly by its plane equation. This has two advantages: (a) no precise pre-calibration step of the IR stripes is necessary, since their corresponding plane equation is obtained on-the-fly using the H-line, and (b) there is no possibility of assigning an incorrect ordinal number, and thus an incorrect pre-calibrated plane, to a stripe. Conceptually, the depth originally obtained from the H-line is locally updated; in this sense, it is similar to video coding, with the H-line corresponding to an intra frame (I-frame) and the V-line movements, i.e. incremental depth changes, corresponding to prediction frames (P-frames). Note that because V-lines are merely used for updating the depth between two sweeps of the horizontal line, rather than for computing it from scratch, our approach is robust to inaccuracies in the calibration of the V-lines.

We define a coordinate system aligned with the IR camera, with the camera's center of projection as origin, the x-axis horizontal, the y-axis vertical, and the z-axis pointing into the image. Since V-planes are all vertical, i.e. the y component ny of their normal vector n = (nx, ny, nz) is zero, the general plane equation x·nx + y·ny + z·nz + d = 0 for a point (x, y, z) on a V-plane can be simplified to

x·nx + z·nz + d = 0.     Eq. (1)

Figure 4: Detected H-line, marked red on the object and green on the reference stripe.

Figure 5: (a) Detected V-lines; (b) detected IR foreground silhouette shown in red; (c) V-lines clipped to the foreground silhouette.

Since all V-planes pass through the IR light source at S = (Sx, Sy, 0), we can determine d in Eq. (1):

Sx·nx + 0·nz + d = 0,  i.e.  d = -Sx·nx.     Eq. (2)

Then, we can write Eq. (1) as x·nx + z·nz - Sx·nx = 0 and simplify it to

x + α·z = Sx,     Eq. (3)

with only one single "gradient" parameter α = nz/nx describing the plane entirely. If only one 3D point P = (x, y, z) on the light plane is known, it is straightforward to compute its α-value as

α = (Sx - x)/z.     Eq. (4)

Equally simply, if the α-value of a V-line in the IR image is known, the depth z of any of its pixels can be computed as follows: the image pixel defines a ray from the camera's center of projection into space, with a direction v = (vx, vy, vz) as obtained from the camera calibration. Since the camera's center of projection is simply the origin of the coordinate system, any 3D point (x, y, z) on this ray satisfies

x = (vx/vz)·z.     Eq. (5)

Combining Eqs. (3) and (5), we can compute the z value for the pixel as

z = Sx / (vx/vz + α).     Eq. (6)

The reason for the simplicity of the above equations is the selection of the IR camera as origin, and the alignment of the baseline and the stripes with the x- and y-axes, respectively.

We determine the α-value for each V-line in a given frame using the following methods:

1. Intra-frame tracking

The α-values of V-lines that intersect with the H-line are directly computed from the 3D coordinates of the intersection point according to Eq. (4). Depth is assigned to every line pixel according to Eq. (6); we refer to this procedure as intra-frame tracking, because the depth assignment is based on tracking a V-line within a given frame.

2. Inter-frame tracking

For the remaining V-lines, we search the previous frame for similar lines, i.e. lines that are close enough compared to the average line spacing and that correspond to the same type of edge, i.e. either a transition from a dark to a bright or from a bright to a dark stripe. We compute a score for each candidate based on proximity and line length, and if the score is above a credibility threshold, we choose the previous-frame V-line with the highest score as parent and assign its α-value to its child, namely the corresponding V-line in the current frame. In this manner, the plane equation is passed on to future frames; we refer to this procedure as inter-frame tracking, because V-lines are tracked across consecutive frames. For offline processing, future frames are also available and can be used to reverse the tracking direction in time in an anti-causal manner.

3. Line counting

Another way to determine α-values for V-lines that could not be identified using either of the previous approaches is to use their neighboring lines. However, due to horizontal depth discontinuities, non-consecutive V-lines may erroneously appear as consecutive neighbors in the frame. Therefore, we assign an α-value only if (a) both the left and right neighbor lines are identified and suggest approximately the same α-value, in which case we choose the arithmetic mean of the two values; or (b) a line has a similar length to its identified neighbor and runs roughly parallel to it at the distance to be expected according to the line on the other side of the identified neighbor.
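A minimal sketch of the intra-frame identification step, i.e. Eqs. (4) and (6) applied to a V-line that intersects the H-line; the value of Sx, the example ray direction, and the function names are placeholders rather than calibration values from the paper.

```python
def alpha_from_intersection(x, z, S_x):
    """Eq. (4): gradient parameter of a vertical light plane through S = (S_x, S_y, 0),
    given one known 3D point (x, y, z) on the plane (the H-line / V-line intersection)."""
    return (S_x - x) / z

def depth_from_alpha(v_x, v_z, alpha, S_x):
    """Eq. (6): depth z of a V-line pixel whose calibrated ray direction is (v_x, v_y, v_z),
    for a vertical light plane with gradient parameter alpha."""
    return S_x / (v_x / v_z + alpha)

# Example: identify a V-line from its intersection with the H-line,
# then propagate depth to another pixel of the same line.
S_x = 0.40                                       # horizontal baseline camera <-> IR light source (placeholder)
alpha = alpha_from_intersection(x=0.10, z=2.0, S_x=S_x)
z = depth_from_alpha(v_x=0.05, v_z=1.0, alpha=alpha, S_x=S_x)
```

The same two functions cover inter-frame tracking and line counting as well, since those methods differ only in how the α-value of a V-line is obtained.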

To determine the α-values, we first apply intra-frame tracking, followed by bi-directional inter-frame tracking, followed by line counting, and finally inter-frame tracking again, in order to track the lines identified via line counting across frames. Using the obtained α-values, we compute depth values z for the identified lines according to Eq. (6). Figure 6 shows the resulting depth lines, with depth coded as gray value.

Figure 6: Reconstructing the depth along V-lines. (a) IR frame; (b) V-lines from intra-frame tracking only; (c) V-lines with additional forward inter-frame tracking; (d) final result after both forward and backward inter-frame tracking, and line counting.

IV. DENSE DEPTH FRAME RECONSTRUCTION

Using the techniques described in the previous section, we obtain depth along sparse lines in the IR image. However, our final goal is to obtain dense depth for foreground objects in every frame of the video sequence. Hence, we need to interpolate between the depth lines and extrapolate the depth within the entire moving foreground object. To this end, we apply the following steps to the sequence of VIS frames:

A. Project the IR depth lines onto the corresponding VIS frame

Since both the IR and the VIS camera are calibrated, this step is straightforward. From the depth value in the IR frame, a 3D vertex is computed using the internal and external camera parameters. This vertex is then projected into the VIS image, and depth is accordingly assigned to the corresponding pixel. The result is a sparse depth image which looks very similar to the one for the IR frame, but which is now superimposed on top of the VIS frame on a pixel-by-pixel basis.

B. Determine the foreground silhouette in the VIS frames

In order to assign depth values to the entire silhouette of a foreground object, it is necessary to precisely determine its boundaries in the VIS frame. In principle, we could apply the same steps that we used to compute the silhouette in the IR domain, shown in Figure 5(b), i.e. background differencing followed by median filtering and small-region removal. However, this procedure would inevitably result in a silhouette which also includes the object's shadow, as seen in Figures 7(a) and (b).
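A minimal sketch of step A, assuming pinhole models for both cameras; the intrinsic matrices K_ir and K_vis and the relative pose (R, t) stand in for the calibration data and are not values from the paper.

```python
import numpy as np

def transfer_depth_to_vis(u_ir, v_ir, z, K_ir, R, t, K_vis):
    """Lift an IR pixel (u_ir, v_ir) with reconstructed depth z to a 3D vertex,
    transform it into the VIS camera frame, and project it into the VIS image.

    K_ir, K_vis : 3x3 intrinsic matrices of the IR and VIS cameras
    R, t        : rotation and translation taking IR-camera coordinates to VIS-camera coordinates
    Returns (u_vis, v_vis, z_vis): pixel location and depth in the VIS frame.
    """
    # back-project: 3D vertex in IR-camera coordinates, scaled so its z equals the depth
    ray = np.linalg.inv(K_ir) @ np.array([u_ir, v_ir, 1.0])
    vertex_ir = ray * (z / ray[2])

    # change of coordinates into the VIS camera frame
    vertex_vis = R @ vertex_ir + t

    # project into the VIS image
    p = K_vis @ vertex_vis
    return p[0] / p[2], p[1] / p[2], vertex_vis[2]
```

Collecting these projections for all identified line pixels yields the sparse depth image described above, which is then interpolated within the VIS-frame foreground silhouette.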
