Fast Quadratic Shape From Shading With Unknown Light Direction

An algorithm is proposed that extracts 3D shape from shading information in a digital image. The algorithm assumes that there is only a single source of light producing the image, that the surface of the shape giving rise to the image is Lambertian (matte) and that its shape can be locally approximated by a quadratic function. Previous work shows that under these assumptions, robust shape from shading is possible, though slow for large images because a non-linear optimization method is applied in order to estimate local quadratic surface patches from image intensities. The work presented here shows that local quadratic surface patch estimates can be computed, without prior knowledge of the light source direction, via a linear least squares optimization, thus greatly improving the algebraic complexity and run-time of the existing algorithms.


Introduction
Beginning with the work of B. Horn [1,2,3], Shape from Shading (SfS) algorithms enjoy a long history in computing science research [4,5,6,7,8], continue to attract theoretical investigation in the field of computer vision [9,10,11,12,13,14,15,16,17,18,19,20] and find many practical applications, e.g. in planetary research (photoclinometry), industrial automation, medical imaging or face recognition [21,22,23,24,25,26,27,28,29]. The key objective of an SfS algorithm (see figure 1) is to process shading information, appearing as varying grayscale intensities in a digital image, in order to recover the 3D shape of an object or a set of objects depicted in the image. Often, the resulting shape description serves as input to some further computer vision or automated reasoning task. It is well known that humans have the ability to estimate the shape of an object by interpreting shading in their visual scene as a cue; exactly how humans accomplish this task, however, is still the subject of active biological research [30]. Like many other computer vision tasks, SfS methods typically base shape recovery on the physical theory underpinning image formation [31]. Even so, deducing the shape of an object (often modelled as a height or, alternatively, a depth field in 3D space) from its corresponding image intensities, e.g. by retracing the path of light from an image location back to the point on the object forming the image, is difficult because, in general, the image formation transformation is not invertible in the mathematical sense. Because image formation reduces the 3D space of a scene to the 2D space of the image, and because natural surfaces have varying reflectance properties and complex geometry, the intensity at a candidate point in the image is, in general, the aggregate result of the radiance of many other surface points in the scene.
Thus, in general, the relationship between an image intensity and the corresponding point on the source object is ambiguous. One way to deal with the ambiguity of image formation is to make assumptions about the problem domain that simplify the image formation model. Consequently, a great deal of theoretical research in SfS has focused on the so called Lambertian model [31] and its related applications. The Lambertian model describes the reflection properties of a matte surface illuminated by a point source of light. In some natural settings, e.g. marble statues under appropriate illumination, planetary or lunar surface features under the illumination of the sun, or facial skin under suitable light, to name a few, the Lambertian model can serve as a good approximation. As discovered by J. H. Lambert [32], a Lambertian surface illuminated by a point source of light appears equally bright from all viewing directions. Such a surface is a diffuse radiator: incident radiation at a point on the surface is scattered uniformly in all directions, regardless of its distribution [33].

The Lambertian Model
As illustrated in figure 2, the Lambertian model has a simple geometric interpretation: Given a Lambertian surface H defined by a so called height function h(x, y), let θ denote the angle between the normal vector n (at a given point (x, y, h(x, y)) on the surface) and the light vector l (representing a point source of light). Then the corresponding image intensity I is given by

I = ρ cos θ, (1)

where ρ models surface reflectance (albedo), which is often assumed to be a constant equal to 1. Equation 1 is often represented as the equivalent vector dot product

I = ρ (n · l) / (|n| |l|), (2)

where n = (−p, −q, 1) and p, q are the partial derivatives of H w.r.t. the x and y directions respectively. An implicit assumption in the above model is that the camera C uses orthographic projection (infinite focus) for imaging, which is more suitable for imaging distant objects than the perspective model. The resulting simplification is that an image point is in 1-1 correspondence with a scene point. However, there are problems with this model; notably, [34] showed that when images are made with orthographic projection, SfS algorithms produce ambiguous results for certain types of image features, due to an effect called the bas-relief ambiguity. Despite its apparent simplicity, equation 2 is a first order non-linear partial differential equation for H in the differentials p and q. It has no obvious algebraic solution and, to date, there is no single general numerical approach for solving it. A multitude of SfS methods have been proposed [7], each with specific (sometimes implicit) constraints on ρ, l or the underlying surface geometry in order to achieve certain theoretical results or meet the requirements of their respective applications.
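The Lambertian imaging equation can be sketched in a few lines of code. The following Python snippet (illustrative only; the function and parameter names are not taken from the implementation [38]) evaluates the intensity of a surface point with gradient (p, q) under a unit light vector:

```python
import math

def lambertian_intensity(p, q, light, albedo=1.0):
    """Intensity of a Lambertian surface point with gradient (p, q).

    The unnormalized surface normal is n = (-p, -q, 1); for a unit
    light vector l, equation 2 gives I = albedo * (n . l) / |n|.
    """
    lx, ly, lz = light
    n_dot_l = -p * lx - q * ly + lz
    # Points facing away from the light are clamped to zero intensity,
    # a common convention that is not part of equation 2 itself.
    return albedo * max(n_dot_l, 0.0) / math.sqrt(1.0 + p * p + q * q)
```

For a flat patch (p = q = 0) under overhead light l = (0, 0, 1) the intensity is 1; tilting the patch reduces it by the cosine of the angle between n and l.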
SfS methods also differ in the way they approach the problem numerically. Some apply global energy minimization techniques in order to solve equation 2 simultaneously across all pixels in the image, whereas others use so called propagation methods that progressively expand a region of known height around an initial point until the entire image is covered. Yet other, so called local, numerical methods compute height at each pixel by combining the image intensities in a small region around a given pixel into a system of constraints that is then solved for each pixel independently. Lastly, while many SfS methods compute height directly at a given pixel, some apply an intermediate step: in a first pass across the image, the surface normal n is computed at every pixel, and in a second pass, so called normal integration, the resulting vector field is integrated to recover the surface height H. Although normal integration is an elementary concept in calculus, it remains a surprisingly difficult problem to solve numerically and is still the subject of active research [36].
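To illustrate what normal integration involves in its simplest form, the sketch below path-integrates a gradient field (p = ∂h/∂x, q = ∂h/∂y) into a height map on a unit grid. This is a naive single-path scheme, not one of the robust methods surveyed in [36]; all names are illustrative.

```python
def integrate_gradient(p, q, h00=0.0):
    """Naively path-integrate a gradient field into a height map.

    p[y][x] ~ dh/dx and q[y][x] ~ dh/dy on a unit grid.  The first row
    is integrated left to right, then each column is integrated downward
    (forward Euler steps along a single integration path).
    """
    rows, cols = len(p), len(p[0])
    h = [[h00] * cols for _ in range(rows)]
    for x in range(1, cols):
        h[0][x] = h[0][x - 1] + p[0][x - 1]
    for y in range(1, rows):
        for x in range(cols):
            h[y][x] = h[y - 1][x] + q[y - 1][x]
    return h
```

For an exact gradient field the result is exact up to the integration constant; for noisy normals, errors accumulate along the integration path, which is precisely why more sophisticated integrators remain an active research topic.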

Locally Quadratic Surfaces
The results presented in this work are largely based on the image formation model presented in [16], which assumes a single input image produced with orthographic projection by illuminating a Lambertian surface H of constant albedo (ρ = 1 for all pixels in the image) with a point source of light and, significantly, that H is locally quadratic. According to [16], a quadratic surface is one whose height function (see figure 2) is a bi-variate polynomial of degree two, although they do not formally define what local means in this case. However, their underlying intention lies in their approach to SfS, i.e. subdividing the input image into a mosaic of small, overlapping image patches such that each patch is roughly the image of a quadratic surface. The success of the approach is predicated upon a proof they provide: when H is a quadratic surface with an invertible Hessian matrix, equation 2 has a unique solution if the light direction is known, and four distinct solutions if the light direction is unknown. Therefore, given a light source position, a search for a quadratic surface matching a given image patch I_p is guaranteed to succeed. As they show, the search can be limited to points on a conic section that is unique to I_p. Each point on the conic represents a unique quadratic surface proposal for I_p. In a final step, a global numerical optimization method selects the quadratic surface proposals that minimize an overall error relative to the input image.
More recently, [10] also considers the case of SfS for quadratic surfaces. Similar to [16], an approach is presented to recover a locally quadratic surface patch but, in contrast to [16], their approach does not require prior knowledge of the light source and is neural network based. They show that the set of quadratic surface proposals that are consistent with the spatial derivatives of intensity at a single image point defines a two-dimensional algebraic variety embedded in the five-dimensional space of quadratic shapes. They devise a neural network, called a point processor, that computes an approximation of the variety from the image intensity and its derivatives at a given image point. Since the point processor acts only as a constraint on the set of possible quadratic shapes at a point in the image, additional constraints are applied in order to find the unique quadratic shape satisfying the image constraints. They show two alternative ways to do this: In the first case, using two uncalibrated photometric stereo images, i.e. two images of the same scene under a varying light vector, they compute, for the same pixel in the two different images, the intersection of the respective varieties, using a maximum likelihood measure. They show empirically (but without proof) that the intersection of the varieties approximates a point, hence the unique quadratic surface. In the second approach, a continuity constraint is enforced across a neighborhood of the selected point in the image. The work presented in this paper is based on the observation (to be proven in section 3) that in the case of a quadratic surface, image intensities relate linearly to the components of the surface normal. This leads to the following three step approach to SfS (see figure 3): Compute skin: for each pixel, a locally quadratic analytic representation of image intensities is computed.
Compute normal: based on a small region of pixels and the analytic representations of the previous step, a surface normal is computed via linear least squares.
Compute height: compute light vector based on local estimates of normal vectors, or alternatively compute surface height globally based on a normal integration algorithm.
The core of the work presented here focuses on step two, namely computing the surface normal, which effectively is the core task in [16] and [10]. Although height recovery is a fundamental post-processing step for the SfS approach presented here, it is considered out of scope. Instead, in order to verify the results of this work (computing normals), it is shown how to recover the light vector. Note that [16] requires a non-linear least-squares approximation (Levenberg-Marquardt) to do the same, whereas [10] requires a neural net approach, in contrast to the purely algebraic approach used here. Refer to the conclusion in section 5 for more on this issue.

Definitions
In the following sections, unless specified otherwise, lower case letters generally denote scalar variables and upper case letters denote bi-variate functions over the reals. Furthermore, bold font lower case letters represent column vectors over the real numbers. If a_{n×1} (n ∈ N, n > 0) denotes an n dimensional column vector then a^T_{1×n} denotes its transpose. For example, an arbitrary coordinate (x, y) ∈ R^2 is represented by the column vector x = [x y]^T. Bold font upper case letters represent matrices over the real numbers. If H_{n×m} (n ∈ N, n > 0, m ∈ N, m > 0) denotes an n × m dimensional real matrix then H^T_{m×n} denotes its transpose. Let l = [l_x l_y l_z]^T, where l_x ∈ R, l_y ∈ R, l_z ∈ R and l_z > 0, denote a light source. As the algorithm presented here recovers the light direction only, assume further that l_x^2 + l_y^2 + l_z^2 = 1. In the remaining sections let x = [x y]^T (x ∈ R, y ∈ R) denote an arbitrary point in R^2 and let H(x) = h_0 + h_1 x + h_2 y + h_3 x^2 + h_4 xy + h_5 y^2 denote the quadratic height function of a surface illuminated by l. For convenience, let h = [h_0 · · · h_5]^T denote the coefficients of H(x), referred to as the surface coefficients of H. Furthermore, it is assumed that the Hessian of H is invertible, i.e. 4h_3 h_5 − h_4^2 ≠ 0.
By [16] this implies that equation 2 has a unique solution if l is known and four distinct solutions otherwise. Let P(x) = ∂H/∂x = h_1 + 2h_3 x + h_4 y and Q(x) = ∂H/∂y = h_2 + h_4 x + 2h_5 y (equations 5). For convenience, and when there is no ambiguity, let p = P(x) and q = Q(x). Let n = [−p −q 1]^T denote the vector normal to H at x. Using the notation defined above (and |l| = 1), equation 2 can be expressed as I(x) = (n · l)/|n| (equation 6). By substituting the expressions for p and q (equations 5) into the (squared) denominator of equation 6 and expanding and simplifying the resulting terms, it is straightforward to show the identity U(x) := u_0 + u_1 x + u_2 y + u_3 x^2 + u_4 xy + u_5 y^2 = |n|^2 (identity 9), where the components of u = [u_0 · · · u_5]^T are polynomial expressions in the surface coefficients alone (equations 7). The identity above relates the components of u, which are solely functions of the surface coefficients, to the magnitude squared of the normal vector n at a given image point; they will therefore henceforth be referred to as the normal coefficients of H. Similarly, by substituting the expressions for p and q (equations 5) into the (squared) numerator of equation 6 and expanding and simplifying the resulting terms, it is straightforward to show the identity M(x) := m_0 + m_1 x + m_2 y + m_3 x^2 + m_4 xy + m_5 y^2 = (n · l)^2 (identity 14). Because the components of m = [m_0 · · · m_5]^T are related to the vector projection n · l, they are referred to as the projection coefficients.
By combining the identities 9 and 14 with equation 6 it follows that S(x) := I(x)^2 = M(x)/U(x) (equation 16). Since M(x) and U(x) are both quadratic polynomials and furthermore U(x) > 0, S(x) is a rational function defined everywhere over R^2. Furthermore, it is easy to verify, using the quotient rule of differentiation, that all higher order derivatives of S(x) are also rational functions whose denominators are powers of U(x). Hence, all higher order derivatives of S(x) are defined everywhere in R^2. ∀i ∈ N, 0 ≤ i ≤ 5, let s_i ∈ R denote the coefficients of the second order Taylor expansion of S(x) at c, and let s = [s_0 · · · s_5]^T. The components of s are referred to as the image coefficients of H at c.
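Since H is quadratic, p and q are linear in x and y, so |n|^2 = 1 + p^2 + q^2 expands to a quadratic polynomial. The sketch below spells out this expansion (our own reconstruction of equations 7, derived by hand from the definitions above rather than quoted from the paper) and checks it numerically:

```python
# H(x, y) = h0 + h1*x + h2*y + h3*x**2 + h4*x*y + h5*y**2

def gradient(h, x, y):
    """p = dH/dx and q = dH/dy; both linear because H is quadratic."""
    p = h[1] + 2.0 * h[3] * x + h[4] * y
    q = h[2] + h[4] * x + 2.0 * h[5] * y
    return p, q

def normal_coefficients(h):
    """Coefficients u of U(x, y) = |n|^2 = 1 + p^2 + q^2.

    A hand-derived reconstruction of the expansion sketched in the
    text; note the components depend on the surface coefficients only.
    """
    _, h1, h2, h3, h4, h5 = h
    return [1.0 + h1 * h1 + h2 * h2,
            4.0 * h1 * h3 + 2.0 * h2 * h4,
            2.0 * h1 * h4 + 4.0 * h2 * h5,
            4.0 * h3 * h3 + h4 * h4,
            4.0 * h4 * (h3 + h5),
            h4 * h4 + 4.0 * h5 * h5]

def quad_eval(c, x, y):
    """Evaluate a quadratic with coefficient order [1, x, y, x^2, xy, y^2]."""
    return c[0] + c[1] * x + c[2] * y + c[3] * x * x + c[4] * x * y + c[5] * y * y
```

Evaluating the polynomial with coefficients normal_coefficients(h) at any point reproduces 1 + p^2 + q^2 there, which is exactly identity 9.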

Shape From Shading Using Least Squares
As was noted in the previous section, the values of the components of u depend solely on the components of h. In this section it is shown that u can be computed using a linear least squares approximation that depends only on the known image coefficients and the known pixel position. In section 4 it is then shown how to compute h from u in a single step.
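The linear least squares machinery used throughout this section is elementary; a minimal self-contained sketch (normal equations plus Gaussian elimination, adequate for the small dense systems that arise here, though a QR or SVD solver would be preferred for ill-conditioned data) might look as follows:

```python
def solve(A, b):
    """Solve a small square system A x = b by Gaussian elimination
    with partial pivoting (sufficient for the tiny systems used here)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def least_squares(A, b):
    """Minimize |A x - b| via the normal equations A^T A x = A^T b."""
    n = len(A[0])
    AtA = [[sum(A[r][i] * A[r][j] for r in range(len(A))) for j in range(n)]
           for i in range(n)]
    Atb = [sum(A[r][i] * b[r] for r in range(len(A))) for i in range(n)]
    return solve(AtA, Atb)
```

For example, fitting the line y = a·x + b to the points (0, 1), (1, 3), (2, 5) returns a = 2, b = 1.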
In the rest of this section, let c = [c_x c_y]^T (c_x ∈ R, c_y ∈ R) denote an arbitrary coordinate in R^2 and let Q(x) = q_0 + q_1 x + q_2 y + q_3 x^2 + q_4 xy + q_5 y^2 denote a general quadratic function of two variables, where q = [q_0 · · · q_5]^T denotes the coefficient vector of Q.
Next, several identities with regard to the translation of the coordinate system are proven.
Proof Q(x) is a quadratic function of two variables, therefore it is a real analytic function over R^2 (see proposition 2.2.2 in [37]). Because Q(x) is a real analytic function, it has a Taylor series expansion at c (see theorem 2.2.5 in [37]). However, since Q(x) is a quadratic function, all terms in the Taylor series expansion of Q(x) at c of order 3 or greater must be zero, thus the statement of the lemma is proven.
The coefficient vector q_c of the quadratic function in equation 19 is therefore q_c = D_c · q, where D_c is the upper triangular matrix

[ 1  c_x  c_y  c_x^2  c_x c_y  c_y^2 ]
[ 0   1    0   2c_x     c_y      0   ]
[ 0   0    1    0       c_x    2c_y  ]
[ 0   0    0    1        0       0   ]
[ 0   0    0    0        1       0   ]
[ 0   0    0    0        0       1   ]

Note that since D_c is an upper triangular matrix whose diagonal elements are all ones, it follows that |D_c| = 1 > 0. Hence, D_c has an inverse and is positive definite. Let P_c denote the lower triangular matrix defined by the image coefficients s (equation 20). Proof From lemma 3.2 it follows that equation 21 holds. By lemma 3.1, the coefficients of the left hand side of equation 21 are D_c · m. To prove the theorem, it suffices to show that the coefficients of the right hand side of equation 21 are given by P_c · D_c · u. By the proof of lemma 3.2, the coefficients of the right hand side of equation 21 are given by the entries in 22. From equation 16, it follows that U(x) · S(x) = M(x), and the entries in 22 can be rewritten accordingly as in 23. Note that P_c is a lower triangular matrix and is singular iff s_0 = 0. Furthermore, if P_c is non-singular then |P_c| = (s_0)^6 > 0 and it is therefore positive definite.
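The translation matrix D_c of lemma 3.2 can be sketched directly in code; the layout below is our reconstruction (coefficient order [1, x, y, x^2, xy, y^2], shift by +c), chosen so that D_c is upper triangular with a unit diagonal as stated above:

```python
def translation_matrix(cx, cy):
    """Matrix D_c mapping the coefficients q of Q(x, y) to the
    coefficients of the shifted quadratic Q(x + cx, y + cy).

    A reconstruction consistent with the properties stated in the text:
    upper triangular with a unit diagonal, hence det(D_c) = 1."""
    return [[1.0, cx, cy, cx * cx, cx * cy, cy * cy],
            [0.0, 1.0, 0.0, 2.0 * cx, cy, 0.0],
            [0.0, 0.0, 1.0, 0.0, cx, 2.0 * cy],
            [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def quad_eval(c, x, y):
    """Evaluate a quadratic with coefficient order [1, x, y, x^2, xy, y^2]."""
    return c[0] + c[1] * x + c[2] * y + c[3] * x * x + c[4] * x * y + c[5] * y * y
```

Applying D_c to a coefficient vector and evaluating at (x, y) agrees with evaluating the original quadratic at (x + c_x, y + c_y), which is the content of lemma 3.2.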
Let P = P_0. By theorem 3.3, P_c is similar to P via the change of basis matrix D_c, i.e. P_c = D_c · P · D_c^{-1} (equation 24).
Furthermore, by theorem 3.3, equation 25 follows. It is straightforward to transform equation 25 into the following under-determined homogeneous linear system in twelve unknowns, i.e. the components of the u and m vectors combined (equation 26). By selecting enough points in the domain (i.e. in the neighborhood of a candidate point at the center of the coordinate system) it is possible to extend the system represented by equation 26 in order to make it over-determined. For a given n ∈ N, n > 1, choose n + 1 distinct points p_i ∈ R^2, i ∈ N, 0 ≤ i ≤ n, and let D_i = D_{p_i}. WLOG assume p_0 = 0 and extend the system in equation 26 to equation 27. Next, it is shown that the over-determined linear system in equation 27 provides sufficient constraints in order to solve for the six (unknown) components of u.

Theorem 3.4 Assume P is nonsingular and let

A = [ D_0 · P   −D_0 ]
    [ D_1 · P   −D_1 ]
    [   ···      ··· ]
    [ D_n · P   −D_n ]

and let B = A^T · A. Then B has a principal minor of rank 6.
Proof By multiplying the terms of the block matrices A^T and A, it is straightforward to show that

B = [ P^T · G · P   −P^T · G ]
    [   −G · P          G    ]

where G = Σ_{i=0}^{n} D_i^T · D_i. Since ∀i ∈ N, 0 ≤ i ≤ n, D_i is positive definite, so is D_i^T · D_i. Furthermore, since the sum of two positive definite matrices is positive definite, it follows that G is positive definite. Since P is nonsingular, it follows that |P^T · G · P| = |P^T| · |G| · |P| > 0. Therefore the submatrix P^T · G · P, which is a principal minor of B, is positive definite and has rank 6.

Derivation Of Surface Coefficients
In order to compute surface height as well as the light direction, the surface gradients p and q are required. These are expressed in terms of the components of h (see also the system of equations 5). This section shows how to compute h from u, where u was obtained using the method described in the previous section. Consistent with the results in [16] for the case when the light vector is unknown, up to a set of four distinct solutions for h are expected. Since by initial assumption 4h_3 h_5 − h_4^2 ≠ 0, it follows immediately from equations 7b and 7c that there is a unique solution for h_1 and h_2 (equation 29).
Therefore, the main task is to compute h_3, h_4 and h_5 from equations 7d to 7f. First, a method for computing the magnitudes |h_3|, |h_4| and |h_5| is presented. Then it is shown how to resolve their signs. Let w = h_4^2. By equations 7d and 7f, |h_3| and |h_5| can be expressed in terms of w (equations 30 and 31). Therefore, if w (and hence |h_4| = √w) is known then so are |h_3| and |h_5|, and the main task reduces to the computation of w. Let a = u_4^2 + (u_3 − u_5)^2 and let b and c be defined as in lemma 4.1. By lemma 4.1, w can be computed by applying the quadratic formula to a, b and c. Subsequently, |h_3| and |h_5| can be computed using equations 30 and 31. Next, it is shown how to compute the signs of the resulting solution set.
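As a small worked check of the unique recovery of h_1 and h_2 (equation 29): assuming the reconstructed relations u_1 = 4h_1h_3 + 2h_2h_4 and u_2 = 2h_1h_4 + 4h_2h_5 (our reading of equations 7b and 7c, not quoted from the paper), h_1 and h_2 follow from a 2×2 linear solve whenever 4h_3h_5 − h_4^2 ≠ 0:

```python
def recover_linear_coefficients(u, h3, h4, h5):
    """Recover h1 and h2 from the normal coefficients u.

    Solves the 2x2 system
        [4*h3  2*h4] [h1]   [u1]
        [2*h4  4*h5] [h2] = [u2]
    (our reconstruction of equations 7b/7c) by Cramer's rule.  The
    determinant is 4*(4*h3*h5 - h4**2), nonzero by the Hessian
    invertibility assumption, so the solution is unique.
    """
    det = 4.0 * (4.0 * h3 * h5 - h4 * h4)
    h1 = (4.0 * h5 * u[1] - 2.0 * h4 * u[2]) / det
    h2 = (4.0 * h3 * u[2] - 2.0 * h4 * u[1]) / det
    return h1, h2
```

Feeding in u-values generated from a known surface returns the original h_1 and h_2 exactly, which is the uniqueness claim above.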
Hence by equation 32, u_4 = 0 iff h_4 = 0. Next, assume a ≠ 0; then by lemma 4.1 (and after simplification) the solution for w can be stated explicitly when u_4 ≠ 0. By lemma 4.2, u_4 can therefore be used to differentiate between the two cases h_4 = 0 and h_4 ≠ 0. When h_4 = 0, four solutions apply, where the components h_1 and h_2 have been computed using equation 29. When h_4 ≠ 0, by lemma 4.1, there is either one (when 4u_3 u_5 − u_4^2 = 0) or two solutions for w. The rest of the section proves that for each possible choice of w, there are only two possible solutions for h. Hence in total, just as in the case h_4 = 0, there are at most four possible coefficient vectors as a solution.
Lemma 4.3 Let a ∈ R, b ∈ R, c ∈ R, d ∈ R where |a| = |c| and |b| = |d|. Assume that a · b ≠ 0 and a · b = c · d. Then either a = c and b = d, or a = −c and b = −d.
Proof Clearly, when a = c and b = d, or a = −c and b = −d, the lemma holds. Assume for the sake of contradiction that a = c and b = −d.
=⇒ a · b = c · d, by initial assumption
=⇒ a · b = −a · b, by substitution for c and d
Since by assumption a · b ≠ 0, this is a contradiction. Using the same argument it can be shown that a = −c and b = d leads to a contradiction as well. Hence, either a = c and b = d, or a = −c and b = −d.
Finally, let a, b, c, d be as in lemma 4.3 and assume additionally that a + b ≠ 0 and a + b = c + d. Then a = c and b = d.

Results And Discussion
The theory presented in the previous sections is validated by a C++ implementation [38]. It shows that surface normals and the light direction can be recovered quickly and accurately for a paraboloid and a sphere, but leaves out more complex examples because they require advanced techniques for computing height (see [40]), which is beyond the aim of this paper. Recall from figure 3 that surface height, or alternatively light, recovery is a multi-step process: Compute skin: at each pixel location, the image coefficients are computed by linear least squares fitting of a second order polynomial to the image intensities over a small square window of the image.
Compute normal coefficients: using the approach outlined in section 3, at each pixel location the normal coefficients are computed from the image coefficients.
Compute surface coefficients: using the approach described in section 4, the surface coefficients (and, by equations 5, the surface normals) are computed from the normal coefficients.
Compute light direction (or height): in the final step the light direction is computed using a linear least squares approach, by applying equation 6 as a linear constraint on the light vector components l_x, l_y and l_z at each pixel in a small square pixel window. The coefficients of the linear constraints, I(x), p(x) and q(x), are known values.
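The final light recovery step can be sketched as follows. Rearranging the intensity equation to −p·l_x − q·l_y + l_z = I·sqrt(1 + p^2 + q^2) gives one linear constraint on l per pixel; stacking the constraints and solving the 3×3 normal equations yields the least squares estimate (names and the sample format are illustrative, not from [38]):

```python
import math

def estimate_light(samples):
    """Least-squares estimate of the light vector from per-pixel data.

    Each sample is (p, q, I).  The intensity equation rearranges to the
    linear constraint -p*lx - q*ly + lz = I * sqrt(1 + p^2 + q^2), so
    stacking one row per pixel gives an overdetermined system A l = b,
    solved here via the 3x3 normal equations A^T A l = A^T b.
    """
    AtA = [[0.0] * 3 for _ in range(3)]
    Atb = [0.0] * 3
    for p, q, inten in samples:
        row = (-p, -q, 1.0)
        rhs = inten * math.sqrt(1.0 + p * p + q * q)
        for i in range(3):
            for j in range(3):
                AtA[i][j] += row[i] * row[j]
            Atb[i] += row[i] * rhs
    # Solve the 3x3 system by Gaussian elimination with partial pivoting.
    M = [AtA[i] + [Atb[i]] for i in range(3)]
    for c in range(3):
        piv = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            for k in range(c, 4):
                M[r][k] -= f * M[c][k]
    l = [0.0] * 3
    for r in (2, 1, 0):
        l[r] = (M[r][3] - sum(M[r][k] * l[k] for k in range(r + 1, 3))) / M[r][r]
    return l
```

With noise-free intensities generated from a known unit light vector, at least three pixels with linearly independent constraints recover l exactly; with noisy data the estimate should be renormalized to unit length afterwards.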
The computed light vector directions for two 400 × 400 pixel sample images (see figure 4) of a paraboloid and a sphere give the point clouds in figures 5 and 6 respectively. Table 1 shows the approximate computation times for both images based on a single threaded algorithm (running on an Intel i7-1185G7 processor). These indicate a significant improvement compared to [16], where, under the assumption that the light vector is known, a parallelized algorithm (no processor specified) on an 8 core machine is reported to take one minute to compute local quadratic surface proposals for a 128 × 128 pixel image.
Recall from section 4 that there are up to 4 different solutions to the light equation. Accordingly (see figures 5 and 6), the point clouds for the light vector computation consist of 4 clusters of points. When it is known that the image is that of a paraboloid or a sphere, no clustering algorithm (e.g. k-means) is required in order to determine the cluster centers (i.e. the desired solutions). Rather, it suffices to partition the solution set depending on the signs of the computed second order surface coefficients (i.e. the signs of h_3, h_4 and h_5; see also [38]). In this case, the cluster center is computed as the Euclidean average (center of mass) of the points in a given partition. The accuracy of the result is shown in table 2 as the percentage deviation from the ideal vector l = [0.2 0.3 √(1 − 0.2^2 − 0.3^2)]^T.
Table 3 shows the accuracy of the algorithm in the case of an ideal sphere, i.e. where the image coefficients are generated synthetically, as opposed to the standard approach (table 2) where the image coefficients are computed in the skinning step with the least squares method. Since in the case of the ideal sphere the accuracy of the computed light vector is nearly 100%, the loss of accuracy in the standard case is solely due to skinning. Hence, the algorithm outlined in this work requires smooth (locally quadratic) patches and is sensitive to the quality of the analytic description of the source image.
Table 3: Accuracy of light direction estimation in case of an ideal sphere
Finally, when comparing the approach presented here with that of [16], it must be noted that [16] (like many other approaches, see surveys [7] and [8]) combines the step of computing surface normals with that of computing surface heights into a single numerical step. The approach advocated here clearly separates these two steps into normal finding and so called normal integration (see [36]). That is, the runtime comparison of the algorithm here with that of [16] is not on an equal basis.
However, separating the two steps (normal finding and integration) allows the approach presented here to compute the light vector efficiently. This cannot be done with the approach of [16]. It remains to be seen whether an approach purely based on normal integration, combined with the normal finding approach presented here, can outperform the algorithm in [16]. Nonetheless, it seems logical to take these two steps separately, i.e. to approach the shape from shading problem primarily as a normal finding problem and to treat normal integration as a separate, generalized problem of recovering a surface from its differentials. Because the latter does not bear any relationship to the non-linear light equation, it may offer a simpler and thus more efficient solution than the combined approach.