4D Atlas: Statistical Analysis of the Spatiotemporal Variability in Longitudinal 3D Shape Data

We propose a novel framework to learn the spatiotemporal variability in longitudinal 3D shape data sets, which contain observations of objects that evolve and deform over time. This problem is challenging since surfaces come with arbitrary parameterizations and thus, they need to be spatially registered. Also, different deforming objects, also called 4D surfaces, evolve at different speeds and thus they need to be temporally aligned. We solve this spatiotemporal registration problem using a Riemannian approach. We treat a 3D surface as a point in a shape space equipped with an elastic Riemannian metric that measures the amount of bending and stretching that the surfaces undergo. A 4D surface can then be seen as a trajectory in this space. With this formulation, the statistical analysis of 4D surfaces can be cast as the problem of analyzing trajectories embedded in a nonlinear Riemannian manifold. However, performing the spatiotemporal registration, and subsequently computing statistics, on such nonlinear spaces is not straightforward as they rely on complex nonlinear optimizations. Our core contribution is the mapping of the surfaces to the space of Square-Root Normal Fields where the L2 metric is equivalent to the partial elastic metric in the space of surfaces. Thus, by solving the spatial registration in the SRNF space, the problem of analyzing 4D surfaces becomes the problem of analyzing trajectories embedded in the SRNF space, which has a Euclidean structure. In this paper, we develop the building blocks that enable such analysis. These include: (1) the spatiotemporal registration of arbitrarily parameterized 4D surfaces in the presence of large elastic deformations and large variations in their execution rates; (2) the computation of geodesics between 4D surfaces; (3) the computation of statistical summaries; and (4) the synthesis of random 4D surfaces.


INTRODUCTION
S HAPE, an essential property of natural and man-made 3D objects, deforms over time as a result of many internal and external factors. For instance, anatomical organs such as bones, kidneys, and subcortical structures in the brain deform due to natural growth or disease progression; human faces deform as a consequence of talking, executing facial expressions, and aging. Similarly, actions and motions such as walking, jumping, and running are the result of deformations, over time, of the human body shape. The ability to understand and model (1) the typical deformation pat- terns of a class of 3D objects and (2) the variability of these deformations within and across object classes has many applications. In medical diagnosis and biological growth modeling, one is interested in measuring the intensity of pain from facial deformations [1], and in distinguishing between normal growth and disease progression using the deformation over time of body shape. In computer vision and graphics, the ability to statistically model such spatiotemporal variability can be used to summarize collections of 3D animations, and synthesize and simulate animations and motions. Similar to 3D morphable face models [2], these tools can also be used in a generative model for synthesizing large corpora of labeled longitudinal 3D shape data, e.g., 4D faces, for training deep neural networks.
This paper proposes a novel framework for the statistical analysis of longitudinal 3D shape data composed of objects that deform over time. Each object is represented as a closed manifold surface. We refer to an object captured at different points in time, e.g., a 3D human face performing a facial expression or speaking a sentence, or a 3D human body shape growing or performing actions, as a 4D (or 3D + t) surface. Given a set of 4D surfaces, our goal is to: • Compute the mean deformation pattern, i.e., the statistical mean 4D surface. For example, the same person can smile in different ways. Similarly, different people arXiv:2101.09403v2 [cs.CV] 20 Aug 2021 smile differently. The goal is to learn, based on observed longitudinal shape data, the typical smile. • Compute the directions of variation, analogous to Principal Component Analysis (PCA) for modeling 3D shape variability [3], [4], but here we focus on modeling variability in 4D surface collections. • Characterize a population of 4D surfaces using statistical models. • Synthesize new 4D surfaces by sampling, randomly or in a controlled fashion, from these statistical models.
We refer to these tasks as the process of constructing a 4D atlas. Achieving this goal requires solving important fundamental challenges. In fact, 3D objects such as faces, human body shapes, and anatomical organs, which come with arbitrary parameterizations, exhibit large elastic deformations within the same object and across different objects. This makes their spatial registration, i.e., finding one-to-one correspondences between each pair of shapes, very challenging. In the case of 4D surfaces, there is an additional temporal variability due to different execution rates (speeds) of evolution within and across objects. For instance, a walking action can be executed at variable speeds even by the same person. Thus, the statistical analysis of the spatiotemporal variability in samples of 4D surfaces requires efficient spatiotemporal registration of these samples. Spatial registration refers to the process of finding a one-to-one correspondence between two 3D surfaces of the same individual, captured at different points in time, or of different individuals. Temporal registration refers to the problem of finding the optimal time warping that aligns 4D surfaces, e.g., walking actions, performed at different execution rates.
We treat a 4D surface as a trajectory in a highdimensional nonlinear space. We then formulate the problem of analyzing the spatiotemporal variability of 4D surfaces as the statistical analysis of elastic trajectories, where elasticity corresponds to variations in the execution rates of the 4D surfaces. However, performing statistics on trajectories embedded in nonlinear spaces of high dimension is computationally expensive since it relies on nonlinear optimizations. Our core contribution in this paper is the mapping of the surfaces to the space of Square-Root Normal Fields (SRNF) [4], [5], which has a Euclidean structure (see Section 3.1-in particular, the L 2 metric in the space of SRNFs is equivalent to the partial elastic metric in the space of surfaces), meaning that the problem of analyzing 4D surfaces becomes the problem of analyzing trajectories, or curves, embedded in the Euclidean space of SRNFs.
This paper develops the building blocks that enable such analysis. We use these building blocks to compute statistical summaries, such as means and modes of variation of collections of 4D surfaces, and for the synthesis, either randomly or in a controlled fashion, of 4D surfaces. We demonstrate the utility and performance of the proposed framework using 4D facial surfaces from the VOCA dataset [6], 4D human body shapes from the Dynamic FAUST (DFAUST) dataset [7], and dressed 4D human body shapes from the CAPE dataset [8]. Our approach is, however, general and applies to all spherically-parameterized surfaces. In summary, the main contributions of this article are: • We represent 4D surfaces as trajectories in the space of SRNFs, which has a Euclidean structure (Section 3.1).
This key contribution enables the usage of standard computational tools for the analysis and modeling of 4D surfaces (Section 3.2). • We propose efficient algorithms for the spatiotemporal registration of 4D surfaces and the computation of geodesics between such 4D surfaces, even in the presence of large elastic deformations and significant variations in execution rates (Sections 3.2.2 and 3.2.3). • The framework does not explicitly or implicitly assume that the correspondences between the surfaces are given. It simultaneously solves for the spatial and temporal registrations, and for the 4D geodesics that are optimal under the proposed metrics. • We develop computational tools for (1) computing summary statistics of 4D surfaces and (2) synthesizing 4D surfaces from formal statistical models (Section 4). The remainder of this paper is organized as follows. We first discuss related work in Section 2. Section 3 describes the proposed mathematical framework. Section 4 discusses its application to various statistical analysis tasks. Section 5 presents the results and discusses the performance of the proposed framework. Section 6 summarizes the main findings of this paper and discusses future research directions.

RELATED WORK
We classify the state-of-the-art into two categories. Methods in the first category focus on cross-sectional shape data (Section 2.1). Methods in the second category focus on longitudinal shape data (Section 2.2).

Statistical models of cross-sectional 3D shape data
Modeling shape variability in 2D and 3D objects has been studied extensively in the literature. Early methods use Principal Component Analysis (PCA) to characterize the shape space of objects. Initially introduced for the analysis of planar shapes, the active shape model of Cootes et al. [9] has been extended to 3D faces [10] and 3D human bodies [3]; see [2] for a detailed survey. These methods represent 3D objects as discrete sets of landmarks, e.g., vertices, which are assumed to be in correspondence across a population of objects, and use standard Euclidean metrics for their comparison. Thus, they are limited to 3D objects that undergo small elastic deformations.
To handle large nonlinear variations, e.g., elastic deformations such as the bending and stretching observed in 3D human body shapes, Anguelov et al. [11] introduced SCAPE, which represents body shape and pose-dependent shape in terms of triangle deformations instead of vertex displacements. Hasler et al. [12] learn two linear models: one for pose and one for body shape. Loper et al. [13] introduce SMPL, a vertex-based linear model for human body shape and posedependent shape variation. This model has been extensively used in the literature for the analysis of the human body shape. It has also been adapted to other types of objects such as animals [14] and human body parts [15]. While these models can capture large variations, they exhibit two fundamental limitations. First, they rely on separate models for pose-independent shape, pose-dependent shape, and pose. Thus, they are limited to specific classes of objects, e.g., human bodies. Changing the target application, e.g., to animals [14] or infants [16], requires redefining the model. Second, they either assume a given registration between the surfaces of the 3D objects or solve for registration separately by matching vertices across the surfaces using an unrelated optimization criterion.
Recently, there has been a growing interest in analyzing variability in 3D shape collections using tools from differential and Riemannian geometry [4], [5], [17], [18], [19], [20], [21]; see [22] for a detailed survey. The work most relevant to ours is the Square-Root Normal Field (SRNF) representation introduced in [5]. In this work, parameterized surfaces are compared using a partial elastic Riemannian metric defined as a weighted sum of a bending term and a stretching term. More importantly, Jermyn et al. [5] show that by carefully choosing the weights of these two terms, the complex partial elastic metric reduces to the L 2 metric in the space of SRNFs. Thus, by treating shapes of objects as points in the SRNF space, a straight line between two points in this space is equivalent to the geodesic (or shortest) curve in the original space of surfaces under the partial elastic metric, and represents the optimal deformation between them. As a result, one can perform statistical analysis in the SRNF space using standard vector calculus, and then map the results back to the space of surfaces (for visualization), using the approach of Laga et al. [4]. Another important property of SRNFs is that both registration and optimal deformation (geodesic) are computed jointly, using the same partial elastic metric.
One of the fundamental problems in statistical shape analysis is correspondence and registration, see [23]. Past methods do not define a shape space and a metric that enable the computation of geodesics and statistics. Also, correspondence methods that are based on the intrinsic properties of surfaces, e.g., Generalized Multidimensional Scaling [24], spectral descriptors [25], or functional maps (which rely on the availability of descriptors) [26], [27], are primarily suited for surfaces that deform in an isometric manner. They also require landmarks to resolve symmetry ambiguities.

Statistical models for longitudinal shape data
As stated in [7], we live in a 4D world of 3D shapes in motion. With the availability of a variety of range sensing devices that can scan dynamic objects at high temporal frequency, there is a growing interest in capturing and modeling the 4D dynamics of objects [28], [29], [30]. For instance, Wand et al. [28] and Tevs et al. [30] propose methods to reconstruct the deforming geometry of time-varying point clouds. Li et al. [31] use sequences of 4D scans to learn a statistical 3D facial model. This model, referred to as FLAME, has been later used by Cudeiro et al. [6] to capture, learn, and synthesize 3D speaking styles. Bogo et al. [7] build a 4D human dataset by registering a 3D human template to sequences of 3D human scans performing various types of actions. These methods focus on the 3D reconstruction of deforming objects. The literature on the statistical analysis of their spatiotemporal variability is rather limited.
Early works focused on longitudinal 2D shape data. For instance, Anirudh et al. [32] represent the contour of planar shapes that evolve as trajectories on a Grassmann manifold. They then use the Transported Square-Root Vector Fields (TSRVF) representation for their rate-invariant analysis. This approach was later extended to the analysis of the trajectories of sparse features or landmarks measured on the surface of a deforming 3D object. For instance, Akhter et al. [33] introduced a bilinear spatiotemporal basis to model the spatiotemporal variability in 4D surfaces. The approach treats surfaces as N discrete landmarks and uses the L 2 metric and PCA in R 4N for their analysis. Thus, the approch is not suitable for highly articulated shapes that undergo large articulated and elastic motion (e.g., human bodies). The approach also assumes that the landmarks are in correspondence, both spatially and temporally.
Anirudh et al. [32] and Ben Amor et al. [34] represent human body actions using dynamic skeletons. By treating each skeleton, represented by a set of landmarks, as a highdimensional point on Kendall's shape space [35], motions become trajectories in a high-dimensional, Euclidean space. Thus, one can use the rich literature on the statistical analysis of high-dimensional curves [36] to build a framework for the statistical analysis of human motions and actions. This approach, however, has two fundamental limitations. First, the L 2 metric on Kendall's shape space is not suitable for large articulated motions. Second, skeletons and landmarks do not capture surface elasticity, and thus, cannot be used to model growth processes and surface deformations due to motion. While this can be addressed by using two separate models, one for shape and another for motion, it will fail to capture motion-dependent shape variations.
Using the LDDMM framework [37], Debavelaere et al. [38] and Bone et al. [39] represent a 4D surface as a flow of deformations of the 3D volume around each surface and then code deformations as geodesics on a Riemannian manifold. However, in general, natural deformations do not correspond to geodesics but can be arbitrary paths on the shape space. Also, deforming 3D volumes is expensive in terms of computation and memory requirements. Finally, this approach relies on manually specified landmarks to efficiently register the 3D volumes. Our approach, which can handle large articulated and elastic motions, works directly on surfaces, does not assume that deformations are (piecewise) geodesics, and does not rely on landmarks for the spatiotemporal registration.

MATHEMATICAL FRAMEWORK
We describe in this section the proposed mathematical framework for the spatiotemporal registration and comparison of 4D surfaces. Section 4 discusses its application to various statistical analysis tasks. A 4D surface, where the fourth dimension refers to time, is a 3D surface that evolves over time. Examples of such 4D surfaces include facial expressions (e.g., a smiling face), a human body shape performing an action such as walking or jumping, or an anatomical organ that evolves over time due to natural growth or disease progression. It can be represented as a path α(t), t ∈ [0, 1] such that α(0) and α(1) are the initial and final surfaces, respectively, and α(t), 0 < t < 1 are the intermediate surfaces.  surfaces within the same 4D surface and across different 4D surfaces come with arbitrary poses and registrations. Second, 4D surfaces can have different execution rates, e.g., two smiling expressions performed at different speeds. Thus, to compare and perform statistical analysis on samples of 4D surfaces, we first need to spatiotemporally register them.
We solve the spatiotemporal registration problem using tools from differential geometry. We treat surfaces as points in a Riemannian shape space equipped with an elastic metric that captures shape differences using bending and stretching energies. We then formulate the elastic registration problem, i.e., the problem of computing spatial correspondences, as that of finding the optimal rotation and reparameterization that align one surface onto another. This enables comparing and spatially registering surfaces, even in the presence of large elastic deformations (Section 3.1).
With this representation, a 4D surface becomes a timeparameterized trajectory in the above-referenced Riemannian shape space. Thus, the problem of analyzing 4D surfaces is reduced to the problem of analyzing curves. Similar to surfaces, we define a space of curves equipped with a Riemannian metric, which quantifies the amount of elastic deformation, or time warping, needed to align two 4D surfaces (Section 3.2). where Ω is a parameterization domain and s ∈ Ω is the parameter in this domain. The choice of Ω depends on the nature of the surfaces of interest. When dealing with closed surfaces of genus-0, Ω is a sphere, i.e., Ω = S 2 , and s = (u, v), with u ∈ [0, π] and v ∈ [0, 2π[, are the spherical coordinates. In practice, surfaces come as unregistered triangulated meshes. We then use the spherical parameterization algorithm of [40] to map them to a spherical domain.

The elastic shape space of surfaces
To remove shape-preserving transformations, we first translate the surfaces so that their center of mass is located at the origin, and then scale them to have unit surface area.
The space normalized surfaces, denoted by F, is called the preshape space.
Having removed translation and scale, we still need to account for rotations and reparameterizations. Those are handled algebraically. For any surface f ∈ F and for any rotation O ∈ SO(3), Of and f have equivalent shapes. Similarly, any reparameterization of a surface with an orientation-preserving diffeomorphism preserves its shape. Let Γ be the space of all orientation-preserving diffeomorphisms of Ω. Then, ∀ γ ∈ Γ, f and f • γ, i.e., the reparameterization of f with γ, have the same shape. (Here, • refers to the composition of two functions.) Note that reparameterizations provide dense correspondences across surfaces. If one wants to put a surface f 2 in correspondence with another surface f 1 , then we need to find a rotation O * and a reparameterization γ * such that O * (f 2 • γ * ) is as close as possible to f 1 . This is precisely the process of 3D surface registration. It is defined mathematically as: where d F is a measure of distances between surfaces in F.

SRNF representation of surfaces
For efficient registration and comparison of surfaces, the distance measure, or metric, d F should quantify interpretable shape differences, i.e., the amount of bending and stretching one needs to apply to one surface to deform it into another. It should also be simple enough to facilitate efficient computation of correspondences and geodesic paths. Jermyn et al. [5] introduced a partial elastic metric that measures differences between surfaces as a weighted sum of the amount of bending and stretching that one needs to apply to a surface to align it to another. In this approach, bending is measured in terms of changes in the orientation of the unit normal vectors, while stretching is measured in terms of changes in the infinitesimal surface areas. More importantly, Jermyn et al. [5] showed that by using a special representation of surfaces, called the Square-Root Normal Field (SRNF), the complex partial elastic metric reduces to the simple L 2 metric on the SRNF space. Definition 3.1 (SRNF maps). The SRNF map H(f ) of a surface f ∈ F is defined as the normal vector field of the surface scaled by the square-root of the local area around each surface point: where C h is the space of all SRNFs, n = ∂f ∂u × ∂f ∂v is the normal field to f and · 2 is the L 2 norm in R 3 .
The SNRF representation of surfaces has nice properties that make it suitable for the various analysis tasks at hand: • It is translation invariant. Also, the SRNF of a rotated surface is simply the rotation of the SRNF of that surface, i.e., H(Of ) = OH(f ).
the Jacobian of γ and | · | is its determinant. • Under the L 2 metric in the space of SRNFs, the action of Γ is by isometries, i.e., ∀ γ ∈ Γ and ∀ f 1 , • The space of SRNFs is a subset of L 2 (Ω, R 3 ). In addition, the L 2 metric in C h is equivalent to the partial elastic metric in the space of surfaces. As such, geodesics in F become straight lines in the SRNF space C h ; see Fig. 1. • Currently, there is no analytical expression for the inverse SRNF map, and in fact, the injectivity and surjectivity of the SRNF remain open questions. However, Laga et al. [4] showed that, for a given SRNF of a valid surface, one can always numerically estimate the original surface, up to translation [4]. The last three properties are critical for comparison and atlas construction of 4D surfaces. One can perform elastic registration of surfaces using the standard L 2 metric in the space of SRNFs, which is computationally very efficient compared to using the complex elastic metric in the space of surfaces (Section 3.1.2). Further, temporal evolutions of surfaces can be interpreted as curves in the Euclidean space of SRNFs, making them amenable to statistical analysis. Thus, the problem of constructing 4D atlases becomes the problem of statistical analysis of elastic curves in the space of SRNFs using standard statistical tools developed for Euclidean spaces. After analysis, the results can be mapped back to the original space of surfaces using efficient SRNF inversion procedures [4] (Section 3.2).

Spatial elastic registration of surfaces
Under the SRNF representation, the elastic registration problem in Eqn. (2) can be reformulated using the L 2 metric on C h , the space of SRNFs, instead of the complex partial elastic metric on the preshape space F. Let f 1 and f 2 be two surfaces in the preshape space F, and h 1 and h 2 their SRNFs. Then, the rotation and reparameterization that optimally register f 2 to f 1 are given by: where * is the composition operator between an SRNF and a diffeomorphism γ ∈ Γ. This joint optimization over SO (3) and Γ can be solved by alternating, until convergence, between the two marginal optimizations (this is allowed due to the product structure of SO(3) × Γ) [41]: • Assuming a fixed parameterization, solve for the optimal rotation using Procrustes analysis via Singular Value Decomposition (SVD). • Assuming a fixed rotation, solve for the optimal reparameterization using a gradient descent algorithm. To solve for the optimal reparameterization, we represent the space Γ of diffeomorphisms γ, which are functions on the sphere, using spherical harmonic basis {B i } i=1,...,n . This way, every γ ∈ Γ can be written as a weighted sum of the harmonic basis: γ = n i=1 a i B i . Thus, the search for the optimal diffeomorphism is reduced to the search for the optimal weights {a i }. This procedure is described in detail in Section A.1 of the Supplementary Material.
Although this approach converges to a local optimum, in practice, it can be used in a very efficient way. Since a 4D surface α is a sequence of discrete realizations α(t i ) ∈ F, i = 0, · · · , n, with t 0 = 0 and t n = 1, one can perform the elastic registration sequentially. H(α(t)). Also, let α 0 be a reference surface randomly chosen from the population of surfaces being analyzed, and β 0 its SRNF map (α 0 can be, for example, α(0)). Then, 1) Find O 0 ∈ SO(3) and γ 0 ∈ Γ that register β(t 0 ) (the start point of SRNF path) to the SRNF of the reference surface β 0 , by solving Eqn. (4). 2) For i = 0, . . . , n, The first step ensures that, when given a collection of 4D surfaces α j , j = 1, · · · , n, the surfaces α j (0), j = 1, · · · , n are registered to each other. The subsequent steps ensure that ∀t, α j (t) is registered to α j (0). This sequential approach is efficient since, in general, elastic deformations between two consecutive frames in a 4D surface are relatively small. In what follows, we assume that all surfaces within a 4D surface and across 4D surfaces are correctly registered, i.e., they have been normalized for translation and scale, and optimally rotated and reparameterized using the approach described in this section.

The shape space of 4D surfaces
Under the setup of Section 3.1, a 4D surface becomes a curve α : [0, 1] → F. However, since F is endowed with the partial elastic metric, which is non-Euclidean, we propose to further map the 4D surfaces to the SRNF space, which has a Euclidean structure. Thus, 4D surfaces become curves of the form β : [0, 1] → C h . With this representation, all statistical tasks are carried out in C h under the L 2 metric with results mapped back to the space of surfaces F for visualization.

TSRVF representation of SRNF trajectories
Let α be a curve (path) in F and β its image under the SRNF map, i.e., ∀ t ∈ [0, 1], β(t) = H(α(t)); β is also a curve, but in C h . Let M F be the space of all paths in F, and M h be the space of all paths in C h : To temporally register, compare, and summarize samples of such curves, we need to define an appropriate metric on M F , or M h , that is invariant to the rate (or speed) of the 4D surfaces. For example, facial expressions that only differ in the rate of their execution should be deemed equivalent under such a metric. Let Ξ = {ξ : [0, 1] → [0, 1] such that 0 <ξ < ∞, ξ(0) = 0 and ξ(1) = 1} denote all reparameterizations of the temporal domain [0, 1]. Here, ξ = dξ dt . Then, for any ξ ∈ Ξ, β • ξ and β only differ in the rate of execution and are thus equivalent. The function ξ is often referred to as a time warping of the domain [0, 1]. Temporal registration of two 4D surfaces α 1 and α 2 then becomes the problem of registering their corresponding curves β 1 and β 2 in C h . This requires solving for an optimal reparameterization ξ * ∈ Ξ that minimizes an appropriate distance d(·, ·) between β 1 and β 2 : The optimization over Ξ in Eqn. (5) [36] for analyzing shapes of curves in R n , n ≥ 2. The associated elastic metric defined therein is invariant to reparameterizations of curves, and quantifies the amount of bending and stretching of the curves in terms of changes in the orientations and lengths of their tangent vectors, respectively. However, instead of directly working with such a complex elastic metric, Su et al. [42] introduced the Transported Square-Root Vector Field (TSRVF) representation, which simplifies the complex elastic metric into the simple L 2 metric.

Definition 3.2 (Transported Square-Root Vector Field (TSRVF)).
For any smooth trajectory β ∈ M h , the transported square-root vector field (TSRVF) is a parallel transport of a scaled velocity vector field of β to a reference point c ∈ C h according to whereβ = ∂β ∂t is the tangent vector field on β and · is the L 2 metric on C h .
Note that the parallel transportβ(t)| β(t)→c is performed along the geodesic from β(t) to c. The TSRVF representation has nice properties that facilitate efficient temporal registration of 4D surfaces. Let β 1 and β 2 be two trajectories on M h , and let q 1 and q 2 be their respective TSRVFs. Then, • The elastic metric on the space of trajectories M h reduces to the L 2 metric on the space of their TSRVFs. Thus, one can use the L 2 metric to compare two paths: where · is again the L 2 norm on C h .
• Under the L 2 metric, the action of the reparameterization group Ξ on the space of TSRVFs is by isometries, i.e., q 1 − q 2 = (q 1 ξ) − (q 2 ξ) , ∀ ξ ∈ Ξ. • Given a TSRVF q and an initial trajectory point, one can reconstruct the corresponding path β, such that Q(β) = q, by solving an ordinary differential equation [42]. As we will see next, these properties enable efficient temporal registration of trajectories and subsequent rate-invariant statistical analysis. In what follows, let Q denote the space of TSRVFs equipped with the L 2 metric defined in Eqn. 7.

Temporal registration
Under the TSRVF representation, the temporal registration problem in Eqn. (5), which involved optimization over Ξ, can now be reformulated using the standard L 2 metric on the space of TSRVFs: This problem can be solved efficiently using a Dynamic Programming algorithm [42], [43]. Then, the rate-invariant distance between two trajectories is given by:

Geodesics between 4D surfaces
We now summarize our entire pipeline. Let α 1 , α 2 ∈ M F be two 4D surfaces. The pipeline to spatiotemporally register them and compute the geodesic path between them can be summarized as follows.
(1) Proposed spatial registration. The goal is to spatially register the surfaces in α 1 and α 2 to the same reference surface, which can be any arbitrary surface. For simplicity, we choose it to be α 1 (0), the first surface in the sequence α 1 . The spatial registration can then be performed in two steps.
(2) Proposed temporal alignment. β 1 and β 2 are elements of M h . We perform temporal registration in three steps.
(3) Proposed geodesic computation. Since Q is Euclidean, the geodesic path Λ q between q 1 and q * 2 is a straight line: Next, we map Λ q back to M h using the inverse TSRVF map, i.e., ∀ τ, Λ β (τ ) = Q −1 (Λ q (τ )). The computation of the inverse mapping uses the starting point on the trajectory and has a closed-form solution, making it computationally efficient. This is described in detail in [42]. After applying the inverse mapping to the entire geodesic path, we have , a geodesic path between the SRNF curves β 1 and β 2 .
(4) Visualization. To visualize geodesic paths between 4D surfaces (and not their SRNFs), we need to further map all SRNFs on the trajectory Λ β (τ ) to their corresponding surfaces in F. This is done using the inverse SRNF map, Unlike the TSRVF map whose inverse can be computed analytically, inversion of the SRNF map, whose injectivity and surjectivity are yet to be determined, has to be accomplished numerically using the approach of Laga et al. [4]. Now, Λ is the geodesic path between the 4D surfaces α 1 and α * 2 , i.e., Λ(0) = α 1 , Λ(1) = α * 2 , and α τ = Λ(τ ) is a 4D surface at time τ along the geodesic path. Fig. 3 shows an example of a geodesic between two 4D surfaces representing talking faces. Each row corresponds to one 4D surface. The top is the source, the bottom row is the target, after optimal spatiotemporal registration, and the highlighted row in the middle corresponds to the mean 4D surface. The temporal registration is further illustrated in Fig. 4, where we show the source 4D surface, the target 4D surface before the spatiotemporal registration, and the target 4D surface after the spatiotemporal registration. Section 5 provides more examples of geodesics computed between various types of 4D surfaces.

STATISTICAL ANALYSIS OF 4D SURFACES
Now that we have devised all of the required mathematical tools for comparing 4D surfaces, we shift our focus to how these tools can be used to build a 4D atlas from a sample of 4D surfaces. Let α 1 , · · · , α n be a set of 4D surfaces and β 1 , · · · , β n be their corresponding trajectories in C h . We assume that all of the surfaces, and their corresponding SRNFs, have been spatially registered to a common reference; see Section 3.1.2. We proceed to map all of the 4D surfaces to their corresponding TSRVFs, hereinafter denoted by q 1 , · · · , q n , and compute statistics in the space of TSRVFs. As before, all results are mapped at the end to the original space of surfaces F for visualization. We will use this framework to compute means and modes of variation, and to synthesize novel 4D surfaces by sampling from probability distributions fitted to a set of exemplar 4D surfaces.
Mean of 4D surfaces. Intuitively, the mean of a collection of 4D surfaces is the 4D surface that is as close as possible to all of the 4D surfaces in the collection, under the specified distance measure (or metric). It is also called Karcher mean and is defined as the 4D surface that minimizes the sum of squared distances to all of the 4D surfaces in the given sample. In other words, we seek to solve the following optimization problem, defined in the space of TSRVFs: Algorithm 3 in the Supplementary Material describes the proposed procedure for solving this optimisation problem. It outputs the TSRVF Karcher meanq, the optimal temporal reparameterizations ξ * i , i = 1, . . . , n, and the temporally registered TSRVFs q * i = q i ξ * i ; again, for simplified notation we simply use ξ i and q i to denote the optimal temporal reparameterizations and the temporally registered TSRVFs. The mean 4D surface can be obtained by TSRVF inversion of the mean TSRVF followed by SRNF inversion [4].
Principal directions of variation. Since the TSRVF space is Euclidean, the principal directions of variation can also be computed in a standard way, i.e., using the Singular Value Decomposition (SVD) of the covariance matrix. In the following, we assume that the TSRVFs are sampled using a finite set of points and appropriately vectorized. Let be the covariance matrix of the input sample, σ i , i = 1, . . . , k its k−leading eigenvalues, and Σ i , i = 1, . . . , k the corresponding eigenvectors. Then, one can explore the variability in the i−th principal direction using q τ =q + τ √ σ i Σ i , where τ ∈ R. To visualize this principal direction of variation, we again use TSRVF inversion followed by SRNF inversion to compute the 4D surface α τ , such that Q(H(α τ )) =q + τ √ σ i Σ i , τ ∈ R.
Random 4D surface synthesis. Given the mean and the k-leading principal directions of variation, any TSRVF q of a 4D surface α can be approximately represented, in a parameterized form, as: Thus, to generate a random TSRVF, we only need to generate k random values τ i ∈ R and plug them into Eqn. (12). Then, to compute the corresponding random 4D surface, we apply the inverse TSRVF map followed by the inverse SRNF map. Also, by enforcing each τ i to be within a certain range, e.g., [−1, 1], we can ensure that the generated random 4D surfaces are similar to the given sample and thus plausible. This procedure allows the generation of new random 4D surfaces. However, it does not offer any control over the generation process, which is entirely random. In many situations, we would like to control this process using a set of parameters. For instance, when dealing with 4D facial expressions, these parameters can be the degree of sadness, facial dimensions, etc. This type of control can be imple-mented using regression in the TSRVF space, a problem that we plan to explore in the future.

RESULTS
This section demonstrates some results of the proposed framework and evaluates its performance. Section 5.1 focuses on spatiotemporal registration and geodesic computation between 4D surfaces. Section 5.2 focuses on the computation of the statistical summaries while Section 5.3 focuses on the random synthesis of 4D surfaces. Finally, Section 5.4 provides an ablation study to demonstrate the importance of each component of the proposed framework. We use three datasets: (1) VOCA [6], which contains 4D facial scans, captured at 60fps, of 12 subjects speaking various sentences; (2) MPI DFAUST [7], which contains high-resolution 4D scans of 10 human subjects in motion, captured at 60fps; in total, the dataset contains 129 dynamic performances; and (3) MPI 4D CAPE [8], which contains high-resolution 4D scans of 10 male and 5 female subjects in clothing. These datasets come as polygonal meshes with consistent triangulation and given registration across the meshes. We spherically parameterize them using Kurtek et al.'s implementation [17] of the spherical parameterization approach of [40]. We also apply randomly generated spatial diffeomorphisms to simulate non-registered surfaces. Our framework does not use, either explicitly or implicitly, the provided vertex-wise correspondences.

Spatiotemporal registration and 4D geodesics
We consider pairs of 4D facial expressions from the VOCA dataset. We first reparameterize each 4D surface using (c) Target after spatio-temporal registration.   randomly generated time-warping functions to simulate facial expressions performed at different execution rates. We then apply the framework proposed in this paper to spatiotemporally register them. Fig. 4 shows an example of such spatiotemporal registration. In this example, we show (a) the source 4D surface, (b) the target 4D surface before spatiotemporal registration, and (c) the target 4D surface after spatiotemporally registering it to the source. We also highlight some key frames. As one can see, the original 4D surfaces differ significantly in their execution rates. The spatiotemporal registration framework synchronizes the source and target expressions, thus enabling their comparison, interpolation and averaging. We also perform a similar experiment on the human body shapes of the DFAUST [7] and CAPE [8] datasets; see Figs. 5, 6, and 12(a)-(c). Compared to faces, human body shapes are very challenging to analyze since they perform complex articulated motions, which result in large bending and stretching of their surfaces. 4D geodesics. Fig. 7 shows geodesics between 4D human body shapes. In this example, both the source and target perform a punching action but at different rates. We show the geodesic before and after the spatiotemporal registration of the target 4D surface onto the source. Unlike the jumping (a) Before registration. The highlighted middle row corresponds to the mean 4D surface.
(b) After registration. The highlighted middle row corresponds to the mean 4D surface.   Fig. 5, the left hand of the target surface does not perform the same action as the left hand of the source surface. Nevertheless, our framework can bring these two 4D surfaces as close as possible to each other. The supplementary material includes a video of the sequence and more examples of geodesics between 4D faces (from VOCA), 4D human bodies (from DFAUST), and clothed 4D human bodies (from CAPE).
Evaluation of the spatial registration. We quantitatively evaluate the accuracy of the proposed spatial registration method and compare it to the latest functional map-based techniques such as MapTree [44] and Fast Sinkhorn filters [45]. Similar to our method, functional maps operate on clean manifold surfaces and do not use any form of (deep) learning. We take the surfaces of COMA [46], CAPE [8], and DFAUST [7] datasets, which come with ground-truth correspondences, and apply random spatial diffeomorphisms to them to simulate unregistered surfaces. We then compute the correspondence map between each pair of surfaces in the dataset. We measure the spatial registration error in terms of the geodesic distance, on the parameterization domain, between the ground-truth and the computed correspondence. Table 1 reports the mean, standard deviation, and median of the registration errors computed across all the models in each data set. As one can see, the proposed SRNF-based spatial registration method significantly outperforms stateof-the-art algorithms [44], [45]. We refer the reader to the supplementary material, which includes visual examples of pairs of surfaces before and after spatial registration. It also includes additional spatial registration experiments using the quadruped animal data set of Kulkarni et al. [47].
An important property of the proposed approach is that it finds a one-to-one mapping between the source and target surfaces. This is not the case with functional map-based methods, which can map a point on the source to multiple points on the target. Thus, they cannot be used to compute geodesics, interpolations, and statistical summaries.
Evaluation of the temporal registration. We use the FLAME fitting framework [31] to generate random 4D facial surfaces with known ground-truth temporal registrations. We first generate two random SMPL parameters, each corresponding to a 3D surface, and then linearly interpolate them to simulate a deforming 4D facial surface. Let α i , i ∈ {1, . . . , 100} be the resulting 4D surfaces. Next, we generate 100 random temporal diffeomorphisms ξ i ; see Fig. 24-(b) of the supplementary material. These will be used to simulate 4D facial surfaces that have different execution rates. Now, given a pair of 4D surfaces α i and α j , and for each pair of temporal diffeomosphims ξ k and ξ l , α i •ξ k and α j •ξ l can be seen as a pair of 4D surfaces with different execution rates. Next, we compute, using the proposed framework, the optimal diffeomorphism ξ * i,j that aligns To quantitatively evaluate the quality of the computed temporal registration, we compute: • The distance between the perfectly registered 4D surfaces α i and α j ; see the green curves in Fig. 8. • The distance between α i •ξ k and α j •ξ l before temporal registration (Fig. 8-(a)), and the distance between α * i and α j , i.e., the distance between the two 4D surfaces after their temporal registration (Fig. 8-(b)). Ideally, the  latter should be significantly smaller than the former. It should also be as close as possible to the distance between the perfectly registered 4D surfaces α i and α j . Figs. 8-(a) and (b) report statistics of these errors for each pair of 4D surfaces, but aggregated over the 100 random diffeomorphisms. As one can see, the median distance between the 4D surfaces after registration (Fig. 8-(b)) is significantly lower than the one before registration (Fig. 8-(a)). The former is significantly closer to the ground-truth (shown with green curves in Fig. 8) than the latter.

Computation time.
Our approach runs entirely on a CPU. The Matlab implementation of the spatiotemporal registration process takes less than 31.43 seconds on 4.2 GHz Intel Core i7 with 32 GB of RAM. The visualization, which is needed when computing geodesics, means, and directions of variation, and when synthesizing random 4D surfaces, relies on the inversion of the SRNF maps. It requires 6 seconds per frame and a total of 30 minutes for the 300 temporal frames used in this paper. All the experiments were performed using a high spherical resolution (256 × 256).

Summary statistics
We now consider a set of unregistered 4D surfaces and compute their mean and principal directions of variation. Fig. 9 shows the 4D mean (highlighted with a blue box) computed from six 4D human shapes performing different types of actions. The figure also shows the input 4D surfaces after their spatiotemporal registration; see the video in the supplementary material for an illustration of the input 4D surfaces before spatiotemporal registration. Despite the large articulated motion, the large differences in the type of actions and the significant differences in the execution rates of the 4D surfaces, our framework is able to co-register them and generate a plausible average 4D surface. Figs. 10-(a) and (b) show the mean and the first two principal modes of variation computed on input 4D facial surfaces. As we can see, the computed mean also captures the main features of the dataset. The principal directions of variation further capture relevant variability in the given data. The supplementary material includes the input 4D surfaces prior to their registration. Please also refer to the videos in the supplementary material for additional results.  1 Comparison of the performance and accuracy of the proposed spatial registration with state-of-the-art techniques such as MAP Tree [44] and Fast Sinkhorn filters [45], which are based on functional maps [27].
COMA [46] CAPE [8] DFAUST [ 9. Co-registration of multiple 4D surfaces. In this example, we consider four human body shapes performing a jumping action (first four rows) and two others performing a punching action (rows 5 and 6). Here, we show the spatiotemporally co-registered 4D surfaces and the 4D mean computed using the proposed algorithm. The supplementary material includes the input 4D surfaces before their spatiotemporal registration. It also includes the full video sequences. The surfaces are from the DFAUST dataset. (c) Five randomly synthesized 4D faces. Each row corresponds to one 4D surface sampled between −1.5 to 1.5 times the standard deviation along the principal direction of variation. We refer the reader to the supplementary material, which shows the input 4D faces (before their spatiotemporal registration). It also includes more modes of variation and random samples, as well as the complete video sequences.

4D surface synthesis
(a) Without SRNF. (b) With SRNF. Fig. 11. The mean shape between the left and right surfaces, computed (a) in the original surface space without the SRNF representation, and (b) in the SRNF space. In (a), the mean shape is distorted due to the use of the L 2 metric in the originals pace of surfaces. In both cases, the spatial registration is performed using the proposed registration method.

Ablation study
We undertake an ablation study to demonstrate the importance of each component of the proposed framework.

Importance of the SRNF representation.
In this experiment (Fig. 11), we take two challenging 3D human body models, which undergo a large articulated motion, perform their spatial registration using the proposed SRNF approach, and then compute their statistical mean using the L 2 metric in the original surface space (Fig. 11-(a)) and the L 2 metric in the SRNF space ( Fig. 11-(b)). Fig. 11-(a) shows that the articulated parts of the mean computed in the original surface space unnaturally shrink. This is predictable since, under the L 2 metric, geodesics correspond to straight lines. However, in the SRNF space, the L 2 metric is equivalent to the optimal bending and stretching of the surfaces, and thus the computed mean is more natural; see Fig. 11-(b). Next, we consider two full 4D surfaces of deforming human body shapes (Fig. 12-(a) and (b)) and show their mean 4D surface obtained: (1) with the SRNF representation, with spatial registration, and without temporal registration ( Fig. 12-(d)), (2) with the SRNF representation, with spatial registration, and with temporal registration (Fig. 12-(e)), (3) without SRNF representation, with spatial registration, and without temporal registration (Fig. 12-(f)), and (4) without SRNF representation, with spatial registration, and with temporal registration (Fig. 12-(g)). The last two cases are equivalent to a linear interpolation in the original surface space, after spatial registration. In all cases, we perform the spatial registration using the SRNF framework.
First, we can see that the temporally-aligned target 4D surface (Fig. 12-(c)) is very close to the source 4D surface in Fig. 12-(a). We observe that the right hands became fully synchronized. As such, the mean 4D surface obtained after temporal registration (Fig. 12-(e)) is fully synchronised with the source and the aligned target, unlike the mean 4D surface in Fig. 12-(d), which has been obtained without temporal registration. Second, in the mean 4D surfaces obtained without the SRNF framework (Figs. 12-(f) and (g)), we can observe that the parts that undergo large articulated motion (e.g., the arms) unnaturally shrink. This shrinkage is stronger in Fig. 12-(f) since the mean is obtained without  temporal registration. The bottom row of Fig. 12 shows a zoom on the time frame highlighted in Figs. 12-(a) to (g). Finally, we quantitatively evaluate the importance of the SRNF representation by comparing the expressive power of PCA on the original space of spatially registered surfaces and on the space of SRNFs. We randomly divide a data set equally into a training set and a testing set. We then fit a PCA model to the training set (both in the original space and in the space of SRNFs), project each model in the test set onto the PCA model, reconstruct it, and measure the error between the original and the reconstructed models. We perform 5-fold cross-validation. Table 2 reports the mean, the median, and the standard deviation of the error over the test set and averaged over the five runs. As one can see, PCA on the SRNF space has a significantly lower reconstruction error than PCA in the original space of surfaces. This demonstrates that the former is more suitable to characterize variability in the shape of 3D objects that bend and stretch.

Importance of the TSRVF representation for 4D surfaces.
We perform a similar ablation study, but on 4D surfaces, to compare the expressive power of PCA on the original space of curves and on the space of TSRVFs. Table 3 shows that PCA error on the TSRVF space is lower than the error in the original space. This demonstrates that the former is more suitable to characterize variability in 4D surfaces.

CONCLUSION
We have proposed a new framework for the statistical analysis of longitudinal 3D shape data (or 4D surfaces), i.e., surfaces that deform over time, e.g., 3D human body shapes performing actions at different execution rates or 3D human faces pronouncing sentences at different speeds. Unlike traditional techniques, which only consider how features such as landmarks or measurements vary over time, the proposed framework considers the deformation of the entire surface of a 3D object. Our key contribution is in representing 4D surfaces as trajectories in the space of SRNFs, and the use of Transported Square-Root Vector Fields to analyze such trajectories statistically. The proposed framework can spatiotemporally register 4D surfaces, even in the presence of large elastic deformations and significant (c) Target (with SRNF, with spatial registration, with temporal registration).
(g) Mean (without SRNF, with spatial registration, with temporal registration). variations in the execution rates. It is also able to compute geodesics and summary statistics, which in turn can be used to synthesize new, unseen 4D surfaces randomly.
Although we have demonstrated the proposed 4D analysis framework on human body shapes and facial surfaces, it is general and can be applied to other types of surfaces. Our current implementation is limited to surfaces that are homeomorphic to a sphere, but we plan to extend the framework to higher-genus surfaces by exploring different parameterization methods, including mesh-based representations [48]. The approach uses the numerical SRNF inversion procedure of Laga et al. [4], which is sometimes not accurate near the poles of the parameterization domain; we plan to improve its performance via the use of charts.
The framework deals with surfaces that bend and stretch but do not change in topology; as such, it does not apply to tree-like shapes, e.g., botanical trees or roots. However, the concept of representing deformations as trajectories in a shape space also applies to tree-shape spaces such as those used in [49], [50]. The framework is also limited to clean surfaces that are free of geometric and topological noise; as such, the proposed spatial registration method cannot be used to register partial scans to each other, or to register a template to partial scans. However, similar to statistical shape models such as 3D morphable models and SMPL, the proposed 4D atlas can be used as a prior; in conjunction with a data generation model, it can thereby be applied to noisy or partial data, e.g., to reconstruct entire 4D surfaces. The statistical analysis presented in this paper assumes that the population of the 4D surfaces follows a Gaussian distribution. We plan to extend the approach to other types of distributions, e.g., Gaussian Mixture Models, which can represent populations that follow multimodal distributions The proposed framework has various applications in computer vision, graphics, biology, and medicine. In computer vision, collecting large animations to train deep neural networks, e.g., for 3D reconstruction or action recognition [51], [52], is complex and time-consuming. Our framework can contribute to solving this problem by automatically synthesizing new samples from a small dataset. Our current implementation has only considered random synthesis, which is very important for populating virtual environments and for data augmentation to train deep learning networks. However, there are many situations where we would like to control this process using a set of parameters. For instance, when dealing with 4D facial expressions, these parameters can be the degree of sadness, facial dimensions, etc. This type of control can be implemented efficiently using regression in the TSRVF space. Finally, our framework can be used to statistically analyze how anatomical organs deform due to growth or disease progression.

APPENDIX A SPATIAL REGISTRATION
This appendix discusses the implementation details of the spatial registration and provides additional results on various complex 3D shapes.

A.1 Algorithm
In this section, we provide more details on the procedure used to solve the spatial registration problem in Eqn. (4) of the main manuscript. In our formulation, we consider spatial registration of a surface f 2 to a surface f 1 as the problem of finding the rotation and reparameterization that bring the SRNF h 2 of f 2 as close as possible to the SRNF h 1 of f 1 . We measure closeness using the L 2 metric in the SRNF space, which is equivalent to the partial elastic metric in the original space of surfaces. That is, we seek to solve the following optimization problem: where * is the composition operator between an SRNF and a diffeomorphism γ ∈ Γ. This joint optimization over SO (3) and Γ can be solved by alternating, until convergence, between the two marginal optimizations: • Assuming a fixed parameterization, solve for the optimal rotation using Procrustes analysis via Singular Value Decomposition (SVD). • Assuming a fixed rotation, solve for the optimal reparameterization using a gradient descent algorithm. Below we detail each of these steps.
Optimization over the rotation group. For a fixed γ ∈ Γ, the minimization over SO(3) can be performed directly using Procrustes analysis.
in Eqn. (13). (Here, J[γ] is the determinant of the Jacobian of γ.) The optimal rotation matrix can then be obtained using Algorithm 1. Optimization over the space of spatial diffeomorphisms.
We use a gradient descent approach to solve the optimization problem over Γ. Although this approach has an obvious limitation of converging to a local solution, it is still general enough to be applicable to general surfaces and provides plausible results. In order to specify the gradient, we focus on the current iteration, and define the reduced cost function E reg : Γ → R ≥0 : whereh 2 = (h 2 , γ 0 ), γ 0 and γ denote the current and the incremental reparameterizations respectively, and φ : derivative of E reg at γ id , in the direction of b, is given by where φ * is the differential of φ and ·, · is the L 2 inner product. If we have an orthonormal basis for T γ id (Γ), we can specify the full gradient of E reg with respect to γ, which is an element of T γ id (Γ) given by This linear combination of the orthonormal basis elements of T γ id (Γ) provides the incremental update ofh 2 in the orbit [h 2 ].
This leaves two remaining issues: (1) the specification of an orthonormal basis of T γ id (Γ), and (2) an expression for φ * . In our implementation, we use gradients of the spherical harmonic basis to define the orthonormal basis of T γ id (Γ); see Section 3.2.2 of [21]. The expression of φ * is also derived in Section 3.2.2 of [21]. With this, the optimization over the space of diffeomorphisms can be performed using Algorithm 2.

A.2 Additional spatial registration results
In this section, we provide more results to show the importance of each component of the proposed framework and to demonstrate the efficiency of the proposed spatial registration.
Importance of the SRNF representation. Fig. 13 illustrates, with a practical example, the importance of the SRNF representation for computing geodesics and statistics between 3D surfaces that undergo complex articulated and elastic motion. In this example, we use two human body shapes, perform their spatial registration using the proposed SRNF framework, and then compute their mean using the L 2 metric in the original space of surfaces (Fig. 13-(a)) and using the L 2 metric in the space of SRNFs (Fig. 13-(b)). In the former case, we can see that the parts that undergo large articulated motion (arms in this example) significantly shrink. This is predictible since a geodesic under the L 2 metric in the original space of surfaces corresponds to straight lines.
In the SRNF space, the arms naturally bend since the L 2 metric in this space is equivalent to the optimal bending and stretching of surfaces; see Fig. 13-(b).
Human bodies and faces. Fig. 14 shows three dressed 3D human models, from the CAPE dataset, before registration (Fig. 14-(a)) and after registration (Fig. 14-(b)) using the proposed SRNF-based approach. In both cases, the correspondences are color-coded. CAPE dataset is particularly challenging since it contains dressed 3D human bodies. Despite the significant differences in the initial parameterizations (see Fig. 14-(a)) and in the shape of the bodies and clothes, the proposed approach is able to find plausible correspondences between the surfaces (see Fig. 14-(b)). Figs. 15 and 16 show more examples using, respectively, the DFAUST human body dataset [7] and the COMA human faces [46]. In both cases, we show the surface before and after registration with correspondences color-coded.
Surfaces with missing parts. Figs. 17 and 18 show examples of spatial registrations between surfaces that have missing parts. Since our framework uses an elastic metric, which allows bending and stretching of surfaces, it is able to find one-to-one correspondences even under these challenging cases. Note that, similar to functional maps, the framework requires closed manifold surfaces and thus it is not able to handle (noisy) partial scans.
Quadruped animals. We test the proposed SRNF-based spatial registration on the quadruped animal dataset of [47].     15. Illustration of the spatial correspondences, on some samples from the DFAUST dataset [7], before and after applying the proposed spatial registration algorithm. Correspondences are color-coded, i.e., points that are in correspondence are rendered with the same color.
(b) After spatial registration.  16. Illustration of the spatial correspondences, on samples from the COMA dataset [46], before and after applying the proposed spatial registration algorithm. Correspondences are color-coded, i.e., points that are in correspondence are rendered with the same color.
color-coded). We also compare, visually, our results to the registrations obtained using the state-of-the-art functional map-based methods such as MapTree [44]; see Figs. 20-(b), 19-(b), and 21-(b). As one can see, the mappings obtained using MapTree are often not correct in many regions of the shapes. More importantly, the maps obtained using MapTree are not one-to-one since a point on the source shape can be mapped to multiple points on the target. Our approach finds one-to-one correspondences and thus is more suitable for computing geodesics and shape statistics.

A.3 Karcher mean of surfaces
Algorithm 3 summarizes the Karcher mean algorithm used to compute the mean 4D surface of a collections of 4D surfaces; see Section 4 of the main manuscript. We refer the reader to the main manuscript for the definition of each symbol. Fig. 17. Examples of spatial correspondences between objects with missing parts. Correspondences are color-coded. The top row shows one view of the 3D models while the bottom row shows another view of the same models. In both cases, we show the smooth rendering as well as the wireframe rendering of the surfaces so that the reader can appreciate the quality of the correspondences. Fig. 18. Examples of spatial correspondences between 3D hands with one finger missing. Correspondences are color-coded, i.e., points that are in correspondence are rendered with the same color. We show the smooth rendering as well as the wireframe rendering so that the reader can appreciate the quality of the correspondences. • q i ← q * i and ξ i ← ξ i • ξ * i . 5: Update the Karcher meanq = 1 n n i=1 q i . 6: If the change in q is large then go to Step 4. 7: Findᾱ by TSRVF inversion ofq followed by SRNF inversion. 8: Returnᾱ, ξ i , q i , ı = 1, . . . , n.
(a) Correspondences using the SRNF framework.
(b) Correspondences using MAPTree [44]. Fig. 19. Examples of spatial correspondences between pairs of quadruped animals from the ACSM animal dataset [47]. Correspondences are color-coded. We also show in (a) the wireframe rendering of the surfaces. Correspondences are color-coded, i.e., points that are in correspondence are rendered with the same color. Correspondences are color-coded, i.e., points that are in correspondence are rendered with the same color. View 1.
View 1 with tessellation. View 2 with tessellation. (a) Correspondences using the SRNF framework.
View 2. (b) Correspondences using MAPTree [44]. Fig. 20. Examples of spatial correspondences between quadruped animals from the ACSM animal dataset [47]. Correspondences are color-coded. We also show in (a) the tessellation grid of the surfaces. Correspondences are color-coded, i.e., points that are in correspondence are rendered with the same color. View 1.
View 1 with tessellation. View 2 with tessellation. (a) Correspondences using the SRNF framework.
View 2. (b) Correspondences using MAPTree [44]. Fig. 21. Examples of spatial correspondences between quadruped animals from the ACSM animal dataset [47]. Correspondences are color-coded. We also show in (a) the tessellation grid of the surfaces. Correspondences are color-coded, i.e., points that are in correspondence are rendered with the same color.   22. We consider twelve 4D surfaces from the VOCA dataset, perturb each of them using 100 random temporal diffeomorphisms, and then measure the distance between each original 4D surface and its perturbed versions. We show the boxplots of distances (a) before the temporal registration and (b) after the temporal registration using the proposed framework for each of the twelve 4D surfaces. In each plot, the X axis corresponds to the 4D surfaces and the Y axis to the temporal alignment error. The lower the error is the better is the alignment.

APPENDIX B TEMPORAL REGISTRATION
In this appendix, we provide additional results and additional quantitative evaluations to further demonstrate the quality and performance of the temporal registration framework proposed in this paper.

B.1 Evaluation of the temporal registration
To quantitatively evaluate the performance of the proposed temporal registration, we first consider a 4D surface α and parameterize it with randomly generated temporal diffeomorphims ξ i , i ∈ {1, · · · , 100}, to obtain new 4D surfaces α i = α • ξ i . Next, we compute the distances d i , i = 1, . . . , 100, between α and α i , using the expression given in Eqn. (7) of the main manuscript, before temporal registration. Fig. 22-(a) shows the box plots of the resulting distances for 12 sequences from the VOCA dataset before the temporal registration. Ideally, ∀i, d i = 0. However, since the 4D surfaces are not temporally registered, the distances are large in most of the cases. In Fig. 23, we perform the same experiment but this time on five 4D surfaces simulated using the FLAME framework; see Section 5.1 of the main manuscript for a detailed description of how these 4D surfaces have been generated. Fig. 23 reports the temporal registration errors before and after applying the proposed framework for each of the five 4D surfaces. As one can see, our approach is able to properly align the perturbed 4D surfaces to their original 4D surfaces.
Next, we compute the optimal temporal registration, i.e., for every 4D surface α i , we find the optimal diffeomorphism ξ * i that aligns α i to α. Let α * i = α i • ξ * i . Fig. 23-(b) shows boxplots of the distances between α * i and α for the same five sequences. Compared to the distances between the original unregistered 4D expressions shown in Fig. 23-(a), these distances are significantly lower. This shows that the proposed temporal alignment framework brings the 4D surfaces as close as possible to each other.
Finally, Figs. 24-(a) and (b) show the 100 random temporal diffeomorphisms that have been applied to perturb the 4D surfaces used for quantitative evaluation shown in    23. We consider five 4D surfaces generated using FLAME framework, perturb each of them using 100 random temporal diffeomorphisms, and then measure the distance between each original 4D surface and its perturbed versions. We show the boxplots of distances (a) before the temporal registration and (b) after the temporal registration using the proposed framework for each of the five 4D surfaces. In each plot, the X axis corresponds to the 4D surfaces and the Y axis to the temporal alignment error. The lower the error is the better is the alignment. In each plot, the X and U axis both correspond to the temporal domain [0, 1] since, in our formulation, a temporal diffeomorphism ξ is defined as the mapping of the temporal domain [0, 1] to itself, i.e., ξ : [0, 1] → [0, 1] such that 0 < dξ dt < ∞, ξ(0) = 0 and ξ(1) = 1.

B.2 Additional temporal registration results
Fig . 25 shows another example of the spatiotemporal registration of two 4D facial surfaces. In this example, we show the source surface, the target 4D surface before the spatiotemporal registration, and the target 4D surface after spatiotemporal registration. We also highlight some key frames to illustrate the quality of the temporal registration.

APPENDIX C GEODESICS BETWEEN 4D SURFACES
This appendix presents additional results on the computation of geodesics between 4D surfaces. Fig. 26 shows two additional examples of gedoesics between 4D facial surfaces. Fig. 27, on the other hand, shows the input 4D surfaces, prior to the spatiotemporal registration, used to generate the mean 4D surface in Fig. 9 in the main manuscript.

APPENDIX D STATISTICS BETWEEN 4D SURFACES
Finally, Section D provides additional results on the computation of summary statistics on collections of 4D surfaces. Figs. 28 and 29 show statistics computed on a collection of six 4D facial expressions from the VOCA dataset. In Fig. 28, we show the input 4D surfaces before and after the spatiotemporal registration. In both cases, we highlight the computed mean 4D surface. Fig. 29 shows the leading three modes of variation; each mode of variation is discretized as a path from −1.5 to +1.5 standard deviations.
(a) Input unregistered 4D surfaces. The last row is the mean obtained without temporal registration.
(b) Spatio-temporally co-registered 4D surfaces. The last row is the mean obtained after co-registration The mean is shown in the last column.