Point Cloud Reconstruction From Truncated Geometry-Based Streams

Abstract: Geometry-based point cloud compression (G-PCC) has been rapidly evolving in the context of international standards. Despite the inherent scalability of the octree-based geometry description, current G-PCC attribute compression techniques prevent full scalability for compressed point clouds. In this paper, we present a solution to add scalability to attributes compressed using the region-adaptive hierarchical transform (RAHT), enabling the reconstruction of the point cloud from only a portion of the original bitstream. Without the full geometry information, one cannot compute the weights on which RAHT relies to calculate its coefficients at finer levels of detail. To overcome this problem, we propose a linear approximation relating the downsampled point cloud to the truncated inverse RAHT coefficients at the same level. The parameters of this linear relationship are sent as side information. After truncating the bitstream at a point corresponding to a given octree level, we can then recreate the attributes at that level. Test results attest to the good approximation quality of the proposed technique.


I. INTRODUCTION
A point cloud (PC) is a 3D structure usually represented by a collection of points or volume elements (voxels) described by their geometry, given as the (x, y, z) coordinates of the points, and by the points' attributes, which may be color, normal vectors, and reflectance, among others. Recently, PC compression (PCC) research has intensified [1]-[6], and the Moving Picture Experts Group (MPEG) is in the process of finalizing two PCC standards [7], [8], based either on purely geometrical techniques (G-PCC) or on existing video compression standards (V-PCC). PCs can be used to represent 3D scenes, where the points describe the hulls of the objects therein, and they are used in, for example, autonomous navigation [9], [10], heritage preservation [11], entertainment, and telepresence [12].
The wide range of applications may result in different requirements in terms of quality and resolution. Consider the example illustrated in Fig. 1: for a PC placed near the user's viewing point, the composition of the scene would require it to be rendered at a higher resolution than a PC placed further away. On the user side, the PC can be downloaded, fully reconstructed, and then downsampled to the required resolution. However, if the encoded bitstream offered some degree of scalability, the user would be able to download and process only the portion of the bitstream needed to reconstruct the PC at the desired resolution.
Spatial scalability was one of the original requirements for G-PCC: the bitstream would have a layered structure of coarser approximations, with each layer being used to predict the next one [13]. However, this requirement is only partially addressed by G-PCC. The geometry representation already offers scalability through octrees [14], in which the point cloud geometry is inherently encoded as successive refinements of the geometry resolution, starting from a single block and successively dividing each occupied block into eight smaller blocks. The octree bitstream signals to the decoder which of these new blocks are occupied. For the attributes, G-PCC, which is currently based on RAHT [1], [2] or on the predicting/lifting transforms [3], [15], offers only partial spatial scalability, by using overlapping slices to encode regions of a PC at different fidelity levels, without inter-layer prediction [16].
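The occupancy signaling of the octree can be sketched in Python. This is a minimal illustration, not the G-PCC syntax: the functions `encode_octree` and `decode_octree` are hypothetical, each occupied node emits one 8-bit occupancy code, and both the child index `(x_bit << 2) | (y_bit << 1) | z_bit` and the breadth-first, sorted node order are assumptions made for the example. Decoding only the first levels of codes already yields a coarser geometry.

```python
def encode_octree(voxels, L):
    """Per-level lists of 8-bit occupancy codes (hypothetical layout).

    voxels: iterable of integer (x, y, z) in [0, 2**L); levels are 0..L-1,
    and the codes of one level are listed in sorted parent order.
    """
    levels = []
    for level in range(L):
        shift = L - level - 1  # coordinates of the children at level + 1
        codes = {}
        for x, y, z in voxels:
            cx, cy, cz = x >> shift, y >> shift, z >> shift
            parent = (cx >> 1, cy >> 1, cz >> 1)
            child = ((cx & 1) << 2) | ((cy & 1) << 1) | (cz & 1)  # assumed bit order
            codes[parent] = codes.get(parent, 0) | (1 << child)
        levels.append([codes[p] for p in sorted(codes)])
    return levels


def decode_octree(levels, depth):
    """Rebuild the occupied blocks at level `depth` from the first `depth` code lists."""
    nodes = [(0, 0, 0)]  # the root block
    for codes in levels[:depth]:
        children = []
        for (x, y, z), code in zip(nodes, codes):
            for child in range(8):
                if (code >> child) & 1:
                    children.append((2 * x + (child >> 2),
                                     2 * y + ((child >> 1) & 1),
                                     2 * z + (child & 1)))
        nodes = sorted(children)  # same order the encoder used for the next level
    return nodes


voxels = [(0, 0, 0), (1, 0, 0), (3, 2, 1), (7, 7, 7)]  # an L = 3 cloud
levels = encode_octree(voxels, 3)
print(decode_octree(levels, 1))  # [(0, 0, 0), (1, 1, 1)]
```

Stopping after `depth` levels returns the occupied blocks of a $2^{depth}$ grid, i.e., the geometry downsampled by a factor of $2^{L-depth}$.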
In this paper, we propose a solution to reconstruct a PC from a truncated portion of the encoded bitstream containing the data from the octree-encoded geometry and RAHT-encoded attributes. The use of a truncated portion of the bitstream can save bandwidth in data transmission and avoid re-encoding the entire point cloud for each resolution.

II. TRUNCATED INVERSE RAHT
RAHT is a variation of the Haar transform that uses the attribute values of a node at a lower (coarser) level of the octree to predict the attributes of the nodes at the next level. For simplicity, assume there is just one attribute to be encoded per voxel (point), be it a color component, reflectance, or another attribute. Neighboring voxels are paired and transformed into low- and high-pass coefficients. At each step, the low-pass coefficients are further combined with neighboring low-pass coefficients, and the process is repeated until the entire space is traversed. At a given level, two low-pass coefficients about to be paired and transformed may represent averages over different numbers of voxels, which results in different weight values in the transformation matrix. Each weight indicates the number of voxels that were actually involved in generating that low-pass coefficient. Two neighboring low-pass coefficients at level $\ell + 1$, $F_{\ell+1,2n}$ and $F_{\ell+1,2n+1}$, are combined through an orthogonal transform to form a low- and a high-pass coefficient, $F_{\ell,n}$ and $G_{\ell,n}$, at level $\ell$. Let $w_{\ell+1,2n}$ and $w_{\ell+1,2n+1}$ be the respective weights of the input coefficients; then
$$\begin{bmatrix} F_{\ell,n} \\ G_{\ell,n} \end{bmatrix} = T \begin{bmatrix} F_{\ell+1,2n} \\ F_{\ell+1,2n+1} \end{bmatrix}, \qquad T = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}, \tag{1}$$
where
$$a = \sqrt{\frac{w_{\ell+1,2n}}{w_{\ell+1,2n} + w_{\ell+1,2n+1}}}, \qquad b = \sqrt{\frac{w_{\ell+1,2n+1}}{w_{\ell+1,2n} + w_{\ell+1,2n+1}}}. \tag{2}$$
The resulting low-pass coefficient carries the weight $w_{\ell,n} = w_{\ell+1,2n} + w_{\ell+1,2n+1}$. Note that $a^2 + b^2 = 1$, so $T$ is orthogonal and equation (1) can be inverted using $T^{-1} = T^{T}$.
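Equations (1) and (2) amount to a weighted two-point butterfly. The Python sketch below (the function names are ours, chosen for illustration) applies it to two low-pass inputs and inverts it with $T^{T}$. The example pairs a single voxel of value 100 (weight 1) with a low-pass coefficient that summarizes three voxels of mean 40; without quantization, such a coefficient equals $\sqrt{3} \cdot 40$.

```python
import math


def raht_butterfly(f0, f1, w0, w1):
    """Forward RAHT step of eqs. (1)-(2).

    f0, f1: low-pass coefficients F_{l+1,2n} and F_{l+1,2n+1}
    w0, w1: their weights (number of voxels each one summarizes)
    Returns (F_{l,n}, G_{l,n}, w_{l,n}).
    """
    a = math.sqrt(w0 / (w0 + w1))
    b = math.sqrt(w1 / (w0 + w1))
    low = a * f0 + b * f1    # first row of T
    high = -b * f0 + a * f1  # second row of T
    return low, high, w0 + w1


def inverse_raht_butterfly(low, high, w0, w1):
    """Invert the butterfly with T^{-1} = T^T; it needs the same weights w0, w1."""
    a = math.sqrt(w0 / (w0 + w1))
    b = math.sqrt(w1 / (w0 + w1))
    return a * low - b * high, b * low + a * high


low, high, w = raht_butterfly(100.0, math.sqrt(3) * 40.0, 1, 3)
print(round(low, 6), w)  # 110.0 4
```

The output low-pass coefficient is $\sqrt{4}$ times the mean of the four voxels, $2 \cdot 55 = 110$, and it carries the weight 4. The inverse only works if the decoder knows `w0` and `w1`.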
Let the PC have N occupied voxels, laid in a cubic grid of $2^L \times 2^L \times 2^L$ voxels, which is referred to as an L-level PC. The set $\{G_{\ell,n}\}$ comprises the N − 1 high-pass RAHT coefficients, which are encoded along with the overall DC coefficient $F_{0,0}$. The forward RAHT starts from the voxels (the tree leaves at the L-th level, $\{F_{L,n}\}$), generating low-pass coefficients that are laid in a voxel grid of level L − 1 ($\{F_{L-1,n}\}$). The process is repeated until the tree root at level 0 is reached, generating the overall DC value $F_{0,0}$ for the entire PC.
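A compact sketch of the forward pass follows, under a simplifying assumption: the octree is modeled as a 1D binary tree indexed as in eq. (1), so node n at level $\ell$ has children 2n and 2n + 1 at level $\ell + 1$ (the actual 3D RAHT alternates the pairing along x, y, and z). A node with a single occupied child passes its coefficient and weight up unchanged, so exactly N − 1 high-pass coefficients are produced. The function name `forward_raht` and the dictionary layout are illustrative.

```python
import math


def forward_raht(points, L):
    """Forward RAHT on a 1D binary tree of depth L (simplified model).

    points: {leaf index in [0, 2**L): attribute}; every leaf has weight 1.
    Returns (F_00, G, W) with G[l][n] = G_{l,n} and W[l][n] = w_{l,n}.
    """
    F = dict(points)                      # low-pass coefficients of the current level
    W = {L: {n: 1 for n in points}}
    G = {}
    for l in range(L - 1, -1, -1):
        parent_F, W[l], G[l] = {}, {}, {}
        for n in sorted({m // 2 for m in F}):
            c0, c1 = 2 * n, 2 * n + 1
            if c0 in F and c1 in F:       # two occupied children: butterfly of eq. (1)
                w0, w1 = W[l + 1][c0], W[l + 1][c1]
                a = math.sqrt(w0 / (w0 + w1))
                b = math.sqrt(w1 / (w0 + w1))
                parent_F[n] = a * F[c0] + b * F[c1]
                G[l][n] = -b * F[c0] + a * F[c1]
                W[l][n] = w0 + w1
            else:                         # lone child: passed up, no high-pass output
                c = c0 if c0 in F else c1
                parent_F[n] = F[c]
                W[l][n] = W[l + 1][c]
        F = parent_F
    return F[0], G, W


points = {0: 10.0, 1: 20.0, 2: 30.0, 5: 40.0, 7: 50.0}  # N = 5 voxels, L = 3
dc, G, W = forward_raht(points, 3)
print(sum(len(g) for g in G.values()))  # 4, i.e. N - 1
```

Because the weights count voxels, the returned DC equals $\sqrt{N}$ times the mean attribute (here $\sqrt{5} \cdot 30$), and `W[0]` holds the total voxel count.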
At the decoder, we start with $F_{0,0}$ and $G_{0,0}$, along with the weights $w_{1,0}$ and $w_{1,1}$, to calculate $F_{1,0}$ and $F_{1,1}$. Thus, from the very first step, the decoder needs weights that depend on the whole tree. If we truncate the tree at level L − K and know the geometry only up to level L − K, we are still unable to reconstruct the set $\{F_{L-K,n}\}$, because the weights at every level $\ell \le L-K$ count voxels at full resolution and therefore depend on the geometry of the discarded finer levels $\ell > L-K$. Hence, RAHT is not scalable.
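The dependence of the weights on the discarded levels can be checked directly. The sketch below reuses a 1D binary-tree simplification of the octree (the helper `weights_from_leaves` is hypothetical) and computes the weights of levels 0 to L − K twice: once from the full geometry and once from the geometry truncated at level L − K. In this example they differ at every level $\ell \le L-K$.

```python
def weights_from_leaves(leaves, depth):
    """Weights w_{l,n} for l = 0..depth, each leaf at level `depth` counting as 1."""
    W = {depth: {n: 1 for n in leaves}}
    for l in range(depth - 1, -1, -1):
        W[l] = {}
        for n, w in W[l + 1].items():
            W[l][n // 2] = W[l].get(n // 2, 0) + w  # parent accumulates voxel counts
    return W


L, K = 4, 2
leaves = {0, 1, 3, 6, 7, 12}                                    # full-resolution geometry
W_full = weights_from_leaves(leaves, L)
W_trunc = weights_from_leaves({n >> K for n in leaves}, L - K)  # geometry up to L - K
for l in range(L - K + 1):
    print(l, sorted(W_full[l].items()), sorted(W_trunc[l].items()))
```

For instance, the root weight is 6 (all voxels) with the full geometry but only 3 (occupied blocks at level L − K) with the truncated one.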

III. PROPOSED SOLUTION
Assume we have truncated data composed of the geometry information up to level L − K and all the RAHT coefficients up to level L − K, i.e., $F_{0,0}$ and $\{G_{\ell,n},\ 0 \le \ell < L-K\}$.
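Because the occupancy of levels 0 to L − K is the same in the full and in the truncated geometry, the retained coefficients $\{G_{\ell,n},\ \ell < L-K\}$ are exactly as many as an (L − K)-level PC would have: one fewer than the number of occupied nodes at level L − K. The sketch below checks this count on a 1D binary-tree simplification of the octree; the helper names are hypothetical.

```python
def occupancy(leaves, L):
    """Occupied node indices per level l = 0..L (node n covers leaves with leaf >> (L - l) == n)."""
    return {l: {n >> (L - l) for n in leaves} for l in range(L + 1)}


def num_highpass(occ, level):
    """Number of coefficients G_{l,n} with l < level: one per node with two occupied children."""
    return sum(
        1
        for l in range(level)
        for n in occ[l]
        if 2 * n in occ[l + 1] and 2 * n + 1 in occ[l + 1]
    )


L, K = 4, 2
leaves = {0, 1, 3, 6, 7, 12}
occ = occupancy(leaves, L)
print(num_highpass(occ, L), len(leaves) - 1)          # 5 5  (full stream: N - 1)
print(num_highpass(occ, L - K), len(occ[L - K]) - 1)  # 2 2  (truncated stream)
```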
If we decode the data as an (L − K)-level PC, the lower-resolution geometry provides incorrect weights $\{\tilde{w}_{\ell,n}\}$ for the given coefficients, which were computed using the correct weights $\{w_{\ell,n}\}$. For example, $\tilde{w}_{L-K,n} = 1$ for all n, since L − K is the last level of the truncated PC, whereas in the original PC each node at level L − K generally contains several voxels from the K further levels, so that $w_{L-K,n} > 1$.
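The effect of the incorrect weights can be reproduced on a toy example. The sketch below, again on a 1D binary-tree simplification with hypothetical helper names, encodes a small PC with L = 4, keeps only $F_{0,0}$ and the coefficients of levels $\ell < L-K$ for K = 2, and inverse transforms them twice: with the true weights (which a truncated decoder does not have) and with the weights derived from the truncated geometry. Only the first result, divided by the square root of the true weight, reproduces the block averages of the original attributes.

```python
import math


def _ab(w0, w1):
    """Coefficients a, b of eq. (2)."""
    return math.sqrt(w0 / (w0 + w1)), math.sqrt(w1 / (w0 + w1))


def weights_from_leaves(leaves, depth):
    """w_{l,n} for l = 0..depth, each leaf at level `depth` counting as 1."""
    W = {depth: {n: 1 for n in leaves}}
    for l in range(depth - 1, -1, -1):
        W[l] = {}
        for n, w in W[l + 1].items():
            W[l][n // 2] = W[l].get(n // 2, 0) + w
    return W


def forward_raht(points, L):
    """Forward RAHT on a 1D binary tree; returns F_{0,0} and G[l][n]."""
    F, W, G = dict(points), weights_from_leaves(points, L), {}
    for l in range(L - 1, -1, -1):
        parent, G[l] = {}, {}
        for n in W[l]:
            c0, c1 = 2 * n, 2 * n + 1
            if c0 in F and c1 in F:
                a, b = _ab(W[l + 1][c0], W[l + 1][c1])
                parent[n] = a * F[c0] + b * F[c1]
                G[l][n] = -b * F[c0] + a * F[c1]
            else:  # lone child: passed up unchanged
                parent[n] = F[c0] if c0 in F else F[c1]
        F = parent
    return F[0], G


def inverse_raht(dc, G, W, target):
    """Inverse RAHT from level 0 down to level `target` using the weights W."""
    F = {0: dc}
    for l in range(target):
        child = {}
        for n, low in F.items():
            c0, c1 = 2 * n, 2 * n + 1
            if c0 in W[l + 1] and c1 in W[l + 1]:
                a, b = _ab(W[l + 1][c0], W[l + 1][c1])
                child[c0] = a * low - b * G[l][n]
                child[c1] = b * low + a * G[l][n]
            else:
                child[c0 if c0 in W[l + 1] else c1] = low
        F = child
    return F


L, K = 4, 2
points = {0: 10.0, 1: 12.0, 3: 30.0, 6: 50.0, 7: 54.0, 12: 90.0}
dc, G = forward_raht(points, L)
kept = {l: G[l] for l in range(L - K)}                  # truncated coefficients
W_true = weights_from_leaves(points, L)
W_trunc = weights_from_leaves({n >> K for n in points}, L - K)
F_true = inverse_raht(dc, kept, W_true, L - K)          # needs the discarded geometry
F_tilde = inverse_raht(dc, kept, W_trunc, L - K)        # what the decoder can compute

groups = {}
for n, v in points.items():
    groups.setdefault(n >> K, []).append(v)
X = {n: sum(v) / len(v) for n, v in groups.items()}     # block-averaged attributes
for n in sorted(X):
    print(n, X[n], F_true[n] / math.sqrt(W_true[L - K][n]), F_tilde[n])
```

In the printed table, the second and third columns agree, while the last column, obtained with the truncated weights, does not.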
Let $X_{\ell,n}$ be the voxel attribute at the same position as $F_{\ell,n}$, obtained by downsampling the L-level PC from the full-resolution voxels $\{F_{L,n}\}$ down to level $\ell$. Generally, $X_{\ell,n}$ and $F_{\ell,n}$ are related by a scale factor depending on the weight $w_{\ell,n}$ (without quantization, and when downsampling averages all the voxels inside a node, $X_{\ell,n} = F_{\ell,n}/\sqrt{w_{\ell,n}}$), and the set $\{X_{L-K,n}\}$ is what we want to reconstruct from the truncated bitstream. With the incorrect weights $\tilde{w}_{\ell,n}$, let the voxel attributes reconstructed at level L − K be $\tilde{F}_{L-K,n}$. We observed that $X_{\ell,n}$ and $\tilde{F}_{\ell,n}$ are correlated, although the relationship is only approximate. Figure 2 shows the relationship between pairs $X_{\ell,n}$ and $\tilde{F}_{\ell,n}$ under different conditions for a couple of PCs encoded at bit-rates specified in MPEG's G-PCC common test conditions (CTC). Note the first-order (linear) correlation between the variables. This pattern held for all PCs and conditions we tested in Sec. IV. This suggests a linear approximation of one variable from the other:
$$X_{\ell,n} \approx \alpha_\ell \tilde{F}_{\ell,n} + \beta_\ell. \tag{3}$$
At the encoder side, we then calculate not only $\{F_{\ell,n}\}$ and $\{G_{\ell,n}\}$ for all levels $0 \le \ell < L$, but also $\{\tilde{F}_{\ell,n}\}$ and $\{X_{\ell,n}\}$ for $0 \le \ell \le L-K$. The encoder computes $\alpha_\ell$ and $\beta_\ell$ by least squares for each level $0 \le \ell \le L-K$. The resulting 2(L − K + 1) parameters are encoded and sent to the decoder as side information, which is generally a very small rate penalty.
It is important to note that the G-PCC standard does not interlace the geometry and attribute information on a level basis, as each stream is included separately in the bitstream. To allow for the desired scalability in this standard, the proposed method requires the streams to be interlaced on a level basis. The proposed G-PCC decoder must be able to truncate the geometry and attribute information at the desired downsampling level, inverse transform the truncated RAHT coefficients, and then apply the first-order approximation. Figure 3 illustrates our method.
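The per-level parameter estimation is an ordinary least-squares line fit of the downsampled attributes against the values reconstructed with the truncated weights. The sketch below uses the closed-form solution; the function names are hypothetical, and the handling of a level with a single node (such as level 0, where any line through the single point fits exactly) is our own assumption rather than part of the described method.

```python
def fit_level(f_tilde, x):
    """Least-squares (alpha_l, beta_l) minimizing sum_n (x_n - alpha * f_n - beta)^2.

    f_tilde: values F~_{l,n} reconstructed with the truncated weights
    x:       downsampled attributes X_{l,n} at the same positions
    """
    n = len(f_tilde)
    mean_f = sum(f_tilde) / n
    mean_x = sum(x) / n
    s_ff = sum((f - mean_f) ** 2 for f in f_tilde)
    s_fx = sum((f - mean_f) * (xi - mean_x) for f, xi in zip(f_tilde, x))
    if s_ff == 0.0:
        # Degenerate level (e.g. a single node): assumed fallback to a constant fit.
        return 0.0, mean_x
    alpha = s_fx / s_ff
    return alpha, mean_x - alpha * mean_f


def apply_level(f_tilde, alpha, beta):
    """Decoder side: estimate X_{L-K,n} as alpha * F~_{L-K,n} + beta."""
    return [alpha * f + beta for f in f_tilde]


# Encoder: one (alpha, beta) pair per level 0..L-K gives 2(L-K+1) side-information values.
alpha, beta = fit_level([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(alpha, beta)  # 2.0 1.0
```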

IV. EXPERIMENTAL RESULTS
To test the proposed solution, we selected PCs with different densities, numbers of levels, and attributes (RGB color or reflectance), as described in Table I. We tested different truncation points, corresponding to removing K octree levels from the point cloud, i.e., downsampling by a factor of $s = 2^K$. Our distortion metric is the Y-PSNR [17], [18], i.e., the PSNR of the luminance channel or of the reflectance. We start from a compressed bitstream and compare the resulting PC to the original PC after resolution reduction (downsampling) by K levels. The downsampling is carried out by removing octree levels from the geometry and by averaging the attributes at each level. Two methods are compared: full-stream reconstruction followed by downsampling by K levels (full-stream), and our algorithm, which reconstructs L − K levels of the PC from the truncated bitstream (proposed). The rate is calculated as the number of bits actually transmitted, i.e., the full bitstream size for the full-stream method and the truncated bitstream size for the proposed one. In both cases, encoding and decoding were based on the G-PCC Test Model (TMC13) version 12.0 [19], [20], and the encoding bit-rates were set according to the six target rates of G-PCC's CTC [21].
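The downsampled reference and the distortion metric can be sketched as follows. This is a simplified stand-in for the MPEG metric software, under stated assumptions: attributes inside each $s \times s \times s$ block are averaged over all the voxels it contains, RGB is converted to luminance with ITU-R BT.709 weights, the peak value is 255 (8-bit attributes), and both clouds share the same voxel positions, so points are matched by position rather than by a nearest-neighbor search. All function names are hypothetical.

```python
import math


def luma_bt709(r, g, b):
    """Luminance from RGB with BT.709 weights (assumed conversion)."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def downsample(points, K):
    """Remove K octree levels: merge the voxels of each 2^K block, averaging attributes.

    points: dict {(x, y, z): scalar attribute (luma or reflectance)}
    """
    acc = {}
    for (x, y, z), attr in points.items():
        key = (x >> K, y >> K, z >> K)
        total, count = acc.get(key, (0.0, 0))
        acc[key] = (total + attr, count + 1)
    return {key: total / count for key, (total, count) in acc.items()}


def y_psnr(ref, test, peak=255.0):
    """PSNR (dB) over the voxels present in both clouds (assumes shared geometry)."""
    common = ref.keys() & test.keys()
    mse = sum((ref[k] - test[k]) ** 2 for k in common) / len(common)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


cloud = {(0, 0, 0): 10.0, (1, 0, 0): 20.0, (2, 0, 0): 30.0}
print(downsample(cloud, 1))  # {(0, 0, 0): 15.0, (1, 0, 0): 30.0}
```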
For a given PC and a fixed K, varying the encoding rate yields rate-distortion (RD) curves for each method (full-stream vs. proposed), from which the Bjøntegaard-delta (BD) rate reduction [22] can be calculated. Figure 4 shows the BD-rate achieved for many PCs at different truncation points.
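For reference, a BD-rate computation can be sketched with NumPy following the usual Bjøntegaard procedure: fit the logarithm of the rate as a cubic polynomial of the PSNR for each curve, integrate both fits over the PSNR interval covered by both curves, and convert the average log-rate difference into a percentage. This is a generic sketch, not the script used to produce Fig. 4; `bd_rate` is a hypothetical name, the numbers in the usage lines are illustrative rather than results, and a negative value means that the tested method needs fewer bits for the same quality.

```python
import numpy as np


def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Bjontegaard-delta rate (%) of the test curve against the reference curve."""
    log_ref, log_test = np.log(rate_ref), np.log(rate_test)
    poly_ref = np.polyfit(psnr_ref, log_ref, 3)     # log-rate as a cubic in PSNR
    poly_test = np.polyfit(psnr_test, log_test, 3)
    lo = max(min(psnr_ref), min(psnr_test))         # overlapping PSNR interval
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref, int_test = np.polyint(poly_ref), np.polyint(poly_test)
    avg_ref = (np.polyval(int_ref, hi) - np.polyval(int_ref, lo)) / (hi - lo)
    avg_test = (np.polyval(int_test, hi) - np.polyval(int_test, lo)) / (hi - lo)
    return float((np.exp(avg_test - avg_ref) - 1.0) * 100.0)


psnr = [30.0, 33.0, 36.0, 38.5, 40.5, 42.0]
full = [100, 200, 400, 800, 1600, 3200]  # illustrative rates, not measured data
half = [r / 2 for r in full]             # same quality at half the rate
print(round(bd_rate(full, psnr, half, psnr), 3))  # -50.0
```

Because the integration is restricted to the PSNR interval covered by both curves, quality levels reached only by the full-stream decoder do not enter the average.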
Although the BD-rate reductions are large, they come with a caveat: they do not capture the fact that a truncated bitstream cannot achieve the same distortion levels as full-stream decoding, as can be seen in the RD curves shown in Fig. 5. Presenting the results in yet another way, Fig. 6 relates the percentage of the bitstream that is used to the drop in Y-PSNR (dB) of the proposed method with respect to the full-stream one. These curves are obtained either by varying the encoding rate for a given value of s (or K), or vice versa. All these curves reveal sweet spots, where a large rate reduction comes at a small loss in quality, which may be of interest to application developers.
Table I.

V. CONCLUSIONS
We have proposed an algorithm to reconstruct a G-PCC-coded point cloud from a partial (truncated) bitstream. Based on a linear approximation relating the downsampled point cloud to the truncated inverse RAHT coefficients at the same level, the downsampled point cloud can be estimated, enabling lower-resolution "previews" without full-stream reconstruction. This is no substitute for a truly scalable coder, but it can be useful to save transmission bandwidth when, for example, rendering objects farther away. A number of tests were carried out to demonstrate the quality of the reconstruction. In many situations, one can save a large percentage of the original bit-rate at a small distortion penalty.