3D Cartographic Generalization of LiDAR Point Clouds Based on the Principle of Self-Similarity of a Deterministic Fractal Structure

The rendering of virtual three-dimensional (3D) structures represented by a Point Cloud (PC) allows the representation of the internal and/or external environments of buildings. However, the compilation of 3D geometric models is hindered by the intrinsic characteristics of PCs, a problem that can be mitigated by applying a PC simplification operator. Following the mathematical norms of fractal geometry, it was assumed that a PC is characterized by self-similarity. Two experimental datasets acquired indoors with an SLT in static mode were used. Four tasks were accomplished: sampling and structuring of the PC with an octree, to solve the problem of random point distribution; estimation of the curvature of the points and the roughness of a neighbourhood to extract edge points through self-similarity analysis, followed by the Statistical Outlier Removal (SOR) algorithm to eliminate outlier points; uniform voxelization, to simplify the intermediate points; and application of the Iterative Closest Point (ICP) algorithm to register the generated sets in the same local coordinate system. Voxelization gave satisfactory results, but because the voxel size is defined manually, the PC can be oversimplified and lose essential characteristics. This can be minimized by first analysing the edge points, which yields a set that is uniform, less noisy, and self-similar to the original set. To achieve a minimum density of points for modelling an environment three-dimensionally, one must analyse the geometric self-similarity characteristics of the PC to produce a simplified set that is self-similar to the original, considering the premises of fractal geometry. It is recommended to create an automatic simplification process to minimize the subjectivity introduced by the analyst.


Introduction
Technological advances and the design of new 3D geospatial data acquisition sensors have made it possible to render virtual three-dimensional (3D) structures (Benita et al. 2020; Nikoohemat et al. 2020). These technologies allow the design of 3D models of cities in the context of Smart Cities (Kumar et al. 2017; Neuville et al. 2018) and Digital Twins (Benita et al. 2020; Döllner 2020; Lehner & Dorffner 2020), to represent internal and/or external environments (Nikoohemat et al. 2020). Among the alternatives for representing content in virtual reality, the Point Cloud (PC) is gaining popularity due to its efficiency in capturing data in loco and encoding it into 3D structures that can be rendered in computing environments.
However, the compilation of 3D geometric models comes up against a recurring problem: the intrinsic characteristics of PCs. When dense, they can portray an environment in three dimensions. However, they constitute massive, purely geometric data that are incomplete, due to occluded areas, and random in the distribution of points (Döllner 2020). The central problem is the redundancy of a raw PC: depending on the size of the profiled area and the profiling frequency of the sensor used, the memory and disk storage requirements can make further processing impossible without the application of 3D cartographic generalization methods, which are not yet fully available (Sester 2020).
The application of a PC simplification operator is an alternative for reducing this problem. It consists of minimizing the set of points while maintaining the mathematical and statistical characteristics of the original set, that is, the self-similarity between the original set and its derivatives. Technically, this means that the simplified set must keep its form invariant as the Level of Detail (LoD) changes, with a structure identical to the original, which places it in the context of a fractal structure (Mandelbrot 1975).
The term fractal is generally applied to diverse constructions, both abstract forms and those inherent to nature (Edgar 2008), which are objects of study in Mathematics and Physics as laws of formation and scale (Edgar 2008; Mandelbrot 1975). In the context of a PC, the set of points that geometrically defines it must present mathematical self-similarity according to a deterministic fractal structure (Mandelbrot 1975), that is, the mathematical characteristics must be independent of the point density of the PC. Therefore, the spatial configuration of the point distribution must present exact self-similarity: the geometric configurations of the original set must be repeated in successive simplified configurations. The great challenge lies in defining a threshold between the loss of geometric characteristics and the minimization of the redundancy of a PC.
Thus, this research started from the principle that a PC, regardless of the profiled environment, is characterized by self-similarity, according to the mathematical norms of fractal geometry. Therefore, from the partitioning of a PC, a simplification operator of the 3D cartographic generalization can be seen as a replica of the whole, in a lower LoD calculated in a recursive procedure, that is, composed of subprocedures that allow the calculation of a fractal dimension that represents the degree of occupation and self-similarity that a structure in space can contain.

Related Research
Recent advances in 3D data acquisition technologies, such as Terrestrial Light Detection and Ranging (LiDAR) (SLT) sensors, enable the acquisition of PCs with high density per m². However, increasing detail usually introduces computational costs in processing and viewing operations. 3D simplification methods minimize data complexity while maintaining the description of relevant structures, which reduces the problems of processing PCs that can become impractical on a large-scale dataset. In particular, there are three ways to simplify a PC: A) mesh-based; B) based on direct optimization; and C) based on sampling, as described in the following subtopics:
-Mesh-based methods: They consist of the preliminary fitting of a PC to a surface, which then undergoes successive simplification processes (Garland & Heckbert 1997). They are iterative methods, are inefficient when profiling large areas, and can be divided into two groups: based on decimation (Asgharian & Ebrahimnezhad 2020) or collapse (Hinderink, Mandad & Campen 2022). The first iteratively removes vertices and then performs a retriangulation process, while the second employs approaches based on edge blending to estimate an optimal position for a vertex (Asgharian & Ebrahimnezhad 2020). These methods assign a weight related to the importance of each point in the representation of a surface S based on statistical attributes, such as normal vectors (Hinderink, Mandad & Campen 2022) or the curvature of the point cloud (Asgharian & Ebrahimnezhad 2020). In addition, an energy function is commonly used to extract edge points (Hinderink, Mandad & Campen 2022).
-Methods based on direct optimization: The points of a PC are selected and simplified according to their local mathematical properties (Zou et al. 2020).
The simplification process based on this method assesses the importance of each point in a PC. Normal vectors are often applied and considered the base information to obtain the description of the local geometric characteristics of a neighbourhood of points from iterative strategies. However, although efficient, the algorithms present a high computational cost when applied at larger profiling scales.
-Sampling-based methods: They consist of a point selection scheme whose focus is to maintain the general structure of the object and not its geometric structure, based on the importance of representing the analysed surface. Methods such as the Farthest Point Sampling Histogram (Rusu, Blodow & Beetz 2009), SampleNet (Lang, Manor & Avidan 2020), and S-Net (Dovrat, Lang & Avidan 2019) can be classified as sampling-based methods. Their main drawback is that important geometric details of a PC are not preserved, which can lead to problems in the interpretation and generation of a 3D model.
Regardless of the family of methods, the simplification of PCs inevitably introduces distortions and errors in the rendering of 3D models. The problem lies in the need to develop algorithms that interpret the distribution of the point cloud coherently, imitating a process of human cognition in a computational language. To achieve this objective, this research seeks to select variables and geometric attributes that allow the computational interpretation of the set of points and the minimization of their redundancy. Mathematical and statistical elements, such as the normal vectors and the curvature of the points belonging to a PC, allow the analysis of the self-similarity of a set P; in this research, these elements are called perceptual metrics.
Motivated by the need to develop 3D generalization operators and by the aforementioned research, this work sought to develop a hybrid process for the simplification of a PC, based on aspects of fractal geometry and perceptual metrics of self-similarity. It is assumed that a PC has mathematical and statistical characteristics (Döllner 2020) and forms a deterministic fractal structure. To reach a minimum point density condition, a PC composed of a set of points $P = (p_1, p_2, \ldots, p_{n_1-1}, p_{n_1})$ undergoes an isotropic simplification process (with the same intensity in all directions) based on its geometric and statistical distribution, and is minimized and transformed into the set $P' = (p'_1, p'_2, \ldots, p'_{n_2-1}, p'_{n_2})$, where $n_2 < n_1$. Consequently, the set P is self-similar to the simplified set P' if P' is invariant after this transformation, and P' is then assumed as the main set to model a surface S.
In this context, with the objective of simplifying a PC and minimizing the computational cost, the main contributions that this research seeks to provide are: 1) the use of the self-similarity of a fractal structure to specify the number of sample points sufficient to reconstruct a surface S without the need for iterations; 2) adaptive sampling of edge points based on the roughness of the points belonging to a PC, in an isotropic process; and 3) preservation of the visual and geometric quality of the dataset in a lower LoD without significant loss of precision.

Material and Methods
Two experimental datasets acquired with an SLT in static mode indoors were used, as illustrated in Figure 1.
It was assumed that a PC has mathematical and statistical characteristics (Döllner 2020) and forms a deterministic fractal structure. Thus, to reach a condition of minimum point density, a PC composed of a set of points $P = (p_1, p_2, \ldots, p_{n_1-1}, p_{n_1})$ undergoes an isotropic simplification process based on its geometric and statistical distribution, and is minimized and transformed into the set $P' = (p'_1, p'_2, \ldots, p'_{n_2-1}, p'_{n_2})$, where $n_2 < n_1$. In this research, only geometric characteristics were considered for data minimization; topological conditions were not. Figure 2 shows the methodological flow used to derive P' from a set P.
It is considered that a simplification operator for a PC can be obtained by partitioning the original set into subsets, based on the establishment of restrictions and on the analysis of the self-similarity of the neighbourhood of points. The objective is to reduce noise and PC redundancy while preserving edge characteristics and the uniformity of the overall density of the set. The strategy employed seeks to optimize algorithms within a framework for point cloud simplification in the context of 3D cartographic generalization, using the statistical self-similarity of the data and the importance of each point belonging to the PC as simplification metrics.
The methodological flow used is subdivided into four tasks (Figure 2). The first consists of sampling and structuring a PC to solve the first problem: random distribution. For this, an octree structure was applied (Rusu & Cousins 2011), which provides an effective representation and structuring for 3D data. The octree was chosen as the base data structure due to its efficiency in serialization. In the second task, the curvature of the points and the roughness of a neighbourhood are estimated to extract edge points through self-similarity analysis (Pauly, Gross & Kobbelt 2002), and then the Statistical Outlier Removal (SOR) algorithm developed by Rusu and Cousins (2011) is applied to eliminate outlier points in the generated sets. In task 3, the uniform voxelization process based on Lv, Lin and Zhao (2021) is applied to simplify the points classified as intermediate. Finally, the Iterative Closest Point (ICP) algorithm (Besl & McKay 1992) is applied to register the generated sets in the same local coordinate system.

Extracting Edge Points from the Perceptual Roughness Metric
The extraction and simplification of the edges of the profiled structures were based on local mathematical metrics of self-similarity, derived from the analysis of the curvature of the points belonging to the PC, according to Pauly, Gross and Kobbelt (2002). A variance-covariance matrix is an intuitive means of determining the curvature and normal vectors of a PC (Hoppe et al. 1993). Then, considering a neighbourhood $N_i$ around a point $p_i \in \mathbb{R}^3$, it is possible to define the covariance matrix C presented in Equation 1.
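For reference, the covariance matrix of a neighbourhood is commonly written as sketched below, following the formulation usually attributed to Pauly, Gross and Kobbelt (2002). The symbols $\bar{p}$ (centroid of the neighbourhood) and the index $l$ are notation assumed here for illustration, not taken from the text.

```latex
\bar{p} = \frac{1}{|N_i|} \sum_{p_j \in N_i} p_j,
\qquad
C = \sum_{p_j \in N_i} (p_j - \bar{p})(p_j - \bar{p})^{T},
\qquad
C\, v_l = \lambda_l\, v_l, \quad l \in \{0, 1, 2\}
```

Because C is a symmetric, positive semi-definite 3 x 3 matrix, its eigenvalues $\lambda_l$ are real and non-negative and its eigenvectors $v_l$ are mutually orthogonal.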
From the eigendecomposition of C, one can derive the eigenvalues ($\lambda_0 \le \lambda_1 \le \lambda_2$), whose eigenvectors, the principal components, define an orthogonal reference frame at the point $p_i$. According to Hoppe et al. (1993), the eigenvectors associated with the two largest eigenvalues span the tangent plane at the point $p_i$, while the eigenvector associated with the smallest eigenvalue approximates the surface normal $n_i$. Therefore, given the smallest eigenvalue related to $p_i$ of a surface, one can estimate the curvature value ($k$) of $p_i$ (Pauly, Gross & Kobbelt 2002), according to Equation 2.
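A minimal sketch of this curvature estimate is given below, assuming the surface-variation form $k = \lambda_0 / (\lambda_0 + \lambda_1 + \lambda_2)$ usually associated with Pauly, Gross and Kobbelt (2002), with $\lambda_0$ the smallest eigenvalue. The function names are hypothetical, and the eigenvalues are obtained with the standard closed-form trigonometric solution for symmetric 3 x 3 matrices rather than with the libraries used in the study.

```python
import math


def covariance_3x3(points):
    """Unnormalised covariance matrix of a list of (x, y, z) tuples."""
    n = len(points)
    centroid = [sum(p[a] for p in points) / n for a in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[a] - centroid[a] for a in range(3)]
        for r in range(3):
            for c in range(3):
                cov[r][c] += d[r] * d[c]
    return cov


def symmetric_eigenvalues(m):
    """Eigenvalues of a symmetric 3x3 matrix in ascending order."""
    off = m[0][1] ** 2 + m[0][2] ** 2 + m[1][2] ** 2
    if off == 0.0:
        # Already diagonal: the eigenvalues are the diagonal entries.
        return sorted([m[0][0], m[1][1], m[2][2]])
    q = (m[0][0] + m[1][1] + m[2][2]) / 3.0
    p2 = sum((m[a][a] - q) ** 2 for a in range(3)) + 2.0 * off
    p = math.sqrt(p2 / 6.0)
    b = [[(m[r][c] - (q if r == c else 0.0)) / p for c in range(3)]
         for r in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    largest = q + 2.0 * p * math.cos(phi)
    smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    middle = 3.0 * q - largest - smallest
    return [smallest, middle, largest]


def surface_variation(neighbourhood):
    """Curvature estimate k = lambda_0 / (lambda_0 + lambda_1 + lambda_2)."""
    lam0, lam1, lam2 = symmetric_eigenvalues(covariance_3x3(neighbourhood))
    total = lam0 + lam1 + lam2
    if total <= 0.0:
        return 0.0  # degenerate neighbourhood (all points coincide)
    return max(0.0, lam0) / total
```

Under this definition, a perfectly planar neighbourhood yields a value close to 0 and an isotropic neighbourhood yields 1/3, the maximum possible value.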
Assuming that $\lambda_0, \lambda_1, \lambda_2$ are the eigenvalues of a local neighbourhood of $p_i$ and $v_0, v_1, v_2$ are the corresponding eigenvectors, $v_0$ is the normal vector that defines the plane $T(x): (x - p_i) \cdot v_0 = 0$, which minimizes the sum of squared distances to the neighbours of $p_i$ (Figure 3).
The estimation of the local curvature of $p_i$ is the ideal mathematical variable to analyse the geometric self-similarity and to minimize the redundancy of a set of points. Therefore, using the initial estimate of the curvature of a point $p_i$, it is possible to estimate the Gaussian-weighted mean of the curvatures around a neighbourhood (Equation 3), where $h$ is the search radius of a neighbourhood of $p_i$. The roughness estimate of a point cloud can be defined as the difference between the curvature ($k$) and the mean curvature ($\hat{k}$) at point $p_i$ (Rodríguez-Cuenca et al. 2015).
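The roughness computation can be sketched as follows. Two assumptions are made here that the text does not confirm: the Gaussian weight is taken as $\exp(-d^2 / h^2)$, with $d$ the distance to the neighbour, and the roughness is taken as the absolute difference $|k - \hat{k}|$. The function names are hypothetical and the neighbour search is a brute-force loop for clarity.

```python
import math


def gaussian_mean_curvature(points, curvatures, index, h):
    """Gaussian-weighted mean curvature around points[index], radius h.

    Assumed weight: exp(-d^2 / h^2), restricted to neighbours with d <= h.
    """
    px, py, pz = points[index]
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y, z), k in zip(points, curvatures):
        d2 = (x - px) ** 2 + (y - py) ** 2 + (z - pz) ** 2
        if d2 <= h * h:
            w = math.exp(-d2 / (h * h))
            weighted_sum += w * k
            weight_total += w
    # The point itself always contributes weight 1, so weight_total >= 1.
    return weighted_sum / weight_total


def roughness(points, curvatures, index, h):
    """Assumed roughness: |k - k_hat| at points[index]."""
    k_hat = gaussian_mean_curvature(points, curvatures, index, h)
    return abs(curvatures[index] - k_hat)
```

With this form, a point whose curvature matches that of its neighbours has a roughness near zero, while a point on a sharp feature surrounded by flat neighbours stands out.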
The roughness of each point is represented by a simple operator calculated by the variation of the profiled surface. However, for simplification purposes, the output result (segmentation of edges and intermediate points) is still not completely uniform and contains noise that can be minimized. The solution is to resample the data using a simplification and de-noise operator, as presented in the Subsection that follows.

Voxelization-based Simplification
After extracting the points that allow the identification of edges in the profiled objects, the original PC is subdivided into edge points and intermediate points.
It is observed that using only the points belonging to the edges can introduce errors in the processes of defining 3D representations. Therefore, the points classified as intermediate must be associated and used in the previously mentioned set. However, it is necessary to simplify it uniformly over the S surface. For this process, the method developed by Lv, Lin and Zhao (2021) was adapted. The strategy of this step is to subdivide the point cloud into voxel structures (Figure 4).
A voxel is a geometry in 3D space and corresponds to a pixel in a 2D context. From a conceptual point of view, a voxel has a cubic geometry, composed of six faces, eight vertices, and twelve edges. Despite this, the representation is not made as a polyhedral cube, but from a central point or through the points that represent its vertices (Xu, Tong & Stilla 2021). Therefore, in the context of simplification, a voxel is considered a basic unit that abstracts and structures a space of discrete points, representing a position in a regular cubic grid. From this perspective, the PC classified as intermediate is subdivided by voxel. The implemented algorithm is performed in three steps: 1) calculation of an enclosing (bounding) box for the PC that defines the space to be segmented; 2) subdivision of the space defined in the first step into regularly spaced 3D cuboids of predetermined size (defined by the executor), which become the cells of the voxel structure; and 3) segmentation of the PC into small portions by the cuboids, after which each subdivided set of points is represented by a voxel whose position and characteristics are calculated from the analysis of the set extracted in each portion. Each voxel $V_k$ then contains a set of points $P = (p_1, p_2, \ldots, p_{n-1}, p_n)$, where $n$ is the number of points inside $V_k$ and $k$ is the voxel index in the voxelized space $V = \{V_1, V_2, \ldots, V_m\}$, $m$ being the number of voxels generated.
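The three steps above can be sketched as a uniform voxel-grid filter. This is a minimal illustration, not the adapted implementation of Lv, Lin and Zhao (2021): the function name is hypothetical, and representing each voxel by the centroid of its points is an assumption, since the text only states that the voxel position is calculated from the points it contains.

```python
import math
from collections import defaultdict


def voxel_downsample(points, voxel_size):
    """Replace all points that fall in the same cubic cell by their centroid."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")

    # Step 1: lower corner of the bounding box of the cloud.
    mins = [min(p[a] for p in points) for a in range(3)]

    # Steps 2 and 3: assign every point to a regularly spaced cubic cell.
    cells = defaultdict(list)
    for p in points:
        key = tuple(int(math.floor((p[a] - mins[a]) / voxel_size))
                    for a in range(3))
        cells[key].append(p)

    # One representative per occupied voxel (assumed here: the centroid).
    simplified = []
    for key in sorted(cells):
        members = cells[key]
        simplified.append(tuple(sum(m[a] for m in members) / len(members)
                                for a in range(3)))
    return simplified
```

A larger `voxel_size` merges more points per cell, which is why the number of remaining points drops quickly as the voxel grows.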
From this task, the edge points and the intermediate PC are registered using the ICP algorithm to form a generalized cloud ready to serve as a base for the construction of 3D models. To evaluate the effectiveness and performance of the proposed approach, two validation experiments were carried out with data acquired with the SLT in indoor environments. The data processing platform was a laptop computer with Windows 10, a 1.8 GHz processor and 8 GB of RAM. The Python 3.0 programming language and its libraries, such as the Open3D library (Zhou, Park & Koltun 2018) and the Point Cloud Library (PCL) (Rusu & Cousins 2011), were used to estimate the geometric parameters.

Results and Discussion
As a first step, it was decided to organize the data sets in the form of an octree, as illustrated in Figure 5. The octree structure is an efficient way of organizing data, in which a PC is subdivided into nodes (each represented by a cube). Each node, except the leaf nodes, has eight child nodes. The total space is divided into $2^n \times 2^n \times 2^n$ cells (Rusu & Cousins 2011); in this case, the cloud is decomposed recursively (one node into eight subnodes).
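The recursive decomposition can be illustrated with the short sketch below. It is not the PCL or Open3D octree used in the study; the dictionary layout, the function names and the stopping parameters `max_points` and `max_depth` are assumptions chosen for illustration.

```python
def build_octree(points, center, half_size, max_points=8, depth=0, max_depth=10):
    """Recursively split a cubic cell into eight octants.

    Leaves keep their points; internal nodes keep a list of eight children.
    """
    if len(points) <= max_points or depth == max_depth:
        return {"center": center, "half_size": half_size,
                "points": list(points), "children": None}

    # Bit a of the octant index is set when the coordinate on axis a
    # lies on the upper side of the cell centre.
    buckets = [[] for _ in range(8)]
    for p in points:
        idx = sum((1 << a) for a in range(3) if p[a] >= center[a])
        buckets[idx].append(p)

    quarter = half_size / 2.0
    children = []
    for idx, bucket in enumerate(buckets):
        child_center = tuple(
            center[a] + (quarter if (idx >> a) & 1 else -quarter)
            for a in range(3))
        children.append(build_octree(bucket, child_center, quarter,
                                     max_points, depth + 1, max_depth))
    return {"center": center, "half_size": half_size,
            "points": [], "children": children}


def count_leaf_points(node):
    """Total number of points stored in the leaves below a node."""
    if node["children"] is None:
        return len(node["points"])
    return sum(count_leaf_points(child) for child in node["children"])
```

At depth n, each axis of the root cube has been halved n times, which gives the $2^n$ cells per axis mentioned in the text. The `max_depth` limit keeps the recursion finite even when several points coincide.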
This strategy is applied mainly to structure the PC and to minimize computational costs in later processing steps. With the experimental sets structured, it was possible to determine the roughness of the points that form the PC. This geometric component was the starting point for analysing the self-similarity of the roughness of the points that compose a sample set.

Determination of Edge and Intermediate Points
The extraction of points belonging to the edges of the buildings was carried out from the analysis of the roughness of the points. Figure 6 illustrates the number of points in both experimental sets at the corresponding roughness values. In this research, the threshold related to the roughness value was 0.01. This value was defined from plot tests, being the coherent value found for building edges. Points with roughness > 0.01 were classified as edge points (green color - Figure 7). Those with values < 0.01 were categorized as intermediate (blue color - Figure 7).
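The classification rule can be sketched as a simple threshold split. The function name is hypothetical, and assigning points whose roughness equals the threshold exactly to the intermediate set is an assumption, since the text only covers values strictly above and strictly below 0.01.

```python
def split_by_roughness(points, roughness_values, threshold=0.01):
    """Split a cloud into edge and intermediate points by roughness."""
    edge_points = []
    intermediate_points = []
    for point, value in zip(points, roughness_values):
        if value > threshold:
            edge_points.append(point)
        else:
            # Assumption: a value equal to the threshold counts as intermediate.
            intermediate_points.append(point)
    return edge_points, intermediate_points
```

Every input point ends up in exactly one of the two sets, so the edge set and the intermediate set can later be simplified separately and merged again.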
The analysis of the self-similarity of the geometric roughness parameter proved to be efficient and allowed the extraction of edge points. However, both the edge points and the intermediate sets generated presented outlier points that introduce errors in the geometric modeling process of a surface. To eliminate them and reduce the redundancy of the test PCs, the SOR algorithm was applied. It eliminates noisy data from the sample that do not effectively contribute to the process of building a 3D model.
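The principle behind statistical outlier removal can be sketched as follows: a point is discarded when its mean distance to its k nearest neighbours exceeds the global mean of those distances by more than a chosen multiple of their standard deviation. This brute-force version is for illustration only; the function name and the defaults `k=8` and `std_ratio=1.0` are assumptions, not the settings used in the experiments.

```python
import math
import statistics


def statistical_outlier_removal(points, k=8, std_ratio=1.0):
    """Return (inliers, outliers) using mean k-nearest-neighbour distances."""
    if len(points) < 2:
        return list(points), []

    # Mean distance from each point to its k nearest neighbours.
    mean_dists = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        mean_dists.append(sum(nearest) / len(nearest))

    # Global statistics of those mean distances define the rejection limit.
    mu = statistics.fmean(mean_dists)
    sigma = statistics.pstdev(mean_dists)
    limit = mu + std_ratio * sigma

    inliers, outliers = [], []
    for p, d in zip(points, mean_dists):
        (outliers if d > limit else inliers).append(p)
    return inliers, outliers
```

The pairwise distance loop is quadratic in the number of points, so a spatial index such as the octree would be needed for clouds of realistic size.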
The profiling of indoor environments was highly complex due to the presence of reflective and translucent surfaces (glass) and occluded areas. Therefore, when building a 3D model, the removal of outlier points is a relevant step to minimize errors in the semantic and geometric interpretation of a PC. Figure 8 shows the result obtained in the process. The points in red were classified as outliers and eliminated from the sample, while the others were kept, which reduced the number of points by 10%.
With the PCs segmented and the outlier points eliminated, the simplification process based on voxelization started.

Voxel-based Simplification
The simplification process based on PC voxelization was applied only to the points classified as intermediate, since they presented the largest number of points that made up the sample and are important in the context of 3D modeling of built environments. One of the main problems in the application of PC simplification processes is defining a specific threshold for reducing the number of points that minimizes the computational cost and maintains minimal geometric properties (explicit representation of the position and topology of a neighborhood of points) stored in abstract 3D structures containing pre-defined positions and attributes so that modeling algorithms can be properly explored (Xu, Tong & Stilla 2021).
Due to this fact, it was decided to perform three specific tests for the sample sets, considering voxel sizes of 0.01, 0.03, and 0.05 m (Figure 9). It is observed that the larger the voxel, the lower the number of remaining points. However, too much minimization leads to errors in the interpolation and estimation of 3D models. Regardless, voxel-based representations are advantageous and efficient in the data compression process. The objective of varying the voxel size is to find an optimal suggested value for the simplification of PCs, considering the integration of the voxelized set with the points categorized as edges in the previous step (Table 1). According to Xu, Tong and Stilla (2021), voxel techniques have a high potential to minimize computational cost and, due to their 3D structuring, are well suited to monitoring, planning, and navigation tasks in built environments.
For the composition of the final PC, the PC registration process was carried out using a variation of the ICP algorithm between the edge points and the voxelized intermediates, as Figure 10 shows.
The PCs generated were characterized by points concentrated on the edges of the buildings, while on flat (non-rough) surfaces the points were sparse, depending on the size of the voxel used. Table 2 presents the numbers of the remaining points. The original set consists of the point cloud after applying the octree structuring process. Figure 11 shows the graphic comparison. The addition of edge points in the voxelized data set resulted in an average increase of 23% in the number of points, considering the three tested configurations.
Despite the 23% increase, in both experiments, there was a reduction of 80, 96, and 98%, respectively, for voxels with dimensions of 0.01, 0.03, and 0.05 m. In this case, for the simplification of a PC from profiling of constructions, from the experiments, it was possible to verify that the adoption of the 0.01 m voxel presents adequate simplification rates since a simplification of around 80% is enough to allow processing with lower computational and storage costs, allowing 20% of the original cloud points to remain for future steps.
Anu. Inst. Geociênc., 2023;46:52720
The voxels with dimensions of 0.03 and 0.05 m simplified the original sets drastically, leaving less than 5% of the points. This can lead to interpretation errors by computational algorithms in the stages of designing the 3D models. To test the efficiency of the proposal, a comparison was made with two other methods frequently used to simplify PCs.

Proposed Method Versus Other Simplification Methods
To analyze the experiments, the results obtained with the proposed methodology were compared with two methods used to simplify and minimize the number of points belonging to a cloud. The proposed approach seeks to store a compressed output based on the analysis of roughness values of points belonging to a PC, keeping edge aspects as an essential characteristic of the process. That is, it stores a set of points that coherently represents the changes in perspectives of an environment, minimizing the number of points and consequently the computational cost of processing.
In this case, only the results obtained with the voxel considered ideal (0.01 m dimension) were compared with regularly used algorithms. The Poisson Sample Disk (PSD) and the so-called Random Sample (RS) methods were used, implemented in MeshLab (https://www.meshlab.net/) and CloudCompare (https://www.danielgm.net/cc/), respectively. The advantage of the proposed method is that edge points are maintained automatically, while the other processes minimize the PC without considering that changes in angulations and perspectives can introduce errors in estimates and representations when making 3D models.
The comparison test of simplification methods indicated that, for both experimental sets, the size of the stored and generated file was larger than the others produced by the application of the tested algorithms. In addition, the geometric quality of the 3D representation of the PC generated with the proposed method was superior. Among the tested algorithms, the RS presented marked limitations, as it is not capable of performing analysis for the definition of the remaining points. The process is based only on the analysis of the distances between the points, leaving only one point as a result of a distance previously established by the analyst.
Mathematically, the results were compared with the definition of the simplification ratio $\varepsilon$. In this case, $\varepsilon = \frac{P - P'}{P} \times 100\%$, where P is the number of points of the original PC and P' the number of points of the simplified sets, as listed in Table 3.
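The ratio can be computed as in the short sketch below. The function name is hypothetical, and the point counts in the usage note are illustrative values, not figures from Table 3.

```python
def simplification_ratio(n_original, n_simplified):
    """Percentage of points removed: (P - P') / P * 100."""
    if n_original <= 0:
        raise ValueError("the original cloud must contain at least one point")
    return (n_original - n_simplified) / n_original * 100.0
```

For instance, a cloud reduced from 1,000,000 to 200,000 points would have a ratio of about 80%, meaning that 20% of the original points remain.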
The PSD method allows sampling points based on the principle of proximity by defining the radius of the Poisson disk (user-defined iPCut parameter), which is the half-distance between the two closest samples. In comparison with the RS algorithm, the PSD presents a more uniform distribution over the sampling domain (Hou et al. 2013), however, it does not present the analysis of characteristic points such as the edge ones. The RS strategy allows minimizing the set of points randomly, that is, from the original PC the algorithm minimizes the set to a pre-established number of points. The position of each point does not follow specific rules, but random aspects of distribution. For flat surfaces and with little geometric variability the method can be applied and will be efficient. However, points that are important in the modelling process may be excluded from the sample set.
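The contrast between the two strategies can be sketched as follows. These are illustrative stand-ins, not the MeshLab or CloudCompare implementations: the function names are hypothetical, random sampling uses the standard library, and the Poisson-disk behaviour is approximated by a greedy pass that keeps a point only if it lies at least twice the disk radius from every point already kept.

```python
import math
import random


def random_sample(points, n_keep, seed=0):
    """Keep n_keep points chosen uniformly at random, ignoring geometry."""
    rng = random.Random(seed)
    return rng.sample(points, n_keep)


def min_distance_subsample(points, radius, seed=0):
    """Greedy Poisson-disk-like subsampling.

    A candidate is kept only when no previously kept point lies closer
    than 2 * radius, so kept points are spread evenly over the cloud.
    """
    rng = random.Random(seed)
    order = list(points)
    rng.shuffle(order)
    kept = []
    for candidate in order:
        if all(math.dist(candidate, q) >= 2.0 * radius for q in kept):
            kept.append(candidate)
    return kept
```

Neither function looks at curvature or roughness, which is why edge points survive only by chance in both strategies.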
Despite the simplification ratio being on average 18% lower than that of the compared algorithms for both experimental sets, the simplification rate achieved ensures that the derived set P' presents adequate geometric self-similarity to the original set P. It is noteworthy that the tests were performed with standard software settings; the parameters can be modified to find a lower simplification rate, but characteristic points, such as edges, will not be preserved.

Conclusions
In this work, an approach for the composition of a 3D cartographic generalization operator for the simplification of PCs using self-similarity based on fractal geometry theories is presented. The approach operates on the profiling of indoor environments of buildings. The method explores the estimation of normal vectors and the roughness of the points belonging to a PC to retain the edge points, which act as the self-similarity genes of the set and are carried over to the simplified PC. This makes it possible to reduce the number of points of a PC and, consequently, to minimize the computational cost of storage and processing.
The use of the voxelization process showed satisfactory results, but due to the manual definition of the voxel size, the PC can be oversimplified and lose essential characteristics. The primary analysis of the edge points made it possible to minimize this problem, providing a less noisy, uniform set similar to the original set. Therefore, from the analyses carried out, it is possible to assume that to achieve a minimum density of points to model an environment three-dimensionally, one must analyse the geometric self-similarity characteristics of a PC to produce a simplified set that is self-similar to the original, considering the premises of fractal geometry. For future research, it is recommended to create an automatic simplification process to minimize the subjectivity of the analyst.