## Abstract

Teeth scans are essential for many applications in orthodontics, where the teeth structures are virtualized to facilitate the design and fabrication of the prosthetic piece. Nevertheless, due to the limitations caused by factors such as viewing angles, occlusions, and sensor resolution, the 3D scanned point clouds (PCs) can be noisy or incomplete. Hence, there is a critical need to enhance the quality of the teeth PCs to ensure a suitable dental treatment. Toward this end, we propose a systematic framework including a two-step data augmentation (DA) technique to augment the limited teeth PCs and a hybrid deep learning (DL) method to complete the incomplete PCs. For the two-step DA, we first mirror and combine the PCs based on the bilateral symmetry of the human teeth and then augment the PCs based on an iterative generative adversarial network (GAN). Two filters are designed to remove the outlier and duplicated PCs during the DA. For the hybrid DL, we first use a deep autoencoder (AE) to represent the PCs. Then, we propose a hybrid approach that selects the best completion of the teeth PCs from the AE and a reinforcement learning (RL) agent-controlled GAN. An ablation study is performed to analyze each component’s contribution. We compare our method with benchmark methods including the point cloud network (PCN), cascaded refinement network (CRN), and variational relational point completion network (VRC-Net), and demonstrate that the proposed framework completes teeth PCs with good accuracy over different scenarios.

## 1 Introduction

A misaligned tooth can be treated with proper cosmetic dentistry products, also known as teeth aligners [1]. 3D printing of teeth aligners is promising since no two patients have a set of teeth with the same dimensions and form of misalignment. Thus, 3D printed teeth aligners have recently become predominant in orthodontics as an alternative to traditionally manufactured teeth aligners [2]. The key advantages of 3D printed teeth aligners include fewer clinical emergencies; improved aesthetics, comfort, oral hygiene, and periodontal health; and a lack of soft tissue irritation. In addition, 3D printed aligners have high-resolution digitally designed borders, smoother edges that do not need post-processing polishing, and customizable intra-aligner thickness compared with traditional fabrications [2].

As shown in Fig. 1, teeth scans are required for many applications in restorative dentistry and orthodontics [3]. In particular, dentists use teeth scans to define a suitable treatment and design the aligner, which includes annotation, segmentation, alignment, and rotation [4]. Most 3D data are acquired using laser scanners, three-dimensional cameras, and computed tomography (CT)/magnetic resonance imaging scanners in the form of point clouds (PCs) [5,6]. PCs are highly memory efficient and preserve fine surface details [7,8]. Several deep learning (DL) approaches have addressed the shape completion problem for 3D PCs [6,7,9–11]. However, DL models generally require a significant amount of data for their training [12], which hampers their applications in some medical domains with limited data [13]. Therefore, there is a need for an efficient PC completion framework that works with limited data. To address these problems, we propose a two-step data augmentation (DA) technique, followed by a hybrid DL approach to complete the PCs.

To start, we use the bilateral symmetry of human teeth to split and recombine the teeth PCs to enlarge our dataset (see Fig. 2(*a*.1)). However, some combined PCs could be problematic. Specifically, if the combined PCs are too similar to other PCs (i.e., redundant PCs), the dataset could become redundant, which may result in model performance degradation [14]. Meanwhile, if the combined PCs are too different from the raw PCs (i.e., outlier PCs), the trained model may not be accurate [15]. Consequently, we develop two filters that discard the defective (i.e., redundant or outlier) PCs by comparing the raw and combined PCs using the chamfer distance *d*_{CH}.

In addition, generative adversarial networks (GANs) have been used as a DA technique [13,16,17]. GANs can create fake data that resemble the real data from a random vector (seed **z**) [18] and are particularly useful in the medical domain [16]. Hence, in the second DA step, we train a latent space GAN (l-GAN) to generate fake PCs iteratively. In each iteration, we create fake PCs from a set of seed **z**. Then, we use our filters to isolate the useful PCs’ seed **z** distribution and apply the new distribution to generate new PCs in the next iteration (see Fig. 2(*a*.2)). Consequently, an augmented dataset is obtained and then used to train a deep autoencoder (AE) and a reinforced-learning agent-controlled GAN (RL-GAN) [6].

RL is used to optimize system performance based on training so that the system can automatically learn to solve complex tasks from the input and the reward [19–21]. Then, we use the AE and RL-GAN to complete the incomplete PCs and select the best completion by comparing their similarity with the incomplete PCs (see Fig. 2(b)).

An ablation study is performed to analyze each component’s contribution [22]. We compared our method with other benchmark methods including point cloud network (PCN), cascaded refinement network (CRN), and variational relational point completion network (VRC-Net).

The main contributions of this study are summarized as follows:

We propose customized data augmentation and filtering methods that exploit the bilateral symmetry of human teeth and an iterative l-GAN for fake PC generation.

We use a hybrid AE and RL-GAN framework to identify the best teeth PC completion.

## 2 Literature Review

### 2.1 Teeth Molds and Intraoral Scans.

Teeth molds/dental impressions are fundamental for patient dental diagnosis and treatment [23]. Traditionally, dental impressions have been made from elastomers [24], alginates [25], wax [23], plaster [26], etc. For instance, Megremis et al. [24] evaluated eight elastomeric occlusal registration models for restorative dental procedures. Hellmann et al. [25] obtained dental impressions made from alginate for bite recording and prosthetic reconstruction planning. See also Refs. [27,28]. Although traditional dental impressions have benefited dental diagnosis and treatment, these methods are invasive, time-consuming, and produce high material waste.

Current digitization technology has enabled one to obtain digital impressions for subsequent diagnosis and procedure planning (e.g., orthodontia and surgery planning) [29,30]. Several scanning methods have been used to digitize the dental impressions, such as X-ray [31], optical scanning [32], and computed tomography (CT) [33]. For instance, Kamegawa et al. [34] measured dental casts with a micro-focus X-ray for a 3D morphological assessment of occlusion treatment. Kang et al. [35] used 3D optical scanning of dental casts for bite registration. See other examples in Refs. [36,37]. These methods have helped to ameliorate the limitations of conventional teeth molds/dental impressions; however, they still require conventional dental impressions as a starting point.

Intraoral scanning (IOS) can produce digital impressions with minimum patient invasion. Current IOS technologies include light projection, distance object determination, and reconstruction [38]. Ireland et al. [39] described the utilization of light projection (e.g., digital fringe) to obtain accurate digital dental impressions. Pradíes et al. [40] used stereophotogrammetric technology for obtaining intraoral digital impressions of implants. See similar studies in Refs. [41,42]. Generally, scanning technology has proven to be effective at representing 3D objects and facilitating the utilization of traditional manufacturing processes (e.g., milling) and additive manufacturing in the dentistry industry. However, irrespective of the scanning methods, the teeth molds/dental impressions suffer from outliers, occlusion, irregularity, and unstructuredness [43].

### 2.2 Point Cloud Shape Denoising and Completion.

PCs have become popular to represent 3D objects in various fields, such as robotics, autonomous driving, and 3D modeling and fabrication [44]. The PCs need to undergo denoising and completion to represent an entire 3D object (e.g., teeth mold) [5,44].

Conventional methods, such as density-based methods for PC denoising and geometry-based methods for PC completion, have been widely deployed [45]. Ester et al. [46] developed a density-based algorithm for discovering clusters in large spatial databases with noise. Zhao et al. [47] presented a robust hole-filling algorithm for triangular meshes, in which new vertices are re-positioned by solving the Poisson equation. See other similar studies in Refs. [48,49]. These methods rely heavily on assumptions, such as symmetry and shape similarity, which are not suitable for unstructured data such as PCs.

Machine learning (ML) approaches have also demonstrated important progress for PC denoising and completion via dimensional reduction and regression techniques [50–52]. Duan et al. [50] applied a principal component analysis-based approach for low-complexity PC denoising for LiDAR data. Sarkar et al. [51] developed a structured low-rank matrix factorization for PC denoising. Gandler et al. [52] presented an object shape estimation approach based on sparse Gaussian process implicit surfaces combining visual data and tactile exploration. See also Refs. [53,54]. Although ML methods are robust, their performances are limited for complex shapes or considerably large missing areas in the PCs.

DL methods have demonstrated good performance for PC denoising and completion [6,44]. For instance, Yuan et al. [55] proposed a point cloud completion network (PCN). This pioneering work consists of an encoder–decoder network to reconstruct dense and complete point sets from an incomplete point cloud. Pan et al. [56] exploited multi-scale local point features in their VRC-Net to reconstruct point clouds with fine-grained geometric details and predict local and thin shape structures. In addition, AE and GAN-based approaches have outperformed traditional methods [57,58]. Zong et al. [59] proposed a denoising AE for learning robust local region features from partial inputs. Wang et al. [60] developed a CRN for point cloud completion. See also Refs. [7,61]. The performance of these methods is affected by small sample sizes and training instability [62].

In addition, training a GAN is an unstable process and may suffer from mode collapse [6]. To address these issues, Sarmad et al. [6] presented an RL-GAN network for real-time PC completion. However, the model performance is still insufficient for small sample sizes and can be improved by deploying DA techniques and l-GAN-based fake PC generation, which will be addressed in this paper.

## 3 Proposed Framework

Figure 2 shows our proposed framework to complete the 3D PCs with a limited number of PCs. First, we propose a two-step DA technique to enlarge the quantity and diversity of the PCs. In the first step, we generate new PCs by splitting and recombining the raw PCs based on the bilateral symmetry of human teeth, as shown in Fig. 2(*a*.1). Then, we apply two filters to remove the outliers and the redundant PCs from the combined PCs. In the second step, we use the raw and filtered combined PCs to train AE1. AE1 creates a latent representation of the PCs, which is used to train the l-GAN1. The l-GAN1 can generate new PCs’ encoded representations from a random vector (i.e., seed **z**). To make sure the generated PCs are consistent with the teeth molds, we propose to use filters to isolate the useful PCs and modify the seed **z** distribution iteratively (see Fig. 2(*a*.2)).

Second, we deploy a hybrid approach that completes the PCs using two methods, namely, AE2 and RL-GAN, as shown in Fig. 2(b). In the first method, AE2 takes the encoded representation and decodes it back into a completed teeth PC (PC_{AE2}). In the second method, the RL agent uses the encoded representation from AE2 to control the l-GAN2 generator to obtain an RL-GAN encoded representation, which is turned into a complete PC (PC_{GAN2}) using the AE2 decoder. Finally, we select the best completion by computing the similarity between the output PCs (PC_{AE2} and PC_{GAN2}) and the input PC (i.e., incomplete PC). We then perform an ablation study to investigate the contribution of each component. We introduce the details of the proposed framework in the following sections.

### 3.1 Point Cloud Combination.

A small training dataset may cause overfitting and can significantly affect the generalization capability of a neural network [63]. Data augmentation is a general technique to alleviate the problems caused by data sparsity [64]. Hence, after mirroring our dataset, we combine our raw PCs following the procedure described in Fig. 2(*a*.1). To start, we take two PCs (*aa*′ and *bb*′, where *aa*′ ≠ *bb*′) from the raw dataset and divide them into left (i.e., *a* and *b*) and right sides (i.e., *a*′ and *b*′) by the median plane. A median plane is a sagittal plane placed in the center of the human body that divides it into two symmetrical parts [65].

The median plane is determined as follows: (1) The PCs are translated to be centered and scaled to unit length. (2) We compute the principal component axes of the first PC using principal component analysis and then align the *x*-, *y*-, and *z*-axes with the principal component axes. This step allows us to align the PC with a reference [66]. Since the PCs of teeth molds are symmetric, after the alignment, the *y*–*z* plane coincides with the median plane. (3) Finally, we register the remaining PCs to the first PC using an iterative closest point algorithm [54]. The relative positions of the teeth point cloud and the median plane are determined based on the Euclidean distance of the scaled PCs to the origin of the *x*-, *y*-, and *z*-axes. Then, we combine the right halves with the left halves (i.e., *a* with *b*′ and *b* with *a*′) to obtain two new PCs per combination. Hence, we generate $n_g = 2\binom{T}{2} = T(T-1)$ combined samples, where *T* is the number of PCs in the raw dataset.
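The split-and-recombine procedure above can be sketched in NumPy. This is a simplified illustration, not the paper's exact implementation: the helper names are ours, and we assume that after PCA alignment the first principal axis is the left–right axis, so the median plane is the *y*–*z* plane and the halves are separated by the sign of the *x* coordinate.

```python
import numpy as np

def align_to_principal_axes(pc):
    """Center a (N, 3) PC, scale it to unit length, and rotate it so its
    principal axes coincide with x, y, z; the y-z plane then approximates
    the median (symmetry) plane."""
    pc = pc - pc.mean(axis=0)
    pc = pc / np.linalg.norm(pc, axis=1).max()
    # Principal axes from the 3x3 covariance matrix (PCA).
    _, vecs = np.linalg.eigh(np.cov(pc.T))
    return pc @ vecs[:, ::-1]  # largest-variance axis first

def combine(pc_a, pc_b):
    """Swap the halves of two aligned PCs across the median plane,
    producing two new combined PCs (a with b', and b with a')."""
    a_left, a_right = pc_a[pc_a[:, 0] < 0], pc_a[pc_a[:, 0] >= 0]
    b_left, b_right = pc_b[pc_b[:, 0] < 0], pc_b[pc_b[:, 0] >= 0]
    return np.vstack([a_left, b_right]), np.vstack([b_left, a_right])
```

Applying `combine` to every ordered pair of distinct PCs yields the $T(T-1)$ combined samples described above.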

We use the chamfer distance (*d*_{CH}), a broadly adopted metric to measure the similarity between two PCs [67], to quantify the differences between PCs. The *d*_{CH} between two PCs (*P*_{1} and *P*_{2}) is defined as

$$ d_{CH}(P_1, P_2) = \frac{1}{|P_1|}\sum_{x \in P_1} \min_{y \in P_2} \|x - y\|_2^2 + \frac{1}{|P_2|}\sum_{y \in P_2} \min_{x \in P_1} \|x - y\|_2^2 \quad (1) $$

where each point *x* ∈ *P*_{1} finds its nearest neighbor *y* ∈ *P*_{2} and vice versa. All the point-level pairwise distances are averaged to produce the shape-level distance [67].
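As a concrete reference, the chamfer distance described above can be sketched in a few lines of NumPy (using the squared-distance form common in the PC completion literature; the function name is ours):

```python
import numpy as np

def chamfer_distance(p1, p2):
    """Chamfer distance between two point clouds of shape (N, 3) and (M, 3).

    Every point in p1 finds its nearest neighbor in p2 (and vice versa);
    the squared nearest-neighbor distances are averaged in both directions
    to produce a shape-level distance."""
    # Pairwise squared Euclidean distances, shape (N, M).
    diff = p1[:, None, :] - p2[None, :, :]
    sq_dists = np.sum(diff ** 2, axis=-1)
    # Average nearest-neighbor terms in both directions.
    return sq_dists.min(axis=1).mean() + sq_dists.min(axis=0).mean()
```

The brute-force pairwise matrix is fine at the 2048-point resolution used later in the paper; a KD-tree would be preferable for denser clouds.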

We compute the *d*_{CH} between every pair of PCs in the raw dataset and define the min and max thresholds as the minimum and maximum *d*_{CH}, respectively. These thresholds are used in our designed filters to remove redundant and outlier PCs. In particular, the first filter (F1) is designed to remove outliers. We first calculate the *d*_{CH} between the generated PCs and the first PC in the raw dataset. Then F1 removes the outlier PCs that have *d*_{CH} larger than the max threshold. Here, only the first sample is picked to avoid the computational burden otherwise incurred in comparing with all raw PCs. Then, the second filter (F2) iteratively removes the redundant PCs by maintaining a pairwise matrix of *d*_{CH} of generated PCs. In each iteration, F2 removes the PC with the maximum number of redundant samples (i.e., *d*_{CH} that are smaller than the min threshold) with other PCs. Then, it updates the pairwise distance matrix. The filtering process is repeated until all the redundant PCs have been removed (i.e., there is no *d*_{CH} in the pairwise distance matrix that is less than the min threshold).
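The two filters can be sketched as follows. This is a hedged reading of the description above, not the authors' code: `chamfer` is a NumPy stand-in for *d*_{CH}, F1 compares each generated PC only against the first raw PC, and F2 iteratively drops the PC with the most sub-threshold pairwise distances.

```python
import numpy as np

def chamfer(p1, p2):
    d = np.sum((p1[:, None, :] - p2[None, :, :]) ** 2, axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def filter_outliers(pcs, reference, max_thr):
    """F1: drop generated PCs whose chamfer distance to the first raw PC
    exceeds the max threshold."""
    return [pc for pc in pcs if chamfer(pc, reference) <= max_thr]

def filter_redundant(pcs, min_thr):
    """F2: repeatedly drop the PC with the most redundant pairs (pairwise
    chamfer distance below the min threshold) until none remain."""
    pcs = list(pcs)
    dist = np.array([[chamfer(a, b) for b in pcs] for a in pcs])
    np.fill_diagonal(dist, np.inf)  # a PC is never redundant with itself
    keep = list(range(len(pcs)))
    while keep:
        sub = dist[np.ix_(keep, keep)]
        redundant_counts = (sub < min_thr).sum(axis=1)
        if redundant_counts.max() == 0:
            break
        keep.pop(int(redundant_counts.argmax()))
    return [pcs[i] for i in keep]
```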

Consequently, the final number (*n*_{f}) of generated PCs is *n*_{f} = *n*_{g} − *n*_{F1} − *n*_{F2}, where *n*_{g} is the original number of generated PCs and *n*_{F1} and *n*_{F2} are the number of PCs removed by F1 and F2, respectively. Finally, we group the gathered data (i.e., raw, mirrored, and *n*_{f} combined PCs) as the l-GAN1 dataset, which is used to train our iterative l-GAN1 network for the second step of DA.

### 3.2 Iterative L-GAN1.

As shown in Fig. 3, we propose an iterative l-GAN framework to iteratively generate PCs, remove outlier and redundant PCs, isolate the useful PCs’ seed **z** distribution, and use the new distribution to generate fake PCs for DA. The above steps are repeated until a certain number of iterations is achieved.

#### 3.2.1 Autoencoder 1.

The AE is composed of an encoder (*E*) and a decoder (*E*^{−1}). The *E* is a network unit through which the input (i.e., PC) is transformed into a multidimensional array referred to as a global feature vector (GFV) (i.e., latent representation). On the other hand, the decoder *E*^{−1} is a fully connected network that reverts the process by transforming the GFV back into the raw PC space. To train our AE, we implement a weighted loss function:

$$ L_{AE} = \omega_{CH} L_{CH} + \omega_{GFV} L_{GFV} \quad (2) $$

where *L*_{CH} is the *d*_{CH} between the input (PC_{in}) and output (PC_{out}) PCs, *L*_{GFV} is the *L*_{2} distance between the input and output PCs’ GFVs (i.e., *E*(PC_{in}) and *E*(PC_{out})), and *ω*_{CH} and *ω*_{GFV} are the corresponding weights.
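The weighted AE objective of Eq. (2) is simple enough to sketch numerically; the default weights follow Sec. 4.3.1 (ω_CH = 100, ω_GFV = 30), and the function names are ours:

```python
import numpy as np

def chamfer(p1, p2):
    d = np.sum((p1[:, None, :] - p2[None, :, :]) ** 2, axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def ae_loss(pc_in, pc_out, gfv_in, gfv_out, w_ch=100.0, w_gfv=30.0):
    """Weighted AE loss of Eq. (2): a chamfer term on the point clouds
    plus an L2 term on their latent representations (GFVs)."""
    l_ch = chamfer(pc_in, pc_out)
    l_gfv = np.linalg.norm(gfv_in - gfv_out)
    return w_ch * l_ch + w_gfv * l_gfv
```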

To train our AE, we use the Adam stochastic gradient descent optimizer [68]. The detailed architecture, momentum, learning rate, and other parameters will be introduced in Sec. 4.3.1. To train the AE1, we use the l-GAN1 dataset (i.e., raw, mirrored, and combined PCs). Then, we use the AE1’s encoder (*E*1) to generate a latent representation of our l-GAN1 dataset.

#### 3.2.2 Latent Space Generative Adversarial Network 1.

Several studies have shown that training a GAN on a latent representation leads to more stable results compared to training on raw PCs [6,9]. Here, we apply our pre-trained AE1 to obtain our l-GAN1 dataset’s GFVs. Then, we use the GFVs to train the l-GAN1 (see Fig. 2(*a*.2)).

The discriminator (*D*) and generator (*G*) loss functions are described in Eqs. (3) and (4), respectively:

$$ L_D = \mathbb{E}_{\tilde{x} \sim P_g}[D(\tilde{x})] - \mathbb{E}_{x \sim P_r}[D(x)] + \lambda\,\mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\big[(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1)^2\big] \quad (3) $$

$$ L_G = -\mathbb{E}_{\mathbf{z}}[D(G(\mathbf{z}))] \quad (4) $$

where $\tilde{x} = G(\mathbf{z})$ is a fake GFV generated from seed **z**, *λ* is a regularization parameter, $\hat{x} = \epsilon x + (1-\epsilon)\tilde{x}$ is an intermediate variable computed at each training step using a random number $\epsilon$, and $P_{\hat{x}}$ is the distribution of $\hat{x}$ [70]. *L*_{D} is a modified *Earth-Mover* distance constructed using the Kantorovich–Rubinstein duality and a gradient penalty to circumvent tractability issues [70]. Since *D* estimates the probability that a sample comes from the real data, *L*_{G} is large when *G* produces GFVs that do not resemble the real data. The detailed l-GAN architecture, learning rate, and other parameters will be introduced in Sec. 4.3.2.

#### 3.2.3 Iterative L-GAN1.

Once trained, the l-GAN1 generator (G1) transforms a noise vector (i.e., seed **z**) into the desired target distribution (i.e., fake GFV). The fake GFV can be decoded into a fake PC with *E*1^{−1}. Hence, we propose to use our pre-trained l-GAN1 and filters to iteratively modify the seed **z**’s distribution and generate the fake PCs.

An overview of the iterative l-GAN1 algorithm is shown in Algorithm 1. We first use the pre-trained G1 to generate *n*_{g} fake GFVs from *n*_{g} seeds **z**. Then, we apply our *E*1^{−1} to the fake GFVs to decode them into fake PCs. We then use F1 and F2 to remove the outlier and redundant PCs, isolate the useful seeds **z** (*Z*_{1}) from the filtered PCs, and store the remaining PCs in an accumulator PC_{T}. Consequently, we estimate the *Z*_{1}’s distribution by fitting its mean and covariance matrix and use them to generate a new set of *n*_{g} GFVs. The above processes are repeated for it_{max} iterations. Finally, we remove the potential redundant PCs by applying F2 to the accumulated data in PC_{T} (see Algorithm 1). By using the pre-trained l-GAN1 iteratively, we produce *n*_{l−GAN1} fake PCs. Then, we group the raw and the generated data (i.e., mirrored, combined, and *n*_{l−GAN1} l-GAN1 PCs) into an RL-GAN dataset that is used to train the hybrid RL-GAN.

##### Training iterative l-GAN1

**Input models:**

Pre-trained AE1 decoder: *E*1^{−1}

Pre-trained l-GAN1 generator: G1

**Input functions:**

Filter-1 (removes the outliers): F1

Filter-2 (removes the redundant PCs): F2

**Input data:**

Number of iterations: it_{max}

Number of PCs generated per iteration: *n*_{g}

**Final output:**

Set of l-GAN1 PCs: PC_{f}

1: Initialize the mean: *μ* = 0

2: Initialize the covariance matrix: Cov = *I*

3: Initialize an empty array to store the generated PCs: PC_{T}

4: **for** it < it_{max} **do**

5: Use *μ* and Cov to randomly generate a matrix containing *n*_{g} seed vectors **z**: *Z*_{0}

6: Obtain *n*_{g} GFVs: GFV_{0} = G1(*Z*_{0})

7: Obtain *n*_{g} PCs: PC_{0} = *E*1^{−1}(GFV_{0})

8: Remove the outliers with F1: PC_{1} = F1(PC_{0})

9: Remove the redundant PCs with F2: PC_{2} = F2(PC_{1})

10: Store the PC_{1}’s corresponding seeds **z**: *Z*_{1}

11: Store the PC_{2} in PC_{T}

12: Update the mean: *μ* = mean(*Z*_{1})

13: Update the covariance matrix: Cov = cov(*Z*_{1})

14: **end for**

15: Remove the redundant PCs with F2: PC_{f} = F2(PC_{T})
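The loop of Algorithm 1 can be sketched in Python. This is a structural illustration only: the generator, decoder, and filters are passed in as stand-ins for G1, *E*1^{−1}, F1, and F2, and the two filters are simplified to per-PC boolean predicates (the actual F2 is pairwise, as described in Sec. 3.1).

```python
import numpy as np

def iterative_lgan1(generator, decoder, f1, f2, it_max, n_g, z_dim):
    """Sketch of Algorithm 1: iteratively generate PCs, filter them, and
    refit the seed-z distribution from the seeds of the surviving PCs."""
    mu, cov = np.zeros(z_dim), np.eye(z_dim)
    pc_t = []  # accumulator PC_T
    rng = np.random.default_rng(0)
    for _ in range(it_max):
        z0 = rng.multivariate_normal(mu, cov, size=n_g)     # seeds z
        pcs = [decoder(generator(z)) for z in z0]           # fake PCs
        keep1 = [i for i, pc in enumerate(pcs) if f1(pc)]   # F1: outliers out
        keep2 = [i for i in keep1 if f2(pcs[i])]            # F2: redundancy out
        pc_t.extend(pcs[i] for i in keep2)
        z1 = z0[keep1]                                      # useful seeds Z_1
        if len(z1) > 1:
            # Refit N(mu, Cov) to the useful seeds for the next iteration.
            mu, cov = z1.mean(axis=0), np.cov(z1.T)
    return pc_t  # a final pairwise F2 pass would follow (line 15)
```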

### 3.3 Hybrid RL-GAN.

#### 3.3.1 Reinforcement Learning.

The RL agent’s task is to find an optimal input (seed **z**) for the l-GAN2 generator (G2). We train our RL agent using the RL-GAN dataset, following the procedures shown in Algorithm 2 [6]. To start, the agent obtains an input state by encoding the input PC and picks a suitable seed **z**. Then, G2 uses the seed **z** to create an RL-GAN GFV, and the decoder (*E*2^{−1}) transforms the GFV into a complete PC. Depending on the quality of the action, the environment gives a reward (*r*) back to the agent. As shown in Fig. 4, the reward for the completion is a combination of negated loss functions that evaluate the intermediate results computed along the process. Specifically, we include *r*_{CH} = −*L*_{CH} to ensure that the complete PCs resemble the input PCs, *r*_{GFV} = −*L*_{GFV} to quantify the similarity between the input and output latent representations, and *r*_{D} = *D*(GFV) to guarantee that the fake GFV follows the encoded real data distribution. The final combined reward function is

$$ r = \omega_{CH}\, r_{CH} + \omega_{GFV}\, r_{GFV} + \omega_{D}\, r_{D} \quad (5) $$

where *ω*_{CH}, *ω*_{GFV}, and *ω*_{D} are the corresponding weights for each reward.
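As a sketch, the combined reward of Eq. (5) is a weighted sum of the three terms; the default weights follow Sec. 4.4.1, and the function name is ours:

```python
def completion_reward(l_ch, l_gfv, d_score,
                      w_ch=100.0, w_gfv=10.0, w_d=0.001):
    """Combined RL reward of Eq. (5): negated chamfer and GFV losses
    plus the discriminator score on the generated GFV."""
    r_ch = -l_ch    # completed PC should resemble the input PC
    r_gfv = -l_gfv  # latent representations should match
    r_d = d_score   # fake GFV should look real to D
    return w_ch * r_ch + w_gfv * r_gfv + w_d * r_d
```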

##### Training RL-GAN [6]

**Input models:**

Pre-trained AE2 encoder: *E*2

Pre-trained AE2 decoder: *E*2^{−1}

Pre-trained l-GAN2 generator: G2

Pre-trained l-GAN2 discriminator: D2

**Input data:**

Number of iterations: *t*_{max}

Starting time: *t*_{0}

**Final output:**

Completed PC: PC_{RL−GAN}

1: Initialize the environment Env: *E*2, *E*2^{−1}, G2, D2

2: Initialize the policy *π* with DDPG, actor A, critic C, and replay buffer R

3: **for** *t* < *t*_{max} **do**

4: Get PC_{in}

5: **if** *t* > 0 **then**

6: Train actor A and critic C with R

7: **end if**

8: Get state: *s*_{t} = *E*2(PC_{in})

9: **if** *t* < *t*_{0} **then**

10: Obtain a random action: *a*_{t}

11: **else**

12: Obtain action: *a*_{t} = A(*s*_{t})

13: **end if**

14: Implement action: GFV_{RL−GAN} = G2(*a*_{t})

15: Obtain final PC: PC_{RL−GAN} = *E*2^{−1}(GFV_{RL−GAN})

16: Compute the reward with Eq. (5): *r*_{t}

17: Obtain new state: *s*_{t+1} = *E*2(PC_{RL−GAN})

18: Store transition (*s*_{t}, *a*_{t}, *r*_{t}, *s*_{t+1}) in R

19: **end for**

To train the RL agent, we use a deep deterministic policy gradient (DDPG) [72]. The DDPG algorithm relies on a parameterized actor and a critic network. The actor network specifies the current policy (*π*) by deterministically mapping states to a specific action [72]. On the other hand, the critic network provides a measure of the quality of the action given the input state [6].

#### 3.3.2 Hybrid RL-GAN.

RL-GANs can complete the incomplete PCs, but the completed PCs may not always preserve the local details well [6]. In contrast, a pre-trained AE can accomplish shape completion, but its performance degrades drastically as the percentage of missing data increases [9]. To address these problems, we use a hybrid RL-GAN to select the best completion between the RL-GAN and the AE. In particular, we complete a PC using AE2 to get PC_{AE2} and using the RL-GAN to get PC_{RL−GAN}. Then, we compute the *d*_{CH} of PC_{AE2} and PC_{RL−GAN} to the incomplete PC, respectively, and select the output with the smaller *d*_{CH} as the completion.
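The selection step is a direct comparison of the two candidates against the incomplete input; a minimal sketch (function names are ours, with a NumPy chamfer stand-in for *d*_{CH}):

```python
import numpy as np

def chamfer(p1, p2):
    d = np.sum((p1[:, None, :] - p2[None, :, :]) ** 2, axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def hybrid_select(pc_incomplete, pc_ae2, pc_rlgan):
    """Pick the completion (AE2 or RL-GAN) whose chamfer distance
    to the incomplete input PC is smaller."""
    d_ae = chamfer(pc_ae2, pc_incomplete)
    d_rl = chamfer(pc_rlgan, pc_incomplete)
    return pc_ae2 if d_ae <= d_rl else pc_rlgan
```

Note that the selection uses the incomplete PC as the reference, since the ground truth is unavailable at inference time.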

### 3.4 Ablation Study.

In deep learning, an ablation study involves measuring the performance of a system after removing one or more of its components to help understand the relative contribution of the ablated components to overall performance [73–75].

## 4 Case Study

### 4.1 Data Acquisition and Preparation.

Figure 6 shows the data acquisition and preparation procedures, including 3D printing, 3D scanning, and processing. We introduce the details below.

#### 4.1.1 3D Printing and 3D Scanning.

The lower jaw teeth molds were printed with an Ender Creality Pro printer. The printer specifications are: nozzle size 0.4 mm, infill density 20%, printing speed 50 mm/s, wall thickness 0.8 mm, and extruder temperature 200 °C. Forty-five teeth molds were printed in total. After the printing, a 3D light-based scanner, Solutionx C500 3D, with an incorporated base plate, was used to scan the printed parts. The scanner parameters were set to a scanning area of FOV350 (diagonal distance up to 350 mm), a scanning volume of 264 × 218 × 120 mm, and a point spacing of 0.110. A customized scan path was used to optimize the scanning process time and improve the PC quality, following the steps proposed in Ref. [79]. A total of forty-five positions were used to finish a teeth mold scan. To obtain a single 3D mesh from a scan path, a multi-step registration was applied to the scanning sequence.

#### 4.1.2 Data Preparation.

We apply a voxel downsampling [76] to reduce the PCs’ dimension to 2048 points. The downsampled PCs are translated to be centered and scaled to unit length. Additionally, to allow the combination process, we register our downsampled PCs by applying an iterative closest point algorithm [76], where all the PCs are registered to the first PC in the training dataset.
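The downsampling and normalization steps can be sketched as follows. This is a simplified NumPy stand-in for the voxel downsampling of Ref. [76] (which averages all points falling in the same voxel), not the library call used by the authors; the voxel size would be tuned so the result lands near 2048 points.

```python
import numpy as np

def voxel_downsample(pc, voxel_size):
    """Average all points of a (N, 3) PC that fall in the same voxel."""
    keys = np.floor(pc / voxel_size).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    n_voxels = inverse.max() + 1
    sums = np.zeros((n_voxels, pc.shape[1]))
    np.add.at(sums, inverse, pc)          # sum points per voxel
    counts = np.bincount(inverse).reshape(-1, 1)
    return sums / counts                  # centroid per voxel

def normalize(pc):
    """Translate the PC to its centroid and scale it to unit length."""
    pc = pc - pc.mean(axis=0)
    return pc / np.linalg.norm(pc, axis=1).max()
```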

Once we process and prepare our raw data, we obtain 45 PCs, referred to as raw PCs. The raw PCs are then randomly and equally split into training, validation, and testing datasets. We apply the DA techniques described in Secs. 3.1 and 3.2 to our training data, and use this enlarged dataset to train the hybrid RL-GAN. Since the framework works in a latent representation, we must ensure that our AEs perform well for the encoding of new PCs. Consequently, we use the validation dataset to select the best AE1 and AE2 for the l-GAN1 and l-GAN2, respectively.

### 4.2 Point Cloud Combination.

We then perform the first step of the DA. In particular, we duplicated our training dataset by mirroring the 15 raw PCs. After that, we combined our 15 raw PCs and generated *n*_{g} = 210 PCs. Then, we computed the *d*_{CH} between every pair of PCs in our raw dataset to get the minimum (min = 5.141 × 10^{−5}) and maximum (max = 4 × 10^{−4}) allowed *d*_{CH}. Consequently, we applied F1 and F2 to remove outliers and redundant PCs. Since we combined the raw PCs by their plane of symmetry (i.e., median plane), the generated data resembled the real PCs; hence, F1 identified no outliers (i.e., *n*_{F1} = 0). However, the combination process inevitably created redundant data, and F2 discarded *n*_{F2} = 35 PCs. Through the first DA step, we generated *n*_{f} = 175 final combined PCs.

Finally, we grouped the raw (i.e., 15 training PCs) and generated data (i.e., 15 mirrored and 175 final combined PCs) into an l-GAN1 dataset to train our iterative l-GAN model.

### 4.3 Iterative L-GAN1.

#### 4.3.1 Autoencoder 1.

The encoder architecture follows the design principle described in Ref. [80]. Specifically, it uses 1D convolutional layers with kernel size one and an increasing number of features. Max-pooling layers are used in the encoder network for spatial downsampling of the input data [81]. In our implementation, the encoder is composed of five 1D convolutional layers with 128, 128, 256, 128, and 128 channels. The decoder is a fully connected neural network (FCN) with four layers of 128, 128, 256, and 6144 neurons. Both networks use batch normalization and ReLU activation functions. We trained our AE models by minimizing the combined loss function described in Eq. (2) with *ω*_{CH} = 100 and *ω*_{GFV} = 30. We used a batch size *b* = 49 and the Adam optimizer with *β*_{1} = 0.8, *β*_{2} = 0.99, and a learning rate *lr* = 5 × 10^{−4} for 10,000 iterations. To select the final AE, we evaluate our model performance (i.e., *L*_{AE}) with the validation dataset.

Then, we apply the selected AE1 encoder *E*1 to obtain the latent representation of the l-GAN1 data.

#### 4.3.2 Latent Space Generative Adversarial Network 1.

To train the l-GAN (i.e., generator and discriminator) models, we implemented the self-attention framework described in Refs. [6,69]. Both models were trained using a WGAN-GP adversarial loss with *λ* = 10 and the Adam optimizer with *β*_{1} = 0.8, *β*_{2} = 0.99, and *lr* = 5 × 10^{−4} for 10,000 iterations [70]. During this process, we use a batch of *b* = 41 32-dimensional seeds **z**, randomly sampled from a multivariate normal distribution with the initial *μ* = 0 and Cov = *I*.

#### 4.3.3 Iterative L-GAN1.

We apply our iterative l-GAN1 for it_{max} = 10 iterations and generate *n*_{gi} = 1000 PCs per iteration. After each iteration, we remove the outlier and redundant PCs by applying the F1 and F2 filters and store the filtered PCs (*n*_{fi}) in an accumulator PC_{T}. Figure 7 shows the number of PCs removed by F1 and F2 (i.e., *n*_{F1i} and *n*_{F2i}) and the number of PCs (|PC_{T}|) accumulated over iterations. Since we update the seed **z** distribution after applying F1 in each iteration, the number of outliers (i.e., *n*_{F1i}) decreases in each iteration. However, most of the remaining PCs were redundant, so the number of PCs removed by F2 increased in each step. In the end, we generated |PC_{T}| = 285 PCs; however, after removing the redundant PCs with F2, our dataset is reduced to *n*_{fT} = 49 PCs.

### 4.4 Hybrid RL-GAN.

#### 4.4.1 Reinforcement Learning Agent.

To train our RL agent, we adopted the actor–critic method proposed by Sarmad et al. [6]. The actor is an FCN with four layers of 400, 400, 300, and 300 neurons, with ReLU activation for the first three layers and Tanh activation for the last layer. Similarly, the critic is an FCN with four layers of 400, 432, 300, and 300 neurons, with the ReLU activation function in the first three layers and no activation function in the last layer.

The training process of the agent is composed of two steps [6]. First, we collect experience using one sample at a time. Toward this end, the agent picks a seed **z**, and we evaluate its performance using Eq. (5) with *ω*_{CH} = 100, *ω*_{GFV} = 10, and *ω*_{D} = 0.001 [6]. In the second step, we train our actor–critic network using DDPG with *b* = 100 [6]. In this study, the total number of iterations was *t*_{max} = 20,000 with a starting time of *t*_{0} = 1000. The state dimension is the GFV size (i.e., 128), while the action dimension is the number of elements in the seed **z** (i.e., 32).

#### 4.4.2 Hybrid RL-GAN.

We use the RL-GAN dataset (i.e., 15 raw, 15 mirrored, 175 combined, and 49 l-GAN PCs) to train our hybrid RL-GAN. We trained our AE2, l-GAN2, and RL agent following the procedures described in Secs. 4.3.1, 4.3.2, and 4.4.1, respectively. To evaluate the completion performance, we use our testing samples (i.e., 15 PCs) to generate six incomplete datasets (*N* = 5, 10, 15, …, 30% missing area), where *N* random locations are selected and an area of 1% is removed at each location. We use the incomplete datasets to evaluate the hybrid RL-GAN performance against the benchmark models PCN [55], CRN [60], VRC-Net [56], AE1 trained for the iterative l-GAN1, AE2, and RL-GAN.
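The hole-generation procedure above can be sketched as follows. This is a hedged reading of the description (the paper does not give its exact implementation): we interpret "an area of 1%" as the 1% of points nearest to each randomly chosen seed location, and the function name is ours.

```python
import numpy as np

def make_incomplete(pc, n_locations, frac=0.01, seed=0):
    """Remove n_locations patches from a (N, 3) PC, each patch being the
    frac-fraction of points nearest to a randomly chosen kept point."""
    rng = np.random.default_rng(seed)
    patch = max(1, int(frac * len(pc)))
    keep = np.ones(len(pc), dtype=bool)
    for _ in range(n_locations):
        candidates = np.flatnonzero(keep)
        center = pc[rng.choice(candidates)]
        # Distances of still-kept points to the chosen location.
        d = np.linalg.norm(pc[candidates] - center, axis=1)
        keep[candidates[np.argsort(d)[:patch]]] = False
    return pc[keep]
```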

Figures 8(a)–8(f) display the shape completion results for 5–30% missing area for an example sample in the testing dataset. The first column shows the raw PCs as the ground truth, while the second column shows the incomplete PCs. The other columns show the performance of the hybrid RL-GAN and benchmark methods.

Figure 8 corroborates that the VRC-Net and CRN reconstructions miss the detailed PC structure, and only the general teeth shape is obtained. The PCN-completed PCs are more uniformly distributed than those of the VRC-Net and CRN approaches. Compared with the benchmark models, the hybrid RL-GAN framework improves the accuracy of missing PC data completion. On the other hand, both AE2 and RL-GAN can complete the shape well when the missing area is small. However, the AE's performance degrades when the missing area is large (see Fig. 9), while RL-GAN remains more stable as the missing area increases. The hybrid RL-GAN selects the better completion between the AE2 and RL-GAN outputs by evaluating their *d*_{CH} to the incomplete PC; its selection is marked in Fig. 8 with a black star.

Furthermore, we quantify the PC completion performance of the proposed and benchmark methods using *d*_{CH}; the smaller the distance, the better the completion. Figure 9 shows the average *d*_{CH} between the completed PCs and the ground truth PCs in the testing dataset. VRC-Net has the highest *d*_{CH}, followed by CRN. This is likely because the number of training PCs is insufficient for these methods, which require massive datasets for training [56,60]. Although PCN generates better results than VRC-Net and CRN, our approach outperforms these state-of-the-art methods, as displayed in Fig. 9. By including the l-GAN1 PCs, we reduced the AE's completion error; hence, AE2 outperforms AE1.
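The *d*_{CH} metric used throughout, and the best-of-two selection it drives, can be sketched as follows. This uses the unsquared variant of the symmetric Chamfer distance; some implementations use squared norms instead.

```python
import numpy as np

def chamfer(a, b):
    # Symmetric Chamfer distance between point sets a (n,3) and b (m,3):
    # mean nearest-neighbor distance in both directions.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def hybrid_select(partial, ae_out, rlgan_out):
    # Hybrid rule: return whichever completion lies closer (in d_CH)
    # to the observed incomplete cloud.
    if chamfer(ae_out, partial) <= chamfer(rlgan_out, partial):
        return ae_out
    return rlgan_out

rng = np.random.default_rng(3)
gt = rng.random((64, 3))
sel = hybrid_select(gt[:32], gt, gt + 10.0)  # the AE output wins here
print(sel is gt)  # prints True
```

Because the incomplete cloud is a subset of the true surface, a completion that drifts away from it is penalized in both Chamfer directions, which is what makes this simple selection rule effective.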

Similar to the previous study [6], the AEs' performance degrades drastically and becomes unstable as the percentage of missing data increases. This behavior can be attributed to the high GFV variability as the missing area grows. On the other hand, the RL-GAN's completion error is much more stable, yet the generated RL-GAN PCs may fail to preserve local details. Consequently, by selecting the better completion, the hybrid approach mitigates both problems and reduces the *d*_{CH} to the ground truth.

### 4.5 Ablation Study.

In the ablation study, we remove one module at a time to obtain five ablated systems. Then, we compute the %IncCH using Eq. (6). In all cases, we evaluate the ablated systems' and the original system's performance using the validation dataset.
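Eq. (6) is defined earlier in the paper and is not reproduced here; a plausible percentage-increase form, consistent with the values reported in Table 1, would be:

```python
def pct_inc_ch(d_ablated, d_full):
    # %IncCH: percentage increase of the ablated system's average
    # Chamfer distance over the full system's (our reading of Eq. (6),
    # which is defined earlier in the paper).
    return 100.0 * (d_ablated - d_full) / d_full

# e.g., an ablated system with twice the full system's error
print(pct_inc_ch(2.0, 1.0))  # prints 100.0
```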

For the first ablated system (AS1), we remove the PC-Combination module. After mirroring our dataset, we directly run the iterative l-GAN1 (Sec. 4.3.3). However, since 30 PCs are not enough to train a suitable AE, the iterative l-GAN1 could not produce any useful data. Therefore, we use the 30 PCs to train the RL-GAN (i.e., l-GAN2 and RL agent) and reconstruct the incomplete data following the procedures described in Secs. 4.4.1 and 4.4.2, respectively. As shown in Table 1, when we remove PC-Combination, the average chamfer distance over all the missing areas increases by 233.34%.

| Missing area (%) | AS1 (%) | AS2 (%) | AS3 (%) | AS4 (%) | AS5 (%) |
|---|---|---|---|---|---|
| 5 | 237.33 | 39.44 | 28.20 | 6.22 | 6.62 |
| 10 | 235.96 | 38.96 | 69.93 | 16.85 | 6.15 |
| 15 | 230.64 | 37.43 | 32.68 | 17.85 | 4.43 |
| 20 | 235.14 | 38.76 | 30.59 | 12.22 | 6.03 |
| 25 | 223.54 | 34.23 | 35.64 | 8.65 | 2.17 |
| 30 | 237.42 | 41.19 | 62.14 | 18.75 | 6.61 |
| Average | 233.34 | 38.34 | 43.20 | 13.42 | 5.34 |


Note: AS1, remove PC-Combination; AS2, remove filters; AS3, remove iterative l-GAN1; AS4, remove RL-GAN; AS5, remove hybrid RL-GAN.

In the second ablated system (AS2), we do not use the filters that control the data augmentation. Thus, we perform PC-Combination (Sec. 4.2) and obtain 210 PCs. Without the filters, we cannot control the l-GAN1’s seed **z** distribution, so we generate 1000 PCs in a single step. Then, we follow Secs. 4.4.1 and 4.4.2 to train the remaining modules and reconstruct the incomplete PCs accordingly. Table 1 shows that when we remove the filters, the average chamfer distance for all the missing areas increases by 38.34%.

For the third ablated system (AS3), we remove the iterative l-GAN1. Therefore, we augment our data using PC-Combination only (Sec. 4.2). Then, we use our data to train the RL-GAN (i.e., l-GAN2 and RL agent) following Sec. 4.4.1. Finally, we process the incomplete PCs and select the best reconstruction between the RL-GAN and the AE2 as described in Sec. 4.4.2. Table 1 shows that without the iterative l-GAN1, the average chamfer distance over all the missing areas increases by 43.20%.

In the fourth ablated system (AS4), we remove the RL-GAN module. Therefore, after augmenting our data following Secs. 4.2 and 4.3.3, we process the incomplete data as described in Sec. 4.4.2. Without the RL-GAN model, the hybrid RL-GAN always chooses the AE2 reconstruction. As shown in Table 1, when we remove the RL-GAN module, the average chamfer distance increases by 13.42%.

For the final ablated system (AS5), we remove the hybrid RL-GAN module. We start by augmenting our dataset following Secs. 4.2 and 4.3.3. After that, we train the RL-GAN (i.e., l-GAN2 and RL agent) as described in Sec. 4.4.1 and use the RL-GAN alone to reconstruct the incomplete data. Table 1 shows that without the hybrid selection, the average chamfer distance over all the missing areas increases by 5.34%.

As shown in Table 1, the ablation study demonstrates that every module in the proposed framework contributes to PC completion.

## 5 Conclusion and Future Work

3D scanned PCs are increasingly used in orthodontics applications. One critical issue with these PCs is missing areas due to factors such as limited viewing angles and occlusions. In this paper, we proposed a systematic framework for 3D PC completion with limited data. The framework consists of (1) a two-step data augmentation technique based on the bilateral symmetry of human teeth and an iterative GAN, and (2) a hybrid RL-GAN method that selects the best completion from the AE and the RL-GAN. Through the demonstration on 3D teeth mold PCs, the proposed framework achieves accurate PC completion. The framework can also be applied to other PC completion scenarios with limited data and complex shapes.

In the future, based on the completed 3D PCs, we will explore forecasting a patient's teeth alignment over time. This forecast will help dentists monitor and plan alignment procedures. Another direction is to consider biographic information, habits, etc., for personalized PC completion and forecasting.

## Acknowledgment

This work is partially supported by the NSF Grant No. FM-2134409 and Sustainable Manufacturing and Advanced Robotics Technologies, Community of Excellence (SMART CoE) at the State University of New York at Buffalo.

## Conflict of Interest

There are no conflicts of interest.

## Data Availability Statement

The datasets generated and supporting the findings of this article are available from the corresponding author upon reasonable request.