## Abstract

This paper shows how to use conditional generative models in two-dimensional (2D) airfoil optimization to probabilistically predict good initialization points within the vicinity of the optima given the input boundary conditions, thus warm starting and accelerating further optimization. We accommodate the possibility of multiple optimal designs corresponding to the same input boundary condition and take this inversion ambiguity into account when designing our prediction framework. To this end, we first employ the conditional formulation of our previous work BézierGAN–Conditional BézierGAN (CBGAN)—as a baseline, then introduce its sibling conditional entropic BézierGAN (CEBGAN), which is based on optimal transport regularized with entropy. Compared with CBGAN, CEBGAN overcomes mode collapse plaguing conventional GANs, improves the average lift-drag (C_{l}/C_{d}) efficiency of airfoil predictions from 80.8% of the optimal value to 95.8%, and meanwhile accelerates the training process by 30.7%. Furthermore, we investigate the unique ability of CEBGAN to produce a log-likelihood lower bound that may help select generated samples of higher performance (e.g., aerodynamic performance). In addition, we provide insights into the performance differences between these two models with low-dimensional toy problems and visualizations. These results and the probabilistic formulation of this inverse problem justify the extension of our GAN-based inverse design paradigm to other inverse design problems or broader inverse problems.

## 1 Introduction

Design synthesis with shape optimization is often time-consuming. After setting up a forward (analysis) model of the objective function under a set of boundary conditions or requirements, you have to specify an initial set of design variables and embark on some (typically) wearisome and costly iterative algorithm to minimize that objective. This swings to its extreme when the analysis model is formulated as nonlinear differential equations that need to be solved iteratively, as, in the case of airfoil design, the Navier–Stokes equation governs the system. Optimizing a design for a single set of boundary conditions can take days or weeks, depending on the number of design variables and complexity of the objective function or forward model.

Therefore, it is tempting to construct a mapping that leads us directly from the input problem parameters—e.g., boundary conditions, constraints, or other problem-specific requirements—to the optimal design variables. Alternatively, to be more pragmatic, at least to the vicinity of the optima to short-circuit or accelerate a subsequent optimization process. Researchers refer to such mappings as *Inverse Design*.

The quest for such a mapping induces several challenges. Apart from seeking a model with enough complexity and regularity to approximate this complicated mapping with precision, another inevitable conundrum is the *inversion ambiguity* that inverse design problems usually confront. This is when there are multiple potential, near-optimal designs corresponding to the same input condition. This stymies traditional bijective regression models.

In this work, we attempt to address these challenges in the context of 2D airfoil optimization, in which, given the input boundary conditions, we predict the corresponding airfoils with near-maximal lift-drag (*C*_{l}/*C*_{d}) efficiency. Specifically, after casting this inverse design problem from a probabilistic perspective, we introduce two recently developed probabilistic generative models—conditional BézierGAN (CBGAN) and conditional entropic BézierGAN (CEBGAN)—to realize airfoil inverse design in a probabilistic and data-driven manner. Our result shows that CBGAN and CEBGAN can, respectively, produce airfoils with 80.8% and 95.8% of the average optimal airfoil performance. Their predictions can then serve as warm start initialization points to accelerate further optimization. This paper’s key contributions are as follows:

Our conditional GANs (i.e., CBGAN and CEBGAN) take the freestream conditions and the target property, including the Mach number, Reynolds number, and the target lift coefficient as input, then directly predict airfoils of near-optimal

*C*_{l}/*C*_{d}efficiency (80.8% and 95.8% of optimal*C*_{l}/*C*_{d}, respectively) that can short-circuit optimization. We measure the optimality gap in both instantaneous and cumulative senses as their performance metrics.Our CEBGAN incorporates the recent advances in computational optimal transport to accelerate its training by 30.7% compared with CBGAN while achieving better final performance (higher

*C*_{l}/*C*_{d}efficiency).CEBGAN (or more generally CEGAN—conditional entropic GAN) enables evaluating the surrogate log-likelihood of samples for decision-making during prediction. We develop the formulation of the surrogate log-likelihood for CEGAN and prove its validity in the Supplementary Materials on the ASME Digital Collection. We further investigate its practicability by examining its correlation with actual sample performance (i.e., aerodynamic efficiency).

We compare the performance of CGAN and CEGAN on low-dimensional toy problems to provide more insight into each model’s behavior and illustrate the paper’s key results on an understandable example.

We create a dataset of optimal airfoils and the algorithm for generating it for future studies. This dataset and the corresponding code to replicate the paper are located online.

^{2}

## 2 Background and Related Work

This section first provides some background context on inverse problems, inverse airfoil design, and mainstream generative models to lay the foundation for how we cast inverse airfoil design from a probabilistic viewpoint. Then, we enumerate some recent work incorporating conditional generative models to solve inverse design problems or broader inverse problems.

### 2.1 Inverse Problems.

**x**from the observations

**y**governed by a forward problem:

*F*:

*X*→

*Y*is a forward operator mapping parameters to observed data, and

**e**is the observation noise or tolerance; for the implicit one on the right, $R:X\xd7Y\u2192Rm$ is a residual operator measuring

*X*and

*Y*’s observation of the governing equations such as PDEs. The explicit form can be converted to the implicit form via $R(x,y)\u2190y\u2212F(x)$. Although most forward problems are well-posed, their inversions are usually not, with none or multiple

**x**corresponding to the same

**y**. This cripples deterministic or straightforward attempts at inversion. Many works rely on regularization to escape this pitfall [2,3].

In contrast to regularization, Bayesian inference tackles this issue by taking all possible solutions into account and quantifying the uncertainty of each with probability measure [1,2]. From the Bayesian perspective, if we take **x** and **y** both as the realizations of certain random variables, solving the forward problem is equivalent to sampling a dataset {**x**, **y**} from *p*(**x**, **y**) = *p*(**y**|**x**) *p*(**x**) where the likelihood *p*(**y**|**x**) is assumed to exist and can be formulated from the deterministic forward problem. The prior *p*(**x**) can be uniform, Gaussian, or any other distributions depending on the prior information to embed. The inverse problem then corresponds to deriving the posterior *p*(**x**|**y**) of this probabilistic model.

### 2.2 Inverse Design and Inverse Airfoil Design.

**x**given the desired problem parameter

**y**, which are all governed by certain optimality condition in the form of

*R*(

**x**,

**y**) =

**0**. Specifically, we propose to formulate the inverse design problem in general as

*ε*controls the tolerance of error. This probabilistic formulation is reasonable because it assigns a higher posterior probability to design variables

**x**whose residuals are closer to

**0**.

**y**and be optimal under a chosen criterion. This criterion is typically computed via some numerical forward model or simulator. In our airfoil example, the airfoil shape is first processed by some shape parameterization algorithm to obtain design variables

**x**and passed to a mesh generator

*M*for meshing. The generated mesh is then fed together with the desired property

**y**and boundary conditions

**b**into a computational fluid dynamics (CFD) solver to produce a chosen objective

*J*. Ideally, the second-order optimality condition of

*J*−$\u2207xJ(M(x),y,b)=0$ and $HxJ(M(x),y,b)\u2ab00$ – is used to examine whether

**x**has properties

**y**under

**b**and is also optimal in terms of certain requirements. Following Eq. (2), we formulate the inverse airfoil design problem as follows:

*a*)

*b*)

We try to formulate the inverse airfoil design problem as generically as possible here such that it can extend to other domains and tasks, and so we did not specify the components of Eqs. (3*a*) and (3*b*). However, for the specific problems addressed in this paper, the methodology section details these choices.

The analytical form of the posterior *p*(**x**|**y**) is in general infeasible by virtue of the likelihood’s complexity. However, one practical work-around is to use any universal approximator [4] with adequate model capacity—such as any of the mainstream generative models introduced next—to learn *p*(**x**|**y**) from the generated dataset {**x**, **y**} on a data-driven basis.

### 2.3 Generative Models.

Traditional generative models of limited complexity such as Gaussian Mixture Models are in general insufficient to approximate real-world high-dimensional target distributions *p*_{r}(**x**). As of writing, the three most commonly used generative models including their variations are generative adversarial networks (GANs) [5–9], variational autoencoders (VAEs) [10–15], and flow-based models [16–18]. They all share a deep neural network generator *G* : *Z* → *X* in common that serves the same purpose—implicitly representing a distribution *p*_{g}(**x**) via the transformation of latent prior distribution *p*(**z**) of noise. They mainly differ in the way they drive *p*_{g} towards *p*_{r}:

GANs (e.g., vanilla GAN [5]) achieve this by either exactly or approximately minimizing some statistical distances between

*p*_{g}and*p*_{r}, as the next section elucidates. In general, GANs do not provide information about the sample likelihood*p*_{g}(**x**). This is one of their primary weaknesses.VAEs achieve this by indirectly maximizing the sample likelihood via its lower bound. Since

*G*here is not designed to be invertible and merely represents a low-dimensional manifold of measure zero in high-dimensional data space, likelihood maximization becomes possible by introducing Gaussian-like noise in the data space and via variational inference.Flow-based models align the dimension of

**x**and**z**and carefully design the generator*G*to make it invertible such that the analytical form of the density function can be retrieved for direct likelihood maximization.

Currently, GANs often generate higher quality samples than VAE or Flow models. This indicates GANs” good convergence to at least some modes of *p*_{r}. VAE’s convergence issues are usually attributed to posterior collapse [11–13], whereas flow-based models currently need to sacrifice their expressivity on the altar of invertibility.

Our work will focus on GANs’ learning of the posterior *p*(**x**|**y**), for which the generator now takes the conditional form *G* : *Y* × *Z* → *X* to induce its dependency on the condition **y**, forming an approximate distribution *p*_{g}(**x**|**y**).

### 2.4 Recent Work on Inverse Design.

Inverse design based on conditional generative models is an emerging area without a significant body of existing work. So far, within the realm of mechanical engineering, it is almost universally applied to the design of nanomaterials and microstructures, including nano-photonics [19–23], nanoelectronics [24], cellular structures [25], porous materials [26], crystals [27], etc. It has also been applied to the synthesis of kinematic linkages recently [28]. Outside of those isolated areas, it has been employed by inverse molecular design [29], medical imaging [30,31], gene expression inference [32], and image processing [33] among others. Other than using conditional generative models, inverse design was also enabled by using a traditional numerical regime (i.e., a design optimization framework) [34] and other machine learning algorithms (e.g., Gaussian process and reinforcement learning) [35,36] for structural optimization, metamaterial design, and flow sculpting. Furthermore, with regard to the generative model’s application to unconditional design synthesis, which can potentially be adapted for a conditional configuration, several works have emerged lately. For instance, Chen and Fuge demonstrated design synthesis preserving inter-part dependencies using GANs of hierarchical architecture [37]. Oh et al. ran topology optimization and GANs in a loop to explore the design space and synthesize new designs with efficiency [38]. Shu et al. generated 3D models using GANs [39].

To the extent of our knowledge, only two existing works studied the conditional synthesis of curves using generative models, both related to airfoil design. In Ref. [40], Yilmaz and German used conditional GAN to generate airfoils of specified stall conditions and airfoil drag polars. Achour et al. [41] implemented a conditional GAN taking discrete conditions representing four classes as input to generate airfoils falling into the desired quarter of the [*C*_{l}/*C*_{d} ratio × shape area] domain. Our work differs from theirs via the key contributions listed in the end of the introduction (Sec. 1).

## 3 Methodology

In this section, we present the formulation and technical details of our airfoil prediction frameworks. We first introduce the generation of the airfoil dataset for inverse airfoil design. After that, we shed light on conditional BézierGAN (CBGAN) construction and its evolution to the optimal-transport-based conditional entropic BézierGAN (CEBGAN), which is easier to train and yields information about the sample likelihood. We end the section by describing the metrics we use to measure their performances on our generalized regression problems.

### 3.1 Dataset Creation Via Two-Dimensional Airfoil Optimization.

The conditional GAN models’ real dataset consists of optimal shape designs of 2D airfoils. To build the dataset, we perform shape optimization over a range of input boundary conditions to achieve high-performing 2D airfoil designs. We use the SU2^{3} solver (an open-source PDE analysis toolset) to perform gradient-based shape optimization following the general process shown in Fig. 1.

The flowchart starts with a baseline geometry (represented as surface points) and its mesh as input to the design cycle, along with a chosen objective function *J* and a set of design variables **x**. SU2 allows users to choose different objectives such as drag coefficient (*C*_{d}), lift coefficient (*C*_{l}), and efficiency (*C*_{l}/*C*_{d}). In this paper, we choose *C*_{d} as the objective function (*J* = *C*_{d}) with a target *C*_{l} (**y** = *C*_{l})—i.e., the optimization is run at a fixed *C*_{l} which works by updating the angle of attack (*α*) during the optimization such that the resulting *C*_{l} matches the target *C*_{l} value. In SU2, Hicks-Henne bump functions and free-form deformation (FFD) control point approach [42] are used to parameterize 2D airfoils and generate the design variables **x**. SU2 also provides different optimizers (e.g., SLSQP, CG, BFGS, and POWELL) to do the gradient-based optimization. A chosen gradient-based optimizer will orchestrate the design cycle consisting of the direct flow solver, adjoint solver, and geometry/mesh deformation tools in SU2. The iterative design loop proceeds until an optimum is found or reaching a maximum number of optimizer iterations. We develop an integrated computational pipeline that automates the optimization by merging GMSH^{4} (open-source finite element mesh generator) mesher and SU2 optimizer through a python script.

In this paper, we use the gradient-based SLSQP optimizer and the Hick-Henne parameterization method to minimize *C*_{d} such that the efficiency (i.e., *C*_{l}/*C*_{d}) is maximized at a constant *C*_{l}. To find the optimal airfoil design (and mitigate converging to local optima in non-convex problems), for each group of freestream conditions **b** and target properties **y**—i.e., Mach number (Ma), Reynolds number (Re), and target lift coefficient (*C*_{l})—we perform adjoint optimization on eight diverse candidate airfoils and pick the optimized one with the highest efficiency (Fig. 2).

The diverse candidate airfoils are generated by applying Latin Hypercube sampling [43] on the lower-dimensional latent space (of BézierGAN [44]) for diverse coverage (eight samples) and feeding the eight groups of latent codes into the pre-trained BézierGAN model. To generate sufficient data, each of the eight candidate airfoils is optimized (using adjoint optimization) for 1000+ steps with the same 1000+ sets of input freestream conditions (Ma and Re) and target *C*_{l}. We select the final adjoint-optimized airfoil with the highest efficiency and store it in our database of samples.

### 3.2 BézierGAN.

#### 3.2.1 Bézier Layer and Regularization.

BézierGAN is essentially a specialized InfoGAN [46]. Its only difference to its predecessor is the additional Bézier layer mounted on its generator and the accompanying regularization loss, which ensure the generation of smooth Bézier curves and make it suitable for geometry-related engineering applications.

**P**,

**w**, and

*n*are, respectively, control points, weights, and predetermined degree defining the rational Bézier curve, and

**X**is the tensor of data points sampled from the Bézier curve according to the

*m*+ 1 parameter variables

**u**that determine the sampling intervals. For numerical stability, this layer is usually evaluated on a logarithmic scale. It serves as the final output layer of the generator and does not hinder backpropagation, due to its differentiability. In the training process of BézierGAN, we can apply additional regularizations to

**P**and

**w**to further rectify the quality of the Bézier curve. Apart from this distinction, the other part of BézierGAN’s training is the same as those of vanilla GAN and InfoGAN and can also benefit from their improvements.

#### 3.2.2 Minimax Game and Probabilistic Perspective.

*D*is the discriminator,

*G*is the generator,

**z**is the noise, and

**x**are the real samples we are trying to counterfeit using

*G*after running this game several times. Doing so encourages the distribution

*p*

_{g}(

**x**) implicitly represented by

*G*to converge to

*p*

_{r}(

**x**), which is the underlying distribution generating real data. Past work has proven that when the discriminator is trained to optimum before each update of the generator, we are equivalently minimizing the Jensen–Shannon divergence between

*p*

_{r}and

*p*

_{g}, i.e., JSD(

*p*

_{r}‖

*p*

_{g}) [5]. In practice, however, because of inefficiency, vanishing gradients, and instability [47], in each epoch, the discriminator is only updated for a few iterations and thus barely arrives at its optimum; so this theory is just a reasonable approximation.

Nevertheless, this probabilistic perspective does provide researchers with many insights and new directions to delve into. Although the mode collapse plaguing vanilla GAN cannot be attributed to JS divergence in full [48,49], the minimax game based on it still plays a role in this and other defects because several proposals that replace this game assuming JS divergence to be the culprit have achieved significant improvement [6–9]. One prominent work, if not the most among them, is the Wasserstein GAN (WGAN) [8], which is established on optimal transport theory that we shall introduce in the next subsection.

#### 3.2.3 Conditional Formulation and CBGAN.

**y**stands for the condition corresponding to

**x**and both the discriminator and generator additionally take

**y**as a input, so that

*G*implicitly represents a conditional distribution

*p*

_{g}(

**x**|

**y**). From the same JS divergence point of view, it readily follows that this minimax game is approximately minimizing JSD(

*p*

_{r}(

**x**,

**y**) ‖

*p*

_{g}(

**x**|

**y**)

*p*

_{r}(

**y**)) to lead

*p*

_{g}(

**x**|

**y**) to

*p*

_{r}(

**x**|

**y**).

Figure 3 presents the architecture of CBGAN for our inverse airfoil design task. Its generator takes the Gaussian noise vector **z**, the desired property **y** which is *C*_{l} here, and the freestream conditions **b** which are Ma and Re as input, and output the design variables **x**, which are the 192 data points on the Bézier curve of an airfoil and the angle of attack *α*. The discriminator, on the other hand, takes **x**, **y**, and **b** as input and produces the probability of being an optimal airfoil.

### 3.3 Entropic BézierGAN.

Arjovsky and Bottou [47] hypothesize that one critical defect vanilla GANs suffer from is the discontinuity of JS divergence over distributions concentrated on low-dimensional manifolds embedded in high-dimensional spaces. These low-dimensional manifolds are usually misaligned, so it prompts the discriminator to overpower the generator and leads to vanishing gradients. To overcome this convergence issue, WGANs equipped with Wasserstein distance, a special case of optimal transport (OT) distance, came into play. This modification improved the stability of GAN’s training process drastically and significantly alleviated mode collapse [8,9] common in early GAN models. Later, these WGANs were further generalized to a broader range of OT distances, and the Lipschitz smoothness constraint originally enforced with crudeness was also replaced with an entropic soft regularization term in the loss function [51–54]. These GANs are dubbed Smoothed WGAN, OT GAN, or Entropic GAN (EGAN), and the lower bound estimation of likelihood is also enabled due to their special properties [54].

Many real-world datasets indeed reside on low-dimensional manifolds [55] and the airfoil dataset should be no exception, as reflected in Ref. [44]. Therefore, we hypothesized that EGAN based on optimal transport might carry corresponding benefits for inverse design. Our entropic sibling of BézierGAN also employs the Bézier layer as the final layer of the generator. Likewise, it has no difference to regular EGANs other than that.

#### 3.3.1 Optimal Transport With Entropic Regularization.

*p*

_{r}(

**x**) and the generator’s approximate distribution

*p*

_{g}(

**x**)—the Kantorovich optimal transport distance regularized with entropy or equivalently KL divergence has the following expression [56,57]:

*p*

_{r},

*p*

_{g}) is the set of joint distributions $PX,X^$ whose marginal distributions equal

*p*

_{r}and

*p*

_{g},

*c*is cost function usually symmetric positive, and

*λ*≥ 0 is a weight controlling the degree of regularization. When

*c*= ‖ · ‖

_{2}and

*λ*= 0, this OT distance reduces to the well-known Wasserstein-1 distance used in WGAN [8,9], minimizing which is how WGAN is trained.

*λ*> 0. Though a direct optimization of Eq. (7) is highly intractable, due to the Fenchel–Rockafellar theorem, the strong duality holds so that we can evaluate its dual instead [56–58]:

*f*and

*g*are the Lagrange multipliers that can be optimized through a universal approximator like neural networks or the Sinkhorn algorithm introduced next. Equipped with this OT probability distance, in EGAN, instead of minimizing the JS divergence, we can train the generator

*G*, bringing

*p*

_{g}to

*p*

_{r}via [52,54]:

#### 3.3.2 Sinkhorn Divergence.

#### 3.3.3 Sinkhorn Algorithm.

Evaluating Eq. (8) via parameterized *f* and *g* represented by neural networks, as in Refs. [8,9,54], is time-consuming. To accelerate the training process, we can realize this clumsy maximization with the Sinkhorn algorithm [58], which is the coordinate ascent coming from the first-order optimality condition for Eq. (8). Normally for a *λ* not too close to 0, the Sinkhorn algorithm can converge within milliseconds [56]. Equation (11) can then be evaluated by applying the Sinkhorn algorithm to each of its three terms.

#### 3.3.4 Conditional Formulation and CEBGAN.

*p*

_{r}(

**x**|

**y**) with

*p*

_{g}(

**x**|

**y**) through minimizing the Sinkhorn divergence $S\lambda (pr(x,y)\Vert pg(x\u2223y)pr(y))$. This indicates we have to design a cost function $c([x,y],[x^,y^])$ for the prediction-condition bundles. One effortless way is to construct it by $c([x,y],[x^,y^])=c1(x,x^)+c2(y,y^)$. Specifically, for the evaluation of $S\lambda $ in our airfoil design application, we set

*λ*= 5 and build a shift-invariant cost function

*c*with

*L*

_{1}norm by

#### 3.3.5 Conditional Surrogate Log-Likelihood.

*c*is shift-invariant:

*H*is the Shannon-entropy function, $PZ$ is the empirical distribution of sampled noise, and $v\u22c6$ (Eq. (9)) is evaluated with the optimal $f\u22c6$ and $g\u22c6$. Then, for a given condition

**y**and a series of generated samples {

**x**

_{i}=

*G*(

**z**

_{i},

**y**)}

_{n}, we can use the RHS of Eq. (16) as the surrogate log-likelihood (SLL) to evaluate the plausibility of the bundles {[

**x**

_{i},

**y**]}

_{n}and select the ones with higher SLL as the samples more likely to have good quality. While sample likelihood is not always reliable for this purpose (especially when the generative model is ill-converged [59]) and the SLL only provides a lower bound, it is nevertheless our only resort for evaluating GAN-based sample likelihoods. (Recall from Sec. 2.3 that the lack of explicit exact sample likelihoods is GAN’s key weakness compared to VAE- or Flow-based models.) We will investigate its effectiveness statistically in the experiment section.

### 3.4 Metrics.

To evaluate the quality of our conditional GAN’s approximation to $p(x\u2223y,b,O)$ for the inverse airfoil design in a quantitative manner, we compute two classes of metrics. First, we compute the kernel maximum mean discrepancy (MMD)—a measure of distributional fit—that measures how well the generative model matches the training data. Second, we compute how the generative model reduces the optimality gap between the ground-truth optimal airfoil and the one produced by the conditional generative model. This directly measures how the generative model affects downstream optimization performance. We evaluate this in both the instantaneous setting and cumulatively, such as when the generative model warm-starts a traditional optimization process. In addition, we calculate the Pearson correlation coefficient between the surrogate log-likelihood of generated airfoils and their actual performance and plot its distribution.

#### 3.4.1 Maximum Mean Discrepancy.

*p*

_{r}(

**x**,

**y**) and

*p*

_{g}(

**x|**)

*y**p*

_{r}(

**y**). The MMD between two distributions

*p*(

*x*) and

*q*(

*y*) is defined by

*σ*= 1 as

*k*in our application as this is a common choice.

Though not as intuitive as traditional regression metrics such as MSE in giving an intuitive estimation of the error magnitude directly in the data space, MMD is much more justified for this work as MSE essentially comes from the KL divergence between the data distribution and the unimodal Gaussian regression model, whereas *p*_{g}(**x**|**y**) here is no longer such a simplified conditional distribution.

#### 3.4.2 Reduction in Instantaneous Optimality Gap.

Regarding the GAN prediction as a one-step optimization, instantaneous optimality gap checks the efficacy of the conditional GANs by comparing the performance (*C*_{l}/*C*_{d} ratio in our case) of the predicted airfoil to the performance of the optimized airfoil after one step (iteration) of the iterative adjoint method (i.e., *h*_{i=1} in Eq. (19)). Regardless of the instant time saving, we want to show that the one-step prediction using conditional GANs also surpasses an adjoint step using gradient-based optimization to improve the airfoil performance. If we ignore the time cost of the instant one-step prediction and compare the GAN predicted airfoil to the original design (i.e., *h*_{0} without any optimization) directly, the performance can have a more significant improvement (as demonstrated in Fig. 7).

#### 3.4.3 Reduction in Cumulative Optimality Gap.

While the Instantaneous Optimality Gap (often referred to and used in papers on so-called zero-shot optimization) is a useful performance measure, it ignores the fact that an optimizer can further refine a conditional generative model’s prediction. That is, a conditional model might *warm-start* a further optimization. This section describes how we calculate the amount of effort the conditional GANs can save in those cases.

*n*-iteration optimization history {

*h*

_{i}}

_{n}which records the performance

*h*

_{i}(

*i.e.*,

*C*

_{l}/

*C*

_{d}ratio) at each optimization iteration

*i*, after calculating the percentage of each

*h*

_{i}with respect to the optimal value

*h*

_{n}in the history, we define the

*cumulative optimality gap*(COG) as the area enclosed by the optimization history curve normalized in percentage and the horizontal line of $100%$ corresponding to

*h*

_{n}(a specific example can be seen in Fig. 9). In other words,

*relative reduction in COG*, namely

#### 3.4.4 Correlation Between Surrogate Log-Likelihood and C_{l}/C_{d} Efficiency.

*C*

_{l}/

*C*

_{d}efficiency in our case. This can be accomplished by first generating

*n*airfoils for each of the

*m*input conditions in the test set. Then, for each input condition indexed by

*i*, we evaluate the corresponding airfoil predictions’ surrogate values ${sj(i)}n$ and

*C*

_{l}/

*C*

_{d}efficiencies ${ej(i)}n$ and calculate the

*Pearson correlation coefficient*

*r*

_{i}between them:

*m*coefficients {

*r*

_{i}}

_{m}using a histogram. If the SLL is a practical indicator of a sample’s optimality, we can expect most correlation coefficients to be at least greater than 0 and, preferably, closer to the ideal or maximum value of 1.

## 4 Experiments, Results, and Discussion

We first use two simple experiments in low-dimensional spaces to visually study and illustrate the ability of and the performance differences between CGAN and CEGAN, specifically in:

Ability to converge to complicated conditional distributions.

Ability to capture multimodality of the distributions.

Because of the complexity of the posteriors, these two factors are predominantly the foundation of a generative model’s good performance in tackling high-dimensional inverse problems. Only a model with these two abilities can generate samples of good fidelity and handle the ubiquitous inversion ambiguity in inverse problems, i.e., having multiple solutions to the same input.

After using these simple illustrative examples to build intuition, we then employ both generative models on the more realistic problem of learning an optimal 2D airfoil manifold, which is intrinsically a high-dimensional conditional distribution. We compare their learning performance through the lens of reducing the optimality gap and time of airfoil CFD optimization, wherein we use the generative models to provide a good warm-start initialization in the neighborhood of the final solution. Moreover, we investigate the effectiveness of SLL in assessing generated sample’s quality.

### 4.1 Revisiting Regression.

This experiment aims to illustrate the ability of conditional GANs in approximating complicated conditional distributions and the performance differences between a traditional Gaussian regression model, the vanilla conditional GAN, and the entropic one based on optimal transport.

*x*

_{i},

*y*

_{i}} composed of 200 samples is then sampled from

*p*(

*x*,

*y*) =

*p*(

*y*|

*x*)

*p*(

*x*) where $p(x)=U(\u22121,1)$, a uniform distribution between −1 and 1. These samples are then fed into the generative models for training. They are illustrated in Fig. 4 as dots underneath, with each color corresponding to each Gaussian component.

Three regression models are then selected to retrieve this multimodal conditional distribution for comparison. The first one is a traditional unimodal Gaussian regression model $q(y\u2223x)=N(y\u2223f(x),\sigma 2)$ whose mean function *f* is represented by a neural network and trained with MSE loss as usual. The result is shown on the left of Fig. 4, with the mean function *f* plotted on top of the data points. Obviously, it can converge to neither one of the two modes because MSE is only minimized when the prediction approaches the average of the targets. From a probability point of view, this is because MSE originates from the KL divergence— $KL(p(x,y)\Vert q(y\u2223x)p(x))$—while minimizing this divergence can only lead *q*(*y* | *x*) to average across all the modes of *p*(*y* | *x*) [61]. This further explains why MSE is not an ideal metric for assessing learning results in more complicated applications wherein having mode collapse yet producing good quality samples is usually preferred over a mediocre overall convergence.

This conditional distribution is then approximated by CGAN and CEGAN, both having the same generator structure representing *q*(*y* | *x*) with more complexity than the former Gaussian model. After the same amount of training epochs, we can see from the sample points they generate—the dots on top of the data points in Fig. 4, with the CGAN’s prediction in the middle and the CEGAN’s on the right—that they both learn to recover the multimodality of the conditional distribution and have comparable convergence. Because of the additional discriminator training phase, each training epoch of CGAN is longer than that of CEGAN under our hyperparameter configuration, not to mention that it takes extra effort in tuning hyperparameters to redress the delicate balance between the discriminator and the generator when training vanilla GANs. These bolster our preference for CEGAN over CGAN, especially when applied to learning complicated conditional distributions.

### 4.2 Mode Collapse Examination.

In this experiment, we demonstrate EGAN’s ability to overcome mode collapse and the convergence issues in vanilla GANs and handle the multimodality of the target distribution.

*p*(

**x**) equivalently as

*p*

_{X|Y=y}(

**x**), i.e., a conditional distribution with its condition

*Y*fixed. Success in learning this distribution is a prerequisite for a high-performing conditional generative model.

The learning results are demonstrated in Fig. 5, with the 800 samples plotted as dots underneath and the generated samples of each model as dots above. The vanilla GAN fails to sufficiently converge to all modes in this case, while the EGAN trained with Sinkhorn divergence successfully captures all the 8 modes. Similar convergence issues of vanilla GAN are also reported in many works such as Ref. [49]. This result indicates that EGAN is a much better choice for solving inverse design problems where inversion ambiguity may exist.

### 4.3 Airfoil Prediction.

After grasping the general performance distinctions between CGAN and CEGAN with the help of those low-dimensional toy problems, we now turn to a more realistic problem. In this final experiment, with the Bézier layer equipped, we investigate CBGAN and CEBGAN’s ability to learn the high-dimensional posterior of optimal airfoils conditioned on the freestream conditions and target property. We generate the corresponding near-optimal airfoil on unseen input conditions and also use this as a warm-start initialization to short-circuit the subsequent optimizations. We quantify and compare these two GANs’ performances using the metrics defined in Sec. 3.4. Regarding CEBGAN, we examine its SLL’s efficacy in distinguishing good predictions from the poor ones after CEBGAN converges.

#### 4.3.1 Dataset and Training.

Our dataset contains 1245 optimized airfoils with corresponding input freestream conditions **b** and the target properties **y** (i.e., Ma, Re, and *C*_{l}).

**Optimized airfoils:** Each airfoil is optimized using the adjoint method described in Sec. 3.1. Each optimized airfoil consists of 192 surface points (192 × 2 coordinates) along with the optimized angle of attack, *α* (Fig. 6).

**Input conditions:** We generate *N* groups of Ma, Re, and *C*_{l} through the Latin Hypercube sampling strategy to evenly cover the design space. In this paper, we have Ma ranging from 0.2 to 0.9, Re from 10^{7} to 10^{8}, and *C*_{l} from 0.8 to 1.4. We excluded the input conditions leading to airfoil optimization failure (divergence that leads to negative or abnormally high efficiency). The final dataset contains 1245 groups of input conditions (Ma, Re, and *C*_{l}) and the corresponding 1245 optimized airfoils. The average time elapse of each SU2 airfoil optimization instance is 55.7 mins on the Deepthought2 HPC^{5} at UMD, with 500 instances running in parallel.

The 1245 airfoils are split into two parts: 995 airfoils for four-fold cross-validation and training the final generative models, and 250 airfoils reserved for testing. During the cross-validation phase, MMD (Sec. 3.4.1) is used for evaluating GANs’ convergence to the distribution of the validation set, against the value of which we adjust the hyperparameters accordingly, setting them to the candidate having the lowest MMD. Once every hyperparameter value is determined, all the 995 training samples are fed into CBGAN and CEBGAN to train the final generative models. The CBGAN is trained with a batch size of 32 for 160,000 iterations, and it takes 3h 51m to finish, while the CEBGAN with a batch size of 128 for 120,000 iterations and takes 2h. Both are trained on NVIDIA Tesla V100 DGXS 32GB GPU. It is hard to directly compare their training speed as they have different hyperparameters, training algorithms, and final performances, but roughly in the sense of time cost per iteration, CEBGAN is about 30.7% faster to train than CBGAN.

#### 4.3.2 Quantitative Performance of Conditional GANs.

The freestream conditions (Ma and Re) and target lift coefficients (*C*_{l}) of the 250 testing airfoils are fed into the trained CBGAN and CEBGAN models to generate 250 airfoils with fixed zero noise, respectively. The predicted airfoils’ efficiencies are evaluated by the SU2 simulator [62] and benchmarked with the ground truth (i.e., the optimal efficiency from a converged adjoint optimization) to demonstrate the performance of the two conditional GANs.

Specifically, for performance (*C*_{l}/*C*_{d}) validation of a predicted airfoil, the corresponding input conditions (Ma, Re, and *C*_{l}) of a predicted airfoil (including predicted *α*) together with other default freestream conditions^{6} are wrapped into a SU2 configuration file. GMSH generates a 2D mesh of the predicted airfoil (i.e., 192 × 2 coordinates) (as shown in Fig. 1), which we then write into an SU2 format. The efficiency is evaluated by solving the RANS equations on the predicted airfoil using air with the corresponding freestream conditions and target *C*_{l}.

The ground-truth efficiency value is acquired by performing the iterative adjoint optimization exhibited in Fig. 1. With the same input conditions, the eight candidate airfoils (Fig. 2) are iteratively optimized. The optimized one with the highest efficiency is the ground-truth optimal design, and the highest efficiency is the ground-truth optimal efficiency.

For easy comparison, we normalize the 250 testing samples’ efficiencies into percentages. The ground-truth optimal efficiency (i.e., *h*_{n} in Eq. (19)) of each airfoil is represented as $100%$, which is used as the comparison baseline. The other types of efficiencies can be represented as percentages by dividing with their corresponding ground-truth efficiency (i.e., *h*_{i}/*h*_{n}), indicating how close they are to the optimal efficiency.

*Reduction in instantaneous optimality gap:* For a more intuitive comparison, we first compute and compare the average reduction in instantaneous optimality gap (Sec. 3.4.2) of the 250 testing airfoils in different scenarios. Figure 7 illustrates the comparison:

The “Final: Adjoint” line indicates the average ground-truth efficiency of the 250 testing samples. As every ground-truth efficiency is represented as $100%$, the average has no variance.

The “Initial: CBGAN” line indicates the average efficiency of the 250 predicted airfoils from CBGAN, and the “Initial: CEBGAN” line indicates the average efficiency of the 250 CEBGAN generated airfoils. Without using any further optimization, the CBGAN predicted airfoils achieve an average of $80.8%$ of the ground-truth efficiency while the CEBGAN predicted airfoils achieve an average of $95.8%$ of the ground-truth efficiency.

The “No Optimization” line indicates the average efficiency ($22.9%$) of the original 250 airfoils before using adjoint optimization.

The “Initial: Adjoint” line indicates the average efficiency of the optimized 250 airfoils after taking one step (iteration) of adjoint optimization using the SU2 optimizer. These airfoils achieve an average of $49.3%$ of the ground-truth efficiency.

The box plot underneath indicates the minimum, the first quartile, the median, the third quartile, and the maximum of the 250 testing airfoils at each of the three scenarios (i.e., GAN prediction, one-step adjoint, and initial design). The medians—75.9%, 84.2%, 66.4%, 19.4% from left to right—do not coincide well with the means above, which suggests skewness in the distributions of instantaneous optimality gaps.

Both conditional GANs can quickly (<1 s) predict airfoils whose performance (i.e., efficiency) has been significantly improved (from $22.9%$ to $80.8%$ with CBGAN and $95.8%$ with CEBGAN for these 250 testing samples) compared to the original designs. This represents a reduction in instantaneous optimality gap (with respect to true optimal *C*_{l}/*C*_{d}). For CBGAN, the instantaneous gap is reduced by a factor of 4.0 × compared to initial airfoils and 2.6 × compared to one step of the adjoint method. For CEBGAN, the two numbers are 18.4 × and 12.1 ×, respectively. The average efficiency of the CEBGAN predicted airfoils has been close to the ground-truth one ($100%$). We have also computed a two-sample t-test on their difference in instantaneous optimality gap. This test resulted in *p* = 0.0001. Though the conditional GANs cannot directly generate the ground-truth optimal airfoils, they surpass an adjoint step and provide a warm start point for a restart optimization, as we show next.

*Reduction in Cumulative Optimality Gap:* We demonstrate the reduction in cumulative optimality gap (Sec. 3.4.3) by comparing the iterative optimization histories (averaged) of the 250 testing samples between the original adjoint optimization (using original airfoils as a start) and the restart adjoint optimization (using GAN predicted airfoils as a warm start). Figure 8 illustrates the comparison:

The “Final: Adjoint” line, the “No Optimization” line, and the three dashed lines indicate the same average efficiencies in Fig. 7.

The “Optimization: Adjoint” curve indicates the original adjoint optimization history of the 250 testing samples (in average). The average number of evaluations for the original adjoint optimization is 41. The fill of the same color indicates the efficiency standard deviation (±StdDev) of the 250 testing samples for each evaluation.

The “Optimization: CBGAN” and “Optimization: CEBGAN” curves indicate the restart adjoint optimization histories of the 250 testing samples (in average) with CBGAN and CEBGAN, respectively. The average number of evaluations of the restart adjoint optimization is 35 for CBGAN and 20 for CEBGAN. The fills of the same respective colors indicate the efficiency standard deviation (±StdDev) of the 250 testing samples for each evaluation.

The “Final: CBGAN” and “Final: CEBGAN” lines indicate the average efficiency of the optimal airfoils after the restart optimization with CBGAN and CEBGAN, respectively. As we can see, the restart optimization can lead to a slightly higher efficiency, which is $102.6%$ of the ground-truth efficiency for CBGAN and $98.2%$ of the ground-truth efficiency for CEBGAN.

If we compare the area enveloped by the “Optimization: Adjoint” curve, y-axis, and the “Final: Adjoint” line to the area enveloped by the “Optimization: CBGAN”/“Optimization: CEBGAN” curve, y-axis, and the “Final: CBGAN”/“Final: CEBGAN” line (as described in Sec. 3.4.3), we can see a significant relative reduction in cumulative optimality gap (RiCOG) by $41.6%$ for CBGAN and $91.3%$ for CEBGAN.

It is also worth noticing that both the CEBGAN and CBGAN can yield some predictions that either instantaneously goes beyond the 100% ground-truth optimal efficiency (as in Fig. 7), or can be further optimized to surpass this threshold (as in Fig. 8). This indicates that the dataset contains many locally optimal airfoil designs, even though we attempt to select the best one among the eight candidates, as mentioned in Sec. 3.1. Therefore, the conditional GANs are still learning the distribution of local minima. More discussion on this will be presented in Sec. 5.

**Reduction in Cumulative Optimality Gap of an Example Airfoil:** We demonstrate a single concrete case of one example airfoil (predicted using both CBGAN and CEBGAN) to show how the results in Fig. 4 translate to a single data point. Figure 9 demonstrates the example case:

The “Initial: Adjoint”, “Initial: CBGAN” and “Initial: CEBGAN” lines indicate the efficiencies of the one-step adjoint, CBGAN predicted, and CEBGAN predicted airfoils, respectively. The "No Optimization" line indicate the efficiency of the original airfoil.

The “Final: Adjoint”, “Final: CBGAN” and “Final: CEBGAN” lines indicate the efficiencies of the ground-truth optimal, CBGAN restart optimized, and CEBGAN restart optimized airfoils, respectively.

The “Optimization: Adjoint”, “Optimization: CBGAN” and “Optimization: CEBGAN” curves indicate the original, CBGAN, and CEBGAN restart adjoint optimization histories of this specific example airfoil.

Instead of using the percentage of the ground-truth optimal efficiency, in Fig. 9, the y-axis indicates the actual efficiency values (these are normalized to percentages to make multiple airfoils comparable on the same axis in Figs. 7 and 8). In Fig. 9, other than demonstrating the reduction in cumulative optimality gap of a specific case, we can also show the specific status of the airfoil in different stages. For example, we provide the airfoil shapes in four different stages (original airfoil, optimal airfoil after original adjoint optimization, GAN predicted airfoils, and the optimized airfoils after restart adjoint optimization). As a warm start, the GAN predicted airfoils can achieve the optima (even higher efficiency) faster than using the original adjoint optimization. With the GAN predicted airfoil, even if we halt the restart optimization halfway, we can still achieve an airfoil of similar quality to a full adjoint run. Figure 10 uses histograms and kernel density estimation (KDE) to show the distribution of the relative reduction in cumulative optimality gap (RiCOG) for the 250 testing samples.

#### 4.3.3 Practicability of the Surrogate Log-Likelihood.

In the above example, we fixed the noise vector of the generator to demonstrate an example case. However, fixing this noise vector can only let us produce a single sample for each input condition. This becomes inappropriate when the diversity of predictions is critical or when inversion ambiguity exists. Multiple modes can correspond to the same input condition (e.g., the conditional distribution in Fig. 5 or multiple, approximately equally good airfoils for a given set of boundary conditions). In that case, we need to randomly sample a large batch of noise vectors from the noise distribution *p*(**z**) and use them to produce many valid samples from across all modes.

Assuming we can sample the generator, how do we distinguish samples with a higher likelihood of being high quality? Conceptually, these samples should lie within the vicinity of the conditional distribution modes, while low-performing samples should lie in the long tails of the distribution. How can we assess this without running any further costly CFD simulations?

As elucidated before, unlike traditional GANs, EGAN provides us with a surrogate log-likelihood which is essentially a lower bound of the sample’s likelihood. It is then promising to use this surrogate value as an indicator of a sample’s quality. That is, the higher the surrogate value, the better the quality, provided the EGAN converges well to the target conditional distribution, as demonstrated by Balaji et al. [54]. To verify this experimentally for CEBGAN, we randomly sample 10 airfoils for each of the 250 test input conditions and calculate the Pearson correlation coefficient between the surrogate values of the airfoils and their ground-truth *C*_{l}/*C*_{d} efficiencies, as introduced in Sec. 3.4.4. If the surrogate likelihood is a meaningfully useful quality measure, it should have a high correlation with the expensive-to-compute ground truth CFD simulation values.

For this task, we draw 4096 samples from the distribution (Eq. (17)) when evaluating the expectation in Eq. (16). The distribution of these coefficients for the 250 test samples is illustrated in Fig. 11. Unfortunately, the correlations are almost symmetrically distributed around 0. This implies that the SLL does not usefully discriminate between high- and low-quality predictions for inverse airfoil design. We may also use a scatter plot to illustrate the correlation between SLL and *C*_{l}/*C*_{d} ratio more concretely. Before doing this, for each of the 250 input conditions, we need to normalize the SLL and the *C*_{l}/*C*_{d} ratio over all the 10 trials with mean and standard deviation to unify their range across input conditions and mitigate the distortion of outliers. Figure 12 demonstrates the distribution of these normalized points, where we can neither notice any apparent positive correlation. More discussions on the possible causes are included in Sec. 5.

## 5 Limitations and Future Work

Our inverse design methodology shows its potential in accelerating design optimization. However, before extrapolating it to the other tasks, one should be aware of several limitations of our data-driven inverse design paradigm.

First, because of the data-driven training process, our GANs’ performance depends on the quality of the training dataset. Thus, if our optimal airfoil dataset obtained through the CFD adjoint method consists of many local optima, we should not expect our GANs to always yield global optima on either seen or unseen input conditions. We observe that our GANs can generate airfoils of higher performance than our targets in the dataset, so this local optima issue indeed exists in our current dataset. Likewise, the diversity of training samples directly affects the diversity of conditional GAN’s generation. While generating the dataset, we used eight airfoils generated by the BézierGAN as our initialization points, but for at least some input conditions as shown in Fig. 2, the final optimized airfoils more or less have shapes similar to their original counterparts. It is impossible that these eight typical shapes can represent all the airfoils. Therefore, just like in many other design tasks, we made global optimality and diversity optional. Despite this limitation, our conditional GANs still demonstrate their ability to accurately approximate the posterior imperfectly represented by the dataset. This data-driven paradigm is still valuable in many tasks where global optimality and diversity of results are not mandatory.

One would likely then propose to increase the dataset size to overcome these issues. However, the optimal airfoil dataset generation is very time-consuming, and it is impractical to retrieve all optima. One possible solution is to use a low-dimensional design representation, as presented in Ref. [44], to accelerate the optimization process. Another more radical way is to discard this data generating process and use variational inference to directly learn the posterior, which could be an interesting research topic in the future.

In addition, the surrogate log likelihood—the lower bound to the exact sample log likelihood—cannot effectively differentiate between samples of good and bad quality. There are several hypotheses on this that might be worth probing in the future: 1. Sample likelihood is just not very related to sample’s quality, let alone its lower bound approximation; 2. The lower bound is not tight enough to reveal the sample likelihood’s positive correlation with sample quality; 3. This is cause by the incompleteness of the training dataset such that many designs of better performance are either downplayed or neglected in this empirical distribution, thus assigned lower likelihood by the CEGAN after training, but they somehow get recovered when generating the predictions.

Furthermore, this paper did not address two important factors that affect real-world use cases. First, we did not address scalability—the relation between the design problem’s dimensionality and the number of data samples required to maintain a good approximation. Higher dimensions could occur from the design parameterization (e.g., 3D versus 2D), the input conditions (e.g., including heat transfer inputs), or the design objectives (e.g., optimizing not only drag, but also manufacturability or vibration). Increasing any of these would complicate Inverse Design and require a larger number of data samples. Second, we did not address how to handle design constraints (aside from those not implicitly encoded in the training dataset). Future work could address how inverse design can more explicitly adapt and capture new design constraints without requiring retraining.

## 6 Conclusions

In this work, we employed two conditional GANs—CBGAN and CEBGAN—to approximate the posterior of the optimal airfoils conditioned on the freestream conditions and the target properties. Then, given unseen input conditions, we could generate warm start initialization points near the optima and accelerate the subsequent airfoil shape optimization. Our results show that both generative models can accomplish this task, but the CEBGAN—CEGAN based on regularized optimal transport and equipped with Bézier layer—performs uniformly better than that of vanilla CBGAN, either in terms of training speed or prediction accuracy among other metrics. CEGAN’s advantage in approximating the multimodal distribution also manifests in simple toy examples. In addition, unlike traditional GANs, CEGAN also provides us with the unique ability to approximate the sample likelihood via a lower bound dubbed the surrogate log-likelihood, though its potential in selecting good samples remains to be investigated and uncovered.

Our GAN-based probabilistic inverse design paradigm is applicable to inverse airfoil design and the other inverse problems where high-dimensional posteriors need to be approximated. The Bézier layer we employed is suitable for curve-related design problems, but in other problems we can replace the Bézier layer with different architectures. For example, we can use a free-form deformation (FFD) layer for 3D airfoil design problems [63], or use standard 2D/3D convolutional layers for pixelated/voxelized designs [38]. The advantages brought by the conditional GAN framework and the entropic regularization should be invariant to such architecture adjustment. In future work, we will test our method on more applications to demonstrate this point. The performance and surrogate log-likelihood of CEGAN suggest that it is a GAN more suitable for these posterior-retrieving tasks.

Though we solely focused on GANs’ application in this inverse design work, it does not exclude the potential of using other generative models for similar tasks. Our preference for GANs is based on the widely alleged good quality of its generated samples in other works and the possibility of extending its power to more complicated inverse designs in the future. However, for many inverse design problems, other generative models like VAEs and flow-based models may also generate predictions of comparable quality, and given their more straightforward and accurate sample likelihood evaluation process, they may even be better candidates than GANs in certain tasks. This is worth investigating in the future.

## Footnotes

https://su2code.github.io/. SU2 can deal with different kinds of physical problems by choosing different solvers such as Euler’s Navier-Stokes” and Reynolds-averaged Navier–Stokes (RANS)’ equations. In this paper, we optimize airfoils by solving the RANS equation.

In this paper, air’s freestream pressure is 101,325 Pa, and the freestream temperature is 288.15 K.

See Note ^{2}.

## Acknowledgment

This research was supported in part by funding from the U.S. Department of Energy’s Advanced Research Projects Agency-Energy (ARPA-E) DIFFERENTIATE funding opportunity through award DE-AR0001216. The authors acknowledge the University of Maryland supercomputing resources^{7} made available for conducting the research reported in this paper.

## Conflict of Interest

There are no conflicts of interest.

## Data Availability Statement

The data and information that support the findings of this article are freely available.^{8} The authors attest that all data for this study are included in the paper.