Abstract

Reinforcement learning algorithms can autonomously learn to search a design space for high-performance solutions. However, modern engineering often entails the use of computationally intensive simulation, which can lead to slower design timelines with highly iterative approaches such as reinforcement learning. This work provides a reinforcement learning framework that leverages models of varying fidelity to enable an effective solution search while reducing overall computational needs. Specifically, it utilizes models of varying fidelity while training the agent, iteratively progressing from low to high fidelity. To demonstrate the effectiveness of the proposed framework, we apply it to two multimodal multi-objective constrained mixed integer nonlinear design problems involving the components of a ground and an aerial vehicle. For each problem, we utilize a high-fidelity and a low-fidelity deep neural network surrogate model, trained on performance data generated from underlying ground truth models. A tradeoff between solution quality and the proportion of low-fidelity surrogate model usage is observed: high-quality solutions are achieved with substantial reductions in computational expense, showcasing the effectiveness of the framework for design problems where the use of just a high-fidelity model is infeasible. This tradeoff between solution quality and computational efficiency is contextualized by visualizing the exploration behavior of the design agents.

1 Introduction

The discrete and multimodal nature of high-dimensional engineering design problems makes design synthesis challenging. This challenge is potentially met through deep Reinforcement Learning (RL) algorithms, which can autonomously learn to explore the design space based on the nature of the design problem [1–4]. However, in many modern design problems, expensive high-fidelity representations and simulations are necessary to accurately evaluate the performance of a potential solution. This can adversely affect the overall computational efficiency of deep RL algorithms, leading to slower design timelines and higher economic and environmental risks [5,6]. As lower-fidelity simulations still contain potentially valuable information about the performance of a solution, they can be utilized to reduce the computational expense of exploration. This work explores the tradeoff between solution performance and computational efficiency when using different combinations of high- and low-fidelity models. Specifically, this work proposes an RL framework that utilizes models of varying fidelity [7–12] to search the design space for high-performance solutions.

Engineering analysis models serve the purpose of characterizing the relationship between the design of an engineered system and its performance attributes. The degree to which such a model can reproduce the behavior of a real-world system is referred to as model fidelity [11]. High-fidelity models typically use computationally expensive numerical simulations to accurately capture the underlying relationship of interest. Low-fidelity models are usually a simplification of the high-fidelity model. This can involve utilizing simpler geometric representations or physics models, simulating in a reduced dimensional space, using partially converged results, or preparing a data-fit surrogate model with high-fidelity simulation data [7]. While low-fidelity models are less accurate, they offer the advantage of being computationally cheaper than high-fidelity ones.

The combination of models at varying fidelity levels is common in engineering practice [7,9,10,12,13]. For instance, a first-order approximation is found in the design of buildings for seismic loading, where the dominant vibrational frequency of a building is approximated as the reciprocal of the number of stories in the building [14]. The approximation is typically used to complement high-fidelity seismic damage simulations for the safety of dense urban areas [15]. A variable-fidelity strategy that utilizes low-fidelity models for early-stage design exploration and high-fidelity models for the later stages may be able to balance the tradeoff between computational efficiency and solution quality. Further, the utilization of RL in such a variable-fidelity strategy makes it possible to learn exploration strategies that benefit from the varied feedback received at low-fidelity and high-fidelity levels. This strategy could involve the use of design representations and analysis models of varying fidelity during exploration. The proposed RL framework assumes a fixed design representation and encompasses analysis models of varying fidelity that compose the reward formulation.

The rest of the paper is organized as follows. Section 2 provides a brief introduction to design space exploration and discusses the potential of deep RL as an autonomous design optimizer. In Sec. 3, we propose an RL framework for design wherein the agent trades off between models with different computational costs and levels of fidelity, and we detail the other methodologies used in this work. In Sec. 4, two multimodal multi-objective constrained mixed integer nonlinear design problems are introduced to demonstrate the effectiveness of the proposed framework. Section 5 presents the results of the case studies, including an analysis of exploration behavior and an assessment of the tradeoff between solution quality and computational efficiency. Section 6 summarizes the contribution of the paper and proposes several directions for future work.

2 Background

2.1 Design Space Exploration.

The design of engineered systems often involves the abstract arrangement of components, the selection of specific components for the arrangement, and the assignment of parameter values to the parameterized components. In some cases, design also entails the synthesis of new components; however, when no new components are being synthesized, the design problem reduces to a configuration design problem [16,17]. When the arrangement of components is fixed, the design task reduces to a skeletal design problem [16], which is the focus of this work. It involves selection from sets of all the types of components (e.g., battery choice, controller choice) and assignment of values to the discrete and continuous parameters associated with each component (e.g., physical parameters governing component size or cyber parameters in the controller cost function). The design space of the system is composed of all combinations of the design variables, including both component choices and the associated component parameters. These design variables can be used to compute multiple objectives involving system performance, cost, and other relevant attributes associated with different disciplinary domains [18].

Decision-making involved in the design of an engineered system is sequential in nature [19]. Specifically, it involves searching the design space of the system to determine which combination of variables yields optimal designs. This is referred to as design space exploration [18]. However, when the design space is enormous, it may be infeasible to achieve designs that meet optimality criteria. Accordingly, algorithms attempt to search the design space in an optimally directed fashion to reach designs that satisfice [20,21]. Because design problems are often multimodal and involve discrete design variables, the use of gradient-based optimization algorithms is limited. Rather, gradient-free optimization algorithms are preferred for exploring the design space. For instance, Stoecklein et al. [22] employed an evolutionary algorithm for a highly multimodal and discrete design problem involving micropillar sequences for fluid flow sculpting. However, research on optimization algorithms shows that there is no single algorithm whose performance dominates others [23]. Moreover, that work found that every algorithm can provide the best solution for at least some problems. Accordingly, the designer will need to iteratively implement different algorithms to find the most suitable one. For instance, Saldanha et al. [24] demonstrate a methodology for choosing the best evolutionary algorithm for a heat exchanger design problem from a finite set of algorithm alternatives.

A methodology that can autonomously choose or learn an algorithm could therefore be beneficial for design space exploration. Li and Malik [1] have demonstrated that algorithms designed by an RL agent can outperform existing algorithms in terms of solution quality and computational efficiency. To this end, we utilize RL for autonomously exploring the design space of engineered systems.

2.2 Reinforcement Learning-Driven Design.

RL algorithms [25] can iteratively learn effective strategies for the sequential decision-making task of exploring the design space [3,4]. Moreover, they can leverage exploration data in future iterations more efficiently than other design algorithms. For instance, Lee et al. [2] have identified deep RL approaches to be more data efficient than evolutionary optimization approaches for a multimodal and discrete fluid flow sculpting design problem. To emphasize, on the one hand, the genetic algorithm-based design approach uses a widely applicable heuristic at each iteration of the exploration. On the other hand, an RL agent learns to explore by creating a mapping from the design space to an action space that maximizes the long-term collection of rewards across several iterations. By learning strategies specific to the characteristics of the design space, it attempts to maximize the solution quality for the design problem. Further, RL-based design approaches possess generalization and transfer capabilities [3,26,27]. Lastly, when compared to other machine learning-based design approaches, an RL approach can accommodate non-differentiable objective formulations and is not limited by the data input by the designer [28].

While RL algorithms can learn to optimize efficiently from the agent's experience, many engineered systems demand the use of expensive high-fidelity representations and simulations (like computational fluid dynamics or finite element analysis) for evaluating objective functions and constraints that compose the agent reward. First, this can adversely affect the computational efficiency of the RL algorithm, leading to slower design timelines, at a time of increasing interest in reducing these timelines from years to months [29]. Second, the high computational and energy expense incurred in deep learning implementations is becoming economically and environmentally unsustainable [5,6]. Martínez-Plumed et al. [30] have identified that insufficient effort has been put toward dimensions like computational and data efficiency in the race to achieve performance benchmarks. For instance, an open reimplementation of the RL-based AlphaZero was trained using 2000 NVIDIA V100 graphics processing units (GPU) with 87 years of GPU time [31]. These aspects limit the applicability of standard deep RL algorithms for engineering design. It is therefore important to develop frameworks that improve their computational efficiency. The sequential design process typically involves the sequencing of representations and analysis models of varying fidelity to reduce computational demands [8]. For instance, Mehmani et al. [9] and Wang et al. [10] utilize models of varying fidelity by progressively transitioning to higher fidelity levels to find solutions to design problems efficiently. Accordingly, we hypothesize that an RL approach that progressively utilizes models from low to high fidelity [7] could achieve the desired solution quality at a reasonable computational cost.

3 Methodology

This work proposes an RL framework for design space exploration using models of varying fidelity. This section outlines the specific methodology used to construct the framework. In Sec. 3.1, we formalize the skeletal design problem as a multi-objective constrained mixed integer problem. Further, we build upon this to prepare a mathematical formulation of the problem involving models of varying fidelity. Based on this formulation, Sec. 3.2 proposes the RL framework and details the agent–design space interaction. Section 3.3 outlines the methodology for training neural networks to serve as tunable surrogate models of varying fidelity. Section 3.4 describes the parametric study for training RL agents with different proportions of a low- and high-fidelity model.

3.1 Skeletal Design Problem Formulation.

The skeletal design addressed in this work involves the optimization of multiple objectives (f(x, y)) defined by several continuous (x) and discrete (y) design variables such that a set of inequality (g(x, y)) and equality (h(x, y)) constraints are satisfied. For a system involving p objectives, m continuous variables, n discrete variables, r inequality constraints, and s equality constraints, this is mathematically defined in negative null form according to the traditional optimization paradigm as follows:
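$$
\begin{aligned}
\text{minimize} \quad & f(x, y) : (\mathbb{R}^m, \mathbb{Z}^n) \rightarrow \mathbb{R}^p \\
\text{s.t.} \quad & g(x, y) \le 0 \quad \text{where } g \in \mathbb{R}^r \\
\text{and} \quad & h(x, y) = 0 \quad \text{where } h \in \mathbb{R}^s
\end{aligned}
\tag{1}
$$
where ℝ is the set of real numbers and ℤ is the set of integers.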
The RL framework is further based on objective and constraint models of varying accuracy and computational efficiency. These objectives are held in a matrix, F, of size p × qf, referred to as the objective fidelity matrix, where p is the number of objectives and qf is the maximum number of fidelity levels at which any objective is defined. We intentionally make few assumptions about the form of the engineering analysis models from which these objectives are evaluated. For instance, some objective terms may not be computable for every model. As a convention, the entries for each objective are ordered from the lowest to the highest fidelity level. Further, for objectives with fewer than qf fidelity levels, the remaining terms of the corresponding row are kept 0. The matrix is defined as follows:
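$$
F(x, y) = \begin{pmatrix} f_{11} & \cdots & f_{1 q_f} \\ \vdots & f_{ij} & \vdots \\ f_{p1} & \cdots & f_{p q_f} \end{pmatrix} \tag{2}
$$
where $f_{ij}$ is the $i$th objective defined at the $j$th fidelity level.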
Further, we define the matrix, G, of size r × qg, referred to as the inequality constraint fidelity matrix, where r is the number of inequality constraints and qg is the maximum number of fidelity levels at which any inequality constraint is defined. These constraints may or may not be associated with the same engineering analysis models as in the objective fidelity matrix. For instance, a problem may have an objective evaluated using a fluid flow model, while the constraint could be evaluated using a structural model. Further, the ordering of constraints and the absent fidelity levels for a specific constraint follow the same treatment as the objective fidelity matrix. The matrix is defined as follows:
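$$
G(x, y) = \begin{pmatrix} g_{11} & \cdots & g_{1 q_g} \\ \vdots & g_{ij} & \vdots \\ g_{r1} & \cdots & g_{r q_g} \end{pmatrix} \tag{3}
$$
where $g_{ij}$ is the $i$th inequality constraint defined at the $j$th fidelity level.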
Similarly, we define the matrix, H, of size s × qh, referred to as the equality constraint fidelity matrix, where s is the number of equality constraints and qh is the maximum number of fidelity levels at which any equality constraint is defined:
$$
H(x, y) = \begin{pmatrix} h_{11} & \cdots & h_{1 q_h} \\ \vdots & h_{ij} & \vdots \\ h_{s1} & \cdots & h_{s q_h} \end{pmatrix} \tag{4}
$$
where $h_{ij}$ is the $i$th equality constraint defined at the $j$th fidelity level.
To formulate the scalar reward for an RL algorithm, we define an objective weighting matrix Wf, of size p × qf. The objective weighting matrix is defined as follows:
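$$
W_f = \begin{pmatrix} w_{11} & \cdots & w_{1 q_f} \\ \vdots & w_{ij} & \vdots \\ w_{p1} & \cdots & w_{p q_f} \end{pmatrix} \tag{5}
$$
where $w_{ij}$ is the weight toward the $i$th objective defined at the $j$th fidelity level.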
The weighted sum of the objective fidelity matrix using the objective weighting matrix essentially reduces the objective to a scalar function, f′ as defined below:
$$
f'(x, y, W_f) = \operatorname{sum}\left( W_f \circ F(x, y) \right) \tag{6}
$$
where $\circ$ represents the Hadamard product.

Similarly, we define constraint weighting matrices, Wg and Wh, of size r × qg and s × qh, respectively. Accordingly, the weighted constraint fidelity matrices, G′ and H′ are defined as follows:

$$
G'(x, y, W_g) = W_g \circ G(x, y) \tag{7}
$$
$$
H'(x, y, W_h) = W_h \circ H(x, y) \tag{8}
$$

Unlike the objective matrix, these do not involve a summation of the terms, which makes it possible to penalize the agent individually for every constraint it violates, as detailed in the reward formulation in Sec. 3.2.

While it is customary in optimization algorithms to reduce a multi-objective problem to a single-objective problem by a weighted sum, our approach also provides the flexibility to choose different fidelity levels for different objectives and constraints in different portions of the search. Computational expense is reduced when some of the objectives and constraints rely only on a low-fidelity model in certain portions of the search. Specifically, this is achieved by using sparse weights across the fidelity levels of a particular objective or constraint.
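The weighted sum of Eq. (6) and the element-wise weighting of Eqs. (7) and (8) can be illustrated with a minimal NumPy sketch. The function names and the numerical values below are hypothetical and chosen only for illustration; they are not taken from the case studies.

```python
import numpy as np

def scalar_objective(F, W_f):
    """Scalar objective f' = sum(W_f o F), with 'o' the Hadamard product."""
    F = np.asarray(F, dtype=float)
    W_f = np.asarray(W_f, dtype=float)
    if F.shape != W_f.shape:
        raise ValueError("F and W_f must both have shape (p, q_f)")
    return float(np.sum(W_f * F))  # element-wise product, then sum of all terms

def weighted_constraints(G, W_g):
    """Weighted constraint matrix G' = W_g o G (no summation)."""
    return np.asarray(W_g, dtype=float) * np.asarray(G, dtype=float)

# Hypothetical example: p = 2 objectives, q_f = 2 fidelity levels (low, high).
F = np.array([[1.0, 1.2],
              [3.0, 2.5]])
# Sparse weights: objective 1 uses its low-fidelity value,
# objective 2 uses its high-fidelity value.
W_f = np.array([[0.4, 0.0],
                [0.0, 0.6]])
f_prime = scalar_objective(F, W_f)  # 0.4 * 1.0 + 0.6 * 2.5

G = np.array([[0.3, 0.1]])
W_g = np.array([[1.0, 0.0]])        # only the low-fidelity constraint is active
G_prime = weighted_constraints(G, W_g)
```

In this sketch every entry of F is supplied up front; in practice, an entry with zero weight would not need to be evaluated at all, which is where the computational saving comes from.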

3.2 Reinforcement Learning Framework.

A reinforcement learning-based design agent can solve the skeletal design problem by starting with a seed design and iteratively tuning the continuous and discrete variables to minimize the objective f′ while satisfying the constraints G′ and H′. The agent–design space interaction as the agent transitions from state t to state t + 1 during training is illustrated in Fig. 1 and discussed below.

Fig. 1
Reinforcement learning framework based on models of varying fidelity
Like any RL agent, the design agent defined here needs to learn to take actions (a) in an agent state (s) based on the feedback received in the form of scalar rewards (R). Specifically, the agent needs to learn a policy, i.e., a mapping from the state space to the action space to maximize the sum of rewards it sees over time. At the iteration t, the agent state (st) is composed of the design state (dt) and the elements of the weighting matrices in the iteration as specified by the weighting schedules ((Wf)t, (Wg)t, (Wh)t). The design state (dt) is defined by the design variables (xt, yt) that define the system. Accordingly, the agent state is defined as follows:
$$
s_t = \{ x_t, y_t, (W_f)_t, (W_g)_t, (W_h)_t \} \tag{9}
$$
The agent actions (at) define how much each design variable needs to be incremented or decremented in the iteration. For the discrete variables, we utilize a rounding approach that generalizes well across the available discrete options [32].
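$$
a_t = \{ (a_x)_t \in \mathbb{R}^m, \ (a_y)_t \in \mathbb{R}^n \} \tag{10}
$$
$$
x_{t+1} = x_t + (a_x)_t \tag{11}
$$
$$
y_{t+1} = y_t + \lfloor (a_y)_t \rceil \tag{12}
$$
where $\lfloor \cdot \rceil$ denotes the nearest integer function.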
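The state composition and the design update described above can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the implementation used in the study: the function names are hypothetical, the weighting matrices are simply flattened into the state vector, and `np.rint` is used for the nearest integer function (it rounds exact halves to the nearest even integer, a detail the text does not specify).

```python
import numpy as np

def agent_state(x, y, W_f, W_g, W_h):
    """Concatenate design variables and flattened weighting matrices into one vector."""
    parts = [x, y, W_f, W_g, W_h]
    return np.concatenate([np.ravel(np.asarray(p, dtype=float)) for p in parts])

def apply_action(x, y, a_x, a_y):
    """Continuous variables move by a_x; discrete variables move by round(a_y)."""
    x_next = np.asarray(x, dtype=float) + np.asarray(a_x, dtype=float)
    # Round the real-valued action to the nearest integer before adding it.
    y_next = np.asarray(y, dtype=int) + np.rint(a_y).astype(int)
    return x_next, y_next

# Hypothetical design with two continuous and two discrete variables.
x_next, y_next = apply_action([0.5, 1.0], [3, 7], [0.1, -0.2], [1.4, -0.6])
state = agent_state(x_next, y_next, [[1.0, 0.0]], [[1.0]], [[1.0]])
```

Bounds on the design variables are not enforced here; a practical implementation would also need to keep the updated variables within their allowable ranges.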

While the agent cannot modify the weighting matrices, the contents of those matrices are still useful to the agent to condition its learning at different fidelity levels. For instance, if the agent happens to be in the same design state at different fidelity levels, having information about the operating fidelity level would enable it to learn to make decisions based on the reward computed using that specific fidelity level. However, as the actions depend on both the design state and the fidelity level, there would be an interaction between them.

The agent reward (Rt+1) measures the quality of the action (at) that transitions the design from state dt to dt+1. This depends on the amount by which the scalar objective f′ reduces. Further, when the agent is in the infeasible domain, the change in the amounts by which each of the constraints in the weighted constraint matrices (G′, H′) are violated would guide the agent to navigate to feasible regions. Accordingly, three reward functions (Rf, Rg, Rh) are defined that compose the agent reward as shown below:
$$
R_{t+1} = (R_f)_{t+1} + (R_g)_{t+1} + (R_h)_{t+1} \tag{13}
$$
where $R_f$ is a function that rewards or penalizes the agent based on how much the objective value reduces or increases in the iteration, i.e.,
$$
(R_f)_{t+1} = f'_t - f'_{t+1} \tag{14}
$$
$R_g$ is a function that penalizes or rewards the agent based on how much more or less it violates a constraint when it is in the infeasible region, i.e.,
$$
(R_g)_{t+1} = \sum_{k=1}^{r} \sum_{l=1}^{q_g} \left( \max\left(0, (G'_t)_{kl}\right) - \max\left(0, (G'_{t+1})_{kl}\right) \right) \tag{15}
$$
$R_h$ is a function that penalizes or rewards the agent for how much more or less it steps away from the equality constraint hypersurface, i.e.,
$$
(R_h)_{t+1} = \sum_{k=1}^{s} \sum_{l=1}^{q_h} \left( \left| (H'_t)_{kl} \right| - \left| (H'_{t+1})_{kl} \right| \right) \tag{16}
$$
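The three reward terms can be sketched in a few lines of NumPy. The sketch assumes that the reward is the decrease in each quantity from iteration t to t + 1, which is how the terms are described in words above; the function name and the numbers are hypothetical.

```python
import numpy as np

def step_reward(f_t, f_next, G_t, G_next, H_t, H_next):
    """Reward for one transition, as the sum of objective and constraint terms.

    f_*: scalar weighted objective f'; G_*, H_*: weighted constraint matrices.
    """
    G_t, G_next = np.asarray(G_t, dtype=float), np.asarray(G_next, dtype=float)
    H_t, H_next = np.asarray(H_t, dtype=float), np.asarray(H_next, dtype=float)
    r_f = f_t - f_next                                             # objective decrease
    r_g = np.sum(np.maximum(0.0, G_t) - np.maximum(0.0, G_next))   # less inequality violation
    r_h = np.sum(np.abs(H_t) - np.abs(H_next))                     # closer to equality surface
    return float(r_f + r_g + r_h)

# Hypothetical transition: the objective improves by 1.0, one violated
# inequality shrinks from 0.3 to 0.1, and the equality residual from 0.5 to 0.2.
reward = step_reward(
    f_t=5.0, f_next=4.0,
    G_t=[[0.3], [-1.0]], G_next=[[0.1], [-2.0]],
    H_t=[[0.5]], H_next=[[-0.2]],
)
```

Note that a satisfied inequality constraint (a negative entry) contributes nothing, so the agent is neither rewarded nor penalized for moving further into the feasible region of that constraint.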
Lastly, to evaluate a policy that is trained using the proposed framework, a design quality metric, Q, is defined as follows:
$$
Q = -f'_{q_f} - \sum_{k=1}^{r} \max\left(0, (G')_{k q_g}\right) - \sum_{k=1}^{s} \left| (H')_{k q_h} \right| \tag{17}
$$
where $q_f$, $q_g$, and $q_h$ indicate the highest fidelity levels associated with each term.

3.3 Training Deep Neural Network Surrogates.

To demonstrate the proposed approach of RL using models of varying fidelity, we use neural networks as a tunable approach to construct surrogate models of varying accuracy and cost based on data sets generated from an underlying ground truth model. Neural networks were chosen because they can be easily tuned to achieve different levels of fidelity, for instance by varying the network architecture or the training data set. Hereinafter, the methodology to train the networks is detailed.

In this work, the weights toward various objectives and constraints at a specific fidelity level are kept constant throughout the training. The multi-fidelity multi-objective formulation therefore reduces to just a multi-fidelity formulation. This also permitted the utilization of a single neural network to predict all the objectives and constraint terms at a particular fidelity level. Specifically, we utilize two neural networks serving as models of high- and low-fidelity to predict the objective and constraint terms.

The number of neurons in the input layer is equal to the sum of the number of design variables (m + n). Further, the number of neurons in the output layer is equal to the sum of the number of objectives and constraints (p + r + s). The number of hidden layers, i.e., the depth of the neural network and the number of neurons per layer, i.e., the width of the neural network are the key components in the design of the neural network. They are tuned to obtain models of varying fidelity. For instance, a model with zero hidden layers and linear activation functions can serve as a model of low fidelity. On the other hand, a network with large width and depth can serve as a high-fidelity model.

The accuracy of a surrogate model is also influenced by the number of samples utilized to construct the surrogate [7]. In the context of neural networks, the size of the dataset that is used for training influences the model accuracy [33–35]. In this work, both the architecture of the neural network and the size of the dataset used for training were varied to prepare models of varying fidelity. Specifically, deeper networks and larger datasets were used for higher levels of fidelity. The depth of the networks and the size of the datasets used for training were found by iterative tuning to prepare disparate models. This process was supported by AutoKeras [36] in some cases, as it offers the flexibility of training models of varying fidelity by explicitly specifying the maximum number of parameters allowed while searching for a neural network architecture that performs well on a dataset.

3.4 Training and Evaluating Reinforcement Learning Agents.

To demonstrate the tradeoff between computational efficiency and solution quality using the proposed RL framework, a parametric study is conducted with varying training schedules as shown in Fig. 2. As the formulation is reduced from multi-objective multi-fidelity to bi-fidelity, we utilize a scalar binary parameter, w′, that determines the operating fidelity level (w′ = 0 for low-fidelity, w′ = 1 for high-fidelity). The parameter n in Fig. 2 (not to be confused with the number of discrete variables in Sec. 3.1) determines the iteration at which the fidelity level switches from the low-fidelity neural network to the high-fidelity one. This essentially governs the proportion of usage of the low- and high-fidelity models.
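A step schedule of this kind can be sketched as below. The function names are hypothetical, and the sketch assumes that the high-fidelity model is used from iteration n onward (whether the switch iteration itself is low- or high-fidelity is not stated in the text).

```python
def fidelity_weight(iteration, switch_iteration):
    """Return w' = 0 (low fidelity) before the switch and w' = 1 (high fidelity) after."""
    return 0 if iteration < switch_iteration else 1

def low_fidelity_fraction(total_iterations, switch_iteration):
    """Proportion of training iterations evaluated with the low-fidelity model."""
    n = min(max(switch_iteration, 0), total_iterations)
    return n / total_iterations

# Hypothetical schedule: switch after three of five iterations.
schedule = [fidelity_weight(t, switch_iteration=3) for t in range(5)]
fraction = low_fidelity_fraction(total_iterations=100, switch_iteration=25)
```

Setting the switch iteration to 0 recovers training with only the high-fidelity model, and setting it to the total number of iterations recovers training with only the low-fidelity model.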

Fig. 2
Training schedules for the parametric study

A proximal policy optimization algorithm [37] is used for training the policies with several values of n. Several randomly sampled designs that may or may not satisfy the constraints are used as seed designs for training each policy. The number of iterations per episode, the total number of episodes, and other RL hyperparameters are tuned to yield designs that satisfice for a policy that utilizes just the high-fidelity network for each problem and are kept constant throughout the parametric study.

After the completion of training, the learned policies are evaluated by passing the seeds to yield the design solutions. In addition to evaluating the metric Q using the high-fidelity model, it is also evaluated using the low-fidelity model to understand the exploration behavior of the agent, as discussed in Sec. 5. Accordingly, the two metrics are defined as follows:
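$$
Q_l = -f'_1 - \sum_{k=1}^{r} \max\left(0, (G')_{k1}\right) - \sum_{k=1}^{s} \left| (H')_{k1} \right| \tag{18}
$$
$$
Q_h = -f'_2 - \sum_{k=1}^{r} \max\left(0, (G')_{k2}\right) - \sum_{k=1}^{s} \left| (H')_{k2} \right| \tag{19}
$$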
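Evaluating the same design with both metrics can be sketched as follows. The sketch assumes a sign convention in which a lower objective value and smaller constraint violations give a higher quality, consistent with a reward that favors decreases in these quantities; the function name, the column layout (column 0 for low fidelity, column 1 for high fidelity), and the values are hypothetical.

```python
import numpy as np

def design_quality(f_levels, G, H, level):
    """Quality of one design at a given fidelity level (0-based column index).

    f_levels: scalar objective f' at each fidelity level, shape (q,).
    G, H: weighted constraint matrices, shapes (r, q) and (s, q).
    """
    f_levels = np.asarray(f_levels, dtype=float)
    G = np.asarray(G, dtype=float).reshape(-1, f_levels.size)
    H = np.asarray(H, dtype=float).reshape(-1, f_levels.size)
    inequality_violation = np.sum(np.maximum(0.0, G[:, level]))
    equality_violation = np.sum(np.abs(H[:, level]))
    return float(-f_levels[level] - inequality_violation - equality_violation)

# Hypothetical design with two inequality constraints and no equality constraints.
f_levels = [2.0, 2.4]
G = [[0.1, -0.3],
     [-0.5, 0.2]]
H = np.zeros((0, 2))
Q_l = design_quality(f_levels, G, H, level=0)  # low-fidelity view of the design
Q_h = design_quality(f_levels, G, H, level=1)  # high-fidelity view of the design
```

Comparing the two values for the same design indicates how far the low-fidelity picture of a design departs from the high-fidelity one.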

To understand the behavior of exploration using the proposed framework, a two-dimensional embedding is trained for visualizing several trajectories in the design space [38]. Specifically, Principal Component Analysis (PCA) is performed using all the design state vectors that were visited while evaluating all the trained policies.
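A two-dimensional PCA embedding of visited design states can be computed with a singular value decomposition. The following is a minimal NumPy sketch rather than the tooling used in the study; the function name and the random placeholder states are assumptions for illustration.

```python
import numpy as np

def pca_2d(states):
    """Project design state vectors (one per row) onto their first two principal components."""
    X = np.asarray(states, dtype=float)
    mean = X.mean(axis=0)
    X_centered = X - mean
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    components = vt[:2]
    return X_centered @ components.T, components, mean

# Placeholder data standing in for states visited by all trained policies.
rng = np.random.default_rng(0)
states = rng.normal(size=(50, 6)) * np.array([5.0, 3.0, 1.0, 1.0, 1.0, 1.0])
embedding, components, mean = pca_2d(states)

# A new trajectory is embedded with the same mean and components.
trajectory = rng.normal(size=(10, 6))
trajectory_2d = (trajectory - mean) @ components.T
```

Fitting the components once on the pooled states and reusing them for each trajectory keeps all trajectories in a common coordinate system, so that they can be compared in one plot.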

To understand the tradeoff between computational efficiency and solution quality, the data from all cases of the study are processed. As the total number of iterations is constant across the study, the total time required to evaluate the objectives and constraints using the low- and high-fidelity models is utilized to reflect the computational efficiency. The solution quality values that are evaluated using the high-fidelity model, as per Eq. (19), are utilized for eliciting the tradeoff trend. Specifically, an exponential curve is fitted to the data from all cases using a least squares method.
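One simple way to fit an exponential trend by least squares is sketched below. The functional form y = a·exp(b·x), the restriction to strictly positive y, and the fit in log space are all assumptions made for this sketch; the exact form and fitting procedure used in the study are not specified here, and quality values that are negative would first need an offset or a different parameterization.

```python
import numpy as np

def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by linear least squares on log(y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("the log-linear fit requires strictly positive y values")
    # log(y) = log(a) + b * x is linear in x; polyfit returns [slope, intercept].
    b, log_a = np.polyfit(x, np.log(y), 1)
    return float(np.exp(log_a)), float(b)

# Synthetic, noise-free placeholder data generated from a = 2.0, b = -0.5.
x = np.linspace(0.0, 4.0, 9)
y = 2.0 * np.exp(-0.5 * x)
a, b = fit_exponential(x, y)
```

Because the residuals are minimized in log space, this fit weights relative rather than absolute errors; a nonlinear least squares routine would be needed to minimize errors in the original units.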

4 Case Studies

The motivation for the case studies is to demonstrate the effectiveness of the proposed reinforcement learning framework for design space exploration and the tradeoff between computational efficiency and solution quality. We consider two multimodal multi-objective constrained mixed integer nonlinear skeletal design problems involving the physical components of a ground and aerial vehicle. The details of the ground and aerial vehicle problem are described in Secs. 4.1 and 4.2, respectively.

4.1 Ground Vehicle Problem.

The ground vehicle skeletal design problem is based on prior work on Formula Society of Automotive Engineers (SAE) vehicles [39,40]. It involves multiple subsystems of the vehicle such as suspension, wings, etc. Figure 3 shows a schematic of the vehicle along with the number of design variables associated with different subsystems. The design space of the problem comprises 29 continuous (e.g., cabin length, wing length) and 10 discrete (e.g., engine choice from a set of 21 engines) variables. To reduce the dimensionality of the problem, the discrete variables are transformed from nominal to ordinal based on their size and key performance indicators. By considering all possible discrete values and merely 10 values for the continuous variables, the size of the combinatorial space is of the order of 10^39, a value comparable to the state space size of 10^40 in chess [41].

Fig. 3
Schematic of the ground vehicle (labels indicate the number of design variables associated with different sub-systems)

The design objective is defined by a set of 11 sub-objectives to judge the overall performance of the system. These are the mass of the vehicle, center of gravity height, drag, downforce, acceleration, crash force, impact attenuator volume, cornering velocity, braking distance, suspension acceleration, and pitch moment. Further, the design is subject to 80 inequality constraints, comprising practical constraints (which are the choice of the modeler, e.g., rear wing length should have a value of at least 0.05 m) and natural constraints (which represent physical necessity, e.g., a positive ground normal reaction). These include 78 linear inequality constraints and two nonlinear inequality constraints. The reader is referred to prior work [40] for the detailed analytical expressions associated with the objectives and constraints.

Two neural networks are trained on datasets of sizes 100,000 and 600 that were generated using the analytical expressions. These datasets are scaled using a min–max normalization technique prior to training. The low-fidelity neural network was limited to a linear model without a hidden layer. Specifically, the architecture of this network, which involves 39 design variables, 11 objectives, and 2 constraints, is defined as follows:
$$
I_{39} \rightarrow O_{13,L}
$$
where $I_{N_i}$ represents the input layer of size $N_i$, and $O_{N_o,k}$ represents the output layer of size $N_o$ with activation $k$ (here L, the linear activation). Further, the smaller dataset is utilized for training this network.
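Min–max scaling and a linear model of this shape can be sketched with NumPy. Note the swapped-in technique: the sketch solves for the weights with a closed-form least-squares fit, whereas the networks in this work are trained with a gradient-based optimizer; both target the mean squared error of a model with no hidden layer and a linear output. The function names and the synthetic data are hypothetical placeholders.

```python
import numpy as np

def min_max_scale(data):
    """Scale each column of data to the range [0, 1]."""
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(axis=0), data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant columns
    return (data - lo) / span, lo, hi

def fit_linear_surrogate(X, Y):
    """Least-squares fit of Y ~ X @ W + b (a model with no hidden layer)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    A = np.hstack([X, np.ones((X.shape[0], 1))])  # extra column of ones for the bias
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return coef[:-1], coef[-1]

# Synthetic placeholder data: 600 samples, 39 inputs, 13 outputs.
rng = np.random.default_rng(0)
X_raw = rng.uniform(-5.0, 5.0, size=(600, 39))
X, x_min, x_max = min_max_scale(X_raw)
W_true = rng.normal(size=(39, 13))
Y = X @ W_true + 0.5               # exactly linear responses, for illustration only
W, b = fit_linear_surrogate(X, Y)
```

On real performance data the responses are not linear in the design variables, so a model of this form can only capture first-order trends, which is what makes it a low-fidelity surrogate.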
The number of hidden layers and the neurons in each layer was iteratively increased to obtain the high-fidelity neural network with high accuracy on the bigger dataset. Specifically, the architecture of the high-fidelity neural network is defined as follows:
$$
I_{39} \rightarrow D_{1024,S} \rightarrow D_{512,S} \rightarrow D_{256,S} \rightarrow D_{128,S} \rightarrow D_{64,S} \rightarrow O_{13,L}
$$
where $I_{N_i}$ represents the input layer of size $N_i$, $D_{j,k}$ represents a dense (hidden) layer of size $j$ with an activation denoted by $k$, S represents the sigmoid activation, R represents the ReLU activation, L represents the linear activation, and $O_{N_o,k}$ represents the output layer of size $N_o$ with activation $k$.
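To give a sense of the difference in size between the two networks, the number of trainable parameters can be counted from the layer sizes. The sketch below assumes fully connected layers that each include a bias term, which is a common default but is not stated in the text; the helper name is hypothetical.

```python
def dense_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(
        n_in * n_out + n_out  # weight matrix plus one bias per output neuron
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

# Layer sizes of the two ground vehicle surrogates (inputs, hidden layers, outputs).
low_fidelity_params = dense_parameter_count([39, 13])
high_fidelity_params = dense_parameter_count([39, 1024, 512, 256, 128, 64, 13])
```

Under these assumptions the linear model has 520 parameters and the deep network has 739,085, a difference of roughly three orders of magnitude.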

The neural networks are trained using the mean squared error loss function and the adaptive moment estimation optimizer. The prediction accuracies of the objectives and constraints are measured using the coefficient of determination of the individual components as well as the overall coefficient of determination of the predictions. The high-fidelity model accuracies range from 0.977 to 0.999 for the individual components. The low-fidelity model accuracies range from −0.661 to 0.493 for the individual components. The higher accuracies of the high-fidelity network are due to the larger size of the dataset and a larger neural network as compared to the low-fidelity network. While some values of the coefficients of determination of the individual components are negative (indicating a poor fit), the remaining components could still drive the exploration behavior toward high-quality regions on average. The medians of the computational expense of prediction using the neural networks and the accuracy as measured by the overall coefficient of determination of the predictions are illustrated in Fig. 4.
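The per-component coefficient of determination, including how it can become negative, is illustrated by the sketch below. The function name and the toy data are hypothetical; how the overall coefficient of determination aggregates the components is not specified in the text, so only the per-component values are computed.

```python
import numpy as np

def r2_per_component(y_true, y_pred):
    """Coefficient of determination for each output column."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)               # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)  # spread about the mean
    return 1.0 - ss_res / ss_tot

# Toy data with two output components.
y_true = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
y_pred = np.array([[1.0, 30.0], [2.0, 20.0], [3.0, 10.0]])  # second column has the trend reversed
scores = r2_per_component(y_true, y_pred)

# Predicting the column mean everywhere gives a score of exactly zero.
baseline = r2_per_component(y_true, np.tile(y_true.mean(axis=0), (3, 1)))
```

A value of 1 indicates a perfect fit, 0 matches a constant prediction at the mean of the data, and a negative value means that the model performs worse than that constant prediction.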

Fig. 4
Characteristics of the deep neural networks for the ground vehicle problem (error bars reflect the interquartile range)

The objective weights associated with different sub-objectives are adapted from Soria Zurita et al. [40] and are kept constant throughout all training schedules. Further, all the constraint weights are kept equal to 1. The objective and constraint models of the neural networks, along with these weights, compose the reward function. The state of the agent is defined by the design variables and the scalar binary parameter, w′, that determines the operating fidelity level. To ensure that the agent makes stable progress across the iterations of a learning episode, the magnitude of allowable design parameter changes is capped at one-tenth of the range of each continuous variable and at a value of 2 for the ordinal variables.
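The cap on design parameter changes can be sketched as a clipping step applied to the raw action. Clipping is an assumption made here for illustration (the cap could equally be imposed by scaling a bounded policy output), and the function name, argument names, and values are hypothetical.

```python
import numpy as np

def cap_action(a_x, a_y, x_low, x_high, continuous_fraction=0.1, ordinal_cap=2):
    """Limit the step size of continuous and ordinal design variables."""
    x_range = np.asarray(x_high, dtype=float) - np.asarray(x_low, dtype=float)
    max_step = continuous_fraction * x_range          # one-tenth of each variable's range
    a_x_capped = np.clip(np.asarray(a_x, dtype=float), -max_step, max_step)
    a_y_capped = np.clip(np.asarray(a_y, dtype=float), -ordinal_cap, ordinal_cap)
    return a_x_capped, a_y_capped

# Hypothetical design with two continuous variables (ranges 1 and 10)
# and two ordinal variables.
a_x_capped, a_y_capped = cap_action(
    a_x=[0.5, -0.3], a_y=[3.7, -1.2],
    x_low=[0.0, 0.0], x_high=[1.0, 10.0],
)
```

Here only the first continuous step (0.5 against a cap of 0.1) and the first ordinal step (3.7 against a cap of 2) are reduced; the other two are already within their limits.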

4.2 Aerial Vehicle Problem.

The aerial vehicle skeletal design problem involves a quadcopter that is designed using a corpus of components and a high-fidelity flight dynamics model [42]. The components include batteries, electronic speed controls (ESCs), motors, and propellers. The design space of the problem comprises two continuous variables (arm length and support length) and four ordinal variables for the choice of batteries, ESCs, motors, and propellers from an ordered set of the components, as in the previous case study. The reader is referred to prior work [42] for details on the corpus of components used in this problem. Figure 5 illustrates the skeletal design artifact of a quadcopter generated by assigning random values to the design variables. By considering all possible discrete values and merely 10 values for the continuous variables, the size of the combinatorial space is of the order of 10^8. While the number of design variables is lower than in the previous case study, this problem involves a larger number of choices for the ordinal variables.

Fig. 5
Skeletal design artifact of the aerial vehicle problem (labels indicate the number of design variables associated with different components)

The design objective is defined by a vector of five sub-objectives that judge the overall performance of the system: the maximum hover time, the maximum attainable speed, the range covered at this maximum attainable speed, the maximum coverable range, and the speed maintained to cover this range. Together, these objectives aim at developing fast, long-range quadcopters. Further, the design is subject to 27 inequality constraints associated with physical interferences in the design and the operating limits of the quadcopter components. These comprise fixed bounds on the six design variables (12 constraints, i.e., a lower and an upper bound per variable) and 15 nonlinear inequality constraints. The reader is referred to prior work [42] on the flight dynamics model for further details on these objectives and constraints. Although this problem has a smaller combinatorial state space than the previous case study, it has a higher number of nonlinear constraints.

Two neural networks are trained on a dataset of size 381 that was generated using high-fidelity simulations. This dataset was scaled using min–max normalization, as in the previous problem. Due to the limited size of the dataset and the high number of nonlinear constraints, AutoKeras was utilized to support the design of the high-fidelity neural network. Specifically, the architecture of this network with 6 design variables, 8 objectives, and 8 constraint terms is defined as follows:
$I_{6}\,D_{1024,R}\,D_{64,R}\,O_{16,L}$
The architecture of the low-fidelity neural network was obtained by removing a layer and reducing the number of neurons. Unlike in the previous problem, a simpler linear model was not utilized, as it resulted in a fit that was worse than a horizontal hyperplane at the mean of the data. The network architecture is defined as follows:
$I_{6}\,D_{16,R}\,O_{16,R}$
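To give a sense of the size difference between the two surrogates, the following sketch counts trainable parameters. It assumes that the architecture strings denote fully connected layers with biases, read as widths 6, 1024, 64, 16 for the high-fidelity network and 6, 16, 16 for the low-fidelity network; this reading of the notation is an assumption and not a statement of the exact implementation.

```python
def dense_parameter_count(layer_widths):
    """Count weights and biases in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_widths, layer_widths[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


# Layer widths as read from the architecture strings (assumed interpretation).
high_fidelity_widths = [6, 1024, 64, 16]
low_fidelity_widths = [6, 16, 16]

high_params = dense_parameter_count(high_fidelity_widths)
low_params = dense_parameter_count(low_fidelity_widths)
print(high_params, low_params, round(high_params / low_params, 1))
# prints: 73808 384 192.2
```

Under these assumptions, the high-fidelity network has roughly 190 times as many parameters as the low-fidelity network, which is in line with its higher prediction cost.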

The neural networks are trained using the mean squared error loss function and the adaptive moment estimation optimizer, as in the previous problem. The prediction accuracies of objectives and constraints for the high-fidelity network are lower than in the previous problem because of the smaller dataset. The accuracies (coefficients of determination) of the individual components of the high-fidelity predictions range from 0.395 to 0.907. Those of the individual components of the low-fidelity predictions range from −2.531 to 0.629. The higher accuracies of the high-fidelity network are attributed to its larger size relative to the low-fidelity network. As in the previous problem, while some values of the coefficient of determination of the individual components are negative (indicating a poor fit), the remaining components could still drive the exploration behavior toward high-quality regions. The medians of the computational expense of prediction using the neural networks and the accuracy, as measured by the overall coefficient of determination of the predictions, are illustrated in Fig. 6.
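A negative coefficient of determination means that a predictor performs worse than always predicting the mean of the data. The short sketch below shows this on toy numbers, which are purely illustrative and are not taken from the dataset used in this work.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]          # toy targets
mean_prediction = [2.5] * 4            # always predict the mean
poor_prediction = [4.0, 3.0, 2.0, 1.0] # trend is reversed

print(r_squared(y_true, y_true))           # 1.0
print(r_squared(y_true, mean_prediction))  # 0.0
print(r_squared(y_true, poor_prediction))  # -3.0
```

The mean predictor scores exactly zero, so any component with a negative score carries less information than a constant.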

Fig. 6
Characteristics of the deep neural networks for the aerial vehicle problem (error bars reflect the interquartile range)

The objective weights are kept equal due to the lack of expert knowledge of specific design requirements. Further, all the constraint weights are equated to 1. The agent state, actions, reward formulation, number of iterations per episode, and the RL hyperparameters are the same as in the previous case study.
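As a rough illustration of how weighted objectives and constraints can be combined into a scalar signal, consider the sketch below. The weighted-sum-with-penalty form, the function name, and the sign conventions are assumptions made for illustration; the reward formulation actually used in this work is the one defined for the framework and may differ.

```python
def scalar_quality(objectives, constraint_values, objective_weights, constraint_weights):
    """Weighted objective sum minus weighted constraint violations (assumed form).

    A constraint value g <= 0 is treated as satisfied; g > 0 is a violation.
    """
    weighted_objectives = sum(w * f for w, f in zip(objective_weights, objectives))
    weighted_penalty = sum(
        c * max(0.0, g) for c, g in zip(constraint_weights, constraint_values)
    )
    return weighted_objectives - weighted_penalty


# Five normalized sub-objectives with equal weights; all constraint weights are 1.
objectives = [0.2, 0.4, 0.6, 0.8, 1.0]   # toy values
constraint_values = [-0.3, 0.1]          # toy values, second one is violated
quality = scalar_quality(objectives, constraint_values, [0.2] * 5, [1.0, 1.0])
```

With these toy numbers, the weighted objective sum is 0.6 and the single violation subtracts 0.1.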

5 Results and Discussion

5.1 Ground Vehicle Problem.

The RL policies were trained and evaluated for the cases n = {0, 10, 20, …, 80, 90, 100} for the ground vehicle problem. Specifically, 6000 seed designs were randomly sampled and utilized for training and evaluating all the policies. The results of the evaluation for four cases (high-fidelity alone, mixed with high-fidelity dominant, mixed with low-fidelity dominant, and low-fidelity alone) are shown in Fig. 7 and discussed in further detail. These cases correspond to the parameter values n = {0, 30, 70, 100}, respectively. The quality of the solutions for all the cases is better than that of the seed designs, as indicated by the upward trend in all plots. This showcases the ability of the proposed variable-fidelity framework to effectively search for solutions. Further, the solution qualities are higher for the cases n = {0, 30} than for the cases n = {70, 100}. Aside from the final quality values, the nature of the quality-iteration plots differs across cases. For the case n = 0 (Fig. 7(a)), the quality increases with a small rise in dispersion in the initial iterations. For the case n = 30 (Fig. 7(b)), the dispersion in the initial iterations is higher than in the case n = 0. However, the agent eventually converges to a high-quality region. For the case n = 70 (Fig. 7(c)), the quality rises slowly while the low-fidelity model is operational, and a steep rise in quality is observed when the agent switches to the high-fidelity model. However, the agent does not converge to a high-quality region because of the limited number of iterations left after the switch. Lastly, for the case n = 100 (Fig. 7(d)), the nature of the plot is similar to the case n = 0 (Fig. 7(a)), but the agent converges to a region of lower quality. Notably, the plots differ across cases even over the initial iterations during which the low-fidelity model is operational in each of them. This indicates that the agent explores different regions of the design space even when it uses the low-fidelity model for the same number of iterations before the switch.
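The cases discussed above can be pictured as a per-episode schedule. The sketch below assumes that n is the percentage of iterations in an episode during which the low-fidelity model is operational before a single switch to the high-fidelity model; the function name and the episode length of 10 are hypothetical.

```python
def fidelity_schedule(n_percent, iterations_per_episode):
    """Return the operating fidelity level for each iteration of an episode.

    Assumes the low-fidelity model is used first, followed by one switch.
    """
    switch_step = round(iterations_per_episode * n_percent / 100)
    return ["low"] * switch_step + ["high"] * (iterations_per_episode - switch_step)


for n in (0, 30, 70, 100):
    print(n, fidelity_schedule(n, 10))
```

Under this reading, the case n = 70 leaves only the last 30% of the iterations for the high-fidelity model, which matches the limited progress observed after the switch.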

Fig. 7
Trained policies are evaluated by passing seed designs for the ground vehicle problem (vertical line indicates a switch from low- to high-fidelity; violin plots indicate medians)

To understand the exploration behavior of the RL agents, a two-dimensional embedding is prepared by performing PCA on all the design vectors that were visited while evaluating all the trained policies. The embedding is visualized in Fig. 8. In these sub-figures, the scatter plot shows all the regions of the design space that were visited during policy evaluation across all cases. The color reflects the quality of the design as measured by the high-fidelity and low-fidelity models in Figs. 8(a) and 8(b), respectively. Further, we plot one seed and the evaluated trajectories for each of the four cases discussed earlier. Additionally, for the mixed-fidelity cases, we highlight the agent step at which the fidelity level changes.
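A two-dimensional PCA embedding of visited design vectors can be sketched as follows. This is a dependency-free illustration that uses power iteration with deflation on the covariance matrix; it is not the implementation used in this work, and the design vectors shown are toy values.

```python
import math


def center(rows):
    """Subtract the per-variable mean from every design vector."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_eigenvector(matrix, iterations=500):
    """Power iteration from a fixed, non-symmetric starting vector."""
    d = len(matrix)
    v = [1.0 + 0.1 * i for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-15:
            break
        v = [x / norm for x in w]
    return v


def pca_2d(rows):
    """Project design vectors onto their first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(2):
        v = leading_eigenvector(cov)
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflate: remove the variance already explained by this component.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components]
            for row in centered]


# Toy design vectors (three variables each), standing in for visited designs.
visited_designs = [
    [0.0, 0.0, 1.0],
    [1.0, 1.1, 0.0],
    [2.0, 1.9, 1.0],
    [3.0, 3.2, 0.0],
    [4.0, 3.9, 1.0],
]
embedding = pca_2d(visited_designs)
```

Each row of `embedding` holds the two coordinates at which a visited design would be drawn in the scatter plot; coloring the points by model-predicted quality then gives a view of the kind described above.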

Fig. 8
Exploration behavior of RL agents is visualized using the PCA embedding (nonlinear colormap is used to show details in high-quality regions)

First, the colormaps indicate that the low-fidelity model deviates significantly from the high-fidelity model in several regions of the design space. For instance, in the case n = 0, the trajectory converges to a high-quality region as per the high-fidelity model (Fig. 8(a)). However, this region has poor quality as per the low-fidelity model (Fig. 8(b)). For the case n = 100, the trajectory converges to another region that has the highest quality as per the low-fidelity model but only moderate quality as measured by the high-fidelity model. This explains the difference in the qualities observed in Figs. 7(a) and 7(d). For the mixed-fidelity cases, the trajectories lie between the high-fidelity and low-fidelity trajectories, depending on which model is dominant. Moreover, the trajectories for these cases differ significantly even before the agent switches from the low-fidelity model to the high-fidelity one. We attribute this to the evaluative feedback received from future states (including the states in which the high-fidelity model is operational) and to the interaction between the learning at both fidelity levels in a specific design state. In the case n = 30, the location of the trajectory is shifted by the influence of the low-fidelity model. Accordingly, it passes through regions of lower quality than in the case n = 0 when measured by the high-fidelity model. However, it eventually converges to a high-quality region. This is consistent with the higher dispersion followed by convergence to high quality observed in Fig. 7(b). For the case n = 70, the trajectory shifts further toward the trajectory of the case n = 100. In this region, the quality (as measured by the high-fidelity model) along the trajectory rises slowly while the low-fidelity model is operational. Further, the direction of the trajectory changes drastically when the model switches. Specifically, it starts moving toward a high-quality region as measured by the high-fidelity model. However, it makes limited progress because only a few iterations remain once this model is operational. Again, this corroborates the quality-iteration plot in Fig. 7(c).

To understand the tradeoff between computational efficiency and solution quality, the data from all the cases (n = {0, 10, 20, …, 80, 90, 100}) were processed. Specifically, for each case, the total time for evaluating the objectives and constraints and the solution quality as measured by the high-fidelity metric were computed and are shown in Fig. 9. For the cases in which the high-fidelity model is dominant (i.e., n = {0, …, 50}, on the right side of the plot), a high quality of solutions is maintained even with a significant reduction in compute time. For the cases n = {60, 70, 80}, the quality values have high dispersion and are lower than or comparable to those of other cases that require less computation time. This is attributed to the fact that the agent has few iterations remaining to converge to a different region after switching to the high-fidelity model. This behavior is detailed for the case n = 70 in Figs. 7(c) and 8. For the cases n = {90, 100}, we achieve a moderate quality of solutions based on the low-fidelity model. Lastly, the quality of the seed designs is plotted at t = 0, as the sampling of seed designs does not involve the computation of objectives and constraints. To elicit a tradeoff trend from the data, an exponential curve is fitted by least squares to the solutions obtained from all seed designs across all the cases. Specifically, we use the form $Q_h = a e^{kt} + b$, where b is the asymptotic value approached by $Q_h$ when the high-fidelity model is dominant, a + b is the $Q_h$-intercept that reflects the quality of the seed designs, and k is a parameter that reflects the rate at which the quality changes. The goodness of this fit is measured using the coefficient of determination and is noted in Fig. 9. A high value of the coefficient of determination for the best-fit curve shows the suitability of the chosen functional form and a good resultant fit for the data. We observe that the solution quality increases with computation time across the cases. Further, it should be noted that this curve of solution quality versus compute time is concave and approaches an asymptotic value of −0.173. This indicates that good solutions can be achieved with substantial reductions in compute time.
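A least-squares fit of the exponential form, quality = a·exp(k·t) + b, can be sketched without external libraries by noting that the model is linear in a and b once k is fixed. The sketch below scans a grid of candidate k values and solves the resulting 2×2 normal equations for each; the grid, the function name, and the synthetic data are assumptions for illustration, and the fitting routine used in this work is not specified here.

```python
import math


def fit_exponential(times, qualities, k_grid):
    """Least-squares fit of Q = a * exp(k * t) + b over a grid of k values."""
    best = None
    n = len(times)
    for k in k_grid:
        e = [math.exp(k * t) for t in times]
        # For fixed k the model is linear in (a, b): solve the normal equations.
        s_e = sum(e)
        s_ee = sum(x * x for x in e)
        s_q = sum(qualities)
        s_eq = sum(x * q for x, q in zip(e, qualities))
        det = n * s_ee - s_e * s_e
        if abs(det) < 1e-12:
            continue  # degenerate case, e.g. k = 0
        a = (n * s_eq - s_e * s_q) / det
        b = (s_q - a * s_e) / n
        sse = sum((a * x + b - q) ** 2 for x, q in zip(e, qualities))
        if best is None or sse < best[0]:
            best = (sse, a, b, k)
    return best[1], best[2], best[3]


# Synthetic, noise-free data generated from known parameters (not paper data).
true_a, true_b, true_k = -0.5, -0.2, -1.0
times = [0.0, 0.5, 1.0, 2.0, 4.0]
qualities = [true_a * math.exp(true_k * t) + true_b for t in times]

k_grid = [-i / 10 for i in range(1, 31)]  # candidate rates from -0.1 to -3.0
a, b, k = fit_exponential(times, qualities, k_grid)
```

Here `b` recovers the asymptote and `a + b` the intercept at t = 0, mirroring the interpretation of the fitted parameters given above. For noisy data, a finer grid or a nonlinear least-squares routine would be a natural replacement for the grid scan.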

Fig. 9
Tradeoff between solution quality and computational efficiency is observed (median values with 25th and 75th percentiles; curve shown is the best fit exponential curve)

5.2 Aerial Vehicle Problem.

The RL policies were trained and evaluated for the aerial vehicle problem for the same cases and the same number of seed designs as in the previous problem. The results of evaluating the policies for the cases n = {0, 30, 70, 100} are shown in Fig. 10. The quality of the solutions for all the cases is better than that of the seed designs, again showing that the framework effectively searches for solutions in all cases. Further, the solution qualities drop as the low-fidelity model usage increases, similar to the previous problem. For the cases n = 0 and n = 30 (Figs. 10(a) and 10(b)), the quality increases in a similar manner to yield high-performance solutions. For the case n = 70 (Fig. 10(c)), the rate at which quality improves rises slightly when the agent switches models. Lastly, for the case n = 100 (Fig. 10(d)), the agent converges to a region of lower quality than in the other cases. Unlike in the previous problem, the nature of the quality-iteration plots is similar across the cases for as long as the low-fidelity model is operational. This indicates that the agent may be exploring a similar region of the design space across the cases while the low-fidelity model is operational before the switch.

Fig. 10
Trained policies are evaluated by passing seed designs for the aerial vehicle problem (vertical line indicates a switch from low- to high-fidelity; violin plots indicate medians)

To understand the exploration behavior of the RL agents, a two-dimensional embedding is prepared as in the previous problem and is shown in Fig. 11. First, the colormaps indicate that the low-fidelity model deviates from the high-fidelity model mainly in the latter portions of the trajectory. The four trajectories in Fig. 11 follow a similar path until the last few iterations. For the cases n = 0 and n = 30, the agent solutions converge to a similar region, as was indicated in Figs. 10(a) and 10(b). For the case n = 100, the trajectory changes direction toward a high-quality region as measured by the low-fidelity model. However, this region has lower quality when measured by the high-fidelity model. This explains the lower quality that is observed in Fig. 10(d). For the case n = 70, the trajectory follows the same direction as in the case n = 100 while the low-fidelity model is operational. After the model switches, it changes its direction toward the high-quality region as measured by the high-fidelity model. While this improves the quality, the improvement is limited by the number of remaining iterations. These search patterns explain the quality-iteration plots of Figs. 10(c) and 10(d).

Fig. 11
Exploration behavior of RL agents is visualized using the PCA embedding (nonlinear colormap is used to show details in high-quality regions)

To understand the tradeoff between computational efficiency and solution quality, the data from all the cases (n = {0, 10, 20, …, 80, 90, 100}) were processed. Specifically, for each case, the total time for evaluating the objectives and constraints and the solution quality as measured by the high-fidelity metric were computed and are shown in Fig. 12. For the cases in which the high-fidelity model is dominant (i.e., n = {0, …, 50}, on the right side of the plot), a high quality of solutions is maintained even with a significant reduction in compute time. For the remaining cases n = {60, …, 100}, we observe a steady decrease in quality with a decrease in computation time. Lastly, the quality of the seed designs is plotted at t = 0, as the sampling of seed designs does not involve the computation of objectives and constraints. To elicit a tradeoff trend from the data, an exponential curve is fitted as in the previous problem and is shown in Fig. 12. The coefficient of determination of 0.729 supports the suitability of the chosen functional form. We observe that the solution quality increases with computation time across the cases. Further, it should be noted that this curve of solution quality versus compute time is concave and approaches an asymptotic value of −0.205. This indicates that fairly good solutions can be achieved with substantial reductions in compute time.
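The compute-time axis of such a tradeoff plot follows directly from how many evaluations use each model. The sketch below assumes that n is the percentage of iterations evaluated with the low-fidelity model; the per-evaluation costs and the episode length are hypothetical placeholders, not the measured values reported in this work.

```python
def episode_compute_time(n_percent, iterations, t_low, t_high):
    """Total evaluation time for one episode under a low-then-high schedule."""
    low_steps = round(iterations * n_percent / 100)
    return low_steps * t_low + (iterations - low_steps) * t_high


# Hypothetical costs: the high-fidelity model is 10x slower per evaluation.
T_LOW, T_HIGH, ITERATIONS = 1.0, 10.0, 100

for n in (0, 50, 100):
    total = episode_compute_time(n, ITERATIONS, T_LOW, T_HIGH)
    saving = 1.0 - total / episode_compute_time(0, ITERATIONS, T_LOW, T_HIGH)
    print(n, total, round(saving, 2))
```

With these placeholder costs, the case n = 50 already reduces the evaluation time by 45% relative to using the high-fidelity model alone.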

Fig. 12
Tradeoff between solution quality and computational efficiency is observed (median values with 25th and 75th percentiles; the curve shown is the best fit exponential curve)

These parametric case studies showcase that the proposed framework can not only balance the tradeoff between computational efficiency and solution quality but also find high-performance solutions to high-dimensional design problems where the use of just a high-fidelity model is infeasible.

6 Conclusion

This paper proposes a reinforcement learning framework based on models of varying fidelity that addresses the computational expense of the high-fidelity simulations often necessary to evaluate objective and constraint functions in design space exploration. It uses neural network models of varying fidelity and gives the RL agent flexibility to incorporate predefined constant or variable schedules for exploration using these models. We showcase the potential of the framework in two case studies that involve the design of the physical components of a ground and aerial vehicle. The RL agent converges to high-performance regions of the design space using objective evaluations at two fidelity levels. A parametric study with different training schedules for exploration at these fidelity levels demonstrates the tradeoff between computational efficiency and solution quality. Further, a concave tradeoff trend showcases the potential of the framework to find high-performance solutions to design problems where the use of just a high-fidelity model is infeasible. Lastly, the exploration behavior of the agents is discussed by visualizing an embedding space.

Future work should explore the application of this framework to design problems beyond skeletal design. These include configuration design problems based on graph grammar representations. While RL has been used in conjunction with shape grammars for generative design [43], design space exploration using such a representation has not yet been studied with variable-fidelity models. Alternatively, this could involve learning embeddings for representing the design space [44,45] and extending the existing framework to explore this embedding space. The weighting schedules of the framework can also potentially be designed or made adaptive based on multi-fidelity model management strategies [7,9,10,13], knowledge of expert designer behavior [46], or RL-based approaches [47]. Varying attributes such as the episode length and the number of episodes could also reveal different patterns across levels of low-fidelity model usage. Lastly, the search can be biased toward non-intuitive solutions by incorporating curiosity [48] into the agent reward.

The two case studies are also limited to the physical domain of the engineered system. It would be interesting to evaluate the potential of the proposed framework for the co-design of the cyber and physical domains of such systems. For instance, in prior work that addressed an intelligent manufacturing shop floor [49], the design space of the physical components of robots and their control policies can be explored together to yield high-performing shop floors. A semi-automated human-in-the-loop strategy, in which a designer or domain expert steers the exploration tool after a few iterations with the help of visualization, can also be an extension to the proposed framework. With humans creating associations across design domains and machines recognizing statistical patterns from data, such a framework could lead to a symbiotic exploration paradigm.

Acknowledgment

The authors are grateful to the Southwest Research Institute for providing simulation capabilities for the drone case study used in this work. We are also grateful to Susmit Jha and Adam Cobb of SRI International for their feedback on early versions of this work.

Funding Data

  • This material is based upon work supported by the Defense Advanced Research Projects Agency through cooperative agreement FA8750-20-C-0002. Any opinions, findings, and conclusions or recommendations expressed in this work are those of the authors and do not necessarily reflect the views of the sponsors.

Conflict of Interest

There are no conflicts of interest.

Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.

References

1. Li, K., and Malik, J., 2016, “Learning to Optimize,” arXiv preprint. https://arxiv.org/abs/1606.01885
2. Lee, X. Y., Balu, A., Stoecklein, D., Ganapathysubramanian, B., and Sarkar, S., 2019, “A Case Study of Deep Reinforcement Learning for Engineering Design: Application to Microfluidic Devices for Flow Sculpting,” ASME J. Mech. Des., 141(11), p. 111401.
3. Dworschak, F., Dietze, S., Wittmann, M., Schleich, B., and Wartzack, S., 2022, “Reinforcement Learning for Engineering Design Automation,” Adv. Eng. Inform., 52, p. 101612.
4. Ororbia, M. E., and Warn, G. P., 2022, “Design Synthesis Through a Markov Decision Process and Reinforcement Learning Framework,” ASME J. Comput. Inf. Sci. Eng., 22(2), p. 021002.
5. Bender, E. M., Gebru, T., McMillan-Major, A., and Shmitchell, S., 2021, “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?,” FAccT 2021: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Virtual Event Canada, Mar. 3–10, Association for Computing Machinery, Inc., pp. 610–623.
6. Thompson, N. C., Greenewald, K., Lee, K., and Manso, G. F., 2020, “The Computational Limits of Deep Learning,” arXiv preprint. https://arxiv.org/abs/2007.05558
7. Fernández-Godino, M. G., Park, C., Kim, N.-H., and Haftka, R. T., 2016, “Review of Multi-Fidelity Models,” arXiv preprint. https://arxiv.org/abs/1609.07196
8. Miller, S. W., Yukish, M. A., and Simpson, T. W., 2018, “Design as a Sequential Decision Process: A Method for Reducing Design Set Space Using Models to Bound Objectives,” Struct. Multidiscip. Optim., 57(1), pp. 305–324.
9. Mehmani, A., Chowdhury, S., Tong, W., and Messac, A., 2015, “Adaptive Switching of Variable-Fidelity Models in Population-Based Optimization,” Engineering and Applied Sciences Optimization: Dedicated to the Memory of Professor M.G. Karlaftis, N.D. Lagaros and M. Papadrakakis, eds., Springer International Publishing, Cham, pp. 175–205.
10. Wang, X., Liu, Y., Sun, W., Song, X., and Zhang, J., 2018, “Multidisciplinary and Multifidelity Design Optimization of Electric Vehicle Battery Thermal Management System,” ASME J. Mech. Des., 140(9), p. 094501.
11. Gross, D. C., 1999, “Report from the Fidelity Implementation Study Group,” Simulation Interoperability Workshop, Orlando, FL, Mar. 14–19.
12. Kennedy, M. C., and O’Hagan, A., 2000, “Predicting the Output from a Complex Computer Code When Fast Approximations Are Available,” Biometrika, 87(1), pp. 1–13.
13. Peherstorfer, B., Willcox, K., and Gunzburger, M., 2018, “Survey of Multifidelity Methods in Uncertainty Propagation, Inference, and Optimization,” SIAM Rev., 60(3), pp. 550–591.
14. Newmark, N. M., and Hall, W. J., 1981, Earthquake Resistant Design Considerations and Seismic Design Spectra. EERI Report No. 620/N46/1981, Earthquake Engineering Research Institute, Oakland, CA.
15. Xu, Z., Lu, X., Guan, H., Han, B., and Ren, A., 2014, “Seismic Damage Simulation in Urban Areas Based on a High-Fidelity Structural Model and a Physics Engine,” Natural Hazards, 71(3), pp. 1679–1693.
16. Wielinga, B., and Schreiber, G., 1997, “Configuration-Design Problem Solving,” IEEE Expert, 12(2), pp. 49–56.
17. Mittal, S., and Frayman, F., 1989, “Towards a Generic Model of Configuration Tasks,” IJCAI, 2, pp. 1395–1401.
18. Neema, H., Lattmann, Z., Meijer, P., Klingler, J., Neema, S., Bapty, T., Sztipanovits, J., and Karsai, G., 2014, “Design Space Exploration and Manipulation for Cyber Physical Systems,” IFIP First International Workshop on Design Space Exploration of Cyber-Physical Systems, Berlin, Germany, April, p. 8.
19. Miller, S. W., Simpson, T. W., Yukish, M. A., Bennett, L. A., Lego, S. E., and Stump, G. M., 2013, “Preference Construction, Sequential Decision Making, and Trade Space Exploration,” International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Portland, OR, Aug. 4–7.
20. Ball, L. J., Maskill, L., and Ormerod, T. C., 1998, “Satisficing in Engineering Design: Causes, Consequences and Implications for Design Support,” Autom. Constr., 7(2–3), pp. 213–227.
21. Simon, H. A., 2008, Satisficing. In: The New Palgrave Dictionary of Economics, Palgrave Macmillan, London.
22. Stoecklein, D., Wu, C.-Y., Kim, D., di Carlo, D., and Ganapathysubramanian, B., 2016, “Optimization of Micropillar Sequences for Fluid Flow Sculpting,” Phys. Fluids, 28(1), p. 012003.
23. Rios, L. M., and Sahinidis, N. V., 2013, “Derivative-Free Optimization: A Review of Algorithms and Comparison of Software Implementations,” J. Glob. Optim., 56(3), pp. 1247–1293.
24. Saldanha, W. H., Soares, G. L., Machado-Coelho, T. M., dos Santos, E. D., and Ekel, P. I., 2017, “Choosing the Best Evolutionary Algorithm to Optimize the Multiobjective Shell-and-Tube Heat Exchanger Design Problem Using PROMETHEE,” Appl. Therm. Eng., 127, pp. 1049–1061.
25. Sutton, R. S., and Barto, A. G., 2018, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA.
26. Brown, N., Garland, A., Fadel, G., and Li, G., 2022, “Deep Reinforcement Learning for Engineering Design Through Topology Optimization of Elementally Discretized Design Domains,” Mater. Des., 218, p. 110672.
27. Settaluri, K., Haj-Ali, A., Huang, Q., Hakhamaneshi, K., and Nikolic, B., 2020, “AutoCkt: Deep Reinforcement Learning of Analog Circuit Designs,” Proceedings of the 2020 Design, Automation and Test in Europe Conference and Exhibition, DATE 2020, Grenoble, France, Mar. 9–13, Institute of Electrical and Electronics Engineers Inc., pp. 490–495.
28. Regenwetter, L., Nobari, A. H., and Ahmed, F., 2022, “Deep Generative Models in Engineering Design: A Review,” ASME J. Mech. Des., 144(7), p. 071704.
29. DARPA Information Innovation Office, 2019, Broad Agency Announcement Symbiotic Design for Cyber Physical Systems HR001119S0083.
30. Martínez-Plumed, F., Avin, S., Brundage, M., Dafoe, A., hÉigeartaigh, S. Ó., and Hernández-Orallo, J., 2018, “Between Progress and Potential Impact of AI: the Neglected Dimensions,” arXiv preprint, arXiv:1806.00610v2. https://arxiv.org/abs/1806.00610v2
31. Tian, Y., Ma, J., Gong, Q., Sengupta, S., Chen, Z., Pinkerton, J., and Zitnick, C. L., 2019, “ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero,” International Conference on Machine Learning, Long Beach, CA, June 9–15.
32. van Hasselt, H., and Wiering, M. A., 2009, “Using Continuous Action Spaces to Solve Discrete Problems,” Proceedings of the International Joint Conference on Neural Networks, Atlanta, GA, June 14–19, pp. 1149–1156.
33. Williams, G., Meisel, N. A., Simpson, T. W., and McComb, C., 2020, “Deriving Metamodels to Relate Machine Learning Quality to Design Repository Characteristics in the Context of Additive Manufacturing,” Proceedings of the Volume 11A: 46th Design Automation Conference (DAC), Virtual, Aug. 17–19.
34. Williams, G., Meisel, N. A., Simpson, T. W., and McComb, C., 2019, “Design Repository Effectiveness for 3D Convolutional Neural Networks: Application to Additive Manufacturing,” ASME J. Mech. Des., 141(11), p. 111701.
35. Williams, G., Puentes, L., Nelson, J., Menold, J., Tucker, C., and McComb, C., 2020, “Comparing Attribute- and Form-Based Machine Learning Techniques for Component Prediction,” Proceedings of the Volume 11B: 46th Design Automation Conference (DAC), Virtual, Aug. 17–19.
36. Jin, H., Song, Q., and Hu, X., 2019, “Auto-Keras: An Efficient Neural Architecture Search System,” Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, Aug. 4–8, ACM, New York, pp. 1946–1956.
37. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O., 2017, “Proximal Policy Optimization Algorithms,” arXiv preprint. https://arxiv.org/abs/1707.06347
38. Agrawal, A., and McComb, C., 2022, “Comparing Strategies for Visualizing the High-Dimensional Exploration Behavior of CPS Design Agents,” Proceedings of the 2022 IEEE Workshop on Design Automation for CPS and IoT (DESTION), Milano, Italy, May 3–6, IEEE, pp. 64–69.
39. Lapp, S., Jablokow, K., and McComb, C., 2019, “KABOOM: An Agent-Based Model for Simulating Cognitive Style in Team Problem Solving,” Design Sci., 5, pp. 1–32.
40. Soria Zurita, N. F., Colby, M. K., Tumer, I. Y., Hoyle, C., and Tumer, K., 2018, “Design of Complex Engineered Systems Using Multi-Agent Coordination,” ASME J. Comput. Inf. Sci. Eng., 18(1), p. 011003.
41. Steinerberger, S., 2015, “On the Number of Positions in Chess Without Promotion,” Int. J. Game Theory, 44(3), pp. 761–767.
42. Walker, J. D., Heim, F. M., Surampudi, B., Bueno, P., Carpenter, A., Chocron, S., Cutshall, J., et al., 2022, “A Flight Dynamics Model for Exploring the Distributed Electrical EVTOL Cyber Physical Design Space,” Proceedings of the 2022 IEEE Workshop on Design Automation for CPS and IoT (DESTION), Milano, Italy, May 3–6, IEEE, pp. 7–12.
43. Ruiz-Montiel, M., Boned, J., Gavilanes, J., Jiménez, E., Mandow, L., and Pérez-de-la-Cruz, J.-L., 2013, “Design With Shape Grammars and Reinforcement Learning,” Adv. Eng. Inform., 27(2), pp. 230–245.
44. Mirhoseini, A., Goldie, A., Yazgan, M., Jiang, J. W., Songhori, E., Wang, S., Lee, Y. J., et al., 2021, “A Graph Placement Methodology for Fast Chip Design,” Nature, 594(7862), pp. 207–212.
45. Tavakoli, M., and Baldi, P., 2020, “Continuous Representation of Molecules Using Graph Variational Autoencoder,” arXiv preprint. https://arxiv.org/abs/2004.08152
46. Cross, N., 2004, “Expertise in Design: An Overview,” Design Studies, 25(5), pp. 427–441.
47. Chhabra, J. P., and Warn, G. P., 2019, “A Method for Model Selection Using Reinforcement Learning When Viewing Design as a Sequential Decision Process,” Struct. Multidiscip. Optim., 59(5), pp. 1521–1542.
48. Grace, K., Lou Maher, M., Wilson, D., and Najjar, N., 2017, “Personalised Specific Curiosity for Computational Design Systems,” Design Computing and Cognition ’16, Evanston (Chicago), IL, June 27–29, Springer International Publishing, Cham, pp. 593–610.
49. Agrawal, A., Won, S. J., Sharma, T., Deshpande, M., and McComb, C., 2021, “A Multi-Agent Reinforcement Learning Framework for Intelligent Manufacturing With Autonomous Mobile Robots,” Proc. Des. Soc., 1, pp. 161–170.