Abstract
In this paper, we consider the optimal control of material microstructures. Such material microstructures are modeled by the so-called phase-field model. We study the underlying physical structure of the model and propose a data-based approach for its optimal control, along with a comparison to the control using a state-of-the-art reinforcement learning (RL) algorithm. Simulation results show the feasibility of optimally controlling such microstructures to attain desired material properties and complex target microstructures.
1 Introduction
In this paper, we consider the optimal control of complex, high degree-of-freedom material microstructure dynamics, with the aim of developing a robust tool for exploring the fundamental limits of materials microstructure control during processing. The microstructure dynamics are governed by the so-called phase-field model [1], which is a powerful tool for modeling the microstructure evolution of materials. It represents the spatial distribution of physical quantities of interest by an order parameter field governed by one or a set of partial differential equations (PDEs). The phase-field method can naturally handle material systems with complicated interfaces, thanks to which it has been successfully applied to the simulation of a wide range of materials microstructure evolution processes, such as alloy solidification [2,3], alloy phase transformation [4,5], grain growth [6,7], and failure mechanisms of materials [8,9]. For this work, we consider the evolution of the interface in multiphase microstructures to be governed by the Allen–Cahn [10] and Cahn–Hilliard [11] PDEs. Despite significant research on the optimal control of the Allen–Cahn [12–14] and Cahn–Hilliard [15,16] PDEs in the past decade, material processing through data-driven optimal control of material microstructure dynamics still faces challenging obstacles for real-time implementation. Utilizing nonlinear convex-optimization algorithms to solve for the optimal control of the nonlinear Allen–Cahn and Cahn–Hilliard PDEs is viable for smaller instances. However, this approach encounters the well-known “curse of dimensionality,” manifesting as increased computational complexity as the discretization is refined to capture intricate dynamics. These algorithms are also known to have convergence issues, owing in part to the existence of multiple local minima. Furthermore, the evolution of microstructures is inherently susceptible to uncertainties stemming from material properties, external conditions, and initial configurations; the development of control strategies resilient to these uncertainties remains an ongoing challenge. Finally, selecting an appropriate boundary condition capable of steering the microstructure evolution in desired directions remains a substantial hurdle in the pursuit of effective control methodologies.
The phase-field model is complex, nonlinear, and high degree-of-freedom, and does not admit analytical solutions. Thus, data-based approaches are natural to consider for the control of these systems. In recent work [17], we proposed a novel decoupled data-based control (D2C) algorithm for learning to control an unknown nonlinear dynamical system. Our approach introduces a rigorous decoupling of the open-loop (planning) problem from the closed-loop (feedback control) problem. This decoupling allows us to solve the problem highly efficiently and in a completely data-based fashion. Our approach proceeds in two steps: (i) first, we optimize the nominal open-loop trajectory of the system using a “black-box simulation” model, and (ii) then we identify the linear system governing perturbations about the nominal trajectory using random input–output perturbation data, and design a linear-quadratic regulator (LQR) controller for this linearized system. We have shown that the performance of the D2C algorithm is approximately optimal, in the sense that the decoupled design is near-optimal to second order in a suitably defined noise parameter [17]. In this work, we apply the D2C approach to the optimal control of the evolution of material microstructures governed by the phase-field model. For comparison, we consider reinforcement learning (RL) techniques [18]. The past several years have seen significant progress in deep neural network-based reinforcement learning approaches for controlling unknown dynamical systems, with applications in many areas such as playing games [19], locomotion [20], and robotic hand manipulation [21]. A number of new algorithms with promising performance have been proposed [22–24], and various improvements and innovations continue to be developed. However, despite excellent performance on a number of tasks, RL is highly data intensive: the training time for such algorithms is typically very large, and high variance and reproducibility issues mar the performance [25]. Thus, material microstructure control is intractable for current RL techniques. Recent work in RL has also focused on PDE control using techniques such as the deep deterministic policy gradient (DDPG) with modifications such as action descriptors to limit the large action-spaces in infinite-dimensional systems [26], as such algorithms tend to degrade in performance as the dimensionality of the action-space increases. However, such an approach, in general, is not feasible for the microstructure problem (Fig. 1).
The control design of systems governed by PDEs is known to be computationally intractable [27]. Owing to the infinite dimensionality of the system, model reduction techniques are typically used to reduce the size of the state space. The most widely used method is proper orthogonal decomposition, which finds a reduced basis for the approximation of the nonlinear PDE using a simulation of the same [28]. However, in general, the basis eigenfunctions around the optimal trajectory can be quite different from those around an initial trajectory, and an approximation using the latter's reduced basis can lead to unacceptable performance [29,30]. In our approach, the nominal optimization problem is solved directly and does not need any reduced-order modeling. Furthermore, the feedback design is accomplished by identifying a linear time-varying (LTV) model of the system based purely on input–output data, which facilitates the use of linear control design techniques such as the linear-quadratic-Gaussian/iterative linear quadratic regulator (ILQR) on the problem, as opposed to having to solve a nonlinear control problem as in the methods stated previously. Furthermore, the entire control design for the problem is done purely in a data-based fashion, as the technique only queries a black-box simulation model of the process.
The contributions of the paper are as follows: we show how to model the dynamics of a multiphase microstructure and unveil its underlying structure. We present the application of the D2C approach, and of a state-of-the-art RL technique (DDPG), to the control of such material systems, and show that the local D2C approach outperforms a global RL approach such as DDPG when operating on higher dimensional phase-field PDEs. We show that D2C is highly robust to noise in practical noise regimes, and that global optimality can be recovered at higher noise levels through an open-loop replanning approach. Further, we can exploit the peculiar physical properties of material microstructure dynamics to optimize the control actuation architecture, which leads to highly computationally efficient control without sacrificing much performance. We envisage this work as a first step toward the construction of a systematic feedback control synthesis approach for the control of material microstructures, potentially scalable to real applications.
The rest of the paper is organized as follows: In Sec. 2, the microstructure dynamics are expanded upon, describing the different PDEs tested. We propose the D2C approach in Sec. 3. In Sec. 4, we discuss the discretization of the control actuation. In Sec. 5, the proposed approach is illustrated through custom-defined test problems of varying dimensions, along with testing robustness to noise, followed by improvements to the algorithms which exploit the physics of the system.
2 Microstructure Dynamics: Classes of Partial Differential Equations Implemented
This section provides a brief overview of the nonlinear dynamics model of a general multiphase microstructure. We assume an infinitely large, two-dimensional (2D) structure satisfying periodic boundary conditions.
2.1 The Material System.
where φ is called the “order parameter,” which is a spatially varying quantity. In controls parlance, φ is the state of the system and is infinite dimensional, i.e., a spatiotemporally varying function. It reflects the component proportion of each phase of the material system. For the two-phase system studied in this work, φ = −1 represents one pure phase and φ = 1 represents the other, while φ ∈ (−1, 1) stands for a combination of both phases on the boundary interface between the two pure phases; M is a parameter related to the mobility of the material, which is assumed constant in this study; F is an energy function with a nonlinear dependence on φ; and γ is a gradient coefficient controlling the diffusion level, or thickness, of the boundary interface.
In essence, the phase-field method is a gradient flow method, which means the evolution process follows the path of steepest descent in an energy landscape, starting from an initial state until arriving at a local minimum. Thus, the behavior of microstructures in phase-field modeling depends strongly on the choice of the energy density function F. For instance, the double-well potential function, owing to its two minima at φ = ±1 separated by a maximum at φ = 0 (as plotted in Fig. 2), will cause the field to separate into regions where φ ≈ ±1, divided by the boundary interface, while the single-well potential function, owing to its single minimum at φ = 0 (see Fig. 2), predicts a gradual smoothing of any initial nonuniform field, yielding a uniform field with φ = 0 in the long-time limit.
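For illustration, common choices consistent with this description (not necessarily the exact forms used in this work) are

\[
F_{\mathrm{double}}(\phi) = \frac{1}{4}\left(\phi^{2} - 1\right)^{2},
\qquad
F_{\mathrm{single}}(\phi) = \frac{1}{2}\,\phi^{2},
\]

with the double-well form having minima at φ = ±1 separated by a maximum at φ = 0, and the single-well form having its only minimum at φ = 0.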
Accordingly, the evolution of the material microstructure can be governed by selecting the proper form of the energy density function F. Controlling the evolution of microstructures represented by the Allen–Cahn equation and by the Cahn–Hilliard diffusion equation is completely distinct: the former is controlled by directly generating or deleting the order parameter, while the latter is controlled by guiding the transport and redistribution of the conserved order parameter across the whole domain.
2.2 Discretized Model for Partial Differential Equation Control.
The phase-field is discretized into a 2D grid of dimension N × N. Let $\phi_{i,j}^t$ denote the phase-field of the cell defined by grid location (i, j), for all $i, j \in \{1, \ldots, N\}$, at time t.
with the periodic boundary conditions $\phi_{i+N,j}^t = \phi_{i,j}^t$ and $\phi_{i,j+N}^t = \phi_{i,j}^t$.
where .
Given the phase-field vector $\phi_t$ and the control input vector $u_t$ at any time t, defined as stacks of all the cell states $\phi_{i,j}^t$ and control inputs $u_{i,j}^t$, respectively, we can simulate the phase-field at the next time-step from Eqs. (6) and (7). We also observe the dynamics to be affine in the control for both cases.
where $\phi_t$ is the phase variable on the spatially discretized finite difference grid at time t, and $u_t$ represents the control variables on the same grid at time t.
Remark 1. We consider the system available to us as sufficiently finely discretized to be able to accurately represent the dynamics of the infinite-dimensional PDE system. All our claims regarding the optimality of the feedback solution are with respect to this high, albeit finite, dimensional problem.
2.3 Minimization of Energy Function F for Controlling Microstructure Dynamics.
The control variables T and h govern the evolution of the microstructure by influencing the slope and minima of the energy density function F, as shown in Fig. 3. Suppose that h = 0: a positive T yields a double-well F, while a negative T gives a single-well F, as seen in Fig. 3(a). As explained earlier, under h = 0, T determines whether the phase-field goes to a uniform state or a phase-separating state in the long-time limit. When h ≠ 0, the symmetric energy profile shown in Fig. 3(a) becomes skewed, as shown in Fig. 3(b). Under this condition, the parameter h determines which phase is favored during the evolution. If h > 0, the minimum is located on the right-hand side, and therefore the phase represented by φ = +1 is favored, while if h < 0, the field more easily evolves to the phase φ = −1.
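A quartic parametrization consistent with this behavior and with Fig. 3 (our assumption; the exact coefficients used in the simulations may differ) is

\[
F(\phi;\, T, h) = \frac{1}{4}\,\phi^{4} - \frac{T}{2}\,\phi^{2} - h\,\phi,
\qquad
\frac{\partial F}{\partial \phi} = \phi^{3} - T\,\phi - h,
\]

which has a double well for T > 0, a single well for T < 0, and is tilted toward the φ = +1 (resp. φ = −1) phase for h > 0 (resp. h < 0).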
In most applications, T and h represent specific input conditions, i.e., representing temperature and an external force field (e.g., electromagnetic or mechanical), respectively. Since we aim at investigating the control of material microstructure evolution during processing, we suppose that T and h can be adjusted to any value instantaneously, and that the feedback to the controller is the entire phase-field (please see Sec. 5 for details).
2.4 Difference Between Allen–Cahn and Cahn–Hilliard Control.
Because the control variables (T and h) enter the two evolution equations through derivatives of different order, the behavior of the order parameter field as it evolves from an initial state to the desired goal state is completely distinct under the Allen–Cahn and Cahn–Hilliard equations. In the Allen–Cahn equation, no derivative acts on T and h, so the value of the order parameter can be changed directly by choosing proper T and h (see Fig. 4). In the Cahn–Hilliard equation, a Laplacian operation is applied to the term containing T and h, so the order parameter field evolves toward the desired state through the transport of the order parameter (see Fig. 4). Consequently, the cost of controlling the Allen–Cahn model to evolve from a given initial state to a goal state depends on the discrepancy between the initial and goal states, while that for the Cahn–Hilliard model depends on the distribution of the order parameter field as well.
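To make the distinction concrete, the following is a minimal sketch of one explicit update step of the two controlled equations on a periodic grid, assuming the quartic energy density sketched in Sec. 2.3 and an explicit Euler scheme; the actual Fortran solver, parameter values, and discretization used in this work may differ. The controls (T, h) enter the Allen–Cahn update directly, while in the Cahn–Hilliard update they appear only under the Laplacian, so they redistribute rather than directly create the order parameter.

```python
import numpy as np

def lap(f, dx=1.0):
    """Five-point Laplacian with periodic boundary conditions."""
    return (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
            np.roll(f, 1, 1) + np.roll(f, -1, 1) - 4.0 * f) / dx**2

def dF_dphi(phi, T, h):
    """Derivative of the assumed quartic energy density F = phi^4/4 - T*phi^2/2 - h*phi."""
    return phi**3 - T * phi - h

def allen_cahn_step(phi, T, h, M=1.0, gamma=1.0, dt=1e-2):
    """Non-conserved dynamics: d(phi)/dt = -M * (dF/dphi - gamma * lap(phi))."""
    return phi + dt * (-M) * (dF_dphi(phi, T, h) - gamma * lap(phi))

def cahn_hilliard_step(phi, T, h, M=1.0, gamma=1.0, dt=1e-3):
    """Conserved dynamics: d(phi)/dt = M * lap(dF/dphi - gamma * lap(phi))."""
    return phi + dt * M * lap(dF_dphi(phi, T, h) - gamma * lap(phi))

# Example: 10 x 10 grid with uniform controls favoring the phi = +1 phase
phi = 0.1 * np.random.randn(10, 10)
T = np.full_like(phi, 1.0)
h = np.full_like(phi, 0.2)
phi_ac = allen_cahn_step(phi, T, h)
phi_ch = cahn_hilliard_step(phi, T, h)
```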
3 Decoupled Data-Based Control Algorithm
In this section, we propose a data-based approach to solve the material phase-separation control problem. In particular, we apply the so-called D2C algorithm that we recently proposed [17] and extend it to high-dimensional applications. The D2C algorithm is a highly data-efficient RL method that has been shown to be much superior to state-of-the-art RL algorithms such as DDPG in terms of data efficiency and training stability, while retaining similar or better performance. In the following, we give a very brief introduction to the D2C algorithm (please see Refs. [17], [31], and [32] for the relevant details).
where $x_t$ is the state, $u_t$ is the control, $w_t$ is a white noise perturbation to the system, and ϵ is a small parameter that modulates the noise in the system. Suppose now that we have a finite horizon objective function $J^{\pi}(x_0) = \mathbb{E}\big[\sum_{t=0}^{T-1} c_t(x_t, u_t) + c_T(x_T)\big]$, where $c_t(\cdot,\cdot)$ denotes a running incremental cost, $c_T(\cdot)$ denotes a terminal cost function, and $\mathbb{E}[\cdot]$ is an expectation operator taken over the sample paths of the system. The objective of the control design is then to design a feedback policy $\pi = \{\pi_0, \ldots, \pi_{T-1}\}$, with $u_t = \pi_t(x_t)$, that minimizes the cost function above and is given by $\pi^{*} = \arg\min_{\pi} J^{\pi}(x_0)$.
The D2C algorithm then proposes a three step procedure to approximate the solution to the above problem.
First, an open-loop trajectory optimization problem is solved for a nominal control sequence $\bar{u}_t$ and the corresponding nominal state trajectory $\bar{x}_t$. Second, the perturbed time-varying linear system around the nominal trajectory, given by $\delta x_{t+1} = A_t\,\delta x_t + B_t\,\delta u_t$, is identified in terms of the time-varying system matrices At and Bt.
Third, an LQR controller for the above time-varying linear system is designed, whose time-varying gain is given by Kt. Finally, the control applied to the system is given by $u_t = \bar{u}_t + K_t\,\delta x_t$, where $\delta x_t = x_t - \bar{x}_t$.
3.1 Open-Loop Trajectory Optimization: Iterative Linear Quadratic Regulator-Based Approach.
We utilize an ILQR-based method to solve the open-loop optimization problem. ILQR typically requires the availability of analytical system Jacobians and thus cannot be directly applied when such analytical gradient information is unavailable (much like nonlinear programming software whose efficiency depends on the availability of analytical gradients and Hessians). In order to make it an (analytical) model-free algorithm, it is sufficient to obtain estimates of the system Jacobians from simulations, and a sample-efficient randomized way of doing so is described in Sec. 3.1.1. Since ILQR is a well-established framework, we skip the details and instead present pseudocode in Algorithm 1.
3.1.1 Estimation of Jacobians: Linear Least Squares by Central Difference.
where “ns” denotes the number of samples drawn for each of the random perturbation variables. Denote the random state and control perturbation samples as $\delta x_t^{(i)}$ and $\delta u_t^{(i)}$, for $i = 1, \ldots, ns$.
We are free to choose the distributions of $\delta x_t$ and $\delta u_t$. Given that we perform rollouts of the system in which the control perturbation $\delta u_t^{(i)}$ is a Gaussian white noise sequence for all rollouts i, the state perturbations $\delta x_t^{(i)}$ are also Gaussian and independent across the different rollouts i, for any given time t. This ensures that the least-squares regression matrix is very well conditioned: its diagonal terms concentrate around the perturbation variances, while the cross terms average to zero owing to the independence and zero mean of the perturbations.
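As a sketch of the resulting estimator (the notation below is ours, under the assumptions stated above), the Jacobians can be recovered from the central-difference rollouts by linear least squares:

\[
\frac{x_{t+1}^{+,(i)} - x_{t+1}^{-,(i)}}{2} \;\approx\; F_x\,\delta x_t^{(i)} + F_u\,\delta u_t^{(i)},
\qquad
[\hat{F}_x \;\; \hat{F}_u] \;=\; Y Z^{\top}\left(Z Z^{\top}\right)^{-1},
\]

where $x_{t+1}^{\pm,(i)}$ are the states obtained by propagating the perturbed state/control pairs $(\bar{x}_t \pm \delta x_t^{(i)},\, \bar{u}_t \pm \delta u_t^{(i)})$ one step, the columns of $Y$ collect the central differences, and the columns of $Z$ collect the stacked perturbations $[\delta x_t^{(i)};\, \delta u_t^{(i)}]$.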
3.2 Linear Time-Varying System Identification.
Closed-loop control design in step 2 of D2C requires knowledge of the linearized system parameters At and Bt for all time-steps t along the nominal trajectory. Here, we use the standard least-squares method to estimate these parameters from input–output experiment data.
First, start from the perturbed linear system about the nominal trajectory and estimate the system parameters At and Bt from $\delta x_{t+1}^{(n)} = A_t\,\delta x_t^{(n)} + B_t\,\delta u_t^{(n)}$, where $\delta u_t^{(n)}$ is the control perturbation vector we feed to the system at step t of the nth simulation, and $\delta x_t^{(n)}$ is the state perturbation vector that we get from running forward simulations/rollouts with the above control perturbations. All the perturbations are zero-mean, independent and identically distributed Gaussian noise with covariance matrix σI, where σ is a small, o(u) value selected by the user. The term $\delta x_{t+1}^{(n)}$ denotes the deviation of the output state vector from the nominal state after propagating for one step.
Finally, using the standard least-squares method, the linearized system parameters are estimated as $[\hat{A}_t \;\; \hat{B}_t] = \big(\sum_{n} \delta x_{t+1}^{(n)} \, (z_t^{(n)})^{\top}\big)\big(\sum_{n} z_t^{(n)} (z_t^{(n)})^{\top}\big)^{-1}$, where $z_t^{(n)} = [(\delta x_t^{(n)})^{\top} \;\; (\delta u_t^{(n)})^{\top}]^{\top}$.
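A minimal sketch of this system identification step is given below, assuming access to a black-box one-step simulator `step(x, u)` (a hypothetical interface, not part of the released code) and the nominal trajectory; the rollout bookkeeping in the actual implementation may differ.

```python
import numpy as np

def identify_ltv(step, x_nom, u_nom, n_rollouts=200, sigma=1e-3, rng=None):
    """Estimate (A_t, B_t) along a nominal trajectory by least squares.

    step(x, u) -> next state is an assumed black-box simulator interface.
    x_nom: (T+1, nx) nominal states; u_nom: (T, nu) nominal controls.
    """
    rng = rng or np.random.default_rng(0)
    T, nu = u_nom.shape
    nx = x_nom.shape[1]
    A, B = [], []
    for t in range(T):
        dX = sigma * rng.standard_normal((n_rollouts, nx))   # state perturbations
        dU = sigma * rng.standard_normal((n_rollouts, nu))   # control perturbations
        # One-step output deviations from the nominal next state
        dXp = np.array([step(x_nom[t] + dx, u_nom[t] + du) - x_nom[t + 1]
                        for dx, du in zip(dX, dU)])
        Z = np.hstack([dX, dU])                              # (n_rollouts, nx + nu)
        # Solve dXp ≈ Z @ [A_t B_t]^T in the least-squares sense
        AB, *_ = np.linalg.lstsq(Z, dXp, rcond=None)
        A.append(AB[:nx].T)
        B.append(AB[nx:].T)
    return A, B
```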
3.3 Closed-Loop Control Design.
Given the estimated perturbed linear system, we design a finite horizon, discrete time LQR [33] along the trajectory for each time-step to minimize the cost function $J = \mathbb{E}\big[\delta x_T^{\top} Q_T\, \delta x_T + \sum_{t=0}^{T-1}\big(\delta x_t^{\top} Q\, \delta x_t + \delta u_t^{\top} R\, \delta u_t\big)\big]$, subject to $\delta x_{t+1} = A_t\,\delta x_t + B_t\,\delta u_t$, where $Q, Q_T \succeq 0$ and $R \succ 0$ are the chosen cost matrices. The feedback gains are calculated as $K_t = -(R + B_t^{\top} P_{t+1} B_t)^{-1} B_t^{\top} P_{t+1} A_t$, where Pt is solved in a back-propagation fashion from the Riccati equation $P_t = Q + A_t^{\top} P_{t+1} A_t - A_t^{\top} P_{t+1} B_t (R + B_t^{\top} P_{t+1} B_t)^{-1} B_t^{\top} P_{t+1} A_t$, with $P_T = Q_T$. The closed-loop control policy is $u_t = \bar{u}_t + K_t\,\delta x_t$, where $\delta x_t = x_t - \bar{x}_t$ is the state deviation vector from the nominal state at time-step t.
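A sketch of the backward Riccati recursion and feedback gain computation for the identified LTV system is shown below, assuming quadratic cost weights Q, R and terminal weight Q_T as above (the exact weights are problem-specific assumptions).

```python
import numpy as np

def lqr_gains(A, B, Q, R, QT):
    """Finite-horizon, discrete-time LQR gains along a nominal trajectory.

    A, B: lists of identified time-varying matrices A_t (nx x nx), B_t (nx x nu).
    Returns feedback gains K_t such that du_t = K_t @ dx_t.
    """
    P = QT
    K = [None] * len(A)
    for t in reversed(range(len(A))):
        BtP = B[t].T @ P
        # K_t = -(R + B' P B)^{-1} B' P A
        K[t] = -np.linalg.solve(R + BtP @ B[t], BtP @ A[t])
        # Riccati recursion written with the optimal gain substituted in
        P = Q + A[t].T @ P @ (A[t] + B[t] @ K[t])
    return K

# At run time, the closed-loop policy is u_t = u_nom[t] + K[t] @ (x_t - x_nom[t]).
```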
3.4 Decoupled Data-Based Control Algorithm: Summary.
The D2C algorithm is summarized in Algorithm 2.
Input: Initial State — …, System parameters — …;
m ← 1. /* Initialize the iteration number m to 1. */
… = true.
/* Run until the difference in costs between subsequent iterations is less than an ϵ fraction of the former cost. */
while … or … do
    /* Each iteration has a backward pass followed by a forward pass. */
    {…}, … ← Backward Pass(…).
    if backward_pass_success_flag == true then
        … ← Forward Pass(…).
        while forward_pass_flag == false do
            … ← Forward Pass(…).
            Reduce α from ….
        end while
    else
        Increase μ from …. /* Regularization step */
    end if
    ….
    ….
end while
return …
(1) Solve the deterministic open-loop optimization problem for the optimal open-loop control sequence $\bar{u}_t$ and state trajectory $\bar{x}_t$ using ILQR (Sec. 3.1).
(2) Linearize and identify the LTV system parameters (At, Bt) via least squares (Sec. 3.2).
(3) Solve the Riccati equations for each step along the nominal trajectory for the feedback gains Kt.
(4) Apply the closed-loop control policy:
while t < T do
    $u_t = \bar{u}_t + K_t\,\delta x_t$.    (12)
    t ← t + 1.
end while
4 Control Discretization
The system dynamics and algorithm described in Secs. 2 and 3 assume the state to be fully observed, and the actuation was designed to have grid-level discretization in the control inputs. However, such a detailed actuation scheme is not practical, owing to (a) the lack of actuators at such precision scales, and (b) the scale of the actuation growing rapidly for realistic grid discretizations. Thus, there is a need to find a suitable control discretization, akin to a boundary condition, for suitably directing the phase-field dynamics.
PDEs generally tend to have an underlying low-dimensional structure owing to sparsity. An existing adaptive strategy for the Allen–Cahn equation without control uses a fine mesh on the interface and a coarser mesh in the bulk regions [13]; however, incorporating control with such a scheme may result in the nucleation of a phase [34]. Hence, there is a need to fine-tune the control discretization in order to trade off controllability against feasibility.
4.1 Tuning Control Discretization.
Selection of the ideal control discretization warrants consideration of the features present in the dynamics, along with the interfaces between the multiple phases present in the microstructure. Typically, a grid-scale control actuation would be ideal for maintaining full control over the microstructure dynamics. However, from Sec. 3.2, we know that the computation of the LTV system parameters (At, Bt) scales with the dimensions of the state and control vectors, and a full grid-scale control discretization yields the largest possible control dimension (twice the number of grid cells). As we scale the grid to realistic meshes, the LTV system identification thus becomes prohibitively expensive, making the algorithm infeasible. Hence, there is a need to coarsen the control discretization to reduce the computational complexity. Conversely, too coarse a control mesh would result in the inability to steer the dynamics toward specific features, reducing the reachability of the target microstructure.
In order to empirically determine the effect of different control discretization schemes and identify a suitable choice, we vary the discretization from the phase grid-level to coarser grids. We aim to achieve a banded microstructure configuration as our target and vary the actuation scheme. This results in situations where the actuation grid is either smaller or larger than the features of the target microstructure (Figs. 5(a) and 5(b)). For both cases, it is observed that the reduction in control effort, as compared to the original grid-scale control discretization, comes at the expense of reachability of the target microstructure (Fig. 5(d)).
We also observe that, given segments of the grid with the same goal state, if we choose our control resolution such that the same control input is fed to the grid sections with the same corresponding target state, we are able to achieve the desired state (Figs. 5(c) and 5(d)). Thus, it is essential to balance a control grid that is sufficiently fine to capture the features of the target microstructure against one that is sparse enough to reduce the computational burden and ease application, without significant losses in precision.
Hence, we arrive at a spatial control strategy based on the distribution of the order parameter in the goal state, named “goal state-oriented” (GS-o) control in this study.
4.2 Goal State-Oriented Scheme of Actuation Discretization.
While the desired order parameter field always consists of a certain combination of pure phases represented by φ = ±1, numerical experiments reveal that the order parameter can be guided from any initial value to a pure phase by using the same control sequence (T and h).
This can be done through a two-step process. Starting from any arbitrary distribution of the order parameter, we first homogenize the microstructure through uniform heating, giving a structure with a uniform initial order parameter, which we take as the new initial state. We then realize the control by applying an identical control sequence to each region whose target value in the goal state is the same (Fig. 6). That is, we do not control every pixel across the whole domain; instead, we consistently control two regions: one represented by φ = −1 and the other by φ = +1 in the goal state (Fig. 5(c)). Thus, the same control inputs (T, h) are used to actuate all the regions corresponding to φ = +1, while a different pair of control inputs is used to actuate all the regions corresponding to φ = −1. This actuation scheme is typical of these material microstructure systems, and computational results using such a scheme are presented in Sec. 5.
This treatment significantly reduces the number of control inputs from the order of the spatial discretization (2N² for an N × N grid) to just 4, and therefore greatly reduces the computational cost. Moreover, such a control resolution scheme is more feasible in practical scenarios as well.
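A sketch of how the GS-o scheme maps only four scalar control inputs onto the full grid is given below, assuming a two-phase goal state whose regions are labeled by the sign of the goal order parameter (variable names are illustrative).

```python
import numpy as np

def gso_controls(goal, u4):
    """Expand 4 GS-o control inputs into grid-level (T, h) fields.

    goal: (N, N) goal-state order parameter (entries near -1 or +1).
    u4:   (T_minus, h_minus, T_plus, h_plus) controls for the two goal regions.
    """
    T_m, h_m, T_p, h_p = u4
    plus = goal > 0                      # region that should reach phi = +1
    T = np.where(plus, T_p, T_m)         # same T applied to all cells of a region
    h = np.where(plus, h_p, h_m)         # same h applied to all cells of a region
    return T, h
```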
5 Simulation Results
In this section, we compare the training and performance of the data-based D2C approach, with the RL-based DDPG, on a material with control inputs as the temperature and external field inputs on subsets of grid points.
5.1 Structure and Task.
We simulated the phase-separation dynamics in Python, calling an explicit, second-order solver subroutine written in Fortran. The system and its tasks are defined as follows:
Material Microstructure. The material model simulated consists of a two-dimensional grid of dimensions 10 × 10, 20 × 20, and 50 × 50, i.e., 100, 400, and 2500 fully observable states, respectively. The order parameter at each grid point can vary in the range φ ∈ [−1, 1]. Furthermore, the model is propagated using an explicit, second-order, central-difference-based scheme. The control inputs (T, h) are actuated at each grid point separately; thus, the number of control variables is twice the total number of observable states (200, 800, and 5000, respectively). The control task is to influence the material dynamics to converge to a desired goal microstructure.
The initial and the desired final states of the model are shown in Fig. 7.
5.2 Algorithms Tested.
Deep deterministic policy gradient is a policy-gradient-based, off-policy reinforcement learning algorithm that operates in continuous state and action-spaces. It relies on two function approximation networks, one each for the actor and the critic. The critic network estimates the Q(s, a) value given the state and the action taken, while the actor network generates a policy given the current state. Deep neural networks are employed to represent both.
Further details regarding the algorithm and the implementation are provided in Appendix A2.
5.3 Training and Testing.
The decoupled data-based control implementation is done in three stages corresponding to those described in Sec. 3, and a black-box phase-field model is simulated in Python.
Training: The open-loop training plots in Fig. 8 show the cost curve during training. After the cost curves converge, we obtain the optimal control sequence that drives the systems to accomplish their tasks. The training parameters and outcomes are summarized in Tables 1–3. We note that the training time is dominated by the Python simulation code and can be made significantly faster with optimized and parallelized code. Thus, we envisage this work as an initial proof of concept for controlling material microstructures.
Training time (s)

| Goal state | D2C | DDPG |
| --- | --- | --- |
| Goal-I (10 × 10) | 15.83 | 2567.71* |
| Goal-II (20 × 20) | 55.206 | 4419.36* |
| Goal-III (50 × 50) | 3289.354 | ** |
(1) The open-loop training is run on a laptop with a two-core CPU at 2.9 GHz and 12 GB RAM, with no multithreading at this point. (2) * implies the algorithm cannot converge to the goal state. (3) ** implies the system runs out of memory at run-time.
Training time (s)

| Goal state | D2C | DDPG |
| --- | --- | --- |
| Goal-I (10 × 10) | 141.573 | 3598.298* |
| Goal-II (20 × 20) | 5083.223 | 9956.454* |
(1) The open-loop training is run on a laptop with a two-core CPU at 2.9 GHz and 12 GB RAM, with no multithreading at this point. (2) * implies the algorithm cannot converge to the goal state.
| System | No. of steps | No. of actuators | No. of parameters optimized in D2C | No. of parameters optimized in DDPG |
| --- | --- | --- | --- | --- |
| Goal-I | 10 | 200 | 2000 | 461,901 |
| Goal-II | 10 | 800 | 8000 | 1,122,501 |
| Goal-III | 10 | 5000 | 50,000 | 5,746,701 |
Testing Criterion: Given that RL algorithms perform a search over a nonlinear parametrization for optimization, it might be expected for DDPG to yield a global feedback policy, while D2C, with a linear feedback structure, should provide a locally valid feedback solution. However, from the training outcomes, we observe that DDPG fails to converge to the target, and hence using DDPG solutions as a global feedback law is infeasible in this context. Thus, for the closed-loop design, we proceed with the system identification and feedback gain design step of the D2C algorithm mentioned in Secs. 3.2 and 3.3 to get the closed-loop control policy.
To observe how the closed-loop feedback solution performs globally, we benchmark the performance of the open-loop and closed-loop D2C control policies under different noise levels. The open-loop control policy applies the optimal control sequence solved in the open-loop training step without any feedback. Adding perturbations to this control sequence drives the state off the nominal trajectory, increasing the episodic cost with a corresponding increase in noise level. Hence, zero-mean Gaussian, independent and identically distributed (i.i.d.) noise is added to every control channel at each step, with the standard deviation proportional to the maximum control signal in the designed optimal control sequence for D2C, sweeping over a range of noise levels to explore the “global” behavior. One hundred rollouts are simulated for each noise level, and the mean and standard deviation of the total episodic cost are used as the benchmark criterion.
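The benchmarking procedure can be sketched as follows; the rollout and cost interfaces below are assumptions for illustration, not part of the released code.

```python
import numpy as np

def benchmark(rollout_cost, u_nom, noise_levels, n_rollouts=100, rng=None):
    """Episodic cost statistics under additive control-channel noise.

    rollout_cost(noise) -> episodic cost for one rollout with the given
    additive control noise sequence (assumed black-box interface).
    Noise std is proportional to the maximum nominal control signal.
    """
    rng = rng or np.random.default_rng(0)
    u_max = np.abs(u_nom).max()
    stats = []
    for level in noise_levels:                       # e.g., 0.0, 0.1, ..., 1.0
        costs = [rollout_cost(level * u_max * rng.standard_normal(u_nom.shape))
                 for _ in range(n_rollouts)]
        stats.append((level, np.mean(costs), np.std(costs)))
    return stats
```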
The feedback policy obtained appears to be near-optimal and highly robust at low noise levels (Fig. 9), which is as expected from the near-optimality guarantee of the algorithm used [32]. However, when operating in high noise regimes, the robustness of the feedback policy quickly degrades (Fig. 10).
It must be noted that the range of noise levels (at least up to about 100% of the maximum nominal control signal) that we consider here is far beyond what is typically encountered in practical scenarios. Generally, noise in the system results from either external disturbances in the ambient conditions or small disturbances in the control channels, arising potentially from faulty actuators, signal attenuation, cross-talk, etc. In practice, control actuators have a typical signal-to-noise ratio of , resulting in a noise of the control signal. Thus, as evident from the results, the feedback policy should maintain robustness and near-optimality in such practical regimes.
5.4 Optimization for Computational Efficiency.
From the experiments, we are able to drive the given microstructure to any of our given goal states within a finite time, through an iterative optimization. Thus, two questions arise: What is the optimal actuation scheme that minimizes the control cost (analogous to energy expenditure in practical applications) without significantly impacting the ease of application of such a scheme? And are there approaches for smarter selection of the initial guess, wherein we can leverage knowledge about the physics of the system to ensure faster convergence?
From the set of observations, there are three regimes wherein we tackle this problem.
5.4.1 Actuation Discretization Using Goal State-Oriented Control.
The advantage of GS-o control in computational efficiency over the fully actuated control is shown in Fig. 11: (i) thanks to the smaller number of control variables, GS-o converges much faster than the fully actuated control, as seen in Fig. 11(a), and takes much less computation time, as shown in Fig. 11(b); and (ii) the cost of attaining better computational efficiency is that the GS-o control either requires a relatively higher control cost to guide the microstructure to terminal states with the same discrepancy from the goal state, as seen in Fig. 11(c), or attains a terminal state with a relatively larger discrepancy for the same control cost. This is caused by the increased difficulty of controlling the interface region under GS-o control. Considering that the sharp interface in the goal state is in reality somewhat smoothed, the error located near the interface during control is of limited significance. Therefore, it is reasonable to conclude that GS-o is able to control the system with much better computational efficiency.
5.4.2 Temporal Discretization.
The rate of actuation scheduling of the control inputs (i.e., the horizon discretization) also impacts the total control cost. We thus need to test whether holding a control input for longer step sizes is more efficient than updating the controls at a faster frequency.
From simulations held for a fixed period of the dynamics, it is observed that increasing the number of control update steps (while decreasing the average hold times) is significantly more cost-efficient (Fig. 12).
5.4.3 Warm Starting (Model-Based Approach).
In this section, we derive a simple control scheme to control the inputs to each grid point in the material, so as to reach the desired phase distribution. In order to achieve this, we exploit the Allen–Cahn dynamics along with the assumptions of a periodic boundary condition to get a better initial estimate for the control scheme.
Thus, the simplified model reduces to an ordinary differential equation in time instead of a complex spatiotemporal partial differential equation. This single-pixel system is then fed to our ILQR-based control scheme to generate a look-up table (of sorts) of control trajectories from the given initial state to both stable final states φ = ±1.
The control trajectories thus obtained are mapped to any desired final state structure and serve as an improved initial guess for the main algorithm, i.e., now the algorithm only has to optimize for the diffusion term.
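A sketch of the single-pixel warm start is given below: dropping the spatial Laplacian reduces the Allen–Cahn dynamics at one grid point to an ODE in time, which can be optimized once and reused as the initial control guess for every pixel with the same target phase. The function names and the free-energy form are assumptions consistent with Sec. 2.

```python
import numpy as np

def single_pixel_step(phi, T, h, M=1.0, dt=1e-2):
    """Explicit Euler step of the pixel-wise Allen-Cahn ODE (diffusion dropped):
    d(phi)/dt = -M * (phi^3 - T*phi - h)."""
    return phi + dt * (-M) * (phi**3 - T * phi - h)

def rollout_pixel(phi0, controls):
    """Propagate one pixel under a (T, h) control sequence; used to build the
    warm-start look-up table toward phi = +1 or phi = -1."""
    phi = phi0
    for T, h in controls:
        phi = single_pixel_step(phi, T, h)
    return phi
```

The open-loop control sequence obtained for this scalar system toward φ = +1 (resp. φ = −1) can then be tiled over all grid points whose goal value is +1 (resp. −1), so that the full ILQR only needs to correct for the diffusive coupling.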
We compare the D2C training with a random initial guess (Fig. 13) against training initialized with the single-pixel, model-based control trajectory (Fig. 14). From the training convergence plots, it is clear that the control trajectory obtained from the model-based approach achieves a performance within 10% of that of the optimized trajectory, which includes the diffusive aspects of the Allen–Cahn dynamics.
Our approach thus cuts the training time by nearly 40%, and the total convergence time (including generating the look-up table for the single-pixel model) is still less than that of running the full-scale model with a random initial guess.
We also observe that the final optimal control trajectories for different spatial points of the material that converge to the same final phase are identical. Thus, mapping the same initial guess to all such points is justified.
Discussion: The total time comparison in Tables 1 and 2 shows that D2C learns the optimal policy significantly faster than DDPG. The primary reason for this disparity is the feedback parametrization of the two methods: the DDPG deep neural nets are complex parametrizations that are difficult to search over, when compared to the highly compact open-loop + linear feedback parametrization of D2C, as is apparent from Table 3. We observe from experiments that the DDPG algorithm is not able to converge to the goal state for the material system of state-dimension over 5 × 5 discretization (25 states and 50 controls), while the D2C algorithm can quickly generate the closed-loop feedback solution even for the more complex systems.
From Fig. 11(c), we also observe that, for the Allen–Cahn dynamics, the additional cost from switching to a control-grid of smaller discretization, mainly stemming from an increase in control cost, is significantly smaller when compared to the ease of application, and computation time advantage (Fig. 11(b)) gained from such an approach.
Limitations: Despite the computational advantages of a tighter parametrization, along with the near-optimality guarantees that the D2C algorithm holds, scaling such an optimal control approach to realistic microstructure grids will be computationally intractable, owing to the sampling/rollouts required for the system identification of the “black-box.” In order to exploit the sparsity of the microstructure dynamics, we have proposed a new approach to the issue of scaling complexity in our recent work [35]. Also, the D2C algorithm relies on full-state feedback to generate the optimal trajectories, which may not be an option for realistic-scale grids owing to sensor constraints. In order to incorporate our proposed GS-o approach into such problems, we have proposed a partially observed extension of the ILQR algorithm in our recent work [36].
6 Conclusions and Future Work
In this article, we have provided an overview of the modeling of multiphase microstructures in accordance with the Allen–Cahn and Cahn–Hilliard equations. We have also compared learning/data-based approaches to the control of such structures, outlining their relative merits and demerits.
In the future, we further plan to extend the methodology to real-world materials, leading to the development of an experimental testbed on which the proposed approach for microstructure control can be verified and modified to facilitate the development of designer materials.
Funding Data
NSF CDS&E program (Grant No. 1802867; Funder ID: 10.13039/100000078).
Data Availability Statement
The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.
Footnotes
This paper was presented at the American Control Conference 2024.
Appendix
A1 Iterative Linear Quadratic Regulator Details.
We present details of the D2C–ILQR algorithm in the following. The “forward pass” and “backward pass” algorithms are summarized in Algorithms 3 and 4, respectively. Algorithm 5 presents the differential dynamic programming (DDP) feedback algorithm used to calculate the optimal feedback gain.
Input: Nominal trajectory — …, previous iteration policy parameters — …, and system and cost parameters — ….
….
….
while t < N do
    /* Simulate one step forward (α is the line-search parameter.) */
    ….
end while
if … is acceptable then
    return …, true.
else
    return …, false.
end if
Input: Nominal trajectory — …, previous iteration policy parameters — …, horizon — N, and system and cost parameters — ….
/* Backward pass starts from the final time-step, i.e., N − 1. */
….
Compute … and … using boundary conditions.
/* Keep a copy of the previous policy gains. */
… and ….
while … do
    /* Obtain the Jacobians from simulator rollouts as shown in Sec. 3.1.1: */
    /* Obtain the partials of the Q function as follows: */
    if … is positive-definite then
        ….
    else
        /* If … is not positive-definite, then abort the backward pass. */
        return …, false.
    end if
    /* Obtain the partials of the value function Jt as follows: */
end while
return …, true.
Input: Nominal trajectory — …, horizon — N, and system and cost parameters — ….
/* Start from the final time-step, i.e., N − 1. */
….
Compute … and … using boundary conditions.
while … do
    /* Obtain the Jacobians from simulator rollouts as shown in Sec. 3.1.1: */
    /* Obtain the Hessians from simulator rollouts as shown above: */
    /* Obtain the partials of the Q function as follows: */
    if … is positive-definite then
        ….
    else
        /* If … is not positive-definite, then abort the backward pass. */
        return error.
    end if
    /* Obtain the partials of the value function Jt as follows: */
end while
return {K}, true.
DDP Feedback Gain Calculation. Once the optimal nominal trajectory is obtained with ILQR, one DDP backward pass is conducted to find the linear optimal feedback gain, as shown in Algorithm 5. Then, the linear feedback is wrapped around the nominal control sequence ($u_t = \bar{u}_t + K_t\,\delta x_t$), where $\delta x_t$ is the state deviation from the nominal state $\bar{x}_t$.
A2 Deep Deterministic Policy Gradient Algorithm Implementation Details.
The off-policy characteristic of the algorithm employs a separate behavioral policy obtained by introducing additive noise to the policy output of the actor network. The critic network minimizes a loss based on the temporal-difference error, and the actor network updates its policy using the deterministic policy gradient theorem, $\nabla_{\theta^{\mu}} J \approx \mathbb{E}\big[\nabla_{a} Q(s, a \mid \theta^{Q})\big|_{a=\mu(s)}\, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big]$.
The actor and the critic networks consist of two hidden layers with the first layer containing 400 (“relu” activated) units followed by the second layer containing 300 (relu activated) units. The output layer of the actor network has the number of (“tanh” activated) units equal to that of the number of actions in the action-space.
Target networks, one each for the actor and the critic, are employed for a gradual update of the network parameters, thereby reducing oscillations and improving training stability. The target networks are updated with a soft-update rate τ. Finally, the networks are compiled using the Adam optimizer with a learning rate of 0.001.
The exploration noise is generated by an Ornstein–Uhlenbeck process, where Θ indicates how fast the process reverts to the mean, μ is the equilibrium or mean value, and σ corresponds to the degree of volatility of the process. Θ is set to 0.15, μ to 0, and σ is annealed from 0.35 to 0.05 over the training process.
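A sketch of the Ornstein–Uhlenbeck exploration noise used for the behavioral policy, with the parameter values listed above, is shown below; the unit time discretization is an assumption.

```python
import numpy as np

class OUNoise:
    """Discretized Ornstein-Uhlenbeck process:
    x_{k+1} = x_k + theta * (mu - x_k) + sigma * N(0, I)."""

    def __init__(self, dim, theta=0.15, mu=0.0, sigma=0.35):
        self.theta, self.mu, self.sigma = theta, mu, sigma
        self.x = np.full(dim, mu, dtype=float)

    def sample(self):
        # Mean-reverting drift plus Gaussian diffusion
        self.x += self.theta * (self.mu - self.x) + self.sigma * np.random.randn(*self.x.shape)
        return self.x
```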
A3 Deep Deterministic Policy Gradient Trajectories.
The corresponding trajectories as generated by the DDPG algorithm can be seen in Fig. 15. It is observed that the DDPG algorithm cannot converge to the goal state for grids of over 5 × 5 discretization.