Abstract
Model validation is the process of determining the degree to which a model is an accurate representation of the true value in the real world. The results of a model validation study can be used either to quantify the model form uncertainty or to improve/calibrate the model. The model validation process becomes complex when there is uncertainty in the simulation and/or experimental outcomes. These uncertainties can take the form of aleatory uncertainties due to randomness or epistemic uncertainties due to lack of knowledge. Five different approaches are used for addressing model validation and predictive capability: (1) the area validation metric (AVM), (2) a modified area validation metric (MAVM) with confidence intervals, (3) the validation uncertainty procedure from ASME V&V 20, (4) a calibration procedure interpreted from ASME V&V 20, and (5) identification of the model discrepancy term using Bayesian estimation. To provide an unambiguous assessment of these different approaches, synthetic experimental data are generated from computational fluid dynamics simulations of an airfoil with a flap. A simplified model is then developed using thin airfoil theory, and its accuracy is assessed using the synthetic experimental data. The quantities examined are the two-dimensional lift and moment coefficients for the airfoil at varying angles of attack and flap deflection angles. Each approach is assessed for its ability to tightly encapsulate the true value both at conditions where experimental results are provided and at prediction locations where no experimental data are available. Generally, the MAVM performed best in cases with sparse data and/or large extrapolations, while Bayesian estimation outperformed the other approaches when there is an extensive amount of experimental data that covers the application domain.
1 Introduction
Mathematical models are useful because they can adequately predict real-world physics, but their results nearly always carry some error and/or uncertainty. Error and uncertainty can come from many sources: they can be introduced by assumptions and approximations in the formulation of the model, as well as by uncertainty in the measured inputs required by the model. Furthermore, the true value of a physical quantity cannot be measured perfectly because of experimental measurement uncertainties. Regardless of these difficulties, it is important to assess model accuracy by comparing simulation results with experimental measurements. Model validation is the process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model. Model calibration is the process of adjusting physical model parameters in the computational model to improve agreement with the experimental measurements. It should be noted that any experiment is a result from the real world, regardless of its connection to an actual system of interest or any measurement difficulties. For the purposes of this paper, it is stressed that calibration refers to an updated estimate of the model discrepancy and not a calibration of the model inputs themselves. A validation experiment is an experiment conducted with the primary purpose of assessing the predictive capability of a mathematical model. Validation experiments differ from traditional experiments used for exploring a physical phenomenon or obtaining information about a system because the customer for the validation experiment is commonly the model builder or computational analyst. For a detailed discussion of the design and execution of model validation experiments, see Oberkampf and Roy [1] and Oberkampf and Smith [2].
Various approaches have been proposed for assessing model accuracy, i.e., model validation, given that experimental measurements are available [3–5]. In Secs. 5 and 6, four different approaches are used for addressing model validation: (1) the area validation metric (AVM) [6], (2) a modified area validation metric (MAVM) with confidence intervals [7,8], (3) the validation uncertainty procedure from ASME V&V 20 [9], and (4) identification of the model discrepancy term using Bayesian estimation [10]. To provide an unambiguous assessment of the effectiveness of these different approaches, synthetic experimental data, i.e., truth data, are generated. The synthetic experimental data are generated using a high-fidelity computational fluid dynamics (CFD) simulation of turbulent, compressible flow over a two-dimensional wing with a flap. The synthetic experimental data are presumed to have only an associated random measurement uncertainty, with no bias or systematic errors. To properly assess the effectiveness of each of the methods, random samples are drawn from the synthetic experimental data and then used in each method.
The effectiveness of each of the methods is assessed using the same approximate model of flow over the airfoil with a flap. The approximate model is based on thin airfoil theory, which assumes inviscid, incompressible flow. The approximate model is assumed to have two uncertain inputs, angle of attack and flap angle, which are characterized as Gaussian random variables. The methods are first assessed at a number of input conditions where (synthetic) experimental data are available; discussion of results for this traditional case of model validation is given in Sec. 5. The effectiveness of the methods is also assessed for input conditions where no experimental data are considered to be available. This activity is typically referred to as assessment of the predictive capability of a model, or model-based extrapolation to conditions where experimental data are not available. Using the method of manufactured universes (MMU) [11], in combination with the synthetic experimental data, the methods are assessed for predictive capability. Each of the methods uses an extrapolation technique to estimate the mean value of the model form uncertainty and its prediction interval at conditions where no experimental data are available. Discussion of results for predictive capability is given in Sec. 6, and the performance of each method is summarized in Sec. 7.
2 Model Validation/Calibration Methods
Four different methods are assessed for their effectiveness in model validation/calibration. Model validation is the process of determining the degree to which a model is an accurate representation of the true value in the real world. The four validation/calibration approaches that are assessed are the area validation metric [6], the modified area validation metric [7,8], the ASME V&V 20 standard validation uncertainty [9], and a Bayesian estimation approach [10]. These four methods were chosen because they are each popular methods for assessing model accuracy when experimental measurements are available. For the assessment here, all methods assume that numerical solution error in the simulation result is negligible. Furthermore, it is assumed that the experimental measurement uncertainty occurs only in the measurement of the system response quantities (SRQs).
2.1 Area Validation Metric.
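The AVM quantifies the disagreement between the cumulative distribution function (CDF) of the nondeterministic simulation, F(Y), and the empirical CDF constructed from the experimental measurements, Sn(Y), as the area between the two curves; following Ref. [6], it can be written as

\[
d = \int_{-\infty}^{\infty} \left| F(Y) - S_n(Y) \right| \, dY
\]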
where d is the area validation metric and Y is the SRQ. Once d is determined, the interval in which the true value is estimated to lie is [F(Y) – d, F(Y) + d]. An example of the area validation metric applied for a case where only aleatory uncertainties are present in the model inputs is shown in Fig. 1.
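As a concrete illustration, a minimal Python sketch of this area computation is given below, assuming the simulation and experimental CDFs are represented by finite samples; the function name is illustrative and not taken from the paper's implementation.

```python
import numpy as np

def area_validation_metric(sim_samples, exp_samples):
    """Area between the simulation CDF F(Y) and the empirical
    experimental CDF S_n(Y), evaluated on a merged sample grid."""
    sim = np.sort(np.asarray(sim_samples, dtype=float))
    exp = np.sort(np.asarray(exp_samples, dtype=float))
    # Evaluate both empirical CDFs on the union of sample values.
    grid = np.sort(np.concatenate([sim, exp]))
    F = np.searchsorted(sim, grid, side="right") / sim.size
    Sn = np.searchsorted(exp, grid, side="right") / exp.size
    # Integrate |F - Sn| over Y using the step widths between grid points.
    widths = np.diff(grid)
    return float(np.sum(np.abs(F - Sn)[:-1] * widths))
```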
The MFU, as estimated using the AVM, is treated as a purely epistemic uncertainty when used in combination with probability bounds analysis (PBA) [1,6,12,13]. That is, the AVM is considered as an interval-valued uncertainty on both sides of the CDF computed from the nondeterministic simulation. Figure 2 shows the probability box (p-box) that results when the AVM is added to both sides of the CDF from the simulation. This p-box represents the family of all possible CDFs that can exist within its bounds. The outer bounding shape of the p-box in Fig. 2 is due to the probabilistically characterized input uncertainty. The p-box can be interpreted as follows: the probability that the SRQ is 4 or less lies in the interval [0.02, 0.52]. That is, given the inclusion of the estimated MFU, the probability that the SRQ is 4 or less could be as low as 0.02, but it could be as high as 0.52. As discussed in Refs. [1,12], and [13], epistemic uncertainties due to model input uncertainty and numerical solution uncertainty can also be included using PBA. These additional uncertainties additively increase the size of each preceding p-box.
2.2 Modified Area Validation Metric.
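Consistent with the nomenclature and the Appendix algorithm, the MAVM separates the area between the experimental and simulation CDFs into the portion where the experimental CDF lies above the simulation CDF and the portion where it lies below (Refs. [7,8] additionally evaluate these areas on confidence-interval CDFs of the experiment):

\[
d^{+} = \int_{S_n(Y) > F(Y)} \left[ S_n(Y) - F(Y) \right] dY, \qquad
d^{-} = \int_{S_n(Y) < F(Y)} \left[ F(Y) - S_n(Y) \right] dY
\]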
The interval-valued estimate of the MFU [d–, d+], when combined with PBA, will yield an asymmetric p-box as opposed to the symmetric p-box for the AVM shown in Fig. 2. The more biased the simulation is relative to the available experimental data, the more asymmetric the p-box will be when using the MAVM.
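A minimal Python sketch of the signed-area computation is given below; it returns only the raw d+ and d– areas and omits the confidence-interval CDFs applied in Refs. [7,8] (see the Appendix algorithm), so it is a simplified illustration rather than the full MAVM procedure.

```python
import numpy as np

def mavm_areas(sim_samples, exp_samples):
    """Signed areas between the experimental CDF S_n(Y) and the
    simulation CDF F(Y): d_plus where S_n > F, d_minus where S_n < F."""
    sim = np.sort(np.asarray(sim_samples, dtype=float))
    exp = np.sort(np.asarray(exp_samples, dtype=float))
    grid = np.sort(np.concatenate([sim, exp]))
    F = np.searchsorted(sim, grid, side="right") / sim.size
    Sn = np.searchsorted(exp, grid, side="right") / exp.size
    diff = (Sn - F)[:-1]
    widths = np.diff(grid)
    d_plus = float(np.sum(np.where(diff > 0.0, diff, 0.0) * widths))
    d_minus = float(np.sum(np.where(diff < 0.0, -diff, 0.0) * widths))
    return d_plus, d_minus
```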
2.3 ASME V&V 20 Standard Validation Uncertainty.
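In the ASME V&V 20 approach [9], the comparison error between the mean simulation result S and the mean experimental result D is E = S − D, and the validation uncertainty combines the numerical, input, and experimental-data contributions (a summary of the standard's formulation):

\[
u_{val} = \sqrt{u_{num}^2 + u_{input}^2 + u_D^2}
\]

When the experimental uncertainty arises only from random scatter in the replicate measurements, the uncertainty in the experimental data is estimated as the standard uncertainty of the experimental mean,

\[
u_D = \sqrt{\frac{s^2}{N}}
\]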
where s2 is the variance of the experimental measurements and N is the number of samples. This results in the uncertainty estimate of the mean experimental value being quite small when many experiment replicates are available.
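Following the standard's interpretation [9], the model discrepancy is then estimated to lie, at the stated level of confidence, within the interval E ± u_val (or E ± k·u_val when a coverage factor k is applied), which implies

\[
\left| \delta_{model} \right| \le \left| E \right| + u_{val}
\]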
where δmodel is the discrepancy between the model and the true value in nature. In this case, the true model discrepancy is most likely less than the sum of the total validation uncertainty and the absolute value of the true error. This claim can be made with confidence because, in situations where |E| ≪ uval, the true error is relatively small, meaning that the model discrepancy at prediction locations will most likely also be relatively small. Therefore, the true value will most likely be easily encapsulated by the sum of the validation uncertainty and the true error. Figure 4 illustrates when this method should be used as a validation method (when the ratio of the model discrepancy to the validation uncertainty is greater than five) and when it should be used as a calibration method (when that ratio is less than five). In this figure, the dashed line at zero indicates the mean simulation result, and the squares represent the model error.
2.4 Bayesian Model Estimation.
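In this approach, the model discrepancy is represented by a Gaussian process (GP) prior, δmodel(x) ~ GP(m(x), k(x,x′)). Conditioning this prior on the observed discrepancies y between the experimental samples and the mean simulation results yields the standard GP posterior mean and covariance at the prediction locations (consistent with Ref. [10] and the algorithm in the Appendix; K(·,·) denotes the covariance matrix assembled from k(x,x′)):

\[
\bar{f}_* = m(X_*) + K(X_*,X)\left[K(X,X) + \sigma_n^2 I\right]^{-1}\left(y - m(X)\right)
\]
\[
\operatorname{cov}(f_*) = K(X_*,X_*) - K(X_*,X)\left[K(X,X) + \sigma_n^2 I\right]^{-1} K(X,X_*)
\]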
where X are domain locations where experimental data are available and X* are locations where the model discrepancy is to be predicted. Figure 5 shows an example in which the prior assumes a mean of zero and a variance equal to the variance of the observed errors between the experimental samples and the mean simulation result, along with the updated posterior for the model discrepancy after incorporating the observation data.
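A minimal one-dimensional Python sketch of this posterior computation is given below. The squared-exponential covariance, the hyperparameter values, and the zero prior mean are illustrative assumptions, not the settings used in this study.

```python
import numpy as np

def gp_posterior(X, y, Xs, length_scale=5.0, sigma_f=1.0, sigma_n=0.1):
    """Posterior mean and covariance of a zero-mean GP model discrepancy,
    conditioned on observed discrepancies y at 1D locations X and
    evaluated at prediction locations Xs."""
    def k(a, b):
        # Squared-exponential covariance k(x, x') used as the GP prior.
        d2 = (a[:, None] - b[None, :]) ** 2
        return sigma_f**2 * np.exp(-0.5 * d2 / length_scale**2)

    K = k(X, X) + sigma_n**2 * np.eye(len(X))   # noisy observation covariance
    Ks = k(Xs, X)                               # cross-covariance
    Kss = k(Xs, Xs)                             # prediction covariance
    alpha = np.linalg.solve(K, y)
    mean = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, cov
```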
3 Method of Manufactured Universes and Validation Case
Since our goal is not to actually validate or calibrate a model, but instead to assess the usefulness of different model validation and prediction frameworks, we will not use actual experimental data, which would leave ambiguity regarding the true value in nature. Instead, we will use the MMU developed by Stripling et al. [11] to unambiguously assess the different model validation, calibration, and prediction techniques. Similar to the method of manufactured solutions used for code verification, MMU involves the generation of a manufactured reality, or “universe,” from which “experimental” observations can be made. A low-fidelity model is then developed and assessed with uncertain inputs and possibly including numerical solution errors. Since the true behavior of “reality” is known from the manufactured universe, the estimation of model form uncertainty and errors in the lower-fidelity model can be performed. It is suggested that the manufactured reality be based on a high-fidelity model so as to obtain similar qualitative behavior as that found in a real experiment. This approach can be used to compare different methods for estimating model form uncertainty and calibration in the presence of random experimental measurement errors, experimental bias errors, modeling errors, and uncertainties in the model inputs. It can also be used to assess approaches for extrapolating model form uncertainty to conditions where no experimental data are available (i.e., model prediction locations) [7].
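The low-fidelity model is based on classical thin airfoil theory, whose standard results for the sectional lift and quarter-chord moment coefficients, written in terms of the transformed chordwise coordinate θ0 defined by x = (c/2)(1 − cos θ0), are

\[
c_l = 2\pi \left[ \alpha + \frac{1}{\pi} \int_0^{\pi} \frac{dz}{dx} \left( \cos\theta_0 - 1 \right) d\theta_0 \right]
\]
\[
c_{m,c/4} = \frac{1}{2} \int_0^{\pi} \frac{dz}{dx} \left( \cos 2\theta_0 - \cos\theta_0 \right) d\theta_0
\]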
where cl is the 2D lift coefficient, cm(c/4) is the 2D moment coefficient about the quarter chord, α is the angle of attack, and dz/dx is the derivative of the mean camber line with respect to x. Figure 7 shows the lift and moment coefficients provided by the manufactured universe, with the red x's marking the locations at which high-fidelity CFD results were provided. The lift and moment coefficients predicted by thin airfoil theory are shown in Fig. 8. Note that the results for both coefficients are linear because thin airfoil theory does not account for flow separation; the theory is therefore expected to deviate significantly from the true behavior at high angles of attack and flap deflections. This deviation is not an issue for the two goals of this investigation. The primary goal is to assess how well each method encapsulates, with its uncertainty bounds, the errors in the (imperfect) model relative to the true value in nature (i.e., the method's conservativeness). The secondary goal is to determine how tightly the uncertainty bounds encapsulate the true value (the CFD result) if the method is found to be conservative. Since these uncertainties are then extrapolated to parts of the model input domain where no experimental observations are made, the imperfections of thin airfoil theory are helpful in determining the predictive capability of each method. In this validation case, each uncertain input is sampled from a normal (Gaussian) distribution with the mean at the intended (nominal) value and a standard deviation equal to the associated measurement uncertainty: 0.5 deg for angle of attack and 1.0 deg for flap deflection. For the output quantities (lift and moment coefficient), additional random measurement uncertainty is assigned to each experimental replicate measurement, as one would have in an actual experiment. This uncertainty is assumed to be Gaussian with a mean of zero and a standard deviation of 5% of the mean value of the output quantity obtained by evaluating the true value in nature at the nominal input values.
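For concreteness, the sampling procedure just described can be sketched in Python as follows; `true_value_fn` is a hypothetical stand-in for the manufactured universe (the high-fidelity CFD response), and the function name is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthetic_measurements(true_value_fn, alpha_nom, delta_nom, n_rep):
    """Draw synthetic experimental replicates: Gaussian input uncertainty on
    angle of attack (0.5 deg) and flap deflection (1.0 deg), plus Gaussian
    measurement noise with standard deviation equal to 5% of the true value
    at the nominal inputs."""
    alpha = rng.normal(alpha_nom, 0.5, n_rep)      # sampled angle of attack, deg
    delta = rng.normal(delta_nom, 1.0, n_rep)      # sampled flap deflection, deg
    srq = np.array([true_value_fn(a, d) for a, d in zip(alpha, delta)])
    noise_sd = 0.05 * abs(true_value_fn(alpha_nom, delta_nom))
    return srq + rng.normal(0.0, noise_sd, n_rep)  # add measurement noise
```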
4 Performance Metrics
where φ2,c depends more on the ratio of the uncertainty associated with the calibration to the magnitude of the original true error that existed before calibration. For both validation and calibration methods, if a method fails to be conservative, then φ2 is taken to be zero for that case. As with conservativeness, larger tightness values (near 100%) are preferred.
where φ is the overall assessment and αw is a weighting factor, typically set to 0.5 (i.e., equal weighting) for low-consequence applications such as preliminary design, or to 0.9 for higher-consequence applications in order to place greater significance on conservativeness.
5 Results for Model Validation
Each of the methods previously discussed was first assessed at locations in the input domain where synthetic experimental results were provided (i.e., the red x's in Fig. 7) and compared with results from the model. Uncertainty bounds about each of the results are shown in Fig. 9 for both 2 and 16 experimental samples at the nominal condition of α = 30 deg and δ = 20 deg. Note that for this validation activity, only one observation point was used and no regression fit was performed for the AVM, MAVM, and V&V 20 methods. For the Bayesian calibration method, the Gaussian process across the entire input domain was created but was only assessed at the single validation location in this section. The green dashed line represents the mean simulation result, and the gray dashed line represents the experimental mean for the given number of samples. Notice that as the number of experimental samples increases from 2 to 16, the gray line moves closer to the blue dashed line. The blue dashed line is the true value, which is taken to be the experimental mean as the number of experimental samples approaches infinity; for the purposes of this study, the true value was taken as the experimental mean from a sample size of 100,000. It is also seen that as the number of experimental samples increases, the uncertainty bounds about the true value generally become smaller.
Figure 10 shows the conservativeness of each method, aggregated over all 35 observation points, as a function of sample size. As would be expected, the conservativeness of each method generally increases as more experimental samples become available. However, this was not the case for the area validation metric. This is because the area validation metric includes no confidence interval, unlike the modified area validation metric. Therefore, whether the area validation metric is conservative largely depends on the value of the mean experimental SRQ relative to the true value (i.e., it will be larger than the true value approximately 50% of the time). The same trend is also seen for the tightness measurement, shown in Fig. 11, as the uncertainty bounds for each validation metric and calibration method become tighter around the true value, excluding the area validation metric. The area validation metric's tightness at these locations converges to about 50% due to its symmetric MFU bounds, and the other four methods' tightness does not quite converge to 100% with increased sample size due to the confidence intervals associated with each method. Thus, at these observation locations, each method slightly overpredicts the associated uncertainty of the model relative to the true value when a large number of experimental samples are available. Figure 12 shows the overall assessment of the methods for the case where experimental data are available, with αw taken to be 0.5 and 0.9. Based on these results, the modified area validation metric, V&V 20, and Bayesian calibration perform well even for low experimental sample sizes. When using a weighting factor of 0.9, which places more emphasis on conservativeness in the overall assessment, most of the methods reach approximately 95% even for small sample sizes (with the exception of the AVM). The high overall assessment of the calibration methods is to be expected, however, because these methods use the experimental data to obtain an idea of where the true value lies. The confidence intervals about the experimental data incorporated in the MAVM show an obvious improvement over the AVM in tightness about the true value and in overall assessment.
6 Results for Predictive Capability
The second part of comparing these methods is to observe how they perform at locations where no experimental results are provided, so that the results must be interpolated or extrapolated to the prediction locations. This was done by creating a best fit of the metric results at the locations where experimental SRQs are provided. As outlined by Oberkampf and Roy [1], 95% prediction intervals were then placed about the fit, and the upper limit of the prediction interval is taken as the metric result. Therefore, in this instance the upper limits of the 95% prediction intervals for d, d+, d–, E, and uval are taken and used as the uncertainty for their respective methods. Note that for calibration using V&V 20, the model is calibrated by E, and the associated uncertainty is the 95% prediction interval for E plus the upper limit of the prediction interval for kuval [14]. However, this does not have to be done for Bayesian model updating, as predictions are made directly from sampling the Gaussian process model. A simple 1D example is shown in Fig. 13 for the extrapolation of the area validation metric using a quadratic regression fit. The uncertainties or model errors/discrepancies are extrapolated to 20 prediction locations located at combinations of angles of attack of 0 deg, 18 deg, 25 deg, 38 deg, and 45 deg and flap deflections of 0 deg, 8 deg, 17 deg, and 25 deg. For the case in which only a sparse set of experimental observations is available, a linear regression fit is used in both the angle of attack and flap deflection dimensions due to the limited number of available points. However, as the number of observations increases (as in the moderate and plentiful cases), a bi-quadratic regression fit is used.
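The following Python sketch illustrates this extrapolation for the 1D case of Fig. 13: a polynomial fit to the metric values at the observation locations, with the upper 95% prediction limit used as the extrapolated uncertainty. The function name and the use of NumPy/SciPy are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy import stats

def upper_prediction_limit(x_obs, y_obs, x_pred, degree=2, level=0.95):
    """Upper prediction limit of a polynomial fit to validation-metric values
    at observation locations; the limit serves as the extrapolated
    uncertainty at prediction locations (1D sketch)."""
    x = np.asarray(x_obs, dtype=float)
    y = np.asarray(y_obs, dtype=float)
    xp = np.asarray(x_pred, dtype=float)
    X = np.vander(x, degree + 1)        # design matrix for the polynomial fit
    Xp = np.vander(xp, degree + 1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = x.size - (degree + 1)         # residual degrees of freedom
    s2 = float(resid @ resid) / dof     # residual variance
    XtX_inv = np.linalg.inv(X.T @ X)
    # Prediction variance = residual variance + variance of the fitted mean.
    var_pred = s2 * (1.0 + np.einsum("ij,jk,ik->i", Xp, XtX_inv, Xp))
    t = stats.t.ppf(0.5 + level / 2.0, dof)
    return Xp @ beta + t * np.sqrt(var_pred)
```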
6.1 Sparse Experimental Observations.
Four observations were assumed at angles of attack of 10 deg and 30 deg and flap deflections of 5 deg and 20 deg. These locations are listed in Table 1 and shown in Fig. 14, with the observation locations shown in white and prediction locations in red. The uncertainty intervals for each method are shown in Fig. 15 for experimental sample sizes of 2 and 16 at α = 18 deg and δ = 25 deg. The locations where experimental data are available define the validation domain, and the locations at which predictions are being made define the prediction domain. In this case, the data are sparse, and a significant amount of extrapolation from the validation domain is required. It can be seen how the modified area validation metric accounts for bias error through its one-sided interval, unlike the area validation metric's symmetric uncertainty interval. The uncertainty intervals for the modified area validation metric can also be seen closing in on the true value as the sample size increases from 2 to 16, due to the reduced confidence intervals about the experimental mean. Additionally, the AVM and MAVM are overly conservative for lift coefficient compared to V&V 20 and Bayesian calibration. This is because at this prediction location the true value and the mean value from the simulation are relatively close to one another, and the added prediction interval for the AVM and MAVM makes them overly conservative here, while giving a tighter bound about the true value at prediction locations farther from the validation domain.
Table 1: Locations of experimental observations

| Amount of observations | Angles of attack (α) | Flap deflections (δ) |
|---|---|---|
| Sparse | 10 deg and 30 deg | 5 deg and 20 deg |
| Moderate | 6 deg, 13 deg, 28 deg, 34 deg, and 42 deg | 5 deg, 13 deg, 21 deg, 27 deg, and 35 deg |
| Plentiful | 1 deg, 4 deg, 9 deg, 15 deg, 19 deg, 24 deg, 28 deg, 32 deg, 35 deg, 40 deg, and 41 deg | –1 deg, 2 deg, 5 deg, 7 deg, 10 deg, 13 deg, 15 deg, 21 deg, 26 deg, 29 deg, 35 deg, and 41 deg |
When examining the conservativeness of each method shown in Fig. 16, the area validation metric and the modified area validation metric are shown to be the most reliably conservative of the five methods when only a small number of observations are available and significant extrapolation from the validation domain to the prediction domain is required. However, it is interesting to see the different levels of conservativeness between the lift coefficient and the moment coefficient for the AVM and MAVM. This comparison shows that the effectiveness of the extrapolation of these methods, and whether they remain conservative, depends on where the observations used in the assessment are made. Because only two observations are made in each dimension of the uncertain inputs in this case, the information from these two methods can only be extrapolated via a linear regression fit. Because the observations are made at locations where the experiment still behaves linearly in lift coefficient with respect to angle of attack and flap deflection, the linear fit closely encapsulates the true value for small angles of attack and flap deflections. However, when extrapolating to regions of high angles of attack and flap deflections (where flow separation occurs and the lift coefficient becomes nonlinear), the methods tend not to be conservative. This is not the case for the moment coefficient because the experiment contains significant nonlinearity across the input domain; therefore, the prediction intervals on the extrapolation regression fits more easily encapsulate the true value outside of the validation domain. It is also important to note the poor performance of Bayesian calibration when little data are available. This is because Bayesian calibration assumes a prior mean discrepancy of zero before calibration (i.e., no difference between the experiment and model), and it does not calibrate the model outside of the region where observations, which provide the information on how the experiment and model differ, are available. Therefore, when making predictions of the mean discrepancy with few observations, it is hard to obtain meaningful information outside of where those observations are made.
Examining the tightness in Fig. 17, the modified area validation metric is shown to be the tightest in this case. It is also shown that the AVM was about half as tight as the MAVM, as expected, due to the MAVM's separate tracking of the d+ and d– areas to help account for bias error. The Bayesian updating method was seen to be the least conservative; moreover, because the method assumes a large uncertainty about the calibration outside of the domain where observations were made, its tightness about the true value is low even where it is conservative. However, it should be noted that all approaches have issues with tightness due to the large amount of extrapolation from the validation domain.
The overall combined assessment of the five methods is shown in Fig. 18. Overall, the MAVM, the AVM, and the V&V 20 calibration method performed the best. In the case where conservativeness and tightness were equally weighted (αw = 0.5), the MAVM consistently outperformed the AVM due to its ability to be nearly twice as tight as the AVM when conservative. When the weight on conservativeness is increased to αw = 0.9, the performance of the two methods becomes much more comparable, as both were shown to be reliably conservative even with a sparse number of observations. The V&V 20 calibration method also proved reliable because the error regression fit produces a relatively tight calibration for lift coefficient with a sparse number of observations, while remaining conservative with the added error regression confidence interval and validation uncertainty.
6.2 Moderate Experimental Observations.
Twenty-five observation locations are used in this case, located at angles of attack of 6 deg, 13 deg, 28 deg, 34 deg, and 42 deg and flap deflections of 5 deg, 13 deg, 21 deg, 27 deg, and 35 deg, as listed in Table 1 and shown in Fig. 19. Most of the prediction locations involve interpolation, but those at the lowest and highest angles of attack and the lowest flap deflection angles involve some mild extrapolation. Upon the interpolation/extrapolation of the five methods, it was seen (as shown in Fig. 20) that the uncertainty intervals for the AVM and MAVM at prediction locations were larger than those at observation locations, making them slightly more conservative in their predictive capability. However, this was not the case for the V&V 20 and Bayesian model updating calibration methods. The conservativeness of these two methods when making predictions is largely dictated by the number of observation locations available across the input domain. Since these methods rely on observation data to create an updated model, the more experimental observations that are available, the better the prediction of the true value. Figure 21 shows the conservativeness of each method as a function of sample size at the prediction locations. It is seen that Bayesian calibration was usually conservative for all sample sizes at prediction locations, while the 95% prediction interval added to the AVM, MAVM, and V&V 20 interpolation also maintained high conservativeness for all sample sizes.
When examining the tightness of these methods at the prediction locations, displayed in Fig. 22, the MAVM is shown to be the tightest. It is important to note, however, the difference in the MAVM's tightness between the lift coefficient and the moment coefficient. This is because, as further investigation of the prediction intervals showed, some of the predictions have lift coefficient values similar to the experiment; therefore, the prediction interval greatly overpredicts the uncertainty about the true value. The moment coefficient values from the experiment, however, contain more variability across the input domain, placing the prediction locations closer to the lower bounds of the prediction intervals. The Bayesian updating method, which depends on a chosen prior variance and correlation length scale, was found to be not very tight due to the large confidence intervals associated with the method at locations where no observational data are provided.
Assessing these methods using equal weighting of conservativeness and tightness (αw = 0.5), the MAVM is seen to perform slightly better, as shown in Fig. 23. However, as αw increases, placing greater weight on conservativeness, Bayesian calibration has a slightly better overall performance, followed closely by the other four methods.
6.3 Plentiful Experimental Observations.
One hundred twenty-one observations were made at angles of attack of 1 deg, 4 deg, 9 deg, 15 deg, 19 deg, 24 deg, 28 deg, 32 deg, 35 deg, 40 deg, and 41 deg and flap deflections of 2 deg, 5 deg, 7 deg, 10 deg, 13 deg, 15 deg, 21 deg, 26 deg, 29 deg, 35 deg, and 41 deg, and the uncertainties or model errors/discrepancies are extrapolated to the same 20 prediction locations used in the previous cases. These locations are shown in Fig. 24, with the observation locations shown in white and prediction locations in red. For this case, there are many experimental observation locations and the prediction domain is almost entirely within the validation domain (i.e., there is very little extrapolation). The uncertainty intervals for each method are shown in Fig. 25 for experimental sample sizes of 2 and 16. It is seen that even with the large amount of experimental data, the uncertainty intervals for each method are still relatively large, due to the added prediction interval for the interpolation/extrapolation and the confidence interval of the Gaussian process for Bayesian calibration. In this case, each of the methods is found to be reliably conservative, as shown in Fig. 26. The Bayesian calibration method is conservative in nearly every prediction case due to the method's ability to actively update the model discrepancy for a better approximation relative to the experiment. The other four methods are also conservative in nearly every instance, with the exception of one prediction location for the moment coefficient; this is due to the interpolations being nonconservative at a prediction location slightly outside the domain where observations were made. In the measurement of tightness, shown in Fig. 27, Bayesian calibration and the modified area validation metric were shown to be the tightest.
Looking at the overall assessment of the methods in Fig. 28, the MAVM and Bayesian calibration perform the best in preliminary design cases where conservativeness and tightness are equally weighted (αw = 0.5). However, when the assessment weighting factor is increased to αw = 0.9, it is clear that Bayesian calibration and, to a lesser extent, the MAVM are slightly better than the other approaches in higher consequence scenarios where more reliable conservativeness is preferred and plentiful data are available over the entire prediction domain.
7 Conclusions
Upon examining and comparing these five validation/calibration methods, the AVM is observed to be the least reliably conservative of the five methods for validation where data are available. In fact, with the development of the MAVM, the MAVM is preferred over the AVM due to its included confidence interval and its ability to detect bias error. The other four methods are generally conservative at locations with data. When examining predictive capability with both conservativeness and tightness considered in the overall assessment, the MAVM and Bayesian calibration appear to perform the best, depending on the amount of observation data available. However, as more weight is placed on conservativeness, as it would be for high-consequence applications, Bayesian calibration performs better than the MAVM for moderate amounts of data. For plentiful data, Bayesian calibration and the MAVM slightly outperformed the other three methods. With more observation points, calibration is more attractive; with limited data, simply estimating the model form uncertainty (with no calibration) is recommended. These findings are summarized in Table 2, which shows the recommended approach given the amount of experimental data and the interpolation/extrapolation required from the validation domain to the prediction domain, and which also takes into account the level of risk one is willing to assume. Since the assessments of the metrics for both the lift and moment coefficients showed generally the same trends in performance, it is expected that these rankings and recommendations would be applicable to other validation experiments based on the amount of experimental data available and the amount of extrapolation required to the locations where validation or calibration is being done.
| Amount of experimental data | Decision risk: low (preliminary design) | Decision risk: high (high consequence) |
|---|---|---|
| Sparse/extensive extrapolation | MFU only (MAVM) | MFU only (MAVM) |
| Moderate/some extrapolation | MFU only (MAVM) | Calibration + MFU (K&O or MAVM) |
| Plentiful/interpolation only | Mainly calibration (K&O) | Calibration + MFU (K&O or MAVM) |

MFU: model form uncertainty; MAVM: modified area validation metric [7]; V&V 20: ASME's standard validation uncertainty [9]; K&O: Bayesian calibration [10].
Acknowledgment
This work was supported by Intelligent Light (Dr. Earl Duque, Project Manager) as part of a Phase II SBIR funded by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under Award No. DE-SC0015162. This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.
The authors would like to thank Cray Inc. for providing access to their corporate Cray XE40 computer, Geert Wenes of Cray Inc. for helping to acquire access, and David Whitaker of Cray Inc. for assistance in porting OVERFLOW2 to the XE40 and for streamlining the use of FieldView on their system. Special thanks to Dr. Heng Xiao and Dr. Jinlong Wu for providing their insight on Bayesian updating, and to Professor James Coder at the University of Tennessee in Knoxville for providing the setup of the OVERFLOW2 runs used to establish the synthetic experimental data used in this study.
Funding Data
Office of Science (Grant No. DE-SC0015162; Funder ID: 10.13039/100006132).
Data Availability Statement
The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.
Nomenclature
- AVM =
area validation metric
- cl =
2D lift coefficient
- cm(c/4) =
2D moment coefficient about the quarter chord
- CDF =
cumulative distribution function
- d =
area validation metric
- d– =
area validation metric for area smaller than simulation CDF (MAVM)
- D =
mean experimental result
- d+ =
area validation metric for area larger than simulation CDF (MAVM)
- dz/dx =
derivative of the mean camber line with respect to x
- E =
model error
- F(Y) =
simulation CDF curve
- f* =
single Gaussian process posterior realization
- GP =
Gaussian process
- k =
coverage factor
- k(x,x′) =
covariance matrix between x and x′
- l =
length scale
- MAVM =
modified area validation metric
- MFU =
model form uncertainty
- m(x) =
mean function of the Gaussian process prior
- N =
number of samples
- nobs =
number of observations
- S =
mean simulation result
- Sn(Y) =
experiment CDF curve
- s2 =
variance in experimental data
- SRQ =
system response quantity
- uD =
experimental data uncertainty
- uinput =
input uncertainty
- unum =
numerical uncertainty
- uval =
validation uncertainty
- X =
locations for which data are available in Bayesian updating
- X* =
locations for which model discrepancy is identified using Bayesian estimation
- α =
angle of attack, degrees
- δ =
flap deflection, degrees
- δmodel =
model discrepancy
- σn2 =
variance in observation model discrepancies
- Φ =
overall assessment
- Φ1 =
conservativeness of a method
- Φ2 =
tightness of a method
- Φ2,v =
tightness for validation method
- Φ2,c =
tightness for calibration method
Appendix
Automation of Model Form Uncertainty Methods.
High-level algorithms for implementing three of the MFU estimation methods discussed in Sec. 2 are given below.
Modified area validation metric (MAVM):

1. | Calculate confidence interval CDFs for the experiment
2. Compute: pF = 1/S | Determine individual probability for simulation SRQ
3. Compute: pSn = 1/N | Determine individual probability for experimental samples
4. for k = 1 to 2 do |
5. Sn(Y) = SnConf(Y)(k) |
6. if N > S do |
7. for j = 1 to S do |
8. if dREM != 0 do |
9. Compute: d(i) = (Sn(Y(i)) – F(Y(j)))*(pSn*i – pF*(j–1)) | Determine remaining area between previous simulation SRQ and individual experiment CDF
10. if d(i) > 0 do |
11. d+ = d+ + d(i) | Sum area greater than simulation CDF
12. else do |
13. d– = d– + d(i) | Sum area less than simulation CDF
14. end if |
15. Compute: i = i + 1 | Increase experiment SRQ index
16. end if |
17. while j*pF > i*pSn do |
18. Compute: d(i) = (Sn(Y(i)) – F(Y(j)))*pF | Determine area between individual simulation SRQ and experiment CDF
19. if d(i) > 0 do |
20. Compute: d+ = d+ + d(i) | Sum area greater than simulation CDF
21. else do |
22. Compute: d– = d– + d(i) | Sum area less than simulation CDF
23. end if |
24. Compute: i = i + 1 | Increase experiment SRQ index
25. end while |
26. Compute: dREM = (Sn(Y(i)) – F(Y(j)))*(pF*(j) – pSn*(i–1)) | Determine remaining area between individual simulation SRQ and previous experiment CDF
27. if dREM > 0 do |
28. Compute: d+ = d+ + dREM | Sum area greater than simulation CDF
29. else do |
30. Compute: d– = d– + dREM | Sum area less than simulation CDF
31. end if |
32. end for |
33. else if N ≤ S do |
34. for j = 1 to N do |
35. if dREM != 0 do |
36. Compute: d(i) = (Sn(Y(j)) – F(Y(i)))*(pF*i – pSn*(j–1)) | Determine remaining area between individual simulation SRQ and previous experiment CDF
37. if d(i) > 0 do |
38. d+ = d+ + d(i) | Sum area greater than simulation CDF
39. else do |
40. d– = d– + d(i) | Sum area less than simulation CDF
41. end if |
42. Compute: i = i + 1 | Increase simulation SRQ index
43. end if |
44. while i*pF < j*pSn do |
45. Compute: d(i) = (Sn(Y(j)) – F(Y(i)))*pF | Determine area between individual simulation SRQ and experiment CDF
46. if d(i) > 0 do |
47. Compute: d+ = d+ + d(i) | Sum area greater than simulation CDF
48. else do |
49. Compute: d– = d– + d(i) | Sum area less than simulation CDF
50. end if |
51. Compute: i = i + 1 | Increase simulation SRQ index
52. end while |
53. Compute: dREM = (Sn(Y(j)) – F(Y(i)))*(pSn*(j) – pF*(i–1)) | Determine remaining area between previous simulation SRQ and individual experiment CDF
54. if dREM > 0 do |
55. Compute: d+ = d+ + dREM | Sum area greater than simulation CDF
56. else do |
57. Compute: d– = d– + dREM | Sum area less than simulation CDF
58. end if |
59. end for |
60. end if |
61. Compute: dconf+(k) = abs(d+) | Save the area above the simulation CDF for this confidence interval CDF
62. Compute: dconf–(k) = abs(d–) | Save the area below the simulation CDF for this confidence interval CDF
63. end for |
64. Compute: d+ = max(dconf+) | Take the maximum of the positive areas as the upper bound uncertainty
65. Compute: d– = max(dconf–) | Take the maximum of the negative areas as the lower bound uncertainty
ASME V&V 20 validation uncertainty:

1. Compute: E = mean(Sn(Y)) – mean(F(Y)) | Determine error between simulation and experiment
2. Compute: uD = std(Sn(Y))/sqrt(N) | Determine uncertainty in experimental data
3. Compute: uinput = std(F(Y)) | Determine uncertainty in simulation due to nondeterministic inputs
4. Compute: unum = uro + uiter + uDE | Determine numerical uncertainty in simulation
5. Compute: uval = sqrt(unum^2 + uinput^2 + uD^2) | Determine overall validation uncertainty
Bayesian model discrepancy estimation (Gaussian process):

1. for i = 1 to max(l) do | Loop over candidate length scales l(i)
2. Compute: K = k(X,X) + σn²*I | Determine covariance matrix for observations
3. Compute: log marginal likelihood of the observed discrepancies for l(i) | Sample different l values for maximum likelihood
4. end for | Retain the length scale with the maximum likelihood
5. Compute: K* = k(X,X*) | Determine covariance matrix between observations and predictions
6. Compute: K** = k(X*,X*) | Determine covariance matrix for predictions
7. Compute: mean(f*) = K*ᵀ K⁻¹ y | Determine mean posterior function
8. Compute: cov(f*) = K** – K*ᵀ K⁻¹ K* | Determine covariance of mean posterior function
9. Compute: L = chol(cov(f*)) | Cholesky decompose posterior covariance matrix
10. for j = 1 to n do |
11. Compute: f*(j) = L*randn(length(X*),1) | Sample posterior realization
12. end do |