Abstract

Model validation is the process of determining the degree to which a model is an accurate representation of the true value in the real world. The results of a model validation study can be used either to quantify the model form uncertainty or to improve/calibrate the model. The model validation process becomes complex when there is uncertainty in the simulation and/or experimental outcomes. These uncertainties can be in the form of aleatory uncertainties due to randomness or epistemic uncertainties due to lack of knowledge. Five different approaches are used for addressing model validation and predictive capability: (1) the area validation metric (AVM), (2) a modified area validation metric (MAVM) with confidence intervals, (3) the validation uncertainty procedure from ASME V&V 20, (4) a calibration procedure interpreted from ASME V&V 20, and (5) identification of the model discrepancy term using Bayesian estimation. To provide an unambiguous assessment of these different approaches, synthetic experimental data is generated from computational fluid dynamics simulations of an airfoil with a flap. A simplified model is then developed using thin airfoil theory. The accuracy of the simplified model is assessed using the synthetic experimental data. The quantities examined include the two-dimensional lift and moment coefficients for the airfoil with varying angles of attack and flap deflection angles. Each of these approaches is assessed for its ability to tightly encapsulate the true value both at conditions where experimental results are provided and at prediction locations where no experimental data are available. Generally, it was seen that the MAVM performed the best in cases where there is a sparse amount of data and/or large extrapolations. Furthermore, it was found that Bayesian estimation outperformed the other methods where there is an extensive amount of experimental data that covers the application domain.

1 Introduction

Mathematical models are useful as they can adequately predict real world physics, but nearly always have some error and/or uncertainty attached to their results. Error and uncertainty can come from many factors. They could be introduced from assumptions and approximations in the formulation of the model, as well as uncertainty in the measured inputs required by the model. Furthermore, the true value of a physical quantity cannot be measured perfectly because of experimental measurement uncertainties. Regardless of the difficulties, it is important to assess model accuracy by comparison of simulation results with experimental measurements. Model validation is the process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model. Model calibration is the process of adjusting physical model parameters in the computational model to improve agreement with the experimental measurements. It needs to be noted that any experiment is a result from the real world, regardless of its connection to an actual system of interest or measurement difficulties. For the purposes of this paper, it is stressed that calibration refers to an updated model discrepancy estimation and not a calibration of the model inputs themselves. A validation experiment is an experiment conducted with the primary purpose of assessing the predictive capability of a mathematical model. Validation experiments differ from traditional experiments used for exploring a physical phenomenon or obtaining information about a system because the customer for the validation experiment is commonly the model builder or computational analyst. For a detailed discussion of the design and execution of model validation experiments, see Oberkampf and Roy [1] and Oberkampf and Smith [2].

Various approaches have been proposed for assessing model accuracy, i.e., model validation, given that experimental measurements are available [3–5]. In Secs. 5 and 6, four different approaches are used for addressing model validation: (1) the area validation metric (AVM) [6], (2) a modified area validation metric (MAVM) with confidence intervals [7,8], (3) the validation uncertainty procedure from ASME V&V 20 [9], and (4) identification of the model discrepancy term using Bayesian estimation [10]. To provide an unambiguous assessment of the effectiveness of these different approaches, synthetic experimental data, i.e., truth data, is generated. The synthetic experimental data is generated using a high-fidelity computational fluid dynamics (CFD) simulation of turbulent, compressible flow over a two-dimensional wing with a flap. The synthetic experimental data is presumed to have only an associated random measurement uncertainty, but no bias or systematic errors. To properly assess the effectiveness of each of the methods, random samples are drawn from the synthetic experimental data and then used in each method.

The effectiveness of each of the methods is assessed using the same approximate model of flow over the airfoil with a flap. The approximate model is based on thin airfoil theory which assumes inviscid, incompressible flow. The approximate model is assumed to have two uncertain inputs, angle of attack and flap angle, which are characterized as Gaussian random variables. The methods are assessed at a number of input conditions where (synthetic) experimental data are available. Discussion of results for this traditional case of model validation is given in Sec. 5. The effectiveness of the methods is also assessed for input conditions where no experimental data are considered to be available. This activity is typically referred to as assessment of predictive capability of a model, or model-based extrapolation to conditions where experimental data are not available. Using the method of manufactured universes (MMU) [11], in combination with the synthetic experimental data, the methods are assessed for predictive capability. Each of the methods uses an extrapolation technique to estimate the mean value of the model form uncertainty and its prediction interval at conditions where no experimental data are available. Discussion of results for predictive capability is given in Sec. 6. The performance of each method is summarized in Sec. 7.

2 Model Validation/Calibration Methods

Four different methods are assessed for their effectiveness in model validation/calibration. Model validation is the process of determining the degree to which a model is an accurate representation of the true value in the real world. The four validation/calibration approaches that are assessed are the area validation metric [6], the modified area validation metric [7,8], the ASME's V&V 20 standard validation uncertainty [9], and a Bayesian estimation approach [10]. These four methods were chosen because they are each popular methods for assessing model accuracy when experimental measurements are available. For the assessment here, all methods assume that numerical solution error in the simulation result is negligible. Furthermore, it is assumed that the experimental measurement uncertainty occurs only in the measurement of the system response quantities (SRQs).

2.1 Area Validation Metric.

The area validation metric [6] provides a method of estimating the model form uncertainty by placing uncertainty bounds about the simulation SRQs. The model form uncertainty (MFU) is the uncertainty in the mathematical model that results from the assumptions and approximations made in the formulation of the model; it excludes the uncertainty due to uncertainty in the model input parameters. After propagating the experimentally measured input uncertainty through the model, cumulative distribution functions (CDFs) are created for each SRQ. The magnitude of the model form uncertainty about these simulation results is estimated by determining the area between the experimental and simulation empirical CDFs, which will be referred to as Sn(Y) and F(Y), respectively. The model form uncertainty is therefore given as

d = ∫ |F(Y) − Sn(Y)| dY

where d is the area validation metric and Y is the SRQ. Once d is determined, the interval in which the true value is estimated to lie is [F(Y) – d, F(Y) + d]. An example of the area validation metric applied for a case where only aleatory uncertainties are present in the model inputs is shown in Fig. 1.
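
As an illustration of how d can be computed from finite samples, a minimal Python sketch is given below. It is our own illustrative implementation of the metric defined above (the empirical CDFs are piecewise constant, so the integral reduces to a sum of rectangles); the function and variable names are not from Ref. [6].

import numpy as np
def ecdf(samples, y):
    # empirical CDF of `samples` evaluated at the points y
    samples = np.sort(np.asarray(samples, dtype=float))
    return np.searchsorted(samples, y, side="right") / samples.size
def area_validation_metric(sim_samples, exp_samples):
    # d = integral of |F(Y) - Sn(Y)| dY over the SRQ axis
    y = np.sort(np.concatenate([sim_samples, exp_samples]))
    F = ecdf(sim_samples, y)    # simulation CDF, F(Y)
    Sn = ecdf(exp_samples, y)   # experimental CDF, Sn(Y)
    # both CDFs are constant between consecutive y values
    return np.sum(np.abs(F[:-1] - Sn[:-1]) * np.diff(y))
# example: SRQ samples from the nondeterministic simulation and from the experiment
sim = np.random.normal(3.0, 0.5, 1000)
exp = np.random.normal(3.4, 0.6, 16)
d = area_validation_metric(sim, exp)
print(f"d = {d:.3f}; interval about the simulation: [F(Y) - {d:.3f}, F(Y) + {d:.3f}]")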

Fig. 1: Area validation metric (reproduced from Ref. [6])

The MFU, as estimated using the AVM, is treated as a purely epistemic uncertainty when used in combination with probability bounds analysis (PBA) [1,6,12,13]. That is, the AVM is considered as an interval-valued uncertainty on both sides of the CDF computed from the nondeterministic simulation. Figure 2 shows the probability box (p-box) that results when the AVM is added to both sides of the CDF from the simulation. This p-box represents the family of all possible CDFs that can exist within its bounds. The outer bounding shape of the p-box in Fig. 2 is due to the probabilistically characterized input uncertainty. The p-box can be interpreted as follows. The probability that the SRQ is 4 or less lies in the interval [0.02, 0.52]. That is, given the inclusion of the estimated MFU, the probability that the SRQ is 4 or less could be as low as 0.02, but it could be as high as 0.52. As discussed in Refs. [1,12], and [13], epistemic uncertainties due to model input uncertainty and numerical solution uncertainty can also be included using PBA. These additional uncertainties additively increase the size of each preceding p-box.
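
To make the interval-probability reading of the p-box concrete, the short sketch below shows one way (our own illustrative construction) to evaluate the p-box bounds obtained by shifting the simulation CDF by ±d along the SRQ axis; the numerical values are arbitrary and do not reproduce Fig. 2.

import numpy as np
def ecdf(samples, y):
    samples = np.sort(np.asarray(samples, dtype=float))
    return np.searchsorted(samples, y, side="right") / samples.size
def pbox_probability_interval(sim_samples, d, y):
    # with the interval-valued MFU [-d, +d] added to the simulated SRQ,
    # P(SRQ <= y) can be as low as F(y - d) and as high as F(y + d)
    return ecdf(sim_samples, y - d), ecdf(sim_samples, y + d)
sim = np.random.normal(4.5, 1.0, 5000)   # nondeterministic simulation SRQs
d = 1.2                                  # AVM estimate of the model form uncertainty
lo, hi = pbox_probability_interval(sim, d, y=4.0)
print(f"P(SRQ <= 4) lies in [{lo:.2f}, {hi:.2f}]")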

Fig. 2: Example of total uncertainty represented as an extended p-box due to model input uncertainty and model form uncertainty

2.2 Modified Area Validation Metric.

The area validation metric tends to overpredict the model form uncertainty of the simulation results relative to the true value because the model uncertainty is applied symmetrically to the simulation, regardless of the relation between the experimental and simulation CDFs. In addition, the original AVM does not account for the additional uncertainty arising from small experimental (or computational) sample sizes. In order to overcome these weaknesses, Voyles and Roy [7,8] proposed the MAVM. Small sample sizes are accounted for by applying a confidence interval for the mean, assuming a Student t-distribution, about the entire experimental CDF. This approach is justified since, when the experimental and simulation CDFs do not overlap, the AVM simply defaults to the difference in the mean values of the distributions. Also, note that small sample sizes in the simulation results can be accounted for similarly using a confidence interval. Finally, the maximum area between the experimental confidence interval bounds and the simulation CDF is chosen because our goal is to estimate model form uncertainty; if instead we were seeking the "evidence for disagreement" between the simulation and experiment, the minimum area would be appropriate. The MAVM separately tracks two different areas between the CDFs: the area where the right 95% confidence bound of the experimental CDF yields SRQs larger than the simulation CDF (d+) and the area where the left 95% confidence bound of the experimental CDF yields SRQs smaller than those predicted by the model (d−), as shown in Fig. 3. The interval in which the metric estimates the true value to most likely exist is [F(Y) − d−, F(Y) + d+], where F(Y) is the simulation CDF and d+ and d− are the corresponding one-sided areas between the experimental confidence-bound CDFs and the simulation CDF.
Fig. 3: Modified area validation metric with confidence intervals

The interval-valued estimate of the MFU, [d−, d+], when combined with PBA, will yield an asymmetric p-box as opposed to the symmetric p-box for the AVM shown in Fig. 2. The more biased the simulation is relative to the available experimental data, the more asymmetric the p-box will be when using the MAVM.
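
A minimal sketch of the MAVM computation is given below. It follows the verbal description above (shift the experimental CDF by a Student-t confidence interval on the mean and accumulate the areas where the experimental SRQs are larger and smaller than the simulation separately); it is our own illustrative implementation, not the reference code of Refs. [7,8].

import numpy as np
from scipy import stats
def ecdf(samples, y):
    samples = np.sort(np.asarray(samples, dtype=float))
    return np.searchsorted(samples, y, side="right") / samples.size
def split_areas(exp_samples, sim_samples):
    # areas between the experimental CDF Sn(Y) and simulation CDF F(Y), split into
    # the part where the experiment lies to the right of the simulation (d+ side)
    # and the part where it lies to the left (d- side)
    y = np.sort(np.concatenate([exp_samples, sim_samples]))
    F, Sn, dy = ecdf(sim_samples, y)[:-1], ecdf(exp_samples, y)[:-1], np.diff(y)
    return np.sum(np.clip(F - Sn, 0.0, None) * dy), np.sum(np.clip(Sn - F, 0.0, None) * dy)
def mavm(exp_samples, sim_samples, conf=0.95):
    exp_samples = np.asarray(exp_samples, dtype=float)
    n = exp_samples.size
    shift = stats.t.ppf(0.5 + conf / 2, n - 1) * exp_samples.std(ddof=1) / np.sqrt(n)
    d_plus, _ = split_areas(exp_samples + shift, sim_samples)   # right confidence-bound CDF
    _, d_minus = split_areas(exp_samples - shift, sim_samples)  # left confidence-bound CDF
    return d_plus, d_minus
sim = np.random.normal(3.0, 0.5, 1000)
exp = np.random.normal(3.4, 0.6, 4)
dp, dm = mavm(exp, sim)
print(f"interval about the simulation: [F(Y) - {dm:.3f}, F(Y) + {dp:.3f}]")

Shifting the experimental samples by plus or minus the confidence half-width shifts the entire experimental CDF, which plays the role of the confidence-bound CDFs described above.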

2.3 ASME V&V 20 Standard Validation Uncertainty.

A validation standard was developed by ASME for verification and validation of computational fluid dynamics and heat transfer [9], referred to herein as V&V 20. The implementation of V&V 20 involves the computation of two parameters. The first is the error E between the expected values of the simulation S and the experiment D (here taken to be the mean values)

E = S − D
The second parameter is the standard validation uncertainty, uval, which is given by

uval = √(unum^2 + uinput^2 + uD^2)
where unum is the numerical uncertainty in the model SRQs, uinput is the effect of propagating the input uncertainties through the simulation to obtain the uncertainty in the SRQs, and uD is the uncertainty in the experimental data. When applying this metric to estimate where the true value exists, the true model error is given by the interval

δmodel ∈ [E − kuval, E + kuval]
where k is a coverage factor. Although not addressed directly in the standard, for cases where |E| ≫ kuval, δmodel can simply be taken to be E, and the model results can possibly be updated since a clear bias error has been determined. In cases where E has a similar magnitude to kuval, the above interval equation for δmodel should be used. When |E| ≪ kuval, the model cannot be corrected, and the model uncertainty is expected to be less than or equal to kuval. Since uval represents a standard uncertainty (approximately 68% confidence for a normal distribution), a coverage factor must be included to achieve other confidence levels. For 95% confidence when the distribution is not known, the V&V 20 standard recommends coverage factors between 2 and 3. Here we will simply use a coverage factor of 2. In our results, when |E| ≤ kuval, the true values, with 95% confidence, are expected to fall within the interval S ± kuval, where k = 2. Again, the standard itself does not address how the results of the validation assessment should be used, so the approach taken here should be considered just one possible method for use. The uncertainty in the experimental results is found by computing confidence intervals about the experimental mean using the Student's t-distribution

uD = t(α/2, N−1) · s/√N

where s2 is the variance of the experimental measurements and N is the number of samples. This results in the uncertainty estimate of the mean experimental value being quite small when many experiment replicates are available.
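
A short sketch of the V&V 20 quantities as they are used in this paper is given below. The root-sum-square combination is that of the standard; the choice of k = 2 and the Student-t confidence interval for uD follow the discussion above, while the variable names and the use of the sample standard deviation of the propagated SRQs for uinput are our own illustrative assumptions.

import numpy as np
from scipy import stats
def vv20_quantities(sim_srq, exp_srq, u_num=0.0, conf=0.95, k=2.0):
    S, D = np.mean(sim_srq), np.mean(exp_srq)        # mean simulation and experimental results
    E = S - D                                        # observed error
    u_input = np.std(sim_srq, ddof=1)                # SRQ uncertainty from propagated inputs
    n = len(exp_srq)
    # experimental uncertainty: confidence interval about the mean (Student t)
    u_D = stats.t.ppf(0.5 + conf / 2, n - 1) * np.std(exp_srq, ddof=1) / np.sqrt(n)
    u_val = np.sqrt(u_num**2 + u_input**2 + u_D**2)  # standard validation uncertainty
    return E, u_val, (E - k * u_val, E + k * u_val)  # interval expected to contain delta_model
sim = np.random.normal(3.0, 0.5, 1000)
exp = np.random.normal(3.4, 0.6, 16)
E, u_val, interval = vv20_quantities(sim, exp)
print(f"E = {E:.3f}, u_val = {u_val:.3f}, delta_model in [{interval[0]:.3f}, {interval[1]:.3f}]")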

For the implementation of ASME's standard validation uncertainty, Roache [14] provides a recommendation on how the true error and validation uncertainty should be applied for validation and calibration at locations where no data are available (predictions). The authors of this paper extended this recommendation and propose that for cases where E ≫ uval the simulation is calibrated by E, with an added uncertainty of Uval defined as
where uval is the validation uncertainty with an added 95% prediction interval and ufit is the prediction interval on the interpolation of E between observation locations. For cases where E ≈ uval, it is recommended that E and uval are used as a validation uncertainty about the original simulation result, defined by

|δmodel| ≤ |E| + Uval

where δmodel is the discrepancy between the model and the true value in nature. For this case, it is indicated that the true model discrepancy is most likely less than the sum of the total validation uncertainty and the absolute value of the true error. This claim can be made with confidence because, in situations where |E| ≪ uval, the true error is relatively small, meaning that at prediction locations the model discrepancy will also most likely be relatively small; therefore, the true value will most likely be encapsulated by the sum of the validation uncertainty and the true error. Figure 4 illustrates when this method should be used as a validation method (when the ratio of the model discrepancy to the validation uncertainty is greater than five) and when it is to be used as a calibration method (when that ratio is less than five). In this figure, the dashed line at zero indicates the mean simulation result, and the squares represent the model error.

Fig. 4: Demonstration of the application of V&V 20 as a validation or calibration method and its dependence on the ratio of difference in observed mean error and validation uncertainty

2.4 Bayesian Model Estimation.

The final validation approach that will be assessed is the method developed by Kennedy and O'Hagan [10]. This method combines model validation, model parameter calibration, and model prediction using a Gaussian process (GP) computational approach. For the present assessment, no model parameter calibration was conducted because the model parameters are assumed to be known precisely. By obtaining experimental results at a range of input locations for the model, the observed discrepancy between the experiment and the simulation at these locations can be determined. These discrepancies can then be used to update a model discrepancy term to better represent the experimental data, i.e., Bayesian estimation. A Gaussian process is then used to connect the observations to one another over the input domain to create the updated model. The updated model function is defined as

f(x) ~ GP(m(x), k(x, x′))
where f(x) is a posterior realization of the GP, m(x) is the mean posterior function selected to represent the model discrepancy (m(x) = 0 for this case), and k(x,x′) is the covariance matrix of the input quantities with one another. The way that the GP behaves between observation locations is dependent upon this covariance matrix which is given by
where σn2 is the variance in the observation discrepancies and l is the correlation length scale, which denotes the typical distance over which the function values at different spatial locations become decorrelated. Normal distributions were assumed for each of these parameters. The length scale is determined here using maximum likelihood estimation [15], selecting l such that the probability that the GP posterior correctly predicts the behavior of the true model is highest. To maximize this probability, the length scale is selected such that the following function is at a maximum:

log p(yo | xo, l) = −(1/2) yo^T Ko^−1 yo − (1/2) log|Ko| − (nobs/2) log(2π)
where yo are the observed model discrepancies, xo are the locations of the input domain where observations were made, nobs is the number of observations made, and Ko is the covariance matrix of the observation locations with itself (Ko = k(xo, xo)). The joint distribution of the observed target values and the function values at the test locations under the prior is then given as [16]

[yo, f*]^T ~ N( 0, [ Ko, k(X, X*); k(X*, X), k(X*, X*) ] )
where f* is a single Gaussian process realization, which has the conditional (posterior) distribution

f* | X, yo, X* ~ N( k(X*, X) Ko^−1 yo, k(X*, X*) − k(X*, X) Ko^−1 k(X, X*) )

where X are the domain locations where experimental data are available and X* are the locations where the model discrepancy is to be predicted. Figure 5 shows an example: the prior assumes a mean of zero and a variance equal to the variance of the observed errors between the experimental samples and the mean simulation result, and the posterior is then updated with the observation data to reflect the model discrepancy.
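
The sketch below illustrates the Gaussian process discrepancy estimation described above with a zero prior mean. The squared-exponential covariance and the grid search used to maximize the log marginal likelihood over the length scale are our own assumptions for illustration; they stand in for whatever kernel and optimizer were actually used.

import numpy as np
def kernel(xa, xb, sigma2, l):
    # assumed squared-exponential covariance: sigma2 * exp(-(x - x')^2 / (2 l^2))
    return sigma2 * np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / l**2)
def log_marginal_likelihood(y_o, x_o, sigma2, l, jitter=1e-8):
    K_o = kernel(x_o, x_o, sigma2, l) + jitter * np.eye(x_o.size)
    L = np.linalg.cholesky(K_o)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_o))
    return -0.5 * y_o @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * x_o.size * np.log(2 * np.pi)
def gp_discrepancy(x_o, y_o, x_star, length_scales):
    sigma2 = np.var(y_o, ddof=1)     # prior variance taken from the observed discrepancies
    l = max(length_scales, key=lambda ls: log_marginal_likelihood(y_o, x_o, sigma2, ls))
    K_o = kernel(x_o, x_o, sigma2, l) + 1e-8 * np.eye(x_o.size)
    K_s = kernel(x_star, x_o, sigma2, l)
    mean = K_s @ np.linalg.solve(K_o, y_o)                       # posterior mean, m(x) = 0 prior
    cov = kernel(x_star, x_star, sigma2, l) - K_s @ np.linalg.solve(K_o, K_s.T)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None)), l
# observed discrepancies (experiment minus simulation) at the observation inputs
x_o = np.array([0.0, 10.0, 20.0, 30.0])          # e.g., angle of attack in degrees
y_o = np.array([0.02, 0.05, 0.15, 0.40])
x_star = np.linspace(0.0, 45.0, 10)              # prediction locations
mean, std, l = gp_discrepancy(x_o, y_o, x_star, np.linspace(2.0, 30.0, 30))
print(f"selected length scale: {l:.1f} deg")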

Fig. 5: Updated posterior model obtained from observational data using the Kennedy and O'Hagan approach [10]

3 Method of Manufactured Universes and Validation Case

Since our goal is not to actually validate or calibrate a model, but instead to assess the usefulness of different model validation and prediction frameworks, we will not use actual experimental data, which would leave ambiguity regarding the true value in nature. Instead, we will use the MMU developed by Stripling et al. [11] to unambiguously assess the different model validation, calibration, and prediction techniques. Similar to the method of manufactured solutions used for code verification, MMU involves the generation of a manufactured reality, or “universe,” from which “experimental” observations can be made. A low-fidelity model is then developed and assessed with uncertain inputs and possibly including numerical solution errors. Since the true behavior of “reality” is known from the manufactured universe, the estimation of model form uncertainty and errors in the lower-fidelity model can be performed. It is suggested that the manufactured reality be based on a high-fidelity model so as to obtain similar qualitative behavior as that found in a real experiment. This approach can be used to compare different methods for estimating model form uncertainty and calibration in the presence of random experimental measurement errors, experimental bias errors, modeling errors, and uncertainties in the model inputs. It can also be used to assess approaches for extrapolating model form uncertainty to conditions where no experimental data are available (i.e., model prediction locations) [7].

Due to the cost of obtaining "real world" experimental data, a high-fidelity CFD model was used as the "experiment." The experimental model was created using the OVERFLOW solver, version 2.2n [17,18], featuring the 2D flow around the MD 30P/30N (McDonnell-Douglas, Berkeley, MO) multi-element airfoil as shown in Fig. 6 [7]. The 2D lift and moment coefficients were solved for seven different angles of attack, α = 0 deg, 5 deg, 10 deg, 15 deg, 20 deg, 25 deg, and 30 deg, over five different flap deflection angles, δ = 0 deg, 10 deg, 20 deg, 30 deg, and 40 deg. From these results, a manufactured universe was created using biharmonic curve fits to serve as the sampling space for obtaining "synthetic" experimental data. This manufactured universe is therefore a continuous function of the uncertain model inputs (α and δ) in which nondeterministic "experimental" outputs are obtained by propagating the input uncertainties and then adding additional random measurement error to the output quantities (lift and moment coefficients). The low-fidelity model used here is thin airfoil theory [19]. The formulas given by thin airfoil theory are

cl = 2π [ α + (1/π) ∫0^π (dz/dx)(cos θ0 − 1) dθ0 ]

cm(c/4) = (1/2) ∫0^π (dz/dx)(cos 2θ0 − cos θ0) dθ0

where cl is the 2D lift coefficient, cm(c/4) is the 2D moment coefficient about the quarter chord, α is the angle of attack, dz/dx is the derivative of the mean camber line with respect to x, and θ0 is the transformed chordwise coordinate defined by x = (c/2)(1 − cos θ0). Figure 7 shows the lift and moment coefficients provided by the manufactured universe, with the red x's representing the locations at which high-fidelity CFD results were provided. The lift and moment coefficients provided by thin airfoil theory are shown in Fig. 8. Note the linearity of the results for both coefficients: thin airfoil theory does not account for flow separation and is therefore expected to deviate significantly at high angles of attack and flap deflections. This deviation is not an issue for the two goals of this investigation. The primary goal is to assess how well each method encapsulates errors in the (imperfect) model relative to the true value in nature within its uncertainty bounds (i.e., the method's conservativeness). The secondary goal is to determine how tightly the uncertainty bounds encapsulate the true value (the CFD result) if the method is found to be conservative. Since these uncertainties will then be extrapolated to parts of the model input domain where no experimental observations are made, the model imperfections found in thin airfoil theory are helpful in determining the predictive capability of each of these methods. In this validation case, normal (Gaussian) uncertainties with means at the nominal values and standard deviations of 0.5 deg and 1.0 deg are propagated through the angle of attack and flap deflection, respectively. Each uncertain input was sampled as a normal distribution with the mean being the intended measurement of angle of attack or flap deflection and the standard deviation being the associated uncertainty (0.5 deg for angle of attack and 1 deg for flap deflection). For the output quantities (lift and moment coefficient), additional random measurement uncertainty is assigned to each experimental replicate measurement, as one would have in an actual experiment. This uncertainty is assumed to be Gaussian with a mean of zero and a standard deviation of 5% of the mean value of the output quantity found by evaluating the true value in nature at the nominal input values.
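
To show how the low-fidelity model and the synthetic sampling fit together, the sketch below evaluates the thin airfoil relations above by numerical quadrature for a flat plate with a plain flap and draws one input sample with the stated uncertainties. The hinge location, camber-line definition, and all names are illustrative assumptions, not the configuration used in the paper.

import numpy as np
def thin_airfoil_coeffs(alpha_rad, dzdx_of_x, n=2000):
    # quadrature in the transformed coordinate theta0, where x/c = (1 - cos(theta0)) / 2
    theta = np.linspace(1e-6, np.pi - 1e-6, n)
    dzdx = dzdx_of_x(0.5 * (1.0 - np.cos(theta)))
    cl = 2.0 * np.pi * (alpha_rad + np.trapz(dzdx * (np.cos(theta) - 1.0), theta) / np.pi)
    cm_c4 = 0.5 * np.trapz(dzdx * (np.cos(2.0 * theta) - np.cos(theta)), theta)
    return cl, cm_c4
def flat_plate_with_flap(delta_rad, hinge=0.7):
    # camber-line slope for a flat plate with a plain flap hinged at x/c = hinge
    return lambda x_over_c: np.where(x_over_c < hinge, 0.0, -np.tan(delta_rad))
rng = np.random.default_rng(0)
alpha = np.deg2rad(rng.normal(10.0, 0.5))    # angle of attack with 0.5 deg standard deviation
delta = np.deg2rad(rng.normal(20.0, 1.0))    # flap deflection with 1.0 deg standard deviation
cl, cm = thin_airfoil_coeffs(alpha, flat_plate_with_flap(delta))
print(f"cl = {cl:.3f}, cm_c/4 = {cm:.3f}")
# a synthetic experimental replicate would instead query the manufactured universe at the
# sampled inputs and add zero-mean Gaussian noise with a 5% standard deviation to the output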

Fig. 6: Overset grid system and typical flowfield result shown as a coordinate slice colored by local Mach number for the MD 30P/30N airfoil
Fig. 7: (a) cl and (b) cmc/4 for the manufactured universe showing the 35 CFD runs used to generate the curve fits (i.e., synthetic experimental data)
Fig. 8: (a) cl and (b) cmc/4 for the low fidelity model

4 Performance Metrics

To compare these different validation and prediction methods, two metrics are used. The first is conservativeness, φ1, which can also be thought of as the reliability of the estimator to capture the true model error. It is measured as the percentage of the time that a respective method encapsulates the true value in nature within its associated uncertainty bounds. In this case, the "true value" is taken as the mean of the experimental output as the number of input samples approaches infinity (the mean of the high fidelity model for N = 10,000 in this case). Conservativeness values closer to 100% are ideal, with values above 95% considered to be very good. Note that a method can have a high conservativeness factor but not be very informative to a decision maker if it is excessively conservative. The second factor taken into consideration is tightness, φ2, which assesses how tightly the uncertainty interval about either the predicted or calibrated value bounds the true value. It should be noted that φ2 is only calculated if a respective method is conservative in a particular instance. For validation methods, the tightness φ2,v is simply the ratio of the true model error magnitude to the associated uncertainty about the mean simulation result. This tightness calculation only applies to validation methods because it could penalize calibration methods that have an updated simulation mean close to the true value. Therefore, the tightness for calibration methods, φ2,c, is instead based on the ratio of the uncertainty associated with the calibration to the original true error magnitude that existed before calibration.

For both validation and calibration methods, if a method fails to be conservative, then φ2 is taken to be zero for that case. As with conservativeness, larger tightness values near 100% are preferred.

These two assessment factors are then combined into an overall assessment by

φ = αw φ1 + (1 − αw) φ2

where φ is the overall assessment and αw is a weighting factor, typically set to be 0.5 (i.e., equal weighting) for low consequence applications such as preliminary design or 0.9 for higher consequence applications in order to place a greater significance on conservativeness.
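
A minimal sketch of how these assessment factors could be computed over a set of cases is given below; the function names are our own, and the tightness shown follows the verbal definition of φ2,v for validation methods only.

import numpy as np
def conservativeness(true_values, lower_bounds, upper_bounds):
    # phi_1: percentage of cases whose uncertainty interval contains the true value
    t = np.asarray(true_values)
    hits = (np.asarray(lower_bounds) <= t) & (t <= np.asarray(upper_bounds))
    return 100.0 * np.mean(hits)
def tightness_validation(true_error, uncertainty, conservative):
    # phi_2,v: ratio of the true model error magnitude to the estimated uncertainty
    # about the mean simulation result; zero when the method was not conservative
    return 100.0 * abs(true_error) / uncertainty if conservative else 0.0
def overall_assessment(phi1, phi2, alpha_w=0.5):
    # weighted combination of conservativeness and tightness
    return alpha_w * phi1 + (1.0 - alpha_w) * phi2
phi1 = conservativeness([1.00, 1.20], [0.80, 0.90], [1.30, 1.10])   # second case misses
phi2 = tightness_validation(true_error=0.15, uncertainty=0.25, conservative=True)
print(f"phi1 = {phi1:.0f}%, phi2 = {phi2:.0f}%, overall = {overall_assessment(phi1, phi2):.0f}%")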

5 Results for Model Validation

Each of the methods previously discussed was first assessed at locations of the input domain where synthetic experimental results were provided (i.e., the red x's in Fig. 7) and compared with results from the model. Uncertainty bounds about each of the results are shown in Fig. 9 for both 2 and 16 experimental samples for the nominal condition of α = 30 deg and δ = 20 deg. Note that for this validation activity, only one observation point was used and no regression fit was performed for the AVM, the MAVM, and V&V 20 methods. For the Bayesian calibration method, the Gaussian process across the entire input domain was created but was only assessed at the single validation location in this section. The green dashed line represents the mean simulation result, and the gray dashed line represents the experimental mean for the given number of samples. Notice how as the number of experimental samples increases from 2 to 16, the gray line moves closer to the blue dashed line. The blue dashed line is the true value, which is taken to be the experimental mean as the number of experimental samples approaches infinity. For the purposes of this study, the true value was taken as the experimental mean from a size of 100,000 samples. It is also seen that as the number of experimental samples increases, the uncertainty bounds about the true value generally become smaller.

Fig. 9: (a) Uncertainty intervals for cl (two experimental samples), (b) uncertainty intervals for cmc/4 (two experimental samples), (c) uncertainty intervals for cl (16 experimental samples), (d) uncertainty intervals for cmc/4 (16 experimental samples) at α = 30 deg and δ = 20 deg

In Fig. 10, the conservativeness of each method, aggregated over all of the 35 observation points, is shown as a function of sample size. As would be expected, as more experimental samples become available, the conservativeness of each method generally increases. However, this is not the case for the area validation metric, because the area validation metric includes no confidence interval such as that used by the modified area validation metric. Therefore, whether the area validation metric is conservative largely depends on the value of the mean experimental SRQ relative to the true value (i.e., it will be larger than the true value approximately 50% of the time). The same trend is also seen for the tightness measurement, shown in Fig. 11, as the uncertainty bounds for each validation metric and calibration method become tighter around the true value, excluding the area validation metric. The area validation metric's tightness at these locations converges to about 50% due to its symmetric MFU bounds, and the other four methods' tightness measurements do not quite converge to 100% with increased sample size due to the confidence intervals associated with each method. Thus, at these observation locations, each method slightly overpredicts the associated uncertainty of the model relative to the true value when a large number of experimental samples are available. Figure 12 shows the overall assessment of the methods for the case where experimental data are available, with αw taken to be 0.5 and 0.9. Based on these results, the modified area validation metric, V&V 20, and Bayesian calibration perform well even for low experimental sample sizes. When using a weighting factor of 0.9, placing more of an emphasis on conservativeness in the overall assessment, most of the methods converge to 95% for small sample sizes (with the exception of the AVM). The high overall assessment of the calibration methods is expected, however, because these methods use the experimental data to help determine where the true value lies. The added confidence intervals about the experimental data when computing the MAVM show an obvious improvement over the AVM in tightness about the true value and in the overall assessment.

Fig. 10: Conservativeness as a function of sample size for each validation metric at locations where observation data are available for (a) cl and (b) cmc/4
Fig. 11: Tightness measurements as a function of sample size for locations at which observation data are available for (a) cl and (b) cmc/4
Fig. 12: Overall assessment as a function of sample size for locations at which observation data are available for (a) cl and (b) cmc/4 (assumes αw = 0.5) and (c) cl and (d) cmc/4 (assumes αw = 0.9)

6 Results for Predictive Capability

The second part of comparing these methods is to observe how they perform at locations where no experimental results are provided and the results have to be interpolated or extrapolated to the prediction locations. This was done by creating a best fit of the metric results at the locations where experimental SRQs are provided. As outlined in Oberkampf and Roy [1], 95% prediction intervals were then placed about the fit, and the upper limit of the prediction interval is taken as the metric result. Therefore, in this instance the upper limits of the 95% prediction intervals are taken for d, d+, d−, E, and uval and then used as the uncertainty for their respective method. Note that for calibration using V&V 20, the model is calibrated by E and the associated uncertainty is the 95% prediction interval for E plus the upper limit on the prediction interval for kuval [14]. However, this does not have to be done for Bayesian model updating, as predictions are made directly by sampling the Gaussian process model. A simple 1D example of this interpolation is shown in Fig. 13 for the extrapolation of the area validation metric using a quadratic regression fit. The uncertainties or model errors/discrepancies are extrapolated to 20 prediction locations located at combinations of angles of attack of 0 deg, 18 deg, 25 deg, 38 deg, and 45 deg and flap deflections of 0 deg, 8 deg, 17 deg, and 25 deg. For the case in which a sparse number of experimental observations is available, a linear regression fit is used in both the angle of attack and flap deflection dimensions due to the limited number of available points. However, as the number of observations increases (as for the moderate and plentiful cases), a bi-quadratic regression fit is used.
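
The sketch below shows the extrapolation step in one input dimension: a polynomial regression of a metric over the observation locations, with the upper limit of the 95% prediction interval taken as the metric value at each prediction location. The ordinary least-squares prediction interval formula is used; this is our illustrative reading of the procedure described above, and the data values are made up.

import numpy as np
from scipy import stats
def fit_with_upper_prediction_limit(x_obs, d_obs, x_pred, degree=2, conf=0.95):
    X = np.vander(x_obs, degree + 1)                 # design matrix for the polynomial fit
    beta = np.linalg.lstsq(X, d_obs, rcond=None)[0]
    dof = X.shape[0] - X.shape[1]
    resid = d_obs - X @ beta
    s2 = resid @ resid / dof                         # residual variance
    XtX_inv = np.linalg.inv(X.T @ X)
    Xp = np.vander(x_pred, degree + 1)
    fit = Xp @ beta
    # prediction interval half-width: t * s * sqrt(1 + x0^T (X^T X)^-1 x0)
    se = np.sqrt(s2 * (1.0 + np.einsum("ij,jk,ik->i", Xp, XtX_inv, Xp)))
    return fit, fit + stats.t.ppf(0.5 + conf / 2, dof) * se
x_obs = np.array([10.0, 20.0, 25.0, 30.0])           # observation angles of attack, deg
d_obs = np.array([0.05, 0.08, 0.12, 0.20])           # AVM values at those locations
fit, d_upper = fit_with_upper_prediction_limit(x_obs, d_obs, np.array([38.0, 45.0]))
print("extrapolated metric (upper 95% prediction limit):", np.round(d_upper, 3))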

Fig. 13: Prediction model form uncertainty regression for cl using the area validation metric

6.1 Sparse Experimental Observations.

Four observation locations were assumed, at angles of attack of 10 deg and 30 deg and flap deflections of 5 deg and 20 deg. These locations are listed in Table 1 and shown in Fig. 14, with the observation locations shown in white and the prediction locations in red. The uncertainty intervals for each method are shown in Fig. 15 for experimental sample sizes of 2 and 16 at α = 18 deg and δ = 25 deg. The locations where experimental data are available are defined as the validation domain, and the locations at which predictions are being made are the prediction domain. In this case, the data are sparse, and a significant amount of extrapolation is required from the validation domain. It can be seen how the modified area validation metric accounts for bias error through its one-sided interval, unlike the area validation metric's symmetric uncertainty interval. The uncertainty intervals for the modified area validation metric can also be seen closing in on the true value as the sample size increases from 2 to 16, due to the reduced confidence intervals about the experimental mean. Additionally, the AVM and MAVM are overly conservative for the lift coefficient compared to V&V 20 and Bayesian calibration. This is because, at this prediction location, the true value and the mean value from the simulation are relatively close to one another; the added prediction interval makes these methods overly conservative here, while providing a tighter bound about the true value at prediction locations farther from the validation domain.

Fig. 14: Prediction locations where validation/calibration methods are being assessed with sparse data for (a) cl and (b) cmc/4 (white x's: observations/validation domain, red x's: predictions/application domain)
Fig. 15: Prediction location uncertainty intervals: (a) uncertainty intervals for cl (two experimental samples), (b) uncertainty intervals for cmc/4 (two experimental samples), (c) uncertainty intervals for cl (16 experimental samples), (d) uncertainty intervals for cmc/4 (16 experimental samples) at α = 18 deg and δ = 25 deg
Table 1: Combinations of angles of attack and flap deflections at which experimental data was obtained for each assessment case

Amount of observations | Angles of attack (α) | Flap deflections (δ)
Sparse | 10 deg and 30 deg | 5 deg and 20 deg
Moderate | 6 deg, 13 deg, 28 deg, 34 deg, and 42 deg | 5 deg, 13 deg, 21 deg, 27 deg, and 35 deg
Plentiful | 1 deg, 4 deg, 9 deg, 15 deg, 19 deg, 24 deg, 28 deg, 32 deg, 35 deg, 40 deg, and 41 deg | −1 deg, 2 deg, 5 deg, 7 deg, 10 deg, 13 deg, 15 deg, 21 deg, 26 deg, 29 deg, 35 deg, and 41 deg

When examining the conservativeness of each method, shown in Fig. 16, the area validation metric and the modified area validation metric are seen to be the most reliably conservative of the five methods when a small number of observations are available and there is significant extrapolation from the validation domain to the prediction domain. However, it is interesting to see the different levels of conservativeness between the lift coefficient and the moment coefficient for the AVM and MAVM. This comparison shows that the effectiveness of the extrapolation of these methods, and whether they remain conservative, depends on where the observations used in their assessment are made. Because only two observations are made in each dimension of the uncertain inputs in this case, the information from these two methods can only be extrapolated via a linear regression fit. Since the observations are made in locations where the experiment still behaves linearly for the lift coefficient with respect to angle of attack and flap deflection, the linear fit closely encapsulates the true value at small angles of attack and flap deflections. However, when extrapolating out to regions of high angles of attack and flap deflections (where flow separation occurs and the lift coefficient becomes nonlinear), the methods tend not to be conservative. This is not the case for the moment coefficient: the experiment contains considerable nonlinearity across the input domain, so the prediction intervals on the extrapolation regression fits more easily encapsulate the true value outside of the validation domain. It is also important to note the poor performance of Bayesian calibration when few data are available. This is because Bayesian calibration assumes a prior mean discrepancy of zero before calibration (i.e., no difference between the experiment and model), and it does not calibrate the model outside of the region where observations, which provide information on how the experiment and model differ, are available. Therefore, when making predictions of the mean discrepancy with few observations, it is difficult to acquire meaningful information away from where those observations are made.

Fig. 16: Conservativeness at 20 prediction locations for each method for (a) cl and (b) cmc/4

Examining the tightness in Fig. 17, the modified area validation metric is shown to be the tightest in this case. The AVM was about half as tight as the MAVM, as expected, due to the MAVM's separate tracking of the d+ and d− areas to help account for bias error. The Bayesian updating method was seen to be the least conservative, in part because the method assumes a large uncertainty about the calibration outside of the domain where observations were made, which also leads to a low tightness about the true value in the cases where it is conservative. However, it should be noted that all approaches have issues with tightness due to the large amount of extrapolation from the validation domain.

Fig. 17: Tightness at 20 prediction locations for each method for (a) cl and (b) cmc/4

The overall combined assessment of the five methods is shown in Fig. 18. Overall the MAVM, the AVM, and V&V 20 calibration methods performed the best. In the case where conservativeness and tightness were equally weighted (αw = 0.5), the MAVM consistently outperformed the AVM due to the MAVM's ability to be nearly twice as tight as the AVM when conservative. When increasing the weight on conservativeness to αw = 0.9, the performance between the two methods becomes much more comparable as both methods were shown to be reliably conservative, even with a sparse amount of observations. The V&V 20 calibration method also proved reliable due to the ability of the error regression fit to produce a relatively tight calibration for lift coefficient with a sparse amount of observations, while being conservative with the added error regression confidence interval and validation uncertainty.

Fig. 18: Overall assessment at 20 prediction locations using αw = 0.5 for (a) cl and (b) cmc/4 and αw = 0.9 for (c) cl and (d) cmc/4

6.2 Moderate Experimental Observations.

Twenty-five observation locations are used in this case, located at angles of attack of 6 deg, 13 deg, 28 deg, 34 deg, and 42 deg and flap deflections of 5 deg, 13 deg, 21 deg, 27 deg, and 35 deg, as listed in Table 1 and shown in Fig. 19. Most of the prediction locations involve interpolation, but those at the lowest and highest angles of attack and lowest flap deflection angles involve some mild extrapolation. Upon interpolating/extrapolating the five methods, it was seen (Fig. 20) that the AVM and MAVM uncertainty intervals at the prediction locations were larger than those at the observation locations, making them slightly more conservative in their predictive capability. However, this was not the case for the V&V 20 and Bayesian model updating calibration methods. The conservativeness of these two methods when making predictions is largely dictated by the number of observation locations available across the input domain. Since these methods rely on observation data to create an updated model, the more experimental observations that are available, the better the prediction of the true value. Figure 21 shows the conservativeness of each method as a function of sample size at the prediction locations. Bayesian calibration was usually conservative for all sample sizes at the prediction locations, while the 95% prediction interval added to the AVM, MAVM, and V&V 20 interpolations also maintained a high conservativeness for all sample sizes.

Fig. 19: Prediction locations where validation/calibration methods are being assessed with moderate data for (a) cl and (b) cmc/4 (white x's: observations/validation domain, red x's: predictions/application domain)
Fig. 20: Prediction location uncertainty intervals: (a) uncertainty intervals for cl (two experimental samples), (b) uncertainty intervals for cmc/4 (two experimental samples), (c) uncertainty intervals for cl (16 experimental samples), (d) uncertainty intervals for cmc/4 (16 experimental samples) at α = 18 deg and δ = 25 deg
Fig. 21: Conservativeness at 20 prediction locations for each method for (a) cl and (b) cmc/4

When examining the tightness of these methods at the prediction locations, displayed in Fig. 22, the MAVM is shown to be the tightest. It is important to note, however, the difference in tightness for the MAVM between the lift coefficient and the moment coefficient. As further investigation of the prediction intervals showed, some of the predicted lift coefficient values are similar to the experiment, so the prediction interval is much larger than needed to bound the true value. The moment coefficient values from the experiment, however, contain more variability across the input domain, placing the prediction locations closer to the lower bounds of the prediction intervals. The Bayesian updating method, which depends on the chosen prior variance and correlation length scale, was found to be not very tight due to the large confidence intervals associated with the method at locations where no observational data are provided.

Fig. 22: Tightness at 20 prediction locations for each method for (a) cl and (b) cmc/4
Fig. 24: Prediction locations where validation/calibration methods are being assessed with plentiful data for (a) cl and (b) cmc/4 (white x's: observations/validation domain, red x's: predictions/application domain)

Assessing these methods using equal weighting of conservativeness and tightness (αw = 0.5) the MAVM is seen to perform slightly better, as shown in Fig. 23. However, as αw increases, placing a greater weight on conservativeness, Bayesian calibration has a slightly better overall performance, followed closely by the other four methods.

Fig. 23: Overall assessment at 20 prediction locations using αw = 0.5 for (a) cl and (b) cmc/4 and αw = 0.9 for (c) cl and (d) cmc/4

6.3 Plentiful Experimental Observations.

One hundred twenty-one observations were made at angles of attack of 1 deg, 4 deg, 9 deg, 15 deg, 19 deg, 24 deg, 28 deg, 32 deg, 35 deg, 40 deg, and 41 deg and flap deflections of 2 deg, 5 deg, 7 deg, 10 deg, 13 deg, 15 deg, 21 deg, 26 deg, 29 deg, 35 deg, and 41 deg, and the uncertainties or model errors/discrepancies are extrapolated to the same 20 prediction locations used in the previous cases. These locations are shown in Fig. 24, with the observation locations shown in white and the prediction locations in red. For this case, there are many experimental observation locations, and the prediction domain is almost entirely within the validation domain (i.e., there is very little extrapolation). The uncertainty intervals for each method are shown in Fig. 25 for experimental sample sizes of 2 and 16. Even with the large amount of experimental data, the uncertainty intervals for each method are still relatively large due to the added prediction interval for the interpolation/extrapolation and the confidence interval of the Gaussian process for Bayesian calibration. In this case, each of the methods is found to be reliably conservative, as shown in Fig. 26. The Bayesian calibration method is conservative in nearly every prediction case due to the method's ability to actively update the model discrepancy for a better approximation relative to the experiment. The other four methods are also conservative in nearly every instance, with the exception of one prediction location for the moment coefficient, where the interpolations are nonconservative at a location slightly outside the domain where observations were made. In the measurement of tightness, shown in Fig. 27, Bayesian calibration and the modified area validation metric were shown to be the tightest.

Fig. 27: Tightness at 20 prediction locations for each method for (a) cl and (b) cmc/4
Fig. 25: Prediction location uncertainty intervals: (a) uncertainty intervals for cl (two experimental samples), (b) uncertainty intervals for cmc/4 (two experimental samples), (c) uncertainty intervals for cl (16 experimental samples), (d) uncertainty intervals for cmc/4 (16 experimental samples) at α = 18 deg and δ = 25 deg
Fig. 26: Conservativeness at 20 prediction locations for each method for (a) cl and (b) cmc/4

When looking at the overall assessment of the methods in Fig. 28, the MAVM and Bayesian calibration perform the best in preliminary design cases where conservativeness and tightness are equally weighted (αw = 0.5). However, when the assessment weighting factor is increased to αw = 0.9, it is clear that Bayesian calibration, and to a lesser extent the MAVM, is slightly better than the other approaches in higher consequence scenarios where more reliable conservativeness is preferred and plentiful data are available over the entire prediction domain.

Fig. 28: Overall assessment at 20 prediction locations using αw = 0.5 for (a) cl and (b) cmc/4 and αw = 0.9 for (c) cl and (d) cmc/4

7 Conclusions

Upon examining and comparing these five validation/calibration methods, the AVM is observed to be the least reliably conservative of the five methods for validation where data are available. With the development of the MAVM, it is preferred over the AVM due to its included confidence interval and its ability to detect bias error. The other four methods are generally conservative at locations with data. When examining their predictive capability, the MAVM and Bayesian calibration appear to perform the best, depending on the amount of observation data available, when considering both conservativeness and tightness in the overall assessment. However, as more weight is placed on conservativeness, as it would be for high consequence applications, Bayesian calibration performs better than the MAVM for moderate amounts of data. For plentiful data, Bayesian calibration and the MAVM slightly outperformed the other three methods. With more observation points, calibration becomes more attractive; with limited data, simply estimating the model form uncertainty (with no calibration) is recommended. These findings are summarized in Table 2, which shows the recommended approach given the amount of experimental data and the interpolation/extrapolation required from the validation domain to the prediction domain, while also taking into account the level of risk one is willing to assume. Since the assessments of the metrics for the lift coefficient and the moment coefficient showed generally the same trends in performance, these rankings and recommendations would be expected to apply to other validation exercises based on the amount of experimental data available and the amount of extrapolation required to the locations where validation or calibration is being performed.

Table 2: Validation/calibration recommendation as a function of experimental data and decision risk

Amount of experimental data | Decision risk: low (preliminary design) | Decision risk: high (high consequence)
Sparse/extensive extrapolation | MFU only (MAVM) | MFU only (MAVM)
Moderate/some extrapolation | MFU only (MAVM) | Calibration + MFU (K&O or MAVM)
Plentiful/interpolation only | Mainly calibration (K&O) | Calibration + MFU (K&O or MAVM)

MFU: model form uncertainty; MAVM: modified area validation metric [7]; V&V 20: ASME's standard validation uncertainty [9]; K&O: Bayesian calibration [10]

Acknowledgment

This work was supported by Intelligent Light (Dr. Earl Duque, Project Manager) as part of a Phase II SBIR funded by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under Award Number DE-SC0015162. This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

The authors would like to thank Cray Inc. for providing access to their corporate Cray XE40 computer, Geert Wenes of Cray Inc. for helping to acquire access, and David Whitaker from Cray Inc. for assistance in porting OVERFLOW2 to the XE40 and for streamlining the use of FieldView on their system. Special thanks to Dr. Heng Xiao and Dr. Jinlong Wu for providing their insight on Bayesian updating, and to Professor James Coder at the University of Tennessee in Knoxville for providing the setup of the OVERFLOW2 runs used for establishing the synthetic experimental data used in this study.

Funding Data

  • Office of Science (Grant No. DE-SC0015162; Funder ID: 10.13039/100006132).

Data Availability Statement

The datasets generated and supporting the findings of this article are available from the corresponding author upon reasonable request.

Nomenclature

AVM = area validation metric
cl = 2D lift coefficient
cm(c/4) = 2D moment coefficient about the quarter chord
CDF = cumulative distribution function
d = area validation metric
d− = area validation metric for area smaller than simulation CDF (MAVM)
D = mean experimental result
d+ = area validation metric for area larger than simulation CDF (MAVM)
dz/dx = derivative of the mean camber line with respect to x
E = model error
F(Y) = simulation CDF curve
f* = single Gaussian process posterior realization
GP = Gaussian process
k = coverage factor
k(x,x′) = covariance matrix between x and x′
l = length scale
MAVM = modified area validation metric
MFU = model form uncertainty
m(x) = mean function of the Gaussian process prior
N = number of samples
nobs = number of observations
S = mean simulation result
Sn(Y) = experiment CDF curve
s² = variance in experimental data
SRQ = system response quantity
uD = experimental data uncertainty
uinput = input uncertainty
unum = numerical uncertainty
uval = validation uncertainty
X = locations for which data are available in Bayesian updating
X* = locations for which model discrepancy is identified using Bayesian estimation
α = angle of attack, degrees
δ = flap deflection, degrees
δmodel = model discrepancy
σn² = variance in observation model discrepancies
Φ = overall assessment
Φ1 = conservativeness of a method
Φ2 = tightness of a method
Φ2,v = tightness for validation method
Φ2,c = tightness for calibration method

Appendix

Automation of Model Form Uncertainty Methods.

High-level algorithms for implementing three of the MFU estimation methods discussed in Sec. 2 are given below.

Algorithm 1

Modified area validation metric

1. SnConf(Y) = [Sn(Y) − t(α/2, N−1)*std(Sn(Y))/√N, Sn(Y) + t(α/2, N−1)*std(Sn(Y))/√N]  Calculate confidence interval CDFs for the experiment
2. Compute: pF = 1/S  Determine individual probability for each simulation SRQ
3. Compute: pSn = 1/N  Determine individual probability for each experimental sample
4. for k = 1 to 2 do
5.  Sn(Y) = SnConf(Y)(k)
6.  if N > S do
7.   for j = 1 to S do
8.    if dREM != 0 do
9.     Compute: d(i) = (Sn(Y(i)) − F(Y(j)))*(pSn*i − pF*(j−1))  Determine remaining area between previous simulation SRQ and individual experiment CDF
10.     if d(i) > 0 do
11.      Compute: d+ = d+ + d(i)  Sum area greater than simulation CDF
12.     else do
13.      Compute: d− = d− + d(i)  Sum area less than simulation CDF
14.     end if
15.     Compute: i = i + 1  Increase experiment SRQ index
16.    end if
17.    while j*pF > i*pSn do
18.     Compute: d(i) = (Sn(Y(i)) − F(Y(j)))*pSn  Determine area between individual simulation SRQ and experiment CDF
19.     if d(i) > 0 do
20.      Compute: d+ = d+ + d(i)  Sum area greater than simulation CDF
21.     else do
22.      Compute: d− = d− + d(i)  Sum area less than simulation CDF
23.     end if
24.     Compute: i = i + 1  Increase experiment SRQ index
25.    end while
26.    Compute: dREM = (Sn(Y(i)) − F(Y(j)))*(pF*j − pSn*(i−1))  Determine remaining area between individual simulation SRQ and previous experiment CDF
27.    if dREM > 0 do
28.     Compute: d+ = d+ + dREM  Sum area greater than simulation CDF
29.    else do
30.     Compute: d− = d− + dREM  Sum area less than simulation CDF
31.    end if
32.   end for
33.  else if N ≤ S do
34.   for j = 1 to N do
35.    if dREM != 0 do
36.     Compute: d(i) = (Sn(Y(j)) − F(Y(i)))*(pF*i − pSn*(j−1))  Determine remaining area between individual simulation SRQ and previous experiment CDF
37.     if d(i) > 0 do
38.      Compute: d+ = d+ + d(i)  Sum area greater than simulation CDF
39.     else do
40.      Compute: d− = d− + d(i)  Sum area less than simulation CDF
41.     end if
42.     Compute: i = i + 1  Increase simulation SRQ index
43.    end if
44.    while i*pF < j*pSn do
45.     Compute: d(i) = (Sn(Y(j)) − F(Y(i)))*pF  Determine area between individual simulation SRQ and experiment CDF
46.     if d(i) > 0 do
47.      Compute: d+ = d+ + d(i)  Sum area greater than simulation CDF
48.     else do
49.      Compute: d− = d− + d(i)  Sum area less than simulation CDF
50.     end if
51.     Compute: i = i + 1  Increase simulation SRQ index
52.    end while
53.    Compute: dREM = (Sn(Y(j)) − F(Y(i)))*(pSn*j − pF*(i−1))  Determine remaining area between previous simulation SRQ and individual experiment CDF
54.    if dREM > 0 do
55.     Compute: d+ = d+ + dREM  Sum area greater than simulation CDF
56.    else do
57.     Compute: d− = d− + dREM  Sum area less than simulation CDF
58.    end if
59.   end for
60.  end if
61.  Compute: dconf+(k) = abs(d+)  Save the area by which this confidence interval CDF lies above the simulation CDF
62.  Compute: dconf−(k) = abs(d−)  Save the area by which this confidence interval CDF lies below the simulation CDF
63. end for
64. Compute: d+ = max(dconf+)  Take the maximum of the positive areas as the upper bound uncertainty
65. Compute: d− = max(dconf−)  Take the maximum of the negative areas as the lower bound uncertainty
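
For readers who prefer working code over the index bookkeeping above, the following Python sketch computes the same d+ and d− quantities by integrating the signed difference between the two empirical quantile functions over a merged probability grid, which is mathematically equivalent to stepping through the CDFs as in Algorithm 1. It is a minimal sketch, assuming the simulation and experimental SRQs are available as plain 1-D sample arrays; the function and variable names (signed_cdf_areas, mavm, sim_srq, exp_srq) are illustrative and not taken from the original implementation.

import numpy as np
from scipy import stats

def signed_cdf_areas(sim_srq, exp_srq):
    # Split the area between the two empirical CDFs into the part where the
    # experimental CDF sits at larger SRQ values than the simulation CDF (d_plus)
    # and the part where it sits at smaller values (d_minus).
    sim = np.sort(np.asarray(sim_srq, dtype=float))
    expt = np.sort(np.asarray(exp_srq, dtype=float))
    n_sim, n_exp = len(sim), len(expt)
    # Merged probability breakpoints of the two empirical (step) quantile functions
    p = np.unique(np.concatenate([np.arange(1, n_sim + 1) / n_sim,
                                  np.arange(1, n_exp + 1) / n_exp]))
    widths = np.diff(np.concatenate([[0.0], p]))
    mids = p - widths / 2.0  # interval midpoints select the correct step value
    q_sim = sim[np.minimum((mids * n_sim).astype(int), n_sim - 1)]
    q_exp = expt[np.minimum((mids * n_exp).astype(int), n_exp - 1)]
    diff = (q_exp - q_sim) * widths
    return diff[diff > 0].sum(), -diff[diff < 0].sum()

def mavm(sim_srq, exp_srq, alpha=0.05):
    # Repeat the area split for the experimental CDF shifted by a +/- t-based
    # confidence interval on the mean, and keep the worst case (steps 1, 4, 64, 65).
    expt = np.asarray(exp_srq, dtype=float)
    n_exp = len(expt)
    half_width = (stats.t.ppf(1.0 - alpha / 2.0, n_exp - 1)
                  * expt.std(ddof=1) / np.sqrt(n_exp))
    d_plus, d_minus = 0.0, 0.0
    for shift in (-half_width, +half_width):
        dp, dm = signed_cdf_areas(sim_srq, expt + shift)
        d_plus, d_minus = max(d_plus, dp), max(d_minus, dm)
    return d_plus, d_minus

The quantile-integration form was chosen because it avoids the separate N > S and N ≤ S branches of the pseudocode while producing the same areas.
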
Algorithm 2

ASME's standard validation uncertainty

1. Compute: E = mean(Sn(Y)) − mean(F(Y))  Determine comparison error between experiment and simulation
2. Compute: udata = t(α/2, N−1)*std(Sn(Y))/√N  Determine uncertainty in experimental data
3. Compute: uinput = std(F(Y))  Determine uncertainty in simulation due to nondeterministic inputs
4. Compute: unum = uro + uiter + uDE  Determine numerical uncertainty in simulation
5. Compute: uval = sqrt(udata² + uinput² + unum²)  Determine overall validation uncertainty
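
The short Python sketch below mirrors the five steps of Algorithm 2, assuming the simulation and experimental SRQs are sample arrays and that the numerical uncertainty components (round-off, iterative, and discretization error) are supplied by the analyst from a separate verification study. The names (vv20_validation_uncertainty, u_ro, u_iter, u_de) are illustrative, and the sign convention for E follows step 1 above.

import numpy as np
from scipy import stats

def vv20_validation_uncertainty(sim_srq, exp_srq, u_ro=0.0, u_iter=0.0,
                                u_de=0.0, alpha=0.05):
    sim = np.asarray(sim_srq, dtype=float)
    expt = np.asarray(exp_srq, dtype=float)
    n_exp = len(expt)
    # Step 1: comparison error, experiment mean minus simulation mean
    E = expt.mean() - sim.mean()
    # Step 2: experimental data uncertainty from a t-based confidence interval on the mean
    u_data = (stats.t.ppf(1.0 - alpha / 2.0, n_exp - 1)
              * expt.std(ddof=1) / np.sqrt(n_exp))
    # Step 3: simulation uncertainty due to nondeterministic inputs
    u_input = sim.std(ddof=1)
    # Step 4: numerical uncertainty components summed as in Algorithm 2
    u_num = u_ro + u_iter + u_de
    # Step 5: root-sum-square combination into the validation uncertainty
    u_val = np.sqrt(u_data**2 + u_input**2 + u_num**2)
    return E, u_val
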
Algorithm 3

Bayesian updating [10]

1. for i = 1 to length(l) do  Loop over candidate length scales
2.  Compute: Ko(X,X) = σo²*exp(−|X − X^T|²/(2*l(i)²))  Determine covariance matrix for observations
3.  Compute: ln p(yo|xo,σo,l) = −(1/2)*yo^T*Ko⁻¹*yo − (1/2)*ln(det(Ko)) − (nobs/2)*ln(2π)  Evaluate the log marginal likelihood for each sampled l to select the maximum-likelihood length scale
4. end for
5. Compute: K(X*,X) = σo²*exp(−|X − X*^T|²/(2*l²))  Determine covariance matrix between observations and predictions
6. Compute: K(X*,X*) = σo²*exp(−|X* − X*^T|²/(2*l²))  Determine covariance matrix for predictions
7. Compute: f̄* = K(X*,X)*[K(X,X) + σn²*I]⁻¹*y  Determine mean posterior function
8. Compute: cov(f*) = K(X*,X*) − K(X*,X)*[K(X,X) + σn²*I]⁻¹*K(X,X*)  Determine covariance of the posterior function
9. Compute: L = chol(cov(f*))  Cholesky decompose the posterior covariance matrix
10. for j = 1 to n do
11.  Compute: f*(j) = L*randn(length(X*),1)  Sample a posterior realization
12. end for
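
As a companion to steps 5 through 12, the following Python sketch evaluates the Gaussian-process posterior for the model discrepancy at the prediction locations, assuming 1-D input locations, a squared-exponential kernel, and hyperparameters σo, σn, and l that have already been selected (the marginal-likelihood loop of steps 1 through 4 is omitted). The names (gp_discrepancy, x_obs, y_obs, x_pred) are illustrative rather than taken from the original implementation, and the posterior mean is added to each Cholesky draw so that every column is a full realization rather than a zero-mean perturbation.

import numpy as np

def sq_exp_kernel(xa, xb, sigma_o, ell):
    # Squared-exponential covariance between two sets of 1-D input locations
    return sigma_o**2 * np.exp(-(xa[:, None] - xb[None, :])**2 / (2.0 * ell**2))

def gp_discrepancy(x_obs, y_obs, x_pred, sigma_o, sigma_n, ell,
                   n_draws=100, seed=0):
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    x_pred = np.asarray(x_pred, dtype=float)
    # Covariance blocks for observations and prediction locations (steps 2, 5, 6)
    K = sq_exp_kernel(x_obs, x_obs, sigma_o, ell) + sigma_n**2 * np.eye(len(x_obs))
    K_star = sq_exp_kernel(x_pred, x_obs, sigma_o, ell)
    K_ss = sq_exp_kernel(x_pred, x_pred, sigma_o, ell)
    # Posterior mean and covariance of the discrepancy at x_pred (steps 7 and 8)
    mean = K_star @ np.linalg.solve(K, y_obs)
    cov = K_ss - K_star @ np.linalg.solve(K, K_star.T)
    # Posterior realizations via a jittered Cholesky factor (steps 9 through 12)
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(len(x_pred)))
    rng = np.random.default_rng(seed)
    draws = mean[:, None] + L @ rng.standard_normal((len(x_pred), n_draws))
    return mean, cov, draws
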

References

1. Oberkampf, W. L., and Roy, C. J., 2010, Verification and Validation in Scientific Computing, Cambridge University Press, New York.
2. Oberkampf, W. L., and Smith, B. L., 2017, "Assessment Criteria for Computational Fluid Dynamics Model Validation Experiments," ASME J. Verif., Valid., Uncert. Quantif., 2(3), p. 031002. 10.1115/1.4037887
3. Montgomery, D. C., 2017, Design and Analysis of Experiments, 9th ed., Wiley, Hoboken, NJ.
4. ASME, 2005, "Test Uncertainty," ASME, New York, Standard No. ASME PTC 19.1-2005.
5. ISO, 1995, ISO Guide to the Expression of Uncertainty in Measurement, ISO, Geneva, Switzerland.
6. Ferson, S., Oberkampf, W. L., and Ginzburg, L., 2008, "Model Validation and Predictive Capability for the Thermal Challenge Problem," Comput. Methods Appl. Mech. Eng., 197(29–32), pp. 2408–2430. 10.1016/j.cma.2007.07.030
7. Voyles, I. T., and Roy, C. J., 2014, "Evaluation of Model Validation Techniques in the Presence of Uncertainty," AIAA Paper No. 2014-0120. 10.2514/6.2014-0120
8. Voyles, I. T., and Roy, C. J., 2015, "Evaluation of Model Validation Techniques in the Presence of Aleatory and Epistemic Input Uncertainties," AIAA Paper No. 2015-1374. 10.2514/6.2015-1374
9. ASME, 2009, Standard for Verification and Validation in Computational Fluid Dynamics and Heat Transfer, American Society of Mechanical Engineers, New York, ASME Standard No. V&V 20-2009.
10. Kennedy, M. C., and O'Hagan, A., 2001, "Bayesian Calibration of Computer Models," J. R. Stat. Soc. Ser. B (Stat. Methodol.), 63(3), pp. 425–464. 10.1111/1467-9868.00294
11. Stripling, H. F., Adams, M. L., McClarren, R. G., and Mallick, B. K., 2011, "The Method of Manufactured Universes for Validating Uncertainty Quantification Methods," Reliab. Eng. Syst. Saf., 96(9), pp. 1242–1256. 10.1016/j.ress.2010.11.012
12. Roy, C. J., and Oberkampf, W. L., 2011, "A Comprehensive Framework for Verification, Validation, and Uncertainty Quantification in Scientific Computing," Comput. Methods Appl. Mech. Eng., 200(25–28), pp. 2131–2144. 10.1016/j.cma.2011.03.016
13. Roy, C. J., and Balch, M. S., 2012, "A Holistic Approach to Uncertainty Quantification With Application to Supersonic Nozzle Thrust," Int. J. Uncertainty Quantif., 2(4), pp. 363–381. 10.1615/Int.J.UncertaintyQuantification.2012003562
14. Roache, P. J., 2017, "Interpretation of Validation Results Following ASME V&V20-2009," ASME J. Verif., Valid., Uncert. Quantif., 2(2), p. 024501. 10.1115/1.4037706
15. Coleman, H. W., and Steele, W. G., 2009, Experimentation, Validation, and Uncertainty Analysis for Engineers, 3rd ed., Wiley and Sons, New York.
16. Rasmussen, C. E., and Williams, C. K. I., 2006, Gaussian Processes for Machine Learning, The MIT Press, Cambridge, MA, pp. 14–16.
17. Buning, P. G., Gomez, R. J., and Scallion, W. I., 2004, "CFD Approaches for Simulation of Wing-Body Stage Separation," AIAA Paper No. 2004-4838. 10.2514/6.2004-4838
18. Morrison, J. H., 1998, "Numerical Study of Turbulence Model Predictions for the MD 30P/30N and NHLP-2D Three-Element Highlift Configurations," NASA, Langley, VA, Report No. NASA/CR-1998-208967, NAS 1.26:208967.
19. Anderson, J. D., 2011, Fundamentals of Aerodynamics, 5th ed., McGraw-Hill, New York.