Abstract

There is an urgent need for developing collaborative process-defect modeling in metal-based additive manufacturing (AM). This mainly stems from the high volume of training data needed to develop reliable machine learning models for in-situ anomaly detection. The requirements for large data are especially challenging for small-to-medium manufacturers (SMMs), for whom collecting copious amounts of data is usually cost prohibitive. However, one major obstacle to collaborative modeling is the privacy concern that arises from data sharing, since AM process data contain confidential design information, such as the printing path. The objective of this research is to develop a secure data sharing mechanism for directed energy deposition (DED) based AM that does not disclose product design information, facilitating secure data aggregation for collaborative modeling. The proposed adaptive design de-identification for additive manufacturing (ADDAM) methodology integrates AM process knowledge into an adaptive de-identification procedure to mask the printing trajectory information in metal-based AM thermal history, which otherwise discloses substantial printing path information. This adaptive approach applies a flexible data privacy level to each thermal image based on its similarity with the other images, facilitating better data utility preservation while protecting data privacy. A real-world case study, based on the fabrication of two cylindrical parts using a DED process, was used to validate the proposed method. The results are expressed as a Pareto optimal solution, demonstrating significant improvements in privacy gain with minimal utility loss. The proposed method can facilitate privacy improvements of up to 30% with as little as 0% loss in dataset utility after de-identification.

1 Introduction

One of the biggest limitations in the broader adoption of directed energy deposition (DED) based additive manufacturing (AM) techniques is in-situ defect detection for part certification. It is crucial for users to detect process anomalies in an effective and timely manner, since the offline counterpart methods have proven costly and time-consuming [14]. Machine learning (ML) and artificial intelligence have played a crucial role in the development of in-situ anomaly detection models for AM [47]. However, due to the high part complexity and the highly variable part designs and printing parameters, building a robust machine learning model for in-situ process monitoring requires large amounts of training data, which can be prohibitively expensive [4,5,8]. Recently, the AM research community has identified these obstacles as a serious roadblock to the accelerated adoption of AM, especially for small-to-medium sized manufacturers (SMMs) [46].

One potential solution is to facilitate data sharing through the direct aggregation of process data from multiple AM users [9,10]. The idea of data sharing has been proposed as an important tool to expand AM technologies [7], and several publications also see it as a remedy to the limited data availability plaguing SMMs [5,6]. The aggregated training data can then be leveraged to develop a more accurate, robust, and generalizable machine learning model for anomaly detection. Furthermore, these models would require less training data from each user than traditional independent machine learning models [5,11]. This is especially helpful for SMMs, as it decreases the amount of data required from each user and tackles one of the discussed challenges for integrating machine learning with AM [46].

Unfortunately, the major obstacle in aggregating process data from multiple AM users is the data privacy concerns that arise from sharing process data outside the user's organization. This key drawback is a highly discussed limitation and forms one of the major gaps in the development and implementation of AM data sharing frameworks [5,7]. In AM, the process data contain critical product design information, which heavily involves the intellectual property (IP) of the individual user. Sharing these data outside the user's organization can potentially expose AM users to the risk of IP theft. This could occur when a malicious third party gains access to the shared data and reverse engineers the AM design specifications, utilizing the printing path and other parameters derived from the AM process data. What is worse, AM is typically used in new product prototyping due to its toolless and flexible fabrication for accelerated design iterations. Therefore, the risk of IP theft in AM process data can be even more detrimental to AM practitioners, especially SMM users.

This paper proposes an adaptive design de-identification for additive manufacturing (ADDAM) methodology for masking the design information contained in AM thermal process data, while simultaneously retaining the quality-related information for anomaly detection. This methodology will allow for the secure sharing of AM process data among multiple users, which establishes the foundation for data aggregation and transfer learning modeling. This will facilitate the development of collaborative privacy-preserving anomaly detection models with improved IP security and model robustness (Fig. 1). The technical contributions of this paper include: (1) the development of a process data privacy and design de-identification framework for AM applications; and (2) the development of the new ADDAM algorithm with measurable privacy and utility for AM process data.

The remainder of the paper is organized as follows. The state-of-the-art studies are summarized in Sec. 2. Section 3 discusses the data privacy problem and de-identification methods for AM applications. In Sec. 4, the proposed ADDAM methodology is introduced, and Sec. 5 introduces the case study to evaluate the effectiveness of the proposed method. Finally, the conclusion and future work are summarized in Sec. 6.

2 Related Research

This section provides a survey of research related to the proposed method, which includes (1) collaborative defect detection for metal-based AM; (2) AM process security and privacy concerns; and (3) a brief survey of the currently used anonymization techniques and their corresponding limitations.

2.1 Collaborative Defect Detection for Metal-Based Additive Manufacturing.

This section focuses on the relevance of collaborative smart manufacturing in metal-based AM processes. Various in-situ process monitoring and defect detection methods have been proposed for identifying anomalies [12]. Among those methods, thermal imaging has been adopted to capture the AM thermal history under the premise that a stable thermal history will result in homogeneous and thus defect-free structures. The high-dimensional thermal history data are reduced to extract key process features that are then leveraged for anomaly detection [13–16]. Moreover, layer-wise anomaly detection methods using thermal process data have been proposed for DED processes [3,16–18], which provide an additional advantage compared to the defect detection models that only use local thermal features. However, the key limitation of this previous work is that these models were only evaluated using one set of design and printing parameters at a time. Changes in the process parameters can lead to deteriorated model accuracy, and the models would need to be re-trained and re-validated with newly collected data. This makes it potentially infeasible to develop accurate anomaly detection models for SMMs, who may print small batches of highly diverse parts [8,10]. Transfer learning techniques can be leveraged to address the modeling limitations related to limited data availability. Transfer learning provides the user with the ability to apply learned knowledge or data from one domain to another related domain [19]. This would allow the knowledge contained within multiple datasets to be leveraged in machine learning models, instead of completely discarding and re-collecting data to accommodate the change of AM process parameters. This can further the development of a collaborative data sharing framework. Currently, transfer learning has been proposed for transferring knowledge between different machines [10] and materials [20] for anomaly detection and distortion quantification [9,21].

However, there are significant data privacy risks that may arise from sharing AM process data among different AM users. The AM process data contain confidential product information (e.g., design specifications and mechanical properties) that may jeopardize the product IP. By sharing AM process data outside their organization, AM users compromise their data privacy and are exposed to the risk of IP theft [22–24]. This is especially detrimental when using AM in the early phase of product prototyping and development. Lack of IP protection may lead to tremendous loss for the enterprise [25,26]. Therefore, there is an urgent need to establish a privacy-preserving data sharing framework that facilitates data sharing among multiple AM users for collaborative process-defect modeling, while not disclosing confidential product design information.

2.2 Privacy and Security Concerns in Additive Manufacturing Systems.

In the new era of Industry 4.0, manufacturing systems are becoming more interconnected [27]. As AM systems have become increasingly prominent within industrial manufacturing applications, privacy and security have become significant issues that can affect a variety of different aspects of the AM process [24]. Traditionally, there are three fundamental concepts related to data security: confidentiality, integrity, and availability. This triad of security concerns encompasses vulnerabilities in manufacturing, including the overall data confidentiality, data reliability and consistency, and availability of equipment for service [26]. Most current data security and privacy concerns focus on preventing cyber-physical attacks that target data integrity and availability, which can diminish the availability of the equipment or integrity of the printed parts and collected data [22,24,26,28–30]. However, this leaves a significant gap for preventing cyber-domain attacks, which target the product IP of the users [24,25].

The main threats to AM IP protection are the attacks on data confidentiality. This type of attack is commonly conducted by gaining malicious access to process data or related datasets and extracting key details to identify some confidential information [26]. This attack can be directly leveraged with AM process data to retrieve the product printing path information, and then reverse engineer the printed part design specifications [22,24] (Fig. 2). These attacks can be costly and detrimental to the AM users, as they directly attack the user's IP [26]. There are four specific tactics leveraged to preserve data privacy and prevent confidentiality attacks: anonymization, access control, encryption, and querying systems [22]. Of these techniques, the most viable options for enhancing data security and facilitating transfer learning are anonymization and data encryption.

The objective of anonymization is to remove or obscure the confidential information contained within the dataset, reducing the availability of specific, identifying characteristics within the dataset [22,31]. The privacy is enhanced by either suppressing or generalizing identifying features that can be used to collect sensitive information contained within the data. However, the biggest limitations facing anonymization revolve around ensuring that the data protection is strong enough to withstand re-identification attacks [22]. On the other hand, encryption is also a strong data security technique, which encodes the data so that they appear to be random, irrelevant data that are hard to understand without the proper encryption keys [32]. Despite the proven data protections, there are still reservations surrounding the overall usability of the post-encryption data [22]. Specific forms of encryption, such as homomorphic encryption, are designed to allow computations to be performed once the data are encrypted [32], but the computational complexity is limited to only simple models [22,33]. In addition, for both privacy measures, as the extent of the data protection increases, the overall usability of the protected data decreases [22,31]. This means that achieving higher levels of data privacy traditionally leads to greater losses in data usability. Anonymization and encryption provide specific advantages to data privacy protection, but still face major challenges when balancing data privacy with data usability. Due to the additional computational restrictions associated with encryption, anonymization provides a potentially more effective framework for incorporating data privacy measures into collaborative, data-sharing AM applications.

2.3 The k-Anonymization Method and Its Applications.

This section details various anonymization methods, including the k-anonymization and k-same family of methodologies, as well as other de-identification models, which form the foundation for the proposed ADDAM methodology. Moreover, the major limitations of these methods, when applying to AM design de-identification, are summarized.

2.3.1 Traditional k-Anonymization and Adaptations.

k-anonymization is a specific form of de-identification for data privacy proposed in Ref. [34], and is an effective solution for guaranteeing data privacy while still preserving some data usability. This method was originally designed for protecting individual sample identities and was primarily implemented for tabular dataset applications. This includes data privacy protection for customer data [35], healthcare data [34], and public transportation data [36], as well as various other applications where privacy for the sample identities is required. Tabular-structured datasets are defined as datasets that are minimally complex and contain independent (or weakly correlated) features, such as a person's name, zip code, social security number, health condition, and others. These types of datasets provide an ideal application of k-anonymization, where the identity-compromising attributes are either generalized or suppressed to the point where there are at least k − 1 identical samples for each sample in the dataset [34,37]. However, for more complex applications, additional modifications are needed to improve the applicability of k-anonymization. For example, the Mondrian multidimensional k-anonymization algorithm was formulated as an improved privacy-enhancing method over the traditional methodology [38]. The Mondrian method goes one step further by incorporating multidimensional partitioning into the anonymization procedure. This partitioning is used to achieve a more robust anonymization, as it factors in the relationship between different features during the generalization process [38]. Furthermore, clustering [39,40] and p-sensitive anonymization algorithms [37] have also been proposed as other improvements to the traditional k-anonymization method. These updated methodologies still leverage the key generalization and suppression techniques used to ensure data privacy, but provide additional approaches to enhancing the process [23].

For all cases of k-anonymization, data protection techniques are applied only to the identifying features, instead of applying anonymization to all features in the dataset. This helps to ensure the user-defined level of data privacy, while maintaining the usability of the non-identifying attributes.
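
The generalization and suppression steps described above can be illustrated with a minimal Python sketch. It is not taken from Ref. [34]; the toy records, the ten-year age ranges, and the helper names (`generalize_age`, `generalize_zip`, `is_k_anonymous`) are assumptions chosen only for illustration. The check confirms that every combination of identifying attributes occurs at least k times, while the non-identifying attribute is left untouched.

```python
from collections import Counter

def generalize_age(age, width=10):
    """Generalize a numeric age into a range label such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def generalize_zip(zip_code, keep=3):
    """Suppress trailing zip code digits, e.g. '47906' -> '479**'."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)

def is_k_anonymous(records, identifying, k):
    """True if every combination of identifying values occurs at least k times."""
    groups = Counter(tuple(r[q] for q in identifying) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical toy table; 'condition' is the non-identifying attribute.
raw = [
    {"age": 31, "zip": "47906", "condition": "flu"},
    {"age": 36, "zip": "47901", "condition": "asthma"},
    {"age": 38, "zip": "47905", "condition": "flu"},
    {"age": 52, "zip": "47802", "condition": "diabetes"},
    {"age": 57, "zip": "47803", "condition": "flu"},
    {"age": 55, "zip": "47809", "condition": "asthma"},
]

# Only the identifying attributes are altered before release.
released = [
    {
        "age": generalize_age(r["age"]),
        "zip": generalize_zip(r["zip"]),
        "condition": r["condition"],
    }
    for r in raw
]

print(is_k_anonymous(raw, ["age", "zip"], k=2))       # raw records are all unique
print(is_k_anonymous(released, ["age", "zip"], k=3))  # two groups of three records
```

Note that the released ages are range labels rather than numbers, which is the loss of numerical interpretability that generalization can introduce.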

However, k-anonymization methods face a few critical limitations. First, the de-identification approach is primarily applicable to tabular-structured datasets. Traditional k-anonymization and its variants (Mondrian [38], clustering [39,40], and p-sensitive [35,37]) do not translate well to more complex data, such as image data or other multidimensional datasets. These datasets contain features that are highly correlated and highly nonlinear, which poses a new challenge for k-anonymization. Second, k-anonymization and most of its variants and enhancements cannot guarantee that there will be no data leakage [37]. These methods can provide enhanced data privacy, but do not provide complete protection unless the dataset usability is extremely compromised. Finally, the proposed anonymization tactics of generalization and suppression are specific to the dataset application and can severely impact the interpretation of numerical attributes [34,37]. This is primarily attributed to the generalization tactic, which in many cases converts a numerical attribute into a categorical variable (e.g., a person's numeric age into a categorical age range). This impacts the overall usability of the dataset and may affect downstream applications. Because of the abovementioned limitations, several novel approaches to extending k-anonymization to the privacy preservation of more complex data structures have been proposed, as discussed in the next section.

2.3.2 k-Anonymization for Image Data.

More recently, image data have become increasingly available, especially through the widespread implementation of security and surveillance monitoring systems. This has caused a drastic increase in the need for protecting individuals’ privacy and identity [41]. Traditional naive methods, such as blurring and pixelation, can mask the key identity information in images. However, they only serve the purpose of eliminating the identity of individuals within the images, and thus retain little to no data utility [42]. Despite the alterations to the images, some of these methods only deter human recognition, as computer algorithms can be leveraged to reverse the distortions and re-identify those individuals [43]. To improve data privacy, several different facial de-identification algorithms have been developed [41–50]. These approaches provide stronger protection guarantees and better overall data usability in de-identified images, drawing inspiration from the previous work on k-anonymization [34].

From the different approaches to facial de-identification, there are a few methods that provide robust de-identification capacities, which show potential for applications extending beyond facial image data. First, the k-same approach takes the average of k-similar images within a subset of facial images, and replaces the subset with an averaged, surrogate image [41]. This method is the most naive scheme and extends the k-anonymization technique to complex image data, where these datasets can reach the same level of privacy as the k-anonymization algorithm (see Ref. [26] for proof). However, there are two main limitations of this methodology. The first is that the k-same method does not provide a satisfactory level of data utility [42]. This is because the image space is highly nonlinear and there is a steep utility loss (UL) when replacing the entire group of images with one single surrogate image. In addition, there is the threat of re-identification, since all the anonymization is performed using the original image dataset, meaning that some original information is contained within the published data [44]. From the k-same methodology, the k-same-select model was derived to improve the utility performance by providing prior knowledge about the dataset into the de-identification process, which further enhances the utility preservation [42]. Furthermore, the k-same-model (k-same-M) approach also extends the k-same method to implement de-identification within the active appearance models (AAM) [46], which are widely used in modeling and tracking facial image data. This produces a higher quality image, but there are still challenges in capturing key utility features, such as facial expressions during the anonymization process [49]. In summary, despite these enhancements, there were still significant gaps in applying data privacy to facial images to achieve a trade-off between privacy and utility.
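
The k-same idea described above can be sketched in a few lines of Python. This is a simplified greedy variant written for illustration, not the exact algorithm of Ref. [41]: the Euclidean pixel distance, the greedy seed choice, and the function name `k_same` are assumptions. Each group of k similar images is replaced by its pixel-wise mean, so every published surrogate is shared by at least k originals.

```python
import numpy as np

def k_same(images, k):
    """Greedy k-same sketch: replace each group of k similar images by its mean.

    images: array of shape (n, r, c). Returns surrogates of the same shape.
    If fewer than k images are supplied, they all form one (smaller) group.
    """
    images = np.asarray(images, dtype=float)
    n = images.shape[0]
    flat = images.reshape(n, -1)
    surrogates = np.empty_like(images)
    remaining = list(range(n))
    while remaining:
        # When fewer than 2k images remain, group them all so that no
        # leftover group ends up smaller than k.
        size = len(remaining) if len(remaining) < 2 * k else k
        seed = remaining[0]
        # Assumed similarity measure: Euclidean distance in pixel space.
        dists = np.linalg.norm(flat[remaining] - flat[seed], axis=1)
        group = [remaining[i] for i in np.argsort(dists)[:size]]
        # Every member of the group receives the same averaged surrogate.
        surrogates[group] = images[group].mean(axis=0)
        remaining = [i for i in remaining if i not in group]
    return surrogates

rng = np.random.default_rng(0)
faces = rng.random((7, 4, 4))   # hypothetical stand-in for an image set
published = k_same(faces, k=3)
```

Because a whole group collapses onto one averaged image, any within-group variation is lost, which is the source of the steep utility loss noted above.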

To address the limitations of the k-same methods, the GARP-Face and attribute preserved face de-identification (APFD) anonymization algorithms were developed for de-identifying facial images to achieve better balance between privacy and utility. Instead of replacing image groups with a surrogate image, both methods define the facial features, construct nearest neighborhoods, and use a separate utility specific subset of images to perform the anonymization. The GARP-Face (Gender-Age-Race) model [44] identifies useful features to preserve information (e.g., gender, age, and race) and develops classifiers to identify these features from the sample images. These features are then leveraged to identify k-similar images, which are then combined in the de-identification process to produce a surrogate image. The APFD method [45] follows a similar approach but leverages an additional optimization function that determines the optimal weights to be applied when averaging images. This weighted objective function is directly applied to the shape and appearance parameters, maximizing the number of common attributes the original and de-identified image share. Furthermore, both techniques also implement AAM to identify and characterize the shape and appearance parameters of the face. Overall, the results from this improved feature-targeting and preservation process show improved privacy and data utility preservation.

It is worth noting that these different facial de-identification methods apply the same level of data privacy to each image in the de-identified dataset, making them global de-identification approaches. However, the global de-identification approach is difficult to apply directly to AM thermal images for the following reasons. First, unlike the facial de-identification datasets, the AM thermal images suffer from limited data availability and a tendency to have repeating identities within the dataset. This can lead to compromised performance when directly applying a global de-identification model, as many of the nearest neighbor images may share the same identities, and the limited number of samples can degrade the overall dataset diversity. Second, the AM process data anomalies demonstrate high variations in their distributions, meaning that they are distinctly different from both the healthy distribution and each other. Facial image data do not encounter this problem, as most human faces share a similar distribution of features. This creates another roadblock to directly implementing global de-identification methods, since directly averaging k-nearest neighbors will blur the difference between healthy and abnormal melt pool images, leading to dramatically degraded data usability (i.e., anomaly detection performance).

3 De-identification and Data Privacy for Additive Manufacturing

This section will introduce the various types of AM data, as well as the confidentiality and the vulnerability in these data. In addition, the role of data privacy in AM and the importance of maintaining the balance between data utility and privacy are explained. The formal definitions related to data privacy for AM applications set the foundation for the proposed ADDAM algorithm.

3.1 Additive Manufacturing Data Description.

As described in Fig. 3, various types of AM data are generated in the four major steps of AM, i.e., design, slicing, manufacturing, and inspection. Together, these steps construct the cyber-physical AM systems [5].

The design phase includes the generation of the computer aided design (CAD) and standard triangle language (STL) files, which represent the detailed, three-dimensional part design. This information is highly confidential, especially for rapid prototyping applications. Because of this, the data generated during this phase (CAD and STL design files) should be maintained internally, and never shared for IP protection purposes.

The slicing phase takes the design file as the input and generates a g-code file, which contains several different process parameters, including the printing path, print speed, layer thickness, temperature settings, and many others. Like the design files, these process parameters also contain confidential design attributes, and should never be shared externally.

The manufacturing phase involves the physical printing process while generating a variety of process data, including thermal imaging data, acceleration, acoustics, and others. Recently, the process data play critical roles in in-situ process monitoring and anomaly detection. However, the process data contain confidential design information, particularly relating to the printing geometries and parameters. These embedded features can be extracted and linked back to the part design, compromising the product IP. Therefore, the implementation of data privacy measures is particularly important at this phase because the collected process data are expected to be externally shared and aggregated.

Finally, the inspection phase is where the final printed part is evaluated for quality assurance. This includes checking the geometric dimensioning and tolerancing (GD&T) features of the part, as well as detecting defects within the printed part. Although this process also creates vulnerabilities for IP theft, most data collected during this phase will be stored internally and only accessed locally. The data from this phase that are shared externally for developing in-situ defect modeling (anomaly labels) usually do not contain confidential design information.

3.2 Key Definitions in Additive Manufacturing Privacy.

In this section, several important definitions of AM process data de-identification for process-defect modeling are introduced by integrating AM process knowledge into data privacy and anonymization related terminologies.

Definition I

AM data privacy is defined as the ability of the shared AM data to prevent a malicious third party from identifying critical product design specifications. For example, for metal AM thermal process data, specific privacy measures need to be applied directly to the melt pool images to properly de-identify/mask the printing trajectory information (Fig. 4). This creates a safeguard for protecting against IP thefts through the AM process data.

When applying the de-identification framework, the metal AM data discussed in Sec. 3.1 can be briefly categorized into three groups of attributes [23,34,37,38], as summarized in Table 1.

  1. AM sensitive attributes are attributes that can directly identify the design information contained within the dataset. This includes design data (e.g., CAD files), attributes derived from the design data (e.g., g-codes and printing angular information), and the complete thermal history, all of which pose a significant IP privacy risk. In particular, AM design features are embedded within the complete thermal history and can be directly extracted from the thermal process images themselves (as illustrated in Fig. 4). This creates a major vulnerability for the product IP when sharing the data externally, where malicious third parties could gain access to the complete thermal image set and extract these critical design features. Thus, it is important that AM sensitive attributes are kept locally, or that any relationship between the shared data and the corresponding sensitive attributes is de-identified.

  2. AM quasi-identifiers are attributes that, alone, do not directly give away the product design information. However, when used in conjunction with other AM quasi-identifiers or sensitive attributes, they can be leveraged to further identify confidential design features. For example, within the thermal process data, each melt pool image alone (or each pixel within the image) does not directly give away confidential design information. However, when a large enough set of thermal images is available, it can be directly used to re-identify the sensitive AM design features. Furthermore, features such as the layer-wise location of the melt pool and the sequential image ID can be used to enhance the identification of compromising trends and information within the process data. Ultimately, the AM quasi-identifiers' relationship with the sensitive attributes should be removed or de-identified for secure data sharing.

  3. AM insensitive attributes are attributes that do not have any direct relationship with the design information. This includes the AM utility features, which represent the geometric and thermal features within the melt pools (e.g., melt pool area and eccentricity, and maximum temperature). Unlike the AM design features, these utility features are insensitive to design information, but informative for utility preservation (e.g., anomaly detection). Overall, they do not pose a security risk and are able to be leveraged for de-identification, or externally shared if desired.
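
As an illustration of the utility features named in item 3, the sketch below computes the melt pool area, eccentricity, and maximum temperature from a single thermal image. It is only a sketch under stated assumptions: the temperature `threshold` that separates the melt pool from the background, the moment-based eccentricity estimate, and the function name `melt_pool_features` are hypothetical and are not specified in this paper.

```python
import numpy as np

def melt_pool_features(image, threshold):
    """Return area (pixel count), eccentricity, and maximum temperature.

    threshold is an assumed cutoff separating melt pool pixels from background.
    """
    image = np.asarray(image, dtype=float)
    mask = image >= threshold
    area = int(mask.sum())
    max_temp = float(image.max())
    if area < 2:
        # Too few pixels to estimate a shape.
        return {"area": area, "eccentricity": 0.0, "max_temperature": max_temp}
    rows, cols = np.nonzero(mask)
    # Second-order moments of the melt pool pixel coordinates.
    cov = np.cov(np.vstack([rows, cols]))
    eigvals = np.sort(np.linalg.eigvalsh(cov))  # ascending: minor, major
    minor, major = max(float(eigvals[0]), 0.0), float(eigvals[1])
    # Eccentricity of the ellipse with the same second moments.
    ecc = float(np.sqrt(1.0 - minor / major)) if major > 0 else 0.0
    return {"area": area, "eccentricity": ecc, "max_temperature": max_temp}

# Hypothetical 10 x 10 frame with an elongated hot region.
frame = np.full((10, 10), 300.0)
frame[4:6, 2:8] = 2000.0
features = melt_pool_features(frame, threshold=1500.0)
```

None of these three quantities depends on where in the part the image was taken, which is why such features are treated as insensitive to the design information.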

Definition II

AM data utility is defined as the overall usability of the dataset for a specific modeling purpose (e.g., anomaly detection) after applying privacy-preserving measures [51]. For the AM process data de-identification, this means that sufficient information is retained in the de-identified data for the end-user to train defect detection models. This is measured by the ability of a machine learning model to accurately detect the presence of anomalies within the de-identified data.

4 The Proposed ADDAM Methodology

In this section, the ADDAM methodology is proposed for de-identifying design information from AM melt pool image data. This new methodology focuses on developing a secure aggregation mechanism for collaborative process-defect modeling by masking the design information in the thermal history while retaining the process quality information. This section starts with an overview of the proposed ADDAM methodology (Fig. 5), followed by a subsequent breakdown for each of the main stages of the proposed method.

4.1 Proposed ADDAM Overview.

The major advantage of the ADDAM algorithm is the introduction of the novel adaptive mechanism to determine the level of data privacy on a per-image basis. This deviates from the traditional forms of k-anonymization, which take a global approach to data privacy, de-identifying each image with the same, globally determined level of data privacy. The proposed adaptive approach is motivated by the following two reasons.

First, the AM process data tend to be imbalanced and suffer from limited data availability, where there are vastly more cases of healthy melt pools as compared to abnormalities. This creates two major challenges. To start, there are potentially a limited number of unique angular identities available to de-identify. This means that de-identifying a sample image with its k-closest images may not necessarily improve data privacy if its nearest neighbors contain the same angular identity. In addition, due to the rare and diverse nature of anomalies, the k-closest images of an abnormal image may include either healthy images or abnormal images with different abnormality categories, leading to reduced distinction between healthy and unhealthy melt pool images after de-identification. This will significantly jeopardize the data utility (i.e., anomaly detection). Second, during the printing process there is a noticeable thermal distribution change over time in the thermal history. As a result, the baseline for healthy melt pools observed at different layers would vary significantly, even though their process parameters are set the same. Implementing a global k value completely neglects this drifting trend in the thermal distribution and will lead to de-identification using images that are not actually neighbors in the printing process.

A reference or gallery set of $s$ thermal images, with each image containing $r \times c$ pixels, can be denoted as $R := \{R_i \in \mathbb{R}^{r \times c},\ i = 1, \ldots, s\}$. The proposed ADDAM methodology defines a transformation function $f: \mathbb{R}^{r \times c} \to \mathbb{R}^{r \times c}$, which generates a surrogate thermal image for each observed thermal image $I_j \in \mathbb{R}^{r \times c}$, as illustrated in Eq. (1).
$$\tilde{I}_j = f(I_j) \tag{1}$$
where $\tilde{I}_j \in \mathbb{R}^{r \times c}$ denotes the surrogate image for $I_j$ with its angular identity $\varphi(I_j)$ de-identified, and $\varphi(\cdot)$ denotes the instantaneous printing orientation of the thermal image. The transformation function $f$ is implemented by pooling the observed thermal image $I_j$ with a selective subset of $k_j - 1$ thermal images from the reference set $R$, denoted as $R_j \subset R$ with $|R_j| = k_j - 1$, where $|\cdot|$ denotes the cardinality of the image set.

The de-identification function, f, aims to improve data privacy by masking the design information (i.e., printing path information) from each image Ij, while simultaneously retaining data utility for anomaly detection and part certification. The proposed ADDAM methodology can be divided into several stages, which are discussed in the following subsections.

4.2 Stage 1: Reference Set Selection.

In real-world applications, AM users have the ability to use their historical data, or data available from machine calibrations, to create a diverse and robust reference set R for de-identification. There are some key requirements to keep in mind when developing this independent reference set. First, the reference set should have a high diversity of angular orientations. This is important as it will better facilitate proper de-identification, as more unique identities can lead to more variability in the de-identified images with respect to the angular identity. Second, the reference data need to share a similar domain distribution with the data to be de-identified. This is important for the similarity space construction and the preservation of the data utility, as the geometric and thermal features derived from each distribution are indicative of the overall characteristics of the distribution. If these features differ too much, it will drastically impact the adaptive procedure of the algorithm and lead to utility and/or privacy degradation. Finally, the reference set should not include any samples that are also within the set of images to be de-identified. This would degrade the privacy gain (PG), as these duplicate reference images would be guaranteed to be included in the adaptive-k samples used to de-identify the original image.

After selection of the reference images, the overall reference set quality can be evaluated in a couple of ways. The first is to evaluate the overall difference between the derived thermal and geometric features of the reference set and the de-identification set. These features play an important role in de-identification, and if their distribution in the reference set differs too much from the de-identification set, it will impact the overall algorithm performance. Second, the two domain distributions could be quantitatively evaluated using a distance metric, such as maximum mean discrepancy (MMD) or Kullback–Leibler divergence. This allows a user to quantify the distance between two distributions with metrics that are commonly used in transfer learning and domain adaptation applications [52,53].
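As an illustration of the second check, the following is a minimal sketch of a kernel-based MMD estimate between two feature sets. The Gaussian kernel, the bandwidth parameter `gamma`, the biased estimator, and the synthetic feature arrays are all assumptions made for illustration; none of them is specified in this work.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2(x, y, gamma=1.0):
    """Biased estimate of the squared MMD between two samples (rows = images)."""
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

# Synthetic stand-ins for derived feature vectors (assumption, not real data).
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(200, 6))
similar = rng.normal(0.0, 1.0, size=(200, 6))    # same distribution as the reference
shifted = rng.normal(2.0, 1.0, size=(200, 6))    # clearly different distribution

mmd_similar = mmd2(reference, similar, gamma=0.1)
mmd_shifted = mmd2(reference, shifted, gamma=0.1)
print(mmd_similar, mmd_shifted)
```

A reference set drawn from the same distribution as the data to be de-identified should give a score close to zero, while a mismatched reference set gives a visibly larger one.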

4.3 Stage 2: Process Data Dimension Reduction.

To reduce the dimensionality of the thermal images, the reference set, R, is used to fit vectorized principal component analysis (vPCA) for low-dimensional process feature extraction. The vPCA achieves dimensionality reduction by mapping the original melt pool images into a low-dimensional space, where each sample image, Ij, is then transformed into this space, as illustrated in Eq. (2).
$$v_j = W_p^{\top}\,\mathrm{vec}(I_j) \tag{2}$$
where vec(·) denotes the vectorization of an image, Wp represents the projection matrix estimated from the reference image set, R, and p denotes the percentage of the total variability explained by the extracted PCs, denoted as vj. In most cases, the value of p is set as 95% such that the major variability in the original melt pool image Ij can be retained in vj.
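The dimension reduction step can be sketched as follows. This is a minimal illustration that assumes vPCA amounts to ordinary PCA on flattened images with mean centering; the function names `fit_vpca`, `project`, and `reconstruct`, the centering step, and the synthetic reference images are assumptions, not details taken from this work.

```python
import numpy as np

def fit_vpca(reference_images, p=0.95):
    """Fit PCA on vectorized reference images; keep the fewest PCs explaining >= p of the variance."""
    x = reference_images.reshape(len(reference_images), -1).astype(float)
    mean = x.mean(axis=0)
    # Rows of vt are principal directions; squared singular values are proportional to variance.
    _, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    n_pc = min(int(np.searchsorted(explained, p)) + 1, len(s))
    return mean, vt[:n_pc].T            # projection matrix of shape (r*c, n_pc)

def project(image, mean, w_p):
    """Map one r x c image to its low-dimensional PC vector."""
    return (image.reshape(-1) - mean) @ w_p

def reconstruct(v, mean, w_p, shape):
    """Map a PC vector back to the original image space."""
    return (w_p @ v + mean).reshape(shape)

# Synthetic low-rank "thermal images" (assumption, for demonstration only).
rng = np.random.default_rng(0)
patterns = rng.normal(size=(3, 8, 8))
coeffs = rng.normal(size=(50, 3))
reference = np.tensordot(coeffs, patterns, axes=1) + 0.01 * rng.normal(size=(50, 8, 8))

mean, w_p = fit_vpca(reference, p=0.95)
v_j = project(reference[0], mean, w_p)
i_hat = reconstruct(v_j, mean, w_p, reference[0].shape)
print(w_p.shape, v_j.shape)
```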

4.4 Stage 3: Additive Manufacturing Utility Attribute Space Construction.

The utility attribute space (UAS) incorporates derived features to construct a vector space to evaluate the utility-aware similarity of sample images to images in the reference image set. The features used to construct this space include both the geometric features and the other insensitive, utility related features. These derived features can be directly indicative of the overall health status of the melt pool and play an important role in preserving the dataset utility and achieving adaptive de-identification. However, it is important to note that these features underperform compared to the features extracted using vPCA for anomaly detection. For this reason, these features are not leveraged during classification. The UAS is leveraged to identify the abnormal and healthy melt pool images, based on how similar they are to their neighbors. This improves data privacy as it ensures that healthy melt pools, which tend to have a high number of neighbors, achieve a higher level of data privacy. Since healthy melt pools tend to make up the majority of data samples, this ensures better data set privacy. In addition, the UAS allows for abnormal melt pools to maintain a minimum level of de-identification, which in turn maintains dataset usability. This is due to the characteristic fact that the abnormal melt pools are dissimilar from healthy melt pools and each other, allowing these samples to maintain their distinct characteristics by using a lower adaptive k value. It is important to note that this will not compromise the overall dataset privacy, as with AM data, not every image has to be de-identified to ensure data privacy. The main risks are exposed when a large set of images are available and can be used together, and the minimally de-identified anomalous images only make up a small subset of the data.

Multiple AM utility attributes are proposed to form the UAS. The first attribute is the L2 norm of the reconstruction error denoted as gj1, which can be calculated in Eq. (3) for each Ij
$$g_{j1} = \lVert I_j - \hat{I}_j \rVert_2 \tag{3}$$
where Îj ∈ ℝr×c denotes the image reconstructed from vj. This feature is important because the vPCA model is fit using healthy reference images, so melt pools that contain anomalies yield a larger L2 reconstruction error. Moreover, a few additional utility features can be extracted from each original melt pool image Ij, including peak temperature and its row and column location in the field of view, as well as the area and eccentricity of the melt pool, which is segmented using the melting point of the feedstock material. These abovementioned features of Ij are denoted as gjw (w = 1, 2, …, 6). The six-dimensional feature vector is denoted as gj = (gj1, gj2, …, gj6), which forms the UAS to determine the similarity of each melt pool image Ij against the reference images.
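The six utility attributes can be computed along the following lines. This is a hedged sketch: the ordering of the features, the thresholding rule (pixels at or above the melting point), the moment-based eccentricity, the function name `utility_features`, and the synthetic image with an assumed melting point of 1600 are all illustrative assumptions.

```python
import numpy as np

def utility_features(image, reconstructed, melting_point):
    """Return six utility attributes of one thermal image as a float vector."""
    g1 = np.linalg.norm(image - reconstructed)             # L2 norm of the reconstruction error
    peak_row, peak_col = np.unravel_index(np.argmax(image), image.shape)
    rows, cols = np.nonzero(image >= melting_point)        # segmented melt pool pixels
    area = rows.size
    eccentricity = 0.0
    if area >= 2:
        # Eccentricity of the ellipse with the same second central moments as the region.
        lam_min, lam_max = np.linalg.eigvalsh(np.cov(np.vstack([rows, cols])))
        if lam_max > 0:
            eccentricity = float(np.sqrt(max(0.0, 1.0 - max(lam_min, 0.0) / lam_max)))
    return np.array([g1, image.max(), peak_row, peak_col, area, eccentricity], dtype=float)

# Synthetic elongated melt pool centered at pixel (20, 20) (assumption, not real data).
rr, cc = np.mgrid[0:41, 0:41]
image = 1000.0 + 1000.0 * np.exp(-((rr - 20) ** 2 / (2 * 3.0 ** 2) + (cc - 20) ** 2 / (2 * 8.0 ** 2)))
reconstructed = 0.99 * image                               # stand-in for the vPCA reconstruction
features = utility_features(image, reconstructed, melting_point=1600.0)
print(features)
```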

A distance function is defined in the UAS, denoted as dg(X,Y), which represents the Euclidean distance between two thermal images, i.e., X and Y, in the UAS. This distance function is used to identify the subset of images in R to be used to de-identify the observed image Ij, and thus acts as one of the controlling mechanisms used to tune the sensitivity of the ADDAM algorithm when determining the adaptive k value.

4.5 Stage 4: Determination of the Adaptive kj Value.

This stage determines the adaptive kj value for Ij. The proposed method significantly departs from the traditional k-same, GARP, and APFD algorithms, which utilize a global k value to achieve image anonymization [44,45]. There are two distinct and important operations within the ADDAM algorithm. First, the ADDAM algorithm implements a series of constraints when determining the k-closest reference images of Ij. These constraints leverage characteristics of each melt pool, including the layer location and angular identity, and define the neighborhood size within the UAS. This plays a crucial role in the ADDAM algorithm, as it allows the user to adjust and control the sensitivity and tune the de-identification algorithm. Second, the adaptive algorithm employs an additional balancing mechanism, which ensures that the reference set, combined with the sample image Ij, is equally diverse across all possible angular identities in the dataset. Both aspects are critical components that de-identify the angular identities while retaining the utility related information in the de-identified image.

For each angular identity in R, denoted as θn (n = 1, 2, …, m), the corresponding angular-reference set, used to de-identify Ij, can be defined in Eq. (4)
$$R_j^n = \{\, R_i \in R : |l(R_i) - l(I_j)| \le \Delta l,\ \varphi(R_i) = \theta_n,\ d_g(I_j, R_i) \le M \,\} \tag{4}$$
where the first constraint enforces the identified neighbors to be in proximity of Ij in terms of the build layers, l(·) denotes the layer index where the thermal image is collected from, and Δl represents the pre-defined maximum allowable layer difference between the identified neighboring images and Ij; the second constraint requires the elements in Rjn to be of the angular identity θn; the last constraint requires the Euclidean distance (denoted as dg) between each identified neighboring image and Ij to be no larger than a pre-defined threshold value M in the UAS defined in stage 3. After applying these constraints, the number of closest reference images in Rjn can be calculated as below
$$k_j^n = |R_j^n| \tag{5}$$
where kjn ≥ 0, and kjn varies according to the similarity of Ij to the reference thermal images in R as well as the corresponding angular identity θn. For example, if Ij is a healthy thermal image, there will be many Ri's in proximity of Ij in terms of both build layers and within the UAS, and thus the value of kjn will be larger. However, if Ij is an unhealthy thermal image, there will be very few (or even no) neighboring thermal images in R, and thus the kjn value will be very small (or even zero). In the case where one or more of the kjn = 0, Ij is probably extremely abnormal, and therefore will receive no de-identification to keep its significant deviation from the healthy group. This scenario is extremely rare within I, and will not create any major privacy concerns as abnormal melt pools make up the minority. In addition, it is worth noting that the sample image Ij is the nearest neighbor to itself within the subset where φ(Ij) = θn. The sample image will be incorporated into the corresponding Rjn of the same angular identity θn. This ensures that the sample image's angular identity will be accounted for when the algorithm undergoes a balancing procedure.
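The three constraints and the resulting per-identity counts can be sketched as below. The `Image` container, its field names, the function name `angular_reference_sets`, and the toy values of Δl and M are hypothetical and chosen only to make the example self-contained.

```python
import math
from dataclasses import dataclass

@dataclass
class Image:
    layer: int      # build-layer index l(.)
    angle: int      # angular identity phi(.), in degrees
    g: tuple        # utility-attribute vector in the UAS

def angular_reference_sets(sample, reference, angles, delta_l, m_max):
    """Return {theta_n: qualifying images sorted by increasing UAS distance to the sample}."""
    subsets = {}
    for theta in angles:
        members = [
            ref for ref in reference
            if abs(ref.layer - sample.layer) <= delta_l     # layer proximity
            and ref.angle == theta                           # angular identity theta_n
            and math.dist(sample.g, ref.g) <= m_max          # UAS distance threshold M
        ]
        if sample.angle == theta:
            members.append(sample)      # the sample joins the subset of its own identity
        members.sort(key=lambda img: math.dist(sample.g, img.g))
        subsets[theta] = members
    return subsets

reference = [
    Image(21, 0, (0.0, 0.0)), Image(22, 0, (0.1, 0.0)),
    Image(21, 180, (0.0, 0.1)), Image(40, 180, (0.0, 0.0)), Image(22, 180, (5.0, 5.0)),
]
healthy = Image(22, 0, (0.05, 0.0))     # close to many reference images
abnormal = Image(22, 0, (9.0, 9.0))     # far from every reference image in the UAS

subsets_healthy = angular_reference_sets(healthy, reference, [0, 180], delta_l=2, m_max=1.0)
subsets_abnormal = angular_reference_sets(abnormal, reference, [0, 180], delta_l=2, m_max=1.0)
k_healthy = {theta: len(v) for theta, v in subsets_healthy.items()}
k_abnormal = {theta: len(v) for theta, v in subsets_abnormal.items()}
print(k_healthy, k_abnormal)
```

In this toy case the healthy sample finds neighbors under both identities, while the abnormal sample has an empty 180 deg subset and would therefore receive no de-identification.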

Subsequently, the adaptive algorithm involves a crucial balancing function that ensures that there is an equal representation of images within each reference subset Rjn. This prevents an overpopulation of one angular identity during the de-identification process, which can impact the amount of data privacy achieved. This step is motivated by a major difference between the ADDAM algorithm and traditional k-anonymization algorithms. Traditionally, when applying global anonymization techniques, each image within the dataset contains a unique identity, such as a human face. If this image is anonymized with any other identity in the dataset, there will be a resulting gain of privacy for that individual. However, with AM thermal process data, there are repeating identities within the dataset. Therefore, de-identification with images of the same identity will not yield any privacy gains. Balancing the distribution of these angular identities within Rj guarantees that no single identity will be more prominent than the others during de-identification. This is accomplished by first ensuring that each angular-based subset previously determined is re-indexed into a monotonically increasing order, such that dg(Ij, R(1)) ≤ dg(Ij, R(2)) ≤ … ≤ dg(Ij, R(kjn)) ≤ … ≤ dg(Ij, R(s)). Re-indexing ensures that the images with the shortest Euclidean distance to the sample image will be first in the order of the subsets.

From here, a fourth filter is applied, which limits the size of each subgroup to be equal to the smallest subgroup. This is the novel balancing procedure which ensures that each angular identity is equally represented within the closest kn images to the sample image
$$k_j^{*} = \min_{n \in \{1, \dots, m\}} k_j^n \tag{6}$$
and the balanced identity subgroup Rj*n = {Ri | dg(Ij, Ri) ≤ dg(Ij, R(kj*)), Ri ∈ Rjn}. Next, the aggregated de-identification set, Rj, can be formed by directly merging Rj*n's to form the larger and equally diverse de-identification dataset. This aggregated set, Rj, is directly used to de-identify sample image Ij
$$R_j = \bigcup_{n=1}^{m} R_j^{*n} \tag{7}$$
$$k_j = |R_j| = m \, k_j^{*} \tag{8}$$
where kj is the number of aggregated, closest images used to de-identify Ij. The aggregated de-identification set, Rj, is a direct combination of all the balanced reference subgroups, Rj*n. This is the set of images (sample image and closest reference images) that will be directly used to de-identify Ij.
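The balancing and merging step can be sketched as follows, assuming each angular subset is already sorted by increasing UAS distance to the sample. The function name `balance_and_merge` and the string placeholders for images are hypothetical; the handling of an empty subset follows the rule stated earlier that such a sample receives no de-identification.

```python
def balance_and_merge(sample, subsets):
    """subsets: {theta_n: images sorted by increasing UAS distance to the sample}."""
    k_star = min(len(members) for members in subsets.values())   # smallest subgroup size
    if k_star == 0:
        return 0, [sample]             # no de-identification: the sample stands alone
    merged = []
    for members in subsets.values():
        merged.extend(members[:k_star])    # keep the k* closest images of each identity
    return k_star, merged

# Placeholder labels stand in for images; "I_j" is the sample in its own identity subset.
subsets = {
    0: ["I_j", "a1", "a2"],
    180: ["b1", "b2"],
    300: ["c1", "c2", "c3", "c4"],
}
k_star, r_j = balance_and_merge("I_j", subsets)
k_j = len(r_j)
print(k_star, k_j, r_j)

k_star_abnormal, r_j_abnormal = balance_and_merge("I_j", {0: ["I_j"], 180: []})
```

Because the sample has zero distance to itself, it is always the first element of its own subset and is retained whenever at least one image per identity survives the balancing.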

4.6 Stage 5: Melt Pool Image De-Identification.

The final stage of the proposed methodology is AM process image de-identification, given the kj neighboring images identified in stage 4. For each sample image Ij, all the images in Rj are combined to form the anonymized image, Ĩj, by directly averaging the dimensionally reduced images in Rj as below
$$\tilde{v}_j = \frac{1}{k_j} \sum_{R_i \in R_j} v_i \tag{9}$$
where vi denotes the PC vector of image Ri, so that each image within the aggregated de-identification set is directly averaged to create a de-identified PC vector (ṽj), which can then be reversely transformed into the original image space to obtain the surrogate image Ĩj to be published and aggregated with data from other AM users.
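The averaging and inverse transformation can be sketched as below. The projection matrix, mean image, and PC vectors are random placeholders, and the mean-centering convention and the function name `surrogate_image` are assumptions made for illustration.

```python
import numpy as np

def surrogate_image(pc_vectors, mean, w_p, shape):
    """Average the PC vectors of the images in R_j and map the result back to image space."""
    v_tilde = np.mean(pc_vectors, axis=0)              # de-identified PC vector
    return (w_p @ v_tilde + mean).reshape(shape)       # inverse transform to r x c pixels

rng = np.random.default_rng(1)
r, c, n_pc = 6, 6, 4
w_p, _ = np.linalg.qr(rng.normal(size=(r * c, n_pc)))  # hypothetical orthonormal projection matrix
mean = rng.normal(size=r * c)                           # hypothetical vectorized mean image
pcs_in_rj = rng.normal(size=(5, n_pc))                  # PC vectors of the k_j = 5 images in R_j

i_tilde = surrogate_image(pcs_in_rj, mean, w_p, (r, c))
print(i_tilde.shape)
```

Since the inverse transform is linear, averaging in the PC space is equivalent to averaging the individually reconstructed images, which is why the surrogate blends the orientation signatures of all images in the set.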

4.7 Evaluation of Design De-Identification Performance.

To evaluate the design de-identification performance for secured collaborative AM process-defect modeling, two novel anonymization performance metrics, utility loss (UL) and privacy gain (PG), are introduced to meet the needs of AM applications. These metrics quantify the gain in privacy and the loss in utility of the shared dataset, and can be further evaluated using a Pareto front [54,55] to quantify the trade-off between two conflicting objectives: (1) minimizing utility loss and (2) maximizing privacy gain [44]. These two metrics are derived from the traditional classification metrics, which have been previously leveraged to evaluate the performance of de-identification and k-anonymization algorithms [41,43,44,46]. Traditionally, the data privacy performance can be gauged as the number of correct predictions before and after de-identification. This provides a natural and easily implementable way to evaluate de-identification with ML models: the performance metrics are simply calculated before and after de-identification.

Definition III
UL is defined as the decrease in the anomaly detection performance (in percentage) due to de-identification.
(10)
where XBase and XAnon denote the anomaly detection performance metrics achieved by the original dataset and the de-identified dataset, respectively. It is worth noting that based on the definition, UL is usually a negative value. Therefore, it is desirable to either minimize |UL| or maximize UL. In addition, the UL metric is written in a general form of anomaly detection performance metrics above, while it relies on leveraging classification metrics, such as F1 (13) or overall accuracy (14). In general, the F1 can be leveraged when evaluating UL, as AM process data are traditionally unbalanced with respect to the anomaly labels.
Definition IV
PG measures the change in a classifier's ability to predict the printing path orientation between the baseline and de-identified datasets, thereby quantifying the privacy gained by applying a de-identification algorithm.
(11)
where ZBase and ZAnon denote the printing orientation classification performance metrics achieved by the original dataset and the de-identified dataset, respectively. They are also written in general form, and the specific classification metric depends on whether the dataset is balanced. In general, accuracy can be leveraged when evaluating PG if the datasets are balanced with respect to the print orientation labels. If the angular class labels are unbalanced, F1 should be used instead.
Both PG and UL are plotted in a two-dimensional plot to find the Pareto front of optimal solutions, determining the overall performance of ADDAM. The following equations describe the different classification metrics used to build the UL performance metric.
(12)
$$F_1 = \frac{2\,TP}{2\,TP + FP + FN} \tag{13}$$
In these equations, TP represents the correct prediction that there is a defect present and the melt pool is abnormal, and TN represents the correct prediction that there are no defects present, and the melt pool is healthy. In addition, FP represents the incorrect prediction that there is a defect present, but the melt pool is healthy, and FN represents the incorrect prediction that there are no defects present, but the melt pool is abnormal. The metrics used depend on how balanced the data is with respect to class labels. For example, when the dataset is unbalanced, F1 should be used for XBase and XAnon. Otherwise, accuracy would be a good choice [56]. Furthermore, accuracy is leveraged as the underlying metric behind the PG.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{14}$$
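A small worked example of these metrics is given below. F1 and accuracy follow their standard definitions in terms of TP, TN, FP, and FN. The UL and PG helpers assume a plain difference in percentage points between the baseline and de-identified scores; that form, any normalization, and the confusion-matrix counts are assumptions for illustration only.

```python
def f1_score(tp, fp, fn):
    """Standard F1 score from confusion-matrix counts."""
    return 2 * tp / (2 * tp + fp + fn)

def accuracy(tp, tn, fp, fn):
    """Standard overall accuracy from confusion-matrix counts."""
    return (tp + tn) / (tp + tn + fp + fn)

def utility_loss(x_base, x_anon):
    """ASSUMED form: change in anomaly detection score, in percentage points (usually <= 0)."""
    return 100.0 * (x_anon - x_base)

def privacy_gain(z_base, z_anon):
    """ASSUMED form: drop in orientation classification score, in percentage points."""
    return 100.0 * (z_base - z_anon)

# Hypothetical counts before and after de-identification.
x_base = f1_score(tp=40, fp=10, fn=10)
x_anon = f1_score(tp=36, fp=12, fn=14)
z_base = accuracy(tp=45, tn=50, fp=3, fn=2)
z_anon = accuracy(tp=35, tn=33, fp=17, fn=15)

ul = utility_loss(x_base, x_anon)
pg = privacy_gain(z_base, z_anon)
print(round(ul, 2), round(pg, 2))
```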

In summary, a pseudocode of the proposed ADDAM algorithm is detailed in Fig. 6.

5 Case Study

This section will discuss the case study used to validate the proposed ADDAM methodology with respect to both data privacy gain and data utility preservation.

5.1 Experimental Setup and Data Description.

The experimental setup is visualized in Fig. 7, which consists of an OPTOMEC LENS 750 DED machine equipped with a co-axial pyrometer camera (Stratonics Inc., Laguna Hills, CA) to capture the thermal images during the fabrication [2,3,17,18]. The LENS DED machine leverages a 1.0 kW Nd:YAG laser, and the pyrometer is mounted above the DED machine, outside of the inert chamber, where it is aligned with a series of mirrors to obtain a co-axial view. The specifications of the pyrometer are as follows:

  • Exposure time: 2.0274 ms

  • Image size: 752 × 480; pixel pitch: 6.45 µm

  • Captured temperature range: 1000–2500 °C

  • Pixel clock: 5 MHz

  • Image collection rate: 6.4 Hz

Two cylindrical specimens with different printing parameters and infill patterns were fabricated for data collection. The key printing parameters are summarized in Table 2.

The specimen fabrication resulted in raw thermal images with 480 rows and 752 columns, in which each pixel represents a temperature reading at the corresponding location. First, these images are cropped to 201 × 201 pixels to reduce the image dimensions and remove irrelevant regions that do not contain the melt pools. It is important to note that the initial cropping parameters were consistent across all the images. In addition, the instantaneous printing orientations of both datasets were determined by leveraging the g-codes of the two specimens post-processing. There are two unique angular identities in part 1 (0 deg/180 deg), and three in part 2 (60 deg/180 deg/300 deg). Furthermore, due to the existing trends in the AM thermal process data, only the data after layer 20 were leveraged for tuning and evaluating the performance of the different algorithms. This provides a better, more consistent evaluation of ADDAM performance. Overall, these two datasets provide four unique angular identities and 2458 thermal melt pool images for experimentation. This is a limited dataset that allows more controlled experimentation and simulates the limited data availability faced by SMMs. The results are reflective of, and comparable to, the application of ADDAM in a practical setting.

After part fabrication, the porosities were detected utilizing the XCT inspection and subsequently matched with the thermal images based on the porosity location and the g-code for part 1 only. As a result, the thermal images were labeled as defect present (1) or defect absent (0). For part 2, there is no post-process inspection data available for anomaly detection modeling.

5.2 Evaluation Procedure

5.2.1 Benchmark Method Selection.

For benchmark comparison, a global k-anonymization approach was applied. This involves anonymizing each sample image with a constant number of k-closest neighbors, instead of allowing an adaptive k value to be applied to each image. This is indicative of the traditional global k-anonymization methods that have been used in the past, primarily in the k-same methods. The performance comparison will demonstrate the effectiveness of the proposed adaptive mechanism in de-identifying AM process data. It is worth noting that the global k value will be the only hyperparameter to tune for the benchmark method.
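For comparison with the adaptive procedure, a global k benchmark can be sketched as follows. The sketch assumes that neighbors are searched in the PC space with Euclidean distance and that each sample is averaged with its k − 1 nearest reference vectors; the search space, the function name `global_k_deidentify`, and the toy vectors are assumptions.

```python
import numpy as np

def global_k_deidentify(sample_vecs, reference_vecs, k):
    """Average every sample with its k-1 nearest reference vectors (same k for all samples)."""
    out = np.empty_like(sample_vecs, dtype=float)
    for j, v in enumerate(sample_vecs):
        dist = np.linalg.norm(reference_vecs - v, axis=1)
        nearest = np.argsort(dist)[: k - 1]
        out[j] = np.vstack([v[None, :], reference_vecs[nearest]]).mean(axis=0)
    return out

samples = np.array([[0.0, 0.0], [10.0, 9.0]])
refs = np.array([[1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])

unchanged = global_k_deidentify(samples, refs, k=1)   # k = 1 means no neighbors are pooled
pooled = global_k_deidentify(samples, refs, k=3)
print(pooled)
```

Unlike the adaptive approach, the second sample here is pooled with two neighbors regardless of how dissimilar they are, which is the behavior that blurs abnormal images.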

5.2.2 Two Testing Scenarios.

Two different testing scenarios were designed to evaluate the performance of the ADDAM algorithm.

Scenario I: This scenario was aimed at evaluating both the data utility and privacy by applying the ADDAM algorithm exclusively to part 1, where there are both anomaly and theta labels. This scenario simulates a single, independent user who is applying the ADDAM algorithm to their dataset before data sharing.

Scenario II: This scenario was designed to evaluate the effect of additional instantaneous print orientations on the privacy-preserving abilities of ADDAM, as well as to evaluate the utility preservation abilities when aggregating two datasets. This simulates the collaboration of two users, or a single user leveraging two datasets, to de-identify the thermal process data. It provides further validation of the results from the first scenario, in which limited print orientations were available, and evaluates the performance of ADDAM when aggregating multiple datasets.

5.2.3 Data Splitting for Evaluation.

For both previously described scenarios, 30% of the sample images were used as the reference image set (R) for the de-identification process, which simulates an independent reference or gallery set that shares a similar distribution with the de-identification data. The remaining images were used as the sample images (I). More specifically, for part 1, 30% of the healthy melt pool images (Class = 0) were used to form R. This is a similar tactic to those used in Ref. [3], where the distribution of the normal melt pools is leveraged to identify abnormal melt pools. However, for part 2 there are no normal and abnormal class labels, so the reference data (R) are taken by randomly sampling 30% of the original melt pool images. This data splitting method is described in detail in Fig. 8.

In addition, the MMD [52] can be leveraged to verify the similarity of the distribution between the reference set and the de-identification set. The MMD is essentially defined as the distance between the feature means of two distributions. This similarity metric has been commonly leveraged in transfer learning applications to determine the distance, or similarity, between the source and target domains [11], and can be used as a loss function in deep learning applications [57]. The calculated MMD scores between the reference set and the sample sets for both testing scenarios are summarized in Table 3. In general, the lower the MMD score, the smaller the distance between the feature means of the two datasets. It can be observed that the MMD scores for both testing scenarios are only 1.41% and 0.97% of the MMD score between the distributions of two fabricated parts.

Furthermore, from the sample image set (I), 30% of the images were randomly sampled and used as a tuning set (T) to tune both the ADDAM user-defined hyperparameters (M and Δl) and the global k nearest neighbor parameter (k). These tuning data are first de-identified using different combinations of the user-defined hyperparameters and are then evaluated each time using a support vector machine (SVM) classifier for anomaly detection and angular identity detection. The remaining 70% of the sample images were used as an evaluation set (E) to gauge the performance of the optimal user-defined de-identification parameters identified from the tuning process. The evaluation set is de-identified using each of the parameter sets selected from the tuning data. After de-identification, the de-identified evaluation data were split into 80/20 training/testing sets and fed into SVM classifiers to predict anomalies and angular identities, producing the overall UL and PG performance of the de-identification algorithm. This final SVM performance evaluation was performed over ten iterations, and the results were averaged to obtain the performance of the de-identification algorithm. This entire procedure was repeated for both scenarios, using either the part 1 dataset alone or the aggregated part 1 and part 2 datasets, which also dictates whether the anomaly-detecting and/or angular-identity-detecting SVM classifiers are leveraged.
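The splitting procedure described above can be sketched as follows. The function name `split_dataset`, the use of index lists, the rounding of set sizes, and the fixed random seed are assumptions; when anomaly labels are available, the reference set is drawn from healthy images only, as described for part 1.

```python
import random

def split_dataset(n_images, labels=None, ref_frac=0.30, tune_frac=0.30, seed=0):
    """Return index lists for the reference set R, tuning set T, and evaluation set E."""
    rng = random.Random(seed)
    all_idx = list(range(n_images))
    # With labels, only healthy images (label 0) are eligible for the reference set.
    pool = all_idx if labels is None else [i for i in all_idx if labels[i] == 0]
    reference = set(rng.sample(pool, round(ref_frac * len(pool))))
    sample = [i for i in all_idx if i not in reference]
    rng.shuffle(sample)
    n_tune = round(tune_frac * len(sample))
    return sorted(reference), sample[:n_tune], sample[n_tune:]

labels = [0] * 90 + [1] * 10            # hypothetical labels: 90 healthy, 10 abnormal
reference, tuning, evaluation = split_dataset(100, labels=labels)
print(len(reference), len(tuning), len(evaluation))
```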

To evaluate the algorithm performance in these two scenarios, an SVM classifier was chosen due to its ability to characterize the nonlinear relationships within high-dimensional data. The SVM classifier was used during both the tuning stage and during the final evaluation stage, and the SVM hyperparameters were tuned using grid search cross-validation with a stratified shuffle splitting strategy. In addition, ten replications were performed for each scenario test, and the average performance across these replications was reported and compared to evaluate model robustness.

5.3 Parameter Tuning.

For each image within the sample dataset, there are several parameters to consider. These include the variability explained by the PCs (p) and the user-defined constraint parameters, i.e., M and Δl. For the p value, the variability explained by the PCs was fixed at 95%. This value was chosen as an adequate level of variation that reduces the high dimensionality of the data, while simultaneously capturing the explained variance within melt pools. This allows for less computationally expensive experimentation while still retaining enough information to identify both the presence of abnormal melt pools and the print orientation angles. In addition, the user-defined inputs, M and Δl, and the benchmark input, k, were evaluated over different ranges of values. These ranges, depicted in Table 4, were designed to capture a variety of possible values and highlight how varying input values can affect the performance of the ADDAM algorithm.

The user-defined inputs were evaluated based on the tuning data set in terms of both PG and UL, and all the Pareto efficient solutions were found through evaluating the performance metrics on a mesh grid of the two de-identification hyperparameters. The Pareto efficient solutions were chosen such that they maximized the increase in privacy, while minimizing the loss of utility. A visualization of the ADDAM tuning process is depicted in Fig. 9. It is important to note that due to the limited number of unique angular identities, too high of a distance constraint (M) can lead to a decrease or stagnation in the privacy gain. In addition, larger Δl values can lead to higher privacy gains in some scenarios but can adversely impact usability.
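The selection of Pareto efficient solutions can be sketched as follows, treating each hyperparameter combination as a (PG, UL) pair in which larger values are better for both objectives (UL is at most zero, so a value closer to zero means less utility lost). The function name `pareto_front` and the example points are hypothetical.

```python
def pareto_front(points):
    """Return the non-dominated (pg, ul) pairs, sorted by increasing pg."""
    front = []
    for i, (pg_i, ul_i) in enumerate(points):
        dominated = any(
            pg_j >= pg_i and ul_j >= ul_i and (pg_j > pg_i or ul_j > ul_i)
            for j, (pg_j, ul_j) in enumerate(points) if j != i
        )
        if not dominated:
            front.append((pg_i, ul_i))
    return sorted(front)

# Hypothetical (PG, UL) results for six hyperparameter combinations, in percent.
results = [(10, -1), (20, -5), (15, -6), (30, -12), (5, -2), (20, -3)]
front = pareto_front(results)
print(front)
```

Each point on the returned front corresponds to a hyperparameter set for which no other set achieves both a higher privacy gain and a smaller utility loss.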

Furthermore, the benchmark methodology (global k-anonymization) was also tuned to provide comparable evaluations. This included using the same SVM classifier and tuning data split as the ADDAM algorithm. However, this method does not incorporate a balancing parameter, as it directly uses the k − 1 nearest neighbors to de-identify the image. A visualization of the global k-anonymization can be seen in Fig. 10. Note the general trend that UL decreases (i.e., more utility is lost) as PG increases. This shows that there is a direct trade-off between the privacy gain and the utility-preserving performance of global anonymization models. In addition, the variation in performance between k values can be attributed to the lack of unique angular identities available in each dataset and the imbalanced nature of the dataset.

5.4 Results and Discussion.

This section details the results from the experimentation described in the previous sections. All tests were evaluated using the same SVM model setup described previously, to ensure comparability between the proposed and benchmark method.

First, the baseline performance of the SVM model was determined for each of the two testing scenarios. This baseline test highlights the non-anonymized performance of the chosen SVM classifier, which is the maximum data utility that can be achieved. As noted previously, the F1-score will be the primary metric to evaluate UL. The accuracy metric will be leveraged when evaluating the angular classification performance, PG. The baseline results for both scenarios are listed in Table 5. In addition, it is important to note that the vPCA extracted features were chosen to evaluate our proposed ADDAM method due to their higher performance over the geometric and thermal features for anomaly detection.

Second, the tuning data (T) were leveraged in the ADDAM algorithm and global k algorithm to determine which parameters were optimal for each scenario. As illustrated in Fig. 11, each point represents a combination of user-defined inputs (M and Δl) for ADDAM, or a global k level for the benchmark. From here, the Pareto optimal points were identified (higher opacity) as the points that lie on the optimal front of the performance area for each scenario. The additional points (lower opacity) are the other combinations of parameters, which do not lie on the Pareto optimal front. These points represent parameters that do not perform optimally using the datasets in scenarios I and II, and are not chosen to evaluate the final test performance. The specific performance and corresponding hyperparameter values are shown in Fig. 11. It is important to note that the advantage of the ADDAM algorithm is its ability to preserve data usability, through a smaller |UL|, at a similar privacy gain, PG. From these optimal points, the corresponding hyperparameter sets were selected and then used to de-identify the evaluation dataset (E) for the benchmark and ADDAM methods.

The final phase of experimentation takes the Pareto optimal set of the hyperparameter values identified in the tuning stage and applies them to the held-out evaluation data E to determine an averaged performance in both PG and UL. This evaluation is similar to the tuning results depicted in Fig. 11; however, these results represent the optimal combinations of parameters applied to the held-out evaluation data, and thus the final performance. The ADDAM algorithm again outperforms the benchmark method for both testing scenarios, as detailed in Fig. 12. The Pareto optimal values found from the evaluation data (E) were better positioned to minimize |UL| and maximize PG for ADDAM, as compared to global k. These results show that the ADDAM algorithm uniformly outperforms the benchmark global k method.

Furthermore, in scenario I, the ADDAM algorithm can achieve a comparable or slightly larger PG, without sacrificing nearly as much data usability as the global k method. This trend is present when implementing the ADDAM algorithm in both the tuning and evaluation stages. In addition, for scenario II the ADDAM algorithm was able to achieve a noticeably higher privacy gain, PG, while maintaining a comparable, and even slightly better, utility loss than the benchmark method. This reinforces the effectiveness of the ADDAM algorithm in practical applications, where complex part geometries would be leveraged in the de-identification. This would provide more diverse angular identities, leading to improved de-identification results. In both testing scenarios, the ADDAM algorithm was able to outperform the benchmark method in one or both optimization objectives. The better performance in utility preservation and increased data privacy of the ADDAM algorithm can be explained through the adaptive de-identification approach. With ADDAM, the user is maximizing the features preserved in the abnormal melt pools, because these images will receive a lower, or even zero, level of de-identification. This effectively preserves the features that define the abnormalities. On the other hand, in the benchmark method with global k, the k-closest neighbors were chosen as a constant optimal value, which does not provide the de-identification flexibility to abnormal images. This, as a result, will blur the distinction between the healthy and abnormal melt pool images, sacrificing the AM data utility in anomaly detection.

In a practical application, these results would provide the AM user with the ability to leverage an optimal set of solutions and optimize a de-identification algorithm that best suits their needs. This can be primarily attributed to the Pareto front evaluation technique, which provides an optimal set of solutions and allows the user to evaluate the trade-off between utility preservation and data privacy. From here, a user can evaluate these optimal solutions and decide if they want to prioritize de-identification, utility preservation, or find a balance. This allows the user an additional level of customization to better meet their specific application needs.
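
One simple way a user might pick a single operating point from the Pareto optimal set is a weighted score over the two objectives. The sketch below is only an illustration: the weighting scheme, the function name `select_solution`, and the numeric values are assumptions and do not come from the case study.

```python
from typing import List, Tuple

# Each solution is (pg, abs_ul): privacy gain and absolute utility loss.
Solution = Tuple[float, float]


def select_solution(front: List[Solution],
                    privacy_weight: float) -> Solution:
    """Pick the solution maximizing
    privacy_weight * pg - (1 - privacy_weight) * abs_ul,
    where privacy_weight lies in [0, 1]."""
    if not 0.0 <= privacy_weight <= 1.0:
        raise ValueError("privacy_weight must lie in [0, 1]")
    return max(
        front,
        key=lambda s: privacy_weight * s[0] - (1.0 - privacy_weight) * s[1],
    )


# Hypothetical Pareto optimal set.
front = [(0.20, 0.00), (0.25, 0.04), (0.30, 0.10)]

privacy_first = select_solution(front, privacy_weight=0.9)  # (0.30, 0.10)
balanced = select_solution(front, privacy_weight=0.5)       # (0.25, 0.04)
utility_first = select_solution(front, privacy_weight=0.2)  # (0.20, 0.00)
```

A weight close to 1 prioritizes de-identification, a weight close to 0 prioritizes utility preservation, and intermediate weights select a compromise.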

6 Conclusion and Future Work

In conclusion, this paper proposes a novel adaptive methodology, named ADDAM, for de-identifying design information in AM thermal process data, resulting in secure, de-identified AM process data that can be leveraged to develop more robust in-situ defect detection models. This adaptive de-identification approach outperforms the traditional global approaches to achieving dataset privacy, improving overall dataset privacy by 20–30% while sacrificing a limited amount of data utility (0–10% maximum loss in usability) on the controlled dataset. This creates a stronger defense against IP theft while still allowing AM users to aggregate data, overcoming some of the challenges that limited process data pose to robust process-defect modeling for SMMs. Furthermore, although the ADDAM algorithm was evaluated on thermal process data collected from a DED process, the adaptive framework can easily be extended beyond DED systems. Many metal-based AM systems can collect very similar thermal process data, and the adaptive approach itself provides a novel method for de-identifying AM process data, which tend to be unbalanced and to contain a limited number of unique identities.

A few directions remain open for future research. First, including additional angular identities could improve the performance of the ADDAM algorithm. This includes evaluating the effects of infill orientation angles that are not based on a unidirectional infill pattern, or of a free-formed component. In addition, leveraging larger datasets that reflect more complex part geometries will provide a more diverse reference set, which may result in stronger de-identification per image. This will ultimately translate into stronger dataset-level data privacy and be more reflective of practical applications. Furthermore, with an increased diversity of angular identities, the evaluation method could be improved by applying a regression-based evaluation of the angular identities. This would provide a continuous-valued result, which could give a more accurate evaluation of the angular identity detection. Second, the proposed ADDAM algorithm is aimed at melt-pool-wise data privacy, which protects privacy while achieving an elevated level of data utility preservation. Future research could develop additional, compounding privacy measures to further protect against re-identification attacks at the layer-wise level. This could involve adding image-augmentation measures and layer-wise anonymization techniques to the proposed adaptive de-identification method to achieve larger gains in data privacy. Finally, the adaptive approach to de-identification can be applied to other applications outside of the AM domain. The ADDAM methodology implements a novel adaptive approach to de-identification that can help improve data privacy in different applications, especially where the traditional global k-anonymization approaches may not be as effective. This includes instances where the dataset does not have a large number of unique identities, or where additional features can be extracted and leveraged to enhance data privacy through similarity space construction.

Acknowledgment

This work was partially sponsored by National Science Foundation CMMI-2046515.

Conflict of Interest

There are no conflicts of interest.

Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.
