Abstract

Conceptual design is the foundational stage of a design process that translates ill-defined design problems into low-fidelity design concepts and prototypes through design search, creation, and integration. In this stage, product shape design is one of the most important aspects. When applying deep learning-based methods to product shape design, two major challenges exist: (1) design data exist in multiple modalities and (2) there is an increasing demand for creativity. With recent advances in deep learning of cross-modal tasks (DLCMTs), which can translate one design modality to another, we see opportunities to develop artificial intelligence (AI) that assists product shape design in a new paradigm. In this paper, we conduct a systematic review of the retrieval, generation, and manipulation methods for DLCMT that involve three cross-modal types: text-to-3D shape, text-to-sketch, and sketch-to-3D shape. The review identifies 50 articles from a pool of 1341 papers in the fields of computer graphics, computer vision, and engineering design. We (1) review state-of-the-art DLCMT methods that can be applied to product shape design and (2) identify the key challenges that need to be addressed when applying DLCMT methods, such as the lack of consideration of engineering performance in the early design phase. In the end, we discuss potential solutions to these challenges and propose a list of research questions that point to future directions of data-driven conceptual design.

1 Introduction

The product shape is essential in the conceptual design of engineered products because it can affect both the esthetics and the engineering performance of a product [1]. Figure 1 shows the flow of information and the key steps for the design of product shapes at the conceptual design stage [1], where the information can be categorized into three modalities: natural language (NL) (e.g., text), sketches (e.g., 2D silhouette), and 3D shapes (e.g., meshes). We call them design modalities. Generally, customer needs and engineering requirement documents are in the form of natural language. Design sketches and drawings are effective means of brainstorming and expressing designers’ preferences. Low-fidelity design concepts and prototypes from the conceptual design stage are often represented by 3D shapes in digital format. Design search, design creation, and design integration are the core steps of conceptual design to gather information from existing design solutions for inspiration and to develop novel design concepts to better explore the design space [1].

Fig. 1 Iterative conceptual design stage in the development of engineered products

Early design automation methods, such as grammar- and rule-based methods, rely primarily on human design experience and knowledge to generate design alternatives [2]. In contrast, deep learning methods can learn latent design representations from data without explicit design rules or grammars, so they have been increasingly adopted in many engineering design applications. So far, however, deep learning methods have been applied mainly in the later stages of engineering design for design automation [3]. It is challenging to apply deep learning methods to the conceptual design stage (i.e., the early design stage) for several reasons. For example, data in the conceptual design stage exhibit multiple modalities, but deep learning methods are usually applied to handle a single design modality. Moreover, in conceptual design, designers often gather a large set of information for design inspiration in different design steps, but deep learning methods tend to focus on one specific design task at a time. Finally, human (either user or designer) input and interaction are desired in conceptual design to improve design creativity and human-centered design, yet most current design methods developed using deep learning do not interact directly with humans and only implicitly capture human preferences from training datasets, as shown in Fig. 2.

Fig. 2 Deep learning-based design process with humans in the loop

With recent developments in deep learning of cross-modal tasks (DLCMT), we see opportunities to apply these methods to address the aforementioned challenges, particularly in product shape design, such as car bodies and plane fuselages [5,6]. DLCMT allows explicit human input in one design modality and translates it to another modality, e.g., from natural language or sketches to 3D shapes, as shown in Fig. 2. DLCMT comprises cross-modal retrieval, generation, and manipulation methods. Cross-modal retrieval methods can be used to search an existing design repository for inspiring design ideas. Cross-modal generation methods can be used to explore a design space and generate new design concepts. Lastly, cross-modal manipulation methods can edit and manipulate existing designs to refine them. These three categories of methods can be used in the design search, design creation, and design integration steps (Fig. 1), respectively. In this paper, we conduct a systematic review of the state-of-the-art methods for DLCMT. Through a close examination of the existing literature, our objective is to identify the DLCMT methods and technologies that can be used to facilitate conceptual design, as well as the challenges associated with applying them.

A total of 50 recently published journal articles and conference papers are identified and closely reviewed from the fields of computer graphics, computer vision, and engineering design. We focus on text, sketches, and 3D shapes because they are the main design modalities in conceptual design. Specifically, we reviewed deep learning methods for three types of cross-modal tasks: text-to-sketch, text-to-3D, and sketch-to-3D. We found that most of the literature comes from computer graphics and computer vision, with few attempts at engineering design applications. This presents both challenges and opportunities in adapting the developed models and techniques to solve engineering design problems and, particularly, to connect human input and interaction with deep learning methods in the conceptual design of engineered product shapes.

The remainder of this paper is organized as follows. Section 2 introduces background knowledge on conceptual design, design modalities, and our motivation for the review. Section 3 presents the methodology for our systematic review. We tabulate all the reviewed articles and present four statistics from the literature in Sec. 4. We then discuss the literature in detail and answer the research questions (RQs) of the systematic review in Sec. 5. In the end, we propose a list of six research questions that will inform future research directions in Sec. 6 and conclude our work with closing remarks in Sec. 7.

2 Background

2.1 Conceptual Design.

Conceptual design lies in the early phase of a design process in which the form and function of a product are explored [7]. In conceptual design, it is crucial to explore the design space as much as possible, and designers are expected to generate creative designs so that the resulting products are likely to succeed in the market [8,9]. As shown in Fig. 1, we adapt and reinterpret the five-step concept generation method in conceptual design [1]. The five steps are problem clarification, design search, design creation, design integration, and reflection. Through these five steps, the method transforms information, such as customer needs, engineering requirements, and design ideas, into design concepts in the form of sketches and 3D shapes. The corresponding input and output of each step are represented by dotted rectangles. The process is linear in sequence from left to right, but almost always iterative. For example, feedback from reflection could influence problem clarification and its subsequent steps. Each design step can also be iterative so that the design problem can be better understood and the design space can be better explored [1].

In the conceptual design phase, the shape of a product is one of the most important considerations because it influences both the esthetics of a product and its engineering performance [1,10]. In this paper, we focus primarily on reviewing the DLCMT methods that can be applied to product shape design in the three concept generation steps, i.e., design search, design creation, and design integration, because they are the core steps for design concept exploration.

2.1.1 Design Search.

Design search is the step of collecting information on existing design solutions to a design problem. In practice, several sources, such as patents, literature, and benchmarking, can be used to gather useful information [1]. By analyzing existing products, designers can summarize their advantages and disadvantages and make necessary, customized changes to existing designs to create satisfying ones. However, the repository of existing design options can be huge, so the search process can be time-consuming and cumbersome, placing significant cognitive and physical burdens on designers. One possible solution to this problem is an AI-assisted search process, in which designers predefine search criteria and utilize computers to search for relevant design solutions.

2.1.2 Design Creation.

Design creation emphasizes exploring novel design concepts. Designers brainstorm ideas and explore the design space to create novel design concepts based on their knowledge [1]. Design ideas are often presented as sketches and text descriptions during conceptual design [11]. Text descriptions are used to document and describe designers’ ideas, while sketches help visualize design concepts and further trigger creative design ideas [12–14]. Low-fidelity 3D models are then created for better visualization and further development. However, creating 3D models involves much manual work and can be time-consuming. To facilitate the creation of novel 3D shapes, generative design methods can be used to automate the process.

2.1.3 Design Integration.

Design integration is the step in which designers aim to systematically integrate the information collected from the previous steps to generate the integrated design concept(s) [1]. For product shape design, designers usually need to edit and manipulate designs collected from the design search and design creation steps. However, it can be challenging to modify these designs computationally because of their representation formats (e.g., a 3D shape in voxels or point clouds, or a sketch as a raster image). Some formats are not editable and must be translated into other formats, such as mesh or B-rep. Therefore, automating the modification with human input can significantly simplify the process.

2.2 Modalities in Conceptual Design.

As shown in Fig. 1, there are three main design modalities in conceptual design: NL, sketches, and 3D shapes. In an example of car body design, as shown in Fig. 3, the three modalities could be “I want a red sedan car” (NL), hand-sketching a car with desired features (sketch), and creating a computer-aided design (CAD) model of the car (3D shape). NL allows people to convey and communicate ideas and thoughts. It is also the primary means of documentation, such as documentation of customer needs and engineering requirements. Sketches are often used to brainstorm design concepts because sketching can stimulate designers’ creative imagination [12–14]. Then, a 3D shape is often built to provide better visualization and a low-fidelity prototype model for further evaluation and development of a concept.

Fig. 3 Cross-modal tasks in conceptual design

NL data are often in the form of text, which usually serves as the input query in DLCMT methods. As shown in Table 1, three types of text are mainly used as input in DLCMT: NLDs, object names, and semantic keywords. 2D sketches can be represented in multiple ways, such as a pixel image in a static pixel space and a vector image in a dynamic stroke coordinate space [15,16]. There are also generally two types of 3D sketches in the literature, which we refer to as type I and type II. Type I: 3D sketches represented in a 2D space that, compared to regular 2D sketches, look like 3D objects. Type II: 3D sketches represented in a 3D space (either real or computational). This type of 3D sketch data can be captured and generated using virtual reality (VR) tools or motion sensing devices. They can also be created using 3D sketching software (e.g., SolidWorks or Autodesk). 3D shapes are typically built as B-rep models using CAD software in engineering design. However, in the computer graphics and 3D deep learning fields, 3D shapes are usually represented as meshes, point clouds, and voxel grids. Compared to CAD models, these 3D representations typically have lower fidelity with fewer geometric details and less structural information because (1) coarse resolutions might be used to represent the shapes due to the limitations of computational resources [17,18], (2) certain representations are inherently poor at representing geometric details and topological structure (e.g., point clouds; see Table 2 for more information), and (3) converting one representation to another might lose geometric or topological information [19,20].
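To make these representation trade-offs concrete, the short sketch below converts a single mesh into a point cloud and a voxel grid, the two conversions most commonly needed before feeding shapes to deep learning methods. It is a minimal illustration that assumes the open-source trimesh package; any geometry library with surface sampling and voxelization utilities would serve equally well.

```python
# Minimal sketch (assumes the open-source trimesh package); illustrates converting one
# mesh into the lower-fidelity representations discussed above, not any specific method
# from the reviewed papers.
import trimesh

# Stand-in for a CAD-derived mesh; in practice this would be loaded from a file.
mesh = trimesh.creation.box(extents=(1.0, 0.5, 0.5))

# Point cloud: sample points on the surface; connectivity (topology) is lost.
points = mesh.sample(2048)                 # (2048, 3) array of surface points

# Voxel grid: occupancy cubes at a fixed pitch; memory grows cubically with resolution.
voxel_grid = mesh.voxelized(pitch=0.05)    # occupancy grid object
occupancy = voxel_grid.matrix              # boolean 3D occupancy array

print(points.shape, occupancy.shape)
```

Note that neither output retains the parametric B-rep information of the original CAD model, which is one reason Table 2 lists these representations as unsuitable for direct use in engineering analyses.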

Table 1

The text types of natural language data used in DLCMT and the examples

Text type | Examples
Natural language descriptions (NLDs) | “It’s a round glass table with two sets of wooden legs that clasp over the round glass edge”
Object names | “chairs,” “cars,” “planes”
Semantic keywords | “circular short,” “rectangular wooden”
Table 2

Comparison of the pros and cons of common 3D representations for deep learning methods

Voxels
Pros:
  • The data structure as a fourth-order tensor makes it easy to adapt to 3D convolution operations in deep learning methods
  • Can handle 3D shapes with arbitrary topology
Cons:
  • Low visual quality
  • High computational cost because the number of parameters scales cubically with spatial resolution
  • Cannot be directly used in engineering analyses (e.g., finite element analysis (FEA)) for performance evaluation

Point clouds
Pros:
  • Compatible with the output data format of common scanning software
  • Compact for data storage and management
  • Can handle 3D shapes with arbitrary topology
Cons:
  • Low visual quality
  • No geometric information about the relationships between points, making conversion to meshes difficult
  • Cannot be directly used in engineering analyses (e.g., FEA) for performance evaluation

Meshes
Pros:
  • High visual quality
  • Compact for data storage and management
  • Widely accepted 3D representation in computer graphics
  • Compatible with downstream engineering software, such as FEA and computational fluid dynamics tools
Cons:
  • Discrete and disordered elements make meshes challenging for deep learning methods to process
  • Hard to handle 3D shapes with arbitrary topology

Implicit representation
Pros:
  • High visual quality
  • Easy adaptation to deep learning methods
  • Compact for data storage and management
  • Can handle 3D shapes with arbitrary topology
Cons:
  • Requires rendering techniques to extract the isosurface of the 3D shape for visualization
  • Cannot be directly used in engineering analyses (e.g., FEA) for performance evaluation


2.3 Review Motivation.

Our motivation for this literature review is driven by the following two major challenges in conceptual design. Recent advances in DLCMT give us opportunities to address these challenges and bring new design experiences to conceptual design.

Challenge 1: Multi-modalities. There are multiple design steps (e.g., design search, design creation, and design integration) in the conceptual design stage, which involve information and data with different design modalities. Designers conduct design activities with different modalities during the conceptual phase to best explore the design space and generate novel ideas [21,22].

Deep learning methods that can be used for design creation have been the focus of prior work [3], but most of them handle a single design modality, as pointed out by Refs. [23,24]. Typically, these methods use unimodal design data in either 2D [25–27] or 3D [28–31]. In addition, there is a lack of either unimodal or cross-modal methods that are useful for design search and design integration [24].

Only recently have studies in the engineering community utilized DLCMT to assist concept creation or design evaluation [23,32,33]. DLCMT methods take into account multiple design modalities, such as text and sketches. DLCMT includes retrieval, generation, and manipulation methods, which can be applied to different steps in conceptual design: (1) DLCMT retrieval methods can be used for design search since they can search existing data and return designs that best match a user’s query (e.g., returning several chairs given a query sketch) [34]; (2) generation methods (e.g., sketch-to-3D shape generation methods [33,35]) can be used to automate the design creation process; and (3) manipulation methods allow designers to modify designs through another design modality. For example, using a text-to-3D manipulation method [36], designers can modify a 3D design by providing a simple text description without directly manipulating the design, which can significantly reduce the time needed for design modification.

Challenge 2: Creativity. Design creativity is critical in conceptual design because it can largely affect the success of a product in the market. There are three main aspects (i.e., design novelty, contextual information, and human–computer interaction) that should be addressed for design creativity in the context of deep learning-based design processes.

  1. Design novelty. Deep learning methods (e.g., variational autoencoders (VAEs) and generative adversarial networks (GANs)) can generate new data that are not seen in the training dataset but are still based on interpolation within the boundary of the training data. Therefore, the new designs generated from the deep learning-based design process share great similarities with the existing ones used as training data. To improve design creativity, there have been a few deep learning-based methods that focus on developing neural network architectures to generate creative designs by enabling deep learning models’ extrapolating capabilities [37,38]. These methods pose new opportunities for design because they can generate truly novel designs.

  2. Contextual information. On the other hand, humans play an essential role in design creativity. However, despite advances in the development of network architectures, human input and interaction have not been emphasized much in deep learning-based design processes [3]. Burnap et al. [39] pointed out that a human’s perception of the quality of generated design concepts is often not in agreement with their numerical performance measures. The reason could be that in most deep learning-aided design processes, designers can only passively select preferred design concepts from a set of computer-generated design options, while human designers may have contextual information [40] about a design problem that is hard to capture in the training data.

  3. Human–computer interaction. As a result, there is a need to actively involve designers in a deep learning-based design process [3,10]. Some efforts in this regard have recently been made in engineered product design. For example, the method introduced by Valdez et al. [41] allows users to manipulate the latent space vectors learned by a GAN model to create preferred design options. Despite recent advances, we believe that design creativity can be further improved by involving humans in the design process to allow more intuitive and natural human input (e.g., text and sketches). Natural language and sketches are the most common human inputs in conceptual design, and DLCMT methods can take these inputs and translate them from one modality to another to promote creativity. This is manifested in the envisioned deep generative design process with humans in the loop, as shown in Fig. 2. In such a process, designers can continuously supplement new design ideas during human–computer interaction to guide computers to generate creative and feasible design concepts.

In addition, many design processes and applications can be facilitated by DLCMT; we show three typical examples in Fig. 4. Design application 1: DLCMT methods can be used to facilitate design democratization, allowing ordinary people to customize designs based on individual preferences [42]. Design application 2: There are also opportunities to develop AI-based pedagogical tools to teach students or train novice designers, allowing them to explore design alternatives with naive input, for example, just a simple word [43]. Design application 3: Immersive design uses VR, augmented reality (AR), and mixed reality (MR) to create a realistic digital environment in which a user is virtually immersed and can even physically interact with the digital environment [44]. DLCMT methods can be integrated into immersive design applications to enhance the design experience in human–computer interaction.

Fig. 4 Potential design applications enabled by DLCMT: (a) democratization of product design, (b) AI-based pedagogical tools for educating and training students or novice designers, and (c) immersive design environment

In summary, DLCMT methods are likely to introduce new opportunities to support and enhance activities in the conceptual design stage for product shape design and beyond. We closely examine the existing literature to identify the DLCMT methods and technologies that can be used for conceptual product shape design and the challenges associated with applying them. We also discuss potential solutions to these challenges and point out future research directions.

3 Methodology

This study adopts a systematic literature review approach [45] with the procedure of formulating research questions for a review, identifying relevant studies, evaluating the quality of the studies, summarizing the studies, and interpreting the findings.

3.1 Research Questions.

We are motivated to ask two RQs according to the discussion above.

RQ 1. What DLCMT methods can be used in the following three steps of conceptual design?

  1. Design search

  2. Design creation

  3. Design integration

RQ 2. What are the challenges in applying DLCMT to conceptual design and how can they be addressed?

3.2 Literature Search

3.2.1 Content Scope and Keywords.

We defined the content scope using the following three criteria to search the literature relevant to DLCMT: (1) conceptual design: the design search, design creation, and design integration steps (highlighted in Fig. 1); (2) shape design: discrete, physical, and engineered products; and (3) design modality: text, sketch, and 3D shape.

The keywords identified and used in the literature search process are “text-to-sketch retrieval,” “text-to-sketch generation,” “text-to-shape retrieval,” “text-to-shape generation,” “sketch-based 3D shape retrieval,” and “sketch-based 3D shape generation.” For “sketch-based 3D shape generation,” we include the other three commonly used names: “sketch-based 3D shape reconstruction,” “sketch-based 3D shape synthesis,” and “3D shape reconstruction from sketches.”

The reasons for choosing these keywords are as follows. (1) DLCMT between any two of the three modalities of text, sketch, and 3D shape yields six cross-modal permutations. In this paper, we focus on the following three cross-modal tasks: text-to-sketch, sketch-to-3D shape, and text-to-3D shape, which are then concatenated with retrieval or generation to form the initial keywords (e.g., text-to-sketch generation). We did not include sketch-to-text, 3D shape-to-sketch, and 3D shape-to-text because sketches and 3D shapes are the most common design artifacts, and design information flows in the order of text, sketches, and 3D shapes during conceptual design. (2) We focus on design search, which corresponds to retrieval methods; design creation, which corresponds to generation methods; and design integration, which corresponds to manipulation methods. In addition, for the sketch-to-3D shape retrieval and generation methods, we modified the keywords according to the naming conventions in the literature (see a comprehensive review of deep learning methods for free-hand sketches [15]). For example, we used “sketch-based 3D shape retrieval” instead of “sketch-to-3D shape retrieval,” along with the other three commonly used terms introduced previously for generation.

3.2.2 Literature Search Process.

As shown in Fig. 5, we finally selected 50 articles that meet our scope of review. Searches were conducted on the main databases of the literature (i.e., the source scope): ScienceDirect, Web of Science, Scopus, IEEExplore, Association for Computing Machinery (ACM) Digital Libraries, and Google Scholar within the time range of Jan. 2013 to Jun. 2022 (i.e., the time scope: the studies published in the past 10 years). The reason for choosing that time range is that many significant improvements in deep learning methods occurred after 2013, for example, VAEs (2013) [46] and GANs (2014) [47]. Since then, they have been widely applied in various applications, including the cross-modal tasks reviewed in this paper.

Fig. 5 Literature search process

The initial search yielded 1341 seed articles, including duplicates, of which the majority (i.e., 1304 papers) are related to two categories: sketch-based 3D shape retrieval and generation, with only 37 articles for the other four categories (i.e., text-to-sketch retrieval: 0; text-to-sketch generation: 3; text-to-3D shape retrieval: 10; and text-to-3D shape generation: 24) (see details in Table 3 in Appendix A). To make the review manageable, for the two categories of sketch-to-3D works, we decided to identify the most influential studies from those 1304 papers using Connected Papers. We found that Refs. [35,48] are pioneering works for deep learning-based sketch-to-3D shape retrieval and generation, respectively [24]. Therefore, they were used as the origin papers to find their most relevant work via Connected Papers (see Fig. 11 in Appendix A for the two generated graphs). The search by Connected Papers identified 21 articles, including Refs. [35,48], that meet our content scope.

Table 3

Studies found in major databases using keywords of “text-to-sketch retrieval” (TSkRet), “text-to-sketch generation” (TSkG), “text-to-shape retrieval” (TShRet), “text-to-shape generation” (TShG), “sketch-based 3D shape retrieval” (SkShRet), “sketch-based 3D shape generation” (SkShG), “sketch-based 3D shape reconstruction” (SkShRec), “sketch-based 3D shape synthesis” (SkShSyn), and “3D shape reconstruction from sketches” (ShRecSk)

Keywords (double quotation marks included)
Database | TSkRet | TSkG | TShRet | TShG | SkShRet | SkShG | SkShRec | SkShSyn | ShRecSk
ScienceDirect | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0
Web of Science | 0 | 0 | 1 | 0 | 20 | 1 | 0 | 0 | 1
Scopus | 0 | 0 | 1 | 1 | 454 | 5 | 1 | 0 | 95
IEEExplore | 0 | 0 | 0 | 1 | 13 | 1 | 1 | 0 | 1
ACM Digital Libraries | 0 | 0 | 1 | 0 | 14 | 0 | 0 | 0 | 3
Google Scholar | 0 | 3 | 7 | 22 | 559 (96) | 7 (5) | 5 (2) | 1 (0) | 120 (35)
Total | 0 | 3 | 10 | 24 | 1062 | 14 | 7 | 1 | 220

Another finding was that the articles in the two literature graphs were published no later than 2020, which could indicate that relevant articles published after 2020 have not yet gained enough attention to be considered influential by Connected Papers. This finding motivated us to search for the most recent studies in these two categories, so we searched for relevant articles published from Jan. 2021 to Jun. 2022 in Google Scholar only, because we found Google Scholar to be more inclusive than the other databases (i.e., the results from the other databases turned out to be a subset of those obtained from Google Scholar; see the comparison in Table 3 in Appendix A). One hundred and thirty-eight articles were found in this search process. In total, 196 papers were found to merit close examination and review.

We then reviewed the titles and abstracts of all these articles to judge their relevance to our content scope. We excluded 12 preprints, one Master’s thesis, and one Ph.D. dissertation from those 196 papers because they are not peer-reviewed or officially published. Finally, 50 articles were considered the most relevant and were therefore closely reviewed.

4 Summary Statistics of the Literature

We summarized all 50 articles in terms of the following variables: method type, publication year, representation of design modalities, training dataset(s), object class of the training data, generalizability, user interface, user study, and publication source. Table 4 in Appendix B provides a complete list of these articles and the corresponding values for each of these variables. Here, we report the statistics of four variables, namely the type of DLCMT, user interface, user study, and publication source, and introduce the others in detail in Sec. 5.

Table 4

Summary of the literature

Type of DLCMT | Reference | Year | Method | Text type | Sketch type | 3D representation | Dataset | Object class | Generalizability beyond trained classes | User interface | User study | Publication source
Text to 3D shape retrieval | Han et al. [66] | 2019 | CNN and GRU | NLD | N/A | Voxel | 3D-text dataset [17] | Chairs and tables | No | No | No | Conference: AAAI
Text to 3D shape retrieval | Chen et al. [17] | 2018 | Text encoder (CNN, GRU) and shape encoder (3D-CNN) | NLD | N/A | Voxel | Proposed a 3D-text dataset based on ShapeNet [49] | Chairs, tables, and synthetic objects | No | No | No | Conference: ACCV
Text to 3D shape generation | Jain et al. [102] | 2022 | Network based on CLIP [52] | NLD | N/A | NeRF [103] | Common objects in context (COCO) [139] | Diverse classes | Yes | No | No | Conference: CVPR
Text to 3D shape generation | Sanghi et al. [43] | 2022 | Network based on PointNet [126], CLIP [52], and OccNet [98] | Object names | N/A | Voxel | ShapeNet [49] | Diverse classes | No | No | No | Conference: CVPR
Text to 3D shape generation | Liu et al. [50] | 2022 | Shape autoencoder, word-level spatial transformer, and shape generator (implicit maximum likelihood estimation (IMLE) [140]) | NLD | N/A | Implicit representation and mesh | 3D-text dataset [17] | Chairs and tables | No | No | No | Conference: CVPR
Text to 3D shape generation | Jahan et al. [93] | 2021 | Shape encoder and decoder, label regression network | Semantic keywords | N/A | Implicit representation and mesh | COSEG [94] and ModelNet [95] | Chairs, tables, and lamps | No | No | No | Journal: CGF
Text to 3D shape generation | Li et al. [97] | 2020 | GAN-based network | NLD | N/A | Voxel | 3D-text dataset [17] | Chairs and tables | No | No | No | Conference: ICVRV
Text to 3D shape generation | Chen et al. [17] | 2018 | Text encoder (CNN, GRU), shape encoder (3D-CNN), and GAN | NLD | N/A | Voxel | Proposed a 3D-text dataset based on ShapeNet [49] | Chairs, tables, and synthetic objects | No | No | No | Conference: ACCV
Text to sketch generation | Yuan et al. [63] | 2021 | GAN and bidirectional long short-term memory (Bi-LSTM) | NLD | Static pixel space | N/A | Proposed SketchCUB based on CUB [106] | Birds | No | No | Yes | Conference: CVPR
Text to sketch generation | Huang et al. [54] | 2020 | Composition proposer (transformer) and object generator (Sketch-RNN [16]) | NLD | Dynamic stroke coordinate space | N/A | CoDraw [141] | Diverse classes | Yes | Yes | Yes | Conference: IUI
Text to sketch generation | Huang and Canny [53] | 2019 | Scene composer (transformer) and object sketcher (Sketch-RNN [16]) | NLD | Dynamic stroke coordinate space | N/A | Visual Genome [107] and Quick, Draw! [108] | Diverse classes | Yes | Yes | Yes | Conference: UIST
Text to sketch generation | Wang et al. [105] | 2018 | GAN-based network | NLD | Static pixel space | N/A | Proposed Text2Sketch based on dataset [142] | Human faces | No | No | No | Conference: ICIP
Sketch to 3D shape retrieval | Qin et al. [32] | 2022 | Generative Recursive Autoencoders for Shape Structures (GRASS) [143] and k-nearest neighbors | N/A | Static pixel space | B-Rep | Proposed a CAD model-sketches dataset | Diverse classes | Yes | Yes | No | Journal: AEI
Sketch to 3D shape retrieval | Yang et al. [75] | 2022 | 3D model network and 2D sketch network (MVCNN [84]) | N/A | Static pixel space | Mesh | SHREC13 [82], SHREC14 [83], and SHREC16 [91] | Diverse classes | Yes | No | No | Journal: MS
Sketch to 3D shape retrieval | Qi et al. [34] | 2021 | Sketch encoder and shape encoder (MVCNN [84]) | N/A | Static pixel space | Mesh | Proposed a fine-grained dataset based on ShapeNet [49] | Chairs and lamps | No | No | No | Journal: TIP
Sketch to 3D shape retrieval | Manda et al. [86] | 2021 | MVCNN [84], group-view convolutional neural networks (GVCNN) [144], RotationNet [145], and multi-view convolutional neural networks with self-attention (MVCNN-SA) [146] | N/A | Static pixel space | B-Rep | Proposed CADSketchNet based on ESB [87] and MCB [88] | Diverse classes | Yes | No | No | Journal: CG
Sketch to 3D shape retrieval | Liang et al. [80] | 2021 | Sketch network and view network | N/A | Static pixel space | Mesh | SHREC13 [82] and SHREC14 [83] | Diverse classes | Yes | No | No | Journal: TIP
Sketch to 3D shape retrieval | Liu and Zhao [81] | 2021 | MVCNN [84] and guidance cleaning network | N/A | Static pixel space | Mesh | SHREC13 [82] and SHREC14 [83] | Diverse classes | Yes | No | No | Conference: ICCEIA-VR
Sketch to 3D shape retrieval | Xia et al. [74] | 2021 | Student network and teacher network (MVCNN [84]) | N/A | Static pixel space | Mesh | SHREC13 [82] | Diverse classes | Yes | No | No | Conference: ICCS
Sketch to 3D shape retrieval | Li et al. [55] | 2021 | CNN-based network | N/A | Type II 3D sketch | Mesh | SHREC16STB [89] | Diverse classes | Yes | Yes | No | Journal: MTA
Sketch to 3D shape retrieval | Navarro et al. [85] | 2021 | CNN-based network | N/A | Static pixel space | Mesh | Proposed a line drawing dataset based on ShapeNet [49] | Diverse classes | Yes | No | No | Journal: CGF
Sketch to 3D shape retrieval | Chen et al. [78] | 2019 | Sketch network, segmented stochastic-viewing shape network, and view attention network | N/A | Static pixel space | Mesh | SHREC13 [82], SHREC14 [83], and PART-SHREC14 [147] | Diverse classes | Yes | No | No | Conference: CVPR
Sketch to 3D shape retrieval | Dai et al. [71] | 2018 | Source domain network and target domain network (3D scale-invariant feature transform (SIFT) [148]) | N/A | Static pixel space | Mesh | SHREC13 [82], SHREC14 [83], and SHREC16 [91] | Diverse classes | Yes | No | No | Journal: TIP
Sketch to 3D shape retrieval | Chen and Fang [73] | 2018 | MVCNN [84], GAN, metric network, and cross-modality transformation network | N/A | Static pixel space | Mesh | SHREC13 [82] and SHREC14 [83] | Diverse classes | Yes | No | No | Conference: ECCV
Sketch to 3D shape retrieval | Dai et al. [72] | 2017 | Source domain network and target domain network (3D-SIFT [148]) | N/A | Static pixel space | Mesh | SHREC13 [82] and SHREC14 [83] | Diverse classes | Yes | No | No | Conference: AAAI
Sketch to 3D shape retrieval | Xie et al. [77] | 2017 | CNN and metric network | N/A | Static pixel space | Mesh | SHREC13 [82] and SHREC14 [83] | Diverse classes | Yes | No | No | Conference: CVPR
Sketch to 3D shape retrieval | Zhu et al. [70] | 2016 | Cross-domain neural network and pyramid cross-domain network | N/A | Static pixel space | Mesh | SHREC14 [83] | Diverse classes | Yes | No | No | Conference: AAAI
Sketch to 3D shape retrieval | Ye et al. [89] | 2016 | CNN-based network | N/A | Type II 3D sketch | Mesh | Proposed SHREC16STB | Diverse classes | Yes | No | No | Conference: ICPR
Sketch to 3D shape retrieval | Wang et al. [48] | 2015 | CNN and Siamese network | N/A | Static pixel space | Mesh | PSB [67], SHREC13 [82], and SHREC14 [83] | Diverse classes | Yes | No | No | Conference: CVPR
Sketch and text to 3D shape retrieval | Stemasov et al. [62] | 2022 | Flask representational state transfer and HoloLens | NLD | Type II 3D sketch | Mesh and voxel | Thingiverse and MyMiniFactory | Diverse classes | Yes | Yes | No | Conference: CHI
Sketch and text to 3D shape retrieval | Giunchi et al. [44] | 2021 | CNN-based network | NLD | Type II 3D sketch | Mesh | Proposed a variational chairs dataset based on ShapeNet [49] | Chairs | No | Yes | Yes | Conference: IMX
Sketch to 3D shape generation | Li et al. [33] | 2022 | Target-embedding variational autoencoder | N/A | Static pixel space | Mesh | Dataset [149] | Cars and cups | No | No | No | Journal: JMD
Sketch to 3D shape generation | Nozawa et al. [20] | 2022 | GAN and lazy learning | N/A | Static pixel space | Point cloud and mesh | ShapeNet [49] | Cars | No | No | No | Journal: VC
Sketch to 3D shape generation | Du et al. [59] | 2021 | CNN, OccNet [98], and 3D-CNN | N/A | Static pixel space | Implicit representation and mesh | PartNet [150] | Chairs, tables, and lamps | No | Yes | Yes | Journal: CGF
Sketch to 3D shape generation | Wang et al. [118] | 2021 | Sketch component segmentation network, transformation network, and VAE | N/A | Static pixel space | Point cloud and mesh | Dataset [35] | Characters, airplanes, and chairs | No | No | No | Journal: WCMC
Sketch to 3D shape generation | Guillard et al. [6] | 2021 | Encoder (MeshSDF [151]), decoder, and differentiable renderer | N/A | Static pixel space | Implicit representation and mesh | ShapeNet [49] | Cars and chairs | No | Yes | No | Conference: ICCV
Sketch to 3D shape generation | Zhang et al. [120] | 2021 | View-aware generation network (encoder and decoder) and discriminator | N/A | Static pixel space | Mesh | ShapeNet-Sketch [152], Sketchy [153], and TU-Berlin [154] | Diverse classes | Yes | No | No | Conference: CVPR
Sketch to 3D shape generation | Yang et al. [115] | 2021 | CNN-based network | N/A | Static pixel space | Mesh | Archive of motion capture as surface shapes (AMASS) [155] | Human bodies | No | No | No | Conference: MMM
Sketch to 3D shape generation | Luo et al. [60] | 2021 | Voxel-aligned implicit network and pixel-aligned implicit network | N/A | Static pixel space | Implicit representation and mesh | Proposed 3DAnimalHead | Animal heads | No | Yes | Yes | Conference: UIST
Sketch to 3D shape generation | Jin et al. [51] | 2020 | VAE | N/A | Static pixel space | Voxel and mesh | PSB [67] and benchmark [156] | Diverse classes | Yes | No | No | Conference: I3D
Sketch to 3D shape generation | Smirnov et al. [5] | 2020 | CNN-based network | N/A | Static pixel space | B-Rep and mesh | ShapeNet [49] | Diverse classes | No | No | No | Conference: ICLR
Sketch to 3D shape generation | Nozawa et al. [19] | 2020 | Encoder–decoder and lazy learning | N/A | Static pixel space | Point cloud and mesh | ShapeNet [49] | Cars | No | No | No | Conference: VISIGRAPP
Sketch to 3D shape generation | Smirnov et al. [122] | 2019 | CNN-based network | N/A | Static pixel space | B-Rep and mesh | ShapeNet [49] | Diverse classes | No | No | No | Conference: ICLR
Sketch to 3D shape generation | Delanoy et al. [114] | 2019 | CNN-based network | N/A | Type I 3D sketch | Voxel | COSEG [94] | Chairs, vases, and synthetic shapes | No | No | No | Journal: CG
Sketch to 3D shape generation | Wang et al. [121] | 2018 | Autoencoder and GAN | N/A | Static pixel space | Voxel | SHREC13 [82] and ShapeNet [49] | Chairs | No | No | No | Conference: MM
Sketch to 3D shape generation | Li et al. [56] | 2018 | DFNet (encoder–decoder) and GeomNet (encoder–decoder) | N/A | Static pixel space | Mesh | Dataset [35] | Characters | No | Yes | Yes | Journal: TOG
Sketch to 3D shape generation | Delanoy et al. [57] | 2018 | Single-view CNN and updater CNN | N/A | Type I 3D sketch | Voxel | COSEG [94] | Chairs, vases, and synthetic shapes | No | Yes | Yes | Journal: PACMCGIT
Sketch to 3D shape generation | Lun et al. [35] | 2017 | Encoder and multi-view decoder | N/A | Static pixel space | Point cloud and mesh | The Models Resource and ShapeNet [49] | Characters, airplanes, and chairs | No | No | Yes | Conference: 3DIMPVT
Sketch to 3D shape generation | Han et al. [58] | 2017 | Deep regression network | N/A | Static pixel space | Mesh | FaceWarehouse [117] | Face caricatures | No | Yes | Yes | Journal: TOG
Text to 3D shape manipulation | Liu et al. [50] | 2022 | Shape autoencoder, word-level spatial transformer, and shape generator (IMLE [140]) | NLD | N/A | Implicit representation and mesh | 3D-text dataset [17] | Chairs and tables | No | No | No | Conference: CVPR
Text to 3D shape manipulation | Wang et al. [61] | 2022 | Disentangled conditional NeRF, CLIP [52], and GAN | Semantic keywords and object names | N/A | NeRF [103] | Photoshapes [157] and Carla [158] | Chairs and cars | No | Yes | Yes | Conference: CVPR
Text to 3D shape manipulation | Michel et al. [36] | 2022 | Neural style field network, differentiable renderer, and CLIP [52] | Semantic keywords and object names | N/A | Mesh | COSEG [94], Thingi10K [159], ShapeNet [49], Turbo Squid, and ModelNet [95] | Diverse classes | Yes | No | Yes | Conference: CVPR
Sketch to 3D shape manipulation | Guillard et al. [6] | 2021 | Encoder (MeshSDF [151]), decoder, and differentiable renderer | N/A | Static pixel space | Implicit representation and mesh | ShapeNet [49] | Cars and chairs | No | Yes | No | Conference: ICCV
Sketch to 3D shape manipulation | Jin et al. [51] | 2020 | VAE | N/A | Static pixel space | Voxel and mesh | PSB [67] and benchmark [156] | Diverse classes | Yes | No | No | Conference: I3D

We did not find any work related to text-to-sketch retrieval, possibly due to the lack of interest in practical applications. We obtained two articles for text-to-3D shape retrieval, six articles for text-to-3D shape generation, four articles for text-to-sketch generation, 19 articles for sketch-to-3D shape retrieval, 18 articles for sketch-to-3D generation, and five articles for cross-modal design manipulation. Among these works, Ref. [17] can work for text-to-3D shape retrieval and generation; Ref. [50] can perform text-to-3D shape generation and manipulation; Refs. [6,51] are shown to be capable of sketch-to-3D shape generation and manipulation.

Only 15 peer-reviewed publications are relevant to text-to-3D shape retrieval, text-to-3D shape generation, text-to-sketch generation, and cross-modal design manipulation, but since our preliminary literature review [24] we have observed surging interest in these topics, especially text-related ones, possibly due to advances in natural language processing (e.g., contrastive language-image pretraining (CLIP) [52]).

There are 13 studies [6,32,44,53–62] that provide user interfaces. A user interface application serves as a way to show the effectiveness of the proposed deep learning approach and can also better facilitate human–AI interaction for creative design. In particular, Refs. [44,62] provide user interfaces in VR and AR settings, respectively, which can further improve the user experience of human–computer interaction in immersive design. Additionally, 12 studies [35,36,44,53,54,56–61,63] conducted user studies to further validate their methods and user applications. User studies allow researchers to improve the proposed methods based on users’ feedback and can also help study human–computer interaction in realistic settings.

The articles reviewed are from conference proceedings (32) and journals (18). Most DLCMT methods come from the domains of computer science and computer engineering, with only two papers [32,33] from the engineering design community.

5 Review and Discussion

In this section, we summarize our review of the papers in each of the cross-modal task categories and discuss their technical details, from which we draw insights into the challenges and opportunities of applying such methods in the engineering design field and discuss potential solutions to the challenges.

5.1 RQ 1-(1): What DLCMT Methods Can Be Used in Design Search of Conceptual Design?

5.1.1 Text-to-3D Shape Retrieval.

The history of text-to-3D shape retrieval methods can be traced back to Min et al. [64], who used pure text information (query text and description associated with 3D shapes) for the 3D retrieval task, which is essentially a text–text matching.

For the state-of-the-art deep learning methods introduced below, a common strategy is to learn a joint representation of text and 3D shapes using cross-modal representation learning techniques (see Ref. [4] for more information). Figure 6(a) demonstrates the process of a text-to-3D retrieval task. As a pioneering and representative work for this task, Chen et al. [17] first constructed a joint embedding of text and 3D shapes using an encoder composed of a convolutional neural network (CNN) and a recurrent neural network (RNN) for text data and a 3D-CNN encoder for 3D voxel shapes. A triplet loss was applied, and learning-by-association [65] was used to align the embedded representations of text and 3D shapes. They also introduced a 3D-text cross-modal dataset comprising two sub-datasets: (1) ShapeNet [49] (chairs and tables only) with natural language descriptions and (2) geometric primitives with synthetic text descriptions. However, the computational cost caused by the cubic complexity of 3D voxels limits this method to learning from low-resolution voxels. Consequently, the learned joint representations have limited discriminative ability. Han et al. [66] built a Y2Seq2Seq network architecture using a gated recurrent unit (GRU, a variant of RNN) to encode features of multi-view images representing the shape. To obtain the joint embedding of text and 3D shapes, they trained the network using both inter-modality and intra-modality reconstruction losses, in addition to the triplet loss and classification loss. Therefore, the proposed network could learn more discriminative representations than Ref. [17].
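As a rough illustration of this joint-embedding strategy, the following sketch (our own simplified illustration, not the exact architecture of Ref. [17]) encodes a text description with a GRU and a voxelized shape with a 3D-CNN into a shared embedding space and aligns them with a triplet loss.

```python
# Simplified sketch of a text/3D-shape joint embedding trained with a triplet loss
# (assumes PyTorch). The GRU text encoder and 3D-CNN shape encoder mirror the general
# recipe described above, not the exact configuration of any reviewed paper.
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, out_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, out_dim, batch_first=True)

    def forward(self, tokens):                 # tokens: (batch, seq_len) integer ids
        _, hidden = self.gru(self.embed(tokens))
        return hidden.squeeze(0)               # (batch, out_dim) text embedding

class ShapeEncoder(nn.Module):
    def __init__(self, out_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.fc = nn.Linear(32, out_dim)

    def forward(self, voxels):                 # voxels: (batch, 1, D, H, W) occupancy grids
        return self.fc(self.conv(voxels))      # (batch, out_dim) shape embedding

# The triplet loss aligns a text anchor with its matching shape (positive) against a
# mismatched shape (negative) in the shared embedding space:
# loss = nn.TripletMarginLoss(margin=0.5)(text_emb, pos_shape_emb, neg_shape_emb)
```

At retrieval time, the query text is embedded once and compared (e.g., by cosine similarity) against precomputed shape embeddings of the repository.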

Fig. 6 Demonstration of (a) text-to-3D shape retrieval: retrieving 3D shapes that best match the NLDs from a given dataset or repository and (b) text-to-3D shape generation: automatically generating a 3D shape that matches the NLD. The examples of NLD and images are obtained from ShapeNet [49].

5.1.2 Sketch-to-3D Shape Retrieval.

Sketch-to-3D shape retrieval has been extensively studied using non-deep learning methods [68]. These methods usually consist of three steps: (1) automatically select multiple views from a given 3D shape in the hope that one of them is similar to the input sketch(es); (2) project the 3D shape into 2D space from the selected viewpoints; and (3) match the sketch against the 2D projections based on predefined features. However, the selection of best viewpoints, as well as the design of predefined matching features, could be subjective and random, which motivates the development of deep learning-based methods that can avoid the subjective selection of views and learn features from the data of sketches and 3D shapes [48]. In light of the scope of this review, we focus on deep learning methods for sketch-to-3D shape retrieval.

Wang et al. [48] initiated this effort and proposed learning feature representations for sketch-to-3D shape retrieval, as shown in Fig. 7, which avoided computing multiple views of a 3D model. They applied two Siamese CNNs [69], one for views of 3D shapes and one for sketches, with a loss function defined on the within-domain and cross-domain similarities. To reduce the discrepancies between sketch features and 3D shape features, Zhu et al. [70] built a pyramid cross-domain neural network of sketches and 3D shapes. They used the network to establish a many-to-one relationship between sketch features and a 3D shape feature. Dai et al. [71,72] proposed a novel deep correlated holistic metric learning method with two distinct neural networks for sketches and 3D shapes. This method mapped features from both domains into one feature space. In constructing its loss function, both a discriminative loss and a correlation loss were used to increase the discrimination of features within each domain and the correlation between domains. Chen and Fang [73] developed a GAN-based deep adaptation model to transform sketch features into 3D shape features, whose correlations can be enhanced by minimizing the mean discrepancy between modalities. Xia et al. [74] proposed a novel semantic similarity metric learning method based on a “teacher–student” strategy, using a teacher network to guide the training of a student network. The teacher network was trained to extract the semantic features of the 3D shapes. The student network was then trained using the pre-learned 3D shape features to learn the sketch features. Similarly, Yang et al. [75] applied a sequential learning strategy to first learn 3D shape features without 2D sketches and then used the learned 3D shape features to guide the learning of sketch features. During the query process, they further integrated clustering algorithms to categorize subclasses within a shape class to improve retrieval accuracy. In the methods mentioned above, deep metric learning [76] was applied to mitigate the modality discrepancy between sketches and 3D shapes.
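The within-domain and cross-domain similarity terms described for Ref. [48] can be illustrated with the short sketch below. It is a hedged simplification of the general Siamese metric learning recipe, not the authors' exact loss: a contrastive loss is applied to sketch–sketch, view–view, and sketch–view embedding pairs so that same-class pairs are pulled together and different-class pairs are pushed apart in the shared space.

```python
# Generic sketch of Siamese-style within-domain and cross-domain metric learning
# (assumes PyTorch); the embeddings would come from a sketch CNN and a view CNN.
import torch
import torch.nn.functional as F

def contrastive(a, b, same_class, margin=1.0):
    """a, b: (batch, dim) embeddings; same_class: (batch,) float tensor of 0/1 labels."""
    d = F.pairwise_distance(a, b)
    return (same_class * d.pow(2) + (1.0 - same_class) * F.relu(margin - d).pow(2)).mean()

def joint_loss(sketch_a, view_a, sketch_b, view_b, same_class):
    # sketch_x / view_x are embeddings of item x's sketch and rendered shape view;
    # same_class indicates whether items a and b belong to the same category.
    # Within-domain terms keep each modality's features discriminative; the
    # cross-domain term aligns sketches with shape views for retrieval.
    return (contrastive(sketch_a, sketch_b, same_class)
            + contrastive(view_a, view_b, same_class)
            + contrastive(sketch_a, view_b, same_class))
```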

Fig. 7
Sketch-to-3D shape retrieval method by Wang et al. [48]. For each row, the 2D drawing is the query sketch and the 3D models are the retrieved 3D shapes from an existing dataset, PSB [67]. The figure is used with permission.

There are also methods that study how to represent 3D shapes more comprehensively so that 3D shapes can better correspond to sketches. Xie et al. [77] proposed a method to learn a Wasserstein barycenter of CNN features extracted from 2D projections of a 3D shape. They constructed the metric network to map sketches and the Wasserstein barycenters of 3D shapes to a common deep feature space. Then a discriminative loss was formulated to learn the deep features. The deep features learned could then be used for the sketch-to-3D shape retrieval. Chen et al. [78] proposed a novel stochastic sampling method to randomly sample rendering views of the sphere around a 3D shape and incorporated an attention network (see Ref. [79] for a comprehensive review) to exploit the importance of different views. They also developed a novel binary coding strategy to address the time-efficiency issue of sketch-to-3D shape retrieval.

Another direction to reduce the large cross-modality difference between 2D sketches and 3D shapes is to deal with noise in the sketch data. Liang et al. [80] pioneered this direction by developing a method called noise-resistant sketch feature learning with uncertainty, which achieved the new state-of-the-art for sketch-based 3D shape retrieval. Liu et al. [81] proposed a guidance cleaning network to remove low-quality sketches that have much noise, which is like a data cleaning process. The authors showed superior results over state-of-the-art methods because the learning of noisy data was suppressed.

All the methods introduced above achieve state-of-the-art results on commonly used sketch-to-3D shape retrieval datasets, such as the Princeton Shape Benchmark (PSB) [67], SHREC13 [82], and SHREC14 [83]. The multiview CNN (MVCNN) [84] has been widely used in these methods to generate features from projection images of 3D shapes. Unlike these methods, which perform coarse category-level retrieval of 3D shapes given an input sketch, Qi et al. [34] introduced a novel task of fine-grained instance-level sketch-to-3D shape retrieval, with the aim of retrieving the one specific 3D shape that best matches the input sketch. They created a set of paired sketch-3D shape data of chairs and lamps from ShapeNet [49]. Then, they built a deep joint embedding learning-based model with a novel cross-modal view attention module to learn the features of sketches and 3D shapes. As the first effort to find local image correspondences between design sketches, Navarro et al. [85] proposed a synthetic line drawing dataset rendered from 3D shapes in ShapeNet [49]. The authors obtained a learned descriptor, namely, the SketchZoom descriptor, for dense registration in line drawings and showed its promising application in sketch-to-3D shape retrieval by identifying local correspondences between sketches.
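To make the shared MVCNN ingredient concrete, the snippet below is a minimal sketch of MVCNN-style view pooling [84]: a shared CNN encodes each rendered view of a 3D shape, and an element-wise maximum across views yields a single shape descriptor. The tiny CNN, image size, and 12-view setting are illustrative assumptions rather than the published configuration.

```python
# A minimal sketch of MVCNN-style view pooling for 3D shape descriptors.
import torch
import torch.nn as nn

class ViewPoolingEncoder(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
        )

    def forward(self, views):                  # views: (B, V, 1, H, W)
        b, v = views.shape[:2]
        feats = self.cnn(views.flatten(0, 1))  # (B*V, dim), shared weights across views
        feats = feats.view(b, v, -1)
        return feats.max(dim=1).values         # element-wise max over the V views

# Two shapes, each rendered from 12 viewpoints as 64x64 grayscale images.
descriptor = ViewPoolingEncoder()(torch.rand(2, 12, 1, 64, 64))  # shape (2, 128)
```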

There is also interest in using CAD data in 3D shape retrieval. Qin et al. [32] developed a sketch-to-3D CAD shape retrieval approach using a VAE and structural semantics. They created their training dataset by collecting 3D CAD models from local companies and obtaining their six-view projections as sketch data. Manda et al. [86] developed a new sketch-3D CAD model dataset, CADSketchNet, from the Engineering Shape Benchmark (ESB) [87] and Mechanical Components Benchmark (MCB) [88] datasets. The authors also analyzed various deep learning-based sketch-to-3D retrieval approaches using the proposed dataset and reported the comparison results.

Efforts have also been made to bridge the semantic gap between sketches and 3D shapes to improve sketch-based 3D shape retrieval. Ye et al. [89] presented a CNN-based 3D sketch-based shape retrieval (CNN-SBR) architecture based on 3D sketch (type II) data obtained from SketchANet [90]. Using data augmentation to prevent overfitting, they achieved a significant improvement over other learning-based methods. Building on previous work [89,91], Li et al. [55] proposed a novel interactive application supported by CNN-SBR. The method used a Microsoft Kinect, which can track the 3D locations of 20 joints of a human body, to follow the 3D position of a user's hand and create a 3D sketch. The proposed method was tested on their proposed dataset and achieved state-of-the-art performance in 3D sketch-based 3D shape retrieval.

The idea of utilizing a 3D sketch (type II) as the query input has been further applied to VR and AR settings to facilitate immersive design. Building on the method proposed in Ref. [92], Giunchi et al. [44] designed a multimodal interface for 3D model retrieval in VR with both sketch and voice input. The authors implemented a consistent translation method between 3D sketch and voice queries, allowing their integration during a single search session. Similarly, ShapeFindAR [62] combined both 3D sketch and textual input to enable in situ spatial search of a 3D model repository in an AR setting. The server was built using a representational state transfer (REST) application programming interface provided by Flask, a web framework for the Python programming language.
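As a rough illustration of such a server-side setup, the snippet below sketches a REST search endpoint with Flask. The endpoint name, the toy embedding function, and the random repository are hypothetical stand-ins, not the implementation of ShapeFindAR [62]; in a real system, the embedding function would be a learned cross-modal encoder and the repository would hold pre-computed shape embeddings.

```python
# A minimal, hypothetical Flask REST endpoint for cross-modal 3D model search.
from flask import Flask, jsonify, request
import numpy as np

app = Flask(__name__)

# Toy repository: shape IDs mapped to (stand-in) pre-computed 64-d embeddings.
rng = np.random.default_rng(0)
REPOSITORY = {f"shape_{i:03d}": rng.standard_normal(64) for i in range(100)}

def embed_query(query: dict) -> np.ndarray:
    # Placeholder for a learned cross-modal encoder (text and/or 3D sketch).
    seed = abs(hash(query.get("text", ""))) % (2**32)
    return np.random.default_rng(seed).standard_normal(64)

@app.route("/search", methods=["POST"])
def search():
    query = request.get_json()                  # e.g., {"text": "office chair"}
    q = embed_query(query)
    # Rank repository shapes by cosine similarity to the query embedding.
    scores = {
        sid: float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        for sid, v in REPOSITORY.items()
    }
    top5 = sorted(scores, key=scores.get, reverse=True)[:5]
    return jsonify({"results": top5})

if __name__ == "__main__":
    app.run(port=5000)
```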

5.2 RQ 1-(2): What DLCMT Methods Can Be Used in Design Creation of Conceptual Design?

5.2.1 Text-to-3D Shape Generation.

The task of text-to-3D shape generation is illustrated in Fig. 6(b). To accomplish this task, Jahan et al. [93] proposed a semantic label-guided shape generation approach, which can take one-hot semantic keywords as input and generate 3D voxel shapes without color and texture. The proposed method was trained using chairs, tables, and lamps obtained from the co-segmentation (COSEG) dataset [94] and ModelNet [95]. Building on their work on the text-to-3D shape retrieval task using a joint embedding of text and 3D shapes, Chen et al. [17] further combined the joint embedding model with a conditional Wasserstein GAN (WGAN) framework [96], which enabled the generation of colored voxel shapes at low resolution. To improve the surface quality of the generated 3D shapes, several studies have been conducted using the 3D-text cross-modal dataset proposed by Chen et al. [17]. Li et al. [97] proposed to use class labels to guide the generation of 3D voxel shapes, with the assumption that shapes with different labels (e.g., chairs and tables) have different characteristics. They added an independent classifier to the WGAN framework [96] to guide the training process. The classifier could be trained together with the generator to produce more distinctive class features in the generated 3D shapes. To further improve the color and shape quality of the generated 3D shapes, Liu et al. [50] leveraged implicit occupancy [98] as the 3D representation and proposed a word-level spatial transformer [99] to correlate shape features with the semantic features of text, decoupling the shape and color predictions to learn features in both texts and shapes.
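To make the basic mechanism concrete, the snippet below sketches a text-conditioned voxel generator in the spirit of the conditional (W)GAN approaches above: a text embedding is concatenated with a noise vector and decoded into a low-resolution occupancy grid by 3D transposed convolutions. The layer sizes, the 32^3 resolution, and the omission of the discriminator and text encoder are illustrative simplifications, not the published architectures of Refs. [17,96,97].

```python
# A minimal sketch of a text-conditioned 3D voxel generator.
import torch
import torch.nn as nn

class TextToVoxelGenerator(nn.Module):
    def __init__(self, text_dim=256, noise_dim=128):
        super().__init__()
        self.fc = nn.Linear(text_dim + noise_dim, 256 * 4 * 4 * 4)
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(256, 128, 4, stride=2, padding=1), nn.ReLU(),  # 4^3 -> 8^3
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # 8^3 -> 16^3
            nn.ConvTranspose3d(64, 1, 4, stride=2, padding=1), nn.Sigmoid(),  # 16^3 -> 32^3
        )

    def forward(self, text_emb, noise):
        # Concatenate the text embedding and the noise vector, then decode to voxels.
        x = self.fc(torch.cat([text_emb, noise], dim=1)).view(-1, 256, 4, 4, 4)
        return self.decoder(x)                     # (B, 1, 32, 32, 32) occupancy grid

gen = TextToVoxelGenerator()
voxels = gen(torch.rand(4, 256), torch.randn(4, 128))  # 4 shapes conditioned on text embeddings
```

In an adversarial setup, a discriminator (and, in Ref. [97], an additional classifier) would be trained jointly with this generator; only the conditioning and decoding path is shown here.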

The methods introduced above only support the generation of 3D shapes in individual categories (e.g., the chair category or the table category). The generalizability (i.e., the ability to generalize) of these methods remains challenging due to the unavailability and limited size of paired data of 3D shapes and text descriptions. To improve generalizability, some researchers have utilized pre-trained models (e.g., CLIP [52]) and zero-shot learning techniques [100]. Sanghi et al. [43] proposed a method called CLIP-forge, which could generate 3D voxel shapes from text descriptions for ShapeNet [49] objects. It only required training data (i.e., rendered images, voxel shapes, query points, and occupancy) obtained from 3D shapes without text labels. They first learned an encoding vector of a 3D geometry and then a normalizing flow model [101] of that encoding vector conditioned on a CLIP [52] feature embedding.

CLIP-forge has good generalizability to ShapeNet [49] categories. To further improve the generalizability to classes outside common 3D shape datasets (e.g., ShapeNet [49] and ModelNet [95]), Jain et al. [102] combined neural radiance field (NeRF) [103] with an image-text loss from CLIP [52] to form dream fields. A dream field is a neural 3D representation that can return a rendered 2D image given the desired viewpoint. After training, the method could generate colored 3D neural geometry from text prompts without using 3D shape data, resulting in better generalizability.
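The core of such CLIP-guided generation is an image-text loss computed on rendered views. The snippet below sketches that loss under the assumption that a differentiable renderer has already produced a 224 x 224 RGB view; the dummy view, prompt, and single-view setting are illustrative, and this is not the exact formulation of Refs. [43,102]. It requires the open-source CLIP package.

```python
# A minimal sketch of a CLIP image-text guidance loss on a rendered view.
import torch
import clip  # pip install git+https://github.com/openai/CLIP

device = "cpu"  # keeps float32 weights; call model.float() if loading on a GPU
model, _ = clip.load("ViT-B/32", device=device)

def clip_guidance_loss(rendered_rgb: torch.Tensor, prompt: str) -> torch.Tensor:
    """Negative cosine similarity between a rendered view and a text prompt."""
    tokens = clip.tokenize([prompt]).to(device)
    img_feat = model.encode_image(rendered_rgb)   # (1, 512) image embedding
    txt_feat = model.encode_text(tokens)          # (1, 512) text embedding
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    return -(img_feat * txt_feat).sum()           # lower loss = better text match

# A dummy view standing in for an image rendered from the 3D representation.
dummy_view = torch.rand(1, 3, 224, 224, requires_grad=True)
loss = clip_guidance_loss(dummy_view, "a wooden armchair with a round back")
loss.backward()  # in practice, gradients flow back through the renderer to the 3D model
```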

5.2.2 Text-to-Sketch Generation.

Sketches can inspire design ideas [12–14], and text-to-sketch tools could help designers efficiently capture fleeting design inspirations. The generation of images from text descriptions (i.e., text-to-image synthesis/generation) has seen great progress recently [104]. Compared to text-to-image generation, text-to-sketch synthesis is more challenging because it can rely only on rigid edge/stroke information without the color features (i.e., pixel values) of an image [63].

Text2Sketch [105] applied a Stagewise-GAN (i.e., generative adversarial network) to encode human face attributes identified from text descriptions and transform those attributes into sketches; the model was trained on a manually annotated dataset of text-face sketch pairs. Although the method was applied to face recognition instead of product design, it is worth introducing here because the approach is inspiring and could be applied to the design domain if a different dataset were used. Yuan et al. [63] constructed a bird sketch dataset by modifying the Caltech-University of California San Diego (UCSD) Birds (CUB) dataset [106], based on which they trained a novel GAN-based model, called T2SGAN. The model featured a conditional layer-instance normalization module that could fuse the image features and sentence vectors, thus efficiently guiding the generation of sketches.
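The snippet below sketches how a sentence vector can modulate sketch/image features through learned per-channel scales and shifts, in the spirit of the conditional layer-instance normalization module of T2SGAN [63]. It is a FiLM-style stand-in with illustrative dimensions, not the authors' exact module.

```python
# A simplified sketch of sentence-conditioned feature normalization.
import torch
import torch.nn as nn

class SentenceConditionedNorm(nn.Module):
    def __init__(self, channels=64, sent_dim=256):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.to_gamma = nn.Linear(sent_dim, channels)   # per-channel scale from the sentence vector
        self.to_beta = nn.Linear(sent_dim, channels)    # per-channel shift from the sentence vector

    def forward(self, feat, sent_vec):                  # feat: (B, C, H, W), sent_vec: (B, sent_dim)
        gamma = self.to_gamma(sent_vec).unsqueeze(-1).unsqueeze(-1)
        beta = self.to_beta(sent_vec).unsqueeze(-1).unsqueeze(-1)
        return (1 + gamma) * self.norm(feat) + beta     # text modulates the normalized features

fused = SentenceConditionedNorm()(torch.rand(2, 64, 32, 32), torch.rand(2, 256))
```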

The methods mentioned above were developed for single-object sketch synthesis, but there are also methods for multi-object generation, which could be useful for generating designs part by part. An example of such methods is shown in Fig. 8. Huang and Canny [53] developed Sketchforme by adopting a two-step neural network: (1) a transformer-based mixture density network serving as a scene composer to generate high-level layouts of sketches and (2) a sketch-RNN [16]-based object sketcher to generate individual object sketches. The scene composer and the object sketcher were trained using the Visual Genome dataset [107] and the “Quick, Draw!” dataset [108], respectively. Since different datasets of text and sketches can be used, this method avoids the requirement for paired data of text descriptions and sketches of an object. Based on Ref. [53], Huang et al. [54] took a further step and proposed an interactive sketch generation system called Scones. It used a composition proposer to propose a scene-level composition layout of objects and an object generator to generate individual object sketches.

Fig. 8
Demonstration of text-to-sketch generation, which can generate sketches that correspond to users’ NLDs

5.2.3 Sketch-to-3D Shape Generation.

There are mainly two paradigms for 3D shape reconstruction from 2D sketches: geometry-based methods and learning-based methods. Sketch-based interfaces for modeling are a major branch of geometry-based methods [109], and we do not review this line of work given the scope of this review. We also exclude methods that apply deep learning techniques but require predefined geometric models to guide 3D reconstruction, such as the methods presented in Refs. [58,110]. We focus on reviewing deep learning-based methods that do not rely on predefined geometric models requiring hand-designed rules.

Deep learning-based sketch-to-3D shape generation without any predefined geometric models was pioneered by Lun et al. [35]. They proposed an encoder-multiview-decoder architecture that can extract multiview depth and normal maps from a single sketch or multiple sketches and output a 3D shape as a point cloud. The resulting 3D point cloud shape can be converted to a 3D mesh shape for better visualization. 2.5D visual surface geometry (e.g., depth and normal maps) is a representation that can make a 2D image appear to have 3D qualities [111,112]. Similar to Ref. [35], many works use the strategy of predicting 2.5D information first to guide the generation of 3D shapes. Nozawa et al. [19] extracted depth and mask information from a single input sketch using an encoder–decoder network. Then, a lazy learning [113] method was applied to find similar samples in the dataset to synthesize a 3D shape represented as a point cloud. Later, Nozawa et al. [20] extended Ref. [19] by replacing the architecture with a combination of a GAN and lazy learning.
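The “predict 2.5D maps first” strategy can be summarized by the minimal encoder-decoder sketch below, which maps a line drawing to a depth map and a normal map; the subsequent fusion into a point cloud or mesh is omitted, and the channel sizes and single-view output are illustrative simplifications of Refs. [19,35], not their published networks.

```python
# A compact sketch of predicting 2.5D maps (depth + normals) from a single sketch.
import torch
import torch.nn as nn

class SketchTo25D(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 32
            nn.ConvTranspose2d(32, 4, 4, stride=2, padding=1),              # 32 -> 64, 4 channels
        )

    def forward(self, sketch):                  # sketch: (B, 1, 64, 64) line drawing
        out = self.decoder(self.encoder(sketch))
        depth = out[:, :1]                      # (B, 1, H, W) predicted depth map
        normals = torch.tanh(out[:, 1:])        # (B, 3, H, W) normal map, components in [-1, 1]
        return depth, normals

depth, normals = SketchTo25D()(torch.rand(2, 1, 64, 64))
```

Downstream steps would back-project the depth map (per view) into 3D points and use the normal map to refine or mesh the surface.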

To improve the surface quality of the shapes resulting from their previous work [57], Delanoy et al. [114] proposed to first predict one normal map per input 3D sketch (type I). Then they fused all normal maps predicted from the multiview sketches with the predicted 3D voxel shape to optimize the resulting surface mesh. Li et al. [56] introduced an intermediate CNN layer to model the direction of dense curvature and used an additional output confidence map, along with the depth and normal maps extracted using CNNs, to generate high-quality 3D mesh shapes. They also provided a user-interaction system for 3D shape design. Similar to the idea of obtaining an intermediate 2.5D representation, Yang et al. [115] proposed a skeleton-aware modeling network to generate 3D human body models using skeletons as the intermediate representation. The network first interprets sparse joints from input sketches and then predicts the skinned multi-person linear model [116] parameters based on joint-wise features. Although this work focuses on the generation of human bodies, the proposed network can inspire design researchers to consider predicting important feature points to guide the generation of 3D shapes. Li et al. [33] proposed a predictive and generative target-embedding variational autoencoder and demonstrated its effectiveness by solving a sketch-to-3D shape generation problem. The authors used a 3D extrusion shape obtained by extruding a 2D silhouette sketch as an intermediate representation, which transformed the problem into a 3D-to-3D prediction problem. The approach can predict a high-quality 3D mesh shape from a silhouette sketch without inner contour lines, as shown in Fig. 9. In addition to the prediction function, the proposed approach can also generate numerous novel 3D mesh shapes using its generative function.

Fig. 9
Sketch-to-3D shape generation method by Li et al. [33]. The first row shows the input 2D silhouette sketches, and the corresponding predicted 3D mesh shapes are shown in the second row.

Efforts to provide easy-to-use sketching systems can benefit novice users in customized design. Delanoy et al. [57] proposed an interactive sketch-to-3D generation system. They used a CNN to transform 3D sketches (type I) into 3D voxel shapes, and another CNN as an updater to refine the predicted 3D shape while users provide more sketches. The voxel shapes can then be converted to 3D mesh shapes. However, the output 3D shapes are of low quality due to the high memory consumption of the voxel representation. To improve the surface quality of the resulting 3D shapes, mesh and implicit field representations have been adopted by some interactive systems. For example, Han et al. [58] proposed a novel sketching system to generate 3D mesh human faces and caricatures using a CNN-based deep regression network. The method was trained on a newly proposed dataset extended from FaceWarehouse [117]. Du et al. [59] designed a novel sketching system composed of a part generator and an automatic assembler to generate part-aware man-made objects with complex structures. They used implicit occupancy [98] as the 3D representation, which can be converted to a 3D mesh shape with detailed geometry. Similarly, Wang et al. [118] introduced a novel sketch-to-3D shape method that can segment a given sketch and build a transformation template that is then used to generate diverse sketches. These sketches are then taken as input to an encoder-multiview-decoder network similar to Ref. [35] to generate a 3D point cloud shape. Luo et al. [60] proposed a coarse-to-fine-grained 3D mesh modeling system using 3D sketches as input for animalmorphic head design. A coarse mesh is first generated from the input 3D sketch. Then, a novel pixel-aligned implicit learning approach is used to guide the deformation of the coarse mesh to produce a more detailed mesh. Guillard et al. [6] introduced an interactive system to reconstruct and edit 3D shapes from 2D sketches using an encoder–decoder architecture with an implicit field representation in the DeepSDF [119] format, which can output mesh shapes.

The aforementioned methods are usually trained using one individual category of objects and can only deal with 3D shape generation from sketches within that specific category. To improve generalizability, Jin et al. [51] proposed a novel network consisting of a VAE (i.e., variational autoencoder) and a volumetric autoencoder to learn the joint embedding of sketches and 3D shapes using various classes of objects. The trained network has good generalizability and can be used to predict 3D voxel shapes based on 2D occluding contours. Zhang et al. [120] were the first to generate a 3D mesh shape from a single free-hand sketch. They proposed a view-aware network based on a GAN to explicitly condition the process of generating 3D mesh shapes on viewpoints. The method can improve generation quality and bring controllability to output shapes by explicitly adjusting viewpoints, and it generalizes well to out-of-distribution data.

The methods introduced above have to be trained using supervised learning, which means that the training data must be pairs of sketches and 3D shapes (i.e., labeled data). Wang et al. [121] proposed an unsupervised learning method for sketch-to-3D shape reconstruction. They embedded unpaired sketches and rendered images from 3D shapes into a common latent space by training an adaptation network via an autoencoder with an adversarial loss. During the inference of 3D shapes from sketches, they retrieved several nearest-neighboring 3D shapes from the training dataset as prior knowledge for a 3D GAN to generate new 3D shapes that best match the input sketch. This method can only output very coarse 3D voxel shapes but provides an interesting unsupervised learning-based idea for sketch-to-3D shape generation.

In addition to the popular 3D shape representations (e.g., point clouds, voxels, meshes, and implicit representations) used in sketch-to-3D shape generation, new 3D representations are gaining increasing attention in this field. For example, Smirnov et al. [5,122] proposed a novel deformable parametric template composed of Coons patches that can naturally fit into a conventional CAD modeling pipeline. The resulting 3D shapes can be easily converted to a non-uniform rational basis spline (NURBS) representation, allowing edits in CAD software.

5.3 RQ 1-(3): What DLCMT Methods Can Be Used in Design Integration of Conceptual Design?.

In this section, we introduce some works relevant to text-to-3D shape and sketch-to-3D shape integration methods. These methods allow designers to further edit and manipulate 3D designs by changing text prompts or sketches.

The sketch-to-3D shape generation method introduced by Jin et al. [51] can be further used to manipulate a given 3D voxel shape toward target input sketches using the learned joint embedding space. However, it focuses on manipulating the outline of a given 3D shape. To enable manipulation of color and shape, CLIP-NeRF [61] was proposed based on CLIP [52]; it has a disentangled conditional NeRF [103] architecture that introduces a shape code to deform the 3D volumetric field and an appearance code to control the colors. The method can edit a given colored 3D voxel shape to meet a target semantic description of color and shape. The text-to-3D generation method in Ref. [50] also allows intuitive manipulation of the color and shape of a generated 3D mesh shape simply by changing the input semantic keywords of color or shape.

To enable detailed edits or manipulation of geometries, in some works a differentiable renderer has been applied. Sketch2Mesh [6] introduced in Sec. 5.2.3 can also perform shape editing due to the integrated differentiable renderer. Using the representation power of CLIP [52], Michel et al. [36] proposed Text2Mesh (see Fig. 10) to manipulate a given 3D mesh shape by predicting color and local geometric details that conform to the description of the target text.

Fig. 10
Text-to-3D shape manipulation method, Text2Mesh by Michel et al. [36]. The method can manipulate an existing mesh shape by adding color, texture, and geometric details driven by a target natural language description. The figure is used with permission.

There have been a series of DLCMT methods that can be applied to product shape design in different design steps of conceptual design. As a summary of the review, DLCMT methods indeed provide opportunities to address the two major challenges as discussed in Sec. 2 because they can (1) take various design modalities as input and provide methods catering to design search, design creation, and design integration, and (2) improve design creativity by actively involving human input [53,54,59,60]. Taking advantage of these opportunities and implementing the appropriate DLCMT methods in conceptual design can therefore accelerate the search and iteration of design concepts (e.g., Refs. [17,44,48]) and the modification of designs (e.g., Refs. [36,43,51,58]). We also observe that DLCMT methods could be particularly useful in design applications, such as design democratization, design education, and immersive design (e.g., Refs. [17,44,48,62,89]).

5.4 RQ 2: What Are the Challenges in Applying DLCMT to Conceptual Design and How Can They Be Addressed?.

Examination of the literature has helped us identify several challenges in applying DLCMT methods to conceptual design. DLCMT has focused on shape synthesis, which can be applied in product shape design, as discussed above. However, Regenwetter et al. [3] state that 3D synthesis work is only tangential to engineering design because such works focus more on visual appearance than on functional performance or manufacturability. Although we only partially agree with Ref. [3] that the overlap between shape synthesis and engineering design is insignificant, given the importance of shape design, we must admit that product shape is not the only focus in conceptual design. Other factors, such as engineering performance, system design features, and manufacturability, should also be considered and can be incorporated into the data-driven design cycle even in the early stages of design.

In this section, we discuss in detail the challenges of applying DLCMT methods to engineering design from four aspects, including the lack of cross-modal datasets that incorporate engineering performance and manufacturability, complex systems design using DLCMT, 3D representations in DLCMT, and the generalizability of DLCMT methods.

5.4.1 The Lack of Cross-Modal Datasets That Incorporate Engineering Performance and Manufacturability.

Data are the fuel for deep learning-based design methods. Data sparsity is a challenging issue for data-driven design methods, and there is generally a deficiency of large practical datasets [3], regardless of the data modality, to train useful and meaningful models for engineered products. In the computer science community, numerous open-source unimodal or cross-modal datasets, such as Refs. [17,49,82,95], are available for researchers to compare their methods with state-of-the-art methods. For example, 16 articles (e.g., Refs. [6,17,34,85,120]) use ShapeNet [49] as the training data for their methods. In contrast, there is a lack of similar benchmark datasets in the engineering design field. Even though the datasets from computer science can also benefit the engineering design community, they mainly focus on the shape of objects and place little emphasis on downstream engineering-related information. Using text-to-3D shape methods as an example, a user could say “I want an SUV with low fuel consumption.” An SUV car shape could be easily generated, but we would not know whether the drag coefficients of the generated designs meet the requirement. We might ask the following question: How could a computer understand that NL description and translate it into a primitive SUV car shape while taking into account the drag performance? Finding answers to this question could therefore be an interesting research direction.

Similarly, it is also worth exploring how other downstream engineering requirements and constraints (e.g., manufacturability) can be accounted for when applying DLCMT to engineering design. We have not found any DLCMT methods that take into account engineering performance and manufacturability. One challenge here is the lack of such datasets. The difficulties primarily rest in the cost (in money or time) of running high-fidelity computational or physical experiments. Moreover, certain experimental data could be confidential for commercial or military reasons. The availability of large cross-modal datasets with engineering performance and manufacturability information could greatly ease the verification and validation of existing DLCMT methods and promote the development of new DLCMT methods for the design of engineered products.

5.4.2 Complex Systems Design Using DLCMT.

A few DLCMT studies [53,54,59] aim to generate designs part by part considering the structural relationship among components, which can be potentially applied to the design of systems. But this leaves a large space for engineering design researchers to investigate in the future. The challenges of addressing systems design using DLCMT mainly stem from the structural complexity of an engineered product, such as dependencies, constraints, and the relationship between components.

An engineered product is usually a system consisting of interconnected parts with complex dependencies. To take into account parts’ dependency information, there are generally two ways to support the conceptual design of a product at the system level when applying DLCMT methods. In the first method, each component of the product is generated separately using DLCMT, and then the components are assembled either automatically using rules-based computer algorithms or manually [53,54]. The second method is often referred to as part-aware generative design [30,123,124]. The objective of using DLCMT methods for part-aware design is to learn the structural relationships and dependencies between parts directly from the training data so that parts generation and assembly can be automatically completed.

Compared to the first method, the second method can save the time and cost of additional assembly steps. Those steps are often non-trivial, especially when one wants to computerize the assembly process in CAD software. In addition, part-aware generative design methods better capture the geometric details of 3D shapes [123,124], for example, in the transition regions between two components (e.g., the connection regions between the side rear-view mirrors and the car body). These geometric details may significantly influence the engineering performance (e.g., aerodynamic drag) of a design.

As mentioned above, a few DLCMT studies, i.e., the text-to-sketch generation [53,54] and sketch-to-3D shape generation [59] methods, attempt to integrate the concept of part-aware design, but most methods treat the design object as a single monolithic part without a systems design perspective. For engineering applications, treating a design as a whole piece could limit the transition of the generated design shapes to later design stages, since components are usually manufactured separately. Attention has been paid to part-aware design by the engineering design community [30,125]. However, how to enable part-aware design in DLCMT remains underexplored and is an important research direction.

5.4.3 3D Representations in DLCMT.

Designs can be represented in different ways for storage, computation, and presentation. The choice of 3D representation affects both the visual quality and the computational cost when implementing DLCMT, and choosing among representations is often a difficult decision. Furthermore, in engineering design applications, the choice of 3D representation also influences the compatibility with downstream engineering analysis in CAD and CAE software. In what follows, we share our insights into the challenges associated with 3D representations in both aspects.

3D shapes with high visual quality and rich geometric details can help designers better understand a design concept. Voxels, point clouds, and meshes are the most commonly used representations for 3D geometry. Similar to the pixels of images, voxel grids are naturally adapted to the convolutional neural network (CNN) model, which is the major reason for their prevalence in 3D geometry learning research. The majority of DLCMT methods (e.g., Refs. [17,43,57,66,96,97,121]) use voxels for 3D shape representation. Voxel shapes usually need to be converted to mesh shapes for better visualization. However, the converted mesh shapes will look coarse if the resolution of the voxel shapes is low. This could negatively influence the subjective evaluation of the shape of a design concept, and the design concept might be overlooked by designers. An intuitive way to improve the resolution of the resulting 3D voxel shapes is to use high-resolution training data, but this may not be feasible due to the limited computing resources for training the neural network. Fukamizu et al. [18] provided a two-stage strategy to synthesize high-resolution 3D voxel shapes from natural language, which could be an inspiring method for dealing with low-resolution issues. Point clouds [19,20,35,126] are more efficient in representing 3D objects but do not capture geometric details. For example, a point cloud does not encode the relationship between points or the resulting topology of an object, which makes conversion to meshes challenging. Using meshes [56,58,120,127] for 3D representation could generally alleviate the low visual quality and data storage problems, but, at the same time, it is challenging to prepare meshes for deep learning methods due to their discrete face structures and unordered elements. Furthermore, the topology of 3D shapes cannot be easily handled using meshes. Implicit representations of 3D shapes [6,59,60,119] represent the surface of a shape by a continuous volumetric field that encodes the boundary of the shape as the zero level set of a learned implicit 3D shape function. They can better handle different topologies of 3D shapes and require less data storage, making them a promising representation for high-resolution 3D shapes. See Table 2 for the pros and cons of applying these four representations to deep learning methods.
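To illustrate the implicit representation, the snippet below sketches a DeepSDF-style [119] network: an MLP maps a 3D query point and a per-shape latent code to a signed distance, and the surface is the zero level set of this function. The layer sizes and latent dimension are illustrative assumptions, and surface extraction (e.g., marching cubes) is omitted.

```python
# A minimal sketch of an implicit (signed distance) shape representation.
import torch
import torch.nn as nn

class ImplicitSDF(nn.Module):
    def __init__(self, latent_dim=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),                      # signed distance to the surface
        )

    def forward(self, points, latent):                 # points: (N, 3), latent: (latent_dim,)
        z = latent.expand(points.shape[0], -1)         # share one latent code across all queries
        return self.mlp(torch.cat([points, z], dim=1)).squeeze(-1)

# Query the field at 1000 random points for one shape code; points with
# |sdf| close to 0 lie near the reconstructed surface (the zero level set).
sdf = ImplicitSDF()(torch.rand(1000, 3) * 2 - 1, torch.randn(64))
```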

In addition to the above four representations, there are a few new 3D representations that are promising for handling the trade-off between the effectiveness of training neural networks and the quality of the resulting 3D shapes. NeRF [61,102,103] is a method for generating novel views of scenes or objects. It can take a set of input images of an object and render the complete object by interpolating between the images. NeRF [103] is also topology-free and can be sampled at high spatial resolutions. However, 3D shapes represented by NeRF are “hidden in the black box,” and we can only observe them through images rendered from different viewpoints. All the 3D representations mentioned above (i.e., voxels, point clouds, meshes, NeRF, and implicit representations) are generally not adapted to CAD software. This often brings about compatibility issues that could impede downstream editing and engineering analyses of the generated 3D shapes. There are typically two ways to solve these problems. One way is to convert them to CAD models (e.g., converting stereolithography (STL)/object (OBJ) meshes to B-Rep solids). Another way is to handle the CAD shape data directly in deep learning models. Deep learning of unimodal CAD data is still an underexplored field, although some methods [128–132] and CAD datasets [133–136] have recently been introduced. DLCMT directly using CAD data [5] can be even more challenging due to the domain gap between design modalities and is a promising research direction.

Choosing the most appropriate 3D representation compatible with the adopted deep learning technique remains a challenging task. It involves considerations of data availability, data preprocessing, computational cost, visual quality of the resulting 3D shapes, data postprocessing, and the ability to adapt to later design stages.

5.4.4 Generalizability of DLCMT Methods.

Finally, we noticed that efforts have been made to make the DLCMT methods more generalizable, independent of the variation between design objects (e.g., Refs. [36,51]). There are advantages and disadvantages to generalizing the methods. On the one hand, the diversity in different methods helps address the unique nature of different design problems, so a generalized approach may not be optimal for solving a specific design problem. On the other hand, generalizability allows a method to apply to a wider range of design problems. We focus on discussing the advantages here since we observe trending efforts (e.g., Refs. [43,102]) aiming to improve the generalizability of DLCMT methods in the review. It is challenging for deep learning methods to be generalized across multiple design problems [3]. The generalizability of a deep learning method means its ability to generalize to classes of objects beyond those used for training data. For engineering design applications, due to the sparsity of training data and the special treatment designed in the neural network architecture for a specific problem, a deep learning-based design method is difficult to generalize even in the cases where one design modality (e.g., 2D sketches or 3D shapes) is involved, let alone the generalization issues of applying DLCMT methods that involve multiple different modalities.

Some methods [43,102] utilize transfer learning techniques (e.g., zero-shot learning) and pre-trained models (e.g., CLIP [52]) or specially designed neural network architectures (e.g., unsupervised learning methods [118]) to improve generalizability, which could be good starting points for the engineering design community to further explore other possibilities. The challenge of generalizing the methods for DLCMT couples with other challenges and requires a community-wide effort to share datasets, create data repositories, define benchmark problems, and develop testing standards.

In summary, we have discussed the opportunities and challenges associated with applying DLCMT methods to conceptual design and proposed potential solutions to overcome the challenges with the insight gained from this literature review effort. The insights generated can potentially point to promising research directions for future studies.

6 Research Questions for Future Design Research

We notice that the opportunities and challenges identified previously are highly related to several trending topics in the engineering design community. In this section, we propose six RQs that relate DLCMT to these trending topics: RQ (1) → design representations [137]; RQ (2) → generalizability and transferability of deep learning-based design methods [22]; RQ (3) → decision-making in AI-enabled design processes [138]; RQ (4) and (5) → human–AI collaboration [23]; RQ (6) → design creativity in deep learning-based design processes [37]. These RQs also point to potential research directions (see Sec. 7 for details) to which DLCMT can lead. We hope these RQs can spark a wide range of discussion and call for more efforts within the engineering design community to develop and apply DLCMT methods to address the challenges associated with conceptual design and beyond.

  1. What are the guidelines for selecting the most appropriate design representations in DLCMT?

  2. How much can the generalizability and transferability of the latent representation of multimodal data learned from DLCMT be extended across different product shape categories?

  3. Since DLCMT methods can shorten the cycle of generating designs and even connect to the downstream engineering analyses and manufacturing requirements, how could the information coming from the later design stages influence the regeneration of design concepts, and thereby a designer’s decisions?

  4. DLCMT methods have the potential to facilitate the data-driven design process with humans in the loop, but how can we balance the involvement of humans and computers and facilitate effective bidirectional human–AI communication to better stimulate designers’ creativity at the human–AI interface?

  5. With the establishment of the human–AI interaction in the conceptual design based on DLCMT, what could the co-evolution between humans and AI look like?

  6. Although design creativity can be augmented by bringing humans into the loop when using DLCMT methods for product shape generation, these methods could suffer from the limitation of data interpolation inherently rooted in data-driven design methods. Fundamental questions, such as what new mechanisms and neural network architectures can be built to enable the algorithm to extrapolate beyond the training data, thus more effectively augmenting designers’ creativity, shall be further explored in the future.

7 Closing Remarks

In this paper, we conducted a systematic review of the methods for DLCMTs, including text-to-sketch, text-to-3D shape, and sketch-to-3D shape retrieval and generation methods, for the conceptual design of product shapes. These methods could be applied in the design search, design creation, and design integration steps of conceptual design. Unlike other deep learning methods applied in engineering design, DLCMT allows human input of texts and sketches, which can explicitly reflect designers’ and/or users’ preferences. Because designers can be more actively involved in such a design process, human–computer interaction and collaboration are promoted; compared to traditional design automation and computer-aided design methods, this gives DLCMT great potential to improve the conceptual design of products through a data-driven design process with humans in the loop. DLCMT could also facilitate engineering design education and the democratization of product development by allowing intuitive inputs (e.g., text descriptions and sketches), as well as immersive design environments by integrating VR, AR, and MR techniques.

With the attempt to apply new 3D data representations in DLCMT and the availability of more public datasets, opportunities open up for the development of new DLCMT methods. However, the deficiency of training datasets, the trade-offs in choosing 3D shape representations, the lack of consideration of engineering performance, manufacturability, and part-aware design, and the limited generalizability still challenge the engineering design community in applying DLCMT to engineered product design. We would like to encourage attention and efforts from the engineering design community.

There are a few limitations in the current literature review that the authors would like to acknowledge and share. First, the set of keywords used to search the literature has covered all topics in our scope of the review. However, other topics, such as shape-to-text generation (namely, shape captioning in the literature), could also be of interest to the engineering design community. Second, for the topics of sketch-to-3D shape retrieval and generation, we did not include all relevant articles, although we have covered the most influential and the most recent publications.

In the future, we will continue the review and conduct a more comprehensive analysis of the relevant works on DLCMT. Besides the review effort, we see the merit of conducting a comparative study to further understand the effects of DLCMT on the conceptual design by enabling and disabling the DLCMT-based assistance in the design process. We believe that the methods reviewed, the discussion of opportunities, challenges, potential solutions, and future research directions of applying DLCMT to conceptual product shape design can benefit the data-driven design research in the engineering design community. We hope this review effort can also facilitate the discussion and attract more attention from the engineering design community and industry stakeholders when applying DLCMT to improve the conceptual design of product shapes and beyond.

Footnotes

2

DLCMT is a class of problems, aiming to translate one modality of data to another, e.g., from text to 3D shapes. To solve this problem, there is a large body of literature on cross-modal representation learning (CMRL). CMRL aims to build embeddings using information from multiple modalities (e.g., texts, audio, and images) in a common semantic space, which allows the model to compute cross-modal similarity [4]. In this paper, our review is not limited to reviewing CMRL methods but also includes other deep learning methods that can solve cross-modal problems.

3

Images can include both sketches and natural photos. In the literature, we notice that DLCMT methods involving natural photos usually use “image” as the keyword, while methods involving sketches use “sketch.” Also, in engineering design, sketches are usually considered as lines and strokes. To identify DLCMT methods for engineering design, we exclude the corresponding “image” methods.

4

We did not explicitly search for cross-modal manipulation methods because these methods cannot be found directly using specific keywords, but can be indirectly identified during the search for cross-modal retrieval and generation methods. For example, we found the work Text2Mesh [36], using the keyword “text-to-shape generation” because that keyword appears in the literature review section of the article, but the work should belong to manipulation methods after carefully reading its content. However, this might leave room for a more comprehensive review of the cross-modal manipulation methods by developing a different search strategy in the future.

5

https://www.connectedpapers.com/. Connected Papers allows readers to enter an origin paper and generates a graph of the papers with the strongest connections to that paper by analyzing about 50,000 research papers.

Acknowledgment

The authors gratefully acknowledge the financial support from the National Science Foundation through award 2207408.

Conflict of Interest

There are no conflicts of interest.

Data Availability Statement

The authors attest that all data for this study are included in the paper.

Appendix A: Details of Literature Search

As introduced in Sec. 3.2, Table 3 shows the number of articles found in major literature databases. In addition, we used the time range of Jan. 2021 to Jun. 2022 to search for the most recent studies for sketch-to-3D shape retrieval and generation, the number of which is indicated in parentheses (e.g., (35) for ShRecSk).

Figure 11 shows the articles that are most relevant to the two key articles [35,48] using Connected Papers (accessed in Jun. 2022). Studies that meet the scope of our review are indicated using a quadrilateral in each sub-figure.

Fig. 11
(a) Studies for sketch-to-3D retrieval that are similar to Ref. [48] and (b) studies for sketch-to-3D generation that are similar to Ref. [35]

Appendix B: Paper Summary

We summarize and tabulate all 50 articles reviewed in Table 3. There are 11 source journals and 20 conference proceedings, and their acronyms are shown below.

Nomenclature
CG: Computers & Graphics
MS: Multimedia Systems
VC: The Visual Computer
CGF: Computer Graphics Forum
AEI: Advanced Engineering Informatics
TIP: IEEE Transactions on Image Processing
MTA: Multimedia Tools and Applications
TOG: ACM Transactions on Graphics
JMD: Journal of Mechanical Design
WCMC: Wireless Communications and Mobile Computing
PACMCGIT: The Proceedings of the ACM in Computer Graphics and Interactive Techniques
MM: International Conference on Multimedia
IUI: International Conference on Intelligent User Interfaces
CHI: Conference on Human Factors in Computing Systems
I3D: Symposium on Interactive 3D Graphics and Games
MMM: International Conference on Multimedia Modeling
IMX: ACM International Conference on Interactive Media Experiences
CVPR: Computer Vision and Pattern Recognition Conference
ICCV: International Conference on Computer Vision
ECCV: European Conference on Computer Vision
ACCV: Asian Conference on Computer Vision
AAAI: Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence
ICIP: IEEE International Conference on Image Processing
UIST: Annual ACM Symposium on User Interface Software and Technology
ICCS: International Conference on Computational Science
ICPR: International Conference on Pattern Recognition
ICLR: International Conference on Learning Representations
ICVRV: International Conference on Virtual Reality and Visualization
3DIMPVT: International Conference on 3D Imaging Modeling, Processing, Visualization, and Transmission
VISIGRAPP: International Joint Conference on Computer Vision Imaging and Computer Graphics Theory and Applications
ICCEIA-VR: International Conference on Computer Engineering and Innovative Application of VR

References

1.
Ulrich
,
K. T.
, and
Eppinger
,
S. D.
,
2016
,
Product Design and Development
, 6th ed.,
McGraw-Hill Education
,
New York
.
2.
Chakrabarti
,
A.
,
Shea
,
K.
,
Stone
,
R.
,
Cagan
,
J.
,
Campbell
,
M.
,
Hernandez
,
N. V.
, and
Wood
,
K. L.
,
2011
, “
Computer-based Design Synthesis Research: An Overview
,”
ASME J. Comput. Inf. Sci. Eng.
,
11
(
2
), p.
021003
.
3.
Regenwetter
,
L.
,
Nobari
,
A. H.
, and
Ahmed
,
F.
,
2022
, “
Deep Generative Models in Engineering Design: A Review
,”
ASME J. Mech. Des.
,
144
(
7
), p.
071704
.
4.
Liu
,
Z.
,
Lin
,
Y.
, and
Sun
,
M.
,
2020
,
Cross-Modal Representation
,
Springer
,
Singapore
.
5.
Smirnov
,
D.
,
Bessmeltsev
,
M.
, and
Solomon
,
J.
,
2021
, “
Learning Manifold Patch-Based Representations of Man-Made Shapes
,”
International Conference on Learning Representations
,
Virtual
,
May 3–7
.
6.
Guillard
,
B.
,
Remelli
,
E.
,
Yvernay
,
P.
, and
Fua
,
P.
,
2021
, “
Sketch2mesh: Reconstructing and Editing 3d Shapes From Sketches
,”
Proceedings of the IEEE/CVF International Conference on Computer Vision
,
Virtual
,
Oct. 11–17
, pp.
13023
13032
.
7.
Otto
,
K. N.
, and
Wood
,
K.
,
2001
,
Product Design: Techniques in Reverse Engineering and New Product Development
,
Prentice Hall
,
Upper Saddle River, NJ
.
8.
Yang
,
M. C.
,
2009
, “
Observations on Concept Generation and Sketching in Engineering Design
,”
Res. Eng. Des.
,
20
(
1
), pp.
1
11
.
9.
Hyun
,
K. H.
, and
Lee
,
J.-H.
,
2018
, “
Balancing Homogeneity and Heterogeneity in Design Exploration by Synthesizing Novel Design Alternatives Based on Genetic Algorithm and Strategic Styling Decision
,”
Adv. Eng. Inform.
,
38
, pp.
113
128
.
10.
Mountstephens
,
J.
, and
Teo
,
J.
,
2020
, “
Progress and Challenges in Generative Product Design: A Review of Systems
,”
Computers
,
9
(
4
), p.
80
.
11.
Ahmed
,
F.
,
Ramachandran
,
S. K.
,
Fuge
,
M. D.
,
Hunter
,
S. T.
, and
Miller
,
S. R.
,
2018
, “
Interpreting Idea Maps: Pairwise Comparisons Reveal What Makes Ideas Novel
,”
ASME J. Mech. Des.
,
141
(
2
), p.
021102
.
12.
Krish
,
S.
,
2011
, “
A Practical Generative Design Method
,”
Comput. Aided Des.
,
43
(
1
), pp.
88
100
.
13.
Pratt
,
M. J.
,
Anderson
,
B. D.
, and
Ranger
,
T.
,
2005
, “
Towards the Standardized Exchange of Parameterized Feature-Based CAD Models
,”
Comput. Aided Des.
,
37
(
12
), pp.
1251
1265
.
14.
Menezes
,
A.
, and
Lawson
,
B.
,
2006
, “
How Designers Perceive Sketches
,”
Des. Stud.
,
27
(
5
), pp.
571
585
.
15.
Xu
,
P.
,
Hospedales
,
T. M.
,
Yin
,
Q.
,
Song
,
Y.-Z.
,
Xiang
,
T.
, and
Wang
,
L.
,
2023
, “
Deep Learning for Free-Hand Sketch: A Survey
,”
IEEE Trans. Pattern Anal. Mach. Intell.
,
45
(
1
), pp.
285
312
.
16.
Ha
,
D.
, and
Eck
,
D.
,
2018
, “
A Neural Representation of Sketch Drawings
,”
International Conference on Learning Representations.
,
Vancouver Convention Center, Vancouver, BC, Canada
,
Apr. 30–May 3
.
17.
Chen
,
K.
,
Choy
,
C. B.
,
Savva
,
M.
,
Chang
,
A. X.
,
Funkhouser
,
T.
, and
Savarese
,
S.
,
2018
, “
Text2shape: Generating Shapes From Natural Language by Learning Joint Embeddings
,”
Asian Conference on Computer Vision
,
Perth, Australia
,
Dec. 2–6
, pp.
100
116
.
18.
Fukamizu
,
K.
,
Kondo
,
M.
, and
Sakamoto
,
R.
,
2019
, “
Generation High Resolution 3d Model From Natural Language by Generative Adversarial Network
,” Preprint arXiv:1901.07165.
19.
Nozawa
,
N.
,
Shum
,
H. P.
,
Ho
,
E. S.
, and
Morishima
,
S.
,
2020
, “
Single Sketch Image Based 3d Car Shape Reconstruction With Deep Learning and Lazy Learning
,”
International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications
,
Valletta, Malta
,
Feb. 27–29
, pp.
179
190
.
20.
Nozawa
,
N.
,
Shum
,
H. P.
,
Feng
,
Q.
,
Ho
,
E. S.
, and
Morishima
,
S.
,
2022
, “
3d Car Shape Reconstruction From a Contour Sketch Using GAN and Lazy Learning
,”
Vis. Comput.
,
38
(
4
), pp.
1317
1330
.
21.
Wendrich
,
R. E.
,
2018
, “
Multiple Modalities, Sensoriums, Experiences in Blended Spaces With Toolness and Tools for Conceptual Design Engineering
,”
International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Vol. 51739
,
Quebec City, Canada
,
Aug. 26–29
,
p. V01BT02A046
.
22.
Song
,
B.
,
Miller
,
S.
, and
Ahmed
,
F.
,
2022
, “
Hey, Ai! Can You See What I See? Multimodal Transfer Learning-Based Design Metrics Prediction for Sketches With Text Descriptionss
,”
ASME 2022 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference
,
St. Louis, MO
,
Aug. 14–17
,
p. V006T06A017
.
23.
Song
,
B.
,
Zurita
,
N. S.
,
Zhang
,
G.
,
Stump
,
G.
,
Balon
,
C.
,
Miller
,
S.
,
Yukish
,
M.
,
Cagan
,
J.
, and
McComb
,
C.
,
2020
, “
Toward Hybrid Teams: A Platform to Understand Human–Computer Collaboration During the Design of Complex Engineered Systems
,”
Proceedings of the Design Society: DESIGN Conference
,
Virtual
,
Oct. 26–29
, pp.
1551
1560
.
24.
Li
,
X.
,
Wang
,
Y.
, and
Sha
,
Z.
,
2022
, “
Deep Learning of Cross-Modal Tasks for Conceptual Design of Engineered Products: A Review
,”
International Design Engineering Technical Conferences and Computers and Information in Engineering Conference
,
St. Louis, MO
,
Aug. 14–17
,
p. V006T06A016
.
25. Chen, W., Chiu, K., and Fuge, M. D., 2020, "Airfoil Design Parameterization and Optimization Using Bézier Generative Adversarial Networks," AIAA J., 58(11), pp. 4723–4735.
26. Oh, S., Jung, Y., Kim, S., Lee, I., and Kang, N., 2019, "Deep Generative Design: Integration of Topology Optimization and Generative Models," ASME J. Mech. Des., 141(11), p. 111405.
27. Dering, M., Cunningham, J., Desai, R., Yukish, M. A., Simpson, T. W., and Tucker, C. S., 2018, "A Physics-Based Virtual Environment for Enhancing the Quality of Deep Generative Designs," ASME 2018 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Quebec City, Quebec, Canada, Aug. 26–29, p. V02AT03A015.
28. Shu, D., Cunningham, J., Stump, G., Miller, S. W., Yukish, M. A., Simpson, T. W., and Tucker, C. S., 2020, "3D Design Using Generative Adversarial Networks and Physics-Based Validation," ASME J. Mech. Des., 142(7), p. 071701.
29. Zhang, W., Yang, Z., Jiang, H., Nigam, S., Yamakawa, S., Furuhata, T., Shimada, K., and Kara, L. B., 2019, "3D Shape Synthesis for Conceptual Design and Optimization Using Variational Autoencoders," International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Anaheim, CA, Aug. 18–21, p. V02AT03A017.
30. Li, X., Xie, C., and Sha, Z., 2021, "Part-Aware Product Design Agent Using Deep Generative Network and Local Linear Embedding," Proceedings of the 54th Hawaii International Conference on System Sciences, Virtual, Jan. 5–8, p. 5250.
31. Brock, A., Lim, T., Ritchie, J. M., and Weston, N., 2016, "Context-Aware Content Generation for Virtual Environments," International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Charlotte, NC, Aug. 21–24, p. V01BT02A045.
32. Qin, F., Qiu, S., Gao, S., and Bai, J., 2022, "3D CAD Model Retrieval Based on Sketch and Unsupervised Variational Autoencoder," Adv. Eng. Inform., 51, p. 101427.
33. Li, X., Xie, C., and Sha, Z., 2022, "A Predictive and Generative Design Approach for Three-Dimensional Mesh Shapes Using Target-Embedding Variational Autoencoder," ASME J. Mech. Des., 144(11), p. 114501.
34. Qi, A., Gryaditskaya, Y., Song, J., Yang, Y., Qi, Y., Hospedales, T. M., Xiang, T., and Song, Y.-Z., 2021, "Toward Fine-Grained Sketch-Based 3D Shape Retrieval," IEEE Trans. Image Process., 30, pp. 8595–8606.
35. Lun, Z., Gadelha, M., Kalogerakis, E., Maji, S., and Wang, R., 2017, "3D Shape Reconstruction From Sketches Via Multi-View Convolutional Networks," 2017 International Conference on 3D Vision (3DV), Qingdao, China, Oct. 10–12, pp. 67–77.
36. Michel, O., Bar-On, R., Liu, R., Benaim, S., and Hanocka, R., 2022, "Text2mesh: Text-Driven Neural Stylization for Meshes," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, June 21–24, pp. 13492–13502.
37. Elgammal, A., Liu, B., Elhoseiny, M., and Mazzone, M., 2017, "CAN: Creative Adversarial Networks, Generating 'Art' by Learning About Styles and Deviating From Style Norms," Proceedings of the 8th International Conference on Computational Creativity, Atlanta, GA, June 19–23.
38. Chen, W., and Ahmed, F., 2021, "PaDGAN: Learning to Generate High-Quality Novel Designs," ASME J. Mech. Des., 143(3), p. 031703.
39. Burnap, A., Liu, Y., Pan, Y., Lee, H., Gonzalez, R., and Papalambros, P. Y., 2016, "Estimating and Exploring the Product Form Design Space Using Deep Generative Models," International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Charlotte, NC, Aug. 21–24, p. V02AT03A013.
40. Judd, G., and Steenkiste, P., 2003, "Providing Contextual Information to Pervasive Computing Applications," Proceedings of the First IEEE International Conference on Pervasive Computing and Communications, Fort Worth, TX, Mar. 23–26, pp. 133–142.
41. Valdez, S., Seepersad, C., and Kambampati, S., 2021, "A Framework for Interactive Structural Design Exploration," International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Virtual, Aug. 17–19, p. V03BT03A006.
42. Starly, B., Angrish, A., and Cohen, P., 2019, "Research Directions in Democratizing Innovation Through Design Automation, One-Click Manufacturing Services and Intelligent Machines," Preprint arXiv:1909.10476.
43. Sanghi, A., Chu, H., Lambourne, J. G., Wang, Y., Cheng, C.-Y., Fumero, M., and Malekshan, K. R., 2022, "Clip-Forge: Towards Zero-Shot Text-to-Shape Generation," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, June 21–24, pp. 18603–18613.
44. Giunchi, D., Sztrajman, A., James, S., and Steed, A., 2021, "Mixing Modalities of 3D Sketching and Speech for Interactive Model Retrieval in Virtual Reality," ACM International Conference on Interactive Media Experiences, Virtual, June 21–23, pp. 144–155.
45. Khan, K. S., Kunz, R., Kleijnen, J., and Antes, G., 2003, "Five Steps to Conducting a Systematic Review," J. Royal Soc. Med., 96(3), pp. 118–121.
46. Kingma, D. P., and Welling, M., 2014, "Auto-Encoding Variational Bayes," International Conference on Learning Representations, Banff, Canada, Apr. 14–16.
47. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y., 2014, "Generative Adversarial Nets," Conference on Neural Information Processing Systems, Montreal, Canada, Dec. 8–13, pp. 2672–2680.
48. Wang, F., Kang, L., and Li, Y., 2015, "Sketch-Based 3D Shape Retrieval Using Convolutional Neural Networks," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, June 8–10, pp. 1875–1883.
49. Chang, A. X., Funkhouser, T., Guibas, L., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., and Su, H., 2015, "Shapenet: An Information-Rich 3D Model Repository," Preprint arXiv:1512.03012.
50. Liu, Z., Wang, Y., Qi, X., and Fu, C.-W., 2022, "Towards Implicit Text-Guided 3D Shape Generation," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, June 21–24, pp. 17896–17906.
51. Jin, A., Fu, Q., and Deng, Z., 2020, "Contour-Based 3D Modeling Through Joint Embedding of Shapes and Contours," Symposium on Interactive 3D Graphics and Games, Virtual, Sept. 14–18, pp. 1–10.
52. Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., and Krueger, G., 2021, "Learning Transferable Visual Models From Natural Language Supervision," International Conference on Machine Learning, Virtual, July 18–24, pp. 8748–8763.
53. Huang, F., and Canny, J. F., 2019, "Sketchforme: Composing Sketched Scenes From Text Descriptions for Interactive Applications," Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology, New Orleans, LA, Oct. 20–23, pp. 209–220.
54. Huang, F., Schoop, E., Ha, D., and Canny, J., 2020, "Scones: Towards Conversational Authoring of Sketches," Proceedings of the 25th International Conference on Intelligent User Interfaces, Cagliari, Italy, Mar. 17–20, pp. 313–323.
55. Li, B., Yuan, J., Ye, Y., Lu, Y., Zhang, C., and Tian, Q., 2021, "3D Sketching for 3D Object Retrieval," Multimedia Tools Appl., 80(6), pp. 9569–9595.
56. Li, C., Pan, H., Liu, Y., Tong, X., Sheffer, A., and Wang, W., 2018, "Robust Flow-Guided Neural Prediction for Sketch-Based Freeform Surface Modeling," ACM Trans. Graph., 37(6), pp. 1–12.
57. Delanoy, J., Aubry, M., Isola, P., Efros, A. A., and Bousseau, A., 2018, "3D Sketching Using Multi-view Deep Volumetric Prediction," Proc. ACM Comput. Graph. Interact. Tech., 1(1), pp. 1–22.
58. Han, X., Gao, C., and Yu, Y., 2017, "Deepsketch2face: A Deep Learning Based Sketching System for 3D Face and Caricature Modeling," ACM Trans. Graph., 36(4), pp. 1–12.
59. Du, D., Zhu, H., Nie, Y., Han, X., Cui, S., Yu, Y., and Liu, L., 2021, "Learning Part Generation and Assembly for Sketching Man-Made Objects," Comput. Graph. Forum, 40(1), pp. 222–233.
60. Luo, Z., Zhou, J., Zhu, H., Du, D., Han, X., and Fu, H., 2021, "Simpmodeling: Sketching Implicit Field to Guide Mesh Modeling for 3D Animalmorphic Head Design," The 34th Annual ACM Symposium on User Interface Software and Technology, Virtual, Oct. 10–14, pp. 854–863.
61. Wang, C., Chai, M., He, M., Chen, D., and Liao, J., 2022, "Clip-nerf: Text-and-Image Driven Manipulation of Neural Radiance Fields," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, June 21–24, pp. 3835–3844.
62. Stemasov, E., Wagner, T., Gugenheimer, J., and Rukzio, E., 2022, "Shapefindar: Exploring In-Situ Spatial Search for Physical Artifact Retrieval Using Mixed Reality," CHI Conference on Human Factors in Computing Systems, New Orleans, LA, Apr. 30–May 5, pp. 1–12.
63. Yuan, S., Dai, A., Yan, Z., Guo, Z., Liu, R., and Chen, M., 2021, "Sketchbird: Learning to Generate Bird Sketches From Text," Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, Oct. 11–17, pp. 2443–2452.
64. Min, P., Kazhdan, M., and Funkhouser, T., 2004, "A Comparison of Text and Shape Matching for Retrieval of Online 3D Models," International Conference on Theory and Practice of Digital Libraries, Bath, UK, Sept. 12–17, pp. 209–220.
65. Haeusser, P., Mordvintsev, A., and Cremers, D., 2017, "Learning by Association-A Versatile Semi-Supervised Training Method for Neural Networks," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, July 22–25, pp. 89–98.
66. Han, Z., Shang, M., Wang, X., Liu, Y.-S., and Zwicker, M., 2019, "Y2seq2seq: Cross-Modal Representation Learning for 3D Shape and Text by Joint Reconstruction and Prediction of View and Word Sequences," Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, Jan. 27–Feb. 1, pp. 126–133.
67. Shilane, P., Min, P., Kazhdan, M., and Funkhouser, T., 2004, "The Princeton Shape Benchmark," Proceedings of Shape Modeling Applications 2004, Genova, Italy, June 7–9, pp. 167–178.
68. Li, B., Lu, Y., Godil, A., Schreck, T., Bustos, B., Ferreira, A., Furuya, T., Fonseca, M. J., Johan, H., Matsuda, T., and Ohbuchi, R., 2014, "A Comparison of Methods for Sketch-Based 3D Shape Retrieval," Comput. Vis. Image Understand., 119, pp. 57–80.
69. Chopra, S., Hadsell, R., and LeCun, Y., 2005, "Learning a Similarity Metric Discriminatively, With Application to Face Verification," IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, June 20–26, pp. 539–546.
70. Zhu, F., Xie, J., and Fang, Y., 2016, "Learning Cross-Domain Neural Networks for Sketch-Based 3D Shape Retrieval," Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, Feb. 12–17, pp. 3683–3689.
71. Dai, G., Xie, J., and Fang, Y., 2018, "Deep Correlated Holistic Metric Learning for Sketch-Based 3D Shape Retrieval," IEEE Trans. Image Process., 27(7), pp. 3374–3386.
72. Dai, G., Xie, J., Zhu, F., and Fang, Y., 2017, "Deep Correlated Metric Learning for Sketch-Based 3D Shape Retrieval," Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, Feb. 4–9, pp. 4002–4008.
73. Chen, J., and Fang, Y., 2018, "Deep Cross-Modality Adaptation Via Semantics Preserving Adversarial Learning for Sketch-Based 3D Shape Retrieval," Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, Sept. 8–14, pp. 605–620.
74. Xia, Y., Wang, S., You, L., and Zhang, J., 2021, "Semantic Similarity Metric Learning for Sketch-Based 3D Shape Retrieval," International Conference on Computational Science, Krakow, Poland, June 16–18, pp. 59–69.
75. Yang, H., Tian, Y., Yang, C., Wang, Z., Wang, L., and Li, H., 2022, "Sequential Learning for Sketch-Based 3D Model Retrieval," Multimedia Syst., 28(3), pp. 761–778.
76. Kaya, M., and Bilge, H. Ş., 2019, "Deep Metric Learning: A Survey," Symmetry, 11(9), p. 1066.
77. Xie, J., Dai, G., Zhu, F., and Fang, Y., 2017, "Learning Barycentric Representations of 3D Shapes for Sketch-Based 3D Shape Retrieval," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, July 21–26, pp. 5068–5076.
78. Chen, J., Qin, J., Liu, L., Zhu, F., Shen, F., Xie, J., and Shao, L., 2019, "Deep Sketch-Shape Hashing With Segmented 3D Stochastic Viewing," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, June 16–20, pp. 791–800.
79. Niu, Z., Zhong, G., and Yu, H., 2021, "A Review on the Attention Mechanism of Deep Learning," Neurocomputing, 452, pp. 48–62.
80. Liang, S., Dai, W., and Wei, Y., 2021, "Uncertainty Learning for Noise Resistant Sketch-Based 3D Shape Retrieval," IEEE Trans. Image Process., 30, pp. 8632–8643.
81. Liu, Q., and Zhao, S., 2021, "Guidance Cleaning Network for Sketch-Based 3D Shape Retrieval," J. Phys.: Conf. Ser., 1961(1), p. 012072.
82. Li, B., Lu, Y., Godil, A., Schreck, T., Aono, M., Johan, H., Saavedra, J. M., and Tashiro, S., 2013, "SHREC'13 Track: Large Scale Sketch-Based 3D Shape Retrieval," Proceedings of the Sixth Eurographics Workshop on 3D Object Retrieval, Girona, Spain, May 11, pp. 89–96.
83. Li, B., Lu, Y., Li, C., Godil, A., Schreck, T., Aono, M., Burtscher, M., Fu, H., Furuya, T., Johan, H., and Liu, J., 2014, "SHREC'14 Track: Extended Large Scale Sketch-Based 3D Shape Retrieval," Eurographics Workshop on 3D Object Retrieval, Strasbourg, France, Apr. 6, pp. 121–130.
84. Su, H., Maji, S., Kalogerakis, E., and Learned-Miller, E., 2015, "Multi-View Convolutional Neural Networks for 3D Shape Recognition," Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, Dec. 7–13, pp. 945–953.
85. Navarro, P., Orlando, J. I., Delrieux, C., and Iarussi, E., 2021, "Sketchzooms: Deep Multi-View Descriptors for Matching Line Drawings," Comput. Graph. Forum, 40(1), pp. 410–423.
86. Manda, B., Dhayarkar, S., Mitheran, S., Viekash, V., and Muthuganapathy, R., 2021, "'Cadsketchnet'—An Annotated Sketch Dataset for 3D CAD Model Retrieval With Deep Neural Networks," Comput. Graph., 99, pp. 100–113.
87. Jayanti, S., Kalyanaraman, Y., Iyer, N., and Ramani, K., 2006, "Developing an Engineering Shape Benchmark for CAD Models," Comput. Aided Des., 38(9), pp. 939–953.
88. Kim, S., Chi, H.-G., Hu, X., Huang, Q., and Ramani, K., 2020, "A Large-Scale Annotated Mechanical Components Benchmark for Classification and Retrieval Tasks With Deep Neural Networks," European Conference on Computer Vision, Virtual, Aug. 23–28, pp. 175–191.
89. Ye, Y., Li, B., and Lu, Y., 2016, "3D Sketch-Based 3D Model Retrieval With Convolutional Neural Network," International Conference on Pattern Recognition, Cancun, Mexico, Dec. 4–8, pp. 2936–2941.
90. Yang, Y., and Hospedales, T. M., 2015, "Deep Neural Networks for Sketch Recognition," Preprint arXiv:1501.07873.
91. Li, B., Lu, Y., Duan, F., Dong, S., Fan, Y., Qian, L., Laga, H., Li, H., Li, Y., Lui, P., and Ovsjanikov, M., 2016, "SHREC'16 Track: 3D Sketch-Based 3D Shape Retrieval," Proceedings of the Eurographics 2016 Workshop on 3D Object Retrieval, Lisbon, Portugal, May 8, pp. 47–54.
92. Giunchi, D., James, S., and Steed, A., 2018, "3D Sketching for Interactive Model Retrieval in Virtual Reality," Proceedings of the Joint Symposium on Computational Aesthetics and Sketch-Based Interfaces and Modeling and Non-Photorealistic Animation and Rendering, Victoria, Canada, Aug. 17–19, pp. 1–12.
93. Jahan, T., Guan, Y., and van Kaick, O., 2021, "Semantics-Guided Latent Space Exploration for Shape Generation," Comput. Graph. Forum, 40(2), pp. 115–126.
94. Wang, Y., Asafi, S., Van Kaick, O., Zhang, H., Cohen-Or, D., and Chen, B., 2012, "Active Co-analysis of a Set of Shapes," ACM Trans. Graph., 31(6), pp. 1–10.
95. Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., and Xiao, J., 2015, "3D Shapenets: A Deep Representation for Volumetric Shapes," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, June 7–12, pp. 1912–1920.
96. Arjovsky, M., Chintala, S., and Bottou, L., 2017, "Wasserstein Generative Adversarial Networks," International Conference on Machine Learning, Sydney, Australia, Aug. 6–11, pp. 214–223.
97. Li, B., Yu, Y., and Li, Y., 2020, "Lbwgan: Label Based Shape Synthesis From Text With WGANs," 2020 International Conference on Virtual Reality and Visualization (ICVRV), Recife, Brazil, Nov. 13–14, pp. 47–52.
98. Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., and Geiger, A., 2019, "Occupancy Networks: Learning 3D Reconstruction in Function Space," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, June 16–20, pp. 4460–4470.
99. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I., 2017, "Attention Is All You Need," Conference on Neural Information Processing Systems, Long Beach, CA, Dec. 4–9, pp. 5998–6008.
100. Xian, Y., Lampert, C. H., Schiele, B., and Akata, Z., 2018, "Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly," IEEE Trans. Pattern Anal. Mach. Intell., 41(9), pp. 2251–2265.
101. Dinh, L., Sohl-Dickstein, J., and Bengio, S., 2017, "Density Estimation Using Real NVP," International Conference on Learning Representations, Toulon, France, Apr. 24–26.
102. Jain, A., Mildenhall, B., Barron, J. T., Abbeel, P., and Poole, B., 2022, "Zero-Shot Text-Guided Object Generation With Dream Fields," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, June 21–24, pp. 867–876.
103. Mildenhall, B., Srinivasan, P. P., Tancik, M., Barron, J. T., Ramamoorthi, R., and Ng, R., 2020, "Nerf: Representing Scenes as Neural Radiance Fields for View Synthesis," European Conference on Computer Vision, Virtual, Aug. 23–28, pp. 405–421.
104. Frolov, S., Hinz, T., Raue, F., Hees, J., and Dengel, A., 2021, "Adversarial Text-to-Image Synthesis: A Review," Neural Netw., 144, pp. 187–209.
105. Wang, Y., Chang, L., Cheng, Y., Jin, L., Cheng, Z., Deng, X., and Duan, F., 2018, "Text2sketch: Learning Face Sketch From Facial Attribute Text," 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, Oct. 7–10, pp. 669–673.
106. Wah, C., Branson, S., Welinder, P., Perona, P., and Belongie, S., 2010, "Caltech-UCSD Birds 200," California Institute of Technology, CNS-TR-2010-001.
107. Krishna, R., Zhu, Y., Groth, O., Johnson, J., Hata, K., Kravitz, J., Chen, S., Kalantidis, Y., Li, L.-J., Shamma, D. A., Bernstein, M. S., and Fei-Fei, L., 2017, "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations," Int. J. Comput. Vis., 123(1), pp. 32–73.
108. Jongejan, J., Rowley, H., Kawashima, T., Kim, J., and Fox-Gieg, N., 2016, "The Quick, Draw!-AI Experiment," Mountain View, CA, Accessed February 17, 2018.
109. Olsen, L., Samavati, F. F., Sousa, M. C., and Jorge, J. A., 2009, "Sketch-Based Modeling: A Survey," Comput. Graph., 33(1), pp. 85–103.
110. Nishida, G., Garcia-Dorado, I., Aliaga, D. G., Benes, B., and Bousseau, A., 2016, "Interactive Sketching of Urban Procedural Models," ACM Trans. Graph., 35(4), pp. 1–11.
111. He, Y., Xie, H., Zhang, C., Yang, X., and Miyata, K., 2021, "Sketch-Based Normal Map Generation With Geometric Sampling," International Workshop on Advanced Imaging Technology (IWAIT), Virtual, Jan. 5–6, pp. 261–266.
112. Su, W., Du, D., Yang, X., Zhou, S., and Fu, H., 2018, "Interactive Sketch-Based Normal Map Generation With Deep Neural Networks," Proc. ACM Comput. Graph. Interact. Tech., 1(1), pp. 1–17.
113. Aha, D. W., 2013, Lazy Learning, 1st ed., Springer Science & Business Media, Dordrecht, Netherlands.
114. Delanoy, J., Coeurjolly, D., Lachaud, J.-O., and Bousseau, A., 2019, "Combining Voxel and Normal Predictions for Multi-view 3D Sketching," Comput. Graph., 82, pp. 65–72.
115. Yang, K., Lu, J., Hu, S., and Chen, X., 2021, "Deep 3D Modeling of Human Bodies From Freehand Sketching," International Conference on Multimedia Modeling, Phu Quoc, Vietnam, June 6–10, pp. 36–48.
116. Pavlakos, G., Choutas, V., Ghorbani, N., Bolkart, T., Osman, A. A., Tzionas, D., and Black, M. J., 2019, "Expressive Body Capture: 3D Hands, Face, and Body From a Single Image," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, June 16–20, pp. 10975–10985.
117. Cao, C., Weng, Y., Zhou, S., Tong, Y., and Zhou, K., 2013, "Facewarehouse: A 3D Facial Expression Database for Visual Computing," IEEE Trans. Vis. Comput. Graph., 20(3), pp. 413–425.
118. Wang, F., Yang, Y., Zhao, B., Jiang, D., Chen, S., and Sheng, J., 2021, "Reconstructing 3D Model From Single-View Sketch With Deep Neural Network," Wireless Commun. Mobile Comput., 2021, Article ID 5577530.
119. Park, J. J., Florence, P., Straub, J., Newcombe, R., and Lovegrove, S., 2019, "Deepsdf: Learning Continuous Signed Distance Functions for Shape Representation," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, June 16–20, pp. 165–174.
120. Zhang, S.-H., Guo, Y.-C., and Gu, Q.-W., 2021, "Sketch2model: View-Aware 3D Modeling From Single Free-Hand Sketches," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, June 19–25, pp. 6012–6021.
121. Wang, L., Qian, C., Wang, J., and Fang, Y., 2018, "Unsupervised Learning of 3D Model Reconstruction From Hand-Drawn Sketches," Proceedings of the 26th ACM International Conference on Multimedia, Seoul, South Korea, Oct. 22–26, pp. 1820–1828.
122. Smirnov, D., Bessmeltsev, M., and Solomon, J., 2019, "Deep Sketch-Based Modeling of Man-Made Shapes," Preprint arXiv:1906.12337.
123. Gao, L., Yang, J., Wu, T., Yuan, Y.-J., Fu, H., Lai, Y.-K., and Zhang, H., 2019, "Sdm-net: Deep Generative Network for Structured Deformable Mesh," ACM Trans. Graph., 38(6), pp. 1–15.
124. Mo, K., Guerrero, P., Yi, L., Su, H., Wonka, P., Mitra, N. J., and Guibas, L. J., 2019, "Structurenet: Hierarchical Graph Networks for 3D Shape Generation," ACM Trans. Graph., 38(6), pp. 1–19.
125. Chen, W., and Fuge, M., 2019, "Synthesizing Designs With Interpart Dependencies Using Hierarchical Generative Adversarial Networks," ASME J. Mech. Des., 141(11), p. 111403.
126. Qi, C. R., Su, H., Mo, K., and Guibas, L. J., 2017, "Pointnet: Deep Learning on Point Sets for 3D Classification and Segmentation," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, July 22–25, pp. 652–660.
127. Yang, M. C., 2003, "Concept Generation and Sketching: Correlations With Design Outcome," International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Chicago, IL, Aug. 3–6, pp. 829–834.
128. Wu, R., Xiao, C., and Zheng, C., 2021, "Deepcad: A Deep Generative Network for Computer-Aided Design Models," Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, Oct. 11–17, pp. 6772–6782.
129. Para, W., Bhat, S., Guerrero, P., Kelly, T., Mitra, N., Guibas, L. J., and Wonka, P., 2021, "Sketchgen: Generating Constrained CAD Sketches," Advances in Neural Information Processing Systems, Virtual, Dec. 6–14, pp. 5077–5088.
130. Ganin, Y., Bartunov, S., Li, Y., Keller, E., and Saliceti, S., 2021, "Computer-Aided Design as Language," Conference on Neural Information Processing Systems, Virtual, Dec. 6–12, pp. 5885–5897.
131. Willis, K. D., Jayaraman, P. K., Lambourne, J. G., Chu, H., and Pu, Y., 2021, "Engineering Sketch Generation for Computer-Aided Design," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, June 19–25, pp. 2105–2114.
132. Jayaraman, P. K., Sanghi, A., Lambourne, J. G., Willis, K. D., Davies, T., Shayani, H., and Morris, N., 2021, "Uv-net: Learning From Boundary Representations," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, June 19–25, pp. 11703–11712.
133. Koch, S., Matveev, A., Jiang, Z., Williams, F., Artemov, A., Burnaev, E., Alexa, M., Zorin, D., and Panozzo, D., 2019, "Abc: A Big CAD Model Dataset for Geometric Deep Learning," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, June 15–20, pp. 9601–9611.
134. Seff, A., Ovadia, Y., Zhou, W., and Adams, R. P., 2020, "Sketchgraphs: A Large-Scale Dataset for Modeling Relational Geometry in Computer-Aided Design," Preprint arXiv:2007.08506.
135. Gryaditskaya, Y., Sypesteyn, M., Hoftijzer, J. W., Pont, S. C., Durand, F., and Bousseau, A., 2019, "Opensketch: A Richly-Annotated Dataset of Product Design Sketches," ACM Trans. Graph., 38(6), Article 232.
136. Regenwetter, L., Curry, B., and Ahmed, F., 2021, "Biked: A Dataset and Machine Learning Benchmarks for Data-Driven Bicycle Design," International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Virtual, Aug. 17–19, p. V03AT03A019.
137. Fuge, M., 2022, "The Frontiers in Design Representation (FinDeR) Summer School," https://ideal.umd.edu/FinDeR/, Accessed October 1, 2022.
138. Li, X., Demirel, H. O., Goldstein, M. H., and Sha, Z., 2021, "Exploring Generative Design Thinking for Engineering Design and Design Education," 2021 ASEE Midwest Section Conference, Virtual, Sept. 13–15.
139. Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L., 2014, "Microsoft Coco: Common Objects in Context," European Conference on Computer Vision, Zurich, Switzerland, Sept. 6–12, pp. 740–755.
140. Chen, Z., and Zhang, H., 2019, "Learning Implicit Fields for Generative Shape Modeling," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, June 16–20, pp. 5939–5948.
141. Kim, J.-H., Kitaev, N., Chen, X., Rohrbach, M., Zhang, B.-T., Tian, Y., Batra, D., and Parikh, D., 2019, "Codraw: Collaborative Drawing as a Testbed for Grounded Goal-Driven Communication," Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, July 28–Aug. 2, pp. 6495–6513.
142. Zhang, W., Wang, X., and Tang, X., 2011, "Coupled Information-Theoretic Encoding for Face Photo-Sketch Recognition," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, June 20–25, pp. 513–520.
143. Li, J., Xu, K., Chaudhuri, S., Yumer, E., Zhang, H., and Guibas, L., 2017, "Grass: Generative Recursive Autoencoders for Shape Structures," ACM Trans. Graph., 36(4), pp. 1–14.
144. Feng, Y., Zhang, Z., Zhao, X., Ji, R., and Gao, Y., 2018, "GVCNN: Group-View Convolutional Neural Networks for 3D Shape Recognition," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, June 18–22, pp. 264–272.
145. Kanezaki, A., Matsushita, Y., and Nishida, Y., 2019, "Rotationnet for Joint Object Categorization and Unsupervised Pose Estimation From Multi-view Images," IEEE Trans. Pattern Anal. Mach. Intell., 43(1), pp. 269–283.
146. Shajahan, D. A., Nayel, V., and Muthuganapathy, R., 2019, "Roof Classification From 3-D Lidar Point Clouds Using Multiview CNN With Self-attention," IEEE Geosci. Remote Sens. Lett., 17(8), pp. 1465–1469.
147. Qi, A., Song, Y.-Z., and Xiang, T., 2018, "Semantic Embedding for Sketch-Based 3D Shape Retrieval," British Machine Vision Conference, Newcastle, UK, Sept. 2–5, pp. 11–12.
148. Darom, T., and Keller, Y., 2012, "Scale-Invariant Features for 3-D Mesh Models," IEEE Trans. Image Process., 21(5), pp. 2758–2769.
149. Umetani, N., 2017, "Exploring Generative 3D Shapes Using Autoencoder Networks," SIGGRAPH Asia 2017 Technical Briefs, Bangkok, Thailand, Nov. 27–30, pp. 1–4.
150. Mo, K., Zhu, S., Chang, A. X., Yi, L., Tripathi, S., Guibas, L. J., and Su, H., 2019, "Partnet: A Large-Scale Benchmark for Fine-Grained and Hierarchical Part-Level 3D Object Understanding," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, June 16–20, pp. 909–918.
151. Remelli, E., Lukoianov, A., Richter, S., Guillard, B., Bagautdinov, T., Baque, P., and Fua, P., 2020, "Meshsdf: Differentiable Iso-surface Extraction," Conference on Neural Information Processing Systems, Virtual, Dec. 6–12, pp. 22468–22478.
152. Kar, A., Häne, C., and Malik, J., 2017, "Learning a Multi-view Stereo Machine," Conference on Neural Information Processing Systems, Long Beach, CA, Dec. 4–9, pp. 365–376.
153. Sangkloy, P., Burnell, N., Ham, C., and Hays, J., 2016, "The Sketchy Database: Learning to Retrieve Badly Drawn Bunnies," ACM Trans. Graph., 35(4), pp. 1–12.
154. Eitz, M., Hays, J., and Alexa, M., 2012, "How Do Humans Sketch Objects?," ACM Trans. Graph., 31(4), pp. 1–10.
155. Mahmood, N., Ghorbani, N., Troje, N. F., Pons-Moll, G., and Black, M. J., 2019, "Amass: Archive of Motion Capture as Surface Shapes," Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, South Korea, Oct. 27–Nov. 2, pp. 5442–5451.
156. Chen, X., Golovinskiy, A., and Funkhouser, T., 2009, "A Benchmark for 3D Mesh Segmentation," ACM Trans. Graph., 28(3), pp. 1–12.
157. Park, K., Rematas, K., Farhadi, A., and Seitz, S. M., 2018, "Photoshape: Photorealistic Materials for Large-Scale Shape Collections," ACM Trans. Graph., 37(6), pp. 1–12.
158. Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., and Koltun, V., 2017, "Carla: An Open Urban Driving Simulator," Conference on Robot Learning, Mountain View, CA, Nov. 13–15, pp. 1–16.
159. Zhou, Q., and Jacobson, A., 2016, "Thingi10k: A Dataset of 10,000 3D-Printing Models," Preprint arXiv:1605.04797.