Abstract

In an increasingly interconnected and cyber-physical world, complexity is often cited as the root cause of adverse project outcomes, including cost overruns and schedule delays. This realization has prompted calls for better complexity management, which hinges on the ability to recognize and measure complexity early in the design process. However, while numerous complexity measures (CMs) have been promulgated, there is limited agreement about “how” complexity should be measured and what a good measure should entail. In this paper, we propose a framework for benchmarking CMs in terms of how well they are able to detect systematic variation along key aspects of complexity growth. Specifically, the literature is consistent in expecting that complexity growth is correlated with increases in size, number of interconnections, and randomness of the system architecture. Therefore, to neutrally compare six representative CMs, we synthetically create a set of system architectures that systematically vary across each dimension. We find that none of the measures are able to detect changes in all three dimensions simultaneously, though several are consistent in their response to one or two. We also find that there is a dichotomy in the literature regarding the archetype of systems that are considered complex: CMs developed by researchers focused on physics-based systems (e.g., aircraft) tend to emphasize interconnectedness and structure, whereas those developed by researchers focused on flow-based systems (e.g., the power grid) emphasize size. Our findings emphasize the need for more careful validation across proposed measures. Our framework provides a path to enable shared progress towards the goal of better complexity management.

1 Introduction

Understanding system complexity is of interest to scholars and practitioners because it is thought to be a primary driver of adverse project outcomes, including cost, schedule, and scope overruns [18]. For example, the James Webb Space Telescope, which is often labeled as “one of NASA’s most complex programs” to date, is famously several billion dollars over budget and many years delayed partially because of its “technical complexity” [9,10]. In other words, the cause of these overruns is attributed to unmanaged and/or uncharacterized complexity. This realization has prompted calls for better complexity management, which hinges on the ability to recognize and measure complexity early in the design process [8,11–14]. Complexity measurement can also serve an important role throughout the product lifecycle [15]. Agreed-upon complexity measures (CMs) would enable designers to identify risks to cost, schedule, and scope overruns earlier, and choose system alternatives that avoid complexity-induced issues outright while meeting the necessary requirements [3,7,12,16–18].

Since Simon’s seminal work that defined and established the construct of complexity as an emergent property of systems that arises from parts interacting in non-simple ways [19], scholars across engineering, computer science, business, and the natural sciences have sought to measure the complexity of systems [1,11,12,20,21]. However, despite significant scholarly attention, there exists no agreed-upon methodology for measuring complexity, or even consensus on what a good CM would entail. A review of the Engineering Design and Systems Engineering (EDSE) literature reveals that over 50 CMs have been articulated [11,20,22–24] in the last four decades, and more are produced every year. However, there is no apparent progress towards convergence on measurement procedures [7,8,12,17,18,25,26]. The literature includes a wide variety in both approaches and intent, all under the banner of complexity measurement. Rather than continuing to promulgate new CMs, we contend that there is a need to take a step back and take stock of existing measures.

To that end, this paper proposes a neutral approach to benchmarking CMs and applies it to assess the extent to which a representative subset of CMs is able to capture agreed-upon aspects of the underlying complexity construct. The proposed framework relies on the idea that while we observed wide variation in the measurement approaches taken by proposed metrics, there is broad (albeit not previously synthesized) agreement about what complexity growth entails. Therefore, our analysis begins by synthesizing a set of commonly held beliefs about complexity growth from the literature. There is broad agreement that a valid CM should be able to detect and evaluate the complexity growth resulting from changes in the system’s size and level of interconnection, and identify complexity reduction when structure (e.g., modularity) is introduced into the system’s architecture. We contend that, at minimum, a good CM should be able to detect complexity impacts due to each of these attributes.

To assess the state of the literature, we select a set of representative CMs and apply them to synthetic system architectures that systematically vary each of the key dimensions: size, interconnectedness, and structure. The CM selection was supported by a systematic review of the literature that focused on documenting the numerous ways that CMs operationalize complexity. This enables us to choose a representative set of measures that capture the predominant measurement approaches across the literature. The synthetic architectures enable a series of controlled tests for exploring whether or not each CM selected for this study is able to detect changes in system complexity.

We found that none of the investigated CMs are able to correctly detect all of the induced changes, nor do they reach a consensus among themselves. We explain these disparities in terms of the intellectual origins of the CMs. Specifically, we identified a dichotomy of physics-based versus flow-based perspectives of complexity. CMs derived from these two perspectives exhibit similarities in terms of how they consistently consider architectures that are archetypical of their origin domain as more complex compared to the others. This suggests that examining complexity measurement across all engineering systems may be masking some within-group agreement. In addition, by virtue of the proposed benchmarking approach, we documented that many of the fundamental architectural assumptions related to complexity, such as establishing a modular structure, are not detected by many of the CMs. We conclude with a call for more clarity around how a CM reacts to a range of system attributes before the CM is applied throughout the literature. The proposed benchmarking framework is one viable approach to do this.

2 Background

There have been a large number of attempts to measure the complexity of engineered systems. Proposed strategies to measure complexity generally focus on assessing difficulties throughout a system’s life cycle [14,19,27–32]. These difficulties include those that are associated with a system’s description [2,25,33,34], the prediction of its behavior [35,36], assembling the system [7,12,19,37–40], designing the components [2–5,7,8,16,18,41–48], or delivering its specific functionality [1,49–51].

Although there is a wide range of measurement strategies across the literature, we identified broad consensus about a few key attributes, which we term commonly held beliefs. In this review, we focus on establishing the basis for the key aspects of our framework. To that end, Sec. 2.1 focuses on establishing the commonly held beliefs. To do this, we discuss the origins of the complexity measurement concept, how it diffused into the EDSE community, and document the few beliefs about complexity that are commonly held in the EDSE literature. Section 2.2 selects a subset of representative CMs for further analysis. To do this, we explore the diverging perspectives regarding how the phenomenon of complexity should be measured. Finally, Sec. 2.3 contextualizes our proposed benchmarking approach in terms of the strategies adopted by previous efforts, and outlines how and why it has been difficult to bring the community towards convergence.

2.1 Commonly Held Beliefs Regarding Which System Attributes Drive Complexity.

To understand the current state of complexity measurement in EDSE, we reviewed the literature and traced the origins of CMs used in this discipline. Complexity measurement concepts relevant to the modern EDSE literature were first studied in the 1960s as part of the issue of computability in mathematics [52,53]. These first CMs focused on aspects of a math problem or the computer program used to solve that math problem, such as the length of the software [33,34], the number of computational paths [17], the unique and total number of functions and variables [25,54], and the variety and variability of system states [55]. As the engineering design community began to study what made their systems complex, they borrowed from the computer science literature [2,18,24,28,49], developed their own measures [1,56,57], or looked to other fields such as graph theory [5,7,8,12,38,58] and modern physics [8,59] for inspiration.

Despite these different origins and variety of viewpoints, we found wide agreement in the EDSE literature that the complexity of an engineered system is dependent on three architectural properties: size, interconnection, and structure. These commonly held beliefs are supported regardless of what aspect of the engineered system is being studied in terms of its instantiation (or how it is built), the description of the system, the functionality it must provide, or the states that it can have. While there are other beliefs in the literature about what drives complexity, such as the number of feedback loops within the structure [5,60,61], we did not find them as widely held as these three.

First and foremost, we found widespread support for the commonly held belief that a system’s complexity grows as the size of the system increases in terms of elements or parts [11,14,19,20,24,30,31,62]. With increasing size, more parts need to be designed and developed, increasing the complexity related to the system as built [2–4,7,8,12,16,41–48]. Furthermore, as more elements are added, more information must be communicated to fully describe the system such that someone can understand it, increasing the descriptive complexity of the system [26,34,47,63]. Additionally, as systems grow in size in terms of their potential states, these states can further interact, generating more and more unique system states and increasing the difficulty and complexity of predicting the system’s state [2,29,47,49]. Finally, as systems grow in the number of functions that must be delivered, the functional complexity of the system increases because there is more that it needs to do [1,51]. Thus, we find support for a commonly held belief that if a system increases in size, we expect complexity to increase and a valid CM should capture this change.

Second, there is a consensus that increasing the interconnections in a system should increase its complexity [11,14,19,20,24,30,31,62,64]. Increasing the number of interconnections within a system translates to more interfaces between mechanical, electrical, and computational elements that need to be managed, which increases the complexity of the system’s design [2–4,7,8,12,16,18,41,42,45–48]. More interconnections within a system also mean that more description must be conveyed so that the system can be fully understood [26]. Furthermore, greater interconnectedness causes more coupling between the probability distributions that define the system’s states, which makes it more difficult to describe the system or predict its behavior [65]. Finally, an increased level of interconnections generally means that there are more sub-functions and interdependencies between the functions, which render it more difficult to provide the desired functionality [1,51]. Thus, complexity, and by extension valid complexity measurement, should increase with the level of interconnections (or number of interconnections).

Finally, we found a commonly held belief that “structuring” or architecting a system’s components to minimize their random interconnections, such as establishing a modular structure, reduces its complexity [11,14,19,20,24,27,29,30,62,66,67]. As a system is structured away from a random structure of interactions and elements, there are fewer indirect coupling effects from component changes, which makes design easier [3–5,7,8,12,41–43,46,48]. Furthermore, the complexity is reduced as the system can be represented with less information and simplified [68], which reduces the description difficulty [26]. Also, establishing a structure reduces the entropy associated with the system, constraining the propagating effects of changes and failures to smaller subsets of the larger system such as designed modules and clusters [29,47]. Additionally, making the functions more modular improves the independence of the functions and contributes to the reduction in the system’s overall complexity [1,49,51]. Consequently, one would expect a CM to see a decrease in complexity when a system is made modular, regardless of the specific mode of modularization used (pure, bus, sectional, sequential, or other).

2.2 Diverging Perspectives on How to Measure Complexity.

Despite the aforementioned agreement with respect to the impact of complexity and which system attributes drive it, we found much more variety and divergence in terms of how to measure complexity. During our literature review, we identified 68 CMs. These focused on different aspects of the system and the context in which it was created, including the design programs and projects [43,69–71], the specific design problem and associated requirements documentation [12,43], the fabrication process [35,37,72], or the engineered system itself [1,2,7,8,12,16,18,26,29,41,45–49,51,63,65,73]. For the scope of this paper, we focus on the last category, the engineered system itself, as it is the primary concern of the EDSE literature.

Within this subset, we examined variation in two dimensions: the aspects of the system being evaluated and the operationalization used in the metric. In terms of aspects, we observe three categories of focus across the EDSE literature: (1) the physical artifact, in terms of its components, interfaces, and the way they fit together [2–5,7,12,16–18,41–48]; (2) the way the problem is described [25,26,33,34,47,54]; and (3) how the system elements interact to deliver the desired functionality and performance [1,2,29,47,49,50]. In the rest of the discussion, we focus only on the first two categories of aspects since that is the emphasis of the EDSE literature. Moreover, the functional perspective tends to rely on system representations that are incompatible with the benchmarking framework adopted in this paper.

In terms of operationalizations, we again observed three main categories: (1) raw count of system elements, including physical parts or informational bits [7,12,18,33,34,43,44,47]; (2) composite counts that assign weights to multiple constituent elements, such as a weighted measure of components, interfaces, or interactions [2–4,8,17,25,41,42,45,47,48,54,74]; and (3) more sophisticated measures that sought to capture the overall structure [5,7,12,26,46]. These had more variety in their approach, ranging from topology to entropy measures.

Table 1 provides an overview of how the CMs discussed in our literature review fell along these two dimensions. In the remainder of our benchmarking analysis, we will focus on the six CMs bolded in Table 1. As shown in the table, these cover the space of aspect and operationalization including one from each of the represented perspectives, plus an additional sample from the most popular perspective. The six include McCabe’s cyclomatic complexity (MCC) [17], Halstead’s volume measure [25,54], Broniatowski and Moses’ descriptive CM [26], Hölttä-Otto interface complexity [18], Summers and Shah’s coupling complexity [7,12], and Sinha et al.’s structural complexity [8,74]. In addition to being representative of the current literature, these measures are some of the most highly cited measures in EDSE.

Table 1

EDSE CMs by aspects of the system evaluated and operationalization (papers with multiple measures are indicated with repeated citation numbers and these multiple measures are not considered part of an index, but rather multiple views of the same construct. Bolded measures are the measures used in this analysis)

| Operationalization | Physical artifact | Description |
| --- | --- | --- |
| Raw count | [1,**18**,43,44,47] | |
| Combined weights and counts | [2–4,7,**8**,12,16,**17**,41,42,45,48,74] | [**25**,33,34,47,54] |
| Topology/path/entropy | [5,**7**,12,46] | [**26**] |

In Table 2, we provide supplementary information for each of the selected CMs to illustrate how they work and how their operationalizations differ. In the remainder of this work, we refer to the CMs by the abbreviations in the left column. The set includes the first CMs proposed, MCC and the Halstead-derived volume measure (HVM); a modern engineering systems interpretation of Kolmogorov complexity, Broniatowski and Moses’ descriptive complexity (BDC); and an instantiation of early software CMs, Hölttä–Otto interface complexity (HIC). It also includes two modern and highly cited EDSE measures, Sinha et al.’s structural complexity (SSC) and Summers et al.’s coupling complexity (SCC). Some of these measures are already being used as a foundation for further analysis [42,48,77], though widespread adoption has not yet happened.

In the equations in Table 2, $E$ represents the number of interfaces, $E_u$ the number of unique interfaces, $E_{ij}$ the difficulty of an interface between components $i$ and $j$, $N$ the number of components, $N_u$ the number of unique components, $N_i$ the difficulty of component $i$, $P$ the number of connected components of the graph, $L_i$ the number of levels in a system decomposition, $M_{ij}$ the number of set sizes of a given length, $S_n$ the set size, $\mathrm{Eng}(A)$ the graph energy of $A$, and $A$ an adjacency matrix or design structure matrix (DSM) of all components and interfaces. We assume that both difficulty terms, $E_{ij}$ and $N_i$, are equal to 1 for simplicity. All of this information can be captured using a DSM, which means that most of these CMs can be automated in software. The exception is BDC, which requires some manual analysis to classify the structure of the system as a tree, grid, team, or filled structure.

One CM, Summers et al.’s coupling complexity (SCC), is actually one of a set of nine measures that are designed to investigate complexity throughout the entirety of the engineering design process and to provide multiple, different views of complexity in engineered systems. Its authors recommend using multiple CMs to develop a complete view, including measures of size (e.g., Halstead-derived volume). Each of the other measures is presented as a standalone measure of engineering complexity.

Table 2

Selection of CMs and the justification for including in this study

| Measure name and abbreviation | Measuring complexity is based on … | How do they operationalize complexity? | The intended use of the CM … | Justification for including in this study |
| --- | --- | --- | --- | --- |
| MCC [17] | How many pathways a program can take through a decision tree | $C_{MCC} = E - N + 2P$ (1) | Modularize to decrease the number of pathways a module has such that the code is maintainable | Published in 1976, the first CM to utilize the graph theory conceptualization. Cited 7735 times and by most CMs in the EDSE literature |
| HVM [25,54] | How much information is required to describe a system’s parts and interfaces | $C_{HVM} = (N + E)\log(N_u + E_u)$ (2) | Modularize to decrease system size such that mental comparison fatigue is reduced | Published in 1977, one of the earliest examples of the descriptive perspective. Cited 4223 times and inspired many CMs such as [2,7,12] |
| BDC [26] | How much information is required to describe the structure of the system | $C_{BDC} = \mathrm{Size}(\text{Description String})$ (3) | Modularize to abstract the system architecture such that the system can be understood with less cognitive effort | A modern and unique implementation of Kolmogorov complexity for EDSE [33,34,75] |
| HIC [18] | How much a change in one component can affect another component through its interfaces | $C_{HIC} = \sum E_{ij}$ (4) | Modularize to minimize the impact of changes in terms of rework time | Only CM that was inspired by a professional standard (IEEE Standard 982.1-1988). Cited 112 times |
| SCC [7,12] | How much the system can be decomposed | $C_{SCC} = L_i \cdot M_{ij} \cdot S_n$ (5) | Assess system difficulty, compare systems, and help modularization and decomposition efforts | Original formulation using graph theory, inspired by earlier work [1,2,29,49] and highly cited in the EDSE literature [76] |
| SSC [8,74] | How dense and centralized the structure is in addition to the difficulties of component and interface development | $C_{SSC} = \sum_{i} N_i + \left(\sum_{i}\sum_{j} E_{ij}\right)\frac{\mathrm{Eng}(A)}{N}$ (6) | Predict system and subsystem costs and support modularization | Draws directly on Schrödinger’s equation and graph theory. Widely used to study modularity and granularity [42,48,77] |
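To make the operationalizations in Table 2 more concrete, the following is a minimal sketch of how three of the measures (MCC, HIC, and SSC) might be computed from a binary, undirected DSM when both difficulty terms are set to 1. Several choices here are assumptions rather than details taken from the cited works: the function names are hypothetical, MCC is read as McCabe's cyclomatic number E − N + 2P with P counted as connected components, each undirected interface is counted once, and the graph energy is taken as the sum of the absolute eigenvalues of the symmetric DSM.

```python
import numpy as np


def interface_count(dsm: np.ndarray) -> int:
    """E: number of interfaces, counting each undirected interface once."""
    return int(np.triu(dsm, k=1).sum())


def count_components(dsm: np.ndarray) -> int:
    """P: number of connected components, found with a depth-first search."""
    n = dsm.shape[0]
    seen = np.zeros(n, dtype=bool)
    parts = 0
    for start in range(n):
        if seen[start]:
            continue
        parts += 1
        seen[start] = True
        stack = [start]
        while stack:
            node = stack.pop()
            for nbr in np.flatnonzero(dsm[node]):
                if not seen[nbr]:
                    seen[nbr] = True
                    stack.append(int(nbr))
    return parts


def mcc(dsm: np.ndarray) -> int:
    """Cyclomatic complexity, E - N + 2P."""
    return interface_count(dsm) - dsm.shape[0] + 2 * count_components(dsm)


def hic(dsm: np.ndarray) -> int:
    """Interface complexity with unit interface difficulty (assumed E_ij = 1)."""
    return interface_count(dsm)


def graph_energy(dsm: np.ndarray) -> float:
    """Sum of absolute eigenvalues of the symmetric adjacency matrix."""
    return float(np.abs(np.linalg.eigvalsh(dsm.astype(float))).sum())


def ssc(dsm: np.ndarray) -> float:
    """Structural complexity with unit difficulties: N + E * Eng(A) / N."""
    n = dsm.shape[0]
    return n + interface_count(dsm) * graph_energy(dsm) / n


# Three fully connected components (a triangle) as a small hand-checkable case.
triangle = np.array([[0, 1, 1],
                     [1, 0, 1],
                     [1, 1, 0]])
print(mcc(triangle), hic(triangle), round(ssc(triangle), 6))
```

For the triangle, E = 3, N = 3, and P = 1, and the eigenvalues of the adjacency matrix are 2, −1, and −1, which gives a graph energy of 4. HVM, BDC, and SCC are left out of the sketch because they need information (unique element counts, a structural classification, or a decomposition) that a bare adjacency matrix does not carry.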

2.3 Issues With Complexity Measure Validation.

In reviewing the large number of proposed CMs, we observed weak validation within studies and limited comparison across studies. This is driven by two issues: authors did not represent systems in a consistent way that would facilitate direct comparison from one study to another, and there is no established theory of complexity to validate against directly.

The first issue originates from the difficulties associated with representing and sharing real-world systems as needed to facilitate comparison across studies. Many leading papers offer a complex system to validate their measure, such as space system architectures [5,29,74], consumer home products or robotics [7,16], industrial or military equipment [1,42,47,48], and software and information systems [17,25,54]. However, due to the proprietary nature of most systems, it is nearly impossible to replicate stated results. We found only two papers that use the same set of data, and they did not compare their output measurements [5,74]. Moreover, even if it were possible to share inputs, the choice of system representation can have a significant impact on how inputs are translated to the DSM format, and on the results of the corresponding CMs. Representations of systems with varying levels of granularity have been shown to have an effect on complexity measurement [7,77]. There is no standard for system representation for CMs, which means that even if compared at the output level, measures can capture differences in resolution rather than changes in actual complexity. In part to overcome this, our proposed approach compares measures based on synthetic inputs that are not subject to representational choices.

The second challenge stems from how measures tend to be validated against proxies rather than the construct of complexity as defined in theory. Proxies of complexity, such as cost [5,65,78], rework [8,79], and schedule [2,4,8,39–41,80–82], are used to see if a proposed CM reacts in the same way as other measures of the difficulty of system development and thus is a good measure of complexity. However, these proxies are also affected by factors such as organizational capabilities, the availability of resources, political or strategic choices, or even the quality of the infrastructure necessary to design, develop, and fabricate a system. Ultimately, this leaves validation via proxy open to confounding and prevents us from knowing how effective a given measure is in capturing complexity, because what the CM is being calibrated against can vary independently of actual complexity. In order to see if a CM is effective in capturing the construct of complexity, we should not compare measures to measures or measures to proxies, but rather compare a measure’s behavior as it directly relates to changes that we know affect the complexity of the system.

3 Methodology

In order to characterize the state of the complexity literature and assess when and to what extent established measures agree, we propose a benchmarking study that is designed to overcome the methodological limitations identified in Sec. 2.3. First, we developed a family of synthetic system architectures that embodied systematic variations along the dimensions of commonly held beliefs as defined in Sec. 2.1. Next, we implemented the six representative CMs discussed in Sec. 2.2 to evaluate each of the synthetic system architectures. This provides a common platform for comparison across CMs and enables a direct test of how the CMs respond to induced changes in complexity while removing the context of the system architecture which might impact complexity measurement, such as industry, granularity, or stage of design as mentioned in Sec. 2.3. Importantly, it obviates the need to agree on proxies, share proprietary system details, or standardize system representations. Our approach draws inspiration from Weyuker [83], who established an axiomatic approach to assess software CMs, and Hölttä-Otto et al. [84], who leveraged synthetic architectures to test modularity measures. The sections that follow describe the benchmarking framework.

3.1 Generating Synthetic System Architectures.

Section 2.1 synthesized three commonly held beliefs about how a system’s attributes impact its complexity: (1) complexity increases with size, (2) complexity increases with interconnectedness, and (3) complexity decreases with structural patterns like modularity. To test these beliefs, we generated synthetic architectures that systematically varied size, interconnections, and two representative structures: random and perfectly modular. Figure 1 shows the resulting design. We defined three levels of each of size and interconnections and two levels of structure, resulting in 18 test architectures.

Fig. 1
Design of the benchmarking approach

Each synthetic test architecture was generated using the popular DSM [85–88] format. DSMs have been used extensively in the systems engineering and design literature to represent a wide variety of systems such as aerospace, automotive, electronics, information technology, manufacturing, mechanical, and software [86]. The extensive literature includes standards for system representation that we adopted here [89]. In the context of a DSM, increasing the size simply means adding more rows (and corresponding columns), and increasing the number of interfaces corresponds to adding off-diagonal elements. A DSM can also capture information about structure. For the random structures, we randomly placed the off-diagonal elements, creating a random interconnection among components. For the perfectly modular structures, we represented the modules as perfectly encapsulated squares of interconnections that touch the diagonal and do not have interfaces with other modules. Across all test architectures, we chose to make the interfaces bidirectional (undirected) and binary.2 We found that this was convenient for the experimental approach and similar to how these CMs and other measures have been validated in the past [8,84].
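The two structural cases described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the generator used in the study: the function names `random_dsm` and `modular_dsm` are hypothetical, the random case samples distinct component pairs uniformly, and the modular case assumes every module is fully connected internally. The partition into six modules of six components is also an assumption, chosen because it yields 36 components and 6 × 15 = 90 interfaces.

```python
import numpy as np


def random_dsm(n: int, n_interfaces: int, rng: np.random.Generator) -> np.ndarray:
    """Binary, symmetric DSM with interfaces placed at random off the diagonal."""
    rows, cols = np.triu_indices(n, k=1)          # every unordered component pair
    chosen = rng.choice(rows.size, size=n_interfaces, replace=False)
    dsm = np.zeros((n, n), dtype=int)
    dsm[rows[chosen], cols[chosen]] = 1
    return dsm + dsm.T                            # mirror to make it undirected


def modular_dsm(module_sizes: list[int]) -> np.ndarray:
    """Binary, symmetric DSM of filled diagonal blocks with no links between them."""
    n = sum(module_sizes)
    dsm = np.zeros((n, n), dtype=int)
    start = 0
    for size in module_sizes:
        dsm[start:start + size, start:start + size] = 1
        start += size
    np.fill_diagonal(dsm, 0)                      # a component has no self-interface
    return dsm


rng = np.random.default_rng(seed=0)
random_arch = random_dsm(36, 90, rng)
modular_arch = modular_dsm([6] * 6)
print(random_arch.sum() // 2, modular_arch.sum() // 2)
```

Both matrices have the same size and interface count, so any difference a measure reports between them can be attributed to structure alone.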

We operationalized high, medium, and low for each dimension as follows. For system size, we defined a small system with 36 elements, a medium system with 45 elements (25% more components), and a large system with 81 elements (125% more components). This range is intended to represent a wide spread of realistic systems. We did not go smaller than 36 because that would have prevented us from exploring alternative system architecture motifs, such as perfect and bus modularity. For interconnections, we picked levels of 90 interfaces, 140 interfaces (56% more), and 220 interfaces (144% more). These specific levels were chosen to cover a representative range but also to enable a practical comparison of modular architectures, as certain numbers of interfaces at a given size would not allow for perfectly square and filled modules.
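As a quick check on the experimental design, the sketch below enumerates the full factorial of the levels given above and computes the interface density of each combination. The density calculation is an added sanity check, not part of the original study, and the names `SIZES`, `INTERFACES`, `STRUCTURES`, and `density` are hypothetical.

```python
from itertools import product

SIZES = (36, 45, 81)                 # number of components
INTERFACES = (90, 140, 220)          # number of undirected interfaces
STRUCTURES = ("random", "perfectly modular")

# Full factorial design: every size, interface count, and structure combination.
design = list(product(SIZES, INTERFACES, STRUCTURES))


def density(n_components: int, n_interfaces: int) -> float:
    """Share of all possible undirected component pairs that are connected."""
    possible_pairs = n_components * (n_components - 1) // 2
    return n_interfaces / possible_pairs


for n, e in product(SIZES, INTERFACES):
    print(f"N={n:2d}  E={e:3d}  density={density(n, e):.3f}")

print(len(design), "test architectures")
```

Every density is below 1, so each interface level is feasible at each size, and the count of 18 architectures follows directly from 3 × 3 × 2.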

For structure, there were numerous choices that we could have made to test if complexity decreases with additional structuring. Multiple past studies have argued that architectural strategies such as modularity, layering, commonality, etc. reduce the complexity of the design process, regardless of the specific type of modularity or structuring used [27,48,62,90,91]. As a first test of this hypothesis, we operationalized structure with two canonical levels, represented by the extreme cases of random or perfect modularity, where modules are “tightly connected within but loosely connected to other modules” [84,90]. Totally random structures are thought to have the maximal amount of complexity while perfectly modular structures are thought to have a minimal amount of complexity, so one would expect a significant difference between the two when measuring complexity [11,14,19,20,24,27,29,30,62,66,67]. To ensure that this choice is not a main driver of our results, we also include an alternative mode of structuring (bus modularity) as a plausibility probe in our sensitivity analysis.

We generated the random architectures using a random number generator for a given size and number of interfaces, and we generated the perfectly modular architectures by hand. We also recognize that the exact realization of a random or modular structure has some impact on the result. A random structure of a given size and number of interconnections might have a slightly different complexity measurement than another random structure. A perfectly modular architecture is also different from a bus-modular architecture, but both represent modular and potentially lower-complexity architectures. These assumptions are explored as part of a sensitivity analysis. Namely, we generated 1000 random architectures for every size and interface count, and we also generated bus-modular architectures using definitions from the literature [84,91–94].
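The repeated random generation described above can be sketched as a small Monte Carlo loop. The example below is illustrative only: it uses graph energy (the sum of absolute eigenvalues of the DSM) as a stand-in for a structure-sensitive measure, and the helper names and the uniform sampling of component pairs are assumptions rather than the study's implementation.

```python
import numpy as np


def random_dsm(n: int, n_interfaces: int, rng: np.random.Generator) -> np.ndarray:
    """Binary, symmetric DSM with interfaces placed uniformly at random."""
    rows, cols = np.triu_indices(n, k=1)
    chosen = rng.choice(rows.size, size=n_interfaces, replace=False)
    dsm = np.zeros((n, n), dtype=int)
    dsm[rows[chosen], cols[chosen]] = 1
    return dsm + dsm.T


def graph_energy(dsm: np.ndarray) -> float:
    """Sum of absolute eigenvalues of the symmetric adjacency matrix."""
    return float(np.abs(np.linalg.eigvalsh(dsm.astype(float))).sum())


def monte_carlo_energy(n: int, n_interfaces: int, runs: int = 1000,
                       seed: int = 0) -> tuple[float, float]:
    """Mean and sample standard deviation of the measure over random DSMs."""
    rng = np.random.default_rng(seed)
    samples = np.array([graph_energy(random_dsm(n, n_interfaces, rng))
                        for _ in range(runs)])
    return float(samples.mean()), float(samples.std(ddof=1))


mean_energy, spread = monte_carlo_energy(36, 90, runs=1000)
print(f"mean={mean_energy:.2f}  std={spread:.2f}")
```

Reporting the spread alongside the mean indicates how much of a measured difference between two architectures could be explained by the particular random draw.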

Table 3 shows six examples of the DSMs used in our analysis. It highlights that we were able to calibrate the test cases to hold potential confounds, such as module size and density, relatively constant while varying other features of the architecture.

Table 3

Some system architectures used in the experiment

3.2 Evaluating Changes Detected by Representative Complexity Measures.

After generating the simulated test architectures, we measured the complexity of each architecture using each of the selected CMs. We then assessed their response to each belief by calculating the percentage change in complexity measured from a self-baseline along each dimension of change. This process is visualized in Fig. 2. This approach allowed us to observe if these CMs were sensitive to the architectural stimuli in a way that is consistent with the expectations of the literature and allowed us to isolate the potential interaction between properties. In terms of sensitivity, we expected an increase in size and interconnections to lead to an increase in complexity, whereas imposing a modular structure was expected to decrease the complexity of a random architecture.
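The self-baseline comparison described above can be sketched as follows. This is an illustrative outline only; `toy_measure` is a placeholder of our own and is not one of the six CMs studied.

```python
def pct_change_from_baseline(readings: list[float]) -> list[float]:
    """Percentage change of each reading relative to the first (baseline) reading."""
    baseline = readings[0]
    if baseline == 0:
        raise ValueError("baseline reading must be non-zero")
    return [100.0 * (r - baseline) / baseline for r in readings]

def toy_measure(n_elements: int, n_interfaces: int) -> float:
    """Placeholder measure (elements + interfaces); NOT one of the six CMs."""
    return float(n_elements + n_interfaces)

# Size sweep at a fixed interface count of 90.
readings = [toy_measure(n, 90) for n in (36, 45, 81)]
changes = pct_change_from_baseline(readings)
print([round(c, 1) for c in changes])   # [0.0, 7.1, 35.7]
```

Because each measure is compared only against its own baseline reading, measures with very different absolute scales can be placed side by side.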

Fig. 2
Benchmarking procedure for specific commonly held beliefs

4 Results

In this section, we present the results of our benchmarking study, focusing on the individual CMs and how they detect controlled architectural changes in size in Sec. 4.1, interconnectedness in Sec. 4.2, and having a modular structure in Sec. 4.3. In the sections below, results reported for random architectures are the averages of 1000 Monte Carlo runs. For the modular architecture results, we initially present the perfectly modular results, and then in Sec. 4.4 we further probe the sensitivity to modularity type by exploring a bus-modular structure. We present the results of the sensitivity analysis for modularity type in Appendix B.

4.1 Test of Commonly Held EDSE Belief 1: Increasing Size Increases Complexity Trends.

In Table 4, we present how the CMs interpret the controlled increases in size, while holding all other architectural properties constant. The columns represent the response to increase in size, where we adopt a size 36 test DSM as the reference point, and incrementally increase its size to 45 (a +25% increase) and then to 81 (+125% from size 36) for a given number of interfaces and structure. We then benchmark the observed changes in complexity against the reference point. In order to understand if these insights vary with respect to the characteristics of the reference point, we repeat our analysis by increasing the interface count of the reference point while keeping the size constant, and document this in rows (the -L suffix after the measure acronym corresponds to 90 interfaces, -M to 140 interfaces, and -H to 220 interfaces). We also vary the architectural structure of the reference point and document it with the facets. Finally, the sparklines visualize the general trend of the CMs’ response for increases in size given a fixed number of interfaces and architectural structure.

Table 4

CMs’ interpretation of controlled increases in a system’s size

Based on the commonly held belief of the EDSE literature, we expected the CMs to detect a corresponding increase in complexity with each controlled step increase in a system’s size. A review of Table 4, best summarized by the sparklines, reveals that only a few CMs respond in line with this expectation. HVM and BDC consistently exhibited the expected monotonically increasing behavior for all explored cases. However, for the same unit change, BDC suggested a much larger increase for modular architectures, displaying self-inconsistent behavior. MCC registered a contradicting interpretation, indicating a decrease in complexity with increases in size across all reference cases. Nevertheless, its response was self-consistent across the random and modular cases. HIC was completely insensitive to any changes in size. SCC and SSC displayed self-inconsistency, increasing or decreasing based on the properties of the reference point, and neither fully met the expectation of the literature. Particularly for the modular architectures, SCC readings lacked self-consistency when compared to the random case. While the SSC readings were very small compared to those of the other CMs, increases in size only increased a system’s complexity when the structure was modular and the interface count was low; otherwise, SSC was self-inconsistent. Moreover, when the structure was random, the complexity reading of SSC consistently decreased, displaying exactly the opposite of the expected behavior.

4.2 Test of Commonly Held EDSE Belief 2: Increasing Interconnections Increases Complexity Trends.

The impact of increasing the number of interfaces while holding all other architectural properties constant is presented in Table 5. The format is similar to Table 4. We adopt a reference architecture with 90 interfaces and gradually increase the number of interfaces to 140 (+56%) and 220 (+144%) for a given size and structure. We evaluate the change in complexity reading with respect to the reference architecture. We also vary the size and structure properties of the baseline case to understand sensitivities, as we did in Table 4, and record the change in rows (the -L suffix here after the measure acronym corresponds to a size 36 architecture, -M to a size 45, and -H to size 81). Sparklines indicate the general response trend of CMs with respect to increases in the number of interfaces for a given size and architectural structure.

Table 5

CMs’ interpretation of controlled increases in a system’s interconnectedness (note that the scale focuses on changes within 300% of the original value, leading to some extremes to go off the graph)

We observe that CMs regard the increases in interface count in a more consistent manner compared to the size change, with five of the six CMs pointing at a steady increase in complexity for every reference size and structure in Table 5. While there are nuances in the magnitude of the increase, we observe broader support for the second commonly held belief than for the first, conveying that there is more agreement regarding the role of interconnections and their impact on a system’s complexity. The only contradicting case was BDC. While BDC behaved as expected for random reference points, it interpreted an increase in the interface count as a decrease in complexity for all modular reference points, exhibiting self-inconsistency based on structure. Another finding is that for MCC, HVM, and HIC, changes in complexity due to interface count increase were identical regardless of initial size or structure. SSC did register an increase as interconnections increased, but the magnitude depended on the base structure. Finally, we observed that SCC demonstrated self-inconsistency by recording a significantly larger increase in complexity for large and modular reference points.

4.3 Test of Commonly Held EDSE Belief 3: Establishing a Modular Structure Decreases Complexity Trends.

In Table 6, we present the CMs’ interpretation of imposing a modular structure to a random reference point, with all else held constant. Similar to the previous tables, we document the percentage change in complexity, as evaluated by each measure, for a given size and number of interfaces (the -L suffix here after the measure acronym corresponds to a size 36 architecture, -M to a size 45, and -H to size 81). The sparklines indicate the general trend of change in complexity when an average random reference architecture is converted from a random to a perfectly modular structure.

Table 6

CMs’ interpretation of a change towards a modular architectural structure

It is a widely held belief in the EDSE community that establishing a modular structure is an effective way of reducing and managing complexity. The highlight of Table 6 is that most measures do not capture this commonly held belief. Only one measure, BDC, consistently captured it, with its complexity measurement decreasing by 66–95% for all system architecture sizes and levels of interconnection, and the magnitude of change grew with the number of interfaces. Half of all the CMs we explored (MCC, HVM, and HIC) were fully insensitive to changes from a random to a modular structure, as indicated by flat sparklines. SCC and SSC displayed inconsistent behavior that often contradicted both our expectation and the measures’ own responses. SCC frequently suggested that establishing a modular structure caused an increase in complexity. Moreover, in all but a couple of cases, the increase was large, ranging from 48% to 75%. SSC measured rather small changes in complexity. However, in two-thirds of all test cases, it interpreted a change towards modularity as an increase in complexity.

4.4 Results of the Sensitivity Analysis.

For our structural tests, we recognized that our specific choices in how to represent and generate random and modular architectures could have affected our results. To mitigate this, we conducted a sensitivity analysis in two ways: (1) generating 1000 random architectures and studying their population statistics and (2) testing bus-modular structures as opposed to perfectly modular structures. We found that the outcomes of our sensitivity analysis agreed with the results above, but also revealed some unexpected behavior.

When looking at the population statistics for the 1000 random architectures, we made two observations. First, for MCC, HVM, HIC, and BDC, the evaluation did not change. Second, we noticed that the variation in SCC and SSC fit the normal distribution well and that the bounds of the 95% confidence interval for the mean were no more than 1.1% (SCC) and 0.1% (SSC) away from the mean. This means that the complexity evaluation was tightly distributed for the random architectures for all of our CMs. This is not what was expected, since topology is thought to be important, particularly in the physics-based systems literature. Connections among components that are “distant” in the system architecture are thought to require more coordination and cause more feedback, risk, and difficulty compared to connections between two components in the same module [14,24,27,29,42,48,62]. If the CMs we explored captured this property, we would expect a greater amount of variance due to randomness, yet this phenomenon was not detected at all.
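The kind of population statistic discussed above can be sketched as follows. The two measures here are toy stand-ins of our own (one depends only on counts, the other on how interfaces are distributed), and the confidence interval uses a normal approximation; the text does not state how the authors computed theirs.

```python
import random
import statistics
from itertools import combinations

def random_degrees(n: int, e: int, rng: random.Random) -> list[int]:
    """Degree of each element in a random architecture with n elements and e interfaces."""
    degrees = [0] * n
    for i, j in rng.sample(list(combinations(range(n), 2)), e):
        degrees[i] += 1
        degrees[j] += 1
    return degrees

def count_measure(degrees: list[int]) -> float:
    """Toy measure using only counts: elements + interfaces."""
    return len(degrees) + sum(degrees) / 2

def topology_measure(degrees: list[int]) -> float:
    """Toy measure sensitive to interface placement: sum of squared degrees."""
    return float(sum(d * d for d in degrees))

def relative_ci_halfwidth(samples: list[float], z: float = 1.96) -> float:
    """Half-width of the normal-approximation 95% CI for the mean, as a fraction of the mean."""
    mean = statistics.fmean(samples)
    sem = statistics.stdev(samples) / len(samples) ** 0.5
    return z * sem / mean

rng = random.Random(42)
runs = [random_degrees(36, 90, rng) for _ in range(1000)]
count_ci = relative_ci_halfwidth([count_measure(d) for d in runs])
topo_ci = relative_ci_halfwidth([topology_measure(d) for d in runs])
print(f"count-based: {count_ci:.4%}, topology-based: {topo_ci:.4%}")
```

A measure that depends only on the number of elements and interfaces returns the same value for every random draw, so its interval collapses to zero width, whereas a measure that reads the topology shows some spread across draws.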

When looking at the difference in measurement for bus-modular architectures versus perfectly modular architectures, we found no difference in the trends of the measures with respect to the benchmarking tests. There were, of course, differences in the specific results that merit some discussion. The results are presented in Appendix B. We observed that BDC consistently ranked bus-modular architectures as more complex than perfectly modular ones by 12–36%, though the trend was still negative compared to random equivalents. This was likely due to the extra information necessary to describe the bus. We also noticed that SCC and SSC would sometimes rank the bus-modular architecture as less complex than the perfectly modular one in our tests. Since these measures were self-inconsistent even for the perfectly modular case, this finding did not impact the overall results. Nonetheless, these differences suggest that a more detailed exploration of the impact of different kinds of structures could be an important exercise as the complexity literature matures.
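For readers unfamiliar with the bus-modular motif, the sketch below builds one under a common reading of the term: filled diagonal modules plus a bus element that interfaces with every other element. The authors' exact construction follows the definitions in their cited references and may differ; the function name and the module sizes used here are hypothetical.

```python
from itertools import combinations

def bus_modular_dsm(module_sizes: list[int], n_bus: int = 1) -> list[list[int]]:
    """Binary DSM with filled diagonal modules plus n_bus bus elements.
    Bus elements occupy the last rows/columns and connect to every other element."""
    n = sum(module_sizes) + n_bus
    dsm = [[0] * n for _ in range(n)]
    start = 0
    for size in module_sizes:                      # fully connected modules
        for i, j in combinations(range(start, start + size), 2):
            dsm[i][j] = dsm[j][i] = 1
        start += size
    for b in range(start, n):                      # bus rows and columns
        for k in range(n):
            if k != b:
                dsm[b][k] = dsm[k][b] = 1
    return dsm

dsm = bus_modular_dsm([5] * 7, n_bus=1)            # 35 module elements + 1 bus = 36
bus_degree = sum(dsm[-1])
module_degree = sum(dsm[0])
print(len(dsm), bus_degree, module_degree)         # 36 35 5
```

The bus row and column must be described in addition to the modules, which is in keeping with the observation that a description-based measure reads a bus-modular architecture as somewhat more complex than a perfectly modular one.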

5 Findings and Discussion

This section is organized in two subsections. In Sec. 5.1, we synthesize insights from the benchmarking study. In Sec. 5.2, we delve into the intellectual origins of the CMs and explain why and how the observed differences among the CMs emerged.

5.1 Summary of the Benchmarking Study.

We benchmarked the six representative CMs against a set of test system architectures that systematically varied in terms of size, interfaces, and structure. Based on the literature, we expected complexity to increase with size and interfaces and to decrease with the structure change from random to modular. We summarize our results in Table 7, where the rows correspond to the CMs included in the experiment, the columns represent the specific architectural stimuli, and the cells summarize the results for a given CM and stimulus test. Within the cells, we indicate whether a measure met or did not meet the commonly held belief, was insensitive to the change, was self-inconsistent (e.g., for the size change test, a self-inconsistent measure behaved differently based on the level of interconnections or structure), or was inconsistent in meeting the expectation of the commonly held belief (e.g., we saw u-shaped curves where complexity would increase for one step of a test and decrease for another step, or vice versa).
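The cell labels described above can be read as a simple classification rule over the step-wise responses. The sketch below is our own illustration of that rule, not the authors' scoring procedure; the function names and the example values are hypothetical.

```python
def classify_response(pct_changes: list[float], expect_increase: bool = True,
                      tol: float = 1e-9) -> str:
    """Label a CM's response to successive steps of one stimulus.
    pct_changes: percentage change at each step relative to the baseline (baseline excluded)."""
    steps = [pct_changes[0]] + [b - a for a, b in zip(pct_changes, pct_changes[1:])]
    ups = [s > tol for s in steps]
    downs = [s < -tol for s in steps]
    if not any(ups) and not any(downs):
        return "insensitive"
    if all(ups):
        return "yes" if expect_increase else "no"
    if all(downs):
        return "no" if expect_increase else "yes"
    return "inconsistent"   # e.g., a u-shaped curve: up on one step, down on another

def is_self_inconsistent(labels: list[str]) -> bool:
    """True when the label differs across reference points for the same stimulus."""
    return len(set(labels)) > 1

# Hypothetical responses for one stimulus at three different reference points.
labels = [classify_response(r) for r in ([25.0, 125.0], [-5.0, 12.0], [10.0, 40.0])]
print(labels, is_self_inconsistent(labels))
```

Note that this sketch only compares the direction of the response across reference points; differences in magnitude, which the study also treats as self-inconsistency, would need an additional threshold.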

Table 7

Summary of the benchmarking results

| Measure | Commonly held belief 1: Does increasing size result in an increase in the complexity measurement? | Commonly held belief 2: Does increasing the number of interfaces result in an increase in the complexity measurement? | Commonly held belief 3: Does a change from random to a modular structure result in a decrease in the complexity measurement? |
| --- | --- | --- | --- |
| MCC | No | Yes | Insensitive |
| HVM | Yes | Yes | Insensitive |
| BDC | Yes but self-inconsistent. Complexity measurement increased for all architectures, but at different rates for modular and random architectures | Mixed. Self-inconsistent and inconsistent with respect to the commonly held belief. Complexity measurement increased for all random architectures and decreased for all modular architectures | Yes |
| HIC | Insensitive | Yes | Insensitive |
| SCC | Mixed. Self-inconsistent and inconsistent with respect to commonly held belief. Equal split between increasing and decreasing trend, uncorrelated to other properties | Yes but self-inconsistent. Overall response is consistent with proposition, but systematic difference depending on base structure | Mixed. Self-inconsistent and inconsistent with respect to the commonly held belief. For seven of nine cases, complexity increased with no consistent pattern. For two of nine cases, a decrease was detected |
| SSC | Mixed. Self-inconsistent and inconsistent with respect to commonly held belief. For five out of six cases, complexity decreases, uncorrelated to other properties | Yes but self-inconsistent. There is some sensitivity to the base structure, with greater impact for smaller, less interconnected architectures | Mixed. Self-inconsistent and inconsistent with respect to the commonly held belief. For only six of nine cases, complexity increased. These were the ones with high degrees of interconnections |

In Table 7, a row-wise review indicates that none of the representative CMs included in this study consistently demonstrated the expected response. We observed that HVM and BDC responded as expected for two out of three tests. HVM consistently captured increases in size and interface count, but it did not detect any change in complexity when a random structure was made modular. On the other hand, BDC captured the changes in size and structure, yet its reaction to the increases in the interface count was dependent on the structure of the reference architecture. SCC and SSC both responded positively to interface counts; however, SCC demonstrated a self-inconsistent response with significant sensitivity to the reference point. Additionally, both measures had inconsistent responses to changes in size or structure. Finally, two of our measures, MCC and HIC, were wholly insensitive to structure, and in the case of MCC, the complexity measurement actually decreased as size increased.

A column-wise review of Table 7 provides a nuanced view of the results. Regarding the first commonly held belief on size, we would have expected that all CMs would be able to capture size change, since it is a surface feature and core to almost all discussions of complexity. However, this was consistently captured by only one of the CMs, HVM. BDC also showed a monotonically positive response, but exhibited different behavior depending on the base structure. Two CMs, SCC and SSC, inconsistently captured size change, both with respect to the expectation and themselves. Of the remainder, HIC was insensitive and MCC interpreted a size increase as a consistent decrease in a system’s complexity, opposite to expectation.

Regarding the second commonly held belief (interfaces), we observed that CMs largely agreed regarding the influence of interface count on complexity, suggesting that an interconnections-based view of complexity is more prominent in the literature. MCC, HVM, HIC, and SSC all showed the expected trend. SCC also behaved as expected overall, but the response was systematically different depending on the reference structure. BDC was an outlier, showing opposite responses depending on the base structure. For this measure, more of a system can be described compactly when its large modules can be abstracted.

Regarding the third commonly held belief, we observed results that were largely inconsistent with expectations. BDC was the only CM that consistently classified imposing a modular structure as a complexity-reducing architectural act. Half of the measures (MCC, HVM, and HIC) did not react to imposing a modular structure. Additionally, the two CMs that exhibited inconsistent responses, SCC and SSC, misclassified the change at least two-thirds of the time. The wide range of behaviors in response to these stimuli hints at a fragmented understanding of complexity, suggesting that there are numerous and separate viewpoints on complexity being expressed simultaneously.

5.2 How We Believe the Origins of a Complexity Measure Affect How It Measures Complexity in Engineered Systems.

In the previous section, we demonstrated that our set of representative CMs is largely inconsistent with the commonly held beliefs of this field. Here we argue that there are a few interrelated mechanisms and distinctions in the mental models of the authors that are driving the observed results: what type of system is considered complex, what features of the system architecture render a system complex, and how complexity is defined.

Within the EDSE community, there are various definitions regarding what constitutes a complex system [14]. Figure 3 depicts one possible taxonomy to organize complex systems, summarized in two branches: (i) physics-based systems that have hardware-based elements that directly interact with each other and (ii) flow-based systems that are characterized by transfer of information or material [20], for example, the internet or a transportation system. A similar distinction is also made by Maier and Rechtin [66], who proposed different heuristics for managing complexity in these two archetypes. Others follow a similar approach for handling complexity, proposing different sets of best practices for handling physics-based versus flow-based systems [14,28]. Thus, we adopt this dichotomy regarding the types of complex systems: physics-based and flow-based systems.

Fig. 3
Breakdown of engineered systems types adapted from Ref. [3]

This distinction is important when evaluating CMs because the architectural features that drive complexity in these two paradigms differ. Table 8 illustrates this point with a simple pictorial representation, a functional block diagram, and a DSM. Physics-based systems often feature modularization and have elements that are spatially close or directly connected to each other for transfer of power, data, information, or material [27,91]. The rocket example, presented in the left column of Table 8, depicts a complex system that could be considered as three modules of tightly interconnected parts (the rocket body, the side boosters, and the payload) with sparse connections between the modules. On the other hand, flow-based systems tend to be spatially spread out, with a sparser structure designed to enable information, packages, or decisions to be routed. For instance, consider the “Mail and Package Service” depicted in the right column of Table 8. The network structure is sparse with relatively low interunit dependencies and resembles a tree hierarchy, where there is a central trunk (e.g., the city) that all other elements branch off of without touching other branches. Complexity also means different things to these communities. For researchers who focus on physics-based systems, complexity arises from the interconnections and intra-module couplings, which cause changes in one of the components to propagate through the rest of the system [14,19,27–29,66,91,95]. In contrast, for researchers of flow-based systems, complexity emerges from a lack of regularity and a difficulty in describing the system [20,24,26,28,75].

Table 8

System archetypes and real-world examples

To assess our assertion that CMs embody the assumptions of the systems studied by their creators, we identified two test system architectures from our earlier analysis that represented the characteristics of the two archetypes (physics-based and flow-based systems) for more detailed study. Table 9 presents the CMs, information about their origins and perspectives, and how they measured the complexity of the reference complex physics-based (modular, many interfaces, comparatively smaller in size) and flow-based (sparse but large) systems, respectively. It is important to note that with these reference systems, we are exploring contrasting example cases of complex systems as judged by their respective literatures and do not intend to convey an independent judgment of which architecture is more complex.

Table 9

Complexity measurement based on system viewpoint

A comparison of complexity evaluations in Table 9 highlights that CMs disagree on which system is more complex in the ways our assertion predicted. The CMs developed by researchers focused on physics-based systems tend to adopt a structural perspective on measurement, and their CMs agree that the reference physics-based DSM is more complex. On the other hand, the CMs developed by researchers focused on flow-based systems reported the reference flow-based DSM as more complex when they adopted a descriptive framework. The outlier is MCC, which was developed in the context of software (a flow-based system) but adopted a structural approach to measurement. We contend that the viewpoint of the authors had a strong influence on how their CM behaves.

Table 9 also highlights another point about complexity measurement in the literature. As EDSE researchers are increasingly designing and studying systems that are “cyber-physical,” simultaneously including features of both physics- and flow-based systems, having measures that are consistent with both archetypes becomes more important. Table 9 suggests that measures are much more consistent within archetypes than across them. Based on the state of current CMs, we call on researchers to be careful in aligning CMs to the type of systems they are designed to assess. Moving forward, our hope is that a testbed approach, like the one used in this study, can help us better understand how these measures work and develop CMs capable of detecting complexity across the full range of systems included in EDSE research.

6 Conclusion

In an increasingly interconnected and cyber-physical world, the ability to manage complexity is vital for the EDSE community. However, effective management relies on valid measurement, early in the design process and throughout the product lifecycle. Despite substantial scholarly attention, so far there are many proposed measures but much less agreement. We contend that part of the problem stems from differences in reference systems and broader limitations in terms of validation efforts. To that end, this paper proposes an approach to benchmark the status quo of existing CMs in terms of their alignment with the commonly held beliefs in the literature. We specifically focused on the response of the CMs to controlled changes in size, interconnectedness, and structure. Our results emphasize that no existing CM simultaneously meets all of the expectations of the literature, and in fact many of them are not even self-consistent in their response. We explain these observations based on a dichotomy of complexity perspectives between physical and information systems viewpoints.

There are two major contributions of this work. First, we formulated and demonstrated a rigorous, structured, and theory-grounded benchmarking approach that can serve as a testbed for development and validation of future architectural assessment tools and measures. Our test framework could serve as a foundation for cross-validating and calibrating future research on measurement of complexity (or other architectural properties) against the set of CMs or the set of architectural stimuli we explored. Second, our benchmarking efforts provide insight about the current state of the complexity measurement literature. Among the set of representative CMs we considered, none responded to all three architectural stimuli in a manner consistent with the commonly held expectations of the EDSE literature. This should serve as a cautionary tale to the EDSE community. Given the similarity of responses among some of the measures, especially with regard to our experiment in Sec. 5.2 on the origin of measures and how they react, it appears that multiple independent discussions coexist in the literature, with limited cohesion and communication across the perspectives. For the implementer of a CM, there is a risk of using a CM that is highly sensitive to a feature related to one type of complexity but not others, so taking time to study how well measures capture the commonly held beliefs on complexity allows us to avoid these issues.

There are also some limitations of this study that need to be acknowledged. The first is that our study was not exhaustive in terms of CMs in use in the literature. Instead, we focused our analysis on a set of representative measures that captured the range of perspectives present in the community at large. Our approach to selection necessarily excluded one class of CMs that were incompatible with our choice of DSMs as a unifying representation for benchmarking. This limited our study to the descriptive and structural perspectives rather than the dynamic/behavioral perspective. Future work could expand this scope. Second, our choices in representation for system architectures reduce the extremes of how CMs evaluated complexity, but should not impact the measurement behavior with regard to the tests. We only used binary values rather than weighted evaluations to represent the relative complexity of the individual parts and interfaces, since this allowed us to systematically control architectural stimuli and avoid interaction effects. While imposing weights could alter the individual measurement of the investigated CMs, we do not expect a drastic impact on the conclusions of the study. Future work could explore how representation affects complexity measurement independent of synthetic changes like those we made in this experiment. Finally, our representations were static in their view of a system. In the future, we could also incorporate probability functions and models of probability coupling to better represent entropy-based and dynamic measures, though there could be a great deal of variation that must be explored before we could include those types of measures in our framework.

In this paper, we documented the fragmented and divergent state of the EDSE complexity literature, both in terms of what researchers mean by complexity and how it should be measured in engineered systems. We recognize that each of the previous thrusts pursued a specific purpose through the measures they proposed, such as estimating production lead times or operation costs, and that they did not all purport to address the overall construct of complexity. However, since they were presented as “CMs,” we tested them against the standard of validly indexing that construct. This is necessary to establish if they are to be used by others in the valuable pursuit of effective complexity management. We argue that a generalizable CM should be able to achieve two goals. First, it should capture complexity as a construct and thus respond to all three of the commonly held beliefs of the literature. Second, it should be able to detect complexity changes consistently across all complex system archetypes. We found that none of the CMs tested here satisfies those two needs. That said, some measures are better than others in terms of appropriately detecting specific aspects of complexity. Perhaps it is too soon to hope for a single measure that can reliably serve both purposes at this level of literature maturity. With this framework, we lay the foundation for the development of such a metric and for rigorous testing of the generalizability of future metrics that are proposed.

Although our intent was to characterize and compare existing measures, and not to identify a single best measure for all contexts, given the analysis it is appropriate to reflect on the usefulness of existing CMs. BDC is appropriate for most systems, especially flow-based systems, as it captures modularity very well and captures size and interfaces with some degree of consistency; however, an important caveat is that it involves a manual characterization step that is not needed in the other strictly algorithmic approaches. For an understanding of complexity that is based only on parts and interfaces, we would recommend HVM. HVM provides a simple, yet coherent and consistent view of complexity that is in line with the shared beliefs of the literature. Ultimately, our work demonstrated that the EDSE community needs to be careful about which CM is used for what purpose, because context and perspective have a strong influence on the measurement outcomes. Our benchmarking approach, if used by others, can help to understand how CMs, or other architectural assessment tools, behave and can facilitate cross-validation among different perspectives. Combined with more study of the phenomenon of complexity, this can potentially bring the community towards agreement on what the phenomenon is and how to measure it.

Footnote

2

This binary representation decision might appear to be a limitation of this study; however, our position is that it is not. Modeling various interface types, such as physical and electrical interfaces, with relative complexity weights would have changed the magnitude of the measurement for some of the investigated CMs. The change could even have been large for CMs that include a weighting function, such as HIC and SCC, whereas CMs that do not impose weights, e.g., SSC and MCC, would not change. Regardless, modeling various interface types would not influence the primary conclusions of this study, as we are specifically concerned with the ability of CMs to detect the general trend and direction of change in complexity and not the specific magnitude of change.
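
The distinction between magnitude and weighting can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the weighted matrix and its values are invented, and the helper names `binarize`, `count_nonzero_interfaces`, and `interface_weight_sum` are hypothetical stand-ins, not any of the six CMs studied here. It shows only that a count-based quantity is unchanged when interface weights are dropped, whereas a weight-summing quantity is not.

```python
def binarize(weighted):
    """Replace every nonzero interface weight with 1 (binary representation)."""
    return [[1 if w != 0 else 0 for w in row] for row in weighted]


def count_nonzero_interfaces(matrix):
    """Count interfaces in the upper triangle of a symmetric matrix."""
    n = len(matrix)
    return sum(1 for i in range(n) for j in range(i + 1, n) if matrix[i][j] != 0)


def interface_weight_sum(matrix):
    """Sum interface weights in the upper triangle of a symmetric matrix."""
    n = len(matrix)
    return sum(matrix[i][j] for i in range(n) for j in range(i + 1, n))


# Hypothetical 4-component system with weighted interfaces (illustrative values).
weighted = [
    [0.0, 2.0, 0.0, 0.5],
    [2.0, 0.0, 1.5, 0.0],
    [0.0, 1.5, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],
]
binary = binarize(weighted)

# The count is identical for both representations; the weight sum is not.
print(count_nonzero_interfaces(weighted), count_nonzero_interfaces(binary))
print(interface_weight_sum(weighted), interface_weight_sum(binary))
```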

Acknowledgment

We thank Dr. Joshua Summers, Apurva Patel, Dr. David Broniatowski, Dr. Shashank Tamaskar, and Dr. Dan DeLaurentis for their feedback. We also thank Dag Spicer, senior curator of the Computer History Museum, Paul McJones, member of the Computer History Museum, and Dr. Scott Aaronson, the head “Zookeeper” of the Complexity Zoo for computational complexity, for their guidance on computational CMs.

Conflict of Interest

There are no conflicts of interest.

Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request. The authors attest that all data for this study are included in the paper.

Funding Data

  • NSF (Grant No. CMMI-1535539; Funder ID: 10.13039/100000001).

Nomenclature

A = an adjacency matrix capturing the interfaces between components

E = number of interfaces

N = number of components

E_u = unique number of interfaces

E_ij = weight of a specific interface between components i and j

L_i = level of decomposition

M_ij = number of set sizes of a given length

N_u = unique number of components

N_i = weight of a specific component i

S_n = set size

Appendix A: System Architectures

Representations of some of the design structure matrices used in the benchmarking are shown here, including both random and modular structures. For a dataset including all of the used matrices, please contact the corresponding author.
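
As a rough illustration of what such matrices look like in code, the following Python sketch builds binary design structure matrices with a fixed number of components and interfaces. It is not the generation procedure used in this study. The function names, the use of symmetric (undirected) matrices, the uniform random placement of interfaces, and the restriction of the modular variant to within-module pairs are all assumptions made for the example.

```python
import random


def _fill(n, candidate_pairs, e, seed):
    """Place e interfaces, drawn without replacement from candidate_pairs."""
    if e > len(candidate_pairs):
        raise ValueError("more interfaces requested than candidate pairs")
    rng = random.Random(seed)
    a = [[0] * n for _ in range(n)]
    for i, j in rng.sample(candidate_pairs, e):
        a[i][j] = a[j][i] = 1  # symmetric: assumed undirected interfaces
    return a


def random_dsm(n, e, seed=0):
    """Binary DSM with n components and e interfaces placed anywhere."""
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return _fill(n, pairs, e, seed)


def modular_dsm(n, module_size, e, seed=0):
    """Binary DSM whose e interfaces all fall inside equally sized modules."""
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
             if i // module_size == j // module_size]
    return _fill(n, pairs, e, seed)


def interface_count(a):
    """Number of interfaces E, counted over the upper triangle."""
    n = len(a)
    return sum(a[i][j] for i in range(n) for j in range(i + 1, n))


# 36 components and 90 interfaces; the module size of 9 is an arbitrary choice.
random_a = random_dsm(36, 90)
modular_a = modular_dsm(36, 9, 90)
print(interface_count(random_a), interface_count(modular_a))
```

Holding the component and interface counts fixed while changing only where the interfaces fall keeps size and interconnection constant, so any difference in a measure's value can be attributed to structure.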

Table 10

Appendix B: Data for Bus-Modular Sensitivity Analysis

Table 11
Architectural structure: Bus-modular | Modular
Growth in size: 36, 45, 81 | 36, 45, 81
Measure | Interface level
MCC 90564756.0047.0011.00
14010697106.0097.0061.00
220186177141.00186.00177.00141.00
HVM 90292.48344.58292.48344.58555.25
140382.76440.29382.76440.29665.41
220527.19593.44841.68527.19593.44841.68
BDC 9011114386.00118.00217.00
1409412577.0092.00203.00
220729119461.0081.00168.00
HIC 90909090.0090.0090.00
140140140140.00140.00140.00
220220220220220.00220.00220.00
SCC 9017193301.00420.0096.00
140409374376.00723.00808.00
22011112202581315.001418.001417.00
SSC 90114.2875.45161.00165.00172.11
140253.89225.94253.78256.56250.38
220417.82419.50290.61402.67406.78396.06
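
The direction of a measure's response along one dimension can be read off a row with a small helper. The Python sketch below is illustrative only: `trend_direction` is a hypothetical name, and its convention (return +1 or -1 only when the change is monotone and not flat, otherwise 0) is an assumption, not the criterion used in the paper. The input values are the modular-structure entries of the HVM and SSC rows at interface level 220, for sizes 36, 45, and 81.

```python
def trend_direction(values, tol=0.0):
    """Return +1 for a monotone increase, -1 for a monotone decrease, else 0.

    A flat sequence also returns 0, since it shows no change to detect.
    """
    diffs = [b - a for a, b in zip(values, values[1:])]
    if all(d >= -tol for d in diffs) and any(d > tol for d in diffs):
        return 1
    if all(d <= tol for d in diffs) and any(d < -tol for d in diffs):
        return -1
    return 0


sizes = [36, 45, 81]
hvm_modular_220 = [527.19, 593.44, 841.68]
ssc_modular_220 = [402.67, 406.78, 396.06]

print(trend_direction(hvm_modular_220))  # increases with size at every step
print(trend_direction(ssc_modular_220))  # rises, then falls
```

Under this convention the HVM row responds consistently to growth in size, while the SSC row does not.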
