Methods of analyzing national power: pitfalls and best practices
By Nicholas Kitchen
Published: May 1, 2026
Executive Summary
Power is the foundational concept in political science, particularly in international relations (IR), yet the multidimensional nature of the concept makes “operationalizing” it for the purposes of power analysis fraught with methodological difficulty.
As a specialized discipline, international relations has used aggregate proxy indicators of national power, such as the Composite Index of National Capability (CINC), to interpret patterns of interaction between great powers over time. Modeling this “balance of power” is a theoretical device that uses “best-fit” variables to demonstrate broad structural patterns of equilibrium behavior.
These endeavors have had mixed success when judged against the historical record. More importantly, the approach has been misunderstood and misapplied as a forward-looking analytical tool for foreign policy analysis. This is a “category error”: power is multidimensional, and the exercise of power always occurs within a social relationship between actors. The idea of a single total quantity of national power in the abstract therefore makes no sense.
National power assessment programs therefore face a conceptually challenging task. Because the dimensions and domains of power are neither fungible nor aggregable, and because subjectivity and context are inherent in how power relations work, the attempt to generate accurate and detailed knowledge of a nation’s ability to apply resources to secure its goals risks flooding analysts with endless variables.
Furthermore, which capabilities have the “disposition” to produce power depends on the purposes to which that power is put. Yet what it means to be powerful is subjective and contested: there is variation not only in policy objectives but, more fundamentally, in what counts as the successful operation of power.
Nonetheless, some form of power assessment is necessary for governments, both for short-term policy formulation and as a basis for long-term planning. Strategic competition does not take place in a single theater: there are different domains of competition, each with its own logic of interaction and its own instruments of coercion and persuasion.
Breaking questions of national power down into “domains” is a useful first step for analysts seeking to assess the state of so-called “great power competition.” Within each domain, assessing power is not just a matter of identifying the relevant capabilities that might produce a strategic advantage. Data quality is critical but often lacking, which leads to reliance on proxy measures that do not directly assess the capabilities themselves.
Assessments of the scale of capabilities must be contextualized according to the dynamics of how those capabilities are employed, taking into account the structural features of the domain that determine how advantage is formed. These may include geography, path dependence, network and knowledge effects, and the question of whether what matters is rank, the size of the power gap, or distance from the leader. It is not just an actor’s position, but its position within the distribution (whether power is concentrated or dispersed), that determines its standing.
Accurate, disaggregated, data-driven approaches to domain-level power are an essential part of how countries define problems and develop policies. Yet this analytical task is rarely accomplished rigorously. Instead, there is a temptation to produce “dashboards” based on weighted proxy metrics that give a false impression of clarity by substituting an easily accessible proxy for the phenomenon we are trying to understand. And when such efforts provide us with a score and a justification to act, their results risk becoming the very ends of policy.
For this reason, comprehensive strategic assessment approaches have been skeptical of data-driven methods and have avoided any integrated approach to the analysis of national power. Instead, they have preferred occasional, separate studies as part of an ongoing diagnostic effort to understand the nature of the environments in which different aspects of competition take place, and how those environments can change.
The two approaches are not mutually exclusive. Qualitative assessments of domains are a prerequisite for understanding what data is appropriate to collect and how it should be contextualized analytically. The goal of power analysis should therefore not be to reduce complexity, but to produce an impressionistic picture of the multiple domains and dimensions of power that embraces nuance, opening discussions about long-term strategy rather than seeking to provide definitive answers to immediate policy questions.
Power assessments shape our perceptions of the limits of what is possible. Quantitative rankings and datasheets can provide false confidence, or focus our attention on the metric that is meant to reflect power rather than on power itself. Deeper, critical assessments based on simulation methods will draw less clear-cut conclusions, but in doing so could open political conversations about resource investments, resilience strategies, and cooperation policies with allies and partners.
Introduction
The interest in assessing power among policy practitioners reflects the “resurgence” of great-power competition, with major countries viewing international affairs as fundamentally characterized by strategic competition among great powers. If states are operating in an age of competition, they need to know who is winning.
Yet there are different understandings of what winning means and how to recognize it. One view holds that it is sufficient to secure primacy in the common metrics used to measure power, and to have that primacy recognized by others. Another holds that latent superiority of this kind is not enough: what is needed are demonstrations of compliance secured through the application of coercive capabilities. A third group believes that resorting to such demonstrations is a sign of failure; they see power in the ability to shape consensual outcomes through influence. Still others see the need for such consistently active strategies as evidence that the most fundamental form of power is lacking: shared preferences nurtured through the manufacture of consent.
These differences can be traced in the varying strategic postures of recent U.S. administrations, from the emphasis on capabilities in the 1992 Defense Planning Guidance, to the positive-sum dynamics of enlargement under Clinton, to imperial assertiveness after 9/11. The Obama administration claimed victories in the outcomes of painstaking multilateral diplomatic initiatives; in contrast, Donald Trump’s strong commitment to “winning” requires creating losers on the other side.
Such disagreements matter, because different ways of thinking about power have different, and sometimes contradictory, implications for the development of national strategy. They also have implications for how power is assessed. The purpose of this paper is to set out the advantages and drawbacks of the different approaches to assessing national power by explaining the premises behind the main approaches in use today. While the conceptual and methodological considerations described here affect the rigor with which any power assessment can be made, the usefulness of such analyses will depend on how well they fit an underlying view of how power works and what national power is for.
To guide the reader through this reassessment, the paper proceeds in four stages:
First, it establishes the conceptual foundations of power as a context-specific capacity rather than a fixed set of resources or data points.
Second, it critiques the pitfalls of conventional aggregate and proxy measures, suggesting instead that foreign policy analysts and national security strategists break national power down into functional domains.
Third, it sets out the practice of assessing these domains in detail, developing a causal theory of power to determine where structural advantages really lie.
Finally, it presents “simulation” as a crucial tool for testing how an underlying advantage translates into relational influence, and concludes with a warning against allowing reductive “datasheets” to substitute for rigorous strategic thinking.
Power analysis: basic conceptual foundations
On one level, the definition of power is relatively simple: Robert Dahl’s classic definition, derived from Max Weber’s concept of power (Macht) as “the opportunity to make one’s will prevail within a social relationship, and also against resistance,” captures the intuition that power is one actor’s ability to get another to do something they would not otherwise do. Yet every major conceptual study of power points to a lack of agreement on how to conceive and define power and related terms such as influence, control, persuasion, and coercion. The many forms of power described in the literature suggest that power is a fundamentally elusive concept.
Power is capacity, not outcome
Power and influence are sometimes confused or used interchangeably. Yet, as Peter Morriss notes, the two terms express different ideas: power always refers to a capacity to do things (an ability or dispositional property), whereas influence, while used more widely, tends to refer to the (successful) exercise of that capacity.
This leads us to the first of two fallacies in power analysis, the exercise fallacy: the confusion of power with its (successful) exercise. Here we face a major methodological difficulty inherent in assessing power: to verify that power exists, we must demonstrate capacity through successful effects. Of course, saying who was powerful is clearly of less practical benefit than determining who is powerful. But this is also not how we tend to use the concept: few would suggest that the United States was less powerful than North Vietnam in the 1960s. In an asymmetric war, the United States did not achieve its goals, but this does not lead to the conclusion that it was less powerful, either in that specific relationship or in a more general assessment. The exercise fallacy is why power should be regarded as a potential, as a chance of prevailing, rather than as a demonstrated ability: power is not the same as the effects of power.
The prevailing solution to the exercise fallacy in international relations has been to think of power in terms of underlying resources: the “bases” of power, or more often “capabilities.” The phrase “bases of power” points to a second fallacy, the vehicle fallacy: power is not the same as the means (the bases or resources) that produce it. Capabilities, from diplomats and soldiers to money and guns, are not powerful in themselves: the likelihood that they will successfully produce power effects depends on the nature of the interaction in which they are deployed.
This is not to say that capabilities have no inherent potential. Certain resources, by their nature, have the capacity to produce power effects in certain circumstances; this is their “disposition.” For example, a bottle of whiskey on the shelf has the potential to cause intoxication, but it produces this effect only when consumed, and the effect will depend on both the drinker (their weight and experience) and the context in which they drink. Such a capacity is not a guarantee but a potential: as Lukes puts it, “Power is a potentiality, not an actuality – indeed a potentiality that may never be actualized.”
Power is context-specific
Understanding power as a probability requires understanding the context in which it operates, including the structures within which interactions take place, the identities and objectives of the actors, and the presence of risk and uncertainty.
Structural context: Historical trajectories, infrastructure, institutions, and ideas can enable or constrain the interactions of actors, creating “structural power” for some of them. Structural power may stem from a relational advantage: in the moments of order-making that follow major wars, dominant states may seek to create lasting arrangements that confer advantage in the future, imposing costs on others who seek to abandon those structures.
At the same time, a deeper social and cultural power shapes norms, perceptions, and preferences into an apparently natural order of things. This “third face” of power leaves actors either unable or unwilling to conceive of alternatives. Whether we attribute this natural order to a “malign” exercise of hegemonic power, as Lukes does, or conceive of it more benignly, it is easier for actors to succeed when their interests and preferences align with the prevailing order.
Risk and uncertainty: While capabilities may be disposed to produce effects, that probability is subject to both risk and uncertainty. Calculable risks include the risk of misjudging power relationships, of incomplete implementation of strategy, and of the other side’s response. Agile actors with the “protean power” to adapt, improvise, and innovate quickly are often better placed to handle the unexpected.
There is no permanent hierarchy of power
It has been axiomatic in realist thinking that the “ultima ratio” of power in international relations is war, and that military capacity, together with the economic and other resources that underpin it, is therefore the main determinant of strategic competition. Yet while military power is certainly important, superior military power is generally not decisive either in armed conflict or in “peaceful” strategic competition. Since different forms of power work in different ways, it makes little sense to talk about a permanent hierarchy of power resources. The fact that nuclear-armed powers have historically refrained from using significant military force against each other, despite being engaged in major conflicts, casts further doubt on notions of a linear hierarchy of power.
Actors shape their power relationships through how they perceive their interactions: power can appear in a relationship only when the actors psychologically allow it. Weber’s concept of power (Macht) also invokes the distinct concept of resistance, against which the chance of prevailing is set. Those in weaker positions can impose costs on more powerful adversaries, but they may also resist power at a deeper psychological and societal level, generating deep, non-quantifiable resilience.
The absence of a hierarchy of power resources, coupled with the ability of actors to innovate and to resist psychologically the imperatives of apparent power, should serve as a warning to power analysts. Not only are these dynamics difficult to identify, predict, and measure, but they can have a decisive impact on power relations. The facts of power are always more contingent and uncertain than they seem.
Measuring Power: Capabilities, Metrics, and Aggregation
The “positivist” approach of measuring power through capabilities rests epistemologically on two assumptions, neither of which, as the previous sections have shown, is self-evident: first, that there is a relatively consistent relationship between latent power resources and the outcomes of power relations; and second, that power resources are measurable realities. Proceeding on that basis leaves us with two key methodological questions: Which capabilities should we measure? And how should we measure them?
Limits of capability measures and proxy measures
There are inherent challenges in measuring material capabilities. Some power resources are more easily measured than others (for example, the destructive power of artillery). Other common measures are more problematic. For example, it is generally agreed that a large R&D capacity is an important underpinning of power, but linking R&D to specific capabilities is much more difficult.
To get around challenges like these, proxy measures are often used, but these short-circuit the requirements of accurate assessment in ways that can mislead. Spending, one of the most common proxy metrics, is a measure of input rather than output: cost overruns in capital programs, rising veterans’ pension costs, and widespread corruption in procurement can all increase the proxy measure while adding nothing to actual capability.
Perhaps the most prevalent proxy measure of “national power” is GDP, which, while widely used, fails to deduct welfare costs and relies on statistical sampling methods that are “empirically impressionistic” at best. Consider that in 2014, a routine rebasing of Nigeria’s GDP statistics increased its measured GDP by 89% overnight. No serious international relations scholar would conclude that Nigeria had become nearly twice as powerful, but that would be the conclusion of any number of national power formulas.
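The rebasing problem can be made concrete with a small sketch. It assumes a hypothetical power formula that scores each country by its share of total GDP; the function name `gdp_share_scores` and the three-country figures are illustrative assumptions, not real data or any published index.

```python
def gdp_share_scores(gdp):
    """Score each country by its share of total GDP (a hypothetical proxy formula)."""
    total = sum(gdp.values())
    return {country: value / total for country, value in gdp.items()}


# Hypothetical GDP figures in arbitrary units; illustrative only, not real data.
before = {"A": 500.0, "B": 300.0, "C": 200.0}

# Country C rebases its national accounts: the economy is unchanged,
# but measured GDP rises by 89%, the size of the revision cited in the text.
after = {**before, "C": before["C"] * 1.89}

score_before = gdp_share_scores(before)["C"]
score_after = gdp_share_scores(after)["C"]

print(f"C's score before rebasing: {score_before:.3f}")
print(f"C's score after rebasing:  {score_after:.3f}")
```

Nothing about country C's actual capabilities changes between the two calls; only the accounting does, yet its score under this assumed formula rises by roughly 60 percent.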
These limits should make us cautious in any pursuit of comprehensive quantitative measures of national power. A systems analysis approach can have detrimental effects on both military conduct and strategic decision-making, as happened during Robert McNamara’s tenure as Secretary of Defense, when enemy body counts and kill ratios were used as key measures of performance in prosecuting the Vietnam War. However, due analytical caution does not mean that we should reject quantitative metrics; statistical measures can provide precise snapshots of specific capabilities and supporting resources. The challenge is to understand what those snapshots mean in terms of power, which requires careful attention to how the results are contextualized, understood, and used. The selection of metrics is never a scientifically neutral act: choices will shape who and what can be considered powerful.
The Limits of “National Power” as a Concept of Practical Policy
In the discipline of international relations, measures of power were devised and operationalized for a specific theoretical purpose: identifying the balance of power. At the heart of balance-of-power thinking is the idea of aggregate measures of national power. Yet aggregating national power is more problematic than these approaches suggest. Aggregate measures select and weight variables in a fundamentally arbitrary way, relying on self-validating “smell tests.” This unspecified basis produces inconsistent results. When policymakers think about the tools they need to succeed, they are asking a rather different question from that of social scientists seeking to explain the causal determinants of great-power war. For social scientists, aggregate power is a useful baseline for theory building. But for practical policy purposes, the concept of “national power” has real flaws.
This is partly because the specific capabilities that will be relevant depend on the nature of the power interaction in question, and international strategy and diplomacy present multidimensional problems. But there is also a more fundamental conceptual problem: there is simply no common metric by which the various power resources can be valued against each other and thus aggregated at the national level. This would not be a major problem if capabilities were fungible, that is, easily convertible from one form to another. Similarly, if a single issue dominated international politics and all power were intended to be applied to it, aggregation might be possible. But neither is the case: most resources show very limited fungibility and produce specific and distinct power effects, and nations develop broad and diverse capabilities because the purposes for which they are developed are broad and varied.
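The arbitrariness of weighting can be sketched in a few lines. The example below assumes two hypothetical countries with made-up, normalized component scores and two weighting schemes that an analyst might equally well defend; the names `composite_score` and `rank_countries` are illustrative and do not correspond to any published index.

```python
def composite_score(scores, weights):
    """Weighted sum of a country's normalized component scores."""
    return sum(weights[component] * scores[component] for component in weights)


def rank_countries(countries, weights):
    """Order country names from highest to lowest composite score."""
    return sorted(
        countries,
        key=lambda name: composite_score(countries[name], weights),
        reverse=True,
    )


# Hypothetical component scores on a 0-to-1 scale; illustrative only.
countries = {
    "A": {"military": 0.9, "economy": 0.4},
    "B": {"military": 0.5, "economy": 0.8},
}

# Two weighting schemes, neither of which has a principled basis over the other.
military_heavy = {"military": 0.7, "economy": 0.3}
economy_heavy = {"military": 0.3, "economy": 0.7}

print(rank_countries(countries, military_heavy))  # A ranks first
print(rank_countries(countries, economy_heavy))   # B ranks first
```

The underlying data are identical in both rankings; only the choice of weights differs, which is why an aggregate measure can only be validated against the analyst's prior expectations.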
Domains and metrics
Problems of fungibility and aggregability become less significant as the scope of the assessment narrows. Focusing on distinct instruments or elements of power is one technique used to assess capabilities. Frameworks found in the defense policy literature, such as DIME or MIDFIELD, help capture the different types of power assets available to state actors.
To be effective, the elements-of-power approach requires a degree of context about the purposes for which capabilities will be used and the actors with whom they will interact. Without such context, policymakers may be encouraged simply to build up capabilities in ways that increase the scale of their instruments but do not necessarily increase their chances of success. In short, the scale of a capability may be treated as an inherent good even when it is irrelevant, or even detrimental, to solving the policy problem at hand.
Thinking about power in terms of domains (defined by the aspect of another state’s behavior that one seeks to influence) is a more effective approach. It allows assessments to bring together the capabilities most relevant to those interactions. When identifying domains, care must be taken to ensure that any aggregation of indicators reflects relatively homogeneous and fungible phenomena. As a result, aggregation within domains has been most effective on economic, financial, and technological issues, where a monetary value can be assigned to the components without major difficulty. In the military sphere, by contrast, it is not clear that naval, land, air, and cyber assets can be combined into a single metric without a specific, probabilistic assessment of the expected nature of military interactions.
Measuring power within domains
Assessing power within domains helps connect capabilities to the aspects of behavior to which they apply. But measuring power within a domain is not just a matter of generating quantitative measures of the scale of the relevant capability. To assess power within a domain, we need to understand both the nature and the operation of power within it.
What is the causal theory of power? A theory of power describes the causal chain between the data being collected and the potential power effects. It has two main elements. First, what is the connection between what is being measured (the metric or power indicator) and the power resource itself? In essence, how good is the proxy at capturing the power base we are trying to measure? Second, what is the nature of that resource’s capacity to generate power effects?
What is the nature of advantage? How is power constituted in this domain? Are actors’ capabilities purely relative, or is there a “winner-take-all” dynamic in which the dominant actor gains a disproportionate benefit? Should advantage be understood hierarchically, as a matter of rank, or does the distance between the parties matter? In what ways does geography matter? Are there institutional enablers or barriers, normative constraints, or other social structures that shape how power resources can be employed?
What is the appropriate statistical measure? Quantitative data can provide an efficient picture of power resources as they operate within domains that can be fully captured by a single metric. Determining the appropriate metric may be a simple question of whether nominal size, global share, per capita, or adjusted or consolidated data is most appropriate. Or it may be a more complex question of deriving an impressionistic composite from a series of proxy metrics.
Is there good data? Economic studies of power have succumbed to the temptation to work backwards from available data without building a proper theory of power, and have neglected dimensions of power that cannot easily be quantified, such as the quality of diplomacy that is central to agenda-setting power. As far as possible, the data should measure the power resource itself, rather than merely a proxy for it.
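The choice of metric, and the question of whether rank or distance from the leader matters, can be illustrated with a minimal sketch. The actors, the figures, and the helper names `metric_views` and `position` are hypothetical assumptions for illustration only.

```python
def metric_views(actors):
    """Express one resource as nominal size, global share, and per capita value."""
    total = sum(a["output"] for a in actors.values())
    return {
        name: {
            "nominal": a["output"],
            "global_share": a["output"] / total,
            "per_capita": a["output"] / a["population"],
        }
        for name, a in actors.items()
    }


def position(values):
    """Return each actor's rank and its value as a fraction of the leader's."""
    ordered = sorted(values, key=values.get, reverse=True)
    leader_value = values[ordered[0]]
    return {
        name: {"rank": i + 1, "fraction_of_leader": values[name] / leader_value}
        for i, name in enumerate(ordered)
    }


# Hypothetical actors: output in arbitrary units, population in millions.
actors = {
    "A": {"output": 1000.0, "population": 1400.0},
    "B": {"output": 600.0, "population": 60.0},
    "C": {"output": 400.0, "population": 330.0},
}

views = metric_views(actors)
nominal = position({n: v["nominal"] for n, v in views.items()})
per_capita = position({n: v["per_capita"] for n, v in views.items()})

for name in actors:
    print(name, "nominal:", nominal[name], "per capita:", per_capita[name])
```

With these made-up figures, actor A leads on nominal size but comes last per capita, so the same data supports opposite rankings depending on which measure the theory of power for the domain calls for.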
Concentration and Dispersion
When metrics are used to build a picture of the operation and structure of power within a domain, and of the distribution of capabilities among actors, the resulting picture may show the “concentration” or the “dispersion” of power.
“Concentration” refers to a situation in which a single actor (or relatively few actors) has the potential to exert significant influence, either across a wide range of actors or in relation to the most important actors. It may arise from:
A significant capability gap in quantitative terms (for example, the United States has more military aircraft than the next five countries combined).
A qualitative power differential resulting from technology (for example, China’s dominance of ultra-high-voltage electricity transmission).
Structural or network dynamics that create particular advantages (for example, the centrality of the U.S. dollar and U.S.-regulated financial institutions to global payment systems).
“Dispersion” refers to the opposite situation, in which the distribution of capabilities and the effect of structural arrangements mean that no actor stands out in its ability to influence others. This does not mean that power is evenly distributed. Dispersion is more likely where institutional arrangements mitigate disparities in capability or create positive-sum dynamics.
From Structural Advantage to Relational Influence
Assessments of concentration or dispersion within domains can indicate where structural advantage or weakness lies. To interpret such a result, it is useful to clarify the main assumptions about the objective and framing of competition within that domain, the limits of behavior, and the identities of the other actors. Turning a structural advantage within a domain into relational influence over the behavior of another actor may be as simple as signaling the existence of the advantage. Alternatively, it may require the deployment of capabilities, backed by that structural advantage, in order to coerce or induce.
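As a rough quantitative complement to the assessments of concentration and dispersion discussed above, one option for a single measurable capability is the Herfindahl-Hirschman index, a standard concentration measure borrowed from economics rather than a method proposed in this paper. The sketch below uses hypothetical capability figures, and the function names are illustrative.

```python
def shares(capabilities):
    """Convert raw capability figures into each actor's share of the total."""
    total = sum(capabilities.values())
    return {actor: value / total for actor, value in capabilities.items()}


def herfindahl_index(capabilities):
    """Sum of squared shares: 1/n when evenly spread over n actors, 1.0 for a monopoly."""
    return sum(share ** 2 for share in shares(capabilities).values())


def top_share(capabilities):
    """Share held by the single largest actor."""
    return max(shares(capabilities).values())


# Hypothetical distributions of one capability across four actors.
concentrated = {"A": 70, "B": 10, "C": 10, "D": 10}
dispersed = {"A": 30, "B": 25, "C": 25, "D": 20}

for label, data in (("concentrated", concentrated), ("dispersed", dispersed)):
    print(f"{label}: HHI={herfindahl_index(data):.3f}, top share={top_share(data):.2f}")
```

A measure of this kind captures only the quantitative capability gap. Qualitative differentials and network position, the other sources of concentration, are not visible in it and would need separate, largely qualitative treatment.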
At this stage of the analysis, simulation becomes an essential tool. Simulation focuses primarily on understanding why actors are likely to behave in certain ways. Simulating power relationships allows analysts to examine, in a specific competitive relationship, which capabilities actors are willing to employ to secure their interests and achieve their goals: in what combination, when, where, how, and at what cost. Simulations take us beyond power in principle (gaps in capabilities or a privileged network position) to power in practice, enabling us to understand how actors perceive and respond to the application of power. Simulations are especially useful when repeated iterations reveal the commitment of the actors over time: a more powerful but less committed actor may prevail the first time but then be distracted, while the more committed actor returns to the issue again and again.
A variety of analytical tools fall under the banner of “simulation,” including military war games, red-teaming, and stress-testing. Simulation is a qualitative technique, unlike formal modeling or systems analysis, which use fixed variables and consistent, measurable criteria. What simulation allows is the drawing of inferences and assessments not only from the results of the exercise itself but also from the insights and reflections of the participants.
Conclusion
Questions of national power tempt us with a vision of the world in which one country or another is “No. 1.” Rankings of who is rising and who is declining, like sports league tables, are attractive shorthand, capturing public attention and enabling political actors to shape narratives. As Joseph Nye put it: “Periods of ‘decline’ tell us more about popular psychology than they do about geopolitics.”
But assessing power also shapes perceptions of the limits of what is possible. The probabilistic nature of national power assessment creates the risk that those interpreting the results will not appreciate the uncertainty inherent in them. Data-driven approaches in particular can provide false confidence: metrics change our understanding of value by substituting an accessible proxy for what we are really looking for, just as a Fitbit reduces our multidimensional goal of health and fitness to a single measure, the number of steps, providing us with a score and a motivational chart to go with it. By outsourcing the process of deliberating about value, we can stop thinking about what it really means to be healthy and simply focus on how many steps we can take.
Proxy assessments of capability offer just such a shortcut. Yet the goal of power analysis must be deeper: to engage with questions about what power means and how it works within domains. Building a necessarily impressionistic picture that captures the multiple dimensions of power will not provide clear-cut conclusions, but it can open political conversations about resource investments, resilience strategies, and cooperation policies with allies and partners. The temptation to create reductive rankings, visualizations, or datasets that strip data of its underlying context may lead to misinterpretation by their users. Repeated use of these tools may lead to them becoming the very ends of policy, as decision-makers conclude that a stronger position relative to adversaries and competitors comes from policies that improve the metrics used. Policymakers should be wary of focusing on the index rather than embracing the complexity of the social reality it purports to represent.
About the Author:
Nicholas Kitchen: Associate Professor at the University of Surrey and the executive director of the Center for the Study of Global Power Competition. This work was produced as part of the Carnegie Foundation's Beyond Disruption initiative.