Estimates of the costs incurred by a data breach can vary enormously. For instance, a 2015 Congressional Research Service report titled “The Target and Other Financial Data Breaches: Frequently Asked Questions” compiled seven different sources’ estimates of the total losses resulting from the 2013 Target breach; those estimates ranged from $11 million to $4.9 billion. The high degree of uncertainty and variability surrounding cost estimates for cybersecurity incidents has serious policy consequences: it makes it more difficult both to foster robust insurance markets for these risks and to decide on the appropriate level of investment in security controls and defensive interventions. Multiple factors contribute to the poor data quality: cybercrime is continuously evolving, cybercriminals succeed by covering their tracks, and victims often see more risk than benefit in sharing information. Moreover, the data that does exist is often criticized for an over-reliance on self-reported surveys and for the tendency of many security firms to overestimate the costs associated with security breaches in order to promote their own products and services.
While the general lack of good cost data presents a significant impediment to informed decision-making, our ignorance of the economic impacts of data breaches is not uniform: it varies across categories of costs, events, and stakeholders. Moreover, the need for precision, accuracy, or concurrence in data estimates varies depending on the specific decisions the data is intended to inform. Our overarching goals in this paper are to clarify which types of cybersecurity cost data are more easily collected than others; how policymakers might improve data access and why previous policy-based efforts to do so have largely failed; and what this differential ignorance implies for cybersecurity policy and investment in cyber defenses and mitigation.
To address these questions, we examine several common presumptions about the relative magnitudes of cybercrime cost effects for which generally accepted and reasonably precise quantitative estimates are lacking. For example, we review the evidence supporting two commonly accepted and often cited claims: that the aggregate investments in defending against and remediating cybercrimes significantly exceed the aggregate investments by attackers, and that the aggregate harm suffered by victims of cybercrimes exceeds the benefits realized by attackers. Other such statements are more contentious. For example, it is unclear whether the aggregate expenditures on cyber defense and remediation exceed the aggregate harms from cybercrimes, or whether a significant change in expenditures on cyber defense and remediation would result in proportionately larger changes in the harms resulting from cybercrimes. For each of these presumptions, we consider the existing evidence, what additional evidence might be needed to develop more precise quantitative estimates, and what better estimates might imply for cyber policy and investment.
We argue that the persistent inability to accurately estimate certain types of costs associated with data breaches, especially reputational and loss-of-future-business costs, has played an outsize and detrimental role in dissuading policymakers from collecting data on other, much less fundamentally uncertain costs, including legal fees, ex-ante defense investments, and credit monitoring and notification. We also propose steps that policymakers can take toward aggregating more reliable, consistently collected cost data for the categories of breach costs that are most susceptible to rigorous measurement, without getting bogged down in discussions of the costs that are most difficult to measure and that are therefore likely to remain the most uncertain. We further argue that the high degree of ignorance and uncertainty surrounding this latter subset of data breach costs should not be used as a reason to abandon measurement of other types of losses caused by these incidents, and that explicit consideration of our differential ignorance of breach cost elements can help us better understand which questions about the economic impacts of data breaches can and cannot be meaningfully answered.