Evaluation Summary and Metrics: "Long Term Cost-Effectiveness of Resilient Foods for Global Catastrophes Compared to Artificial General Intelligence Safety"
Evaluation 1 of "Long term cost-effectiveness of resilient foods for global catastrophes compared to artificial general intelligence"
This is an evaluation of Denkenberger et al. (2022).
90% CI (20, 60)
“On a ‘scale of journals’, what ‘quality of journal’ should this be published in?: Note: 0= lowest/none, 5= highest/best”
90% CI: (1, 2)
See HERE for a more detailed breakdown of the evaluators’ ratings and predictions.
[Manager note: we corrected minor typos in the text.]
This is a very interesting paper on an important and neglected topic. I’d be surprised if I ever again read a paper with such potential importance to global priorities. The authors motivate the discussion well, and should be highly commended for their clear presentation of the structural features of their model, and the thoughtful nature in which uncertainty was addressed head-on in the paper.
Overall, I suspect the biggest contribution this paper will make is contextualising the existing work done by the authors on resilient food into the broader literature of long-termist interventions. This is a significant achievement, and the authors should feel justifiably proud of having accomplished it. However, the paper unfortunately has a number of structural and technical issues which should significantly reduce a reader’s confidence in the quantitative conclusions which aim to go beyond this contextualisation.
In general, there are three broad areas where I think there are material issues with the paper:
The theoretical motivation for their specific philosophy of cost-effectiveness, and specifically whether this philosophy is consistent throughout the essay
The appropriateness of the survey methods, in the sense of applying the results of a highly uncertain survey to an already uncertain model
Some specific concerns with parameterisation
None of these concerns touch upon what I see [as] the main point of the authors, which I take to be that ‘fragile’ food networks should be contextualised alongside other sources of existential risk. I think this point is solidly made, and important. However, these concerns do suggest that significant additional work may be needed to properly prove the headline claim of the paper, which is that, in addition to fragile food networks being a source of existential risk, investing in resilient food offers amongst the highest benefit-per-cost of any existential risk mitigation.
Structure of cost-effectiveness argument
One significant highlight of the paper is the great ambition it shows in resolving a largely intractable question. Unfortunately, I feel this ambition is also something of a weakness of the paper, since it ends up being difficult to follow the logic of the argument throughout.
Structurally, the most challenging element of this paper in terms of argumentative flow is the decision to make the comparator for cost-effectiveness analysis ‘AGI Catastrophe’ rather than ‘do nothing’. My understanding is that the authors make this decision to clearly highlight the importance of resilient food – noting that, “if resilient foods were more cost effective than AGI safety, they could be the highest priority [for the existential risk community]” (since the existential risk community currently spends a lot on AGI Risk mitigation). So roughly, they start with the assumption that AI Risk must be cost-effective, and argue that anything more cost-effective than this must therefore also be cost-effective. The logic is sound, but this decision causes a number of problems with interpretability, since it requires the authors to compare an already highly uncertain model of food resilience against a second highly uncertain model of AGI risk.
The biggest issue with interpretability this causes is that I struggle to understand which features of the analysis are making resilient food appear cost-effective because of some feature of resilient food, and which are making resilient food appear cost-effective because of some feature of AI. The methods used by the authors mean that a mediocre case for resilient food could be made to look highly cost-effective with an exceptionally poor case for AI, since their central result is the multiplier of value on a marginally invested dollar for resilient food vs AI. This is important, because the authors’ argument is that resilient food should be funded because it is more effective than AI Risk management, but this is motivated by AI Risk proponents agreeing [that] AI Risk is important – in scenarios where AI Risk is not worth investing in, this assumption is broken and cost-effectiveness analysis against a ‘do nothing’ alternative is required. For example, the authors do not investigate scenarios where the benefit of the intervention in the future is negative because “negative impacts would be possible for both resilient foods and AGI safety and there is no obvious reason why either would be more affected”. While this is potentially reasonable on a mathematical level, it does mean that it would be perfectly possible for resilient foods to be net harmful and the paper not correctly identify that funding them is a bad idea – simply because funding AI Risk reduction is an even worse idea, and this is the only given alternative. If the authors want to compare AGI risk mitigation and resilient foods against each other without a ‘do nothing’ common comparator (which I do not think is a good idea), they must at the very least do more to establish that the results of their AI Risk model map closely to the results which cause the AI Risk community to fund AI Risk mitigation so much. As this is not done in the paper, a major issue of interpretability is generated.
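To illustrate the interpretability concern, the following minimal sketch compares two scenarios that produce the same headline ratio. All numbers are hypothetical and chosen only for illustration; they are not taken from the paper, and the function name is an assumption.

```python
# Value per dollar relative to 'do nothing', in arbitrary units.
# All figures are hypothetical and do not come from the paper.

def ratio_to_comparator(food_value: float, agi_value: float) -> float:
    """Cost-effectiveness of resilient food as a multiple of the AGI comparator."""
    return food_value / agi_value

# Scenario A: both interventions are strong relative to 'do nothing'.
strong = {"food": 8.0, "agi": 2.0}
# Scenario B: a mediocre food case paired with a very poor AGI case.
weak = {"food": 0.5, "agi": 0.125}

ratio_strong = ratio_to_comparator(strong["food"], strong["agi"])
ratio_weak = ratio_to_comparator(weak["food"], weak["agi"])

print("ratio, strong scenario:", ratio_strong)
print("ratio, weak scenario:  ", ratio_weak)
print("food value vs 'do nothing', strong / weak:", strong["food"] / weak["food"])
```

Both scenarios report the same multiplier, yet the value of resilient food against ‘do nothing’ differs by more than an order of magnitude. A ratio alone cannot distinguish between them.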
A second issue this causes is that the authors must make an awkward ‘assumption of independence’ between nuclear risk, food security risk and AI risk. Although the authors identify this as a limitation of their modelling approach, the assumption does not need to be made if AI risk is not included as a comparator in the model. I don’t think this is a major limitation of the work, but an example of how the choice of comparator has an impact on structural features of the model beyond just the comparator.
More generally, this causes the authors to have to write up their results in a non-natural fashion. As an example of the sort of issues this causes, conclusions are expressed in entirely non-natural units in places (“Ratio of resilient foods mean cost effectiveness to AGI safety mean cost effectiveness” given $100m spend), rather than units which would be more natural (“Cost-effectiveness of funding resilient food development”). I cannot find expressed anywhere in the paper a simple table with the average costs and benefits of the two interventions, although a reference is made to Denkenberger & Pearce (2016) where these values were presented for near-term investment in resilient food. This makes it extremely hard for a reader to draw sensible policy conclusions from the paper unless they are already an expert in AGI risk and so have an intuitive sense of what an intervention which is ‘3-6 times more cost-effective than AGI risk reduction’ looks like. The paper might be improved by the authors communicating summary statistics in a more straightforward fashion. For example, I have spent some time looking for the probability the model assigns to no nuclear war before the time horizon (and hence the probability that the money spent on resilient food is ‘wasted’ with respect to the 100% shortfall scenario) but can’t find this – that seems to be quite an important summary statistic but it has to be derived indirectly from the model.
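The missing summary statistic can be approximated by hand under a simplifying assumption. The sketch below assumes a constant, independent annual probability of nuclear war and uses the per-year figure of 7x10^-3 that this review quotes from the paper; the time horizons are hypothetical, since the paper's horizon is not restated here.

```python
def prob_no_war(annual_probability: float, horizon_years: int) -> float:
    """Probability of no event over the horizon, assuming a constant and
    independent annual probability (a simplifying assumption)."""
    return (1.0 - annual_probability) ** horizon_years

annual_p = 7e-3  # per-year probability quoted in this review
for horizon in (10, 30, 100):  # hypothetical horizons, for illustration only
    share = prob_no_war(annual_p, horizon)
    print(f"{horizon:>3} years: P(no war) = {share:.3f}")
```

Under these assumptions, the printed value is the share of simulated futures in which the spending is never called upon for the 100% shortfall scenario.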
Fundamentally, I don’t understand why both approaches were not compared to a common scenario of ‘do nothing’ (relative to what we are already doing). The authors’ decision to compare AGI Risk mitigation to resilient foods directly would only be appropriate if the authors expect that increasing funding for resilient food decreased funding for AI safety (that is to say, the authors are claiming that there is a fixed budget for AI-safety-and-food-resilience, and so funding for one must come at the expense of the other). This might be what the authors have in mind as a practical consequence of their argument, as there is an implication that funding for resilient foods might come from existing funding deployed to AGI Risk. But it is not logically necessary that this is the case, and so it creates great conceptual [confusion] to include it in a cost-effectiveness framework that requires AI funding and resilient food funding to be strictly alternatives. To be clear, the ‘AI subunit’ is interesting and publishable in its own right, but in my opinion simply adds complexity and uncertainty to an already complex paper.
Continuing on from this point, I don’t understand the conceptual framework that has the authors consider the value of invested dollars in resilient food at the margin. The authors’ model of the value of an invested dollar is an assumption that it is distributed logarithmically. Since the entire premise of the paper hinges on the reasonableness of this argument, it is very surprising there is no sensitivity analysis considering different distributions of the relationship between intervention funding and value. Nevertheless, I am also confused as to the model even on the terms the authors describe; the authors’ model appears to be that there is some sort of ‘invention’ step where the resilient food is created and discovered (this is mostly consistent with Denkenberger & Pearce (2016), and is the only interpretation consistent with the question asked in the survey). In which case, the marginal value of the first invested dollar is zero because the ‘invention’ of the food is almost a discrete and binary step. The marginal value per dollar continues to be zero until the 86 millionth dollar, where the marginal value is the entire value of the resilient food. There seems to be no reason to consider the marginal dollar value of investment when a structural assumption made by the authors is that there is a specific level of funding which entirely saturates the field, and this would make presenting results significantly more straightforward – it is highly nonstandard to use marginal dollars as the unit of cost in a cost-effectiveness analysis, and indeed is so nonstandard I’m not certain fundamental assumptions of cost-effectiveness analysis still hold. I can see why the authors have chosen to bite this bullet for AI risk given the existing literature on the cost of preventing AI Catastrophe, but there seems to be no reason for it when modelling resilient food and it departs sharply from the norm in cost-effectiveness analysis.
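The difference between the two pictures of returns can be made concrete with a small sketch. The logarithmic form below is a hypothetical stand-in (the paper's exact functional form is not reproduced here), and the total value is normalised to 1 in arbitrary units; only the $86m threshold comes from the text.

```python
import math

THRESHOLD = 86e6   # dollars; the saturating funding level discussed above
TOTAL_VALUE = 1.0  # total value of the intervention, arbitrary units (assumed)

def value_log(spend: float) -> float:
    """Hypothetical logarithmic returns, scaled so that
    value_log(THRESHOLD) equals TOTAL_VALUE."""
    return TOTAL_VALUE * math.log1p(spend) / math.log1p(THRESHOLD)

def value_step(spend: float) -> float:
    """Discrete 'invention': no value until the threshold is fully funded."""
    return TOTAL_VALUE if spend >= THRESHOLD else 0.0

def marginal(value_fn, spend: float, step: float = 1.0) -> float:
    """Value added by the next `step` dollars at a given spending level."""
    return value_fn(spend + step) - value_fn(spend)

for spend in (0.0, 1e6, THRESHOLD - 1.0):
    print(f"spend={spend:>12,.0f}  log: {marginal(value_log, spend):.3e}"
          f"  step: {marginal(value_step, spend):.3e}")
```

Under the step model every dollar but the last has zero marginal value, whereas the logarithmic model assigns the highest marginal value to the first dollar. The two structural assumptions therefore give opposite answers to the question of where a marginal dollar matters most.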
Finally, I don’t understand the structural assumptions motivating the cost-effectiveness of the 10% decline analysis. The authors claim that the mechanism by which resilient foods save lives in the 10% decline analysis is that “the prices [of non-resilient food] would go so high that those in poverty may not be able to afford food” with the implication that resilient foods would be affordable to those in poverty and hence prevent starvation. However, the economic logic of this statement is unclear. It necessitates that the production costs of resilient food are lower than the production costs of substitute non-resilient food at the margin, which further implies that producers of resilient food can command supernormal profits during the crisis, which is to say the authors are arguing that resilient foods represent potentially billions of dollars of value to their inventor within the inventor’s lifetime. It is not clear to me why a market-based solution would not emerge for the ‘do nothing’ scenario, which would be a critical issue with the authors’ case since it would remove the assumption that ‘resilient food’ and ‘AGI risk’ are alternative uses of the same money in the 10% scenario, which is necessary for their analysis to function. The authors make the further assumption that preparation for the 100% decline scenario is highly correlated with preparation for the 10% decline scenario, which would mean that a market-based solution emerging prior to nuclear exchange would remove the assumption that ‘resilient food’ and ‘AGI risk’ are alternative uses of the same money in the 100% decline scenario. A supply and demand model might have been a more appropriate model for investigating this effect. Once again, I note that the supply and demand model alone would have been an interesting and publishable piece of work in its own right.
Overall, I think the paper would have benefitted from more attention being paid to the underlying theory of cost-effectiveness motivating the investigation. Decisions made in places seem to have multiplied uncertainty which could have been resolved with a more consistent approach to analysis. As I highlighted earlier, the issues stem only from the incredible ambition of the paper, and the authors should be commended for managing to find a route to connect two separate microsimulations, an analysis of funding at the margin and a supply-and-demand model. Nevertheless, the combination of these three approaches weakens the ability to draw strong conclusions from each of these approaches individually.
With respect to methods, the authors use a Monte Carlo simulation with distributions drawn from a survey of field experts. The use of a Monte Carlo technique here is an appropriate choice given the significant level of uncertainty over parameters. The model appears appropriately described in the paper, and functions well (I have only checked the models in Guesstimate, as I could not make the secondary models in Analytica function). A particular highlight of the paper is the figures clearly laying out the logical interrelationship of elements of the model, which made it significantly easier to follow the flow of the argument. I note the authors use ‘probability more effective than’ as a key result, which I think is a natural unit when working in Guesstimate. This is entirely appropriate, but a known weakness of the approach is that it can bias in favour of poor interventions with high uncertainty. The authors could also have presented a SUCRA analysis which does not have this issue, but they may have considered and rejected this approach as unnecessary given the entirely one-sided nature of the results which a SUCRA would not have reversed.
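The known weakness of ‘probability more effective than’ can be shown with a toy Monte Carlo simulation. The sketch below assumes normally distributed cost-effectiveness with hypothetical means and standard deviations; none of the numbers come from the paper, and the function name is an assumption.

```python
import random

def prob_b_beats_a(mean_a, sd_a, mean_b, sd_b, draws=100_000, seed=0):
    """Monte Carlo estimate of P(B > A) for independent normal draws."""
    rng = random.Random(seed)
    wins = sum(
        rng.gauss(mean_b, sd_b) > rng.gauss(mean_a, sd_a)
        for _ in range(draws)
    )
    return wins / draws

# A is the better intervention on average (mean 1.0 against 0.5) in both cases.
# Only the uncertainty around the weaker intervention B changes.
p_precise = prob_b_beats_a(1.0, 0.1, 0.5, 0.1)    # B is tightly estimated
p_uncertain = prob_b_beats_a(1.0, 0.1, 0.5, 5.0)  # B is highly uncertain

print(f"P(B > A), precise B:   {p_precise:.3f}")
print(f"P(B > A), uncertain B: {p_uncertain:.3f}")
```

With the means held fixed, widening the uncertainty around the weaker option moves its ‘probability more effective’ from near zero towards a coin flip, which is the bias described above.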
The presentation of the sensitivity analysis as ‘number of parameters needed to flip’ is nonstandard, but a clever way to intuitively express the level of confidence the authors have in their conclusions. Although clever, I am uncertain if the approach is appropriately implemented; the authors limit themselves to the 95% CI for their definition of an ‘unfavourable’ parameter, and I think this approach hides massive structural uncertainty with the model. For example, in Table 5 the authors suggest their results would only change if the probability of nuclear war per year was 4.8x10^-5 (plus some other variables changing) rather than their estimate of 7x10^-3 (incidentally, I think the values for S model and E model are switched in Table 5 – the value for pr(nuclear war) in the table’s S model column corresponds to the probability given in the E model). But it is significantly overconfident to say that risk of nuclear war per year could not possibly be below 4.8x10^-5, so I think the authors overstate their certainty when they say “reverting [reversing?] the conclusion required simultaneously changing the 3-5 most important parameters to the pessimistic ends”; in fact it merely requires that the authors have not correctly identified the ‘pessimistic end’ of any one of the five parameters, which seems likely given the limitations in their data which I will discuss momentarily. I personally would have found one- and two-dimensional threshold analysis a more intuitive way to present the results, but I think the authors have a reasonable argument for their approach. As described earlier, I have some concerns about whether an appropriate amount of structural sensitivity analysis was undertaken, but the presentation of uncertainty analysis is appropriate in its own terms (if somewhat nonstandard).
Overall, I have no major concerns about the theory or application of the modelling approach. However, I have a number of concerns with the use of the survey instrument:
First, the authors could have done more to explain the level of uncertainty their survey instrument contains. They received eight responses, which is already a very low number of responses for a quantitative survey. In addition, two of the eight responses were from authors of the paper. The authors discuss ‘response bias’ and ‘demand characteristic bias’ which would not typically be applied to data generated by an approximately autoethnographic process – it is obvious that the authors of a survey instrument know what purpose the instrument is to be used for, and have incentives to make the survey generate novel and interesting findings. It might have been a good sensitivity analysis to exclude responses from the authors and other researchers associated with ALLFED since there is a clear conflict of interest that could bias results here.
Second, issues with survey data collection are compounded by the fact that some estimates which are given in the S Model are actually not elicited with the survey technique – they are instead cited to Denkenberger & Pearce (2016) and Denkenberger & Pearce (2018). This is described appropriately in the text, but not clearly marked in the summary Table 1 where I would expect to see it, and the limitation this presents is not described clearly. To be explicit, the limitation is that at least two key parameters in the model are based on a sample of the opinions of two of the eight survey respondents, rather than the full set of eight respondents. As an aside on presentation, the decision to present lower and upper credible intervals in Table 1 rather than the median is non-standard for an economics paper, although perhaps this is a discipline-specific convention I am unaware of. Regardless, I’m not sure it is appropriate to present the lowest of eight survey responses as the ‘5th percentile’, as it is actually the 13th percentile and giving 95% confidence intervals implies a level of accuracy the survey instrument cannot reach. While I appreciate the 13th percentile of 8 responses will be the same as the 5th percentile of 100 samples drawn from those responses, this is not going to be clear to a casual reader of the paper. ‘Median (range)’ might be a better presentation of the survey data in this table, with better clarity on where each estimate comes from. Alternatively, the authors could look at fitting a lognormal distribution to the survey results using e.g. method of moments, and then resample from the new distribution to create a genuine 95% CI. Regardless, given the low number of responses, it might have been appropriate simply to present all eight estimates for each relevant parameter in a table.
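The suggested alternative can be sketched as follows. The eight responses are hypothetical placeholders, not the survey data, and the sketch reads tail percentiles directly from the fitted distribution instead of resampling, which gives the same answer in the limit of many draws.

```python
import math
from statistics import NormalDist, mean, variance

def fit_lognormal_moments(samples):
    """Method-of-moments fit: match the sample mean and sample variance
    to those of a lognormal distribution; returns (mu, sigma) of the log."""
    m = mean(samples)
    v = variance(samples)
    sigma_sq = math.log(1.0 + v / m**2)
    mu = math.log(m) - sigma_sq / 2.0
    return mu, math.sqrt(sigma_sq)

def lognormal_quantile(mu, sigma, q):
    """Quantile q (0 < q < 1) of a lognormal with log-mean mu and log-sd sigma."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(q))

# Hypothetical responses from eight experts (placeholder values).
responses = [0.5, 1.0, 2.0, 2.0, 3.0, 5.0, 8.0, 20.0]

# The smallest of n responses sits at the 1/n quantile of the raw sample.
rank_of_minimum = 1 / len(responses)

mu, sigma = fit_lognormal_moments(responses)
lo = lognormal_quantile(mu, sigma, 0.05)
hi = lognormal_quantile(mu, sigma, 0.95)

print(f"empirical quantile of the minimum: {rank_of_minimum:.3f}")
print(f"fitted 5th and 95th percentiles: {lo:.2f}, {hi:.2f}")
```

The fitted percentiles extend beyond the observed minimum and maximum where the data support it, instead of treating the extreme responses as if they were the tails of the distribution.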
Third, the authors could have done more to make it clear that the ‘Expert Model’ was effectively just another survey with an n of 1. Professor Sandberg, who populated the Expert Model, is also an author on this paper and so it is unclear what if any validation of the Expert Model could reasonably have been undertaken – the E model is therefore likely to suffer from the same drawbacks as the S model. It is also unclear if Professor Sandberg knew the results of the S Model before parameterising his E Model – although this seems highly likely given that 25% of the survey’s respondents were Professor Sandberg’s co-authors. This could be a major source of bias, since presumably the authors would prefer the two models to agree and the expert parameterising the model is a co-author. I also think more work needs to be done to establish the Expert’s credentials in the field of agricultural R&D (necessary for at least some of the parameter estimates); although I happily accept Professor Sandberg is a world expert on existential risk and a clear choice to act as the parameterising ‘expert’ for most parameters, I think there may have been alternative choices (such as agricultural economists) who may have been more obviously suited to giving some estimates. There is no methodological reason why one expert had to be selected to populate the whole table, and no defence given in the text for why one expert was selected – the paper is highly multidisciplinary and it would be surprising if any one individual had expert knowledge of every relevant element. Overall, this limitation makes me extremely hesitant to accept the authors’ argument that the fact that S model and E model are both robust means the conclusion is equally robust.
Generally, I am sympathetic to the authors’ claim that there is unavoidable uncertainty in the investigation of the far future. However, the survey is a very major source of avoidable uncertainty, and it is not a reasonable decision of the authors to present the uncertainty due to their application of survey methods as the same kind of thing as uncertainty about the future potential of humanity. There are a number of steps the authors could have taken to improve the validity and reliability of their survey results, some of which would not even have required rerunning the survey (to be clear however, I think there is a good case for rerunning the survey to ensure a broader panel of responses). With the exception of the survey, however, methods were generally appropriate and valid.
Notwithstanding my concerns about the use of the survey instrument, I have some object level concerns with specific parameters described in the model.
The discount rate for both costs and benefits appears to be zero, which is very nonstandard in economic evaluation. Although the authors make reference to “long termism, the view that the future should have a near zero discount rate”, the reference for this position leads to a claim that a zero rate of pure time preference is common, and a footnote observing that “the consensus against discounting future well-being is not universal”. To be clear, pure time preference is only one component of a well-constructed discount rate and therefore a discount rate should still be applied for costs, and probably for future benefits too. Even notwithstanding that I think this is an error of understanding, it is a limitation of the paper that discount rates were not explored, given they seem very likely to have a major impact on conclusions.
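The likely size of the effect can be gauged with a short sketch. It uses the Ramsey rule, one common decomposition of a social discount rate into pure time preference plus a growth-related term, and standard present-value discounting. The parameter values, rates and 100-year horizon are hypothetical and are not taken from the paper.

```python
def ramsey_rate(pure_time_preference: float, elasticity: float, growth: float) -> float:
    """Ramsey rule: discount rate = pure time preference + elasticity * growth."""
    return pure_time_preference + elasticity * growth

def present_value(future_value: float, annual_rate: float, years: int) -> float:
    """Value today of a benefit or cost realised `years` from now."""
    return future_value / (1.0 + annual_rate) ** years

# Zero pure time preference does not by itself imply a zero discount rate.
rate_with_zero_ptp = ramsey_rate(0.0, 1.0, 0.02)  # illustrative parameters
print("discount rate with zero pure time preference:", rate_with_zero_ptp)

for rate in (0.0, 0.015, 0.035):  # illustrative annual rates
    pv = present_value(1.0, rate, 100)
    print(f"rate={rate:.3f}: present value of 1 unit in 100 years = {pv:.4f}")
```

Even modest positive rates shrink a benefit realised a century from now to a small fraction of its undiscounted value, which is why the choice of rate could plausibly change the conclusions.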
A second concern I have relating to parameterisation is the conceptual model leading to the authors’ proposed costing for the intervention. The authors explain their conceptual model linking nuclear war risk to agricultural decline commendably clearly, and this expands on the already strong argument in Denkenberger & Pearce (2016). However, I am less clear on their conceptual model linking approximately $86m of research to the widescale post-nuclear deployment of resilient foods. The assumption seems to be (and I stress this is my assumption based on Denkenberger & Pearce (2016) – it would help if the authors could make it explicit) that $86m purchases the ‘invention’ of the resilient food, and once the food is ‘invented’ then it can be deployed when needed with only a little bit of ongoing training (covered by the $86m). This seems to me to be an optimistic assumption; there seems to be no cost associated with disseminating the knowledge, or any raw materials necessary to culture the resilient food. Moreover, the model seems to structurally assume that distribution chains survive the nuclear exchange with 100% certainty (or that the materials are disseminated to every household which would increase costs), and that an existing resilient food pipeline exists at the moment of nuclear exchange which can smoothly take over from the non-resilient food pipeline.
I have extremely serious reservations about these points. I think it is fair to say that an economics paper which projected benefits as far into the future as the authors do here without an exploration of discount rates would be automatically rejected by most editors, and it is not clear why the standard should be so different for existential risk analysis. A cost of $86m to mitigate approximately 40% of the impact of a full-scale nuclear war between the US and a peer country seems prima facie absurd, and the level of exploration of such an important parameter is simply not in line with best practice in a cost-effectiveness analysis (especially since this is the parameter on which we might expect the authors to be least expert). I wouldn’t want my reservations about these two points to detract from the very good and careful scholarship elsewhere in the paper, but neither do I want to give the impression that these are just minor technical details – these issues could potentially reverse the authors’ conclusions, and should have been substantially defended in the text.
Overall, this is a novel and insightful paper which is unfortunately burdened with some fairly serious conceptual issues. The authors should be commended for their clear-sighted contextualisation of resilient foods as an issue for discussion in existential risk, and for the scope of their ambition in modelling. Academia would be in a significantly better place if more authors tried to answer far-reaching questions with robust approaches, rather than making incremental contributions to unimportant topics.
The paper’s issues lie in structural weaknesses in the cost-effectiveness philosophy deployed, methodological weaknesses in the survey instrument, and two potentially conclusion-reversing issues with parameterisation which should have been given substantially more discussion in the text. I am not convinced that the elements of the paper which are robust are sufficiently robust to overcome these weaknesses – my view is that it would be premature to reallocate funding from AI Risk reduction to resilient food on the basis of this paper alone. The most serious conceptual issue which I think needs to be resolved before this can happen is to demonstrate that ‘do nothing’ would be less cost-effective than investing $86m in resilient foods, given that the ‘do nothing’ approach would potentially include strong market dynamics leaning towards resilient foods. I agree with the authors that an agent-based model might be appropriate for this, although a conventional supply-and-demand model might be simpler.
I really hope the authors are interested in publishing follow-on work, looking at elements which I have highlighted in this review as being potentially misaligned to the paper that was actually published but which are nevertheless potentially important contributions to knowledge. In particular, the AI subunit is novel and important enough for its own publication.