
Evaluation Summary and Metrics: "Advance Market Commitments: Insights from Theory and Experience"

Summary, metrics and ratings, and Manager’s comments on the evaluation of "Advance Market Commitments: Insights from Theory and Experience" by Kremer et al.

Published on Mar 20, 2023

Preamble

“Advance Market Commitments: Insights from Theory and Experience,” by Michael Kremer, Jonathan Levin, and Christopher M. Snyder, AEA Papers and Proceedings, Vol. 110 (2020), pp. 269–273.[1]

We organized three evaluations of this paper (1. David Manheim, 2. Dan Tortorice, 3. Joel Tan). To read the evaluations, click the links at the bottom.

Evaluation Manager’s comments (David Reinstein; these comments are mainly about the ‘process’)

This was among the first papers we considered for Unjournal evaluation. We are grateful to the authors for agreeing to participate in the Unjournal’s evaluation, engaging with it, and being very helpful and supportive throughout. (Although this was first an NBER working paper, it was selected before we began the “Unjournal Direct” track.)

Note: I also give a general summary of the evaluations and responses in this post on the EA Forum.

Why we chose this paper

We chose this paper for several reasons. The points below repeat what I wrote on the EA Forum and specifically shared with evaluators.

Advance market commitments for vaccines and drugs seem highly relevant to both global health and development priorities, and to reducing catastrophic and/or existential risk from future pandemics. This is also a practicable policy.

However, as Evaluator 1 (Manheim) notes, the authors do not specifically consider ‘global catastrophic risks’ or extinction risks, or extend their analyses to these considerations. Perhaps future evaluators interested in how research relates to these priorities might make specific suggestions that the authors could respond to. We will work to encourage and enable more interactive dialogue between authors and evaluators.

The authors make specific empirical claims based on specific calculations that could be critically assessed, as well as a specific formal economic model of maximization with empirical implications.

The authors are well-respected in their field (obviously, this includes a Nobel laureate), but the paper may not have been as carefully reviewed and assessed as it could have been. “AEA Papers and Proceedings” does go through some selection and scrutiny but is not peer-reviewed in the same way that papers in journals like the American Economic Review are.

The authors stand strongly behind their work and are eager to promote its impact; e.g., see this NY Times Op-ed from one of the authors.

The authors’ fairly detailed engagement with this evaluation confirms this.

The calibration model and some other parts of the explanation might be better suited to interactive and linked formats, rather than PDFs, to get the maximum value (but this is not necessary).

We did not end up pushing to get the authors or evaluators to provide this format. In fact, it was challenging to get evaluators to directly engage with the data and empirics. (Why? Perhaps this demands a particular set of skills; it may also be labor-intensive and perceived as carrying greater reputational risks than rewards. We hope to better enable this work going forward, including by working with robustness-replication initiatives.)

Why does it need (more) review? It was published in AEA P&P, which I’ve heard is ‘chosen’ more than peer-reviewed.

‘What sort of reviewers, and what to ask them?’

We were looking for evaluators:

  1. With policy expertise:

  • Familiarity with GAVI and AMCs

  • Who can assess ‘whether the authors have a reasonable understanding of the feasibility of this’ and ‘are the strategic considerations they highlight relevant/realistic?’

  • Who can identify other policy opportunities

  2. And/or who could assess:

  • Choices made in inferring the effect relative to the counterfactual: e.g., is rotavirus a reasonable comparison group?

  • The cost-effectiveness model: “PCV saved 700,000 lives at a highly favorable cost” – consider the components of this model and its robustness (perhaps a Monte Carlo Fermi estimate)?

Evaluators were asked to follow the general guidelines available here. They were also provided with these additional resources specific to this paper, rationale for its selection, and an ‘editorial’ first pass of aspects of the paper to consider.

The third evaluator (Joel Tan) was given a specific request[1]:

We are looking for a quantitative evaluation/robustness checking of the credibility of the empirical exercise, how robust it seems to be, and how realistic it seems in the face of other estimates and work that has been done. In a sense, this is like a cross between a peer review and a very small consulting project. In particular, we would like you to focus on point 1 below. We would also like you to consider point 2, at least briefly, and go into detail if you find it worthwhile.

1. Consider the choices made in inferring the effect of the AMC relative to the counterfactual: In particular, is Rotavirus a reasonable comparison group? What would be an appropriate comparison group or modeling alternative to a single comparison group? Is the time period unusual? Do the results seem in the right ballpark? How might the results change with an alternative (potentially more appropriate) modeling approach? (Feel free to do your own back-of-the-envelope calculations, or compare to other related literature.)

2. Cost-effectiveness model: “PCV saved 700,000 lives at a highly favorable cost” – consider the components of this model and its robustness (e.g., perhaps a Monte Carlo Fermi estimate?)
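To illustrate the kind of ‘Monte Carlo Fermi’ exercise we had in mind, here is a minimal sketch. Every input distribution below (doses delivered, deaths averted per dose, cost per dose) is a made-up placeholder, not a figure from the paper or from GAVI; the only point is how uncertainty propagates through a cost-effectiveness calculation like the “700,000 lives” one.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 100_000  # Monte Carlo draws

# All distributions below are illustrative placeholders, not the paper's inputs.
doses = np.clip(rng.normal(150e6, 30e6, n), 1e6, None)  # doses delivered under the AMC
deaths_averted_per_dose = rng.lognormal(
    mean=np.log(700_000 / 150e6), sigma=0.5, size=n)    # centred near the headline figure
cost_per_dose = rng.triangular(2.0, 3.5, 7.0, n)        # USD per dose, illustrative

lives_saved = doses * deaths_averted_per_dose
cost_per_life = cost_per_dose / deaths_averted_per_dose  # doses cancel out

print(f"Lives saved -- median: {np.median(lives_saved):,.0f}, "
      f"90% interval: {np.percentile(lives_saved, [5, 95]).round(0)}")
print(f"Cost per life saved -- median: ${np.median(cost_per_life):,.0f}")
```

Substituting the components of the authors’ actual calibration for these placeholder distributions would give a first-pass robustness check on the headline estimate.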

Overlooked replication package: lessons learned

However, we both overlooked that the authors had provided this replication package as a supplement to their AEA Papers and Proceedings publication. In our correspondence with Joel, we had originally linked to the NBER version of the paper, and thus Joel did not notice this package.

We asked Joel whether this would have made a difference, and how he would follow up on this work if he had more time. His response:

It would definitely have been valuable…. I imagine the basic thing I would want to do is double-check the spreadsheet – I highly doubt we’ll get anything like Reinhart/Rogoff, but it would have been remiss not to check. That said, (a) I guess it’s valuable to have used a different methodology and found the results to be broadly similar; and (b) most of the work in the review went into gathering data on vaccine coverage and checking how the counterfactual impact analysis is robust against other comparator classes anyway, such that the replication spreadsheet doesn’t help per se.

If I had more time/funding, I think I would be mainly interested in trying to find better datasets (it’s definitely a real possibility I missed something given the short timelines), and in trying the analysis against more comparator classes – finding a more comprehensive weighted average, maybe. I would want to do this (a) both with KLS’s specific approach and with the one I used; and (b) I would also want to look more closely at the 2nd part of the question, on the issue of DALYs per dose, since we can definitely model the per annum decline in counterfactual disease burden in low-income countries and extrapolate that to a declining value of the vaccine.
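To make the last part of that suggestion concrete, here is a minimal sketch of the extrapolation Joel describes, under assumed numbers: if the counterfactual disease burden declines at some annual rate, the DALYs averted per dose decline correspondingly. Both parameters below are placeholders, not estimates from the paper or the evaluation.

```python
# Assumed placeholders: annual decline in counterfactual disease burden and
# baseline DALYs averted per dose. Neither number comes from the paper.
annual_decline = 0.03
dalys_per_dose_year0 = 0.05

for t in range(11):  # years since rollout
    dalys = dalys_per_dose_year0 * (1 - annual_decline) ** t
    print(f"Year {t}: {dalys:.4f} DALYs averted per dose")
```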

More work and how to make it happen: This makes sense to me. I agree that there is more work to be done here. This work can be time-consuming, requires technical skill and field understanding, and may not be glamorous. At The Unjournal, we are exploring ways to link with other projects, such as the Institute for Replication, to provide incentives and environments for robustness-checking, sensitivity analysis, and policy-relevant extrapolation.

On the data and code: I personally lean towards a ‘replicable code and data’ approach, preferably using dynamic documents and ‘literate code’, rather than relying on spreadsheets. Code is easier to adjust, to interpret, to spot mistakes in, and to build on. OSF, BITSS, and the Turing Way project, among others, offer good resources for this.
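As a small illustration of why, here is the sort of calculation a spreadsheet cell might hide, written instead as named, commented code. All inputs except the paper’s 700,000-lives estimate are placeholders.

```python
# Placeholders except the 700,000 figure, which is the paper's headline estimate.
program_cost_usd = 1.5e9   # hypothetical total AMC spend
doses_delivered = 225e6    # hypothetical doses delivered
lives_saved = 700_000      # headline estimate from the paper

# In a spreadsheet this might be an unlabeled "=B2/B4" cell; here every
# quantity is named, so the calculation can be audited and varied easily.
cost_per_life_saved = program_cost_usd / lives_saved
cost_per_dose = program_cost_usd / doses_delivered

print(f"Cost per life saved: ${cost_per_life_saved:,.0f}")
print(f"Cost per dose: ${cost_per_dose:.2f}")
```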

Looking for the data and packages: This also teaches us a lesson: when doing replication, robustness, and extrapolation work, we need to check carefully to be sure we access all the resources authors provide. These are sometimes found only in appendices and supplements to the most recent, formally published version of a paper. Thankfully, journals like those associated with the American Economic Association now enforce data and code availability policies, with the help of a dedicated data editor. But our bibliometric systems could be improved as well: where replication packages are available, they could be made more prominent and better linked to the places people are likely to be reading the paper.

Metrics

Ratings

| Rating category | Evaluator 1 (David Manheim): Rating (0–100) | 90% CI (0–100)* | Comments (footnotes) | Evaluator 2 (Dan Tortorice): Rating (0–100) | Confidence (high = 5, low = 0)[2] | Evaluator 3 (Joel Tan): Rating (0–100) | 90% CI (0–100) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Overall assessment | 80 | (70, 90) | [3] | 80 | 4 | 79 | (59, 94) |
| Advancing knowledge and practice | 25 | (20, 40) | [4] | 90 | 5 | 90 | (70, 100) |
| Methods: justification, reasonableness, validity, robustness | 95 | (85, 97.5) | [5] | 80 | 4 | 70 | (50, 90) |
| Logic & communication | 75 | (60, 90) | [6] | 80 | 4 | 70 | (50, 90) |
| Open, collaborative, replicable | N/A | N/A | [7] | 90 | 3 | 50 | (30, 70) |
| Engaging with real-world, impact quantification; practice, realism, and relevance[8] | 90 | (70, 100) | | | | | |
| Relevance to global priorities | 60 | (40, 75) | [9] | 95 | 3 | 90 | (70, 100) |

Predictions

| Prediction metric | David Manheim: Rating (0–5) | 90% CI (0–5)* (or low to high) | Comments | Dan Tortorice: Rating (0–5) | 90% CI (0–5)* (or low to high) | Comments | Joel Tan: Rating (0–5) | Confidence |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| What ‘quality journal’ do you expect this work will be published in?[10] (0 = lowest/none, 5 = highest/best) | 3 | (2.5, 4.5) | The paper seems (counterfactually) very likely to get accepted by a mid-tier journal, even if only due to the authors, and is moderately likely to be accepted into a better journal.[11] | 4 | 5 | Published in AEA P&P | 5 | High |
| On a ‘scale of journals’, what ‘quality of journal’ should this be published in? (0 = lowest/none, 5 = highest/best) | N/A | | | 4 | 5 | | 5 | High |
