Description
Evaluation of "Meaningfully reducing consumption of meat and animal products is an unsolved problem: A meta-analysis" for The Unjournal.
This response speaks solely for Seth. I am the lead author of the meta-analysis under discussion and I think it’s fair to say this paper is my baby (“you’ll always be a part of me…”).
Our paper has been accepted in a leading journal in our field after a brief round of straightforward revisions.
I happen to think this is an excellent paper. To shift my beliefs on this question, the reviewers would have needed to identify things that I understood as errors rather than differences of opinion/judgment calls. I happen to think that every criticism raised amounts to differences of judgment; that the issues raised are hard; that they don’t have clear right/wrong answers; that all meta-analyses boil down to judgment calls; and that ours were reasonable.
I recognize that our paper does some unusual things for a meta-analysis. In effect, the paper advances a distinct vision of what meta-analysis is for, and how to conduct it, that bears further motivation and explanation. I’ll start to do that here, but I think it calls for a separate methods note/paper.
A brief story. Once upon a time, when I was a firebrand of a grad student, I found what I thought were some serious flaws in the design and implementation of a small but well-regarded literature in social psychology. I emailed the authors of some of the original papers and they were receptive to a piece discussing these issues and offering solutions. I then sent them a first draft in which I basically said that some of their most highly regarded and influential papers did not provide valid, credible causal estimates. They did not like this one bit, and their response email, on which my advisor was CC’d, was a real masterclass in how to slap an upstart down. So these reviews have provided a bit of an “ah, how the wheel has turned!” moment for me — now I’m the scholar defending my work against ECRs/natives of the open science/credibility revolution movement. I view this transition fondly, like an old dog watching young dogs play at the park.
Nevertheless, I wish to counsel any such folks to try to strike a collegial and collaborative tone in such interactions as much as possible, unless the circumstances are pretty extreme. Aside from the obvious things — empirical research is hard, we put a ton of work into this already, it went through many rounds of internal revisions before we publicized it, meta-analyses are inherently limited and imperfect and require imperfect approximations, etc. — academia is not that big. Accusing someone of questionable research practices is, to me, pretty close to accusing them of fraud/tipping the scale in favor of a favored/pre-selected conclusion, i.e. of not really doing research. It risks creating enemies. Why not avoid that if you can?
Managers’ note 29 Jul 2025: The author’s response here reacts to an initial version of the second evaluator’s report, which, by the time you read this, may have been adjusted very slightly in tone. It also reacted to the use of the term “questionable research practices” (which may have been removed by now). This term is sometimes interpreted to mean intentional, opportunistic, or even fraudulent behavior. However, that is not how it was intended here.
On to the main points:
It’s standard practice for a meta-analysis to include a systematic search of relevant academic databases. If I had been doing this paper with a team of RAs, I probably would have done one. But more to the point, as I discovered pretty quickly, we were looking at an extremely heterogeneous, gigantic literature — think tens of thousands of papers — where sifting through it by search terms was probably going to be both extremely laborious and likely to yield a pretty low hit rate on average. So we tried something else, and all told, I think it was pretty successful. We addressed this in our supplement, which I reproduce verbatim here:
Our search process was shaped by three features of our research project. First, our surveyed literature was highly interdisciplinary, with few shared terms to describe itself. For instance, the term ‘MAP’ is not universally agreed upon; other papers use animal-based protein, edible animal products, or just meat, while some studies focus on a particular, sometimes unusual category of MAP, such as fish sauce (Kanchanachitra et al., 2020), or discuss their agenda mainly in terms of increasing plant-based alternatives. Coming up with an exhaustive list of terms to search for from first principles would have been very difficult or impossible.
Second, our methods-based inclusion criteria complicated screening on titles and abstracts. While it was sometimes possible to use solely that information to eliminate studies with no interventions (e.g. survey-based research), determining whether an intervention qualified almost always required some amount of full text screening. We also discovered that terms like field experiment have varying meanings across papers, and identifying whether a measured food choice was hypothetical or not often required a close reading. For these reasons, screening thousands or tens of thousands of papers struck us as prohibitively time-consuming.
Third, we found a very large number of prior reviews, typically aimed at one disciplinary strand or conceptual approach, touching on our research question. Reviewing tables and bibliographies of those papers proved fruitful for assembling our dataset.
For these reasons, we employed what could be called a ‘prior-reviews-first’ search strategy. Of the 985 papers we screened, a full 73% came from prior reviews, as did 43% of the papers in our main dataset. (See the next section of the supplement for notes on reviews that were especially informative.) Then, as detailed in the main text, we employed a multitude of other search strategies to fill in our dataset, one of which was systematic search. In particular, we searched Google Scholar for the following list of terms, and checked ten pages of results for each:
“dynamic” “norms” “meat”
“dynamic” “norms” “meat” “consumption”
“field” “experiment” “plant-based”
“meat” “alternatives” “default” “nudge”
“meat” “consumption” “reducing” “random”
“meat” “purchases” “information” “nudge”
“meat” “reduction” “randomized”
“meat” “sustainable” “random”
“nudge” “meat” “default”
“nudge” “reduce” “meat” “consumption”
“nudge” “sustainable” “consumption” “meat”
“nudge” “theory” “meat” “purchasing”
“norms” “animal” “products”
“nudges” “norms” “meat”
“random” “nudge” “meat”
“randomized controlled trial” “meat” “consumption” “reduce”
“sustainable” “meat” “nudge”
“sustainable” “meat” “nudge” “random”
“university” “meat” “default” “reduction”
Additionally, we searched the American Economic Association’s registry of randomized controlled trials (https://www.socialscienceregistry.org/) for the terms “meat” and “random” and reviewed all matching results in the relevant time frame.
Another innovative part of our search strategy was our use of an AI-based search tool (https://undermind.ai/), to which we described our research question and then reviewed 100 results that it generated. This yielded one paper meeting our inclusion criteria (Mattson, 2020) that seems to have slipped past many other systematic search processes.
Finally, we benefited from two in-progress literature reviews affiliated with Rethink Priorities, “a think-and-do tank” that researches animal welfare as one of its main priorities. Both of these literature reviews are aimed at assessing interventions that reduce MAP consumption, but have broader inclusion criteria than our paper employed. For more details on these two projects, see https://osf.io/74paj and https://meat-lime.vercel.app.
(As to the dispute about whether Google Scholar is a credible way to search for papers, I would say, if that were all we were using, it would be a problem.)
The main way we try to address bias is with strict inclusion criteria, which is a non-standard way to approach this but, in my opinion, a very good one (Simonsohn, Simmons & Nelson (2023) articulate this nicely). After that baseline level of focusing our analysis on the estimates we thought most credible, we thought it made more sense to focus on the risks of bias that seemed most specific to this literature. In this context, we didn’t think a formal risk-of-bias (RoB) assessment was a good fit for the kinds of design and measurement variation we actually encountered. I hope that our transparent reporting would let someone else replicate our paper and do this kind of analysis if that was of interest to them.
(I agree fully with the anonymous reviewer that our use of strict inclusion criteria lends a non-standard interpretation to our publication bias estimates. I do not think we can speak to publication bias in the literature as a whole. In terms of my own beliefs, I was pleasantly surprised by how often and plainly advocacy groups published their null findings.)
A peer reviewer raised a similar question, and we added the following section to our supplement:
We selected one dependent variable per intervention using a hierarchical approach: (1) behavioral outcomes over attitudinal/intentional outcomes, (2) the latest measurement timepoint with adequate sample size to meet our eligibility criteria, and (3) the outcome best corresponding to net MAP or RPM reduction. Assessing this usually meant identifying the study’s primary outcome. For example, Carfora & Catellani (2023) recorded self-reported servings of red, processed, and white meat over the previous week, while Feltz et al. (2022) used a food frequency questionnaire asking participants to recall, over a given day, how many times they ate “dairy, chicken, turkey, fish, pork, eggs, beef, bacon, sausages, processed meats, hamburgers, or any animal meat” (p. 7), where responses were coded from 0-5 and the outcome was the sum of those responses.
When studies measured multiple meat categories separately (e.g., “beef, poultry + fish, and meat” in Jalil et al. (2023)), we selected the most comprehensive category (“meat”). While Andersson & Nelander (2021) measured meat and fish separately, they also reported the proportion of vegetarian meals sold, which captured changes in both categories. We chose that as our primary outcome.
We considered including attitudinal outcomes as well to assess whether changes in attitude predict changes in behavior (Verplanken & Orbell, 2022). However, selecting representative attitudinal measures across diverse studies involved a great deal of researcher discretion compared to the straightforward process of identifying consumption outcomes.
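As a purely illustrative aside (not part of the supplement), the selection rule above can be thought of as a lexicographic sort over each study’s candidate outcomes. A minimal sketch in R, where the column names (is_behavioral, timepoint, n, comprehensiveness) and the min_n threshold are hypothetical placeholders rather than our actual coding scheme:

```r
# Minimal sketch of the outcome-selection hierarchy as a lexicographic sort.
# Column names and the min_n threshold are hypothetical illustrations,
# not the coding scheme actually used in the paper.
library(dplyr)

select_primary_outcome <- function(candidates, min_n = 25) {
  candidates |>
    filter(n >= min_n) |>                 # keep timepoints with adequate sample size
    arrange(desc(is_behavioral),          # (1) behavior over attitudes/intentions
            desc(timepoint),              # (2) latest eligible measurement
            desc(comprehensiveness)) |>   # (3) closest to net MAP/RPM reduction
    slice(1)
}

# Example: one attitude measure plus two behavioral timepoints
candidates <- tibble::tibble(
  outcome           = c("attitude", "meat_servings_wk4", "meat_servings_wk12"),
  is_behavioral     = c(FALSE, TRUE, TRUE),
  timepoint         = c(1, 4, 12),
  n                 = c(200, 180, 150),
  comprehensiveness = c(1, 3, 3)
)
select_primary_outcome(candidates)   # picks the week-12 behavioral outcome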
In our main dataset of 112 point estimates, there were 12 cases where authors didn’t provide enough information to calculate an effect size — typically the SD was missing and there was no way to back it out, but in other cases, authors just said that there was no effect without statistical elaboration — so we called those “unspecified nulls” and set them to 0.01. A reviewer asked about this, and we said the following:
“The 0.01 vs 0.00 distinction is a convention borrowed from a previous paper. 0.00 would be fine too, but there is something nice about giving authors a small benefit of the doubt and assuming they found some effect in the direction they expected. We added the following to the supplemental section on sensitivity analyses:
In our main dataset, we recorded 12 point estimates as ‘unspecified nulls’ because authors reported them as nulls but did not provide the necessary statistical information to calculate a precise effect. We manually coded these estimates as 0.01, a convention borrowed from Porat et al. (2024). If we set these to 0.00 instead, our meta-analytic estimate remains 0.07. If we omit them entirely, our overall estimate rises to 0.09 (95% CI: [0.03, 0.16]). Although this difference is small in absolute terms, it is large in relative terms, and it suggests that meta-analyses which exclude studies for gaps in outcome reporting might be producing upwardly biased overall estimates.”
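For readers who want to reproduce that check, a minimal sketch might look like the following. I’m assuming a data frame dat with effect sizes d, sampling variances vi, and a flag column eff_type, and metafor’s rma() stands in for whatever model specification is actually fit; none of these names necessarily match the real replication code.

```r
# Minimal sketch of the unspecified-null sensitivity check. The data frame
# `dat` and its columns (d, vi, eff_type) are assumed names, and metafor's
# rma() is a stand-in for the paper's actual model specification.
library(metafor)

unspec <- dat$eff_type == "unspecified_null"

# (a) as coded in the main specification: unspecified nulls set to 0.01
fit_main <- rma(yi = d, vi = vi, data = dat)

# (b) recode unspecified nulls to 0.00
dat_zero <- dat
dat_zero$d[unspec] <- 0
fit_zero <- rma(yi = d, vi = vi, data = dat_zero)

# (c) omit unspecified nulls entirely
fit_omit <- rma(yi = d, vi = vi, data = dat[!unspec, ])

# compare the three pooled estimates
data.frame(spec = c("0.01 (main)", "0.00", "omitted"),
           estimate = c(coef(fit_main), coef(fit_zero), coef(fit_omit)))
```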
There were other potential ways to handle this. Which leads us to:
Matthew Jané raises many issues about ways in which he thinks our analyses could (or in his opinion, should) have been done differently. Now I happen to think our judgment calls on each of the raised questions were reasonable and defensible. Readers are welcome to disagree.
More broadly, if computing effect sizes or variances differently is of interest, by all means, please conduct the analysis; we’d love to read it! We strove for a really high standard of computational reproducibility. In part that’s about credibility/validity. But the other thing that a high degree of reproducibility gets you is extendability. For instance, changing our d variable to replace all observations where eff_type == unspecified_null with something else — you could use rnorm to generate random effect sizes centered on 0, or you could (less conservatively) use MICE (under the rather strong missingness-at-random assumption), or whatever you want — and then re-running our meta-analysis would be straightforward. The number of such extensions we might have run is very large, and we ran the ones that we thought would be of highest interest to readers. (If anyone reading this wants to run any further extensions, and you have any questions about how to go about it, my email is in the paper and I’d be glad to hear from you.)
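To make the extendability point concrete, here is one hedged sketch of the rnorm variant, again assuming the same hypothetical dat/metafor setup as above; the sd of the simulated draws is an arbitrary placeholder, not a value from our materials.

```r
# Sketch of one extension: replace unspecified nulls with random draws
# centered on zero, then re-fit. Column names, the sd value, and the use of
# metafor::rma() are illustrative assumptions, not the paper's actual code.
library(metafor)

set.seed(1)
unspec <- dat$eff_type == "unspecified_null"

dat_sim <- dat
dat_sim$d[unspec] <- rnorm(sum(unspec), mean = 0, sd = 0.1)
# (the mice package could instead impute these under a missing-at-random assumption)

rma(yi = d, vi = vi, data = dat_sim)
```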
Finally, Matthew raises an interesting point about the sheer difficulty in calculating effect sizes and how much guesswork went into it for some papers. In my experience, this is fundamental to doing meta-analysis. I’ve never done one where there wasn’t a lot of uncertainty, for at least some papers, in calculating an SMD. (I’ve also never replicated someone else’s meta-analysis and not found serious disagreements with how they calculated effects.) If you wanted to discard these estimates, or use multiple imputation to calculate them, or read the papers and calculate an effect size yourself – all of those would be interesting extensions. But at some point, you declare a paper “done” and submit it.