Published on Jul 05, 2024
Evaluation 2 of "How Much Would Reducing Lead Exposure Improve Children’s Learning in the Developing World?"
This non-systematic review¹ seeks to explore the relationship between lead exposure and children’s learning outcomes by updating existing meta-analyses. Its key strengths are 1) consideration of standardized test scores for reading and mathematics and 2) use of a specification curve and of different ways to assess the impacts of publication bias. Critically, a quality assessment is missing and some control choices are unmotivated. The evidence captured in the review itself is likely not causal, though the authors do examine a broader literature on the cognitive implications of lead exposure that certainly goes beyond correlational associations.

Summary Measures

We asked evaluators to give some overall assessments, in addition to ratings across a range of criteria. See the evaluation summary “metrics” for a more detailed breakdown of this. See these ratings in the context of all Unjournal ratings, with some analysis, in our data presentation here.²


                                       90% Credible Interval
Overall assessment                     68 - 82
Journal rank tier, normative rating    2.8 - 3.2

Overall assessment: We asked evaluators to rank this paper “heuristically” as a percentile “relative to all serious research in the same area that you have encountered in the last three years.” We requested they “consider all aspects of quality, credibility, importance to knowledge production, and importance to practice.”

Journal rank tier, normative rating (0-5): “On a ‘scale of journals’, what ‘quality of journal’ should this be published in? (See ranking tiers discussed here)” Note: 0 = lowest/none, 5 = highest/best.

See here for the full evaluator guidelines, including further explanation of the requested ratings.

Written report

This non-systematic review³ seeks to explore the relationship between lead exposure and children’s learning outcomes by updating existing meta-analyses. Its key strengths are 1) consideration of standardized test scores for reading and mathematics and 2) use of a specification curve and of different ways to assess the impacts of publication bias. Critically, a quality assessment is missing and some control choices are unmotivated. The evidence captured in the review itself is likely not causal, though the authors do examine a broader literature on the cognitive implications of lead exposure that certainly goes beyond correlational associations.


  • I view the inclusion of a specification curve very positively. This method allows authors to examine whether their conclusions are sensitive to changes in analytical approach, such as subsampling or use of controls. In a way, it allows researchers to report their own robustness replication alongside their main ‘conventional’ result (e.g. the meta-analytic pooled effect). It might be useful for readers to contextualize this with current wide-scale efforts in economics to mass reproduce research in top journals, which showed that 70% of effects remained significant after a robustness replication, and about half diminished in size (Brodeur et al., 2024)[1]. In other words, the inclusion of this specification curve should offer us confidence that we are looking at robustly negative and significant impacts of lead, albeit small in magnitude.

  • The authors have not discussed their results in this manner, but lead exposure can result from macro-level policies or trends that expose whole populations, so it may be appropriate to consider population-level effects. At that scale, even small effects are very important for policy (Rose, 2001)[2].

  • I am very supportive of the consideration of measures beyond IQ. IQ has been criticized as a challenging measure for young children and in some, particularly non-Western, cultural contexts. Standardized test scores for reading and mathematics are a nice addition with added applied utility, i.e. they are directly relevant for education.

  • Publication bias - strong approach, especially in reporting multiple ways to account for publication bias. As is typical for such efforts, the overall pooled effect does moderately decrease. (Though I do wish the effect corrected for publication bias, based on either method, were provided in numeric terms in the report.)
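
To make the idea of a specification curve concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the study records are invented, the two analytical choices (whether to require an SES control, and which income group to keep) are hypothetical, and the fixed-effect inverse-variance pooling is a simple stand-in rather than the authors' actual estimator.

```python
from itertools import product
from math import sqrt

# Hypothetical study records, invented for illustration only:
# (effect size, standard error, controls for SES?, high-income country?)
STUDIES = [
    (-0.20, 0.05, True, True),
    (-0.35, 0.10, False, True),
    (-0.15, 0.04, True, False),
    (-0.40, 0.12, False, False),
    (-0.25, 0.06, True, True),
    (-0.10, 0.08, True, False),
]


def pooled_effect(rows):
    """Fixed-effect (inverse-variance) pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for _, se, _, _ in rows]
    total = sum(weights)
    estimate = sum(w * row[0] for w, row in zip(weights, rows)) / total
    return estimate, sqrt(1.0 / total)


def specification_curve(rows, min_studies=2):
    """Pool the studies under every combination of analytical choices."""
    curve = []
    for ses_rule, income_rule in product(("any", "ses_controlled"),
                                         ("all", "high_income", "lmic")):
        subset = [
            r for r in rows
            if (ses_rule == "any" or r[2])
            and (income_rule == "all" or r[3] == (income_rule == "high_income"))
        ]
        if len(subset) < min_studies:
            continue  # too few studies to pool under this specification
        estimate, se = pooled_effect(subset)
        curve.append({
            "ses_rule": ses_rule,
            "income_rule": income_rule,
            "n_studies": len(subset),
            "estimate": estimate,
            "ci_low": estimate - 1.96 * se,
            "ci_high": estimate + 1.96 * se,
        })
    # Sorting by estimate gives the usual left-to-right ordering of the curve.
    return sorted(curve, key=lambda spec: spec["estimate"])


if __name__ == "__main__":
    for spec in specification_curve(STUDIES):
        print(f'{spec["ses_rule"]:>14} {spec["income_rule"]:>11} '
              f'n={spec["n_studies"]} est={spec["estimate"]:.3f} '
              f'[{spec["ci_low"]:.3f}, {spec["ci_high"]:.3f}]')
```

If every specification in such a curve yields an estimate of the same sign with an interval excluding zero, the conclusion does not hinge on any single analytical choice, which is the sense in which the curve acts as a built-in robustness check.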

Critical points

Note that I see this paper as a non-systematic review and so consider it unfair to assess it against the standards of systematic reviews (e.g. use of systematic databases rather than Google Scholar, or reporting standards such as inter-rater reliability measures - see the PRISMA reporting standards for more). At the same time, the non-systematic approach does mean studies could have been missed or selected in a biased manner. One way to have improved on this without an extreme time commitment would have been to run a systematic search for systematic reviews in a couple of targeted databases (i.e. using ‘systematic review’ as a search term). There are benefits of searching for new studies systematically as well.

  • The most serious concern I have is that there is no formal quality assessment (risk of bias) screening. Without this, we have no sense of whether or not this meta-analysis perpetuates the ‘trash in - trash out’ problem (see Egger et al. 2001)[3]. By using potentially biased data, erroneous conclusions or mis-estimation can be perpetuated and solidified. This may or may not be the case here but there is no way to tell. For instance, we have no sense of study population, recruitment strategy, attrition, power justification, robustness or validity of measures used etc.

  • Inclusion criteria: studies that “have blood lead level measures from the same individuals, whether contemporaneously or from different points in time.” - I assume that, if the measures come from different time points, blood lead level would still need to be measured before IQ/reading/math, though I cannot find whether this was a specific inclusion criterion.

  • Extraction: “We extract any coefficient relating maternal or child blood lead levels…” - there is no further mention of maternal lead beyond p.6. Does this mean that no studies used maternal measures, or that lead at the maternal and child level was treated the same? If the latter, I would be concerned and would prefer such a choice to be better and more transparently motivated.

  • “We exclude results which include blood measures for multiple ages in a model separately, as this has a different estimand: the effect of exposure at a particular age, relative to exposure at another age.” - this sounds strange to me, given that the first available age would likely be comparable to all other included studies where there are bone lead measures and cognition measures fitting the inclusion criteria for time of measurement (“contemporaneously or from different points in time”).

  • I note in passing that of the 47 studies, 29 are in high-income countries, which might suggest a need for more data in LMICs - I cannot make this a strong claim, since this is not a systematic review.

  • In passing as well, some figures need better labeling, e.g. include effect type directly in label.

  • Specification curve - minor; this could perhaps have been done in a more principled way, e.g. controls reported in a stepwise manner (first each on their own, then combined), or subsampling based on country for a more direct comparison of high-income countries vs LMICs.

  • The second most serious concern I see is around the reporting standards. In particular, the choice of controls is discussed very briefly and is not theoretically motivated for the reader. For instance, SES could be not only a confounder but also a modifier of the neurotoxic effects of lead exposure. Inappropriate modeling of SES or other controls can lead to underestimates of the effects of lead or misattribution to the wrong risk factor (see Bellinger 2008[4] for more on this specifically, or Judea Pearl on causality).

  • Overall, taking the concerns above together - the lack of quality assessment, the lack of motivation for controls, and the more minor questions about timepoint inclusions - I do not personally take the evidence base captured in the review as causal. Most of the data are observational as well.

  • At the same time, the discussion in Section 4 (‘Assessing the role of unobserved confounders’, starting p.21) is strong and helpful in focusing particularly on what kinds of work exist more broadly on the cognitive impacts of lead, and how much different studies can speak to causality (e.g. animal studies, natural experiments).
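
The concern above about SES acting as an effect modifier rather than only a confounder can be illustrated with a small worked example. This is a sketch under loud assumptions: the data are invented and noise-free, the slopes of -6 and -2 test-score points per unit of lead are hypothetical, and the group names are made up. It shows that a model with only an additive SES control returns a single averaged lead coefficient, which understates the effect in the more vulnerable group.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def demean(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical, noise-free data: lead lowers scores three times as fast
# in the low-SES group as in the high-SES group (effect modification).
LEAD_LEVELS = [1.0, 2.0, 3.0, 4.0]
TRUE_SLOPE = {"low_ses": -6.0, "high_ses": -2.0}
BASELINE = {"low_ses": 100.0, "high_ses": 105.0}

DATA = {
    group: (LEAD_LEVELS,
            [BASELINE[group] + TRUE_SLOPE[group] * x for x in LEAD_LEVELS])
    for group in TRUE_SLOPE
}


def stratified_slopes(data):
    """Lead slope estimated separately within each SES group."""
    return {group: slope(xs, ys) for group, (xs, ys) in data.items()}


def additive_adjusted_slope(data):
    """Lead coefficient from a model with an additive SES dummy.

    By the Frisch-Waugh-Lovell theorem, this equals the pooled slope
    after demeaning lead and score within each SES group.
    """
    xs, ys = [], []
    for group_x, group_y in data.values():
        xs.extend(demean(group_x))
        ys.extend(demean(group_y))
    return slope(xs, ys)


if __name__ == "__main__":
    print("stratified:", stratified_slopes(DATA))
    print("additive SES control:", additive_adjusted_slope(DATA))
```

In this constructed case the additive model reports a slope halfway between the two group-specific slopes, so treating SES purely as a confounder hides the larger effect among low-SES children.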

Additional note (from correspondence)

Evaluation manager

… What exactly [did you find] non-systematic about the review?…


The authors searched only in Google Scholar - this is not a systematic search database but simply a search engine, and I think it's deeply misleading to use the terminology of 'systematic review' in such cases.

While Google Scholar can be used effectively as a supplementary addition to systematic databases, it is not itself systematic and cannot be searched systematically. There are further differences between systematic databases and search engines like GS: search engines such as GS are non-replicable; results almost always vary between machines given the algorithms GS uses; you do not reliably know where GS pulls results from; there is no controlled vocabulary (e.g. MeSH terms); and you cannot be specific enough within GS, so you likely receive pages and pages of results and have to screen up to an arbitrary 'stopping point', whereas in systematic reviews all results are typically pulled. For instance, when I try to replicate the authors’ search strategy in section 2.1, the output I get from GS is "About 9,330,000 results", so it's very difficult for me to understand how the authors report that they "found 951 potential results" without applying some arbitrary stopping rule.

[Evaluation manager: We also ran this search and got many more results than the authors reported.]


[1] Brodeur, A., Mikola, D., & Cook, N. (2024). Mass Reproducibility and Replicability: A New Hope. Online access:

[2] Rose, G. (2001). Sick individuals and sick populations. International Journal of Epidemiology, 30(3), 427-432.

[3] Egger, M., Smith, G. D., & Sterne, J. A. (2001). Uses and abuses of meta-analysis. Clinical Medicine, 1(6), 478.

[4] Bellinger, D. C. (2008). Lead neurotoxicity and socioeconomic status: conceptual and analytical issues. Neurotoxicology, 29(5), 828-832.

