Apr 17, 2023
Evaluation 1: "The Comparative Impact of Cash Transfers and a Psychotherapy Program on Psychological and Economic Wellbeing"

Manager's note (David Reinstein): I added link citations to some of the papers mentioned by the evaluator, as well as adjusting the heading format and making some very small typographic or very minor grammatical adjustments for readability.

Written report

Brief explanation:

  1. The paper advances knowledge and practice on cash and psychotherapy interventions by testing long-term and spillover effects, replicating important earlier studies and further specifying the conditions for the cash transfer intervention (lump-sum vs. dispersed/weekly transfers). It does not provide further insight into what would make the psychotherapy intervention work.

  2. Methods are generally robust and well-justified. 

  3. Logic and communication are clear, the reasoning is transparent, and the arguments make sense. Data and analysis are relevant to the arguments, and the conclusions are justified. I disagree to some extent only where the authors conclude that the timing of the post-intervention measure is not a plausible explanation for why they don’t find PM+. Figures and tables are easy to understand.

  4. Open, collaborative, replicable science and methods: Methods and analyses are described in detail, but code and data are not shared, making computational reproducibility hard. Some, but not all, materials are shared and can support future research. Pre-registration was done before analysis, not before data collection. Numbers are consistent throughout the paper.

  5. Real-world impact quantification: the paper tests a real-world intervention, and its results and conclusions are very plausible and realistic. 

  6. Global poverty and well-being are highly relevant to global priorities.


The paper presents a pre-registered RCT in Kenya with the main goal of comparing cash transfers to poor households with a psychotherapy intervention called Problem Management Plus (PM+), as well as testing their combined effect. Primary outcome measures of the RCT included both economic well-being (consumption, assets, household revenue) and psychological well-being (scales for psychiatric screening, stress, happiness and life satisfaction, as well as intimate partner violence). The cash transfer corresponds to about 20 months of per capita consumption (in the control group). PM+ is a five-week CBT-based program with one session per week, in which a trained volunteer (community health worker) works with individual clients on stress management (e.g., relaxation and breathing exercises), problem-solving, behavioral activation, and strengthening social support.

This paper makes important contributions by testing earlier intervention effects for robustness, including additional outcomes, and evaluating the conditions under which the interventions work: (1) It adds further evidence that cash-transfer interventions robustly affect both economic and psychological outcomes. (A side finding is that weekly cash transfers over 5 weeks are somewhat more effective than one lump sum in increasing monthly non-durable consumption and revenue.) (2) It does not replicate the spill-over effects of cash transfers reported in Haushofer & Shapiro (2016), suggesting these are not robust across different regions in Kenya. (3) It does not find long-term effects of the PM+ intervention on economic well-being, and (4) it does not replicate effects of PM+ on psychological well-being (previously reported 3 months after the intervention) with a longer-term (13-month) measure. (5) In line with this null result, cash transfers combined with PM+ have a very similar effect on most outcome variables as cash transfers alone.

One earlier RCT (Bryant et al., 2017 [1]) found effects on psychological well-being only (not on economic outcomes) 3 months after the PM+ intervention in a sample of women who were victims of intimate partner violence (IPV). The results on PM+ presented in the paper could indicate one of the following: that PM+ effects on psychological well-being…

  1. do not replicate. (This seems possible given the larger sample of this RCT.)

  2. do not replicate in a general sample, and are only successful when targeted at a specific problem (e.g. intimate partner violence in the earlier RCT).

  3. last less than 200 days (from 200 days onward, analyses of effects over time show no impact on psychological or economic well-being).

They also indicate that, for increasing both economic and psychological well-being 1 year after the intervention, the current implementation of PM+ was much less cost-effective than cash transfers.

Other positive aspects and strengths

  • First test of long-term effects of PM+, and of PM+ effects on economic well-being

  • Measures to reduce bias and questionable research practices: the analysis was pre-registered, the pre-analysis plan was largely followed, and the paper corrects for multiple comparisons

    • The authors’ transparent reporting of a null result for the replication of spill-over effects of cash transfers, which they themselves published in an earlier paper (Haushofer & Shapiro, 2016), illustrates the limited influence of bias.

  • Robust methods: for instance, a larger sample than the earlier studies it replicates (over 500 households per group), a careful randomization procedure, various data quality checks, and smart checks for demand effects.

  • Results are reasonable, and (almost) all conclusions are justified (see limitations for one exception)

  • Clearly written

Limitations and potential ways the work could be improved:

Below, I mention some small ways of improving the paper. I consider none of them major limitations; they are rather refinements of the discussion of some results, or small details missing from the paper. None of what follows should change the main practical conclusion of the paper, namely that cash transfers are more cost-effective at improving economic and psychological well-being in the long term than the described implementation of PM+, because PM+ had no such long-term effects. This conclusion is solid in my opinion.

  1. The discussion of explanations for why PM+ effects could not be replicated seems a bit too confident that the timing of the follow-up survey did not matter.

I think that the delay between intervention and follow-up could be a potential explanation for why PM+ has no effects on psychological well-being in the current study. The post-intervention survey was conducted too late to plausibly observe the effect of a short, low-intensity psychotherapy intervention of at most 5 sessions over 5 weeks. The median time of the follow-up survey is 13.5 months (range: 2-23 months) after the end of the intervention. First, from a psychological perspective, the assumption that such a low-intensity, individual-session therapy intervention would still have an effect over a year later seems implausible. Second, comparing a cash transfer that corresponds to about 20 months of per capita consumption (quite a long-term/intensive intervention) with a low-intensity 5-week therapy intervention seems a bit unbalanced to me, especially when evaluating outcomes after a year. I had the impression that the study design was set up to measure cash transfer effects, and that PM+ was then added to the design without adapting it to measure the effects of this second intervention.

When discussing the results, I would suggest stating more clearly that the study was not set up to detect effects at time scales shorter than about 6.5 months (200 days). The claim that timing is unlikely to explain the null result could be phrased with more nuance, given that the current analysis of effects over time can only speak to effects after 200 days. Examples of changes I would make throughout the paper are: (1) after the last sentence on p. 5 (“the delay between intervention and endline is also unlikely to explain our null results …”), move footnote 6 into the main text; (2) change the statement “PM+ being ineffective” on p. 32 to be more specific, e.g., “PM+ not having any long-term effects”.

Based on such discussions, future comparisons of other interventions with psychotherapy could then choose interventions that are equally likely to have an effect at the measured time scale, such as higher-intensity or longer-term interventions, or interventions in group settings that could foster social support lasting beyond the intervention itself. They could also focus on more specific populations or problems. Some of this is already discussed in the paper (e.g., in the final conclusion).

A note: Hindsight bias (thinking such an effect was implausible from the start) could have influenced this judgement of mine; I assume the NGO did believe the intervention could have a long-lasting effect at least on psychological well-being after 1 year. I can also imagine that one of the study’s main goals was to show that such low-intensity psychotherapy interventions are not effective compared to cash transfers in the long-term, if such claims were made before. This is not obvious from how the manuscript introduces its aims, however.

  2. Some short clarifications of differences between the pre-analysis plan (PAP) and the paper would be important.

  • One pre-registered sub-component of subjective well-being (the custom worries scale) was omitted from the paper and not included in the index, without any explanation. This should definitely be reported as a deviation from the PAP in the paper.

  • Some terms in the PAP differ from terms used in the paper, and it is unclear if they refer to the same variables. 

    1. “Depression” (probably refers to the GHQ12, distress in the paper). Depression is mentioned as the main outcome variable in research question 1 in the PAP. Yet, later in the PAP and in the paper only the subjective well-being index and its subcomponents (not including depression) are described. If depression was simply not the correct term in the PAP, I would briefly mention this somewhere in the paper. 

    2. “Mental health intervention for IPV”: this sounds like an intervention targeted at a specific population and problem, rather than the general-population PM+ intervention described in the paper, which addresses many different kinds of problems. I would clarify these terms briefly in the paper, for example in a footnote in the methods section or in the Appendix.

  • Before reading the PAP in detail, I was wondering why no results on the analysis of norms around IPV were reported. The norm measure is briefly mentioned on p. 17, and all indices/questions are reported in the appendix, but no results are reported. The PAP clarifies that the norms analysis was only planned as part of the analyses of the mechanisms behind spill-over effects, which were not found and thus not analysed (see section III.J, p. 39). I would recommend making this clear in the paper (on pp. 39-40).

  3. Heterogeneity of results: Victims of IPV

Since recruitment was not targeted at victims of intimate partner violence (IPV), I wonder whether a median split on IPV, used to examine heterogeneity of results across women with and without IPV experience, actually includes enough participants with such experiences to rule out that PM+ has positive effects in this subsample.

Analogous to the analyses for participants with high and extreme baseline distress, an analysis restricted to women with IPV experience would be more convincing. If there are not enough such women in the current study, I would more clearly entertain the possibility that the intervention may only help this specific population when discussing the heterogeneity of results with regard to IPV on p. 35 in section III.F. (The final conclusion already mentions this.)
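To make the sample-size concern concrete, the standard minimum detectable effect (MDE) for a two-arm comparison can be sketched as MDE = (z_{1-α/2} + z_{1-β}) · σ · sqrt(1/n_T + 1/n_C). This is a simplified illustration, not the paper's own power analysis: it assumes individual-level randomization, equal arm sizes, a standardized outcome (σ = 1), a two-sided test at α = 0.05 with 80% power, and it ignores covariates, clustering and the multiple-comparison correction the paper applies. The function name `mde` and the subsample sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mde(n_per_arm: int, sd: float = 1.0, alpha: float = 0.05, power: float = 0.80) -> float:
    """Minimum detectable effect for a two-sided, two-sample difference in means.

    Simplifying assumptions (not the paper's design): equal arm sizes,
    individual randomization, known outcome SD, no covariates or clustering.
    """
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value, about 1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)          # about 0.84 for 80% power
    return (z_alpha + z_power) * sd * sqrt(2 / n_per_arm)


# A full arm of roughly the reported size (over 500 households per group)
# compared with hypothetical, smaller IPV subsamples.
for n in (500, 250, 100, 50):
    print(f"n per arm = {n:>3}: MDE = {mde(n):.2f} SD")
```

Under these assumptions, the MDE grows from about 0.18 SD with 500 participants per arm to about 0.40 SD with 100 per arm, so a null result in a small IPV subsample would only rule out fairly large effects.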


  • A figure like Figure 3, but for secondary outcomes, would be very useful (e.g., in the Appendix).

  • A note for Figure 1 explaining the abbreviations could be helpful (HH = households; what do BL and EL stand for?).

  • Including a link to the other study on Digital Financial Service incentives mentioned on p. 11 would be useful, because both the pre-registration and the paper mention several questions/details related to it.

  • Back-check reports (p. 15): Verifying session attendance by calling participants 18 months after the intervention is very likely affected by memory biases, that is, failure to recall all attended sessions. The statement “Our phone resurvey could not confirm the high rates of participants receiving the entire schedule, with only 35 percent (20 percent) of PM+ (CT&PM+) subjects remembering having received all five sessions” therefore portrays the attendance data provided by the NGO in a more negative light than seems justified. Self-reports after 18 months are simply not a strong indicator of potential errors in the data. I would mention this to clarify what the back-check results mean.

  • I have some doubts about whether the call for future work to take seriously the possibility that PM+ might increase IPV (p. 28, end of 2nd paragraph) is warranted. The results are much more suggestive of a demand effect, since the smiley and envelope tasks show no significant increase in IPV. Furthermore, the effects on IPV are not significant in the combined Cash and PM+ condition either. I agree it’s better to err on the side of caution, but I would probably rephrase the sentence to suggest that demand effects are the most likely explanation.

  • It was not obvious to me why data collection was stopped during elections. Was it because elections might have influenced well-being? The reason could be mentioned very briefly.

Future research: 

  • Could evaluating group-based approaches to PM+ or other psychotherapeutic interventions be important for increasing cost-effectiveness and potential long-term effects? (If such research has not already been done since 2020.)

  • Are there spill-over effects within the same household, i.e., on the family of the recipient of an intervention?

My references for these two questions: work by the Happier Lives Institute evaluating the work of StrongMinds (interpersonal group-based therapy).

But see the debate on how robust the evidence for the cost-effectiveness of StrongMinds is:

David Reinstein:

What does “they don’t find PM+” mean? I guess the evaluator meant ‘for why they don’t find an impact of the “Problem Management Plus” psychological treatment’.