Evaluation 2 of "The Comparative Impact of Cash Transfers and a Psychotherapy Program on Psychological and Economic Wellbeing"
“On a ‘scale of journals’, what ‘quality of journal’ should this be published in?: Note: 0= lowest/none, 5= highest/best”
Confidence: (3, 4)
See here for a more detailed breakdown of the evaluators’ ratings and predictions, along with any comments they made accompanying these numbers.
Manager’s note (David Reinstein): I added link citations to some of the papers mentioned by the evaluator, as well as adjusting the heading formatting and making some very small typographic adjustments for readability.
This paper studies the economic and psychological effects of providing two different interventions to low-income households in rural Kenya: a program in Cognitive Behavioral Therapy (CBT, a well-established form of psychotherapy) and an unconditional cash transfer. The authors use a randomized controlled trial with a 2-by-2 design to estimate the effect of each intervention alone and of both interventions combined. Both types of intervention have been studied separately in low- and middle-income countries, but less research has compared them in the same context or looked at the effects of combining the two.
Strikingly, the authors find no effect of the therapy program on any of their primary economic or psychological outcomes: consumption, wealth, revenue, and an index of psychological wellbeing including depression and anxiety symptoms and functioning. This holds even for those with poor mental health at baseline. The cash transfer, meanwhile, significantly improves all these outcomes. Unsurprisingly given the null effect of therapy, the combination of cash and therapy has similar effects to cash alone.
The randomised controlled trial itself was well-executed and analyzed (as I’d expect from these authors, who in this and other work set a high standard for conducting and reporting randomised trials). Below I discuss this more; I have only minor comments on the implementation and find the results believable and unlikely to be biased.
The contribution is limited in one sense by the null effect of the CBT intervention. The most novel feature of the design was the comparison and interaction of CBT and cash, but as it is, without a ‘first stage’ effect on mental health, the study can’t answer the research question of how effective CBT programs (which do exist) might compare to, or have complementarities with, cash transfers.
But it’s not wholly fair to penalise the study for this – and the results are nonetheless interesting – because the CBT intervention should have worked. The program was a faithful replication of one that a high-quality trial found effective elsewhere in Kenya (Bryant et al. 2017). With hindsight, maybe more could have been done to improve the chances it worked. But overall it seems reasonable to expect it would have (I would have guessed so). So the study was ex ante well set up to investigate the research questions above. I’ve tried to judge it on that basis; penalising well-designed experiments for null results creates publication bias.
Moreover, the study still provides important new evidence on the (economic) effectiveness of CBT programs for general – rather than clinical – populations. A huge literature finds CBT is effective for people with mental illnesses, including in low- and middle-income countries (Lund et al. 2022), but there is much less evidence on whether CBT benefits general low-income populations. In principle, the techniques behind CBT – such as how to rework negative ‘automatic’ thought patterns – could both prevent mental illness and aid general decision-making, not least for those in poverty given its known effects on the mind. But this study finds no such effect. I know only one other study on this question: Barker et al. (2022), conducted contemporaneously in Ghana, who find the opposite result. So I think this is a significant contribution (though the literature on CBT is large and there could be other studies I’m unaware of).
This contribution is limited a bit because it’s not yet entirely clear why the CBT intervention didn’t work. The authors discuss this in depth and rule out several possibilities. But there are differences with the previous evaluation, Bryant et al. (2017), in terms of both sample and when the effects were measured (1 year later, rather than 3 months). We also don’t know if the CBT managed to change any of the thoughts or behaviours it targeted, which would help to sort out different possibilities. The authors have an interesting hypothesis about the intervention lacking a specific goal, but I didn’t quite understand the underlying argument here.
(Meanwhile, the cash transfer effects are, as intended I think, in line with a large prior literature, including previous work by these authors (Haushofer and Shapiro 2017). I therefore don’t focus on them further here, though they are certainly a useful replication.)
Below I organise my specific comments by sections: the design of the trial and intervention, the analysis, and the mechanisms (why there was no effect). I have tried to give positive comments, not just criticisms, and be constructive where I can. My main constructive suggestions are mostly to do with the discussion of why the CBT intervention didn’t work here, particularly when the previous evaluation in Kenya did.
Overall, the randomised controlled trial was conducted and analysed using ‘best practice’ techniques and I could not find significant threats to internal validity. Randomization was stratified on key variables including psychological wellbeing; other variables are balanced at baseline between treatment and control; there is very low attrition which is non-differential by treatment status, and the authors account for potential spillovers in the randomization design. The outcomes are appropriate and measured well.
I think the authors also chose the intervention well. With hindsight, some might argue they should have picked a more intense CBT program – at 5 weekly sessions, this one is shorter than most – and one with a larger evidence base, to improve the chances of a first stage effect on mental health. But I’m not sure such an evidence base exists for the Kenyan context specifically, and there’s meta-analytic evidence that brief CBT can still be effective (Cuijpers et al. 2023).
Main comments on the design:
The compliance reported by the NGO (95% of those in the treatment group attended all 5 CBT sessions) is almost surprisingly high, given that in Bryant et al. (2017) only 60% of treated people attended all five sessions, and most people in the authors’ survey 18 months later recalled attending fewer than five. It’s worth explaining briefly how such high compliance was achieved (were attendance payments more generous?). Otherwise, I worry that some health workers here may have claimed a session had happened when it hadn’t. This is not the most likely possibility, and I don’t mean to be unfair: maybe the authors and the NGO simply did an excellent job ensuring people showed up!
It would have been nice, in hindsight, to have a manipulation check for the CBT intervention – asking participants post-treatment if their thought or behavior patterns had changed, in the way CBT targets. I think some scales to do this for elements of CBT exist, such as the Behavioral Activation for Depression Scale.
Again in hindsight, to facilitate comparison with Bryant et al. (2017), outcomes could have been measured at 3 months rather than just at one year as the authors do. But I understand the authors’ desire to look for longer-run effects (other evidence suggests CBT can improve mental health at such time frames; see Cuijpers et al. 2023), and budgetary considerations may have prevented multiple endline surveys.
It wasn’t clear whether endline surveyors were blinded to treatment status (I see baseline surveyors were). Of course, I’d expect any bias from surveyor unblinding to make the estimated effect larger, which makes this a minor point given the results.
The analysis, including the exact specification, was pre-registered, and is done according to good practice: standard errors are clustered appropriately at the treatment level and there are corrections for multiple hypothesis testing across primary outcomes. It is very nice that the authors aid interpretation of their null result by calculating the negative predictive value, something which is easy to do but not done often in other papers. Also nice is including a discussion of cost-effectiveness.
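For readers unfamiliar with the negative predictive value, here is a minimal sketch of the calculation, assuming the standard definition: the probability that there is truly no effect given a non-significant result. The function name and all input values (prior, power, significance level) are illustrative assumptions of mine, not figures from the paper, and the authors’ exact procedure may differ.

```python
def negative_predictive_value(prior_true_effect, power, alpha=0.05):
    """P(no true effect | test is not significant), by Bayes' rule.

    prior_true_effect: assumed prior probability that the effect is real
    power:             probability of detecting the effect if it is real
    alpha:             significance level of the test
    """
    # Share of all tests that are null AND correctly not rejected.
    true_negative = (1 - alpha) * (1 - prior_true_effect)
    # Share of all tests where a real effect exists but is missed.
    false_negative = (1 - power) * prior_true_effect
    return true_negative / (true_negative + false_negative)


# Illustrative inputs only (assumptions, not taken from the paper).
npv = negative_predictive_value(prior_true_effect=0.5, power=0.8, alpha=0.05)
print(f"NPV: {npv:.3f}")
```

The higher the power, the more informative a null result becomes; with power equal to 1 the NPV is 1 under this definition, whatever the prior.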
The key question is why this study found null effects of CBT when the previous study (Bryant et al. 2017) found such large ones. The authors rule out several obvious explanations. First, this study had enough statistical power to rule out effect sizes close to those in Bryant et al., meaning the difference is probably not due to ‘chance’. (It also appears that Bryant et al. was largely well-powered and well-executed, so their result was not just a fluke; this could be worth discussing in the paper too.) Second, Bryant et al. studied a different sample: women victims of gender-based violence in peri-urban Nairobi, rather than low-income household heads in rural Nakuru county. But the authors in this study still find no effect among women, victims of intimate partner violence, or people with high psychological distress.
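The power argument above can be sketched with the usual minimum detectable effect (MDE) approximation for a two-sided test, MDE = (z_{1-alpha/2} + z_{power}) × SE. This is a rough illustration under stated assumptions: the standard error, the benchmark effect and the subsample share below are hypothetical numbers, not values from this study or from Bryant et al., and the subsample adjustment assumes the outcome variance is the same in the subsample.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(standard_error, alpha=0.05, power=0.8):
    """Approximate MDE for a two-sided test at level alpha with given power."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return (z_alpha + z_power) * standard_error


def subsample_standard_error(full_sample_se, share):
    """Rough SE when only a fraction `share` of the sample is kept.

    Assumes unchanged outcome variance, so the SE grows by 1/sqrt(share).
    """
    return full_sample_se / sqrt(share)


# Hypothetical inputs in standard-deviation units (assumptions for illustration).
full_se = 0.05
benchmark_effect = 0.5

mde_full = minimum_detectable_effect(full_se)
mde_half = minimum_detectable_effect(subsample_standard_error(full_se, share=0.5))

print(f"MDE, full sample: {mde_full:.3f}")
print(f"MDE, half sample: {mde_half:.3f}")
print("Benchmark above both MDEs:", benchmark_effect > max(mde_full, mde_half))
```

If the benchmark effect sits well above the MDE even in the restricted sample, a null result there is unlikely to reflect low power alone.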
Beyond this, I have some comments on the potential mechanisms:
Sample differences. There are further differences between the samples that currently seem underexplored:
The authors say the rural-urban difference is unlikely to explain their findings but it wasn’t clear to me why: rural and urban areas could differ in, say, income and education (Nairobi county has about 30% higher GDP per capita). If there is a good reason why rural and urban should be similar it would be great for the authors to elaborate.
It would be helpful to see a full table of summary statistics in this paper, to be able to compare the sample in more detail with Bryant et al. (Perhaps the table could even make this comparison). I couldn’t find this in the current draft.
It would be good to test other heterogeneity cuts of the data such as age and education. I was able to verify that the sample in Bryant et al. is about ten years younger on average, but it was hard to compare any other variables between the two papers. Of course, these extra analyses would not be pre-registered, but the results might at least be suggestive (and multiple hypothesis testing could be corrected for).
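As one way to correct such exploratory heterogeneity tests for multiple hypothesis testing, here is a minimal sketch of the Holm step-down adjustment. Holm is my choice for illustration and not necessarily the correction the authors use for their primary outcomes; the subgroup labels and p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease along the order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values for three exploratory heterogeneity cuts.
cuts = ["age", "education", "baseline income"]
raw_p = [0.01, 0.04, 0.03]

for name, p_raw, p_adj in zip(cuts, raw_p, holm_adjust(raw_p)):
    print(f"{name}: raw p = {p_raw:.3f}, Holm-adjusted p = {p_adj:.3f}")
```

Holm controls the family-wise error rate and is conservative; a false discovery rate procedure would be a less strict alternative for suggestive, non-pre-registered analyses.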
Intervention purpose. The authors argue that being delivered without a particular goal in mind might have made the intervention less effective: in Bryant et al. (2017), the goal was addressing gender-based violence. I am not sure I understand the argument here, given that the intervention content was apparently unchanged (p.5). Is the idea that some elements are ‘improvised’ by the community health workers delivering the intervention, and they do this differently when there is a specific goal in mind (e.g. giving examples relevant to domestic violence)? Or is it something else? This is a really interesting potential mechanism and worth more discussion.
Intervention delivery. This study used different community health workers (CHWs) to deliver the intervention than Bryant et al., and the authors find that some CHWs were more effective than others. It would be good to comment on whether this could explain the difference with Bryant et al. (2017) (I know the NGO had the same selection and training procedures, but was it harder to find CHWs in Nakuru county?). Ideally, getting the data from Bryant et al. (2017) and looking at CHW fixed effects there too would be interesting but I understand it might not be possible.
The authors could also report the estimated main treatment effects, MDEs and NPVs in the subsamples of female, psychologically distressed and high-IPV participants. This would help confirm whether there is enough power to rule out the effects in Bryant et al. (2017) when restricting to a similar sample (I expect there is).
The authors could also look at heterogeneity by exactly the sample selection criteria in Bryant et al. (2017), who had a low threshold for GHQ-12 but a high threshold for WHODAS. I am not sure that the current heterogeneity analysis by severe distress exactly covers this.