Abstract
Strengths:
- Introduces a new technique - treatment variant aggregation (TVA) - to answer policy-relevant questions
- Strong theoretical and simulation results grounded in real-world data
- Demonstrates advantages in both selection and estimation of policies by comparing to many alternative estimators
- Application to an urgent, real-world problem

Weaknesses:
- Simulations rely on parameters from a single dataset, limiting generalizability
- Unclear practical advantage of TVA in selecting better policies compared to alternatives
- Bayesian estimators [that are more sophisticated than the ones they tested] could rival or surpass TVA
Summary Measures
We asked evaluators to give some overall assessments, in addition to ratings across a range of criteria. See the evaluation summary “metrics” for a more detailed breakdown of this. See these ratings in the context of all Unjournal ratings, with some analysis, in our data presentation here.
| Measure | Rating | 90% Credible Interval |
|---|---|---|
| Overall assessment | 95/100 | 80 - 99 |
| Journal rank tier, normative rating | 4.7/5 | 4.0 - 5.0 |
Overall assessment: We asked evaluators to rank this paper “heuristically” as a percentile “relative to all serious research in the same area that you have encountered in the last three years.” We requested they “consider all aspects of quality, credibility, importance to knowledge production, and importance to practice.”
Journal rank tier, normative rating (0-5): “On a ‘scale of journals’, what ‘quality of journal’ should this be published in? (See ranking tiers discussed here)” Note: 0 = lowest/none, 5 = highest/best.
See here for the full evaluator guidelines, including further explanation of the requested ratings.
Written report
This paper introduces a new technique - treatment variant aggregation (TVA) - to select a policy from a factorial design and estimate its effect. The authors demonstrate that TVA has desirable theoretical properties and performs well in simulations, both in selecting effective policies and accurately estimating their effects. They apply TVA to study interventions aiming to increase vaccination rates. The authors conclude that the most effective intervention identified by TVA increases vaccination rates by 44%, while the most cost-effective intervention increases vaccination rates by 9%.
Summary
The paper’s primary contribution is a new technique - TVA - to select a policy from a factorial design and estimate its effect. The factorial design assumes M arms, each with R dosages. For example, an arm might be monetary incentives, with dosages none, low, medium, and high. TVA works as follows:
1. Feature engineering. Construct a feature matrix such that the features represent marginally increasing dosages. For example, the coefficient on one of these features would represent the marginal effect of switching from medium to high monetary incentives, controlling for the dosages of the other arms.
2. Feature selection. Use a LASSO regression on a Puffer-transformed version of the feature matrix from step (1) to select a subset of features. The authors call the resulting simplifications “pooling” (dropping features that distinguish between dosages of a given arm, so those dosages are treated as one) and “pruning” (dropping arms altogether).
3. Effect size estimation and selection. Estimate the effects of the policies that survive step (2) using OLS, and select the policy with the largest estimated effect.
4. Post-selection inference. Apply post-selection inference techniques from Andrews et al., 2021 to obtain quantile-unbiased point estimates and confidence intervals for the best-performing policy from step (3).
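A simplified Python sketch of steps (1) to (3) may help make the pipeline concrete. It is not the authors' implementation, and several simplifications are assumptions of this sketch: the function names are hypothetical; the feature matrix contains only per-arm cumulative dosage indicators (no cross-arm combinations); the penalty `lam` is set by hand rather than tuned; the OLS refit uses the surviving features rather than pooled policy dummies; and step (4) is omitted. It uses the Puffer transform for n > p, F = U D^{-1} U^T from the thin SVD X = U D V^T. Under this transform the transformed design has orthonormal columns, so the LASSO reduces to soft-thresholding the OLS coefficients.

```python
# Hypothetical sketch of TVA steps (1)-(3); not the authors' code.
import itertools
import numpy as np


def marginal_dosage_features(dosages, n_levels):
    # dosages: (n, M) ints in 0..n_levels-1, where 0 means the arm is off.
    # Column (m, r) for r >= 1 is 1{dosage of arm m >= r}; its coefficient is
    # the marginal effect of raising arm m from level r-1 to level r,
    # holding the other arms fixed. No cross-arm interaction columns.
    M = dosages.shape[1]
    cols = [(dosages[:, m] >= r).astype(float)
            for m in range(M) for r in range(1, n_levels)]
    return np.column_stack(cols)


def puffer_lasso(X, y, lam):
    # Assumes n > p and X of full column rank. Thin SVD X = U diag(d) V^T and
    # Puffer matrix F = U diag(1/d) U^T give F X = U V^T (orthonormal columns),
    # so minimizing 0.5 * ||F y - F X b||^2 + lam * ||b||_1 separates by
    # coordinate: soft-threshold z = (F X)^T F y, which equals the OLS estimate.
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    z = Vt.T @ ((U.T @ y) / d)
    penalty = np.full(X.shape[1], float(lam))
    penalty[0] = 0.0  # column 0 is the intercept; leave it unpenalized
    return np.sign(z) * np.maximum(np.abs(z) - penalty, 0.0)


def tva_sketch(dosages, y, n_levels, lam):
    n, M = dosages.shape
    X = np.column_stack([np.ones(n), marginal_dosage_features(dosages, n_levels)])
    # Step 2: a zero coefficient pools two adjacent dosages; if all of an
    # arm's coefficients are zero, the arm is pruned.
    keep = np.union1d(np.flatnonzero(puffer_lasso(X, y, lam)), [0])
    # Step 3: OLS refit on the surviving features only.
    beta = np.zeros(X.shape[1])
    beta[keep] = np.linalg.lstsq(X[:, keep], y, rcond=None)[0]
    # Score every dosage combination relative to the all-off control.
    policies = np.array(list(itertools.product(range(n_levels), repeat=M)))
    effects = marginal_dosage_features(policies, n_levels) @ beta[1:]
    best = int(np.argmax(effects))
    return policies[best], float(effects[best]), keep
```

For example, with two arms at three levels each and outcomes that respond only to moving the first arm from "none" to "low", a large enough sample should leave only the intercept and the first arm's level-1 indicator. The higher dosage is then pooled with "low", and the second arm is pruned. The returned effect is a naive plug-in estimate for the selected policy, which is what step (4) is designed to replace.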
The authors show that TVA has desirable theoretical properties. They also use simulations to demonstrate that TVA performs well compared to alternatives, including OLS, certain types of Bayesian estimators, and post-selection inference without feature selection. Specifically, TVA:
- Is likely to identify the best policy
- Applies minimal shrinkage to the best-performing policy
- Accurately estimates the effect of the best-performing policy
Strengths:
While none of the components of TVA is novel, the way the authors stitch them together in a pipeline designed to answer policy-relevant questions is insightful and impressive.
Including theoretical results and simulations demonstrates that TVA is an effective tool. Using data from a real factorial experiment to ground the simulation results is especially compelling. This suggests that TVA performs well on the sorts of datasets researchers are likely to see in the real world.
The authors demonstrate TVA's robustness by comparing it to various alternative estimators in different simulation settings.
One of the major drawbacks of Andrews et al., 2021 is that their post-selection inference technique has a high MSE. By applying the Andrews et al., 2021 technique after pooling and pruning, the authors substantially reduce the MSE of this estimator.
Finally, the authors apply TVA to an important real-world problem - improving vaccine uptake in Haryana, India. I consider this a genuinely important problem that is well addressed by the authors’ research.
Weaknesses:[1]
The simulation results appear to draw certain parameters common across simulations from a single dataset (related to vaccination rates in India). These parameters may not be representative of other datasets researchers are likely to encounter. While the authors take steps to mitigate this issue by varying simulated sample sizes and other simulation parameters, the results would be even more compelling if they relied on more than one underlying dataset.
The authors show that TVA is more likely to select the most effective policy than OLS. However, the authors do not estimate the practical effects of this advantage. For example, suppose the best policy increases vaccination rates by 44%. TVA selects this policy 100% of the time, whereas OLS selects it only 50% of the time. However, the other 50% of the time, OLS selects the second-best policy, which increases vaccination rates by 43%. According to the metric the authors present in their simulation results, TVA is twice as likely to select the best policy! However, the practical effects of this improvement are negligible. In sum, it is not clear how much higher vaccination rates would be if we selected policies by TVA as opposed to applying other selection methods. The results would be more compelling if they demonstrated that the policy selected by TVA is, on average, much more practically effective than policies selected by the alternative estimators they consider. Absent these results, it is difficult to evaluate TVA’s practical advantage in selecting better policies.
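The hypothetical above can be made concrete with a practically oriented metric: the expected true effect of whichever policy a method selects. The Python sketch below uses only the illustrative numbers from that hypothetical, not results from the paper. The helper `expected_effect_of_selected` is hypothetical.

```python
def expected_effect_of_selected(policy_effects, selection_probs):
    # Expected true effect of the policy a method picks:
    # sum over policies of P(method selects it) * (its true effect).
    if abs(sum(selection_probs.values()) - 1.0) > 1e-9:
        raise ValueError("selection probabilities must sum to 1")
    return sum(p * policy_effects[k] for k, p in selection_probs.items())


# Illustrative numbers from the hypothetical (% increase in vaccination).
effects = {"best": 44.0, "second_best": 43.0}
tva = expected_effect_of_selected(effects, {"best": 1.0})
ols = expected_effect_of_selected(effects, {"best": 0.5, "second_best": 0.5})
print(tva, ols, tva - ols)  # 44.0 43.5 0.5
```

Doubling the probability of picking the best policy translates here into a gain of only 0.5. In a simulation, the selection probabilities would be the frequencies with which each method picks each policy across replications, and the effects would be the true simulated policy effects.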
The authors compare TVA to many alternative estimators, and it is not reasonable to expect them to exhaustively explore every available alternative. However, the paper considers only one alternative Bayesian estimator - a simple version of spike and slab - and a slightly more sophisticated Bayesian estimator could rival or surpass TVA. In particular, I would be interested in seeing how TVA performs compared to a Bayesian estimator that estimates the prior mean as a function of the dosages in each arm and then applies a Bayesian shrinkage estimator as usual, as described in Section 1.4 here. One could even estimate the prior mean using the first steps of TVA: a LASSO regression on the Puffer-transformed “Hasse lattice” features.
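One simple version of the suggested comparison estimator is normal-normal empirical Bayes with a prior mean that is linear in each policy's dosage features. The Python sketch below is illustrative and is not the procedure in the referenced Section 1.4. Its assumptions are: the function name is hypothetical; all policies share a common standard error; the prior mean is fit by OLS rather than by the LASSO on Puffer-transformed features; and the prior variance is a method-of-moments estimate truncated at zero.

```python
# Hypothetical sketch; not the estimator from the referenced Section 1.4.
import numpy as np


def shrink_toward_dosage_prior(b, se, Z):
    # b:  (K,) unbiased effect estimates for K policies, all with std. error se > 0
    # Z:  (K, q) dosage features per policy, including an intercept column, K > q
    # Assumed model: b_k ~ N(theta_k, se^2), theta_k ~ N(Z_k @ gamma, tau^2)
    K, q = Z.shape
    gamma = np.linalg.lstsq(Z, b, rcond=None)[0]  # prior-mean regression (OLS)
    prior_mean = Z @ gamma
    resid = b - prior_mean
    # Method of moments: under the model, E[RSS / (K - q)] = tau^2 + se^2.
    tau2 = max(0.0, float(resid @ resid) / (K - q) - se**2)
    weight = tau2 / (tau2 + se**2)  # fraction of each deviation that survives
    posterior_mean = prior_mean + weight * resid
    return posterior_mean, prior_mean, tau2
```

A policy whose estimate sits far from what its dosages predict is pulled toward the fitted prior mean by the factor tau^2 / (tau^2 + se^2). Replacing the OLS fit of `gamma` with a LASSO fit on TVA's features would move the sketch closer to the variant that reuses the first steps of TVA.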
Overall, I consider this an extremely impressive and useful paper. I expect the technique it introduces is significantly more accurate in estimating the effect of the best-performing treatment than several alternatives, such as OLS, direct application of Andrews et al., 2021’s hybrid estimator, and simple spike and slab Bayesian estimators. I am also cautiously optimistic that it selects more effective policies than alternative estimators, although I am uncertain how much better it is from a practical perspective (e.g., in terms of increasing vaccination rates compared to counterfactual policy selection methods). I look forward to seeing researchers apply this useful technique to select and estimate policy effects under factorial designs in future work.
Evaluator details
How long have you been in this field?
How many proposals and papers have you evaluated?