One year of justifying our posteriors
For the past year, Seth Benzell and I have been running an experiment on ourselves with our podcast, Justified Posteriors. Can we behave like good Bayesian learners about research by stating our priors ex-ante, carefully reading papers, and then reporting how we've updated our beliefs? This has turned out to be more complicated, and more interesting, than it sounds; the rest of this essay reflects on why.
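For readers who want the mechanics spelled out, here is a minimal sketch of the textbook update we are trying to approximate. The numbers are purely illustrative, not from any episode:

```python
# Toy Bayesian update on a binary claim, e.g. "algorithmic feeds
# increase political polarization." All numbers are illustrative.

prior = 0.40                # P(claim true) before reading the paper
p_evidence_if_true = 0.70   # P(this evidence | claim true)
p_evidence_if_false = 0.20  # P(this evidence | claim false)

# Bayes' rule for the posterior probability that the claim is true
posterior = (prior * p_evidence_if_true) / (
    prior * p_evidence_if_true + (1 - prior) * p_evidence_if_false
)
print(f"posterior: {posterior:.2f}")  # 0.70, a substantial update
```

The hard part, as the rest of this essay describes, is that neither the prior nor the likelihoods come labeled; we have to construct them ourselves.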
A foundational assumption of Justified Posteriors is that the claims made in published research papers and other intellectual work do not correspond directly to what we believe after reading them. This should be obvious to anyone who has seriously engaged with intellectual work. What is less obvious is the size of the gap between the claims in the work and the beliefs of the reader. Is there a slow accumulation of evidence (a "vast literature," as one reads in formulaic introductions) that gradually moves our beliefs from zero to one? Or is there a critical moment, when a single paper forces a rethinking of all that came before it and leads to a new conclusion?
We could dredge through the history of science, as our predecessors Popper, Kuhn, and Lakatos did, and come up with examples of both. We idealize pivotal moments like the eclipse tests of Einstein's general relativity. The evidence we have to deal with is much muddier. We live in a time when circulating claims is a global pastime. Sometimes these findings come with the trappings of academic prestige and peer review; other times they arrive as a polemic dropped like a nuclear bomb into the memesphere, as those who have situational awareness may understand.
Few have time to read deeply, and even thinking can feel like one of those lines on a to-do list that never gets crossed out. Consider the ubiquitous evals used in AI research and cited throughout social media. The number of people who have read the underlying methodology of each eval is minuscule. The ignorance is so vast that most people don't know how few samples many evals contain, let alone the width of the resulting confidence intervals. And yet a careful evaluation of a new eval such as GDPVal can shift our priors substantially.
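To make the sample-size point concrete, here is a back-of-the-envelope sketch using a standard normal approximation; the accuracy and item counts are made up for illustration, not taken from any particular eval:

```python
import math

def eval_ci(accuracy: float, n_items: int) -> tuple[float, float]:
    """Approximate 95% confidence interval for an accuracy score
    measured on n_items independent test items."""
    se = math.sqrt(accuracy * (1 - accuracy) / n_items)
    return accuracy - 1.96 * se, accuracy + 1.96 * se

for n in (100, 1000, 10000):
    lo, hi = eval_ci(0.70, n)
    print(f"n={n:>6}: 95% CI = ({lo:.3f}, {hi:.3f})")
# On 100 items the interval spans roughly +/- 9 points, so two models
# scoring a few points apart on a small eval may be indistinguishable.
```

The point is not the exact formula but the order of magnitude: small evals simply cannot resolve small differences between models.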
This is the water we swim in with Justified Posteriors. The premise of the show seemed simple, but nothing is as simple as it seems. For one, how do we pick a prior, especially without reading the paper? A conceit of the podcast is that we form our priors with zero information about the paper, but even to pick a paper we need to know something about it. Picking a prior turns out to be one of the things we struggle with most.
What are we supposed to learn from a theory paper such as Ide and Talamas's "Artificial Intelligence in the Knowledge Economy"? A theorist might be satisfied by learning whether this is a useful way of modeling the phenomenon. We instead try to translate the theory into more empirical statements, such as: what percentage of US workers will have managing or creating teams of AI agents as their main job within five years? On questions like these, we typically don't update much.
Of the 22 episodes in which we had at least some semblance of priors, the biggest update came for Seth in the episode on "How do social media feed algorithms affect attitudes and behavior in an election campaign?" The randomized controlled trial evidence convinced him that showing a user an algorithmic feed rather than a reverse-chronological feed did not affect their political polarization. That was already my prior, given the existing literature.
Nonetheless, neither of us was willing to update much on the larger claims. The reason is that, as always, the real world is complicated. For example, the paper did not study decisions to moderate content, a process that can be algorithmic but that differs from the algorithmic feed. Nor did it consider truly directed algorithmic interventions, such as those by Elon Musk on X. We can't read this paper and simply conclude that algorithmic feeds are not an important determinant of political beliefs.
For me, the biggest update came in the episode "Can AI Make Better Decisions than Doctors?" I came in skeptical that AI could overcome the fundamental problems of causal inference without a randomized controlled trial. The evidence in the paper strongly updated me toward believing we should be more aggressive about inserting AI into ER decisions.
Interestingly, papers on more macro topics caused smaller updates even when they had much greater implications. Our first episode was, fittingly, about the now-famous Situational Awareness document written by Leopold Aschenbrenner in June 2024. We didn't have explicit priors, but we thought AGI was more than five years away. We also thought AI was super important and that some of the predictions were plausible. We joked about buying NVIDIA, and didn't (we were fools). To me, this episode highlights how easy it is to be directionally right, to read the right materials, and still not take ideas seriously enough. The document's arguments about power generation and data centers have proven especially correct. And if you squint, the timeline predictions are holding up even to this day: Claude Code with Opus 4.5 seems to be right on time for Aschenbrenner's prediction of a proto-automated engineer in 2026/2027.
A common theme in our discussions of papers on the economics of AI is that they often measure transitory phenomena, such as changes in productivity or performance at a particular point in time. An extreme example is "The Simple Macroeconomics of AI" by Daron Acemoglu, which assumes that AI will stay as good as it was in 2024. These papers are often underwhelming, even when well crafted, because what everyone really cares about is what will happen in the future.
Much of my learning has come through conversation about each paper, rather than from the reading alone. My updates would be very different if I read the papers without talking with Seth about them. This is reminiscent of an academic seminar, in which a group of colleagues focuses exclusively on one paper presented by a speaker. Seminar regulars know that the most interesting part of seminar day often happens in the hallway conversations afterward, when people share their honest opinions. One can tell how serious an academic department is by the quality of its hallway discussion.
This brings me to my next topic: whether podcasting is a worthwhile intellectual pursuit for a professor. I am supposed to demonstrate my work on an intellectual topic primarily by publishing papers in top journals. Yet to me it is obvious that we are doing valuable and original work in reading these papers and interpreting them through lenses broader than the minimum publishable unit. For each episode, we have to understand literatures, engage deeply with evidence, and reason through the implications. This is the sort of work top researchers often do before starting new research projects, but it is rarely shared outside of side conversations or lab meetings. What Seth and I do is a valid and valuable intellectual activity, not substantively different from writing a paper or a book.
One of the great pleasures of doing the podcast is hearing from our awesome readers and listeners! In the coming year, our goal is to improve the quality of our work by increasing our preparation, improving our audio and video quality, and bringing on insightful guests. I am excited to continue covering emerging measurements of the AI economy and theoretical frameworks for the impact and diffusion of AI. As always, we would love to hear from you with any feedback.
Thanks to Seth Benzell for comments and for being a great co-host.