Software for Economic Modelling
Not very novel, but still very valuable
During my PhD, I had the good fortune to build some software for economic and statistical modelling. With the help of Claude/Codex, I was able to build these packages out faster. All of them are still quite early and have some rough edges. They all use state-of-the-art programming tools (Rust, JAX, PyTorch, etc.), draw on large, well-established literatures, and come with extensive documentation on how to use them.
EconIRL: This is a suite of models and methods to analyze and understand sequential choices. These types of choices typically arise when there is something that “accumulates” over time. Consider the owner of a durable asset, e.g. a truck, who must decide when best to replace it (lest it break down en route). Or consider a taxi driver planning a route through a busy city from point A to point B, trading off traffic, time and the quality of the road. Or consider a user of a music app deciding whether to skip or play the next song in a playlist. These models help us understand how exactly the user values now vs. then, and what preferences drive their decision making at each step. If we can understand these, we can then design better routes, playlists or pricing for durables.
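To make the truck example concrete, here is a minimal sketch of the forward problem only: given preferences, when should the owner replace? It is not EconIRL's API. The function name, the deterministic wear process and every parameter value are assumptions made for illustration. The discount factor beta is what encodes "now vs. then".

```python
def solve_replacement(n_states=20, beta=0.95, maintenance_cost=1.0,
                      replacement_cost=10.0, tol=1e-10):
    """Value iteration for a toy keep-or-replace problem (illustrative only).

    Assumptions: wear rises by one unit per period when the asset is kept,
    maintenance cost is linear in wear, and replacing resets wear to zero
    for the current period.
    """
    values = [0.0] * n_states
    while True:
        new_values, policy = [], []
        for wear in range(n_states):
            next_wear = min(wear + 1, n_states - 1)
            # Keep: pay maintenance today, carry a more worn asset forward.
            keep = -maintenance_cost * wear + beta * values[next_wear]
            # Replace: pay the replacement cost, start next period at wear 1.
            replace = -replacement_cost + beta * values[min(1, n_states - 1)]
            new_values.append(max(keep, replace))
            policy.append("replace" if replace > keep else "keep")
        gap = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        if gap < tol:
            return values, policy


values, policy = solve_replacement()
print("replace once wear reaches:", policy.index("replace"))
```

The inverse problem runs the other way: we observe the keep/replace decisions and try to recover the costs and the discount factor that rationalize them.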
PrefGraph: This is software that helps us detect “inconsistency” or “irrationality” in choices. For instance, a common way that humans are inconsistent is that their choices change in the presence of a “decoy”: an obviously bad option that is presented only to make other options look good. Another area where we might want to test for consistency is in AI agents or LLM responses. Ideally, we’d want our agents to be coherent and consistent, especially when they are to shop or reason for us. Note that consistency is not the same as being right or wrong. Furthermore, consistency can often fail for other reasons, such as changing preferences, a failure to pay close attention to all options, or simply a ‘trembling-hand’ mistake. This software makes use of advanced graphical techniques developed in Operations Research.
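The decoy example can be illustrated with a small sketch. It is a deliberately simple stand-in, not PrefGraph's API or its actual algorithms: each observed choice adds "chosen is revealed preferred to every other item on the menu" as an edge in a directed graph, and a cycle in that graph flags an inconsistency. The function names and the toy menus are hypothetical.

```python
def revealed_preference_graph(observations):
    """Build a directed graph with an edge chosen -> other for each menu."""
    graph = {}
    for menu, chosen in observations:
        if chosen not in menu:
            raise ValueError(f"{chosen!r} is not on the menu {menu!r}")
        for item in menu:
            graph.setdefault(item, set())
        graph[chosen].update(item for item in menu if item != chosen)
    return graph


def has_cycle(graph):
    """Depth-first search; reaching a node that is still in progress means a cycle."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNSEEN for node in graph}

    def visit(node):
        state[node] = IN_PROGRESS
        for nxt in graph[node]:
            if state[nxt] == IN_PROGRESS:
                return True
            if state[nxt] == UNSEEN and visit(nxt):
                return True
        state[node] = DONE
        return False

    return any(state[node] == UNSEEN and visit(node) for node in graph)


# A is chosen over B, but adding a decoy D flips the choice to B.
decoy_choices = [({"A", "B"}, "A"), ({"A", "B", "D"}, "B")]
print(has_cycle(revealed_preference_graph(decoy_choices)))
```

A cycle here says only that no single fixed ranking explains the data. It does not say which of the explanations above (changing preferences, inattention, a mistake) is responsible.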
DeepInference: This is a more statistical package that takes traditional statistical models, such as the good old linear regression Y = a + b T + error, and turns them into Y = a(X) + b(X) T + error, so that the intercept and slope are now functions of X. Here X can be a very high-dimensional and dense object (e.g. images, text, sequences, graphs and so on). In a world where such high-dimensional objects are first-class citizens, we need to be able to work directly with them when studying empirical relationships in the data. And while it is easy to estimate such models, it is not so easy to run hypothesis tests such as H0: E[b(X)] = 0. For that, we have to rely on statistical tools like influence functions.
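As a rough sketch of what the influence-function step looks like for H0: E[b(X)] = 0, suppose a(X), b(X), E[T|X] and Var(T|X) have already been estimated (in practice by a network, ideally on a separate sample split). Under squared loss, each observation's contribution is b(X) plus a correction term, (T - E[T|X]) / Var(T|X) times the residual. This is not DeepInference's API. The function name, its arguments and the simulated data are all assumptions for illustration, and the simulation plugs in the true functions instead of fitted ones.

```python
import math
import random


def average_slope_test(y, t, a_hat, b_hat, t_mean, t_var):
    """Influence-function estimate of E[b(X)] and its standard error.

    All arguments are equal-length lists of per-observation values:
    outcomes, treatments, fitted a(X), fitted b(X), E[T|X] and Var(T|X).
    """
    psi = []
    for yi, ti, ai, bi, mi, vi in zip(y, t, a_hat, b_hat, t_mean, t_var):
        residual = yi - ai - bi * ti
        # Plug-in slope plus the correction term for squared loss.
        psi.append(bi + (ti - mi) / vi * residual)
    n = len(psi)
    estimate = sum(psi) / n
    variance = sum((p - estimate) ** 2 for p in psi) / (n - 1)
    return estimate, math.sqrt(variance / n)


# Simulated data: a(X) = X, b(X) = 1 + X, T is a fair coin flip.
random.seed(0)
n = 2000
x = [random.randint(0, 1) for _ in range(n)]
t = [float(random.random() < 0.5) for _ in range(n)]
a_true = [float(xi) for xi in x]
b_true = [1.0 + xi for xi in x]
y = [a + b * ti + random.gauss(0, 1) for a, b, ti in zip(a_true, b_true, t)]

estimate, std_error = average_slope_test(y, t, a_true, b_true, [0.5] * n, [0.25] * n)
print("estimate:", estimate, "z-statistic:", estimate / std_error)
```

The z-statistic is then compared with the usual normal critical values. The correction term is what lets the standard error account for the fact that a(X) and b(X) were themselves estimated.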
These packages make it clear that the main thrust of my PhD was to explore how computational machinery from machine learning could be used in service of economic and statistical modelling. After five years, I have come to the conclusion that integrating machine and deep learning methods into economics is not only easy but also a natural direction if we want to model and work with dense objects (which are abundant in the social world).
Furthermore, I think a lot of interesting methodology remains buried in economics or econometrics papers or in arcane codebases, and it takes significant effort and thought to make these methods available to the public. However, I think there is value in this: the best validation of a method or tool is its widespread adoption, and building open source software is the only way to achieve that.
The Bayesians learned this the hard way, but now with software like Stan we are able to run Bayesian workflows with ease. Similarly, if CS researchers had not come up with Stable Baselines or the many benchmarks for robotics like MuJoCo, it would be really hard to do research in reinforcement learning, where frameworks are very brittle and break down without careful hyper-parameter tuning.
Economics today does not emphasize clean, reproducible code, and we neither teach it nor reward it in the publication process. But this is vital to making sure that our work is not only reproduced and tested, but also made available for others to build on, which in turn allows us to build on what they create using our work (a positive feedback loop).
