# Statistics and other links and thoughts, September-Mid October 2022

Here are some statistical and other links.

The previous linkpost in this blog was in August, so I have achieved a streak of 2! Or almost 2, as it is already October 13. Let’s call it two positive observations in the series of 2.41 months. Laplace’s rule of succession would suggest that we have a probability

$\frac{2+1}{2.41+2} = \frac{3}{4.41} \approx 68.027\%$

of observing the next link post in the next month. Not too shabby, but not very trustworthy either. Which describes my feelings about continuing blogging this time around (I think I have had three hiatuses of various lengths, and WP statistics suggest about 3 readers, who may very well be bots). BTW, Laplace’s rule was the first link. ;)
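A minimal sketch of the arithmetic above, in Python. The function name `rule_of_succession` is made up for illustration, and treating 2.41 months as a (fractional) number of trials is the same tongue-in-cheek assumption as in the text.

```python
def rule_of_succession(successes: float, trials: float) -> float:
    """Laplace's rule of succession: (s + 1) / (n + 2)."""
    if trials < 0 or successes < 0 or successes > trials:
        raise ValueError("need 0 <= successes <= trials")
    return (successes + 1) / (trials + 2)


# Two link posts ("successes") observed in 2.41 months ("trials").
p_next = rule_of_succession(successes=2, trials=2.41)
print(f"{p_next:.5f}")  # prints 0.68027
```

With no observations at all, the same function returns 0.5, which is the uniform-prior starting point behind the rule.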

## Predictions

Suggestions of better scoring rules (for collaborative forecasting).

Comments: Sounds very neat, and much better than a Keynesian beauty contest (for an explanation in the prediction market context see e.g. this ACX post). I fear the suggested oracle method is susceptible to bad actors as oracles, though, in a way a regular prediction market ideally should not be. Though I feel any prediction market should totally not use money at all. Reputation-based mechanisms maybe could be salvaged?

Folk “theorem” 1: If a prediction market gives reward X for a correct prediction, but you know there are benefits Y to be realized from influencing the prediction market so that everyone believes in a biased prediction, and Y > X (and if you are homo economicus), you should try to influence the prediction market.

Folk “theorem” 2: However, consider that the typical example for realizing such benefits Y would be something not unlike stock market trading or options or w/e. Comparisons to insider trading are easy. ANYWAY, making a trade that yields Y profit is possible because the other trader(s) (patsies) with bad information agree to it, thus they make a trade at a loss (compared to not making the trade, had they had non-manipulated information provided by prediction markets).

Analogues can be drawn to politics, elections, warfare, etc; the issue is not strictly limited to financial instruments.

The point being, would-be patsies are aware of point 1 and have an incentive not to be patsies. They also totally should spend money in prediction markets to over-correct any probabilistic bad actors.

And finally we have a folk … uh, let’s say it is a folk conjecture: Doesn’t all the above suggest that any liquid, publicly known prediction market coupled to the rest of the global economy will eventually subsume all of the markets (previously known as the real world), as everyone tries to extract profit by either trading their information or misleading the prediction markets against their better information if it is worth more in the non-prediction markets … ?

## VATT

(Finnish state-funded) Institute for Economic Research VATT tweeted many interesting papers about economics of housing during their VATT-päivä seminar. In English, in Finnish

Then I found that VATT + Tilastokeskus + Helsinki GSE have an interesting initiative for producing fast-paced research / policy briefs called Datahuone. I don’t know if it is going to be different from what research institutes / think tanks usually do, but finding good think tanks in Finland is difficult, and they get a special bonus if they manage to produce topical outputs.

## Long-term effects of Chinese revolution

Okay, somehow my October was very economics-heavy. Here is The Economist and social capital of grand-children of pre-revolutionary Chinese elite. I quote two nice graphical visualisations that explain the main point

## (How to not) pseudocriticize

(We are now approaching genuine statistical content!)

Stuart Ritchie in his Substack: Pseudocritics, and how not to be one

Summary: People love looking at scatterplots and arguing that they support / do not support some particular hypothesis (I myself do this a lot!). One should bear in mind that quantitative, numerical statistics are most needed when judging random-looking scatterplot clouds. After all, a quantitative result says how random they are (under some assumptions, naturally), and one would not ask if it wasn’t in dispute.

## Latent factor models

S. E. Heaps, http://arxiv.org/abs/2208.07831 “Structured prior distributions for the covariance matrix in latent factor models” 16 Aug 2022

Comments: I saw a comment somewhere (unfortunately didn’t write down where) that Heaps has managed to derive a Bayesian latent factor model with prior + sampler that is quite useful. Examples include using phylogenetic-tree prior for bird observation data factorization.

Fitting such models naively can be a little painful in my experience. I hope I won’t forget about this, which is why I am writing it down here.

Lakens has now published a (free, CC-BY-NC-SA) online textbook called Improving Your Statistical Inferences. The parts I have read are not very complicated or technical if you are already familiar with statistics, but that just makes it a not-too-heavy read: the graphics look intuitive and the principles are well explained, so chances are it could improve my statistical inferences!

## R packages

targets looks like a cool way to manage pipeline dependency targets.

box appears to provide a handy way to avoid source() function spaghetti without moving to R package development. Downside: one needs to learn a quite different framework.

That’s all for this time.

No further introduction necessary: Here are some statistics links ->

(Expect these links posts to be irregular in future. I am clearing my tabs.)

Why do tree-based models still outperform deep learning on tabular data? Grinsztajn, Léo, Edouard Oyallon, and Gaël Varoquaux. arXiv, July 18, 2022. https://doi.org/10.48550/arXiv.2207.08815.

Comment: The most interesting thing in the paper was its title. I didn’t know tree-based models are supposed to outperform, but I guess they do. I don’t know much about this field, but it sort-of makes intuitive sense: Transformers and, prior to them, MLP(CNN) architectures have been very impressive at problems in computer vision and natural language, which used to be difficult because previously the best computer algorithms were not that good at the same stuff our mammalian human brain does 24/7. But “vision” and “natural language” are a different kind of difficult than fitting ML models on arbitrary tabular data.

And apparently XGBoost is still good for something. (Learning it wasn’t in vain and it is still relevant?)

McDermott, Grant. “Efficient Simulations in R.” Grant R. McDermott, June 24, 2021. https://grantmcdermott.com/efficient-simulations-in-r/. See also follow-up.

Comment: The most useful quote to me:

However, regressions are run on matrices. Which is to say that when you run a regression in R — and most other languages for that matter — behind the scenes your input data frame is first converted to an equivalent matrix before any computation gets done. Matrices have several features that make them “faster” to compute on than data frames. For example, every element must be of the same type (say, numeric). But let’s just agree that converting a data frame to a matrix requires at least some computational effort. Consider then what happens when we feed our lm.fit() function a pre-created design matrix, instead asking it to convert a bunch of data frame columns on the fly.

McDermott

Torous, William, Florian Gunsilius, and Philippe Rigollet. “An Optimal Transport Approach to Causal Inference.” arXiv, August 12, 2021. https://doi.org/10.48550/arXiv.2108.05858.

Comment: I do not understand optimal transport, but it seems quite cool. (This tutorial paper by Peyré and Cuturi has been resting on my “to-read” shelf since 2020.) Now, earlier this year I learned a lot of causal inference techniques common in econometrics, such as difference-in-differences. Apparently DiD can be generalized as “CiC” (Changes-in-Changes), but according to Torous and friends, it works poorly and their optimal transport approach works better. (I can’t really say, but the graphs look nice.)

# Reading How to Measure Anything, interlude 1: Bayesian and frequentist inference

(Summary, originally in Finnish: The learning diary entries continue slowly. Here is a short reading-recommendation link post about Bayesian inference.)

Context: This is a quick interlude note in a series of learning diary notes, where I track my thoughts, comments, and (hopefully) learning process while reading How to Measure Anything by D. W. Hubbard together with some friends in a book club setting. Previous parts in the series: vol. 0, vol. 1, vol. 2. All installments in the series can be identified by the tag “Measure Anything” on this blog.

## Introduction

Despite the radio silence, the reading club has been marching on steadily but quite slowly. I have work in progress drafts for notes vol. 3, 4 and 5! Unfortunately other life has intervened with finishing the drafts, so the next installments of reading log entries will come up online here on the blog … sometime later.

However, as the book discusses “Bayesian probability” in several places, I thought it would be prudent to share some links to articles that actually explain what it means. (As I am a bit too busy to write a thorough lecture on it myself, I will rather defer to experts.)

## Difference between Bayesian and frequentist inference

Very briefly: frequentist inference builds on the frequentist interpretation of probability, where probability is understood as a property of repeated, independent events (“frequency”). Bayesian inference builds on the Bayesian interpretation of probability, where “probability” is taken to be a thing that exists for anything, interpreted as a quantification of knowledge about many different things. This kind of interpretation makes it possible to sensibly interpret and use Bayes’ theorem for inference about various random variables.

This is a succinct definition by a person who has had some years of experience working on this stuff, and it might not make much sense if you are not already familiar with it.

While looking for something else entirely, I noticed this five-part series of blog posts by Jake VanderPlas. It illustrates the above brief statement in more detail. I recommend the first part (which I have actually read), as it is quite a practical example. However, as a word of warning, the author is an astronomer, so for them “practical” includes the use of some mathematical notation and calculations.

For a discussion about implications of these concepts, here is a nice pdf of class notes from Orloff and Bloom, MIT.

This Stat.StackExchange answer by Keith Winstein is a great explanation of how the difference works out between frequentist confidence intervals and Bayesian credible intervals. It involves chocolate chip cookie jars!

Bio-statistician Frank Harrell has a blog post titled My Journey From Frequentist to Bayesian Statistics. It also collects further links at the end.

## Use of Bayesian statistics is not always very Bayesian in practice

Have you read all of the above?

Good! Here are some thoughts related to the real-life applications of Bayesian inference.

In addition to all of the above, there is a certain internet crowd who likes to use words like “prior”, “Bayesian belief” and “Bayesian update” for many things because “rational agents are Bayesians”. I do not say it is not useful to have such a concept, and thus a word, for inductive reasoning (or, as one may say, “Bayesian update”): if you have a prior state of belief, and then obtain some new information, and if you can quantify the prior and the likelihood of the data with parametric distributions or probabilistic statements, Bayes’ theorem will tell you what the mathematically correct probabilistic state of belief (the posterior) is. (And if you skip the step of quantifying the numbers, one could still argue the procedure of obtaining the posterior belief should look like an application of Bayes’ theorem if one were to put numbers on it, which maybe gives some intuition about reconciling one’s beliefs about some matter with new evidence.)
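To make “putting numbers on it” concrete, here is a minimal sketch of a single Bayesian update in Python. Everything specific in it is an assumption made up for illustration: the two hypotheses about a coin (“fair” with heads probability 0.5, “biased” with 0.8), the 50/50 prior, and the data (8 heads in 10 flips). The helper names `bayes_update` and `binomial_likelihood` are likewise hypothetical.

```python
from math import comb


def binomial_likelihood(p_heads: float, heads: int, flips: int) -> float:
    """Probability of seeing `heads` heads in `flips` flips, given p_heads."""
    return comb(flips, heads) * p_heads**heads * (1 - p_heads) ** (flips - heads)


def bayes_update(prior: dict, likelihood: dict) -> dict:
    """Bayes' theorem for discrete hypotheses: posterior ∝ prior × likelihood."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(data), the normalizing constant
    return {h: weight / evidence for h, weight in unnormalized.items()}


# Made-up prior belief and made-up data (8 heads in 10 flips).
prior = {"fair": 0.5, "biased": 0.5}
likelihood = {
    "fair": binomial_likelihood(0.5, heads=8, flips=10),
    "biased": binomial_likelihood(0.8, heads=8, flips=10),
}
posterior = bayes_update(prior, likelihood)
print(posterior)
```

With these numbers the posterior shifts clearly towards “biased”, and feeding the posterior back in as the prior for the next batch of data is what a “Bayesian update” literally is.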

However, the more one works with explicit, quantitative Bayesian statistical models (like the ones presented in the VanderPlas blog series), the weirder it starts to sound to talk about “updating one’s belief” without making calculations with any models or probabilities.

It gets even more weird when practicing statisticians (who write authoritative textbooks on Bayesian data analysis) explain that actually, in real life, the way they do Bayesian statistics does not resemble an inductive series of Bayesian belief updates (pdf):

A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico‐deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework.

Gelman and Shalizi, “Philosophy and the Practice of Bayesian Statistics.” British Journal of Mathematical and Statistical Psychology 66, no. 1 (2013): 8–38. https://doi.org/10.1111/j.2044-8317.2011.02037.x.

If you feel like reading 30 pages of philosophy of statistics, read the whole article. The way I read it, if one looks at the whole knowledge-making process of successful practical statistics, which includes the parts where one (1) formulates the Bayesian model and priors about some phenomenon, (2) fits the model to data and obtains the posterior inferences with math and algorithms, and (3) then checks with various other methods whether it really works, only (2) is really about making Bayesian updates. In combination with parts (1) and (3), the whole procedure is more hypothetico-deductive than inductive, and the model checks have some affinity with Popperian falsification.
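The three steps can be sketched as a toy example in Python. This is my own illustration, not something taken from the article: the data (ten successes followed by ten failures), the i.i.d. Bernoulli model with a Beta(1, 1) prior, and the “number of switches” check statistic are all assumptions picked to keep the code short.

```python
import random


def count_switches(seq):
    """Number of adjacent positions where the outcome changes."""
    return sum(a != b for a, b in zip(seq, seq[1:]))


def posterior_predictive_pvalue(data, n_rep=2000, seed=0):
    """Share of replicated datasets with at most as many switches as the data."""
    rng = random.Random(seed)

    # (1) Model: i.i.d. Bernoulli(theta) outcomes, Beta(1, 1) prior on theta.
    a_prior, b_prior = 1.0, 1.0

    # (2) Bayesian update: the conjugate Beta posterior for theta.
    successes = sum(data)
    a_post = a_prior + successes
    b_post = b_prior + len(data) - successes

    # (3) Model check: simulate replicated data from the fitted model and
    # compare a statistic the model does not fit directly.
    observed = count_switches(data)
    at_most = 0
    for _ in range(n_rep):
        theta = rng.betavariate(a_post, b_post)
        replicated = [1 if rng.random() < theta else 0 for _ in data]
        if count_switches(replicated) <= observed:
            at_most += 1
    return at_most / n_rep


streaky = [1] * 10 + [0] * 10  # made-up data with one long streak of each
p_value = posterior_predictive_pvalue(streaky)
print(p_value)
```

Step (2) happily produces a posterior for this data; only the check in step (3) reveals that replicated datasets almost never look as streaky as the observed one, which is a reason to go back and revise the model in step (1).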

If you want to read more about this kind of “statistician’s way of doing” Bayesian inference, you can read a more recent article, “Bayesian Workflow” by Gelman et al. 2020 (arxiv), which presents a comprehensive and quite technical 77-page step-by-step tutorial on it, or a less comprehensive but also quite mathematical essay by Michael Betancourt (2020), “Towards A Principled Bayesian Workflow” (html).