# The Magic Square of 33

A Barcelona experience is not quite complete without having visited the Sagrada Família, Antoni Gaudí's, in several senses of the word, monumental project (needing, more or less, the years from 1883 to 2026 to be completed). On the Passion façade there is an innocent and inconspicuous but magic 4 x 4 square, which caught my attention, with sums of rows, columns, diagonals, and 2 x 2 subsquares all equal to 33. Can I make such a 4 x 4 square myself, using tools of probability theory and simulation?

"Well, how was your trip to exotic Gambia?", asked my parents, once upon a time, a generation or so ago, when we met with their friends Nils and Karin Spjeldnæs. A certain audible sigh was heard from Karin, accompanied by a smile which we children interpreted as sardonic. Apparently, she had planned this as a vacation trip, where Nils Spjeldnæs, the professorial professor of geology and paleontology, would escape the eternal thoughts of, well, geology and paleontology. But of course he was quickly and deeply fascinated by the local paleontology, and went digging on the beach (gathering a crowd of locals, who for the first time in their lives saw a white man digging), and he soon published about Gambian bryozoans and molluscs and worms and cryptic fauna.

Well, how was my trip to Barcelona, May 2019? Thanks for asking: excellent and diverse, and I don't quite see myself as the Spjeldnæs professor type, unable to escape thoughts on mathematical statistics and probability theory. But after flamenco and opera and Picasso and museums and the national galleries and Rambla, the 4 x 4 magic square on the Passion façade of Gaudí's breathtaking Sagrada Família caught my attention and curiosity. The sums of rows, columns, diagonals, and also of each 2 x 2 subsquare, are all equal to 33 (the age of Jesus when he died for us all).

One does not need to be a mathematician or statistician or theologian to wonder how complicated it would be to come up with such a square, and to ask if there are many alternative squares with the same properties. What a statistician can attempt to do is to form a probability distribution, on the very big set of all 4 x 4 squares filled up by the given set of numbers, such that outcomes with the "all required sums should be 33 or close to 33" property will have high probability, and then sample away from such a distribution.

## Squares with horizontal and vertical sums equal to 33

So let me start. The sample space consists of all 4 x 4 squares $x$, with elements $x[i,j]$ for rows $i$ and columns $j$, using precisely the 16 Gaudí numbers. These are not the natural sequence 1 to 16, but rather 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 11, 13, 14, 14, 15, i.e. two 10s, two 14s, and no 12, essentially because Gaudí wished for 33, not 34. There are $16!=1\cdot2\cdot3\cdots15\cdot16$ possible ways of permuting 16 numbers, but here the two 10s and the two 14s can be internally switched, so the sample space, on which I will build a probability distribution, has $16!\,/4\doteq 5.231\cdot 10^{12}$ elements. For such squares $x$, I take an interest in the row sums

\eqalign{ r_1(x)&=\sum_j x[1,j], \cr r_2(x)&=\sum_j x[2,j], \cr r_3(x)&=\sum_j x[3,j], \cr r_4(x)&=\sum_j x[4,j], \cr}

and similarly column sums $c_1(x),c_2(x),c_3(x),c_4(x)$. I need to see squares with low values of the row-sum and column-sum scores

\eqalign{ R(x)&=|r_1(x)-33|+|r_2(x)-33|+|r_3(x)-33|+|r_4(x)-33|, \cr C(x)&=|c_1(x)-33|+|c_2(x)-33|+|c_3(x)-33|+|c_4(x)-33|, \cr}

which I can accomplish via the well-defined but complicated probability distribution

$$f(x)=(1/k)\exp\bigl[ -\lambda\{R(x)+C(x)\} \bigr],$$

where $\lambda$ is a positive fine-tuning parameter. Here $k$ is the mathematically necessary but in practice highly complicated normalisation constant, a sum over $5.231\cdot10^{12}$ different squares. Squares with maximal probability, for this specially designed distribution, are precisely those with $R(x)=0$ and $C(x)=0$, i.e. those with all horizontal and vertical sums equal to 33. But those amount to exceedingly few needles in an exceedingly enormous haystack.
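For concreteness, the Gaudí numbers, the scores, and the unnormalised weights can be sketched in a few lines of Python (the actual searching reported below was coded in R; the names `GAUDI`, `row_col_score`, and `weight` are mine, chosen for illustration):

```python
import math

# The 16 Gaudi numbers: 1-11 and 13-15, with 10 and 14 doubled and 12 missing.
GAUDI = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 11, 13, 14, 14, 15]

# Size of the sample space: 16!/4, since the two 10s and the two 14s
# can be switched internally.
N_SQUARES = math.factorial(16) // 4   # 5230697472000, about 5.231e12

def row_col_score(x):
    """R(x) + C(x): summed absolute deviations of the four row sums
    and the four column sums from the magic number 33."""
    R = sum(abs(sum(row) - 33) for row in x)
    C = sum(abs(sum(col) - 33) for col in zip(*x))
    return R + C

def weight(x, lam=0.444):
    """Unnormalised probability k * f(x) = exp(-lambda * {R(x) + C(x)})."""
    return math.exp(-lam * row_col_score(x))
```

A square with all eight sums equal to 33 has score zero and hence maximal weight 1; the awkward normalisation constant $k$ is never needed for comparing two squares.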

## Markov chains

The point, theoretically and practically, is now that I can manage to sample squares from the $f(x)$ probability distribution, using an appropriately constructed Markov chain of 4 x 4 squares. Perhaps 99.9999 percent of the five trillion possible squares have very low probabilities, under my $f(x)$ model, so if I succeed in simulating a sequence of outcomes from the model what I will see are squares with low scores of $R(x)+C(x)$, i.e. squares where all row sums and all column sums are close to the desired magic number 33. I manage this by setting up a perhaps very long Markov chain of squares, say $x_1,x_2,\ldots$, where (a) the next outcome square $x_{t+1}$ should be easy to generate, given the present $x_t$, and (b) the equilibrium distribution should be precisely the $f(x)$ above. What I do is to (i) start anywhere, e.g. at the utterly simple

$$x_1=\pmatrix{ 1 & 2 & 3 & 4 \cr 5 & 6 & 7 & 8 \cr 9 &10 &10 &11 \cr 13 &14 &14 &15 \cr}\,;$$

then (ii) generate simple proposals, which I do in the symmetric fashion of selecting two positions at random and switching the two numbers; and then (iii) accept or reject these proposals with finely calibrated probabilities. I do the latter step (iii) using

\def\prop{{\rm prop}} \def\prev{{\rm prev}} \eqalign{ {\rm pr} &=\min\{1,f(x_\prop)/f(x_\prev)\} \cr &=\min\bigl\{1,\exp[-\lambda\{Q(x_\prop)-Q(x_\prev)\}]\bigr\} \cr}

as the probability of accepting $x_{\rm prop}$ as the next $x_{t+1}$; with the complementary probability $1-{\rm pr}$ I let $x_{t+1}$ remain at the $x_t$ position. Here I write $Q(x)=R(x)+C(x)$ for the combined row and column score, with 'prop' and 'prev' indicating the proposal and the previous square in the Markov chain. If the generated proposal has a lower score $Q(x_{\rm prop})$ than the previous $Q(x_{\rm prev})$, it is automatically accepted as the next square in my Markov chain of squares. And if it is bigger, it may still be accepted, but then with a certain fine-tuned probability, namely $\exp(-\lambda)$ raised to a certain power, as per the recipe described. Note that the heavily awkward normalisation constant $k$ has dropped out of the algorithm.
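Steps (i)-(iii) can be sketched as a small Metropolis sampler in Python (a hedged sketch, not the author's R code; `metropolis_chain` and `q_score` are illustrative names):

```python
import math
import random

def q_score(x):
    """Combined row and column score Q(x) = R(x) + C(x)."""
    return (sum(abs(sum(row) - 33) for row in x)
            + sum(abs(sum(col) - 33) for col in zip(*x)))

def metropolis_chain(x0, score=q_score, lam=0.444, steps=20000, rng=None):
    """Run the Metropolis chain of 4 x 4 squares: propose swapping the
    numbers in two randomly chosen positions (a symmetric proposal),
    accept with probability min{1, exp(-lam * (Q_prop - Q_prev))}.
    Returns the best square encountered and its score."""
    rng = rng or random.Random()
    positions = [(i, j) for i in range(4) for j in range(4)]
    x = [row[:] for row in x0]
    q = score(x)
    best_x, best_q = [row[:] for row in x], q
    for _ in range(steps):
        (i1, j1), (i2, j2) = rng.sample(positions, 2)
        x[i1][j1], x[i2][j2] = x[i2][j2], x[i1][j1]      # propose a swap
        q_prop = score(x)
        if q_prop <= q or rng.random() < math.exp(-lam * (q_prop - q)):
            q = q_prop                                   # accept the proposal
        else:
            x[i1][j1], x[i2][j2] = x[i2][j2], x[i1][j1]  # reject: swap back
        if q < best_q:
            best_x, best_q = [row[:] for row in x], q
            if best_q == 0:    # a square with all row and column sums 33
                break
    return best_x, best_q
```

Starting from the simple square $x_1$ above, such a chain typically reaches $Q(x)=0$ within a few thousand steps, in line with the run described in the text.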

Theory for stationary Markov chains, with theorems from the Kolmogorov era of the 1930s, now guarantees that the chain I've described has a unique equilibrium distribution, namely, and precisely, the intended $f(x)$. The essence here is that with $\pi(x'\,|\,x)$ denoting the transition probability for the chain, the chance of the next square being $x'$ when starting from $x$, we do have

$$f(x) \pi(x'\,|\,x) = f(x') \pi (x\,|\,x')$$

for all squares $x,x'$; this is the detailed balance condition, which secures that $f$ is indeed the chain's equilibrium distribution.

*Figure: the Markov chain of squares, with $\lambda=0.444$. After 2881 steps, the $Q(x)$ score has reached zero, yielding a 4 x 4 matrix with all rows and all columns summing to 33.*

The first matrix I happen to find in this way, the first time $Q(x)$ lands at zero, and clearly one of many possibilities, is

$$x_{2881}=\pmatrix{ 2 & 9 & 14 & 8 \cr 11 & 3 & 5 & 14\cr 10 & 6 & 13 & 4\cr 10 & 15 & 1 & 7\cr}.$$

Running the chain longer, or starting it again at new and perhaps randomly selected positions, yields other half-magical 4 x 4 squares, with the horizontal and vertical properties.

## Very magical squares with lots of sums equal to 33

I have a taller and more magical order, however, demanding similarly that the two diagonals sum to 33, and the same for each of the four corner 2 x 2 blocks (an exercise shows that then also the central 2 x 2 block, in the middle of the 4 x 4 square, must have sum 33). So I revise my probability distribution, on the same exceedingly big sample space of five trillion squares, each made up of the 16 Gaudí numbers, and make it

\eqalign{ f(x) &=(1/k)\exp\{-\lambda Q(x)\} \cr &=(1/k)\exp\bigl[ -\lambda\{R(x)+C(x)+D(x)+B(x)\} \bigr], \cr}

say, where $R(x)$ is the sum of the four horizontal scores $| \sum_j x[i,j]-33 |$, $C(x)$ the sum of the four vertical scores $| \sum_i x[i,j]-33 |$, $D(x)$ the sum of the two diagonal scores $| \sum_i x[i,i]-33 |$ and $| \sum_i x[i,5-i]-33 |$, and finally $B(x)$ the sum of the four block scores $| \sum_{\rm block}x[i,j] - 33 |$, over the four corner 2 x 2 blocks.

Then I revise my Markov chain simulation algorithm accordingly, with a bit more complicated acceptance probabilities for each of the thousands of proposals I generate. The theory goes through, undaunted, and guarantees that the chain's equilibrium distribution is this new $f(x)$. Drawing samples from this $f(x)$ gives squares with four horizontal sums, four vertical sums, two diagonal sums, and four block sums, all being in the vicinity of 33. By keeping up the chain, I will sooner or later find the truly magical Gaudí squares, where all of these 4 + 4 + 2 + 4 = 14 sums are exactly equal to 33.
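The revised score can be sketched by extending the earlier row-and-column score (again a Python sketch with an illustrative name, where the blocks are the four corner 2 x 2 blocks):

```python
def full_score(x, magic=33):
    """Q(x) = R(x) + C(x) + D(x) + B(x): summed absolute deviations from
    the magic number for the four rows, the four columns, the two
    diagonals, and the four corner 2 x 2 blocks of a 4 x 4 square."""
    R = sum(abs(sum(row) - magic) for row in x)
    C = sum(abs(sum(col) - magic) for col in zip(*x))
    D = (abs(sum(x[i][i] for i in range(4)) - magic)
         + abs(sum(x[i][3 - i] for i in range(4)) - magic))
    B = sum(abs(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1] - magic)
            for i in (0, 2) for j in (0, 2))
    return R + C + D + B
```

All 4 + 4 + 2 + 4 = 14 deviations vanish exactly for a truly magical Gaudí square, which then has maximal probability under the revised $f(x)$.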

The first Gaudí square I find, using this technique, happens to be

$$\pmatrix{ 10 & 7 & 1 & 15 \cr 6 & 10 & 13 & 4 \cr 14 & 2 & 8 & 9 \cr 3 & 14 & 11 & 5 \cr}$$

with all sums and balances magically equal to the desired 33.

## A rather magical 8 x 8 square

Benjamin Franklin (1706-1790), statesman, inventor, scientist, philosopher, economist, printer, and musician (he played the guitar, the harp, and the viola da gamba, and for good measure invented his own glass armonica), had the talent to be creatively inventive when he was bored. He must have been a clever doodler and droodler and riddler. Once upon a time he constructed a rather beautiful 8 x 8 square, with lots of sums equal to 260. As he rather modestly writes in his autobiography (published 1793), "I was at length tired with sitting there to hear debates, in which, as clerk, I could take no part, and which were often so unentertaining that I was induc'd to amuse myself with making magic squares or circles".

I'm not comparing myself to Franklin (I can't play the viola da gamba, as yet, unfortunately), but the approaches exhibited in this blog post, and in my accompanying FocuStat blog post on how to solve sudoku puzzles via probability models and Markov chains, lend themselves to clever searching for such 8 x 8 tables, with any set of desired properties. Here's an example, where the order is to find a magical 8 x 8 table, filled up with the numbers 1 to 64, with all horizontal and vertical and diagonal sums equal to 260, while also demanding that the sums of the five most important 4 x 4 blocks (the four corner blocks and the central one) equal 520. I do this by setting up a probability model $(1/k)\exp\{-\lambda Q(x)\}$, with $Q(x)=R(x)+C(x)+D(x)+B(x)$, with proper score functions for rows, columns, diagonals, and blocks, and then sampling away until I spot a square with $Q(x)=0$. This is what I chanced to find, after half a million or so random iterations:

$$\pmatrix{ 27 & 50 & 6 & 48 & 43 & 39 & 46 & 1 \cr 3 & 56 & 60 & 8 & 14 & 49 & 61 & 9\cr 11 & 28 & 45 & 53 & 24 & 40 & 25 & 34\cr 35 & 42 & 7 & 41 & 16 & 36 & 19 & 64\cr 22 & 17 & 44 & 26 & 54 & 13 & 21 & 63\cr 59 & 5 & 33 & 57 & 29 & 2 & 37 & 38\cr 52 & 32 & 55 & 12 & 62 & 23 & 4 & 20\cr 51 & 30 & 10 & 15 & 18 & 58 & 47 & 31\cr}$$

Here all horizontal, vertical, and diagonal sums are equal to 260; also, the sum for each of the four outer 4 x 4 blocks, and also for the central 4 x 4 block, is equal to 520.
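These claims are easy to verify mechanically; here is a Python-style check of the square above (the helper name `block_sum` is mine):

```python
FRANKLIN8 = [
    [27, 50,  6, 48, 43, 39, 46,  1],
    [ 3, 56, 60,  8, 14, 49, 61,  9],
    [11, 28, 45, 53, 24, 40, 25, 34],
    [35, 42,  7, 41, 16, 36, 19, 64],
    [22, 17, 44, 26, 54, 13, 21, 63],
    [59,  5, 33, 57, 29,  2, 37, 38],
    [52, 32, 55, 12, 62, 23,  4, 20],
    [51, 30, 10, 15, 18, 58, 47, 31],
]

def block_sum(x, i0, j0, size):
    """Sum of the size x size block with upper-left corner (i0, j0)."""
    return sum(x[i][j] for i in range(i0, i0 + size)
               for j in range(j0, j0 + size))

rows_ok  = all(sum(row) == 260 for row in FRANKLIN8)
cols_ok  = all(sum(col) == 260 for col in zip(*FRANKLIN8))
diags_ok = (sum(FRANKLIN8[i][i] for i in range(8)) == 260
            and sum(FRANKLIN8[i][7 - i] for i in range(8)) == 260)
# the four outer 4 x 4 blocks, plus the central one
blocks_ok = all(block_sum(FRANKLIN8, i, j, 4) == 520
                for (i, j) in [(0, 0), (0, 4), (4, 0), (4, 4), (2, 2)])
print(rows_ok, cols_ok, diags_ok, blocks_ok)   # prints: True True True True
```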

## ... and a very magical 10 x 10 square

To flex my magical muscles, which I didn't know I had until I stumbled upon the Gaudí square last week and started thinking about $\exp\{-\lambda Q(x)\}$ models and R-fiddling with Markov chains, and then discovered to my mild surprise that even I could solve a sudoku puzzle, I include one more illustration. Now I wish to construct a magical-looking 10 x 10 square, using the natural numbers 1 to 100. If all its horizontal and vertical sums are to be equal, then that magic number needs to be 10 times the average of all the numbers in the square, i.e.

$${\rm magic}=10\cdot{1\over 100}(1+2+3+\cdots+100) =10\cdot{1\over 100}\cdot{100\cdot101\over 2}=505,$$

which the eight-year-old Carl Friedrich Gauß would've been able to tell you in 1785. I wish to find a 10 x 10 square such that (a) each row and each column sums to this number, (b) the two diagonals do too, and (c) the same holds for each of the twenty 2 x 5 blocks, like $x[1\!:\!2,1\!:\!5]$ for the first two rows and the first five columns, etc. So, similarly to the construction above, I build the probability model

\eqalign{ f(x) &=(1/k)\exp\{-\lambda Q(x)\} \cr &=(1/k)\exp\bigl[ -\lambda\{R(x)+C(x)+D(x)+B(x)\} \bigr], \cr}

with appropriate row score function $R(x)$, column score function $C(x)$, diagonal score function $D(x)$, and block score function $B(x)$. These I construct such that all are zero if the 10 x 10 square is perfect; they are low if $x$ is doing well; and one or more of them become big if $x$ is far from being magical. Specifically, I let $R(x)=\sum_{i=1}^{10} r_i(x)$, with row functions $r_i(x)=|\sum_{j=1}^{10}x[i,j]-505|$, and similarly for $C(x)$ with column functions $c_j(x)$, and for $D(x)$ the sum of the two diagonal score functions; finally, $B(x)$ is the sum of the twenty block-controlling functions, like $|\sum_{i=1}^{2}\sum_{j=1}^{5} x[i,j]-505|$.

It takes a while for the resulting $Q(x)$ chain to hit zero, and I needed to ask for a full million random iterations, taking my laptop around ten minutes (with no effort put into streamlining my R code). But it works, as shown in the figure: I needed a long chain for this one, but before a full million steps were completed the $Q(x)$ hit zero, after which I could read off a magical 10 x 10 matrix where all sums of rows, of columns, of diagonals, and also of the twenty subblocks of size 2 x 5, were equal to the magic number 505.

The 10 x 10 square I found in this fashion is this one:

$$\pmatrix{ 36 & 39 & 48 & 91 & 64 & 9 & 49 & 34 & 52 & 83 \cr 92 & 17 & 4 & 42 & 72 & 94 & 57 & 28 & 86 & 13\cr 88 & 21 & 85 & 38 & 51 & 54 & 61 & 65 & 32 & 10\cr 79 & 73 & 37 & 18 & 15 & 90 & 23 & 87 & 80 & 3\cr 19 & 8 & 63 & 71 & 99 & 5 & 40 & 69 & 31 & 100\cr 68 & 35 & 96 & 45 & 1 & 84 & 11 & 25 & 74 & 66\cr 33 & 82 & 44 & 95 & 67 & 22 & 53 & 27 & 12 & 70\cr 2 & 55 & 62 & 59 & 6 & 47 & 58 & 43 & 98 & 75\cr 81 & 78 & 46 & 30 & 41 & 76 & 60 & 50 & 14 & 29\cr 7 & 97 & 20 & 16 & 89 & 24 & 93 & 77 & 26 & 56\cr}$$

You may check that $R(x)=0$ (ten rows ok), $C(x)=0$ (ten columns ok), $D(x)=0$ (two diagonals ok), and $B(x)=0$ (twenty blocks of size 2 x 5 ok).
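Such checks can be run mechanically in the same style as before; here is a sketch that verifies the rows, columns, diagonals, and the ten 2 x 5 blocks obtained by pairing rows (1,2), (3,4), ... with the left and right column halves (one natural subset of the twenty blocks; the remaining checks run the same way):

```python
M10 = [
    [36, 39, 48, 91, 64,  9, 49, 34, 52, 83],
    [92, 17,  4, 42, 72, 94, 57, 28, 86, 13],
    [88, 21, 85, 38, 51, 54, 61, 65, 32, 10],
    [79, 73, 37, 18, 15, 90, 23, 87, 80,  3],
    [19,  8, 63, 71, 99,  5, 40, 69, 31, 100],
    [68, 35, 96, 45,  1, 84, 11, 25, 74, 66],
    [33, 82, 44, 95, 67, 22, 53, 27, 12, 70],
    [ 2, 55, 62, 59,  6, 47, 58, 43, 98, 75],
    [81, 78, 46, 30, 41, 76, 60, 50, 14, 29],
    [ 7, 97, 20, 16, 89, 24, 93, 77, 26, 56],
]

# the square uses each of the natural numbers 1 to 100 exactly once
assert sorted(v for row in M10 for v in row) == list(range(1, 101))
assert all(sum(row) == 505 for row in M10)              # ten rows
assert all(sum(col) == 505 for col in zip(*M10))        # ten columns
assert sum(M10[i][i] for i in range(10)) == 505         # main diagonal
assert sum(M10[i][9 - i] for i in range(10)) == 505     # anti-diagonal
# 2 x 5 blocks: row pairs (1,2), (3,4), ... against each column half
assert all(sum(M10[i][j] + M10[i + 1][j] for j in range(j0, j0 + 5)) == 505
           for i in range(0, 10, 2) for j0 in (0, 5))
print("all checks pass")
```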

## A few remarks

A. A.A. Markov (1856-1922) invented Markov chains in about 1906, and the world's first serious data analysis using Markov chains was carried out by Markov himself, in 1913, where he amazingly went through the first 20,000 letters of Pushkin's epic poem Евгений Онегин (Eugene Onegin), keeping track of how vowels and consonants followed consonants and vowels, the point being to demonstrate clearly that the vowel-consonant types of consecutive letters are not statistically independent. See more discussion on this in N.L. Hjort and C. Varin's paper ML, PL, QL for Markov chain models (Scandinavian Journal of Statistics, 2008), where we also fit longer-memory Markov chain models to the Pushkin poem. So Markov's sample space, for that application, had two elements. I imagine that he would have been delighted to see how I can use his chains, in enormously big sample spaces, to read off solutions to highly complicated puzzles, like sudoku and magic squares.

*A.A. Markov, age 30 (photo from Wikipedia).*

B. The methods exhibited here, building probability models favouring the right types of outcomes and then constructing Markov chains to sample long lists of realisations, until a solution is found, can be used for various other problems. I have written up a companion FocuStat blog post illustrating this, Sudoku Solving by Probability Models and Markov Chains. Yet other applications include the travelling salesman problem, where a setup like in the present blog post may work well, in the sense that decently well-working travel routes will be found, though without the mathematical guarantee that the absolute winner has been found. This is different for both the sudoku puzzles and the magical squares problems, since there we know when the Markov chain has found a solution.

C. What I've called Gaudí squares above, in admiration for the main architect of the wonder-instilling Sagrada Família, should perhaps rather be called Subirachs squares, since Josep Maria Subirachs i Sitjar (1927-2014) was the sculptor who made that particular detail. Its inspiration is clearly Albrecht Dürer's magical 4 x 4 square in his Melencolia I from 1514; see Hjort (2018).

*Dürer's Melencolia I from 1514, with the magical 4 x 4 square using all the numbers 1 to 16, and with magic number 34, rather than Subirachs's 33.*

The Dürer square

$$\pmatrix{ 16 & 3 & 2 &13 \cr 5 &10 &11 & 8\cr 9 & 6 & 7 &12\cr 4 &15 &14 & 1\cr}$$

uses all the natural numbers 1 to 16, and therefore has magic number 34, not 33, and we see that all horizontal, vertical, and diagonal sums, and even various 2 x 2 block sums, are equal to that number. The Dürer square has inspired not merely Subirachs and Gaudí, but also the symbol-minded thriller writer Dan Brown. The following exchange is from his 2009 thriller The Lost Symbol:

"Thirty-four", she said. "Every direction adds up to thirty-four."

"Exactly", Langdon said. "But did you know that this magic square is famous because Dürer accomplished the seemingly impossible?" He quickly showed Katherine that in addition to making the rows, columns, and diagonals add up to thirty-four, Dürer had also found a way to make the four quadrants, the four center squares, and even the four corner squares add up to that number. "Most amazing, though, was Dürer's ability to position the numbers 15 and 14 together in the bottom row as an indication of the year in which he accomplished this incredible feat!"

D. My methods, via specially designed probability models and Markov chains, may work well in finding solutions to well-posed high-dimensional combinatorial puzzles, like sudoku and magical squares. Applying the methods does not tell me how many solutions there are, however (well, the sudoku puzzles are constructed to have precisely one solution). It is actually very hard to count the number of solutions for magical squares, and apparently nobody knows, not even the mathematicians, how many $n\times n$ magical squares there are, for $n\ge6$. Apparently, there are 310 possible solutions to the Gaudí 33-puzzle.

E. Applying my methods above I used a positive fine-tuning $\lambda$ parameter in the $\exp\{-\lambda Q(x)\}$ models, to favour small outcomes of $Q(x)$. I can seriously turn the tables, however, by making $\lambda$ negative, which then suddenly favours the bigger $Q(x)$, very far away from the magical squares. Here's what I might get in this fashion, by rolling the chain for a while, getting the most non-magical Gaudí squares:

$$\pmatrix{ 11 & 2 & 1 & 10 \cr 9 & 4 & 3 & 5 \cr 14 & 10 & 8 & 15 \cr 13 & 7 & 6 & 14 \cr}$$

You may check that all the required sums are now far cries from the Jesus number 33.

F. This FocuStat blog post, along with my sudoku solution finder blog post, is more probabilistic than statistical in nature. I have not constructed or fitted models to data, but invented probability models to help me search for the mini-needle in the tera-haystack. Yet, as mentioned also in the accompanying sudoku blog post, there are important application areas, with complex and massive data, like images and texts, where models of the type

$$f(x)=(1/k)\exp\{-\lambda^{\rm t}Q(x)\} =(1/k)\exp\Bigl\{ -\sum_{j=1}^p \lambda_jQ_j(x) \Bigr\},$$

are fruitful, typically with a multidimensional $Q(x)=(Q_1(x),\ldots,Q_p(x))$ and hence a vector parameter $\lambda=(\lambda_1,\ldots,\lambda_p)$. Given photos of nine thousand women, or dolphins, models of this type could be built, with specially designed $Q_j(x)$ functions, in order to look for differences, or signs of homogeneity, or to analyse the levels of variation. Such applications would then involve estimating and assessing the $\lambda_j$ parameters, and tackling those issues would again involve running Markov chains.

## A few references

Bayer, C. (2019). Planetary Magical Squares. Learn Religions.

Brown, D. (2009). The Lost Symbol. Doubleday, New York.

Claeskens, G. and Hjort, N.L. (2008). Model Selection and Model Averaging. Cambridge University Press, Cambridge. (We do model comparison and selection among Markov chains of different orders, for the Pushkin Onegin and for other poems. This can be pursued further.)

Hayes, B. (2013). First links in the Markov chain: Probability and poetry were unlikely partners in the creation of a computational tool. American Scientist, 101, 92-99.

Hjort, N.L. (2019). Sudoku Solving by Probability Models and Markov Chains. FocuStat Blog Post, May 2019.

Hjort, S.N.U. (2018). Mellom milt og Saturn. Speilvendt, no. 2, special issue on Taperne (the Losers), 35-40.

Hjort, N.L. and Varin, C. (2008). ML, PL, QL for Markov chain models. Scandinavian Journal of Statistics, 35, 64-82.

Markov, A.A. (1913). Пример статистического исследования над текстом «Евгения Онегина», иллюстрирующий связь испытаний в цепь [An example of a statistical investigation of the text of Evgenii Onegin, illustrating the linking of trials into a chain]. Известия Императорской академии наук [Bulletin of the Imperial Academy of Sciences], Series 6, St. Petersburg. An English translation is available.

Pickover, C.A. (2002). The Zen of Magic Squares, Circles, and Stars: An Exhibition of Surprising Structures across Dimensions. Princeton University Press, Princeton.

Schweder, T. and Hjort, N.L. (2016). Confidence, Likelihood, Probability. Cambridge University Press, Cambridge. (We do Markov chain analyses in Chapter 4, including confidence curves for crucial Markovian parameters of different orders, for the Onegin text.)

I thank Céline Cunen and Sam-Erik Walker for valuable comments on some of the aspects taken up in this blog post.

By Nils Lid Hjort

Published May 27, 2019; last modified Sep. 30, 2019.