Reason why

The editor invited me to comment on a paper from the early 1980s (Koenderink 1984a). I checked Google Scholar [Footnote 1] today (February 10, 2021): the paper had 3 618 citations since 1984. The current rate is about \(50/\mathrm{annum}\). The paper was hardly cited before the 1990s. Citations peaked around 2010 [Footnote 2]. I’ll sketch why I think that is. I freely quote from my own past, because these references sketch the intellectual context.

1 Prehistory

What sparked me off? It was my interests in human awareness, neurophysiology, geometry, philosophy of mind and the visual arts. I’ve always been struck by the scale invariance of Leibniz’s monads (Leibniz 1991). As for the sciences, the well-known “Powers of Ten” movie (Eames and Eames 1977) fired my imagination and a remark by Friedrich Nietzsche alerted me to Boscovich (1762). The problem of the continuum (through Franz Brentano (Brentano 1988; Koenderink et al. 2017a) [Footnote 3]) kept me awake. I saw similarities between the neurophysiology of Lotze’s (1884) local sign (Koenderink 1984b, c) (via a genial remark [Footnote 4] by Helmholtz (1884)) and Čech cohomology (Čech 1932). In the visual arts I was fascinated by John Ruskin’s “mystery” of distant details. He drew the first “scale space” I’ve ever seen (Ruskin 1857).

The academic problems from psychophysics required novel neurophysiological models. Such needs also arose in computer (image) science. At some point this sparked me off.

Note the serendipity. We only see the rivulets but are blind to the stream (science!).

Mysterious data

During the 1970s and 1980s I worked on extensive perimetric studies of visual abilities such as spatiotemporal contrast luminance and hue detection, movement, etc. (Koenderink et al. 1978; van de Grind et al. 1983; van Esch et al. 1984). In retrospect this huge corpus was mostly ignored. My main satisfaction is that there are many “facts” in recent textbooks that I know to be mistaken [Footnote 5]. At least I know.

These data were scale independent (Koenderink and van Doorn 1978; Bijl et al. 1989), which physiological models did not predict. This led to self-similar models of the visual system that accounted for the bulk of the data (Koenderink and van Doorn 1982a). Although ignored, they survive in the scale space paradigm.

Images as geometric data structures

In the early 1980s I was in the physics department of Utrecht University, the Netherlands. Forced to find funds elsewhere, I ended up doing odd jobs for the American Bureau of Standards and the American Air Force.

The Air Force asked me to report on various laboratories all over the US. At Azriel Rosenfeld’s lab (University of Maryland) I got a feeling of how important image science potentially was. That’s when I started to think about image structure as an algorithmic problem.

In the years soon after, my funds came from European ESPRIT projects. My academic interest rendered me a Fremdkörper in the computer science community. They deployed powerful Sun workstations, whereas I ran an Atari 1000 toy. However, I had Marty Veltman’s (at my department) schoonschip (Veltman and Williams 1993) [Footnote 6], so I had some formal muscle.

About that time I—having met René Thom (1972) and following tutorials by Michael Berry (1992)—acquired an interest in catastrophe theory. I worked on singularities of optical projections (Koenderink and van Doorn 1979b, 1982b; Koenderink 1990a). This turned out to be crucial in the development of scale space theory.

The 1984 paper

Sufficient motivation soon yields ideas. I was aware of the “pyramid” data structures (Burt and Adelson 1983; Crowley and Sanderson 1987) through Rosenfeld’s lab and I fully understood the problem of “spurious resolution” (Strasburger 2018) from my interest in photography and the visual arts [Footnote 7].

Hardly surprising that I hit upon the Gaussian kernel as special. Many others did too, like Andy Witkin (1983) whom I visited at Palo Alto in 1982. From my perspective, they had irrelevant reasons.

They failed to grasp the key concept. It is the diffusion equation \(\varDelta \varPhi =\varPhi _t\), where \(\varPhi (x,y,t)\) is image intensity, \(\{x,y\}\) are Cartesian coordinates in the image plane, and \(t\) is a scale parameter [Footnote 8].
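The link between this equation and the Gaussian kernel can be checked numerically. The following is a minimal sketch, not taken from the 1984 paper: it assumes the standard fundamental solution of the planar diffusion equation, \(G(x,y,t)=\exp (-(x^2+y^2)/4t)/(4\pi t)\), a Gaussian with variance \(2t\) along each axis, and uses central differences to estimate \(\varDelta G-G_t\) at one point. The function names and the step size `h` are illustrative choices.

```python
import numpy as np

def gaussian_kernel(x, y, t):
    """Gaussian with variance 2t per axis: candidate solution of Laplacian(Phi) = Phi_t."""
    return np.exp(-(x**2 + y**2) / (4.0 * t)) / (4.0 * np.pi * t)

def diffusion_residual(x, y, t, h=1e-3):
    """Central-difference estimate of Laplacian(G) - dG/dt at the point (x, y, t)."""
    g = gaussian_kernel
    laplacian = (g(x + h, y, t) + g(x - h, y, t)
                 + g(x, y + h, t) + g(x, y - h, t)
                 - 4.0 * g(x, y, t)) / h**2
    dg_dt = (g(x, y, t + h) - g(x, y, t - h)) / (2.0 * h)
    return laplacian - dg_dt

# The residual should vanish up to the discretization error of the differences.
residual = diffusion_residual(0.5, -0.3, 1.0)
print(f"residual = {residual:.2e}")
```

The residual is limited only by the finite-difference error, which suggests that blurring with a Gaussian of variance \(2t\) is the same operation as running the diffusion equation up to scale \(t\).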

I’m often asked why I “buried” the 1984 paper in this journal. I knew its founder, professor Werner Reichardt (1924–1992), quite well and my interests fitted his journal more naturally than the “expected” computer science journals. But no doubt the bulk of the citations are from the latter field.

The aftermath

The theory became known as “scale space theory” [Footnote 9]. It is a standard tool. In medical image processing it is an indispensable part of diagnostic methods. This was important to me, as I was a professor in both physics and the medical faculty during the transition period from silver-based X-ray emulsions to electronic sensors and data storage.

Applications range from the microscopic scale (Midoh et al. 2007) to the cosmic (Schmalzing 1997).

Scale space became one of my “potboilers.” Apart from minor pulp science, there sprouted various diverging threads of academic pursuits (Sect. 3).

2 Formalization of the concept

For a classical physicist it is entirely obvious that diffusion cannot generate, but only destroy spatial articulations. It is easy enough to prove that from the diffusion equation (Sect. 1).
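A one-dimensional illustration of this claim, as a hedged sketch rather than the argument of the paper: the signal is smoothed repeatedly with the binomial kernel (1/4, 1/2, 1/4), which corresponds to one explicit diffusion step of size \(t=1/4\) on a unit grid, and the local extrema are counted after each step. The kernel, the periodic boundary, the random test signal and the function names are all assumptions made for the example.

```python
import numpy as np

def diffuse_step(u):
    """One smoothing step with the binomial kernel (1/4, 1/2, 1/4), periodic boundary."""
    return 0.25 * np.roll(u, 1) + 0.5 * u + 0.25 * np.roll(u, -1)

def count_extrema(u):
    """Count local extrema of a periodic signal as sign changes of its differences."""
    d = np.roll(u, -1) - u
    s = np.sign(d[d != 0])          # ignore exactly flat steps
    return int(np.sum(s != np.roll(s, 1)))

rng = np.random.default_rng(0)
u = rng.standard_normal(256)        # noisy test signal (illustrative)

counts = []
for _ in range(200):
    counts.append(count_extrema(u))
    u = diffuse_step(u)

print("extrema at first and last step:", counts[0], counts[-1])
```

In this sketch the count can only stay the same or drop from one step to the next: extrema annihilate in pairs, and none are created.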

I proceeded to capture the essential concepts as a set of simple axioms from which the diffusion equation follows. This is desirable because the axioms can readily be applied in phenomenological models of psychogenesis, as well as models of neural receptive field structures. These topics were of greater interest to me than computer image processing.

The diffusion equation serves to connect scale levels. One may define a vector field whose streamlines capture such connections. It is like the pointers in discrete image pyramids. The streamlines let one track details over finite scale ranges.

The catastrophe notion is crucial. It lets one handle bifurcations of the streamlines. This captures the global causal structure [Footnote 10]. It enables a discrete (topological) description of “deep” image structure (Koenderink and van Doorn 1986, 1987) and thus symbolical filtering.

Another crucial aspect is that the diffusion equation is a linear pde. So scale space applies to arbitrary partial derivatives (Koenderink and van Doorn 1988). This suggested a principled taxonomy of receptive fields.
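Linearity implies that differentiation and blurring commute, so a derivative of the blurred image equals the blurred derivative. The sketch below checks this on a discrete grid. It assumes periodic boundaries, a separable binomial kernel as a stand-in for a small diffusion step, and a central difference as the derivative; none of these choices come from the paper.

```python
import numpy as np

def blur(img):
    """Separable binomial (1/4, 1/2, 1/4) smoothing along both axes, periodic boundary."""
    for axis in (0, 1):
        img = (0.25 * np.roll(img, 1, axis=axis) + 0.5 * img
               + 0.25 * np.roll(img, -1, axis=axis))
    return img

def d_dx(img):
    """Central-difference derivative along axis 1 (the x direction), periodic boundary."""
    return 0.5 * (np.roll(img, -1, axis=1) - np.roll(img, 1, axis=1))

rng = np.random.default_rng(1)
image = rng.standard_normal((64, 64))   # random test image (illustrative)

a = d_dx(blur(image))   # differentiate the blurred image
b = blur(d_dx(image))   # blur the differentiated image
print("largest discrepancy:", np.max(np.abs(a - b)))
```

The two results agree up to floating-point rounding, because both operators are linear and shift invariant. In the continuous setting the same argument gives the family of Gaussian derivative kernels.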

One may follow the evolution of differential invariants over scale (Koenderink and Richards 1988; Koenderink 1993). Images are trivial fiber bundles (Koenderink and van Doorn 2012). The important differential invariants are like geographical objects such as ruts, ridges, peaks, pits and passes (Koenderink and van Doorn 1979a, 1992, 1994). This yields topological, symbolical description. It suggested how cortical circuits might embody differential geometry and calculus [Footnote 11].

Most of this is discussed in the 1984 paper. In the remainder of the 1980s and most of the ’90s these structures were studied in detail and were generalized in various directions. Some went well beyond my horizon.

Certain developments went “too far” from my perspective. If one prunes the development tree accordingly, “scale space theory” proper already reached its mature form in the ’90s.

Not that developments beyond scale space proper are not interesting. Some (like Perona and Malik 1990) are elegant and useful. But in retrospect one now has a fairly complete overview of the lay of the land.

3 Diverging paths

Various people formulated more refined and mathematically elegant accounts than I managed in 1984 (Florack 1997; Griffin 2019). Others wrote textbooks (Lindeberg 1994; Haar Romeny 2003) that have been instrumental in the acceptance of scale space methods.

This is not a review. I am an outsider today. Nobody should feel offended when I ignore a favorite.

An alternative route to scale space is by way of the Hermite transform. This leads to a valuable extension of the formalism (Martens 2006).

Of course, the theory has been applied to various finite dimensions. Non-trivial are extensions to the temporal domain, because of its causal structure (Koenderink 1988; Lindeberg 2013).

An obvious extension is to consider local histograms instead of mere image intensities. The histograms can be taken over regions with a diameter given by the scale. One has a local disarray instead of blurring (Koenderink and van Doorn 1999, 2000; Koenderink et al. 2012). This opens up novel perspectives. One application is the perturbation of images for vision research (Koenderink et al. 2017b) and artistic purposes.

There are endless ways to tune or generalize the axioms, to think of special cases that would imply different families of kernels, to consider various ways to complicate the simple diffusion equation, to apply the formalism to other domains, and so forth. All this—if possible more!—has been done. Perhaps it will be remembered as a cottage industry in theoretical image science for about three decades.

More relevant are implementations of scale space in discrete image processing algorithms. Not a simple matter, but crucial in applications (Lindeberg 1994; Haar Romeny 2003).

The brunt of the task has been completed.

4 The future

Scale space [Footnote 12] is a tool, like the carpenter’s hammer and nails. It is hidden in software packages. Conceptual developments will be breakthroughs, because unexpected. But who knows?

I am interested in psychogenesis, Gestalt creation (Koenderink 2011, 2015; Koenderink et al. 2017c) and models of computational brain structures (Koenderink and van Doorn 1990; Koenderink 1990b; Koenderink et al. 2016, 2017a; Koenderink and van Doorn 2018). That was my main drive in ’84.

It is a humbling fact that progress has been less than spectacular.