Hierarchical Brain

An explanation of the human brain

First published 1st February 2024. This is version 1.5 published 2nd March 2024.
Three pages are not yet published: sleep, memory and an index.
Copyright © 2024 Email info@hierarchicalbrain.com

Warning - the conclusions of this website may be disturbing for some people without a stable mental disposition or with a religious conviction.


Prediction is a term used at several levels of my hierarchy of levels of description to mean the use of stored information to facilitate future processes. The stored information is broadly what can be called memory, but the nature of the stored information differs between levels, so the mechanisms by which prediction uses it also differ.

Although prediction sits with memory in my hierarchy as a brain-wide function, and the two are closely connected, there is one major difference: the process of memory is part of my model of my world and I therefore have an innate (although often incorrect) understanding of it, whereas prediction is a process that I am generally unaware of and that is not modelled except as part of the process of perception. Even though I am not normally aware of the powerful assistance that prediction provides, it is crucial for many of my everyday activities, such as walking or picking up an object, and is most likely a fundamental part of how I perceive and interact with the world and my body.

Contents of this page
Overview - an overview of my proposals related to prediction.
Introduction to the science - an introduction to the science of prediction in the brain.
The science of prediction before Predictive Processing - a brief history of prediction before Predictive Processing and prerequisites for it.
Predictive Processing - an overview of the theory of Predictive Processing and the Free Energy Principle.
Comments on Predictive Processing and the Free Energy Principle - how my proposals fit with these.
Details - further details of my proposals on prediction.
References - references and footnotes.


An introduction to the science

The science of prediction before Predictive Processing

Predictive Processing

Comments on Predictive Processing and the Free Energy Principle

Further details of my proposals on prediction.

References

For information on references, see structure of this website - references

  1. ^ There are over 4000 papers currently listed in Google Scholar that use the phrase 'Predictive brain', but fewer than 100 of these are from before the year 2000.
  2. ^ Examples of whole books on the subject of prediction in the brain:
    The Predictive Mind - Jakob Hohwy 2013
    Surfing Uncertainty: Prediction, Action, and the Embodied Mind - Andy Clark 2016
    Both of these authors are philosophers rather than neuroscientists.
  3. ^ Surfing Uncertainty - Prediction, Action and the Embodied Mind - Clark 2016 Oxford University Press
    doi: 10.1093/acprof:oso/9780190217013.001.0001
    There are five occurrences of the phrase “probabilistic prediction machine” in the book. For example, page 53, start of chapter 2 “Adjusting the Volume (Noise, Signal, Attention)”, under the heading “2.1 Signal Spotting”: “...the on-board probabilistic prediction machine that underpins our contact with the world.”
    Page 57, under the heading “2.3 The Delicate Dance between Top-Down and Bottom-Up”: “Driving fast along an unfamiliar winding mountain road, we need to let sensory input take the lead. How is a probabilistic prediction machine to cope?”
  4. ^ Ibid. Surfing Uncertainty - Prediction, Action and the Embodied Mind
    Pages 3-4 in Introduction: “For to be able to predict the play of sensory data at multiple spatial and temporal scales just is, or so I shall argue, to encounter the world as a locus of meaning. It is to encounter, in perception, action, and imagination, a world that is structured, populated by organism-salient distal causes, and prone to evolve in certain ways. Perception, understanding, action and imagination, if PP [Predictive Processing] is correct, are constantly co-constructed courtesy of our ongoing attempts at guessing the sensory signal. That guessing ploy is of profound importance. It provides the common currency that binds perception, action, emotion, and the exploitation of environmental structure into a functional whole. In contemporary cognitive scientific parlance, this ploy turns upon the acquisition and deployment of a 'multilayer probabilistic generative model'.”
  5. ^ Response to the Edge.org question What do you consider the most interesting recent [scientific] news? What makes it important? - Lisa Feldman Barrett 2015
    Opening sentence: “Your brain is predictive, not reactive.”
  6. ^ How emotions are made - The secret life of the brain - Lisa Feldman Barrett 2017 Pan Books (UK) or see GoogleScholar.
    This book is by the same author who said that the brain is “predictive, not reactive” (see reference above).
    In the chapter entitled “How the brain makes emotions”, page 113, third paragraph:
    “The infant brain is missing most of the concepts that we have as adults. ... Not surprisingly, the infant brain does not predict well. A grown-up brain is dominated by prediction, but an infant brain is awash in prediction error. So babies must learn about the world from sensory input before their brains can model the world. This learning is a primary task of the infant brain. At first, much of the onslaught of sensory input is new to an infant’s brain, and its significance is undetermined, so little will be ignored. ... Infants absorb the sensory input around them and learn, learn, learn. The developmental psychologist Alison Gopnik describes babies as having a 'lantern' of attention that is exquisitely bright but diffuse. In contrast, your adult brain has a network to shut out information that might sidetrack your predictions, allowing you to do things like read this book without distraction. You have a built-in 'spotlight' of attention that illuminates some things, such as these words, while leaving other things in the dark. The infant brain’s 'lantern' cannot focus in this manner. As the months pass, if everything is working properly, the infant brain begins to predict more effectively. Sensations from the outside world have become concepts in the infant’s model of the world.”
  7. ^ Being You - A new science of consciousness - Anil Seth Faber & Faber London 2021
    Page 80, third paragraph, in the chapter entitled “Perceiving from the inside out”: “The first glimmers of a top-down theory of perception emerge in ancient Greece, with Plato’s Allegory of the Cave. Prisoners, chained and facing a blank wall all their lives, see only a play of shadows cast by objects passing by a fire behind them, and they give the shadows names, because for them the shadows are what is real. The allegory is that our own conscious perceptions are just like these shadows, indirect reflections of hidden causes that we can never directly encounter.”
  8. ^ Ibid. Being You - A new science of consciousness
    Page 81, second paragraph: “Helmholtz proposed the idea of perception as a process of 'unconscious inference'. The contents of perception, he argued, are not given by sensory signals themselves but have to be inferred by combining these signals with the brain’s expectations or beliefs about their causes. In calling this process unconscious, Helmholtz understood that we are not aware of the mechanisms by which perceptual inferences happen, only of the results.”
  9. ^ Ibid. Being You - A new science of consciousness
    Page 107, second paragraph in chapter 5 entitled “The Wizard of Odds”: “By minimising prediction errors everywhere and all the time, it turns out that the brain is actually implementing Bayes’ rule. More precisely, it is approximating Bayes’ rule.”
  10. ^ Treatise on Physiological Optics, Volume III - Hermann von Helmholtz 1867, translated from German by James P. C. Southall 1925
    downloadable here.
    Page 4, in the chapter headed “Concerning the Perceptions in General”: “...activities that lead us to infer that there in front of us at a certain place there is a certain object of a certain character, are generally not conscious activities, but unconscious ones. In their result they are equivalent to a conclusion, to the extent that the observed action on our senses enables us to form an idea as to the possible cause of this action; although, as a matter of fact, it is invariably simply the nervous stimulations that are perceived directly, that is, the actions, but never the external objects themselves.”
  11. ^ Ibid. Treatise on Physiological Optics, Volume III
    Page 23: “The idea of a single individual table which I carry in my mind is correct and exact, provided I can deduce from it correctly the precise sensations I shall have when my eye and my hand are brought into this or that definite relation with respect to the table. Any other sort of similarity between such an idea and the body about which the idea exists, I do not know how to conceive. One is the mental symbol of the other.”
  12. ^ Principles of Neural Science - Sixth edition - Kandel et al. McGraw-Hill US 2021 - or see GoogleScholar.
    Page 721, in chapter 30 “Principles of Sensorimotor Control” under the heading “Estimation of the Body’s Current State Relies on Sensory and Motor Signals”: “The concept of motor prediction was first considered by Helmholtz when trying to understand how we localize visual objects. To calculate the location of an object relative to the head, the central nervous system must know both the retinal location of the object and the gaze direction of the eye. Helmholtz’s ingenious suggestion was that the brain, rather than sensing the gaze direction, predicted it based on a copy of the motor command to the eye muscles. Helmholtz used a simple experiment on himself to demonstrate this. If you move your own eye without using the eye muscles (cover one eye and gently press with your finger on your open eye through the eyelid), the retinal locations of visual objects change. Because the motor command to the eye muscles is required to update the estimate of the eye’s state, the predicted eye position is not updated. However, because the retinal image has changed, this leads to the false percept that the world must have moved.”
    I have not yet managed to locate the source of this text in the work of Helmholtz.
  13. ^ Perceptual illusions and brain models - Gregory 1968
    doi: 10.1098/rspb.1968.0071 downloadable here or see GoogleScholar.
    (All papers of Richard Gregory are available at Richard Gregory - papers)
    Page 6, from sixth paragraph of left-hand column: “Perception seems, then, to be a matter of 'looking up' stored information of objects, and how they behave in various situations. Such systems have great advantages. ... Systems which control their output directly from currently available input information have serious limitations. In biological terms, these would be essentially reflex systems. Some of the advantages of using input information to select stored data for controlling behaviour, in situations which are not unique to the system, are as follows:
    1. In typical situations they can achieve high performance with limited information transmission rate. It is estimated that human transmission rate is only about 15 bits/second. They gain results because perception of objects - which are redundant - requires identification of only certain key features of each object.
    2. They are essentially predictive. In typical circumstances, reaction-time is cut to zero.
    3. They can continue to function in the temporary absence of input; this increases reliability and allows trial selection of alternative inputs.
    4. They can function appropriately to object-characteristics which are not signalled directly to the sensory system. This is generally true of vision, for the image is trivial unless used to 'read' non-optical characteristics of objects.
    5. They give effective gain in signal/noise ratio, since not all aspects of the model have to be separately selected on the available data, when the model has redundancy. Provided the model is appropriate, very little input information can serve to give adequate perception and control.
    There is, however, one disadvantage of 'internal model' look-up systems, which appears inevitably when the selected stored data are out of date or otherwise inappropriate. We may with some confidence attribute perceptual illusions to selection of an inappropriate model, or to mis-scaling of the most appropriate available model.”
  14. ^ The Helmholtz Machine - Dayan, Hinton, Neal and Zemel 1994
    doi: 10.1162/neco.1995.7.5.889 downloadable here or see GoogleScholar.
    Beginning of introduction, page 1: “Following Helmholtz, we view the human perceptual system as a statistical inference engine whose function is to infer the probable causes of sensory input. We show that a device of this kind can learn how to perform these inferences without requiring a teacher to label each sensory input vector with its underlying causes.”
    And page 8, second paragraph: “The Helmholtz machine is closely related to other schemes for self-supervised learning that use feedback as well as feedforward weights. ...the Helmholtz machine treats self-supervised learning as a statistical problem - one of ascertaining a generative model which accurately captures the structure in the input examples.”
  15. ^ On Entropy, Information, and Conservation of Information - Cengel 2021
    doi: 10.3390/e23060779 downloadable here or see GoogleScholar.
    Start of abstract: “The term entropy is used in different meanings in different contexts, sometimes in contradictory ways, resulting in misunderstandings and confusion. The root cause of the problem is the close resemblance of the defining mathematical expressions of entropy in statistical thermodynamics and information in the communications field, also called entropy, differing only by a constant factor with the unit 'J/K' in thermodynamics and 'bits' in the information theory.”
  16. ^ Ibid. On Entropy, Information, and Conservation of Information
    In the section headed “4. Information and Entropy”, last paragraph of page 10, to page 11: “Information (or entropy) in physical sciences and in the communications field is proportional to the number of possible states or configurations N with non-zero probability. At a given time, the probability of any of the possible states of an equiprobable system is p = 1/N. These possible states may be reshuffled as time progresses. The larger the number of allowed states N is, the larger the information, the larger the uncertainty or the degrees of freedom to keep track of, and thus the larger what is not known. Therefore, ironically, information in physical and information sciences turns out to be a measure of ignorance, not a measure of knowledge...”
  17. ^ Physical Memoirs, Selected and Translated from Foreign Sources, Volume 1, Part 1 - Helmholtz 1882, published Taylor & Francis, 1888
    downloadable here or see GoogleScholar.
    In the second section starting on page 43 entitled “On the thermodynamics of Chemical Processes”, page 49 onwards entitled “Idea of Free Energy”, page 55 third paragraph: “For isothermal changes the function δ coincides, as we have seen, with the value of the potential energy for work-values convertible without limit. I propose therefore to style this quantity the 'free energy' of the system of bodies.”
  18. ^ Relating thermodynamics to information theory: the equality of free energy and mutual information - Feinstein 1986
    doi: 10.7907/XVQB-7902 downloadable here or see GoogleScholar.
    Fourth sentence of abstract, page iv: “Thermodynamic free energy measures the approach of the system toward equilibrium. Information theoretical mutual information measures the loss of memory of initial state. We regard the free energy and the mutual information as operators which map probability distributions over state space to real numbers.”
  19. ^ Autoencoders, minimum description length and Helmholtz free energy - Hinton and Zemel 1994
    downloadable here or see GoogleScholar.
    Last paragraph of discussion, page 10: “In this paper we have shown that an autoencoder network can learn factorial codes by using non-equilibrium Helmholtz free energy as an objective function. ... We anticipate that the general approach described here will be useful for a wide variety of complicated generative models. It may even be relevant for gradient descent learning in situations where the model is so complicated that it is seldom feasible to consider more than one or two of the innumerable ways in which the model could generate each observation.”
  20. ^ Whatever next? Predictive brains, situated agents, and the future of cognitive science - Andy Clark 2013
    doi: 10.1017/S0140525X12000477 downloadable here or see GoogleScholar.
    Pages 2-3: “Predictive coding itself was first developed as a data compression strategy in signal processing. Thus, consider a basic task such as image transmission: In most images, the value of one pixel regularly predicts the value of its nearest neighbors, with differences marking important features such as the boundaries between objects. That means that the code for a rich image can be compressed (for a properly informed receiver) by encoding only the 'unexpected' variation: the cases where the actual value departs from the predicted one. What needs to be transmitted is therefore just the difference (a.k.a. the 'prediction error') between the actual current signal and the predicted one. This affords major savings on bandwidth, an economy that was the driving force behind the development of the techniques by James Flanagan and others at Bell Labs during the 1950s. Descendents [sic] of this kind of compression technique are currently used in JPEGs, in various forms of lossless audio compression, and in motion-compressed coding for video.”
  21. ^ Ibid. Whatever next? Predictive brains, situated agents, and the future of cognitive science
    Beginning of abstract: “Brains, it has recently been argued, are essentially prediction machines. They are bundles of cells that support perception and action by constantly attempting to match incoming sensory inputs with top-down expectations or predictions. This is achieved using a hierarchical generative model that aims to minimize prediction error within a bidirectional cascade of cortical processing. Such accounts offer a unifying model of perception and action, illuminate the functional role of attention, and may neatly capture the special contribution of cortical processing to adaptive success.”
  22. ^ Ibid. Whatever next? Predictive brains, situated agents, and the future of cognitive science
    Note 5 on page 22: “In speaking of 'predictive processing' rather than resting with the more common usage 'predictive coding', I mean to highlight the fact that what distinguishes the target approaches is not simply the use of the data compression strategy known as predictive coding. Rather, it is the use of that strategy in the special context of hierarchical systems deploying probabilistic generative models. Such systems exhibit powerful forms of learning and are able flexibly to combine top-down and bottom-up flows of information within a multilayer cascade.”
  23. ^ How does the brain do plausible reasoning? - Jaynes 1988
    downloadable here or see GoogleScholar or see Google books
    Start of abstract: “We start from the observation that the human brain does plausible reasoning in a fairly definite way. It is shown that there is only a single set of rules for doing this which is consistent and in qualitative correspondence with common sense. These rules are simply the equations of probability theory, and they can be deduced without any reference to frequencies. We conclude that the method of maximum-entropy inference and the use of Bayes’ theorem are statistical techniques fully as valid as any based on the frequency interpretation of probability.”
    Page 15: “Shannon’s theorem 2 tells us that the consistent measure of the 'amount of uncertainty' in a probability distribution is its entropy, and therefore we must choose the distribution which has maximum entropy subject to the constraints. Any other distribution would represent an arbitrary assumption of some kind of information which was not given to us.”
    Unfortunately, the last page, containing the last four references for this paper, is missing from all sources I have found.
  24. ^ A Mathematical Theory of Communication - Shannon 1948
    doi: 10.1002/j.1538-7305.1948.tb01338.x downloadable here or see GoogleScholar.
    Page 11 concerning theorem 2: “The form of H will be recognized as that of entropy as defined in certain formulations of statistical mechanics... H is then, for example, the H in Boltzmann’s famous H theorem.”
  25. ^ Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects - Rao and Ballard 1999
    doi: 10.1038/4580 downloadable here or see GoogleScholar.
    Start of abstract: “We describe a model of visual processing in which feedback connections from a higher- to a lower-order visual cortical area carry predictions of lower-level neural activities, whereas the feedforward connections carry the residual errors between the predictions and the actual lower-level activities.”
    End of introduction, bottom of page 79: “Using a hierarchical model of predictive coding, we show that visual cortical neurons with extra-classical RF [Receptive Field] properties can be interpreted as residual error detectors, signaling the difference between an input signal and its statistical prediction based on an efficient internal model of natural images.”
    Under the heading “Results” and “Hierarchical Predictive Coding Model”, page 80: “Each level in the hierarchical model network (except the lowest level, which represents the image) attempts to predict the responses at the next lower level via feedback connections. The error between this prediction and the actual response is then sent back to the higher level via feedforward connections. This error signal is used to correct the estimate of the input signal at each level... The prediction and error-correction cycles occur concurrently throughout the hierarchy, so top-down information influences lower-level estimates, and bottom-up information influences higher-level estimates of the input signal.”
  26. ^ YouTube video - “Ransom & Fazelpour’s Intro to 'Three Problems For Predictive Coding Theory Of Attention'” - Ransom and Fazelpour 2016
    The summary of Predictive Processing is taken partly from this video, which is an accompaniment to the online paper Three Problems for the Predictive Coding Theory of Attention - Ransom and Fazelpour 2015. The video contains a useful introduction to the theory, as well as a description of a possible problem with the theory, and the online paper has a number of thoughts and answers at the end.
    The following quote is from a slide in the YouTube video at 4' 45":
    “Attention is the process of selecting the prediction error expected to be most precise and revising perceptual hypotheses on this basis.”
  27. ^ The free-energy principle: a rough guide to the brain? - Friston 2009
    doi: 10.1016/j.tics.2009.04.005 downloadable here or see GoogleScholar.
    Third line of Introduction (first page, numbered page 293): “...any adaptive change in the brain will minimize free-energy.”
  28. ^ Ibid. The free-energy principle: a rough guide to the brain?
    Page 299, under the heading “Attention and precision”, second paragraph: “...attention is simply the process of optimising precision [of prediction errors] during hierarchical perceptual inference.”
  29. ^ Driven by compression progress (or here) - Schmidhuber 2009
    doi: 10.1007/978-3-642-02565-5_4 downloadable here or see GoogleScholar.
    Introduction to section 3 on page 12: “... predictors and compressors are closely related. Any type of partial predictability of the incoming sensory data stream can be exploited to improve the compressibility of the whole.”
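
As an illustration of the data-compression idea quoted in reference 20 (Clark 2013), the following is a minimal sketch in Python. It is not taken from any of the sources above. It assumes the simplest possible predictor, in which each value is predicted to be the same as the previous one, so that only the difference (the 'prediction error') needs to be transmitted. The function names encode_residuals and decode_residuals, and the pixel values, are hypothetical and chosen only for this example.

```python
def encode_residuals(signal):
    """Return the prediction errors for a sequence of integers.

    Assumed predictor (for illustration only): each value is predicted
    to equal the previous one; the first value is predicted to be 0.
    """
    residuals = []
    predicted = 0
    for actual in signal:
        residuals.append(actual - predicted)  # the 'prediction error'
        predicted = actual  # the next prediction is the value just seen
    return residuals


def decode_residuals(residuals):
    """Rebuild the original sequence from the prediction errors."""
    signal = []
    predicted = 0
    for error in residuals:
        actual = predicted + error  # correct the prediction with the error
        signal.append(actual)
        predicted = actual
    return signal


# A hypothetical row of pixel values: two smooth regions and one boundary.
row = [100, 100, 101, 101, 180, 180, 181]
errors = encode_residuals(row)
print(errors)                           # [100, 0, 1, 0, 79, 0, 1]
print(decode_residuals(errors) == row)  # True
```

In this example most of the errors are zero or very small, and only the boundary between the two regions produces a large value. A receiver that uses the same predictor can rebuild the row exactly, which is why a signal of this kind can typically be stored or transmitted in fewer bits than the raw values.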

Page last uploaded Sat Mar 2 02:55:43 2024 MST