"This letter is longer than usual, because I lack the time to make it short."
Blaise Pascal, Provincial Letters, no. 16


posted 18 Apr 2012, 22:00 by John Brown   [ updated 3 Feb 2015, 14:35 ]

"... the goal is that the formal notion match the intuitive one in all the easy cases; 
resolve the hard ones in ways which don't make us boggle; 
and let us frame simple and fruitful generalizations."
Shalizi (2001) paraphrasing Quine (1961) 

Intelligence is an evolved characteristic which has arisen as a result of natural selection. A formalization of intelligence should be cast in these terms, to capture the essence of why brains came about. 

I consider that ...

intelligence is observed as the rate at which a device increases its efficiency in an environment.

i = δη / δt

Thinking about intelligence in terms of efficiency stems from the perspective that evolution through natural selection is something of an "arms race" to obtain scarce resources and to avoid hazards. Those organisms which are more efficient in their reproductive endeavours hold a competitive edge. If an organism can alter its response to its environment in a way that increases the resources obtained, then it has become more effective. If the gain outweighs the cost, then the organism has become more efficient, and requires fewer resources overall.

"... every evolutionary advantage in nervous systems has come at a cost", and the development and operation of a brain is tremendously expensive in terms of the energy required (Allman 2000). However, the fact that brains have evolved demonstrates that they deliver a reproductive advantage. "Brains exist because the distribution of resources necessary for survival and the hazards that threaten survival vary in space and time. There would be little need for a nervous system in an immobile organism or an organism that lived in regular and predictable surroundings." (Allman 2000). For example, fruit has higher energy content and is easier to digest than the more readily available leaves. However, fruit is more widely dispersed in space and time, and subject to greater competition.

One organism can be more efficient than another, without being more intelligent. A desert plant may survive and proliferate over many years on a meagre ration of water, while a pair of horses are unlikely to survive a single gestation period at the same location. The desert plant boasts exceptional water efficiency. The horses, with their capacity for movement, may range broadly in search of water, and water may indeed be located at great cost. However, on a subsequent occasion the horses, with their capacity for memory, vision and smell, will likely travel by a much more efficient route to the water source. The plant remains more efficient, but the horses have increased their efficiency.
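The horse example can be put in toy numerical terms. The sketch below (all quantities hypothetical) treats efficiency as resource obtained per unit cost, and intelligence as the rate at which that efficiency improves across successive attempts:

```python
# Hypothetical data: water obtained vs. energy spent on successive trips.
# Efficiency (eta) = resource obtained / cost; intelligence is observed
# as the increase in efficiency between attempts (delta-eta / delta-t).
trips = [(10.0, 50.0), (10.0, 30.0), (10.0, 22.0)]  # (water, cost) per trip

efficiency = [water / cost for water, cost in trips]
gains = [later - earlier for earlier, later in zip(efficiency, efficiency[1:])]

print(efficiency)  # each trip is more efficient than the last
print(gains)       # positive gains: efficiency increasing over time
```

The desert plant, by contrast, would show a high but constant efficiency, and hence zero gain.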

Intelligence is something more than the capacity for movement. A female Sphex wasp seals her eggs in a prepared burrow alongside a paralysed cricket which serves as food for the wasp grubs. The Sphex follows a routine which involves dragging the cricket to the threshold of the burrow, entering the burrow and then re-emerging to drag the cricket inside. If a researcher moves the cricket a little distance away while the wasp is in the burrow, then the Sphex will repeat the whole routine of dragging, entering and emerging. On one occasion a researcher and wasp repeated the routine forty times (Hofstadter 1996). The Sphex is capable of elaborate movement, but has a limited capacity to improve the efficiency of that routine movement.

Brains enable gains in efficiency, but are not limited to that, any more than a tongue is limited to agitating food.

Efficiency is a general concept relating to the expenditure required to achieve some outcome - in terms applicable to the task at hand. This general notion is given varying degrees of formalism in physics, in economics, and rather loosely in the examples in this article. While evolution by natural selection has been referred to as a "blind" process without purpose (Dawkins 1987, 1999), this same process can also be viewed as a fierce competition. Efficiency in these competitive terms might be defined broadly in terms of the resources required to make replicas of a strand of DNA. The flexibility (and value) in the notion of efficiency flows through into that of intelligence - in terms of the objective, and the resources under consideration.

This notion of intelligence is tied to a particular environment, such that one may compare the intelligence of two devices within a single environment. Unfortunately, it does nothing to assist us to compare the intelligence of devices in different environments. The traditional game of chess can be viewed as a simple environment within which two potentially intelligent players may compete. There are also variants of this game environment which limit the amount of time available for making each move. It seems readily apparent that intelligence (increased efficiency over time) in the traditional game will largely translate to the fast chess environment. However, fast chess will clearly favour a slightly different intelligence. Consider further altering the chess environment such that players need not wait for their turn, but may make up to one move per second. The manner in which a player might increase their efficiency over time alters significantly. Consider the intelligence of a dedicated chess player in a poker environment, in judo ... or in Africa. The nature of intelligence is closely related to the nature of the environment.

This notion of intelligence offers a good match to our intuition:
  • Rocks interact with the environment in a consistent unchanging manner, and hence there is no intelligence.
  • Plants typically interact with the environment only very slowly, or in a repetitive manner. Each individual plant in a species exhibits very similar efficiency, and without significant improvements during the plant's lifespan. Intelligence of plants is vanishingly small.
  • Dogs are able to "learn new tricks" as it were; changing the way that they behave to obtain a reward. Dogs are intelligent.
  • Crows of New Caledonia are able to fashion a tool from a novel material (wire) and use it to extract a grub from an inconveniently shaped hole (Bluff et al 2007). Crows are intelligent.
  • People are intelligent when they "design a better [more efficient] mousetrap", and not so intelligent when they repeatedly pull the handle of a poker machine against the odds.
  • Turing machines operate in complete isolation from the environment for the duration of a deterministic undertaking. There is no capacity to improve efficiency, and hence no intelligence.
  • Searle's Chinese room operates on a fixed set of instructions with no facility for improving efficiency, and hence there is no intelligence.
  • Chess playing computers which become stronger over time in response to their gaming environment are intelligent.
Recognising intelligence as increasing efficiency over time accords well with our intuition. Inspecting intelligence through this lens provides clarity in some instances where the lack of an accepted formalization has fuelled debate. Efficiency is a suitably flexible notion which bundles an objective with the necessary resources. Intelligence is explicitly tied to an efficiency in a particular environment.

Allman,J. (2000), Evolving Brains. W. H. Freeman. 
Bluff,L. Weir,A. Rutz,C. Wimpenny,J. Kacelnik,A. (2007), Tool-related Cognition in New Caledonian Crows. University of Oxford.
Dawkins,R. (1999), The Extended Phenotype. Oxford University Press.
Dawkins,R. (1987), The Selfish Gene. Oxford University Press.
Hofstadter,D. (1996), Metamagical Themas: Questing for the Essence of Mind and Pattern. Basic Books.
Quine, W (1961), From a Logical Point of View: Logico-Philosophical Essays. Cambridge, Mass.: Harvard University Press, 2nd edn. First edition, 1953.
Shalizi,C. (2001), Causal Architecture, Complexity and Self-Organization in Time Series and Cellular Automata. 


posted 8 Mar 2012, 22:17 by John Brown   [ updated 13 Mar 2012, 19:49 ]

We humans use a large number of symbols in our written languages. Often these same symbols are also used to write mathematical ideas, and we each have learned how to manipulate these mathematical symbols to undertake arithmetic, trigonometry, calculus and the like. More recently, computers have been developed which can store and process vast quantities of these written symbols far more rapidly than humans. However, while human brains can learn to store and manipulate symbols, the operation of a brain remains very distinctly different to that of a computer. Furthermore, natural spoken languages are very distinctly different to mathematical languages (including computer programming languages).

Written language symbols were developed to record spoken languages that had evolved in very much earlier times. Children acquire speech in a largely automatic and effortless process, before learning to read and write, and then only with considerable effort. In each case, the recognition system is primary to the production system. For example, infant comprehension precedes speech; vocabulary for reception is generally larger than for production; multi-linguals who speak with a foreign accent do not hear with a foreign accent (Lamb 1999).

The artificial written symbols (letters, words, punctuation, sentences, paragraphs etc.) are attached to natural spoken language; not the other way around. The correspondence between spoken language and written language is only broad; consider “I am going to” vs. “I'm gonna” vs. “amona”. The rhythm, stress and intonation of spoken language are not completely captured by the written symbols; consider “*Jennifer* put her briefcase on the dining-room table” vs. “Jennifer *put* her briefcase on the dining-room table” vs. “Jennifer put her briefcase on the *dining-room table*” (stress on the marked words). One word can represent multiple concepts, and concepts can be represented by multiple words; consider hard and difficult.

In mathematics (and in computers), each symbol has a precise, formally defined meaning, founded on a small group of axioms. Natural languages are very different. Consider the natural language word 'game' as famously explored by Wittgenstein. In seeking a definition of a game, Wittgenstein first sought that which was common to all games. He found only a complicated network of similarities, overlapping and criss-crossing: sometimes overall similarities and sometimes similarities of detail. Look for the similarities in board-games, but then extend to card-games, and then on to ball-games. There is often winning, losing and competition, but what of the card-game patience, or throwing a ball to oneself? What of the role of skill and luck; of the difference between skill in chess and skill in tennis? What of a war-game? Wittgenstein found that it was not even possible to draw a natural boundary around the meaning of a word. But Wittgenstein's whole point was the astonishing realization that we do not need a definition in order to use a word successfully in natural language. What does it mean to know what a game is, and yet not be able to say it? (Wittgenstein 1953)

A crucial property of a conceptual system
 is that none of its concepts can be described without 
an account of its relationships to various other concepts.
Lamb 1999

Lamb worked up from the various characteristics of spoken language to arrive at a theory for the structure in the human brain that must exist to support its idiosyncratic nature. Lamb's structure is a network of nodes (called nections) which can become activated to varying degrees. The activation of one node can flow on to neighbors, weakening with distance. The activation of a nection in Lamb's structure is equivalent to a concept, and learning involves the recruitment of latent nodes which happen to have the requisite connectivity to assemble the new concept.
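The flow of activation can be sketched in a few lines. The network below is a hypothetical toy, not Lamb's actual relational network; it only illustrates activation spreading to neighbouring nodes and weakening with distance:

```python
from collections import deque

# Hypothetical toy network: each node lists the neighbours its
# activation can flow on to.
links = {
    "cat": ["feline", "pet"],
    "feline": ["animal"],
    "pet": ["animal"],
    "animal": [],
}

def activate(start, decay=0.5):
    """Spread activation outward from one node, weakening by `decay`
    at each link; each node keeps its strongest received activation."""
    activation = {start: 1.0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in links[node]:
            spread = activation[node] * decay
            if spread > activation.get(neighbour, 0.0):
                activation[neighbour] = spread
                queue.append(neighbour)
    return activation

print(activate("cat"))  # nearer nodes receive stronger activation
```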

Nowhere in Lamb's network is there a place for the storage and retrieval of symbols. “There is no such thing as the meaning of a text apart from an interpreter. And meaning is not conveyed by a text, as the usual metaphor would have us believe. Rather, elements of the text activate meanings in the minds of interpreters. ... A text cannot be interpreted except by virtue of information already present in the system before the text is received. ... There is therefore no possibility of perfect communication through language.” (Lamb 1999)

Lamb,S.M (1999), Pathways of the Brain: The Neurocognitive Basis of Language. John Benjamins Publishing Co.
Wittgenstein,L. (1953), Philosophical Investigations. Blackwell. 


posted 4 Mar 2012, 17:40 by John Brown   [ updated 13 Mar 2012, 19:45 ]

Arguments have been presented to the effect that the creation of a thinking machine is not just very difficult, but inherently impossible. It seems prudent to consider at least one of these arguments lest we embark on an impossible quest. Roger Penrose has presented a lengthy argument to this effect  (Penrose 1989 & 1994) which was heralded as “... the most powerful attack yet written on strong AI” (Gardner in foreword to Penrose 1989). Proponents of strong Artificial Intelligence broadly hold that all thinking is computation.

The first part of Penrose's argument is constructed using a technique due to Gödel. As a first step, Penrose defines an algorithm for a Turing machine (the P.algorithm, say) that will accept any Turing machine algorithm as input. If the P.algorithm halts, then it is a reliable indication that the input algorithm will not halt. When a result from an algorithm is thus reliable, the algorithm is said to be “sound”. Penrose goes on to supply the P.algorithm as the input to the P.algorithm itself. In this situation, if the P.algorithm were to halt then we would have a reliable indication that the P.algorithm does not halt – a contradiction. Hence, we must conclude that the P.algorithm does not halt. This is consistent, because the P.algorithm says nothing if it does not halt; it is only a reliable indicator when it does halt.

As the second step to the argument, Penrose imagined that the P.algorithm contained all of the reliable techniques that any mathematician, past and future, might convert into an algorithm and employ to indicate if a Turing algorithm will halt.

The crux of Penrose's argument is this: by concluding that the P.algorithm does not halt when applied to itself, a mathematician is able to indicate something that the P.algorithm was unable to. This is despite the fact that the P.algorithm contains all algorithms available to mathematicians. A contradiction. From here, Penrose draws the “inescapable” conclusion that:

“Human mathematicians are not using a knowably sound algorithm in order to ascertain mathematical truth.”
Penrose 1994
Penrose goes on from here to conclude that:
“...there is something essential in human understanding that is not possible to simulate by any computational means.”
Penrose 1994

However, Penrose's argument is difficult to accept. Consider the following alternate argument built along similar lines. Consider that Professor Penrose himself (P.Penrose, say) is given the task of examining Turing machine programs and stating if they will not halt. P.Penrose is at liberty to employ the full measure of his insight or that of any other colleague, and is provided with a (very rare) addendum containing all past and future mathematical proof techniques. P.Penrose must work continuously on each submitted Turing program until he can reliably state that a Turing machine running the submitted program will not halt, and then promptly advise the submitter accordingly. Furthermore, P.Penrose is not allowed to deliver any other result to the submitter.

A Turing machine is then programmed to do four things in sequence:
  1. send the following question to P.Penrose: “Will a Turing machine with the programming (attached) of the Turing machine submitting this question not halt?”
  2. print out the following statement: “If P.Penrose finds that a Turing machine with this programming will not halt then this Turing machine will halt. This is a contradiction, therefore a Turing machine with this programming will not halt.”
  3. print out the natural numbers in an ascending sequence beginning from 1 until a response is received from P.Penrose.
  4. halt.
Should we now go on with Penrose's argument to conclude that there is something essential in the operation of a Turing machine that is not possible to simulate by P.Penrose? I think not.

In Penrose's argument, the human mathematician is always able to indicate something extra because the mathematician has been placed in a superior position. The mathematician has the privilege of considering the operation of the Turing machine from the “outside”, and this is withheld from the machine. 

Computers exist on an exclusive diet of symbols. Humans write computer programs which define a set of parameters and an algorithm by which to manipulate the values supplied for each parameter.  The human 'understanding', above and beyond symbol manipulation, is in the validity of the algorithmic relationship between the particular parameters of the function. Consider the familiar formula f=ma from Newtonian physics which relates force, mass and acceleration. This formula represents a simple algorithm with three parameters. The 'understanding' of the algorithm is in the pragmatic reality that manipulating mass and acceleration 'just so', will result in something useful. Performing the identical manipulation with mass and temperature simply is not useful. The possibility for 'understanding' is withheld from a symbol manipulator, be it an intelligent person, a million transistors, or Babbage's difference engine; not because of the internal components, but because of the role it has been assigned.

Consider another perspective on Turing machines; a Turing machine is intended to halt when it arrives at an answer, while life is characterized by its continuance. When life halts it takes its consciousness and its intelligence with it. Turing machines appear superficially to be dynamic in their operation. However, the Turing machine is completely isolated from the environment for the entire period between the start and the stop, at which time it delivers a static answer. It matters naught if the time is long or short, or if the machine is fast or slow in its operation; the end result is, by definition, static. Intelligence is a dynamic phenomenon which is useful until it halts, while a Turing machine is a device which is useful when it halts.

Not impossible.

Penrose,R. (1989), The Emperor's New Mind. Vintage. 
Penrose,R. (1994), Shadows of the Mind. Oxford University Press, Oxford.
Turing, A.M. (1936), On Computable Numbers, with an Application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, ser. 2, vol. 42, pp. 230-265.

evolution of brains

posted 4 Mar 2012, 17:20 by John Brown   [ updated 18 Apr 2012, 22:12 ]

A. Butler and W. Hodos identified a variety of mechanisms involved in the evolution of vertebrate central nervous systems (Butler & Hodos 2005). Induction is one of the most basic mechanisms where an altered chemical signal induces part of an existing embryonic structure to develop in a different manner. A second mechanism, of critical importance to vertebrate brain evolution, involves the homeobox genes. The homeobox genes specify regions in the early developing nervous system that will develop into identifiable brain structures. These two basic mechanisms are common to all species with developed brains (rather than very simple nervous systems).

A number of additional evolutionary mechanisms are apparent in the vertebrates. Invasion describes a mechanism where a population of neurons evolves new connections to an existing area of the brain and comes to dominate the function of that area. There are also examples involving the loss of connections per se to a particular neuron population. Another evolutionary mechanism has seen a uniform population of neurons evolving into multiple sub-populations of neurons with different characteristics or connectivity (referred to as differentiation and parcellation respectively). Duplication of whole neural regions has occurred, enabling one of the regions to evolve a new role while leaving the other region to perform its original function (Allman 2000). A particularly frequent mechanism has been changes in the proliferation of neurons in a particular region, usually in association with one of the other mechanisms.

“... great diversity and great complexity have arisen because they are 
both merely the result of a few, simple random mutation events 
that affect the behaviour of particular morphogenetic fields
the phenotypes of which have been favoured highly by natural selection.” 
Butler & Hodos 2005
However, the number of mutations that have not been favored by natural selection is a very large number.

“There is more than one way to build a brain. 
A great diversity in brain organization has been achieved independently 
at least four separate times within four separate radiations of vertebrates. ... 
the development of a more complex brain has been accomplished, 
not just once for the ascent of man, but multiple times.” 
Butler & Hodos 2005

For example, the cognitive abilities of birds are on par with those of most mammals, certainly including primates. The general circuitry of the avian brain, routing information from the sense organs to the “higher” areas of the brain and looping around the higher levels, is strikingly similar to that of mammals. However, the architecture of the higher levels of the avian brain is conspicuously different (Butler & Hodos 2005). Much has been attributed to the architecture of the neocortex in mammals. The neocortex is the outermost layer of the cerebral hemispheres, with the familiar deep fissures in human brains but a smooth surface in other mammals such as rats. The neocortex comprises six fairly uniform layers, each characterized by neuron type and connectivity. The majority of the cells in the neocortex are pyramidal neurons, named for their characteristic shape. In humans, the neocortex is involved in higher functions such as sensory perception, generation of motor commands, spatial reasoning, conscious thought and language. However, the avian brain has evolved a different solution to a largely similar set of environmental challenges. Two areas of the avian brain, the wulst and the dorsal ventricular ridge, support many higher cognitive functions. Most of the neurons in these areas have a star shape, and the layering is orientated radially out from the center of the brain. Diverse neural architecture can support highly complex cognitive abilities. There is more than one way to build a brain.

The avian brain is very considerably smaller and necessarily lighter than primate brains. Smaller again is the brain of the araneophagic jumping spider Portia fimbriata. These predatory spiders invade the webs of other spiders and employ deception to attack them. If the resident spider is small, then P. fimbriata mimics the vibrations of an ensnared insect in an attempt to lure the resident spider. When the resident spider is large, then P. fimbriata mimics the vibrations of an insect brushing the periphery of the web, keeping the resident out of the web and avoiding a full-scale attack. When P. fimbriata moves across the web, it disguises its movements by simulating a large-scale disturbance of the web, such as from wind (Tarsitano, Jackson & Kirchner 2001). This is an example of a minute brain that is capable of deception – the very same capacity that Turing chose as a keen test of intelligence (Turing 1950).

In sum, brain evolution has involved an array of mechanisms which reuse the existing structures in complicated and intertwined ways. There is more than one way to build a brain. Remarkable things can be achieved with a very small brain.

Allman,J. (2000), Evolving Brains. W. H. Freeman. 
Butler, A.B. & Hodos, W. (2005), Comparative Vertebrate Neuroanatomy - Evolution and Adaption. Wiley. 
Tarsitano, M., Jackson, R.R., & Kirchner, W.H. (2001), Signals and Signal Choices made by the Araneophagic Jumping Spider Portia fimbriata while Hunting the Orb-Weaving Web spiders Zygiella x-notata and Zosis geniculatus, Ethology vol.106 issue.7, pages.595-615
Turing, A.M. (1950) Computing Machinery and Intelligence. Mind 49: 433-460. 

self organizing systems

posted 4 Mar 2012, 16:59 by John Brown

Herbert Simon noted that a large proportion of the complex systems we observe in nature (systems with a large number of parts and non-simple interactions) exhibit hierarchical structure (Simon 1962). These hierarchical structures are composed of interrelated subsystems which, in turn, have their own internal hierarchic structure. The components of these structures are distinguished in terms of the intensity of interaction: the intensity of the interaction between components is much less than the intensity of the interactions within each component. Simon referred to such systems as “nearly decomposable” in recognition that a rough description of these systems need only refer to the interaction between components, without needing to describe the inner workings of each component. It is also notable that only a very small set of building blocks is typically involved in the construction of a vast array of natural systems.

Hierarchical structures are to be expected in natural systems which have evolved through random processes. The random jostling of atoms will frequently produce small molecules, but the probability of instantaneously producing a large complex structure, such as an ant, is infinitesimal. However, if the small, randomly produced molecules are stable, then further random jostling will regularly combine a few of these small stable molecules into a larger structure. The time required for the evolution of a complex form from simple elements depends critically on the number and distribution of potential intermediate stable forms. Simon drew these conclusions from a broad range of complex natural systems spanning cosmology, chemistry, biology, sociology, physics and genetics.
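Simon illustrated the value of stable intermediate forms with his watchmaker parable. A rough numerical sketch (the probabilities here are hypothetical): if each added part risks an interruption that destroys any unfinished assembly, stable sub-assemblies turn an essentially impossible construction into a routine one:

```python
# Probability that adding one part is not interrupted.
p_interrupt = 0.01
ok = 1.0 - p_interrupt

# Direct assembly: all 1000 parts must be added without interruption,
# since an interruption scatters the whole unfinished assembly.
direct = ok ** 1000

# Hierarchical assembly: build stable sub-assemblies of 10 parts each;
# an interruption only loses the current 10-part sub-assembly.
sub = ok ** 10

print(direct)  # vanishingly small: direct assembly almost never completes
print(sub)     # about 0.90: each sub-assembly usually completes
```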

The stability of sub-assemblies can be expressed in terms of the point of equilibrium between the concentration of components, and of the sub-assemblies. Unstable sub-assemblies rapidly decompose back into components, while stable sub-assemblies will tend to capture the majority of the available components. When the point of equilibrium is reached, the relative concentrations of components is maintained by continuous formation and dissolution of individual sub-assemblies. In chemical reactions, the presence of a catalyst hastens the progress towards equilibrium between reagents and products without actually altering the point of equilibrium. If the sub-assemblies are in some way removed or consumed by a subsequent reaction, then more of the original components will be consumed to reattain the equilibrium. 

Kauffman further points out that each new component type enables the formation of yet more new component types (Kauffman 2002). Kauffman coined the term “adjacent possible” to refer to those components that are just one step away from the currently existing component set. There is a constant “pressure” for the adjacent possible components to be formed, because their concentration is zero and equilibrium has not yet been reached. Kauffman proposed this rapidly expanding “adjacent possible” as an explanation for the rapidly expanding diversity of a wide range of systems, from biology to economic markets.
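The explosion of the adjacent possible can be illustrated with a toy model (the combination rule here, concatenating any two existing component types, is purely hypothetical):

```python
# Start with two component types; at each step, every ordered pair of
# existing types can combine into a new type (the "adjacent possible").
existing = {"a", "b"}
sizes = [len(existing)]
for step in range(3):
    adjacent = {x + y for x in existing for y in existing} - existing
    existing |= adjacent
    sizes.append(len(existing))

print(sizes)  # the number of component types explodes: 2, 6, 30, 510
```

Each newly formed type immediately enlarges the set of combinations available at the next step, which is Kauffman's point.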

Simon points out that a component which somehow increases the probability that another of its type is formed will greatly alter the equilibrium in favor of that component. These are Dawkins' replicators. 

In sum, the capacity to form stable sub-assemblies is critical to enable large complex systems to form from smaller components in a randomized environment. Each new type of stable sub-assembly introduces the potential for many more new types.

Dawkins, R. (1987), The Selfish Gene. Oxford University Press. 
Kauffman, S. (2002), Investigations. Oxford University Press.
Simon,H,A. (1962), The Architecture of Complexity. Proceedings of the American Philosophical Society, Vol. 106, No. 6. pp. 467-482. 


posted 4 Mar 2012, 16:48 by John Brown

Claude Shannon developed a mathematical theory of communication which has been widely used in many fields of electronics ever since (Shannon 1948). In particular, Shannon was concerned with the uncertainty introduced by a noisy communication channel. When a message is sent over a perfect communication channel, the recipient can be  confident that they have received a reliable copy of the transmitted message.  However, when the channel is noisy, the recipient will be uncertain, and must do their best to guess what message was actually transmitted. Shannon defined a way of measuring this uncertainty and called it entropy.

You may know of the simple game of Hangman, where you must guess an unknown word, one letter at a time. Each wrong guess takes you one step closer to being hung. When a child is first confronted with the game their uncertainty is very high, as there are 26 possibilities for each letter position. As their skills improve, they will realize that some letters are more common than others, and their uncertainty will be a little less. Guessing a single letter at random has a 1 in 26 = 3.85% chance of being correct. Guessing a single letter known to be from an English text is less uncertain, as the probability of 'e'=12.70%, 't'=9.06%, 'a'=8.17%, and so on down to 'z'=0.07%. This uncertainty can be expressed as entropy = 4.7 in the naïve situation, decreasing to entropy = 4.2 with a knowledge of letter frequencies in English. Additional knowledge of English words will further reduce uncertainty as letters are guessed correctly in the game. As a child learns English, their uncertainty in the Hangman environment decreases.
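These entropy figures can be checked directly with Shannon's formula H = -Σ p·log₂(p). The letter frequencies below are the commonly quoted values for English text:

```python
import math

# Commonly quoted relative frequencies (%) of letters in English text.
freq = {'e': 12.702, 't': 9.056, 'a': 8.167, 'o': 7.507, 'i': 6.966,
        'n': 6.749, 's': 6.327, 'h': 6.094, 'r': 5.987, 'd': 4.253,
        'l': 4.025, 'c': 2.782, 'u': 2.758, 'm': 2.406, 'w': 2.360,
        'f': 2.228, 'g': 2.015, 'y': 1.974, 'p': 1.929, 'b': 1.492,
        'v': 0.978, 'k': 0.772, 'j': 0.153, 'x': 0.150, 'q': 0.095,
        'z': 0.074}

def entropy(probs):
    # Shannon entropy in bits: H = -sum(p * log2(p))
    return -sum(p * math.log2(p) for p in probs if p > 0)

h_naive = entropy([1 / 26] * 26)                       # uniform guessing
total = sum(freq.values())
h_english = entropy([v / total for v in freq.values()])

print(round(h_naive, 2))    # 4.7 bits
print(round(h_english, 2))  # roughly 4.2 bits
```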

Cosma Shalizi built on Shannon's work by developing an optimal way to extract and store information from a string of symbols (Shalizi 2001). Shalizi's CSSR algorithm first examines a large historical sample of symbols generated by some unknown process. The initial analysis looks for small repeated histories (patterns) within the sample and records the probability of the very next character. This is completed for all small histories in the sample up to some maximum size. These probabilities are then condensed into an ɛ-machine, which is the most concise way possible of representing the knowledge gathered about the process.   

For example, consider a historical sample involving just two symbols, A and B.


A statistical analysis will show that:
A is followed by another A(33%) or B(67%)
B is always followed by an A(100%)

We can refine this by looking at longer histories: 
AA is followed by A(40%) or B(60%). 
AB is followed by A(100%).
BA is followed by A(33%) or B(67%).
BB never occurs in the sample.

Finally we condense the results into 3 states:
S1. A and BA both predict A(33%) or B(67%).
S2. AA predicts A(40%) or B(60%). 
S3. B predicts A(100%).

This ɛ-machine is a model of the unknown process which generated the original sample. The model may be improved by examining a larger sample or collecting statistics for longer histories. If you were to move from state to state of the ɛ-machine, emitting symbols with the calculated probabilities at each state, then you would generate a string that is statistically identical to the original sample. Alternatively, if you begin receiving more data from the original unknown process, you will be able to sync with the ɛ-machine after a couple of symbols and then be in a position to make the best possible prediction for the next symbol.
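The statistics-gathering phase of this process can be sketched simply. The sample string below is hypothetical (the original sample is not reproduced here); for each short history, we record the distribution over the next symbol:

```python
from collections import Counter, defaultdict

def history_stats(sample, max_len):
    """Record next-symbol probabilities for every history up to max_len."""
    stats = defaultdict(Counter)
    for length in range(1, max_len + 1):
        for i in range(len(sample) - length):
            history = sample[i:i + length]
            stats[history][sample[i + length]] += 1
    # Convert raw counts into probabilities.
    return {h: {sym: n / sum(c.values()) for sym, n in c.items()}
            for h, c in stats.items()}

# Hypothetical sample: "AAB" repeated, so B is always followed by A.
probs = history_stats("AAB" * 100, max_len=2)
print(probs["B"])   # {'A': 1.0}
print(probs["AB"])  # {'A': 1.0}
```

Condensing histories with identical predictive distributions into shared states (the second phase of CSSR) is what turns these tables into an ɛ-machine.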

Might an ɛ-machine be used as the basis for an intelligent device? The ɛ-machine reduces uncertainty at the maximum possible rate by capturing all patterns that have any predictive power. Furthermore it is the smallest possible representation and it continues to operate optimally with noise in the signal. However, the CSSR algorithm used to construct the ɛ-machine comes at a cost. 

Modern computers are based on an architecture proposed by Alan Turing (Turing 1936) in response to a problem in mathematics. This most basic of computers comprises a simple machine and a tape. The machine can move left or right along the tape, and can read, write or erase a single symbol at each position on the tape, or simply stop. Each action of the machine is completely determined by the symbol most recently read from the tape and the internal state of the machine. Thus, the Turing machine has memory and the ability to act on that memory. The cost of a computation is the amount of memory and the amount of time required to complete it. These costs are referred to as the memory complexity and the time complexity respectively.
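The description above can be made concrete with a minimal sketch. This is not Turing's original formulation; the rule table and the bit-inverting task are invented for the example. The transition table plays the role of the machine's internal state logic, and the list plays the role of the tape.

```python
# An illustrative Turing machine: the rule table maps
# (state, symbol read) -> (symbol to write, head movement, next state).
# This invented example inverts a binary string, then stops at the blank.
RULES = {
    ("invert", "0"): ("1", +1, "invert"),
    ("invert", "1"): ("0", +1, "invert"),
    ("invert", " "): (" ", 0, "halt"),
}

def run(tape):
    tape = list(tape) + [" "]  # a blank cell marks the end of the input
    state, head = "invert", 0
    while state != "halt":
        write, move, state = RULES[(state, tape[head])]
        tape[head] = write     # act on the memory...
        head += move           # ...and move along the tape
    return "".join(tape).rstrip()

print(run("0110"))  # -> 1001
```

In these terms, the memory complexity is the length of tape used, and the time complexity is the number of iterations of the loop.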

The time complexity of constructing an ɛ-machine using Shalizi's CSSR algorithm depends on three things: the total length of the historical sample data (N), the number of symbols in the alphabet (k), and the longest pattern used to generate the statistics (Lmax). Consider a base case with a sample size N=1000, alphabet size k=2, and history size Lmax=5, which for the sake of this example takes 1 second to run on a computer. Doubling each of these variables in turn increases the run time as follows.

  Case              N      k   Lmax   Time (s)
  Base case         1000   2   5      1
  Double sample     2000   2   5      2
  Double alphabet   1000   4   5      1400
  Double history    1000   2   10     7000

To put this into perspective, consider using CSSR to construct an ɛ-machine to learn from data coming from an extremely low resolution video camera. A very simple camera might have only 5x5=25 black/white pixels (no gray-scale). We might gather 1 hour of data at 10 frames per second and gather statistics for just 1 second (10 frames). Treating each frame as a single symbol, this translates to N=36,000, k=2^25 (over 33 million possible frames), and Lmax=10, with a run time vastly longer than the age of the universe on our base-case computer.
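As a rough sanity check on that claim, we can plug the camera's numbers into the kind of exponential scaling seen in the table. The sketch below assumes a time cost growing like k^(2·Lmax+1) + N, the asymptotic bound reported for CSSR; the constants, and hence the exact figures, are illustrative only. Each 25-pixel black/white frame is treated as one symbol from an alphabet of 2^25 possible frames.

```python
from math import log10

# Assumed cost model: time ~ k**(2 * Lmax + 1) + N (the reported CSSR
# bound). Constants are ignored, so this is an order-of-magnitude
# estimate, not a benchmark.
def cssr_cost(N, k, Lmax):
    return k ** (2 * Lmax + 1) + N

base = cssr_cost(N=1000, k=2, Lmax=5)             # calibrated to ~1 second
camera = cssr_cost(N=36_000, k=2 ** 25, Lmax=10)  # 25 binary pixels -> 2^25 frames

# The age of the universe is only ~1e17 seconds.
print(f"roughly 10^{log10(camera) - log10(base):.0f} seconds")
```

The dominant term is the alphabet size raised to an exponent in Lmax, which is why the alphabet and history rows of the table blow up while doubling the sample merely doubles the time.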

An ɛ-machine learns quickly in the sense that it uses all of the available information, but the computational cost of doing so is prohibitive. The computations required cannot keep up with the rate at which data arrives from the real world. Brains, evidently, can.

J.F. Traub and A.G. Werschulz developed an elegant approach known as Information-Based Complexity (IBC), which studies problems for which the information is partial, contaminated and priced (Traub & Werschulz 1999). Partial information is very common in the real world, where measurements provide information about a particular time or locale, but assumptions must be made regarding the unmeasured points. The assumptions are global information about the environment, such as smoothness. Contaminated information is also common, due to noise, rounding errors and/or deception. The cost of information is incurred during its collection, and in the computations that follow.

Consider the challenge of weather forecasting, where it is expensive to measure and collate data from even a few points around the globe, and every measurement contains error introduced by the resolution of the instruments. The computations to generate a forecast make assumptions about missing locations and incur further expense.

Computations made with partial or incomplete information leave some uncertainty in the result. IBC provides a framework to determine the amount of uncertainty remaining, how much information is required to limit the uncertainty, and the optimal amount of information to collect. IBC defines a quantity called the radius of uncertainty, which measures the intrinsic uncertainty in a solution due to the available information. The value of information is its capacity to reduce the radius of uncertainty.
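A minimal worked example of the radius of uncertainty, under an assumed setting: suppose the only global knowledge about an unknown function f on [0, 1] is that it is Lipschitz with constant L (a smoothness assumption), and the partial information is n evenly spaced samples. Then no algorithm can guarantee an error below L/(2n), because the worst case sits midway between samples; that bound is the radius of uncertainty for this information.

```python
# Hypothetical IBC-style example: f is known only to be Lipschitz with
# constant L on [0, 1], and we observe n evenly spaced samples. The
# worst case sits midway between samples, giving radius L / (2 * n).
def radius_of_uncertainty(L: float, n: int) -> float:
    return L / (2 * n)

# Doubling the information (samples) halves the remaining uncertainty.
print(radius_of_uncertainty(L=1.0, n=10))  # 0.05
print(radius_of_uncertainty(L=1.0, n=20))  # 0.025
```

The value of each extra sample is exactly its reduction of this radius, which is what makes the cost/value trade-off quantifiable.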

These notions from IBC are very helpful. Organisms with very simple nervous systems clearly have very limited capacity to collect and process information. Yet, despite these severe limitations, natural selection has proven that they routinely take actions superior to those of their competitors. When resources are severely limited and the competition is close, it is important to use those resources to maximum effect: to collect and utilize the information which reduces uncertainty by the greatest amount for the least cost. IBC provides a framework to evaluate the cost and the value of the information collected in order to reduce uncertainty.

By way of example, the primate visual system is forward facing, with each eye sensitive to light arriving within a cone of only about 120° of the full 360°. The visual acuity in the outer peripheral region is very low compared with the very much higher sensitivity and resolution of the central fovea, which spans only about 2°. Primates perform many tens of thousands of saccades per day: rapid eye movements that align the fovea with particular targets in the visual field. The primate visual system collects high quality information from selected regions only, rather than from everything available.

In sum, identifying historical patterns enables uncertainty in the future to be reduced. The computational cost of identifying all patterns is prohibitive. Limited resources force brains to gather and utilize only that information which gives the most reduction in uncertainty for the least cost.

Shannon,C.E. (1948), A Mathematical Theory of Communication. 
Shalizi,C. (2001), Causal Architecture, Complexity and Self-Organization in Time Series and Cellular Automata. 
Turing, A.M. (1936) On Computable Numbers with an application to the Entscheidungsproblem.
Turing, A.M. (1950) Computing Machinery and Intelligence. Mind 59: 433-460.
Traub,J.F. & Werschulz,A.G. (1999), Complexity and Information (Lezioni Lincee). Cambridge University Press. 

like a duck

posted 29 Feb 2012, 13:33 by John Brown   [ updated 18 Apr 2012, 22:09 ]

Much has been written about the now famous Turing Test (Turing 1950). In one sense, Turing's challenge was provocative in pitting the capabilities of an inanimate machine against the only readily agreed example of intelligence (a human). On the other hand, Turing also avoided considerable controversy by merely asking that the machine imitate human intelligence rather than display genuine intelligence. In this seminal paper, Turing did not attempt to provide an explicit definition of intelligence, but he may have pointed to the only sensible definition nonetheless.

Turing proposed an 'imitation game' in which a human interrogator must determine the gender of two unseen respondents, a man and a woman. The interrogator may pose any number of questions to the respondents but must rely entirely on typewritten responses. The objective of the male respondent is to cause the interrogator to err while that of the female respondent is to assist the interrogator. Turing then asks what will happen when a machine takes the part of the male respondent in this game. Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? 

Turing chose to use the thinking ability of a human interrogator to discriminate a successful imitation of thinking. This is a curiously circular approach, where thinking is being employed to recognize thinking. 

From the perspective of natural selection, it is tremendously important for an agent (an organism that can act on the environment) to be able to rapidly identify another agent in the environment. A nearby agent immediately presents four important alternatives, the famous four F's: fight, feed, flee or procreate. These four are make-or-break issues as far as the gene's replication success is concerned. The presence of another agent in the environment introduces an entirely different level of cognitive challenge. A piece of fruit or a cliff is a passive aspect of the environment, where time for decision and action is plentiful. Two agents meeting in the environment are under severe time pressure to select between the four alternatives and act accordingly. Pascal Boyer describes humans as having hyperactive agent detection (Boyer 2001). Brains have evolved particular excellence in identifying and evaluating other agents. Turing's choice would appear to be well justified in this light.

Turing required that the interrogator and respondents employ human language as the medium of the intelligence contest (rather than say pattern matching or mathematics).

Human language skills are striking in contrast to those of other species and provide important insight into the human brain. Sydney Lamb investigates the nature of human language with a keen recognition of its dependence on the brain structures involved (Lamb 1999). Spoken language developed thousands of years before written texts, and children learn to speak before learning to read and write. Hence, speech is more instructive than writing about the brain, and it is clear that brains do not store and process the written symbols that were introduced much later. There are many common structures and mechanisms across the diversity of human languages. Lamb concludes that the linguistic system in human brains is not a single system but an interconnected group of different systems.

While human language was Turing's medium of choice, the contest itself was deception.

Again from the perspective of natural selection, Dawkins explains that some animals cause other animals to perform actions which are against their best interests (Dawkins 1987, 1999), and that these manipulations result in an arms race of adaptation and counter-adaptation. For example, cuckoos routinely place eggs into the nests of robins to obtain free child care, and in doing so have set off an arms race of mimicry and detection between the two species. Dawkins goes so far as to say that "...most animal signals are best seen as neither informative or deceptive, but rather as manipulative." Inanimate objects can only be moved by brute force, but other animals can be manipulated by more subtle and efficient means. According to Dawkins, manipulation is the intent, and language the medium. Again Turing's choice would appear well founded.

Turing's challenge focuses directly on two of the most highly evolved aspects of human intelligence; language and deception. It is a keen test indeed.

“When I see a bird that walks like a duck and swims like a duck
and quacks like a duck, I call that bird a duck.”
James Whitcomb Riley.

Were a machine to pass the Turing Test, then people may or may not accept that the machine can think. However, evolution has clearly favored those that have given agents the benefit of the doubt, and acted accordingly. This, in fact, was the whole point.

Boyer,P. (2001), Religion Explained: The Evolutionary Origins of Religious Thought. Basic Books 
Dawkins, R. (1987), The Selfish Gene. Oxford University Press. 
Dawkins,R. (1999), The Extended Phenotype. Oxford University Press. 
Lamb,S.M (1999), Pathways of the Brain: The Neurocognitive Basis of Language. John Benjamins Publishing Co.
Turing, A.M. (1950) Computing Machinery and Intelligence. Mind 59: 433-460.

why brains?

posted 27 Feb 2012, 02:29 by John Brown   [ updated 13 Mar 2012, 19:57 ]

Natural selection offers a perspective on how brains came about. John Allman has provided a particularly clear view of why brains came about (Allman 2000). In many cases I cannot do better than to quote his words directly.

Allman identified three key themes in the evolution of the human brain: “that the essential role of brains is to serve as a buffer against environmental variation; that every evolutionary advance in the nervous system has a cost; and that the development of the brain to the level of complexity that we enjoy – and that makes our lives so rich – depended on the establishment of the human family as a social and reproductive unit.” (Allman 2000)

Allman goes on to explain that “Brains exist because the distribution of resources necessary for survival and the hazards that threaten survival vary in space and time. There would be little need for a nervous system in an immobile organism or an organism that lived in regular and predictable surroundings. In the chaotic natural world, the distribution and localization of resources and hazards become more difficult to predict for larger spaces and longer spans of time.” (Allman 2000)

In most animals the brain is located near the entrance to the gut, which suggests that the brain arose as the gut's way of controlling its intake: accepting nutritious foods and rejecting toxins. Several families of genes govern both brain and gut development. There is strong evidence that the brain and the gut compete for metabolic energy in the organism and that gut size limits brain size. The main energy-using organs are the heart, liver, kidney, stomach and brain, so there is a forced trade-off between the brain and the digestive organs. (Allman 2000)

“The brains of warm-blooded vertebrates, the mammals and birds, tend to be larger than the brains of cold-blooded vertebrates of the same body weight. The larger brain in mammals and birds are a crucial part of a large set of mechanisms for maintaining a constant body temperature. Since all chemical reactions are temperature dependent, a constant temperature brings about stability in chemical reactions and a capacity for precise regulation and coordination of complex chemical systems. However, maintaining a constant body temperature requires a tenfold increase in energy expenditure.” The evolutionary changes required to maintain a constant temperature have included changes in the quantity of food consumed and the way it is chewed, in breathing, in locomotion, in parenting behavior, in the senses, in memory, and in the expansion of the forebrain. (Allman 2000)

For example, primates that eat fruit tend to have larger brains than leaf eaters. Fruit has higher energy and is easier to digest than the more readily available leaves. However, fruit is more widely dispersed in space and time, and subject to greater competition. (Allman 2000)

“In a newborn human the brain absorbs nearly two thirds of all the metabolic energy used by the entire body. … and ... Nurturing a large-brained baby imposes enormous energy costs on the mother because of the burden of lactation, which is far more costly than digestion.” (Allman 2000)

“The invention of the extended family enabled humans to evolve much larger brains and avoid the constraints imposed by the extremely slow maturation and low fecundity associated with such large brain size.” (Allman 2000) Caring for other members of the species holds reproductive advantage for genes because a very large number of genes are shared by all members of the species. The remaining genes in the population, those that are not ubiquitous, account for the differences between individual members. There is a particular incentive to care for immediate family members (children, siblings and parents): during sexual reproduction, each parent contributes a randomly selected half of their genes to the child, so caring for an immediate family member benefits an even higher proportion of shared genes. This benefit also exists, but is less pronounced, for more distant relations.

However, there is also an incentive for each individual to obtain more than their fair share of the care on offer. It pays for a child to attract a disproportionate amount of the parents' resources (self-investment versus investment in siblings) by deception (e.g. screaming louder, looking weaker, etc.). Dawkins goes so far as to say that "...most animal signals are best seen as neither informative or deceptive, but rather as manipulative." In response, the mother has an incentive to detect the deception and thwart the cheater.

“The personal and immediate social domain is the one closest to our destiny and the one which involves the greatest uncertainty and complexity” (Damasio 1994). Brains that have developed to navigate a social environment face a very much more difficult challenge than brains that exist in environments dominated by inanimate objects. Humans are much harder to anticipate than rocks (except those rocks thrown by humans).

In sum, the development and operation of a brain is tremendously expensive in terms of the energy required. Despite the expense, brains can confer a reproductive advantage through access to even more resources and in navigating a social group. A brain achieves this advantage by both predicting resources / hazards and by physically acting on the environment accordingly. Both the evolution and the development of a brain are inextricably intertwined with the environment.

Allman,J. (2000), Evolving Brains. W. H. Freeman. 
Damasio, A.R. (1994), Descartes' Error: Emotion, Reason, and the Human Brain. Harper Perennial.
Dawkins, R. (1987), The Selfish Gene. Oxford University Press. 

natural selection

posted 27 Feb 2012, 02:27 by John Brown   [ updated 13 Mar 2012, 19:47 ]

Brains, and the greater central nervous system, are the end result of a long evolutionary process of natural selection. Natural selection offers a perspective on how brains came about, and Richard Dawkins provides a most insightful explanation of natural selection (Dawkins 1987). Appreciating why brains came about is another matter again.

Dawkins explains that “The fundamental unit, the prime mover of all life, is the replicator. A replicator is anything in the universe of which copies are made. Replicators come into existence, in the first place, by chance, by the random jostling of smaller particles. Once a replicator has come into existence it is capable of generating an indefinitely large set of copies of itself.” However, errors in the copying process introduce new alternate replicators which subsequently compete for resources. Replication success is dependent both on the replicator's characteristics and on the current environment, which may include other replicators. The presence of other replicators can at times be mutually beneficial and at other times detrimental to replication success. Natural selection describes the increase in population of one replicator over another due to an extended lifespan, faster replication, and/or more accurate replication.  As the population becomes increasingly dominated by the most successful replicators, then they are increasingly competing with copies of themselves, or with close variants.

The replicators responsible for life in our corner of the universe are known as genes (not animals or plants). Genes are encoded in molecules of DNA. Some of the initial minute fragments of DNA interacted with the environment in such a way that promoted their own replication. The interaction with the environment of a gene, or group of genes, is known as the phenotype. There were DNA copy errors which resulted in different variations of these genes. Different variations of a gene, or group of genes, are known as different genotypes. Each gene variant, or genotype, sported a different interaction with the environment, a different phenotype. The successful phenotypes (physical characteristics and behaviors) promoted superior replication of the underlying genes. This success of one phenotype over another is the natural selection of one genotype over another. The genes are selected based on their phenotypes.

Some groups of mutually beneficial genes (a genotype) resulted in the construction of an enclosing cell (a phenotype). Further evolution resulted in the development of multicellular organisms. Still further evolution saw the development of a recognizable Central Nervous System (CNS) and eventually brains, along with a host of other systems and capabilities. For many species, the advent of sexual reproduction provided an important source of gene variants to participate in the natural selection process. 

Each step along this very long path was an adjustment to the then-current genotype, and entirely reliant on all of the prior adjustments. The success of each adjustment was dependent on the environment, and the competition, at the time. Success is measured by an increase in population.

Dawkins refers to the phenotype as the replicator's survival machine. “A survival machine is a vehicle containing not just one gene but many thousands. The manufacture of a body is a cooperative venture of such intricacy that it is almost impossible to disentangle the contribution of one gene from that of another.”

Dawkins goes to some length to dispel the myth that genes completely determine the development and behavior of each individual organism (Dawkins 1999). There are simply not enough genes in the human genome to specify the entire structural connectivity of the brain (Damasio 1994). Genes are not like a computer program which produces the same output every time, nor like a set of engineering drawings that completely specify each component and assembly of the whole. Rather, genes interact with the environment during the development of the organism and represent a tendency towards a particular development path. If the environment changes, then the development path will change. And it must be noted: a significant source of environmental change is the actions of genes themselves. Neither genes nor the environment are static: both are in a continuous state of change, each directly and indirectly influencing the other.

In sum, brains are just one intricate part of an organism which evolved during a very long natural selection process. Each step in the evolution built on the pre-existing structure from the bottom up. Each step was more successful in creating replicas of the genetic material than the alternatives which existed in that environment, at that time. Genes have a tendency, rather than destiny, to develop in particular ways, and the tendency interacts with the environment. 

Intelligence looks a bit like a grab bag of capabilities; because it is!

Damasio, A.R. (1994), Descartes' Error: Emotion, Reason, and the Human Brain. Harper Perennial.
Dawkins, R. (1987), The Selfish Gene. Oxford University Press. 
Dawkins,R. (1999), The Extended Phenotype. Oxford University Press. 

homo sapiens

posted 26 Feb 2012, 22:33 by John Brown   [ updated 18 Apr 2012, 22:06 ]

Intelligence is a tricky concept to define explicitly. There would appear to be various types of intelligence and various levels of intelligence.

We call ourselves Homo sapiens - man the wise - because our mental capacities are so important to us (Russell & Norvig 2003).

It seems remarkable, then, that there is no generally agreed definition of intelligence (Legg & Hutter 2006). Certainly, there has been a wide range of proposals. One obvious approach has been to describe intelligence by way of a list of features which comprise intelligence. These might include the likes of memory, problem solving, reasoning, planning, language, analogy, adaptation, classification, learning, discrimination, judgment, and the capacity to inhibit instinct, all to enable success in an environment. This approach ends up looking like a grab bag of features rather than a clean definition.

A common feature of many definitions of intelligence is the notion of a purposeful, adaptive behavioral response to the demands of the environment. However, intelligence involves a value judgment on the part of the observer about the merits of the behavior observed (Butler & Hodos 2005).

The famous Turing Test (Turing 1950) recognizes the ambiguities involved with questions such as “Can machines think?” and takes an indirect approach. Turing proposed an 'imitation game' in which a human interrogator must determine the gender of two unseen respondents, a man and a woman. The interrogator may pose any number of questions to the respondents but must rely entirely on typewritten responses. The objective of the male respondent is to cause the interrogator to err while that of the female respondent is to assist the interrogator. Turing then asks what will happen when a machine takes the part of the male respondent in this game. Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman?

Turing's indirect approach of allowing that a machine is merely imitating a thinking human rather than genuinely thinking, allows the discussion to proceed on relatively 'safe' ground. There is no controversy that humans can think, and the idea that a machine might one day deliver a successful imitation, while certainly ambitious, is not unduly confrontational. The modern day term “Artificial Intelligence” seems to follow along similar lines. The development of artificially intelligent systems is a challenging task without being so bold, or so rash, as to claim the development of genuine, non-human intelligence. 

The Turing Test thus investigates the theoretical capacity of a machine to imitate the deceitful responses of a human male to a degree that is indistinguishable to the human interrogator. While Turing did not define 'thinking', he clearly adopted the assumption that the human male respondent is able to think, and that this ability contributes to his success in deceiving the interrogator. Turing went on to argue that a digital computer is capable of passing the test in principle, if not yet in practice. Much has been written on the topic since.

Whether or not a successful machine is actually thinking (whatever that may mean), it is certainly able to imitate a thinking device (a human male) to a considerable extent. The deceptive capacity of each individual human male will of course vary, as will the interrogator's skill in teasing out the truth of the matter. Human performances also clearly vary from one day to the next. So, the Turing Test maps out a region of convincing performance, but the boundaries are distinctly fuzzy. This highlights that a machine operating within this region may well exceed the deceptive capabilities of some human males. One approach that a machine could take would be to closely imitate a particular individual, but this probably would not work so well if the interrogator keeps coming up with new tactics.   Providing 'typical' responses may represent an easier route to success for a machine, but it does introduce the notion of a more 'general' capability – a step away from pure imitation and perhaps towards what we might consider intelligence.

Turing's imitation game pits the intellectual ability of the interrogator against that of the deceptive male respondent. Both contestants will certainly be thinking quite hard about their strategies to win the game. This is interesting, because when a machine takes place of the deceptive male respondent, the intelligence of the interrogator remains as the gauge of the machine's success. Turing chose to use the thinking ability of a human interrogator to discriminate a successful imitation of thinking. This is a curiously circular approach, where thinking is being employed to recognize thinking.

There is a profoundly tragic irony here. Alan Turing was a brilliant man who was instrumental in breaking German ciphers, including the Enigma machine, in the Second World War. Turing appears to have committed suicide after accepting treatment with female hormones as an alternative to imprisonment for homosexuality. (Hodges 1996)
In any case, there is a conceivable, loosely-defined set of machines that pass the Turing Test by imitating a human male attempting to deceive an interrogator by masquerading as a female.

Turing's test drags together two apparently opposite extremes: animate Homo sapiens, who unquestionably think, and inanimate machines. This leaves rather a large expanse of middle ground, inviting questions such as “Can animals other than humans think?” or “Can infant humans think?”. Such questions lead into progressively murkier territory with the likes of “Which animals can think, and at what age?”. These questions are difficult because they challenge our intuition about what it means to think, and challenge our ability to test for intelligence in animals that do not have human language skills.

Turing's test deftly avoids considerable controversy by simply comparing the machine to the best and most accepted example of thinking that we have - a human. Unfortunately for our ego, mammals do not always have the most sophisticated brain systems, and even primates are not necessarily the most complex (Butler & Hodos 2005). For example, many mammals have superior olfaction to primates, bats and marine mammals have superior audition, birds have superior vision, and frogs have faster tongues. Even when the ratio of cortex surface area to brain volume is examined, humans are average rather than exceptional.

It is all too easy to fall into the trap of measuring non-human animal cognition by the gold standard of human cognition. Indeed, it is difficult to even imagine how animals think and perceive the world with such foreign sense organs as the sonar of bats or the lateral line of fish. Most intelligence tests are designed around quantifying one human cognitive function or another (Butler & Hodos 2005), without regard for the differences in animal cognition. For example, rats perform as well in an olfactory version of a test as primates do in visual versions. Butler and Hodos caution against viewing non-human intelligence as a scaled-down version of human intelligence, as this overlooks the special adaptations of both humans and non-humans.

Butler, A.B. & Hodos, W. (2005), Comparative Vertebrate Neuroanatomy: Evolution and Adaptation. Wiley.
Legg,S. & Hutter,M. (2006), A Collection of Definitions of Intelligence.
Russell,S.J. and Norvig, P. (2003), Artificial Intelligence: A Modern Approach. (2nd Edition) Prentice Hall. 
Turing, A.M. (1950) Computing Machinery and Intelligence. Mind 59: 433-460.
