_Posted on May 10, 2024_
Memory is an integral component of computing systems; it is difficult to conceive of any useful computation which does not _use_ memory. However, the concept of memory is rather abstract. Indeed, attempting to define memory concretely runs into various difficulties. First, one may consider what kinds of things have memory, and what common properties these things share. For example, I may say that my brain has memory, that a computer has memory, or that a particular rock has memory. During one of my computer science classes, our professor engaged us in a thought experiment, asking whether a light bulb or a light switch has memory. His conclusion was that both do have memory, though he never attempted to define memory itself, or to say what properties give these things memory. Now, almost any "thing" may reveal its past condition through its current state. For example, a dent in my water bottle may indicate that at some point I dropped it. However, it may also indicate many other things, the particulars of which are not revealed by the dent itself. As another example, by measuring the carbon-14 content of a particular piece of wood we can approximate quite well when it died. This gives us some _information_ about its past condition, and possibly the past condition of things surrounding it. However, the only _definite_ information (from which we may extract other, indefinite information) we obtain is that this particular piece of wood died at such and such a time before now. We may now wonder where, in this example, the memory is contained. Is it fair to always consider the carbon-14 itself as memory? If not, when does it become memory?
In the case of both the water bottle dent and the carbon-14 content of the wood, there is some direct _low entropy_ information given regarding the object's past. The water bottle dent tells us that some high impact event occurred in the past, while the carbon-14 content tells us that this particular piece of wood died at such and such a time before now. However, the more _useful_ information to be gained from these memory _traces_ has a higher entropy. As discussed above, the dent does not tell us exactly what caused it, and the carbon-14 content does not definitively tell us the age or origin of a hypothetical coin buried underneath the wood sample. This more useful, ancillary, indefinite information has a high entropy compared to the definite information provided through direct observation. Furthermore, this information is provided only by other things in conjunction with the particular feature of the object itself. Even the light bulb exhibits this distinction. The low entropy information associated with a lit light bulb is simply that it was turned on at some point in the past. Higher entropy information, which is yet more useful, may be _when_ it was turned on or _who_ turned it on. However, this high entropy information is obtained only by specific observers who themselves have a memory of ancillary things. Memory in this sense is also distributed!
In contrast to the foregoing discussion, better structured, explicit, and intentional memory has a different informational nature. For example, Bob may arrange eight lightbulbs in a line and turn each one on or off, placing them in one of $2^8 = 256$ possible configurations which a particular future observer, Alice, would find useful as memory. Note that in order for Alice to know the _meaning_ of the provided information, she must understand the meaning of each possible configuration. We call this common understanding a _protocol_, which must be established beforehand by Bob and Alice collaboratively. This kind of memory gives quite definite, low entropy information to Bob and Alice, but very high entropy information to any other observer. You may notice that the kind of memory here is far more complex than the memory given under direct consideration of carbon-14 content, lightbulb state, or a water bottle dent in isolation. This memory uses the individual lightbulb state memories collectively to provide low entropy ancillary information _as_ distributed memory, through the use of a protocol. This is memory both in the sense that it records the communicative intent of Bob at a particular time, _and_ in the sense that "something" occurred to place the bulbs in their current states. This memory is most useful only to Alice, who previously established a protocol with Bob in order to extract this utility later.
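To make this concrete, here is a minimal sketch in Python of what such a protocol might look like. The codebook, the function names, and the choice of mapping each configuration to a character are my own illustrative assumptions; the point is only that the meaning of a configuration lives in the shared table, not in the bulbs themselves.

```python
# A minimal sketch of Bob and Alice's lightbulb protocol.
# The shared codebook (the "protocol") maps each of the 256 possible
# on/off configurations to a meaning both parties agreed on beforehand.
# Here the meaning is simply a character, but it could be anything.

PROTOCOL = {i: chr(i) for i in range(256)}  # agreed on beforehand

def bob_encode(message_symbol: str) -> list[bool]:
    """Bob turns each of the eight bulbs on or off to encode one symbol."""
    code = next(k for k, v in PROTOCOL.items() if v == message_symbol)
    return [(code >> bit) & 1 == 1 for bit in range(8)]  # bulb states

def alice_decode(bulbs: list[bool]) -> str:
    """Alice reads the bulb states and recovers the symbol via the protocol."""
    code = sum(1 << bit for bit, lit in enumerate(bulbs) if lit)
    return PROTOCOL[code]

bulbs = bob_encode("A")        # Bob leaves the bulbs in one configuration
print(bulbs)                   # [True, False, False, False, False, False, True, False]
print(alice_decode(bulbs))     # Alice, knowing the protocol, reads "A"
```

To any observer lacking `PROTOCOL`, the same eight bulb states carry only the high entropy fact that someone set them; with the protocol, Alice recovers one definite, low entropy symbol.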
One cannot help but realize that this is the kind of memory which is most useful to computers. To illustrate this theoretically, I connect these ideas here concretely to models of computation exhibiting different levels of expressiveness. This may help to give an intuitive (informal) notion as to _why_ different models indeed have different levels of expressiveness. For a non-trivial [[Finite Automaton]], the only memory is that provided by knowledge of the current machine state. From a particular state, one can learn very little definite information about the execution path which brought the machine there, or about what _input_ it has previously encountered. This memory is very high entropy and so is unreliable. In fact, it turns out that [[Finite Automaton|Finite Automata]] are generally considered to have no memory at all and are thus the least capable computational model. A more expressive computational model, the [[Pushdown Automata|Pushdown Automaton]] (PDA), is actually similar in memory structure to the example of Bob and Alice. In their case, each configuration of the eight light bulbs represented a symbol, which was interpreted through their established protocol. Additionally, just as in a [[Pushdown Automata|PDA]], only the latest symbol can be seen by Alice when determining her next action. Naturally, this more structured memory allows more expressiveness. By reading the stack contents, a PDA can trace back any number of intentional "messages" it left for itself in order to understand explicitly its previous states. This memory is very exact and low entropy, allowing the PDA to determine its previous state sequence. Additionally, a PDA is equipped with its own protocol, just as Bob and Alice are, which indicates the "meaning" of the stack contents, directly impacting its state transitions. A PDA, however, is not the most powerful model of computation. In order to read the next symbol, the machine must remove the one on top of the stack. Thus we lose some memory in order to move down the stack. However, a [[Turing Machine]], with its infinite and arbitrarily traversable symbol recording mechanism, does not share this fundamental limitation.
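As a rough illustration of this stack memory, here is a small Python sketch of a PDA-style recognizer for balanced parentheses; the example language, the stack symbol, and the function name are my own choices for illustration. It shows both the strength (each push is a definite, low entropy message the machine leaves for itself) and the limitation (reading a message means popping, and so losing, it).

```python
def balanced_parens(input_string: str) -> bool:
    """A PDA-style recognizer for balanced parentheses.

    The stack is the machine's only unbounded memory: each pushed symbol is
    a low entropy "message to itself" recording that an unmatched '(' was
    seen. Reading a message requires popping it, so the memory is consumed
    as it is used, which is exactly the limitation noted above.
    """
    stack = []                      # the PDA's stack memory
    for symbol in input_string:
        if symbol == "(":
            stack.append("(")       # leave a message: one unmatched paren
        elif symbol == ")":
            if not stack:           # no message left to consume: reject
                return False
            stack.pop()             # consume (and thereby lose) the message
        else:
            return False            # reject symbols outside the alphabet
    return not stack                # accept only if every message was consumed

print(balanced_parens("(()())"))    # True
print(balanced_parens("(()"))       # False
```

A Turing machine, by contrast, could move back and forth over the same recorded symbols without destroying them.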