A self-delimiting Turing machine considered as a set of (program, output) pairs

A more general framework---Special-purpose computers C, algorithmic probability P, and the halting probability Ω

U is just one of many possible self-delimiting binary computers C. Each such C can be simulated by U by adding a LISP prefix σC:

UC  p) = C(p).

In other words, programs for C also work for U; they just have to be lengthened a little, by |σC| bits. You just slap σC on in front, and that explains to U in LISP how C works!

Now let's define the complexity and the algorithmic probability of a LISP S-expression x if we are using a particular self-delimiting binary computer C:

HC(x) = minC(p)=x |p|
PC(x) = ∑C(p)=x 2−|p|

In other words, HC(x) is the size in bits of the smallest program for C to compute x. And PC(x) is the probability that C computes x when we generate the program p using an independent toss of a fair coin each time C requests another bit of p. This ``probabilistic process'' interpretation of the sum

PC(x) = ∑C(p)=x 2−|p|

suffices to establish that this sum indeed converges and must be between zero and one.

[Note: we'll consider HC(x) to be infinite, and PC(x) will be zero, if C cannot compute x.]

By the way, it follows at once that

PC(x) ≥ 2HC(x).

The message of Part II will be that in order to study the properties of this complexity measure, of the size of the smallest program for x, you actually have to look at all the programs that calculate x, you have to look at the probability of getting x at random. In other words, to understand H, you actually have to ``dig deeper'' and study P. H is the visible part of an iceberg, P is the submerged part. You have to weigh together all the programs that give x, not just look at the minimum-size ones, even if you're only interested in the minimum-size ones!

For any such machine C we also have a ``halting probability'':

ΩC = ∑C(p) halts 2−|p| = ∑x PC(x).

Since it's a well-defined probability, we deduce that this sum converges and is between zero and one.

Ω = ΩU

is also between zero and one and is the halting probability of the particular universal self-delimiting binary computer U that we programmed in LISP in the previous chapter.

For C = U I usually omit the subscript U in HU, PU and ΩU.

Recall that U is ``universal'' because each self-delimiting binary computer C can be simulated by U by adding a LISP prefix σC:

UC  p) = C(p).

You just slap σC on in front, and that explains to U how C works. What is the significance of U's universality? Why have we picked this particular U as the basis for AIT? Well, it's because U gives us an essentially minimal complexity measure:

HU(x) ≤ HC(x) + |σC|.

And it gives us an essentially maximal probability measure:

PU(x) ≥ PC(x) × 2−|σC|.

So any two universal computers U give essentially the same H and P. I picked this particular U because it was the simplest computer that I could construct that can be used as the foundation for AIT. It was the easiest way I could think of to actually write and run the programs that come up in my theory.

To repeat, we normally employ these abbreviations:

H(x) = HU(x)
P(x) = PU(x)
Ω = ΩU.

Joint complexity and algorithmic probability, and ambiguous programs

We'll also consider the complexity HC(x,y) and the algorithmic probability PC(x,y) of calculating two S-expressions, not one. I'll define this as the complexity HC((x y)) and the algorithmic probability PC((x y)) of the LISP pair (x y).

We've already seen this in the previous chapter, for the case C = U. In that case I can omit the subscript U and just write H(x,y) and P(x,y).

The reader should note that this introduces a slight ambiguity in our theory. Why? Because a program p for C to produce two results is also a valid program for producing a single S-expression, the pair! It's only from the context that we know whether we're using p to calculate two things or just one thing!

This may seem worrisome, but in fact in practice there is never any ambiguity, because when we're given a program we always know what it's for. So this will not complicate our theory.

There's also another possible source of ambiguity. A program p returns a final value, but it can also display intermediate results before producing its final value. When we're interested in generating infinite sets of S-expressions, we'll ignore final values and only look at what's displayed. Other times we care only about the final value, not about any intermediate results.

As long as one keeps these ambiguities in mind, they never cause any problem. It just means that what we consider to be the output of a program will depend on the context, and that we could in theory use the same program for different purposes, even though in practice this won't occur.

[We already encountered this ambiguity problem in our proof that

H(x,y) ≤ H(x) + H(y) + c

in the previous chapter; I just swept it under the rug and hoped that you didn't notice anything!]

Conditional algorithmic probability

Last but not least, let's define the conditional algorithmic probability P(y|x) to be the probability of computing y by chance if we are given [x for free via] a minimum-size program x* for U to compute x. I.e.,

P(y|x) = ∑U(p,x*)=y 2−|p|

summed over all programs p for U to compute y from x*, where U(x*) = x and |x*| = H(x).

Note that the probability P(y|x) is called ``conditional'', while the complexity H(y|x), which we defined in the previous chapter, is called ``relative''! This is because in one case we're using notation and terminology from probability theory, and in the other case we're using terminology from computability theory. In the last chapter of Part II we'll show that the word ``conditional'' is appropriate. One might even cross over and refer to H(y|x) as conditional information content and to P(y|x) as relative algorithmic probability, to emphasize the connection between algorithmic probability and program-size complexity that we'll establish later.

What is an r.e. (c.e.) set?

Here's an important piece of terminology.

What is an r.e. (c.e.), [old name] recursively enumerable ([new name] computably enumerable) set of S-expressions? It's a set X of S-expressions that can be generated in some order, with or without repetitions, as the display output of our universal Turing machine U. For example, the set of halting programs, or the set of theorems provable within a given formal axiomatic system, is r.e. (c.e.). This is not the same as being able to decide if an S-expression x is in the set X. Turing's halting problem shows that these two concepts are different!

A more abstract formulation---What is the graph of a computer C?

Now we're ready for a more abstract view of what a self-delimiting binary computer C is. It's just an r.e. (c.e.) set of (program, output) pairs with the property that the programs in this set cannot be prefixes of each other. In other words, we'll define C via the graph of its input/output function, as the set of all ordered pairs (p,C(p)). The fact that C is a binary computer is reflected in the fact that the domain of the function C, the set of all possible programs p, is a set of finite bit strings. And the key self-delimiting property of C is reflected in the fact that the domain of C is a prefix-free set. In other words, if C(p) is defined, and p' is an extension of p, then C(p') cannot be defined. (p,x) and (p',y) cannot both be in the graph of C if the bit string p' is a [proper] extension of the bit string p.

So this is a necessary condition on an r.e. (c.e.) set of pairs for it to be the graph of a self-delimiting binary computer C. And the point of this chapter is to show that this is also a sufficient condition for a set of ordered pairs to be the graph of a self-delimiting binary computer C. So C may be abstractly viewed as merely a set of (program, output) pairs with a prefix-free domain, in other words, in which no extension of a meaningful program is also a meaningful program.

And in the next chapter we'll construct computers that satisfy certain requirements by generating their graphs, the set of all their (program, output) pairs.

How can we prove that this abstract viewpoint is equivalent to the more concrete approach we've taken in the previous chapter, of reading in a bit at a time and only when you are sure that you need to?

Well, we have to find a way to simulate running a program on C if we're given a way to generate C's graph. That's not very hard! Let me show you how to do it in a self-delimiting way. You just start generating more and more of the graph of C and you only read an additional bit of your program p if you see that you're forced to because a proper extension of p is in the domain of C's graph. And of course if you find exactly the program p that you've read so far is in the graph of C, then you know the output that you have to produce, and you return that value and stop.

In other words, start with the null program p, with p = the empty bit string, and generate the (program, output) pairs. If you find (p, output), then you emit that output and stop. If you find (extension of p, output), then you read another bit of p. And then you keep generating more and more of the (program, output) pairs and continually repeating this search process to see if what you've read of the program so far, p, is in one of the (program, output) pairs, or if an extension of it is there. That's it!

Now let me show you the LISP program for doing this!

How to run a computation C(p) given the graph of the computer C

So we're given a never-ending LISP expression that generates the graph of C. It displays the LISP pairs (p C(p)). I'll show you a prefix that we can slap in front of this program for the graph of C, to get a simulation program σC that can actually run programs. [Except in this chapter, this graph will always be generated by using the Kraft inequality algorithm (that we'll see in the next chapter).]

Here's a run; it's pretty much self-explanatory. The graph of C is

{ (1,0), (01,1), (001,2), (0001,3), (00001,4), ... }

and we're given the program p = 0001, so C(p) = 3.

[[[[[
   Given an expr to enumerate (program output) pairs,
   we simulate the Turing machine defined this way.
   We assume this r.e. set of programs is prefix free,
   i.e., no extension of a valid program is a valid
   program.  If so, we will carry out this simulation
   in a self-delimiting way, i.e., we won't read any
   unnecessary bits of the program.
]]]]]
 
define pi
 
[this is the prefix to put in front of the expr to
enumerate the infinite set of (program output) pairs]
 
    [graph is an unending expression for (p o) pairs]
    let graph read-exp
 
    [program read so far; initialize to empty bit string]
    let p nil
 
    let (look-for p [in] pairs)
       if atom pairs false
       [(add new macro caar -> car car to interpreter?)]
       if = p car car pairs [does first pair start with p?]
          car pairs   [if so, return first pair]
          [otherwise, keep looking]
          (look-for p [in] cdr pairs)
 
    let (look-for-extension-of p [in] pairs)
       if atom pairs false
       if (is-prefix? p car car pairs)
          true
          (look-for-extension-of p [in] cdr pairs)
 
    let (is-prefix? p q) [is p a prefix of q?]
       if atom p true
       if atom q false
       if = car p car q
          (is-prefix? cdr p cdr q)
          false
 
    let (loop t)
       [run for time t expr to generate (program output) pairs]
       [pairs are displayed by graph]
       let pairs debug caddr try debug t graph nil
       let found-it? (look-for p pairs) [found pair with program p?]
       if found-it? cadr found-it? [if so, we have output for p!]
       [(if found-it? isn't false, then it's considered true)]
       [is an extension of p in there?]
       if (look-for-extension-of p [in] pairs)
                  [if so, read another bit of p]
           [then] let p debug append p cons read-bit nil
                     (loop + t 1) [and generate more pairs]
                  [don't read more of p, just generate more pairs]
           [else] (loop + t 1)
 
    [initialize time t to 0]
    (loop 0)
 
define      pi
value       ((' (lambda (graph) ((' (lambda (p) ((' (lambda (l
            ook-for) ((' (lambda (look-for-extension-of) ((' (
            lambda (is-prefix?) ((' (lambda (loop) (loop 0)))
            (' (lambda (t) ((' (lambda (pairs) ((' (lambda (fo
            und-it?) (if found-it? (car (cdr found-it?)) (if (
            look-for-extension-of p pairs) ((' (lambda (p) (lo
            op (+ t 1)))) (debug (append p (cons (read-bit) ni
            l)))) (loop (+ t 1)))))) (look-for p pairs)))) (de
            bug (car (cdr (cdr (try (debug t) graph nil)))))))
            )))) (' (lambda (p q) (if (atom p) true (if (atom
            q) false (if (= (car p) (car q)) (is-prefix? (cdr
            p) (cdr q)) false)))))))) (' (lambda (p pairs) (if
             (atom pairs) false (if (is-prefix? p (car (car pa
            irs))) true (look-for-extension-of p (cdr pairs)))
            )))))) (' (lambda (p pairs) (if (atom pairs) false
             (if (= p (car (car pairs))) (car pairs) (look-for
             p (cdr pairs))))))))) nil))) (read-exp))
 
[graph = (1 0) (01 1) (001 2) (0001 3) (00001 4) etc.]
define graph
    let (loop p n)
  [do!] cons display cons p cons n nil
             (loop  cons 0 p  + 1 n)
    (loop '(1) 0)
 
define      graph
value       ((' (lambda (loop) (loop (' (1)) 0))) (' (lambda (
            p n) (cons (display (cons p (cons n nil))) (loop (
            cons 0 p) (+ 1 n))))))
 
[test it!]
 
try 10 graph nil
 
expression  (try 10 graph nil)
value       (failure out-of-time (((1) 0) ((0 1) 1) ((0 0 1) 2
            ) ((0 0 0 1) 3) ((0 0 0 0 1) 4) ((0 0 0 0 0 1) 5)
            ((0 0 0 0 0 0 1) 6) ((0 0 0 0 0 0 0 1) 7) ((0 0 0
            0 0 0 0 0 1) 8)))
 
run-utm-on
 
append
 
     bits pi
 
append
 
     bits graph
 
     '(0 0 0 1)
 
expression  (car (cdr (try no-time-limit (' (eval (read-exp)))
             (append (bits pi) (append (bits graph) (' (0 0 0
            1)))))))
debug       0
debug       ()
debug       1
debug       ()
debug       2
debug       (((1) 0))
debug       (0)
debug       3
debug       (((1) 0) ((0 1) 1))
debug       (0 0)
debug       4
debug       (((1) 0) ((0 1) 1) ((0 0 1) 2))
debug       (0 0 0)
debug       5
debug       (((1) 0) ((0 1) 1) ((0 0 1) 2) ((0 0 0 1) 3))
debug       (0 0 0 1)
debug       6
debug       (((1) 0) ((0 1) 1) ((0 0 1) 2) ((0 0 0 1) 3) ((0 0
             0 0 1) 4))
value       3

Software for this chapter

The source code and runs of the software for this chapter can be found on the web at these URL's:
   http://www.cs.umaine.edu/~chaitin/ait/exec.l
   http://www.cs.umaine.edu/~chaitin/ait/exec.r
 
   http://www.cs.auckland.ac.nz/CDMTCS/chaitin/ait/exec.l
   http://www.cs.auckland.ac.nz/CDMTCS/chaitin/ait/exec.r

Exercises

  1. Try running a different example. So you'll have to modify my LISP code and run it until it works!

  2. Show by concatenating programs that

    P(x,y) ≥ P(x) × P(y) × 2c
    −log2 P(x,y) ≤ −log2 P(x)P(y) + c.

    Determine the value of c!