In other words, programs for C also work for U; they just have to be lengthened a little, by |σC| bits. You just slap σC on in front, and that explains to U in LISP how C works!
Now let's define the complexity and the algorithmic probability of a LISP S-expression x if we are using a particular self-delimiting binary computer C:
In other words, HC(x) is the size in bits of the smallest program for C to compute x. And PC(x) is the probability that C computes x when we generate the program p using an independent toss of a fair coin each time C requests another bit of p. This ``probabilistic process'' interpretation of the sum
suffices to establish that this sum indeed converges and must be between zero and one.
[Note: we'll consider HC(x) to be infinite, and PC(x) will be zero, if C cannot compute x.]
By the way, it follows at once that
The message of Part II will be that in order to study the properties of this complexity measure, of the size of the smallest program for x, you actually have to look at all the programs that calculate x, you have to look at the probability of getting x at random. In other words, to understand H, you actually have to ``dig deeper'' and study P. H is the visible part of an iceberg, P is the submerged part. You have to weigh together all the programs that give x, not just look at the minimum-size ones, even if you're only interested in the minimum-size ones!
For any such machine C we also have a ``halting probability'':
Since it's a well-defined probability, we deduce that this sum converges and is between zero and one.
is also between zero and one and is the halting probability of the particular universal self-delimiting binary computer U that we programmed in LISP in the previous chapter.
For C = U I usually omit the subscript U in HU, PU and ΩU.
Recall that U is ``universal'' because each self-delimiting binary computer C can be simulated by U by adding a LISP prefix σC:
You just slap σC on in front, and that explains to U how C works. What is the significance of U's universality? Why have we picked this particular U as the basis for AIT? Well, it's because U gives us an essentially minimal complexity measure:
And it gives us an essentially maximal probability measure:
So any two universal computers U give essentially the same H and P. I picked this particular U because it was the simplest computer that I could construct that can be used as the foundation for AIT. It was the easiest way I could think of to actually write and run the programs that come up in my theory.
To repeat, we normally employ these abbreviations:
We've already seen this in the previous chapter, for the case C = U. In that case I can omit the subscript U and just write H(x,y) and P(x,y).
The reader should note that this introduces a slight ambiguity in our theory. Why? Because a program p for C to produce two results is also a valid program for producing a single S-expression, the pair! It's only from the context that we know whether we're using p to calculate two things or just one thing!
This may seem worrisome, but in fact in practice there is never any ambiguity, because when we're given a program we always know what it's for. So this will not complicate our theory.
There's also another possible source of ambiguity. A program p returns a final value, but it can also display intermediate results before producing its final value. When we're interested in generating infinite sets of S-expressions, we'll ignore final values and only look at what's displayed. Other times we care only about the final value, not about any intermediate results.
As long as one keeps these ambiguities in mind, they never cause any problem. It just means that what we consider to be the output of a program will depend on the context, and that we could in theory use the same program for different purposes, even though in practice this won't occur.
[We already encountered this ambiguity problem in our proof that
in the previous chapter; I just swept it under the rug and hoped that you didn't notice anything!]
summed over all programs p for U to compute y from x*, where U(x*) = x and |x*| = H(x).
Note that the probability P(y|x) is called ``conditional'', while the complexity H(y|x), which we defined in the previous chapter, is called ``relative''! This is because in one case we're using notation and terminology from probability theory, and in the other case we're using terminology from computability theory. In the last chapter of Part II we'll show that the word ``conditional'' is appropriate. One might even cross over and refer to H(y|x) as conditional information content and to P(y|x) as relative algorithmic probability, to emphasize the connection between algorithmic probability and program-size complexity that we'll establish later.
What is an r.e. (c.e.), [old name] recursively enumerable ([new name] computably enumerable) set of S-expressions? It's a set X of S-expressions that can be generated in some order, with or without repetitions, as the display output of our universal Turing machine U. For example, the set of halting programs, or the set of theorems provable within a given formal axiomatic system, is r.e. (c.e.). This is not the same as being able to decide if an S-expression x is in the set X. Turing's halting problem shows that these two concepts are different!
So this is a necessary condition on an r.e. (c.e.) set of pairs for it to be the graph of a self-delimiting binary computer C. And the point of this chapter is to show that this is also a sufficient condition for a set of ordered pairs to be the graph of a self-delimiting binary computer C. So C may be abstractly viewed as merely a set of (program, output) pairs with a prefix-free domain, in other words, in which no extension of a meaningful program is also a meaningful program.
And in the next chapter we'll construct computers that satisfy certain requirements by generating their graphs, the set of all their (program, output) pairs.
How can we prove that this abstract viewpoint is equivalent to the more concrete approach we've taken in the previous chapter, of reading in a bit at a time and only when you are sure that you need to?
Well, we have to find a way to simulate running a program on C if we're given a way to generate C's graph. That's not very hard! Let me show you how to do it in a self-delimiting way. You just start generating more and more of the graph of C and you only read an additional bit of your program p if you see that you're forced to because a proper extension of p is in the domain of C's graph. And of course if you find exactly the program p that you've read so far is in the graph of C, then you know the output that you have to produce, and you return that value and stop.
In other words, start with the null program p, with p = the empty bit string, and generate the (program, output) pairs. If you find (p, output), then you emit that output and stop. If you find (extension of p, output), then you read another bit of p. And then you keep generating more and more of the (program, output) pairs and continually repeating this search process to see if what you've read of the program so far, p, is in one of the (program, output) pairs, or if an extension of it is there. That's it!
Now let me show you the LISP program for doing this!
Here's a run; it's pretty much self-explanatory. The graph of C is
and we're given the program p = 0001, so C(p) = 3.
[[[[[
Given an expr to enumerate (program output) pairs,
we simulate the Turing machine defined this way.
We assume this r.e. set of programs is prefix free,
i.e., no extension of a valid program is a valid
program. If so, we will carry out this simulation
in a self-delimiting way, i.e., we won't read any
unnecessary bits of the program.
]]]]]
define pi
[this is the prefix to put in front of the expr to
enumerate the infinite set of (program output) pairs]
[graph is an unending expression for (p o) pairs]
let graph read-exp
[program read so far; initialize to empty bit string]
let p nil
let (look-for p [in] pairs)
if atom pairs false
[(add new macro caar -> car car to interpreter?)]
if = p car car pairs [does first pair start with p?]
car pairs [if so, return first pair]
[otherwise, keep looking]
(look-for p [in] cdr pairs)
let (look-for-extension-of p [in] pairs)
if atom pairs false
if (is-prefix? p car car pairs)
true
(look-for-extension-of p [in] cdr pairs)
let (is-prefix? p q) [is p a prefix of q?]
if atom p true
if atom q false
if = car p car q
(is-prefix? cdr p cdr q)
false
let (loop t)
[run for time t expr to generate (program output) pairs]
[pairs are displayed by graph]
let pairs debug caddr try debug t graph nil
let found-it? (look-for p pairs) [found pair with program p?]
if found-it? cadr found-it? [if so, we have output for p!]
[(if found-it? isn't false, then it's considered true)]
[is an extension of p in there?]
if (look-for-extension-of p [in] pairs)
[if so, read another bit of p]
[then] let p debug append p cons read-bit nil
(loop + t 1) [and generate more pairs]
[don't read more of p, just generate more pairs]
[else] (loop + t 1)
[initialize time t to 0]
(loop 0)
define pi
value ((' (lambda (graph) ((' (lambda (p) ((' (lambda (l
ook-for) ((' (lambda (look-for-extension-of) ((' (
lambda (is-prefix?) ((' (lambda (loop) (loop 0)))
(' (lambda (t) ((' (lambda (pairs) ((' (lambda (fo
und-it?) (if found-it? (car (cdr found-it?)) (if (
look-for-extension-of p pairs) ((' (lambda (p) (lo
op (+ t 1)))) (debug (append p (cons (read-bit) ni
l)))) (loop (+ t 1)))))) (look-for p pairs)))) (de
bug (car (cdr (cdr (try (debug t) graph nil)))))))
)))) (' (lambda (p q) (if (atom p) true (if (atom
q) false (if (= (car p) (car q)) (is-prefix? (cdr
p) (cdr q)) false)))))))) (' (lambda (p pairs) (if
(atom pairs) false (if (is-prefix? p (car (car pa
irs))) true (look-for-extension-of p (cdr pairs)))
)))))) (' (lambda (p pairs) (if (atom pairs) false
(if (= p (car (car pairs))) (car pairs) (look-for
p (cdr pairs))))))))) nil))) (read-exp))
[graph = (1 0) (01 1) (001 2) (0001 3) (00001 4) etc.]
define graph
let (loop p n)
[do!] cons display cons p cons n nil
(loop cons 0 p + 1 n)
(loop '(1) 0)
define graph
value ((' (lambda (loop) (loop (' (1)) 0))) (' (lambda (
p n) (cons (display (cons p (cons n nil))) (loop (
cons 0 p) (+ 1 n))))))
[test it!]
try 10 graph nil
expression (try 10 graph nil)
value (failure out-of-time (((1) 0) ((0 1) 1) ((0 0 1) 2
) ((0 0 0 1) 3) ((0 0 0 0 1) 4) ((0 0 0 0 0 1) 5)
((0 0 0 0 0 0 1) 6) ((0 0 0 0 0 0 0 1) 7) ((0 0 0
0 0 0 0 0 1) 8)))
run-utm-on
append
bits pi
append
bits graph
'(0 0 0 1)
expression (car (cdr (try no-time-limit (' (eval (read-exp)))
(append (bits pi) (append (bits graph) (' (0 0 0
1)))))))
debug 0
debug ()
debug 1
debug ()
debug 2
debug (((1) 0))
debug (0)
debug 3
debug (((1) 0) ((0 1) 1))
debug (0 0)
debug 4
debug (((1) 0) ((0 1) 1) ((0 0 1) 2))
debug (0 0 0)
debug 5
debug (((1) 0) ((0 1) 1) ((0 0 1) 2) ((0 0 0 1) 3))
debug (0 0 0 1)
debug 6
debug (((1) 0) ((0 1) 1) ((0 0 1) 2) ((0 0 0 1) 3) ((0 0
0 0 1) 4))
value 3
http://www.cs.umaine.edu/~chaitin/ait/exec.l http://www.cs.umaine.edu/~chaitin/ait/exec.r http://www.cs.auckland.ac.nz/CDMTCS/chaitin/ait/exec.l http://www.cs.auckland.ac.nz/CDMTCS/chaitin/ait/exec.r
Determine the value of c!