How to construct self-delimiting Turing machines: the Kraft inequality

Introduction---Building a computer

In this chapter I'll present a very general technique, which I'll call Kraft, for constructing self-delimiting computers by generating their input/output graphs. This is our most basic tool. We'll use it many times: over and over again in what's left of Part II and in Part III. In fact, using this technique is at the heart of the next two chapters, it's really what these two chapters are about.

Kraft says that if we can generate an infinite list of (output, size of program) requirements for building a self-delimiting binary computer C (and repetitions are allowed in a list), then we can generate C's graph, the infinite set of all C's (program, output) pairs. All this works as long as

∑   (1/2)requested size of program

summed over all the requirements is ≤ 1. I.e., as long as the total algorithmic probability that we are requested to consume is ≤ 1:

ΩC = ∑C(p) halts 2−|p| ≤ 1.

That's it! It's a necessary and sufficient condition for the existence of a computer C satisfying these requirements!

So this really tells us a lot about how to construct self-delimiting computers.

That

ΩC ≤ 1

is called the Kraft inequality in conventional Shannon-style information theory, so I call it that here, even though this theorem is mine, [It's stated and proved for the first time in my paper ``A theory of program size formally identical to information theory,'' J.ACM 22 (1975), pp. 329-340. This paper is also the source of the theorems in the previous and in the next three chapters, unless another reference is explicitly given. Kraft's original theorem deals with finite prefix-free sets of words.] not Kraft's.

Kraft inequality algorithm for constructing computers

This chapter's algorithm takes as input requirement pairs (output, size of program) and produces as output ``assignment pairs'' (program, output). We assume that the requirements are consistent. I.e., that the sum of 1/2 raised to each size is ≤ unity. Then we can produce a set of assignments that meets the requirements and is prefix free, i.e., no extension of a valid program is a valid program. The basic data structure in this program is the free space pool. That's a list of prefixes all of whose extensions (by zero or more bits) are unassigned programs. Initially this is the list consisting of the empty string because everything is free. The algorithm consists of assigning that program that meets each requirement that is available and that comes first in alphabetical order (0 before 1). The free space pool is kept in alphabetical order, to facilitate searching. So the algorithm is to look for the first prefix in the pool that is ≤ the requested size in each requirement. If it has exactly the requested size, then that prefix is the program assigned to the output in the requirement. If not, the prefix must be smaller than the requirement, and represents a piece of free storage (probability) that is a power of two larger than needed. So as program we assign the prefix extended by sufficiently many 0's to reach the requested size, and then the prefix is replaced in the free space pool by prefix00001, prefix0001, prefix001, prefix01, prefix1, for all sizes down from the assigned size. This will always work if the Kraft inequality is satisfied.

For example, if this algorithm is given the requirements

      (0 1) (1 2)  (2 3)   (3 4)...
(i.e., a 1-bit p for 0, a 2-bit p for 1, a 3-bit p for 2, etc.) it will produce the assignments
      (0 0) (10 1) (110 2) (1110 3)...
(i.e., p = 0 yields 0, p = 10 yields 1, p = 110 yields 2, etc.). In this case the free storage pool will always only contain a single prefix, one that consists entirely of 1's.

Key insight: Consider each binary program p to simultaneously be the base-two fraction .p in the unit interval of reals between zero and one.

Key fact: if you follow this first-fit algorithm, the free space will appear in blocks in the unit interval program space whose sizes (probability) are all distinct negative powers of two, and in order of increasing size. So if an allocation cannot be made, the piece requested must be larger (at least twice as large) as the last and largest piece. But the total free storage (probability!) is less than twice the size of the last and largest free piece, so allocating this new piece would have violated the Kraft inequality.

To actually run the assigned programs, we have to feed the output of kraft.l into exec.l. So there will in fact be an additional prefix in front of the assigned programs to tell U how to do kraft.l and how to do exec.l.

Kraft is written as a function that is applied to a finite list of requirements and produces the corresponding finite list of assignments. Exec.l will use try to run the requirement generator for more and more time to produce longer and longer lists of requirements. It will then use the Kraft function to transform these into longer and longer lists of assignments, which it will use to actually run individual programs by reading them bit by bit as required. The Kraft function has the property that applying it to a longer list of requirements just produces a longer list of assignments. I.e., it's monotone, it never changes its mind.

Now let's run this software and give it a few simple test cases:

[[[[[
   KRAFT INEQUALITY CRITERION FOR CONSTRUCTING COMPUTERS
   Take as input requirement pairs (output, size of program)
   and produce as output assignment pairs (program, output).
]]]]]
 
[used to assign a program to an output]
 
define (extend-with-0s bit-string [to] given-length)
   if = length bit-string given-length  [then]
      bit-string  [else]
   append (extend-with-0s bit-string [to] - given-length 1)
          cons 0 nil
 
define      extend-with-0s
value       (lambda (bit-string given-length) (if (= (length b
            it-string) given-length) bit-string (append (exten
            d-with-0s bit-string (- given-length 1)) (cons 0 n
            il))))
 
[test it]
 
(extend-with-0s '(1 1 1) [to] 6)
 
expression  (extend-with-0s (' (1 1 1)) 6)
value       (1 1 1 0 0 0)
 
(extend-with-0s '(1 1 1) [to] 5)
 
expression  (extend-with-0s (' (1 1 1)) 5)
value       (1 1 1 0 0)
 
(extend-with-0s '(1 1 1) [to] 4)
 
expression  (extend-with-0s (' (1 1 1)) 4)
value       (1 1 1 0)
 
(extend-with-0s '(1 1 1) [to] 3)
 
expression  (extend-with-0s (' (1 1 1)) 3)
value       (1 1 1)
 
[used to subtract storage from a piece of free storage]
 
define (remove-piece free-prefix size-of-program)
   if = size-of-program length free-prefix
       nil  [then no storage left, else]
   cons append (extend-with-0s free-prefix [to] - size-of-program 1)
               cons 1 nil
        (remove-piece free-prefix - size-of-program 1)
 
define      remove-piece
value       (lambda (free-prefix size-of-program) (if (= size-
            of-program (length free-prefix)) nil (cons (append
             (extend-with-0s free-prefix (- size-of-program 1)
            ) (cons 1 nil)) (remove-piece free-prefix (- size-
            of-program 1)))))
 
[test it]
 
(remove-piece '(1 1 1) 6)
 
expression  (remove-piece (' (1 1 1)) 6)
value       ((1 1 1 0 0 1) (1 1 1 0 1) (1 1 1 1))
 
(remove-piece '(1 1 1) 5)
 
expression  (remove-piece (' (1 1 1)) 5)
value       ((1 1 1 0 1) (1 1 1 1))
 
(remove-piece '(1 1 1) 4)
 
expression  (remove-piece (' (1 1 1)) 4)
value       ((1 1 1 1))
 
(remove-piece '(1 1 1) 3)
 
expression  (remove-piece (' (1 1 1)) 3)
value       ()
 
[
   Make-assignments uses debug to show us the
   free-space-pool after making each assignment.
   If an assignment cannot be made because
   there is not enough storage, we just skip it.
]
define (make-assignments free-space-pool requirements)
 
   let free-space-pool   debug free-space-pool
                         [Done just to show pool!]
 
   if  atom requirements nil  [no requirements => no assignments]
                              [If so, we're finished!]
 
   let requirement       car  requirements
   let requirements      cdr  requirements
   let output-of-program car  requirement
   let size-of-program   cadr requirement
   let already-scanned   nil
   let not-yet-scanned   free-space-pool
 
   let (loop-thru-free-space-pool) [DEFINE IT!]
 
       if atom not-yet-scanned [cannot make this assignment!]
          [indicate problem and go to next requirement]
          cons not-enough-storage!
               (make-assignments free-space-pool requirements)
 
       let free-prefix      car not-yet-scanned
       let not-yet-scanned  cdr not-yet-scanned
       if < size-of-program length free-prefix
 
          [doesn't fit --- continue scanning]
          let already-scanned
              append already-scanned
                     cons free-prefix nil
          (loop-thru-free-space-pool)
 
       [fits! --- make assignment]
       [add to list of rest of assignments]
       let free-space-pool
           append already-scanned
           append (remove-piece free-prefix size-of-program)
                  not-yet-scanned
 
       let assignment
           [make assignment - add to list of rest of assignments]
           [found free piece where it fits!]
           [extend with zeros to correct size]
           cons (extend-with-0s free-prefix [to] size-of-program)
           cons output-of-program
                nil
 
       [return full list of assignments]
       cons assignment
            (make-assignments free-space-pool requirements)
 
   [NOW DO IT!]
   (loop-thru-free-space-pool)
 
define      make-assignments
value       (lambda (free-space-pool requirements) ((' (lambda
             (free-space-pool) (if (atom requirements) nil (('
             (lambda (requirement) ((' (lambda (requirements)
            ((' (lambda (output-of-program) ((' (lambda (size-
            of-program) ((' (lambda (already-scanned) ((' (lam
            bda (not-yet-scanned) ((' (lambda (loop-thru-free-
            space-pool) (loop-thru-free-space-pool))) (' (lamb
            da () (if (atom not-yet-scanned) (cons not-enough-
            storage! (make-assignments free-space-pool require
            ments)) ((' (lambda (free-prefix) ((' (lambda (not
            -yet-scanned) (if (< size-of-program (length free-
            prefix)) ((' (lambda (already-scanned) (loop-thru-
            free-space-pool))) (append already-scanned (cons f
            ree-prefix nil))) ((' (lambda (free-space-pool) ((
            ' (lambda (assignment) (cons assignment (make-assi
            gnments free-space-pool requirements)))) (cons (ex
            tend-with-0s free-prefix size-of-program) (cons ou
            tput-of-program nil))))) (append already-scanned (
            append (remove-piece free-prefix size-of-program)
            not-yet-scanned)))))) (cdr not-yet-scanned)))) (ca
            r not-yet-scanned)))))))) free-space-pool))) nil))
            ) (car (cdr requirement))))) (car requirement))))
            (cdr requirements)))) (car requirements))))) (debu
            g free-space-pool)))
 
[put it all together]
 
define (kraft requirements)
 
   let free-space-pool '(()) [everything free]
 
   (make-assignments free-space-pool requirements)
 
define      kraft
value       (lambda (requirements) ((' (lambda (free-space-poo
            l) (make-assignments free-space-pool requirements)
            )) (' (()))))
 
[TEST KRAFT!]
 
(kraft '((x 0) (y 1)))
 
expression  (kraft (' ((x 0) (y 1))))
debug       (())
debug       ()
debug       ()
value       ((() x) not-enough-storage!)
 
 
(kraft '((a 1) (b 0) (c 1)))
 
expression  (kraft (' ((a 1) (b 0) (c 1))))
debug       (())
debug       ((1))
debug       ((1))
debug       ()
value       (((0) a) not-enough-storage! ((1) c))
 
 
(kraft '((x 1) (y 2)))
 
expression  (kraft (' ((x 1) (y 2))))
debug       (())
debug       ((1))
debug       ((1 1))
value       (((0) x) ((1 0) y))
 
 
(kraft '((a 1) (b 2) (c 3) (d 4) (e 5)))
 
expression  (kraft (' ((a 1) (b 2) (c 3) (d 4) (e 5))))
debug       (())
debug       ((1))
debug       ((1 1))
debug       ((1 1 1))
debug       ((1 1 1 1))
debug       ((1 1 1 1 1))
value       (((0) a) ((1 0) b) ((1 1 0) c) ((1 1 1 0) d) ((1 1
             1 1 0) e))
 
 
(kraft '((e 5) (d 4) (c 3) (b 2) (a 1)))
 
expression  (kraft (' ((e 5) (d 4) (c 3) (b 2) (a 1))))
debug       (())
debug       ((0 0 0 0 1) (0 0 0 1) (0 0 1) (0 1) (1))
debug       ((0 0 0 0 1) (0 0 1) (0 1) (1))
debug       ((0 0 0 0 1) (0 1) (1))
debug       ((0 0 0 0 1) (1))
debug       ((0 0 0 0 1))
value       (((0 0 0 0 0) e) ((0 0 0 1) d) ((0 0 1) c) ((0 1)
            b) ((1) a))
 
 
(kraft '((e 5) (c 3) (d 4) (a 1) (b 2)))
 
expression  (kraft (' ((e 5) (c 3) (d 4) (a 1) (b 2))))
debug       (())
debug       ((0 0 0 0 1) (0 0 0 1) (0 0 1) (0 1) (1))
debug       ((0 0 0 0 1) (0 0 0 1) (0 1) (1))
debug       ((0 0 0 0 1) (0 1) (1))
debug       ((0 0 0 0 1) (0 1))
debug       ((0 0 0 0 1))
value       (((0 0 0 0 0) e) ((0 0 1) c) ((0 0 0 1) d) ((1) a)
             ((0 1) b))

Software for this chapter

The source code and runs of the software for this chapter can be found on the web at these URL's:
   http://www.cs.umaine.edu/~chaitin/ait/kraft.l
   http://www.cs.umaine.edu/~chaitin/ait/kraft.r
 
   http://www.cs.auckland.ac.nz/CDMTCS/chaitin/ait/kraft.l
   http://www.cs.auckland.ac.nz/CDMTCS/chaitin/ait/kraft.r

Exercises

  1. Play with the Kraft software. Try feeding it other interesting sequences of requirements to see how it behaves. Try to get a feeling for how the algorithm works.

  2. The crucial property that makes our Kraft algorithm work is that at each stage the free storage pool consists of pieces of storage that are all different negative powers of two in size (in the probability space) and that are arranged in the pool in order of increasing size. In other words, the free program prefixes in the pool are arranged in order of decreasing size, and no two of them are the same size. This is the ``induction hypothesis'' in the proof that our algorithm works. Check that this property always holds. To do this, add a function to the Kraft software that examines the free storage pool when it is displayed at the beginning of make-assignments, in the line of code with debug free-space-pool in it. This induction hypothesis checker should use debug to display (induction hypothesis verified) or (induction hypothesis fails!), as the case may be.

  3. Add code to make-assignments for displaying how much free storage (probability!) is left at each point, in the form of a base-two binary fraction between zero and one. This code should also check that the bits that are on in this binary fraction correspond precisely to the blocks of free storage in the pool. That is, if the bit corresponding to 2k is a 1 bit, then there should be a free program prefix that is k bits long, and if that bit is a 0, then hopefully no k-bit prefix is in the free space pool. Once you verify this more precise formulation of the induction hypothesis for our algorithm, can you use it to devise a new and improved Kraft algorithm? If so, do it! If not, why not?

  4. Try combining the software in this chapter with that in the previous chapter. In other words, take a list of (output, size of program) requirements, use kraft to convert it into a list of (program, output) pairs (the graph of a computer C's input/output function), and then feed this into the software for running a given program p while generating the input/output graph of C.

  5. Next, take this combined software and insert the unending algorithms for generating requirements that we construct in the next two chapters. By combining this chapter's and the previous chapter's software with that in each of the next two chapters, you'll get the complete LISP software for proving the two main theorems of Part II, namely,

    H(x) ≤ −log2 PC(x) + c
    H(y|x) ≤ HC(x,y) − H(x) + c'.

  6. And now that you've put together all the software for proving each of the two main theorems of Part II, you can, if you wish, by looking at the total size of the code in each case, get explicit numerical values for the constants c and c' for the particular case that C = U. (That's the case that's important theoretically, but it's not a good test case, which is why we use toy computers C in the next two chapters, not C = U.)

  7. Finally, use these explicit numerical values for c and c' to bound these differences:

    Δ1(x) = | H(x) + log2 P(x) |
    Δ2(x,y) = | H(x) + H(y|x) − H(x,y) |.

    We show in Part II that these two Δ's are O(1), but it would be nice to have explicit numerical upper bounds.