Tuesday, September 12, 2006

Busy Beaver hunter up and running!

A hastily-written python program hums quietly away on four different machines, patiently guessing solutions to the 5-node busy beaver problem. Hooray!! If I'd been thinking properly, I'd have written it in scheme and used gambit to compile it... the algorithm I'm using uses constant memory, but is quite greedy in terms of computation time, so every little bit of speed counts.

After about half an hour of running, one of the processes has found a machine that produces 12 marks after 61 cycles!

It would help prune down the search space enormously if I could prove there were equivalent turing machines out there that had non-isomorphic state graphs; especially if the result allowed us to prune down the possible labels for the edges of the state graph! That's where the really terrible performance comes from...

The potency of configurations

As a fun question to try to answer related to Turing machines... let Cn represent the set of all Turing templates of n states, as defined in the last post. Let P(c) represent the potency of configuration c∈Cn, defined to be the mean productivity of all the Turing machines t∈c.
The question: What's the distribution of P(c)?

Why it matters...

If P(c) has a Gaussian distribution with a fairly tight standard deviation, then the potency function is a useful heuristic that could be used to drive an A* search through Cn. A few samples of t∈c could establish whether c is worth examining.

It's not likely to be the case though, given that two machines in c could behave completely differently. This suggests looking for a different similarity metric between machines.

Monday, September 11, 2006

Hunting beavers: exhaustive generation without isomorphism

In the previous post, I decided to narrow my brute-force search for the 5-node Busy Beaver down to a smaller number of 5-node machines that at least stood a fair chance of halting and used all 5 states. I haven't proved this simplification doesn't exclude the Busy Beaver machine, but I have a strong intuition it doesn't.

The graph-theoretic reformulation

To make the search tractable, we're going to represent the busy beaver Turing machine with a graph. There are two symbols in the alphabet of the Turing machine, so each node in the graph representing a Turing machine "state" should have two outgoing edges. To guarantee that the machine at least contains a HALT instruction, add another node "E" representing the HALT instruction, and stipulate that there must be a path between the start node "S" and the HALT state "E". The situation: Suppose we have a directed graph of n nodes that allows loops and self-loops. Two nodes are distinguished: S and E. There must be a path from every node but E to E. All the nodes but E have two outgoing edges; E must have none. The question: How many non-isomorphic graphs of this form are there? The answer: 65 = 7,776 graphs. Few enough to be tractable!

Ugly details

Where did the magic 65 come from? A chain from S to E must exist, so start there. We can additionally ensure that every node has a path to E by having the chain from S to E run through every node:
The "backbone" of the isomorphism guarantees at least one HALT instruction and gives the machine a chance at reaching it
S 2 3 n E
? ? ? ? ?
That takes care of half the edges. The other n edges each choose one of the n+1 nodes independently, for a total of (n+1)n possibilities.

Applying the results

For each distinct graph found above, a number of machines can be generated by annotating the edges. For example, an annotation "1:1L" means "if I read a one, write a one, move the head left, and transition the Turing machine to the state at the end of this line." There are 2×4×4 = 32 possible annotation sets for each node, putting the solution space at 32n(n+1)n; a whopping 65325 (that's a 26-digit number in decimal!). Still too big to brute force, but still much smaller than the unrestricted solution space of (2n)8n (1040). The restrictions tighten the solution space by roughly 14 orders of magnitude for the n=5 case. Given that the constraints enforce the presence of at least one HALT, and the use of the entire 5 states of the machine, it's easy to see why the stochastic method over the unrestricted state space fails to find answers. The chances of hitting on one of the machines that fits the constraints is roughly 1 to 1014 against; given that the constraint machines aren't guaranteed to halt makes this statistic even more abysmal. Note that this approach still overgenerates. It's easy to see if both outgoing edges of a node lead to the same node that there are only 16 possible labels for the node's edges. The graph approach looks promising. Possibly there are further sensible constraints that would reduce the number of potential non-isomorphic graphs

Wednesday, September 06, 2006

Busy Beaver

Note: This article is in progress. The assignment is due Wednesday the 13th, so you can count on it being done sometime around then!

This problem comes to us by way of Tibor Radó's 1962 paper on the Busy Beaver function. A brief formulation goes like this: Suppose you define the productivity of a Turing machine with n states as the number of 1's written to the Turing machine's tape after it successfully halts. Represent the productivity as a function p(n). Call the machine with n states that maximizes p(n) the Busy Beaver Machine. Radó proved that no Turing machine can compute the value of p(n)1.

Our primary interest in the problem is to find the Busy Beaver Machine for n=5 in order to win a contest for class: who can write the 5-state Turing machine that halts and writes the most ones to an initially blank tape?

For an intuition about the magnitudes involved in this problem, previous research turns up the current leader in the n=5 case at Pascal Michel's page. This particular machine (found in 1989 by Heiner Marxen and Jürgen Buntrock) produces 4,098 marks before halting after 47,176,870 steps.

Formalizing the problem a bit

The Busy Beaver problem, being a subset of Turing literature, requires a bit further definition. Our Turing machine has the following form:

0 1
1 Snew D Inew
The instructions arranged into rows corresponding to the states of the Turing machine, and into columns based on the contents of the cell at the tape's current location. The instruction cell consists of three components: what to write at the current location, which way to move the tape, and what state to transition to next.

First approach: the stochastic method

The stochastic method builds random Turing machines and tests them. Because the Halting Problem is also not solvable, we run the Turing machine for an arbitrary amount of time and stop the machine if it hasn't stopped itself by then. The stochastic method is a rather naive method. Since it's not an exhaustive search of the state space, it's not possible to determine whether p(5) has been maximized or not. The best we can do is run the stochastic machine builder, keeping track of the most performant candidates, until we run out of time for the contest. It's important to understand the size of the search space. Suppose we examine each cell in the 2×n table individually and assign it an instruction. We choose Snew, D, and G independently for each instruction, giving 2×2×5 = 20 different operating instructions. Add to that the halt instruction and the total is 21 instructions, for a total of 1021 combinations! If the stochastic method finds any solution at all, it's by sheer luck. My python process has been running several hours, and the best it's come up with is 5-state Turing machine that produces 1 mark on the tape and halts. Obviously, even a human can do better than this!

Intuitions about behavior may improve our chances

Intuition 1: Avoiding Isomorphisms

The intuition here is that much of the state space is populated by clones—that is, equivalent Turing machines. Any machine M' which is equivalent to the halting machine M also halts and additionally produces the same number of marks on the tape. If a brute-force method could avoid generating all the isomorphisms of a halting machine, the state space to be searched might be significantly reduced. This approach, done without care, may produce a crippling effect on memory usage as each halting machine is stored so the algorithm can avoid producing its isomorphisms. Additionally, if the generator randomly generates new machines and then attempts to prove to itself whether it's an isomorphism of a known machine is likely to be more costly than simply running the machines themselves. What's needed is an algorithm for enumerating the non-isomorphic machines with n states that doesn't involve storing history.

Intuition 2: Graph Theory

Enter graph theory. By reformulating the question in terms of graph theory, we can take advantage of the powerful methods and proofs already developed in this domain. To simplify the machine representation, let's only examine the potential instruction sequences instead of the entire machine. Let's build a directed graph of the states. We'll represent the special HALT instruction as an extra node, giving n + 1 nodes to the state graph. We'll represent state transitions with colored edges to indicate that the state transitions are not fungible. First, some obvious observations on the necessary nature of this graph. The state graph should be connected, with no islands. By this I mean from any node it should be possible to reach the halt node. Otherwise the machine has effectively fewer than n states, and it's been proven that p(n) > p(n') where n > n'. This constraint guarantees that it is at least plausible that the machine will eventually reach the special halt node. Additionally, this helps the machine avoid some obvious infinite loops—for example, a state where both outgoing edges lead back to itself. The graph may be cyclic without necessarily causing the machine to run forever—the state graph doesn't encompass enough information to determine this (and in fact this is a good thing---we're hoping by simplifying the problem we can make it tractable. We already know the problem is not directly soluble, we're just trying to find a practical approximation). Fixing the state graph is a big win. For a given n-state graph, there are now only O(n2) machines (in reality, this number is slightly smaller—at least one of the instructions is the halt instruction, which has no second component). The next question in combinatorial analysis is: how many non-isomorphic, connected graphs of n+1 nodes, where n of the nodes have two distinct outgoing edges, and the special halt node has no outgoing edges? Finally, the last question is: how do we generate these graphs?

Intuition n: The optimal number of HALTs

The question of how many HALT instructions to include in the machine is intuitively related to the bridge-crossing question. Look at it this way: we must start at state 0, and to get the best of the machine we should try to use every state possible. In some cases, this is an even number of crossings, in some cases, an odd number. This affects the number of HALT instructions that need to be available (or does it? After all, we're not restricted to crossing a bridge only once—we're restricted to only crossing bridges that exist. That is, we can't modify the machine after it starts running). This most likely will turn out to be a false intuition.
1. http://plato.stanford.edu/entries/turing-machine/