Algorithmic approaches to solving the Rubik's cube

Material Information

Algorithmic approaches to solving the Rubik's cube
Fowler, Joe
Place of Publication:
Denver, Colo.
University of Colorado Denver
Publication Date:
Physical Description:
47 leaves : illustrations ; 28 cm

Thesis/Dissertation Information

Master's (Master of Science)
Degree Grantor:
University of Colorado Denver
Degree Divisions:
Department of Computer Science and Engineering, CU Denver
Degree Disciplines:
Computer Science
Committee Chair:
Clark, John
Committee Members:
McConnell, Ross
Gethner, Ellen
Cios, Krzysztof


Subjects / Keywords:
Rubik's Cube ( lcsh )
Algorithms ( lcsh )
Algorithms ( fast )
Rubik's Cube ( fast )
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )


Includes bibliographical references (leaves 44-47).
General Note:
Department of Computer Science and Engineering
Statement of Responsibility:
by Joe Fowler.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
54670937 ( OCLC )
LD1190.E52 2003m F68 ( lcc )

Full Text
Joe Fowler
B.S., University of Colorado in Boulder, 1995
A thesis submitted to the
University of Colorado in Denver
in partial fulfillment
of the requirements for the degree of
Master of Science
Computer Science

This thesis for the Master of Science
degree by
Joe Fowler
has been approved
Ross McConnell
Ellen Gethner
Krzysztof Cios

Fowler, Joe (M.S., Computer Science)
Algorithmic Approaches to Solving the Rubik's Cube
Thesis directed by Professor John Clark
This thesis explores the standard algorithmic approaches, based on various types of
search algorithms, to optimally and suboptimally solve the standard 3x3x3 Rubik's
Cube. As part of the original work done for this thesis, it includes an original
method that could potentially give solutions shorter than the standard solutions
based upon macros. This theoretical approach is based upon Bayesian reasoning.
If it could be implemented, it would have the distinction of being able to solve a
given random instance without resorting to search, which all other standard
algorithms require.
This abstract accurately represents the content of the candidate's thesis. I
recommend its publication.
John Clark

1. Introduction
1.1 Description of the 3x3x3 Cube
1.2 Difficulty of the Rubik's Cube
1.3 Types of Pieces
1.4 Types of Moves
1.5 Singmaster Notation
1.6 Fundamental Theorems of Cubology
1.6.1 Formal Definition of State of the Cube
1.6.2 First Fundamental Theorem of Cubology
1.6.3 Second Fundamental Theorem of Cubology
1.7 Description of the 2x2x2 Cube
1.8 Orders of the 2x2x2 and 3x3x3 Cubes
1.9 Diameter of the 2x2x2 Cube
1.10 Diameter of the 3x3x3 Cube
1.11 Cayley Graph of the Cube
1.12 Branching Factor of the Cube
1.13 Further Background on the Cube
2. Standard Optimal Approaches
2.1 God's Algorithm
2.2 Basic Tree Searches
2.2.1 Uninformed Searches
2.2.2 Informed Searches
2.3 Iterative-Deepening A*
2.3.1 Standard IDA*
2.3.2 IDA* with Pattern Databases
3. Standard Suboptimal Approaches
3.1 Macro Operators
3.2 Important Subgroups of the Cube
3.3 Thistlethwaite's Algorithm
3.4 Kociemba's Algorithm
4. Proposed Approach: Bayesian Reasoning
4.1 Overview of Proposed Bayesian Approach
4.2 Problems with Reinforcement Learning
4.3 Description of Bayesian Approach
4.4 Conclusions Regarding Bayesian Approach
A. Some Group Theory Background of Rubik's Cube
B. Symmetry of the Cube
References

Figure 1-1 Standard 3x3x3 Rubik's Cube
Figure 1-2 The 15-Puzzle
Figure 1-3 Number of states vs. distance to goal in HTM for 2x2x2 Cube
Figure 1-4 Number of states vs. distance to goal in QTM for 2x2x2 Cube
Figure 1-5 The superflip of the Cube
Figure B-1 The 13 axes of rotational symmetry of the cube
Figure B-2 Three orthogonal planes of reflection for the cube
Figure B-3 Six diagonal planes of reflection for the cube

Table 1-1 Number of states per distance to goal for 2x2x2 Rubik's Cube
Table 1-2 Calculation of expected number of moves for the 2x2x2 Cube
Table 2-1 Depth searched for 10 random instances using IDA* search
Table 3-1 Orders and indices for nested subgroups of restricted maneuvers
Table 3-2 Number of moves for each stage in Thistlethwaite's algorithm
Table B-1 Summary of rotations of the cube

Before detailing the various methods to solve the Rubik's Cube, several
related concepts must first be introduced.
1.1 Description of the 3x3x3 Cube
The standard 3x3x3 Rubik's Cube in Figure 1-1, also known as the Bűvös
Kocka, which is Hungarian for magic cube, was first conceived by Ernő Rubik in
1974 (cf. [RVKMV]). It is divided into three 3x3x1 slices along each of the x, y, and z
axes such that each slice can be rotated a quarter turn clockwise or counterclockwise
with respect to the rest of the cube. The slices divide each face of the Cube into 9
facelets. The slices also divide the Cube into 33 = 27 pieces, or subcubes, 26 of
which are visible. In its goal state each of the six sides has its own color. Each
visible piece has one, two or three facelets that are a unique subset of colors, which
makes it distinct from the others. This preserves the orientation of the piece as it
changes position as the slices are rotated. As such the Cube is a pure permutation
puzzle since any state can be characterized as some permutation of its pieces.
However, it is more complicated than other permutation puzzles like the 15-Puzzle
(cf. [Koe]) shown in Figure 1-2 since the orientations of the pieces change as the
puzzle is permuted.
Figure 1-1 Standard 3x3x3 Rubik's Cube1
1 Figure 1 was generated using Mathematica 4.1 for Students using rubikSim, a Mathematica
package from Imagineering LLC (http://www.imagineeringllc.com/info.html#rubik%20package).

1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 __
Figure 1-2 The 15-Puzzle
1.2 Difficulty of the Rubik's Cube
This added layer of complexity substantially increases the number of
possible states, which makes the Cube in general more challenging to solve than the
simpler puzzles like the 15-Puzzle. While it is easy to devise straightforward
algorithms to solve the 15-Puzzle by successively placing its pieces, it is more
difficult to apply the same approach to the Cube, since placing a single piece
requires moving almost all of the other pieces. This property, known as a
nonserializable sequence of subgoal states, defies standard AI (Artificial
Intelligence) learning techniques (cf. [Kor6]). As such, the only AI techniques
that have been applied involve some form of search.
1.3 Types of Pieces
Of the 27 subcubes, or cubies, 26 are visible, and of those only 20 actually
move: the 8 corner pieces with three facelets and the 12 edge pieces with two
facelets. The corner pieces only move to other corner positions, or cubicles, and
edge pieces only move to other edge cubicles. The center pieces, each with a single
facelet in the center of a face, merely rotate. The corner pieces can be twisted any
of three ways; likewise, the edge pieces can be flipped either of two ways. Thus,
the corner pieces have three orientations each, and the edge pieces have two
orientations each.
1.4 Types of Moves
By fixing the Cube with respect to the center pieces, it is possible to achieve
any state using outer layer moves only. Any move of a central layer can be thought
of as two moves of the adjacent outer layers in opposite directions. For that reason
a sequence of moves is given in terms of outer layer moves. The minimum number
of moves required to solve a given state is known as its distance from the goal state.
The maximum such distance is known as the diameter of the Cube, which can be
measured using one of two metrics.

The first is the Half Turn Metric (HTM) [also known as the Face Turn
Metric (FTM)], in which a move consists of a quarter or half turn of any of the outer
layers of the Cube. The second is the Quarter Turn Metric (QTM), which only
counts quarter turns, so a half turn is counted as two quarter turns.
Unless otherwise noted, this thesis will use the HTM when denoting the length of a
sequence of moves.
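The two metrics can be illustrated with a short sketch (not from the thesis; the move tokens use the common plain-text apostrophe shorthand for a counterclockwise turn):

```python
# A sketch (not from the thesis) of the two metrics.  Moves are written in a
# plain-text stand-in for Singmaster notation: "U" is a clockwise quarter
# turn, "U'" a counterclockwise quarter turn, and "U2" a half turn.

def htm_length(moves):
    """Half Turn Metric: every face turn, quarter or half, counts as one move."""
    return len(moves)

def qtm_length(moves):
    """Quarter Turn Metric: a half turn counts as two quarter turns."""
    return sum(2 if m.endswith("2") else 1 for m in moves)

maneuver = ["R", "U2", "F'", "B2", "L"]
print(htm_length(maneuver))  # 5
print(qtm_length(maneuver))  # 7
```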
Any sequence of moves which accomplishes some known purpose is called
a macro operator, or macro for short. For instance, a macro might permute three
corner pieces without disturbing any of the other pieces once the whole macro has
been applied. It is possible for two macros to accomplish the exact same purpose
while being entirely different sequences. Macro operators can be used to adapt the
successive placement of pieces approach described above. A majority of the
solutions that have been devised to solve the Cube by hand use this method
(cf. [Fr], [Pe]).
1.5 Singmaster Notation
Singmaster notation devised by David Singmaster (cf. [SiF]) describes a
sequence of moves. Clockwise turns of the six outer layers, or slices, are denoted
by the capital letters U, D, F, B, R and L (up, down, front, back, right and left).
Counterclockwise turns have a superscript of -1: U⁻¹, D⁻¹, F⁻¹, B⁻¹, R⁻¹ and L⁻¹.
Half turns have a superscript of two: U2, D2, F2, B2, R2 and L2. If a subsequence of
moves is repeated, then it is listed in parentheses with an exponent equal to the
number of repetitions. If the whole cube is rotated, then the Old English letters U,
D, F, B, R, and L represent clockwise rotations with respect to that face.
Since coloring of the Cube is not unique (cf. [Si2]), Singmaster notation is
also used to label each of the cubies and cubicles. Lowercase italicized letters given
in clockwise order of how the two or three sides intersect at a given corner or edge
are used to describe a cubie. Uppercase is used to denote cubicles. Since there are
either two or three possibilities depending on which side is listed first, the first letter
is given in order of U, D, F, B, R, L. While the cubicles never change orientation,
the cubies do, so this precedence in labeling cubies is with respect to the goal state.
Thus URF and not RFU would be used to denote the upper front right-hand
cubicle. The cubie corresponding to that cubicle would be denoted by urf if its
orientation has not changed from the goal state, and as either rfu or fur if it had. The
orientation is determined by keeping track of how the pieces move from cubicle to
cubicle with respect to the initial state. If an R move were applied to the goal state,
then the u facelet of the urf cubie would move to the b facelet of the UBR cubicle.
Thus u → b, r → r, and f → u, so the corner piece would then become fur.

Since any permutation is uniquely described by cycle notation (cf. [Rot]),
the effect of any macro is given in the cycle notation of the permutation of cubies
after the macro has been applied. Standard cycle notation for a given permutation
consists of a sequence of parenthesized lists for each cycle of the elements that are
permuted. The notion of cycle comes from the fact that as the same permutation is
applied over and over, an element is permuted through some sequence until it
reaches its initial position. This sequence is known as a cycle. As such, each
element is listed at most once; it is unlisted if it has been unaffected by the
permutation.
Each cycle contains a list of either corner pieces or edge pieces that have
been permuted. For instance, (urf bru drb frd)(ur br dr fr) would be the cycle
notation for the R move. If the cubies change orientation after going through a
single cycle, then a subscript is added to denote this change in orientation. If an
edge piece is flipped, a + subscript is added. If a corner piece is twisted clockwise, a
+ subscript is added, and if it is twisted counterclockwise, a - subscript is added.
Thus (ulb)+(ubr)- is an example of two twists of two corner pieces, which happens to
be the effect of the macro L(U2 L B⁻¹ D2 B L⁻¹)L⁻¹, one of the original
macros given by Ernő Rubik.
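As a sketch of how such cycle notation acts on the pieces (a hypothetical helper, not from the thesis; orientation subscripts are omitted, and lowercase labels do double duty for cubicles and cubies):

```python
# A sketch (hypothetical helper, not from the thesis) of how cycle notation
# acts on the Cube's pieces.  The state maps each cubicle to the cubie that
# occupies it; a cycle (a b c d) sends the occupant of a to b, b to c, c to
# d, and d back to a.  Orientation subscripts are omitted in this sketch.

def apply_cycles(cycles, state):
    new_state = dict(state)
    for cycle in cycles:
        for i, pos in enumerate(cycle):
            nxt = cycle[(i + 1) % len(cycle)]
            new_state[nxt] = state[pos]
    return new_state

# The R move permutes corners (urf bru drb frd) and edges (ur br dr fr).
solved = {p: p for p in ["urf", "bru", "drb", "frd", "ur", "br", "dr", "fr"]}
after_r = apply_cycles([("urf", "bru", "drb", "frd"),
                        ("ur", "br", "dr", "fr")], solved)
print(after_r["bru"])  # urf  -> the urf cubie now sits in the bru cubicle
```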
1.6 Fundamental Theorems of Cubology
The study of the Cube is known as cube theory, or cubology. The number
of legal states that are attainable by some sequence of elementary moves, i.e. a
quarter turn or half turn of an outer layer, from the goal state, is only one twelfth of
the total number of states that are possible if one physically disassembles a Cube
and reassembles it. The Fundamental Theorems compose an informal proof, i.e.
without resorting to group theory (cf. App. A), of the facts needed to
determine the total number of legal states. The notation that follows matches [Jol].
1.6.1 Formal Definition of State of the Cube
Let g be a state of the Cube. Then define the tuple (v(g), p(g), w(g), σ(g)):
Let p(g) be the permutation of the set {1, 2, ..., 8} (each number uniquely
labels one of the eight corner pieces) that corresponds to the permutation of the
corner pieces of state g.
Let v(g) be an 8-tuple vector of the integers -1, 0, and 1 (each number in the
tuple corresponds to the orientation of the corner piece with respect to p(g)) that
corresponds to the orientations of the corner pieces of state g in the 8-dimensional
vector space over the integers modulo 3.
Let σ(g) be the permutation of the set {1, 2, ..., 12} (each number uniquely
labels one of the twelve edge pieces) that corresponds to the permutation of the edge
pieces of state g.
Let w(g) be a 12-tuple vector of the integers 0 and 1 (each number in the
tuple corresponds to the orientation of the edge piece with respect to σ(g)) that
corresponds to the orientations of the edge pieces of state g in the 12-dimensional
vector space over the integers modulo 2.
Such a tuple, generally written more succinctly as (v, p, w, σ), uniquely
defines any given state g of the Cube. If g is a legal state, then it can also be
represented as some sequence of k moves from the goal state, (X1, ..., Xk-1, Xk),
where each Xi is an elementary move, i.e. using Singmaster notation:
Xi ∈ {U^j, D^j, F^j, B^j, R^j, L^j : j ∈ {1, -1, 2}} for 1 ≤ i ≤ k.
Note that X^1 = X, where X ∈ {U, D, F, B, L, R}. Thus, each Xi has 18
possible values. Unlike the tuple representation, a given g might have, and most
likely does have, multiple such representations bounded by a length k, given that k is
large enough.
1.6.2 First Fundamental Theorem of Cubology
This is more of a definition than a theorem. It formalizes the notions
introduced in section 1.5. It is a paraphrase of the theorem in [Jol, pp. 148-149]:
First Fundamental Theorem of Cubology:
Apply a + mark, known as a standard reference mark, to the following
facelets of the cubicles of the Cube:
U and D facelets of the eight edge pieces in the top and bottom slices
F and B facelets of the four edges in the middle slice
U and D facelets of the eight corner pieces
Each facelet of a cubie also has a standard reference mark corresponding to
the reference mark of its cubicle in the goal state.
Each move X applied to the Cube gives a new set of markings, known as the
marking relative to X. This set of relative markings uniquely determines the
orientation of each cubie. Specifically in terms of the tuple notation:

The orientation of a cubie is 0 if its + mark relative to X is on the
same facelet as the standard reference mark of the current cubicle.
If the cubie is an edge piece and does not have an orientation of 0,
then its orientation must be 1.
If the cubie is a corner piece and does not have an orientation of 0, its
orientation is 1 if it requires a -2π/3 radian twist and -1 if it requires
a 2π/3 radian twist for its mark relative to X to match the standard
reference mark of the current cubicle.
The correctness of this theorem depends on the precedence of the
Singmaster notation for the first letter used to label a cubicle, which is U, D, F, B, R,
L. Since + marks are applied in order of the precedence, if one refers to the way in
which orientation is kept track of using Singmaster cycle notation, then one can
see that if a cubie changes orientation, then the cubie requires either a flip if it is an
edge piece or a 2π/3 radian twist if it is a corner piece for the notation of the cubie
to match its original notation in the goal state.
Let's illustrate this with an example. If an F turn is applied to the goal state,
the fr edge piece goes to the DF cubicle. Since f → f and r → d, the edge cycle
would be (fr fd ...), which indicates the piece changed orientation, since the cubie
in the DF cubicle in the goal state is df and not fd, as D has a higher precedence
than F. Examining the relative reference mark of the fd cubie, the + mark is on the
F facelet and not on the D facelet, as is the standard reference mark of the DF
cubicle. Hence, the orientation changed, and the two ways of tracking orientation
coincide.
The orientation given to a corner piece, if its relative reference mark does not
match the standard reference mark, is based on what rotation, a ±2π/3 radian twist,
needs to be applied to the corner piece to make the two marks match. We would like
a positive 2π/3 radian twist applied to a corner piece of orientation 0 to then have an
orientation of 1. However, since the definition is in terms of the rotation to make the
relative mark match the standard reference mark, such a rotation would be in the
opposite direction of the twist that changes orientation from 0 to 1, and hence would
be a -2π/3 radian twist.
Let's illustrate this with another example. Consider the urf corner piece in the
URF cubicle in the goal state. Since u → b, r → r and f → u for an R move, the
first part of the cycle notation for this move would be (urf bru ...). Notice that the
corner piece has changed orientation, since the cubie in the UBR cubicle in the
goal state is ubr and not bru, as U has a higher precedence than B or R. If the
letters are shifted clockwise to the right, the orientations would match. This
clockwise shift to the right corresponds to a 2π/3 radian twist needed to make the
letters match, so the orientation of the corner piece would be -1 according to the
definition given in the theorem. Now let's examine the relative reference mark of
the urf corner piece after an R move. The + goes from the U facelet to the B facelet
of the UBR cubicle, which requires a 2π/3 radian twist to correct, and hence the
orientation is also given as -1.
The above examples can be generalized for any elementary move acting on
an edge piece or corner piece. The correctness for each depends on the precedence of
the Singmaster notation matching the precedence of applying the + marks, which
it does. Hence, the two ways of tracking the orientation of a cubie, through
Singmaster cycle notation and through reference marks, give the same resulting
orientation for any given elementary move. Since any state g can be thought of as
some sequence of elementary moves, this correctness of orientation must be
preserved for the whole sequence. Therefore, the orientation given by applying the
theorem to the tuple representation of g is the same orientation one would obtain by
keeping track of all the changes of orientation for any sequence of moves giving
that state.
1.6.3 Second Fundamental Theorem of Cubology
This theorem lays the groundwork to determine the number of legal states.
It is given in many forms. This is a paraphrase of the version given in [Jol, p. 182].
Second Fundamental Theorem of Cubology:
A four-tuple (v, p, w, σ) as defined above is a legal state if and only if
(a) sgn(p) = sgn(σ)
(b) v1 + v2 + ... + v8 ≡ 0 mod 3
(c) w1 + w2 + ... + w12 ≡ 0 mod 2
where the function sgn gives the parity of the permutation, which is 0 if it is
an even permutation, i.e. a permutation that can be arrived at by an even
number of transpositions, or swaps of cubies, and 1 otherwise. This function
is proved to be well defined in [RJ, pp. 1-9].
Part (a) is the easiest to prove given a basic knowledge of permutations (cf.
[Rot, pp. 2-9]). Since any quarter turn consists of two four-cycles, each of which is
an odd permutation (a four-cycle consists of three swaps, i.e. (a b c d) = (a b)(b c)(c d)
using cycle notation for permutations), the parity of both the edge and corner
pieces flips. However, since they both flip simultaneously, the corresponding parity
of the two sets is maintained.
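The parity argument can be checked mechanically; the following sketch (not from the thesis) computes sgn by decomposing a permutation into cycles and counting transpositions:

```python
# A sketch (not from the thesis) of the sgn function used in part (a):
# the parity of a permutation, computed by decomposing it into cycles and
# counting transpositions (an n-cycle equals n-1 transpositions).

def parity(perm):
    """perm[i] is the image of element i; returns 0 if even, 1 if odd."""
    seen = [False] * len(perm)
    transpositions = 0
    for i in range(len(perm)):
        if not seen[i]:
            j, length = i, 0
            while not seen[j]:
                seen[j] = True
                j = perm[j]
                length += 1
            transpositions += length - 1
    return transpositions % 2

four_cycle = [1, 2, 3, 0, 4, 5, 6, 7]        # (0 1 2 3) acting on 8 corners
print(parity(four_cycle))                     # 1 -> a four-cycle is odd

# A quarter turn is one four-cycle on the corners and one on the edges, so
# both parities flip together and sgn(p) = sgn(sigma) is preserved.
two_four_cycles = [1, 2, 3, 0, 5, 6, 7, 4]    # two disjoint four-cycles
print(parity(two_four_cycles))                # 0 -> even overall
```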

The proof for the remainder of this theorem can be given using reference
marks as outlined in the First Fundamental Theorem. Part (c) will be proved before
part (b). Consider any given sequence of k elementary moves (X1, ..., Xk-1, Xk)
applied to the goal state that gives a state g of the Cube.
If Xi is a turn of the U or D slice, then the orientations of all the edge pieces
in that slice are preserved for that turn, since each U or D facelet goes to another U
or D facelet. Since U and D always have the highest priority, a U facelet with a +
mark keeps it, retaining its 0 orientation; if it does not have one, it retains its 1
orientation.
Similarly, if Xi is a turn of the R or L slice, then the orientations of all the
edge pieces in that slice are preserved for that turn, since each R or L facelet goes to
another R or L facelet. Since R and L have the lowest priority, an R facelet with a
+ mark keeps it, retaining its 1 orientation; if not, it retains its 0 orientation.
If Xi is a half turn of the F or B slice, then the orientations of all the edge
pieces in that slice are preserved for that turn, since each U or D facelet goes to
another D or U facelet and each R or L facelet goes to another L or R facelet. The
same arguments as in the above two cases apply, and the orientations are maintained.
The only change of orientation of edge pieces comes from a single quarter
turn of the F or B slice. In this case the orientations of all 4 edge
pieces in the slice change, since each U or D facelet goes to an R or
L facelet and each R or L facelet goes to a U or D facelet. So the priorities are
swapped for each edge piece, and the orientation of each changes.
The change of orientation c_e is then c_e = w1 + w2 + ... + w12 + n_e mod 2,
where the wi's are the initial orientations and n_e is the number of edge pieces that
change orientation, which has been shown to be either 0 or 4. Thus, it is easily seen
that c_e = w1 + w2 + ... + w12 + n_e = 0 + 0 = 0 mod 2, so the property of part (c) is
maintained for any elementary move Xi. Therefore, the property must be maintained
for any legal state, which is arrived at by some such sequence of elementary moves.
The proof for part (b) is similar to that of part (c). However, rather than trace
through each case of how the relative reference marks are altered for each set of
elementary moves, we can observe that if a standard reference mark for the cubicle in
which a cubie resides is taken to another standard reference mark of
another cubicle, then the orientation of that cubie is maintained, since the
mark fixes the orientation of that cubie. If a ±2π/3 radian twist is required to make
the marks match, then the orientation of the corner piece has changed by -1 or 1.

Then we can consider the effects of a single quarter turn. If it is a turn of
the U or D slice, the orientations of all four corner pieces in the slice are preserved.
If it is a turn of any of the other slices, then two of the U and D facelets go to R and
L facelets and the other two go to F and B facelets. To correct this, a 2π/3 radian
twist will need to be applied to two of the opposite corner pieces in the slice, and a
-2π/3 radian twist will need to be applied to the two opposite corner pieces. This means
that the orientations of two of the corner pieces in the slice will have changed by
-1 and the orientations of the other two corner pieces will have changed by 1.
The change of orientation c_c is then c_c = v1 + v2 + ... + v8 + n_c = 0 mod 3,
where the vi's are the initial orientations and n_c is the change in orientation of the
corner pieces, which is either the sum 0 + 0 + 0 + 0 or the sum 1 - 1 + 1 - 1, both of
which are 0. Using the same argument as for part (c), part (b) is true for any legal state.
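The counting in parts (b) and (c) can be sketched numerically; the edge and corner indices below are hypothetical placeholders chosen only to show the arithmetic:

```python
# A numeric sketch (not from the thesis) of the counting in parts (b) and
# (c): an elementary move flips either 0 or 4 edges and twists corners by
# either (0, 0, 0, 0) or (1, -1, 1, -1), so the orientation sums stay
# 0 mod 2 and 0 mod 3 respectively.  The indices are hypothetical.

w = [0] * 12       # edge orientations of the goal state; sum is 0 mod 2
v = [0] * 8        # corner orientations of the goal state; sum is 0 mod 3

def quarter_turn(w, v):
    w, v = list(w), list(v)
    for e in (0, 1, 2, 3):                            # four edges flip
        w[e] = (w[e] + 1) % 2
    for c, t in zip((0, 1, 2, 3), (1, -1, 1, -1)):    # alternating twists
        v[c] = (v[c] + t) % 3
    return w, v

for _ in range(5):                  # any number of such turns preserves both
    w, v = quarter_turn(w, v)
print(sum(w) % 2, sum(v) % 3)       # 0 0 -> parts (c) and (b) hold
```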
1.7 Description of the 2x2x2 Cube
The 2x2x2 Cube is the same as the 3x3x3 version except that there are only two
slices along each axis instead of three (cf. [Scher2]). This eliminates both the
edge pieces and the center pieces. Without center pieces, the orientation of the
2x2x2 Cube is not fixed with respect to any given move. For instance, the two
moves U and D⁻¹ are essentially the same. Thus, there are only three unique turns
along each of the three axes that give distinct configurations. This gives a total of 9
unique elementary moves rather than 18. Furthermore, since there are 24 possible
rotations of a standard cube (cf. App. B), the total number of unique configurations
is 1/24th of the number of possible states neglecting symmetries of rotation.
1.8 Orders of the 2x2x2 and 3x3x3 Cubes
The number of unique states in a puzzle S is known as its order, written |S|.
Let G2 and G3 be the sets of all possible unique states of the 2x2x2 and 3x3x3 Cubes
respectively, which can be achieved if one takes a Cube apart and then
reassembles the pieces in any order. Note that in the case of G2, it only includes
states that are unique under rotation. Let C2 ⊆ G2 and C3 ⊆ G3 be the subsets that
contain the legal states reachable by some sequence of elementary moves.
The 4-tuple representation can represent any state in G3. The order of G3
is the number of all possible permutations and orientations for a tuple (v, p, w, σ),
which gives |G3| = 3^8 · 8! · 2^12 · 12! = 519,024,039,293,878,272,000 ≈ 5.2 x 10^20.
The factorial terms come from the number of permutations of the eight corner
pieces and twelve edge pieces. The factors of 3^8 and 2^12 come from all the possible
orientations of the corner and edge pieces.

Using the 2nd Fundamental Theorem, |C3| = |G3| / (2 · 3 · 2) = |G3| / 12.
The terms 2, 3, and 2 come from parts (a), (b) and (c) of the theorem
respectively. Thus |C3| = (3^8 · 8! · 2^12 · 12!) / 12 = 43,252,003,274,489,856,000 ≈
4.3 x 10^19.
The first two components of the 4-tuple representation can represent any state in G2.
The order of G2 is the number of all possible permutations and orientations for a
tuple (v, p) divided by the total number of rotational symmetries. This gives a
significantly smaller number, |G2| = 3^8 · 8! / 24 = 11,022,480 ≈ 1.1 x 10^7. Only
part (b) of the 2nd Fundamental Theorem applies, so one can calculate the order
to be |C2| = |G2| / 3 = 3,674,160 ≈ 3.6 x 10^6. The 3x3x3 Cube has substantially
more states than the 2x2x2 Cube, since |C3| / |C2| = 11,771,943,321,600 ≈ 1.2 x 10^13.
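These order calculations are easy to verify directly; a minimal sketch (not part of the thesis):

```python
# The order calculations of this section, checked directly (a sketch, not
# part of the thesis).
from math import factorial

G3 = 3**8 * factorial(8) * 2**12 * factorial(12)   # all reassemblies
C3 = G3 // 12            # parts (a), (b), (c) divide out 2 * 3 * 2
G2 = 3**8 * factorial(8) // 24                     # corners only, mod the 24 rotations
C2 = G2 // 3             # only part (b) applies

print(G3)         # 519024039293878272000
print(C3)         # 43252003274489856000
print(G2)         # 11022480
print(C2)         # 3674160
print(C3 // C2)   # 11771943321600
```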
1.9 Diameter of the 2x2x2 Cube
Since the 2x2x2 Cube has a significantly smaller order than the 3x3x3 Cube, it is
possible to completely determine the distance of each state from the goal using
standard BFS (breadth-first search). Table 1-1 gives the number of states per
distance to goal for both metrics.
QTM\HTM    0      1      2      3      4       5       6        7        8          9       10      11       Total
  0        1                                                                                                     1
  1               6                                                                                              6
  2               3     24                                                                                      27
  3                     24     96                                                                              120
  4                      6    144    384                                                                       534
  5                            72    768   1,416                                                             2,256
  6                             9    564   3,608   4,788                                                     8,969
  7                                  126   3,600  15,072   14,260                                           33,058
  8                                    5   1,248  18,996   57,120   36,780                                 114,149
  9                                          120   9,480   85,548  195,400   69,960                        360,508
 10                                                1,728   55,548  341,604  487,724   43,984               930,588
 11                                                   72   14,016  238,920  841,500  256,248      96     1,350,852
 12                                                         1,032   56,440  455,588  268,532     944       782,536
 13                                                            12      928   32,968   54,876   1,496        90,280
 14                                                                               8      160     108           276
Total      1      9     54    321  1,847   9,992  50,136  227,536  870,072 1,887,748 623,800   2,644     3,674,160
Table 1-1 Number of states per distance to goal for 2x2x2 Rubik's Cube2
As can be seen, the diameter of the 2x2x2 Rubik's Cube has been determined to
be 11 under the HTM and 14 under the QTM.
2 Data is from Jaap's Puzzle Page http://www.geocities.com/jaapsch/puzzles/cube2.htm#numpos.

Figure 1-3 and Figure 1-4 are logarithmic plots of the data in Table 1-1,
which demonstrate that the number of states increases exponentially with respect to
the number of optimal moves to solve the 2x2x2 puzzle.
Figure 1-3 Number of states vs. distance to goal in HTM for 2x2x2 Cube
Figure 1-4 Number of states vs. distance to goal in QTM for 2x2x2 Cube
This trend continues up to the number of moves with the maximum number
of states; the number of states then decreases exponentially. The expected value e
of the number of optimal moves to solve one of these puzzles is the centroid
of the plot. It can be written as
e = E[m] = ( Σ_{i=0}^{N} i · states(i) ) / ( Σ_{i=0}^{N} states(i) ),
where N is the diameter for that metric and states(i) is the number of states at
distance i from the goal.

Because of the exponential nature of this graph, this number is fairly close to
the number of moves with the maximum number of states. This can clearly be seen
in Table 1-2 below.
Number of Moves    Half Turn Metric                     Quarter Turn Metric
     i       states_HTM(i)  i x states_HTM(i)    states_QTM(i)  i x states_QTM(i)
     0                   1                  0                1                  0
     1                   9                  9                6                  6
     2                  54                108               27                 54
     3                 321                963              120                360
     4               1,847              7,388              534              2,136
     5               9,992             49,960            2,256             11,280
     6              50,136            300,816            8,969             53,814
     7             227,536          1,592,752           33,058            231,406
     8             870,072          6,960,576          114,149            913,192
     9           1,887,748         16,989,732          360,508          3,244,572
    10             623,800          6,238,000          930,588          9,305,880
    11               2,644             29,084        1,350,852         14,859,372
    12                   0                  0          782,536          9,390,432
    13                   0                  0           90,280          1,173,640
    14                   0                  0              276              3,864
 Total           3,674,160         32,169,388        3,674,160         39,190,008
Expected Value        8.7556                             10.6664
               Maximum at 9 moves                   Maximum at 11 moves
Table 1-2 Calculation of expected number of moves for the 2x2x2 Cube
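The expected values of Table 1-2 can be recomputed from the distance distributions of Table 1-1; a short sketch (not part of the thesis):

```python
# Recomputing the expected values of Table 1-2 from the distance
# distributions of Table 1-1 (a sketch, not part of the thesis).

states_htm = [1, 9, 54, 321, 1847, 9992, 50136, 227536,
              870072, 1887748, 623800, 2644]
states_qtm = [1, 6, 27, 120, 534, 2256, 8969, 33058, 114149,
              360508, 930588, 1350852, 782536, 90280, 276]

def expected_moves(states):
    total = sum(states)                             # 3,674,160 legal states
    return sum(i * n for i, n in enumerate(states)) / total

print(round(expected_moves(states_htm), 4))   # 8.7556
print(round(expected_moves(states_qtm), 4))   # 10.6664
```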
1.10 Diameter of the 3x3x3 Cube
The best known lower bound is based on the superflip shown in Figure 1-5,
which can be constructed by taking the goal state and then flipping all the edges
without changing their position.
Figure 1-5 The superflip of the Cube

The superflip has been optimally solved using search techniques, later
discussed in chapter 2, that exhaustively explore all possible sequences of
elementary moves (X1, ..., Xk-1, Xk) in a given metric by incrementally increasing k
until a solution of the shortest possible length has been found.
Dik Winter (cf. [Wil]) found Macro 1-1 in May of 1992, which is 20 moves
in the HTM and solves the superflip. Mark Longridge (cf. [Lo]) found Macro 1-2 in
January of 1995, which is 24 moves in the QTM and also solves the superflip.
Macro 1-1: F B U2 R F2 R2 B2 U⁻¹ D F U2 R⁻¹ L⁻¹ U B2 D R2 U B2 U
Macro 1-2: R⁻¹ U2 B L⁻¹ F U⁻¹ B D F U D⁻¹ L D2 F⁻¹ R B⁻¹ D F⁻¹ U⁻¹ B⁻¹ U D⁻¹
These establish the current best known lower bounds for the diameter of the
Cube: 24 moves in the QTM and 20 moves in the HTM. So far no other
configuration has been discovered to require more than these numbers of moves in
either metric.
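The lengths of the two maneuvers in each metric can be checked mechanically (a sketch; the apostrophe is used as plain-text shorthand for an inverse turn):

```python
# Checking the metric lengths of the two superflip maneuvers quoted above
# (a sketch; the apostrophe is shorthand for an inverse turn).

macro_1_1 = "F B U2 R F2 R2 B2 U' D F U2 R' L' U B2 D R2 U B2 U".split()
macro_1_2 = "R' U2 B L' F U' B D F U D' L D2 F' R B' D F' U' B' U D'".split()

def qtm(moves):
    return sum(2 if m.endswith("2") else 1 for m in moves)

print(len(macro_1_1), qtm(macro_1_1))   # 20 28 -> 20 moves in the HTM
print(len(macro_1_2), qtm(macro_1_2))   # 22 24 -> 24 moves in the QTM
```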
The best known upper bound comes from determining the maximum length
of solutions generated by the best suboptimal algorithm to date, known as Kociemba's
Algorithm, which is later discussed in section 3.4. It has been shown (cf. [Koc],
[Sch+1]) that Kociemba's Algorithm will always produce a solution of no more than
29 moves in the HTM and 42 moves in the QTM.
It is generally thought that the 3x3x3 Cube shares similar characteristics
with the 2x2x2 Cube in terms of its diameter. This would mean that the diameter
exceeds the expected number of moves by only a few moves. It will later be shown
in section 2.3.2 that the expected number of moves is no more than 18 in the HTM
for the 3x3x3 Cube. Since this is twice the expected number of moves of the 2x2x2
Cube, one would expect the diameter of the 3x3x3 Cube to be no more than twice
the diameter of the 2x2x2 Cube, which would be 22.
This is also confirmed by a search done in August of 1993 by Dik Winter
(cf. [Wi2]) on 9,000 instances of the 3x3x3 Cube, in which none of them required
more than 20 moves. Please note for future reference that the solutions found
weren't necessarily optimal. Referring to Table 1-2, one can calculate the ratio of
the number of states at the expected distance from goal e over the number of states
at a distance equal to the diameter d. Let states_HTM be the function that gives the
number of states at a particular distance from goal in the HTM.
This calculation gives states_HTM(e) / states_HTM(d) ≈ 1,887,748 / 2,644 ≈
714. This leads one to suspect that if the 3x3x3 Cube had a similar ratio, then out of
9,000 instances, most of which are at the expected distance from goal, at least one
should be at a distance equal to the diameter. This suggests that it is extremely
unlikely that the diameter is more than 21 or 22, and that the diameter most likely
equals the lower bound of 20.
1.11 Cayley Graph of the Cube
The Cube can be represented by a graph in which each vertex, or node,
represents a unique state and each edge represents a transition between adjacent
states. A state is considered adjacent if it is one move away in the HTM, i.e. one
elementary move away. This gives 18 possible adjacent states for each vertex since
there are a total of 18 elementary moves.
Since an edge connects two adjacent nodes, this gives a total of
18 x |C3| / 2 = 389,268,029,470,408,704,000 ≈ 3.9 x 10^20 edges in the graph.
Vertex-transitive graphs like these are known as Cayley graphs (cf. [DM, p. 40]).
These kinds of graphs have the same structure with respect to any node.
This graph has a connectivity of 18. The result is a highly interconnected
graph, since no more than 20 or so moves, the number of edges along an optimal path,
which is the diameter in the HTM, are required to go from any one state to another.
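The edge count above follows directly from the group order; a minimal sketch, assuming the factored form of the order from section 1.6.3:

```python
from math import factorial

# Order of the cube group: 3^7 * 8! * 2^11 * 12! / 2 (section 1.6.3).
order = 3**7 * factorial(8) * 2**11 * factorial(12) // 2
print(order)  # 43252003274489856000

# 18 elementary moves per state, and each edge is shared by two states.
edges = 18 * order // 2
print(edges)  # 389268029470408704000
```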
1.12 Branching Factor of the Cube
To provide some of the background to the brute force search techniques used
to solve puzzles like the Rubik's Cube, the concept of branching factor needs to
be introduced. Imagine picking a node in the above graph that corresponds to the
initial state and then employing a search like BFS to explore the graph to find the
goal state. The branching factor is the ratio of the number of nodes explored at a
given depth d + 1 over the number of nodes explored at a depth of d. We want to
reduce the number of duplicate nodes, that is, previously visited nodes, explored.
This technique is called pruning (cf. [KorTa2], [Scherl]).
If one employs some method of pruning, then one uses the notion of the
average branching factor at a distance of d, which is the total number of nodes
actually explored up to a depth of d over the number of nodes encountered up to a
depth of d - 1. The asymptotic branching factor can be thought of as the average
branching factor taken at a distance in which the pruning technique becomes
ineffective, which will happen since we are dealing with finite graphs. Often in the
literature the term branching factor is synonymous with the asymptotic branching
factor since that is ultimately what determines the time bound of the search.

Here is how to calculate the average branching factor of 13.5 after two
moves for the 3x3x3 Cube: For the first move one can make any of the 18 possible
moves in the HTM. In making the second move, since one would not rotate the face
just turned, there are 5 remaining faces to rotate, or 5 x 3 = 15 possible moves.
Since turning the two opposing faces in either order results in the same state, this
gives 27 duplicate states after two turns. This can be seen from there being three
pairs of opposite faces, and each face can be rotated by either a quarter turn in either
direction or by a half turn, which gives 3 x (3 x 3) = 27 duplicate states.
To calculate the average branching factor after two moves one takes the total
number of unique states after two moves and divides by the original number of
moves for the first move, which is 18. This gives an average branching factor after
two moves of b = (18 x 15 - 27) / 18 = 15 - 1.5 = 13.5. One would expect the
average branching factor to decrease with more moves since more duplicate states
would become possible. However, this requires a computer to enumerate these
duplicate states of more than two moves since the calculation quickly becomes too
complicated to manually compute.
Using this pruning technique of eliminating duplicate states between two
consecutive moves, the asymptotic branching factor for the Cube is 13.34847, or
approximately 13.35, as given in [Kor3].
Here is another example of approximating the average branching factor after
two moves for the 15-Puzzle: If the empty square is in one of the four inner
squares, there are four possible moves. If the empty square is in one of the eight
edge squares, there are three possible moves. Finally, if the empty square is in one
of the four corner squares, there are only two possible moves. Thus, the average
number of moves available is (4 x 4 + 3 x 8 + 2 x 4) / 16 = 3 for the first move.
Since one would not move a tile back to the previous position, the average
number of moves for the next move is one less than this, which is 2. The average
branching factor will tend toward 2 as one explores successively deeper depths.
However, this doesn't take into account that the positions in the middle are more
commonly used along optimal paths than ones around the edges. The asymptotic
value of the branching factor for the 15-Puzzle is 2.13 as determined in [KorEd].
One reason why the Rubik's Cube is more difficult than simpler puzzles like
the 15-Puzzle is its relatively high branching factor. Just as Tic-Tac-Toe is
an easier game than chess in which to look ahead for the best move because of its
significantly smaller branching factor, the 15-Puzzle with a branching factor of 2.1
is an easier puzzle to solve than the 3x3x3 Cube with a branching factor of 13.3.
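Both of the two-move calculations above reduce to a few lines; a sketch following the counting arguments in the text:

```python
# 3x3x3 Cube in the HTM: 18 first moves, 15 second moves (never re-turn
# the face just turned), minus 27 duplicates from commuting opposite-face
# turns (3 axes x 3 turns x 3 turns).
b_cube = (18 * 15 - 27) / 18
print(b_cube)  # 13.5

# 15-Puzzle: the blank in one of the 4 inner, 8 edge, or 4 corner squares
# allows 4, 3, or 2 legal moves; average over the 16 squares.
b_first = (4 * 4 + 3 * 8 + 2 * 4) / 16
# Excluding the move that undoes the previous one leaves about 2.
print(b_first, b_first - 1)  # 3.0 2.0
```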

1.13 Further Background on the Cube
After Ernő Rubik developed a working model for his puzzle and found a toy
company to market his Magic Cube, it took his country by storm in the years
1978 to 1980, and subsequently the world by 1981. To date over 100 million
have been sold worldwide. The Rubik's Cube is an example of a family of toy
puzzles that has fascinated amateur puzzle solvers, mathematicians and computer
scientists for the past two decades.
While most of the work involving the group theoretical properties of
the Rubik's Cube was done by mathematicians in the early 1980s, computer
scientists, particularly in the field of AI, have continued to be challenged in
obtaining the elusive God's Algorithm for the Rubik's Cube, an algorithm that
would somehow produce the optimal solution for any of its given states. Failing
that, finding even near optimal solutions proved to be a difficult task, taking several
years to achieve.
Part of the attraction of the Rubik's Cube is the simplicity of its design.
Contrary to one's initial intuition, it yields an extraordinarily large number of states.
This number, which was previously shown to be about 4.3 x 10^19, is far larger than
the billions of possible positions advertised on its box, which just goes to show
how underestimated this puzzle can be.
However, it is not only the sheer number of states that leads to its
complexity, as previously discussed in section 1.2. The far easier to solve 15-Puzzle
and the related 5x5 version, the 24-Puzzle, also have a large number of states,
which can be seen to be 16! / 2 ≈ 10^13 and 25! / 2 ≈ 7.8 x 10^24 respectively.
As a side note, the dividing factor of 2 in the above calculations comes from
only even permutations being possible. One reason for the 15-Puzzle's initial
popularity is that its creator Sam Loyd offered a $1,000 prize during the 1870s for a
solution to a state that was an odd permutation, i.e. the 14th and 15th tiles swapped.
Since such a state isn't achievable using legal moves from the goal state, no solution
could be found.
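The parity argument can be illustrated by counting inversions; a small sketch (the board is encoded as a list in row-major order with 0 for the blank — an assumption for illustration), showing that Loyd's swapped state has odd parity, while states reachable from the goal with the blank in its home square have even parity:

```python
def permutation_parity(tiles):
    """Parity (0 = even, 1 = odd) of a tile permutation, ignoring the blank (0)."""
    perm = [t for t in tiles if t != 0]
    inversions = sum(1 for i in range(len(perm))
                       for j in range(i + 1, len(perm))
                       if perm[i] > perm[j])
    return inversions % 2

solved = list(range(1, 16)) + [0]
loyd = solved[:]
loyd[13], loyd[14] = loyd[14], loyd[13]  # swap the 14th and 15th tiles

print(permutation_parity(solved))  # 0: even, reachable
print(permutation_parity(loyd))    # 1: odd, unreachable
```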
Not only do both of these sliding puzzles have large numbers of states, but
both generally require far more than the 20 or so moves needed to solve the
Rubik's Cube. The actual diameter of the 15-Puzzle has been determined to be 87
in [KO]. However, the average number of optimal moves is 53 (cf. [Kor2]). The
diameter of the 24-Puzzle is unknown. However, in

[KorTal] they used IDA*, discussed in section 2.3, to solve 10 instances of the
24-Puzzle and found 102.6 moves to be the average.
In addition to its higher branching factor, the Rubik's Cube, unlike sliding
tile puzzles that move one piece at a time, has eight pieces that change position and
orientation with each turn of a face. As a result, the Rubik's Cube is generally the
most complex and difficult puzzle that most people have encountered in their
lifetimes, since the chances of stumbling onto the solution are very small. Even
determined mathematicians with a thorough knowledge of group theory, which
describes the Cube mathematically, have taken weeks, if not months, to find a
solution without seeing one beforehand.

2. Standard Optimal Approaches
To date all the various optimal approaches require some form of brute force
search. Each of these methods has its own pros and cons. However, of these
optimal methods only IDA* with the use of pattern databases (cf. [Kor3]) has been
shown to be practical.
2.1 God's Algorithm
When someone speaks of God's algorithm (cf. [Scherl]), it can be in one of
two contexts. If it is a puzzle like the 3x3x3 Cube, for which no practical algorithm
exists that can produce an optimal solution for every instance, then one would say
that no one has discovered God's algorithm for the Rubik's Cube. However, if it is a
simpler puzzle such as the 2x2x2 Cube, for which every state can be enumerated in
memory, then when one speaks of God's algorithm, they are referring to a particular
approach used to solve that simpler puzzle.
God's algorithm uses a lookup table that is keyed over the various instances
of the problem. Rather than giving the solution, it gives the minimum number of
moves remaining to reach the goal state. To solve a particular instance, one
simply chooses the next move that brings the state one move closer to the goal.
Such a lookup table would be generated using one of the basic tree searches given
in section 2.2.
The lookup table can be further refined to only store the number of moves
left modulo 3, which requires only 2 bits per entry. This is sufficient for picking the
next move since any adjacent state is one move closer to, the same distance from, or
one move farther from the solution. Hence for the case of the 2x2x2 Cube, since 4
states can be stored in each byte, the size of such a lookup table would be
|C2| / 4 = 3,674,160 / 4 = 918,540 bytes, which is less than 1 MB.
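The 2-bit packing can be sketched as follows; the dense state index used here is a placeholder, since a real implementation would need a ranking function over 2x2x2 states:

```python
NUM_STATES = 3_674_160                    # |C2| for the 2x2x2 Cube
table = bytearray((NUM_STATES + 3) // 4)  # 4 entries per byte: 918,540 bytes

def set_entry(index, dist_mod3):
    """Store (distance to goal) mod 3 in the 2-bit slot for state `index`."""
    byte, slot = divmod(index, 4)
    table[byte] &= ~(0b11 << (2 * slot))          # clear the 2-bit slot
    table[byte] |= (dist_mod3 & 0b11) << (2 * slot)

def get_entry(index):
    byte, slot = divmod(index, 4)
    return (table[byte] >> (2 * slot)) & 0b11

set_entry(12345, 2)
print(len(table), get_entry(12345))  # 918540 2
```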
2.2 Basic Tree Searches
All of the following searches employ the idea of exploring a graph as
detailed in section 1.11. This can be thought of as a tree in which the root node is the
initial state, and each level below contains the nodes explored at that given depth.
The nodes in the tree need not be unique. Pruning techniques discussed in section
1.12 can be used to reduce the likelihood of encountering duplicate states.

2.2.1 Uninformed Searches
Given the extreme number of states, 4.3 x 10^19, uninformed searches such as
BFS (breadth first search) or DFS (depth first search) turn out to be impractical.
While BFS is a slow and methodical search that is guaranteed to produce an optimal
solution, it doesn't work here since all the states must be kept in memory, which
clearly exceeds the memory limit of any conventional computer.
DFS fails for several reasons. First of all, there is no guarantee it won't run
into some infinite cycle. To correct this, one can limit the depth to which it explores.
However, then there is no guarantee it will find the goal state. To further correct the
problem, one can employ an incremental DFS, in which one incrementally increases
the depth being explored.
The advantage of DFS over BFS is that it can potentially find the solution
quite quickly if it happens to select nodes that lead to the solution. It also has the
advantage of only having a linear memory complexity of O(bd), where b is the
asymptotic branching factor and d in this case is the expected depth of the search.
The disadvantage of DFS used without iterative deepening is that it is not optimal,
in that there is no guarantee it will choose the shortest path, nor is it complete, in
that there is no guarantee it will explore the whole graph.
Both DFS and BFS have a time complexity of O(b^d) (cf. [RN]), where b is
13.35 as given in section 1.12 and d is 17.941 as given in Table 2-1. Then the time
required for a typical instance of the Cube is roughly b^d = 13.35^17.941 ≈ 1.56 x 10^20,
which would require nearly 5,000 years even if every node only took a nanosecond
to explore. It is for this reason that both searches are ultimately impractical.
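The arithmetic behind this estimate is easily checked:

```python
# Uninformed search cost b^d with b = 13.35 and d = 17.941.
b, d = 13.35, 17.941
nodes = b ** d
print(f"{nodes:.2e}")  # about 1.6e+20 nodes

# At one node per nanosecond, that is roughly five millennia.
years = nodes * 1e-9 / (365.25 * 24 * 3600)
print(round(years))  # roughly 4,900 years
```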
2.2.2 Informed Searches
Informed searches like best-first search keep a queue of the nodes left to
explore and order them by some evaluation function that attempts to select the best
nodes to search first. Two versions of best-first search base the evaluation function
either on the distance incurred or on the estimated distance remaining.
Greedy Best-First Search
Greedy best-first search orders the nodes in the queue based on an estimate
of the distance remaining to the goal state. As the greedy estimate becomes more
accurate, this algorithm behaves more like God's algorithm.

The question of how to come up with such an estimate isn't easy. To
increase the likelihood that the solution found is optimal, it helps if the estimate
never overestimates the distance remaining. Such an estimate is known as an
admissible heuristic. This increases the chances that the shortest path will be found.
Greedy search is also a way of directing a standard DFS search. It chooses
the next node to explore based on some greedy estimate of the distance remaining to
the goal state. However, this approach cannot be used in conjunction with iterative
deepening, since that would force the tree to be explored up to a given depth d. As
such, it suffers from the same limitations as DFS in that it is neither optimal nor
complete. However, it is more likely to find a solution before iterative deepening
DFS does.
Uniform Cost
Uniform cost best-first search, known more simply as uniform cost, orders
the elements in the queue by their distance from the initial state. Here the distance
for the Rubik's Cube is simply the number of moves from the initial state, which
makes uniform cost equivalent to BFS. However, if one were to favor some
moves over others and weight them accordingly, then uniform cost would still
produce an optimal solution, i.e. a path to the goal with minimum weight.
A*
A* combines the two best-first searches of uniform cost and greedy
search using an admissible heuristic to produce an optimal search that is more
efficient than standard BFS, by setting its evaluation function to be the sum of the
distance from the initial state and an underestimate of the distance to the goal state.
A proof of the optimality and completeness of A* is given in [RN, pp. 99-100]. The
basic idea behind the proof of optimality is that by keeping track of the distance one
has come in addition to how far one has to go, one will never choose a path longer
than the optimal one.
The problem with A* is that it has a space complexity of O(b^d), which
quickly exceeds any reasonable memory limit if either the branching factor or
the expected depth is too great. This happens with both the 15-Puzzle and the
Rubik's Cube. Even though the 15-Puzzle has a relatively low branching factor, it
has an expected depth of 53 as given in section 1.13. A* fails with the 3x3x3 Cube
due to its relatively high branching factor of 13.35.

2.3 Iterative-Deepening A*
Iterative-Deepening A*, also known as IDA*, was discovered by Richard
Korf in 1985 (cf. [Kor4]). It has since become the dominant method for solving this
class of puzzles. Where A* fails, IDA* tends to succeed, given a strong enough
admissible heuristic.
2.3.1 Standard IDA*
IDA* is simply A* with iterative deepening. This reduces the memory
complexity from O(b^d) to O(bd). It does this by keeping track of the maximum
depth explored in the last iteration and incrementally increasing it. By doing so, it
no longer needs to store all the states in memory. The optimality of A* still applies,
so IDA* also gives an optimal solution. The exact algorithm for IDA* can be seen
in [RN, p. 107].
The time complexity is Ω(b^(d-e)), where e is the expected value of the
admissible heuristic. This can be seen in that the more closely e approximates the
distance remaining, the more directed the search becomes. It will be seen in section
2.3.2 that this is a lower bound in that it most likely underestimates the time
complexity.
An example of an admissible heuristic used with A* is the Manhattan
distance. Applied to the 15-Puzzle, it is defined as the total number of grid units
needed to move each piece from its current location to its goal location. The number
of moves required to solve a given instance is guaranteed to be at least its
Manhattan distance. While A* is unable to solve the 15-Puzzle within current (or
foreseeable) memory limits due to its exponential memory requirements, 100
optimal solutions for the 15-Puzzle were found using IDA* with the Manhattan
distance as a heuristic in [Kor4].
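As an illustration, a minimal Manhattan-distance function for the 15-Puzzle, assuming the board is encoded as a 16-element list in row-major order with 0 for the blank:

```python
def manhattan(board):
    """Sum of grid distances of each tile from its goal square (blank ignored)."""
    dist = 0
    for pos, tile in enumerate(board):
        if tile == 0:
            continue
        goal = tile - 1  # tile t belongs at index t - 1 in the goal state
        dist += abs(pos // 4 - goal // 4) + abs(pos % 4 - goal % 4)
    return dist

solved = list(range(1, 16)) + [0]
print(manhattan(solved))  # 0

scrambled = [2, 1] + solved[2:]  # tiles 1 and 2 exchanged
print(manhattan(scrambled))  # 2
```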
The Manhattan distance is less useful as a heuristic for the Cube, since if one
were to total the number of moves necessary to place each cubie, one must divide that
total by eight because eight cubies are moved with each rotation of a slice of the Cube.
The expected value of this heuristic is only 5.5 (cf. [Kor3]), which is insufficient to
solve most states since here b^(d-e) = 13.35^(17.94-5.5) = 13.35^12.44 ≈ 1.0 x 10^14,
which is still too large a number.
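Plugging the text's numbers into b^(d-e) confirms why an expected heuristic value of only 5.5 is insufficient:

```python
# IDA* node estimate b^(d - e) with the Manhattan heuristic's e = 5.5.
b, d, e = 13.35, 17.94, 5.5
estimate = b ** (d - e)
print(f"{estimate:.1e}")  # about 1.0e+14 nodes, still impractical
```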
2.3.2 IDA* with Pattern Databases
To achieve a useful admissible heuristic for puzzles with a fairly large
number of states, pattern databases are used, which explicitly store the number of
moves remaining to solve a particular subproblem. For instance, the 15-Puzzle can
be divided into two partitions of seven and eight tiles. The number of moves to
solve each of these subproblems is stored in a database that can be indexed
efficiently for quick access. To combine these two heuristics admissibly, the
maximum of the two is taken. This was the first application of pattern databases
(cf. [CS1], [CS2]) to solve this class of problems. It reduced the number of nodes
expanded by IDA* by a factor of a thousand compared to using the Manhattan
heuristic, increasing the speed of the algorithm by a factor of 12.
In his 1997 paper [Kor3], Korf used the eight corners as one subproblem and
two sets of six of the twelve edges for the two others, which completely covers the
Cube in that any of the 20 movable pieces is in one of these three sets. For his
heuristic he examined the eight or six cubies in one of these smaller problems
(smaller in the sense of far fewer states) and looked up the remaining number of
moves to solve that subproblem. To create an admissible heuristic, he took the
maximum of the three.
For the eight corner cubies, there are (3^8 x 8!) / 3 = 88,179,840 unique states
(where the dividing factor of three again comes from part (b) of the 2nd Fundamental
Theorem given in section 1.6.3), which allows for using a brute force search to find
optimal solutions. Similarly, for six of the twelve edge cubies there are 2^6 x 12! / 6!
= 42,577,920 unique states. Since neither subproblem requires 16 or more moves,
each entry takes no more than 4 bits, or half of a byte, to store, giving a total of
44,089,920 bytes for the corners and 21,288,960 bytes for each of the sets of six
edges. The databases required 44,089,920 + 2 x 21,288,960 = 86,667,840 bytes, or
approximately 82 megabytes. Using a standard BFS, Korf was able to populate all
three databases in about an hour.
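The database sizes can be verified directly from the formulas above:

```python
from math import factorial

corner_states = 3**8 * factorial(8) // 3             # 88,179,840
edge6_states = 2**6 * factorial(12) // factorial(6)  # 42,577,920

# Each entry fits in 4 bits, so two entries are packed per byte.
total_bytes = corner_states // 2 + 2 * (edge6_states // 2)
print(corner_states, edge6_states)  # 88179840 42577920
print(total_bytes)                  # 86667840 bytes, about 82.7 MB
```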
The expected value of the heuristic for the corner cubie database was 8.764,
whereas the expected value for either of the edge cubie databases was 7.668.
However, by taking the maximum of the three, the expected value of the heuristic
increased to 8.878, which led to a significant improvement in performance.
One would then expect the time complexity for IDA* using pattern
databases to be roughly O(b^(d-e)), where again e is the expected value of the
heuristic. However, Korf argues that this estimate of the time complexity, which
is proportional to the number of nodes expanded during the search, is overly
optimistic. He shows this by taking a typical value of d = 17 and doing the
calculation b^(d-e) = 13.35^(17-8.878) = 1,384,072,500, which is two orders of
magnitude less than the complete search by IDA* to depth 17 that expands roughly
122 billion nodes. He argues that the reason for this is that IDA* isn't completely
random, and focuses more of its search on smaller values of the heuristic, thus
lowering its effective expected value.
To compensate for this overly optimistic value for e, he gives a fairly
informal argument that attempts to correct the optimistic calculation of b^(d-e)
by taking a pessimistic value for e, namely the log base b of the number of entries m
in the pattern database. This is an underestimate of the expected number of optimal
moves to solve that subproblem. For instance, calculating the pessimistic value for
e for the corner cubie database, one gets e ≈ log_b m = log_13.35 (88,179,840) ≈ 7.06,
which is lower than the experimentally determined value of 8.764 given above.
Similarly, he takes d to be the log base b of the order n of the Cube, which actually
gives a closer estimate since d ≈ log_b n = log_13.35 (4.3 x 10^19) ≈ 17.44. So his
calculation for the time complexity t goes as follows, using the above values for
d ≈ log_b n and e ≈ log_b m:
t ≈ b^(d - e) ≈ b^(log_b n - log_b m) = n / m
This is his argument that the time complexity of IDA* is better described by
O(n / m). Plugging in numbers for n and m for this pattern database gives
n / m = 43,252,003,274,489,856,000 / 173,335,680 ≈ 249,527,409,904,
which is a factor of 1.4 off of the average number of nodes generated for the
ten cases, which is 352,656,042,894, as can be seen in Table 2-1 below, in which the
first three columns of the data were taken from [Kor3]. The calculations of totals
and averages were done independently.
Number   Depth   # Nodes Generated       Depth x # Nodes
 1        16         3,720,885,493        59,534,167,888
 2        17        11,485,155,726       195,247,647,342
 3        17        64,837,508,623     1,102,237,646,591
 4        17       126,005,368,381     2,142,091,262,477
 5        18       262,228,269,081     4,720,108,843,458
 6        18       344,770,394,346     6,205,867,098,228
 7        18       502,417,601,953     9,043,516,835,154
 8        18       562,494,969,937    10,124,909,458,866
 9        18       626,785,460,346    11,282,138,286,228
10        18     1,021,814,815,051    18,392,666,670,918
Total            3,526,560,428,937    63,268,317,917,150
Average  17.50     352,656,042,894
Weighted Average 17.941
Table 2-1 Depth searched for 10 random instances using IDA* search
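The totals and averages in Table 2-1, along with Korf's n / m estimate, can be reproduced as follows (node counts transcribed from the table):

```python
depths = [16, 17, 17, 17, 18, 18, 18, 18, 18, 18]
nodes = [3_720_885_493, 11_485_155_726, 64_837_508_623, 126_005_368_381,
         262_228_269_081, 344_770_394_346, 502_417_601_953,
         562_494_969_937, 626_785_460_346, 1_021_814_815_051]

avg_nodes = sum(nodes) / len(nodes)
weighted_depth = sum(d * k for d, k in zip(depths, nodes)) / sum(nodes)
print(round(avg_nodes))           # 352656042894
print(round(weighted_depth, 3))   # 17.941

# Korf's estimate: number of cube states over pattern-database entries.
n = 43_252_003_274_489_856_000
m = 88_179_840 + 2 * 42_577_920
print(n // m)  # about 2.5e11, a factor of ~1.4 below the observed average
```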

However, by examining Table 2-1, the average depth for his ten instances is
17.5, and the average weighted by the number of nodes expanded is 17.941. This
last number is probably closer to the actual expected value of the depth of a typical
search than is 17. As in the case of the 2x2x2 Rubik's Cube (see Table 1-2), it is
slightly less than the median value of 18. In a later paper, Korf with Michael Reid
and Stefan Edelkamp (cf. [KorREd]) pointed out that the exponent governing the
number of nodes expanded by IDA* is actually one more than d - e, taking into
account the frontier of the search. Taking these factors into account, the calculation
then becomes b^(d-e+1) = 13.35^(17.941-8.878+1) ≈ 211,150,685,393, which is off
by a factor of 1.67 from the average number of nodes. So Korf's argument given in
[Kor3] may not hold.
Regardless of which argument is used, the time complexity is still roughly
inversely proportional to the amount of memory. Increasing the database to
include one more cubie in the subproblem increases the number of entries roughly
by a factor of b. For instance, adding an extra edge to the six edge cubie
database would increase it by a factor of (2^7 x 12! / 5!) / (2^6 x 12! / 6!) = 2 x 6 =
12, which is close to b. By adding an extra cubie, one would expect the expected
value of the heuristic to increase by roughly one, since the difference in the expected
value of the heuristic would be given by log_b m_new - log_b m_old =
log_b (m_new / m_old). Using the above numbers, this would give an increase in
the expected value of log_13.35 12 ≈ 0.96.
This is only an approximate estimate, since combining the heuristics of the
different databases wouldn't necessarily propagate this change in the expected value
to the combined heuristic. That is one reason why Korf's argument provided in
[Kor3] is overly simplistic, since it doesn't take this into account.
Regardless, the main restriction on pattern databases is the amount of RAM
available to hold the entire database for quick access. Korf explains that if one were
to use a hard drive to hold the database, the disk latency would significantly reduce
the performance of the search. For instance, he was able to process 700,000 nodes a
second storing the entire database in physical memory, whereas using a disk drive to
hold the pattern databases, that number dropped to 1,000, a drop of nearly three
orders of magnitude.
A modern implementation of this method is used by Cube Explorer (cf.
[Koc]), developed by Herbert Kociemba (cf. section 3.4), which is one of the fastest
implementations for optimally solving the Rubik's Cube. It has an option of
constructing a pattern database that requires 1 GB of memory, but by doing so it can
optimally solve most instances of the Rubik's Cube in under an hour on modern
desktop PCs.

3. Standard Suboptimal Approaches
Given that optimal solutions require a fair amount of time and have hefty
memory requirements, much attention has been turned to developing methods for
improving suboptimal approaches to get as close to optimal as possible. This goal
has been achieved with Kociemba's algorithm, discussed in section 3.4, which gives
solutions that may even be optimal. However, it still requires an IDA* search to
determine whether or not a given solution is indeed optimal.
3.1 Macro Operators
Typical human solutions to the Cube require anywhere from fifty to
hundreds of moves that consist of several macros applied to progressively bring the
Cube into a more restored and ordered state. One example of an efficient four-step
solution that restores the Cube successively in this way is as follows: (1) place the
edges in the top layer, (2) place the top corners and their adjacent edges in the middle
layer simultaneously, (3) correctly orient the bottom layer such that the bottom face
is all the same color, and (4) finally, correctly position the cubies in the bottom layer.
These are the four steps used by the speed cubist Jiri Fridrich (cf. [Fr]) in his
solution, which requires on average 56 moves using the HTM. He can solve a cube
in 16 to 17 seconds, averaging 4 to 5 moves a second.
A typical macro may permute two or three cubies, twist two or three corners,
flip a few edges, or do some combination of all of these. Several other cubies may
be affected at the end of the macro, though those are generally the ones that haven't
been restored yet. However, during a macro the Cube becomes more disordered, as
previously restored cubies are moved around, giving the Cube the appearance
of being messed up while the macro is applied.
Any of these types of solutions require far more than the 20 moves or so
conjectured to be necessary. To find a solution requiring fewer moves, a computer
becomes necessary.
One algorithm based on this approach was given in [Kor6] (cf. [Kor5]
too). In it Korf taught a computer to solve the Cube by systematically building
tables of macros to successively apply. The algorithm was given a sequence of
cubies to set in place. Starting with the first cubie, it would search, using one of the
above techniques, to find a macro for each possible position and orientation of the
cubie in which it is out of place. It keeps doing this for each cubie in the sequence.
However, in doing the search for the next cubie in the sequence, it must find a
macro in which the previously placed cubies are left undisturbed at the end of the
macro. For this reason, the macros grow longer as more cubies are placed.
Once the algorithm has produced this table of macros, it can solve any
instance of the Cube. Since it knows a macro for each possible location of the next
cubie to place in the sequence, it simply determines the cubie's current position and
orientation and then determines which macro to apply. The sequence in which to
place the cubies is important, since shorter macros will be produced when restoring
all the same type of cubies in a given layer before proceeding to the next ones.
However, the final solution produced, even with an efficient ordering of the cubies
to place, is worse than most human solutions, requiring 200 moves on average.
3.2 Important Subgroups of the Cube
Please refer to Appendix A for some group theory background on the Cube.
Familiarity with group theory will be assumed for the remainder of this section.
One can form subgroups by restricting which moves are available. For
instance, if only half turns were allowed, then that subgroup would be written in
terms of the generators {U, D, F, B, R, L} as <U2, D2, F2, B2, R2, L2>. This group
has a significantly smaller order of 663,552, as can be seen in Table 3-1, which shows
a series of nested subgroups such that G0 ⊃ G1 ⊃ G2 ⊃ G3 ⊃ G4. The index column
shows the index, i.e. the number of cosets, between successive subgroups.
Group                           Order                                       Index
G0 = <U, D, F, B, R, L>         3^7 x 8! x 2^11 x 12! / 2 ≈ 4.33 x 10^19
                                                                            [G0 : G1] = 2^11 = 2,048
G1 = <U, D, F, B, R2, L2>       3^7 x 8! x 12! / 2 ≈ 2.11 x 10^16
                                                                            [G1 : G2] = 3^7 x C(12,4) = 1,082,565
G2 = <U, D, F2, B2, R2, L2>     8!^2 x 4! / 2 ≈ 1.95 x 10^10
                                                                            [G2 : G3] = C(8,4)^2 x 2 x 3 = 29,400
G3 = <U2, D2, F2, B2, R2, L2>   4!^5 / 12 = 663,552
                                                                            [G3 : G4] = 663,552
G4 = I (Identity)               1
Table 3-1 Orders and indices for nested subgroups of restricted maneuvers³
³ Data is from Jaap's Rubik's Cube page at
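The orders and indices in Table 3-1 can be checked directly from their factored forms:

```python
from math import comb, factorial

# Orders of the nested subgroups, using the factored forms from Table 3-1.
G0 = 3**7 * factorial(8) * 2**11 * factorial(12) // 2
G1 = 3**7 * factorial(8) * factorial(12) // 2
G2 = factorial(8)**2 * factorial(4) // 2
G3 = factorial(4)**5 // 12
G4 = 1

indices = [a // b for a, b in [(G0, G1), (G1, G2), (G2, G3), (G3, G4)]]
print(indices)  # [2048, 1082565, 29400, 663552]

# The indices match the factorizations described in the text.
assert indices == [2**11, 3**7 * comb(12, 4), comb(8, 4)**2 * 2 * 3, G3]
```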

The first nested subgroup G1 has an index in G0, written as [G0 : G1], of
2,048. This comes from only allowing half turns for the right and left faces, so that
none of the 12 edges can be flipped, i.e. change orientation. Thus, the factor of
2^12 / 2 = 2^11 is eliminated from the number of possible states. The factor of 2
comes from part (c) of the 2nd Fundamental Theorem given in section 1.6.3.
The index of the next quotient group, [G1 : G2], is 3^7 x C(12,4). By further
restricting the set of maneuvers to half turns for the front and back faces, the
orientations of the corners become fixed, leading to the 3^8 / 3 = 3^7 term, and the
middle edges are placed into the middle slice, giving the C(12,4) term since there are
12 choose 4 ways of doing this. The factor of 3 comes from part (b) of the 2nd
Fundamental Theorem given in section 1.6.3.
The next calculation, of [G2 : G3] as C(8,4)^2 x 2 x 3, is a little more complex,
since in this state the edges of the up and down layers are placed in their correct slices
(front, back, right and left) and the corners are placed in their correct tetrads, where a
tetrad consists of four corners such that if one were to cut a cube in half diagonally,
that cut would go through the four corners of a tetrad. Thus, it can be thought of as
two adjacent pairs of antipodal corners.
Together these give a factor of (4)2 since for each there are 8 choose 4 ways
of doing this. Furthermore, both parities of the permutations for the edges and the
comers are made even giving the factor of 2 by part (a) of the 2nd Fundamental
Theorem. Finally the total twist for comers in each tetrad is fixed giving the factor
of 3, which is in addition to the total twist of all the comers being fixed which
corresponds to part (b) of the 2nd Fundamental Theorem.
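As a quick sanity check (an illustrative sketch, not part of the thesis), the index factors derived above can be multiplied out to confirm they recover the order of the full cube group, 3^7 · 8! · 2^11 · 12!/2 ≈ 4.33 x 10^19:

```python
from math import comb, factorial

# The four indices between successive nested subgroups, as derived above.
indices = [
    2 ** 11,                      # [G0 : G1]
    3 ** 7 * comb(12, 4),         # [G1 : G2]
    comb(8, 4) ** 2 * 2 * 3,      # [G2 : G3]
    24 ** 5 // 12,                # [G3 : G4] = 4!^5 / 12
]

product = 1
for idx in indices:
    product *= idx

# Order of G0: 3^7 * 8! * 2^11 * 12!/2
order_g0 = 3 ** 7 * factorial(8) * 2 ** 11 * (factorial(12) // 2)
print(product == order_g0)        # True
```

This confirms that the indices of the quotient groups multiply together to give |G0|, as Lagrange's theorem requires.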
3.3 Thistlethwaite's Algorithm
In the early 1980s the English mathematician Morwen B. Thistlethwaite
devised an algorithm (cf. [Si2]) that uses the nested subgroups given in Table 3-1.
He devised a series of lookup tables for each of the four successive pairs of nested
subgroups, i.e. quotient groups, whose number of entries corresponded to the
respective index given in Table 3-1. His solution then progressed from one nested
subgroup to the next until it reached G4, the identity, at which point the puzzle
was solved.
For instance, in the first stage, in going from G0 to G1, his program would
index into the lookup table using the 2^11 possible orientations of all the edges.
Each entry contains a sequence of maneuvers that would take the Cube into
some valid position in G1. To populate this database, a BFS could be used that kept
track of the moves back to the solved position as it progressed through the 2^11
possible orientations of the edges.
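The table-building BFS described above can be sketched on a toy puzzle. The move definitions below (permutations with orientation flips on a hypothetical 4-edge puzzle) are invented for illustration and are not the actual Cube's move tables; the BFS pattern, however, is the same: expand outward from the solved state, recording a shortest maneuver for each state reached.

```python
from collections import deque

# Hypothetical 4-edge puzzle: a state is a tuple of 4 orientation bits.
# Each move permutes the slots and flips some orientations. These moves
# are deliberately self-inverse so a recorded maneuver can be undone by
# replaying it in reverse.
MOVES = {
    "A": ([1, 0, 2, 3], [1, 1, 0, 0]),   # swap slots 0,1 and flip both
    "B": ([0, 2, 1, 3], [0, 0, 0, 0]),   # swap slots 1,2, no flips
    "C": ([3, 1, 2, 0], [1, 0, 0, 1]),   # swap slots 0,3 and flip both
}

def apply_move(state, move):
    perm, flips = MOVES[move]
    return tuple((state[perm[i]] + flips[i]) % 2 for i in range(len(state)))

def build_table(solved):
    """BFS from the solved state, recording for each reachable state a
    shortest move sequence from the solved state to it."""
    table = {solved: []}
    frontier = deque([solved])
    while frontier:
        state = frontier.popleft()
        for move in MOVES:
            nxt = apply_move(state, move)
            if nxt not in table:
                table[nxt] = table[state] + [move]
                frontier.append(nxt)
    return table

table = build_table((0, 0, 0, 0))
```

Because every move here is an involution, replaying a state's recorded sequence in reverse returns that state to the solved position, which is exactly how the lookup entries would be used during solving.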
The tables for the other three stages, one for each quotient group, were
determined in a similar way. For each step in proceeding from one nested subgroup
to the next, the properties of that subgroup (as described above) that lead to the
factors that compose the index between two subgroups would be the ones upon
which the table was indexed.
Thistlethwaite's algorithm gave the first known upper bound for the
diameter of the Cube, 52, as seen in Table 3-2. In order to keep his lookup
tables within the memory limits of the time, he had a set of simplifying maneuvers that
added to the optimal length in going from one subgroup to the next. He
subsequently improved his method by reducing that number by 2. However, the
best possible bound would be 45, the sum of the diameters of the four quotient
groups. If none of the simplifying maneuvers had been performed, each of the
tables would be the size of the index of that quotient group.
Stage:                 1    2    3    4   Total
Original algorithm:    7   13   15   17    52
Improved algorithm:    7   13   15   15    50
Best possible:         7   10   13   15    45
Table 3-2 Number of moves for each stage in Thistlethwaite's algorithm4
3.4 Kociemba's Algorithm
Herbert Kociemba of Germany developed an algorithm based on
Thistlethwaite's that to date produces the best near-optimal
solutions in the shortest amount of time. It typically runs in a few minutes, whereas
finding an optimal solution for the same state may take far longer. Instead of using all the
nested subgroups in Table 3-1, Kociemba came upon the idea of only using G2 =
⟨U, D, F2, B2, R2, L2⟩ as an intermediate nested subgroup. Using IDA* with G2 for
his pattern database instead of lookup tables, he then searched for a position in G2.
By keeping track of the maneuvers to get there and subsequently using IDA* again
to search from that position in G2 to the goal state, he obtained a near-optimal
solution by combining the two sets of maneuvers.
4 Data is from Jaap's Rubik's Cube page at

However, to further reduce the total number of moves in his solution, his
algorithm had a second phase. It would then check to see whether any of the longer
paths from the initial scrambled state to G2 would provide an overall shorter
solution. In doing so he would produce a solution with the fewest possible
number of moves that contained an element of G2 somewhere along the path. If it
did not, then that solution would be optimal, since a full IDA* would have been
performed from the initial state to the goal state. Typical solutions provided by
Kociemba's algorithm are only one or two moves from optimal. Occasionally an
optimal solution will be produced that does not contain a state from G2. However,
to prove that the solution is optimal, a full IDA* of the whole space needs to be
performed, which takes significantly longer to run.
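The IDA* procedure underlying both phases can be sketched generically. The toy puzzle below (cyclically rotating the first three elements of a tuple) and its misplaced-count heuristic are invented for illustration; they stand in for Kociemba's actual move tables and pattern-database heuristic, but the threshold-deepening structure is the standard IDA* scheme:

```python
# Generic IDA*: depth-first search bounded by f = g + h, with the bound
# raised to the smallest exceeding f-value after each failed pass.
GOAL = (0, 1, 2, 3, 4)

def moves(state):
    # One move rotates the first three slots left or right.
    a, b, c = state[0], state[1], state[2]
    yield ("rotL", (b, c, a) + state[3:])
    yield ("rotR", (c, a, b) + state[3:])

def heuristic(state):
    # Each move touches at most 3 slots, so ceil(misplaced / 3) is an
    # admissible lower bound on the remaining distance.
    misplaced = sum(1 for i, v in enumerate(state) if v != i)
    return (misplaced + 2) // 3

def search(state, g, bound, path):
    f = g + heuristic(state)
    if f > bound:
        return f                      # report the overrun f-value
    if state == GOAL:
        return path                   # a solution of length g
    minimum = float("inf")
    for name, nxt in moves(state):
        t = search(nxt, g + 1, bound, path + [name])
        if isinstance(t, list):
            return t
        minimum = min(minimum, t)
    return minimum

def ida_star(start):
    bound = heuristic(start)
    while True:                       # assumes the goal is reachable
        result = search(start, 0, bound, [])
        if isinstance(result, list):
            return result
        bound = result                # next, larger threshold
```

With an admissible heuristic such as a pattern-database lookup, the first solution IDA* returns is optimal, which is why it appears in both the optimal solvers and each phase of Kociemba's algorithm.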
Michael Reid and others (cf. [Sch+1]) used G2 for their pattern database to
find optimal solutions for the Rubik's Cube. This particular pattern database
produced a better heuristic than the ones based on corners and edges used by
Korf. In fact, IDA* ran twenty times faster for Reid on a slower computer using less
memory. As such, the main concept here is that pattern databases need not be based
on a subset of puzzle pieces but, more generally, can be based on any subgroup of
the appropriate order.

4. Proposed Approach: Bayesian Reasoning
In this chapter, the possibility of using probability to solve an instance of the
Cube is explored.
4.1 Overview of Proposed Bayesian Approach
Standard search methods for finding optimal or near-optimal solutions
fail for larger puzzles such as the 4x4x4 or 5x5x5 Cube, whereas the human
approach of successively ordering the cube piece by piece does not. This proposed
approach therefore shouldn't be compared with those search methods, but rather with
the solutions generated by humans, since those are the solutions to effectively beat
for larger puzzles.
This approach requires the development of several metrics that can be used
in combination to determine whether a move would most likely decrease
the distance to the goal. By using a greedy algorithm that always chooses such a
move, the prior probability distribution would be shifted closer to the goal as moves
are successively applied, so that the probability of being closer to the goal is
increased until finally the goal state is reached.
An example of a possible metric would be to take the square of the number of
different colors on each face. The smaller the number, the more the cube resembles
the goal state. It should be noted that while this metric tends to decrease as a speed
cubist solves the cube, it alone is insufficient for determining how close the Cube is
to the goal state.
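A minimal sketch of this metric, assuming the cube is represented as six faces of nine sticker colors each and that the per-face squares are summed into a single number (both representational choices are assumptions, not from the thesis):

```python
# Sum over the six faces of the squared number of distinct sticker
# colors on that face; smaller values mean the cube looks more solved.
def color_metric(faces):
    return sum(len(set(stickers)) ** 2 for stickers in faces.values())

# A solved cube: each face shows a single color, so the metric is
# 6 * 1^2 = 6, the minimum possible value.
solved = {f: [f] * 9 for f in "UDFBRL"}
print(color_metric(solved))   # 6
```

Disturbing even one sticker raises the metric, since the affected face then shows two colors, illustrating why the value tends to fall as a solver tidies the cube.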
It should be noted that while some metrics may perform better at a given
distance from the goal state than others, they may perform poorly at another
distance. Given that, the probability of how well a metric performs needs to be
measured with respect to distance from the goal state. One major caveat of this
approach is whether such metrics even exist for states relatively far from the goal
state. This thesis assumes their existence. Part of the problem in implementing this
method would be to determine an appropriate set of metrics.
This method develops a series of probability tables that can be generated
from observable probabilities, i.e. probabilities measurable by performing a series of
optimal searches. From these probability tables, the probabilities needed to
determine how far one likely is from the goal state can be calculated by
successive application of Bayes' Rule.

Another problem with this approach is the requirement of using optimal
searches to determine these probabilities, since the total number of searches needed
to measure a probability within a reasonable level of accuracy may be beyond current
computational resources.
Originally it had been hoped that Kociemba's algorithm, which can generate
near-optimal solutions quickly, could be used to measure these probabilities.
However, given that there isn't a direct correlation between the length of a solution
generated by this algorithm and the actual optimal length, this won't work. This
claim was easily validated by feeding it a state that only required six moves. The
solution it immediately produced was 21 moves. However, given a state that
required 16 moves, it quickly produced a solution of 19 moves. Thinking
of the graph of the Cube, which is densely interwoven with a high level of connectivity,
it is reasonable to see that the length of the shortest path from one state to another, when
part of the path is required to use a restricted set of moves, isn't necessarily related to the optimal length.
If there were some correlation between the lengths of solutions produced by
a near-optimal method and the optimal solution, then one could measure the
probability of how many moves beyond optimal the given solution is and then
infer the optimal length with some reasonable probability. Referring back to God's
algorithm, it can be seen that solving the cube is equivalent to knowing the distance
one is from the goal state. If there were some way of making this measurement while
circumventing the full brute-force search, then the puzzle wouldn't be as
difficult to solve optimally as it has been demonstrated to be.
While this approach may be considered a way of making this
measurement without performing a brute-force search, it currently requires the use
of optimal searches to generate the tables of probabilities. While this undermines
using this approach for larger problems where current search algorithms fail, it may
be possible to generate the probabilities using other techniques such as those used in
reinforcement learning, though there are obstacles there as well.
4.2 Problems with Reinforcement Learning
One of the primary obstacles in translating this problem into the
reinforcement learning paradigm is that it requires a reward function based on the
state reached and action performed. If a reward function for a state is defined
in terms of distance from the goal state, then coming up with that function is
equivalent to the problem of having access to the lookup table used in God's
algorithm. Thus, to teach an agent through reinforcement learning using the obvious
set of unique states of the Cube and actions based on legal moves requires having
solved that problem optimally for all states.
Even if one were to step down to a simpler problem such as the 2x2x2 Cube,
for which such a lookup table has been generated, the sheer number of states would
still present a problem. The agent would need to explore all of them several times,
given that there are multiple actions associated with each state, in order to develop a
policy that could provide an action for any given state that would eventually lead to
a solution.
If one were to somehow overcome these barriers by cleverly defining a set
of aggregate states and corresponding actions for which a reward function could be
easily assessed, then one would be faced with how to weight the reward function
so that the agent didn't oscillate between getting closer to a solution and then
stepping away, gaining repeated rewards by doing so that would outweigh the reward
of going directly to the goal state. If one were to only give a reward for the goal state,
then the agent would have no idea of how well it was doing without looking ahead
that many moves. A possible solution would be to weight the rewards in a
sufficiently exponential manner such that, as the agent gets closer to the goal state, the
reward of reaching the goal state would outweigh any long-term oscillations.
An example of a possible set of aggregate states would be to consider all the
states in which only a single corner piece is correctly placed and oriented as one
aggregate state, all those with two adjacent corner pieces correctly placed but not correctly
oriented as another, and so on. For states that don't fall into any of these categories,
one could have a set of aggregate states that are distinguished by how close
they are, in the number of moves needed, to one of the states with at least one
piece correctly placed.
However, even this would lead to a large number of possible states given all
the possible subsets of corners and edges. One would most likely need to further
restrict the set of aggregate states to consider only a small portion of the total
number of combinations of corners and edges. By using these kinds of aggregate
states, one would begin to mimic the human approach of solving the cube by
successively placing cubies until the Cube is solved. Even so, the particular sequence
chosen might be shorter on average than any current human approach by choosing
to place the cubies in one order versus another given the initial aggregate state.
The above approach to defining aggregate states would also lead
to difficulties in defining the associated actions. The actions would no longer be
single moves, but rather macros. The possible macros to consider at each aggregate
state to reach a more desirable aggregate state are numerous, to say the least. That
isn't the main problem, though. The difficulty is that as the agent reaches a
state, the set of possible actions, i.e. macros that would lead to some other
aggregate state, wouldn't necessarily be known beforehand. Thus, there would need to
be some process by which the set of states and associated actions would evolve as the
agent explores more and more aggregate states.
Even given all the aforementioned difficulties, if one were able to design a
workable Markov decision process, then one would accomplish what no one so far
has: to have an agent learn how to solve the cube, i.e. develop a policy, without
simply searching for a sequence of macros that solves the Cube in some predefined
order of successively arranged pieces.
However, given all these obstacles, this thesis instead focuses on using
Bayesian reasoning to avoid the quagmire of defining an appropriate set of
aggregate states, weighting a reward function appropriately, and handling the
evolution of states and possible actions.
4.3 Description of Bayesian Approach
For each given distance d (minimum number of moves to the goal state) from 1
to 18+ and a given metric M, we define the symbols D and M with parameters d, x, y
as the following abbreviations:
D[d, x] = Distance to goal d changed by x (x can be 1, 0, or -1)
M[d, y] = Metric M changed by y in going from distance d
Brackets are used to avoid confusion with the standard use of parentheses to
denote a function, which neither D nor M is.
For simplicity in illustrating this approach, let us restrict y to the values 1, 0
and -1, where 1 means the metric increased, 0 means the metric stayed the same and
-1 means the metric decreased. Here let an increased value of the metric mean that
there is a stronger likelihood that the Cube is getting closer to the goal state.
There are two types of observable probabilities. The first is the probability of
the state changing from distance d:

Pr(D[d, x]) = (number of instances the distance changed from d to d + x)
            / (total number of instances in going from distance d)

The second observable probability is the conditional probability of how the
metric changes given the change in distance from d:

Pr(M[d, y] | D[d, x]) = (number of instances the metric changed by y in going from d to d + x)
                      / (number of instances in going from d to d + x)
Just to note, to observe each instance in calculating the probabilities, two
optimal solutions must be determined: the one from distance d and the one after
making a single move. Though the solution itself isn't of importance, the length of the
solution, i.e. the distance d, is. Since a huge majority of the states of the Cube reside at
either distance 16 or 17, coming up with distances of 18 or greater may require
several tries before one is discovered.
First we need the probability for y = 1, i.e. that the metric increased, determined
by applying Bayes' Rule:

Pr(D[d, -1] | M[d, y]) = Pr(M[d, y] | D[d, -1]) × Pr(D[d, -1]) / Pr(M[d, y])

The probability of the metric changing by y from distance d can be
calculated using the formula:

Pr(M[d, y]) = Σ_i Pr(D[d, i]) × Pr(M[d, y] | D[d, i])
We will also need to calculate the other two conditional probabilities, which
are Pr(D[d, -1] | M[d, y]) for y = 0 and y = -1, to determine how likely it is the distance
decreased even if the metric stayed the same or decreased.
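These two formulas can be sketched numerically. The probability values below are invented purely for illustration; in practice they would come from the measured tables described above:

```python
# Toy Bayes'-rule sketch: expand Pr(M[d, y]) by total probability over x,
# then invert to get the posterior Pr(D[d, x] | M[d, y]).
XS = (-1, 0, 1)

pr_D = {-1: 0.30, 0: 0.20, 1: 0.50}        # Pr(D[d, x]), invented numbers
pr_M_given_D = {                            # Pr(M[d, y] | D[d, x])
    -1: {-1: 0.1, 0: 0.2, 1: 0.7},          # metric tends to rise when
     0: {-1: 0.3, 0: 0.4, 1: 0.3},          # the distance drops, etc.
     1: {-1: 0.7, 0: 0.2, 1: 0.1},
}

def pr_M(y):
    # Total probability: Pr(M[d, y]) = sum_x Pr(D[d, x]) Pr(M[d, y] | D[d, x])
    return sum(pr_D[x] * pr_M_given_D[x][y] for x in XS)

def posterior(x, y):
    # Bayes' Rule: Pr(D[d, x] | M[d, y])
    return pr_M_given_D[x][y] * pr_D[x] / pr_M(y)
```

With these toy numbers, observing the metric increase (y = 1) raises the posterior probability that the distance decreased well above its prior of 0.30, which is precisely the inference the greedy move-selection relies on.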
Thus, for each metric M we need to store twelve probabilities, namely
Pr(D[d, x]) for each of the three values of x and Pr(M[d, y] | D[d, x]) for the nine
combinations of x and y. For each of the 18 categories of distance, 1 to 18+, that
would be a total of 216 probabilities. The category of 18+ is used since one could
determine within minutes whether or not the optimal length is 17 or less. Even so, it
would take several hours of computation to generate these probabilities to any level of
accuracy, since the total number of instances of applying a move at a distance d
would need to be at least a hundred if not more. Given that this is for a single metric,
generating all the required probabilities could become quite time consuming.

Using those numbers from the tables, a simple greedy algorithm would be as
follows:
1. Initially assign a prior probability distribution according to the believed
   distribution of the number of states at each distance d divided by the total
   number of states, which is 4.3 x 10^19. Table 1 for the 2x2x2 Cube could be
   scaled appropriately to do this by using 17 as the distance with the largest
   number of states and 20 as the maximum distance.
2. For each of the possible distances d:
   a. For each metric M:
      i. For each of the 18 single moves (three turns for each of the six
         faces), calculate Pr(D[d, -1] | M[d, y]) for the value of y given
         by the metric M after making the single move.
      ii. Keep a running total of the probabilities for each move over
          the metrics, weighting each by the factor W(M, d) of how
          good that metric is compared to all the other metrics at that
          distance d.
   b. Weight the total probabilities of each move by the prior probability
      of being at distance d.
   c. Choose the move with the greatest probability.
3. Update the probability distribution of d.
4. Go to step 2 and repeat until the Cube is solved.
The weighting factor W(M, d) of how good a metric is at distance d would be
calculated as follows:

G(M, d) = Σ_y Pr(D[d, -1] | M[d, y]) × Pr(M[d, y]) / Pr(D[d, -1])
        = Σ_y Pr(M[d, y] | D[d, -1])

W(M, d) = G(M, d) / Σ_M' G(M', d)

To update the probability distribution P(d) of being at distance d, calculate
the probability R(d, x) of going from distance d to d + x and use it as follows to
find the updated distribution P'(d):

R(d, x) = Σ_M W(M, d) × Pr(D[d, x] | M[d, y])

Q(d) = Σ_i R(d, i) × P(d + i) / Σ_j R(d, j)

P'(d) = Q(d) / Σ_d Q(d)

Again, the probability Pr(D[d, x] | M[d, y]) would be found using Bayes'
Rule as shown above. The denominator of the second formula normalizes R(d, x),
guaranteeing that it sums to 1 over the three values of x for a given d. The
probability distribution Q(d) doesn't strictly need to be normalized, but doing so keeps
any rounding errors from propagating between iterations. With infinite precision,
the updated probability P'(d) of being at distance d would equal Q(d).
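The update step can be sketched numerically. The toy distance range and the uniform inputs below are invented for illustration; R(d, x) here stands in for the weighted mixture over metrics defined above:

```python
# Sketch of the distribution update: Q(d) redistributes the prior P using
# the normalized transition probabilities R(d, x), and P'(d) renormalizes.
XS = (-1, 0, 1)
DMAX = 5                                   # toy distance range 0..DMAX

def update(P, R):
    """P[d]: prior over distances; R[d][x]: prob of moving from d to d + x."""
    Q = {}
    for d in range(DMAX + 1):
        norm_r = sum(R[d][j] for j in XS)          # normalize R over x
        Q[d] = sum((R[d][i] / norm_r) * P.get(d + i, 0.0) for i in XS)
    total = sum(Q.values())
    return {d: q / total for d, q in Q.items()}    # renormalize to P'(d)

P = {d: 1.0 / (DMAX + 1) for d in range(DMAX + 1)}      # uniform prior
R = {d: {x: 1.0 for x in XS} for d in range(DMAX + 1)}  # uninformative R
P_new = update(P, R)
```

Even with an uninformative R, the update remains a proper probability distribution; informative transition probabilities would instead shift the mass toward smaller distances as good moves are chosen.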
It should be noted that this greedy algorithm is fairly shortsighted: it doesn't
look more than one move ahead. It will not choose a move that may at first
seem poor but would lead to a better probability distribution sooner than simply
following this one-move greedy approach.
So rather than only considering a single move at a time, the algorithm could
consider the effects of two sequential moves, of which there are 243 unique combinations. This
would eliminate duplicate calculations between steps, immediately reducing the
branching factor from 18 to 13.5 for the second move. One can take this even
further by considering longer and longer sequences of moves, say of length l, and
choosing the one that most improves the probability distribution. Then one would
perform the first move of that sequence, update the probability distribution and
repeat. Since one has already searched l - 1 moves beyond that chosen move, storing the
moves considered so far in some kind of dynamic tree structure might improve
performance somewhat. This probably isn't necessary given that, for the high
branching factor involved, most of the time is spent on the frontier of the search, i.e.
at a distance l from the current state.
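The count of 243 unique two-move sequences can be verified by enumeration (an illustrative sketch, not from the thesis): exclude a repeat of the same face, and count each commuting opposite-face pair (e.g. L then R versus R then L, which give the same state) only once.

```python
from itertools import product

FACES = "UDFBRL"
OPPOSITE = {"U": "D", "D": "U", "F": "B", "B": "F", "R": "L", "L": "R"}
MOVES = [(f, t) for f in FACES for t in (1, 2, 3)]   # 18 single moves

unique = set()
for (f1, t1), (f2, t2) in product(MOVES, MOVES):
    if f1 == f2:
        continue                 # same face twice collapses to one move
    if OPPOSITE[f1] == f2 and f1 > f2:
        # Opposite faces commute; keep only one canonical ordering.
        f1, t1, f2, t2 = f2, t2, f1, t1
    unique.add((f1, t1, f2, t2))

print(len(unique))               # 243
```

Of the 18 × 15 = 270 ordered pairs on distinct faces, the 54 opposite-face pairs collapse to 27 unordered ones, giving 270 - 27 = 243 and hence the effective branching factor of 243 / 18 = 13.5 for the second move.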

4.4 Conclusions Regarding Bayesian Approach
Trying to employ learning and decision-process methods to solve puzzles
like the Rubik's Cube is difficult and problematic. However, the gains from achieving
such a workable approach are great: a whole new class of problems would then
become open to such methods. While the specific case of the
Rubik's Cube may seem quite limited, the above proposed approach is general
enough to apply to other deterministic domains with a uniform set of actions in
which the graph of the system has a high degree of connectivity with non-serializable
subgoals. The Cube can be thought of as a model for the shortest-path
problem for complex systems with a huge number of states. This approach could
even be extended to include the stochastic case as long as the associated
probabilities remained constant.
There are two primary problems with this greedy Bayesian approach. The
first is that determining the probabilities requires several optimal searches, which
makes it impractical for problems with a larger number of states than the standard
3x3x3 Rubik's Cube. However, if some other method could be devised by which
those probabilities could be iteratively determined, then this approach would
become more general and valuable.
The second is that the metrics needed by the method to increase the
likelihood of moving closer to the goal state may not exist in any simple form for
larger distances. As such, this method may not even work. One way of checking
whether this method could solve any problem of this class would be
to run it for the 2x2x2 case, in which all the distances are precisely known.
Determining the tables of probabilities given a set of metrics would then be trivial.
The main challenge, as with the standard Cube, would be to define an
appropriate set of metrics. Possibilities other than measuring the number of
colors per face would include counting the number of correctly placed pieces or
measuring how far the cube is from a nested subproblem of restricted moves. For
instance, if each face only had its own color and that of the opposite face, then only half
turns would be required to solve the cube. That subproblem has a smaller diameter than
the full problem, and thus would have a lower metric. However, such metrics
would require searching a few moves ahead to make them useful.
The main advantage of this method is that it only requires the storage of a
relatively small set of numbers, i.e. probabilities, to solve the problem, without
requiring huge tables. If the method could provide a solution, and if that solution
was on average shorter than those found using standard human approaches of
placing pieces in a specific order, then the method could generate better solutions
for larger problems, provided that the problem of determining the probabilities
without doing a full brute-force search to measure them has been solved.
Finally, this method circumvents many of the problems associated with the
reinforcement learning approach. However, if that method of learning could be
employed, then one would do what no one has done before for the Rubik's Cube:
apply AI learning to the Cube. Still, getting the Bayesian approach to work would
also be a first, in that it is a novel approach to solving the Cube. Pursuit of these
ideas may eventually lead to conquering problems where brute-force search would
utterly fail. Thus, it is worthwhile to attempt to overcome the various obstacles
given the potential reward.

A. Some Group Theory Background of the Rubik's Cube
The Rubik's Cube, along with many of the other permutation puzzles you
may have seen, can be described using group theory, a branch of mathematics that is
a specialization of abstract algebra. Briefly described, a group comprises a set
G, here all the possible states of the cube under legal maneuvers, with a binary
operation f : G × G → G. The properties that define G as a group are
commonly written using operator notation. Taking the binary operation f to be
represented by the operator ∗, the four properties of a group are:
1. Closure of G: For any a, b ∈ G, a ∗ b ∈ G
2. Associativity: a ∗ (b ∗ c) = (a ∗ b) ∗ c
3. Existence of an identity e: a ∗ e = e ∗ a = a
4. Existence of an inverse b: a ∗ b = b ∗ a = e
The Cube has all of these properties since any state can be thought of as a
sequence of moves to reach that state. Any sequence of moves is clearly
associative, has an inverse, namely the inverted sequence in reverse order, and the
identity element is no move at all. Finally, it is closed under the operation of successive
maneuvers since that is how it is defined.
For simplicity, this binary operator is typically omitted and assumed, so a ∗ b
becomes ab. Also, due to the associative nature of groups, parentheses are
omitted, so (ab)c would be written as abc. The total number of elements of a finite
group G is called its order and is written as |G|.
A subset H of a group G that also has all the properties of a group is called a
subgroup and is written as H ⊆ G or G ⊇ H. The corner cubies in the previously
described pattern database are an example of a subgroup of the Rubik's Cube. One of
the important properties of subgroups is the existence of cosets. A left coset of a
subgroup H is defined as tH = {th : h ∈ H}, where t is known as the left transversal or
representative of the left coset. Right cosets and right transversals are defined
analogously. The left and right cosets of a subgroup need not be the same. If
they are, then the subgroup is called a normal subgroup; normal subgroups play an
important role in group theory.

What makes cosets important is that they partition a group into subsets of the
same cardinality5. The number of cosets for a given subgroup S of a finite group G
is called its index, written as [G:S]. This leads to one of the most important
theorems in group theory, Lagrange's theorem, which states that for finite groups the order
of a subgroup S evenly divides the order of its group G, or written more succinctly,
[G:S] = |G| / |S|.
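These facts are easy to check computationally on a small example (an illustrative sketch, not from the thesis): take G to be the integers mod 12 under addition and S the subgroup {0, 4, 8}.

```python
# Cosets of S = {0, 4, 8} in G = Z12 under addition mod 12: they should
# partition G into equal-sized pieces, with index [G:S] = |G| / |S| = 4.
G = set(range(12))
S = {0, 4, 8}

# Each element t of G generates the coset t + S; duplicates collapse.
cosets = {frozenset((t + h) % 12 for h in S) for t in G}

print(len(cosets))     # 4 distinct cosets, each of size |S| = 3
```

The four cosets {0,4,8}, {1,5,9}, {2,6,10} and {3,7,11} partition G exactly as Lagrange's theorem predicts: 12 / 3 = 4.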
Cosets have the same structure as their corresponding subgroup. The
collection of cosets of a normal subgroup itself forms a group, called a
factor group or quotient group. Since a transversal for a given coset isn't unique,
one can't directly form a group out of an arbitrary set of transversals (the set of
representatives taken from each coset) for a given subgroup. However, given a
subgroup H, for any two transversals s and t there exists an
element k ∈ stH such that ab = k, where a and b are elements in the respective
cosets, namely a ∈ sH and b ∈ tH. It must be noted that while k may be taken as a
valid transversal for the coset stH, it need not equal st. However, the element st is
guaranteed to be in the coset stH since H contains the identity.
An example of all this can be seen by looking at the Rubik's Cube. One
group that can be formed out of the Cube is the set of all possible positions
obtainable if one were able to take the Cube apart and put the cubies back in any
order. A subgroup of this larger group is the set of all legal positions reachable using a
sequence of moves, or twists, from the goal state. The index of the quotient group, i.e. the
number of cosets, of this subgroup turns out to be 12. Stated another way, there
is only a one in twelve chance of being able to restore a cube that has had its cubies
inserted randomly after having been taken apart. The 12 cosets of this subgroup all
have the same order of roughly 4.3 x 10^19. The graphs for each of these cosets are
isomorphic to one another.
A group can also be described by its generators. They are a smallest set of
elements such that any element of the group can be written as a sequence of these
generators. The sequence is evaluated by applying the group operator between
successive terms. Again, because of associativity, the order of evaluation
doesn't matter. For the Rubik's Cube, its set of generators is the primitive
operators, or clockwise turns of each face, written as R, L, U, D, F and B using
Singmaster notation. To denote the group generated by these operators, one uses
the notation ⟨R, L, U, D, F, B⟩.
5 Two sets have the same cardinality if there is a bijective (one-to-one and onto) mapping between
them. Thus finite sets with the same cardinality have the same number of elements, and,
loosely speaking, infinite sets of the same cardinality have the same degree of infinity.

B. Symmetry of the Cube
To fully understand the various symmetries of the cube, one needs to understand the
difference between symmetries based on rotation versus reflection and the role
inversion plays. In terms of three-dimensional geometry, a symmetry performs a
transformation upon some solid figure, in this case a cube, using rotation, reflection
or inversion, such that the resulting solid occupies the same space it did before the
transformation. This transformation may correspond to an actual physical
movement of the solid, as is the case with rotation, or it may not, as is the case with
reflection and inversion.
Rotation requires an axis about which to rotate the solid. A rotational symmetry can
be thought of as sticking a straight rod through the center of the solid and turning it
by some fraction of 360° so that the solid looks the same after as it did before.
Figure B-1 shows the 13 axes of rotation for the cube that can result in symmetry6.
The medium grey axes are face centered. The darker grey axes are through
antipodal corners. The six light grey axes are through the midpoints of opposing edges.
Figure B-1 The 13 axes of rotational symmetry of the cube7
A cube has 24 unique rotations that can be performed. One comes from no rotation
at all, 9 result from 90°, 180° and 270° turns about each of the three medium grey
axes, 8 result from 120° and 240° turns about each of the four dark grey axes and 6
result from 180° turns about each of the 6 light grey axes. This gives a total
rotational symmetry of 1 + 9 + 8 + 6 = 24. See Table B-1 below for a summary.
6 Figures in this section were generated using Mathematica 4.1 for Students.
7 The front faces of the cubes were removed to better view the various axes.

In group theory, this group of rotations corresponds to the symmetric group on four
elements, written S4, which is simply the group of permutations of four elements. The order
of this group is 4! = 24, and by no small coincidence, these two groups are
isomorphic. The permutations can be thought of as permuting the four red corner axes in
Figure B-1. Each of the 24 rotations gives a unique permutation of these axes.
Rotation   Color   Degrees of rotation   Rotations per axis   Number of axes   Total rotations
Face       Blue    90, 180, 270          3                    3                 9
Corner     Red     120, 240              2                    4                 8
Edge       Green   180                   1                    6                 6
None                                                                            1
Total                                                         13               24
Table B-1 Summary of rotations of the cube
Whereas a rotation requires an axis, a reflection requires a plane upon which to
reflect the solid. Reflection symmetry can be thought of as inserting a mirror
through the middle of a solid and taking the combination of the two resulting
reflections as the new solid. There are a total of nine planes of reflection for the
cube. The first three come from reflecting the cube through a plane that divides the
cube horizontally or vertically. These planes are orthogonal to the cube. See Figure B-2.
Figure B-2 Three orthogonal planes of reflection for the cube
The next six reflections can be thought of as three pairs of diagonal planes such that
when a pair is viewed from one of the three axes, x, y or z, they appear to be
perpendicular straight lines that intersect the opposite corners of the face centered
on that axis. Hence each axis is the line common to one pair of diagonal planes.
See Figure B-3 for an illustration of these.
Finally, inversion is more difficult to describe and doesn't lend itself easily to
illustration. Inversion can be thought of as a combination of three sets of swaps.
Referring to Figure B-1, the first is the set of opposing faces that are each centered
on one of the three blue axes; the second is the set of antipodal corners that are each
intersected by one of the four red axes; and the third is the set of opposing edges
that are each bisected by one of the six green axes. By swapping these faces,
corners and edges, the cube is inverted from that state. In terms of group theory, if
one considers the full group of symmetries of the cube, then the subgroup of rotations
without inversion has order 24, corresponding to the rotations noted above, and its
coset, consisting of those same rotations composed with the inversion, also has order
24. Hence, there are a total of 48 symmetries of the cube.
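The order-48 count and the coset structure can also be checked directly. In the sketch below (an illustration added here, not thesis code), the full symmetry group is modeled as all signed 3x3 permutation matrices, the rotation subgroup as those with determinant +1, and the coset as each rotation composed with the inversion -I:

```python
from itertools import permutations, product
import numpy as np

def signed_permutation_matrices():
    """All 48 symmetries of the cube as signed 3x3 permutation matrices."""
    mats = []
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            m = np.zeros((3, 3), dtype=int)
            for row, (col, sign) in enumerate(zip(perm, signs)):
                m[row, col] = sign
            mats.append(m)
    return mats

symmetries = signed_permutation_matrices()
rotations = [m for m in symmetries if round(np.linalg.det(m)) == 1]
coset = [-m for m in rotations]   # each rotation composed with the inversion -I

print(len(symmetries), len(rotations), len(coset))  # 48 24 24
```

Every element of the coset has determinant -1, so the coset is disjoint from the rotation subgroup, and together the two account for all 48 symmetries.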
While it may not be apparent, any rotation or inversion can be expressed as two or
three successive reflections. In fact, six of the reflections generate the entire group
of cubic symmetries of order 48, where the first three are the orthogonal ones and
the second three are one from each pair of diagonal reflections. Only one of each
pair of diagonal planes is needed because one can be reflected through the
orthogonal plane sharing the same axis to obtain the other. For instance, a 90-
degree rotation about the z-axis can be obtained as two successive reflections: one
through the first plane of reflection in Figure B-2, and one through the first plane of
reflection (the wider one of the two) in Figure B-3.
Figure B-3 Six diagonal planes of reflection for the cube
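The composition of two reflections into this rotation can be verified with matrices. In the sketch below the orthogonal plane is taken to be x = 0 and the diagonal plane to be x = y; matching these to the specific planes drawn in Figures B-2 and B-3 is an assumption, since the figures are not reproduced here:

```python
import numpy as np

# Reflection through the plane x = 0 (one of the three orthogonal planes).
refl_orth = np.array([[-1, 0, 0],
                      [ 0, 1, 0],
                      [ 0, 0, 1]])
# Reflection through the diagonal plane x = y (swaps the x and y coordinates).
refl_diag = np.array([[0, 1, 0],
                      [1, 0, 0],
                      [0, 0, 1]])
# A 90-degree rotation about the z-axis: (x, y, z) -> (-y, x, z).
rot_z90 = np.array([[0, -1, 0],
                    [1,  0, 0],
                    [0,  0, 1]])

print(np.array_equal(refl_orth @ refl_diag, rot_z90))    # True
print(np.array_equal(refl_diag @ refl_orth, rot_z90.T))  # reverse order gives the inverse rotation
```

Note that the order of composition matters only up to direction: one order yields the 90-degree rotation, the reverse order yields its inverse (a 270-degree rotation).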

[BA] Babai, László and Seress, Ákos. On the diameter of permutation groups. European Journal
of Combinatorics, Vol. 13, No. 4, pp. 231-243, July 1992.
[Ba] Bandelow, Christoph. Inside Rubik's Cube and Beyond. Birkhäuser, 1982.
[BCG] Berlekamp, Elwyn R.; Conway, John H.; and Guy, Richard K. Winning Ways Vol. 2:
Games in Particular, pp. 760-768. Academic Press, 1982.
[Bu] Butler, Gregory. Fundamental Algorithms for Permutation Groups. Springer-Verlag, 1991.
[Cr] Cromwell, Peter. Polyhedra. Cambridge University Press, 1997.
[CS1] Culberson, Joseph C. and Schaeffer, Jonathan. Searching with pattern databases. Canadian
Conference on AI, 1996, pp. 402-416.
[CS2] Culberson, Joseph C. and Schaeffer, Jonathan. Efficiently searching the 15-Puzzle.
Technical Report, Department of Computer Science, University of Alberta, 1994.
[DM] Dixon, John D. and Mortimer, Brian. Permutation Groups. Springer-Verlag, 1996.
[Eid] Eidswick, J. A. Cubelike puzzles - what are they and how do you solve them? American
Mathematical Monthly, Vol. 93, No. 3, pp. 157-176, March 1986.
[EP] Egner, Sebastian and Püschel, Markus. Solving puzzles related to permutation groups.
Proceedings of the International Symposium on Symbolic and Algebraic Computation (ISSAC), 1998.
[Fr] Fridrich, Jiri. My Speed Cubing Page,
[HB] Hecker, David and Banerji, Ranan. The slice group in Rubik's Cube. Mathematics Magazine,
Vol. 58, No. 4, pp. 211-218, September 1985.
[Ho1] Hofstadter, Douglas R. Metamagical Themas: The Magic Cube's cubies are twiddled by
cubists and solved by cubemeisters. Scientific American, pp. 20-39, March 1981.
[Ho2] Hofstadter, Douglas R. Metamagical Themas: Beyond Rubik's Cube: Spheres, pyramids,
dodecahedrons and God knows what else. Scientific American, pp. 16-31, July 1982.
[HM] Huang, Guoxiang and Myers, Dale. Subgoal strategies for solving board puzzles. Journal
of Automated Reasoning, Vol. 20, pp. 215-253, 1998.
[Jo1] Joyner, David. Adventures in Group Theory: Rubik's Cube, Merlin's Machine, and Other
Mathematical Toys. Johns Hopkins University Press, April 2002.

[Jo2] Joyner, David. Lecture notes on the mathematics of the Rubik's cube. nts.htm.
[KS] Kantor, William M. and Seress, Ákos. Permutation group algorithms via black box
recognition algorithms. Groups St Andrews 1997 in Bath (Eds. C. Campbell et al.), LMS Lecture
Notes Series 261, pp. 436-446. Cambridge University Press, 1999.
[Koc] Kociemba, Herbert. Cube Explorer. http://home. t-onl
[Koe] Koeller, Juergen. 15 Puzzle,
[Korl] Korf, Richard. Recent progress on the design and analysis of admissible heuristic functions.
Proceedings of the National Conference on Artificial Intelligence (AAAI-2000), Austin, TX, pp.
1165-1170, August 2000.
[Kor2] Korf, Richard. Sliding-tile puzzles and Rubik's Cube in AI research. IEEE Intelligent
Systems, pp. 8-12, November 1999.
[Kor3] Korf, Richard. Finding optimal solutions to Rubik's Cube using pattern databases.
Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI-97), Providence,
RI, pp. 700-705, July 1997.
[Kor4] Korf, Richard. Depth-first iterative-deepening: An optimal admissible tree search.
Artificial Intelligence, Vol. 27, No. 1, pp. 97-109, 1985.
[Kor5] Korf, Richard. Macro-operators: A weak method for learning. Artificial Intelligence, Vol.
26, No. 1, pp. 35-77, 1985.
[Kor6] Korf, Richard. Learning to Solve Problems by Searching for Macro-Operators. Pitman
Publishing, Ltd, London, 1985.
[KorEd] Edelkamp, Stefan and Korf, Richard. The branching factor of regular search spaces.
National Conference on Artificial Intelligence (AAAI), Madison, Wisconsin, pp. 299-304, AAAI
Press, 1998.
[KorREd] Korf, Richard; Reid, Michael; and Edelkamp, Stefan. Time complexity of Iterative-
Deepening A*. Artificial Intelligence, Vol. 129, No. 1-2, pp. 199-218, June 2001.
[KorTa1] Korf, Richard and Taylor, Larry A. Finding optimal solutions to the Twenty-Four Puzzle.
Proceedings AAAI-96/IAAI-96, Menlo Park, pp. 1202-1207, AAAI Press, 1996.
[KorTa2] Taylor, Larry A. and Korf, Richard. Pruning duplicate nodes in depth-first search.
Proceedings AAAI-94, Menlo Park, pp. 756-761, AAAI Press, 1994.
[KO] Karlemo, Filip R. W. and Östergård, Patric R. J. On sliding block puzzles. Journal of
Combinatorial Mathematics and Combinatorial Computing, Vol. 34, pp. 97-107, 2000.
[La] Larson, Mogens Esrom. Rubik's Revenge: The group theoretical solution. American
Mathematical Monthly, Vol. 92, No. 6, pp. 381-390, 1985.

[Lo] Longridge, Mark. Superflip, 24q.
Lovers/Mark Longridge Superflip 24q.html.
[Mil] Miller, David Lee Winston. Solving Rubik's Cube Using the Bestfast Search Algorithm and
Profile Tables.
[Min] Minkwitz, Torsten. An algorithm for solving the factorization problem in permutation
groups. Journal of Symbolic Computation, Vol. 26, pp. 89-95, 1998.
[Pe] Petrus, Lars. Solving Rubik's Cube for Speed,
[Row] Rowley, Chris. The group of the Hungarian Magic Cube. Proceedings of the First Western
Australian Conference on Algebra, 1982.
[Rob] Robinson, Derek J. S. A Course in the Theory of Groups, 2nd Edition. Springer-Verlag,
[Rot] Rotman, Joseph J. An Introduction to the Theory of Groups, 4th Edition. Springer-Verlag,
[RN] Russell, Stuart and Norvig, Peter. Artificial Intelligence, A Modern Approach. Prentice-Hall,
[RVKMV] Rubik, Ernő; Varga, Tamás; Kéri, Gerzson; Marx, György; and Vekerdy, Tamás.
Rubik's Cubic Compendium. Oxford University Press, 1987.
[Scher1] Scherphuis, Jaap. Computer Puzzling, Jaap's Puzzle Page,
[Scher2] Scherphuis, Jaap. Rubik's Cube 2x2x2, Jaap's Puzzle Page,
[Scher3] Scherphuis, Jaap. Rubik's Cube 3x3x3, Jaap's Puzzle Page,
[Sch+1] Schönert, Martin et al. Cube Lovers Index by Date.
[Sch+2] Schönert, Martin et al. Analyzing Rubik's Cube with GAP.
[SiF] Frey, A.H., Jr. and Singmaster, David. Handbook of Cubik Math. Enslow, 1982.
[Si+1] Singmaster, David et al. Cubic Circular, 1981-1985. 1 .htm.
[Si2] Singmaster, David. Notes on Rubik's Magic Cube. Enslow, 1981.
[Su] Sullivan, John B. Groups and Geometry. Wm. C. Brown Publishers, 1994.

[TG] Turner, Edward C. and Gold, Karen F. Rubik's groups. American Mathematical Monthly,
Vol. 92, No. 9, pp. 617-629, November 1986.
[vCS] van de Craats, J. and Schoof, R.J. On the normal subgroups of Rubik groups. Nieuw Archief
Voor Wiskunde, Vol. 2, pp. 267-280, 1984.
[We] Weisstein, Eric W. Rubik's Cube from MathWorld.
[Wi1] Winter, Dik T. Kociemba's Algorithm, http://www.math.rwth- T. Winter Kociemba's algorithm (2).html.
[Wi2] Winter, Dik T. Diameter of Cube Group, http://www.math.rwth- T. Winter Diameter of cube group .html.
[Za] Zassenhaus, Hans. Rubik's Cube: A toy, a Galois tool, group theory for everybody. Physica,
Vol. 114A, pp. 629-637, 1982.