AUTOMATED
MATHEMATICIAN
Supplementary Materials
for the Stanford Heuristic Programming Project Workshop
January 5-8, 1976
Douglas B. Lenat
Contents
1. Brief global description of the AM project
2. The tutorial talk
3. The details talk
4. Glossary of concepts and terms
5. Existing documentation
1. Overview
Researchers in most branches of science frequently face the difficult task of formulating research
problems which must be soluble yet nontrivial. In any given field, it is usually easier to tackle a
specific given problem than to propose interesting yet manageable new questions to investigate.
For example, contrast solving the Missionaries and Cannibals problem with the more ill-defined reasoning which led to inventing it.
Let's restrict our attention to creative theory formation in mathematics: how to propose interesting
new concepts and plausible hypotheses connecting them. Although many great minds have
introspected on this problem [Poincaré, Hadamard, Polya], we in AI all know the gulf that
separates smooth prose from smooth code.
The experimental vehicle of my research is a computer program called AM (for Automated
Mathematician), which carries out some of the activities involved in mathematical research:
noticing simple relationships in empirical data, formulating new definitions out of existing
ones, proposing some plausible conjectures (and, less importantly, sometimes proving
them), and evaluating the aesthetic "interestingness" of new concepts.
Before discussing how to synthesize a new theory, consider briefly how to analyze one, how to
construct a plausible chain of reasoning which terminates in a given discovery. One can do this
by working backwards, by reducing the creative act to simpler and simpler creative acts. For
example, consider the concept of prime numbers. How might one be led to define such a
notion? Notice the following plausible strategy:
"If f is a function which transforms elements of A into elements of B, and B is
ordered, then consider just those members of A which are transformed into
extremal elements of B. This set is an interesting subset of A."
When f(x) means "factors of x", and the ordering is "by length", this heuristic says to consider
those numbers which have a minimal[1] number of factors, that is, the primes. So this rule
actually reduces our task from "proposing the concept of prime numbers" to the more elementary
problems of "inventing factorization" and "discovering cardinality".
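The heuristic quoted above can be tried out directly. The following is only a minimal sketch in Python, not AM's own code: the helper names `divisors` and `extremal_preimage` are hypothetical, and the domain is restricted to n >= 2 as an assumption, since 1 has a single divisor and would otherwise be the unique minimum.

```python
def divisors(n):
    """Return the sorted list of positive divisors of n."""
    return [d for d in range(1, n + 1) if n % d == 0]


def extremal_preimage(f, domain, key, pick=min):
    """Members of `domain` whose image under f is extremal when measured by `key`."""
    images = {x: key(f(x)) for x in domain}
    extreme = pick(images.values())
    return [x for x in domain if images[x] == extreme]


candidates = range(2, 30)  # assumption: start at 2 so that 1 is excluded

# f = divisors, ordering = "by length", extremum = minimum
fewest = extremal_preimage(divisors, candidates, key=len, pick=min)
print(fewest)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

# The other extreme: the number in the same range with the most divisors
most = extremal_preimage(divisors, candidates, key=len, pick=max)
print(most)  # [24]
```

Swapping `min` for `max` is all it takes to move from primes to the maximally divisible numbers of the footnote, which is the sense in which one rule yields both concepts.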

But suppose we know this general rule: "If f is an interesting function, consider its inverse." It
reduces the task of discovering factorization to the simpler task of discovering multiplication.
Eventually, this task reduces to the discovery of very basic notions, like substitution, set-union, and
equality. To explain how a given researcher might have made a given discovery, such an
analysis is continued until that inductive task is reduced to "discovering" notions which the
researcher already knew.
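As a rough illustration of the "consider its inverse" rule, the sketch below inverts multiplication over a small finite domain and obtains a table of factor pairs. The function names `multiply` and `invert` are assumptions made for this example and do not come from AM.

```python
def multiply(a, b):
    return a * b


def invert(op, domain):
    """Build the inverse relation of a binary operation over a finite domain.

    The result maps each output value to the set of argument pairs producing it.
    """
    inverse = {}
    for a in domain:
        for b in domain:
            inverse.setdefault(op(a, b), set()).add((a, b))
    return inverse


factor_pairs = invert(multiply, range(1, 13))
print(sorted(factor_pairs[12]))
# [(1, 12), (2, 6), (3, 4), (4, 3), (6, 2), (12, 1)]
```

Note that the inverse of a function is in general only a relation: one product corresponds to several factor pairs.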
Suppose a large collection of these heuristics has been assembled (e.g., by analyzing a great many
discoveries, and writing down new heuristic rules whenever necessary.) Instead of using them to
explain how a given idea might have evolved, one can imagine starting from a basic core of
knowledge and "running" the heuristics to generate new concepts.
Such syntheses are precisely what AM does. The program consists of a corpus of primitive
mathematical concepts and a collection of guiding heuristics. AM's activities all serve to expand
AM itself, to enlarge upon a given body of mathematical knowledge. To cope with the
enormity of the potential "search space" involved, AM uses its heuristics as judgmental criteria
to guide development in the most promising direction. It appears that the process of
inventing valuable new concepts can be guided successfully using a collection of a few hundred
such heuristics.
Each concept is represented as a BEING[3], a frame-like data structure with 30 different facets or
slots. The types of slots include: Examples, Definitions, Generalizations, Utility, Analogies,
Interestingness, Uninterestingness, and a couple dozen others. The BEINGs representation
provides a convenient scheme for organizing the heuristics; for example, the following strategy fits
into the Examples slot of the Predicate concept: "If, empirically, 10 times as many elements fail
some predicate P, as satisfy it, then some generalization (weakened version) of P might be more
interesting than P". AM considers this suggestion after trying to fill in examples of any
predicate.[4]
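The Examples-slot strategy quoted above amounts to a simple empirical test. Here is a hedged sketch in Python: the function `suggest_generalization`, the sample space (all pairs of subsets of a four-element set) and the two predicates are choices made for this illustration, not details taken from AM.

```python
from itertools import combinations


def subsets(items):
    """All subsets of `items`, as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def suggest_generalization(predicate, samples, ratio=10):
    """True when at least `ratio` times as many samples fail the predicate as satisfy it."""
    successes = sum(1 for s in samples if predicate(*s))
    failures = len(samples) - successes
    return len(samples) > 0 and failures >= ratio * successes


pairs = [(a, b) for a in subsets(range(4)) for b in subsets(range(4))]


def equal(a, b):
    return a == b


def same_length(a, b):
    return len(a) == len(b)


print(suggest_generalization(equal, pairs))        # True: 16 of 256 pairs are equal
print(suggest_generalization(same_length, pairs))  # False: 70 of 256 pairs match
```

On this sample the strict predicate triggers the suggestion while its weakened version does not, so the rule stops firing once a generalization with enough examples has been reached.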
AM is initially given a large collection of core concepts, with only a few slots filled in for each.
Its sole activity is to choose some facet of some concept, and fill in that particular slot. In so
doing, new notions will often emerge. Uninteresting ones are forgotten, mildly interesting ones
are kept as parts of one slot of one concept, and very interesting ones are granted full
concept status. Such new Beings will have dozens of blank parts, hence the space of
possible actions (slots to fill in) grows rapidly. The same heuristics are used both to suggest new
directions for investigation, and to limit attention: both to grow and to prune.
The particular mathematical domains in which AM operates depend on the choice of initial
concepts. Currently, AM is given about a hundred concepts, all of which are what Piaget might
describe as pre-numerical: Sets, substitution, equality, relations, and so on. In particular, AM is
not told anything about proof, single-valued functions, or numbers. Although it was never
able to prove the unique factorization theorem, AM actually did conjecture it.[5] Before this,
AM had to define and investigate concepts corresponding to those we refer to as cardinality,
multiplication, factors, and primes, based on reasoning similar to that in the example above.
The main difficulty with AM is getting it to accurately judge a priori the value of each new
concept, to quickly lose interest in concepts which aren't going to develop into anything. As with
many AI programs, one aspect of working on AM is the degree of precision with which one's
ideas must be formulated. The resultant body of detailed heuristics may be the germ of a more
efficient programme for educating math students than the current dogma.[6] But perhaps the most
exciting prospect opened up by AM is that of experimentation: one could vary the concepts AM
starts with, vary the heuristics available, etc., and study the effects on AM's behavior. AM is a
dissertation project in progress; few conclusions have been drawn yet.
The issues to be elaborated upon include:
(1) What are these heuristics? Where do they come from, what is their justification, their power?
(2) What is the AM program like? What is its control structure, its representation for a concept?
How do the heuristics fit in?
(3) How does the AM program work? What does it start with, what does it do from there? How
and why?
(4) What can we all learn from AM? Abstracted out, what are the new ideas, the traps that were
fallen into?
[1] The other extreme, numbers with a MAXIMAL number of factors, will also be proposed as worth investigating. This leads to many
interesting questions; the only "new-to-Mankind" mathematical result so far is in fact that such maximally-divisible numbers
must have the form $p_1^{a_1} p_2^{a_2} p_3^{a_3} \cdots p_k^{a_k}$, where the $p_i$'s are the first k consecutive primes, and the exponents $a_i$ decrease
with i, and the ratio $(a_i+1)/(a_j+1)$ is approximately (as closely as is possible for integers) $\log(p_j)/\log(p_i)$. For example, a
typical divisor-rich number is $n = 2^8 \, 3^5 \, 5^3 \, 7^2 \, 11^2 \, 13^1 \, 17^1 \, 19^1 \, 23^1 \, 29^1 \, 31^1 \, 37^1 \, 41^1 \, 43^1 \, 47^1 \, 53^1$. The progression of its
exponents+1 (9 6 4 3 3 2 2 2 2 2 2 2 2 2 2 2) is about as close as one can get to satisfying the "logarithm" constraint.
This number n has 3,981,312 divisors. The "AM Conjecture" is that no number smaller than n has that many divisors. By
the way, this n equals 25,603,675,584.
[2] Incidentally, these basic concepts include the operators which enlarge the space (e.g., COMPOSE2RELATIONS is both a
concept in its own right and a way to generate new ones).
[3] Lenat, Douglas, BEINGS: Knowledge as Interacting Experts, 4th IJCAI, 1975, pp. 126-133.
[4] In fact, after AM attempts to find examples of [...], so few are found that AM decides to generalize that predicate. The
result is the predicate which means "Has-the-same-length-as", i.e., the rudiments of Cardinality.
[5] Due to the firm base of preliminary concepts which AM developed, this relationship was almost obvious. AM sought some predicate
P which, for each n, some member of FACTORSOF(n) satisfied. ALLPRIMES was such a predicate. AM next constructed
the relation which associates, to each number n, all factorizations of n into primes. The full statement of the UFT is simply
that this relation is a function; i.e., it is defined and single-valued for all numbers n.
[6] Currently, the educator takes the very best work any mathematician has ever done, polishes it until its brilliance is blinding, and
presents it to the student to induce upon. A few individuals (e.g., Minsky and Papert at MIT, Adams at Stanford) are
experimenting with more realistic strategies for "teaching" creativity.
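The divisor count quoted in footnote 1 can be checked by hand or by a few lines of code: a number with prime exponents a_1, ..., a_k has (a_1+1)(a_2+1)...(a_k+1) divisors. The sketch below takes the "exponents+1" progression as printed in the footnote and assumes, as the footnote's general form suggests, that the primes involved are the first sixteen. It also compares each entry with the value the "logarithm" constraint would give relative to the prime 2.

```python
from math import log, prod

# Assumption: the first sixteen primes, paired with the progression from footnote 1
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53]
exponents_plus_one = [9, 6, 4, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]

# Number of divisors = product of (exponent + 1) over all prime factors
divisor_count = prod(exponents_plus_one)
print(divisor_count)  # 3981312

# The constraint (a_i + 1) / (a_j + 1) ~ log(p_j) / log(p_i), taken with p_i = 2,
# says each entry should be close to 9 * log(2) / log(p).
ideal = [exponents_plus_one[0] * log(2) / log(p) for p in primes]
for p, actual, target in zip(primes, exponents_plus_one, ideal):
    print(p, actual, round(target, 2))
```

Rounding each ideal value to the nearest integer reproduces the printed progression, which is one way to read "about as close as one can get".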
2. Tutorial Talk
Objective: [...] the system for those unacquainted with it. Also, try to convey a [...]
Contents: [...] Some problems you might encounter [...]
3. Details Talk
Objective: [...] issues (which have analogues in most other [...]) [...] Some of these detailed topics are listed below.
Questions to discuss: [...] representations and languages [...] Discovering vs. being led by the nose [...] AM [...] priorities [...] next year at Stanford.
Prerequisite concepts:
Relations, Functions, Ordering: just glance at the Glossary for those items you feel uncomfortable about.
Integers, Natural Numbers.
Prime numbers: What are they? Why are they interesting? Are they useful? (If you are unsure, read the Primes definition in the Glossary.)
Modular Representations of [...]
Mathematical research.
4a. Glossary of Math Terms
Cardinality: the concept of "number".
Two sets are of the same cardinality if they have the
same number of elements.
Composition of two relations R and S: This is a new relation denoted RoS, and defined as
RoS(x) = R(S(x)). So RoS maps elements of the domain of S into elements of the range of R.
Notice that if R and S are both functions, then so is RoS. The intuitive picture of this process is
to operate on x with the relation S, and then apply R to the results.
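A small sketch may make the definition of composition concrete. It assumes a relation is stored as a Python dict from each domain element to the set of elements it is associated with; that encoding and the name `compose` are choices made for this example only.

```python
def compose(R, S):
    """Return RoS: operate on x with S, then apply R to every result."""
    return {x: set().union(*(R.get(y, set()) for y in ys))
            for x, ys in S.items()}


# Two genuine relations (multi-valued)
S = {1: {2, 3}, 2: {3}}
R = {2: {"a"}, 3: {"b", "c"}}
print(compose(R, S) == {1: {"a", "b", "c"}, 2: {"b", "c"}})  # True

# Two functions, written as single-valued relations
successor = {x: {x + 1} for x in range(5)}
square = {x: {x * x} for x in range(10)}
square_of_successor = compose(square, successor)
print(square_of_successor[3])  # {16}
```

In the second case every image set has exactly one member, matching the remark that the composition of two functions is again a function.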
Function: an operation f which associates, to each element x of some set D, an element f(x) of
some set R. D and R are the domain and range of f. Notice that a function may be considered
a single-valued relation.
Integers: positive and negative whole numbers; i.e., ..., -2, -1, 0, 1, 2, ...
Map: used as a verb, this word indicates the action of applying a function or a relation; e.g.,
we say that squaring maps 7 into 49. Used as a noun, it is a synonym for function.
Mathematical concept: this is taken to mean all the constructions, definitions, conjectures,
operations, structures, etc. that a mathematician deals with. Some examples: Set-intersection,
Sets, The unique factorization theorem, every entry listed in this glossary.
Mathematical intuition: this is the mental imagery which can be brought to bear.
Typically, we transform the situation to an abstract, simplified one, manipulate it there, and
retranslate the results into the original notation. For example, our intuition about "ordering"
may involve the image of marks on a yardstick. We can then answer questions involving
ordering rapidly, using this representation. Three features of the intuitive image should
be noted: (i) it is typically fast and simple, (ii) it is opaque, one cannot introspect too easily on
"why it works", and (iii) it is fallible, occasionally leading to wrong results.
Mathematical research: The fundamental idea here is that mathematics is an empirical science,
just as much as chemistry or physics. In doing research, the ultimate goal is the creation
of new, interesting theories, but the techniques used include looking for patterns in empirical
data, inducing new conjectures, modelling some aspects of the real world, etc. Although the final
product looks like a smooth, formal development, magically flowing from postulates to lemmas to
theorems, the actual research process involved untold blind alleys, rough guesses, and hard
work. (Analogy: The process of painting is rarely itself artistic.)
Mathematical theory: to qualify as a theory, we must have (i) a basis of undefined primitive terms,
(ii) definitions involving these, (iii) axioms involving all the primitives and defined terms,
(iv) conjectures and theorems relating these terms. To be at all worthwhile, however, the
theory must also meet the fuzzy requirements that (v) there is some correspondence between the
primitives and some "real-world" concepts, between the axioms and some "real"
relationships, and (vi) some of the theorems are unexpected, hard to prove, elegant, interesting,
etc.
Natural numbers: nonnegative integers; i.e., 0, 1, 2, 3,...
Ordering: the concept of "before" and "after". This distinguishes a list from a bag (multiset).
The formal axioms for ordering simply state the obvious properties of the intuitive image of a
list.
Prime numbers: natural numbers which have no divisors other than 1 and themselves; e.g., 17,
but not 15 (= 3 x 5). Primes are interesting because of the myriad times they crop up in diverse
theorems, from the Chinese Remainder Theorem (solving systems of linear congruence
equations), to the Law of Quadratic Reciprocity, to Fermat's Theorem (for all integers n, for all
primes p, n^p is congruent to n (mod p)). The "secret" of their value lies in the fact that all
integers can be factored uniquely into a set of prime divisors. This "Unique Factorization
Theorem" lets us reduce questions about integers to questions about primes.
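Two of the facts cited in this entry are easy to check empirically on small numbers. The sketch below uses plain trial division (an illustrative choice, not a method described in the source) to factor integers, and then tests Fermat's Theorem for a range of n and small primes p.

```python
def prime_factors(n):
    """Return the prime factorization of n >= 2 as a sorted list with repeats."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def is_prime(n):
    return n >= 2 and prime_factors(n) == [n]


print(prime_factors(360))          # [2, 2, 2, 3, 3, 5]
print(is_prime(17), is_prime(15))  # True False

# Fermat's Theorem: n^p is congruent to n (mod p) for every prime p
small_primes = [p for p in range(2, 30) if is_prime(p)]
fermat_holds = all(pow(n, p, p) == n % p
                   for n in range(-20, 21) for p in small_primes)
print(fermat_holds)  # True
```

Such a check is evidence of the kind the glossary calls empirical: it verifies instances and proves nothing.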

Relation: an operation which associates, for each element of some set D, a set of elements E = {e1,
e2, ...} of some set R. D and R are the domain and range of the relation. For example, the
relation "≤" associates to 5 the set of numbers {5, 6, 7, 8, ...}, i.e., all integers which 5 is less than
or equal to. The domain and range of this relation are the integers.

4b. Glossary of AI Terms
ACTORs: A modular form of representation, useful for distributing the task of control
among several components in a computer program. Each ACTOR is a black box, with no
parts or slots, but which does have some assertions (a "contract") which he must honor. It
merely responds to a fixed set of messages, by sending out certain messages of his own.
These are delivered via a bureaucracy. Recursive sending is permitted.
BEINGs: A modular form of representation of knowledge as a collection of cooperating
experts. Each module is a list of Question/Answering-program pairs, where the set of
questions is fixed for all the Beings in the system. When any Being has a question, he broadcasts
it to the entire system, and some Being who recognizes it will take over control and try to
answer it by running his appropriate Answering-program. In the process of running this, some
new questions may arise. Notice that Beings distribute responsibility for control and for static
knowledge. The advantages of having each BEING composed of the same structure, the same
names for its "slots", are: (i) efficient communication between Beings, and (ii) easy creation of
and "filling out" of brand new Beings.
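The broadcast-and-answer cycle described for BEINGs might be sketched as follows. Everything here is a simplification for illustration: the class names `Being` and `Community`, the question names `WHAT` and `EXAMPLES`, and the rule that a Being "recognizes" a question when it is addressed to its own name are all assumptions, not details of the actual PUP6 or AM implementations.

```python
QUESTIONS = ("WHAT", "EXAMPLES")  # hypothetical fixed set of questions shared by all Beings


class Being:
    def __init__(self, name, programs):
        unknown = set(programs) - set(QUESTIONS)
        if unknown:
            raise ValueError(f"unknown questions: {sorted(unknown)}")
        self.name = name
        self.programs = programs  # question -> answering program; may be partly blank


class Community:
    def __init__(self):
        self.beings = []

    def add(self, being):
        self.beings.append(being)

    def broadcast(self, question, topic):
        """Offer the question to every Being; the first that recognizes it takes control."""
        for being in self.beings:
            if being.name == topic and question in being.programs:
                return being.programs[question](self)
        return None  # nobody could answer


community = Community()
community.add(Being("Numbers", {"EXAMPLES": lambda c: list(range(2, 12))}))
community.add(Being("Primes", {
    "WHAT": lambda c: "numbers with exactly two divisors",
    # Running this program raises a new question, which is broadcast in turn.
    "EXAMPLES": lambda c: [n for n in c.broadcast("EXAMPLES", "Numbers")
                           if all(n % d for d in range(2, n))],
}))

print(community.broadcast("EXAMPLES", "Primes"))  # [2, 3, 5, 7, 11]
print(community.broadcast("WHAT", "Numbers"))     # None: that part is still blank
```

Because every Being exposes the same question names, a new Being can be added without any existing one being rewritten, which is advantage (ii) in the entry above.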
Cooperating Knowledge Sources: Very often, in tackling a problem, one receives some hints and
some constraints from very different sources, phrased in very different languages, often
addressing different representations of the problem. For example, in trying to understand a human
speaker, our memory of the previous discussion and knowledge of the speaker may narrow down
the possible meanings of what he is saying. Our ears, of course, register the precise acoustic waveforms he is uttering. Our English vocabulary forces us to interpret imperfect signals as real
words. Our eyes see his gestures and his lip movements, and give us more information. All
these different sources of information must be used, and yet they all are talking in different
"languages" to us. The most trivial solution is to keep all the sources independent, and keep
working until one of them can solve the problem all by itself. A much better solution is to
transform all their babblings into one canonical representation, one single language. There are in
fact no more profound ideas around yet on this "interfacing" problem.
FRAMEs: A modular representation of knowledge. Each module is a list of Feature/Value pairs.
The value represents a default assumption which can be relied on until/unless new information
comes in about that feature. Each frame has whatever features (called "slots") seem appropriate.
Whenever a situation S is encountered, the frame(s) for S are activated. As new
information rolls in, it replaces the default information in various slots. Notice the
emphasis on distributing static knowledge (data), not necessarily control, in such a
system.
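The default-then-override behavior of a frame can be shown in a few lines. This is a minimal sketch under assumed names (`Frame`, `observe`, `get`, and the "Room" example); it models only the static slots and, in keeping with the entry above, carries no control of its own.

```python
class Frame:
    def __init__(self, name, **defaults):
        self.name = name
        self.defaults = dict(defaults)  # slot -> default assumption
        self.observed = {}              # slot -> information that has rolled in

    def observe(self, slot, value):
        """New information replaces the default for that slot."""
        self.observed[slot] = value

    def get(self, slot):
        """Prefer observed information; otherwise fall back on the default."""
        if slot in self.observed:
            return self.observed[slot]
        return self.defaults.get(slot)


room = Frame("Room", walls=4, has_door=True)
print(room.get("walls"))  # 4, the default assumption
room.observe("walls", 5)
print(room.get("walls"))  # 5, once new information arrives
print(room.get("color"))  # None: this frame has no such slot
```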
Heterarchy: This term refers to the control structure of a computer program. The typical
hierarchical structure is one in which a function calls a subroutine, which processes and then
returns a value to that function. A program is viewed as a tree structure, with lines indicating
"calling". Heterarchical structuring views the whole program as a collection of equal partners, an
unstructured set of functions. "Control" is viewed as a spotlight, which can be flicked from one
function to another. The functions can affect who does or doesn't get control next, but there is no
guarantee who will get control, or that control will revert back to some function which once had it.
Aside from the lure of its democratic flavor, it is clearly a natural way to represent cooperating
knowledge modules.
Modular Representations of Knowledge in AI Systems: (1) Definition: Knowledge is partitioned
into packets (called modules, frames, units, experts, actors) along lines of: different
applicabilities, expertise, purpose, importance, generality, etc. Each packet is structurally
similar to all the rest. (2) Advantages: By having the knowledge discretized, pieces can be
added and/or removed with no trouble. The knowledge of the system is easily inspected
and analyzed. The structural similarity yields several advantages: a simple control system
suffices to "run" all the knowledge, the modules can intercommunicate easily, new modules
can be inserted without knowing precisely "who else" is already in the system. (3) Examples:
Some modular schemes (and their program incarnations) are: Actors (Plasma), Frames, Beings
(PUP6), Production Systems (PSG, Dendral, Mycin), Predicate Calculus. (4) Relation to
"Cooperating Knowledge Sources": Although modular representation is a natural way to
implement cooperating knowledge sources, the two concepts are distinct. For example, Hearsay
uses opaque modules, which do not have similar structures, who communicate via a global
blackboard. In general, if the modules are to have nonstandard structures, then the intercommunication media must be a simple scheme (like a global assertional data base, a
blackboard).
5. Documentation
1. Thesis Proposal: SUAI file SYS4[TLK,DBL]
2. This supplementary file: SUAI file SUP[HPP,DBL]
3. The text of the tutorial: SUAI file TUT[HPP,DBL]
4. The text of the planning talk: SUAI file DET[HPP,DBL]
5. The system itself: SUMEX files TOP6, CON6, and UTIL6. To run: get into LISP, load L,
follow instructions.
6. The use of BEINGS representation in AM is described in the paper: Duplication of Human
Actions by an Interacting Community of Knowledge Modules, Proceedings of the Third
International Congress of Cybernetics and Systems, Bucharest, Roumania, August, 1975.
7. An English-like description of the heuristics for each facet of each concept can be perused as
SUAI file GIVEN[TLK,DBL].
Stanford Artificial Intelligence Laboratory
Memo AIM-286
Computer Science Department
Report No. STAN-CS-76-570
AM: An Artificial Intelligence Approach to
Discovery in Mathematics as Heuristic Search
by
Douglas B. Lenat
Research sponsored by
Advanced Research Projects Agency
ARPA Order No. 2494
COMPUTER SCIENCE DEPARTMENT
Stanford University
HPP-76-8
July 1976
ABSTRACT
A program, called "AM", is described which models one aspect of elementary mathematics
research: developing new concepts under the guidance of a large body of heuristic rules.
"Mathematics" is considered as a type of intelligent behavior, not as a finished product.
This dissertation was submitted to the Department of Computer Science and the Committee on
Graduate Studies of Stanford University in partial fulfillment of the requirements for the degree
of Doctor of Philosophy.
This research was supported by the Advanced Research Projects Agency of the Department of
Defense under Contract MDA 903-76-C-0206. The views and conclusions contained in this
document are those of the author(s) and should not be interpreted as necessarily representing the
official policies, either expressed or implied, of Stanford University, ARPA, or the U. S.
Government.
Reproduced in the U.S.A. Available from the National Technical
Information Service, Springfield, Virginia 22161.
Acknowledgments
I owe a great debt of thanks to many people, both for the input of new ideas and for the
evaluation, channelling, and pruning of my own.
Let me begin by alphabetically thanking my committee: Bruce Buchanan, Ed Feigenbaum,
Cordell Green, Don Knuth, and Allen Newell. Interacting with each of them has been an
exciting experience, and my thesis has greatly benefited from their guidance.
The following individuals have each informally supplied some ideas or comments that
appear within this thesis. They all have earned my gratitude, and have significantly
improved the experience you are about to have, that of reading this thesis: Danny Bobrow,
Don Cohen, Paul Cohen, Avra Cohn, Randy Davis, Bob Floyd, Carl Hewitt, Earl Sacerdoti,
Richard Weyrauch, and Terry Winograd. Let me also thank SAIL, SRI, and SUMEX for
providing a sophisticated computing environment in which to work.
Around this point in the Acknowledgements, most theses have some sort of tribute to the
candidate's wife. Until I was in the throes of this research, I never fully appreciated the
importance of such support. So let me sincerely acknowledge the indispensable aid I
received from Merle, my wonderful wife, who put up with inverted schedules and who gave
me the confidence to tackle this problem and the enthusiasm to keep going.
Table of Contents
1. Overview
  1.1. Abstract of this Thesis
  1.2. Five-page Summary of the Project
    Detour: Analysis of a discovery
    What AM does: Syntheses of discoveries
    Results
    Motivation [optional]
    Conclusions
  1.3. Ways of viewing AM as some common process
    AM as Hill-climbing
    AM as Heuristic Search
    AM as a Mathematician
    AM as a Thesis [optional]
2. An Example: Discovering Prime Numbers
  2.1. Discussion of the AM Program
    Representation
    Agenda and Heuristics
  2.2. What to get out of and NOT get out of this example
  2.3. Deciphering the Example
  2.4. The Example Itself
  2.5. Recapping the Example
3. Control Structure
  3.1. AM's Search
  3.2. Constraining AM's Search
  3.3. The Agenda
    Why an Agenda?
    Details of the Agenda scheme
4. Heuristic Rules
  4.1. Syntax of the Heuristics
    Syntax of the Left-hand Side
    Syntax of the Right-hand Side
  4.2. Heuristics Suggest New Tasks
    An Illustration: "Fill in Generalizations of Equality"
    The Ratings Game
  4.3. Heuristics Create New Concepts
    An Illustration: Discovering Primes
    The Theory of Creating New Concepts
    Another Illustration: Squaring a number
  4.4. Heuristics Fill in Entries for a Specific Facet
    An Illustration: "Fill in Examples of Set-union"
    Heuristics Propose New Conjectures
    An Illustration: "All primes except 2 are odd"
    Another illustration: Discovering Unique Factorization
  4.5. Gathering Relevant Heuristics
    Domain of Applicability
    Rippling
    Ordering the Relevant Heuristics
  4.6. AM's Starting Heuristics
    Heuristics Grouped by the Knowledge They Embody
    Heuristics Grouped by How Specific They Are
5. AM's Concepts
  5.1. Motivation and Overview
    A Glimpse of a Typical Concept
    The main constraint: Fixed set of facets
    BEINGs Representation of Knowledge
  5.2. Facets
    Generalizations/Specializations
    Examples/Isa's
    In-domain-of/In-range-of
    Views
    Intuitions
    Analogies
    Conjec's
    Definitions
    Algorithms
    Domain/Range
    Worth
    Interest
    Suggest
    Fillin/Check
    Other Facets which were Considered
  5.3. AM's Starting Concepts
    Diagram of Initial Concepts
    Summary of Initial Concepts
    Rationale behind Choice of Concepts
6. Results
  6.1. What AM Did
    Linear Task-by-task Summary of a Good Run
    Two-Dimensional Behavior Graph
    AM as a Computer Program
  6.2. Experiments with AM
    Must the Worth numbers be finely tuned?
    How finely tuned is the Agenda?
    How valuable is tacking reasons onto each task?
    What if certain concepts are eliminated/added?
    What if certain heuristics are tampered with?
    Can AM work in a new domain: Plane Geometry?
7. Evaluating AM
  7.1. Judging Performance
    AM's Ultimate Discoveries
    The Magnitude of AM's Progress
    The Quality of AM's Route
    The Character of the User-System Interactions
    AM's Intuitive Powers
    Experiments on AM
    How to Perform Experiments on AM
    Future Implications of this Project
    Open Problems: Suggestions for Future Research
    Comparison to Other Systems
  7.2. Capabilities and Limitations of AM
    Current Abilities
    Current Limitations
    Limitations of the Agenda scheme
    Limiting Assumptions
    Choice of Domain
    Limitations of the Model of Math Research
    Ultimate powers and weaknesses
  7.3. Final Conclusions
Appendix 1. Glossary of Technical Terms
  Glossary of Math Terms
  Glossary of AI Terms
Appendix 2. AM's Concepts
  Initial Concepts
    Index of Initial Concepts
    Anything, Anyconcept, Active, Predicate, Objectequality, Constantpredicate,
    OsetOperation, Compose, Insert,
    BagDelete, ListDelete, Osetinsert, Listinsert, Baginsert, Delete,
    BagIntersect,
    Delete, Intersect, ListIntersect, OsetIntersect,
    OsetBagUnion,
    Union, ListUnion,
    Canonize, Parallelreplace2, Parallelreplace,
    BagDiff,
    Repeat2, Repeat, Paralleljoin2, Paralleljoin, Reverseordpair, Lastelement,
    Allbutthefirstelement, Allbutthelastelement, Member,
    Firstelement,
    Projection1, Projection2, Identity, Restrict, Invertanoperation, Invertedop,
    Relation, Logicalcombination, Object, Conjecture, Atomobj, Truthvalue,
    MultipleelementsStructureofStructures,
    structure, Nomultipleelementsstructure, Emptystructure, Nonemptystructure,
    Bags, Lists, Orderedpairs,
    Concepts never fully implemented
  Concepts and Heuristics as coded in LISP
    The 'Compose' Concept
    The 'Osets' Concept
  Concepts created by AM
Appendix 3. AM's Heuristics
  Heuristics for dealing with Anything
  Heuristics for dealing with Anyconcept
    Heuristics for any facet of Anyconcept
    Heuristics for the Examples facets of Anyconcept
    Heuristics for the Conjecs facet of Anyconcept
    Heuristics for the Analogies facet of Anyconcept
    Heuristics for the Genl/Spec facets of Anyconcept
    Heuristics for the View facet of Anyconcept
    Heuristics for the Indom/ranof facets of Anyconcept
    Heuristics for the Definition facet of Anyconcept
  Heuristics for dealing with any Active concept
  Heuristics for dealing with any Predicate
  Heuristics for dealing with any Operation
  Heuristics for dealing with any Composition
  Heuristics for dealing with any Insertions
  Heuristics for dealing with the operation Coalesce
  Heuristics for dealing with the operation Canonize
  Heuristics for dealing with the operation Substitute
  Heuristics for dealing with the operation Restrict
  Heuristics for dealing with the operation Invert
  Heuristics for dealing with Logical combinations
  Heuristics for dealing with Structures
  Heuristics for dealing with Orderedstructures
  Heuristics for dealing with Unorderedstructures
  Heuristics for dealing with Multipleelesstructures
  Heuristics for dealing with Sets
Appendix 4. Maximally-Divisible Numbers
  A Meaningful Question
  Special Case: $n = 2^a 3^b$
  Special Case: $n = 2^a 3^b 5^c$
  The General Case
  An even stronger claim
  AM and Ramanujan
Appendix 5. Traces of AM in Action
  Prose Traces
  A 'Nice' Task-by-task Trace
  An 'Unadulterated' Trace
Appendix 6. Bibliography
  Documentation
Chapter 1. Overview
Indeed, you can build a machine to draw demonstrative conclusions for you, but I
think you can never build a machine that will draw plausible inferences.
-- Polya
1.1. Abstract of this Thesis
A program, called "AM", is described which models one aspect of elementary mathematics
research: developing new concepts under the guidance of a large body of heuristic rules.
"Mathematics" is considered as a type of intelligent behavior, not as a finished product.
The local heuristics communicate via an agenda mechanism, a global list of tasks for the
system to perform and reasons why each task is plausible. A single task might direct AM to
define a new concept, or to explore some facet of an existing concept, or to examine some
empirical data for regularities, etc. Repeatedly, the program selects from the agenda the
task having the best supporting reasons, and then executes it.
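The agenda loop described in this paragraph can be sketched with a priority queue. The sketch makes several assumptions not stated in the text: each reason carries a numeric rating, a task's priority is the plain sum of its reasons' ratings, and a task is a Python callable that may add further tasks while it runs. The class name `Agenda` and the sample tasks are likewise illustrative.

```python
import heapq


class Agenda:
    """A global list of tasks, each kept with the reasons why it is plausible."""

    def __init__(self):
        self._heap = []
        self._count = 0  # tie-breaker so equal priorities run in insertion order

    def add(self, task, reasons):
        """`reasons` is a list of (text, rating) pairs supporting the task."""
        priority = sum(rating for _, rating in reasons)  # assumed combination rule
        heapq.heappush(self._heap, (-priority, self._count, task, reasons))
        self._count += 1

    def run(self, steps):
        """Repeatedly select the best-supported task and execute it."""
        log = []
        for _ in range(steps):
            if not self._heap:
                break
            _, _, task, _ = heapq.heappop(self._heap)
            log.append(task(self))  # executing a task may suggest new tasks
        return log


def check_odd_primes(agenda):
    return "examine data: are all primes except 2 odd?"


def fill_examples_of_primes(agenda):
    agenda.add(check_odd_primes, [("examples of Primes were just found", 300)])
    return "fill in examples of Primes"


def fill_generalizations_of_equality(agenda):
    return "fill in generalizations of Equality"


agenda = Agenda()
agenda.add(fill_generalizations_of_equality, [("few examples were found", 200)])
agenda.add(fill_examples_of_primes, [("Primes is a new concept", 400),
                                     ("it has no examples yet", 100)])
log = agenda.run(3)
print(log)
```

The newly suggested task runs before the older, weaker one, which shows how the heuristics communicate through the agenda without calling one another directly.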
Each concept is an active, structured knowledge module. A hundred very incomplete
modules are initially provided, each one corresponding to an elementary set-theoretic
concept (e.g., union). This provides a definite but immense "space" which AM begins to
explore. AM extends its knowledge base, ultimately rediscovering hundreds of common
concepts (e.g., numbers) and theorems (e.g., unique factorization).
This approach to plausible inference contains great powers and great limitations.
1.2. Five-page Summary of the Project
Scientists often face the difficult task of formulating nontrivial research problems which are
solvable. In any given branch of science, it is usually easier to tackle a specific given
problem than to propose interesting yet manageable new questions to investigate. For
example, contrast solving the Missionaries and Cannibals problem with the more ill-defined
reasoning which led to inventing it.
This thesis is concerned with creative theory formation in mathematics: how to propose
interesting new concepts and plausible hypotheses connecting them. The experimental
vehicle of my research is a computer program called AM.[1] Initially, AM is given the
definitions of 115 simple set-theoretic concepts (like "Delete", "Equality"). Each concept is
represented internally as a data structure with a couple dozen slots or facets (like
"Definition", "Examples", "Worth"). Initially, most facets of most concepts are blank, and
AM uses a collection of 250 heuristics (plausible rules of thumb) for guidance, as it tries
to fill in those blanks. Some heuristics are used to select which specific facet of which specific
concept to explore next, while others are used to actually find some appropriate information
about the chosen facet. Other rules prompt AM to notice simple relationships between
known concepts, to define promising new concepts to investigate, and to estimate how
interesting each concept is.


1.2.1. Detour: Analysis of a discovery
Before discussing how to synthesize a new theory, consider briefly how to analyze one, how
to construct a plausible chain of reasoning which terminates in a given discovery. One can
do this by working backwards, by reducing the creative act to simpler and simpler creative
acts. For example, consider the concept of prime numbers. How might one be led to define
such a notion? Notice the following plausible strategy:
"If f is a function which transforms elements of A into elements of B, and
B is ordered, then consider just those members of A which are
transformed into extremal elements of B. This set is an interesting subset
of A."
When f(x) means "divisors of x", and the ordering is "by length", this heuristic says to
consider those numbers which have a minimal² number of factors, that is, the primes. So
this rule actually reduces our task from "proposing the concept of prime numbers" to the
more elementary problems of "discovering ordering-by-length" and "inventing divisors-of".
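The strategy quoted above can be tried out directly. The following is a minimal sketch in Python rather than AM's own Lisp; the names `divisors_of` and `extremal_subset` are illustrative assumptions, and the range starts at 2 (also an assumption) because the number 1, with its single divisor, would otherwise be the only minimal element.

```python
def divisors_of(n):
    """The function f: map a number to the list of its positive divisors."""
    return [d for d in range(1, n + 1) if n % d == 0]


def extremal_subset(domain, f, key, pick=min):
    """Members of `domain` whose image under f is extremal in the ordering `key`."""
    domain = list(domain)
    best = pick(key(f(x)) for x in domain)
    return [x for x in domain if key(f(x)) == best]


numbers = range(2, 31)  # assumed sample range; 1 is excluded as a degenerate case

# Ordering "by length": compare the images f(x) by how many divisors they contain.
fewest_divisors = extremal_subset(numbers, divisors_of, key=len, pick=min)
most_divisors = extremal_subset(numbers, divisors_of, key=len, pick=max)

print(fewest_divisors)  # the numbers with a minimal number of divisors
print(most_divisors)    # the other extreme: a maximal number of divisors
```

On this range the minimal end of the ordering is exactly the primes, and the maximal end picks out 24 and 30. The heuristic itself knows nothing about primality; it only compares lengths.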

¹ The original meaning of this mnemonic has been abandoned. As Exodus states: I AM that I AM.
² The other extreme, numbers with a MAXIMAL number of factors, was also proposed by AM as worth
investigating. This led AM to many interesting questions. See Appendix 4.

But suppose we know this general rule: "If f is an interesting function, consider its inverse." It
reduces the task of discovering divisors-of to the simpler task of discovering multiplication.³
Eventually, this task reduces to the discovery of very basic notions, like substitution,
set-union, and equality. To explain how a given researcher might have made a given
discovery, such an analysis is continued until that inductive task is reduced to "discovering"
notions which the researcher already knew, which were his conceptual primitives.
1.2.2. What AM does: Syntheses of discoveries

This leads to the paradox that the more original a discovery the more obvious it seems
afterwards. The creative act is not an act of creation in the sense of the
Old Testament. It does not create something out of nothing; it uncovers, selects,
reshuffles, combines, synthesizes already existing facts, faculties, skills. The more
familiar the parts, the more striking the new whole.
-- Koestler
Suppose a large collection of these heuristic strategies has been assembled (e.g., by analyzing
a great many discoveries, and writing down new heuristic rules whenever necessary).
Instead of using them to explain how a given idea might have evolved, one can imagine
starting from a basic core of knowledge and "running" the heuristics to generate new
concepts. We're talking about reversing the process described in the last section: not how to
explain discoveries, but how to make them.
Such syntheses are precisely what AM does. The program consists of a large corpus of
primitive mathematical concepts, each with a few associated heuristics.⁴ AM's activities all
serve to expand AM itself, to enlarge upon a given body of mathematical knowledge. To
cope with the enormity of the potential "search space" involved, AM uses its heuristics as
judgmental criteria to guide development in the most promising direction. It appears that
the process of inventing worthwhile new⁵ concepts can be guided successfully using a
collection of a few hundred such heuristics.
³ Plus noticing that multiplication is associative and commutative.
⁴ Situation/action rules which function as local "plausible move generators". Some suggest tasks
for the system to carry out, some suggest ways of satisfying a given task, etc.
⁵ Typically, "new" means new to AM, not to Mankind; and "worthwhile" can only be judged in hindsight.

Each concept is represented as a frame-like data structure with 25 different facets or slots.
The types of facets include: Examples, Definitions, Generalizations, Domain/Range, Analogies,
Interestingness, and many others. Modular representation of concepts provides a convenient
scheme for organizing the heuristics; for example, the following strategy fits into the
Examples facet of the Predicate concept: "If, empirically, 10 times as many elements fail some
predicate P, as satisfy it, then some generalization (weakened version) of P might be more
interesting than P". AM considers this suggestion after trying to fill in examples of each
predicate.⁶
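The "10 times as many fail" strategy is easy to state as a test on empirical data. The sketch below is a hypothetical Python rendering, not AM's code: `should_generalize`, the sample universe, and the hand-supplied weaker predicate `same_length` are all assumptions made for illustration. In particular, the sketch only decides whether a predicate looks too rarely satisfied; it does not show how AM constructs the generalization.

```python
from itertools import combinations


def subsets(universe):
    """All subsets of a small universe, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def should_generalize(predicate, samples, ratio=10):
    """True when at least `ratio` times as many samples fail the predicate as satisfy it."""
    successes = sum(1 for a, b in samples if predicate(a, b))
    failures = len(samples) - successes
    return failures > 0 and failures >= ratio * successes


def set_equal(a, b):
    return a == b


def same_length(a, b):
    """A weakened version of equality: only the sizes must agree."""
    return len(a) == len(b)


all_sets = subsets({1, 2, 3, 4})
pairs = [(a, b) for a in all_sets for b in all_sets]

print(should_generalize(set_equal, pairs))    # equality is rarely satisfied
print(should_generalize(same_length, pairs))  # the weaker predicate is satisfied far more often
```

Of the 256 pairs, only 16 satisfy equality, so the rule fires; the weaker predicate is satisfied by 70 pairs and no longer triggers it.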
AM is initially given a collection of 115 core concepts, with only a few facets filled in for
each. Its sole activity is to choose some facet of some concept, and fill in that particular slot.
In so doing, new notions will often emerge. Uninteresting ones are forgotten, mildly
interesting ones are kept as parts of one facet of one concept, and very interesting ones are
granted full concept-module status. Each of these new modules has dozens of blank slots,
hence the space of possible actions (blank facets to fill in) grows rapidly. The same
heuristics are used both to suggest new directions for investigation, and to limit attention:
both to sprout and to prune.
1.2.3. Results
The particular mathematical domains in which AM operates depend upon the choice of
initial concepts. Currently, AM begins with nothing but a scanty knowledge of concepts
which Piaget might describe as prenumerical: sets, substitution, operations, equality, and so
on. In particular, AM is not told anything about proof, single-valued functions, or
numbers.
From this primitive basis, AM quickly discovered⁷ elementary numerical concepts
(corresponding to those we refer to as natural numbers, multiplication, factors, and primes)
and wandered around in the domain of elementary number theory. AM was not designed
to prove anything, but it did conjecture many well-known relationships (e.g., the unique
factorization theorem).
AM was not able to discover any "new-to-Mankind" mathematics purely on its own, but has
discovered several interesting notions hitherto unknown to the author. A couple bits of new
mathematics have been inspired by AM. A synergetic AM-human combination can
sometimes produce better research than either could alone.⁸ Although most of the concepts
AM proposed and developed were already very well known, AM defined some of them in
novel ways (e.g., prime pairs were defined by restricting addition to primes; that is, for
which primes p, q, r is it possible that p+q=r?⁹).
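The question about restricting addition to primes can be explored empirically on a small range. This is a minimal sketch in Python, with the bound of 200 and the helper `is_prime` chosen purely for illustration; it is an empirical check of the kind AM relied on, not a proof.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


primes = [n for n in range(2, 200) if is_prime(n)]  # assumed search bound
prime_set = set(primes)

# All triples (p, q, r) of primes below the bound with p + q = r and p <= q.
triples = [
    (p, q, p + q)
    for i, p in enumerate(primes)
    for q in primes[i:]
    if (p + q) in prime_set
]

print(triples[:5])
# In every triple found, the smaller addend is 2 and the other two primes differ by two.
print(all(p == 2 and r - q == 2 for p, q, r in triples))
```

The regularity is forced by parity: two odd primes sum to an even number greater than 2, so one addend must be 2.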
⁶ After AM attempts to find examples of the equality predicate, so few are found that AM
decides to generalize that predicate. The result is the creation of a new predicate which means
"Has-the-same-length-as", i.e., a rudimentary precursor to natural numbers.
⁷ "Discovering" a concept means that (1) AM recognized it as a distinguished entity (e.g., by
formulating its definition) and also (2) AM decided it was worth investigating (either because of
the interesting way it was formed, or because of surprising preliminary empirical results).
⁸ This is supported by Gelernter's experiences with his geometry program: While lecturing about
how it might prove a certain theorem about isosceles triangles, he came up with a new, cute proof.
Similarly, Guard and Eastman noticed an intermediate result of their SAM resolution theorem
prover, and wisely interpreted it as a nontrivial result in lattice theory (now known as SAM's lemma).
⁹ The answer is that either p or q must be 2, and that the other two primes are a prime pair, i.e.,
they differ by two.

Everything that AM does can be viewed as testing the underlying body of heuristic rules.
Gradually, this knowledge becomes better organized, its implications clearer. The resultant
body of detailed heuristics may be the germ of a more efficient programme for educating
math students than the current dogma.¹⁰
Another benefit of actually constructing AM is that of experimentation: one can vary the
concepts AM starts with, vary the heuristics available, etc., and study the effects on AM's
behavior. Several such experiments were performed. One involved adding a couple dozen
new concepts from an entirely new domain: plane geometry. AM busied itself exploring
elementary geometric concepts, and was almost as productive there as in its original domain.
New concepts were defined, and new conjectures formulated. Other experiments indicated
that AM was more robust than anticipated; it withstood many kinds of "detuning". Others
demonstrated the tremendous impact that a few key concepts (e.g., Equality) had on AM's
behavior. Several more experiments and extensions have been planned for the future.
1.2.4. Motivation [optional]
We need a supermathematics in which the operations are as unknown as the
quantities they operate on, and a supermathematician, who does not know what
he is doing when he performs these operations.
-- Eddington
Although the motivation for carrying out this research of course preceded the effort, I have
delayed until this section a discussion of why this is worthwhile, why it was attempted.
First there was the inherent interest of getting a handle on scientific creativity. AM is partly
a demonstration that some aspects of creative theory formation can be demystified, can be
modelled as simple rule-governed behavior.
Related to this is the potential for learning from AM more about the processes of concept
formation. This was touched on previously, and several experiments already performed on
AM will be detailed later.
Third, AM itself may grow into something of pragmatic value. Perhaps it will become a
useful tool for mathematicians, for educators, or as a model for similar systems in more
"practical" fields. Perhaps in the future we scientists will be able to rely on automated
assistants to carry out the "hack" phases of research, the tiresome legwork necessary for
"secondary" creativity.
Historically, the domain of AM came from a search for a scientific field whose activities had
no specific goal, and in which natural language abilities were unnecessary. This was to test
out the BEINGs [Lenat 75b] ideas for a modular representation of knowledge.
¹⁰ Currently, an educator takes the very best work any mathematician has ever done, polishes it
until its brilliance is blinding, then presents it to the student to induce upon. Many individuals
(e.g., Knuth and Polya) have pointed out this blunder. A few (e.g., Papert at MIT, Adams at
Stanford) are experimenting with more realistic strategies for "teaching" creativity. See the
references by these authors in the bibliography.
It would be unfair not to mention the usual bad reasons for this research: the "Look ma, no
hands" syndrome, the AI researcher's classic maternal urges, ego, the usual thesis drives, etc.
1.2.5. Conclusions
AM is forced to judge a priori the value of each new concept, to lose interest quickly in
concepts which aren't going to develop into anything. Often, such judgments can only be
based on hindsight. For similar reasons, AM has difficulty formulating new heuristics
which are relevant to the new concepts it creates. Heuristics are often merely compiled
hindsight. While AM's "approach" to empirical research may be used in other scientific
domains, the main limitation (reliance on hindsight) will probably recur. This prevents
AM from progressing indefinitely far on its own.
This ultimate limitation was reached. AM's performance degraded more and more as it
progressed further away from its initial base of concepts. Nevertheless, AM demonstrated
that selected aspects of creative discovery in elementary mathematics could be adequately
represented as a heuristic search process. Actually constructing a computer model of this
activity has provided an experimental vehicle for studying the dynamics of plausible
empirical inference.
1.3. Ways of viewing AM as some common process
This section will provide a few metaphors: some hints for squeezing AM into paradigms
with which the reader might be familiar. For example, the existence of heuristics in AM is
functionally the same as the presence of domain-specific information in any knowledge-based system.
Consider assumptions, axioms, definitions, and theorems to be syntactic rules for the
language that we call Mathematics. Thus theorem-proving, and the whole of textbook
mathematics, is a purely syntactic process. Then the heuristic rules used by a
mathematician (and by AM) would correspond to the semantic knowledge associated with
these more formal methods.
Just as one can upgrade natural-language-understanding by incorporating semantic
knowledge, so AM is only as successful as the heuristics it knows.
Four more ways of "viewing" AM as something else will be provided: (i) AM as a hill-climber, (ii) AM as a heuristic search program, (iii) AM as a mathematician, and (iv) AM
as a thesis.
1.3.1. AM as Hill-climbing
Let's draw an analogy between the process of developing new mathematics and the familiar
process of hill-climbing. We may visualize AM as exploring a space using a measuring or
"evaluation" function which imparts to it a topography.
Consider AM's core of very simple knowledge. By compounding its known concepts and
methods, AM can explore beyond the frontier of this foundation a little wherever it wishes.
The incredible variety of alternatives to investigate includes all known mathematics, much
trivia, countless dead-ends, and so on. The only "successful" paths near the core are the
narrow ridges of known mathematics (plus perhaps a few as-yet-undiscovered isolated
peaks).
How can AM walk through this immense space, with any hope of following the few, slender
trails of already-established mathematics (or some equally successful new fields)? AM must
do hill-climbing: As new concepts are formed, decide how promising they are, and always
explore the currently most-promising new concept. The evaluation function is quite
nontrivial, and this thesis may be viewed as an attempt to study and explain and duplicate
the judgmental criteria people employ. Preliminary attempts¹¹ at codifying such
"mysterious" emotive forces as intuition, aesthetics, utility, richness, interestingness,
relevance... indicated that a large but not unmanageable collection of heuristic rules should
suffice.
The important visualization to make is that with proper evaluation criteria, AM's planar
mass of interrelated concepts is transformed into a three-dimensional relief map: the known
lines of development become mountain ranges, soaring above the vast flat plains of trivia
and inconsistency below.
Occasionally an isolated hill is discovered near the core;¹² certainly whole ranges lie
undiscovered for long periods of time,¹³ and the terrain far from the initial core is not yet
explored at all.
1.3.2. AM as Heuristic Search


As the title of this section and this thesis proclaims, AM is a kind of "heuristic search"
program. That must mean that AM is exploring a particular "space," using some informal
evaluation criteria to guide it.
The flavor of search which is used here is that of progressively enlarging a tree. Certain
"evaluation-function" heuristics are used to decide which node of the tree to expand next,
and other guiding rules are then used to produce from that node a few interesting successor
nodes. To do mathematical research well, I claim that it is necessary and sufficient to have
good methods for proposing new concepts from existing ones, and for deciding how
interesting each "node" (partially-studied concept) is.
¹¹ These took the form of informal simulations. Although far from controlled experiments, they
indicated the feasibility of attempting to create AM, by yielding an approximate figure for the
amount of informal knowledge such a system would need.
¹² E.g., Conway's numbers, as described in [Knuth 74].
¹³ E.g., non-Euclidean geometries weren't thought of until 1848.

AM is initially supplied with a few facts about some simple math concepts. AM then
explores mathematics by selectively enlarging that basis. One could say that AM consists of
an active body of mathematical concepts, plus enough "wisdom" to use and develop them
effectively. For "wisdom", read "heuristics". Loosely speaking, then, AM is a heuristic search
program. To see this more clearly, we must explain what the nodes of AM's search space
are, what the successor operators or links are, and what the evaluation function is.
AM's space can be considered to consist of all nodes which are consistent, partially filled-in
concepts. Then a primitive "legal move" for AM would be to (i) enlarge some facet of some
concept, or (ii) create a new, partially-complete concept. Consider momentarily the size of
this space. If there were no constraint on what the new concepts can be, and no informal
knowledge for quickly finding entries for a desired facet, a blind "legal-move" program
would go nowhere slowly! One shouldn't even call the activity such a program would be
doing "math research."
The heuristic rules are used as little "plausible move generators". They suggest which facet
of which concept to enlarge next, and they suggest specific new concepts to create. The only
activities which AM will consider doing are those which have been motivated for some
specific good¹⁴ reason. A global agenda of tasks is maintained, listing all the activities
suggested but not yet worked on.
AM has a definite algorithm for rating the nodes of its space. Many heuristics exist merely
to estimate the worth of any given concept. Other heuristics use these worth ratings to
order the tasks on the global agenda list. Yet AM has no specific goal criteria: it can never
"halt", never succeed or fail in any absolute sense. AM goes on forever.¹⁵
Consider Nilsson's descriptions of depth-first searching and breadth-first searching ([Nilsson
71]). He has us maintain a list of "open" nodes. Repeatedly, he plucks the top one and
expands it. In the process, some new nodes may be added to the Open list. In the case of
depth-first searching, they are added at the top; the next node to expand is the one most
recently created; the Open-list is being used as a push-down stack. For breadth-first search,
new nodes are added at the bottom; they aren't expanded until all the older nodes have
been; the Open-list is used as a queue. For heuristic search, or "best-first" search, new nodes
are evaluated in some numeric way, and then "merged" into the already-sorted list of Open
nodes.
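The three Open-list disciplines just described differ only in where new nodes go. The following Python sketch makes that concrete on a toy tree; the function name `expansion_order`, the tree, and the scores are assumptions for illustration, and a heap stands in for "merging into the already-sorted list".

```python
import heapq
from collections import deque


def expansion_order(root, children, strategy, score=None):
    """Return the order in which nodes are expanded under one Open-list discipline."""
    if strategy == "best":
        # Heap entries: (negated score, insertion counter, node); the counter breaks ties.
        open_list = [(-score(root), 0, root)]
    else:
        open_list = deque([root])
    order, counter = [], 1
    while open_list:
        if strategy == "depth":
            node = open_list.pop()        # stack: most recently created node first
        elif strategy == "breadth":
            node = open_list.popleft()    # queue: oldest node first
        else:
            node = heapq.heappop(open_list)[2]  # best-first: highest score first
        order.append(node)
        for child in children.get(node, ()):
            if strategy == "best":
                heapq.heappush(open_list, (-score(child), counter, child))
                counter += 1
            else:
                open_list.append(child)
    return order


# Toy tree and scores, invented for this example.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
scores = {"A": 0, "B": 1, "C": 5, "D": 9, "E": 2, "F": 3}

depth_first = expansion_order("A", tree, "depth")
breadth_first = expansion_order("A", tree, "breadth")
best_first = expansion_order("A", tree, "best", score=scores.get)
print(depth_first, breadth_first, best_first)
```

Note that the best-first run only ever compares nodes currently on the Open list: D has the highest score overall but cannot be expanded until its parent B has been.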
This process is very similar to the agenda mechanism AM uses to manage its search. This
will be discussed in detail in Chapter 3. Each entry on the agenda consists of three parts:
(i) a plausible task for AM to do, (ii) a list of reasons supporting that task, and (iii) a
numeric estimate of the overall priority this task should have. When a task is suggested for
some reason, it is added to the agenda. A task may be suggested several times, for different
reasons. The global priority value assigned to each task is based on the combined value of
its reasons. The control structure of AM is simply to select the task with the highest
priority, execute it, and select a new one. The agenda mechanism appears to be a very
well-suited data structure for managing a "best-first" search process.
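A small sketch may help fix the three-part agenda entry (task, reasons, priority) and the select-execute-repeat control structure. Everything named here is hypothetical: the `Agenda` class, the task strings, the ratings, and especially the combining rule (root of the sum of squared reason ratings), which is an assumption standing in for whatever formula AM uses. The only behavior taken from the text is that a task can be suggested several times and that its priority depends on the combined value of its distinct reasons.

```python
import math


class Agenda:
    """Tasks awaiting execution, each supported by a set of rated reasons."""

    def __init__(self):
        self.tasks = {}  # task description -> {reason text: numeric rating}

    def suggest(self, task, reason, rating):
        """Add a task, or add support to an existing one."""
        reasons = self.tasks.setdefault(task, {})
        # A repeated reason adds no new support; keep its highest rating.
        reasons[reason] = max(rating, reasons.get(reason, 0))

    def priority(self, task):
        """Assumed combining rule: root of the sum of squared reason ratings."""
        return math.sqrt(sum(r * r for r in self.tasks[task].values()))

    def run_next(self, execute):
        """Select the highest-priority task, remove it, and execute it."""
        if not self.tasks:
            return None
        best = max(self.tasks, key=self.priority)
        del self.tasks[best]
        execute(best, self)  # executing a task may suggest further tasks
        return best


agenda = Agenda()
agenda.suggest("Fill in examples of Primes", "Primes was just defined", 300)
agenda.suggest("Generalize Equality", "Few examples of Equality were found", 400)
agenda.suggest("Fill in examples of Primes", "Divisors-of is interesting", 400)
agenda.suggest("Fill in examples of Primes", "Primes was just defined", 300)  # repeat: no boost

log = []
first = agenda.run_next(lambda task, ag: log.append(task))
print(first)
```

Keying reasons by their text is what lets a second, different reason raise a task's priority while a repeated reason does not.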
¹⁴ Of course, AM thinks a reason is "good" if and only if it was told that by a heuristic rule; so
those rules had better be plausible, preferably the ones actually used by the experts.
¹⁵ Technically, forever is about 100,000 list cells and a couple cpu hours.
Similar control structures were used in LT [Newell, Shaw, & Simon 57], the predictor part
of Dendral [Buchanan et al 69], SIMULA-67 [Dahl 68], and KRL [Bobrow & Winograd
77]. The main difference is that in AM, symbolic reasons are used (albeit in trivial
token-like ways) to decide whether and how much to boost the priority of a task when it is
suggested again.


There are several difficulties and anomalies in forcing AM into the heuristic search
paradigm. In a typical heuristic search (e.g., Dendral [Feigenbaum et al 71], Meta-Dendral
[Buchanan et al 72], most game-playing programs [Samuel 67]), a "search space" is defined
implicitly by a "legal move generator". Heuristics are present to constrain that generator so
that only plausible nodes are produced. The second kind of heuristic search, of which AM
is an example, contains no "legal move generator". Instead, AM's heuristics are used as
plausible move generators. Those heuristics themselves implicitly define the possible tasks
AM might consider, and all such tasks should be plausible ones. In the first kind of search,
removing a heuristic widens the search space; in AM's kind of search, removing a heuristic
reduces it.
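The contrast between the two kinds of heuristic search can be shown with toy sets. In this Python sketch the function names and all of the data are invented for illustration; the point is only the direction in which the candidate set moves when one heuristic is removed.

```python
def constrained_space(legal_moves, pruning_heuristics):
    """First kind of search: heuristics filter the output of a legal move generator."""
    return {m for m in legal_moves if all(accept(m) for accept in pruning_heuristics)}


def generated_space(state, proposing_heuristics):
    """AM's kind of search: the heuristics themselves propose every candidate."""
    return {m for propose in proposing_heuristics for m in propose(state)}


# Toy data, chosen only for illustration.
legal_moves = range(1, 21)
is_even = lambda m: m % 2 == 0
is_multiple_of_three = lambda m: m % 3 == 0

known = {1, 2, 3}
squares = lambda s: {x * x for x in s}
doubles = lambda s: {2 * x for x in s}

with_both_filters = constrained_space(legal_moves, [is_even, is_multiple_of_three])
with_one_filter = constrained_space(legal_moves, [is_even])
with_both_generators = generated_space(known, [squares, doubles])
with_one_generator = generated_space(known, [squares])

print(len(with_both_filters), len(with_one_filter))        # removing a filter widens the space
print(len(with_both_generators), len(with_one_generator))  # removing a generator shrinks it
```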
Another anomaly is that the operators which AM uses to enlarge and explore the space of
concepts are themselves mathematical concepts (e.g., some heuristic rules result in the
creation of new heuristic rules; "Compose" is both a concept and an operation which results
in new concepts). Thus AM should be viewed as a mass of knowledge which enlarges itself
repeatedly. Typically, computer programs keep the information they "discover" quite
separate from the knowledge they use to make discoveries.¹⁶
Perhaps the greatest difference between AM and typical heuristic search procedures is that
AM has no well-defined target concepts or target relationships. Rather, its "goal criterion"
(its sole aim) is to maximize the interestingness level of the activities it performs, the
priority ratings of the top tasks on the agenda. It doesn't matter precisely which definitions
or conjectures AM discovers or misses, so long as it spends its time on plausible tasks.
There is no fixed set of theorems that AM should discover, so AM is not a typical
problem-solver. There is no fixed set of traps AM should avoid, no small set of legal moves,
and no winning/losing behavior, so AM is not a typical game-player.



For example, no stigma is attached to the fact that AM never discovered real numbers;¹⁷ it
was rather surprising that AM managed to discover natural numbers! Even if it hadn't
done that, it would have been acceptable¹⁸ if AM had simply gone off and developed ideas
in set theory.
¹⁶ Of course this is often because the two kinds of knowledge are very different: For a
chess-player, the first kind is "good board positions," and the second is "strategies for making a
good move." Theorem-provers are an exception. They produce a new theorem, and then use it (almost
like a new operator) in future proofs. A program to learn to play checkers [Samuel 67] has this
same flavor, thereby indicating that this 'self-help' property is not a function of the task
domain, not simply a characteristic of mathematics.
¹⁷ There are "nice" things which AM didn't and can't do: e.g., devising geometric concepts from its
initial simple set-theoretic knowledge. See the discussion of the limitations of AM, Section 7.2.
¹⁸ Acceptable to whom? Is there really a domain-invariant criterion for judging the quality of AM's
actions? See the discussions in Section 7.1.
1.3.3. AM as a Mathematician
Before diving into the innards of AM, let's take a moment to discuss the totality of the
mathematics which AM carried out. Like a contemporary historian summarizing the work
of the Babylonian mathematicians, we shan't hesitate to use current terms and criticize by
current standards.
AM began its investigations with scanty knowledge of a few set-theoretic concepts (sets,
equality of sets, set operations). Most of the obvious set-theory relations (e.g., de Morgan's
laws) were eventually uncovered; since AM never fully understood abstract algebra, the
statement and verification of each of these was quite obscure. AM never derived a formal
notion of infinity, but it naively established conjectures like "a set can never be a member of
itself", and procedures for making chains of new sets ("insert a set into itself"). No
sophisticated set theory (e.g., diagonalization) was ever done.
After this initial period of exploration, AM decided that "equality" was worth generalizing,
and thereby discovered the relation "same-size-as". "Natural numbers" were based on this,
and soon most simple arithmetic operations were defined.
Since addition arose as an analog to union, and multiplication as a repeated substitution
followed by a generalized kind of unioning,¹⁹ it came as quite a surprise when AM noticed
that they were related (namely, N+N=2×N). AM later rediscovered multiplication in three
other ways: as repeated addition, as the numeric analog of the Cartesian product of sets,
and by studying the cardinality of power sets.²⁰ These operations were defined in different
ways, so it was an unexpected (to AM) discovery when they all turned out to be equivalent.
These surprises caused AM to give the concept "Times" quite a high Worth rating.
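The equivalence of these differently defined operations can be checked empirically on small numbers, which is the kind of evidence AM itself relied on. The Python sketch below is an illustration under stated assumptions: the function names are invented, bags are modelled as Python lists of placeholder items, and the power-set route is shown only as the size relation for disjoint sets.

```python
from itertools import combinations, product


def times_by_bag_substitution(a, b):
    """Replace each element of a bag of size a by a bag of size b, then flatten one level."""
    bag_a, bag_b = ["x"] * a, ["y"] * b
    substituted = [list(bag_b) for _ in bag_a]
    flattened = [item for inner in substituted for item in inner]
    return len(flattened)


def times_by_repeated_addition(a, b):
    """Add b to a running total, a times."""
    total = 0
    for _ in range(a):
        total += b
    return total


def times_by_cartesian_product(a, b):
    """Count the ordered pairs drawn from a set of size a and a set of size b."""
    return len(list(product(range(a), range(b))))


def power_set_size(n):
    """Count all subsets of an n-element set by explicit enumeration."""
    return sum(1 for r in range(n + 1) for _ in combinations(range(n), r))


agree = all(
    times_by_bag_substitution(a, b)
    == times_by_repeated_addition(a, b)
    == times_by_cartesian_product(a, b)
    for a in range(6)
    for b in range(6)
)
# For disjoint sets of sizes a and b, the power set of the union has the product of the sizes.
power_sets_multiply = all(
    power_set_size(a + b) == power_set_size(a) * power_set_size(b)
    for a in range(5)
    for b in range(5)
)
print(agree, power_sets_multiply)
```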
Exponentiation was defined as repeated multiplication. Unfortunately, AM never found any
obvious properties of exponentiation, hence lost all interest in it.
Soon after defining multiplication, AM investigated the process of multiplying a number by
itself: squaring. The inverse of this turned out to be interesting, and led to the definition of
square-root. AM remained content to play around with the concept of integer-square-root.
Although it isolated the set of numbers which had no square root, AM was never close to
discovering rationals, let alone irrationals.
Raising to fourth powers, and fourth-rooting, were discovered at this time. Perfect squares
and perfect fourth-powers were isolated. Many other numeric operations and kinds of
numbers were isolated: Odds, Evens, Doubling, Halving, etc. Primitive notions of numeric
inequality were defined but AM never even discovered Trichotomy.
¹⁹ Take two bags A and B. Replace each element of A by the bag B. Remove one level of parentheses
by taking the union of all elements of the transfigured bag A. Then that new bag will have as many
elements as the product of the lengths of the two original bags.
²⁰ The size of the set of all subsets of S is 2^|S|. Thus the power set of A∪B has length equal to
the product of the lengths of the power sets of A and B individually (assuming A and B are disjoint).

The associativity and commutativity of multiplication indicated that it could accept a BAG
of numbers as its argument. When AM defined the inverse operation corresponding to
Times, this property allowed the definition to be: "any bag of numbers (>1) whose product is
x". This was just the notion of factoring a number x. Minimally-factorable numbers
turned out to be what we call primes. Maximally-factorable numbers were also thought to
be interesting.
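The inverse of Times, as defined here, maps a number to every bag of numbers greater than 1 whose product is that number. A minimal Python sketch follows; the name `factor_bags`, the representation of a bag as a sorted tuple, and the range 2 to 30 are assumptions made for illustration.

```python
def factor_bags(x, smallest=2):
    """All bags (as non-decreasing tuples) of integers > 1 whose product is x."""
    bags = [(x,)] if x >= smallest else []
    d = smallest
    while d * d <= x:
        if x % d == 0:
            # Choose d as the smallest member, then factor the rest using members >= d.
            bags += [(d,) + rest for rest in factor_bags(x // d, d)]
        d += 1
    return bags


print(factor_bags(12))

numbers = range(2, 31)
bag_counts = {n: len(factor_bags(n)) for n in numbers}
fewest = min(bag_counts.values())
most = max(bag_counts.values())
minimally_factorable = [n for n in numbers if bag_counts[n] == fewest]
maximally_factorable = [n for n in numbers if bag_counts[n] == most]
print(minimally_factorable)
print(maximally_factorable)
```

The numbers with only the trivial one-element bag are exactly the primes in the range, which matches the observation in the text.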
Prime pairs were discovered in a bizarre way: by restricting addition (its arguments and its
values) to Primes.²¹ AM conjectured the fundamental theorem of arithmetic (unique
factorization into primes) and Goldbach's conjecture (every even number >2 is the sum of
two primes) in a surprisingly symmetric way. The unary representation of numbers gave
way to a representation as a bag of primes (based on unique factorization), but AM never
thought of exponential notation.²² Since the key concepts of remainder, greater-than, gcd,
and exponentiation were never mastered, progress in number theory was arrested.
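Both conjectures mentioned above can be tested on small cases. The Python sketch below is illustrative only: `prime_bag` and `goldbach_pairs` are invented names, the upper bound of 1000 is arbitrary, and a finite check of this kind supports a conjecture without proving it.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


def prime_bag(n):
    """Represent n > 1 as a sorted bag (list) of primes whose product is n."""
    bag, d = [], 2
    while d * d <= n:
        while n % d == 0:
            bag.append(d)
            n //= d
        d += 1
    if n > 1:
        bag.append(n)  # whatever remains is itself prime
    return bag


def goldbach_pairs(n):
    """All ways of writing n as p + q with p <= q and both prime."""
    return [(p, n - p) for p in range(2, n // 2 + 1) if is_prime(p) and is_prime(n - p)]


print(prime_bag(360))
print(goldbach_pairs(10))
# Empirical support only: every even number from 4 to 1000 has at least one such pair.
print(all(goldbach_pairs(n) for n in range(4, 1001, 2)))
```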
When a new base of geometric concepts was added, AM began finding some more general
associations. In place of the strict definitions for the equality of lines, angles, and triangles,
came new definitions of concepts we refer to as Parallel, Equal-measure, Similar, Congruent,
Translation, Rotation, plus many which have no common name (e.g., the relationship of two
triangles sharing a common angle). A cute geometric interpretation of Goldbach's
conjecture was found.²³ Lacking a geometry "model" (an analogic representation like the
one Gelernter employed), AM was doomed to failure with respect to proposing only
plausible geometric conjectures.
Similar restrictions due to poor "visualization" abilities would crop up in topology. The
concepts of continuity, infinity, and measure would have to be fed to AM before it could
enter the domains of analysis. More and more drastic changes in its initial base would be
required, as the desired domain gets further and further from simple finite set theory and
elementary number theory.
²¹ That is, consider the set of triples p, q, r, all primes, for which p+q=r. Then one of them must
be "2", and the other two must therefore form a prime pair.
²² A tangential note: All of the discoveries mentioned above were made by AM working by itself,
with a human being observing its behavior. If the level of sophistication of AM's concepts were
higher (or the level of sophistication of its users were lower), then it might be worthwhile to
develop a nice user-system interface. The user in that case could and ought to work right along
with AM as a co-researcher.
²³ Given all angles of a prime number of degrees (0, 1, 2, 3, 5, 7, 11, ..., 179 degrees), then any
angle between 0 and 180 degrees can be approximated (to within 1 degree) as the sum of two of
those angles.
1.3.4. AM as a Thesis [optional]
Walking home along a deserted street late at night, the reader may imagine
himself to feel in the small of his back a cold, hard object; and to hear the words
spoken behind him, 'Easy now. This is a stick-up. Hand over your money.' What
does the reader do? He attempts to generate the utterance. He says to himself,
now if I were standing behind someone holding a cold, hard object against his
back, what would make me say that? What would I mean by it? The reader is
advised that he can only arrive at the deep structure of this book, and through the
deep structure the semantics, if he attempts to generate the book for himself. The
author wishes him luck.
-- Linderholm
Don't be scared by the weight of the document you're now holding. If you flip to page 165,
you'll see that the last two-thirds are just appendices.
Each chapter is of roughly equal importance, which explains the huge variation in length.
Start looking over Chapter 2 right away: it contains a detailed example of what AM does.
Since you're reading this sentence now, we'll assume that you want a preview of what's to
come in the rest of this document.
Chapter 3 covers the top-level control structure of the system, which is based around the
notion of an 'agenda' of tasks to perform. In Chapter 4 the low-level control structure is
revealed: AM is really guided by a mass of heuristic rules of varying generality. Chapter 5
contains more than you want to know about the representation of knowledge in AM. The
diagram showing some of AM's starting concepts (page 105) is worth a look, even out of
context.
Most of the results of the project are presented in Chapter 6. In addition to simply 'running'
AM, several experiments have been conducted with it. It's awkward to evaluate AM, and
therefore Chapter 7 is quite long and detailed.
The appendices provide material which supplements the text. Appendix 2 contains a
description of all the initial concepts, some examples of how they were coded into Lisp, and
a partial list of the concepts AM defined and investigated along the way. Appendix 3
exhibits all 242 heuristics that AM is explicitly provided with. Appendix 4 is essentially a
math article, about the major discovery that AM motivated: maximally-divisible numbers.
Finally, Appendix 5 contains traces of AM in action: a long prose description, a long
task-by-task description, and a long undoctored transcript excerpt. Appendix 1 hasn't been
mentioned yet, and forms the subject of the remainder of this section.


This thesis, and its readers, must come to grips with a very interdisciplinary problem.
For the reader whose background is in Artificial Intelligence, most of the system's actions
(the "mathematics" it does) may seem inherently uninteresting. For the mathematician, the
word "LISP" signifies nothing beyond a speech impediment (to Artificial Intelligence types it

also connotes a programming impediment). If I don't describe "LISP" the first time I
mention it, a large fraction of potential readers will never realize that potential. If I do stop
to describe LISP, the other readers will be bored.
In an attempt not to lose readers due to jargon, two glossaries of terms have been compiled.
Appendix 1.1 (p. 165) contains capsule descriptions of the mathematical terms, ideas, and
notations used in this thesis. Appendix 1.2 renders the analogous service for Artificial
Intelligence jargon and computer science concepts.
Chapter 2.
An Example: Discovering Prime Numbers
This chapter will present an example of AM in action, an excerpt from the output of AM,
as it investigates some concepts.
After a brief discussion of AM's control structure in Section 2.1, the reader will be told
what the point of this example is, and is not. Section 2.3 provides a few eleventh-hour
hints at decoding the example.

The excerpt itself follows in Section 2.4. It skips the first half of the session, and picks up
at a point just after AM has defined the concept "Divisorsof". Soon afterward, AM defines
Primes, and begins to find interesting conjectures related to them. The excerpt goes on to
show how AM conjectured the fundamental theorem of arithmetic and Goldbach's
conjecture. AM derived the notion of partitioning a collection of n objects into smaller
bundles, but failed to find any interesting conjectures about that process. Instead, AM was
sidetracked into the (probably) fruitless investigation of numbers which can be represented
as the sum of two primes in one unique way.
The final section of this chapter will recap this example the way a math historian might
report it.
2.1. Discussion of the AM Program
2.1.1. Representation
AM is a program which expands a knowledge base of mathematical concepts. Each concept
is stored as a particular kind of data structure, namely as a collection of properties or
"facets" of the concept. For example, here is a miniature example of a concept:¹
¹ The right arrow ("→") in the box on the next page is the symbol for "implies". "Nos." is an
abbreviation for "Numbers". The vertical bar "|" is a symbol for the predicate "divides evenly
into"; the hook "¬" is a symbol for the predicate "the negation of". "«" indicates exclusive or,
and the symbol "∀" is read "for all". Please consult the glossary, Appendix 1.1, for fuller
discussion of these, plus other math terms like "Prime pairs".
"Creating a new concept" is a well-defined activity: it involves setting up a new data
structure like the one above, and filling in entries for some of its facets or slots. Filling in a
particular facet of a particular concept is also quite well-defined, and is accomplished by
executing a collection of relevant heuristic rules. This process will be described in great
detail in later chapters.
2.1.2. Agenda and Heuristics
An agenda of plausible tasks is maintained by AM. A typical task is "Fill in examples of
Primes". The agenda may contain hundreds of entries such as this one. AM repeatedly
selects the top task from the agenda and tries to carry it out. This is the whole control
structure! Of course, we must still explain how AM creates plausible new tasks to place on
the agenda, how AM decides which task will be the best one to execute next, and how it
carries out a task.
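The select-and-execute loop just described can be pictured with a short sketch. It is written in modern Python rather than AM's LISP, and every name in it (Task, Agenda, run, the numeric priorities) is an assumption made purely for illustration. It shows only the bare control structure, and leaves open how tasks are proposed and rated.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Task:
    sort_key: float                          # negated priority: heapq pops the smallest key
    description: str = field(compare=False)


class Agenda:
    """A priority queue of plausible tasks (a hypothetical stand-in for AM's agenda)."""

    def __init__(self):
        self._heap = []

    def add(self, description, priority):
        heapq.heappush(self._heap, Task(-priority, description))

    def pop_best(self):
        return heapq.heappop(self._heap).description

    def __len__(self):
        return len(self._heap)


def run(agenda, execute, max_tasks):
    """Repeatedly select the top task and carry it out; return the tasks in the order done."""
    done = []
    while len(agenda) > 0 and len(done) < max_tasks:
        description = agenda.pop_best()
        # Carrying out a task may suggest new tasks, which go back onto the agenda.
        for new_description, priority in execute(description):
            agenda.add(new_description, priority)
        done.append(description)
    return done
```

Here the execute argument stands in for everything interesting: it carries out one task and hands back whatever new tasks that work suggested, each with a rating.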
If the task is "Fill in new Algorithms for Setunion", then satisfying it would mean actually
synthesizing some new procedures, some new LISP code capable of forming the union of
any two sets. A heuristic rule is relevant to a task iff executing that rule brings AM closer
to satisfying that task. Relevance is determined a priori by where the rule is stored. A rule
tacked onto the Domain/range facet of the Compose concept would be presumed relevant to
the task "Check the Domain/range of InsertoDelete".
Once a task is chosen from the agenda, AM gathers some heuristic rules which might be
relevant to satisfying that task. They are executed, and then AM picks a new task. While
a rule is executing, three kinds of actions or effects can occur:
(i) Facets of some concepts can get filled in (e.g., examples of primes may actually be found
and tacked onto the "Examples" facet of the "Primes" concept). A typical heuristic rule
which might have this effect is:
To fill in examples of X, where X is a kind of Y (for some more general concept Y),
Check the examples of Y; some of them may be examples of X as well.
For the task of filling in examples of Primes, this rule would have AM notice that
Primes is a kind of Number, and therefore look over all the known examples of
Number. Some of those would be primes, and would be transferred to the Examples
facet of Primes.
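As a rough illustration of the rule in (i), here is a minimal Python sketch. The dictionary layout, the facet names used as keys, and the function names are assumptions chosen for this example; AM's actual LISP representation is described in later chapters.

```python
from math import isqrt


def is_prime(n):
    """Definition test for the illustrative concept Primes."""
    return n >= 2 and all(n % d != 0 for d in range(2, isqrt(n) + 1))


# Two toy concepts; the layout is hypothetical.
concepts = {
    "Number": {"Examples": list(range(1, 31)), "Generalizations": []},
    "Primes": {"Examples": [], "Generalizations": ["Number"], "Definition": is_prime},
}


def fill_examples_from_generalizations(concepts, name):
    """To fill in examples of X, check the examples of each Y that X is a kind of."""
    x = concepts[name]
    for general in x["Generalizations"]:
        for candidate in concepts[general]["Examples"]:
            if x["Definition"](candidate) and candidate not in x["Examples"]:
                x["Examples"].append(candidate)   # transfer it to X's Examples facet
    return x["Examples"]
```

Calling fill_examples_from_generalizations(concepts, "Primes") looks over the known examples of Number and moves the ones that pass the definition onto the Examples facet of Primes.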
(ii) New concepts may be created (e.g., the concept "primes which are uniquely representable
as the sum of two other primes" may somehow be deemed worth studying). A
typical heuristic rule which might result in this new concept is:
If some (but not most) examples of X are also examples of Y (for some concept Y),
Create a new concept defined as the intersection of those 2 concepts (X and Y).
Suppose AM has already isolated the concept of being representable as the sum of two
primes in only one way (AM actually calls such numbers "Uniquelyprimeaddable
numbers"). When AM notices that some primes are in this set, the above rule will
create a brand new concept, defined as the set of numbers which are both prime and
uniquely prime addable.
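The rule in (ii) might be sketched as follows. The reading of "some (but not most)" as "at least one, but no more than half" is my own assumption for the sake of the example, as are the function name and the shape of the returned concept.

```python
def propose_intersection(name_x, examples_x, name_y, examples_y):
    """Return a new concept (a dict) if some, but not most, X's are also Y's; else None."""
    shared = sorted(set(examples_x) & set(examples_y))
    if not shared or 2 * len(shared) > len(examples_x):
        return None          # no overlap, or most of X overlaps: nothing worth isolating
    return {
        "Name": f"{name_x}-and-{name_y}",
        "Definition": f"both {name_x} and {name_y}",
        "Examples": shared,
    }
```

Given a handful of primes and a handful of uniquely prime addable numbers, the sketch returns a concept whose examples are exactly the numbers found on both lists.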
(iii) New tasks may be added to the agenda (e.g., the current activity may suggest that the
following task is worth considering: "Generalize the concept of prime numbers"). A
typical heuristic rule which might have this effect is:
If very few examples of X are found,
Then add the following task to the agenda: "Generalize the concept X".
Of course, AM contains a precise meaning for the phrase "very few". When AM looks
for primes among examples of already-known kinds of numbers, it will find dozens of
non-examples for every example of a prime it uncovers. "Very few" is thus naturally
implemented as a statistical confidence level.²
The concept of an agenda is certainly not new: schedulers have been around for a long
time. But one important feature of AM's agenda scheme is a new idea: attaching (and
using) a list of quasi-symbolic reasons³ to each task, which explain why the task is worth
considering, why it's plausible. It is the responsibility of the heuristic rules to include reasons
for any tasks they propose.⁴ For example, let's reconsider the heuristic rule mentioned in (iii)
above. It really looks more like the following:
If very few examples of X are found,
Then add the following task to the agenda: "Generalize the concept X", for the following
reason: "X's are quite rare; a slightly less restrictive concept might be more
interesting".
If the same task is proposed by several rules, then several different reasons for it may be
present. In addition, one ephemeral reason also exists: "Focus of attention". Any tasks
which are similar to the one last executed get "Focus of attention" as a bonus reason. AM
uses all these reasons, e.g., to decide how to rank the tasks on the agenda. The
"intelligence" AM exhibits is not so much "what it does", but rather the order in which it
arranges its agenda.⁵ AM uses the list of reasons in another way: once a task has been
selected, the quality of the reasons is used to decide how much time and space the task will
be permitted to absorb, before AM quits and moves on to a new task. This whole
mechanism will be detailed in Section 3.3.2, on page 33.
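To make the role of reasons concrete, here is a small Python sketch. It is not AM's mechanism: the class name, the numeric ratings, the rule "worth = sum of the ratings of the distinct reasons", and the conversion of worth into seconds are all placeholders I have invented for illustration. The actual formulas are the subject of Section 3.3.2. The sketch only shows the bookkeeping: reasons accumulate per task, coinciding reasons are not counted twice, and the reasons determine both the ranking and the time allotted.

```python
class ReasonedAgenda:
    """Tasks with attached reasons; all numbers here are made up for illustration."""

    def __init__(self):
        self.reasons = {}                     # task -> {reason sentence: rating}

    def propose(self, task, reason, rating):
        known = self.reasons.setdefault(task, {})
        # Reasons are only compared for equality, so a repeated sentence is the same reason.
        known[reason] = max(rating, known.get(reason, 0))

    def worth(self, task):
        return sum(self.reasons[task].values())   # placeholder combination rule

    def ranked(self):
        return sorted(self.reasons, key=self.worth, reverse=True)

    def time_budget(self, task, points_per_second=50):
        return self.worth(task) / points_per_second   # placeholder: better reasons, more time
```

A task proposed by two different rules thus ends up with two reasons and a higher rank than either rule alone would have given it, while a rule that fires twice with the same sentence adds nothing.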
2.2. What to get out of -- and NOT get out of -- this example
The purpose of the example which begins on page 20 is to convey a bit of AM's flavor.
After reading through it, the reader should be convinced that AM is not a theorem-prover,
nor is it randomly manipulating entries in a knowledge base, nor is it exhaustively
manipulating or searching. AM is carefully growing a network of data structures
representing mathematical concepts, by repeatedly using heuristics both (a) for guidance in
choosing a task to work on next, and (b) to provide methods to satisfy the chosen task.
² The ratio of examples found to non-examples stumbled over lies between .001 and .05.
Philosophers outraged by this may be somewhat appeased by knowledge that large changes in
the precise numbers very rarely alter AM's behavior.
³ Each reason is an English sentence. While AM can tell whether two given reasons coincide,
it can't actually do any internal processing on them. If this lack of intelligence had proved to
be a limiting problem, then more work would have been expended on giving AM some such
abilities.
⁴ An alternative scheme, perhaps even a bit more human-like, would be to (perhaps only
occasionally) allow a burst of poorly-motivated tasks to be proposed, and then use some
pruning criteria to weed out the obvious losers. During this time, AM could type out to the
user (who otherwise would be closely monitoring its activities) a cute anthropomorphic phrase
like "I'm now sitting back and puffing on my pipe, lost in contemplation."
⁵ For example, alternating a randomly-chosen task and the "best" task (the one AM chose to
do) only slows the system down by a factor of 2, yet it totally destroys its credibility as a
rational researcher (as judged by the human user of AM). This is one conclusion of
experiment 2 (see Section 6.2.2, page 129).
The following points are important but can't be conveyed by any lone example:
(i) Although AM appears to have reasonable natural language abilities, this is a typical AI
illusion: most of the phrases AM types are mere tokens, and the syntax which the user
must obey is unnaturally constrained. For the sake of clarity, I have "touched up" some
of the wording, indentation, syntax, etc. of what AM actually outputs, but left the spirit
of each phrase intact. As the reader becomes more familiar with AM, future examples
can be "unretouched". If he wishes, he may glance at Appendix 5.3, which shows
some actual listings of AM in action.
(ii) The reader should be skeptical of the generality of the program; is the knowledge base
"just right" (i.e., finely tuned to elicit this one chain of behaviors)? The answer is
"No".⁶ The whole point of this project is to show that a relatively small set of general
heuristics can guide a nontrivial discovery process. Each activity, each task, was
proposed by some heuristic rule (like "look for extreme cases of X") which was used
time and time again, in many situations. It was not considered fair to insert heuristic
guidance which could only "guide" in a single situation.
This kind of generality can't be shown convincingly in one example. Nevertheless,
even within this small excerpt, the same line of development which leads to
decomposing numbers (using TIMES⁻¹) and thereby discovering unique factorization,
also leads to decomposing numbers (using ADD⁻¹) and thereby discovering Goldbach's
conjecture. The same heuristic which caused AM to expect that unique factorization
will be useful, also caused AM to suspect that Goldbach's conjecture will be useless.
Let me reemphasize that the "point" of this example is not the specific mathematical
concepts, nor the particular chains of plausible reasoning AM produces, nor the few flashy
conjectures AM spouts, but rather an illustration of the kinds of things AM does.
2.3. Deciphering the Example
Recall that in general, each task on the agenda will have several reasons attached to it. In
the example excerpt, the reasons for each task are printed just after the task is chosen, and
before it's executed.
AM numbers its activities sequentially. Each time a new task is chosen, a counter is
incremented. The first task in the example excerpt is labelled ** TASK 65 **, meaning that
the example skips the first 64 tasks which AM selects and carries out. The reason simply is
that the development of simple concepts related to divisibility will probably be more
intelligible and palatable to the reader than AM's early ramblings in finite set theory.
⁶ The design of AM was finely tuned so that the answer to this question would be "No".
Ponder that one!
In the example itself, several irrelevant tasks have been excised.⁷ About half of those
omitted tasks were interesting in themselves, but all of them were tangential or unrelated to
the development shown. The reader can tell by the global task numbering how many were
skipped. For example, notice that the excerpt jumps from Task 67 to Task 79.
To help gauge AM's abilities, the reader may be interested to know that AM defined
"Natural Numbers" during Task 44, and "TIMES" was defined during Task 57. AM
started with no knowledge of numbers, and only scanty knowledge of sets and set-operations.
Task 3, e.g., was to fill in examples of Sets.
The concepts that AM talks about are self-explanatory, by and large. Below are discussed
some nonstandard ones.
BAG is a kind of list structure, a bunch of elements which are unordered, but one in which
multiple copies of the same element are permitted. One may visualize a paper bag filled
with cardboard letters. Technically, we shall say that a set is not considered to be a bag. A
bag is denoted by enclosure within parentheses, just as sets are within braces. So the bag
containing X and four Y's might be written (X Y Y Y Y), and would be considered
indistinguishable from the bag (Y Y Y X Y).
Number will mean (typically) a positive integer.
TIMES⁻¹ is a particular relation. For any number x, TIMES⁻¹(x) is a set of bags. Each
bag contains some numbers which, when multiplied together, equal x. For example,
TIMES⁻¹(18) = { (18) (2 9) (2 3 3) (3 6) }. Checking, we see that multiplying, e.g., the
numbers in the bag (2 3 3) together, we do get 2x3x3=18. TIMES⁻¹(x) contains all possible
such bags (containing natural numbers >1).
ADD⁻¹ is a relation analogous to TIMES⁻¹. For any number x, ADD⁻¹(x) is also a set of
bags. Each bag contains a bunch of numbers which, when added together, equal x. For
example, ADD⁻¹(4) = { (4) (1 1 1 1) (1 1 2) (1 3) (2 2) }. ADD⁻¹(x) contains all possible such
bags (containing numbers >0); it finds all possible partitions of x.
Divisorsof is a more standard relation. For any number x, Divisorsof(x) is the set of all
positive numbers which divide evenly into x. For example, Divisorsof(18) = {1 2 3 6 9 18}.
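For readers who prefer to see these three relations computed, here is a short Python sketch. It is an illustration only and has nothing to do with how AM's own LISP algorithms were written; the function names are mine. A bag is represented as a tuple kept in non-decreasing order, so that two bags which differ only in the order of their elements are the same tuple.

```python
from math import isqrt


def divisors_of(x):
    """The set of all positive numbers which divide evenly into x."""
    return {d for d in range(1, x + 1) if x % d == 0}


def times_inverse(x, smallest=2):
    """All bags of numbers > 1 whose product is x (each bag is a sorted tuple)."""
    bags = {(x,)} if x >= smallest else set()
    for d in range(smallest, isqrt(x) + 1):
        if x % d == 0:
            # Later factors are never smaller than d, so each bag appears exactly once.
            bags |= {(d,) + rest for rest in times_inverse(x // d, d)}
    return bags


def add_inverse(x, smallest=1):
    """All bags of numbers > 0 whose sum is x, i.e. the partitions of x."""
    bags = {(x,)} if x >= smallest else set()
    for first in range(smallest, x // 2 + 1):
        bags |= {(first,) + rest for rest in add_inverse(x - first, first)}
    return bags
```

With these definitions, times_inverse(18), add_inverse(4) and divisors_of(18) reproduce the three examples given in the text.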

The definitions for most of the mathematical terms used in the excerpt can be found in the
Glossary (Appendix 1.1). Whenever there is a conflict between "computer science jargon"
and "math jargon", I have opted for the latter. So, e.g., all "functions" are necessarily
single-valued for each member of their domain.
AM is an interactive computer program. It prints out phrases indicating what it is doing at
each moment, and a human being, referred to as the User, watches AM's activities. At any
moment, the user can interrupt AM to ask questions or redirect its energies. In the excerpt,
the user never redirects AM, although he does occasionally ask AM to rename some
concept and to explain its behavior (e.g., just after ** TASK 106 **). For the benefit of the
reader, I have boldfaced all the user's comments.
My personal notes and comments occur right-justified, in italics.
⁷ This is despite the results of Experiment 2 (see Section 6.2.2) because the remaining tasks
clump together in twos, threes, etc.; they are uninterrupted lines of research (e.g., Tasks
65-67), separated by very large gaps (e.g., the jump from Task 67 to 79).
2.4. The Example Itself
** Task 65: **
Filling in examples of the following concept: "Divisorsof".
3 Reasons:
(1) No known examples for Divisorsof so far.
(2) TIMES, which is related to Divisorsof, is now very interesting.
(3) Focus of attention: AM recently defined Divisorsof.
26 examples found, in 9.2 seconds. e.g.: Divisorsof(6)={1 2 3 6}.
** Task 66: **
Considering numbers which have very small sets of Divisorsof (i.e., very few divisors).
2 Reasons:
(1) Worthwhile to look for extreme cases.
(2) Focus of attention: AM recently worked on Divisorsof.
Filling in examples of numbers with 0 divisors.
0 examples found, in 4.0 seconds.
Conjecture: no numbers have 0 divisors.
Filling in examples of numbers with 1 divisors.
1 examples found, in 4.0 seconds, e.g.: Divisorsof(1)={1}.
Conjecture: 1 is the only number with 1 divisor.
Filling in examples of numbers with 2 divisors.
24 examples found, in 4.0 seconds, e.g.: Divisorsof(13)={1 13}.
No obvious conjecture. This kind of number merits more study.
Creating a new concept: "Numberswith2divisors".
AM had previously derived the notion of singleton,
doubleton, etc. Above, AM was actually defining, e.g., the
set of numbers which gets mapped into a doubleton of
numbers, by the operation Divisorsof. I took the liberty
of translating this into the above form, which uses, e.g.,
the number "2".
Filling in examples of numbers with 3 divisors.
11 examples found, in 4.0 seconds, e.g.: Divisorsof(49)={1 7 49}.
All numbers with 3 divisors are also Squares. This kind of number merits more study.
Creating a new concept: "Numberswith3divisors".
How did AM notice that conjecture? It took a random
example of Numberswith3divisors, say 49. Then it
asked what other known concepts "49" was an example
of. The two answers were: Oddnumbers and Perfectsquares.
AM then tested these conjectures on the other
ten examples just found. The only surviving conjecture
was that all Numberswith3divisors are also perfect
squares.
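The procedure described in this note (pick one example, ask which known concepts it falls under, then test each candidate on the remaining examples) can be sketched in a few lines of Python. The table of known concepts and the function name are assumptions for illustration; the seed example is passed in explicitly, rather than drawn at random, so that the sketch is repeatable.

```python
from math import isqrt

# Hypothetical membership tests for a few already-known concepts.
KNOWN_CONCEPTS = {
    "Oddnumbers": lambda n: n % 2 == 1,
    "Perfectsquares": lambda n: isqrt(n) ** 2 == n,
    "Evennumbers": lambda n: n % 2 == 0,
}


def surviving_conjectures(examples, known_concepts, seed):
    """Conjectures of the form 'every example is also a C' that survive testing."""
    # Step 1: which known concepts is the seed an example of?
    candidates = [name for name, test in known_concepts.items() if test(seed)]
    # Step 2: keep only those candidates which hold for all the examples found.
    return [name for name in candidates
            if all(known_concepts[name](e) for e in examples)]
```

Starting from the seed 49 and testing against the other numbers with three divisors (4, 9, 25, ...), "Oddnumbers" is knocked out by 4 and only "Perfectsquares" survives.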
** Task 67: **
Considering the squareroots of Numberswith3divisors.
2 Reasons:
(1) Numberswith3divisors are unexpectedly also perfect Squares.
(2) Focus of attention: AM recently worked on Numberswith3divisors.

All squareroots of Numberswith3divisors seem to be Numberswith2divisors.
e.g., Divisorsof( Squareroot(169) ) = Divisorsof(13) = {1 13}.
Formulating the converse to this statement. Empirically, it seems to be true.
The square of each Numberwith2divisors seems to be a Numberwith3divisors.
This is very unusual. It is not plausibly a coincidence. (Chance of coincidence is < .001)
Boosting interestingness factor of the concepts involved:
Interestingness factor of "Divisorsof" raised from 300 to 400.
Interestingness factor of "Numberswith2divisors" raised from 100 to 600.
Interestingness factor of "Numberswith3divisors" raised from 200 to 700.
USER: Call the set of numbers with 2 divisors "Primes".
** Task 68: **
Considering the squares of Numberswith3divisors.
2 Reasons:
(1) Squares of Numberswith2divisors were interesting.
(2) Focus of attention: AM recently worked on Numberswith3divisors.
This gap in the sequencing from task 67 to task 79
eliminates some tangential and boring tasks. See page
19 for an explanation.
** Task 79: **
Examining TIMES⁻¹(x), looking for patterns involving its values.
2 Reasons:
(1) TIMES⁻¹ is related to the newly-interesting concept "Divisorsof".
(2) Many examples of TIMES⁻¹ are known, to induce from.
Looking specifically at TIMES⁻¹(12), which is { (12) (2 6) (2 2 3) (3 4) }.
13 conjectures proposed, after 2.0 seconds.
e.g., "TIMES⁻¹(x) always contains a bag containing only even numbers".
Testing the conjectures on other examples of TIMES⁻¹.
5 false conjectures deal with even numbers.
AM will sometime consider the restriction of TIMES⁻¹ to even numbers.
Only 2 out of the 13 conjectures are verified for all 26 known examples of TIMES⁻¹:
Conjecture 1: TIMES⁻¹(x) always contains a singleton bag.
e.g., TIMES⁻¹(12), which is { (12) (2 6) (2 2 3) (3 4) }, contains (12).
e.g., TIMES⁻¹(13), which is { (13) }, contains (13).
Creating a new concept, "Singletimes".
Singletimes is a relation from Numbers to Bagsofnumbers.
Singletimes(x) is all bags in TIMES⁻¹(x) which are singletons.
e.g., Singletimes(l2H1 is the product
I suspect that this conjecture may be very useful.⁸
USER: Call this conjecture "Unique factorization conjecture".
To show that AM isn't really always right on the mark,
the next sequence of tasks includes a crime of omission
(ignoring the concept of Partitions) and a false start
(worrying about numbers which can be represented as the
sum of two primes in precisely one way). Notice the
skip here; 2 tasks have been omitted.
⁸ How did AM know this? One of the (unfortunately few!) meta-heuristics in AM said the
following: "When using the 'look at the inverse of extreme items under the operation f' rule,
tack the following note onto the Interest facet of the new concept which is created:
'Conjectures involving this concept and f (or f⁻¹) are natural, interesting, and probably
useful.' " Now the concept PRIMES was created using the 'extreme' heuristic rule, with
f=Divisorsof. When PRIMES was first created, the meta-rule just presented tacked the
following note onto Primes.interest: 'Conjectures involving Primes and division (or
multiplication) are natural, interesting, and probably useful.' The unique factorization
conjecture triggers this feature, whereas Goldbach's conjecture wouldn't.
** Task 84: **
Examining ADD⁻¹(x), looking for patterns involving its values.
2 Reasons:
(1) ADD⁻¹ is analogous to the newly-interesting concept "TIMES⁻¹".
(2) Many examples of ADD⁻¹ are known, to induce from.
Looking specifically at ADD⁻¹(6), which is { (1 1 1 1 1 1) (1 1 1 1 2) (1 1 1 3) (1 1 2 2)
(1 1 4) (1 2 3) (1 5) (2 2 2) (2 4) (3 3) (6) }.
17 conjectures proposed, after 3.9 seconds.
e.g., "ADD⁻¹(x) always contains a bag of primes".
Testing them on other examples of ADD⁻¹.
Only 11 out of the 17 conjectures are verified for all 19 known examples of ADD⁻¹:
3 out of the 11 conjectures were false until amended.
Conjecture 1: ADD⁻¹(x) never contains a singleton bag.
Conjecture 2: ADD⁻¹(x) always contains a bag of size 2 (also called a "pair" or a "doubleton").
e.g., ADD⁻¹(6) contains (1 5), (2 4), and (3 3).
e.g., ADD⁻¹(4) contains (1 3), and (2 2).
Creating a new concept, "Pairadd".
Pairadd is a relation from Numbers to Pairsofnumbers.
Pairadd(x) is all bags in ADD⁻¹(x) which are doubletons (i.e., of size 2).
e.g., Pairadd(12)={ (1 11) (2 10) (3 9) (4 8) (5 7) (6 6) }.
e.g., Pairadd(4)={ (1 3) (2 2) }.
Conjecture 3: ADD⁻¹(x) always contains a bag containing only 1's.
Conjecture 10: ADD⁻¹(x) always contains a pair of primes.
This conjecture is false. Conjecture is amended:
"ADD⁻¹(x) usually (but not always) contains a pair of primes."
e.g., ADD⁻¹(10) contains (3 7), and (5 5).
e.g., ADD⁻¹(4) contains (2 2).
e.g., ADD⁻¹(11) does not contain a pair of primes.
Creating a new concept, "Primeadd".
Primeadd is a relation from Numbers to Pairsofnumbers.
Primeadd(x) is all bags in ADD⁻¹(x) which are pairs of primes.
e.g., Primeadd(12)={ (5 7) }.
e.g., Primeadd(10)={ (3 7) (5 5) }.
e.g., Primeadd(11)={ }.

** Task 106: **
Considering the set of numbers for which Primeadd is defined (has nonempty value).
1 Reason:
(1) Primeadd often has nonempty value. Worth isolating that case.
Warning: no task on the agenda has an interestingness value above 200!!!
Creating a new concept "Primeaddable".
Primeaddable is a kind of Number, x is Primeaddable if Primeadd(x) is nonempty.
Will spend 5.0 seconds filling in examples of Primeaddable.
18 examples found. Here are some of them: 4 5 6 7 8 9 10 12 13 14 15 16 18.
Empirically, all even numbers are also in this set (ignoring boundary cases: 0 2 4 6).
So conjectured. Danger: must examine boundary cases: the numbers 0, 2, 4, and 6.
Two exceptions noticed. The only exceptions are the smallest boundary cases: 0, 2.
Conjecture is amended: All even numbers >2 are the sum of two primes.
Warning: I expect this conjecture will be cute but useless.
USER: Why?
Because the concept of "Primes" originally arose as extreme cases of Divisorsof. The more
closely an operation X is related to the concept Divisorsof, the more natural will be any
conjecture involving both that operation X and Primes. E.g., conjectures involving both
Primes and Times will be natural and useful. But this conjecture, which involves Primes and
ADD⁻¹, will be cute but useless, since the relation "ADD⁻¹" is unrelated to the relation
"Divisorsof".
End of explanation.
USER: Call this conjecture "Goldbach's conjecture".
** Task 107: **
Considering the set of numbers for which the relation Primeadd is singlevalued.
3 Reasons:
(1) Primeadd often has singleton value. Worth isolating that case.
(2) Restricted to this set, Primeadd would be a function.
(3) Focus of attention: AM recently worked on Primeadd.
Creating a new concept "Uniquelyprimeaddable".
"Uniquelyprimeaddable" is a kind of Number,
x is Uniquelyprimeaddable if Primeadd(x) is a singleton.
Will spend 10.0 seconds filling in examples of Uniquelyprimeaddable.
11 examples found. Here are some of them: 4 5 7 8 9 12 13.
No obvious conjecture derived empirically.
Will forget "Uniquelyprimeaddable numbers", if no Ties found in near future.
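The concepts AM builds in Tasks 84 through 107 are easy to restate as code. The following Python sketch is my own paraphrase of the definitions printed in the excerpt (the function names are assumptions, and AM of course arrived at these concepts by a very different route). It may help the reader check the printed examples.

```python
from math import isqrt


def is_prime(n):
    return n >= 2 and all(n % d != 0 for d in range(2, isqrt(n) + 1))


def pair_add(x):
    """All bags of size 2 in ADD-inverse(x), each written as a pair (a, b) with a <= b."""
    return {(a, x - a) for a in range(1, x // 2 + 1)}


def prime_add(x):
    """All bags in ADD-inverse(x) which are pairs of primes."""
    return {(a, b) for (a, b) in pair_add(x) if is_prime(a) and is_prime(b)}


def prime_addable(x):
    """x is Primeaddable if Primeadd(x) is nonempty."""
    return len(prime_add(x)) > 0


def uniquely_prime_addable(x):
    """x is Uniquelyprimeaddable if Primeadd(x) is a singleton."""
    return len(prime_add(x)) == 1
```

For instance, prime_add(10) gives the two pairs (3 7) and (5 5), prime_add(11) is empty, and testing prime_addable on the even numbers from 4 upward is a small-scale empirical check of the amended conjecture from Task 106.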
2.5. Recapping the Example
Let's once again eavesdrop on a mathematician, as he describes to a colleague what AM
did.
This example was preceded by the momentous discoveries of multiplication and division.
Several interesting properties of these operations were noticed. The first task which was
illustrated (** Task 65 **) involves exploring the concept of "divisors of a number"
(meaning all positive integers which divide evenly into the given number). After tiring of
finding examples of this relation, AM investigates extreme cases: that is, it wonders which
numbers have very few or very many divisors.
AM thus discovers Primes in a curious way. Numbers with 0 or 1 divisor are essentially
nonexistent, so they're not found to be interesting. AM notices that numbers with 3 divisors
always seem to be squares of numbers with 2 divisors (primes). This raises the
interestingness of several concepts, including primes. Soon (** TASK 79 **), another
conjecture involving primes is noticed: Many numbers seem to factor into primes. This
causes a new relation to be defined, which associates to a number x, all prime factorizations
of x. The first question AM asks about this relation is "is it a function?". This question is
the full statement of the unique factorization conjecture: the fundamental theorem of
arithmetic. AM recognized the value of this relationship, and assigned it a high
interestingness rating.
In a similar manner, though with lower hopes, it noticed some more relationships involving
primes, including Goldbach's conjecture. AM quite correctly predicted that this would turn
out to be cute but of no future use mathematically.
The last activity mentioned (** TASK 107 **) shows AM examining a rather nonstandard
concept: "numbers which can be written as the sum of a pair of primes, in only one way".
These are termed "uniquelyprimeaddable" numbers. It was mildly unfortunate that AM
gave up on this concept before noticing that p+2 is uniquelyprimeaddable, for any prime
number p, and that in fact these are the only odd uniquelyprimeaddable numbers. The
session was repeated once, with a human user telling AM explicitly to continue studying this
concept. AM did in fact construct "Uniquelyprimeaddableoddnumbers", and then notice
this relationship. Here we see an example of unstable equilibrium: if pushed slightly this
way, AM will get very interested and spend a lot of time working on this kind of number.
Since it doesn't have all the sophistication (i.e., compiled hindsight) that we have, it can't
know instantly whether what it's doing will be fruitless.
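The relationship AM missed has a one-line explanation: an odd number can only be the sum of two primes if one of them is 2, so an odd n has at most one such representation, namely 2 + (n-2), and has it exactly when n-2 is prime. The small Python sketch below (function names are mine, chosen for illustration) checks this empirically for odd numbers up to a chosen limit.

```python
from math import isqrt


def is_prime(n):
    return n >= 2 and all(n % d != 0 for d in range(2, isqrt(n) + 1))


def prime_pairs(n):
    """All ways of writing n as p + q with p <= q and both prime."""
    return [(p, n - p) for p in range(2, n // 2 + 1) if is_prime(p) and is_prime(n - p)]


def odd_claim_holds(limit):
    """True if every odd n <= limit has exactly one prime pair precisely when n - 2 is prime."""
    return all((len(prime_pairs(n)) == 1) == is_prime(n - 2)
               for n in range(3, limit + 1, 2))
```

Such a check is, of course, only evidence of the kind AM itself gathers; the parity argument above is what turns it into a proof.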
'Objectively' given, 'important' problems may arise [in math]. But even then the
mathematician is essentially free to take it or leave it and turn to something else,
while an 'important' problem in [any other science] is usually a conflict, a
contradiction, which 'must' be resolved. The mathematician has a wide choice of
which way to turn, and he enjoys a very considerable freedom in what he does.
—
Neumann
AM is one of those awkward programs whose representations only make sense if you
already understand how they will be operated on. A discussion of AM's control structure
(this chapter and the next) must thus precede a discussion of concepts and how they are
represented (Chapter 5). Section 2.1 gave the reader a sufficient knowledge of AM's
"anatomy" to follow these chapters. Thus armed with a cursory knowledge of the "statics" of
AM, we shall proceed to describe in detail its "dynamics".
Section 3.1 will give the reader a feeling for the immensity of AM's search space. This is
the "problem". The next section will give the top-level "solution": the flow of control is
governed by a job-list, an agenda of plausible tasks. Section 3.3 will present some details of
this global control scheme.
Chapter 4 deals with the way AM's heuristics operate; this could be viewed as the
"low-level" or local control structure of AM. Chapter 5 contains some detailed information
about the actual concepts (and heuristics) AM starts with, and a little more about their
design and representation. The reader is also directed to Appendix 5, which presents
several detailed examples of AM "in action".
3.1. AM's Search
To develop mathematics, one must always labor to substitute ideas for calculations.
—
Dirichlet
Let's first spend a paragraph reviewing how concepts are stored. AM contains a collection
of data structures, called concepts. Each concept is meant to coincide intuitively with one
mathematical idea (e.g., Sets, Union, Trichotomy). As such, a concept has several aspects or
parts, called facets (e.g., Examples, Definitions, Domain/range, Worth). If you wish to think
of a concept as a "frame", then its facets are "slots" to be filled in. Each facet of a concept
will either be totally blank, or else will contain a bunch of entries. For example, the
Algorithms facet of the concept Union may point to several equivalent LISP functions, each
of which can be used to form the union of two sets.¹ Even the "heuristic rules" are merely
entries on the appropriate kind of facet (e.g., the entries on the Interest facet of the
Structure concept are rules for judging the interestingness of Structures²).
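A reader who thinks in terms of modern data structures might picture a concept roughly as follows. This is a Python sketch under my own assumptions (a dictionary per concept, a list of entries per facet, a short fixed list of facet names taken from the examples above). AM's real concepts are LISP structures with many more facets; see Chapter 5.

```python
FACETS = ("Examples", "Definitions", "Domain/range", "Worth", "Algorithms", "Interest")


def new_concept(name, filled=None):
    """Set up a concept whose facets are all blank except those given in 'filled'."""
    concept = {facet: [] for facet in FACETS}       # a blank facet is an empty list
    for facet, entries in (filled or {}).items():
        if facet not in concept:
            raise KeyError(f"unknown facet: {facet}")
        concept[facet].extend(entries)
    concept["Name"] = name
    return concept


def blank_facets(concept):
    """The facets which still have no entries; each one is a potential 'fill in' task."""
    return [facet for facet in FACETS if not concept[facet]]


# The Algorithms facet may hold several equivalent procedures for the same operation.
union = new_concept("Union", {
    "Algorithms": [lambda a, b: a | b,
                   lambda a, b: {x for s in (a, b) for x in s}],
    "Domain/range": ["Sets x Sets -> Sets"],
})
```

Here blank_facets(union) lists the slots still waiting to be filled, which is exactly the raw material from which tasks are made.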
At any moment, AM contains a couple hundred concepts, each of which has only some of its
facets filled in. AM starts with 115 concepts, and grows to about 300 concepts before
running out of time/space. Most facets of most concepts are totally blank. AM's basic
activity is to select some facet of some concept, and then try to fill in some entries for that
slot.³ Thus the primitive kind of "task" for AM is to deal with a particular facet/concept
pair. A typical task looks like this:
Check the entries on the "Domain/range" facet of the "BagInsert" concept
If the average concept has ten or twenty blank facets, and there are a couple hundred
concepts, then clearly there will be about 20x2004000 "fillin" type tasks for AM to work
on, at any given moment. If several hundred facets have recently been filled in, there will
be that many "checkentries" type tasks available. Executing a task happens to take around
ten or twenty cpu seconds, so over the course of a few hours only a small percentage of these
tasks can ever be executed. 4
Since most of these tasks will never be explored, what will make AM appear smart (or
stupid) are its choices of which task to pick at each moment.5 So it's worth AM's spending
a nontrivial amount of time deciding which task to execute next. On the other hand, it had
better not be too much time, since a task does take only a dozen seconds.6

1 The reason for having multiple algorithms is that sometimes AM will want one that is fast, sometimes AM will be more concerned with economizing on storage, sometimes AM will want to "analyze" an algorithm, and for that purpose it must be a very unoptimized one, etc.
2 A typical such rule is: "A structure is very interesting if all its elements are mildly interesting in precisely the same way."
3 This is not quite complete. In addition to filling in entries for a given facet/concept pair, AM may wish to check it, split it up, reorganize it, etc.
4 The precise "18 seconds average" figure is not important. All heuristic-search programs suffer this same handicap: as the depth to which they've searched increases, the percentage of nodes (at or above that level) which have been examined decreases exponentially (assuming the branching factor b is strictly larger than unity).
5 This is true of all heuristic search programs. The branchier the search, the more it applies.
6 The answer is that AM spends this "deciding" time not just before a task is picked, but rather each time a task is added to the agenda. A little under 1 cpu second is spent, on the average, to place the task properly on the agenda, to assign it a meaningful numeric priority value. So "action time" is roughly one order of magnitude larger than "deciding time".

One question that must be answered is: What percentage of AM's legal moves (at any
typical moment) would be considered intelligent choices, and what percentage would be
irrational? The answer comes from empirical results. The percentages vary wildly
depending on the previous few tasks. Sometimes, AM will be obviously "in the middle" of
a sequence of tasks, and only one or two of the legal tasks would seem plausible. Other
times, AM has just completed an investigation by running into dead ends, and there may be
hundreds of tasks it could choose and not be criticized. The median case would perhaps
permit about 6 of the legal tasks to be judged reasonable.
It is important for AM to locate one of these currently-plausible tasks, but it's not worth
spending much time deciding which of them to work on next. AM still faces a huge search:
find one of the 6 winners out of a few thousand candidates.

Its choice of tasks is made even more important due to the 10-second "cycle time": the time
to investigate/execute one task. A human user is watching, and ten seconds is a nontrivial
amount of time to him. He can therefore observe, perceive, and analyze each and every
task that AM selects. Even just a few bizarre choices will greatly lower his opinion of AM's
intelligence. The trace of AM's actions is what counts, not its final results. So AM can't
draw much of its apparent intelligence from the speed of the computer.
Chess-playing programs have had to face the dilemma of the tradeoff between "intelligence"
(foresight, inference, processing,...) and total number of board situations examined. In chess,
the characteristics of current-day machines, language power vs. speed, and (to some extent)
the limitations of our understanding of how to be sophisticated, have to date unfortunately
still favored fast, nearly-blind7 search. Although machine speed and LISP slowness may
allow blind search to win over symbolic inference for shallow searches, it can't provide any
more than a constant speedup factor for an exponential search. Inference is slowly gaining
on brute force,8 and must someday triumph.
Since the number of "legal moves" for AM at any moment is in the thousands, it is
unrealistic to consider "systematically"9 walking through the entire space that AM can reach.
In AM's problem domain, there is so much "freedom" that symbolic inference finally can
win over the "simple but fast" exploration strategy.10
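The remark about a constant speedup can be made quantitative. As a rough sketch, assume that searching to depth d with branching factor b costs on the order of b^d node examinations. A machine k times faster then reaches only about log_b(k) levels deeper in the same time. The function name and the sample values of b below are illustrative assumptions, not figures from AM.

```python
import math

def extra_depth(speedup, branching_factor):
    """Additional search depth bought by a constant-factor speedup,
    assuming the cost of searching to depth d grows like b**d."""
    return math.log(speedup) / math.log(branching_factor)

# b = 30 is an illustrative value for a game-like search.
gain_game_like = extra_depth(1000, 30)
# b = 2000 stands for "thousands" of legal tasks at each moment.
gain_am_like = extra_depth(1000, 2000)
```

Under these assumptions a thousandfold speedup buys roughly two extra levels when b is 30, and less than one extra level when b is in the thousands, which is why raw speed alone cannot rescue a blind walk through a space as bushy as AM's.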
7 i.e., using a simple static evaluation function.
8 E.g., see [Berliner 74]. There, searching is used mainly to verify plausible moves (a convergent process), not to discover them (a bushier search).
9 e.g., exhaustively, or using alpha-beta minimaxing, etc.
10 This is the author's opinion, partially supported by the results of AM. Paul Cohen disagrees, feeling that machine speed should be the key to an automated mathematician's success.
3.2. Constraining AM's Search
There exist too many combinations to consider all combinations of existing entities;
the creative mind must only propose those of potential interest.
—
Poincare'
A great deal of heuristic knowledge is required to constrain the necessary processing
effectively, to zero in on a good task to tackle next. This is done in two stages.
1. A list of plausible facet/concept pairs is maintained. Nothing can get onto this list
unless there is some reason why filling in (or checking) that facet of that concept
would be worthwhile.
2. All the plausible tasks on this "job list" are ranked by the number and strength of
the different reasons supporting them. Thus the facet/concept pairs near the top of
the list will all be very promising tasks to work on.
The first of these constraints is akin to replacing a legal move generator by a plausible
move generator. The second kind of constraint is akin to using a heuristic evaluation
function to select the best move from among the plausible ones.11
The joblist or agenda is a data structure which is a natural way to store the results of these
procedures. It is (1) a list of all the plausible tasks which have been generated, and (2) it is
kept ordered by the numeric estimate of how worthwhile each task is. A typical entry on
the agenda is pictured in Section 3.3.2.
11 Past AI programs (e.g., [Samuel 67]) have indicated that constraining generation (1) is more important than sophisticated ordering of the resultant candidates (2). This was confirmed by the experiments performed on AM.
The actual top-level control structure is simply to pluck the top task from the agenda and
execute it. That is, select the facet/concept pair having the best supporting reasons, and try
to fill in that facet of that concept.
While a task is being executed, some new tasks might get proposed and merged into the
agenda. Also, some new concepts might get created, and this, too, would generate a flurry of
new tasks.
After AM stops filling in entries for the facet specified in the chosen task, it removes that
task from the agenda, and moves on to work on whichever task is the highest-rated at that
time.
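The control structure just described is short enough to sketch directly. The following is a minimal illustration, not AM's code: `run_agenda`, `toy_execute`, and the sample priorities are hypothetical, and for simplicity the chosen task is removed from the agenda before it is executed rather than after.

```python
def run_agenda(agenda, execute, max_tasks):
    """Repeatedly pluck the highest-rated task and execute it.

    agenda  : list of (priority, task) pairs, in any order
    execute : callable(task) -> list of newly proposed (priority, task) pairs
    """
    trace = []
    for _ in range(max_tasks):
        if not agenda:
            break
        # Keep the agenda ordered by its numeric rating, best first.
        agenda.sort(key=lambda entry: entry[0], reverse=True)
        priority, task = agenda.pop(0)
        trace.append(task)
        # Tasks proposed during execution are merged into the agenda.
        agenda.extend(execute(task))
    return trace

# Hypothetical executor: working on Sets proposes one follow-up task.
def toy_execute(task):
    if task == ("Fillin", "Examples", "Sets"):
        return [(350, ("Check", "Examples", "Sets"))]
    return []

agenda = [(300, ("Fillin", "Examples", "Sets")),
          (120, ("Check", "Domain/range", "BagInsert"))]
trace = run_agenda(agenda, toy_execute, max_tasks=3)
```

In this toy run the follow-up task proposed during the first cycle outranks the older task, so it is executed second. This is the sense in which the "best" task is re-decided at every cycle.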
The reader probably has a dozen good questions in mind at this point (e.g., How do the
reasons get rated?, How do the tasks get proposed?, What happens after a task is
selected?,...). The next section should answer most of these. Some more judgmental ones
(How dare you propose a numeric calculus of plausible reasoning?! If you slightly detune
all those numbers, does the system's performance fall apart?...) will be answered in Chapter
7.
3.3. The Agenda
Creative energy is used mainly to ask the right question.
—
Halmos
3.3.1. Why an Agenda?
This subsection provides motivation for the following one, by arguing that a joblist scheme
is a natural mechanism to use to manage the task-selection problem AM faces. If that seems
obvious to you, feel free to skip ahead to section 3.3.2, page 33.
Recall that AM must zero in on one of the best few tasks to perform next, and it repeatedly
makes this choice. At each moment, there might be thousands of directions to explore
(plausible tasks to consider).
If all the legal tasks were written out, and reasons were thought up to support each one,
then perhaps we could order them by the strength of those reasons, and thereby settle on
the "best" task to work on next. In order to appear "smart" to the human user, AM should
never execute a task having no reasons attached.
Some magical function will be assumed to exist, which provides a numeric rating, a priority
value, for any given task. The function looks at a given facet/concept pair, examines all the
associated reasons supporting that task, and computes an estimate of how worthwhile it
would be for AM to spend some time now working on that facet of that concept.
So AM will maintain a list of those legal tasks which have some good reasons tacked onto
them, which justify why each task should be executed, why it is plausible. At least
implicitly, AM has a numeric rating for each task. The obvious control algorithm is to
choose the task with the highest rating, and work on that one next.
Assuming the tasks on this list are kept ordered by this numeric rating, then AM can just
repeatedly pluck the highest task and execute it. While it's executing, some new tasks might
get proposed and added to the list of tasks. Reasons are kept tacked onto each task on this
list, and form the basis for the numeric priority rating.
Give or take a few features, this notion of a "joblist" is the one which AM uses. It is also
called an agenda.12 "A task on the agenda" is the same as "a job on the joblist" is the same
as "a facet/concept pair which has been proposed" is the same as "an active node in the
search space". Henceforth, I'll use the following all interchangeably: task, facet/concept pair,
node, job. This should break up the monotony.13
The flavor of agenda-list used here is similar to the control structure of HEARSAY-II
[Lesser/Fennell/Erman/Reddy 75]. Vast numbers of tasks are proposed and added to the
joblist. Occasionally, when some new data arrives, some task is repositioned.
3.3.2. Details of the Agenda scheme
At each moment, AM has many plausible tasks (hundreds or even thousands) which have
been suggested for some good reason or other, but haven't been carried out yet. Each task
is at the level of working on a certain facet of a certain concept: filling it in, checking it, etc.
Recall that each task also has tacked onto it a list of symbolic reasons explaining why the
task is worth doing.
In addition, a number (between 0 and 1000) is attached to each reason, representing some
absolute measure of the value of that reason (at the moment). One global formula14
combines all the reasons' values into a single priority value for the task as a whole. This
overall rating is taken to indicate how worthwhile it would be for AM to bother executing
that task, how interesting the task would probably turn out to be. The "intelligence" of
AM's selection of task is thus seen to depend on this one formula. Yet experiments show
that its precise form is not important. We conclude that the "intelligence" has been pushed
down into the careful assigning of reasons (and their values) for each proposed task.
12 Borrowed from Kaplan's term for the joblist present in KRL (see [Bobrow & Winograd 77]). For an earlier general discussion of agendas, see [Knuth 68].
13 ...and cover my sloppiness. Seriously, thanks to English, each of these terms will conjure up a slightly different image: a "job" is something to do, a "node" is an item in a search space, "facet/concept pair" reminds you of the format of a task.
14 Here is that formula: $\mathrm{Worth}(J) = \left\| \sqrt{\sum_j R_j^2} \right\| \times \left[\, 0.2 \times \mathrm{Worth}(A) + 0.3 \times \mathrm{Worth}(F) + 0.5 \times \mathrm{Worth}(C) \,\right]$, where $J$ = job to be judged = (Act $A$, Facet $F$, Concept $C$), and $\{R_j\}$ are the ratings of the reasons supporting $J$. For the sample job pictured in the box below, $A$ = Fillin, $F$ = Examples, $C$ = Sets, $\{R_j\}$ = {100, 100, 200}. The formula will be repeated and explained in Section 4.2, on page 40.
A typical entry on the agenda might look like this:
TASK: Fillin examples of Sets
PRIORITY: 300
REASONS:
100: No known examples for Sets so far.
100: Failed to fillin examples of Setunion, for lack of examples of Sets
200: Focus of attention: AM recently worked on the concept of Setunion
Notice the similarity of this to the initial few lines which AM types just after it selects a job
to work on.
The flow of control is simple: AM picks the task with the highest priority value, and tries to
execute it. As a side effect, new jobs occasionally get added to the agenda while the task is
being executed.
The global priority value of the task also indicates how much time and space this task
deserves. The sample task above might rate 20 cpu seconds, and 200 list cells. When either
of these resources is used up, AM terminates work on the task, and proceeds to pick a new
one. These two limits will be referred to in the sequel as "time/space quanta" which are
allocated to the chosen task. Whenever several techniques exist for satisfying some task, the
remaining time/space quanta are divided evenly among those alternatives; i.e., each method
is tried for a small time. This policy of parceling out time and space quanta is called
"activation energy" in [Hewitt 76] and called "resource-limited processes" in [Norman &
Bobrow 75]. In the case of filling in examples of sets, the space quantum (200 cells) will be
used up quickly (long before the 20 seconds expire).
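The quantum policy can be sketched as follows. How AM maps a priority value onto a specific number of seconds and list cells is not stated here, so the sketch starts from the quanta themselves. The function names and the two method names are hypothetical.

```python
def split_quanta(time_quantum, space_quantum, methods):
    """Divide a task's remaining time/space allowance evenly among the
    alternative methods available for satisfying it."""
    n = len(methods)
    return {m: (time_quantum / n, space_quantum / n) for m in methods}

def within_budget(time_used, space_used, time_quantum, space_quantum):
    """Work on a task continues only while BOTH resources remain."""
    return time_used < time_quantum and space_used < space_quantum

# The sample task above: 20 cpu seconds and 200 list cells.
shares = split_quanta(20.0, 200,
                      ["instantiate-definition", "specialize-known-examples"])

# Space exhausted long before time: the task stops anyway.
stopped_early = not within_budget(3.0, 200, 20.0, 200)
```

The second check mirrors the closing remark about examples of sets: running out of either resource ends the task, even when most of the other resource is still unused.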
There are two big questions now:
1. Exactly how is a task proposed and ranked?
How is a plausible new task first formulated?
How do the supporting reasons for the task get assigned?
How does each reason get assigned an absolute numeric rating?
Does a task's priority value change? When and how?
2. How does AM execute a task, once it's chosen?
Exactly what can be done during a task's execution?
The next chapter will deal with both of these questions. A detailed discussion of difficulties
and limitations of these ideas can be found in Section 7.2, on page 156.
Chapter 4.
Heuristic Rules

Assume that somehow AM has selected a particular task from the agenda, say "Fillin
Examples of Primes". What precisely does AM do, in order to execute the task? How are
examples of primes filled in?
The answer can be compactly stated as follows:
"AM selects relevant heuristics, and executes them."
This really just splits our original question into two new ones: (i) How are the relevant
heuristics selected, and (ii) What does it mean for heuristics to be executed (e.g., how does
executing a heuristic rule help to fill in examples of primes?).
These two topics (in reverse order) are the two major subjects of this chapter. Although
several examples of heuristics will be given, the complete list1 is relegated to Appendix 3.
The first section explains what heuristic rules look like (their "syntax", as it were). The next
three sections illustrate how they can be executed to achieve their desired results (their
"semantics").
Section 4.5 explains where the rules are stored and how they are accessed at the appropriate
times.
Finally, the initial body of heuristics is analyzed. The informal knowledge they contain is
categorized and described. Unintentionally, the distribution of heuristics among the
concepts is quite nonhomogeneous; this too is described in Section 4.6.
1 In Appendix 3 they are condensed and phrased in English. The reader wishing to see examples of the heuristics as they actually were coded in LISP should glance at Appendix 2.3.
4.1. Syntax of the Heuristics
Let's start by seeing what a heuristic rule looks like. In general (see [Davis & King 75] for
historical references to production rules), it will have the form
If (situational fluent)
Then (actions)
As an illustration, here is a heuristic rule, relevant when checking examples of anything:
If the current task is to Check Examples of any concept X,
and (Forsome Y) Y is a generalization of X,
and Y has at least 10 examples,
and all examples of Y are also examples of X,
Then print the following conjecture: X is really no more specialized than Y,
and add it to the Examples facet of the concept named "Conjectures",
and add the following task to the agenda: "Check examples of Y", for the reason: "Just
as Y was no more general than X, one of Generalizations(Y) may turn out to
be no more general than Y", with a rating for that reason computed as the
average of: ||Examples(Generalizations(Y))||, ||Examples(Y)||, and
Priority(Current task).
As with production rules, and formal grammatical rules, each of AM's heuristic rules has a
left-hand-side and a right-hand-side. On the left is a test to see whether the rule is
applicable, and on the right is a list of actions to take if the rule applies. The left-hand-side
will also be called the IF-part, the predicate, the preconditions, left side, or the situational
fluent of the rule. The right-hand-side will sometimes be referred to as the THEN-part, the
response, the right side, or the actions part of the rule.
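The two-sided shape of a rule can be sketched as a small data structure. This is an illustration only: `HeuristicRule`, the dictionary-valued "situation", and the two toy conjuncts are assumptions introduced here, and AM's actual rules are LISP code.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class HeuristicRule:
    conjuncts: List[Callable[[dict], bool]]  # IF-part: side-effect-free tests
    actions: List[Callable[[dict], None]]    # THEN-part: executed in order

    def triggers(self, situation):
        # all() stops at the first conjunct that returns False.
        return all(test(situation) for test in self.conjuncts)

    def run(self, situation):
        if not self.triggers(situation):
            return False
        for action in self.actions:
            action(situation)
        return True

log = []
rule = HeuristicRule(
    conjuncts=[lambda s: s["task"] == ("Check", "Examples"),
               lambda s: s["examples_of_generalization"] >= 10],
    actions=[lambda s: log.append("conjecture printed"),
             lambda s: log.append("task proposed")],
)
fired = rule.run({"task": ("Check", "Examples"),
                  "examples_of_generalization": 12})
skipped = rule.run({"task": ("Check", "Examples"),
                    "examples_of_generalization": 3})
```

Only the first call triggers the rule, so the log ends up holding the two actions exactly once, in the order they were listed.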
4.1.1. Syntax of the Left-hand Side
The situational fluent is a LISP predicate, a function which always returns True or False
(in LISP, it actually returns either the atom T or the atom NIL). This predicate may
investigate facets of any concept (often merely to see whether they are empty or not), use the
results of recent tests and behaviors (e.g., to see how much cpu time AM spent trying to
work on a certain task), etc.
The left side is a conjunction of the form $P_1 \wedge P_2 \wedge \ldots$ All the conjuncts, except the very
first one, are arbitrary LISP predicates. They are only constrained to obey two
commandments:
1. Be quick! (return either True or False in under 0.1 cpu seconds)
2. Have no side effects! (destroying or creating list structures or Lisp functions, resetting
variables)
Here are some sample conjuncts that might appear inside a left-hand side (but not as the
very first conjunct):
More than half of the current task's time quantum is already exhausted,...
There are some known examples of Structures,...
Some generalization of the current concept (the concept mentioned as part of the
current task) has an empty Examples facet,...
The space quantum of the current task is gone, but its time allocation is less than 10%
used up,...
A task recently selected had the form "Restructure facet F of concept X", where F is
any facet, and X is the current concept,...
The user has used this system at least once before,...
It's Tuesday,...
The very first conjunct of each left-hand side is special. Its syntax is highly constrained. It
specifies the domain of applicability of the rule, by naming a particular facet of a particular
concept to which this rule is relevant.
AM uses this first conjunct as a fast "pre-precondition", so that the only rules whose left-hand sides get evaluated are already known to be somewhat relevant to the task at hand. In
fact, AM physically attaches each rule to the facet and concept mentioned in its first
conjunct.2 This will be discussed in more detail in Section 4.5, "Gathering relevant
heuristics". This first conjunct will always be written out as follows, in this document
(where A, F, and C are specified explicitly):
The current task (the one just selected from the agenda) is of the form "Do action A
to the F facet of concept C"
This can be viewed as the "syntax" of the very first conjunct on each rule's lefthand side.
Here are two typical examples of allowable first conjuncts:
The current task (the one last selected from the agenda) is of the form "Check the
Domain/range facet of concept X", where X is any operation
The current task is of the form "Fillin the examples facet of the Primes concept"
These are the only guidelines which the left-hand side of a heuristic rule must satisfy. Any
LISP predicate which satisfies these constraints is a syntactically valid left-hand side for a
heuristic rule. It turned out later that this excessive freedom made it difficult for AM to
inspect and analyze and synthesize its own heuristics; such a need was not foreseen at the
time AM was designed.
Because of this freedom, there is not much more to say about the left-hand sides of rules.
As the reader encounters heuristics in the next few sections, he should notice the
(unfortunate) variety of conjuncts which may occur as part of their left-hand sides.
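The role of the first conjunct as a fast "pre-precondition" amounts to indexing rules by facet and concept. Here is a minimal sketch of that idea, assuming a plain dictionary index. The rule names are hypothetical, and the sketch ignores how rules attached to general concepts reach their specializations, which is the subject of Section 4.5.

```python
from collections import defaultdict

# (facet, concept) -> rules attached there. The attachment point plays the
# role of the rule's first conjunct.
attached = defaultdict(list)

def attach(facet, concept, rule_name):
    attached[(facet, concept)].append(rule_name)

def candidate_rules(facet, concept):
    """Rules whose first conjunct matches the current task; only these
    need their remaining conjuncts evaluated."""
    return list(attached[(facet, concept)])

attach("Examples", "Primes", "try-small-numbers")
attach("Domain/range", "Union", "check-range-is-a-concept")

relevant = candidate_rules("Examples", "Primes")
irrelevant = candidate_rules("Examples", "Union")
```

A lookup of this kind costs about the same no matter how many rules exist in total, which is the point of attaching each rule where it applies instead of scanning every left-hand side.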
2 Sometimes, I will mention where a certain rule is attached; in that case, I can omit explicit mention of the first conjunct. Conversely, if I include that conjunct, I needn't tell you where the rule is stored.
4.1.2. Syntax of the Right-hand Side
"Running" the left-hand-side means evaluating the series of conjoined little predicates there,
to see if they all return True. If so, we say that the rule "triggers". In that case, the
right-hand-side is "run", which means executing all the actions specified there. A single heuristic
rule may have a list of several actions as its right-hand-side. The actions are executed in
order, and we then say the rule has finished running.
Only the right-hand-side of a heuristic rule is permitted to have side effects. The right side
of a rule is a series of little LISP functions, each of which is called an action.
Semantically, each action performs some processing which is appropriate in some way to the
kinds of situations in which the left-hand-side would have triggered. The final value that
the action function returns is irrelevant.
Syntactically, there is only one constraint which each function or "action" must satisfy: Each
action has one of the following 3 side effects, and no other side effects:
1. It suggests a new task for the agenda.
2. It causes a new concept to be created.
3. It adds (or deletes) a certain entry to a particular facet of a particular concept.
To repeat: the right side of a rule contains a list of actions, each of which is one of the
above three types. A single rule might thus result in the creation of several new concepts,
the addition of many new tasks to the agenda, and the filling in of some facets of some
already-existing concepts.
These three kinds of actions will now be discussed in the following three sections.
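Before turning to each kind in detail, the three permitted side effects can be sketched as three small functions. Everything named below (`suggest_task`, `create_concept`, `add_entry`, the concept "Singletons", and the sample reason and rating) is a hypothetical stand-in chosen for illustration.

```python
agenda = []                                  # entries: (task, reason, rating)
concepts = {"Sets": {"Examples": []}}

def suggest_task(task, reason, rating):
    agenda.append((task, reason, rating))    # kind 1: new task for the agenda

def create_concept(name):
    concepts[name] = {}                      # kind 2: new concept

def add_entry(concept, facet, entry):
    # kind 3: add an entry to one facet of one concept
    concepts[concept].setdefault(facet, []).append(entry)

# A right-hand side is just an ordered list of such actions.
right_hand_side = [
    lambda: create_concept("Singletons"),
    lambda: add_entry("Sets", "Examples", frozenset({1})),
    lambda: suggest_task(("Fillin", "Examples", "Singletons"),
                         "Newly created concept has no examples", 100),
]
for action in right_hand_side:
    action()    # return values are ignored; only the side effects matter
```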
4.2. Heuristics Suggest New Tasks
This section discusses the "proposing a new task" kind of action.
Here is the basic idea in a nutshell: The left-hand-side of a rule triggers. Scattered among
the "things to do" in its right-hand-side are some suggestions for future tasks. These new
tasks are then simply added to the agenda list.
4.2.1. An Illustration: "Fill in Generalizations of Equality"
If a new task is suggested by a heuristic rule, then that rule must specify how to assemble
the new task, how to get reasons for it, and how to evaluate those reasons. For example,
here is a typical heuristic rule which proposes a new task to add to the agenda. It says to
generalize a predicate if it is very rarely3 satisfied:
If the current task was (Fillin examples of X),
and X is a predicate,
and more than 100 items are known in the domain of X,
and at least 10 cpu seconds were spent trying to randomly instantiate X,
and the ratio of successes/failures is both >0 and less than .05
Then add the following task to the agenda: (Fillin generalizations of X), for the following
reason:
"X is rarely satisfied; a slightly less restrictive concept might be more interesting".
This reason's rating is computed as three times the ratio of nonexamples/examples
found.
3 The most suspicious part of the situational fluent (the IF-part) is the number ".05". Where did it come from? Hint: if all humans had f fingers, this would probably be 0.05 in base f. Seriously, one can change this value (to .01 or to .25) with virtually no change in AM's behavior. This is the conclusion of experiment 3 (see Section 6.2.3). Such empirical justification is one important reason for actually writing and running large programs like AM.
Even this is one full step above the actual LISP implementation, where "X is a predicate"
would be coded as "(MEMBER X (EXAMPLES PREDICATE))". The function EXAMPLES(X)
rummages about looking for already-existing examples of X. Also, the LISP code contains
information for normalizing all the numbers produced, so that they will lie in the range 0-1000.
Let's examine an instance of where this rule was used. At some point, AM chose the task
"Fillin examples of Listequality". One of the ways it filled in examples of this predicate was
to run it on pairs of randomly-chosen lists, and observe whether the result was True or
False.4 Say that 244 random pairs of lists were tried, and only twice was this predicate
satisfied. Sometime later, the IF-part of the above heuristic is examined. All the conditions
are met, so it "triggers". For example, the "ratio of successes to failures" is just 2/242, which
is clearly greater than zero and less than 0.05. So the right-hand-side (THEN-part) of the
above rule is executed. The right-hand side initiates only one action: the task "Fillin
generalizations of Listequality" is added to the agenda, tagged with the reason "Listequality
is rarely satisfied; a slightly less restrictive concept might be more interesting", and that
reason is assigned a numeric rating of 3 x (242/2) = 363.
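The arithmetic of this example can be checked with a short sketch. The function below folds the numeric conjuncts of the rule and its reason-rating formula into one place. Its name and parameter names are assumptions, and the values passed for `items_known` and `cpu_seconds` are illustrative, since the text does not give them for this run.

```python
def rarely_satisfied_rating(successes, failures, items_known, cpu_seconds):
    """Return the reason's rating if the rule's IF-part holds, else None."""
    if items_known <= 100 or cpu_seconds < 10 or failures == 0:
        return None
    ratio = successes / failures
    if not (0 < ratio < 0.05):
        return None
    rating = 3 * (failures / successes)   # three times nonexamples/examples
    return min(1000, max(0, rating))      # keep the rating in the range 0-1000

# 244 random pairs tried, 2 of which satisfied the predicate.
rating = rarely_satisfied_rating(successes=2, failures=242,
                                 items_known=244, cpu_seconds=12)

# A predicate satisfied far more often does not trigger the rule.
no_rating = rarely_satisfied_rating(successes=100, failures=144,
                                    items_known=244, cpu_seconds=12)
```

With 2 successes and 242 failures the ratio 2/242 lies between 0 and 0.05, so the rule triggers and the rating comes out as 363, matching the figure above.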
Notice that the heuristic rule above supplied a little function to compute the value of the
reason. That formula was: "three times the ratio of nonexamples/examples found".5
Functions of this type, to compute the rating for a reason, satisfy the same constraints as the
left-hand-side did: the function must be very fast and it must have no side effects. The
"intelligence" that AM exhibits in selecting which task to work on ultimately depends on the
accuracy of these local rule evaluation formulae. Each one is so specialized that it is "easy"
for it to give a valid result; the range of situations it must judge is quite narrow. Note that
these little formulae were hand-written, individually, by the author. AM wasn't able to
create new little reason-rating formulae.
The reason-rating function is evaluated at the moment the job is suggested, and only the
numeric result is remembered, not the original function. In other words, we tack on a list of
reasons and associated numbers, for each job on the agenda. The agenda doesn't maintain
copies of the reason-rating functions which gave those numbers. This simplification is used
merely to save the system some space and time.
Let's turn now from the reason-rating formulae to the reasons themselves. Each reason
supporting a newly-suggested job is simply an English sentence (an opaque string, a token).
AM cannot do much intelligent processing on these reasons. AM is not allowed to inspect
parts of it, parse it, transform it, etc. The most AM can do is compare two such tokens for
equality. Of course, it is not too hard to imagine this capability extended to permit AM to
syntactically analyze such strings, or to trivially compute some sort of "difference" between
two given reasons.6 Each reason is assumed to have some semantic impact on the user, and
is kept around partly for that purpose.
4 The True ones became examples of Listequality, and the pairs of lists which didn't satisfy this predicate became known as non-examples (failures, foibles,...). A heuristic similar to this "random instantiation" one is illustrated in Section 4.4, on page 18.
5 In actuality, this would be checked to ensure that the result lies between 0 and 1000.
Each reason will have a numeric rating (a number between 0 and 1000) assigned to it
locally, by the heuristic rule which proposed the task for that reason. One global formula
will then combine all the reasons' ratings into one single priority value for the task.
4.2.2. The Ratings Game
In general, a task on the agenda list will have several reasons in support of it. Each reason
consists of an English phrase and a numeric rating. How can a task have more than one
reason? There are two contributing factors: (i) A single heuristic rule can have several
reasons in support of a job it suggests, and (ii) When a rule suggests a "new" task, that very
same task may already exist on the agenda, with quite distinct reasons tacked on there. In
that case, the new reason(s) are added to the already-known ones.
One global formula looks at all the ratings for the reasons, and combines them into a single
priority value for the task as a whole. Below is that formula, in all its gory detail:
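$$\mathrm{Worth}(J) = \left\| \sqrt{\sum_j R_j^2} \right\| \times \left[\, 0.2 \times \mathrm{Worth}(A) + 0.3 \times \mathrm{Worth}(F) + 0.5 \times \mathrm{Worth}(C) \,\right]$$
where $J$ = job to be judged = (Act $A$, Facet $F$, Concept $C$)
and $\{R_j\}$ are the ratings of the reasons supporting $J$.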
For example, consider the job J = (Check examples of Primes). The act A would be
"Check", which has a numeric worth of 100. The facet F would be "Examples", which has
a numeric worth of 700. The concept C would be "Primes", which at the moment might
have Worth of 800. Say there were four reasons, having values 200, 300, 200, and 500.
The double lines "||...||" indicate normalization, which means that the final value of the
square-root must be between 0 and 1, which is done by dividing the result of the square-root by 1000 and then truncating to 1.0 if the result exceeds unity.
In this case, we first compute $\sqrt{200^2 + 300^2 + 200^2 + 500^2} = \sqrt{420{,}000}$, which is
about 648. After normalization, this becomes 0.648. The expression in square brackets in
the formula7 is actually computed as the dot-product of two vectors8; in this case it is the
dot-product of (100 700 800) and (.2 .3 .5), which yields 630. This is multiplied by the
normalized square-root value, 0.648, and we end up with a final priority rating of 408.
The four reasons each have a fairly low priority, and the total priority of the task is
therefore not great. It is, however, higher than any single reason multiplied by 0.648. This
is because there are many distinct reasons supporting it. The global formula uniting these
reasons' values does not simply take the largest of them (ignoring the rest), nor does it
simply add them up.
6 It is in fact trivial to IMAGINE it. Of course DOING it is quite a bit less trivial. In fact, it probably is the toughest of all the "open research problems" I'll propose.
7 Namely, [ 0.2 x Worth(A) + 0.3 x Worth(F) + 0.5 x Worth(C) ].
8 Namely, (Worth(A) Worth(F) Worth(C)) and (.2 .3 .5). The dot-product of (a1 a2 a3) and (b1 b2 b3) is defined as (a1 x b1) + (a2 x b2) + (a3 x b3).
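The worked example for J = (Check examples of Primes) can be reproduced in a few lines. This is a sketch of the formula as stated in the text, with a hypothetical function name and parameter names. It is not AM's LISP code.

```python
import math

def worth(reason_ratings, worth_act, worth_facet, worth_concept):
    """Priority of a job J = (Act A, Facet F, Concept C)."""
    root = math.sqrt(sum(r * r for r in reason_ratings))
    normalized = min(1.0, root / 1000)       # the ||...|| normalization step
    weighted = 0.2 * worth_act + 0.3 * worth_facet + 0.5 * worth_concept
    return normalized * weighted

# Values used in the text: Check = 100, Examples = 700, Primes = 800,
# and four reasons rated 200, 300, 200 and 500.
priority = worth([200, 300, 200, 500],
                 worth_act=100, worth_facet=700, worth_concept=800)

# A job with no supporting reasons gets a zero rating.
unsupported = worth([], worth_act=100, worth_facet=700, worth_concept=800)
```

Carried through without intermediate rounding, the result is a little above 408, in agreement with the figure obtained by hand.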
The above formula was intended originally as a first pass, an ad hoc guess, which I expected
I'd have to modify later. Since it has worked successfully, I have not messed with it. There
is no reason behind it, no justification for taking dot-products of vectors, etc. I concluded,
and recent experiments tend to confirm, that the particular form of the formula is
unimportant; only some general characteristics need be present:
1. The priority value of a task is a monotone increasing function of each of its reasons'
ratings. If a new supporting reason is found, the task's value is increased. The
better that new reason, the bigger the increase.
2. If an already-known supporting reason is re-proposed, the value of the task is not
increased (at least, it's not increased very much). Like humans, AM is fooled
whenever the same reason reappears in disguised form.
3. The priority of a task involving concept C should be a monotone increasing
function of the overall worth of C. Two similar tasks dealing with two different
concepts, each supported by the same list of reasons and reason ratings, should be
ordered by the worth of those two concepts.
I believe that all of these criteria are absolutely essential to good behavior of the system.
Several of the experiments discussed later bear on this question (See Section 6.2, page
125). Note that the messy formula given on the last page does incorporate all 3 of these
constraints. In addition, there are a few features of that formula which, while probably not
necessary or even desirable, the reader should be informed of explicitly:
1. The task's value does not depend on the order in which the reasons were discovered.
This is not true psychologically of people, but it is a feature of the particular
priority-estimating formula initially selected.
2. Two reasons are either considered identical or unrelated. No attempt is made to
reduce the priority value because several of the reasons are overlapping
semantically or even just syntactically. This, too, is no doubt a mistake.
3. There is no need to keep around all the individual reasons' rating numbers. The
addition of a new reason will demand only the knowledge of the number of other
reasons, and the old priority value of the task.
4. A task with no reasons gets an absolute zero rating. As new reasons are added, the
priority slowly increases toward an absolute maximum which is dependent upon
the overall worth of the concept and facet involved.
There is one topic of passing interest which should be covered here. Each possible Act A
(e.g., Fillin, Check, Apply) and each possible facet F (e.g., Examples, Definition, Name(s)) is
assigned a fixed numeric value (by hand, by the author). These values are used inside the
formula on the last page, where it says 'Worth(A)' and 'Worth(F)'. They are fairly resistant
to change, but certain orderings should be maintained for best results. E.g., "Examples"
should be rated higher than "Specializations", or else AM may whirl away on a cycle of
specialization long after the concept has been constrained into vacuousness. As for the Acts,
their precise values turned out to be even less important than the Facets'.
Now that we've seen how to compute this priority value for any given task, let's not forget
what it's used for. The overall rating has two functions:
(i) The tasks on the agenda list are ordered by their ratings, and AM always chooses
the top task. Thus this rating determines which task to execute next. This is not an
ironclad policy: In reality, AM prints out the top few tasks, and the user has the
option of interrupting and directing AM to work on one of those other tasks
instead of the very top one.
(ii) Once a task is chosen, its overall rating determines how much time and space AM
will expend on it before quitting and moving on to a new task. The precise
formulae are unimportant. Roughly, the 0-1000 rating is divided by ten to
determine how much time to allow, in cpu seconds. The rating is divided by two to
determine how much space to allow, in list cells.
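The two uses of the rating can be sketched as follows. This is a rough Python illustration, not the LISP implementation: the `Task` record, the function names, and the sample agenda entries and ratings are all hypothetical. Only the ordering policy and the divide-by-ten and divide-by-two rules come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    description: str
    rating: float                      # overall priority on the 0-1000 scale
    reasons: list = field(default_factory=list)


def choose_task(agenda, user_choice=None, shown=3):
    """Return the task to run next: the top-rated one unless the user overrides."""
    ranked = sorted(agenda, key=lambda t: t.rating, reverse=True)
    top_few = ranked[:shown]           # AM prints the top few tasks
    if user_choice is not None and 0 <= user_choice < len(top_few):
        return top_few[user_choice]    # the user redirected AM
    return ranked[0]


def allotment(task):
    """Rough budgets from the text: rating/10 cpu seconds, rating/2 list cells."""
    return {"cpu_seconds": task.rating / 10, "list_cells": task.rating / 2}


# A hypothetical agenda.
agenda = [
    Task("Fillin examples of Primes", 408),
    Task("Check examples of Setunion", 250),
    Task("Fillin specializations of Bags", 120),
]
chosen = choose_task(agenda)
budget = allotment(chosen)
```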
4.3. Heuristics Create New Concepts
Recall that a heuristic rule's actions are of three types:
1. Suggest new tasks and add them to the agenda.
2. Create a new concept.
3. Fill in some entries for a facet of a concept.
This subsection discusses the second activity.
Here is the basic idea in a nutshell: Scattered among the "things to do" in the right-hand side of a rule are some requests to create specific new concepts. For each such request, the
heuristic rule must specify how to construct it. At least, the rule must specify ways of
assembling enough facets of the new concept to disambiguate it from all the other known
concepts. Typically, the rule will explain how to fill in the Definition of, or an Algorithm
for, the new concept. After executing these instructions, the new concept will "exist", and
a few of its facets will be filled in, and a few new jobs will probably exist on the agenda,
indicating that AM might want to fill in certain other facets of this new concept in the near
future.


4.3.1. An Illustration: Discovering Primes
Here is a heuristic rule that results in a new concept being created:
If the current task was (Fillin examples of F),
and F is an operation from domain space A into range space B,
and more than 100 items are known examples of A (in the domain of F),
and more than 10 range items (in B) were found by applying F to these domain items,
9
and at least 1 of these range items is a distinguished member (esp: extremum) of B,
Then (for each such distinguished member V Number
GENERALIZATIONS: TIMES
WORTH: 600
The name, definition, domain/range, generalizations, and worth are specified explicitly by
the heuristic rule.
The lambda expression stored under the definition facet is an executable LISP predicate,
which accepts two arguments and then tests them to see whether the second one is equal to
TIMESItself of the first argument. It performs this test by calling upon the predicate
stored under the definition facet of the TIMES concept. Thus TIMESItself.Defn(4,16) will
call on TIMES.Defn(4,4,16), and return whatever value that predicate returns (in this case,
it returns True, since 4x4 does equal 16).
A trivial transformation of this definition provides an algorithm for computing this
operation. The algorithm says to call on the Algorithms facet of the concept TIMES. Thus
TIMESItself.Alg(4) is computed by calling on TIMES.Alg(4,4) and returning that value
(namely, 16).
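A minimal Python sketch of how the new concept's Definition and Algorithm delegate to those of TIMES. The function names and the dictionary layout of facets are assumptions made for illustration; AM itself stores LISP lambda expressions on facets.

```python
def times_defn(x, y, z):
    """Stand-in for TIMES.Defn: is z the product of x and y?"""
    return x * y == z


def times_alg(x, y):
    """Stand-in for TIMES.Alg: compute the product of x and y."""
    return x * y


def times_itself_defn(x, y):
    # The new predicate simply calls the TIMES predicate with x repeated.
    return times_defn(x, x, y)


def times_itself_alg(x):
    # The "trivial transformation": call the TIMES algorithm with x repeated.
    return times_alg(x, x)


# The new concept as a record of facets; this layout is illustrative only.
TIMES_WORTH = 600
times_itself = {
    "NAME": "TIMESItself",
    "DEFINITION": times_itself_defn,
    "ALGORITHM": times_itself_alg,
    "GENERALIZATIONS": ["TIMES"],
    "WORTH": TIMES_WORTH,   # copied from TIMES at creation time
}
```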
The worth of TIMES was 600 at the moment TIMESItself was created, and this becomes
the worth of TIMESItself.
TIMESItself is by definition a specialization of TIMES, so the SPECIALIZATIONS facet
of TIMES is incremented to point to this new concept. Likewise, the GENERALIZATIONS
facet of TIMESItself points to TIMES.
Note how easy it was to fill in these facets now, but how difficult it might be later on "out of
context". By way of contrast, the task of, e.g., filling in Specializations of TIMESItself will
be no harder later on than it is right now, so we may as well defer it until there's a good
reason for it. This task will probably be added to the agenda with so low a priority that
AM will never get around to it, unless some new reasons for it emerge.
The task "Fillin examples of TIMESItself" is probably worthwhile doing soon, but again it
won't be any harder to do at a later time than it is right now. So it is not done at the
moment; rather, it is added to the agenda (with a fairly high priority).
Incidentally, the reader may be interested to know that the next few tasks AM selected (in
reality) were to create the inverse of this operation (i.e., integer square-root), and then to
create a new kind of number, the ones which can be produced by squaring (i.e., perfect
squares). Perfect squares were deemed worth having around because Integersquareroot is
defined precisely on that set of integers.
4.4. Heuristics Fill in Entries for a Specific Facet
The last two subsections dealt with how a heuristic rule is able to propose new tasks and
create new concepts. This section will illustrate how a rule can find some entries for a given
facet of a specific concept.
Typically, the facet/concept involved will be the one mentioned in the current task which
was chosen from the agenda. If the task "Fillin Examples of Setunion" were plucked from the
agenda, then the "relevant" heuristics would be those useful for filling in entries for the
Examples facet of the Setunion concept.
There is an important class of exceptions to this, however: conjectures. Some rules will
specify plausible relationships to look for; if found, they constitute a new conjecture. For
example, the reader will see in Section 4.4.4, on page 52, that the unique factorization
theorem is proposed merely as an observation of the form "The range of operation F is not
just B but rather the more specialized concept BB". The particular case of the unique
factorization theorem leads to this statement: "The range of Primefactorings [15] is not just
'Sets' but rather 'Singletons'." In fact, this whole conjecture is recorded by merely replacing
<Numbers -> Sets> by <Numbers -> Singletons> as an entry on the Domain/range facet of the
concept Primefactorings.
The reader may be surprised to learn that the only kind of conjecture AM can make is of
that form (add a new entry to some facet of some concept) [16]. Apparently, this is sufficient to
plausibly notice and state most interesting conjectures. Good definitions make the statements
of theorems short and simple [17].
[15] Primefactorings(x), also called Primetimes(x), is the set of all bags-of-primes whose product is x; i.e., all ways of
factoring x into primes.
[16] That's why "conjecturing" is classified under the "add-an-entry" type of heuristic rule action.
[17] Exercise for the doubting reader: State the unique factorization theorem in purely set-theoretic terms. Seriously, one
important way that definitions are invented is to see what bulky construct in a theorem can be collapsed into
a single term. Typically one hopes that the term will be used elsewhere, of course.
We'll take these two kinds of "filling in entries" one at a time: first the standard "find an
entry for the facet of the concept mentioned in the current task", followed by the interesting
but rarer activity of "looking for a conjecture".
4.4.1. An Illustration: "Fill in Examples of Setunion"
Recall that a task is typically of the form "Fill in facet F of concept C". How can executing
relevant heuristic rules satisfy such a task? This subsection illustrates how a heuristic rule
might be executed to find some entries for the facet designated by the current task.
A typical heuristic, attached to the concept Activity [18], says:
If the current task is to fill in examples of the activity F,
One way to get them is to run F on randomly chosen examples of the domain of F.
Of course, in the LISP implementation, this situation-action rule is not coded quite so
neatly. It would be more faithfully translated as follows:
If CURRENTTASK matches (FILLIN EXAMPLES F<-anything),
and F isa Activity,
Then carry out the following procedure:
1. Find the domain of F, and call it D;
2. Find examples of D, and call them E;
3. Find an algorithm to compute F, and call it A;
4. Repeatedly:
4a. Choose any member of E, and call it E1.
4b. Run A on E1, and call the result X.
4c. Check whether <E1,X> satisfies the definition of F.
4d. If so, then add <E1 -> X> to the Examples facet of F.
4e. If not, then add <E1 -> X> to the Nonexamples facet of F.
Let's take a particular instance where this rule would be useful. Say the current task is "Fillin
examples of Setunion". The left-hand side of the rule is satisfied, so the right-hand side is
run.
Step (1) says to locate the domain of Setunion. The facet labelled Domain/Range, on the
Setunion concept, contains the entry (SET SET -> SET), which indicates that the domain is
a pair of sets. That is, Setunion is an operation which accepts (as its arguments) a pair of
sets, and returns (as its value) some new set.
Since the domain elements are sets, step (2) says to locate examples of sets. The facet
labelled Examples, on the Sets concept, points to a list of about 30 different sets. This
includes {Z}, {A,B,C,D,E}, {}, {A,{{B}}},...
Step (3) involves nothing more than accessing some randomly-chosen entry on the
Algorithms facet of Setunion. One such entry is a recursive LISP function of two
arguments, which halts when the first argument is the empty set, and otherwise pulls an
element out of that set and SETINSERT's it into the second argument, and then recurses
on the new values of the two sets. For convenience, we'll refer to this algorithm as UNION.
[18] "Activity" is a general concept which includes operations, predicates, relations, functions, etc.
We then enter the loop of Step (4). Step (4a) has us choose one pair of our examples of
sets, say the first two {Z} and {A,B,C,D,E}. Step (4b) has us run UNION on these two sets.
The result is {A,B,C,D,E,Z}. Step (4c) has us grab an entry from the Definitions facet of
Setunion, and run it. A typical definition is this formal one:
(λ (S1 S2 S3)
   (AND
      (For all x in S1, x is in S3)
      (For all x in S2, x is in S3)
      (For all x in S3, x is in S1 or x is in S2)
   )
)
It is run on the three arguments S1={Z}, S2={A,B,C,D,E}, S3={A,B,C,D,E,Z}. Since it returns
"True", we proceed to Step (4d). The construct <{Z}, {A,B,C,D,E} -> {A,B,C,D,E,Z}> is added
to the Examples facet of Setunion.
At this stage, control returns to the beginning of the Step (4) loop. A new pair of sets is
chosen, and so on.
But when would this loop stop? Recall that each task has a time and a space allotment
(based on its priority value). If there are many different rules all claiming to be relevant to
the current task, then each one is allocated a small fraction of those time/space quanta.
When either of these resources is exhausted, AM would break away at a "clean" point (just
after finishing a cycle of the Step (4) loop) and would move on to a new heuristic rule for
filling in examples of Setunion.
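The whole walk-through can be sketched in Python. This is a hedged illustration rather than AM's LISP code: the function names are invented, frozensets stand in for AM's set representation, and a fixed number of loop cycles stands in for the time/space quanta, whose exact accounting the text does not give.

```python
import random


def union_alg(s1, s2):
    """Recursive union: move elements of s1 into s2 until s1 is empty."""
    if not s1:
        return s2
    element = next(iter(s1))
    return union_alg(s1 - {element}, s2 | {element})


def union_defn(s1, s2, s3):
    """The formal definition: s3 holds exactly the members of s1 and s2."""
    return (all(x in s3 for x in s1)
            and all(x in s3 for x in s2)
            and all(x in s1 or x in s2 for x in s3))


def fill_in_examples(algorithm, definition, domain_examples, cycles, rng):
    """Steps 4a-4e, stopping at a clean point after a fixed number of cycles."""
    examples, nonexamples = [], []
    for _ in range(cycles):            # assumed stand-in for the time/space quanta
        s1 = rng.choice(domain_examples)          # 4a: choose a pair of sets
        s2 = rng.choice(domain_examples)
        result = algorithm(s1, s2)                # 4b: run the algorithm
        entry = (s1, s2, result)
        if definition(s1, s2, result):            # 4c: check the definition
            examples.append(entry)                # 4d
        else:
            nonexamples.append(entry)             # 4e
    return examples, nonexamples


# A few of the known examples of Sets mentioned in the text.
sets = [frozenset("Z"), frozenset("ABCDE"), frozenset()]
found, rejected = fill_in_examples(union_alg, union_defn, sets, 20, random.Random(0))
```

Because the algorithm and the definition agree here, every cycle yields an example and the Nonexamples list stays empty; a buggy entry on the Algorithms facet would show up as entries in `rejected`.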
This concludes the demonstration that a heuristic rule really can be executed to produce the
kinds of entities requested by the current task.
4.4.2. Heuristics Propose New Conjectures
We saw in the sample excerpt (Chapter 2) that AM occasionally notices some unexpected
relationship, and formulates it into a precise conjecture. Below is an example of how this is
done. As you might guess from the placement of this subsection [19], the mechanism is our
old friend the heuristic rule which fills in entries for certain facets.
In fact, a conjecture evolves through four stages:
1. A heuristic rule looks for a particular kind of relationship. This will typically be of
the form "X is a Generalization of Y", or "X is an example of Y", or "X is the
same as Y", or "F1.Defn(X,Y)" where F1 is an active concept AM knows about, or
"F1.Defn(Y,X)" [20].
[19] or recall from the opening remarks of Section 4.4
2. Once found, the relationship is checked, using supporting contacts. A great deal of
empirical evidence must favor it, and any contradictory evidence must be
"explained away" somehow.
3. Now it is believed, and AM prints it out to the user. It is added as a new entry to
the Conjecs facet of both concepts X and Y. It is also added as an entry to the
Examples facet of the Conjecture concept.
4. Eventually, AM will get around to the task "Check Examples of Conjecture", or to the
task "Check Conjecs of X". If AM had any concepts for proving conjectures, they
would then be invoked. In the current LISP implementation, these are absent.
Nevertheless, several "checks" are performed: (i) see if any new empirical evidence
(pro or con) has appeared recently; (ii) see if this conjecture can be strengthened;
(iii) check it for extreme cases, and modify it if necessary; (iv) modify the worth
ratings of the concepts involved in the conjecture.
The left-hand side of such a heuristic rule will be longer and more complex than most other
kinds, but the basic activities of the right-hand side will still be filling in an entry for a
particular facet.
The entries filled in will include: (i) a new example of Conjectures, (ii) a new entry for the
Conjecs facet of each concept involved in the conjecture, (iii) if we're claiming that concept
X is a generalization of concept Y, then "X" would be added to the Generalizations facet of
Y, and "Y" added to the Specializations facet of X, (iv) if X is an Example of Y, "X" is
added to the Examples facet of Y, and "Y" is added to the ISA facet of X.
The right-hand side may also involve adding new tasks to the agenda, creating new
concepts, and modifying entries of particular facets of particular concepts. As is true of all
heuristic rules, both sides of this type of conjecture-perceiving rule may run any little
functions they want to: any functions which are quick and have no side effects (e.g.,
FOR ALL tests, PRINT functions, accesses to a specified facet of some concept).
[20] These last two say that F1(X)=Y and that F1(Y)=X, respectively.
4.4.3. An Illustration: "All primes except 2 are odd"
As an illustration, here is a heuristic rule, relevant when checking examples of any concept:
If the current task is to Check Examples of X,
and (Forsome Y) Y is a generalization of X,
and Y has at least 10 examples,
and all examples of Y (ignoring boundary cases) are also examples of X,
Then print the following conjecture: X is really no more specialized than Y,
and add it to the Examples facet of Conjectures,
and if the user asks, inform him that the evidence for this was that all Examples(Y) Y's
(ignoring boundary examples of Y's) turned out to be X's as well,
and Check the truth of this conjecture on boundary examples of Y,
and add "X" to the Generalizations facet of Y,
and add "Y" to the Specializations facet of X,
and (if there is an entry in the Generalizations facet of Y) add the following task to the
agenda "Check examples of Y", for the reason: "Just as Y was no more
general than X, one of Generalizations(Y) may turn out to be no more
general than Y", with a rating for that reason computed as:
0.4 x Examples(Generalizations(Y)) +
0.3 x Examples(Y) +
0.3 x Priority(Current task).
Let's take a particular instance where this rule would be useful. Say the current task is
"Check examples of Oddprimes". The left-hand side of the rule is run, and is satisfied when
the generalization Y is the concept "Primes". Let's see why this is satisfied.
One of the entries of the Generalizations facet of Oddprimes is "Primes". AM grabs hold of
the 30 examples of primes (located on the Examples facet of Primes), and removes the ones
which are tagged as boundary examples ("2" and "3"). A definition of Oddprime numbers
is obtained (Definitions facet of Oddprimes), and it is run on each remaining example of
primes (5, 7, 11, 13, 17, ...). Sure enough, they all satisfy the definition. So all primes
(ignoring boundary cases) appear to be odd. The left-hand side of the rule is satisfied.
At this point, the user sees a message of the form "Oddprimes is really no more specialized
than Primes". If he interrupts and asks about it, he is told that the evidence for this was
that all 30 primes (ignoring boundary examples of primes) turned out to be Oddprimes.
Of the boundary examples (the numbers 2 and 3), only the integer "2" fails to be an odd-prime, so the user is notified of the finalized conjecture: "All primes (other than '2') are
also oddprimes". This is added as an entry on the Examples facet of the concept named
'Conjectures.'
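The empirical test at the heart of this rule can be sketched in Python. The sketch is an assumption-laden illustration: the function names are invented, the primality test stands in for the Definitions facets, and the rating formula reads "Examples(...)" as a count of examples, which the text does not state explicitly.

```python
def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def is_odd_prime(n):
    return is_prime(n) and n % 2 == 1


def no_more_specialized(defn_x, examples_y, boundary_y, minimum=10):
    """Return (conjecture_believed, exceptions) for "X is no more specialized than Y"."""
    typical = [e for e in examples_y if e not in boundary_y]
    if len(examples_y) < minimum or not all(defn_x(e) for e in typical):
        return False, []
    # The conjecture is believed; now check it on the boundary examples of Y.
    exceptions = [e for e in boundary_y if not defn_x(e)]
    return True, exceptions


def reason_rating(n_examples_of_gen, n_examples_of_y, current_priority):
    """Rating for the newly proposed task: a weighted sum of three terms."""
    return 0.4 * n_examples_of_gen + 0.3 * n_examples_of_y + 0.3 * current_priority


primes = [n for n in range(2, 200) if is_prime(n)][:30]   # 30 known examples
holds, exceptions = no_more_specialized(is_odd_prime, primes, boundary_y=[2, 3])
```

Here `holds` is true and the only boundary exception is 2, which matches the finalized conjecture quoted above.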
Before beginning all this, the Generalizations facet of Oddprimes pointed to Primes. Now,
this rule has us add "Primes" as an entry on the Specializations facet of Oddprimes. Thus
Primes is both a generalization and a specialization of Oddprimes (to within a single stray
exception), and AM will be able to treat these two concepts as if they were merged together.
They are still kept separate, however, in case AM ever needs to know precisely what the
difference between them is, or in case later evidence shows the conjecture to be false [21].
[21] When space is exhausted, one emergency measure AM takes is to destructively coalesce a pair of concepts X,Y where X
is both a generalization of and a specialization of Y, even if there are a couple "boundary" exceptions.
The final action of the right-hand side of this rule is to propose a new task (if there exist
some generalizations of the concept Y, which in our case is "Primes"). So AM accesses the
Generalizations facet of Primes, which is "(Numbers)". A new task is therefore added to the
agenda: "Check examples of Primes", with an associated reason: "Just as Primes was no more
general than Oddprimes, so Numbers may turn out to be no more general than Primes".
The reason is rated according to the formula given in the rule; say it gets the value 500.
To make this example a little more interesting, let's suppose that the task "Check examples of
Primes" already existed on the agenda, but for the reason "Many examples of Primes have
been found, but never checked", with a rating for the reason of 100, and for the task as a
whole of 200. The global task-rating formula then assigns the task a new overall priority of
600, because of the new, fairly good reason supporting it.
When that task is eventually chosen, the heuristic rule pictured above (at the beginning of
this subsection) will trigger and will be run again, with X=Primes and Y=Numbers. That is,
AM will be considering whether (almost) all numbers are primes. The left-hand side of the
heuristic rule will quickly fail, since, e.g., "6" is an example of Numbers which does not
satisfy the definition of Primes.
4.4.4. Another illustration: Discovering Unique Factorization
Below is a heuristic rule which is a key agent in the process of "noticing" the fundamental
theorem of arithmetic [22]. (The reader may skip this subsection; it contains more details
about how AM actually proposed conjectures.)
If F(a) is unexpectedly a B,
Then maybe (∀x) F(x) is a B.
Below, the same rule is given in more detail. The first conjunct on the IF-part of the
heuristic rule indicates that it's relevant to checking examples of any given operation F. A
typical example is selected at random, say F(x)=y. Then y is examined, to see if it satisfies
any more stringent properties than those specified in the Domain/range facet of F. That is,
the Domain/range facet of F contains an entry of the form A->B; so if x is an A, then all we
are guaranteed about y is that it is an example of a B. But now, this heuristic is asking if y
isn't really an example of a much more specialized concept than B. If it is (say it's an
example of a BB), then the rest of the examples of F are examined to see if they too satisfy
this same property. If all examples appear to map from domain set A into range set BB
(where BB is much more restricted than the set B specified originally in the Domain/range
facet of F), then a new conjecture is made: the domain/range of F is really A->BB, not A->B.
Here is that rule, in crisper notation:
[22] The unique factorization conjecture: any positive integer n can be represented as the product of prime numbers in
precisely one way (to within reorderings of those prime factors). Thus 28 = 2x2x7, and we don't
distinguish between the factorization (2 2 7) and (2 7 2).
If the current task is to Check Examples of the operation F,
and F is an operation from domain A into range B,
and F has at least 10 examples,
and a typical one of these examples is "<x -> y>" [...]
and add "<A -> BB>" as a new entry to the Domain/range facet of F, replacing "<A -> B>",
and if the user asks, inform him that the evidence for this was that all Examples(F)
examples of F (ignoring boundary examples) turned out to be BB's,
and check the truth of this conjecture by running F on boundary examples of A.
Let's see how this rule was used in one instance. In Task 79 in the sample excerpt in
Chapter 2, AM defined the concept Primetimes, which was a function transforming any
number n into the set of all factorizations of n into primes. For example, Primetimes(12)={(2 2 3)}, Primetimes(13)={(13)}. The domain of F=Primetimes was the concept
Numbers. The range was Sets. More precisely, the range was SetsofBagsofNumbers, but
AM didn't know that concept at that time.
The above heuristic rule was applicable. F was Primetimes, A was Numbers, and B was
Sets. There were far more than 10 known examples of Primetimes in action. A typical
example was this one: <21 -> {(3,7)}>. The rule now asked that {(3,7)} be fed to each
specialization of Sets, to see if it satisfied any of their definitions. The Specializations facet
of Sets was accessed, and each concept pointed to was run (its definition was run) on the
argument "{(3,7)}". It turned out that Singleton and Setofdoubletons were the only two
specializations of Sets satisfied by this example. At this moment, AM had narrowed down
the potential conjectures to these two:
1. Primetimes(x) is always a singleton set.
2. Primetimes(x) is always a set of doubletons.
Each example of Primetimes was examined, until one was found to refute each conjecture
(for example, <8 -> {(2,2,2)}> destroys conjecture 2). But no example was able to disprove
conjecture 1. So the heuristic rule plunged forward, and printed out to the user "A new
conjecture: Primetimes(n) is always a singletonset, not simply a set". The entry
<Numbers -> Singletonsets> was added to the Domain/range facet of Primetimes, replacing
the old entry <Numbers -> Sets>.
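The narrowing step can be sketched in Python. This is an illustration under assumptions: `primetimes` enumerates factorizations by brute force, bags are represented as sorted tuples (which builds in "to within reordering"), and the two predicates are hypothetical stand-ins for the Definitions facets of Singleton and Setofdoubletons.

```python
def primetimes(n):
    """Return the set of all bags of primes whose product is n (bags as sorted tuples)."""
    def factorings(m, smallest):
        if m == 1:
            return {()}
        found = set()
        for p in range(smallest, m + 1):
            # p must divide m and be prime; factors are taken in nondecreasing order.
            if m % p == 0 and all(p % d for d in range(2, int(p ** 0.5) + 1)):
                found |= {(p,) + rest for rest in factorings(m // p, p)}
        return found
    return frozenset(factorings(n, 2))


# Hypothetical definitions for two specializations of Sets.
SPECIALIZATIONS = {
    "Singleton": lambda s: len(s) == 1,
    "Setofdoubletons": lambda s: all(len(member) == 2 for member in s),
}


def narrow_range(operation, domain_examples, typical):
    """Keep the specializations satisfied by a typical value and by every example."""
    value = operation(typical)
    candidates = [name for name, test in SPECIALIZATIONS.items() if test(value)]
    return [name for name in candidates
            if all(SPECIALIZATIONS[name](operation(x)) for x in domain_examples)]


surviving = narrow_range(primetimes, range(2, 60), typical=21)
```

Both candidates pass on the typical example 21, but only "Singleton" survives the scan over the remaining examples.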
Let's digress for a moment to discuss the robustness of the system. What if this heuristic
were to be excised? Could AM still propose unique factorization? The answer is yes, there
are other ways to notice it. If AM has the concept of a Function [23], then a heuristic rule like
the one in the previous subsection (page 50) will cause AM to ask if Primetimes is not
merely a relation, but also a Function.
[23] A single-valued relation. That is, for any domain element x, F(x) contains precisely
one member. It is never empty (i.e., undefined), nor is it ever larger than a singleton (i.e., multiple-valued).
The past few sections should have convinced the reader that isolated heuristic rules really
can do all kinds of things: propose new tasks, create new concepts, fill in entries for specific
facets (goal-driven), and look for conjectures (data-driven empirical induction). The rules
appear fairly general [24], though that must be later verified empirically. They are
redundant in a pleasing way: some of the most "important", well-known, and interesting
conjectures can (apparently) be derived in many ways. Again, we'll have to check this
experimentally.
4.5. Gathering Relevant Heuristics
Each concept has facets which contain some heuristics. Some of these are for filling in,
some for checking, some for deciding interestingness [25], some for noticing new conjectures,
etc.
AM contains hundreds of these heuristics. In order to save time (and to make AM appear
more rational), each heuristic should only be tried in situations where it might apply, where
it makes sense.
How is AM able to zero in on the relevant heuristic rules, once a task has been selected
from the agenda?
4.5.1. Domain of Applicability
The secret is that each heuristic rule is stored somewhere apropos to its "domain of
applicability". This "proper place" is determined by the first conjunct in the left-hand side
of the rule.
What does this mean? Consider this heuristic:
If the current task is to fill in examples of the operation F, <=
and some examples of the domain of F are known,
Then one way to get examples of F is to run F on randomly chosen examples of the domain
of F.

This is a reasonable thing to try, but only in certain situations. Should it be tried when
the current task is to check the Worth facet of the Sets concept? No, it would be irrational.
Of course, even if it were tried then, the left-hand side would fail very quickly. Yet some
cpu time would have been used, and if the user were watching, his opinion of AM would
decrease [26].
[24] i.e., applicable in many situations. It would be worse than useless if a rule existed which could lead to a single discovery
like "Fibonacci series" but never lead to any other discoveries. The reasons for demanding generality are
not only [...] but the insights which occur when it is observed that several disparate concepts were
all motivated by the same general principle (e.g., "looking for the inverse image of extrema").
[25] The reader has already seen several heuristics useful for filling in and checking facets; here is one for judging
interestingness: an entry on the Interest facet of Compose says that a composition AoB is more interesting
if the range of B equals the domain of A, than if they only partially overlap.
That particular heuristic has a precise domain of applicability: AM should use it whenever
the current task is to fill in examples of an operation, and only in those kinds of situations.
The key observation is that a heuristic typically applies to all examples of a particular
concept C. In the case we were considering, C=Operation. Intuitively, we'd like to tack that
heuristic onto the Examples facet of the concept Operation, so it would only "come to mind"
in appropriate situations. This is in fact precisely where the heuristic rule is stored.
Initially, the author identified the proper concept C and facet F for each heuristic H which
AM possessed, and tacked H onto C.F [27]. This was all preparation, completed long before
AM started up. Each heuristic was tacked onto the facet which uniquely indicates its
domain of applicability. The first conjunct of the IF-part of each heuristic indicates where
it is stored and where it is applicable. Notice the little arrow (<=) pointing to that conjunct
above [28].
While AM is running, it will choose a task dealing with, say, facet F of concept C. AM
must quickly locate the heuristic rules which are relevant to satisfying that chosen task.
AM simply locates all concepts which claim C as an example. If the current task were
"Check the Domain/range of UnionoUnion" [29], then C would be UnionoUnion. Which concepts
claim C as an example? They include ComposewithSelf, Composition, Operation, Active,
Anyconcept, and Anything. AM then collects the heuristics tacked onto facet F (in this
case, F is Domain/range) of each of those concepts. All such heuristics will be relevant. In
the current case, some relevant heuristics might be garnered from the Domain/range facet of
the concept Operation. Any heuristic which can deal with the Domain/range facet of any
operation can certainly deal with UnionoUnion's Domain/range. A typical rule on
Operation.Domain/range.Check [30] would be this one:
If a Dom/ran entry of F is of the form <D D ... -> R>, where R is a generalization of D,
Then test whether the range might not be simply D.
Suppose one entry on UnionoUnion.Dom/ran was '<Nonemptysets Nonemptysets Nonemptysets -> Sets>'. Then this last heuristic rule would be relevant, and would have
AM ask the plausible question: Is the union of three nonempty sets always nonempty? The
answer is affirmative, empirically, so AM modifies that Domain/range entry for
UnionoUnion. AM would ask the same question for IntersectoIntersect. Although the
answer then would be 'No', it's still a rational inquiry. If AM called on this heuristic rule
when the current task was "Fillin specializations of Bags", it would clearly be an irrational act.
The domain of applicability of the rule is clear, and is precisely fitted to the slot where the
rule is stored (tacked onto Operation.Domain/range).
[26] This notion of worrying about a human user who is observing AM run in real time may appear to be quite language- and
machine-dependent. An increase in speed of a couple orders of magnitude would radically alter the
qualitative appearance of AM. In Chapter 7, however, the reader will grasp how difficult it is to
objectively rate a system like AM. For that reason, all measures of judgment must be respected. Also, to
the actual human being using the system, this really is one of the most important measures.
[27] Recall that C.F is an abbreviation for facet F of concept C.
[28] In the LISP implementation, these first conjuncts are omitted, since the placement of a heuristic serves the same
purpose as if it had some "pre-preconditions" (like these first conjuncts) to determine relevance quickly.
[29] This operation is defined as: UnionoUnion(x,y,z) = (x U y) U z. It accepts 3 sets as arguments, and returns a new
set as its value.
[30] the 'Check' subfacet of the 'Domain/range' facet of the 'Operation' concept.
To recap the basic idea: when dealing with a task "Do act A on facet F of concept C", AM
must locate all the concepts X claiming C as an example. AM then gathers the heuristics
tacked onto X.F.A, for each such general concept X. All of them, and only they, are
relevant to satisfying that task.
So the whole problem of locating relevant heuristics has been reduced to the problem of
efficiently finding all concepts of which C is an example (for a given concept C). This
process is called "rippling away from C in the ISA direction", and forms the subject of the
next subsection.
4.5.2. Rippling
Given a concept C, how can AM find all the concepts which claim C as an example?
The most obvious scheme is to store this information explicitly. So the Examples facet of C
would point to all known examples of C, and the Isa facet of C would point to all known
concepts claiming C as one of their examples. Why not just do this? Because one can
substitute a modest amount of processing time (via chasing links around) for the vast
amount of storage space that would be needed to have "everything point to everything".
Each facet contains only enough pointers so that the entire graph of Exs/Isa and Spec/Genl
links could be reconstructed if needed. Since "Genl"31 is a transitive relation, AM can
compute that Numbers is a generalization of Mersenneprimes, if the facet Mersenneprimes.Genl contains the entry "Oddprimes", and Oddprimes.Genl contains a pointer to
"Primes", and Primes.Genl points to "Numbers". This kind of "rippling" activity is used to
efficiently locate all concepts related to a given one X. In particular, AM knows how to
"ripple upward in the Isa direction", and quickly 32 locate all concepts which claim X as one
of their examples.
31 "Genl" is an abbreviation for the Generalizations facet of a concept; similarly, "Spec" means
Specializations, "Exs" means Examples, etc. "Isa" is the converse facet to Exs; i.e., A is an
entry on B.Exs iff B is an entry on A.Isa. Saying "Genl is transitive" just means the following:
if A is a generalization of B, and B of C, then A is also a generalization of C.
32 With about 200 known concepts, with each Isa facet and each Genl facet pointing to about 3
other concepts, about 25 links will be traced along in order to locate about a dozen final
concepts, each of which claims the given one as an example. This whole rippling process,
tracing 25 linkages, uses less than .01 cpu seconds, in compiled Interlisp, on a KI-10 type
PDP-10.

It turns out that AM cannot simply call for X.Isa, then the Isa facets of those concepts, etc.,
because Isa is not transitive. 33 For the interested reader, the algorithm AM uses to collect
Isas of X is given below.
1. All generalizations of the given concept X are located. AM accesses X.Genl, then
the Genl facets of those concepts, etc.
2. The "Isa" facet of each of those concepts is accessed.
3. AM locates all generalizations of these newlyfound higherlevel concepts. This is
the list of all known concepts which claim X as one of their examples.
In regular form, one might express this rippling recipe more compactly as:
Genl*(Isa(Genl*(X))). There is not much need for a detailed understanding of this process,
hence it will not be delved into further in this thesis. This section probably already
contains more than anyone would want to know about rippling. 34
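
For readers who prefer code to prose, the three-step recipe above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions, not AM's Interlisp
code: the dictionaries GENL and ISA, the function names, and several of the links in the toy
graph (for instance Oddnumbers pointing to Numbers, and UnionoUnion being an example of
Operation) are hypothetical.

```python
# Illustrative sketch only; AM itself was written in Interlisp.
# GENL and ISA are hypothetical stand-ins for the Genl and Isa facets.
# They hold only "immediate" links, as the text describes.
GENL = {
    "Mersenneprimes": ["Oddprimes"],
    "Oddprimes": ["Oddnumbers", "Primes"],
    "Oddnumbers": ["Numbers"],          # assumed link, for illustration
    "Primes": ["Numbers"],
    "Operation": ["Active"],            # assumed chain of very general concepts
    "Active": ["AnyConcept"],
    "AnyConcept": ["Anything"],
}
ISA = {"UnionoUnion": ["Operation"]}    # assumed: UnionoUnion is an example of Operation


def genl_star(concept, genl):
    """Genl*(X): the concept plus everything reachable via Genl links, nearest first."""
    seen, frontier = [concept], [concept]
    while frontier:
        next_frontier = []
        for c in frontier:
            for g in genl.get(c, []):
                if g not in seen:
                    seen.append(g)
                    next_frontier.append(g)
        frontier = next_frontier
    return seen


def isas(concept, genl, isa):
    """Genl*(Isa(Genl*(X))): every concept that claims `concept` as an example."""
    found = []
    for g in genl_star(concept, genl):                 # step 1: generalizations of X
        for claimant in isa.get(g, []):                # step 2: their Isa entries
            for higher in genl_star(claimant, genl):   # step 3: generalize those
                if higher not in found:
                    found.append(higher)
    return found


print(isas("UnionoUnion", GENL, ISA))
# -> ['Operation', 'Active', 'AnyConcept', 'Anything']
```

Because Isa links are followed only once (step 2), the sketch never treats Isa as transitive;
only the Genl links are chased repeatedly.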
4.5.3. Ordering the Relevant Heuristics
Now that all these relevant heuristics have been assembled, in what order should AM
execute them? 35 It is important to note that the heuristics tacked onto very general concepts
will be applicable frequently, yet will not be very powerful. For example, here is a typical
heuristic rule which is tacked onto the Examples facet of the very general concept Anyconcept:

If the current task is to fill in examples of any concept X,
Then one way to get them is to symbolically instantiate 36 a definition of X
It takes a tremendous amount of inference to squeeze a couple awkward examples of
IntersectoIntersect out of that concept's definition. Much time could be wasted doing so. 37

33 If x isa y, and y isa z, then x is (generally) NOT a z. This is due to the intransitivity of
"member-of". Generalization is transitive, on the other hand, because "subset-of" is transitive.
34 For the interested reader, it is explained in great detail in file RIPPLE[die,dbl] at SAIL.
This file has been permanently archived at SAIL.
35 The discussion below assumes that the heuristics don't interact with each other; i.e., that
each one may act independently of all others. The validity of this simplification is tested
empirically (see Chapter 6) and discussed theoretically (see Chapter 7) later.
36 "Symbolic instantiation" is a euphemism for a bag of tricks which transform a declarative
definition of a concept into particular entities satisfying that definition. The only constraint
on the tricks is that they not actually run the definition. One such trick might be: if the
definition is recursive, merely find some entity that satisfies the base step. AM's symbolic
instantiation tricks are too hand-crafted to be of great interest, hence this will not be covered
any more deeply here. The interested reader is directed to the pioneering work by
[Lombardi & Raphael 64], or the more recent literature on these techniques applied to automatic
program verification (e.g., [Moore 75]).
37 Incidentally, this illustrates why no single heuristic should be allowed to monopolize the
processing of any one task.

Just as general heuristics are weak but often relevant, specific heuristics are powerful but
rarely relevant. Consider this heuristic rule, which is attached to the very specific concept
ComposewithSelf:

If the current task is to fill in examples of the composition FoF,
Then include any fixed-points of F.

For example, since Intersect(phi,X) equals phi, so must IntersectoIntersect(phi,X,Y). 38
Assuming that such examples exist already on Intersect, this heuristic will fill in a few
examples of IntersectoIntersect with essentially no processing required. Of course the
domain of applicability of this heuristic is minuscule.
As we expected, the narrower its domain of applicability, the more powerful and efficient a
heuristic is, and the less frequently it's useful. Thus in any given situation, where AM has
gathered many heuristic rules, it will probably be best to execute the most specific ones first,
and execute the most general ones last.
Below are summarized the three main points that make up AM's scheme for finding
relevant heuristics in a "natural" way and then using them:
1. Each heuristic is tacked onto the most general concept for which it applies: it is
given as large a domain of applicability as possible. This will maximize its
generality, but leave its power untouched. This brings it closer to the "ideal"
tradeoff point between these two quantities.
2. When the current task deals with concept C, AM ripples away from C and quickly
locates all the concepts of which C is an example. Each of them will contain
heuristics relevant to dealing with C.
3. AM then applies those heuristics in order of increasing generality. You may wonder
how AM orders the heuristics by generality. It turns out that the rippling process
automatically gathers heuristics in order of increasing generality. In the LISP
system, each rule is therefore executed as soon as it's found. So AM never wastes
time gathering heuristics it won't have time to execute.
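
The third point can be illustrated with a small Python sketch. It is a sketch under
assumptions, not a description of AM's implementation: the concept graph, the HEURISTICS table
keyed by (concept, facet, subfacet), and the idea of using breadth-first distance as a
stand-in for "generality" are all hypothetical choices made for this example.

```python
from collections import deque

# Hypothetical toy data; the links and rule texts are assumptions for illustration.
GENL = {
    "ComposewithSelf": ["Operation"],
    "Operation": ["Active"],
    "Active": ["AnyConcept"],
    "AnyConcept": ["Anything"],
}
ISA = {"IntersectoIntersect": ["ComposewithSelf"]}
HEURISTICS = {
    ("ComposewithSelf", "Examples", "Fillin"): ["include any fixed-points of F"],
    ("AnyConcept", "Examples", "Fillin"): ["symbolically instantiate a definition"],
}


def closure(start_nodes, links):
    """Breadth-first walk: lazily yield each reachable concept once, nearest first."""
    seen = set()
    queue = deque(start_nodes)
    while queue:
        x = queue.popleft()
        if x in seen:
            continue
        seen.add(x)
        yield x
        queue.extend(links.get(x, []))


def ripple_isas(concept):
    """Concepts claiming `concept` as an example, most specific ones first."""
    direct = [k for g in closure([concept], GENL) for k in ISA.get(g, [])]
    return closure(direct, GENL)


def relevant_heuristics(concept, facet, act):
    """Yield (owner, rule) pairs as soon as the rippling reaches their owner."""
    for owner in ripple_isas(concept):
        for rule in HEURISTICS.get((owner, facet, act), []):
            yield owner, rule


for owner, rule in relevant_heuristics("IntersectoIntersect", "Examples", "Fillin"):
    print(owner, "->", rule)
```

Because the functions are generators, a caller that runs out of time can simply stop
iterating; no effort has been spent collecting the more general rules it never reached.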
4.6. AM's Starting Heuristics
This section will briefly characterize the collection of 242 heuristic rules which AM was
originally given. A complete listing of those rules is found in Appendix 3; the rule
numbers below refer to the numbering given in that appendix.
38 phi is another name for the empty set, also written {}. This last sentence thus says that
since {} ∩ X = {}, then ({} ∩ X) ∩ Y must also equal {}.
4.6.1. Heuristics Grouped by the Knowledge They Embody
Many heuristics embody the belief that mathematics is an empirical inquiry. That is, one
approach to discovery is simply to perform experiments, observe the results, thereby gather
statistically significant amounts of data, induce from that data some new conjectures or new
concepts worth isolating, and then repeat this whole process again. Some of the rules which
capture this spirit are numbers 21, 43-57, 91, 136-139, 146-148, 153-154, 212-216, 225,
and 241. As one might expect, most of these are "Suggest" type rules. They indicate
plausible moves for AM to make, promising new tasks to try, new concepts worth studying.
Almost all the rest are "Fillin" type rules, providing empirical methods to find entries for a
specified facet.

Another large set of heuristics is used to embody, or to modify, what can be called
"focus of attention". When should AM keep on the same track, and when not? The first
rules expressing varying nuances of this idea are numbers 1-5. The last such rules are
numbers 209-216. Some of these rules are akin to goal-setting mechanisms (e.g., rule 141).
In addition, many of the "Interest" type rules have some relation to keeping AM interested
in recently-chosen concepts (or: in concepts related to them, e.g. by Analogy, by Genl/Spec,
by Isa/Exs,...).
The remaining "Interest" rules are generally some re-echoing of the following notion: X is
interesting if F(X) has an unexpected (interesting) value. For example, in rule 26, "F(X)"
is just "Generalizations of X". In slightly more detail, the principal characteristics of
interestingness are:
• symmetry (e.g., in an expanding analogy)
• coincidence (e.g., in a concept being rediscovered often)
• appropriateness (e.g., in choosing an operation H so that GoH will have nicer
  Domain/Range characteristics than G itself did)
• recency (see the previous paragraph on focus of attention)
• individuality (e.g., the first entity observed which satisfies some property)
• usefulness (e.g., there are many conjectures involving it)
• association (i.e., the given concept is related to an interesting one)
One group of heuristic rules embeds syntactic tricks for generalizing definitions (Lisp
predicates), specializing them, instantiating them, symbolically evaluating them, inverting
them, rudimentarily analyzing them, etc. For example, see rules 31 and 89. Some rules
serve other syntactic functions, like ensuring that various limits aren't exceeded (e.g., rule
15), that the format for each facet is adhered to (e.g., rule 16), that the entries on each
facet are used as they are meant to be (e.g., rules 9 and 59), etc. Many of the "Check"
type heuristics fall into this category.
Finally, AM possesses a mass of miscellaneous rules which evade categorization. See, e.g.,
rules 185 and 236. These range from genuine math heuristics (rules which lead to discovery
frequently) to simple data management hacks.
No detailed analysis has been performed on the set of heuristics AM possesses, as of the
time of writing of this thesis.
4.6.2. Heuristics Grouped by How Specific They Are
Another dimension of distribution of heuristics, aside from the above functional one, is
simply that of how high up in the Genl/Spec tree they are located. The table below
summarizes how the rules were distributed in that tree:
LEVEL  e.g.        Con's  w/Heur  Heurs    Avg  Avg w/Heur  Fillin  Sugg  Check  Int
0      Anything        1       1     10   10.0        10.0       0     5      0    5
1      AnyConcept      1       1    110  110.0       110.0      39    30     20   21
2      Active          2       2     24   12.0        12.0       7    10      4    3
3      Operation       6       3     31    5.2        10.3      11     3      3   14
≥4     Union         100      11     63    0.6         5.7      26    15      8   16
Here is a key to the column headings:
LEVEL: How far down the Genl/Spec tree of concepts we are looking.
e.g.: A sample concept at that level.
Con's: The total number of concepts at that level.
w/Heur: How many of them have some heuristics.
Heurs: The total number of heuristics attached to concepts at that level.
Avg: (Heurs) / (Con's); i.e., the mean number of heuristics per concept, at that level.
Avg w/Heur: (Heurs) / (w/Heur)
Fillin: Total number of "Fillin" type heuristics at that level.
Sugg: Total number of "Suggest" type heuristics at that level.
Check: Total number of "Check" type heuristics at that level.
Int: Total number of "Interestingness" type heuristics at that level.
The heuristic rules are seen not to be distributed uniformly, homogeneously among all the
initial concepts. The extent of this skewing was not realized by the author until the above
table was constructed. A surprising proportion of rules are attached to the very general
concepts. The top 10% of the concepts contain 73% of all the heuristics. One notable
exception is the "Interest" type heuristics: they seem more evenly distributed throughout the
tree of initial concepts. This tends to suggest that future work on providing "meta-heuristics"
should concentrate on how to automatically synthesize those Interest heuristics for
newly-created concepts.
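
The arithmetic behind the two Avg columns and the skew just described can be checked with a
short Python sketch. The row values are copied from the table above; the function names are
hypothetical, and the code is offered only as a worked check of the figures, not as part of AM.

```python
# Rows copied from the table: (level, concepts, concepts with heuristics, heuristics).
ROWS = [
    ("0", 1, 1, 10),
    ("1", 1, 1, 110),
    ("2", 2, 2, 24),
    ("3", 6, 3, 31),
    (">=4", 100, 11, 63),
]


def averages(rows):
    """Return (level, Avg, Avg w/Heur) for each row, rounded to one decimal place."""
    return [(lvl, round(h / c, 1), round(h / w, 1)) for lvl, c, w, h in rows]


def share_of_top_levels(rows, n_levels):
    """Fraction of all concepts, and of all heuristics, found in the first n_levels rows."""
    top = rows[:n_levels]
    concept_share = sum(r[1] for r in top) / sum(r[1] for r in rows)
    heuristic_share = sum(r[3] for r in top) / sum(r[3] for r in rows)
    return concept_share, heuristic_share


for level, avg, avg_with in averages(ROWS):
    print(level, avg, avg_with)

concepts, heuristics = share_of_top_levels(ROWS, 4)
print(f"levels 0-3 hold {concepts:.1%} of the concepts and {heuristics:.1%} of the heuristics")
```

Levels 0 through 3 hold 10 of the 110 concepts and 175 of the 238 heuristics counted in the
Heurs column, which is in line with the "top 10%" and "73%" figures quoted above.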
Chapter 5. AM's Concepts
This chapter contains material about AM's anatomy. After a brief overview, we'll look in
detail at the way concepts are represented (Section 5.2). This includes a discussion of each
kind of facet a concept may possess. Wedged in among the implementation details and
formats are a horde of tiny ideas; they should be useful to anyone contemplating working
on a system similar in design to AM.
The chapter closes by sketching all the knowledge AM starts with. The concepts will be
diagrammed, and will also have a brief description, sufficient for the reader to follow later
chapters without trouble. Instead of using up a large number of pages for an unreadable
listing of all of the specific information initially supplied re each concept, such complete
coverage is relegated to Appendix 2.1. The next chapter starts on page 114. 2
5.1. Motivation and Overview
Each concept consists merely of a bundle of facets. The facets represent the different aspects
of each concept, the kinds of questions one might want to ask about the concept:
How valuable is this concept?
What is its definition?
If it's an operation, what is legally in its domain?
What are some generalizations of this concept?
How can you separate the interesting instances of this concept from the dull ones?
etc.
Since each concept is a mathematical entity, the kinds of questions one might ask are fairly
constant from concept to concept. This set of questions might change significantly for a new
domain of concept.
1 That appendix lists each concept, giving a condensed listing of the facts initially given (by
the author) to AM about each facet of that concept. This material is translated from LISP into
English and standard math notation. The appendix is preceded by an alphabetical index of the
concepts and the page number on which they are presented. That index is on page 173. Some
unmodified "concepts", still in LISP, are displayed in Appendix 2.3.
2 Though devoid of theoretical significance, that sentence has also proved of high empirical
value.

One "natural" representation for a concept in LISP is therefore as a set of attribute/value
pairs. That is, each concept is maintained as an atom with a property list. The names of the
properties (Worth, Definitions, Domain/Range, Generalizations, Interestingness, etc.)
correspond to the questions above, and the value stored under property F of atom C is
simply the value of the F-facet of the C-concept. This value can also be viewed as the
answer which expert C would give, if asked question F. Or, it can be viewed as the contents
of slot F of frame C.
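
As a rough modern analogue of the property-list representation just described, a concept can
be sketched in Python as a dictionary of facet/value pairs. This is a hypothetical
illustration, not AM's LISP data structure; the names CONCEPTS, get_facet and add_entry are
invented here, and the sample values are borrowed from the stylized Sets concept shown in
this chapter.

```python
# Hypothetical sketch: each concept is a bundle of facet/value pairs,
# playing the role of a LISP atom with a property list.
CONCEPTS = {
    "Sets": {
        "Worth": 600,
        "Specializations": ["Emptyset", "Nonemptyset", "Setofstructures", "Singleton"],
        "Generalizations": ["UnorderedStructure", "NomultipleelementsStructure"],
    },
}


def get_facet(concept, facet):
    """Return the value stored under `facet` of `concept` (an empty list if blank)."""
    return CONCEPTS.get(concept, {}).get(facet, [])


def add_entry(concept, facet, entry):
    """Add `entry` to a list-valued facet, creating the concept or facet if needed."""
    entries = CONCEPTS.setdefault(concept, {}).setdefault(facet, [])
    if entry not in entries:
        entries.append(entry)


print(get_facet("Sets", "Worth"))                      # -> 600
print("Singleton" in get_facet("Sets", "Specializations"))  # -> True
print(get_facet("Sets", "Check"))                      # -> []
```

Asking for a blank facet returns an empty list rather than failing, which mirrors the way a
missing property simply has no value on a property list.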
5.1.1. A Glimpse of a Typical Concept
As an example, here is a stylized rendition of the SETS concept. This is a concept which is
meant to correspond to the notion of a set of elements. The format P: v1, v2, ... is used to
indicate that the value of property P is the list v1, v2, .... That is, the concept Sets has
entries v1, v2, ... for its facet P. For example, according to the box below, "Singleton" is
one entry on the Specializations facet of Sets.
I shall not digress here to explain each of these entries and what are apparently omissions.
Such things will be done later in this chapter. 3 For now, just glance at it to get
the flavor of what a concept is like.

3 The individual facets will be discussed one at a time. This particular concept is shown at an
intermediate state of being filled in. Although several facets are blank, many are filled in
which were initially empty (e.g., Examples). The reader wishing to see what this concept was
like at the time that AM started up should turn ahead to page 211 (inside Appendix 2).

Name(s): Set, Class, Collection
Definitions:
    Recursive: λ (S) [S={} or Set.Definition (Remove(Anymember(S),S))]
    Recursive quick: λ (S) [S={} or Set.Definition (CDR(S))]
    Quick: λ (S) [Match S with {...} ]
Specializations: Emptyset, Nonemptyset, Setofstructures, Singleton
Generalizations: UnorderedStructure, NomultipleelementsStructure
Examples:
    Typical: {{}}, {A}, {A,B}, {3}
    Barely: {}, {A, B, {C, {{{A, C, (3,3,3,9), <4,1,A,{B},A>}}}}}
    Notquite: {A,A}, (), {B,A}
    Foible: <4,1,A,1>
Conjec's: All unorderedstructures are sets.
Intu's:
    Geometric: Venn diagram. {See [Venn 89], or [Skemp 71].}
Analogs: bag, list, oset
Worth: 600
View:
    Predicate: λ (P) {x

Next, the "Check" facet of each of these is examined, and all its heuristics are
collected. For example, the Check facet of the NomultipleelementsStructures concept
contains the following entry: "Eliminate multiple occurrences of each element" (of course this
is present not as an English sentence but rather as a little LISP function). So even though
Sets has no entries for its Check facet, several little functions will be gathered up by the
rippling process. Each potential set would be subjected to all those checks, and might be
modified or discarded as a result.
There is enough "structure" around to keep the heuristic rules relevant to this task isolated
from very irrelevant rules, and there is enough "uniformity" to make finding those rules
very easy.
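
The way Check entries are inherited can be sketched as follows. This is a minimal Python
illustration, assuming that the concepts being examined are the generalizations of Sets; the
GENL and CHECK tables, the link from each generalization to Structure, and the function names
are hypothetical, and drop_repeats merely stands in for the "little LISP function" mentioned
above.

```python
# Hypothetical fragment of the concept graph around Sets.
GENL = {
    "Sets": ["UnorderedStructure", "NomultipleelementsStructure"],
    "UnorderedStructure": ["Structure"],           # assumed link
    "NomultipleelementsStructure": ["Structure"],  # assumed link
}


def drop_repeats(candidate):
    """Eliminate multiple occurrences of each element, keeping the first of each."""
    result = []
    for item in candidate:
        if item not in result:
            result.append(item)
    return result


# Sets itself has no Check entries; one of its generalizations does.
CHECK = {"NomultipleelementsStructure": [drop_repeats]}


def generalizations(concept):
    """The concept and everything reachable through Genl links, nearest first."""
    seen, frontier = [concept], [concept]
    while frontier:
        next_frontier = []
        for c in frontier:
            for g in GENL.get(c, []):
                if g not in seen:
                    seen.append(g)
                    next_frontier.append(g)
        frontier = next_frontier
    return seen


def run_checks(concept, candidate):
    """Subject a potential example to every Check function found by rippling."""
    for g in generalizations(concept):
        for check in CHECK.get(g, []):
            candidate = check(candidate)
    return candidate


print(run_checks("Sets", ["A", "A", "B"]))   # -> ['A', 'B']
```

A fuller version would also let a check discard the candidate outright; here each check only
patches it up.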
The same rippling would be done to find predicates which tell whether a set is interesting
or dull. For example, one entry on the Interestingness facet of the Structure concept says
that a structure is interesting if all pairs of members satisfy the same rare predicate P(x,y)
[for any such P]. So a set, all pairs of whose members satisfy "Equality," would be
considered interesting. In fact, every Singleton is an interesting Structure for just that
reason. A singleton might be an interesting Anything because it takes only a few characters
to type it out (thereby satisfying a criterion on Anything.Interest).
To locate all the specializations of Sets, the rippling would go in the opposite direction. For
example, one of the entries on the Specializations facet of Sets is Setofstructures; one of its
Specialization entries is Setofsets. So this latter concept will be caught in the net when
rippling away from Sets in the Specializations direction.
If AM wants lots of examples of sets, it has only to ripple in the Specializations direction,
gathering Examples of each concept it encounters. Examples of Setsofsets (like this one:
{{A},{{C,D}}}) will be caught in this way, as will examples of Setsofnumbers (like this one:
{1,4,5}), because two specializations of Sets are SetsofSets and SetsofNumbers. 9
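
Rippling in the Specializations direction to collect examples can be sketched in the same
spirit. The SPEC and EXS tables below are hypothetical toy data (examples are kept as plain
strings), and the function names are invented for this illustration; it is not AM's code.

```python
# Hypothetical Spec and Exs facets for a toy fragment of the concept graph.
SPEC = {
    "Sets": ["Setofstructures", "SetsofNumbers"],
    "Setofstructures": ["SetsofSets"],
}
EXS = {
    "Sets": ["{A,B}"],
    "SetsofSets": ["{{A},{{C,D}}}"],
    "SetsofNumbers": ["{1,4,5}"],
}


def spec_star(concept):
    """The concept plus everything reachable through Spec links, nearest first."""
    seen, frontier = [concept], [concept]
    while frontier:
        next_frontier = []
        for c in frontier:
            for s in SPEC.get(c, []):
                if s not in seen:
                    seen.append(s)
                    next_frontier.append(s)
        frontier = next_frontier
    return seen


def all_examples(concept):
    """Gather the Examples of every concept met while rippling in the Spec direction."""
    gathered = []
    for s in spec_star(concept):
        for example in EXS.get(s, []):
            if example not in gathered:
                gathered.append(example)
    return gathered


print(all_examples("Sets"))
# -> ['{A,B}', '{1,4,5}', '{{A},{{C,D}}}']
```

The sketch stops after gathering the stored examples; it does not go on to specialize
examples that are themselves concepts.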
In addition to the three main reasons for keeping the set of facets the same for all the
concepts (see previous page), we claimed there were also reasons for keeping that set fixed
once and for all. Why not dynamically enlarge it? To add a new facet, its value has to be
filled in for lots of concepts. How could AM develop the huge body of heuristics needed to
guide such fillingin and checking activities? Also, the number of facets is small to begin
with because people don't seem to use more than a few tens of such "properties" in
classifying knowledge about a concept. 10 If the viability of AM seemed to depend on this
ability, I would have worked on it. AM got along fine without being able to enlarge its set
of facets, so no time was ever spent on that problem. I leave it as a challenging, ambitious
"open research problem".

9 We are assuming that AM has run for some time, and has already discovered and defined
SetsofNumbers.
10 This data is gathered from introspection by myself and a few others, and should probably be
tested by performing some psychological experiments.

5.1.3. BEINGs Representation of Knowledge
Before discussing each facet in detail, let's interject a brief historic digression, to explain the
origins of this modular representation scheme.
The ideas arose in an automatic programming context, while working out a solution to the
problem of constructing a computer system capable of synthesizing a simple
concept-discrimination program (similar to [Winston 70]). The scenario envisioned was one of
mutual cooperation among a group of a hundred or so experts, each a specialist in some
minute detail of coding, concept formation, debugging, communicating, etc. Each expert was
modelled by one module, one BEING. Each BEING had the same number of slots (parts,
facets), and each slot was interpreted as a question which that BEING could answer. The
community of experts carried on a round-table discussion of a programming task which was
specified by a human user. Eventually, by cooperating and answering each other's
questions, they hammered out the program he desired. See [Lenat 75b] for details.
The final system, called PUP6, did actually synthesize several large LISP programs,
including many variants of the concept-learning program. This is described fully in [Lenat
75a]. Unfortunately, PUP6 had virtually no natural language ability and was therefore
unusable by an untrained human. Its modal output was "Eh?".
The search for a new problem domain where this communication difficulty wouldn't be so
severe led to consideration of elementary mathematics.
The other main defect of PUP6 was its narrowness, the small range of 'target' programs
which could be synthesized. PUP6 had been designed with just one target in mind, and
almost all it could do was to hit that target. The second constraint on the new task domain
was then one of having a nonspecific target, a very broad or diffuse goal. This pointed to
an automated researcher, rather than a problem-solver.
These two constraints then were (i) elementary math, because of communication ease, and
(ii) self-guided exploration, because of the danger of too specific a goal. Together, they
directed the author to an investigation which ultimately resulted in the AM project.
5.2. Facets
How is each concept represented? Without claiming that this is the "best" or preferred
scheme, this section will treat in detail AM's representation of this knowledge.
We have seen that the representation of a concept can loosely be described as a collection of
facet/value pairs, where the facets are drawn from a fixed set of about 25 total possible
facets.
The facets break down into three categories:
1. Facets which relate this concept C to some other one(s): Generalizations,
Specializations, Examples, Isas, Indomainof, Inrangeof, Views, Intu's, Analogies,
Conjec's
2. Facets which contain information intensive to this concept C: Definitions,
Algorithms, Domain/Range, Worth, Interest
3. Subfacets, containing heuristics, which can be tacked onto facets from either group
above. These include: Suggest, Fillin, Check
Some facets come in several flavors (e.g., there are really four separate facets, not just one,
which point to Examples: boundary, typical, just-barely-failing, foibles).
This section will cover each facet in turn. Let's begin by listing each of them. For a change
of pace, we'll show a typical question that each one might answer about concept C: 11
Name: What shall we call C when communicating with the user?
Generalizations: Which other concepts have less restrictive definitions than C?
Specializations: Which concepts satisfy C's definition plus some additional constraints?
Examples: What are some things that satisfy C's definition?
Isa's: Which concepts' definitions does C itself satisfy? 12
Indomainof: Which operations can be performed on C's?
Inrangeof: Which operations result in values which are C's?
Views: How can we view some other kind of entity as if it were a C?
Intu's: What is an abstract, analogic representation for C?
Analogies: Are there similar (though formally unrelated) concepts?
Conjec's: What are some potential theorems involving C?
Definitions: How can we tell if x is an example of C?
Algorithms: How can we execute the operation C on a given argument?
Domain/Range: What kinds of arguments can operation C be executed on? What
kinds of values will it return?
Worth: How valuable is C? (overall, aesthetic, utility, etc.)
Interestingness: What special features make a C especially interesting?
In addition, each facet F of concept C can possess a few little subfacets which contain
heuristics for dealing with that facet of C's:
F.Fillin: How can entries on C.F be filled in? These heuristics get called on when the
current task is "Fillin facet F of concept X", where X is a C.
F.Check: How can potential entries on C.F be checked and patched up?
F.Suggest: If AM gets bogged down, what are some new tasks (related to C.F) it might
consider?
We'll now begin delving into the syntax and semantics of each facet, one by one. Future
chapters will not depend on this material. The reader may wish to skip to Section 5.3 (page
105).
11 In this discussion, "C" represents the name of the concept whose facet is being discussed,
and may be read "the given concept".
12 Notice that C will therefore be an example of each member of Isa's(C).

5.2.1. Generalizations/Specializations
Generalization makes possible conscious, controlled, and accurate accommodation of
one's existing schemas, not only in response to the demands for assimilation of
new situations as they are encountered, but ahead of these demands, seeking or
creating new examples to fit the enlarged concept.
~ Skemp
We say "concept A is a generalization of concept B" iff every example of B is an example of
A. Equivalently, this is true iff the definition of B can be phrased as "λ (x) [A.Defn(x) and
P(x)]"; that is, for x to satisfy B's definition, it must satisfy A's definition plus some
additional predicate P. The Generalizations facet of concept C will be abbreviated as
C.Genl.
C.Genl does not contain all generalizations of C; rather, just the "immediate" ones. More
formally, if A is a generalization of B, and B of C, then C.Genl will not contain a pointer to
A. Instead, C will point to B. 13
Here are the recursive equations which permit a search process to quickly find all
generalizations or specializations of a given concept X:
Generalizations(X) = Genl*(X) = {X} U Generalizations(X.Genl)
Specializations(X) = Spec*(X) = {X} U Specializations(X.Spec)
For the reader's convenience, here are the similar equations, presented elsewhere in the text,
for finding all examples of and Isas of X:
Examples(X) = Spec*(Exs(Spec*(X)))
Isa's(X) = Genl*(Isa(Genl*(X)))
The format of the Generalizations facet is quite simple: it is a list of concept names. The
Generalizations facet for Oddprimes might be:
(Oddnumbers Primes)

13 In general, C.Genl will contain an entry X1; X1.Genl will contain an entry X2, ...; Xn.Genl
will contain B as one entry; B.Genl will contain Y1, ...; Yn.Genl will contain A.

Here is a small diagram representing generalization relationships. The only lines drawn
represent the pointers found in the Genl facets of these concepts:
[Diagram: Genl links among the concepts Object, Number, Oddnumbers, Primes, Evenprimes,
Oddprimes, and Mersenneprimes]
Each of those lines represents an arrow which slants upwards, indicating a Genl link. For
example, we see that the Generalizations facet of Oddprimes contains pointers to both
Oddnumbers and to Primes. There is no pointer from Oddprimes upward to Number,
because there is an "intermediate" concept (namely, Primes). There is no pointer from
Mersenneprimes to Object, since a chain of intermediate concepts links them.
The reason for these strange constraints is so that the total number of links can be
minimized. There is no harm if a few redundant ones sneak in. In fact, frequentlyused
paths are granted the status of single links, as we shall soon see.
We've been talking about both Specializations and Generalizations as if they were very
similar to each other. It's time to make that more explicit:
Specializations are the converse of Generalizations. The format is the same, and (hopefully)
A is an entry on B's Specializations facet iff B is an entry on A's Generalizations facet.
The uses of these two facets are many:
1. AM can sometimes establish independently that A is both a generalization and a
specialization of B; in that case, AM would like to recognize that fact easily, so it
can conjecture that A and B specify equivalent concepts. Such coincidences are
easily detected as cycles in the Genl (or Spec) graph. In these cases, AM may
physically merge A and B (and all the other concepts in the cycle) into one concept.
2. Sometimes, AM wants to assemble a list of all specializations (or generalizations) of
X, so that it can test whether some statement which is just barely true (or false) for
X will hold for any of those specializations of X.
3. Sometimes, the list of generalizations is used to assemble a list of isas; the list of
specializations helps assemble a list of examples. 14
4. A common and crucial use of the list of generalizations is to locate all the heuristic
rules which are relevant to a given concept. Typically, the relevant rules are those
tacked onto Isas of that concept, and the list of Isas is built up from the list of
generalizations of that concept. This was also mentioned on page 56.
5. To incorporate new knowledge. If AM learns, conjectures, etc. that A is a
specialization of B, then all the machinery (all the theorems, algorithms, etc.) for B
become available for working with A.
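
The first use in the list above, detecting cycles in the Genl graph, can be sketched in
Python. This is one simple way to find such cycles under stated assumptions, not necessarily
how AM did it; the graph, the concept names A through D, and the function names are
hypothetical.

```python
def reachable(start, genl):
    """Concepts reachable from `start` by following one or more Genl links."""
    seen, stack = set(), list(genl.get(start, []))
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(genl.get(c, []))
    return seen


def equivalence_cycles(genl):
    """Groups of concepts that are mutually generalizations of one another."""
    reach = {c: reachable(c, genl) for c in genl}
    groups = []
    for a in genl:
        # b belongs with a if each can be reached from the other.
        group = {a} | {b for b in reach[a] if a in reach.get(b, set())}
        if len(group) > 1 and group not in groups:
            groups.append(group)
    return groups


# Hypothetical toy graph: A, B and C lie on a cycle; D merely points into it.
TOY_GENL = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
print(equivalence_cycles(TOY_GENL))
```

Each group returned is a candidate for the conjecture that its members specify equivalent
concepts, and hence for merging into one concept.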
Here is a little trick that deserves a couple paragraphs of its own. AM stores the answers to
common questions (like "What are all the specializations of Operation") explicitly, by
intentionally permitting redundant links to be maintained. If two requests arrive closely in
time, to test whether A is a generalization of B, then the result is stored by adding "A" as
an entry on the Generalizations facet of B, and adding "B" as a new entry on the
Specializations facet of A. The slight extra space is more than recompensed in cpu time
saved.
If the result were False (A turned out not to be a generalization of B) then the links would
specify that finding explicitly, so that the next request would not generate a long search
again. Such failures are recorded on two additional facets: Genlnot and Specnot. Since
most concept pairs A/B are related by Specnot and by Genlnot, the only entries which get
recorded here are the ones which were frequently called for by AM. If space ever gets tight,
all such facets can be wiped clean with no permanent damage done.
These two "shadow" facets (Genlnot and Specnot) are not useful or interesting in their own
right. If AM ever wished to know all the concepts which are not generalizations of C, the
fastest way would be to take the set-difference of all concepts and Generalizations(C). Since
they are quite incomplete, Genlnot and Specnot are used more like a cache memory: they
save time whenever they are applicable, and don't really cost much when they aren't
applicable. Because of their superfluity, these two facets will not be mentioned again. I only
mentioned them above because they do greatly speed up AM's execution time, and because
they may have some psychological analog.
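The caching trick described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not AM's LISP code: the toy network `GENL`, the two cache tables, and the function name `is_generalization` are all hypothetical, and the sketch caches every answer rather than only those requested twice in quick succession.

```python
from collections import defaultdict

# Hypothetical toy network: concept -> its immediate generalizations (Genl facet).
GENL = {
    "Oddprimes": ["Primes", "Oddnumbers"],
    "Primes": ["Numbers"],
    "Oddnumbers": ["Numbers"],
    "Numbers": ["Objects"],
    "Objects": ["Anything"],
    "Sets": ["Structures"],
    "Structures": ["Objects"],
}

# Redundant links kept purely as caches; either table may be wiped at any time.
genl_cache = defaultdict(set)     # positive answers (redundant Genl links)
genlnot_cache = defaultdict(set)  # negative answers (the "Genlnot" shadow facet)

def is_generalization(a, b):
    """Return True if concept a is a generalization of concept b."""
    if a in genl_cache[b]:
        return True
    if a in genlnot_cache[b]:
        return False
    # No cached answer: ripple upward from b through immediate Genl links.
    frontier, seen = [b], set()
    while frontier:
        current = frontier.pop()
        for g in GENL.get(current, []):
            if g == a:
                genl_cache[b].add(a)
                return True
            if g not in seen:
                seen.add(g)
                frontier.append(g)
    genlnot_cache[b].add(a)
    return False
```

Clearing `genlnot_cache` changes no answers; it only forces the next negative query to repeat its search, which is the sense in which such facets can be wiped with no permanent damage.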
14 This process was called RIPPLING, and was described in Chapter 4. See also footnote 34 in that chapter.
AM: Chapter 5, Discovery in Mathematics as Heuristic Search
5.2.2. Examples/Isa's
Usually, to show that a definition implies no contradiction, we proceed by example,
we try to make an example of a thing satisfying the definition. We wish to define
a notion A, and we say that, by definition, an A is anything for which certain
postulates are true. If we can demonstrate directly that all these postulates are
true of a certain object B, the definition will be justified; the object B will be an
example of an A.
—
Poincare'
Following Poincare', we say "concept A is an example of concept B" iff A satisfies B's
definition. 15 Equivalently, we say that "A isa B". It would be legal (in that situation) for "A"
to be an entry on B.Exs (the Examples facet of concept B) and for "B" to be an entry on
A.Isa (the Isas facet of concept A). Some earlier mention of the Examples and Isas facets
can be seen in Chapter 4, page 57.
The Examples facet of C does not contain all examples of C; rather, just the "immediate"
ones. The examples facet of Numbers will not contain "11" since it is contained in the
examples facet of Oddprimes. A "rippling" procedure is used to acquire a list of all
examples of a given concept. The basic equation is:
    Examples(x) = Specializations(Exs(Specializations(x)))
where Exs(x) is the contents of the examples facet of x. Examples(x) represents the final list
of all known items which satisfy the definition of x. Examples(x) thus must include Exs(x).
Specializations(x) might be more regularly written Spec*(x). That is, x itself, all members of x.Spec,
all members of their Spec facet, etc. Note the similarity of this to the formula for Isa's(x),
given on page 57. We could also write the above equation as follows:
    Examples(x) = Spec*(Exs(Spec*(x)))
As an illustration, we shall show how AM would recognize that "3" is an example of
Object:
15 What does this mean? B.Defn is a Lisp predicate, a Lambda expression. If it is fed A as its argument, and it returns True,
we say that A is a B, or that A satisfies B's definition. If B.Defn returns NIL, we say that A is not a B, or
that A fails B's definition. If B.Defn runs out of time before returning a T/NIL value, there is no definite
statement of this form we can make. In that case, AM might check to see whether A satisfies the definition
of some specialization of B, or whether A fails the definition of some generalization of B.
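[Figure: a chain of Spec links descending from Object to Mersenneprimes, followed by one Exs link from Mersenneprimes to "3".]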
As the graph above shows, AM would ripple in the Spec direction 4 times, moving from
Object all the way to Mersenneprimes; then descend once in the Exs direction, to reach "3";
then ripple 0 more times in the Spec direction. Thus "3" is seen to be an example of
Object, according to the above formula. Similarly, we see that "3" is also an example of
Number, of Primes, of Oddnumber, of Oddprimes, and of course an example of
Mersenneprimes.
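The rippling formula Examples(x) = Spec*(Exs(Spec*(x))) can be traced with a short Python sketch. The `SPEC` and `EXS` tables below are hypothetical stand-ins for the Spec and Exs facets (the intermediate concepts on the path from Object are assumed, chosen to match the concepts named above), and `spec_star` is taken to include its starting items, corresponding to rippling zero times.

```python
# Hypothetical facets: SPEC holds immediate specializations, EXS immediate examples.
SPEC = {
    "Object": ["Number"],
    "Number": ["Primes", "Oddnumber"],
    "Primes": ["Oddprimes"],
    "Oddnumber": ["Oddprimes"],
    "Oddprimes": ["Mersenneprimes"],
}
EXS = {"Mersenneprimes": ["3", "7"], "Oddprimes": ["11"], "Number": ["4"]}

def spec_star(items):
    """Spec*: the given items plus everything reachable through Spec links."""
    result, stack = [], list(items)
    while stack:
        current = stack.pop()
        if current not in result:
            result.append(current)
            stack.extend(SPEC.get(current, []))
    return result

def examples(x):
    """Examples(x) = Spec*(Exs(Spec*(x)))."""
    immediate = []
    for concept in spec_star([x]):
        immediate.extend(EXS.get(concept, []))
    # Items such as "3" have no Spec facet here, so the outer Spec* adds nothing.
    return set(spec_star(immediate))
```

With these tables, "3" is stored only on Mersenneprimes, yet it is found as an example of Object, Number, Primes, Oddnumber and Oddprimes as well.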
As with Generalizations/Specializations, the reason behind the incomplete pointer structure
is simply to save space, and to minimize the difficulty of updating the graph structure
whenever new links are found. Suppose a new Mersenne prime 16 is computed. Wouldn't it
be nice simply to add a single entry to the Exs facet of Mersenneprimes, rather than to
have to update the Exs pointers from a dozen concepts?
There is no harm if a few redundant links sneak in. In fact, frequently-used paths are
granted the status of single links. If two requests arrive closely in time, to test whether A
isa B, then the result is stored as an entry on the Isa facet of A, and the Exs facet of B. If
the result were False, then the links would specify that, so that the next request would not
generate a long search. In fact, there is a separate facet called Exsnot, and one called Isanot. These two shadowy facets are quite analogous to the unmentionable facets "Genlnot"
and "Specnot", discussed in the previous subsection.
"Isas" is the converse of "Examples". The format is the same, and (if A and B are both
concepts) A is an entry on B.Exs iff B is an entry on A.Isa. In other words, A is a member
of Examples(B) iff B is a member of Isa's(A). Due to an ugly lack of standardization, non-concepts
are allowed to exist. Thus, "3" is an example of Primes, but is not itself a concept.
Examples of X sometimes are concepts, of course: "IntersectoIntersect" is an example of
Composewithself. And Isa's(x) are always concepts. The highest level concept is called
"Anything". Its definition is the atom T. That is, "λ (x) T". This high-level concept can claim
everything as its examples.
The uses of the Exs/Isa's facets are similar to those for Genl/Spec (see previous subsection).
Their formats are quite a bit more complicated than the Genl/Spec facets' formats, when we
finally get to the implementation level, however. There is really a cluster of different facets
all related to Examples:
1. TYPICAL: This is a list of average examples. Care must be taken to include a wide
spectrum of allowable kinds of examples. For "Sets", these would include sets of
varying size, nesting, complexity, type of elements, etc.
2. BOUNDARY: Items which just barely pass the definition of this concept. This
might include items which satisfy the base step of a recursive definition, or items
which were intuitively believed to be nonexamples of the concept. For "Sets", this
might include the empty set.
3. BOUNDARYNOT: Items which just barely fail the definition. This might include
an item which had to be slightly modified during checking, like {A,B,A} becoming
{A,B}.
4. FOIBLES: Total failures. Items which are completely against the grain of this
concept. For "Sets", this might include the operation "Compose".
5. NOT: This is the "cache" trick used to store the answers to frequently-asked
questions. If AM frequently wants to know whether X is an example of Y, and the
answer is No, then much time can be saved by adding X as an entry to the Exsnot
facet of Y.
An individual item on these facets may just be a concept name, or it may be more
complicated. In the case of an operation, it is an item of the form <a1 a2 ... → v>; i.e., actual
arguments and the value returned. In the case of objects, it is an object of that form. An
Exs facet of the concept Sets might contain {a} as one entry.
16 "Mersenne prime", without a hyphen, refers to a number satisfying certain properties (see glossary). "Mersenne-primes",
with a hyphen, refers to one specific AM concept, a data structure with facets. Each Mersenne prime is an
example of the concept Mersenne-primes.
Here is a more detailed illustration. Consider the Examples facet of Setunion. It might
appear thus:
TYPICAL: {A}∪{A,B}→{A,B};
         {A,B}∪{A,B}→{A,B};
         {A,<3,4,3>,{A,B}}∪{3,A}→{A,<3,4,3>,{A,B},3}.
BOUNDARY: {}∪X→X 17
BOUNDARYNOT: {A,B}∪{A,C}→{A,B,A,C};
             {A,B,C,D}∪{E,F,G,H,I,J}→{A,B,C,E,F,G,H,I,J}
FOIBLES: <2,A,2>
NOT: no entries
The format for Isas is much simpler: there are only two kinds of links, and they're each
merely a list of concept names. Here is the Isa facet of Setunion:
ISA: (Operation 18 Domain=Rangeop)
ISANOT: (Structure Composition Predicate)
At some time, some rule asked whether Setunion isa Composition. As a result, the negative
response was recorded by adding "Composition" to the Isanot facet of Setunion, and
adding "Setunion" to the Exsnot subfacet of the Examples facet of the concept
Composition (indicating that Setunion was definitely not an example of Composition, yet
there was no reason to consider it a foible).
5.2.3. InDomainof/InRangeof
We shall say that A is in the domain of B (written "A Indomof B") iff
1. A and B are concepts
2. B isa Operation
3. A is equal to (or at least a specialization of) one of the domain components of the
operation B. That is, B can be executed using any example of A as one of its
arguments. 19
For example, Oddperfectsquares is Indomof Add, since Oddperfectsquares is a
specialization of Numbers, and Numbers is one component of the following entry which is
located on Add.Domain/range: <Numbers Numbers → Numbers>. Since Oddperfectsquares
is a specialization of Numbers, the operation 'Add' can be executed using any example of
Oddperfectsquares as its argument.
17 Actually, AM is not quite smart enough to use the variable X shown in the boundary examples. It would simply store a
few instances of this general rule, plus have an entry of the form <Equivalent Identity(X) and Setunion(X,{})> on the Exs facet of Conjectures. Notice that because of the asymmetric way Setunion was
defined, only one lopsided boundary example was found. If another definition were supplied, the converse
kind of boundary examples would be found.
18 This entry is redundant.
19 More formally, we can say that this occurs whenever some entry on the Domain/range facet of B has the form <D1 D2 ... → R> with some Di a member of Generalizations(A). Then A is a specialization of some domain component
of some entry on B.Domain/range.
As another example, Oddperfectsquares is also Indomof Setinsert, one of whose
Domain/range entries is <Anything Sets → Sets>. This is because Oddperfectsquares is a
specialization of Anything. So Setinsert is executed on two arguments, and the first
argument can be any example of Oddperfectsquares (the second argument must be an
example of Sets). 20
Although it can be recomputed very easily, we may wish to record the fact that A Indomof
B by adding the entry "B" to the Indomof facet of A. AM may even wish to add a new
entry to the Domain/range facet of B, in which A replaces the jth domain component of B
(where A is a specialization of that component). The two examples given above would
produce new domain/range entries of <Oddperfectsquares Numbers → Numbers> for Add,
and <Oddperfectsquares Sets → Sets> for Setinsert.
The semantic content of "Indomof" is: what can be done to any example of a given
concept C? Given an example of concept C, what operations can be run on that thing?
Here are some illustrations:
"Oddperfectsquares Indomof Setinsert" tells us that Setinsert can be run on any
particular Oddperfectsquare we can grab hold of.
"Operation Indomof Compose" tells us that Compose can be run on any operation we
want.
"Dom=Rangeoperation Indomof Compose" tells us that Compose can be run on any
operation which has its range equal to one of its domain components.
"Primes Indomof Squaring" tells us that we can apply the operation Squaring to any
particular prime number we wish.
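The Indomof test itself is small enough to sketch. In the Python below, the `GENL` links, the `DOMAIN_RANGE` entries (written as a tuple of domain components paired with a range), and the helper names are assumptions made for illustration; condition 2 of the definition (B isa Operation) is taken for granted by only listing operations in `DOMAIN_RANGE`.

```python
# Hypothetical Genl links and Domain/range entries, each written (domain components, range).
GENL = {
    "Oddperfectsquares": ["Oddnumbers", "Perfsquares"],
    "Oddnumbers": ["Numbers"],
    "Perfsquares": ["Numbers"],
    "Numbers": ["Objects"],
    "Sets": ["Structures"],
    "Structures": ["Objects"],
    "Objects": ["Anything"],
}
DOMAIN_RANGE = {
    "Add": [(("Numbers", "Numbers"), "Numbers")],
    "Setinsert": [(("Anything", "Sets"), "Sets")],
    "Setunion": [(("Sets", "Sets"), "Sets")],
}

def generalizations(concept):
    """The concept itself together with everything reachable through Genl links."""
    seen, stack = {concept}, [concept]
    while stack:
        for g in GENL.get(stack.pop(), []):
            if g not in seen:
                seen.add(g)
                stack.append(g)
    return seen

def indomof(a, operation):
    """True if a equals, or specializes, some domain component of the operation."""
    gens = generalizations(a)
    return any(component in gens
               for domain, _range in DOMAIN_RANGE[operation]
               for component in domain)
```

Under these assumed tables, Oddperfectsquares is Indomof both Add and Setinsert, but not Indomof Setunion, whose domain components are all Sets.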
Let us now turn from Indomof to the related facet Inranof.
We say that concept A is in the range of B iff B is an Activity 21 and A is a specialization
of the range of B. More precisely, we can say that "A Inranof B" iff
1. A and B are concepts
2. B isa Operation (i.e., B is an example of the concept "Operation")
3. Some entry on the Domain/range facet of B has the form <D1 D2 ... → R> with R
a generalization of A.
For example, Oddperfectsquares is Inranof Squaring, since (1) both of those are concepts,
(2) Squaring is an operation, (3) one of its Domain/range entries is <Numbers → Perfsquares>, and
Perfsquares is a generalization of Oddperfectsquares. 22
Here is what the Inranof facet of Oddperfectsquares might look like:
(Squaring Add TIMES Maximum Minimum Cubing)
Each of these operations will, at least sometimes, produce an odd perfect square as its
result.
Semantically, the Inranof relation between A and B means that one might be able to
produce examples of A by running operation B. Aha! This is a potential mechanism for
finding examples of a concept A. All you need do is get hold of Inranof(A), and run each
of those operations. Even more expeditious is to examine the Examples facets of each of
those operations, for already-run examples whose values should be tested using A.Defn, to
see if they are examples of A. AM relies on this in times of high motivation; it is too
"blind" a method to use heavily all the time.
This facet is also useful for generating situations to investigate. Suppose that the
Domain/range facet of Doubling contains only one entry: <Numbers → Numbers>. Then
syntactically, Oddnumbers is in the range of Doubling. Eventually a heuristic rule may
have AM spend some time looking for an example of Doubling, where the result was an
odd number. If none is quickly found, AM conjectures that it never will be found. Since
one definition of Oddnumber(x) is "Number(x) and Not(Evennumber(x))", the only non-odd
numbers are even numbers. So AM will increment the Domain/range facet of
Doubling with the entry <Numbers → Evennumbers>, and remove the old entry. Thus
Oddnumbers will no longer be Inranof Doubling. AM can of course chance upon this
conjecture in a more positive way, by noticing that all known examples of Doubling have
results which are examples of Evennumbers. 23
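Both uses of Inranof described above (harvesting already-run examples, and noticing that an operation never yields a member of a concept) can be sketched together. The stored examples in `OP_EXAMPLES` and all function names below are hypothetical; the two predicates merely stand in for the Defn facets of Oddnumbers and Oddperfectsquares.

```python
import math

# Hypothetical already-run examples on each operation's Examples facet,
# stored as (arguments, value) pairs.
OP_EXAMPLES = {
    "Squaring": [((3,), 9), ((4,), 16), ((5,), 25)],
    "Cubing": [((2,), 8), ((9,), 729)],
    "Doubling": [((1,), 2), ((3,), 6), ((8,), 16)],
}

def is_odd(v):
    """Stand-in for Oddnumbers.Defn."""
    return v % 2 == 1

def is_odd_perfect_square(v):
    """Stand-in for Oddperfectsquares.Defn."""
    root = math.isqrt(v)
    return is_odd(v) and root * root == v

def examples_via_inranof(defn, operations):
    """Collect stored values of the given operations that satisfy defn."""
    return {value
            for op in operations
            for _args, value in OP_EXAMPLES[op]
            if defn(value)}

def never_produces(defn, operation):
    """True if no stored example of the operation has a value satisfying defn."""
    return not examples_via_inranof(defn, [operation])
```

Here the stored values of Squaring and Cubing yield 9, 25 and 729 as odd perfect squares, while no stored value of Doubling is odd, which is the empirical basis for the conjecture about Doubling.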
A more productive result is suggested by examining the cases where Oddperfectsquares
are the result of cubing. The smallest such odd numbers are 1, 729, and 15625. In general,
these numbers are all those of the form (2n+1)^6. How could AM notice such an awkward
relationship?
The general question to ask, when A Inranof B, is "What is the set of domain items whose
values (under the operation B) are A's?" In case the answer is "All" or "None", some special
modifications can be made to the Domain/range facets and Indomof, Inranof facets of
various concepts, and a new conjecture can be printed. In other cases, a new concept might
get created, representing precisely the set of all arguments to B which yield values in A. If
you will, this is the inverse image of A, under operation B. In the case of B a predicate,
this might be the set of all arguments which satisfy the predicate.
22 Why? Because Generalizations(Oddperfectsquares) is the set of concepts (Oddnumbers Perfsquares Numbers
Objects Anyconcept Anything), hence contains Perfsquares. So Perfsquares is a generalization of Oddperfectsquares.
23 This positive approach is in fact the way AM noticed this particular relationship.
6 Wrong. That was an exponent, not a footnote!
In the case of B=Cubing and A=Oddperfectsquares, the heuristic mentioned above will
have AM create a new concept: the inverse image of Oddperfectsquares under the
operation of Cubing. That is, find numbers whose cubes are Oddperfectsquares. It is
quickly noticed that such numbers are precisely the set of Oddperfectsquares themselves!
So the Domain/range facet of Cubing might get this new entry: <Oddperfectsquares → Oddperfectsquares>. But not all squares can be reached by cubing, only a few of them
can. AM will notice this, and the new range would then be isolated and might be renamed
by the user "Perfectsixthpowers". Note that all this was brought on by examining the Inranof facet of Oddperfectsquares. "Cubing" was just one of the seven entries there.
There are six more stories to tell in this tiny nook of AM's activities.
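The Cubing story can be checked by direct computation. The sketch below is an illustration only: `inverse_image` is a hypothetical helper that filters a finite list of trial arguments, whereas AM would work from its stored examples.

```python
import math

def is_odd_perfect_square(n):
    """Stand-in for Oddperfectsquares.Defn."""
    root = math.isqrt(n)
    return n % 2 == 1 and root * root == n

def cube(n):
    return n ** 3

def inverse_image(defn, operation, domain_items):
    """All arguments whose value under the operation satisfies defn."""
    return [x for x in domain_items if defn(operation(x))]

# Numbers below 30 whose cubes are odd perfect squares.
preimage = inverse_image(is_odd_perfect_square, cube, range(1, 30))
cubes = [cube(x) for x in preimage]
sixth_powers = [(2 * n + 1) ** 6 for n in range(3)]
```

The preimage comes out as 1, 9, 25 (odd perfect squares themselves), and their cubes 1, 729, 15625 coincide with the first three values of (2n+1)^6.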
How exactly does AM go about gathering the Inranof and Indomof lists? Given a
concept C, AM can scan down the global tree of operations (the Exs and Spec links below
the concept 'Active'). For if C is not Indomof F, it certainly won't be Indomof any
specialization of F. Similarly, if it can't be produced by F, it won't be produced by any
specialization of F. If you can't get x using Doubling you'll never get it by Quadrupling. So
AM simply ripples around, as usual. The precise code for this algorithm is of little interest.
There are not that many operations, and it is cheap to tell whether X is a specialization of
a given concept, so even an exhaustive search wouldn't be prohibitive. Finally, recall that
such a search is not done all the time. It will be done initially, perhaps, but after that the
Indomof and Inranof networks will only need slight updating now and then.
5.2.4. Views
Often, two concepts A and B will be inequivalent, yet there will be a "natural" bijection
between one and (a subset of) the other. For example, consider a finite set S of atoms, and
consider the set of all its subsets, 2^S, also called the power set of S. Now S is a member of,
but not a subset of, 2^S (e.g., if S={x,y,...}, then x is not a member of 2^S). On the other hand,
we can identify or view S as a subset by the mapping v→{v}. Then S is associated with the
following subset of 2^S: { {x}, {y},... }. Why would we want to do this? Well, it shows that S is
identified with a proper subset of 2^S, and indicates that S has a lower cardinality
(remember: all sets are finite).
As another example, most of us would agree that the set {x, {y}, z} can be associated with
the following bag: (x, {y}, z). Each of them can be viewed as the other. Sometimes such a
viewing is not perfectly natural, or isn't really a bijection: how could the bag (2, 2, 3) be
viewed as a set? Is {2,3} better or worse than {2,{2},3}?
The View facet of a concept C describes how to view instances of another concept D as if
they were C's. For example, this entry on the View facet of Sets explains how to view any
given structure as if it were a Set:
Structure: λ (x) Encloseinbraces(Sort(Removemultipleelements(x)))
If given the list <z,a,c,a>, this little program would remove multiple elements (leaving
<z,a,c>), sort the structure (making it <a,c,z>), and replace the "<...>" by "{...}", leaving the
final value as {a,c,z}. Note that this transformation is not 1-1; a different list (for example, <a,c,z>) would get
transformed into this same set. On the other hand, it may be more useful than
transforming the original list into {z,{a,{c,{a}}}} which retains the ordering and multiple
element information. Both of those transformations may be present as entries on the View
facet of Sets.
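The two candidate View programs can be compared side by side. In this Python sketch a list is an ordinary Python list of one-letter strings and a "set" is rendered as a brace string; the function names mirror the ones in the text but the implementations, and the sample input <z,a,c,a>, are assumptions made for illustration.

```python
def remove_multiple_elements(xs):
    """Keep the first occurrence of each element, preserving order."""
    out = []
    for x in xs:
        if x not in out:
            out.append(x)
    return out

def enclose_in_braces(xs):
    """Render a sequence of element names as a brace-delimited set."""
    return "{" + ",".join(xs) + "}"

def view_as_set(xs):
    """Encloseinbraces(Sort(Removemultipleelements(x)))."""
    return enclose_in_braces(sorted(remove_multiple_elements(xs)))

def view_keeping_order(xs):
    """Nested encoding that retains ordering and multiple-element information."""
    if not xs:
        return "{}"
    if len(xs) == 1:
        return "{" + xs[0] + "}"
    return "{" + xs[0] + "," + view_keeping_order(xs[1:]) + "}"
```

For the assumed input, `view_as_set(["z", "a", "c", "a"])` gives `{a,c,z}` while `view_keeping_order` gives `{z,{a,{c,{a}}}}`. The first view is not 1-1, since `["a", "c", "z"]` maps to the same result; the second keeps distinct lists distinct.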
As it turns out, the View facet of Sets actually contains only the following information:
Structure: λ (x) Encloseinbraces(x)
Thus the Viewing will produce entities which are not quite sets. Eventually, AM will get
around to executing a task of the form "Check Examples of Sets", and at that time the error
will be corrected. One generalization of Sets is NomultipleelementsStructures, and one of
its entries under Examples.Check says to remove all multiple elements. Similarly,
Unorderedstructures is a generalization of Sets, and one of its Examples.Check subfacet
entries says to sort the structure. If either of these alters the structure, the old structure is
added to the Boundarynot subfacet (the 'Justbarelymiss' kind) of Examples facet of Sets.
The syntax of the View facet of a concept C is a list of entries; each entry specifies the name
of a concept, X, and a little program P. If it is desired to view an instance of X as if it were
a C, then program P is run on that X; the result is (hopefully) a C. The programs P are
opaque to AM; they must have no side effects and be quick.
Here is an entry on the View facet of Singleton:
Anything: λ (x) Setinsert(x, PHI)
In other words, to view anything as a singleton set, just insert it into the empty set. Note
that this is also one way to view anything as a set. As you've no doubt guessed, there is a
general formula explaining this:
    Views(X) = View(Specializations(X))
Thus, to find all the ways of viewing something as a C, AM ripples away from C in the
Spec direction, gathering all the View facets along the way. All of their entries are valid
entries for C.View as well.
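The formula for Views can be sketched in the same rippling style. The tables below are hypothetical: `SPEC` records that Singleton is a specialization of Sets, and each `VIEW` entry pairs a source concept with a small Python function standing in for the program P (Python frozensets are used as the target representation purely for convenience).

```python
# Hypothetical Spec links and View facets; each View entry maps a source concept to a program.
SPEC = {"Sets": ["Singleton"], "Singleton": []}
VIEW = {
    "Sets": {"Structure": lambda x: frozenset(x)},
    "Singleton": {"Anything": lambda x: frozenset([x])},
}

def views(concept):
    """Gather View entries from the concept and all of its specializations."""
    collected, stack, seen = [], [concept], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for source, program in VIEW.get(current, {}).items():
            collected.append((current, source, program))
        stack.extend(SPEC.get(current, []))
    return collected
```

Asking for `views("Sets")` returns both the Structure entry stored on Sets and the Anything entry stored on Singleton, so inserting an item into the empty set is also found as a way of viewing anything as a set.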
In addition to these built-in ways of using the Views facets, some special uses are made in
individual heuristic rules. Here is a heuristic rule which employs the Viewing facets of
relevant concepts in order to find some examples of a given concept C:
IF the current task is to Fillin Examples of C,
and C has some entries on its View facet,
and one of those entries indicates a concept X which has some known Examples,
THEN run the associated program P on each member of Examples(X),
and add the following task to the agenda: "Check Examples of C", for the following
reason: "Some very risky techniques were used to find examples of C", and
that reason's rating is computed as: Average(Worth(X), the examples of C
found in this manner).
Say the task selected from the agenda was "Fillin Examples of Sets". We saw that one
entry on Sets.View was Structure: λ (x) Encloseinbraces(x). Thus it is of the form X: P,
with X=Structure. The above heuristic rule will trigger if any examples of Structures are
known. The rule will then use the View facet of Sets to find some examples of Sets. So AM
will go off, gathering all the examples of structures. Since Lists is a Specialization of
Structure, the computation of Examples(Structures) will eventually ripple downwards and
ask for Examples of Lists. If the Examples facet of Lists contains the entry <z,a,c,a,a>, then
this will be retrieved as one of the members of Examples(Structure). The heuristic rule takes
each such member in turn, and feeds it to Sets.View's little program P. In this case, the
program replaces the list brackets with set braces, thus converting <z,a,c,a,a> to {z,a,c,a,a}.
In this manner, all the existing structures will be converted into sets, to provide examples of
sets. After all such conversions take place, a great number of potential examples of Sets will
exist. The final action of the right side of the above heuristic rule is to add the new task
"Check examples of Sets" to the agenda. When this gets selected, all the "slightly wrong"
examples will be fixed up. For example, {z,a,c,a,a} will be converted to {a,c,z}.
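The two-stage behavior (a risky fill-in step followed by a later checking task) can be sketched as follows. Everything here is an assumed simplification: a "set" is a tagged tuple, the agenda is a plain list, and `check_examples` folds the two Examples.Check fix-ups (remove multiple elements, sort) into one function that also records altered candidates as Boundarynot entries.

```python
def enclose_in_braces(structure):
    """Sets.View program: relabel a structure as a set, without checking it."""
    return ("set", tuple(structure))

def fill_in_examples_of_sets(structure_examples, agenda):
    """Right side of the heuristic rule: convert, then schedule a check."""
    unchecked = [enclose_in_braces(s) for s in structure_examples]
    agenda.append(("Check Examples of Sets",
                   "Some very risky techniques were used to find examples"))
    return unchecked

def check_examples(candidates):
    """Fix up each candidate; altered originals go to the Boundarynot list."""
    good, boundary_not = [], []
    for candidate in candidates:
        tag, items = candidate
        fixed = (tag, tuple(sorted(set(items))))
        if fixed != candidate:
            boundary_not.append(candidate)
        good.append(fixed)
    return good, boundary_not

agenda = []
unchecked = fill_in_examples_of_sets([["z", "a", "c", "a", "a"]], agenda)
good, boundary_not = check_examples(unchecked)
```

After the check, the unchecked candidate built from <z,a,c,a,a> has been replaced by the set with elements a, c, z, and the original malformed candidate is kept as a just-barely-miss example.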
If any reliance is made on those unchecked examples, there is the danger of incorrectly
rejecting a valid conjecture. This is not too serious, since the very first such reliance will
boost the priority of the task "Check examples of Sets", and it would then probably be the
very next task chosen.
5.2.5. Intuitions
The mathematician does not work like a machine; we cannot overemphasize the
fundamental role played in his research by a special intuition (frequently wrong),
which is not commonsense, but rather a divination of the regular behavior he
expects of mathematical beings.
—
Bourbaki
This facet turned out to be a "dud", and was later excised from all the concepts. It will be
described below anyway, for the benefit of future researchers. Feel free to skip directly to
the next subsection.
The initial idea was to have a set of a few (3-10) large, global, opaque LISP functions. Each
of these functions would be termed an "Intuition" and would have some suggestive name
like "jigsawpuzzle", "seesaw", "archery", etc. Each function would somehow model the
particular activity implied by its name. There would be a multitude of parameters which
could be specified by the "caller" as if they were the arguments of the function. The
function would then work to fill in values for any unspecified parameters. That's all the
function does. The caller would also have to specify which parameters were to be
considered as the "results" of the function.
For the seesaw, the caller might provide the weight of the left-hand-side sitter, and the final
position of the seesaw, and ask for the weight of the right-hand sitter. The function would
then compute that weight (as any random number greater/less-than the left-hand weight,
depending on the desired tilt of the board). Or, the caller might specify the two weights
and ask for the final position.
The Seesaw function is an expert on this subject; it has efficient code for computing any
values which can be computed, and for randomly instantiating any variables which may
take on any value (e.g., the first names of the people doing the sitting). When an individual
call is made on this function, the caller is not told how the final values of the variables were
computed, only what those values end up as.
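The calling convention for an Intuition function can be made concrete with a small sketch. This is a guess at the flavor of the Seesaw function, not a reconstruction of it: the parameter names, the use of positive integer weights, and the convention that `tilt` names the heavier side are all assumptions.

```python
import random

def seesaw(left=None, right=None, tilt=None, rng=random):
    """Fill in whichever of left weight, right weight, or tilt was left unspecified.

    Weights are assumed to be positive integers; tilt is "left", "right" or
    "level", naming the side that ends up lower.
    """
    if left is not None and right is not None:
        tilt = "left" if left > right else ("right" if right > left else "level")
    elif left is not None and tilt is not None:
        if tilt == "left":
            right = rng.randint(0, left - 1)             # any lighter weight
        elif tilt == "right":
            right = rng.randint(left + 1, 2 * left + 1)  # any heavier weight
        else:
            right = left
    else:
        raise ValueError("too few parameters specified")
    # The caller sees only the final values, not how they were computed.
    return {"left": left, "right": right, "tilt": tilt}
```

A call such as `seesaw(left=50, tilt="left")` returns some right-hand weight below 50, and `seesaw(left=50, right=70)` reports a tilt to the right; in both cases the caller learns the values but nothing about the method.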
So the Intuitions were to be experimental laboratories for AM, wherein it could get some
(simulated) real-world empirical data. If the seesaw were the Intuition for ">", and weight
corresponded to Numbers, then several relationships might be visualized intuitively (like the
antisymmetry of ">"). This is a nice idea, but in practice the only relationships derived in
this way were the ones that were thought up while trying to encode the Intuition functions.
This shameful behavior led to the excision of the Intuitions facets completely from the
system.
As another example, suppose AM is considering composing two relations R and S. If they
have no common Intuition reference, then perhaps they're not meaningfully composable. If
they do both tie into the same Intuition function, then perhaps that function can tell us
something about the composition. This is a nice idea, but in practice very few prunings
were accomplished this way, and no unanticipated combinations were fused.
Each Intuition entry is like a "way in" to one of the few global scenarios. It can be
characterized as follows:
1. One of the salient features of these entries, and of the scenarios, is that AM is
absolutely forbidden to look inside them, to try to analyze them. They are opaque.
Most Intuition functions use numbers and arithmetic, and it would be pointless to
say that AM discovered such concepts if it had access to those algorithms all along.
2. The second characteristic of an Intuition is that it be fallible. As with human
intuition, there is no guarantee that what is suggested will be verified even
empirically, let alone formally. Not only does this make the programming of
Intuition functions easier, it was meant to provide a degree of "fairness" to them.
AM wasn't cheating quite as much if the Seesaw function was only antisymmetric
90% of the time.
3. Nevertheless, the intuitions are very suggestive. Many conjectures can be proposed
only via them. Some analogies (see the next subsection) can also be suggested via
common intuitions.
After they were coded and running, I decided that the intuition functions were unfair; they
contained some major discoveries "built-in" to them. They had the power to propose
otherwise-obscure new concepts and potential relationships. They contributed nothing other
than what was originally programmed into them; they were not synergetic. Due to this
dubious character of the contributions by AM's few Intuition functions, they were removed
from the system. All the examples and all the discoveries listed in this document were made
without their assistance.
We shall now drop this de-implemented idea. I think there is some real opportunity for
research here. For the benefit of any future researchers in this area, let me point to the
excellent discussion of analogic representations in [Sloman 71].
5.2.6. Analogies
The whole idea of analogy is that 'Effects', viewed as a function of situation, is
a continuous function.
~ Poincare'
As with Views and Intuitions, this facet is useful for shifting between one part of the
universe and another. Views dealt with transformations between two specific concepts;
Intuitions dealt with transformations between a bunch of concepts and a large standard
scenario which was carefully handcrafted in advance. In contrast, this facet deals with
transforming between a list of concepts and another list of concepts.
Analogies operate on a much grander scale than Views. Rather than simply transforming a
few isolated items, they initiate the creation of many new concepts. Unlike Intuitions, they
are not limited in scope beforehand, nor are they opaque. They are dynamically proposed.
The concept of "prime numbers" is analogous to the notion of "simple groups". While not
isomorphic, you might guess at a few relationships involving simple groups just by my
telling you this fact: simple groups are to groups what primes are to numbers. 24
Let's take 3 elementary examples, involving very fundamental concepts.
1. AM was told how to View a set as if it were a bag.
2. AM was told it could Intuit the relation ">" as the predetermined "Seesaw" function.
3. AM, by itself, once Analogized that these two lists correspond:
<Bags, Bagoperations>
<BagsofT's, Those operations restricted to BagsofT's>
The concept of a bag, all of whose elements are "T"'s, is the unary representation of
numbers discovered by AM. When the above analogy (#3) is first proposed, there are many
known Bagoperations 25, but there are as yet no numeric operations. 26 This triggers one of
AM's heuristic rules, which spurs AM on to finding the analogues of specific Bag
operations. That is, what special properties do the bagoperations have when their domains
and/or ranges are restricted from Bags to BagsofT's (i.e., Numbers)? In this way, in fact,
AM discovers Addition (by restricting Bagunion to the Domain/range <BagsofT's BagsofT's → BagsofT's>), plus many other nice arithmetic functions.
24 If a group is not simple, it can be factored. Unfortunately, the factorization of a group into simple groups is not unique.
Another analogizing contact: For each prime p, we can associate the cyclic group of order p, which is of
course simple. AM never came up with the concept of simple groups; this is just an illustration for the
sophisticated reader.
25 i.e., all entries on Indomof(Bag) and Inranof(Bag); a few of these are: Baginsert, Bagunion, Bagintersection.
26 Examples of Operation whose domain/range contains "Number".
Well, if it leads to the discovery of Addition, that analogy is certainly worth having. How
would an analogy like that be proposed? As the reader might expect by now, the
mechanism is simply some heuristic rule adding it as an entry to the Analogies facet of a
certain concept. For example:
IF the current task has just created a canonical specialization C2
of concept C1, with respect to operations F1 and F2 [i.e., two members of C2
satisfy F1 iff they satisfy F2],
THEN add the following entry to the Analogies facet of C2:
<C1, Operations on or into C1>
<C2, Those operations restricted to C2's>
After generalizing "Equality" into the operation "Samelength", AM seeks to find a
canonical 27 representation for Bags. That is, AM seeks a canonizing function f, such that
(for any two bags x,y)
Samelength(x,y) iff Equal( f(x), f(y) )
Then the range of f would delineate the set of "canonical" Bags. AM finds such an f and
such a set of canonical bags: the operation f involves replacing each element of a bag by
"T", and the canonical bags are those whose elements are all T's. In this case, the above
rule triggers, with CUBags, C2=BagsofT's, FUSamelength, F2=Equality, and the analogy
which is produced is the one shown as example «3 above.
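The canonizing condition, Samelength(x,y) iff Equal(f(x), f(y)), and the way Addition falls out of Bagunion, can both be checked with a small sketch. Bags are represented here as Python lists compared without regard to order, and Bagunion is assumed to combine all elements of both bags; these representation choices and the function names are illustrative assumptions.

```python
from collections import Counter

def equal(x, y):
    """Bag equality: same elements with the same multiplicities."""
    return Counter(x) == Counter(y)

def same_length(x, y):
    return len(x) == len(y)

def canonize(bag):
    """The canonizing function f: replace each element of the bag by "T"."""
    return ["T"] * len(bag)

def bag_union(x, y):
    """Bagunion, assumed here to keep every element of both bags."""
    return list(x) + list(y)

def as_number(bag_of_ts):
    """Read a bag of T's as a unary numeral."""
    return len(bag_of_ts)

two = canonize(["a", "b"])
three = canonize(["c", "c", "d"])
total = as_number(bag_union(two, three))
```

On bags of T's, `same_length` and `equal` agree, and `bag_union` restricted to such bags behaves as addition: the union of the canonical bags for 2 and 3 is the canonical bag for 5.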
The Analogy facets are not implemented in full generality in the existing LISP version of
AM, and for that reason I shall refrain from delving deeper into their format. Since good
research has already been done on reasoning by analogy 28, I did not view it as a central
feature of my work. Very little space will be devoted to it in this document.
An important type of analogy which was untapped by AM was that between heuristics. If
two situations were similar, then conceivably the heuristics useful in one situation might be
useful (or have useful analogues) in the new situation. Perhaps this is a viable way of
enlarging the known heuristics. Such "meta-level" activities were kept to a minimum
throughout AM, and this proved to be a serious limitation.
Let me stress that the failure of the Intuitions facets to be nontrivial was due to the lack of
spontaneity which they possessed. Analogies facets were useful and "fair" since their uses
were not predetermined by the author.
27 A natural, standard form. All bags differing in only "unimportant" ways should be transformed into the same canonical
form. Two bags B1 and B2 which have the same length should get transformed into the same canonical bag.
28 An excellent discussion of reasoning by analogy is found in [Polya 54]. Some early work on emulating this
was reported in [Evans 68]; a more recent thesis on this topic is [Kling 71].
Chapter 5
AM: Discovery in Mathematics as Heuristic Search
5.2.7. Conjec's
Basically, facet Conjec of concept C is a list of relationships which involve C. We shall
discuss its semantics (uses of this facet) before its syntax.
Perhaps the most obvious use for this facet would be to hold conjectures which could not
be phrased simply. Yet it turns out that luckily (I think), all the conjectures "fell out"
naturally as trivial relationships, e.g. simply as arcs in the Genl/Spec/Exs/Isas pointer
format. Specifically, the modal conjecture had the form "the range of F is not just C, but
actually S".
For example, AM restricted TIMES to perfect squares, and noted that the result was not
merely a number but a perfect square each time. The unique factorization theorem was
noticed similarly (the range of Primefactorings was always a singleton, not merely a set).
In all the cases encountered by AM, there was never any real need for a place to "park" an
awkwardly-phrased conjecture, because no awkward conjecture could ever possibly be noticed.
Why is this so? AM was constructed explicitly on the assumption that all (enough?)
important theorems could be discovered in quite natural ways, as very simple (already-known) relationships on already-defined concepts. AM embodies several such assumptions
about math research; they are collected and packaged for display in Section 7.2.6, on
page 162.
What else might this facet be useful for, if not the storage of awkwardly-worded
conjectures? It might be a good place to store flimsy conjectures: those which were strong
enough to get considered, yet for which not much empirical confirmation had been done.
This in fact was one important role of this facet.
For example, AM was initially told that there are two specializations of Unorderedstructures, namely Bags and Sets. But AM was not given any examples of any structures at
all. Early on, it chose the task "Fillin examples of Bags" from the agenda. After filling them
in, a heuristic rule had AM consider whether or not this concept of Bags was really any
more specialized than the concept of Unorderedstructures. To test this empirically, AM
tried to verify whether or not there were any examples of Unorderedstructures that were
not examples of Bags. Failure to find any led to proposing the conjecture "All Unorderedstructures are really Bags". This could have been recorded quite easily: Bags was already
known to be a specialization of Unorderedstructures, so all AM had to do was tag it as a
generalization as well (add "Bags" to the Generalizations facet of the Unorderedstructures
concept). But a heuristic rule which knows about such equivalence conjectures first asked
whether there were any specializations of Unorderedstructures which had no known
examples, and for which AM had not (recently, at least) tried to fill in examples. In fact,
such an entry was "Sets". So the conjecture was stored on the Conjec facet of Unorderedstructures, and a new job was added to the agenda: "Fill in examples of Sets". The reason
was that such examples might disprove this flimsy conjecture. In fact, the job already
existed on the agenda, so only the new reason was added, and its priority was boosted.
When such examples were found, they did of course disprove that conjecture: each set was
an Unorderedstructure and yet was not a Bag. 29
This last example has suggested another use for this facet: holding heuristic rules which are
relevant to filling in and checking conjectures. For example, the Conjec facet of Operations
has some special heuristics which look for certain kinds of relationships involving any
given operation (e.g., "Pick any example F(x)=y. See what interesting statements can be
made about y. Then try to verify or disprove each one by looking at the values of all the
other known calls on operation F"). The Conjec facet of Anyconcept will contain knowledge
which is much more general in scope (e.g., "See whether concept C is an example of some
member of (C.Isa).Spec"). Compose.Conjec will contain more specific heuristics (e.g., "See if
the composition AoB is really no different from B").
Given any concept C, AM will ripple upwards, locating Isas(C), and collect the heuristics
which are tacked onto their Conjec facets. These heuristic rules will then be evaluated (in
order of increasing generality), and some conjectures will probably be proposed, checked,
discarded, modified, etc. In fact, each Conjec facet of each concept can have two separate
subfacets: Conjec.Fillin and Conjec.Check. The former contains heuristics for noticing
conjectures, the second for verifying and patching them up.
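The rippling just described might be sketched as follows. This is a hedged Python illustration, not AM's code: the dictionaries `isa` and `conjec`, the toy hierarchy, and the function names are all assumptions made for the example, and breadth-first distance is used as a stand-in for "increasing generality".

```python
def isas_by_generality(concept, isa):
    """Ripple upward from `concept`; nearer (more specific) concepts come first."""
    order, seen, frontier = [], set(), [concept]
    while frontier:
        next_frontier = []
        for c in frontier:
            if c not in seen:
                seen.add(c)
                order.append(c)
                next_frontier.extend(isa.get(c, []))
        frontier = next_frontier
    return order


def collect_heuristics(concept, isa, conjec, subfacet):
    """Gather the rules on the given Conjec subfacet of each concept reached."""
    rules = []
    for c in isas_by_generality(concept, isa):
        rules.extend(conjec.get(c, {}).get(subfacet, []))
    return rules


# Hypothetical toy hierarchy; the rule texts are the examples quoted above.
isa = {
    "SetinsertoSetinsert": ["Compose"],
    "Compose": ["Operation"],
    "Operation": ["Anyconcept"],
}
conjec = {
    "Compose": {"Fillin": ["See if the composition AoB is really no different from B"]},
    "Operation": {"Fillin": ["Pick any example F(x)=y and see what can be said about y"]},
    "Anyconcept": {"Fillin": ["See whether C is an example of some member of (C.Isa).Spec"]},
}
rules = collect_heuristics("SetinsertoSetinsert", isa, conjec, "Fillin")
```

Keeping Fillin and Check as separate keys mirrors the two subfacets; the same traversal would serve either one.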
There is yet another use for this facet, one of efficiency of storage. After discovering that
all primes except 2 are Oddprimes, there is very little reason to keep around Oddprimes
as a separate concept from Primes. Yet they are not quite equivalent. Primes.Conjec is a
good place for AM to store the conjecture "Prime(x) implies that x=2 or Odd(x)", and to
pull over to Primes any efficient definition/algorithm which Oddprimes might possess
(patching it up to work for "2"), and then destroy the concept Oddprimes. Another way
out is merely to destroy "Primes", and make 2 a distinguished number tacked onto the Justbarelymissed subfacet of Oddprimes.Exs (just like "1" is already).
Here is another example: AM discovers that SetinsertoSetinsert is the same as just Setinsert. That is, if you insert x twice into a set S, it's no different than inserting it just once
(because Sets don't allow multiple copies of the same element). Then there's no longer any
reason for keeping SetinsertoSetinsert hanging around as a separate concept. Instead, just
add a small new entry to Setinsert.Conjec and forget that space-consuming composition
forever.
There is another use of the Conjec facet: untangling paradoxes. It is with no sorrow that I
mention that this facility was never needed by AM: no genuine contradictions ever were
believed by AM. What would one look like? Suppose a chain of Spec links indicates that X
is a specialization of Y, and yet AM finds some example x of X which does not satisfy
Y.Definition. So X is and is not a specialization of Y. In such cases, the Conjec
facets of the concepts involved would indicate which of those Spec links were initially-supplied (hence unchallengeable), which links were created based on formal verifications
(barely challengeable), and which links were established based only on empirical evidence
(yes, these are the ones which would then fade into the sunset). If it has to, AM should be
able to recall the justification for each new link it created. AM can deduce this by
examining the Conjec facets of the concepts involved.

29 Bags are not multisets, although those two notions are very closely related to each other. Each set is a multiset by
definition, but each set is guaranteed by definition to not be a bag.
Periodically (at huge intervals) AM chose a task of the form "Check conjecs about C", at
which time all the entries on C.Conjec would be reexamined in light of existing data. Some
would be discarded (perhaps causing some Exs/Isa/Spec/Genl links to vanish with them).
Some of the conjectures might be believed much more strongly now (causing some new links
to be recorded). This turned out to be a surprisingly ineffective activity; very few new
revelations were obtained this way. Ultimately, this kind of task was muzzled (AM was
inhibited from doing this).
Theoretically, AM might possess rules which transformed a conjecture into a more efficient
algorithm for an operation, or which used the knowledge contained therein to speed up an
existing algorithm. Another sophisticated use of a conjec would be to set up a new
representation scheme for a concept 30.
Finally, the Conjec's facet is used as a showcase, to highlight some nice discovery that AM
wants to display. The user can look at the entries on each concept's Conjec facet (after a
long run) and get a better feeling for AM's abilities. If there are several powerful
conjectures listed for concept C, then it appears to the user that AM "understands" the
concept much better than if C.Conjec is empty.
Let's recapitulate the uses of this facet:
1. Store awkwardly-phrased conjectures: this wasn't really useful.
2. Store flimsy conjectures: apparent relationships worth remembering, yet not quite
believed.
3. Hold heuristics which notice and check conjectures.
4. Obviate the need for many similar concepts: Collapse the entire essence of a related
concept into one or two relationships involving this one.
5. Untangling paradoxes: a historic record, which wasn't really used.
6. Improve existing algorithms, definition testing procedures, representations.
7. Display AM's most impressive observed relationships in a form which is easily
inspectable by the user.
The syntax of this facet is simply a list of conjectures, where each conjecture has the form
of a relationship: (R a b c... d). R is the name of a known operation (in which case, a b c... are
its arguments and we claim that d is its value), or R is a predicate (and d is either True or
False), or R is the name of a kind of link (Genl, Spec, Isa, or Exs), and the claim is that a
and b are related by R. Here are three examples of conjectures, illustrating the possible
formats:
1. (Compose Setinsert Setinsert Setinsert). This says that if you apply the known
operation Compose, to the two arguments Setinsert and Setinsert, then the
resultant composition is indistinguishable from Setinsert.
2. (Samesize Insert(S,S) S False). That is, inserting a set into itself will always (for finite
sets) give you a set of a different length.
3. (Exampleof Primefactorings Function). This conjecture is the unique factorization
theorem. The operation which takes a number n, and finds all prime factorizations
of n, is claimed to be a function, not merely a relation. That is, each n has
precisely one such prime factoring.
30 e.g., after unique factorization is discovered, begin representing numbers as a bag of primes: n is represented as the
prime factorization of n. This is exponentially better than unary notation: bags of T's. AM had a tiny ability
for this kind of ongoing transformation, so crude it's better left undescribed.
5.2.8. Definitions
A typical way to disambiguate a concept from all others is to provide a "definition" for it. 31
Almost every concept had some entries initially supplied on its "Definitions" facet. The
format of this facet is a list of entries, each one describing a separate definition. A single
entry will have the following parts:
1. Descriptors: Recursive/Linear/Iterative, Quick/Slow, Opaque/Transparent,
Once-only/Early/Late, Destructive/Nondestructive.
2. Relators: Reducing to the definition of concept X, Same as Y except..., Specialized
version of Z, Using the definition of W, etc.
3. Predicate: A small, executable piece of LISP code, to tell if any given item is an
example of this concept.
The predicate or "code" part of the entry must be faithfully described by the Descriptors, and
must be related to other concepts just as the Relators claim. The predicate must be a LISP
function which takes argument(s) and returns either T or NIL (for True/False), depending on
whether or not the argument(s) can be regarded as examples of the concept.
The argument "{A B}" should satisfy the predicate of any valid definition entry of the Sets
concept. This triple of arguments <{A B}, {A C}, {A B C}> should satisfy any definition of
the Setunion concept, since the third is equal to the Setunion of the first two arguments.
Here is a typical entry from the Definitions facet of the Setunion concept:
Descriptors: Slow, Recursive, Transparent
Relators: Uses the algorithm for Setinsert, Uses the definition of Emptyset,
Uses the definition of Setequal, Uses the algorithm for Somemember,
Uses the algorithm for Setdelete, Uses the definition of Setunion
Code: λ (A B C)
  IF Emptyset.Defn(A) THEN Setequal.Defn(B,C) ELSE
    X ← Somemember.Alg(A)
    A ← Setdelete.Alg(X,A)
    B ← Setinsert.Alg(X,B)
    Setunion.Defn(A,B,C)
31 As EPAM studies showed [Feigenbaum 63], one can never be sure that this definition will specify the concept uniquely
for all time. In the distant future, some new concept may differ in ways thought to be ignorable at the
present time.
Let me stress that this is just one entry, from one facet of one concept.
The notation "X ← Somemember.Alg(A)" means that any one algorithm for the concept
Somemember should be accessed, and then it should be run on the argument A. The result,
which will be an element of A, is to be assigned the name "X". The effect is to bind the
variable X to some member of set A.
In the actual LISP implementation, the ELSE part of the conditional is really coded 32 as:
(Setunion.Defn (Setdelete.Alg (SETQ X (Somemember.Alg A)) A)
               (Setinsert.Alg X B)
               C)
This particular definition is not very efficient, but it is described as Transparent. That
means it is very well suited to analysis and modification by AM itself. Suppose some
heuristic rule wants to generalize this definition. It can peer inside it, and, e.g., replace the
base step call on Setequal, by a call on a generalization of Setequal (say "Samelength" 33).
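To make the recursive Setunion definition and its generalization concrete, here is a small Python sketch. It is an illustration under stated assumptions rather than AM's LISP: sets are Python sets, the recursion is non-destructive, and the parameter `base` (a hypothetical name) stands for the base-step predicate that a heuristic rule might swap out.

```python
def set_equal(b, c):
    """Stand-in for Setequal.Defn."""
    return b == c


def same_length(b, c):
    """Stand-in for Samelength, a generalization of Setequal."""
    return len(b) == len(c)


def setunion_defn(a, b, c, base=set_equal):
    """Test whether c is the union of a and b, in the style of the entry above:
    move the elements of a into b one at a time, then apply the base step."""
    if not a:                        # Emptyset.Defn(A)
        return base(b, c)            # base step, Setequal.Defn(B, C) by default
    x = next(iter(a))                # X <- Somemember.Alg(A)
    return setunion_defn(a - {x}, b | {x}, c, base)


# The triple from the text satisfies the definition.
assert setunion_defn({"A", "B"}, {"A", "C"}, {"A", "B", "C"})
# With the generalized base step and disjoint arguments, only sizes matter: 2 + 1 = 3.
assert setunion_defn({1, 2}, {3}, {"x", "y", "z"}, base=same_length)
```

Because the base step is isolated in one place, the generalization is a one-argument change, which is the practical meaning of "Transparent" here.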
How could different definitions help here? Suppose there were a definition which first
checked to see if the three arguments were Setequal to each other, and if so then it
instantly returned T as the value of the definition predicate; otherwise, it recurred into Setunion.Defn again. This might be a good algorithm to try at the very beginning, but if the
Equality test fails, we don't want to keep recurring into this definition. This algorithm
should thus have a descriptor labelling it ONCEONLY EARLY.
A typical kind of entry for the Definitions facet of an operation is to simply call on the
Algorithms part of that same concept. Here is such an entry from the Definitions facet of the
Setunion concept:
Descriptors: none
Relators: Uses the definition of Setequal, Uses the algorithm for Setunion
Code: λ (A B C) Setequal.Defn(C, Setunion.Alg(A,B))
This definition is a trivial call on the "Algorithms" facet of Setunion. That is, one way to
test whether C is the setunion of A and B, is simply to run setunion on A and B, and
compare the result against C. The descriptors and relators of the particular algorithm
which is chosen will then be added to the descriptors and relators which exist so far on this
entry. Note that the box above (like the box on the previous page) is simply one entry on
the Definitions facet of the Setunion concept.
32 The expression "(f.Defn a1 a2...)" means "apply the predicate part of a definition of f
to arguments a1, a2,...". This definition is to be randomly selected from the entries on the Definitions facet of concept f.
33 For disjoint sets, the new definition would specify the operation which we call "addition".
There are three purposes to having descriptors and relators hanging around:
1. For the benefit of the user. AM appears more intelligent because it can describe the
kind of definition it is using and why.
2. For the sake of efficiency. When all AM wants to do is to evaluate Setunion(A,B),
it's best just to grab a fast definition. When trying to generalize Setunion, it's
more appropriate to modify a very clean, transparent definition, even if it is a
slow one.
3. For the benefit of the heuristic rules. Often, a left- or a right-hand side will ask
about a certain kind of definition. For example, "If a transparent definition of X
exists, then try to specialize X".

Granted that Descriptors and Relators are useful, how do these "metalevel" modifiers get
filled in, for newly-created 34 concepts? All such powers are embedded in the fine structure
of the heuristic rules. This is true for the Algorithms facet as well, and will be illustrated in
the very next subsection.
Let me pull back the curtain a little further, and expose the actual implementation of these
ideas in AM. The secrets about to be revealed will not be acknowledged anywhere else in
this document. They may, however, be of interest to future researchers. Each concept may
have a cluster of Definition facets, just as it can have several kinds of Examples facets.
These include three types: Necessary and sufficient definitions, necessary definitions, and
sufficient definitions. These three types have the usual mathematical meanings. All that
has been alluded to before (and after this subsection) is the necc&suff type of definition (x is
an example of C if and only if x satisfies C.Def/necc&suff). Often, however, there will be a
much quicker sufficient definition (x satisfies C.Def/suf, only if x is certainly a C). Similarly,
entries on C.Def/nec are useful for quickly checking that x is not an example of C (to check
this, it suffices to verify that x fails to satisfy a necessary definition of C).
So given the task of deciding whether or not x is an example of C, we have many
alternatives:
1. If x is a concept, see if C is a member of x.ISA (if so, then x is an example of C).
2. Try to locate x within C.Exs. (depending upon the flavor of subfacet on which x is
found, this may show that x is or is not an example of C).
3. If x is a concept, ripple to collect ISA's(x), and see if C is a member of ISA's(x).
4. If there is a fast sufficient definition of C, see if x satisfies it.
5. If there is a fast necessary definition of C, see if x fails it (if so, then x is not an
example of C).
6. If there is a necessary and sufficient definition of C, see whether or not x satisfies
that definition (this may show that x is or is not an example of C).
7. Try to locate x within C.Exs. (depending upon the flavor of subfacet on which x is
found, this may show that x is or is not an example of C).
8. Recur: check to see if x is an example of any specialization of C.
9. Recur: check to see if x is not an example of some generalization of C (if so, then x
is not an example of C).
In fact, there is a LISP function, ISEXAMPLE, which performs those steps in that order.
34 For initially-supplied definition entries, the author hand-coded these modifiers.
At each moment, there is a timer set, so even if there is a necessary and sufficient definition
hanging around, it might run out of time before settling the issue one way or the other.
Each time the function recurs, the timer is granted a smaller and smaller quantum, until
finally it has too little to bother recurring anymore. There is a potential overlap of activity:
to see if x is an example of C, the function might ask whether x is or is not an example of
a particular generalization of C (step 9, above); to test that, AM might get to step 8, and
again ask if x is an example of C. Even though the timer would eventually terminate this
fiasco (and even though the true answer might be found despite this wasted effort) it is not
overly smart of AM to fall into this loop. Therefore, a stack is maintained, of all concepts
whose definitions the ISEXAMPLE function tried to test on argument x. As the function
recurs, it adds the current value of C to that stack; this value gets removed when the
recursion pops back to this level, when that recursive call "returns" a value.
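The interplay of the shrinking quantum and the stack of concepts under test can be sketched roughly as below. This Python fragment is a simplified assumption-laden illustration, not the ISEXAMPLE function itself: it covers only the definition test and the two recursive steps (8 and 9), it models the timer as a plain number that is halved at each recursion, and the `CONCEPTS` table with its field names is hypothetical.

```python
def is_example(x, c, concepts, quantum=8.0, stack=None):
    """Return True, False, or None (undecided within the allotted quantum)."""
    stack = [] if stack is None else stack
    if c in stack or quantum < 1.0:
        return None                  # already being tested on x, or too little time left
    stack.append(c)                  # record that C's definitions are being tried on x
    try:
        info = concepts[c]
        defn = info.get("defn")
        if defn is not None:
            return bool(defn(x))
        # Step 8: an example of any specialization of C is an example of C.
        for s in info.get("spec", []):
            if is_example(x, s, concepts, quantum / 2, stack) is True:
                return True
        # Step 9: a non-example of some generalization of C is a non-example of C.
        for g in info.get("genl", []):
            if is_example(x, g, concepts, quantum / 2, stack) is False:
                return False
        return None
    finally:
        stack.pop()                  # removed when the recursion returns to this level


CONCEPTS = {
    "Numbers": {"defn": lambda n: isinstance(n, int) and n >= 0, "spec": ["Evens"]},
    "Evens": {"genl": ["Numbers"], "spec": ["Multiplesof4"]},   # no definition supplied
    "Multiplesof4": {"defn": lambda n: isinstance(n, int) and n % 4 == 0, "genl": ["Evens"]},
    "P": {"genl": ["Q"]},            # P and Q refer only to each other
    "Q": {"spec": ["P"]},
}
```

With this table, asking about P reaches Q, which would ask about P again; the stack cuts that cycle immediately instead of waiting for the quantum to run out.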
5.2.9. Algorithms
Earlier, we said that each concept can have any facets from the universal fixed set of 25
facets. This is not strictly true. Sometimes, a whole class of concepts will possess a certain
type of facet which no others may meaningfully have. If C can have that facet, then so can
any specialization of C. Typically, there will be some concept C such that the examples of
C are precisely the set of concepts which can possess the new facet. That is, there will be a
domain of applicability for the facet, just as we defined such domains of applicability for
heuristics. For example, consider the "Domain/Range" facet. It is meaningful only to
"operations", but really is an important feature of all operations. Its domain of applicability
is Operation.


The kinds of facets, including all such limited "jargon" facets, are fixed once and for all.
New kinds of facets cannot be conceived and added by AM itself. Nor does AM have any
control over the domain of applicability of each facet.
If desired, one can view all this in a more general light. For each facet f, the only concepts
which can have entries for facet f are examples of some particular concept J(f) (the "J"
stands for "jargon"). J(f) is the domain of applicability of facet f. If C is any concept which
is not an example of J(f), then it can never meaningfully possess any entries for that facet f.
For almost all facets f, J(f) is "Anyconcept". Thus any concept can possess almost any facet.
For example, J(Defn)="Anyconcept", so any concept may have definitions.

There are a few more restricted facets. For example, J(Domain/range)="Operation". So only
operations can have domain/range facets 35. The concept "Sets", which is not an operation,
can't have a domain/range facet.
Similarly, J(Algorithms)="Actives". This facet is the subject of this section. The Algorithms
facet is present for all, but only for, Actives (predicates, relations, operations).
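The J(f) view amounts to a table lookup followed by an example-of test. A minimal Python sketch, assuming a hypothetical `ISAS` table that lists the concepts each concept is an example of (the three J values are the ones given in the text):

```python
# J(f): the domain of applicability of each facet f, as stated in the text.
J = {
    "Defn": "Anyconcept",
    "Domain/range": "Operation",
    "Algorithms": "Actives",
}

# Hypothetical toy data: the concepts each concept is an example of.
ISAS = {
    "Sets": {"Anyconcept"},
    "Setunion": {"Operation", "Actives", "Anyconcept"},
    "TIMES": {"Operation", "Actives", "Anyconcept"},
}


def may_have_facet(concept, facet):
    """A concept may possess entries for facet f only if it is an example of J(f)."""
    return J[facet] in ISAS[concept]


assert may_have_facet("Sets", "Defn")
assert not may_have_facet("Sets", "Domain/range")
assert may_have_facet("Setunion", "Algorithms")
```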


The representation is, as usual, a list of entries, each one describing a separate algorithm. A
single entry will have the following parts:
1. Descriptors: Recursive/Linear/Iterative, Quick/Slow, Opaque/Transparent,
Once-only/Early/Late, Destructive/Nondestructive.
2. Relators: Reducing to the algorithm for concept X, Same as Y except..., Specialized
version of Z's algorithm, Using the algorithm for W, etc.
3. Program: A small, executable piece of LISP code, for actually running C.
35 Actually, Predicates also have domain/range facets, even though the Range parts are all necessarily the
same: {T,F}.
Note the similarity to the format for the Definitions facets of concepts. Instead of a LISP
predicate, however, the Algorithms facets possess a LISP function (an executable piece of
code whose value will in general be other than True/False). That "program" part of the
entry must be faithfully described by the Descriptors, must be related to other concepts just
as the Relators claim, must take arguments and return values as specified in the
Domain/Range facet of C, and when run on any arguments, the resultant pair of arguments and value
must satisfy the Definitions facet of C.
There is an extra level of sophistication which is available but rarely used in AM. The
descriptors can themselves be small numericvalued functions. For example, instead of just
including the Descriptor "Quick", and instead of just giving a fixed number for the speed of
the algorithm, there might be a little program there, which looked at the arguments fed to
the algorithm, and then estimated how fast this algorithm would be. The main reason for
not using this feature more heavily is that most of the algorithms are fairly fast, and fairly
constant in performance. It would be silly to spend much time recomputing their efficiency
each time they were called. If the algorithm is recursive, this conjures up even sillier
pictures. The main reason in support of using this feature is of course "intelligence": in the
long run, processing a little bit before deciding which algorithm to run has to be the
winning solution. At the moment, it is not yet costeffective.
Here is a typical entry 36 from the Algorithms facet of the Setunion concept:
Descriptors: Slow, Recursive, Transparent
Relators: Uses the algorithm for Setinsert, Uses the definition of Emptyset,
Uses the algorithm for Somemember, Uses the algorithm for Setdelete,
Uses the algorithm for Setunion
Code: λ (A B)
  IF Emptyset.Defn(A) THEN B ELSE
    X ← Somemember.Alg(A)
    A ← Setdelete.Alg(X,A)
    B ← Setinsert.Alg(X,B)
    Setunion.Alg(A,B)
36 Note that it is similar to, but not identical to, the entry shown on page 86, of a Definition of Setunion.
Note that the Descriptors don't say whether this algorithm is destructive 37 or not. That
means that this same algorithm can be used either destructively or not, depending on what
AM wants. More precisely, it's up to the algorithms which get called on by this one. If they
are all chosen to be destructive, so will Setunion. If they all copy their arguments first, then
Setunion will not be destructive. For example, note how the algorithm calls on Setinsert(X,B). If this is destructive, then at the end B will have been physically modified to
contain X; the original contents of B will be lost.
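The point that destructiveness is inherited from the sub-algorithms can be seen in a short Python sketch. It is only an analogy to the LISP situation: Python sets stand in for list structures, and the helper names (`delete_copy`, `insert_inplace`, and so on) are hypothetical.

```python
def setunion_alg(a, b, some_member, set_delete, set_insert):
    """Recursive Setunion in the style of the entry above; whether it is
    destructive depends entirely on the helpers that are passed in."""
    if not a:                        # Emptyset.Defn(A)
        return b
    x = some_member(a)               # X <- Somemember.Alg(A)
    a = set_delete(x, a)             # A <- Setdelete.Alg(X, A)
    b = set_insert(x, b)             # B <- Setinsert.Alg(X, B)
    return setunion_alg(a, b, some_member, set_delete, set_insert)


def some_member(s):
    return next(iter(s))

# Non-destructive helpers: copy their argument first.
def delete_copy(x, s):
    return s - {x}

def insert_copy(x, s):
    return s | {x}

# Destructive helpers: physically modify their argument.
def delete_inplace(x, s):
    s.discard(x)
    return s

def insert_inplace(x, s):
    s.add(x)
    return s


a, b = {1, 2}, {2, 3}
safe = setunion_alg(a, b, some_member, delete_copy, insert_copy)
assert safe == {1, 2, 3} and a == {1, 2} and b == {2, 3}    # arguments untouched

fast = setunion_alg(a, b, some_member, delete_inplace, insert_inplace)
assert fast is b and b == {1, 2, 3} and a == set()          # original contents lost
```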
This particular algorithm is not very efficient, but it is described as Transparent. That
means it is very well suited to analysis and modification by AM itself. Suppose some
heuristic rule wants to specialize this algorithm. It can peer inside it, and, e.g., replace the
variable X in (Setinsert X B) by the constant "T". 38
Why should AM bother storing multiple algorithms for the same concept? Consider this
example again, of Setunion. Suppose there were an algorithm which first checked to see if
the two arguments were Equal to each other, and if so then it instantly returned one of
them as the final value for Setunion; otherwise, it recurred into Setunion.Alg. This might
be a good algorithm to try at the very beginning, but if the Equality test fails, we don't want
to keep recurring into this definition. This algorithm should thus have a descriptor
labelling it ONCEONLY EARLY.
Also, there is an iterative algorithm which checks to see if A equals B, and if so then it
returns B. If not, the algorithm proceeds to check that A is shorter than B, and if not it
switches them. Finally, it enters an iterative loop similar to the recursive one above: it
repeatedly transfers an element from A to B, using Somemember, Setdelete and Setinsert.
This iterative loop repeats until A becomes empty. While more efficient than the recursive
one, this definition is less transparent.
An even more efficient algorithm is provided, but it is totally opaque:
Descriptors: Quick, Nonrecursive, Nondestructive, Opaque
Relators: none
Code: λ (A B) (UNION A B)
This algorithm calls on the LISP function "UNION" to perform the setunion. It is the
"best" algorithm to choose unless space is critical, in which case a destructive algorithm must
be chosen, or unless AM wishes to inspect it rather than run it, in which case a transparent
one must be picked.
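Choosing among several algorithm entries by their descriptors might look like the following Python sketch. The selection policy, the purpose names ("run", "inspect", "save space"), and the third, destructive entry are assumptions made for illustration; only the first and last entries correspond to ones described in the text.

```python
def recursive_union(a, b):
    """Slow, transparent, recursive: move elements of a into b one at a time."""
    if not a:
        return b
    x = next(iter(a))
    return recursive_union(a - {x}, b | {x})


def destructive_union(a, b):
    """Hypothetical space-saving entry: physically modifies b."""
    b.update(a)
    return b


SETUNION_ALGORITHMS = [
    {"descriptors": {"Slow", "Recursive", "Transparent"}, "program": recursive_union},
    {"descriptors": {"Destructive", "Opaque"}, "program": destructive_union},
    {"descriptors": {"Quick", "Nonrecursive", "Nondestructive", "Opaque"},
     "program": lambda a, b: a | b},     # plays the role of (UNION A B)
]


def choose_algorithm(entries, purpose):
    """Pick the first entry whose descriptors suit the stated purpose."""
    wanted = {"inspect": "Transparent", "save space": "Destructive"}.get(purpose, "Quick")
    for entry in entries:
        if wanted in entry["descriptors"]:
            return entry
    return entries[0] if entries else None


best = choose_algorithm(SETUNION_ALGORITHMS, "run")
assert best["program"]({1, 2}, {2, 3}) == {1, 2, 3}
```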
37 A LISP algorithm is destructive if it physically, permanently modifies the list structures it is fed as arguments. Setunion(A,B) is destructive if, after running,
A and B don't have the same values they started with. The
advantages of destructive operations are increased speed, decreased space used up, fewer assignment
statements. The danger of course is in accidentally destroying some information you didn't mean to.
38 This is a fairly useless new operation, of course: it adds T to B unless A is
empty, in which case this operation has no effect at all.
All the details about understanding the descriptors and relators are embedded in the fine
structure of the heuristic rules. A left-hand side may test whether a certain kind of
algorithm exists for a given concept. A right-hand side which fills in a new algorithm must
also worry about filling in the appropriate descriptors and relators. As with newly created
concepts, such information is trivial to fill in at the time of creation, but becomes much
harder after the fact.
Here is a typical heuristic rule which results in a new entry being added to the Algorithms
facet of the newlycreated concept named ComposeSetIntersect&SetIntersect:
IF the task is to Fillin Algorithms for F,
and F is an example of Composition,
and F has a definition of the form F=GoH,
and F has no transparent, nonrecursive algorithm,
THEN add a new entry to the Algorithms facet of F,
with Descriptors: Transparent, Nonrecursive
with Relators: Reducing to G.Alg and H.Alg, Using the Definition of <G.Domain>
with Program: λ (<H.Domain>, <G.Domain>-1, X)
  (SETQ X (H.Alg <H.Domain>))
  (AND
    (<G.Domain>.Defn X)
    (G.Alg X <G.Domain>-1))
The intent of the little program which gets created is to apply the first operator, check that
the result is in the domain of the second, and then apply the second operator. The
expression <G.Domain> means find a domain/range entry for G, count how many domain
components there are, and form a list that long from randomly-chosen variable names
(u,v,w,x,y,z).
For the case mentioned above, F = ComposeSetIntersect&SetIntersect, G = SetIntersect,
and H = SetIntersect. The domain of G is a pair of Sets, so <G.Domain> is a list of 2
variables, say (u v). Similarly, <G.Domain>-1 is a list of 1 variable, say (w). Putting all
this together, we see that the new entry created for ComposeSetIntersect&SetIntersect would look like this:

Descriptors: Nonrecursive, Transparent
Relators: Reducing to SetIntersect.Alg, Using the definition of Sets
Code: λ (u,v,w,X)
  (SETQ X (SetIntersect.Alg u v))
  (AND
    (Sets.Defn X)
    (SetIntersect.Alg X w))
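The same construction (apply the first operator, check the intermediate result against the second operator's domain, then apply the second) can be sketched in Python. The function `compose_algorithm` and its parameters are hypothetical names; returning `None` when the domain check fails is an assumption that plays the role of the NIL which the AND would yield.

```python
def compose_algorithm(g_alg, h_alg, g_domain_defn, h_arity):
    """Build an algorithm for F = GoH from algorithms for G and H."""
    def f_alg(*args):
        h_args, g_rest = args[:h_arity], args[h_arity:]
        x = h_alg(*h_args)               # (SETQ X (H.Alg ...))
        if not g_domain_defn(x):         # check X against the domain of G
            return None                  # stands in for NIL from the AND
        return g_alg(x, *g_rest)         # (G.Alg X ...)
    return f_alg


def set_intersect(a, b):
    return a & b


def sets_defn(x):
    return isinstance(x, (set, frozenset))


# F = ComposeSetIntersect&SetIntersect, with G = H = SetIntersect.
compose_intersect = compose_algorithm(set_intersect, set_intersect, sets_defn, h_arity=2)
assert compose_intersect({1, 2, 3}, {2, 3, 4}, {3, 4, 5}) == {3}
```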
Let me make clear here one "kluge" of the AM program. At times, AM will be capable of
producing only a slow algorithm for some new concept C. For example, TIMES⁻¹(x) was
originally defined by AM as a blind, exhaustive search for bags of numbers whose product
is x. As AM uses that algorithm more and more, AM records how slow it is. Eventually, a
task is selected of the form "Fillin new algorithms for C", with the two reasons being that the
existing algorithms are all too slow, and they are used frequently. At this point, AM should
draw on a body of rules which take a declarative definition and transform it into an
efficient algorithm, or which take an inefficient algorithm and speed it up. Doing a good job
on just those rules would be a mammoth undertaking, and the author decided to omit them.
Instead, the system will occasionally beg the user for a better (albeit opaque) algorithm for
some particular operation. In general, the only requests were for inverse operations, and
even then only a few of them. The reader who wishes to know more about rules for
creating and improving LISP algorithms is directed to [Darlington and Burstall 73]. A
more general discussion of the principles involved can be found in [Simon 72].
5.2.10. Domain/Range
Another facet possessed only by active concepts is Domain/Range. The syntax of this facet
is quite simple. It is a list of entries, each of the form <D1 D2 ... → R>, where there can be
any number of Di's preceding the arrow, and R and all the Di's are the names of concepts.
Semantically, this entry means that the active concept may be run on a list of arguments
where the first one is an example of D1, the second an example of D2, etc., and in that case
will return a value guaranteed to be an example of R. In other words, the concept may be
considered a relation on the crossproduct D1×D2×...×R. We shall say that the domain of
the concept is D1×D2×... and that its range is R. Each Di is called a component of the
domain.
I
I
For example, here is what the Domain/Range facet of TIMES might look like:
{
< Numbers Numbers -> Numbers >
< OddNumbers OddNumbers -> OddNumbers >
< EvenNumbers EvenNumbers -> EvenNumbers >
< OddNumbers EvenNumbers -> EvenNumbers >
< PerfSquares PerfSquares -> PerfSquares >
< BagsofNumbers -> Numbers >
}
Here is what the Domain/Range facet of SetUnion might look like:
{
< Sets Sets -> Sets >
< Nonemptysets Sets -> Nonemptysets >
< SetsofSets -> Sets >
}
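As an illustration only, a Domain/Range facet can be modelled as a list of (domain, range) pairs and consulted to see which ranges are promised for a given argument list. The sketch below is in Python rather than AM's LISP; the membership tests in CONCEPTS and the name promised_ranges are assumptions made for the example, and the BagsofNumbers entry is omitted for brevity.

```python
from math import isqrt


def is_number(v):
    """Hypothetical membership test for the concept Numbers."""
    return isinstance(v, int) and not isinstance(v, bool) and v >= 0


# Hypothetical membership tests standing in for AM's concept definitions.
CONCEPTS = {
    "Numbers": is_number,
    "OddNumbers": lambda v: is_number(v) and v % 2 == 1,
    "EvenNumbers": lambda v: is_number(v) and v % 2 == 0,
    "PerfSquares": lambda v: is_number(v) and isqrt(v) ** 2 == v,
}

# Each entry is (tuple of domain components, range).
TIMES_DOMAIN_RANGE = [
    (("Numbers", "Numbers"), "Numbers"),
    (("OddNumbers", "OddNumbers"), "OddNumbers"),
    (("EvenNumbers", "EvenNumbers"), "EvenNumbers"),
    (("OddNumbers", "EvenNumbers"), "EvenNumbers"),
    (("PerfSquares", "PerfSquares"), "PerfSquares"),
]


def promised_ranges(facet, args):
    """Ranges guaranteed by every entry whose domain components match args."""
    promised = []
    for domain, rng in facet:
        matches = len(domain) == len(args) and all(
            CONCEPTS[d](a) for d, a in zip(domain, args)
        )
        if matches and rng not in promised:
            promised.append(rng)
    return promised


def times(x, y):
    return x * y


print(promised_ranges(TIMES_DOMAIN_RANGE, (4, 9)))
```

For the arguments (4, 9) two entries apply, so the value 36 is promised to be both a number and a perfect square.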
The Domain/Range part is useful for pruning away absurd compositions, and for
syntactically suggesting compositions and "coalescings". Let's see what this means.
Suppose some rule sometime tried to compose TIMES o Setunion. A rule tacked onto
Compose says to ensure that the range of Setunion at least intersects (and preferably is
equal to) some component of the domain of TIMES. But there are no entities which are
both sets and numbers;³⁹ ergo this fails almost instantaneously.
This is too bad, since there was probably a good reason (e.g., intuition) for trying this
composition. If the activation energy (priority of the current task) is high enough, AM will
continue trying to force it through. The failure arose because Sets could not be viewed as if
they were Numbers. A relevant rule says:
IF you want to view X's as if they were Y's,
THEN seek an interesting operation F from X to Y, to do the viewing.
So AM had to locate any and all operations whose domain/range had an entry of the form
< Sets -> Numbers >. The only such operation known to AM at the time was F=Length. So
the composition produced was TIMES[X, Length(Setunion(Y,Z))].
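The pruning test and the search for a "viewing" operation can be sketched in a few lines of Python. This is a toy model, not AM's code: the table DOMAIN_RANGE holds one entry per operation, two concepts are assumed to intersect only when they are identical, and all function names are hypothetical.

```python
# One domain/range entry per operation: (domain components, range).
DOMAIN_RANGE = {
    "TIMES": [(("Numbers", "Numbers"), "Numbers")],
    "Setunion": [(("Sets", "Sets"), "Sets")],
    "Length": [(("Sets",), "Numbers")],
}

# Toy implementations, used only to run the composed operation.
IMPLEMENTATIONS = {
    "TIMES": lambda x, y: x * y,
    "Setunion": lambda a, b: a | b,
    "Length": len,
}


def intersects(a, b):
    # Assumption: in this toy world, concepts share examples only if identical.
    return a == b


def can_compose(outer, inner):
    """True if the range of inner intersects some domain component of outer."""
    return any(
        intersects(inner_range, component)
        for _, inner_range in DOMAIN_RANGE[inner]
        for outer_domain, _ in DOMAIN_RANGE[outer]
        for component in outer_domain
    )


def viewing_operations(x, y):
    """Operations having an entry < x -> y >, usable to view x's as y's."""
    return [name for name, entries in DOMAIN_RANGE.items() if ((x,), y) in entries]


def compose_with_view(outer, view, inner):
    """Build outer(x, view(inner(y, z))) from the toy implementations."""
    f, v, g = (IMPLEMENTATIONS[n] for n in (outer, view, inner))
    return lambda x, y, z: f(x, v(g(y, z)))


assert not can_compose("TIMES", "Setunion")
print(viewing_operations("Sets", "Numbers"))
```

Here the direct composition is rejected, Length is found as the only viewing operation, and compose_with_view("TIMES", "Length", "Setunion") then builds the operation TIMES[X, Length(Setunion(Y,Z))].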
Notice that if the composition Setunion o Setunion is proposed, there will be no conflict,
since the range of Setunion obviously intersects one component of the domain of Setunion.
How can AM determine the domain/range of this composition? A rule tacked onto Compose
indicates that if F = G o H, and a domain/range entry for G is < A B ... X ... -> C >, and an entry
for H is < D E ... -> Y >, and Y intersects X, then an entry for F's domain/range is
< A B ... D E ... -> C >. That is, the domain of H is substituted for the single component of the
domain of G which can be shown to intersect the range of H. Purely syntactically, AM can thus
compute some domain/range entries for the composition Setunion o Setunion:
< Sets Sets -> Sets > and < Sets Sets -> Sets > combine to yield < Sets Sets Sets -> Sets >;
and so on. Similarly, one can compute an entry for the domain/range facet of the previous
composition of three operations TIMES o Length o Setunion:
< Sets Sets -> Sets >, < Sets -> Numbers >, and < Numbers Numbers -> Numbers > combine to
yield < Numbers Sets Sets -> Numbers >.
So when computing TIMES( X, Length( Setunion(Y,Z))), both Y and Z can be sets, and X
a number, and the result will be a number.
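The substitution rule for composed domain/range entries lends itself to a short sketch. The Python below is an illustration under two assumptions: entries are (domain, range) pairs, and "intersects" is approximated by equality of concept names. It returns one entry for every domain component that could receive the inner operation's value, so it may yield more entries than the single one quoted in the text.

```python
def compose_entries(g_entries, h_entries, intersects=lambda a, b: a == b):
    """Domain/range entries for F = G o H.

    For each entry of G and each entry of H, the domain of H is substituted
    for any single domain component of G that intersects the range of H.
    """
    result = []
    for g_domain, g_range in g_entries:
        for h_domain, h_range in h_entries:
            for i, component in enumerate(g_domain):
                if intersects(h_range, component):
                    entry = (g_domain[:i] + h_domain + g_domain[i + 1:], g_range)
                    if entry not in result:
                        result.append(entry)
    return result


SETUNION = [(("Sets", "Sets"), "Sets")]
LENGTH = [(("Sets",), "Numbers")]
TIMES = [(("Numbers", "Numbers"), "Numbers")]

# Setunion o Setunion
print(compose_entries(SETUNION, SETUNION))

# TIMES o Length o Setunion, built from the inside out
length_of_union = compose_entries(LENGTH, SETUNION)
print(compose_entries(TIMES, length_of_union))
```

The second call produces both < Sets Sets Numbers -> Numbers > and < Numbers Sets Sets -> Numbers >, depending on which argument of TIMES receives the length.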
The claim was also made that Domain/Range facets help propose plausible coalescings. By
"coalescing" an operation, we mean defining a new one, which differs from the original one
in that a couple of the arguments must now coincide. For example, coalescing TIMES(x,y)
results in the new operation F(x) defined as TIMES(x,x). Syntactically, we can coalesce a
pair of domain components of the domain/range facet of an operation if those two domain
components are equal, or if one of them is a specialization of the other, or even if they
merely intersect. In the case of one related to the other by specialization, the more
specialized concept will replace both of them. In the case of merely intersecting, an extra test will
have to be inserted into the definition of the new coalesced operation.
39 Why? The number n, to AM, is represented in unary, as a bag of n T's. None of these are sets. The composition
"TIMES o BAGUNION" would have made sense to AM, but would have been defined only for bags of T's. Then
TIMES o BAGUNION(x,y,z) would be just x(y+z).
Given this domain/range entry for Setinsert: < Anything Sets -> Sets >, we see that it is ripe
for coalescing. Since Sets is a specialization of Anything, the new operation F(x), which is
defined as Setinsert(x,x), will have a domain/range entry of the form < Sets -> Sets >. That
is, the specialized concept Sets will replace both of the old domain elements (Anything and
Sets). F(x) takes a set x and inserts it into itself. Thus F({a,b}) = {a,b,{a,b}}. In fact, this new
operation F is very exciting because it always seems to give a new, larger set than the one
you feed in as the argument.
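Coalescing can be sketched on both levels: rewriting the domain/range entry, and building the new operation. The Python below is a hedged illustration. The GENERALIZATIONS table is an assumed fragment of the concept hierarchy, sets are modelled as frozensets so that a set can be a member of a set, and the "merely intersecting" case (which needs an extra run-time test) is left out.

```python
# Assumed fragment of the generalization hierarchy.
GENERALIZATIONS = {"Sets": {"Anything"}, "Numbers": {"Anything"}}


def is_specialization(a, b):
    """True if concept a is b itself or a known specialization of b."""
    return a == b or b in GENERALIZATIONS.get(a, set())


def coalesce_entry(entry, i, j):
    """Merge domain components i < j of one entry; None if not justified here."""
    domain, rng = entry
    a, b = domain[i], domain[j]
    if is_specialization(a, b):
        merged = a          # the more specialized concept replaces both
    elif is_specialization(b, a):
        merged = b
    else:
        return None         # merely-intersecting case omitted in this sketch
    return (domain[:i] + (merged,) + domain[i + 1:j] + domain[j + 1:], rng)


def coalesce(op, i, j):
    """New operation in which argument j of op must coincide with argument i."""
    def coalesced(*args):
        full = args[:j] + (args[i],) + args[j:]
        return op(*full)
    return coalesced


def set_insert(x, s):
    return s | {x}


squaring = coalesce(lambda x, y: x * y, 0, 1)
insert_into_self = coalesce(set_insert, 0, 1)

print(coalesce_entry((("Anything", "Sets"), "Sets"), 0, 1))
print(squaring(7))
print(len(insert_into_self(frozenset("ab"))))
```

The coalesced entry for Setinsert comes out as < Sets -> Sets >, and inserting the two-element set {a,b} into itself yields a set with three members.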
We have seen how the Domain/range facets can prune away meaningless coalescings, as well
as meaningless compositions. Any proposed composition or coalescing will at least be
syntactically meaningful. If all compositions are proposed only for at least one good semantic
reason, then those passing the domain/range test, and hence those which ultimately get
created, will all be valuable new concepts. Since almost all coalescings are semantically
interesting, any of them which have a valid Domain/Range entry will get created and
probably will be interesting.
This facet is occasionally used to suggest conjectures to investigate. For example, a heuristic
rule says that if the domain/range entries have the form < D D ... -> genl(D) >, then it's
worthwhile seeing whether the value of this operation doesn't really always lie inside D
itself. This is used right after the Bags<->Numbers analogy is found, in the following way.
One of the Bag-operations known already is Bagunion. The analogy causes AM to
consider a new operation, with the same algorithm as Bagunion, but restricted to BagsofT's
(numbers in unary representation). The Domain/range facet of this new, restricted
mutation of Bagunion contains only this entry: < BagsofT's BagsofT's -> Bags >. Since
Bags is a generalization of BagsofT's, the heuristic mentioned above triggers, and AM sees
whether or not the union of two BagsofT's is always a bag containing only T's. It appears
to be so, even in extreme cases, so the old Domain/range entry is replaced by this new one:
< BagsofT's BagsofT's -> BagsofT's >. When the user asks AM to call these bagsofT's
"numbers", this entry becomes < Numbers Numbers -> Numbers >. In modern terms, then, the
conjecture suggested was that the sum of two numbers is always a number.
To sum up this last ability in fancy language, we might say that one mechanism for
proposing conjectures is the prejudicial belief in the unlikelihood of asymmetry. In this
case, it is asymmetry in the parts of a Domain/range entry that draws attention. Such
conjecturing can be done by any action part of any heuristic rule; the Conjec facet entries
don't have a monopoly on initiating this type of activity.
5.2.11. Worth
How can we represent the worth of each concept? Here are some possible suggestions:
1. The most intelligent (but most difficult) solution is "purely symbolically". That is, an
individualized description of the good and bad points of the concept; when it is
useful, when misleading, etc.
2. A simpler solution would be to "standardize" the above symbolic description once
and for all, fixing a universal list of questions. So each concept would have to
answer the questions on this list (How good are you at motivating new concepts?,
How costly is your definition to execute?,...). The answers might each be symbolic;
e.g., arbitrary English phrases.
3. To simplify this scheme even more, we can assume that the answers to each question
will be numeric-valued functions (i.e., LISP code which can be evaluated to yield a
number between 0 and 1000). The vector of numbers produced by Evaluating all
these functions will then be easy to manipulate (e.g. using dot-product, vector-product,
vector-addition, etc.), and the functions themselves may be inspected for
semantic content. Nevertheless, much content is lost in passing from symbolic
phrases to small LISP functions.
4. A slight simplification of the above would be to just store the vector of numbers
answering the fixed set of questions; i.e., don't bother storing a bunch of programs
which compute them dynamically.
5. Even simpler would be to try to assign a single "worthwhileness" number to each
concept, in lieu of the vector of numbers. Simple arithmetic operations could
manipulate Worth values then. In some cases, this linear ordering seems
reasonable ("primes" really are better than "palindromes".) Yet in many cases we
find concepts which are too different to be so easily compared (e.g., "numbers" and
"angles".)
6. The least intelligent solution is none at all: each concept is considered as
worthwhile as any other concept. This threatens to be combinatorial dynamite.
As we progress along the intelligent -> trivial dimension, we find that the schemes get easier
and easier to code, the Worth values get easier and easier to deal with, but the amount of
reliable knowledge packed into them decreases.
Initially, scheme #3 above was chosen for AM: a vector of numeric-valued procedural
answers to a fixed set of questions. Here are those questions, the components of the Worth
vectors for each concept:
1. Overall aesthetic worth.
2. Overall utility. Combination of usefulness, ubiquity.
3. Age. How many cycles since this concept was created?
4. Lifespan. Can this concept be forgotten yet?
5. Cost. How much cpu time has been spent on this concept, since its creation?
Notice that in general no constant number can answer one of these questions once and for
all (consider, e.g., Lifespan). Each 'answer' had to be a numeric-valued LISP function.
A few questions which crop up often are not present on this list, since they can be answered
trivially using standard LISP functions (e.g., "How much space does concept C use up?" can
be found by calling the function "COUNT" on the property-list of the LISP atom "C").
Another kind of question, which was anticipated and did in fact come up frequently, is of
the form "How good are the entries on facet F of this concept?", for various values of F.
Since there are a couple dozen kinds of facets, this would mean adding a couple dozen more
questions to the list. The line must be drawn somewhere. If too much of AM's time is
drained by evaluating where it is already, it can never progress.
The heuristic rules are responsible for initially setting up the various entries on the Worth
facets of new concepts, and for periodically altering those entries for all concepts, and for
delving into those entries when required.
Recent experiments have shown (see Experiment 1, page 127) there was little change in
behavior when each vector of functions was replaced by a single numeric function (actually,
the sum of the values of the components of the "old" vector of functions). There wasn't even
too much change when this was replaced by a single number. There was a noticeable
degradation (but no collapse) when all the concepts' numbers were set equal to each other
initially.
For the purposes of this document, then (except for this page and the discussion of
Experiment 1), we may as well assume that each concept has a single number (between 0
and 1000) attached as its overall "Worth" rating. This number is set⁴⁰ and referenced and
updated by heuristic rules. Experiment 1 can be considered as showing that a more
sophisticated Worth scheme is not necessary for the particular kinds of behaviors that AM
exhibits.
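The three Worth representations compared in Experiment 1 (a vector of functions, a single summing function, and a single stored number) can be contrasted in a small Python sketch. Everything concrete here is hypothetical: the particular functions, the cycle numbers, and the clamp to the 0..1000 interval are made up for illustration, and the text does not say how the summed value was rescaled, so the sum is left as is.

```python
WORTH_QUESTIONS = ("aesthetic", "utility", "age", "lifespan", "cost")


def clamp(v):
    """Keep one component inside the 0..1000 interval used for Worth values."""
    return max(0, min(1000, int(v)))


# Hypothetical Worth facet for one concept created at cycle 40.
primes_worth = {
    "aesthetic": lambda now: 800,
    "utility": lambda now: 650,
    "age": lambda now: clamp(now - 40),
    "lifespan": lambda now: clamp(300 - 2 * (now - 40)),
    "cost": lambda now: 25,
}


def worth_vector(fns, now):
    """Scheme 3: evaluate every numeric-valued function at the current cycle."""
    return [fns[q](now) for q in WORTH_QUESTIONS]


def worth_single_function(fns):
    """First simplification: one function, the sum of the old components."""
    return lambda now: sum(worth_vector(fns, now))


def worth_single_number(fns, now):
    """Second simplification: freeze that function's value into one number."""
    return worth_single_function(fns)(now)


print(worth_vector(primes_worth, 100))
print(worth_single_number(primes_worth, 100))
```

The single function still tracks changes over time (its value differs at cycle 100 and cycle 140), whereas the frozen number would have to be updated explicitly by heuristic rules.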
5.2.12. Interest
Now that we know how to judge the overall worth of the concept "Composition", let's
turn to the question of how interesting some specific composition is. Unfortunately, the
Worth facet really has nothing to say about that problem. The Worth of the concept
"Compose" has little effect on how interesting a particular composition is: "Count o Divisorsof"
is very interesting, and "Insert o Member⁻¹" is less so. The Worth facets of those concepts
will say something about their overall value. And yet there is some knowledge, some
"features" which would make any composition which possessed them more interesting than a
composition which lacked them:
Are the domain and range of the composition equal to each other?
Are interesting properties of each component of the composition preserved?
Are undesirable properties lost (i.e., not true about the composition)?
Is the new composition equivalent to some already-known operation?
These hints about "features to look for" belong tacked onto the Composition concept, since
they modify all compositions. Where and how can this be done?
For this purpose each concept (including "Composition") can have entries on its
"Interest" facet. It contains a bunch of features which (if true) would make any particular
example of the current concept interesting.
The format for the Interest facet is as follows:
< Conflictmatrix
. rf x«y, then insert T into else insert 'NIL' into r.
z is an example of AoB
This rule applies precisely to the task of filling in examples of Examples(Composition).
Thus, it is relevant to the task "Fill in examples of Insert o Insert". It is irrelevant if you
change the action (e.g., "Check examples of Insert o Insert"), or if you change the facet to be
dealt with (e.g., "Fill in algorithms for Insert o Insert"), or if you change the class of concept
(e.g., "Fill in examples of Setunion").⁴⁵
As another illustration, let me describe a typical rule which is found on
Compose.Conjec.Fillin. It says that one potential conjecture about a given composition A o B
is that it is unchanged from A (or from B). This happens often enough that it's worth
examining each time a new composition is made. This rule applies precisely to the task of
filling in conjectures about particular compositions.
The subfacet AnyConcept.Examples.Fillin is quite large; it contains all the known methods
for filling in examples of C (when all we know is that C is a concept). Here are a few of
those techniques:⁴⁶
1. Instantiate C.Defn
2. Search the examples facets of all the concepts on Generalizations(C) for examples of
C
3. Run some of the concepts named in Inranof(C) [i.e., operations whose range is C]
and collect the resultant values.
AnyConcept.Examples.Check is large for similar reasons. A typical entry there says to
examine each verified example of C: if it is also an example of a specialization of C, then it
must be removed from C.Examples and inserted⁴⁷ into the Examples facet of that
specialized concept.
Here is one typical entry from Operation.Domain/Range.Check:
IF a domain/range entry has the form (D D D... -> R),
and all the D's are equal, and R is a generalization of D,
THEN it's worth seeing whether (D D D... -> D) is consistent with all known examples of the
operation.
If there are no known examples, add a task to the agenda requesting they be filled in.
If there are examples, and (D D D... -> D) is consistent, add it to the Domain/range facet
of this operation.
If there are some contradicting examples, create a new concept which is defined as this
operation restricted to (D D D... -> D).
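The rule above has three possible outcomes, which a short Python sketch can make explicit. This is an illustration, not AM's code: bags are modelled as tuples, "BagsofTs" stands in for the concept of bags containing only T's, the GENERALIZATIONS table is an assumed fragment, and the returned strings merely name the action that the rule would take.

```python
# Assumed fragment of the generalization hierarchy.
GENERALIZATIONS = {"BagsofTs": {"Bags"}}


def is_example(concept, value):
    """Toy membership tests; bags are modelled as tuples."""
    if concept == "Bags":
        return isinstance(value, tuple)
    if concept == "BagsofTs":
        return isinstance(value, tuple) and all(e == "T" for e in value)
    raise KeyError(concept)


def check_domain_range_entry(entry, examples):
    """Apply the (D D D... -> R) check to one entry.

    examples is a list of (args, value) pairs already known for the operation.
    Returns (action, tighter_entry_or_None).
    """
    domain, rng = entry
    d = domain[0]
    if any(c != d for c in domain) or rng not in GENERALIZATIONS.get(d, set()):
        return ("rule does not apply", None)
    if not examples:
        return ("add task: fill in examples", None)
    tighter = (domain, d)
    if all(is_example(d, value) for _args, value in examples):
        return ("add entry", tighter)
    return ("create restricted concept", tighter)


def bag_union(a, b):
    return a + b  # order is irrelevant for bags, so concatenation suffices


entry = (("BagsofTs", "BagsofTs"), "Bags")
arg_lists = [((), ()), (("T",), ()), (("T", "T"), ("T",))]
examples = [(args, bag_union(*args)) for args in arg_lists]
print(check_domain_range_entry(entry, examples))
```

With the three examples shown, every value is again a bag of T's, so the tighter entry is proposed for addition to the facet.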
45 Note that it does make sense if you replace the concept "Insert o Insert" by any other example of a Composition (e.g.,
"Fill in examples of SetUnion o Setintersection").
46 The interested reader will find them all listed in Appendix 3, beginning on page 233.
47 Conditionally. Since each concept is of finite worth, it is allotted a finite amount of space. A random number is generated
to decide whether or not to actually insert this example into the Examples facet of the specialization of C.
The more that specialized concept is "exceeding its quota", the narrower the range that the random number
must fall into to have that new item inserted. The probability is never precisely 1 or 0.
Note that this "Checking" rule doesn't just passively check the designated facet; it actively
"fixes up" faulty entries, adds new tasks, creates new concepts, etc. All the check rules are
very aggressive in this way. For example, one entry on Nomultipleelementsstructure.Examples.Check will actually remove any multiple occurrences of an element from
a structure.
As you might expect, the set Checks(C.F) of all relevant rules for checking facet F of
concept C is obtained as (ISA's(C)).F.Check. That is, look for the Check subfacet of the F
facet of all the concepts on ISA's(C). Similarly, Fillins(C.F) is the union of the Fillin
subfacets of the F facets of all the concepts on ISA's(C).
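Gathering Checks(C.F) and Fillins(C.F) amounts to a union over the concepts on ISA's(C). The Python sketch below assumes a particular ISA list for "Insert o Insert" and uses placeholder rule names; neither is taken from AM itself.

```python
# Placeholder rule names keyed by (concept, facet, subfacet).
SUBFACETS = {
    ("Composition", "Examples", "Fillin"): ["run-A-on-values-of-B"],
    ("AnyConcept", "Examples", "Fillin"): [
        "instantiate-defn",
        "search-generalizations",
        "run-ops-whose-range-is-C",
    ],
    ("Operation", "Domain/Range", "Check"): ["try-tightening-range"],
}

# Assumed ISA list: the concepts of which "Insert o Insert" is an example.
ISAS = {"Insert o Insert": ["Composition", "Operation", "Active", "AnyConcept"]}


def gather(concept, facet, subfacet):
    """Union, in ISA order, of one subfacet of facet over all of ISA's(concept)."""
    rules = []
    for ancestor in ISAS[concept]:
        for rule in SUBFACETS.get((ancestor, facet, subfacet), []):
            if rule not in rules:
                rules.append(rule)
    return rules


def fillins(concept, facet):
    return gather(concept, facet, "Fillin")


def checks(concept, facet):
    return gather(concept, facet, "Check")


print(fillins("Insert o Insert", "Examples"))
print(checks("Insert o Insert", "Domain/Range"))
```

Listing the more specific concepts first means their rules come first in the gathered list; whether AM orders the rules this way is an assumption of the sketch.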
When AM chooses a task like "Fillin examples of Primes", its first action is to compute
Fillins(Primes.Exs). It does this by asking for ISA's(Primes); that is, a list of all concepts of
which Primes is an example. This list is:
F(a,b,b,d...). That is, F is called upon with a pair of arguments equal to each
other. If F were Times, then G would be Squaring. If F were Setinsert, then G would be
the operation of inserting a set S into itself.
COMPOSITION involves taking two operations A and B, and applying them in sequence:
A o B(x) = A(B(x)). This concept deals with (i) the activity of creating a new composition given
a pair of operations; (ii) all the operations which were created in this fashion. That is why
this concept is both a specialization of and an example of Operation.


CONJECTURES are a kind of object. This concept knows about and can store conjectures.
When proof techniques are inserted into AM, this tiny twig of the tree of concepts will grow
to giant proportions.
CONSTANTPREDICATE is a predicate which can afford to have a very liberal domain: it
always ignores its arguments and just returns the same logical value all the time.
DELETE is an operation which contains all the information common to all flavors of
removing an element from a structure (regardless of the type of structure which is being
attenuated). When called upon to actually perform a deletion, this concept determines the
type of structure and then calls the appropriate specialized delete concept (e.g., Bagdelete).
DIFFERENCE is another general operation, which accepts two structures, determines their type
(e.g., Bags), and then calls the appropriate specialized version of difference (e.g., Bagdiff).
EMPTYSTRUCTURE contains data relevant to structures with no members.
FIRSTELEMENT is an operation which takes an ordered structure and returns the first
element. It is like the Lisp function 'CAR'.
IDENTITY is just what it claims to be. It takes one argument and returns it immediately. The
main purpose of knowing about this boring transformation is just in case some new concept
turns out unexpectedly to be equivalent to it.
i.e., take an operation which used to have "A" as one of its domain components or as its range, and try to create a new
operation with essentially the same definition but whose domain/range says "CanonicalA" instead of "A".
Both a specialization of Operation and an example of Operation.
INSERT takes an item x and a structure S, determines S's type, and calls the appropriate
flavor of specialized Insertion concept. The general INSERT concept contains any
information common to all of those insertion concepts.
INTERSECT is an operation which computes the intersection of any two structures. It, too, has
a separate specialization for Bags, Sets, Osets, and Lists.
INVERTANOPERATION is a very active concept. It can invert any given operation. If
F:X->Y is an operation, then its inverse will be abbreviated F⁻¹, and F⁻¹(y) is defined as all
the x's in X for which F(x)=y. The domain and range of F⁻¹ are thus the range and
domain of F.
INVERTEDOP contains information specific to operations which were created as the inverses
of more primitive ones.
LASTELEMENT takes an ordered structure and returns its final member.
LIST is a type of structure. It is ordered, and multiple occurrences of the same element are
permitted. Lists are also called vectors, tuples, and obags (for "ordered bags").
LISTDELETE is an operation which takes two arguments, x and B. Although x can be
anything, B must be a list. The procedure is to remove the first occurrence of x from B.
LISTDIFF is an operation which takes two lists B,C. It repeatedly picks a member of C, and
removes it (the first remaining occurrence of it) from both B and C. This continues until
there are no more members in C.
LISTINSERT is an operation which adds (another occurrence of) x onto the front of list B. It
is like the Lisp function 'CONS'.
LISTINTERSECT takes two lists B,C, and creates a new list D. An item occurs in D the
minimum number of times it occurs in either B or C. D is arranged in order as (a sublist
of) list B.
LISTUNION takes list C and glues it onto the end of list B. It's like 'APPEND' in Lisp.
LOGICALRELATION contains knowledge about Boolean combinations: disjunction,
conjunction, implication, etc.
MULTIPLEELEMENTSSTRUCTURES are a specialization of Structure. They permit the same
atom to occur more than once as a member (e.g., Bags and Lists).
NOMULTIPLEELEMENTSSTRUCTURES are a specialization of Structure. They permit the
same atom to occur only once as a member (e.g., Sets and Osets).
NONEMPTYSTRUCTURES are a specialization of Structure also. They contain data about all
structures which have some members.
OBJECT is a general, static concept. Objects are like the subjects and direct objects in
sentences, while the Actives are like the verbs.⁵²
OBJECTEQUALITY is a predicate. It takes a pair of objects, and returns True if (i) they are
identical, or (ii) they are structures, and each corresponding pair of members satisfies
ObjectEquality. Often we'll call this 'Equal', and denote it as '='.
OPERATIONS are Actives which take arguments and return a value. While a predicate
examines its arguments and returns either True or False, an operation examines its
arguments and returns any number of values, of varying types. Assuming that the
arguments lay in the domain of the operation (as specified by some entry on its
Domain/range facet), then every value returned must lie within its range (as specified by
that same Domain/range entry).
ORDEREDPAIR is a kind of List. It has just two 'slots', however: a front and a rear element.
ORDEREDSTRUCTURE is a specialized type of Structure. It includes all structures for which
the order of insertion of two members can make a difference in whether the structures are
equal or not. Orderedstructures are those for which it makes sense to talk about a front
and a rear, a first element and a last element.
OSET is a type of structure. It is ordered, and multiple occurrences of the same element are
not permitted. The short-term memory of Newell's PSG [Newell 73] is an Oset, as is a
cafeteria line. Not much use was found for this concept by AM.
OSETDELETE removes x from oset B (if x was in B).
OSETDIFF is an operation which takes two osets B,C. It removes each member of C from B.
OSETINSERT is an operation which adds x to the front of oset B. If x was in B previously,
it is simply moved to the front of B.
OSETINTERSECT takes two osets B,C, and removes from B any items which are not in C as
well. B thus 'induces' the ordering on the resultant oset.
OSETUNION takes oset C, removes any elements in B already, then glues what's left of C
onto the rear of B.
PARALLELJOIN is an operation which takes a kind of structure and an operation H. It
creates a new operation F, whose domain is that type of structure. For any such structure S,
F(S) is computed by appending together H(x) for each member x of S.
PARALLELJOIN2 is a similar operation. It creates an operation F with two structural
arguments. F(S,L) is computed by appending the values of H(x,L), as x runs through the
elements of S.⁵³
52 As in English, a particular Activity can sometimes itself be the subject.
53 Here, the args to PARALLELJOIN2 are two types of structures SS and LL, and an operation H whose range is also a
structural type DD. Then a new operation is created, with domain SSxLL and range DD.
PARALLELREPLACE is an operation used to synthesize new substitution operations. It takes
a structural type and an operation H as its arguments, and creates a new operation F. F(S)
is computed by simply replacing each member x of S by the value of H(x). The operation
produced is very much like the Lisp function MAPCAR.
PARALLELREPLACE2 is a slightly more general operation. It creates F, where F(S,L) is
computed by replacing each x in S by H(x,L).
PREDICATES are actives which examine their arguments and then return T or NIL (True or
False). It is only due to the capriciousness of AM's initial design that predicates are kept
distinct from operations. Of course, each example of an operation can be viewed as if it
were a predicate; if F:A->B is any operation from A to B, then we can consider F a relation
on AxB, that is a subset of AxB, and from there pass to viewing F as a (characteristic)
predicate F:AxB->{T,F}. Similarly, any predicate on Ax...xBxC may be considered an
operation (a multi-valued, not-always-defined function) from Ax...xB into C. There are no
unary predicates. If there were one, say P:A->{T,F}, then that predicate would essentially be
a new way to view a certain subset of A; the predicate would then be transformed into
{a into .
SET is a type of structure. It is unordered, and multiple occurrences of the same element are
not permitted.
SETDELETE is an operation which takes two arguments, x and B. Although x can be
anything, B must be a set. The procedure is to remove x from B (if x was in B), then
return the resultant value of B.
SETDIFF is an operation which takes two sets B,C. It removes each member of C from B.
SETINSERT is an operation which adds x to set B.
SETINTERSECT removes from set B any items which are not in set C, too.
SETUNION dumps into B all the members of C which weren't in there already.
STRUCTURE, the antithesis of ATOM, is inherently divisible. A structure is something that
has members, that can be broken into pieces. There are two questions one can ask about
any kind of structure: Is it ordered or not? Can there be multiple occurrences of the same
element in it or not? There are four sets of answers to these two questions, and each of the
four specifies a wellknown kind of structure (Sets, Lists, Osets, Bags).
STRUCTUREOFSTRUCTURES is a specialization of Structure, representing those structures all
of whose members are themselves structures.
TRUTHVALUE is a specialized kind of atomic object. Its only examples are True and False.
This concept is the range set for all predicates.
UNION is a general kind of joining operation. It takes two structures and combines them.
Four separate variants of this concept are given to AM initially (e.g., Setunion).
UNORDEREDSTRUCTURE is a specialized type of Structure. It includes all structures for
which the order of insertion of two members never makes any difference in whether the
structures are equal or not. Unorderedstructures cannot be said to have a front or a rear, a
first element or a last element.
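Several of the structure operations described in this list can be sketched compactly. The Python below models both Lists and Osets as Python lists (an assumption made for the example; AM's own representations are LISP structures), follows the prose descriptions above, and treats the synthesizers Parallel-join and Parallel-replace for lists only. The function names are hypothetical, and elements are assumed hashable in list_intersect.

```python
from collections import Counter

# The two questions (ordered?, multiple occurrences allowed?) fix the kind.
STRUCTURE_KINDS = {
    (False, False): "Set",
    (True, True): "List",
    (True, False): "Oset",
    (False, True): "Bag",
}


def list_insert(x, b):
    """Add another occurrence of x onto the front of list b (like CONS)."""
    return [x] + b


def list_delete(x, b):
    """Remove the first occurrence of x from list b, if any."""
    if x not in b:
        return list(b)
    i = b.index(x)
    return b[:i] + b[i + 1:]


def list_union(b, c):
    """Glue list c onto the end of list b (like APPEND)."""
    return b + c


def list_intersect(b, c):
    """Each item occurs min(count in b, count in c) times, in b's order."""
    budget = Counter(c)
    d = []
    for e in b:
        if budget[e] > 0:
            budget[e] -= 1
            d.append(e)
    return d


def oset_insert(x, b):
    """Add x to the front of oset b; if already present, move it to the front."""
    return [x] + [e for e in b if e != x]


def oset_union(b, c):
    """Glue onto the rear of b whatever part of c is not already in b."""
    return b + [e for e in c if e not in b]


def oset_intersect(b, c):
    """Keep the items of b that are also in c; b induces the ordering."""
    return [e for e in b if e in c]


def parallel_replace(h):
    """F(S): replace each member x of S by H(x) (like MAPCAR)."""
    return lambda s: [h(x) for x in s]


def parallel_join(h):
    """F(S): append together H(x) for each member x of S."""
    return lambda s: [y for x in s for y in h(x)]


print(list_insert(2, [1, 2, 3]), oset_insert(2, [1, 2, 3]))
print(parallel_join(lambda x: [x, x])([1, 2]))
```

Comparing list_insert with oset_insert on the same arguments shows the effect of the "multiple occurrences" question: the list gains a second 2, while the oset merely moves its 2 to the front.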
5.3.3. Rationale behind Choice of Concepts
A necessary part of realizing AM was to choose a particular set of starting concepts. But
how should such a choice be made?
My first impulse was to gather a complete set of concepts. That is, a basis which would be
sufficient to derive all mathematics. The longer I studied this, the larger the estimated size
of this basis grew. It immediately became clear that this would never fit in 256k.⁵⁴ One
philosophical problem here is that future mathematics may be inspired by some real-world
phenomena which haven't even been observed yet. Aliens visiting Earth might have a
different mathematics from ours, since their collective life experiences could be quite
different from those of us Terrans.
Scrapping the idea of a sufficient basis, what about a necessary one? That is, a basis which
would be minimal in the following sense: if you ever removed a concept from that basis, it
could never be rediscovered. In isolated cases, one can tell when a basis is not minimal: if
it contains both addition and multiplication, then it is too rich, since the latter can be
derived from the former.⁵⁵ And yet, the same problem about "absoluteness" cropped up:
how can anyone claim that the discovery of X can never be made from a given starting
point? Until recently, mathematicians didn't realize how natural it was to derive numbers
and arithmetic from set theory (a task which AM does, by the way).⁵⁶ So 50 years ago the
concepts of set theory and number theory would both have been undisputedly placed into a
"minimal" basis. There are thus no absolute conceptual primitives; each culture (perhaps
even each individual) possesses its own basis.
54 This is the size of the core of the computer I had at my disposal.
Since I couldn't give AM a minimal basis, nor a complete one, I decided AM might as well
have a nice one. Although it can never be minimal, it should nevertheless be made very
small (order of magnitude: 100 concepts). Although it can never be complete, it should
suffice for rediscovering much of alreadyknown mathematics. Finally, it should be
rational, by which I mean that there should be a simple rule for deciding which concepts do
and don't belong in that basis.
The concepts AM starts with are meant to be those possessed by young children (age 4, say).
This explains some omissions of concepts which would otherwise be considered
fundamental: (i) Proof and techniques for proof/disproof; (ii) Abstract properties of
relations, like associativity, single-valued, onto; (iii) Cardinality, arithmetic; (iv) Infinity,
continuity, limits. The interested reader should see [Piaget 55] or [Copeland 70].
Because my programming time and the PDP-10's memory space were both quite small, only
a small percentage of these 'pre-numerical' concepts could be included. Some unjustified
omissions are: (i) visual operations, like rotation, coloration; (ii) Games, rules, procedures,
strategies, tactics; (iii) Geometric notions, e.g., outside and between.
AM is not supposed to be a model of a child, however. It was never my intention (and it
would be much too hard for me) to try to emulate a human child's whimsical imagination
and emotive drives.⁵⁷ And AM is not ripe for "teaching", as are children. Also, though it
possesses a child's ignorance of most concepts, AM is given a large body of sophisticated
"adult" heuristics. So perhaps a more faithful image is that of Ramanujan, a brilliant
modern mathematician who received a very poor education, and was forced to rederive
much of known number theory all by himself. Incidentally, Ramanujan never did master
the concept of formal proof.
There is no formal justification for the particular set of starting concepts. They are all
reasonably primitive (sets, composition), and lie several levels "below" the ones which AM
managed to ultimately derive (prime factorization, square-root). It might be valuable to
attempt a similar automated math discoverer, which began with a very different set of
concepts (e.g., start it out as an expert in lattice theory, possessing all known concepts
thereof). The converse kind of experiment is to vary the initial base of concepts, and
observe the effects on AM's behavior. A few experiments of that form are described in
Section 6.2.
55 by AM, and by any mathematician. As Don Cohen points out, if the researcher lacked the proper discovery
methods, then he might never derive Times from Plus.
56 The "new math" is trying to get young children to do this as well; unfortunately, no one showed the elementary-school
teachers the underlying harmony, and the results have been saddening.
57 Learning psychologists might label AM as neo-behavioristic and cognitivistic. See [LeFrancois].
Chapter 6. Results
This chapter opens by summarizing what AM "did". Section 6.1 gives a fairly high-level
description of the major paths which were explored, the concepts discovered along the way,
the relationships which were noticed, and occasionally the ones which "should" have been
but weren't.
The next section (6.2) continues this exposition by presenting the results of experiments
which were done with (and on) AM.
Chapter 7 will draw upon these results (and others given in the appendices) to form
conclusions about AM. Several meta-level questions will be tackled there (e.g., "What are
AM's limitations?").
6.1. What AM Did
Now we have seen that mathematical work is not simply mechanical, that it could
not be done by a machine, however perfect. It is not merely a question of applying
rules, of making the most combinations possible according to certain fixed laws.
The combinations so obtained would be exceedingly numerous, useless, and
cumbersome. The true work of the inventor consists in choosing among these
combinations so as to eliminate the useless ones or rather to avoid the trouble of
making them, and the rules which must guide this choice are extremely fine and
delicate. It is almost impossible to state them precisely; they are felt rather than
formulated. Under these conditions, how imagine a sieve capable of applying
them mechanically?
Poincaré
AM is both a mathematician of sorts, and a big computer program.
By granting AM more anthropomorphic qualities than it deserves, we can describe its
progress through elementary mathematics. It rediscovered many well-known concepts, a
couple of interesting but not-generally-known ones, and several concepts which were hitherto
unknown and should have stayed that way. Section 1.3, on page 10, recaps what AM did,
much as a historian might critically evaluate Euler's work. A more detailed prose
description of everything AM did is found in Appendix 5.1, beginning on page 287.
Instead of repeating any of this descriptive prose here, Section 6.1.1 will provide a very
brief listing of what AM did in a single good run, task by task. A much more detailed
version of this same list is found in Appendix 5.2, beginning on page 294. The task
numbers there correspond to the numbering below. These task-by-task listings are not
complete listings of every task AM ever attempted in any of its many runs, but rather a
trace of a single, better-than-average run of the program. The reader may wish to consult
the brief alphabetized glossary of concept names in the last chapter (page 107), or the more
detailed appendix of concept descriptions (following page 173).
Following this linear trace of AM's behavior is a more appropriate representation of what it
did: namely, a twodimensional graph of that same behavior as seen in "concept space".
This forms Section 6.1.2, and is found on page 123.
By underestimating AM's sophistication, one can demand answers to the typical questions
to ask about a computer program: how big is it, how much cpu time does it use, what
language it's coded in, etc. These are found in Section 6.1.3.
6.1.1. Linear Task-by-task Summary of a Good Run
1. Fill in examples of Compose. Failed, but suggested next task:
2. Fill in examples of Setunion. Also failed, but suggested:
3. Fill in examples of Sets. Many found (e.g., by instantiating Set.Defn) and then more
derived from those examples (e.g., by running Union.Alg).
4. Fill in specializations of Sets (because it was very easy to find examples of Sets).
Creation of new concepts. One, INTSets, is related to "Singletons". Another, "BISets", is all nests of braces (no atomic elements).
5. Fill in examples of INTSets. This indirectly led to a rise in the worth of Equal.
6. Check all examples of INTSets. All were confirmed. AM defines the set of Nonempty
INTSets; this is renamed "Singletons" by the user.
7. Check all examples of Sets. To check a couple conjectures, AM will soon look for
Bags and Osets.
8. Fill in examples of Bags.
9. Fill in specializations of Bags. Created INTBags (contain just one kind of element),
and BIBags (nests of parentheses).
10. Fill in examples of Osets.
11. Check examples of Osets.
12. Fill in examples of Lists.
13. Check examples of Lists.
14. Fill in examples of Allbutfirst.
15. Fill in examples of Allbutlast.
16. Fill in specializations of Allbutlast. Failed.
The task numbers in this listing do not precisely match the task numbers accompanying the
example given in Chapter 2.
This is perhaps the best overall run. It occurred in two stages (due to space problems;
unimportant). In this particular run, AM misses the few "very best" discoveries it ever made,
since the runs they occurred in went in somewhat different directions. It also omits some of
the more boring tasks: see, e.g., the description of task number 69.
17. Fill in examples of Listunion.
18. Fill in examples of Proj1.
19. Check examples of Allbutfirst.
20. Check examples of Allbutlast.
21. Fill in examples of Proj2.
22. Fill in examples of Emptystructures. 4 found.
23. Fill in generalizations of Emptystructures. Failed.
24. Check examples of Listunion.
25. Check examples of Bags. Defined Singletonbags.
26. Fill in examples of Bagunion.
27. Check examples of Proj2.
28. Fill in examples of Setunion.
29. Check examples of Setunion. Define λ (x,y) x∪y=x, later called Superset.
30. Fill in examples of Baginsert.
31. Check examples of Baginsert. Range is really Nonempty bags. Isolate the results of
insertion restricted to Singletons: call them Doubletonbags.
32. Fill in examples of Bagintersect.
33. Fill in examples of Setinsert.
34. Check examples of Setinsert. Range is always Nonempty sets. Define λ (x,S) Setinsert(x,S)=S; i.e., set membership. Define Doubleton sets.
35. Fill in examples of Bagdelete.
36. Fill in examples of Bagdifference.
37. Check examples of Bagintersect. Define λ (x,y) x∩y=(); i.e., disjoint bags.
38. Fill in examples of Setintersect.
39. Check examples of Setintersect. Define λ (x,y) x∩y=x; i.e., subset. Also define
disjoint sets: λ (x,y) x∩y={}.
40. Fill in examples of Listintersect.
41. Fill in examples of Equal. Very difficult to find examples; this led to:
42. Fill in generalizations of Equal. Define "Samesize", "EqualCARs", and some losers.
43. Fill in examples of Samesize.
44. Apply an Algorithm for Canonize to the args Samesize and Equal. AM eventually
synthesizes the canonizing function "Size". AM defines the set of canonical
structures: bags of T's; this later gets renamed as "Numbers".
45. Restrict the domain/range of Bagunion. A new operation is defined, Numberunion, with domain/range entry <Number Number → Bag>.
46. Fill in examples of Numberunion. Many found.
47. Check the domain/range of Numberunion. Range is Number. This operation is
renamed "Add2".
48. Restrict the domain/range of Bagintersect to Numbers. Renamed "Minimum".
49. Restrict the domain/range of Bagdelete to Numbers. Renamed "SUB1".
50. Restrict the domain/range of Baginsert to Numbers. AM calls the new operation
"Numberinsert". Its domain/range entry is <... → Bag>.
51. Check the domain/range of Numberinsert. This doesn't lead anywhere.
52. Restrict the domain/range of Bagdifference to Numbers. This becomes "Subtract".
53. Fill in examples of Subtract. This leads to defining the relation LEQ.
54. Fill in examples of LEQ. Many found.
If a larger number is "subtracted" from a smaller, the result is zero. AM explicitly defines
the set of ordered pairs of numbers having zero "difference". The pair <x,y> is in that set
iff x is less than or equal to y.
55. Check examples of LEQ.
56. Apply algorithm of Coalesce to LEQ. LEQ(x,x) is ConstantTrue.
57. Fill in examples of Paralleljoin2. Included is Paralleljoin2(Bags,Bags,Proj2), which
is renamed "TIMES", and Paralleljoin2(Structures,Structures,Proj1), a generalized
Union operation renamed "GUnion", and a bunch of losers.
58-69. Fill in and check examples of the operations just created.
70. Fill in examples of Coalesce. Created: SelfCompose, SelfInsert, SelfDelete, SelfAdd, SelfTimes, SelfUnion, etc. Also: Coarepeat2, Coajoin2, etc.
71. Fill in examples of SelfDelete. Many found.
72. Check examples of SelfDelete. SelfDelete is just Identityop.
73. Fill in examples of SelfMember. No positive examples found.
74. Check examples of SelfMember. Selfmember is just ConstantFalse.
75. Fill in examples of SelfAdd. Many found. User renames this "Doubling".
76. Check examples of Coalesce. Confirmed.
77. Check examples of Add2. Confirmed.
78. Fill in examples of SelfTimes. Many found. Renamed "Squaring" by the user.
79. Fill in examples of SelfCompose. Defined SquaringoSquaring. Created AddoAdd
(two versions: Add21 which is λ (x,y,z) (x+y)+z, and Add22 which is x+(y+z)).
Similarly, two versions of TimesoTimes and of ComposeoCompose.
80. Fill in examples of Add21. (x+y)+z. Many are found.
81. Fill in examples of Add22. x+(y+z). Again many are found.
82. Check examples of Squaring. Confirmed.
83. Check examples of Add22. Add21 and Add22 appear equivalent. But first:
84. Check examples of Add21. Add21 and Add22 still appear equivalent. Merge them.
So the proper argument for a generalized "Add" operation is a Bag.
85. Apply algorithm for Invert to argument Add. Define Invadd(x) as the set of all
bags of numbers (>0) whose sum is x. Also denoted Add⁻¹(x).
86. Fill in examples of TIMES21. (xy)z. Many are found.
87. Fill in examples of TIMES22. x(yz). Again many are found.
88. Check examples of TIMES22. TIMES21 and TIMES22 may be equivalent.
89. Check examples of TIMES21. TIMES21 and TIMES22 still appear equivalent.
Merge them. So the proper argument for a generalized "TIMES" operation is a
Bag. Set up an analogy between TIMES and ADD, because of this fact.
90. Apply algorithm for Invert to argument TIMES. Define InvTIMES(x) as the set
of all bags of numbers (>1) whose product is x. Analogous to InvAdd.
91. Fill in examples of Parallelreplace2. Included are
Parallelreplace2(Bags,Bags,Proj2) (called MR2BBP2), and many losers.
92-107. Fill in and check examples of the operations just created.
108. Fill in examples of Compose. So easy that AM creates IntCompose.
109. Fill in examples of IntCompose. The two chosen operations G,H must be such
that ran(H) ...
... Timessq is TIMES
with its domain restricted to bags of perfect squares, Timesev takes only even
arguments, Timestoevens requires that the result be even, Timestosq, ...
... [exponentiation], Coarepeat2(Structures, Proj1) ...
149. Check examples of Divisors. Define 0Div, 1Div, 2Div, and 3Div, the sets of
numbers whose Divisors value is the empty set, a singleton, a doubleton, and a
tripleton, respectively.
150. Fill in examples of 1Div. Only one example found: "1". Lower 1Div.Worth.
151. Fill in examples of 0Div. None found. Lower the worth of this concept.
152. Fill in examples of 2Div. A nice number are found. Raise 2Div.Worth.
153. Check examples of 2Div. All confirmed, but no pattern noticed.
154. Fill in examples of 3Div. A nice number found.
155. Check examples of 3Div. All confirmed. All are perfect squares.
156. Restrict Squareroot to numbers which are in 3Div. Call this Root3.
157. Fill in examples of Root3. Many found.
158. Check examples of Root3. All confirmed. All are in 2Div. Raise their worths.
159. Restrict Squaring to 2divs. Call the result Square2.
160. Fill in examples of Square2. Many found.
161. Check the range of Square2. Always 3Divs. Conjecture: x has 2 divisors iff x^2
has 3 divisors.
162. Restrict Squaring to 3Divs. Call the result Square3.
163. Restrict Squarerooting to 2Divs. Call the result Root2.
164. Fill in examples of Square3. Many found.
165. Compose Divisorsof and Square3. Call the result DivSq3.
166. Fill in examples of DivSq3. Many found.
167. Check examples of DivSq3. All such examples are Samesize.
168-175. More confirmations and explorations of the above conjecture. Gradually,
all its ramifications lead to dead ends (as far as AM is concerned).
176. Fill in examples of Root2. None found. Conjecture that there are none.
177. Check examples of InvTIMES. InvTIMES always contains a singleton bag, and
always contains a bag of primes.
178. Restrict the range of InvTIMES to bags of primes. Call this PrimeTimes.
179. Restrict the range of InvTIMES to singletons. Called SingleTimes.
180. Fill in examples of Primetimes. Many found.
181. Check examples of Primetimes. Always a singleton set. User renames this
conjecture "The unique factorization theorem".
182. Fill in examples of SingleTIMES. Many found.
183. Check examples of SingleTIMES. Always a singleton set. SingleTIMES is
actually the same as Baginsert!
184. Fill in examples of Selfsetunion. Many found.
185. Check examples of Selfsetunion. This operation is the same as Identity.
186. Fill in examples of Selfbagunion. Many found.
187. Check examples of Selfbagunion. Confirmed. Nothing interesting noticed.
188. Fill in examples of InvADD.
189. Check examples of InvADD. Hordes of boring conjectures, so:
190. Restrict the domain of InvADD to primes (InvAddprimes), to evens (InvAddevens), to squares, etc.
191. Fill in examples of Invaddprimes. Many found.
192. Check examples of Invaddprimes. Confirmed, but nothing special noticed.
193. Fill in examples of Invaddevens. Many found.
194. Check examples of Invaddevens. Always contains a bag of primes.
195. Restrict the range of InvAddevens to bags of primes. Called PrimeADD.
196. Restrict the range of InvADD to singletons. Call that new operation SingleADD.
197. Fill in examples of PrimeADD. Many found.
198. Check examples of PrimeADD. Always a nonempty set (of bags of primes). User
renames this conjecture "Goldbach's conjecture".
199. Fill in examples of SingleADD. Many found.
200. Check examples of SingleADD. Always a singleton set. This operation is the same
as Baginsert and SingleTIMES.
201. Restrict the range of PrimeADD to singletons, by analogy to PrimeTIMES. Call
the new operation PrimeADDSING.
202. Fill in examples of PrimeADDSING. Many found.
203. Check examples of PrimeADDSING. Nothing special noticed.
204. Fill in examples of Timessq. Many examples found.
205. Check domain/range of Timessq. Is the range actually Perfectsquares? Yes!
206. Fill in examples of Times1. Recall that Times1(x)≡TIMES(1,x).
207. Check examples of Times1. Apparently just a restriction of Identity.
208. Check examples of Timessq. Confirmed.
209. Fill in examples of Times0.
210. Fill in examples of Times2.
211. Check examples of Times2. Apparently the same as Doubling. That is, x+x=2×x.
Very important. By analogy, define Ad2(x) as x+2.
212. Fill in examples of Ad2.
213. Check examples of Ad2. Nothing interesting noticed.
In task 201, AM is asking which numbers are uniquely representable as the sum of two primes.
Recall that Timessq is just TIMES restricted to operate on perfect squares.
214. Fill in specializations of Add. Among those created are: Add0 (x+0), Add1, Add3,
ADDsq (addition restricted to perfect squares), Addev (sum of even numbers),
Addpr (sum of primes), etc.
215. Check examples of Times0. The value always seems to be 0.
216. Fill in examples of Timesev. Many examples found.
217. Check examples of Timesev. Apparently all the results are Evens.
218. Fill in examples of Timestoev. Many found.
219. Fill in examples of Timestosq. Only a few found.
220. Check examples of Timestosq. All arguments always seem to be squares. Conjec:
Timestosq is really the same as Timessq. Merge the two. This is a false
conjecture, but did AM no harm.
221. Check examples of Timestoev. The domain always contains an even number.
222. Fill in examples of SelfUnion.
223. Check examples of SelfUnion.
224. Fill in examples of SubSet.
225. Check example of SubSet.
226. Fill in examples of SuperSet.
227. Check examples of SuperSet. Conjec: Subset(x,y) iff Superset(y,x). Important.
228. Fill in examples of ComposeoCompose1. AM creates some explosive combinations
(e.g., (ComposeoCompose)o(ComposeoCompose)o(ComposeoCompose)), some poor
ones (e.g., SquareoCountoADD⁻¹), and even a very few winners (e.g.,
SUB1oCountoSelfInsert).
229. Check examples of ComposeoCompose1.
230. Fill in examples of ComposeoCompose2. AM recreates many of the previous
tasks' operations.
231. Check examples of ComposeoCompose2. Nothing noticed yet.
232-252. Fill in and check examples of the losing compositions just created.
253. Fill in examples of Addsq (i.e., sum of squares).
254. Check domain/range entries of Addsq. The range is not always perfect squares.
Define Addsqsq(x,y), which is True iff x and y are perfect squares and their sum
is a perfect square as well.
255. Fill in examples of Addpr; i.e., addition of primes.
256. Check Domain/range entries of Addpr. AM defines the set of pairs of primes
whose sum is also a prime. This is a bizarre derivation of prime pairs.
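As an illustration only (this is modern Python, not AM's LISP code), the bag representation
behind tasks 44 through 57 can be sketched as follows. The sketch assumes that a Number is a
bag containing a single repeated atom, here called T, and uses Python's Counter as a stand-in
for a bag. All function names are hypothetical stand-ins for the concepts named in the trace.

```python
from collections import Counter

T = "T"  # the single atom from which canonical bags (Numbers) are built


def number(n):
    """Canonical bag for n: a bag holding n copies of T (empty bag for 0)."""
    return Counter({T: n}) if n else Counter()


def size(bag):
    """Size (task 44): canonize any bag by replacing every element with T."""
    return number(sum(bag.values()))


def add2(x, y):
    """Bagunion restricted to Numbers (tasks 45-47)."""
    return x + y


def minimum(x, y):
    """Bagintersect restricted to Numbers (task 48)."""
    return x & y


def subtract(x, y):
    """Bagdifference restricted to Numbers (task 52); bottoms out at the empty bag."""
    return x - y


def leq(x, y):
    """LEQ (task 53): x <= y iff 'subtracting' y from x leaves the empty bag."""
    return not subtract(x, y)


def times(x, y):
    """TIMES (task 57): join one copy of y for each element of x."""
    result = Counter()
    for _ in range(x[T]):
        result = result + y
    return result
```

Under these assumptions, add2(number(2), number(3)) equals number(5), and leq(x, x) holds for
every Number x, which mirrors the ConstantTrue observation in task 56.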
Recall that Timesev is just like TIMES restricted to operating on even numbers.
For Timestoev, consider bags of numbers which multiply to give an even number.
Recall that the difference between ComposeoCompose2 and ComposeoCompose1 is merely in the
order of the composing: Fo(GoH) versus (FoG)oH.
Later on, AM will use these new operations to discover the associativity of Compose.
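The number-theoretic conjectures in the trace (tasks 149 through 203) can be spot-checked on
small numbers. The following Python sketch is an illustration under stated assumptions, not
AM's method: divisors are found by brute force, bags are represented as sorted tuples, and
prime_add is restricted to pairs of primes, following the note about sums of two primes. All
function names are hypothetical. A finite check of this kind is evidence, not proof.

```python
def divisors_of(n):
    """Set of positive divisors of n (brute force; adequate for small n)."""
    return {d for d in range(1, n + 1) if n % d == 0}


def in_ndiv(n, k):
    """True iff n belongs to kDiv, i.e. n has exactly k divisors."""
    return len(divisors_of(n)) == k


def inv_times(n, smallest=2):
    """InvTIMES: all bags (sorted tuples) of numbers >1 whose product is n."""
    if n == 1:
        return [()]
    bags = []
    for d in range(smallest, n + 1):
        if n % d == 0:
            bags += [(d,) + rest for rest in inv_times(n // d, d)]
    return bags


def prime_times(n):
    """InvTIMES with its range restricted to bags of primes (task 178)."""
    return [bag for bag in inv_times(n) if all(in_ndiv(p, 2) for p in bag)]


def prime_add(n):
    """Pairs of primes summing to n (assumed reading of PrimeADD, task 195)."""
    return [(p, n - p) for p in range(2, n // 2 + 1)
            if in_ndiv(p, 2) and in_ndiv(n - p, 2)]


LIMIT = 60  # arbitrary small bound for the spot checks

# Task 161: x has 2 divisors iff x^2 has 3 divisors.
conjecture_161 = all(in_ndiv(x, 2) == in_ndiv(x * x, 3) for x in range(1, LIMIT))
# Task 181: PrimeTimes is always a singleton set (unique factorization).
unique_factorization = all(len(prime_times(n)) == 1 for n in range(2, LIMIT))
# Task 198: PrimeADD of an even number (>= 4) is always nonempty (Goldbach).
goldbach = all(prime_add(n) for n in range(4, LIMIT, 2))
```

For example, inv_times(12) yields the four bags (2,2,3), (2,6), (3,4) and (12,), of which only
(2,2,3) consists entirely of primes.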
6.1.2. Two-Dimensional Behavior Graph
On the next two pages is a graph of the same "best run" which AM executed. The nodes
are concepts, and the links are actions which AM performed. Labels on the links indicate
when each action was taken, so the reader may observe how AM jumped around. It should
also be easy to perceive from the graph which paths of development were abandoned, which
concepts ignored, and which ones concentrated upon. These are precisely the features of
AM's behavior which are awkward to infer from a simple linear trace (as in the previous
section).
In more detail, here is how to read the graph: Each node is a concept. To save space, these
names are often highly abbreviated. For example, "xO" is used in place of "TIMES0".
Each concept name is surrounded by from zero to four numbers:
318                288
      FROBNATION
310                291
The upper right number indicates the task number (see last section) during which examples
of this concept were filled in. The lower right number tells when they were checked. The
upper left number indicates when the Domain/range facet of that concept was modified.
Finally, the lower left number is the task number during which some new Algorithms for
that concept were obtained. A number in parentheses indicates that the task with that
number was a total failure.
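The node annotation just described can be captured in a small record. The following Python
sketch is illustrative only: the class and field names are hypothetical, and the assignment of
the four FROBNATION numbers to corners is an assumption about how the original figure was
laid out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRef:
    number: int           # task number from the task-by-task trace
    failed: bool = False  # True when the graph shows the number in parentheses


@dataclass
class ConceptNode:
    name: str
    domain_range_modified: Optional[TaskRef] = None  # upper left
    examples_filled: Optional[TaskRef] = None        # upper right
    algorithms_obtained: Optional[TaskRef] = None    # lower left
    examples_checked: Optional[TaskRef] = None       # lower right


def parse_ref(text):
    """Parse one corner label: '' -> None, '288' -> task 288, '(23)' -> failed task 23."""
    text = text.strip()
    if not text:
        return None
    failed = text.startswith("(") and text.endswith(")")
    return TaskRef(int(text.strip("()")), failed)


# Corner assignment assumed, not confirmed by the source.
frobnation = ConceptNode(
    "FROBNATION",
    domain_range_modified=parse_ref("318"),
    examples_filled=parse_ref("288"),
    algorithms_obtained=parse_ref("310"),
    examples_checked=parse_ref("291"),
)
```

Any of the four fields may be absent, matching the statement that a name is surrounded by
"from zero to four" numbers.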
Because of the limited space, it was decided that if a concept were ever renamed by the
user, then only that newer, mnemonic name would be given in the diagram. Thus there is
an arrow from "Coalesce" to "Square", an operation originally called "SelfTimes" by AM.
Sometimes, a concept will have under it a note of the form =GROK. This simply means that
AM eventually discovered that the concept was equivalent to the already-known concept
"Grok", and probably forgot about this one (merged it into the one it already knew about).
The "trail" of discovery may pick up again at that pre-existing concept. A node written as
*GROK means that the concept was really the same as "Grok", but AM never investigated it
enough to notice this.
Each node may have an arrow leading into it, and any number of arrows emanating from
it. The arrows indicate the creation of new concepts. Thus an arrow leading to concept
"Frobnate" indicates how that concept was created. An arrow directed away from Frobnate
points to a concept created as, e.g., a specialization or an example of Frobnate. No
arrowheads are in practice necessary: all arrows are directed downwards.
The arrows may be labelled, indicating precisely what they represent (e.g., composition,
restriction) and what the task number was when they occurred. For space reasons, the
following convention has proven necessary: if an arrow emanating from C is unnumbered,
it is assumed to have occurred at the same time as the arrow to its immediate left which also
points from C; if all the arrows emanating from C have no number, then all their times of
occurrence are assumed to be the lower right number of C. Finally, if C has no lower
right number, the arrow is assumed to have the value of the upper right number of C.
An unlabelled arrow is assumed to be an act of Specialization or the creation of an
Example. Labels, when they do occur, are given in capitals and small letters; concept
names (nodes) are by contrast in all capitals.
All the numbers correspond to those given to the tasks in the task-by-task traces presented
in the last section (p. 115) and in Appendix 5 (p. 294).
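The numbering convention for arrows amounts to a small default-resolution rule, which can be
sketched in Python as follows. This is an illustration with hypothetical class and function
names. The convention as stated does not say what an unnumbered arrow means when a numbered
sibling exists only to its right; the sketch returns None in that case.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Arrow:
    target: str
    task: Optional[int] = None  # None means the arrow is unnumbered on the graph


@dataclass
class Node:
    name: str
    upper_right: Optional[int] = None
    lower_right: Optional[int] = None
    arrows: List[Arrow] = field(default_factory=list)  # ordered left to right


def arrow_times(node):
    """Resolve the task number of each arrow leaving node, left to right."""
    if all(a.task is None for a in node.arrows):
        # No arrow is numbered: fall back to the lower right number of the node,
        # or to the upper right number if there is no lower right one.
        if node.lower_right is not None:
            default = node.lower_right
        else:
            default = node.upper_right
        return [default] * len(node.arrows)
    times = []
    previous = None  # number carried over from the nearest arrow to the left
    for arrow in node.arrows:
        if arrow.task is not None:
            previous = arrow.task
        times.append(previous)
    return times
```

For instance, a node whose three arrows are labelled 10, unnumbered, and 12 resolves to the
times 10, 10 and 12.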
The first part of this graph (presented below) contains static structural (and ultimately
numerical) concepts which were studied by AM:
[Graph, part 1: static structural concepts. Legible node labels: STRUCTURES, SETS, INTSET,
BISET, SINGLETONS, DOUBLETONS, EVENS, SQUARES, 1DIV, 0DIV (=EMPTY), PRIMES, 3DIV.]
The rest of the graph (presented on the next page) deals with activities which were
investigated:
An unlabelled arrow is often the creation of an Example because many concepts are created
while checking examples of some known concept. It should be clear in each context which is
happening. If not, refer to the short trace in the preceding section, and look up the
appropriate task number.
[Graph, part 2: activities investigated by AM. Legible node labels: xTOEVEN, xTOSQUARE,
xSQUARE, ROOT2, DIVSQ3, TIMES; legible link labels: Restrict, Analogy.]
... which is not singleton. Whether they did or not depended only on the equality or
inequality of the two arguments. There were tiny conjectures proposed which merely
re-echoed this general conclusion.
worths are initialized at the same value, the performance of AM doesn't collapse, although
it is noticeably degraded.
Certain mutilations of the priority-value scheme for tasks on the agenda will cripple AM,
but it can resist most of the small changes tried in various experiments.
Sometimes, removing just a single concept (e.g., Equality) was enough to block AM from
discovering some valuable concepts it otherwise got (in this case, Numbers). This makes
AM's behavior sound very fragile, like a slender chain of advancement. But on the other
hand, many concepts (e.g., TIMES, Timberline, Primes) were discovered in several
independent ways. If AM's behavior is a chain, it is multiply-stranded. More
experiments of this sort should be done to test this general conclusion about AM.
The heuristics are specific to their stated domain of applicability. Thus when working in
geometry, the Operation heuristics were just as useful as they were when AM worked in
elementary set theory or number theory. The set of facets seemed adequate for those
domains, too. The Intuition facet, which was rejected as a valid source of information about
sets and numbers, might have been more acceptable in geometry (e.g., something similar to
Gelernter's model of a geometric situation).
All in all, then, we conclude that AM was fairly tough, and about as general as its heuristics
claimed it was. AM is not invincible, infallible, or universal. Its strength lies in careful use
of heuristics. If there aren't enough domain-specific heuristics around, the system will simply
not perform well in that domain. If the heuristic-using control structure of AM is tampered
with, there is some chance of losing vital guiding information which the heuristics would
otherwise supply.
7.1.7. How to Perform Experiments on AM
The very fact that the kinds of experiments mentioned in the last section (and described in
detail in Section 6.2) can be "set up" and performed on AM, reflects a nice quality of the
AM program.
Most of those experiments took only a matter of minutes to set up, only a few tiny
modifications to AM. For example, the one where all the Worth ratings were initialized to
the same value was done by evaluating the single LISP expression:
(MAPC CONCEPTS '(LAMBDA (c) (PUT c 'Worth 200)))
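For readers unfamiliar with LISP, the same experiment can be sketched in Python. This is an
analogy only, not AM's code: concepts are modeled as a dictionary of facet dictionaries, the
function name is hypothetical, and the starting Worth values are invented for illustration.
Only the value 200 comes from the expression above.

```python
def flatten_worths(concepts, value=200):
    """Set the Worth facet of every concept to the same value, in place."""
    for facets in concepts.values():
        facets["Worth"] = value
    return concepts


# Hypothetical starting state; the real initial Worth ratings are not given here.
concepts = {
    "Sets": {"Worth": 300},
    "Equal": {"Worth": 100},
    "Compose": {"Worth": 250},
}
flatten_worths(concepts)
```

After the call, every concept carries the same Worth, so any difference in AM's behavior would
have to come from its heuristics rather than from the initial ratings.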
Primes was discovered independently as follows: all numbers (>0) were seen to be
representable as the sum of smaller numbers; Add was known to be analogous to TIMES; but
not all numbers (>1) appeared to be representable as the product of two smaller ones; Rule
number 81 triggered (see Appendix 3, page 243), and AM defined the set of exceptions: the
set of numbers which could not be expressed as the product of two smaller ones; i.e., the
primes.
The chain is multiply-stranded except for a few weak spots, like Numbers. If they don't get
discovered, AM loses ...
An example of tampering with the control structure: treat all reasons as equivalent, so you
just COUNT the number of reasons a task has, to determine its priority on the agenda.
Similarly, here is how AM was modified to treat all tasks as if they had equal value: the
function Picktask has a statement of the form
(SETQ Currenttask (Firstmemberof Agenda))
All that was necessary was to replace the call on the function "Firstmemberof" by the
function "Randommemberof".
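The one-function swap described above can be sketched in Python as follows. This is an
illustration, not AM's code: the agenda is assumed to be a list already ordered from highest
to lowest priority, and the task descriptions and priority values are invented.

```python
import random


def first_member_of(agenda):
    """Take the task at the head of the agenda (the highest-priority one)."""
    return agenda[0]


def random_member_of(agenda, rng=random):
    """Take any task, ignoring priorities: the 'all tasks equal' experiment."""
    return rng.choice(agenda)


def pick_task(agenda, select=first_member_of):
    """Choose the next task; the selection policy is the only thing swapped."""
    return select(agenda)


# Hypothetical agenda entries: (task description, priority), highest first.
agenda = [
    ("Fill in examples of Primes", 800),
    ("Check examples of Squaring", 500),
    ("Fill in specializations of Allbutlast", 150),
]
normal_choice = pick_task(agenda)
random_choice = pick_task(agenda, select=random_member_of)
```

Keeping the policy as a parameter is what makes the experiment a one-line change.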
Even the most sophisticated experiment, the introduction of a new bunch of concepts (those
dealing with geometric notions like Between, Angle, Line), took only a day of conscious
work to set up.
Of course running the experiment involves the expenditure of hours of cpu time, so only a
limited number were actually performed.
There are certain experiments one can't easily perform on AM: removing all its heuristics,
for example. Most heuristic search programs would then wallow around, displaying just
how big their search space really was. But AM would just sit there, since it'd have nothing
plausible to do.
Many other experiments, while cute and easy to set up, are quite costly in terms of cpu time.
For example, the class of experiments of the form: "remove heuristics x, y, and z, and
observe the resultant effect on AM's behavior". This observation would entail running AM
for an hour or two of cpu time! Considering the number of subsets of heuristics, not all
these questions are going to get answered in our universe's lifetime. Considering the small
probable payoff from any one such experiment, very few should actually be attempted.
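The cost argument can be made concrete with a little arithmetic: n heuristics have 2^n
subsets, and each subset costs one run. The Python sketch below is illustrative; the function
name is hypothetical and the values of n are arbitrary examples, not the actual number of
heuristics in AM. The figure of about 1.4e10 years for the age of the universe is an outside
fact, not from this document.

```python
HOURS_PER_YEAR = 24 * 365


def ablation_cpu_hours(num_heuristics, hours_per_run=1.0):
    """CPU hours needed to run AM once for every subset of the heuristics."""
    return (2 ** num_heuristics) * hours_per_run


# Illustrative sizes only.
for n in (10, 20, 50, 100):
    years = ablation_cpu_hours(n) / HOURS_PER_YEAR
    print(f"{n:3d} heuristics: about {years:.3g} cpu-years")
```

Even at one hour per run, 100 heuristics would need more than 1e25 cpu-years, far beyond
the roughly 1.4e10 years the universe has existed.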
One nice experiment would be to monitor the contribution each heuristic is making. That
is, record each time it is used and record the final outcome of its activation (which may be
several cycles later). Unfortunately, AM's heuristics are not all coded as separate Lisp
entities, which one could then "trace". Rather, they are often interwoven with each other
into large program pieces. So this experiment can't be easily set up and run on AM.
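If the heuristics were separate callable entities, the monitoring experiment would be
straightforward. The Python sketch below shows the idea under that assumption; it is not how
AM is built, and the example rule and its threshold are hypothetical. The sketch records only
the immediate result of each activation, not an outcome that emerges several cycles later.

```python
import functools
from collections import defaultdict

usage_log = defaultdict(list)  # heuristic name -> list of recorded outcomes


def traced(heuristic):
    """Wrap a heuristic so that every activation and its result are logged."""
    @functools.wraps(heuristic)
    def wrapper(*args, **kwargs):
        outcome = heuristic(*args, **kwargs)
        usage_log[heuristic.__name__].append(outcome)
        return outcome
    return wrapper


@traced
def suggest_specializations(concept, examples_found):
    """Hypothetical rule: if examples were very easy to find, suggest specializing."""
    if examples_found > 20:  # threshold invented for illustration
        return f"Fill in specializations of {concept}"
    return None


suggest_specializations("Sets", 30)
suggest_specializations("Equal", 2)
```

The log then shows how often each rule fired and what it proposed, which is the kind of
contribution record the experiment calls for.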
Most of the experiments one could think of can be quickly set up, but only by someone
familiar with the LISP code of AM. It would be quite hard to modify AM so that the
untrained user could easily perform these experiments. Essentially, that would demand that
AM have a deep understanding of its own structure. This is of course desirable, fascinating,
challenging, but wasn't part of the design of AM.
In LISP, the function Firstmemberof is actually abbreviated "CAR".
The experiments actually performed are those described in the last chapter. The series of
experiments began at the same time that this document was being written, and was intended
originally only as a diversion from the tedium of writing. The interesting character of their
results convinced me they should be included, even though they are few in number and quite
incomplete.
A suggestion for future research projects in this general area: such systems should be
designed in a way which facilitates a poorly-trained user not only using the system but
experimenting on it.
7.1.8. Future Implications of this Project
One harsh measure of AM would be to demand what possible applications it will have.
This really means (i) the uses for the AM system, (ii) the uses for the ideas of how to create
such systems, (iii) conclusions about math and science one can draw from experiments with
AM.
Here are some of these implications, both real and potential:
1. New tools for computer scientists who want to create large knowledge-based systems to
emulate some creative human activity.
1a. The modular representation of knowledge that AM uses might prove to be
effective in any knowledge-based system. Division of a global problem into a multitude
of small chunks, each of them of the form of setting up one quite local "expert" on some
concept, is a nice way to make a hard task more manageable. Conceivably, each needed
expert could be filled in by a human who really is an expert on that topic. Then the
global abilities of the system would be able to rely on quite sophisticated local criteria.
Fixing a set of facets once and for all permits effective inter-module communication.
1b. Some ideas may carry over unchanged into many fields of human creativity,
wherever local guiding rules exist. These include: (a) ideas about heuristics having
domains of applicability, (b) the policy of tacking them onto the most general knowledge
source (concept, module) they are relevant to, (c) the rippling scheme to locate relevant
knowledge, etc.
2. A body of heuristics which can be built upon by others.
2a. Most of the particular heuristic judgmental criteria for interestingness, utility,
etc., might be valid in developing theorizers in other sciences. Recall that each rule has
its domain of applicability; many of the heuristics in AM are quite general.
2b. Just within the small domain in which AM already works, this base of
heuristics might be enlarged through contact with various mathematicians. If they are
willing to introspect and add some of their "rules" to AM's existing base, it might
gradually grow more and more powerful.
2c. Carrying this last point to the limit of possibility, one might imagine the
program possessing more heuristics than any single human. Of course, AM as it stands
now is missing so much of the 'human element', the life experiences that a
mathematician draws upon continually for inspiration, that merely amassing more
heuristics won't automatically push it to the level of a superhuman intelligence.
Another far-out scenario is that of the great mathematicians of each generation pouring
their individual heuristics into an AM-like system. After a few generations have come
and gone, running that program could be a valuable way to bring about 'interactions'
between people who were not contemporaries.
3. New and better strategies for math educators. [optional]
3a. Since the key to AM's success seems to be its heuristics, and not the particular
concepts it knows, the whole orientation of mathematics education should perhaps be
modified, to provide experiences for the student which will build up these rules in his
mind. Learning a new theorem is worth much less than learning a new heuristic which
lets you discover new theorems. I am far from the first to urge such a revision (see,
e.g., [Koestler 67], p.265, or see [Papert 72]).
3b. If the repertoire of intuition (simulated real-world scenarios) were sufficient for
AM to develop elementary concepts of math, then educators should ensure that children
(4-6 years old) are thoroughly exposed to those scenarios. Such activities would include
seesaws, slides, piling marbles into pans of a balance scale, comparing the heights of
towers built out of cubical blocks, solving a jigsaw puzzle, etc. Unfortunately, AM failed
to show the value of these few scenarios. This was a potential application which was
not confirmed.
3c. One use for AM itself would be as a "fun" teaching tool. If a very nice user
interface is constructed, AM could serve as a model for, say, college freshmen with no
math research experience. They could watch AM, see the kinds of things it does, play
with it, and perhaps get a real flavor for (and get turned on by) doing math research. A
vast number of brilliant minds are too turned off by high-school drilling and college
calculus to stick around long enough to find out how exciting and different research
math is compared to textbook math.

4. Further experiments on AM might tell us something about how the theory formation task
changes as a theory grows in sophistication. For example, can the same methods which
lead AM from pre-mathematical concepts to arithmetic also lead AM from number
systems up to abstract algebra? Or is a new set of heuristic rules or extra concepts
required? My guess is that a few of each are lacking currently, but only a few. There is
a great deal of disagreement about this subject among mathematicians. By tracing
along the development of mathematics, one might categorize discoveries by how easy
they would be for an AM-like system to find. Sometimes, a discovery required the
invention of a brand new heuristic rule, which would clearly be beyond AM as
currently designed. Sometimes, discovery is based on the lucky random combination of
existing concepts, for no good a priori reason. It would be instructive to find out how
often this is necessarily the case: how often can't a mathematical discovery be motivated
and "explained" using heuristic rules of the kind AM possesses?
5. An unanticipated result was the creation of new-to-Mankind math (both directly and by
defining new, interesting concepts to investigate by hand). The amount of new bits of
mathematics developed to date is minuscule.
5a. As described in (2c) above, AM might absorb heuristics from several
individuals and thereby integrate their particular insights. This might eventually result
in new mathematics being discovered.
5b. An even more exciting prospect, which never materialized, was that AM
would find a new redivision of existing concepts, an alternate formulation of some
established theory, much like Hamiltonian mechanics is an alternate unification of the
data which led to Newtonian mechanics. The only rudimentary behavior along these
lines was when AM occasionally derived a familiar concept in an abnormal way (e.g.,
TIMES was derived in four ways; Prime pairs were noticed by restricting Addition to
primes).
27 Usually. One kind of exception is the following: the ability to take a powerful theorem, and extract from it a new,
powerful heuristic. AM cannot do this, but it may turn out that this mechanism is quite crucial for humans'
obtaining new heuristics. This is another open research problem.
7.1.9. Open Problems: Suggestions for Future Research
While AM can and should stand as a complete research project, part of its value will stem
from whatever future studies are sparked by it. Of course the "evaluation" of AM along
this dimension must wait for years, but even at the present time several such open problems
come to mind:
Devise Metaheuristics, rules capable of operating on and synthesizing new heuristic
rules. AM has shown the solution of this problem to be both nontrivial and
indispensable. AM's progress ground to a halt because fresh, powerful heuristics
were never produced. The next point suggests that the same need for new rules
exists in mathematics as a whole:
Examine the history of mathematics, and gradually build up a list of the heuristic
rules used. Does the following thesis have any validity: "The development of
mathematics is essentially the development of new heuristics." That is, can we 'factor
out' all the discoveries reachable by the set of heuristics available (known) to the
mathematicians at some time in history, and then explain each new big discovery
as requiring the synthesis of a brand new heuristic? For example, Bolyai and
Lobachevsky did this a century ago when they decided that counterintuitive
systems might still be consistent and interesting. Non-Euclidean geometry resulted,
and no mathematician today would think twice about using the heuristic they
developed. Einstein invented a new heuristic more recently, when he dared to
consider that counterintuitive systems might actually have physical reality. 28 What
was once a bold new method is now a standard tool in theoretical physics.
In a far less dramatic vein, a hard open problem is that of building up a body of
rules for symbolically instantiating a definition (a LISP predicate). These rules may
be structured hierarchically, so that rules specific to operating on 'operations whose
domain and range are equal' may be gathered. Is this set finite and manageable; i.e.,
does some sort of "closure" occur after a few hundred (thousand?) such rules are
assembled?
More generally, we can ask for the expansion of all the heuristic rules, of all
categories. This may be done by eliciting them from famous mathematicians, or
automatically by the application of very sophisticated metaheuristics. Some
categories of rules include: how to generalize/specialize definitions, how to find
examples of a given concept, how to optimize LISP algorithms.
Experiments can be done on AM. A few have been performed already, many more
are proposed in Section 6.2, and no doubt some additional ones have already
occurred to the reader.
28 As Courant says, "When Einstein tried to reduce the notion of 'simultaneous events occurring at different places' to
observable phenomena, when he unmasked as a metaphysical prejudice the belief that this concept must
have a scientific meaning in itself, he had found the key to his theory of relativity."
Extend the analysis already begun (see p. 59) of the set of heuristics AM possesses.
One reason for such an analysis would be to achieve a better understanding of the
contribution of the heuristics. In some sense, the heuristics and the choice of
starting concepts "encode" the discoveries which AM makes, and the way it makes
them. A better understanding of that encoding may lead to new ideas for AM and
for future AM-like systems.
Rewrite AM. In Chapter 1, on page 9, it was pointed out that there are two
common species of heuristic search programs. One type has a legal move
generator, and heuristics to constrain it. The second type, including AM, has only
a set of heuristics, and they act as plausible move generators. Since AM seemed to
create new concepts, propose new conjectures, and formulate new tasks in a very
few distinct ways, it might very well be feasible to find a purely syntactic "legal
move generator" for AM, and to convert each existing heuristic into a form of
constraint. In that case, one could, e.g., remove all the heuristics and still see a
meaningful (if explosive) activity proceed. There might be a few surprises down
that path.
A more tractable project, a subset of the former one, would be to recode just the
conjecture-finding heuristics as constraints on a new, purely syntactic "legal
conjecture generator". A simple Generate-and-Test paradigm would be used to
synthesize and examine large numbers of conjectures. Again, removing all the
heuristics would be a worthwhile experiment.
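The Generate-and-Test proposal can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from AM: the single conjecture format ("A is a specialization of B"), the three toy concepts, the data range, and the names legal_conjectures, holds_on_data and generate_and_test. Heuristics appear only as optional constraints on the syntactic generator, so passing none of them corresponds to the "remove all the heuristics" experiment.

```python
from itertools import permutations

def legal_conjectures(names):
    """Purely syntactic generator: every ordered pair (A, B) stands for
    the conjecture 'A is a specialization of B'."""
    for a, b in permutations(names, 2):
        yield (a, b)

def holds_on_data(conjecture, concepts, data):
    """Empirical test: every known example of A must also satisfy B."""
    a, b = conjecture
    examples = [x for x in data if concepts[a](x)]
    return bool(examples) and all(concepts[b](x) for x in examples)

def generate_and_test(concepts, data, constraints=()):
    """Heuristics act only as constraints that veto generated conjectures."""
    survivors = []
    for conj in legal_conjectures(list(concepts)):
        if all(ok(conj) for ok in constraints):
            if holds_on_data(conj, concepts, data):
                survivors.append(conj)
    return survivors

# Hypothetical toy concepts, each given by a characteristic predicate.
concepts = {
    "Even": lambda n: n % 2 == 0,
    "MultipleOf4": lambda n: n % 4 == 0,
    "Odd": lambda n: n % 2 == 1,
}
data = list(range(1, 50))

# With no constraints, all six syntactically legal conjectures are tested.
no_heuristics = generate_and_test(concepts, data)
# A hypothetical constraint that refuses to consider conjectures about Odd.
avoid_odd = generate_and_test(concepts, data,
                              constraints=[lambda c: "Odd" not in c])
```

With only three concepts the unconstrained run is cheap; the number of generated conjectures grows quadratically with the number of concepts, which is the "explosive" behavior the constraints are meant to tame.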
At the reaches of feasibility, one can imagine trying to extend AM into more and
more fields, into less-formalizable domains. International politics has already been
suggested as a very hard future applications area.
Abstracting that last point, try to build up a set of criteria which make a domain
ripe for automating (e.g., it possesses a strong theory, it is knowledge-rich (many
heuristics exist), the performance of the professionals/experts is much better than
that of the typical practitioners, the new discoveries in that field all fall into a small
variety of syntactic formats,...?). Initially, this study might help humans build better
and more appropriate scientific discovery programs. Someday, it might even permit
the creation of an automatic-theory-formation-program-writer.
The interaction between AM and the user is minimal and painful. Is there a more
effective language for communication? Should several languages exist, depending
on the type of message to be sent (pictures, control characters, a subset of natural
language, induction from examples, etc.)? Can AM's output be raised in
sophistication by introducing an internal model of the user and his state of
knowledge at each moment?
Human protocol studies may be appropriate, to test out the model of mathematical
research which AM puts forward. Are the sequences of actions similar? Are the
mistakes analogous? Do the pauses which the humans emit quantitatively
correspond to AM's periods of gathering and running 'Suggest' heuristics?
Can the idea of Intuition functions be developed into a useful mechanism? If not,
how else might real-world experiences be made available to an automated
researcher to draw upon (for analogies, to base new theories upon)? Could one
interface physical effectors and receptors and quite literally allow the program to
'play around in the real world' for his analogies?
Most of the 'future implications' discussed in the last section suggest future activities
(e.g., new educational experiments and techniques).
Most of the 'limiting assumptions' discussed in a later section (page 157) can be
tackled with today's techniques (plus a great deal of effort). Thus each of them
counts as an open problem for research.
Perform an information-theoretic analysis on AM. What is the value of each
heuristic? the new information content of each new conjecture?
If you're interested in natural language, the very hard problem exists of giving AM
(or a similar system) the ability to really do inferential processing on the reasons
attached to tasks on the agenda. Instead of just being able to test for equality of
two reasons, it would be much more intelligent to be able to infer the kind of
relationship between any two reasons; if they overlap semantically, we'd like to be
able to compute precisely how that should affect the overall rating for the task;
etc.
Modify the control structure of AM, as follows. Allow mini-goals to exist, and
supply new rules for setting them up (plausible goal generators) and altering those
goals, plus some new rules and algorithms for satisfying them. The modification I
have in mind would result in new tasks being proposed because of certain current
goals, and existing tasks would be reordered so as to raise the chance of satisfying
some important goal. Finally, the human watching AM would be able to observe
the rationality (hopefully) of the goals which were set. The simple "Focus of
Attention" mechanism already in AM is a tiny step in this goal-oriented direction.
Note that this proposal itself demonstrates that AM is not inherently opposed to a
goal-directed control structure. Rather, AM simply possesses only a partial set of
mechanisms for complete reasoning about its domain.
7.1.10. Comparison to Other Systems
One popular way to judge a system is to compare it to other, similar systems, and/or to
others' proposed criteria for such systems. There is no other project (known to the author)
having the same objective: automated math research. 29 Many somewhat related efforts have
been reported in the literature and will be mentioned here.
Several projects have been undertaken which overlap small pieces of the AM system and in
addition concentrate deeply upon some area not present in AM. For example, the CLET
system [Badre 73] worked on learning the decimal addition algorithm, 30 but the
"mathematics discovery" aspects of that system were neither emphasized nor worth
emphasizing; it was an interesting natural language communication study. The same
comment applies to several related studies by IMSSS. 31
29 In [Atkin & Birch 1971], e.g., we find no mention of the computer except as a number cruncher.
Boyer and Moore's theorem-prover [Boyer&Moore 75] embodies some of the spirit of AM
(e.g., generalizing the definition of a LISP function), but its motivations are quite different,
its knowledge base is minimal, and its methods purely formal. 32 The same comments apply
to the SAM program [Guard 69], in which a resolution theorem-prover is set to work on
unsolved problems in lattice theory.
Among the attempts to incorporate heuristic knowledge into a theorem prover, we should
also mention [Wang 60], [Pitrat 70], [Bledsoe 71], and [Brotz 74]. How did AM differ from
these "heuristic theorem-provers"? The goal-driven control structure of these systems is a
real but only minor difference from AM's control structure (e.g., AM's "focus of attention" is
a rudimentary step in that direction; see p. 150). The fact that their overall activity is
typically labelled as deductive is also not a fundamental distinction (since constructing a
proof is usually in practice quite inductive). Even the character of the inference processes
is analogous: the provers typically contain a couple of binary inference rules, like Modus
Ponens, which are relatively risky to apply but can yield big results; AM's few "binary"
operators have the same characteristics: Compose, Canonize, Logically-combine (disjoin and
conjoin). The main distinction is that the theorem provers each incorporate only a handful
of heuristics. The reason for this, in turn, is the paucity of good heuristics which exist for
the very general task environment in which they operate: domain-independent (asemantic)
predicate calculus theorem proving. The need for additional guidance was recognized by
these researchers. For example, see [Wang 60], p. 3 and p. 17. Or as Bledsoe says: 33
There is a real difference between doing some mathematics and being a
mathematician. The difference is principally one of judgment: in the selection of a
problem (theorem to be proved); in determining its relevance;... It is precisely in
these areas that machine provers have been so lacking. This kind of judgment has
to be supplied by the user... Thus a crucial part of the resolution proof is the
selection of the reference theorems by the human user; the human, by this one
action, usually employs more skill than that used by the computer in the proof.
Many researchers have constructed programs which pioneered some of the techniques AM
uses. 34 [Gelernter 63] reports the use of prototypical examples as analogic models to guide
search in geometry, and [Bundy 73] employs models of "sticks" to help his program work
with natural numbers. The single heuristic of analogy was studied in [Evans 68] and
[Kling 71]. 35
30 Given the addition table up to 10 + 10, plus an English text description of what it means to carry, how and when to
carry, etc., actually write a program capable of adding two 3-digit numbers.
31 See [Smith 74a], for example.
32 This is not meant as criticism; considering the goals of those researchers, and the age of that system, their work is quite
significant.
33 [Bledsoe 71], p. 73
34 In many cases, those techniques were used for the first time, hence were thought of as tricks.
Theory formation systems in any field have been few. MetaDendral [Buchanan 74]
represents perhaps the best of these. Its task is to unify a body of mass spectral data
(examples of "proper" identifications of spectra) into a small body of rules for making
identifications. Thus even this system is given a fixed task, a fixed set of data to find
regularities within. AM, however, must find its own data, and take the responsibility for
managing its own time, for not looking too long at worthless data. 36 There has been much
written about scientific theory formation (e.g., [Hempel 52]), but very little of it is specific
enough to be of immediate use to AI researchers. A couple of pointers to excellent discussions
of this sort are: [Fogel 66], [Simon 73], and [Buchanan 75]. Also worth noting is a
discussion near the end of [Amarel 69], in which "formation" and "modelling" problems are
treated:
The problem of model finding is related to the following general question raised
by Schutzenberger [In discussion at the Conference on Intelligence and Intelligent
Systems, Athens, Ga., 1967]: 'What do we want to do with intelligent systems that
relates to the work of mathematicians?'. So far all we have done in this general
area is to emulate some of the reasonably simple activities of mathematicians,
which is finding consequences from given assumptions, reasoning, proving
theorems. A certain amount of work of this type was already done in the
propositional and predicate calculi, as well as in some other mathematical systems.
But this is only one aspect of the work that goes on in mathematics.
Another very important aspect is the one of finding general properties of
structures, finding analogies, similarities, isomorphisms, and so on. This is the
type of activity that is extremely important for our understanding of model-finding
mechanisms. Work in this area is more difficult than theorem-proving. The
problem here is that of theorem finding.
AM is one of the first attempts to construct a "theorem-finding" program. As Amarel noted,
it may be possible to learn from such programs how to tackle the general task of automating
scientific research.
Besides "math systems", and "creative thinking systems", and "theory formation systems", we
should at least discuss others' thoughts on the issue of algorithmically doing math research.
Some individuals feel it is not so far-fetched to imagine automating mathematical research
(e.g., Paul Cohen). Others (e.g., Polya) would probably disagree. The presence of a high-speed,
general-purpose symbol manipulator in our midst now makes investigation of that
question possible.
There has been very little published thought about discovery in mathematics from an
algorithmic point of view; even clear thinkers like Polya and Poincare' treat mathematical
ability as a sacred, almost mystic quality, tied to the unconscious. The writings of
philosophers and psychologists invariably attempt to examine human performance and
belief, which are far more manageable than creativity in vitro. Belief formulae in inductive
logic 37 invariably fall back upon how well they fit human measurements. The abilities of a
computer and a brain are too distinct to consider blindly working for results (let alone
algorithms!) one possesses which match those of the other.
35 Brotz's program, [Brotz 74], uses this to propose useful lemmata.
36 In case that wasn't clear: MetaDendral has a fixed set of templates for rules which it wishes to find,
and a fixed vocabulary of mass spectral concepts which can be plugged into those templates. AM also has only a few
stock formats for conjectures, but it selectively enlarges its vocabulary of math concepts.
7.2. Capabilities and Limitations of AM
The first two subsections contain a general discussion of what AM can and can't do. Later
subsections deal with powers and limitations inherent in using an agenda scheme, in fixing
the domain of AM, and in picking one specific model of math research to build AM upon.
The AM program exists only because a great many simplifying assumptions were tolerated;
these are discussed in Section 7.2.4 (p. 157). Finally, some speculation is made about the
ultimate powers and weaknesses of any systems which are designed very much like AM.
7.2.1. Current Abilities
What fields has AM worked in so far? AM is now able to explore a small bit of the theory
of sets, data types, numbers, and plane geometry. It by no means has been fed, nor has it
rediscovered, a large fraction of what is known in any of those fields. It might be more
accurate to be humble and restate those domains as: elementary finite set theory, trivial
observations about four kinds of data types, arithmetic and elementary divisibility theory,
and simple relationships between lines, angles, and triangles. So a sophisticated concept in
each domain which was discovered by AM might be:
• de Morgan's laws
• the fact that Delete∘Insert never alters Bags or Lists 38
• unique factorization
• similar triangles
Can AM work in a new field, like politics? AM can work in a new elementary, formalized
domain, if it is fed a supplemental base of conceptual primitives for that domain. To work
in plane geometry, it sufficed to give AM about twenty new primitive concepts, each with a
few parts filled in. Another domain which AM could work in would be elementary
mechanics. The more informal the desired field, the less of AM that is relevant. Perhaps an
AM-like system could be built for a constrained, precise political task. 39 Disclaimer: Even
for a very small domain, the amount of commonsense knowledge such a system would need
is staggering. It is unfortunate to provide such a trivial answer to such an important
question, but there is no easy way to answer it more fully until years of additional research
are performed.
37 [Hintikka 62], [Pietarinin 72]. The latter also contains a good summary of Carnap's λ,α formalization.
38 Take an item x, insert it into (the front of) structure B, then delete one (the first) occurrence of x from B.
39 For example, such a politics-oriented AM-like system might conceive the notion of a group of political entities which view
themselves as quite disparate, but which are viewed from the outside as a single unit: e.g., 'the Arabs', 'the
American Indians'. Conjectures about this concept might include its reputation as a poor combatant (and
why). Many of the same facets AM uses would carry over to represent concepts in that new domain.
Can AM discover X? Why didn't it do Y? It is difficult to predict whether AM will (without
modifications) ever make a specific given discovery. Although its capabilities are small, its
limitations are hazy. What makes the matter even worse is that, given a concept C which
AM missed discovering, there is probably a reasonable heuristic rule which is missing from
AM, which would enable that discovery. One danger of this "debugging" is that a rule will
be added which only leads to that one desired discovery, and isn't good for anything else. In
that case, the new heuristic rule would simply be an encoding of a specific bit of
mathematics which AM would then appear to discover using general methods. This must
be avoided at all costs, even at the cost of intentionally giving up a certain discovery. If the
needed rule is general (it has many applications and leads to many interesting results),
then it really was an oversight not to include it in AM. Although I believe that there are
not too many such omissions still within the small realm AM explores, there is no objective
way to demonstrate that, except by further long tests with AM.
In what ways are new concepts created? Although the answer to this is accurately given in
Section 4.3, page 42 (namely, this is mainly the jurisdiction of the right sides of heuristic
rules), and although I dislike the simple-minded way it makes AM sound, the list below
does characterize the major ways in which new concepts get born:
Fill in examples of a concept (e.g., by instantiating or running its definition)
Create a generalization of a given concept (e.g., by weakening its definition)
Create a specialization of a given concept (e.g., by restricting its domain/range)
Compose two operations f,g, thereby creating a new one h. [Define h(x)=f(g(x))]
Coalesce an operation f into a new one g. [Define g(x)=f(x,x)]
Permute the order of the arguments of an operation. [Define g(x,y)=f(y,x)]
Invert an operation [g(x)=y iff f(y)=x] (e.g., from Squaring, create Squarerooting)
Canonize one predicate P1 with respect to a more general one P2 [create a new concept f,
an operation, such that: P2(x,y) iff P1(f(x),f(y))]
Create a new operation g, which is the repeated application of an existing operation f
The usual logical combinations of existing concepts x,y: x∧y, x∨y, ¬x, etc.
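Several of the operation-creating entries in the list above can be written directly from their bracketed definitions. The following is only an illustrative sketch in Python, not AM's LISP code; the function names, the finite candidate range used for inversion, and the fixed repetition count are assumptions introduced here.

```python
def compose(f, g):
    """Compose: h(x) = f(g(x))."""
    return lambda x: f(g(x))

def coalesce(f):
    """Coalesce: g(x) = f(x, x)."""
    return lambda x: f(x, x)

def permute(f):
    """Permute the argument order: g(x, y) = f(y, x)."""
    return lambda x, y: f(y, x)

def invert(f, candidates):
    """Invert: g(x) = y iff f(y) = x, found by searching a finite
    candidate set (an assumption of this sketch); None if no y fits."""
    pool = list(candidates)
    def g(x):
        for y in pool:
            if f(y) == x:
                return y
        return None
    return g

def repeat(f, n):
    """Repeated application of f, here a fixed number n of times."""
    def g(x):
        for _ in range(n):
            x = f(x)
        return x
    return g

add = lambda x, y: x + y
times = lambda x, y: x * y

doubling = coalesce(add)        # x + x
squaring = coalesce(times)      # x * x
square_root = invert(squaring, range(100))
```

For instance, square_root(49) searches the candidates for a y with y*y = 49 and returns 7, while square_root(50) returns None because no candidate squares to 50.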
Below is a similar list, giving the primary ways in which AM formulates new conjectures:
Notice that concept C1 is really an example of concept C2
Notice that concept C1 is really a specialization (or: generalization) of C2
Notice that C1 is equal to C2; or: almost always equal
Notice that C1 and C2 are related by some known concept
Check and update the domain/range of an existing operation
If two concepts are analogous, extend the analogy to their conjectures as well
In summary, we can say that AM has achieved its original purpose: to be guided
successfully by a large set of local heuristic rules, in the discovery of new mathematical
theories. Besides creating new concepts and noticing conjectures, AM has the key "ability"
of appearing to decide rationally what to work on at each moment. This is a result of the
agenda of tasks containing associated reasons. Of course all of these abilities stem from
the quality and the quantity of local heuristic rules: little plausible move generators and
evaluators.
7.2.2. Current Limitations
Below are several shortcomings of AM, which hurt its behavior but are not believed to be
inherent limitations of its design. They are presented in order of decreasing severity.
Perhaps the most serious limitation on AM's current behavior arose from the lack of
constraints on left sides of heuristic rules. It turned out that this excessive freedom made it
difficult for AM to inspect and analyze and synthesize its own heuristics; such a need was
not foreseen at the time AM was designed. It was thought that the power to manipulate
heuristic rules was an ability which the author must have, but which the system wouldn't
require. As it turned out, AM did successfully develop new concepts several levels deeper
than the ones it started with. But as the new concepts got further and further away from
those initial ones, they had fewer and fewer specific heuristics filled in (since they had to be
filled in by AM itself). Gradually, AM found itself relying on heuristics which were very
general compared to the concepts it was dealing with (e.g., forced to use heuristics about
Objects when dealing with Numbers). Heuristics for dealing with heuristics do exist, and
their number could be increased. This is not an easy job: finding a new metaheuristic is a
tough process. Heuristics are rarely more than compiled hindsight; hence it's difficult to
create new ones "before the fact".
AM has no notion of proof, proof techniques, formal validity, heuristics for finding
counterexamples, etc. Thus it never really establishes any conjecture formally. This could
probably be remedied by adding about 25 new concepts (and their 100 new associated
heuristics) dealing with such topics. The needed concepts have been outlined on paper, but
not yet coded. It would probably require a few hundred hours to code and debug them.
The user interface is quite primitive, and this again could be dramatically improved with
just a couple hundred hours' work. AM's explanation system is almost nonexistent: the user
must ask a question quickly, or AM will have already destroyed the information needed to
construct an answer. A clean record of recent system history and a nice scheme for tracking
down reasons for modifying old rules and adding new ones dynamically does not exist at
the level which is found, e.g., in MYCIN [Davis 76]. There is no trivial way to have the
system print out its heuristics in a format which is intelligible to the untrained user.
An important type of analogy which was untapped by AM was that between heuristics. If
two situations were similar, conceivably the heuristics useful in one situation might be
useful (or have useful analogues) in the new situation (see [Koppelman 75]). Perhaps this
is a viable way of enlarging the known heuristics. Such "meta-level" activities were kept to
a minimum throughout AM, and this proved to be a serious limitation. My intuition tells
me that the "right" ten meta-rules could correct this particular deficiency.
The idea of "Intuitions" facets was a flop. Intuitions were meant to model reality, at least
little pieces of it, so that AM could perform (simulate) physical experiments, and observe the
results. The major problem here was that so little of the world was modelled that the only
relationships derivable were those foreseen by the author. This lack of generality was
unacceptable, and the intuitions were completely excised. The original idea might lead
somewhere if it were developed fully. As with all limitations of AM, I leave this as an open
suggestion for future research.
Several limitations arose from the constraints of the agenda scheme, from the choice of finite
set theory as the domain to work in, and from the particular model of math research that
was postulated. These will be discussed in the next few subsections.
7.2.3. Limitations of the Agenda scheme
The following quibbles with the agenda scheme get less and less important. When you get
bored, skip to the next subsection.
Currently, it is difficult to include heuristics which interact with one another in any
significant way. The whole fibre of the Agenda scheme assumes perfect independence of
heuristics. The global formula used to rate tasks on the agenda assumes perfect
superposition of reasons: there are no "cross-terms". Is this assumption always valid?
Unfortunately no, not even for the limited domain AM has explored. Sometimes, two
reasons are very similar: "Examples of Sets would permit finding examples of Union" and
"Examples of Sets would permit finding examples of Intersection". In that case, their two
ratings shouldn't cause such a big increase in the overall priority value of the task "Fill-in
examples of Sets".
Sometimes, a heuristic rule will want to dissuade the system from some activity. Thus a
negative numeric contribution to a task's priority value is desired. This is not figured into
the current scheme. With a slight modification, the global formula could preserve the sign
(signum) of each reason's rating.
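The global formula is not reproduced in this section, so the following sketch simply assumes a root-sum-of-squares combination of the reasons' ratings; that choice, and the names combine_unsigned and combine_signed, are assumptions made for illustration. The first function shows pure superposition with no cross-terms, in which a dissuading (negative) rating would wrongly raise the priority. The second shows one possible "slight modification" that preserves the sign of each reason's rating.

```python
import math

def combine_unsigned(ratings):
    """Pure superposition: each reason contributes independently,
    and the sign of a rating is lost when it is squared."""
    return math.sqrt(sum(r * r for r in ratings))

def combine_signed(ratings):
    """Same superposition, but each squared rating keeps the sign of
    the original rating, so a reason can dissuade as well as support."""
    total = sum(math.copysign(r * r, r) for r in ratings)
    return math.copysign(math.sqrt(abs(total)), total)

supporting = [300, 400]
mixed = [500, -300]   # one reason argues against the task
```

Under these assumptions the two functions agree whenever every rating is positive; they differ only when some reason argues against the task, in which case the signed version lowers the priority instead of raising it.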
Tasks on the agenda list are ordered by their numeric priority value. Each reason's
numeric value is kept, too. When new reasons are added, these values are used to
recompute a new priority for the task. Each reason's rating was computed by a little
formula found inside some heuristic rule. Those formulae are not kept hanging around.
One big improvement in apparent intelligence could be attained by tacking on those little
formulae to the reasons. When a new reason is added, the old reasons' rating formulae
would be evaluated again. They might indeed give new numbers. For example, suppose
one reason was "Few examples of X are known". But by now, other tasks have meanwhile
inadvertently filled in several examples of X. Then that little reason's formula would come
up with a much lower value than it did originally. In fact, the value might be so low that
the reason was dropped altogether. If the formulae were kept, it might be good practice to
evaluate them for the top two or three tasks on the agenda, to see if they might change their
ordering. Also, the top task's priority would then be more accurate, and recall that its value
is used to determine the cpu time and list cell space quanta that the task is allowed to use
up. At the moment, AM is not set up to store the little functions, and if modified to do so, it
would use up a lot more space than it can afford. Also, the top few jobs are almost never
semantically coupled (except by "focus of attention"), so the precise order in which they are
executed rarely matters.
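The idea of keeping each reason's little rating formula, so that it can be re-evaluated later, can be sketched as follows. This is only an illustration in Python, not AM's Lisp code: the names `Reason`, `Task`, `State`, and `drop_below`, the example formulae, and the root-sum-of-squares combining rule are all assumptions made for the example.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, int]  # hypothetical snapshot of what the system currently knows


@dataclass
class Reason:
    text: str
    formula: Callable[[State], float]  # kept with the reason so it can be re-run


@dataclass
class Task:
    name: str
    reasons: List[Reason] = field(default_factory=list)

    def priority(self, state: State, drop_below: float = 50.0) -> float:
        # Re-evaluate every stored formula against the current state.
        ratings = [r.formula(state) for r in self.reasons]
        # A reason whose rating has sunk too low is ignored altogether.
        kept = [x for x in ratings if x >= drop_below]
        # Assumed combining rule (root of the sum of squares), for illustration only.
        return math.sqrt(sum(x * x for x in kept))


task = Task("Fill-in examples of X", [
    Reason("Few examples of X are known",
           lambda s: max(0.0, 600.0 - 150.0 * s["examples_of_X"])),
    Reason("X is used by an interesting operation", lambda s: 300.0),
])

before = task.priority({"examples_of_X": 0})  # both reasons contribute
after = task.priority({"examples_of_X": 4})   # the first reason has decayed to 0
```

In this sketch the task's priority falls once other tasks have filled in examples of X, which is the behaviour the paragraph above describes. The cost is that every reason now carries a closure, which is the space overhead the text warns about.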
Perhaps what is needed is not a single numeric priority value for each task, but a vector of numbers.
At each cycle, AM would construct a vector of its current "interests" and needs, and each
task's vector would be dot-multiplied against this global vector of AM's desires. The
highest scorer would then be chosen. For example, one dimension of the rating could be
"safety", and one could be "best possible payoff", one could be "average expected payoff", etc.
Sometimes, AM would have to break out of a stagnant situation, and it would be willing to
try riskier tasks than usual. This was not implemented because of the great increase in cpu
time it would cause. It is, however, probably a better design than the current one. Even
more intelligent schemes can be envisioned involving more and more symbolic data being
stored with each task. Ultimately, this would be just the English reasons themselves; by that
time, the task-orderer would have grown into an incredibly complex AI program itself (a
natural language program plus an interrelator plus...).
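A minimal Python sketch of the vector scheme follows. It was never implemented in AM, so everything here is an assumption: the three dimensions are the ones named in the text, while the task names, the numeric ratings, and the two "interest" vectors are invented purely for illustration.

```python
from typing import Dict, Sequence, Tuple

# Dimensions named in the text; their order is an arbitrary choice for this sketch.
DIMENSIONS = ("safety", "best_possible_payoff", "average_expected_payoff")

# Hypothetical per-task rating vectors, one number per dimension.
tasks: Dict[str, Tuple[float, float, float]] = {
    "Fill-in examples of Sets": (0.9, 0.3, 0.5),
    "Check generalizations of Union": (0.2, 0.9, 0.4),
}


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def choose(task_vectors: Dict[str, Sequence[float]],
           interests: Sequence[float]) -> str:
    # The highest scorer against the current interest vector is chosen.
    return max(task_vectors, key=lambda name: dot(task_vectors[name], interests))


cautious = (1.0, 0.2, 0.6)     # normal operation: safety weighs heavily
adventurous = (0.1, 1.0, 0.3)  # stagnant situation: willing to take risks

normal_choice = choose(tasks, cautious)
risky_choice = choose(tasks, adventurous)
```

With these made-up numbers the cautious vector selects the safe task and the adventurous vector selects the riskier one, so changing the global vector alone is enough to change which task runs next.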
The agenda list should really be an agenda tree,40 since the ordering of tasks is really just
partial, not total. If this is clear, then skip the rest of this paragraph. There are some
"legitimate" orderings of tasks on the agenda; if task X is supported by a subset of the
reasons which support Y, then typically the priority of X will be less than or equal to the
priority of Y. Two tasks of the form "Fill-in examples of A", "Fill-in examples of B" can be
ordered simply because A is currently much more interesting than B. But often, two tasks
will have no ironclad ordering between them: compare "Fill-in examples of Sets" and "Check
generalizations of Union". Thus the ordering is only partial, and it is the artifice of the
global evaluation function which embeds this into a linear ordering. If multiprocessors are
used, it might be advantageous to keep the original partial ordering around.
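The "legitimate" ordering by subsets of reasons can be sketched in a few lines of Python. This covers only the subset rule mentioned above; the function name `compare` and the example reason strings are assumptions, and AM itself never kept such a partial order.

```python
from typing import FrozenSet, Optional


def compare(x_reasons: FrozenSet[str], y_reasons: FrozenSet[str]) -> Optional[int]:
    """Return -1, 0, or 1 when the subset rule orders the two tasks, else None."""
    if x_reasons == y_reasons:
        return 0
    if x_reasons < y_reasons:   # X is supported by a proper subset of Y's reasons
        return -1
    if x_reasons > y_reasons:
        return 1
    return None                 # incomparable: no ironclad ordering exists


x = frozenset({"Sets are used by Union"})
y = frozenset({"Sets are used by Union", "Few examples of Sets are known"})
z = frozenset({"Union has an interesting generalization"})

x_vs_y = compare(x, y)  # X ranks no higher than Y
y_vs_z = compare(y, z)  # no ordering can be justified
```

A `None` result marks exactly the pairs that a single numeric priority forces into an arbitrary linear order.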
7.2.4. Limiting Assumptions
AM only "got off the ground" because a number of sweeping assumptions were made,
pertaining to what could be ignored, how a complex process could be adequately simulated,
etc. Now that AM is running, however, those same simplifications crop up as limitations to
the system's behavior. Each of the following points is a "convenient falsehood". Although
the reader has already been told about some of these, it's worth listing them all together
here:
The only communication necessary from AM to the user is keeping the user
informed of what AM is doing. No natural language ability is required by AM;
simple template instantiation is sufficient.
The only communication from the user to AM is an occasional interrupt, when the
user wishes to provide some guidance or to pose a query. Both of these can be
stereotyped and passed easily through a very narrow channel.41
Each heuristic has a well-defined domain of applicability, which can be specified
just by giving the name of a single concept.
If concept C1 is more specialized than C2, then C1's heuristics will be more
powerful and should be executed before C2's (whenever both concepts' heuristics
are relevant).
If h1 and h2 are two heuristics attached to concept C, then it is not necessary to
spend any time ordering them.
Heuristics superimpose perfectly; they never interact strongly with each other.
40 Or maybe an agenda Heap.
41 E.g., a set of escape characters, so ↑W means 'Why did you do that?', ↑U means
'Uninteresting! Go on to something else', etc.
The reasons supporting a task can be mere tokens; it suffices to be able to inspect
them for equality. They need not follow a constrained syntax. The value of a
reason is adequately characterized by a unidimensional numeric rating.
The reasons supporting a task superimpose perfectly; they never interact with each
other.
Supporting reasons, and their ratings, never change with time, with one
exception: the ephemeron 'Focus of attention'.
It doesn't matter in what order the supporting reasons for a task were added.
There is no need for negative or inhibitory reasons, which would decrease the
priority value of a task.
At any moment, the top few tasks on the agenda are not coupled strongly; it is not
necessary to expend extra processing time to carefully order them.
The tasks on the agenda are completely independent of each other, in the .
Integers: positive and negative whole numbers; i.e., ..., -2, -1, 0, 1, 2, ...
Map: used as a verb, this word indicates the action of applying a function or a relation; e.g.,
we say that squaring maps 7 into 49. Used as a noun, it is a synonym for function.
Mathematical concept: this is taken to mean all the constructions, definitions, conjectures,
operations, structures, etc. that a mathematician deals with. Some examples: Setintersection,
Sets, The unique factorization theorem, every entry listed in this glossary.
Mathematical intuition: this is the mental imagery which can be brought to bear. Typically,
we transform the situation to an abstract, simplified one, manipulate it there, and retranslate the results into the original notation. For example, our intuition about "ordering"
may involve the image of marks on a yardstick. We can then answer questions involving
ordering rapidly, using this representation. Three features of the intuitive image should be
noted: (i) it is typically fast and simple, (ii) it is opaque, one cannot introspect too easily on
"why it works", and (iii) it is fallible, occasionally leading to wrong results.
Mathematical research: The fundamental idea here is that mathematics is an empirical
science, just as much as chemistry or physics. In doing research, the ultimate goal is the
creation of new, interesting theories, but the techniques used include looking for patterns in
empirical data, inducing new conjectures, modelling some aspects of the real world, etc.
Although the final product looks like a smooth, formal development, magically flowing from
postulates to lemmas to theorems, the actual research process involved untold blind alleys,
rough guesses, and hard work. (Analogy: The process of painting is rarely itself artistic.)
Mathematical theory: to qualify as a theory, we must have (i) a basis of undefined primitive
terms, (ii) definitions involving these, (iii) axioms involving all the primitives and defined
terms, (iv) conjectures and theorems relating these terms. To be at all worthwhile, however,
the theory must also meet the fuzzy requirements that (v) there is some correspondence
between the primitives and some "real-world" concepts, between the axioms and some "real"
relationships, and (vi) some of the theorems are unexpected, hard to prove, elegant,
interesting, etc.
Mersenne prime: a prime number which happens to be of the form 2^p - 1, where p is prime.
Natural numbers: nonnegative integers; i.e., 0, 1, 2, 3, ...
No.: an abbreviation for "Number".
Number: in the typical loose fashion of computer scientists, I intend this to mean a nonnegative integer; i.e., a natural number.
Ordering: the concept of "before" and "after". This distinguishes a list from a bag
(multiset). The formal axioms for ordering simply state the obvious properties of the
intuitive image of a list.
Prime numbers: natural numbers which have no divisors other than 1 and themselves; e.g., 17,
but not 15 (=3x5). Primes are interesting because of the myriad times they crop up in
diverse theorems, from the Chinese Remainder Theorem (solving systems of linear
congruence equations), to the Law of Quadratic Reciprocity, to Fermat's Theorem (for all
integers n, for all primes p, n^p is congruent to n (mod p)). The "secret" of their value lies in
the fact that all integers can be factored uniquely into a set of prime divisors. This "Unique
Factorization Theorem" lets us reduce questions about integers to questions about primes.
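The statements in this entry are easy to check empirically. The Python sketch below is only an illustration; the function names `is_prime`, `prime_factors`, and `fermat_holds` are made up for the example, and the code follows the usual convention that 0 and 1 are not counted as primes.

```python
import math
from typing import List


def is_prime(n: int) -> bool:
    # Usual convention: 0 and 1 are not counted as primes.
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, math.isqrt(n) + 1))


def prime_factors(n: int) -> List[int]:
    """Trial division; returns the prime divisors of n (n >= 2) in ascending order."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


def fermat_holds(n: int, p: int) -> bool:
    # n^p is congruent to n (mod p); three-argument pow does modular exponentiation.
    return pow(n, p, p) == n % p


fermat_ok = all(fermat_holds(n, p)
                for n in range(-20, 21)
                for p in (2, 3, 5, 7, 11, 13, 17))
```

Here `prime_factors(15)` gives the factorization quoted in the entry, and `fermat_ok` spot-checks Fermat's Theorem over a small range of integers and primes.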

Prime pairs: two prime numbers whose difference is two; e.g., 17 and 19.
Relation: an operation which associates, for each element of some set D, a set of elements E =
{e1, e2, ...} of some set R. D and R are the domain and range of the relation. For example,
the relation "≤" associates to 5 the set of numbers {5, 6, 7, 8, ...}; i.e., all integers which 5 is
less than or equal to. The domain and range of this relation are the integers.
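As a small illustration of a relation associating a set of elements with each element of its domain, the Python sketch below computes that set for the "less than or equal to" example. The helper name `image` is an assumption, and since the integers are infinite the range is cut down to a finite window.

```python
import operator
from typing import Callable, Iterable, Set


def image(relation: Callable[[int, int], bool], x: int,
          candidates: Iterable[int]) -> Set[int]:
    """All candidate elements y such that x stands in the relation to y."""
    return {y for y in candidates if relation(x, y)}


window = range(-3, 9)  # finite stand-in for the integers: -3, ..., 8
related_to_5 = image(operator.le, 5, window)
```

Within this window the result is {5, 6, 7, 8}, the start of the infinite set given in the entry.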
Set-theoretic: having to do (in the context of this thesis) with elementary finite set theory
and the primitive notions of mathematics (e.g., union, insert, predicate, conjecture).
Unity: a fancy way of referring to the natural number "1".
|: The relation "divides-evenly-into". Thus we say 2|6.
¬: The operation of negation. "¬X" is read as "not X".
∨: Disjunction. "A∨B" is read as "A or B".
∧: Conjunction. "A∧B" is read as "A and B".
⊕: Exclusive or. "A⊕B" is read as "A or B, but not both".
→: Implication. "A→B" is read as "If A then B".
↔: Logical equivalence. "A↔B" is read as "A if and only if B".
∀: Universal quantification. "∀X" is read as "For all X".
∃: Existential quantification. "∃X" is read as "For some X".
Appendix 1.2. Glossary of AI Terms
ACTORs: A modular form of representation, useful for distributing the task of control
among several components in a computer program. Each ACTOR is a black box with no
parts or slots, but which does have some assertions (a "contract") which he must honor. It
merely responds to a fixed set of messages, by sending out certain messages of his own.
These are delivered via a bureaucracy. See [Hewitt 76].
AI: an abbreviation for Artificial Intelligence.
Bag: A bag is a kind of list structure, a bunch of elements which are unordered, but one in
which multiple copies of the same element are permitted. One may visualize a paper bag
filled with cardboard letters. Technically, we shall say that a set is not considered to be a
bag. A bag is denoted by enclosure within parentheses, just as sets are within braces. So
the bag containing X and four Y's might be written (X Y Y Y Y), and would be considered
indistinguishable from the bag (Y Y Y X Y).
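The distinction between bags, lists, and sets can be illustrated with Python's standard `collections.Counter`, used here as a stand-in for a bag. The element names are arbitrary, and this is not how AM represented bags (it used Lisp list structures).

```python
from collections import Counter

first = ["a", "b", "b", "b", "b"]
second = ["b", "b", "b", "a", "b"]

# As bags (multisets): order is ignored, but multiplicities are kept.
same_bag = Counter(first) == Counter(second)

# As lists: order matters, so the two differ.
same_list = first == second

# As sets: multiplicities are lost, leaving only the distinct elements.
as_set = set(first)
```

The two sequences are equal as bags, unequal as lists, and collapse to the same two-element set.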
BEINGs: A modular form of representation of knowledge, conceived as a collection of
cooperating experts. Each expert is modelled by one module, which consists of a list of
Question/Answering-program pairs. The set of questions is fixed for all the Beings in the
system. When any Being has a question, he broadcasts it to the entire system, and some
Being who recognizes it will take over control and try to answer it by running his
appropriate Answering-program. In the process of running this, some new questions may
arise. Notice that Beings distribute responsibility for control and for static knowledge. See
[Lenat 75b].
Bug: a flaw in a computer program. As Corey Sacerdoti put it, a bug refers to something
which is broken but not badly.
Concept: within the context of this document, the word "concept" typically refers to a precise
frame-like data structure, a BEING. Semantically, each concept is meant to correspond to
one abstract entity that we would intuitively call a concept: an object, an operator, a
conjecture, etc. See "facet".
Cooperating Knowledge Sources: Very often, in tackling a problem, one receives some hints
and some constraints from very different sources, phrased in very different languages, often
addressing different representations of the problem. For example, in trying to understand a
human speaker, our memory of the previous discussion and knowledge of the speaker may
narrow down the possible meanings of what he is saying. Our ears, of course, register the
precise acoustic waveforms he is uttering. Our English vocabulary forces us to interpret
imperfect signals as real words. Our eyes see his gestures and his lip movements, and give
us more information. All these different sources of information must be used, and yet they
all are talking in different "languages" to us. The most trivial solution is to keep all the
sources independent, and keep working until one of them can solve the problem all by itself.
A much better solution is to transform all their babblings into one canonical representation,
one single language. This way, all the knowledge sources can cooperate.
Coupled: two functional subsystems are causally connected; one influences the other. See the
entry for "Linear".
CPU time: Central-Processing-Unit run time (cpu time) is the number of execution cycles of
the computer that the AM program has used up. This is conveniently measured in seconds,
minutes, and hours, where one cpu minute is the amount of processing done in one minute
of real time, when AM has 100% of the machine, and is running without any input or
output.
CS: an abbreviation for Computer Science.
Execution: a program is actually used by running it on a particular set of input data. This
process is known as program execution.
Facet: Within the context of this document, the word "facet" denotes a slot of the kind of
datastructure known as "concepts" (qv). Thus "a facet of the Compose concept" really just
means a slot of a particular frame, a part of a certain BEING, one single attribute/value pair
taken from the property list of the Lisp atom named Compose. Semantically, each facet
holds information pertaining to a single aspect of the concept it is a part of; hence the
suggestive name: "facet".
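To make the "attribute/value pair on a property list" picture concrete, here is a small Python sketch in which each concept is a dictionary of facets. The helpers `getp` and `putp` are hypothetical Python stand-ins named after Lisp-style property accessors, and the facet values stored for Compose are invented for illustration; they are not AM's actual entries.

```python
from typing import Any, Dict, Optional

# One dictionary per concept, mapping facet name -> facet value.
concepts: Dict[str, Dict[str, Any]] = {}


def putp(concept: str, facet: str, value: Any) -> None:
    """Store a value in one facet (slot) of a concept."""
    concepts.setdefault(concept, {})[facet] = value


def getp(concept: str, facet: str) -> Optional[Any]:
    """Fetch one facet of a concept; a blank facet yields None."""
    return concepts.get(concept, {}).get(facet)


# Hypothetical entries, for illustration only.
putp("Compose", "Worth", 100)
putp("Compose", "Generalizations", ["Operation"])

worth = getp("Compose", "Worth")
blank = getp("Compose", "Examples")  # never filled in, so still blank
```

A blank facet simply has no entry, which is how "a facet of the Compose concept" can exist as a slot before anything has been filled in.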
FRAMEs: A modular representation of knowledge. Each module is a list of Feature/Value
pairs. The value represents a default assumption which can be relied on until/unless new
information comes in about that feature. Each frame has whatever features (called "slots")
seem appropriate. Whenever a situation S is encountered, the frame(s) for S are activated.
As new information rolls in, it replaces the default information in various slots. Notice the
emphasis on distributing static knowledge (data), not necessarily control, in such a system.
See [Piaget 55] or [Minsky 75].
Function: a small, executable part of a program. When fed the proper kind of argument(s),
a function will "run" and ultimately produce some sort of value. Unlike pure mathematical
functions (see the previous glossary), a Lisp function can have side effects (qv).
Garbage collection: As a Lisp program executes, various list structures (pointer networks)
are created. When the last pointer to a structure is removed, that structure has essentially
been irretrievably forgotten. If the operating system knew which storage cells were thus
"free", it could recycle them, reuse them. The process of finding and liberating such
discarded lists is called garbage collection. This is performed automatically by the Lisp
language, whenever space is almost all filled up.
Hack: A quick job that produces what is needed, but not well. Introducing a heuristic which
was only used once, in a predetermined way (e.g., to fix a particular bug), would be a real
hack.
Handcrafting: the human programmer carefully designs his system in such a way that the
pieces just manage to mesh. For instance: he provides just the perfect set of axioms so that
his theorem-prover can solve a certain problem, or he modifies the program's strategies so
that they efficiently manipulate the axiom set in just the right way.
Heterarchy: A kind of control structure for a computer program which is distinct from
hierarchy. Heterarchical structuring views the whole program as a collection of equal
partners, an unstructured set of functions. "Control" is viewed as a spotlight, which can be
flicked from one function to another. The functions can affect who does or doesn't get
control next, but there is no guarantee who will get control, or that control will revert back
to some function which once had it. Aside from the lure of its democratic flavor, it is
clearly a natural way to represent cooperating knowledge modules.
Hierarchy: This term refers to a kind of control structure for a computer program. The
typical hierarchical structure is one in which a function calls a subroutine, which processes
and then returns a value to that function. A program is viewed as a tree structure, with
lines indicating "calling".
Interact: a dynamic mode of communication between a human and a computer program.
The human reacts to what the program is printing out on his terminal, and the program in
turn reacts to what the user types in. This may take the form of questioning and answering,
or interrupting and commenting.
Interestingness: Note that this is not a valid English word. In the context of AM, it refers to
a numeric value, computed by little Lisp programs stored in the "Interest" facets of various
concepts. Despite the danger of imbuing such a humble scheme with all the mystique of
what is and isn't interesting, it is felt that a sufficient component of that evaluation has been
captured to warrant the name. Pragmatically, it is of much more use to the user to see
"Interestingness of Compose has just risen" than to see a message like "G00034
incremented".
Kludge (or Kluge): This is a program feature which is an unfair shortcut around a specific
problem. One "kludgy" way of improving the algorithm of a given concept is to ask the user
for a better algorithm.
Linear: a system whose components, inputs, and outputs superimpose; i.e., don't couple.
Lisp: a LISt-Processing programming language. Primitive operations exist for manipulating
nested list structures. Since Lisp functions are also merely lists, it is easy to create and
modify entities which are then executed (qv).
Modular Representations of Knowledge in AI Systems: Knowledge is partitioned into
packets (called modules, frames, units, productions, Beings, experts, Actors) along lines of:
different applicabilities, expertise, purpose, importance, generality, etc. Each packet is
structurally similar to all the rest. Advantages: By having the knowledge discretized, pieces
can be added and/or removed with no trouble. The knowledge of the system is easily
inspected and analyzed. The structural similarity yields several advantages: a simple control
system suffices to "run" all the knowledge, the modules can intercommunicate easily, new
modules can be inserted without knowing precisely "who else" is already in the system. In
general, the less similarly-structured the modules are, the simpler the intercommunication
media must be. Modular representation is a natural way to implement cooperating
knowledge sources.
Number: in the typical loose fashion of computer scientists, I intend this to mean a nonnegative integer: i.e., a natural number.
Open research problem: a limitation of the AM system.
Recur: Often, part of a definition will refer back to that very same definition. This may
lead to an infinite circular loop, or it may terminate. The following definition of "is larger
than" is recursive, because the last line recurs:
    set R is larger than set S
        if S={} but R≠{}, or
        if neither is empty and
            Removeelement(R) is larger than Removeelement(S).
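The recursive definition can be run directly. The Python sketch below takes the base case to be "S is empty but R is not" (an assumption about the intended reading) and returns False in every remaining case, so the recursion always terminates on finite sets. The names `remove_element` and `is_larger` are illustrative only.

```python
from typing import FrozenSet


def remove_element(s: FrozenSet[int]) -> FrozenSet[int]:
    """Return a copy of the non-empty set s with one arbitrary element removed."""
    it = iter(s)
    next(it)              # skip one element
    return frozenset(it)  # keep the rest


def is_larger(r: FrozenSet[int], s: FrozenSet[int]) -> bool:
    if r and not s:
        return True       # base case: S is empty but R is not
    if r and s:
        # The definition recurs here, on two strictly smaller sets.
        return is_larger(remove_element(r), remove_element(s))
    return False          # R is empty, so it cannot be larger


bigger = is_larger(frozenset({1, 2, 3}), frozenset({7, 8}))
same_size = is_larger(frozenset({1, 2}), frozenset({7, 8}))
```

Each recursive call removes one element from both sets, so the depth of the recursion is bounded by the size of the smaller set.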
Recurse: a transitive verb which means "to swear again." It must be distinguished from
"recur", above.
Side effects: while a function is executing, it may cause changes in the state of its
environment which persist even after the function has returned a value. This is like
hysteresis effects. For example, a function may create or destroy some list structure, define a
new function, reset some variable, etc. Such activities are called side effects of the function.
Space: The memory of a computer is quite finite. Though it may be supplemented by slow
auxiliary devices (tapes, discs, etc.), the actual number of storage cells in the computer's fast
"core" memory is a limiting factor in program behavior. Storage space, or just "space", refers
to these internal memory cells. When space is exhausted, the only remedy is to perform a
garbage collection (qv).
System: this can mean a computer program, and occasionally is just another way of
referring to AM. In general, a system is any collection of entities related to form a
meaningful whole.
Terminal: a communications device for passing information between a computer system and
a human. This could be a teletype, a TV screen and keyboard, etc. The terminal is usually
portable and remotely located from the computer.
User: the human being who sits at a computer terminal and watches AM run (occasionally,
perhaps, interacting with AM).
Appendix 2. AM's Concepts
The first part of this huge appendix (Appendix 2.1.2 to 2.1.75) lists the set of knowledge
AM started with: its initial concepts. It is not very readable, nor is it central to any of the
ideas on which AM is based. The reader is therefore warned to proceed at his own risk
through this material.
Section 2 of this appendix contains a brief description of those concepts which were only
partially implemented in AM (e.g., "Destructiveop"). It was decided not to give each of
them a full "box" of their own.
The third part of this appendix lists a couple of concepts as they were actually coded into
Lisp. The reader is shown which entry, or heuristic rule, each bit of Lisp code
corresponds to.
Finally, starting on page 224, a list is provided of some of the concepts which AM created.
This is intended not as an exhaustive catalog, but merely to show the breadth of what was
done by AM, the smart guesses and the lunacies. This list could have been pieced together
by studying Appendix 5, wherein some examples of AM in action are given. There the
reader may dynamically observe what kinds of concepts (and infer what kinds of entries
for their facets) AM was able to derive from its initial base.
Appendix 2.1. Initial Concepts
Each concept will be listed, followed by a description of the entries in each of its facets.1
For each such "slot", a condensation is provided (in English, LISP, and math notation) of
all the knowledge initially supplied to AM about that facet of that concept.
If there is any unmentioned facet for a concept, then it started out blank. Many of the
facets of the original concepts were left blank intentionally, knowing that AM would be able
to fill them in as well. After all, if you can fill in examples of any new concept, you ought to
be able to fill in examples of Sets!
The concepts are grouped semantically, much like the tree shown on page 105, like the
order in which heuristics are listed in Appendix 3. This section of the appendix is
prefaced by an index which is arranged alphabetically, since the primary use of it will
probably be as an encyclopedia. When the reader encounters a poorly-named or
poorly-explained concept somewhere in the text, he may wish to glance first at Chapter 5, page 107,
where very brief definitions of the concepts are also given alphabetically. If that
"dictionary" is insufficient, he can turn to the appropriate page in this appendix, and see the
same concept presented in much more detail.
1 Each of these entries was supplied by hand, by the author.
Appendix 2.1.1
Index to Initial Concepts
CONCEPT                        PAGE
Active                         175
Allbutthefirstelement          201
Allbutthelastelement           202
Anyconcept                     174
Anything                       174
Atomobj                        208
BagDelete                      184
BagDiff                        194
Baginsert                      182
BagIntersect                   189
BagUnion                       191
Bags                           212
Canonize                       196
Coalesce                       195
Compose                        178
Conjecture                     207
ConstantFalse                  177
Constantpredicate              176
ConstantTrue                   176
Delete                         183
Difference                     192
Emptystructure                 211
Firstelement                   201
Identity                       204
Insert                         179
Intersect                      186
Invertanoperation              205
Invertedop                     205
Lastelement                    200
ListDelete                     184
ListDiff                       192
Listinsert                     182
ListIntersect                  186
ListUnion                      190
Lists                          213
Logicalcombination             206
Member                         202
Multipleelementsstructure      210
Nomultipleelementsstructure    211
Nonemptystructure              211
Object                         207
Objectequality                 176
Operation                      177
OrdStructure                   210
Orderedpairs                   213
OsetDelete                     185
OsetDiff                       193
Osetinsert                     181
OsetIntersect                  187
OsetUnion                      190
Osets                          214
Paralleljoin                   199
Paralleljoin2                  199
Parallelreplace                197
Parallelreplace2               197
Predicate                      175
Projection1                    203
Projection2                    203
Relation                       206
Repeat                         198
Repeat2                        198
Restrict                       204
Reverseordpair                 200
SetDelete                      183
SetDiff                        193
Setinsert                      180
SetIntersect                   188
SetUnion                       191
Sets                           212
Structure                      209
StructureofStructures          209
Truthvalue                     208
Union                          189
UnordStructure                 210
Appendix 2.1.2
Anything
Name(s): Anything, Entity, Thing, Item
Definitions:
NonRecursive, Trivial, Quick: λ (x) T
Specializations: Anyconcept, Nonconcepts
Generalizations: none
Examples: Anything, Anyconcept
Isas: Anyconcept
Worth: 100
Interest: 5 heuristics (see Appendix 3.1, page 229).
Sugg: 5 heuristics
Indomainof: Delete, Insert, Member, Proj1, Proj2, Identity, Constantpred.
Inrangeof: Firstele, Lastele, Member, Proj1, Proj2, Identity.
Appendix 2.1.3
Anyconcept
Name(s): Anyconcept, AnyBeing, Anybody
Definitions:
NonRecursive, Opaque, Quick: λ (x) FMEMB(x,Concepts)
NonRecursive, Opaque, Quick: λ (x) GETP(x,Name)
Specializations: Active, Object
Generalizations: Anything
Examples: Anything, Anyconcept, Active, Object
Isas: Anything, Anyconcept
Worth: 100
View: to view any X as if it were a Y, find an op. whose domain contains X,
and whose range is contained in Y, and apply that op. to the given X.
Fillin: 39 heuristics (see Appendix 3.2, beginning on page 230).5
Check: 20 heuristics
Interest: 21 heuristics
Sugg: 30 heuristics
2 In general, this appendix will omit heuristics. They will instead be presented in one big
collection, as the next appendix. For each concept, we will however mention how many heuristics
of each variety are present. The interested reader may turn immediately to Appendix 3 if he
desires, to see those heuristic rules.
3 All four specializations of each of Delete (e.g., Bagdelete) and Insert (e.g., Listinsert) are also listed here.
4 That is, the domain of the operation is D1xD2xD3..., and X is a subset of some Di, a specialization of Di.
5 As usual, the heuristics are listed in Appendix 3, not here. But the reader is forewarned that
this concept has so many heuristics that they are grouped by facet in the next appendix,
occupying Appendices 3.2.1 through 3.2.8, pages 230 to 251.
Appendix 2.1.4
Active
Name(s): Active, activity, action
Definitions:
Sufficient, NonRecursive, Quick: λ (x) GETP(x,Algorithms)
Sufficient, NonRecursive, Quick: λ (x) GETP(x,Dom/range)
Specializations: Predicate, Relation, Operation
Generalizations: Anyconcept
Examples: none.6
Isas: Anyconcept
Indomainof: Constructive, Destructive, Coalesce, Compose, Restrict
Inrangeof: Compose, Coalesce, Restrict.
Worth: 100
Fillin: 7 heuristics.
Check: 4 heuristics
Interest: 3 heuristics
Sugg: 10 heuristics
Appendix 2.1.5
Predicate
Name(s): Predicate, sometimes: logical operation, Boolean function.
Definitions:
Nonrecursive quick opaque: λ (P) Range(P) is Truthvalue; i.e., {T,F}.
Generalizations: Active
Examples: Equality, Constructive, Destructive, Empty, Nonempty, Constantpred,
the Defn entries of each concept.
Indomainof: Canonize
Worth: 100
Fillin: 2 heuristics.
Sugg: 1 heuristic
Interest: 1 heuristic.
6 Recall that each active will be an example of an operation, predicate, etc., hence need not be pointed to explicitly here.
7 Thus the predicate 'Empty', while it exists in AM, is superfluous, since the definition facet of 'Emptystruc' contains that
very predicate.
Appendix 2.1.6
Objectequality
Name(s): Equality, Object equality, Objequal, Equal, Same.
Definitions:
Nonrecursive opaque: λ (x,y) EQUAL(x,y)
Sufficient, very quick, opaque: λ (x,y) EQ(x,y)
Recursive slow: λ (x,y) x and y are both identical atoms,
    or x and y are both empty structures,
    or x and y are both nonempty structures and
    Equality.Defn(CAR(x),CAR(y)) and
    Equality.Defn(CDR(x),CDR(y)).
Nonrecursive transform slow: λ (x,y) Identity.Defn(x,y)
Quick: λ (x,y) y«Equality.Algs(x).
Domain/range: