Theory (mathematics): Difference between revisions

From Citizendium
Jump to navigation Jump to search
imported>Boris Tsirelson
mNo edit summary
 
(55 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{subpages}}
{{subpages}}
Similarly to a living thing, mathematics is a unity within an environment, yet apart from it — a compartment of a larger whole, structurally distinguishable though not functionally completely isolated from or closed to its surroundings.<ref>This phrase is borrowed from "[[Life#&nbsp;&nbsp;&mdash;&nbsp;&nbsp;Cells|Life]]".</ref>
{{Image|KleinBottles.jpg|right|350px}}
{{Image|Sherka16.jpg|right|350px}}


==Axiomatic or non-axiomatic==
==Mathematics is structurally distinguishable==
 
===Axiomatic or non-axiomatic===
There are two possible approaches to mathematics, called by R. Feynman "the Babylonian tradition" and "the Greek tradition". They answer differently the question, whether or not some mathematical facts are more fundamental than others, more consequential facts. The same approaches apply to any theory, mathematical or not.
There are two possible approaches to mathematics, called by R. Feynman "the Babylonian tradition" and "the Greek tradition".<ref>{{harvnb|Feynman|1995|loc=Sect. 2, page 46}}.</ref> They answer differently the question, whether or not some mathematical facts are more fundamental than others, more consequential facts. The same approaches apply to any theory, mathematical or not.


The Babylonian (non-axiomatic) tradition treats a theory as a network whose nodes are facts and connections are derivations. If some facts are forgotten they probably can be derived from the others.
The Babylonian (non-axiomatic) tradition treats a theory as a network whose nodes are facts and connections are derivations. If some facts are forgotten they probably can be derived from the others.
Line 9: Line 12:
The Greek (axiomatic) tradition treats a theory as a tower of more consequential facts called theorems, grounded on the basis of more fundamental facts called axioms. If all theorems are forgotten they surely can be derived from the axioms. The axioms are sparse and simple, not to be forgotten.
The Greek (axiomatic) tradition treats a theory as a tower of more consequential facts called theorems, grounded on the basis of more fundamental facts called axioms. If all theorems are forgotten they surely can be derived from the axioms. The axioms are sparse and simple, not to be forgotten.


==Monotonic or non-monotonic==
===Monotonic or non-monotonic===
 
The distinction between these two approaches is closely related to the distinction between monotonic and non-monotonic logic. These answer differently the question, whether or not a fact can be retracted because of new evidence. Monotonic logic answers in the negative, non-monotonic logic answers in the affirmative.
The distinction between these two approaches is closely related to the distinction between monotonic and non-monotonic logic. These answer differently the question, whether or not a fact can be retracted because of new evidence. Monotonic logic answers in the negative, non-monotonic logic answers in the affirmative.


Line 19: Line 21:
The axiomatic approach is inflexible. A theorem cannot be retracted without removing (or replacing) at least one axiom, which usually has dramatic consequences for many other theorems. Nowadays this approach is widely used in mathematics.
The axiomatic approach is inflexible. A theorem cannot be retracted without removing (or replacing) at least one axiom, which usually has dramatic consequences for many other theorems. Nowadays this approach is widely used in mathematics.


The non-axiomatic approach is well suited when new evidence often comes from the outside. The axiomatic approach is well suited for a theory that advances only by extracting new consequences from an immutable list of axioms. It may seem that such a development must be dull. Surprisingly, this is an illusion. Being inflexible in some sense, an axiomatic theory can be very flexible in another sense (see below).
The non-axiomatic approach is well suited when new evidence often comes from the outside. The axiomatic approach is well suited for a theory that advances only by extracting new consequences from an immutable list of axioms. It may seem that such a development must be dull. Surprisingly, this is an illusion. Being inflexible in some sense, an axiomatic theory can be very flexible in another sense, see [[#Flexible or inflexible|Sect. 2.2]].
 
===Sharp or fuzzy; real or ideal===
A fair coin is tossed 1000 times; can it happen at random that heads is obtained all the 1000 times? Emile Borel, a famous mathematician, answers in the negative, with an important reservation:
<blockquote>Such is the sort of event which, though its ''impossibility'' may not be rationally demonstrable, is, however, so unlikely that no sensible person will hesitate to declare it actually impossible.<ref>{{harvnb|Borel|1962|loc=page 3}}.</ref></blockquote>
 
Why `may not be rationally demonstrable'? Why does this statement remain outside the list of Borel's theorems in probability theory?
 
Mathematical truth is sharp, not fuzzy. Every mathematical statement is assumed to be either true or false (even if no one is able to decide) rather than "basically true", "true for all practical purposes" etc. If ''n'' heads out of ''n'' times is an impossible event for ''n''=1000 then there must exist the least such ''n'' (be it known or not). Say, 665 heads can occur, but 666 heads cannot. Then, assuming that 665 heads have just appeared, we see that the next time tails is guaranteed, which contradicts to the assumed memoryless behavior of the coin.


==Computers metaphor==
Another example. The number 4 can be written as a sum of units: 1+1+1+1. Can the number 2<sup>1000</sup> be written as a sum of units?  Mathematics answers in the affirmative. Indeed, if you can write, say, 2<sup>665</sup> as a sum of units, then you can do it twice (connecting the two copies by the plus sign). Complaints about limited resources, appropriate in the real world, are inappropriate in the imaginary, highly idealized mathematical universe.


A conceptual metaphor helps to understand one conceptual domain in terms of another. For example, the desktop metaphor treats the monitor of a computer as if it is the user's desktop, which helps to a user not accustomed to the computer. Nowadays many are more accustomed to computers than to mathematics. Thus, analogies with computers may help to understand mathematics. Such analogies are widely used below.
===Defined or undefined===
Facts are formulated via notions.
 
In the non-axiomatic approach, notions are nodes of a network whose connections are definitions. If some notions are forgotten they probably can be restored from the others.
 
Searching Google for "define:line" we get "a length without breadth or thickness". Similarly we find definitions for breadth, thickness and so on, recursively. Doing so we would get a large subnetwork; here is its small fragment:
 
*Line: a length without breadth or thickness
**Length: linear extent in space
***Linear: along a line↑
***Extent: a range of locations
****Location: point or extent↑ in space
***Space: unlimited expanse in which everything is located↑
**Breadth: the extent↑ from side to side
***Side: a surface forming part of the outside of an object
****Surface: the extended↑ two-dimensional outer boundary of a three-dimensional object
**Thickness: the dimension through an object as opposed to its length↑ or width
***Width: the extent↑ from side↑ to side↑


==Flexible or inflexible==
(Up arrows mean: see above.) We observe that
* circularity appears routinely; for example: line→length→linear→line;
* the definition of a single notion involves recursively a large number of other, quite remote notions.


In 1960s a computer was an electronic monster able to read from a punch tape simple [[Instruction set architecture|instructions]] stipulated by the hardware and execute them quickly, thus performing a lot of boring calculations. Nowadays some parents complain that personal computers are too fascinating. However, without software a personal computer is only able to read (say, from a compact disk) and execute instructions stipulated by the hardware. These instructions are now as technical as before: simple arithmetical and logical operations, loops, conditional execution etc. A computer is dull, be it a monster of 1960s or a nice looking personal computer, unless programmers develop fascinating combinations (called programs) of these technical instructions.
Such system of notions is unsuitable for a mathematical theory. Here, circularity is disallowed, and the set of involved notions is kept reasonably small (whenever possible). (See also [[Semantic primes]] for an attempt to disclose a common mathematics-like structure behind all natural languages.)


For a programmer, the instruction set of a given computer is an immutable list. The programmer cannot add new elements to this list, nor modify existing elements. In this sense the instruction set is inflexible. New programs are only new combinations of the given elements. Does it mean that program development is dull? In no way! A good instruction set is universal. It means that a competent programmer feels pretty free to implement any well-understood algorithm provided that the time permits (which is usually most problematic) and the memory and the speed of the computer are high enough (which is usually less problematic). In this sense a good instruction set is very flexible.
In the axiomatic approach, notions are a tower of defined notions, grounded on the basis of more fundamental notions called undefined primitives. If all defined notions are forgotten they surely can be restored from the undefined primitives. The undefined primitives are sparse and simple, not to be forgotten.


Likewise a mathematician working within a given theory faces its axioms as an immutable list. However, this list can be universal in the sense that a competent mathematician feels pretty free to express any clear mathematical idea. In this sense some axiomatic systems are very flexible.
Curiously, when a non-mathematical encyclopedia contains an article on a mathematical notion, two very different "definitions" may appear, one general (informal), the other mathematical (formal).


==Universal or specialized==
From now on, in this article, "definition" means a mathematical definition (unless explicitly stated otherwise).


For a user, the software of a computer is first of all a collection of applications (games, web browsers, media players, word processors, image editors etc.) All applications function within an operating system (Windows, MacOS, Linux etc.) The operating system is a universal infrastructure behind various specialized applications. Each application deals with relevant files. The operating system maintains files in general, and catalogs (directories) containing files and possibly other catalogs.
The lack of definition of a primitive notion does not mean lack of any information about this notion. Axioms provide such information, to be used in proofs. Informal (intuitive) understanding of a primitive notion is communicated in a natural language. This information cannot be used in proofs, but is instrumental when guessing what to prove, how to prove, how to apply proved theorems and, last but not least, what to postulate by axioms.


Likewise, the [[set theory]] is a universal infrastructure behind various specialized mathematical theories (algebra, geometry, analysis etc.)  Each specialized mathematical theory deals with relevant objects, relations and sets. The sets theory deals with [[set]]s in general, possibly containing other sets, and reduces objects and relations to sets.
==Mathematics is not isolated==
===Computers metaphor===
A conceptual metaphor helps to understand one conceptual domain in terms of another. For example, the desktop metaphor treats the monitor of a computer as if it is the user's desktop, which helps to a user not accustomed to the computer. Nowadays many are more accustomed to computers than to mathematics. Thus, analogies with computers may help to understand mathematics. Such analogies are widely used below.


==Univalent or multivalent==
===Flexible or inflexible===
In 1960s a computer was an electronic monster able to read from a punch tape simple [[Instruction set architecture|instructions]] stipulated by the hardware and execute them quickly, thus performing a lot of boring calculations. Nowadays some parents complain that personal computers are too fascinating. However, without software a personal computer is only able to read (say, from a compact disk) and execute instructions stipulated by the hardware. These instructions are now as technical as before: simple arithmetical and logical operations, loops, conditional execution etc. A computer is dull, be it a monster of 1960s or a nice looking personal computer, unless programmers develop fascinating combinations (called programs) of these technical instructions.


Plane geometry (called also "planar geometry") is a part of solid geometry that restricts itself to a single plane ("the plane") treated as a geometric universe. The question "which plane?" is inappropriate, since planes do not differ in their geometric properties. Every two planes α, β are isomorphic, that is, there exists an isomorphism ''f'' between α and β. Treating α and β as sets of points one defines isomorphism as an invertible (one-to-one and onto) map ''f'' : α → β preserving all primitive relations. Namely: ''f'' maps lines into lines; the distance between ''f''(''A'') and ''f''(''B'') on β is equal to the distance between ''A'' and ''B'' on α; etc. The same is required of the inverse map <math>f^{-1}:\beta\to\alpha</math>.
For a programmer, the instruction set of a given computer is an immutable list. The programmer cannot add new elements to this list, nor modify existing elements. In this sense the instruction set is inflexible. New programs are only new combinations of the given elements. Does it mean that program development is dull? In no way! A good instruction set is universal. It means that a competent programmer feels pretty free to implement any well-understood algorithm provided that the time permits (which is usually most problematic) and the memory and the speed of the computer are high enough (which is usually less problematic). In this sense a good instruction set is very flexible.


Axioms of the plane Euclidean geometry leave no freedom, they determine uniquely all geometric properties of the plane. More exactly: all Euclidean planes are mutually isomorphic. In this sense we have "the" Euclidean plane. In terms of Bourbaki, the plane
Likewise a mathematician working within a given theory faces its axioms as an immutable list. However, this list can be universal in the sense that a competent mathematician feels pretty free to express any clear mathematical idea. In this sense some axiomatic systems are very flexible.
Euclidean geometry is an univalent theory. In contrast, axioms of a linear space (called also vector space) leave a freedom: a linear space may be one-dimensional, two-dimensional, three-dimensional, four-dimensional and so on (infinite dimension is also possible). The corresponding theory is multivalent.


According to Bourbaki, the study of multivalent theories is the most striking feature which distinguishes modern mathematics from classical mathematics.
===Universal or specialized===
For a user, the software of a computer is first of all a collection of applications (games, web browsers, media players, word processors, image editors etc.) All applications function within an operating system (Windows, MacOS, Linux etc.) The operating system is a universal infrastructure behind various specialized applications. Each application deals with relevant files. The operating system maintains files in general, and catalogs (directories) containing files and possibly other catalogs.


A similar idea occurs in mathematical logic: a theory is called categorical if all its models are mutually isomorphic. However, for Bourbaki a theory is embedded into the set theory, while in logic a theory is standalone (embedded into the first-order logic).
Likewise, the [[set theory]] is a universal infrastructure behind various specialized mathematical theories (algebra, geometry, analysis etc.)  Each specialized mathematical theory deals with relevant objects, relations and sets. The sets theory deals with [[set]]s in general, possibly containing other sets, and reduces objects and relations to sets. Alternatively, [[Proof assistant#Higher-order logic versus theory of sets|higher-order logic]] can be used as such a universal infrastructure, more convenient for [[Proof assistant|computer-assisted formalization of mathematics]]. Also [[category theory]] pretends to the throne of the set theory.<ref>{{harvnb|Lawvere|Rosebrugh|2003}}.</ref>


==Motivated or indiscriminate==
In many cases a specialized mathematical theory is a theory of mathematical structures of some kind, often called [[space (mathematics)|space]]s (of this kind). For example: Euclidean geometry is a theory of Euclidean spaces; topology is a theory of topological spaces; linear algebra is a theory of linear spaces. Differential geometry investigates smooth manifolds. Algebra investigates groups, rings, fields etc.


===Motivated or indiscriminate===
Monkeys could type into a computer a sequence of hardware instructions; the computer could execute them; but the result of such "programming" has almost no chance to be fascinating or useful. Fascinating computer games reflect human predilections. Useful programs reflect human needs. A computer is dull for humans unless its software reflects human life in one way or another.
Monkeys could type into a computer a sequence of hardware instructions; the computer could execute them; but the result of such "programming" has almost no chance to be fascinating or useful. Fascinating computer games reflect human predilections. Useful programs reflect human needs. A computer is dull for humans unless its software reflects human life in one way or another.


Likewise, a theorem is of no interest for humans unless it is motivated in one way or another by human life. The motivation may be quite indirect; many theorems "only" help to prove other theorems, many are appreciated "only" for their aesthetic value, etc. But some kind of motivation is necessary. Indiscriminate stream of logical consequences of the axioms is not publishable in the mathematical literature.
Likewise, a theorem is of no interest for humans unless it is motivated in one way or another by human life. The motivation may be quite indirect; many theorems "only" help to prove other theorems, many are appreciated "only" for their aesthetic value, etc. But some kind of motivation is necessary. Indiscriminate stream of logical consequences of the axioms is not publishable in the mathematical literature.


Note that "a theorem" does not mean "a motivated theorem", "an important theorem" etc., not even "an already discovered theorem". All theorems are just an indiscriminate stream of logical consequences of the axioms, including the axioms themselves.
Note that "a theorem" does not mean "a motivated theorem", "an important theorem" etc., not even "an already discovered theorem". All theorems are just an indiscriminate stream of logical consequences of the axioms.


Theorems of a theory are, by definition, statements that follow from the given axioms according to the given rules (called by different authors inference rules, derivation rules, deduction rules, transformation rules).
Theorems of a theory are, by definition, statements that follow from the given axioms according to the given rules (called by different authors inference rules, derivation rules, deduction rules, transformation rules), including the axioms themselves.


==Rigor or intuition==
===From technical to human: definitions===
The gap between a bare hardware and a nice application is too wide for a single jump, or even a triple jump (hardware – operating system – programming language – application). Bridging the gap is a laborious task for many programmers. They compose programs of modules, and modules of subroutines. Each subroutine reduces a bit more useful task to a bit simpler tasks. Ultimately, a useful (or even fascinating) task is reduced to the technical instructions of the bare hardware.


Mathematics as a cultural process is not bound by monotonic logic. For example, it was treated till 1872 as intuitively evident fact that any curve on the plane has a tangent line everywhere except maybe a discrete set of points. However, this "fact" was refuted and retracted. "A century later, we have seen so many monsters of this sort that we have become a little blase. But the effect produced on the majority of nineteenth century mathematicians ranged from disgust to consternation." (Bourbaki, page 311)
Likewise, mathematicians bridge the wide gap between useful notions (say, "ellipse" or "normal distribution") and the undefined primitives by a large and complicated system of definitions. Each definition reduces a bit more useful notion to a bit more primitive notions.
 
Mathematical definitions are very diverse.
 
A definition may be just an abbreviation local to a single calculation, like this: "denoting for convenience <math>x^2+x+1</math> by <math>a</math> we have...".


Since then, intuition is not a valid argument in mathematical proofs. A mathematical theory is bound by monotonic logic. If needed, mathematics as a cultural process can retract an axiomatic theory as whole (rather than some theorems) and accept another axiomatic theory. (Moreover, in a remote perspective the axiomatic approach could be abandoned.)
A single definition may embed a whole specialized mathematical theory into the universal set-theoretic framework, like this:


==Sharp or fuzzy; real or ideal==
"A Euclidean space consists, by definition, of three sets, whose elements are called points, lines and planes respectively, and six relations, one called betweenness, three called containment and two called congruence, satisfying the following conditions: ..."


A fair coin is tossed 1000 times; can it happen at random that heads is obtained all the 1000 times? Emile Borel, a famous mathematician, answers in the negative, with an important reservation: "Such is the sort of event which, though its ''impossibility'' may not be rationally demonstrable, is, however, so unlikely that no sensible person will hesitate to declare it actually impossible." (Borel, page 9)
A definition may be given mostly for speaking more concisely, like this: "in a triangle, the altitude of a vertex is, by definition, its distance to the line through the other two vertices".


Why `may not be rationally demonstrable'? Why does this statement remain outside the list of Borel's theorems in probability theory?
A definition may introduce a new revolutionary concept, like this: "the derivative of a function at a point is, by definition, the limit of the ratio..."


Mathematical truth is sharp, not fuzzy. Every mathematical statement is assumed to be either true or false (even if no one is able to decide) rather than "basically true", "true for all practical purposes" etc. If ''n'' heads out of ''n'' times is an impossible event for ''n''=1000 then there must exist the least such ''n'' (be it known or not). Say, 665 heads can occur, but 666 heads cannot. Then, assuming that 665 heads have just appeared, we see that the next time tails is guaranteed, which contradicts to the assumed memoryless behavior of the coin.
<blockquote>Some mathematicians will tell you that the main aim of their research is to find the right definition, after which their whole area will be illuminated. ... For other mathematicians, the main purpose of definitions is to prove theorems... <ref>{{harvnb|Gowers|2008|loc=pages 74–75}}.</ref></blockquote>


Another example. The number 4 can be written as a sum of units: 1+1+1+1. Can the number 2<sup>1000</sup> be written as a sum of units?  Mathematics answers in the affirmative. Indeed, if you can write, say, 2<sup>665</sup> as a sum of units, then you can do it twice (connecting the two copies by the plus sign). Complaints about limited resources, appropriate in the real world, are inappropriate in the imaginary, highly idealized mathematical universe.
==Principles and practices==
===Rigor or intuition===
Mathematics as a cultural process is not bound by monotonic logic. For example, it was treated till 1872 as intuitively evident fact that any curve on the plane has a tangent line everywhere except maybe a discrete set of points. However, this "fact" was refuted and retracted.
<blockquote>A century later, we have seen so many monsters of this sort that we have become a little blasé. But the effect produced on the majority of nineteenth century mathematicians ranged from disgust to consternation.<ref>{{harvnb|Bourbaki|1968|loc=page 311}}.</ref></blockquote>


==Formal or formalizable==
Since then, intuition is not a valid argument in mathematical proofs. A mathematical theory is bound by monotonic logic. If needed, mathematics as a cultural process can retract an axiomatic theory as whole (rather than some theorems) and accept another axiomatic theory. (Moreover, in a remote perspective the axiomatic approach could be abandoned.)


===Formal or formalizable===
An example. If <math>a^2+2b^2=1</math> then <math>a^4+4b^4=(1-2ab)(1+2ab)</math>, since <math> a^4 + 4b^4 = a^4 + 4a^2 b^2 + 4b^4 - 4a^2 b^2 = (a^2+2b^2)^2 - 4a^2 b^2 = 1 - 4a^2 b^2 = (1-2ab)(1+2ab)</math>. This simple algebraic argument can be checked by a human or even a computer. No need to understand the meaning of the statement; it is sufficient to know the relevant formal rules. The statement follows from the given rules only, rigorously, without explicit or implicit use of intuition. A formal proof guarantees that a theorem will never be retracted.
An example. If <math>a^2+2b^2=1</math> then <math>a^4+4b^4=(1-2ab)(1+2ab)</math>, since <math> a^4 + 4b^4 = a^4 + 4a^2 b^2 + 4b^4 - 4a^2 b^2 = (a^2+2b^2)^2 - 4a^2 b^2 = 1 - 4a^2 b^2 = (1-2ab)(1+2ab)</math>. This simple algebraic argument can be checked by a human or even a computer. No need to understand the meaning of the statement; it is sufficient to know the relevant formal rules. The statement follows from the given rules only, rigorously, without explicit or implicit use of intuition. A formal proof guarantees that a theorem will never be retracted.


A universal formal language conceived by Gottfried Leibniz near 1685, "capable of expressing all humans thoughts unambiguously, of strengthening our power of deduction, of avoiding errors by a purely mechanical effort of attention" [Bourbaki, p.302] was partially implemented in mathematics two centuries later, especially, as a formal language of set theory. In the form introduced by Bourbaki this language contains four logical signs, three specific signs, and letters. Its rules are considerably more complicated than elementary algebra.
A universal formal language conceived by Gottfried Leibniz near 1685,
<blockquote>capable of expressing all humans thoughts unambiguously, of strengthening our power of deduction, of avoiding errors by a purely mechanical effort of attention<ref>{{harvnb|Bourbaki|1968|loc=page 302}}.</ref></blockquote>
was partially implemented in mathematics two centuries later, especially, as a formal language of set theory. In the form introduced by Bourbaki this language contains four logical signs, three specific signs, and letters. Its rules are considerably more complicated than elementary algebra.


However, this formal language is never used by a typical mathematician. Likewise, most programmers never use the instructions of the computer hardware. Instead, they develop applications in appropriate programming languages (BASIC, Java, C++, Python etc). A statement of a programming language corresponds usually to many hardware instructions; the correspondence is established by special programs (compilers and interpreters). A programming language is geared to programmers, hardware instructions --- to hardware.
However, this formal language is never used by a typical mathematician. Likewise, most programmers never use the instructions of the computer hardware. Instead, they develop applications in appropriate programming languages (BASIC, Java, C++, Python etc). A statement of a programming language corresponds usually to many hardware instructions; the correspondence is established by special programs (compilers and interpreters). A programming language is geared to programmers, hardware instructions --- to hardware.
Line 88: Line 132:
A programmer, when coding a program, knows that the computer does exactly what it is told to do. In contrast, an author or speaker, when communicating with other people, can be ambiguous, make small errors, and still expect to be understood. Pseudocode is a description of an algorithm, intended for human reading, that can be converted into a code by any competent programmer. A code is formal; a pseudocode is formalizable.
A programmer, when coding a program, knows that the computer does exactly what it is told to do. In contrast, an author or speaker, when communicating with other people, can be ambiguous, make small errors, and still expect to be understood. Pseudocode is a description of an algorithm, intended for human reading, that can be converted into a code by any competent programmer. A code is formal; a pseudocode is formalizable.


Likewise, communication between mathematicians is made in a formalizable language. "In practice, the mathematician ... is content to bring the exposition to a point where his experience and mathematical flair tell him that translation into formal language would be no more than an exercise of patience (though doubtless a very tedious one)." (Bourbaki, page 8)
Likewise, communication between mathematicians is made in a formalizable language.
<blockquote>In practice, the mathematician ... is content to bring the exposition to a point where his experience and mathematical flair tell him that translation into formal language would be no more than an exercise of patience (though doubtless a very tedious one).<ref>{{harvnb|Bourbaki|1968|loc=page 8}}.</ref></blockquote>


A mathematician, using a formalizable language, need not know all technical details of the underlying formal language. Many equivalent formal languages may be used interchangeably. Likewise, a single pseudocode can be used when coding in different programming languages, as long as these are not too far from each other.
A mathematician, using a formalizable language, need not know all technical details of the underlying formal language. Many equivalent formal languages may be used interchangeably. Likewise, a single pseudocode can be used when coding in different programming languages, as long as these are not too far from each other.


==Axiomatic or axiomatizable==
===Axiomatic or axiomatizable===
 
{{Image|Latgen.png|right|400px|Two finite lists of vectors generate (by summation) the same infinite set of vectors. Similarly, different finite lists of axioms can be used for generating the same infinite set of theorems.}}
{{Image|Latgen.png|right|400px|Two finite lists of vectors generate (by summation) the same infinite set of vectors. Similarly, different finite lists of axioms can be used for generating the same infinite set of theorems.}}
A mathematician, working within an axiomatic theory, need not remember the list of axioms in full detail. It is sufficient to remember a list of theorems that contains all axioms or, more generally, such that each axiom follows easily from these theorems. (Here, as usual, "theorems" are treated inclusively, as "axioms and theorems".)
A mathematician, working within an axiomatic theory, need not remember the list of axioms in full detail. It is sufficient to remember a list of theorems that contains all axioms or, more generally, such that each axiom follows easily from these theorems. (Here, as usual, "theorems" are treated inclusively, as "axioms and theorems".)


Two lists of axioms are called equivalent if every axiom of each list follows from the axioms of the other list. Such lists of axioms may be used interchangeably, since they generate the same theorems.
Two lists of axioms are called equivalent if every axiom of each list follows from the axioms of the other list. Such lists of axioms may be used interchangeably, since they generate the same theorems.


==Formalizable in reality or in principle==
==Some details==
===Consistent or inconsistent===
If a theory states that 2+2=5, it is a paradox but not yet a contradiction. By "paradox" people may mean
*a contradiction;
*an apparent contradiction;
*something counterintuitive;
*something surprising;
*something ironic;
etc. In contrast, a contradiction (in a mathematical theory) is, by definition, a pair of theorems (of the given theory) such that one is the negation of the other. Thus, two theorems
: <math>2+2=4,</math>
: <math>2+2=5</math>
are still not a contradiction. Two theorems
: <math>2+2=4,</math>
: <math>2+2\ne4</math>
are a contradiction.
 
If a contradiction exists in a given theory, this theory is called inconsistent. Otherwise, if no contradictions exist (rather than merely not found for now), the theory is called consistent.
 
For a mathematician, an inconsistent theory is completely useless. Some philosophers disagree:
<blockquote>Superstitious dread and veneration by mathematicians in face of a contradiction<ref>[[Ludwig Wittgenstein]].</ref></blockquote>
But a mathematician insists: an inconsistent theory is completely useless, since ''all'' statements (in the given language) are theorems! The reason is, proof by contradiction.
No matter which statement ''X'' is in question, we always can prove ''X'' as follows:
* Assume that ''X'' is false;
* ... (put the proof of the contradiction here);
* the assumption leads to a contradiction, thus ''X'' is true.
It is tempting to object that the contradiction has nothing in common with the assumption and therefore cannot invalidate it. However, the rules of formal logic do not demand that the contradiction has something in common with the assumption. Some attempts to change these rules were made (so-called "relevance logic", or "relevant logic"), but with little success. It is always possible to obfuscate the proof of the contradiction, making it seemingly entangled with ''X''. We have no formal criterion able to unmask any possible fictitious participation of ''X'' in the proof of the contradiction.
 
David Hilbert aimed to find axioms sufficient for all mathematics and to prove their consistency from the assumption that the "finitary arithmetic" (a subsystem of the usual arithmetic of the positive integers, chosen to be philosophically uncontroversial) was consistent. A fatal blow was dealt by the second Gödel's [[incompleteness theorem]]. Consistency of a theory cannot be proved by a weaker theory, nor by the same theory. It can be proved by a stronger theory, which does not dispel doubts: if the given theory is inconsistent then the stronger theory, being all the more inconsistent, can prove every claim, be it true or false.


The integer 1 is a very simple example of a mathematical object that can be defined in the language of set theory. Its definition can be translated from this formalizable language into any corresponding formal language, in particular, into the formal language introduced by Bourbaki. However, the length of the resulting formal definition is equal to 4,523,659,424,929. It would take about 4000 gigabytes of computer memory! More complicated mathematical definitions are much longer. Clearly, such formalization is possible only in principle, similarly to the possibility of writing 2<sup>1000</sup> as a sum of units.
Many mathematicians feel that specialized theories, being more reliable than universal theories, are like watertight compartments. If a contradiction will be found in the used universal theory, specialized theories will separate and wait for a better universal theory.


This formal language is so verbose because of substitution. Here is an algebraic example: substituting <math>x=(a+1)/(a^2+1)</math> into <math>(x^2-x+1)/(x^2+x+1)</math> gives an expression of length 64, containing 8 occurrences of ''a''. Further, substitute <math>a=u^3-2u^2-3u+4</math> and you get an expression of length 152, containing 24 occurrences of ''u''. And so on.
<blockquote>I have always felt that, if one day someone came up with a contradiction in mathematics, I would just say, "Well, those crazy logicians are at it again," and go about my business as I was going the day before.<ref>Vaughan Jones. See {{harvnb|Casacuberta|Castellet|1992|loc=page 91}}.</ref></blockquote>


A formal language intended for practical use stipulates a kind of abbreviations. Especially, hardware instructions of any computer stipulate subroutine call (in some form). A subroutine of 300 instructions can be called 20 times from another subroutine of 300 instructions; thus, 6300 operations are performed by a program of only 600 instructions. Also the second subroutine can be called many times, and so on.
===Univalent or multivalent===
Plane geometry (called also "planar geometry") is a part of solid geometry that restricts itself to a single plane ("the plane") treated as a geometric universe. The question "which plane?" is inappropriate, since planes do not differ in their geometric properties. Every two planes α, β are isomorphic, that is, there exists an isomorphism ''f'' between α and β. Treating α and β as sets of points one defines isomorphism as an invertible (one-to-one and onto) map ''f'' : α → β preserving all primitive relations. Namely: ''f'' maps lines into lines; the distance between ''f''(''A'') and ''f''(''B'') on β is equal to the distance between ''A'' and ''B'' on α; etc. The same is required of the inverse map <math>f^{-1}:\beta\to\alpha</math>.


There exist programming languages intended for theoretical rather than practical use. A universal Turing machine gives an example. Such formal languages usually do not stipulate abbreviations. Accordingly, a practically useful algorithm translated into such formal language gives an exponentially long program, useful in principle but useless in practice.
Axioms of the plane Euclidean geometry leave no freedom, they determine uniquely all geometric properties of the plane. More exactly: all Euclidean planes are mutually isomorphic. In this sense we have "the" Euclidean plane. In terms of Bourbaki, the plane
Euclidean geometry is an univalent theory. In contrast, axioms of a linear space (called also vector space) leave a freedom: a linear space may be one-dimensional, two-dimensional, three-dimensional, four-dimensional and so on (infinite dimension is also possible). Linear algebra is multivalent.


There exist also formal mathematical languages intended for practical use, for example, [[Mizar]]. These stipulate abbreviations.
Axioms of topology leave much more freedom; every subset of (say) the plane is an example of a topological space, be it connected or not, compact or not, be it a curve, a domain, a fractal, or whatever; and all these are still a minority among all topological spaces. Topology is multivalent.


The formalizable (but not formal) language of mathematical texts stipulates several kinds of abbreviations described below.
According to Bourbaki, the study of multivalent theories is the most striking feature which distinguishes modern mathematics from classical mathematics.<ref>{{harvnb|Bourbaki|1968|loc=page 385}}.</ref>


==Definitions==
A similar idea occurs in mathematical logic: a theory is called categorical if all its models are mutually isomorphic. However, for Bourbaki a theory is embedded into the set theory, while in logic a theory is standalone (embedded into the first-order logic).


Mathematical definitions are very diverse.
===Complete or incomplete===
A theory is called complete, if every statement in its language can be proved or disproved. In other words: if such a statement is not a theorem then its negation is necessarily a theorem.
 
A statement (in the language of the given theory) is called independent (in other words, undecidable) in this theory, if it is not a theorem, but also its negation is not a theorem.
 
Independent statements appear naturally in multivalent theories. For example, in linear algebra the statement "three vectors cannot be linearly independent" is neither provable nor disprovable, since the dimension of a linear space may be 2 or 3 (or anything else).
Thus, multivalent theories are usually incomplete.


A definition may be just an abbreviation local to a single calculation, like this: "denoting for convenience <math>x^2+x+1</math> by <math>a</math> we have...".
What about univalent theories? Seemingly, these should be complete. For example, the plane Euclidean geometry should prove all geometric statements that hold on the Euclidean plane, and only such statements. Similarly, arithmetics of natural numbers should prove all true arithmetical statements about natural numbers, and only these. Unexpectedly, the situation is far more complicated than these naive ideas, see [[#A theory as a crystal ball?|Sect. 4.7]].


A single definition may embed a whole specialized mathematical theory into the universal set-theoretic framework, like this:
===Definitions: abbreviations or extensions===
According to [[#Motivated or indiscriminate|Sect. 2.4]], a theorem is a logical consequence of the axioms. However, what about definitions? They are numerous and important, according to [[#From technical to human: definitions|Sect. 2.5]]. Two possible approaches to definitions answer differently the question, whether they belong to the formal theory or not.


"A Euclidean space consists, by definition, of three sets, whose elements are called points, lines and planes respectively, and six relations, one called betweenness, three called containment and two called congruence, satisfying the following conditions: ..."
One approach treats definitions as abbreviations, used in the formalizable language and eliminated when translating into the formal language. Thus, the formal theory contains only primitive notions and axioms, not definitions.


A definition may be given mostly for speaking more concisely, like this: "in a triangle, the altitude of a vertex is, by definition, its distance to the line through the other two vertices".
According to the other approach, each definition extends the formal theory. The new (extended) theory contains one more notion and one more axiom. Informally, the new notion is just defined, and the new axiom is just the definition. But formally, the new theory still contains only primitive notions and axioms, not definitions.


A definition may introduce a new revolutionary concept, like this: "the derivative of a function at a point is, by definition, the limit of the ratio..."
An example.


"Some mathematicians will tell you that the main aim of their research is to find the right definition, after which their whole area will be illuminated. ... For other mathematicians, the main purpose of definitions is to prove theorems..." [The Companion, pages 74-75]
Definition: a prime number is a natural number having exactly two divisors. (These are 1 and the number itself, of course.)


==Complete or incomplete==
Theorem: there are infinitely many prime numbers.


A theory is called complete, if every statement in its language can be proved or disproved. In other words: if such a statement is not a theorem then its negation is necessarily a theorem.
The first approach (abbreviations). In the formal language, the notion "prime" is eliminated, as follows. Theorem: there are infinitely many natural numbers having exactly two divisors. The proof also does not involve the notion "prime".


A statement (in the language of the given theory) is called independent (in other words, undecidable) in this theory, if it is not a theorem, but also its negation is not a theorem.
The second approach (extensions). The new formal theory stipulates a new primitive notion "prime" and a new axiom: a natural number is prime if and only if it has exactly two divisors. This axiom is used in the proof of the theorem.


Independent statements appear naturally in multivalent theories. For example, in linear algebra the statement "three vectors cannot be linearly independent" is neither provable nor disprovable, since the dimension of a linear space may be 2 or 3 (or anything else).
===Formalizable in reality or in principle===
Thus, multivalent theories are usually incomplete.
The integer 1 is a very simple example of a mathematical object that can be defined in the language of set theory. Its definition, treated as an abbreviation (rather than extension), can be translated from this formalizable language into any corresponding formal language, in particular, into the formal language introduced by Bourbaki. However, the length of the resulting formal definition is equal to 4,523,659,424,929.<ref>{{harvnb|Mathias|2002}}.</ref> It would take about 4000 gigabytes of computer memory! More complicated mathematical definitions are much longer. Clearly, such formalization is possible only in principle, similarly to the possibility of writing 2<sup>1000</sup> as a sum of units.


What about univalent theories? Seemingly, these should be complete. For example, the plane Euclidean geometry should prove all geometric statements that hold on the Euclidean plane, and only such statements. Similarly, arithmetics of natural numbers should prove all true arithmetical statements about natural numbers, and only these. Unexpectedly, the situation is far more complicated than these naive ideas, see the next sections.
This formal language is so verbose because of substitution. Here is an algebraic example: substituting <math>x=(a+1)/(a^2+1)</math> into <math>(x^2-x+1)/(x^2+x+1)</math> gives an expression of length 64, containing 8 occurrences of ''a''. Further, substitute <math>a=u^3-2u^2-3u+4</math> and you get an expression of length 152, containing 24 occurrences of ''u''. And so on.


==Generating all theorems; decidable or undecidable==
There exist formal mathematical languages intended for practical use, for example, [[Mizar]]. These treat definitions as extensions.


It is impossible to list all theorems, since they are infinitely many. However, an endless algorithmic process can generate theorems, only theorems, and all theorems in the sense that every theorem will be generated, sooner or later. (In more technical words: the set of all theorems is recursively enumerable.)
===Generating all theorems; decidable or undecidable===
It is impossible to list all theorems, since they are infinitely many. However, an endless algorithmic process can generate theorems, only theorems, and all theorems in the sense that every theorem will be generated, sooner or later. In more technical words: the set of all theorems is recursively enumerable. Some theories have infinitely many axioms generated by a finite list of so-called axiom schemata. Still, the set of all theorems is recursively enumerable, since the set of axioms is.


An open question (in a mathematical theory) is a statement neither proved nor disproved. It is possible (in principle, not necessarily in practice) to run the theorem-generating algorithm waiting for one of two events: either the given statement appears to be a theorem, or its negation does; in both cases the (formerly) open question is decided. If the theory is complete, this must happen sooner or later.
An open question (in a mathematical theory) is a statement neither proved nor disproved. It is possible (in principle, not necessarily in practice) to run the theorem-generating algorithm waiting for one of two events: either the given statement appears to be a theorem, or its negation does; in both cases the (formerly) open question is decided. If the theory is complete, this must happen sooner or later.
Line 151: Line 231:
The argument above shows that a complete theory is decidable.
The argument above shows that a complete theory is decidable.


==A theory as a crystal ball?==
===A theory as a crystal ball?===
 
Imagine a theory able to describe the whole discrete mathematics (the set theory?) or, at least, arbitrary algorithmic computations, in other words, formal machines (like computers, but with unlimited resources). Assume that the theorems include all true statements about these machines, and no wrong statements about them. Then a formal machine can use this theory as a "crystal ball", — for predicting the future of any formal machine. In particular, for deciding, whether a given machine will eventually halt, or will run forever (so-called [[halting problem]]). To this end it runs the theorem-generating algorithm until either the statement "such machine eventually halts" or its negation appears among the theorems.
Imagine a theory able to describe the whole discrete mathematics (the set theory?) or, at least, arbitrary algorithmic computations, in other words, formal machines (like computers, but with unlimited resources). Assume that the theorems include all true statements about these machines, and no wrong statements about them. Then a formal machine can use this theory as a "crystal ball", — for predicting the future of any formal machine. In particular, for deciding, whether a given machine will eventually halt, or will run forever (so-called halting problem). To this end it runs the theorem-generating algorithm until either the statement "such machine eventually halts" or its negation appears among the theorems.


In order to predict the future of a formal machine Y, a formal machine X needs the code of Y. It does not mean that the code of Y must be inscribed into (and therefore be shorter than) the code of X. Instead, X can generate the code of Y. This way, a single machine X can predict the future of infinitely many machines Y, and moreover, of all possible machines Y, generating their codes in an endless loop.
In order to predict the future of a formal machine Y, a formal machine X needs the code of Y. It does not mean that the code of Y must be inscribed into (and therefore be shorter than) the code of X. Instead, X can generate the code of Y. This way, a single machine X can predict the future of infinitely many machines Y, and moreover, of all possible machines Y, generating their codes in an endless loop.
Line 159: Line 238:
Can X predict its own future? It may seem that the affirmative answer is ensured by the argument above. However, how can X know, which Y is identical to X? In other words, how can X know (it means, be able to generate) its own code?
Can X predict its own future? It may seem that the affirmative answer is ensured by the argument above. However, how can X know, which Y is identical to X? In other words, how can X know (it means, be able to generate) its own code?


It is a well-known and nontrivial fact that some programs, so-called quine programs, can generate their own texts. It may seem to be a trick, but in fact it is a form of self-replication, and it uses a 4×10<sup>9</sup> years old patent of [[life]]: a cell replicates itself using its DNA, and also replicates the DNA itself.
It is a well-known and nontrivial fact that some programs, so-called quine programs, can generate their own texts. (See also [[halting problem]].) It may seem to be a trick, but in fact it is a form of self-replication, and it uses a 4×10<sup>9</sup> years old patent of [[life]]: a cell replicates itself using its DNA, and also replicates the DNA itself.
 
Using the quine technique, a machine ''X'' can predict its own future. Is it paradoxical? Seemingly not, since ''X'' is a deterministic machine, it has no free will. But nevertheless it ''is'' paradoxical, since we may program ''X'' as follows:
*predict your own future: whether you will eventually halt, or will run forever;
*if "halt" is predicted, enter an infinite loop;
*otherwise (if "run forever" is predicted), halt.
We get a paradox, which means that the assumptions made in the beginning of this section cannot be satisfied. An axiomatic theory cannot be such that its theorems include all true statements about formal machines, and no wrong statements about them! This important conclusion is closely related to the famous Gödel theorem.<ref>The first [[Incompleteness Theorem|incompleteness theorem]] of [[Kurt Gödel|Gödel]], somewhat stronger than this conclusion, states that a consistent and complete theory cannot contain Robinson arithmetics.</ref>
 
==Notes==
{{reflist|2}}
 
==References==
{{Citation
| last = Borel
| first = Émile
| title = Probabilities and life
| year = 1962
| publisher = Dover publ. (translation)
}}.
 
{{Citation
| last = Bourbaki
| first = Nicolas
| title = Elements of mathematics: Theory of sets
| year = 1968
| publisher = Hermann (original), Addison-Wesley (translation)
}}.
 
{{Citation
| year = 1992
| editor-last = Casacuberta
| editor-first = C
| editor2-last = Castellet
| editor2-first = M
| title = Mathematical research today and tomorrow: Viewpoints of seven Fields medalists
| series = Lecture Notes in Mathematics
| volume = 1525
| publisher = Springer-Verlag
| isbn = 3-540-56011-4
}}.
 
{{Citation
| last = Feynman
| first = Richard
| author-link = Richard Feynman
| title = The character of physical law
| edition = twenty second printing
| year = 1995
| publisher = the MIT press
| isbn = 0 262 56003 8
}}.
 
{{Citation
| year = 2008
| editor-last = Gowers
| editor-first = Timothy
| title = The Princeton companion to mathematics
| publisher = Princeton University Press
| isbn = 978-0-691-11880-2
}}.
 
{{Citation
| last1 = Lawvere
| first1 = F. William
| last2 = Rosebrugh
| first2 = Robert
| title = Sets for mathematics
| year = 2003
| publisher = Cambridge University Press
| isbn = 0-521-80444-2
}}.
 
{{Citation
| last = Mathias
| first = Adrian
| year = 2002
| title = A term of length 4,523,659,424,929
| journal = Synthese
| volume = 133
| issue = 1/2
| pages = 75–86
| url = http://www.springerlink.com/content/x28504221108023t/
}}.
(Also [http://personnel.univ-reunion.fr/ardm/inefff.pdf here].)
 
[[Category:Suggestion Bot Tag]]

Latest revision as of 06:00, 28 October 2024

This article is developing and not approved.
Main Article
Discussion
Definition [?]
Related Articles  [?]
Bibliography  [?]
External Links  [?]
Citable Version  [?]
 
This editable Main Article is under development and subject to a disclaimer.

Similarly to a living thing, mathematics is a unity within an environment, yet apart from it — a compartment of a larger whole, structurally distinguishable though not functionally completely isolated from or closed to its surroundings.[1]

(CC) Photo: Anders Sandberg
(CC) Image: Anders Sandberg

Mathematics is structurally distinguishable

Axiomatic or non-axiomatic

There are two possible approaches to mathematics, called by R. Feynman "the Babylonian tradition" and "the Greek tradition".[2] They answer differently the question, whether or not some mathematical facts are more fundamental than others, more consequential facts. The same approaches apply to any theory, mathematical or not.

The Babylonian (non-axiomatic) tradition treats a theory as a network whose nodes are facts and connections are derivations. If some facts are forgotten they probably can be derived from the others.

The Greek (axiomatic) tradition treats a theory as a tower of more consequential facts called theorems, grounded on the basis of more fundamental facts called axioms. If all theorems are forgotten they surely can be derived from the axioms. The axioms are sparse and simple, not to be forgotten.

Monotonic or non-monotonic

The distinction between these two approaches is closely related to the distinction between monotonic and non-monotonic logic. These answer differently the question, whether or not a fact can be retracted because of new evidence. Monotonic logic answers in the negative, non-monotonic logic answers in the affirmative.

Non-monotonic logic is used routinely in everyday life and research. An example: "being a bird, this animal should fly", but the bird may appear to be a penguin. Another example: "the grass is wet, therefore it rained", but the cause may appear to be a sprinkler.

The non-axiomatic approach is flexible. When needed, some old facts can be retracted, some new facts added, and some derivations changed accordingly. Nowadays this approach is widely used outside mathematics, and only marginally within mathematics (so-called informal mathematics).

The axiomatic approach is inflexible. A theorem cannot be retracted without removing (or replacing) at least one axiom, which usually has dramatic consequences for many other theorems. Nowadays this approach is widely used in mathematics.

The non-axiomatic approach is well suited when new evidence often comes from the outside. The axiomatic approach is well suited for a theory that advances only by extracting new consequences from an immutable list of axioms. It may seem that such a development must be dull. Surprisingly, this is an illusion. Being inflexible in some sense, an axiomatic theory can be very flexible in another sense, see Sect. 2.2.

Sharp or fuzzy; real or ideal

A fair coin is tossed 1000 times; can it happen at random that heads is obtained all the 1000 times? Emile Borel, a famous mathematician, answers in the negative, with an important reservation:

Such is the sort of event which, though its impossibility may not be rationally demonstrable, is, however, so unlikely that no sensible person will hesitate to declare it actually impossible.[3]

Why `may not be rationally demonstrable'? Why does this statement remain outside the list of Borel's theorems in probability theory?

Mathematical truth is sharp, not fuzzy. Every mathematical statement is assumed to be either true or false (even if no one is able to decide) rather than "basically true", "true for all practical purposes" etc. If n heads out of n times is an impossible event for n=1000 then there must exist the least such n (be it known or not). Say, 665 heads can occur, but 666 heads cannot. Then, assuming that 665 heads have just appeared, we see that the next time tails is guaranteed, which contradicts to the assumed memoryless behavior of the coin.

Another example. The number 4 can be written as a sum of units: 1+1+1+1. Can the number 21000 be written as a sum of units? Mathematics answers in the affirmative. Indeed, if you can write, say, 2665 as a sum of units, then you can do it twice (connecting the two copies by the plus sign). Complaints about limited resources, appropriate in the real world, are inappropriate in the imaginary, highly idealized mathematical universe.

Defined or undefined

Facts are formulated via notions.

In the non-axiomatic approach, notions are nodes of a network whose connections are definitions. If some notions are forgotten they probably can be restored from the others.

Searching Google for "define:line" we get "a length without breadth or thickness". Similarly we find definitions for breadth, thickness and so on, recursively. Doing so we would get a large subnetwork; here is its small fragment:

  • Line: a length without breadth or thickness
    • Length: linear extent in space
      • Linear: along a line↑
      • Extent: a range of locations
        • Location: point or extent↑ in space
      • Space: unlimited expanse in which everything is located↑
    • Breadth: the extent↑ from side to side
      • Side: a surface forming part of the outside of an object
        • Surface: the extended↑ two-dimensional outer boundary of a three-dimensional object
    • Thickness: the dimension through an object as opposed to its length↑ or width
      • Width: the extent↑ from side↑ to side↑

(Up arrows mean: see above.) We observe that

  • circularity appears routinely; for example: line→length→linear→line;
  • the definition of a single notion involves recursively a large number of other, quite remote notions.

Such system of notions is unsuitable for a mathematical theory. Here, circularity is disallowed, and the set of involved notions is kept reasonably small (whenever possible). (See also Semantic primes for an attempt to disclose a common mathematics-like structure behind all natural languages.)

In the axiomatic approach, notions are a tower of defined notions, grounded on the basis of more fundamental notions called undefined primitives. If all defined notions are forgotten they surely can be restored from the undefined primitives. The undefined primitives are sparse and simple, not to be forgotten.

Curiously, when a non-mathematical encyclopedia contains an article on a mathematical notion, two very different "definitions" may appear, one general (informal), the other mathematical (formal).

From now on, in this article, "definition" means a mathematical definition (unless explicitly stated otherwise).

The lack of definition of a primitive notion does not mean lack of any information about this notion. Axioms provide such information, to be used in proofs. Informal (intuitive) understanding of a primitive notion is communicated in a natural language. This information cannot be used in proofs, but is instrumental when guessing what to prove, how to prove, how to apply proved theorems and, last but not least, what to postulate by axioms.

Mathematics is not isolated

Computers metaphor

A conceptual metaphor helps to understand one conceptual domain in terms of another. For example, the desktop metaphor treats the monitor of a computer as if it is the user's desktop, which helps to a user not accustomed to the computer. Nowadays many are more accustomed to computers than to mathematics. Thus, analogies with computers may help to understand mathematics. Such analogies are widely used below.

Flexible or inflexible

In 1960s a computer was an electronic monster able to read from a punch tape simple instructions stipulated by the hardware and execute them quickly, thus performing a lot of boring calculations. Nowadays some parents complain that personal computers are too fascinating. However, without software a personal computer is only able to read (say, from a compact disk) and execute instructions stipulated by the hardware. These instructions are now as technical as before: simple arithmetical and logical operations, loops, conditional execution etc. A computer is dull, be it a monster of 1960s or a nice looking personal computer, unless programmers develop fascinating combinations (called programs) of these technical instructions.

For a programmer, the instruction set of a given computer is an immutable list. The programmer cannot add new elements to this list, nor modify existing elements. In this sense the instruction set is inflexible. New programs are only new combinations of the given elements. Does it mean that program development is dull? In no way! A good instruction set is universal. It means that a competent programmer feels pretty free to implement any well-understood algorithm provided that the time permits (which is usually most problematic) and the memory and the speed of the computer are high enough (which is usually less problematic). In this sense a good instruction set is very flexible.

Likewise a mathematician working within a given theory faces its axioms as an immutable list. However, this list can be universal in the sense that a competent mathematician feels pretty free to express any clear mathematical idea. In this sense some axiomatic systems are very flexible.

Universal or specialized

For a user, the software of a computer is first of all a collection of applications (games, web browsers, media players, word processors, image editors etc.) All applications function within an operating system (Windows, MacOS, Linux etc.) The operating system is a universal infrastructure behind various specialized applications. Each application deals with relevant files. The operating system maintains files in general, and catalogs (directories) containing files and possibly other catalogs.

Likewise, the set theory is a universal infrastructure behind various specialized mathematical theories (algebra, geometry, analysis etc.) Each specialized mathematical theory deals with relevant objects, relations and sets. The sets theory deals with sets in general, possibly containing other sets, and reduces objects and relations to sets. Alternatively, higher-order logic can be used as such a universal infrastructure, more convenient for computer-assisted formalization of mathematics. Also category theory pretends to the throne of the set theory.[4]

In many cases a specialized mathematical theory is a theory of mathematical structures of some kind, often called spaces (of this kind). For example: Euclidean geometry is a theory of Euclidean spaces; topology is a theory of topological spaces; linear algebra is a theory of linear spaces. Differential geometry investigates smooth manifolds. Algebra investigates groups, rings, fields etc.

Motivated or indiscriminate

Monkeys could type into a computer a sequence of hardware instructions; the computer could execute them; but the result of such "programming" has almost no chance to be fascinating or useful. Fascinating computer games reflect human predilections. Useful programs reflect human needs. A computer is dull for humans unless its software reflects human life in one way or another.

Likewise, a theorem is of no interest for humans unless it is motivated in one way or another by human life. The motivation may be quite indirect; many theorems "only" help to prove other theorems, many are appreciated "only" for their aesthetic value, etc. But some kind of motivation is necessary. Indiscriminate stream of logical consequences of the axioms is not publishable in the mathematical literature.

Note that "a theorem" does not mean "a motivated theorem", "an important theorem" etc., not even "an already discovered theorem". All theorems are just an indiscriminate stream of logical consequences of the axioms.

Theorems of a theory are, by definition, statements that follow from the given axioms according to the given rules (called by different authors inference rules, derivation rules, deduction rules, transformation rules), including the axioms themselves.

From technical to human: definitions

The gap between a bare hardware and a nice application is too wide for a single jump, or even a triple jump (hardware – operating system – programming language – application). Bridging the gap is a laborious task for many programmers. They compose programs of modules, and modules of subroutines. Each subroutine reduces a bit more useful task to a bit simpler tasks. Ultimately, a useful (or even fascinating) task is reduced to the technical instructions of the bare hardware.

Likewise, mathematicians bridge the wide gap between useful notions (say, "ellipse" or "normal distribution") and the undefined primitives by a large and complicated system of definitions. Each definition reduces a bit more useful notion to a bit more primitive notions.

Mathematical definitions are very diverse.

A definition may be just an abbreviation local to a single calculation, like this: "denoting for convenience by we have...".

A single definition may embed a whole specialized mathematical theory into the universal set-theoretic framework, like this:

"A Euclidean space consists, by definition, of three sets, whose elements are called points, lines and planes respectively, and six relations, one called betweenness, three called containment and two called congruence, satisfying the following conditions: ..."

A definition may be given mostly for speaking more concisely, like this: "in a triangle, the altitude of a vertex is, by definition, its distance to the line through the other two vertices".

A definition may introduce a new revolutionary concept, like this: "the derivative of a function at a point is, by definition, the limit of the ratio..."

Some mathematicians will tell you that the main aim of their research is to find the right definition, after which their whole area will be illuminated. ... For other mathematicians, the main purpose of definitions is to prove theorems... [5]

Principles and practices

Rigor or intuition

Mathematics as a cultural process is not bound by monotonic logic. For example, it was treated till 1872 as intuitively evident fact that any curve on the plane has a tangent line everywhere except maybe a discrete set of points. However, this "fact" was refuted and retracted.

A century later, we have seen so many monsters of this sort that we have become a little blasé. But the effect produced on the majority of nineteenth century mathematicians ranged from disgust to consternation.[6]

Since then, intuition is not a valid argument in mathematical proofs. A mathematical theory is bound by monotonic logic. If needed, mathematics as a cultural process can retract an axiomatic theory as whole (rather than some theorems) and accept another axiomatic theory. (Moreover, in a remote perspective the axiomatic approach could be abandoned.)

Formal or formalizable

An example. If then , since . This simple algebraic argument can be checked by a human or even a computer. No need to understand the meaning of the statement; it is sufficient to know the relevant formal rules. The statement follows from the given rules only, rigorously, without explicit or implicit use of intuition. A formal proof guarantees that a theorem will never be retracted.

A universal formal language conceived by Gottfried Leibniz near 1685,

capable of expressing all humans thoughts unambiguously, of strengthening our power of deduction, of avoiding errors by a purely mechanical effort of attention[7]

was partially implemented in mathematics two centuries later, especially, as a formal language of set theory. In the form introduced by Bourbaki this language contains four logical signs, three specific signs, and letters. Its rules are considerably more complicated than elementary algebra.

However, this formal language is never used by a typical mathematician. Likewise, most programmers never use the instructions of the computer hardware. Instead, they develop applications in appropriate programming languages (BASIC, Java, C++, Python etc). A statement of a programming language corresponds usually to many hardware instructions; the correspondence is established by special programs (compilers and interpreters). A programming language is geared to programmers, hardware instructions --- to hardware.

The gap between the language of mathematical texts (books and articles) and the formal language of the set theory is even wider than the gap between a programming language and hardware instructions. It is as wide as the gap between pseudocode and Turing machines.

A programmer, when coding a program, knows that the computer does exactly what it is told to do. In contrast, an author or speaker, when communicating with other people, can be ambiguous, make small errors, and still expect to be understood. Pseudocode is a description of an algorithm, intended for human reading, that can be converted into a code by any competent programmer. A code is formal; a pseudocode is formalizable.

Likewise, communication between mathematicians is made in a formalizable language.

In practice, the mathematician ... is content to bring the exposition to a point where his experience and mathematical flair tell him that translation into formal language would be no more than an exercise of patience (though doubtless a very tedious one).[8]

A mathematician, using a formalizable language, need not know all technical details of the underlying formal language. Many equivalent formal languages may be used interchangeably. Likewise, a single pseudocode can be used when coding in different programming languages, as long as these are not too far from each other.

Axiomatic or axiomatizable

Two finite lists of vectors generate (by summation) the same infinite set of vectors. Similarly, different finite lists of axioms can be used for generating the same infinite set of theorems.

A mathematician, working within an axiomatic theory, need not remember the list of axioms in full detail. It is sufficient to remember a list of theorems that contains all axioms or, more generally, such that each axiom follows easily from these theorems. (Here, as usual, "theorems" are treated inclusively, as "axioms and theorems".)

Two lists of axioms are called equivalent if every axiom of each list follows from the axioms of the other list. Such lists of axioms may be used interchangeably, since they generate the same theorems.

Some details

Consistent or inconsistent

If a theory states that 2+2=5, it is a paradox but not yet a contradiction. By "paradox" people may mean

  • a contradiction;
  • an apparent contradiction;
  • something counterintuitive;
  • something surprising;
  • something ironic;

etc. In contrast, a contradiction (in a mathematical theory) is, by definition, a pair of theorems (of the given theory) such that one is the negation of the other. Thus, two theorems

are still not a contradiction. Two theorems

are a contradiction.

If a contradiction exists in a given theory, this theory is called inconsistent. Otherwise, if no contradictions exist (rather than merely not found for now), the theory is called consistent.

For a mathematician, an inconsistent theory is completely useless. Some philosophers disagree:

Superstitious dread and veneration by mathematicians in face of a contradiction[9]

But a mathematician insists: an inconsistent theory is completely useless, since all statements (in the given language) are theorems! The reason is, proof by contradiction. No matter which statement X is in question, we always can prove X as follows:

  • Assume that X is false;
  • ... (put the proof of the contradiction here);
  • the assumption leads to a contradiction, thus X is true.

It is tempting to object that the contradiction has nothing in common with the assumption and therefore cannot invalidate it. However, the rules of formal logic do not demand that the contradiction has something in common with the assumption. Some attempts to change these rules were made (so-called "relevance logic", or "relevant logic"), but with little success. It is always possible to obfuscate the proof of the contradiction, making it seemingly entangled with X. We have no formal criterion able to unmask any possible fictitious participation of X in the proof of the contradiction.

David Hilbert aimed to find axioms sufficient for all mathematics and to prove their consistency from the assumption that the "finitary arithmetic" (a subsystem of the usual arithmetic of the positive integers, chosen to be philosophically uncontroversial) was consistent. A fatal blow was dealt by the second Gödel's incompleteness theorem. Consistency of a theory cannot be proved by a weaker theory, nor by the same theory. It can be proved by a stronger theory, which does not dispel doubts: if the given theory is inconsistent then the stronger theory, being all the more inconsistent, can prove every claim, be it true or false.

Many mathematicians feel that specialized theories, being more reliable than universal theories, are like watertight compartments. If a contradiction will be found in the used universal theory, specialized theories will separate and wait for a better universal theory.

I have always felt that, if one day someone came up with a contradiction in mathematics, I would just say, "Well, those crazy logicians are at it again," and go about my business as I was going the day before.[10]

Univalent or multivalent

Plane geometry (called also "planar geometry") is a part of solid geometry that restricts itself to a single plane ("the plane") treated as a geometric universe. The question "which plane?" is inappropriate, since planes do not differ in their geometric properties. Every two planes α, β are isomorphic, that is, there exists an isomorphism f between α and β. Treating α and β as sets of points one defines isomorphism as an invertible (one-to-one and onto) map f : α → β preserving all primitive relations. Namely: f maps lines into lines; the distance between f(A) and f(B) on β is equal to the distance between A and B on α; etc. The same is required of the inverse map .

Axioms of the plane Euclidean geometry leave no freedom, they determine uniquely all geometric properties of the plane. More exactly: all Euclidean planes are mutually isomorphic. In this sense we have "the" Euclidean plane. In terms of Bourbaki, the plane Euclidean geometry is an univalent theory. In contrast, axioms of a linear space (called also vector space) leave a freedom: a linear space may be one-dimensional, two-dimensional, three-dimensional, four-dimensional and so on (infinite dimension is also possible). Linear algebra is multivalent.

Axioms of topology leave much more freedom; every subset of (say) the plane is an example of a topological space, be it connected or not, compact or not, be it a curve, a domain, a fractal, or whatever; and all these are still a minority among all topological spaces. Topology is multivalent.

According to Bourbaki, the study of multivalent theories is the most striking feature which distinguishes modern mathematics from classical mathematics.[11]

A similar idea occurs in mathematical logic: a theory is called categorical if all its models are mutually isomorphic. However, for Bourbaki a theory is embedded into the set theory, while in logic a theory is standalone (embedded into the first-order logic).

Complete or incomplete

A theory is called complete, if every statement in its language can be proved or disproved. In other words: if such a statement is not a theorem then its negation is necessarily a theorem.

A statement (in the language of the given theory) is called independent (in other words, undecidable) in this theory, if it is not a theorem, but also its negation is not a theorem.

Independent statements appear naturally in multivalent theories. For example, in linear algebra the statement "three vectors cannot be linearly independent" is neither provable nor disprovable, since the dimension of a linear space may be 2 or 3 (or anything else). Thus, multivalent theories are usually incomplete.

What about univalent theories? Seemingly, these should be complete. For example, the plane Euclidean geometry should prove all geometric statements that hold on the Euclidean plane, and only such statements. Similarly, arithmetics of natural numbers should prove all true arithmetical statements about natural numbers, and only these. Unexpectedly, the situation is far more complicated than these naive ideas, see Sect. 4.7.

Definitions: abbreviations or extensions

According to Sect. 2.4, a theorem is a logical consequence of the axioms. However, what about definitions? They are numerous and important, according to Sect. 2.5. Two possible approaches to definitions answer differently the question, whether they belong to the formal theory or not.

One approach treats definitions as abbreviations, used in the formalizable language and eliminated when translating into the formal language. Thus, the formal theory contains only primitive notions and axioms, not definitions.

According to the other approach, each definition extends the formal theory. The new (extended) theory contains one more notion and one more axiom. Informally, the new notion is just defined, and the new axiom is just the definition. But formally, the new theory still contains only primitive notions and axioms, not definitions.

An example.

Definition: a prime number is a natural number having exactly two divisors. (These are 1 and the number itself, of course.)

Theorem: there are infinitely many prime numbers.

The first approach (abbreviations). In the formal language, the notion "prime" is eliminated, as follows. Theorem: there are infinitely many natural numbers having exactly two divisors. The proof also does not involve the notion "prime".

The second approach (extensions). The new formal theory stipulates a new primitive notion "prime" and a new axiom: a natural number is prime if and only if it has exactly two divisors. This axiom is used in the proof of the theorem.

Formalizable in reality or in principle

The integer 1 is a very simple example of a mathematical object that can be defined in the language of set theory. Its definition, treated as an abbreviation (rather than extension), can be translated from this formalizable language into any corresponding formal language, in particular, into the formal language introduced by Bourbaki. However, the length of the resulting formal definition is equal to 4,523,659,424,929.[12] It would take about 4000 gigabytes of computer memory! More complicated mathematical definitions are much longer. Clearly, such formalization is possible only in principle, similarly to the possibility of writing 21000 as a sum of units.

This formal language is so verbose because of substitution. Here is an algebraic example: substituting into gives an expression of length 64, containing 8 occurrences of a. Further, substitute and you get an expression of length 152, containing 24 occurrences of u. And so on.

There exist formal mathematical languages intended for practical use, for example, Mizar. These treat definitions as extensions.

Generating all theorems; decidable or undecidable

It is impossible to list all theorems, since they are infinitely many. However, an endless algorithmic process can generate theorems, only theorems, and all theorems in the sense that every theorem will be generated, sooner or later. In more technical words: the set of all theorems is recursively enumerable. Some theories have infinitely many axioms generated by a finite list of so-called axiom schemata. Still, the set of all theorems is recursively enumerable, since the set of axioms is.

An open question (in a mathematical theory) is a statement neither proved nor disproved. It is possible (in principle, not necessarily in practice) to run the theorem-generating algorithm waiting for one of two events: either the given statement appears to be a theorem, or its negation does; in both cases the (formerly) open question is decided. If the theory is complete, this must happen sooner or later.

A theory is called decidable if there exists an algorithm that, given an arbitrary statement (in the language of the theory) decides whether it is a theorem or not. (In more technical words: if the set of all theorems is recursive.)

The argument above shows that a complete theory is decidable.

A theory as a crystal ball?

Imagine a theory able to describe the whole discrete mathematics (the set theory?) or, at least, arbitrary algorithmic computations, in other words, formal machines (like computers, but with unlimited resources). Assume that the theorems include all true statements about these machines, and no wrong statements about them. Then a formal machine can use this theory as a "crystal ball", — for predicting the future of any formal machine. In particular, for deciding, whether a given machine will eventually halt, or will run forever (so-called halting problem). To this end it runs the theorem-generating algorithm until either the statement "such machine eventually halts" or its negation appears among the theorems.

In order to predict the future of a formal machine Y, a formal machine X needs the code of Y. It does not mean that the code of Y must be inscribed into (and therefore be shorter than) the code of X. Instead, X can generate the code of Y. This way, a single machine X can predict the future of infinitely many machines Y, and moreover, of all possible machines Y, generating their codes in an endless loop.

Can X predict its own future? It may seem that the affirmative answer is ensured by the argument above. However, how can X know, which Y is identical to X? In other words, how can X know (it means, be able to generate) its own code?

It is a well-known and nontrivial fact that some programs, so-called quine programs, can generate their own texts. (See also halting problem.) It may seem to be a trick, but in fact it is a form of self-replication, and it uses a 4×109 years old patent of life: a cell replicates itself using its DNA, and also replicates the DNA itself.

Using the quine technique, a machine X can predict its own future. Is it paradoxical? Seemingly not, since X is a deterministic machine, it has no free will. But nevertheless it is paradoxical, since we may program X as follows:

  • predict your own future: whether you will eventually halt, or will run forever;
  • if "halt" is predicted, enter an infinite loop;
  • otherwise (if "run forever" is predicted), halt.

We get a paradox, which means that the assumptions made in the beginning of this section cannot be satisfied. An axiomatic theory cannot be such that its theorems include all true statements about formal machines, and no wrong statements about them! This important conclusion is closely related to the famous Gödel theorem.[13]

Notes

  1. This phrase is borrowed from "Life".
  2. Feynman 1995, Sect. 2, page 46.
  3. Borel 1962, page 3.
  4. Lawvere & Rosebrugh 2003.
  5. Gowers 2008, pages 74–75.
  6. Bourbaki 1968, page 311.
  7. Bourbaki 1968, page 302.
  8. Bourbaki 1968, page 8.
  9. Ludwig Wittgenstein.
  10. Vaughan Jones. See Casacuberta & Castellet 1992, page 91.
  11. Bourbaki 1968, page 385.
  12. Mathias 2002.
  13. The first incompleteness theorem of Gödel, somewhat stronger than this conclusion, states that a consistent and complete theory cannot contain Robinson arithmetics.

References

Borel, Émile (1962), Probabilities and life, Dover publ. (translation).

Bourbaki, Nicolas (1968), Elements of mathematics: Theory of sets, Hermann (original), Addison-Wesley (translation).

Casacuberta, C & M Castellet, eds. (1992), Mathematical research today and tomorrow: Viewpoints of seven Fields medalists, Lecture Notes in Mathematics, vol. 1525, Springer-Verlag, ISBN 3-540-56011-4.

Feynman, Richard (1995), The character of physical law (twenty second printing ed.), the MIT press, ISBN 0 262 56003 8.

Gowers, Timothy, ed. (2008), The Princeton companion to mathematics, Princeton University Press, ISBN 978-0-691-11880-2.

Lawvere, F. William & Robert Rosebrugh (2003), Sets for mathematics, Cambridge University Press, ISBN 0-521-80444-2.

Mathias, Adrian (2002), "A term of length 4,523,659,424,929", Synthese 133 (1/2): 75–86. (Also here.)