N. Wirth
Tuesday 23 February 1988
Abstract
The programming language Oberon is the result of a
concentrated effort increase the power of Modula-2 and
simultaniously to reduce its complexity. Several features were
eliminated, and a few were added in order to increase the
expressive power and flexibility of the language. This paper
describes and motivates the changes. The language is defined
in a concise report.
BIX oberon/long_messages#11, from frode,21774 chars,
Tue Feb 23 20:43:46 1988.
TITLE Oberon Language Article, (c) ETH-ZENTRUM, SWITZERLAND.
Introduction
The programming language Oberon evolved from a project
whose goal was the design of a modern, flexible and efficient
operating system for a single-user workstation. A principal
guideline was to concentrate on properties that are genuinely
essential and - as a consequense - to omit emphemeral issues.
It is best way to keep the system in hand, to make it
understandable, explicable, reliable and efficiently
implementable.
Initially, it was planned to express system in
Modula-2[1], as that language supports the notion of modular
design quite effectively, and because an operating system has
to be design in terms of separately compilable parts with
consientiously chosen interfaces. In fact, an operating system
should be no more than a set of basic modules, and the design
of an application must be considered as a goal-oriented
extension of that basic set: Programming is always extending a
given system.
Whereas modern languages, such as Modula, supports the
notion of extensibility in the procedure realm, the notion is
less well established in the domain in data types. Modula in
particular does not allow the definition of a new data types
as extensions of other, programmer-defined types in the
adequate manner. An additional features was called for,
thereby giving rise to an extension of Modula.
The concept of the planned operating system also called
for a highly dynamic, centralized storage managment relying on
the technique of garbage collection. Althoug Modula does not
prevent the incorporation of a garbage collector in principle,
its variant records feature constitutes a genuine obstacle. As
the new facility for extending types would make the variant
records feature superfluous, the removal of this stumbling
block was a logical desicion. This step, however gave rise to
a restriction (subset) of Modula. It soon became clear that
the rule to concentrate on the essential and to eliminate
inessential should not only be applied to the design of the
new system, but equallu stringently to the language in which
the system is fornulated. The application of the principal
thus led from Modula to a new language. The adjective new,
however, has to be understood in proper context: Oberon
evolved from Modula by a very few addition and several
substractions. In relying on evolution rather then revolution
we remain in the tradition of a long development that led from
Algol to Pascal, then Modula-2, and eventially to Oberon. The
common trait of these languages are their procedural rather
than functional model, and the strict typing of data. More
fundamental even is perhaps the idea of abstraction: the
language must be defined in terms of mathematical, abstract
concepts without reference to any computing mechanism. Only if
a language satisfies this criterion can it be called
"higher-level". No syntactic coasting whatsoever can earn a
language this attribute alone.
The definition of a language must be coherent and
concise. This can only be achived by a careful choice of the
underlying abstractions an appropriate structure combining
them. The language manual must be resonably short, avoiding
the explanation of individual cases derivable from the general
rules. The power of a formalism must be measured by the length
of its description. To the contrary, an overly lengthy
definition is a sure symptom of inadequacy. In this respect,
not complexity but simplicity must be goal.
In spite of its brevity, a description must be complete.
Completeness is to be achived whithin the framework of the
chosen abstractions. Limitations imposed by particular
implementations do not belong to a language definition proper.
Examples of such restrictions are the maximum values of
numbers, rounding and truncation errors in arithmetic, and
actions taken when a program violates the stated rules. It
should not be necessary to supplement a language definition
with voluminous standarts document to cover "unforeseen"
cases.
But neither should a programming language be a
mathematical theory only. Its must be practical tool. This
imposes certain limits on the terseness of the formalism.
Several features of Oberon are superfluous from a purely
theoretical point of view. They are nevertheless retained for
practical reasons, either for programmers' convenience or to
allow for efficient code generation whithout the necessity of
complex, "optimizing" pattern matching algorithms in
compilers. Examples of such features are the presence of
several forms of repetitive statements, and of standard
procedures such as INC, DEC, and ODD. They complicate neither
the language conceptually nor the compiler to any significant
degree.
These underlying premises must be kept in mind when
comparing Oberon with other languages. Neither the language
nor its defining document reach the ideal; but Oberon
approximates these goals much better than its predecessors.
A compiler for Oberon has been implemented for the
NS32000 processor family and is embedded in the Oberon
operating environment. The following data provide an estimate
of the simplicity and efficiency of the implementation, and
readers are encouraged to compare them whith implementations
of other languages. (Measurements with 10 MHz NS32032).
--------------------------------------------------------------
lenght of lenght of time of self
source program compiled code compilation
--------------------------------------------------------------
lines characters bytes seconds
--------------------------------------------------------------
Parser 1116 36719 9928 11.53
Scanner 346 9863 3388 3.80
Import/Export 514 18386 4668 5.25
Code generator 1963 65901 21636 21.02
Total 3939 130869 39620 41.60
Subsequently, we present a brief introduction to Oberon
assuming familiarity with Modula (or Pascal), concentrating on
the added features and listing the eliminated ones. In order
to be able to "start with a clean table", the latter are taken
first.
Fetures omitted from Modula
Data types
Variant records are eliminated, because they constitute a
genuine difficulty for the implementation of a reliable
storage management system based on automatic garbage
collection. The functionality of variant records is preserved
by the introduction of extensible data types.
Opaque types cater to the concept of the abstract data
type and information hiding. They are eliminated because again
the concept is covered by the new facility of extended record
types.
Enumeration types appear to be a simple enough feature to
be uncontroversial. However, they defy extensibility over
module boundaries. Either a facility to extend enumeration
types would have to be introduced, or they would have to be
dropped. A reason in favour of the latter, radical solution
was the observation that in a growing number of programs the
indiscriminate use of enumerations had led to a pompous style
that contributed not to program clarity, but rather to
verbosity. In connection with import and export, enumerations
gave rise to the exceptional rule that import of a type
identifier also causes the (automatic) import of all
associated constant identifiers. This exceptional rule defies
conceptual simplicity and causes unpleasant problems for the
implementor.
Subrange types were introduced in Pascal (and adopted in
Modula) for two reasons: (1) to indicate that a variable
accepts a limited range of values of the base type and allow a
compiler to generate appropriate guards for assigments, and
(2) to allow a compiler to allocate minimal storage space
needed to store values of the indicated subrange. This
appeared desirable in connection with packed records. Very few
implementations have taken advantage of this space saving
facility, because additional compiler complexity is very
considerable. Reason 1 alone, however, did not appear to
provide sufficient justification to retain subrange facility
in Oberon.
With the absence of enumerations and subrange types, the
general posibility to define set type based on given element
types appeared as redundant. Instead, a single, basic type SET
is introduced, whose values are sets of integers from 0 to an
implementation-defined maximum.
The basic type CARDINAL had been introduced in Modula-2
in order to allow address arithmetic with values from 0 to
2**16 on 16-bit computers. With the prevalence of 32-bit
address in modern processors, the need for unsigned arithmetic
has practically vanished, and therefore the type CARDINAL has
been eliminated. With it, the bothersome incompatibilities of
operands of types CARDINAL and INTEGER have disappeared.
The notion of a definable index type of arrays also been
abandoned: All indicies are by default integers. Furthermore,
the lower bound is fixed to 0; array declarations specify a
number of elements (length) rather than pair of bounds. This
break with a long standing tradition since Algol 60
demonstrates the principle of eliminating the inessential most
clarity. The specification of an arbitrary lower bound
provides no expressive power at all, but it introduces a
non-negligible amount of hidden, computational effort. (Only
in the case of static declarations can it be delegated to the
compiler).
Modules and import/export rules
Experience with Modula over last eight year has shown
that local modules were rarely used. The additional
complexibility of the compiler required to handle them, and
the additional complications in the visibility rules of the
language definition appear not to justify local modules.
The qualification of an imported object's identifier x by
the exporting module's name M, viz. M.x can be circumvented in
Modula by the use of the import clause FROM M IMPORT x. This
facility has also been discarded. Experience in programming
systems involving many modules has taught that the explicit
qualification of each occurrence of x is actualy preferable. A
simplification of the compiler is a welcome side-effect.
The dual role of the main module in Modula is
conceptually confusing. It constitutes a module in sense of a
package of data and procedures enclosed by a scope of
visibility, and at the same time it constitutes a single
procedure called the main program. Within the Oberon system,
the notion of a main program is vanished. Instead, the system
allows the user to activate any (exported,parameterless)
procedure (called a command). Hence, the language excludes
modules without explicit definition part, and every module is
defined in terms of a definition part and an implementation
part (not definition module and implementation module).
Statments
The with statment has been discarded. Like in the case of
exported identifiers, the explicit qualification of field
identifiers is to be preferred.
The elimination of the for statment constitutes a break
with another long standing tradition. The baroque mechanism in
Algol 60's for statment had been trimmed considerably in
Pascal (and Modula). Its marginal value in practice has led to
its absence in Oberon.
Low-level facilities
Modula-2 makes access to machine-specific facilities
possible through low-level constructs, such as data types
ADDRESS and WORD, absolute addressing of variables, and type
casting functions. Most of them are packaged in module called
SYSTEM. The features were supposed to rarely used and easily
visible throug the presense of SYSTEM in a modules's import
list. Experience has revealed, however, that significant
number of programmers import this module quite
indiscriminately. A particulary seducing trap are Module's
type transfer functions.
It appears preferable to drop the pretense of partability
of programs that import a "standard", yet system-specific
module. Both the module SYSTEM and the type transfer functions
are eliminated, and with them also the types ADDRESS and WORD.
Individual implementors are free to provide system-dependent
modules, but they do not belong to the general language
definition. Their use then declares a program to be patently
implementation-specific, and thereby not portable.
Concurrency
The system Oberon does not require any language
facilities for expressing concurrent process. The pertinent,
rudimentary features of Modula, in particular the coroutine,
were therefor not retained. This exclusion is merely a
reflection of our actual needs within the concrete project,
but not on the general relevance of concurrency in
programming.
Features introduced in Oberon
Type extension
The most important addition is the facility of extended
record types. It permits the construction of new types on the
basis of existing types, and establishing a certain degree of
compatibility between the names of the new and old types.
Assuming a given type
T = RECORD x,y: INTEGER END
extension may be defined which contain certain fields in
addition to the existing ones. For example
T0 = RECORD (T) z: REAL END;
T1 = RECORD (T) w: LONGREAL END;
define types with fields x,y,z and x,y,w respectively. We
define a type declared by
T' = RECORD (T) <field declarations> END
to be a (direct) extension of T, and conversely T to be
the (direct) base type of T'. Extended types may be extended
again, giving rise to the following definitions:
A type T' is an extension of T, if T' is a direct
extension of an extension of T. Conversely, T is a base of T',
if T=T' or T is the direct base of a base type of T'. We
denote this relationship by T'=>T.
The rule of assigment compatibility states that values of
an extended type are assignable to variable of their base
types. For example, a record of type T0 can be assigned to a
variable of the base type T. This assigmant involves the field
x and y only, and in fact constitutes a projection of the
value onto space spanned by the base type.
It is important that an extended type may be declared in
a module that imports the base type. In fact, this is probably
the normal case.
This concept of extensible data type gain importance when
extended to pointers. It is appropriate to say that a pointer
type P' bound to T' extends a pointer type P, if P is bound to
a base type T of T', and to extend the assigmant rule to cover
this case. It is now possible to form structures whose nodes
are of different types, i.e. inhomogenious data structures.
The inhomogeneity is automatically (and most sensibly) bounded
by the fact that the nodes are linked by the pointers of a
common base ype.
Typically, the pointer fields establishing the structure
are contained in the base type T, and the procedures
manipulating the structure are defined in the same (base)
module as T. Individual extensions (variants) are defined in
client modules together with procedures operating on nodes of
the extended type. This scheme is in full accordance with the
notion of system extensibility: new modules defining new
extensions may be added to a system without requiring a change
of the base modules, not even their recompilation.
As access to an individual node via pointer bound to a
base type provides a projected view of the node data only, a
facility to widen the view is nessasary. It depends on the
possibility to determine the actual type of the referenced
node. This is achieved by a type test, a Boolean expression in
the form
t IS T' (or p IS P')
If the test is affirmative, an assigment t':=t (t' of
type T') or p':=p (p' of type P') should be possible. The
static view of types, however, prohibits this. Note that both
assigments violate the rule of assigment compatibility. The
desired statment is made possible by providing a type guard in
the form
t':=t(T) (p':=p(P))
and by the same token access to the field z of T0 (see
previous example) is made possible by a type guard in the
designator t(T0).z. Here the guard asserts that t is
(currently) of type T0.
The declaration of extended record types, the type test,
and the type guard are the only additional feartures
introduced in this context. A more extensive discussion is
provided in [2]. The concept is very similar to the class
notion of Simula 67 [3], Smalltalk [4], and others. Difference
lie in the fact that the class facility stipulates that all
procedures applicable to object of the class are defined
together with the data declaration. It is awkward to be
obliged to define a new class solely because a method
(procedure) has been added or changed. In Oberon, procedure
(method) types rather than methods are connected with objects
in the program text. The binding of actual methods (specific
procedures) to objects (instances) is delayed until the
program is executed. In Smalltalk, the compatibility rules
between a class and its subclasses are confined to pointers,
thereby intertwining the concept of access method and data
type in an undesirable way. Here, the relationship between a
type in its extension is based on the established mathematical
concept of projection.
In Modula, it is possible to declare a poiner type within
an implementation module, and to export it as an opaque type
by listing the same identifier in the corresponding definition
module. The net effect is that the type is exported whereby
its associated binding remains hidden (invisible to clients).
In Oberon, this facility is generalized in the following way:
Let a record type be defined in a certain implementation part,
for example:
Viewer = RECORD width, height: INTEGER; x,y: INTEGER END
In the corresponding definition part, a partial
definition of the same type may be specified, for example
Viewer = RECORD width, height: INTEGER END
with the effect that a partial view - a public projection
- is visible to client. In client modules as well as in the
implementation part it is possible to define extensions of the
base type (e.g. TextViewers or GraphViewers).
Type inclusion
Modern processors feature arithmetic operation on several
number formats. It is desirable to have all these formats
reflected in the language as basic types. Oberon features five
of them:
LONGINT, INTEGER, SHORTINT (integer types)
LONGREAL, REAL (real types)
With the proliferation of basic types, a relaxation of
compatibility rules between them becomes almost mandatory.
(Note that in Modula the arithmetic types INTEGER, CARDINAL
and REAL are uncompatible). To this end, the notion of type
inclusion is introduced: a type T includes a type T', if the
values of T' are also the values of type T. Oberon postulates
The following hierarchy:
LONGREAL > REAL > LONGINT > INTEGER > SHORTINT
[Note that ">" should be replaced by the appropriate
mathematical sign. Limitation of type-in..]
The assigmant rule is relaxed accordinaly: A value of
type T' can be assigned to a variable of type T, if T' is
included in T (if T' extends T), i.e. if T > T' or T' => T. In
this respect, we return to (and extend) the flexibility of
Algol 60. For example, given variables
i: INTEGER; k: LONGINT; x: REAL
the assigments
k:=i; x:=k; k:=k+1; x := x*10 + i
are confirming to the rules, where the assigments
i:=k; k:=x
are not acceptable. Finally, it is wort noting that the
various arithmetic types represents a limited set of subrange
types.
The multi-dimensional open array and the closure statment
(in symmetry to a module's initialization body) are the
remaining facilities of Oberon not present in Modula.
Summary
The language Oberon has evolved from Modula-2 and
incorporates the experiences of many years of programming in
Modula. A significant number of features have been eliminated.
They appear to have contributed more to language and compiler
complexity than to genuine power and flexibility of
expression. A small number of features have been added, the
most significant one being the concept of type extension.
The evolution of a new language that is smaller, yet more
powerful than its ancestor is contrary to common practice and
trends, but has inestimable advantage. Apart from simpler
compilers, it results in a concise definition document [5],
and indispensible prerequisite for any tool that must serve in
the construction of sophisticated and reliable systems.
Acknowledgement
It is impossible to explicitly acknowledgement all
contributions od ideas that ultimately simmered down to what
is now Oberon. Most came from the use or study of existing
languages, such as Module-2, Ada, Smalltalk, C++ and Cedar,
which often though us how not to do it. Of particular value
was the contribution of Oberon's first user, J.Gutknext. The
author is grateful for his insistence on the elimination of
deadwoof and on basing the remaining features on a sound
mathematical foundation.
References
1. N.Wirth. Programming in Modula-2. Springer-Verlag, 1982.
2. N.Wirth. Type Extensions. ACM Trans. on Prog. Languages and
Systems (to appear)
3. G.Birtwistle, et al. Simula Begin. Auervach, 1973.
4. A.Goldberg, D.Robson. Smalltalk-80: The language and its
implementation. Addison-Weslay, 1983.
5. N.Wirth. The Programming Language Oberon (language
definition document)