From Modula to Oberon

                           N. Wirth

                   Tuesday 23 February 1988


                           Abstract

     The  programming  language  Oberon  is  the  result  of a
concentrated   effort  increase  the  power  of  Modula-2  and
simultaniously to reduce its complexity. Several features were
eliminated,  and  a  few  were  added in order to increase the
expressive  power  and flexibility of the language. This paper
describes  and  motivates the changes. The language is defined
in a concise report.

BIX oberon/long_messages#11, from frode,21774 chars,
Tue Feb 23 20:43:46 1988.
TITLE Oberon Language Article, (c) ETH-ZENTRUM, SWITZERLAND.


Introduction

     The  programming  language  Oberon evolved from a project
whose  goal was the design of a modern, flexible and efficient
operating  system  for  a single-user workstation. A principal
guideline  was to concentrate on properties that are genuinely
essential  and - as a consequense - to omit emphemeral issues.
It  is  best  way  to  keep  the  system  in  hand, to make it
understandable,    explicable,    reliable   and   efficiently
implementable.

     Initially,   it   was   planned   to  express  system  in
Modula-2[1],  as  that language supports the notion of modular
design  quite effectively, and because an operating system has
to  be  design  in  terms  of separately compilable parts with
consientiously chosen interfaces. In fact, an operating system
should  be no more than a set of basic modules, and the design
of  an  application  must  be  considered  as  a goal-oriented
extension of that basic set: Programming is always extending a
given system.

     Whereas  modern  languages,  such as Modula, supports the
notion  of extensibility in the procedure realm, the notion is
less  well  established in the domain in data types. Modula in
particular  does  not allow the definition of a new data types
as  extensions  of  other,  programmer-defined  types  in  the
adequate  manner.  An  additional  features  was  called  for,
thereby giving rise to an extension of Modula.

     The  concept  of the planned operating system also called
for a highly dynamic, centralized storage managment relying on
the  technique  of garbage collection. Althoug Modula does not
prevent the incorporation of a garbage collector in principle,
its variant records feature constitutes a genuine obstacle. As
the  new  facility  for extending types would make the variant
records  feature  superfluous,  the  removal of this stumbling
block  was a logical desicion. This step, however gave rise to
a  restriction  (subset)  of Modula. It soon became clear that
the  rule  to  concentrate  on  the essential and to eliminate
inessential  should  not  only be applied to the design of the
new  system,  but equallu stringently to the language in which
the  system  is  fornulated.  The application of the principal
thus  led  from  Modula  to a new language. The adjective new,
however,  has  to  be  understood  in  proper  context: Oberon
evolved  from  Modula  by  a  very  few  addition  and several
substractions.  In relying on evolution rather then revolution
we remain in the tradition of a long development that led from
Algol  to Pascal, then Modula-2, and eventially to Oberon. The
common  trait  of  these languages are their procedural rather
than  functional  model,  and  the strict typing of data. More
fundamental  even  is  perhaps  the  idea  of abstraction: the
language  must  be  defined in terms of mathematical, abstract
concepts without reference to any computing mechanism. Only if
a   language   satisfies  this  criterion  can  it  be  called
"higher-level".  No  syntactic  coasting whatsoever can earn a
language this attribute alone.

     The  definition  of  a  language  must  be  coherent  and
concise.  This  can only be achived by a careful choice of the
underlying  abstractions  an  appropriate  structure combining
them.  The  language  manual must be resonably short, avoiding
the explanation of individual cases derivable from the general
rules. The power of a formalism must be measured by the length
of  its  description.  To  the  contrary,  an  overly  lengthy
definition  is  a sure symptom of inadequacy. In this respect,
not complexity but simplicity must be goal.

     In  spite of its brevity, a description must be complete.
Completeness  is  to  be  achived whithin the framework of the
chosen   abstractions.   Limitations   imposed  by  particular
implementations do not belong to a language definition proper.
Examples  of  such  restrictions  are  the  maximum  values of
numbers,  rounding  and  truncation  errors in arithmetic, and
actions  taken  when  a  program violates the stated rules. It
should  not  be  necessary to supplement a language definition
with  voluminous  standarts  document  to  cover  "unforeseen"
cases.

     But   neither   should   a   programming  language  be  a
mathematical  theory  only.  Its  must be practical tool. This
imposes  certain  limits  on  the  terseness of the formalism.
Several  features  of  Oberon  are  superfluous  from a purely
theoretical  point of view. They are nevertheless retained for
practical  reasons,  either for programmers' convenience or to
allow  for efficient code generation whithout the necessity of
complex,   "optimizing"   pattern   matching   algorithms   in
compilers.  Examples  of  such  features  are  the presence of
several  forms  of  repetitive  statements,  and  of  standard
procedures  such as INC, DEC, and ODD. They complicate neither
the  language conceptually nor the compiler to any significant
degree.

     These  underlying  premises  must  be  kept  in mind when
comparing  Oberon  with  other languages. Neither the language
nor   its  defining  document  reach  the  ideal;  but  Oberon
approximates these goals much better than its predecessors.

     A  compiler  for  Oberon  has  been  implemented  for the
NS32000  processor  family  and  is  embedded  in  the  Oberon
operating  environment. The following data provide an estimate
of  the  simplicity  and efficiency of the implementation, and
readers  are  encouraged to compare them whith implementations
of other languages. (Measurements with 10 MHz NS32032).

--------------------------------------------------------------
                  lenght of         lenght of     time of self
                source program    compiled code    compilation
--------------------------------------------------------------
               lines  characters          bytes        seconds
--------------------------------------------------------------
Parser           1116      36719           9928          11.53
Scanner           346       9863           3388           3.80
Import/Export     514      18386           4668           5.25
Code generator   1963      65901          21636          21.02
Total            3939     130869          39620          41.60


     Subsequently,  we  present a brief introduction to Oberon
assuming familiarity with Modula (or Pascal), concentrating on
the  added  features and listing the eliminated ones. In order
to be able to "start with a clean table", the latter are taken
first.


Fetures omitted from Modula

Data types

     Variant records are eliminated, because they constitute a
genuine  difficulty  for  the  implementation  of  a  reliable
storage   management   system   based   on  automatic  garbage
collection.  The functionality of variant records is preserved
by the introduction of extensible data types.

     Opaque  types  cater  to the concept of the abstract data
type and information hiding. They are eliminated because again
the  concept is covered by the new facility of extended record
types.

     Enumeration types appear to be a simple enough feature to
be  uncontroversial.  However,  they  defy  extensibility over
module  boundaries.  Either  a  facility to extend enumeration
types  would  have  to be introduced, or they would have to be
dropped.  A  reason  in favour of the latter, radical solution
was  the  observation that in a growing number of programs the
indiscriminate  use of enumerations had led to a pompous style
that  contributed  not  to  program  clarity,  but  rather  to
verbosity.  In connection with import and export, enumerations
gave  rise  to  the  exceptional  rule  that  import of a type
identifier   also   causes   the  (automatic)  import  of  all
associated  constant identifiers. This exceptional rule defies
conceptual  simplicity  and causes unpleasant problems for the
implementor.

     Subrange  types were introduced in Pascal (and adopted in
Modula)  for  two  reasons:  (1)  to  indicate that a variable
accepts a limited range of values of the base type and allow a
compiler  to  generate  appropriate guards for assigments, and
(2)  to  allow  a  compiler  to allocate minimal storage space
needed  to  store  values  of  the  indicated  subrange.  This
appeared desirable in connection with packed records. Very few
implementations  have  taken  advantage  of  this space saving
facility,  because  additional  compiler  complexity  is  very
considerable.  Reason  1  alone,  however,  did  not appear to
provide  sufficient  justification to retain subrange facility
in Oberon.

     With  the absence of enumerations and subrange types, the
general  posibility  to define set type based on given element
types appeared as redundant. Instead, a single, basic type SET
is  introduced, whose values are sets of integers from 0 to an
implementation-defined maximum.

     The  basic  type CARDINAL had been introduced in Modula-2
in  order  to  allow  address arithmetic with values from 0 to
2**16  on  16-bit  computers.  With  the  prevalence of 32-bit
address in modern processors, the need for unsigned arithmetic
has  practically vanished, and therefore the type CARDINAL has
been  eliminated. With it, the bothersome incompatibilities of
operands of types CARDINAL and INTEGER have disappeared.

     The  notion of a definable index type of arrays also been
abandoned:  All indicies are by default integers. Furthermore,
the  lower  bound  is fixed to 0; array declarations specify a
number  of  elements (length) rather than pair of bounds. This
break   with   a   long  standing  tradition  since  Algol  60
demonstrates the principle of eliminating the inessential most
clarity.   The  specification  of  an  arbitrary  lower  bound
provides  no  expressive  power  at  all,  but it introduces a
non-negligible  amount  of hidden, computational effort. (Only
in  the case of static declarations can it be delegated to the
compiler).

Modules and import/export rules

     Experience  with  Modula  over  last eight year has shown
that   local   modules   were   rarely  used.  The  additional
complexibility  of  the  compiler required to handle them, and
the  additional  complications  in the visibility rules of the
language definition appear not to justify local modules.

     The qualification of an imported object's identifier x by
the exporting module's name M, viz. M.x can be circumvented in
Modula  by  the use of the import clause FROM M IMPORT x. This
facility  has  also  been discarded. Experience in programming
systems  involving  many  modules has taught that the explicit
qualification of each occurrence of x is actualy preferable. A
simplification of the compiler is a welcome side-effect.

     The   dual   role   of  the  main  module  in  Modula  is
conceptually  confusing. It constitutes a module in sense of a
package  of  data  and  procedures  enclosed  by  a  scope  of
visibility,  and  at  the  same  time  it constitutes a single
procedure  called  the main program. Within the Oberon system,
the  notion of a main program is vanished. Instead, the system
allows  the  user  to  activate  any  (exported,parameterless)
procedure  (called  a  command).  Hence, the language excludes
modules  without explicit definition part, and every module is
defined  in  terms  of a definition part and an implementation
part (not definition module and implementation module).

Statments

     The with statment has been discarded. Like in the case of
exported  identifiers,  the  explicit  qualification  of field
identifiers is to be preferred.

     The  elimination  of the for statment constitutes a break
with another long standing tradition. The baroque mechanism in
Algol  60's  for  statment  had  been  trimmed considerably in
Pascal (and Modula). Its marginal value in practice has led to
its absence in Oberon.

Low-level facilities

     Modula-2  makes  access  to  machine-specific  facilities
possible  through  low-level  constructs,  such  as data types
ADDRESS  and  WORD, absolute addressing of variables, and type
casting  functions. Most of them are packaged in module called
SYSTEM.  The  features were supposed to rarely used and easily
visible  throug  the  presense of SYSTEM in a modules's import
list.  Experience  has  revealed,  however,  that  significant
number    of    programmers    import    this   module   quite
indiscriminately.  A  particulary  seducing  trap are Module's
type transfer functions.

     It appears preferable to drop the pretense of partability
of  programs  that  import  a  "standard", yet system-specific
module. Both the module SYSTEM and the type transfer functions
are eliminated, and with them also the types ADDRESS and WORD.
Individual  implementors  are free to provide system-dependent
modules,  but  they  do  not  belong  to  the general language
definition.  Their  use then declares a program to be patently
implementation-specific, and thereby not portable.

Concurrency

     The   system   Oberon   does  not  require  any  language
facilities  for  expressing concurrent process. The pertinent,
rudimentary  features  of Modula, in particular the coroutine,
were  therefor  not  retained.  This  exclusion  is  merely  a
reflection  of  our  actual needs within the concrete project,
but   not   on   the   general  relevance  of  concurrency  in
programming.

Features introduced in Oberon

Type extension

     The  most  important addition is the facility of extended
record  types. It permits the construction of new types on the
basis  of existing types, and establishing a certain degree of
compatibility  between  the  names  of  the new and old types.
Assuming a given type

T = RECORD x,y: INTEGER END

     extension  may be defined which contain certain fields in
addition to the existing ones. For example

T0 = RECORD (T) z: REAL END;
T1 = RECORD (T) w: LONGREAL END;

     define types with fields x,y,z and x,y,w respectively. We
define a type declared by

T' = RECORD (T) <field declarations> END

     to  be  a (direct) extension of T, and conversely T to be
the  (direct)  base type of T'. Extended types may be extended
again, giving rise to the following definitions:
     A  type  T'  is  an  extension  of  T,  if T' is a direct
extension of an extension of T. Conversely, T is a base of T',
if  T=T'  or  T  is  the  direct base of a base type of T'. We
denote this relationship by T'=>T.

     The rule of assigment compatibility states that values of
an  extended  type  are  assignable  to variable of their base
types.  For  example, a record of type T0 can be assigned to a
variable of the base type T. This assigmant involves the field
x  and  y  only,  and  in fact constitutes a projection of the
value onto space spanned by the base type.

     It  is important that an extended type may be declared in
a module that imports the base type. In fact, this is probably
the normal case.

     This concept of extensible data type gain importance when
extended  to pointers. It is appropriate to say that a pointer
type P' bound to T' extends a pointer type P, if P is bound to
a base type T of T', and to extend the assigmant rule to cover
this  case.  It is now possible to form structures whose nodes
are  of  different  types, i.e. inhomogenious data structures.
The inhomogeneity is automatically (and most sensibly) bounded
by  the  fact  that  the nodes are linked by the pointers of a
common base ype.

     Typically,  the pointer fields establishing the structure
are   contained  in  the  base  type  T,  and  the  procedures
manipulating  the  structure  are  defined  in the same (base)
module  as  T. Individual extensions (variants) are defined in
client  modules together with procedures operating on nodes of
the  extended type. This scheme is in full accordance with the
notion  of  system  extensibility:  new  modules  defining new
extensions may be added to a system without requiring a change
of the base modules, not even their recompilation.

     As  access  to  an individual node via pointer bound to a
base  type  provides a projected view of the node data only, a
facility  to  widen  the  view is nessasary. It depends on the
possibility  to  determine  the  actual type of the referenced
node. This is achieved by a type test, a Boolean expression in
the form

t IS T' (or p IS P')

     If  the  test  is  affirmative, an assigment t':=t (t' of
type  T')  or  p':=p  (p'  of type P') should be possible. The
static  view of types, however, prohibits this. Note that both
assigments  violate  the  rule of assigment compatibility. The
desired statment is made possible by providing a type guard in
the form

t':=t(T)    (p':=p(P))

     and  by  the  same token access to the field z of T0 (see
previous  example)  is  made  possible  by a type guard in the
designator   t(T0).z.   Here  the  guard  asserts  that  t  is
(currently) of type T0.

     The  declaration of extended record types, the type test,
and   the   type  guard  are  the  only  additional  feartures
introduced  in  this  context.  A more extensive discussion is
provided  in  [2].  The  concept  is very similar to the class
notion of Simula 67 [3], Smalltalk [4], and others. Difference
lie  in  the  fact that the class facility stipulates that all
procedures  applicable  to  object  of  the  class are defined
together  with  the  data  declaration.  It  is  awkward to be
obliged  to  define  a  new  class  solely  because  a  method
(procedure)  has  been  added or changed. In Oberon, procedure
(method)  types rather than methods are connected with objects
in  the  program text. The binding of actual methods (specific
procedures)  to  objects  (instances)  is  delayed  until  the
program  is  executed.  In  Smalltalk, the compatibility rules
between  a  class and its subclasses are confined to pointers,
thereby  intertwining  the  concept  of access method and data
type  in  an undesirable way. Here, the relationship between a
type in its extension is based on the established mathematical
concept of projection.

     In Modula, it is possible to declare a poiner type within
an  implementation  module, and to export it as an opaque type
by listing the same identifier in the corresponding definition
module.  The  net  effect is that the type is exported whereby
its  associated binding remains hidden (invisible to clients).
In  Oberon, this facility is generalized in the following way:
Let a record type be defined in a certain implementation part,
for example:

Viewer = RECORD width, height: INTEGER; x,y: INTEGER END

     In   the   corresponding   definition   part,  a  partial
definition of the same type may be specified, for example

Viewer = RECORD width, height: INTEGER END

     with the effect that a partial view - a public projection
-  is  visible  to client. In client modules as well as in the
implementation part it is possible to define extensions of the
base type (e.g. TextViewers or GraphViewers).


Type inclusion

     Modern processors feature arithmetic operation on several
number  formats.  It  is  desirable  to have all these formats
reflected in the language as basic types. Oberon features five
of them:

LONGINT, INTEGER, SHORTINT (integer types)
LONGREAL, REAL (real types)

     With  the  proliferation  of basic types, a relaxation of
compatibility  rules  between  them  becomes almost mandatory.
(Note  that  in  Modula the arithmetic types INTEGER, CARDINAL
and  REAL  are  uncompatible). To this end, the notion of type
inclusion  is  introduced: a type T includes a type T', if the
values  of T' are also the values of type T. Oberon postulates
The following hierarchy:

LONGREAL > REAL > LONGINT > INTEGER > SHORTINT

     [Note  that  ">"  should  be  replaced by the appropriate
mathematical sign. Limitation of type-in..]

     The  assigmant  rule  is  relaxed accordinaly: A value of
type  T'  can  be  assigned  to a variable of type T, if T' is
included in T (if T' extends T), i.e. if T > T' or T' => T. In
this  respect,  we  return  to (and extend) the flexibility of
Algol 60. For example, given variables

i: INTEGER; k: LONGINT; x: REAL

     the assigments

k:=i; x:=k; k:=k+1; x := x*10 + i

     are confirming to the rules, where the assigments

i:=k; k:=x

     are  not  acceptable. Finally, it is wort noting that the
various  arithmetic types represents a limited set of subrange
types.

     The multi-dimensional open array and the closure statment
(in  symmetry  to  a  module's  initialization  body)  are the
remaining facilities of Oberon not present in Modula.

Summary

     The   language  Oberon  has  evolved  from  Modula-2  and
incorporates  the  experiences of many years of programming in
Modula. A significant number of features have been eliminated.
They  appear to have contributed more to language and compiler
complexity   than   to   genuine   power  and  flexibility  of
expression.  A  small  number of features have been added, the
most significant one being the concept of type extension.

     The evolution of a new language that is smaller, yet more
powerful  than its ancestor is contrary to common practice and
trends,  but  has  inestimable  advantage.  Apart from simpler
compilers,  it  results  in a concise definition document [5],
and indispensible prerequisite for any tool that must serve in
the construction of sophisticated and reliable systems.

Acknowledgement

     It   is  impossible  to  explicitly  acknowledgement  all
contributions  od  ideas that ultimately simmered down to what
is  now  Oberon.  Most  came from the use or study of existing
languages,  such  as  Module-2, Ada, Smalltalk, C++ and Cedar,
which  often  though  us how not to do it. Of particular value
was  the  contribution of Oberon's first user, J.Gutknext. The
author  is  grateful  for his insistence on the elimination of
deadwoof  and  on  basing  the  remaining  features on a sound
mathematical foundation.

References

1. N.Wirth. Programming in Modula-2. Springer-Verlag, 1982.
2. N.Wirth. Type Extensions. ACM Trans. on Prog. Languages and
   Systems (to appear)
3. G.Birtwistle, et al. Simula Begin. Auervach, 1973.
4. A.Goldberg, D.Robson. Smalltalk-80: The language and its
   implementation. Addison-Weslay, 1983.
5. N.Wirth. The Programming Language Oberon (language
   definition document)