TRAIAN 3.0 released on February 29, 2020

The CONVECS team is pleased to announce that version 3.0 of the TRAIAN compiler for LOTOS NT is available.

Major Enhancements

During the last two years, TRAIAN has been entirely rewritten by Hubert Garavel, Frederic Lang, and Wendelin Serwe.

The main limitation of TRAIAN was that it was a 20-year- old compiler, which was increasingly difficult to maintain. TRAIAN was built using the FNC-2 compiler generation system, which was no longer supported. For this reason, TRAIAN only existed in 32-bit versions, and sometimes hit the 4 GB RAM limit when processing large compiler specifications, such as those of LNT2LOTOS or EVALUATOR 5.

Therefore, a major version TRAIAN 3.0 was developed, with three main design decisions: (i) getting rid of FNC-2; (ii) supporting (most of) the LOTOS NT features accepted by TRAIAN 2.9, also introducing some features belonging to LNT, to prepare a future convergence between LOTOS NT and LNT; (iii) writing TRAIAN 3.0 in LOTOS NT itself, so that the previous version 2.9 of TRAIAN can be used to build the next one (i.e., compiler bootstrapping).

The internal architecture of TRAIAN 3.0 is totally different from that of TRAIAN 2.9, whereas both compilers nearly have the same size of source code: 31,614 lines of code for TRAIAN 3.0 (72% of LNT code, 11% of SYNTAX code, and 17% of C code) vs 33,210 lines of code for TRAIAN 2.9 (85% of FNC2 code, 7% of SYNTAX code, and 8% of C code).

The lexer and parser of TRAIAN 3.0 have been developed using the SYNTAX compiler-generation system developed at INRIA Paris. This work triggered various enhancements of the programming interfaces of SYNTAX.

The abstract syntax tree of LOTOS NT, the library of predefined LOTOS NT types and functions, the static semantics checking (identifier binding, type checking, dataflow analysis, etc.), and the C code generation have been redesigned: previously specified as FNC-2 attribute grammars, in the functional style, they have been written in LOTOS NT, in the imperative style.

Test bases gathering thousands of correct and incorrect LOTOS NT programs (920,000 lines of LOTOS NT code in total) have been set up and systematically used for non-regression testing.

Compared to TRAIAN 2.9, TRAIAN 3.0 brings major enhancements:

Compiler performance greatly improved. When compiling two large programs (LNT2LOTOS and TRAIAN 3.0 itself), TRAIAN 3.0 is 6-7 times faster and uses 120-150 times less memory (e.g., 25.7 MB instead of 3.9 GB, measured on a Dell server running OpenIndiana in 32-bit mode, both versions of TRAIAN being invoked with the "-noindent" option).

Native 64-bit binaries of TRAIAN 3.0 are now available for Linux, macOS, OpenIndiana, and Solaris 11. Together with the reduction of memory consumption, this fully solves the former 4 GB memory limitation that existed when running TRAIAN 2.9 in 64-bit compatibility mode.

Error and warning messages produced by TRAIAN 3.0 are clearer and more concise.

Concerning its input, TRAIAN 3.0 remains largely compatible with TRAIAN 2.9, apart from a few changes detailed below.

Concerning its outputs, TRAIAN 3.0 takes great care to generate exactly the same C files as TRAIAN 2.9, except when changes are necessary to fix bugs and, sometimes, in presence of recursive data structures, for which TRAIAN 3.0 may generate the same C code as TRAIAN 2.9 but in a different order.

The TRAIAN manual page, the LOTOS NT user manual, the mode files for the various editors, and the demo examples contained in the TRAIAN distribution have been updated to reflect the changes brought to the language and its compiler.

Compiler Changes

TRAIAN 2.9 would accept files with either ".lotnt" or ".lnt" extensions. From now on, the ".lotnt" extension is no longer supported, and TRAIAN 3.0 only accepts the ".lnt" extension, meaning that users may have to rename their ".lotnt" files.

The two compiler options "-analysis" and "-log" that existed in TRAIAN 2.9 have been removed from TRAIAN 3.0. A new option "-depend" was added, which displays dependencies between LOTOS NT files and can be useful to generate Makefiles automatically.

In TRAIAN 2.9, the definitions of predefined types (Bool, Nat, Int, Real, Char, and String) and all their associated functions were built-in in the compiler. In TRAIAN 3.0, all these definitions are contained in a library file named "lib/lotosnt_predefined.lnt", which users may wish to modify or replace with a custom library file. This will also ease future evolutions of LOTOS NT predefined types and functions.

Language Changes

In TRAIAN 2.9, the lower and upper cases were considered to be equivalent, both in keywords, pragmas, and user-defined identifiers.
In TRAIAN 3.0, all keywords must be written in lower case (with an exception for six infix operators "and", "andthen", "implies", "or", "orelse", and "xor", which can also be written entirely in upper case, for backward compatibility reasons).
It becomes therefore possible for users to define identifiers such as "TYPE", "FUNCTION", etc.
Users may have to update their LOTOS NT programs to rename a few keywords written in upper case; in practice, this mostly concerns "NULL" and "ANY", which must now be written "null" and "any".
Similarly, the pragmas ("!implementedby", "!comparedby", "!pointer", "!external", etc.) must now be written in lower case exclusively.
For identifiers, the lower and upper cases are still considered to be equivalent in TRAIAN 3.0, but users are warned not to rely on such equivalence, which is likely to be removed in a future version of the compiler.

In TRAIAN 2.9, special function identifiers could be formed using the following set of characters % & * + - . / > = ≤ @ \ = ^ ~ { }.
In TRAIAN 3.0, this set was restricted by removing the characters ".", "{", and "}", which are no longer allowed in special identifiers, removing several conflicts in the grammar. This also allows spaces between "." and "{" in the field update operator, whereas such spaces were forbidden by TRAIAN 2.9. Users may have to update their LOTOS NT programs that use these characters in special identifiers.

The syntactic grammar of TRAIAN 2.9 was too laxist and allowed bizarre code fragments without clear semantics, such as the following ones:
```
   function F (, B:Bool) is ...

   function F (B:Bool) raises [ ] is ...

   function F (B:Bool) raises [ , X:none, Y:none]

   var in ... end var  -- no variable declaration

   V.{V1, ..., Vn} -- meaningless field update, without "=>"

   function F is end function -- no function body
```
TRAIAN 3.0 has a much stricter parser, which forbids all these questionable code fragments. In the latter example, an empty function must either have a "null" body or an "!external" pragma.

As of TRAIAN 3.0, the "enum" and "record" types, which were marked as deprecated but still accepted by TRAIAN 2.9, are no longer accepted by the compiler. Thus, "enum" and "record" are no longer reserved keywords of LOTOS NT.

As of TRAIAN 3.0, "false", "true", and "not" are no longer reserved keywords of LOTOS NT. These are ordinary functions, the definitions of which can be found in the library file "lotosnt_prefefined.lnt". It becomes therefore possible to define overloaded functions named "false", "true", and "not" for other types than Bool.

As of TRAIAN 3.0, "iff" is no longer a reserved keyword of LOTOS NT. The Boolean equivalence operator is now noted "<=>". Users may have to update their LOTOS NT programs by replacing "iff" with "<=>".

For each LOTOS NT type, TRAIAN 2.9 would automatically define two identical functions noted "!=" and "/=" for implementing the disequality relation. TRAIAN 3.0 only generates the "!=" function, meaning that users may have to update their LOTOS NT programs by replacing "/=" with "!=".

As of TRAIAN 3.0, the unary minus operator is not anymore a built-in operator, but a function defined in the standard library. Therefore, if V is a value expression, the non-parenthesized form "-V" is not anymore accepted, and must be replaced by "-(V)", meaning that "-" is a unary function invoked in prefix form. Users may have to update their LOTOS NT programs accordingly to get them accepted by TRAIAN 3.0.

As of TRAIAN 3.0, the syntax of the "var" instruction has been aligned to that of LNT and simplified. It is no longer possible to declare and initialize variables simultaneously; initialization must be done using standard assignments. Users may have to update their LOTOS NT programs by replacing
```
   var X : T := E in ... end var
```
with
```
   var X : T in X := E ; ... end var
```

The syntax of the "trap" instruction has been simplified by removing the "exit" handler, which was never used in practice. Consequently, "exit" is no longer a reserved keyword of LOTOS NT.

In TRAIAN 2.9, exceptions could have zero or one parameter. Because it should have been a list of parameters rather than a single parameter, and because this parameter was not used actually, the LOTOS NT language was (temporarily) simplified by having zero parameters only. Thus, the language definition was modified at three places:
- It is no longer possible to declare a function with exceptions of arbitrary types, e.g.:
```
   function F (...) : ... raises [toolow:nat, zdiv:bool]
```
  From now on, all exceptions have to be declared with the special type "none", which means "no parameter".
- In the exception handlers of "trap" instructions, it is no longer allowed to specify a type T when declaring an exception, meaning that the syntax:
```
     trap exception X[:T] is ... in ...
```
  becomes
```
     trap exception X is ... in ...
```
- Finally, the "raise" instruction no longer accepts an optional parameter, e.g.
```
     raise X (1.23)
```
  Only the syntax without parameters is now supported:
```
     raise X
```
  Notice that TRAIAN 2.9 did not detect the situation where an exception declared with a parameter was raised without parameter, and generated incorrect C code in such case.

As of TRAIAN 3.0, the "break" instruction no longer accepts a parameter after the loop identifier, a feature that was never used. This solves a bug of TRAIAN 2.9, which accepted instructions such as
```
   break L (1.23)
```
or
```
   break (false)
but did not generate C code for this parameter.
```

In TRAIAN 2.x, "break" and "raise" were synonymous and could be freely substituted one by the other. One could thus use "break E" to raise the exception E, or "raise L" to break the loop L.
TRAIAN 3.0 now distinguishes between "raise" and "break" instructions. Exception names and loop names now belong to two different name spaces, making it impossible to raise an exception using "break" or to break a loop using "raise". For instance, the following code fragment was accepted by TRAIAN 2.9 but is now rejected by TRAIAN 3.0.
```
   function F() raises [X : none] is
      break X
   end function
```
Unfortunately, a few code fragments are valid for both TRAIAN 2.9 and TRAIAN 3.0, but yield different results:
```
   function F (X : Nat) : Nat is
      trap
         exception E is return 1
      in
         loop E in
            if X == 0 then
               raise E
            end if
         end loop;
         return 0
      end trap
   end function
```
Indeed, the call F (0) returned 0 with TRAIAN 2.9 (since "raise E" breaks loop E) but now returns 1 with TRAIAN 3.0 (since "raise E" raises exception E).

With TRAIAN 3.0, the syntax of LOTOS NT patterns has been extended by allowing patterns to contain infix operators and parentheses.

A new pragma "!version" has been added to LOTOS NT. Each occurrence of this pragma is followed by a character string, which is inserted in the C code generated by TRAIAN 3.0, and finally in the executable files produced by the C compiler. If the string starts with "@(#)", it will be displayed by the Unix command "what", thus allowing to know from which source LOTOS NT files a given executable has been generated.

TRAIAN 3.0 now requires that the C identifiers specified by the "!implemented" ("!comparedby", "!printedby", etc.) pragmas attached to types and functions are pairwise distinct. TRAIAN 2.9 did not implement such checks, allowing, e.g., two types or two functions to have "!implemented N" pragmas with the same identifier N.

TRAIAN 3.0 now forbids that the C identifiers specified by the "!implemented" ("!comparedby", "!printedby", etc.) pragmas attached to types and functions either start with the "TRAIAN_" prefix, or are reserved C keywords. Such checks were not performed by TRAIAN 2.9, which tolerated, e.g., definitions such as:
```
   type T is !implementedby "else" ... end type
```

Bug Fixes

TRAIAN 3.0 now ensures that any type or function has at most one pragma of each kind ("!implementedby" ("!comparedby", "!printedby", "!external", etc.). Such checks were not performed by TRAIAN 2.9, which tolerated, e.g., ambiguous definitions such as:
```
   !implementedby S1 !comparedby S2 !comparedby S3
```

With TRAIAN 2.9, a type or function could be declared as "!external" without "!implementedby" pragma. In such case, the compiler would generate unique C names that could hardly be guessed by the user (e.g., "TRAIAN_ADT_FUN_TEST_F_250" for an external LOTOS NT function named "F"). TRAIAN 3.0 addresses this issue by generating a warning in such case.

TRAIAN 2.9 allowed to attach "!external" pragmas to the constructors of a type. The compiler did not strictly verify these pragmas, but applied informal rules:
- Given a type T declared "!external" and a constructor C of type T not declared "!external", the pragma "!external" was silently added to C.
- Given a type T not declared "!external" and a constructor C of type T declared "!external", the pragma "!external" of C was silently deleted.
With TRAIAN 3.0, the LOTOS NT grammar now forbids attaching pragma "!external" to constructors. Therefore, all the constructors of a type T will be external iff T is declared "!external". Users may have to update their LOTOS NT source files to remove "!external" pragmas for constructors.

TRAIAN 2.9 would sometimes crash when analyzing an incorrect LOTOS NT program. This was due to a conflict between the SYNTAX parser generator, which implements automatic recovery from syntax errors, and the FNC2 compiler generator, which did not properly handle all the missing code fragments inserted by SYNTAX. This problem no longer exists with TRAIAN 3.0, which stops if lexical or syntactic errors have been detected.

TRAIAN 3.0 now accepts successive closing braces, as in, e.g. "V.{X => Y.{Z => 1}}". Previously, TRAIAN 2.9 would require at least one a space to be inserted between the two closing braces.

TRAIAN 2.9 did not always detect when a module contained two definitions of the same type identifier. This is now strictly forbidden by TRAIAN 3.0.

TRAIAN 2.9 did not always detect the situation where two constructors C1 and C2 of the same type had each an argument named with the same name, e.g., X, but with two distinct types for argument X of C1 and argument X of C2. In such case, TRAIAN 2.9 could generate incorrect C code with subtle typing errors that could remain undetected in large programs. TRAIAN 3.0 now systematically checks that all the arguments of the constructor of a given type have the same type if they have the same name.

TRAIAN 2.9 did not always detect when a module contained two definitions of the same function identifier with identical argument and result types. This is now strictly forbidden by TRAIAN 3.0.

TRAIAN 2.9 sometimes rejected correct LOTOS NT programs that contained functions called with a parameter list in the named style if the parameter list was in a different order than the one given when the function was declared. Such programs could be rejected with an incomprehensible type error message or without explanation at all. This program was solved in TRAIAN 3.0.

TRAIAN 2.9 would sometimes reject correct LOTOS NT programs containing non-ambiguous calls to overloaded functions, emitting an obscure error message ("Error: type checking"). This problem was solved in TRAIAN 3.0.

TRAIAN 2.9 would reject code fragments such as
```
   trap exception EZ1 is return 1 in ...
```
with an obscure error message
```
   Unexpected natural or integer constant: expression of incorrect type
```
One had to write
```
   trap exception EZ1 is return 1 of Nat in ...
```
to get the program accepted. This problem was solved in TRAIAN 3.0, which properly handles function overloading.

TRAIAN 2.9 would accept a function F without return type to be called in an assignment instruction, e.g. "X := F (...)", and would generate incorrect C code. This problem was solved in TRAIAN 3.0, which now rejects such incorrect LOTOS NT programs with a clear error message.

TRAIAN 2.9 would sometimes reject correct LOTOS NT programs containing "loop/break" instructions, in particular the instruction "loop L in break L end loop", which was rejected with an incomprehensible type error message. This problem was solved in TRAIAN 3.0, which now accepts such programs.

When compiling "case" instructions, TRAIAN 2.9 would create C variables named "TRAIAN_ADT_LOCAL_V<number>"; the names of these variables could collide with other C variables generated by TRAIAN 2.9 for LOTOS NT variables named "V<number>". This problem was solved in TRAIAN 3.0, which now generates distinct names for these two classes of C variables.

TRAIAN 2.9 could generate incorrect C code when an "out" or "inout" function parameter was assigned a value by pattern-matching in a "case" instruction. In the C code produced by TRAIAN 2.9, a pointer indirection was missing, triggering warnings from the C compiler, e.g.:
```
   improper pointer/integer combination: op "=" assignment makes pointer from integer without a cast
```
and possibly causing crashes at run-time. This problem was solved in TRAIAN 3.0.

When TRAIAN 2.9 detected a function call in which some actual "out" and/or "in out" parameters were identical, it issued an appropriate error message but then proceeded to the code generation phase without exiting, as if the source code was correct. This problem was solved in TRAIAN 3.0.

When an input LNT program contained redundant "raises" clauses, as in "(X - Y) raises [E1] raises [E2]", TRAIAN 2.9 issued a warning during syntax analysis; the line and column number printed in the warning message were often incorrect (e.g., huge numbers far beyond the number of lines of the source program). This was solved in TRAIAN 3.0, which now displays correct line numbers, and issues a fatal error to avoid any ambiguity on the exception actually raised (namely, E1 or E2).

TRAIAN 2.9 dataflow analysis was not 100% exhaustive. For instance, it did not detect a few cases where a LOTOS NT function declared to compute a value would terminate without properly returning a result, or where a LOTOS NT function declared to compute an "out" parameter would terminate without properly assigning that parameter. It could also accept certain "trap" instructions, in which variables are not initialized before used. Such issues were solved in TRAIAN 3.0.

When accessing or updating a field X of a type T having more than one constructor, TRAIAN 2.9 would systematically assume that an exception can be raised (in case some constructor of T has no field named X), and would issue a fatal error if such exception was not specified by a "raises" clause.
TRAIAN 3.0 implements a finer analysis and does not assume that an exception can be raised if all the constructors of T have a field X. In such case, TRAIAN 3.0 considers that the exception cannot occur and even raises a warning if a superfluous "raises" clause is present.

Future Work

TRAIAN 3.0 will continue to evolve, our goal being to merge LOTOS NT and LNT in one single language.

Download

TRAIAN 3.0 can be freely downloaded from the TRAIAN Web Page located at:

http://vasy.inria.fr/traian