BULL | Jack Abily Sylvie Lesmanne (project leader) Solofo Ramangalahy Yehong Xing |
INRIA / VASY | Damien Bergamini (software engineer) Hubert Garavel (research director) Radu Mateescu (research scientist) |
Since 1995, there has been a long-standing collaboration of VASY and BULL, to which the former PAMPA team of INRIA Rennes participated until December 2000. This collaboration aims at demonstrating that the formal methods and tools developed at INRIA for validating and testing telecommunication protocols can also be successfully applied to BULL's multiprocessor architectures. The long-term objective is to develop a complete and integrated solution supporting formal specification, simulation, rapid prototyping, verification, test generation, and test execution.
A first phase of this collaboration took place from 1995 to 1998 in the framework of the DYADE joint venture between BULL and INRIA. Two case studies were successfully tackled: the POWERSCALE bus arbitration protocol and the POLYKID multiprocessor architecture. The feasibility of the proposed approach was established and BULL expressed its interest in pursuing the collaboration for its new architectures.
Since October 1998, we have been working on FAME, the CC-NUMA multiprocessor architecture developed by BULL for its NOVASCALE series of high-performance servers based on INTEL ITANIUM 64 bits processors. Initially informal, this collaboration was officialized in 1999 as a DYADE action named FORMALFAME, which lasted until the end of DYADE in March 2001. Since then, the collaboration is going on under the form of a BULL-INRIA contract, for which we kept the name FORMALFAME. This contract gathers VASY team and the BULL team of Sylvie Lesmanne (who succeeded Anne Kaszynski in March 2002) in Les Clayes-sous-Bois, the coordination being ensured by a BULL engineer (M. Zendri, then N. Zuanon, then S. Ramangalahy) located at INRIA Rhône-Alpes.
FORMALFAME successively focused on several critical components of the FAME architecture: the CCS circuit that manages communications for a group of four processors and the NCS circuit that manages network communications (from October 1998 to November 1999), the B-SPS circuit that implements the cache coherency protocol (from December 1999 to March 2002), and, since then, the PRR block and the ILU unit, which are two sub-components of the B-SPS circuit to which a particular attention is drawn. For each of these components, LOTOS descriptions were written, which provided a formal basis for testing and verification.
In 2003, the results obtained by FORMALFAME concern, on the one hand, the validation and testing of PRR and ILU and, on the other hand, the improvement of various CADP tools.
Since October 1998, we have been co-operating with BULL in the framework of the FORMALFAME contract devoted to the use of formal methods for verifying and testing multiprocessor architectures. Our work targets at FAME, the CC-NUMA architecture developed by BULL for its NOVASCALE series of high-performance servers based on 64 bits INTEL ITANIUM processors. The core of FAME is a complex circuit (named B-SPS), which implements the FAME cache coherency protocol and performs routing between processors and input/output nodes. The complexity of this circuit comes both from the internal parallelism inherent to routing and from multiple accesses to shared data, the consistency of which must be preserved.
After studying the B-SPS as a whole (1999-2002), FORMALFAME currently focuses on two complex sub-components of B-SPS: the PRR (Pending Request Responses) block and the ILU (InterLeaving) unit, which are modeled and analyzed in great detail. In 2003, work took place in two directions:
The development of LOTOS specifications for PRR (about 7,500 lines of LOTOS and 200 lines of C code) and ILU (about 8,900 lines of LOTOS and 3,400 lines of C code) was done at BULL Les Clayes-sous-Bois by Yehong Xing and Jack Abily. Two successive versions V1 and V2 were described, the former corresponding to a chip sent to foundry in March 2003 and the latter to the forthcoming evolution of this chip. The role of VASY was essentially to provide expertise in writing LOTOS specifications (for instance, on the proper way of introducing random number generation in formal models, on the decision to implement a given feature either as a LOTOS process or as a LOTOS data operation, etc.).
A long-standing goal of FORMALFAME is to reuse the LOTOS specification developed for a hardware system (or unit, or block, respectively) as a means to check the correctness of the VERILOG design implementing this system (or unit, or block). So doing, the effort invested in formal specification is rewarded not only by finding mistakes at a high level of abstraction, but also in the actual system implementation itself.
In 2003, we experimented two different approaches to validate the VERILOG designs of PRR and ILU.
As regards PRR, the conformance between the LOTOS specification and the VERILOG design is checked by means of random test sequences generated automatically from the LOTOS specification (using the EXECUTOR tool of CADP). This approach is justified by the deterministic nature of PRR, which always emits the same outputs when given the same inputs. The generated test sequences contain both input and output events; they are applied to the VERILOG design of PRR using CADENCE's TESTBUILDER tool . This approach proved to be successful as large test sequences (more than 100,000 transactions) were produced and allowed to find a problem in PRR V1 (technically, a non-conformity between the LOTOS specification and the VERILOG design). This problem was not detected by other techniques because it has no impact on global cache coherence/memory consistency.
As regards ILU, the conformance between the LOTOS specification and the VERILOG design is checked by translating the LOTOS specification into a C program that is connected to the VERILOG design of ILU using the EXEC/CAESAR and TESTBUILDER tools. This C program drives the execution of the VERILOG design, sending inputs and checking the correctness of received outputs. The inputs are generated randomly using an history-based guidance mechanism, which gives priority to branches that have not been chosen already. The approach targets at the extensive testing of the VERILOG design by letting this guided simulation execute automatically for a certain amount of time until a suitable coverage level is reached. For ILU, the coverage criterion used in 2003 is the percentage of transitions (in the Petri net generated by CAESAR from the LOTOS specification of ILU) fired during the guided simulation. This approach proved to be technically challenging due to the intrinsic complexity of the ILU behavior and to the programming machinery needed to interface the ILU design with its formal specification. For the approach to be effective, two main issues had to be solved: The simulation speed was initially too slow (about 360 visible transitions per second) and the coverage was originally too small (15% of transitions fired). These issues were addressed in several ways:
LOTOS specification of ILU was modified to define various hardware data structures (e.g., associative search tables, directories, etc.) using data types and functions rather than LOTOS processes.
The efficiency of the C code generated by CAESAR was significantly improved (see the VASY 2004 Activity Report for details).
The firing of tau-transitions evolved from fixed priority order to random order.
These changes led to a satisfactory solution (speed multiplied by a factor of 3 and coverage increased up to nearly 100%). The FORMALFAME approach is now fully operational and used intensively for the validation of PRR V2 and ILU V2. The next research steps of FORMALFAME investigate a different coverage criterion: the percentage of reachable, visible labels produced during the guided simulation (thus measuring coverage for both LOTOS data types and gate offers).