In 96/97, the Pregnana R&D center and the Bull-Inria GIE DYADE collaborated in applying formal methods to the Polykid architecture design process. This work was carried out by the DYADE action VASY (VAlidation of SYstems). The aim of this action is to experiment with and evaluate the use of FM (Formal Methods) in large projects, with a view to transferring the technology into Bull.
In order to assess this experiment, this note reports the results obtained and the status as of mid-October 97. Some conclusions are drawn for inserting FM into future projects.
FM have already been used at Bull in hardware design to prove that the logical behaviour of a chip is equivalent to its physical layout. Within the Polykid project, FM are used at a completely different level: to find issues in the specification of the overall system design (architecture and protocol specification), starting from the earliest phases of the development process.
"FM refer to the use of techniques from formal logic and discrete mathematics in the specification, design and construction of computer systems and software. [...] This is analogous to the role of mathematics in all other engineering disciplines: mathematics provides ways of modeling and predicting the behavior of systems through calculations." (quoted from ). For more information about FM, see the reference documents and the web sites listed below. Let us only point out that we use the FM technique called state enumeration, or model checking, which is based on automata generation (another technique is based on logic provers).
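The state-enumeration idea can be sketched in a few lines: generate every state the model can reach and check a property on each one. The Python sketch below uses a hypothetical two-cache toy system (each cache Invalid, Shared or Modified), chosen only for illustration; it is not the Polykid protocol and not the CADP tooling:

```python
from collections import deque

def check_invariant(initial, successors, invariant):
    """Exhaustive state enumeration ("model checking"): visit every
    reachable state breadth-first and test the invariant on each one.
    Returns (holds_everywhere, number_of_reachable_states)."""
    seen, queue = {initial}, deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return False, len(seen)
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True, len(seen)

# Toy system (NOT the Polykid rules): two caches, states I/S/M.
def successors(state):
    for i in (0, 1):
        if state[1 - i] != "M":      # read: allowed unless peer is Modified
            yield tuple("S" if j == i else state[j] for j in (0, 1))
        # write: this cache becomes Modified, the peer is invalidated
        yield ("M", "I") if i == 0 else ("I", "M")

# Coherence invariant: at most one Modified copy, and never M beside S.
no_paradox = lambda s: s.count("M") <= 1 and not ("M" in s and "S" in s)
holds, n = check_invariant(("I", "I"), successors, no_paradox)
```

On this toy model the invariant holds on all six reachable states; the real difficulty, as the rest of this note shows, is that realistic protocols generate state spaces orders of magnitude larger.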
This note is organized as follows:
The application of FM (formal methods) to the verification of the cache coherence protocol of Polykid (Bull's CC-NUMA multiprocessor project) raised 20 issues throughout the development of the protocol specification (8 of which were behavioral errors) and cost about 10 man-months. Some of these errors are very hard to reproduce through tests and simulations. In the context of such complex systems, FM proved to be a protocol debugging aid and helped improve product quality.
We used the CADP toolbox (provided by INRIA Rhône-Alpes and Verimag) to compile, simulate and verify formal specifications of the Polykid cache coherence protocol. These were small configurations; we still have to work on tackling the formal specification and verification of larger configurations.
Work is also in progress to apply formal-methods techniques to the automatic generation of test cases, using the TGV tool (provided by INRIA-Rennes and Verimag); it will be applied in the Polykid test environment.
The experiment started at the beginning of Polykid specification in February 1996. The goals were:
We can distinguish three phases in the application of formal methods to Polykid, depending on the maturity of the architecture engineering product specification.
| Specification status | Work done | Issues raised |
|---|---|---|
| Specification under construction (cache coherence protocol) | 55 questions raised to Polykid architects; 2 formal specifications (4000 and 2000 lines) | |
| Draft versions of EPS* rev AC | 1.5 months; 20 questions raised to Polykid architects | 7 issues raised |
| EPS* rev AE to AF | 2 months; 10 questions raised to Polykid architects; a formal specification of 2000 lines (2 versions developed); 3 properties verified (on a small configuration) | 13 issues raised |

*EPS = Engineering Product Specification
Phase 1. (February to July 96) The Polykid cache coherence protocol was under construction. In order to build a formal specification, the first task was to interact with the architects and designers to get the information and to keep current with all the changes (a moving target).
Obviously, in this phase, it is difficult to speak of errors because the features were not yet frozen: the main benefit was to clarify undocumented concerns and to point out potential issues. All the more so since the most delicate points (e.g., the collision rules) were not yet specified.
About fifty-five questions were put to the Polykid hardware architects (40 oral and 15 written) over six months. Two formal LOTOS specifications were produced. The first (4000 LOTOS lines) was very detailed (although incomplete) and thus too large to be compiled by the LOTOS compiler. The second (2000 lines) focused on the cache coherence rules (without the collision rules). It was possible to execute this model (user-driven simulation), but the description was still too detailed and not decomposable, so formal verification was not feasible.
Phase 2. (September to December 96) From EPS rev.AB to AC. The cache coherence protocol was now complete in draft versions of EPS rev.AC. About twenty questions were asked of the architects to understand the latest changes and to get additional information. Seven issues were raised, categorized as follows:
Formal specification and verification activity was reduced from October 96 and then completely suspended from January 97 until June 97, in order to develop the RCC software emulator (TSP-system).
Phase 3. (July 97 to Sep 97) From EPS rev.AE to AF. The specification was now mature and fairly stable. In order to build a new formal specification (small and decomposable), a new thorough analysis of the EPS was undertaken, raising about ten questions to the architects. This led to 13 issues being raised, categorized as follows:
The Polykid hardware architects proposed solutions to these errors. Five out of seven errors could occur in a two-module configuration without the processor-cache level. So we built a new 2000-line LOTOS specification describing such a configuration with the corrected rules. Actually, two versions were built:
These models were small enough, and written in a decomposable way, that generating the corresponding automata and performing verification were possible. It was then formally verified that, in this downsized and simplified configuration, three properties hold:
To make sure that the models were accurate enough, we introduced into each version one of the errors found during modeling: the tools detected the errors and produced the corresponding diagnostic sequences.
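This error-seeding step can be illustrated with a small sketch: deliberately re-introduce a known bug and check that exhaustive enumeration produces a shortest diagnostic sequence leading to the incoherent state. The seeded bug below (a write that fails to invalidate the peer cache's copy) and the toy two-cache model are hypothetical illustrations, not the Polykid rules:

```python
from collections import deque

def find_violation(initial, transitions, invariant):
    """Breadth-first search; on the first state violating the invariant,
    rebuild the shortest path back to the initial state (the diagnostic
    sequence). Returns None when the invariant holds everywhere."""
    parent, queue = {initial: None}, deque([initial])
    while queue:
        s = queue.popleft()
        if not invariant(s):
            trace = []
            while s is not None:
                trace.append(s)
                s = parent[s]
            return list(reversed(trace))
        for t in transitions(s):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return None

def buggy_transitions(state):  # two I/S/M caches, illustrative rules
    for i in (0, 1):
        if state[1 - i] != "M":              # read: copy becomes Shared
            yield tuple("S" if j == i else state[j] for j in (0, 1))
        bad = list(state)
        bad[i] = "M"                         # seeded bug: write to Modified
        yield tuple(bad)                     # ...WITHOUT invalidating peer

coherent = lambda s: s.count("M") <= 1 and not ("M" in s and "S" in s)
diag = find_violation(("I", "I"), buggy_transitions, coherent)
```

On this seeded model the search returns a short sequence ending in a state where one cache holds a Shared copy while the other holds a Modified one, which is exactly the kind of diagnostic sequence the verification tools produced.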
This result increases the confidence in the corrected version of the design, but it should be stressed that it is in no way a formal proof of total correctness because:
Categorization of issues. We only considered issues that were unclear to a reader familiar with the protocol; obvious omissions and typing errors were not taken into account:
Learning and reuse factor. As the formal specification progressed, we improved our performance, having gained experience in:
As an example of such constraints: the LOTOS compiler performs better with fewer processes and fewer gates (a LOTOS event gate is an interaction point) and more functions, whereas the decomposition tools rely on the structuring into processes.
Some parts of the first-phase specification were reused in the third phase. We also took advantage of a previous experiment in 1995, in which we applied FM to the PowerScale architecture and so became familiar with the PowerPC architecture.
The features of the formal specifications were as follows:
| Model features | LOTOS spec. features | Automata generation |
|---|---|---|
| 2 processors per module with caches; remote cache level; 12 memory addresses | 52 event gates | Too large to be compilable |
| 3 modules (one home); 2 processors per module with caches; remote cache level; 1 memory address | IN_Q and OUT_Q collapsed; 11 event gates | Simulation possible, but the automaton could not be generated |
| 2 modules (one home); 1 cacheless processor per module; remote (node) cache level; 2 memory addresses (colliding in the module cache) | Distinct OUT_Q and IN_Q (1 entry each); 11 event gates | With 4 bus operations: 59 379 states, 216 539 transitions, generated in 45 mn (on an Ultra-Sparc 144 MHz with 256 MB); automaton size with 9 bus operations: |
To ascertain whether the money spent on applying formal methods to Polykid was well invested, we would have to compare the total cost (over the whole life-cycle) of the product without the use of FM and the total cost when FM are used. Equivalently, we have to compare:
To estimate the benefits, we need to measure:
However, these data are not available because:
Another rigorous way of performing a quantitative evaluation of FM benefits is to collect data from projects using FM and compare them with data collected from projects which do not use FM (the projects should deal with the same application domain and be staffed with equally skilled teams).
Since quantitative data are lacking, we may draw the following qualitative evaluation, showing the additional benefits of FM with respect to other activities:
Finally, the use of FM is a strong commitment to product quality: it has the potential to improve overall system quality by clarifying specifications, to find issues in the early phases of product development (thus reducing the risk of delays), and to increase confidence in the design.
VASY is also investigating the use of FM in test generation. This consists in generating test cases from a formal specification and formal test purposes; the generated test cases are then used to check that the implementation conforms to the formal specification.
The benefit of this approach is that test verifiers need not be written by hand: the automatically produced test case sends inputs to the implementation and verifies its outputs. This approach has already been applied successfully to telecom software (at INRIA-Rennes) using the TGV tool. The challenge for VASY is to use this tool to automatically generate test suites for the Polykid SIM1 test environment.
Because of staffing problems, this activity has been delayed. Work is in progress in three directions:
Nevertheless, the TGV tool has already been adapted to accept LOTOS formal specifications (this was not the case at the beginning of the action), and some preliminary experiments have been done on the first compilable LOTOS specification of Polykid to produce abstract test cases.
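The principle of deriving a test case from a specification automaton and a test purpose can be sketched as follows. The automaton, the action names and the plain breadth-first search below are illustrative assumptions, not TGV's actual algorithm (which works on the product of the specification and the test purpose):

```python
from collections import deque

# Hypothetical specification automaton: state -> [(action, next_state)].
# "?" marks inputs sent to the implementation, "!" its expected outputs.
SPEC = {
    "idle":    [("?read_req", "pending")],
    "pending": [("!data_rsp", "idle"), ("!retry", "idle")],
}

def generate_test(spec, start, target_action):
    """Search the specification for a shortest run reaching the action
    named by the test purpose; the run's labels form the test case."""
    queue, visited = deque([(start, [])]), {start}
    while queue:
        state, path = queue.popleft()
        for action, nxt in spec[state]:
            if action == target_action:
                return path + [action]   # stimuli + expected observations
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [action]))
    return None   # the purpose is unreachable in the specification

# Test purpose: "observe a data response after a read request".
test_case = generate_test(SPEC, "idle", "!data_rsp")
```

The resulting label sequence plays the role of the test case: the tester sends the "?" actions to the implementation and checks that the "!" actions are actually observed.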
In order to insert the use of FM in the standard methodology, the following points stem from this experiment:
The formal specification language used is LOTOS, an ISO standard. It allows algebraic type definitions and the structuring of systems into parallel processes (with rendez-vous synchronizations). The tools provided by INRIA are CADP and TGV (currently only available on Solaris systems):
CADP (Caesar Aldebaran Development Package): a toolbox developed by INRIA Rhône-Alpes and Verimag. The functions used in the Polykid project are:
In 1997, the graphical user interface was enhanced and the LOTOS compiler was optimized to run faster; new compositional verification tools were added to the toolbox. Work is in progress to improve the automata generation algorithm so that it generates fewer redundant states.
TGV (Test Generation with Verification technology). This tool is developed by IRISA (INRIA Rennes) and Verimag. From a formal LOTOS specification of Polykid and a formal test purpose (in an automaton format), TGV produces a test case as an automaton representing test sequences. During the last year, the algorithms and interfaces of TGV were improved to take into account the specific features of the LOTOS language and to fit as closely as possible the practices of hardware testing. Work is in progress to develop a tool which transforms this test case into a program to be used in the SIM1 Polykid test environment: this program will send inputs to the implementation, receive its outputs, and verify them with respect to the test purpose.
It seems that our most serious competitor in formal specification and verification of hardware protocols, and especially cache coherence protocols, is the Hardware Verification Group of Stanford University. They have designed their own formal specification language and a formal verifier called Murphi (the "phi" is the Greek letter).
This tool has been used by Sun within two projects:
We have more details about the applications done by the Stanford team, since these are included in their tool distribution. Due to several state reduction techniques implemented in Murphi, they are able to handle fairly detailed models. The interesting case studies are:
However, with Murphi it is not possible to run a user-driven graphical simulation (useful for debugging the specification), nor to verify liveness properties (such as: a request will eventually get an answer); only safety properties (such as: no coherence paradox) can be checked.
In the Polykid experiment, we obtained a good result by modeling two addresses in our system that map to the same slot of the memory directory and of the module (or remote) cache: this allowed us to take into account the conflicts that were at the origin of the trickiest errors. All the experiments above modeled only one address, which amounts to assuming a fully mapped directory and a remote cache per node as big as the whole memory. But we definitely need to improve our performance and handle larger models, which are likely to contain problems, by working in two directions: