Oppolzer - Informatik / Stanford Pascal Compiler

Home Lebenslauf Schwerpunkte Kenntnisse Seminare Kunden Projekte Produkte Blog Stanford Pascal Kontakt

The Stanford Pascal Compiler / Evolution Steps

Porting Stanford Pascal to Windows, OS/2 and Linux - other issues

When I tested a more complex program, I discovered another portability issue. Case statements are implemented by branch tables, that is, tables of branch instructions that are indexed by the value of the case variable, but: if the case variable is of char type, the branch table is constructed based on the EBCDIC character set.

It took me some time to think about several possible solutions for this problem. The problem exists only with char based branch tables; integer subtype or ordinal based branch tables work correctly. But I wanted the same solution for all kinds of branch tables.

This example shows a case statement and its implementation using the XJP instruction before the change:

case T of 'A' : WRITE ( ' SONNTAG,' ) ; 'B' : WRITE ( ' MONTAG,' ) ; 'D' : WRITE ( ' MITTWOCH,' ) ; 'F' : WRITE ( ' FREITAG,' ) ; 'G' : WRITE ( ' SAMSTAG,' ) ; end (* case *) ; /***************************************************/ /* generated P-Code */ /***************************************************/ XJP L3 ... L3 DEF 193 L4 DEF 199 L5 LAB UJP L8 UJP L9 UJP L6 UJP L11 UJP L6 UJP L13 UJP L14 L6 LAB

Each XJP involves four labels, starting with the number specified with XJP (in the example L3, L4, L5 and L6). The first is the lowest case value, the second is the highest, the third is the begin of the branch table, and the forth is the default label. The default label is the target of XJP, if the case value is outside the defined range (that is, lower than the lowest or higher than the highest value), but it is also used to fill the branch table with UJP branches for values inside the range which are not used in the case statement.

The first change I made was to add a type identifier to the DEF instruction, which was not present before. See the numbers in the example, which were used in place of the character constants. The new type identifiers can be I (integer) or C (char).

The second change made the branch table portable, because each entry now includes a DEF constant and a branch target. This means, that XJP is redefined in such a way that it scans the DEF constants in the branch table (which may be of type char) and branches to the address, if it finds a match. It branches to the default address, if it finds no match. The branch table may be "incomplete", because unused entries can be omitted. So for case statements with big "holes", the generated P-Code may be smaller.

The second pass of the compiler, which generates machine code out of the P-Code, may expand the portable P-Code to the old representation of "true" branch tables, so that there is no performance degradation. (In fact, this is what PASCAL2.PAS for IBM Mainframe does today). But this is left to the second pass, which is non-portable, anyway. The P-Code representation is meant to be portable.

See the example above again in the new "portable branch-table" version:

case T of 'A' : WRITE ( ' SONNTAG,' ) ; 'B' : WRITE ( ' MONTAG,' ) ; 'D' : WRITE ( ' MITTWOCH,' ) ; 'F' : WRITE ( ' FREITAG,' ) ; 'G' : WRITE ( ' SAMSTAG,' ) ; end (* case *) ; /***************************************************/ /* generated P-Code */ /***************************************************/ XJP L3 ... L3 DEF C,'A' L4 DEF C,'G' L5 LAB DEF C,'A' UJP L8 DEF C,'B' UJP L9 DEF C,'D' UJP L11 DEF C,'F' UJP L13 DEF C,'G' UJP L14 L6 LAB

This looks good at first sight, but there is still a portability problem:

there is no guarantee, that the constants which the compiler on platform X puts in the lowest and highest positions (L3 and L4 in the example) will remain the lowest and highest when migrating to platform Y.

In fact, the only solution will be, to completely get rid of the min and max values and the two labels, and to compute the minimum and maximum constant out of the DEF instructions contained in the "portable branch table". This will be done in a future step; the P-Code interpreter on Windows etc. can ignore the min and max values from the start, and they can be removed from the mainframe compiler, later.

1. Note: P-Code interpreters and code generators should not expect that the DEF constants are sorted with respect to the character set of their target machine; if the P-Code file comes from a foreign machine, the DEF constants will not be sorted in the general case.

2. Note: PASCAL2.PAS for the IBM mainframe (at the moment) uses the min and max values to compute the size of the branch table beforehand; this is not correct for P-Code files which come from other platforms. The min and max values are not portable. Furthermore, it is not possible to compute the needed size at a foreign platform, because it depends on the character set of the target platform. The only correct way is this: scan the DEF constants when on the target platform; compute the min and max values from this scan and determine the needed size of the branch table from their difference.

3. Note: I believe that at this moment there is no sound solution for branch tables that grow very large, but with this extension (portable branch tables), I could at least imagine how such solutions could look.

Back to Compiler main page