Oppolzer - Informatik / Stanford Pascal Compiler



The Stanford Pascal Compiler / Evolution Steps


My own Pascal history

Pascal was the first programming language I learned, in 1977 at the University of Stuttgart. ASSEMBLER, FORTRAN, RPG, BASIC (Microsoft) and COBOL followed within the first few years. But Pascal undoubtedly shaped me and my fellow students strongly, and so did the programming environment on the Telefunken TR 440 machine, which was incredibly comfortable for its time. Again and again we tried to find a similar environment on the other machines we were confronted with afterwards, mostly without success.

In 1982, through my work at the Stuttgart tramway company (Stuttgarter Straßenbahn, SSB AG), I came into contact with VM/CMS, whose programming environment already came very close to the comfort of the Telefunken system. Unfortunately, only FORTRAN was available there. When I returned to the university in 1984 to write my student research project and my diploma thesis at the Institut für Eisenbahn- und Verkehrswesen (institute for railway and transportation engineering), I was at first supposed to work on a Control Data machine. But they also had access to an IBM 3083 with VM/CMS and Pascal. I pushed through that I was allowed to do the work there, and with that I had found my dream environment. Later I procured the Pascal compiler (IBM Pascal VS) for the tramway company as well. By the way: following my example, the institute (Prof. Heimerl) afterwards moved all its activities to the IBM machine, for some time.

Later (around 1992) I was forced to give up my passion for Pascal, and from then on I programmed practically only in C - and in other languages like PL/1 and ASSEMBLER, of course.

In 2011 I discovered by chance, on an emulated IBM system called MUSIC/SP, a working Pascal compiler from 1982; it was a port of the Stanford Pascal compiler from 1979. That compiler in turn is a direct descendant of the Pascal P4 compiler by Niklaus Wirth and his co-workers at ETH Zürich, which was also in use at Stuttgart (on the Telefunken machine).


Porting the Stanford compiler from MUSIC/SP to VM on Hercules

In 2011 I discovered a running version of Stanford Pascal on MUSIC/SP. This is an improved 1982 version (by McGill University) of the 1979 Stanford Pascal that is part of some Hercules distributions. At that time, I already had a working version of VM/370 R6 running on Hercules on my Windows PC.

On Hercules VM, there was the 1979 Stanford version. My goal was to replace it with the 1982 McGill version. Because the compiler is able to compile itself, this seemed no big problem at first. But soon I realized that the 1982 language is extended compared with the 1979 language, and the compiler uses these extensions extensively. So the 1982 compiler could not be compiled by the 1979 compiler.

One of these extensions, for example, is structured constants:


    program TEST1 ( OUTPUT ) ;

    type MNEM_TABLE = array [ 0 .. 255 ] of array [ 1 .. 4 ] of CHAR ;

    var CH : CHAR ;
        ...

    const Y : array (. 1 .. 5 .) of INTEGER =
          ( 4 , 5 , 7 , 9 , 12 ) ;
          XTBLN : MNEM_TABLE =
          ( '(00)' , '(01)' , '(02)' , '(03)' , 'SPM ' , 'BALR' , 'BCTR' , 'BCR ' ,
            'SSK ' , 'ISK ' , 'SVC ' , '(0B)' , '(0C)' , '(0D)' , 'MVCL' , 'CLCL' ,
            'LPR ' , 'LNR ' , 'LTR ' , 'LCR ' , 'NR  ' , 'CLR ' , 'OR  ' , 'XR  ' ,
            'LR  ' , ...
            'AP  ' , 'SP  ' , 'MP  ' , 'DP  ' , '(FE)' , '(FF)' ) ;

The structured constants are introduced by a const keyword, much the same way as simple constants, but in contrast to the simple constants, they appear after the type and var declarations. This is necessary because the structured constants reference types, and those types have to be defined already.

The 1979 Stanford compiler didn't understand those structured constants. So I had to do a real bootstrap of the 1982 compiler to Hercules, that is: find a way to move the TEXT files (object files) to Hercules, and then maybe run them to compile the compiler on Hercules (which is kind of optional). I did this by encoding the binary object files into a real "text" format (hex encoding), transferring them as text files, and decoding them again on the other side.
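The hex transfer can be pictured with a small sketch. It is not the tool that was actually used; the program name HEXDEMO, the helper HEXVAL and the sample byte values are made up for illustration, and the program is written in plain Pascal without any of the extensions. Each byte becomes two printable hex digits, and decoding reverses that step.

```pascal
program HEXDEMO ( OUTPUT ) ;

(* Illustrative sketch, not the original transfer tool:    *)
(* hex-encode a few byte values and decode them again.     *)

const NBYTES = 4 ;

type HEXTAB = packed array [ 1 .. 16 ] of CHAR ;

var DIGITS : HEXTAB ;
    BYTES : array [ 1 .. NBYTES ] of INTEGER ;
    HEXLINE : array [ 1 .. 8 ] of CHAR ;   (* 2 * NBYTES characters *)
    I : INTEGER ;
    B : INTEGER ;

function HEXVAL ( C : CHAR ) : INTEGER ;

   (* value 0 .. 15 of a single hex digit *)

   var K : INTEGER ;
       V : INTEGER ;

   begin (* HEXVAL *)
     V := 0 ;
     for K := 1 to 16 do
       if DIGITS [ K ] = C then
         V := K - 1 ;
     HEXVAL := V
   end (* HEXVAL *) ;

begin (* HAUPTPROGRAMM *)
  DIGITS := '0123456789ABCDEF' ;

  (* arbitrary sample bytes, each in the range 0 .. 255 *)
  BYTES [ 1 ] := 2 ;
  BYTES [ 2 ] := 197 ;
  BYTES [ 3 ] := 226 ;
  BYTES [ 4 ] := 196 ;

  (* encode: two hex digits per byte *)
  for I := 1 to NBYTES do
    begin
      HEXLINE [ 2 * I - 1 ] := DIGITS [ BYTES [ I ] div 16 + 1 ] ;
      HEXLINE [ 2 * I ] := DIGITS [ BYTES [ I ] mod 16 + 1 ]
    end ;
  for I := 1 to 2 * NBYTES do
    WRITE ( HEXLINE [ I ] ) ;
  WRITELN ;

  (* decode: rebuild each byte from its two digits and compare *)
  for I := 1 to NBYTES do
    begin
      B := HEXVAL ( HEXLINE [ 2 * I - 1 ] ) * 16
           + HEXVAL ( HEXLINE [ 2 * I ] ) ;
      WRITELN ( B : 4 , ' ' , B = BYTES [ I ] )
    end
end (* HAUPTPROGRAMM *) .
```

The first output line is the encoded text 02C5E2C4; the following lines show each decoded value together with the result of comparing it to the original byte. A real transfer would do the same for every 80-byte record of a TEXT file.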

The compiler then worked on Hercules, and the first correction I made was to fix some year-2000 problems: the date output on the compiler listing showed 1911, which I fixed by doing some date heuristics in PASMONN ASSEMBLE (the Pascal monitor and runtime).


First extensions to the compiler in 2011

Once I had managed to compile the compiler (using itself), and the new compiler still worked, I could try to make some minor improvements.

I first made some changes to the source code scanner; in fact I rewrote it. From IBM's Pascal compiler, I was accustomed to using (. and .) as synonyms for [ and ], but Stanford Pascal didn't allow this, so I changed that. Another substitute I was used to was -> for the pointer symbol; Stanford Pascal didn't know that either, so I added it, too. I kept all the other substitutes that existed in the Stanford compiler, for example (/ and /) for [ and ].

Then I wanted some additional comment notation; Stanford already supported three (!): first the curly brackets { and }, then the common substitute (* and *), and - which sounds strange at first - all text which is enclosed in double quotes is a comment in Stanford Pascal, too. I couldn't get rid of this, because it is used in the compiler and the P-Code translator, and: in my belief you always have to respect such things, because if there are sources somewhere that were understood by earlier releases of the compiler, they should be accepted by later releases as well, if possible.

But anyway: I added the PL/1 and C style of comments /* */ as the 4th variant. In Stanford, comments may be nested, controlled by a certain compiler switch, so I had to take care not to break that feature. But it is defined that nesting of comments is only supported for comments of the same type.

Changes to symbol layout etc. are kind of easy, because they have no impact at all on code generation. That's why I started with such extensions.

From my years at Stuttgart University and at SSB AG, I had a source code formatter program, written in Pascal. I tested it on Stanford Pascal, and it worked without problems. I had to do some maintenance on this formatter, too, so that it supported all the extensions of McGill - for example structured constants - and my own; part of this was not done until 2016.


New keywords BREAK, CONTINUE, RETURN - still in 2011

The next extensions that I made - still in 2011 - were some new keywords and new statements for loop control. This was inspired by my C experience, where I found those statements very useful, and IMO they don't compromise the structured programming style (much).

And: they are very easy to implement, because all work can be done at the P-Code level.

To make things transparent: the Pascal compiler consists of two passes; the first one is considered to be portable across platforms (which is not true in detail), and it builds code for an abstract stack oriented machine. This code is called P-Code. This P-Code is then translated by the second pass (which is a Pascal program, too) to 370 object code (files with CMS filetype TEXT, RECFM fixed, record length 80).

The changes mentioned above are all done in the first pass; there are no extensions to the P-Code, so the second pass need not be touched. All three extensions (BREAK, CONTINUE, RETURN) are in fact branches (jumps) to some place inside or outside the loop or to a certain place at the end of a procedure or function. If a branch target at the needed position didn't yet exist in the P-Code, a new one had to be built.

These extensions were done within hours, and this gave me some confidence that I would be able to do further - and more ambitious - extensions in the future. But anyway it took 5 years, until 2016, before I started to do significant work on this compiler project again.

Example for the RETURN statement:


    procedure CHARSET_ADD ( var X : CHARSET ; VON : CHAR ; BIS : CHAR ) ;

       var CH : CHAR ;

       begin (* CHARSET_ADD *)
         for CH := VON to BIS do
           begin
             if CH = 'F' then
               RETURN ;
             X (. CH .) := 'J'
           end (* for *)
       end (* CHARSET_ADD *) ;

Example for BREAK and CONTINUE:


    F := 'BERND ' ;
    for I := 1 to 6 do
      begin
        if F (. I .) = 'D' then
          BREAK ;
        if F (. I .) = 'E' then
          CONTINUE ;
        WRITELN ( 'FOR-SCHLEIFE: I = ' , I : 3 , ' F(I) = ' , F (. I .) ) ;
      end (* for *) ;

BREAK and CONTINUE work with all loop types, of course, that is: FOR loops, WHILE loops, and REPEAT ... UNTIL loops.
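Because BREAK and CONTINUE are nothing but branches, the FOR loop example can also be written with explicit labels and GOTO in plain Pascal. The following sketch only illustrates that equivalence; it says nothing about the P-Code the compiler really emits, and the program name BRKDEMO and the label numbers are chosen freely. Some compilers (Free Pascal in its default mode, for instance) need GOTO support switched on.

```pascal
program BRKDEMO ( OUTPUT ) ;

(* Sketch: BREAK and CONTINUE expressed as plain branches. *)

label 10 , 20 ;

var F : packed array [ 1 .. 6 ] of CHAR ;
    I : INTEGER ;

begin (* HAUPTPROGRAMM *)
  F := 'BERND ' ;
  for I := 1 to 6 do
    begin
      if F [ I ] = 'D' then
        goto 10 ;          (* BREAK: branch behind the loop *)
      if F [ I ] = 'E' then
        goto 20 ;          (* CONTINUE: branch to the end of the body *)
      WRITELN ( 'I = ' , I : 3 , ' F(I) = ' , F [ I ] ) ;
      20 :
    end (* for *) ;
  10 :
end (* HAUPTPROGRAMM *) .
```

The loop prints a line for the letters B, R and N only: E is skipped by the branch to label 20, and D ends the loop by the branch to label 10.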


Extensions to the runtime system (PASMONN) in 2016

When I tested some programs on VM/CMS and assigned INPUT and OUTPUT to the CMS console using FILEDEF INPUT TERM etc., I observed that I could not establish a real user dialog using WRITELN and READ, for several reasons.

I wrote a small test program similar to the following, which simply wrote a line, asked for input, wrote another line, asked again and so on.


    program TESTINP ( INPUT , OUTPUT ) ;

    var ZAHL1 : INTEGER ;
        ZAHL2 : INTEGER ;
        ZAHL3 : INTEGER ;

    begin (* HAUPTPROGRAMM *)
      WRITELN ( 'bitte zahl1 eingeben' ) ;
      READ ( ZAHL1 ) ;
      WRITELN ( 'bitte zahl2 eingeben' ) ;
      READ ( ZAHL2 ) ;
      WRITELN ( 'bitte zahl3 eingeben' ) ;
      READ ( ZAHL3 ) ;
      WRITELN ( 'zahl1 = ' , ZAHL1 ) ;
      WRITELN ( 'zahl2 = ' , ZAHL2 ) ;
      WRITELN ( 'zahl3 = ' , ZAHL3 ) ;
    end (* HAUPTPROGRAMM *) .

But: the program asked for input before the output of the first WRITELN appeared. I found out after some research that this is due to a RESET (INPUT) call which is inserted implicitly by the compiler at the beginning of the main procedure, and RESET is defined by the Pascal language definition to make the first file element available (that is: INPUT -> points to the first char in the file). That again means that it reads an input buffer (from the console, of course), and it does this at the time of the RESET, that is: at the beginning of the main program, before the first WRITELN occurs.

It was very soon clear to me that this implicit RESET had to be eliminated, but at first I had no idea how to do it. The compiler and many other programs rely on it, and the language definition, too, says that it has to work that way. I recalled that Pascal VS had special calls TERMIN (INPUT) and TERMIN (OUTPUT), which had to be applied to terminal files; maybe those calls eliminated the compiler generated RESET (or: Pascal VS didn't do any implicit RESET at all).

To test what would happen if I eliminated this RESET, I simply shifted it to a later position in the P-Code file (extension PRR on CMS) and then translated this modified P-Code file to object code.

I then observed that a READ request still occurred before the WRITELN. Indeed there was a second problem: because all output I/O was done using locate mode, the real output of a text line was deferred until the first WRITE of the next line. That is: it was not the WRITELN that triggered the console output, but the first WRITE of the next line (which in my example program is obviously the next WRITELN).

So after some thinking I decided that it would be the best to transform all the writes to OUTPUT to move mode instead of locate mode; but that was a major task, because the output routines were not prepared to do it this way.

I omit all the technical details and pitfalls; it took many days to get this working. Now the mode (move or locate) is controlled by a flag which is part of the FILDEF macro - on a per-file basis - so if it should be necessary to switch the mode back to locate, this is a very easy task. And: not only the file OUTPUT is processed in move mode, but every output file, including the non-standard user textfiles (which were added in 1982 by McGill, another significant improvement over 1979 Stanford).

When I had successfully switched all output files to move mode, with my example program still patched (by moving the RESET to a later point), the example program worked as expected.


Making RESET and REWRITE optional and eliminating the implicit RESET on INPUT

When thinking some days about the initial RESET on INPUT applied by the compiler at the beginning of the main program, I came to the conclusion that it would be best not to require a RESET operation at all.

That was inspired in part by the observation that the implicit RESET and REWRITE is done for the standard files (INPUT, OUTPUT, PRR, PRD, QRR and QRD), but not for the additional files with arbitrary names appearing on the program statement (a feature introduced with the 1982 McGill version). I didn't see a valid reason for this difference. So I decided in the end that all RESETs and REWRITEs should be optional, that is: if no RESET has occurred before the first READ (or GET) operation, it is done then. The same goes for REWRITE. I don't recall if Pascal VS did it this way, but it seems possible to me.

Of course, it must still be possible to do the RESET and REWRITE explicitly; this should make no difference to an implicit RESET on the first READ call. So the status of the file (if RESET has already been done or still needs to be done) must be recorded somewhere. And: the RESET at a later time, when a file that has been processed already fully or in part is read again from the start by a new RESET operation, must of course be supported in the same way as before. But on the other hand it should be impossible to do a READ after REWRITE or a WRITE after RESET.

To make all this possible, I first had to find some free space in the file control blocks that Pascal builds for every file. I shifted the DCB (which is part of this file control block) 4 bytes to offset 36, so I had room for 4 flags of one byte, of which I used the first (offset 32) to record the file status.

There are 5 status values:

0 = the file is closed
1 = reset has been issued, but no read operation
2 = rewrite has been issued, but no write operation
3 = read operations have been issued
4 = write operations have been issued

and a transition matrix, which shows which status changes are applied on which file operation (RESET, READ, REWRITE, WRITE). See the PASMONN.ASS source for details (Resources paragraph below).
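To give an idea of what such a transition matrix does, here is a sketch written as a Pascal function. It is an assumption derived only from the rules described in this section (implicit RESET or REWRITE on the first READ or WRITE, no READ after REWRITE, no WRITE after RESET); the real matrix lives in the ASSEMBLER runtime and may differ in detail. The names FSTATUS, FILEOP, NEXTSTATUS and the error value -1 are hypothetical.

```pascal
program FSTATUS ( OUTPUT ) ;

(* Hypothetical sketch of the file status transitions;    *)
(* the real matrix in the runtime may differ.             *)

const ST_CLOSED = 0 ;
      ST_RESET = 1 ;
      ST_REWRITE = 2 ;
      ST_READ = 3 ;
      ST_WRITE = 4 ;
      ST_ERROR = - 1 ;    (* invented marker for a forbidden operation *)

type FILEOP = ( OP_RESET , OP_READ , OP_REWRITE , OP_WRITE ) ;

var STATUS : INTEGER ;

function NEXTSTATUS ( STATUS : INTEGER ; OP : FILEOP ) : INTEGER ;

   begin (* NEXTSTATUS *)
     case OP of
       OP_RESET :
         NEXTSTATUS := ST_RESET ;
       OP_REWRITE :
         NEXTSTATUS := ST_REWRITE ;
       OP_READ :
         (* READ on a closed file implies a RESET first *)
         if ( STATUS = ST_REWRITE ) or ( STATUS = ST_WRITE ) then
           NEXTSTATUS := ST_ERROR
         else
           NEXTSTATUS := ST_READ ;
       OP_WRITE :
         (* WRITE on a closed file implies a REWRITE first *)
         if ( STATUS = ST_RESET ) or ( STATUS = ST_READ ) then
           NEXTSTATUS := ST_ERROR
         else
           NEXTSTATUS := ST_WRITE
     end (* case *)
   end (* NEXTSTATUS *) ;

begin (* HAUPTPROGRAMM *)
  STATUS := ST_CLOSED ;
  STATUS := NEXTSTATUS ( STATUS , OP_READ ) ;
  WRITELN ( 'READ on closed file: ' , STATUS : 3 ) ;
  STATUS := NEXTSTATUS ( STATUS , OP_WRITE ) ;
  WRITELN ( 'WRITE after READ:    ' , STATUS : 3 ) ;
  STATUS := NEXTSTATUS ( STATUS , OP_REWRITE ) ;
  STATUS := NEXTSTATUS ( STATUS , OP_WRITE ) ;
  WRITELN ( 'REWRITE, then WRITE: ' , STATUS : 3 )
end (* HAUPTPROGRAMM *) .
```

In this model the first call yields status 3 (the RESET is implied), the WRITE on a file opened for reading yields the error marker, and an explicit REWRITE followed by WRITE ends in status 4.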

Very important: this all applies only to files of type TEXT (FILE OF CHAR); for files of other types (for example FILE OF INTEGER or FILE OF RECORD ...) there are no implicit RESETs and REWRITEs, and the file status is never changed away from 0 for those (binary) files.

When I had applied all those changes to the file handling routines (after running some tests successfully), I could finally drop the implicit RESET on INPUT at the beginning of the main procedure, because the RESET will occur automatically at the first READ or GET.

Now the problems:

The compiler and the P-Code translator refused to work from that moment on, because they tested for EOF (INPUT) before the first READ. And it was FALSE with the prior implementation (because of the initial RESET), but it was TRUE now. The solution was obvious: inserting an explicit RESET (INPUT) at the beginning of each program.

In fact, every program that controls the input loop in the following way has to be changed (without the RESET, the program will do nothing, because EOF (INPUT) is true at the beginning).


    program TESTCOPY ( INPUT , OUTPUT ) ;

    var CH : CHAR ;

    begin (* HAUPTPROGRAMM *)
      RESET ( INPUT ) ;
      while not EOF ( INPUT ) do
        begin
          READ ( INPUT , CH ) ;
          WRITE ( OUTPUT , CH ) ;
          if EOLN ( INPUT ) then
            WRITELN ( OUTPUT )
        end (* while *)
    end (* HAUPTPROGRAMM *) .

But anyway: I decided to stay with this solution, because I didn't see another way of getting rid of the initial terminal input at the implicit RESET. So this is how it works for the moment. Programs that need this type of control (as outlined above) need to do an explicit RESET (INPUT) at the beginning of the main program.


Allow shorter string constants on const initializers and assignments

When working on my source code formatter, I had a strange problem with indentation. The indentation on comments did not work correctly; I could not find the reason. The same program worked on the Free Pascal compiler (on Windows) without problems.

I finally found out that the compiler mixed up two variables with long names which had the same first 12 characters. That is: only 12 characters are significant in identifiers in this Pascal implementation (IIRC, the standard from the 1970s says that 10 is the minimum). But IMO this is an unacceptably low number, so I decided to change that. The new number should be 16 or 20.

There is a constant IDLNGTH = 12 which controls the size of the identifiers in the internal tables. When I tried to set this to 16, I got many (some hundred) syntax errors, because the compiler doesn't accept string constant initializers which differ in length from the definition, for example:


    type ALPHA = packed array [ 1 .. IDLNGTH ] of CHAR ;

    (*********************************************************)
    (*   new reserved symbols in the 2011 version:           *)
    (*   break, return, continue                             *)
    (*********************************************************)

    const RW : array [ 1 .. NRSW ] of ALPHA =
          ( 'IF          ' , 'DO          ' , 'OF          ' , 'TO          ' ,
            'IN          ' , 'OR          ' , 'END         ' , 'FOR         ' ,
            'VAR         ' , 'DIV         ' , 'MOD         ' , 'SET         ' ,
            'AND         ' , 'NOT         ' , 'THEN        ' , 'ELSE        ' ,
            'WITH        ' , 'GOTO        ' , 'CASE        ' , 'TYPE        ' ,
            'FILE        ' , 'BEGIN       ' , 'UNTIL       ' , 'WHILE       ' ,
            'ARRAY       ' , 'CONST       ' , 'LABEL       ' , 'BREAK       ' ,
            'REPEAT      ' , 'RECORD      ' , 'DOWNTO      ' , 'PACKED      ' ,
            'RETURN      ' , 'FORWARD     ' , 'PROGRAM     ' , 'FORTRAN     ' ,
            'EXTERNAL    ' , 'FUNCTION    ' , 'CONTINUE    ' , 'PROCEDURE   ' ,
            'OTHERWISE   ' ) ;

If the type ALPHA, which depends on IDLNGTH, is changed to length 16, I will get a syntax error for every initializer in the RW definition.

This looked unacceptable to me. Already in 2011 I had the idea that short string constants in initializations and assignments (maybe function calls) should be allowed. So I examined how the compiler could be extended to do this. It turned out to be not too difficult; the string constants are adjusted to the new length directly after reading, depending on the length of the referencing type, and the missing blanks are appended to the buffer in the internal constant description. When the constant is written to the P-Code file, it already looks very nice, and all works well.
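The adjustment step can be sketched in a few lines of plain Pascal. This is only an illustration of the idea, not the compiler's code: the program name PADDEMO, the helper PADBLANK and the buffer type CONSTBUF are invented, and the buffer length of 64 merely mirrors the current limit for char string constants.

```pascal
program PADDEMO ( OUTPUT ) ;

(* Illustrative sketch: a short constant is extended with  *)
(* blanks to the length of the referencing type.           *)

const TARGETLEN = 10 ;
      BUFLEN = 64 ;

type WORT = packed array [ 1 .. TARGETLEN ] of CHAR ;
     CONSTBUF = packed array [ 1 .. BUFLEN ] of CHAR ;

var BUF : CONSTBUF ;
    W : WORT ;
    I : INTEGER ;

procedure PADBLANK ( var B : CONSTBUF ; OLDLEN : INTEGER ;
                     NEWLEN : INTEGER ) ;

   (* append blanks behind the OLDLEN characters read so far *)

   var K : INTEGER ;

   begin (* PADBLANK *)
     for K := OLDLEN + 1 to NEWLEN do
       B [ K ] := ' '
   end (* PADBLANK *) ;

begin (* HAUPTPROGRAMM *)

  (* the constant 'TEST' as the scanner has read it: 4 characters *)
  BUF [ 1 ] := 'T' ;
  BUF [ 2 ] := 'E' ;
  BUF [ 3 ] := 'S' ;
  BUF [ 4 ] := 'T' ;

  (* the referencing type WORT has length 10 *)
  PADBLANK ( BUF , 4 , TARGETLEN ) ;
  for I := 1 to TARGETLEN do
    W [ I ] := BUF [ I ] ;
  WRITELN ( '<' , W , '>' ) ;
  WRITELN ( W [ 7 ] = ' ' )
end (* HAUPTPROGRAMM *) .
```

The first line shows TEST followed by six blanks between the angle brackets, and the second line confirms that position 7 holds a blank.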

BTW: I am comparing the compiler all the time to the FPC (Free Pascal compiler) on Windows, and this was the first time I discovered a real difference, because FPC fills the strings with hex zeroes if the initializers are shorter. But I don't want this behaviour on the mainframe, where other languages like PL/1 etc. fill with blanks, and every user of this compiler IMO would expect that strings are handled this way. So it is what it is, and I have to accept this difference to FPC.

When I finished this work, the compiler accepted shorter string constants on const initializers and on string assignments, and I compiled the following program successfully:


    program TESTPACK ( INPUT , OUTPUT ) ;

    type WORT = packed array [ 1 .. 10 ] of CHAR ;

    var X : array [ 1 .. 10 ] of CHAR ;
        Y : packed array [ 1 .. 10 ] of CHAR ;

    const Z : array [ 1 .. 4 ] of WORT =
          ( 'Bernd' , 'Sissi' , 'Lukas' , 'Marius' ) ;
          Z2 : WORT = 'OPPOLZER' ;

    begin (* HAUPTPROGRAMM *)
      X := 'TEST' ;
      Y := 'TEST2' ;
      WRITELN ( X , Y , Z [ 1 ] , Z [ 2 ] , Z [ 3 ] , Z [ 4 ] ) ;
      WRITELN ( 'TEST: ' , Z [ 1 ] [ 7 ] = ' ' ) ;
      WRITELN ( Z2 ) ;
      WRITELN ( 'TEST: ' , ORD ( Z [ 1 ] [ 7 ] ) ) ;
    end (* HAUPTPROGRAMM *) .

The char arrays of length 10 are all filled with blanks to the end; there is no need any more to code the string constants all with length 10, which makes things a lot easier IMO.


20 significant characters on variable names (not only 12)

Now I could try to change the IDLNGTH constant in the compiler; I decided to go to 20 instead of 12. The syntax errors, which had been in the hundreds before, decreased to 4, because some definitions still used the numeric constant 12 and not the type ALPHA which depends on IDLNGTH; when I changed that, all worked OK.

But now I had the problem that the second pass (P-Code translator) didn't work any more, because due to the changed IDLNGTH the output format of the P-Code file changed on certain instructions, and the P-Code translator couldn't read it any more. I soon discovered that the P-Code translator had the same IDLNGTH constant; but setting it to 20, too, didn't resolve the issue - this would have been too easy.

I had to examine the differences on the P-Code file and discovered that due to my change there were differences on three P-Code instruction types: CST, ENT and BGN. One of these (ENT) was in fact wrong after the change; I had to correct the first pass (output length of the function name was 14, but it should have been IDLNGTH + 2).

After I had seen the effects of my change on the P-Code file, I knew better what I had to change in the second pass. The input and output routines in the P-Code translator had the same flaws as the compiler output (length was 14, but should have been IDLNGTH + 2). Another topic was that the header generated by the compiler, containing the function name and the compile timestamp, now has 40 bytes instead of 32, and it now has room for 20 bytes of the main program name (for example: the first pass has the program name PASCALCOMPILER, and that name didn't fit in the old program header, so it was completely omitted - but now it's there). Of course, there are branch instructions to jump over this header.

After all these modifications, I successfully compiled the following program (which had a syntax error E101 before, because of duplicate names):


    program TESTDUPL ( INPUT , OUTPUT ) ;

    var VARIABLE_M_LANGEM_NAMEN : INTEGER ;
        VARIABLE_M_LANGEM_ANDEREM_NAMEN : INTEGER ;

    begin (* HAUPTPROGRAMM *)
      VARIABLE_M_LANGEM_NAMEN := 5 ;
      VARIABLE_M_LANGEM_ANDEREM_NAMEN := 12 ;
      WRITELN ( VARIABLE_M_LANGEM_NAMEN ) ;
      WRITELN ( VARIABLE_M_LANGEM_ANDEREM_NAMEN ) ;
    end (* HAUPTPROGRAMM *) .
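Why these two names collide with 12 significant characters, and why 20 separates them, can be checked with a small sketch in plain Pascal. The program SIGDEMO, the type IDNAME and the function SAMENAME are made up for this illustration; the compiler's symbol table works differently in detail, but the effect of truncating names is the same.

```pascal
program SIGDEMO ( OUTPUT ) ;

(* Sketch: compare two identifiers on their first SIGNIF  *)
(* characters only, as a compiler with a fixed identifier *)
(* length effectively does.                               *)

const NAMELEN = 31 ;

type IDNAME = packed array [ 1 .. NAMELEN ] of CHAR ;

var N1 : IDNAME ;
    N2 : IDNAME ;

function SAMENAME ( A : IDNAME ; B : IDNAME ;
                    SIGNIF : INTEGER ) : BOOLEAN ;

   var I : INTEGER ;
       SAME : BOOLEAN ;

   begin (* SAMENAME *)
     SAME := TRUE ;
     for I := 1 to SIGNIF do
       if A [ I ] <> B [ I ] then
         SAME := FALSE ;
     SAMENAME := SAME
   end (* SAMENAME *) ;

begin (* HAUPTPROGRAMM *)

  (* both literals are exactly 31 characters long *)
  N1 := 'VARIABLE_M_LANGEM_NAMEN        ' ;
  N2 := 'VARIABLE_M_LANGEM_ANDEREM_NAMEN' ;

  WRITELN ( '12 significant: ' , SAMENAME ( N1 , N2 , 12 ) ) ;
  WRITELN ( '16 significant: ' , SAMENAME ( N1 , N2 , 16 ) ) ;
  WRITELN ( '20 significant: ' , SAMENAME ( N1 , N2 , 20 ) )
end (* HAUPTPROGRAMM *) .
```

The names are identical in their first 18 characters and differ at position 19, so they count as the same identifier with 12 and even with 16 significant characters, and as different ones only with 20.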


SNAPSHOT Routine

I wondered if the SNAPSHOT routine would still run successfully after the changes that I made regarding lengths of procedure names etc.

I had no experience with this so far, so I looked first how the SNAPSHOT routine is integrated with the rest of the Pascal system. It turned out that it is linked using WXTRN from the PASMONN monitor. The V-address is tested for non-zero. If the SNAPSHOT routine is present, it is called at a certain place. So I had to link the SNAPSHOT TEXT file with the rest of the application.

The SNAPSHOT routine is interesting, because it is written completely in Pascal, although it walks along the save area chains etc.; it can do all this because it is closely tied to the Pascal compiler, that is, it knows all about certain addresses etc. And: it is a separately translated external procedure. Such a procedure simply has to have a dummy (empty) main program, and then you can link it to your Pascal application. You can find further information in the Stanford Pascal 1979 paper.

On first tests, SNAPSHOT complained about problems reading QRD; this was because the symbol file created by the compiler was written on file QRR - not QRD - and there was no FILEDEF for this file. When I added that, all worked without problems.

Now: when a runtime error occurs, SNAPSHOT tells the location of the error (line number and name of procedure) and then it tells the variables and their contents in the abending procedure and in every procedure from the call stack.

To test this, I coded the Fibonacci example from the 1979 Stanford paper; it worked without problems and gave the same results.

I didn't test if there are any problems due to the fact that the procedure names now have 20 bytes in the symbol file (QRR); this still has to be done.

BTW: I observed that my emulated Hercules environment is 20 times faster than the original 1979 Stanford machine (there are some CLOCK outputs in the 1979 Stanford paper).


Shorter string constants - continued

I started to convert a useful program to Stanford Pascal that compares text files and prints the differences, in much the same way as the CMS command COMPARE does it, but much better, because, when finding a difference, it tries to synchronize on the next line which is equal, this way minimizing the shown differences (much the same way as SuperC and other compare tools do it).

This program is written in ANSI C and it is freeware; I found it somewhere on the Internet some years ago and adapted and improved it to meet my own needs. Now I wanted to convert it to Stanford Pascal, as an exercise, and: to see what features are still missing in the language.

I realized that the BREAK, CONTINUE and RETURN statements are very useful; I used them in the compiler, too, in the meantime, when indentation was large and they make the code more readable, IMO.

But I realized, too, that my extensions regarding shorter strings were incomplete:

So I had to do more work to fix all those shortcomings, but now it all works.

There remains one more problem: char string constants are limited to 64 bytes, which is not enough, IMO. Of course, because char string constants may not span source lines at the moment, this seems no big problem, but this also limits the size of char arrays that can be made compatible to shorter string constants; that is: a char array of, say, length 256 cannot be initialized by a single blank. This is because the initializing constant has to be expanded to 256 chars and written to the P-Code intermediate file in this expanded length, and this is not possible, at the moment.

So if you try to initialize strings having lengths > 64 with shorter string constants, you will get the same syntax error E129 (type conflict) as before my extensions. I would like to improve this in a future release.


Dynamic Storage Management

Another task for the next days or weeks:

The compare tool (in its C version) uses malloc and free. In the Pascal version, I used calls to the new procedure to get the dynamic storage needed.

But: there is no dispose procedure to free storage in the Stanford implementation.

Because the compare tool does not free storage at the moment, it crashes after ca. 1500 lines (on both files) with the error message "Stack/Heap collision", which is the symptom for "no more storage available".

There are only two ways to get rid of unused (dynamic) storage:

The mark/release feature is only possible, because the heap is another contiguous stack that grows downwards. But it has a fixed size (allocated at program startup, the size is controlled by JCL or CMD line parameters or by default) and it cannot be extended dynamically. Stack and heap together are one fixed area and have to fit into the lower 16 MB of memory (on the mainframe).
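The following sketch models this behaviour with plain integers instead of real storage. It is a simplified picture under my own assumptions (the stack grows upwards from the bottom of the area, the heap grows downwards from the top), not the code of the runtime; the names MRDEMO, HEAPNEW and all sizes are hypothetical.

```pascal
program MRDEMO ( OUTPUT ) ;

(* Simplified model, not the runtime's code: stack and    *)
(* heap share one fixed area; a NEW request that would    *)
(* cross the stack top is a stack/heap collision.         *)

const AREASIZE = 1000 ;

var STACKTOP : INTEGER ;   (* first free position above the stack *)
    HEAPTOP : INTEGER ;    (* lowest position used by the heap *)
    SAVED : INTEGER ;      (* heap position remembered by MARK *)
    DONE : BOOLEAN ;

procedure HEAPNEW ( SIZE : INTEGER ; var SUCCESS : BOOLEAN ) ;

   begin (* HEAPNEW *)
     SUCCESS := HEAPTOP - SIZE >= STACKTOP ;
     if SUCCESS then
       HEAPTOP := HEAPTOP - SIZE
   end (* HEAPNEW *) ;

begin (* HAUPTPROGRAMM *)
  STACKTOP := 200 ;
  HEAPTOP := AREASIZE ;

  SAVED := HEAPTOP ;              (* corresponds to MARK *)
  HEAPNEW ( 300 , DONE ) ;
  WRITELN ( 'first request:  ' , DONE , HEAPTOP : 6 ) ;
  HEAPNEW ( 300 , DONE ) ;
  WRITELN ( 'second request: ' , DONE , HEAPTOP : 6 ) ;
  HEAPNEW ( 300 , DONE ) ;        (* would cross the stack top *)
  WRITELN ( 'third request:  ' , DONE , HEAPTOP : 6 ) ;

  HEAPTOP := SAVED ;              (* corresponds to RELEASE *)
  HEAPNEW ( 300 , DONE ) ;
  WRITELN ( 'after release:  ' , DONE , HEAPTOP : 6 )
end (* HAUPTPROGRAMM *) .
```

The third request fails because the heap would run into the stack; after the release, everything allocated since the mark is gone at once and the same request succeeds. Single elements cannot be given back in this scheme, which is exactly what the compare tool would need.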

I have some experience with the LE implementation of dynamic storage management (LE heaps), and I would like to add functions ALLOC and FREE to Stanford Pascal that support this kind of storage management. With LE storage management, additional heap segments are requested from the operating system, when needed. The FREE function uses a very sophisticated algorithm to build free element trees, so that free elements can easily be reused and storage fragmentation is minimized. I would like to code all this logic in Pascal and add it to the Stanford implementation.

I cannot remove the current implementation involving new, mark and release completely at the moment, because it is used in the compiler, and I don't want to change that at the moment. So I would instead add ALLOC and FREE as new functions which activate the second (new) heap management.

In my installation, Stack and Heap (classical) are limited to 4 MB (on CMS). But my PASCAL CMS machine has 8 MB and could be extended further. If the "new" dynamic heap used the remaining 4 MB, the first 4 MB would be available completely to the stack segment (which is large, IMO). This would be great progress compared to the current situation, and I could imagine still greater progress if it were possible to move the "new" heap to the "above the line" area.


Pointer arithmetic - new functions ADDR, PTRADD, PTRDIFF, SIZEOF, PTR2INT

When I started to code the LE inspired storage allocation routines, I soon discovered that some more compiler features are needed to make this task more fun. In fact, I borrowed some features from the C language and added them to Stanford Pascal in a sensible way (I hope), so that the overall spirit of the language will not be compromised.

The features added in this sprint include

Here is an example program that shows the new features:


    program TESTADDR ( OUTPUT ) ;

    (********)
    (*$A+ *)
    (********)

    type CHAR10 = array [ 1 .. 10 ] of CHAR ;
         SHOWPTR = record
                     case BOOLEAN of
                       TRUE :
                         ( X1 : -> CHAR ) ;
                       FALSE :
                         ( X2 : INTEGER )
                   end ;

    var X : INTEGER ;
        Y : -> INTEGER ;
        FELD : array [ 1 .. 10 ] of CHAR ;
        CHP : -> CHAR ;
        CHP2 : -> CHAR ;
        Z : SHOWPTR ;

    begin (* HAUPTPROGRAMM *)
      X := 25 ;
      Y := ADDR ( X ) ;
      WRITELN ( 'vergleich: ' , X , Y -> ) ;
      FELD := 'OPPOLZER' ;
      CHP := ADDR ( FELD ) ;
      while CHP -> <> ' ' do
        begin
          WRITELN ( 'chp -> = ' , CHP -> ) ;
          CHP := PTRADD ( CHP , 1 ) ;
        end (* while *) ;
      CHP2 := ADDR ( FELD ) ;
      X := PTR2INT ( CHP ) ;
      WRITELN ( 'ptr2int (chp) = ' , X ) ;
      Z . X1 := CHP ;
      WRITELN ( 'chp via showptr = ' , Z . X2 ) ;
      X := PTR2INT ( CHP2 ) ;
      WRITELN ( 'ptr2int (chp2) = ' , X ) ;
      X := PTR2INT ( ADDR ( FELD ) ) ;
      WRITELN ( 'ptr2int (addr(feld)) = ' , X ) ;
      WRITELN ( 'ptr2int (addr(feld)) = ' , PTR2INT ( ADDR ( FELD ) ) ) ;
      X := PTRDIFF ( CHP , CHP2 ) ;
      WRITELN ( 'ptrdiff = ' , X ) ;
      X := PTRDIFF ( CHP , ADDR ( FELD ) ) ;
      WRITELN ( 'ptrdiff = ' , X ) ;
      X := PTRDIFF ( ADDR ( FELD ) , CHP ) ;
      WRITELN ( 'ptrdiff = ' , X ) ;
      WRITELN ( 'sizeof (feld) = ' , SIZEOF ( FELD ) ) ;
      WRITELN ( 'sizeof (integer) = ' , SIZEOF ( INTEGER ) ) ;
      WRITELN ( 'sizeof (char10) = ' , SIZEOF ( CHAR10 ) ) ;
    end (* HAUPTPROGRAMM *) .

Note: the type SHOWPTR shows how you could do pointer arithmetic before my extensions (it is indeed done this way in SNAPSHOT and other system related routines). I used it to test the results.

The A+ switch tells the P-Code translator to output ASSEMBLER code in addition to the binary 370 code to file ASMOUT, which proved to be very useful when doing these changes and implementing the new P-Code instructions.

When adding the new functions, I also improved the compiler (PASCAL1), so that future additions of new library functions are still easier (improved table layout of the internal compiler tables).


Pascal library

My future extensions to the runtime library will be Pascal functions, not ASSEMBLER, so I tried to change the system in a way that it is easier to add new (Pascal) library functions.

At the moment, SNAPSHOT is the only Pascal library function that is added to the compiler generated Pascal code.

SNAPSHOT is an external Pascal function with a dummy Pascal main program, containing the X+ compiler switch. This has three effects:

But this is not sufficient at all. When you add another "module" built in this way, you get duplicate names (#MAINBLK) on the linkage editor. And: the internal procedures of deeper nesting levels have the external attribute, too. In the end, I want to be able to add more than one external module (written in Pascal) to the compiler generated main program, and it should be no problem, if there exist procedures with the same name anywhere at deeper levels.

So I changed the Pascal system in the following way:

When this all worked, I converted SNAPSHOT to the new design (MODULE keyword etc.) and renamed it to $PASSNAP. This all worked without problems.

Then I did further renames to make the names more consistent across the Pascal system. That is:

In fact, we now have four kinds of predefined procedures or functions:


Storage management - new functions ALLOC, FREE, ALLOCX, FREEX

To start with the new storage management, I added four new standard functions to the Pascal system:

Now the problem was: all the new functions should be written in Pascal; they should be located in a single module called PASLIBX and accessed by an entry $PASLIB and different function codes, and only the GETMAIN and FREEMAIN macros (with minimal interface code) should stay in the Pascal Monitor (which is coded in ASSEMBLER).

To make this possible, the compiler had to be extended; the standard procs until now only had one attribute, that is the number of the CSP call generated (called KEY). I added to the standard procs attributes like LIBNAME (8 characters), FUNCCODE (integer), PARMCNT (parameter count) and PROCTYP (result type). If LIBNAME is not blank, the standard proc will not be implemented by a CSP call (which was the only technique until now), but by a call to a certain library module, where the first parameter is the function code. The compiler will generate a normal nonstandard procedure or function call (CUP) using the attributes of the standard proc definition.

Furthermore, the compiler contains a new table where all those new standard library functions are defined. On compiler startup, in the procedure ENTSTDNAMES, the new procedures and their attributes are entered into the compiler's general identifier repository (using procedure ENTERID), in the same way as it is done for the other predefined functions and procedures.
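The decision between the classic CSP call and the new library call can be sketched as follows. This is a minimal illustration in C, not the compiler's actual table: the field names follow the attributes named above, but the struct layout and every key, function code and parameter count in the table are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum callkind { CALL_CSP, CALL_CUP };

/* One standard-procedure entry; layout is an assumption. */
struct stdproc {
    const char *name;
    int key;             /* CSP number, used when libname is blank  */
    const char *libname; /* 8 characters; blank means "use a CSP"   */
    int funccode;        /* passed as first parameter to the module */
    int parmcnt;
};

/* All numbers below are made up for illustration. */
static const struct stdproc stdprocs[] = {
    { "WRITELN", 1, "        ", 0, 0 },
    { "ALLOC",   0, "$PASLIB ", 1, 1 },
    { "FREE",    0, "$PASLIB ", 2, 1 },
};

static const struct stdproc *find_stdproc(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof stdprocs / sizeof stdprocs[0]; i++)
        if (strcmp(stdprocs[i].name, name) == 0)
            return &stdprocs[i];
    return NULL;
}

/* Blank LIBNAME: generate a CSP; otherwise an ordinary call (CUP)
   to the library entry, with funccode as the first parameter. */
static enum callkind call_kind(const struct stdproc *sp)
{
    assert(sp != NULL);
    return strspn(sp->libname, " ") == strlen(sp->libname) ? CALL_CSP
                                                           : CALL_CUP;
}
```

With such a table, adding a library function means adding one entry instead of touching the code generation for each new name.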

When all this worked, I changed the compare utility called XCOMP (see above) to use the new ALLOCX and FREEX functions instead of NEW (no DISPOSE available). Having done this, XCOMP now was able to compare very large files without problems, because the GETMAIN / FREEMAIN implementation of CMS indeed frees storage, so that it can be reused later in the same run.

The Resource files (see below) have been updated; they now contain:

The Binaries (AWS Tape) have not yet been updated. This will be done in the next few days.

This is a short example program which shows the usage of ALLOCX and FREEX:


program TESTALC ( OUTPUT ) ;

(********)
(*$A+   *)
(********)

var CHP : -> CHAR ;
    CHP2 : -> CHAR ;
    CHPSTART : -> CHAR ;
    FELD : array [ 1 .. 25 ] of CHAR ;

begin (* MAIN PROGRAM *)
  CHP := ALLOCX ( 25 ) ;
  CHPSTART := CHP ;
  WRITELN ( 'chp = ' , PTR2INT ( CHP ) ) ;
  FELD := 'OPPOLZER' ;
  CHP2 := ADDR ( FELD ) ;
  while CHP2 -> <> ' ' do
    begin
      CHP -> := CHP2 -> ;
      CHP := PTRADD ( CHP , 1 ) ;
      CHP2 := PTRADD ( CHP2 , 1 ) ;
    end (* while *) ;
  WRITE ( 'chp-schleife = ' ) ;
  CHP -> := CHP2 -> ;
  CHP := CHPSTART ;
  while CHP -> <> ' ' do
    begin
      WRITE ( CHP -> ) ;
      CHP := PTRADD ( CHP , 1 ) ;
    end (* while *) ;
  WRITELN ;
  FREEX ( CHPSTART ) ;
end (* MAIN PROGRAM *) .

Update - 16.10.2016

The new ALLOC and FREE functions (inspired by LE) are working, but there are still problems; at some point, after millions of allocs and frees, some area of storage is overwritten.

I found out that this happens because the file access routines (QSAM), when reading input data, start to write to places where they should not.
But the culprit must be my storage routines. I guess that they affect the DCBs or some other file handling control block, but I have not found the error so far.

While trying to find the error, I did additional extensions to the compiler:

Before I publish new binaries, I want to resolve the problem with the storage management.

Update - 20.10.2016

I finally fixed the problem in the storage functions (the ones similar to the LE storage handler). It was rather strange: due to a logic error in these storage management functions, a QSAM file buffer got overlaid by an allocated storage area. This in turn led to a QSAM read overwriting the QSAM file management area. QSAM did not stop working, but it directed the subsequent reads to different (wrong) places, and this finally destroyed the meta information of the storage management system.

I only found this when I monitored the QSAM buffers all the time and checked the moment when the contents changed from right to wrong. This was a storage allocation of a certain type, which did something wrong with respect to the management of the free areas in the heap segment, and the QSAM buffer, which followed directly in storage, was affected.

When I repaired this, all was ok. And: the change from normal GETMAIN/FREEMAIN to the new allocation functions (similar to LE) led to a 5 percent overall CPU reduction of the compare utility XCOMP in my test case.
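The text does not say how the monitoring was implemented. One simple way to catch the moment a buffer goes bad is to keep a private copy and compare it after every allocation or free; the C sketch below shows that idea. The names watch, watch_arm and watch_changed and the size limit are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WATCH_MAX 256   /* arbitrary limit for this sketch */

/* A watched storage area together with a reference copy of it. */
struct watch {
    const unsigned char *addr;
    size_t len;
    unsigned char copy[WATCH_MAX];
};

/* Remember the current contents of the area. */
static void watch_arm(struct watch *w, const void *addr, size_t len)
{
    assert(len <= WATCH_MAX);
    w->addr = addr;
    w->len = len;
    memcpy(w->copy, addr, len);
}

/* Return nonzero as soon as the area differs from the reference copy. */
static int watch_changed(const struct watch *w)
{
    return memcmp(w->copy, w->addr, w->len) != 0;
}
```

Calling watch_changed after each storage request narrows the damage down to a single request, which is the kind of observation described above. The copy has to be re-armed after every legitimate change of the buffer, for example after each read.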

So this topic is closed; the storage management functions work as they are designed, and the XCOMP utility, too.

Here are two examples; the first one compares two versions of the compiler (12000 lines), and the second one compares RUNPARM ASSEMBLE with my improved version XRUNPARM ASSEMBLE. Both comparisons were run with -m9, that is: 9 lines have to match to synchronize (this is a value that I use very often; it is a good compromise and yields good results). The ASSEMBLER compare was done using -w, that is: differences involving only white space are ignored. This is important, because I aligned many comments, and those lines should be considered unchanged. The only significant change I made to XRUNPARM is that it inserts a blank between the parameter tokens coming from CMS (which RUNPARM did not; this was an error IMO).

XCOMP-Example: Comparing Compiler versions
XCOMP-Example: Comparing XRUNPARM and RUNPARM

Static definitions

When implementing the new storage management, static variables would have been very useful. These static variables should be global definitions inside a module and keep their values across several invocations of procedures of the module, but they should not be visible outside the module.

The compiler supported var declarations in modules, but the generated code used the global stack register (R12) to address them, and the offsets were calculated without knowledge of the main program's characteristics, so there were overlays with variables of the main program. So, global var declarations in modules could not be used. As a consequence, the compiler now flags var declarations in modules (outside of procedures and functions), but static declarations are allowed.

When looking for a possibility to implement static variables (not only global, but inside procedures, too), I discovered that there is already a CSECT for structured constants which has all the characteristics that are needed for static variables (and it is not write protected, which is important!). This CSECT is started by the P-Code instruction CST and the field definitions are done by DFC instructions.

I decided to put the static variables there, too.

This turned out to be fairly easy. The offsets are calculated from the same location counter (which I renamed to CONSTLCOUNTER in PASCAL1.PAS). The DFCs initialize the static variables to zero for numeric data, blank for characters and strings, NIL for pointers, and hex zero for all complex types (arrays, sets, records). That is: inside a record, pointers and chars are NOT set to NIL and blank, respectively, but to hex zero. Maybe this can be done better, later.

A VALUE type initialization like in Pascal VS has to be done later, too.

When the first static definition is encountered inside a block and there was no STATIC CSECT so far, the STATIC CSECT is started (by issuing a CST instruction). The name of the STATIC CSECT is derived from the name of the Code CSECT. Every reference to a static variable is in fact an indirect reference, because the base address of the STATIC CSECT has to be loaded first (this was easy, because the compiler already knew how to do this from the structured constant logic).

In the first run, the base address of the STATIC CSECT was loaded via a V-address constant.

Then I realized that SNAPSHOT (aka PASSNAP) has to be extended to enable it to show static variables, too.

I added a flag (S = static, A = auto) to the file QRR, which is written by the compiler in the debug case. PASSNAP reads this file when printing variables. However, to be able to access the static variables, PASSNAP must have access to the base address of the STATIC CSECT. So I extended the header information of the Code CSECT and put the static address just behind the CSECT information, 4 bytes before the target of the initial branch. PASSNAP can find it by examining the displacement of the initial branch instruction at EPA (entry point address) and subtracting 4; this is necessary because the length of the CSECT information may vary between main program and subprograms (for example).
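The lookup that PASSNAP performs can be sketched in C as follows. This is an illustration only: it assumes that the instruction at the EPA is a 370 RX-format branch whose 12-bit displacement is relative to the entry point itself, and that the static address is stored as a 4-byte big-endian word. The function name static_base_address is invented.

```c
#include <assert.h>
#include <stdint.h>

/* Given a pointer to the entry point (EPA) of a Code CSECT, return the
   address word stored 4 bytes before the target of the initial branch.
   Assumed RX layout: opcode, R1/X2, then B2 (4 bits) and D2 (12 bits). */
static uint32_t static_base_address(const unsigned char *epa)
{
    unsigned disp = ((unsigned)(epa[2] & 0x0F) << 8) | epa[3];
    const unsigned char *p;

    assert(disp >= 4);      /* there must be room for the address word */
    p = epa + disp - 4;     /* 4 bytes before the branch target        */
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Because the displacement is read from the branch itself, the lookup works no matter how long the CSECT information in front of the address word is.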

Because now the static address is stored at a known place in the Code CSECT header, I replaced all the V-constant references by references to this V-address constant (which can be located using the base register 10).

So it is now possible to have up to 4k of static variables added to every procedure or function and to the main program, which are initialized by the compiler in the way outlined above, and which keep their values across invocations.

Here is an example of how static definitions are used in the heap manager (PASLIBX) to hold a static heap control block (and other information, that is: static counters of allocation and free requests) and to provide initial values that make sense to the application:


type

(****************************************************)
(* HANC: Heap Element                               *)
(****************************************************)

     PHANC = -> HANC ;
     HANC = record
              ...
            end ;

(****************************************************)
(* HPCB: Heap Control Block                         *)
(****************************************************)

     PHPCB = -> HPCB ;
     HPCB = record
              EYECATCH : CHAR4 ;
              FIRST : PHANC ;
              LAST : PHANC ;
              ACTIVE : INTEGER ;
            end ;

static HEAPCB : HPCB ;
       PHEAP : PHPCB ;

(****************************************************)
(* static variables:                                *)
(* heap control block                               *)
(* pointer to heap control block (initially nil)    *)
(* counters for allocs and frees                    *)
(****************************************************)

       ANZ_ALLOCS : INTEGER ;
       ANZ_FREES : INTEGER ;

...

(****************************************************)
(* obtain pointer to heap control block             *)
(****************************************************)

  if PHEAP = NIL then
    begin
      PHEAP := ADDR ( HEAPCB ) ;
      PHEAP -> . EYECATCH := 'HPCB' ;
      PHEAP -> . FIRST := NIL ;
      PHEAP -> . LAST := NIL ;
      PHEAP -> . ACTIVE := 1 ;
    end (* then *) ;


Extending PASSNAP aka SNAPSHOT

As mentioned before, PASSNAP had to be extended to be able to show the static variables, too.

When doing those changes to PASSNAP, I did some other reworking, too:

I furthermore wanted PASSNAP to output the offsets and addresses of the variables, too - and the storage class (auto or static).

See the following example output of PASSNAP and compare it to the output of the same program (same Snapshot) in the 1979 Stanford documentation - see below in the Documentation paragraph. I added some static definitions in the main program and in the FIBONACCI procedure, which - of course - were not present in 1979.


fibonacci # 10 is
*************************
***** SNAPSHOT DUMP *****
*************************

**** SNAPSHOT WAS CALLED BY --> 'PASCAL_MONITOR'

**** RUN ERROR: 1002 FROM LINE: 37 OF PROCEDURE: 'FIBONACCI'
     EPA address of FIBONACCI is 000205A8
     Error offset is 00B4

**** SUBRANGE VALUE IS OUT OF RANGE.
**** THE OFFENDING VALUE: -1 IS NOT IN THE RANGE: 0..30

**** VARIABLES FOR 'FIBONACCI ***
     Stack at address 0002B590
     Static variables at address 00020598
J (A/0070/0002B600) = 2
ANZCALL (S/0008/000205A0) = 10

**** PROCEDURE 'FIBONACCI' WAS CALLED BY --> 'FIBONACCI' FROM LINE: 37
     EPA address of FIBONACCI is 000205A8
     Call offset is 0098

**** VARIABLES FOR 'FIBONACCI ***
     Stack at address 0002B518
     Static variables at address 00020598
J (A/0070/0002B588) = 3
ANZCALL (S/0008/000205A0) = 10

**** PROCEDURE 'FIBONACCI' WAS CALLED BY --> 'FIBONACCI' FROM LINE: 37
     EPA address of FIBONACCI is 000205A8
     Call offset is 0098

**** VARIABLES FOR 'FIBONACCI ***
     Stack at address 0002B4A0
     Static variables at address 00020598
J (A/0070/0002B510) = 4
ANZCALL (S/0008/000205A0) = 10

**** PROCEDURE 'FIBONACCI' WAS CALLED BY --> 'FIBONACCI' FROM LINE: 37
     EPA address of FIBONACCI is 000205A8
     Call offset is 0098

**** VARIABLES FOR 'FIBONACCI ***
     Stack at address 0002B428
     Static variables at address 00020598
J (A/0070/0002B498) = 5
ANZCALL (S/0008/000205A0) = 10

**** PROCEDURE 'FIBONACCI' WAS CALLED BY --> 'FIBONACCI' FROM LINE: 37
     EPA address of FIBONACCI is 000205A8
     Call offset is 0098

**** VARIABLES FOR 'FIBONACCI ***
     Stack at address 0002B3B0
     Static variables at address 00020598
J (A/0070/0002B420) = 6
ANZCALL (S/0008/000205A0) = 10

**** PROCEDURE 'FIBONACCI' WAS CALLED BY --> 'FIBONACCI' FROM LINE: 37
     EPA address of FIBONACCI is 000205A8
     Call offset is 0098

**** VARIABLES FOR 'FIBONACCI ***
     Stack at address 0002B338
     Static variables at address 00020598
J (A/0070/0002B3A8) = 7
ANZCALL (S/0008/000205A0) = 10

**** PROCEDURE 'FIBONACCI' WAS CALLED BY --> 'FIBONACCI' FROM LINE: 37
     EPA address of FIBONACCI is 000205A8
     Call offset is 0098

**** VARIABLES FOR 'FIBONACCI ***
     Stack at address 0002B2C0
     Static variables at address 00020598
J (A/0070/0002B330) = 8
ANZCALL (S/0008/000205A0) = 10

**** PROCEDURE 'FIBONACCI' WAS CALLED BY --> 'FIBONACCI' FROM LINE: 37
     EPA address of FIBONACCI is 000205A8
     Call offset is 0098

**** VARIABLES FOR 'FIBONACCI ***
     Stack at address 0002B248
     Static variables at address 00020598
J (A/0070/0002B2B8) = 9
ANZCALL (S/0008/000205A0) = 10

**** PROCEDURE 'FIBONACCI' WAS CALLED BY --> 'FIBONACCI' FROM LINE: 37
     EPA address of FIBONACCI is 000205A8
     Call offset is 0098

**** VARIABLES FOR 'FIBONACCI ***
     Stack at address 0002B1D0
     Static variables at address 00020598
J (A/0070/0002B240) = 10
ANZCALL (S/0008/000205A0) = 10

**** PROCEDURE 'FIBONACCI' WAS CALLED BY --> '$PASMAIN' FROM LINE: 46
     EPA address of FIBONACCI is 000205A8
     EPA address of $PASMAIN is 000206D8
     Call offset is 0108

**** VARIABLES FOR '$PASMAIN ***
     Stack at address 0002B068
     Static variables at address 00020580
I (A/0160/0002B1C8) = 10
TIME (A/0164/0002B1CC) = 13
TESTDUMP (S/0008/00020588) = 42
TESTCHAR (S/000C/0002058C) = 'Oppolzer '

**** END OF SNAPSHOT DUMP ****

Here is the source code of the 2016 version of FIBDEMO.PAS:


program FIB_DEMO ( OUTPUT ) ;

(********)
(*$A+   *)
(********)

type POS_INT = 0 .. 30 ;

var I : POS_INT ;
    TIME : INTEGER ;

static TESTDUMP : INTEGER ;
       TESTCHAR : array [ 1 .. 10 ] of CHAR ;


function FIBONACCI ( J : POS_INT ) : INTEGER ;

(******************************)
(* to evaluate fibonacci # j, *)
(* for j >= 0                 *)
(* subject to int overflow    *)
(******************************)

   static ANZCALL : INTEGER ;

   begin (* FIBONACCI *)
     ANZCALL := ANZCALL + 1 ;
     if J = 0 then
       FIBONACCI := 0
     else
       if J = 1 then
         FIBONACCI := 1
       else
         FIBONACCI := FIBONACCI ( J - 1 ) + FIBONACCI ( J - 3 ) ;
   end (* FIBONACCI *) ;


begin (* MAIN PROGRAM *)
  TESTDUMP := 42 ;
  TESTCHAR := 'Oppolzer' ;
  for I := 10 to 25 do
    begin
      TIME := CLOCK ( 0 ) ;
      WRITELN ( ' fibonacci # ' , I : 3 , ' is ' ,
                FIBONACCI ( I ) : 8 , ' (Comp.time = ' ,
                CLOCK ( 0 ) - TIME : 5 , ' Milli Sec.)' ) ;
    end (* for *)
end (* MAIN PROGRAM *) .


New functions MEMSET and MEMCPY

Because of limitations on string lengths (the maximum string constant length is 64, at the moment), I wanted to have a function like MEMSET to be able to initialize a larger string to blanks, for example.

I added a new module PASSTRX (which is supposed to support more string functions in the future, when I want to add a STRING datatype supporting variable length strings). At the moment, the module PASSTRX only contains an external entry $PASSTR, which implements the two new library functions MEMSET and MEMCPY.

The functions are defined much the same as their C counterparts:

At the moment, the functions are implemented using simple Pascal loops; a better solution involving loops of MVCs or MVCL has to be done later.
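A byte-by-byte loop version of the two functions looks roughly like this in C. The names pas_memset and pas_memcpy are placeholders chosen for this sketch, and the parameter order is taken from the C functions and from the test program that follows; this is not the PASSTRX source.

```c
#include <assert.h>
#include <stddef.h>

/* Fill len bytes starting at dest with the character ch. */
static void pas_memset(void *dest, char ch, size_t len)
{
    unsigned char *d = dest;
    size_t i;

    assert(dest != NULL || len == 0);
    for (i = 0; i < len; i++)
        d[i] = (unsigned char)ch;
}

/* Copy len bytes from src to dest; the areas must not overlap. */
static void pas_memcpy(void *dest, const void *src, size_t len)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    size_t i;

    assert((dest != NULL && src != NULL) || len == 0);
    for (i = 0; i < len; i++)
        d[i] = s[i];
}
```

A loop like this touches one byte per iteration, which is why block move instructions such as MVC or MVCL are the obvious later optimization.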

Here is a short example program which I used to test the new functions:


program TESTMEM ( OUTPUT ) ;

var BUFFER : array [ 1 .. 1000 ] of CHAR ;
    BUFFER2 : array [ 1 .. 200 ] of CHAR ;
    I : INTEGER ;

begin (* MAIN PROGRAM *)
  MEMSET ( ADDR ( BUFFER2 ) , 'A' , 200 ) ;
  MEMSET ( ADDR ( BUFFER ) , ' ' , 1000 ) ;
  MEMCPY ( ADDR ( BUFFER ) , ADDR ( BUFFER2 ) , 200 ) ;
  MEMCPY ( PTRADD ( ADDR ( BUFFER ) , 400 ) , ADDR ( BUFFER2 ) , 200 ) ;
  for I := 1 to 1000 do
    begin
      if I MOD 50 = 1 then
        WRITE ( I : 6 , ': ' ) ;
      WRITE ( BUFFER [ I ] ) ;
      if I MOD 50 = 0 then
        WRITELN ;
    end (* for *)
end (* MAIN PROGRAM *) .


Direct WRITE of scalar variables or expressions

Up until now, the compiler did not support direct WRITE of scalar variables; this was flagged as an implementation restriction. Other compilers (FPC for example) support this; they output the name of the scalar value (much the same way as our compiler does it for the type BOOLEAN).

Here is a short example program with direct WRITE of scalar values:


program TESTSCAL ( INPUT , OUTPUT ) ;

(********)
(*$A+   *)
(********)

type FARBE = ( GELB , ROT , GRUEN , BLAU ) ;
     OPTYPE = ( PCTS , PCTI , PLOD , PSTR , PLDA , PLOC , PSTO , PLDC ,
                PLAB , PIND , PINC , PPOP , PCUP , PENT , PRET , PCSP ,
                PIXA , PEQU , PNEQ , PGEQ , PGRT , PLEQ , PLES , PUJP ,
                PFJP , PXJP , PCHK , PNEW , PADI , PADR , PSBI , PSBR ,
                PSCL , PFLT , PFLO , PTRC , PNGI , PNGR , PSQI , PSQR ,
                PABI , PABR , PNOT , PAND , PIOR , PDIF , PINT , PUNI ,
                PINN , PMOD , PODD , PMPI , PMPR , PDVI , PDVR , PMOV ,
                PLCA , PDEC , PSTP , PSAV , PRST , PCHR , PORD , PDEF ,
                PRND , PCRD , PXPO , PBGN , PEND , PASE , PSLD , PSMV ,
                PMST , PUXJ , PXLB , PCST , PDFC , PPAK , PADA , PSBA ,
                UNDEF_OP ) ;

var X : FARBE ;
    OPC : OPTYPE ;

begin (* MAIN PROGRAM *)
  for X := GELB to BLAU do
    begin
      WRITELN ( 'Farbe: ' , ORD ( X ) , ' ' , '<' , PRED ( X ) : 10 ,
                '#' , X : 10 , '#' , SUCC ( X ) : 10 , '>' ) ;
    end (* for *) ;
  WRITELN ( 'liste aller opcodes' ) ;
  for OPC := PCTS to UNDEF_OP do
    begin
      WRITE ( OPC , ' ' ) ;
      if ORD ( OPC ) MOD 10 = 9 then
        WRITELN
    end (* for *) ;
end (* MAIN PROGRAM *) .

To support this, the compiler must build a table of the scalar names in the constant section.

Because I already did some work involving the constant section when adding the static variables, I knew what to do (the big picture).

So I added DFC instructions to the constant section when parsing a scalar type definition, including a table of offsets and lengths, so that the new scalar write function (a new CSP which I called WRX) will find the string corresponding to the internal scalar value. BTW: the internal scalar values are assigned from 0 to the maximum value by the compiler. The offset of this static table is recorded in the type information, so that it can be retrieved from there when needed; the same goes for the CSECT name of the constant section. This is needed because the scalar variable may be accessed at deeper levels, but the constant array resides where the type definition is.
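The lookup that WRX has to perform can be sketched in C like this. The table layout (one offset/length pair per value, pointing into a block of concatenated names) is an assumption for illustration, since the exact DFC layout is not given here; the fallback format for out-of-range values follows the test output of TESTSCAL.

```c
#include <assert.h>
#include <stdio.h>

/* One entry per scalar value: where its name starts and how long it is. */
struct name_entry {
    unsigned short offset;
    unsigned short length;
};

/* Hypothetical table for the type FARBE = (GELB, ROT, GRUEN, BLAU). */
static const char farbe_names[] = "GELBROTGRUENBLAU";
static const struct name_entry farbe_table[] = {
    { 0, 4 }, { 4, 3 }, { 7, 5 }, { 12, 4 }
};

/* Format the name of a scalar value into out. Values outside the type
   are written as WRX: followed by four hex digits. */
static void wrx(int value, const char *names,
                const struct name_entry *table, int count,
                char *out, size_t outsize)
{
    assert(out != NULL && outsize > 0);
    if (value < 0 || value >= count)
        snprintf(out, outsize, "WRX:%04X", (unsigned)value & 0xFFFFu);
    else
        snprintf(out, outsize, "%.*s", (int)table[value].length,
                 names + table[value].offset);
}
```

For example, wrx(2, farbe_names, farbe_table, 4, buf, sizeof buf) stores GRUEN in buf, while a value of -1 (PRED of GELB) yields WRX:FFFF.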

This was kind of easy, but what made things really hard this time: there was an error in the code generation, which had nothing to do with this extension, but it took me a long time to find and to repair it.

Even code like the following


WRITE (R : 10 : ST);

where R is a real value and ST is a static variable (or part of a structured const) has this sort of problem, because the base address of the constant section was loaded very early into a "wrong" register. But WRITE absolutely needs the 3 parameters in the registers 2, 3 and 4 - otherwise the code generation crashes.

This is a flaw of the PCODE translator PASCAL2, and I fixed it (this time) by adding some logic to the routine FILESETUP, that is: if the parameters appear in the "wrong" registers, some LRs are generated and the needed registers are freed this way. This is not perfect, because it generates unnecessary LRs, but it works for the moment.

After this modification, all was ok (even the OLD errors), and I only had to add the new standard function WRX to the Pascal monitor (which involves some ASSEMBLER work).

When all was completed, the test program (see above) produced the following output:


prun testscal
FILEDEF INPUT TERM ( RECFM V
FILEDEF OUTPUT TERM ( RECFM V
FILEDEF PASTRACE TERM ( RECFM F
EXEC PASRUN TESTSCAL
Farbe: 0 < WRX:FFFF# GELB# ROT>
Farbe: 1 < GELB# ROT# GRUEN>
Farbe: 2 < ROT# GRUEN# BLAU>
Farbe: 3 < GRUEN# BLAU# WRX:0004>
liste aller opcodes
PCTS PCTI PLOD PSTR PLDA PLOC PSTO PLDC PLAB PIND
PINC PPOP PCUP PENT PRET PCSP PIXA PEQU PNEQ PGEQ
PGRT PLEQ PLES PUJP PFJP PXJP PCHK PNEW PADI PADR
PSBI PSBR PSCL PFLT PFLO PTRC PNGI PNGR PSQI PSQR
PABI PABR PNOT PAND PIOR PDIF PINT PUNI PINN PMOD
PODD PMPI PMPR PDVI PDVR PMOV PLCA PDEC PSTP PSAV
PRST PCHR PORD PDEF PRND PCRD PXPO PBGN PEND PASE
PSLD PSMV PMST PUXJ PXLB PCST PDFC PPAK PADA PSBA
UNDEF_OP
Ready; T=0.02/0.08 13:21:00

BTW: you can see from this example that values outside the range of the scalar type do not produce abends, but instead WRX writes the hex value of the scalar (4 digits), prefixed by WRX:


Maximum length of string constants is now 254 (was 64)

Char arrays can be defined larger than 64 or even 254, of course. But before this extension, it was not possible to initialize char arrays larger than 64 to blank by simply assigning one blank (which IS possible for strings shorter than 64 since my extension concerning shorter string constants from September 2016). This is because the shorter string has to be extended internally to the corresponding target length, but that is (or was) not possible if the target length exceeds 64.

Of course, "true" string constants of length more than 64 were not very interesting in the past, because string constants were not allowed to span lines, and the lines are limited to 80 chars in the most common case.

So the first thing I did: I changed the source code scanner (INSYMBOL) in a way that it allowed for more than one string constant in sequence in place of one single string constant. So the Pascal programmer can simply terminate the string constant on one line and restart it on the next. See the following example:


program TESTLSTR ( OUTPUT ) ;

(********)
(*$A+   *)
(********)

var ZEILE : array [ 1 .. 200 ] of CHAR ;
    I : INTEGER ;

begin (* MAIN PROGRAM *)
  ZEILE := 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
           'bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb'
           'cccccccccccccccccccccccccccccccccccc'
           'dddddddddddddddddddddddddddddddddddd'
           'eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee' ;
  WRITELN ( ZEILE ) ;
  MEMSET ( ADDR ( ZEILE ) , 'b' , 200 ) ;
  WRITELN ( ZEILE ) ;
  for I := 1 to 200 do
    ZEILE [ I ] := '=' ;
  WRITELN ( ZEILE ) ;
end (* MAIN PROGRAM *) .

Of course, this didn't compile in the first place, because the resulting string and the target were both longer than 64. I changed the limit to 254 (which is a kind of arbitrary limit, but I believed that it would be better to stay below the MVC limit). The compiler had no problem, but the generated P-Code was not accepted by the P-Code translator, due to long string constants (on LCA/M and DFC instructions, as it turned out).

I changed the compiler, so that long string constants are split and written on several lines. Furthermore, I put a length field in front of the string constants; maybe later the string constant can be trimmed to the right, so that trailing blanks need not be written to the P-code file. The length field is optional; if it is not present (that is, there is a quote immediately following the comma), it works the same as before.

Here is an example of a long LCA/M instruction (from the example above):


 LOC 16
 LCA M,200,'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa',
     'aaaaaabbbbbbbbbbbbbbbbbbbbbbbbb',
     'bbbbbbbbbbbcccccccccccccccccccc',
     'ccccccccccccccccdddddddddddddd',
     'ddddddddddddddddddddddeeeeeeee',
     'eeeeeeeeeeeeeeeeeeeeeeeeeeee ',
     ' '
 MOV 200

Note the length information (200); the commas at the end (after the string constants) indicate that there is a continuation on the following line.
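The splitting of a long constant into continuation lines can be sketched in C as follows. The function name split_constant and the chunk width are assumptions for this illustration, and the sketch does not handle quote characters inside the string; it only shows the rule that every piece except the last ends with a comma.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write s into out as a sequence of quoted pieces of at most width
   characters. Every piece except the last is followed by a comma and
   a newline, marking a continuation. Returns the number of characters
   written, or 0 if out is too small (or s is empty). */
static size_t split_constant(const char *s, size_t width,
                             char *out, size_t outsize)
{
    size_t len = strlen(s), pos = 0, used = 0;

    assert(width > 0 && outsize > 0);
    out[0] = '\0';
    while (pos < len) {
        size_t n = len - pos < width ? len - pos : width;
        int last = (pos + n == len);
        int w = snprintf(out + used, outsize - used, "'%.*s'%s",
                         (int)n, s + pos, last ? "" : ",\n");

        if (w < 0 || (size_t)w >= outsize - used)
            return 0;       /* output buffer too small */
        used += (size_t)w;
        pos += n;
    }
    return used;
}
```

The reader on the other side (PASCAL2 in the text) only has to check whether a piece is followed by a comma to know that another piece follows on the next line.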

Of course, the P-Code translator (PASCAL2) had to be changed to accept this continuation, too.

Apart from those changes, there were no more changes necessary to the code generator. All worked well.


Making CSP numbers consistent between PASCAL1, PASCAL2 and PASMONN

When I did some maintenance to PASCAL2 (the P-Code translator, which translates P-Code to 370 machine code), I discovered that the internal numbers of the standard procedures (P-Code identifier = CSP) were totally different between the two compiler passes.

The compiler (first pass) defined the CSPs as a subrange type (0 .. NSPROC) and the second pass as a scalar type (CSPTYPE), and the sequence of the CSPs was different. There was no correlation between the numbers of the first pass and the sequence in the scalar type of the second pass. This was very disturbing.

The communication between the two passes goes through the P-Code file; the CSP instruction has the CSP name as parameter (e.g. CSP WLN for WRITELN). When the second pass reads the CSP instruction, it translates the CSP name to its internal representation (the scalar type), which is different from that of the first pass.

There is a strong linkage between the second pass and the PASMONN run time system; the second pass generates code involving the CSP number when calling a CSP. The CSP number is used as an index into a branch table in PASMONN. So if the CSP numbers are changed, PASMONN and its branch table (see entry $PASCSP) will have to be changed, too.
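The dependency between the scalar type and the branch table can be sketched in C. The enum below is a shortened, hypothetical stand-in for CSPTYPE, and the handlers merely return their own mnemonic; the point is only that the P-Code file carries the name, while the generated code carries the number, so the order of the enum and the order of the table must agree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shortened stand-in for CSPTYPE; the real type has many more values. */
enum csp { PCTR, PWRI, PWLN, UNDEF_CSP };

/* Names as they appear in the P-Code file, in enum order. */
static const char *const csp_names[] = { "CTR", "WRI", "WLN" };

/* Second pass: translate the CSP name read from P-Code to its number. */
static enum csp csp_lookup(const char *name)
{
    int i;

    for (i = 0; i < UNDEF_CSP; i++)
        if (strcmp(csp_names[i], name) == 0)
            return (enum csp)i;
    return UNDEF_CSP;
}

/* Run time: placeholder handlers, one per CSP. */
static const char *do_ctr(void) { return "CTR"; }
static const char *do_wri(void) { return "WRI"; }
static const char *do_wln(void) { return "WLN"; }

typedef const char *(*csp_handler)(void);

/* Branch table; its order must match the enum exactly. */
static const csp_handler branch_table[] = { do_ctr, do_wri, do_wln };

static const char *call_csp(enum csp n)
{
    assert(n < UNDEF_CSP);
    return branch_table[n]();
}
```

If the enum were reordered without reordering branch_table, already translated code would silently reach the wrong routine, which is the situation described for existing TEXT files.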

When I first tried to change the CSP sequence in October 2016, this ended up in a big disaster, because it is not sufficient to change PASMONN and PASCAL2 in sync. After you change PASMONN, you need to recompile everything, because all your existing TEXT files are invalid, but you have no valid compiler to do this, and PASCAL2 (which is worse) would need a compile by the new version of PASCAL2 ...

So I first had no idea how to do this.

Then, in November, I had the idea: PASMONN needs a second entry besides $PASCSP, which supports the new CSPTYPE, that is: a second branch table. I did the following changes:

This is the CSPTYPE definition of PASCAL2:


CSPTYPE = ( PCTR , PN01 , PN02 , PN03 , PN04 , PN05 , PN06 , PN07 ,
            PN08 , PN09 , PPAG , PGET , PPUT , PRES , PREW , PRDC ,
            PWRI , PWRE , PWRR , PWRC , PWRS , PWRX , PRDB , PWRB ,
            PRDR , PRDH , PRDY , PEOL , PEOT , PRDD , PWRD , PCLK ,
            PWLN , PRLN , PRDI , PEOF , PELN , PRDS , PTRP , PXIT ,
            PFDF , PSIO , PEIO , PMSG , PSKP , PLIM , PTRA , PWRP ,
            UNDEF_CSP ) ;

BTW: I don't want to add new CSPs if not absolutely necessary. If new runtime functions are needed, I usually try to code them in Pascal and add them using the new library module mechanism; see other subpages (examples: ALLOC, FREE, MEMCPY, MEMSET etc.).


Call CMS commands from PASCAL programs

To enable the snapshot module PASSNAP to show variables from modules, too, it is necessary to open files using CMS FILEDEF commands which are issued at runtime (command built from Pascal variables).

I recall that at the Stuttgart University installation in 1984 and 1985 (a 4331 and a 3083 running VM/CMS), there was an external subroutine called CMSX, which could be called from Pascal VS (see documentation) and could call CMS commands.

So I analyzed some programs which I found on the PLI minidisk of the Hercules VM distribution and looked how a CMS command can be called. In fact, this turned out to be pretty simple; the CMS command has to be "tokenized" and then the SVC 202 (X'CA') has to be called.

I did the tokenization in Pascal, put the routine CMSX in the library module PASSTRX (where we have MEMSET and MEMCPY already) and called the ASSEMBLER function $PASSYS in PASMONN to do the SVC call. BTW: $PASSYS was called $PASSTOR before; it did the basic GETMAINs and FREEMAINs for the ALLOC and FREE function, but it was better to rename it to $PASSYS, because it does all kind of system related stuff, not only storage management.

This is the CMSX procedure in PASSTRX:


function $PASSYS ( FUNCCODE : INTEGER ; X : CHARPTR ) : CHARPTR ;

(****************************************************)
(* implemented in PASMONN.ASS; provides basic       *)
(* storage services such as GETMAIN and FREEMAIN    *)
(****************************************************)

   EXTERNAL ;


procedure CMSX ( CMD : CHARPTR ; var RETCODE : INTEGER ) ;

   const CMDEND = '#' ;

   var CMSCMD : CHAR128 ;
       CP : CHARPTR ;
       CPT : CHARPTR ;
       TOK : TOKEN ;
       I : INTEGER ;

   begin (* CMSX *)
     CP := CMD ;
     CPT := ADDR ( CMSCMD ) ;
     CMSCMD := ' ' ;
     repeat

     /***************/
     /* skip blanks */
     /***************/

       while ( CP -> = ' ' ) and ( CP -> <> CMDEND )
       and ( CP -> <> CHR ( 0 ) ) do
         CP := PTRADD ( CP , 1 ) ;

     /**************************************/
     /* if nothing is left, leave the loop */
     /**************************************/

       if ( CP -> = CMDEND ) or ( CP -> = CHR ( 0 ) ) then
         break ;

     /***********************************************/
     /* an opening parenthesis is a separate token */
     /***********************************************/

       if CP -> = '(' then
         begin
           TOK := '(' ;
           CP := PTRADD ( CP , 1 ) ;
         end (* then *)

     /*****************************************************/
     /* otherwise collect token until a delimiter appears */
     /*****************************************************/

       else
         begin
           I := 0 ;
           TOK := ' ' ;
           while ( CP -> <> ' ' ) and ( CP -> <> '(' )
           and ( CP -> <> CMDEND ) and ( CP -> <> CHR ( 0 ) ) do
             begin
               if I < SIZEOF ( TOKEN ) then
                 begin
                   I := I + 1 ;
                   TOK [ I ] := CP -> ;
                 end (* then *) ;
               CP := PTRADD ( CP , 1 ) ;
             end (* while *)
         end (* else *) ;

     /**********************************************/
     /* copy token to cmscmd if there is room left */
     /**********************************************/

       if PTRDIFF ( CPT , ADDR ( CMSCMD ) ) < SIZEOF ( CMSCMD ) - 8 then
         begin
           MEMCPY ( CPT , ADDR ( TOK ) , SIZEOF ( TOKEN ) ) ;
           CPT := PTRADD ( CPT , SIZEOF ( TOKEN ) ) ;
         end (* then *)
     until FALSE ;

     /*********************/
     /* append end marker */
     /*********************/

     MEMSET ( CPT , CHR ( 255 ) , 8 ) ;

     /**************************************/
     /* call CMS command via $PASSYS / 11 */
     /**************************************/

     if FALSE then
       WRITELN ( 'test cmsx: <' , CMSCMD , '>' ) ;
     RETCODE := PTR2INT ( $PASSYS ( 11 , ADDR ( CMSCMD ) ) ) ;
     if FALSE then
       WRITELN ( 'test cmsx: retcode = ' , RETCODE ) ;
   end (* CMSX *) ;

and here are some CMSX sample calls (the commands must be terminated by '#' or CHR (0)):


CMSCMD := 'FILEDEF TESTFILE DISK TESTFILE INPUT '
          '(RECFM F LRECL 80 #' ;
WRITELN ( CMSCMD ) ;
CMSX ( ADDR ( CMSCMD ) , RC ) ;
WRITELN ( '... liefert RC = ' , RC : 1 ) ;
CMSCMD := 'STATE XXXX YYYY A#' ;
WRITELN ( CMSCMD ) ;
CMSX ( ADDR ( CMSCMD ) , RC ) ;
WRITELN ( '... liefert RC = ' , RC : 1 ) ;
CMSCMD := 'STATE TESTDBG LISTING A#' ;
WRITELN ( CMSCMD ) ;
CMSX ( ADDR ( CMSCMD ) , RC ) ;
WRITELN ( '... liefert RC = ' , RC : 1 ) ;


PASSNAP prints variables from more than one source file

PASSNAP (the Snapshot routine) uses debug information files which are written by the compiler. These files contain information about the layout of structure types, the location of variables, the variable names of course, procedure names and so on.

At runtime, PASSNAP examines the save area trace, gets the entry point and the stack addresses from there and so it can access the storage and print the variables in Pascal notation.

Up until now, PASSNAP only could print the variables of one source file; that is the file which is run by the PASRUN command, that is, the main program. External procedures which reside in separate modules could not be handled by PASSNAP, because those modules had a separate debug information file (called "modulename" QRR up until now), and PASSNAP could open only one, which was FILEDEFed statically in the PASRUN EXEC.

So to enable PASSNAP to print variables in other modules, too, several problems had to be solved:

This all worked without many problems; see below the coding from the procedure PRINT_VARIABLE in PASSNAP, where the debug information file is opened:


if not ( PID in PROCSTOCOME ) then
   begin
      if FALSE then
         WRITELN ( 'print_variable: try reset(qrd), sourcename = ' ,
                   SOURCENAME ) ;
      CMSCMD := 'STATE XXXXXXXX DBGINFO * #' ;
      INS_SOURCENAME ( CMSCMD , SOURCENAME , 7 ) ;
      CMSX ( ADDR ( CMSCMD ) , RC ) ;
      if FALSE then
         WRITELN ( 'print_variable: state command returns rc = ' , RC ) ;
      STATERC := RC ;
      CMSCMD := 'FILEDEF QRD CLEAR #' ;
      CMSX ( ADDR ( CMSCMD ) , RC ) ;
      if STATERC = 0 then
         begin
            CMSCMD := 'FILEDEF QRD DISK XXXXXXXX DBGINFO * '
                      '(RECFM F LRECL 80#' ;
            INS_SOURCENAME ( CMSCMD , SOURCENAME , 18 ) ;
            CMSX ( ADDR ( CMSCMD ) , RC ) ;
            if FALSE then
               WRITELN ( 'print_variable: filedef returns rc = ' , RC ) ;
            RESET ( QRD ) ;
            PROCSTOCOME := [ 0 .. 255 ]
         end (* then *)
      else
         begin
            WRITELN ( OUTPUT , ' Debug information file for '
                      'program/module ' , SOURCENAME , ' not found' ) ;
            WRITELN ( OUTPUT ) ;
            return ;
         end (* else *) ;
   end (* then *) ;

You will find an example of a Snapshot output with multiple modules in the next story.


PASSNAP prints heap variables allocated by NEW and ALLOC

Heap variables allocated by the standard procedure NEW are printed by PASSNAP. If PASSNAP encounters a pointer variable during variable processing, it checks the value, that is: the address in the pointer variable. If it lies between the bounds of the allocated heap storage, it prints the contents of the heap storage; because pointers in Pascal are typed, this is done in the same way as for every other variable, that is: the variable is shown in Pascal notation.

Checking for heap boundaries is easy, because the "classical" heap in Stanford Pascal is a contiguous area of storage which is allocated in its maximum size at program start time.

Heap variables allocated by the new ALLOC function were not shown, because the addresses are outside the bounds of the "classical" Pascal heap storage.

I changed PASSNAP so that, after checking for the "classical" heap, it checks for ALLOC areas, too. This was done by creating a new function called CHKALLOC, which is implemented in PASLIBX. It walks through the list of heap segments and checks whether the given address lies within one of the allocated heap segments; if so, it returns the address of the area, if the area is the beginning of an allocated segment, and the address of the HANC, if the address is another address inside the HANC. In all other cases (address not in heap), it returns NIL. So a result other than NIL means that the address lies inside a heap segment and that the heap storage can be accessed and printed.
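
The segment lookup can be sketched in C as follows. This is only an illustration under stated assumptions: the type HeapSegment, its fields and the function find_segment are hypothetical names, the real CHKALLOC is part of PASLIBX and written in Pascal, and the distinction it makes between the start of an area and the HANC is left out here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor of one ALLOC heap segment; the real layout
   used by PASLIBX is not shown in this text. */
typedef struct HeapSegment {
    struct HeapSegment *next;   /* next segment in the list, or NULL */
    uintptr_t start;            /* address of the first byte */
    size_t length;              /* size of the segment in bytes */
} HeapSegment;

/* Walk the list and return the segment that contains addr, or NULL
   if addr lies in no segment (the analogue of a NIL result). */
const HeapSegment *find_segment(const HeapSegment *list, uintptr_t addr)
{
    for (const HeapSegment *seg = list; seg != NULL; seg = seg->next) {
        if (addr >= seg->start && addr - seg->start < seg->length)
            return seg;
    }
    return NULL;
}
```

A caller in the role of PASSNAP would treat a non-NULL result as permission to read and print the storage behind the pointer.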

PASSNAP was further extended

Here you have an example of a new Snapshot output; it was issued from a test version of the storage management routines in PASLIBX which did a SNAPSHOT at a certain place.

You will also see that the variables in this example come from separate modules (from PASLIBX and from the main module, which is called TESTDBG; the CSECT name is $PASMAIN).

Snapshot output with heap storage and multiple modules


Porting Stanford Pascal to Windows, OS/2 and Linux - first steps

Since 2012, I have planned to run Stanford Pascal programs on ASCII based platforms, too. The final goal is, of course, to run the compiler on those platforms and to get the same results when running the programs compiled there.

But a first step would be to get some programs executed on Windows, for example, which behave the same as the original programs on VM/370.

To reach this goal, I wrote a P-Code interpreter program in C, which should be able to run on every platform that supports C. This program, called PCINT.C, first assembles the P-Code source into an internal (binary) representation which is held in storage, and then it interprets (that is: executes) this P-Code program, either controlled by commands from the console, or uncontrolled, that is, fast.

I started to write this program in 2012, but I soon paused work, because I discovered some severe portability issues in the P-Code which I couldn't resolve at that time. For example, some char values were represented by their numeric value (EBCDIC code point) in the P-Code, so running this on an ASCII platform would lead to wrong results. Another example: char set constants are implemented as bit strings (in fact: strings of integer constants), which are tied to the specific character set used.

Some examples from real Pascal programs with the corresponding P-Code:


/***************************************************/
/*   Pascal code snippets with portability issues  */
/***************************************************/

CSET := [ 'B' , 'E' , 'R' , 'N' , 'D' ] ;
...
for C := 'a' to 'z' do
   if C in CSET then
      CSET := CSET + [ MAJOR ( C ) ] ;

/***************************************************/
/*   generated P-Code                              */
/***************************************************/

     LOC 51
     LDA 1,364
     LCA S,(0,0,0,0,0,0,0,0,0,0,0,0,11264,1088)   -- set constant
     SLD 28,396
     SMV 32,28
     ...
     LOC 56
     LDC C,'a'
     STR C,1,356
     ...
L12  LAB
     LOD C,1,356
     ORD
     LDC I,169                                    -- should be 'z'
     NEQ I
     FJP L11

In 2016, now that I was able to make significant changes to the compiler, I continued the work on the P-Code interpreter. When I got some programs running, including the Fibonacci test program involving recursive procedure calls, I again ran into the portability issues from 2012. I had to change the compiler on VM/370 (both passes) so that the generated P-Code became more portable across platforms.

For example, the P-Code for the example Pascal statements above now looks like this:


/***************************************************/
/*   better P-Code                                 */
/***************************************************/

     LOC 51
     LDA 1,364
     LCA S,C28'BDENR'
     SLD 28,396
     SMV 32,28
     ...
     LOC 56
     LDC C,'a'
     STR C,1,356
     ...
L12  LAB
     LOD C,1,356
     ORD
     LDC C,'z'
     ORD
     NEQ I
     FJP L11
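
To illustrate why the old bit string form is tied to one character set, here is a small C sketch (not taken from the compiler) that builds the 16-bit words from the code points of the set members. The function name set_to_words, the two sample tables and the MSB-first bit numbering are assumptions; this numbering was chosen because, with the EBCDIC code points, it reproduces the constants 11264 and 1088 from the old P-Code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build the bit string form of a char set as 16-bit words.
   Assumed numbering: element 0 is the leftmost bit of word 0.
   codes[] holds the members' code points in ONE character set,
   which is exactly what makes the result non-portable. */
void set_to_words(const unsigned char *codes, size_t n,
                  uint16_t *words, size_t nwords)
{
    memset(words, 0, nwords * sizeof words[0]);
    for (size_t i = 0; i < n; i++) {
        size_t word = codes[i] / 16;
        if (word < nwords)
            words[word] |= (uint16_t) (0x8000u >> (codes[i] % 16));
    }
}

/* Code points of 'B', 'D', 'E', 'N', 'R' in EBCDIC and in ASCII. */
static const unsigned char bernd_ebcdic[] = { 0xC2, 0xC4, 0xC5, 0xD5, 0xD9 };
static const unsigned char bernd_ascii[]  = { 0x42, 0x44, 0x45, 0x4E, 0x52 };
```

With the EBCDIC table, words 12 and 13 become 11264 and 1088; with the ASCII table the bits land in words 4 and 5 instead (11266 and 8192). The character string form C28'BDENR' avoids this, because each platform can derive the bits from its own code points.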

When this was finished, two test programs involving char sets ran successfully on Windows, yielding the same results as on Hercules (well, almost, because some character code related issues remain visible at the source code level, e.g.: SUCC ('R') is 'S' on Windows, but not on the mainframe - this is OK in my opinion and has to be handled by the Pascal code).

The two test programs:

TESTSET.PAS
TESTSET2.PAS

As a side effect, I also changed the P-Code representation of sets that are not sets of char.

Before my change, set constants were represented by strings of 16-bit integers, like this:


/***************************************************/
/*   Pascal Code; some subset of a scalar type     */
/***************************************************/

STERMSYMBOLE := [ EOFSY , STRIPU , ELSESY , ENDSY , OTHERWISESY ,
                  UNTILSY ] ;

/***************************************************/
/*   set representation in P-Code;                 */
/*   almost unreadable to human readers            */
/***************************************************/

     LOC 2291
     LCA S,(36864,768,4096,4096)
     SLD 8,1608
     SMV 8,8

If the base type of the set is char, the set is represented by a char string which contains all the characters that are part of the set. But if the base type is not char, but some sort of scalar or other simple type, the set is represented as a bit string, but it is now printed using hex digits. The first char after the comma (X vs. C) tells the representation type. For example:


/***************************************************/
/*   new set representation in P-Code;             */
/*   you can now spot the six one-bits, IMO        */
/***************************************************/

     LOC 2291
     LCA S,X8'9000030010001000'
     SLD 8,1608
     SMV 8,8
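
The relation between the old integer list and the new hex string can be checked with a few lines of C. This is an illustrative sketch only; the function names words_to_hex and count_members and the table stermsymbole are made up for this example and do not come from the compiler.

```c
#include <stdio.h>

/* Write n 16-bit words as four hex digits each;
   out must provide room for 4 * n + 1 characters. */
void words_to_hex(const unsigned short *words, int n, char *out)
{
    for (int i = 0; i < n; i++)
        snprintf(out + 4 * i, 5, "%04X", (unsigned) words[i]);
}

/* Count the one-bits, that is, the number of set members. */
int count_members(const unsigned short *words, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        for (unsigned w = words[i]; w != 0; w &= w - 1)
            count++;
    return count;
}

/* The old-style constant from the example above. */
static const unsigned short stermsymbole[] = { 36864, 768, 4096, 4096 };
```

Applied to the four integers of the old constant, words_to_hex yields 9000030010001000 and count_members yields 6, one bit per symbol in the Pascal set.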

Regarding speed: I compared the speed of the (interpreted) FIBOK program to the same program running on Hercules on the same box (which of course is "interpreted", too - by the Hercules engine, which emulates the 370 hardware). The interpreted PRR code was slightly faster than the original Pascal program running on Hercules, which is an encouraging result. Maybe it will get slower, when I continue work; but on the other hand, there may be room for some improvements, too.

At the moment, the compiler itself doesn't run successfully on Windows; some problems still need to be fixed, and some instructions aren't implemented yet.

Example of a Stanford Pascal program running on Windows

KALENDER.PAS
P-Code Interpreter running KALENDER.PRR on Windows


Porting Stanford Pascal to Windows, OS/2 and Linux - other issues

When I tested a more complex program, I discovered another portability issue. Case statements are implemented by branch tables, that is, tables of branch instructions that are indexed by the value of the case variable, but: if the case variable is of char type, the branch table is constructed based on the EBCDIC character set.

It took me some time to think about several possible solutions for this problem. The problem exists only with char based branch tables; integer subtype or ordinal based branch tables work correctly. But I wanted the same solution for all kinds of branch tables.

This example shows a case statement and its implementation using the XJP instruction before the change:


case T of
   'A' : WRITE ( ' SONNTAG,' ) ;
   'B' : WRITE ( ' MONTAG,' ) ;
   'D' : WRITE ( ' MITTWOCH,' ) ;
   'F' : WRITE ( ' FREITAG,' ) ;
   'G' : WRITE ( ' SAMSTAG,' ) ;
end (* case *) ;

/***************************************************/
/*   generated P-Code                              */
/***************************************************/

     XJP L3
     ...
L3   DEF 193
L4   DEF 199
L5   LAB
     UJP L8
     UJP L9
     UJP L6
     UJP L11
     UJP L6
     UJP L13
     UJP L14
L6   LAB

Each XJP involves four labels, starting with the number specified with XJP (in the example L3, L4, L5 and L6). The first is the lowest case value, the second is the highest, the third is the beginning of the branch table, and the fourth is the default label. The default label is the target of XJP if the case value is outside the defined range (that is, lower than the lowest or higher than the highest value), but it is also used to fill the branch table with UJP branches for values inside the range which are not used in the case statement.

The first change I made was to add a type identifier to the DEF instruction, which was not present before. See the numbers in the example, which were used in place of the character constants. The new type identifiers can be I (integer) or C (char).

The second change made the branch table portable, because each entry now includes a DEF constant and a branch target. This means that XJP is redefined in such a way that it scans the DEF constants in the branch table (which may be of type char) and branches to the associated address if it finds a match. It branches to the default address if it finds no match. The branch table may be "incomplete", because unused entries can be omitted. So for case statements with big "holes", the generated P-Code may be smaller.
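
A minimal C sketch of such a scanning XJP, as an interpreter might implement it. The type BranchEntry, the function xjp_lookup and the table weekday_table are hypothetical names for this illustration; the actual PCINT code is not shown here. The table contents correspond to the weekday case statement above, with L6 as default label.

```c
#include <stddef.h>

/* One entry of the portable branch table: a DEF constant followed
   by a UJP target. Field names are illustrative. */
typedef struct {
    int value;    /* DEF constant, expressed in the local character code */
    int target;   /* number of the label the UJP branches to */
} BranchEntry;

/* Linear scan: return the target of the matching entry, or the
   default label if no DEF constant matches the selector. */
int xjp_lookup(const BranchEntry *table, size_t count,
               int selector, int default_label)
{
    for (size_t i = 0; i < count; i++)
        if (table[i].value == selector)
            return table[i].target;
    return default_label;
}

/* Entries for 'A', 'B', 'D', 'F', 'G'; the unused values 'C' and 'E'
   are simply omitted. */
static const BranchEntry weekday_table[] = {
    { 'A', 8 }, { 'B', 9 }, { 'D', 11 }, { 'F', 13 }, { 'G', 14 }
};
```

Because the comparison uses the character constants as compiled on the local platform, the lookup gives the same branch on ASCII and EBCDIC machines; the price is a linear scan instead of an indexed jump.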

The second pass of the compiler, which generates machine code out of the P-Code, may expand the portable P-Code to the old representation of "true" branch tables, so that there is no performance degradation. (In fact, this is what PASCAL2.PAS for IBM Mainframe does today). But this is left to the second pass, which is non-portable, anyway. The P-Code representation is meant to be portable.

See the example above again in the new "portable branch-table" version:


case T of
   'A' : WRITE ( ' SONNTAG,' ) ;
   'B' : WRITE ( ' MONTAG,' ) ;
   'D' : WRITE ( ' MITTWOCH,' ) ;
   'F' : WRITE ( ' FREITAG,' ) ;
   'G' : WRITE ( ' SAMSTAG,' ) ;
end (* case *) ;

/***************************************************/
/*   generated P-Code                              */
/***************************************************/

     XJP L3
     ...
L3   DEF C,'A'
L4   DEF C,'G'
L5   LAB
     DEF C,'A'
     UJP L8
     DEF C,'B'
     UJP L9
     DEF C,'D'
     UJP L11
     DEF C,'F'
     UJP L13
     DEF C,'G'
     UJP L14
L6   LAB

This looks good at first sight, but there is still a portability problem:

there is no guarantee that the constants which the compiler on platform X puts in the lowest and highest positions (L3 and L4 in the example) will remain the lowest and highest when migrating to platform Y.

In fact, the only solution is to get rid of the min and max values and the two labels completely, and to compute the minimum and maximum constant from the DEF instructions contained in the "portable branch table". This will be done in a future step; the P-Code interpreter on Windows etc. can ignore the min and max values from the start, and they can be removed from the mainframe compiler later.

1. Note: P-Code interpreters and code generators should not expect that the DEF constants are sorted with respect to the character set of their target machine; if the P-Code file comes from a foreign machine, the DEF constants will not be sorted in the general case.

2. Note: PASCAL2.PAS for the IBM mainframe (at the moment) uses the min and max values to compute the size of the branch table beforehand; this is not correct for P-Code files which come from other platforms. The min and max values are not portable. Furthermore, it is not possible to compute the needed size at a foreign platform, because it depends on the character set of the target platform. The only correct way is this: scan the DEF constants when on the target platform; compute the min and max values from this scan and determine the needed size of the branch table from their difference.

3. Note: I believe that at this moment there is no sound solution for branch tables that grow very large, but with this extension (portable branch tables), I could at least imagine how such solutions could look.
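
The procedure from the second note (scan the DEF constants on the target platform, derive min and max, then size and fill the table) could look like this in C. It is a sketch under assumptions: BranchEntry, dense_table_range and expand_branch_table are hypothetical names, the table must contain at least one entry, and PASCAL2.PAS itself is written in Pascal and may work differently.

```c
/* One portable branch table entry (illustrative field names). */
typedef struct {
    int value;    /* DEF constant in the target's character code */
    int target;   /* label number of the UJP target */
} BranchEntry;

/* Scan the DEF constants, which may come in any order, for the
   lowest and highest value. Stores the lowest value in *low and
   returns the number of slots a dense table needs. */
int dense_table_range(const BranchEntry *table, int count, int *low)
{
    int min = table[0].value;
    int max = table[0].value;
    for (int i = 1; i < count; i++) {
        if (table[i].value < min) min = table[i].value;
        if (table[i].value > max) max = table[i].value;
    }
    *low = min;
    return max - min + 1;
}

/* Fill dense[0 .. size-1]: the default label everywhere first,
   then the real targets at their positions relative to low. */
void expand_branch_table(const BranchEntry *table, int count,
                         int low, int size, int default_label, int *dense)
{
    for (int i = 0; i < size; i++)
        dense[i] = default_label;
    for (int i = 0; i < count; i++)
        dense[table[i].value - low] = table[i].target;
}
```

For the weekday example this gives seven slots on both ASCII and EBCDIC machines. A range such as 'A' to 'J', however, needs 10 slots in ASCII but 17 in EBCDIC, because 'I' (X'C9') and 'J' (X'D1') are not adjacent there; this is why the size can only be computed on the target platform.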


Improving PASSNAP for the NODEBUG case

When creating new features for the compiler or the P-Code translator, there are often situations when one of the two passes crashes during execution.

The compiler modules cannot be translated using the debug switch (D+), because they deliberately access arrays outside of the defined bounds in some cases etc., and this is not allowed for D+ modules.

So there was no chance up until now to get readable dumps from PASSNAP, when one of the compiler passes crashes.

I wanted to change this, so I made the following extensions to the P-Code translator:

Of course, all the relevant compiler passes and library modules have been recompiled, and PASSNAP was adjusted to the new module layout.

BTW: the following library modules are considered part of the "standard" Pascal runtime and should be used together with every Pascal program, although not every module will be needed in every single case:

To get good diagnostic information in case of a runtime error in one of the compiler passes, I changed the PASRUNC EXEC (which runs the compiler) so that it calls the compiler passes using the following LOAD statement:

LOAD &1 PASMONN PASSNAP PASLIBX PASSTRX (ORIGIN 20580 RESET $PASENT CLEAR

The following file shows a PASSNAP output in the NODEBUG case. The Fibonacci program was changed to NODEBUG; in this case, a subrange domain error would not occur, so I inserted a check for the variable being positive, and if not, I wrote an explicit snapshot using $PASSNAP.

New Snapshot output in the NODEBUG case

If you look at the stack segment outputs, you will see that every stack segment starts with the 72 byte system save area; you can see the backward pointer at offset 4, the forward pointer at offset 8, RET address and EPA address at 12 and 16 and then the registers 0 to 12, starting from offset 20.
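
For readers who want to decode such a dump on another machine, the layout just described can be written down as a C structure. This is only a sketch: the type name SaveArea, the field names and the helper be32 are made up for this illustration, and the word at offset 0 is not described in the text above.

```c
#include <stddef.h>
#include <stdint.h>

/* 72-byte save area; all fields are 4 bytes wide. */
typedef struct {
    uint32_t word0;         /* offset  0: not described above */
    uint32_t back;          /* offset  4: backward pointer */
    uint32_t forward;       /* offset  8: forward pointer */
    uint32_t ret_addr;      /* offset 12: RET address */
    uint32_t entry_point;   /* offset 16: EPA address */
    uint32_t regs[13];      /* offset 20: registers 0 to 12 */
} SaveArea;

/* Compile-time checks against the offsets given in the text. */
_Static_assert(sizeof(SaveArea) == 72, "save area must be 72 bytes");
_Static_assert(offsetof(SaveArea, regs) == 20, "registers start at 20");

/* The mainframe stores words big-endian; this helper reads one word
   from raw dump bytes independent of the host byte order. */
uint32_t be32(const unsigned char *p)
{
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8)  |  (uint32_t) p[3];
}
```

The same arithmetic applies to the Pascal variables mentioned below: starting address of the area plus the offset from the compiler listing.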

You can locate Pascal variables in the hex format dump of the STATIC CSECTs and stack segments by using the offsets printed in the compiler listing. The starting address of each area is printed in the hex dump, too; it is simply the address at which the hex dump of that area begins. You only need to add the offset of the variable to the starting address of the relevant area.


Porting Stanford Pascal to Windows, OS/2 and Linux - portable branch tables

Please refer to the "other issues" section, too:

Porting Stanford Pascal to Windows, OS/2 and Linux - other issues

I decided to build the correct solution for the branch table problem now. That is, I removed the two constants for the lowest and highest value. And I marked the XJP instruction with a flag (N); this N stands for "new format of branch table".

The new XJP instruction needs only two labels, not four. But when working on the P-Code translator, I discovered that the translator still needs the min and max value fields and uses the label names for the machine code, although they aren't used in the P-Code any more. So I decided to reserve the names in the first pass, but not to use them; the names are left unused and can be used by the second pass.

Here is an example; the P-Code for a character based case statement looked like this before:


case T of
   'A' : WRITE ( ' SONNTAG,' ) ;
   'B' : WRITE ( ' MONTAG,' ) ;
   'D' : WRITE ( ' MITTWOCH,' ) ;
   'F' : WRITE ( ' FREITAG,' ) ;
   'G' : WRITE ( ' SAMSTAG,' ) ;
end (* case *) ;

/***************************************************/
/*   generated P-Code                              */
/***************************************************/

     XJP L3
     ...
L3   DEF C,'A'
L4   DEF C,'G'
L5   LAB
     DEF C,'A'
     UJP L8
     DEF C,'B'
     UJP L9
     DEF C,'D'
     UJP L11
     DEF C,'F'
     UJP L13
     DEF C,'G'
     UJP L14
L6   LAB

and it looks now like this:


case T of
   'A' : WRITE ( ' SONNTAG,' ) ;
   'B' : WRITE ( ' MONTAG,' ) ;
   'D' : WRITE ( ' MITTWOCH,' ) ;
   'F' : WRITE ( ' FREITAG,' ) ;
   'G' : WRITE ( ' SAMSTAG,' ) ;
end (* case *) ;

/***************************************************/
/*   generated P-Code                              */
/***************************************************/

     XJP N,L5
     ...
L5   LAB
     DEF C,'A'
     UJP L8
     DEF C,'B'
     UJP L9
     DEF C,'D'
     UJP L11
     DEF C,'F'
     UJP L13
     DEF C,'G'
     UJP L14
L6   LAB

The labels L3 and L4 are not used any more, but they are reserved and may be used by the P-Code generator to store (internally) the min and max values, which are derived from the DEF constants of the branch table (which need not be sorted by the char set sequence - in the char case).

There is an interesting difference between integer based branch tables and character based branch tables:

The difference between highest and lowest value is limited by the compiler constant CIXMAX; at the moment, CIXMAX = 400. This is equivalent to the largest possible branch table (equals 800 Bytes in the CSECT), and it is by far large enough for the whole char type.

Some possible improvements for the future (if this should be a problem):

In the end, the implementation of the case statement turned out to be pretty interesting; this is something that was already observed very early by C.A.R. Hoare, IIRC.


Porting Stanford Pascal to Windows, OS/2 and Linux - success !!

In December 2016, I finally reached my target from 2012: porting the compiler to Windows, Linux and OS/2. This turned out to be pretty useful, because from now on the turnaround times for changes to the compiler became much shorter. And: the other tools that I use, especially PASFORM.PAS (which formats Pascal sources), now run on Windows etc., too, so that I don't need to switch platforms that often.

This is the compiler compiling itself (and the P-Code translator) on Windows (you may notice, that the version date changed on the second run, because the compiler compiled itself and then the new compiler was used to compile the P-Code translator):


PCINT (Build 1.0 Dec 30 2016 20:41:29)
**** STANFORD PASCAL COMPILER, OPPOLZER VERSION OF 12.2016 ****
**** WARNING: PASCAL EXTENSIONS USED.
**** NO SYNTAX ERROR(S) DETECTED.
**** 13636 LINE(S) READ, 144 PROCEDURE(S) COMPILED,
**** 23534 P_INSTRUCTIONS GENERATED, 2.30 SECONDS IN COMPILATION.
*** EXIT Aufruf mit Parameter = 0 ***
*** Pascal Programm STP ***

PCINT (Build 1.0 Dec 30 2016 20:41:29)
**** STANFORD PASCAL COMPILER, OPPOLZER VERSION OF 01.2017 ****
**** WARNING: PASCAL EXTENSIONS USED.
**** NO SYNTAX ERROR(S) DETECTED.
**** 11437 LINE(S) READ, 84 PROCEDURE(S) COMPILED,
**** 25297 P_INSTRUCTIONS GENERATED, 2.06 SECONDS IN COMPILATION.
*** EXIT Aufruf mit Parameter = 0 ***
*** Pascal Programm STP ***

Look at the performance; it is pretty amazing. And recall that the compiler is not running on the hardware directly; instead the P-Code instructions are (kind of) interpreted.

There were some interesting milestones when testing more programs on the non-mainframe platforms:

BTW: I decided to make the compiler and the P-Code translator available on this website (see Resources page), but not the PCINT interpreter. This is because it is not perfect yet, because I invested a lot in this piece of software, which I created from scratch, and because I want to know who is using the compiler on the non-mainframe platforms - and probably support you, if you plan to do so. So, please, if you plan to use the compiler on non-mainframe platforms like Windows, OS/2 or Linux, contact me by e-mail etc., so that we can talk about a test license (which will be free of charge, for the moment).


Bit operations

Since December 2016, further development of the Pascal compiler has become much easier, because changes to the compiler (first pass) and tests can now all be done on the Windows platform. "Ports" to Hercules and VM only need to be done if changes to the PRR format or extensions to the P-Code translator are needed, and at a very late stage. Not all changes or extensions to the compiler require extensions to the P-Code.

From some discussions on the FPC mailing list, I got the idea to support bit operations on integer operands, too.

The operations AND, OR, NOT have been extended to do bit operations, when being used with integers (an error 134 was issued before, if these "logical" operations were used with integer operands).

Another operation XOR is provided (a new reserved symbol) for exclusive or operation; it can be used with integer or boolean operands.

A new P-Code instruction XOR has been created; the P-Code instructions AND, IOR, NOT and XOR now have a type parameter, which can be B for boolean or I for integer. B defines the (old) logical operation, I is for bit operations (on integers).

Sample program:


program TESTAND ( OUTPUT ) ;

(********)
(*$A+   *)
(********)

var X , Y : INTEGER ;
    A , B : BOOLEAN ;

begin (* HAUPTPROGRAMM *)
   X := 0X0E ;
   Y := 0X1D ;
   A := FALSE ;
   B := TRUE ;
   WRITELN ( 'x ' , X ) ;
   WRITELN ( 'y ' , Y ) ;
   WRITELN ( 'x and y ' , X and Y ) ;
   WRITELN ( 'x and y ' , X & Y ) ;
   WRITELN ( 'x or y ' , X or Y ) ;
   WRITELN ( 'x or y ' , X | Y ) ;
   WRITELN ( 'x xor y ' , X xor Y ) ;
   WRITELN ( 'not x ' , not X ) ;
   WRITELN ( 'not y ' , not Y ) ;
   WRITELN ( 'a ' , A ) ;
   WRITELN ( 'b ' , B ) ;
   WRITELN ( 'a and b ' , A and B ) ;
   WRITELN ( 'a and b ' , A & B ) ;
   WRITELN ( 'a or b ' , A or B ) ;
   WRITELN ( 'a or b ' , A | B ) ;
   WRITELN ( 'a xor b ' , A xor B ) ;
   WRITELN ( 'not a ' , not A ) ;
   WRITELN ( 'not b ' , not B ) ;
end (* HAUPTPROGRAMM *) .

You may notice the integer constants in hexadecimal notation, which were introduced with this change, too.

The output of the sample program:


x 14
y 29
x and y 12
x and y 12
x or y 31
x or y 31
x xor y 19
not x -15
not y -30
a FALSE
b TRUE
a and b FALSE
a and b FALSE
a or b TRUE
a or b TRUE
a xor b TRUE
not a TRUE
not b FALSE
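
The integer results above can be cross-checked with C's bit operators. The sketch below also mimics the type parameter of the P-Code instructions (B for the logical form, I for the bit operation); the function names pcode_and, pcode_ior, pcode_xor and pcode_not are hypothetical and not taken from PCINT.

```c
/* type 'B': logical operation on truth values (0 or 1);
   type 'I': bit operation on integers. */
int pcode_and(char type, int a, int b)
{
    return type == 'B' ? (a && b) : (a & b);
}

int pcode_ior(char type, int a, int b)
{
    return type == 'B' ? (a || b) : (a | b);
}

int pcode_xor(char type, int a, int b)
{
    return type == 'B' ? ((a != 0) != (b != 0)) : (a ^ b);
}

/* For integers, NOT is the one's complement, so NOT 14 is -15
   in two's complement representation. */
int pcode_not(char type, int a)
{
    return type == 'B' ? !a : ~a;
}
```

With X = 14 (binary 01110) and Y = 29 (binary 11101), the I variants give 12, 31 and 19 for AND, IOR and XOR, and -15 and -30 for NOT, matching the program output.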


Support undeclared procedures

Other programming languages like C and PL/1 support calls to procedures that have no declarations in the actual compile unit; they consider the undeclared procedure name to be an external name that will be resolved by the linkage editor.

When I tried the same with Pascal, I got syntax errors, because the compiler believed the identifier to be a variable; it already complained when the opening parenthesis followed the procedure identifier (when calling the procedure).

This seemed unacceptable to me, so I first changed that.

The procedure SEARCHID inserted the unknown identifier into the ID list as a dummy var declaration; that's why there were more errors when it was actually a procedure identifier (and a parenthesis or a semicolon followed). Now the insert into the ID list is deferred until the next symbol has been read. (SEARCHID has to be extended with more parameters etc. to be able to do this).

In case of an undeclared procedure, the error code has been changed (from 104 "identifier not declared" to 184 "procedure not declared").

The error 184 has been changed to a warning; this way it is possible to generate code even for undeclared procedures and functions. The types of the parameters are taken from the types of the arguments, which will hopefully match the external definition of the procedure. If the arguments are variables, the args are passed by reference; if not, by value.

Note: it is possible that different calls to the same undeclared procedure generate different call sequences, if you provide different types of parameters, and if you mix expressions or constants with variables. The characteristics of one call are not recorded or compared with another call. If you call undeclared procedures, you do so at your own risk.

Another note: it is quite simple to force the compiler to generate a by-value call; if you put parentheses around the argument, it is an expression (no variable any more), and so a by-value parameter passing sequence will be generated ... BTW: much the same way as dummy arguments are created when using PL/1.

Undeclared functions, BTW, are supported in much the same way. Calls to undeclared functions produce the warning code 186 ("function not declared"); the function is supposed to return an integer result, and the parameters are passed according to the same rules as with undeclared procedures. If there is no opening parenthesis following the function identifier, the identifier is (of course) considered to be an undeclared variable and will get an error 104, as before.

If you're not happy with all this, there is a simple solution: declare the procedure or function using the EXTERNAL directive.

BTW: when testing this, I discovered that external modules which contain global static variables (which are global inside the external modules, but local to them) use the same CSECT name (#PASMAI#) to store them, so there was a name conflict at linkage time, if two or more modules used this feature. This was resolved by creating a new attribute CSTNAME for declared procedures (which is the name of their STATIC CSECT) and by computing a CSTNAME (derived from the module name) for the dummy main program of an external module, too.


Some Pascal/VS features added (DATETIME, HALT, CLOSE, TERMIN/TERMOUT)

I wanted to compile and run a program which I wrote for a former customer of mine in the late 1990s. The program does speed computations for urban subway trains and computes information for the signals there. The original program was written for IBM's Pascal/VS on VM/CMS. The program is still in use today and will be extended this year, but I converted it to ANSI C in 2001, because the customer moved off of the mainframe.

My idea was to get the program running on Hercules/VM and on Windows, too, and to check if the results are the same. The program uses floating point very much, so I thought this would be a good test case for the floating point instructions.

But the first obstacle was that the program contained some calls to standard functions that were available in Pascal/VS, but not in the "new" Stanford Pascal. See above: the problem functions were: DATETIME, HALT, CLOSE, TERMIN and TERMOUT.

I decided after a while that CLOSE should be added as a new CSP function to the compiler directly, because it is a good idea in my opinion to be able to close a file before the program ends. This was not possible up until now; files were only closed by the runtime implicitly during a new RESET or REWRITE or at the very end of the program. And: CLOSE is available on most compilers (Turbo Pascal etc.)

So I added a new CSP CLS, and extended PASMONN.ASS (on Hercules/VM) and PCINT.C (on Windows etc.) to support it.

The other functions are different; they should only be included on an "as needed" basis. This can be done by writing an external module (I called it PASCALVS), where the functions are implemented. The functions need not be declared when using them (undeclared procedures are supported since 01.2017). You will get warnings W184 if you don't, but the program will compile and run anyway. You only need to add the PASCALVS module at run time; you can find it on the AWSTape (see Resources) - and the source code is shown below.

DATETIME / DATTIM10

DATETIME has the flaw that it returns only 2-digit years; I provided another procedure DATTIM10 which returns date and time as CHAR(10) arrays; the format is DD.MM.YYYY (European format).

DATETIME and DATTIM10 both use the existing "system" variables DATE and TIME; they both had the "problem", that they were only computed once at program start time.

I added two CSPs DAT and TIM, which, when called, ask the system for the actual date and time and refresh the system variables (which are at a fixed location in storage). The CSPs DAT and TIM are called before every reference to DATE and TIME, so now you always get the actual date and time when you call DATE, TIME, DATETIME or DATTIM10.

HALT

HALT is simply EXIT (8).

TERMIN / TERMOUT

TERMIN and TERMOUT had to be implemented differently, depending on the platform.

With PCINT (that is, Windows, OS/2, Linux), there is a "terminal" switch in the file control block (FCB), which is simply set to Y by both TERMIN and TERMOUT. This means that on OPEN the files are connected to stdin and stdout, respectively.

On CMS, the DDNAME is extracted from the FCB using the FILEFCB function, and then a command "FILEDEF ddname TERM" is issued using the function CMSX, which is defined in the Pascal Extension Library (PASLIBX.PAS).

The PASCALVS Source Code


module PASCALVS ;

/************************************************/
/*$A+                                           */
/************************************************/
/*                                              */
/*   Module PASCALVS                            */
/*                                              */
/*   contains functions that were available     */
/*   in the PASCAL/VS compiler                  */
/*                                              */
/************************************************/

type PLATFORM = ( PLATF_UNKNOWN , PLATF_INTEL , PLATF_MAINFRAME ) ;
     CHAR4 = array [ 1 .. 4 ] of CHAR ;
     CHAR8 = array [ 1 .. 8 ] of CHAR ;
     CHAR10 = array [ 1 .. 10 ] of CHAR ;
     CHAR80 = array [ 1 .. 80 ] of CHAR ;
     CHARPTR = -> CHAR ;

static PLATF : PLATFORM ;

procedure CMSX ( CMD : CHARPTR ; var RETCODE : INTEGER ) ;
   EXTERNAL ;

local procedure CHECK_PLATFORM ;
   begin (* CHECK_PLATFORM *)
      if ORD ( 'A' ) = 65 then
         PLATF := PLATF_INTEL
      else
         PLATF := PLATF_MAINFRAME
   end (* CHECK_PLATFORM *) ;

procedure HALT ;
   begin (* HALT *)
      EXIT ( 8 ) ;
   end (* HALT *) ;

procedure DATETIME ( var DAT : CHAR8 ; var TIM : CHAR8 ) ;
   var DATX : CHAR10 ;
   begin (* DATETIME *)
      DATX := DATE ;
      DAT := 'MM/DD/YY' ;
      DAT [ 1 ] := DATX [ 4 ] ;
      DAT [ 2 ] := DATX [ 5 ] ;
      DAT [ 4 ] := DATX [ 1 ] ;
      DAT [ 5 ] := DATX [ 2 ] ;
      DAT [ 7 ] := DATX [ 9 ] ;
      DAT [ 8 ] := DATX [ 10 ] ;
      PACK ( TIME , 1 , TIM ) ;
   end (* DATETIME *) ;

procedure DATTIM10 ( var DAT : CHAR10 ; var TIM : CHAR10 ) ;
   var DATX : CHAR10 ;
   begin (* DATTIM10 *)
      DATX := DATE ;
      DAT := 'DD.MM.YYYY' ;
      DAT [ 1 ] := DATX [ 4 ] ;
      DAT [ 2 ] := DATX [ 5 ] ;
      DAT [ 4 ] := DATX [ 1 ] ;
      DAT [ 5 ] := DATX [ 2 ] ;
      DAT [ 7 ] := DATX [ 7 ] ;
      DAT [ 8 ] := DATX [ 8 ] ;
      DAT [ 9 ] := DATX [ 9 ] ;
      DAT [ 10 ] := DATX [ 10 ] ;
      TIM := TIME ;
   end (* DATTIM10 *) ;

procedure TERMIN ( var X : TEXT ) ;
   var FCB : VOIDPTR ;
       PDDN : -> CHAR8 ;
       PTERM : -> CHAR ;
       CMSCMD : CHAR80 ;
       CPT : -> CHAR ;
       RC : INTEGER ;
   begin (* TERMIN *)
      if PLATF = PLATF_UNKNOWN then
         CHECK_PLATFORM ;
      if PLATF = PLATF_INTEL then
         begin
            FCB := FILEFCB ( X ) ;
            PDDN := PTRADD ( FCB , 8 ) ;
            PTERM := PTRADD ( FCB , 279 ) ;
            PTERM -> := 'Y'
         end (* then *)
      else
         begin
            FCB := FILEFCB ( X ) ;
            CMSCMD := 'FILEDEF XXXXXXXX CLEAR #' ;
            CPT := PTRADD ( ADDR ( CMSCMD ) , 8 ) ;
            MEMCPY ( CPT , FCB , 8 ) ;
            CMSX ( ADDR ( CMSCMD ) , RC ) ;
            CMSCMD := 'FILEDEF XXXXXXXX TERMINAL (RECFM V #' ;
            MEMCPY ( CPT , FCB , 8 ) ;
            CMSX ( ADDR ( CMSCMD ) , RC ) ;
         end (* else *)
   end (* TERMIN *) ;

procedure TERMOUT ( var X : TEXT ) ;
   var FCB : VOIDPTR ;
       PDDN : -> CHAR8 ;
       PTERM : -> CHAR ;
       CMSCMD : CHAR80 ;
       CPT : -> CHAR ;
       RC : INTEGER ;
   begin (* TERMOUT *)
      if PLATF = PLATF_UNKNOWN then
         CHECK_PLATFORM ;
      if PLATF = PLATF_INTEL then
         begin
            FCB := FILEFCB ( X ) ;
            PDDN := PTRADD ( FCB , 8 ) ;
            PTERM := PTRADD ( FCB , 279 ) ;
            PTERM -> := 'Y'
         end (* then *)
      else
         begin
            FCB := FILEFCB ( X ) ;
            CMSCMD := 'FILEDEF XXXXXXXX CLEAR #' ;
            CPT := PTRADD ( ADDR ( CMSCMD ) , 8 ) ;
            MEMCPY ( CPT , FCB , 8 ) ;
            CMSX ( ADDR ( CMSCMD ) , RC ) ;
            CMSCMD := 'FILEDEF XXXXXXXX TERMINAL (RECFM V #' ;
            MEMCPY ( CPT , FCB , 8 ) ;
            CMSX ( ADDR ( CMSCMD ) , RC ) ;
         end (* else *)
   end (* TERMOUT *) ;

begin (* HAUPTPROGRAMM *)

end (* HAUPTPROGRAMM *) .


Differences on floating point computations and rounding

When comparing the results of floating point operations on Hercules/VM and Windows, I discovered many differences. Most details of this difficult analysis and work are documented here:

Stanford Compiler Facebook page.

The biggest problem was that the output values were implicitly rounded on the PC (using the sprintf() function from ANSI C), but not always correctly. On Hercules/VM, there was no rounding at all.

For example: when writing the value 12.35 with only one decimal digit, Stanford Pascal on VM would always write 12.3, but Stanford Pascal on the PC would write 12.3 or 12.4, depending on the precision of the computed value. If some arithmetic precedes the output operation, the value may be slightly lower than the "true" 12.35, so 12.3 is printed instead of the correct 12.4.

To make the floating point output comparable, there is only one solution (which I know from my long experience with this topic, when working for a big European insurance company):

the output has to be rounded to the needed decimal precision,
but before the rounding a correction has to be made
that compensates for the error introduced by the preceding computations.

This solution, BTW, will not work in every pathological case. A specialist on this topic once told me: "What you do is pretty good, but whatever you do, I will always find a counterexample where it will not work".

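The "correction" in fact means that - depending on the platform - a value is added to the original value, and this value depends on the magnitude of the original value; for example: if x is the original value, then the correction value is x * (16 ** (- 13)). (The ROUNDX1 source shown below actually divides by 4 * (16 ** 12), which gives a correction four times as large.)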

To do this, I implemented a new function in the Pascal runtime that does this kind of "corrective" rounding. This function is called ROUNDX, which means "round extended". It has a second parameter which tells the decimal position where the rounding has to take place; -2 for example means: at the 2nd decimal position. Values from 4 to -15 are accepted for the second parameter.

The ROUNDX function needs the FLOOR function as defined by the C runtime; so I added this function, too.

On Hercules/VM, ROUNDX is implemented in PASLIBX.PAS in Pascal (calling sequence generated by the compiler using CALLLIBRARYFUNC). FLOOR is a true CSP function which is implemented in PASMONN using ASSEMBLER (only a very small number of ASSEMBLER instructions). See the ROUNDX1 source code (from PASLIBX.PAS) below.

On Windows etc., roundx is a C function, and floor is already available, because it is an ANSI C function.

Because the "classical" ROUND function from Pascal also worked differently on the two platforms, I changed its implementation as well. Now on both platforms, ROUND (x) is implemented as ROUNDX (x, 0), followed by TRUNC, because ROUNDX gives a REAL result, but ROUND needs INTEGER. With this change, ROUND now works the same on all platforms (in most cases).

Note: I decided to do the corrective action only for ROUND and ROUNDX, but not for TRUNC and FLOOR.

The ROUNDX1 Source Code (from PASLIBX)


local function ROUNDX1 ( WERT : REAL ; BEREICH : INTEGER ) : REAL ;

/**********************************************************/
/*                                                        */
/*   roundx.c                                             */
/*                                                        */
/*   New rounding function with changed logic;            */
/*   the correction constant is derived from the          */
/*   order of magnitude of the input value (input         */
/*   value divided by (16 ** 13); this way, on            */
/*   both platforms at least a 1 is added at the          */
/*   last digit position).                                */
/*                                                        */
/*   Author: Bernd Oppolzer                               */
/*   April 1995                                           */
/*                                                        */
/**********************************************************/

   var FAKTOR : REAL ;
       TEST : REAL ;
       RUNDKONST : REAL ;

   const FAKTTAB : array [ 0 .. 19 ] of REAL =
         ( 10000.0 , 1000.0 , 100.0 , 10.0 , 1.0 ,
           10.0 , 100.0 , 1000.0 , 10000.0 , 100000.0 ,
           1000000.0 , 10000000.0 , 100000000.0 ,
           1000000000.0 , 10000000000.0 , 100000000000.0 ,
           1000000000000.0 , 10000000000000.0 ,
           100000000000000.0 , 1000000000000000.0 ) ;

   begin (* ROUNDX1 *)
     FAKTOR := FAKTTAB [ 4 - BEREICH ] ;
     if WERT < 0.0 then
       TEST := - WERT
     else
       TEST := WERT ;
     if TEST < 1.0E-55 then
       begin
         ROUNDX1 := 0.0 ;
         return
       end (* then *) ;

     /************************************************/
     /*                                              */
     /*    4 * (16 ** 12) = 1125899906842624.0       */
     /*    8 * (16 ** 12) = 2251799813685248.0       */
     /*   12 * (16 ** 12) = 3377699720527872.0       */
     /*   16 ** 13        = 4503599627370496.0       */
     /*                                              */
     /************************************************/

     RUNDKONST := TEST / 1125899906842624.0 ;
     if BEREICH < 0 then
       begin
         TEST := ( TEST + RUNDKONST ) * FAKTOR + 0.5 ;
         TEST := FLOOR ( TEST ) ;
         if WERT < 0.0 then
           ROUNDX1 := - TEST / FAKTOR
         else
           ROUNDX1 := TEST / FAKTOR
       end (* then *)
     else
       if BEREICH > 0 then
         begin
           TEST := ( TEST + RUNDKONST ) / FAKTOR + 0.5 ;
           TEST := FLOOR ( TEST ) ;
           if WERT < 0.0 then
             ROUNDX1 := - TEST * FAKTOR
           else
             ROUNDX1 := TEST * FAKTOR ;
         end (* then *)
       else
         begin
           TEST := ( TEST + RUNDKONST ) + 0.5 ;
           TEST := FLOOR ( TEST ) ;
           if WERT < 0.0 then
             ROUNDX1 := - TEST
           else
             ROUNDX1 := TEST ;
         end (* else *)
   end (* ROUNDX1 *) ;

Example program: the program didn't show 13 and -13 as result of ROUND before my changes:


begin (* HAUPTPROGRAMM *)

  /******************/
  /* positive value */
  /******************/

  R := 12.5 ;
  DUMP ( ADDR ( R ) , PTRADD ( ADDR ( R ) , 7 ) ) ;
  R := R * 4724731.123456 ;
  R := R / 4724731.123456 ;
  DUMP ( ADDR ( R ) , PTRADD ( ADDR ( R ) , 7 ) ) ;
  WRITELN ( 'r vor round = ' , R : 15 : 7 ) ;
  I := ROUND ( R ) ;
  WRITELN ( 'i nach round = ' , I : 13 ) ;
  I := TRUNC ( R ) ;
  WRITELN ( 'i nach trunc = ' , I : 13 ) ;
  R := FLOOR ( R ) ;
  WRITELN ( 'r nach floor = ' , R : 15 : 7 ) ;
  R := 12.357 ;
  WRITELN ( 'r vor roundx = ' , R : 15 : 7 ) ;
  R := ROUNDX ( ( R ) , - 2 ) ;
  WRITELN ( 'r nach roundx = ' , R : 15 : 7 ) ;

  /******************/
  /* negative value */
  /******************/

  R := - 12.5 ;
  DUMP ( ADDR ( R ) , PTRADD ( ADDR ( R ) , 7 ) ) ;
  R := R * 4724731.123456 ;
  R := R / 4724731.123456 ;
  DUMP ( ADDR ( R ) , PTRADD ( ADDR ( R ) , 7 ) ) ;
  WRITELN ( 'r vor round = ' , R : 15 : 7 ) ;
  I := ROUND ( R ) ;
  WRITELN ( 'i nach round = ' , I : 13 ) ;
  I := TRUNC ( R ) ;
  WRITELN ( 'i nach trunc = ' , I : 13 ) ;
  R := FLOOR ( R ) ;
  WRITELN ( 'r nach floor = ' , R : 15 : 7 ) ;
  R := - 12.357 ;
  WRITELN ( 'r vor roundx = ' , R : 15 : 7 ) ;
  R := ROUNDX ( ( R ) , - 2 ) ;
  WRITELN ( 'r nach roundx = ' , R : 15 : 7 ) ;
end (* HAUPTPROGRAMM *) .

Now the program produces the correct (and the same) result on both platforms:


EXEC PASRUN TESTRND
Dump Speicherbereich von 000302A0 bis 000302A7
000302A0: 41c80000 00000000 ........ ........ *.H......aaaaaaaa*
Dump Speicherbereich von 000302A0 bis 000302A7
000302A0: 41c7ffff fffffffd ........ ........ *.G......aaaaaaaa*
r vor round = 12.5000000
i nach round = 13
i nach trunc = 12
r nach floor = 12.0000000
r vor roundx = 12.3570000
r nach roundx = 12.3600000
Dump Speicherbereich von 000302A0 bis 000302A7
000302A0: c1c80000 00000000 ........ ........ *AH..........aaaa*
Dump Speicherbereich von 000302A0 bis 000302A7
000302A0: c1c7ffff fffffffd ........ ........ *AG..........aaaa*
r vor round = -12.5000000
i nach round = -13
i nach trunc = -12
r nach floor = -13.0000000
r vor roundx = -12.3570000
r nach roundx = -12.3600000
Ready; T=0.05/0.19 23:44:15


Differences on floating point output

Using the new function ROUNDX (see previous chapter), I was now able to change the floating point output. The floating point values are implicitly rounded to the needed decimal position before output; on both platforms, BTW (because sprintf et al. don't do it right: they apply no correction).

What made things complicated at this point:

WRR (write real) is a CSP function which is implemented in PASMONN.ASS on Hercules/VM. I wanted to call the Pascal function ROUNDX1 (which is implemented in PASLIBX.PAS) from this WRR function, but this was not supported up until now.

So I first tried to build an environment in PASMONN to be able to call a Pascal function from there; this required understanding the save area layout etc. that Pascal uses. The save area layout is the same, but the actions done in the function prologue and epilogue are quite different from MVS and VM standards. Anyway, I finally succeeded in building the environment and in calling the Pascal function from PASMONN.

But then, the program looped. As it turned out, the called Pascal function called another CSP (FLOOR, BTW), and the static save areas of PASMONN were overwritten on this second (recursive) call.

So I had to rethink this. In the end, I managed to change $PASCSP (the part of PASMONN that implements the CSPs) in such a way that it no longer provides its own save areas; instead it puts its save areas on the Pascal stack. This way, recursive calls are possible.
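The effect of static save areas on nested calls can be illustrated with a small C sketch. It does not model the real $PASCSP register save areas; the two functions and their names are hypothetical and only contrast a single shared static slot with a per-call slot on the stack.

```c
#include <assert.h>

/* one shared slot, standing in for a static save area */
static int static_save_area;

/* Every activation saves its state in the same static slot. */
int nested_with_static_area(int depth)
{
    assert(depth >= 0);
    static_save_area = depth;                 /* save this activation's state */
    if (depth > 0)
        nested_with_static_area(depth - 1);   /* nested call overwrites the slot */
    return static_save_area;                  /* "restored" state is now wrong */
}

/* Every activation keeps its state in its own stack frame. */
int nested_with_stack_area(int depth)
{
    assert(depth >= 0);
    int save_area = depth;                    /* lives on the stack */
    if (depth > 0)
        nested_with_stack_area(depth - 1);
    return save_area;                         /* still this activation's value */
}
```

With a nesting depth of 3, the static variant returns the value stored by the innermost call, while the stack variant returns 3; this is the kind of overwriting that moving the save areas to the Pascal stack avoids.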

There are some drawbacks:

I tried to optimize the new calling sequence a bit by removing some reloads of the stack address register, which seemed unnecessary, but that did not work. Maybe some more examination and improvements can be done later.

Anyway:

At the end of this long story, the output of the floating point values of the old subway train speed program was exactly the same on both platforms.

If I had had this compiler available in 2001, it would not have been necessary to convert the program to C.

Floating point arithmetic seems to work correctly on all platforms

Output of a test program; the second column is printed without explicit rounding (using WRITE (x : 7 : 1)); the third column is printed with explicit rounding - as you can see, there is no difference:


R unkorrigiert ..: 12.30 12.3 12.3
R gerundet ......: 12.30 12.3 12.3
R unkorrigiert ..: 12.31 12.3 12.3
R gerundet ......: 12.31 12.3 12.3
R unkorrigiert ..: 12.32 12.3 12.3
R gerundet ......: 12.32 12.3 12.3
R unkorrigiert ..: 12.33 12.3 12.3
R gerundet ......: 12.33 12.3 12.3
R unkorrigiert ..: 12.34 12.3 12.3
R gerundet ......: 12.34 12.3 12.3
R unkorrigiert ..: 12.35 12.4 12.4
R gerundet ......: 12.35 12.4 12.4
R unkorrigiert ..: 12.36 12.4 12.4
R gerundet ......: 12.36 12.4 12.4
R unkorrigiert ..: 12.37 12.4 12.4
R gerundet ......: 12.37 12.4 12.4
R unkorrigiert ..: 12.38 12.4 12.4
R gerundet ......: 12.38 12.4 12.4
R unkorrigiert ..: 12.39 12.4 12.4
R gerundet ......: 12.39 12.4 12.4
R unkorrigiert ..: 12.40 12.4 12.4
R gerundet ......: 12.40 12.4 12.4
R unkorrigiert ..: 12.41 12.4 12.4
R gerundet ......: 12.41 12.4 12.4
R unkorrigiert ..: 12.42 12.4 12.4
R gerundet ......: 12.42 12.4 12.4
R unkorrigiert ..: 12.43 12.4 12.4
R gerundet ......: 12.43 12.4 12.4
R unkorrigiert ..: 12.44 12.4 12.4
R gerundet ......: 12.44 12.4 12.4
R unkorrigiert ..: 12.45 12.5 12.5
R gerundet ......: 12.45 12.5 12.5
R unkorrigiert ..: 12.46 12.5 12.5
R gerundet ......: 12.46 12.5 12.5
R unkorrigiert ..: 12.47 12.5 12.5
R gerundet ......: 12.47 12.5 12.5
R unkorrigiert ..: 12.48 12.5 12.5
R gerundet ......: 12.48 12.5 12.5
R unkorrigiert ..: 12.49 12.5 12.5
R gerundet ......: 12.49 12.5 12.5
R unkorrigiert ..: 12.50 12.5 12.5
R gerundet ......: 12.50 12.5 12.5

Another test: here the same value is printed using different 2nd parameters for roundx. You can see the effects of roundx, but at this printed precision you can also see the "correction factor", which is x * (1E-13). This, BTW, is NOT the correction factor from ROUNDX; it is instead another correction applied by WRR (write real), which was 1E-12 before my change, BTW. It helps to move the value above the critical limit; without this correction, most values printed would be wrong. The correct value is 71.219154.


test roundx ( 2) = 100.000000000009
test roundx ( 1) = 70.000000000006
test roundx ( 0) = 71.000000000007
test roundx ( -1) = 71.200000000007
test roundx ( -2) = 71.220000000007
test roundx ( -3) = 71.219000000007
test roundx ( -4) = 71.219200000007
test roundx ( -5) = 71.219150000007
test roundx ( -6) = 71.219154000007
test roundx ( -7) = 71.219154000007
test roundx ( -8) = 71.219154000007
test roundx ( -9) = 71.219154000007
test roundx (-10) = 71.219154000007
test roundx (-11) = 71.219154000007
test roundx (-12) = 71.219154000007
test roundx (-13) = 71.219154000007
test roundx (-14) = 71.219154000007

Another success story:

The size of the $PASCSP routine is almost 8k, and it is covered by two base registers (10 and 11). Sometimes during testing, I already got addressability errors (when adding trace output etc.).

I managed to move some of the larger "procedures" to the end of the $PASCSP routine and make them define and use their own base register (11), so that the addressability restrictions on the $PASCSP mainline are removed (because it has become shorter). This way $PASCSP as a whole can become much larger than 8k, while still requiring only two base registers - which opens the way for further enhancements.


MVS version

Compiler version: 05.2017

The compiler now runs on MVS (Hercules), too. Same source code (PASCAL1, PASCAL2) as with CMS, same runtime (PASMONN) - although there are some CMS dependencies, controlled by SYSPARM(CMS). Different PASSNAP ... see below.

Several changes and corrections to PASMONN (Pascal runtime) have been made. The most important: PASMONN now supports RESET and REWRITE of PO members with their name specified at runtime using the new function ASSIGNMEM. After RESET, success can be checked by looking at a flag in the Pascal FCB, accessed by the (existing) function FILEFCB.

This was necessary to provide an MVS variant of PASSNAP; PASSNAP reads debug information at runtime, which depends on the name of the source file. In CMS, this was accomplished using CMS FILEDEFs, issued from the Pascal program. In MVS, ASSIGNMEM is used. The version of PASSNAP for CMS has been renamed to PASSNAPC; the MVS variant now takes the name PASSNAP. The technique to open the debug information file is the only difference between PASSNAPC (CMS) and PASSNAP (MVS).

There is still room for some improvement in the area of error handling etc.; some ideas:

Thanks to Gerhard Postpischil and Juergen Winckelmann for help and good advice and for encouraging me to do the MVS port.


Job control examples for the MVS compiler

Compiler version: 05.2017

I am discussing the best way to distribute the MVS version of the compiler with Juergen Winckelmann; maybe there will be a tape based distribution in the next days or weeks.

Part of the distribution will be some JCL procedures, that should be copied to SYS2.PROCLIB on installation. These procedures allow for easy use of the compiler and are named after the usual MVS convention:

PASNC (to compile only),
PASNCL (compile and link),
PASNCG (compile and go),
PASNCLG (compile, link and go).

The compiler files should be copied at installation time to datasets with a high level qualifier (HLQ) of your choice; I am using PASCALN. You should not use PASCAL, because that is used by the old (1979) Stanford Pascal version which is part of the TK3 and TK4- distributions. The HLQ can be specified when calling the JCL procedures, but defaults to PASCALN.

Here is how your compile jobs could look:


//PASCALNG JOB (PASCAL),'CLG FIBDEMO',CLASS=A,MSGCLASS=X,
//             TIME=1440,REGION=9000K,MSGLEVEL=(1,1)
//*
//***********************************************************
//* Test compiler
//***********************************************************
//*
//COMPGO   EXEC PASNCG,MEM=FIBDEMO,
//             SRC='PASCALN.TESTPGM.PAS'
//*

This is for compile and go; no load library needed; and no particular input and output files for the GO step in this case.

With a linker step, and with some assignments for the GO step:


//PASCALN6 JOB (PASCAL),'TEST KALENDER',CLASS=A,MSGCLASS=X,
//             TIME=1440,REGION=8000K,MSGLEVEL=(1,1)
//*
//COMPGO   EXEC PASNCLG,MEM=KALENDER,
//             SRC='PASCALN.TESTPGM.PAS',
//             MOD='PASCALN.TESTPGM.LOAD'
//GO.INPUT DD *
2017
//GO.DRUCKER DD SYSOUT=A

This is how the JCL procedure PASNCLG is defined:


//PASNCLG  PROC MEM=X,
//             SRC='SYS2.NULLBIBL',
//             MOD='SYS2.NULLBIBL',
//             SOUT=A,HLQ=PASCALN,REG=8192K,WORK=VIO,
//             GOTIME=299,GOPARM=,GOREG=2048K
//*
//***********************************************************
//*
//* PASCAL COMPILER FIRST AND SECOND PASS
//*
//***********************************************************
//*
//COMPILE  EXEC PGM=PASCAL1,REGION=&REG,PARM='&MEM'
//STEPLIB  DD   DISP=SHR,DSN=&HLQ..COMPILER.LOAD
//INPUT    DD   DISP=SHR,DSN=&SRC.(&MEM)
//OUTPUT   DD   SYSOUT=&SOUT
//PRD      DD   DISP=SHR,DSN=&HLQ..COMPILER.MESSAGES(MESSAGES)
//PRR      DD   DSN=&&PCODE,UNIT=&WORK,
//             SPACE=(TRK,(200,200),RLSE),DISP=(,PASS),
//             DCB=(RECFM=FB,LRECL=80,BLKSIZE=19040)
//LISTING  DD   SYSOUT=&SOUT,DCB=(RECFM=FB,LRECL=137)
//DBGINFO  DD   DISP=SHR,DSN=&HLQ..DBGINFO(&MEM)
//TRACEF   DD   SYSOUT=&SOUT
//PASTRACE DD   SYSOUT=&SOUT
//SYSUDUMP DD   SYSOUT=&SOUT
//*
//POSTPROC EXEC PGM=PASCAL2,REGION=&REG,COND=(4,LT,COMPILE)
//STEPLIB  DD   DISP=SHR,DSN=&HLQ..COMPILER.LOAD
//INPUT    DD   DISP=(OLD,DELETE),DSN=*.COMPILE.PRR
//OUTPUT   DD   SYSOUT=&SOUT
//PRR      DD   DSN=&&OBJF,UNIT=&WORK,
//             SPACE=(TRK,(200,200),RLSE),DISP=(,PASS),
//             DCB=(RECFM=FB,LRECL=80,BLKSIZE=19040)
//ASMOUT   DD   SYSOUT=&SOUT,DCB=(RECFM=FB,LRECL=137)
//DBGINFO  DD   DISP=SHR,DSN=&HLQ..DBGINFO(&MEM)
//TRACEF   DD   SYSOUT=&SOUT
//PASTRACE DD   SYSOUT=&SOUT
//SYSUDUMP DD   SYSOUT=&SOUT
//*
//LKED     EXEC PGM=IEWLF880,REGION=&REG,PARM='MAP',
//             COND=(0,LT,POSTPROC)
//SYSLIB   DD   DISP=SHR,DSN=&HLQ..RUNTIME.TEXT
//         DD   DISP=SHR,DSN=&HLQ..COMPILER.TEXT
//         DD   DISP=SHR,DSN=SYS1.FORTLIB
//SYSLMOD  DD   DISP=SHR,DSN=&MOD.(&MEM)
//SYSPRINT DD   SYSOUT=&SOUT
//SYSUT1   DD   UNIT=SYSDA,SPACE=(TRK,(200,200),RLSE)
//SYSLIN   DD   DISP=(OLD,DELETE),DSN=*.POSTPROC.PRR
//         DD   DISP=SHR,DSN=&HLQ..COMPILER.MESSAGES(STDINC)
//         DD   DDNAME=SYSIN
//*
//GO       EXEC PGM=&MEM,
//             COND=((4,LT,COMPILE),(0,LT,POSTPROC),(0,LT,LKED)),
//             PARM='/TIME=&GOTIME,&GOPARM',REGION=&GOREG
//STEPLIB  DD   DISP=SHR,DSN=&MOD
//QRD      DD   DISP=SHR,DSN=&HLQ..DBGINFO
//INPUT    DD   DDNAME=SYSIN
//FT06F001 DD   SYSOUT=&SOUT
//OUTPUT   DD   SYSOUT=&SOUT,DCB=(RECFM=V,LRECL=133)
//PASTRACE DD   SYSOUT=&SOUT
//SYSUDUMP DD   SYSOUT=&SOUT
//*


Changes for the MVS compiler

Compiler version: 05.2017

In fact, nothing had to be changed for the MVS version of the compiler. The XRUNPARM.ASS startup module for CMS, which converts the CMS (tokenized) command line into an OS type parameter, is no longer needed for MVS. That means the generated Pascal modules can be started directly at their $PASENT entry points.

There were some issues with the runtime which had to be resolved, that is, errors that did not show up on CMS but had to be fixed for MVS.

The major effort began when I tried to support the PASSNAP snapshot utility on MVS, too. PASSNAP works for applications which consist of different source files. For every procedure or function, the name of the source file is recorded in the procedure meta data in the load module. In case of a snapshot, PASSNAP reads the source file name from there and opens the appropriate DBGINFO file to get the debug information.

With CMS, PASSNAP does this by issuing a dynamic FILEDEF for a file called <source> DBGINFO. If it finds such a file, debug information can be used from there, otherwise processing continues without debug information.

With MVS, I decided that there should be a PDS library (for example PASCALN.DBGINFO) which contains one member for each source file, DBGINFO member name = source name. This DBGINFO library must be kept in sync with the load module library (maybe some sort of timestamp checking will be established later).

Now the problem arose: the runtime must be able to open files on PDS libraries with member names known at runtime and to return "not found" conditions (and not abend) when members are not present. Because PASSNAP is written in Pascal completely, this had to be provided as an extension to the normal Pascal I/O functions (that is: RESET and REWRITE).

After some thinking, I decided that it would be best to allow this for all Pascal programs, not just PASSNAP.

I extended the Pascal File control block (FCB) to make room for the 8 byte member name. Then I provided a new function ASSIGNMEM to set the member name in the FCB to a value other than blank (which is the default).

ASSIGNMEM is defined inside PASUTILS at the moment, and there is no built-in definition for it in the compiler, so it is limited to TEXT files. This will be corrected later.

Definition of ASSIGNMEM:


procedure ASSIGNMEM ( var X : TEXT ; MEMBNAME : CHARPTR ; LEN : INTEGER ) ;
   var FCB : VOIDPTR ;
       CPT : -> CHAR ;
   begin (* ASSIGNMEM *)
     if PLATF = PLATF_UNKNOWN then
       CHECK_PLATFORM ;
     if PLATF = PLATF_INTEL then
       begin
       end (* then *)
     else
       begin
         FCB := FILEFCB ( X ) ;
         CPT := PTRADD ( FCB , 40 ) ;
         MEMSET ( CPT , ' ' , 8 ) ;
         if MEMBNAME <> NIL then
           if LEN > 8 then
             MEMCPY ( CPT , MEMBNAME , 8 )
           else
             MEMCPY ( CPT , MEMBNAME , LEN )
       end (* else *)
   end (* ASSIGNMEM *) ;

Usage example of ASSIGNMEM:


var PDS : TEXT ;
    MEMBERNAME : CHAR8 ;
    P : VOIDPTR ;
    CP : -> CHAR ;

...

MEMBERNAME := 'TEST' ;
ASSIGNMEM ( PDS , ADDR ( MEMBERNAME ) , 8 ) ;
RESET ( PDS ) ;
P := FILEFCB ( PDS ) ;
CP := PTRADD ( P , 32 ) ;
if CP -> = '0' then
  return ;

Successful execution of the RESET is indicated by a value other than zero in the open flag at the FCB offset 32.
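The way ASSIGNMEM fills the member name field can also be expressed in C. This is a sketch only: the FCB is modelled as a plain byte array, the function name assignmem_sketch is hypothetical, and the only layout detail taken from the text is the 8 byte member name at offset 40.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum {
    FCB_MEMBER_OFFSET = 40,   /* offset used by ASSIGNMEM in the text */
    FCB_MEMBER_LEN    = 8     /* member names are 8 bytes, blank padded */
};

/* Store a blank-padded member name in the FCB image.
   A NULL name resets the field to blanks (the default). */
void assignmem_sketch(unsigned char *fcb, const char *membname, size_t len)
{
    assert(fcb != NULL);

    unsigned char *field = fcb + FCB_MEMBER_OFFSET;
    memset(field, ' ', FCB_MEMBER_LEN);
    if (membname != NULL) {
        /* longer names are cut off after 8 characters */
        size_t n = len > FCB_MEMBER_LEN ? FCB_MEMBER_LEN : len;
        memcpy(field, membname, n);
    }
}
```

Blank padding matters here because PDS member names are fixed-length fields; a shorter name such as TEST ends up as TEST followed by four blanks.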

There is still some work to be done, but using ASSIGNMEM, a first version of PASSNAP could be implemented for MVS which seems to work and gives the same results as the CMS version.

The CMS version of PASSNAP has been renamed to PASSNAPC; the only difference is in fact the open logic of the DBGINFO files.

Later on the wish list: PASSNAP for Windows, OS/2 and Linux.


Changes to PASSNAP and PASSNAPC - better error handling in PASMONN

Compiler version: 05.2017

I did some more improvements to the error handling routines inside the Pascal monitor PASMONN (Mainframe version); this was inspired by the MVS SYMPTOM DUMP and by some of the more useful outputs from CEEDUMP.

See these examples:

1.) Error 0C4 when using a wrong pointer:


**** INTERRUPT PSW : 078D0004 200A54AE
**** REGS 0 - 3 : 00000040 000B5530 008566B0 00000006
**** REGS 4 - 7 : 00000004 400A5A1C 00000000 800A9DE2
**** REGS 8 - 11 : 000B51B8 000B5154 000A5428 000A6424
**** REGS 12 - 15 : 000B5050 000B5438 000A5428 000A5428
**** INTERRUPT ADDRESS AT : 000A54AE
**** RUN ERROR AT LOCATION : 00000086 OF PROCEDURE : FIBONACCI
**** ERROR CODE IS 2004 : ADDRESSING EXCEPTION
**** CODE AROUND ERROR PSW : A13C502D 007492C1 2000482D
**** CALL STACK:
     CALLED    ENTRY                 CALLER    CALLOFFS
     FIBONACCI (000A5428) CALLED BY  FIBONACCI (000000D8)
     FIBONACCI (000A5428) CALLED BY  FIBONACCI (000000D8)
     FIBONACCI (000A5428) CALLED BY  FIBONACCI (000000D8)
     FIBONACCI (000A5428) CALLED BY  FIBONACCI (000000D8)
     FIBONACCI (000A5428) CALLED BY  FIBONACCI (000000D8)
     FIBONACCI (000A5428) CALLED BY  $PASMAIN  (00000130)
     $PASMAIN  (000A5598) CALLED BY  Pascal Monitor
**** ENTRY POINT $PASENT AT : 000A57B0
**** BOTTOM OF RUNTIME STACK : 000B5050
**** CURRENT STACK FRAME : 000B5438
**** CURRENT HEAP POINTER : 00493F78
**** POINTER TO TOP OF HEAP : 00493F78

Look at the CALL STACK, which is new, and at CODE AROUND ERROR PSW, which shows the machine instruction that caused the abend.

2.) Runtime check: SUBRANGE error


**** INTERRUPT ADDRESS AT : 000A5524
**** RUN ERROR AT LOCATION : 000000D4 OF PROCEDURE : FIBONACCI
**** ERROR CODE IS 1002 : SUBRANGE VALUE OUT OF RANGE
**** THE OFFENDING VALUE : -1 IS NOT IN THE RANGE : 0 30
**** CALL STACK:
     CALLED    ENTRY                 CALLER    CALLOFFS
     FIBONACCI (000A5450) CALLED BY  FIBONACCI (000000B8)
     FIBONACCI (000A5450) CALLED BY  FIBONACCI (000000B8)
     FIBONACCI (000A5450) CALLED BY  FIBONACCI (000000B8)
     FIBONACCI (000A5450) CALLED BY  FIBONACCI (000000B8)
     FIBONACCI (000A5450) CALLED BY  FIBONACCI (000000B8)
     FIBONACCI (000A5450) CALLED BY  FIBONACCI (000000B8)
     FIBONACCI (000A5450) CALLED BY  FIBONACCI (000000B8)
     FIBONACCI (000A5450) CALLED BY  FIBONACCI (000000B8)
     FIBONACCI (000A5450) CALLED BY  $PASMAIN  (00000130)
     $PASMAIN  (000A5598) CALLED BY  Pascal Monitor
**** ENTRY POINT $PASENT AT : 000A57B0
**** BOTTOM OF RUNTIME STACK : 000B5050
**** CURRENT STACK FRAME : 000B5578
**** CURRENT HEAP POINTER : 00493F78
**** POINTER TO TOP OF HEAP : 00493F78

3.) Stack / Heap collision, that is, the program runs out of memory (2000 KB in this case):


**** INTERRUPT ADDRESS AT : 000A53E8
**** RUN ERROR AT LOCATION : 00000000 OF PROCEDURE : FIBONACCI
**** ERROR CODE IS 1006 : STACK/HEAP COLLISION
**** CALL STACK:
     CALLED    ENTRY                 CALLER    CALLOFFS
     FIBONACCI (000A53E8) CALLED BY  FIBONACCI (000000FA)
     FIBONACCI (000A53E8) CALLED BY  FIBONACCI (000000FA)
     FIBONACCI (000A53E8) CALLED BY  FIBONACCI (000000FA)
     FIBONACCI (000A53E8) CALLED BY  FIBONACCI (000000FA)
     FIBONACCI (000A53E8) CALLED BY  $PASMAIN  (00000130)
     $PASMAIN  (000A5598) CALLED BY  Pascal Monitor
**** ENTRY POINT $PASENT AT : 000A57B0
**** BOTTOM OF RUNTIME STACK : 000B5050
**** CURRENT STACK FRAME : 001EDBB8
**** CURRENT HEAP POINTER : 0020CB78
**** POINTER TO TOP OF HEAP : 002A8F78

The error occurs when a new stack frame is to be allocated and the requested frame is too large to fit into the remaining space. See the values at the end of the dump, which show the limits of the areas. The stack grows upwards, the (classical) heap downwards.
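The collision check can be sketched in C as follows. The function names are hypothetical, and treating the CURRENT STACK FRAME address from the dump as the top of the used stack is a simplifying assumption; the actual check inside PASMONN may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes still free between the upward-growing stack and the
   downward-growing classical heap. */
size_t stack_heap_gap(uintptr_t stack_top, uintptr_t heap_ptr)
{
    assert(stack_top <= heap_ptr);
    return (size_t) (heap_ptr - stack_top);
}

/* Returns 1 if a new frame of frame_size bytes fits into the gap,
   0 if allocating it would collide with the heap. */
int frame_fits(uintptr_t stack_top, size_t frame_size, uintptr_t heap_ptr)
{
    return frame_size <= stack_heap_gap(stack_top, heap_ptr);
}
```

For the dump above, the gap between 001EDBB8 and 0020CB78 is 126912 bytes, so under this assumption any larger frame request would be reported as a collision.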

There is another heap (since 2016), similar to the LE heap, which is accessible through the new functions ALLOC and FREE; it lives outside of this "classical" heap and makes room for another 4 to 6 MB of storage, depending on address space size.

PASSNAP and PASSNAPC - language specific SNAP dumps

For the kind of error diagnosis shown above, there is no need for a DBGINFO file or member. But if you have one, you can link your module with PASSNAP (on MVS) or PASSNAPC (on CMS) and you will get still better error diagnosis. Source line numbers are shown, and all variables will be printed in Pascal notation with the true Pascal variable names.

See these examples:

SNAPSHOT-Example (no program error)
PASSNAP output following S0C4 error
Same S0C4, but no PASSNAP due to NOSNAP option (PASMONN error handling)
Same S0C4, but SYSUDUMP due to NOSPIE option
PASSNAP output following Subrange error

New standard type ANYFILE - VOIDPTR renamed to ANYPTR

Compiler version: 05.2017

I am planning to rewrite parts of the Pascal runtime in Pascal and, as mentioned before, I am continually extending the Pascal runtime, for example to work with PDS members on MVS.

What makes things difficult:

The Pascal language defines files as typed files; files differ by the type of their elements. So for example it is not possible to write a function that works for a file of type TEXT (FILE OF CHAR) as well as for a file of type FILE OF INTEGER, even if the function doesn't do anything with respect to the content of the file - e.g. if the function only ASSIGNs file names or member names to the file.

The only solution to this problem until now was: the function had to be implemented by the compiler itself, and the compiler could be tolerant of all types of files.

But I wanted to be able to write such functions as ASSIGN in Pascal, in a library separate from the compiler.

My solution is:

I added a new standard type ANYFILE to the compiler, much the same way as I added the new type VOIDPTR last year. The type ANYFILE is compatible with every other file type. But because you cannot assign files using the assignment statement, the only thing you can do is call a procedure or function and pass a "normal" file argument to an ANYFILE var parameter. Inside the function, a (somewhat) restricted set of actions may be executed on the ANYFILE variable, for example ASSIGN, or FILEFCB, which gives you access to the file's FCB.

This extension will be present in the next release of the Stanford Pascal compiler; then the ASSIGN and ASSIGNMEM procedures, which were limited to TEXT files until now, will be usable with all sorts of files.

BTW: ANYPTR is the new name for VOIDPTR; VOIDPTR will be kept for compatibility reasons. For both ANYPTR and ANYFILE, access to the elements (dereferencing) via the arrow operator is not allowed; this is now flagged by the compiler with two new error messages:

187: DEREF NOT ALLOWED WITH ANYPTR TYPE;
188: ACCESS TO FILE BUFFER NOT ALLOWED WITH ANYFILE TYPE.

Here is a short example of a function using the new ANYFILE type:


procedure ASSIGNMEM ( var X : ANYFILE ; MEMBNAME : CHARPTR ; LEN : INTEGER ) ;
   var FCB : VOIDPTR ;
       CPT : -> CHAR ;
   begin (* ASSIGNMEM *)
     if PLATF = PLATF_UNKNOWN then
       CHECK_PLATFORM ;
     if PLATF = PLATF_INTEL then
       begin
       end (* then *)
     else
       begin
         FCB := FILEFCB ( X ) ;
         CPT := PTRADD ( FCB , 40 ) ;
         MEMSET ( CPT , ' ' , 8 ) ;
         if MEMBNAME <> NIL then
           if LEN > 8 then
             MEMCPY ( CPT , MEMBNAME , 8 )
           else
             MEMCPY ( CPT , MEMBNAME , LEN )
       end (* else *)
   end (* ASSIGNMEM *) ;

This way, ASSIGNMEM is usable for every type of file.


MVS version available for download

Compiler version: 05.2017

All the files needed for the Pascal system on MVS (or z/OS) have been packed into a single ZIP file; this ZIP file is available from this website (see Resources paragraph). There are also instructions on how to install the contents of this ZIP file on an MVS or z/OS system. You only need an FTP connection; most of the files are text files, but some (object) files are binary (recfm = f, lrecl = 80).

Some users have successfully installed the compiler based on this ZIP file on modern z/OS systems; I am very happy to report that the compiler is running there, too (AMODE 24, at the moment). Same goes for the compiler generated programs.


Stanford Pascal works on MacOS and on (modern) z/OS

Compiler version: 05.2017

With the help of Rene Jansen, the Stanford Pascal compiler was tested on two new targets:

- (modern) z/OS ... in fact, Rene tried it on a z/OS V1 R8. Some issues were the program name of the linkage editor (must be IEWL on modern systems) and the UNIT=xxx name for the temporary datasets, which had to be customized according to the needs of the local installation. Furthermore, a dataset had to be defined (PO, LRECL 80, RECFM F) to hold the DBGINFO members, which are needed even in the case when no debugging support is desired

- and Mac OS. Rene discovered that the method to assign external names to Pascal files (environment variables with prefix DD:) is not valid on all platforms, because the colon is not accepted on all platforms inside the names of environment variables. So we decided to change that to DD_

for example:

to assign an external filename EINGABE.TXT with optional path information to the Pascal file INPUT, you have to code (on Windows):

SET DD_INPUT=C:/MY_PATH/EINGABE.TXT
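How a runtime might resolve such an assignment can be sketched in C. The helper names dd_env_name and dd_lookup are hypothetical, and converting the Pascal file name to upper case is an assumption; only the DD_ prefix is taken from the text.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Build "DD_<NAME>" from a Pascal file name.
   Returns 0 on success, -1 if buf is too small. */
int dd_env_name(const char *pascal_name, char *buf, size_t size)
{
    assert(pascal_name != NULL && buf != NULL);

    size_t len = strlen(pascal_name);
    if (len + 4 > size)                 /* "DD_" + name + terminating NUL */
        return -1;

    memcpy(buf, "DD_", 3);
    for (size_t i = 0; i < len; i++)    /* assumption: names are upper-cased */
        buf[3 + i] = (char) toupper((unsigned char) pascal_name[i]);
    buf[3 + len] = '\0';
    return 0;
}

/* External file name assigned to a Pascal file, or NULL if none is set. */
const char *dd_lookup(const char *pascal_name)
{
    char var[64];

    if (dd_env_name(pascal_name, var, sizeof var) != 0)
        return NULL;
    return getenv(var);
}
```

An underscore is accepted in environment variable names on all the platforms mentioned here, which is why it is a safer separator than the colon.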

Many thanks to Rene for his interest in the compiler and his support !!

Excerpts from the z/OS Compile Job

Terminal flag in the Pascal FCB - effects on I/O functions

Compiler version: 05.2017

I had some problems with terminal I/O (especially input) on CMS and Windows. The "normal" behaviour of the READ and READLN procedures doesn't go well with the needs of terminal I/O.

The Pascal FCB (File Control Block) now contains a Terminal flag, which is set on the mainframe (CMS) by use of the TERMIN and TERMOUT functions (TSO support still has to be done). TERMIN and TERMOUT also do FILEDEF xxxx TERM on the Pascal TEXT file, which is specified as the argument.

On the other platforms (Non-mainframe), the Terminal flag is set by TERMIN and TERMOUT, too, or by assigning *stdin* or *stdout* to Pascal TEXT files; it is also set implicitly for INPUT and OUTPUT, if there are no external filenames assigned to them using environment variables.

The terminal flag affects some I/O functions:

- RESET on input files does not read the first input buffer (if it would, a line of input would be requested from the user already at RESET time)

- READLN does not read a new input buffer, instead it flushes (ignores) the rest of the existing buffer and schedules a new buffer input for the next READ operation (very important)

- READ of characters does not implicitly read a new input buffer at EOLN; instead the program is required to explicitly READLN at EOLN, so that the next READ of char will read a new input buffer (this way the programmer gets more control over when an input buffer will be requested from the terminal user)

- READ of integers, reals and booleans still reads until the next nonblank character, and, if needed, these functions even schedule buffer input; this was done for compatibility reasons, because many existing programs expect sequences of WRITELN / READ / WRITELN / READ (involving integers or reals) to work

- maybe there will be (for example) additional READ integer functions in the future, which have a maximum field width, e.g. READ (I : 8); in this case, a maximum of 8 chars would be read and no buffer input would ever occur, even if EOLN would be reached - this would not depend on the Terminal flag at all

Caution: these changes only affect Terminal files, normal File I/O is not affected.
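
The rules for RESET, READLN and READ of characters can be pictured with a toy model in Python. It is a sketch under assumptions, not the runtime's implementation: the class name TerminalText, the requests counter and the choice to return a blank when READ hits EOLN are all invented for the illustration. READ of integers, reals and booleans is not modelled.

```python
class TerminalText:
    """Toy model of a Pascal TEXT file with the Terminal flag set."""

    def __init__(self, lines):
        self._lines = list(lines)   # what the simulated user will type
        self._buf = ""              # current input buffer
        self._pos = 0               # read position inside the buffer
        self._pending = True        # a buffer read is scheduled
        self.requests = 0           # how often the user was asked for input

    def reset(self):
        # RESET only schedules input; the user is not asked for anything yet.
        self._buf, self._pos, self._pending = "", 0, True

    def _fill(self):
        # Fetch a new buffer, but only if RESET or READLN scheduled one.
        if self._pending:
            self._buf = self._lines.pop(0) if self._lines else ""
            self._pos = 0
            self._pending = False
            self.requests += 1

    def eoln(self):
        # Assumption: EOLN only inspects the current buffer.
        return self._pos >= len(self._buf)

    def read_char(self):
        self._fill()
        if self.eoln():
            return " "              # assumption: blank at EOLN, no new request
        ch = self._buf[self._pos]
        self._pos += 1
        return ch

    def readln(self):
        # Flush the rest of the buffer and schedule the next one.
        self._buf, self._pos, self._pending = "", 0, True


if __name__ == "__main__":
    t = TerminalText(["ab", "c$"])
    t.reset()
    print(t.requests)               # still 0 after RESET
    print(t.read_char(), t.requests)
```

The point of the model is the requests counter: it only changes inside read_char, never in reset or readln.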

For example: program TESTCEIN.PAS


program TESTCEIN ( OUTPUT ) ;

var C : CHAR ;


procedure WAIT ;

(**********************************)
(*   WARTET DARAUF, DASS EINE     *)
(*   EINGABE ERFOLGT.             *)
(**********************************)

   var I : INTEGER ;
       C : CHAR ;

   begin (* WAIT *)
     WRITELN ( 'Start Funktion Wait' ) ;
     READLN ;
     WRITELN ( 'Ende Funktion Wait' ) ;
   end (* WAIT *) ;


procedure READY ;

(************************************)
(*   READY-MELDUNG, DIE QUITTIERT   *)
(*   WERDEN MUSS.                   *)
(************************************)

   begin (* READY *)
     WRITELN ;
     WRITELN ( 'FUNKTION AUSGEFUEHRT (ENTER DRUECKEN)' ) ;
     WAIT
   end (* READY *) ;


begin (* HAUPTPROGRAMM *)
  if TRUE then
    begin
      TERMIN ( INPUT ) ;
      TERMOUT ( OUTPUT )
    end (* then *) ;
  CLRSCRN ;
  WRITELN ( 'lesen bis zum dollarzeichen ...' ) ;
  WRITELN ( 'erst mal initialisierung ...' ) ;
  WRITELN ( 'start einleseschleife ...' ) ;
  RESET ( INPUT ) ;
  repeat
    READ ( C ) ;
    WRITELN ( 'gelesen: <' , C , '> ord = ' , ORD ( C ) : 3 ) ;
    if EOLN then
      begin
        WRITELN ( '*eoln*' ) ;
        READLN ;
        WRITELN ( 'readln ausgefuehrt' ) ;
        if EOF then
          WRITELN ( '*eof*' ) ;
      end (* then *)
  until ( C = '$' ) or EOF ;
end (* HAUPTPROGRAMM *) .


Integer and character constants in hex and binary representation

Compiler version: 08.2017

Although character constants written in hex can cause portability problems, some utilities need such facilities. For example, I recently had to write a program on the mainframe that acts as an HTTP client talking to a web server. The answers are in ASCII (or UTF-8), so parsing the HTTP header involves looking for X'0a' etc., even on the mainframe.

Or: if you want to define code tables, that is: char arrays with constant initializers, hex notation is very convenient.

I added X'hh' and B'bbbbbbbb' as another possibility to specify character constants; you may add underscores to improve readability (for example to separate 4 or 8 bit groups in longer constants).

A few versions earlier, 0xhhhh had already been added as another notation for integer constants.

Now there are several problems or pitfalls:

- a program containing a mixture of "normal" character constants and "hex" character constants in a case statement (for example) will probably not compile on all platforms

- same goes for set constants or other expressions involving range expressions

Please look at the following example:


program TESTXB ( OUTPUT ) ;

type SET_CHAR = set of CHAR ;

var C : CHAR ;
    S1 : SET_CHAR ;
    S2 : SET_CHAR ;
    S3 : SET_CHAR ;
    S4 : SET_CHAR ;

begin (* HAUPTPROGRAMM *)
  C := 'A' ;
  WRITELN ( 'c = <' , C , '>' , ORD ( C ) ) ;
  C := X'31' ;
  WRITELN ( 'c = <' , C , '>' , ORD ( C ) ) ;
  C := B'110011' ;
  WRITELN ( 'c = <' , C , '>' , ORD ( C ) ) ;
  case C of
    '1' : WRITELN ( 'eins' ) ;
    '5' : WRITELN ( 'fuenf' ) ;
    X'31' : WRITELN ( 'nochmal eins' ) ;
    X'32' : WRITELN ( 'nochmal zwei' ) ;
    X'33' : WRITELN ( 'drei als hex' ) ;
    B'1000110' : WRITELN ( 'f binaer' ) ;
    X'41' .. 'E' : WRITELN ( 'a als hex' ) ;
  end (* case *) ;
  WRITELN ( X'4142434445_414b4c4d4e' ) ;
  WRITELN ( B'01001111_01000011' ) ;
  S1 := [ 'A' .. 'E' ] ;
  S2 := [ X'41' .. 'E' ] ;
  S4 := [ 'A' .. X'45' ] ;
end (* HAUPTPROGRAMM *) .

This program will not compile on an ASCII machine, because the two case labels '1' and X'31' denote the same value there (but only in ASCII). On an EBCDIC machine, the expression 'A' .. X'45' will not work, because the internal value of 'A' is much higher than X'45'. So you will get a syntax error on both platforms.
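
The two failure cases can be checked with a few lines of Python. The sketch assumes that the code page cp037 is a fair stand-in for the mainframe's EBCDIC variant; letters and digits sit at the same positions in the common EBCDIC code pages, but the code page actually used is not stated here.

```python
def code(ch, encoding):
    """Internal value of a character in the given code page."""
    return ch.encode(encoding)[0]


# "cp037" is an assumed stand-in for the mainframe's EBCDIC code page.
for name, enc in (("ASCII", "ascii"), ("EBCDIC", "cp037")):
    one, a, e = code("1", enc), code("A", enc), code("E", enc)
    print(f"{name}: '1' = X'{one:02X}', 'A' = X'{a:02X}', 'E' = X'{e:02X}'")
    print("  case labels '1' and X'31' collide:", one == 0x31)
    print("  range 'A' .. X'45' is empty:     ", a > 0x45)
    print("  range X'41' .. 'E' is empty:     ", 0x41 > e)
```

The collision only shows up for ASCII and the empty range only for EBCDIC, which is why each platform rejects a different line of the program.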

In a future release, the compiler will give warnings on mixed uses of normal and hex or binary notation of char constants that may lead to portability problems.

I hope that I discovered all places in the compiler where character constants are relevant; please feel free to contact me if you detect any errors introduced by this extension. Thanks in advance!


Improvement on Pascal sets - Part one

Compiler version: 08.2017

The improvement on Pascal sets took longer than expected. In fact I tried far too many changes in one sprint, and so the new version did not run at all. I had to fall back to the previous version and restart work from there, and I first had to restructure some parts of the internal set representation ... and some of the goals had to be moved to a later completion date.

The new version has the following extensions:

- Pascal sets may now have up to 2000 elements (previously 256)
- the internal origin of the set type is still zero
- that is: sets have to be in the range 0 .. 1999
- they may be of char type, integer subrange type, or scalar (aka enum)
- all integer subrange set types are compatible
- no negative integers in sets
- some errors removed and some performance enhancements
- characters and integers inside of set constants may be written in hex and binary notation
(example: 0x0a for integers, x'0a' and B'1111_0000' for characters)
- the PCINT debugger has been improved (type ? for online help)

Further improvements will be done in the next releases (preparations have been done already):

- min and max value of set types not limited to 0 .. 1999
- the origin may be in the range -16,000,000 to +16,000,000 (for integer subranges);
negative integers in sets are possible
- warning on portability issues with hex and binary constants in sets
- run time errors, when assigning out-of-range constants to sets (or sets of other subrange types)
- simpler P-Code instructions for sets
- new release of the P-Code documentation with the reworked set instructions (and other changed P-Code instructions)

The new version of the compiler already uses the new feature; for example: at the end, the compiler shows a listing of all error messages issued (by error message type).
Because the maximum error message code is 999 (it was 401 before), it is now possible to build a "set of errcode", where errcode is a subrange "1 .. maxerrnr".

The following example shows how a boolean expression involving IN and a char based set constant is implemented. First you see the (portable) P-Code, and then the resulting machine instructions from the mainframe variant (generated by the P-Code to 370 translator PASCAL2.PAS).


if CH in [ '+' , '-' , '*' , '/' ] then WRITELN ( 'arithm. operator' ) ;

The generated P-Code looks like this (only the IF part is shown, not the WRITELN) - including some comments on the right side:


LOD C,1,912           load the char CH
ORD                   convert to integer
LCA S,C32'+*-/'       load the set
SLD 32,968            copy to workarea
INN                   check, if int is in set
FJP L13               jump, if false

The copy to workarea could be omitted in future releases, room for improvement.

The P-Code translator builds the following 370-code from this (you may notice that no code is produced for the SLD instruction, which copies the set):


-------------------- LOC 246 --------------------------------
   0714:  LOD  C,1,912
   0714:  ORD
@@ 0714:  SR   2,2
@@ 0716:  IC   2,912(13)          load ch into reg 2
   071A:  LCA  S,C32'+*-/'
   071A:  SLD  32,968
   071A:  INN
@@ 071A:  AH   2,=H'-72'          subtract 72
@@ 071E:  L    3,=X'020008C0'     bit string (part of set)
@@ 0722:  LA   0,32
@@ 0726:  CLR  2,0                look if ch - 72 > 32
@@ 0728:  BC   11,*+10            if so, jump false
@@ 072C:  SLL  3,0(2)             shift bit in set left
@@ 0730:  LTR  3,3                test leftmost bit
   0732:  FJP  L13
@@ 0732:  BC   11,L13             jump false, if off

Of course, this only works, because the characters in the set constant are within a codepage range of size 32. If the set range is larger, there are other implementations. But anyway: the original implementors of PASCAL2 did a great job, and I tried not to compromise this, although the maximum size of sets has been extended.
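
The generated 370 code can be replayed in Python to see why it works. This is a model of the instruction sequence shown above, not translator code; the function name in_operator_set is made up for the example. Note that BC 11 after CLR branches on "equal or high", so an index of exactly 32 is rejected as well.

```python
SET_BITS = 0x020008C0   # bit string from the listing (L 3,=X'020008C0')
ORIGIN = 72             # value subtracted by AH 2,=H'-72'


def in_operator_set(ch_code):
    """Replay AH / CLR / SLL / LTR for one EBCDIC character code."""
    idx = (ch_code - ORIGIN) & 0xFFFFFFFF       # 32-bit register arithmetic
    if idx >= 32:                               # CLR + BC 11: unsigned compare
        return False                            # (negative values look huge)
    shifted = (SET_BITS << idx) & 0xFFFFFFFF    # SLL 3,0(2)
    return bool(shifted & 0x80000000)           # LTR: is the leftmost bit on?


if __name__ == "__main__":
    # cp037 is assumed here as the EBCDIC code page of the mainframe.
    for ch in "+-*/A ":
        print(repr(ch), in_operator_set(ch.encode("cp037")[0]))
```

With the EBCDIC codes X'4E', X'5C', X'60' and X'61' for + * - / the indexes after the subtraction are 6, 20, 24 and 25, which are exactly the bits set in X'020008C0' (counting from the left, starting at 0).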

The new version will be available in the next few days from the New Stanford Pascal compiler website.


String constants built from several mixed parts (hex or binary)

Compiler version: 10.2017

String constants may be concatenated simply by writing them one after the other, separated by blanks. The total length may not exceed 254 at the moment.

With older releases of the compiler, it was not possible to mix normal, hex and binary representations in such concatenations, and the length limit was not computed correctly for hex string constants.

These problems were solved during the migration to the new (external) scanner PASSCAN.

Definitions like the following are now possible:


const S_ESC = X'1b' ;
      S_LF = X'0a' ;
      S_CR = X'0d' ;
      S_VS_0 = X'0d' ;
      S_VS_1 = X'1b' '&l12D' X'0d0a' ;
      S_VS_2 = X'1b' '&l6D' X'0d0a' ;
      S_VS_3 = X'1b' '&l4D' X'0d0a' ;
      S_VS_4 = X'1b' '&l3D' X'0d0a' ;
      S_SCHATTEIN = X'1b' '(s3B' ;
      S_SCHATTAUS = X'1b' '(s0B' ;
      S_CPI10 = X'1b' '(s10H' ;
      S_CPI12 = X'1b' '(s13H' ;
      S_TRANSP = X'1b' '&p1X' ;
      S_C24 = X'1b' '&p1X' X'18' ;
      S_C25 = X'1b' '&p1X' X'19' ;
      S_CPI10VS2 = X'1b' '&l6D' X'0d0a' ;
      S_CPI12VS2 = X'1b' '&l6C' X'0d0a' ;

You may have noticed that these constants are printer control sequences for an HP printer (PCL = printer control language). I translated a very old BASIC program (from the 1980s) into Pascal, which performed simple layout services on text documents, and while doing this, the need for such constant definitions came up.
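
How such a mixed constant turns into bytes can be sketched in Python. This is not PASSCAN logic; the function build_constant, its parameters and the left-padding of binary parts to whole bytes are assumptions of the sketch, and the underscore placement rules are not checked strictly.

```python
import re

# One part of a constant: X'..' (hex), B'..' (binary) or '..' (plain text).
PART = re.compile(r"X'([0-9A-F_]+)'|B'([01_]+)'|'((?:[^']|'')*)'",
                  re.IGNORECASE)


def build_constant(source, encoding="latin-1", max_len=254):
    """Concatenate the blank-separated parts of a string constant."""
    source = source.strip()
    out = bytearray()
    pos = 0
    while pos < len(source):
        m = PART.match(source, pos)
        if m is None:
            raise ValueError(f"bad constant part at column {pos + 1}")
        hex_part, bin_part, text = m.groups()
        if hex_part is not None:
            out += bytes.fromhex(hex_part.replace("_", ""))
        elif bin_part is not None:
            bits = bin_part.replace("_", "")
            # Assumption: binary parts are padded on the left to whole bytes.
            out += int(bits, 2).to_bytes((len(bits) + 7) // 8, "big")
        else:
            out += text.replace("''", "'").encode(encoding)
        pos = m.end()
        while pos < len(source) and source[pos] == " ":
            pos += 1                      # skip the separating blanks
    if len(out) > max_len:
        raise ValueError("string constant too long")
    return bytes(out)


if __name__ == "__main__":
    print(build_constant("X'1b' '&l6D' X'0d0a'"))
```

The encoding parameter stands for the platform's character set; on the mainframe the plain text parts would be EBCDIC, while the hex and binary parts are taken as they are.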


SIZEOF usable on string constants

Compiler version: 10.2017

Until now, SIZEOF was only usable on type identifiers and on variables. When using it on a constant identifier, a syntax error P103 was shown (IDENTIFIER IS NOT OF APPROPRIATE CLASS).

SIZEOF is now usable on const identifiers, too.

For example:


procedure INIT ( var SCB : SCAN_BLOCK ) ;

   const MSG1 = 'symbol not known to source program scanner' ;
         MSG2 = 'unexpected end of source file' ;
         MSG3 = 'unexpected char in options string' ;
         MSG4 = 'closing paranthese expected in options string' ;
         MSG5 = 'the rest of the options string will be ignored' ;

   var C : SCAN_ERRCLASS ;
       N : INTEGER ;

   begin (* INIT *)
     ...
     PASSCANF ( SCB , 'S' , 'S' , 1 , MSG1 , SIZEOF ( MSG1 ) ) ;
     PASSCANF ( SCB , 'S' , 'S' , 2 , MSG2 , SIZEOF ( MSG2 ) ) ;
     PASSCANF ( SCB , 'S' , 'S' , 3 , MSG3 , SIZEOF ( MSG3 ) ) ;
     PASSCANF ( SCB , 'S' , 'S' , 4 , MSG4 , SIZEOF ( MSG4 ) ) ;
     PASSCANF ( SCB , 'S' , 'S' , 5 , MSG5 , SIZEOF ( MSG5 ) ) ;
   end (* INIT *) ;


New source program scanner (PASSCAN) - separate module

Compiler version: 10.2017

This is a change I had wanted to make for a long time.

The old source program scanner (procedure INSYMBOL) has been completely replaced by a new scanner called PASSCAN. The new scanner is no longer hand-written; it is generated by a scanner-generating tool that was written at the Computer Science department of Stuttgart University in 1980 by four students (including myself). I extended this scanner generator in 1996 to make a usable product out of it, and I have used it in many projects from 1996 until today. PASSCAN is an external module, separate from the compiler. It does all the source handling and it writes the compile listing.

The new scanner will make extensions to the compiler's symbol repertoire much easier, because it is generated from a "grammar", which is in fact a large regular expression (with attributes). The scanner generator works similarly to the well-known Unix tool "lex".

With the help of the new scanner, I added some (scanner-related) features to the compiler:

- C++ style comments (terminated at the end of the line)
- binary integer constants
- variables starting with an underscore
- improvements on the compiler listing
- compiler messages (including source text) shown at terminal output

This last improvement is very helpful during development, because in most cases it is no longer necessary to open the listing file and look for the compiler messages there, which speeds up development and makes it more fun.

Example: when compiling the old compiler with the new compiler, you get the following warning:


**** STANFORD PASCAL COMPILER, OPPOLZER VERSION OF 10.2017 ****

  1839 (*$D- ... MUST BE IN EFFECT FOR THIS LOOP *)
       !
** Warning S005: the rest of the options string will be ignored

**** Compiler Summary ****
****      WARNING: PASCAL EXTENSIONS USED.
****      1 Warning.
****  15161 LINE(S) READ,  157 PROCEDURE(S) COMPILED,
****  25920 P_INSTRUCTIONS GENERATED,   7.22 SECONDS IN COMPILATION.

The warning is new (it was not present on the old compiler); it says that the options in the comments are followed by text which is not considered as an option. This warning is shown at the Windows or CMS console (for example), given the appropriate DD assignments for OUTPUT. The number 1839 on the left is the source line number.

Some more experience from the restructuring of the compiler

By extracting the scanner logic from the compiler, the overall structure of the compiler became much clearer, and some unused or strange constructs could be removed.

But there were some strange situations during the migration, too.

First of all, the Pascal grammar has a well-known problem: when a subrange definition involving integers starts this way:

1..50

the scanner first thinks it could be the beginning of a real number, and when it encounters the second period, it has to revise this hypothesis. I fixed this by introducing another symbol into the grammar, which I called INTDOTDOT; this is an integer constant followed by two dots. This way my grammar works without backtracking over characters already read.
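
The effect of the extra symbol can be shown with a small Python tokenizer. It is only an illustration of the token set: the scanner generator builds a finite state machine, whereas Python's re module is a backtracking engine, and the simplified REAL pattern (no exponent) is an assumption of the sketch.

```python
import re

TOKEN = re.compile(r"""
      (?P<INTDOTDOT>\d+\.\.)     # integer constant directly followed by '..'
    | (?P<REAL>\d+\.\d+)         # simplified real number (no exponent part)
    | (?P<INT>\d+)
    | (?P<DOTDOT>\.\.)
    | (?P<SKIP>\s+)
""", re.VERBOSE)


def scan(text):
    """Split text into (symbol, lexeme) pairs, dropping blanks."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at column {pos + 1}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    print(scan("1..50"))     # INTDOTDOT followed by INT
    print(scan("1.5"))       # a single REAL
    print(scan("1 .. 50"))   # INT, DOTDOT, INT
```

Because "1.." is a symbol of its own, the scanner never has to give back the first period after having tried to read a real number.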

There were some other problems; I first had an error in the definition of real numbers, and I forgot that identifiers can start with a dollar char in Stanford Pascal etc. etc., but all that can be fixed within minutes ... simply by generating a new scanner.

The compiler listing is produced by PASSCAN, too; I reworked it a little, but the information content stays the same. Some excerpts from the compiler listing:


1LINE #  D/NEST  LVL   < STANFORD PASCAL, OPPOLZER VERSION OF 10.2017 >   13:18:43  10/03/2017   PAGE 28
                   ....5...10....5...20....5...30....5...40....5...50....5...60....5...70..
  1729     3N   2)            begin
  1730     3N   2)              WRITE ( F , '1' ) ;
  1731     3N   2)              I := I - X ;
  1732     3N   2)            end (* then *)
  1733     2N   2)          else
  1734     2N   2)            WRITE ( F , '0' ) ;
  1735     2N   2)          X := X DIV 2 ;
  1736     2N   2)        end (* for *)
  1737           )    end (* WRITEBINBYTE *) ;
  1738           )
  1739           )
  1740           )
  1741           ) function MODP ( X : INTEGER ; Y : INTEGER ) : INTEGER ;
  1742           )
  1743   120D   2)    var M : INTEGER ;
  1744   120D   2)
  1745           )    begin (* MODP *)
  1746     1N   2)      M := X MOD Y ;
  1747     1N   2)      if M < 0 then
  1748     1N   2)        M := M + Y ;
  1749     1N   2)      MODP := M ;
  1750           )    end (* MODP *) ;
  1751           )
  1752           )
  1753           )
  1754           ) procedure INTTOSTR ( CP : VOIDPTR ; LEN : INTEGER ; VAL : INTEGER ;
  1755           )                      ZEROES : BOOLEAN ) ;
  1756           )
  1757   125D   2)    var BUFFER : array [ 1 .. 20 ] of CHAR ;
  1758   145D   2)        MINUS : BOOLEAN ;
  1759   146D   2)        LETZT : INTEGER ;
  1760   152D   2)        I : INTEGER ;
  1761   156D   2)        LIMIT : INTEGER ;
  1762   160D   2)        LENX : INTEGER ;
  1763   164D   2)        POSX : INTEGER ;
  1764   164D   2)
  1765           )    begin (* INTTOSTR *)
  1766     1N   2)      if VAL < 0 then
  1767     2N   2)        begin
  1768     2N   2)          VAL := - VAL ;
  1769     2N   2)          MINUS := TRUE
  1770     2N   2)        end (* then *)
  1771     1N   2)      else
  1772     1N   2)        MINUS := FALSE ;
  1773     1N   2)      I := 20 ;
  1774     1N   2)      BUFFER := ' ' ;
  1775     1N   2)      if VAL = 0 then
  1776     2N   2)        begin
  1777     2N   2)          BUFFER [ I ] := '0' ;
  1778     2N   2)          I := I - 1 ;
  1779     2N   2)        end (* then *)
  1780     1N   2)      else
  1781     1N   2)        while VAL > 0 do
  1782     2N   2)          begin
  1783     2N   2)            LETZT := VAL MOD 10 ;
  1784     2N   2)            BUFFER [ I ] := CHR ( ORD ( '0' ) + LETZT ) ;
  1785     2N   2)            I := I - 1 ;
  1786     2N   2)            VAL := VAL DIV 10 ;
  1787     1N   2)          end (* while *) ;
  1788     1N   2)      LIMIT := 20 - LEN + 1 ;
  1789     1N   2)      if MINUS then
  1790     1N   2)        LIMIT := LIMIT + 1 ;
  1791     1N   2)      if ZEROES then
  1792     1N   2)        while I > LIMIT do
                   ....5...10....5...20....5...30....5...40....5...50....5...60....5...70..

This part of the compiler listing shows the reworked compiler information.

On the left of the source lines, you have the source line number first, and then data offset or nesting information, depending on whether the source line contains declarations or executable statements.

From the data offsets shown above (line 1757 ff), you can see that the variable BUFFER, for example, is located at offset 125 of the automatic storage block of procedure INTTOSTR. It is 20 bytes long. The next variable MINUS is at offset 145 and is 1 byte long; LETZT is an integer (4 bytes) and is at 148 due to alignment needs, I at 152 and so on ...
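
The offsets named in this paragraph can be recomputed with a short Python sketch. The sizes and alignments in the table (1 byte for CHAR and BOOLEAN, 4 bytes with 4-byte alignment for INTEGER) are taken from the text; the function layout itself is only an illustration, not the compiler's allocation routine.

```python
def layout(start, variables):
    """Assign offsets; each variable is given as (name, size, alignment)."""
    offsets, pos = {}, start
    for name, size, align in variables:
        pos = (pos + align - 1) // align * align   # round up to the alignment
        offsets[name] = pos
        pos += size
    return offsets


# Local variables of INTTOSTR in declaration order.
INTTOSTR_VARS = [
    ("BUFFER", 20, 1),   # array [ 1 .. 20 ] of CHAR
    ("MINUS", 1, 1),     # BOOLEAN
    ("LETZT", 4, 4),     # INTEGER, must start on a multiple of 4
    ("I", 4, 4),
    ("LIMIT", 4, 4),
    ("LENX", 4, 4),
    ("POSX", 4, 4),
]

if __name__ == "__main__":
    for name, offset in layout(125, INTTOSTR_VARS).items():
        print(f"{name:8} at offset {offset}")
```

The listing shows 146 on the line of LETZT, that is the next free byte after MINUS; the rounding step moves the variable itself to 148.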

The nesting level is increased on every BEGIN symbol; maybe it would be better to increase it on IF, WHILE etc., too - much the same way as the indentation is done by PASFORM on the example above.

If the source contains errors or warnings, the messages are inserted directly into the source protocol (and shown on DD:OUTPUT, too):


1LINE #  D/NEST  LVL   < STANFORD PASCAL, OPPOLZER VERSION OF 10.2017 >   14:59:54  10/04/2017   PAGE 29
                   ....5...10....5...20....5...30....5...40....5...50....5...60....5...70..
  1793     3N   2)              LASTKIND := CURRKIND ;
  1794     3N   2)              while FREEPOS < CURRPOS do
  1795     4N   2)                begin
  1796     4N   2)                  WRITE ( LISTING , ' ' ) ;
  1797     4N   2)                  FREEPOS := FREEPOS + 1
  1798     3N   2)                end (* while *) ;
  1799     3N   2)              WRITE ( LISTING , CURRKIND ) ;
  1800     3N   2)              LASTPOS := CURRPOS
  1801     2N   2)            end (* else *) ;
  1802     2N   2)          if CURRNMR < 10 then
  1803     2N   2)            F := 1
  1804     2N   2)          else
  1805     2N   2)            if CURRNMR < 100 then
  1806     2N   2)              F := 2
  1807     2N   2)            else
  1808     2N   2)              F := 3 ;
  1809     2N   2)          WRITE ( LISTING , CURRNMR : F ) ;
  1810     2N   2)          FREEPOS := FREEPOS + F + 1
  1811     1N   2)        end (* for *) ;
  1812     1N   2)      WRITELN ( LISTING ) ;
  1813     1N   2)      ERRINX := 0 ;
  1814     1N   2)      if ERRORCNT > 0 then
  1815     1N   2)        PRCODE := FALSE ;
  1816     1N   2)      if ERRLN > 0 then
  1817     1N   2)        WRITELN ( LISTING , '****' : 9 ,
  1818     1N   2)                  ' PREVIOUS ERROR/WARNING ON LINE -->' , ERRLN : 4 ) ;
  1819     1N   2)      ERRLN := LINECNT ;
  1820           )    end (* PRINTERROR *) ;
  1821           )
  1822           )
  1823           )
  1824           ) procedure ENDOFLINE ;
  1825           )
  1826           )    label 10 ;
  1827           )
  1828   112D   2)    var I : 1 .. 9 ;
  1829   114D   2)        DCN : INTEGER ;
  1830   114D   2)
  1831           )    begin (* ENDOFLINE *)
  1832     1N   2)      if ERRINX > 0 then
  1833     1N   2)        PRINTERROR ;
  1834     1N   2)      READLN ( INPUT , LINEBUF ) ;
  1835     1N   2)      LINELEN := BUFEND ;
  1836     1N   2)
  1837     1N   2)      (*******************************************************)
  1838     1N   2)      (*THIS WILL SPEED THINGS UP IF NO MARGIN IS SET/RESET *)
  1839     1N   2)      (*$D- ... MUST BE IN EFFECT FOR THIS LOOP *)
                        !
** Warning S005: the rest of the options string will be ignored
  1840     1N   2)      (*******************************************************)
  1841     1N   2)
  1842     2N   2)      repeat
  1843     2N   2)        LINELEN := LINELEN - 1 ;
  1844     1N   2)      until LINEBUF [ LINELEN ] <> ' ' ;
  1845     1N   2)
  1846     1N   2)      (***********************************************************)
  1847     1N   2)      (* IF NEEDED, DEBUG SWITCH SHOULD BE RESTORED HERE ---> $D+*)
  1848     1N   2)      (***********************************************************)
  1849     1N   2)
  1850     1N   2)      10 :
  1851     1N   2)      if LINELEN > RMARGIN then
  1852     2N   2)        begin
  1853     2N   2)          MWARN := TRUE ;
                   ....5...10....5...20....5...30....5...40....5...50....5...60....5...70..


C++ style comments

Compiler version: 10.2017

What I like about C++ style comments: they need not be terminated, that is: you can start them by two slashes and they terminate automatically at the end of the line.

In my day job, I am coding in C and PL/1 most of the time. Most C compilers today have an option that allows C++ style comments in C programs, too, and my co-workers and I use this all the time.

The new scanner PASSCAN was introduced with the idea in mind, that C++ style comments should be possible in Stanford Pascal, too.

When I started working on the details, I soon discovered that it would be better to let the scanner only handle the comment starting symbol and to do the other comment handling manually. Well, that's not quite right: the comment handling is done by hand-written logic, but inside the skeleton which is used by the scanner generator, so it is generated logic, too - after all.

Here you have some relevant parts of PASSCAN.PAS:


begin
  SCB . SYMBOL [ SCB . LSYMBOL ] := ' ' ;
  SCB . LSYMBOL := SCB . LSYMBOL - 1 ;
  SCANNER2 ( ALTZUST , SCB ) ;
  case SCB . SYMBOLNR of
    COMMENT1 : COMMENT ( SCANINP , SCANOUT , SCB , 1 ) ;
    COMMENT2 : COMMENT ( SCANINP , SCANOUT , SCB , 2 ) ;
    COMMENT3 : COMMENT ( SCANINP , SCANOUT , SCB , 3 ) ;
    COMMENT4 : COMMENT ( SCANINP , SCANOUT , SCB , 4 ) ;
    COMMENT5 : COMMENT ( SCANINP , SCANOUT , SCB , 5 ) ;
    otherwise
  end (* case *)
end (* else *)

This code runs just after the finite state machine, which is at the core of the scanner procedure, has terminated. The final state (German: Endzustand) is in the variable ALTZUST, and the procedure SCANNER2 computes the SYMBOLNR from ALTZUST. If the SYMBOLNR is one of the five comment starting symbols, the routine COMMENT is called, which reads characters from the source program until the comment terminates.

Of course, the routine COMMENT has to take care of line ends, "page full" conditions, and: comments may be nested, so there are recursive calls of COMMENT, too. For more detail, you may look at the source of PASSCAN.

COMMENT type 5 is the new C++ style comment.

Here is an example:


program TESTXB ( OUTPUT ) ;

//**********************************************************
//$A+
//
// Testprogramm fuer hexadezimale und binaere Codierung
// von Integers und Strings
//
//**********************************************************

const CC = 'A' ;
      CHEX = X'67' ;
      CSET = [ 'a' .. 'e' ] ;
      CSETHEX = [ 'a' .. X'66' ] ;   // hier Problem bei IBM

//****************************************************
//* verschiedene tests, z.B. wird diese konstante
//* csethex auf IBM nicht funktionieren, da 'a' schon
//* groesser ist als x'66'
//****************************************************

type SET_CHAR = set of CHAR ;
     ALPHA = array [ 1 .. 20 ] of CHAR ;

var C : CHAR ;
    S1 : SET_CHAR ;
    S2 : SET_CHAR ;
    S3 : SET_CHAR ;
    S4 : SET_CHAR ;
    S5 : SET_CHAR ;
    S6 : SET_CHAR ;
    I : INTEGER ;
    testf : text ;

Problem with PASFORM

At the moment, the source program formatter PASFORM does not work with C++ style comments !!

I would like to rework PASFORM so that it uses the scanner PASSCAN, too; this would solve all the problems that PASFORM currently has (poor reaction to syntax errors etc.). And, of course, the new version of PASFORM will handle C++ comments correctly.


Binary integer constants

Compiler version: 10.2017

With the new scanner, it was easy to add support for binary integer constants.

Integer constants may now be coded

- in normal decimal notation (12345)
- in hexadecimal notation (0x1afe)
- or in binary notation (0b1001_0001)

With all three notations, it is possible to add underscores to separate digit groups (to improve readability). The underscores have no meaning.

Underscores are allowed only between digits, not at the end or at the beginning. And only one underscore is allowed between two digits, not more than one.

123_456_789 is a legal integer constant; 11__23 and _45 are both illegal.
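
The underscore rules can be written down as a regular expression. The following Python sketch is not the PASSCAN grammar; the function name parse_int is made up, and treating the 0x and 0b prefixes and the hex digits as case-insensitive is an assumption.

```python
import re

# Digit groups separated by single underscores; no leading or trailing '_'.
_LITERAL = re.compile(r"""
      0x(?P<hex>[0-9a-f]+(?:_[0-9a-f]+)*)
    | 0b(?P<bin>[01]+(?:_[01]+)*)
    | (?P<dec>[0-9]+(?:_[0-9]+)*)
""", re.VERBOSE | re.IGNORECASE)


def parse_int(text):
    """Return the value of an integer constant or raise ValueError."""
    m = _LITERAL.fullmatch(text)
    if m is None:
        raise ValueError(f"illegal integer constant: {text!r}")
    for group, base in (("hex", 16), ("bin", 2), ("dec", 10)):
        digits = m.group(group)
        if digits is not None:
            # The underscores carry no meaning, so they are simply dropped.
            return int(digits.replace("_", ""), base)


if __name__ == "__main__":
    for sample in ("12345", "0x1afe", "0b1001_0001", "123_456_789"):
        print(sample, "=", parse_int(sample))
```

Constants like 11__23 or _45 do not match any of the three alternatives and are rejected with a ValueError.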


Write integer with leading zeroes, if desired (controlled by negative width)

Compiler version: 10.2017

When reworking the compiler listing, there was a need for writing integer values with leading zeroes.

The standard procedure (CSP) WRI so far did nothing when the specified width was negative. IBM's Pascal/VS compiler does a left-justified write of the integer when a negative width is specified, but IMO leading zeroes are more useful. So I changed WRI on both Windows (PCINT) and the mainframe so that leading zeroes are written when a negative width is specified.

In fact, there is a restriction on the mainframe: WRI only writes 12 digits at most; if the width is below -12, spaces are written instead of leading zeroes. With PCINT, the CSP WRI works as designed.

Example:


for I := 1 to MAXERRNR do
  begin
    if I in ERRLOG then
      begin
        while ( not EOF ( PRD ) ) and ( I > J ) do
          READLN ( PRD , J , INPLINE ) ;
        MEMCPY ( ADDR ( ERRMSG ) , ADDR ( INPLINE [ 4 ] ) , 64 ) ;
        if J = I then
          begin
            WRITELN ( '****' : 7 , ' P' , J : - 3 , ': ' , ERRMSG ) ;
            WRITELN ( LISTING , '****' : 7 , ' P' , J : - 3 , ': ' , ERRMSG ) ;
          end (* then *)
      end (* then *)
  end (* for *)
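
The effect of a negative width such as J : - 3 can be mimicked in Python. The sketch only covers non-negative values, and it assumes that a number with more digits than the field width is written in full; neither the 12-digit limit on the mainframe nor the treatment of negative values is modelled.

```python
def wri(value, width):
    """Format a non-negative integer like WRITE ( value : width )."""
    digits = str(value)
    if width < 0:
        return digits.rjust(-width, "0")   # negative width: leading zeroes
    return digits.rjust(width)             # positive width: leading blanks


if __name__ == "__main__":
    print("P" + wri(6, -3))     # error number padded with zeroes
    print("[" + wri(6, 3) + "]")
```

This is how the error numbers in the compiler summary get their three-digit form, for example P006 or P104.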


Compiler messages shown at terminal (aka SYSPRINT) during compile

Compiler version: 10.2017

When the compiler was restructured to use the new scanner PASSCAN, it became possible to show the compiler messages and the corresponding source lines at the terminal (= file OUTPUT), too.

The older versions of the compiler only showed the error number under the corresponding line in the compiler listing; at the terminal, only the total error count and the line number of the last error was shown. So the developer had to open the listing, locate the line of the last error and go from there to the previous error and so on.

The terminal output of the new version looks like this:


**** STANFORD PASCAL COMPILER, OPPOLZER VERSION OF 10.2017 ****

    90 X'C1' .. 'E' :
       !
+++ Error P102: LOW BOUND EXCEEDS HIGHBOUND

   100 S2 := [ X'c1' .. 'E' ] ;
       !
+++ Error P310: LOWER ELEMENT OF SET RANGE HIGHER THAN UPPER ELEMENT

**** Compiler Summary ****
****      WARNING: PASCAL EXTENSIONS USED.
****      2 Errors.
****    123 LINE(S) READ,    1 PROCEDURE(S) COMPILED,
****    358 P_INSTRUCTIONS GENERATED,   0.11 SECONDS IN COMPILATION.
****      Last Error/Warning on Line 100
****      Error/Warning Codes for this Program:
****      P102: LOW BOUND EXCEEDS HIGHBOUND
****      P310: LOWER ELEMENT OF SET RANGE HIGHER THAN UPPER ELEMENT

That is: there are two errors here, in line 90 and in line 100; the terminal shows the source lines, the messages and the position within the line where each error was found.

At the end, there is a "Compiler Summary", which shows the number of errors, other statistics (as before), the line number of the last message (although this is not really needed any more), and then a summary of all error types that have occurred, much the same as in prior releases.

BTW: these errors would not have occurred on the mainframe; the mixed range expressions (hexadecimal and "normal" notation) are not portable and lead to syntax errors on Windows (only). In a future release I plan to show warnings on such expressions even on platforms where there is no error.


Old errors removed from the compiler

Compiler version: 11.2017

When I decided in October 2017 to use runtime checks in the first pass of the compiler, I had to do substantial work to remove all the places where out-of-range assignments were done or uninitialized variables were used.

At the same time I tried to use the PASSCAN scanner with the source code formatter PASFORM. This implied the use of the scalar type SYMB. When I introduced this into PASFORM, I got some hundred P104 errors due to unknown identifiers. It turned out that most P104 errors were followed by additional errors, because the compiler did further checks on undefined or incomplete internal structures, which it should not do after the first error.

Furthermore, the compiler crashed when there was an erroneous semicolon before else, as in this example:


program CRASH2 ( OUTPUT ) ;


type CHARX = array [ 1.. CMAX ] of CHAR ;

var C : CHAR ;



begin (* HAUPTPROGRAMM *)
  C := 'X' ;
  if C = 'X' then
    C := 'A' ;
  else
    C := 'B' ;
end (* HAUPTPROGRAMM *) .

The problem was that the statement before "else" ended correctly, but the symbol else at the beginning of the next statement was not handled correctly by the parser and led to disaster. This had to be fixed as the first step.

The next working version of the compiler showed these errors:


c:\work\pascal\work>ppb crash2
PCINT (Build 1.0 Nov 11 2017 08:49:28)
**** STANFORD PASCAL COMPILER, OPPOLZER VERSION OF 09.2017
****      3 SYNTAX ERROR(S) DETECTED.
****     16 LINE(S) READ,    0 PROCEDURE(S) COMPILED,
****     12 P_INSTRUCTIONS GENERATED,   0.02 SECONDS IN COMPILATION.
****      LAST ERROR/WARNING ON LINE -->   14
****      ERROR/WARNING CODES FOR THIS PROGRAM :
****        6 : ILLEGAL SYMBOL
****      104 : IDENTIFIER IS NOT DECLARED
****      107 : INCOMPATIBLE SUBRANGE TYPES
*** EXIT Aufruf mit Parameter = 3 ***

The semicolon before "else" is now handled correctly; the compiler shows an error P006 ("Illegal symbol") at the "else" symbol.

This is the 09.2017 version, which does not yet show the errors on the terminal. But anyway: P104 is shown because CMAX is not defined, and P107 is shown because the type of CMAX does not fit the type of the integer constant 1 (which makes no sense).

There were many places in the compiler, where such additional errors after P104 were shown. I tried to eliminate all or most of these. The compiler output now looks like this (12.2017 version):


c:\work\pascal\work>pp crash2
PCINT (Build 1.0 Nov 11 2017 08:49:28)
**** STANFORD PASCAL COMPILER, OPPOLZER VERSION OF 12.2017 ****

     4 type CHARX = array [ 1.. CMAX ] of CHAR ;
       !
+++ Error P104: IDENTIFIER IS NOT DECLARED

    14 else
       !
+++ Error P006: ILLEGAL SYMBOL

**** Compiler Summary ****
****      2 Errors.
****     16 LINE(S) READ,    0 PROCEDURE(S) COMPILED,
****     12 P_INSTRUCTIONS GENERATED,   0.06 SECONDS IN COMPILATION.
****      Last Error/Warning on Line 14
****      Error/Warning Codes for this Program:
****      P006: ILLEGAL SYMBOL
****      P104: IDENTIFIER IS NOT DECLARED
*** EXIT Aufruf mit Parameter = 2 ***


Shorter strings (variables) can be assigned to longer strings

Compiler version: 12.2017

When doing some enhancements on the compiler, I had to assign a 20 byte string (a variable identifier) to a 32 byte string (error information of the symbol scanner). "String" at the moment is simply an abbreviation for "array of char", BTW.

But this was not possible until now; the compiler showed an error P129 ("Type conflict of operands").

I tried to make such assignments possible (shorter strings to longer strings); the longer target strings should be filled with blanks.

It turned out that the easiest way would be to create a new P-Code instruction MFI (memory fill). I planned this instruction to have a length argument; it should fetch an address and a char from the stack and fill the storage from the given address with the char pattern, using the (fixed) length from the instruction.

Example:


LDA 1,362      - target address
LDC C,' '      - blank = initialization pattern
MFI 30         - move 30 blanks to address 362

When I started to use the new P-Code in assignments involving different lengths, it turned out that another additional variant of MFI is needed. The two addresses for the MOV instruction are already on the stack (target first, then source address). Then, before MFI, the Blank (the initialization pattern) is loaded on the stack. The second variant of the MFI instruction should then take the target address from the 3rd position of the stack (the 2nd stack position, which is the source address, should be ignored), and after execution of MFI, only the initialization pattern should be popped; the two addresses should remain on the stack for the subsequent MOV instruction.

To control this, a negative length is specified on MFI.

Example:


LDA 1,362      - target address
LDA 1,352      - source address
DBG 1          - new DBG instruction, see later
LDC C,' '      - blank = initialization pattern
MFI -30        - move 30 blanks to address 362
MOV 10         - move 10 bytes from 352 to 362
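
The stack handling of the two MFI variants can be followed with a tiny interpreter in Python. This is a model of the description above, not PCINT code: addresses are plain indexes into a byte array (the level part of LDA is ignored), the DBG instruction is left out, and the function name run is made up.

```python
def run(program, memory):
    """Execute a list of (opcode, argument) pairs on a byte array."""
    stack = []
    for op, arg in program:
        if op == "LDA":                   # push an address
            stack.append(arg)
        elif op == "LDC":                 # push a character constant
            stack.append(arg)
        elif op == "MFI":
            pattern = stack.pop()         # the fill char is always popped
            if arg >= 0:
                target = stack.pop()      # variant 1: target address consumed
            else:
                target = stack[-2]        # variant 2: skip the source address,
                                          # both addresses stay on the stack
            n = abs(arg)
            memory[target:target + n] = pattern.encode("latin-1") * n
        elif op == "MOV":
            source = stack.pop()
            target = stack.pop()
            memory[target:target + arg] = memory[source:source + arg]
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack


if __name__ == "__main__":
    memory = bytearray(b"?" * 400)
    memory[352:362] = b"Oppolzer  "       # the 10 byte source string
    run([("LDA", 362), ("LDA", 352), ("LDC", " "),
         ("MFI", -30), ("MOV", 10)], memory)
    print(bytes(memory[362:392]))
```

After MFI -30 the target is blank-filled and both addresses are still on the stack, so the following MOV 10 finds its operands exactly as it would without the MFI.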

Using the new MFI instruction, assigning shorter strings to longer strings now is possible without problems; see the following example:


program TESTSVAR ( OUTPUT ) ;

type CHAR10 = array [ 1 .. 10 ] of CHAR ;
     CHAR30 = array [ 1 .. 30 ] of CHAR ;
     CHAR21 = array [ 20 .. 40 ] of CHAR ;

var F10 : CHAR10 ;
    F30 : CHAR30 ;
    F21 : CHAR21 ;

begin (* HAUPTPROGRAMM *)
  F10 := 'Oppolzer' ;
  F30 := F10 ;
  WRITELN ( F30 ) ;
  F21 := 'Teststring Length 21' ;
  F30 := F21 ;
  WRITELN ( F30 ) ;
end (* HAUPTPROGRAMM *) .

Warning: this new feature is not yet available for download; it is only present in the working version and only for Windows, at the moment (11.2017).


Compiler error messages with additional information

Compiler version: 12.2017

Some compiler messages need additional information. See the following example:


PCINT (Build 1.0 Nov 11 2017 08:49:28) **** STANFORD PASCAL COMPILER, OPPOLZER VERSION OF 09.2017 **** 4 SYNTAX ERROR(S) DETECTED. **** 53 LINE(S) READ, 0 PROCEDURE(S) COMPILED, **** 232 P_INSTRUCTIONS GENERATED, 0.02 SECONDS IN COMPILATION. **** LAST ERROR/WARNING ON LINE --> 53 **** ERROR/WARNING CODES FOR THIS PROGRAM : **** 117 : UNSATISFIED FORWARD REFERENCE **** 168 : UNDEFINED LABEL *** EXIT Aufruf mit Parameter = 4 ***

These errors are reported late. An unsatisfied forward reference to a type, for instance, can only be reported at the beginning of the next section (such as the var section).

It is therefore necessary to show the value (the identifier) of the missing reference somewhere. In previous releases this was done by simple additional WRITELN statements to the listing file. See the following example from the 09.2017 version:


10 )
11 )
12 ) type PTRX = -> X ;
13 ) PTRF = -> FELD ;
14 )
15 )
16 ) var ZAHL1 : INTEGER ;
UNDEFINED TYPE: FELD
UNDEFINED TYPE: X
**** E117
17 356D 1) ZAHL2 : INTEGER ;
18 360D 1) ZAHL3 : INTEGER ;

The newer compiler releases show the error messages on the terminal, so this way of informing the user is no longer practical. Furthermore, the new source program scanner PASSCAN supports error messages with parameters, which meets this requirement. To use this, the error message in the message file PASCAL.MESSAGES needs a % char, which marks the place where the additional information is inserted. The additional information itself has to be passed as an extra parameter to the PASSCANE procedure (a character string of length 32). This, BTW, was the motivation for implementing the compiler extension concerning assignments of strings of different lengths.
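The substitution step can be sketched in C as follows. This is not the code of PASSCAN (which is written in Pascal), only a minimal illustration of the idea. The function name `expand_message` and the trimming of the blank padding are assumptions; the 32-character parameter field and the % marker come from the description above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PARAM_LEN 32   /* fixed-length, blank-padded parameter field */

/* Copy tmpl to out, replacing the first '%' by the parameter text.
   The parameter is a blank-padded field without a terminating NUL.
   Returns the length of the resulting string. */
static size_t expand_message(char *out, size_t outsize,
                             const char *tmpl, const char param[PARAM_LEN]) {
    const char *mark = strchr(tmpl, '%');
    int plen = PARAM_LEN;
    int n;

    assert(outsize > 0);
    while (plen > 0 && param[plen - 1] == ' ')
        plen--;                         /* drop the blank padding */

    if (mark == NULL)                   /* message without a parameter */
        n = snprintf(out, outsize, "%s", tmpl);
    else
        n = snprintf(out, outsize, "%.*s%.*s%s",
                     (int)(mark - tmpl), tmpl,  /* text before '%' */
                     plen, param,               /* the inserted value */
                     mark + 1);                 /* text after '%' */
    if (n < 0) {
        out[0] = '\0';
        return 0;
    }
    return (size_t)n < outsize ? (size_t)n : outsize - 1;
}
```

Called with the template `UNSATISFIED FORWARD REFERENCE (TYPE "%")` and a field containing `FELD` followed by blanks, this yields the message text with `FELD` between the quotes.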

After the necessary changes to the compiler and to the message file, the compiler output (on the terminal) now looks like this:


PCINT (Build 1.0 Nov 11 2017 08:49:28)
**** STANFORD PASCAL COMPILER, OPPOLZER VERSION OF 12.2017 ****
16 var ZAHL1 : INTEGER ;
!!
+++ Error P117: UNSATISFIED FORWARD REFERENCE (TYPE "FELD")
+++ Error P117: UNSATISFIED FORWARD REFERENCE (TYPE "X")
25 begin (* HAUPTPROGRAMM *)
!
+++ Error P117: UNSATISFIED FORWARD REFERENCE (TYPE "UNKNOWN")
53 end (* HAUPTPROGRAMM *) .
!!
+++ Error P168: UNDEFINED LABEL "20"
+++ Error P168: UNDEFINED LABEL "10"
**** Compiler Summary ****
**** 5 Errors.
**** 53 LINE(S) READ, 0 PROCEDURE(S) COMPILED,
**** 232 P_INSTRUCTIONS GENERATED, 0.06 SECONDS IN COMPILATION.
**** Last Error/Warning on Line 53
**** Error/Warning Codes for this Program:
**** P117: UNSATISFIED FORWARD REFERENCE (TYPE "%")
**** P168: UNDEFINED LABEL "%"
*** EXIT Aufruf mit Parameter = 5 ***

The compiler now has no more "uncontrolled" WRITELNs to the listing file (that is, WRITELNs that are not scheduled by the source program scanner).

Warning: this new feature is not yet available for download; it is only present in the working version and only for Windows, at the moment (11.2017).


Verifying new P-Code instructions with the PCINT debugging features

Compiler version: 12.2017

To allow assignments of shorter string variables to longer string targets, I had to introduce a new P-Code instruction MFI (memory fill). Here I'd like to show how the PCINT interpreter and its debugging features can be used to verify that new P-Code instructions are implemented correctly.

First of all, here is the Pascal example used:


program TESTSVAR ( OUTPUT ) ;


type CHAR10 = array [ 1.. 10 ] of CHAR ;
     CHAR30 = array [ 1.. 30 ] of CHAR ;
     CHAR21 = array [ 20.. 40 ] of CHAR ;


var F10 : CHAR10 ;
    F30 : CHAR30 ;
    F21 : CHAR21 ;


begin (* HAUPTPROGRAMM *)
  F10 := 'Oppolzer' ;
  F30 := F10 ;
  WRITELN ( F30 ) ;
  F21 := 'Teststring Length 21' ;
  F30 := F21 ;
  WRITELN ( F30 ) ;
  F21 := F30 ;
end (* HAUPTPROGRAMM *) .

You may notice that the assignments to F30 are ok, but the assignment to F21 is wrong, because F21 is shorter than F30. This is what the compiler says, too:


c:\work\pascal\work>pp testsvar
PCINT (Build 1.0 Nov 11 2017 08:49:28)
**** STANFORD PASCAL COMPILER, OPPOLZER VERSION OF 12.2017 ****
21 F21 := F30 ;
!
+++ Error P129: TYPE CONFLICT OF OPERANDS
**** Compiler Summary ****
**** 1 Error.
**** 22 LINE(S) READ, 0 PROCEDURE(S) COMPILED,
**** 38 P_INSTRUCTIONS GENERATED, 0.05 SECONDS IN COMPILATION.
**** Last Error/Warning on Line 21
**** Error/Warning Codes for this Program:
**** P129: TYPE CONFLICT OF OPERANDS
*** EXIT Aufruf mit Parameter = 1 ***

Luckily, the compiler simply omits the erroneous statement on line 21, so the resulting PRR file is still usable. Here is the beginning of the PRR file:


c:\work\pascal\work>type testsvar.prr
LOC 15
BGN TESTSVAR 10:31:22 11/12/2017
$PASMAIN ENT P,1,L3 $PASMAIN ,T,F,F,F,2,0,,
LDA 1,352
LCA M,10,'Oppolzer '
MOV 10
LOC 16
LDA 1,362
LDA 1,352
DBG 1
LDC C,' '
MFI -30
MOV 10
LOC 17
...

For debugging purposes, I generated the new DBG instruction just before the instruction sequence which sets up the MFI P-Code. DBG has no effect if PCINT is started with debug=n. With debug=y, however, the interpreter enters the debug dialog mode when it encounters a DBG instruction. The operand of DBG currently has no meaning.

I now start the program using debug=y. The program (the PRR file) is translated into an internal memory representation and then executed, but, because the debug switch is on, execution stops at the very first P-Code instruction (which is BGN).


c:\work\pascal\work>pcint prr=testsvar.prr inc=paslibx,pasutils, pas=testsvar.pas out=testsvar.prrlis debug=y
PCINT (Build 1.0 Nov 11 2017 08:49:28)
ip=000001 sp=00000016 hp=02999984
00000000 00000000: 81818181 81818181 81818181 81818181 +................+
00000016 00000010: 81818181 81818181 81818181 81818181 +................+
000001: 007 0 1 BGN TESTSVAR 10:31:22 11/12/2017
g

The "g" command starts unlimited execution, but in this case execution stops when the interpreter encounters the DBG instruction (in fact, after it has processed the DBG instruction).


ip=000010 sp=00000024 hp=02999984
00000008 00000008: 81818181 81818181 81818181 6A010000 +............j...+
00000024 00000018: 60010000 81818181 81818181 81818181 +................+
00000040 00000028: 81818181 81818181 +........ +
000010: 041 C 0 32 LDC C,' '
asm

I switch to ASM mode using the "asm" command; in PAS mode, the debugger only stops once for every Pascal statement and not for every P-Code instruction.


*** Mode ASM set
ip=000010 sp=00000024 hp=02999984
00000008 00000008: 81818181 81818181 81818181 6A010000 +............j...+
00000024 00000018: 60010000 81818181 81818181 81818181 +................+
00000040 00000028: 81818181 81818181 +........ +
000010: 041 C 0 32 LDC C,' '
s

Now I step one instruction forward using the "s" command.


ip=000011 sp=00000028 hp=02999984
00000012 0000000c: 81818181 81818181 6A010000 60010000 +........j.......+
00000028 0000001c: 20000000 81818181 81818181 81818181 + ...............+
00000044 0000002c: 81818181 81818181 81818181 +............ +
000011: 047 0 -30 MFI -30
d x16a x180

You may notice that the stack pointer SP has increased to 28, and the stack now contains the blank (X'20') at the topmost stack position. The address which will be the target of the subsequent MFI instruction is x0000016a (at position sp - 8, in little-endian representation, because we are on a Windows machine). I'd like to display the target area of the MFI instruction before executing it. This is what the "d" command does (from address ... to address; hex notation is possible).
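For readers who want to check the byte order by hand, here is a small C sketch that decodes a little-endian word from the dump. The helper name `load_le32` is an assumption and has nothing to do with the PCINT source; the byte values are those visible in the stack dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble a 32-bit value from 4 bytes stored least significant byte
   first, as they appear in the stack dump on a little-endian machine. */
static uint32_t load_le32(const unsigned char *p) {
    assert(p != NULL);
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

Applied to the bytes 6A 01 00 00 at sp - 8, this gives x16a, which is 362 in decimal and therefore the address of F30 used as the MFI target. The word at sp - 4 (60 01 00 00) decodes to 352, the source address.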


00000362 0000016a: 81818181 81818181 81818181 81818181 +................+
00000378 0000017a: 81818181 8181 +...... +
ip=000011 sp=00000028 hp=02999984
00000012 0000000c: 81818181 81818181 6A010000 60010000 +........j.......+
00000028 0000001c: 20000000 81818181 81818181 81818181 + ...............+
00000044 0000002c: 81818181 81818181 81818181 +............ +
000011: 047 0 -30 MFI -30
s

The area (only 22 bytes are shown) is undefined; x'81' is the initialization pattern used by the runtime. So now I will execute the newly implemented MFI instruction, using the "s" command, to see how it works.


ip=000012 sp=00000024 hp=02999984
00000008 00000008: 81818181 81818181 81818181 6A010000 +............j...+
00000024 00000018: 60010000 20000000 81818181 81818181 +.... ...........+
00000040 00000028: 81818181 81818181 +........ +
000012: 049 0 10 MOV 10
d x16a x180

Obviously, it popped one item from the stack, which is correct. Now let's see whether the area has been filled correctly, by displaying it again.


00000362 0000016a: 20202020 20202020 20202020 20202020 + +
00000378 0000017a: 20202020 2020 + +
ip=000012 sp=00000024 hp=02999984
00000008 00000008: 81818181 81818181 81818181 6A010000 +............j...+
00000024 00000018: 60010000 20000000 81818181 81818181 +.... ...........+
00000040 00000028: 81818181 81818181 +........ +
000012: 049 0 10 MOV 10
s

Looks good; now I execute the next instruction (MOV, using a length of 10).


ip=000013 sp=00000016 hp=02999984
00000000 00000000: 81818181 81818181 81818181 81818181 +................+
00000016 00000010: 81818181 6A010000 60010000 20000000 +....j....... ...+
000013: 044 0 17 LOC 17
*** LOC 17: WRITELN ( F30 ) ;
g

The next instructions write the F30 buffer, so I will see the result anyway. I continue by using the "g" command.


Oppolzer
ip=000030 sp=00000024 hp=02999984
00000008 00000008: 81818181 81818181 81818181 6A010000 +............j...+
00000024 00000018: 88010000 1E000000 1E000000 81818181 +................+
00000040 00000028: 81818181 81818181 +........ +
000030: 041 C 0 32 LDC C,' '
s
ip=000031 sp=00000028 hp=02999984
00000012 0000000c: 81818181 81818181 6A010000 88010000 +........j.......+
00000028 0000001c: 20000000 1E000000 81818181 81818181 + ...............+
00000044 0000002c: 81818181 81818181 81818181 +............ +
000031: 047 0 -30 MFI -30
s
ip=000032 sp=00000024 hp=02999984
00000008 00000008: 81818181 81818181 81818181 6A010000 +............j...+
00000024 00000018: 88010000 20000000 1E000000 81818181 +.... ...........+
00000040 00000028: 81818181 81818181 +........ +
000032: 049 0 21 MOV 21
s
ip=000033 sp=00000016 hp=02999984
00000000 00000000: 81818181 81818181 81818181 81818181 +................+
00000016 00000010: 81818181 6A010000 88010000 20000000 +....j....... ...+
000033: 044 0 20 LOC 20
*** LOC 20: WRITELN ( F30 ) ;
g

The content of F30 is shown correctly, as expected. Because there is another MFI in the code, execution stops again. I walk along the code using "s" commands (watch SP changing) and then use "g" to execute the rest of the program.


Teststring Length 21
c:\work\pascal\work>

In my opinion, this debug session proved that the new P-Code instruction works as desired. I finish by running the program with debug=n; there should be no interruption.


c:\work\pascal\work>prun testsvar
c:\work\pascal\work>pcint prr=testsvar.prr inc=paslibx,pasutils, pas=testsvar.pas out=testsvar.prrlis
PCINT (Build 1.0 Nov 11 2017 08:49:28)
Oppolzer
Teststring Length 21
c:\work\pascal\work>

If you want to use the debug features of PCINT yourself, start it with debug=y and enter a question mark at the first prompt; you will get a short online help listing the available commands (at the moment in German only ... sorry).

Warning: the current version of PCINT may not work this way; you may have to wait for the 12.2017 version.


New P-Code instructions to support inline MEMCPY and MEMSET

Compiler version: 12.2017

MEMCPY and MEMSET were introduced in 2016 as new standard procedures to support block moves and block initializations in the same way as C does. The arguments are the same as in their C counterparts. Because we also have the ADDR function, which returns the address of any variable (and SIZEOF, which returns the size of any variable or type), MEMCPY and MEMSET are very easy to use. Of course, these procedures bring some safety risks and have to be used with care.

The initial versions of MEMCPY and MEMSET were implemented using the library call facility of Stanford Pascal, that is: the compiler generates calls to an external Pascal procedure which implements the standard function (in this case, the external procedure was located in PASLIBX.PAS). This was a bit costly, because the procedure prologue has to be executed on every MEMCPY and MEMSET call.

What made things worse: MEMCPY and MEMSET were implemented using Pascal loops, which in fact moved one character at a time. This was, of course, the easiest way to do it, and for a proof of concept it was sufficient, but not for production work.

BTW, this is the library procedure that implemented MEMCPY and MEMSET. At least the PTR2INT and PTRADD function calls are implemented by inline code (simple P-Code instructions). If they were true function calls, performance would be a real nightmare.


procedure $PASSTR ( FUNCCODE : INTEGER ; X1 : CHARPTR ; X2 : CHARPTR ; L : INTEGER ) ;

(**************************************)
(* dispatcher for string functions    *)
(**************************************)

   var CH : CHAR ;

   begin (* $PASSTR *)
     case FUNCCODE of

     /*********************************/
     /* MEMSET                        */
     /*********************************/

       1 : begin
             CH := CHR ( PTR2INT ( X2 ) ) ;
             while L > 0 do
               begin
                 X1 -> := CH ;
                 L := L - 1 ;
                 X1 := PTRADD ( X1 , 1 )
               end (* while *)
           end (* tag/ca *) ;

     /*********************************/
     /* MEMCPY                        */
     /*********************************/

       2 : begin
             while L > 0 do
               begin
                 X1 -> := X2 -> ;
                 L := L - 1 ;
                 X1 := PTRADD ( X1 , 1 ) ;
                 X2 := PTRADD ( X2 , 1 )
               end (* while *)
           end (* tag/ca *) ;
       otherwise
         EXIT ( 1120 ) ;
     end (* case *) ;
   end (* $PASSTR *) ;

It was clear to me from the start that I would have to find a better solution later.

With the 12.2017 compiler release, I introduced some new P-Code instructions to support inline translation of MEMCPY and MEMSET:

MCP - for MEMCPY, if length is variable (MOV is sufficient for MEMCPY, if length is fixed)

MSE - for MEMSET, if length is variable (MFI is sufficient for MEMSET, if length is fixed;
MFI was introduced recently to support pre-formatting of longer target strings, when assigning shorter strings)

MZE - special case for MEMSET, if length is fixed and init pattern is zero
(some platforms have special instructions for this case, for example XC on the mainframe)

For a detailed description of the new P-Code instructions, you may look at the new P-Code documentation (2017 update):

P-Code Description - 2017 release

Implementation of the new P-Code instructions

On the mainframe, the P-Code translator PASCAL2.PAS generates MVCs for the P-Codes MOV and MFI if the length is less than or equal to 256, and MVCLs otherwise. For variable length transfers (like MSE and MCP), MVCL is generated. MZE generates XC if the length is less than or equal to 256.
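The selection rule can be written down as a small decision function. The following C sketch only restates the rule from the paragraph above; it is not taken from PASCAL2.PAS, and all names (`PCode`, `Gen370`, `select_370`) are made up for the illustration. What MZE generates for lengths above 256 is not stated here, so the sketch returns a separate "not stated" value for that case.

```c
#include <assert.h>

typedef enum { PC_MOV, PC_MFI, PC_MCP, PC_MSE, PC_MZE } PCode;
typedef enum { GEN_MVC, GEN_MVCL, GEN_XC, GEN_NOT_STATED } Gen370;

/* Choose the 370 instruction for a storage P-Code.
   len is the byte count; it is ignored for the variable-length
   P-Codes MCP and MSE, where the length is only known at run time. */
static Gen370 select_370(PCode op, long len) {
    assert(len >= 0);
    switch (op) {
    case PC_MOV:
    case PC_MFI:
        return len <= 256 ? GEN_MVC : GEN_MVCL;
    case PC_MCP:
    case PC_MSE:
        return GEN_MVCL;
    case PC_MZE:
        return len <= 256 ? GEN_XC : GEN_NOT_STATED;
    }
    return GEN_NOT_STATED;
}
```

The limit of 256 reflects the maximum length that a single MVC or XC instruction can handle.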

Example: some lines of a test program containing MEMSET and MEMCPY (variable lengths); the numbers on the left are the line numbers of the source.


55: L := 2000 ;
58: P := ADDR ( BUF ) ;
59: P := PTRADD ( P , L ) ;
60: MEMSET ( P , CH , L ) ;
64: L := 21 ;
65: MEMCPY ( ADDR ( F21 ) , ADDR ( F30 ) , L ) ;

This is the P-Code that the compiler generated:


LOC 55
LDC I,2000
STR I,1,416
LOC 58
LDA 1,421
STR A,1,4424
LOC 59
LOD A,1,4424
LOD I,1,416
ADA
STR A,1,4424
LOC 60
LOD A,1,4424
LOD C,1,420
LOD I,1,416
MSE 0
LOC 64
LDC I,21
STR I,1,416
LOC 65
LDA 1,392
LDA 1,362
LOD I,1,416
MCP

And here are the System/370 instructions that PASCAL2 generates from this P-Code (pseudo ASSEMBLER):


-------------------- LOC 55 ------------------
0298: LDC I,2000
0298: STR I,1,416
@@ 0298: LA 2,2000
@@ 029C: ST 2,416(13)
-------------------- LOC 58 ------------------
02EE: LDA 1,421
02EE: STR A,1,4424
@@ 02EE: LA 2,421(13)
@@ 02F2: LA 14,4088(13)
@@ 02F6: ST 2,336(0,14)
-------------------- LOC 59 ------------------
02FA: LOD A,1,4424
02FA: LOD I,1,416
02FA: ADA
@@ 02FA: A 2,416(13)
02FE: STR A,1,4424
@@ 02FE: ST 2,336(0,14)
-------------------- LOC 60 ------------------
0302: LOD A,1,4424
0302: LOD C,1,420
0302: LOD I,1,416
0302: MSE 0
@@ 0302: LA 6,0(0,2)
@@ 0306: L 7,416(13)
@@ 030A: LTR 7,7
@@ 030C: BC 13,0
## BNP @NOMV
@@ 0310: XR 4,4
@@ 0312: IC 5,420(13)
@@ 0316: SLL 5,24
@@ 031A: MVCL 6,4
## @NOMV DS 0H
-------------------- LOC 64 ------------------
0372: LDC I,21
0372: STR I,1,416
@@ 0372: LA 2,21
@@ 0376: ST 2,416(13)
-------------------- LOC 65 ------------------
037A: LDA 1,392
037A: LDA 1,362
037A: LOD I,1,416
037A: MCP
@@ 037A: LA 6,392(13)
@@ 037E: LA 4,362(13)
@@ 0382: LR 7,2
@@ 0384: LTR 5,7
@@ 0386: BC 13,0
## BNP @NOMV
@@ 038A: MVCL 6,4
## @NOMV DS 0H
-------------------- LOC 66 ------------------

The P-Code interpreter PCINT also runs much better with the new P-Codes, of course. Without them, procedure calls MEMSET and MEMCPY had to be done by running (interpreting) the Pascal loops in PASLIBX.PAS (see above). Now the new P-Codes are interpreted directly by PCINT (that is, it uses C function calls memset and memcpy to interpret them, which are often implemented by block moves using native code instructions). This should make a big difference.
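How an interpreter can map the two variable-length P-Codes onto the C library is sketched below. This is not the PCINT source; the function names `op_mse` and `op_mcp` and the flat `mem` array are assumptions. The check for a non-positive length mirrors the LTR / BNP sequence in the generated 370 code above. The operand of MSE (0 in the example) is not explained in the text and is ignored here. Note that memcpy requires non-overlapping areas; the sketch assumes that.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* MSE: fill len bytes starting at offset target with ch.
   A length of zero or less means: do nothing. */
static void op_mse(unsigned char *mem, size_t target,
                   unsigned char ch, long len) {
    assert(mem != NULL);
    if (len > 0)
        memset(mem + target, ch, (size_t)len);
}

/* MCP: copy len bytes from offset source to offset target.
   A length of zero or less means: do nothing.
   The two areas must not overlap (memcpy restriction). */
static void op_mcp(unsigned char *mem, size_t target,
                   size_t source, long len) {
    assert(mem != NULL);
    if (len > 0)
        memcpy(mem + target, mem + source, (size_t)len);
}
```

Compared with the byte-by-byte loops in $PASSTR, each call is a single library call, which the C runtime typically implements with block move instructions.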


Calling external procedures and functions written in ASSEMBLER

Compiler version: 12.2017

In the beginning, the support for external procedures and functions in Stanford Pascal was very limited. Only one external module was possible, due to name conflicts. This single external module was a Pascal program with the main program omitted. I changed this in 2016 and introduced MODULEs; see here:

Pascal library

With modules, it was possible to add an arbitrary number of external modules, containing one to many entries (external procedures and functions) and local procedures, which appear at lower levels or which have the LOCAL attribute (a new keyword). Further extensions included STATIC variables, which are global to the module, but not visible outside the module. There are also STATIC variables, which are local to procedures and functions. Find details here:

Static definitions

So external procedures and functions in Pascal are no longer a problem, but some people asked for subroutines written in ASSEMBLER. Of course, it would be best if these ASSEMBLER subprograms could be coded using the normal OS linkage conventions. Unfortunately, Pascal itself uses somewhat different linkage conventions.

The compiler already supported FORTRAN subprograms, using the normal OS linkage. But FORTRAN has some restrictions on parameters; all parameters must be passed by reference. For by-value parameters, Pascal creates so-called dummy arguments (copies of the real parameters) and passes their addresses; this way, changes to the parameters done by the FORTRAN routines (if there are any) would get lost.

I decided not to do this for ASSEMBLER subprograms; instead all the by-value parameters are passed in the parameter list (register 1 based) in the same way as the C compiler does it. Consequently, the end of the parameter list cannot be marked (because it is not an address list any more).
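The difference between the two parameter list styles can be illustrated in C. This is only a model of the idea, not generated code: the names `callee_by_ref`, `call_fortran_style`, `ParmSlot` and `callee_mixed` are assumptions, and a C union is used to stand in for a parameter list slot that holds either a value or an address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FORTRAN style: the list contains addresses only. */
static void callee_by_ref(int32_t **plist) {
    assert(plist != NULL);
    *plist[0] += 1;   /* changes whatever the first address points to */
    *plist[1] += 1;
}

/* Caller side: a is a by-value parameter, b is a by-reference parameter. */
static void call_fortran_style(int32_t a, int32_t *b) {
    int32_t dummy = a;                  /* dummy argument: copy of the value */
    int32_t *plist[2] = { &dummy, b };  /* address list */
    callee_by_ref(plist);
    /* The change made to dummy is lost here; only *b keeps its new value. */
}

/* ASSEMBLER style as described above: a slot holds either the value
   itself (by-value) or an address (by-reference). */
typedef union {
    int32_t  value;
    int32_t *addr;
} ParmSlot;

static void callee_mixed(const ParmSlot *plist) {
    assert(plist != NULL);
    *plist[1].addr += plist[0].value;   /* slot 0: value, slot 1: address */
}
```

Because a slot in the second style may contain an arbitrary value, a marker bit in the last entry (as the OS linkage convention uses for address lists) could not be told apart from data, which is why the end of the list is not marked.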

Find many more technical details and some example programs here:

External Procedures in Stanford Pascal - written in Pascal, FORTRAN or Assembler (2017)


New installation procedure for MVS (or z/OS)

Compiler version: 12.2017

On 23 November 2017, a video appeared on YouTube that reported a failed installation of Stanford Pascal on MVS (TK4-).

Youtube video (thanks to moshix!)

I examined that and came to the conclusion that the installation procedure suggested for MVS so far was too complicated and error-prone, so I reworked it in the following weeks.

The result is a Pascal program which reads a large file (PASCALN.TXT) and puts all the files which make up the Pascal system (text and binary) in the right place on the MVS target. This is no problem, because since 05.2017 it has been possible to specify member names on files before doing REWRITE (procedure ASSIGNMEM), so PDS members on MVS are supported (when reading or writing PDS members, the member names can be specified at run time).

The remaining problem is: how do you run a Pascal program on an MVS or z/OS system before the compiler and the runtime are there?

The solution is: the program which does the installation has to be copied there using binary FTP (for example). In more detail: some FB 80 binary object files (text decks) need to be transferred to the target MVS system and then linked there to build a load module.

After that, everything works automatically.

I prepared some MVS jobs to do this and a ZIP file, which is part of the GitHub repository; the ZIP file contains everything that is needed. I also wrote a document which describes the necessary steps in detail.

Link to the GitHub repository (only the MVSINST subdirectory is relevant for MVS):

https://github.com/StanfordPascal/Pascal

Link to the documentation:

New Stanford Pascal Installation Guide for MVS (2017)

New types: CHAR (n), DECIMAL (n,m) and STRING (n) - aka VARCHAR (n)

Compiler version: 2018.01

Most Pascal compilers support variable length strings in some way. Pascal/VS has a STRING(n) type, which is similar to CHAR(n) VARYING of PL/1 (implemented as a varying string of CHAR with a 2 byte length field). This seemed OK to me, so I wanted to implement such a STRING type, too.

The first task was to allow types to have parameters. I thought about a DECIMAL type, too, and thought it would be best to allow two parameters in general, so that DECIMAL could be handled in the same way.

Then it came to my mind, that CHAR(n) would be nice - simply as an abbreviation for ARRAY [1..n] OF CHAR. This would then be the third type which could be used with parameters.

The method is as follows: as soon as a type with a parameter is encountered - say: CHAR (20) - a type is constructed which is derived from the appropriate standard type (in this case ARRAY [1..n] OF CHAR) and this new type is given an artificial, but unique name, which can technically never be chosen as a normal identifier. And then this new type is inserted in the type identifier list at the top procedure level, as if it were defined in the top procedure. If later the same type is specified again (e.g. another definition using CHAR (20)), the type is already there and all works well. Furthermore, all CHAR(n) types have been made compatible in the meantime in assignment etc. (shorter to longer ones, at least), so the different CHAR(n) types can be mixed without problems.
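The lookup-or-create step can be sketched in C like this. It is a simplified model, not the compiler's code (the compiler is written in Pascal). The table `type_table`, the function `char_type` and in particular the naming scheme `CHAR(n)` are assumptions; the text only says that the artificial name cannot be chosen as a normal identifier, which the parentheses ensure in this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TYPES 64

typedef struct {
    char name[24];   /* artificial type name (hypothetical scheme) */
    int  size;       /* storage size in bytes */
} TypeEntry;

/* Stands for the type identifier list of the top procedure level. */
static TypeEntry type_table[MAX_TYPES];
static int type_count = 0;

/* Return the type entry for CHAR(n); create it on first use. */
static const TypeEntry *char_type(int n) {
    char name[24];
    int i;

    assert(n > 0);
    snprintf(name, sizeof name, "CHAR(%d)", n);

    for (i = 0; i < type_count; i++)        /* already defined? */
        if (strcmp(type_table[i].name, name) == 0)
            return &type_table[i];

    assert(type_count < MAX_TYPES);         /* otherwise insert it */
    strcpy(type_table[type_count].name, name);
    type_table[type_count].size = n;        /* same as ARRAY [1..n] OF CHAR */
    return &type_table[type_count++];
}
```

Two declarations using CHAR (20) thus end up with the very same type entry, while CHAR (25) gets its own.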

This IMO is very convenient for the users of the New Stanford Pascal Compiler and makes it almost as comfortable to use as the different PL/1 compilers of its time.

I did something similar for DECIMAL definitions; see next article.

The implementation of STRINGs (real varying character strings) was a much greater effort; this will be covered in some future articles. But here again: all STRING types are compatible, regardless of the length. When strings are assigned, assignments will always work, as long as the field length of the target string variable is sufficient. Strings are never truncated, like in Pascal/VS.
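To make the assignment rule concrete, here is a C sketch of a varying string with a 2 byte length field, following the Pascal/VS layout mentioned above. The actual layout used by New Stanford Pascal is not described in this article, so the struct `String254`, the function `str_assign` and its return convention are assumptions. What the runtime does when the target is too small is not stated either; the sketch simply refuses the assignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STR_MAX 254   /* example: STRING ( 254 ) */

typedef struct {
    uint16_t len;             /* current length, 2 byte field */
    char     data[STR_MAX];   /* room for the maximum length */
} String254;

/* Assign srclen characters to a varying string with room for maxlen
   characters. Returns 1 on success. Returns 0 if the target is too
   small; in that case the target is left unchanged and nothing is
   truncated. */
static int str_assign(uint16_t *len, char *data, size_t maxlen,
                      const char *src, size_t srclen) {
    assert(maxlen <= 65535);
    if (srclen > maxlen)
        return 0;
    memcpy(data, src, srclen);
    *len = (uint16_t)srclen;
    return 1;
}
```

The same function works for any declared maximum length, which matches the statement that all STRING types are compatible regardless of their length.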

Example program to test some of the new variable types:


program TESTCHAR ( OUTPUT ) ;

//*************************************************
//$A+
//*************************************************

const TESTKONST = 'Bernd ' 'Oppolzer' ;
      S_VS_1 = X'1b' '&l12D' X'0d0a' ;
      X = 15 ;
      Y = 1.2 ;

type CHAR25 = array [ 1 .. 25 ] of CHAR ;
     CHAR30 = array [ 1 .. 30 ] of CHAR ;

var CH : CHAR ;
    CH2 : CHAR ( 25 ) ;
    CH3 : CHAR ( 14 ) ;
    CH4 : CHAR ( 25 ) ;
    CH5 : CHAR25 ;
    CH6 : CHAR30 ;
    D1 : DECIMAL ( 7 ) ;
    D2 : DECIMAL ( 15 , 2 ) ;
    D3 : DECIMAL ( 7 ) ;
    D4 : DECIMAL ( 25 , 0 ) ;
    S1 : STRING ( 254 ) ;
    S2 : STRING ( 3000 ) ;
    V1 : VARCHAR ( 254 ) ;
    V2 : VARCHAR ( 3000 ) ;

    //************************************
    // types with syntax errors:
    //
    // IFALSCH : INTEGER ( 2 ) ;
    // D3 : DECIMAL ;
    // S4 : STRING ;
    // V4 : VARCHAR ;
    // D5 : DECIMAL ( 7 , 13 ) ;
    // D6 : DECIMAL ( 50 , 0 ) ;
    // S3 : STRING ( 0 ) ;
    // V3 : VARCHAR ( 0 ) ;
    //************************************

    TESTCP : -> CHAR ;

procedure TESTWRITE ( X : CHAR ( 30 ) ) ;

   begin (* TESTWRITE *)
     WRITELN ( 'testwrite: x = ' , X ) ;
   end (* TESTWRITE *) ;

procedure TESTWRITE2 ( X : CHAR30 ) ;

   begin (* TESTWRITE2 *)
     WRITELN ( 'testwrite2: x = ' , X ) ;
   end (* TESTWRITE2 *) ;

begin (* HAUPTPROGRAMM *)
  CH := '''' ;
  WRITELN ( CH ) ;
  WRITELN ( 'sizeof string const = ' , SIZEOF ( S_VS_1 ) ) ;
  WRITELN ( 'const = ' , S_VS_1 ) ;
  WRITELN ( 'sizeof string const = ' , SIZEOF ( TESTKONST ) ) ;
  WRITELN ( 'const = ' , TESTKONST ) ;
  WRITELN ( 'sizeof integer const = ' , SIZEOF ( X ) , X ) ;
  WRITELN ( 'sizeof real const = ' , SIZEOF ( Y ) , Y ) ;
  WRITELN ( 'Tests neue char-Datentypen:' ) ;
  CH := 'A' ;
  WRITELN ( 'test1: ch = ' , CH ) ;
  CH3 := CH ;
  WRITELN ( 'test1: ch3 = ' , CH3 ) ;
  CH3 := TESTKONST ;
  CH2 := CH3 ;
  D2 := 1234.56 ;
  S1 := 'das ist ein String' ;
  CH5 := CH2 ;
  TESTWRITE ( CH2 ) ;
  TESTWRITE ( CH5 ) ;
  CH5 := 'das ist ein String' ;
  CH2 := CH5 ;
  TESTWRITE2 ( CH5 ) ;
  WRITELN ( 'test1: ch2 = ' , CH2 ) ;
end (* HAUPTPROGRAMM *) .


Cheating on DECIMALs

Compiler version: 2018.01


Implementing Strings in Pascal (inspired by Pascal/VS)

Compiler version: 2018.01


Back to Compiler main page