Oppolzer - Informatik / Stanford Pascal Compiler



The Stanford Pascal Compiler / Evolution Steps


New source program scanner (PASSCAN) - separate module

Compiler version: 10.2017

This is a change I had wanted to make for a long time.

The old source program scanner (procedure INSYMBOL) has been completely replaced by a new scanner called PASSCAN. The new scanner is no longer hand-written; it is generated by a scanner generator that four students (including myself) wrote at the Computer Science department of Stuttgart University in 1980. I extended this generator in 1996 to turn it into a usable product, and I have used it in many projects since then. PASSCAN is an external module, separate from the compiler. It does all the source handling and writes the compiler listing.

The new scanner will make extensions to the compiler's symbol repertoire much easier, because it is generated from a "grammar", which is in fact one large regular expression (with attributes). The scanner generator works similarly to the well-known Unix tool "lex".
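To illustrate the idea of a scanner driven by a table of regular expressions, here is a minimal sketch in Python. It is not PASSCAN and not the output of the scanner generator; the token names, the patterns, and the `0b` prefix for binary constants are assumptions chosen for illustration only. Note also that Python's `re` picks the first alternative that matches, whereas lex-like tools pick the longest match, so the order of the table entries matters here.

```python
import re

# Hypothetical token table: (token name, regular expression).
# Names and patterns are illustrative assumptions, not PASSCAN's real grammar.
TOKEN_SPEC = [
    ("COMMENT",  r"//[^\n]*"),                   # comment running to end of line
    ("BININT",   r"0[bB][01]+"),                 # binary constant (0b prefix assumed)
    ("INTCONST", r"\d+"),                        # decimal integer constant
    ("IDENT",    r"[A-Za-z_$][A-Za-z0-9_$]*"),   # identifier, may start with _ or $
    ("ASSIGN",   r":="),
    ("SEMI",     r";"),
    ("SKIP",     r"\s+"),                        # white space, not reported
]

# One large regular expression with a named group per token kind.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(text):
    """Return a list of (token name, lexeme) pairs for the given source text."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Adding a new symbol in this sketch means adding one line to the table, which is the kind of convenience a generated scanner offers over a hand-written one.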

With the help of the new scanner, I added some (scanner-related) features to the compiler:

- C++ style comments (terminated at the end of the line)
- binary integer constants
- variables starting with an underscore
- improvements on the compiler listing
- compiler messages (including source text) shown at terminal output

This last improvement is very helpful during development: in most cases it is no longer necessary to open the listing file and look for the compiler messages there, which speeds up development and makes it more fun.

Example: when compiling the old compiler with the new compiler, you get the following warning:


**** STANFORD PASCAL COMPILER, OPPOLZER VERSION OF 10.2017 ****

  1839  (*$D- ... MUST BE IN EFFECT FOR THIS LOOP *)
        !
** Warning S005: the rest of the options string will be ignored

**** Compiler Summary ****
**** WARNING: PASCAL EXTENSIONS USED.
**** 1 Warning.
**** 15161 LINE(S) READ, 157 PROCEDURE(S) COMPILED,
**** 25920 P_INSTRUCTIONS GENERATED, 7.22 SECONDS IN COMPILATION.

The warning is new (the old compiler did not issue it); it says that the options in the comment are followed by text that is not recognized as an option. The warning appears on the console (Windows or CMS, for example), given the appropriate DD assignment for OUTPUT. The number 1839 on the left is the source line number.

Some more experience from the restructuring of the compiler

By extracting the scanner logic from the compiler, the overall structure of the compiler became much clearer, and some unused or strange constructs could be removed.

But there were some strange situations during the migration, too.

First of all, the Pascal grammar has a well-known problem: when a subrange definition involving integers starts this way:

1..50

the scanner first assumes that this could be the beginning of a real number, and when it encounters the second period, it has to revise this hypothesis. I fixed this by introducing another symbol into the grammar, which I called INTDOTDOT; this is an integer constant followed by two dots. This way my grammar works without backtracking over characters already read.
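The effect of the extra symbol can be sketched with a small regular-expression table in Python. Only the name INTDOTDOT comes from the text above; the other token names and the pattern for real constants are assumptions made for this example. Python's regex engine backtracks internally, so the sketch shows the token set rather than the behaviour of the generated scanner.

```python
import re

# INTDOTDOT is the symbol described in the text; the remaining names
# and patterns are hypothetical and only serve this example.
TOKEN_SPEC = [
    ("INTDOTDOT", r"\d+\.\."),                      # integer directly followed by '..'
    ("REALCONST", r"\d+\.\d+(?:[eE][+-]?\d+)?"),    # assumed shape of a real constant
    ("INTCONST",  r"\d+"),
    ("DOTDOT",    r"\.\."),
    ("SKIP",      r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan_numbers(text):
    """Tokenize text consisting of numbers, '..' and white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(scan_numbers("1..50"))    # the subrange case from the text
print(scan_numbers("1.50"))     # a real constant
print(scan_numbers("1 .. 50"))  # dots separated from the integer
```

With this table, `1..50` yields an INTDOTDOT token followed by an INTCONST, so the parser has to accept INTDOTDOT wherever an integer constant followed by `..` would be valid.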

There were some other problems: at first I had an error in the definition of real numbers, I forgot that identifiers can start with a dollar sign in Stanford Pascal, and so on. But all of that can be fixed within minutes, simply by generating a new scanner.

The compiler listing is produced by PASSCAN, too; I reworked it a little, but the information content stays the same. Some excerpts from the compiler listing:


1LINE # D/NEST LVL < STANFORD PASCAL, OPPOLZER VERSION OF 10.2017 > 13:18:43 10/03/2017 PAGE 28
                   ....5...10....5...20....5...30....5...40....5...50....5...60....5...70..
  1729     3N   2) begin
  1730     3N   2) WRITE ( F , '1' ) ;
  1731     3N   2) I := I - X ;
  1732     3N   2) end (* then *)
  1733     2N   2) else
  1734     2N   2) WRITE ( F , '0' ) ;
  1735     2N   2) X := X DIV 2 ;
  1736     2N   2) end (* for *)
  1737           ) end (* WRITEBINBYTE *) ;
  1738           )
  1739           )
  1740           )
  1741           ) function MODP ( X : INTEGER ; Y : INTEGER ) : INTEGER ;
  1742           )
  1743   120D   2) var M : INTEGER ;
  1744   120D   2)
  1745           ) begin (* MODP *)
  1746     1N   2) M := X MOD Y ;
  1747     1N   2) if M < 0 then
  1748     1N   2) M := M + Y ;
  1749     1N   2) MODP := M ;
  1750           ) end (* MODP *) ;
  1751           )
  1752           )
  1753           )
  1754           ) procedure INTTOSTR ( CP : VOIDPTR ; LEN : INTEGER ; VAL : INTEGER ;
  1755           ) ZEROES : BOOLEAN ) ;
  1756           )
  1757   125D   2) var BUFFER : array [ 1 .. 20 ] of CHAR ;
  1758   145D   2) MINUS : BOOLEAN ;
  1759   146D   2) LETZT : INTEGER ;
  1760   152D   2) I : INTEGER ;
  1761   156D   2) LIMIT : INTEGER ;
  1762   160D   2) LENX : INTEGER ;
  1763   164D   2) POSX : INTEGER ;
  1764   164D   2)
  1765           ) begin (* INTTOSTR *)
  1766     1N   2) if VAL < 0 then
  1767     2N   2) begin
  1768     2N   2) VAL := - VAL ;
  1769     2N   2) MINUS := TRUE
  1770     2N   2) end (* then *)
  1771     1N   2) else
  1772     1N   2) MINUS := FALSE ;
  1773     1N   2) I := 20 ;
  1774     1N   2) BUFFER := ' ' ;
  1775     1N   2) if VAL = 0 then
  1776     2N   2) begin
  1777     2N   2) BUFFER [ I ] := '0' ;
  1778     2N   2) I := I - 1 ;
  1779     2N   2) end (* then *)
  1780     1N   2) else
  1781     1N   2) while VAL > 0 do
  1782     2N   2) begin
  1783     2N   2) LETZT := VAL MOD 10 ;
  1784     2N   2) BUFFER [ I ] := CHR ( ORD ( '0' ) + LETZT ) ;
  1785     2N   2) I := I - 1 ;
  1786     2N   2) VAL := VAL DIV 10 ;
  1787     1N   2) end (* while *) ;
  1788     1N   2) LIMIT := 20 - LEN + 1 ;
  1789     1N   2) if MINUS then
  1790     1N   2) LIMIT := LIMIT + 1 ;
  1791     1N   2) if ZEROES then
  1792     1N   2) while I > LIMIT do
                   ....5...10....5...20....5...30....5...40....5...50....5...60....5...70..

This part of the compiler listing shows the reworked compiler information.

To the left of each source line, you see the source line number first, and then either the data offset or the nesting level, depending on whether the source line contains declarations or executable statements.

From the data offsets shown above (lines 1757 ff.), you can see that the variable BUFFER, for example, is located at offset 125 of the automatic storage block of procedure INTTOSTR. It is 20 bytes long. The next variable, MINUS, is at offset 145 and is 1 byte long. LETZT is an integer (4 bytes); the listing shows 146 for it, but because of alignment requirements it is actually placed at 148. I follows at 152, and so on.
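The arithmetic behind these offsets can be reproduced with a few lines of Python. This is only a sketch of the usual "round up to the next multiple of the alignment" rule, not the compiler's actual storage allocation code; the function names are hypothetical, and the alignment values (1 for CHAR arrays and BOOLEAN, 4 for INTEGER) are assumptions that happen to be consistent with the numbers in the listing.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout(start, variables):
    """Assign an offset to each (name, size, alignment) entry, in order."""
    offsets = {}
    offset = start
    for name, size, alignment in variables:
        offset = align_up(offset, alignment)
        offsets[name] = offset
        offset += size
    return offsets


# Local variables of INTTOSTR; sizes in bytes, alignments assumed.
INTTOSTR_VARS = [
    ("BUFFER", 20, 1),   # array [1 .. 20] of CHAR
    ("MINUS",   1, 1),   # BOOLEAN
    ("LETZT",   4, 4),   # INTEGER, must start on a multiple of 4
    ("I",       4, 4),
    ("LIMIT",   4, 4),
    ("LENX",    4, 4),
    ("POSX",    4, 4),
]

for name, offset in layout(125, INTTOSTR_VARS).items():
    print(f"{name:6} {offset}")
```

Starting at 125, the sketch places MINUS at 145, skips the two bytes at 146 and 147, and puts LETZT at 148, which agrees with the offsets 152, 156, 160 and 164 shown for the variables that follow.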

The nesting level is increased on every BEGIN symbol; maybe it would be better to increase it on IF, WHILE etc., too, much the same way as PASFORM indents the source in the example above.

If the source contains errors or warnings, the messages are inserted directly into the compiler listing (and shown on DD:OUTPUT, too):


1LINE # D/NEST LVL < STANFORD PASCAL, OPPOLZER VERSION OF 10.2017 > 14:59:54 10/04/2017 PAGE 29
                   ....5...10....5...20....5...30....5...40....5...50....5...60....5...70..
  1793     3N   2) LASTKIND := CURRKIND ;
  1794     3N   2) while FREEPOS < CURRPOS do
  1795     4N   2) begin
  1796     4N   2) WRITE ( LISTING , ' ' ) ;
  1797     4N   2) FREEPOS := FREEPOS + 1
  1798     3N   2) end (* while *) ;
  1799     3N   2) WRITE ( LISTING , CURRKIND ) ;
  1800     3N   2) LASTPOS := CURRPOS
  1801     2N   2) end (* else *) ;
  1802     2N   2) if CURRNMR < 10 then
  1803     2N   2) F := 1
  1804     2N   2) else
  1805     2N   2) if CURRNMR < 100 then
  1806     2N   2) F := 2
  1807     2N   2) else
  1808     2N   2) F := 3 ;
  1809     2N   2) WRITE ( LISTING , CURRNMR : F ) ;
  1810     2N   2) FREEPOS := FREEPOS + F + 1
  1811     1N   2) end (* for *) ;
  1812     1N   2) WRITELN ( LISTING ) ;
  1813     1N   2) ERRINX := 0 ;
  1814     1N   2) if ERRORCNT > 0 then
  1815     1N   2) PRCODE := FALSE ;
  1816     1N   2) if ERRLN > 0 then
  1817     1N   2) WRITELN ( LISTING , '****' : 9 ,
  1818     1N   2) ' PREVIOUS ERROR/WARNING ON LINE -->' , ERRLN : 4 ) ;
  1819     1N   2) ERRLN := LINECNT ;
  1820           ) end (* PRINTERROR *) ;
  1821           )
  1822           )
  1823           )
  1824           ) procedure ENDOFLINE ;
  1825           )
  1826           ) label 10 ;
  1827           )
  1828   112D   2) var I : 1 .. 9 ;
  1829   114D   2) DCN : INTEGER ;
  1830   114D   2)
  1831           ) begin (* ENDOFLINE *)
  1832     1N   2) if ERRINX > 0 then
  1833     1N   2) PRINTERROR ;
  1834     1N   2) READLN ( INPUT , LINEBUF ) ;
  1835     1N   2) LINELEN := BUFEND ;
  1836     1N   2)
  1837     1N   2) (*******************************************************)
  1838     1N   2) (*THIS WILL SPEED THINGS UP IF NO MARGIN IS SET/RESET *)
  1839     1N   2) (*$D- ... MUST BE IN EFFECT FOR THIS LOOP *)
                   !
** Warning S005: the rest of the options string will be ignored
  1840     1N   2) (*******************************************************)
  1841     1N   2)
  1842     2N   2) repeat
  1843     2N   2) LINELEN := LINELEN - 1 ;
  1844     1N   2) until LINEBUF [ LINELEN ] <> ' ' ;
  1845     1N   2)
  1846     1N   2) (***********************************************************)
  1847     1N   2) (* IF NEEDED, DEBUG SWITCH SHOULD BE RESTORED HERE ---> $D+*)
  1848     1N   2) (***********************************************************)
  1849     1N   2)
  1850     1N   2) 10 :
  1851     1N   2) if LINELEN > RMARGIN then
  1852     2N   2) begin
  1853     2N   2) MWARN := TRUE ;
                   ....5...10....5...20....5...30....5...40....5...50....5...60....5...70..
