Oppolzer - Informatik / Stanford Pascal Compiler
The Stanford Pascal Compiler / Evolution Steps
New source program scanner (PASSCAN) - separate module
Compiler version: 10.2017
This is a change that I had wanted to make for a long time.
The old source program scanner (procedure INSYMBOL)
has been completely replaced by a new scanner called
PASSCAN; the new scanner is not hand-written any more,
but it is generated using a scanner-generating tool that
was written at the Computer Science department of the
University of Stuttgart in 1980 by four students (including
myself). I extended this scanner generator in 1996 to
make a usable product out of it, and I have used it in many
projects from 1996 until today. PASSCAN is an external
module, separate from the compiler. It does all the
source handling and it writes the compile listing.
The new scanner will make extensions to the compiler
symbol repertoire much easier, because it is generated
from a "grammar", which is in fact a large regular
expression (with attributes). The scanner generator
works similarly to the well-known Unix tool "lex".
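To illustrate the idea of a scanner driven by one large regular expression, here is a minimal sketch in Python. It is not the PASSCAN grammar, which is not shown here; the token class names and patterns are assumptions made up for this example. Note also that Python's `re` module picks the first matching alternative, whereas lex-like tools pick the longest match, so the order of the entries matters in this sketch.

```python
import re

# Hypothetical token classes (assumptions for illustration only;
# the real PASSCAN grammar is not reproduced here).
TOKEN_SPEC = [
    ("LINECOMMENT", r"//[^\n]*"),                   # C++ style comment, ends at end of line
    ("IDENT",       r"[A-Za-z_$][A-Za-z0-9_$]*"),   # may start with underscore or dollar sign
    ("INTCONST",    r"[0-9]+"),
    ("ASSIGN",      r":="),
    ("SYMBOL",      r"[-+*/=<>()\[\];:,.]"),
    ("BLANKS",      r"\s+"),
]

# One large regular expression: every token class is a named alternative.
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def scan(source):
    """Return a list of (token_class, text) pairs, skipping blanks."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if m.lastgroup != "BLANKS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    for token in scan("_X := 10 ; // set the start value"):
        print(token)
```

Adding a new symbol to such a scanner means adding one entry to the table and rebuilding the expression, which is roughly why a generated scanner is easier to extend than a hand-written one.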
With the help of the new scanner, I added some
(scanner-related) features to the compiler:
- C++ style comments (terminated at the end of the line)
- binary integer constants
- variables starting with an underscore
- improvements on the compiler listing
- compiler messages (including source text) shown on the terminal
This last improvement is very helpful during development, because
in most cases it is no longer necessary to open the listing file and
look for the compiler messages there, which speeds up development
and makes it more fun.
Example: when compiling the old compiler with the new compiler,
you get the following warning:
**** STANFORD PASCAL COMPILER, OPPOLZER VERSION OF 10.2017 ****
1839 (*$D- ... MUST BE IN EFFECT FOR THIS LOOP *)
!
** Warning S005: the rest of the options string will be ignored
**** Compiler Summary ****
**** WARNING: PASCAL EXTENSIONS USED.
**** 1 Warning.
**** 15161 LINE(S) READ, 157 PROCEDURE(S) COMPILED,
**** 25920 P_INSTRUCTIONS GENERATED, 7.22 SECONDS IN COMPILATION.
The warning is new (the old compiler did not issue it);
it says that the option in the comment is followed by text
that is not recognized as an option. The warning is shown on the
Windows or CMS console (for example), given the appropriate DD
assignments for OUTPUT. The number 1839 on the left is the source
line number.
Some more experience from the restructuring of the compiler
By extracting the scanner logic from the compiler, the overall structure
of the compiler became much clearer, and some unused or strange constructs
could be removed.
But there were some strange situations during the migration, too.
First of all, the Pascal grammar has a well-known problem: when a subrange
definition involving integers starts this way:
1..50
the scanner first assumes that it could be the beginning of a real number, and
when it encounters the second period, it has to revise this hypothesis.
I fixed this by introducing another symbol into the grammar, which I
called INTDOTDOT; this is an integer constant followed by two dots.
This way my grammar works without backtracking over the characters.
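The INTDOTDOT idea can be sketched as follows. This is only an illustration in Python, not the actual PASSCAN grammar: apart from INTDOTDOT, all token names, the simplified pattern for real numbers, and the helper `subrange_bounds` are assumptions. Python's `re` engine backtracks internally, so the sketch shows only the token classification, not the behaviour of a generated automaton.

```python
import re

# Hypothetical token classes; only the name INTDOTDOT comes from the text above.
TOKEN_SPEC = [
    ("REALCONST", r"[0-9]+\.[0-9]+(?:[Ee][+-]?[0-9]+)?"),  # simplified real number
    ("INTDOTDOT", r"[0-9]+\.\."),                          # integer directly followed by ".."
    ("INTCONST",  r"[0-9]+"),
    ("DOTDOT",    r"\.\."),
    ("BLANKS",    r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def scan(source):
    """Return a list of (token_class, text) pairs, skipping blanks."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if m.lastgroup != "BLANKS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def subrange_bounds(source):
    """Return (low, high) for an integer subrange such as '1..50' or '1 .. 50'."""
    tokens = scan(source)
    kinds = [kind for kind, _ in tokens]
    if kinds == ["INTDOTDOT", "INTCONST"]:
        # The combined symbol carries the lower bound; strip the two dots.
        return int(tokens[0][1][:-2]), int(tokens[1][1])
    if kinds == ["INTCONST", "DOTDOT", "INTCONST"]:
        return int(tokens[0][1]), int(tokens[2][1])
    raise ValueError(f"not an integer subrange: {source!r}")


if __name__ == "__main__":
    print(scan("1..50"))
    print(scan("1.50"))
    print(subrange_bounds("1 .. 50"))
```

In this sketch the consumer of the tokens has to treat INTDOTDOT as an integer constant followed by the range symbol; how the compiler's parser actually handles the combined symbol is not described here.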
There were some other problems: I first had an error in the definition
of real numbers, and I forgot that identifiers can start with a dollar
sign in Stanford Pascal, and so on. But all of that can be fixed within
minutes, simply by generating a new scanner.
The compiler listing is produced by PASSCAN, too; I reworked it a little,
but the information content stays the same. Some excerpts from the compiler
listing:
1LINE # D/NEST LVL < STANFORD PASCAL, OPPOLZER VERSION OF 10.2017 > 13:18:43 10/03/2017 PAGE 28
....5...10....5...20....5...30....5...40....5...50....5...60....5...70..
1729 3N 2) begin
1730 3N 2) WRITE ( F , '1' ) ;
1731 3N 2) I := I - X ;
1732 3N 2) end (* then *)
1733 2N 2) else
1734 2N 2) WRITE ( F , '0' ) ;
1735 2N 2) X := X DIV 2 ;
1736 2N 2) end (* for *)
1737 ) end (* WRITEBINBYTE *) ;
1738 )
1739 )
1740 )
1741 ) function MODP ( X : INTEGER ; Y : INTEGER ) : INTEGER ;
1742 )
1743 120D 2) var M : INTEGER ;
1744 120D 2)
1745 ) begin (* MODP *)
1746 1N 2) M := X MOD Y ;
1747 1N 2) if M < 0 then
1748 1N 2) M := M + Y ;
1749 1N 2) MODP := M ;
1750 ) end (* MODP *) ;
1751 )
1752 )
1753 )
1754 ) procedure INTTOSTR ( CP : VOIDPTR ; LEN : INTEGER ; VAL : INTEGER ;
1755 ) ZEROES : BOOLEAN ) ;
1756 )
1757 125D 2) var BUFFER : array [ 1 .. 20 ] of CHAR ;
1758 145D 2) MINUS : BOOLEAN ;
1759 146D 2) LETZT : INTEGER ;
1760 152D 2) I : INTEGER ;
1761 156D 2) LIMIT : INTEGER ;
1762 160D 2) LENX : INTEGER ;
1763 164D 2) POSX : INTEGER ;
1764 164D 2)
1765 ) begin (* INTTOSTR *)
1766 1N 2) if VAL < 0 then
1767 2N 2) begin
1768 2N 2) VAL := - VAL ;
1769 2N 2) MINUS := TRUE
1770 2N 2) end (* then *)
1771 1N 2) else
1772 1N 2) MINUS := FALSE ;
1773 1N 2) I := 20 ;
1774 1N 2) BUFFER := ' ' ;
1775 1N 2) if VAL = 0 then
1776 2N 2) begin
1777 2N 2) BUFFER [ I ] := '0' ;
1778 2N 2) I := I - 1 ;
1779 2N 2) end (* then *)
1780 1N 2) else
1781 1N 2) while VAL > 0 do
1782 2N 2) begin
1783 2N 2) LETZT := VAL MOD 10 ;
1784 2N 2) BUFFER [ I ] := CHR ( ORD ( '0' ) + LETZT ) ;
1785 2N 2) I := I - 1 ;
1786 2N 2) VAL := VAL DIV 10 ;
1787 1N 2) end (* while *) ;
1788 1N 2) LIMIT := 20 - LEN + 1 ;
1789 1N 2) if MINUS then
1790 1N 2) LIMIT := LIMIT + 1 ;
1791 1N 2) if ZEROES then
1792 1N 2) while I > LIMIT do
....5...10....5...20....5...30....5...40....5...50....5...60....5...70..
This part of the compiler listing shows the reworked compiler information.
To the left of each source line, you have the source line number first,
and then the data offset (suffix D) or the nesting level (suffix N),
depending on whether the source line contains declarations or executable
statements.
From the data offsets shown above (line 1757 ff), you can see that the
variable BUFFER, for example, is located at offset 125 of the automatic
storage block of procedure INTTOSTR. It is 20 bytes long.
The next variable, MINUS, is at offset 145 and is 1 byte long;
LETZT is an integer (4 bytes) and is placed at 148 because of alignment
requirements (the listing shows 146, apparently the next free offset
before alignment), I at 152 and so on ...
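The offset arithmetic described above can be reproduced with a small sketch. It is written in Python and is not taken from the compiler; the function names are made up, and the sizes and alignments (CHAR and BOOLEAN: 1 byte, no alignment; INTEGER: 4 bytes, aligned to 4) are assumptions derived from the explanation above.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def assign_offsets(start, variables):
    """Assign storage offsets to (name, size, alignment) entries, in order."""
    offsets = {}
    offset = start
    for name, size, alignment in variables:
        offset = align_up(offset, alignment)  # skip padding bytes if needed
        offsets[name] = offset
        offset += size
    return offsets


# Local variables of INTTOSTR as (name, size in bytes, assumed alignment).
INTTOSTR_LOCALS = [
    ("BUFFER", 20, 1),   # array [ 1 .. 20 ] of CHAR
    ("MINUS",   1, 1),   # BOOLEAN
    ("LETZT",   4, 4),   # INTEGER
    ("I",       4, 4),
    ("LIMIT",   4, 4),
    ("LENX",    4, 4),
    ("POSX",    4, 4),
]

if __name__ == "__main__":
    # 125 is the first offset shown in the listing for INTTOSTR.
    for name, offset in assign_offsets(125, INTTOSTR_LOCALS).items():
        print(f"{name:6} at offset {offset}")
```

With a start offset of 125, the sketch places MINUS at 145 and LETZT at 148, with two padding bytes at 146 and 147, which agrees with the offsets described above.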
The nesting level is increased on every BEGIN symbol; maybe it would be
better to increase it on IF, WHILE etc., too, in much the same way as
PASFORM indents the source in the example above.
If the source contains errors or warnings, the messages are inserted
directly into the compiler listing (and shown on DD:OUTPUT, too):
1LINE # D/NEST LVL < STANFORD PASCAL, OPPOLZER VERSION OF 10.2017 > 14:59:54 10/04/2017 PAGE 29
....5...10....5...20....5...30....5...40....5...50....5...60....5...70..
1793 3N 2) LASTKIND := CURRKIND ;
1794 3N 2) while FREEPOS < CURRPOS do
1795 4N 2) begin
1796 4N 2) WRITE ( LISTING , ' ' ) ;
1797 4N 2) FREEPOS := FREEPOS + 1
1798 3N 2) end (* while *) ;
1799 3N 2) WRITE ( LISTING , CURRKIND ) ;
1800 3N 2) LASTPOS := CURRPOS
1801 2N 2) end (* else *) ;
1802 2N 2) if CURRNMR < 10 then
1803 2N 2) F := 1
1804 2N 2) else
1805 2N 2) if CURRNMR < 100 then
1806 2N 2) F := 2
1807 2N 2) else
1808 2N 2) F := 3 ;
1809 2N 2) WRITE ( LISTING , CURRNMR : F ) ;
1810 2N 2) FREEPOS := FREEPOS + F + 1
1811 1N 2) end (* for *) ;
1812 1N 2) WRITELN ( LISTING ) ;
1813 1N 2) ERRINX := 0 ;
1814 1N 2) if ERRORCNT > 0 then
1815 1N 2) PRCODE := FALSE ;
1816 1N 2) if ERRLN > 0 then
1817 1N 2) WRITELN ( LISTING , '****' : 9 ,
1818 1N 2) ' PREVIOUS ERROR/WARNING ON LINE -->' , ERRLN : 4 ) ;
1819 1N 2) ERRLN := LINECNT ;
1820 ) end (* PRINTERROR *) ;
1821 )
1822 )
1823 )
1824 ) procedure ENDOFLINE ;
1825 )
1826 ) label 10 ;
1827 )
1828 112D 2) var I : 1 .. 9 ;
1829 114D 2) DCN : INTEGER ;
1830 114D 2)
1831 ) begin (* ENDOFLINE *)
1832 1N 2) if ERRINX > 0 then
1833 1N 2) PRINTERROR ;
1834 1N 2) READLN ( INPUT , LINEBUF ) ;
1835 1N 2) LINELEN := BUFEND ;
1836 1N 2)
1837 1N 2) (*******************************************************)
1838 1N 2) (*THIS WILL SPEED THINGS UP IF NO MARGIN IS SET/RESET *)
1839 1N 2) (*$D- ... MUST BE IN EFFECT FOR THIS LOOP *)
!
** Warning S005: the rest of the options string will be ignored
1840 1N 2) (*******************************************************)
1841 1N 2)
1842 2N 2) repeat
1843 2N 2) LINELEN := LINELEN - 1 ;
1844 1N 2) until LINEBUF [ LINELEN ] <> ' ' ;
1845 1N 2)
1846 1N 2) (***********************************************************)
1847 1N 2) (* IF NEEDED, DEBUG SWITCH SHOULD BE RESTORED HERE ---> $D+*)
1848 1N 2) (***********************************************************)
1849 1N 2)
1850 1N 2) 10 :
1851 1N 2) if LINELEN > RMARGIN then
1852 2N 2) begin
1853 2N 2) MWARN := TRUE ;
....5...10....5...20....5...30....5...40....5...50....5...60....5...70..