I'd like to join this thread once again.
90 percent of the dumps (if not more) that we see at our site come from
ordinary application code: 0Cx exceptions and the like, or other errors whose
cause in the application code is "easy" to resolve. Sometimes the cause is the
same, but the error is detected further down, for example in an LE routine
that is reached through a call to a PL/1 or C runtime routine. In that case the
caller of the runtime routine is normally the one to blame.
We automated dump reading as far as possible, making it unnecessary most of the
time, by providing an LE exit that runs in all our environments. When an error
occurs, the exit catches it and extracts enough information from the save area
back trace that the application developer normally only has to look at that
output and doesn't need to refer to the SYSUDUMP that follows. For example, we
print every DSA for every procedure call, together with the name of the
function, the parameter address list of every call, the complete call
hierarchy, the registers at every call level, the offset of the call, etc. If
the error is indeed in an LE routine below the application code, we recognize
this, walk up to the application code and identify the error position there.
The same goes for DB2 errors, that is, when the error position is in the
routine that handles the DB2 "SQLCODE not handled" condition. And if we find
the name of the module that caused the error, we send an alarm mail to the
department responsible for that module; we get this information from a
repository.
The information provided this way is much easier for our people to read than a
SYSUDUMP, and even easier than a CEEDUMP (it contains more information, has a
somewhat better structure in our opinion, and, important for some of our
co-workers, it's in German).
In addition, we train the developers to work with this output.
This was necessary (we did it in 2005) because we had run into some problems:
- dumps looked different in the different environments (batch, test, DB
dialog aka IMS), but we wanted the same look and feel in every environment
- dump reading skills were degrading
- we didn't want to buy an expensive tool and customize it for each of the
environments; instead we wanted one of our own, to which we could easily add
further functions (see above)
From today's viewpoint, it looks like a success story.
Even when the save area chain is destroyed (overwritten), the LE exit does a
very good job by providing at least what is left of the save area trace. It
tries to find the save areas first from the bottom (register 13, following the
back chain), then from the top (TCBFSA, following the forward chain); in the
normal case, the two chains fit together. If they don't, there is a gap, and
this gap is documented.
The save area trace and the back chain are very important for us because at
our site we typically have many small modules calling each other, and it is not
uncommon to see some 50 levels of call hierarchy.
BTW: the method works regardless of the programming language; we have C, PL/1
and ASSEMBLER (and at a neighboring site the exit also works with C++
functions). In fact, the method for getting the function name from the entry
point is the same for all LE languages, so I believe it would work for COBOL
too, although we have no COBOL here.
Kind regards
Bernd
On 29.07.2013 23:33, T.S. wrote:
> (First set of registers is usually the calling program.) Be careful.
>
> That can be misleading, as RTM2 puts things into the dump as well.
> That makes IPCS indispensable when dealing with dumps. The PSW may initially
> point to something else in the dump, so you have to scan down through the
> listing to find your program. That's why, as some have said here, mentorship
> is a good way to skip some other painful lessons. I still have to call on a
> 50-year veteran even after doing this for over 30 years. I don't write
> WAIT/POST code.
> Good point, though.