Just to keep you up to date:
In the meantime, my co-worker has removed the ON ERROR units from the module that
calls the math lib. We now have an ON ERROR unit only in the main routine, plus
possibly some try/catch handlers in the C++ code.
Furthermore, we managed to run this job in the test environment, but we have
only 8 input test cases there, while production has several hundred. So we
limited the storage in test to force the short-on-storage condition.
With 230 MB everything runs fine, and also with 221 MB. With 220 MB there is an
error in a termination routine, which has yet to be examined, but the "normal"
work gets done. With 219 MB we get a strange loop inside the C++ code, followed
by user abend U4087 reason 3, which means that another error condition occurred
while in a language-specific error handler! This seems to be the same situation
as in production; there is some evidence for that.
This time we got a dump, which I will examine tomorrow.
So far I don't understand why the presence of the ON ERROR units leads to an
endless loop, while without them we get the U4087 reason 3.
To make it clear, there are two problems:
a) the original problem with the storage that is not freed correctly. This has
to be solved, of course.
b) but I would also like to understand why we don't get a proper ABEND in this
case (when the storage is used up), but an endless loop instead.
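For reference, the little test program from my earlier mail (quoted below) might look roughly like the following sketch. It is a reconstruction, not the original source: the names ALLOCTST, ONE_ALLOC, BUF and NERR are made up, and it assumes, as observed there, that a failing ALLOCATE raises STORAGE and that, with no STORAGE ON-unit established, the implicit action of STORAGE raises ERROR. The ON-unit leaves via GOTO because a normal return from an ERROR ON-unit would take the implicit action and end the program; establishing the unit in a small subroutine also avoids jumping back into the iterative DO group.

```pli
 ALLOCTST: PROC OPTIONS(MAIN);
   /* Hypothetical reconstruction: 1000 iterations, each allocating  */
   /* 1,000,000 bytes that are never freed, so the region runs out.  */
   DCL BUF(1000) CHAR(1000) BASED(P);  /* 1000 x 1000 = 1,000,000 B */
   DCL P         POINTER;
   DCL I         FIXED BIN(31);
   DCL NERR      FIXED BIN(31) INIT(0);

   DO I = 1 TO 1000;
      CALL ONE_ALLOC;
   END;
   PUT SKIP LIST('ERROR ON-unit entered', NERR, 'times');

 ONE_ALLOC: PROC;
   ON ERROR BEGIN;
      ON ERROR GOTO ENDE;    /* error inside this unit: just leave  */
      NERR = NERR + 1;
      PUT SKIP LIST('ERROR raised in iteration', I);
      GOTO ENDE;             /* leave abnormally; a normal return   */
                             /* would take the implicit action and  */
                             /* end the program                     */
   END;
   ALLOCATE BUF;             /* no STORAGE ON-unit, so the implicit */
                             /* action of STORAGE raises ERROR      */
 ENDE:
   RETURN;
 END ONE_ALLOC;

 END ALLOCTST;
```

Since nothing is ever freed, once the region is exhausted every later ALLOCATE fails as well, which matches what was observed: the ON-unit is entered in every remaining iteration.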
On 21.06.2012 15:38, R. wrote:
> From: P.F.
> Sent: Thursday, 21 June 2012 9:56 PM
>> On 6/21/2012 7:16 AM, Bernd Oppolzer wrote:
>>> You are right, the
>>> ON ERROR BEGIN;
>>> ON ERROR GOTO ENDE;
>>> unit works as expected.
>>> I tested it using a little test program which allocated a buffer of
>>> 1 million bytes in a loop (1000 times). After the 400th iteration the
>>> error unit was entered, following a STORAGE condition, which is
>>> reasonable, given the region size etc.
>>> The iterations continued as expected, but because there was no storage
>>> left from then on, the error unit was entered at every iteration.
>>> Now the difference from the real thing is:
>>> the storage is not allocated using PL/1 ALLOCATE, but by the C++ math
>>> lib; maybe Xerces or other software packages are even involved (a few
>>> hundred modules, I don't know all of them).
>>> So I'm not sure whether the short-on-storage condition down there is
>>> reflected by a PL/1 STORAGE condition (maybe not). And it could well be
>>> that there are additional error handlers below this PL/1 level (maybe
>>> C++ try/catch logic).
>> God knows what C and C++ do; they have only the most basic ideas of
>> error handling. If you want to keep the _idea_ of this logic, what I
>> would do is have the first ERROR ON-unit execute "ON ERROR SYSTEM;",
>> try its cleanup, re-execute the call _in the ON-unit_, and then GOTO
>> the statement following the original CALL, which should be
>> "REVERT ERROR;". That way you only get one shot at recovery and
>> shouldn't be able to get an infinite loop.
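A sketch of that suggestion, assuming a single CALL into the math lib guarded by one ON-unit. CALC_MATH and CLEANUP are hypothetical placeholders for the real interface and for whatever cleanup is possible. The point is that ON ERROR SYSTEM, established inside the ON-unit, is inherited by the retried CALL, so a second failure takes the standard system action instead of re-entering the same unit.

```pli
 RETRYDEM: PROC;
   /* CALC_MATH and CLEANUP are hypothetical stand-ins for the math  */
   /* lib interface and for whatever cleanup is possible.            */
   DCL CALC_MATH ENTRY EXTERNAL;
   DCL CLEANUP   ENTRY EXTERNAL;

   ON ERROR BEGIN;
      ON ERROR SYSTEM;       /* a 2nd ERROR in here: standard system */
                             /* action, no further recovery attempt  */
      CALL CLEANUP;
      CALL CALC_MATH;        /* the one and only retry, in the unit  */
      GOTO AFTER_CALL;       /* resume after the original CALL       */
   END;

   CALL CALC_MATH;           /* original call                        */
 AFTER_CALL:
   REVERT ERROR;             /* cancel the ON-unit set in this block */
   /* ... rest of the procedure ... */
 END RETRYDEM;
```

Whether the system action then ends in a clean ABEND with a dump depends on the runtime options in effect, not on this code.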
> There's no evidence yet that the loop is in the error handling.
> To prove it one way or the other, SNAP needs to be inserted after
> ON ERROR in the two ON statements that use BEGIN blocks.
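Applied to a unit like the one quoted above, the change is just the SNAP keyword between the condition name and BEGIN. In PL/I, SNAP requests a list of the active blocks and ON-units each time the condition is raised, before the ON-unit itself runs. So the output should show whether the loop passes through the PL/I ERROR unit (one trace per pass) or stays inside the C++ code (no traces while it loops). CALC_MATH is again a hypothetical stand-in for the math lib call, and where the trace ends up depends on the runtime setup.

```pli
 SNAPDEMO: PROC OPTIONS(MAIN);
   DCL CALC_MATH ENTRY EXTERNAL;  /* hypothetical math lib entry */

   ON ERROR SNAP BEGIN;      /* SNAP: list active blocks/ON-units   */
                             /* each time ERROR is raised           */
      ON ERROR GOTO ENDE;    /* error inside the unit: just leave   */
      PUT SKIP LIST('ERROR ON-unit entered, ONCODE =', ONCODE());
      GOTO ENDE;
   END;

   CALL CALC_MATH;
 ENDE:
   RETURN;
 END SNAPDEMO;
```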