Just this week, I had to correct a routine for a customer of mine that outputs a
line of character data in CSV format. The routine had some errors and flaws. I
would like to tell you what I did and discuss some of the properties of the CSV
format.
First: the routine gets as input a number (number of items) and an array of char
pointers that point to (null terminated) char data which have to be written to
the output file in CSV format. All data should be char data; the numeric values
have been converted to char before the call to that routine. Another parameter
to the routine is the separation char, which is semicolon in our case (could
well be comma or tab, no problem at all).
Second: for CSV, it is ok to surround every field with quotes. ExCel etc. works
perfectly with this and omits the quotes while reading the CSV. This is very
good, because this way you don't have problems if the separation char appears
inside the value fields. So this is what I did. Another goodie: if the content
of the field is numeric, ExCel does sometimes strange things; it tries to
interpret the value and converts for example currency values to date fields
etc., which is plain nonsense. You get around this if you enclose all your
fields in quotes. Of course, if would not be necessary for char fields that are
not numeric. But we do it for all fields, that makes the logic simpler.
Third: the only thing you have to consider: if a qoute appears inside your value
fields, you have to double it.
Forth: to reduce the size of the CSV file, I omit trailing blanks in the char
fields, that is: in the loop that puts the chars into the output buffer, the
position of the last non-blank char is stored in a temp variable, and the
closing quote is put just behind that position, when the end of the output
string is reached (implicit right trim of the output string at almost no cost).
This reduced the output files to ca. 2/3 of the original size, without any
effect on the apperance of the result in ExCel.
I guess this is Turing-complete, BTW.
<ad>
This is exactly the same logic I implemented in the CSV output (and input)
routine of my DB2 and Oracle ETL tool that I mentioned in some prior posts. Feel
free to contact me offline, if you want to know more about this.
</ad>
To the List admins: I decided not to put Ad: in the subject, because this post
in my opinion was a technical post for the most part. It is sometimes dfficult
to separate pure advertising from technical information, how something is or
should be done.
I hope you can accept this.
Kind regards
Bernd
Am 26.02.2015 um 18:51 schrieb P.G.:
> On Thu, 26 Feb 2015 08:20:49 -0800, S.K. wrote:
>> Tony suggested the use of Tab (X'05') as delimiter which will avoid the
>> problem of data already have the common delimiter comma.
>>
> I stand by my assertion that lacking a priori knowledge that a character
> can not occur in the data the task becomes more difficult; not impossible.
> And I can't resist mentioning (again):
>
> http://xkcd.com/327/
>
>> This is the second time in this week that you had a comment about DFSORT.
>> I am a developer of DFSORT and there are many cool things that DFSORT can
>> do and I am more than willing to show them. However I also believe using
>> the right utility/program for the right job. I do not suggest to use
>> DFSORT for every solution I posted here.
>>
> My apologies if I offended. I recognize that DFSORT isn't the only tool in your
> kit, and I'm aware that you suggest other techniques when appropriate.
>
> Still, since DFSORT appears to be the Swiss Army Knife of z/OS, I can't
> resist wondering, idly, whether DFSORT is Turing-complete.
>
> -- G.
>
|