Some years ago we were forced to build a new and very large application system
based on a database design where all tables have UUIDs as primary keys.
The UUIDs are defined as CHAR(36), but the have a FIELDPROC on it
that translates them to a CHAR(16) internally and moves the different parts
of it around, so that the INSERTS get (hopefully) distributed uniformly
across the key interval.
I am very critical to this design, because on tables that have 1:n relations
to each other (several layers), it is not possible to have common keys on
such tables, because the design paradigm says that every table has to have
its own UUID PK. So at the third layer at least it is not possible to cluster
the tables of such a hierarchical table complex on the same key,
because the (same) keys are simply not present on the lower tables.
I hope, you can follow this problem description...
For example:
table a: pk a#
table b: pk b#, fk a#
table c: pk c#, fk b# (no a# in table c)
all keys are UUIDs.
It is not possible to cluster table c on the same key as table a, because there
is no common key.
From this design, we get heavy I/O waits on the batch processes that
work on those large tables. The applications normally have hours of
elapsed time, but only minutes of CPU time.
We try to overcome this problem by parallelizing the batch processes
heavily, but anyway: although we only have some low percentage of the
load on the new system (the rest still remains on the old legacy systems),
the new system already doesn't perform well and is on the limit.
It is not possible in an easy way to change the design paradigm of the
new system (that is: change the keys from UUID to natural keys) - for
several reasons.
The DB2 key features regarding this problem are: Sequential Prefetch
and Detection and Clusterratio of the Clustering Index (there always
should be one on every table, and this should be the one which is
used most of the time).
I would strongly recommend not to use artificial keys as primary keys,
but natural keys. If some tables are in a hierarchical relationship,
put the keys from the higher tables to the lower tables, and cluster the
tables on the same keys (possibly: partition the tables on the same
keys and the same intervals).
Kind regards
Bernd
Am 01.05.2013 16:18, schrieb L.L.:
> Andre:
>
> The decision to 'generate' a surrogate key for your data is a
> business/application decision. UUIDs are one method of doing this.
>
> In my experience, the application decision to use a UUID (or other non-natural
> key) is usually a mistake. Most business data is best identified by natural
> keys, at least in the logical design and/or presentation layer. It is only when
> you do a physical DB design that you need to consider surrogate keys, and this
> is usually a deliberate design decision based on the application. For example, a
> data warehouse commonly uses surrogate keys to identify entities having
> differing physical attributes (consider how to store AccountNumber fields having
> varying lengths and then JOINing them; better to JOIN on a surrogate key.)
>
> One exception to the "don't use UUIDs as keys" rule is transactional data.
> Another popular ID choice for transactions is Timestamp. Again, a business
> decision. However, it is rare that you need to use a database to store
> transactional data. (If you are, then you are using a database as a "file". Is
> this a good design?)
>
> I would recommend against using UUIDs as Primary Keys. PKs are for
> DBMS-enforced referential integrity, which means propagating the key values to
> tables with FKs. Depending on which UUID format you are planning on using (MAC
> address? SHA-1 hash?), do you really want to force a 32 +/- byte field into
> every child table? In many tables, this will mean the key is longer than the
> data!
>
> Another issue is uniqueness. If you intend to use UUIDs to ensure uniqueness,
> the DBMS won't know about this, and can't enforce it without a unique index.
> There are some (minor) space and performance considerations with creating an
> index on a 32-byte string.
>
> One possibility to consider is a "translation" table. Create a table with
> columns (UUID, SEQ#), where the SEQ# is a DB2 INTEGER Identity column, or is
> populated from an INTEGER Sequence. This limits your UUID "footprint" to a
> single table, provides a shorter 4-byte surrogate key for use in all tables, and
> even provides an association table if you need to JOIN anything based your
> (small number of) tables that contain the UUID.
>
> Your mileage may vary! Not tested!
>
> Hope this helps.
>
> - L.L.
>
>
> From: A.L.
> Sent: Wednesday, May 01, 2013 8:51 AM
> To: 'db2-l@lists.idug.org'
> Subject: [DB2-L] - RE: DB2 z/OS - How do you deal with UUID?
>
> Hi M.,
>
> I completely agree with you on SEQUENCES vs. IDENTITY. When it comes to UUID,
> I'd much rather use a SEQUENCE instead.
> But generating a 'unique' random number on any platform, without requiring
> a DB2 connection, is becoming a big issue for us.
>
> I'd like to hear from anyone who has some experience, good or bad, using UUID
> on DB2 z/OS. Unless this is really not an issue.
>
> Thanks,
> A.
>
> -----End Original Message-----
>
>
|