Collation Sequence Definitions

Normally, GT.M orders data with numeric values first, followed by strings sequenced by ASCII values. To use an alternative collating sequence, the following items must be provided at GT.M process initialization.

Creating the Shared Library holding the alternative sequencing routines

A shared library for an alternative collation sequence must contain the following four routines:

  1. gtm_ac_xform_1: Transforms subscripts up to the maximum supported string length to the alternative collation sequence, or

    gtm_ac_xform: Transforms subscripts up to 32,767 bytes to the alternative collation sequence.

  2. gtm_ac_xback_1: Use with gtm_ac_xform_1 to transform the alternative collation keys back to the original subscript representation, or

    gtm_ac_xback: Use with gtm_ac_xform to transform the alternative collation keys back to the original subscript representation.

  3. gtm_ac_version: Returns a numeric version identifier for the "currently active" set of collation routines.

  4. gtm_ac_verify: Returns the success (odd) or failure (even) in matching a collation sequence with a given version number.
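The odd/even convention used by the version and verify routines can be illustrated with a minimal C sketch. The names sketch_ac_version and sketch_ac_verify, the constants, and the argument types are assumptions made for illustration only; this section does not give the exact C prototypes of gtm_ac_version and gtm_ac_verify, so check the GT.M distribution before writing the real entry points.

```c
/* Hypothetical identifiers for one collation sequence implementation. */
#define SKETCH_COLL_TYPE    1   /* assumed collation sequence number */
#define SKETCH_COLL_VERSION 1   /* assumed version of these routines */

/* Returns the numeric version identifier of the active routines. */
int sketch_ac_version(void)
{
    return SKETCH_COLL_VERSION;
}

/* Returns an odd value (1) when the requested type and version match
 * this implementation, and an even value (0) when they do not. */
int sketch_ac_verify(int type, int version)
{
    if (type == SKETCH_COLL_TYPE && version == SKETCH_COLL_VERSION)
        return 1;   /* odd: success */
    return 0;       /* even: failure */
}
```

In this sketch a caller would treat any odd return value as a match and any even value as a mismatch.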

GT.M searches the shared library for gtm_ac_xform_1 and gtm_ac_xback_1 before searching for gtm_ac_xform and gtm_ac_xback. If the shared library contains gtm_ac_xform_1, GT.M ignores gtm_ac_xform even if it is present. If GT.M finds gtm_ac_xform_1 but does not find gtm_ac_xback_1, it reports a COLLATIONUNDEF error with an additional COLLFNMISSING mismatch warning.

If the application does not use strings longer than 32,767 bytes, the alternative collation library need not contain the gtm_ac_xform_1 and gtm_ac_xback_1 routines. On the other hand, if the application passes strings longer than 32,767 bytes (but shorter than the maximum supported string length) and the library does not provide gtm_ac_xform_1 and gtm_ac_xback_1, GT.M issues the COLLARGLONG run-time error.

Note that database key sizes are much more restricted by GT.M than local key sizes, and may be restricted further by user configuration.

Defining the Environment Variable

GT.M locates the alternative collation sequences through environment variables of the form gtm_collate_n, where n is an integer from 1 to 255 that identifies the collation sequence, and the value assigned to the variable is the pathname of the shared library containing the routines for that collation sequence, for example:

$ gtm_collate_1=/opt/fis-gtm/collation
$ export gtm_collate_1

Multiple alternative collation sequence definitions can co-exist.

Considerations in Establishing Alternative Collations

Alternative collation sequences for a global must be set when the global contains no data. When the global is defined the collation sequence is stored in the global. This ensures the future integrity of the global's collation. If it becomes necessary to change the collation sequence of a global containing data, you must copy the data to a temporary repository, delete the global, modify the variable's collation sequence by reinitializing the global either in a region that has the desired collation or with %GBLDEF, and restore the data from the temporary repository.

Be careful when creating the transformation and inverse transformation routines. The transformation routine must unambiguously and reliably encode every possible input value. The inverse routine must faithfully return the original value in every case. Errors in these routines can produce delayed symptoms that could be hard to debug. These routines may not be written in M.
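The requirement that the inverse routine return the original value in every case can be checked mechanically. The following C sketch is an illustration only: the sketch_ names are hypothetical, the routines work on plain byte buffers rather than on the argument types that the real gtm_ac_xform and gtm_ac_xback entry points use, and the ordering chosen (letters collate as A a B b ... Z z, all other bytes keep their relative ASCII order) is just an example. Because the transformation is built as a permutation of all 256 byte values, it is unambiguous by construction.

```c
#include <stddef.h>
#include <string.h>

static unsigned char xform_tab[256];   /* byte -> collation position */
static unsigned char xback_tab[256];   /* collation position -> byte */

/* Build both tables from one pass over the byte values, so that
 * xback_tab is the exact inverse of xform_tab. Assumes ASCII. */
void sketch_init_tables(void)
{
    int pos = 0;
    for (int b = 0; b < 256; b++) {
        if (b >= 'a' && b <= 'z')
            continue;               /* placed right after its uppercase partner */
        xback_tab[pos] = (unsigned char)b;
        xform_tab[b] = (unsigned char)pos;
        pos++;
        if (b >= 'A' && b <= 'Z') {
            int lower = b + ('a' - 'A');
            xback_tab[pos] = (unsigned char)lower;
            xform_tab[lower] = (unsigned char)pos;
            pos++;
        }
    }
}

/* Transform len bytes of in to collation keys in out. */
void sketch_xform(const unsigned char *in, size_t len, unsigned char *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = xform_tab[in[i]];
}

/* Transform len collation keys in in back to the original bytes in out. */
void sketch_xback(const unsigned char *in, size_t len, unsigned char *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = xback_tab[in[i]];
}

/* Returns 1 when every possible byte value survives xform then xback. */
int sketch_round_trip_ok(void)
{
    unsigned char all[256], key[256], back[256];
    for (int b = 0; b < 256; b++)
        all[b] = (unsigned char)b;
    sketch_xform(all, sizeof all, key);
    sketch_xback(key, sizeof key, back);
    return memcmp(all, back, sizeof all) == 0;
}
```

After sketch_init_tables() has run, the key for "a" sorts before the key for "B", which is the reverse of their ASCII order, and sketch_round_trip_ok() exercises every input byte. A test of this kind is worth running against any real transformation before the library is put into use.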

Defining a Default Database Collation Method

GT.M lets you define an alternative collation sequence as the default when creating a new database. Subsequently, this default is applied when each new global is created.

This default collation sequence is set with a GDE qualifier on the ADD, CHANGE, and TEMPLATE commands, as in the following example with CHANGE:

GDE>CHANGE -REGION DEFAULT -COLLATION_DEFAULT=<0-255>

This qualifier always applies to regions, and takes effect when a database is created with MUPIP CREATE. The output of GDE SHOW displays this value, and DSE DUMP -FILEHEADER also includes this information. In the absence of an alternative default collation sequence, the default used is 0, or ASCII.

The value cannot be changed once a database file is created, and will be in effect for the life of the database file. The same restriction applies to the version of the collation sequence. The version of a collation sequence implementation is also stored in the database fileheader and cannot be modified except by recreating the file.

If the code of the collation sequence changes, making it incompatible with the collation sequence in use when the database was created, use the following procedure to ensure the continued validity of the database. MUPIP EXTRACT the database using the older compatible collation routines, then recreate and MUPIP LOAD using the newer collation routines.

Establishing A Local Collation Sequence

All subscripted local variables for a process must use the same collation sequence. The collation sequence used by local variables can be established as a default or in the current process. The local collation sequence can only be changed when a process has no subscripted local variables defined.

To establish a default local collation sequence, assign a numeric value to the environment variable gtm_local_collate to select one of the collation tables, for example:

$ gtm_local_collate=n
$ export gtm_local_collate

where n is the number of a collation sequence that matches a valid collation number defined by an environment variable in the form gtm_collate_n.

An active process can use the %LCLCOL utility to define the collation sequence for subscripts of local variables.

For more information, refer to “%LCLCOL” in the Utilities chapter of this manual.

set^%LCLCOL(n) changes the local collation to the type specified by n.

Example:

IF '$$set^%LCLCOL(3) D
. Write "local collation sequence not changed",! Break

This piece of code illustrates $$set^%LCLCOL used as an extrinsic. It writes an error message and executes a BREAK if the local collation sequence was not set to 3.

set^%LCLCOL(n,ncol) determines the null collation type to be used with the collation type n.

  • If the truth value of ncol is FALSE(0), local variables use the GT.M standard null collation.

  • If the truth value of ncol is TRUE(1), local variables use the M standard null collation.

With set^%LCLCOL(,ncol), the null collation order can be changed while keeping the alternate collation order unchanged. If subscripted local variables exist, null collation order cannot be changed. In this case, GT.M issues the COLLDATAEXISTS error.

get^%LCLCOL returns the current local collation type.

Example:

GTM>Write $$get^%LCLCOL
0

This example uses $$get^%LCLCOL as an extrinsic that returns 0, indicating that the effective local collation sequence is the standard M collation sequence.

If set^%LCLCOL is not specified and gtm_local_collate is not defined, or is invalid, the process uses M standard collation. The following would be considered invalid values:

  • A value less than 0

  • A value greater than 255

  • A legal collation sequence that is inaccessible to the process

Inaccessibility could be caused by a missing environment variable, a missing image, or security denial of access.
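The validity rules above can be approximated outside GT.M, for example in a startup check. The following C sketch is an assumption-laden illustration, not GT.M's own logic: the function name is hypothetical, non-numeric text is treated as invalid, and of the possible causes of inaccessibility only the missing gtm_collate_n environment variable is tested (a missing image or a denial of access is not detected).

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 when value looks like a usable gtm_local_collate setting
 * by the rules listed above, and 0 otherwise. */
int sketch_local_collate_usable(const char *value)
{
    char *end;
    long n;
    char name[32];

    if (value == NULL || *value == '\0')
        return 0;                       /* not defined */
    n = strtol(value, &end, 10);
    if (*end != '\0' || n < 0 || n > 255)
        return 0;                       /* not a number, or out of range */
    if (n == 0)
        return 1;                       /* 0 is the standard M collation */
    /* A nonzero number needs a matching gtm_collate_n definition. */
    snprintf(name, sizeof name, "gtm_collate_%ld", n);
    return getenv(name) != NULL;
}
```

A wrapper script or launcher could call such a check with the result of getenv("gtm_local_collate") and warn before starting the process, since GT.M itself falls back to M standard collation when the value is invalid.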