Creating the Alternate Collation Routines

Each alternative collation sequence requires a set of four user-created routines--gtm_ac_xform_1 (or gtm_ac_xform), gtm_ac_xback_1 (or gtm_ac_xback), gtm_ac_version, and gtm_ac_verify. The original and transformed strings are passed between GT.M and the user-created routines using parameters of type gtm_descriptor or gtm32_descriptor. An "include file" gtm_descript.h, located in the GT.M distribution directory, defines gtm_descriptor (used with gtm_ac_xform and gtm_ac_xback) as:

typedef struct
{
    short len;
    short type;
    void *val;
} gtm_descriptor;

Note

On 64-bit UNIX platforms, gtm_descriptor may grow by up to eight (8) additional bytes as a result of compiler padding to meet platform alignment requirements.

gtm_descript.h defines gtm32_descriptor (used with gtm_ac_xform_1 and gtm_ac_xback_1) as:

typedef struct
{
    unsigned int len;
    unsigned int type;
    void *val;
} gtm32_descriptor;

where len is the length of the data, type is set to DSC_K_DTYPE_T (indicating that this is an M string), and val points to the text of the string.
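
As a minimal sketch of how a descriptor describes a string, the fragment below fills a gtm32_descriptor for a C buffer. The structure is re-declared locally only so that the fragment compiles on its own; real code would include gtm_descript.h instead. The value 14 for DSC_K_DTYPE_T and the helper name desc_from_cstring are assumptions made for illustration and are not taken from the GT.M distribution.

```c
#include <string.h>

/* Local stand-ins so this fragment compiles on its own; real code
 * would use #include "gtm_descript.h" instead. */
#ifndef DSC_K_DTYPE_T
#define DSC_K_DTYPE_T 14    /* assumed value; take the real one from gtm_descript.h */
#endif

typedef struct
{
    unsigned int len;
    unsigned int type;
    void *val;
} gtm32_descriptor;

/* Hypothetical helper: describe a NUL-terminated C string as an M string.
 * The terminating NUL is not counted, because M strings carry an explicit
 * length and may themselves contain NUL bytes. */
static gtm32_descriptor desc_from_cstring(char *s)
{
    gtm32_descriptor d;

    d.len = (unsigned int)strlen(s);
    d.type = DSC_K_DTYPE_T;
    d.val = s;
    return d;
}
```

The descriptor only points at the caller's buffer; it does not copy the text, so the buffer must remain valid for as long as the descriptor is in use.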

The interface to each routine is described below.

Transformation Routine (gtm_ac_xform_1 or gtm_ac_xform)

The gtm_ac_xform_1 or gtm_ac_xform routine transforms subscripts to the alternative collation sequence.

If the application uses subscripted lvns longer than 32,767 bytes (but less than 1,048,576 bytes), the alternative collation library must contain the gtm_ac_xform_1 and gtm_ac_xback_1 routines. Otherwise, the alternative collation library can contain gtm_ac_xform and gtm_ac_xback.

The syntax of gtm_ac_xform_1 is:

#include "gtm_descript.h"
int gtm_ac_xform_1(gtm32_descriptor* in, int level, gtm32_descriptor* out, int* outlen);

Input Arguments for gtm_ac_xform_1

The input arguments for gtm_ac_xform_1 are:

in: a gtm32_descriptor containing the string to be transformed.

level: an integer; this is not used currently, but is reserved for future facilities.

out: a gtm32_descriptor to be filled with the transformed key.

Output Arguments for gtm_ac_xform_1

The output arguments for gtm_ac_xform_1 are:

return value: a long word status code.

out: the transformed subscript, placed in the string buffer described by the gtm32_descriptor.

outlen: a 32-bit signed integer, passed by reference, returning the actual length of the transformed key.

The syntax of the gtm_ac_xform routine is:

#include "gtm_descript.h"
long gtm_ac_xform(gtm_descriptor *in, int level, gtm_descriptor *out, int *outlen)

Input Arguments for gtm_ac_xform

The input arguments for gtm_ac_xform are:

in: a gtm_descriptor containing the string to be transformed.

level: an integer; this is not used currently, but is reserved for future facilities.

out: a gtm_descriptor to be filled with the transformed key.

Output Arguments for gtm_ac_xform

The output arguments for gtm_ac_xform are:

return value: a long result providing a status code; it indicates the success (zero) or failure (non-zero) of the transformation.

out: a gtm_descriptor containing the transformed key.

outlen: an int, passed by reference, giving the actual length of the output key.

Example:

#include "gtm_descript.h"
#define MYAPP_SUBSC2LONG 12345678
static unsigned char xform_table[256] =
{
  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
 64, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93,
 95, 97, 99,101,103,105,107,109,111,113,115,117,118,119,120,121,
122, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94,
 96, 98,100,102,104,106,108,110,112,114,116,123,124,125,126,127,
128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,
144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,
160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,
176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,
192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,
208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,
224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,
240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255
};
long gtm_ac_xform (gtm_descriptor *in,     /* the input string */
                   int level,              /* the subscript level */
                   gtm_descriptor *out,    /* the output buffer */
                   int *outlen)            /* the length of the output string */
{
  int n;
  unsigned char *cp, *cout;
/* Ensure space in the output buffer for the string. */
  n = in->len;
  if (n > out->len)
    return MYAPP_SUBSC2LONG;
/* There is space, copy the string, transforming, if necessary */
  cp = in->val;            /* Address of first byte of input string */
  cout = out->val;        /* Address of first byte of output buffer */
  while (n-- > 0)
    *cout++ = xform_table[*cp++];
  *outlen = in->len;
  return 0;
}

Transformation Routine Characteristics

The input and output values may contain <NUL> (hex code 00) characters.

The collation transformation routine may concatenate a sentinel, such as <NUL>, followed by the original subscript on the end of the transformed key. If key length is not an issue, this permits the inverse transformation routine to simply retrieve the original subscript rather than calculating its value based on the transformed key.

If there are reasons not to append the entire original subscript, GT.M allows you to concatenate a sentinel plus a predefined code so the original subscript can be easily retrieved by the inverse transformation routine, but still assures a reformatted key that is unique.
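
The sentinel technique described above can be sketched as follows. This is an illustration under stated assumptions, not code from the GT.M distribution: it builds a case-insensitive key by upper-casing the subscript, then appends a <NUL> sentinel and the original subscript, and it assumes that the original subscripts contain no <NUL> bytes. The structure is re-declared locally only so that the sketch compiles on its own, and MYAPP_BADKEY is a hypothetical status code.

```c
#include <ctype.h>
#include <string.h>

/* Local stand-in so the sketch compiles on its own; real code would
 * use #include "gtm_descript.h" instead. */
typedef struct
{
    short len;
    short type;
    void *val;
} gtm_descriptor;

#define MYAPP_SUBSC2LONG 12345678   /* user-chosen status, as in the examples */
#define MYAPP_BADKEY     12345679   /* hypothetical status: no sentinel found */

/* Transformed key = upper-cased subscript, <NUL> sentinel, original subscript.
 * Assumes the original subscript itself contains no <NUL> byte. */
long gtm_ac_xform(gtm_descriptor *in, int level, gtm_descriptor *out, int *outlen)
{
    const unsigned char *src = in->val;
    unsigned char *dst = out->val;
    int i, n = in->len;

    (void)level;                    /* reserved, unused */
    if (2 * n + 1 > out->len)
        return MYAPP_SUBSC2LONG;
    for (i = 0; i < n; i++)
        dst[i] = (unsigned char)toupper(src[i]);
    dst[n] = '\0';                  /* sentinel */
    memcpy(dst + n + 1, src, (size_t)n);
    *outlen = 2 * n + 1;
    return 0;
}

/* Inverse: skip to the first <NUL> and copy what follows it. */
long gtm_ac_xback(gtm_descriptor *in, int level, gtm_descriptor *out, int *outlen)
{
    const unsigned char *src = in->val;
    const unsigned char *sep = memchr(src, '\0', (size_t)in->len);
    int n;

    (void)level;                    /* reserved, unused */
    if (sep == NULL)
        return MYAPP_BADKEY;
    n = in->len - (int)(sep - src) - 1;
    if (n > out->len)
        return MYAPP_SUBSC2LONG;
    memcpy(out->val, sep + 1, (size_t)n);
    *outlen = n;
    return 0;
}
```

Because the original subscript follows the sentinel, two subscripts that differ only in case still produce distinct keys, at the cost of roughly doubling the key length.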

Inverse Transformation Routine (gtm_ac_xback or gtm_ac_xback_1)

This routine returns altered keys to the original subscripts. The syntax of this routine is:

#include "gtm_descript.h"
long gtm_ac_xback(gtm_descriptor *in, int level, gtm_descriptor *out, int *outlen)

The arguments of gtm_ac_xback are identical to those of gtm_ac_xform.

The syntax of gtm_ac_xback_1 is:

#include "gtm_descript.h"
long gtm_ac_xback_1(gtm32_descriptor *src, int level, gtm32_descriptor *dst, int *dstlen)

The arguments of gtm_ac_xback_1 are identical to those of gtm_ac_xform_1.

Example:

#include "gtm_descript.h"
#define MYAPP_SUBSC2LONG 12345678
static unsigned char inverse_table[256] =
{
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
64, 65, 97, 66, 98, 67, 99, 68,100, 69,101, 70,102, 71,103, 72,
104, 73,105, 74,106, 75,107, 76,108, 77,109, 78,110, 79,111, 80,
112, 81,113, 82,114, 83,115, 84,116, 85,117, 86,118, 87,119, 88,
120, 89,121, 90,122, 91, 92, 93, 94, 95, 96,123,124,125,126,127,
128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,
144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,
160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,
176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,
192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,
208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,
224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,
240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255
};
long gtm_ac_xback (gtm_descriptor *in,     /* the input string */
                   int level,              /* the subscript level */
                   gtm_descriptor *out,    /* output buffer */
                   int *outlen)            /* the length of the output string */
{
  int n;
  unsigned char *cp, *cout;
/* Ensure space in the output buffer for the string. */
  n = in->len;
  if (n > out->len)
    return MYAPP_SUBSC2LONG;
/* There is enough space, copy the string, transforming, if necessary */
  cp = in->val;            /* Address of first byte of input string */
  cout = out->val;        /* Address of first byte of output buffer */
  while (n-- > 0)
    *cout++ = inverse_table[*cp++];
  *outlen = in->len;
  return 0;
}
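
Because inverse_table must exactly undo xform_table, maintaining the two by hand invites mismatches. One possible safeguard, sketched below, is to derive the inverse from the forward table and to reject a forward table that maps two bytes to the same value. The function name build_inverse_table is hypothetical and is not part of the GT.M interface.

```c
/* Fill inverse[] so that inverse[xform[c]] == c for every byte c.
 * Returns 0 on success, or -1 if xform[] is not a permutation of
 * 0..255 (two bytes share a value, so no inverse table exists). */
static int build_inverse_table(const unsigned char xform[256],
                               unsigned char inverse[256])
{
    unsigned char seen[256] = { 0 };
    int c;

    for (c = 0; c < 256; c++)
    {
        unsigned char t = xform[c];

        if (seen[t])
            return -1;              /* duplicate target value */
        seen[t] = 1;
        inverse[t] = (unsigned char)c;
    }
    return 0;
}
```

A check like this could run once in a test program or at library build time; it does not need to be part of the collation routines themselves.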

Transform Utility Routine (gtm_ac_xutil)

This routine returns the collation value of a character, or the character collating before or after it in the collation sequence. The syntax of this routine is:

#include "gtm_descript.h"
long gtm_ac_xutil (gtm32_descriptor *in, int level, gtm32_descriptor *out, int *outlen, int op, int honor_numeric)

Input Arguments

The input arguments of gtm_ac_xutil are:

  • in: Specifies the input string; gtm_ac_xutil considers the first character of the input string.

  • level: Currently unused and should not be examined or changed.

  • op: Specifies the collation operation as follows:

    • 0: collation value of the given character

    • 1: character collating before the given character if it exists

    • 2: character collating after the given character if it exists

  • honor_numeric: Boolean value that specifies whether to use standard GT.M collation for digits.

    • TRUE: use standard GT.M collation for digits before any other character

    • FALSE: treat digits the same as all other characters

Output Arguments

The output arguments of gtm_ac_xutil are:

  • out: Supplies the one (1) character result string produced by applying the collation operation, if a result was possible.

  • outlen: Supplies to the caller the length of the returned string: 0 or 1.

The gtm_ac_xutil function returns 0 on success and -1 on failure.

Version Control Routines (gtm_ac_version and gtm_ac_verify)

Two user-defined version control routines provide a safety mechanism to guard against a collation routine being used on the wrong global, or an attempt being made to modify a collation routine for an existing global. Either of these situations could cause incorrect collation or damage to subscripts.

When a global is assigned an alternative collation sequence, GT.M invokes a user-supplied routine that returns a numeric version identifier for the set of collation routines; GT.M stores this identifier with the global. The first time a process accesses the global, GT.M determines the assigned collation sequence, then invokes another user-supplied routine. The second routine matches the collation sequence and version identifier assigned to the global with those of the current set of collation routines.

When you write the code that matches the type and version, you can decide whether to modify the version identifier and whether to allow support of globals created using a previous version of the routine.

Version Identifier Routine (gtm_ac_version)

This routine returns an integer identifier between 0 and 255. This integer provides a mechanism to enforce compatibility as a collation sequence potentially evolves. When GT.M first uses an alternate collation sequence for a database or global, it captures the version; if it finds at some later startup that the version has changed, it generates an error. The syntax is:

int gtm_ac_version()

Example:

int gtm_ac_version()
{ 
    return 1;
}

Verification Routine (gtm_ac_verify)

This routine verifies that the type and version associated with a global are compatible with the active set of routines. Both the type and version are unsigned characters passed by value. The syntax is:

#include "gtm_descript.h"
int gtm_ac_verify(unsigned char type, unsigned char ver)

Example:

#include "gtm_descript.h"
#define MYAPP_WRONGVERSION 20406080    /* User condition */
int gtm_ac_verify (unsigned char type, unsigned char ver)
{
  if (type == 3)
    {
      if (ver > 2)        /* version checking may be more complex */
        {
          return 0;
        }
    }
  return MYAPP_WRONGVERSION;
}
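
As noted above, the verification routine is where you decide whether globals created with an earlier version of the collation routines remain usable. The sketch below shows one way to accept a range of versions. The collation type 3, the version numbers, and the MYAPP_* names are hypothetical values chosen for illustration.

```c
#define MYAPP_COLLATION_TYPE   3    /* hypothetical: sequence number these routines implement */
#define MYAPP_OLDEST_VERSION   2    /* hypothetical: oldest version still compatible */
#define MYAPP_CURRENT_VERSION  4    /* hypothetical: version of the current routines */
#define MYAPP_WRONGVERSION     20406080    /* User condition */

/* Report the version of the current routines. */
int gtm_ac_version(void)
{
    return MYAPP_CURRENT_VERSION;
}

/* Accept a global only if it was created with this collation type and
 * with a version between the oldest compatible one and the current one. */
int gtm_ac_verify(unsigned char type, unsigned char ver)
{
    if (type != MYAPP_COLLATION_TYPE)
        return MYAPP_WRONGVERSION;
    if (ver < MYAPP_OLDEST_VERSION || ver > MYAPP_CURRENT_VERSION)
        return MYAPP_WRONGVERSION;
    return 0;
}
```

Accepting an older version is only safe if the newer routines order and invert every existing key exactly as the older ones did.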

Using the %GBLDEF Utility

Use the %GBLDEF utility to get, set, or kill the collation sequence of a global variable mapped by the current global directory. %GBLDEF cannot modify the collation sequence for either a global containing data or a global whose subscripts span multiple regions. To change the collation sequence for a global variable that contains data, extract the data, KILL the variable, change the collation sequence, and reload the data. Use GDE to modify the collation sequence of a global variable that spans regions.

For more information, refer to “%GBLDEF” in the Utilities Chapter of this manual.

Assigning the Collation Sequence

To assign a collation sequence to an individual global use the extrinsic entry point:

set^%GBLDEF(gname,nct,act)

Example:

GTM>kill ^G
GTM>write $select($$set^%GBLDEF("^G",0,3):"ok",1:"failed")
ok
GTM>

This deletes the global variable ^G, then uses $$set^%GBLDEF as an extrinsic to set ^G to collation sequence number 3 with numeric subscripts collating before strings. Using $$set^%GBLDEF as an argument to $SELECT provides a return value indicating whether or not the set was successful. $SELECT returns "failed" if the requested collation sequence is undefined.

Examining Global Collation Characteristics

To examine the collation characteristics currently assigned to a global use the extrinsic entry point:

get^%GBLDEF(gname[,reg])

Note

get^%GBLDEF(gname) returns global-specific characteristics, which can differ from the collation characteristics that the database file acquires at MUPIP CREATE time from settings in the global directory.

The DSE DUMP -FILEHEADER command displays region collation whenever the collation is other than M standard collation.

Example:

GTM>Write $$get^%GBLDEF("^G")
1,3,1

This example returns the collation sequence information currently assigned to the global ^G.

Deleting Global Collation Characteristics

To delete the collation characteristics currently assigned to a global, use the extrinsic entry point:

kill^%GBLDEF(gname)

Example of Upper and Lower Case Alphabetic Collation Sequence

This example creates an alternate collation sequence that collates upper and lower case alphabetic characters in such a way that the set of keys "du Pont", "Friendly", "le Blanc", and "Madrid" collates as:

  • du Pont

  • Friendly

  • le Blanc

  • Madrid

This is in contrast to the standard M collation that orders them as:

  • Friendly

  • Madrid

  • du Pont

  • le Blanc

Important

No claim of copyright is made with respect to the code used in this example. Please do not use the code as-is in a production environment.

Please ensure that you have a correctly configured GT.M installation, correctly configured environment variables, with appropriate directories and files.

Seasoned GT.M users may want to download polish.c used in this example and proceed directly to Step 5 for compiling and linking instructions. First-time users may want to start from Step 1.

  1. Create a new file called polish.c and put the following code:

    #include <stdio.h>
    #include <string.h>
    #include "gtm_descript.h"
    #define COLLATION_TABLE_SIZE     256
    #define MYAPPS_SUBSC2LONG        12345678
    #define SUCCESS     0
    #define FAILURE     1                
    #define VERSION     0
    static unsigned char xform_table[COLLATION_TABLE_SIZE] =
              {
              0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
              16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
              32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
              48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
              64, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93,
              95, 97, 99,101,103,105,107,109,111,113,115,117,118,119,120,121,
              122, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94,
              96, 98,100,102,104,106,108,110,112,114,116,123,124,125,126,127,
              128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,
              144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,
              160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,
              176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,
              192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,
              208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,
              224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,
              240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255
              };
    static unsigned char inverse_table[COLLATION_TABLE_SIZE] =
              {
              0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
              16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
              32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
              48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
              64, 65, 97, 66, 98, 67, 99, 68,100, 69,101, 70,102, 71,103, 72,
              104, 73,105, 74,106, 75,107, 76,108, 77,109, 78,110, 79,111, 80,
              112, 81,113, 82,114, 83,115, 84,116, 85,117, 86,118, 87,119, 88,
              120, 89,121, 90,122, 91, 92, 93, 94, 95, 96,123,124,125,126,127,
              128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,
              144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,
              160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,
              176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,
              192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,
              208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,
              224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,
              240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255
              };

    Elements in xform_table represent the input order for the transform. Elements in inverse_table represent the reverse transform for xform_table.

  2. Add the following code for the gtm_ac_xform transformation routine:

    long gtm_ac_xform ( gtm_descriptor *src, int level, gtm_descriptor *dst, int *dstlen)
          {
              int n;
              unsigned char  *cp, *cpout;
          #ifdef DEBUG
              char input[COLLATION_TABLE_SIZE], output[COLLATION_TABLE_SIZE];
          #endif
              n = src->len;
              if ( n > dst->len)
                 return MYAPPS_SUBSC2LONG;
              cp  = (unsigned char *)src->val;
          #ifdef DEBUG
              memcpy(input, cp, src->len);
              input[src->len] = '\0';
          #endif
              cpout = (unsigned char *)dst->val;
              while ( n-- > 0 )
                 *cpout++ = xform_table[*cp++];
              *cpout = '\0';
              *dstlen = src->len;
          #ifdef DEBUG
              memcpy(output, dst->val, *dstlen);
              output[*dstlen] = '\0';
              fprintf(stderr, "\nInput = \n");
              for (n = 0; n < *dstlen; n++ ) fprintf(stderr," %d ",(int )input[n]);
              fprintf(stderr, "\nOutput = \n");
              for (n = 0; n < *dstlen; n++ ) fprintf(stderr," %d ",(int )output[n]);
          #endif
              return SUCCESS;
          }

  3. Add the following code for the gtm_ac_xback reverse transformation routine:

    long gtm_ac_xback ( gtm_descriptor *src, int level, gtm_descriptor *dst, int *dstlen)
          {
              int n;
              unsigned char  *cp, *cpout;
          #ifdef DEBUG
              char input[256], output[256];
          #endif
              n = src->len;
              if ( n > dst->len)
                 return MYAPPS_SUBSC2LONG;
              cp  = (unsigned char *)src->val;
              cpout = (unsigned char *)dst->val;
              while ( n-- > 0 )
                 *cpout++ = inverse_table[*cp++];
              *cpout = '\0';
              *dstlen = src->len;
          #ifdef DEBUG
              memcpy(input, src->val, src->len);
              input[src->len] = '\0';
              memcpy(output, dst->val, *dstlen);
              output[*dstlen] = '\0';
              fprintf(stderr, "Input = %s, Output = %s\n",input, output);
          #endif
              return SUCCESS;
          }

  4. Add code for the version identifier routine (gtm_ac_version) and the verification routine (gtm_ac_verify):

    int gtm_ac_version ()
          {
              return VERSION;
          }
          int gtm_ac_verify (unsigned char type, unsigned char ver)
          {
                  return !(ver == VERSION);
          }

  5. Save and compile polish.c. On x86 GNU/Linux (64-bit Ubuntu 10.10), execute a command like the following:

    gcc -c -fPIC polish.c -I$gtm_dist

    Note

    The -I$gtm_dist option adds the GT.M distribution directory to the include search path so that the compiler can find gtm_descript.h. The -fPIC option generates position-independent code, which is needed for objects that go into a shared library on this platform.

  6. Create a new shared library or add the above routines to an existing one. The following command adds these alternative sequence routines to a shared library called altcoll.so on x86 GNU/Linux (64-bit Ubuntu 10.10).

    gcc -o altcoll.so -shared polish.o

  7. Set $gtm_collate_1 to point to the location of altcoll.so.

  8. At the GTM> prompt, execute the following command:

    GTM>Write $SELECT($$set^%GBLDEF("^G",0,1):"OK",1:"FAILED")
    OK

    This sets ^G to collation sequence number 1 with numeric subscripts collating before strings. ^G must not contain data at this point, because %GBLDEF cannot modify the collation sequence of a global that contains data.

  9. Assign the following values to ^G:

    GTM>Set ^G("du Pont")=1
    GTM>Set ^G("Friendly")=1
    GTM>Set ^G("le Blanc")=1
    GTM>Set ^G("Madrid")=1

  10. See how the subscripts of ^G are ordered according to the alternative collation sequence:

    GTM>ZWRite ^G
    ^G("du Pont")=1
    ^G("Friendly")=1
    ^G("le Blanc")=1
    ^G("Madrid")=1
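
To see why the keys come out in this order without a GT.M installation, the following stand-alone sketch sorts C strings by their transformed bytes. It is an approximation for illustration only: it rebuilds the example's xform_table by rule rather than listing it, and it assumes that ordering is decided by a byte-wise comparison of the transformed keys. The names demo_xform, init_demo_xform, compare_transformed, and sort_keys are hypothetical.

```c
#include <stdlib.h>

static unsigned char demo_xform[256];

/* Rebuild the example's xform_table by rule instead of listing it:
 * A,a,B,b,...,Z,z take the values 65..116, the six characters with
 * codes 91..96 move up to 117..122, and all other bytes map to themselves. */
static void init_demo_xform(void)
{
    int i;

    for (i = 0; i < 256; i++)
        demo_xform[i] = (unsigned char)i;
    for (i = 0; i < 26; i++)
    {
        demo_xform['A' + i] = (unsigned char)(65 + 2 * i);
        demo_xform['a' + i] = (unsigned char)(66 + 2 * i);
    }
    for (i = 91; i <= 96; i++)
        demo_xform[i] = (unsigned char)(i + 26);
}

/* qsort comparator: order two C strings by their transformed bytes. */
static int compare_transformed(const void *pa, const void *pb)
{
    const char *sa = *(const char *const *)pa;
    const char *sb = *(const char *const *)pb;
    const unsigned char *a = (const unsigned char *)sa;
    const unsigned char *b = (const unsigned char *)sb;

    while (*a && *b && demo_xform[*a] == demo_xform[*b])
    {
        a++;
        b++;
    }
    /* NUL maps to 0, so a key that is a prefix of another sorts first. */
    return (int)demo_xform[*a] - (int)demo_xform[*b];
}

/* Sort an array of n C strings into the alternative collation order. */
static void sort_keys(const char **keys, size_t n)
{
    init_demo_xform();
    qsort(keys, n, sizeof keys[0], compare_transformed);
}
```

With this table, "d" transforms to 72, "F" to 75, "l" to 88, and "M" to 89, which is why "du Pont" precedes "Friendly" and "le Blanc" precedes "Madrid".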

Example of Collating Alphabets in Reverse Order using gtm_ac_xform_1 and gtm_ac_xback_1

This example creates an alternate collation sequence that collates alphabetic characters in reverse order. This is in contrast to the standard M collation, which collates alphabetic characters in ascending order.

Important

No claim of copyright is made with respect to the code used in this example. Please do not use the code as-is in a production environment.

Please ensure that you have a correctly configured GT.M installation, correctly configured environment variables, with appropriate directories and files.

  1. Download col_reverse_32.c from http://tinco.pair.com/bhaskar/gtm/doc/books/pg/UNIX_manual/col_reverse_32.c. It contains code for the transformation routine (gtm_ac_xform_1), the reverse transformation routine (gtm_ac_xback_1), and the version control routines (gtm_ac_version and gtm_ac_verify).

  2. Save and compile col_reverse_32.c. On x86 GNU/Linux (64-bit Ubuntu 10.10), execute a command like the following:

    gcc -c -fPIC col_reverse_32.c -I$gtm_dist

    Note

    The -I$gtm_dist option adds the GT.M distribution directory, which contains header files such as gtm_descript.h, to the include search path. The -fPIC option generates position-independent code, which is needed for objects that go into a shared library on this platform.

  3. Create a new shared library or add the routines to an existing one. The following command adds these alternative sequence routines to a shared library called revcol.so on x86 GNU/Linux (64-bit Ubuntu 10.10).

    gcc -o revcol.so -shared col_reverse_32.o
  4. Set the environment variable gtm_collate_2 to point to the location of revcol.so. To set the local variable collation to this alternative collation sequence, set the environment variable gtm_local_collate to 2.

  5. At the GTM> prompt, execute the following command:

    GTM>Write $SELECT($$set^%GBLDEF("^E",0,2):"OK",1:"FAILED")
    OK
  6. Assign the following values to ^E:

    GTM>Set ^E("du Pont")=1
    GTM>Set ^E("Friendly")=1
    GTM>Set ^E("le Blanc")=1
    GTM>Set ^E("Madrid")=1
  7. Notice how the subscripts of ^E sort in reverse order:

    GTM>zwrite ^E
    ^E("le Blanc")=1
    ^E("du Pont")=1
    ^E("Madrid")=1
    ^E("Friendly")=1
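
For readers who cannot download col_reverse_32.c, the sketch below shows one way a reverse collation could be written against the gtm32_descriptor interface. It is not the content of that file; it is a minimal illustration that replaces each byte c with 255 - c, which reproduces the ordering shown above for these four keys. The structure is re-declared locally only so that the sketch compiles on its own, the helper reverse_bytes is hypothetical, and the return types follow the syntax lines given earlier in this section.

```c
/* Local stand-in so the sketch compiles on its own; real code would
 * use #include "gtm_descript.h" instead. */
typedef struct
{
    unsigned int len;
    unsigned int type;
    void *val;
} gtm32_descriptor;

#define MYAPP_SUBSC2LONG 12345678   /* user-chosen status, as in the examples */

/* Shared worker: replace each byte c with 255 - c. Applying it twice
 * restores the original, so it serves as both transform and inverse. */
static long reverse_bytes(gtm32_descriptor *in, gtm32_descriptor *out, int *outlen)
{
    const unsigned char *src = in->val;
    unsigned char *dst = out->val;
    unsigned int i;

    if (in->len > out->len)
        return MYAPP_SUBSC2LONG;
    for (i = 0; i < in->len; i++)
        dst[i] = (unsigned char)(255 - src[i]);
    *outlen = (int)in->len;
    return 0;
}

int gtm_ac_xform_1(gtm32_descriptor *in, int level, gtm32_descriptor *out, int *outlen)
{
    (void)level;                    /* reserved, unused */
    return (int)reverse_bytes(in, out, outlen);
}

long gtm_ac_xback_1(gtm32_descriptor *in, int level, gtm32_descriptor *out, int *outlen)
{
    (void)level;                    /* reserved, unused */
    return reverse_bytes(in, out, outlen);
}
```

Because the mapping is its own inverse, no separate inverse table is needed. Note that this reverses the order of every byte value, not only of alphabetic characters.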