AN OVERVIEW ON SYNCSORT
1. OVERVIEW
Syncsort is a utility from Syncsort Inc. that can sort data, merge data,
selectively process data, reformat data, create summary records from data and
create extensive reports from input data. It can also perform any combination
of these functions and more. This document briefly explains the main items
related to processing data using Syncsort. It describes only a subset of the
Syncsort functions and does not claim to be a replacement for the Syncsort
manual, nor does the author guarantee the exactness of details or syntax.
2. SYNCSORT COMMANDS
2.1 SORT
The SORT statement can be used to sort a dataset or concatenated datasets. The
SORT statement requires the sort sequence for the data. The list of fields and
their formats must be specified for this statement. The output records are
sorted in the specified sequence. If multiple records contain the same sort
sequence key, the options specified determine whether the input order is
maintained for such records. The format of the SORT statement is as follows.
SORT FIELDS=({begcol},{length},{fieldtype},{D|A}[,{begcol},
{length},{fieldtype},{D|A}]...)
or
SORT FIELDS=({begcol},{length},{D|A}[,{begcol},{length},
{D|A}]...),FORMAT={fieldtype}
The beginning column is specified in bytes, starting with 1 for
the first byte. The length of the field must be specified in bytes,
irrespective of the field type. There are many field types. The frequently used
ones are CH for character, BI for binary (COMP fields of COBOL), PD for packed
decimal (COMP-3 fields of COBOL), ZD for zoned decimal and AQ for alternate
collating sequence (refer to later section on alternate collating sequence).
The second form of the SORT statement can be used when all the fields specified
for the sort sequence are of the same type.
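As an illustration of the two forms, the statements below are a sketch based on a hypothetical record layout: a 10-byte character key in columns 1-10, a 4-byte packed decimal amount in columns 21-24 and a 5-byte character code in columns 31-35. The layout is an assumption made for this example only. The second form is written with the FORMAT= keyword. The two SORT statements are alternatives; only one would appear in a given step.

```
* Form 1: ascending on the character key, then descending on the
* packed decimal amount.
  SORT FIELDS=(1,10,CH,A,21,4,PD,D)
* Form 2: both keys are character fields, so the type is given once.
  SORT FIELDS=(1,10,A,31,5,D),FORMAT=CH
```

Lines with an asterisk in column 1 are comments, and the control statements themselves start after column 1.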
2.2 MERGE
The MERGE statement can be used to merge two or more pre-sorted
datasets. The MERGE statement requires the sort sequence for the data. The list
of fields and their formats must be specified for this statement. The format of
the MERGE statement is as follows.
MERGE FIELDS=({begcol},{length},{fieldtype},{D|A}[,{begcol},
{length},{fieldtype},{D|A}]...)
or
MERGE FIELDS=({begcol},{length},{D|A}[,{begcol},{length},
{D|A}]...),FORMAT={fieldtype}
Refer to the SORT statement above for a description of the
parameters.
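A minimal sketch of a merge is shown below. It assumes, purely for illustration, that every input dataset is already sorted in ascending order on a character key in columns 1-10.

```
* Merge inputs that are already in ascending order on columns 1-10.
  MERGE FIELDS=(1,10,CH,A)
```

The key given on the MERGE statement should match the sequence in which the inputs were originally sorted; otherwise the merged output cannot be expected to be in order.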
2.3 INCLUDE AND OMIT
The INCLUDE statement can be used to specify the conditions for
inclusion of records from the input during processing. The OMIT statement can
be used to specify the conditions for exclusion of records from the input
during processing. The two statements are mutually exclusive. The different
formats of the INCLUDE and OMIT statements are shown below.
INCLUDE COND=({begcol1},{length1},{fldtype1},{comp.oper},
{begcol2},{length2},{fldtype2})
The above statement can be used to compare two fields within the
same record. The valid comparison operators are EQ, NE, GT, LT, LE and GE.
INCLUDE COND=({begcol},{length},{fldtype},{comp.oper},{constant})
The above statement can be used to compare a field in the record
with a character, decimal or hexadecimal constant.
Using the convention that {cond.stmt} is the part of the statement
between the parentheses in either of the statements mentioned above, compound
statements can be constructed as follows.
INCLUDE COND=({cond.stmt}[,{OR|AND},{cond.stmt}]...)
Parentheses can be used to group conditional statements to form
complex conditions. For OMIT statements, replace the INCLUDE verb with the OMIT
verb in the above examples.
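The conditions described above might look like the following sketch. The column positions, field types and constant values are hypothetical and chosen only to show each form. Each statement is an independent alternative, since only one INCLUDE or OMIT statement can be used at a time.

```
* Field-to-field: keep records where columns 1-5 equal columns 11-15.
  INCLUDE COND=(1,5,CH,EQ,11,5,CH)
* Field-to-constant, using a character constant.
  INCLUDE COND=(21,3,CH,EQ,C'NYC')
* Field-to-constant, using a hexadecimal constant.
  INCLUDE COND=(40,1,BI,EQ,X'FF')
* Compound: character test AND a packed decimal field above 1000.
  INCLUDE COND=(21,3,CH,EQ,C'NYC',AND,31,4,PD,GT,1000)
* Grouped with parentheses and written as an OMIT.
  OMIT COND=((21,3,CH,EQ,C'NYC',OR,21,3,CH,EQ,C'BOS'),AND,31,4,PD,LT,0)
```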
2.4 INREC AND OUTREC
The INREC statement can be specified to reformat the input records
before SYNCSORT processes them for SORT or MERGE. The OUTREC statement can be
specified to reformat the processed records into the required layout for the
output records or the report that is generated. The formats of the INREC and
OUTREC statements are similar.
INREC FIELDS=([{outpos}:]{begcol},{length}
[,[{outpos}:]{begcol},{length}]...)
The {outpos} prefix is optional. It specifies the position in
the output record where the field must be placed. The default is to place the
field at the current position in the output record, with the first field
specified placed at column 1.
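The sketch below shows the effect of the optional output position, assuming a hypothetical input record with fields of interest in columns 1-10 and 21-24. The two statements are alternatives.

```
* Keep columns 1-10 and 21-24; the fields are placed one after the
* other, so the reformatted record is 14 bytes long.
  INREC FIELDS=(1,10,21,4)
* Same fields, but the second one is placed at output column 31.
  INREC FIELDS=(1,10,31:21,4)
```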
Individual numeric fields can also be reformatted from any
numeric format to an edited, printable form. An example syntax is given below.
INREC FIELDS=({begcol},{length},{fldtype},M#)
or
INREC FIELDS=({begcol},{length},{fldtype},EDIT=(SIII,IIT.TT))
Default picture clauses (edit masks) are provided and named M0 to M9.
Customized picture clauses can be specified with the EDIT parameter. In the
example above, the S character is the sign position, I is the equivalent of
the Z PIC clause of COBOL and T is the equivalent of the 9 PIC clause.
Conventions for the sign displayed for numeric fields can be specified after
the edit parameter, using the SIGNS parameter.
Constant fields can also be introduced in the record. For example,
spaces or zeroes can be placed in the record at specific positions.
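As a hedged illustration of editing and constants, the statements below assume a hypothetical 4-byte packed decimal amount in columns 21-24 and a 10-byte key in columns 1-10. The customized picture is written inside parentheses, and the blank and literal separators (5X and C'...') are the forms commonly accepted by sort products of this family; the Syncsort manual should be checked for the exact rules. The two statements are alternatives.

```
* Edit the packed amount with a customized picture. SIGNS=(,-) asks
* for a blank before positive values and a minus before negatives.
  OUTREC FIELDS=(1,10,21,4,PD,EDIT=(SIII,IIT.TT),SIGNS=(,-))
* Insert constants: 5 blanks and a literal between two fields.
  OUTREC FIELDS=(1,10,5X,C'TOTAL',21,4)
```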
NOTE: When using INREC fields, the column positions and lengths in the
SORT, MERGE and SUM statements must reflect the output from the INREC
processing.
2.5 SUM
The SUM statement can be used to summarize data based on the SORT
statement. One record will be produced for each unique key present in the
input. If numeric fields are specified for summation, those fields will be
summed up. The SYNCSORT software does not guarantee unique records if numeric
fields are required to be summed up. Whenever there is an overflow of a numeric
field, more than one record may be created. The syntax is as follows.
SUM FIELDS=NONE
or
SUM FIELDS=({begcol},{length},{fldtype}[,{begcol},{length},{fldtype}]...)
The first format is used when duplicate records need to be removed and no
numeric summation is required. The second format is used when numeric
summation is required when duplicate records exist.
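Both formats are sketched below, assuming a hypothetical layout with a character key in columns 1-10 and a 4-byte packed decimal amount in columns 21-24. The two variants are alternatives and would be placed in separate steps.

```
* Variant 1: keep one record per key and drop the duplicates.
  SORT FIELDS=(1,10,CH,A)
  SUM FIELDS=NONE
* Variant 2: keep one record per key and total the packed amount.
  SORT FIELDS=(1,10,CH,A)
  SUM FIELDS=(21,4,PD)
```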
2.6 ALTERNATE COLLATING SEQUENCE
The alternate collating sequence can be specified using the ALTSEQ statement.
This helps the user sort records in a sequence different from that of the
EBCDIC character set. This may be required in situations where the character
codes for the fields that the user intends to sort on are not in EBCDIC
collating sequence.
ALTSEQ CODE=({hexcode}{newhexcode}[,{hexcode}{newhexcode}]...)
The new hex code is used in place of the corresponding hex code only during
the sorting or merging process. The new hex code does not replace the
hex codes in the output or the reports.
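For example, the following sketch makes three lowercase letters collate as their uppercase equivalents. The key position is hypothetical; the hex values are the standard EBCDIC codes for a, b, c (X'81' to X'83') and A, B, C (X'C1' to X'C3').

```
* Collate a, b and c as A, B and C; the data itself is unchanged.
  ALTSEQ CODE=(81C1,82C2,83C3)
* The key uses field type AQ so that the alternate sequence applies.
  SORT FIELDS=(1,10,AQ,A)
```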
2.7 OPTION STATEMENT
The OPTION statement can be used to control parameters during SORT, MERGE and
SUM processing.
2.7.1 EQUALS AND NOEQUALS
The default of NOEQUALS specifies that SYNCSORT need not retain the order of
input data when duplicate record keys are found. OPTION EQUALS should be used
if the order of input data must be maintained during the SORT processing.
This parameter also affects the non-summation data in SUM processing. When
EQUALS is used, the data for the non-key fields is taken from the first input
record for that key value. When NOEQUALS is specified, the data for the
non-key fields is unpredictable.
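A short sketch of the combination described above, assuming a hypothetical character key in columns 1-10:

```
* Keep input order for equal keys, so that the record retained for
* each key is the first one read from the input.
  OPTION EQUALS
  SORT FIELDS=(1,10,CH,A)
  SUM FIELDS=NONE
```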
2.7.2 RECORD
The RECORD option of the OPTION statement can be used to specify whether the
input data to be processed consists of variable-length or fixed-length
records. This is required when both the input and output of the SORT or MERGE
processing are VSAM files. The valid values are RECORD=V and RECORD=F.
2.7.3 SKIPREC
The SKIPREC parameter of the OPTION statement can be used to specify the
number of input records to skip before any processing begins. SKIPREC=20
specifies that the first 20 records of the input must be skipped.
2.7.4 STOPAFT
The STOPAFT parameter of the OPTION statement can be used to specify the
number of records to be included for processing. STOPAFT=100 specifies that
SYNCSORT stops taking any more input after 100 records that match the criteria
have been selected.
2.7.5 COPY
The COPY parameter can be used if a simple COPY operation is required. It is
ideal when neither SORT processing nor MERGE processing is required. When
combined with SKIPREC and STOPAFT, the parameter helps copy selected records
to output based on the number of records. If COPY is combined with INCLUDE or
OMIT condition statements, selected records can be copied to output based on
specific conditions in field values. When COPY is combined with INREC or
OUTREC processing (INREC is more efficient in this case), a reformatted output
can be produced.
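The combination of COPY with the record-count and condition parameters could be sketched as follows. The counts repeat the values used earlier in this section, and the tested column is hypothetical.

```
* Copy without sorting: skip the first 20 input records and stop
* after 100 records have been accepted.
  OPTION COPY,SKIPREC=20,STOPAFT=100
* Accept only records with 'A' in column 15.
  INCLUDE COND=(15,1,CH,EQ,C'A')
```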
2.8 OUTFIL
The OUTFIL statement can be used to produce multiple output datasets. This
statement must be used if elaborate formatting is required, such as producing
reports. One OUTFIL statement is required for each output dataset. The FILES
parameter specifies the DD name suffix to be used for the dataset output.
Each output dataset can have its own INCLUDE or OMIT condition and its own
INREC and OUTREC parameters. Further, report formatting is available,
including 3 levels of HEADER#, 3 levels of TRAILER#, summation, section
processing and section breaks. For example, this statement can help create a
separate report for each department in a different dataset or sysout and route
each one to the appropriate destination.
Refer to the SYNCSORT manual if the OUTFIL statement is required.
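As a rough sketch only, splitting sorted output by department might look like the statements below. The department code in columns 11-13, its values and the other column positions are hypothetical, and the exact OUTFIL parameters should be confirmed against the SYNCSORT manual.

```
  SORT FIELDS=(1,10,CH,A)
* One OUTFIL statement per output dataset, each with its own filter.
  OUTFIL FILES=1,INCLUDE=(11,3,CH,EQ,C'D01')
* The second output is also reformatted to the key and the amount.
  OUTFIL FILES=2,INCLUDE=(11,3,CH,EQ,C'D02'),OUTREC=(1,10,21,4)
```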
3. JCL REQUIREMENTS
The different DD statements required for the SORT step are as follows.
SYSIN
It should point to the SYNCSORT control statements mentioned above.
SYSOUT
It should point to a dataset or SYSOUT. This is where the SYNCSORT
messages are placed.
SORTIN
This statement should point to the input dataset(s) for the sort process.
SORTIN##
These statements should refer to the individual datasets to be merged. These
individual datasets are required to be in pre-sorted order.
SORTOUT
This statement should point to the dataset where the output must be placed.
SORTOF#
These statements should point to the individual output datasets referred to
in the OUTFIL statements.
SORTOF##
The same as SORTOF#. There must be a one-to-one correspondence between the
FILES parameter in the OUTFIL statement and the list of DD statements
specified.
SORTWK##
These statements should refer to temporary volumes with appropriate space
parameters depending on the volume of data to be processed.
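A minimal sort step using these DD statements might look like the sketch below. The step name, dataset names, unit and space values are hypothetical placeholders, PGM=SORT is assumed to be the program name under which the product is installed, and the JOB statement is omitted.

```
//SORTSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT.FILE,DISP=SHR
//SORTOUT  DD DSN=MY.OUTPUT.FILE,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(5,5),RLSE)
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(5,5))
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
```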
4. PROCESSING ORDER
The processing order of the control statements by SYNCSORT is as follows.
1. INCLUDE or OMIT condition statement processing.
2. INREC statement processing.
3. SORT, MERGE or COPY processing (including alternate collating
sequence processing for SORT and MERGE).
4. SUM statement processing.
5. OUTREC processing.
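The order listed above matters mainly for column positions. The sketch below assumes a hypothetical input record with a key in columns 1-10, a department code in columns 11-13 and a 4-byte packed decimal amount in columns 21-24. The comments note which version of the record each statement sees.

```
* 1. INCLUDE sees the original input record (department in 11-13).
  INCLUDE COND=(11,3,CH,EQ,C'D01')
* 2. INREC builds a 14-byte record: key in 1-10, amount in 11-14.
  INREC FIELDS=(1,10,21,4)
* 3. SORT uses positions in the INREC output.
  SORT FIELDS=(1,10,CH,A)
* 4. SUM also uses positions in the INREC output.
  SUM FIELDS=(11,4,PD)
* 5. OUTREC reformats the sorted, summed record for output.
  OUTREC FIELDS=(1,10,5X,11,4)
```

The statements are applied in the processing order described above, not in the order in which they happen to be coded in SYSIN.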
The processing order changes drastically if an OUTFIL statement is present in
the SYSIN of SYNCSORT. The processing is very complex if the OUTFIL statements
use different INREC statements and different INCLUDE or OMIT statements.