parseHierarchicalCsv

Processor that, like the parseCsv processor, parses flat file input with data columns separated by a delimiter.

Unlike parseCsv, this processor supports multiple record types in the same file, and enables data from multiple lines to be grouped together in a hierarchy.

Configuration

To use this processor, you must provide configuration describing the record types and their relationships. Specifically, note the following properties:

  • typeIdentifierPosition: Each row must provide a unique identifier that indicates which record type it contains. The identifier must be in the same column for all record types. This property defines the position of that identifier. Optional and defaults to 1 (the first column).

  • record: The processor must be configured with specifications for each record that it will process. The specifications are defined using the record sub-builder function. Nested records are added using nested record sub-builders.

  • outputType: The record data output format. This property is required and can be one of the following values:

    • Objects: Each record is output as a regular JSON compliant object with property names from the columns record config. When using this output type, the columns record property is required. You can also use columns in combination with the selectedColumns property to select a subset of the columns.

    • Values: Each record is output as a list of values under the values property. When using this output type, the columns record property is optional. You can still use it in combination with the selectedColumns property to select a subset of the columns and/or reorder them.

The processor also takes many more optional properties that can be used to configure the parsing of the input fields. See the documentation on each individual property for more information.

Examples

The following example demonstrates a configuration with a format for listing one or more customers with a nested address object:

parseHierarchicalCsv {
    id = "parse-records"

    typeIdentifierPosition = 1 // default
    outputType = FlatFileOutputType.Objects
    delimiter = '|'

    record {
        type = "HEADER"
        range = ONE
        columns = "type, timestamp, sender, version"
        selectedColumns = "timestamp, sender"
    }

    record {
        type = "CUST"
        range = ONE_OR_MORE
        columns = "type, id, name, rating"
        selectedColumns = "id, name"

        record {
            type = "ADDR"
            range = ONE
            columns = "type, street, city, state, zip, country"
            excludedColumns = "type, country"
        }
    }
}

Given a file with the following content:

HEADER|1683496867|Customer service|1.0
CUST|2239739|Jae Hector|A
ADDR|436 Amethyst Drive|Michigan|MI|48933|US
CUST|4365743|Annika Lyndi|C
ADDR|4188 Finwood Road|New Brunswick|NJ|08901|US

An outputType of Objects produces the following results:

{
  "HEADER": {
    "timestamp": "1683496867",
    "sender": "Customer service"
  },
  "CUST": [
    {
      "id": "2239739",
      "name": "Jae Hector",
      "ADDR": {"street": "436 Amethyst Drive", "city": "Michigan", "state": "MI", "zip": "48933"}
    },
    {
      "id": "4365743",
      "name": "Annika Lyndi",
      "ADDR": {"street": "4188 Finwood Road", "city": "New Brunswick", "state": "NJ", "zip": "08901"}
    }
  ]
}

Each record type is added under its type ID as the property name and nested at the level where it was specified in the record sub-builders.

When the range is specified as ZERO_OR_ONE or ONE, the record is added as a single object. When the range is specified as ZERO_OR_MORE or ONE_OR_MORE, the record is added as a list of objects.
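For instance, a nested record that may occur any number of times (including not at all) can be declared with ZERO_OR_MORE. The following sketch reuses the CUST record from the example above and adds a hypothetical PHONE record type (not part of the example file) purely for illustration:

```kotlin
record {
    type = "CUST"
    range = ONE_OR_MORE
    columns = "type, id, name, rating"
    selectedColumns = "id, name"

    // Hypothetical nested record: zero or more phone rows per customer.
    // With ZERO_OR_MORE, "PHONE" is always output as a list of objects,
    // which may be empty when no PHONE rows follow a CUST row.
    record {
        type = "PHONE"
        range = ZERO_OR_MORE
        columns = "type, label, number"
        excludedColumns = "type"
    }
}
```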

If the processor is configured with the Values output type, the output would look like the following:

{
  "HEADER": {
    "values": ["1683496867", "Customer service"]
  },
  "CUST": [
    {
      "values": ["2239739", "Jae Hector"],
      "ADDR": {
        "values": ["436 Amethyst Drive", "Michigan", "MI", "48933"]
      }
    },
    {
      "values": ["4365743", "Annika Lyndi"],
      "ADDR": {
        "values": ["4188 Finwood Road", "New Brunswick", "NJ", "08901"]
      }
    }
  ]
}

Input

The processor only accepts binary and char-based input data. Binary data will be converted to char-based data before processing using the characterSet defined in the inboundTransformationStrategy. If this is not set, it will default to UTF-8.
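If binary input arrives in an encoding other than UTF-8, the character set can be declared on the inboundTransformationStrategy sub-builder. The characterSet property follows the description above, but the exact sub-builder syntax shown here is a sketch:

```kotlin
parseHierarchicalCsv {
    id = "parse-latin1-records"
    outputType = FlatFileOutputType.Objects

    record {
        type = "CUST"
        range = ONE_OR_MORE
        columns = "type, id, name"
    }

    // Convert binary input using ISO-8859-1 instead of the UTF-8 default.
    inboundTransformationStrategy {
        characterSet = "ISO-8859-1"
    }
}
```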

Properties

Name Summary

record()

Adds a top-level record specification. Use sub-builder syntax (e.g., record { ... }) to configure records. At least one record specification is required. Nested records are added using nested record sub-builders.

typeIdentifierPosition

The position of the column that contains the identifier determining the record type. The identifier must be in the same position in all the rows. Optional and defaults to 1 (i.e., the first column).

outputType

The required FlatFileOutputType. The following types are supported:

  • Objects: Each record is output as a regular JSON compliant object with property names from the columns record config. When using this output type, the columns record property is required. You can also use columns in combination with the selectedColumns property to select a subset of the columns.

  • Values: Each record is output as a list of values under the values property. When using this output type, the columns record property is optional. You can still use it in combination with the selectedColumns property to select a subset of the columns and/or reorder them.

detectLineSeparators

Whether to automatically detect the line separators used in the file. Optional, but you should either set this or the lineSeparator property explicitly.

lineSeparator

The sequence of one or two characters that indicates the end of a line. For instance, macOS uses \r, Windows \r\n, and Unix \n. Optional and defaults to the line separator of the integration server. However, you should either set the line separator explicitly here or enable the detectLineSeparators property.
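For example, to reliably parse a file produced on Windows regardless of the integration server's own platform, the separator can be pinned explicitly. The ROW record type below is illustrative:

```kotlin
parseHierarchicalCsv {
    id = "parse-windows-file"
    outputType = FlatFileOutputType.Values
    lineSeparator = "\r\n" // explicit CRLF instead of the server default

    record {
        type = "ROW"
        range = ZERO_OR_MORE
    }
}
```

Alternatively, setting detectLineSeparators lets the processor work out the separator from the file itself.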

commentCharacter

Defines which character at the beginning of a line denotes that it is a comment, thereby discarding that line from processing. Optional and defaults to #.

maxNoOfCharactersPerColumn

Maximum number of characters allowed per column. If this maximum is exceeded, the entire processing fails. Optional and defaults to 4096.

maxNoOfLinesToRead

The maximum number of lines to process; any superfluous lines are ignored. Optional and defaults to processing the full file.

ignoreLeadingWhitespaces

Whether to skip any whitespace at the start of a line. Optional and defaults to true.

ignoreTrailingWhitespaces

Whether to skip any whitespace at the end of a line. Optional and defaults to true.

treatBitsAsWhitespace

Whether to treat the bit characters \0 and \1 as whitespace when skipping. Can be unset for special cases (such as certain types of database dumps) where these characters are significant. Optional and defaults to true.

defaultForNull

The default value to use when a value is null. Default is applied after any other processing of the value, such as expression processing. Optional.

defaultForEmpty

The default value to use when a value is empty. Default is applied after any other processing of the value, such as expression processing. Optional.
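As a sketch, the two defaults can be combined so that null and empty column values both fall back to placeholder strings (the placeholder values are arbitrary):

```kotlin
parseHierarchicalCsv {
    id = "parse-with-defaults"
    outputType = FlatFileOutputType.Objects
    defaultForNull = "N/A" // used when a column value is null
    defaultForEmpty = "-"  // used when a column value is empty

    record {
        type = "CUST"
        range = ONE_OR_MORE
        columns = "type, id, name, rating"
    }
}
```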

delimiter

The character used in the input to separate the data columns. Optional and defaults to a comma (,).

quote

Character used for escaping values when the column delimiter is part of the value. Optional and defaults to a double quote ("). For example, the value "a, b" is parsed as a, b and not as two columns.

quoteEscape

Character used for escaping the quote character inside an already escaped value. Optional and defaults to a double quote ("). For example, the value "" a , b "" is parsed as " a , b " with the inner quotes preserved.

quoteEscapeEscape

Defines a second escape character for cases where the quote and escape characters differ. Normally they are the same: " is escaped as "", "" as """", and so on. When they differ, a second escape character is needed to escape the quote character inside an already escaped value.

For example, if the quote is ", and quoteEscape and quoteEscapeEscape are both \, then the value \\" a , b \\" is parsed as \" a , b \". Optional and defaults to \0.
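The backslash-escape scheme described above can be sketched as the following configuration (the ROW record type is illustrative; note that \ must itself be escaped in Kotlin character literals):

```kotlin
parseHierarchicalCsv {
    id = "parse-backslash-escapes"
    outputType = FlatFileOutputType.Values
    quote = '"'
    quoteEscape = '\\'       // \" inside a quoted value yields a literal "
    quoteEscapeEscape = '\\' // \\ inside a quoted value yields a literal \

    record {
        type = "ROW"
        range = ZERO_OR_MORE
    }
}
```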

keepEscapes

Whether to retain the escape characters in the output. With the default escapes, the value ""hi""! would be output as "hi"!. If keepEscapes is enabled, however, the output would be ""hi""!. Optional and defaults to false.

skipLinesWithEmptyValues

Whether to skip lines that have only null, empty, or whitespace values. For example, if true, the line ,,, , , will be skipped. Optional and defaults to false.

name

Optional, descriptive name for the processor.

id

Required identifier of the processor, unique across all processors within the flow. Must be between 3 and 30 characters long; contain only lower and uppercase alphabetical characters (a-z and A-Z), numbers, dashes ("-"), and underscores ("_"); and start with an alphabetical character. In other words, it adheres to the regex pattern [a-zA-Z][a-zA-Z0-9_-]{2,29}.

exchangeProperties

Optional set of custom properties in simple JDK properties format that are added to the message exchange properties before processing the incoming payload. Any existing properties with the same name are replaced by the properties defined here.
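JDK properties format means one key=value pair per line. The sketch below assumes the property accepts a multi-line string in that form; the property names are illustrative only:

```kotlin
parseHierarchicalCsv {
    id = "parse-with-props"
    outputType = FlatFileOutputType.Objects

    // Illustrative keys, added to the exchange before the payload is processed.
    exchangeProperties = """
        source.system=crm
        source.region=emea
    """.trimIndent()

    record {
        type = "CUST"
        range = ONE_OR_MORE
        columns = "type, id, name"
    }
}
```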

retainPayloadOnFailure

Whether the incoming payload is available for error processing on failure. Defaults to false.

Sub-builders

Name Summary

messageLoggingStrategy

Strategy for describing how a processor’s message is logged on the server.

payloadArchivingStrategy

Strategy for archiving payloads.

inboundTransformationStrategy

Strategy that customizes the conversion of an incoming payload by a processor (e.g., string to object). Should be used when the processor’s default conversion logic cannot be used.