parseHierarchicalCsv
Processor that, like the parseCsv processor, parses flat file input with data columns separated by a delimiter.
Unlike parseCsv, this processor supports multiple record types in the same file and enables data from multiple lines to be grouped together in a hierarchy.
Configuration
To use this processor, you must provide configuration describing the record types and their relationships. Specifically, note the following properties:
- typeIdentifierPosition: Each row must provide a unique identifier that indicates which record type it contains. The identifier must be in the same column for all record types. This property defines the position of the identifier and defaults to the first column: 1.
- record: The processor must be configured with specifications for each record type that it will process. The specifications are defined using the record sub-builder function. Nested records are added using nested record sub-builders.
- outputType: The record data output format. This property is required and can be one of the following values:
  - Objects: Each record is output as a regular JSON-compliant object with property names taken from the columns record config. When using this output type, the columns record property is required. You can also use columns in combination with the selectedColumns property to select a subset of the columns.
  - Values: Each record is output as a list of values under the values property. When using this output type, the columns record property is optional. You can still use it in combination with the selectedColumns property to select a subset of the columns and/or reorder them (see the sketch after this list).
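For instance, the following sketch (borrowing the CUST record from the example further below) illustrates how selectedColumns could reorder the emitted values when the Values output type is used; the exact behavior is inferred from the property descriptions above:
record {
    type = "CUST"
    range = ONE_OR_MORE
    columns = "type, id, name, rating"
    selectedColumns = "name, id" // values are expected in this order
}
// An input row such as  CUST|2239739|Jae Hector|A  would then
// presumably produce: { "values": ["Jae Hector", "2239739"] }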
The processor also takes many more optional properties that can be used to configure the parsing of the input fields. See the documentation on each individual property for more information.
Examples
The following example demonstrates a configuration with a format for listing one or more customers with a nested address object:
parseHierarchicalCsv {
    id = "parse-records"
    typeIdentifierPosition = 1 // default
    outputType = FlatFileOutputType.Objects
    delimiter = '|'
    record {
        type = "HEADER"
        range = ONE
        columns = "type, timestamp, sender, version"
        selectedColumns = "timestamp, sender"
    }
    record {
        type = "CUST"
        range = ONE_OR_MORE
        columns = "type, id, name, rating"
        selectedColumns = "id, name"
        record {
            type = "ADDR"
            range = ONE
            columns = "type, street, city, state, zip, country"
            excludedColumns = "type, country"
        }
    }
}
Given a file with the following content:
HEADER|1683496867|Customer service|1.0
CUST|2239739|Jae Hector|A
ADDR|436 Amethyst Drive|Michigan|MI|48933|US
CUST|4365743|Annika Lyndi|C
ADDR|4188 Finwood Road|New Brunswick|NJ|08901|US
An outputType of Objects produces the following results:
{
  "HEADER": {
    "timestamp": "1683496867",
    "sender": "Customer service"
  },
  "CUST": [
    {
      "id": "2239739",
      "name": "Jae Hector",
      "ADDR": {"street": "436 Amethyst Drive", "city": "Michigan", "state": "MI", "zip": "48933"}
    },
    {
      "id": "4365743",
      "name": "Annika Lyndi",
      "ADDR": {"street": "4188 Finwood Road", "city": "New Brunswick", "state": "NJ", "zip": "08901"}
    }
  ]
}
Each record type is added using its type ID as the property name. Records are nested at the level where they were specified in the record sub-builders.
When the range is specified as ZERO_OR_ONE or ONE, the record is added as a single object. When the range is specified as ZERO_OR_MORE or ONE_OR_MORE, the record is added as a list of objects.
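For example, if the nested ADDR record in the configuration above were declared with range = ZERO_OR_MORE instead of ONE, each address would presumably be wrapped in a list even when only one address line is present:
"ADDR": [
  {"street": "436 Amethyst Drive", "city": "Michigan", "state": "MI", "zip": "48933"}
]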
If the processor is configured with the Values output type, the output would look like the following:
{
  "HEADER": {
    "values": ["1683496867", "Customer service"]
  },
  "CUST": [
    {
      "values": ["2239739", "Jae Hector"],
      "ADDR": {
        "values": ["436 Amethyst Drive", "Michigan", "MI", "48933"]
      }
    },
    {
      "values": ["4365743", "Annika Lyndi"],
      "ADDR": {
        "values": ["4188 Finwood Road", "New Brunswick", "NJ", "08901"]
      }
    }
  ]
}
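The Values output above could be produced by a configuration like the following sketch. It is identical to the earlier example apart from the outputType value; the constant FlatFileOutputType.Values is assumed here by analogy with FlatFileOutputType.Objects:
parseHierarchicalCsv {
    id = "parse-records"
    typeIdentifierPosition = 1 // default
    outputType = FlatFileOutputType.Values // assumed constant name, by analogy with Objects
    delimiter = '|'
    record {
        type = "HEADER"
        range = ONE
        columns = "type, timestamp, sender, version"
        selectedColumns = "timestamp, sender"
    }
    record {
        type = "CUST"
        range = ONE_OR_MORE
        columns = "type, id, name, rating"
        selectedColumns = "id, name"
        record {
            type = "ADDR"
            range = ONE
            columns = "type, street, city, state, zip, country"
            excludedColumns = "type, country"
        }
    }
}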
Input
The processor accepts only binary and char-based input data. Binary data is converted to char-based data before processing, using the characterSet defined in the inboundTransformationStrategy. If this is not set, it defaults to UTF-8.
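As a rough sketch only (the exact sub-builder syntax for setting the character set is not shown in this section and is assumed here), a non-UTF-8 input could be declared along these lines:
parseHierarchicalCsv {
    id = "parse-latin1-records"
    outputType = FlatFileOutputType.Objects
    delimiter = '|'
    inboundTransformationStrategy {
        characterSet = "ISO-8859-1" // assumption: property name and value format may differ
    }
    record {
        type = "HEADER"
        range = ONE
        columns = "type, timestamp, sender, version"
    }
}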
Properties
| Name | Summary | 
|---|---|
| record | Adds a top-level record specification. Use sub-builder syntax (e.g., record { ... }). |
| typeIdentifierPosition | The position of the column that contains the identifier determining the record type. The identifier must be in the same position in all rows. Optional and defaults to 1. |
| outputType | The required output format for the record data; one of Objects or Values. |
| | Whether to automatically detect the line separators used in the file. Optional, but you should set either this or the line separator sequence (next row). |
| | The sequence of one or two characters that indicates the end of a line. For instance, macOS and Linux use \n, while Windows uses \r\n. |
| | Defines which character at the beginning of a line denotes that it is a comment, thereby discarding that line from processing. Optional. |
| | Maximum number of columns a line can have. If the maximum is exceeded, the whole processing fails. This setting is a safety measure to guard against running out of memory when receiving malformed data. Optional. |
| | Maximum number of characters allowed per column. If the maximum is exceeded, the whole processing fails. Optional. |
| | Sets the maximum number of lines to process; otherwise, the full file is processed. Superfluous rows are ignored. Optional. |
| | Whether to skip any whitespace at the start of a line. Optional. |
| | Whether to skip any whitespace at the end of a line. Optional. |
| | Whether to treat the bit characters |
| | The default value to use when a value is null. The default is applied after any other processing of the value, such as expression processing. Optional. |
| | The default value to use when a value is empty. The default is applied after any other processing of the value, such as expression processing. Optional. |
| delimiter | The character used in the input to separate the data columns. Optional and defaults to a comma (,). |
| | Character used for escaping values when the column delimiter is part of the value. Optional and defaults to double quotes ("). |
| | Character used for escaping the quote character inside an already escaped value. Defaults to double quotes ("). |
| | Defines a second escape character to use if the quote and escape character are different. In normal cases, though, the quote and escape character are the same, so this property is normally not needed. |
| | Whether to retain the escape characters in the output. |
| | Whether to skip lines that have only null, empty, or whitespace values. |
| name | Optional, descriptive name for the processor. |
| id | Required identifier of the processor, unique across all processors within the flow. Must be between 3 and 30 characters long; contain only lowercase and uppercase alphabetical characters (a-z and A-Z), numbers, dashes ("-"), and underscores ("_"); and start with an alphabetical character. In other words, it adheres to the regex pattern [a-zA-Z][a-zA-Z0-9_-]{2,29}. |
| | Optional set of custom properties, in simple JDK properties format, that are added to the message exchange properties before processing the incoming payload. Any existing properties with the same name are replaced by the properties defined here. |
| | Whether the incoming payload is available for error processing on failure. |
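To illustrate the default quoting behavior described in the table above, the following sketch uses a made-up input row in which a value contains the delimiter; with the default double-quote escaping, the quoted value is expected to be parsed as a single column:
parseHierarchicalCsv {
    id = "parse-quoted"
    outputType = FlatFileOutputType.Values
    delimiter = '|'
    // With the defaults, an input row such as
    //   CUST|2239739|"Hector|Jae"|A
    // should yield Hector|Jae as the single selected name value.
    record {
        type = "CUST"
        range = ONE_OR_MORE
        columns = "type, id, name, rating"
        selectedColumns = "name"
    }
}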
Sub-builders
| Name | Summary | 
|---|---|
| | Strategy for describing how a processor’s message is logged on the server. |
| | Strategy for archiving payloads. |
| inboundTransformationStrategy | Strategy that customizes the conversion of an incoming payload by a processor (e.g., string to object). Should be used when the processor’s default conversion logic cannot be used. |