parseHierarchicalCsv
Processor that, like the `parseCsv` processor, parses flat file input with data columns separated by a delimiter.
Unlike `parseCsv`, this processor supports multiple record types in the same file and enables data from multiple lines to be grouped together in a hierarchy.
Configuration
To use this processor, you must provide configuration describing the record types and their relationships. Specifically, note the following properties:
- `typeIdentifierPosition`: Each row must provide a unique identifier that indicates which record type it contains. The identifier must be in the same column for all record types. This property defines the position of the identifier. It defaults to the first column: `1`.
- `record`: The processor must be configured with specifications for each record that it will process. The specifications are defined using the `record` sub-builder function. Nested records are added using nested `record` sub-builders.
- `outputType`: The record data output format. This property is required and can be one of the following values:
  - `Objects`: Each record is output as a regular JSON-compliant object with property names from the `columns` record config. When using this output type, the `columns` record property is required. You can also use `columns` in combination with the `selectedColumns` property to select a subset of the columns.
  - `Values`: Each record is output as a list of values under the `values` property. When using this output type, the `columns` record property is optional. You can still use it in combination with the `selectedColumns` property to select a subset of the columns and/or reorder them.
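To make the role of these properties concrete, the following Python sketch mimics how a single row might be handled: the type identifier is read from a 1-based column position, and the remaining data is shaped as either an `Objects` or a `Values` record. It is an illustration only, not the processor's implementation. The function `parse_row` and its parameter names are hypothetical, the sketch always requires `columns`, and it assumes that the order given in `selectedColumns` determines the output order.

```python
def parse_row(line, columns, selected=None, delimiter="|",
              type_identifier_position=1, output_type="Objects"):
    """Split one row, read its record type, and shape the selected data."""
    values = line.split(delimiter)
    # The identifier position is 1-based: 1 means the first column.
    record_type = values[type_identifier_position - 1]
    names = [c.strip() for c in columns.split(",")]
    # Without a selection, every named column is kept in its original order.
    chosen = [c.strip() for c in selected.split(",")] if selected else names
    by_name = dict(zip(names, values))
    if output_type == "Objects":
        data = {name: by_name[name] for name in chosen}
    else:  # "Values"
        data = {"values": [by_name[name] for name in chosen]}
    return record_type, data


row = "CUST|2239739|Jae Hector|A"
print(parse_row(row, "type, id, name, rating", "id, name"))
# ('CUST', {'id': '2239739', 'name': 'Jae Hector'})
print(parse_row(row, "type, id, name, rating", "name, id", output_type="Values"))
# ('CUST', {'values': ['Jae Hector', '2239739']})
```

The second call shows the reordering case: the same column list with a different selection order yields the values in the selected order.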
The processor also takes many more optional properties that can be used to configure the parsing of the input fields. See the documentation on each individual property for more information.
Examples
The following example demonstrates a configuration with a format for listing one or more customers with a nested address object:

    parseHierarchicalCsv {
        id = "parse-records"
        typeIdentifierPosition = 1 // default
        outputType = FlatFileOutputType.Objects
        delimiter = '|'

        record {
            type = "HEADER"
            range = ONE
            columns = "type, timestamp, sender, version"
            selectedColumns = "timestamp, sender"
        }

        record {
            type = "CUST"
            range = ONE_OR_MORE
            columns = "type, id, name, rating"
            selectedColumns = "id, name"

            record {
                type = "ADDR"
                range = ONE
                columns = "type, street, city, state, zip, country"
                excludedColumns = "type, country"
            }
        }
    }

Given a file with the following content:

    HEADER|1683496867|Customer service|1.0
    CUST|2239739|Jae Hector|A
    ADDR|436 Amethyst Drive|Michigan|MI|48933|US
    CUST|4365743|Annika Lyndi|C
    ADDR|4188 Finwood Road|New Brunswick|NJ|08901|US

An `outputType` of `Objects` produces the following results:

    {
      "HEADER": {
        "timestamp": "1683496867",
        "sender": "Customer service"
      },
      "CUST": [
        {
          "id": "2239739",
          "name": "Jae Hector",
          "ADDR": {"street": "436 Amethyst Drive", "city": "Michigan", "state": "MI", "zip": "48933"}
        },
        {
          "id": "4365743",
          "name": "Annika Lyndi",
          "ADDR": {"street": "4188 Finwood Road", "city": "New Brunswick", "state": "NJ", "zip": "08901"}
        }
      ]
    }

Each record type is added using its type ID as the property name. Records are nested at the level where they were specified in the `record` sub-builders.
When the `range` is specified as `ZERO_OR_ONE` or `ONE`, the record is added as a single object. When the `range` is specified as `ZERO_OR_MORE` or `ONE_OR_MORE`, the record is added as a list of objects.
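The effect of `range` on the output shape can be sketched in a few lines of Python. This is a hypothetical helper (`attach` is not part of the processor), and it only models the shape of the result: single-valued ranges store one object under the type ID, multi-valued ranges collect objects in a list. How the processor reacts when a range is violated is not covered here.

```python
SINGLE = {"ZERO_OR_ONE", "ONE"}
MULTIPLE = {"ZERO_OR_MORE", "ONE_OR_MORE"}


def attach(parent, record_type, record, range_name):
    """Add `record` to `parent` under its type ID, shaped by the range."""
    if range_name in SINGLE:
        # Stored as a single object. This sketch overwrites on repeats;
        # it does not model any validation of the range.
        parent[record_type] = record
    elif range_name in MULTIPLE:
        # Stored as a list, even when only one record is present.
        parent.setdefault(record_type, []).append(record)
    else:
        raise ValueError(f"unknown range: {range_name}")
    return parent


result = {}
attach(result, "HEADER", {"sender": "Customer service"}, "ONE")
attach(result, "CUST", {"id": "2239739"}, "ONE_OR_MORE")
attach(result, "CUST", {"id": "4365743"}, "ONE_OR_MORE")
print(result)
# {'HEADER': {'sender': 'Customer service'}, 'CUST': [{'id': '2239739'}, {'id': '4365743'}]}
```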
If the processor is configured with the `Values` output type, the output looks like the following:

    {
      "HEADER": {
        "values": ["1683496867", "Customer service"]
      },
      "CUST": [
        {
          "values": ["2239739", "Jae Hector"],
          "ADDR": {
            "values": ["436 Amethyst Drive", "Michigan", "MI", "48933"]
          }
        },
        {
          "values": ["4365743", "Annika Lyndi"],
          "ADDR": {
            "values": ["4188 Finwood Road", "New Brunswick", "NJ", "08901"]
          }
        }
      ]
    }

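The grouping behind both outputs can be approximated with a small stack-based parser. The Python sketch below is an assumption about how such grouping could work, not the processor's actual algorithm: `SPEC`, `SAMPLE`, and `parse` are hypothetical names, the `excludedColumns` setting is pre-resolved into a `selected` list, the sample input uses the `ADDR` type identifier from the configuration, and range violations are not checked.

```python
SPEC = {
    "HEADER": {
        "range": "ONE",
        "columns": ["type", "timestamp", "sender", "version"],
        "selected": ["timestamp", "sender"],
        "children": {},
    },
    "CUST": {
        "range": "ONE_OR_MORE",
        "columns": ["type", "id", "name", "rating"],
        "selected": ["id", "name"],
        "children": {
            "ADDR": {
                "range": "ONE",
                "columns": ["type", "street", "city", "state", "zip", "country"],
                # Equivalent to excluding "type" and "country".
                "selected": ["street", "city", "state", "zip"],
                "children": {},
            },
        },
    },
}

SAMPLE = """\
HEADER|1683496867|Customer service|1.0
CUST|2239739|Jae Hector|A
ADDR|436 Amethyst Drive|Michigan|MI|48933|US
CUST|4365743|Annika Lyndi|C
ADDR|4188 Finwood Road|New Brunswick|NJ|08901|US
"""


def parse(text, spec, output_type="Objects", delimiter="|"):
    """Group delimited rows into a hierarchy described by `spec`."""
    root = {}
    # Each stack entry: (record specs allowed at this level, parent object).
    stack = [(spec, root)]
    for line in text.splitlines():
        if not line.strip():
            continue
        values = line.split(delimiter)
        record_type = values[0]  # type identifier in the first column
        # Climb back up until some open level declares this record type.
        while stack and record_type not in stack[-1][0]:
            stack.pop()
        if not stack:
            raise ValueError(f"unexpected record type: {record_type}")
        specs, parent = stack[-1]
        rec_spec = specs[record_type]
        by_name = dict(zip(rec_spec["columns"], values))
        picked = [by_name[name] for name in rec_spec["selected"]]
        if output_type == "Objects":
            record = dict(zip(rec_spec["selected"], picked))
        else:  # "Values"
            record = {"values": picked}
        if rec_spec["range"] in ("ZERO_OR_MORE", "ONE_OR_MORE"):
            parent.setdefault(record_type, []).append(record)
        else:
            parent[record_type] = record
        # Nested records of this type attach to the record just created.
        stack.append((rec_spec["children"], record))
    return root


objects = parse(SAMPLE, SPEC)
print(objects["CUST"][1]["ADDR"]["city"])  # New Brunswick
values = parse(SAMPLE, SPEC, output_type="Values")
print(values["HEADER"])  # {'values': ['1683496867', 'Customer service']}
```

The stack keeps track of which record is currently "open", so an `ADDR` row attaches to the most recent `CUST` row, and the next `CUST` row closes that customer and starts a new list entry.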
Input
The processor only accepts binary and char-based input data. Binary data is converted to char-based data before processing, using the `characterSet` defined in the `inboundTransformationStrategy`. If this is not set, it defaults to `UTF-8`.
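The conversion rule above amounts to "decode bytes with the configured character set, otherwise UTF-8". A minimal Python sketch of that rule follows; `to_text` and its `character_set` parameter are hypothetical stand-ins for the processor's internal conversion, and the rejection of other payload types is an assumption drawn from "only accepts binary and char-based input data".

```python
def to_text(payload, character_set=None):
    """Return char-based data, decoding bytes with the given charset or UTF-8."""
    if isinstance(payload, str):
        return payload  # already char-based, nothing to convert
    if isinstance(payload, (bytes, bytearray)):
        return bytes(payload).decode(character_set or "utf-8")
    raise TypeError("only binary and char-based payloads are accepted")


utf8_bytes = "CUST|1|Zo\u00eb".encode("utf-8")
latin1_bytes = "CUST|1|Zo\u00eb".encode("latin-1")
# Both byte sequences yield the same text once the right charset is applied.
print(to_text(utf8_bytes) == to_text(latin1_bytes, "latin-1"))  # True
```

Decoding the Latin-1 bytes without naming the character set would fail or produce the wrong text, which is why the character set matters for non-UTF-8 files.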
Properties
| Name | Summary |
|---|---|
| `record` | Adds a top-level record specification. Use sub-builder syntax (e.g., |
| `typeIdentifierPosition` | The position of the column that contains the identifier determining the record type. The identifier must be in the same position in all the rows. Optional and defaults to `1`. |
| `outputType` | The required |
| | Whether to automatically detect the line separators used in the file. Optional, but you should either set this or the |
| | The sequence of one or two characters that indicates the end of a line. For instance, macOS uses |
| | Defines which character at the beginning of a line denotes that it is a comment, thereby discarding that line from processing. Optional and defaults to |
| | Maximum number of characters allowed per column. If this limit is exceeded, the whole processing fails. Optional and defaults to |
| | Sets the maximum number of lines to process. Otherwise, the full file is processed. Superfluous rows are ignored. Optional. |
| | Whether to skip any whitespace at the start of a line. Optional and defaults to |
| | Whether to skip any whitespace at the end of a line. Optional and defaults to |
| | Whether to treat the bit characters |
| | The default value to use when a value is null. The default is applied after any other processing of the value, such as expression processing. Optional. |
| | The default value to use when a value is empty. The default is applied after any other processing of the value, such as expression processing. Optional. |
| `delimiter` | The character used in the input to separate the data columns. Optional and defaults to a comma (`,`). |
| | Character used for escaping values when the column delimiter is part of the value. Optional and defaults to double quotes (`"`). |
| | Character used for escaping the quote character inside an already escaped value. Defaults to double quotes (`"`). |
| | Defines a second escape character if the quote and escape character are different. In normal cases, though, the quote and escape character are the same. Thus, For example, if the |
| | Whether to retain the escape characters in the output. With the default escapes, the value |
| | Whether to skip lines that have only null, empty, or whitespace values. For example, if |
| | Optional, descriptive name for the processor. |
| `id` | Required identifier of the processor, unique across all processors within the flow. Must be between 3 and 30 characters long; contain only lower and uppercase alphabetical characters (a-z and A-Z), numbers, dashes ("-"), and underscores ("_"); and start with an alphabetical character. In other words, it adheres to the regex pattern |
| | Optional set of custom properties in a simple jdk-format that are added to the message exchange properties before processing the incoming payload. Any existing properties with the same name will be replaced by properties defined here. |
| | Whether the incoming payload is available for error processing on failure. Defaults to |
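The quoting and quote-escaping rows above describe the common CSV convention in which a quoted value may contain the delimiter, and a doubled quote inside a quoted value stands for one literal quote. The Python sketch below demonstrates that convention with the standard library `csv` module as a stand-in. It does not use the processor or its property names, and the sample rows are made up for illustration.

```python
import csv
import io

# Two rows: one value contains the delimiter, one contains quote characters.
data = 'CUST|1|"Hector | Sons"|A\nCUST|2|"Annika ""Ann"" Lyndi"|C\n'

reader = csv.reader(
    io.StringIO(data),
    delimiter="|",      # column separator
    quotechar='"',      # wraps values that contain the delimiter
    doublequote=True,   # a doubled quote inside a quoted value is one quote
)
rows = list(reader)

print(rows[0][2])  # Hector | Sons
print(rows[1][2])  # Annika "Ann" Lyndi
```

In both rows the quote characters are consumed by the parser, so the delimiter inside the first value does not start a new column.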
Sub-builders
| Name | Summary |
|---|---|
| | Strategy for describing how a processor’s message is logged on the server. |
| | Strategy for archiving payloads. |
| `inboundTransformationStrategy` | Strategy that customizes the conversion of an incoming payload by a processor (e.g., string to object). Should be used when the processor’s default conversion logic cannot be used. |