parseCsv
Processor that parses a flat file input with data columns separated by a delimiter. This processor is highly configurable but will require almost no configuration to process regular CSV files.
Input
The processor only accepts binary and char-based input data. Binary data will be converted to char-based data before processing using the `characterSet` defined in the `inboundTransformationStrategy`. If this is not set, it will default to `UTF-8`.
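As a rough illustration of the conversion described above (not the processor's actual code), the following Python sketch decodes a binary payload with a configured character set and falls back to UTF-8 when none is given. The function name `to_text` and its parameters are hypothetical and exist only for this example.

```python
from typing import Optional, Union


def to_text(payload: Union[bytes, str], character_set: Optional[str] = None) -> str:
    """Return char-based data, decoding binary input if necessary."""
    if isinstance(payload, str):
        # Already char-based; nothing to convert.
        return payload
    # Fall back to UTF-8 when no character set is configured.
    return payload.decode(character_set or "UTF-8")


# Binary input without a configured character set is read as UTF-8.
utf8_text = to_text("name,Age\nLiv,45\n".encode("utf-8"))

# Binary input in another encoding needs the character set to be stated.
latin1_text = to_text("name\nBjørn\n".encode("iso-8859-1"), "ISO-8859-1")
```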
Output
The output will always be a JSON compliant object. The exact format is based on the `outputType` property, which can be one of the following two values:
- `Objects`: Returns a list of objects with the column names as keys. Example: `[{"name": "Liv", "Age": 45}, {"name": "Ben", "Age": 20}]`. Useful if you need to process the data further using processors like `map` and `dbStatement`, or if you need to deliver the result as a JSON file with named keys.
- `Values`: Returns a list of lists with only the values, no column names. Example: `[["Liv", 45], ["Ben", 20]]`.
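The two output shapes can be mimicked with Python's standard `csv` module. This is only a sketch of the shapes, not the processor itself: the `parse` and `coerce` helpers are hypothetical, and turning digit-only strings into numbers is an assumption made here so that the result matches the examples above.

```python
import csv
import io
import json

SAMPLE = "name,Age\nLiv,45\nBen,20\n"


def coerce(value: str):
    """Turn digit-only strings into integers; leave everything else as text."""
    return int(value) if value.isdigit() else value


def parse(text: str, output_type: str = "Objects"):
    """Parse CSV text into either a list of objects or a list of value lists."""
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[0]
    data = [[coerce(value) for value in row] for row in rows[1:]]
    if output_type == "Objects":
        # One dictionary per row, keyed by the header names.
        return [dict(zip(header, row)) for row in data]
    if output_type == "Values":
        # Only the values, in column order, without column names.
        return data
    raise ValueError(f"unknown output type: {output_type}")


objects_json = json.dumps(parse(SAMPLE, "Objects"))
values_json = json.dumps(parse(SAMPLE, "Values"))
```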
Naming and Selecting Columns
If you want to use other names for the columns than the ones in the header row, or there isn't a header row, you can name the columns using the `columns` property. The order of the names must reflect the order of the columns.
Use the `selectedColumns` property to define which columns to keep. You can also reorder the columns using the same property: the order in which the columns are listed is the order in which they are output. The order is only significant when the output type is `Values` (i.e., a list of lists of values). When the output type is `Objects`, the output is a list of maps where the order of the keys is not predictable.
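The effect of naming and then selecting columns can be sketched in Python as follows. The `parse_values` helper and its parameters are hypothetical and only loosely mirror the properties described above. The sketch returns a list of lists, since that is the output type where column order matters.

```python
import csv
import io

SAMPLE = "first,years,city\nLiv,45,Oslo\nBen,20,Bergen\n"


def parse_values(text, columns=None, selected_columns=None, has_header=True):
    """Return rows as lists, optionally renaming and selecting columns."""
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[0] if has_header else None
    data = rows[1:] if has_header else rows
    # Explicit names replace the header names and must follow the file's column order.
    names = columns or header
    if names is None:
        raise ValueError("column names are needed when there is no header row")
    # The selection decides both which columns are kept and their output order.
    keep = selected_columns or names
    indexes = [names.index(name) for name in keep]
    return [[row[i] for i in indexes] for row in data]


reordered = parse_values(
    SAMPLE,
    columns=["name", "age", "city"],
    selected_columns=["city", "name"],
)
```

Here `reordered` holds the `city` value before the `name` value for each row, and the `age` column is dropped.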
Properties
Name | Summary |
---|---|
|
Required |
|
If |
|
Comma separated list of column names for the data. Escape with a backslash if you need a comma as a part of the name (e.g., |
|
Optional, comma-separated list of columns to include in the dataset. Cannot be used if excludedColumns is defined. If not set, all columns will be included, unless excludedColumns is defined. Escape with a backslash if you need a comma as a part of the name (e.g., |
|
Optional, comma-separated list of columns not to include in the dataset. Cannot be used if selectedColumns is defined. If not set, all columns will be included, unless selectedColumns is defined. Escape with a backslash if you need a comma as a part of the name (e.g., |
|
Automatically detect the line separators used in the file. Optional, but you should either set this or the |
|
The sequence of one or two characters that indicates the end of a line. For instance, macOS uses |
|
Defines which character at the beginning of a line denotes that it is a comment, thereby discarding that line from processing. Optional and defaults to |
|
Maximum number of characters allowed per column. If this limit is exceeded, the whole processing will fail. Optional and defaults to |
|
Sets the maximum number of lines to process. Otherwise, the full file is processed. Superfluous rows will be ignored. Optional. |
|
Whether to skip lines with no data. Optional and defaults to |
|
Whether to skip any whitespace at the start of a line. Optional and defaults to |
|
Whether to skip any whitespace at the end of a line. Optional and defaults to |
|
Whether to treat the bit characters |
|
The default value to use when a value is null. Default is applied after any other processing of the value, such as expression processing. Optional. |
|
The default value to use when a value is empty. Default is applied after any other processing of the value, such as expression processing. Optional. |
|
Whether to perform a best effort to detect line separators, column separators, quotes, and quote escapes automatically. Works for many formats and can, for instance, be practical in cases where a processor has to process files with multiple formats. Make sure to properly test that this option works for you before using it in production. Optional and defaults to |
|
The character used in the input to separate the data columns. Optional and defaults to a comma ( |
|
Character used for escaping values when the column delimiter is part of the value. Optional and defaults to double quotes ( |
|
Character used for escaping the quote character inside an already escaped value. Defaults to double quotes ( |
|
Defines a second escape character if the quote and escape character are different. In normal cases, though, the quote and escape character are the same. For example, if the |
|
Whether to retain the escape characters in the output. With the default escapes, the value |
|
Whether to skip lines that have only null, empty, or whitespace values. For example, if |
|
Optional, descriptive name for the processor. |
|
Required identifier of the processor, unique across all processors within the flow. Must be between 3 and 30 characters long; contain only lower and uppercase alphabetical characters (a-z and A-Z), numbers, dashes ("-"), and underscores ("_"); and start with an alphabetical character. In other words, it adheres to the regex pattern |
|
Optional set of custom properties in a simple jdk-format that are added to the message exchange properties before processing the incoming payload. Any existing properties with the same name will be replaced by properties defined here. |
|
Whether the incoming payload is available for error processing on failure. Defaults to |
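The delimiter, quote, and quote-escape behaviour described in the table can be tried out with Python's standard `csv` module, whose defaults happen to match the ones stated above (comma delimiter, double quotes for quoting, and a doubled quote as the escape). This is only an illustration of the concepts. The parameter names below belong to Python's `csv` module, not to this processor.

```python
import csv
import io

# A quoted value may contain the delimiter, and a doubled quote stands for a literal quote.
raw_default = 'name,remark\nLiv,"Hello, world"\nBen,"She said ""hi"""\n'
default_rows = list(
    csv.reader(io.StringIO(raw_default), delimiter=",", quotechar='"', doublequote=True)
)

# The same idea with a semicolon as delimiter and a single quote as quote character.
raw_custom = "name;remark\nLiv;'a;b'\n"
custom_rows = list(csv.reader(io.StringIO(raw_custom), delimiter=";", quotechar="'"))
```

In both cases the quote characters are removed from the parsed values, and the delimiter inside a quoted value is kept as ordinary text.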
Sub-builders
Name | Summary |
---|---|
Strategy for describing how a processor’s message is logged on the server. |
|
Strategy for archiving payloads. |
|
Strategy that customizes the conversion of an incoming payload by a processor (e.g., string to object). Should be used when the processor’s default conversion logic cannot be used. |