parseCsv

Processor that parses a flat file input with data columns separated by a delimiter. This processor is highly configurable but will require almost no configuration to process regular CSV files.

Input

The processor accepts only binary and character-based input data. Binary data is converted to character-based data before processing, using the characterSet defined in the inboundTransformationStrategy. If no characterSet is defined, UTF-8 is used.

Output

The output will always be a JSON-compliant object. The exact format is determined by the outputType property, which can be one of the following two values:

  1. Objects: Returns a list of objects with the column names as keys.

    Example: [{"name": "Liv", "Age": 45}, {"name": "Ben", "Age": 20}].

    Useful if you need to process the data further using processors like map and dbStatement, or if you need to deliver the result as a JSON file with named keys.

  2. Values: Returns a list of lists with only the values, no column names.

    Example: [["Liv", 45], ["Ben", 20]].
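
The two output shapes can be illustrated with a minimal Python sketch. It uses Python's standard csv module rather than the processor itself, and the helper name parse_rows is hypothetical. Values stay strings here because the sketch does no type conversion.

```python
import csv
import io


def parse_rows(text, output_type):
    """Hypothetical helper that mimics the two output shapes (not the processor's code)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # the first row holds the column names
    rows = list(reader)
    if output_type == "Objects":
        # One dictionary per row, keyed by column name.
        return [dict(zip(header, row)) for row in rows]
    if output_type == "Values":
        # One list per row, values only.
        return rows
    raise ValueError(f"unsupported output type: {output_type}")


sample = "name,Age\nLiv,45\nBen,20\n"
objects = parse_rows(sample, "Objects")
values = parse_rows(sample, "Values")
```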

Naming and Selecting Columns

If you want to use other names for the columns than the ones in the header row, or there isn’t a header row, you can name the columns using the columns property. The order of the names must reflect the order of the columns.

Use the selectedColumns property to define which columns to keep. You can also reorder the columns using the same property: the order in which the columns are listed is the order in which they are output. The order is only significant when the output type is Values (i.e., a list of lists of values). When the output type is Objects, the output is a list of maps in which the order of the keys is not predictable.
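
As a rough illustration of naming and selecting, the following Python sketch applies a list of column names to raw rows and then keeps a reordered subset. The function select_columns and the sample names are hypothetical and only model the behaviour described above.

```python
def select_columns(rows, columns, selected):
    """Keep only the selected columns, in the order they are listed."""
    indexes = [columns.index(name) for name in selected]
    return [[row[i] for i in indexes] for row in rows]


# Hypothetical names, as they might be supplied through the columns property.
columns = ["name", "age", "city"]
rows = [["Liv", "45", "Oslo"], ["Ben", "20", "Bergen"]]

# Values-style output: the listed order is the output order.
as_values = select_columns(rows, columns, ["city", "name"])

# Objects-style output: same data, but key order should not be relied upon.
as_objects = [dict(zip(["city", "name"], row)) for row in as_values]
```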

Properties

Name Summary

outputType

Required FlatFileOutputType that must be set to either Objects or Values. If Objects, the output is a list of objects with the header names as keys. For example: [{"name": "Liv", "Age": 45}, {"name": "Ben", "Age": 20}]. If Values, the output is a list of value-lists. For example: [["Liv", 45], ["Ben", 20]].

hasHeaderRow

If true, the first row will not be included in the dataset but used to extract the column names of the data. Optional and defaults to false. However, either this or columns must be set when the outputType is Objects.

columns

Comma-separated list of column names for the data. Escape with a backslash if you need a comma as a part of the name (e.g., \,). If both this and hasHeaderRow are set, then the names defined here take precedence over the names in the file header. Optional, but either this or hasHeaderRow must be set when the outputType is Objects.
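
The backslash escaping of commas can be sketched in Python as follows. This is an illustration only: the function split_escaped is hypothetical, and trimming the whitespace around each name is an assumption, not documented behaviour.

```python
def split_escaped(spec, separator=",", escape="\\"):
    """Split on the separator, treating escape + character as a literal character."""
    names, current, i = [], [], 0
    while i < len(spec):
        ch = spec[i]
        if ch == escape and i + 1 < len(spec):
            # Take the next character literally, even if it is the separator.
            current.append(spec[i + 1])
            i += 2
            continue
        if ch == separator:
            names.append("".join(current).strip())  # assumption: names are trimmed
            current = []
        else:
            current.append(ch)
        i += 1
    names.append("".join(current).strip())
    return names


names = split_escaped(r"id, last\, first, age")
```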

selectedColumns

Optional, comma-separated list of columns to include in the dataset. Cannot be used if excludedColumns is defined. If not set, all columns will be included, unless excludedColumns is defined. Escape with a backslash if you need a comma as a part of the name (e.g., \,). Non-selected columns will be discarded. Selected columns can also be rearranged by specifying the preferred order here. For example, given columns "a, b, c, d", specifying "d, b" would only include those two columns, in that order, in the output.

excludedColumns

Optional, comma-separated list of columns not to include in the dataset. Cannot be used if selectedColumns is defined. If not set, all columns will be included, unless selectedColumns is defined. Escape with a backslash if you need a comma as a part of the name (e.g., \,). Excluded columns will be discarded. For example, given columns "a, b, c, d", specifying "b, d" would filter out those columns, and only include columns a and c.
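
The interplay between selectedColumns and excludedColumns can be modelled with a short Python sketch. The function resolve_columns is hypothetical, and raising an error when both are given is an assumption about how the "cannot be used" rule might be enforced.

```python
def resolve_columns(columns, selected=None, excluded=None):
    """Return the column names that end up in the output, in output order."""
    if selected and excluded:
        # Assumption: combining the two properties is rejected outright.
        raise ValueError("selectedColumns and excludedColumns cannot be combined")
    if selected:
        return list(selected)  # keeps the order given by the caller
    if excluded:
        return [name for name in columns if name not in excluded]
    return list(columns)  # nothing set: all columns are included


columns = ["a", "b", "c", "d"]
only_selected = resolve_columns(columns, selected=["d", "b"])
without_excluded = resolve_columns(columns, excluded=["b", "d"])
```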

detectLineSeparators

Automatically detect the line separators used in the file. Optional, but you should either set this or the lineSeparator property explicitly.

lineSeparator

The sequence of one or two characters that indicates the end of a line. For instance, classic Mac OS uses \r, Windows \r\n, and Unix-like systems (including modern macOS) \n. Optional and defaults to the line separator of the integration server. However, you should either set the line separator explicitly here or enable the detectLineSeparators property.
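
One simple way to detect a line separator is to look at the first line break in the text. The Python sketch below shows that idea; it is not necessarily how detectLineSeparators works internally, and the function name is hypothetical.

```python
def detect_line_separator(text, default="\n"):
    """Return the first line separator found in the text, or the default if none."""
    for i, ch in enumerate(text):
        if ch == "\r":
            # "\r\n" (Windows) if a line feed follows, otherwise a bare "\r".
            return "\r\n" if text[i + 1 : i + 2] == "\n" else "\r"
        if ch == "\n":
            return "\n"
    return default


windows = detect_line_separator("a,b\r\nc,d\r\n")
unix = detect_line_separator("a,b\nc,d\n")
```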

commentCharacter

Defines which character at the beginning of a line denotes that it is a comment, thereby discarding that line from processing. Optional and defaults to #.

maxNoOfCharactersPerColumn

Maximum number of characters allowed per column. If this limit is exceeded, the whole processing will fail. Optional and defaults to 4096.

maxNoOfLinesToRead

Maximum number of lines to process. Rows beyond this limit are ignored. Optional; if not set, the full file is processed.

skipEmptyLines

Whether to skip lines with no data. Optional and defaults to true.

ignoreLeadingWhitespaces

Whether to skip any whitespace at the start of a line. Optional and defaults to true.

ignoreTrailingWhitespaces

Whether to skip any whitespace at the end of a line. Optional and defaults to true.

treatBitsAsWhitespace

Whether to treat the bit characters \0 and \1 as whitespace when skipping. Can be unset for special cases (such as certain types of database dumps) where these characters are significant. Optional and defaults to true.

defaultForNull

The default value to use when a value is null. Default is applied after any other processing of the value, such as expression processing. Optional.

defaultForEmpty

The default value to use when a value is empty. Default is applied after any other processing of the value, such as expression processing. Optional.
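
The two defaults can be pictured with a small Python sketch. It assumes that a null value is represented as None and an empty value as an empty string; how the processor itself distinguishes the two is not stated here, and the function apply_defaults is hypothetical.

```python
def apply_defaults(row, default_for_null=None, default_for_empty=None):
    """Replace null and empty values with their configured defaults, if any."""
    result = []
    for value in row:
        if value is None and default_for_null is not None:
            value = default_for_null
        elif value == "" and default_for_empty is not None:
            value = default_for_empty
        result.append(value)
    return result


filled = apply_defaults(["Liv", None, ""], default_for_null="N/A", default_for_empty="-")
untouched = apply_defaults(["Liv", None, ""])
```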

detectFormatAutomatically

Whether to perform a best effort to detect line separators, column separators, quotes, and quote escapes automatically. Works for many formats and can, for instance, be practical in cases where a processor has to process files with multiple formats. Make sure to properly test that this option works for you before using it in production. Optional and defaults to false.

delimiter

The character used in the input to separate the data columns. Optional and defaults to a comma (,).

quote

Character used for escaping values when the column delimiter is part of the value. Optional and defaults to double quotes ("). For example, the value "a, b" is parsed as a, b and not as two columns.

quoteEscape

Character used for escaping the quote character inside an already escaped value. Defaults to double quotes ("). For example, the value "" a , b "" is parsed as " a , b " with the inner quotes preserved.
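
The default delimiter, quote, and quoteEscape behaviour matches common CSV conventions, which can be tried out with Python's standard csv module. This is an analogy rather than the processor's own parser; parse_line is a hypothetical helper.

```python
import csv


def parse_line(line, delimiter=",", quote='"'):
    """Parse one line; a doubled quote inside a quoted value is a literal quote."""
    reader = csv.reader([line], delimiter=delimiter, quotechar=quote, doublequote=True)
    return next(reader)


# The delimiter inside a quoted value does not split the column.
quoted = parse_line('"a, b",c')

# Doubled quotes inside a quoted value are kept as single literal quotes.
escaped = parse_line('""" a , b """,c')

# Other delimiter and quote characters work the same way.
semicolons = parse_line("x;'y;z'", delimiter=";", quote="'")
```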

quoteEscapeEscape

Defines a second escape character if the quote and escape character are different. In normal cases, though, the quote and escape character are the same. Thus, " is escaped as "", "" escaped as """", etc. Otherwise, you will need a second escape character to escape the quote character inside an already escaped value.

For example, if the quote is ", and quoteEscape and quoteEscapeEscape are both \, then the value \\" a , b \\" is parsed as \" a , b \". Optional and defaults to \0.

keepEscapes

Whether to retain the escape characters in the output. With the default escapes, the value ""hi""! would be output as "hi"!. If keepEscapes is enabled, however, the output would be ""hi""!. Optional and defaults to false.

skipLinesWithEmptyValues

Whether to skip lines that have only null, empty, or whitespace values. For example, if true, the line ,,, , , will be skipped. Optional and defaults to false.
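
The rule for skipLinesWithEmptyValues can be sketched in Python as a simple predicate. The function has_data is hypothetical, and representing null as None is an assumption made for this sketch.

```python
def has_data(row):
    """True if at least one value is neither null, empty, nor whitespace only."""
    return any(value is not None and value.strip() for value in row)


rows = [
    ["Liv", "45"],
    ",,, , ,".split(","),  # the example line from the description above
    [None, ""],
]
kept = [row for row in rows if has_data(row)]
```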

name

Optional, descriptive name for the processor.

id

Required identifier of the processor, unique across all processors within the flow. Must be between 3 and 30 characters long; contain only lower and uppercase alphabetical characters (a-z and A-Z), numbers, dashes ("-"), and underscores ("_"); and start with an alphabetical character. In other words, it adheres to the regex pattern [a-zA-Z][a-zA-Z0-9_-]{2,29}.
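
The id rule can be checked ahead of time with the regex pattern given above. A minimal Python sketch, with the hypothetical helper is_valid_id:

```python
import re

# Pattern taken from the id description: a letter followed by 2 to 29
# letters, digits, underscores, or dashes.
ID_PATTERN = re.compile(r"[a-zA-Z][a-zA-Z0-9_-]{2,29}")


def is_valid_id(candidate):
    """True if the whole candidate string matches the id pattern."""
    return ID_PATTERN.fullmatch(candidate) is not None


checks = {
    "parse-csv_1": is_valid_id("parse-csv_1"),
    "1csv": is_valid_id("1csv"),
    "ab": is_valid_id("ab"),
}
```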

exchangeProperties

Optional set of custom properties in a simple jdk-format that are added to the message exchange properties before processing the incoming payload. Any existing properties with the same name will be replaced by properties defined here.

retainPayloadOnFailure

Whether the incoming payload is available for error processing on failure. Defaults to false.

Sub-builders

Name Summary

messageLoggingStrategy

Strategy for describing how a processor’s message is logged on the server.

payloadArchivingStrategy

Strategy for archiving payloads.

inboundTransformationStrategy

Strategy that customizes the conversion of an incoming payload by a processor (e.g., string to object). Should be used when the processor’s default conversion logic cannot be used.