parseExcel
Processor that parses an Excel spreadsheet file (i.e., XLSX and XLS files). This processor is highly configurable but will require almost no configuration to process regular Excel files.
Input
The processor only accepts binary input data in either MS Office Open XML format (2007+, typically suffixed with xlsx
) or the legacy Excel 97-2003 format (BIFF, typically suffixed with xls
).
Note that merged cells are parsed as one cell, so use caution if you use merged cells. If you have to use merged cells, make sure that the same columns are merged in each row.
Output
The output will always be a JSON compliant object. The exact format is based on the outputType
property, which can be one of the following two values:
-
Objects
: Returns a list of objects with the column names as keys.Example:
[{"name": "Liv", "Age": 45}, {"name": "Ben", "Age": 20}]
.Useful if you need to process the data further using processors like
map
anddbStatement
, or if you need to deliver the result as a JSON file with named keys. -
Values
: Returns a list of lists with only the values, no column names.Example:
[["Liv", 45], ["Ben", 20]]
.
Selecting Data Range
Use the worksheetName
property to select a worksheet to use. If omitted, the first worksheet is used. Within the selected worksheet, use cellRange
to define the range to process. Cell range can be a named range (e.g., MyRangeName), closed range (C7:S115), column range (A:F), or open ended range (A1:S). Note that row ranges (e.g., 10:20) are not supported, despite being a legal range in Excel.
Naming and Selecting Columns
If you want to use other names for the columns than the ones in the header row, or there isn’t a header row, you can name the columns using the columns
property. The order of the names must reflect the order of the columns.
Use the selectedColumns
property to define which columns to keep. You can also reorder the columns using the same property. The order that the columns is listed in is the order in which they will be output. The order is only significant when the output type is Values
(i.e., a list of list of values). When the output type is Objects
, the output is a list of maps where the order of the keys is not predictable.
Data Types
How to parse each cell into a Kotlin data type is detected for each cell. If you want to enforce a certain data type per column, use the columnTypes
property.
In practice, Excel uses mostly two data types: numeric and text. Dates are simply numbers plus a formatting pattern. To avoid issues with locale-based differences of date formatting, the parseExcel
processor outputs all dates as type String
in ISO-8601 compliant format without the timezone. For example: "2011-12-03T10:15:30"
. This is, of course, different from the format displayed in Excel.
Numbers are returned as type Double
unless type Long
is specified using the columnTypes
property.
Please note that when using MS Excel, the Format Cell dialogue can show that a cell is formatted as a "Number" even though it’s actually text.
Formulas are ignored and their cached value returned with the respective data type.
Anything else is returned as a String
.
Properties
Name | Summary |
---|---|
|
Required |
|
If |
|
Comma separated list of column names for the data. Escape with a backslash if you need a comma as a part of the name (e.g., |
|
Optional, comma separated list of columns to include in the dataset. Can not be used if excludedColumns is defined. If not set, all columns will be included, unless excludedColumns is defined. Escape with a backslash if you need a comma as a part of the name (e.g., |
|
Optional, comma separated list of columns not to include in the dataset. Can not be used if selectedColumns is defined. If not set, all columns will be included, unless selectedColumns is defined. Escape with a backslash if you need a comma as a part of the name (e.g., |
|
Sets the maximum number of lines to process (including the header, if it’s part of the range). Otherwise, the full range (or full file, if range is open) is processed. Superfluous rows will be ignored. Optional. |
|
Whether to skip lines with no data. Optional and defaults to |
|
The default value to use when a value is null. Default is applied after any other processing of the value, such as expression processing. Optional. |
|
Worksheet name, case-insensitive. Optional. If not set, the first sheet is used. |
|
Optional, comma separated list of column names and their enforced data types. Supported data types include Note that Excel uses only one type for numbers, regardless if they are integers or floating point numbers. Therefore, all numbers are returned as the Kotlin type Another caveat is that Excel doesn’t use |
|
Cell range. The following range types are supported:
The range must be contiguous. It is not possible to combine multiple ranges, even if their union would be contiguous. Note that open-ended ranges (e.g., If If |
|
Optional, descriptive name for the processor. |
|
Required identifier of the processor, unique across all processors within the flow. Must be between 3 and 30 characters long; contain only lower and uppercase alphabetical characters (a-z and A-Z), numbers, dashes ("-"), and underscores ("_"); and start with an alphabetical character. In other words, it adheres to the regex pattern |
|
Optional set of custom properties in a simple jdk-format, that are added to the message exchange properties before processing the incoming payload. Any existing properties with the same name will be replaced by properties defined here. |
|
Whether the incoming payload is available for error processing on failure. Defaults to |
Sub-builders
Name | Summary |
---|---|
Strategy for describing how a processor’s message is logged on the server. |
|
Strategy for archiving payloads. |
|
Strategy that customizes the conversion of an incoming payload by a processor (e.g., string to object). Should be used when the processor’s default conversion logic cannot be used. |