writeToArrow
Processor for writing data in the Arrow format using the Arrow Flight RPC protocol.
This processor requires a schema to be provided via a schemaId reference. The schema must be formatted as a JSON document, as detailed in the Arrow Schema section.
For reference, see the Arrow data format Version 1.4 and its schema specification.
This processor consumes data in JSON Compliant format (JSON objects, and JSON lists for batch writes) or payloads that can be converted to JSON Compliant format. When necessary, the processor converts the data to the respective Arrow data types.
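Arrow stores data column by column, so a JSON object (one record) or a JSON list (a batch of records) has to be pivoted into one array per schema field before it is written. The Python sketch below illustrates that step under stated assumptions: the helper name to_columns and its field_names parameter are hypothetical, and filling a missing field with None (an Arrow null) is an assumption rather than documented processor behavior.

```python
import json
from typing import Any, Dict, List, Sequence, Union

Payload = Union[str, bytes, Dict[str, Any], List[Dict[str, Any]]]


def to_columns(payload: Payload, field_names: Sequence[str]) -> Dict[str, List[Any]]:
    """Pivot a JSON object, or a JSON list of objects (a batch), into columns.

    Hypothetical helper: a missing field becomes None, which would map to an
    Arrow null in the corresponding column.
    """
    # Payloads that arrive as JSON text are parsed first.
    if isinstance(payload, (str, bytes)):
        payload = json.loads(payload)
    # A single JSON object is treated as a batch of one record.
    records = payload if isinstance(payload, list) else [payload]
    columns: Dict[str, List[Any]] = {name: [] for name in field_names}
    for record in records:
        if not isinstance(record, dict):
            raise ValueError(f"expected a JSON object, got {type(record).__name__}")
        for name in field_names:
            columns[name].append(record.get(name))
    return columns


print(to_columns('[{"id": 1, "name": "a"}, {"id": 2}]', ["id", "name"]))
# {'id': [1, 2], 'name': ['a', None]}
```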
Data conversion has the following specifics:
- All conversions to byte arrays are done via intermediate conversion to strings.
- During conversion to numeric data types, overflows are not allowed. A value of a larger data type can be converted to a smaller type only if the smaller type can hold exactly the same numeric value.
- Rounding of floating-point numbers during conversion is not permitted.
- When converting to Bit, a numeric representation of 0 (integer or string) or false unsets the bit. Any other value (a number other than 0, or true) sets the bit.
- To set the Arrow Date type, the incoming value must be a string or an integer. A string representation of a date should follow the "yyyy-MM-dd" format (the "full-date" format in RFC 3339: Date and Time on the Internet: Timestamps).
- To set the Arrow Time type, the incoming value must be a string or an integer. A string representation of a time should follow the "HH:mm:ss.n" format (the "partial-time" format in RFC 3339: Date and Time on the Internet: Timestamps).
- To set the Arrow Timestamp type, the incoming value must be a string or an integer. A string representation of a timestamp should follow the "yyyy-MM-dd'T'HH:mm:ss.n" pattern, which combines the "full-date" and "partial-time" formats from RFC 3339: Date and Time on the Internet: Timestamps. The Unix epoch is used as the fixed epoch for Arrow Timestamp. Time zone offsets and time zone suffixes in the incoming payload are not supported; the time zone is configured in the schema. If the relevant schema field doesn’t specify a time zone, UTC is used and the value is calculated accordingly. If a time zone is specified, it is taken into account when calculating the timestamp value.
- To set the Arrow Duration type, the incoming value must be a string or an integer. A string representation of a duration should follow the ISO-8601-based "PnDTnHnMn.nS" pattern. Period components other than days (years, months, weeks) are not allowed because they cannot be converted without approximation (e.g., a month has no fixed number of days).
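The numeric and Bit rules above can be illustrated with a small Python sketch. The helper names to_arrow_int and to_arrow_bit are hypothetical, the ranges are the standard signed and unsigned ranges of the integer types listed in this document, and treating a non-numeric string as "any other value" for Bit is an assumption. Exact decimal arithmetic is used so that 3.0 converts to an integer while 1.5 is rejected instead of rounded.

```python
from decimal import Decimal, InvalidOperation

# Value ranges of the Arrow integer types listed in this document.
INT_RANGES = {
    "TinyInt": (-(2**7), 2**7 - 1),
    "SmallInt": (-(2**15), 2**15 - 1),
    "Int": (-(2**31), 2**31 - 1),
    "BigInt": (-(2**63), 2**63 - 1),
    "UInt1": (0, 2**8 - 1),
    "UInt2": (0, 2**16 - 1),
    "UInt4": (0, 2**32 - 1),
    "UInt8": (0, 2**64 - 1),
}


def to_arrow_int(value, arrow_type):
    """Convert a JSON number or numeric string exactly; never round or overflow."""
    if isinstance(value, bool) or not isinstance(value, (int, float, str)):
        raise ValueError(f"not a numeric value: {value!r}")
    try:
        # str() first so a float is taken at its shortest round-trip repr (1.5 stays 1.5)
        exact = Decimal(str(value).strip())
    except InvalidOperation:
        raise ValueError(f"not a numeric value: {value!r}") from None
    if not exact.is_finite():
        raise ValueError(f"{value!r} is not finite")
    if exact != exact.to_integral_value():
        raise ValueError(f"{value!r} would need rounding")
    result = int(exact)
    low, high = INT_RANGES[arrow_type]
    if not low <= result <= high:
        raise ValueError(f"{value!r} overflows {arrow_type}")
    return result


def to_arrow_bit(value):
    """0 (integer or string) and false unset the bit; any other value sets it."""
    if isinstance(value, bool):
        return 1 if value else 0
    if isinstance(value, (int, float)):
        return 0 if value == 0 else 1
    if isinstance(value, str):
        try:
            return 0 if Decimal(value.strip()) == 0 else 1
        except InvalidOperation:
            return 1  # assumption: a non-numeric string counts as "any other value"
    return 1
```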
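A minimal Python sketch of the Date, Time, and Timestamp rules follows. It assumes that integer input is already expressed in the target unit and returns days (Date) and nanoseconds (Time, Timestamp) for illustration; the real unit depends on the Arrow type chosen in the schema. The function names are hypothetical, and the tz argument stands in for the time zone configured in the schema field.

```python
import re
from datetime import date, datetime, timedelta, timezone, tzinfo

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
_DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
_TIME_RE = re.compile(r"([0-9]{2}):([0-9]{2}):([0-9]{2})(?:\.([0-9]{1,9}))?")


def _parse_time(text):
    """Split "HH:mm:ss.n" into hours, minutes, seconds and nanoseconds."""
    match = _TIME_RE.fullmatch(text) if isinstance(text, str) else None
    if not match:
        raise ValueError(f"expected HH:mm:ss.n, got {text!r}")
    hours, minutes, seconds = (int(g) for g in match.group(1, 2, 3))
    if hours > 23 or minutes > 59 or seconds > 59:
        raise ValueError(f"time out of range: {text!r}")
    # Right-pad the fraction to 9 digits: ".5" means 500_000_000 ns.
    nanos = int((match.group(4) or "").ljust(9, "0"))
    return hours, minutes, seconds, nanos


def date_to_epoch_days(value):
    """Date value as days since 1970-01-01."""
    if isinstance(value, int) and not isinstance(value, bool):
        return value  # assumption: integers are already days since the epoch
    if not isinstance(value, str) or not _DATE_RE.fullmatch(value):
        raise ValueError(f"expected yyyy-MM-dd, got {value!r}")
    return (date.fromisoformat(value) - date(1970, 1, 1)).days


def time_to_nanos(value):
    """Time of day as nanoseconds since midnight."""
    if isinstance(value, int) and not isinstance(value, bool):
        return value  # assumption: integers are already in the target unit
    h, m, s, nanos = _parse_time(value)
    return ((h * 60 + m) * 60 + s) * 1_000_000_000 + nanos


def timestamp_to_epoch_nanos(value, tz: tzinfo = timezone.utc):
    """Nanoseconds since the Unix epoch; tz comes from the schema, never the payload."""
    if isinstance(value, int) and not isinstance(value, bool):
        return value  # assumption: integers are already in the target unit
    day_text, sep, time_text = value.partition("T") if isinstance(value, str) else ("", "", "")
    if not sep or not _DATE_RE.fullmatch(day_text):
        raise ValueError(f"expected yyyy-MM-dd'T'HH:mm:ss.n, got {value!r}")
    day = date.fromisoformat(day_text)
    h, m, s, nanos = _parse_time(time_text)  # rejects "Z" and "+01:00" suffixes
    local = datetime(day.year, day.month, day.day, h, m, s, tzinfo=tz)
    seconds = (local - EPOCH) // timedelta(seconds=1)
    return seconds * 1_000_000_000 + nanos
```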
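The Duration rule can be sketched in Python as follows. Only days, hours, minutes, and seconds (with up to nine fractional digits) are accepted, so every accepted value converts to an exact number of nanoseconds. The function name duration_to_nanos, the nanosecond result unit, and the integer pass-through are assumptions for illustration; this sketch handles non-negative durations only.

```python
import re

# Days before "T"; hours, minutes and (fractional) seconds after it.
_DURATION_RE = re.compile(
    r"P(?:([0-9]+)D)?"
    r"(?:T(?=[0-9])(?:([0-9]+)H)?(?:([0-9]+)M)?(?:([0-9]+)(?:\.([0-9]{1,9}))?S)?)?"
)


def duration_to_nanos(value):
    """Convert a "PnDTnHnMn.nS" duration to nanoseconds without approximation."""
    if isinstance(value, int) and not isinstance(value, bool):
        return value  # assumption: integers are already in the field's unit
    if not isinstance(value, str):
        raise ValueError(f"expected a string or integer, got {value!r}")
    period_part = value.partition("T")[0]
    if any(unit in period_part for unit in "YMW"):
        # The document disallows years, months and weeks in the period part.
        raise ValueError(f"only days are allowed before 'T': {value!r}")
    match = _DURATION_RE.fullmatch(value)
    if not match or not any(match.groups()):
        raise ValueError(f"expected PnDTnHnMn.nS, got {value!r}")
    days, hours, minutes, seconds = (int(g or 0) for g in match.group(1, 2, 3, 4))
    nanos = int((match.group(5) or "").ljust(9, "0"))
    total_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return total_seconds * 1_000_000_000 + nanos
```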
Currently supported Arrow Data Types:
- Null
- TinyInt, SmallInt, Int, BigInt
- UInt1, UInt2, UInt4, UInt8
- Float2, Float4, Float8
- Decimal, Decimal256
- Bit
- Varbinary, LargeVarbinary, FixedSizeBinary
- Varchar, LargeVarchar
- DateDay, DateMilli
Properties
| Name | Summary |
|---|---|
| | URI of the Flight RPC service to call. Currently only |
| | Path to data in the Arrow instance; typically points to a table in a database in |
| | ID of the Arrow schema following the pattern Note that metadata with the specified keys ("database_name" and "record_name") must be present when communicating with Anybase. Required. |
| | Whether to fail processing if an unknown field is encountered in the payload. Defaults to |
| | Whether to retain the payload after writing it to the Arrow instance, making it available for processing by downstream processors. Defaults to |
| | The timeout of the HTTP client socket connection. Optional. Configuring the connection timeout is not supported in the current version. |
| | The socket timeout to wait for the first byte of the response from the server. Optional. |
| | Key from the server configuration used to look up the credentials needed to connect to the Arrow instance. Optional; no authentication is used by default. |
| | Optional, descriptive name for the processor. |
| | Required identifier of the processor, unique across all processors within the flow. Must be between 3 and 30 characters long; contain only lowercase and uppercase alphabetical characters (a-z and A-Z), numbers, dashes ("-"), and underscores ("_"); and start with an alphabetical character. In other words, it adheres to the regex pattern |
| | Optional set of custom properties in a simple jdk-format that are added to the message exchange properties before the incoming payload is processed. Any existing properties with the same name are replaced by the properties defined here. |
| | Whether the incoming payload is available for error processing on failure. Defaults to |
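The processor identifier rules above can be checked with a regular expression. The Python sketch below uses a pattern reconstructed from the prose rules rather than copied from the referenced pattern, and the function name is_valid_processor_id is hypothetical. Uniqueness across the flow cannot be expressed by a pattern and is outside this sketch.

```python
import re

# Reconstructed from the prose: a leading letter, then 2 to 29 more
# letters, digits, dashes or underscores (3 to 30 characters in total).
PROCESSOR_ID = re.compile(r"[A-Za-z][A-Za-z0-9_-]{2,29}")


def is_valid_processor_id(candidate: str) -> bool:
    """Return True if candidate satisfies the documented identifier rules."""
    return PROCESSOR_ID.fullmatch(candidate) is not None
```

Using fullmatch rather than match ensures that trailing characters (including a trailing newline) make the identifier invalid.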
Sub-builders
| Name | Summary |
|---|---|
| | Strategy for describing the external system integration. Optional. |
| | Strategy for providing message processing hints to the server. The |
| | Strategy for configuring the processor’s circuit breaker. Optional. |
| | Strategy for describing how a processor’s message is logged on the server. |
| | Strategy for archiving payloads. |
| | Strategy that customizes the conversion of an incoming payload by a processor (e.g., string to object). Should be used when the processor’s default conversion logic cannot be used. |