readFiles

Source processor that consumes files over different protocols.

Keep in mind that flow message processing is concurrent. Also, messages that fail with transient errors are redelivered later, out of order relative to other concurrent messages. Sorting through the eagerMaxMessagesPerPoll and/or sortBy options therefore provides only an approximate, best-effort ordering of message processing.

Properties

Name Summary

protocol

The FileTransferProtocol to use. Can be one of the following:

  • SFTP

  • FILE (Note: The FILE protocol is disabled for security reasons and must be enabled manually on a specific platform instance if needed.)

host

Host name or IP. Not relevant for FILE protocol.

port

Port to override the default. Not relevant for FILE protocol.

authenticationConfigKey

Secret key that the server uses to look up the credentials needed to perform authentication. Not relevant for FILE protocol.

path

Location from where files are consumed. The last element of the path must be a directory.

pollingFrequencySeconds

Time between the polls in seconds. Some polling strategies might require files to be present in two poll listings to ensure they’re not being modified, so the minimum time for reading a file will be greater than this value. Do not use this parameter for scheduling purposes.

maxMessagesPerPoll

The maximum number of files to fetch per poll. Must be a positive integer. If not set, all available files will be polled.

When polling with the SFTP protocol and without sortBy, a locking mechanism is used that requires files to be present in a minimum of two polls. This has performance advantages in high volume scenarios. However, if this setting is too low, it might result in ineffective polling, because subsequent polls might list different files.

eagerMaxMessagesPerPoll

Whether the poller will only list maxMessagesPerPoll files and do preprocessing (such as sortBy) on the limited list, or list all available files, preprocess the full list, and then select maxMessagesPerPoll files for processing from the top of the processed list. Optional and defaults to true (eager).

The typical use case for switching off eagerness is when you need to process files in maxMessagesPerPoll chunks but want those chunks to be selected from the top of the fully sorted file list.
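The difference between the two modes can be sketched in a few lines of Python. This is only an illustration of the selection order described above, not the platform's implementation: the function name `select_for_poll`, its parameters, and the assumption that an eager poll simply takes the first entries of the server listing are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def select_for_poll(
    available: Sequence[str],
    max_messages_per_poll: Optional[int] = None,
    eager: bool = True,
    sort_key: Optional[Callable[[str], object]] = None,
) -> List[str]:
    """Pick the file names that one poll would hand on for processing."""
    files = list(available)
    if max_messages_per_poll is None:
        # No limit configured: everything is polled, sorted if requested.
        return sorted(files, key=sort_key) if sort_key else files
    if eager:
        # Eager: cut the listing first, then preprocess only that slice.
        limited = files[:max_messages_per_poll]
        return sorted(limited, key=sort_key) if sort_key else limited
    # Non-eager: preprocess the full listing, then take from the top.
    ordered = sorted(files, key=sort_key) if sort_key else files
    return ordered[:max_messages_per_poll]


listing = ["c.csv", "a.csv", "b.csv"]  # hypothetical, unordered server listing
by_name = lambda name: name
print(select_for_poll(listing, 2, eager=True, sort_key=by_name))   # ['a.csv', 'c.csv']
print(select_for_poll(listing, 2, eager=False, sort_key=by_name))  # ['a.csv', 'b.csv']
```

With eagerness on, the chunk is sorted but is not necessarily the top of the full list; with eagerness off, it is.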

greedy

Whether to poll again immediately when the last poll returned files instead of waiting for pollingFrequencySeconds. Will continuously poll until there are no more files, then return to waiting for pollingFrequencySeconds between polls. Can be used together with maxMessagesPerPoll to set up the poller to pull a manageable chunk of files, process it, then pull the next chunk immediately. Optional and defaults to false.
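As a rough sketch of the greedy behavior, the loop below skips the wait whenever the previous poll returned files. The names `run_polls`, `poll`, and `sleep` are hypothetical stand-ins, and the loop is bounded by `max_polls` only so that the example terminates.

```python
from typing import Callable, List


def run_polls(
    poll: Callable[[], List[str]],
    sleep: Callable[[float], None],
    polling_frequency_seconds: float,
    greedy: bool,
    max_polls: int,
) -> List[List[str]]:
    """Run a bounded number of polls and return the batch from each one."""
    batches = []
    for _ in range(max_polls):
        batch = poll()
        batches.append(batch)
        if greedy and batch:
            # Greedy: the last poll returned files, so poll again right away.
            continue
        # Otherwise wait the regular interval before the next poll.
        sleep(polling_frequency_seconds)
    return batches


pending = ["a.csv", "b.csv", "c.csv"]  # hypothetical files waiting on the server
waits = []  # records each wait instead of actually sleeping


def poll_two() -> List[str]:
    # Stand-in for a poll with maxMessagesPerPoll = 2.
    batch = pending[:2]
    del pending[:2]
    return batch


print(run_polls(poll_two, waits.append, 30, greedy=True, max_polls=3))
# [['a.csv', 'b.csv'], ['c.csv'], []]
print(waits)  # [30]
```

Only the empty third poll triggers a wait; without greedy, each of the three polls would be followed by one.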

moveToFolder

Location to move files to after successful consumption. Must be a directory. If not set, then the files will be deleted.

include

Regex that limits file consumption to files whose names match.

exclude

Regex that excludes files from consumption whose names match. Takes precedence over include.
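The interaction of include and exclude can be illustrated as follows. This sketch uses Python's `re` module and assumes the pattern must match the whole file name; the platform's regex flavor and matching mode may differ, and `is_consumable` is a hypothetical helper.

```python
import re
from typing import Optional


def is_consumable(name: str, include: Optional[str] = None, exclude: Optional[str] = None) -> bool:
    """Decide whether a file name passes the include/exclude filters."""
    # Exclude is checked first because it takes precedence over include.
    if exclude is not None and re.fullmatch(exclude, name):
        return False
    if include is not None:
        return re.fullmatch(include, name) is not None
    # No include filter configured: every non-excluded file is consumed.
    return True


print(is_consumable("orders.csv", include=r".*\.csv"))                              # True
print(is_consumable("orders.tmp.csv", include=r".*\.csv", exclude=r".*\.tmp\..*"))  # False
```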

sortBy

Optional FileSortingStrategy to use when consuming files. This might incur a performance penalty if the number of files is high. Can be one of the following:

  • NAME_CASE_SENSITIVE_ASC: Sorted by name and case, ascending.

  • NAME_CASE_SENSITIVE_DESC: Sorted by name and case, descending.

  • NAME_CASE_INSENSITIVE_ASC: Sorted by name, ignoring case, ascending.

  • NAME_CASE_INSENSITIVE_DESC: Sorted by name, ignoring case, descending.

  • MODIFIED_ASC: Sorted by when the file was last modified, ascending (oldest first).

  • MODIFIED_DESC: Sorted by when the file was last modified, descending (newest first).
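To make the six strategies concrete, here is a small Python sketch that maps each strategy name to a sort key and direction. The `RemoteFile` type and `sort_files` helper are hypothetical, and the case-sensitive ordering shown is Python's code-point comparison (uppercase before lowercase), which may not match the platform's comparator exactly.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class RemoteFile(NamedTuple):
    name: str
    modified_millis: int  # last-modified time as epoch milliseconds


# Strategy name -> (sort key, descending?)
SORTERS: Dict[str, Tuple[Callable[[RemoteFile], object], bool]] = {
    "NAME_CASE_SENSITIVE_ASC": (lambda f: f.name, False),
    "NAME_CASE_SENSITIVE_DESC": (lambda f: f.name, True),
    "NAME_CASE_INSENSITIVE_ASC": (lambda f: f.name.casefold(), False),
    "NAME_CASE_INSENSITIVE_DESC": (lambda f: f.name.casefold(), True),
    "MODIFIED_ASC": (lambda f: f.modified_millis, False),
    "MODIFIED_DESC": (lambda f: f.modified_millis, True),
}


def sort_files(files: List[RemoteFile], strategy: str) -> List[str]:
    """Return the file names in the order the given strategy would produce."""
    key, descending = SORTERS[strategy]
    return [f.name for f in sorted(files, key=key, reverse=descending)]


files = [RemoteFile("b.txt", 300), RemoteFile("a.txt", 100), RemoteFile("C.txt", 200)]
print(sort_files(files, "NAME_CASE_SENSITIVE_ASC"))    # ['C.txt', 'a.txt', 'b.txt']
print(sort_files(files, "NAME_CASE_INSENSITIVE_ASC"))  # ['a.txt', 'b.txt', 'C.txt']
print(sort_files(files, "MODIFIED_DESC"))              # ['b.txt', 'C.txt', 'a.txt']
```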

recursive

Whether files in subdirectories below the given path should be consumed, too.

fileChangedCheckIntervalMillis

Time in milliseconds between checks of whether the file is still being written. For polling with the SFTP protocol and without sortBy, this is the required amount of time since the file was last modified before processing can start.
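For the SFTP case without sortBy, the rule amounts to an age check on the last-modified timestamp. A minimal sketch, assuming timestamps in epoch milliseconds and an inclusive comparison (the source does not say whether the boundary is inclusive); `is_ready` is a hypothetical name:

```python
def is_ready(last_modified_millis: int, now_millis: int, file_changed_check_interval_millis: int) -> bool:
    """True once the file has been left unmodified for at least the configured interval."""
    return now_millis - last_modified_millis >= file_changed_check_interval_millis


print(is_ready(last_modified_millis=10_000, now_millis=14_000, file_changed_check_interval_millis=5_000))  # False
print(is_ready(last_modified_millis=10_000, now_millis=15_000, file_changed_check_interval_millis=5_000))  # True
```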

stepwise

Whether to stepwise into directories while traversing file structures when downloading files. Defaults to false.

Enabling stepwise incurs a performance penalty, since a cd into each directory level will be performed. However, some users can only download files if they use stepwise, while others can only download if they do not. Use the stepwise option to control this behavior as needed.

fastExistsCheck

Whether to check for the existence of (and update to) a file by listing the file itself or the parent folder containing the file. Defaults to true (listing the file).

Even though listing only the file is faster than listing all files in the parent folder, it may not be supported by all SFTP servers.

errorFolder

Location to move files to when they could not be processed successfully. Only applies to flows with an exchange pattern of RequestResponse.

lineSplitMaxBytes

Optional configuration that splits the consumed files by line. Only applies to line-based text files (e.g., CSV files). If the consumed file is larger than the configured size, it will be split into fragments of the configured size. The splitting is line based: the last line in a fragment is the last complete line that fits before the fragment would cross the size threshold.

If an error occurs while the file consumer is processing the source file (during splitting or delivering fragments), the source file will be placed in the errorFolder, if configured, or otherwise remain in the source folder to be consumed again.

Note that fragments will be delivered immediately during the splitting, so a redelivery of the source file can result in the duplicate delivery of fragments.
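The line-based splitting rule can be sketched as below. This is an illustration under stated assumptions, not the platform's splitter: `split_by_lines` is a hypothetical helper, it works on an in-memory byte string rather than a stream, it ignores header copying, and it assumes that a single line longer than the limit becomes its own oversized fragment.

```python
from typing import List


def split_by_lines(data: bytes, max_bytes: int) -> List[bytes]:
    """Split line-based content into fragments, cutting only at line ends."""
    fragments: List[bytes] = []
    current = b""
    for line in data.splitlines(keepends=True):
        # Close the fragment before this line would push it over the limit.
        if current and len(current) + len(line) > max_bytes:
            fragments.append(current)
            current = b""
        current += line
    if current:
        fragments.append(current)
    return fragments


content = b"id,name\n1,a\n2,b\n3,c\n"  # hypothetical 20-byte CSV payload
print(split_by_lines(content, 12))
# [b'id,name\n1,a\n', b'2,b\n3,c\n']
```

Joining the fragments in order reproduces the original content, since lines are never cut or dropped.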

copyHeaderOnLineSplit

Whether the header row from the original file should be copied to the fragments. Optional and only relevant when lineSplitMaxBytes is configured. Defaults to false.

splitFileCharacterEncoding

The file encoding to use when the file is to be split. Optional and only relevant when lineSplitMaxBytes is configured. Defaults to UTF-8.

forwardMessagePreprocessingErrors

Whether the file endpoint should treat errors encountered while reading files as regular message processing errors. Reading a file can fail, for example, if the file exceeds the allowed maximum size.

When this feature is active (the default), these kinds of errors are treated just like any other message processing error happening in the flow pipeline. That is, they are subject to regular error handling and are reported as failed messages in the monitoring tools. If the feature is deactivated, the files are rejected without any further processing.

This property is optional, and true by default.

name

Optional, descriptive name for the processor.

id

Required identifier of the processor, unique across all processors within the flow. Must be between 3 and 30 characters long; contain only lower and uppercase alphabetical characters (a-z and A-Z), numbers, dashes ("-"), and underscores ("_"); and start with an alphabetical character. In other words, it adheres to the regex pattern [a-zA-Z][a-zA-Z0-9_-]{2,29}.
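The pattern can be checked ahead of deployment with any regex engine. A minimal sketch in Python, where `is_valid_processor_id` is a hypothetical helper and only the pattern itself comes from the description above:

```python
import re

# Pattern quoted from the id description: a letter, then 2 to 29 more allowed characters.
PROCESSOR_ID = re.compile(r"[a-zA-Z][a-zA-Z0-9_-]{2,29}")


def is_valid_processor_id(candidate: str) -> bool:
    """True if the whole string satisfies the id pattern."""
    return PROCESSOR_ID.fullmatch(candidate) is not None


print(is_valid_processor_id("read-orders_1"))  # True
print(is_valid_processor_id("ab"))             # False: shorter than 3 characters
print(is_valid_processor_id("1reader"))        # False: does not start with a letter
```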

exchangeProperties

Optional set of custom properties, in a simple JDK format, that are added to the message exchange properties before processing the incoming payload. Any existing properties with the same name will be replaced by the properties defined here.

retainPayloadOnFailure

Whether the incoming payload is available for error processing on failure. Defaults to false.

Sub-builders

Name Summary

externalSystemDetails

Strategy for describing the external system integration. Optional.

messageLoggingStrategy

Strategy for describing how a processor’s message is logged on the server.

payloadArchivingStrategy

Strategy for archiving payloads.

inboundTransformationStrategy

Strategy that customizes the conversion of an incoming payload by a processor (e.g., string to object). Should be used when the processor’s default conversion logic cannot be used.

Details

Authentication

The authenticationConfigKey property supports secrets of type UserNameAndPassword and SshPrivateKey. See the Secret Types documentation for formatting details.