receiveFromAwsS3

Source processor that polls an AWS S3 bucket according to the specified poll configuration.

All objects found in the bucket are processed unless an optional prefix is set, in which case only objects whose keys match the prefix are processed. Objects selected for processing are deleted after being processed or, optionally, moved to a backup bucket.

To poll objects from AWS S3, the minimum required processor definition looks like this:

receiveFromAwsS3 {
  id = "AwsS3_receiveFromAwsS3"
  bucketName = "my-bucket"
  authenticationConfigKey = "awsAuthKey"
  pollFixedDelayMillis = 500
}

To poll objects from an S3 API other than AWS (e.g., MinIO), set the endpointUrl override:

receiveFromAwsS3 {
  id = "AwsS3_receiveFromAwsS3"
  endpointUrl = "https://s3.us-west-2.minio.com"
  bucketName = "my-bucket"
  authenticationConfigKey = "awsAuthKey"
  pollFixedDelayMillis = 500
}

Note: The endpointUrl field should specify a base URL, e.g.:

https://s3.us-west-2.minio.com

The processor does NOT support virtual-hosted-style URLs (containing the bucket name in the domain), like the following:

https://my-bucket.s3.us-west-2.minio.com

Nor does it support URLs containing the bucket name and/or an object key prefix as path segments, like the following:

https://s3.us-west-2.minio.com/my-bucket/folder/pref-
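Beyond the minimal definitions above, the prefix filter, backup bucket, and cron-based polling mentioned in this section can be combined. The following is only a sketch: the prefix and backup bucket name are hypothetical values, the cron expression is an ordinary Quartz expression meaning "every five minutes", and it assumes that exactly one of the three poll properties is set and that `//` comments are accepted in processor definitions.

```
receiveFromAwsS3 {
  id = "AwsS3_receiveFromAwsS3"
  bucketName = "my-bucket"
  authenticationConfigKey = "awsAuthKey"
  // Hypothetical prefix: only objects whose keys start with it are selected.
  prefix = "incoming/"
  // Hypothetical backup bucket: processed objects are moved here, not just deleted.
  moveToBucketName = "my-backup-bucket"
  // Quartz cron: second 0 of every fifth minute. Replaces pollFixedDelayMillis.
  pollCronExpression = "0 0/5 * * * ?"
}
```

The object key prefix goes in the prefix property rather than in endpointUrl, in line with the URL restrictions described above.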

Properties

Name Summary

forwardProperty()

Adds a property from the object's metadata to the list of properties that are converted into message exchange properties. If an exchange property with the same name already exists, it will be replaced.

bucketName

The storage bucket name the object(s) will be read from. Required.

authenticationConfigKey

A secret key that the server uses to look up the credentials needed to perform the AWS authentication. Required.

endpointUrl

Overrides the base URL of the AWS S3 endpoint. Optional. The base URL is automatically detected when interacting with AWS S3, so configuring it is unnecessary in most cases. However, it can be relevant in the following situations:

  1. When using AWS PrivateLink to access S3 within a Virtual Private Cloud (VPC) without traversing the public internet, a specific endpoint might be used for that PrivateLink connection.

  2. For applications that require Federal Information Processing Standards (FIPS) compliance, AWS provides FIPS endpoints, which use FIPS 140-2 validated cryptographic modules. These endpoints are different from the standard regional endpoints.

  3. When using S3 in AWS Local Zones or AWS Outposts, specific endpoints associated with those local deployments might be configured.

It is required when running against S3 API services other than AWS, for example MinIO.

Important: The endpoint URL must not include the bucket name as a path segment; use the required bucketName field instead. Virtual-hosted-style URLs (i.e., with the bucket name in the domain) are not supported. The processor uses path-style URLs, which require the bucketName field to be configured.

region

Region name of the bucket. Optional.

retryMaxAttempts

Optional, maximum number of retry attempts the S3 client can perform before failing the request. This option only affects the producer and not the regular flow redelivery mechanism. Defaults to 0, meaning the processor will not perform any retries (though the flow source will still perform retries if configured).

retryInitialDelayMillis

The delay before performing the first retry after the original request has failed. Optional.

retryMaxDelayMillis

Optional, maximum time to wait between retries. An exponential mechanism is used to calculate the next delay between retries.

pollFixedRateMillis

The fixed rate poll interval (in milliseconds) to use when polling the AWS S3 bucket. AWS S3 will be polled immediately on startup and then precisely at the rate specified. This, pollFixedDelayMillis, or pollCronExpression must be set.

pollFixedDelayMillis

The fixed delay poll interval (in milliseconds) to use when polling the AWS S3 bucket. AWS S3 will be polled immediately on startup and then with the given delay added after each processing has completed. This, pollFixedRateMillis, or pollCronExpression must be set.

pollCronExpression

The poll interval (written as a Quartz Cron Trigger) to use when polling the AWS S3 bucket. The polling will happen at the specified times described by the cron expression. This, pollFixedRateMillis, or pollFixedDelayMillis must be set.

maxObjectsPerPoll

Max objects to return per bucket listing command. If more than maxObjectsPerPoll objects are available, the AWS S3 consumer will perform a new bucket listing until all objects are consumed. AWS has an upper (service) limit of 1000. The response might contain fewer keys but will never contain more. Optional and defaults to 100.

prefix

The object key prefix used when processing objects in the bucket. Optional. If not set, all objects in the bucket will be processed.

moveToBucketName

The bucket name to be used for backing up processed objects. Optional. If not set, the processed object will not be moved (only deleted).

connectionTimeoutMillis

The timeout of the HTTP client socket connection. Optional.

receiveTimeoutMillis

The socket timeout to wait for the first byte of response from the server. Optional.

sslAuthenticationConfigKey

Key used to look up the secret containing the SSL credentials for server and client authentication. The allowCertificateAssociatedWithWrongHost parameter is ignored on this processor. If using a custom trust store, you must provide endpointUrl. Optional.

name

Optional, descriptive name for the processor.

id

Required identifier of the processor, unique across all processors within the flow. Must be between 3 and 30 characters long; contain only lower and uppercase alphabetical characters (a-z and A-Z), numbers, dashes ("-"), and underscores ("_"); and start with an alphabetical character. In other words, it adheres to the regex pattern [a-zA-Z][a-zA-Z0-9_-]{2,29}.

exchangeProperties

Optional set of custom properties in a simple jdk-format, that are added to the message exchange properties before processing the incoming payload. Any existing properties with the same name will be replaced by properties defined here.

retainPayloadOnFailure

Whether the incoming payload is available for error processing on failure. Defaults to false.
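As an illustration of how the retry, timeout, and listing properties above fit together, a tuned definition might look like the following sketch. All numeric values are arbitrary examples rather than recommended settings, and the sketch assumes these properties are assigned in the same way as pollFixedDelayMillis and that `//` comments are accepted.

```
receiveFromAwsS3 {
  id = "AwsS3_receiveFromAwsS3"
  bucketName = "my-bucket"
  authenticationConfigKey = "awsAuthKey"
  pollFixedDelayMillis = 500
  // Up to 3 S3 client retries per request (the default of 0 disables them).
  retryMaxAttempts = 3
  // First retry after 200 ms; later delays grow exponentially, capped at 5000 ms.
  retryInitialDelayMillis = 200
  retryMaxDelayMillis = 5000
  // At most 500 objects per listing command (AWS service limit is 1000).
  maxObjectsPerPoll = 500
  // Example HTTP client timeouts in milliseconds.
  connectionTimeoutMillis = 5000
  receiveTimeoutMillis = 30000
}
```

The exact progression of retry delays between the initial and maximum values is not specified here, so only the lower and upper bounds should be relied on. These retries apply to the S3 client only and are independent of any redelivery configured on the flow.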

Sub-builders

Name Summary

externalSystemDetails

Strategy for describing the external system integration. Optional.

messageLoggingStrategy

Strategy for describing how a processor’s message is logged on the server.

payloadArchivingStrategy

Strategy for archiving payloads.

inboundTransformationStrategy

Strategy that customizes the conversion of an incoming payload by a processor (e.g., string to object). Should be used when the processor’s default conversion logic cannot be used.

Details

Authentication

The authenticationConfigKey property supports secrets of type AwsCredentials. See the Secret Types documentation for formatting details.

Exchange Properties

In addition to any message exchange properties defined by the forwardProperty() function, each message exchange always includes the following properties:

Property Description

aws_s3_key

The name that was assigned to the object.

aws_s3_eTag

The entity tag (i.e., a hash of the object). The ETag only reflects changes to the contents of the object, not its metadata.

aws_s3_owner

The owner of the object.

aws_s3_storageClass

The storage class used to store the object.

aws_s3_lastModified

The creation date of the object in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z', for example 2024-01-15T10:30:00.000Z.

initialPayloadSizeInBytes

The size of the payload as it was received by the endpoint.

messageDescription

A textual description of the payload, typically used for logging.

Refer to the AWS S3 documentation for more information on the aws_s3_* properties.