Schemas and Formats
Arrow Schema
The basic JSON structure for an Arrow schema is:
{
  "fields": [<field-object>, ..],
  "metadata": [<key-value-object>, ..]
}
- The fields array must contain at least one field object.
- The metadata array can contain zero or more key-value objects.
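The two rules above can be checked before a schema is submitted. The following is a minimal sketch in Python, not part of the product: the function name `check_schema_shape` is hypothetical, and the sketch assumes that an omitted `metadata` key is acceptable, which this page does not state.

```python
import json


def check_schema_shape(schema):
    """Return a list of problems found in the top-level schema structure."""
    problems = []

    # Rule 1: "fields" must be an array holding at least one field object.
    fields = schema.get("fields")
    if not isinstance(fields, list) or len(fields) == 0:
        problems.append("'fields' must be an array with at least one field object")

    # Rule 2: "metadata" is an array of zero or more key-value objects.
    # Assumption: a missing "metadata" key is treated like an empty array.
    metadata = schema.get("metadata", [])
    if not isinstance(metadata, list):
        problems.append("'metadata' must be an array")

    return problems


# A schema with an empty "fields" array violates the first rule.
empty_schema = json.loads('{"fields": [], "metadata": []}')
print(check_schema_shape(empty_schema))
```

The check covers only the outer structure; the individual field objects are described in the next section.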
fields
Each JSON object in the fields array defines an Arrow column data type.
| There are two general categories of an Arrow column data type. Currently, we only support Primitive field objects. |
The general structure of the field object is:
{
  "name": "someFieldName",
  "nullable": <true|false>,
  "type": {
    "name": <field-type>,
    ..
  }
}
| All the properties in a field object are required. |
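Because every property is required, a field object can be checked for completeness before it is used. The sketch below is illustrative only: `check_field` is a hypothetical helper, and it verifies just the common properties shown in the structure above (`name`, `nullable`, and `type.name`), not the type-specific properties.

```python
def check_field(field):
    """Return a list of problems found in a single field object."""
    problems = []

    # All three top-level properties are required.
    for key in ("name", "nullable", "type"):
        if key not in field:
            problems.append("missing required property '%s'" % key)

    if "name" in field and not isinstance(field["name"], str):
        problems.append("'name' must be a string")

    if "nullable" in field and not isinstance(field["nullable"], bool):
        problems.append("'nullable' must be true or false")

    # The nested type object must at least carry the field type name.
    if "type" in field:
        type_obj = field["type"]
        if not isinstance(type_obj, dict) or "name" not in type_obj:
            problems.append("'type' must be an object with a 'name' property")

    return problems


# Only the common properties are inspected here; the remaining
# properties of each type are not checked by this sketch.
print(check_field({"name": "someFieldName", "nullable": True, "type": {"name": "int"}}))
print(check_field({"name": "someFieldName"}))
```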
The following field object types are supported:
Consult the Arrow Schema specification for more details on each data type.
The rest of the JSON object structure is specific to each field type:
int
| Property | Type | Allowed value(s) |
|---|---|---|
| | String | |
| | Number | |
| | Boolean | |
floatingpoint
| Property | Type | Allowed value(s) |
|---|---|---|
| | String | |
| | String | |
fixedsizebinary
| Property | Type | Allowed value(s) |
|---|---|---|
| | String | |
| | Number | (Integer, minimum: |
decimal
| Property | Type | Allowed value(s) |
|---|---|---|
| | String | |
| | Number | (Integer, minimum: |
| | Number | (Integer, minimum: |
| | Number | |
time
| Property | Type | Allowed value(s) |
|---|---|---|
| | String | |
| | String | |
| | Number | |
Avro Schema
The Apache Avro specification details the structure of an Avro schema. Both primitive and complex types are supported.
| The Avro transformation does not currently support the proprietary Confluent wire format. |
Primitive Types
- null: no value
- boolean: a binary value
- int: 32-bit signed integer
- long: 64-bit signed integer
- float: single precision (32-bit)
- double: double precision (64-bit)
- bytes: sequence of 8-bit unsigned bytes
- string: Unicode character sequence
Avro Schema Example
{
  "type": "record",
  "name": "person",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "age",
      "type": "int"
    },
    {
      "name": "pets",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "pet",
          "fields": [
            {
              "name": "name",
              "type": "string"
            },
            {
              "name": "type",
              "type": {
                "type": "enum",
                "name": "AnimalType",
                "symbols": [
                  "Cat",
                  "Dog"
                ]
              }
            }
          ]
        }
      }
    }
  ]
}
The above schema may be used to serialise the below JSON into a binary Avro representation.
{
  "name": "John Doe",
  "age": 58,
  "pets": [
    {
      "name": "Milo",
      "type": "Dog"
    },
    {
      "name": "Nero",
      "type": "Cat"
    }
  ]
}
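To show what the binary representation looks like, the sketch below hand-encodes the record above using the binary encoding rules from the Apache Avro specification (zig-zag variable-length integers, length-prefixed UTF-8 strings, block-counted arrays, and enums written as the symbol index). It is a standard-library-only illustration, not the code the transformation uses: the helper names `encode_long`, `encode_string`, and `encode` are hypothetical, and only the types that appear in the example schema are handled. The result is the raw datum encoding, without an Avro object container file header.

```python
import json


def encode_long(n):
    """Encode an Avro int/long: zig-zag, then 7 bits per byte, low bits first."""
    z = (n << 1) ^ (n >> 63)  # zig-zag maps small negatives to small positives
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s):
    """Encode an Avro string: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode(schema, value):
    """Encode value against schema (record, array, enum, int, long, string only)."""
    if isinstance(schema, str):
        if schema in ("int", "long"):
            return encode_long(value)
        if schema == "string":
            return encode_string(value)
        raise NotImplementedError(schema)
    kind = schema["type"]
    if kind == "record":
        # Fields are written in schema order, with no names or separators.
        return b"".join(encode(f["type"], value[f["name"]]) for f in schema["fields"])
    if kind == "array":
        if not value:
            return encode_long(0)
        # One block: item count, the items, then a zero count as terminator.
        items = b"".join(encode(schema["items"], v) for v in value)
        return encode_long(len(value)) + items + encode_long(0)
    if kind == "enum":
        # An enum is written as the zero-based index of its symbol.
        return encode_long(schema["symbols"].index(value))
    return encode(kind, value)  # e.g. {"type": "string"}


schema = json.loads("""
{"type": "record", "name": "person", "fields": [
  {"name": "name", "type": "string"},
  {"name": "age", "type": "int"},
  {"name": "pets", "type": {"type": "array", "items":
    {"type": "record", "name": "pet", "fields": [
      {"name": "name", "type": "string"},
      {"name": "type", "type": {"type": "enum", "name": "AnimalType",
                                "symbols": ["Cat", "Dog"]}}]}}}]}
""")

person = json.loads("""
{"name": "John Doe", "age": 58,
 "pets": [{"name": "Milo", "type": "Dog"}, {"name": "Nero", "type": "Cat"}]}
""")

encoded = encode(schema, person)
print(len(encoded), encoded.hex())
```

For this record the encoded datum is 24 bytes long. Field names never appear in the output, which is why the same schema is needed to read the data back.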