Parquet Format
Under Construction
Parquet support is currently under development and is missing some functionality: for example, arrays are not yet supported, and information about deletes is not propagated.
Feldera can ingest and output data in the Parquet format.
- via ingress and egress REST endpoints by specifying ?format=parquet in the URL
- as a payload received from or sent to a connector
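
As a minimal sketch of the first option, the helper below appends `format=parquet` to an endpoint URL while keeping any other query parameters. The helper name `with_parquet_format`, the host, and the path are placeholders invented for illustration; they are not taken from the Feldera API, so substitute the actual ingress or egress endpoint of your deployment.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_parquet_format(endpoint_url: str) -> str:
    """Return endpoint_url with the query parameter format=parquet set."""
    parts = urlsplit(endpoint_url)
    # Keep existing query parameters, but drop any previous `format` value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "format"
    ]
    query.append(("format", "parquet"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL: the host and path are hypothetical, not a documented route.
print(with_parquet_format("http://localhost:8080/example/ingress/my_table"))
# prints http://localhost:8080/example/ingress/my_table?format=parquet
```

Replacing an existing `format` value rather than appending a second one keeps the request unambiguous if the URL was previously built for another format.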
Here we document the Parquet format and how it interacts with different SQL types.
Types
The input must be a valid Parquet file with a schema. The schema (column names and types) must match the table definition in the Feldera pipeline program. We use Apache Arrow to specify the data types in Parquet. The following table shows the mapping between Feldera SQL types and Arrow types.
Feldera SQL Type | Apache Arrow Type |
---|---|
BOOLEAN | Boolean |
TINYINT | Int8 |
SMALLINT | Int16 |
INTEGER | Int32 |
BIGINT | Int64 |
FLOAT | Float32 |
DOUBLE | Float64 |
DECIMAL | Decimal |
VARCHAR, CHAR, STRING | LargeUtf8 |
BINARY , VARBINARY | DataType::Binary |
TIME | DataType::UInt64 (time in nanoseconds) |
TIMESTAMP | DataType::Timestamp(TimeUnit::Millisecond, None) (milliseconds since unix epoch) |
DATE | DataType::Int32 (days since unix epoch) |
ARRAY | DataType::LargeList |
STRUCT | DataType::Struct |
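
The temporal rows of the table store plain integers, so it can help to see how a value maps onto each encoding. The sketch below uses only the Python standard library and is not Feldera code; the function names are made up for illustration. It assumes that TIME counts nanoseconds from midnight and that a TIMESTAMP without a time zone is interpreted as UTC; the table above does not state either.

```python
from datetime import date, datetime, time, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH_DATETIME = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_to_days(d: date) -> int:
    """DATE: days since the Unix epoch (stored as Int32)."""
    return (d - EPOCH_DATE).days


def timestamp_to_millis(ts: datetime) -> int:
    """TIMESTAMP: milliseconds since the Unix epoch.

    Assumption: a naive datetime is treated as UTC.
    """
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    delta = ts - EPOCH_DATETIME
    # Integer arithmetic avoids float rounding; sub-millisecond digits are floored.
    return delta.days * 86_400_000 + delta.seconds * 1_000 + delta.microseconds // 1_000


def time_to_nanos(t: time) -> int:
    """TIME: nanoseconds (stored as UInt64).

    Assumption: the count starts at midnight.
    """
    seconds = t.hour * 3600 + t.minute * 60 + t.second
    return seconds * 1_000_000_000 + t.microsecond * 1_000


print(date_to_days(date(2024, 1, 1)))                       # 19723
print(timestamp_to_millis(datetime(2024, 1, 1, 12, 0, 0)))  # 1704110400000
print(time_to_nanos(time(12, 30, 15)))                      # 45015000000000
```

Python's `time` type has microsecond resolution, so the nanosecond value produced here is always a multiple of 1,000.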