Parquet Format

Under Construction

Parquet support is currently under development and is missing some functionality: for example, arrays are not yet supported, and information about deletes is not propagated.

Feldera can ingest and output data in the Parquet format.

Here we document the Parquet format and how it interacts with different SQL types.

Types

The input must be a valid Parquet file with a schema. The schema (column names and types) must match the table definition in the Feldera pipeline program. We use Arrow to specify the data types in Parquet. The following table shows the mapping between Feldera SQL types and Arrow types.

| Feldera SQL Type | Apache Arrow Type |
|---|---|
| BOOLEAN | Boolean |
| TINYINT, SMALLINT, INTEGER, BIGINT | Int8, Int16, Int32, Int64 |
| FLOAT, DOUBLE, DECIMAL | Float32, Float64, Decimal |
| VARCHAR, CHAR, STRING | LargeUtf8 |
| BINARY, VARBINARY | DataType::Binary |
| TIME | DataType::UInt64 (time in nanoseconds) |
| TIMESTAMP | DataType::Timestamp(TimeUnit::Millisecond, None) (milliseconds since unix epoch) |
| DATE | DataType::Int32 (days since unix epoch) |
| ARRAY | DataType::LargeList |
| STRUCT | DataType::Struct |
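
The integer encodings listed for TIME, TIMESTAMP and DATE can be illustrated with a small sketch that uses only the Python standard library. It is not part of Feldera: the helper names `encode_date`, `encode_timestamp` and `encode_time` are hypothetical. It also assumes that TIME counts nanoseconds from midnight and that naive datetimes are interpreted as UTC, neither of which the table states explicitly.

```python
from datetime import date, datetime, time, timezone

# Reference points for the "since unix epoch" encodings in the table.
UNIX_EPOCH_DATE = date(1970, 1, 1)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_date(d: date) -> int:
    """DATE -> Int32: whole days since the Unix epoch."""
    return (d - UNIX_EPOCH_DATE).days


def encode_timestamp(ts: datetime) -> int:
    """TIMESTAMP -> Timestamp(Millisecond, None): milliseconds since the Unix epoch."""
    # Assumption: a naive datetime is treated as UTC.
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    delta = ts - UNIX_EPOCH
    # Integer arithmetic avoids floating-point rounding; sub-millisecond
    # precision is truncated.
    return delta.days * 86_400_000 + delta.seconds * 1_000 + delta.microseconds // 1_000


def encode_time(t: time) -> int:
    """TIME -> UInt64: nanoseconds, assumed here to count from midnight."""
    seconds = t.hour * 3600 + t.minute * 60 + t.second
    return seconds * 1_000_000_000 + t.microsecond * 1_000
```

For example, `encode_date(date(1970, 1, 2))` returns `1` and `encode_time(time(0, 0, 1))` returns `1_000_000_000`. In practice, a library such as pyarrow performs these conversions when writing columns with the corresponding Arrow types.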