
Commit 3de7eea

fix: Fix incorrect primitive type detection (#122)
Problem
=======
A `typeLength` (and potentially a `precision`) whose value is "null" causes incorrect primitive type detection.

Solution
========
Handle null values so that when the `typeLength` or `precision` field is "null", the primitive type is detected as "INT64".

Steps to Verify:
The bug reproduces when the parquet file contains a Dictionary_Page with an INT64 field whose typeLength is null upon read. Unfortunately, I don't have such a test file for now. My debugging was based on a piece of privately shared data from our customer.

When the bug reproduces, the primitive type parsed from the schema (Fixed_Length_Byte_Array) doesn't match the primitive type discovered from the column data (Int64). Because the library decodes page types differently, when the data is in a Dictionary_Page the decoding logic hits the check for `typeLength` and fails.

For Data_Page and Data_Page_V2, decoding ignores the schema and prefers the primitive type inferred from the column data. For Dictionary_Page, however, decoding uses the primitive type specified in the schema.

decodeDataPageV2
https://github.com/LibertyDSNP/parquetjs/blob/91fc71f262c699fdb5be50df2e0b18da8acf8e19/lib/reader.ts#L1104

decodeDictionaryPage
https://github.com/LibertyDSNP/parquetjs/blob/91fc71f262c699fdb5be50df2e0b18da8acf8e19/lib/reader.ts#L947

Notice that one uses "opts.type" while the other uses "opts.column.primitiveType".
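To make the discrepancy described above concrete, here is a minimal sketch of the two decode paths. Every name in it (`DecodeOptions`, `decodeValues`, `decodeDataPageSketch`, `decodeDictionaryPageSketch`) is a hypothetical stand-in and not the library's actual code; only the `opts.type` versus `opts.column.primitiveType` distinction comes from the commit message, and the exact form of the `typeLength` check is an assumption.

```typescript
// Hypothetical, simplified model of the two decode paths (not the real reader.ts code).
type PrimitiveType = 'INT64' | 'FIXED_LEN_BYTE_ARRAY';

interface ColumnSchema {
  primitiveType: PrimitiveType;   // type derived from the schema
  typeLength?: number | null;     // may arrive as null when read from a file
}

interface DecodeOptions {
  type: PrimitiveType;            // type inferred from the column data
  column: ColumnSchema;
}

// Assumed stand-in for a value decoder that insists on typeLength for fixed-length arrays.
function decodeValues(type: PrimitiveType, typeLength?: number | null): string {
  if (type === 'FIXED_LEN_BYTE_ARRAY' && (typeLength === undefined || typeLength === null)) {
    throw new Error('missing typeLength for FIXED_LEN_BYTE_ARRAY');
  }
  return `decoded as ${type}`;
}

// Data pages: trust the type inferred from the column data.
function decodeDataPageSketch(opts: DecodeOptions): string {
  return decodeValues(opts.type, opts.column.typeLength);
}

// Dictionary pages: trust the type recorded in the schema.
function decodeDictionaryPageSketch(opts: DecodeOptions): string {
  return decodeValues(opts.column.primitiveType, opts.column.typeLength);
}

// An INT64 column whose schema was misdetected because typeLength was null.
const opts: DecodeOptions = {
  type: 'INT64',
  column: { primitiveType: 'FIXED_LEN_BYTE_ARRAY', typeLength: null },
};
```

Under these assumptions, `decodeDataPageSketch(opts)` succeeds because it never consults the misdetected schema type, while `decodeDictionaryPageSketch(opts)` throws on the `typeLength` check.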
1 parent 91fc71f commit 3de7eea

1 file changed (+2 -2 lines changed)

lib/types.ts

Lines changed: 2 additions & 2 deletions
@@ -20,14 +20,14 @@ interface INTERVAL {
 
 export function getParquetTypeDataObject(type: ParquetType, field?: ParquetField | Options | FieldDefinition): ParquetTypeDataObject {
   if (type === 'DECIMAL') {
-    if (field?.typeLength !== undefined) {
+    if (field?.typeLength !== undefined && field?.typeLength !== null) {
       return {
         primitiveType: 'FIXED_LEN_BYTE_ARRAY',
         originalType: 'DECIMAL',
         typeLength: field.typeLength,
         toPrimitive: toPrimitive_FIXED_LEN_BYTE_ARRAY_DECIMAL
       };
-    } else if (field?.precision !== undefined && field.precision > 18) {
+    } else if (field?.precision !== undefined && field?.precision !== null && field.precision > 18) {
       return {
         primitiveType: 'BYTE_ARRAY',
         originalType: 'DECIMAL',
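
The effect of the two changed guards can be checked in isolation. The sketch below is a reduced model of the DECIMAL branch only: `DecimalFieldLike`, `pickBefore`, and `pickAfter` are hypothetical names introduced for illustration, and the final `INT64` fallback is taken from the commit message rather than from the lines shown in this diff.

```typescript
type DecimalPrimitive = 'FIXED_LEN_BYTE_ARRAY' | 'BYTE_ARRAY' | 'INT64';

// The declared types say "number or absent", but metadata read from a file
// can still carry null at runtime (assumption based on the commit message).
interface DecimalFieldLike {
  typeLength?: number;
  precision?: number;
}

// Guards as they were before the commit: null passes the `!== undefined` test.
function pickBefore(field?: DecimalFieldLike): DecimalPrimitive {
  if (field?.typeLength !== undefined) return 'FIXED_LEN_BYTE_ARRAY';
  if (field?.precision !== undefined && field.precision > 18) return 'BYTE_ARRAY';
  return 'INT64';
}

// Guards as changed by the commit: null is rejected explicitly.
function pickAfter(field?: DecimalFieldLike): DecimalPrimitive {
  if (field?.typeLength !== undefined && field?.typeLength !== null) return 'FIXED_LEN_BYTE_ARRAY';
  if (field?.precision !== undefined && field?.precision !== null && field.precision > 18) return 'BYTE_ARRAY';
  return 'INT64';
}

// Simulates a field decoded from a file where typeLength came back as null.
const fromFile = { typeLength: null, precision: 9 } as unknown as DecimalFieldLike;

console.log(pickBefore(fromFile)); // FIXED_LEN_BYTE_ARRAY
console.log(pickAfter(fromFile));  // INT64
```

Because `null !== undefined` evaluates to true, the old guard treats a null `typeLength` as present and selects `FIXED_LEN_BYTE_ARRAY`; the added `!== null` comparison lets the field fall through to the later branches.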
