Skip to content

[c++] Add nullability support for data type in c++#525

Open
charlesdong1991 wants to merge 3 commits intoapache:mainfrom
charlesdong1991:issue-510
Open

[c++] Add nullability support for data type in c++#525
charlesdong1991 wants to merge 3 commits intoapache:mainfrom
charlesdong1991:issue-510

Conversation

@charlesdong1991
Copy link
Copy Markdown
Contributor

Purpose

Preserve nullability when constructing Fluss schemas through the C++ binding. Previously, the C++ DataType class had no nullable flag, so all types defaulted to nullable regardless of what the user specified.

Linked issue: close #510

Tests

All tests have passed locally

Documentation

Docs are updated accordingly.

Copy link
Copy Markdown
Contributor

@leekeiabstraction leekeiabstraction left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TY for the PR, left a question.

Comment thread bindings/cpp/src/types.rs
Comment on lines +418 to +419
nullable: true,
element_nullable: true,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate why and the impact of hardcoding nullability here? Also other places where element nullability is hardcoded to true.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hardcoding here is on purpose actually, since the spec struct has 2 nullability concept, where nullable is the nullability of this constructed type where element_nullable is for the leaf element of an array.

and function like build_array_type_from_leaf recursively calls to construct leaf which is scaler, so element_nullable is unused.

ArrayWriter it only needs the type for encoding i believe, and putting default as true is aligned with Fluss default, which means all types are nullable unless explicitly marked.

I think probably best to add some comments in case of questions in future, WDYT? @leekeiabstraction

@charlesdong1991
Copy link
Copy Markdown
Contributor Author

thanks for the question. i replied, let me know if further question @leekeiabstraction

if find time, would be great to give a look @fresh-borzoni @leekeiabstraction 🙏

Copy link
Copy Markdown
Contributor

@fresh-borzoni fresh-borzoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charlesdong1991 Ty for the PR, overall LGTM
Left some comments, PTAL

Comment thread bindings/cpp/src/types.rs
pub const DATA_TYPE_BINARY: i32 = 16;
pub const DATA_TYPE_ARRAY: i32 = 17;

struct FfiDataTypeSpec {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this struct is the right shape. One struct lumps two jobs (build scalar / build array). Most call sites zero out the majority of the fields. Shall we split into separate Scalar/Array variants?

Comment thread bindings/cpp/src/lib.rs
element_data_type: i32,
element_precision: i32,
element_scale: i32,
element_nullable: bool,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we split FfiDataTypeSpec to Scalar/Array, this field can go away too = just make array_nullability one entry longer (last = leaf)

bool nullable = true;
if (static_cast<size_t>(i) < flat.array_nullability.size()) {
nullable = flat.array_nullability[static_cast<size_t>(i)] != 0;
} else if (i == 0) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but do we have the case for this?

Comment thread bindings/cpp/src/types.rs
array_nesting: u32,
array_nullability: Vec<u8>,
nullable: bool,
element_nullable: bool,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it only for backward compatibility, mb we can drop it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[c++] Add nullability support to DataType in C++ binding

3 participants