[FLINK-39558][table] LogicalUnnestRule: use Calcite Uncollect rowType instead of LogicalType round-trip#28053
Draft
jnh5y wants to merge 2 commits intoapache:masterfrom
Draft
[FLINK-39558][table] LogicalUnnestRule: use Calcite Uncollect rowType instead of LogicalType round-trip#28053jnh5y wants to merge 2 commits intoapache:masterfrom
jnh5y wants to merge 2 commits intoapache:masterfrom
Conversation
Adds two tests against the existing nested_not_null.business_data column,
covering LEFT-JOIN UNNEST shapes that hit RelOptUtil.verifyTypeEquivalence
("Cannot add expression of different type to set"):
- testNullMismatchLeftJoinNoAliasList: bare Uncollect under LEFT (no
column-list alias inserts no Project).
- testNullMismatchLeftJoinOnPredicate: ON-clause predicate adds a
LogicalFilter between the LEFT correlate and the Uncollect.
Generated-by: claude-opus-4-7
Replace the LogicalType round-trip in LogicalUnnestRule with Calcite's
native Uncollect.deriveRowType output. The previous code derived the
table-function-scan rowType via:
Calcite RelDataType -> Flink LogicalType ->
UnnestRowsFunctionBase.getUnnestedType(...) ->
Calcite RelDataType (via createFieldTypeFromLogicalType)
This round-trip dropped per-field nullability information that Calcite's
upstream Correlate had derived for the LEFT-JOIN case, causing
RelOptUtil.verifyTypeEquivalence to fail with "Cannot add expression of
different type to set" whenever the array element was NOT NULL but the
LEFT join made the right-side fields nullable. The previous fix
(getLogicalProjectWithAdjustedNullability) patched up that divergence by
inserting CAST-to-nullable wrappers, but only for shapes where a
LogicalProject sat between the Correlate and the Uncollect; bare
Uncollect, Filter(Uncollect), and Filter(Project(Uncollect)) shapes were
not covered.
Using uncollect.getRowType() makes the rewritten Correlate's derived
rowType match the original byte-for-byte, eliminating the divergence at
the source. The patchwork helper and its dependent imports are removed.
Field naming follows Calcite's Uncollect convention now:
* ARRAY/MULTISET<T>: element column is named after the source array
column with a numeric suffix to disambiguate (e.g. "tags0" instead of
the synthetic "f0" / "EXPR$0").
* MAP<K,V>: key/value columns are named "KEY"/"VALUE" instead of
"f0"/"f1".
* WITH ORDINALITY: ordinality column is named "ORDINALITY" (unchanged).
Recorded plan fixtures in UnnestTest.xml, LogicalUnnestRuleTest.xml, and
MultiJoinTest.xml are updated to reflect the new naming. The runtime
INTERNAL_UNNEST_ROWS function is positional, so existing CompiledPlan
instances continue to restore correctly (verified by
CorrelateRestoreTest).
Generated-by: claude-opus-4-7
Collaborator
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What is the purpose of the change
Fixes an internal-error class in
LogicalUnnestRulewhere theTableFunctionScanrowType diverges from what Calcite derives for the originalCorrelate(Uncollect)tree, causingRelOptUtil.verifyTypeEquivalenceto fail forLEFT JOIN UNNESTshapes that don't fit the FLINK-33217 patch path.Repro (fails on
release-2.0/ master, passes with this PR):Both crash with
java.lang.AssertionError: Cannot add expression of different type to set.Jira: FLINK-39558
Brief change log
UnnestRowsFunctionBase.getUnnestedType(...)round-trip inLogicalUnnestRulewith Calcite'suncollect.getRowType(). This makes the rewrittenCorrelate's derived rowType match the original byte-for-byte.getLogicalProjectWithAdjustedNullabilityandcreateNullableTypehelpers (FLINK-33217's CAST-to-nullable patchwork is no longer needed because the divergence at the source is gone).Uncollect,Filter(Uncollect)).UnnestTest.xml(batch + stream),LogicalUnnestRuleTest.xml,MultiJoinTest.xml, andJavaCatalogTableTest.xmlto reflect Calcite-derived field naming.Field naming change
Calcite's
Uncollectderives field names from the source array, so plan output for UNNEST columns changes:ARRAY<T>/MULTISET<T>: the unnested column is named after the source array column (e.g.,tags0instead of syntheticf0/EXPR$0).MAP<K,V>: key/value columns are namedKEYandVALUEinstead off0/f1.WITH ORDINALITY: ordinality column is namedORDINALITY(unchanged).Multiple unnests in the same query are auto-disambiguated by Calcite's outer Correlate (e.g. two MAP unnests produce
KEY, VALUE, KEY0, VALUE0).The runtime
INTERNAL_UNNEST_ROWSfunction is positional, so persistedCompiledPlaninstances continue to restore correctly (verified byCorrelateRestoreTest).Verifying this change
This change adds 2 reproducer tests in commit 1, which fail at HEAD~1 with
RelOptUtil.verifyTypeEquivalenceand pass with the fix in commit 2:UnnestTestBase.testNullMismatchLeftJoinNoAliasListUnnestTestBase.testNullMismatchLeftJoinOnPredicateFull
flink-table-plannersuite passes (10691 tests, 0 failures, 41 skipped).CorrelateRestoreTestpasses, demonstrating that persisted CompiledPlans containing UNNEST continue to restore correctly.Does this pull request potentially affect one of the following parts?
@Public(Evolving): noDocumentation