Skip to content

Bug updates#33

Merged
rbs333 merged 1 commit into
mainfrom
bug/RAAE-1685
May 27, 2026
Merged

Bug updates#33
rbs333 merged 1 commit into
mainfrom
bug/RAAE-1685

Conversation

@rbs333
Copy link
Copy Markdown
Contributor

@rbs333 rbs333 commented May 27, 2026

Fix 10 trial-user findings in the SQL to RediSearch translator

Closes the bugs catalogued in ISSUE_REPORT.md (originally reported in error_spec.md). Every defect now has a dedicated regression test that fails on main and passes here.

Issue-by-issue resolution

Two flavors of resolution are used. Where RediSearch has a faithful translation, the bug is fixed so the query produces the right result. Where RediSearch has no equivalent semantics (e.g., ordering on TAG values), the translator now raises a clear ValueError so users get a fast, accurate signal instead of silently wrong output.

# Severity Reported behavior Worth fixing? Resolution in this PR Notes
1 Critical NOT IN on TAG silently emits the positive match (@status:{a|b}) Yes. RediSearch supports negation as a - prefix; the parser was setting negated=True and the builder was ignoring it. Threaded negated through build_tag_condition and the translator's is_negated. Now emits -@status:{a|b}. Faithful 1:1 mapping; no semantic compromise.
2 Critical NOT BETWEEN on NUMERIC silently emits the in-range query Yes. Same root cause as 1 for numeric ranges. Threaded negated through build_numeric_condition. Now emits -@age:[10 20]. Faithful 1:1 mapping.
3 Critical NOT field > 5 silently emits the positive comparison Yes. Same root cause. Same fix as 2; covers >, >=, <, <=, BETWEEN, IN, =, != for both TAG and NUMERIC. Faithful 1:1 mapping.
4 Critical Literal | in a TAG value is read by RediSearch as a value separator (status = 'a|b' matches a OR b) Yes. | is the IN-list separator at the syntax level, so it must be escaped inside a value. Added | to QueryBuilder.TAG_SPECIAL_CHARS. Now emits @status:{a\\|b} and IN ('x|y','z') correctly produces two values, not three. Cosmetic in indexes that never store pipes; critical anywhere a tag value can contain user input.
5 High IN on NUMERIC crashes with TypeError: float([1,2,3]) Yes. RediSearch has no native NUMERIC IN, but the union of equality ranges is faithful and cheap. The translator expands field IN (a,b,c) to (@field:[a a]|@field:[b b]|@field:[c c]), or to an ANDed set of negated ranges for NOT IN. No more crash. Query string grows linearly with list length. For very large IN lists, a single BETWEEN min AND max may be cheaper if the values are contiguous.
6 High LIKE '' emits @field: (bare colon) which fails at runtime Yes, but no sensible behavior to map to. Empty pattern has no defined match. Raises ValueError with a hint pointing at IS NULL or %. Better than emitting invalid syntax. Behavioral change: callers that previously got a runtime error from Redis now get an earlier ValueError from the translator.
7 High BETWEEN on TAG emits @status:{\\('a'\\, 'z'\\)} (a literal string match on the seven-character token ('a', 'z')) No, this cannot be faithfully fixed. RediSearch TAG fields are an unordered inverted index; there is no < / <= on tags, no collation, no lexicographic range. The previous output was already junk. Raises ValueError explaining the limitation and pointing at IN (...). If the user truly wants lexicographic ranges, the field must be declared as TEXT (or NUMERIC) at index creation. That is a schema decision the translator cannot make.
8 High status = "active" becomes @status:{None} (and crashes on NUMERIC with float(None)) Yes. Users clearly intend a value comparison; the issue is sqlglot parsing "active" as a quoted identifier instead of a literal. _extract_literal_value now treats a quoted exp.Column(Identifier(quoted=True)) as the identifier's string value. status = "active"@status:{active}. Aligns with MySQL's permissive double-quote handling. Standard ANSI SQL reserves "..." for identifiers, so any future strict-mode flag should turn this back into an error.
9 Medium SELECT DISTINCT is silently ignored and returns duplicates Yes. FT.AGGREGATE with GROUPBY on the projected columns produces the same set of distinct rows. Parser detects select.args["distinct"] and promotes the projected columns into groupby_fields, routing through FT.AGGREGATE with no REDUCE. DISTINCT * and DISTINCT mixed with aggregates raise a clear ValueError. Underlying Redis command changes from FT.SEARCH to FT.AGGREGATE. The executor normalizes both into QueryResult, so user-visible row shape is unchanged.
10 Medium Multi-column ORDER BY silently drops every key after the first Yes. FT.AGGREGATE SORTBY accepts multiple keys; we can route there automatically. Added len(parsed.orderby_fields) > 1 to the use_aggregate trigger in _build_command. FT.AGGREGATE emits SORTBY 4 @price ASC @title DESC for two keys. ORDER BY fields are validated by the analyzer and added to LOAD so non-SORTABLE columns work. Tradeoff: FT.AGGREGATE is heavier than FT.SEARCH because it materializes the result pipeline, and LIMIT uses cursor semantics rather than score-ordered windows. For a multi-key sort this is the right tool anyway; the side effect is that SQL queries differing only by a second ORDER BY key now hit a different Redis command.

What changed in code

sql_redis/query_builder.py

  • build_tag_condition and build_numeric_condition now accept negated: bool = False and apply the - prefix to the entire range expression. Existing != behavior is preserved by ORing the two sources.
  • Added | to TAG_SPECIAL_CHARS so a literal pipe inside a TAG value is escaped.
  • build_text_condition rejects empty LIKE patterns with a ValueError.

sql_redis/translator.py

  • _build_condition passes the XOR-resolved is_negated to the tag and numeric builders.
  • IN on a NUMERIC field is expanded to a |-joined union of equality ranges (or an ANDed set of negated equalities for NOT IN).
  • BETWEEN on a TAG field raises ValueError.
  • Multi-key ORDER BY auto-routes to FT.AGGREGATE (added to the use_aggregate condition in _build_command). ORDER BY fields are added to the LOAD set so non-SORTABLE columns sort correctly.

sql_redis/parser.py

  • _extract_literal_value treats a quoted identifier (exp.Column(Identifier(quoted=True))) as the string value of the identifier.
  • ParsedQuery gained a distinct: bool field. SELECT DISTINCT promotes the projected columns to groupby_fields. SELECT DISTINCT * and DISTINCT mixed with aggregates raise.

sql_redis/analyzer.py

  • ORDER BY fields are now added to referenced_fields so they're validated against the schema and available in field_types for the LOAD step.
  • The alias set used to skip schema lookups was broadened to include score(), vector-distance, and aggregation aliases. Previously these were not in the set because no path put them in referenced_fields; now that ORDER BY does, the alias set has to be complete.

Tests

24 new tests, all failing before this change and passing after:

  • tests/test_query_builder.py::TestQueryBuilderNegationBug (9 tests) covers the builder-level negation contract for tag and numeric.
  • tests/test_query_builder.py::TestQueryBuilderTagPipeEscape (2 tests) covers literal pipe escaping in both scalar and IN-list values.
  • tests/test_translator.py::TestTranslatorTrialUserBug (13 tests) covers the end-to-end SQL to RediSearch translation for every case from the bug report, including the FT.AGGREGATE routing and LOAD behavior for multi-key ORDER BY.

Full suite: 550 passed.

Behavioral changes that callers may notice

Input Before After
LIKE '' Invalid query at runtime ValueError
BETWEEN on TAG Garbage query, no matches ValueError
Multi-key ORDER BY Silently dropped trailing keys (FT.SEARCH) All keys honored via FT.AGGREGATE SORTBY
SELECT DISTINCT col FT.SEARCH with duplicates FT.AGGREGATE with GROUPBY
status = "active" @status:{None} @status:{active}
NOT field > x on NUMERIC Positive match (wrong) Negated match (correct)
NOT IN on TAG Positive match (wrong) Negated match (correct)
status = 'a|b' on TAG Matches a OR b Matches literal a|b
IN on NUMERIC TypeError Union of equality ranges

No existing test required updating; all changes preserved the prior contract where it was correct.

Files touched

  • sql_redis/query_builder.py
  • sql_redis/translator.py
  • sql_redis/parser.py
  • sql_redis/analyzer.py
  • tests/test_query_builder.py
  • tests/test_translator.py

@rbs333 rbs333 requested a review from nkanu17 May 27, 2026 12:00
@rbs333 rbs333 added the auto:release Create a release when this PR is merged label May 27, 2026
@rbs333 rbs333 merged commit 4063514 into main May 27, 2026
8 checks passed
@rbs333 rbs333 deleted the bug/RAAE-1685 branch May 27, 2026 16:34
@github-actions
Copy link
Copy Markdown
Contributor

🚀 PR was released in v0.6.0 🚀

@github-actions github-actions Bot added the released This issue/pull request has been released. label May 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto:release Create a release when this PR is merged released This issue/pull request has been released.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant