fix(shaclgen): emit sh:pattern for pattern constraints inside any_of #13

Open
jdsika wants to merge 17 commits into main from fix/shaclgen-any-of-pattern

jdsika commented May 7, 2026

Summary

The SHACL generator (gen-shacl) drops pattern constraints declared inside any_of branches,
producing empty blank nodes [ ] instead of the intended [ sh:pattern "..." ].

Root Cause

In shaclgen.py, the `for any in s.any_of:` loop (line ~202) dispatches exclusively on
any.range, resolving it as a class, type, enum, or simple datatype. The pattern attribute
of each AnonymousSlotExpression is never read.

When a branch specifies pattern: without range:, any.range is None, so
add_simple_data_type(func, None) finds no matching datatype and emits an empty blank node.

The top-level s.pattern check (line ~267) runs only in the else branch, i.e. when any_of is
absent, so it never sees the patterns declared inside the branches.
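To make the failure mode concrete, here is a stdlib-only model of the pre-fix dispatch. It is not the shaclgen code: `Branch`, `KNOWN_DATATYPES` and `build_or_nodes_buggy` are hypothetical stand-ins, each SHACL blank node is represented as a plain dict of predicate to object, and only the simple-datatype path of the real dispatch is modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Branch:
    """Hypothetical stand-in for one any_of AnonymousSlotExpression."""
    range: Optional[str] = None
    pattern: Optional[str] = None


# Hypothetical subset of the simple-datatype lookup (range name -> XSD datatype).
KNOWN_DATATYPES = {"string": "xsd:string", "uri": "xsd:anyURI"}


def build_or_nodes_buggy(branches):
    """Mimic the pre-fix loop: only `range` is consulted, `pattern` is ignored."""
    range_list = []
    for branch in branches:
        node = {}  # stands in for the fresh BNode appended to range_list
        datatype = KNOWN_DATATYPES.get(branch.range)  # None when range is None
        if datatype is not None:
            node["sh:datatype"] = datatype
        range_list.append(node)
    return range_list


nodes = build_or_nodes_buggy([
    Branch(range="uri"),
    Branch(pattern=r"^LicenseRef-[a-zA-Z0-9\-\.]+$"),
])
print(nodes)  # [{'sh:datatype': 'xsd:anyURI'}, {}]  <- second node is the empty [ ]
```

The pattern-only branch still produces a node, but nothing is ever written into it, which is exactly the `[ ]` seen in the generated Turtle.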

Fix

Inside the any_of loop, right after the range dispatch appends a BNode to range_list, propagate
the branch's pattern onto that node:

if any.pattern:
    g.add((range_list[-1], SH.pattern, Literal(any.pattern)))

This correctly handles:

  • Pattern-only branches (no range): node gets only sh:pattern
  • Range + pattern branches: node gets both sh:datatype and sh:pattern
  • Range-only branches (no pattern): unchanged behaviour
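A stdlib-only model of the loop with the one-line fix applied shows how each of the three cases comes out. `Branch`, `KNOWN_DATATYPES` and `build_or_nodes` are hypothetical names, not shaclgen's API, and each node is modelled as a dict; the real change adds a `(node, SH.pattern, Literal(...))` triple to the rdflib graph instead of a dict entry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Branch:
    """Hypothetical stand-in for one any_of AnonymousSlotExpression."""
    range: Optional[str] = None
    pattern: Optional[str] = None


KNOWN_DATATYPES = {"string": "xsd:string", "uri": "xsd:anyURI"}  # illustrative subset


def build_or_nodes(branches):
    range_list = []
    for branch in branches:
        node = {}  # stands in for the BNode created by the range dispatch
        datatype = KNOWN_DATATYPES.get(branch.range)
        if datatype is not None:
            node["sh:datatype"] = datatype
        range_list.append(node)
        # The fix: after the range dispatch, copy the branch's pattern
        # onto the node that was just appended.
        if branch.pattern:
            range_list[-1]["sh:pattern"] = branch.pattern
    return range_list


LICENSE_REF = r"^LicenseRef-[a-zA-Z0-9\-\.]+$"
pattern_only, range_and_pattern, range_only = build_or_nodes([
    Branch(pattern=LICENSE_REF),                 # pattern-only branch
    Branch(range="string", pattern=LICENSE_REF),  # range + pattern branch
    Branch(range="uri"),                         # range-only branch
])
print(pattern_only)
print(range_and_pattern)
print(range_only)
```

The pattern-only node carries just sh:pattern, the range + pattern node carries sh:datatype and sh:pattern, and the range-only node is identical to what the pre-fix loop produced.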

Motivation: SPDX LicenseRef-

The SPDX specification v2.3 (Annex D) defines custom license identifiers:

license-ref = ["DocumentRef-"(idstring)":"]"LicenseRef-"(idstring)
idstring    = 1*(ALPHA / DIGIT / "-" / ".")
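As a rough cross-check, the ABNF can be transcribed into a Python regular expression and compared with the pattern used in this PR. This is a sketch under assumptions: the names are illustrative, the transcription is case-sensitive like the PR's pattern, and the sample values are made up for the comparison.

```python
import re

# idstring = 1*(ALPHA / DIGIT / "-" / ".")
IDSTRING = r"[A-Za-z0-9.\-]+"
# license-ref = ["DocumentRef-"(idstring)":"]"LicenseRef-"(idstring)
LICENSE_REF_ABNF = re.compile(rf"(?:DocumentRef-{IDSTRING}:)?LicenseRef-{IDSTRING}")
# The narrower pattern used in this PR: no DocumentRef- prefix allowed.
LICENSE_REF_PR = re.compile(r"^LicenseRef-[a-zA-Z0-9\-\.]+$")

samples = [
    "LicenseRef-Custom-Commercial-Agreement",
    "DocumentRef-spdx-tool-1.2:LicenseRef-MIT-Style-2",
    "LicenseRef-",  # idstring needs at least one character
    "MIT",          # a standard SPDX identifier, not a license-ref
]
# value -> (matches the full ABNF?, matches the PR's pattern?)
results = {
    v: (LICENSE_REF_ABNF.fullmatch(v) is not None, LICENSE_REF_PR.search(v) is not None)
    for v in samples
}
for value, (abnf_ok, pr_ok) in results.items():
    print(f"{value!r}: abnf={abnf_ok} pr={pr_ok}")
```

The two expressions agree on every sample except the DocumentRef-qualified reference, which the full grammar accepts and the `^LicenseRef-...$` pattern does not.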

Schemas modelling SPDX-compliant license fields need any_of with:

  1. An enum of standard SPDX identifiers
  2. A URI range (for externally-defined licenses)
  3. A pattern ^LicenseRef-[a-zA-Z0-9\-\.]+$ for custom identifiers

Without this fix, branch (3) generates [ ], which every value satisfies. Since sh:or only needs
one branch to match, the whole constraint then accepts ANY value instead of restricting it as intended.
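The effect follows from sh:or semantics: a value conforms if it conforms to at least one listed shape, and a shape with no constraints is conformed to by every value. The stdlib-only sketch below evaluates a three-branch disjunction in both variants. The enum subset, the dict-based node model and the IRI check (`looks_like_iri`) are simplified assumptions for illustration, not real SHACL or SPDX tooling.

```python
import re

# Illustrative subset of standard SPDX license identifiers (branch 1).
SPDX_IDS = {"MIT", "Apache-2.0", "BSD-3-Clause"}


def looks_like_iri(value):
    """Simplified stand-in for the URI-range branch (branch 2); an assumption."""
    return value.startswith(("http://", "https://"))


def node_conforms(value, node):
    """Evaluate one sh:or member modelled as a dict of constraints.
    An empty dict has no constraints, so every value conforms to it."""
    if "in" in node and value not in node["in"]:
        return False
    if "iri" in node and not looks_like_iri(value):
        return False
    if "pattern" in node and re.search(node["pattern"], value) is None:
        return False
    return True


def or_conforms(value, nodes):
    # sh:or: the value conforms if at least one member shape is satisfied.
    return any(node_conforms(value, node) for node in nodes)


LICENSE_REF = r"^LicenseRef-[a-zA-Z0-9\-\.]+$"
before_fix = [{"in": SPDX_IDS}, {"iri": True}, {}]
after_fix = [{"in": SPDX_IDS}, {"iri": True}, {"pattern": LICENSE_REF}]

values = ["MIT", "LicenseRef-Custom-Commercial-Agreement", "not a license"]
# value -> (conforms before the fix?, conforms after the fix?)
results = {v: (or_conforms(v, before_fix), or_conforms(v, after_fix)) for v in values}
for value, (before, after) in results.items():
    print(f"{value!r}: before={before} after={after}")
```

Both variants accept the enum member and the LicenseRef- identifier, but only the fixed disjunction rejects "not a license".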

Real-World Validation

This fix has been validated end-to-end with the Gaia-X Trust Framework ontology
(gx:VirtualResourceShape license constraint), where the pattern branch is used to accept
LicenseRef-Custom-Commercial-Agreement for proprietary simulation assets in the ENVITED-X
data space.

The corresponding upstream model change that depends on this fix:

After regenerating GX artifacts with this fix, all 14 gx:license property shapes now
produce [ sh:pattern "^LicenseRef-[a-zA-Z0-9\\-\\.]+$" ] instead of [ ], and 45+
real-world simulation assets pass SHACL validation.

Test Coverage

New test schema (any_of_pattern.yaml) exercises all three cases with assertions on:

  • Generated RDF triples (sh:pattern present/absent on correct branch nodes)
  • pyshacl validation (conforming and non-conforming data)

All existing tests pass unchanged; with the new test, test_shaclgen.py goes from 23 to 24 tests.

Specification References

AI Disclosure

This fix was developed with AI assistance (GitHub Copilot) for code analysis, test generation,
and commit message drafting. All changes were reviewed, validated with pyshacl against the
Gaia-X Trust Framework model, and are fully understood by the author.

jdsika force-pushed the fix/shaclgen-any-of-pattern branch 3 times, most recently from 6630206 to f9c16d5 on May 7, 2026 12:13
The error is explained in the comment: it's spurious, and it's annoying to have to wait on the PURL system being updated.
cmungall and others added 3 commits May 7, 2026 10:52
Co-authored-by: Kevin Schaper <kevinschaper@gmail.com>
Co-authored-by: Corey Cox <69321580+amc-corey-cox@users.noreply.github.com>
jdsika force-pushed the fix/shaclgen-any-of-pattern branch 2 times, most recently from 7197c74 to 95f0e86 on May 7, 2026 17:12
The SHACL generator translated any_of branches by dispatching
solely on `any.range` (class, type, enum, or simple datatype).
If a branch specified `pattern:`, either alone or combined
with a range, the constraint was silently dropped, producing
an empty blank node `[ ]` (trivially satisfied) instead of the
intended `[ sh:pattern "..." ]`.

This is a problem for schemas that use pattern alternatives in
`any_of`, such as the SPDX license field where valid values are
either members of a fixed enum (SPDX identifiers), IRIs, or
custom identifiers matching the LicenseRef- pattern defined in
SPDX Specification v2.3 Annex D (ABNF: license-ref =
["DocumentRef-"(idstring)":"]"LicenseRef-"(idstring)).

The fix adds a single check after the range dispatch:

    if any.pattern:
        g.add((range_list[-1], SH.pattern, Literal(any.pattern)))

This correctly handles:
- Pattern-only branches (no range): node gets only sh:pattern
- Range + pattern branches: node gets both sh:datatype and sh:pattern
- Range-only branches (no pattern): unchanged behaviour

The test suite now includes a dedicated schema exercising all
three cases, with assertions on both the generated RDF triples
and pyshacl validation of conforming/non-conforming data.

Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
jdsika force-pushed the fix/shaclgen-any-of-pattern branch from 95f0e86 to 097f25c on May 7, 2026 20:14