fix(shaclgen): emit sh:pattern for pattern constraints inside any_of #13
Open
Co-authored-by: Patrick Kalita <pkalita@lbl.gov>
…rrides Co-authored-by: Kevin Schaper <kevinschaper@gmail.com>
fix(excelgen): move workbook.save outside loop
The error is explained in the comment: it's spurious, and it is annoying to have to rely on the PURL system being updated.
Co-authored-by: Kevin Schaper <kevinschaper@gmail.com> Co-authored-by: Corey Cox <69321580+amc-corey-cox@users.noreply.github.com>
The SHACL generator translated any_of branches by dispatching
solely on `any.range` (class, type, enum, or simple datatype).
If a branch specified `pattern:` — either alone or combined
with a range — the constraint was silently dropped, producing
an empty blank node `[ ]` (trivially satisfied) instead of the
intended `[ sh:pattern "..." ]`.
This is a problem for schemas that use pattern alternatives in
`any_of`, such as the SPDX license field where valid values are
either members of a fixed enum (SPDX identifiers), IRIs, or
custom identifiers matching the LicenseRef- pattern defined in
SPDX Specification v2.3 Annex D (ABNF: license-ref =
["DocumentRef-"(idstring)":"]"LicenseRef-"(idstring)).
The fix adds a single check after the range dispatch:
    if any.pattern:
        g.add((range_list[-1], SH.pattern, Literal(any.pattern)))
This correctly handles:
- Pattern-only branches (no range): node gets only sh:pattern
- Range + pattern branches: node gets both sh:datatype and sh:pattern
- Range-only branches (no pattern): unchanged behaviour
The test suite now includes a dedicated schema exercising all
three cases, with assertions on both the generated RDF triples
and pyshacl validation of conforming/non-conforming data.
Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
Summary
The SHACL generator (`gen-shacl`) drops `pattern` constraints declared inside `any_of` branches, producing empty blank nodes `[ ]` instead of the intended `[ sh:pattern "..." ]`.
Root Cause
In `shaclgen.py`, the `for any in s.any_of:` loop (line ~202) dispatches exclusively on `any.range`, resolving it as a class, type, enum, or simple datatype. The `pattern` attribute of each `AnonymousSlotExpression` is never read.
When a branch specifies `pattern:` without `range:`, `any.range` is `None`, so `add_simple_data_type(func, None)` finds no matching datatype and emits an empty blank node.
The top-level `s.pattern` check (line ~267) only runs in the `else` branch, when `any_of` is absent.
Fix
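After the range dispatch loop appends a BNode to `range_list`, propagate the pattern by adding an `sh:pattern` triple to the last node in `range_list` whenever `any.pattern` is set.
This correctly handles:
- Pattern-only branches (no range): node gets only `sh:pattern`
- Range + pattern branches: node gets both `sh:datatype` and `sh:pattern`
- Range-only branches (no pattern): unchanged behaviour
Motivation: SPDX LicenseRef-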
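The SPDX specification v2.3 (Annex D) defines custom license identifiers:
`license-ref = ["DocumentRef-"(idstring)":"]"LicenseRef-"(idstring)`
Schemas modelling SPDX-compliant license fields need `any_of` with three branches: (1) a fixed enum of SPDX identifiers, (2) IRIs, and (3) the pattern `^LicenseRef-[a-zA-Z0-9\-\.]+$` for custom identifiers.
Without this fix, branch (3) generates `[ ]`, which is trivially satisfied by any value, making the constraint weaker than intended.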
Real-World Validation
This fix has been validated end-to-end with the Gaia-X Trust Framework ontology (`gx:VirtualResourceShape` license constraint), where the pattern branch is used to accept `LicenseRef-Custom-Commercial-Agreement` for proprietary simulation assets in the ENVITED-X data space.
The corresponding upstream model change that depends on this fix:
- `fix/spdx-license-ref-pattern`: adds `pattern: "^LicenseRef-[a-zA-Z0-9\\-\\.]+$"` as a third `any_of` branch in the `license` slot definition (GitLab MR pending for upstream service-characteristics).
After regenerating GX artifacts with this fix, all 14 `gx:license` property shapes now produce `[ sh:pattern "^LicenseRef-[a-zA-Z0-9\\-\\.]+$" ]` instead of `[ ]`, and 45+ real-world simulation assets pass SHACL validation.
Test Coverage
New test schema (`any_of_pattern.yaml`) exercises all three cases with assertions on:
- Generated RDF triples (`sh:pattern` present/absent on correct branch nodes)
- pyshacl validation of conforming/non-conforming data
All existing tests pass unchanged (23→24 in `test_shaclgen.py`).
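The three cases can be illustrated with a simplified, dependency-free model of the per-branch logic. This is a sketch only: `Branch` and `branch_constraints` are hypothetical stand-ins, not the actual `shaclgen.py` code or LinkML's `AnonymousSlotExpression`, and the range dispatch is reduced to a simple-datatype lookup that prefixes the range name with `xsd:`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Branch:
    """Hypothetical stand-in for one any_of branch."""
    range: Optional[str] = None
    pattern: Optional[str] = None


def branch_constraints(branch: Branch) -> dict:
    """Return the constraints one branch node would carry, as a plain dict."""
    node = {}
    # Range dispatch, reduced here to the simple-datatype case (an assumption).
    if branch.range is not None:
        node["sh:datatype"] = "xsd:" + branch.range
    # The added check: propagate the pattern whether or not a range is set.
    if branch.pattern:
        node["sh:pattern"] = branch.pattern
    return node


pattern_only = branch_constraints(Branch(pattern="^LicenseRef-[a-zA-Z0-9\\-\\.]+$"))
range_and_pattern = branch_constraints(Branch(range="string", pattern="^[A-Z]+$"))
range_only = branch_constraints(Branch(range="string"))

print(pattern_only)
print(range_and_pattern)
print(range_only)
```

Without the second `if`, the pattern-only branch would yield an empty dict, which corresponds to the empty blank node `[ ]` described in the summary.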
AI Disclosure
This fix was developed with AI assistance (GitHub Copilot) for code analysis, test generation,
and commit message drafting. All changes were reviewed, validated with pyshacl against the
Gaia-X Trust Framework model, and are fully understood by the author.