Skip to content

Preserve is_import for imports whose path matches a path filter#4501

Closed
CGA1123 wants to merge 2 commits intobufbuild:mainfrom
CGA1123:cga1123/paths-filter-is-import
Closed

Preserve is_import for imports whose path matches a path filter#4501
CGA1123 wants to merge 2 commits intobufbuild:mainfrom
CGA1123:cga1123/paths-filter-is-import

Conversation

@CGA1123
Copy link
Copy Markdown
Contributor

@CGA1123 CGA1123 commented Apr 23, 2026

ImageWithOnlyPaths and ImageWithOnlyPathsAllowNotExist were clobbering a file's is_import=true flag whenever its path happened to match one of the filter entries, reclassifying it as a target in the output.

Transitive imports of actual targets are still pulled in by addFileWithImports via FileDescriptorProto.GetDependency(), so files that are imports of matched targets continue to appear in the filtered image (flagged as imports), and --include-imports behaviour is unchanged.

CGA1123 added 2 commits April 23, 2026 16:17
ImageWithOnlyPaths and ImageWithOnlyPathsAllowNotExist were clobbering
a file's is_import=true flag whenever its path happened to match one
of the filter entries, reclassifying it as a target in the output.

Transitive imports of actual targets are still pulled in by
addFileWithImports via FileDescriptorProto.GetDependency(), so files
that are imports of matched targets continue to appear in the filtered
image (flagged as imports), and --include-imports behaviour is
unchanged.
"google/protobuf/*" files are imports from the googleapis module, not
sourced from it, update the tests to reflect this.
Copy link
Copy Markdown
Member

@doriable doriable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I believe the current code path is intended behaviour, in that it specifically clobbers that flag for inputs that did not come from a workspace, in order to treat all paths from the --path flag as targets.

That being said, I understand the reasoning behind this PR, in that now there's a gap between the module's understanding of "what is an import" vs. the image compiled. And I think to address this gap, we would need to do a broader clean-up.

If you don't mind sharing, is there a specific problem you are targeting with this PR?

@CGA1123
Copy link
Copy Markdown
Contributor Author

CGA1123 commented Apr 27, 2026

If you don't mind sharing, is there a specific problem you are targeting with this PR?

Of course!

I noticed a diff when generating some protos across git_repo and binary_image input for the same underlying module, which was confusing/unexpected.

We generally use git_repo as the canonical source for our internal protobuf definitions, but have some wrappers which swap those git_repo input definitions out for a locally cached binary_image instead (git fetches can be painfully slow), which is how I discovered this & the problem I'm hoping to address!

@doriable
Copy link
Copy Markdown
Member

thank you for providing such quick feedback! sorry, just a couple of follow-up/clarifying questions, since i want to make sure i have the correct full picture the ascertain where the bug is coming from ^^" the workflow makes sense to me, using a cached image instead of the git_repo input for CI build times.

i wanted to double check the config being used. stepping back slightly, the premise of all of this is basically, for each path that is passed either through the --path flag or the paths input key config is being considered a target path. in the workspace build/compilation workflow, this is handled upfront with the workspace, so we skip this check in the controller.

so it sounds like the cached image includes module dependencies that may have an overlap in the path filter, and so this happens, whereas in the case of building the workspace, the dependencies would be resolved after the workspace build, and so they are excluded by the same path filter... but this is the point i want to verify.

and the issue lies with the fact that the semantics of the image input are inherently different from the semantics of the workspace input when it comes to target paths:

  • it's intuitive for the path filter to include everything that is viable from the image input (since the origins of the image may or may not be from a workspace)
  • it's also intuitive for target paths to only apply to your workspace source, and not concern itself with module dependencies -- those will always be resolved as imports...

so i think, unfortunately, the solution here is actually that when using the binary_image input, the configuration might need to be additionally adjusted, but again, i think this hinges on the fact that i have the right understanding of what is going on...

@CGA1123
Copy link
Copy Markdown
Contributor Author

CGA1123 commented Apr 27, 2026

so it sounds like the cached image includes module dependencies that may have an overlap in the path filter, and so this happens, whereas in the case of building the workspace, the dependencies would be resolved after the workspace build, and so they are excluded by the same path filter... but this is the point i want to verify.

Yes. I would agree with this (at least from my brief understanding of the underlying logic around images & resolution).

so i think, unfortunately, the solution here is actually that when using the binary_image input, the configuration might need to be additionally adjusted

Yeah, I can see that both interpretations can be seen as the path of least-surprise, dependent on what someone is trying to achieve with this combination of option -- not quite straight forward 😄

i think this hinges on the fact that i have the right understanding of what is going on...

If it's useful, I've pushed up https://github.com/CGA1123/buf-gen-paths with a reproduce.sh which exhibits the behaviour with a local directory input and a binary_image input. It's a silly example of clashing protobuf paths under google/protobuf/, but it exercises either side of the two paths you described.

The same can be seen via buf ls-files, for example:

$ buf --version
1.68.4
$ git clone https://github.com/CGA1123/buf-gen-paths && cd buf-gen-paths
$ buf ls-files . --path google/protobuf
google/protobuf/local.proto
$ buf build -o image.binpb
$ buf ls-files image.binpb --path google/protobuf
google/protobuf/local.proto
google/protobuf/timestamp.proto

@doriable
Copy link
Copy Markdown
Member

doriable commented Apr 27, 2026

the repro is definitely helpful (lol, i get that it's a little bit contrived to use the WKTs path, but as long as this is representative of your setup, that's fine), and i think it validates my understanding of what is going on.

Yeah, I can see that both interpretations can be seen as the path of least-surprise, dependent on what someone is trying to achieve with this combination of option -- not quite straight forward 😄

and unfortunately, to my point on the different semantics of image + workspace, this does result in a change like this being a breaking change against these semantics.

and so i think, to address your use-case, i would do one of the following:

  • either build the image with the target paths (e.g. the --path flag), and so when you use the image, you would remove the configured paths key
  • OR you would need to provide more granular targeting when using the image

i tried both out in your repro and they both yielded the desired result:

  • for option 1, i just change the build step in reproduce.sh to include --path google/protobuf and removed the paths key in buf.gen.binary_image.yaml
  • for option 2, i just augmented buf.gen.binary_image.yaml to target google/protobuf/local.proto instead

@CGA1123
Copy link
Copy Markdown
Contributor Author

CGA1123 commented Apr 28, 2026

That makes sense! Appreciate the clarification 🙇‍♂️

@CGA1123 CGA1123 closed this Apr 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants