Skip to content

Conversation

@mjheilmann
Copy link
Collaborator

@mjheilmann mjheilmann commented Nov 29, 2025

This is an attempt to resolve the circular recursion case, in this PR:

  1. Rework the state to store the descriptor once, so symbols reference a filename instead of holding a duplicate descriptor
  2. Some state simplification, making some operations stricter
  3. Add a function that shrinks the state by combining cycles

When files participate in a cycle, causing the reflection tree to instead be a graph, we combine those participating file descriptors into a single descriptor.

  • minimal change from each-module-as-a-file
  • clients should now resolve schemas that contain cycles

@mjheilmann mjheilmann force-pushed the feature/cleanup_and_recursion branch from 4a2adc2 to 8333e21 Compare December 5, 2025 03:00
@aj-foster
Copy link

aj-foster commented Dec 17, 2025

Hi there 👋🏼

Thank you for working on this. I ran a few tests using this branch, and unfortunately the problem persists. Going to try and share as much information as possible, progressively, as time allows me to investigate further.

Current knowledge: we see infinite recursion in trace_message_refs/3, prior to the code modified in this PR. Still digging, as time allows, to figure out why.

Edit: It looks like the smallest reproducible test case might be any message that references google.protobuf.Struct. This message type is indirectly self-referential.

Edit 2: It's possible that the solution is related to adding a conditional based on State.has_symbol?/2 in the reduce call within trace_message_fields/4:

    |> Enum.reduce(state, fn %{mod: mod, symbol: symbol}, state ->
      symbol = Util.trim_symbol(symbol)

+     if State.has_symbol?(state, symbol) do
+       state
+     else
        response =
          if symbol in nested_types do
            build_response(parent_symbol, module)
          else
            build_response(symbol, mod)
          end

        state
        |> Extensions.add_extensions(symbol, mod)
        |> State.add_symbols(%{symbol => response})
        |> State.add_files(%{(symbol <> ".proto") => response})
        |> trace_message_refs(symbol, mod)
+     end
    end)

Unfortunately, while this addition avoids infinite recursion in my test cases, we get a different error (that may be completely unrelated, because it happens in this branch and on the latest release).

Edit 3: I see now that #59 is a better version of this suggestion.

* Don't build symbols we already have in the state

* update test for file merging in cycles
@mjheilmann
Copy link
Collaborator Author

@aj-foster yeah, I had some work in that direction in #59 already, which I'm embarrassed I didn't think to combine these before your comment. They are both in this PR now. It does make sense that we would need both solutions as the recursion occurs in two places.

Thanks for helping keep an eye on this, it is really appreciated.

@aj-foster
Copy link

Hello again,

I've encountered two problems since the recursion changes were merged in here. Will attempt to describe:


First, grpcurl did not enjoy the way nested types were communicated. We make extensive use of nested message types, for example a MyEndpoint message with nested Request and Reply messages. grpcurl appears to naively combines the package and name fields when interpreting a descriptor, meaning myservice.MyEndpoint.Reply was being treated like myservice.Reply.

Here's an example error message:

Failed to list methods for service "myservice.MyEndpoint": proto: message field "myservice.Reply.some_field" cannot resolve type: "myservice.MyEndpoint.Reply.SomeFieldType" not found

The following diff worked to solve this issue:

# builder.ex

   defp build_response(symbol, module) do
     # ...

+    package = String.trim_trailing(symbol, descriptor.name) |> String.trim_trailing(".")
     syntax = Util.get_syntax(module)

     response_stub =
       %FileDescriptorProto{
         name: symbol <> ".proto",
+        package: package,
-        package: Util.get_package(symbol),
         dependency: dependencies,
         syntax: syntax
       }

With the above fix in place, it looks like grpcurl is unable to handle recursive message types. It's completely possible that this is an issue with grpcurl and there's nothing to be done on the server side. Any service that includes google.protobuf.Struct (and its recursively-defined types google.protobuf.Value and google.protobuf.ListValue) fails to reflect with a stack overflow:

        github.com/jhump/protoreflect@v1.17.0/grpcreflect/client.go:276 +0x584 fp=0x140204825a0 sp=0x140204824d0 pc=0x10468c134
github.com/jhump/protoreflect/grpcreflect.(*Client).descriptorFromProto(0x140002fa600, 0x14000022a00)
        github.com/jhump/protoreflect@v1.17.0/grpcreflect/client.go:476 +0x11c fp=0x14020482680 sp=0x140204825a0 pc=0x10468d1ac
github.com/jhump/protoreflect/grpcreflect.(*Client).FileByFilename(0x140002fa600, {0x1400043c540, 0x1f})

Again, this looks like an issue with grpcurl, but since I can't find any issues about others running into the same problem, perhaps there is some way in which this reflection implementation is different from others.

@mjheilmann
Copy link
Collaborator Author

Hmm, nested types shouldn't have been touched, that's a regression. Especially since we should be loading the symbol names from the descriptor and not guessing. We guess the filenames

Grpcurl cannot handle cycles or graphs at all, it can only process trees. But with that regression it is difficult.

I'm hesitant to bring grpcurl or another third party into the integration testing,and this PR id getting too large and complex for my taste.

I'm going to spend some time isolating changes and whittling this scope down. With luck it will make this easier to reason around. It would be ideal to get a response from the upstream dep to let us be correct, but 🤷‍♂️

@mjheilmann mjheilmann mentioned this pull request Dec 31, 2025
@mjheilmann
Copy link
Collaborator Author

@aj-foster Now that I'm at a keyboard and not on my phone, I've re-read your message and can respond more completely.

If you wouldn't mind:

  • take a quick look and see if our test protos cover your cases? I believe that I have one recursive proto, and another proto that leverages nested messages
  • if there are questions around grpcurl, try using postman? I'd wager if the problem is in this lib postman should have the same problems. Postman should be v1alpha instead of v1, but the contents are all identical, it's just the message/structure names used.

Did the nested type problem only start having trouble in this branch? Or is it on main or one of the releases too? I am very interested in how it seems to have appeared for you recently.

@mjheilmann
Copy link
Collaborator Author

I added both a simulated graph traversal, and graph traversal using grpcurl (version 1.9.3) to the existing system. The tests are all currently passing. I'm going to continue to leave this PR open on the off-chance that there is.a bug here that would be introduced

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants