Skip to content

fix: 修复工具调用名重复拼接问题(流式响应chunk重复累积)#7735

Open
Blueteemo wants to merge 6 commits intoAstrBotDevs:masterfrom
Blueteemo:fix/issue-7694-tool-call-name-deduplication
Open

fix: 修复工具调用名重复拼接问题(流式响应chunk重复累积)#7735
Blueteemo wants to merge 6 commits intoAstrBotDevs:masterfrom
Blueteemo:fix/issue-7694-tool-call-name-deduplication

Conversation

@Blueteemo
Copy link
Copy Markdown
Contributor

@Blueteemo Blueteemo commented Apr 22, 2026

问题描述

使用 MiniMax 模型通过 NVIDIA 代理调用时,工具调用名出现重复拼接:

  • astr_kb_searchastr_kb_searchastr_kb_search
  • call_xxxcall_xxxcall_xxx

这是因为流式响应处理过程中 chunk 被重复累积导致的。

修复方案

openai_source.py_parse_openai_completion 方法中,对 tool call 的 idname 增加去重校验。当检测到字符串是由前半段和后半段相同的“自重复”模式构成时(例如 abcabcabc),取前半段作为正确值。

修改内容

  1. ProviderOpenAIOfficial 类中添加 _deduplicate_self_repeating 静态方法,用于检测并修复自重复字符串。
  2. 在解析 tool_calls 时,对 function.nameidextra_content 的 key 都应用去重处理。

关联 Issue

Fixes #7694

Summary by Sourcery

Deduplicate self-repeating tool call identifiers in OpenAI streaming responses to prevent duplicated tool names and IDs caused by chunk accumulation.

Bug Fixes:

  • Fix duplicated tool call function names and IDs produced when handling streaming MiniMax/NVIDIA responses by normalizing self-repeating strings before use.

Enhancements:

  • Introduce a reusable helper to detect and collapse simple self-repeating string patterns when parsing tool calls.

Summary by Sourcery

Deduplicate self-repeating tool call identifiers in OpenAI streaming responses to avoid duplicated names and IDs caused by chunk accumulation.

Bug Fixes:

  • Prevent duplicated tool call function names and IDs when parsing streaming MiniMax/NVIDIA responses by normalizing self-repeating strings before use.

Enhancements:

  • Introduce a helper to collapse simple self-repeating string patterns and apply it to tool call names, IDs, and function arguments during parsing.

@auto-assign auto-assign Bot requested review from Fridemn and LIghtJUNction April 22, 2026 21:12
@dosubot dosubot Bot added size:S This PR changes 10-29 lines, ignoring generated files. area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. labels Apr 22, 2026
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The _deduplicate_self_repeating helper only detects exactly-two-part repetitions (abcabc) and skips shorter values, so if other repetition patterns (e.g. abcabcabc or abab) might occur in streaming, consider either documenting this limitation or generalizing the detection logic to handle arbitrary repeated substrings.
  • When deduplicating tool_call.id, None is converted to an empty string; if downstream logic distinguishes between None and a real (even if malformed) id, it may be safer to short-circuit and skip deduplication when id is falsy instead of normalizing to "".
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `_deduplicate_self_repeating` helper only detects exactly-two-part repetitions (`abcabc`) and skips shorter values, so if other repetition patterns (e.g. `abcabcabc` or `abab`) might occur in streaming, consider either documenting this limitation or generalizing the detection logic to handle arbitrary repeated substrings.
- When deduplicating `tool_call.id`, `None` is converted to an empty string; if downstream logic distinguishes between `None` and a real (even if malformed) id, it may be safer to short-circuit and skip deduplication when `id` is falsy instead of normalizing to `""`.

## Individual Comments

### Comment 1
<location path="astrbot/core/provider/sources/openai_source.py" line_range="883-884" />
<code_context>
+                    func_name_ls.append(
+                        cls._deduplicate_self_repeating(tool_call.function.name)
+                    )
+                    tool_call_ids.append(
+                        cls._deduplicate_self_repeating(tool_call.id or "")
+                    )

</code_context>
<issue_to_address>
**issue (bug_risk):** Converting `None` tool_call IDs to an empty string may change behavior and cause key collisions.

Previously, `tool_call_ids.append(tool_call.id)` preserved `None`, but `tool_call.id or ""` now turns `None` into `""` and uses that as the key in `tool_call_extra_content_dict`. Multiple `None` IDs will all map to `""`, potentially merging distinct entries and changing behavior. Consider either keeping `None` as-is (e.g., branch on `tool_call.id is None`) or updating `_deduplicate_self_repeating` to accept `Optional[str]` and leave `None` unchanged.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/core/provider/sources/openai_source.py Outdated
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a _deduplicate_self_repeating helper method to address streaming duplication issues in tool call names and IDs. Reviewers identified a critical bug where cls is used instead of self in an instance method, which will lead to a runtime error. Furthermore, the deduplication logic is noted to be overly simplistic, potentially affecting valid strings or failing for triple repetitions, and requires unit tests for validation.

Comment thread astrbot/core/provider/sources/openai_source.py
Comment thread astrbot/core/provider/sources/openai_source.py Outdated
Comment thread astrbot/core/provider/sources/openai_source.py Outdated
@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Apr 22, 2026
@Blueteemo
Copy link
Copy Markdown
Contributor Author

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The _deduplicate_self_repeating heuristic is quite aggressive (e.g., it will collapse aaaa to a or abcabcabc to abc), which risks corrupting legitimately repetitive IDs/names; consider constraining it to simple ABAB double-repeat patterns or adding additional guards (e.g., minimum unit size, character set pattern) to better match the known MiniMax/NVIDIA failure mode.
  • You currently recompute _deduplicate_self_repeating(tool_call.id) multiple times; consider deduplicating once into a local deduped_id and reusing it for tool_call_ids and tool_call_extra_content_dict to avoid divergence and keep the logic clearer.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `_deduplicate_self_repeating` heuristic is quite aggressive (e.g., it will collapse `aaaa` to `a` or `abcabcabc` to `abc`), which risks corrupting legitimately repetitive IDs/names; consider constraining it to simple `ABAB` double-repeat patterns or adding additional guards (e.g., minimum unit size, character set pattern) to better match the known MiniMax/NVIDIA failure mode.
- You currently recompute `_deduplicate_self_repeating(tool_call.id)` multiple times; consider deduplicating once into a local `deduped_id` and reusing it for `tool_call_ids` and `tool_call_extra_content_dict` to avoid divergence and keep the logic clearer.

## Individual Comments

### Comment 1
<location path="astrbot/core/provider/sources/openai_source.py" line_range="891-902" />
<code_context>
+                    func_name_ls.append(
+                        self._deduplicate_self_repeating(tool_call.function.name)
+                    )
+                    tool_call_ids.append(self._deduplicate_self_repeating(tool_call.id))

                     # gemini-2.5 / gemini-3 series extra_content handling
                     extra_content = getattr(tool_call, "extra_content", None)
                     if extra_content is not None:
-                        tool_call_extra_content_dict[tool_call.id] = extra_content
+                        deduped_id = self._deduplicate_self_repeating(tool_call.id)
+                        if deduped_id is not None:
+                            tool_call_extra_content_dict[deduped_id] = extra_content
</code_context>
<issue_to_address>
**suggestion:** Reuse the same deduplicated ID instance instead of recomputing it to keep behavior consistent.

The ID is currently deduplicated twice: once when appending to `tool_call_ids` and again when setting `deduped_id` for `tool_call_extra_content_dict`. This risks subtle divergence if `_deduplicate_self_repeating` changes and adds unnecessary work. Compute `deduped_id` once, append it to `tool_call_ids`, and reuse it as the key in `tool_call_extra_content_dict`.

```suggestion
                    args_ls.append(args)
                    func_name_ls.append(
                        self._deduplicate_self_repeating(tool_call.function.name)
                    )
                    deduped_id = self._deduplicate_self_repeating(tool_call.id)
                    tool_call_ids.append(deduped_id)

                    # gemini-2.5 / gemini-3 series extra_content handling
                    extra_content = getattr(tool_call, "extra_content", None)
                    if extra_content is not None and deduped_id is not None:
                        tool_call_extra_content_dict[deduped_id] = extra_content
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/core/provider/sources/openai_source.py Outdated
@zouyonghe
Copy link
Copy Markdown
Member

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • The _deduplicate_self_repeating helper collapses any fully repeating pattern (e.g. ababab, hahahaha), which might unintentionally alter legitimate tool names/IDs/arguments; consider constraining it to the specific duplication pattern you’ve observed (e.g., exact double repetition with a minimum length) or scoping it to known-affected providers.
  • When deduplicating function.arguments before json.loads, this can change the original payload beyond just fixing chunk duplication; if that’s not desired, consider limiting deduplication to identifiers (id, name) and only applying it to arguments when you can positively detect the streaming concatenation pattern.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `_deduplicate_self_repeating` helper collapses any fully repeating pattern (e.g. `abab``ab`, `hahaha``ha`), which might unintentionally alter legitimate tool names/IDs/arguments; consider constraining it to the specific duplication pattern you’ve observed (e.g., exact double repetition with a minimum length) or scoping it to known-affected providers.
- When deduplicating `function.arguments` before `json.loads`, this can change the original payload beyond just fixing chunk duplication; if that’s not desired, consider limiting deduplication to identifiers (`id`, `name`) and only applying it to arguments when you can positively detect the streaming concatenation pattern.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link
Copy Markdown
Member

@zouyonghe zouyonghe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我这边补充一个回归风险:

当前 _deduplicate_self_repeating() 会对所有 OpenAI 兼容 provider 的 tool call 名和 id 无条件生效,只要字符串满足“前半段 == 后半段”就会被折叠,例如合法的 pingping 会被改成 pingabab 会被改成 ab

_parse_openai_completion()openai_source.py 的共享解析路径,普通非流式和流式最终都会走这里,所以这个改动不只影响 MiniMax/NVIDIA 的异常场景,也会影响正常 provider 的合法工具名 / id。后续工具查找是按折叠后的名字继续执行的,这会导致原本存在的工具变成 tool not found

建议至少把这个修复收窄到已知受影响的 provider / model,或者增加更强的判定信号,而不是对所有 OpenAI 兼容返回统一改写。

@Blueteemo Blueteemo force-pushed the fix/issue-7694-tool-call-name-deduplication branch from f23d255 to e329971 Compare April 23, 2026 05:59
@Blueteemo
Copy link
Copy Markdown
Contributor Author

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The _deduplicate_self_repeating helper only detects exactly two repetitions (e.g. abcabc) with a relatively high min_length default of 20; if streaming ever yields 3+ repeats or shorter tool names/IDs, the bug will still surface—consider making the repetition detection more general and/or lowering or parameterizing the length threshold.
  • By applying _deduplicate_self_repeating to both tool_call.id and the extra_content key, different original IDs that collapse to the same base string will now collide in tool_call_extra_content_dict; it may be worth explicitly deciding how to handle collisions or at least guarding/logging when a key is being overwritten.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `_deduplicate_self_repeating` helper only detects exactly two repetitions (e.g. `abcabc`) with a relatively high `min_length` default of 20; if streaming ever yields 3+ repeats or shorter tool names/IDs, the bug will still surface—consider making the repetition detection more general and/or lowering or parameterizing the length threshold.
- By applying `_deduplicate_self_repeating` to both `tool_call.id` and the `extra_content` key, different original IDs that collapse to the same base string will now collide in `tool_call_extra_content_dict`; it may be worth explicitly deciding how to handle collisions or at least guarding/logging when a key is being overwritten.

## Individual Comments

### Comment 1
<location path="astrbot/core/provider/sources/openai_source.py" line_range="878-882" />
<code_context>
                 if tool_call.type == "function":
                     # workaround for #1454
                     if isinstance(tool_call.function.arguments, str):
+                        deduped_args = self._deduplicate_self_repeating(
+                            tool_call.function.arguments
+                        )
                         try:
-                            args = json.loads(tool_call.function.arguments)
+                            args = json.loads(deduped_args)
                         except json.JSONDecodeError as e:
                             logger.error(f"解析参数失败: {e}")
</code_context>
<issue_to_address>
**issue:** Consider guarding json.loads() against unexpected None from the deduplication helper.

`_deduplicate_self_repeating` is typed to allow returning `None`, but `json.loads(None)` would raise a `TypeError` that isn’t caught by the `JSONDecodeError` handler. It’s currently safe only because the caller always passes a `str`. To avoid a future unhandled exception, either make `_deduplicate_self_repeating` return `str` when given `str`, or guard before parsing, e.g. `if deduped_args is None: args = {};` (or another explicit fallback) before calling `json.loads`.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/core/provider/sources/openai_source.py
Copy link
Copy Markdown
Contributor Author

@Blueteemo Blueteemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已解决。感谢反馈,min_length=20 参数可以有效保护短名称不被误伤,且只处理正好 2 次重复的情况,平衡了修复效果和回归风险。

@Blueteemo
Copy link
Copy Markdown
Contributor Author

补充:关于 Sourcery 提出的 json.loads(None) 会抛 TypeError 的问题,经分析这是误报。代码中有 isinstance guard:

if isinstance(tool_call.function.arguments, str): deduped_args = self._deduplicate_self_repeating(...) args = json.loads(deduped_args)

只有 arguments 是 str 时才调用 json.loads,不会收到 None。所有问题均已处理。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]在查询的时候提示:未找到指定的工具: astr_kb_searchastr_kb_search,将跳过。

2 participants