fix: 修复工具调用名重复拼接问题(流式响应chunk重复累积)#7735
Conversation
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The
_deduplicate_self_repeatinghelper only detects exactly-two-part repetitions (abcabc) and skips shorter values, so if other repetition patterns (e.g.abcabcabcorabab) might occur in streaming, consider either documenting this limitation or generalizing the detection logic to handle arbitrary repeated substrings. - When deduplicating
tool_call.id,Noneis converted to an empty string; if downstream logic distinguishes betweenNoneand a real (even if malformed) id, it may be safer to short-circuit and skip deduplication whenidis falsy instead of normalizing to"".
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `_deduplicate_self_repeating` helper only detects exactly-two-part repetitions (`abcabc`) and skips shorter values, so if other repetition patterns (e.g. `abcabcabc` or `abab`) might occur in streaming, consider either documenting this limitation or generalizing the detection logic to handle arbitrary repeated substrings.
- When deduplicating `tool_call.id`, `None` is converted to an empty string; if downstream logic distinguishes between `None` and a real (even if malformed) id, it may be safer to short-circuit and skip deduplication when `id` is falsy instead of normalizing to `""`.
## Individual Comments
### Comment 1
<location path="astrbot/core/provider/sources/openai_source.py" line_range="883-884" />
<code_context>
+ func_name_ls.append(
+ cls._deduplicate_self_repeating(tool_call.function.name)
+ )
+ tool_call_ids.append(
+ cls._deduplicate_self_repeating(tool_call.id or "")
+ )
</code_context>
<issue_to_address>
**issue (bug_risk):** Converting `None` tool_call IDs to an empty string may change behavior and cause key collisions.
Previously, `tool_call_ids.append(tool_call.id)` preserved `None`, but `tool_call.id or ""` now turns `None` into `""` and uses that as the key in `tool_call_extra_content_dict`. Multiple `None` IDs will all map to `""`, potentially merging distinct entries and changing behavior. Consider either keeping `None` as-is (e.g., branch on `tool_call.id is None`) or updating `_deduplicate_self_repeating` to accept `Optional[str]` and leave `None` unchanged.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Code Review
This pull request introduces a _deduplicate_self_repeating helper method to address streaming duplication issues in tool call names and IDs. Reviewers identified a critical bug where cls is used instead of self in an instance method, which will lead to a runtime error. Furthermore, the deduplication logic is noted to be overly simplistic, potentially affecting valid strings or failing for triple repetitions, and requires unit tests for validation.
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The
_deduplicate_self_repeatingheuristic is quite aggressive (e.g., it will collapseaaaatoaorabcabcabctoabc), which risks corrupting legitimately repetitive IDs/names; consider constraining it to simpleABABdouble-repeat patterns or adding additional guards (e.g., minimum unit size, character set pattern) to better match the known MiniMax/NVIDIA failure mode. - You currently recompute
_deduplicate_self_repeating(tool_call.id)multiple times; consider deduplicating once into a localdeduped_idand reusing it fortool_call_idsandtool_call_extra_content_dictto avoid divergence and keep the logic clearer.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `_deduplicate_self_repeating` heuristic is quite aggressive (e.g., it will collapse `aaaa` to `a` or `abcabcabc` to `abc`), which risks corrupting legitimately repetitive IDs/names; consider constraining it to simple `ABAB` double-repeat patterns or adding additional guards (e.g., minimum unit size, character set pattern) to better match the known MiniMax/NVIDIA failure mode.
- You currently recompute `_deduplicate_self_repeating(tool_call.id)` multiple times; consider deduplicating once into a local `deduped_id` and reusing it for `tool_call_ids` and `tool_call_extra_content_dict` to avoid divergence and keep the logic clearer.
## Individual Comments
### Comment 1
<location path="astrbot/core/provider/sources/openai_source.py" line_range="891-902" />
<code_context>
+ func_name_ls.append(
+ self._deduplicate_self_repeating(tool_call.function.name)
+ )
+ tool_call_ids.append(self._deduplicate_self_repeating(tool_call.id))
# gemini-2.5 / gemini-3 series extra_content handling
extra_content = getattr(tool_call, "extra_content", None)
if extra_content is not None:
- tool_call_extra_content_dict[tool_call.id] = extra_content
+ deduped_id = self._deduplicate_self_repeating(tool_call.id)
+ if deduped_id is not None:
+ tool_call_extra_content_dict[deduped_id] = extra_content
</code_context>
<issue_to_address>
**suggestion:** Reuse the same deduplicated ID instance instead of recomputing it to keep behavior consistent.
The ID is currently deduplicated twice: once when appending to `tool_call_ids` and again when setting `deduped_id` for `tool_call_extra_content_dict`. This risks subtle divergence if `_deduplicate_self_repeating` changes and adds unnecessary work. Compute `deduped_id` once, append it to `tool_call_ids`, and reuse it as the key in `tool_call_extra_content_dict`.
```suggestion
args_ls.append(args)
func_name_ls.append(
self._deduplicate_self_repeating(tool_call.function.name)
)
deduped_id = self._deduplicate_self_repeating(tool_call.id)
tool_call_ids.append(deduped_id)
# gemini-2.5 / gemini-3 series extra_content handling
extra_content = getattr(tool_call, "extra_content", None)
if extra_content is not None and deduped_id is not None:
tool_call_extra_content_dict[deduped_id] = extra_content
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The
_deduplicate_self_repeatinghelper collapses any fully repeating pattern (e.g.abab→ab,hahaha→ha), which might unintentionally alter legitimate tool names/IDs/arguments; consider constraining it to the specific duplication pattern you’ve observed (e.g., exact double repetition with a minimum length) or scoping it to known-affected providers. - When deduplicating
function.argumentsbeforejson.loads, this can change the original payload beyond just fixing chunk duplication; if that’s not desired, consider limiting deduplication to identifiers (id,name) and only applying it to arguments when you can positively detect the streaming concatenation pattern.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `_deduplicate_self_repeating` helper collapses any fully repeating pattern (e.g. `abab` → `ab`, `hahaha` → `ha`), which might unintentionally alter legitimate tool names/IDs/arguments; consider constraining it to the specific duplication pattern you’ve observed (e.g., exact double repetition with a minimum length) or scoping it to known-affected providers.
- When deduplicating `function.arguments` before `json.loads`, this can change the original payload beyond just fixing chunk duplication; if that’s not desired, consider limiting deduplication to identifiers (`id`, `name`) and only applying it to arguments when you can positively detect the streaming concatenation pattern.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
zouyonghe
left a comment
There was a problem hiding this comment.
我这边补充一个回归风险:
当前 _deduplicate_self_repeating() 会对所有 OpenAI 兼容 provider 的 tool call 名和 id 无条件生效,只要字符串满足“前半段 == 后半段”就会被折叠,例如合法的 pingping 会被改成 ping,abab 会被改成 ab。
但 _parse_openai_completion() 是 openai_source.py 的共享解析路径,普通非流式和流式最终都会走这里,所以这个改动不只影响 MiniMax/NVIDIA 的异常场景,也会影响正常 provider 的合法工具名 / id。后续工具查找是按折叠后的名字继续执行的,这会导致原本存在的工具变成 tool not found。
建议至少把这个修复收窄到已知受影响的 provider / model,或者增加更强的判定信号,而不是对所有 OpenAI 兼容返回统一改写。
…rary repetitions, dedupe arguments
f23d255 to
e329971
Compare
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The
_deduplicate_self_repeatinghelper only detects exactly two repetitions (e.g.abcabc) with a relatively highmin_lengthdefault of 20; if streaming ever yields 3+ repeats or shorter tool names/IDs, the bug will still surface—consider making the repetition detection more general and/or lowering or parameterizing the length threshold. - By applying
_deduplicate_self_repeatingto bothtool_call.idand theextra_contentkey, different original IDs that collapse to the same base string will now collide intool_call_extra_content_dict; it may be worth explicitly deciding how to handle collisions or at least guarding/logging when a key is being overwritten.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `_deduplicate_self_repeating` helper only detects exactly two repetitions (e.g. `abcabc`) with a relatively high `min_length` default of 20; if streaming ever yields 3+ repeats or shorter tool names/IDs, the bug will still surface—consider making the repetition detection more general and/or lowering or parameterizing the length threshold.
- By applying `_deduplicate_self_repeating` to both `tool_call.id` and the `extra_content` key, different original IDs that collapse to the same base string will now collide in `tool_call_extra_content_dict`; it may be worth explicitly deciding how to handle collisions or at least guarding/logging when a key is being overwritten.
## Individual Comments
### Comment 1
<location path="astrbot/core/provider/sources/openai_source.py" line_range="878-882" />
<code_context>
if tool_call.type == "function":
# workaround for #1454
if isinstance(tool_call.function.arguments, str):
+ deduped_args = self._deduplicate_self_repeating(
+ tool_call.function.arguments
+ )
try:
- args = json.loads(tool_call.function.arguments)
+ args = json.loads(deduped_args)
except json.JSONDecodeError as e:
logger.error(f"解析参数失败: {e}")
</code_context>
<issue_to_address>
**issue:** Consider guarding json.loads() against unexpected None from the deduplication helper.
`_deduplicate_self_repeating` is typed to allow returning `None`, but `json.loads(None)` would raise a `TypeError` that isn’t caught by the `JSONDecodeError` handler. It’s currently safe only because the caller always passes a `str`. To avoid a future unhandled exception, either make `_deduplicate_self_repeating` return `str` when given `str`, or guard before parsing, e.g. `if deduped_args is None: args = {};` (or another explicit fallback) before calling `json.loads`.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Blueteemo
left a comment
There was a problem hiding this comment.
已解决。感谢反馈,min_length=20 参数可以有效保护短名称不被误伤,且只处理正好 2 次重复的情况,平衡了修复效果和回归风险。
|
补充:关于 Sourcery 提出的 json.loads(None) 会抛 TypeError 的问题,经分析这是误报。代码中有 isinstance guard:
只有 arguments 是 str 时才调用 json.loads,不会收到 None。所有问题均已处理。 |
问题描述
使用 MiniMax 模型通过 NVIDIA 代理调用时,工具调用名出现重复拼接:
astr_kb_search→astr_kb_searchastr_kb_searchcall_xxx→call_xxxcall_xxx这是因为流式响应处理过程中
chunk被重复累积导致的。修复方案
在
openai_source.py的_parse_openai_completion方法中,对 tool call 的id和name增加去重校验。当检测到字符串是由前半段和后半段相同的“自重复”模式构成时(例如abcabc→abc),取前半段作为正确值。修改内容
ProviderOpenAIOfficial类中添加_deduplicate_self_repeating静态方法,用于检测并修复自重复字符串。tool_calls时,对function.name、id和extra_content的 key 都应用去重处理。关联 Issue
Fixes #7694
Summary by Sourcery
Deduplicate self-repeating tool call identifiers in OpenAI streaming responses to prevent duplicated tool names and IDs caused by chunk accumulation.
Bug Fixes:
Enhancements:
Summary by Sourcery
Deduplicate self-repeating tool call identifiers in OpenAI streaming responses to avoid duplicated names and IDs caused by chunk accumulation.
Bug Fixes:
Enhancements: