AnthropicLlm usage_metadata is missing thinking token count #5397

@sebastienc

Description

AnthropicLlm only maps basic input/output token counts into usage_metadata. When extended thinking is enabled, thinking block tokens are included in output_tokens but never broken out separately.

Current behaviour

message_to_generate_content_response and the streaming final response both produce:

usage_metadata=types.GenerateContentResponseUsageMetadata(
    prompt_token_count=message.usage.input_tokens,
    candidates_token_count=message.usage.output_tokens,
    total_token_count=message.usage.input_tokens + message.usage.output_tokens,
)

Cache token counts (cache_creation_input_tokens, cache_read_input_tokens) are also missing but are tracked separately in #5395.

Expected behaviour

When extended thinking is enabled, populate usage_metadata.thoughts_token_count with the token count of the thinking blocks. This count could be derived from the thinking block content (via a supplemental token-counting call) or from a future dedicated usage field (ref: anthropic-python-sdk).
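A minimal sketch of the proposed mapping, assuming the thinking-block text is available when building the response. The names `Usage`, `count_thinking_tokens`, and `build_usage_metadata`, and the whitespace-splitting tokenizer, are illustrative placeholders only, not the actual ADK or Anthropic SDK API:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Usage:
    # Hypothetical stand-in for message.usage on the Anthropic response.
    input_tokens: int
    output_tokens: int


def count_thinking_tokens(thinking_text: str) -> int:
    # Placeholder tokenizer: a real implementation would call a
    # token-counting endpoint or read a dedicated usage field instead.
    return len(thinking_text.split())


def build_usage_metadata(usage: Usage, thinking_text: Optional[str]) -> dict:
    # Mirror the existing mapping, and additionally break out
    # thoughts_token_count when extended thinking produced content.
    metadata = {
        "prompt_token_count": usage.input_tokens,
        "candidates_token_count": usage.output_tokens,
        "total_token_count": usage.input_tokens + usage.output_tokens,
    }
    if thinking_text:
        metadata["thoughts_token_count"] = count_thinking_tokens(thinking_text)
    return metadata
```

Note that thinking tokens remain included in candidates_token_count here; the new field only breaks them out, matching how thoughts_token_count behaves for Gemini responses.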

Reference

This is particularly relevant now that extended thinking is supported via PR #5392.

Metadata

Labels

models — [Component] Issues related to model support
request clarification — [Status] The maintainer needs clarification or more information from the author
