`AnthropicLlm` only maps basic input/output token counts into `usage_metadata`. When extended thinking is enabled, thinking-block tokens are included in `output_tokens` but never broken out separately.
## Current behaviour
`message_to_generate_content_response` and the streaming final response both produce:
```python
usage_metadata = types.GenerateContentResponseUsageMetadata(
    prompt_token_count=message.usage.input_tokens,
    candidates_token_count=message.usage.output_tokens,
    total_token_count=message.usage.input_tokens + message.usage.output_tokens,
)
```
Cache token counts (`cache_creation_input_tokens`, `cache_read_input_tokens`) are also missing, but those are tracked separately in #5395.
## Expected behaviour
When extended thinking is enabled, populate `usage_metadata.thoughts_token_count` with the token count of the thinking blocks. This is derivable from the thinking-block content (via a supplemental tokenizer API call) or from a future dedicated API field (ref: anthropic-python-sdk).
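A minimal sketch of the first option, assuming the Anthropic message's `content` is a list of blocks where thinking blocks have `type == "thinking"` and a `thinking` text field. The `count_tokens` callable here is a placeholder for whatever tokenizer (e.g. the supplemental API call) is ultimately used; `estimate_thinking_tokens` is a hypothetical helper, not existing SDK or ADK code:

```python
from typing import Callable


def estimate_thinking_tokens(
    message,
    count_tokens: Callable[[str], int],
) -> int:
    """Sum token counts over the thinking blocks of an Anthropic message.

    `message.content` is assumed to be a list of content blocks;
    `count_tokens` is any text-to-token-count callable (for example,
    a supplemental tokenizer API call).
    """
    return sum(
        count_tokens(block.thinking)
        for block in message.content
        if getattr(block, "type", None) == "thinking"
    )
```

The resulting value would feed `usage_metadata.thoughts_token_count`; whether `candidates_token_count` then stays inclusive of thinking tokens (matching the raw `output_tokens`) or excludes them is a separate design decision.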
## Reference
This is particularly relevant now that extended thinking is supported via PR #5392.