
Conversation

@devin-ai-integration
Contributor

fix: track token usage in litellm non-streaming and async calls

Summary

Fixes GitHub issue #4170 where get_token_usage_summary() was not returning accurate metrics when using litellm with non-streaming responses and async calls.

The root cause was that _track_token_usage_internal() was only being called in the sync streaming code path. This PR adds token tracking to:

  • _handle_non_streaming_response (sync non-streaming)
  • _ahandle_non_streaming_response (async non-streaming)
  • _ahandle_streaming_response (async streaming)
  • Both code paths in sync streaming (with and without tool calls)

Additionally, litellm returns usage as an object with attributes (e.g., usage.prompt_tokens), but _track_token_usage_internal() expects a dict. Added conversion logic to handle this.
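
A minimal sketch of that conversion, assuming litellm's usage object keeps its fields in `__dict__` (the helper name `_usage_to_dict` is illustrative; per the checklist below, the diff inlines this logic rather than naming a helper):

```python
def _usage_to_dict(usage_info):
    """Normalize litellm usage for _track_token_usage_internal.

    litellm returns an object with attributes (e.g. usage.prompt_tokens),
    while the tracker expects a mapping like {"prompt_tokens": ...}.
    """
    if hasattr(usage_info, "__dict__"):
        # Attribute-style object: unpack its instance dict into a plain dict.
        return dict(vars(usage_info))
    # Already a plain dict: pass through unchanged.
    return usage_info
```

Each handler can then track usage with something along the lines of `self._track_token_usage_internal(_usage_to_dict(response.usage))` once a response (or final stream chunk) carrying usage is available.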

Review & Testing Checklist for Human

  • Verify no double-counting: Check that token usage isn't tracked twice in any code path. The sync streaming method has two tracking calls (lines 941 and 977); verify these are mutually exclusive paths
  • Test with real litellm calls: The unit tests use mocks. Manually test with actual API calls to verify token metrics are populated correctly for streaming and async scenarios
  • Consider refactoring: The usage object-to-dict conversion is duplicated five times. Extracting it into a helper method (along the lines of the sketch above) would remove the duplication
  • Verify the hasattr(usage_info, "__dict__") check: this is what distinguishes objects from plain dicts; confirm it behaves correctly with all litellm response formats (a quick demonstration follows this list)
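
On the last point, a quick self-contained demonstration of what `hasattr(usage_info, "__dict__")` actually distinguishes (the `Usage` class below is a hypothetical stand-in, not litellm's):

```python
class Usage:
    """Hypothetical stand-in for litellm's attribute-style usage object."""
    def __init__(self):
        self.prompt_tokens = 10
        self.completion_tokens = 5

print(hasattr({"prompt_tokens": 10}, "__dict__"))  # False: plain dict instances carry no __dict__
print(hasattr(Usage(), "__dict__"))                # True: ordinary objects do

# Caveat for review: a dict *subclass* instance also has a __dict__ and would
# be mis-detected as an object, so checking against real litellm response
# formats (as the checklist asks) is worthwhile.
```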

Recommended Test Plan

  1. Create a simple script using LLM(model="gpt-4o-mini", is_litellm=True) with:
    • stream=False + call()
    • stream=False + acall()
    • stream=True + acall()
  2. After each call, verify llm.get_token_usage_summary() returns non-zero values (a minimal script sketch follows below)
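
A minimal version of that script, assuming crewai's `LLM` accepts a message list via `call()`/`acall()` and takes `stream` as a constructor argument; exact signatures may differ:

```python
import asyncio

from crewai import LLM

MESSAGES = [{"role": "user", "content": "Say hello in one word."}]


def report(label: str, llm: LLM) -> None:
    # After the fix, every scenario should report non-zero token counts.
    print(label, llm.get_token_usage_summary())


# stream=False + call()
llm = LLM(model="gpt-4o-mini", is_litellm=True, stream=False)
llm.call(MESSAGES)
report("sync non-streaming:", llm)

# stream=False + acall()
llm = LLM(model="gpt-4o-mini", is_litellm=True, stream=False)
asyncio.run(llm.acall(MESSAGES))
report("async non-streaming:", llm)

# stream=True + acall()
llm = LLM(model="gpt-4o-mini", is_litellm=True, stream=True)
asyncio.run(llm.acall(MESSAGES))
report("async streaming:", llm)
```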

Notes

This fixes GitHub issue #4170 where token usage metrics were not being
updated when using litellm with non-streaming responses and async calls.

Changes:
- Add token usage tracking to _handle_non_streaming_response
- Add token usage tracking to _ahandle_non_streaming_response
- Add token usage tracking to _ahandle_streaming_response
- Fix sync streaming to track usage in both code paths
- Convert usage objects to dicts before passing to _track_token_usage_internal
- Add comprehensive tests for token usage tracking in all scenarios

Co-Authored-By: João <joao@crewai.com>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring
