fix: track token usage in litellm non-streaming and async calls #4171
Summary
Fixes GitHub issue #4170, where `get_token_usage_summary()` was not returning accurate metrics when using litellm with non-streaming responses and async calls.

The root cause was that `_track_token_usage_internal()` was only being called in the sync streaming code path. This PR adds token tracking to the following paths (sketched below):

- `_handle_non_streaming_response` (sync non-streaming)
- `_ahandle_non_streaming_response` (async non-streaming)
- `_ahandle_streaming_response` (async streaming)
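As an illustration, here is roughly what the added call looks like in the async non-streaming path. This is a sketch, not the actual diff: the method and helper names come from the PR description, and the body assumes a typical litellm wrapper.

```python
import litellm

# Sketch of the async non-streaming handler after the fix (illustrative;
# method/helper names are taken from the PR description, not the real diff).
async def _ahandle_non_streaming_response(self, params: dict) -> str:
    response = await litellm.acompletion(**params)

    # The fix: record usage here too, not only in the sync streaming path.
    usage_info = getattr(response, "usage", None)
    if usage_info is not None:
        self._track_token_usage_internal(usage_info)  # dict conversion below

    return response.choices[0].message.content
```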
Additionally, litellm returns usage as an object with attributes (e.g., `usage.prompt_tokens`), but `_track_token_usage_internal()` expects a dict. Added conversion logic to handle this.
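A minimal sketch of that conversion, assuming the usage fields litellm commonly reports. The helper name `_usage_to_dict` is hypothetical; the actual PR may inline this logic instead.

```python
# Hypothetical helper illustrating the object-to-dict conversion; the real
# PR may inline this inside _track_token_usage_internal().
def _usage_to_dict(usage_info):
    if isinstance(usage_info, dict):
        return usage_info
    if hasattr(usage_info, "__dict__"):
        # litellm's Usage object exposes the counts as attributes.
        return {
            "prompt_tokens": getattr(usage_info, "prompt_tokens", 0),
            "completion_tokens": getattr(usage_info, "completion_tokens", 0),
            "total_tokens": getattr(usage_info, "total_tokens", 0),
        }
    return {}
```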
Review & Testing Checklist for Human

- `hasattr(usage_info, "__dict__")` check: this distinguishes objects from dicts; verify this works correctly with all litellm response formats.
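For context, this is standard Python behavior rather than anything litellm-specific: plain `dict` instances carry no `__dict__`, while ordinary objects do. One caveat worth checking in review is that objects defined with `__slots__` also lack `__dict__` and would slip through the check.

```python
# Standard Python semantics behind the hasattr check (illustrative only):
print(hasattr({}, "__dict__"))  # False: dict instances have no __dict__

class Usage:
    def __init__(self) -> None:
        self.prompt_tokens = 1

print(hasattr(Usage(), "__dict__"))  # True: normal objects do

class SlottedUsage:
    __slots__ = ("prompt_tokens",)

print(hasattr(SlottedUsage(), "__dict__"))  # False: slotted objects are missed
```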
LLM(model="gpt-4o-mini", is_litellm=True)with:stream=False+call()stream=False+acall()stream=True+acall()llm.get_token_usage_summary()returns non-zero valuesNotes
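A minimal sketch of the test plan above, assuming the `LLM` constructor and methods named in this PR. The import path, whether `stream` is a constructor or call argument, and the exact shape of the summary are assumptions; adapt them to the project's actual API.

```python
import asyncio

# from <project> import LLM  # assumption: adjust to the real import path

# stream=False + call()  (sync non-streaming)
llm = LLM(model="gpt-4o-mini", is_litellm=True, stream=False)
llm.call("Say hi")
print(llm.get_token_usage_summary())  # expect non-zero token counts

# stream=False + acall()  (async non-streaming)
llm = LLM(model="gpt-4o-mini", is_litellm=True, stream=False)
asyncio.run(llm.acall("Say hi"))
print(llm.get_token_usage_summary())

# stream=True + acall()  (async streaming)
llm = LLM(model="gpt-4o-mini", is_litellm=True, stream=True)
asyncio.run(llm.acall("Say hi"))
print(llm.get_token_usage_summary())
```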