Add configurable model cache timeout for automatic memory management #8693
Conversation
- Added model_cache_keep_alive config field (minutes, default 0 = infinite)
- Implemented timeout tracking in the ModelCache class
- Added _record_activity() to track model usage
- Added _on_timeout() to auto-clear the cache when the timeout expires
- Added shutdown() method to clean up timers
- Integrated the timeout with get(), lock(), unlock(), and put() operations
- Updated ModelManagerService to pass the keep_alive parameter
- Added cleanup in the stop() method

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
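The mechanism in this commit can be sketched roughly as follows. This is a minimal illustration assuming a `threading.Timer`-based countdown; the class and method names mirror the commit message, but this is not the actual InvokeAI `ModelCache` code.

```python
import threading
import time


class TimedCache:
    """Sketch of a cache that clears itself after keep_alive_min minutes
    of inactivity. Illustrative only, not InvokeAI's ModelCache."""

    def __init__(self, keep_alive_min=0.0):
        self._keep_alive_s = keep_alive_min * 60.0
        self._lock = threading.Lock()
        self._items = {}
        self._timer = None
        self._last_activity = time.monotonic()

    def _record_activity(self):
        # Called with self._lock held: restart the countdown on every access.
        self._last_activity = time.monotonic()
        if self._keep_alive_s <= 0:
            return  # 0 = infinite keep-alive, no timer at all
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self._keep_alive_s, self._on_timeout)
        self._timer.daemon = True  # do not block interpreter exit
        self._timer.start()

    def _on_timeout(self):
        with self._lock:
            # Double-check: activity may have occurred just before we got the lock.
            if time.monotonic() - self._last_activity < self._keep_alive_s:
                return
            self._items.clear()

    def put(self, key, value):
        with self._lock:
            self._items[key] = value
            self._record_activity()

    def get(self, key):
        with self._lock:
            self._record_activity()
            return self._items.get(key)

    def shutdown(self):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

The double-check in `_on_timeout` matters because the timer thread may fire just as a request thread records new activity; re-reading `_last_activity` under the lock resolves that race.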
- Created test_model_cache_timeout.py with comprehensive tests
- Tests timeout clearing behavior
- Tests activity resetting the timeout
- Tests the no-timeout default behavior
- Tests shutdown canceling timers

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
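The "activity resets timeout" case is the subtle one to test. A self-contained sketch of that test idea (the `KeepAliveTimer` helper is invented for illustration and is not the PR's test-suite API):

```python
import threading
import time


class KeepAliveTimer:
    """Tiny helper: callback fires after `seconds` of inactivity;
    touch() restarts the countdown. Illustrative names only."""

    def __init__(self, seconds, callback):
        self.seconds = seconds
        self.callback = callback
        self._timer = None

    def touch(self):
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self.seconds, self.callback)
        self._timer.daemon = True
        self._timer.start()

    def cancel(self):
        if self._timer is not None:
            self._timer.cancel()


def test_activity_resets_timeout():
    fired = []
    t = KeepAliveTimer(0.3, lambda: fired.append(True))
    t.touch()
    time.sleep(0.2)
    t.touch()          # activity: countdown restarts
    time.sleep(0.2)
    assert not fired   # 0.4 s elapsed total, but never 0.3 s of *inactivity*
    time.sleep(0.3)
    assert fired       # now 0.5 s idle, so the callback fired
    t.cancel()
```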
- Added a clarifying comment that _record_activity is called with the lock held
- Enhanced the double-check in _on_timeout for thread safety
- Added lock protection to the shutdown method
- Improved handling of edge cases where the timer fires during activity

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
- Remove an unused variable in a test
- Add a clarifying comment for the daemon thread setting
- Add a detailed comment explaining cache clearing with the 1000 GB value
- Improve code documentation

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
- Add an explicit storage_device parameter (cpu)
- Add an explicit log_memory_usage parameter from config
- Improves code clarity and configuration transparency

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
@copilot Issues detected:
- Remove all trailing whitespace (W293 errors)
- Add debug logging when the timeout fires but activity was detected
- Add debug logging when the timeout fires but the cache is empty
- Only log the "Clearing model cache" message when actually clearing
- Prevents misleading timeout messages during active generation

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
All three issues have been addressed in commit 8d76b4e:
@copilot I'm still seeing the cache clearing message in the middle of denoising steps. In addition, there are many similar error messages.
Only log the "Clearing model cache" message when there are actually unlocked models to clear. This prevents the misleading message from appearing during active generation, when all models are locked.

Changes:
- Check for unlocked models before logging the clear message
- Add a count of unlocked models to the log message
- Add a debug log when all models are locked
- Improves user experience by avoiding confusing messages

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
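The logging fix boils down to checking for unlocked models first. A hedged sketch of that pattern (function and variable names are illustrative, not the PR's actual code):

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
logger = logging.getLogger("model_cache")


def clear_unlocked(models):
    """Clear only unlocked entries, and only log at INFO level when there
    is actually something to clear. `models` maps a model key to a
    locked (True) / unlocked (False) flag. Illustrative sketch only."""
    unlocked = [k for k, locked in models.items() if not locked]
    if not unlocked:
        # All models are in use mid-generation; stay quiet at INFO level.
        logger.debug("Timeout fired but all models are locked; skipping clear.")
        return 0
    logger.info("Clearing model cache: %d unlocked model(s).", len(unlocked))
    for k in unlocked:
        del models[k]
    return len(unlocked)
```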
What this PR is doing is to move unlocked models from VRAM back into RAM when there has been no generation activity over the past N minutes. It does not remove models from RAM. I basically conceived it as a "hibernation mode" for Invoke. If I'm not actively using it, I want the VRAM to be free for other GPU-consuming processes. The check only runs when the model manager has acquired a thread lock, so I think it is safe from race conditions, but I'm going through all the comments to make double-sure.
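The "hibernation" behavior described above can be sketched like this. The `CachedModel` stand-in and `hibernate` helper are invented for illustration; in the real code the offload would be a torch `model.to("cpu")` on each unlocked cache record.

```python
class CachedModel:
    """Toy stand-in for a cached model record (illustrative only)."""

    def __init__(self, name, locked=False):
        self.name = name
        self.locked = locked
        self.device = "cuda"  # pretend the model currently lives in VRAM

    def to(self, device):
        self.device = device


def hibernate(cache):
    """After N idle minutes, move unlocked models from VRAM back to RAM
    ("cpu"). Locked models (mid-generation) are left untouched, and
    nothing is evicted from RAM."""
    for m in cache:
        if not m.locked and m.device != "cpu":
            m.to("cpu")
    return [m.name for m in cache if m.device == "cpu"]
```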
…model-option' into copilot/add-unload-model-option
JPPhoto
left a comment
Looks good!
lstein
left a comment
Tested under various conditions of load and couldn't break it.
…e previous default behavior
Pfannkuchensack
left a comment
Works well.
Summary
Adds a model_cache_keep_alive_min config option (minutes, default 5) to automatically clear the model cache after a period of inactivity. Addresses memory contention when running InvokeAI alongside other GPU applications such as Ollama.

Implementation:
- model_cache_keep_alive_min field in InvokeAIAppConfig with a 5-minute default

Usage:
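For example, the option would go in the InvokeAI config file (a hedged sketch; the exact file name and surrounding keys depend on your install):

```yaml
# invokeai.yaml (fragment)
model_cache_keep_alive_min: 5   # clear model cache after 5 idle minutes; 0 = never
```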
Key Behavior:
Related Issues / Discussions
Addresses enhancement request for automatic model unloading from memory after inactivity period.
QA Instructions
Test default behavior (5-minute timeout):
Test custom timeout:
- model_cache_keep_alive_min: 0.1 (6 seconds) in config

Test no timeout (old behavior):
- model_cache_keep_alive_min: 0 in config

Test during active use:
Merge Plan
N/A - Additive change with sensible defaults. The 5-minute default enables automatic memory management while remaining practical for typical workflows.
Checklist
- What's New copy (if doing a release after this PR)

Original prompt