Rework of Rewards and Queueing + Small hotfixes #17
Conversation
enlorenz commented on Jan 28, 2026
- After the refactor, the "new" energy-efficiency reward was gone. It is reintroduced and now carried through where the "efficiency" reward used to be.
- Introduce a helper queue to handle job-submission overflow. The agent still only sees MAX_QUEUE = 1000 slots in its observation space, while excess jobs are caught by the helper (a sketch of this flow follows the sequence diagram below).
- Reworked the job-age penalty. It now takes the age of the longest-waiting job: the penalty rises smoothly with age and reaches its maximum after one week (one plausible curve is sketched after this list).
- Added more metrics/logs to the callbacks for TensorBoard.
- Some small bug fixes.
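One plausible shape for that age penalty, as a minimal sketch only (it assumes `WEEK_HOURS = 168`; the actual formula lives in `src/reward_calculation.py` and may differ):

```python
import numpy as np

WEEK_HOURS = 168  # assumed hours-per-week constant

def job_age_penalty(oldest_age_hours):
    """Smooth exponential rise of the penalty with the age of the
    longest-waiting job, saturating at -1 after one week."""
    if oldest_age_hours >= WEEK_HOURS:
        return -1.0  # maximum penalty once the oldest job waited a week
    ratio = oldest_age_hours / WEEK_HOURS
    # expm1-based curve: 0 at age 0, smoothly approaching -1 at one week
    return -(np.expm1(2 * ratio) / np.expm1(2))

for age in (1, 24, 84, 168, 500):
    print(age, round(job_age_penalty(age), 3))
```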
… refactoring. Now reintroduced and used instead of the "efficiency" function.
Force-pushed from 8bf27e2 to 963be49
📝 Walkthrough

Adds deque-based backlog queues and helpers, integrates the backlog into job submission and baseline flows, disables age-based dropping, introduces energy-efficiency and blackout reward terms, expands hourly/episode metrics, increases MAX_JOB_AGE and the observable age cap, updates training naming, and adds a `torchinfo` dependency.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Env as Environment
    participant JM as JobManagement
    participant Q as MainQueue
    participant HQ as BacklogQueue
    participant BL as Baseline
    participant Nodes as Nodes

    Env->>JM: age_backlog_queue(HQ, metrics)
    JM-->>Env: updated backlog, dropped_count
    Env->>JM: fill_queue_from_backlog(Q, HQ, next_empty_slot)
    JM->>HQ: pop FIFO jobs
    JM->>Q: insert into main queue
    JM-->>Env: next_empty_slot, moved_count
    Env->>JM: add_new_jobs(Q, new_jobs..., backlog_queue=HQ)
    alt main queue full
        JM->>HQ: append overflow job
    else space available
        JM->>Q: insert job
    end
    JM-->>Env: new_jobs, next_empty_slot
    Env->>BL: baseline_step(..., baseline_backlog_queue=HQ)
    BL->>JM: (may) age/fill backlog helpers
    BL-->>Env: baseline results
    Env->>Nodes: assign_jobs_to_available_nodes(..., metrics, is_baseline?)
    Nodes-->>Env: updated running jobs / completions
```
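In Python terms, the per-step flow above might look like the following minimal sketch. It is illustrative only: it assumes a `(MAX_QUEUE_SIZE, 4)` NumPy job queue with rows `[duration, age, nodes, cores]` and a `collections.deque` backlog, and the simplified signatures are not the actual helpers in `src/job_management.py`:

```python
from collections import deque

import numpy as np

MAX_QUEUE_SIZE = 1000  # the agent's observation space stays at this size

def fill_queue_from_backlog(job_queue_2d, backlog_queue, next_empty_slot):
    """Move backlogged jobs into the main queue (FIFO) until it is full."""
    moved = 0
    while backlog_queue and next_empty_slot < MAX_QUEUE_SIZE:
        job_queue_2d[next_empty_slot] = backlog_queue.popleft()
        next_empty_slot += 1
        moved += 1
    return next_empty_slot, moved

def add_new_jobs(job_queue_2d, jobs, next_empty_slot, backlog_queue):
    """Insert new jobs; overflow beyond the main queue lands in the
    backlog instead of being rejected."""
    for job in jobs:
        if next_empty_slot < MAX_QUEUE_SIZE:
            job_queue_2d[next_empty_slot] = job
            next_empty_slot += 1
        else:
            backlog_queue.append(job)
    return next_empty_slot

# Usage: one environment step first drains the backlog, then adds new jobs.
queue = np.zeros((MAX_QUEUE_SIZE, 4), dtype=np.float32)
backlog = deque()
slot, _ = fill_queue_from_backlog(queue, backlog, 0)
slot = add_new_jobs(queue, [[4, 0, 2, 48], [8, 0, 1, 24]], slot, backlog)
```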
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@src/config.py`:
- Around line 9-10: The observation `job_queue` uses MAX_JOB_AGE for all four
job attributes causing job_age to be massively overbounded; add a realistic
observation cap like MAX_JOB_AGE_OBS (e.g., set to EPISODE_HOURS * 26 for ~6
months) in config.py and replace the job_age entry in the `high` vector used to
build the `job_queue` observation space in environment.py (referencing the
`job_queue` observation construction and `MAX_JOB_AGE` symbol) so that the
job_age bound uses MAX_JOB_AGE_OBS while duration/nodes/cores keep their
existing reasonable bounds.
In `@src/environment.py`:
- Line 340: Replace the extraneous f-string used in the env output: change the
call to self.env_print(f"[5] Calculating reward...") so it uses a plain string
without the f prefix (self.env_print("[5] Calculating reward...")); update the
occurrence inside the surrounding method where self.env_print is called to
remove the unnecessary f-string formatting.
In `@src/reward_calculation.py`:
- Around line 202-204: The combined efficiency term (efficiency_reward_norm) can
be in [-1,1] because _reward_energy_efficiency_normalized returns [0,1] and
_blackout_term returns [-1,1]; if downstream code expects [0,1], replace the
current sum with a normalized mapping and clamp: compute raw =
self._reward_energy_efficiency_normalized(...) + self._blackout_term(...), then
set efficiency_reward_norm = max(0.0, min(1.0, (raw + 1.0) / 2.0)) so the result
is guaranteed in [0,1]; update the expression where efficiency_reward_norm is
computed (referencing _reward_energy_efficiency_normalized, _blackout_term, and
the efficiency_reward_norm variable) and add a short comment explaining the
normalization.
🧹 Nitpick comments (6)
requirements.txt (1)
8-8: Consider pinning dependency versions for reproducibility. The new `torchinfo` dependency follows the existing pattern of unpinned versions. For production stability and reproducible builds, consider pinning versions for all dependencies (e.g., `torchinfo==1.8.0`). This is optional since it matches the current style.

src/baseline.py (2)
16-17: Update docstring to include `baseline_helper_queue` parameter. The new `baseline_helper_queue` parameter is added to the function signature but not documented in the Args section of the docstring.

Proposed fix

Add after line 35:

```
baseline_helper_queue: Helper queue for baseline overflow jobs
```

Also applies to: 36-36
50-53: Use underscore prefix for unused variable. `new_baseline_jobs` is never used after assignment. Per Python convention, prefix unused variables with `_` to signal intent.

Proposed fix

```diff
-    new_baseline_jobs, baseline_next_empty_slot = add_new_jobs(
+    _new_baseline_jobs, baseline_next_empty_slot = add_new_jobs(
         job_queue_2d, new_jobs_count, new_jobs_durations,
         new_jobs_nodes, new_jobs_cores, baseline_next_empty_slot,
         baseline_helper_queue
     )
```

src/job_management.py (1)
231-232: Document the hotfix more explicitly for future cleanup. The `if False and new_age > MAX_JOB_AGE` pattern effectively disables dropping. Consider adding a TODO comment with a tracking reference (e.g., issue number) to ensure this temporary measure gets revisited.

📝 Suggested improvement

```diff
     # TEMP HOTFIX: disable age-based dropping (keep logic for later).
+    # TODO(enlorenz): Re-enable age-based dropping once queue overflow is properly handled.
     if False and new_age > MAX_JOB_AGE:
```

src/reward_calculation.py (2)
135-139: Normalization produces incorrect values: `_normalize(current, 0, -1)` inverts the scale unexpectedly. The `_normalize` method computes `(current - minimum) / (maximum - minimum)`. With `minimum=0` and `maximum=-1`:

- When `current_penalty = 0`: returns `(0 - 0) / (-1 - 0) = 0`
- When `current_penalty = 1`: returns `(1 - 0) / (-1 - 0) = -1`

This effectively maps `[0, 1]` → `[0, -1]`, which is the intended outcome, but the parameter ordering is confusing (passing `-1` as `maximum` when it's actually the minimum of the output range). The subsequent `clip(-1, 0)` ensures correctness, but the code is misleading.

♻️ Clearer implementation

```diff
 def _penalty_job_age_normalized(self, num_off_nodes, job_queue_2d):
     """Calculate normalized job age penalty [-1, 0]."""
     current_penalty = self._penalty_job_age(num_off_nodes, job_queue_2d)
-    normalized_penalty = self._normalize(current_penalty, 0, -1)
-    return np.clip(normalized_penalty, -1, 0)
+    # current_penalty is in [0, 1], map to [-1, 0]
+    return -current_penalty
```
158-176: Blackout term uses hardcoded magic numbers. `BLACKOUT_QUEUE_THRESHOLD = 10` and `SATURATION_FACTOR = 2` are defined inline. Consider extracting these to `src/config.py` for consistency with other configurable parameters and easier tuning.

The penalty formula `exp(-ratio * 2) - 1` produces:

- 0 jobs → `exp(0) - 1 = 0` (but handled separately, returns 1.0)
- 10 jobs → `exp(-2) - 1 ≈ -0.865`
- 20 jobs → `exp(-4) - 1 ≈ -0.982`

This provides smooth saturation toward -1, which is sensible for the design.
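For illustration, a minimal sketch of the term as the review describes it (the constant names follow the review text, and the signature is simplified relative to `_blackout_term` in the repo):

```python
import numpy as np

BLACKOUT_QUEUE_THRESHOLD = 10  # jobs waiting until the penalty saturates
SATURATION_FACTOR = 2

def blackout_term(num_on_nodes, num_waiting_jobs):
    """Reward a full blackout with an empty queue; penalize it otherwise."""
    if num_on_nodes > 0:
        return 0.0  # cluster is running: the term stays neutral
    if num_waiting_jobs == 0:
        return 1.0  # blackout with nothing to do is rewarded
    ratio = num_waiting_jobs / BLACKOUT_QUEUE_THRESHOLD
    return np.exp(-ratio * SATURATION_FACTOR) - 1  # smooth saturation to -1

for jobs in (0, 10, 20):
    print(jobs, round(blackout_term(0, jobs), 3))  # 1.0, -0.865, -0.982
```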
📝 Consider moving constants to config

```diff
 from src.config import (
     COST_IDLE_MW, COST_USED_MW, PENALTY_IDLE_NODE,
-    PENALTY_DROPPED_JOB, MAX_NODES, MAX_NEW_JOBS_PER_HOUR, MAX_QUEUE_SIZE, WEEK_HOURS
+    PENALTY_DROPPED_JOB, MAX_NODES, MAX_NEW_JOBS_PER_HOUR, MAX_QUEUE_SIZE, WEEK_HOURS,
+    BLACKOUT_QUEUE_THRESHOLD, BLACKOUT_SATURATION_FACTOR
 )
```
```python
# Use a very high cap; age-based dropping is temporarily disabled in code.
MAX_JOB_AGE = WEEK_HOURS * 52 * 10  # ~10 years in hours
```
🧩 Analysis chain
🏁 Script executed:
# Check all usages of MAX_JOB_AGE and related constants
rg -n "MAX_JOB_AGE" --type py -A 2 -B 2Repository: FairRootGroup/powersched
Length of output: 2438
🏁 Script executed:
# Check the config.py file to see constant definitions
cat -n src/config.py | head -30Repository: FairRootGroup/powersched
Length of output: 1157
🏁 Script executed:
# Find environment.py and check observation space implementation
fd -n "environment.py" --type fRepository: FairRootGroup/powersched
Length of output: 298
🏁 Script executed:
# Find environment.py
fd "environment.py" --type fRepository: FairRootGroup/powersched
Length of output: 86
🏁 Script executed:
# Check environment.py around line 156
rg -n "job_queue.*spaces.Box" --type py -A 5 -B 5Repository: FairRootGroup/powersched
Length of output: 772
🏁 Script executed:
# Get a sense of actual job ages in the codebase - check if there's any documentation or comments
rg -n "job.*age\|age.*job" --type py -i | head -20Repository: FairRootGroup/powersched
Length of output: 50
Observation space uses MAX_JOB_AGE for job_age bounds, creating potential learning issues.
The high bound for the job_queue observation space (line 156 in environment.py) is set to 87,360 (the value of MAX_JOB_AGE), which applies to all four job attributes: duration, age, nodes, and cores. This creates a problematic mismatch since job_age values are typically hours to months while the bound allows for 10 years, whereas other attributes (nodes: 1-16, cores: 1-96) operate at much smaller scales.
For RL algorithms, this causes extreme range disparity in the normalized observation space—job_age values occupy only 1-2% of the possible range—which can degrade learning stability and policy convergence. Consider using a separate, realistic constant for job_age observation bounds (e.g., MAX_JOB_AGE_OBS = EPISODE_HOURS * 26 for 6 months) instead of the disabled-dropping cap.
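A minimal sketch of that suggestion, assuming a gymnasium `Box` space (`MAX_JOB_AGE_OBS` is the reviewer's proposed constant, and the other bounds here are illustrative placeholders, not the repo's actual config values):

```python
import numpy as np
from gymnasium import spaces

EPISODE_HOURS = 168                    # assumed episode length in hours
MAX_JOB_AGE_OBS = EPISODE_HOURS * 26   # ~6 months, replaces the 10-year cap
MAX_JOB_DURATION = 168                 # placeholder bound for illustration
MAX_NODES_PER_JOB, MAX_CORES_PER_JOB = 16, 96
MAX_QUEUE_SIZE = 1000

# Per-job attributes: [duration, age, nodes, cores]; only the age bound changes.
high = np.tile(
    np.array(
        [MAX_JOB_DURATION, MAX_JOB_AGE_OBS, MAX_NODES_PER_JOB, MAX_CORES_PER_JOB],
        dtype=np.float32,
    ),
    (MAX_QUEUE_SIZE, 1),
)
job_queue_space = spaces.Box(low=0.0, high=high, dtype=np.float32)
```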
🤖 Prompt for AI Agents
In `@src/config.py` around lines 9 - 10, The observation `job_queue` uses
MAX_JOB_AGE for all four job attributes causing job_age to be massively
overbounded; add a realistic observation cap like MAX_JOB_AGE_OBS (e.g., set to
EPISODE_HOURS * 26 for ~6 months) in config.py and replace the job_age entry in
the `high` vector used to build the `job_queue` observation space in
environment.py (referencing the `job_queue` observation construction and
`MAX_JOB_AGE` symbol) so that the job_age bound uses MAX_JOB_AGE_OBS while
duration/nodes/cores keep their existing reasonable bounds.
```diff
 # 0. Energy efficiency. Reward calculation based on Workload (used nodes) (W) / Cost (C)
 total_cost = power_cost(num_used_nodes, num_idle_nodes, current_price)
-efficiency_reward_norm = self._reward_efficiency_normalized(
-    num_used_nodes, num_idle_nodes, num_unprocessed_jobs, total_cost
-)
+efficiency_reward_norm = self._reward_energy_efficiency_normalized(num_used_nodes, num_idle_nodes) + self._blackout_term(num_used_nodes, num_idle_nodes, num_unprocessed_jobs)
```
Efficiency reward can exceed [0, 1] range when blackout term adds +1.0.

Line 204 computes:

```python
efficiency_reward_norm = self._reward_energy_efficiency_normalized(...) + self._blackout_term(...)
```

- `_reward_energy_efficiency_normalized` returns `[0, 1]`
- `_blackout_term` returns `[-1, 1]`

When nodes are on: `_blackout_term` returns 0.0, so the sum is in [0, 1].
When nodes are off with an empty queue: `_reward_energy_efficiency_normalized` returns 0.0 and `_blackout_term` returns 1.0, so the sum is 1.0. ✓
When nodes are off with jobs waiting: `_reward_energy_efficiency_normalized` returns 0.0 and `_blackout_term` returns [-1, 0], so the sum is in [-1, 0]. ✓

The combined range is [-1, 1], not [0, 1]. The variable name `efficiency_reward_norm` and downstream weighting may assume a [0, 1] range. Verify this is intentional.
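A quick numeric check of the three cases (illustrative values only, reusing the simplified `blackout_term` sketch from above rather than the repo's method):

```python
# (energy_eff, blackout) pairs for the three cases discussed above
cases = {
    "nodes on":              (0.7, 0.0),    # blackout term neutral
    "all off, empty queue":  (0.0, 1.0),    # rewarded blackout
    "all off, jobs waiting": (0.0, -0.865), # penalized blackout (10 jobs)
}
for name, (energy_eff, blackout) in cases.items():
    print(f"{name}: {energy_eff + blackout:+.3f}")
# Outputs span [-0.865, +1.0]; the combined term is not confined to [0, 1].
```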
🤖 Prompt for AI Agents
In `@src/reward_calculation.py` around lines 202 - 204, The combined efficiency
term (efficiency_reward_norm) can be in [-1,1] because
_reward_energy_efficiency_normalized returns [0,1] and _blackout_term returns
[-1,1]; if downstream code expects [0,1], replace the current sum with a
normalized mapping and clamp: compute raw =
self._reward_energy_efficiency_normalized(...) + self._blackout_term(...), then
set efficiency_reward_norm = max(0.0, min(1.0, (raw + 1.0) / 2.0)) so the result
is guaranteed in [0,1]; update the expression where efficiency_reward_norm is
computed (referencing _reward_energy_efficiency_normalized, _blackout_term, and
the efficiency_reward_norm variable) and add a short comment explaining the
normalization.
Yes, it is intentional.
@enlorenz, thank you for confirming! I've noted this design decision for future reference.
✏️ Learnings added
Learnt from: enlorenz
Repo: FairRootGroup/powersched PR: 17
File: src/reward_calculation.py:202-204
Timestamp: 2026-01-28T13:46:11.565Z
Learning: In src/reward_calculation.py, the efficiency_reward_norm variable intentionally has a range of [-1, 1] (not [0, 1]) because it combines _reward_energy_efficiency_normalized [0, 1] with _blackout_term [-1, 1]. This design allows the combined efficiency term to include both positive energy efficiency rewards and negative blackout penalties.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-22T14:42:32.006Z
Learning: Applies to src/reward_calculation.py : Reward calculation must balance efficiency, cost savings, and resource utilization through weighted components
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/job_management.py`:
- Around line 9-33: The age_helper_queue docstring is inaccurate because
age-based dropping is currently disabled by a temporary hotfix (the hardcoded
"if False" guard) so jobs are always kept; update the docstring for
age_helper_queue to state that jobs are incremented for age but dropping based
on MAX_JOB_AGE is temporarily disabled (mention the "if False" hotfix and
MAX_JOB_AGE) and add a brief TODO indicating where/how to re-enable the drop
logic (remove the "if False" guard and use job[1] > MAX_JOB_AGE) and that
metrics updates live in the now-disabled branch.
- Around line 132-141: The add_new_jobs() function currently omits jobs pushed
into helper_queue from the returned new_jobs list, causing environment.py to
undercount submitted jobs; update add_new_jobs() so that whenever a job is
appended to helper_queue (the branch that uses helper_queue.append([...]) ),
also append a corresponding job representation to new_jobs (same
structure/metadata you use for queue entries) so helper-queued jobs are included
in the returned list, and update the function docstring to state that returned
new_jobs includes both directly queued and helper-queued jobs (next_empty_slot
return stays the same); ensure this change keeps compatibility with
fill_queue_from_helper() which will still consume helper_queue.
🧹 Nitpick comments (4)
src/job_management.py (1)
231-248: Replace `if False` with a config flag for the hotfix toggle. Line 232's `if False and ...` is non-idiomatic and easy to miss. Prefer an explicit flag (e.g., `ENABLE_AGE_DROP`) in config so the behavior is discoverable and consistent.

♻️ Suggested refactor

```diff
-from src.config import (
-    MAX_NODES, MAX_JOB_AGE, CORES_PER_NODE
-)
+from src.config import (
+    MAX_NODES, MAX_JOB_AGE, CORES_PER_NODE, ENABLE_AGE_DROP
+)
 ...
-    if False and new_age > MAX_JOB_AGE:
+    if ENABLE_AGE_DROP and new_age > MAX_JOB_AGE:
```

src/reward_calculation.py (2)
117-139: Optional simplification: avoid inverted bounds in job-age normalization. `_penalty_job_age()` already returns `[0, 1]`, so `_normalize(..., 0, -1)` is effectively `-current_penalty`. Consider returning `-current_penalty` directly for clarity.

♻️ Possible simplification

```diff
-    normalized_penalty = self._normalize(current_penalty, 0, -1)
-    return np.clip(normalized_penalty, -1, 0)
+    return np.clip(-current_penalty, -1, 0)
```
141-176: Consider centralizing blackout constants for tuning. `BLACKOUT_QUEUE_THRESHOLD` and `SATURATION_FACTOR` are core tuning knobs; promoting them to `src/config.py` keeps constants centralized and consistent across modules.

As per coding guidelines, configuration constants must be centralized and updated consistently across all modules.
test/test_sanity_env.py (1)
87-87: Use the MAX_JOB_AGE constant to keep the test in sync. You already import config constants; consider replacing the hard-coded `168 * 52 * 10` with `MAX_JOB_AGE` to avoid drift.

♻️ Suggested change

```diff
 from src.config import (
     MAX_JOB_DURATION, MIN_NODES_PER_JOB, MAX_NODES_PER_JOB, MIN_CORES_PER_JOB,
-    CORES_PER_NODE, EPISODE_HOURS
+    CORES_PER_NODE, EPISODE_HOURS, MAX_JOB_AGE
 )
 ...
-    assert np.all((age >= 0) & (age <= 168 * 52 * 10)), f"age out of bounds min={age.min()} max={age.max()}"
+    assert np.all((age >= 0) & (age <= MAX_JOB_AGE)), f"age out of bounds min={age.min()} max={age.max()}"
```

As per coding guidelines, configuration constants must be centralized and updated consistently across all modules.
Force-pushed from 6763cc8 to 440c0b6
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/job_management.py (1)
116-131: Add `helper_queue` to the Args docstring. The new parameter is undocumented, which can mislead callers.

Suggested doc fix

```diff
     Args:
         job_queue_2d: 2D job queue array (MAX_QUEUE_SIZE, 4)
         new_jobs_count: Number of new jobs to add
         new_jobs_durations: List of job durations
         new_jobs_nodes: List of nodes required per job
         new_jobs_cores: List of cores per node required per job
         next_empty_slot: Index of next empty slot in queue
+        helper_queue: Optional overflow queue for jobs when main queue is full
```
🤖 Fix all issues with AI agents
In `@src/reward_calculation.py`:
- Around line 135-139: The normalization call in _penalty_job_age_normalized is
using inverted bounds; call _normalize with the true source range (minimum=0,
maximum=1) and then invert the normalized result to map into [-1,0].
Specifically, in _penalty_job_age_normalized replace the
_normalize(current_penalty, 0, -1) usage with _normalize(current_penalty, 0, 1)
and negate the returned value (e.g., -normalized) before applying np.clip; this
keeps _normalize semantics correct and ensures the final value lies in [-1, 0].
🧹 Nitpick comments (5)
test/test_sanity_env.py (1)
85-87: Consider importing `MAX_JOB_AGE` instead of hardcoding the magic number. The file already imports several constants from `src.config` (lines 24-29). While the comment says "keep hard-coded here to avoid importing env constants," using `MAX_JOB_AGE` would prevent this check from silently becoming stale if the config value changes again.

Suggested improvement

```diff
 from src.config import (
     MAX_JOB_DURATION, MIN_NODES_PER_JOB, MAX_NODES_PER_JOB, MIN_CORES_PER_JOB,
-    CORES_PER_NODE, EPISODE_HOURS
+    CORES_PER_NODE, EPISODE_HOURS, MAX_JOB_AGE
 )
```

Then at line 87:

```diff
-    assert np.all((age >= 0) & (age <= 168 * 52 * 10)), f"age out of bounds min={age.min()} max={age.max()}"
+    assert np.all((age >= 0) & (age <= MAX_JOB_AGE)), f"age out of bounds min={age.min()} max={age.max()}"
```

src/baseline.py (1)
50-53: Rename unused variable to `_` per static analysis hint. The `new_baseline_jobs` return value is not used. Prefixing with underscore signals intentional discard and silences the linter.

Proposed fix

```diff
-    new_baseline_jobs, baseline_next_empty_slot = add_new_jobs(
+    _, baseline_next_empty_slot = add_new_jobs(
         job_queue_2d, new_jobs_count, new_jobs_durations,
         new_jobs_nodes, new_jobs_cores, baseline_next_empty_slot,
         baseline_helper_queue
     )
```

src/reward_calculation.py (1)
163-164: Consider moving reward constants to `config.py`. `BLACKOUT_QUEUE_THRESHOLD` and `SATURATION_FACTOR` are reward-shaping parameters that affect RL training. Per coding guidelines, configuration constants should be centralized for consistent tuning.

Suggested refactor

In `src/config.py`:

```python
# Blackout reward shaping
BLACKOUT_QUEUE_THRESHOLD = 10  # jobs waiting until blackout penalty saturates
BLACKOUT_SATURATION_FACTOR = 2
```

Then import and use in `_blackout_term`.

Based on coding guidelines: "Configuration constants must be centralized and updated consistently across all modules".

src/job_management.py (2)
src/job_management.py (2)
43-59: Document the required deque-like interface forhelper_queue.
popleft()implies acollections.deque-style container; a brief Args note (or type hint) would prevent accidental misuse.Suggested doc tweak
def fill_queue_from_helper(job_queue_2d, helper_queue, next_empty_slot): """ Move jobs from helper queue into the real queue (FIFO) until full. + Args: + helper_queue: deque-like queue supporting popleft() (e.g., collections.deque). """
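As a small illustration of the contract this doc tweak describes (hypothetical job values; the row layout `[duration, age, nodes, cores]` follows the queue format used elsewhere in this PR):

```python
from collections import deque

helper_queue = deque()
helper_queue.append([4, 0, 2, 48])  # older submission
helper_queue.append([8, 0, 1, 24])  # newer submission

# popleft() drains in FIFO order with O(1) cost at both ends,
# which is what fill_queue_from_helper relies on.
assert helper_queue.popleft() == [4, 0, 2, 48]
```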
236-238: Consider a named flag instead of `if False` to track the hotfix. A local constant makes intent clearer and easier to toggle/search.

Example refactor

```diff
+# TEMP HOTFIX toggle for age-based drops
+AGE_DROP_ENABLED = False
 ...
-    if False and new_age > MAX_JOB_AGE:
+    if AGE_DROP_ENABLED and new_age > MAX_JOB_AGE:
```
```diff
 def _penalty_job_age_normalized(self, num_off_nodes, job_queue_2d):
     """Calculate normalized job age penalty [-1, 0]."""
     current_penalty = self._penalty_job_age(num_off_nodes, job_queue_2d)
-    normalized_penalty = -self._normalize(current_penalty, self._min_job_age_penalty, self._max_job_age_penalty)
+    normalized_penalty = self._normalize(current_penalty, 0, -1)
     return np.clip(normalized_penalty, -1, 0)
```
Bug: Normalization parameters are inverted.

`_normalize(current_penalty, 0, -1)` will produce incorrect results. With `minimum=0` and `maximum=-1`:

- When `current_penalty=0`: returns `(0-0)/((-1)-0) = 0`
- When `current_penalty=1`: returns `(1-0)/((-1)-0) = -1`

This works by accident because the penalty range is [0, 1] and the output is clipped to [-1, 0]. However, the semantics are confusing: `_normalize` expects minimum < maximum, but here `maximum=-1` is less than `minimum=0`.
Suggested clarification

```diff
 def _penalty_job_age_normalized(self, num_off_nodes, job_queue_2d):
     """Calculate normalized job age penalty [-1, 0]."""
     current_penalty = self._penalty_job_age(num_off_nodes, job_queue_2d)
-    normalized_penalty = self._normalize(current_penalty, 0, -1)
+    # current_penalty is in [0, 1], map to [-1, 0]
+    normalized_penalty = -current_penalty
     return np.clip(normalized_penalty, -1, 0)
```
In `@src/reward_calculation.py` around lines 135 - 139, The normalization call in
_penalty_job_age_normalized is using inverted bounds; call _normalize with the
true source range (minimum=0, maximum=1) and then invert the normalized result
to map into [-1,0]. Specifically, in _penalty_job_age_normalized replace the
_normalize(current_penalty, 0, -1) usage with _normalize(current_penalty, 0, 1)
and negate the returned value (e.g., -normalized) before applying np.clip; this
keeps _normalize semantics correct and ensures the final value lies in [-1, 0].
Force-pushed from 440c0b6 to b62bdbd
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/environment.py`:
- Around line 265-270: The baseline backlog is being aged twice each step: once
via age_backlog_queue(self.baseline_backlog_queue, ...) in environment.py and
again inside baseline.baseline_step(baseline_backlog_queue, metrics, ...).
Remove the duplicate call in environment.py by deleting the
age_backlog_queue(self.baseline_backlog_queue, self.metrics, is_baseline=True)
line so baseline aging is only performed by baseline_step; leave the normal
backlog aging (age_backlog_queue(self.backlog_queue, ...)) and the
fill_queue_from_backlog call unchanged.
🧹 Nitpick comments (4)
test/test_sanity_env.py (1)
87-87: Consider importing `MAX_JOB_AGE` instead of hardcoding the value. The hardcoded `168 * 52 * 10` matches `MAX_JOB_AGE` from `src/config.py`, but using the constant directly would ensure consistency if the value changes in the future.

♻️ Suggested improvement

Add to imports at lines 24-29:

```python
from src.config import (
    MAX_JOB_DURATION, MIN_NODES_PER_JOB, MAX_NODES_PER_JOB, MIN_CORES_PER_JOB,
    CORES_PER_NODE, EPISODE_HOURS, MAX_JOB_AGE  # Add this
)
```

Then update line 87:

```diff
-    assert np.all((age >= 0) & (age <= 168 * 52 * 10)), f"age out of bounds min={age.min()} max={age.max()}"
+    assert np.all((age >= 0) & (age <= MAX_JOB_AGE)), f"age out of bounds min={age.min()} max={age.max()}"
```

As per coding guidelines: Configuration constants must be centralized and updated consistently across all modules.

src/baseline.py (1)
src/baseline.py (1)
50-53: Unused variable `new_baseline_jobs` can be replaced with `_`. The return value `new_baseline_jobs` is never used since metrics are updated using `new_jobs_count` directly. This is correct behavior, but the variable should be discarded to satisfy the linter and clarify intent.

♻️ Suggested fix

```diff
-    new_baseline_jobs, baseline_next_empty_slot = add_new_jobs(
+    _, baseline_next_empty_slot = add_new_jobs(
         job_queue_2d, new_jobs_count, new_jobs_durations,
         new_jobs_nodes, new_jobs_cores, baseline_next_empty_slot,
         baseline_backlog_queue
     )
```

src/reward_calculation.py (1)
158-176: Consider extracting magic numbers as module-level or class-level constants. `BLACKOUT_QUEUE_THRESHOLD = 10` and `SATURATION_FACTOR = 2` are defined inline within the method. For tunability and visibility, these could be moved to `src/config.py` or at least to class-level constants.

♻️ Suggested approach

Either add to `src/config.py`:

```python
BLACKOUT_QUEUE_THRESHOLD = 10
BLACKOUT_SATURATION_FACTOR = 2
```

Or define as class attributes in `RewardCalculator`:

```python
class RewardCalculator:
    BLACKOUT_QUEUE_THRESHOLD = 10
    BLACKOUT_SATURATION_FACTOR = 2
    ...
```

src/job_management.py (1)
236-237: The `if False` guard creates dead code that linters may flag. While this is intentional as a temporary hotfix, consider using a named constant (e.g., `AGE_DROPPING_ENABLED = False`) to make the intent clearer and avoid lint warnings about unreachable code.

♻️ Suggested improvement

Add to `src/config.py`:

```python
AGE_DROPPING_ENABLED = False  # TEMP: disabled; re-enable when ready
```

Then in `src/job_management.py`:

```diff
+from src.config import AGE_DROPPING_ENABLED
 ...
-    if False and new_age > MAX_JOB_AGE:
+    if AGE_DROPPING_ENABLED and new_age > MAX_JOB_AGE:
```

This also makes the `age_backlog_queue` function's guard consistent.
Force-pushed from b62bdbd to bae5879
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/environment.py (1)
310-358: Include backlog jobs in blackout/queue wait count. `num_unprocessed_jobs` reflects only the main queue. If the backlog still holds jobs, the blackout term can incorrectly reward a full blackout despite pending work. Use the total waiting jobs (main queue + backlog) when calling `RewardCalculator.calculate`.

🐛 Suggested fix

```diff
     num_unprocessed_jobs = np.sum(job_queue_2d[:, 0] > 0)
     combined_queue_size = num_unprocessed_jobs + len(self.backlog_queue)
+    total_waiting_jobs = combined_queue_size
@@
     step_reward, step_cost, eff_reward_norm, price_reward_norm, idle_penalty_norm, job_age_penalty_norm = self.reward_calculator.calculate(
         num_used_nodes, num_idle_nodes, current_price, average_future_price,
         num_off_nodes, num_launched_jobs, num_node_changes, job_queue_2d,
-        num_unprocessed_jobs, self.weights, num_dropped_this_step, self.env_print
+        total_waiting_jobs, self.weights, num_dropped_this_step, self.env_print
     )
```
🤖 Fix all issues with AI agents
In `@src/baseline.py`:
- Around line 50-55: The variable new_baseline_jobs returned from the call to
add_new_jobs is unused and triggers a lint warning; rename that binding to
_new_baseline_jobs (or _) in the assignment where add_new_jobs is called so the
call remains the same but the unused result is ignored—update the assignment
that currently reads new_baseline_jobs, baseline_next_empty_slot =
add_new_jobs(...) to _new_baseline_jobs, baseline_next_empty_slot =
add_new_jobs(...), leaving subsequent uses of baseline_next_empty_slot and
metrics.baseline_jobs_submitted / metrics.episode_baseline_jobs_submitted
unchanged.
In `@src/job_management.py`:
- Around line 9-18: The function age_backlog_queue currently has unused
parameters metrics and is_baseline causing ARG001 while the drop branch is
disabled; rename those parameters to _metrics and _is_baseline in the
age_backlog_queue signature (or, alternatively, add a no-op reference like "del
metrics, is_baseline" at the top of the function) so the linter knows they are
intentionally unused until the drop logic is re-enabled; ensure you update any
internal references to those names accordingly when you later restore the
branch.
🧹 Nitpick comments (1)
src/callbacks.py (1)
54-71: Consider extracting helper functions to reduce duplication. The completion rate and average wait time calculation patterns are repeated for both agent and baseline metrics. While the current implementation is correct and readable, extracting these into small helper functions would improve maintainability.

♻️ Optional refactor to reduce duplication

```diff
+def _safe_completion_rate(completed: int, submitted: int) -> float:
+    return (completed / submitted * 100) if submitted > 0 else 0.0
+
+def _safe_avg_wait(total_wait: float, completed: int) -> float:
+    return (total_wait / completed) if completed > 0 else 0.0
+
 class ComputeClusterCallback(BaseCallback):
```

Then in `_on_step`:

```diff
         # Job metrics (agent)
-        completion_rate = (
-            env.metrics.episode_jobs_completed / env.metrics.episode_jobs_submitted * 100
-            if env.metrics.episode_jobs_submitted > 0
-            else 0.0
-        )
-        avg_wait = (
-            env.metrics.episode_total_job_wait_time / env.metrics.episode_jobs_completed
-            if env.metrics.episode_jobs_completed > 0
-            else 0.0
-        )
+        completion_rate = _safe_completion_rate(
+            env.metrics.episode_jobs_completed, env.metrics.episode_jobs_submitted
+        )
+        avg_wait = _safe_avg_wait(
+            env.metrics.episode_total_job_wait_time, env.metrics.episode_jobs_completed
+        )
         ...
         # Job metrics (baseline)
-        baseline_completion_rate = (
-            env.metrics.episode_baseline_jobs_completed / env.metrics.episode_baseline_jobs_submitted * 100
-            if env.metrics.episode_baseline_jobs_submitted > 0
-            else 0.0
-        )
-        baseline_avg_wait = (
-            env.metrics.episode_baseline_total_job_wait_time / env.metrics.episode_baseline_jobs_completed
-            if env.metrics.episode_baseline_jobs_completed > 0
-            else 0.0
-        )
+        baseline_completion_rate = _safe_completion_rate(
+            env.metrics.episode_baseline_jobs_completed, env.metrics.episode_baseline_jobs_submitted
+        )
+        baseline_avg_wait = _safe_avg_wait(
+            env.metrics.episode_baseline_total_job_wait_time, env.metrics.episode_baseline_jobs_completed
+        )
```
Force-pushed from f0924ed to 88b68ea
helper_queue keeps the observation space at the "reduced" queue length while no job is rejected.

Set MAX_JOB_AGE to a very high number (the observation-space limit). Hotfixed (disabled) the dropping of jobs after one week, keeping the logic in case actual dropping is implemented later. The age penalty now looks at the age of the oldest waiting job: a smooth exponential for early times that saturates at -1 after one week.
…erlapping tags

Metrics added beyond the original five are:

- metrics/episode_baseline_cost
- metrics/episode_baseline_cost_off
- metrics/episode_jobs_submitted
- metrics/episode_jobs_completed
- metrics/episode_completion_rate
- metrics/episode_avg_wait_hours
- metrics/episode_max_queue_size
- metrics/episode_jobs_rejected_queue_full
- metrics/episode_baseline_jobs_submitted
- metrics/episode_baseline_jobs_completed
- metrics/episode_baseline_completion_rate
- metrics/episode_baseline_avg_wait_hours
- metrics/episode_baseline_max_queue_size
- metrics/episode_baseline_jobs_dropped
- metrics/episode_baseline_jobs_rejected_queue_full
Force-pushed from 88b68ea to e7d63d2