Rework of Rewards and Queueing + Small hotfixes #17
Conversation
enlorenz commented on Jan 28, 2026
- After the refactor, the "new" energy-efficiency reward was gone. It is reintroduced and now carried through where the "efficiency" reward used to be.
- Introduce a helper queue to handle job-submission overflow. The agent still only sees MAX_QUEUE = 1000 slots in its observation space, while excess jobs are caught by the helper (a sketch of this flow follows the sequence diagram below).
- Reworked the job-age penalty. It now takes the age of the longest-waiting job: the penalty rises smoothly with age and reaches its maximum after one week (one plausible curve is sketched after this list).
- Added more metrics/logs to the callbacks for TensorBoard.
- Some small bug fixes.
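One plausible shape for that age penalty, as a minimal sketch only (it assumes `WEEK_HOURS = 168`; the actual formula lives in `src/reward_calculation.py` and may differ):

```python
import numpy as np

WEEK_HOURS = 168  # assumed hours-per-week constant

def job_age_penalty(oldest_age_hours):
    """Smooth exponential rise of the penalty with the age of the
    longest-waiting job, saturating at -1 after one week."""
    if oldest_age_hours >= WEEK_HOURS:
        return -1.0  # maximum penalty once the oldest job waited a week
    ratio = oldest_age_hours / WEEK_HOURS
    # expm1-based curve: 0 at age 0, smoothly approaching -1 at one week
    return -(np.expm1(2 * ratio) / np.expm1(2))

for age in (1, 24, 84, 168, 500):
    print(age, round(job_age_penalty(age), 3))
```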
… refactoring. Now reintroduced and used instead of the "efficiency" function.
Force-pushed from 8bf27e2 to 963be49
📝 Walkthrough

Adds deque-based backlog queues and helpers, integrates the backlog into job submission and baseline flows, disables age-based dropping, introduces energy-efficiency and blackout reward terms, expands hourly/episode metrics, increases MAX_JOB_AGE and the observable age cap, updates training naming, and adds a `torchinfo` dependency.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Env as Environment
    participant JM as JobManagement
    participant Q as MainQueue
    participant HQ as BacklogQueue
    participant BL as Baseline
    participant Nodes as Nodes

    Env->>JM: age_backlog_queue(HQ, metrics)
    JM-->>Env: updated backlog, dropped_count
    Env->>JM: fill_queue_from_backlog(Q, HQ, next_empty_slot)
    JM->>HQ: pop FIFO jobs
    JM->>Q: insert into main queue
    JM-->>Env: next_empty_slot, moved_count
    Env->>JM: add_new_jobs(Q, new_jobs..., backlog_queue=HQ)
    alt main queue full
        JM->>HQ: append overflow job
    else space available
        JM->>Q: insert job
    end
    JM-->>Env: new_jobs, next_empty_slot
    Env->>BL: baseline_step(..., baseline_backlog_queue=HQ)
    BL->>JM: (may) age/fill backlog helpers
    BL-->>Env: baseline results
    Env->>Nodes: assign_jobs_to_available_nodes(..., metrics, is_baseline?)
    Nodes-->>Env: updated running jobs / completions
```
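In Python terms, the per-step flow above might look like the following minimal sketch. It is illustrative only: it assumes a `(MAX_QUEUE_SIZE, 4)` NumPy job queue with rows `[duration, age, nodes, cores]` and a `collections.deque` backlog, and the simplified signatures are not the actual helpers in `src/job_management.py`:

```python
from collections import deque

import numpy as np

MAX_QUEUE_SIZE = 1000  # the agent's observation space stays at this size

def fill_queue_from_backlog(job_queue_2d, backlog_queue, next_empty_slot):
    """Move backlogged jobs into the main queue (FIFO) until it is full."""
    moved = 0
    while backlog_queue and next_empty_slot < MAX_QUEUE_SIZE:
        job_queue_2d[next_empty_slot] = backlog_queue.popleft()
        next_empty_slot += 1
        moved += 1
    return next_empty_slot, moved

def add_new_jobs(job_queue_2d, jobs, next_empty_slot, backlog_queue):
    """Insert new jobs; overflow beyond the main queue lands in the
    backlog instead of being rejected."""
    for job in jobs:
        if next_empty_slot < MAX_QUEUE_SIZE:
            job_queue_2d[next_empty_slot] = job
            next_empty_slot += 1
        else:
            backlog_queue.append(job)
    return next_empty_slot

# Usage: one environment step first drains the backlog, then adds new jobs.
queue = np.zeros((MAX_QUEUE_SIZE, 4), dtype=np.float32)
backlog = deque()
slot, _ = fill_queue_from_backlog(queue, backlog, 0)
slot = add_new_jobs(queue, [[4, 0, 2, 48], [8, 0, 1, 24]], slot, backlog)
```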
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@src/config.py`:
- Around line 9-10: The observation `job_queue` uses MAX_JOB_AGE for all four
job attributes causing job_age to be massively overbounded; add a realistic
observation cap like MAX_JOB_AGE_OBS (e.g., set to EPISODE_HOURS * 26 for ~6
months) in config.py and replace the job_age entry in the `high` vector used to
build the `job_queue` observation space in environment.py (referencing the
`job_queue` observation construction and `MAX_JOB_AGE` symbol) so that the
job_age bound uses MAX_JOB_AGE_OBS while duration/nodes/cores keep their
existing reasonable bounds.
In `@src/environment.py`:
- Line 340: Replace the extraneous f-string used in the env output: change the
call to self.env_print(f"[5] Calculating reward...") so it uses a plain string
without the f prefix (self.env_print("[5] Calculating reward...")); update the
occurrence inside the surrounding method where self.env_print is called to
remove the unnecessary f-string formatting.
In `@src/reward_calculation.py`:
- Around line 202-204: The combined efficiency term (efficiency_reward_norm) can
be in [-1,1] because _reward_energy_efficiency_normalized returns [0,1] and
_blackout_term returns [-1,1]; if downstream code expects [0,1], replace the
current sum with a normalized mapping and clamp: compute raw =
self._reward_energy_efficiency_normalized(...) + self._blackout_term(...), then
set efficiency_reward_norm = max(0.0, min(1.0, (raw + 1.0) / 2.0)) so the result
is guaranteed in [0,1]; update the expression where efficiency_reward_norm is
computed (referencing _reward_energy_efficiency_normalized, _blackout_term, and
the efficiency_reward_norm variable) and add a short comment explaining the
normalization.
🧹 Nitpick comments (6)
requirements.txt (1)
8-8: Consider pinning dependency versions for reproducibility. The new `torchinfo` dependency follows the existing pattern of unpinned versions. For production stability and reproducible builds, consider pinning versions for all dependencies (e.g., `torchinfo==1.8.0`). This is optional since it matches the current style.

src/baseline.py (2)
16-17: Update docstring to include `baseline_helper_queue` parameter. The new `baseline_helper_queue` parameter is added to the function signature but not documented in the Args section of the docstring.

Proposed fix

Add after line 35:

```
baseline_helper_queue: Helper queue for baseline overflow jobs
```

Also applies to: 36-36
50-53: Use underscore prefix for unused variable. `new_baseline_jobs` is never used after assignment. Per Python convention, prefix unused variables with `_` to signal intent.

Proposed fix

```diff
-    new_baseline_jobs, baseline_next_empty_slot = add_new_jobs(
+    _new_baseline_jobs, baseline_next_empty_slot = add_new_jobs(
         job_queue_2d, new_jobs_count, new_jobs_durations,
         new_jobs_nodes, new_jobs_cores, baseline_next_empty_slot,
         baseline_helper_queue
     )
```

src/job_management.py (1)
231-232: Document the hotfix more explicitly for future cleanup. The `if False and new_age > MAX_JOB_AGE` pattern effectively disables dropping. Consider adding a TODO comment with a tracking reference (e.g., issue number) to ensure this temporary measure gets revisited.

📝 Suggested improvement

```diff
     # TEMP HOTFIX: disable age-based dropping (keep logic for later).
+    # TODO(enlorenz): Re-enable age-based dropping once queue overflow is properly handled.
     if False and new_age > MAX_JOB_AGE:
```

src/reward_calculation.py (2)
135-139: Normalization produces incorrect values: `_normalize(current, 0, -1)` inverts the scale unexpectedly. The `_normalize` method computes `(current - minimum) / (maximum - minimum)`. With `minimum=0` and `maximum=-1`:

- When `current_penalty = 0`: returns `(0 - 0) / (-1 - 0) = 0`
- When `current_penalty = 1`: returns `(1 - 0) / (-1 - 0) = -1`

This effectively maps `[0, 1]` → `[0, -1]`, which is the intended outcome, but the parameter ordering is confusing (passing `-1` as `maximum` when it's actually the minimum of the output range). The subsequent `clip(-1, 0)` ensures correctness, but the code is misleading.

♻️ Clearer implementation

```diff
 def _penalty_job_age_normalized(self, num_off_nodes, job_queue_2d):
     """Calculate normalized job age penalty [-1, 0]."""
     current_penalty = self._penalty_job_age(num_off_nodes, job_queue_2d)
-    normalized_penalty = self._normalize(current_penalty, 0, -1)
-    return np.clip(normalized_penalty, -1, 0)
+    # current_penalty is in [0, 1], map to [-1, 0]
+    return -current_penalty
```
158-176: Blackout term uses hardcoded magic numbers. `BLACKOUT_QUEUE_THRESHOLD = 10` and `SATURATION_FACTOR = 2` are defined inline. Consider extracting these to `src/config.py` for consistency with other configurable parameters and easier tuning.

The penalty formula `exp(-ratio * 2) - 1` produces:

- 0 jobs → `exp(0) - 1 = 0` (but handled separately, returns 1.0)
- 10 jobs → `exp(-2) - 1 ≈ -0.865`
- 20 jobs → `exp(-4) - 1 ≈ -0.982`

This provides smooth saturation toward -1, which is sensible for the design.
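For illustration, a minimal sketch of the term as the review describes it (the constant names follow the review text, and the signature is simplified relative to `_blackout_term` in the repo):

```python
import numpy as np

BLACKOUT_QUEUE_THRESHOLD = 10  # jobs waiting until the penalty saturates
SATURATION_FACTOR = 2

def blackout_term(num_on_nodes, num_waiting_jobs):
    """Reward a full blackout with an empty queue; penalize it otherwise."""
    if num_on_nodes > 0:
        return 0.0  # cluster is running: the term stays neutral
    if num_waiting_jobs == 0:
        return 1.0  # blackout with nothing to do is rewarded
    ratio = num_waiting_jobs / BLACKOUT_QUEUE_THRESHOLD
    return np.exp(-ratio * SATURATION_FACTOR) - 1  # smooth saturation to -1

for jobs in (0, 10, 20):
    print(jobs, round(blackout_term(0, jobs), 3))  # 1.0, -0.865, -0.982
```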
📝 Consider moving constants to config

```diff
 from src.config import (
     COST_IDLE_MW, COST_USED_MW, PENALTY_IDLE_NODE,
-    PENALTY_DROPPED_JOB, MAX_NODES, MAX_NEW_JOBS_PER_HOUR, MAX_QUEUE_SIZE, WEEK_HOURS
+    PENALTY_DROPPED_JOB, MAX_NODES, MAX_NEW_JOBS_PER_HOUR, MAX_QUEUE_SIZE, WEEK_HOURS,
+    BLACKOUT_QUEUE_THRESHOLD, BLACKOUT_SATURATION_FACTOR
 )
```
```python
# Use a very high cap; age-based dropping is temporarily disabled in code.
MAX_JOB_AGE = WEEK_HOURS * 52 * 10  # ~10 years in hours
```
🧩 Analysis chain
🏁 Script executed:
# Check all usages of MAX_JOB_AGE and related constants
rg -n "MAX_JOB_AGE" --type py -A 2 -B 2Repository: FairRootGroup/powersched
Length of output: 2438
🏁 Script executed:
# Check the config.py file to see constant definitions
cat -n src/config.py | head -30Repository: FairRootGroup/powersched
Length of output: 1157
🏁 Script executed:
# Find environment.py and check observation space implementation
fd -n "environment.py" --type fRepository: FairRootGroup/powersched
Length of output: 298
🏁 Script executed:
# Find environment.py
fd "environment.py" --type fRepository: FairRootGroup/powersched
Length of output: 86
🏁 Script executed:
# Check environment.py around line 156
rg -n "job_queue.*spaces.Box" --type py -A 5 -B 5Repository: FairRootGroup/powersched
Length of output: 772
🏁 Script executed:
# Get a sense of actual job ages in the codebase - check if there's any documentation or comments
rg -n "job.*age\|age.*job" --type py -i | head -20Repository: FairRootGroup/powersched
Length of output: 50
Observation space uses MAX_JOB_AGE for job_age bounds, creating potential learning issues.
The high bound for the job_queue observation space (line 156 in environment.py) is set to 87,360 (the value of MAX_JOB_AGE), which applies to all four job attributes: duration, age, nodes, and cores. This creates a problematic mismatch since job_age values are typically hours to months while the bound allows for 10 years, whereas other attributes (nodes: 1-16, cores: 1-96) operate at much smaller scales.
For RL algorithms, this causes extreme range disparity in the normalized observation space—job_age values occupy only 1-2% of the possible range—which can degrade learning stability and policy convergence. Consider using a separate, realistic constant for job_age observation bounds (e.g., MAX_JOB_AGE_OBS = EPISODE_HOURS * 26 for 6 months) instead of the disabled-dropping cap.
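A minimal sketch of that suggestion, assuming a gymnasium `Box` space (`MAX_JOB_AGE_OBS` is the reviewer's proposed constant, and the other bounds here are illustrative placeholders, not the repo's actual config values):

```python
import numpy as np
from gymnasium import spaces

EPISODE_HOURS = 168                    # assumed episode length in hours
MAX_JOB_AGE_OBS = EPISODE_HOURS * 26   # ~6 months, replaces the 10-year cap
MAX_JOB_DURATION = 168                 # placeholder bound for illustration
MAX_NODES_PER_JOB, MAX_CORES_PER_JOB = 16, 96
MAX_QUEUE_SIZE = 1000

# Per-job attributes: [duration, age, nodes, cores]; only the age bound changes.
high = np.tile(
    np.array(
        [MAX_JOB_DURATION, MAX_JOB_AGE_OBS, MAX_NODES_PER_JOB, MAX_CORES_PER_JOB],
        dtype=np.float32,
    ),
    (MAX_QUEUE_SIZE, 1),
)
job_queue_space = spaces.Box(low=0.0, high=high, dtype=np.float32)
```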
🤖 Prompt for AI Agents
In `@src/config.py` around lines 9 - 10, The observation `job_queue` uses
MAX_JOB_AGE for all four job attributes causing job_age to be massively
overbounded; add a realistic observation cap like MAX_JOB_AGE_OBS (e.g., set to
EPISODE_HOURS * 26 for ~6 months) in config.py and replace the job_age entry in
the `high` vector used to build the `job_queue` observation space in
environment.py (referencing the `job_queue` observation construction and
`MAX_JOB_AGE` symbol) so that the job_age bound uses MAX_JOB_AGE_OBS while
duration/nodes/cores keep their existing reasonable bounds.
```diff
 # 0. Energy efficiency. Reward calculation based on Workload (used nodes) (W) / Cost (C)
 total_cost = power_cost(num_used_nodes, num_idle_nodes, current_price)
-efficiency_reward_norm = self._reward_efficiency_normalized(
-    num_used_nodes, num_idle_nodes, num_unprocessed_jobs, total_cost
-)
+efficiency_reward_norm = self._reward_energy_efficiency_normalized(num_used_nodes, num_idle_nodes) + self._blackout_term(num_used_nodes, num_idle_nodes, num_unprocessed_jobs)
```
Efficiency reward can exceed [0, 1] range when blackout term adds +1.0.

Line 204 computes:

```python
efficiency_reward_norm = self._reward_energy_efficiency_normalized(...) + self._blackout_term(...)
```

- `_reward_energy_efficiency_normalized` returns `[0, 1]`
- `_blackout_term` returns `[-1, 1]`

When nodes are on: `_blackout_term` returns 0.0, so the sum is in [0, 1].
When nodes are off with an empty queue: `_reward_energy_efficiency_normalized` returns 0.0 and `_blackout_term` returns 1.0, so the sum is 1.0. ✓
When nodes are off with jobs waiting: `_reward_energy_efficiency_normalized` returns 0.0 and `_blackout_term` returns [-1, 0], so the sum is in [-1, 0]. ✓

The combined range is [-1, 1], not [0, 1]. The variable name `efficiency_reward_norm` and downstream weighting may assume a [0, 1] range. Verify this is intentional.
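A quick numeric check of the three cases (illustrative values only, reusing the simplified `blackout_term` sketch from above rather than the repo's method):

```python
# (energy_eff, blackout) pairs for the three cases discussed above
cases = {
    "nodes on":              (0.7, 0.0),    # blackout term neutral
    "all off, empty queue":  (0.0, 1.0),    # rewarded blackout
    "all off, jobs waiting": (0.0, -0.865), # penalized blackout (10 jobs)
}
for name, (energy_eff, blackout) in cases.items():
    print(f"{name}: {energy_eff + blackout:+.3f}")
# Outputs span [-0.865, +1.0]; the combined term is not confined to [0, 1].
```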
🤖 Prompt for AI Agents
In `@src/reward_calculation.py` around lines 202 - 204, The combined efficiency
term (efficiency_reward_norm) can be in [-1,1] because
_reward_energy_efficiency_normalized returns [0,1] and _blackout_term returns
[-1,1]; if downstream code expects [0,1], replace the current sum with a
normalized mapping and clamp: compute raw =
self._reward_energy_efficiency_normalized(...) + self._blackout_term(...), then
set efficiency_reward_norm = max(0.0, min(1.0, (raw + 1.0) / 2.0)) so the result
is guaranteed in [0,1]; update the expression where efficiency_reward_norm is
computed (referencing _reward_energy_efficiency_normalized, _blackout_term, and
the efficiency_reward_norm variable) and add a short comment explaining the
normalization.
Yes, it is intentional.
@enlorenz, thank you for confirming! I've noted this design decision for future reference.
✏️ Learnings added
Learnt from: enlorenz
Repo: FairRootGroup/powersched PR: 17
File: src/reward_calculation.py:202-204
Timestamp: 2026-01-28T13:46:11.565Z
Learning: In src/reward_calculation.py, the efficiency_reward_norm variable intentionally has a range of [-1, 1] (not [0, 1]) because it combines _reward_energy_efficiency_normalized [0, 1] with _blackout_term [-1, 1]. This design allows the combined efficiency term to include both positive energy efficiency rewards and negative blackout penalties.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-22T14:42:32.006Z
Learning: Applies to src/reward_calculation.py : Reward calculation must balance efficiency, cost savings, and resource utilization through weighted components
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/job_management.py`:
- Around line 9-33: The age_helper_queue docstring is inaccurate because
age-based dropping is currently disabled by a temporary hotfix (the hardcoded
"if False" guard) so jobs are always kept; update the docstring for
age_helper_queue to state that jobs are incremented for age but dropping based
on MAX_JOB_AGE is temporarily disabled (mention the "if False" hotfix and
MAX_JOB_AGE) and add a brief TODO indicating where/how to re-enable the drop
logic (remove the "if False" guard and use job[1] > MAX_JOB_AGE) and that
metrics updates live in the now-disabled branch.
- Around line 132-141: The add_new_jobs() function currently omits jobs pushed
into helper_queue from the returned new_jobs list, causing environment.py to
undercount submitted jobs; update add_new_jobs() so that whenever a job is
appended to helper_queue (the branch that uses helper_queue.append([...]) ),
also append a corresponding job representation to new_jobs (same
structure/metadata you use for queue entries) so helper-queued jobs are included
in the returned list, and update the function docstring to state that returned
new_jobs includes both directly queued and helper-queued jobs (next_empty_slot
return stays the same); ensure this change keeps compatibility with
fill_queue_from_helper() which will still consume helper_queue.
🧹 Nitpick comments (4)
src/job_management.py (1)
231-248: Replace `if False` with a config flag for the hotfix toggle. Line 232's `if False and ...` is non-idiomatic and easy to miss. Prefer an explicit flag (e.g., `ENABLE_AGE_DROP`) in config so the behavior is discoverable and consistent.

♻️ Suggested refactor

```diff
-from src.config import (
-    MAX_NODES, MAX_JOB_AGE, CORES_PER_NODE
-)
+from src.config import (
+    MAX_NODES, MAX_JOB_AGE, CORES_PER_NODE, ENABLE_AGE_DROP
+)
 ...
-    if False and new_age > MAX_JOB_AGE:
+    if ENABLE_AGE_DROP and new_age > MAX_JOB_AGE:
```

src/reward_calculation.py (2)
117-139: Optional simplification: avoid inverted bounds in job-age normalization. `_penalty_job_age()` already returns `[0, 1]`, so `_normalize(..., 0, -1)` is effectively `-current_penalty`. Consider returning `-current_penalty` directly for clarity.

♻️ Possible simplification

```diff
-    normalized_penalty = self._normalize(current_penalty, 0, -1)
-    return np.clip(normalized_penalty, -1, 0)
+    return np.clip(-current_penalty, -1, 0)
```
141-176: Consider centralizing blackout constants for tuning. `BLACKOUT_QUEUE_THRESHOLD` and `SATURATION_FACTOR` are core tuning knobs; promoting them to `src/config.py` keeps constants centralized and consistent across modules.

As per coding guidelines, configuration constants must be centralized and updated consistently across all modules.
test/test_sanity_env.py (1)
87-87: Use the MAX_JOB_AGE constant to keep the test in sync. You already import config constants; consider replacing the hard-coded `168 * 52 * 10` with `MAX_JOB_AGE` to avoid drift.

♻️ Suggested change

```diff
 from src.config import (
     MAX_JOB_DURATION, MIN_NODES_PER_JOB, MAX_NODES_PER_JOB, MIN_CORES_PER_JOB,
-    CORES_PER_NODE, EPISODE_HOURS
+    CORES_PER_NODE, EPISODE_HOURS, MAX_JOB_AGE
 )
 ...
-    assert np.all((age >= 0) & (age <= 168 * 52 * 10)), f"age out of bounds min={age.min()} max={age.max()}"
+    assert np.all((age >= 0) & (age <= MAX_JOB_AGE)), f"age out of bounds min={age.min()} max={age.max()}"
```

As per coding guidelines, configuration constants must be centralized and updated consistently across all modules.
Force-pushed from 6763cc8 to 440c0b6
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/job_management.py (1)
116-131: Add `helper_queue` to the Args docstring. The new parameter is undocumented, which can mislead callers.

Suggested doc fix

```diff
     Args:
         job_queue_2d: 2D job queue array (MAX_QUEUE_SIZE, 4)
         new_jobs_count: Number of new jobs to add
         new_jobs_durations: List of job durations
         new_jobs_nodes: List of nodes required per job
         new_jobs_cores: List of cores per node required per job
         next_empty_slot: Index of next empty slot in queue
+        helper_queue: Optional overflow queue for jobs when main queue is full
```
🤖 Fix all issues with AI agents
In `@src/reward_calculation.py`:
- Around line 135-139: The normalization call in _penalty_job_age_normalized is
using inverted bounds; call _normalize with the true source range (minimum=0,
maximum=1) and then invert the normalized result to map into [-1,0].
Specifically, in _penalty_job_age_normalized replace the
_normalize(current_penalty, 0, -1) usage with _normalize(current_penalty, 0, 1)
and negate the returned value (e.g., -normalized) before applying np.clip; this
keeps _normalize semantics correct and ensures the final value lies in [-1, 0].
🧹 Nitpick comments (5)
test/test_sanity_env.py (1)
85-87: Consider importing `MAX_JOB_AGE` instead of hardcoding the magic number. The file already imports several constants from `src.config` (lines 24-29). While the comment says "keep hard-coded here to avoid importing env constants," using `MAX_JOB_AGE` would prevent this check from silently becoming stale if the config value changes again.

Suggested improvement

```diff
 from src.config import (
     MAX_JOB_DURATION, MIN_NODES_PER_JOB, MAX_NODES_PER_JOB, MIN_CORES_PER_JOB,
-    CORES_PER_NODE, EPISODE_HOURS
+    CORES_PER_NODE, EPISODE_HOURS, MAX_JOB_AGE
 )
```

Then at line 87:

```diff
-    assert np.all((age >= 0) & (age <= 168 * 52 * 10)), f"age out of bounds min={age.min()} max={age.max()}"
+    assert np.all((age >= 0) & (age <= MAX_JOB_AGE)), f"age out of bounds min={age.min()} max={age.max()}"
```

src/baseline.py (1)
50-53: Rename unused variable to `_` per static analysis hint. The `new_baseline_jobs` return value is not used. Prefixing with underscore signals intentional discard and silences the linter.

Proposed fix

```diff
-    new_baseline_jobs, baseline_next_empty_slot = add_new_jobs(
+    _, baseline_next_empty_slot = add_new_jobs(
         job_queue_2d, new_jobs_count, new_jobs_durations,
         new_jobs_nodes, new_jobs_cores, baseline_next_empty_slot,
         baseline_helper_queue
     )
```

src/reward_calculation.py (1)
163-164: Consider moving reward constants to `config.py`. `BLACKOUT_QUEUE_THRESHOLD` and `SATURATION_FACTOR` are reward-shaping parameters that affect RL training. Per coding guidelines, configuration constants should be centralized for consistent tuning.

Suggested refactor

In `src/config.py`:

```python
# Blackout reward shaping
BLACKOUT_QUEUE_THRESHOLD = 10  # jobs waiting until blackout penalty saturates
BLACKOUT_SATURATION_FACTOR = 2
```

Then import and use in `_blackout_term`.

Based on coding guidelines: "Configuration constants must be centralized and updated consistently across all modules".

src/job_management.py (2)
src/job_management.py (2)
43-59: Document the required deque-like interface forhelper_queue.
popleft()implies acollections.deque-style container; a brief Args note (or type hint) would prevent accidental misuse.Suggested doc tweak
def fill_queue_from_helper(job_queue_2d, helper_queue, next_empty_slot): """ Move jobs from helper queue into the real queue (FIFO) until full. + Args: + helper_queue: deque-like queue supporting popleft() (e.g., collections.deque). """
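As a small illustration of the contract this doc tweak describes (hypothetical job values; the row layout `[duration, age, nodes, cores]` follows the queue format used elsewhere in this PR):

```python
from collections import deque

helper_queue = deque()
helper_queue.append([4, 0, 2, 48])  # older submission
helper_queue.append([8, 0, 1, 24])  # newer submission

# popleft() drains in FIFO order with O(1) cost at both ends,
# which is what fill_queue_from_helper relies on.
assert helper_queue.popleft() == [4, 0, 2, 48]
```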
236-238: Consider a named flag instead of `if False` to track the hotfix. A local constant makes intent clearer and easier to toggle/search.

Example refactor

```diff
+# TEMP HOTFIX toggle for age-based drops
+AGE_DROP_ENABLED = False
 ...
-    if False and new_age > MAX_JOB_AGE:
+    if AGE_DROP_ENABLED and new_age > MAX_JOB_AGE:
```
```diff
 def _penalty_job_age_normalized(self, num_off_nodes, job_queue_2d):
     """Calculate normalized job age penalty [-1, 0]."""
     current_penalty = self._penalty_job_age(num_off_nodes, job_queue_2d)
-    normalized_penalty = -self._normalize(current_penalty, self._min_job_age_penalty, self._max_job_age_penalty)
+    normalized_penalty = self._normalize(current_penalty, 0, -1)
     return np.clip(normalized_penalty, -1, 0)
```
Bug: Normalization parameters are inverted.

`_normalize(current_penalty, 0, -1)` will produce incorrect results. With `minimum=0` and `maximum=-1`:

- When `current_penalty=0`: returns `(0-0)/((-1)-0) = 0`
- When `current_penalty=1`: returns `(1-0)/((-1)-0) = -1`

This works by accident because the penalty range is [0, 1] and the output is clipped to [-1, 0]. However, the semantics are confusing: `_normalize` expects minimum < maximum, but here `maximum=-1` is less than `minimum=0`.
Suggested clarification

```diff
 def _penalty_job_age_normalized(self, num_off_nodes, job_queue_2d):
     """Calculate normalized job age penalty [-1, 0]."""
     current_penalty = self._penalty_job_age(num_off_nodes, job_queue_2d)
-    normalized_penalty = self._normalize(current_penalty, 0, -1)
+    # current_penalty is in [0, 1], map to [-1, 0]
+    normalized_penalty = -current_penalty
     return np.clip(normalized_penalty, -1, 0)
```
In `@src/reward_calculation.py` around lines 135 - 139, The normalization call in
_penalty_job_age_normalized is using inverted bounds; call _normalize with the
true source range (minimum=0, maximum=1) and then invert the normalized result
to map into [-1,0]. Specifically, in _penalty_job_age_normalized replace the
_normalize(current_penalty, 0, -1) usage with _normalize(current_penalty, 0, 1)
and negate the returned value (e.g., -normalized) before applying np.clip; this
keeps _normalize semantics correct and ensures the final value lies in [-1, 0].
Force-pushed from 440c0b6 to b62bdbd
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/environment.py`:
- Around line 265-270: The baseline backlog is being aged twice each step: once
via age_backlog_queue(self.baseline_backlog_queue, ...) in environment.py and
again inside baseline.baseline_step(baseline_backlog_queue, metrics, ...).
Remove the duplicate call in environment.py by deleting the
age_backlog_queue(self.baseline_backlog_queue, self.metrics, is_baseline=True)
line so baseline aging is only performed by baseline_step; leave the normal
backlog aging (age_backlog_queue(self.backlog_queue, ...)) and the
fill_queue_from_backlog call unchanged.
🧹 Nitpick comments (4)
test/test_sanity_env.py (1)
87-87: Consider importing `MAX_JOB_AGE` instead of hardcoding the value. The hardcoded `168 * 52 * 10` matches `MAX_JOB_AGE` from `src/config.py`, but using the constant directly would ensure consistency if the value changes in the future.

♻️ Suggested improvement

Add to imports at lines 24-29:

```python
from src.config import (
    MAX_JOB_DURATION, MIN_NODES_PER_JOB, MAX_NODES_PER_JOB, MIN_CORES_PER_JOB,
    CORES_PER_NODE, EPISODE_HOURS, MAX_JOB_AGE  # Add this
)
```

Then update line 87:

```diff
-    assert np.all((age >= 0) & (age <= 168 * 52 * 10)), f"age out of bounds min={age.min()} max={age.max()}"
+    assert np.all((age >= 0) & (age <= MAX_JOB_AGE)), f"age out of bounds min={age.min()} max={age.max()}"
```

As per coding guidelines: Configuration constants must be centralized and updated consistently across all modules.

src/baseline.py (1)
src/baseline.py (1)
50-53: Unused variable `new_baseline_jobs` can be replaced with `_`. The return value `new_baseline_jobs` is never used since metrics are updated using `new_jobs_count` directly. This is correct behavior, but the variable should be discarded to satisfy the linter and clarify intent.

♻️ Suggested fix

```diff
-    new_baseline_jobs, baseline_next_empty_slot = add_new_jobs(
+    _, baseline_next_empty_slot = add_new_jobs(
         job_queue_2d, new_jobs_count, new_jobs_durations,
         new_jobs_nodes, new_jobs_cores, baseline_next_empty_slot,
         baseline_backlog_queue
     )
```

src/reward_calculation.py (1)
158-176: Consider extracting magic numbers as module-level or class-level constants. `BLACKOUT_QUEUE_THRESHOLD = 10` and `SATURATION_FACTOR = 2` are defined inline within the method. For tunability and visibility, these could be moved to `src/config.py` or at least to class-level constants.

♻️ Suggested approach

Either add to `src/config.py`:

```python
BLACKOUT_QUEUE_THRESHOLD = 10
BLACKOUT_SATURATION_FACTOR = 2
```

Or define as class attributes in `RewardCalculator`:

```python
class RewardCalculator:
    BLACKOUT_QUEUE_THRESHOLD = 10
    BLACKOUT_SATURATION_FACTOR = 2
    ...
```

src/job_management.py (1)
236-237: The `if False` guard creates dead code that linters may flag. While this is intentional as a temporary hotfix, consider using a named constant (e.g., `AGE_DROPPING_ENABLED = False`) to make the intent clearer and avoid lint warnings about unreachable code.

♻️ Suggested improvement

Add to `src/config.py`:

```python
AGE_DROPPING_ENABLED = False  # TEMP: disabled; re-enable when ready
```

Then in `src/job_management.py`:

```diff
+from src.config import AGE_DROPPING_ENABLED
 ...
-    if False and new_age > MAX_JOB_AGE:
+    if AGE_DROPPING_ENABLED and new_age > MAX_JOB_AGE:
```

This also makes the `age_backlog_queue` function's guard consistent.
Force-pushed from b62bdbd to bae5879
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/environment.py (1)
310-358: Include backlog jobs in blackout/queue wait count. `num_unprocessed_jobs` reflects only the main queue. If the backlog still holds jobs, the blackout term can incorrectly reward a full blackout despite pending work. Use the total waiting jobs (main queue + backlog) when calling `RewardCalculator.calculate`.

🐛 Suggested fix

```diff
     num_unprocessed_jobs = np.sum(job_queue_2d[:, 0] > 0)
     combined_queue_size = num_unprocessed_jobs + len(self.backlog_queue)
+    total_waiting_jobs = combined_queue_size
@@
     step_reward, step_cost, eff_reward_norm, price_reward_norm, idle_penalty_norm, job_age_penalty_norm = self.reward_calculator.calculate(
         num_used_nodes, num_idle_nodes, current_price, average_future_price,
         num_off_nodes, num_launched_jobs, num_node_changes, job_queue_2d,
-        num_unprocessed_jobs, self.weights, num_dropped_this_step, self.env_print
+        total_waiting_jobs, self.weights, num_dropped_this_step, self.env_print
     )
```
🤖 Fix all issues with AI agents
In `@src/baseline.py`:
- Around line 50-55: The variable new_baseline_jobs returned from the call to
add_new_jobs is unused and triggers a lint warning; rename that binding to
_new_baseline_jobs (or _) in the assignment where add_new_jobs is called so the
call remains the same but the unused result is ignored—update the assignment
that currently reads new_baseline_jobs, baseline_next_empty_slot =
add_new_jobs(...) to _new_baseline_jobs, baseline_next_empty_slot =
add_new_jobs(...), leaving subsequent uses of baseline_next_empty_slot and
metrics.baseline_jobs_submitted / metrics.episode_baseline_jobs_submitted
unchanged.
In `@src/job_management.py`:
- Around line 9-18: The function age_backlog_queue currently has unused
parameters metrics and is_baseline causing ARG001 while the drop branch is
disabled; rename those parameters to _metrics and _is_baseline in the
age_backlog_queue signature (or, alternatively, add a no-op reference like "del
metrics, is_baseline" at the top of the function) so the linter knows they are
intentionally unused until the drop logic is re-enabled; ensure you update any
internal references to those names accordingly when you later restore the
branch.
🧹 Nitpick comments (1)
src/callbacks.py (1)
54-71: Consider extracting helper functions to reduce duplication. The completion rate and average wait time calculation patterns are repeated for both agent and baseline metrics. While the current implementation is correct and readable, extracting these into small helper functions would improve maintainability.

♻️ Optional refactor to reduce duplication

```diff
+def _safe_completion_rate(completed: int, submitted: int) -> float:
+    return (completed / submitted * 100) if submitted > 0 else 0.0
+
+def _safe_avg_wait(total_wait: float, completed: int) -> float:
+    return (total_wait / completed) if completed > 0 else 0.0
+
 class ComputeClusterCallback(BaseCallback):
```

Then in `_on_step`:

```diff
         # Job metrics (agent)
-        completion_rate = (
-            env.metrics.episode_jobs_completed / env.metrics.episode_jobs_submitted * 100
-            if env.metrics.episode_jobs_submitted > 0
-            else 0.0
-        )
-        avg_wait = (
-            env.metrics.episode_total_job_wait_time / env.metrics.episode_jobs_completed
-            if env.metrics.episode_jobs_completed > 0
-            else 0.0
-        )
+        completion_rate = _safe_completion_rate(
+            env.metrics.episode_jobs_completed, env.metrics.episode_jobs_submitted
+        )
+        avg_wait = _safe_avg_wait(
+            env.metrics.episode_total_job_wait_time, env.metrics.episode_jobs_completed
+        )
         ...
         # Job metrics (baseline)
-        baseline_completion_rate = (
-            env.metrics.episode_baseline_jobs_completed / env.metrics.episode_baseline_jobs_submitted * 100
-            if env.metrics.episode_baseline_jobs_submitted > 0
-            else 0.0
-        )
-        baseline_avg_wait = (
-            env.metrics.episode_baseline_total_job_wait_time / env.metrics.episode_baseline_jobs_completed
-            if env.metrics.episode_baseline_jobs_completed > 0
-            else 0.0
-        )
+        baseline_completion_rate = _safe_completion_rate(
+            env.metrics.episode_baseline_jobs_completed, env.metrics.episode_baseline_jobs_submitted
+        )
+        baseline_avg_wait = _safe_avg_wait(
+            env.metrics.episode_baseline_total_job_wait_time, env.metrics.episode_baseline_jobs_completed
+        )
```
Force-pushed from f0924ed to 88b68ea
helper_queue keeps the observation space at the "reduced" queue length while no job is rejected.

Set MAX_JOB_AGE to a very high number (the observation-space limit). Hotfixed (disabled) the dropping of jobs after one week, keeping the logic in case actual dropping is implemented later. The age penalty now looks at the age of the oldest waiting job: a smooth exponential for early times that saturates at -1 after one week.
…erlapping tags

Metrics added beyond the original five are:

- metrics/episode_baseline_cost
- metrics/episode_baseline_cost_off
- metrics/episode_jobs_submitted
- metrics/episode_jobs_completed
- metrics/episode_completion_rate
- metrics/episode_avg_wait_hours
- metrics/episode_max_queue_size
- metrics/episode_jobs_rejected_queue_full
- metrics/episode_baseline_jobs_submitted
- metrics/episode_baseline_jobs_completed
- metrics/episode_baseline_completion_rate
- metrics/episode_baseline_avg_wait_hours
- metrics/episode_baseline_max_queue_size
- metrics/episode_baseline_jobs_dropped
- metrics/episode_baseline_jobs_rejected_queue_full
Force-pushed from 88b68ea to e7d63d2