Skip to content

Conversation

@jan-janssen
Copy link
Member

@jan-janssen jan-janssen commented Dec 23, 2025

Summary by CodeRabbit

  • New Features
    • Added executor initialization validation with timeout detection and enhanced error reporting for startup failures.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 23, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

A new internal self-test function is added to verify executor functionality immediately after initialization. The function submits a task, polls for completion with timeout, and raises an error with guidance if the task does not complete within the specified interval.

Changes

Cohort / File(s) Summary
Executor Self-Test Addition
src/executorlib/executor/single.py
Adds _executor_self_test(exe) function that validates executor task submission and completion. Imports sleep from time module. Function is invoked in two locations: after SingleNodeExecutor task scheduler construction and after creating the single-node executor. Implements sleep-based polling to wait for task completion and raises TimeoutError with guidance message if task does not complete within timeout window.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Feature: Add built-in network test' directly aligns with the main change: introducing a self-test function that validates network connectivity in executors.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jan-janssen jan-janssen marked this pull request as draft December 23, 2025 08:55
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/executorlib/executor/single.py (2)

459-463: Consider a more diagnostic error message.

The error message prescriptively suggests hostname_localhost=True as the solution, but timeout could occur for other reasons (network issues, resource contention, worker failures). Consider providing a more diagnostic message that describes what failed and suggests hostname_localhost=True as one possible solution rather than the definitive fix.

🔎 Suggested improvement
     if not f.done():
         exe.shutdown(wait=False, cancel_futures=False)
         raise TimeoutError(
-            'Plase try "hostname_localhost=True" in the initialization of the SingleNodeExecutor(hostname_localhost=True).'
+            'Executor self-test timed out after 1 second. This may indicate network connectivity issues. '
+            'Try setting hostname_localhost=True in SingleNodeExecutor initialization if running on macOS or in HPC environments.'
         )

453-464: Consider optional refinements.

Two minor refinements to consider:

  1. The function returns True but this value is never used (line 192 doesn't capture it). Consider removing the return statement or documenting why it returns a value.

  2. The static analysis tool suggests avoiding long exception messages outside the exception class (TRY003). While this is a style preference, you could consider defining a custom exception class if this pattern is repeated elsewhere.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3281a55 and 4f1dc53.

📒 Files selected for processing (1)
  • src/executorlib/executor/single.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/executorlib/executor/single.py (2)
src/executorlib/executor/base.py (2)
  • submit (71-108)
  • shutdown (110-126)
src/executorlib/task_scheduler/file/shared.py (1)
  • done (40-48)
🪛 Ruff (0.14.10)
src/executorlib/executor/single.py

461-463: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.13)
  • GitHub Check: unittest_flux_openmpi
  • GitHub Check: unittest_openmpi (macos-latest, 3.13)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.12)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.13)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.11)
  • GitHub Check: unittest_openmpi (ubuntu-24.04-arm, 3.13)
  • GitHub Check: unittest_mpich (macos-latest, 3.13)
  • GitHub Check: unittest_openmpi (ubuntu-22.04-arm, 3.13)
  • GitHub Check: unittest_mpich (ubuntu-22.04-arm, 3.13)
  • GitHub Check: unittest_mpich (ubuntu-24.04-arm, 3.13)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.11)
  • GitHub Check: unittest_slurm_mpich
  • GitHub Check: unittest_flux_mpich
  • GitHub Check: notebooks
  • GitHub Check: notebooks_integration
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-mpich.yml)
  • GitHub Check: unittest_win
  • GitHub Check: unittest_old
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-openmpi.yml)
🔇 Additional comments (2)
src/executorlib/executor/single.py (2)

1-1: LGTM!

The import of sleep is necessary for the polling mechanism in the self-test function.


456-458: Verify the timeout duration is sufficient.

The self-test uses a 1-second timeout (10 iterations × 0.1s). Ensure this duration is sufficient for various environments, including slower systems, busy clusters, or CI/CD environments where resources may be constrained.

log_obj_size=log_obj_size,
)
)
_executor_self_test(exe=self._task_scheduler)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, find the file and check its size
find . -name "single.py" -path "*/executor/*" | head -5

Repository: pyiron/executorlib

Length of output: 98


🏁 Script executed:

# Examine the file around line 192 and the TestClusterExecutor area
sed -n '185,300p' src/executorlib/executor/single.py

Repository: pyiron/executorlib

Length of output: 8212


🏁 Script executed:

# Continue reading to see TestClusterExecutor.__init__ implementation body
sed -n '289,370p' src/executorlib/executor/single.py

Repository: pyiron/executorlib

Length of output: 4316


🏁 Script executed:

# Search for _executor_self_test definition
rg -n "def _executor_self_test" src/executorlib/

Repository: pyiron/executorlib

Length of output: 130


🏁 Script executed:

# View the _executor_self_test function
sed -n '453,500p' src/executorlib/executor/single.py

Repository: pyiron/executorlib

Length of output: 474


🏁 Script executed:

# Check the structure of both __init__ methods to see if they both expose _task_scheduler
sed -n '100,200p' src/executorlib/executor/single.py

Repository: pyiron/executorlib

Length of output: 6781


Add self-test validation to TestClusterExecutor for consistency.

TestClusterExecutor initialization lacks the _executor_self_test() call present in SingleNodeExecutor, despite both having similar executor setup patterns. Consider adding _executor_self_test(exe=self._task_scheduler) after the super().__init__() call to validate executor initialization and catch connection issues early, consistent with SingleNodeExecutor's approach.

🤖 Prompt for AI Agents
In src/executorlib/executor/single.py around line 192, TestClusterExecutor's
initializer is missing the self-test call used in SingleNodeExecutor; add a call
to _executor_self_test(exe=self._task_scheduler) immediately after the
super().__init__(...) return so the TaskScheduler/connection is validated during
initialization, mirroring SingleNodeExecutor and allowing early failure
detection.

Comment on lines 453 to 464
def _executor_self_test(exe):
f = exe.submit(sum, [1, 1])
counter = 0
while not f.done() and counter < 10:
sleep(0.1)
counter += 1
if not f.done():
exe.shutdown(wait=False, cancel_futures=False)
raise TimeoutError(
'Plase try "hostname_localhost=True" in the initialization of the SingleNodeExecutor(hostname_localhost=True).'
)
return True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix typo in error message.

Line 462 contains a typo: "Plase" should be "Please".

🔎 Proposed fix
         raise TimeoutError(
-            'Plase try "hostname_localhost=True" in the initialization of the SingleNodeExecutor(hostname_localhost=True).'
+            'Please try "hostname_localhost=True" in the initialization of the SingleNodeExecutor(hostname_localhost=True).'
         )
🧰 Tools
🪛 Ruff (0.14.10)

461-463: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In src/executorlib/executor/single.py around lines 453 to 464, the error message
raised on timeout contains a typo ("Plase"). Update the string to correct the
spelling to "Please" so the message reads: 'Please try "hostname_localhost=True"
in the initialization of the SingleNodeExecutor(hostname_localhost=True).'
Ensure only the typo is fixed and preserve punctuation and quoting exactly as in
the original message.

@jan-janssen
Copy link
Member Author

======================================================================
ERROR: test_meta_executor_parallel (test_singlenodeexecutor_mpi.TestExecutorBackend.test_meta_executor_parallel)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/executorlib/executorlib/tests/test_singlenodeexecutor_mpi.py", line 66, in test_meta_executor_parallel
    with SingleNodeExecutor(
         ~~~~~~~~~~~~~~~~~~^
        max_workers=2,
        ^^^^^^^^^^^^^^
        resource_dict={"cores": 2},
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
        block_allocation=True,
        ^^^^^^^^^^^^^^^^^^^^^^
    ) as exe:
    ^
  File "/home/runner/miniconda3/envs/test/lib/python3.13/site-packages/executorlib/executor/single.py", line 192, in __init__
    _executor_self_test(exe=self._task_scheduler)
    ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/miniconda3/envs/test/lib/python3.13/site-packages/executorlib/executor/single.py", line 461, in _executor_self_test
    raise TimeoutError(
        'Plase try "hostname_localhost=True" in the initialization of the SingleNodeExecutor(hostname_localhost=True).'
    )
TimeoutError: Plase try "hostname_localhost=True" in the initialization of the SingleNodeExecutor(hostname_localhost=True).

======================================================================
ERROR: test_meta_executor_parallel_cache (test_singlenodeexecutor_mpi.TestExecutorBackendCache.test_meta_executor_parallel_cache)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/executorlib/executorlib/tests/test_singlenodeexecutor_mpi.py", line 92, in test_meta_executor_parallel_cache
    with SingleNodeExecutor(
         ~~~~~~~~~~~~~~~~~~^
        max_workers=2,
        ^^^^^^^^^^^^^^
    ...<2 lines>...
        cache_directory="executorlib_cache",
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ) as exe:
    ^
  File "/home/runner/miniconda3/envs/test/lib/python3.13/site-packages/executorlib/executor/single.py", line 192, in __init__
    _executor_self_test(exe=self._task_scheduler)
    ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/miniconda3/envs/test/lib/python3.13/site-packages/executorlib/executor/single.py", line 461, in _executor_self_test
    raise TimeoutError(
        'Plase try "hostname_localhost=True" in the initialization of the SingleNodeExecutor(hostname_localhost=True).'
    )
TimeoutError: Plase try "hostname_localhost=True" in the initialization of the SingleNodeExecutor(hostname_localhost=True).

======================================================================
ERROR: test_single_node_executor_init_function (test_singlenodeexecutor_noblock.TestFunctionCrashes.test_single_node_executor_init_function)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/executorlib/executorlib/tests/test_singlenodeexecutor_noblock.py", line 166, in test_single_node_executor_init_function
    with SingleNodeExecutor(max_workers=2, init_function=exit_funct, block_allocation=True) as exe:
         ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/miniconda3/envs/test/lib/python3.13/site-packages/executorlib/executor/single.py", line 192, in __init__
    _executor_self_test(exe=self._task_scheduler)
    ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/miniconda3/envs/test/lib/python3.13/site-packages/executorlib/executor/single.py", line 461, in _executor_self_test
    raise TimeoutError(
        'Plase try "hostname_localhost=True" in the initialization of the SingleNodeExecutor(hostname_localhost=True).'
    )
TimeoutError: Plase try "hostname_localhost=True" in the initialization of the SingleNodeExecutor(hostname_localhost=True).

======================================================================
FAIL: test_cache_data_class (test_singlenodeexecutor_cache.TestCacheFunctions.test_cache_data_class)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/executorlib/executorlib/tests/test_singlenodeexecutor_cache.py", line 52, in test_cache_data_class
    self.assertEqual(sum([c["output"] for c in cache_lst]), sum(result_lst))
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 14 != 12

======================================================================
FAIL: test_cache_key (test_singlenodeexecutor_cache.TestCacheFunctions.test_cache_key)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/executorlib/executorlib/tests/test_singlenodeexecutor_cache.py", line 66, in test_cache_key
    self.assertTrue("same" in entry['filename'])
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: False is not true

----------------------------------------------------------------------
Ran 244 tests in 67.719s

FAILED (failures=2, errors=3, skipped=35)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants