@dillonledoux dillonledoux commented Jan 14, 2026

Summary

This PR adds support for respecting Crawl-delay directives from robots.txt files. When respect_crawl_delay is enabled, the crawler automatically waits the specified delay between requests to the same domain, improving compliance with website policies and reducing the risk of being rate-limited or blocked.

Motivation

Many websites specify Crawl-delay directives in their robots.txt to indicate how long crawlers should wait between requests. Respecting this directive helps:

  • Maintain good etiquette when crawling websites
  • Reduce server load on target websites
  • Avoid triggering rate limiting or IP bans
  • Comply with website policies
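For context, here is a minimal sketch of what a Crawl-delay directive looks like and how it can be read. It uses Python's standard-library urllib.robotparser rather than this project's RobotsParser, so it only illustrates the directive itself; the get_crawl_delay() method added in this PR may parse and return values differently. The robots.txt content and the crawl_delay_for helper are made up for illustration.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""


def crawl_delay_for(robots_txt: str, user_agent: str = "*") -> Optional[int]:
    """Return the Crawl-delay (in seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print(crawl_delay_for(ROBOTS_TXT, "MyBot"))
    print(crawl_delay_for("User-agent: *\nDisallow: /\n", "MyBot"))
```

Note that the standard-library parser only accepts whole-number delays; a site that specifies a fractional value would need custom parsing.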

Changes

New Feature: respect_crawl_delay Configuration Parameter

Files Modified:

async_configs.py - Added respect_crawl_delay parameter to CrawlerRunConfig
models.py - Added crawl_delay field to DomainState dataclass
utils.py - Added get_crawl_delay() method to RobotsParser
async_dispatcher.py - Enhanced RateLimiter to support crawl-delay
async_webcrawler.py - Wired up respect_crawl_delay in arun_many()

Files Added:

test_crawl_delay.py - Comprehensive test suite
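
The per-domain behavior described above can be sketched roughly as follows. This is not the code from this PR: the DomainState fields, the CrawlDelayLimiter class, and its method names are hypothetical stand-ins chosen for illustration, and the real RateLimiter in async_dispatcher.py likely carries additional backoff logic. The sketch only shows the core idea of storing a crawl delay per domain and computing how long to wait before the next request to that domain.

```python
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import urlparse


@dataclass
class DomainState:
    """Hypothetical per-domain bookkeeping (not the PR's actual dataclass)."""
    last_request_time: Optional[float] = None  # None until a request is recorded
    crawl_delay: Optional[float] = None        # seconds, taken from robots.txt


class CrawlDelayLimiter:
    """Illustrative limiter: tracks one DomainState per network location."""

    def __init__(self) -> None:
        self.domains: Dict[str, DomainState] = {}

    def _state_for(self, url: str) -> DomainState:
        domain = urlparse(url).netloc
        return self.domains.setdefault(domain, DomainState())

    def set_crawl_delay(self, url: str, delay: float) -> None:
        """Store the delay that robots.txt requested for this URL's domain."""
        self._state_for(url).crawl_delay = delay

    def record_request(self, url: str, now: float) -> None:
        """Remember when the most recent request to this domain was sent."""
        self._state_for(url).last_request_time = now

    def seconds_to_wait(self, url: str, now: float) -> float:
        """Return how long to sleep before requesting url at time now."""
        state = self._state_for(url)
        if state.crawl_delay is None or state.last_request_time is None:
            return 0.0
        elapsed = now - state.last_request_time
        return max(0.0, state.crawl_delay - elapsed)


if __name__ == "__main__":
    limiter = CrawlDelayLimiter()
    limiter.set_crawl_delay("https://example.com/a", 5.0)
    limiter.record_request("https://example.com/a", now=100.0)
    print(limiter.seconds_to_wait("https://example.com/b", now=102.0))
```

Passing the current time in explicitly keeps the sketch deterministic and easy to test; an async crawler would typically read a monotonic clock and await asyncio.sleep() for the returned duration.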

Documentation Updates

CHANGELOG.md - Added new feature entry under the [Unreleased] section with a full feature description
parameters.md - Added respect_crawl_delay to both the "Page Navigation & Timing" table and the "Compliance & Ethics" table, with updated code examples
arun.md - Added respect_crawl_delay to both the core usage example and the comprehensive example
complete-sdk-reference.md - Added respect_crawl_delay to multiple examples and both parameter tables

Running Tests

python -m pytest tests/general/test_crawl_delay.py -v

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added/updated unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
