Releases: brightdata/sdk-python

v2.0.0 - Breaking Changes

01 Dec 17:51
4108b23

🚀 v2.0.0 - Complete Architecture Rewrite

⚠️ Breaking Changes - Migration Required

This is a major breaking release that requires code changes. Python 3.9+ is now required.

Client Initialization

# ❌ Old
from brightdata import bdclient
client = bdclient(api_token="your_token")

# ✅ New
from brightdata import BrightDataClient
client = BrightDataClient(token="your_token")

API Structure - Hierarchical Methods

# ❌ Old - Flat API
client.scrape_linkedin.profiles(url)
client.search_linkedin.jobs()
result = client.scrape(url, zone="my_zone")

# ✅ New - Hierarchical API
client.scrape.linkedin.profiles(url)
client.search.linkedin.jobs()
result = client.scrape_url(url, zone="my_zone")

Platform-Specific Scraping

# ✅ New - Recommended approach
client.scrape.amazon.products(url)
client.scrape.amazon.reviews(url)
client.scrape.amazon.sellers(url)
client.scrape.linkedin.profiles(url)
client.scrape.instagram.profiles(url)
client.scrape.facebook.posts(url)

Search Operations

# ❌ Old
results = client.search(query, search_engine="google")

# ✅ New - Dedicated methods
client.search.google(query)
client.search.bing(query)
client.search.yandex(query)

Async Support (New)

# ✅ Sync (still supported)
client = BrightDataClient(token="...")
result = client.scrape_url(url)

# ✅ Async (recommended for performance)
async with BrightDataClient(token="...") as client:
    result = await client.scrape_url_async(url)
    
# ✅ Async batch operations
import asyncio

async def scrape_multiple(urls):
    async with BrightDataClient(token="...") as client:
        tasks = [client.scrape_url_async(url) for url in urls]
        return await asyncio.gather(*tasks)

Manual Job Control (New)

# ✅ Fine-grained control
# 'scraper' stands for any platform scraper from the hierarchical API,
# e.g. client.scrape.amazon.products (an illustrative assumption)
job = await scraper.trigger(url)
# Do other work...
status = await job.status_async()
if status == "ready":
    data = await job.fetch_async()

Type-Safe Payloads (New)

# ❌ Old - untyped dicts
payload = {"url": "...", "reviews_count": 100}

# ✅ New - structured with validation
from brightdata import AmazonProductPayload
payload = AmazonProductPayload(
    url="https://amazon.com/dp/B123",
    reviews_count=100
)
result = client.scrape.amazon.products(payload)

Return Types

# ✅ New - structured objects with metadata
result = client.scrape.amazon.products(url)
print(result.data)        # Actual scraped data
print(result.timing)      # Performance metrics
print(result.cost)        # Cost tracking
print(result.snapshot_id) # Job identifier

CLI Tool (New)

# ✅ Command-line interface
brightdata scrape amazon products --url https://amazon.com/dp/B123
brightdata search google --query "python sdk"
brightdata search linkedin jobs --location "Paris"
brightdata crawler discover --url https://example.com --depth 3

Configuration Changes

# ❌ Old
client = bdclient(
    api_token="token",              # Changed parameter name
    auto_create_zones=True,          # Default changed to False
    web_unlocker_zone="sdk_unlocker", # Default changed
    serp_zone="sdk_serp",            # Default changed
    browser_zone="sdk_browser"       # Default changed
)

# ✅ New
client = BrightDataClient(
    token="token",                   # Renamed from api_token
    auto_create_zones=False,         # New default
    web_unlocker_zone="web_unlocker1", # New default name
    serp_zone="serp_api1",           # New default name
    browser_zone="browser_api1",     # New default name
    timeout=30,                      # New parameter
    rate_limit=10,                   # New parameter (optional)
    rate_period=1.0                  # New parameter
)

✨ New Features

Platform Coverage

Platform            Status       Methods
Amazon              ✅ NEW       products(), reviews(), sellers()
Instagram           ✅ NEW       profiles(), posts(), comments(), reels()
Facebook            ✅ NEW       posts(), comments(), groups()
LinkedIn            ✅ Enhanced  Full scraping and search
ChatGPT             ✅ Enhanced  Improved interaction
Google/Bing/Yandex  ✅ Enhanced  Dedicated services

Performance

  • 10x better concurrency - Event loop-based architecture
  • 🔌 Advanced connection pooling - 100 total, 30 per host
  • 🎯 Built-in rate limiting - Configurable request throttling (see the sketch below)
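
A sketch of what the throttling settings imply for callers, assuming requests over the limit are delayed rather than rejected:

from brightdata import BrightDataClient

# rate_limit / rate_period as introduced in the configuration section above
client = BrightDataClient(token="your_token", rate_limit=10, rate_period=1.0)

urls = ["https://example.com/a", "https://example.com/b"]
for url in urls:
    result = client.scrape_url(url)  # calls beyond 10 per 1.0s window are throttled (assumed behavior)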

✅ Upgrade Checklist

  • Update Python to 3.9+
  • Change imports: bdclient → BrightDataClient
  • Update parameter: api_token= → token=
  • Migrate method calls to hierarchical structure
  • Handle new ScrapeResult/SearchResult return types
  • Review zone configuration defaults
  • Consider async for better performance
  • Test in staging environment

📚 Resources

Full Changelog: v1.1.3...v2.0.0

v1.1.3

07 Sep 18:20

New Features:

  • Added url parameter to extract function for direct URL specification
  • Added output_scheme parameter for OpenAI Structured Outputs support (see the sketch after this list)
  • Enhanced parse_content to auto-detect multiple results from batch operations
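
A hypothetical sketch combining the new parameters; the extract() signature here is an assumption based on these notes, not the documented API:

from brightdata import bdclient

client = bdclient(api_token="your_token")

result = client.extract(
    query="What are the product price and rating?",
    url="https://example.com/product",  # new: direct URL specification
    output_scheme={                     # new: OpenAI Structured Outputs JSON schema
        "type": "object",
        "properties": {
            "price": {"type": "string"},
            "rating": {"type": "number"},
        },
        "required": ["price", "rating"],
        "additionalProperties": False,
    },
)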

Improvements:

  • Added user-agent headers to all dataset API requests for better tracking
  • Improved schema validation for OpenAI Structured Outputs compatibility
  • Updated examples with proper formatting

Bug Fixes:

  • Fixed parse_content handling of multiple scraping results
  • Fixed OpenAI schema validation requirements

v1.1.2: AI-Powered Extract Function and LinkedIn Sync Improvements

04 Sep 14:53

New Features

  • AI-Powered Extract Function: New extract() function that combines web scraping with OpenAI's language models to extract targeted information from web pages using natural language queries
  • LinkedIn Sync Mode Fix: Fixed LinkedIn scraping sync mode to use the correct API endpoint and request structure for immediate data retrieval

Improvements

  • Set sync=True as default for all LinkedIn scraping methods for better user experience (example below)
  • Improved unit test coverage
  • Enhanced error handling for LinkedIn API responses
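
Illustratively, assuming sync is accepted as a keyword on these methods:

from brightdata import bdclient

client = bdclient(api_token="your_token")
urls = ["https://www.linkedin.com/in/someone"]

data = client.scrape_linkedin.profiles(urls)                  # sync by default: data returned immediately
snapshot = client.scrape_linkedin.profiles(urls, sync=False)  # previous async behavior is still available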

Examples

  • Added extract_example.py demonstrating AI-powered content extraction capabilities
  • Updated LinkedIn examples to showcase sync functionality

Technical Changes

  • Use correct /scrape endpoint for synchronous LinkedIn requests
  • Pass dataset_id as URL parameter with proper flags (sketched below)
  • Handle both 200 and 202 status codes appropriately
  • Maintain backward compatibility for async operations
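
Roughly the request shape these changes imply; the host, path, and placeholder values below are assumptions for illustration:

import requests

resp = requests.post(
    "https://api.brightdata.com/datasets/v3/scrape",  # assumed host/path for the sync /scrape endpoint
    params={"dataset_id": "gd_xxxxxxxx"},             # dataset_id passed as a URL parameter (flags omitted)
    headers={"Authorization": "Bearer your_token"},
    json=[{"url": "https://www.linkedin.com/in/someone"}],
)
if resp.status_code == 200:    # data is ready immediately
    data = resp.json()
elif resp.status_code == 202:  # still processing; fetch the snapshot later
    pass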

v1.1.1: Documentation Updates & Bug Fixes

03 Sep 10:22

Updates

  • Enhanced README with examples for crawl(), parse_content(), and connect_browser() functions
  • Added complete client parameter documentation
  • Fixed browser connection example import issues
  • Improved CI workflow for PyPI package testing

Bug Fixes

  • Fixed missing Playwright import in browser example
  • Corrected example URL typo
  • Updated test workflow to prevent PyPI race conditions

v1.1.0: Web Crawling, Content Parsing & Browser Automation

01 Sep 14:31

New Features

🕷️ Web Crawling

  • crawl() function for discovering and scraping multiple pages from websites
  • Advanced filtering with regex patterns for URL inclusion/exclusion
  • Configurable crawl depth and sitemap handling
  • Custom output schema support
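
A hypothetical call exercising the options above; the parameter names are assumptions, not the documented signature:

from brightdata import bdclient

client = bdclient(api_token="your_token")

results = client.crawl(
    url="https://example.com",
    depth=2,                      # configurable crawl depth
    include_filter=r"^/blog/.*",  # regex pattern for URL inclusion
    exclude_filter=r"\.pdf$",     # regex pattern for URL exclusion
    ignore_sitemap=False,         # sitemap handling
)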

🔍 Content Parsing

  • parse_content() function for extracting useful data from API responses
  • Support for text extraction, link discovery, and image URL collection
  • Handles both JSON responses and raw HTML content
  • Structured data extraction from various content formats
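
A minimal sketch of the workflow; the shape of the parsed result is an assumption based on the capabilities listed above:

from brightdata import bdclient

client = bdclient(api_token="your_token")

response = client.scrape("https://example.com")
parsed = client.parse_content(response)  # accepts JSON responses and raw HTML alike

# Assumed fields, per the capabilities above:
text, links, images = parsed.get("text"), parsed.get("links"), parsed.get("images")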

🌐 Browser Automation

  • connect_browser() function for Playwright/Selenium integration
  • WebSocket endpoint generation for scraping browser connections
  • Support for multiple browser automation tools (Playwright, Puppeteer, Selenium)
  • Seamless authentication with Bright Data's browser service
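
A minimal Playwright hookup, assuming connect_browser() returns the WebSocket endpoint described above and the browser is reachable over CDP:

from playwright.sync_api import sync_playwright
from brightdata import bdclient

client = bdclient(api_token="your_token")
endpoint = client.connect_browser()  # WebSocket endpoint for the scraping browser

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(endpoint)
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())
    browser.close()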

Improvements

📡 Better Async Handling

  • Enhanced download_snapshot() with improved 202 status code handling
  • Friendly status messages instead of exceptions for pending snapshots
  • Better user experience for asynchronous data processing
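
Illustratively, assuming the friendly-status behavior described above:

snapshot_id = "s_xxxxxxxx"  # placeholder job identifier
data = client.download_snapshot(snapshot_id)
# While the snapshot is still processing (HTTP 202), this now returns a
# status message instead of raising, so callers can simply retry later.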

🔧 Robust Error Handling

  • Fixed zone creation error handling with proper exception propagation
  • Added retry logic for network failures and temporary errors
  • Improved zone management reliability

🐍 Python Support Update

  • Updated to support Python 3.8+ (removed Python 3.7)
  • Updated CI/CD pipeline for modern Python versions
  • Added BeautifulSoup4 as core dependency

Dependencies

  • Added: beautifulsoup4>=4.9.0 for content parsing
  • Updated: Python compatibility to >=3.8

Examples

New example files demonstrate the enhanced functionality:

  • examples/crawl_example.py - Web crawling usage
  • examples/browser_connection_example.py - Browser automation setup
  • examples/parse_content_example.py - Content parsing workflows

Release v1.0.7: LinkedIn Integration & Enhanced APIs

27 Aug 15:14

🚀 Major Features

LinkedIn Data Integration

  • New scrape_linkedin class: Comprehensive LinkedIn data scraping for profiles, companies, jobs, and posts
  • New search_linkedin class: Advanced LinkedIn content discovery with keyword and URL-based search
  • Production-ready examples: Ready-to-use examples for all LinkedIn functionality

Enhanced ChatGPT API

  • Renamed to search_chatGPT: More intuitive naming for ChatGPT interactions
  • Sync/Async support: Choose between immediate results or background processing
  • Improved NDJSON parsing: Better handling of multi-response data

Improved Architecture

  • Modular design: Separated download functionality into dedicated module
  • Better code organization: Specialized API modules for different services
  • Production optimizations: Cleaner code with improved performance

🔧 API Enhancements

New LinkedIn Methods

# Scrape LinkedIn data
client.scrape_linkedin.profiles(urls)
client.scrape_linkedin.companies(urls)
client.scrape_linkedin.jobs(urls)
client.scrape_linkedin.posts(urls)

# Search LinkedIn content
client.search_linkedin.profiles(first_name, last_name)
client.search_linkedin.jobs(location="Paris", keyword="developer")
client.search_linkedin.posts(company_url="https://linkedin.com/company/bright-data")

Enhanced ChatGPT API

# Synchronous (immediate results)
result = client.search_chatGPT(prompt="Your question", sync=True)

# Asynchronous (background processing)
result = client.search_chatGPT(prompt="Your question", sync=False)

🛠️ Technical Improvements

  • Better error handling: Enhanced validation and error messages
  • Backward compatibility: All existing code continues to work
  • Performance optimizations: Faster processing and reduced memory usage
  • Production-ready code: Clean, efficient, and maintainable codebase

📝 Breaking Changes

  • scrape_chatGPT() renamed to search_chatGPT() (maintains same functionality)
  • Added sync parameter to ChatGPT API (defaults to True)

🐛 Bug Fixes

  • Fixed NDJSON response parsing for multi-line JSON data
  • Improved parameter validation across all APIs
  • Enhanced timeout handling for long-running requests

📚 Documentation

  • Updated examples with new LinkedIn functionality
  • Enhanced docstrings for all new methods
  • Added comprehensive usage examples

Release v1.0.6

24 Aug 14:03

Version 1.0.6 - Changed default data_format to html for better output formatting

Release v1.0.5

21 Aug 12:48

Version 1.0.5 release

Release v1.0.4

21 Aug 12:32

What's New

  • New JSON Parsing Feature: Added parse parameter to search function
    • When parse=True, automatically appends &brd_json=1 to search URLs
    • Enables structured JSON responses from search engines
    • Defaults to False for backward compatibility

Usage Example

from brightdata import bdclient

client = bdclient(api_token="your-token")

# Enable JSON parsing
results = client.search(
    query="pizza restaurants", 
    search_engine="google", 
    parse=True
)

Changes

  • Add parse parameter to search() method in both API and client
  • Update documentation and examples
  • Add comprehensive unit tests
  • Maintain backward compatibility

v1.0.3

19 Aug 13:31

What's Changed

  • Fixed CI/CD pipeline issues with deprecated GitHub Actions
  • Enhanced country code validation flexibility
  • Improved zone management and testing coverage

Full Changelog: https://github.com/brightdata/bright-data-sdk-python/blob/main/CHANGELOG.md