Releases · brightdata/sdk-python
v2.0.0 - Breaking Changes
🚀 v2.0.0 - Complete Architecture Rewrite
⚠️ Breaking Changes - Migration Required
This is a major breaking release requiring code changes. Python 3.9+ is now required.
Client Initialization
```python
# ❌ Old
from brightdata import bdclient
client = bdclient(api_token="your_token")

# ✅ New
from brightdata import BrightDataClient
client = BrightDataClient(token="your_token")
```
API Structure - Hierarchical Methods
```python
# ❌ Old - Flat API
client.scrape_linkedin.profiles(url)
client.search_linkedin.jobs()
result = client.scrape(url, zone="my_zone")

# ✅ New - Hierarchical API
client.scrape.linkedin.profiles(url)
client.search.linkedin.jobs()
result = client.scrape_url(url, zone="my_zone")
```
Platform-Specific Scraping
```python
# ✅ New - Recommended approach
client.scrape.amazon.products(url)
client.scrape.amazon.reviews(url)
client.scrape.amazon.sellers(url)
client.scrape.linkedin.profiles(url)
client.scrape.instagram.profiles(url)
client.scrape.facebook.posts(url)
```
Search Operations
```python
# ❌ Old
results = client.search(query, search_engine="google")

# ✅ New - Dedicated methods
client.search.google(query)
client.search.bing(query)
client.search.yandex(query)
```
Async Support (New)
```python
import asyncio

# ✅ Sync (still supported)
client = BrightDataClient(token="...")
result = client.scrape_url(url)

# ✅ Async (recommended for performance)
async with BrightDataClient(token="...") as client:
    result = await client.scrape_url_async(url)

# ✅ Async batch operations
async def scrape_multiple():
    async with BrightDataClient(token="...") as client:
        tasks = [client.scrape_url_async(url) for url in urls]
        results = await asyncio.gather(*tasks)
```
Manual Job Control (New)
```python
# ✅ Fine-grained control ('scraper' is a platform scraper from the hierarchical API)
job = await scraper.trigger(url)
# Do other work...
status = await job.status_async()
if status == "ready":
    data = await job.fetch_async()
```
Type-Safe Payloads (New)
```python
# ❌ Old - untyped dicts
payload = {"url": "...", "reviews_count": 100}

# ✅ New - structured with validation
from brightdata import AmazonProductPayload

payload = AmazonProductPayload(
    url="https://amazon.com/dp/B123",
    reviews_count=100
)
result = client.scrape.amazon.products(payload)
```
Return Types
```python
# ✅ New - structured objects with metadata
result = client.scrape.amazon.products(url)
print(result.data)         # Actual scraped data
print(result.timing)       # Performance metrics
print(result.cost)         # Cost tracking
print(result.snapshot_id)  # Job identifier
```
CLI Tool (New)
```bash
# ✅ Command-line interface
brightdata scrape amazon products --url https://amazon.com/dp/B123
brightdata search google --query "python sdk"
brightdata search linkedin jobs --location "Paris"
brightdata crawler discover --url https://example.com --depth 3
```
Configuration Changes
```python
# ❌ Old
client = bdclient(
    api_token="token",                  # Changed parameter name
    auto_create_zones=True,             # Default changed to False
    web_unlocker_zone="sdk_unlocker",   # Default changed
    serp_zone="sdk_serp",               # Default changed
    browser_zone="sdk_browser"          # Default changed
)

# ✅ New
client = BrightDataClient(
    token="token",                      # Renamed from api_token
    auto_create_zones=False,            # New default
    web_unlocker_zone="web_unlocker1",  # New default name
    serp_zone="serp_api1",              # New default name
    browser_zone="browser_api1",        # New default name
    timeout=30,                         # New parameter
    rate_limit=10,                      # New parameter (optional)
    rate_period=1.0                     # New parameter
)
```
✨ New Features
Platform Coverage
| Platform | Status | Methods |
|---|---|---|
| Amazon | ✅ NEW | products(), reviews(), sellers() |
| Instagram | ✅ NEW | profiles(), posts(), comments(), reels() |
| Facebook | ✅ NEW | posts(), comments(), groups() |
| LinkedIn | ✅ Enhanced | Full scraping and search |
| ChatGPT | ✅ Enhanced | Improved interaction |
| Google/Bing/Yandex | ✅ Enhanced | Dedicated services |
Performance
- ⚡ 10x better concurrency - Event loop-based architecture
- 🔌 Advanced connection pooling - 100 total, 30 per host
- 🎯 Built-in rate limiting - Configurable request throttling (see the sketch below)
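Rate limiting composes with async batching; a minimal sketch, assuming the `rate_limit`/`rate_period` parameters shown in the configuration section above:

```python
import asyncio

from brightdata import BrightDataClient

async def main():
    # Throttle to at most 10 requests per 1.0-second window
    async with BrightDataClient(token="your_token", rate_limit=10, rate_period=1.0) as client:
        urls = [f"https://example.com/page/{i}" for i in range(50)]
        results = await asyncio.gather(*(client.scrape_url_async(u) for u in urls))
        print(f"Scraped {len(results)} pages")

asyncio.run(main())
```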
✅ Upgrade Checklist
- Update Python to 3.9+
- Change imports: `bdclient` → `BrightDataClient`
- Update parameter: `api_token=` → `token=`
- Migrate method calls to the hierarchical structure
- Handle new `ScrapeResult`/`SearchResult` return types
- Review zone configuration defaults
- Consider async for better performance
- Test in a staging environment
📚 Resources
Full Changelog: v1.1.3...v2.0.0
v1.1.3
New Features:
- Added `url` parameter to the `extract` function for direct URL specification
- Added `output_scheme` parameter for OpenAI Structured Outputs support (see the sketch below)
- Enhanced `parse_content` to auto-detect multiple results from batch operations
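A minimal sketch of the two new `extract` parameters; the `query` argument and the exact schema shape are illustrative assumptions, not confirmed API:

```python
from brightdata import bdclient

client = bdclient(api_token="your_token")

# Pass the target page directly via `url` and constrain the output with
# an OpenAI Structured Outputs JSON schema via `output_scheme`.
result = client.extract(
    url="https://example.com/product",           # new in v1.1.3
    query="Extract the product name and price",  # assumed natural-language query argument
    output_scheme={                               # new in v1.1.3
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "string"},
        },
        "required": ["name", "price"],
        "additionalProperties": False,
    },
)
print(result)
```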
Improvements:
- Added user-agent headers to all dataset API requests for better tracking
- Improved schema validation for OpenAI Structured Outputs compatibility
- Updated examples with proper formatting
Bug Fixes:
- Fixed parse_content handling of multiple scraping results
- Fixed OpenAI schema validation requirements
v1.1.2: AI-Powered Extract Function and LinkedIn Sync Improvements
New Features
- AI-Powered Extract Function: New `extract()` function that combines web scraping with OpenAI's language models to extract targeted information from web pages using natural language queries
- LinkedIn Sync Mode Fix: Fixed LinkedIn scraping sync mode to use the correct API endpoint and request structure for immediate data retrieval (see the sketch below)
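A short sketch of sync-mode LinkedIn scraping per these notes; the profile URL is illustrative:

```python
from brightdata import bdclient

client = bdclient(api_token="your_token")

# sync=True (now the default) returns results immediately via the /scrape endpoint
profiles = client.scrape_linkedin.profiles(
    ["https://www.linkedin.com/in/example"],
    sync=True,
)
print(profiles)
```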
Improvements
- Set sync=True as default for all LinkedIn scraping methods for better user experience
- Improved unit test coverage
- Enhanced error handling for LinkedIn API responses
Examples
- Added `extract_example.py` demonstrating AI-powered content extraction capabilities
- Updated LinkedIn examples to showcase sync functionality
Technical Changes
- Use correct `/scrape` endpoint for synchronous LinkedIn requests
- Pass dataset_id as a URL parameter with proper flags
- Handle both 200 and 202 status codes appropriately
- Maintain backward compatibility for async operations
v1.1.1: Documentation Updates & Bug Fixes
Updates
- Enhanced README with examples for `crawl()`, `parse_content()`, and `connect_browser()` functions
- Added complete client parameter documentation
- Fixed browser connection example import issues
- Improved CI workflow for PyPI package testing
Bug Fixes
- Fixed missing Playwright import in browser example
- Corrected example URL typo
- Updated test workflow to prevent PyPI race conditions
v1.1.0: Web Crawling, Content Parsing & Browser Automation
New Features
🕷️ Web Crawling
- crawl() function for discovering and scraping multiple pages from websites (see the sketch below)
- Advanced filtering with regex patterns for URL inclusion/exclusion
- Configurable crawl depth and sitemap handling
- Custom output schema support
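A hedged sketch of a crawl call; the filtering and depth parameter names below are illustrative assumptions based on the feature list, not confirmed signatures:

```python
from brightdata import bdclient

client = bdclient(api_token="your_token")

# Discover and scrape pages under example.com, keeping only blog URLs
results = client.crawl(
    "https://example.com",
    filter="/blog/",            # assumed name: regex for URLs to include
    exclude_filter=r"\?page=",  # assumed name: regex for URLs to skip
    depth=2,                    # assumed name: maximum crawl depth
)
```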
🔍 Content Parsing
- parse_content() function for extracting useful data from API responses (sketched below)
- Support for text extraction, link discovery, and image URL collection
- Handles both JSON responses and raw HTML content
- Structured data extraction from various content formats
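A minimal sketch of parsing a raw scrape response; the keys on the parsed result are assumptions for illustration:

```python
from brightdata import bdclient

client = bdclient(api_token="your_token")

# Scrape a page with the v1.x flat API, then extract structured data from it
raw = client.scrape("https://example.com", zone="my_zone")
parsed = client.parse_content(raw)

print(parsed["text"])    # assumed key: extracted page text
print(parsed["links"])   # assumed key: discovered links
print(parsed["images"])  # assumed key: collected image URLs
```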
🌐 Browser Automation
- connect_browser() function for Playwright/Selenium integration (see the Playwright sketch below)
- WebSocket endpoint generation for scraping browser connections
- Support for multiple browser automation tools (Playwright, Puppeteer, Selenium)
- Seamless authentication with Bright Data's browser service
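A sketch of wiring connect_browser() into Playwright over CDP; the assumption here is that connect_browser() returns the WebSocket endpoint described above:

```python
from playwright.sync_api import sync_playwright

from brightdata import bdclient

client = bdclient(api_token="your_token")
ws_endpoint = client.connect_browser()  # WebSocket endpoint for the scraping browser

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(ws_endpoint)
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())
    browser.close()
```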
Improvements
📡 Better Async Handling
- Enhanced download_snapshot() with improved 202 status code handling
- Friendly status messages instead of exceptions for pending snapshots
- Better user experience for asynchronous data processing
🔧 Robust Error Handling
- Fixed zone creation error handling with proper exception propagation
- Added retry logic for network failures and temporary errors
- Improved zone management reliability
🐍 Python Support Update
- Updated to support Python 3.8+ (removed Python 3.7)
- Updated CI/CD pipeline for modern Python versions
- Added BeautifulSoup4 as core dependency
Dependencies
- Added: beautifulsoup4>=4.9.0 for content parsing
- Updated: Python compatibility to >=3.8
Examples
New example files demonstrate the enhanced functionality:
- `examples/crawl_example.py` - Web crawling usage
- `examples/browser_connection_example.py` - Browser automation setup
- `examples/parse_content_example.py` - Content parsing workflows
Release v1.0.7: LinkedIn Integration & Enhanced APIs
🚀 Major Features
LinkedIn Data Integration
- New `scrape_linkedin` class: Comprehensive LinkedIn data scraping for profiles, companies, jobs, and posts
- New `search_linkedin` class: Advanced LinkedIn content discovery with keyword and URL-based search
- Production-ready examples: Ready-to-use examples for all LinkedIn functionality
Enhanced ChatGPT API
- Renamed to `search_chatGPT`: More intuitive naming for ChatGPT interactions
- Sync/Async support: Choose between immediate results or background processing
- Improved NDJSON parsing: Better handling of multi-response data
Improved Architecture
- Modular design: Separated download functionality into dedicated module
- Better code organization: Specialized API modules for different services
- Production optimizations: Cleaner code with improved performance
🔧 API Enhancements
New LinkedIn Methods
```python
# Scrape LinkedIn data
client.scrape_linkedin.profiles(urls)
client.scrape_linkedin.companies(urls)
client.scrape_linkedin.jobs(urls)
client.scrape_linkedin.posts(urls)

# Search LinkedIn content
client.search_linkedin.profiles(first_name, last_name)
client.search_linkedin.jobs(location="Paris", keyword="developer")
client.search_linkedin.posts(company_url="https://linkedin.com/company/bright-data")
```
Enhanced ChatGPT API
```python
# Synchronous (immediate results)
result = client.search_chatGPT(prompt="Your question", sync=True)

# Asynchronous (background processing)
result = client.search_chatGPT(prompt="Your question", sync=False)
```
🛠️ Technical Improvements
- Better error handling: Enhanced validation and error messages
- Backward compatibility: All existing code continues to work
- Performance optimizations: Faster processing and reduced memory usage
- Production-ready code: Clean, efficient, and maintainable codebase
📝 Breaking Changes
- `scrape_chatGPT()` renamed to `search_chatGPT()` (maintains the same functionality)
- Added `sync` parameter to the ChatGPT API (defaults to `True`)
🐛 Bug Fixes
- Fixed NDJSON response parsing for multi-line JSON data
- Improved parameter validation across all APIs
- Enhanced timeout handling for long-running requests
📚 Documentation
- Updated examples with new LinkedIn functionality
- Enhanced docstrings for all new methods
- Added comprehensive usage examples
Release v1.0.6
Version 1.0.6 - Changed default data_format to html for better output formatting
Release v1.0.5
Version 1.0.5 release
Release v1.0.4
What's New
- New JSON Parsing Feature: Added `parse` parameter to the search function
  - When `parse=True`, automatically appends `&brd_json=1` to search URLs
  - Enables structured JSON responses from search engines
  - Defaults to `False` for backward compatibility
Usage Example
```python
from brightdata import bdclient

client = bdclient(api_token="your-token")

# Enable JSON parsing
results = client.search(
    query="pizza restaurants",
    search_engine="google",
    parse=True
)
```
Changes
- Add parse parameter to search() method in both API and client
- Update documentation and examples
- Add comprehensive unit tests
- Maintain backward compatibility
v1.0.3
What's Changed
- Fixed CI/CD pipeline issues with deprecated GitHub Actions
- Enhanced country code validation flexibility
- Improved zone management and testing coverage
Full Changelog: https://github.com/brightdata/bright-data-sdk-python/blob/main/CHANGELOG.md