Releases · brightdata/sdk-python
v2.0.0 - Breaking Changes
🚀 v2.0.0 - Complete Architecture Rewrite
⚠️ Breaking Changes - Migration Required
This is a major breaking release requiring code changes. Python 3.9+ is now required.
Client Initialization
```python
# ❌ Old
from brightdata import bdclient
client = bdclient(api_token="your_token")

# ✅ New
from brightdata import BrightDataClient
client = BrightDataClient(token="your_token")
```
API Structure - Hierarchical Methods
```python
# ❌ Old - Flat API
client.scrape_linkedin.profiles(url)
client.search_linkedin.jobs()
result = client.scrape(url, zone="my_zone")

# ✅ New - Hierarchical API
client.scrape.linkedin.profiles(url)
client.search.linkedin.jobs()
result = client.scrape_url(url, zone="my_zone")
```
Platform-Specific Scraping
```python
# ✅ New - Recommended approach
client.scrape.amazon.products(url)
client.scrape.amazon.reviews(url)
client.scrape.amazon.sellers(url)
client.scrape.linkedin.profiles(url)
client.scrape.instagram.profiles(url)
client.scrape.facebook.posts(url)
```
Search Operations
```python
# ❌ Old
results = client.search(query, search_engine="google")

# ✅ New - Dedicated methods
client.search.google(query)
client.search.bing(query)
client.search.yandex(query)
```
Async Support (New)
```python
import asyncio

# ✅ Sync (still supported)
client = BrightDataClient(token="...")
result = client.scrape_url(url)

# ✅ Async (recommended for performance)
async with BrightDataClient(token="...") as client:
    result = await client.scrape_url_async(url)

# ✅ Async batch operations
async def scrape_multiple():
    async with BrightDataClient(token="...") as client:
        tasks = [client.scrape_url_async(url) for url in urls]
        results = await asyncio.gather(*tasks)
```
Manual Job Control (New)
```python
# ✅ Fine-grained control ('scraper' is a platform scraper from the hierarchical API)
job = await scraper.trigger(url)
# Do other work...
status = await job.status_async()
if status == "ready":
    data = await job.fetch_async()
```
Type-Safe Payloads (New)
```python
# ❌ Old - untyped dicts
payload = {"url": "...", "reviews_count": 100}

# ✅ New - structured with validation
from brightdata import AmazonProductPayload

payload = AmazonProductPayload(
    url="https://amazon.com/dp/B123",
    reviews_count=100
)
result = client.scrape.amazon.products(payload)
```
Return Types
```python
# ✅ New - structured objects with metadata
result = client.scrape.amazon.products(url)
print(result.data)         # Actual scraped data
print(result.timing)       # Performance metrics
print(result.cost)         # Cost tracking
print(result.snapshot_id)  # Job identifier
```
CLI Tool (New)
```bash
# ✅ Command-line interface
brightdata scrape amazon products --url https://amazon.com/dp/B123
brightdata search google --query "python sdk"
brightdata search linkedin jobs --location "Paris"
brightdata crawler discover --url https://example.com --depth 3
```
Configuration Changes
```python
# ❌ Old
client = bdclient(
    api_token="token",                  # Changed parameter name
    auto_create_zones=True,             # Default changed to False
    web_unlocker_zone="sdk_unlocker",   # Default changed
    serp_zone="sdk_serp",               # Default changed
    browser_zone="sdk_browser"          # Default changed
)

# ✅ New
client = BrightDataClient(
    token="token",                      # Renamed from api_token
    auto_create_zones=False,            # New default
    web_unlocker_zone="web_unlocker1",  # New default name
    serp_zone="serp_api1",              # New default name
    browser_zone="browser_api1",        # New default name
    timeout=30,                         # New parameter
    rate_limit=10,                      # New parameter (optional)
    rate_period=1.0                     # New parameter
)
```
✨ New Features
Platform Coverage
| Platform | Status | Methods |
|---|---|---|
| Amazon | ✅ NEW | products(), reviews(), sellers() |
| Instagram | ✅ NEW | profiles(), posts(), comments(), reels() |
| Facebook | ✅ NEW | posts(), comments(), groups() |
| LinkedIn | ✅ Enhanced | Full scraping and search |
| ChatGPT | ✅ Enhanced | Improved interaction |
| Google/Bing/Yandex | ✅ Enhanced | Dedicated services |
Performance
- ⚡ 10x better concurrency - Event loop-based architecture
- 🔌 Advanced connection pooling - 100 total, 30 per host
- 🎯 Built-in rate limiting - Configurable request throttling (see the sketch below)
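Rate limiting composes with async batching; a minimal sketch, assuming the `rate_limit`/`rate_period` parameters shown in the configuration section above:

```python
import asyncio

from brightdata import BrightDataClient

async def main():
    # Throttle to at most 10 requests per 1.0-second window
    async with BrightDataClient(token="your_token", rate_limit=10, rate_period=1.0) as client:
        urls = [f"https://example.com/page/{i}" for i in range(50)]
        results = await asyncio.gather(*(client.scrape_url_async(u) for u in urls))
        print(f"Scraped {len(results)} pages")

asyncio.run(main())
```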
✅ Upgrade Checklist
- Update Python to 3.9+
- Change imports: `bdclient` → `BrightDataClient`
- Update parameter: `api_token=` → `token=`
- Migrate method calls to the hierarchical structure
- Handle new `ScrapeResult`/`SearchResult` return types
- Review zone configuration defaults
- Consider async for better performance
- Test in a staging environment
📚 Resources
Full Changelog: v1.1.3...v2.0.0
v1.1.3
New Features:
- Added `url` parameter to the `extract` function for direct URL specification
- Added `output_scheme` parameter for OpenAI Structured Outputs support (see the sketch below)
- Enhanced `parse_content` to auto-detect multiple results from batch operations
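A minimal sketch of the two new `extract` parameters; the `query` argument and the exact schema shape are illustrative assumptions, not confirmed API:

```python
from brightdata import bdclient

client = bdclient(api_token="your_token")

# Pass the target page directly via `url` and constrain the output with
# an OpenAI Structured Outputs JSON schema via `output_scheme`.
result = client.extract(
    url="https://example.com/product",           # new in v1.1.3
    query="Extract the product name and price",  # assumed natural-language query argument
    output_scheme={                               # new in v1.1.3
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "string"},
        },
        "required": ["name", "price"],
        "additionalProperties": False,
    },
)
print(result)
```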
Improvements:
- Added user-agent headers to all dataset API requests for better tracking
- Improved schema validation for OpenAI Structured Outputs compatibility
- Updated examples with proper formatting
Bug Fixes:
- Fixed parse_content handling of multiple scraping results
- Fixed OpenAI schema validation requirements
v1.1.2: AI-Powered Extract Function and LinkedIn Sync Improvements
New Features
- AI-Powered Extract Function: New `extract()` function that combines web scraping with OpenAI's language models to extract targeted information from web pages using natural language queries
- LinkedIn Sync Mode Fix: Fixed LinkedIn scraping sync mode to use the correct API endpoint and request structure for immediate data retrieval (see the sketch below)
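A short sketch of sync-mode LinkedIn scraping per these notes; the profile URL is illustrative:

```python
from brightdata import bdclient

client = bdclient(api_token="your_token")

# sync=True (now the default) returns results immediately via the /scrape endpoint
profiles = client.scrape_linkedin.profiles(
    ["https://www.linkedin.com/in/example"],
    sync=True,
)
print(profiles)
```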
Improvements
- Set sync=True as default for all LinkedIn scraping methods for better user experience
- Improved unit test coverage
- Enhanced error handling for LinkedIn API responses
Examples
- Added `extract_example.py` demonstrating AI-powered content extraction capabilities
- Updated LinkedIn examples to showcase sync functionality
Technical Changes
- Use correct `/scrape` endpoint for synchronous LinkedIn requests
- Pass dataset_id as a URL parameter with proper flags
- Handle both 200 and 202 status codes appropriately
- Maintain backward compatibility for async operations
v1.1.1: Documentation Updates & Bug Fixes
Updates
- Enhanced README with examples for `crawl()`, `parse_content()`, and `connect_browser()` functions
- Added complete client parameter documentation
- Fixed browser connection example import issues
- Improved CI workflow for PyPI package testing
Bug Fixes
- Fixed missing Playwright import in browser example
- Corrected example URL typo
- Updated test workflow to prevent PyPI race conditions
v1.1.0: Web Crawling, Content Parsing & Browser Automation
New Features
🕷️ Web Crawling
- crawl() function for discovering and scraping multiple pages from websites (see the sketch below)
- Advanced filtering with regex patterns for URL inclusion/exclusion
- Configurable crawl depth and sitemap handling
- Custom output schema support
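A hedged sketch of a crawl call; the filtering and depth parameter names below are illustrative assumptions based on the feature list, not confirmed signatures:

```python
from brightdata import bdclient

client = bdclient(api_token="your_token")

# Discover and scrape pages under example.com, keeping only blog URLs
results = client.crawl(
    "https://example.com",
    filter="/blog/",            # assumed name: regex for URLs to include
    exclude_filter=r"\?page=",  # assumed name: regex for URLs to skip
    depth=2,                    # assumed name: maximum crawl depth
)
```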
🔍 Content Parsing
- parse_content() function for extracting useful data from API responses (sketched below)
- Support for text extraction, link discovery, and image URL collection
- Handles both JSON responses and raw HTML content
- Structured data extraction from various content formats
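A minimal sketch of parsing a raw scrape response; the keys on the parsed result are assumptions for illustration:

```python
from brightdata import bdclient

client = bdclient(api_token="your_token")

# Scrape a page with the v1.x flat API, then extract structured data from it
raw = client.scrape("https://example.com", zone="my_zone")
parsed = client.parse_content(raw)

print(parsed["text"])    # assumed key: extracted page text
print(parsed["links"])   # assumed key: discovered links
print(parsed["images"])  # assumed key: collected image URLs
```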
🌐 Browser Automation
- connect_browser() function for Playwright/Selenium integration (see the Playwright sketch below)
- WebSocket endpoint generation for scraping browser connections
- Support for multiple browser automation tools (Playwright, Puppeteer, Selenium)
- Seamless authentication with Bright Data's browser service
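A sketch of wiring connect_browser() into Playwright over CDP; the assumption here is that connect_browser() returns the WebSocket endpoint described above:

```python
from playwright.sync_api import sync_playwright

from brightdata import bdclient

client = bdclient(api_token="your_token")
ws_endpoint = client.connect_browser()  # WebSocket endpoint for the scraping browser

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(ws_endpoint)
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())
    browser.close()
```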
Improvements
📡 Better Async Handling
- Enhanced download_snapshot() with improved 202 status code handling
- Friendly status messages instead of exceptions for pending snapshots
- Better user experience for asynchronous data processing
🔧 Robust Error Handling
- Fixed zone creation error handling with proper exception propagation
- Added retry logic for network failures and temporary errors
- Improved zone management reliability
🐍 Python Support Update
- Updated to support Python 3.8+ (removed Python 3.7)
- Updated CI/CD pipeline for modern Python versions
- Added BeautifulSoup4 as core dependency
Dependencies
- Added: beautifulsoup4>=4.9.0 for content parsing
- Updated: Python compatibility to >=3.8
Examples
New example files demonstrate the enhanced functionality:
- `examples/crawl_example.py` - Web crawling usage
- `examples/browser_connection_example.py` - Browser automation setup
- `examples/parse_content_example.py` - Content parsing workflows
Release v1.0.7: LinkedIn Integration & Enhanced APIs
🚀 Major Features
LinkedIn Data Integration
- New `scrape_linkedin` class: Comprehensive LinkedIn data scraping for profiles, companies, jobs, and posts
- New `search_linkedin` class: Advanced LinkedIn content discovery with keyword and URL-based search
- Production-ready examples: Ready-to-use examples for all LinkedIn functionality
Enhanced ChatGPT API
- Renamed to `search_chatGPT`: More intuitive naming for ChatGPT interactions
- Sync/Async support: Choose between immediate results or background processing
- Improved NDJSON parsing: Better handling of multi-response data
Improved Architecture
- Modular design: Separated download functionality into dedicated module
- Better code organization: Specialized API modules for different services
- Production optimizations: Cleaner code with improved performance
🔧 API Enhancements
New LinkedIn Methods
```python
# Scrape LinkedIn data
client.scrape_linkedin.profiles(urls)
client.scrape_linkedin.companies(urls)
client.scrape_linkedin.jobs(urls)
client.scrape_linkedin.posts(urls)

# Search LinkedIn content
client.search_linkedin.profiles(first_name, last_name)
client.search_linkedin.jobs(location="Paris", keyword="developer")
client.search_linkedin.posts(company_url="https://linkedin.com/company/bright-data")
```
Enhanced ChatGPT API
```python
# Synchronous (immediate results)
result = client.search_chatGPT(prompt="Your question", sync=True)

# Asynchronous (background processing)
result = client.search_chatGPT(prompt="Your question", sync=False)
```
🛠️ Technical Improvements
- Better error handling: Enhanced validation and error messages
- Backward compatibility: All existing code continues to work
- Performance optimizations: Faster processing and reduced memory usage
- Production-ready code: Clean, efficient, and maintainable codebase
📝 Breaking Changes
- `scrape_chatGPT()` renamed to `search_chatGPT()` (maintains the same functionality)
- Added `sync` parameter to the ChatGPT API (defaults to `True`)
🐛 Bug Fixes
- Fixed NDJSON response parsing for multi-line JSON data
- Improved parameter validation across all APIs
- Enhanced timeout handling for long-running requests
📚 Documentation
- Updated examples with new LinkedIn functionality
- Enhanced docstrings for all new methods
- Added comprehensive usage examples
Release v1.0.6
Version 1.0.6 - Changed default data_format to html for better output formatting
Release v1.0.5
Version 1.0.5 release
Release v1.0.4
What's New
- New JSON Parsing Feature: Added `parse` parameter to the search function
  - When `parse=True`, automatically appends `&brd_json=1` to search URLs
  - Enables structured JSON responses from search engines
  - Defaults to `False` for backward compatibility
Usage Example
```python
from brightdata import bdclient

client = bdclient(api_token="your-token")

# Enable JSON parsing
results = client.search(
    query="pizza restaurants",
    search_engine="google",
    parse=True
)
```
Changes
- Add parse parameter to search() method in both API and client
- Update documentation and examples
- Add comprehensive unit tests
- Maintain backward compatibility
v1.0.3
What's Changed
- Fixed CI/CD pipeline issues with deprecated GitHub Actions
- Enhanced country code validation flexibility
- Improved zone management and testing coverage
Full Changelog: https://github.com/brightdata/bright-data-sdk-python/blob/main/CHANGELOG.md