@richiemcilroy richiemcilroy commented Dec 30, 2025

Enhance audio-video synchronization by refining timestamp propagation, sample count calculations, and playback drift correction.

This PR addresses several subtle issues across the audio pipeline:

  1. Audio Renderer: Correctly accounts for per-track offsets in sample count calculations, preventing incorrect audio duration.
  2. Editor Playback: Improves editor A/V sync by reducing the cursor adjustment threshold from ~200ms to ~33ms (one frame) for more precise audio seeking and drift correction.
  3. Playback Sync: Tightens the general A/V sync threshold from 150ms to 50ms for quicker drift detection.
  4. Audio Mixer: Simplifies and clarifies timestamp propagation in the recording mixer for more accurate sync from capture.

These changes collectively ensure more robust and precise audio-video synchronization throughout the recording, editing, and playback processes.
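
To make the threshold figures above concrete, here is a minimal sketch of the sample/millisecond arithmetic. The 48 kHz sample rate and the helper names are assumptions for illustration. The PR text only names `SAMPLE_RATE / 5` for the old editor threshold and "one frame at 30fps" for the new one.

```rust
/// Assumed output sample rate; the PR names `SAMPLE_RATE` but not its value.
const SAMPLE_RATE: u32 = 48_000;
/// Frame rate behind the "one frame" threshold (30 fps per the PR summary).
const FPS: u32 = 30;

/// Old cursor-adjustment threshold: SAMPLE_RATE / 5 samples (200 ms).
const fn old_cursor_threshold_samples() -> u32 {
    SAMPLE_RATE / 5
}

/// New cursor-adjustment threshold: one video frame of audio (~33 ms).
const fn new_cursor_threshold_samples() -> u32 {
    SAMPLE_RATE / FPS
}

/// Convert a sample count to milliseconds at SAMPLE_RATE.
fn samples_to_ms(samples: u32) -> f64 {
    samples as f64 * 1000.0 / SAMPLE_RATE as f64
}

/// Convert a duration in milliseconds to a sample count at SAMPLE_RATE.
fn ms_to_samples(ms: u32) -> u32 {
    (SAMPLE_RATE as u64 * ms as u64 / 1000) as u32
}
```

Under these assumptions the editor threshold shrinks from 9,600 to 1,600 samples, and the general playback threshold (150 ms to 50 ms) from 7,200 to 2,400 samples.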


Greptile Summary

This PR significantly improves audio-video synchronization through multiple coordinated enhancements across the recording, editing, and playback pipeline.

Key Changes

  • Audio Renderer: Fixed sample count calculations in render_audio to properly account for per-track offsets using signed arithmetic, preventing incorrect audio duration when tracks have different start times (crates/audio/src/renderer.rs:23-37)

  • Editor Playback Sync: Reduced cursor adjustment threshold from ~200ms (SAMPLE_RATE / 5) to ~33ms (one frame at 30fps) for more precise audio seeking and tighter A/V sync during editing (crates/editor/src/audio.rs:135-136)

  • Playback Drift Correction: Tightened general playback sync threshold from 150ms to 50ms for faster detection and correction of audio-video drift (crates/editor/src/playback.rs:937)

  • Audio Mixer Timestamp Propagation: Simplified and clarified timestamp calculation by computing output timestamps from samples_out relative to start_timestamp, and normalized output frame metadata to ensure downstream encoders receive consistent sample rate information (crates/recording/src/sources/audio_mixer.rs:380-403)
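
As an illustration of the renderer fix in the first bullet, the sketch below shows why signed arithmetic matters when tracks carry different offsets. It is not the code from `render_audio`: the `Track` struct, its field names, and both helper functions are hypothetical.

```rust
/// Hypothetical track description; names are illustrative, not from the PR.
struct Track {
    /// Number of samples (per channel) the track contains.
    len_samples: usize,
    /// Position of the track start on the timeline, in samples.
    /// Negative means the track begins before the timeline origin.
    offset_samples: i64,
}

/// Timeline position (in samples) at which the last track ends.
/// The sum is done in i64 so a negative offset cannot underflow `usize`.
fn max_timeline_samples(tracks: &[Track]) -> usize {
    tracks
        .iter()
        .map(|t| (t.len_samples as i64 + t.offset_samples).max(0) as usize)
        .max()
        .unwrap_or(0)
}

/// Samples that can still be rendered starting from `cursor`.
fn remaining_samples(tracks: &[Track], cursor: usize) -> usize {
    max_timeline_samples(tracks).saturating_sub(cursor)
}
```

Ignoring the offsets would report 1,000 samples for three 1,000-sample tracks at offsets 0, +500, and -2,000, whereas the offset-aware length is 1,500.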

New Features

  • Device Calibration System: Added comprehensive sync calibration infrastructure that detects audio transients and video motion to compute device-pair-specific offsets, stores them persistently, and automatically applies them when opening projects (crates/audio/src/calibration_store.rs, crates/audio/src/sync_analysis.rs, crates/recording/src/sync_calibration.rs)

  • Input Latency Estimation: Added macOS input latency estimation using CoreAudio APIs to measure device latency, buffer latency, and stream latency for more accurate capture timing (crates/audio/src/latency.rs:509-631)

  • Auto-Generated Clip Offsets: Editor now automatically generates clip offsets with calibration data when opening projects without pre-configured clips (crates/editor/src/editor_instance.rs:142-176)

Technical Quality

The changes demonstrate careful attention to timing precision and maintain backwards compatibility through optional device IDs and default values. The calibration system includes confidence thresholding (>0.5) and weighted averaging across multiple measurements. All new code follows repository conventions including proper error handling and unit tests.
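
The confidence thresholding and weighted averaging mentioned above could look roughly like the following. This is a sketch under assumptions: the `Measurement` type, the use of confidence as the averaging weight, and the function name are hypothetical, and only the `> 0.5` cutoff comes from the summary.

```rust
/// Hypothetical single calibration measurement for one device pair.
#[derive(Debug, Clone, Copy)]
struct Measurement {
    /// Measured audio-to-video offset in milliseconds.
    offset_ms: f64,
    /// Confidence in the measurement, in the range 0.0..=1.0.
    confidence: f64,
}

/// Cutoff stated in the summary: only measurements above 0.5 are used.
const MIN_CONFIDENCE: f64 = 0.5;

/// Confidence-weighted mean of all measurements above the cutoff.
/// Returns `None` when no measurement is confident enough.
fn calibrated_offset_ms(measurements: &[Measurement]) -> Option<f64> {
    let (weighted_sum, weight_total) = measurements
        .iter()
        .filter(|m| m.confidence > MIN_CONFIDENCE)
        .fold((0.0_f64, 0.0_f64), |(sum, total), m| {
            (sum + m.offset_ms * m.confidence, total + m.confidence)
        });

    if weight_total > 0.0 {
        Some(weighted_sum / weight_total)
    } else {
        None
    }
}
```

Returning `Option` keeps the "no trustworthy calibration" case explicit, so a caller can fall back to a zero offset rather than apply a noisy one.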

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • All changes are well-isolated improvements to A/V sync logic with proper backwards compatibility. The audio renderer fix uses safe signed arithmetic, threshold reductions are conservative improvements, and the new calibration system is opt-in with confidence filtering. Comprehensive unit tests validate the calibration storage and sync analysis logic. No breaking changes or risky refactoring.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/audio/src/renderer.rs | Fixed sample count calculation to properly account for per-track offsets, preventing incorrect duration |
| crates/editor/src/audio.rs | Reduced cursor adjustment threshold from ~200ms to ~33ms for tighter editor A/V sync |
| crates/editor/src/playback.rs | Tightened playback sync threshold from 150ms to 50ms for faster drift correction |
| crates/recording/src/sources/audio_mixer.rs | Simplified timestamp propagation and normalized output frame metadata for accurate sync from capture |
| crates/audio/src/sync_analysis.rs | New post-recording sync analysis detecting audio transients and video motion for offset calibration |
| crates/editor/src/editor_instance.rs | Auto-generates clip offsets using calibration data when opening projects |
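
For the `audio_mixer.rs` entry above, a minimal sketch of deriving each output timestamp from `samples_out` relative to `start_timestamp` follows. Only those two names appear in the PR summary. The `MixerClock` type, its methods, and the use of `std::time::Duration` are assumptions for illustration.

```rust
use std::time::Duration;

/// Hypothetical mixer clock; only `start_timestamp` and `samples_out`
/// are names taken from the PR summary.
struct MixerClock {
    /// Timestamp of the first mixed sample.
    start_timestamp: Duration,
    /// Output sample rate in Hz.
    sample_rate: u32,
    /// Total samples emitted so far.
    samples_out: u64,
}

impl MixerClock {
    fn new(start_timestamp: Duration, sample_rate: u32) -> Self {
        Self { start_timestamp, sample_rate, samples_out: 0 }
    }

    /// Returns the timestamp of the next output frame, then advances the
    /// sample counter by `frame_samples`.
    fn next_timestamp(&mut self, frame_samples: u64) -> Duration {
        // Integer nanosecond math avoids accumulating float rounding error.
        let elapsed_nanos =
            self.samples_out as u128 * 1_000_000_000 / self.sample_rate as u128;
        let timestamp = self.start_timestamp + Duration::from_nanos(elapsed_nanos as u64);
        self.samples_out += frame_samples;
        timestamp
    }
}
```

Because every timestamp is recomputed from the running sample count, rounding error in one frame does not carry into the next, which is one plausible reason to prefer this over adding per-frame durations.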

Sequence Diagram

sequenceDiagram
    participant Recording as Recording System
    participant Mixer as Audio Mixer
    participant Calibration as Calibration Store
    participant Editor as Editor Instance
    participant Renderer as Audio Renderer
    participant Playback as Playback Engine

    Note over Recording,Playback: Recording Phase
    Recording->>Mixer: Audio frames with timestamps
    Mixer->>Mixer: Buffer sources & detect gaps
    Mixer->>Mixer: Normalize frame metadata (sample rate)
    Mixer->>Mixer: Calculate output timestamp from samples_out
    Mixer->>Recording: Mixed audio with corrected timestamps

    Note over Recording,Playback: Post-Recording Analysis
    Recording->>Calibration: Analyze audio transients
    Recording->>Calibration: Analyze video motion peaks
    Calibration->>Calibration: Correlate events & compute offset
    Calibration->>Calibration: Store device-pair calibration (if confidence > 0.5)

    Note over Recording,Playback: Editor Loading
    Editor->>Calibration: Load calibration store
    Editor->>Editor: Check if project.clips is empty
    alt Clips need generation
        Editor->>Calibration: Get offset for camera+mic pair
        Editor->>Editor: Calculate offsets with calibration
        Editor->>Editor: Save generated clip offsets to project
    end

    Note over Recording,Playback: Playback Phase
    Playback->>Renderer: Request audio frame
    Renderer->>Renderer: Adjust cursor (threshold: 33ms)
    Renderer->>Renderer: Calculate max_samples with track offsets
    Renderer->>Renderer: Render audio with per-track offsets
    Renderer->>Playback: Audio frame
    
    Playback->>Playback: Monitor video playhead changes
    alt Drift > 50ms threshold
        Playback->>Renderer: Seek to new video playhead
        Renderer->>Renderer: Reset cursor position
    end
    Playback->>Playback: Fill output buffer
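
The drift branch in the playback phase of the diagram can be sketched as a simple threshold check. The 50 ms value comes from the PR. The `AudioCursor` type, its field, and the function names are hypothetical and are not taken from `playback.rs`.

```rust
/// Playback sync threshold from the PR: 50 ms (previously 150 ms).
const SYNC_THRESHOLD_SECS: f64 = 0.05;

/// True when the audio and video playheads differ by more than the threshold.
fn needs_resync(audio_playhead_secs: f64, video_playhead_secs: f64) -> bool {
    (audio_playhead_secs - video_playhead_secs).abs() > SYNC_THRESHOLD_SECS
}

/// Hypothetical audio cursor holding the current audio playhead.
struct AudioCursor {
    playhead_secs: f64,
}

impl AudioCursor {
    /// Seeks to the video playhead only when drift exceeds the threshold.
    /// Returns whether a seek happened.
    fn sync_to(&mut self, video_playhead_secs: f64) -> bool {
        if needs_resync(self.playhead_secs, video_playhead_secs) {
            self.playhead_secs = video_playhead_secs;
            true
        } else {
            false
        }
    }
}
```

Small differences below the threshold are left alone, so the audio is not repeatedly re-seeked for jitter that is too small to matter.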

Co-authored-by: richiemcilroy1 <richiemcilroy1@gmail.com>

cursoragent and others added 4 commits December 30, 2025 19:36
Co-authored-by: richiemcilroy1 <richiemcilroy1@gmail.com>
@richiemcilroy richiemcilroy marked this pull request as ready for review December 30, 2025 21:09
@richiemcilroy richiemcilroy merged commit 9e7a067 into main Dec 30, 2025
16 checks passed
