
Conversation

@AnkushMalaker (Collaborator) commented Jan 12, 2026

  • Updated `upload_and_process_audio_files` and `start_post_conversation_jobs` to enqueue transcription jobs separately for file uploads, ensuring accurate processing order.
  • Enhanced logging to provide clearer insights into job enqueuing and processing stages.
  • Removed batch transcription from the post-conversation job chain for streaming audio, utilizing the streaming transcript directly.
  • Introduced word-level timestamps in the `Conversation` model to improve transcript detail and accuracy.
  • Updated tests to reflect changes in job handling and ensure proper verification of post-conversation processing.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added word-level timestamp data for speaker segments in audio conversations, enabling precise temporal tracking of conversation content.
  • Improvements

    • Refactored audio processing pipeline with streamlined background job orchestration for improved efficiency.
    • Enhanced CI/CD workflow documentation with detailed deployment and troubleshooting guides.

✏️ Tip: You can customize this high-level summary in your review settings.

coderabbitai bot commented Jan 12, 2026

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the `.coderabbit.yaml` file in this repository. To trigger a single review, invoke the `@coderabbitai review` command.

You can disable this status message by setting `reviews.review_status` to `false` in the CodeRabbit configuration file.

📝 Walkthrough

This PR refactors the audio transcription workflow to enqueue transcription as an upfront dependency rather than as part of the post-conversation processing chain. Uploaded audio files now trigger batch transcription first, while streaming conversations use their existing streaming transcripts. Job orchestration is restructured to remove transcription from the post-conversation pipeline, and worker logic is updated accordingly.
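For illustration, here is a minimal sketch of the new upload path, assuming the backend's job queue is RQ (suggested by the `Job` type and `depends_on`-style chaining in this PR, but not confirmed). Only `transcribe_full_audio_job`, `start_post_conversation_jobs`, and the `depends_on_job` keyword come from the PR; the queue name, the transcription job's argument list, and the job-ID format are illustrative.

```python
from redis import Redis
from rq import Queue

# Module paths are inferred from the file list below; the function names come from this PR.
from advanced_omi_backend.controllers.queue_controller import start_post_conversation_jobs
from advanced_omi_backend.workers.transcription_jobs import transcribe_full_audio_job


def enqueue_upload_processing(conversation_id: str, audio_uuid: str, user_id: str) -> dict:
    """Enqueue batch transcription first, then chain post-conversation jobs behind it."""
    queue = Queue("transcription", connection=Redis())  # queue name is illustrative

    transcription_job = queue.enqueue(
        transcribe_full_audio_job,
        conversation_id,                                 # actual argument list is not shown in the PR
        job_id=f"transcription_{conversation_id[:12]}",  # illustrative job-ID format
        job_timeout=30 * 60,                             # 30-minute timeout noted in the review
    )

    # depends_on_job makes the whole post-conversation chain wait for the transcript.
    return start_post_conversation_jobs(
        conversation_id=conversation_id,
        audio_uuid=audio_uuid,
        user_id=user_id,
        depends_on_job=transcription_job,
    )
```

Streaming conversations skip the first enqueue entirely and call `start_post_conversation_jobs` with no transcription dependency, since the streaming transcript already exists.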

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Workflow Documentation**<br>`.github/workflows/README.md` | Comprehensive reorganization and expansion of CI/CD workflow documentation with detailed sections on triggers, secrets, test coverage, usage snippets, and troubleshooting guidance. No functional code changes. |
| **Batch Transcription Enqueue**<br>`backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py`, `backends/advanced/src/advanced_omi_backend/controllers/websocket_controller.py` | Both controllers now enqueue a transcription job upfront and pass it as a dependency via the new `depends_on_job` parameter. Transcription job ID is propagated to response payloads and logs, replacing previous references. |
| **Job Orchestration**<br>`backends/advanced/src/advanced_omi_backend/controllers/queue_controller.py` | Signature updated to remove the `post_transcription` parameter and accept `depends_on_job`. Job chain restructured: speaker recognition now uses streaming transcripts directly, memory extraction is decoupled from transcription, title/summary depends only on the speaker job, and event dispatch depends on both memory and title/summary. Return value updated to exclude the transcription key. |
| **Data Model Enhancement**<br>`backends/advanced/src/advanced_omi_backend/models/conversation.py` | Added nested `Word` model with timestamp fields (`word`, `start`, `end`, `confidence`). Extended `SpeakerSegment` with a `words` list to capture word-level timestamps (see the model sketch after this table). |
| **Streaming Transcript Processing**<br>`backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py` | Replaced immediate post-conversation enqueue with retrieval and persistence of the final streaming transcript. Builds speaker segments with word-level timestamp data, determines the transcript provider, and adds the transcript as a new conversation version before calling post-conversation jobs. |
| **Test Updates**<br>`tests/endpoints/plugin_tests.robot`, `tests/integration/websocket_streaming_tests.robot`, `tests/resources/queue_keywords.robot` | Added `requires-api-keys` tags to two test cases. Updated integration test expectations to remove transcription from the post-conversation job suite for streaming conversations. Adjusted error message extraction priority in job status checks. |
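As referenced in the Data Model row above, here is a sketch of the new word-level timestamp models. Pydantic `BaseModel` is an assumption, as are `SpeakerSegment`'s pre-existing fields; the `word`/`start`/`end`/`confidence` fields, the optional confidence, and the `words` list with `default_factory=list` come from the review comments below. The PR nests `Word` inside the conversation model; it is shown flat here for brevity.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class Word(BaseModel):
    """One transcribed word with its timestamps."""
    word: str
    start: float                        # seconds from the start of the audio (unit assumed)
    end: float
    confidence: Optional[float] = None  # optional, per the review


class SpeakerSegment(BaseModel):
    speaker: str                        # pre-existing fields are illustrative
    text: str
    words: List[Word] = Field(default_factory=list)  # avoids a shared mutable default
```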

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant AudioController
    participant TranscriptionQueue
    participant QueueController
    participant Workers

    Client->>AudioController: Upload audio file
    AudioController->>TranscriptionQueue: Enqueue transcribe_full_audio_job
    TranscriptionQueue-->>AudioController: Return transcription_job_id
    AudioController->>QueueController: start_post_conversation_jobs(depends_on_job=transcription_job)
    QueueController->>Workers: Enqueue speaker_recognition (depends on transcription)
    QueueController->>Workers: Enqueue memory (depends on speaker)
    QueueController->>Workers: Enqueue title_summary (depends on speaker)
    QueueController->>Workers: Enqueue event_dispatch (depends on memory + title_summary)
    Workers-->>Client: Pipeline complete
sequenceDiagram
    participant Client
    participant WebSocketController
    participant ConversationWorker
    participant QueueController
    participant Workers

    Client->>WebSocketController: Stream audio (websocket)
    WebSocketController->>ConversationWorker: Process streaming transcript
    ConversationWorker->>ConversationWorker: Retrieve final streaming transcript
    ConversationWorker->>ConversationWorker: Build speaker segments with word timestamps
    ConversationWorker->>ConversationWorker: Add transcript version to conversation
    ConversationWorker->>QueueController: start_post_conversation_jobs(no transcription dependency)
    QueueController->>Workers: Enqueue speaker_recognition (uses streaming transcript)
    QueueController->>Workers: Enqueue memory (depends on speaker)
    QueueController->>Workers: Enqueue title_summary (depends on speaker)
    QueueController->>Workers: Enqueue event_dispatch (depends on memory + title_summary)
    Workers-->>Client: Pipeline complete

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: refactoring audio processing and job handling specifically for transcription workflows, which is central to all modifications across multiple controller and worker modules. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@AnkushMalaker (Collaborator, Author) commented:
@coderabbitai review

coderabbitai bot commented Jan 12, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot left a review comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/workflows/README.md:
- Line 408: Replace the bare URL on the line that currently reads "**GitHub Actions Docs**: https://docs.github.com/en/actions" with a proper markdown link; for example change it to "**GitHub Actions Docs**: [GitHub Actions Docs](https://docs.github.com/en/actions)" (or "**GitHub Actions Docs**: [https://docs.github.com/en/actions](https://docs.github.com/en/actions)") so the URL is formatted as a markdown link.
🧹 Nitpick comments (5)
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (1)

117-117: Unused variable: version_id is overwritten before use.

The version_id created here at line 117 is never used - it's overwritten at line 161 before being passed to the transcription job. Consider removing this declaration to avoid confusion.

♻️ Suggested fix
-                version_id = str(uuid.uuid4())
-
                 # Generate title from filename
backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py (1)

592-606: Minor: Redundant client_id reassignment.

Line 593 reassigns client_id from conversation.client_id, but client_id is already available from the function parameters at line 39. Since the conversation was fetched successfully (checked at line 530), this works but is unnecessary.

♻️ Suggested simplification
     # Enqueue post-conversation processing pipeline (no batch transcription needed - using streaming transcript)
-    client_id = conversation.client_id if conversation else None
-
     job_ids = start_post_conversation_jobs(
backends/advanced/src/advanced_omi_backend/controllers/websocket_controller.py (1)

859-859: Unused variable: version_id is overwritten before use.

Similar to audio_controller.py, the version_id created here at line 859 is never used - it's overwritten at line 901. Consider removing this declaration.

♻️ Suggested fix
-        version_id = str(uuid.uuid4())
-
         conversation = create_conversation(
backends/advanced/src/advanced_omi_backend/controllers/queue_controller.py (2)

408-426: Consider adding type hint for depends_on_job parameter.

The dependency handling logic is correct, and the conditional logging provides good visibility into whether jobs start immediately or wait for a dependency. However, line 424 accesses depends_on_job.id without a type hint on the parameter (line 371).

🔧 Suggested type hint improvement
 def start_post_conversation_jobs(
     conversation_id: str,
     audio_uuid: str,
     user_id: str,
     transcript_version_id: Optional[str] = None,
-    depends_on_job = None,
+    depends_on_job: Optional[Job] = None,
     client_id: Optional[str] = None
 ) -> Dict[str, str]:

Note that `Job` is already imported at the top of the file (line 20), so the type hint needs no new import.


428-481: Step numbers in comments are outdated.

The docstring correctly lists 4 steps (lines 378-381), but the inline comments still reference old step numbers:

  • Line 428: "Step 3" should be "Step 2" (memory extraction)
  • Line 444: "Step 4" should be "Step 3" (title/summary)
  • Line 461: "Step 5" should be "Step 4" (event dispatch)

This creates confusion between the documentation and inline comments.

📝 Update step numbers
-    # Step 3: Memory extraction job (parallel with title/summary)
+    # Step 2: Memory extraction job (parallel with title/summary)
     memory_job_id = f"memory_{conversation_id[:12]}"
-    # Step 4: Title/summary generation job (parallel with memory, independent)
+    # Step 3: Title/summary generation job (parallel with memory, independent)
     # This ensures conversations always get titles/summaries even if memory job fails
-    # Step 5: Dispatch conversation.complete event (runs after both memory and title/summary complete)
+    # Step 4: Dispatch conversation.complete event (runs after both memory and title/summary complete)
     # This ensures plugins receive the event after all processing is done
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 8ca401c and 55fd469.

📒 Files selected for processing (9)
  • .github/workflows/README.md
  • backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py
  • backends/advanced/src/advanced_omi_backend/controllers/queue_controller.py
  • backends/advanced/src/advanced_omi_backend/controllers/websocket_controller.py
  • backends/advanced/src/advanced_omi_backend/models/conversation.py
  • backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py
  • tests/endpoints/plugin_tests.robot
  • tests/integration/websocket_streaming_tests.robot
  • tests/resources/queue_keywords.robot
🧰 Additional context used
🧬 Code graph analysis (2)
backends/advanced/src/advanced_omi_backend/controllers/websocket_controller.py (2)
backends/advanced/src/advanced_omi_backend/controllers/queue_controller.py (1)
  • start_post_conversation_jobs (366-488)
backends/advanced/src/advanced_omi_backend/workers/transcription_jobs.py (1)
  • transcribe_full_audio_job (126-536)
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (2)
backends/advanced/src/advanced_omi_backend/controllers/queue_controller.py (1)
  • start_post_conversation_jobs (366-488)
backends/advanced/src/advanced_omi_backend/workers/transcription_jobs.py (1)
  • transcribe_full_audio_job (126-536)
🪛 LanguageTool
.github/workflows/README.md

[style] ~205-~205: Consider a different adjective to strengthen your wording.
Context: ...# Artifacts Downloadable artifacts for deeper analysis: - HTML Reports: `robot-te...

(DEEP_PROFOUND)

🪛 markdownlint-cli2 (0.18.1)
.github/workflows/README.md

408-408: Bare URL used

(MD034, no-bare-urls)

🪛 Ruff (0.14.10)
backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py

532-532: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: robot-tests
🔇 Additional comments (16)
tests/resources/queue_keywords.robot (1)

75-76: LGTM - Improved error message extraction with proper fallback chain.

The chained `.get()` with `or` operators provides robust error-message retrieval, prioritizing `error_message`, then `exc_info`, and finally `error` with a default fallback.
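A small Python sketch of that fallback chain (the `job_info` dict shape and the default string are assumptions; the key priority is from the keyword change):

```python
def extract_error_message(job_info: dict) -> str:
    # Priority mirrors the keyword: error_message, then exc_info, then error, then a default.
    return (
        job_info.get("error_message")
        or job_info.get("exc_info")
        or job_info.get("error")
        or "Unknown error"  # default fallback text is illustrative
    )
```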

tests/endpoints/plugin_tests.robot (2)

98-98: LGTM - Appropriate test tagging for batch transcription test.

Adding requires-api-keys tag allows proper test filtering for CI environments where API keys may not be available.


116-116: LGTM - Consistent tagging for streaming transcription test.

tests/integration/websocket_streaming_tests.robot (2)

105-106: LGTM - Documentation accurately reflects the streaming transcript workflow.

The updated documentation correctly notes that streaming conversations use the streaming transcript directly, eliminating the need for batch transcription in the post-conversation pipeline.


151-157: LGTM - Test expectations correctly updated for streaming workflow.

The test now verifies the expected post-conversation jobs (speaker, memory, title) without checking for batch transcription, aligning with the refactored pipeline where streaming transcripts are used directly.

backends/advanced/src/advanced_omi_backend/models/conversation.py (2)

55-61: LGTM - Well-structured Word model for word-level timestamps.

The Word model properly captures word-level transcript data with appropriate types. Using Optional[float] for confidence allows flexibility when confidence scores aren't available.


69-69: LGTM - Words field correctly added to SpeakerSegment.

Using default_factory=list correctly avoids mutable default argument issues, and the forward reference handles the nested model properly.

backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (2)

153-177: LGTM - Transcription-first workflow correctly implemented.

The transcription job is properly enqueued with appropriate timeout (30 minutes), metadata, and job ID. This ensures transcription completes before downstream processing begins.


179-201: LGTM - Job chaining and response payload correctly updated.

The depends_on_job parameter properly establishes the transcription → post-conversation dependency. The response payload and logging accurately reflect the new job chain including transcription_job.id.

backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py (3)

524-532: LGTM - Proper retrieval of streaming transcript and conversation validation.

The conversation fetch with explicit error handling ensures downstream processing has valid data. The error message at line 532 is appropriately descriptive for debugging purposes.


534-558: LGTM - Well-structured conversion to SpeakerSegment with word-level timestamps.

The nested list comprehension correctly populates Conversation.Word objects from segment word data, preserving timestamps and confidence scores. Proper use of .get() with defaults handles missing fields gracefully.


560-590: LGTM - Robust provider handling and transcript version creation.

The try/except for provider validation with fallback to DEEPGRAM is a good defensive pattern. Metadata correctly captures the streaming source information.
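For illustration, a sketch of the conversion and fallback patterns described in these comments, reusing the `Word`/`SpeakerSegment` shapes from the model sketch earlier. The raw dict keys, the outer defaults, and the `TranscriptProvider` enum name are assumptions; the `.get()` defaults on word data, the nested word construction, and the DEEPGRAM fallback follow the review.

```python
from enum import Enum


class TranscriptProvider(str, Enum):
    # Hypothetical enum name; only the DEEPGRAM fallback value comes from the review.
    DEEPGRAM = "deepgram"
    # ...other providers elided


def build_speaker_segments(raw_segments: list[dict]) -> list[SpeakerSegment]:
    """Convert provider segment dicts into SpeakerSegment models with word-level timestamps."""
    return [
        SpeakerSegment(
            speaker=seg.get("speaker", "unknown"),   # outer keys/defaults are illustrative
            text=seg.get("text", ""),
            words=[
                Word(
                    word=w.get("word", ""),
                    start=w.get("start", 0.0),
                    end=w.get("end", 0.0),
                    confidence=w.get("confidence"),  # may be absent in provider output
                )
                for w in seg.get("words", [])
            ],
        )
        for seg in raw_segments
    ]


def resolve_provider(name: str) -> TranscriptProvider:
    """Validate the provider string, falling back to DEEPGRAM as the review describes."""
    try:
        return TranscriptProvider(name)
    except ValueError:
        return TranscriptProvider.DEEPGRAM
```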

backends/advanced/src/advanced_omi_backend/controllers/websocket_controller.py (2)

893-917: LGTM - Transcription-first workflow consistently implemented.

The batch audio processing correctly enqueues transcription before post-conversation jobs, matching the pattern in audio_controller.py. The 30-minute timeout and metadata structure are appropriate.


919-933: LGTM - Job chaining and logging correctly updated.

The depends_on_job parameter properly establishes the transcription dependency, and the log message accurately shows the full pipeline: transcription → speaker → memory.

backends/advanced/src/advanced_omi_backend/controllers/queue_controller.py (2)

374-395: LGTM! Clear documentation of the new workflow.

The docstring accurately describes the updated processing chain and clearly communicates the important distinction between streaming conversations (use streaming transcript) and file uploads (require separate transcription enqueueing). The return shape documentation matches the actual implementation.


483-488: LGTM! Return shape correctly reflects the new workflow.

The returned dictionary properly excludes the transcription key and includes all four job IDs from the new processing chain.
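To make the new chain concrete, here is a sketch of the fan-out inside `start_post_conversation_jobs`, again assuming an RQ-style queue. The dependency edges, the `memory_` job-ID prefix, the parameter list, and the absence of a transcription key follow the review and the diagrams above; the queue name, the other ID prefixes, the returned key names, and the worker function names are stand-ins.

```python
from typing import Dict, Optional

from redis import Redis
from rq import Queue
from rq.job import Job


# Stand-ins for the project's worker functions (names are hypothetical).
def speaker_recognition_job(conversation_id: str) -> None: ...
def memory_extraction_job(conversation_id: str) -> None: ...
def title_summary_job(conversation_id: str) -> None: ...
def dispatch_conversation_complete_job(conversation_id: str) -> None: ...


def start_post_conversation_jobs(
    conversation_id: str,
    audio_uuid: str,
    user_id: str,
    transcript_version_id: Optional[str] = None,
    depends_on_job: Optional[Job] = None,
    client_id: Optional[str] = None,
) -> Dict[str, str]:
    queue = Queue("default", connection=Redis())         # queue name is illustrative

    # Step 1: speaker recognition waits on batch transcription for uploads,
    # or starts immediately (depends_on_job=None) for streaming conversations.
    speaker_job = queue.enqueue(
        speaker_recognition_job, conversation_id,
        job_id=f"speaker_{conversation_id[:12]}",         # illustrative prefix
        depends_on=depends_on_job,
    )
    # Steps 2 and 3 run in parallel, both gated only on the speaker job.
    memory_job = queue.enqueue(
        memory_extraction_job, conversation_id,
        job_id=f"memory_{conversation_id[:12]}",          # prefix taken from the review
        depends_on=speaker_job,
    )
    title_job = queue.enqueue(
        title_summary_job, conversation_id,
        job_id=f"title_{conversation_id[:12]}",           # illustrative prefix
        depends_on=speaker_job,
    )
    # Step 4: the completion event fires only after both parallel branches finish.
    event_job = queue.enqueue(
        dispatch_conversation_complete_job, conversation_id,
        job_id=f"event_{conversation_id[:12]}",           # illustrative prefix
        depends_on=[memory_job, title_job],
    )
    # No "transcription" key anymore; key names here follow the diagrams above.
    return {
        "speaker_recognition": speaker_job.id,
        "memory": memory_job.id,
        "title_summary": title_job.id,
        "event_dispatch": event_job.id,
    }
```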

- **Test Suite README**: `tests/README.md`
- **Testing Guidelines**: `tests/TESTING_GUIDELINES.md`
- **Tag Documentation**: `tests/tags.md`
- **GitHub Actions Docs**: https://docs.github.com/en/actions

⚠️ Potential issue | 🟡 Minor

Format the bare URL as a markdown link.

The URL should use proper markdown link formatting for consistency with markdown best practices.

🔗 Proposed fix
-- **GitHub Actions Docs**: https://docs.github.com/en/actions
+- **GitHub Actions Docs**: [https://docs.github.com/en/actions](https://docs.github.com/en/actions)

Alternatively, for cleaner display:

-- **GitHub Actions Docs**: https://docs.github.com/en/actions
+- [GitHub Actions Documentation](https://docs.github.com/en/actions)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-- **GitHub Actions Docs**: https://docs.github.com/en/actions
+- **GitHub Actions Docs**: [https://docs.github.com/en/actions](https://docs.github.com/en/actions)

Suggested change
-- **GitHub Actions Docs**: https://docs.github.com/en/actions
+- [GitHub Actions Documentation](https://docs.github.com/en/actions)
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

408-408: Bare URL used

(MD034, no-bare-urls)

🤖 Prompt for AI Agents
In @.github/workflows/README.md at line 408, replace the bare URL on the line that currently reads "**GitHub Actions Docs**: https://docs.github.com/en/actions" with a proper markdown link; for example change it to "**GitHub Actions Docs**: [GitHub Actions Docs](https://docs.github.com/en/actions)" (or "**GitHub Actions Docs**: [https://docs.github.com/en/actions](https://docs.github.com/en/actions)") so the URL is formatted as a markdown link.

@github-actions

⚠️ Robot Framework Test Results (No API Keys)

Status: ❌ Some tests failed

ℹ️ Note: This run excludes tests requiring external API keys (Deepgram, OpenAI).
Tests tagged with requires-api-keys will run on dev/main branches.

| Metric | Count |
|---|---|
| ✅ Passed | 78 |
| ❌ Failed | 15 |
| 📊 Total | 93 |

📊 View Reports

View full workflow run

@github-actions

⚠️ Robot Framework Test Results (Label-Triggered Full Suite)

Status: ❌ Some tests failed

🏷️ Note: This run was triggered by the test-with-api-keys label.
All tests including API-dependent tests have been executed.

| Metric | Count |
|---|---|
| ✅ Passed | 114 |
| ❌ Failed | 1 |
| 📊 Total | 115 |

📊 View Reports

View full workflow run

