
Conversation

@AnkushMalaker (Collaborator) commented Jan 12, 2026

  • Updated `upload_and_process_audio_files` and `start_post_conversation_jobs` to enqueue transcription jobs separately for file uploads, ensuring accurate processing order.
  • Enhanced logging to provide clearer insights into job enqueuing and processing stages.
  • Removed batch transcription from the post-conversation job chain for streaming audio, utilizing the streaming transcript directly.
  • Introduced word-level timestamps in the `Conversation` model to improve transcript detail and accuracy.
  • Updated tests to reflect changes in job handling and ensure proper verification of post-conversation processing.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added word-level timestamp data for speaker segments in audio conversations, enabling precise temporal tracking of conversation content.
  • Improvements

    • Refactored audio processing pipeline with streamlined background job orchestration for improved efficiency.
    • Enhanced CI/CD workflow documentation with detailed deployment and troubleshooting guides.

✏️ Tip: You can customize this high-level summary in your review settings.

coderabbitai bot commented Jan 12, 2026

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the `.coderabbit.yaml` file in this repository. To trigger a single review, invoke the `@coderabbitai review` command.

You can disable this status message by setting `reviews.review_status` to `false` in the CodeRabbit configuration file.

📝 Walkthrough

This PR refactors the audio transcription workflow to enqueue transcription as an upfront dependency rather than as part of the post-conversation processing chain. Uploaded audio files now trigger batch transcription first, while streaming conversations use their existing streaming transcripts. Job orchestration is restructured to remove transcription from the post-conversation pipeline, and worker logic is updated accordingly.
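For illustration, here is a minimal sketch of the new upload path, assuming the backend's job queue is RQ (suggested by the `Job` type and `depends_on`-style chaining in this PR, but not confirmed). Only `transcribe_full_audio_job`, `start_post_conversation_jobs`, and the `depends_on_job` keyword come from the PR; the queue name, the transcription job's argument list, and the job-ID format are illustrative.

```python
from redis import Redis
from rq import Queue

# Module paths are inferred from the file list below; the function names come from this PR.
from advanced_omi_backend.controllers.queue_controller import start_post_conversation_jobs
from advanced_omi_backend.workers.transcription_jobs import transcribe_full_audio_job


def enqueue_upload_processing(conversation_id: str, audio_uuid: str, user_id: str) -> dict:
    """Enqueue batch transcription first, then chain post-conversation jobs behind it."""
    queue = Queue("transcription", connection=Redis())  # queue name is illustrative

    transcription_job = queue.enqueue(
        transcribe_full_audio_job,
        conversation_id,                                 # actual argument list is not shown in the PR
        job_id=f"transcription_{conversation_id[:12]}",  # illustrative job-ID format
        job_timeout=30 * 60,                             # 30-minute timeout noted in the review
    )

    # depends_on_job makes the whole post-conversation chain wait for the transcript.
    return start_post_conversation_jobs(
        conversation_id=conversation_id,
        audio_uuid=audio_uuid,
        user_id=user_id,
        depends_on_job=transcription_job,
    )
```

Streaming conversations skip the first enqueue entirely and call `start_post_conversation_jobs` with no transcription dependency, since the streaming transcript already exists.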

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Workflow Documentation**<br>`.github/workflows/README.md` | Comprehensive reorganization and expansion of CI/CD workflow documentation with detailed sections on triggers, secrets, test coverage, usage snippets, and troubleshooting guidance. No functional code changes. |
| **Batch Transcription Enqueue**<br>`backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py`, `backends/advanced/src/advanced_omi_backend/controllers/websocket_controller.py` | Both controllers now enqueue a transcription job upfront and pass it as a dependency via the new `depends_on_job` parameter. Transcription job ID is propagated to response payloads and logs, replacing previous references. |
| **Job Orchestration**<br>`backends/advanced/src/advanced_omi_backend/controllers/queue_controller.py` | Signature updated to remove the `post_transcription` parameter and accept `depends_on_job`. Job chain restructured: speaker recognition now uses streaming transcripts directly, memory extraction is decoupled from transcription, title/summary depends only on the speaker job, and event dispatch depends on both memory and title/summary. Return value updated to exclude the transcription key. |
| **Data Model Enhancement**<br>`backends/advanced/src/advanced_omi_backend/models/conversation.py` | Added nested `Word` model with timestamp fields (`word`, `start`, `end`, `confidence`). Extended `SpeakerSegment` with a `words` list to capture word-level timestamps (see the model sketch after this table). |
| **Streaming Transcript Processing**<br>`backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py` | Replaced immediate post-conversation enqueue with retrieval and persistence of the final streaming transcript. Builds speaker segments with word-level timestamp data, determines the transcript provider, and adds the transcript as a new conversation version before calling post-conversation jobs. |
| **Test Updates**<br>`tests/endpoints/plugin_tests.robot`, `tests/integration/websocket_streaming_tests.robot`, `tests/resources/queue_keywords.robot` | Added `requires-api-keys` tags to two test cases. Updated integration test expectations to remove transcription from the post-conversation job suite for streaming conversations. Adjusted error message extraction priority in job status checks. |
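As referenced in the Data Model row above, here is a sketch of the new word-level timestamp models. Pydantic `BaseModel` is an assumption, as are `SpeakerSegment`'s pre-existing fields; the `word`/`start`/`end`/`confidence` fields, the optional confidence, and the `words` list with `default_factory=list` come from the review comments below. The PR nests `Word` inside the conversation model; it is shown flat here for brevity.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class Word(BaseModel):
    """One transcribed word with its timestamps."""
    word: str
    start: float                        # seconds from the start of the audio (unit assumed)
    end: float
    confidence: Optional[float] = None  # optional, per the review


class SpeakerSegment(BaseModel):
    speaker: str                        # pre-existing fields are illustrative
    text: str
    words: List[Word] = Field(default_factory=list)  # avoids a shared mutable default
```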

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant AudioController
    participant TranscriptionQueue
    participant QueueController
    participant Workers

    Client->>AudioController: Upload audio file
    AudioController->>TranscriptionQueue: Enqueue transcribe_full_audio_job
    TranscriptionQueue-->>AudioController: Return transcription_job_id
    AudioController->>QueueController: start_post_conversation_jobs(depends_on_job=transcription_job)
    QueueController->>Workers: Enqueue speaker_recognition (depends on transcription)
    QueueController->>Workers: Enqueue memory (depends on speaker)
    QueueController->>Workers: Enqueue title_summary (depends on speaker)
    QueueController->>Workers: Enqueue event_dispatch (depends on memory + title_summary)
    Workers-->>Client: Pipeline complete
sequenceDiagram
    participant Client
    participant WebSocketController
    participant ConversationWorker
    participant QueueController
    participant Workers

    Client->>WebSocketController: Stream audio (websocket)
    WebSocketController->>ConversationWorker: Process streaming transcript
    ConversationWorker->>ConversationWorker: Retrieve final streaming transcript
    ConversationWorker->>ConversationWorker: Build speaker segments with word timestamps
    ConversationWorker->>ConversationWorker: Add transcript version to conversation
    ConversationWorker->>QueueController: start_post_conversation_jobs(no transcription dependency)
    QueueController->>Workers: Enqueue speaker_recognition (uses streaming transcript)
    QueueController->>Workers: Enqueue memory (depends on speaker)
    QueueController->>Workers: Enqueue title_summary (depends on speaker)
    QueueController->>Workers: Enqueue event_dispatch (depends on memory + title_summary)
    Workers-->>Client: Pipeline complete

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: refactoring audio processing and job handling specifically for transcription workflows, which is central to all modifications across multiple controller and worker modules. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@AnkushMalaker (Collaborator, Author) commented:
@coderabbitai review

coderabbitai bot commented Jan 12, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot left a review comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/workflows/README.md:
- Line 408: Replace the bare URL on the line that currently reads "**GitHub Actions Docs**: https://docs.github.com/en/actions" with a proper markdown link; for example change it to "**GitHub Actions Docs**: [GitHub Actions Docs](https://docs.github.com/en/actions)" (or "**GitHub Actions Docs**: [https://docs.github.com/en/actions](https://docs.github.com/en/actions)") so the URL is formatted as a markdown link.
🧹 Nitpick comments (5)
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (1)

117-117: Unused variable: version_id is overwritten before use.

The version_id created here at line 117 is never used - it's overwritten at line 161 before being passed to the transcription job. Consider removing this declaration to avoid confusion.

♻️ Suggested fix
-                version_id = str(uuid.uuid4())
-
                 # Generate title from filename
backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py (1)

592-606: Minor: Redundant client_id reassignment.

Line 593 reassigns client_id from conversation.client_id, but client_id is already available from the function parameters at line 39. Since the conversation was fetched successfully (checked at line 530), this works but is unnecessary.

♻️ Suggested simplification
     # Enqueue post-conversation processing pipeline (no batch transcription needed - using streaming transcript)
-    client_id = conversation.client_id if conversation else None
-
     job_ids = start_post_conversation_jobs(
backends/advanced/src/advanced_omi_backend/controllers/websocket_controller.py (1)

859-859: Unused variable: version_id is overwritten before use.

Similar to audio_controller.py, the version_id created here at line 859 is never used - it's overwritten at line 901. Consider removing this declaration.

♻️ Suggested fix
-        version_id = str(uuid.uuid4())
-
         conversation = create_conversation(
backends/advanced/src/advanced_omi_backend/controllers/queue_controller.py (2)

408-426: Consider adding type hint for depends_on_job parameter.

The dependency handling logic is correct, and the conditional logging provides good visibility into whether jobs start immediately or wait for a dependency. However, line 424 accesses depends_on_job.id without a type hint on the parameter (line 371).

🔧 Suggested type hint improvement
 def start_post_conversation_jobs(
     conversation_id: str,
     audio_uuid: str,
     user_id: str,
     transcript_version_id: Optional[str] = None,
-    depends_on_job = None,
+    depends_on_job: Optional[Job] = None,
     client_id: Optional[str] = None
 ) -> Dict[str, str]:

Note that `Job` is already imported at the top of the file (line 20), so the type hint needs no new import.


428-481: Step numbers in comments are outdated.

The docstring correctly lists 4 steps (lines 378-381), but the inline comments still reference old step numbers:

  • Line 428: "Step 3" should be "Step 2" (memory extraction)
  • Line 444: "Step 4" should be "Step 3" (title/summary)
  • Line 461: "Step 5" should be "Step 4" (event dispatch)

This creates confusion between the documentation and inline comments.

📝 Update step numbers
-    # Step 3: Memory extraction job (parallel with title/summary)
+    # Step 2: Memory extraction job (parallel with title/summary)
     memory_job_id = f"memory_{conversation_id[:12]}"
-    # Step 4: Title/summary generation job (parallel with memory, independent)
+    # Step 3: Title/summary generation job (parallel with memory, independent)
     # This ensures conversations always get titles/summaries even if memory job fails
-    # Step 5: Dispatch conversation.complete event (runs after both memory and title/summary complete)
+    # Step 4: Dispatch conversation.complete event (runs after both memory and title/summary complete)
     # This ensures plugins receive the event after all processing is done
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 8ca401c and 55fd469.

📒 Files selected for processing (9)
  • .github/workflows/README.md
  • backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py
  • backends/advanced/src/advanced_omi_backend/controllers/queue_controller.py
  • backends/advanced/src/advanced_omi_backend/controllers/websocket_controller.py
  • backends/advanced/src/advanced_omi_backend/models/conversation.py
  • backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py
  • tests/endpoints/plugin_tests.robot
  • tests/integration/websocket_streaming_tests.robot
  • tests/resources/queue_keywords.robot
🧰 Additional context used
🧬 Code graph analysis (2)
backends/advanced/src/advanced_omi_backend/controllers/websocket_controller.py (2)
backends/advanced/src/advanced_omi_backend/controllers/queue_controller.py (1)
  • start_post_conversation_jobs (366-488)
backends/advanced/src/advanced_omi_backend/workers/transcription_jobs.py (1)
  • transcribe_full_audio_job (126-536)
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (2)
backends/advanced/src/advanced_omi_backend/controllers/queue_controller.py (1)
  • start_post_conversation_jobs (366-488)
backends/advanced/src/advanced_omi_backend/workers/transcription_jobs.py (1)
  • transcribe_full_audio_job (126-536)
🪛 LanguageTool
.github/workflows/README.md

[style] ~205-~205: Consider a different adjective to strengthen your wording.
Context: ...# Artifacts Downloadable artifacts for deeper analysis: - HTML Reports: `robot-te...

(DEEP_PROFOUND)

🪛 markdownlint-cli2 (0.18.1)
.github/workflows/README.md

408-408: Bare URL used

(MD034, no-bare-urls)

🪛 Ruff (0.14.10)
backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py

532-532: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: robot-tests
🔇 Additional comments (16)
tests/resources/queue_keywords.robot (1)

75-76: LGTM - Improved error message extraction with proper fallback chain.

The chained `.get()` with `or` operators provides robust error-message retrieval, prioritizing `error_message`, then `exc_info`, and finally `error` with a default fallback.
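A small Python sketch of that fallback chain (the `job_info` dict shape and the default string are assumptions; the key priority is from the keyword change):

```python
def extract_error_message(job_info: dict) -> str:
    # Priority mirrors the keyword: error_message, then exc_info, then error, then a default.
    return (
        job_info.get("error_message")
        or job_info.get("exc_info")
        or job_info.get("error")
        or "Unknown error"  # default fallback text is illustrative
    )
```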

tests/endpoints/plugin_tests.robot (2)

98-98: LGTM - Appropriate test tagging for batch transcription test.

Adding requires-api-keys tag allows proper test filtering for CI environments where API keys may not be available.


116-116: LGTM - Consistent tagging for streaming transcription test.

tests/integration/websocket_streaming_tests.robot (2)

105-106: LGTM - Documentation accurately reflects the streaming transcript workflow.

The updated documentation correctly notes that streaming conversations use the streaming transcript directly, eliminating the need for batch transcription in the post-conversation pipeline.


151-157: LGTM - Test expectations correctly updated for streaming workflow.

The test now verifies the expected post-conversation jobs (speaker, memory, title) without checking for batch transcription, aligning with the refactored pipeline where streaming transcripts are used directly.

backends/advanced/src/advanced_omi_backend/models/conversation.py (2)

55-61: LGTM - Well-structured Word model for word-level timestamps.

The Word model properly captures word-level transcript data with appropriate types. Using Optional[float] for confidence allows flexibility when confidence scores aren't available.


69-69: LGTM - Words field correctly added to SpeakerSegment.

Using default_factory=list correctly avoids mutable default argument issues, and the forward reference handles the nested model properly.

backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (2)

153-177: LGTM - Transcription-first workflow correctly implemented.

The transcription job is properly enqueued with appropriate timeout (30 minutes), metadata, and job ID. This ensures transcription completes before downstream processing begins.


179-201: LGTM - Job chaining and response payload correctly updated.

The depends_on_job parameter properly establishes the transcription → post-conversation dependency. The response payload and logging accurately reflect the new job chain including transcription_job.id.

backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py (3)

524-532: LGTM - Proper retrieval of streaming transcript and conversation validation.

The conversation fetch with explicit error handling ensures downstream processing has valid data. The error message at line 532 is appropriately descriptive for debugging purposes.


534-558: LGTM - Well-structured conversion to SpeakerSegment with word-level timestamps.

The nested list comprehension correctly populates Conversation.Word objects from segment word data, preserving timestamps and confidence scores. Proper use of .get() with defaults handles missing fields gracefully.


560-590: LGTM - Robust provider handling and transcript version creation.

The try/except for provider validation with fallback to DEEPGRAM is a good defensive pattern. Metadata correctly captures the streaming source information.
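For illustration, a sketch of the conversion and fallback patterns described in these comments, reusing the `Word`/`SpeakerSegment` shapes from the model sketch earlier. The raw dict keys, the outer defaults, and the `TranscriptProvider` enum name are assumptions; the `.get()` defaults on word data, the nested word construction, and the DEEPGRAM fallback follow the review.

```python
from enum import Enum


class TranscriptProvider(str, Enum):
    # Hypothetical enum name; only the DEEPGRAM fallback value comes from the review.
    DEEPGRAM = "deepgram"
    # ...other providers elided


def build_speaker_segments(raw_segments: list[dict]) -> list[SpeakerSegment]:
    """Convert provider segment dicts into SpeakerSegment models with word-level timestamps."""
    return [
        SpeakerSegment(
            speaker=seg.get("speaker", "unknown"),   # outer keys/defaults are illustrative
            text=seg.get("text", ""),
            words=[
                Word(
                    word=w.get("word", ""),
                    start=w.get("start", 0.0),
                    end=w.get("end", 0.0),
                    confidence=w.get("confidence"),  # may be absent in provider output
                )
                for w in seg.get("words", [])
            ],
        )
        for seg in raw_segments
    ]


def resolve_provider(name: str) -> TranscriptProvider:
    """Validate the provider string, falling back to DEEPGRAM as the review describes."""
    try:
        return TranscriptProvider(name)
    except ValueError:
        return TranscriptProvider.DEEPGRAM
```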

backends/advanced/src/advanced_omi_backend/controllers/websocket_controller.py (2)

893-917: LGTM - Transcription-first workflow consistently implemented.

The batch audio processing correctly enqueues transcription before post-conversation jobs, matching the pattern in audio_controller.py. The 30-minute timeout and metadata structure are appropriate.


919-933: LGTM - Job chaining and logging correctly updated.

The depends_on_job parameter properly establishes the transcription dependency, and the log message accurately shows the full pipeline: transcription → speaker → memory.

backends/advanced/src/advanced_omi_backend/controllers/queue_controller.py (2)

374-395: LGTM! Clear documentation of the new workflow.

The docstring accurately describes the updated processing chain and clearly communicates the important distinction between streaming conversations (use streaming transcript) and file uploads (require separate transcription enqueueing). The return shape documentation matches the actual implementation.


483-488: LGTM! Return shape correctly reflects the new workflow.

The returned dictionary properly excludes the transcription key and includes all four job IDs from the new processing chain.
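To make the new chain concrete, here is a sketch of the fan-out inside `start_post_conversation_jobs`, again assuming an RQ-style queue. The dependency edges, the `memory_` job-ID prefix, the parameter list, and the absence of a transcription key follow the review and the diagrams above; the queue name, the other ID prefixes, the returned key names, and the worker function names are stand-ins.

```python
from typing import Dict, Optional

from redis import Redis
from rq import Queue
from rq.job import Job


# Stand-ins for the project's worker functions (names are hypothetical).
def speaker_recognition_job(conversation_id: str) -> None: ...
def memory_extraction_job(conversation_id: str) -> None: ...
def title_summary_job(conversation_id: str) -> None: ...
def dispatch_conversation_complete_job(conversation_id: str) -> None: ...


def start_post_conversation_jobs(
    conversation_id: str,
    audio_uuid: str,
    user_id: str,
    transcript_version_id: Optional[str] = None,
    depends_on_job: Optional[Job] = None,
    client_id: Optional[str] = None,
) -> Dict[str, str]:
    queue = Queue("default", connection=Redis())         # queue name is illustrative

    # Step 1: speaker recognition waits on batch transcription for uploads,
    # or starts immediately (depends_on_job=None) for streaming conversations.
    speaker_job = queue.enqueue(
        speaker_recognition_job, conversation_id,
        job_id=f"speaker_{conversation_id[:12]}",         # illustrative prefix
        depends_on=depends_on_job,
    )
    # Steps 2 and 3 run in parallel, both gated only on the speaker job.
    memory_job = queue.enqueue(
        memory_extraction_job, conversation_id,
        job_id=f"memory_{conversation_id[:12]}",          # prefix taken from the review
        depends_on=speaker_job,
    )
    title_job = queue.enqueue(
        title_summary_job, conversation_id,
        job_id=f"title_{conversation_id[:12]}",           # illustrative prefix
        depends_on=speaker_job,
    )
    # Step 4: the completion event fires only after both parallel branches finish.
    event_job = queue.enqueue(
        dispatch_conversation_complete_job, conversation_id,
        job_id=f"event_{conversation_id[:12]}",           # illustrative prefix
        depends_on=[memory_job, title_job],
    )
    # No "transcription" key anymore; key names here follow the diagrams above.
    return {
        "speaker_recognition": speaker_job.id,
        "memory": memory_job.id,
        "title_summary": title_job.id,
        "event_dispatch": event_job.id,
    }
```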

- **Test Suite README**: `tests/README.md`
- **Testing Guidelines**: `tests/TESTING_GUIDELINES.md`
- **Tag Documentation**: `tests/tags.md`
- **GitHub Actions Docs**: https://docs.github.com/en/actions

⚠️ Potential issue | 🟡 Minor

Format the bare URL as a markdown link.

The URL should use proper markdown link formatting for consistency with markdown best practices.

🔗 Proposed fix
-- **GitHub Actions Docs**: https://docs.github.com/en/actions
+- **GitHub Actions Docs**: [https://docs.github.com/en/actions](https://docs.github.com/en/actions)

Alternatively, for cleaner display:

-- **GitHub Actions Docs**: https://docs.github.com/en/actions
+- [GitHub Actions Documentation](https://docs.github.com/en/actions)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-- **GitHub Actions Docs**: https://docs.github.com/en/actions
+- **GitHub Actions Docs**: [https://docs.github.com/en/actions](https://docs.github.com/en/actions)

Suggested change
-- **GitHub Actions Docs**: https://docs.github.com/en/actions
+- [GitHub Actions Documentation](https://docs.github.com/en/actions)
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

408-408: Bare URL used

(MD034, no-bare-urls)

🤖 Prompt for AI Agents
In @.github/workflows/README.md at line 408, replace the bare URL on the line that currently reads "**GitHub Actions Docs**: https://docs.github.com/en/actions" with a proper markdown link; for example change it to "**GitHub Actions Docs**: [GitHub Actions Docs](https://docs.github.com/en/actions)" (or "**GitHub Actions Docs**: [https://docs.github.com/en/actions](https://docs.github.com/en/actions)") so the URL is formatted as a markdown link.

@github-actions

⚠️ Robot Framework Test Results (No API Keys)

Status: ❌ Some tests failed

ℹ️ Note: This run excludes tests requiring external API keys (Deepgram, OpenAI).
Tests tagged with requires-api-keys will run on dev/main branches.

| Metric | Count |
|---|---|
| ✅ Passed | 78 |
| ❌ Failed | 15 |
| 📊 Total | 93 |

📊 View Reports

View full workflow run

@github-actions

⚠️ Robot Framework Test Results (Label-Triggered Full Suite)

Status: ❌ Some tests failed

🏷️ Note: This run was triggered by the test-with-api-keys label.
All tests including API-dependent tests have been executed.

| Metric | Count |
|---|---|
| ✅ Passed | 114 |
| ❌ Failed | 1 |
| 📊 Total | 115 |

📊 View Reports

View full workflow run

