Refactor audio processing and job handling for transcription workflows #264
base: feat/split-tests
- Updated `upload_and_process_audio_files` and `start_post_conversation_jobs` to enqueue transcription jobs separately for file uploads, ensuring accurate processing order.
- Enhanced logging to provide clearer insights into job enqueuing and processing stages.
- Removed batch transcription from the post-conversation job chain for streaming audio, utilizing the streaming transcript directly.
- Introduced word-level timestamps in the `Conversation` model to improve transcript detail and accuracy.
- Updated tests to reflect changes in job handling and ensure proper verification of post-conversation processing.
Important: Review skipped. Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI.

📝 Walkthrough

This PR refactors the audio transcription workflow to enqueue transcription as an upfront dependency rather than part of the post-conversation processing chain. Uploaded audio files now trigger batch transcription first, while streaming conversations use existing streaming transcripts. Job orchestration is restructured to eliminate transcription from the pipeline and update worker logic accordingly.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant AudioController
    participant TranscriptionQueue
    participant QueueController
    participant Workers
    Client->>AudioController: Upload audio file
    AudioController->>TranscriptionQueue: Enqueue transcribe_full_audio_job
    TranscriptionQueue-->>AudioController: Return transcription_job_id
    AudioController->>QueueController: start_post_conversation_jobs(depends_on_job=transcription_job)
    QueueController->>Workers: Enqueue speaker_recognition (depends on transcription)
    QueueController->>Workers: Enqueue memory (depends on speaker)
    QueueController->>Workers: Enqueue title_summary (depends on speaker)
    QueueController->>Workers: Enqueue event_dispatch (depends on memory + title_summary)
    Workers-->>Client: Pipeline complete
```
```mermaid
sequenceDiagram
    participant Client
    participant WebSocketController
    participant ConversationWorker
    participant QueueController
    participant Workers
    Client->>WebSocketController: Stream audio (websocket)
    WebSocketController->>ConversationWorker: Process streaming transcript
    ConversationWorker->>ConversationWorker: Retrieve final streaming transcript
    ConversationWorker->>ConversationWorker: Build speaker segments with word timestamps
    ConversationWorker->>ConversationWorker: Add transcript version to conversation
    ConversationWorker->>QueueController: start_post_conversation_jobs(no transcription dependency)
    QueueController->>Workers: Enqueue speaker_recognition (uses streaming transcript)
    QueueController->>Workers: Enqueue memory (depends on speaker)
    QueueController->>Workers: Enqueue title_summary (depends on speaker)
    QueueController->>Workers: Enqueue event_dispatch (depends on memory + title_summary)
    Workers-->>Client: Pipeline complete
```
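The dependency chain in both diagrams forms a small directed graph. A minimal sketch (not the backend's actual queue code, which enqueues jobs through its queue controller) using Python's stdlib `graphlib` to show one valid execution order:

```python
# Sketch of the job dependency graph from the diagrams above.
# Job names mirror the diagram labels; this only models the ordering.
from graphlib import TopologicalSorter

# Each key depends on the jobs in its value set.
dependencies = {
    "speaker_recognition": {"transcription"},       # waits for batch transcription
    "memory": {"speaker_recognition"},              # parallel with title_summary
    "title_summary": {"speaker_recognition"},
    "event_dispatch": {"memory", "title_summary"},  # fires after both complete
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

For streaming conversations, the `transcription` node is simply absent and `speaker_recognition` has no upstream dependency, which is exactly why the second diagram starts the pipeline immediately.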
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.github/workflows/README.md:
- Line 408: Replace the bare URL on the line that currently reads "**GitHub
Actions Docs**: https://docs.github.com/en/actions" with a proper markdown link;
for example change it to "**GitHub Actions Docs**: [GitHub Actions
Docs](https://docs.github.com/en/actions)" (or "**GitHub Actions Docs**:
[https://docs.github.com/en/actions](https://docs.github.com/en/actions)") so
the URL is formatted as a markdown link.
🧹 Nitpick comments (5)
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (1)
117-117: Unused variable: `version_id` is overwritten before use.

The `version_id` created here at line 117 is never used - it's overwritten at line 161 before being passed to the transcription job. Consider removing this declaration to avoid confusion.

♻️ Suggested fix

```diff
-    version_id = str(uuid.uuid4())
     # Generate title from filename
```

backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py (1)
592-606: Minor: Redundant `client_id` reassignment.

Line 593 reassigns `client_id` from `conversation.client_id`, but `client_id` is already available from the function parameters at line 39. Since the conversation was fetched successfully (checked at line 530), this works but is unnecessary.

♻️ Suggested simplification

```diff
     # Enqueue post-conversation processing pipeline (no batch transcription needed - using streaming transcript)
-    client_id = conversation.client_id if conversation else None
     job_ids = start_post_conversation_jobs(
```

backends/advanced/src/advanced_omi_backend/controllers/websocket_controller.py (1)
859-859: Unused variable: `version_id` is overwritten before use.

Similar to `audio_controller.py`, the `version_id` created here at line 859 is never used - it's overwritten at line 901. Consider removing this declaration.

♻️ Suggested fix

```diff
-    version_id = str(uuid.uuid4())
     conversation = create_conversation(
```

backends/advanced/src/advanced_omi_backend/controllers/queue_controller.py (2)
408-426: Consider adding a type hint for the `depends_on_job` parameter.

The dependency handling logic is correct, and the conditional logging provides good visibility into whether jobs start immediately or wait for a dependency. However, line 424 accesses `depends_on_job.id` without a type hint on the parameter (line 371).

🔧 Suggested type hint improvement

```diff
 def start_post_conversation_jobs(
     conversation_id: str,
     audio_uuid: str,
     user_id: str,
     transcript_version_id: Optional[str] = None,
-    depends_on_job = None,
+    depends_on_job: Optional[Job] = None,
     client_id: Optional[str] = None
 ) -> Dict[str, str]:
```

This would require importing `Job` at the top of the file (it's already imported on line 20).
428-481: Step numbers in comments are outdated.

The docstring correctly lists 4 steps (lines 378-381), but the inline comments still reference old step numbers:
- Line 428: "Step 3" should be "Step 2" (memory extraction)
- Line 444: "Step 4" should be "Step 3" (title/summary)
- Line 461: "Step 5" should be "Step 4" (event dispatch)
This creates confusion between the documentation and inline comments.
📝 Update step numbers
```diff
-    # Step 3: Memory extraction job (parallel with title/summary)
+    # Step 2: Memory extraction job (parallel with title/summary)
     memory_job_id = f"memory_{conversation_id[:12]}"

-    # Step 4: Title/summary generation job (parallel with memory, independent)
+    # Step 3: Title/summary generation job (parallel with memory, independent)
     # This ensures conversations always get titles/summaries even if memory job fails

-    # Step 5: Dispatch conversation.complete event (runs after both memory and title/summary complete)
+    # Step 4: Dispatch conversation.complete event (runs after both memory and title/summary complete)
     # This ensures plugins receive the event after all processing is done
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (9)
- .github/workflows/README.md
- backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py
- backends/advanced/src/advanced_omi_backend/controllers/queue_controller.py
- backends/advanced/src/advanced_omi_backend/controllers/websocket_controller.py
- backends/advanced/src/advanced_omi_backend/models/conversation.py
- backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py
- tests/endpoints/plugin_tests.robot
- tests/integration/websocket_streaming_tests.robot
- tests/resources/queue_keywords.robot
🧰 Additional context used
🧬 Code graph analysis (2)
backends/advanced/src/advanced_omi_backend/controllers/websocket_controller.py (2)
backends/advanced/src/advanced_omi_backend/controllers/queue_controller.py (1)
start_post_conversation_jobs (366-488)

backends/advanced/src/advanced_omi_backend/workers/transcription_jobs.py (1)
transcribe_full_audio_job(126-536)
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (2)
backends/advanced/src/advanced_omi_backend/controllers/queue_controller.py (1)
start_post_conversation_jobs (366-488)

backends/advanced/src/advanced_omi_backend/workers/transcription_jobs.py (1)
transcribe_full_audio_job(126-536)
🪛 LanguageTool
.github/workflows/README.md
[style] ~205-~205: Consider a different adjective to strengthen your wording.
Context: ...# Artifacts Downloadable artifacts for deeper analysis: - HTML Reports: `robot-te...
(DEEP_PROFOUND)
🪛 markdownlint-cli2 (0.18.1)
.github/workflows/README.md
408-408: Bare URL used
(MD034, no-bare-urls)
🪛 Ruff (0.14.10)
backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py
532-532: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: robot-tests
🔇 Additional comments (16)
tests/resources/queue_keywords.robot (1)
75-76: LGTM - Improved error message extraction with proper fallback chain.

The chained `.get()` with `or` operators provides robust error message retrieval, prioritizing `error_message`, then `exc_info`, and finally `error` with a default fallback.

tests/endpoints/plugin_tests.robot (2)
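The fallback chain praised for `queue_keywords.robot` above can be illustrated in plain Python (the job-info dict shape is assumed; the Robot keyword applies the same pattern to the queue's job data):

```python
# Hypothetical job-info dicts illustrating the chained .get()/or fallback:
# prefer error_message, then exc_info, then error, then a default string.
def extract_error(job_info: dict) -> str:
    return (
        job_info.get("error_message")
        or job_info.get("exc_info")
        or job_info.get("error")
        or "Unknown error"
    )

print(extract_error({"error_message": "timeout"}))                         # -> timeout
print(extract_error({"error_message": None, "exc_info": "Traceback..."}))  # -> Traceback...
print(extract_error({}))                                                   # -> Unknown error
```

One subtlety of `or`-chaining: empty strings are treated the same as missing keys, so an empty `error_message` also falls through to the default, which is usually the desired behavior for display.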
98-98: LGTM - Appropriate test tagging for batch transcription test.

Adding the `requires-api-keys` tag allows proper test filtering for CI environments where API keys may not be available.

116-116: LGTM - Consistent tagging for streaming transcription test.

tests/integration/websocket_streaming_tests.robot (2)
105-106: LGTM - Documentation accurately reflects the streaming transcript workflow.

The updated documentation correctly notes that streaming conversations use the streaming transcript directly, eliminating the need for batch transcription in the post-conversation pipeline.

151-157: LGTM - Test expectations correctly updated for streaming workflow.

The test now verifies the expected post-conversation jobs (speaker, memory, title) without checking for batch transcription, aligning with the refactored pipeline where streaming transcripts are used directly.
backends/advanced/src/advanced_omi_backend/models/conversation.py (2)
55-61: LGTM - Well-structured Word model for word-level timestamps.

The `Word` model properly captures word-level transcript data with appropriate types. Using `Optional[float]` for confidence allows flexibility when confidence scores aren't available.

69-69: LGTM - Words field correctly added to SpeakerSegment.

Using `default_factory=list` correctly avoids mutable default argument issues, and the forward reference handles the nested model properly.

backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (2)
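The `Word`/`SpeakerSegment` shape described in the `conversation.py` comments above can be sketched roughly as follows. The real models are Pydantic-based; this stand-in uses stdlib dataclasses (field names assumed from the review) purely to illustrate the structure and why `default_factory=list` matters:

```python
# Illustrative stand-in for the Word / SpeakerSegment models; the backend
# uses Pydantic models with these (assumed) field names.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Word:
    word: str
    start: float                         # seconds from start of audio
    end: float
    confidence: Optional[float] = None   # absent for some providers

@dataclass
class SpeakerSegment:
    speaker: str
    text: str
    # default_factory gives each segment its OWN list, avoiding the
    # shared-mutable-default pitfall a plain `words = []` would cause.
    words: List[Word] = field(default_factory=list)

a = SpeakerSegment(speaker="S1", text="hello")
b = SpeakerSegment(speaker="S2", text="world")
a.words.append(Word("hello", 0.0, 0.4))
print(len(a.words), len(b.words))  # -> 1 0
```

With a shared default list, appending to `a.words` would have mutated `b.words` as well; `default_factory` (in dataclasses and Pydantic alike) prevents that.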
153-177: LGTM - Transcription-first workflow correctly implemented.

The transcription job is properly enqueued with appropriate timeout (30 minutes), metadata, and job ID. This ensures transcription completes before downstream processing begins.

179-201: LGTM - Job chaining and response payload correctly updated.

The `depends_on_job` parameter properly establishes the transcription → post-conversation dependency. The response payload and logging accurately reflect the new job chain including `transcription_job.id`.

backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py (3)
524-532: LGTM - Proper retrieval of streaming transcript and conversation validation.

The conversation fetch with explicit error handling ensures downstream processing has valid data. The error message at line 532 is appropriately descriptive for debugging purposes.

534-558: LGTM - Well-structured conversion to SpeakerSegment with word-level timestamps.

The nested list comprehension correctly populates `Conversation.Word` objects from segment word data, preserving timestamps and confidence scores. Proper use of `.get()` with defaults handles missing fields gracefully.
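The conversion pattern described here can be sketched as below. The raw segment shape is assumed for illustration; the real worker builds `Conversation.Word` model instances rather than plain dicts:

```python
# Assumed raw shape of streaming-transcript segments; the worker converts
# these into SpeakerSegment/Word model instances via a nested comprehension.
raw_segments = [
    {
        "speaker": "S1",
        "text": "hello world",
        "words": [
            {"word": "hello", "start": 0.0, "end": 0.4, "confidence": 0.98},
            {"word": "world", "start": 0.5, "end": 0.9},  # no confidence field
        ],
    },
]

segments = [
    {
        "speaker": seg.get("speaker", "unknown"),
        "text": seg.get("text", ""),
        "words": [
            {
                "word": w.get("word", ""),
                "start": w.get("start", 0.0),
                "end": w.get("end", 0.0),
                "confidence": w.get("confidence"),  # None when missing
            }
            for w in seg.get("words", [])
        ],
    }
    for seg in raw_segments
]

print(segments[0]["words"][1]["confidence"])  # -> None
```

Using `.get()` with defaults means a provider that omits per-word confidence (or any other field) degrades gracefully instead of raising `KeyError` mid-pipeline.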
560-590: LGTM - Robust provider handling and transcript version creation.

The try/except for provider validation with fallback to DEEPGRAM is a good defensive pattern. Metadata correctly captures the streaming source information.
backends/advanced/src/advanced_omi_backend/controllers/websocket_controller.py (2)
893-917: LGTM - Transcription-first workflow consistently implemented.

The batch audio processing correctly enqueues transcription before post-conversation jobs, matching the pattern in `audio_controller.py`. The 30-minute timeout and metadata structure are appropriate.

919-933: LGTM - Job chaining and logging correctly updated.

The `depends_on_job` parameter properly establishes the transcription dependency, and the log message accurately shows the full pipeline: transcription → speaker → memory.

backends/advanced/src/advanced_omi_backend/controllers/queue_controller.py (2)
374-395: LGTM! Clear documentation of the new workflow.

The docstring accurately describes the updated processing chain and clearly communicates the important distinction between streaming conversations (use streaming transcript) and file uploads (require separate transcription enqueueing). The return shape documentation matches the actual implementation.

483-488: LGTM! Return shape correctly reflects the new workflow.

The returned dictionary properly excludes the transcription key and includes all four job IDs from the new processing chain.
- **Test Suite README**: `tests/README.md`
- **Testing Guidelines**: `tests/TESTING_GUIDELINES.md`
- **Tag Documentation**: `tests/tags.md`
- **GitHub Actions Docs**: https://docs.github.com/en/actions
Format the bare URL as a markdown link.
The URL should use proper markdown link formatting for consistency with markdown best practices.
🔗 Proposed fix

```diff
-- **GitHub Actions Docs**: https://docs.github.com/en/actions
+- **GitHub Actions Docs**: [https://docs.github.com/en/actions](https://docs.github.com/en/actions)
```

Alternatively, for cleaner display:

```diff
-- **GitHub Actions Docs**: https://docs.github.com/en/actions
+- [GitHub Actions Documentation](https://docs.github.com/en/actions)
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
- **GitHub Actions Docs**: [https://docs.github.com/en/actions](https://docs.github.com/en/actions)
```

```suggestion
- [GitHub Actions Documentation](https://docs.github.com/en/actions)
```
| Metric | Count |
|---|---|
| ✅ Passed | 78 |
| ❌ Failed | 15 |
| 📊 Total | 93 |
📊 View Reports
GitHub Pages (Live Reports):
Download Artifacts:
- robot-test-reports-html-no-api - HTML reports
- robot-test-results-xml-no-api - XML output
| Metric | Count |
|---|---|
| ✅ Passed | 114 |
| ❌ Failed | 1 |
| 📊 Total | 115 |
📊 View Reports
GitHub Pages (Live Reports):
Download Artifacts:
- robot-test-reports-html-pr-labeled - HTML reports
- robot-test-results-xml-pr-labeled - XML output
Summary by CodeRabbit
Release Notes
New Features
Improvements
✏️ Tip: You can customize this high-level summary in your review settings.