
Conversation

@AnkushMalaker (Collaborator) commented Jan 14, 2026

Summary by CodeRabbit

  • New Features

    • Added audio waveform visualization with interactive seek controls
    • Added conversation archive/restore functionality for soft-deleted conversations
    • Added HTTP Range request support for partial audio content streaming
    • Introduced memory timestamp tracking with updated_at field
    • Added chunked speaker recognition for large audio files
  • Bug Fixes

    • Improved error handling and cleanup for failed audio processing
    • Enhanced audio streaming termination signaling
  • Documentation

    • Added comprehensive default configuration settings


- Updated Docker Compose files to mount the entire config directory, allowing for better management of configuration files.
- Introduced a new `WaveformData` model to store pre-computed waveform visualization data, improving UI performance by enabling waveform display without real-time decoding.
- Enhanced the `app_factory` and `job` models to include the new `WaveformData` model, ensuring proper initialization and data handling.
- Implemented waveform generation logic in a new worker module, allowing for on-demand waveform creation from audio chunks.
- Added API endpoints for retrieving and generating waveform data, improving the overall audio processing capabilities.
- Updated tests to cover new functionality and ensure robustness in waveform data handling.
…nd audio upload

- Introduced three new test scripts: `sdk_test_auth.py`, `sdk_test_conversations.py`, and `sdk_test_upload.py`.
- Each script tests different functionalities of the SDK, including authentication, conversation retrieval, and audio file uploads.
- The scripts utilize the `ChronicleClient` to perform operations and print results for verification.
- Enhanced testing capabilities for the SDK, ensuring robust validation of core features.

coderabbitai bot commented Jan 14, 2026

📝 Walkthrough

Walkthrough

This PR introduces waveform-based audio visualization with on-demand generation, soft-delete conversation archival with restoration capability, and configuration management refactoring that merges defaults with user config. It also adds chunked audio processing for speaker recognition on large files, an updated_at field for memory tracking, HTTP Range request support for audio streaming, and a comprehensive backend state cleanup utility with optional backup and export.

Changes

Cohort / File(s) Summary
Configuration System Refactoring
backends/advanced/src/advanced_omi_backend/config.py, backends/advanced/src/advanced_omi_backend/model_registry.py
Introduces centralized config merging system combining defaults.yml with user config.yml via merge_configs(), caching, and environment variable resolution. Updates model_registry to use merged config path.
Configuration & Environment Files
backends/advanced/.env.template, config/defaults.yml, backends/advanced/Docs/memories.md
Adds speaker chunking env vars (threshold, size, overlap); creates comprehensive defaults.yml with models and services; updates memory collection name to chronicle_memories.
Docker & Deployment
backends/advanced/docker-compose.yml, backends/advanced/docker-compose-test.yml
Replaces individual config file mounts with mounted config directory; applies to both test and production services.
Data Models
backends/advanced/src/advanced_omi_backend/models/waveform.py, backends/advanced/src/advanced_omi_backend/app_factory.py, backends/advanced/src/advanced_omi_backend/models/job.py
New WaveformData Beanie model with conversation_id index, samples, sample_rate, duration; integrates into Beanie initialization.
Audio Processing & Waveform Generation
backends/advanced/src/advanced_omi_backend/utils/audio_chunk_utils.py, backends/advanced/src/advanced_omi_backend/workers/waveform_jobs.py
Adds reconstruct_audio_segments() and filter_transcript_by_time() utilities; enforces 30-minute duration cap; adds generate_waveform_data() worker to decode Opus, downsample, and normalize waveform samples.
Audio API & Streaming
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py
Implements HTTP Range request support (206 Partial Content) with Range header parsing, error handling, and Accept-Ranges headers.
Conversation Management
backends/advanced/src/advanced_omi_backend/controllers/conversation_controller.py, backends/advanced/src/advanced_omi_backend/routers/modules/conversation_routes.py
Adds optional include_deleted parameter to filter soft-deleted conversations; new /waveform endpoint with on-demand generation and caching.
Conversation UI & Archive
backends/advanced/webui/src/pages/Conversations.tsx, backends/advanced/webui/src/pages/Archive.tsx, backends/advanced/webui/src/pages/ConversationsRouter.tsx, backends/advanced/webui/src/pages/ConversationsTimeline.tsx
Adds Archive page for viewing/restoring deleted conversations; integrates WaveformDisplay; removes deleted conversation UI from main list; updates API calls to filter deleted conversations.
Waveform UI Component
backends/advanced/webui/src/components/audio/WaveformDisplay.tsx
New React component that fetches waveform data, renders canvas with vertical bars, displays playback indicator, and handles seek interactions.
Memory System - Timestamps
backends/advanced/src/advanced_omi_backend/services/memory/base.py, backends/advanced/src/advanced_omi_backend/services/memory/providers/chronicle.py, backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py, backends/advanced/src/advanced_omi_backend/services/memory/providers/openmemory_mcp.py, backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py
Adds updated_at field to MemoryEntry; new get_memory() method for single memory retrieval; updates providers to extract/set updated_at timestamps; memory collection renamed to chronicle_memories.
Memory Configuration
backends/advanced/src/advanced_omi_backend/services/memory/config.py, backends/advanced/src/advanced_omi_backend/services/memory/README.md
Updates canonical config path resolution via get_config_yml_path(); changes default collection from omi_memories to chronicle_memories.
Audio Stream Processing
backends/advanced/src/advanced_omi_backend/services/audio_stream/aggregator.py, backends/advanced/src/advanced_omi_backend/services/audio_stream/producer.py
Changes aggregation to gather all multi-result data with average confidence; adds end_marker signaling to Redis stream on finalization.
Worker Jobs - Speaker Recognition
backends/advanced/src/advanced_omi_backend/workers/speaker_jobs.py
Introduces chunked audio processing for large files with overlap handling; adds _merge_overlapping_speaker_segments() helper; extends threshold-based segment merging; updates check_enrolled_speakers_job() signature.
Worker Jobs - Transcription & Conversation
backends/advanced/src/advanced_omi_backend/workers/transcription_jobs.py, backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py
Adds speech analysis reason tracking and no_speech_detected reason field; validates and filters transcript segments; adds synthetic segment creation from word data.
Audio Upload & Error Handling
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py
Adds ValueError handling for chunk conversion with conversation cleanup on failure.
Cleanup Jobs
backends/advanced/src/advanced_omi_backend/workers/cleanup_jobs.py
Integrates WaveformData deletion into soft-delete cleanup flow; tracks purged_waveforms in return payload.
Backend State Cleanup Utility
backends/advanced/src/scripts/cleanup_state.py
New comprehensive 900+ line cleanup script with BackupManager (MongoDB/Qdrant/WAV export), CleanupManager (MongoDB/Redis/Qdrant/legacy WAV cleanup), dry-run, force, and optional user deletion; includes backup summary and statistics.
Frontend API Integration
backends/advanced/webui/src/services/api.ts
Updates getAll() to accept optional includeDeleted parameter; adds restore() and permanentDelete() endpoints.
Frontend Utilities
backends/advanced/webui/src/pages/Memories.tsx
Updates formatDate() to handle undefined/null with N/A fallback.
Unit Tests - Memory
backends/advanced/tests/test_memory_entry.py, backends/advanced/tests/test_memory_providers.py
New comprehensive tests for MemoryEntry timestamp initialization, serialization, and provider-specific conversion logic (Mycelia, OpenMemory MCP).
Unit Tests - Configuration
tests/unit/test_config_loading.py
New tests for merge_configs, caching, reloading, nested merging, list replacement, and edge cases.
Integration Tests - Endpoints
tests/endpoints/memory_tests.robot
Adds validation for memory updated_at field alongside created_at.
Integration Tests - WebSocket & SDK
tests/integration/websocket_transcription_e2e_test.robot, tests/integration/sdk_tests.robot
New WebSocket transcription end-to-end tests with end_marker validation; new SDK test suite for auth, upload, and conversation retrieval.
Integration Test Scripts
tests/scripts/sdk_test_auth.py, tests/scripts/sdk_test_upload.py, tests/scripts/sdk_test_conversations.py
New Python scripts to test SDK authentication, audio upload, and conversation listing.
Test Infrastructure
tests/resources/queue_keywords.robot, tests/resources/redis_keywords.robot
New robot keywords: Get Job Result, Redis Command, Get Backend Logs.
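For readers unfamiliar with the config-merging approach summarized in the changes above, a recursive dict merge is the usual shape. This is a minimal sketch with illustrative names, not the actual merge_configs from config.py (which also handles caching and environment-variable resolution):

```python
from copy import deepcopy


def merge_configs(defaults: dict, overrides: dict) -> dict:
    """Recursively merge user overrides onto defaults.

    Nested dicts are merged key-by-key; lists and scalars from the
    user config replace the corresponding default wholesale.
    """
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = value
    return merged


defaults = {"models": {"llm": "gpt", "embed": "ada"}, "tags": ["a"]}
user = {"models": {"llm": "ollama"}, "tags": ["b"]}
merged = merge_configs(defaults, user)
# nested key overridden, sibling preserved, list replaced wholesale
```

Nested mappings merge key-by-key while lists replace rather than concatenate, matching the list-replacement behavior the new unit tests in tests/unit/test_config_loading.py are described as covering.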

Sequence Diagrams

sequenceDiagram
    participant User as User/Client
    participant API as Conversation API
    participant DB as MongoDB
    participant AudioChunks as AudioChunks
    participant Decoder as Decoder
    participant WaveformWorker as Waveform Worker
    participant Storage as WaveformData

    User->>API: GET /conversations/{id}/waveform
    API->>DB: Query WaveformData by conversation_id
    alt Waveform cached
        DB-->>API: Return WaveformData
        API-->>User: Return samples, duration, sample_rate
    else Waveform not cached
        API->>WaveformWorker: Queue generate_waveform_data
        WaveformWorker->>AudioChunks: Query Opus chunks
        AudioChunks-->>WaveformWorker: Return audio chunks
        WaveformWorker->>Decoder: Decode Opus → PCM samples
        Decoder-->>WaveformWorker: Return raw PCM
        WaveformWorker->>WaveformWorker: Downsample to target rate
        WaveformWorker->>WaveformWorker: Normalize amplitude [-1.0, 1.0]
        WaveformWorker->>Storage: Create WaveformData document
        Storage-->>WaveformWorker: Persisted waveform
        WaveformWorker-->>API: Return generated waveform
        API-->>User: Return samples, duration, sample_rate
    end
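The downsample and normalize steps in the worker flow above can be sketched as follows. This assumes PCM samples have already been decoded from Opus; the function name and peak-based reduction are illustrative, not the actual generate_waveform_data implementation:

```python
def downsample_and_normalize(pcm, source_rate, target_rate=3):
    """Reduce PCM samples to `target_rate` peak values per second,
    normalized to [-1.0, 1.0] for canvas rendering."""
    if not pcm:
        return []
    # Take the peak amplitude within each window of samples
    window = max(1, source_rate // target_rate)
    peaks = [
        max(abs(s) for s in pcm[i:i + window])
        for i in range(0, len(pcm), window)
    ]
    # Normalize so the loudest peak maps to 1.0 (avoid divide-by-zero)
    peak = max(peaks) or 1
    return [p / peak for p in peaks]


samples = downsample_and_normalize([0, 16384, -32768, 8192],
                                   source_rate=4, target_rate=2)
# → [0.5, 1.0]
```

Peak detection (rather than averaging) keeps short transients visible in the rendered bars, which is typically what a waveform overview is for.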
sequenceDiagram
    participant User as User/Client
    participant Archive as Archive UI
    participant API as Conversation API
    participant DB as MongoDB
    participant Cleanup as Cleanup Service

    User->>Archive: Open Archive page
    Archive->>API: GET /conversations?include_deleted=true
    API->>DB: Query conversations where deleted=true
    DB-->>API: Return deleted conversations
    API-->>Archive: Populate archive list

    User->>Archive: Click "Restore" button
    Archive->>API: POST /conversations/{id}/restore
    API->>DB: Unset deleted flag on conversation
    DB-->>API: Update successful
    API-->>Archive: Refresh list
    Archive->>API: GET /conversations?include_deleted=true
    API->>DB: Query (updated conversations)
    DB-->>Archive: Restored conversation removed from view

    User->>Archive: Click "Permanent Delete" (admin)
    Archive->>API: DELETE /conversations/{id}?permanent=true
    API->>DB: Delete conversation document
    API->>Cleanup: Trigger cleanup_jobs
    Cleanup->>DB: Delete associated AudioChunks
    Cleanup->>DB: Delete associated WaveformData
    Cleanup-->>API: Cleanup complete
    API-->>Archive: Refresh list
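The archive lifecycle in the diagram above boils down to toggling a soft-delete flag and filtering on it at query time. A minimal in-memory sketch (field and method names are illustrative, not the backend's actual API):

```python
class ConversationStore:
    """In-memory sketch of the soft-delete / restore / purge flow."""

    def __init__(self):
        self._docs = {}

    def add(self, conv_id):
        self._docs[conv_id] = {"id": conv_id, "deleted": False}

    def list(self, include_deleted=False):
        # Mirrors GET /conversations?include_deleted=...
        return [d for d in self._docs.values()
                if include_deleted or not d["deleted"]]

    def soft_delete(self, conv_id):
        self._docs[conv_id]["deleted"] = True

    def restore(self, conv_id):
        # Mirrors POST /conversations/{id}/restore
        self._docs[conv_id]["deleted"] = False

    def permanent_delete(self, conv_id):
        # In the real flow, cleanup jobs would also purge the
        # associated AudioChunks and WaveformData documents
        self._docs.pop(conv_id, None)
```

Soft delete keeps the document queryable for the Archive page; only permanent delete (plus the cleanup jobs) actually removes data.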

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
Title check — ❓ Inconclusive. The title 'Fix/conversation view mongo' is vague and does not clearly convey the primary changes in the pull request, which involve extensive waveform data handling, audio processing improvements, configuration merging, and multiple new features. Resolution: consider a more descriptive title that reflects the main changes, such as 'Add waveform visualization and audio processing improvements' or 'Refactor config management and add waveform support'.
✅ Passed checks (2 passed)
Description Check — ✅ Passed. Check skipped — CodeRabbit's high-level summary is enabled.
Docstring Coverage — ✅ Passed. Docstring coverage is 90.27%, which meets the required threshold of 80.00%.



@AnkushMalaker AnkushMalaker changed the base branch from fix/audio-pipe-mongo to fix/audio-pipe-3 January 14, 2026 10:00
@github-actions

⚠️ Robot Framework Test Results (No API Keys)

Status: ❌ Some tests failed

ℹ️ Note: This run excludes tests requiring external API keys (Deepgram, OpenAI).
Tests tagged with requires-api-keys will run on dev/main branches.

Metric Count
✅ Passed 79
❌ Failed 18
📊 Total 97


- Added configuration options for speaker recognition chunking in `.env.template`, allowing for better management of large audio files.
- Updated `get_conversations` function to include an `include_deleted` parameter for filtering conversations based on their deletion status.
- Enhanced `finalize_session` method in `AudioStreamProducer` to send an end marker to Redis, ensuring proper session closure.
- Introduced `reconstruct_audio_segments` function to yield audio segments with overlap for efficient processing of lengthy conversations.
- Implemented merging of overlapping speaker segments to improve accuracy in speaker recognition.
- Added integration tests for WebSocket streaming transcription to validate the end_marker functionality and overall transcription flow.


@AnkushMalaker (Collaborator, Author) commented:

@coderabbitai review


coderabbitai bot commented Jan 14, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai (Contributor) left a comment

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backends/advanced/src/advanced_omi_backend/models/job.py (1)

49-54: Bug: raise statement makes the ConfigurationError fallback ineffective, and duplicate flag assignment.

The except block catches ConfigurationError and sets a fallback database, but then immediately re-raises the exception, making the fallback useless. Additionally, _beanie_initialized is set to True on line 54 before init_beanie is actually called, and then redundantly set again on line 61.

🐛 Proposed fix
             try:
                 database = client.get_default_database(mongodb_database)
             except ConfigurationError:
                 database = client[mongodb_database]
-                raise
-            _beanie_initialized = True
+
             # Initialize Beanie
             await init_beanie(
                 database=database,
                 document_models=[User, Conversation, AudioChunkDocument, WaveformData],
             )

             _beanie_initialized = True
             logger.info("✅ Beanie initialized in RQ worker process")

Also applies to: 61-61

🤖 Fix all issues with AI agents
In `@backends/advanced/Docs/memories.md`:
- Line 502: Search the repo for the string "omi_memories" and replace it with
"chronicle_memories" in all configuration and test YAML files (the defaults and
test configs referenced in the review), update any docs examples (e.g.,
curl/verification lines) to use "chronicle_memories", and run tests to ensure no
remaining references to "omi_memories" remain.

In `@backends/advanced/src/advanced_omi_backend/models/waveform.py`:
- Around line 43-47: The waveform model's conversation_id isn't enforced unique
despite the comment; update the field definition from Indexed(str) to
Indexed(str, unique=True) (the same pattern used in Conversation) so inserts
cannot create duplicate conversation_id values, and keep the Settings.indexes
entry as-is; modify the conversation_id declaration in the Waveform model to use
Indexed(str, unique=True).

In `@backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py`:
- Around line 128-163: Validate the parsed byte range after splitting
range_header: after computing range_start and range_end (and clamping to
0..file_size-1) ensure range_start <= range_end; if not, return a 416 Range Not
Satisfiable with "Content-Range: bytes */{file_size}". Also guard against
negative content_length by computing content_length = range_end - range_start +
1 only after the validation, and use the slice wav_data[range_start:range_end +
1] only when valid. Remove the unused exception variable `e` in the except
clause so the handler catches (ValueError, IndexError) and returns 416 without
referencing `e`.
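A standalone sketch of the validation this comment asks for — parsing a single `bytes=start-end` range, clamping it to the file size, and signalling 416 via a `None` return — might look like this (hypothetical helper, not the actual route code):

```python
def parse_range(range_header: str, file_size: int):
    """Parse a single 'bytes=start-end' Range header.

    Returns (start, end, content_length) on success, or None when the
    range is unsatisfiable and the caller should respond with
    416 plus 'Content-Range: bytes */{file_size}'.
    """
    try:
        unit, _, spec = range_header.partition("=")
        if unit.strip() != "bytes":
            return None
        start_s, _, end_s = spec.partition("-")
        if start_s == "":
            # Suffix range: bytes=-N means the last N bytes
            length = int(end_s)
            start = max(0, file_size - length)
            end = file_size - 1
        else:
            start = int(start_s)
            end = int(end_s) if end_s else file_size - 1
        end = min(end, file_size - 1)
        # Validate BEFORE computing content_length or slicing the data
        if start > end or start >= file_size:
            return None
        return start, end, end - start + 1
    except ValueError:
        return None
```

The key point from the review: validate `start <= end` (and `start < file_size`) before computing the content length or slicing `wav_data`, so an inverted range like `bytes=900-200` yields 416 instead of a negative-length body.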

In `@backends/advanced/src/advanced_omi_backend/services/memory/config.py`:
- Line 143: The code default for collection_name (variable collection_name in
config.py) was changed to "chronicle_memories" while existing defaults.yml and
tests still reference "omi_memories", causing a breaking mismatch; fix by
reverting the config.py default back to "omi_memories" (restore collection_name:
str = "omi_memories") so existing deployments keep working, update any unit
tests that assert the default to match, and add a short migration note in
project docs explaining the planned rename and how to opt into
"chronicle_memories" via explicit config so future changes are non-breaking.

In `@backends/advanced/src/advanced_omi_backend/utils/audio_chunk_utils.py`:
- Around line 430-434: The code incorrectly uses
Conversation.get(conversation_id) which looks up by MongoDB _id; replace it with
the indexed-field lookup used elsewhere (pattern at lines 669-671) by awaiting
Conversation.find_one with the filter on the conversation_id field (e.g., await
Conversation.find_one({"conversation_id": conversation_id}) or equivalent
keyword-arg form) and keep the existing not-found ValueError branch.

In `@backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py`:
- Around line 570-598: The first logger call uses an unnecessary f-string:
change logger.info(f"📝 No segments in streaming results, creating from
word-level data") to a plain string (remove the leading f) in the branch where
segments_data is empty (the block that calls
aggregator.get_session_results(session_id) and appends to segments_data); leave
other f-strings (e.g., the following logger.info that includes
{len(segments_data)}) unchanged.

In `@backends/advanced/src/advanced_omi_backend/workers/speaker_jobs.py`:
- Around line 18-21: The function _merge_overlapping_speaker_segments declares
an unused overlap parameter; either remove it from the signature (and all
callers) if not needed, or incorporate it into the merge logic by comparing the
actual overlap against the provided overlap threshold (e.g., compute overlap =
min(current["end"], next_seg["end"]) - max(current["start"], next_seg["start"])
or check next_seg["start"] - current["end"] <= overlap) and only merge when the
overlap condition is met; update the function signature, any callers/tests, and
the docstring to reflect the chosen behavior and ensure segment merging uses
segments, next_seg["start"], and current["end"] consistently.
- Around line 319-323: The generator loop over reconstruct_audio_segments
currently unpacks a variable named start_time which shadows the earlier
job_start_time used for profiling; rename the unpacked variable (e.g.,
segment_start_time or wav_start_time) in the async for unpack (async for
wav_data, segment_start_time, end_time in reconstruct_audio_segments(...)) and
update all in-loop references to that new name, and ensure all
processing_time_seconds calculations use time.time() - job_start_time (not the
unpacked name) so timing metrics (processing_time_seconds / processing_time)
remain correct.
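One way to incorporate the threshold as this comment suggests — merging consecutive same-speaker segments that overlap or sit within the allowed gap — is sketched below. Names and dict shape are assumptions for illustration, not the actual _merge_overlapping_speaker_segments:

```python
def merge_overlapping_speaker_segments(segments, gap_threshold=0.5):
    """Merge consecutive same-speaker segments that overlap or are
    separated by at most `gap_threshold` seconds.

    Segments are dicts with 'speaker', 'start', and 'end' keys.
    """
    if not segments:
        return []
    ordered = sorted(segments, key=lambda s: s["start"])
    merged = [dict(ordered[0])]  # copy so input isn't mutated
    for seg in ordered[1:]:
        current = merged[-1]
        # Negative gap means the segments overlap
        gap = seg["start"] - current["end"]
        if seg["speaker"] == current["speaker"] and gap <= gap_threshold:
            current["end"] = max(current["end"], seg["end"])
        else:
            merged.append(dict(seg))
    return merged
```

With this shape the `overlap`/threshold parameter actually participates in the merge decision, addressing the unused-parameter concern above.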

In `@backends/advanced/src/advanced_omi_backend/workers/waveform_jobs.py`:
- Around line 19-34: The docstring for the function that accepts parameter
sample_rate incorrectly states the default is 10 while the function signature
sets sample_rate: int = 3; update the docstring to state the correct default (3)
so the signature and documentation match, referencing the sample_rate parameter
in the function (waveform generation function in waveform_jobs.py) and adjust
the "Args" section to list "sample_rate: Samples per second for waveform
(default: 3)".

In `@backends/advanced/webui/src/components/audio/WaveformDisplay.tsx`:
- Around line 77-95: The drawWaveform function can divide by zero when samples
is empty; add an early guard at the start of drawWaveform that returns
immediately if !samples || samples.length === 0 to avoid computing barWidth =
width / samples.length and any downstream issues; update the function
(drawWaveform) to perform this check before computing barWidth/centerY and
before using samples.forEach.

In `@tests/integration/sdk_tests.robot`:
- Around line 59-71: The test "SDK Upload Respects Backend File Size Limit"
currently only re-runs the authentication script (Run Process ...
${CURDIR}/../scripts/sdk_test_auth.py) and doesn't exercise upload/oversize
behavior; either mark it as skipped until implemented or rename it to reflect
it's a placeholder: add a skip tag (e.g., add "skip" to the [Tags] line) and
update the test name or [Documentation] to indicate it's a placeholder that
re-runs sdk_test_auth.py, or alternatively change the test name to something
like "Placeholder: Re-run SDK auth script" so the intent and behavior match.

In `@tests/resources/queue_keywords.robot`:
- Around line 330-342: The Robot keyword "Get Job Result" is calling the wrong
endpoint path "/api/queue/job/${job_id}" which should be plural; update the GET
call in that keyword to use "/api/queue/jobs/${job_id}" so it matches the
backend route and will return 200; ensure the GET On Session invocation and any
related variables (${response}, ${job_data}, ${result}) remain unchanged.
🧹 Nitpick comments (29)
backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py (2)

151-181: Consider adding defensive type check for each segment.

The normalization logic correctly handles Deepgram's nested response structure. However, if a malformed response contains a non-dict element in the segments list, seg.get() will raise an AttributeError.

♻️ Suggested defensive check
             if provider == "deepgram" and segments:
                 normalized_segments = []
                 for seg in segments:
+                    if not isinstance(seg, dict):
+                        logger.debug(f"Skipping non-dict Deepgram segment: {seg}")
+                        continue
                     # Deepgram segments may have nested structure
                     # Extract text from either 'text' or 'transcript' or 'sentences'
                     text_content = seg.get("text") or seg.get("transcript") or ""

171-177: Default confidence of 1.0 may overstate certainty.

When confidence data is missing from a segment, defaulting to 1.0 (100% confidence) could be misleading to downstream consumers. Consider using 0.0 or None to indicate unknown confidence, unless downstream code explicitly requires a numeric value.

backends/advanced/src/advanced_omi_backend/utils/audio_chunk_utils.py (1)

462-466: Inconsistent sort syntax.

Line 466 uses .sort(+AudioChunkDocument.chunk_index) while line 281 uses .sort("+chunk_index"). The unary + operator on a Beanie field object is non-standard and may not behave as intended.

♻️ Suggested fix for consistency
-        ).sort(+AudioChunkDocument.chunk_index).to_list()
+        ).sort("+chunk_index").to_list()
backends/advanced/src/advanced_omi_backend/workers/speaker_jobs.py (1)

386-386: Remove extraneous f-string prefix.

This f-string has no placeholders.

-            logger.info(f"🎤 Calling speaker recognition service...")
+            logger.info("🎤 Calling speaker recognition service...")
backends/advanced/webui/src/pages/ConversationsTimeline.tsx (1)

274-276: Consider removing dead conditional styling for deleted conversations.

Since getAll(false) now excludes deleted conversations, conv.deleted will always be false for all items in the list. The conditional conv.deleted ? '#fee2e2' : '#fff' is effectively dead code.

♻️ Suggested cleanup
                 contentStyle={{
-                  background: conv.deleted ? '#fee2e2' : '#fff',
+                  background: '#fff',
                   color: '#1f2937',
                   boxShadow: '0 3px 0 #ddd'
                 }}
tests/scripts/sdk_test_conversations.py (1)

12-14: Consider adding argument validation for clearer error messages.

Accessing sys.argv[1:4] without validation will raise an IndexError if arguments are missing, which may be confusing for users running the script incorrectly.

♻️ Suggested improvement
+if len(sys.argv) < 4:
+    print("Usage: sdk_test_conversations.py <backend_url> <email> <password>", file=sys.stderr)
+    sys.exit(1)
+
 backend_url = sys.argv[1]
 email = sys.argv[2]
 password = sys.argv[3]
tests/scripts/sdk_test_auth.py (1)

12-18: Consider adding argument validation and error handling.

The script will crash with an IndexError if fewer than 3 arguments are provided. For test scripts used in CI/automation, clearer error messages help debugging.

🔧 Suggested improvement
-backend_url = sys.argv[1]
-email = sys.argv[2]
-password = sys.argv[3]
-
-client = ChronicleClient(backend_url)
-client.login(email, password)
-print("SUCCESS")
+if len(sys.argv) < 4:
+    print("Usage: sdk_test_auth.py <backend_url> <email> <password>", file=sys.stderr)
+    sys.exit(1)
+
+backend_url = sys.argv[1]
+email = sys.argv[2]
+password = sys.argv[3]
+
+try:
+    client = ChronicleClient(backend_url)
+    client.login(email, password)
+    print("SUCCESS")
+except Exception as e:
+    print(f"FAILED: {e}", file=sys.stderr)
+    sys.exit(1)
backends/advanced/src/advanced_omi_backend/services/audio_stream/aggregator.py (1)

110-154: LGTM with a suggestion: consider sorting segments by timestamp.

The aggregation logic correctly combines text, words, and segments from all results. The average confidence calculation and early return for empty results are properly handled.

However, all_segments are collected in result order but not sorted by their internal timestamps. If segments from different chunks have overlapping or out-of-order timings, the final segment list may not be chronologically sorted.

🔧 Optional: Sort segments by start time
         # Join all text segments with spaces
         combined_text = " ".join(all_text)

+        # Sort segments by start time if present
+        if all_segments and "start" in all_segments[0]:
+            all_segments.sort(key=lambda s: s.get("start", 0))
+
         combined = {
             "text": combined_text,
             "words": all_words,
             "segments": all_segments,
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (1)

147-157: Use logging.exception for better traceability.

The static analysis tool correctly flags that logging.exception would be more appropriate here as it automatically includes the stack trace, which aids debugging validation failures.

Proposed fix
                except ValueError as val_error:
                    # Handle validation errors (e.g., file too long)
-                   audio_logger.error(f"Audio validation failed: {val_error}")
+                   audio_logger.exception(f"Audio validation failed: {val_error}")
                    processed_files.append({
                        "filename": file.filename,
                        "status": "error",
                        "error": str(val_error),
                    })
                    # Delete the conversation since it won't have audio chunks
                    await conversation.delete()
                    continue
tests/scripts/sdk_test_upload.py (2)

12-15: Add argument validation for clearer error messages.

Direct sys.argv access will raise an unclear IndexError if arguments are missing. Consider using argparse or at least validating argument count for better test failure diagnostics.

Proposed fix with basic validation
-backend_url = sys.argv[1]
-email = sys.argv[2]
-password = sys.argv[3]
-audio_file = sys.argv[4]
+if len(sys.argv) != 5:
+    print(f"Usage: {sys.argv[0]} <backend_url> <email> <password> <audio_file>", file=sys.stderr)
+    sys.exit(1)
+
+backend_url = sys.argv[1]
+email = sys.argv[2]
+password = sys.argv[3]
+audio_file = sys.argv[4]

21-23: Guard against empty files list.

Accessing result.files[0] will raise IndexError if the upload returns an empty files list. A guard would make test failures clearer.

Proposed fix
+if not result.files:
+    print("ERROR:No files in response")
+    sys.exit(1)
+
 print(f"STATUS:{result.files[0].status}")
 if result.files[0].conversation_id:
     print(f"CONVERSATION_ID:{result.files[0].conversation_id}")
backends/advanced/src/advanced_omi_backend/models/waveform.py (1)

34-37: Consider using timezone-aware datetime.

datetime.utcnow() is deprecated since Python 3.12. While functional, consider using datetime.now(timezone.utc) for timezone-aware timestamps.

Proposed fix
+from datetime import datetime, timezone
-from datetime import datetime
 ...
     created_at: datetime = Field(
-        default_factory=datetime.utcnow,
+        default_factory=lambda: datetime.now(timezone.utc),
         description="When this waveform was generated"
     )
tests/resources/redis_keywords.robot (1)

78-99: LGTM with optional modernization suggestion.

The Redis Command keyword correctly handles command execution, error checking, and numeric coercion. The logic is sound.

Minor note: Run Keyword If (line 95-97) is considered legacy syntax in Robot Framework 4.0+. The modern IF/ELSE syntax is preferred, though this is purely stylistic and the current implementation works correctly.

♻️ Optional: Use modern IF/ELSE syntax
     # Try to convert to integer if it's a number (for commands like XLEN)
     ${is_digit}=    Run Keyword And Return Status    Should Match Regexp    ${output}    ^\\d+$
-    ${return_value}=    Run Keyword If    ${is_digit}
-    ...    Convert To Integer    ${output}
-    ...    ELSE    Set Variable    ${output}
+    IF    ${is_digit}
+        ${return_value}=    Convert To Integer    ${output}
+    ELSE
+        ${return_value}=    Set Variable    ${output}
+    END

     RETURN    ${return_value}
config/defaults.yml (1)

29-40: Consider clarifying the API key environment variable for local models.

Using ${OPENAI_API_KEY:-ollama} for the local Ollama LLM (and similarly for local-embed at line 76) may cause confusion since Ollama doesn't typically require OpenAI credentials. This works because the default fallback is ollama, but a dedicated variable like ${OLLAMA_API_KEY:-ollama} or a comment clarifying this pattern would improve clarity.
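The `${VAR:-default}` placeholders are substituted when the config is loaded; a minimal sketch of that resolution (illustrative only — the backend's actual `_deep_resolve_env` may differ in details):

```python
import os
import re

# Matches ${VAR} or ${VAR:-default}, mirroring shell parameter expansion.
_ENV_PATTERN = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

def resolve_env(value: str) -> str:
    """Replace ${VAR:-default} placeholders with values from the environment.

    An unset or empty variable falls back to the default, matching the
    shell's ``:-`` semantics. Illustrative sketch, not the backend's code.
    """
    def _sub(match: re.Match) -> str:
        var, default = match.group(1), match.group(2)
        return os.environ.get(var) or (default or "")
    return _ENV_PATTERN.sub(_sub, value)
```

With this pattern, `resolve_env("${OPENAI_API_KEY:-ollama}")` yields `"ollama"` whenever the variable is unset, which is why the current config works for Ollama despite the misleading variable name.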

backends/advanced/src/advanced_omi_backend/workers/waveform_jobs.py (1)

159-168: Consider upsert to prevent duplicate waveforms on concurrent requests.

If two requests race to generate a waveform for the same conversation, both will insert separate documents. Consider using find_one_and_update with upsert=True or adding a unique index on conversation_id in the WaveformData model to ensure idempotency.

♻️ Example using find_one_and_replace with upsert
-        await waveform_doc.insert()
+        # Use upsert to prevent duplicates from concurrent requests
+        # (requires: from beanie.odm.operators.update.general import Set)
+        await WaveformData.find_one(
+            WaveformData.conversation_id == conversation_id
+        ).upsert(
+            Set(waveform_doc.model_dump(exclude={"id"})),
+            on_insert=waveform_doc,
+        )

Or add a unique index in the model:

class Settings:
    name = "waveforms"
    indexes = [
        IndexModel([("conversation_id", 1)], unique=True),
    ]
backends/advanced/src/advanced_omi_backend/routers/modules/conversation_routes.py (1)

117-120: Move imports to module level.

HTTPException is a standard FastAPI import and should be at the top of the file with other FastAPI imports for consistency. The model imports can remain deferred if there are circular import concerns, but HTTPException has no such issues.

♻️ Suggested change

At line 10, update the import:

-from fastapi import APIRouter, Depends, Query
+from fastapi import APIRouter, Depends, HTTPException, Query

Then at lines 117-120:

-    from fastapi import HTTPException
-    from advanced_omi_backend.models.conversation import Conversation
-    from advanced_omi_backend.models.waveform import WaveformData
-    from advanced_omi_backend.workers.waveform_jobs import generate_waveform_data
+    from advanced_omi_backend.models.conversation import Conversation
+    from advanced_omi_backend.models.waveform import WaveformData
+    from advanced_omi_backend.workers.waveform_jobs import generate_waveform_data
backends/advanced/webui/src/components/audio/WaveformDisplay.tsx (2)

116-137: Remove debug console.log statements before production.

The console.log and console.warn statements at lines 117, 120, 125, and 134 appear to be debug artifacts. Consider removing them or converting to a debug-only logger to reduce console noise in production.


51-75: Consider handling window resize.

The canvas dimensions are calculated once when the effect runs. If the user resizes the window, the waveform may display incorrectly until a re-render is triggered. Consider adding a ResizeObserver or window resize listener.

♻️ Example resize handling
useEffect(() => {
  if (!canvasRef.current) return;
  
  const resizeObserver = new ResizeObserver(() => {
    // Trigger redraw by updating a state or calling draw directly
  });
  
  resizeObserver.observe(canvasRef.current);
  return () => resizeObserver.disconnect();
}, []);
tests/integration/websocket_transcription_e2e_test.robot (1)

55-66: Consider replacing fixed Sleep with polling.

A fixed Sleep 5s can cause flakiness when the system is slow and wastes time when it is fast. Consider using Wait Until Keyword Succeeds with a polling approach to check the Redis stream.

♻️ Example polling approach
    # Instead of:
    Sleep    5s

    # Consider:
    ${stream_length}=    Wait Until Keyword Succeeds    10s    1s
    ...    Get Redis Stream Length Above Zero    ${stream_name}

Where Get Redis Stream Length Above Zero is a keyword that returns the length only if > 0, otherwise fails.
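The polling idea behind Wait Until Keyword Succeeds can be sketched in Python as a generic helper (not part of the test suite; the predicate name below is hypothetical):

```python
import time

def wait_until(predicate, timeout=10.0, interval=0.5):
    """Poll `predicate` until it returns a truthy value, or raise TimeoutError.

    Unlike a fixed sleep, this returns as soon as the condition holds and
    only fails after the full timeout has elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = predicate()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)
```

In the test above this would be something like `wait_until(lambda: redis_stream_length(stream_name))` (helper name assumed) in place of `Sleep 5s`.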

backends/advanced/src/scripts/cleanup_state.py (2)

552-557: Hardcoded vector dimension may not match original collection.

The recreated collection uses a hardcoded 1536 dimensions (OpenAI ada-002). If a different embedding model is used, this will create a mismatched collection. Consider retrieving the original collection's vector config before deletion.

♻️ Suggested approach
            # Get count and config before deletion
            collection_info = await self.qdrant_client.get_collection(collection_name)
            stats.memories_count = collection_info.points_count
+           original_vector_size = collection_info.config.params.vectors.size
+           original_distance = collection_info.config.params.vectors.distance

            # Delete and recreate collection
            await self.qdrant_client.delete_collection(collection_name)
            logger.info(f"Deleted memories collection ({stats.memories_count} vectors)")

            # Recreate with original configuration
            await self.qdrant_client.create_collection(
                collection_name=collection_name,
-               vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
+               vectors_config=VectorParams(size=original_vector_size, distance=original_distance)
            )

501-524: Unused result assignments from delete operations.

The result variables are assigned but never used. While harmless, you could either remove the assignments or use them to verify the actual deleted count matches expectations.

♻️ Option 1: Remove assignments
-        result = await Conversation.find_all().delete()
+        await Conversation.find_all().delete()
♻️ Option 2: Verify counts
-        result = await Conversation.find_all().delete()
-        logger.info(f"Deleted {stats.conversations_count} conversations")
+        result = await Conversation.find_all().delete()
+        logger.info(f"Deleted {result.deleted_count} conversations (expected {stats.conversations_count})")
backends/advanced/src/advanced_omi_backend/model_registry.py (1)

289-300: Consider using logging.exception for better stack trace visibility in fallback path.

The fallback logic is reasonable for resilience, but logging.error loses the stack trace. Using logging.exception would preserve diagnostic information when the merged config load fails.

♻️ Suggested improvement
     # Try to get merged configuration (defaults + user config)
     try:
         raw = get_config(force_reload=force_reload)
     except Exception as e:
-        logging.error(f"Failed to load merged configuration: {e}")
+        logging.exception(f"Failed to load merged configuration: {e}")
         # Fallback to direct config.yml loading
         cfg_path = _find_config_path()
         if not cfg_path.exists():
             return None
         with cfg_path.open("r") as f:
             raw = yaml.safe_load(f) or {}
         raw = _deep_resolve_env(raw)
backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py (1)

386-446: New get_memory method looks correct, but consider extracting the ID conversion logic.

The method correctly handles UUID/integer ID formats and validates user ownership when user_id is provided.

However, the ID conversion logic (lines 398-409) is duplicated across delete_memory, update_memory, and now get_memory. Consider extracting this to a helper method to reduce duplication.

Also, the import uuid on line 398 is unnecessary since uuid is already imported at the module level (line 11).

♻️ Suggested refactor: Extract ID conversion helper and remove duplicate import
+    def _normalize_point_id(self, memory_id: str):
+        """Convert memory_id to Qdrant-compatible format (UUID string or integer)."""
+        try:
+            uuid.UUID(memory_id)
+            return memory_id
+        except ValueError:
+            try:
+                return int(memory_id)
+            except ValueError:
+                return memory_id

     async def get_memory(self, memory_id: str, user_id: Optional[str] = None) -> Optional[MemoryEntry]:
         """Get a specific memory by ID from Qdrant.
         ...
         """
         try:
-            # Convert memory_id to proper format for Qdrant
-            import uuid
-            try:
-                # Try to parse as UUID first
-                uuid.UUID(memory_id)
-                point_id = memory_id
-            except ValueError:
-                # If not a UUID, try as integer
-                try:
-                    point_id = int(memory_id)
-                except ValueError:
-                    # If neither UUID nor integer, use it as-is
-                    point_id = memory_id
+            point_id = self._normalize_point_id(memory_id)

             # Retrieve the point by ID
             ...

The same refactor can be applied to delete_memory and update_memory.

backends/advanced/webui/src/pages/Archive.tsx (3)

68-79: Consider simplifying date format detection.

The condition on line 70 is complex and may not cover all edge cases (e.g., timestamps with timezone offsets like +05:30). Consider using a more robust approach:

♻️ Suggested simplification
 const formatDate = (timestamp: number | string) => {
   if (typeof timestamp === 'string') {
-    const isoString = timestamp.endsWith('Z') || timestamp.includes('+') || timestamp.includes('T') && timestamp.split('T')[1].includes('-')
-      ? timestamp
-      : timestamp + 'Z'
-    return new Date(isoString).toLocaleString()
+    // If it's a numeric string (Unix timestamp), convert to number
+    if (/^\d+$/.test(timestamp)) {
+      return new Date(parseInt(timestamp, 10) * 1000).toLocaleString()
+    }
+    // Otherwise treat as ISO string
+    return new Date(timestamp).toLocaleString()
   }
   if (timestamp === 0) {
     return 'Unknown date'
   }
   return new Date(timestamp * 1000).toLocaleString()
 }

352-369: Consider using a stable key for transcript segments.

Using index as the React key works here since segments are static after fetch, but a composite key would be more robust if segments are ever modified:

♻️ Suggested improvement
-                                {segments.map((segment, index) => {
+                                {segments.map((segment, index) => {
+                                  const segmentKey = `${segment.start}-${segment.end}-${index}`
                                   const speaker = segment.speaker || 'Unknown'
                                   return (
                                     <div
-                                      key={index}
+                                      key={segmentKey}
                                       className="text-sm leading-relaxed flex items-start space-x-2 py-1 px-2 rounded hover:bg-gray-50 dark:hover:bg-gray-700"
                                     >

3-4: Combine imports from the same module.

♻️ Suggested fix
-import { conversationsApi } from '../services/api'
-import { authApi } from '../services/api'
+import { conversationsApi, authApi } from '../services/api'
backends/advanced/src/advanced_omi_backend/config.py (2)

101-109: Consider emitting a deprecation warning.

The function is marked deprecated in the docstring but callers won't see runtime feedback. Consider adding a warning log to encourage migration.

♻️ Optional: Add deprecation warning
 def get_defaults_config_path():
     """
     Get the path to the defaults config file.

     DEPRECATED: Use get_defaults_yml_path() instead.
     Kept for backward compatibility.
     """
+    logger.warning("get_defaults_config_path() is deprecated, use get_defaults_yml_path() instead")
     defaults_path = get_defaults_yml_path()
     return defaults_path if defaults_path.exists() else None

164-180: Use logger.exception for better error diagnostics.

The broad Exception catches are acceptable for config loading robustness. However, using logger.exception instead of logger.warning/logger.error would include stack traces for easier debugging.

♻️ Proposed improvement
         try:
             with open(defaults_path, 'r') as f:
                 defaults = yaml.safe_load(f) or {}
             logger.info(f"Loaded defaults from {defaults_path}")
         except Exception as e:
-            logger.warning(f"Could not load defaults from {defaults_path}: {e}")
+            logger.exception(f"Could not load defaults from {defaults_path}")

     # Load user config
     config_path = get_config_yml_path()
     user_config = {}
     if config_path.exists():
         try:
             with open(config_path, 'r') as f:
                 user_config = yaml.safe_load(f) or {}
             logger.info(f"Loaded config from {config_path}")
         except Exception as e:
-            logger.error(f"Error loading config from {config_path}: {e}")
+            logger.exception(f"Error loading config from {config_path}")
tests/unit/test_config_loading.py (1)

124-132: Improve test to verify cache invalidation.

The test assigns config1 to establish a cache but doesn't use it to verify the reload behavior. The comment suggests checking for different instances, but the assertion only verifies the return type.

♻️ Proposed improvement
 def test_reload_config():
     """Test config reload invalidates cache."""
     config1 = get_config()
     config2 = reload_config()
 
-    # Should be different instances after reload
-    # (Note: Content might be the same, but object should be different)
-    # We check that reload returns a config object
-    assert isinstance(config2, dict)
+    # After reload, should be a different object instance
+    # (content might be identical, but the dict object should be new)
+    assert isinstance(config2, dict)
+    assert config1 is not config2
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 912c1cd and 078c602.

📒 Files selected for processing (49)
  • backends/advanced/.env.template
  • backends/advanced/Docs/memories.md
  • backends/advanced/docker-compose-test.yml
  • backends/advanced/docker-compose.yml
  • backends/advanced/src/advanced_omi_backend/app_factory.py
  • backends/advanced/src/advanced_omi_backend/config.py
  • backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py
  • backends/advanced/src/advanced_omi_backend/controllers/conversation_controller.py
  • backends/advanced/src/advanced_omi_backend/model_registry.py
  • backends/advanced/src/advanced_omi_backend/models/job.py
  • backends/advanced/src/advanced_omi_backend/models/waveform.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/conversation_routes.py
  • backends/advanced/src/advanced_omi_backend/services/audio_stream/aggregator.py
  • backends/advanced/src/advanced_omi_backend/services/audio_stream/producer.py
  • backends/advanced/src/advanced_omi_backend/services/memory/README.md
  • backends/advanced/src/advanced_omi_backend/services/memory/base.py
  • backends/advanced/src/advanced_omi_backend/services/memory/config.py
  • backends/advanced/src/advanced_omi_backend/services/memory/providers/chronicle.py
  • backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py
  • backends/advanced/src/advanced_omi_backend/services/memory/providers/openmemory_mcp.py
  • backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py
  • backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py
  • backends/advanced/src/advanced_omi_backend/utils/audio_chunk_utils.py
  • backends/advanced/src/advanced_omi_backend/workers/cleanup_jobs.py
  • backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py
  • backends/advanced/src/advanced_omi_backend/workers/speaker_jobs.py
  • backends/advanced/src/advanced_omi_backend/workers/transcription_jobs.py
  • backends/advanced/src/advanced_omi_backend/workers/waveform_jobs.py
  • backends/advanced/src/scripts/cleanup_state.py
  • backends/advanced/tests/test_memory_entry.py
  • backends/advanced/tests/test_memory_providers.py
  • backends/advanced/webui/src/components/audio/WaveformDisplay.tsx
  • backends/advanced/webui/src/pages/Archive.tsx
  • backends/advanced/webui/src/pages/Conversations.tsx
  • backends/advanced/webui/src/pages/ConversationsRouter.tsx
  • backends/advanced/webui/src/pages/ConversationsTimeline.tsx
  • backends/advanced/webui/src/pages/Memories.tsx
  • backends/advanced/webui/src/services/api.ts
  • config/defaults.yml
  • tests/endpoints/memory_tests.robot
  • tests/integration/sdk_tests.robot
  • tests/integration/websocket_transcription_e2e_test.robot
  • tests/resources/queue_keywords.robot
  • tests/resources/redis_keywords.robot
  • tests/scripts/sdk_test_auth.py
  • tests/scripts/sdk_test_conversations.py
  • tests/scripts/sdk_test_upload.py
  • tests/unit/test_config_loading.py
🧰 Additional context used
🧬 Code graph analysis (22)
backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py (1)
backends/advanced/src/advanced_omi_backend/models/conversation.py (1)
  • segments (211-213)
backends/advanced/webui/src/pages/ConversationsTimeline.tsx (1)
backends/advanced/webui/src/services/api.ts (1)
  • conversationsApi (109-130)
backends/advanced/webui/src/components/audio/WaveformDisplay.tsx (2)
backends/advanced/src/advanced_omi_backend/models/waveform.py (1)
  • WaveformData (16-47)
backends/advanced/webui/src/services/api.ts (1)
  • api (55-58)
tests/unit/test_config_loading.py (1)
backends/advanced/src/advanced_omi_backend/config.py (2)
  • merge_configs (112-136)
  • reload_config (200-204)
backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py (3)
backends/advanced/src/advanced_omi_backend/models/conversation.py (1)
  • segments (211-213)
backends/advanced/src/advanced_omi_backend/utils/conversation_utils.py (1)
  • extract_speakers_from_segments (407-423)
backends/advanced/src/advanced_omi_backend/services/audio_stream/aggregator.py (1)
  • get_session_results (29-79)
backends/advanced/src/advanced_omi_backend/routers/modules/conversation_routes.py (3)
backends/advanced/src/advanced_omi_backend/controllers/conversation_controller.py (1)
  • get_conversations (133-187)
backends/advanced/src/advanced_omi_backend/models/waveform.py (1)
  • WaveformData (16-47)
backends/advanced/src/advanced_omi_backend/workers/waveform_jobs.py (1)
  • generate_waveform_data (17-190)
backends/advanced/src/advanced_omi_backend/models/job.py (3)
backends/advanced/src/advanced_omi_backend/models/waveform.py (1)
  • WaveformData (16-47)
backends/advanced/src/advanced_omi_backend/models/conversation.py (1)
  • Conversation (17-328)
backends/advanced/src/advanced_omi_backend/models/audio_chunk.py (1)
  • AudioChunkDocument (16-158)
backends/advanced/src/advanced_omi_backend/services/memory/providers/openmemory_mcp.py (1)
backends/advanced/src/advanced_omi_backend/services/memory/base.py (1)
  • MemoryEntry (26-69)
backends/advanced/webui/src/pages/ConversationsRouter.tsx (1)
backends/advanced/webui/src/pages/Archive.tsx (1)
  • Archive (28-389)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (1)
backends/advanced/src/advanced_omi_backend/config.py (1)
  • get_config_yml_path (91-93)
backends/advanced/src/advanced_omi_backend/services/audio_stream/aggregator.py (1)
backends/advanced/src/advanced_omi_backend/models/conversation.py (1)
  • segments (211-213)
backends/advanced/src/advanced_omi_backend/controllers/conversation_controller.py (3)
backends/advanced/src/advanced_omi_backend/routers/modules/conversation_routes.py (1)
  • get_conversations (31-36)
backends/advanced/src/advanced_omi_backend/models/user.py (2)
  • User (51-98)
  • user_id (71-73)
backends/advanced/src/advanced_omi_backend/models/conversation.py (1)
  • Conversation (17-328)
backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py (6)
backends/advanced/src/advanced_omi_backend/models/user.py (1)
  • user_id (71-73)
backends/advanced/src/advanced_omi_backend/services/memory/base.py (2)
  • get_memory (171-185)
  • MemoryEntry (26-69)
backends/advanced/src/advanced_omi_backend/services/memory/providers/chronicle.py (1)
  • get_memory (300-322)
backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py (1)
  • get_memory (613-648)
backends/advanced/src/advanced_omi_backend/services/memory/providers/openmemory_mcp.py (1)
  • get_memory (285-320)
backends/advanced/src/advanced_omi_backend/services/memory/providers/mcp_client.py (1)
  • get_memory (364-405)
backends/advanced/tests/test_memory_entry.py (1)
backends/advanced/src/advanced_omi_backend/services/memory/base.py (2)
  • MemoryEntry (26-69)
  • to_dict (57-69)
backends/advanced/src/advanced_omi_backend/config.py (1)
backends/advanced/src/advanced_omi_backend/model_registry.py (1)
  • _deep_resolve_env (61-90)
backends/advanced/src/advanced_omi_backend/model_registry.py (1)
backends/advanced/src/advanced_omi_backend/config.py (2)
  • get_config (143-197)
  • get_config_yml_path (91-93)
backends/advanced/src/advanced_omi_backend/workers/waveform_jobs.py (2)
backends/advanced/src/advanced_omi_backend/models/waveform.py (1)
  • WaveformData (16-47)
backends/advanced/src/advanced_omi_backend/utils/audio_chunk_utils.py (2)
  • retrieve_audio_chunks (246-288)
  • decode_opus_to_pcm (117-195)
backends/advanced/src/advanced_omi_backend/workers/cleanup_jobs.py (1)
backends/advanced/src/advanced_omi_backend/models/waveform.py (1)
  • WaveformData (16-47)
backends/advanced/src/advanced_omi_backend/utils/audio_chunk_utils.py (2)
backends/advanced/src/advanced_omi_backend/models/conversation.py (1)
  • Conversation (17-328)
backends/advanced/src/advanced_omi_backend/models/audio_chunk.py (2)
  • AudioChunkDocument (16-158)
  • compression_ratio (140-144)
backends/advanced/src/scripts/cleanup_state.py (3)
backends/advanced/src/advanced_omi_backend/models/conversation.py (1)
  • Conversation (17-328)
backends/advanced/src/advanced_omi_backend/models/audio_chunk.py (1)
  • AudioChunkDocument (16-158)
backends/advanced/src/advanced_omi_backend/models/waveform.py (1)
  • WaveformData (16-47)
backends/advanced/src/advanced_omi_backend/app_factory.py (1)
backends/advanced/src/advanced_omi_backend/models/waveform.py (1)
  • WaveformData (16-47)
backends/advanced/src/advanced_omi_backend/services/memory/providers/chronicle.py (4)
backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py (2)
  • get_memory (613-648)
  • initialize (82-105)
backends/advanced/src/advanced_omi_backend/services/memory/providers/openmemory_mcp.py (2)
  • get_memory (285-320)
  • initialize (71-104)
backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py (2)
  • get_memory (386-446)
  • initialize (52-118)
backends/advanced/src/advanced_omi_backend/models/user.py (1)
  • user_id (71-73)
🪛 dotenv-linter (4.0.0)
backends/advanced/.env.template

[warning] 171-171: [ValueWithoutQuotes] This value needs to be surrounded in quotes

(ValueWithoutQuotes)


[warning] 172-172: [UnorderedKey] The SPEAKER_CHUNK_SIZE key should go before the SPEAKER_CHUNK_THRESHOLD key

(UnorderedKey)


[warning] 172-172: [ValueWithoutQuotes] This value needs to be surrounded in quotes

(ValueWithoutQuotes)


[warning] 173-173: [UnorderedKey] The SPEAKER_CHUNK_OVERLAP key should go before the SPEAKER_CHUNK_SIZE key

(UnorderedKey)


[warning] 173-173: [ValueWithoutQuotes] This value needs to be surrounded in quotes

(ValueWithoutQuotes)

🪛 Ruff (0.14.11)
backends/advanced/src/advanced_omi_backend/models/waveform.py

45-47: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

tests/unit/test_config_loading.py

126-126: Local variable config1 is assigned to but never used

Remove assignment to unused variable config1

(F841)

backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py

573-573: f-string without any placeholders

Remove extraneous f prefix

(F541)

backends/advanced/src/advanced_omi_backend/routers/modules/conversation_routes.py

33-33: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


98-98: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py

149-149: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


166-166: Use explicit conversion flag

Replace with conversion flag

(RUF010)

backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py

156-156: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

backends/advanced/src/advanced_omi_backend/services/memory/config.py

82-85: Avoid specifying long messages outside the exception class

(TRY003)

backends/advanced/src/advanced_omi_backend/controllers/conversation_controller.py

142-142: Avoid equality comparisons to False; use not Conversation.deleted: for false checks

Replace with not Conversation.deleted

(E712)


149-149: Avoid equality comparisons to False; use not Conversation.deleted: for false checks

Replace with not Conversation.deleted

(E712)

backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py

146-146: Consider moving this statement to an else block

(TRY300)


380-380: Consider moving this statement to an else block

(TRY300)


382-382: Do not catch blind exception: Exception

(BLE001)


383-383: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


442-442: Consider moving this statement to an else block

(TRY300)


444-444: Do not catch blind exception: Exception

(BLE001)


445-445: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

backends/advanced/src/advanced_omi_backend/config.py

168-168: Do not catch blind exception: Exception

(BLE001)


179-179: Do not catch blind exception: Exception

(BLE001)


180-180: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

backends/advanced/src/advanced_omi_backend/model_registry.py

292-292: Do not catch blind exception: Exception

(BLE001)


293-293: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

backends/advanced/src/advanced_omi_backend/workers/speaker_jobs.py

20-20: Unused function argument: overlap

(ARG001)


351-351: Abstract raise to an inner function

(TRY301)


351-351: Avoid specifying long messages outside the exception class

(TRY003)


386-386: f-string without any placeholders

Remove extraneous f prefix

(F541)

backends/advanced/src/advanced_omi_backend/workers/waveform_jobs.py

172-178: Consider moving this statement to an else block

(TRY300)

backends/advanced/src/advanced_omi_backend/utils/audio_chunk_utils.py

434-434: Avoid specifying long messages outside the exception class

(TRY003)


448-448: Avoid specifying long messages outside the exception class

(TRY003)


596-599: Avoid specifying long messages outside the exception class

(TRY003)


762-765: Avoid specifying long messages outside the exception class

(TRY003)

backends/advanced/src/scripts/cleanup_state.py

66-66: Do not catch blind exception: Exception

(BLE001)


135-135: Consider moving this statement to an else block

(TRY300)


188-188: Do not catch blind exception: Exception

(BLE001)


278-278: Unused method argument: stats

(ARG002)


289-289: Do not catch blind exception: Exception

(BLE001)


328-328: Do not catch blind exception: Exception

(BLE001)


420-420: Do not catch blind exception: Exception

(BLE001)


480-480: Consider moving this statement to an else block

(TRY300)


523-523: Local variable result is assigned to but never used

Remove assignment to unused variable result

(F841)


559-559: Do not catch blind exception: Exception

(BLE001)


608-608: Do not catch blind exception: Exception

(BLE001)


630-630: Do not catch blind exception: Exception

(BLE001)


657-657: Do not catch blind exception: Exception

(BLE001)


676-676: Do not catch blind exception: Exception

(BLE001)


804-804: Do not catch blind exception: Exception

(BLE001)

backends/advanced/src/advanced_omi_backend/services/memory/providers/chronicle.py

319-319: Consider moving this statement to an else block

(TRY300)


320-320: Do not catch blind exception: Exception

(BLE001)


321-321: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


 - Ensure embedding model is available in Ollama
 - Check vector dimensions match between embedder and Qdrant
-- Verify collection has vectors: `curl http://localhost:6333/collections/omi_memories`
+- Verify collection has vectors: `curl http://localhost:6333/collections/chronicle_memories`

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

rg -n "omi_memories" --type md --type py --type yaml --type json

Repository: SimpleOpenSoftware/chronicle

Length of output: 417


Update collection name references in configuration files.

The old omi_memories collection name still appears in multiple configuration files that need to be updated to match the new chronicle_memories name:

  • config/defaults.yml:198
  • tests/configs/deepgram-openai.yml:50
  • tests/configs/parakeet-openai.yml:44
  • tests/configs/parakeet-ollama.yml:44
  • tests/configs/mock-services.yml:40

These configuration files will cause failures if the old collection doesn't exist in the system.

🤖 Prompt for AI Agents
In `@backends/advanced/Docs/memories.md` at line 502, search the repo for the
string "omi_memories" and replace it with "chronicle_memories" in all
configuration and test YAML files (the defaults and test configs referenced in
the review), update any docs examples (e.g., curl/verification lines) to use
"chronicle_memories", and run tests to ensure no remaining references to
"omi_memories" remain.

Comment on lines +43 to +47
class Settings:
    name = "waveforms"
    indexes = [
        "conversation_id",  # Unique lookup by conversation
    ]

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's find the WaveformData model and examine it
find . -type f -name "*.py" | xargs grep -l "class WaveformData" 2>/dev/null

Repository: SimpleOpenSoftware/chronicle

Length of output: 135


🏁 Script executed:

# Also search for the waveform.py file to understand its full structure
find . -type f -name "waveform.py" -path "*/advanced_omi_backend/models/*"

Repository: SimpleOpenSoftware/chronicle

Length of output: 135


🏁 Script executed:

# Check how conversation_id is used in the model and elsewhere
rg -n "conversation_id" --type=py backends/advanced/src/advanced_omi_backend/models/ -B 2 -A 2

Repository: SimpleOpenSoftware/chronicle

Length of output: 6481


🏁 Script executed:

# Look for any constraints or validation on WaveformData
rg -n "WaveformData\|conversation_id" --type=py backends/advanced/src/advanced_omi_backend/ | head -50

Repository: SimpleOpenSoftware/chronicle

Length of output: 54


🏁 Script executed:

cat -n backends/advanced/src/advanced_omi_backend/models/waveform.py

Repository: SimpleOpenSoftware/chronicle

Length of output: 1766


🏁 Script executed:

# Also check if there's any application logic validating uniqueness of waveform per conversation
rg -n "WaveformData\|waveform" --type=py backends/advanced/src/advanced_omi_backend/ | grep -i "create\|insert\|unique\|validation" | head -30

Repository: SimpleOpenSoftware/chronicle

Length of output: 54


🏁 Script executed:

# Check for any repository/database query patterns that assume one waveform per conversation
rg -n "find_one.*waveform\|waveform.*find_one" --type=py backends/advanced/src/advanced_omi_backend/ -B 2 -A 2

Repository: SimpleOpenSoftware/chronicle

Length of output: 54


🏁 Script executed:

# Search for waveform creation and update patterns
rg -n "WaveformData\(" --type=py backends/advanced/src/advanced_omi_backend/ -B 3 -A 5

Repository: SimpleOpenSoftware/chronicle

Length of output: 1872


🏁 Script executed:

# Check for any tests that show waveform creation patterns
find backends/advanced -type f -name "*test*.py" | xargs grep -l "WaveformData\|waveform" 2>/dev/null

Repository: SimpleOpenSoftware/chronicle

Length of output: 56


🏁 Script executed:

# Look for service/repository methods that interact with waveforms
rg -n "class.*Repository\|def.*waveform\|def.*create.*waveform" --type=py backends/advanced/src/advanced_omi_backend/ -B 1 -A 3

Repository: SimpleOpenSoftware/chronicle

Length of output: 54


🏁 Script executed:

# Check the waveform_jobs.py file to see how waveforms are stored
cat -n backends/advanced/src/advanced_omi_backend/workers/waveform_jobs.py | sed -n '155,175p'

Repository: SimpleOpenSoftware/chronicle

Length of output: 869


🏁 Script executed:

# Search for any insert/update logic for waveforms
rg -n "insert\|insert_one\|replace\|update" --type=py backends/advanced/src/advanced_omi_backend/workers/waveform_jobs.py -B 2 -A 2

Repository: SimpleOpenSoftware/chronicle

Length of output: 54


🏁 Script executed:

# Check if there's logic to handle duplicate waveforms or find existing ones
rg -n "find.*conversation_id\|replace_one\|update_one" --type=py backends/advanced/src/advanced_omi_backend/ -B 1 -A 3 | grep -i waveform -A 5 -B 5

Repository: SimpleOpenSoftware/chronicle

Length of output: 54


🏁 Script executed:

# Check if Beanie's unique index syntax uses a list wrapper for uniqueness
rg -n "indexes.*\[.*unique\|unique.*index" --type=py backends/advanced/src/advanced_omi_backend/models/ -B 1 -A 3

Repository: SimpleOpenSoftware/chronicle

Length of output: 54


🏁 Script executed:

# Look at how Beanie is configured and check if there's a way to define unique indexes
rg -n "class Settings\|indexes = " --type=py backends/advanced/src/advanced_omi_backend/models/ -A 5 | head -60

Repository: SimpleOpenSoftware/chronicle

Length of output: 54


🏁 Script executed:

# Get all Settings classes to see index patterns
rg -n "class Settings" --type=py backends/advanced/src/advanced_omi_backend/models/ -A 10

Repository: SimpleOpenSoftware/chronicle

Length of output: 3865


🏁 Script executed:

# Check Beanie documentation on unique indexes in the code comments
rg -n "unique.*index\|\[.*\(.*unique" --type=py backends/advanced/src/advanced_omi_backend/models/

Repository: SimpleOpenSoftware/chronicle

Length of output: 54


🏁 Script executed:

# Let's examine conversation.py Settings section to see how unique constraints are defined
sed -n '320,335p' backends/advanced/src/advanced_omi_backend/models/conversation.py

Repository: SimpleOpenSoftware/chronicle

Length of output: 449


🏁 Script executed:

# Verify how Beanie defines unique indexes by looking at documentation comments or other unique field examples
rg -n "unique=True" --type=py backends/advanced/src/advanced_omi_backend/models/

Repository: SimpleOpenSoftware/chronicle

Length of output: 284


🏁 Script executed:

# Check if there's any code that queries for existing waveforms before inserting (which would suggest one-per-conversation intent)
rg -n "find.*waveform\|WaveformData.*find" --type=py backends/advanced/src/advanced_omi_backend/ -B 2 -A 3

Repository: SimpleOpenSoftware/chronicle

Length of output: 54


🏁 Script executed:

# Check what happens when waveform job runs - does it replace or just insert?
sed -n '140,180p' backends/advanced/src/advanced_omi_backend/workers/waveform_jobs.py

Repository: SimpleOpenSoftware/chronicle

Length of output: 1490


Add unique=True to the conversation_id field if one waveform per conversation is required.

The field description states "unique per conversation" and the index comment claims "Unique lookup by conversation", but the implementation doesn't enforce this constraint. Following the pattern used in Conversation model (line 84 of conversation.py), change Indexed(str) to Indexed(str, unique=True) on line 20. Currently, await waveform_doc.insert() allows duplicate conversation_ids, which contradicts the stated intent.

🧰 Tools
🪛 Ruff (0.14.11)

45-47: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

🤖 Prompt for AI Agents
In `@backends/advanced/src/advanced_omi_backend/models/waveform.py` around lines
43 - 47, The waveform model's conversation_id isn't enforced unique despite the
comment; update the field definition from Indexed(str) to Indexed(str,
unique=True) (the same pattern used in Conversation) so inserts cannot create
duplicate conversation_id values, and keep the Settings.indexes entry as-is;
modify the conversation_id declaration in the Waveform model to use Indexed(str,
unique=True).
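To illustrate what the suggested `unique=True` buys at the database level, here is a minimal stdlib-only sketch — not Beanie code; the in-memory store and the `DuplicateKeyError` name are illustrative assumptions mirroring MongoDB's behavior — of an insert path guarded by a unique index on `conversation_id`:

```python
class DuplicateKeyError(Exception):
    """Raised when an insert would violate the unique index (mirrors MongoDB)."""


class WaveformStore:
    """Toy collection with a unique index on conversation_id."""

    def __init__(self):
        self._docs = []
        self._unique_conversation_ids = set()  # the "unique index"

    def insert(self, doc: dict) -> None:
        cid = doc["conversation_id"]
        if cid in self._unique_conversation_ids:
            raise DuplicateKeyError(f"duplicate conversation_id: {cid}")
        self._unique_conversation_ids.add(cid)
        self._docs.append(doc)


store = WaveformStore()
store.insert({"conversation_id": "c1", "samples": [0.1, 0.5]})
try:
    store.insert({"conversation_id": "c1", "samples": [0.2]})  # rejected
except DuplicateKeyError as e:
    print("rejected:", e)
```

Without the constraint, a re-run of the waveform job would silently add a second document, and reads keyed on conversation_id become ambiguous.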

Comment on lines +128 to +163
# Parse Range header (e.g., "bytes=0-1023")
try:
range_str = range_header.replace("bytes=", "")
range_start, range_end = range_str.split("-")
range_start = int(range_start) if range_start else 0
range_end = int(range_end) if range_end else file_size - 1

# Ensure valid range
range_start = max(0, range_start)
range_end = min(file_size - 1, range_end)
content_length = range_end - range_start + 1

# Extract requested byte range
range_data = wav_data[range_start:range_end + 1]

# Return 206 Partial Content with Range headers
return Response(
content=range_data,
status_code=206,
media_type="audio/wav",
headers={
"Content-Range": f"bytes {range_start}-{range_end}/{file_size}",
"Content-Length": str(content_length),
"Accept-Ranges": "bytes",
"Content-Disposition": f"inline; filename={conversation_id}.wav",
"X-Audio-Source": "mongodb-chunks",
}
)
except (ValueError, IndexError) as e:
# Invalid Range header, return 416 Range Not Satisfiable
return Response(
status_code=416,
headers={
"Content-Range": f"bytes */{file_size}"
}
)

⚠️ Potential issue | 🟡 Minor

Invalid Range requests can produce malformed responses.

The Range parsing doesn't validate that range_start <= range_end after parsing. For requests like bytes=100-50, this produces:

  • content_length = 50 - 100 + 1 = -49 (negative)
  • Empty content from wav_data[100:51]
  • A response with invalid Content-Length: -49

Also, the static analysis correctly notes the unused variable e on line 156.

🐛 Proposed fix to validate range and remove unused variable
         # Ensure valid range
         range_start = max(0, range_start)
         range_end = min(file_size - 1, range_end)
+
+        # Validate range_start <= range_end
+        if range_start > range_end:
+            return Response(
+                status_code=416,
+                headers={
+                    "Content-Range": f"bytes */{file_size}"
+                }
+            )
+
         content_length = range_end - range_start + 1

         # Extract requested byte range
         range_data = wav_data[range_start:range_end + 1]
-    except (ValueError, IndexError) as e:
+    except (ValueError, IndexError):
         # Invalid Range header, return 416 Range Not Satisfiable
         return Response(
             status_code=416,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Parse Range header (e.g., "bytes=0-1023")
try:
range_str = range_header.replace("bytes=", "")
range_start, range_end = range_str.split("-")
range_start = int(range_start) if range_start else 0
range_end = int(range_end) if range_end else file_size - 1
# Ensure valid range
range_start = max(0, range_start)
range_end = min(file_size - 1, range_end)
content_length = range_end - range_start + 1
# Extract requested byte range
range_data = wav_data[range_start:range_end + 1]
# Return 206 Partial Content with Range headers
return Response(
content=range_data,
status_code=206,
media_type="audio/wav",
headers={
"Content-Range": f"bytes {range_start}-{range_end}/{file_size}",
"Content-Length": str(content_length),
"Accept-Ranges": "bytes",
"Content-Disposition": f"inline; filename={conversation_id}.wav",
"X-Audio-Source": "mongodb-chunks",
}
)
except (ValueError, IndexError) as e:
# Invalid Range header, return 416 Range Not Satisfiable
return Response(
status_code=416,
headers={
"Content-Range": f"bytes */{file_size}"
}
)
# Parse Range header (e.g., "bytes=0-1023")
try:
range_str = range_header.replace("bytes=", "")
range_start, range_end = range_str.split("-")
range_start = int(range_start) if range_start else 0
range_end = int(range_end) if range_end else file_size - 1
# Ensure valid range
range_start = max(0, range_start)
range_end = min(file_size - 1, range_end)
# Validate range_start <= range_end
if range_start > range_end:
return Response(
status_code=416,
headers={
"Content-Range": f"bytes */{file_size}"
}
)
content_length = range_end - range_start + 1
# Extract requested byte range
range_data = wav_data[range_start:range_end + 1]
# Return 206 Partial Content with Range headers
return Response(
content=range_data,
status_code=206,
media_type="audio/wav",
headers={
"Content-Range": f"bytes {range_start}-{range_end}/{file_size}",
"Content-Length": str(content_length),
"Accept-Ranges": "bytes",
"Content-Disposition": f"inline; filename={conversation_id}.wav",
"X-Audio-Source": "mongodb-chunks",
}
)
except (ValueError, IndexError):
# Invalid Range header, return 416 Range Not Satisfiable
return Response(
status_code=416,
headers={
"Content-Range": f"bytes */{file_size}"
}
)
🧰 Tools
🪛 Ruff (0.14.11)

156-156: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

🤖 Prompt for AI Agents
In `@backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py`
around lines 128 - 163, Validate the parsed byte range after splitting
range_header: after computing range_start and range_end (and clamping to
0..file_size-1) ensure range_start <= range_end; if not, return a 416 Range Not
Satisfiable with "Content-Range: bytes */{file_size}". Also guard against
negative content_length by computing content_length = range_end - range_start +
1 only after the validation, and use the slice wav_data[range_start:range_end +
1] only when valid. Remove the unused exception variable `e` in the except
clause so the handler catches (ValueError, IndexError) and returns 416 without
referencing `e`.
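The validation described above can be factored into a small standalone parser. This is a hedged sketch — the helper name `parse_byte_range` is not from the PR — that returns the clamped inclusive `(start, end)` pair, or `None` when the caller should answer 416:

```python
from typing import Optional, Tuple


def parse_byte_range(range_header: str, file_size: int) -> Optional[Tuple[int, int]]:
    """Parse a single-range "bytes=start-end" header.

    Returns (start, end) inclusive, clamped to the file, or None when the
    request is unsatisfiable. Note: like the PR code, this treats "bytes=-N"
    as 0..N rather than as an RFC 7233 suffix range.
    """
    try:
        range_str = range_header.replace("bytes=", "")
        start_s, end_s = range_str.split("-")
        start = int(start_s) if start_s else 0
        end = int(end_s) if end_s else file_size - 1
    except ValueError:
        return None  # malformed header -> 416

    start = max(0, start)
    end = min(file_size - 1, end)
    if start > end:  # e.g. "bytes=100-50", or start beyond end of file
        return None
    return start, end


print(parse_byte_range("bytes=0-1023", 4096))  # (0, 1023)
print(parse_byte_range("bytes=100-50", 4096))  # None -> 416
print(parse_byte_range("bytes=4000-", 4096))   # (4000, 4095)
```

Keeping the parse pure makes the negative-length case trivially testable without spinning up the FastAPI route.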

host: str = "localhost",
port: int = 6333,
collection_name: str = "omi_memories",
collection_name: str = "chronicle_memories",

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for references to "omi_memories" across the codebase
echo "=== References to omi_memories ==="
rg -n "omi_memories" --type py --type md --type yaml --type json --type txt 2>/dev/null | head -50

echo ""
echo "=== Migration documentation or scripts ==="
fd -t f \( -name "*migrat*" -o -name "*MIGRAT*" \) 2>/dev/null | head -20

echo ""
echo "=== Checking for environment variable configuration ==="
rg -n "COLLECTION|collection" backends/advanced/src/advanced_omi_backend/services/memory/config.py -A 2 -B 2 2>/dev/null | head -80

Repository: SimpleOpenSoftware/chronicle

Length of output: 1336


Address inconsistency between code defaults and configuration file defaults for collection name migration.

The code default for collection_name changed to "chronicle_memories" (lines 143, 266), but existing configuration files (config/defaults.yml, test configs) still reference "omi_memories". This creates an inconsistent upgrade experience:

  • Deployments using config files will retain the old collection name (preserving existing data)
  • Deployments without explicit configuration may switch to the new name, causing data access issues

Since existing configurations still specify "omi_memories", consider:

  1. Update configuration files to use the new default name, OR
  2. Revert the code default to match existing configs, OR
  3. Add migration documentation explaining the change and how to explicitly configure the collection name

No migration scripts or documentation currently exists for this breaking change.

🤖 Prompt for AI Agents
In `@backends/advanced/src/advanced_omi_backend/services/memory/config.py` at line
143, The code default for collection_name (variable collection_name in
config.py) was changed to "chronicle_memories" while existing defaults.yml and
tests still reference "omi_memories", causing a breaking mismatch; fix by
reverting the config.py default back to "omi_memories" (restore collection_name:
str = "omi_memories") so existing deployments keep working, update any unit
tests that assert the default to match, and add a short migration note in
project docs explaining the planned rename and how to opt into
"chronicle_memories" via explicit config so future changes are non-breaking.

Comment on lines +430 to +434
# Get conversation metadata
conversation = await Conversation.get(conversation_id)

if not conversation:
raise ValueError(f"Conversation {conversation_id} not found")

⚠️ Potential issue | 🔴 Critical

Incorrect document lookup method.

Conversation.get(conversation_id) retrieves a document by its MongoDB _id. However, conversation_id is a separate indexed field on the Conversation model, not the document's _id. This will fail to find conversations.

Use the same pattern as elsewhere in this file (lines 669-671):

🐛 Proposed fix
-    conversation = await Conversation.get(conversation_id)
+    conversation = await Conversation.find_one(
+        Conversation.conversation_id == conversation_id
+    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Get conversation metadata
conversation = await Conversation.get(conversation_id)
if not conversation:
raise ValueError(f"Conversation {conversation_id} not found")
# Get conversation metadata
conversation = await Conversation.find_one(
Conversation.conversation_id == conversation_id
)
if not conversation:
raise ValueError(f"Conversation {conversation_id} not found")
🧰 Tools
🪛 Ruff (0.14.11)

434-434: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In `@backends/advanced/src/advanced_omi_backend/utils/audio_chunk_utils.py` around
lines 430 - 434, The code incorrectly uses Conversation.get(conversation_id)
which looks up by MongoDB _id; replace it with the indexed-field lookup used
elsewhere (pattern at lines 669-671) by awaiting Conversation.find_one with the
filter on the conversation_id field (e.g., await
Conversation.find_one({"conversation_id": conversation_id}) or equivalent
keyword-arg form) and keep the existing not-found ValueError branch.

Comment on lines +19 to +34
sample_rate: int = 3,
) -> Dict[str, Any]:
"""
Generate waveform visualization data from conversation audio chunks.
This function:
1. Retrieves Opus-compressed audio chunks from MongoDB
2. Decodes each chunk to PCM
3. Downsamples PCM to target sample rate (e.g., 10 samples/sec)
4. Calculates amplitude peaks for each sample window
5. Normalizes to [-1.0, 1.0] range
6. Stores in WaveformData collection
Args:
conversation_id: Conversation ID to generate waveform for
sample_rate: Samples per second for waveform (default: 10)

⚠️ Potential issue | 🟡 Minor

Docstring/signature mismatch for sample_rate default.

The docstring states the default is 10, but the actual default in the signature is 3. Update the docstring to reflect the correct default.

📝 Suggested fix
-        sample_rate: Samples per second for waveform (default: 10)
+        sample_rate: Samples per second for waveform (default: 3)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
sample_rate: int = 3,
) -> Dict[str, Any]:
"""
Generate waveform visualization data from conversation audio chunks.
This function:
1. Retrieves Opus-compressed audio chunks from MongoDB
2. Decodes each chunk to PCM
3. Downsamples PCM to target sample rate (e.g., 10 samples/sec)
4. Calculates amplitude peaks for each sample window
5. Normalizes to [-1.0, 1.0] range
6. Stores in WaveformData collection
Args:
conversation_id: Conversation ID to generate waveform for
sample_rate: Samples per second for waveform (default: 10)
sample_rate: int = 3,
) -> Dict[str, Any]:
"""
Generate waveform visualization data from conversation audio chunks.
This function:
1. Retrieves Opus-compressed audio chunks from MongoDB
2. Decodes each chunk to PCM
3. Downsamples PCM to target sample rate (e.g., 10 samples/sec)
4. Calculates amplitude peaks for each sample window
5. Normalizes to [-1.0, 1.0] range
6. Stores in WaveformData collection
Args:
conversation_id: Conversation ID to generate waveform for
sample_rate: Samples per second for waveform (default: 3)
🤖 Prompt for AI Agents
In `@backends/advanced/src/advanced_omi_backend/workers/waveform_jobs.py` around
lines 19 - 34, The docstring for the function that accepts parameter sample_rate
incorrectly states the default is 10 while the function signature sets
sample_rate: int = 3; update the docstring to state the correct default (3) so
the signature and documentation match, referencing the sample_rate parameter in
the function (waveform generation function in waveform_jobs.py) and adjust the
"Args" section to list "sample_rate: Samples per second for waveform (default:
3)".
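Steps 3-5 of the pipeline described in the docstring (downsample, peak per window, normalize) can be sketched with the stdlib alone. This is an illustrative helper, not the PR's implementation — it assumes PCM as a list of signed ints and normalizes peak magnitudes to [0, 1], a simplification of the [-1.0, 1.0] range the docstring mentions:

```python
def waveform_peaks(pcm: list, pcm_rate: int, sample_rate: int = 3) -> list:
    """Reduce PCM to `sample_rate` amplitude peaks per second, normalized to [0, 1]."""
    if not pcm:
        return []
    window = max(1, pcm_rate // sample_rate)  # PCM frames per waveform sample
    peaks = [
        max(abs(s) for s in pcm[i:i + window])
        for i in range(0, len(pcm), window)
    ]
    scale = max(peaks) or 1  # avoid dividing by zero on pure silence
    return [p / scale for p in peaks]


# One second of fake "PCM" at 8 Hz, downsampled to 2 waveform samples:
print(waveform_peaks([0, 100, -200, 50, 300, -400, 10, 20], pcm_rate=8, sample_rate=2))
# -> [0.5, 1.0]
```

A pure function like this is also easy to pin the default on, which would make docstring/signature drift (10 vs. 3) visible in tests.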

Comment on lines +278 to +293
async def _export_audio_wav(self, stats: CleanupStats):
"""Export audio as WAV files (1-minute chunks)"""
logger.info("Exporting audio as WAV files (this may take a while)...")

# Get all unique conversation IDs
conversations = await Conversation.find_all().to_list()
audio_dir = self.backup_path / "audio"

for conv in conversations:
try:
await self._export_conversation_audio(conv.conversation_id, audio_dir)
except Exception as e:
logger.warning(f"Failed to export audio for {conv.conversation_id}: {e}")
continue

logger.info("Audio export completed")

⚠️ Potential issue | 🟡 Minor

Unused stats parameter in _export_audio_wav.

The stats parameter is passed to this method but never used. Either remove it from the signature or use it to track export statistics (e.g., count of exported WAV files).

🐛 Suggested fix - remove unused parameter
-    async def _export_audio_wav(self, stats: CleanupStats):
+    async def _export_audio_wav(self):

And update the call at line 120:

-                await self._export_audio_wav(stats)
+                await self._export_audio_wav()
🧰 Tools
🪛 Ruff (0.14.11)

278-278: Unused method argument: stats

(ARG002)


289-289: Do not catch blind exception: Exception

(BLE001)

Comment on lines +77 to +95
const drawWaveform = (
ctx: CanvasRenderingContext2D,
samples: number[],
width: number,
height: number
) => {
const barWidth = width / samples.length;
const centerY = height / 2;

ctx.fillStyle = '#3b82f6'; // Blue bars (Tailwind blue-500)

samples.forEach((amplitude, i) => {
const x = i * barWidth;
const barHeight = Math.max(1, amplitude * centerY); // Ensure minimum 1px height

// Draw bar centered vertically
ctx.fillRect(x, centerY - barHeight, barWidth - 1, barHeight * 2);
});
};

⚠️ Potential issue | 🟡 Minor

Guard against empty samples array.

If samples is empty, barWidth becomes Infinity (division by zero), and the forEach will be a no-op anyway. Add an early return for clarity and to prevent potential issues.

🐛 Suggested fix
  const drawWaveform = (
    ctx: CanvasRenderingContext2D,
    samples: number[],
    width: number,
    height: number
  ) => {
+   if (samples.length === 0) return;
+
    const barWidth = width / samples.length;
    const centerY = height / 2;
🤖 Prompt for AI Agents
In `@backends/advanced/webui/src/components/audio/WaveformDisplay.tsx` around
lines 77 - 95, The drawWaveform function can divide by zero when samples is
empty; add an early guard at the start of drawWaveform that returns immediately
if !samples || samples.length === 0 to avoid computing barWidth = width /
samples.length and any downstream issues; update the function (drawWaveform) to
perform this check before computing barWidth/centerY and before using
samples.forEach.

Comment on lines +59 to +71
SDK Upload Respects Backend File Size Limit
[Documentation] Verify SDK properly reports backend errors for oversized files
[Tags] audio-upload
# Note: This tests that SDK handles backend rejection gracefully
# The 30-minute limit is enforced by the backend, not the SDK
# Full test would require a 30+ minute audio file

${result}= Run Process uv run python
... ${CURDIR}/../scripts/sdk_test_auth.py
... ${BACKEND_URL} ${ADMIN_EMAIL} ${ADMIN_PASSWORD}
Should Be Equal As Integers ${result.rc} 0 SDK should handle backend errors gracefully


⚠️ Potential issue | 🟡 Minor

Test case doesn't match its documented purpose.

This test claims to verify "SDK properly reports backend errors for oversized files" but actually just re-runs the authentication script. Consider either:

  1. Skipping this test like other unimplemented features
  2. Renaming it to reflect what it actually tests
Proposed fix - skip until properly implemented
 SDK Upload Respects Backend File Size Limit
     [Documentation]    Verify SDK properly reports backend errors for oversized files
     [Tags]    audio-upload
-
-    # Note: This tests that SDK handles backend rejection gracefully
-    # The 30-minute limit is enforced by the backend, not the SDK
-    # Full test would require a 30+ minute audio file
-
-    ${result}=    Run Process    uv    run    python
-    ...    ${CURDIR}/../scripts/sdk_test_auth.py
-    ...    ${BACKEND_URL}    ${ADMIN_EMAIL}    ${ADMIN_PASSWORD}
-    Should Be Equal As Integers    ${result.rc}    0    SDK should handle backend errors gracefully
+    Skip    Test requires 30+ minute audio file - not implemented
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
SDK Upload Respects Backend File Size Limit
[Documentation] Verify SDK properly reports backend errors for oversized files
[Tags] audio-upload
# Note: This tests that SDK handles backend rejection gracefully
# The 30-minute limit is enforced by the backend, not the SDK
# Full test would require a 30+ minute audio file
${result}= Run Process uv run python
... ${CURDIR}/../scripts/sdk_test_auth.py
... ${BACKEND_URL} ${ADMIN_EMAIL} ${ADMIN_PASSWORD}
Should Be Equal As Integers ${result.rc} 0 SDK should handle backend errors gracefully
SDK Upload Respects Backend File Size Limit
[Documentation] Verify SDK properly reports backend errors for oversized files
[Tags] audio-upload
Skip Test requires 30+ minute audio file - not implemented
🤖 Prompt for AI Agents
In `@tests/integration/sdk_tests.robot` around lines 59 - 71, The test "SDK Upload
Respects Backend File Size Limit" currently only re-runs the authentication
script (Run Process ... ${CURDIR}/../scripts/sdk_test_auth.py) and doesn't
exercise upload/oversize behavior; either mark it as skipped until implemented
or rename it to reflect it's a placeholder: add a skip tag (e.g., add "skip" to
the [Tags] line) and update the test name or [Documentation] to indicate it's a
placeholder that re-runs sdk_test_auth.py, or alternatively change the test name
to something like "Placeholder: Re-run SDK auth script" so the intent and
behavior match.

Comment on lines +330 to +342
Get Job Result
[Documentation] Get the result field of a completed job
... Useful for checking job output/return values
[Arguments] ${job_id}

# Get full job details
${response}= GET On Session api /api/queue/job/${job_id}
... expected_status=200

${job_data}= Set Variable ${response.json()}
${result}= Set Variable ${job_data}[result]

RETURN ${result}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for queue API route definitions

# First, let's find the structure of the repository
echo "=== Repository structure ==="
find . -type f -name "*.py" | head -20

echo ""
echo "=== Searching for API route definitions with 'queue' and 'job' ==="
rg -n --type py '/api/queue/job' | head -20

echo ""
echo "=== Searching for route decorators with queue and job endpoints ==="
rg -n --type py '@.*route.*queue.*job|@.*get.*queue.*job|@.*post.*queue.*job' -C3

echo ""
echo "=== Searching for 'queue' in route definitions ==="
rg -n --type py '/queue/jobs?/' | head -30

Repository: SimpleOpenSoftware/chronicle

Length of output: 2106


🏁 Script executed:

#!/bin/bash
# Search for queue controller and router definitions

echo "=== Queue Controller File ==="
cat -n backends/advanced/src/advanced_omi_backend/controllers/queue_controller.py | head -100

echo ""
echo "=== Queue-related files in routers ==="
find . -type f -name "*.py" -path "*/routers/*" | xargs grep -l "queue" 2>/dev/null

echo ""
echo "=== Main router setup ==="
find . -type f -name "*router*.py" -o -name "routes.py" | head -10

Repository: SimpleOpenSoftware/chronicle

Length of output: 4783


🏁 Script executed:

#!/bin/bash
# Examine queue routes for endpoint definitions

echo "=== Queue Routes File ==="
wc -l backends/advanced/src/advanced_omi_backend/routers/modules/queue_routes.py

echo ""
echo "=== Complete Queue Routes File ==="
cat -n backends/advanced/src/advanced_omi_backend/routers/modules/queue_routes.py

Repository: SimpleOpenSoftware/chronicle

Length of output: 50384


🏁 Script executed:

#!/bin/bash
# Check the actual endpoint in the test file

echo "=== Checking lines 330-342 in test file ==="
sed -n '330,342p' tests/resources/queue_keywords.robot

Repository: SimpleOpenSoftware/chronicle

Length of output: 554


Fix endpoint path: change /api/queue/job/ to /api/queue/jobs/

The endpoint uses the singular form /api/queue/job/${job_id}, but the API backend only defines endpoints with the plural form /api/queue/jobs/{job_id}. This mismatch will cause a 404 error at runtime. All other job-specific endpoints in this file correctly use the plural form.

Verified Fix
     # Get full job details
-    ${response}=    GET On Session    api    /api/queue/job/${job_id}
+    ${response}=    GET On Session    api    /api/queue/jobs/${job_id}
     ...    expected_status=200
🤖 Prompt for AI Agents
In `@tests/resources/queue_keywords.robot` around lines 330 - 342, The Robot
keyword "Get Job Result" is calling the wrong endpoint path
"/api/queue/job/${job_id}" which should be plural; update the GET call in that
keyword to use "/api/queue/jobs/${job_id}" so it matches the backend route and
will return 200; ensure the GET On Session invocation and any related variables
(${response}, ${job_data}, ${result}) remain unchanged.
