
Conversation

@AnkushMalaker (Collaborator) commented Jan 14, 2026

Summary by CodeRabbit

  • New Features

    • Added audio waveform visualization with interactive seek controls
    • Added conversation archive/restore functionality for soft-deleted conversations
    • Added HTTP Range request support for partial audio content streaming
    • Introduced memory timestamp tracking with updated_at field
    • Added chunked speaker recognition for large audio files
  • Bug Fixes

    • Improved error handling and cleanup for failed audio processing
    • Enhanced audio streaming termination signaling
  • Documentation

    • Added comprehensive default configuration settings


- Updated Docker Compose files to mount the entire config directory, allowing for better management of configuration files.
- Introduced a new `WaveformData` model to store pre-computed waveform visualization data, improving UI performance by enabling waveform display without real-time decoding.
- Enhanced the `app_factory` and `job` models to include the new `WaveformData` model, ensuring proper initialization and data handling.
- Implemented waveform generation logic in a new worker module, allowing for on-demand waveform creation from audio chunks.
- Added API endpoints for retrieving and generating waveform data, improving the overall audio processing capabilities.
- Updated tests to cover new functionality and ensure robustness in waveform data handling.
…nd audio upload

- Introduced three new test scripts: `sdk_test_auth.py`, `sdk_test_conversations.py`, and `sdk_test_upload.py`.
- Each script tests different functionalities of the SDK, including authentication, conversation retrieval, and audio file uploads.
- The scripts utilize the `ChronicleClient` to perform operations and print results for verification.
- Enhanced testing capabilities for the SDK, ensuring robust validation of core features.

coderabbitai bot commented Jan 14, 2026

📝 Walkthrough

Walkthrough

This PR introduces waveform-based audio visualization with on-demand generation, soft-delete conversation archival with restoration capability, and configuration management refactoring that merges defaults with user config. It also adds chunked audio processing for speaker recognition on large files, an updated_at field for memory tracking, HTTP Range request support for audio streaming, and a comprehensive backend state cleanup utility with optional backup and export.

Changes

Cohort / File(s) Summary
Configuration System Refactoring
backends/advanced/src/advanced_omi_backend/config.py, backends/advanced/src/advanced_omi_backend/model_registry.py
Introduces centralized config merging system combining defaults.yml with user config.yml via merge_configs(), caching, and environment variable resolution. Updates model_registry to use merged config path.
Configuration & Environment Files
backends/advanced/.env.template, config/defaults.yml, backends/advanced/Docs/memories.md
Adds speaker chunking env vars (threshold, size, overlap); creates comprehensive defaults.yml with models and services; updates memory collection name to chronicle_memories.
Docker & Deployment
backends/advanced/docker-compose.yml, backends/advanced/docker-compose-test.yml
Replaces individual config file mounts with mounted config directory; applies to both test and production services.
Data Models
backends/advanced/src/advanced_omi_backend/models/waveform.py, backends/advanced/src/advanced_omi_backend/app_factory.py, backends/advanced/src/advanced_omi_backend/models/job.py
New WaveformData Beanie model with conversation_id index, samples, sample_rate, duration; integrates into Beanie initialization.
Audio Processing & Waveform Generation
backends/advanced/src/advanced_omi_backend/utils/audio_chunk_utils.py, backends/advanced/src/advanced_omi_backend/workers/waveform_jobs.py
Adds reconstruct_audio_segments() and filter_transcript_by_time() utilities; enforces 30-minute duration cap; adds generate_waveform_data() worker to decode Opus, downsample, and normalize waveform samples.
Audio API & Streaming
backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py
Implements HTTP Range request support (206 Partial Content) with Range header parsing, error handling, and Accept-Ranges headers.
Conversation Management
backends/advanced/src/advanced_omi_backend/controllers/conversation_controller.py, backends/advanced/src/advanced_omi_backend/routers/modules/conversation_routes.py
Adds optional include_deleted parameter to filter soft-deleted conversations; new /waveform endpoint with on-demand generation and caching.
Conversation UI & Archive
backends/advanced/webui/src/pages/Conversations.tsx, backends/advanced/webui/src/pages/Archive.tsx, backends/advanced/webui/src/pages/ConversationsRouter.tsx, backends/advanced/webui/src/pages/ConversationsTimeline.tsx
Adds Archive page for viewing/restoring deleted conversations; integrates WaveformDisplay; removes deleted conversation UI from main list; updates API calls to filter deleted conversations.
Waveform UI Component
backends/advanced/webui/src/components/audio/WaveformDisplay.tsx
New React component that fetches waveform data, renders canvas with vertical bars, displays playback indicator, and handles seek interactions.
Memory System - Timestamps
backends/advanced/src/advanced_omi_backend/services/memory/base.py, backends/advanced/src/advanced_omi_backend/services/memory/providers/chronicle.py, backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py, backends/advanced/src/advanced_omi_backend/services/memory/providers/openmemory_mcp.py, backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py
Adds updated_at field to MemoryEntry; new get_memory() method for single memory retrieval; updates providers to extract/set updated_at timestamps; memory collection renamed to chronicle_memories.
Memory Configuration
backends/advanced/src/advanced_omi_backend/services/memory/config.py, backends/advanced/src/advanced_omi_backend/services/memory/README.md
Updates canonical config path resolution via get_config_yml_path(); changes default collection from omi_memories to chronicle_memories.
Audio Stream Processing
backends/advanced/src/advanced_omi_backend/services/audio_stream/aggregator.py, backends/advanced/src/advanced_omi_backend/services/audio_stream/producer.py
Changes aggregation to gather all multi-result data with average confidence; adds end_marker signaling to Redis stream on finalization.
Worker Jobs - Speaker Recognition
backends/advanced/src/advanced_omi_backend/workers/speaker_jobs.py
Introduces chunked audio processing for large files with overlap handling; adds _merge_overlapping_speaker_segments() helper; extends threshold-based segment merging; updates check_enrolled_speakers_job() signature.
Worker Jobs - Transcription & Conversation
backends/advanced/src/advanced_omi_backend/workers/transcription_jobs.py, backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py
Adds speech analysis reason tracking and no_speech_detected reason field; validates and filters transcript segments; adds synthetic segment creation from word data.
Audio Upload & Error Handling
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py
Adds ValueError handling for chunk conversion with conversation cleanup on failure.
Cleanup Jobs
backends/advanced/src/advanced_omi_backend/workers/cleanup_jobs.py
Integrates WaveformData deletion into soft-delete cleanup flow; tracks purged_waveforms in return payload.
Backend State Cleanup Utility
backends/advanced/src/scripts/cleanup_state.py
New comprehensive 900+ line cleanup script with BackupManager (MongoDB/Qdrant/WAV export), CleanupManager (MongoDB/Redis/Qdrant/legacy WAV cleanup), dry-run, force, and optional user deletion; includes backup summary and statistics.
Frontend API Integration
backends/advanced/webui/src/services/api.ts
Updates getAll() to accept optional includeDeleted parameter; adds restore() and permanentDelete() endpoints.
Frontend Utilities
backends/advanced/webui/src/pages/Memories.tsx
Updates formatDate() to handle undefined/null with N/A fallback.
Unit Tests - Memory
backends/advanced/tests/test_memory_entry.py, backends/advanced/tests/test_memory_providers.py
New comprehensive tests for MemoryEntry timestamp initialization, serialization, and provider-specific conversion logic (Mycelia, OpenMemory MCP).
Unit Tests - Configuration
tests/unit/test_config_loading.py
New tests for merge_configs, caching, reloading, nested merging, list replacement, and edge cases.
Integration Tests - Endpoints
tests/endpoints/memory_tests.robot
Adds validation for memory updated_at field alongside created_at.
Integration Tests - WebSocket & SDK
tests/integration/websocket_transcription_e2e_test.robot, tests/integration/sdk_tests.robot
New WebSocket transcription end-to-end tests with end_marker validation; new SDK test suite for auth, upload, and conversation retrieval.
Integration Test Scripts
tests/scripts/sdk_test_auth.py, tests/scripts/sdk_test_upload.py, tests/scripts/sdk_test_conversations.py
New Python scripts to test SDK authentication, audio upload, and conversation listing.
Test Infrastructure
tests/resources/queue_keywords.robot, tests/resources/redis_keywords.robot
New robot keywords: Get Job Result, Redis Command, Get Backend Logs.
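For readers unfamiliar with the config-merging approach summarized in the changes above, a recursive dict merge is the usual shape. This is a minimal sketch with illustrative names, not the actual merge_configs from config.py (which also handles caching and environment-variable resolution):

```python
from copy import deepcopy


def merge_configs(defaults: dict, overrides: dict) -> dict:
    """Recursively merge user overrides onto defaults.

    Nested dicts are merged key-by-key; lists and scalars from the
    user config replace the corresponding default wholesale.
    """
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = value
    return merged


defaults = {"models": {"llm": "gpt", "embed": "ada"}, "tags": ["a"]}
user = {"models": {"llm": "ollama"}, "tags": ["b"]}
merged = merge_configs(defaults, user)
# nested key overridden, sibling preserved, list replaced wholesale
```

Nested mappings merge key-by-key while lists replace rather than concatenate, matching the list-replacement behavior the new unit tests in tests/unit/test_config_loading.py are described as covering.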

Sequence Diagrams

sequenceDiagram
    participant User as User/Client
    participant API as Conversation API
    participant DB as MongoDB
    participant AudioChunks as AudioChunks
    participant Decoder as Decoder
    participant WaveformWorker as Waveform Worker
    participant Storage as WaveformData

    User->>API: GET /conversations/{id}/waveform
    API->>DB: Query WaveformData by conversation_id
    alt Waveform cached
        DB-->>API: Return WaveformData
        API-->>User: Return samples, duration, sample_rate
    else Waveform not cached
        API->>WaveformWorker: Queue generate_waveform_data
        WaveformWorker->>AudioChunks: Query Opus chunks
        AudioChunks-->>WaveformWorker: Return audio chunks
        WaveformWorker->>Decoder: Decode Opus → PCM samples
        Decoder-->>WaveformWorker: Return raw PCM
        WaveformWorker->>WaveformWorker: Downsample to target rate
        WaveformWorker->>WaveformWorker: Normalize amplitude [-1.0, 1.0]
        WaveformWorker->>Storage: Create WaveformData document
        Storage-->>WaveformWorker: Persisted waveform
        WaveformWorker-->>API: Return generated waveform
        API-->>User: Return samples, duration, sample_rate
    end
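The downsample and normalize steps in the worker flow above can be sketched as follows. This assumes PCM samples have already been decoded from Opus; the function name and peak-based reduction are illustrative, not the actual generate_waveform_data implementation:

```python
def downsample_and_normalize(pcm, source_rate, target_rate=3):
    """Reduce PCM samples to `target_rate` peak values per second,
    normalized to [-1.0, 1.0] for canvas rendering."""
    if not pcm:
        return []
    # Take the peak amplitude within each window of samples
    window = max(1, source_rate // target_rate)
    peaks = [
        max(abs(s) for s in pcm[i:i + window])
        for i in range(0, len(pcm), window)
    ]
    # Normalize so the loudest peak maps to 1.0 (avoid divide-by-zero)
    peak = max(peaks) or 1
    return [p / peak for p in peaks]


samples = downsample_and_normalize([0, 16384, -32768, 8192],
                                   source_rate=4, target_rate=2)
# → [0.5, 1.0]
```

Peak detection (rather than averaging) keeps short transients visible in the rendered bars, which is typically what a waveform overview is for.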
sequenceDiagram
    participant User as User/Client
    participant Archive as Archive UI
    participant API as Conversation API
    participant DB as MongoDB
    participant Cleanup as Cleanup Service

    User->>Archive: Open Archive page
    Archive->>API: GET /conversations?include_deleted=true
    API->>DB: Query conversations where deleted=true
    DB-->>API: Return deleted conversations
    API-->>Archive: Populate archive list

    User->>Archive: Click "Restore" button
    Archive->>API: POST /conversations/{id}/restore
    API->>DB: Unset deleted flag on conversation
    DB-->>API: Update successful
    API-->>Archive: Refresh list
    Archive->>API: GET /conversations?include_deleted=true
    API->>DB: Query (updated conversations)
    DB-->>Archive: Restored conversation removed from view

    User->>Archive: Click "Permanent Delete" (admin)
    Archive->>API: DELETE /conversations/{id}?permanent=true
    API->>DB: Delete conversation document
    API->>Cleanup: Trigger cleanup_jobs
    Cleanup->>DB: Delete associated AudioChunks
    Cleanup->>DB: Delete associated WaveformData
    Cleanup-->>API: Cleanup complete
    API-->>Archive: Refresh list
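The archive lifecycle in the diagram above boils down to toggling a soft-delete flag and filtering on it at query time. A minimal in-memory sketch (field and method names are illustrative, not the backend's actual API):

```python
class ConversationStore:
    """In-memory sketch of the soft-delete / restore / purge flow."""

    def __init__(self):
        self._docs = {}

    def add(self, conv_id):
        self._docs[conv_id] = {"id": conv_id, "deleted": False}

    def list(self, include_deleted=False):
        # Mirrors GET /conversations?include_deleted=...
        return [d for d in self._docs.values()
                if include_deleted or not d["deleted"]]

    def soft_delete(self, conv_id):
        self._docs[conv_id]["deleted"] = True

    def restore(self, conv_id):
        # Mirrors POST /conversations/{id}/restore
        self._docs[conv_id]["deleted"] = False

    def permanent_delete(self, conv_id):
        # In the real flow, cleanup jobs would also purge the
        # associated AudioChunks and WaveformData documents
        self._docs.pop(conv_id, None)
```

Soft delete keeps the document queryable for the Archive page; only permanent delete (plus the cleanup jobs) actually removes data.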

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
Title check — ❓ Inconclusive. The title 'Fix/conversation view mongo' is vague and does not clearly convey the primary changes in the pull request, which involve extensive waveform data handling, audio processing improvements, configuration merging, and multiple new features. Resolution: consider a more descriptive title that reflects the main changes, such as 'Add waveform visualization and audio processing improvements' or 'Refactor config management and add waveform support'.
✅ Passed checks (2 passed)
Description Check — ✅ Passed. Check skipped — CodeRabbit's high-level summary is enabled.
Docstring Coverage — ✅ Passed. Docstring coverage is 90.27%, which meets the required threshold of 80.00%.



@AnkushMalaker AnkushMalaker changed the base branch from fix/audio-pipe-mongo to fix/audio-pipe-3 January 14, 2026 10:00
@github-actions

⚠️ Robot Framework Test Results (No API Keys)

Status: ❌ Some tests failed

ℹ️ Note: This run excludes tests requiring external API keys (Deepgram, OpenAI).
Tests tagged with requires-api-keys will run on dev/main branches.

Metric Count
✅ Passed 79
❌ Failed 18
📊 Total 97


- Added configuration options for speaker recognition chunking in `.env.template`, allowing for better management of large audio files.
- Updated `get_conversations` function to include an `include_deleted` parameter for filtering conversations based on their deletion status.
- Enhanced `finalize_session` method in `AudioStreamProducer` to send an end marker to Redis, ensuring proper session closure.
- Introduced `reconstruct_audio_segments` function to yield audio segments with overlap for efficient processing of lengthy conversations.
- Implemented merging of overlapping speaker segments to improve accuracy in speaker recognition.
- Added integration tests for WebSocket streaming transcription to validate the end_marker functionality and overall transcription flow.


@AnkushMalaker (Collaborator, Author) commented:

@coderabbitai review


coderabbitai bot commented Jan 14, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai (Contributor) left a comment

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backends/advanced/src/advanced_omi_backend/models/job.py (1)

49-54: Bug: raise statement makes the ConfigurationError fallback ineffective, and duplicate flag assignment.

The except block catches ConfigurationError and sets a fallback database, but then immediately re-raises the exception, making the fallback useless. Additionally, _beanie_initialized is set to True on line 54 before init_beanie is actually called, and then redundantly set again on line 61.

🐛 Proposed fix
             try:
                 database = client.get_default_database(mongodb_database)
             except ConfigurationError:
                 database = client[mongodb_database]
-                raise
-            _beanie_initialized = True
+
             # Initialize Beanie
             await init_beanie(
                 database=database,
                 document_models=[User, Conversation, AudioChunkDocument, WaveformData],
             )

             _beanie_initialized = True
             logger.info("✅ Beanie initialized in RQ worker process")

Also applies to: 61-61

🤖 Fix all issues with AI agents
In `@backends/advanced/Docs/memories.md`:
- Line 502: Search the repo for the string "omi_memories" and replace it with
"chronicle_memories" in all configuration and test YAML files (the defaults and
test configs referenced in the review), update any docs examples (e.g.,
curl/verification lines) to use "chronicle_memories", and run tests to ensure no
remaining references to "omi_memories" remain.

In `@backends/advanced/src/advanced_omi_backend/models/waveform.py`:
- Around line 43-47: The waveform model's conversation_id isn't enforced unique
despite the comment; update the field definition from Indexed(str) to
Indexed(str, unique=True) (the same pattern used in Conversation) so inserts
cannot create duplicate conversation_id values, and keep the Settings.indexes
entry as-is; modify the conversation_id declaration in the Waveform model to use
Indexed(str, unique=True).

In `@backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py`:
- Around line 128-163: Validate the parsed byte range after splitting
range_header: after computing range_start and range_end (and clamping to
0..file_size-1) ensure range_start <= range_end; if not, return a 416 Range Not
Satisfiable with "Content-Range: bytes */{file_size}". Also guard against
negative content_length by computing content_length = range_end - range_start +
1 only after the validation, and use the slice wav_data[range_start:range_end +
1] only when valid. Remove the unused exception variable `e` in the except
clause so the handler catches (ValueError, IndexError) and returns 416 without
referencing `e`.
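A standalone sketch of the validation this comment asks for — parsing a single `bytes=start-end` range, clamping it to the file size, and signalling 416 via a `None` return — might look like this (hypothetical helper, not the actual route code):

```python
def parse_range(range_header: str, file_size: int):
    """Parse a single 'bytes=start-end' Range header.

    Returns (start, end, content_length) on success, or None when the
    range is unsatisfiable and the caller should respond with
    416 plus 'Content-Range: bytes */{file_size}'.
    """
    try:
        unit, _, spec = range_header.partition("=")
        if unit.strip() != "bytes":
            return None
        start_s, _, end_s = spec.partition("-")
        if start_s == "":
            # Suffix range: bytes=-N means the last N bytes
            length = int(end_s)
            start = max(0, file_size - length)
            end = file_size - 1
        else:
            start = int(start_s)
            end = int(end_s) if end_s else file_size - 1
        end = min(end, file_size - 1)
        # Validate BEFORE computing content_length or slicing the data
        if start > end or start >= file_size:
            return None
        return start, end, end - start + 1
    except ValueError:
        return None
```

The key point from the review: validate `start <= end` (and `start < file_size`) before computing the content length or slicing `wav_data`, so an inverted range like `bytes=900-200` yields 416 instead of a negative-length body.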

In `@backends/advanced/src/advanced_omi_backend/services/memory/config.py`:
- Line 143: The code default for collection_name (variable collection_name in
config.py) was changed to "chronicle_memories" while existing defaults.yml and
tests still reference "omi_memories", causing a breaking mismatch; fix by
reverting the config.py default back to "omi_memories" (restore collection_name:
str = "omi_memories") so existing deployments keep working, update any unit
tests that assert the default to match, and add a short migration note in
project docs explaining the planned rename and how to opt into
"chronicle_memories" via explicit config so future changes are non-breaking.

In `@backends/advanced/src/advanced_omi_backend/utils/audio_chunk_utils.py`:
- Around line 430-434: The code incorrectly uses
Conversation.get(conversation_id) which looks up by MongoDB _id; replace it with
the indexed-field lookup used elsewhere (pattern at lines 669-671) by awaiting
Conversation.find_one with the filter on the conversation_id field (e.g., await
Conversation.find_one({"conversation_id": conversation_id}) or equivalent
keyword-arg form) and keep the existing not-found ValueError branch.

In `@backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py`:
- Around line 570-598: The first logger call uses an unnecessary f-string:
change logger.info(f"📝 No segments in streaming results, creating from
word-level data") to a plain string (remove the leading f) in the branch where
segments_data is empty (the block that calls
aggregator.get_session_results(session_id) and appends to segments_data); leave
other f-strings (e.g., the following logger.info that includes
{len(segments_data)}) unchanged.

In `@backends/advanced/src/advanced_omi_backend/workers/speaker_jobs.py`:
- Around line 18-21: The function _merge_overlapping_speaker_segments declares
an unused overlap parameter; either remove it from the signature (and all
callers) if not needed, or incorporate it into the merge logic by comparing the
actual overlap against the provided overlap threshold (e.g., compute overlap =
min(current["end"], next_seg["end"]) - max(current["start"], next_seg["start"])
or check next_seg["start"] - current["end"] <= overlap) and only merge when the
overlap condition is met; update the function signature, any callers/tests, and
the docstring to reflect the chosen behavior and ensure segment merging uses
segments, next_seg["start"], and current["end"] consistently.
- Around line 319-323: The generator loop over reconstruct_audio_segments
currently unpacks a variable named start_time which shadows the earlier
job_start_time used for profiling; rename the unpacked variable (e.g.,
segment_start_time or wav_start_time) in the async for unpack (async for
wav_data, segment_start_time, end_time in reconstruct_audio_segments(...)) and
update all in-loop references to that new name, and ensure all
processing_time_seconds calculations use time.time() - job_start_time (not the
unpacked name) so timing metrics (processing_time_seconds / processing_time)
remain correct.
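One way to incorporate the threshold as this comment suggests — merging consecutive same-speaker segments that overlap or sit within the allowed gap — is sketched below. Names and dict shape are assumptions for illustration, not the actual _merge_overlapping_speaker_segments:

```python
def merge_overlapping_speaker_segments(segments, gap_threshold=0.5):
    """Merge consecutive same-speaker segments that overlap or are
    separated by at most `gap_threshold` seconds.

    Segments are dicts with 'speaker', 'start', and 'end' keys.
    """
    if not segments:
        return []
    ordered = sorted(segments, key=lambda s: s["start"])
    merged = [dict(ordered[0])]  # copy so input isn't mutated
    for seg in ordered[1:]:
        current = merged[-1]
        # Negative gap means the segments overlap
        gap = seg["start"] - current["end"]
        if seg["speaker"] == current["speaker"] and gap <= gap_threshold:
            current["end"] = max(current["end"], seg["end"])
        else:
            merged.append(dict(seg))
    return merged
```

With this shape the `overlap`/threshold parameter actually participates in the merge decision, addressing the unused-parameter concern above.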

In `@backends/advanced/src/advanced_omi_backend/workers/waveform_jobs.py`:
- Around line 19-34: The docstring for the function that accepts parameter
sample_rate incorrectly states the default is 10 while the function signature
sets sample_rate: int = 3; update the docstring to state the correct default (3)
so the signature and documentation match, referencing the sample_rate parameter
in the function (waveform generation function in waveform_jobs.py) and adjust
the "Args" section to list "sample_rate: Samples per second for waveform
(default: 3)".

In `@backends/advanced/webui/src/components/audio/WaveformDisplay.tsx`:
- Around line 77-95: The drawWaveform function can divide by zero when samples
is empty; add an early guard at the start of drawWaveform that returns
immediately if !samples || samples.length === 0 to avoid computing barWidth =
width / samples.length and any downstream issues; update the function
(drawWaveform) to perform this check before computing barWidth/centerY and
before using samples.forEach.

In `@tests/integration/sdk_tests.robot`:
- Around line 59-71: The test "SDK Upload Respects Backend File Size Limit"
currently only re-runs the authentication script (Run Process ...
${CURDIR}/../scripts/sdk_test_auth.py) and doesn't exercise upload/oversize
behavior; either mark it as skipped until implemented or rename it to reflect
it's a placeholder: add a skip tag (e.g., add "skip" to the [Tags] line) and
update the test name or [Documentation] to indicate it's a placeholder that
re-runs sdk_test_auth.py, or alternatively change the test name to something
like "Placeholder: Re-run SDK auth script" so the intent and behavior match.

In `@tests/resources/queue_keywords.robot`:
- Around line 330-342: The Robot keyword "Get Job Result" is calling the wrong
endpoint path "/api/queue/job/${job_id}" which should be plural; update the GET
call in that keyword to use "/api/queue/jobs/${job_id}" so it matches the
backend route and will return 200; ensure the GET On Session invocation and any
related variables (${response}, ${job_data}, ${result}) remain unchanged.
🧹 Nitpick comments (29)
backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py (2)

151-181: Consider adding defensive type check for each segment.

The normalization logic correctly handles Deepgram's nested response structure. However, if a malformed response contains a non-dict element in the segments list, seg.get() will raise an AttributeError.

♻️ Suggested defensive check
             if provider == "deepgram" and segments:
                 normalized_segments = []
                 for seg in segments:
+                    if not isinstance(seg, dict):
+                        logger.debug(f"Skipping non-dict Deepgram segment: {seg}")
+                        continue
                     # Deepgram segments may have nested structure
                     # Extract text from either 'text' or 'transcript' or 'sentences'
                     text_content = seg.get("text") or seg.get("transcript") or ""

171-177: Default confidence of 1.0 may overstate certainty.

When confidence data is missing from a segment, defaulting to 1.0 (100% confidence) could be misleading to downstream consumers. Consider using 0.0 or None to indicate unknown confidence, unless downstream code explicitly requires a numeric value.

backends/advanced/src/advanced_omi_backend/utils/audio_chunk_utils.py (1)

462-466: Inconsistent sort syntax.

Line 466 uses .sort(+AudioChunkDocument.chunk_index) while line 281 uses .sort("+chunk_index"). The unary + operator on a Beanie field object is non-standard and may not behave as intended.

♻️ Suggested fix for consistency
-        ).sort(+AudioChunkDocument.chunk_index).to_list()
+        ).sort("+chunk_index").to_list()
backends/advanced/src/advanced_omi_backend/workers/speaker_jobs.py (1)

386-386: Remove extraneous f-string prefix.

This f-string has no placeholders.

-            logger.info(f"🎤 Calling speaker recognition service...")
+            logger.info("🎤 Calling speaker recognition service...")
backends/advanced/webui/src/pages/ConversationsTimeline.tsx (1)

274-276: Consider removing dead conditional styling for deleted conversations.

Since getAll(false) now excludes deleted conversations, conv.deleted will always be false for all items in the list. The conditional conv.deleted ? '#fee2e2' : '#fff' is effectively dead code.

♻️ Suggested cleanup
                 contentStyle={{
-                  background: conv.deleted ? '#fee2e2' : '#fff',
+                  background: '#fff',
                   color: '#1f2937',
                   boxShadow: '0 3px 0 #ddd'
                 }}
tests/scripts/sdk_test_conversations.py (1)

12-14: Consider adding argument validation for clearer error messages.

Accessing sys.argv[1:4] without validation will raise an IndexError if arguments are missing, which may be confusing for users running the script incorrectly.

♻️ Suggested improvement
+if len(sys.argv) < 4:
+    print("Usage: sdk_test_conversations.py <backend_url> <email> <password>", file=sys.stderr)
+    sys.exit(1)
+
 backend_url = sys.argv[1]
 email = sys.argv[2]
 password = sys.argv[3]
tests/scripts/sdk_test_auth.py (1)

12-18: Consider adding argument validation and error handling.

The script will crash with an IndexError if fewer than 3 arguments are provided. For test scripts used in CI/automation, clearer error messages help debugging.

🔧 Suggested improvement
-backend_url = sys.argv[1]
-email = sys.argv[2]
-password = sys.argv[3]
-
-client = ChronicleClient(backend_url)
-client.login(email, password)
-print("SUCCESS")
+if len(sys.argv) < 4:
+    print("Usage: sdk_test_auth.py <backend_url> <email> <password>", file=sys.stderr)
+    sys.exit(1)
+
+backend_url = sys.argv[1]
+email = sys.argv[2]
+password = sys.argv[3]
+
+try:
+    client = ChronicleClient(backend_url)
+    client.login(email, password)
+    print("SUCCESS")
+except Exception as e:
+    print(f"FAILED: {e}", file=sys.stderr)
+    sys.exit(1)
backends/advanced/src/advanced_omi_backend/services/audio_stream/aggregator.py (1)

110-154: LGTM with a suggestion: consider sorting segments by timestamp.

The aggregation logic correctly combines text, words, and segments from all results. The average confidence calculation and early return for empty results are properly handled.

However, all_segments are collected in result order but not sorted by their internal timestamps. If segments from different chunks have overlapping or out-of-order timings, the final segment list may not be chronologically sorted.

🔧 Optional: Sort segments by start time
         # Join all text segments with spaces
         combined_text = " ".join(all_text)

+        # Sort segments by start time if present
+        if all_segments and "start" in all_segments[0]:
+            all_segments.sort(key=lambda s: s.get("start", 0))
+
         combined = {
             "text": combined_text,
             "words": all_words,
             "segments": all_segments,
backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py (1)

147-157: Use logging.exception for better traceability.

The static analysis tool correctly flags that logging.exception would be more appropriate here as it automatically includes the stack trace, which aids debugging validation failures.

Proposed fix
                except ValueError as val_error:
                    # Handle validation errors (e.g., file too long)
-                   audio_logger.error(f"Audio validation failed: {val_error}")
+                   audio_logger.exception(f"Audio validation failed: {val_error}")
                    processed_files.append({
                        "filename": file.filename,
                        "status": "error",
                        "error": str(val_error),
                    })
                    # Delete the conversation since it won't have audio chunks
                    await conversation.delete()
                    continue
tests/scripts/sdk_test_upload.py (2)

12-15: Add argument validation for clearer error messages.

Direct sys.argv access will raise an unclear IndexError if arguments are missing. Consider using argparse or at least validating argument count for better test failure diagnostics.

Proposed fix with basic validation
-backend_url = sys.argv[1]
-email = sys.argv[2]
-password = sys.argv[3]
-audio_file = sys.argv[4]
+if len(sys.argv) != 5:
+    print(f"Usage: {sys.argv[0]} <backend_url> <email> <password> <audio_file>", file=sys.stderr)
+    sys.exit(1)
+
+backend_url = sys.argv[1]
+email = sys.argv[2]
+password = sys.argv[3]
+audio_file = sys.argv[4]

21-23: Guard against empty files list.

Accessing result.files[0] will raise IndexError if the upload returns an empty files list. A guard would make test failures clearer.

Proposed fix
+if not result.files:
+    print("ERROR:No files in response")
+    sys.exit(1)
+
 print(f"STATUS:{result.files[0].status}")
 if result.files[0].conversation_id:
     print(f"CONVERSATION_ID:{result.files[0].conversation_id}")
backends/advanced/src/advanced_omi_backend/models/waveform.py (1)

34-37: Consider using timezone-aware datetime.

datetime.utcnow() is deprecated since Python 3.12. While functional, consider using datetime.now(timezone.utc) for timezone-aware timestamps.

Proposed fix
+from datetime import datetime, timezone
-from datetime import datetime
 ...
     created_at: datetime = Field(
-        default_factory=datetime.utcnow,
+        default_factory=lambda: datetime.now(timezone.utc),
         description="When this waveform was generated"
     )
tests/resources/redis_keywords.robot (1)

78-99: LGTM with optional modernization suggestion.

The Redis Command keyword correctly handles command execution, error checking, and numeric coercion. The logic is sound.

Minor note: Run Keyword If (line 95-97) is considered legacy syntax in Robot Framework 4.0+. The modern IF/ELSE syntax is preferred, though this is purely stylistic and the current implementation works correctly.

♻️ Optional: Use modern IF/ELSE syntax
     # Try to convert to integer if it's a number (for commands like XLEN)
     ${is_digit}=    Run Keyword And Return Status    Should Match Regexp    ${output}    ^\\d+$
-    ${return_value}=    Run Keyword If    ${is_digit}
-    ...    Convert To Integer    ${output}
-    ...    ELSE    Set Variable    ${output}
+    IF    ${is_digit}
+        ${return_value}=    Convert To Integer    ${output}
+    ELSE
+        ${return_value}=    Set Variable    ${output}
+    END

     RETURN    ${return_value}
config/defaults.yml (1)

29-40: Consider clarifying the API key environment variable for local models.

Using ${OPENAI_API_KEY:-ollama} for the local Ollama LLM (and similarly for local-embed at line 76) may cause confusion since Ollama doesn't typically require OpenAI credentials. This works because the default fallback is ollama, but a dedicated variable like ${OLLAMA_API_KEY:-ollama} or a comment clarifying this pattern would improve clarity.
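The `${VAR:-default}` placeholders are substituted when the config is loaded; a minimal sketch of that resolution (illustrative only — the backend's actual `_deep_resolve_env` may differ in details):

```python
import os
import re

# Matches ${VAR} or ${VAR:-default}, mirroring shell parameter expansion.
_ENV_PATTERN = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

def resolve_env(value: str) -> str:
    """Replace ${VAR:-default} placeholders with values from the environment.

    An unset or empty variable falls back to the default, matching the
    shell's ``:-`` semantics. Illustrative sketch, not the backend's code.
    """
    def _sub(match: re.Match) -> str:
        var, default = match.group(1), match.group(2)
        return os.environ.get(var) or (default or "")
    return _ENV_PATTERN.sub(_sub, value)
```

With this pattern, `resolve_env("${OPENAI_API_KEY:-ollama}")` yields `"ollama"` whenever the variable is unset, which is why the current config works for Ollama despite the misleading variable name.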

backends/advanced/src/advanced_omi_backend/workers/waveform_jobs.py (1)

159-168: Consider upsert to prevent duplicate waveforms on concurrent requests.

If two requests race to generate a waveform for the same conversation, both will insert separate documents. Consider using find_one_and_update with upsert=True or adding a unique index on conversation_id in the WaveformData model to ensure idempotency.

♻️ Example using find_one_and_replace with upsert
-        await waveform_doc.insert()
+        # Use upsert to prevent duplicates from concurrent requests
+        # (requires: from beanie.odm.operators.update.general import Set)
+        await WaveformData.find_one(
+            WaveformData.conversation_id == conversation_id
+        ).upsert(
+            Set(waveform_doc.model_dump(exclude={"id"})),
+            on_insert=waveform_doc,
+        )

Or add a unique index in the model:

class Settings:
    name = "waveforms"
    indexes = [
        IndexModel([("conversation_id", 1)], unique=True),
    ]
backends/advanced/src/advanced_omi_backend/routers/modules/conversation_routes.py (1)

117-120: Move imports to module level.

HTTPException is a standard FastAPI import and should be at the top of the file with other FastAPI imports for consistency. The model imports can remain deferred if there are circular import concerns, but HTTPException has no such issues.

♻️ Suggested change

At line 10, update the import:

-from fastapi import APIRouter, Depends, Query
+from fastapi import APIRouter, Depends, HTTPException, Query

Then at lines 117-120:

-    from fastapi import HTTPException
-    from advanced_omi_backend.models.conversation import Conversation
-    from advanced_omi_backend.models.waveform import WaveformData
-    from advanced_omi_backend.workers.waveform_jobs import generate_waveform_data
+    from advanced_omi_backend.models.conversation import Conversation
+    from advanced_omi_backend.models.waveform import WaveformData
+    from advanced_omi_backend.workers.waveform_jobs import generate_waveform_data
backends/advanced/webui/src/components/audio/WaveformDisplay.tsx (2)

116-137: Remove debug console.log statements before production.

The console.log and console.warn statements at lines 117, 120, 125, and 134 appear to be debug artifacts. Consider removing them or converting to a debug-only logger to reduce console noise in production.


51-75: Consider handling window resize.

The canvas dimensions are calculated once when the effect runs. If the user resizes the window, the waveform may display incorrectly until a re-render is triggered. Consider adding a ResizeObserver or window resize listener.

♻️ Example resize handling
useEffect(() => {
  if (!canvasRef.current) return;
  
  const resizeObserver = new ResizeObserver(() => {
    // Trigger redraw by updating a state or calling draw directly
  });
  
  resizeObserver.observe(canvasRef.current);
  return () => resizeObserver.disconnect();
}, []);
tests/integration/websocket_transcription_e2e_test.robot (1)

55-66: Consider replacing fixed Sleep with polling.

A fixed Sleep 5s can cause flakiness when the system is slow and wastes time when it is fast. Consider using Wait Until Keyword Succeeds with a polling approach to check the Redis stream.

♻️ Example polling approach
    # Instead of:
    Sleep    5s

    # Consider:
    ${stream_length}=    Wait Until Keyword Succeeds    10s    1s
    ...    Get Redis Stream Length Above Zero    ${stream_name}

Where Get Redis Stream Length Above Zero is a keyword that returns the length only if > 0, otherwise fails.
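The polling idea behind Wait Until Keyword Succeeds can be sketched in Python as a generic helper (not part of the test suite; the predicate name below is hypothetical):

```python
import time

def wait_until(predicate, timeout=10.0, interval=0.5):
    """Poll `predicate` until it returns a truthy value, or raise TimeoutError.

    Unlike a fixed sleep, this returns as soon as the condition holds and
    only fails after the full timeout has elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = predicate()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)
```

In the test above this would be something like `wait_until(lambda: redis_stream_length(stream_name))` (helper name assumed) in place of `Sleep 5s`.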

backends/advanced/src/scripts/cleanup_state.py (2)

552-557: Hardcoded vector dimension may not match original collection.

The recreated collection uses a hardcoded 1536 dimensions (OpenAI ada-002). If a different embedding model is used, this will create a mismatched collection. Consider retrieving the original collection's vector config before deletion.

♻️ Suggested approach
            # Get count and config before deletion
            collection_info = await self.qdrant_client.get_collection(collection_name)
            stats.memories_count = collection_info.points_count
+           original_vector_size = collection_info.config.params.vectors.size
+           original_distance = collection_info.config.params.vectors.distance

            # Delete and recreate collection
            await self.qdrant_client.delete_collection(collection_name)
            logger.info(f"Deleted memories collection ({stats.memories_count} vectors)")

            # Recreate with original configuration
            await self.qdrant_client.create_collection(
                collection_name=collection_name,
-               vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
+               vectors_config=VectorParams(size=original_vector_size, distance=original_distance)
            )

501-524: Unused result assignments from delete operations.

The result variables are assigned but never used. While harmless, you could either remove the assignments or use them to verify the actual deleted count matches expectations.

♻️ Option 1: Remove assignments
-        result = await Conversation.find_all().delete()
+        await Conversation.find_all().delete()
♻️ Option 2: Verify counts
-        result = await Conversation.find_all().delete()
-        logger.info(f"Deleted {stats.conversations_count} conversations")
+        result = await Conversation.find_all().delete()
+        logger.info(f"Deleted {result.deleted_count} conversations (expected {stats.conversations_count})")
backends/advanced/src/advanced_omi_backend/model_registry.py (1)

289-300: Consider using logging.exception for better stack trace visibility in fallback path.

The fallback logic is reasonable for resilience, but logging.error loses the stack trace. Using logging.exception would preserve diagnostic information when the merged config load fails.

♻️ Suggested improvement
     # Try to get merged configuration (defaults + user config)
     try:
         raw = get_config(force_reload=force_reload)
     except Exception as e:
-        logging.error(f"Failed to load merged configuration: {e}")
+        logging.exception(f"Failed to load merged configuration: {e}")
         # Fallback to direct config.yml loading
         cfg_path = _find_config_path()
         if not cfg_path.exists():
             return None
         with cfg_path.open("r") as f:
             raw = yaml.safe_load(f) or {}
         raw = _deep_resolve_env(raw)
backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py (1)

386-446: New get_memory method looks correct, but consider extracting the ID conversion logic.

The method correctly handles UUID/integer ID formats and validates user ownership when user_id is provided.

However, the ID conversion logic (lines 398-409) is duplicated across delete_memory, update_memory, and now get_memory. Consider extracting this to a helper method to reduce duplication.

Also, the import uuid on line 398 is unnecessary since uuid is already imported at the module level (line 11).

♻️ Suggested refactor: Extract ID conversion helper and remove duplicate import
+    def _normalize_point_id(self, memory_id: str):
+        """Convert memory_id to Qdrant-compatible format (UUID string or integer)."""
+        try:
+            uuid.UUID(memory_id)
+            return memory_id
+        except ValueError:
+            try:
+                return int(memory_id)
+            except ValueError:
+                return memory_id

     async def get_memory(self, memory_id: str, user_id: Optional[str] = None) -> Optional[MemoryEntry]:
         """Get a specific memory by ID from Qdrant.
         ...
         """
         try:
-            # Convert memory_id to proper format for Qdrant
-            import uuid
-            try:
-                # Try to parse as UUID first
-                uuid.UUID(memory_id)
-                point_id = memory_id
-            except ValueError:
-                # If not a UUID, try as integer
-                try:
-                    point_id = int(memory_id)
-                except ValueError:
-                    # If neither UUID nor integer, use it as-is
-                    point_id = memory_id
+            point_id = self._normalize_point_id(memory_id)

             # Retrieve the point by ID
             ...

The same refactor can be applied to delete_memory and update_memory.

backends/advanced/webui/src/pages/Archive.tsx (3)

68-79: Consider simplifying date format detection.

The condition on line 70 is complex and may not cover all edge cases (e.g., timestamps with timezone offsets like +05:30). Consider using a more robust approach:

♻️ Suggested simplification
 const formatDate = (timestamp: number | string) => {
   if (typeof timestamp === 'string') {
-    const isoString = timestamp.endsWith('Z') || timestamp.includes('+') || timestamp.includes('T') && timestamp.split('T')[1].includes('-')
-      ? timestamp
-      : timestamp + 'Z'
-    return new Date(isoString).toLocaleString()
+    // If it's a numeric string (Unix timestamp), convert to number
+    if (/^\d+$/.test(timestamp)) {
+      return new Date(parseInt(timestamp, 10) * 1000).toLocaleString()
+    }
+    // Otherwise treat as ISO string
+    return new Date(timestamp).toLocaleString()
   }
   if (timestamp === 0) {
     return 'Unknown date'
   }
   return new Date(timestamp * 1000).toLocaleString()
 }

352-369: Consider using a stable key for transcript segments.

Using index as the React key works here since segments are static after fetch, but a composite key would be more robust if segments are ever modified:

♻️ Suggested improvement
-                                {segments.map((segment, index) => {
+                                {segments.map((segment, index) => {
+                                  const segmentKey = `${segment.start}-${segment.end}-${index}`
                                   const speaker = segment.speaker || 'Unknown'
                                   return (
                                     <div
-                                      key={index}
+                                      key={segmentKey}
                                       className="text-sm leading-relaxed flex items-start space-x-2 py-1 px-2 rounded hover:bg-gray-50 dark:hover:bg-gray-700"
                                     >

3-4: Combine imports from the same module.

♻️ Suggested fix
-import { conversationsApi } from '../services/api'
-import { authApi } from '../services/api'
+import { conversationsApi, authApi } from '../services/api'
backends/advanced/src/advanced_omi_backend/config.py (2)

101-109: Consider emitting a deprecation warning.

The function is marked deprecated in the docstring but callers won't see runtime feedback. Consider adding a warning log to encourage migration.

♻️ Optional: Add deprecation warning
 def get_defaults_config_path():
     """
     Get the path to the defaults config file.

     DEPRECATED: Use get_defaults_yml_path() instead.
     Kept for backward compatibility.
     """
+    logger.warning("get_defaults_config_path() is deprecated, use get_defaults_yml_path() instead")
     defaults_path = get_defaults_yml_path()
     return defaults_path if defaults_path.exists() else None

164-180: Use logger.exception for better error diagnostics.

The broad Exception catches are acceptable for config loading robustness. However, using logger.exception instead of logger.warning/logger.error would include stack traces for easier debugging.

♻️ Proposed improvement
         try:
             with open(defaults_path, 'r') as f:
                 defaults = yaml.safe_load(f) or {}
             logger.info(f"Loaded defaults from {defaults_path}")
         except Exception as e:
-            logger.warning(f"Could not load defaults from {defaults_path}: {e}")
+            logger.exception(f"Could not load defaults from {defaults_path}")

     # Load user config
     config_path = get_config_yml_path()
     user_config = {}
     if config_path.exists():
         try:
             with open(config_path, 'r') as f:
                 user_config = yaml.safe_load(f) or {}
             logger.info(f"Loaded config from {config_path}")
         except Exception as e:
-            logger.error(f"Error loading config from {config_path}: {e}")
+            logger.exception(f"Error loading config from {config_path}")
tests/unit/test_config_loading.py (1)

124-132: Improve test to verify cache invalidation.

The test assigns config1 to establish a cache but doesn't use it to verify the reload behavior. The comment suggests checking for different instances, but the assertion only verifies the return type.

♻️ Proposed improvement
 def test_reload_config():
     """Test config reload invalidates cache."""
     config1 = get_config()
     config2 = reload_config()
 
-    # Should be different instances after reload
-    # (Note: Content might be the same, but object should be different)
-    # We check that reload returns a config object
-    assert isinstance(config2, dict)
+    # After reload, should be a different object instance
+    # (content might be identical, but the dict object should be new)
+    assert isinstance(config2, dict)
+    assert config1 is not config2
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 912c1cd and 078c602.

📒 Files selected for processing (49)
  • backends/advanced/.env.template
  • backends/advanced/Docs/memories.md
  • backends/advanced/docker-compose-test.yml
  • backends/advanced/docker-compose.yml
  • backends/advanced/src/advanced_omi_backend/app_factory.py
  • backends/advanced/src/advanced_omi_backend/config.py
  • backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py
  • backends/advanced/src/advanced_omi_backend/controllers/conversation_controller.py
  • backends/advanced/src/advanced_omi_backend/model_registry.py
  • backends/advanced/src/advanced_omi_backend/models/job.py
  • backends/advanced/src/advanced_omi_backend/models/waveform.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py
  • backends/advanced/src/advanced_omi_backend/routers/modules/conversation_routes.py
  • backends/advanced/src/advanced_omi_backend/services/audio_stream/aggregator.py
  • backends/advanced/src/advanced_omi_backend/services/audio_stream/producer.py
  • backends/advanced/src/advanced_omi_backend/services/memory/README.md
  • backends/advanced/src/advanced_omi_backend/services/memory/base.py
  • backends/advanced/src/advanced_omi_backend/services/memory/config.py
  • backends/advanced/src/advanced_omi_backend/services/memory/providers/chronicle.py
  • backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py
  • backends/advanced/src/advanced_omi_backend/services/memory/providers/openmemory_mcp.py
  • backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py
  • backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py
  • backends/advanced/src/advanced_omi_backend/utils/audio_chunk_utils.py
  • backends/advanced/src/advanced_omi_backend/workers/cleanup_jobs.py
  • backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py
  • backends/advanced/src/advanced_omi_backend/workers/speaker_jobs.py
  • backends/advanced/src/advanced_omi_backend/workers/transcription_jobs.py
  • backends/advanced/src/advanced_omi_backend/workers/waveform_jobs.py
  • backends/advanced/src/scripts/cleanup_state.py
  • backends/advanced/tests/test_memory_entry.py
  • backends/advanced/tests/test_memory_providers.py
  • backends/advanced/webui/src/components/audio/WaveformDisplay.tsx
  • backends/advanced/webui/src/pages/Archive.tsx
  • backends/advanced/webui/src/pages/Conversations.tsx
  • backends/advanced/webui/src/pages/ConversationsRouter.tsx
  • backends/advanced/webui/src/pages/ConversationsTimeline.tsx
  • backends/advanced/webui/src/pages/Memories.tsx
  • backends/advanced/webui/src/services/api.ts
  • config/defaults.yml
  • tests/endpoints/memory_tests.robot
  • tests/integration/sdk_tests.robot
  • tests/integration/websocket_transcription_e2e_test.robot
  • tests/resources/queue_keywords.robot
  • tests/resources/redis_keywords.robot
  • tests/scripts/sdk_test_auth.py
  • tests/scripts/sdk_test_conversations.py
  • tests/scripts/sdk_test_upload.py
  • tests/unit/test_config_loading.py
🧰 Additional context used
🧬 Code graph analysis (22)
backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py (1)
backends/advanced/src/advanced_omi_backend/models/conversation.py (1)
  • segments (211-213)
backends/advanced/webui/src/pages/ConversationsTimeline.tsx (1)
backends/advanced/webui/src/services/api.ts (1)
  • conversationsApi (109-130)
backends/advanced/webui/src/components/audio/WaveformDisplay.tsx (2)
backends/advanced/src/advanced_omi_backend/models/waveform.py (1)
  • WaveformData (16-47)
backends/advanced/webui/src/services/api.ts (1)
  • api (55-58)
tests/unit/test_config_loading.py (1)
backends/advanced/src/advanced_omi_backend/config.py (2)
  • merge_configs (112-136)
  • reload_config (200-204)
backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py (3)
backends/advanced/src/advanced_omi_backend/models/conversation.py (1)
  • segments (211-213)
backends/advanced/src/advanced_omi_backend/utils/conversation_utils.py (1)
  • extract_speakers_from_segments (407-423)
backends/advanced/src/advanced_omi_backend/services/audio_stream/aggregator.py (1)
  • get_session_results (29-79)
backends/advanced/src/advanced_omi_backend/routers/modules/conversation_routes.py (3)
backends/advanced/src/advanced_omi_backend/controllers/conversation_controller.py (1)
  • get_conversations (133-187)
backends/advanced/src/advanced_omi_backend/models/waveform.py (1)
  • WaveformData (16-47)
backends/advanced/src/advanced_omi_backend/workers/waveform_jobs.py (1)
  • generate_waveform_data (17-190)
backends/advanced/src/advanced_omi_backend/models/job.py (3)
backends/advanced/src/advanced_omi_backend/models/waveform.py (1)
  • WaveformData (16-47)
backends/advanced/src/advanced_omi_backend/models/conversation.py (1)
  • Conversation (17-328)
backends/advanced/src/advanced_omi_backend/models/audio_chunk.py (1)
  • AudioChunkDocument (16-158)
backends/advanced/src/advanced_omi_backend/services/memory/providers/openmemory_mcp.py (1)
backends/advanced/src/advanced_omi_backend/services/memory/base.py (1)
  • MemoryEntry (26-69)
backends/advanced/webui/src/pages/ConversationsRouter.tsx (1)
backends/advanced/webui/src/pages/Archive.tsx (1)
  • Archive (28-389)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (1)
backends/advanced/src/advanced_omi_backend/config.py (1)
  • get_config_yml_path (91-93)
backends/advanced/src/advanced_omi_backend/services/audio_stream/aggregator.py (1)
backends/advanced/src/advanced_omi_backend/models/conversation.py (1)
  • segments (211-213)
backends/advanced/src/advanced_omi_backend/controllers/conversation_controller.py (3)
backends/advanced/src/advanced_omi_backend/routers/modules/conversation_routes.py (1)
  • get_conversations (31-36)
backends/advanced/src/advanced_omi_backend/models/user.py (2)
  • User (51-98)
  • user_id (71-73)
backends/advanced/src/advanced_omi_backend/models/conversation.py (1)
  • Conversation (17-328)
backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py (6)
backends/advanced/src/advanced_omi_backend/models/user.py (1)
  • user_id (71-73)
backends/advanced/src/advanced_omi_backend/services/memory/base.py (2)
  • get_memory (171-185)
  • MemoryEntry (26-69)
backends/advanced/src/advanced_omi_backend/services/memory/providers/chronicle.py (1)
  • get_memory (300-322)
backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py (1)
  • get_memory (613-648)
backends/advanced/src/advanced_omi_backend/services/memory/providers/openmemory_mcp.py (1)
  • get_memory (285-320)
backends/advanced/src/advanced_omi_backend/services/memory/providers/mcp_client.py (1)
  • get_memory (364-405)
backends/advanced/tests/test_memory_entry.py (1)
backends/advanced/src/advanced_omi_backend/services/memory/base.py (2)
  • MemoryEntry (26-69)
  • to_dict (57-69)
backends/advanced/src/advanced_omi_backend/config.py (1)
backends/advanced/src/advanced_omi_backend/model_registry.py (1)
  • _deep_resolve_env (61-90)
backends/advanced/src/advanced_omi_backend/model_registry.py (1)
backends/advanced/src/advanced_omi_backend/config.py (2)
  • get_config (143-197)
  • get_config_yml_path (91-93)
backends/advanced/src/advanced_omi_backend/workers/waveform_jobs.py (2)
backends/advanced/src/advanced_omi_backend/models/waveform.py (1)
  • WaveformData (16-47)
backends/advanced/src/advanced_omi_backend/utils/audio_chunk_utils.py (2)
  • retrieve_audio_chunks (246-288)
  • decode_opus_to_pcm (117-195)
backends/advanced/src/advanced_omi_backend/workers/cleanup_jobs.py (1)
backends/advanced/src/advanced_omi_backend/models/waveform.py (1)
  • WaveformData (16-47)
backends/advanced/src/advanced_omi_backend/utils/audio_chunk_utils.py (2)
backends/advanced/src/advanced_omi_backend/models/conversation.py (1)
  • Conversation (17-328)
backends/advanced/src/advanced_omi_backend/models/audio_chunk.py (2)
  • AudioChunkDocument (16-158)
  • compression_ratio (140-144)
backends/advanced/src/scripts/cleanup_state.py (3)
backends/advanced/src/advanced_omi_backend/models/conversation.py (1)
  • Conversation (17-328)
backends/advanced/src/advanced_omi_backend/models/audio_chunk.py (1)
  • AudioChunkDocument (16-158)
backends/advanced/src/advanced_omi_backend/models/waveform.py (1)
  • WaveformData (16-47)
backends/advanced/src/advanced_omi_backend/app_factory.py (1)
backends/advanced/src/advanced_omi_backend/models/waveform.py (1)
  • WaveformData (16-47)
backends/advanced/src/advanced_omi_backend/services/memory/providers/chronicle.py (4)
backends/advanced/src/advanced_omi_backend/services/memory/providers/mycelia.py (2)
  • get_memory (613-648)
  • initialize (82-105)
backends/advanced/src/advanced_omi_backend/services/memory/providers/openmemory_mcp.py (2)
  • get_memory (285-320)
  • initialize (71-104)
backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py (2)
  • get_memory (386-446)
  • initialize (52-118)
backends/advanced/src/advanced_omi_backend/models/user.py (1)
  • user_id (71-73)
🪛 dotenv-linter (4.0.0)
backends/advanced/.env.template

[warning] 171-171: [ValueWithoutQuotes] This value needs to be surrounded in quotes

(ValueWithoutQuotes)


[warning] 172-172: [UnorderedKey] The SPEAKER_CHUNK_SIZE key should go before the SPEAKER_CHUNK_THRESHOLD key

(UnorderedKey)


[warning] 172-172: [ValueWithoutQuotes] This value needs to be surrounded in quotes

(ValueWithoutQuotes)


[warning] 173-173: [UnorderedKey] The SPEAKER_CHUNK_OVERLAP key should go before the SPEAKER_CHUNK_SIZE key

(UnorderedKey)


[warning] 173-173: [ValueWithoutQuotes] This value needs to be surrounded in quotes

(ValueWithoutQuotes)

🪛 Ruff (0.14.11)
backends/advanced/src/advanced_omi_backend/models/waveform.py

45-47: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

tests/unit/test_config_loading.py

126-126: Local variable config1 is assigned to but never used

Remove assignment to unused variable config1

(F841)

backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py

573-573: f-string without any placeholders

Remove extraneous f prefix

(F541)

backends/advanced/src/advanced_omi_backend/routers/modules/conversation_routes.py

33-33: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


98-98: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

backends/advanced/src/advanced_omi_backend/controllers/audio_controller.py

149-149: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


166-166: Use explicit conversion flag

Replace with conversion flag

(RUF010)

backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py

156-156: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

backends/advanced/src/advanced_omi_backend/services/memory/config.py

82-85: Avoid specifying long messages outside the exception class

(TRY003)

backends/advanced/src/advanced_omi_backend/controllers/conversation_controller.py

142-142: Avoid equality comparisons to False; use not Conversation.deleted: for false checks

Replace with not Conversation.deleted

(E712)


149-149: Avoid equality comparisons to False; use not Conversation.deleted: for false checks

Replace with not Conversation.deleted

(E712)

backends/advanced/src/advanced_omi_backend/services/memory/providers/vector_stores.py

146-146: Consider moving this statement to an else block

(TRY300)


380-380: Consider moving this statement to an else block

(TRY300)


382-382: Do not catch blind exception: Exception

(BLE001)


383-383: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


442-442: Consider moving this statement to an else block

(TRY300)


444-444: Do not catch blind exception: Exception

(BLE001)


445-445: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

backends/advanced/src/advanced_omi_backend/config.py

168-168: Do not catch blind exception: Exception

(BLE001)


179-179: Do not catch blind exception: Exception

(BLE001)


180-180: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

backends/advanced/src/advanced_omi_backend/model_registry.py

292-292: Do not catch blind exception: Exception

(BLE001)


293-293: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

backends/advanced/src/advanced_omi_backend/workers/speaker_jobs.py

20-20: Unused function argument: overlap

(ARG001)


351-351: Abstract raise to an inner function

(TRY301)


351-351: Avoid specifying long messages outside the exception class

(TRY003)


386-386: f-string without any placeholders

Remove extraneous f prefix

(F541)

backends/advanced/src/advanced_omi_backend/workers/waveform_jobs.py

172-178: Consider moving this statement to an else block

(TRY300)

backends/advanced/src/advanced_omi_backend/utils/audio_chunk_utils.py

434-434: Avoid specifying long messages outside the exception class

(TRY003)


448-448: Avoid specifying long messages outside the exception class

(TRY003)


596-599: Avoid specifying long messages outside the exception class

(TRY003)


762-765: Avoid specifying long messages outside the exception class

(TRY003)

backends/advanced/src/scripts/cleanup_state.py

66-66: Do not catch blind exception: Exception

(BLE001)


135-135: Consider moving this statement to an else block

(TRY300)


188-188: Do not catch blind exception: Exception

(BLE001)


278-278: Unused method argument: stats

(ARG002)


289-289: Do not catch blind exception: Exception

(BLE001)


328-328: Do not catch blind exception: Exception

(BLE001)


420-420: Do not catch blind exception: Exception

(BLE001)


480-480: Consider moving this statement to an else block

(TRY300)


523-523: Local variable result is assigned to but never used

Remove assignment to unused variable result

(F841)


559-559: Do not catch blind exception: Exception

(BLE001)


608-608: Do not catch blind exception: Exception

(BLE001)


630-630: Do not catch blind exception: Exception

(BLE001)


657-657: Do not catch blind exception: Exception

(BLE001)


676-676: Do not catch blind exception: Exception

(BLE001)


804-804: Do not catch blind exception: Exception

(BLE001)

backends/advanced/src/advanced_omi_backend/services/memory/providers/chronicle.py

319-319: Consider moving this statement to an else block

(TRY300)


320-320: Do not catch blind exception: Exception

(BLE001)


321-321: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


 - Ensure embedding model is available in Ollama
 - Check vector dimensions match between embedder and Qdrant
-- Verify collection has vectors: `curl http://localhost:6333/collections/omi_memories`
+- Verify collection has vectors: `curl http://localhost:6333/collections/chronicle_memories`

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

rg -n "omi_memories" --type md --type py --type yaml --type json

Repository: SimpleOpenSoftware/chronicle

Length of output: 417


Update collection name references in configuration files.

The old omi_memories collection name still appears in multiple configuration files that need to be updated to match the new chronicle_memories name:

  • config/defaults.yml:198
  • tests/configs/deepgram-openai.yml:50
  • tests/configs/parakeet-openai.yml:44
  • tests/configs/parakeet-ollama.yml:44
  • tests/configs/mock-services.yml:40

These configuration files will cause failures if the old collection doesn't exist in the system.

🤖 Prompt for AI Agents
In `@backends/advanced/Docs/memories.md` at line 502, search the repo for the
string "omi_memories" and replace it with "chronicle_memories" in all
configuration and test YAML files (the defaults and test configs referenced in
the review), update any docs examples (e.g., curl/verification lines) to use
"chronicle_memories", and run tests to ensure no remaining references to
"omi_memories" remain.

Comment on lines +43 to +47
class Settings:
    name = "waveforms"
    indexes = [
        "conversation_id",  # Unique lookup by conversation
    ]

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's find the WaveformData model and examine it
find . -type f -name "*.py" | xargs grep -l "class WaveformData" 2>/dev/null

Repository: SimpleOpenSoftware/chronicle

Length of output: 135


🏁 Script executed:

# Also search for the waveform.py file to understand its full structure
find . -type f -name "waveform.py" -path "*/advanced_omi_backend/models/*"

Repository: SimpleOpenSoftware/chronicle

Length of output: 135


🏁 Script executed:

# Check how conversation_id is used in the model and elsewhere
rg -n "conversation_id" --type=py backends/advanced/src/advanced_omi_backend/models/ -B 2 -A 2

Repository: SimpleOpenSoftware/chronicle

Length of output: 6481


🏁 Script executed:

# Look for any constraints or validation on WaveformData
rg -n "WaveformData\|conversation_id" --type=py backends/advanced/src/advanced_omi_backend/ | head -50

Repository: SimpleOpenSoftware/chronicle

Length of output: 54


🏁 Script executed:

cat -n backends/advanced/src/advanced_omi_backend/models/waveform.py

Repository: SimpleOpenSoftware/chronicle

Length of output: 1766


🏁 Script executed:

# Also check if there's any application logic validating uniqueness of waveform per conversation
rg -n "WaveformData\|waveform" --type=py backends/advanced/src/advanced_omi_backend/ | grep -i "create\|insert\|unique\|validation" | head -30

Repository: SimpleOpenSoftware/chronicle

Length of output: 54


🏁 Script executed:

# Check for any repository/database query patterns that assume one waveform per conversation
rg -n "find_one.*waveform\|waveform.*find_one" --type=py backends/advanced/src/advanced_omi_backend/ -B 2 -A 2

Repository: SimpleOpenSoftware/chronicle

Length of output: 54


🏁 Script executed:

# Search for waveform creation and update patterns
rg -n "WaveformData\(" --type=py backends/advanced/src/advanced_omi_backend/ -B 3 -A 5

Repository: SimpleOpenSoftware/chronicle

Length of output: 1872


🏁 Script executed:

# Check for any tests that show waveform creation patterns
find backends/advanced -type f -name "*test*.py" | xargs grep -l "WaveformData\|waveform" 2>/dev/null

Repository: SimpleOpenSoftware/chronicle

Length of output: 56


🏁 Script executed:

# Look for service/repository methods that interact with waveforms
rg -n "class.*Repository\|def.*waveform\|def.*create.*waveform" --type=py backends/advanced/src/advanced_omi_backend/ -B 1 -A 3

Repository: SimpleOpenSoftware/chronicle

Length of output: 54


🏁 Script executed:

# Check the waveform_jobs.py file to see how waveforms are stored
cat -n backends/advanced/src/advanced_omi_backend/workers/waveform_jobs.py | sed -n '155,175p'

Repository: SimpleOpenSoftware/chronicle

Length of output: 869


🏁 Script executed:

# Search for any insert/update logic for waveforms
rg -n "insert\|insert_one\|replace\|update" --type=py backends/advanced/src/advanced_omi_backend/workers/waveform_jobs.py -B 2 -A 2

Repository: SimpleOpenSoftware/chronicle

Length of output: 54


🏁 Script executed:

# Check if there's logic to handle duplicate waveforms or find existing ones
rg -n "find.*conversation_id\|replace_one\|update_one" --type=py backends/advanced/src/advanced_omi_backend/ -B 1 -A 3 | grep -i waveform -A 5 -B 5

Repository: SimpleOpenSoftware/chronicle

Length of output: 54


🏁 Script executed:

# Check if Beanie's unique index syntax uses a list wrapper for uniqueness
rg -n "indexes.*\[.*unique\|unique.*index" --type=py backends/advanced/src/advanced_omi_backend/models/ -B 1 -A 3

Repository: SimpleOpenSoftware/chronicle

Length of output: 54


🏁 Script executed:

# Look at how Beanie is configured and check if there's a way to define unique indexes
rg -n "class Settings\|indexes = " --type=py backends/advanced/src/advanced_omi_backend/models/ -A 5 | head -60

Repository: SimpleOpenSoftware/chronicle

Length of output: 54


🏁 Script executed:

# Get all Settings classes to see index patterns
rg -n "class Settings" --type=py backends/advanced/src/advanced_omi_backend/models/ -A 10

Repository: SimpleOpenSoftware/chronicle

Length of output: 3865


🏁 Script executed:

# Check Beanie documentation on unique indexes in the code comments
rg -n "unique.*index\|\[.*\(.*unique" --type=py backends/advanced/src/advanced_omi_backend/models/

Repository: SimpleOpenSoftware/chronicle

Length of output: 54


🏁 Script executed:

# Let's examine conversation.py Settings section to see how unique constraints are defined
sed -n '320,335p' backends/advanced/src/advanced_omi_backend/models/conversation.py

Repository: SimpleOpenSoftware/chronicle

Length of output: 449


🏁 Script executed:

# Verify how Beanie defines unique indexes by looking at documentation comments or other unique field examples
rg -n "unique=True" --type=py backends/advanced/src/advanced_omi_backend/models/

Repository: SimpleOpenSoftware/chronicle

Length of output: 284


🏁 Script executed:

# Check if there's any code that queries for existing waveforms before inserting (which would suggest one-per-conversation intent)
rg -n "find.*waveform\|WaveformData.*find" --type=py backends/advanced/src/advanced_omi_backend/ -B 2 -A 3

Repository: SimpleOpenSoftware/chronicle

Length of output: 54


🏁 Script executed:

# Check what happens when waveform job runs - does it replace or just insert?
sed -n '140,180p' backends/advanced/src/advanced_omi_backend/workers/waveform_jobs.py

Repository: SimpleOpenSoftware/chronicle

Length of output: 1490


Add unique=True to the conversation_id field if one waveform per conversation is required.

The field description states "unique per conversation" and the index comment claims "Unique lookup by conversation", but the implementation doesn't enforce this constraint. Following the pattern used in Conversation model (line 84 of conversation.py), change Indexed(str) to Indexed(str, unique=True) on line 20. Currently, await waveform_doc.insert() allows duplicate conversation_ids, which contradicts the stated intent.

🧰 Tools
🪛 Ruff (0.14.11)

45-47: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

🤖 Prompt for AI Agents
In `@backends/advanced/src/advanced_omi_backend/models/waveform.py` around lines
43 - 47, The waveform model's conversation_id isn't enforced unique despite the
comment; update the field definition from Indexed(str) to Indexed(str,
unique=True) (the same pattern used in Conversation) so inserts cannot create
duplicate conversation_id values, and keep the Settings.indexes entry as-is;
modify the conversation_id declaration in the Waveform model to use Indexed(str,
unique=True).
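To illustrate what the suggested `unique=True` buys at the database level, here is a minimal stdlib-only sketch — not Beanie code; the in-memory store and the `DuplicateKeyError` name are illustrative assumptions mirroring MongoDB's behavior — of an insert path guarded by a unique index on `conversation_id`:

```python
class DuplicateKeyError(Exception):
    """Raised when an insert would violate the unique index (mirrors MongoDB)."""


class WaveformStore:
    """Toy collection with a unique index on conversation_id."""

    def __init__(self):
        self._docs = []
        self._unique_conversation_ids = set()  # the "unique index"

    def insert(self, doc: dict) -> None:
        cid = doc["conversation_id"]
        if cid in self._unique_conversation_ids:
            raise DuplicateKeyError(f"duplicate conversation_id: {cid}")
        self._unique_conversation_ids.add(cid)
        self._docs.append(doc)


store = WaveformStore()
store.insert({"conversation_id": "c1", "samples": [0.1, 0.5]})
try:
    store.insert({"conversation_id": "c1", "samples": [0.2]})  # rejected
except DuplicateKeyError as e:
    print("rejected:", e)
```

Without the constraint, a re-run of the waveform job would silently add a second document, and reads keyed on conversation_id become ambiguous.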

Comment on lines +128 to +163
# Parse Range header (e.g., "bytes=0-1023")
try:
range_str = range_header.replace("bytes=", "")
range_start, range_end = range_str.split("-")
range_start = int(range_start) if range_start else 0
range_end = int(range_end) if range_end else file_size - 1

# Ensure valid range
range_start = max(0, range_start)
range_end = min(file_size - 1, range_end)
content_length = range_end - range_start + 1

# Extract requested byte range
range_data = wav_data[range_start:range_end + 1]

# Return 206 Partial Content with Range headers
return Response(
content=range_data,
status_code=206,
media_type="audio/wav",
headers={
"Content-Range": f"bytes {range_start}-{range_end}/{file_size}",
"Content-Length": str(content_length),
"Accept-Ranges": "bytes",
"Content-Disposition": f"inline; filename={conversation_id}.wav",
"X-Audio-Source": "mongodb-chunks",
}
)
except (ValueError, IndexError) as e:
# Invalid Range header, return 416 Range Not Satisfiable
return Response(
status_code=416,
headers={
"Content-Range": f"bytes */{file_size}"
}
)

⚠️ Potential issue | 🟡 Minor

Invalid Range requests can produce malformed responses.

The Range parsing doesn't validate that range_start <= range_end after parsing. For requests like bytes=100-50, this produces:

  • content_length = 50 - 100 + 1 = -49 (negative)
  • Empty content from wav_data[100:51]
  • A response with invalid Content-Length: -49

Also, the static analysis correctly notes the unused variable e on line 156.

🐛 Proposed fix to validate range and remove unused variable
         # Ensure valid range
         range_start = max(0, range_start)
         range_end = min(file_size - 1, range_end)
+
+        # Validate range_start <= range_end
+        if range_start > range_end:
+            return Response(
+                status_code=416,
+                headers={
+                    "Content-Range": f"bytes */{file_size}"
+                }
+            )
+
         content_length = range_end - range_start + 1

         # Extract requested byte range
         range_data = wav_data[range_start:range_end + 1]
-    except (ValueError, IndexError) as e:
+    except (ValueError, IndexError):
         # Invalid Range header, return 416 Range Not Satisfiable
         return Response(
             status_code=416,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Parse Range header (e.g., "bytes=0-1023")
try:
range_str = range_header.replace("bytes=", "")
range_start, range_end = range_str.split("-")
range_start = int(range_start) if range_start else 0
range_end = int(range_end) if range_end else file_size - 1
# Ensure valid range
range_start = max(0, range_start)
range_end = min(file_size - 1, range_end)
content_length = range_end - range_start + 1
# Extract requested byte range
range_data = wav_data[range_start:range_end + 1]
# Return 206 Partial Content with Range headers
return Response(
content=range_data,
status_code=206,
media_type="audio/wav",
headers={
"Content-Range": f"bytes {range_start}-{range_end}/{file_size}",
"Content-Length": str(content_length),
"Accept-Ranges": "bytes",
"Content-Disposition": f"inline; filename={conversation_id}.wav",
"X-Audio-Source": "mongodb-chunks",
}
)
except (ValueError, IndexError) as e:
# Invalid Range header, return 416 Range Not Satisfiable
return Response(
status_code=416,
headers={
"Content-Range": f"bytes */{file_size}"
}
)
# Parse Range header (e.g., "bytes=0-1023")
try:
range_str = range_header.replace("bytes=", "")
range_start, range_end = range_str.split("-")
range_start = int(range_start) if range_start else 0
range_end = int(range_end) if range_end else file_size - 1
# Ensure valid range
range_start = max(0, range_start)
range_end = min(file_size - 1, range_end)
# Validate range_start <= range_end
if range_start > range_end:
return Response(
status_code=416,
headers={
"Content-Range": f"bytes */{file_size}"
}
)
content_length = range_end - range_start + 1
# Extract requested byte range
range_data = wav_data[range_start:range_end + 1]
# Return 206 Partial Content with Range headers
return Response(
content=range_data,
status_code=206,
media_type="audio/wav",
headers={
"Content-Range": f"bytes {range_start}-{range_end}/{file_size}",
"Content-Length": str(content_length),
"Accept-Ranges": "bytes",
"Content-Disposition": f"inline; filename={conversation_id}.wav",
"X-Audio-Source": "mongodb-chunks",
}
)
except (ValueError, IndexError):
# Invalid Range header, return 416 Range Not Satisfiable
return Response(
status_code=416,
headers={
"Content-Range": f"bytes */{file_size}"
}
)
🧰 Tools
🪛 Ruff (0.14.11)

156-156: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

🤖 Prompt for AI Agents
In `@backends/advanced/src/advanced_omi_backend/routers/modules/audio_routes.py`
around lines 128 - 163, Validate the parsed byte range after splitting
range_header: after computing range_start and range_end (and clamping to
0..file_size-1) ensure range_start <= range_end; if not, return a 416 Range Not
Satisfiable with "Content-Range: bytes */{file_size}". Also guard against
negative content_length by computing content_length = range_end - range_start +
1 only after the validation, and use the slice wav_data[range_start:range_end +
1] only when valid. Remove the unused exception variable `e` in the except
clause so the handler catches (ValueError, IndexError) and returns 416 without
referencing `e`.
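The validation described above can be factored into a small standalone parser. This is a hedged sketch — the helper name `parse_byte_range` is not from the PR — that returns the clamped inclusive `(start, end)` pair, or `None` when the caller should answer 416:

```python
from typing import Optional, Tuple


def parse_byte_range(range_header: str, file_size: int) -> Optional[Tuple[int, int]]:
    """Parse a single-range "bytes=start-end" header.

    Returns (start, end) inclusive, clamped to the file, or None when the
    request is unsatisfiable. Note: like the PR code, this treats "bytes=-N"
    as 0..N rather than as an RFC 7233 suffix range.
    """
    try:
        range_str = range_header.replace("bytes=", "")
        start_s, end_s = range_str.split("-")
        start = int(start_s) if start_s else 0
        end = int(end_s) if end_s else file_size - 1
    except ValueError:
        return None  # malformed header -> 416

    start = max(0, start)
    end = min(file_size - 1, end)
    if start > end:  # e.g. "bytes=100-50", or start beyond end of file
        return None
    return start, end


print(parse_byte_range("bytes=0-1023", 4096))  # (0, 1023)
print(parse_byte_range("bytes=100-50", 4096))  # None -> 416
print(parse_byte_range("bytes=4000-", 4096))   # (4000, 4095)
```

Keeping the parse pure makes the negative-length case trivially testable without spinning up the FastAPI route.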

host: str = "localhost",
port: int = 6333,
collection_name: str = "omi_memories",
collection_name: str = "chronicle_memories",

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for references to "omi_memories" across the codebase
echo "=== References to omi_memories ==="
rg -n "omi_memories" --type py --type md --type yaml --type json --type txt 2>/dev/null | head -50

echo ""
echo "=== Migration documentation or scripts ==="
fd -t f \( -name "*migrat*" -o -name "*MIGRAT*" \) 2>/dev/null | head -20

echo ""
echo "=== Checking for environment variable configuration ==="
rg -n "COLLECTION|collection" backends/advanced/src/advanced_omi_backend/services/memory/config.py -A 2 -B 2 2>/dev/null | head -80

Repository: SimpleOpenSoftware/chronicle

Length of output: 1336


Address inconsistency between code defaults and configuration file defaults for collection name migration.

The code default for collection_name changed to "chronicle_memories" (lines 143, 266), but existing configuration files (config/defaults.yml, test configs) still reference "omi_memories". This creates an inconsistent upgrade experience:

  • Deployments using config files will retain the old collection name (preserving existing data)
  • Deployments without explicit configuration may switch to the new name, causing data access issues

Since existing configurations still specify "omi_memories", consider:

  1. Update configuration files to use the new default name, OR
  2. Revert the code default to match existing configs, OR
  3. Add migration documentation explaining the change and how to explicitly configure the collection name

No migration scripts or documentation currently exists for this breaking change.

🤖 Prompt for AI Agents
In `@backends/advanced/src/advanced_omi_backend/services/memory/config.py` at line
143, The code default for collection_name (variable collection_name in
config.py) was changed to "chronicle_memories" while existing defaults.yml and
tests still reference "omi_memories", causing a breaking mismatch; fix by
reverting the config.py default back to "omi_memories" (restore collection_name:
str = "omi_memories") so existing deployments keep working, update any unit
tests that assert the default to match, and add a short migration note in
project docs explaining the planned rename and how to opt into
"chronicle_memories" via explicit config so future changes are non-breaking.

Comment on lines +430 to +434
# Get conversation metadata
conversation = await Conversation.get(conversation_id)

if not conversation:
raise ValueError(f"Conversation {conversation_id} not found")

⚠️ Potential issue | 🔴 Critical

Incorrect document lookup method.

Conversation.get(conversation_id) retrieves a document by its MongoDB _id. However, conversation_id is a separate indexed field on the Conversation model, not the document's _id. This will fail to find conversations.

Use the same pattern as elsewhere in this file (lines 669-671):

🐛 Proposed fix
-    conversation = await Conversation.get(conversation_id)
+    conversation = await Conversation.find_one(
+        Conversation.conversation_id == conversation_id
+    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Get conversation metadata
conversation = await Conversation.get(conversation_id)
if not conversation:
raise ValueError(f"Conversation {conversation_id} not found")
# Get conversation metadata
conversation = await Conversation.find_one(
Conversation.conversation_id == conversation_id
)
if not conversation:
raise ValueError(f"Conversation {conversation_id} not found")
🧰 Tools
🪛 Ruff (0.14.11)

434-434: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In `@backends/advanced/src/advanced_omi_backend/utils/audio_chunk_utils.py` around
lines 430 - 434, The code incorrectly uses Conversation.get(conversation_id)
which looks up by MongoDB _id; replace it with the indexed-field lookup used
elsewhere (pattern at lines 669-671) by awaiting Conversation.find_one with the
filter on the conversation_id field (e.g., await
Conversation.find_one({"conversation_id": conversation_id}) or equivalent
keyword-arg form) and keep the existing not-found ValueError branch.

Comment on lines +19 to +34
sample_rate: int = 3,
) -> Dict[str, Any]:
"""
Generate waveform visualization data from conversation audio chunks.
This function:
1. Retrieves Opus-compressed audio chunks from MongoDB
2. Decodes each chunk to PCM
3. Downsamples PCM to target sample rate (e.g., 10 samples/sec)
4. Calculates amplitude peaks for each sample window
5. Normalizes to [-1.0, 1.0] range
6. Stores in WaveformData collection
Args:
conversation_id: Conversation ID to generate waveform for
sample_rate: Samples per second for waveform (default: 10)

⚠️ Potential issue | 🟡 Minor

Docstring/signature mismatch for sample_rate default.

The docstring states the default is 10, but the actual default in the signature is 3. Update the docstring to reflect the correct default.

📝 Suggested fix
-        sample_rate: Samples per second for waveform (default: 10)
+        sample_rate: Samples per second for waveform (default: 3)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
sample_rate: int = 3,
) -> Dict[str, Any]:
"""
Generate waveform visualization data from conversation audio chunks.
This function:
1. Retrieves Opus-compressed audio chunks from MongoDB
2. Decodes each chunk to PCM
3. Downsamples PCM to target sample rate (e.g., 10 samples/sec)
4. Calculates amplitude peaks for each sample window
5. Normalizes to [-1.0, 1.0] range
6. Stores in WaveformData collection
Args:
conversation_id: Conversation ID to generate waveform for
sample_rate: Samples per second for waveform (default: 10)
sample_rate: int = 3,
) -> Dict[str, Any]:
"""
Generate waveform visualization data from conversation audio chunks.
This function:
1. Retrieves Opus-compressed audio chunks from MongoDB
2. Decodes each chunk to PCM
3. Downsamples PCM to target sample rate (e.g., 10 samples/sec)
4. Calculates amplitude peaks for each sample window
5. Normalizes to [-1.0, 1.0] range
6. Stores in WaveformData collection
Args:
conversation_id: Conversation ID to generate waveform for
sample_rate: Samples per second for waveform (default: 3)
🤖 Prompt for AI Agents
In `@backends/advanced/src/advanced_omi_backend/workers/waveform_jobs.py` around
lines 19 - 34, The docstring for the function that accepts parameter sample_rate
incorrectly states the default is 10 while the function signature sets
sample_rate: int = 3; update the docstring to state the correct default (3) so
the signature and documentation match, referencing the sample_rate parameter in
the function (waveform generation function in waveform_jobs.py) and adjust the
"Args" section to list "sample_rate: Samples per second for waveform (default:
3)".
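Steps 3-5 of the pipeline described in the docstring (downsample, peak per window, normalize) can be sketched with the stdlib alone. This is an illustrative helper, not the PR's implementation — it assumes PCM as a list of signed ints and normalizes peak magnitudes to [0, 1], a simplification of the [-1.0, 1.0] range the docstring mentions:

```python
def waveform_peaks(pcm: list, pcm_rate: int, sample_rate: int = 3) -> list:
    """Reduce PCM to `sample_rate` amplitude peaks per second, normalized to [0, 1]."""
    if not pcm:
        return []
    window = max(1, pcm_rate // sample_rate)  # PCM frames per waveform sample
    peaks = [
        max(abs(s) for s in pcm[i:i + window])
        for i in range(0, len(pcm), window)
    ]
    scale = max(peaks) or 1  # avoid dividing by zero on pure silence
    return [p / scale for p in peaks]


# One second of fake "PCM" at 8 Hz, downsampled to 2 waveform samples:
print(waveform_peaks([0, 100, -200, 50, 300, -400, 10, 20], pcm_rate=8, sample_rate=2))
# -> [0.5, 1.0]
```

A pure function like this is also easy to pin the default on, which would make docstring/signature drift (10 vs. 3) visible in tests.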

Comment on lines +278 to +293
async def _export_audio_wav(self, stats: CleanupStats):
"""Export audio as WAV files (1-minute chunks)"""
logger.info("Exporting audio as WAV files (this may take a while)...")

# Get all unique conversation IDs
conversations = await Conversation.find_all().to_list()
audio_dir = self.backup_path / "audio"

for conv in conversations:
try:
await self._export_conversation_audio(conv.conversation_id, audio_dir)
except Exception as e:
logger.warning(f"Failed to export audio for {conv.conversation_id}: {e}")
continue

logger.info("Audio export completed")

⚠️ Potential issue | 🟡 Minor

Unused stats parameter in _export_audio_wav.

The stats parameter is passed to this method but never used. Either remove it from the signature or use it to track export statistics (e.g., count of exported WAV files).

🐛 Suggested fix - remove unused parameter
-    async def _export_audio_wav(self, stats: CleanupStats):
+    async def _export_audio_wav(self):

And update the call at line 120:

-                await self._export_audio_wav(stats)
+                await self._export_audio_wav()
🧰 Tools
🪛 Ruff (0.14.11)

278-278: Unused method argument: stats

(ARG002)


289-289: Do not catch blind exception: Exception

(BLE001)

Comment on lines +77 to +95
const drawWaveform = (
ctx: CanvasRenderingContext2D,
samples: number[],
width: number,
height: number
) => {
const barWidth = width / samples.length;
const centerY = height / 2;

ctx.fillStyle = '#3b82f6'; // Blue bars (Tailwind blue-500)

samples.forEach((amplitude, i) => {
const x = i * barWidth;
const barHeight = Math.max(1, amplitude * centerY); // Ensure minimum 1px height

// Draw bar centered vertically
ctx.fillRect(x, centerY - barHeight, barWidth - 1, barHeight * 2);
});
};

⚠️ Potential issue | 🟡 Minor

Guard against empty samples array.

If samples is empty, barWidth becomes Infinity (division by zero), and the forEach will be a no-op anyway. Add an early return for clarity and to prevent potential issues.

🐛 Suggested fix
  const drawWaveform = (
    ctx: CanvasRenderingContext2D,
    samples: number[],
    width: number,
    height: number
  ) => {
+   if (samples.length === 0) return;
+
    const barWidth = width / samples.length;
    const centerY = height / 2;
🤖 Prompt for AI Agents
In `@backends/advanced/webui/src/components/audio/WaveformDisplay.tsx` around
lines 77 - 95, The drawWaveform function can divide by zero when samples is
empty; add an early guard at the start of drawWaveform that returns immediately
if !samples || samples.length === 0 to avoid computing barWidth = width /
samples.length and any downstream issues; update the function (drawWaveform) to
perform this check before computing barWidth/centerY and before using
samples.forEach.

Comment on lines +59 to +71
SDK Upload Respects Backend File Size Limit
[Documentation] Verify SDK properly reports backend errors for oversized files
[Tags] audio-upload
# Note: This tests that SDK handles backend rejection gracefully
# The 30-minute limit is enforced by the backend, not the SDK
# Full test would require a 30+ minute audio file

${result}= Run Process uv run python
... ${CURDIR}/../scripts/sdk_test_auth.py
... ${BACKEND_URL} ${ADMIN_EMAIL} ${ADMIN_PASSWORD}
Should Be Equal As Integers ${result.rc} 0 SDK should handle backend errors gracefully


⚠️ Potential issue | 🟡 Minor

Test case doesn't match its documented purpose.

This test claims to verify "SDK properly reports backend errors for oversized files" but actually just re-runs the authentication script. Consider either:

  1. Skipping this test like other unimplemented features
  2. Renaming it to reflect what it actually tests
Proposed fix - skip until properly implemented
 SDK Upload Respects Backend File Size Limit
     [Documentation]    Verify SDK properly reports backend errors for oversized files
     [Tags]    audio-upload
-
-    # Note: This tests that SDK handles backend rejection gracefully
-    # The 30-minute limit is enforced by the backend, not the SDK
-    # Full test would require a 30+ minute audio file
-
-    ${result}=    Run Process    uv    run    python
-    ...    ${CURDIR}/../scripts/sdk_test_auth.py
-    ...    ${BACKEND_URL}    ${ADMIN_EMAIL}    ${ADMIN_PASSWORD}
-    Should Be Equal As Integers    ${result.rc}    0    SDK should handle backend errors gracefully
+    Skip    Test requires 30+ minute audio file - not implemented
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
SDK Upload Respects Backend File Size Limit
[Documentation] Verify SDK properly reports backend errors for oversized files
[Tags] audio-upload
# Note: This tests that SDK handles backend rejection gracefully
# The 30-minute limit is enforced by the backend, not the SDK
# Full test would require a 30+ minute audio file
${result}= Run Process uv run python
... ${CURDIR}/../scripts/sdk_test_auth.py
... ${BACKEND_URL} ${ADMIN_EMAIL} ${ADMIN_PASSWORD}
Should Be Equal As Integers ${result.rc} 0 SDK should handle backend errors gracefully
SDK Upload Respects Backend File Size Limit
[Documentation] Verify SDK properly reports backend errors for oversized files
[Tags] audio-upload
Skip Test requires 30+ minute audio file - not implemented
🤖 Prompt for AI Agents
In `@tests/integration/sdk_tests.robot` around lines 59 - 71, The test "SDK Upload
Respects Backend File Size Limit" currently only re-runs the authentication
script (Run Process ... ${CURDIR}/../scripts/sdk_test_auth.py) and doesn't
exercise upload/oversize behavior; either mark it as skipped until implemented
or rename it to reflect it's a placeholder: add a skip tag (e.g., add "skip" to
the [Tags] line) and update the test name or [Documentation] to indicate it's a
placeholder that re-runs sdk_test_auth.py, or alternatively change the test name
to something like "Placeholder: Re-run SDK auth script" so the intent and
behavior match.

Comment on lines +330 to +342
Get Job Result
[Documentation] Get the result field of a completed job
... Useful for checking job output/return values
[Arguments] ${job_id}

# Get full job details
${response}= GET On Session api /api/queue/job/${job_id}
... expected_status=200

${job_data}= Set Variable ${response.json()}
${result}= Set Variable ${job_data}[result]

RETURN ${result}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for queue API route definitions

# First, let's find the structure of the repository
echo "=== Repository structure ==="
find . -type f -name "*.py" | head -20

echo ""
echo "=== Searching for API route definitions with 'queue' and 'job' ==="
rg -n --type py '/api/queue/job' | head -20

echo ""
echo "=== Searching for route decorators with queue and job endpoints ==="
rg -n --type py '@.*route.*queue.*job|@.*get.*queue.*job|@.*post.*queue.*job' -C3

echo ""
echo "=== Searching for 'queue' in route definitions ==="
rg -n --type py '/queue/jobs?/' | head -30

Repository: SimpleOpenSoftware/chronicle

Length of output: 2106


🏁 Script executed:

#!/bin/bash
# Search for queue controller and router definitions

echo "=== Queue Controller File ==="
cat -n backends/advanced/src/advanced_omi_backend/controllers/queue_controller.py | head -100

echo ""
echo "=== Queue-related files in routers ==="
find . -type f -name "*.py" -path "*/routers/*" | xargs grep -l "queue" 2>/dev/null

echo ""
echo "=== Main router setup ==="
find . -type f -name "*router*.py" -o -name "routes.py" | head -10

Repository: SimpleOpenSoftware/chronicle

Length of output: 4783


🏁 Script executed:

#!/bin/bash
# Examine queue routes for endpoint definitions

echo "=== Queue Routes File ==="
wc -l backends/advanced/src/advanced_omi_backend/routers/modules/queue_routes.py

echo ""
echo "=== Complete Queue Routes File ==="
cat -n backends/advanced/src/advanced_omi_backend/routers/modules/queue_routes.py

Repository: SimpleOpenSoftware/chronicle

Length of output: 50384


🏁 Script executed:

#!/bin/bash
# Check the actual endpoint in the test file

echo "=== Checking lines 330-342 in test file ==="
sed -n '330,342p' tests/resources/queue_keywords.robot

Repository: SimpleOpenSoftware/chronicle

Length of output: 554


Fix endpoint path: change /api/queue/job/ to /api/queue/jobs/

The endpoint uses the singular form /api/queue/job/${job_id}, but the API backend only defines endpoints with the plural form /api/queue/jobs/{job_id}. This mismatch will cause a 404 error at runtime. All other job-specific endpoints in this file correctly use the plural form.

Verified Fix
     # Get full job details
-    ${response}=    GET On Session    api    /api/queue/job/${job_id}
+    ${response}=    GET On Session    api    /api/queue/jobs/${job_id}
     ...    expected_status=200
🤖 Prompt for AI Agents
In `@tests/resources/queue_keywords.robot` around lines 330 - 342, The Robot
keyword "Get Job Result" is calling the wrong endpoint path
"/api/queue/job/${job_id}" which should be plural; update the GET call in that
keyword to use "/api/queue/jobs/${job_id}" so it matches the backend route and
will return 200; ensure the GET On Session invocation and any related variables
(${response}, ${job_data}, ${result}) remain unchanged.
