28 commits
559baf3
cometchat in a box
swapnil-cometchat Jan 6, 2026
ea3c4c2
updates docs
swapnil-cometchat Jan 8, 2026
86b137b
updates content
swapnil-cometchat Jan 13, 2026
e87343a
Merge branch 'main' into docs/cometchat-in-a-box
swapnil-cometchat Jan 16, 2026
47c4326
Update docs.json
swapnil-cometchat Jan 16, 2026
1c216ea
updates docs
swapnil-cometchat Jan 16, 2026
aed4b9e
updates doc
swapnil-cometchat Jan 16, 2026
0a7442c
updates the diagram
swapnil-cometchat Jan 16, 2026
946175f
adds docs for cometchat on prem
swapnil-cometchat Jan 19, 2026
3c01e2c
Update docs.json
swapnil-cometchat Jan 19, 2026
6e09d2c
Update overview.mdx
swapnil-cometchat Jan 19, 2026
24eaf79
Update overview.mdx
swapnil-cometchat Jan 19, 2026
832c5ed
removes what's next
swapnil-cometchat Jan 19, 2026
060bdac
Update docs.json
swapnil-cometchat Jan 20, 2026
c06f70a
updates docs
swapnil-cometchat Jan 20, 2026
a0384d7
Update docs.json
swapnil-cometchat Jan 20, 2026
0fa848e
updates location
swapnil-cometchat Jan 20, 2026
c0d5167
Update index.mdx
swapnil-cometchat Jan 20, 2026
2081ed1
updates the tab for On Prem
swapnil-cometchat Jan 21, 2026
bcf4ad0
updates css
swapnil-cometchat Jan 21, 2026
c3ea448
Update docs.json
swapnil-cometchat Jan 21, 2026
5c213e7
Update docs.json
swapnil-cometchat Jan 21, 2026
bc613cc
Update docs.json
swapnil-cometchat Jan 21, 2026
0a913b3
Update docs.json
swapnil-cometchat Jan 21, 2026
c116d2e
Update docs.json
swapnil-cometchat Jan 21, 2026
63ce99a
merge main branch
swapnil-cometchat Jan 22, 2026
0870a5c
Update docs.json
swapnil-cometchat Jan 22, 2026
d905d19
Update docs.json
swapnil-cometchat Jan 22, 2026
12 changes: 5 additions & 7 deletions assets/version-aligner.css
@@ -51,9 +51,9 @@ html.cc-version-aligned #sidebar-content .cc-version-aligned-row [data-version-a
border-radius: 999px !important;
border: 1px solid rgba(15, 23, 42, 0.15);
transition: border-color 0.2s ease, background-color 0.2s ease;
width: 10rem;
max-width: 10rem;
flex: 0 0 10rem;
width: 12rem;
max-width: 12rem;
flex: 0 0 12rem;
overflow: visible;
}

@@ -100,14 +100,12 @@ html.cc-version-aligned #sidebar-content [data-version-aligner-button] {
.prose :where(img):not(:where([class~=not-prose],[class~=not-prose] *)) {
margin-top: 0em;
margin-bottom: 0em;
}
}

:not(pre)>code {
padding: .125rem 0rem;
}

:not(pre)>code {
padding: .125rem 0rem;
}


}
25 changes: 25 additions & 0 deletions cometchat-on-prem/docker/air-gapped-deployment.mdx
@@ -0,0 +1,25 @@
---
title: "Air-Gapped Deployment"
sidebarTitle: "Air-Gapped"
---

Guidelines for deploying the platform in offline or isolated (air-gapped) environments.

## Offline installation steps

- Export required Docker images with `docker save`
- Transfer images via removable media, secure copy (SSH), or an isolated internal network
- Import images on the target system with `docker load`
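
The export, transfer, and import steps above could be scripted roughly as follows. This is a minimal sketch: the image names, the bundle directory, and the `archive_name`, `export_images`, and `import_images` helpers are hypothetical, so substitute the image list from your own `docker-compose.yml`.

```bash
# Sketch of an offline image transfer. Helper names and image references
# are placeholders, not the actual CometChat image list.

# Turn an image reference into a filesystem-safe archive name,
# e.g. "registry.example.com/chat-api:1.0" -> "registry.example.com_chat-api_1.0.tar".
archive_name() {
  printf '%s.tar\n' "$(printf '%s' "$1" | tr '/:' '__')"
}

# On the connected host: export each image to its own tar archive.
export_images() {
  local bundle_dir="$1" image
  shift
  mkdir -p "$bundle_dir" || return 1
  for image in "$@"; do
    docker save --output "$bundle_dir/$(archive_name "$image")" "$image" || return 1
  done
}

# On the air-gapped host: import every archive found in the bundle directory.
import_images() {
  local bundle_dir="$1" archive
  for archive in "$bundle_dir"/*.tar; do
    [ -e "$archive" ] || continue   # skip when the glob matched nothing
    docker load --input "$archive" || return 1
  done
}
```

For example, `export_images ./bundle example/chat-api:1.0` writes the archives on the connected host; after copying `./bundle` to the target (removable media or `scp -r`), `import_images ./bundle` loads them.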

## Local registry

- Host images in Harbor, Nexus, or a private Docker registry
- Enforce role-based access control (RBAC) and image retention policies
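
Once images are loaded on a host inside the isolated network, they can be retagged and pushed to the internal registry. A minimal sketch, assuming a hypothetical registry address such as `registry.internal.example:5000`; `registry_ref` and `push_to_registry` are illustrative helpers. RBAC and retention policies are configured in the registry product itself and are not shown.

```bash
# Rewrite an image reference so it points at the internal registry.
# An existing registry host (a first path component containing "." or ":",
# or the literal "localhost") is dropped; everything else is kept.
registry_ref() {
  local registry="$1" image="$2" first
  first="${image%%/*}"
  case "$image" in
    */*)
      case "$first" in
        *.*|*:*|localhost) image="${image#*/}" ;;
      esac
      ;;
  esac
  printf '%s/%s\n' "$registry" "$image"
}

# Tag a locally loaded image for the internal registry and push it.
push_to_registry() {
  local registry="$1" image="$2" target
  target="$(registry_ref "$registry" "$image")" || return 1
  docker tag "$image" "$target" && docker push "$target"
}
```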

## Limitations in air-gapped mode

- No access to external push notification services
- No S3 or other cloud object storage unless internally emulated
- No cloud-hosted analytics, logging, or monitoring integrations

Air-gapped deployments require careful planning for certificate management, image updates, and backup strategies. For assistance with compliance requirements (HIPAA, FedRAMP, ISO 27001) or custom air-gapped architectures, [contact us](https://www.cometchat.com/contact-sales).
121 changes: 121 additions & 0 deletions cometchat-on-prem/docker/configuration-reference.mdx
@@ -0,0 +1,121 @@
---
title: "Configuration Reference"
sidebarTitle: "Configuration Reference"
---

Use this reference when updating domains, migrating environments, troubleshooting service misconfiguration, or performing production deployments. Values are sourced from `docker-compose.yml`, service-level `.env` files, and the domain update guide.

## Global notes

- All services read environment variables from their respective directories.
- Domain values must be updated consistently across API, WebSocket, Notifications, Webhooks, and NGINX configurations.
- Changing the primary domain impacts reverse proxy routing, OAuth headers, CORS, webhook endpoints, and TiDB host references.
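
Because the same domain appears in several service directories, searching for leftover references to the previous domain can catch an inconsistent update. The sketch below assumes a hypothetical layout in which all service configuration lives under one directory; `find_stale_domain` is an illustrative helper, not part of the product.

```bash
# List files under a config directory that still mention the old domain.
# Returns 0 when nothing stale is found, 1 otherwise.
find_stale_domain() {
  local old_domain="$1" config_dir="$2" matches
  # -r: recurse, -l: print file names only, -F: match the domain as a literal string
  matches="$(grep -rlF -- "$old_domain" "$config_dir" || true)"
  if [ -n "$matches" ]; then
    printf 'Stale references to %s:\n%s\n' "$old_domain" "$matches"
    return 1
  fi
  printf 'No references to %s found\n' "$old_domain"
}
```

Running `find_stale_domain old.example.com ./config` after a domain change should report no matches; any file it lists still needs updating.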

## Chat API

Update these values when changing domains:

- `MAIN_DOMAIN="<your-domain>"`
- `EXTENSION_DOMAIN="<your-domain>"`
- `WEBHOOKS_BASE_URL="https://webhooks.<your-domain>/v1/webhooks"`
- `TRIGGERS_BASE_URL="https://webhooks.<your-domain>/v1/triggers"`
- `EXTENSION_BASE_URL="https://notifications.<your-domain>"`
- `MODERATION_ENABLED=true`
- `RULES_BASE_URL="https://moderation.<your-domain>/v1/moderation-service"`
- `ADMIN_API_HOST="api.<your-domain>"`
- `CLIENT_API_HOST="apiclient.<your-domain>"`
- `ALLOWED_API_DOMAINS="<your-domain>,<additional-domain>"`
- `DB_HOST="tidb.<your-domain>"`
- `DB_HOST_CREATOR="tidb.<your-domain>"`
- `V3_CHAT_HOST="websocket.<your-domain>"`

## Management API (MGMT API)

- `ADMIN_API_HOST="api.<your-domain>"`
- `CLIENT_API_HOST="apiclient.<your-domain>"`
- `APP_HOST="dashboard.<your-domain>"`
- `API_HOST="https://mgmt-api.<your-domain>"`
- `MGMT_DOMAIN="<your-domain>"`
- `MGMT_DOMAIN_TO_REPLACE="<your-domain>"`
- `RULES_BASE_URL="https://moderation.<your-domain>/v1/moderation"`
- `ACCESS_CONTROL_ALLOW_ORIGIN="<your-domain>,<additional-domain>"`

## WebSocket

Hostnames are derived automatically from NGINX and Chat API configuration; no manual domain updates are required.

## Notifications service

- `CC_DOMAIN="<your-domain>"` (controls routing, token validation, and push delivery)

## Moderation service

- `CHAT_API_URL="<your-domain>"` for rule evaluation, metadata retrieval, and decision submission

## Webhooks service

- `CHAT_API_DOMAIN="<your-domain>"` - must match the Chat API domain exactly to avoid retries or signature verification failures

## Extensions

```json
"DOMAINS": [
"<allowed-domain-1>",
"<allowed-domain-2>",
"<your-domain>"
],
"DOMAIN_NAME": "<your-domain>"
```

Defines CORS and allowed origins for extension traffic.

## Receipt Updater

- `RECEIPTS_MYSQL_HOST="tidb.<your-domain>"` for delivery receipts, read receipts, and thread metadata

## SQL Consumer

```json
"CONNECTION_CONFIG": {
"host": "<tidb-host>"
},
"ALTER_USER_CONFIG": {
"host": "<tidb-host>"
},
"API_CONFIG": {
"API_DOMAIN": "<api-domain>"
}
```

Controls database migrations, multi-tenant provisioning, and internal requests to Chat API.

## NGINX configuration files

Update domain values in:

- `chatapi.conf`
- `extensions.conf`
- `mgmtapi.conf`
- `notifications.conf`
- `dashboard.conf`
- `globalwebhooks.conf`
- `moderation.conf`
- `websocket.conf`

These govern TLS termination, routing, reverse proxy rules, and WebSocket upgrades.
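
After editing these files, it helps to confirm that each one references the new domain and that NGINX accepts the configuration before reloading. A sketch, assuming the files sit together in one directory (the path is deployment-specific); `confs_missing_domain` and `reload_nginx` are hypothetical helpers, while `nginx -t` and `nginx -s reload` are the standard NGINX test and reload commands.

```bash
# Print each of the eight listed files that is missing or does not mention
# the domain. Returns 1 if any file needs attention.
confs_missing_domain() {
  local conf_dir="$1" domain="$2" conf status=0
  for conf in chatapi.conf extensions.conf mgmtapi.conf notifications.conf \
              dashboard.conf globalwebhooks.conf moderation.conf websocket.conf; do
    if ! grep -qF -- "$domain" "$conf_dir/$conf" 2>/dev/null; then
      printf '%s\n' "$conf"
      status=1
    fi
  done
  return "$status"
}

# Validate the configuration and reload only if the syntax check passes.
reload_nginx() {
  nginx -t && nginx -s reload
}
```

When NGINX runs in a container, `reload_nginx` would need to be executed inside that container (for example through `docker exec`).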

## Summary of domain values to update

- Chat API, Client API, and Management API
- Notifications, Moderation, Webhooks, and Extensions services
- NGINX reverse proxy hostnames
- TiDB host references
- WebSocket host configuration in Chat API
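
The hostnames used in the values above follow a fixed pattern under the primary domain, so a quick DNS check is possible after a change. This sketch lists only the subdomains that appear in this reference; `service_hosts` and `check_dns` are hypothetical helper names, and `getent` is assumed to be available on the host.

```bash
# Print the service hostnames used in this reference for a given domain.
service_hosts() {
  local domain="$1" sub
  for sub in api apiclient mgmt-api dashboard websocket webhooks notifications moderation tidb; do
    printf '%s.%s\n' "$sub" "$domain"
  done
}

# Report hostnames that do not resolve from this machine.
# Returns 1 if at least one hostname is unresolved.
check_dns() {
  local host status=0
  for host in $(service_hosts "$1"); do
    if ! getent hosts "$host" >/dev/null; then
      printf 'unresolved: %s\n' "$host"
      status=1
    fi
  done
  return "$status"
}
```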

Configuration changes should be tested in staging environments before production deployment. For assistance with complex multi-region setups, custom domain architectures, or migration planning, [contact us](https://www.cometchat.com/contact-sales).
165 changes: 165 additions & 0 deletions cometchat-on-prem/docker/monitoring.mdx
@@ -0,0 +1,165 @@
---
title: "Monitoring"
sidebarTitle: "Monitoring"
---

Monitoring provides visibility into system health and day-to-day operations, and supports SLA compliance for CometChat On-Prem deployments.

## Monitoring stack

The following open-source tools form the monitoring and observability stack for CometChat On-Prem deployments:

- **Prometheus**: Collects and stores metrics from all services
- **Grafana**: Visualizes metrics with dashboards and alerts
- **Loki**: Stores and queries logs from all containers
- **Promtail**: Tails logs from Docker containers and pushes them to Loki
- **Node Exporter**: Collects host-level metrics (CPU, memory, disk, network)
- **cAdvisor**: Collects container-level resource usage metrics

## Architecture

```mermaid
graph TB
Grafana[Grafana<br/>Dashboards & Visualization]
Prometheus[Prometheus<br/>Metrics Store]
Loki[Loki<br/>Log Store]
NodeExporter[Node Exporter<br/>Host Metrics]
cAdvisor[cAdvisor<br/>Container Metrics]
Promtail[Promtail<br/>Log Collection]
Swarm[Docker Swarm<br/>CometChat Services]

Grafana -->|Query Metrics| Prometheus
Grafana -->|Query Logs| Loki
Prometheus -->|Scrape /metrics| NodeExporter
Prometheus -->|Scrape /metrics| cAdvisor
Promtail -->|Push Logs| Loki
NodeExporter -->|Monitor| Swarm
cAdvisor -->|Monitor| Swarm
Promtail -->|Collect Logs| Swarm
```

## Key metrics to monitor

### Infrastructure
- CPU usage per node
- Memory usage per node
- Disk space and I/O
- Network traffic
- Container resource usage

### Application services
- WebSocket active connections
- Chat API request rate and latency
- API error rates (4xx, 5xx)
- Service uptime

### Data stores
- **Kafka**: Consumer lag, message throughput
- **Redis**: Memory usage, cache hit ratio, connected clients
- **MongoDB**: Operation latency, connections, replication lag
- **TiDB**: Query duration, region health, storage capacity

### Load balancer
- NGINX request rate
- Response status codes
- Active connections

## Alerting

Alerts should focus on user impact, capacity risks, and data integrity rather than raw metric noise.

Set up alerts for these critical conditions:

- CPU usage > 80% for 5 minutes
- Memory usage > 85% for 5 minutes
- Disk space < 15%
- Service down for 2 minutes
- Database query latency > 100ms
- Kafka consumer lag > 10,000 messages
- Redis memory > 90%
- WebSocket connection errors > 10/second
- API error rate > 5%
- Repeated or unexpected container restarts

These thresholds are recommended starting points and should be adjusted based on workload characteristics and environment scale.
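
Some of these conditions can be spot-checked against the Prometheus HTTP API before formal alert rules exist. The sketch below assumes Prometheus listens on `localhost:9090` and uses the standard Node Exporter metric names; `alert_expr` and `prom_query` are hypothetical helpers, and the expressions are starting points rather than rules shipped with the product.

```bash
# Map a condition name to a PromQL expression built on Node Exporter metrics.
alert_expr() {
  case "$1" in
    cpu)    printf '%s\n' '100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80' ;;
    memory) printf '%s\n' '(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 85' ;;
    disk)   printf '%s\n' 'node_filesystem_avail_bytes / node_filesystem_size_bytes * 100 < 15' ;;
    down)   printf '%s\n' 'up == 0' ;;
    *)      printf 'unknown condition: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Run an instant query; a non-empty "result" array in the JSON response
# means the condition holds right now.
prom_query() {
  local expr
  expr="$(alert_expr "$1")" || return 1
  curl -fsS -G 'http://localhost:9090/api/v1/query' --data-urlencode "query=$expr"
}
```

An instant query does not capture the duration part of a condition (such as "for 5 minutes"); in a Prometheus alerting rule that is normally expressed with the rule's `for` field.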

## Grafana dashboards

Create dashboards to visualize:

1. **Overview**: System health, active users, request rates, error rates
2. **Infrastructure**: CPU, memory, disk, network per node
3. **WebSocket**: Active connections, message throughput, errors
4. **API**: Request rate, latency, error rates by endpoint
5. **Databases**: Query performance, connections, replication status
6. **Kafka**: Consumer lag, throughput, partition health
7. **Logs & Error Analysis**: Error aggregation, log volume, search, and correlation with metrics

### Logs & Error Analysis Dashboard

This dashboard provides centralized visibility into application errors, log patterns, and system anomalies for rapid troubleshooting and incident investigation.

**Key Visualizations:**

- **Error Volume by Service**: Time-series graph showing error log count per service, helping identify which components are experiencing issues
- **Top Error Messages**: Table displaying the most frequent error messages with occurrence counts, enabling quick identification of recurring problems
- **Log Volume Trends**: Track total log volume over time to detect unusual spikes that may indicate issues or attacks
- **Error Rate by Severity**: Breakdown of errors by severity level (CRITICAL, ERROR, WARNING) for prioritization
- **Service Health Correlation**: Side-by-side view of error logs and service metrics (CPU, memory, latency) to correlate errors with resource constraints
- **Search & Filter**: Interactive LogQL query panel for ad-hoc log searches and pattern matching
- **Recent Critical Errors**: Live feed of the latest critical errors across all services for immediate awareness

**Use Cases:**
- Rapid incident investigation by correlating errors with metric anomalies
- Identifying error patterns and root causes across distributed services
- Monitoring error trends to detect degradation before user impact
- Post-incident analysis and root cause identification
- Compliance and audit trail review

## Log queries

Use Loki's LogQL to search and filter logs across all services:

```logql
# Chat API log lines containing "error"
{service="chat-api"} |= "error"

# WebSocket connection issues
{service="websocket"} |~ "connection.*failed"

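# API 5xx errors (the optional quote covers NGINX's default combined log format,
# where the request line is closed with a quote before the status code)
{service="nginx"} |~ `HTTP/[0-9.]+"? 5[0-9]{2}`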

# High latency requests
{service="chat-api"} | json | latency > 1000
```
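
The same queries can be run outside Grafana through Loki's HTTP API, which is convenient in scripts. A minimal sketch, assuming Loki listens on `localhost:3100` and that logs carry the `service` label used above; `loki_query` is a hypothetical helper.

```bash
# Run a LogQL query against Loki and print the raw JSON response.
# With no start/end parameters, Loki applies its default time range.
loki_query() {
  local logql="$1" limit="${2:-20}"
  curl -fsS -G 'http://localhost:3100/loki/api/v1/query_range' \
    --data-urlencode "query=$logql" \
    --data-urlencode "limit=$limit"
}

# Example (not executed here): up to 20 Chat API lines containing "error"
# loki_query '{service="chat-api"} |= "error"'
```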

## Troubleshooting

### Check Grafana dashboards first

Start with the Overview dashboard to gauge the blast radius. Confirm whether the issue is node-level, service-level, or data-store related before drilling into component-level dashboards.

### Check Prometheus targets
```bash
curl http://localhost:9090/api/v1/targets
```

### Check Loki status
```bash
curl http://localhost:3100/ready
```

### View Promtail logs
```bash
docker service logs promtail
```

### Check service metrics
```bash
# Node Exporter
curl http://localhost:9100/metrics

# cAdvisor
curl http://localhost:8080/metrics
```
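
The individual checks above can be combined into a single pass. A sketch, assuming the default ports shown in this section; `check_endpoint` and `check_monitoring_stack` are hypothetical helpers, and `/-/ready` is Prometheus's readiness endpoint.

```bash
# Probe one HTTP endpoint and print OK or FAIL. Returns 1 on failure.
check_endpoint() {
  local name="$1" url="$2"
  if curl -fsS -o /dev/null --max-time 5 "$url"; then
    printf 'OK   %s\n' "$name"
  else
    printf 'FAIL %s (%s)\n' "$name" "$url"
    return 1
  fi
}

# Check every monitoring component; returns 1 if any probe failed.
check_monitoring_stack() {
  local status=0
  check_endpoint 'Prometheus'    'http://localhost:9090/-/ready' || status=1
  check_endpoint 'Loki'          'http://localhost:3100/ready'   || status=1
  check_endpoint 'Node Exporter' 'http://localhost:9100/metrics' || status=1
  check_endpoint 'cAdvisor'      'http://localhost:8080/metrics' || status=1
  return "$status"
}
```

Calling `check_monitoring_stack` prints one line per component, so a failing exporter or store is visible at a glance.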