Monitoring & Observability
Monitor your Boards deployment with structured logging, health checks, and integration with external monitoring tools.
Health Checks
API Health Endpoint
The backend exposes a health check endpoint:
GET /health
Response:
{
"status": "healthy",
"version": "0.9.10"
}
Use this endpoint for:
- Load balancer health checks
- Kubernetes readiness/liveness probes
- Uptime monitoring services
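For a custom check outside these tools, the probe can be sketched in Python with only the standard library. This is a minimal sketch, not part of Boards: the function names `is_healthy` and `probe` are hypothetical, and the default URL and 3-second timeout simply mirror the Docker Compose example in this guide.

```python
import json
import urllib.error
import urllib.request


def is_healthy(status_code: int, body: bytes) -> bool:
    """Return True only for HTTP 200 with a JSON body whose status is "healthy"."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def probe(url: str = "http://localhost:8800/health", timeout: float = 3.0) -> bool:
    """Fetch the health endpoint once; any network or HTTP error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_healthy(response.status, response.read())
    except (urllib.error.URLError, OSError):
        return False
```

A wrapper script could call `probe()` and exit non-zero on `False`, which is the convention most schedulers and health check runners expect.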
Docker Compose
services:
  api:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8800/health"]
      interval: 30s
      timeout: 3s
      retries: 3
      start_period: 10s
Kubernetes
readinessProbe:
  httpGet:
    path: /health
    port: 8800
  initialDelaySeconds: 10
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /health
    port: 8800
  initialDelaySeconds: 30
  periodSeconds: 10
Structured Logging
Boards uses structlog for structured logging with consistent formatting and context.
Configuration
# Log level: debug, info, warning, error
BOARDS_LOG_LEVEL=info
# Format: console (human-readable) or json (structured)
BOARDS_LOG_FORMAT=json
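To illustrate how two such variables could drive structlog, here is a hedged sketch. It is not Boards' actual startup code: the helper names `settings_from_env` and `configure_logging`, the fallback defaults (`info`, `console`), and the processor chain are all assumptions chosen for illustration.

```python
import logging
import os
from typing import Mapping, Tuple


def settings_from_env(env: Mapping[str, str]) -> Tuple[int, str]:
    """Map BOARDS_LOG_LEVEL / BOARDS_LOG_FORMAT to a numeric level and a format name."""
    levels = {
        "debug": logging.DEBUG,
        "info": logging.INFO,
        "warning": logging.WARNING,
        "error": logging.ERROR,
    }
    # The fallback values below are assumptions, not documented Boards defaults.
    level_name = env.get("BOARDS_LOG_LEVEL", "info").lower()
    log_format = env.get("BOARDS_LOG_FORMAT", "console").lower()
    if level_name not in levels:
        raise ValueError(f"unsupported log level: {level_name!r}")
    if log_format not in ("console", "json"):
        raise ValueError(f"unsupported log format: {log_format!r}")
    return levels[level_name], log_format


def configure_logging(env: Mapping[str, str] = os.environ) -> None:
    """Configure structlog according to the two environment variables."""
    # Third-party dependency, imported lazily so settings_from_env works without it.
    import structlog

    level, log_format = settings_from_env(env)
    renderer = (
        structlog.processors.JSONRenderer()
        if log_format == "json"
        else structlog.dev.ConsoleRenderer()
    )
    structlog.configure(
        processors=[
            structlog.contextvars.merge_contextvars,  # pull in bound request context
            structlog.processors.add_log_level,
            structlog.processors.TimeStamper(fmt="iso", utc=True),
            renderer,
        ],
        # Drop events below the configured level before any processing happens.
        wrapper_class=structlog.make_filtering_bound_logger(level),
        cache_logger_on_first_use=True,
    )
```

Validating the values up front means a typo in either variable fails at startup rather than silently falling back to a different level or format.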
Log Format
Console format (development):
2024-01-15 10:30:45 [info ] Request started method=POST path=/graphql request_id=abc123
2024-01-15 10:30:45 [info ] Request completed method=POST path=/graphql status=200 duration_ms=45
JSON format (production):
{"event": "Request started", "method": "POST", "path": "/graphql", "request_id": "abc123", "timestamp": "2024-01-15T10:30:45.123Z", "level": "info"}
{"event": "Request completed", "method": "POST", "path": "/graphql", "status": 200, "duration_ms": 45, "request_id": "abc123", "timestamp": "2024-01-15T10:30:45.168Z", "level": "info"}
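Because each JSON log line is a self-contained record, request metrics can be derived from the logs directly. The sketch below assumes only the fields shown in the sample lines (`event`, `status`, `duration_ms`); the function name `summarize_requests` and the choice to count status codes of 500 and above as errors are illustrative.

```python
import json
from typing import Dict, Iterable


def summarize_requests(lines: Iterable[str]) -> Dict[str, float]:
    """Aggregate "Request completed" events into count, mean latency, and 5xx rate."""
    count = 0
    total_ms = 0.0
    server_errors = 0
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(record, dict) or record.get("event") != "Request completed":
            continue
        count += 1
        total_ms += record.get("duration_ms", 0)
        if record.get("status", 0) >= 500:
            server_errors += 1
    if count == 0:
        return {"requests": 0, "mean_duration_ms": 0.0, "error_rate": 0.0}
    return {
        "requests": count,
        "mean_duration_ms": total_ms / count,
        "error_rate": server_errors / count,
    }
```

Passing an open log file works as-is, since a file object iterates line by line.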
Request Context
Each log entry written while handling a request includes:
- request_id: Unique identifier for tracing
- method: HTTP method
- path: Request path
- user_id: Authenticated user (if available)
- tenant_id: Tenant identifier (if multi-tenant)
Worker Logs
Worker processes log job execution:
{"event": "Job started", "job_id": "job-123", "generator": "FluxProGenerator", "timestamp": "..."}
{"event": "Job completed", "job_id": "job-123", "duration_ms": 5432, "timestamp": "..."}
{"event": "Job failed", "job_id": "job-456", "error": "API rate limit exceeded", "timestamp": "..."}
Log Aggregation
Docker Compose
Write logs to rotated local files with Docker's default json-file driver:
services:
  api:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
Or forward logs to an aggregation service with a driver such as fluentd:
services:
  api:
    logging:
      driver: "fluentd"
      options:
        fluentd-address: "localhost:24224"
        tag: "boards.api"
Kubernetes
Logs are collected automatically by the cluster logging solution. Common options:
- EFK Stack: Elasticsearch, Fluentd, Kibana
- Loki + Grafana: Lightweight log aggregation
- Cloud Logging: AWS CloudWatch, GCP Cloud Logging, Azure Monitor
Example Fluentd config for parsing JSON logs:
<filter kubernetes.var.log.containers.boards-api**>
  @type parser
  key_name log
  <parse>
    @type json
  </parse>
</filter>
External Monitoring Integration
Prometheus (Metrics)
Boards doesn't expose Prometheus metrics natively, but you can add metrics collection:
Option 1: Sidecar exporter
Use a sidecar to expose metrics from logs:
services:
  api:
    # ... api config
  metrics-exporter:
    image: google/mtail
    volumes:
      - api-logs:/var/log/boards:ro
      - ./mtail:/etc/mtail:ro
    ports:
      - "3903:3903"
Option 2: Application-level metrics
Add custom instrumentation in your application layer.
Grafana
Visualize logs and metrics with Grafana:
- Connect to your log aggregation (Loki, Elasticsearch)
- Create dashboards for:
  - Request rate and latency
  - Error rates
  - Job queue depth and processing time
  - Database connection pool
Sentry (Error Tracking)
Capture and track errors with Sentry:
# Add to your application's environment
SENTRY_DSN=https://key@sentry.io/project
For the backend, configure in your application startup:
import os
import sentry_sdk

sentry_sdk.init(dsn=os.environ.get("SENTRY_DSN"))
Datadog
To integrate with Datadog, run the Datadog Agent alongside the API and enable container log collection:
services:
  datadog-agent:
    image: datadog/agent:latest
    environment:
      - DD_API_KEY=your-api-key
      - DD_LOGS_ENABLED=true
      - DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /proc/:/host/proc/:ro
      - /sys/fs/cgroup/:/host/sys/fs/cgroup:ro
  api:
    labels:
      com.datadoghq.ad.logs: '[{"source": "python", "service": "boards-api"}]'
Uptime Monitoring
External Health Checks
Use an uptime monitoring service to check your deployment from outside your network. Configure it to check:
- GET https://api.boards.example.com/health
- Expected response: 200 OK
- Check interval: 1-5 minutes
Alerting
Set up alerts for:
| Condition | Severity | Action |
|---|---|---|
| Health check fails | Critical | Page on-call |
| Error rate > 5% | Warning | Notify team |
| Response time > 2s | Warning | Investigate |
| Disk usage > 80% | Warning | Scale storage |
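The table can be read as a set of threshold rules. The sketch below encodes them in Python to make the comparisons explicit; the `Metrics` and `Alert` types and the `evaluate` function are hypothetical names, and in practice these rules would usually live in your alerting tool rather than in application code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Metrics:
    """One snapshot of the signals listed in the alert table."""
    health_check_ok: bool
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    response_time_s: float  # response time in seconds
    disk_usage: float       # fraction of disk used, 0.0 to 1.0


@dataclass(frozen=True)
class Alert:
    condition: str
    severity: str
    action: str


def evaluate(metrics: Metrics) -> List[Alert]:
    """Return one alert per table row whose condition currently holds."""
    alerts = []
    if not metrics.health_check_ok:
        alerts.append(Alert("Health check fails", "Critical", "Page on-call"))
    if metrics.error_rate > 0.05:
        alerts.append(Alert("Error rate > 5%", "Warning", "Notify team"))
    if metrics.response_time_s > 2.0:
        alerts.append(Alert("Response time > 2s", "Warning", "Investigate"))
    if metrics.disk_usage > 0.80:
        alerts.append(Alert("Disk usage > 80%", "Warning", "Scale storage"))
    return alerts
```

All comparisons are strict, matching the `>` in the table, so a value exactly at a threshold does not fire.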
Database Monitoring
PostgreSQL Metrics
Monitor key database metrics:
-- Active connections
SELECT count(*) FROM pg_stat_activity WHERE state = 'active';
-- Connection utilization
SELECT count(*) * 100.0 / current_setting('max_connections')::int
FROM pg_stat_activity;
-- Slow queries (requires pg_stat_statements; the column is named mean_time before PostgreSQL 13)
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
Redis Monitoring
Monitor Redis queue depth and memory usage:
# Queue length
redis-cli LLEN boards:queue
# Memory usage
redis-cli INFO memory
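The `INFO` output is plain `key:value` text grouped under `# Section` headers, so it is easy to turn into numbers for a dashboard or alert. A minimal parsing sketch follows, assuming you capture the command's output as a string; the helper names `parse_redis_info` and `memory_utilization` are hypothetical.

```python
from typing import Dict, Optional


def parse_redis_info(text: str) -> Dict[str, str]:
    """Parse `redis-cli INFO` output, skipping blank lines and "# Section" headers."""
    fields = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def memory_utilization(fields: Dict[str, str]) -> Optional[float]:
    """Return used_memory / maxmemory, or None when maxmemory is 0 (no limit set)."""
    used = int(fields["used_memory"])
    limit = int(fields.get("maxmemory", "0"))
    return used / limit if limit > 0 else None
```

Returning `None` for an unlimited instance avoids a division by zero and makes it clear that there is no ceiling to measure against.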
Performance Debugging
Request Tracing
Use request IDs to trace requests through the system:
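- Find the request ID in the frontend console or response headers
- Search the logs for it. With console-format logs: grep "request_id=abc123" /var/log/boards/api.log (JSON-format logs store the field as "request_id": "abc123", so adjust the pattern)
- Correlate with worker logs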
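With JSON-format logs, the same trace can be done by parsing each line instead of pattern matching. The sketch below is one way to do it, assuming that every relevant record carries a `request_id` field (the worker log samples in this guide do not show one, so that part is an assumption) and that timestamps are ISO 8601 in UTC, which sort correctly as strings. The name `trace_request` is hypothetical.

```python
import json
from typing import Dict, Iterable, List


def trace_request(lines: Iterable[str], request_id: str) -> List[Dict]:
    """Return all JSON log records for one request, ordered by timestamp."""
    matches = []
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # ignore non-JSON lines
        if isinstance(record, dict) and record.get("request_id") == request_id:
            matches.append(record)
    # ISO 8601 UTC timestamps in a uniform format compare correctly as strings.
    return sorted(matches, key=lambda record: record.get("timestamp", ""))
```

To follow a request across services, concatenate the lines from the API and worker log files before calling the function.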
Slow Query Analysis
Enable slow query logging in PostgreSQL:
ALTER SYSTEM SET log_min_duration_statement = 1000; -- Log queries > 1 second
SELECT pg_reload_conf();
Recommended Monitoring Stack
For a complete monitoring setup:
| Component | Tool | Purpose |
|---|---|---|
| Logs | Loki + Grafana | Log aggregation and search |
| Metrics | Prometheus + Grafana | System and application metrics |
| Errors | Sentry | Error tracking and alerting |
| Uptime | UptimeRobot | External health monitoring |
| APM | Datadog (optional) | Full application performance |
Next Steps
- Docker Deployment - Configure logging in Docker
- Kubernetes Deployment - K8s logging and monitoring
- Configuration Reference - Logging environment variables