Disaster Recovery
Antei maintains robust disaster recovery (DR) practices to ensure service continuity, data integrity, and rapid recovery following any disruption or incident.
Recovery Objectives
Metric | Target |
---|---|
Recovery Time Objective (RTO) | ≤ 2 hours for core services |
Recovery Point Objective (RPO) | ≤ 15 minutes of data loss |
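
As a rough illustration of how these two targets can be checked against a single incident, the sketch below compares measured downtime and data loss with the values in the table. The function name, its parameters, and the sample timestamps are assumptions made for this example, not part of Antei's tooling.

```python
from datetime import datetime, timedelta

# Targets taken from the table above.
RTO = timedelta(hours=2)
RPO = timedelta(minutes=15)


def meets_objectives(last_backup: datetime, outage_start: datetime,
                     service_restored: datetime) -> dict:
    """Compare one incident's measured downtime and data loss with the targets."""
    downtime = service_restored - outage_start   # measured against the RTO
    data_loss = outage_start - last_backup       # measured against the RPO
    return {
        "downtime": downtime,
        "data_loss": data_loss,
        "rto_met": downtime <= RTO,
        "rpo_met": data_loss <= RPO,
    }


# Hypothetical incident: last backup 10 minutes before the outage,
# service restored 90 minutes after it began.
result = meets_objectives(
    last_backup=datetime(2024, 1, 1, 11, 50),
    outage_start=datetime(2024, 1, 1, 12, 0),
    service_restored=datetime(2024, 1, 1, 13, 30),
)
print(result["rto_met"], result["rpo_met"])  # True True
```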
Backup Strategy
- PostgreSQL Backups (GCP)
  - Automated point-in-time backups every 15 minutes
  - Daily full snapshots stored for 30 days
- Cloudflare R2 Objects
  - Versioned storage for all key documents (invoices, attachments)
  - Lifecycle policy to retain 90 days of object versions
- Xano Metadata & Logs
  - Daily exports of audit logs and configuration stored in R2
  - Retention for 180 days
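
The retention periods above can be expressed as a simple lookup. The following is a minimal sketch, not the actual lifecycle configuration: the dictionary keys and the `is_expired` helper are hypothetical names chosen for illustration, and only the day counts come from the list.

```python
from datetime import date, timedelta

# Retention periods in days, from the list above. The keys are
# hypothetical labels chosen for this sketch.
RETENTION_DAYS = {
    "postgres_snapshot": 30,
    "r2_object_version": 90,
    "xano_export": 180,
}


def is_expired(kind: str, created: date, today: date) -> bool:
    """Return True once an artifact is older than its retention period."""
    return today - created > timedelta(days=RETENTION_DAYS[kind])


today = date(2024, 6, 30)
# An artifact created on 2024-05-01 is 60 days old on 2024-06-30.
print(is_expired("postgres_snapshot", date(2024, 5, 1), today))  # True
print(is_expired("r2_object_version", date(2024, 5, 1), today))  # False
```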
Failover & Continuity
- Multi-Region Read Replicas
  - PostgreSQL read replicas in secondary GCP regions for failover
- Worker Redeployment
  - Cloudflare Workers automatically redeployed across edge nodes
- API Layer Resilience
  - Xano deployed across multiple GCP zones; traffic is rerouted automatically on failure
- Auxiliary Service Redundancy
  - Railway and Render services configured with health checks and retry policies
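
To make the idea of a health check with a retry policy concrete, here is a generic sketch using exponential backoff. It is not Railway or Render configuration; the `check_with_retries` function, its default values, and the stand-in probe are all assumptions for illustration.

```python
import time
from typing import Callable


def check_with_retries(probe: Callable[[], bool], attempts: int = 3,
                       base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run a health probe, retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            if probe():
                return True
        except Exception:
            pass  # treat an exception the same as an unhealthy response
        if attempt < attempts - 1:
            # Delay doubles each time: base_delay, 2 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return False


# Stand-in probe that fails twice, then succeeds. A real probe would
# typically request a service's health endpoint instead.
responses = iter([False, False, True])
healthy = check_with_retries(lambda: next(responses), sleep=lambda s: None)
print(healthy)  # True
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; in practice the default `time.sleep` would apply.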
Incident Response Process
- Detection & Alerting
  - Automated monitoring triggers alerts for service errors, latency spikes, and downtime
  - Incident tickets created in PagerDuty (or equivalent)
- Containment & Mitigation
  - Traffic rerouted to healthy regions or fallback endpoints
  - Read-only mode activated if necessary to preserve data integrity
- Recovery & Restoration
  - Data restored from the most recent point-in-time backup or snapshot preceding the incident to meet the RPO
  - Services restarted in failover region within RTO targets
- Post-Incident Review
  - Root cause analysis documented
  - Action items tracked and prioritized in backlog
  - DR plan updated based on lessons learned
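
The restore step in Recovery & Restoration depends on choosing the right backup. A minimal sketch of that selection, assuming backups are available as a list of timestamps, might look like the following; `pick_restore_point` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

RPO = timedelta(minutes=15)  # target from the Recovery Objectives table


def pick_restore_point(backups: list[datetime],
                       incident: datetime) -> Optional[datetime]:
    """Return the latest backup taken at or before the incident, if any."""
    candidates = [b for b in backups if b <= incident]
    return max(candidates) if candidates else None


# Hypothetical backups taken every 15 minutes, with an incident at 11:52.
backups = [datetime(2024, 1, 1, 11, m) for m in (0, 15, 30, 45)]
incident = datetime(2024, 1, 1, 11, 52)

point = pick_restore_point(backups, incident)
loss = incident - point  # data written after the restore point is lost
print(point, loss <= RPO)  # 2024-01-01 11:45:00 True
```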
Testing & Validation
- Quarterly DR Drills
  - Simulated outages to validate failover procedures and recovery scripts
- Backup Restore Tests
  - Monthly restore exercises from R2 and PostgreSQL snapshots
- Documentation Reviews
  - DR plan reviewed semi-annually to incorporate infrastructure or process changes