Deployment & CI/CD
Each service lives in its own repo and deploys independently. There is no monorepo path-filtering — a push to a repo triggers that repo's deploy only.
How a deploy works
The pipeline is: git push → GitHub Actions → docker build → ECR push → ECS force-new-deployment.
Environments & Branches
| Branch | Environment | ECS Cluster | AWS Region | Image Tag |
|---|---|---|---|---|
| staging | staging | zelly-staging | ap-southeast-1 | staging-{sha8} |
| main | production | zelly-production | ap-south-1 | prod-{sha8} |
ECR — Elastic Container Registry
ECR is AWS's private Docker registry. ECS Fargate can only run pre-built images — it does not build. ECR repos are created by Terraform in the production account and shared (via data source) by staging.
ECR repositories
| Repo name | Service |
|---|---|
| zelly/fastify-nova | backend-api-fastify-nova |
| zelly/customer-panel | customer-panel-neptune |
| zelly/orion-backend | internal-admin-panel-orion/backend |
| zelly/events-consumer | store-events-consumer |
| zelly/astro-storefront | storefront-astro-titan |
| zelly/caddy | Caddy sidecar (storefront task) |
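The table above can be turned into a small lookup for one-off builds. This is a minimal sketch, assuming the Service column names the directory used as the Docker build context (as the manual build does for fastify-nova). The function name build_context_for is hypothetical, and Caddy is rejected because no service directory is listed for it.

```shell
# Map an ECR repo suffix (the part after "zelly/") to its service directory,
# following the repository table. Assumption: the Service column is the build context.
build_context_for() {
  case "$1" in
    fastify-nova)     echo "backend-api-fastify-nova" ;;
    customer-panel)   echo "customer-panel-neptune" ;;
    orion-backend)    echo "internal-admin-panel-orion/backend" ;;
    events-consumer)  echo "store-events-consumer" ;;
    astro-storefront) echo "storefront-astro-titan" ;;
    *)
      # Includes "caddy": the table lists no service directory for it.
      echo "unknown or non-buildable repo: $1" >&2
      return 1
      ;;
  esac
}
```

For example, docker build -t "zelly/${SERVICE}" "./$(build_context_for "$SERVICE")" would pick the build context from the repo name instead of hard-coding it.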
Manual build & push (one-off)
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
AWS_REGION=ap-south-1
SERVICE=fastify-nova
ECR_URI="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/zelly/${SERVICE}"

# Authenticate Docker to ECR (token lasts 12h)
aws ecr get-login-password --region "$AWS_REGION" \
  | docker login --username AWS --password-stdin \
    "${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com"

# Build
docker build -t "zelly/${SERVICE}" ./backend-api-fastify-nova

# Tag with commit SHA
SHA=$(git -C ./backend-api-fastify-nova rev-parse --short=8 HEAD)
docker tag "zelly/${SERVICE}" "${ECR_URI}:prod-${SHA}"

# Push
docker push "${ECR_URI}:prod-${SHA}"

# Trigger ECS redeploy
aws ecs update-service \
  --cluster zelly-production \
  --service "zelly-production-fastify-nova" \
  --force-new-deployment \
  --region "$AWS_REGION"
GitHub Actions CI/CD
Each service repo contains .github/workflows/deploy.yml. The workflow uses GitHub OIDC to authenticate with AWS (no long-lived access keys stored in GitHub).
Workflow structure
on:
  push:
    branches: [main, staging]

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # required for OIDC
      contents: read
    steps:
      - uses: actions/checkout@v4

      # The step with id "env" (see "Environment resolution step") must run
      # here, before any step reads steps.env.outputs.*.

      - name: Configure AWS credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::<account>:role/zelly-github-actions
          aws-region: ${{ steps.env.outputs.aws_region }}

      - name: Login to ECR
        id: ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build, tag, push
        run: |
          IMAGE="${{ steps.ecr.outputs.registry }}/zelly/fastify-nova:${{ steps.env.outputs.image_tag }}"
          docker build -t "$IMAGE" .
          docker push "$IMAGE"

      - name: Deploy to ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v2
        with:
          # Note: this action expects the path to a task definition JSON file here.
          task-definition: zelly-${{ steps.env.outputs.environment }}-fastify-nova
          service: zelly-${{ steps.env.outputs.environment }}-fastify-nova
          cluster: zelly-${{ steps.env.outputs.environment }}
          wait-for-service-stability: true
Environment resolution step
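The workflow reads steps.env.outputs.environment, aws_region and image_tag, so a step with id env has to derive them from the branch. The following is a minimal sketch of the shell such a step could run, built from the Environments & Branches table. The function name resolve_env is hypothetical, and the real step in each repo may differ.

```shell
# Resolve deployment settings from a branch name and a commit SHA.
# Prints key=value lines matching the outputs the workflow reads.
resolve_env() {
  branch="$1"
  sha8=$(printf '%.8s' "$2")   # first 8 characters of the commit SHA
  case "$branch" in
    main)
      environment=production; aws_region=ap-south-1; prefix=prod ;;
    staging)
      environment=staging; aws_region=ap-southeast-1; prefix=staging ;;
    *)
      echo "no environment mapped to branch: $branch" >&2
      return 1
      ;;
  esac
  echo "environment=$environment"
  echo "aws_region=$aws_region"
  echo "image_tag=$prefix-$sha8"
}

# Inside a GitHub Actions step with "id: env" this could be invoked as:
#   resolve_env "$GITHUB_REF_NAME" "$GITHUB_SHA" >> "$GITHUB_OUTPUT"
```

Failing on unmapped branches is a deliberate choice in this sketch: the workflow only triggers on main and staging, so any other branch name indicates a misconfigured trigger.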
GitHub OIDC Setup
GitHub Actions authenticates to AWS via OIDC — no IAM access keys are stored in GitHub secrets.
The github_oidc Terraform module creates the OIDC provider and an IAM role (zelly-github-actions) that GitHub Actions can assume. The role trust policy restricts assumption to pushes from the zelly-in GitHub organisation.
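To illustrate the kind of restriction described here: for branch pushes, GitHub's OIDC token carries a subject claim of the form repo:ORG/REPO:ref:refs/heads/BRANCH. The sketch below checks a claim against an org-wide prefix. The exact condition used by the github_oidc module is not shown in this document, so the repo:zelly-in/* pattern and the function name sub_in_org are assumptions.

```shell
# Return 0 if an OIDC subject claim belongs to a repo in the given GitHub org.
# Mirrors a prefix match on "repo:<org>/"; the real trust policy may be stricter.
sub_in_org() {
  org="$1"
  sub="$2"
  case "$sub" in
    "repo:$org/"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Including the trailing slash in the prefix matters: without it, a differently named organisation that merely starts with zelly-in would also match.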
ACTIONS_RUNNER_DEBUG and ACTIONS_STEP_DEBUG GitHub secrets can be set to true to get verbose OIDC debugging output when authentication fails.
Deployment order for a fresh environment
- Apply Terraform (VPC, ECR, ECS cluster, ALBs, databases)
- Populate secrets in AWS Secrets Manager (zelly/* paths)
- Run DB migrations against Aurora (via WireGuard VPN in prod, direct in staging)
- Apply ClickHouse init SQL (store-events-consumer/docker/clickhouse/init/) via bastion
- Build and push all Docker images to ECR (manual or via a CI push)
- ECS services start automatically and pull images from ECR
- Update Cloudflare DNS to point domains at ALB/NLB DNS names
- Validate ACM certificate DNS records (added automatically by the dns Terraform module)
Rollback
ECS keeps the previous task definition revision. To roll back to the previous image:
SERVICE=zelly-production-fastify-nova
CLUSTER=zelly-production
REGION=ap-south-1

# Get current task definition ARN
CURRENT=$(aws ecs describe-services \
  --cluster "$CLUSTER" --services "$SERVICE" --region "$REGION" \
  --query 'services[0].taskDefinition' --output text)

# Derive previous revision number
PREV_REV=$(( $(echo "$CURRENT" | grep -o '[0-9]*$') - 1 ))
FAMILY=$(echo "$CURRENT" | sed 's/:[0-9]*$//')
PREV="${FAMILY}:${PREV_REV}"

# Update service to use previous revision
aws ecs update-service \
  --cluster "$CLUSTER" \
  --service "$SERVICE" \
  --task-definition "$PREV" \
  --force-new-deployment \
  --region "$REGION"
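The revision arithmetic above assumes that revision N-1 exists and is the one to return to. A slightly more defensive version of the derivation might look like the sketch below; the function name previous_task_def is hypothetical. It only guards against a missing or first revision. It does not check that the earlier revision is still registered or was ever deployed, which would need an extra aws ecs describe-task-definition call.

```shell
# Given a task definition ARN (or family:revision), print family:(revision - 1).
# Fails when there is no numeric revision or when the revision is 1.
previous_task_def() {
  current="$1"
  rev="${current##*:}"      # text after the last colon
  family="${current%:*}"    # everything before the last colon
  case "$rev" in
    ''|*[!0-9]*)
      echo "no numeric revision in: $current" >&2
      return 1
      ;;
  esac
  if [ "$rev" -le 1 ]; then
    echo "revision $rev has no predecessor" >&2
    return 1
  fi
  echo "$family:$((rev - 1))"
}
```

With this helper, PREV=$(previous_task_def "$CURRENT") || exit 1 could replace the three derivation lines and stop the rollback before update-service is called with an invalid revision.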
Cloudflare Pages (seller-panel, orion-frontend)
The two React SPAs deploy to Cloudflare Pages directly from their GitHub repos. No ECS or ECR involved.
- Push to main → production Pages deployment
- Push to any other branch → preview deployment at a unique URL
- Build command: npm run build
- Output directory: dist/
- Set VITE_API_BASE_URL (and other env vars) in the Cloudflare Pages dashboard, not in the repo