Infrastructure

Deployment & CI/CD

Each service lives in its own repo and deploys independently. There is no monorepo path-filtering — a push to a repo triggers that repo's deploy only.

How a deploy works

The pipeline is: git push → GitHub Actions → docker build → ECR push → ECS force-new-deployment.

git push main / staging
   ↓
GitHub Actions (in each service repo)
   ↓
docker build -t <service> .
docker tag ... <ECR_URI>:<env>-<sha8>
docker push <ECR_URI>:<env>-<sha8>
   ↓
aws ecs update-service --force-new-deployment
   ↓
ECS pulls new image, rolling replacement of tasks

Environments & Branches

Branch    Environment   ECS Cluster        AWS Region       Image Tag
staging   staging       zelly-staging      ap-southeast-1   staging-{sha8}
main      production    zelly-production   ap-south-1       prod-{sha8}
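
For manual scripts, the branch mapping in the table above can be captured in a small shell helper. This is a minimal sketch rather than the pipeline's actual code: the function name resolve_env and its key=value output format are assumptions made for illustration, while the values are taken from the table.

```shell
# Hypothetical helper: maps a branch name and commit SHA to deploy settings.
# Values mirror the Environments & Branches table; the function name and
# the key=value output format are illustrative assumptions.
resolve_env() (
  branch="$1"
  sha8=$(printf '%.8s' "$2")   # first 8 characters of the commit SHA
  case "$branch" in
    main)
      echo "environment=production"
      echo "cluster=zelly-production"
      echo "aws_region=ap-south-1"
      echo "image_tag=prod-${sha8}"
      ;;
    staging)
      echo "environment=staging"
      echo "cluster=zelly-staging"
      echo "aws_region=ap-southeast-1"
      echo "image_tag=staging-${sha8}"
      ;;
    *)
      # The body runs in a subshell, so exit only ends this function call.
      echo "unsupported branch: ${branch}" >&2
      exit 1
      ;;
  esac
)

# Example: print the settings for a push to staging
resolve_env staging 0123456789abcdef0123456789abcdef01234567
```

Rejecting unknown branches, instead of falling through to staging, keeps a typo in a manual run from deploying to the wrong cluster.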

ECR — Elastic Container Registry

ECR is AWS's private Docker registry. ECS Fargate only runs pre-built images and never builds them, so every image must be in ECR before a deploy. The ECR repos are created by Terraform in the production account; staging references the same repos through a Terraform data source.

ECR repositories

Repo name                Service
zelly/fastify-nova       backend-api-fastify-nova
zelly/customer-panel     customer-panel-neptune
zelly/orion-backend      internal-admin-panel-orion/backend
zelly/events-consumer    store-events-consumer
zelly/astro-storefront   storefront-astro-titan
zelly/caddy              Caddy sidecar (storefront task)
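
Because the repo names do not always match the service directory names, a lookup can save guesswork when scripting pushes. The sketch below is illustrative only: the function names ecr_repo_for and ecr_image_ref are hypothetical, the account ID is a placeholder, and the Caddy sidecar is left out because it is not a service directory of its own.

```shell
# Hypothetical lookup: service directory -> ECR repository name,
# following the ECR repositories table.
ecr_repo_for() (
  case "$1" in
    backend-api-fastify-nova)           echo "zelly/fastify-nova" ;;
    customer-panel-neptune)             echo "zelly/customer-panel" ;;
    internal-admin-panel-orion/backend) echo "zelly/orion-backend" ;;
    store-events-consumer)              echo "zelly/events-consumer" ;;
    storefront-astro-titan)             echo "zelly/astro-storefront" ;;
    *) echo "no ECR repository known for: $1" >&2; exit 1 ;;
  esac
)

# Builds <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
ecr_image_ref() (
  account="$1"; region="$2"; service="$3"; tag="$4"
  repo=$(ecr_repo_for "$service") || exit 1
  echo "${account}.dkr.ecr.${region}.amazonaws.com/${repo}:${tag}"
)

# Example with a placeholder account ID and tag
ecr_image_ref 123456789012 ap-south-1 store-events-consumer prod-0a1b2c3d
```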

Manual build & push (one-off)

bash — push one service to ECR
set -euo pipefail   # stop at the first failing command

AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
AWS_REGION=ap-south-1
SERVICE=fastify-nova
ECR_URI="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/zelly/${SERVICE}"

# Authenticate Docker to ECR (token lasts 12h)
aws ecr get-login-password --region "$AWS_REGION" \
  | docker login --username AWS --password-stdin \
    "${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com"

# Build
docker build -t "zelly/${SERVICE}" ./backend-api-fastify-nova

# Tag with commit SHA
SHA=$(git -C ./backend-api-fastify-nova rev-parse --short=8 HEAD)
docker tag "zelly/${SERVICE}" "${ECR_URI}:prod-${SHA}"

# Push
docker push "${ECR_URI}:prod-${SHA}"

# Trigger ECS redeploy
aws ecs update-service \
  --cluster zelly-production \
  --service "zelly-production-${SERVICE}" \
  --force-new-deployment \
  --region "$AWS_REGION"

GitHub Actions CI/CD

Each service repo contains .github/workflows/deploy.yml. The workflow uses GitHub OIDC to authenticate with AWS (no long-lived access keys stored in GitHub).

Workflow structure

.github/workflows/deploy.yml — skeleton
on:
  push:
    branches: [main, staging]

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # required for OIDC
      contents: read

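    steps:
      - uses: actions/checkout@v4

      # The "Set environment" step (id: env, see "Environment resolution step")
      # goes here. Later steps read steps.env.outputs.*, so it must run first.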

      - name: Configure AWS credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::<account>:role/zelly-github-actions
          aws-region: ${{ steps.env.outputs.aws_region }}

      - name: Login to ECR
        id: ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build, tag, push
        run: |
          IMAGE="${{ steps.ecr.outputs.registry }}/zelly/fastify-nova:${{ steps.env.outputs.image_tag }}"
          docker build -t "$IMAGE" .
          docker push "$IMAGE"

      - name: Deploy to ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v2
        with:
          # This input takes the path to a task definition JSON file in the
          # workspace, not a family name. The file name below is a placeholder.
          task-definition: task-definition.json
          service: zelly-${{ steps.env.outputs.environment }}-fastify-nova
          cluster: zelly-${{ steps.env.outputs.environment }}
          wait-for-service-stability: true

Environment resolution step

# Determines environment, region, and image tag from the branch name.
# The cluster name is derived from the environment (zelly-<environment>).
- name: Set environment
  id: env
  run: |
    if [ "$GITHUB_REF_NAME" = "main" ]; then
      echo "environment=production" >> "$GITHUB_OUTPUT"
      echo "aws_region=ap-south-1" >> "$GITHUB_OUTPUT"
      echo "image_tag=prod-${GITHUB_SHA::8}" >> "$GITHUB_OUTPUT"
    else
      echo "environment=staging" >> "$GITHUB_OUTPUT"
      echo "aws_region=ap-southeast-1" >> "$GITHUB_OUTPUT"
      echo "image_tag=staging-${GITHUB_SHA::8}" >> "$GITHUB_OUTPUT"
    fi

GitHub OIDC Setup

GitHub Actions authenticates to AWS via OIDC — no IAM access keys are stored in GitHub secrets.

The github_oidc Terraform module creates the OIDC provider and an IAM role (zelly-github-actions) that GitHub Actions can assume. The role trust policy restricts assumption to pushes from the zelly-in GitHub organisation.
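
To make the organisation restriction concrete, the sketch below mimics in shell the kind of wildcard match a trust policy condition performs on the token's sub claim. It rests on an assumption: that the condition is a StringLike match against "repo:zelly-in/*". The real policy lives in the github_oidc module and may be stricter, for example by also pinning the ref. The function name sub_allowed and the repo names are illustrative.

```shell
# Sketch of an org-level restriction on the OIDC `sub` claim.
# Assumption: the role's trust policy uses a StringLike condition with the
# pattern "repo:zelly-in/*"; check the github_oidc module for the real one.
sub_allowed() (
  case "$1" in
    repo:zelly-in/*) exit 0 ;;   # any repo in the zelly-in organisation
    *)               exit 1 ;;   # everything else is rejected
  esac
)

# GitHub issues branch-push tokens with sub = repo:<org>/<repo>:ref:refs/heads/<branch>
for sub in \
  "repo:zelly-in/backend-api-fastify-nova:ref:refs/heads/main" \
  "repo:someone-else/backend-api-fastify-nova:ref:refs/heads/main"
do
  if sub_allowed "$sub"; then
    echo "allow ${sub}"
  else
    echo "deny  ${sub}"
  fi
done
```

The trailing slash in the pattern matters: without it, an organisation whose name merely starts with zelly-in would also match.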

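Note: Setting the ACTIONS_RUNNER_DEBUG and ACTIONS_STEP_DEBUG GitHub secrets (or repository variables) to true turns on runner diagnostic logs and step debug logs. These are general-purpose debug switches, but the extra output helps when OIDC authentication fails.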

Deployment order for a fresh environment

  1. Apply Terraform (VPC, ECR, ECS cluster, ALBs, databases)
  2. Populate secrets in AWS Secrets Manager (zelly/* paths)
  3. Run DB migrations against Aurora (via WireGuard VPN in prod, direct in staging)
  4. Apply ClickHouse init SQL (store-events-consumer/docker/clickhouse/init/) via bastion
  5. Build and push all Docker images to ECR (manual or via a CI push)
  6. ECS services start automatically and pull images from ECR
  7. Update Cloudflare DNS to point domains at ALB/NLB DNS names
  8. Validate ACM certificate DNS records (added automatically by the dns Terraform module)

Rollback

ECS keeps the previous task definition revision. To roll back to the previous image:

bash — roll back one revision
set -euo pipefail

SERVICE=zelly-production-fastify-nova
CLUSTER=zelly-production
REGION=ap-south-1

# Get current task definition ARN (ends in <family>:<revision>)
CURRENT=$(aws ecs describe-services \
  --cluster "$CLUSTER" --services "$SERVICE" --region "$REGION" \
  --query 'services[0].taskDefinition' --output text)

# Derive previous revision number
FAMILY="${CURRENT%:*}"
PREV_REV=$(( ${CURRENT##*:} - 1 ))
if [ "$PREV_REV" -lt 1 ]; then
  echo "No earlier revision to roll back to" >&2
  exit 1
fi
# Assumes revision N-1 exists and has not been deregistered
PREV="${FAMILY}:${PREV_REV}"

# Update service to use previous revision
aws ecs update-service \
  --cluster "$CLUSTER" \
  --service "$SERVICE" \
  --task-definition "$PREV" \
  --force-new-deployment \
  --region "$REGION"

Cloudflare Pages (seller-panel, orion-frontend)

The two React SPAs deploy to Cloudflare Pages directly from their GitHub repos. No ECS or ECR involved.