Kubernetes Cluster Setup on AWS EKS

A step-by-step guide to creating a production-ready Kubernetes cluster on AWS EKS using eksctl, setting up core components (NGINX Ingress, Cert Manager, NATS, PostgreSQL), and configuring secrets and environment-specific configuration for application services.

This walkthrough builds on the high-level overview from Deploying microservices into a Kubernetes cloud.


Prerequisites

You will need:

  • AWS CLI
  • eksctl
  • kubectl
  • Helm
  • AWS account with permissions to create EKS clusters, IAM roles, and policies
  • (Optional) aws-iam-authenticator if your environment requires it

Ensure you’re authenticated:

aws configure
aws sts get-caller-identity
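
Before going further, it can help to confirm that every CLI from the list above is actually on your PATH. The helper below is a minimal sketch; the function name check_tools is made up for this guide and is not part of any of the tools.

```shell
# Report which of the given commands are missing from PATH.
# Prints one "missing: <name>" line per absent tool and
# returns non-zero if at least one tool is absent.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage for this guide:
#   check_tools aws eksctl kubectl helm && echo "all prerequisites found"
```

The check only confirms that the binaries exist. It does not check versions or credentials, so aws sts get-caller-identity is still worth running.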

Helm Repo Setup

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo add jetstack https://charts.jetstack.io
helm repo add nats https://nats-io.github.io/k8s/helm/charts/
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

Step-by-Step Setup

1. Create EKS Cluster

eksctl create cluster \
  --name xoxo-v1 \
  --region us-east-2 \
  --nodegroup-name standard-workers \
  --node-type t3.medium \
  --nodes 2 \
  --nodes-min 1 \
  --nodes-max 4 \
  --managed

What this does
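Creates an EKS cluster in us-east-2 with a managed node group of two t3.medium nodes, bounded between 1 and 4 nodes. The bounds alone do not trigger scaling; nodes are only added or removed automatically once an autoscaler (for example the Cluster Autoscaler covered later) is running.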
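
If you would rather keep the cluster definition in version control, the same settings can be expressed as an eksctl config file. This is a sketch that mirrors the flags above; the file name cluster.yaml is an arbitrary choice.

```shell
# Write an eksctl ClusterConfig equivalent to the CLI flags used above.
cat > cluster.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: xoxo-v1
  region: us-east-2

managedNodeGroups:
  - name: standard-workers
    instanceType: t3.medium
    desiredCapacity: 2
    minSize: 1
    maxSize: 4
EOF

# Then create the cluster from the file instead of passing flags:
#   eksctl create cluster -f cluster.yaml
```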


2. Add Cluster to kubectl Context

aws eks update-kubeconfig --region us-east-2 --name xoxo-v1
aws eks describe-cluster --name xoxo-v1 --region us-east-2
kubectl get nodes -o wide

What this does
Adds the new EKS cluster to your kubeconfig and verifies connectivity.


3. Install NGINX Ingress Controller

helm install nginx-ingress ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --set controller.publishService.enabled=true

kubectl get svc -n ingress-nginx
kubectl create ns backend

What this does
Installs the NGINX Ingress Controller and creates the backend namespace for your application services.


4. Install Cert Manager

Install CRDs:

kubectl apply --validate=false \
  -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.3/cert-manager.crds.yaml

Install Cert Manager (the chart version should match the CRD version applied above):

helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.14.3

Apply your ClusterIssuer (example in Appendix):

kubectl apply -f cluster-issuer.yaml

What this does
Installs Cert Manager to automatically provision and renew TLS certificates (e.g., via Let’s Encrypt).
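
To show how the ingress controller and Cert Manager fit together, here is a sketch of an Ingress that requests a certificate from the letsencrypt-prod ClusterIssuer in Appendix A. The host api.example.com, the Service name backend-api, its port, and the secret name backend-api-tls are all placeholders, not values from this setup.

```shell
# Write an Ingress that asks cert-manager for a TLS certificate.
cat > backend-ingress.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: backend-api                      # hypothetical name
  namespace: backend
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod   # ClusterIssuer from Appendix A
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.example.com                # placeholder host
      secretName: backend-api-tls        # cert-manager stores the certificate here
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backend-api        # hypothetical Service
                port:
                  number: 80
EOF

# Apply once the host's DNS record points at the ingress load balancer:
#   kubectl apply -f backend-ingress.yaml
```

With HTTP-01, the certificate is only issued once the host resolves publicly to the ingress controller's load balancer.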


5. Install NATS Messaging Queue

helm install nats nats/nats \
  --namespace nats \
  --create-namespace

kubectl get pods,svc -n nats
kubectl exec -n nats -it nats-0 -- nslookup nats.nats.svc.cluster.local

What this does
Deploys NATS for inter-service messaging and validates cluster DNS.


6. Install PostgreSQL (EBS CSI + Bitnami)

6.1 EBS CSI Driver (for persistent volumes)

Create IAM role for the EBS CSI driver and attach policy:

aws iam create-role \
  --role-name AmazonEKS_EBS_CSI_DriverRole \
  --assume-role-policy-document file://trust-policy.json

aws iam attach-role-policy \
  --role-name AmazonEKS_EBS_CSI_DriverRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy
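
The trust-policy.json file used above is not shown in this guide. The sketch below generates one on the assumption that the driver runs under the add-on's default service account, ebs-csi-controller-sa in kube-system, and that the cluster's OIDC provider is already registered in IAM. The function name is made up for illustration.

```shell
# Write an IRSA trust policy for the EBS CSI controller service account.
#   $1 = AWS account ID
#   $2 = cluster OIDC issuer URL (https://oidc.eks.<region>.amazonaws.com/id/<ID>)
write_ebs_csi_trust_policy() {
  account_id="$1"
  oidc_provider="${2#https://}"   # IAM references the issuer without the scheme
  cat > trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${account_id}:oidc-provider/${oidc_provider}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${oidc_provider}:aud": "sts.amazonaws.com",
          "${oidc_provider}:sub": "system:serviceaccount:kube-system:ebs-csi-controller-sa"
        }
      }
    }
  ]
}
EOF
}

# Register the cluster's OIDC provider in IAM (needed once per cluster):
#   eksctl utils associate-iam-oidc-provider --cluster xoxo-v1 --region us-east-2 --approve
# Look up the issuer URL and generate the file:
#   issuer=$(aws eks describe-cluster --name xoxo-v1 --region us-east-2 \
#     --query "cluster.identity.oidc.issuer" --output text)
#   write_ebs_csi_trust_policy <YOUR_ACCOUNT_ID> "$issuer"
```

The file has to exist before aws iam create-role runs, so generate it first.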

Install the addon (IRSA role ARN will vary by account):

eksctl create addon --name aws-ebs-csi-driver \
  --cluster xoxo-v1 \
  --region us-east-2 \
  --service-account-role-arn arn:aws:iam::<YOUR_ACCOUNT_ID>:role/AmazonEKS_EBS_CSI_DriverRole
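
The values file in Appendix B requests a StorageClass named gp3, which a fresh EKS cluster does not provide by default. A minimal sketch of such a class, backed by the EBS CSI driver, could look like this; the encryption and binding settings are suggestions, not requirements.

```shell
# Write a gp3 StorageClass that provisions volumes through the EBS CSI driver.
cat > gp3-storageclass.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer   # create the volume in the pod's availability zone
allowVolumeExpansion: true
EOF

# Apply it before installing PostgreSQL:
#   kubectl apply -f gp3-storageclass.yaml
```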

6.2 Install PostgreSQL (Bitnami)

kubectl create namespace postgres

helm install pgdb bitnami/postgresql \
  --namespace postgres \
  --values postgresdb-values.yaml

kubectl get pods,svc -n postgres
kubectl exec -n nats -it nats-0 -- nslookup pgdb-postgresql.postgres.svc.cluster.local

What this does
Deploys PostgreSQL using the Bitnami Helm chart with persistent volumes via EBS.


7. Access PostgreSQL

Internal access (from a temporary pod):

export POSTGRES_PASSWORD=$(kubectl get secret --namespace postgres pgdb-postgresql \
  -o jsonpath="{.data.postgres-password}" | base64 -d)

kubectl run pgdb-postgresql-client --rm --tty -i --restart='Never' \
  --namespace postgres \
  --image docker.io/bitnami/postgresql:17.6.0-debian-12-r0 \
  --env="PGPASSWORD=$POSTGRES_PASSWORD" \
  --command -- psql --host pgdb-postgresql -U postgres -d backend -p 5432

External access (via local port-forward):

# forward Service in postgres namespace to localhost:5433
kubectl port-forward -n postgres svc/pgdb-postgresql 5433:5432 &

# then connect locally using psql (reuses POSTGRES_PASSWORD exported above)
PGPASSWORD="$POSTGRES_PASSWORD" psql --host 127.0.0.1 -U postgres -d backend -p 5433

8. Add Docker Registry Secret & Apply Configs

⚠️ Never hardcode tokens in scripts or repos. Use env vars or secret managers.

# Export credentials securely (replace values accordingly)
export DOCKER_USERNAME="criyadevops"
export DOCKER_PASSWORD="<your_docker_access_token>"
export DOCKER_EMAIL="devops@criya.co"

kubectl create secret docker-registry docker-reg -n backend \
  --docker-username="$DOCKER_USERNAME" \
  --docker-password="$DOCKER_PASSWORD" \
  --docker-email="$DOCKER_EMAIL"

To use this secret, add the following to your Deployments:

spec:
  template:
    spec:
      imagePullSecrets:
        - name: docker-reg

Apply environment configs and secrets (adjust file names as needed):

kubectl apply -f staging-configmap.yaml
./staging-redeploy-secrets.sh
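
The contents of staging-configmap.yaml are not shown in this guide. As a rough sketch, a ConfigMap for the backend namespace could point services at the in-cluster DNS names used in the earlier steps. The ConfigMap name and every key name below are hypothetical, and 4222 is the default NATS client port.

```shell
# Write a sample ConfigMap with non-secret, environment-specific settings.
cat > staging-configmap.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: backend-config          # hypothetical name
  namespace: backend
data:
  # Key names are illustrative; use whatever your services read.
  NATS_URL: nats://nats.nats.svc.cluster.local:4222
  POSTGRES_HOST: pgdb-postgresql.postgres.svc.cluster.local
  POSTGRES_PORT: "5432"
  POSTGRES_DB: backend
EOF
```

Passwords and tokens do not belong in this file. Keep them in Secrets or an external secret store, as recommended in the next section.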

Security Recommendations

  • Use AWS Secrets Manager or External Secrets Operator (ESO)
    Store app secrets outside the cluster and sync with Kubernetes:

    • ESO example: see the ExternalSecret manifest in Appendix C.
    • Avoid kubectl create secret ... for long-lived credentials.
  • Enable IRSA (IAM Roles for Service Accounts)
    Grant the minimum AWS permissions to specific pods that need them:

    1. Create a fine-grained IAM policy.
    2. Create/annotate a K8s service account with the IAM role ARN.
    3. Reference that service account in your Deployment.
  • TLS Everywhere
    Use Cert Manager with Let’s Encrypt (HTTP-01 or DNS-01) and ensure all Ingress objects have TLS configured.

  • Network Policies
    Restrict traffic between namespaces and workloads (e.g., only app pods can talk to PostgreSQL/NATS).

  • RBAC & Least Privilege
    Provide minimal Kubernetes permissions to CI/CD and developers.

  • Pod Security (PSA)
    Enforce the baseline or restricted Pod Security Standard via namespace labels (e.g., disallow privileged pods).

  • Encrypt at Rest
    Enable EBS volume encryption by default (KMS keys if required).

  • Audit & Control Plane Logs
    Enable EKS control plane logging (API, audit, authenticator).
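
As one concrete case of the Network Policies recommendation, the sketch below allows only pods in the backend namespace to reach PostgreSQL on port 5432. It assumes the Bitnami pods carry the label app.kubernetes.io/name: postgresql and that the cluster's CNI has NetworkPolicy enforcement enabled. Without enforcement, the policy is accepted but has no effect.

```shell
# Write a NetworkPolicy restricting PostgreSQL ingress to the backend namespace.
cat > postgres-networkpolicy.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-to-postgres   # hypothetical name
  namespace: postgres
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: postgresql
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: backend   # label set automatically on namespaces
      ports:
        - protocol: TCP
          port: 5432
EOF

# Apply with:
#   kubectl apply -f postgres-networkpolicy.yaml
```

This policy also blocks the temporary client pod from step 7 unless that pod runs in the backend namespace. It does not affect kubectl port-forward, which reaches the pod through the node's kubelet.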


Monitoring & Logging

  • Metrics (Prometheus + Grafana)
    Install the kube-prometheus-stack:

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo update
    helm install monitoring prometheus-community/kube-prometheus-stack \
      --namespace monitoring --create-namespace

  • Logging (CloudWatch / Fluent Bit)
    Enable CloudWatch Container Insights or deploy Fluent Bit to ship logs to CloudWatch/ELK/DataDog.

  • Horizontal/Vertical Scaling
    Install metrics-server for HPA and consider VPA for right-sizing:

    kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
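
  • Cluster Autoscaler (optional)
    Adds nodes when pods cannot be scheduled and removes underused nodes, staying within the node group's min/max bounds. The example manifest contains a cluster-name placeholder and needs IAM permissions for the autoscaler, so download and edit it before applying:

    curl -O https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
    # replace <YOUR CLUSTER NAME> in the file with xoxo-v1, then:
    kubectl apply -f cluster-autoscaler-autodiscover.yaml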
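
Once metrics-server is running, a HorizontalPodAutoscaler can scale a workload on CPU usage. The sketch below targets a hypothetical Deployment named backend-api. The replica bounds and the 70% target are example values only. CPU utilization is measured against the container's CPU request, so the target Deployment must set one.

```shell
# Write an HPA that scales a Deployment between 2 and 6 replicas on CPU utilization.
cat > backend-hpa.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-api              # hypothetical name
  namespace: backend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend-api            # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
EOF

# Apply with:
#   kubectl apply -f backend-hpa.yaml
```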

Best Practices

  • Pin chart versions for reproducibility.
  • Separate namespaces by concern: ingress, cert-manager, nats, postgres, backend.
  • Use values.yaml files per environment (dev/staging/prod).
  • Resource requests/limits on all workloads.
  • Readiness/Liveness probes on app pods.
  • Backups: Schedule PostgreSQL backups (e.g., pgBackRest) and consider Velero for cluster backup.
  • Cost: Right-size nodes, enable autoscaler, use spot where appropriate (with interruption handling).
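
To make the requests/limits and probe recommendations concrete, here is a sketch of a Deployment that applies them and pulls its image with the docker-reg secret from step 8. The image, container port, and /healthz path are placeholders, and the resource figures are starting points rather than measured values.

```shell
# Write a Deployment with resource requests/limits, probes, and an image pull secret.
cat > backend-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-api                    # hypothetical name
  namespace: backend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backend-api
  template:
    metadata:
      labels:
        app: backend-api
    spec:
      imagePullSecrets:
        - name: docker-reg             # secret created in step 8
      containers:
        - name: api
          image: registry.example.com/backend-api:1.0.0   # placeholder image
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
          readinessProbe:              # gates traffic until the app responds
            httpGet:
              path: /healthz           # placeholder endpoint
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:               # restarts the container if it stops responding
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
EOF
```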

Cleanup

# Delete the EKS cluster and all managed resources
eksctl delete cluster --name xoxo-v1 --region us-east-2

# (Optional) Detach & delete EBS CSI role
aws iam detach-role-policy \
  --role-name AmazonEKS_EBS_CSI_DriverRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy

aws iam delete-role --role-name AmazonEKS_EBS_CSI_DriverRole

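Note: Deleting the cluster does not remove EBS volumes that were retained (for example, volumes behind PersistentVolumes with a Retain reclaim policy). Check the region for unattached volumes afterwards to avoid orphaned costs.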


Troubleshooting

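  • Ingress not exposing IP/DNS
    Check controller logs: kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller
  • Certificates stuck in Pending
    Run kubectl describe certificate -A and kubectl describe challenge -A to find ACME errors.
  • DNS issues
    Run nslookup from a temporary busybox pod: kubectl run dns-test --rm -it --restart=Never --image=busybox -- nslookup <service>.<namespace>.svc.cluster.local
  • PersistentVolumeClaims pending
    Verify that the EBS CSI addon is active (eksctl get addon --cluster xoxo-v1 --region us-east-2) and that the StorageClass named in the PVC exists (kubectl get storageclass).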

Appendix

A. Sample cluster-issuer.yaml (Let’s Encrypt HTTP-01)

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: devops@criya.co
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod-private-key
    solvers:
      - http01:
          ingress:
            class: nginx

B. Sample postgresdb-values.yaml (Bitnami)

global:
  postgresql:
    auth:
      username: postgres
      database: backend
      existingSecret: ""   # Prefer external/ESO secrets; else leave empty to auto-generate
primary:
  persistence:
    enabled: true
    size: 20Gi
    storageClass: gp3   # must already exist in the cluster; EKS does not create a gp3 class by default
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      cpu: "1"
      memory: 1Gi

C. External Secrets Operator (optional)

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: backend-env
  namespace: backend
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: backend-env
    creationPolicy: Owner
  data:
    - secretKey: DATABASE_URL
      remoteRef:
        key: /prod/backend/DATABASE_URL

FAQ

What does this AWS EKS cluster setup guide walk me through?
It shows how to create an EKS cluster with eksctl, configure kubectl, install NGINX Ingress, Cert Manager, NATS, and PostgreSQL, and wire up DNS and TLS so you have a realistic environment for running backend services.

Is this EKS configuration meant for production use or just local experiments?
The guide aims for a production-leaning setup with managed node groups, persistent storage, ingress, TLS, and messaging, but you should still adapt security, backup, and cost settings to your own organization's requirements before using it in a real production environment.
