Welcome to the twenty-fifth post in our Kubernetes A-to-Z Series! By now you have met Pods, ReplicaSets, Deployments, and Jobs in separate posts. This one zooms out and asks the bigger question: when you have an application to run, which workload resource should you actually pick? The answer depends on whether the app is stateless, stateful, scheduled, or node-local, and the wrong pick will cause subtle bugs in production.

What is a Workload?

In Kubernetes, a workload is an application running on your cluster. The actual process always runs inside a Pod, but you rarely create raw Pods yourself. Instead, you use a higher-level workload resource that creates and manages Pods on your behalf, gives them a lifecycle, and reacts when something goes wrong.

The built-in workload resources are:

  • Pod: the raw primitive. One or more co-located containers sharing network and storage.
  • ReplicaSet: keeps N identical Pods running.
  • Deployment: declarative wrapper around ReplicaSets that adds rolling updates and rollbacks.
  • StatefulSet: like a Deployment, but each Pod has a stable identity (name, network, storage).
  • DaemonSet: runs exactly one Pod per node (or per matching node).
  • Job: runs Pods until a fixed number of them succeed, then stops.
  • CronJob: creates Jobs on a schedule.

Custom resources (Operators, Argo Rollouts, Argo Workflows, KEDA ScaledJobs, Knative Services) extend this list, but the controllers behind them ultimately create Pods using the same mechanics, so understanding the built-ins is the foundation.

The Pod is Never the Final Answer

Raw Pod                    Workload Resource
┌───────────────────┐     ┌────────────────────────────┐
│ Pod: web-server   │     │ Deployment: web-server     │
│  - dies on node   │     │   manages ReplicaSet       │
│    crash          │     │     manages 3 Pods         │
│  - no replacement │     │   replaces dead Pods       │
│  - no scaling     │     │   rolling updates          │
└───────────────────┘     └────────────────────────────┘

If you submit a raw Pod and the node it sits on dies, the Pod is gone. No controller will create a replacement. That is why production workloads always run under a controller.

Workload Types in Detail

Pod (Primitive Only)

A Pod is the smallest deployable unit. Use it directly only for short-lived debugging or one-off experiments. For anything that should outlive a node failure, wrap it.

# Quick debug shell: the Pod is deleted when you exit, and no controller recreates it
kubectl run debug --image=busybox --rm -it --restart=Never -- sh

See the P is for Pods post for the full anatomy.

ReplicaSet

A ReplicaSet keeps a fixed number of identical Pods running. It is the lowest-level controller that gives you self-healing.

You almost never write a ReplicaSet by hand. Deployments create ReplicaSets for you and use them as the unit of revision history. The R is for ReplicaSets post covers the standalone case.

Rule of thumb: if you find yourself writing kind: ReplicaSet in a YAML file, ask whether you really wanted kind: Deployment instead.

Deployment

A Deployment is the default choice for stateless applications: web servers, API gateways, stateless microservices, background workers that read from a queue and have no local state.

Key behaviors:

  • Manages a ReplicaSet under the hood.
  • Performs rolling updates when the Pod template changes. Old ReplicaSet scales down while new ReplicaSet scales up.
  • Supports rollback via kubectl rollout undo.
  • Pods get interchangeable identities. A Pod named web-7d4f-abc12 is functionally identical to web-7d4f-xyz99.

Deep dive in D is for Deployments.

Minimal example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  labels:
    app: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.27
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0

StatefulSet

A StatefulSet manages Pods that need a stable identity. Use it for databases, message brokers, distributed stores, and anything that needs to know “I am replica 0” or “I am replica 2”.

Differences from a Deployment:

  • Pods get predictable names: mysql-0, mysql-1, mysql-2. The ordinal is part of the contract.
  • Pods are created and terminated in strict order by default. Pod 0 is ready before Pod 1 starts. Scaling down removes the highest ordinal first.
  • Each Pod gets a stable DNS name through a headless Service: mysql-0.mysql.default.svc.cluster.local.
  • Each Pod gets its own PersistentVolumeClaim via volumeClaimTemplates. When mysql-0 is rescheduled to another node, it reattaches to the same volume.
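
The phrase "by default" in the ordering bullet refers to the podManagementPolicy field. As a minimal sketch (a spec fragment, not a complete manifest), this is how a StatefulSet whose peers do not depend on start order could opt out of ordered startup. It reuses the mysql name from this post purely for illustration; whether Parallel is safe depends on how the application bootstraps.

```yaml
# Fragment of a StatefulSet spec, for illustration only.
spec:
  serviceName: mysql
  # OrderedReady (the default) creates Pod 0, then 1, then 2, waiting for
  # each to become Ready. Parallel creates and deletes Pods without
  # waiting. Pod names and per-Pod PVCs stay stable either way.
  podManagementPolicy: Parallel
```

This setting affects scaling operations only; rolling updates of the Pod template still follow the update strategy.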

Use cases:

  • Relational databases (PostgreSQL, MySQL primary/replica).
  • Distributed databases (Cassandra, MongoDB replica sets, Elasticsearch).
  • Message brokers (Kafka, RabbitMQ cluster).
  • Anything where a peer says “I trust the data on disk at pvc-0”.

Example:

apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  ports:
  - port: 3306
    name: mysql
  clusterIP: None
  selector:
    app: mysql
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.4
        ports:
        - containerPort: 3306
          name: mysql
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
        readinessProbe:
          exec:
            command: ["sh", "-c", "mysqladmin ping -h 127.0.0.1 -p$MYSQL_ROOT_PASSWORD"]
          initialDelaySeconds: 10
          periodSeconds: 10
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 20Gi

Note the headless Service (clusterIP: None) paired with serviceName: mysql. That headless Service is what gives each Pod its stable DNS name. Skip it, and replicas cannot find each other reliably.

DaemonSet

A DaemonSet runs one Pod per node. The controller watches the node list and ensures a Pod is scheduled on every node (or on every node matching a selector).

Use cases:

  • Log shippers: Fluent Bit, Fluentd, Vector, Promtail.
  • Node metrics agents: node-exporter, cAdvisor.
  • Network plugins: Calico, Cilium, Flannel.
  • Storage agents: CSI node drivers.
  • Security agents: Falco, OSQuery, intrusion detection.

If the workload needs to inspect or expose something about the node itself, you want a DaemonSet.

Example: node-exporter DaemonSet.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    app: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true
      hostPID: true
      tolerations:
      - operator: "Exists"
      containers:
      - name: node-exporter
        image: quay.io/prometheus/node-exporter:v1.8.2
        args:
        - "--path.procfs=/host/proc"
        - "--path.sysfs=/host/sys"
        - "--web.listen-address=:9100"
        ports:
        - containerPort: 9100
          name: metrics
        volumeMounts:
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys

The tolerations: [{operator: "Exists"}] block tells the DaemonSet to schedule on every node, including control-plane nodes and nodes with custom taints. Without it, tainted nodes silently skip the DaemonSet, and you discover the gap only when a node stops shipping metrics.
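
The "every node matching a selector" case mentioned earlier can be sketched with a nodeSelector in the Pod template. This fragment assumes you only want the agent on Linux nodes and uses the well-known kubernetes.io/os label; swap in whatever label your cluster uses to mark the target nodes.

```yaml
# Fragment of a DaemonSet spec: only nodes carrying the label get a Pod.
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/os: linux   # well-known label set on every node
```

Nodes that gain the label later receive a Pod automatically, and nodes that lose it have their Pod removed.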

Job

A Job runs Pods until a target number of them complete successfully, then stops. Unlike Deployments, a Job is finite.

Use cases:

  • Database migrations on release.
  • One-off data backfill.
  • Batch processing: video transcode, ETL.
  • CI/CD steps that run in-cluster.

Three important spec fields:

  • restartPolicy: must be Never or OnFailure. Always is not allowed for a Job.
  • backoffLimit: how many times to retry before marking the Job failed. Default is 6.
  • completions and parallelism: for batch fan-out.
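
To make the fan-out fields concrete, here is a minimal sketch of a Job that needs ten successful Pods and runs at most three at a time. The Job name, image, and command are placeholders invented for this example.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: transcode-batch            # hypothetical name
spec:
  completions: 10                  # Job succeeds after 10 Pods finish successfully
  parallelism: 3                   # never more than 3 Pods running at once
  backoffLimit: 4                  # retries allowed before the Job is marked failed
  template:
    spec:
      restartPolicy: Never         # failed Pods are replaced, not restarted in place
      containers:
      - name: worker
        image: example.com/transcoder:1.0   # placeholder image
        command: ["./transcode"]            # placeholder command
```

With these values the controller keeps up to three Pods running until ten have succeeded in total.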

Minimal example:

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  backoffLimit: 3
  ttlSecondsAfterFinished: 600
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrate
        image: myapp/migrator:v1.2.0
        command: ["./migrate", "up"]
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: url

The ttlSecondsAfterFinished: 600 field cleans up the Job and its Pods 10 minutes after completion, so finished migration Pods do not pile up in your namespace.

For deeper coverage of Job patterns, see J is for Jobs and CronJobs.

CronJob

A CronJob creates a Job on a recurring schedule, using standard cron syntax.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  schedule: "0 3 * * *"
  timeZone: "Asia/Ho_Chi_Minh"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 5
  jobTemplate:
    spec:
      backoffLimit: 2
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: myapp/backup:v1.0.0
            command: ["/bin/sh", "-c", "/scripts/backup.sh"]

A few fields that matter in practice:

  • concurrencyPolicy: Forbid: skip a run if the previous one is still going. Good for backups that should never overlap.
  • concurrencyPolicy: Replace: kill the running one and start fresh.
  • concurrencyPolicy: Allow (default): run them concurrently. Often the wrong choice.
  • timeZone: stable as of Kubernetes 1.27. When it is unset (or on older clusters that lack the field), schedules are interpreted in the kube-controller-manager’s time zone, which surprised many teams.

Decision Tree: Picking the Right Workload

Walk through the questions in order. Stop at the first match.

Is the task short-lived and finite?
  Yes -> Need to schedule it repeatedly?
            Yes -> CronJob
            No  -> Job
  No  -> Continue.

Does the workload need a Pod on every node?
  Yes -> DaemonSet
  No  -> Continue.

Does each replica need a stable identity (name, network, storage)?
  Yes -> StatefulSet
  No  -> Deployment (default)

Concrete signals that push you toward each type:

| Signal | Workload |
|---|---|
| “Replicas read and write to a local disk that must survive a reschedule” | StatefulSet |
| “Replicas talk to each other by name to form a cluster” | StatefulSet |
| “Replicas are interchangeable, traffic comes through a Service” | Deployment |
| “I need to read /proc or /sys on the host” | DaemonSet |
| “I need to install something on every new node automatically” | DaemonSet |
| “Run once when we deploy a new release” | Job |
| “Run every night at 3am” | CronJob |

Lifecycle Differences That Trip People Up

Scaling

  • Deployment: kubectl scale deploy/web --replicas=5. Pods come up in parallel, no ordering.
  • StatefulSet: kubectl scale sts/mysql --replicas=5. Pods come up one at a time, in order. Scaling down also goes in reverse order. This is slow on purpose.
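  • DaemonSet: you do not scale it. The replica count is “number of matching nodes”. To run more Pods, add nodes or widen the node selector and tolerations so more nodes match. Cordoning does not help here, because DaemonSet Pods tolerate unschedulable nodes.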
  • Job: parallelism controls how many Pods run at once; completions controls the total successes needed.

Rolling Update Strategy

  • Deployment: RollingUpdate (default) or Recreate. RollingUpdate respects maxSurge and maxUnavailable.
  • StatefulSet: RollingUpdate (default) updates from the highest ordinal down to 0. OnDelete means nothing happens until you manually delete a Pod, which is useful for databases where you want to control the rollout yourself.
  • DaemonSet: RollingUpdate (default) or OnDelete. Honors maxUnavailable and, stable since Kubernetes 1.25, maxSurge, which lets the new Pod start on a node before the old Pod there is removed.
  • Job / CronJob: no rolling update. A change to the spec applies to the next Job created.
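
As a rough illustration of where these strategies live in the spec, the two fragments below show a StatefulSet that waits for manual deletes and a DaemonSet that replaces one node's Pod at a time. Both are fragments rather than complete manifests, and the values are examples, not recommendations.

```yaml
# StatefulSet fragment: a template change is applied to a Pod only
# after you delete that Pod yourself.
spec:
  updateStrategy:
    type: OnDelete
---
# DaemonSet fragment: roll through the nodes, taking down at most
# one Pod at a time.
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
```

Note the field name: Deployments use spec.strategy, while StatefulSets and DaemonSets use spec.updateStrategy.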

Pod Identity and Network

| Workload | Pod Name | DNS | Volume |
|---|---|---|---|
| Deployment | random suffix (web-7d4f-abc12) | round-robin via Service | shared or per-Pod ephemeral |
| StatefulSet | ordinal (mysql-0, mysql-1) | per-Pod via headless Service | per-Pod stable PVC |
| DaemonSet | random suffix, one Pod per node | per-node, often hostNetwork: true | usually hostPath |
| Job | random suffix | not addressable | per-Pod ephemeral |

Termination

  • Deployment / StatefulSet / DaemonSet: Pods are restarted forever unless the workload is deleted.
  • Job: Pods stop after the success count is reached. Set ttlSecondsAfterFinished to garbage collect.
  • CronJob: keeps a configurable history via successfulJobsHistoryLimit and failedJobsHistoryLimit.

Common Pitfalls

1. Using a Deployment for Something Stateful

A common mistake: using a Deployment with a single replica and a PersistentVolumeClaim for a database.

A single replica works at first, but during a rolling update the new Pod can be scheduled on a different node while the old Pod still holds the volume. With the ReadWriteOnce access mode, the new Pod then gets stuck in ContainerCreating. The Recreate strategy avoids that overlap during rollouts, but after a node failure the volume may stay attached to the dead node for a while, so you can still hit a race during reschedule.

If the data matters, use a StatefulSet. The strict order and per-Pod PVC are designed for exactly this.

2. Forgetting the Headless Service for a StatefulSet

A StatefulSet without a matching headless Service (clusterIP: None) still runs, but the per-Pod DNS names do not resolve. Cluster members cannot find each other, and you get cryptic errors from the database init script.

Always create the headless Service first and reference it via serviceName: in the StatefulSet spec.

3. DaemonSet Skipping Tainted Nodes

By default, a DaemonSet only schedules on nodes whose taints it tolerates. Control-plane nodes typically have node-role.kubernetes.io/control-plane:NoSchedule. A logging DaemonSet without tolerations will silently skip them, and you lose logs from the control plane.

Either add explicit tolerations for the taints you care about, or add a wildcard:

tolerations:
- operator: "Exists"
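
For the explicit variant, a sketch that tolerates only the control-plane taint named above might look like this. It goes in the same place in the Pod template as the wildcard, and any other taints on your nodes would need their own entries.

```yaml
tolerations:
# Tolerate only the standard control-plane taint. Nodes with other
# taints are still skipped.
- key: node-role.kubernetes.io/control-plane
  operator: Exists
  effect: NoSchedule
```

The explicit form is the safer default when some taints exist precisely to keep workloads off certain nodes.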

4. Job Retry Loops That Burn Money

A misconfigured Job with the default backoffLimit: 6 and a container that always fails will be retried six times (up to seven Pods with restartPolicy: Never), each one slow to pull the image, before giving up. If the Pod requests a GPU or a large memory limit, this is expensive.

Set a low backoffLimit (1 or 2) for migrations. Set activeDeadlineSeconds to put a hard upper bound on total runtime.

spec:
  backoffLimit: 2
  activeDeadlineSeconds: 600

5. CronJob Schedules Running in the Wrong Time Zone

Before Kubernetes 1.27, CronJob schedules ran in the kube-controller-manager’s time zone, usually UTC. A schedule: "0 3 * * *" you assumed was 3 AM local time was actually 3 AM UTC. Always set timeZone: explicitly on Kubernetes 1.27 or newer, and double-check on older clusters.

6. Overlapping CronJob Runs

The default concurrencyPolicy: Allow lets a CronJob spawn a new Job even if the previous one is still running. For backups, batch ingest, or anything that touches the same data, overlapping runs can corrupt state. Use Forbid unless you specifically want overlap.

7. Deleting a StatefulSet Does Not Delete Its PVCs

By design, kubectl delete statefulset mysql removes the Pods but keeps the PersistentVolumeClaims. This protects your data, but it surprises new users who expect a clean slate. Use kubectl delete pvc -l app=mysql to actually release the storage. As of Kubernetes 1.27, you can set persistentVolumeClaimRetentionPolicy on the StatefulSet to opt into automatic PVC deletion.
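
A minimal sketch of that opt-in, shown as a StatefulSet spec fragment. The values below are one possible combination, and on clusters older than the version mentioned above the field may be unavailable or ignored.

```yaml
# Fragment of a StatefulSet spec.
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete   # delete the PVCs when the StatefulSet itself is deleted
    whenScaled: Retain    # keep PVCs of Pods removed by a scale-down (the default)
```

Both fields default to Retain, which is the keep-everything behavior described in this pitfall.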

Quick Workload Cheatsheet

| Workload | Scale | Rollout | Ordering | Identity | Use Case |
|---|---|---|---|---|---|
| Pod | none | none | none | random | Debugging only |
| ReplicaSet | manual | none | none | random | Rarely used directly |
| Deployment | manual or HPA | rolling, rollback | parallel | random | Stateless services |
| StatefulSet | manual or HPA | rolling (ordered) | strict ordinal | stable name, stable PVC | Databases, brokers, distributed stores |
| DaemonSet | per-node, automatic | rolling | per-node | per-node | Log shippers, CNI, node agents |
| Job | parallelism + completions | none | parallel | random | One-shot batch tasks |
| CronJob | via jobTemplate | none | per-schedule | random | Scheduled batch tasks |
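
The cheatsheet lists HPA as a scaling option without showing one. As a sketch only, a HorizontalPodAutoscaler targeting the web Deployment from earlier in this post could look like the following. The replica bounds and the 70% CPU target are illustrative assumptions, and it presumes a metrics source such as metrics-server is installed in the cluster.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                     # the Deployment from the earlier example
  minReplicas: 3                  # illustrative lower bound
  maxReplicas: 10                 # illustrative upper bound
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70    # percent of the Pods' CPU requests
```

Utilization is measured against the container CPU requests, which is one reason the Deployment example sets them.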

Useful kubectl Snippets

# List every workload type at once
kubectl get deploy,sts,ds,job,cj -A

# Watch a rolling update
kubectl rollout status deploy/web

# Pause a Deployment mid-rollout (e.g. to tweak the rollout)
kubectl rollout pause deploy/web
kubectl rollout resume deploy/web

# Roll back to the previous revision
kubectl rollout undo deploy/web

# See the revision history
kubectl rollout history deploy/web

# Scale a StatefulSet (one Pod at a time, in order)
kubectl scale sts/mysql --replicas=5

# Trigger a CronJob manually for testing
kubectl create job --from=cronjob/nightly-backup backup-manual-$(date +%s)

# Get the per-Pod DNS for a StatefulSet member
kubectl get pod mysql-0 -o jsonpath='{.metadata.name}.{.spec.subdomain}'

Wrapping Up

Pick the workload that matches the shape of the application, not the shape of the YAML you remember writing last week.

  • Stateless, horizontally scalable, traffic via Service: Deployment.
  • Stateful, needs stable identity and per-replica storage: StatefulSet.
  • One Pod per node, usually for observability or networking: DaemonSet.
  • Finite task that runs to completion: Job.
  • Finite task on a recurring schedule: CronJob.

Get this choice right and the rest of Kubernetes works with you. Get it wrong and you fight the platform every release.

Key Takeaways

  • A workload is the high-level resource that manages Pods. You rarely manage Pods directly.
  • Deployment is the default for stateless apps; StatefulSet is the default for stateful clusters.
  • DaemonSet runs one Pod per node, for node-local concerns.
  • Job and CronJob handle finite tasks, with backoffLimit and concurrencyPolicy as critical guard rails.
  • The biggest pitfalls are using the wrong workload for stateful data, forgetting the headless Service for StatefulSets, and misconfigured retry or schedule policies.

Next Steps

Now that you can pick the right workload for the job, the next post tackles X is for eXtensions, covering CustomResourceDefinitions, Operators, and how to extend Kubernetes when the built-in workloads are not enough.