kubevirt

VM Storage with DataVolumes and CDI

Abubakar Siddiq Ango, Senior Developer Advocate
Apr 27, 2026 4 min read Beginner
virtualization storage cdi datavolumes


Every VM you’ve booted so far has been ephemeral. The container disk image is read-only, runtime writes are held in memory, and everything you change disappears when the VMI restarts. That’s the right default for learning, and the wrong default for anything you want to keep. This tutorial fixes both: it attaches a PersistentVolumeClaim to testvm for durable state, and then boots a fresh VM directly from a real cloud image imported by CDI.

containerDisk is not persistent

Worth saying clearly because it bites people in production: a containerDisk is a disk image baked into an OCI container. The VM writes go into a writable layer in memory. Stop the VM, the writes are gone. Use containerDisks for stateless, immutable, throwaway workloads — the OS root for a build agent, a worker node base image, a quickly-provisioned scratch VM. Anything you’d hate to lose needs a PVC.
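For contrast, this is roughly what an ephemeral containerDisk volume looks like in a VM spec. The image reference here is illustrative, not the exact image the earlier tutorials used:

# Volume section of a containerDisk-backed VM (illustrative image name)
volumes:
  - name: containerdisk
    containerDisk:
      image: quay.io/containerdisks/fedora:latest

Nothing in that spec points at a PVC, which is exactly why nothing written to it survives a restart.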

Hotplug a PVC into the running VM

Create a small PVC backed by your default StorageClass:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: testvm-data
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 5Gi
Apply it and confirm the claim binds:

kubectl apply -f testvm-data-pvc.yaml
kubectl get pvc testvm-data

Hotplug it into the running VM:

virtctl addvolume testvm --volume-name=testvm-data
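Two related virtctl invocations are worth knowing alongside this one. addvolume takes a --persist flag that writes the hotplugged volume into the VM spec so it re-attaches automatically after a restart, and removevolume is the inverse operation. A quick sketch:

# Keep the volume in the VM spec so it re-attaches on restart
virtctl addvolume testvm --volume-name=testvm-data --persist

# Hot-unplug the volume again
virtctl removevolume testvm --volume-name=testvm-data

Without --persist, the attachment is transient and you re-attach by hand after a stop/start, which is the behavior this tutorial walks through below.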

The guest sees a new block device appear (typically /dev/vdb) without a reboot — the kernel notices the same way it would notice a USB drive plugged into a real machine. SSH in, partition, format, mount, write something to verify:

virtctl ssh -i ~/.ssh/id_ed25519 ubuntu@vmi/testvm

# Inside the guest
lsblk
sudo mkfs.ext4 /dev/vdb
sudo mkdir -p /mnt/data
sudo mount /dev/vdb /mnt/data
echo "persistent across reboots" | sudo tee /mnt/data/canary.txt
sudo umount /mnt/data
exit

Stop the VM and start it again. The PVC stays put; when you re-attach and remount, your file is still there. That’s the durability story KubeVirt inherits from Kubernetes’ storage primitives — without writing any virtualization-specific glue.
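If you want the guest to remount the disk on every boot instead of mounting by hand, one common option is an /etc/fstab entry keyed on the filesystem UUID. A sketch, run inside the guest after formatting (the UUID placeholder must be replaced with the real value blkid prints):

# Find the filesystem UUID so the mount survives device renames
sudo blkid /dev/vdb

# Append an fstab entry; nofail keeps boot from hanging if the disk is detached
echo 'UUID=<uuid-from-blkid> /mnt/data ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab
sudo mount -a

The nofail option matters for hotplugged disks: if the volume isn't attached at boot, the guest still comes up cleanly.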

Boot a fresh VM from a DataVolume

Hotplug is right when you’re adding a data disk to an existing VM. For a VM that boots from a real cloud image (not the small containerDisk), you want a DataVolume as the root.

A DataVolume is a PVC with superpowers. CDI’s DataVolume controller watches for them, creates a PVC under the hood, and runs an importer pod that downloads a cloud image from an HTTP URL (or a registry, an upload, an existing PVC clone, a VolumeSnapshot, etc.) into the PVC. When the importer reports Succeeded, the PVC is ready to boot from.
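DataVolumes also exist standalone, outside any VM. A minimal one importing the same image might look like this (a sketch; the name and size are arbitrary):

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: ubuntu-base
spec:
  source:
    http:
      url: https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img
  storage:
    accessModes: [ReadWriteOnce]
    resources:
      requests:
        storage: 20Gi

Embedding essentially the same spec under a VM's dataVolumeTemplates, as the next manifest does, ties the DataVolume's lifecycle to the VM instead of leaving it free-standing.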

Here’s a VM whose root disk is a DataVolume importing the official Ubuntu 22.04 cloud image:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: ubuntu-vm
spec:
  runStrategy: Always
  dataVolumeTemplates:
    - metadata:
        name: ubuntu-vm-rootdisk
      spec:
        source:
          http:
            url: https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img
        storage:
          accessModes: [ReadWriteOnce]
          resources:
            requests:
              storage: 20Gi
  template:
    metadata:
      labels:
        kubevirt.io/domain: ubuntu-vm
    spec:
      domain:
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
            - name: cloudinitdisk
              disk:
                bus: virtio
          interfaces:
            - name: default
              masquerade: {}
        resources:
          requests:
            memory: 2Gi
            cpu: "1"
      networks:
        - name: default
          pod: {}
      volumes:
        - name: rootdisk
          dataVolume:
            name: ubuntu-vm-rootdisk
        - name: cloudinitdisk
          cloudInitNoCloud:
            userData: |
              #cloud-config
              ssh_authorized_keys:
                - PASTE_YOUR_PUBLIC_KEY_HERE

Apply it and watch the import:

kubectl apply -f ubuntu-vm.yaml
kubectl get datavolume,pvc,vmi -l kubevirt.io/domain=ubuntu-vm -w

CDI walks the DataVolume through its phases: WaitForFirstConsumer, then ImportInProgress (with a progress percentage), then Succeeded. Expect 3–8 minutes for a ~600 MB image, depending on bandwidth. Once the DataVolume is ready, the VM boots and you can virtctl ssh in.
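If the import stalls, the importer pod's logs are the place to look. Importer pods are conventionally named after the target PVC, so for this VM the commands would be roughly (a sketch, assuming CDI's default naming):

# Detailed phase, progress, and events for the DataVolume
kubectl describe datavolume ubuntu-vm-rootdisk

# Follow the importer pod's download log
kubectl logs -f importer-ubuntu-vm-rootdisk

Typical failures surfaced here are unreachable URLs, TLS errors, and a PVC too small for the uncompressed image.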

The access mode trap

The accessModes: [ReadWriteOnce] choice above works for a VM that stays on one node, but it rules out live migration. An RWO PVC can only be mounted by a single node at a time, so KubeVirt can't move the VM elsewhere without first stopping it. For a live-migratable VM, the StorageClass must support ReadWriteMany (RWX) and the DataVolume/PVC has to declare it. On a sandbox cluster running Longhorn, both modes are available; you pick one when you create the volume. On most cloud block-storage CSI drivers, only RWO is available; for RWX you need a file-storage driver (NFS, CephFS, EFS).

This catches teams late in a project. Write your DataVolumes with the access mode you actually need from day one. If migration matters, RWX. If it doesn’t, RWO is cheaper and faster.
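Concretely, a migration-ready root disk differs from the manifest above only in the access mode and, where the default StorageClass is block-only, an explicit storageClassName. The class name here is an assumption for an RWX-capable provisioner:

dataVolumeTemplates:
  - metadata:
      name: ubuntu-vm-rootdisk
    spec:
      source:
        http:
          url: https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img
      storage:
        accessModes: [ReadWriteMany]   # required for live migration
        storageClassName: longhorn     # assumption: an RWX-capable class
        resources:
          requests:
            storage: 20Gi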

What’s next

Storage and migration are the foundation for everything operational — backups, HA, planned-maintenance drains. With a real DataVolume root and the right access mode, the rest of the operational story (live migration with virtctl migrate, snapshots via VirtualMachineSnapshot, backup integrations) is upstream KubeVirt territory and the user guide is the right next read.