Troubleshooting QEMU-GlusterFS

As described in my previous blog post, QEMU supports talking to GlusterFS using libgfapi which is a much better way to use GlusterFS to host VM images than using the FUSE mount to access GlusterFS volumes. However due to some bugs that exist in GlusterFS-3.4, any invalid specification of GlusterFS drive on QEMU command line can result in completely non-obvious error messages from QEMU. The aim of this blog post is to document some of these known error scenarios in QEMU-GlusterFS so that users can figure out what could be potentially wrong in their QEMU-GlusterFS setup.

As of this writing (Aug 2013), I know about two bugs in GlusterFS that cause all this confusion in the failure messages:

  • A bug that results in GlusterFS log messages not reaching stderr because of which no meaningful failure messages are shown from QEMU. This bug has been fixed, and it should eventually make it to some release (probably 3.4.1) of GlusterFS.
  • A bug in a libgfapi routine called glfs_init() that results in returning failure with errno set to 0. QEMU depends on errno here and this leads to unpleasant segmentation faults in QEMU.

Environment

I am using Fedora 19 with distro-provided QEMU and GlusterFS rpms for all the experiments here.

# rpm -qa | grep qemu
qemu-system-x86-1.4.2-5.fc19.x86_64
ipxe-roms-qemu-20130517-2.gitc4bce43.fc19.noarch
qemu-kvm-1.4.2-5.fc19.x86_64
qemu-guest-agent-1.4.2-3.fc19.x86_64
libvirt-daemon-driver-qemu-1.0.5.1-1.fc19.x86_64
qemu-img-1.4.2-5.fc19.x86_64
qemu-common-1.4.2-5.fc19.x86_64

# rpm -qa | grep gluster
glusterfs-3.4.0-8.fc19.x86_64
glusterfs-libs-3.4.0-8.fc19.x86_64
glusterfs-fuse-3.4.0-8.fc19.x86_64
glusterfs-server-3.4.0-8.fc19.x86_64
glusterfs-api-3.4.0-8.fc19.x86_64
glusterfs-debuginfo-3.4.0-8.fc19.x86_64
glusterfs-cli-3.4.0-8.fc19.x86_64

Error scenarios

1. glusterd service not started

On Fedora 19, glusterd service fails to start during boot and if you don’t notice it, you can end up using QEMU with GlusterFS when glusterd service isn’t running. In this scenario, you will encounter the following kind of failure:

[root@localhost ~]# qemu-system-x86_64 –enable-kvm -nographic -smp 2 -m 1024 -drive file=gluster://kvm-gluster/test/F17-qcow2,if=virtio
qemu-system-x86_64: -drive file=gluster://kvm-gluster/test/F17-qcow2,if=virtio: Gluster connection failed for server=kvm-gluster port=0 volume=test image=F17-qcow2 transport=tcp
qemu-system-x86_64: -drive file=gluster://kvm-gluster/test/F17-qcow2,if=virtio: could not open disk image gluster://kvm-gluster/test/F17-qcow2: No data available

2. GlusterFS volume not started

If QEMU is used to boot a VM image on GlusterFS volume which is not in started state, the following kind of error is seen:

[root@localhost ~]# gluster volume status test
Volume test is not started
[root@localhost ~]# qemu-system-x86_64 –enable-kvm -nographic -smp 2 -m 1024 -drive file=gluster://kvm-gluster/test/F17-qcow2,if=virtio
qemu-system-x86_64: -drive file=gluster://kvm-gluster/test/F17-qcow2,if=virtio: Gluster connection failed for server=kvm-gluster port=0 volume=test image=F17-qcow2 transport=tcp
qemu-system-x86_64: -drive file=gluster://kvm-gluster/test/F17-qcow2,if=virtio: could not open disk image gluster://kvm-gluster/test/F17-qcow2: No data available

3. QEMU run as non-root user

As of this writing (Aug 2013), GlusterFS doesn’t support non-root users to access GlusterFS volumes via libgfapi without some manual settings. This is a very typical scenario for most users who don’t use QEMU directly but use it via libvirt and oVirt. In this scenario, the following kind of error is seen:

[bharata@localhost ~]$ qemu-system-x86_64 –enable-kvm -nographic -smp 2 -m 1024 -drive file=gluster://kvm-gluster/test/F17-qcow2,if=virtio
qemu-system-x86_64: -drive file=gluster://kvm-gluster/test/F17-qcow2,if=virtio: Gluster connection failed for server=kvm-gluster port=0 volume=test image=F17-qcow2 transport=tcp
qemu-system-x86_64: -drive file=gluster://kvm-gluster/test/F17-qcow2,if=virtio: could not open disk image gluster://kvm-gluster/test/F17-qcow2: No data available

As you can see, 3 different failure scenarios produced similar error messages. This will improve once error logs from GlusterFS start reaching QEMU when GlusterFS with the fix is used.

4. Specifying non-existing GlusterFS volume or server

Specifying invalid or wrong server or volume names can cause nasty failures like this:

[root@localhost ~]# qemu-system-x86_64 –enable-kvm -nographic -smp 2 -m 1024 -drive file=gluster://kvm-gluster/test-xxx/F17-qcow2,if=virtio
qemu-system-x86_64: -drive file=gluster://kvm-gluster/test-xxx/F17-qcow2,if=virtio: Gluster connection failed for server=kvm-gluster port=0 volume=test-xxx image=F17-qcow2 transport=tcp
Segmentation fault (core dumped)

[root@localhost ~]# qemu-system-x86_64 –enable-kvm -nographic -smp 2 -m 1024 -drive file=gluster://kvm-gluster-xxx/test/F17-qcow2,if=virtio
qemu-system-x86_64: -drive file=gluster://kvm-gluster-xxx/test/F17-qcow2,if=virtio: Gluster connection failed for server=kvm-gluster-xxx port=0 volume=test image=F17-qcow2 transport=tcp
Segmentation fault (core dumped)

This should get fixed when the bug in libgfapi is resolved.

Leave a comment