Xen: Frequently Confusing Issues

subtitle: One man's quest for Xen enlightenment...

Introduction

Here I am trying to figure out if Xen is something I ought to be interested in, and I find things like the following as practically the first question in the Xen FAQ: "to hide the second network interface from dom0, you can then append physdev_dom0_hide=(02:03.1) to your kernel /boot/xen-2.0.gz."

Perhaps that question is frequently asked, but I have to believe an even more frequently asked question is "Huh?". I clearly need to find the beginning of the conversation so I can catch up.

Let's go on a web quest to see what I can find:

  • Xen Basics

    Slightly better attempt than most Xen pages to give you some idea of what is going on, but then it dives into a specific prescription for installing guests that won't be what you want to do. (Seems likely it was also written by someone with English as a second language. I hope people don't think the same about this :-).

  • Xen Wikipedia entry

    Sunday supplement material that may just leave you knowing less than you did before.

  • Xen FAQ

    If this is an FAQ, then the folks frequently asking these questions ain't us folks trying to figure out how to get started :-).

  • Xen Fedora Core 6

    If you want to do exactly what is described here, you can do that and install Fedora as a guest on Fedora as a host, and you'll still have no idea what the heck you did when you've done it.

  • Xen Users Guide

    Hey! This may be as close to the beginning of the conversation as I can find. The Xen users guide from the font of Xen, Cambridge University. So far it looks like the document most likely to actually answer questions and provide some understanding of what is going on. (Unfortunately as time passes, I also discover that much of the information in here is out of date and lots has changed without updates to this guide).

  • The man page for xm(1) (and the "see also" pages it references) also seem to convey more information than most of the web pages I've found so far (but always with the feeling that something is missing).

How the heck do I get started?

I suspect this is the actual FAQ. Most of the Xen support you find in existing operating systems has been created to be very OS specific. The redhat virt-manager tool can install a guest from an installation tree that looks like a redhat or fedora release, but you can't install a suse release tree that way (and vice-versa). The tools crafted to make it easy for debian can install debian guests, etc.

The Xen Users Guide comes closest to providing a clue of what you might do with lots of different operating systems. It hints (in section 3.1 on booting Xen) that the simplest thing might be to install the different operating system just as you would if you were going to multi-boot the system, then fiddle the config files to describe those boot partitions as guest operating systems.

Naturally, the details of that tweaking aren't exactly spelled out. Perhaps this mailing list thread will get some useful responses someday: From multi-boot to multi-guest?

See Another Experiment below for some success doing this.


What about 32 bits then?

I have yet to find a plain statement about the status of running a 32 bit guest on a 64 bit host. Many web pages seem to imply it is possible, but attempting to install a 32 bit Fedora Core 6 guest on my 64 bit FC6 host just generated errors about incompatible kernels. I asked on the fedora-xen mailing list: 32 bit domU, 64 bit dom0?.

One answer I got in that thread seems to be accurate. I was able to install a 32 bit fedora as a guest on 64 bit fedora when running fully virtualized, and it seems to work fine, but the 64 bit hypervisor isn't (yet) capable of talking down to a 32 bit kernel, so with a paravirtualized guest, 32 on 64 is a no-go.


What's a pygrub?

I have installed a FC6 domU at home on my FC6 dom0, and perhaps looking around at what the fedora tools installed, how the network is configured, and wot-not will tell me more than I can find by just reading web pages. The install went OK, but I haven't had time to poke around much yet.

I did, however, notice that the config file virt-manager built runs something named pygrub as the guest kernel image. None of the web pages I've been reading on setting up Xen mention pygrub, but now that I've seen the name, new web searches have revealed that it is a special boot loader program that understands how to dig the guest's actual boot image out of the guest's own /boot partition and grub menu, so it avoids the need to copy the boot image to someplace where dom0 can find it. Seems like a good idea, and it apparently works: I ran updates on the guest I installed last night, and the updated kernel was loaded when I rebooted the guest.
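
In config file terms, that seems to boil down to the one bootloader line below. The kernel/ramdisk lines are what I believe the non-pygrub alternative would look like (the paths are made up, just to show the idea):

# let pygrub dig the kernel and initrd out of the guest's own /boot
bootloader = "/usr/bin/pygrub"

# the alternative: point at copies of the guest's kernel and initrd
# stashed somewhere dom0 can read them (paths below are made up)
#kernel  = "/boot/vmlinuz-2.6.18-xen-guest"
#ramdisk = "/boot/initrd-2.6.18-xen-guest.img"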


What kernel?

Another interesting fact is that the virt-manager and/or the fedora anaconda installer picked the xen kernel to install on the guest OS. I don't know if that is necessary (or even sensible since I probably have no need to run nested hypervisors :-). I'll see if installing the "normal" kernel works, or if it really needs the xen kernel to boot properly in the guest OS.

OK, I have my answer - only the xen kernel works. I get error 22 again (apparently the only error the xm command ever generates) if I try to boot the non-xen FC6 kernel. This gives me an opportunity to try out the mysterious gibberish documented in the Xen Fedora Core 6 wiki page involving losetup, kpartx, and mount to gain access to the guest filesystems and edit the grub.conf file to put back the xen kernel as the default. (The gibberish worked - I did get it to boot with the xen kernel again).
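
For my own notes, the incantation goes roughly like this (this is my reconstruction rather than a copy from the wiki, and it assumes the default FC6 layout with /boot as the first partition inside the file image; the image path is whatever appears on the disk = line of the guest's config):

losetup /dev/loop0 /path/to/guest.img    # attach the guest's file image to a loop device
kpartx -a /dev/loop0                     # create /dev/mapper entries for the partitions inside it
mount /dev/mapper/loop0p1 /mnt           # mount the guest's /boot partition
vi /mnt/grub/grub.conf                   # make the xen kernel the default again
umount /mnt
kpartx -d /dev/loop0                     # then tear it all back down
losetup -d /dev/loop0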

Looking at my test some more, I think I am beginning to get the idea: The primary host (Dom0) system has a grub menu that boots the hypervisor as the primary kernel, and that grub entry also has a "module" line that points to the fedora-xen kernel, which becomes the Dom0 linux running under the hypervisor.

The "paravirtualized" guest (DomU) fedora has a grub line that just directly boots the fedora-xen kernel as the guest's kernel, no booting of hypervisors in a guest (the hypervisor is already running).

This kind of makes sense now that I've seen it. The Dom0 kernel certainly needs to be running under the hypervisor, so a paravirtualized guest, also needing to run under the hypervisor, also runs the same xen kernel.

So I think the answer to my original question about how to find what kernel you should boot as a paravirtualized guest is: Use the same kernel that the OS distribution would use as Dom0 if you installed it as the Dom0 linux. If the linux distribution you want to install as a guest doesn't already have xen support, you'll probably need to build a special kernel to get it booted as a paravirtualized system.

This also may be a partial answer to my question about how to transform a system from multi-boot to guest. I'd want to install the multi-boot system with xen and Dom0 setup to boot, then when changing it to boot as a guest, edit the grub config file to boot the xen kernel directly, not the hypervisor. (And in Another Experiment, I did just that).


Video Driver Mystery

Having finally gotten a guest OS installed, I see that the most the guest can do in the way of video is to run in the 800x600 console via some kind of emulated framebuffer (and I might add that using the mouse in this thing is an adventure :-).

Why then do I see all these posts on the net about nvidia and ATI driver problems with xen? What on earth do the video drivers running on the dom0 system have to do with anything? Why does running video under a Xen kernel act differently than running video under a normal kernel?

One theory might be that video drivers are heavy users of memory mapped I/O, and the way the Xen kernel uses memory (to make room for guests) somehow upsets them.

Another theory might be that the video drivers were written to do low-level operations directly on various bits of hardware, and running in a Dom0 kernel, they need to be doing some of those operations via calls to the hypervisor rather than directly to the hardware (but why would the hypervisor care about video hardware since it doesn't allow the guests to access it?)

Perhaps issues like this are why Linus seems to prefer the KVM approach with the kernel and hypervisor merged into one? In such a scheme, there shouldn't be anything on the host that needs to be different in any way (maybe?). But, of course, KVM only works with newer cpus and full virtualization (probably not an issue if you wait long enough :-).


Who owns the hardware?

The Sunday supplement stuff all seems to imply that the hardware is owned and managed by the dom0 operating system. If that's really the case, bugs like this one become very hard to understand. I've read it several times. I still have no idea why dom0 can't include the uart driver.


Experiment

The box named "xen" has arrived at work. A massive monster with 4 dual-core opterons, 8 gig of memory, and a collection of 300GB disks. (A bad mistake getting such a nice system - now everyone wants to steal it to do benchmarks and such :-).

We tried to install SLES 10 x86_64 on the beast, and the install hangs, so since I had a Fedora Core 6 DVD lying around we gave that a shot, and it worked (though only in text mode - the graphical installer gets mighty screwy with the ATI chip on the motherboard). We just created a small /boot, swap, and root partition on one disk, leaving everything else untouched, so there is lots of space to expand into. Interesting aside: SLES 10 i686 does install OK (but who wants to run a 32 bit OS on an opteron?)

As a test I then installed a fully virtualized i386 Fedora Core 6 guest, and it works (I'm amazed). I used the gui virt-manager app from the "System Tools" menu in Gnome and just faked out an 8 gig disk for it to install into via a file image, installing from the FC6 dvd in the dvd drive (but I wonder if it would let me change CDs if I had to install from CD - stay tuned to this thread for more info).
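
virt-manager created that file image itself, but for the record, faking up an 8 gig disk by hand is just a one line dd to make a sparse file (the path is the one that wound up in my guest1 config below):

# create an 8 GB sparse file to act as the guest's disk;
# only the blocks the guest actually writes will take up real space
dd if=/dev/zero of=/guests/guest1 bs=1M count=1 seek=8191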

The only weird thing I noticed was that telling it it has 2 cpus got it very confused (perhaps some info will appear in this thread someday?). The install got hung right after the kernel image it loads said something about activating 2 cpus. If I installed it with a single virtual cpu, it worked fine. (I didn't try telling it 8 cpus - perhaps that would work too, and it is just the subset that confuses it).


Limitations

It is worth noting some limitations of Xen before everyone gets way too hyper about it :-).

As mentioned above, the only "native" video for guests is a simple framebuffer emulation mode, so you aren't likely to want a xen server for doing multi-platform development on your OpenGL intensive 3D applications.

There is (as yet) no support for any kind of power management in the hypervisor or Dom0, so if you install and run xen on your laptop to get access to windows and linux apps at the same time, you'll find your battery going dead faster than you can close your Office XP applications. Even in a desktop or server system, you'll find the CPUs running hotter most of the time (with the extra electricity that consumes and greater strain it places on the chips).

I get the impression that memory usage can't be dynamic: you pretty much have to reserve a chunk of memory for each guest, and that's what it gets (which kinda makes sense, because new memory doesn't usually magically appear while real hardware is running).

Another important fact to note is that you can't have the same disk partitions mounted at the same time in multiple guests or in guest and host. Disks need to be dedicated: since each kernel thinks it is in full control, having disk contents change out from under it would be upsetting. You can, of course, still share disks via NFS mounts. (Again this makes sense - on real hardware without very special disk controllers, only one computer is talking to the disk).

There is a joke we often make: "There is only one machine in the comp center". Basically, so many machines depend on one another, that a single crash often renders even the machines that are still up ineffective. With all these Xen guests running on the same host hardware, the joke is no longer just a joke - it is the literal truth. If the host crashes, so do all the guests.


Another Experiment

At home (where my decrepit old dual core amd64 chip forces me to use paravirtualization - shucks, it is almost a year old now :-), I just managed to install openSUSE 10.2 as a guest under Fedora Core 6 via the following technique:

  • I had an old FC5 boot and root partition I didn't need anymore in /dev/sda5 and /dev/sda10, so I installed SUSE over FC5 in those partitions, not doing any of this under Xen, just installing as I normally would on the real machine with the real DVD drive.
  • After that all worked as a normal multi-boot kind of system, I booted back into FC6, mounted the /dev/sda5 and /dev/sda10 partitions, and edited the grub config to boot the xen kernel directly (not as a module under the hypervisor as it was installed originally). I also tried to fix the network interface definition to believe in the new virtual mac address (but without much success, as SUSE network config files are strange and confusing in ways that are new and different from fedora's strange and confusing network config files).
  • I unmounted the /dev/sda5 and /dev/sda10 partitions, and edited the sample xen config file virt-manager generated from my previous test installing an FC6 guest (which I deleted after playing with it a while). The new config file for the SUSE guest system "meathead" looks like:
    # Automatically generated xen config file
    name = "meathead"
    memory = "512"
    disk = [ 'phy:sda5,sda5,w', 'phy:sda10,sda10,w' ]
    vif = [ 'mac=00:16:3e:47:04:43, bridge=xenbr0', ]
    vnc=1
    vncunused=1
    uuid = "b5f9eb7c-a1ad-e228-ab52-8858700edf33"
    bootloader="/usr/bin/pygrub"
    vcpus=2
    on_reboot   = 'restart'
    on_crash    = 'restart'
    

    An important bit is the disk line which says /dev/sda5 should look like /dev/sda5 to the guest and /dev/sda10 should look like /dev/sda10 to the guest (since the grub configs and fstabs and wot-not all expect those partitions, best not to rename them).

    Also I left the bootloader line alone, and whatever voodoo pygrub does to dig up the boot images to use seemed to work OK for this guest on physical partitions just like it did in the previous experiment with the file image providing the disk.

  • That left me with a working guest, but I couldn't talk to it via the emulated console (I guess because SUSE xen kernels don't have whatever voodoo supports the console - yep, see this thread), and the network didn't work because I couldn't understand how to diddle the SUSE network config info. Fortunately, I was able to use the -c option in the xm create command (see the snippet just below) to get a serial console to talk to the guest, and in there, I was able to use the curses version of yast to edit the network definition and get it working again (though curses doesn't work for beans the way yast uses it, I could see enough of the characters that mattered to get it configed - perhaps setting TERM could have provided something that worked better, but I'm not sure what to set it to).
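
For the record, the console trick is just the following (assuming the config file is named after the guest, meathead in my case, and lives in /etc/xen):

xm create -c meathead     # start the guest and stay attached to its console
xm console meathead       # or attach to a guest that is already running
                          # (Ctrl-] detaches and leaves the guest running)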

After that, it was mostly clear sailing. Even without the console, I can talk X over the network to run yast or gnome sessions or anything, so I successfully got a non-Fedora operating system to run under fedora by doing the install from completely outside of fedora or xen.

I suspect this same technique would work with fully virtualized operating systems as well; there would just be some slight differences, since the loader wouldn't be pygrub. Instead, things would look a lot more like the guest1 test I installed on the xen box at work:

# Automatically generated xen config file
name = "guest1"
builder = "hvm"
memory = "500"
disk = [ 'file:/guests/guest1,hda,w', ]
vif = [ 'type=ioemu, mac=00:16:3e:6d:02:f3, bridge=xenbr0', ]
uuid = "5891d2eb-7bcc-607c-d509-14943b2c36b9"
device_model = "/usr/lib64/xen/bin/qemu-dm"
kernel = "/usr/lib/xen/boot/hvmloader"
vnc=1
vncunused=1
apic=1
acpi=1
pae=1

vcpus=1
serial = "pty" # enable serial console
on_reboot   = 'restart'
on_crash    = 'restart'

Instead of a bootloader we have stuff like builder and kernel pointing to things named "hvm" (which stands for hardware virtual machine).

One thing to note with this scheme: Normally, when multi-booting several versions of linux, you just share the same swap partition with all of them. When installing for multi-boot, but planning to really run as guest, you'll almost certainly want to define a separate swap for each install so you can pass it along in the disk definitions and provide the guest a dedicated swap. I failed to do that in my system at home (but as long as I don't run too much stuff in the guest, it gets along OK without a swap area). Perhaps I can figure out how to use the device mapper voodoo to get some additional swap space. (Actually, it looks simpler than that: This thread explains how).
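
I haven't actually done this yet, but the low-tech version ought to be something like handing the guest one more unused partition (the partition name here is made up) and turning it into swap from inside the guest:

# in the guest's config file in dom0, add a spare partition to the disk list:
#   disk = [ 'phy:sda5,sda5,w', 'phy:sda10,sda10,w', 'phy:sda11,sda11,w' ]

# then, inside the running guest:
mkswap /dev/sda11
swapon /dev/sda11

# and add a line to the guest's /etc/fstab so it comes back after a reboot:
#   /dev/sda11   swap   swap   defaults   0 0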


Networking

The vif lines in the config file refer to something named xenbr0. I think that is some kind of magic "bridge" that the dom0 sets up for each physical network interface on the host, so if you have multiple networks on the host, you can define multiple virtual interfaces in the guests as well, with one using xenbr0, another using xenbr1, etc.

Presumably you could either provide multiple interfaces to each guest, or have some guests run their traffic one way, and other guests another way, depending on how you want to arrange things.
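
Running brctl show in dom0 lists whatever bridges the xen scripts actually created. If my guess is right, giving a guest a leg on each network would (I assume) just mean listing two vifs in its config file, something like (MAC addresses made up):

vif = [ 'mac=00:16:3e:11:22:33, bridge=xenbr0',
        'mac=00:16:3e:44:55:66, bridge=xenbr1' ]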

Perhaps you can even hide network interfaces from Dom0 and give direct access to them to some DomU? Hey look! We are back to the original example confusing FAQ entry :-).

Page last modified Fri Dec 15 09:00:20 2006