Category: code

Stateless Hypervisors at Scale

Running a public cloud that provisions infrastructure has many challenges, especially once you reach very large scale. Today I’m going to touch on the hypervisor piece, the main part of a public cloud that holds the customers’ data running in their instances.

Hypervisors typically run on bare metal and carry some sort of operating system, host configuration, the customer’s instance settings, and then, if using local storage, the virtual disks.

Traditionally, an operating system is installed, configuration management like Puppet, Chef, or Ansible is run to bring the host machine up to the deployed specification, and the host is then added to the automation that ultimately provisions instances for a public cloud.

New features get implemented, bugs are fixed, and operational knowledge is gained, so your deployed infrastructure evolves over time. As your infrastructure grows, the older, legacy-style portions can start looking very different and becoming out of sync and inconsistent.

To break it down into a few points:

  • Hypervisors become inconsistent over time through ongoing maintenance, code releases, and manual troubleshooting by operations.
  • Optimizations, patches, and security fixes are pushed with newer builds, but older builds in production never get caught up.
  • Critical kernel or hypervisor updates that require reboots are hard to do because of the uptime requirements of a public cloud.

So what if we got rid of the traditional methods of OS installation and configuration management and instead created a snapshot of your server build once and then deployed that to thousands of servers?

“We’ll Do It Live”

If you’ve ever installed Ubuntu, you’ve typically used what’s called a Live CD to install the OS. The CD loads an OS into RAM and brings up the GUI so that you can run the install from there. Many distributions over the years have used Live CDs for installation, rescue, or as a tool for recovering from data loss.

The same concept can be applied to a hypervisor or a server running a workload. If you think about it, hypervisors typically have one purpose: to run instances virtually for a user. Why have thousands of independent installs?

Creating a Live Image

The process I’ve been using to create live images is relatively simple. I’ve detailed the very high-level basics below (with a rough shell sketch after the list) and will deep dive into each one at a later date:

  • Create an initial minimal chroot of the filesystem
  • Using Ansible, run configuration management one time within the chroot. This includes all additional packages needed, any customizations, and other additional things you’d normally do in your configuration run.
  • Install tools to allow for live booting to work (CentOS/Debian/Fedora/OpenSUSE/Ubuntu – dracut)
  • Regenerate the initrd to inject the live boot tools
  • Copy the kernel and initrd out
  • Create an image file and sync the filesystem into the image file.
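
To make the flow concrete, here is a rough shell sketch of the build under some assumptions of mine: CentOS-style tooling, placeholder paths, and a simplified Ansible invocation rather than the exact Squashible roles:

# Illustrative only – build a minimal chroot, configure it once, and package it
mkdir -p /build/chroot /build/out
yum --installroot=/build/chroot --releasever=7 -y groupinstall core

# Run configuration management a single time inside the chroot
ansible-playbook -i '/build/chroot,' -c chroot site.yml

# Install the live boot tooling and regenerate the initrd with it injected
chroot /build/chroot yum -y install dracut-live
chroot /build/chroot dracut --force --add "dmsquash-live livenet" \
    /boot/initrd-live.img "$(ls /build/chroot/lib/modules)"   # assumes a single installed kernel

# Copy the kernel and initrd out
cp /build/chroot/boot/vmlinuz-* /build/out/vmlinuz
cp /build/chroot/boot/initrd-live.img /build/out/initrd.img

# Create an image file and sync the filesystem into it
truncate -s 4G /build/out/rootfs.img
mkfs.ext4 -F /build/out/rootfs.img
mount -o loop /build/out/rootfs.img /mnt
rsync -aAX /build/chroot/ /mnt/
umount /mnt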

From there you now have the entire build of your OS represented by three files that can be used to boot the operating system over the network, from Grub, or via kexec.

To persist or not persist?

At this point you essentially have an image that can boot into RAM using iPXE, Grub, or even kexec, and is fully stateless. But what if you want the data to actually persist? With a few scripts added at boot time, you can very easily separate the operating system and applications, which will need updating over time, from the user’s data, which needs to persist and remain constant.

The scripts create symlinks from the filesystem in RAM to local storage on the server, so when an application tries to write to one of those directories, it gets redirected to persistent storage on the local disk. The scripts that build the symlinks are part of the image, so they are recreated every time the server boots it.

In the example of an Openstack Nova Compute running Libvirt+KVM booting as a LiveOS, I have just a few locations on the filesystem that symlink to /data which is mounted on local storage on /dev/sda2:

  • /etc/libvirt – libvirt configurations
  • /etc/nova – Openstack Nova configuration
  • /etc/openvswitch – openvswitch settings and config
  • /etc/systemd/network – systemd networking configs
  • /var/lib/libvirt/ – libvirt files
  • /var/lib/nova/ – instance location
  • /var/lib/openvswitch/ – openvswitch database

Those locations, and the files within them, make up the unique part of each hypervisor and keep it separate from the rest of the OS, which will need to go through constant upgrades and changes.
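
As an illustration of the idea (not the actual scripts shipped in the image), a boot-time script along these lines would mount the local disk and build the symlinks for the paths above; the device, mount point, and path list are assumptions taken from the example:

#!/bin/sh
# Illustrative boot-time persistence hook for the RAM-booted image.
PERSIST_DEV=/dev/sda2
PERSIST_MNT=/data

mkdir -p "$PERSIST_MNT"
mount "$PERSIST_DEV" "$PERSIST_MNT"

for dir in /etc/libvirt /etc/nova /etc/openvswitch /etc/systemd/network \
           /var/lib/libvirt /var/lib/nova /var/lib/openvswitch; do
    if [ ! -e "$PERSIST_MNT$dir" ]; then
        # First boot on this host: seed the persistent copy from the image
        mkdir -p "$PERSIST_MNT$(dirname "$dir")"
        cp -a "$dir" "$PERSIST_MNT$dir"
    fi
    # Replace the in-RAM directory with a symlink to persistent storage
    rm -rf "$dir"
    ln -s "$PERSIST_MNT$dir" "$dir"
done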

Squashible

I’ve been working on making some of the bits we’ve been building available to the public in a project called Squashible. The name came from mashing SquashFS with Ansible. We’ve switched away from SquashFS for the time being, but the name has stuck until I can come up with a better one.

You can play around with it here. It’s a constant work in progress, so please use it at your own risk. It currently runs through various roles to create an image with the minimal set of packages you need to run a hypervisor of a certain type. Many thanks to Major Hayden for working side by side with me on much of this project over the past year.

Openstack

A video of my presentation and the slides from Openstack Austin 2016 – Stateless Hypervisors at Scale – are below.


Feedback

Comments, concerns, ideas? Let me know!

Booting Linux ISOs with Memdisk and iPXE

There are a number of distributions out there that provide proper support for booting the distribution over the network. Most of the more popular distributions provide installer kernels that can be easily downloaded for use: you point at the vmlinuz and the initrd and can then immediately proceed with the install, streaming down packages as needed. These distributions make it easy for tools like netboot.xyz to install them using iPXE.

There are some distributions out there that don’t have this functionality and typically only produce the ISO without any repositories that provide installer kernels or the rootfs.

In those cases you can occasionally use memdisk and iPXE to boot those ISOs, but it doesn’t always work. In doing some research, I ran across one of the major reasons why.
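
For context, a typical memdisk attempt from iPXE looks something like the following; the URLs are placeholders, and memdisk itself comes from the Syslinux package:

#!ipxe
# Illustrative only: load memdisk as the kernel and hand it the ISO as the initrd
dhcp
kernel http://example.com/boot/memdisk iso raw
initrd http://example.com/isos/distro.iso
boot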

Syslinux – Memdisk

The following was taken from syslinux – memdisk.

The majority of Linux based CD images will also fail to work with MEMDISK ISO emulation. Linux distributions require kernel and initrd files to be specified, as soon as these files are loaded the protected mode kernel driver(s) take control and the virtual CD will no longer be accessible. If any other files are required from the CD/DVD they will be missing, resulting in boot error(s). Linux distributions that only require kernel and initrd files function fully via ISO emulation, as no other data needs accessing from the virtual CD/DVD drive once they have been loaded. The boot loader has read all necessary files to memory by using INT 13h, before booting the kernel.

There is also another solution, which requires the phram and mtdblock kernel module and memdiskfind utility of the Syslinux package (utils/memdiskfind). memdiskfind will detect the MEMDISK mapped image and will print the start and length of the found MEMDISK mapped image in a format phram understands:

modprobe phram phram=memdisk,$(memdiskfind)
modprobe mtdblock

This will create a /dev/mtdblock0 device, which should be the .ISO image, and should be mountable.

If your image is bigger than 128MiB and you have a 32-bit OS, then you have to increase the maximum memory usage of vmalloc by adding the following to your kernel parameters:

vmalloc=<at_least_size_of_your_image_in_MiB>Mi

For example: vmalloc=256Mi.

memdiskfind can be compiled with the klibc instead of with the glibc C library to get a much smaller binary for use in the initramfs:

cd ./syslinux-4.04/utils/
make spotless
make CC=klcc memdiskfind

Implementations of phram and mtdblock

ArchLinux has implemented the above concept here and here.

Debian Live used it here.

It’s also been implemented in Clonezilla and GParted.

Antergos Linux, which is based on Arch Linux, works great with memdisk using the phram module.

Conclusion

I think it would be great for more distributions to implement something like this so that iPXE tools can be used to load the ISOs, instead of having to burn the latest ISO or hunt down its location every time.

Some of the distributions I’d love to see add network install or better memdisk support:

  • Linux Mint
  • Manjaro
  • Elementary
  • Solus Project

There are also many other new distributions being released all the time. I typically use DistroWatch to determine the most popular distributions to attempt to add to netboot.xyz. I’d love to get a lot of these added to make it really easy to install anything on the fly.

I’d also love to see some of the hypervisor vendors crack open their ISOs, pull the bits out from behind their paywalls, and host them on their servers so that it’s much easier to immediately boot an install to test something out without jumping through hoops. I have working installs for VMware ESX and Citrix XenServer, but I’d need them to host the bits or grant permission for me to do so for a public-facing installer menu.

netboot.xyz

My newest side project is netboot.xyz. If you’ve seen boot.rackspace.com, this should look pretty familiar. I ran across cheap .xyz domains from Namecheap (one dollar at the time!) and figured the netboot.xyz namespace was much easier to remember and more neutral to the goal I was trying to accomplish. I forked boot.rackspace.com (which I’m still doing basic maintenance on) and am now focusing my efforts on netboot.xyz.

My goal with the project is to make it as easy as possible to boot many of the popular operating systems on bare metal, virtual machines, and so on, without the hassle of hunting down the latest ISO for the OS you want. Ideally it’s usable with any service provider, or by anyone who maintains their own servers.
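
If you already have an iPXE prompt or can embed a small script, loading the menu is just a chainload; something like this minimal script is all it takes (assuming DHCP is available on the network):

#!ipxe
# Illustrative embedded script: bring up networking, then chainload the menu
dhcp
chain http://boot.netboot.xyz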

I usually try to use operating systems that make their boot loaders available via mirrors, although there are occasionally exceptions. I’m also experimenting with various new builds like WinPE and live-booted OSes, and I’d like to pursue getting some hypervisors on there as well to make it as easy as possible to install everything.

It’s also a great place to let people play around with new operating systems from a single menu and learn about the many, many distributions out there.

Check it out when you get a chance and drop me some feedback, or make a pull request if you see something I’m missing. I’ve added a really easy way to test your pull request from the utility menu: all you need to do is enter your GitHub username and the branch or commit hash you want to test.

I’m still working on a bunch of documents demonstrating how easy it is to plug the 1MB iPXE ISO into things like VMware Fusion, VirtualBox, and Openstack, so bear with me while I try to get all of those available.

Enjoy!

Creating Custom Security Updates In XenServer

Some of you may have heard about the latest vulnerability affecting QEMU codenamed VENOM.

Sometimes security vulnerabilities are released faster than the vendor can qualify a valid hotfix. In this post, I’ll walk you through generating your own XenServer hotfix in order to patch the issue rapidly.

How XenServer Patching Works

The sources for XenServer are provided with each release, usually in a binpkg.iso. Here are some links for the latest version of XenServer 6.5:

XenServer Primary Download Page

XenServer 6.5 Hypervisor

XenServer 6.5 Sources

XenServer 6.5 DDK

Creating your Own Custom Patch

The first thing you’ll need to do is download the DDK for the affected version. DDKs are released for each version of XenServer and also any time the kernel revs within a major release. The DDK provides the same environment that the SRPMs were created under, so it makes it really easy to rebuild the RPMs. It comes packaged as an appliance, so you’ll want to import that appliance into a XenServer host and boot it up.
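
Importing and starting the appliance on a XenServer host is typically just a couple of xe commands; the filename and name-label below are placeholders for whatever your DDK download is actually called:

# Illustrative: import the downloaded DDK appliance and boot it
xe vm-import filename=XenServer-6.5.0-DDK.xva
xe vm-list                      # find the name-label of the imported DDK VM
xe vm-start vm=XenServer-DDK    # name-label is a placeholder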

Determine What Needs to be Patched

If QEMU needs patching, more than likely it’s the qemu-dm binary (/usr/lib64/xen/bin/qemu-dm). To determine which package’s sources you need to retrieve, run an rpm query on that binary:

[root@hostname /]# rpm -qf /usr/lib64/xen/bin/qemu-dm
xen-device-model-1.9.0-199.7656

Now we know that we need to make the changes to the xen-device-model.

If we needed to patch Xen:

[root@hostname boot]# rpm -qf xen-4.4.1-xs100346.gz
xen-hypervisor-4.4.1-1.9.0.462.28802

And so on. Once you know what the package is, you can go about finding the source RPM.

Obtaining the Source RPM

Assuming the version of XenServer you’re using is up to date on patches, you’ll want to grab either the latest patch deployed to your environment or the latest patch that contained the version you want to update. Each xsupdate contains updated RPMs, so you might need to run through the latest patches to find the right one.

Any time a hotfix is released, it will include the sources that were changed as part of the update. For example, within the zip of a hotfix release (6.5 SP1 in this case), you’ll have two files:

  • the xsupdate that is used to apply to the server, XS65ESP1.xsupdate
  • the sources package, XS65ESP1-src-pkgs.tar.bz2

The sources package includes all of the SRPMs that were used to create the latest xsupdate.

Extracting the Sources

We’ll want to take the latest available sources, grab the source RPM, and install it on the DDK VM. We’ll use the one from this hotfix to simulate updating QEMU for VENOM:

wget http://downloadns.citrix.com.edgesuite.net/10325/XS62ESP1021.zip
unzip XS62ESP1021.zip
bunzip2 XS62ESP1021-src-pkgs.tar.bz2
tar xvf XS62ESP1021-src-pkgs.tar 

Create a ~/.rpmmacros file so that the sources extract to a known location:

# ~/.rpmmacros
%packager %(echo "$USER")
%_topdir %(echo "$HOME")/rpmbuild

Make directories:

mkdir ~/rpmbuild ~/rpmbuild/SOURCES ~/rpmbuild/RPMS ~/rpmbuild/BUILD ~/rpmbuild/SRPMS ~/rpmbuild/SPECS 

Install the sources:

rpm -i xen-device-model-1.8.0-105.7582.i686.src.rpm

Copy the patch file to ~/rpmbuild/SOURCES/:

cp xsa133-qemut.patch ~/rpmbuild/SOURCES/

Update the spec file to include the new patch and bump the release from 105.7582 to 105.7582.1custom. We do this to avoid conflicts with future versions while still being able to tell which version we’re running:

[root@localhost]# diff -u xen-device-model.spec xen-device-model.spec.mod
--- xen-device-model.spec   2015-03-17 12:02:05.000000000 -0400
+++ xen-device-model.spec.mod   2015-05-12 19:35:53.000000000 -0400
@@ -1,11 +1,12 @@ 
 Summary: qemu-dm device model
 Name: xen-device-model
 Version: 1.8.0
-Release: 105.7582
+Release: 105.7582.1custom
 License: GPL
 Group: System/Hypervisor 
 Source0: xen-device-model-%{version}.tar.bz2
 Patch0: xen-device-model-development.patch
+Patch1: xsa133-qemut.patch
 BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-buildroot
 BuildRequires: SDL-devel, zlib-devel, xen-devel, ncurses-devel, pciutils-devel
@@ -14,6 +15,7 @@
 %prep
 %setup -q
 %patch0 -p1
+%patch1 -p1
 %build
 ./xen-setup --disable-opengl --disable-vnc-tls --disable-blobs
@@ -37,6 +39,9 @@
 %dir /var/xen/qemu
 %changelog
+* Tue Mar 17 2015 MyPatch <www.mypatch.com> [1.8.0 105.7582.1custom]
+- xsa133-qemu
+
 * Tue Mar 17 2015 Citrix Systems, Inc. <www.citrix.com> [1.8.0 105.7582]
 - Build ioemu.

Regenerate the RPM from the sources, and watch for errors.

rpmbuild -ba xen-device-model.spec

Make sure your patches apply cleanly; if they do, the fresh RPMs will be present in ~/rpmbuild/RPMS after the compile has completed:

ls ~/rpmbuild/RPMS/i386/xen-device-model* 
xen-device-model-1.8.0-105.7582.1custom.i386.rpm
xen-device-model-debuginfo-1.8.0-105.7582.1custom.i386.rpm 

Deploying the RPMs to XenServer

You’ll want to take the new RPM and deploy it using:

rpm -Uvh xen-device-model-1.8.0-105.7582.1custom.i386.rpm

If you need to revert to the original version, you can run:

rpm --force -Uvh xen-device-model-1.8.0-105.7582.i386.rpm

Depending on the type of patching you’re doing, you’ll need to determine your reload strategy. If it’s Xen or a kernel, for instance, you know you’ll have to reboot. If it’s QEMU, you know you’ll have to detach the disks and reload them so that they pick up the newly patched process.

Developing boot.rackspace.com

When I started down the path of building osimag.es, I realized that it could be really useful for others, especially in a cloud environment. Since my main focus has been Rackspace Cloud Servers for a number of years, I decided to see how feasible it would be to put together a menu-driven installer for any operating system in an Infrastructure-as-a-Service type of environment. I figured there are probably a number of power users who might not want to start with the default images provided, but would like the opportunity to create their own custom image from scratch.

Will it even work?

I started testing the XenServer boot-from-ISO code in Openstack to see if someone might have already gotten that working for another use case. To my delight, the boot-from-ISO code worked out pretty well: I was able to upload the 1MB iPXE ISO into Glance and boot from that image type.

The next problem to solve was the fact that Rackspace Cloud Servers assigns static IP addresses and does not currently run a DHCP service to hand out networking. iPXE usually works best with DHCP, since the network stack gets set up automatically. Because of this, a customer launching a cloud server could boot the iPXE image but would have to specify the instance’s networking manually in order to chainload boot.rackspace.com.

We started thinking about how to automate this and, with the help of a few developers, came up with a solution. On boot, it retrieves an iPXE image, brings it down to the hypervisor, extracts the iPXE kernel, and regenerates the ISO with a new iPXE startup script that contains the networking information of the instance. When the instance is started, iPXE is then able to get on the network and load boot.rackspace.com automatically. Once iPXE has those values, they can also be passed on the kernel command line for distributions that support network options, so the user doesn’t have to worry about any networking input during installation.
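
To give a feel for what that regenerated startup script ends up doing, here is a hand-written illustration; the addresses and menu URL are placeholders rather than the actual generated output:

#!ipxe
# Illustrative startup script baked into the regenerated ISO: configure the
# instance's static networking by hand since there is no DHCP to lean on
ifopen net0
set net0/ip 192.0.2.10
set net0/netmask 255.255.255.0
set net0/gateway 192.0.2.1
set dns 8.8.8.8
chain http://boot.rackspace.com/menu.ipxe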

Hosting the Menu

Because boot.rackspace.com is just a bunch of iPXE scripts, they are hosted in a Cloud Files container. The domain is a CNAME to the container’s URL, which is served from the Akamai CDN. The source is deployed from GitHub to the Cloud Files container via a Jenkins job whenever new commits are checked in, which makes it very lightweight and scalable to run. The next thing I’ll probably look at is whether I can remove the Jenkins server completely and run the deploy straight out of GitHub. I was also able to enable CDN logs on the container, and I’m using a service called Qloudstat to parse those logs and provide metrics on usage of the scripts.
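
Conceptually, the deploy step is tiny; the Jenkins job does little more than something like the following (the container name is a placeholder, and this is not the literal job definition):

# Illustrative deploy step: push the checked-out iPXE scripts to Cloud Files
# (assumes the usual OS_AUTH_URL / OS_USERNAME / OS_PASSWORD credentials are
#  exported in the environment and a container already fronted by the CDN)
swift upload bootrax-menu *.ipxe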

Delete those old ISOs

Having a small 1MB image is really nice for those times when you need to deploy an OS onto a remote server, or just need to install something into VirtualBox or VMware. There’s really no point in storing tons of ISOs on your machine if you can just stream the packages you need.

What’s Next?

I have a few ideas for new features I’d like to add. I’d like to add a menu of experimental items, and I’d also like the ability to generate a new version of the menu from a pull request so that changes can be quickly validated before being merged into the main code base. If you haven’t tried out boot.rackspace.com yet, I encourage you to check it out. You can get a quick overview from my Rackspace blog post.

XenServer Auto Patcher

I put together a little script that might come in handy for getting Citrix XenServer fully up to date after a factory install. You can find it here:

https://github.com/amesserl/xs_patcher

It will detect the version of XenServer you are running and install all of the latest Citrix XenServer hotfixes that are available, in sequential order. It will also detect any previously applied patches and install anything that might not be present. If you do not have the hotfixes on the machine, it will retrieve them for you. After running the script, all you will need to do is reboot so it picks up the latest kernel.

To install it automatically during an install, you will need to put the patcher script on the disk with its cache prepopulated with all of the patches, to avoid the script retrieving them each time. It’s usually best to put this in place during the post-install, but you won’t want to run it then because XAPI, which the hotfixes require, isn’t up and running at that point. Instead, install a script into /etc/firstboot.d with a starting number higher than all the other processes that run during firstboot. Once the initial firstboot has run, which sets up XenServer and all of its storage repositories, it can kick off the xs_patcher.sh script to install all of the needed hotfixes. I usually have one more reboot occur after that.
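
As a sketch of that firstboot hook (the script number, the patcher’s on-disk path, and the reboot handling are all assumptions, not part of xs_patcher itself):

#!/bin/sh
# Illustrative /etc/firstboot.d/99-apply-hotfixes hook: runs after the stock
# firstboot steps have brought up XAPI and created the storage repositories.
/opt/xs_patcher/xs_patcher.sh      # path to the pre-populated patcher is an assumption
shutdown -r +1 "Rebooting to pick up the patched kernel and Xen"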

I’ll try and maintain the script going forward as new hotfixes are released by Citrix. Currently it supports Boston, Sanibel, and Tampa. I’ll probably go back and grab earlier versions as well in the future as I have time.