Skandragon's Ramblings: 2008

Tuesday, May 20, 2008

VMware Server and Xen

In the past, I've played with VMware (the hosted "Workstation" and "Server" products, not the bare-metal one we have access to now) but was never quite happy with it. Some of the problems I had might have come from running it on Windows Vista rather than Linux. However, from the guest's point of view, the VM itself should be more or less identical.

Recently I tried out Xen on the same hardware, but using NetBSD/amd64 as the "host" OS.

Hardware

The machine is a Gateway GM5446E with a dual-core Intel Core 2 Duo and 3 GB of RAM. The machine has three SATA hard drives, connected via an Intel AHCI controller running in native AHCI mode.

Bare Hardware Baseline

I booted a standard NetBSD-current/amd64 kernel and ran some speed tests, which give a baseline for the dom0 and guest OS disk I/O tests. See below.

VMware

In my VMware install, I used the machine running "Windows Vista Media Center" -- it was what came pre-installed on the machine.

Host OS: Windows Vista Media Center

Guest OSs tried:

NetBSD-4.0/i386
NetBSD-4.0/amd64
NetBSD-current/i386
NetBSD-current/amd64
Linux Ubuntu (server, then-current version)
Linux Debian (then-current version)

Linux booted in 64-bit mode, but would crap out at some later point with issues similar to those NetBSD had.

One machine, named "nfsd", was dedicated to serving out home directories and source trees of NetBSD. The other guest OSs mounted /home or /netbsd-src from nfsd.

Each machine had a "local" disk to store object files from pkgsrc, src, and other OS-related builds.

General VMware Problems

I could not get NetBSD/amd64 (current or 4.0 release) to "self host" -- build /usr/src and kernels -- reliably. The guests would either silently lock up without reason, or crash with an odd CPU exception.

Timekeeping was wacky. Without running the VMware-supplied (closed source) tools in a guest, time was off, and apparently in uncorrectable ways. Running ntp made things seriously wacky, as the time would drift wildly while ntp tried to correct a guest.

The VMware closed-source tools are only available for a small number of OS types, and often only for specific versions of those. They do supply a .so that is, in theory, linkable on many versions of Linux, but the install procedure warns loudly about compatibility.

VMware running on anything but Intel chips with synchronized cycle counters (which most OSs use for high-res timekeeping these days) was a disaster.

VMware Strengths

If the problems above are solved, VMware is a true virtual machine architecture that will run any OS without modification. VMware could run Windows guests, right along with unmodified NetBSD, FreeBSD, Linux, and Solaris/x86 guests.

Xen

I tried Xen 3.1.3.

Xen is a different architecture than VMware in that it prefers to use "paravirtualization" rather than a full virtual machine. It has a host machine (called "domain 0" or "dom0") which attaches to the physical hardware and acts as a conduit between the Xen hypervisor and hardware.

The boot process is that the Xen hypervisor is booted first, which then boots the dom0 host. Multiple domains can be created, serving different hardware, but in practice this is rarely done.

Each guest OS has a config file, and is started with "xm create /path/to/file.conf". This boots the guest OS and sets up a serial console, which can be reached with "xm console <name>".

Since the "dom0" is a fully functional OS in its own right, I have it serve NFS to the guest OSs.

I created the following guest OSs:

NetBSD-current/i386
NetBSD-current/amd64
Windows XP Pro (32-bit)
Windows Server 2003 (32-bit)

Yes, I managed to install Windows XP Pro and Server 2003. They run in a "vnc" console, and for all practical purposes look like Windows. This uses Xen's "hvm" mode, which is a fully hardware-emulated virtual machine and allows running unmodified guest OSs. People have Vista running in a virtual machine under Xen, but I do not have "real" Vista install media or licenses, just the ones that came with, and are tied to, my hardware.

Xen also supports both realtime and offline "migration." As I have only one machine of the same type, I have not yet read up on how this works. The basics: A realtime copy is made of the guest's ram, device state, and other data. It is transmitted to the new destination, and synced up until a very small switchover time can be used to swap where that guest is running. Xen claims 100 ms switchover time is possible, but there are restrictions: The disks are NOT migrated, so must reside on a shared volume. The physical network each dom0 is on is also shared, in order to avoid disruption of TCP connections. I also believe fairly identical CPU and dom0 operating systems should be used.
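The "copy, then keep syncing until only a small remainder is left" idea can be sketched as a toy model. This is only an illustration of the description above, not Xen's actual code: the function name, the fixed "dirty rate", and all the numbers are assumptions made up for the example.

```ruby
# Toy model of iterative "pre-copy" migration. Every number here is invented
# for illustration; this is not Xen's code and not a measurement.
#
# total_pages - pages of guest RAM to move
# dirty_rate  - assumed fraction of the pages sent in one round that the
#               still-running guest modifies before that round finishes
# threshold   - pause the guest once this few pages (or fewer) remain
def precopy_rounds(total_pages, dirty_rate, threshold, max_rounds = 30)
  to_send = total_pages          # the first round sends all of RAM
  rounds = []
  max_rounds.times do
    rounds << to_send
    # Pages dirtied during this round have to be sent again.
    to_send = (to_send * dirty_rate).ceil
    break if to_send <= threshold
  end
  { :rounds => rounds, :final_pages => to_send }
end

result = precopy_rounds(1000, 0.5, 100)
puts "pages sent while running: #{result[:rounds].inspect}"
puts "pages sent while paused:  #{result[:final_pages]}"
```

In this model the switchover time depends only on the last, small batch of pages, which is why a guest that dirties memory slowly can be moved with a very short pause.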

Offline migration involves shutting the guest down, copying the disks over, and restarting it on a new dom0. This will, of course, interrupt service.

Xen Problems

It is difficult to configure for the first time. The documentation is... lacking. It is also only as solid as the host OS is, but VMware has the same issue in its "server" or "workstation" incarnations.

Xen is also very, very Linux-specific in its documentation and examples. Most of these can be translated -- I certainly did so easily enough -- and this is being corrected in their documentation as more OSs are able to boot as dom0.

Xen Strengths

Timekeeping in Xen, since it is paravirtualized, is almost perfect. Small drifts will occur without running ntp, but all guests (and the host) can run ntp and obtain sanity.

It also appears that all guests and the dom0 "drift" identically, so this is probably related to hardware timekeeping issues. The measured drift of an uncorrected NetBSD guest was 4 seconds in two weeks. ntp correction kept the others in perfect real-world sync.
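To put that drift figure in the units ntp people usually quote, here is the arithmetic as a small Ruby sketch. The only inputs are the 4 seconds and two weeks mentioned above; the variable names are just for illustration.

```ruby
# Clock drift from the figures above: 4 seconds over two weeks.
drift_seconds   = 4.0
elapsed_seconds = 14 * 24 * 60 * 60   # two weeks = 1,209,600 seconds

drift_ppm       = drift_seconds / elapsed_seconds * 1_000_000
seconds_per_day = drift_seconds / 14

puts "drift: %.2f ppm, %.3f s/day" % [drift_ppm, seconds_per_day]
```

That works out to a little over 3 parts per million, or a bit under 0.3 seconds per day.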

It is as free as you want it to be. Support and commercial versions exist, but the free stuff works amazingly well.

Performance

On Xen, all tests were performed with the domUs running but idle, and no hvm guests running (Windows is just too unpredictable). On VMware, only one VM was active at a time, and the Vista host was as idle as it could be made.

I measured three main things here:

Boot speed: How fast a kernel gets from loading to the first /etc/rc message.
Disk speed: read/write speed.
CPU Performance.

Boot Speed

The dom0 boots as fast as any other kernel boots; it must probe the hardware, wait for hardware to change state, etc. No measured difference between a standard NetBSD-current/amd64 kernel and the dom0 kernel.

The domUs (guests) boot so fast it is nearly impossible to measure. This is because the devices they have access to are known -- all are on a virtual bus, and are directly enumerable, so there is no need to probe for devices, wait for them to change state, or time out when not present. As best I can measure, just under 2 seconds is a fair estimate.

The hvm (Windows) guest seems to be about as fast as Windows is. I did not analyze this one much.

On VMware, the guest OSs boot at about the same speed as a "real" machine boots unless a custom kernel is built with just "known present" devices. Even then, boot times are 15-20 seconds.

Disk Speed

In all host/dom0 tests, "iozone" version 3.263 was used, with a 1 GB file on the same disk. Each test was performed only once; for real comparison data we'd want to run it more than once, but this is just a first-pass test.

For native NetBSD/amd64, I had to increase the file size to 4 GB to avoid the cache, as the machine has 3 GB of ram.

wd0 is a 500 GB SATA 3.0Gb/sec disk.
wd1 is present but unused.
wd2 is a 320 GB SATA 1.5Gb/sec disk.

All are on different channels of an Intel AHCI controller running in native AHCI mode.

For the Xen tests, all disks were mounted as files on the dom0 host. From dom0's point of view, the file is mounted on a "vnd" virtual disk, and that virtual disk is exported to the guest.

For the VMware test, all disks were mounted as files in the Windows filesystem.

OS	Disk	Block Size	Read	Write
netbsd-current/amd64 native	wd0	8192	60762	59949
	wd0	16384	60545	60141
	wd2	8192	78342	76342
	wd2	16384	78311	75252
netbsd-current/amd64 dom0	wd0	8192	60641	60109
	wd0	16384	60459	61919
	wd2	8192	80258	79102
	wd2	16384	80187	80295
netbsd-current/amd64 domu	wd0	8192	51205	24004
	wd0	16384	51714	27971
	wd2	8192	77990	23997
	wd2	16384	77282	22496
netbsd-current/i386 domu	wd0	8192	41730	25012
	wd0	16384	42008	24543
	wd2	8192	66401	26048
	wd2	16384	66201	28910
netbsd-current/i386 vmware	wd0	8192	25014	13912
	wd0	16384	25417	13771
	wd2	8192	38831	16100
	wd2	16384	38994	16332

I also repeated one test with a raw, physical partition mounted in the netbsd-current/amd64 domU, which bypasses the "double filesystem" issue:

netbsd-current/amd64 domu	wd2	8192	79915	76992
Physical mount	wd2	16384	79744	77102
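The file-backed results are easier to compare as a percentage of the native baseline. A minimal Ruby sketch, using only the wd0 / 8192-byte rows copied from the first table above; the constant and helper names are made up for this example.

```ruby
# Throughput figures copied from the table above (wd0, 8192-byte blocks).
NATIVE = { :read => 60762, :write => 59949 }
RESULTS = {
  "amd64 dom0"  => { :read => 60641, :write => 60109 },
  "amd64 domU"  => { :read => 51205, :write => 24004 },
  "i386 domU"   => { :read => 41730, :write => 25012 },
  "i386 VMware" => { :read => 25014, :write => 13912 },
}

# Express value as a whole-number percentage of baseline.
def percent_of(value, baseline)
  (100.0 * value / baseline).round
end

RESULTS.each do |name, r|
  puts "%-12s read %3d%%  write %3d%%" %
    [name, percent_of(r[:read], NATIVE[:read]), percent_of(r[:write], NATIVE[:write])]
end
```

On these rows the file-backed amd64 domU writes at about 40% of native speed and the VMware guest at about 23%, while dom0 is indistinguishable from native.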

CPU Performance

Each CPU speed test was run with: Dhrystone Benchmark, Version 2.1 (Language: C) Program compiled without 'register' attribute.

I used an iteration count of 1,000,000,000 for each test.

Operating System	Dhrystones per second
netbsd-current/amd64 native	11,013,216
netbsd-current/amd64 dom0	10,365,917
netbsd-current/amd64 domu	11,130,899
netbsd-current/i386 domu	4,935,347
netbsd-current/i386 vmware	5,012,123

Just for grins, I ran the following tests, with one dhrystone on one guest and another on a different one at the same time. Since each guest is uniprocessor in my configuration, I did not run two benchmarks on the same guest.

Operating Systems	Speed 1	Speed 2
Running both domu/i386 and domu/amd64	4916421.0	11135857.0
Running both dom0/amd64 and domu/amd64	10298661.0	11135857.0
Running both dom0/amd64 and domu/i386	10373444.0	4921260.0
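One way to read the concurrent numbers is as a percentage change from the solo scores in the first Dhrystone table. A short Ruby sketch of that comparison; the labels and the helper name are just for this example, and the figures are copied from the two tables above.

```ruby
# Percentage change of a concurrent score relative to the solo score.
def percent_change(solo, concurrent)
  100.0 * (concurrent - solo) / solo
end

# [label, solo score, score while another domain ran its own benchmark]
pairs = [
  ["domU/i386  (with domU/amd64)",  4_935_347,  4_916_421],
  ["domU/amd64 (with domU/i386)",  11_130_899, 11_135_857],
  ["dom0/amd64 (with domU/amd64)", 10_365_917, 10_298_661],
  ["dom0/amd64 (with domU/i386)",  10_365_917, 10_373_444],
  ["domU/i386  (with dom0/amd64)",  4_935_347,  4_921_260],
]

pairs.each do |label, solo, concurrent|
  puts "%-30s %+.2f%%" % [label, percent_change(solo, concurrent)]
end
```

Every change is under 1% in either direction, which is what you would hope for with two uniprocessor domains on a dual-core machine.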

Conclusions

Xen is production ready.

When the guest OS can be modified, much higher performance numbers are obtained vs. the low-end VMware server I ran.

While it might be extremely tempting to build one guest per very specific function, this probably does not scale:

Memory is pre-allocated and dedicated to a guest, and while some swapping is allowed, it will slow the guest at seemingly random times.
Disk can be overcommitted, but the OS sees failure to allocate a block as a hardware failure.
The more guests there are, the higher the maintenance costs: maintaining packages on each guest, upgrading, etc.

VMware ESX (I believe that is the name of the run-on-bare-metal product) should be tested as well.

I'd love to install Xen on a huge machine with lots of ram and many, many CPUs as a test. Would someone like to ship me a 4 CPU quad core with 64 GB?

Wednesday, April 23, 2008

Your cable company owns you

Well, ok, perhaps not entirely... yet.

This is actually a rant on something cable modems allow your cable internet provider to do to you.

They restrict access to your own hardware.

Why would they do this? Paranoia. A while back, there was a security hole in a network monitoring protocol called Simple Network Management Protocol, or SNMP. This security issue allowed people to crash other people's modems, break into their own and change upload/download speeds, and other nasty things.

All of these have been fixed. However, people are still breaking into their modems to "uncap" them -- change speed settings. They just don't use SNMP to do it anymore. They've become more advanced and use things like internal serial ports or JTAG ports.

So, why do cable companies still restrict access to SNMP, and worse, to some of your modem's diagnostic features? I suspect it is because they don't want to have to answer questions about why they suck. They hide the real details of what your modem is doing from you.

Why is this a big deal?

For one, I own the hardware, but my cable company configures it against my wishes. I can understand rate limiting -- I pay for the fastest service already -- but I cannot understand restricting diagnostic tools.

For two, I have spent, in the last 6 months, perhaps 40 hours debugging a cable internet issue with techs from Cox Communications. After many, many rounds of techs who report "all signal levels are good" I finally got a real live network engineer on the line, who, in 5 minutes, could look at all the statistics on my modem. And solve problems.

Monday, March 24, 2008

Checking Credit Card Numbers in Ruby

This is not meant to be an exhaustive list of all possible numbers, nor the only or best method to verify that they pass the "checksum" test, but here's what I came up with.

I wrote this mostly to link a Ruby version of the code to Wikipedia's article on Luhn checksum validation, since nearly every other language in use was listed, but Ruby was sadly missing.

#!/usr/bin/env ruby

#
# Copyright (c) 2008 Michael Graff.  All rights reserved.
#
# Redistribution and use in source and binary forms, with or
# without modification, are permitted provided that the following
# conditions are met:
# 1. Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above
#    copyright notice, this list of conditions and the following
#    disclaimer in the documentation and/or other materials provided
#    with the distribution.
# 3. The name of Michael Graff may not be used to endorse or promote
#    products derived from this software without specific prior
#    written permission.
#
# THIS SOFTWARE IS PROVIDED BY Michael Graff ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
# PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL Michael Graff
# BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
# TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
# OF SUCH DAMAGE.
#

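class Luhn
  # Returns true if the digits in s pass the Luhn checksum.
  # Non-digit characters (spaces, dashes) are ignored.
  def self.check_luhn(s)
    # Use gsub rather than gsub! so the caller's string is not modified.
    digits = s.gsub(/[^0-9]/, "").reverse.split(//)

    # Walk the digits right to left, doubling every second one.
    alternate = false
    total = 0
    digits.each do |c|
      if alternate
        total += double_it(c.to_i)
      else
        total += c.to_i
      end
      alternate = !alternate
    end
    (total % 10) == 0
  end

  # Doubles a digit; if the result has two digits, adds them together.
  def self.double_it(i)
    i = i * 2
    if i > 9
      i = i % 10 + 1
    end
    i
  end
  # A bare "private" has no effect on methods defined with "def self.",
  # so hide the helper explicitly.
  private_class_method :double_it

end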

if $0 == __FILE__
  def test_valid(s)
    result = Luhn::check_luhn(s)
    if result
      puts "VALID: #{s}"
    else
      puts "INVALID: #{s} (should be valid)"
    end
  end

  test_valid('5105 1051 0510 5100') # Mastercard
  test_valid('5555 5555 5555 4444') # Mastercard

  test_valid('4222 2222 2222 2')    # Visa
  test_valid('4111 1111 1111 1111') # Visa
  test_valid('4012 8888 8888 1881') # Visa

  test_valid('3782 8224 6310 005')  # American Express
  test_valid('3714 4963 5398 431')  # American Express
  test_valid('3787 3449 3671 000')  # American Express Corporate
  test_valid('3782 8224 6310 005')  # Amex
  test_valid('3400 0000 0000 009')  # Amex
  test_valid('3700 0000 0000 002')  # Amex

  test_valid('38520000023237')      # Diners Club (14 digits)
  test_valid('30569309025904')      # Diners Club (14 digits)

  test_valid('6011111111111117')    # Discover (16 digits)
  test_valid('6011 0000 0000 0004') # Discover
  test_valid('6011 0000 0000 0012') # Discover
  test_valid('6011000990139424')    # Discover (16 digits)
  test_valid('6011601160116611')    # Discover (16 digits)

  test_valid('3530111333300000')    # JCB (16 digits)
  test_valid('3566002020360505')    # JCB (16 digits)

  test_valid('5431111111111111')    # Mastercard (16 digits)
end
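The same arithmetic can be turned around to generate the final digit instead of checking it. The helper below is a hypothetical companion to the code above, not part of the original script: it takes a number without its last digit and returns the digit that makes the whole thing pass the Luhn test.

```ruby
# Hypothetical helper, not part of the Luhn class above: compute the digit
# that must be appended to partial so the full number passes the check.
def luhn_check_digit(partial)
  digits = partial.gsub(/[^0-9]/, "").reverse.split(//)
  total = 0
  digits.each_with_index do |c, idx|
    d = c.to_i
    if idx % 2 == 0
      # Once the check digit is appended on the right, these digits
      # land in the positions that get doubled.
      d *= 2
      d = d % 10 + 1 if d > 9
    end
    total += d
  end
  (10 - total % 10) % 10
end

puts luhn_check_digit("4111 1111 1111 111")
puts luhn_check_digit("7992739871")
```

The first call uses one of the Visa test numbers from above with its last digit removed.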

Wednesday, March 19, 2008

Javascript application framework 'extjs' and privacy

Out of the box, extjs version 2.0.2 leaks privacy information.

If you fail to change the value of Ext.BLANK_IMAGE_URL to something local, it will default to http://extjs.com/s.gif. At first this might not seem bad, but remember that every time this image is fetched the referring URL is sent to the extjs.com web server.

At best, this is a minor information leak. Depending on what you might place in your URL line, this could be a major issue.

I have posted a comment on the extjs forums, but so far the developers don't see the problem. They say it is well documented in their FAQ, and that it is documented in the API docs.

I would prefer they opt for a warning message saying "You did not set ..." rather than leaking information by default. I'll probably have to report this one to CERT.
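Until something like that warning exists, a crude check on your own side is possible. The Ruby sketch below is an assumption-laden example, not anything extjs provides: the helper name is made up, and it only does a plain text match for an assignment to Ext.BLANK_IMAGE_URL in the JavaScript files you pass on the command line.

```ruby
# Hypothetical helper: true if the JavaScript source assigns a value to
# Ext.BLANK_IMAGE_URL anywhere. A plain text match, not a JavaScript parser.
def sets_blank_image_url?(js_source)
  # "=" followed by anything but "=" so that "==" comparisons do not count.
  !!(js_source =~ /Ext\.BLANK_IMAGE_URL\s*=[^=]/)
end

# Warn about each file named on the command line that never sets it.
ARGV.each do |path|
  unless sets_blank_image_url?(File.read(path))
    warn "#{path}: Ext.BLANK_IMAGE_URL is never set"
  end
end
```

If the value is set in a different file than the one being checked, this will warn anyway, so treat the output as a reminder rather than proof of a leak.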
