Initial revision

2009-04-13 07:05:42 +00:00 · 2009-04-13 07:05:42 +00:00 · 75775e0dff
commit 75775e0dff
139 changed files with 21478 additions and 0 deletions
--- a/README.txt
+++ b/README.txt
@ -0,0 +1,145 @@
 --------------------------------
 VMware SVGA Device Developer Kit
 --------------------------------
 The "VMware SVGA II" device is the virtual graphics card implemented
 by all VMware virtualization products. It is a virtual PCI device,
 which implements a basic 2D framebuffer, as well as 3D acceleration,
 video overlay acceleration, and hardware cursor support.
 This is a package of documentation and example code for the VMware
 SVGA device's programming model. Currently it consists of some very
 basic documentation, and a collection of examples which illustrate the
 more advanced features of the device. These examples are written to
 run on the "virtual bare metal", without an operating system.
 This package is intended for educational purposes, or for people who
 are developing 3D drivers. This code won't help you if you're writing
 normal user-level apps that you'd like to run inside a virtual
 machine. It's for driver authors, and it assumes a reasonable amount
 of prior knowledge about graphics hardware.
 Requirements
 ------------
 To compile the example code, you'll need a few basic open source tools:
  - A recent version of GCC. (I use 4.2. Older versions may
    require tweaking the Makefile.rules file slightly.)
  - binutils
  - GNU Make
  - Python
 To run the examples, you'll need a recent version of VMware
 Workstation, Player, or Fusion. Some of the examples will work on
 older versions, but Workstation 6.5.x or Fusion 2.0.x is strongly
 recommended.
 Contents
 --------
 * bin/
  Precompiled binaries and .vmx files for all examples. These can be
  loaded directly into VMware Workstation, Player, or Fusion.
 * doc/
  Basic SVGA hardware documentation. This includes a text file with
  information about the programming model, plus it includes a copy of
  a WIOV paper which describes our 3D acceleration architecture.
 * lib/metalkit/
  Metalkit is a very simple open source OS, which bootstraps the
  examples and provides basic hardware support.
 * lib/refdriver/
  The SVGA "Reference Driver". This is a sample implementation of a
  driver for our device, which is used by the examples. It provides
  device initialization, an implementation of the low-level FIFO
  protocol, and wrappers around common FIFO commands.
  If you're writing a driver for the VMware SVGA device, "svga.c"
  from this directory is required reading. The FIFO protocol has
  many subtle gotchas, and this source file is the only place
  where they're publicly documented.
 * lib/vmware/
  Header files which define VMware's protocols and virtual hardware.
  The svga_reg.h and svga3d_reg.h files are (in places, at least)
  commented with more information on the programming model.
  If you can't find specific documentation or an example on a feature,
  this is the next place to look. This is also where to get a complete
  list of the supported registers and commands.
 * lib/util/
  Higher-level utilities built on top of the refdriver layer. This
  directory won't contain any novel information about the virtual
  hardware, but it does contain some higher-level abstractions used
  by the examples, and these abstractions demonstrate some useful
  idioms for programming the SVGA device.
 * examples/
  Each example has a separate subdirectory. You can run "make" in the
  top-level directory to compile all examples, or you can build them
  individually.
  Many of the examples are self-explanatory, but some of them are
  not. See the comments at the beginning of the 'main.c' file in each
  example.
 Development
 -----------
 This project isn't intended to be a one-time "code drop" from VMware.
 Our intent is for the examples in this package to be maintained out in
 the open. If we have a bugfix, or a new example that works on released
 VMware products, we'll check it in directly to the public repository.
 For examples of not-yet-released features, we will be developing on an
 internal branch. This branch will be merged to the public repository
 shortly after the first release which has working versions of these
 features.
 License
 -------
 Except where noted in individual source files, the whole package is
 Copyright (C) 1998-2009 VMware, Inc.  It is released under the MIT
 license:
  Permission is hereby granted, free of charge, to any person
  obtaining a copy of this software and associated documentation files
  (the "Software"), to deal in the Software without restriction,
  including without limitation the rights to use, copy, modify, merge,
  publish, distribute, sublicense, and/or sell copies of the Software,
  and to permit persons to whom the Software is furnished to do so,
  subject to the following conditions:
  The above copyright notice and this permission notice shall be
  included in all copies or substantial portions of the Software.
  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
  EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
  MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
  NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
  BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
  ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
  CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  SOFTWARE.
 Contact
 -------
 This project is provided as-is, with no official support from
 VMware. However, I will try to answer questions as time permits.
 If you have questions or you'd like to submit a patch, feel free
 to email me at: micah at vmware.com
 --
--- a/bin/2dmark.img
+++ b/bin/2dmark.img
--- a/bin/2dmark.vmx
+++ b/bin/2dmark.vmx
@ -0,0 +1,9 @@
 config.version = 8
 virtualHW.version = 7
 memsize = 4
 displayname = 2dmark
 guestOS = other
 mks.enable3d = TRUE
 floppy0.startConnected = TRUE
 floppy0.fileType = file
 floppy0.fileName = 2dmark.img
--- a/bin/blit-cube.img
+++ b/bin/blit-cube.img
--- a/bin/blit-cube.vmx
+++ b/bin/blit-cube.vmx
@ -0,0 +1,9 @@
 config.version = 8
 virtualHW.version = 7
 memsize = 4
 displayname = blit-cube
 guestOS = other
 mks.enable3d = TRUE
 floppy0.startConnected = TRUE
 floppy0.fileType = file
 floppy0.fileName = blit-cube.img
--- a/bin/bunnies.img
+++ b/bin/bunnies.img
--- a/bin/bunnies.vmx
+++ b/bin/bunnies.vmx
@ -0,0 +1,9 @@
 config.version = 8
 virtualHW.version = 7
 memsize = 4
 displayname = bunnies
 guestOS = other
 mks.enable3d = TRUE
 floppy0.startConnected = TRUE
 floppy0.fileType = file
 floppy0.fileName = bunnies.img
--- a/bin/cube.img
+++ b/bin/cube.img
--- a/bin/cube.vmx
+++ b/bin/cube.vmx
@ -0,0 +1,9 @@
 config.version = 8
 virtualHW.version = 7
 memsize = 4
 displayname = cube
 guestOS = other
 mks.enable3d = TRUE
 floppy0.startConnected = TRUE
 floppy0.fileType = file
 floppy0.fileName = cube.img
--- a/bin/cubemark.img
+++ b/bin/cubemark.img
--- a/bin/cubemark.vmx
+++ b/bin/cubemark.vmx
@ -0,0 +1,9 @@
 config.version = 8
 virtualHW.version = 7
 memsize = 4
 displayname = cubemark
 guestOS = other
 mks.enable3d = TRUE
 floppy0.startConnected = TRUE
 floppy0.fileType = file
 floppy0.fileName = cubemark.img
--- a/bin/dynamic-vertex-stress.img
+++ b/bin/dynamic-vertex-stress.img
--- a/bin/dynamic-vertex-stress.vmx
+++ b/bin/dynamic-vertex-stress.vmx
@ -0,0 +1,9 @@
 config.version = 8
 virtualHW.version = 7
 memsize = 4
 displayname = dynamic-vertex-stress
 guestOS = other
 mks.enable3d = TRUE
 floppy0.startConnected = TRUE
 floppy0.fileType = file
 floppy0.fileName = dynamic-vertex-stress.img
--- a/bin/dynamic-vertex.img
+++ b/bin/dynamic-vertex.img
--- a/bin/dynamic-vertex.vmx
+++ b/bin/dynamic-vertex.vmx
@ -0,0 +1,9 @@
 config.version = 8
 virtualHW.version = 7
 memsize = 4
 displayname = dynamic-vertex
 guestOS = other
 mks.enable3d = TRUE
 floppy0.startConnected = TRUE
 floppy0.fileType = file
 floppy0.fileName = dynamic-vertex.img
--- a/bin/fence-stress.img
+++ b/bin/fence-stress.img
--- a/bin/fence-stress.vmx
+++ b/bin/fence-stress.vmx
@ -0,0 +1,9 @@
 config.version = 8
 virtualHW.version = 7
 memsize = 4
 displayname = fence-stress
 guestOS = other
 mks.enable3d = TRUE
 floppy0.startConnected = TRUE
 floppy0.fileType = file
 floppy0.fileName = fence-stress.img
--- a/bin/gmr-test.img
+++ b/bin/gmr-test.img
--- a/bin/gmr-test.vmx
+++ b/bin/gmr-test.vmx
@ -0,0 +1,9 @@
 config.version = 8
 virtualHW.version = 7
 memsize = 128
 displayname = gmr-test
 guestOS = other
 mks.enable3d = TRUE
 floppy0.startConnected = TRUE
 floppy0.fileType = file
 floppy0.fileName = gmr-test.img
--- a/bin/half-float-test.img
+++ b/bin/half-float-test.img
--- a/bin/half-float-test.vmx
+++ b/bin/half-float-test.vmx
@ -0,0 +1,9 @@
 config.version = 8
 virtualHW.version = 7
 memsize = 4
 displayname = half-float-test
 guestOS = other
 mks.enable3d = TRUE
 floppy0.startConnected = TRUE
 floppy0.fileType = file
 floppy0.fileName = half-float-test.img
--- a/bin/pong.img
+++ b/bin/pong.img
--- a/bin/pong.vmx
+++ b/bin/pong.vmx
@ -0,0 +1,9 @@
 config.version = 8
 virtualHW.version = 7
 memsize = 4
 displayname = pong
 guestOS = other
 mks.enable3d = TRUE
 floppy0.startConnected = TRUE
 floppy0.fileType = file
 floppy0.fileName = pong.img
--- a/bin/presentReadback.img
+++ b/bin/presentReadback.img
--- a/bin/presentReadback.vmx
+++ b/bin/presentReadback.vmx
@ -0,0 +1,9 @@
 config.version = 8
 virtualHW.version = 7
 memsize = 4
 displayname = presentReadback
 guestOS = other
 mks.enable3d = TRUE
 floppy0.startConnected = TRUE
 floppy0.fileType = file
 floppy0.fileName = presentReadback.img
--- a/bin/simple-shaders.img
+++ b/bin/simple-shaders.img
--- a/bin/simple-shaders.vmx
+++ b/bin/simple-shaders.vmx
@ -0,0 +1,9 @@
 config.version = 8
 virtualHW.version = 7
 memsize = 4
 displayname = simple-shaders
 guestOS = other
 mks.enable3d = TRUE
 floppy0.startConnected = TRUE
 floppy0.fileType = file
 floppy0.fileName = simple-shaders.img
--- a/bin/simple_blit.img
+++ b/bin/simple_blit.img
--- a/bin/simple_blit.vmx
+++ b/bin/simple_blit.vmx
@ -0,0 +1,9 @@
 config.version = 8
 virtualHW.version = 7
 memsize = 4
 displayname = simple_blit
 guestOS = other
 mks.enable3d = TRUE
 floppy0.startConnected = TRUE
 floppy0.fileType = file
 floppy0.fileName = simple_blit.img
--- a/bin/video-formats.img
+++ b/bin/video-formats.img
--- a/bin/video-formats.vmx
+++ b/bin/video-formats.vmx
@ -0,0 +1,9 @@
 config.version = 8
 virtualHW.version = 7
 memsize = 4
 displayname = video-formats
 guestOS = other
 mks.enable3d = TRUE
 floppy0.startConnected = TRUE
 floppy0.fileType = file
 floppy0.fileName = video-formats.img
--- a/bin/video-sync.img
+++ b/bin/video-sync.img
--- a/bin/video-sync.vmx
+++ b/bin/video-sync.vmx
@ -0,0 +1,9 @@
 config.version = 8
 virtualHW.version = 7
 memsize = 4
 displayname = video-sync
 guestOS = other
 mks.enable3d = TRUE
 floppy0.startConnected = TRUE
 floppy0.fileType = file
 floppy0.fileName = video-sync.img
--- a/doc/gpu-wiov.pdf
+++ b/doc/gpu-wiov.pdf
--- a/doc/svga_interface.txt
+++ b/doc/svga_interface.txt
@ -0,0 +1,872 @@
 Copyright (C) 1999-2009 VMware, Inc.
 All Rights Reserved
        VMware SVGA Device Interface and Programming Model
        --------------------------------------------------
 Revision 3, 2009-04-12
 Table of Contents:
  1. Introduction
  2. Examples and Reference Implementation
  3. Virtual Hardware Overview
  4. 2D Graphics Model
  5. 3D Graphics Model
  6. Overview of SVGA3D features
  7. Programming the VMware SVGA Device
 XXX - Todo
 ----------
 This document does not yet describe the 3D hardware in great
 detail. It is an architectural overview. See the accompanying sample
 and reference code for details.
 Section (7) is biased toward describing much older features of the
 virtual hardware. Many new capability flags and FIFO commands have
 been added, and these are sparsely documented in svga_reg.h.
 1.  Introduction
 ----------------
 This document describes the virtual graphics adapter interface which
 is implemented by VMware products. The VMware SVGA Device is a virtual
 PCI video card. It does not directly correspond to any real video
 card, but it serves as an interface for exposing accelerated graphics
 capabilities to virtual machines in a hardware-independent way.
 In its simplest form, the VMware SVGA Device can be used as a basic
 memory-mapped framebuffer. In this mode, the main advantage of
 VMware's SVGA device over alternatives like VBE is that the virtual
 machine can explicitly indicate which areas of the screen have changed
 by sending update rectangles through the device's command FIFO. This
 allows VMware products to avoid reading areas of the framebuffer which
 haven't been modified by the virtualized OS.
 The VMware SVGA device also supports several advanced features:
   - Accelerated video overlays
   - 2D acceleration
   - Synchronization primitives
   - DMA transfers
   - Device-independent 3D acceleration, with shaders
   - Multiple monitors
   - Desktop resizing
 2.  Examples and Reference Implementation
 -----------------------------------------
 This document is not yet complete, in that it doesn't describe the
 entire SVGA device interface in detail. It is an architectural
 overview of the entire device, as well as an introduction to a few
 basic areas of programming for the device.
 For deeper details, see the attached example code. The "examples"
 directory contains individual example applications which show various
 features in action. The "lib" directory contains support code for the
 example applications.
 Some of this support code is designed to act as a reference
 implementation. For example, the process of writing to the command
 FIFO safely and efficiently is very complicated. The attached
 reference implementation is a must-read for anyone attempting to write
 their own driver for the SVGA device.
 For simplicity and OS-neutrality, all examples compile to floppy disk
 images which execute "on the bare metal" in a VMware virtual
 machine. There are no run-time dependencies. At compile-time, most of
 the examples only require a GNU toolchain (GCC and Binutils). Some of
 the examples require Python at compile-time.
 Each example will generate a .vmx virtual machine configuration file
 which can be used to boot it in VMware Workstation or Fusion.
 The included example code focuses on advanced features of the SVGA
 device, such as 3D and synchronization primitives. There are also a
 couple examples that demonstrate 3D graphics and video overlays.
 For more examples of basic 2D usage, the Xorg driver is also a good
 reference.
 Header files and reference implementation files in 'lib':
 * svga_reg.h
   SVGA register definitions, SVGA capabilities, and FIFO command
   definitions.
 * svga_overlay.h
   Definitions required to use the SVGA device's hardware video overlay
   support.
 * svga_escape.h
   A list of definitions for the SVGA Escape commands, a way to send
   arbitrary data over the SVGA command FIFO. Escapes are used for video
   overlays, for vendor-specific extensions to the SVGA device, and for
   various tools internal to VMware.
 * svga3d_reg.h
   Defines the SVGA3D protocol, the set of FIFO commands used for hardware
   3D acceleration.
 * svga3d_shaderdefs.h
   Defines the bytecode format for SVGA3D shaders. This is used for
   accelerated 3D with programmable vertex and pixel pipelines.
 * svga.c
   Reference implementation of low-level SVGA device functionality.
   This contains sample code for device initialization, safe and
   efficient FIFO writes, and various synchronization primitives.
 * svga3d.c
   Reference implementation for the SVGA3D protocol. This file
   uses the FIFO primitives in svga.c to speak the SVGA3D protocol.
   Includes a lot of in-line documentation.
 * svga3dutil.c
   This is a collection of high-level utilities which provide usage
   examples for svga3d.c and svga.c, and which demonstrate common
   SVGA3D idioms.
 3.  Virtual Hardware Overview
 -----------------------------
 The VMware SVGA Device is a virtual PCI device. It provides the
 following low-level hardware features, which are used to implement
 various feature-specific protocols for 2D, 3D, and video overlays:
 * I/O space, at PCI Base Address Register 0 (BAR0)
 There are only a few I/O ports. Besides the ports used to access
 registers, these are generally either legacy features, or they are for
 I/O which is performance critical but may have side-effects. (Such as
 clearing IRQs after they occur.)
 * Registers, accessed indirectly via INDEX and VALUE I/O ports.
 The device's register space is the principal method by which
 configuration takes place. In general, registers are for actions which
 may have side-effects and which take place synchronously with the CPU.
 * Guest Framebuffer (BAR1)
 The SVGA device itself owns a variable amount of "framebuffer" memory,
 up to a maximum of 128MB. This memory size is fixed at power-on. The
 memory exists outside of the virtual machine's "main memory", and it's
 mapped into PCI space via BAR1. The size of this framebuffer may be
 determined either by probing BAR1 in typical PCI fashion, or by
 reading SVGA_REG_FB_SIZE.
 The beginning of framebuffer memory is reserved for the 2D
 framebuffer. The rest of framebuffer memory may be used as buffer
 space for DMA operations.
 * Command FIFO (BAR2)
 The SVGA device can be thought of as a co-processor which executes
 commands asynchronously with the virtual machine's CPU. To enqueue
 commands for this coprocessor, the SVGA device uses another
 device-owned memory region which is mapped into PCI space.
 The command FIFO is usually much smaller than the framebuffer.  While
 the framebuffer usually ranges from 4MB to 128MB, the FIFO ranges in
 size from 256KB to 2MB. Like the framebuffer, the FIFO size is fixed
 at power-on. The FIFO is mapped via PCI BAR2.
 * FIFO Registers
 The beginning of FIFO memory is reserved for a set of "registers".
 Some of these are used to implement the FIFO command queueing
 protocol, but many of these are used for other purposes. The main
 difference between FIFO registers and non-FIFO registers is that FIFO
 registers are backed by normal RAM whereas non-FIFO registers require
 I/O operations to access. This means that only non-FIFO registers can
 have side-effects, but FIFO registers are much more efficient when
 side-effects aren't necessary.
 The FIFO register space is variable-sized. The driver is responsible
 for partitioning FIFO memory into register space and command space.
 * Synchronization Primitives
 Conceptually, the part of the SVGA device which processes FIFO
 commands can be thought of as a coprocessor or a separate thread of
 execution. The virtual machine may need to:
 - Wake up the FIFO processor when it's sleeping, to ensure that
   new commands are processed with low latency. (FIFO doorbell)
 - Check whether a previously enqueued FIFO command has been
   processed. (FIFO fence)
 - Wait until the FIFO processor has passed a particular
   command. (Sync to fence)
 - Wait until more space is available in the FIFO. (Wait for
   FIFO progress)
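
 The "FIFO fence" check above can be sketched as a comparison of two
 32-bit counters. This is a minimal sketch, not the reference driver's
 code: the function names are hypothetical, and it assumes that fence
 values are 32-bit IDs which increase over time and eventually wrap,
 with 0 reserved to mean "no fence". See svga.c for the real thing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: hand out the next fence ID.
 * Assumption: 0 is reserved to mean "no fence", so it is skipped
 * when the counter wraps.
 */
uint32_t fence_alloc(uint32_t *last)
{
    uint32_t fence;

    assert(last != 0);
    fence = *last + 1;
    if (fence == 0) {
        fence = 1;
    }
    *last = fence;
    return fence;
}

/*
 * Hypothetical helper: has the FIFO processor reached 'fence'?
 * 'completed' is the most recent fence value reported by the device.
 * The subtraction is done modulo 2^32 and then reinterpreted as
 * signed, so the answer stays correct across counter wraparound as
 * long as the two values are less than 2^31 apart. (This relies on
 * the usual two's complement conversion.)
 */
bool fence_has_passed(uint32_t completed, uint32_t fence)
{
    return (int32_t)(completed - fence) >= 0;
}
```

 A plain "completed >= fence" would give the wrong answer for the
 short window after the counter wraps, which is why the signed
 difference is used here.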
 * Interrupts (Workstation 6.5 virtual machines and later only)
 On virtual machines which have been upgraded to Workstation 6.5 virtual
 hardware, the SVGA device provides an IRQ which can be used to notify
 the virtual machine when a synchronization event occurs. This allows
 implementing operations like "Sync to fence" without interfering with
 a virtual machine's ability to multitask.
 On older virtual hardware versions, the SVGA device only supports a
 "legacy sync" mechanism, in which a particular register access has the
 side-effect of waiting for host FIFO processing to occur.  This older
 mechanism completely halts the virtual machine's CPU while the FIFO is
 being processed.
 * Physical VRAM
 The VMware SVGA device provides management of physical VRAM resources
 via "surface objects"; however, physical VRAM is never directly visible
 to the virtual machine. Physical VRAM can only be accessed via DMA
 transfers.
 Note that framebuffer memory is simply a convenient place to put DMA
 buffers. Even if a virtual machine only has 16MB of framebuffer memory
 allocated to it, it could be using gigabytes of physical VRAM if that
 memory is available to the physical GPU.
 * DMA engine
 The VMware SVGA device can asynchronously transfer surface data
 between physical VRAM and guest-visible memory. This guest-visible
 memory could be part of framebuffer memory, or it could be part of
 guest system memory.
 The DMA engine uses a "Guest Pointer" abstraction to refer to any
 guest-visible memory. Guest pointers consist of an offset and a Guest
 Memory Region (GMR) ID. There is a pre-defined GMR which refers to
 framebuffer memory. The virtual machine can create additional GMRs to
 refer to regions of system memory which may or may not be physically
 contiguous.
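
 To make the "Guest Pointer" idea concrete, here is a small model of
 how a (GMR ID, offset) pair could be resolved to memory. Everything
 here is illustrative: the struct layouts, the names, and the numeric
 value used for the framebuffer GMR are assumptions, and each region
 is modelled as one contiguous buffer even though real GMRs may not
 be. The authoritative definitions are in svga_reg.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: ID of the pre-defined GMR that refers to framebuffer
 * memory. Take the real constant from svga_reg.h. */
#define EX_GMR_FRAMEBUFFER 0xFFFFFFFEu

/* Illustrative guest pointer: which region, and where inside it. */
typedef struct {
    uint32_t gmr_id;
    uint32_t offset;
} ExGuestPtr;

/* Simplified region descriptor: one contiguous buffer per GMR. */
typedef struct {
    uint32_t gmr_id;
    uint8_t *base;
    uint32_t size;
} ExRegion;

/*
 * Resolve a guest pointer to 'len' bytes of memory. Returns NULL if
 * the region ID is unknown or the range does not fit in the region.
 */
uint8_t *guest_ptr_resolve(const ExRegion *regions, size_t count,
                           ExGuestPtr ptr, uint32_t len)
{
    size_t i;

    assert(regions != NULL || count == 0);
    for (i = 0; i < count; i++) {
        const ExRegion *r = &regions[i];
        if (r->gmr_id != ptr.gmr_id) {
            continue;
        }
        /* Written this way so the bounds check cannot overflow. */
        if (ptr.offset > r->size || len > r->size - ptr.offset) {
            return NULL;
        }
        return r->base + ptr.offset;
    }
    return NULL;
}
```

 The point of the abstraction is that DMA commands never carry raw
 addresses, only a region ID plus an offset that can be range-checked.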
 4.  2D Graphics Model
 ---------------------
 Conceptually, the 2D portion of the VMware SVGA device is a compositor
 which displays a user-visible image composed of several planes. From
 back to front, those planes are:
  - The 2D framebuffer
  - 3D regions
  - Video overlay regions
  - The virtual hardware mouse cursor ("cursor bypass")
  - The physical hardware mouse cursor ("host cursor")
 It is important to note that host-executed 2D graphics commands do not
 necessarily modify the 2D framebuffer; they may write directly to the
 physical display or display window. Like a physical video card, the
 VMware SVGA device's framebuffer is never modified by a mouse cursor
 or video overlay. Unlike a physical video card, however, 3D display
 regions in the VMware SVGA device may or may not modify the 2D
 framebuffer.
 The following basic 2D operations are available:
 * Update
 Redraw a portion of the screen, using data from the 2D
 framebuffer. Any update rectangles are subtracted from the set of
 on-screen 3D regions, so 2D updates always overwrite 3D regions. 2D
 updates still appear behind video overlays and mouse cursors.
 An update command must be sent any time the driver wishes to make
 changes to the 2D framebuffer available. The user-visible screen is
 not guaranteed to update unless an explicit update command is sent.
 Also note that the SVGA device is allowed to read the 2D framebuffer
 even if no update command has been sent. For example, if the virtual
 machine is running in a partially obscured window, the SVGA device
 will read the 2D framebuffer immediately when the window is uncovered
 in order to draw the newly visible portion of the VM's window.
 This means that the virtual machine must not treat the 2D framebuffer
 as a back-buffer. It must contain a completely rendered image at all
 times.
 There is not yet any way to synchronize updates with the vertical
 refresh. Current VMware SVGA devices may suffer from tearing
 artifacts.
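
 As a rough illustration of sending an update rectangle, the sketch
 below appends one Update command to a FIFO held in an ordinary
 array. It is deliberately simplified and is not the reference
 driver's code. The names are hypothetical, and the command opcode,
 the dword order (opcode, x, y, width, height), and the full/empty
 convention are assumptions to be checked against svga_reg.h and
 svga.c. A real driver must also wait for the host when the FIFO is
 full instead of giving up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed dword indices of the first four FIFO registers. */
enum { EX_FIFO_MIN = 0, EX_FIFO_MAX = 1, EX_FIFO_NEXT_CMD = 2, EX_FIFO_STOP = 3 };

/* Assumed opcode for the Update command. */
enum { EX_CMD_UPDATE = 1 };

/*
 * Bytes that can be written before the FIFO is full. One dword is
 * always left unused so that "full" and "empty" look different.
 */
uint32_t fifo_free_bytes(const volatile uint32_t *mem)
{
    uint32_t min = mem[EX_FIFO_MIN];
    uint32_t max = mem[EX_FIFO_MAX];
    uint32_t next = mem[EX_FIFO_NEXT_CMD];  /* written by the driver */
    uint32_t stop = mem[EX_FIFO_STOP];      /* advanced by the device */
    uint32_t size = max - min;
    uint32_t used = (next >= stop) ? next - stop : size - (stop - next);

    return size - used - 4;
}

/*
 * Enqueue an update rectangle. Returns false, writing nothing, if
 * there is not enough room for the whole command.
 */
bool fifo_update(volatile uint32_t *mem, uint32_t x, uint32_t y,
                 uint32_t width, uint32_t height)
{
    const uint32_t words[5] = { EX_CMD_UPDATE, x, y, width, height };
    uint32_t next = mem[EX_FIFO_NEXT_CMD];
    size_t i;

    assert(next % 4 == 0);
    if (fifo_free_bytes(mem) < sizeof words) {
        return false;
    }
    for (i = 0; i < 5; i++) {
        mem[next / 4] = words[i];
        next += 4;
        if (next == mem[EX_FIFO_MAX]) {
            next = mem[EX_FIFO_MIN];        /* wrap to start of command area */
        }
        mem[EX_FIFO_NEXT_CMD] = next;       /* publish after the dword is stored */
    }
    return true;
}
```

 All offsets are in bytes, so array indices are the offset divided by
 four. The driver would call something like this after it finishes
 drawing into the 2D framebuffer.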
 * 2D acceleration operations
 These include fills, copies, and various kinds of blits. All 2D
 acceleration operations happen directly on the user-visible screen,
 not in 2D framebuffer memory.
 Use of the 2D acceleration operations is encouraged only in very
 limited circumstances, such as moving or scrolling windows. Mixing
 accelerated and unaccelerated 2D operations is
 difficult to implement properly, and incurs a significant
 synchronization penalty.
 * Present 3D surface
 "Present" is an SVGA3D command which copies a finished image from an
 SVGA3D surface to the user-visible screen. It may or may not update
 the 2D framebuffer in the process.
 Present commands effectively create a 3D overlay on top of part of the
 2D framebuffer. This overlay can be overwritten by Update commands or
 by other Present commands.
 Present is the only way in which the 2D and 3D portions of the VMware
 SVGA device interact.
 * Video overlay operations
 The SVGA device defines a group of virtual "video overlay units", each
 of which can color-convert, scale, and display a frame of YUV video
 overlaid with the 2D framebuffer. Overlay units each have a set of
 virtual registers which are configured using the commands in
 svga_overlay.h.
 * Virtual mouse cursor operations
 The virtual mouse cursor is an overlay which shows the SVGA device's
 current cursor image at a particular location. It may not be
 hardware-accelerated by the physical machine, and it does not
 necessarily correspond with the position of the user's physical mouse.
 There are three "Cursor Bypass" mechanisms by which the virtual
 machine can set the position of the virtual mouse cursor. Cursor
 bypass 1 did not follow the overlay model described above, and it has
 long been obsolete. Cursor bypass 2 and 3 are functionally equivalent,
 except that cursor bypass 2 operates via non-FIFO registers and cursor
 bypass 3 operates via FIFO registers. If cursor bypass 3 is supported
 (SVGA_FIFO_CAP_CURSOR_BYPASS_3), it should be used instead of cursor
 bypass 2.
 For all forms of cursor bypass, the cursor image is defined by
 SVGA_CMD_DEFINE_CURSOR.
 * Physical mouse cursor operations
 The virtual machine does not define the location of the physical mouse
 cursor, but it can define the cursor image and hide/show it. It does
 so using the SVGA_CMD_DEFINE_CURSOR and SVGA_CMD_DISPLAY_CURSOR
 commands.
 5.  3D Graphics Model
 ---------------------
 The VMware SVGA device supports hardware-independent accelerated 3D
 graphics via the "SVGA3D" protocol. This is a set of extended FIFO
 commands. SVGA3D utilizes the same underlying FIFO and synchronization
 primitives as the 2D portion of the SVGA device, but the 2D and 3D
 portions of the device are largely independent.
 The SVGA3D protocol is relatively high-level. The device is
 responsible for tracking render state among multiple contexts, for
 managing physical VRAM, and for implementing both fixed-function and
 programmable vertex and pixel processing.
 The SVGA3D protocol is designed to be vendor- and API-neutral, but for
 convenience it has been designed to be compatible with Direct3D in
 most places. The shader bytecode is fully binary-compatible with
 Direct3D bytecode, and most render states are identical to those
 defined by Direct3D.
 Note that the VMware SVGA device still supports 3D acceleration on all
 operating systems that VMware products run on. Internally, hardware
 accelerated 3D is implemented on top of the OpenGL graphics API.
 To summarize the SVGA3D device's design:
 * SVGA3D is an extension to the VMware SVGA device's command FIFO
   protocol.
 * In some ways it looks like a graphics API:
      o SVGA3D device manages all physical VRAM allocation.
      o High-level render states, relatively high-level shader bytecode.
 * In some ways it looks like hardware:
      o All commands are executed asynchronously.
      o Driver must track memory ownership, schedule DMA transfers.
      o All physical VRAM represented by generic "Surface" objects
 * Supports both fixed-function and programmable vertex and fragment
   pipelines.
 6. Overview of SVGA3D features
 ------------------------------
 * Capabilities
      o Extensible key/value pair list describes the SVGA3D device's
        capabilities.
      o Number of texture image units, max texture size, number of
        lights, texture formats, etc.
 * Surfaces
      o Formats: 8-bit RGB/RGBA, 16-bit RGB/RGBA, depth, packed
        depth/stencil, luminance/alpha, DXT compressed, signed, floating
        point, etc.
      o Supports 3D (volume) textures and cube maps.
      o Surfaces are also used as vertex and index buffers.
      o Generic DMA blits between surfaces and system memory or
        offscreen "virtual VRAM".
      o Generic surface-to-surface blits, with and without scaling.
 * Contexts
      o Surfaces are global, other objects are per-context, render
        states are per-context.
      o Commands to create/delete contexts.
 * Render State (Mostly Direct3D-style)
      o Matrices
      o Texture stage states: Filtering, combiners, LOD, gamma
        correction, etc.
      o Stencil, depth, culling, blending, lighting, materials, etc.
 * Render Targets
      o Few restrictions on which surfaces can be used as render
        targets (More lenient than OpenGL FBOs)
      o Supports depth, stencil, color buffer(s)
 * Present
      o The "present" operation is a blit from an SVGA3D surface back
        to the user-visible screen.
      o May or may not update the guest-visible 2D framebuffer.
 * Occlusion queries
      o Submitted via FIFO commands
      o Results returned asynchronously: a results structure is filled
        in via DMA.
 * Shaders
      o We define an "SVGA3D bytecode", which is binary-compatible
        with Direct3D's shader bytecode.
      o SVGA3D may define extensions to the bytecode format in the future.
 * Drawing
      o A single generic "draw primitives" command performs a list of
        rendering operations from a list of vertex buffers.
      o Index buffer is optional.
      o Similar to drawing with OpenGL vertex arrays and VBOs.
 7. Programming the VMware SVGA Device
 -------------------------------------
 1. Reading/writing a register:
    The SVGA registers are addressed by an index/value pair of 32 bit
    registers in the IO address space.
    The 0710 VMware SVGA chipset (PCI device ID PCI_DEVICE_ID_VMWARE_SVGA) has
    its index and value ports hardcoded at:
        index: SVGA_LEGACY_BASE_PORT + 4 * SVGA_INDEX_PORT
        value: SVGA_LEGACY_BASE_PORT + 4 * SVGA_VALUE_PORT
    The 0405 VMware SVGA chipset (PCI device ID PCI_DEVICE_ID_VMWARE_SVGA2)
    determines its index and value ports as a function of the first base
    address register in its PCI configuration space as:
        index: <Base Address Register 0> + SVGA_INDEX_PORT
        value: <Base Address Register 0> + SVGA_VALUE_PORT
    To read a register:
        Set the index port to the index of the register, using a dword OUT
        Do a dword IN from the value port
    To write a register:
        Set the index port to the index of the register, using a dword OUT
        Do a dword OUT to the value port
    Example, setting the width to 1024:
        mov     eax, SVGA_REG_WIDTH
        mov     edx, <SVGA Index Port>
        out     dx, eax
        mov     eax, 1024
        mov     edx, <SVGA Value Port>
        out     dx, eax
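
 The same index/value sequence can be sketched in C. So that it runs
 anywhere, this version talks to a tiny simulated device instead of
 real I/O ports. The names (SimDevice, port_out32, port_in32,
 reg_write, reg_read) and the numeric port offsets and register index
 are stand-ins; on real hardware the two port functions would be
 dword OUT and IN instructions, and the constants come from
 svga_reg.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the values defined in svga_reg.h. */
enum { EX_INDEX_PORT = 0, EX_VALUE_PORT = 1 };
enum { EX_REG_WIDTH = 2, EX_NUM_REGS = 16 };

/* Simulated device: a latched index plus a small register file. */
typedef struct {
    uint32_t index;
    uint32_t regs[EX_NUM_REGS];
} SimDevice;

SimDevice sim;

/* Stand-in for a dword OUT to (I/O base + port). */
void port_out32(uint32_t port, uint32_t value)
{
    if (port == EX_INDEX_PORT) {
        sim.index = value;
    } else if (port == EX_VALUE_PORT && sim.index < EX_NUM_REGS) {
        sim.regs[sim.index] = value;
    }
}

/* Stand-in for a dword IN from (I/O base + port). */
uint32_t port_in32(uint32_t port)
{
    if (port == EX_VALUE_PORT && sim.index < EX_NUM_REGS) {
        return sim.regs[sim.index];
    }
    return 0;
}

/* Write a register: select it via the index port, then write the value. */
void reg_write(uint32_t index, uint32_t value)
{
    assert(index < EX_NUM_REGS);
    port_out32(EX_INDEX_PORT, index);
    port_out32(EX_VALUE_PORT, value);
}

/* Read a register: select it via the index port, then read the value. */
uint32_t reg_read(uint32_t index)
{
    assert(index < EX_NUM_REGS);
    port_out32(EX_INDEX_PORT, index);
    return port_in32(EX_VALUE_PORT);
}
```

 With these helpers, the assembly example above becomes
 reg_write(EX_REG_WIDTH, 1024). Because the index is latched between
 the two port accesses, a driver that touches registers from more
 than one context would need to keep each index/value pair together.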
 2. Initialization
    Check the version number
     loop:
      Write into SVGA_REG_ID the maximum SVGA_ID_* the driver supports.
      Read from SVGA_REG_ID.
       Check if it is the value you wrote.
        If yes, VMware SVGA device supports it
        If no, decrement SVGA_ID_* and goto loop
     This algorithm converges.
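
 The negotiation loop can be sketched as follows, again against a
 simulated device so that the code is runnable. The ID encoding and
 all names are assumptions for illustration (the real SVGA_ID_*
 values are in svga_reg.h), and the simulated device simply ignores
 writes of IDs it does not support, which is only a guess at how such
 a device could behave.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed ID encoding; confirm against svga_reg.h. */
#define EX_MAGIC        0x900000u
#define EX_MAKE_ID(ver) ((EX_MAGIC << 8) | (ver))
#define EX_ID_INVALID   0xFFFFFFFFu

/* Simulated device state: newest version it supports, and its ID register. */
uint32_t sim_max_version = 2;
uint32_t sim_id_reg = EX_MAKE_ID(0);

/* Simulated write to the ID register: unsupported IDs are ignored. */
void sim_write_id(uint32_t id)
{
    uint32_t v;

    for (v = 0; v <= sim_max_version; v++) {
        if (id == EX_MAKE_ID(v)) {
            sim_id_reg = id;
            return;
        }
    }
}

/* Simulated read of the ID register. */
uint32_t sim_read_id(void)
{
    return sim_id_reg;
}

/*
 * Start at the newest version the driver supports and step down
 * until the device echoes back the ID that was written. Returns the
 * negotiated ID, or EX_ID_INVALID if no version was accepted.
 */
uint32_t negotiate_version(uint32_t driver_max_version)
{
    uint32_t ver = driver_max_version;

    assert(driver_max_version < 0x100u);
    for (;;) {
        uint32_t id = EX_MAKE_ID(ver);

        sim_write_id(id);
        if (sim_read_id() == id) {
            return id;
        }
        if (ver == 0) {
            return EX_ID_INVALID;
        }
        ver--;
    }
}
```

 The loop terminates because the version number strictly decreases
 and there is an explicit failure exit at version 0.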
    Map the frame buffer and the command FIFO
        Read SVGA_REG_FB_START, SVGA_REG_FB_SIZE, SVGA_REG_MEM_START,
        SVGA_REG_MEM_SIZE.
        Map the frame buffer (FB) and the FIFO memory (MEM).
        This step must occur after the version negotiation above, since by
        default the device is in a legacy-compatibility mode in which there
        is no command FIFO.
    Get the device capabilities and frame buffer dimensions
        Read SVGA_REG_CAPABILITIES, SVGA_REG_MAX_WIDTH, SVGA_REG_MAX_HEIGHT,
        and SVGA_REG_HOST_BITS_PER_PIXEL / SVGA_REG_BITS_PER_PIXEL.
        Note: The capabilities can and do change without the PCI device ID
        changing or the SVGA_REG_ID changing.  A driver should always check
        the capabilities register when loading before expecting any
        capabilities-determined feature to be available.  See below for a list
        of capabilities as of this writing.
        Note: If SVGA_CAP_8BIT_EMULATION is not set, then it is possible that
        SVGA_REG_HOST_BITS_PER_PIXEL does not exist and
        SVGA_REG_BITS_PER_PIXEL should be read instead.
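
 The second note above amounts to a capability-bit test. A minimal
 sketch, with hypothetical names and an assumed bit value for
 SVGA_CAP_8BIT_EMULATION (check svga_reg.h for the real one):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit value; confirm against svga_reg.h. */
#define EX_CAP_8BIT_EMULATION 0x00000100u

/* Which register the driver should read for the host's pixel depth. */
typedef enum {
    EX_USE_HOST_BITS_PER_PIXEL,
    EX_USE_BITS_PER_PIXEL
} ExBppRegister;

/* True if every bit in 'flag' is set in the capabilities value. */
bool cap_is_set(uint32_t caps, uint32_t flag)
{
    assert(flag != 0);
    return (caps & flag) == flag;
}

/*
 * Without 8-bit emulation, the host bits-per-pixel register may not
 * exist, so fall back to the plain bits-per-pixel register.
 */
ExBppRegister choose_bpp_register(uint32_t caps)
{
    return cap_is_set(caps, EX_CAP_8BIT_EMULATION)
        ? EX_USE_HOST_BITS_PER_PIXEL
        : EX_USE_BITS_PER_PIXEL;
}
```

 The same cap_is_set() test applies to any other feature that is
 gated on the capabilities register.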
    Optional: Report the Guest Operating System
        Write SVGA_REG_GUEST_ID with the appropriate value from <guest_os.h>.
        While not required in any way, this is useful information for the
        virtual machine to have available for reporting and sanity checking
        purposes.
    SetMode
        Set SVGA_REG_WIDTH, SVGA_REG_HEIGHT, SVGA_REG_BITS_PER_PIXEL
        Read SVGA_REG_FB_OFFSET
        (SVGA_REG_FB_OFFSET is the offset from SVGA_REG_FB_START of the
         visible portion of the frame buffer)
        Read SVGA_REG_BYTES_PER_LINE, SVGA_REG_DEPTH, SVGA_REG_PSEUDOCOLOR,
        SVGA_REG_RED_MASK, SVGA_REG_GREEN_MASK, SVGA_REG_BLUE_MASK
        Note: SVGA_REG_BITS_PER_PIXEL is readonly if
        SVGA_CAP_8BIT_EMULATION is not set in the capabilities register.  Even
        if it is set, values other than 8 and SVGA_REG_HOST_BITS_PER_PIXEL
        will be ignored.
    Enable SVGA
        Set SVGA_REG_ENABLE to 1
        (to disable SVGA, set SVGA_REG_ENABLE to 0.  Setting SVGA_REG_ENABLE
        to 0 also enables VGA.)
    Initialize the command FIFO
        The FIFO is exclusively dword (32-bit) aligned.  The first four
        dwords define the portion of the MEM area that is used for the
        command FIFO.  These values are all byte offsets from the
        start of the MEM area.
        A minimum sized FIFO would have these values:
            mem[SVGA_FIFO_MIN] = 16;
            mem[SVGA_FIFO_MAX] = 16 + (10 * 1024);
            mem[SVGA_FIFO_NEXT_CMD] = 16;
            mem[SVGA_FIFO_STOP] = 16;
        Various addresses near the beginning of the FIFO are defined as
        "FIFO registers" with special meaning. If the driver wishes to
        take advantage of the special meaning of these addresses rather
        than using them as part of the command FIFO, the driver must
        reserve space for these registers when setting up the FIFO.
        Typically the driver will set MIN to SVGA_FIFO_NUM_REGS*4.
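        A sketch of a FIFO setup that reserves space for the FIFO
        registers and uses the rest of MEM for commands. The dword indices
        follow the order in which the four registers are listed above.
        SKETCH_FIFO_NUM_REGS is a placeholder for SVGA_FIFO_NUM_REGS, whose
        real value comes from svga_reg.h, and fifo_init is a hypothetical
        helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Dword indices of the first four FIFO registers. SKETCH_FIFO_NUM_REGS is
 * a placeholder; the real register count is defined in svga_reg.h.
 */
enum {
    SKETCH_FIFO_MIN = 0,
    SKETCH_FIFO_MAX,
    SKETCH_FIFO_NEXT_CMD,
    SKETCH_FIFO_STOP,
    SKETCH_FIFO_NUM_REGS = 32
};

/*
 * Initialize the FIFO control registers. 'mem' is the mapped MEM area and
 * 'mem_size' is its size in bytes. All stored values are byte offsets
 * from the start of MEM.
 */
static void fifo_init(volatile uint32_t *mem, uint32_t mem_size)
{
    uint32_t min = (uint32_t)(SKETCH_FIFO_NUM_REGS * sizeof(uint32_t));

    assert(mem_size > min && mem_size % sizeof(uint32_t) == 0);

    mem[SKETCH_FIFO_MIN]      = min;       /* commands start after the registers */
    mem[SKETCH_FIFO_MAX]      = mem_size;  /* use the rest of MEM for commands */
    mem[SKETCH_FIFO_NEXT_CMD] = min;       /* NEXT_CMD == STOP: the FIFO is empty */
    mem[SKETCH_FIFO_STOP]     = min;
}
```

        The device does not start reading the FIFO until
        SVGA_REG_CONFIG_DONE is set to 1.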
    Report the guest 3D version
        If your driver supports 3D, write the latest supported 3D
        version (SVGA3D_HWVERSION_CURRENT) to the
        SVGA_FIFO_GUEST_3D_HWVERSION register.
    Enable the command FIFO
        Set SVGA_REG_CONFIG_DONE to 1 after these values have been set.
        Note: Setting SVGA_REG_CONFIG_DONE to 0 will stop the device from
        reading the FIFO until it is reinitialized and SVGA_REG_CONFIG_DONE is
        set to 1 again.
 3. SVGA command FIFO protocol
    The FIFO is empty when SVGA_FIFO_NEXT_CMD == SVGA_FIFO_STOP.  The
    driver writes commands to the FIFO starting at the offset specified
    by SVGA_FIFO_NEXT_CMD, and then increments SVGA_FIFO_NEXT_CMD.
    The FIFO is full when SVGA_FIFO_NEXT_CMD is one word before SVGA_FIFO_STOP.
    When the FIFO becomes full, the driver must wait for space to become
    available. It can do this via various methods (busy-wait, legacy sync)
    but the preferred method is to use the FIFO_PROGRESS interrupt.
    The SVGA device does not guarantee that all of FIFO memory is valid
    at all times. The device is free to discard the contents of any memory
    which is not part of the active portion of the FIFO. The active portion
    of the FIFO is defined as the region with valid commands (starting
    at SVGA_FIFO_STOP and ending at SVGA_FIFO_NEXT_CMD) plus the reserved
    portion of the FIFO.
    By default, only one word of memory is 'reserved'. If the FIFO supports
    the SVGA_FIFO_CAP_RESERVE capability, the device supports reserving
    driver-defined amounts of memory. If both the device and driver support
    this operation, it's possible to write multiple words of data between
    updates to the FIFO control registers.
    The simplest way to use the FIFO is to write one word at a time, but the
    highest-performance way to use the FIFO is to reserve enough space for
    an entire command or group of commands, write the commands directly to
    FIFO memory, then "commit" the command(s) by updating the FIFO control
    registers.
    A reference implementation of this reserve/commit algorithm is provided
    in svga.c, in SVGA_FIFOReserve() and SVGA_FIFOCommit(). In the common
    case, this algorithm lets drivers assemble commands directly in FIFO
    memory without any additional copies or memory allocation.
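    The simplest method described above, writing one word at a time, could
    be sketched as follows. This is not the reference implementation from
    svga.c. The helper names are hypothetical, and the sketch returns false
    when the FIFO is full instead of waiting for the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { FIFO_MIN = 0, FIFO_MAX, FIFO_NEXT_CMD, FIFO_STOP };  /* dword indices */

/* Byte offset that follows 'offset' in the ring, wrapping from MAX to MIN. */
static uint32_t fifo_advance(const volatile uint32_t *fifo, uint32_t offset)
{
    offset += (uint32_t)sizeof(uint32_t);
    return (offset == fifo[FIFO_MAX]) ? fifo[FIFO_MIN] : offset;
}

/* Full when NEXT_CMD is one word before STOP, taking wraparound into account. */
static bool fifo_is_full(const volatile uint32_t *fifo)
{
    return fifo_advance(fifo, fifo[FIFO_NEXT_CMD]) == fifo[FIFO_STOP];
}

/*
 * Append one dword. Returns false instead of waiting when the FIFO is
 * full; a real driver would wait for the device to make progress and retry.
 */
static bool fifo_write_word(volatile uint32_t *fifo, uint32_t value)
{
    uint32_t next = fifo[FIFO_NEXT_CMD];

    assert(next % sizeof(uint32_t) == 0);
    if (fifo_is_full(fifo)) {
        return false;
    }
    fifo[next / sizeof(uint32_t)] = value;           /* write the data first... */
    fifo[FIFO_NEXT_CMD] = fifo_advance(fifo, next);  /* ...then publish it */
    return true;
}
```

    Because one slot always stays unused to distinguish "full" from
    "empty", a ring of N dwords holds at most N - 1 words of commands.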
 4. Synchronization
    The primary synchronization primitive defined by the SVGA device is
    "Sync to fence". A "fence" is a numbered marker inserted into the FIFO
    command stream. The driver can insert fences at any time, and efficiently
    determine the value of the last fence processed by the device.
    "Sync to fence" is the process of waiting for a particular fence to be
    processed. This may be important for several reasons:
       - Flow control. For interactivity, it is important to put an upper
         limit on the amount by which the device may lag the application.
       - Waiting for DMA completion. If the driver needs to recycle a DMA
         buffer or complete a DMA operation synchronously, it must sync
         to a fence which occurred after the DMA operation in the command
         stream.
       - Waiting for accelerated 2D operations. If a 2D driver needs to
         write to a portion of the framebuffer which is affected by
         an accelerated blit, it should sync to a fence which occurred
         after the blit.
    There are multiple possible implementations of Sync to Fence, depending
    on the capabilities of the SVGA device you're driving. Very old versions
    of the VMware SVGA device did not support fences at all. For these
    devices, you must always perform a "legacy sync". New virtual machines
    with Workstation 6.5 virtual hardware or later support an IRQ-driven
    sync operation. For all other versions of the SVGA device, the best
    approach is a hybrid in which you synchronously use the SYNC/BUSY
    registers to process the FIFO until the sync has passed.
    FIFO synchronization is a very complex topic, and it isn't covered fully
    by this document. Please see the synchronization-related comments in
    svga_reg.h, and the reference implementation of these primitives in
    svga.c.
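    As a small illustration of the bookkeeping involved, the sketch below
    allocates fence numbers and tests whether a fence has been processed.
    Reserving 0 to mean "no fence" and comparing with a signed difference
    are assumptions of this sketch, not documented device behavior. See
    svga.c for the reference implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t next_fence = 1;   /* 0 is reserved here to mean "no fence" */

/* Allocate the next fence number, skipping the reserved value 0. */
static uint32_t fence_alloc(void)
{
    uint32_t fence = next_fence++;

    if (next_fence == 0) {
        next_fence = 1;
    }
    assert(fence != 0);
    return fence;
}

/*
 * True if 'fence' has been processed, given the last fence value reported
 * by the device. The signed difference keeps the comparison correct when
 * the 32-bit counter wraps around.
 */
static bool fence_has_passed(uint32_t last_processed, uint32_t fence)
{
    if (fence == 0) {
        return true;    /* "no fence" never needs waiting */
    }
    return (int32_t)(last_processed - fence) >= 0;
}
```

    A busy-wait "sync to fence" would then poll the device's last
    processed fence value until fence_has_passed() returns true.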
 5. Cursor
    When SVGA_CAP_CURSOR is set, hardware cursor support is available.  In
    practice, SVGA_CAP_CURSOR will only be set when SVGA_CAP_CURSOR_BYPASS is
    also set, so drivers supporting a hardware cursor need only handle
    SVGA_CAP_CURSOR_BYPASS and should use the FIFO only to define the cursor.  See
    below for more information.
 6. Pseudocolor
    When the read-only register SVGA_REG_PSEUDOCOLOR is 1, the device is in a
    colormapped mode whose index width and color width are both SVGA_REG_DEPTH.
    Thus far, 8 is the only depth at which pseudocolor is ever used.
    In pseudocolor, the colormap is programmed by writing to the SVGA palette
    registers.  These start at SVGA_PALETTE_BASE and are interpreted as
    follows:
        SVGA_PALETTE_BASE + 3*n         - The nth red component
        SVGA_PALETTE_BASE + 3*n + 1     - The nth green component
        SVGA_PALETTE_BASE + 3*n + 2     - The nth blue component
    And n ranges from 0 to ((1<<SVGA_REG_DEPTH) - 1).
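    A sketch of programming the palette with a grayscale ramp. Register
    writes go to a mock array rather than to the index/value ports. The
    value of SKETCH_PALETTE_BASE is assumed to match SVGA_PALETTE_BASE in
    svga_reg.h, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of SVGA_PALETTE_BASE; check svga_reg.h. */
enum { SKETCH_PALETTE_BASE = 1024, SKETCH_PALETTE_ENTRIES = 256 };

/* Mock register file standing in for index/value port writes. */
static uint32_t mock_regs[SKETCH_PALETTE_BASE + 3 * SKETCH_PALETTE_ENTRIES];

static void mock_write_reg(uint32_t index, uint32_t value)
{
    assert(index < sizeof mock_regs / sizeof mock_regs[0]);
    mock_regs[index] = value;
}

/* Program palette entry n with the given red, green and blue components. */
static void palette_set(uint32_t n, uint8_t r, uint8_t g, uint8_t b)
{
    mock_write_reg(SKETCH_PALETTE_BASE + 3 * n,     r);
    mock_write_reg(SKETCH_PALETTE_BASE + 3 * n + 1, g);
    mock_write_reg(SKETCH_PALETTE_BASE + 3 * n + 2, b);
}

/* Load a grayscale ramp into all (1 << depth) entries; depth is 8 in practice. */
static void palette_load_grayscale(uint32_t depth)
{
    uint32_t n, count;

    assert(depth >= 1 && depth <= 8);
    count = 1u << depth;
    for (n = 0; n < count; n++) {
        uint8_t level = (uint8_t)(n * 255 / (count - 1));
        palette_set(n, level, level, level);
    }
}
```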
 7. Drawing to the screen
    After initialization, the driver can write directly to the frame
    buffer.  The updated frame buffer is not displayed immediately, but
    only when an update command is sent.  The update command
    (SVGA_CMD_UPDATE) defines the rectangle in the frame buffer that has
    been modified by the driver, and causes that rectangle to be updated
    on the screen.
    A complete driver can be developed this way.  For increased
    performance, additional commands are available to accelerate common
    operations.  The two most useful are SVGA_CMD_RECT_FILL and
    SVGA_CMD_RECT_COPY.
    After issuing an accelerated command, the FIFO should be sync'd, as
    described above, before writing to the frame buffer.
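    A sketch of how an update command might be encoded before it is copied
    into the FIFO. The command number and the argument order (x, y, width,
    height) are assumed to match the definitions in svga_reg.h, and
    encode_update is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed command number and argument count; check svga_reg.h. */
enum { SKETCH_CMD_UPDATE = 1, SKETCH_UPDATE_WORDS = 5 };

/*
 * Encode an update command for the rectangle (x, y, width, height) into
 * 'out'. Returns the number of dwords written. The caller is expected to
 * copy these dwords into the command FIFO.
 */
static size_t encode_update(uint32_t out[SKETCH_UPDATE_WORDS],
                            uint32_t x, uint32_t y,
                            uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);
    out[0] = SKETCH_CMD_UPDATE;
    out[1] = x;
    out[2] = y;
    out[3] = width;
    out[4] = height;
    return SKETCH_UPDATE_WORDS;
}
```

    The framebuffer writes for the rectangle should be complete before the
    command is placed in the FIFO, since the command tells the device
    which pixels to redisplay.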
 SVGA_REG_FB_OFFSET and SVGA_REG_BYTES_PER_LINE may change after SVGA_REG_WIDTH
 or SVGA_REG_HEIGHT is set.  Also the VGA registers must be written to after
 setting SVGA_REG_ENABLE to 0 to change the display to a VGA mode.
 8. Mode changes
    The video mode may be changed by writing to the WIDTH, HEIGHT,
    and/or BITS_PER_PIXEL registers again, after initialization. All of the
    registers listed in the 'SetMode' initialization section above
    should be reread afterwards. Additionally, when changing modes, it
    can be convenient to set SVGA_REG_ENABLE to 0, change
    SVGA_REG_WIDTH, SVGA_REG_HEIGHT, and SVGA_REG_BITS_PER_PIXEL (if
    available), and then set SVGA_REG_ENABLE to 1 again. This is
    optional, but it will avoid intermediate states in which only one
    component of the new mode has been set.
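    The sequence above can be sketched against a mock register file. The
    register indices are assumed to match svga_reg.h. The mock's pitch
    calculation is made up, and a real device may report a different
    SVGA_REG_BYTES_PER_LINE, which is why the value is reread. The sketch
    also assumes SVGA_REG_BITS_PER_PIXEL is writable.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed register indices; check svga_reg.h. */
enum {
    REG_ENABLE = 1, REG_WIDTH = 2, REG_HEIGHT = 3,
    REG_BITS_PER_PIXEL = 7, REG_BYTES_PER_LINE = 12, REG_FB_OFFSET = 14,
    MOCK_NUM_REGS = 32
};

static uint32_t mock_regs[MOCK_NUM_REGS];

/* Mock register write: recomputes the pitch the way a simple device might. */
static void write_reg(uint32_t index, uint32_t value)
{
    assert(index < MOCK_NUM_REGS);
    mock_regs[index] = value;
    mock_regs[REG_BYTES_PER_LINE] =
        mock_regs[REG_WIDTH] * (mock_regs[REG_BITS_PER_PIXEL] / 8);
}

static uint32_t read_reg(uint32_t index)
{
    assert(index < MOCK_NUM_REGS);
    return mock_regs[index];
}

typedef struct {
    uint32_t width, height, bpp, pitch, fb_offset;
} ModeInfo;

/* Switch modes with SVGA disabled, then reread the derived registers. */
static ModeInfo change_mode(uint32_t width, uint32_t height, uint32_t bpp)
{
    ModeInfo mode;

    write_reg(REG_ENABLE, 0);             /* avoid half-programmed modes */
    write_reg(REG_WIDTH, width);
    write_reg(REG_HEIGHT, height);
    write_reg(REG_BITS_PER_PIXEL, bpp);
    write_reg(REG_ENABLE, 1);

    mode.width     = read_reg(REG_WIDTH);
    mode.height    = read_reg(REG_HEIGHT);
    mode.bpp       = read_reg(REG_BITS_PER_PIXEL);
    mode.pitch     = read_reg(REG_BYTES_PER_LINE);
    mode.fb_offset = read_reg(REG_FB_OFFSET);
    return mode;
}
```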
 9. Capabilities
 The capabilities register (SVGA_REG_CAPABILITIES) is an array of bits that
 indicates the capabilities of the SVGA emulation.  A driver should check
 SVGA_REG_CAPABILITIES every time it loads before relying on any feature that
 is only optionally available.
 XXX: There is also a capabilities register in the FIFO register space.
     It is not documented in this file, but all of the available bits
     are listed in svga_reg.h.
 Some of the capabilities determine which FIFO commands are available.  This
 table shows which capability indicates support for which command.
    FIFO Command                        Capability
    ------------                        ----------
    SVGA_CMD_RECT_FILL                  SVGA_CAP_RECT_FILL
    SVGA_CMD_RECT_COPY                  SVGA_CAP_RECT_COPY
    SVGA_CMD_DEFINE_BITMAP              SVGA_CAP_OFFSCREEN
    SVGA_CMD_DEFINE_BITMAP_SCANLINE     SVGA_CAP_OFFSCREEN
    SVGA_CMD_DEFINE_PIXMAP              SVGA_CAP_OFFSCREEN
    SVGA_CMD_DEFINE_PIXMAP_SCANLINE     SVGA_CAP_OFFSCREEN
    SVGA_CMD_RECT_BITMAP_FILL           SVGA_CAP_RECT_PAT_FILL
    SVGA_CMD_RECT_PIXMAP_FILL           SVGA_CAP_RECT_PAT_FILL
    SVGA_CMD_RECT_BITMAP_COPY           SVGA_CAP_RECT_PAT_FILL
    SVGA_CMD_RECT_PIXMAP_COPY           SVGA_CAP_RECT_PAT_FILL
    SVGA_CMD_FREE_OBJECT                SVGA_CAP_OFFSCREEN
    SVGA_CMD_RECT_ROP_FILL              SVGA_CAP_RECT_FILL +
                                            SVGA_CAP_RASTER_OP
    SVGA_CMD_RECT_ROP_COPY              SVGA_CAP_RECT_COPY +
                                            SVGA_CAP_RASTER_OP
    SVGA_CMD_RECT_ROP_BITMAP_FILL       SVGA_CAP_RECT_PAT_FILL +
                                            SVGA_CAP_RASTER_OP
    SVGA_CMD_RECT_ROP_PIXMAP_FILL       SVGA_CAP_RECT_PAT_FILL +
                                            SVGA_CAP_RASTER_OP
    SVGA_CMD_RECT_ROP_BITMAP_COPY       SVGA_CAP_RECT_PAT_FILL +
                                            SVGA_CAP_RASTER_OP
    SVGA_CMD_RECT_ROP_PIXMAP_COPY       SVGA_CAP_RECT_PAT_FILL +
                                            SVGA_CAP_RASTER_OP
    SVGA_CMD_DEFINE_CURSOR              SVGA_CAP_CURSOR
    SVGA_CMD_DISPLAY_CURSOR             SVGA_CAP_CURSOR
    SVGA_CMD_MOVE_CURSOR                SVGA_CAP_CURSOR
    SVGA_CMD_DEFINE_ALPHA_CURSOR        SVGA_CAP_ALPHA_CURSOR
    SVGA_CMD_DRAW_GLYPH                 SVGA_CAP_GLYPH
    SVGA_CMD_DRAW_GLYPH_CLIPPED         SVGA_CAP_GLYPH_CLIPPING
    SVGA_CMD_ESCAPE                     SVGA_FIFO_CAP_ESCAPE
    (NOTE: Many of the commands here are deprecated, and listed
           in the table only for reference. All commands for glyph,
           bitmap, and pixmap drawing are not implemented in the
           latest releases of VMware products.)
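 A sketch of checking capability bits before using a command. The bit
 values are assumed to match the SVGA_CAP_* definitions in svga_reg.h, and
 the helper names are hypothetical. Commands that list two capabilities in
 the table need both bits set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values of the SVGA_CAP_* flags; check svga_reg.h. */
#define CAP_RECT_FILL     0x00000001u
#define CAP_RECT_COPY     0x00000002u
#define CAP_RECT_PAT_FILL 0x00000004u
#define CAP_RASTER_OP     0x00000010u
#define CAP_CURSOR        0x00000020u

/* True only if every bit in 'required' is set in the capabilities value. */
static bool caps_has_all(uint32_t caps, uint32_t required)
{
    assert(required != 0);
    return (caps & required) == required;
}

/* SVGA_CMD_RECT_ROP_COPY needs both RECT_COPY and RASTER_OP, per the table. */
static bool can_use_rect_rop_copy(uint32_t caps)
{
    return caps_has_all(caps, CAP_RECT_COPY | CAP_RASTER_OP);
}
```

 The 'caps' argument would be the value read from SVGA_REG_CAPABILITIES
 when the driver loads.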
 Other capabilities indicate other functionality as described below:
    SVGA_CAP_CURSOR_BYPASS
        The hardware cursor can be drawn via SVGA registers, without requiring
        the FIFO to be synchronized.  The cursor may therefore be drawn before
        any outstanding unprocessed FIFO commands.
        Note:  Without SVGA_CAP_CURSOR_BYPASS_2, cursors drawn this way still
        appear in the guest's framebuffer and need to be turned off before any
        save under / overlapping drawing and turned back on after.  This can
        cause very noticeable cursor flicker.
    SVGA_CAP_CURSOR_BYPASS_2
        Instead of turning the cursor off and back on around any overlapping
        drawing, the driver can write SVGA_CURSOR_ON_REMOVE_FROM_FB and
        SVGA_CURSOR_ON_RESTORE_TO_FB to SVGA_REG_CURSOR_ON.  In almost all
        cases these are NOPs and the cursor will remain visible without
        appearing in the guest framebuffer.  In 'direct graphics' modes like
        Linux host fullscreen local displays, however, the cursor will still
        be drawn in the framebuffer, still flicker, and be drawn incorrectly
        if a driver does not use SVGA_CURSOR_ON_REMOVE_FROM_FB / RESTORE_TO_FB.
    SVGA_CAP_8BIT_EMULATION
        SVGA_REG_BITS_PER_PIXEL is writable and can be set to either 8 or
        SVGA_REG_HOST_BITS_PER_PIXEL.  Otherwise the only SVGA modes available
        inside a virtual machine must match the host's bits per pixel.
        Note: Some versions which lack SVGA_CAP_8BIT_EMULATION also lack the
        SVGA_REG_HOST_BITS_PER_PIXEL register, and a driver should assume
        SVGA_REG_BITS_PER_PIXEL is both read-only and initialized to the only
        available value if SVGA_CAP_8BIT_EMULATION is not set.
    SVGA_CAP_OFFSCREEN_1
        SVGA_CMD_RECT_FILL, SVGA_CMD_RECT_COPY, SVGA_CMD_RECT_ROP_FILL,
        SVGA_CMD_RECT_ROP_COPY can operate with a source or destination (or
        both) in offscreen memory.
        Usable offscreen memory is a rectangle located below the last scanline
        of the visible memory:
        x1 = 0
        y1 = (SVGA_REG_FB_SIZE + SVGA_REG_BYTES_PER_LINE - 1) /
             SVGA_REG_BYTES_PER_LINE
        x2 = SVGA_REG_BYTES_PER_LINE / SVGA_REG_DEPTH
        y2 = SVGA_REG_VRAM_SIZE / SVGA_REG_BYTES_PER_LINE
 Cursor Handling
 ---------------
 Several cursor drawing mechanisms are supported for legacy
 compatibility. The current mechanism, and the only one that new
 drivers need to support, is "Cursor Bypass 3".
 In Cursor Bypass 3 mode, the cursor image is defined via FIFO
 commands, but the cursor position and visibility are reported
 asynchronously by writing to FIFO registers.
 A driver defines an AND/XOR hardware cursor using
 SVGA_CMD_DEFINE_CURSOR to assign an ID and establish the AND and XOR
 masks with the hardware.  A driver uses SVGA_CMD_DEFINE_ALPHA_CURSOR
 to define a 32 bit mask whose top 8 bits are used to blend the cursor
 image with the pixels it covers.  Alpha cursor support is only
 available when SVGA_CAP_ALPHA_CURSOR is set. Note that alpha cursors
 use pre-multiplied alpha.
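 If your cursor image is stored with straight (non-premultiplied) alpha,
 it must be converted first. A minimal sketch of that conversion follows.
 It assumes a 0xAARRGGBB pixel layout, with alpha in the top 8 bits as
 described above, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scale an 8-bit color component by an 8-bit alpha, rounding to nearest. */
static uint8_t premultiply_component(uint8_t c, uint8_t a)
{
    return (uint8_t)((c * a + 127) / 255);
}

/* Convert a straight-alpha 0xAARRGGBB pixel to pre-multiplied alpha. */
static uint32_t premultiply_argb(uint32_t argb)
{
    uint8_t a = (uint8_t)(argb >> 24);
    uint8_t r = premultiply_component((uint8_t)(argb >> 16), a);
    uint8_t g = premultiply_component((uint8_t)(argb >> 8), a);
    uint8_t b = premultiply_component((uint8_t)argb, a);

    /* A pre-multiplied component can never exceed the alpha value. */
    assert(r <= a && g <= a && b <= a);

    return ((uint32_t)a << 24) | ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
}
```

 Fully opaque pixels are unchanged by this conversion, and fully
 transparent pixels become zero.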
 ---
--- a/examples/2dmark/Makefile
+++ b/examples/2dmark/Makefile
@ -0,0 +1,6 @@
 TARGET = 2dmark.img
 APP_SOURCES = main.c
 LIB_DIR = ../../lib
 include $(LIB_DIR)/Makefile.rules
--- a/examples/2dmark/main.c
+++ b/examples/2dmark/main.c
@ -0,0 +1,148 @@
 /*
 * Simple 2D graphics benchmark.
 *
 * The VMware SVGA device typically coalesces update rectangles and
 * processes them asynchronously. This makes it difficult to get
 * meaningful 2D benchmark numbers from tools which run inside a
 * normal guest OS.
 *
 * This tool sweeps through multiple 2D update sizes on multiple
 * video modes. After the test, results are summarized to the screen
 * (in VGA text mode) and to vmware.log.
 *
 * Copyright (C) 2008-2009 VMware, Inc. Licensed under the MIT
 * License, please see the README.txt. All rights reserved.
 */
 #include "svga.h"
 #include "intr.h"
 #include "console_vga.h"
 #include "vmbackdoor.h"
 #include "svga3dutil.h"
 struct {
   uint32 value;
   const char *label;
 } sizes[] = {
   { 1,    "    1" },
   { 8,    "    8" },
   { 64,   "   64" },
   { 233,  "  233" },  /* Prime */
   { 256,  "  256" },
   { 2048, " 2048" },
   { 2099, " 2099" },  /* Prime */
   { 4096, " 4096" },
 };
 /*
 * benchmarkAtSize --
 *
 *    Inner benchmarking loop, tests one combination of fb and update sizes.
 */
 static FPSCounterState *
 benchmarkAtSize(uint32 screen, uint32 update)
 {
   int i = 3;
   static FPSCounterState fps;
   memset(&fps, 0, sizeof fps);
   /* Clear the screen and change modes */
   memset(gSVGA.fbMem, 0x40, screen * screen * sizeof(uint32));
   SVGA_SetMode(screen, screen, 32);
   /* Make sure the FIFO is empty */
   SVGA_SyncToFence(SVGA_InsertFence());
   /*
    * UpdateFPSCounter returns TRUE each time its output is updated.
    * The first time, it won't have an FPS reading available yet. (It is
    * guaranteed to return TRUE on its first call.) The second time,
    * an FPS reading will be ready. We wait until the third time, in
    * order to give the readings extra time to stabilize.
    *
    * Note that the 'i--' part of this expression only executes when
    * UpdateFPSCounter returns TRUE.
    */
   do {
      /* Synchronously update the screen */
      SVGA_Update(0, 0, update, update);
      SVGA_SyncToFence(SVGA_InsertFence());
   } while (!SVGA3DUtil_UpdateFPSCounter(&fps) || i--);
   return &fps;
 }
 /*
 * runBenchmark --
 *
 *    Main benchmark loop. Run through all valid combinations
 *    of update and display sizes.
 */
 static void
 runBenchmark()
 {
   const int numSizes = sizeof sizes / sizeof sizes[0];
   int i, j;
   Console_WriteString("Synchronous 2D updates per second.\n"
                       "Video mode width/height on Y axis, update size on X axis.\n"
                       "\n");
   /* Size headings across the top of the screen */
   Console_WriteString("      | ");
   for (i = 0; i < numSizes; i++) {
      Console_WriteString("   ");
      Console_WriteString(sizes[i].label);
   }
   Console_WriteString("\n");
   for (i = 0; i < 79; i++) {
      Console_WriteString("-");
   }
   Console_WriteString("\n");
   for (i = 0; i < numSizes; i++) {
      Console_Format("%s | ", sizes[i].label);
      for (j = 0; j <= i; j++) {
         char *fps = benchmarkAtSize(sizes[i].value, sizes[j].value)->text;
         /* Hack to make the string shorter by cutting off "FPS" label. */
         fps[7] = '\0';
         Console_Format(" %s", fps);
      }
      Console_WriteString("\n");
   }
   Console_WriteString("\nBenchmark complete. Results are "
                       "also available in the VMX log.");
 }
 /*
 * main --
 *
 *    Initialization and results reporting.
 */
 int
 main(void)
 {
   Intr_Init();
   Intr_SetFaultHandlers(SVGA_DefaultFaultHandler);
   ConsoleVGA_Init();
   SVGA_Init();
   runBenchmark();
   SVGA_Disable();
   VMBackdoor_VGAScreenshot();
   return 0;
 }
--- a/examples/Makefile
+++ b/examples/Makefile
@ -0,0 +1,11 @@
 SUBDIRS = $(subst /Makefile,,$(wildcard */Makefile))
 .PHONY: subdirs clean $(SUBDIRS)
 subdirs: $(SUBDIRS)
 $(SUBDIRS):
 	$(MAKE) -C $@
 clean:
 	for dir in $(SUBDIRS); do $(MAKE) -C $$dir clean; done
--- a/examples/blit-cube/Makefile
+++ b/examples/blit-cube/Makefile
@ -0,0 +1,6 @@
 TARGET = blit-cube.img
 APP_SOURCES = main.c
 LIB_DIR = ../../lib
 include $(LIB_DIR)/Makefile.rules
--- a/examples/blit-cube/main.c
+++ b/examples/blit-cube/main.c
@ -0,0 +1,349 @@
 /*
 * SVGA3D example: Spinning cube, with various blit operations.
 *
 * Copyright (C) 2008-2009 VMware, Inc. Licensed under the MIT
 * License, please see the README.txt. All rights reserved.
 */
 #include "svga3dutil.h"
 #include "svga3dtext.h"
 #include "matrix.h"
 #include "math.h"
 typedef struct {
   float position[3];
   float texcoord[2];
   float color[3];
 } MyVertex;
 static const MyVertex vertexData[] = {
   { {-1, -1, -1}, { 0, 0 }, {0.5, 0.5, 0.5} },  /* -X */
   { {-1, -1,  1}, { 0, 1 }, {1.0, 1.0, 1.0} },
   { {-1,  1, -1}, { 1, 0 }, {0.5, 0.5, 0.5} },
   { {-1,  1,  1}, { 1, 1 }, {1.0, 1.0, 1.0} },
   { { 1, -1, -1}, { 0, 0 }, {0.5, 0.5, 0.5} },  /* +X */
   { { 1, -1,  1}, { 0, 1 }, {1.0, 1.0, 1.0} },
   { { 1,  1, -1}, { 1, 0 }, {0.5, 0.5, 0.5} },
   { { 1,  1,  1}, { 1, 1 }, {1.0, 1.0, 1.0} },
   { {-1, -1, -1}, { 0, 0 }, {0.5, 0.5, 0.5} },  /* -Y */
   { {-1, -1,  1}, { 0, 1 }, {1.0, 1.0, 1.0} },
   { { 1, -1, -1}, { 1, 0 }, {0.5, 0.5, 0.5} },
   { { 1, -1,  1}, { 1, 1 }, {1.0, 1.0, 1.0} },
   { {-1,  1, -1}, { 0, 0 }, {0.5, 0.5, 0.5} },  /* +Y */
   { {-1,  1,  1}, { 0, 1 }, {1.0, 1.0, 1.0} },
   { { 1,  1, -1}, { 1, 0 }, {0.5, 0.5, 0.5} },
   { { 1,  1,  1}, { 1, 1 }, {1.0, 1.0, 1.0} },
   { {-1, -1, -1}, { 0, 0 }, {0.5, 0.5, 0.5} },  /* -Z */
   { {-1,  1, -1}, { 0, 1 }, {1.0, 1.0, 1.0} },
   { { 1, -1, -1}, { 1, 0 }, {0.5, 0.5, 0.5} },
   { { 1,  1, -1}, { 1, 1 }, {1.0, 1.0, 1.0} },
   { {-1, -1,  1}, { 0, 0 }, {0.5, 0.5, 0.5} },  /* +Z */
   { {-1,  1,  1}, { 0, 1 }, {1.0, 1.0, 1.0} },
   { { 1, -1,  1}, { 1, 0 }, {0.5, 0.5, 0.5} },
   { { 1,  1,  1}, { 1, 1 }, {1.0, 1.0, 1.0} },
 };
 #define QUAD(a,b,c,d) a, b, d, d, c, a
 static const uint16 indexData[] = {
   QUAD(0,  1,  2,  3),  // -X
   QUAD(4,  5,  6,  7),  // +X
   QUAD(8,  9,  10, 11), // -Y
   QUAD(12, 13, 14, 15), // +Y
   QUAD(16, 17, 18, 19), // -Z
   QUAD(20, 21, 22, 23), // +Z
 };
 #undef QUAD
 const uint32 numTriangles = sizeof indexData / sizeof indexData[0] / 3;
 uint32 vertexSid, indexSid, textureSid;
 Matrix perspectiveMat;
 FPSCounterState gFPS;
 VMMousePacket lastMouseState;
 /*
 * render --
 *
 *   Set up render state, and draw our cube scene from static index
 *   and vertex buffers.
 *
 *   This render state only needs to be set each frame because
 *   SVGA3DText_Draw() changes it.
 */
 void
 render(void)
 {
   SVGA3dTextureState *ts;
   SVGA3dRenderState *rs;
   SVGA3dVertexDecl *decls;
   SVGA3dPrimitiveRange *ranges;
   static Matrix view;
   Matrix_Copy(view, gIdentityMatrix);
   Matrix_Scale(view, 0.5, 0.5, 0.5, 1.0);
   if (lastMouseState.buttons & VMMOUSE_LEFT_BUTTON) {
      Matrix_RotateX(view, lastMouseState.y *  0.0001);
      Matrix_RotateY(view, lastMouseState.x * -0.0001);
   } else {
      Matrix_RotateX(view, 30.0 * M_PI / 180.0);
      Matrix_RotateY(view, gFPS.frame * 0.01f);
   }
   Matrix_Translate(view, 0, 0, 2);
   SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_VIEW, view);
   SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_WORLD, gIdentityMatrix);
   SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_PROJECTION, perspectiveMat);
   SVGA3D_BeginSetRenderState(CID, &rs, 4);
   {
      rs[0].state     = SVGA3D_RS_BLENDENABLE;
      rs[0].uintValue = FALSE;
      rs[1].state     = SVGA3D_RS_ZENABLE;
      rs[1].uintValue = TRUE;
      rs[2].state     = SVGA3D_RS_ZWRITEENABLE;
      rs[2].uintValue = TRUE;
      rs[3].state     = SVGA3D_RS_ZFUNC;
      rs[3].uintValue = SVGA3D_CMP_LESS;
   }
   SVGA_FIFOCommitAll();
   SVGA3D_BeginSetTextureState(CID, &ts, 10);
   {
      ts[0].stage = 0;
      ts[0].name  = SVGA3D_TS_BIND_TEXTURE;
      ts[0].value = textureSid;
      ts[1].stage = 0;
      ts[1].name  = SVGA3D_TS_COLOROP;
      ts[1].value = SVGA3D_TC_MODULATE;
      ts[2].stage = 0;
      ts[2].name  = SVGA3D_TS_COLORARG1;
      ts[2].value = SVGA3D_TA_TEXTURE;
      ts[3].stage = 0;
      ts[3].name  = SVGA3D_TS_COLORARG2;
      ts[3].value = SVGA3D_TA_DIFFUSE;
      ts[4].stage = 0;
      ts[4].name  = SVGA3D_TS_ALPHAOP;
      ts[4].value = SVGA3D_TC_SELECTARG1;
      ts[5].stage = 0;
      ts[5].name  = SVGA3D_TS_ALPHAARG1;
      ts[5].value = SVGA3D_TA_DIFFUSE;
      ts[6].stage = 0;
      ts[6].name  = SVGA3D_TS_MINFILTER;
      ts[6].value = SVGA3D_TEX_FILTER_LINEAR;
      ts[7].stage = 0;
      ts[7].name  = SVGA3D_TS_MAGFILTER;
      ts[7].value = SVGA3D_TEX_FILTER_LINEAR;
      ts[8].stage = 0;
      ts[8].name  = SVGA3D_TS_ADDRESSU;
      ts[8].value = SVGA3D_TEX_ADDRESS_WRAP;
      ts[9].stage = 0;
      ts[9].name  = SVGA3D_TS_ADDRESSV;
      ts[9].value = SVGA3D_TEX_ADDRESS_WRAP;
   }
   SVGA_FIFOCommitAll();
   SVGA3D_BeginDrawPrimitives(CID, &decls, 3, &ranges, 1);
   {
      decls[0].identity.type = SVGA3D_DECLTYPE_FLOAT3;
      decls[0].identity.usage = SVGA3D_DECLUSAGE_POSITION;
      decls[0].array.surfaceId = vertexSid;
      decls[0].array.stride = sizeof(MyVertex);
      decls[0].array.offset = offsetof(MyVertex, position);
      decls[1].identity.type = SVGA3D_DECLTYPE_FLOAT2;
      decls[1].identity.usage = SVGA3D_DECLUSAGE_TEXCOORD;
      decls[1].array.surfaceId = vertexSid;
      decls[1].array.stride = sizeof(MyVertex);
      decls[1].array.offset = offsetof(MyVertex, texcoord);
      decls[2].identity.type = SVGA3D_DECLTYPE_FLOAT3;
      decls[2].identity.usage = SVGA3D_DECLUSAGE_COLOR;
      decls[2].array.surfaceId = vertexSid;
      decls[2].array.stride = sizeof(MyVertex);
      decls[2].array.offset = offsetof(MyVertex, color);
      ranges[0].primType = SVGA3D_PRIMITIVE_TRIANGLELIST;
      ranges[0].primitiveCount = numTriangles;
      ranges[0].indexArray.surfaceId = indexSid;
      ranges[0].indexArray.stride = sizeof(uint16);
      ranges[0].indexWidth = sizeof(uint16);
   }
   SVGA_FIFOCommitAll();
 }
 /*
 * defineCheckerboard --
 *
 *    Create a new checkerboard texture of the specified size.
 */
 uint32
 defineCheckerboard(uint32 width, uint32 height)
 {
   uint32 *buffer;
   int i, j;
   SVGAGuestPtr gPtr;
   uint32 size = width * height * sizeof *buffer;
   uint32 sid = SVGA3DUtil_DefineSurface2D(width, height, SVGA3D_A8R8G8B8);
   buffer = SVGA3DUtil_AllocDMABuffer(size, &gPtr);
   for (j = 0; j < height; j++) {
      for (i = 0; i < width; i++) {
         *buffer = (i + j) & 1 ? 0xFFFFFFFF : 0x00000000;
         buffer++;
      }
   }
   SVGA3DUtil_SurfaceDMA2D(sid, &gPtr, SVGA3D_WRITE_HOST_VRAM, width, height);
   return sid;
 }
 /*
 * main --
 *
 *    Our example's entry point, invoked directly by the bootloader.
 */
 int
 main(void)
 {
   uint32 texSize = 256;
   uint32 checkerSid;
   SVGA3DUtil_InitFullscreen(CID, 1024, 768);
   SVGA3DText_Init();
   vertexSid = SVGA3DUtil_DefineStaticBuffer(vertexData, sizeof vertexData);
   indexSid = SVGA3DUtil_DefineStaticBuffer(indexData, sizeof indexData);
   textureSid = SVGA3DUtil_DefineSurface2D(texSize, texSize, SVGA3D_A8R8G8B8);
   checkerSid = defineCheckerboard(texSize, texSize);
   Matrix_Perspective(perspectiveMat, 45.0f,
                      gSVGA.width / (float)gSVGA.height, 0.1f, 100.0f);
   while (1) {
      if (SVGA3DUtil_UpdateFPSCounter(&gFPS)) {
         Console_Clear();
         Console_Format(
            "VMware SVGA3D Example:\n"
            "Spinning cube blitter test: \n"
            "  - SurfaceStretchBlt from back buffer to cube texture\n"
            "  - SurfaceCopy from cube texture to back buffer\n"
            "  - Checkerboard pattern in bottom left\n"
            "\n"
            "Verify performance and correctness with all blitter implementations.\n"
            "\n"
            "%s",
            gFPS.text);
         SVGA3DText_Update();
         VMBackdoor_VGAScreenshot();
      }
      while (VMBackdoor_MouseGetPacket(&lastMouseState));
      SVGA3DUtil_ClearFullscreen(CID, SVGA3D_CLEAR_COLOR | SVGA3D_CLEAR_DEPTH,
                                 0x6666dd, 1.0f, 0);
      render();
      SVGA3DText_Draw();
      /* Surface copy from cube texture to the lower-right corner of the back buffer */
      {
         SVGA3dSurfaceImageId src = { textureSid };
         SVGA3dCopyBox *boxes;
         SVGA3D_BeginSurfaceCopy(&src, &gFullscreen.colorImage, &boxes, 1);
         boxes[0].w = texSize;
         boxes[0].h = texSize;
         boxes[0].d = 1;
         boxes[0].x = gFullscreen.screen.w - texSize;
         boxes[0].y = gFullscreen.screen.h - texSize;
         SVGA_FIFOCommitAll();
      }
      /*
       * We're displaying the checkerboard texture in the lower-left
       * corner of the back buffer.  This tests for subpixel alignment
       * errors within the blitter.
       *
       * Draw the top half with a regular blit, bottom half with a
       * stretch blit. You should see a contiguous checkerboard.
       */
      {
         SVGA3dSurfaceImageId src = { checkerSid };
         SVGA3dCopyBox *boxes;
         SVGA3dBox boxSrc = { 0 };
         SVGA3dBox boxDest = { 0 };
         SVGA3D_BeginSurfaceCopy(&src, &gFullscreen.colorImage, &boxes, 1);
         boxes[0].w = texSize;
         boxes[0].h = texSize/2;
         boxes[0].d = 1;
         boxes[0].y = gFullscreen.screen.h - texSize;
         SVGA_FIFOCommitAll();
         boxSrc.w = texSize;
         boxSrc.y = texSize/2;
         boxSrc.h = texSize/2;
         boxSrc.d = 1;
         boxDest.w = texSize;
         boxDest.y = gFullscreen.screen.h - texSize/2;
         boxDest.h = texSize/2;
         boxDest.d = 1;
         SVGA3D_SurfaceStretchBlt(&src, &gFullscreen.colorImage, &boxSrc, &boxDest,
                                  SVGA3D_STRETCH_BLT_LINEAR);
      }
      SVGA3DUtil_PresentFullscreen();
      /* Stretch blit from back buffer to cube */
      {
         SVGA3dSurfaceImageId dest = { textureSid };
         SVGA3dBox boxSrc = { 0 };
         SVGA3dBox boxDest = { 0 };
         boxSrc.w = gFullscreen.screen.w;
         boxSrc.h = gFullscreen.screen.h;
         boxSrc.d = 1;
         boxDest.w = texSize;
         boxDest.h = texSize;
         boxDest.d = 1;
         SVGA3D_SurfaceStretchBlt(&gFullscreen.colorImage, &dest, &boxSrc, &boxDest,
                                  SVGA3D_STRETCH_BLT_LINEAR);
      }
   }
   return 0;
 }
--- a/examples/bunnies/Makefile
+++ b/examples/bunnies/Makefile
@ -0,0 +1,6 @@
 TARGET = bunnies.img
 APP_SOURCES = main.c bunny.ib.z.data.o bunny.vb.z.data.o
 LIB_DIR = ../../lib
 include $(LIB_DIR)/Makefile.rules
--- a/examples/bunnies/bunny.ib
+++ b/examples/bunnies/bunny.ib
--- a/examples/bunnies/bunny.vb
+++ b/examples/bunnies/bunny.vb
--- a/examples/bunnies/main.c
+++ b/examples/bunnies/main.c
@ -0,0 +1,202 @@
 /*
 * SVGA3D example: Bunnies.
 *
 * This example loads the famous Stanford Bunny model, and draws many
 * copies of it. This demonstrates large models and fixed-function
 * lighting.
 *
 * Copyright (C) 2008-2009 VMware, Inc. Licensed under the MIT
 * License, please see the README.txt. All rights reserved.
 */
 #include "svga3dutil.h"
 #include "svga3dtext.h"
 #include "matrix.h"
 #include "math.h"
 DECLARE_DATAFILE(ibFile, bunny_ib_z);
 DECLARE_DATAFILE(vbFile, bunny_vb_z);
 uint32 vertexSid, indexSid;
 uint32 ibSize, vbSize;
 Matrix perspectiveMat;
 FPSCounterState gFPS;
 /*
 * setupFrame --
 *
 *    Set up render state that we load once per frame (because
 *    SVGA3DText clobbered it) and perform matrix calculations that we
 *    only need once per frame.
 */
 void
 setupFrame(void)
 {
   static Matrix world;
   SVGA3dTextureState *ts;
   SVGA3dRenderState *rs;
   static const SVGA3dLightData light = {
      .type = SVGA3D_LIGHTTYPE_POINT,
      .inWorldSpace = TRUE,
      .diffuse = { 10.0f, 10.0f, 10.0f, 1.0f },
      .ambient = { 0.05f, 0.05f, 0.1f, 1.0f },
      .position = { -5.0f, 5.0f, 0.0f, 1.0f },
      .attenuation0 = 1.0f,
      .attenuation1 = 0.0f,
      .attenuation2 = 0.0f,
   };
   static const SVGA3dMaterial mat = {
      .diffuse = { 1.0f, 0.9f, 0.9f, 1.0f },
      .ambient = { 1.0f, 1.0f, 1.0f, 1.0f },
   };
   Matrix_Copy(world, gIdentityMatrix);
   Matrix_Scale(world, 10, 10, 10, 1);
   Matrix_RotateY(world, gFPS.frame * 0.001f);
   SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_WORLD, world);
   SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_PROJECTION, perspectiveMat);
   SVGA3D_SetMaterial(CID, SVGA3D_FACE_FRONT_BACK, &mat);
   SVGA3D_SetLightData(CID, 0, &light);
   SVGA3D_SetLightEnabled(CID, 0, TRUE);
   SVGA3D_BeginSetRenderState(CID, &rs, 8);
   {
      rs[0].state     = SVGA3D_RS_BLENDENABLE;
      rs[0].uintValue = FALSE;
      rs[1].state     = SVGA3D_RS_ZENABLE;
      rs[1].uintValue = TRUE;
      rs[2].state     = SVGA3D_RS_ZWRITEENABLE;
      rs[2].uintValue = TRUE;
      rs[3].state     = SVGA3D_RS_ZFUNC;
      rs[3].uintValue = SVGA3D_CMP_LESS;
      rs[4].state     = SVGA3D_RS_LIGHTINGENABLE;
      rs[4].uintValue = TRUE;
      rs[5].state     = SVGA3D_RS_VERTEXMATERIALENABLE;
      rs[5].uintValue = FALSE;
      rs[6].state     = SVGA3D_RS_CULLMODE;
      rs[6].uintValue = SVGA3D_FACE_FRONT;
      rs[7].state     = SVGA3D_RS_AMBIENT;
      rs[7].uintValue = 0x00000000;
   }
   SVGA_FIFOCommitAll();
   SVGA3D_BeginSetTextureState(CID, &ts, 4);
   {
      ts[0].stage = 0;
      ts[0].name  = SVGA3D_TS_BIND_TEXTURE;
      ts[0].value = SVGA3D_INVALID_ID;
      ts[1].stage = 0;
      ts[1].name  = SVGA3D_TS_COLOROP;
      ts[1].value = SVGA3D_TC_SELECTARG1;
      ts[2].stage = 0;
      ts[2].name  = SVGA3D_TS_COLORARG1;
      ts[2].value = SVGA3D_TA_DIFFUSE;
      ts[3].stage = 0;
      ts[3].name  = SVGA3D_TS_ALPHAARG1;
      ts[3].value = SVGA3D_TA_DIFFUSE;
   }
   SVGA_FIFOCommitAll();
 }
 /*
 * drawMesh --
 *
 *    Draw our bunny mesh at a particular position.
 */
 void
 drawMesh(float posX, float posY, float posZ)
 {
   SVGA3dVertexDecl *decls;
   SVGA3dPrimitiveRange *ranges;
   static Matrix view;
   Matrix_Copy(view, gIdentityMatrix);
   Matrix_Translate(view, posX, posY, posZ);
   SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_VIEW, view);
   SVGA3D_BeginDrawPrimitives(CID, &decls, 2, &ranges, 1);
   {
      decls[0].identity.type = SVGA3D_DECLTYPE_FLOAT3;
      decls[0].identity.usage = SVGA3D_DECLUSAGE_POSITION;
      decls[0].array.surfaceId = vertexSid;
      decls[0].array.stride = 6 * sizeof(float);
      decls[1].identity.type = SVGA3D_DECLTYPE_FLOAT3;
      decls[1].identity.usage = SVGA3D_DECLUSAGE_NORMAL;
      decls[1].array.surfaceId = vertexSid;
      decls[1].array.stride = 6 * sizeof(float);
      decls[1].array.offset = 3 * sizeof(float);
      ranges[0].primType = SVGA3D_PRIMITIVE_TRIANGLELIST;
      ranges[0].primitiveCount = ibSize / sizeof(uint32) / 3;
      ranges[0].indexArray.surfaceId = indexSid;
      ranges[0].indexArray.stride = sizeof(uint32);
      ranges[0].indexWidth = sizeof(uint32);
   }
   SVGA_FIFOCommitAll();
 }
 /*
 * main --
 *
 *    Our example's entry point, invoked directly by the bootloader.
 */
 int
 main(void)
 {
   SVGA3DUtil_InitFullscreen(CID, 800, 600);
   SVGA3DText_Init();
   vertexSid = SVGA3DUtil_LoadCompressedBuffer(vbFile, &vbSize);
   indexSid = SVGA3DUtil_LoadCompressedBuffer(ibFile, &ibSize);
   Matrix_Perspective(perspectiveMat, 45.0f,
                      gSVGA.width / (float)gSVGA.height, 0.1f, 100.0f);
   while (1) {
      int i;
      if (SVGA3DUtil_UpdateFPSCounter(&gFPS)) {
         Console_Clear();
         Console_Format("VMware SVGA3D Example:\n"
                        "Bunnies: Drawing 4 copies of the Stanford Bunny,"
                        " at 65K triangles each.\n\n%s",
                        gFPS.text);
         SVGA3DText_Update();
      }
      SVGA3DUtil_ClearFullscreen(CID, SVGA3D_CLEAR_COLOR | SVGA3D_CLEAR_DEPTH,
                                 0x113366, 1.0f, 0);
      setupFrame();
      for (i = 0; i < 4; i++) {
         drawMesh(0.8 - i * 1.0f, -1, 3 + i * 1.0f);
      }
      SVGA3DText_Draw();
      SVGA3DUtil_PresentFullscreen();
   }
   return 0;
 }
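The draw call in drawMesh() above describes an interleaved vertex buffer: a FLOAT3 position followed by a FLOAT3 normal, with a stride of six floats and the normal starting three floats in. The sketch below restates that layout with a struct and offsetof. The struct name BunnyVertex and the helper functions are hypothetical and do not appear in the example source, which hard-codes the sizes instead.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical struct mirroring the layout that drawMesh() assumes:
 * three position floats followed by three normal floats.
 */
typedef struct {
   float position[3];
   float normal[3];
} BunnyVertex;

/* Stride between consecutive vertices, in bytes. */
size_t
bunnyVertexStride(void)
{
   return sizeof(BunnyVertex);
}

/* Byte offset of the normal within one vertex. */
size_t
bunnyNormalOffset(void)
{
   return offsetof(BunnyVertex, normal);
}

/*
 * Number of triangles in a triangle-list index buffer, computed the
 * same way as ranges[0].primitiveCount: bytes / index width / 3.
 */
unsigned
trianglesInIndexBuffer(unsigned ibSizeBytes, unsigned indexWidthBytes)
{
   assert(indexWidthBytes != 0);
   return ibSizeBytes / indexWidthBytes / 3;
}
```

With 32-bit indices, a 36-byte index buffer holds nine indices, so trianglesInIndexBuffer(36, 4) yields three triangles.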
--- a/examples/cube/Makefile
+++ b/examples/cube/Makefile
@ -0,0 +1,6 @@
 TARGET = cube.img
 APP_SOURCES = main.c
 LIB_DIR = ../../lib
 include $(LIB_DIR)/Makefile.rules
--- a/examples/cube/main.c
+++ b/examples/cube/main.c
@ -0,0 +1,191 @@
 /*
 * SVGA3D example: Spinning cube, with static vertex/index buffers.
 *
 * Copyright (C) 2008-2009 VMware, Inc. Licensed under the MIT
 * License, please see the README.txt. All rights reserved.
 */
 #include "svga3dutil.h"
 #include "svga3dtext.h"
 #include "matrix.h"
 #include "math.h"
 #include "keyboard.h"
 #include "apm.h"
 typedef struct {
   float position[3];
   uint32 color;
 } MyVertex;
 static const MyVertex vertexData[] = {
   { {-1, -1, -1}, 0xFFFFFF },
   { {-1, -1,  1}, 0xFFFF00 },
   { {-1,  1, -1}, 0xFF00FF },
   { {-1,  1,  1}, 0xFF0000 },
   { { 1, -1, -1}, 0x00FFFF },
   { { 1, -1,  1}, 0x00FF00 },
   { { 1,  1, -1}, 0x0000FF },
   { { 1,  1,  1}, 0x000000 },
 };
 #define QUAD(a,b,c,d) a, b, d, d, c, a
 static const uint16 indexData[] = {
   QUAD(0,1,2,3), // -X
   QUAD(4,5,6,7), // +X
   QUAD(0,1,4,5), // -Y
   QUAD(2,3,6,7), // +Y
   QUAD(0,2,4,6), // -Z
   QUAD(1,3,5,7), // +Z
 };
 #undef QUAD
 const uint32 numTriangles = sizeof indexData / sizeof indexData[0] / 3;
 uint32 vertexSid, indexSid;
 Matrix perspectiveMat;
 FPSCounterState gFPS;
 VMMousePacket lastMouseState;
 /*
 * render --
 *
 *   Set up render state, and draw our cube scene from static index
 *   and vertex buffers.
 *
 *   This render state only needs to be set each frame because
 *   SVGA3DText_Draw() changes it.
 */
 void
 render(void)
 {
   SVGA3dTextureState *ts;
   SVGA3dRenderState *rs;
   SVGA3dVertexDecl *decls;
   SVGA3dPrimitiveRange *ranges;
   static Matrix view;
   Matrix_Copy(view, gIdentityMatrix);
   Matrix_Scale(view, 0.5, 0.5, 0.5, 1.0);
   if (lastMouseState.buttons & VMMOUSE_LEFT_BUTTON) {
      Matrix_RotateX(view, lastMouseState.y *  0.0001);
      Matrix_RotateY(view, lastMouseState.x * -0.0001);
   } else {
      Matrix_RotateX(view, 30.0 * M_PI / 180.0);
      Matrix_RotateY(view, gFPS.frame * 0.01f);
   }
   Matrix_Translate(view, 0, 0, 3);
   SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_VIEW, view);
   SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_WORLD, gIdentityMatrix);
   SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_PROJECTION, perspectiveMat);
   SVGA3D_BeginSetRenderState(CID, &rs, 4);
   {
      rs[0].state     = SVGA3D_RS_BLENDENABLE;
      rs[0].uintValue = FALSE;
      rs[1].state     = SVGA3D_RS_ZENABLE;
      rs[1].uintValue = TRUE;
      rs[2].state     = SVGA3D_RS_ZWRITEENABLE;
      rs[2].uintValue = TRUE;
      rs[3].state     = SVGA3D_RS_ZFUNC;
      rs[3].uintValue = SVGA3D_CMP_LESS;
   }
   SVGA_FIFOCommitAll();
   SVGA3D_BeginSetTextureState(CID, &ts, 4);
   {
      ts[0].stage = 0;
      ts[0].name  = SVGA3D_TS_BIND_TEXTURE;
      ts[0].value = SVGA3D_INVALID_ID;
      ts[1].stage = 0;
      ts[1].name  = SVGA3D_TS_COLOROP;
      ts[1].value = SVGA3D_TC_SELECTARG1;
      ts[2].stage = 0;
      ts[2].name  = SVGA3D_TS_COLORARG1;
      ts[2].value = SVGA3D_TA_DIFFUSE;
      ts[3].stage = 0;
      ts[3].name  = SVGA3D_TS_ALPHAARG1;
      ts[3].value = SVGA3D_TA_DIFFUSE;
   }
   SVGA_FIFOCommitAll();
   SVGA3D_BeginDrawPrimitives(CID, &decls, 2, &ranges, 1);
   {
      decls[0].identity.type = SVGA3D_DECLTYPE_FLOAT3;
      decls[0].identity.usage = SVGA3D_DECLUSAGE_POSITION;
      decls[0].array.surfaceId = vertexSid;
      decls[0].array.stride = sizeof(MyVertex);
      decls[0].array.offset = offsetof(MyVertex, position);
      decls[1].identity.type = SVGA3D_DECLTYPE_D3DCOLOR;
      decls[1].identity.usage = SVGA3D_DECLUSAGE_COLOR;
      decls[1].array.surfaceId = vertexSid;
      decls[1].array.stride = sizeof(MyVertex);
      decls[1].array.offset = offsetof(MyVertex, color);
      ranges[0].primType = SVGA3D_PRIMITIVE_TRIANGLELIST;
      ranges[0].primitiveCount = numTriangles;
      ranges[0].indexArray.surfaceId = indexSid;
      ranges[0].indexArray.stride = sizeof(uint16);
      ranges[0].indexWidth = sizeof(uint16);
   }
   SVGA_FIFOCommitAll();
 }
 /*
 * main --
 *
 *    Our example's entry point, invoked directly by the bootloader.
 */
 int
 main(void)
 {
   SVGA3DUtil_InitFullscreen(CID, 800, 600);
   SVGA3DText_Init();
   Keyboard_Init();
   APM_Init();
   vertexSid = SVGA3DUtil_DefineStaticBuffer(vertexData, sizeof vertexData);
   indexSid = SVGA3DUtil_DefineStaticBuffer(indexData, sizeof indexData);
   Matrix_Perspective(perspectiveMat, 45.0f,
                      gSVGA.width / (float)gSVGA.height, 0.1f, 100.0f);
   while (!Keyboard_IsKeyPressed(KEY_ESCAPE)) {
      if (SVGA3DUtil_UpdateFPSCounter(&gFPS)) {
         Console_Clear();
         Console_Format("VMware SVGA3D Example:\n"
                        "Spinning cube with static vertex and index buffer.\n"
                        "Drag with left mouse button to rotate.\n"
                        "Press ESC to exit.\n"
                        "\n%s",
                        gFPS.text);
         SVGA3DText_Update();
         VMBackdoor_VGAScreenshot();
      }
      while (VMBackdoor_MouseGetPacket(&lastMouseState));
      SVGA3DUtil_ClearFullscreen(CID, SVGA3D_CLEAR_COLOR | SVGA3D_CLEAR_DEPTH,
                                 0x113366, 1.0f, 0);
      render();
      SVGA3DText_Draw();
      SVGA3DUtil_PresentFullscreen();
   }
   APM_SetPowerState(POWER_OFF);
   return 0;
 }
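The QUAD macro in the cube example expands each face into two triangles that share the a-d diagonal, so six faces yield 36 indices and 12 triangles. The following is a minimal sketch of that expansion in isolation. It uses uint16_t from stdint.h as a stand-in for the example's own uint16 type, and the accessor functions are hypothetical helpers added only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Same expansion as the example: triangles (a, b, d) and (d, c, a). */
#define QUAD(a,b,c,d) a, b, d, d, c, a

static const uint16_t indexData[] = {
   QUAD(0,1,2,3), /* -X */
   QUAD(4,5,6,7), /* +X */
   QUAD(0,1,4,5), /* -Y */
   QUAD(2,3,6,7), /* +Y */
   QUAD(0,2,4,6), /* -Z */
   QUAD(1,3,5,7), /* +Z */
};

#undef QUAD

/* Total number of indices in the table. */
unsigned
cubeIndexCount(void)
{
   return sizeof indexData / sizeof indexData[0];
}

/* A triangle list consumes three indices per triangle. */
unsigned
cubeTriangleCount(void)
{
   return cubeIndexCount() / 3;
}

/* Bounds-checked read of one index. */
uint16_t
cubeIndexAt(unsigned i)
{
   assert(i < cubeIndexCount());
   return indexData[i];
}
```

The first face, QUAD(0,1,2,3), expands to the index sequence 0, 1, 3, 3, 2, 0.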
--- a/examples/cubemark/Makefile
+++ b/examples/cubemark/Makefile
@ -0,0 +1,16 @@
 TARGET = cubemark.img
 APP_SOURCES = main.c
 LIB_DIR = ../../lib
 include $(LIB_DIR)/Makefile.rules
 .PHONY: shaders
 shaders: cube_vs.h cube_ps.h
 cube_vs.h: cube.fx
 	wine fxc.exe /T vs_2_0 /E MyVertexShader /Fh cube_vs.h cube.fx
 cube_ps.h: cube.fx
 	wine fxc.exe /T ps_2_0 /E MyPixelShader /Fh cube_ps.h cube.fx
--- a/examples/cubemark/cube.fx
+++ b/examples/cubemark/cube.fx
@ -0,0 +1,36 @@
 float4x4 matView, matProj;
 struct VS_Input
 {
   float4  Pos      : POSITION;
   float4  Color    : COLOR0;
 };
 struct VS_Output
 {
   float4  Pos      : POSITION;
   float4  Color    : COLOR0;
 };
 VS_Output
 MyVertexShader(VS_Input Input)
 {
   VS_Output Output;
   Output.Pos = mul(mul(Input.Pos, matView), matProj);
   Output.Color = Input.Color;
   return Output;
 }
 struct PS_Input
 {
   float4  Color    : COLOR0;
 };
 float4
 MyPixelShader(PS_Input Input) : COLOR
 {
   return Input.Color;
 }
--- a/examples/cubemark/cube_ps.h
+++ b/examples/cubemark/cube_ps.h
@ -0,0 +1,21 @@
 #if 0
 //
 // Generated by Microsoft (R) D3DX9 Shader Compiler 
 //
 //   fxc /T ps_2_0 /E MyPixelShader /Fh cube_ps.h cube.fx
 //
    ps_2_0
    dcl v0
    mov oC0, v0
 // approximately 1 instruction slot used
 #endif
 const DWORD g_ps20_MyPixelShader[] =
 {
    0xffff0200, 0x0013fffe, 0x42415443, 0x0000001c, 0x00000023, 0xffff0200, 
    0x00000000, 0x00000000, 0x20000100, 0x0000001c, 0x325f7370, 0x4d00305f, 
    0x6f726369, 0x74666f73, 0x29522820, 0x44334420, 0x53203958, 0x65646168, 
    0x6f432072, 0x6c69706d, 0x00207265, 0x0200001f, 0x80000000, 0x900f0000, 
    0x02000001, 0x800f0800, 0x90e40000, 0x0000ffff
 };
--- a/examples/cubemark/cube_vs.h
+++ b/examples/cubemark/cube_vs.h
@ -0,0 +1,54 @@
 #if 0
 //
 // Generated by Microsoft (R) D3DX9 Shader Compiler 
 //
 //   fxc /T vs_2_0 /E MyVertexShader /Fh cube_vs.h cube.fx
 //
 //
 // Parameters:
 //
 //   float4x4 matProj;
 //   float4x4 matView;
 //
 //
 // Registers:
 //
 //   Name         Reg   Size
 //   ------------ ----- ----
 //   matView      c0       4
 //   matProj      c4       4
 //
    vs_2_0
    dcl_position v0
    dcl_color v1
    dp4 r0.x, v0, c0
    dp4 r0.y, v0, c1
    dp4 r0.z, v0, c2
    dp4 r0.w, v0, c3
    dp4 oPos.x, r0, c4
    dp4 oPos.y, r0, c5
    dp4 oPos.z, r0, c6
    dp4 oPos.w, r0, c7
    mov oD0, v1
 // approximately 9 instruction slots used
 #endif
 const DWORD g_vs20_MyVertexShader[] =
 {
    0xfffe0200, 0x0025fffe, 0x42415443, 0x0000001c, 0x0000006b, 0xfffe0200, 
    0x00000002, 0x0000001c, 0x20000100, 0x00000064, 0x00000044, 0x00040002, 
    0x00000004, 0x0000004c, 0x00000000, 0x0000005c, 0x00000002, 0x00000004, 
    0x0000004c, 0x00000000, 0x5074616d, 0x006a6f72, 0x00030003, 0x00040004, 
    0x00000001, 0x00000000, 0x5674616d, 0x00776569, 0x325f7376, 0x4d00305f, 
    0x6f726369, 0x74666f73, 0x29522820, 0x44334420, 0x53203958, 0x65646168, 
    0x6f432072, 0x6c69706d, 0x00207265, 0x0200001f, 0x80000000, 0x900f0000, 
    0x0200001f, 0x8000000a, 0x900f0001, 0x03000009, 0x80010000, 0x90e40000, 
    0xa0e40000, 0x03000009, 0x80020000, 0x90e40000, 0xa0e40001, 0x03000009, 
    0x80040000, 0x90e40000, 0xa0e40002, 0x03000009, 0x80080000, 0x90e40000, 
    0xa0e40003, 0x03000009, 0xc0010000, 0x80e40000, 0xa0e40004, 0x03000009, 
    0xc0020000, 0x80e40000, 0xa0e40005, 0x03000009, 0xc0040000, 0x80e40000, 
    0xa0e40006, 0x03000009, 0xc0080000, 0x80e40000, 0xa0e40007, 0x02000001, 
    0xd00f0000, 0x90e40001, 0x0000ffff
 };
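The disassembly in cube_vs.h shows the vertex shader as eight dp4 instructions: four dot products against c0..c3 (matView) produce a temporary, and four more against c4..c7 (matProj) produce the output position. As a rough illustration only, the same arithmetic can be emulated on the CPU. The Vec4 type and the function names below are hypothetical and are not part of the SVGA3D headers.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
   float x, y, z, w;
} Vec4;

/* One dp4 instruction: a four-component dot product. */
static float
dot4(Vec4 a, Vec4 b)
{
   return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/*
 * Four dp4 instructions in a row: each output component is the dot
 * product of the input with one of four consecutive constant registers.
 */
Vec4
transformDp4(Vec4 v, const Vec4 regs[4])
{
   Vec4 r;

   assert(regs != NULL);
   r.x = dot4(v, regs[0]);
   r.y = dot4(v, regs[1]);
   r.z = dot4(v, regs[2]);
   r.w = dot4(v, regs[3]);
   return r;
}

/*
 * Emulates the position path of the shader: c[0..3] stands in for the
 * matView registers and c[4..7] for the matProj registers.
 */
Vec4
emulateVertexShader(Vec4 pos, const Vec4 c[8])
{
   Vec4 r0 = transformDp4(pos, &c[0]);
   return transformDp4(r0, &c[4]);
}
```

This mirrors why main.c uploads matView at constant index 0 and matProj at constant index 4: each 4x4 matrix occupies four consecutive float4 registers.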
--- a/examples/cubemark/main.c
+++ b/examples/cubemark/main.c
@ -0,0 +1,235 @@
 /*
 * Cubemark, a microbenchmark which renders a very large number of
 * very simple objects. This stresses the throughput of the SVGA3D
 * command pipeline and API layers.
 *
 * Half of the cubes are rendered using fixed-function, and half of
 * them are rendered using shaders. This helps highlight any performance
 * differences between per-draw setup for FFP vs. for shaders.
 *
 * Copyright (C) 2008-2009 VMware, Inc. Licensed under the MIT
 * License, please see the README.txt. All rights reserved.
 */
 #include "svga3dutil.h"
 #include "svga3dtext.h"
 #include "matrix.h"
 #include "math.h"
 typedef uint32 DWORD;
 #include "cube_vs.h"
 #include "cube_ps.h"
 #define MY_VSHADER_ID       0
 #define MY_PSHADER_ID       0
 #define CONST_MAT_VIEW      0
 #define CONST_MAT_PROJ      4
 typedef struct {
   float position[3];
   uint32 color;
 } MyVertex;
 /*
 * Two colors for the cubes, so we can see them rotate more easily.
 */
 #define COLOR1 0x8080FF
 #define COLOR2 0x000080
 /*
 * This defines the grid spacing, as well as the total number of cubes we draw.
 */
 #define GRID_X_MIN  (-35)
 #define GRID_X_MAX  35
 #define GRID_Y_MIN  (-20)
 #define GRID_Y_MAX  20
 #define GRID_STEP   2
 static const MyVertex vertexData[] = {
   { {-1, -1, -1}, COLOR1 },
   { {-1, -1,  1}, COLOR1 },
   { {-1,  1, -1}, COLOR1 },
   { {-1,  1,  1}, COLOR1 },
   { { 1, -1, -1}, COLOR2 },
   { { 1, -1,  1}, COLOR2 },
   { { 1,  1, -1}, COLOR2 },
   { { 1,  1,  1}, COLOR2 },
 };
 #define QUAD(a,b,c,d) a, b, d, d, c, a
 static const uint16 indexData[] = {
   QUAD(0,1,2,3), // -X
   QUAD(4,5,6,7), // +X
   QUAD(0,1,4,5), // -Y
   QUAD(2,3,6,7), // +Y
   QUAD(0,2,4,6), // -Z
   QUAD(1,3,5,7), // +Z
 };
 #undef QUAD
 const uint32 numTriangles = sizeof indexData / sizeof indexData[0] / 3;
 uint32 vertexSid, indexSid;
 Matrix perspectiveMat;
 FPSCounterState gFPS;
 VMMousePacket lastMouseState;
 /*
 * render --
 *
 *   Set up common render state and matrices, then enter a loop
 *   drawing many cubes with individual draw commands.
 *
 *   This render state only needs to be set each frame because
 *   SVGA3DText_Draw() changes it.
 */
 void
 render(void)
 {
   SVGA3dTextureState *ts;
   SVGA3dRenderState *rs;
   SVGA3dVertexDecl *decls;
   SVGA3dPrimitiveRange *ranges;
   static Matrix view, instance;
   float x, y;
   Bool useShaders = FALSE;
   Matrix_Copy(view, gIdentityMatrix);
   Matrix_Scale(view, 0.5, 0.5, 0.5, 1.0);
   Matrix_RotateX(view, 30.0 * M_PI / 180.0);
   Matrix_RotateY(view, gFPS.frame * 0.1f);
   Matrix_Translate(view, 0, 0, 75);
   SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_WORLD, gIdentityMatrix);
   SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_PROJECTION, perspectiveMat);
   SVGA3DUtil_SetShaderConstMatrix(CID, CONST_MAT_PROJ,
                                   SVGA3D_SHADERTYPE_VS, perspectiveMat);
   SVGA3D_BeginSetRenderState(CID, &rs, 4);
   {
      rs[0].state     = SVGA3D_RS_BLENDENABLE;
      rs[0].uintValue = FALSE;
      rs[1].state     = SVGA3D_RS_ZENABLE;
      rs[1].uintValue = TRUE;
      rs[2].state     = SVGA3D_RS_ZWRITEENABLE;
      rs[2].uintValue = TRUE;
      rs[3].state     = SVGA3D_RS_ZFUNC;
      rs[3].uintValue = SVGA3D_CMP_LESS;
   }
   SVGA_FIFOCommitAll();
   SVGA3D_BeginSetTextureState(CID, &ts, 4);
   {
      ts[0].stage = 0;
      ts[0].name  = SVGA3D_TS_BIND_TEXTURE;
      ts[0].value = SVGA3D_INVALID_ID;
      ts[1].stage = 0;
      ts[1].name  = SVGA3D_TS_COLOROP;
      ts[1].value = SVGA3D_TC_SELECTARG1;
      ts[2].stage = 0;
      ts[2].name  = SVGA3D_TS_COLORARG1;
      ts[2].value = SVGA3D_TA_DIFFUSE;
      ts[3].stage = 0;
      ts[3].name  = SVGA3D_TS_ALPHAARG1;
      ts[3].value = SVGA3D_TA_DIFFUSE;
   }
   SVGA_FIFOCommitAll();
   for (x = GRID_X_MIN; x <= GRID_X_MAX; x += GRID_STEP) {
      for (y = GRID_Y_MIN; y <= GRID_Y_MAX; y += GRID_STEP) {
         Matrix_Copy(instance, view);
         Matrix_Translate(instance, x, y, 0);
         if (useShaders) {
            SVGA3D_SetShader(CID, SVGA3D_SHADERTYPE_VS, MY_VSHADER_ID);
            SVGA3D_SetShader(CID, SVGA3D_SHADERTYPE_PS, MY_PSHADER_ID);
            SVGA3DUtil_SetShaderConstMatrix(CID, CONST_MAT_VIEW,
                                            SVGA3D_SHADERTYPE_VS, instance);
         } else {
            SVGA3D_SetShader(CID, SVGA3D_SHADERTYPE_VS, SVGA3D_INVALID_ID);
            SVGA3D_SetShader(CID, SVGA3D_SHADERTYPE_PS, SVGA3D_INVALID_ID);
            SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_VIEW, instance);
         }
         SVGA3D_BeginDrawPrimitives(CID, &decls, 2, &ranges, 1);
         {
            decls[0].identity.type = SVGA3D_DECLTYPE_FLOAT3;
            decls[0].identity.usage = SVGA3D_DECLUSAGE_POSITION;
            decls[0].array.surfaceId = vertexSid;
            decls[0].array.stride = sizeof(MyVertex);
            decls[0].array.offset = offsetof(MyVertex, position);
            decls[1].identity.type = SVGA3D_DECLTYPE_D3DCOLOR;
            decls[1].identity.usage = SVGA3D_DECLUSAGE_COLOR;
            decls[1].array.surfaceId = vertexSid;
            decls[1].array.stride = sizeof(MyVertex);
            decls[1].array.offset = offsetof(MyVertex, color);
            ranges[0].primType = SVGA3D_PRIMITIVE_TRIANGLELIST;
            ranges[0].primitiveCount = numTriangles;
            ranges[0].indexArray.surfaceId = indexSid;
            ranges[0].indexArray.stride = sizeof(uint16);
            ranges[0].indexWidth = sizeof(uint16);
         }
         SVGA_FIFOCommitAll();
      }
      useShaders = !useShaders;
   }
   SVGA3D_SetShader(CID, SVGA3D_SHADERTYPE_VS, SVGA3D_INVALID_ID);
   SVGA3D_SetShader(CID, SVGA3D_SHADERTYPE_PS, SVGA3D_INVALID_ID);
 }
 /*
 * main --
 *
 *    Our example's entry point, invoked directly by the bootloader.
 */
 int
 main(void)
 {
   SVGA3DUtil_InitFullscreen(CID, 800, 600);
   SVGA3DText_Init();
   vertexSid = SVGA3DUtil_DefineStaticBuffer(vertexData, sizeof vertexData);
   indexSid = SVGA3DUtil_DefineStaticBuffer(indexData, sizeof indexData);
   SVGA3D_DefineShader(CID, MY_VSHADER_ID, SVGA3D_SHADERTYPE_VS,
                       g_vs20_MyVertexShader, sizeof g_vs20_MyVertexShader);
   SVGA3D_DefineShader(CID, MY_PSHADER_ID, SVGA3D_SHADERTYPE_PS,
                       g_ps20_MyPixelShader, sizeof g_ps20_MyPixelShader);
   Matrix_Perspective(perspectiveMat, 45.0f,
                      gSVGA.width / (float)gSVGA.height, 10.0f, 100.0f);
   while (1) {
      if (SVGA3DUtil_UpdateFPSCounter(&gFPS)) {
         Console_Clear();
         Console_Format("Cubemark microbenchmark\n\n%s", gFPS.text);
         SVGA3DText_Update();
         VMBackdoor_VGAScreenshot();
      }
      SVGA3DUtil_ClearFullscreen(CID, SVGA3D_CLEAR_COLOR | SVGA3D_CLEAR_DEPTH,
                                 0x000000, 1.0f, 0);
      render();
      SVGA3DText_Draw();
      SVGA3DUtil_PresentFullscreen();
   }
   return 0;
 }
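The GRID_* constants determine how many draw calls cubemark issues per frame, and the useShaders toggle determines how they split between fixed-function and shader rendering. The sketch below replays the loop structure of render() as listed above, with the toggle applied once per column, and only counts cubes instead of drawing them. The CubeCounts struct and countCubes() are hypothetical names introduced for this illustration.

```c
#include <assert.h>

#define GRID_X_MIN  (-35)
#define GRID_X_MAX  35
#define GRID_Y_MIN  (-20)
#define GRID_Y_MAX  20
#define GRID_STEP   2

typedef struct {
   unsigned fixedFunction;
   unsigned shaders;
} CubeCounts;

/*
 * Walk the same grid as render(), counting how many cubes would be
 * drawn with each path. The toggle flips after every column of cubes.
 */
CubeCounts
countCubes(void)
{
   CubeCounts counts = { 0, 0 };
   const unsigned columns = (GRID_X_MAX - GRID_X_MIN) / GRID_STEP + 1;
   const unsigned rows = (GRID_Y_MAX - GRID_Y_MIN) / GRID_STEP + 1;
   int useShaders = 0;
   float x, y;

   for (x = GRID_X_MIN; x <= GRID_X_MAX; x += GRID_STEP) {
      for (y = GRID_Y_MIN; y <= GRID_Y_MAX; y += GRID_STEP) {
         if (useShaders) {
            counts.shaders++;
         } else {
            counts.fixedFunction++;
         }
      }
      useShaders = !useShaders;
   }

   /* The loop must visit every grid cell exactly once. */
   assert(counts.fixedFunction + counts.shaders == columns * rows);
   return counts;
}
```

With the constants above the grid has 36 columns of 21 cubes, so each frame issues 756 draw commands, 378 per rendering path.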
--- a/examples/dynamic-vertex-stress/Makefile
+++ b/examples/dynamic-vertex-stress/Makefile
@ -0,0 +1,6 @@
 TARGET = dynamic-vertex-stress.img
 APP_SOURCES = main.c
 LIB_DIR = ../../lib
 include $(LIB_DIR)/Makefile.rules
--- a/examples/dynamic-vertex-stress/main.c
+++ b/examples/dynamic-vertex-stress/main.c
@ -0,0 +1,353 @@
 /*
 * SVGA3D example: Dynamic vertex buffer stress-test.
 *
 * This example is a performance stress-test for dynamic vertex
 * buffers, and specifically for performing DMA on buffers which may
 * still be in use by the GPU.
 *
 * Like the original dynamic-vertex test, we compute an animated
 * function on the guest CPU and upload it via a vertex buffer before
 * each draw. To simulate the stresses involved in dealing with apps
 * that render in immediate-mode, however, this test breaks the vertex
 * buffer up into very small pieces which are all DMA'ed and rendered
 * individually.
 *
 * If the SVGA3D implementation has any bottlenecks related to reusing
 * vertex buffers that are still in use by the physical GPU, this test
 * will expose them.
 *
 * Copyright (C) 2008-2009 VMware, Inc. Licensed under the MIT
 * License, please see the README.txt. All rights reserved.
 */
 #include "svga3dutil.h"
 #include "svga3dtext.h"
 #include "matrix.h"
 #include "math.h"
 #define MESH_WIDTH      256  /* 64 kilovertices, 1.5MB */
 #define MESH_HEIGHT     256
 #define MESH_NUM_VERTICES   (MESH_WIDTH * MESH_HEIGHT)
 #define MESH_NUM_QUADS      ((MESH_WIDTH-1) * (MESH_HEIGHT-1))
 #define MESH_NUM_TRIANGLES  (MESH_NUM_QUADS * 2)
 #define MESH_NUM_INDICES    (MESH_NUM_TRIANGLES * 3)
 #define MESH_NUM_BYTES      (MESH_NUM_VERTICES * sizeof(MyVertex))
 #define TRIANGLES_PER_ROW   ((MESH_WIDTH-1) * 2)
 #define INDICES_PER_ROW     (TRIANGLES_PER_ROW * 3)
 #define MESH_ELEMENT(x, y)  (MESH_WIDTH * (y) + (x))
 typedef struct {
   float position[3];
   float color[3];
 } MyVertex;
 typedef uint16 IndexType;
 DMAPool vertexDMA;
 uint32 vertexSid, indexSid;
 Matrix perspectiveMat;
 FPSCounterState gFPS;
 /*
 * setupFrame --
 *
 *    Set up render state that we load once per frame (because
 *    SVGA3DText clobbered it) and perform matrix calculations that we
 *    only need once per frame.
 */
 void
 setupFrame(void)
 {
   static Matrix world;
   static Matrix view;
   SVGA3dTextureState *ts;
   SVGA3dRenderState *rs;
   Matrix_Copy(view, gIdentityMatrix);
   Matrix_Translate(view, 0, 0, 3);
   SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_VIEW, view);
   Matrix_Copy(world, gIdentityMatrix);
   Matrix_RotateX(world, -60.0 * PI_OVER_180);
   Matrix_RotateY(world, gFPS.frame * 0.01f);
   SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_WORLD, world);
   SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_PROJECTION, perspectiveMat);
   SVGA3D_BeginSetRenderState(CID, &rs, 4);
   {
      rs[0].state     = SVGA3D_RS_BLENDENABLE;
      rs[0].uintValue = FALSE;
      rs[1].state     = SVGA3D_RS_ZENABLE;
      rs[1].uintValue = TRUE;
      rs[2].state     = SVGA3D_RS_ZWRITEENABLE;
      rs[2].uintValue = TRUE;
      rs[3].state     = SVGA3D_RS_ZFUNC;
      rs[3].uintValue = SVGA3D_CMP_LESS;
   }
   SVGA_FIFOCommitAll();
   SVGA3D_BeginSetTextureState(CID, &ts, 4);
   {
      ts[0].stage = 0;
      ts[0].name  = SVGA3D_TS_BIND_TEXTURE;
      ts[0].value = SVGA3D_INVALID_ID;
      ts[1].stage = 0;
      ts[1].name  = SVGA3D_TS_COLOROP;
      ts[1].value = SVGA3D_TC_SELECTARG1;
      ts[2].stage = 0;
      ts[2].name  = SVGA3D_TS_COLORARG1;
      ts[2].value = SVGA3D_TA_DIFFUSE;
      ts[3].stage = 0;
      ts[3].name  = SVGA3D_TS_ALPHAARG1;
      ts[3].value = SVGA3D_TA_DIFFUSE;
   }
   SVGA_FIFOCommitAll();
 }
 /*
 * updateVertices --
 *
 *    Calculate new vertices, writing them directly into an available
 *    DMA buffer. Returns a DMAPoolBuffer which contains the vertex
 *    data for an entire frame.
 */
 DMAPoolBuffer *
 updateVertices(float red, float green, float blue, float phase, float offset)
 {
   DMAPoolBuffer *dma;
   MyVertex *vert;
   int x, y;
   float t = gFPS.frame * 0.1f + phase;
   dma = SVGA3DUtil_DMAPoolGetBuffer(&vertexDMA);
   vert = (MyVertex*) dma->buffer;
   for (y = 0; y < MESH_HEIGHT; y++) {
      for (x = 0; x < MESH_WIDTH; x++) {
         float fx = x * (2.0 / MESH_WIDTH) - 1.0;
         float fy = y * (2.0 / MESH_HEIGHT) - 1.0;
         float fxo = fx + offset;
         float dist = fxo * fxo + fy * fy;
         float z = sinf(dist * 8.0 + t) / (1 + dist * 10.0);
         vert->position[0] = fx;
         vert->position[1] = fy;
         vert->position[2] = z;
         vert->color[0] = red - z;
         vert->color[1] = green - z;
         vert->color[2] = blue - z;
         vert++;
      }
   }
   return dma;
 }
 /*
 * createIndexBuffer --
 *
 *    Create a static index buffer that renders our vertices as a 2D
 *    mesh. For simplicity, we use a triangle list rather than a
 *    triangle strip.
 */
 uint32
 createIndexBuffer(void)
 {
   IndexType *indexBuffer;
   const uint32 bufferSize = MESH_NUM_INDICES * sizeof *indexBuffer;
   SVGAGuestPtr gPtr;
   uint32 sid;
   int x, y;
   sid = SVGA3DUtil_DefineSurface2D(bufferSize, 1, SVGA3D_BUFFER);
   indexBuffer = SVGA3DUtil_AllocDMABuffer(bufferSize, &gPtr);
   for (y = 0; y < (MESH_HEIGHT - 1); y++) {
      for (x = 0; x < (MESH_WIDTH - 1); x++) {
         indexBuffer[0] = MESH_ELEMENT(x,   y  );
         indexBuffer[1] = MESH_ELEMENT(x+1, y  );
         indexBuffer[2] = MESH_ELEMENT(x+1, y+1);
         indexBuffer[3] = MESH_ELEMENT(x+1, y+1);
         indexBuffer[4] = MESH_ELEMENT(x,   y+1);
         indexBuffer[5] = MESH_ELEMENT(x,   y  );
         indexBuffer += 6;
      }
   }
   SVGA3DUtil_SurfaceDMA2D(sid, &gPtr, SVGA3D_WRITE_HOST_VRAM, bufferSize, 1);
   return sid;
 }
 /*
 * trashBuffer --
 *
 *    Upload zeroes to the vertex buffer, to make any future DMA errors obvious.
 */
 void trashBuffer(void)
 {
   DMAPoolBuffer *dma = SVGA3DUtil_DMAPoolGetBuffer(&vertexDMA);
   memset(dma->buffer, 0, MESH_NUM_BYTES);
   SVGA3DUtil_SurfaceDMA2D(vertexSid, &dma->ptr,
                           SVGA3D_WRITE_HOST_VRAM, MESH_NUM_BYTES, 1);
   SVGA3DUtil_AsyncCall((AsyncCallFn) SVGA3DUtil_DMAPoolFreeBuffer, dma);
 }
 /*
 * uploadRow --
 *
 *    Upload the vertex data for one row of the mesh.
 */
 void uploadRow(int row, DMAPoolBuffer *dma)
 {
   SVGA3dCopyBox *boxes;
   SVGA3dGuestImage guestImage;
   SVGA3dSurfaceImageId hostImage = { vertexSid };
   guestImage.ptr = dma->ptr;
   guestImage.pitch = 0;
   SVGA3D_BeginSurfaceDMA(&guestImage, &hostImage, SVGA3D_WRITE_HOST_VRAM, &boxes, 1);
   {
      boxes[0].x = MESH_WIDTH * sizeof(MyVertex) * row;
      boxes[0].w = MESH_WIDTH * sizeof(MyVertex);
      boxes[0].srcx = boxes[0].x;
      boxes[0].h = 1;
      boxes[0].d = 1;
   }
   SVGA_FIFOCommitAll();
 }
 /*
 * drawStrip --
 *
 *    Draw all triangles between 'row' and 'row+1'.
 */
 void
 drawStrip(int row)
 {
   SVGA3dVertexDecl *decls;
   SVGA3dPrimitiveRange *ranges;
   SVGA3D_BeginDrawPrimitives(CID, &decls, 2, &ranges, 1);
   {
      decls[0].identity.type = SVGA3D_DECLTYPE_FLOAT3;
      decls[0].identity.usage = SVGA3D_DECLUSAGE_POSITION;
      decls[0].array.surfaceId = vertexSid;
      decls[0].array.stride = sizeof(MyVertex);
      decls[0].array.offset = offsetof(MyVertex, position);
      decls[1].identity.type = SVGA3D_DECLTYPE_FLOAT3;
      decls[1].identity.usage = SVGA3D_DECLUSAGE_COLOR;
      decls[1].array.surfaceId = vertexSid;
      decls[1].array.stride = sizeof(MyVertex);
      decls[1].array.offset = offsetof(MyVertex, color);
      ranges[0].primType = SVGA3D_PRIMITIVE_TRIANGLELIST;
      ranges[0].primitiveCount = TRIANGLES_PER_ROW;
      ranges[0].indexArray.surfaceId = indexSid;
      ranges[0].indexArray.stride = sizeof(IndexType);
      ranges[0].indexArray.offset = sizeof(IndexType) * INDICES_PER_ROW * row;
      ranges[0].indexWidth = sizeof(IndexType);
   }
   SVGA_FIFOCommitAll();
 }
 /*
 * render --
 *
 *    Calculate, upload, and draw the entire mesh.
 */
 void
 render(void)
 {
   DMAPoolBuffer *dma = updateVertices(0.2, 0.8, 0.2, 0, 0);
   int row;
   trashBuffer();
   uploadRow(0, dma);
   for (row = 1; row < MESH_HEIGHT; row++) {
      uploadRow(row, dma);
      drawStrip(row - 1);
   }
   SVGA3DUtil_AsyncCall((AsyncCallFn) SVGA3DUtil_DMAPoolFreeBuffer, dma);
 }
 /*
 * main --
 *
 *    Our example's entry point, invoked directly by the bootloader.
 */
 int
 main(void)
 {
   SVGA3DUtil_InitFullscreen(CID, 800, 600);
   SVGA3DText_Init();
   vertexSid = SVGA3DUtil_DefineSurface2D(MESH_NUM_BYTES, 1, SVGA3D_BUFFER);
   indexSid = createIndexBuffer();
   SVGA3DUtil_AllocDMAPool(&vertexDMA, MESH_NUM_BYTES, 16);
   Matrix_Perspective(perspectiveMat, 45.0f,
                      gSVGA.width / (float)gSVGA.height, 0.1f, 100.0f);
   while (1) {
      if (SVGA3DUtil_UpdateFPSCounter(&gFPS)) {
         Console_Clear();
         Console_Format("VMware SVGA3D Example:\n"
                        "Dynamic vertex buffer stress-test.\n"
                        "This example performs a separate DMA and "
                        "Draw for each row of the mesh.\n\n%s",
                        gFPS.text);
         SVGA3DText_Update();
      }
      SVGA3DUtil_ClearFullscreen(CID, SVGA3D_CLEAR_COLOR | SVGA3D_CLEAR_DEPTH,
                                 0x113366, 1.0f, 0);
      setupFrame();
      render();
      SVGA3DText_Draw();
      SVGA3DUtil_PresentFullscreen();
   }
   return 0;
 }
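The stress test slices one large vertex buffer and one index buffer by row, so its byte offsets all follow from the MESH_* macros. The sketch below works through that arithmetic under the same definitions. The helper function names are hypothetical, and uint16_t from stdint.h stands in for the example's IndexType.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MESH_WIDTH          256
#define MESH_HEIGHT         256
#define MESH_NUM_VERTICES   (MESH_WIDTH * MESH_HEIGHT)
#define TRIANGLES_PER_ROW   ((MESH_WIDTH-1) * 2)
#define INDICES_PER_ROW     (TRIANGLES_PER_ROW * 3)
#define MESH_ELEMENT(x, y)  (MESH_WIDTH * (y) + (x))

typedef struct {
   float position[3];
   float color[3];
} MyVertex;

/* Size of the whole vertex buffer, in bytes. */
size_t
meshNumBytes(void)
{
   return MESH_NUM_VERTICES * sizeof(MyVertex);
}

/* Byte offset of a row of vertices; each row holds MESH_WIDTH of them. */
size_t
rowVertexByteOffset(int row)
{
   assert(row >= 0 && row < MESH_HEIGHT);
   return MESH_WIDTH * sizeof(MyVertex) * row;
}

/* Byte offset of the indices for the strip between 'row' and 'row+1'. */
size_t
rowIndexByteOffset(int row)
{
   assert(row >= 0 && row < MESH_HEIGHT - 1);
   return sizeof(uint16_t) * INDICES_PER_ROW * row;
}

/* Largest vertex index the index buffer can reference. */
unsigned
maxMeshIndex(void)
{
   return MESH_ELEMENT(MESH_WIDTH - 1, MESH_HEIGHT - 1);
}
```

The largest index is 65535, which is exactly the limit of a 16-bit index, so a mesh any larger than 256x256 would need a wider IndexType.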
--- a/examples/dynamic-vertex/Makefile
+++ b/examples/dynamic-vertex/Makefile
@ -0,0 +1,6 @@
 TARGET = dynamic-vertex.img
 APP_SOURCES = main.c
 LIB_DIR = ../../lib
 include $(LIB_DIR)/Makefile.rules
--- a/examples/dynamic-vertex/main.c
+++ b/examples/dynamic-vertex/main.c
@ -0,0 +1,279 @@
 /*
 * SVGA3D example: Dynamic vertex buffers.
 *
 * This example shows how to efficiently stream vertex data to the
 * GPU, using multiple DMA buffers but a single vertex buffer. We
 * allocate DMA buffers from a pool every time we want to draw a new
 * dynamic mesh, then we asynchronously recycle those buffers after
 * the DMA transfer has completed.
 *
 * Copyright (C) 2008-2009 VMware, Inc. Licensed under the MIT
 * License, please see the README.txt. All rights reserved.
 */
 #include "svga3dutil.h"
 #include "svga3dtext.h"
 #include "matrix.h"
 #include "math.h"
 #define MESH_WIDTH      128
 #define MESH_HEIGHT     128
 #define MESH_NUM_VERTICES   (MESH_WIDTH * MESH_HEIGHT)
 #define MESH_NUM_QUADS      ((MESH_WIDTH-1) * (MESH_HEIGHT-1))
 #define MESH_NUM_TRIANGLES  (MESH_NUM_QUADS * 2)
 #define MESH_NUM_INDICES    (MESH_NUM_TRIANGLES * 3)
 #define MESH_ELEMENT(x, y)  (MESH_WIDTH * (y) + (x))
 typedef struct {
   float position[3];
   float color[3];
 } MyVertex;
 typedef uint16 IndexType;
 DMAPool vertexDMA;
 uint32 vertexSid, indexSid;
 Matrix perspectiveMat;
 FPSCounterState gFPS;
 /*
 * setupFrame --
 *
 *    Set up render state that we load once per frame (because
 *    SVGA3DText clobbered it) and perform matrix calculations that we
 *    only need once per frame.
 */
 void
 setupFrame(void)
 {
   static Matrix world;
   SVGA3dTextureState *ts;
   SVGA3dRenderState *rs;
   Matrix_Copy(world, gIdentityMatrix);
   Matrix_RotateX(world, -60.0 * PI_OVER_180);
   Matrix_RotateY(world, gFPS.frame * 0.001f);
   SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_WORLD, world);
   SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_PROJECTION, perspectiveMat);
   SVGA3D_BeginSetRenderState(CID, &rs, 4);
   {
      rs[0].state     = SVGA3D_RS_BLENDENABLE;
      rs[0].uintValue = FALSE;
      rs[1].state     = SVGA3D_RS_ZENABLE;
      rs[1].uintValue = TRUE;
      rs[2].state     = SVGA3D_RS_ZWRITEENABLE;
      rs[2].uintValue = TRUE;
      rs[3].state     = SVGA3D_RS_ZFUNC;
      rs[3].uintValue = SVGA3D_CMP_LESS;
   }
   SVGA_FIFOCommitAll();
   SVGA3D_BeginSetTextureState(CID, &ts, 4);
   {
      ts[0].stage = 0;
      ts[0].name  = SVGA3D_TS_BIND_TEXTURE;
      ts[0].value = SVGA3D_INVALID_ID;
      ts[1].stage = 0;
      ts[1].name  = SVGA3D_TS_COLOROP;
      ts[1].value = SVGA3D_TC_SELECTARG1;
      ts[2].stage = 0;
      ts[2].name  = SVGA3D_TS_COLORARG1;
      ts[2].value = SVGA3D_TA_DIFFUSE;
      ts[3].stage = 0;
      ts[3].name  = SVGA3D_TS_ALPHAARG1;
      ts[3].value = SVGA3D_TA_DIFFUSE;
   }
   SVGA_FIFOCommitAll();
 }
 /*
 * updateVertices --
 *
 *    Calculate new vertices, writing them directly into an available
 *    DMA buffer. Asynchronously begin DMA and recycle the buffer.
 */
 void
 updateVertices(float red, float green, float blue, float phase, float offset)
 {
   DMAPoolBuffer *dma;
   MyVertex *vert;
   int x, y;
   float t = gFPS.frame * 0.01f + phase;
   dma = SVGA3DUtil_DMAPoolGetBuffer(&vertexDMA);
   vert = (MyVertex*) dma->buffer;
   for (y = 0; y < MESH_HEIGHT; y++) {
      for (x = 0; x < MESH_WIDTH; x++) {
         float fx = x * (2.0 / MESH_WIDTH) - 1.0;
         float fy = y * (2.0 / MESH_HEIGHT) - 1.0;
         float fxo = fx + offset;
         float dist = fxo * fxo + fy * fy;
         float z = sinf(dist * 8.0 + t) / (1 + dist * 10.0);
         vert->position[0] = fx;
         vert->position[1] = fy;
         vert->position[2] = z;
         vert->color[0] = red - z;
         vert->color[1] = green - z;
         vert->color[2] = blue - z;
         vert++;
      }
   }
   SVGA3DUtil_SurfaceDMA2D(vertexSid, &dma->ptr, SVGA3D_WRITE_HOST_VRAM,
                           MESH_NUM_VERTICES * sizeof(MyVertex), 1);
   SVGA3DUtil_AsyncCall((AsyncCallFn) SVGA3DUtil_DMAPoolFreeBuffer, dma);
 }
 /*
 * drawMesh --
 *
 *    Draw our mesh at a particular position. This uses the index and
 *    vertex data which is resident in the host VRAM buffers at the
 *    time the drawing command is executed asynchronously.
 */
 void
 drawMesh(float posX, float posY, float posZ)
 {
   SVGA3dVertexDecl *decls;
   SVGA3dPrimitiveRange *ranges;
   static Matrix view;
   Matrix_Copy(view, gIdentityMatrix);
   Matrix_Translate(view, posX, posY, posZ);
   SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_VIEW, view);
   SVGA3D_BeginDrawPrimitives(CID, &decls, 2, &ranges, 1);
   {
      decls[0].identity.type = SVGA3D_DECLTYPE_FLOAT3;
      decls[0].identity.usage = SVGA3D_DECLUSAGE_POSITION;
      decls[0].array.surfaceId = vertexSid;
      decls[0].array.stride = sizeof(MyVertex);
      decls[0].array.offset = offsetof(MyVertex, position);
      decls[1].identity.type = SVGA3D_DECLTYPE_FLOAT3;
      decls[1].identity.usage = SVGA3D_DECLUSAGE_COLOR;
      decls[1].array.surfaceId = vertexSid;
      decls[1].array.stride = sizeof(MyVertex);
      decls[1].array.offset = offsetof(MyVertex, color);
      ranges[0].primType = SVGA3D_PRIMITIVE_TRIANGLELIST;
      ranges[0].primitiveCount = MESH_NUM_TRIANGLES;
      ranges[0].indexArray.surfaceId = indexSid;
      ranges[0].indexArray.stride = sizeof(IndexType);
      ranges[0].indexWidth = sizeof(IndexType);
   }
   SVGA_FIFOCommitAll();
 }
 /*
 * createIndexBuffer --
 *
 *    Create a static index buffer that renders our vertices as a 2D
 *    mesh. For simplicity, we use a triangle list rather than a
 *    triangle strip.
 */
 uint32
 createIndexBuffer(void)
 {
   IndexType *indexBuffer;
   const uint32 bufferSize = MESH_NUM_INDICES * sizeof *indexBuffer;
   SVGAGuestPtr gPtr;
   uint32 sid;
   int x, y;
   sid = SVGA3DUtil_DefineSurface2D(bufferSize, 1, SVGA3D_BUFFER);
   indexBuffer = SVGA3DUtil_AllocDMABuffer(bufferSize, &gPtr);
   for (y = 0; y < (MESH_HEIGHT - 1); y++) {
      for (x = 0; x < (MESH_WIDTH - 1); x++) {
         indexBuffer[0] = MESH_ELEMENT(x,   y  );
         indexBuffer[1] = MESH_ELEMENT(x+1, y  );
         indexBuffer[2] = MESH_ELEMENT(x+1, y+1);
         indexBuffer[3] = MESH_ELEMENT(x+1, y+1);
         indexBuffer[4] = MESH_ELEMENT(x,   y+1);
         indexBuffer[5] = MESH_ELEMENT(x,   y  );
         indexBuffer += 6;
      }
   }
   SVGA3DUtil_SurfaceDMA2D(sid, &gPtr, SVGA3D_WRITE_HOST_VRAM, bufferSize, 1);
   return sid;
 }
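The index layout built by createIndexBuffer can be tried out on its own, away from the SVGA device. The following is a minimal sketch, not the example's code: `gridElement` and `buildGridIndices` are hypothetical names, and the row-major numbering is an assumption, since the real MESH_ELEMENT macro is defined outside this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed row-major vertex numbering; the example's MESH_ELEMENT may differ. */
static uint16_t
gridElement(int x, int y, int width)
{
   return (uint16_t)(y * width + x);
}

/*
 * Emit two triangles (six indices) per grid cell, visiting the corners
 * in the same order as createIndexBuffer. 'out' must have room for
 * (width - 1) * (height - 1) * 6 entries. Returns the number of
 * indices written.
 */
size_t
buildGridIndices(uint16_t *out, int width, int height)
{
   size_t n = 0;
   int x, y;

   assert(width > 0 && height > 0);

   for (y = 0; y < height - 1; y++) {
      for (x = 0; x < width - 1; x++) {
         out[n++] = gridElement(x,     y,     width);
         out[n++] = gridElement(x + 1, y,     width);
         out[n++] = gridElement(x + 1, y + 1, width);
         out[n++] = gridElement(x + 1, y + 1, width);
         out[n++] = gridElement(x,     y + 1, width);
         out[n++] = gridElement(x,     y,     width);
      }
   }
   return n;
}
```

A triangle list repeats two corners per cell, so it uses more index memory than a strip would, but every cell is independent and no strip-restart handling is needed.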
 /*
 * main --
 *
 *    Our example's entry point, invoked directly by the bootloader.
 */
 int
 main(void)
 {
   SVGA3DUtil_InitFullscreen(CID, 800, 600);
   SVGA3DText_Init();
   vertexSid = SVGA3DUtil_DefineSurface2D(MESH_NUM_VERTICES * sizeof(MyVertex),
                                          1, SVGA3D_BUFFER);
   indexSid = createIndexBuffer();
   SVGA3DUtil_AllocDMAPool(&vertexDMA, MESH_NUM_VERTICES * sizeof(MyVertex), 16);
   Matrix_Perspective(perspectiveMat, 45.0f,
                      gSVGA.width / (float)gSVGA.height, 0.1f, 100.0f);
   while (1) {
      if (SVGA3DUtil_UpdateFPSCounter(&gFPS)) {
         Console_Clear();
         Console_Format("VMware SVGA3D Example:\n"
                        "Dynamic vertex buffers.\n\n%s",
                        gFPS.text);
         SVGA3DText_Update();
      }
      SVGA3DUtil_ClearFullscreen(CID, SVGA3D_CLEAR_COLOR | SVGA3D_CLEAR_DEPTH,
                                 0x113366, 1.0f, 0);
      setupFrame();
      updateVertices(1, 0.5, 0.5, M_PI, 0);
      drawMesh(-1.5, -1, 6);
      updateVertices(0.5, 1.0, 0.5, 0, 0);
      drawMesh(0, 1, 6);
      updateVertices(0.5, 0.5, 1.0, 0, 1.5);
      drawMesh(1.5, -1, 6);
      SVGA3DText_Draw();
      SVGA3DUtil_PresentFullscreen();
   }
   return 0;
 }
--- a/examples/fence-stress/Makefile
+++ b/examples/fence-stress/Makefile
@ -0,0 +1,6 @@
 TARGET = fence-stress.img
 APP_SOURCES = main.c
 LIB_DIR = ../../lib
 include $(LIB_DIR)/Makefile.rules
--- a/examples/fence-stress/main.c
+++ b/examples/fence-stress/main.c
@ -0,0 +1,58 @@
 /*
 * SVGA3D example: Stress-test for our FIFO Fence synchronization.
 *
 * Copyright (C) 2008-2009 VMware, Inc. Licensed under the MIT
 * License, please see the README.txt. All rights reserved.
 */
 #include "svga3dutil.h"
 #include "svga3dtext.h"
 #define SYNCS_PER_FRAME      1024
 int
 main(void)
 {
   int i, j;
   uint32 fence = 0;
   static FPSCounterState gFPS;
   SVGA3DUtil_InitFullscreen(CID, 640, 480);
   SVGA3DText_Init();
   while (1) {
      SVGA3DUtil_UpdateFPSCounter(&gFPS);
      Console_Clear();
      Console_Format("VMware SVGA3D Example:\n"
                     "FIFO Fence stress-test.\n"
                     "%d syncs per frame.\n"
                     "\n"
                     "%s\n"
                     "\n"
                     "Latest fence: 0x%08x\n"
                     "   IRQ count: %d\n",
                     SYNCS_PER_FRAME, gFPS.text, fence, gSVGA.irq.count);
      SVGA3DText_Update();
      SVGA3DUtil_ClearFullscreen(CID, SVGA3D_CLEAR_COLOR, 0, 1.0f, 0);
      SVGA3DText_Draw();
      SVGA3DUtil_PresentFullscreen();
      for (j = 0; j < SYNCS_PER_FRAME; j++) {
         for (i=0; i<100; i++) {
            SVGA_InsertFence();
         }
         fence = SVGA_InsertFence();
         for (i=0; i<50; i++) {
            SVGA_InsertFence();
         }
         SVGA_SyncToFence(fence);
      }
   }
   return 0;
 }
--- a/examples/gmr-test/Makefile
+++ b/examples/gmr-test/Makefile
@ -0,0 +1,7 @@
 TARGET = gmr-test.img
 VMX_MEMSIZE = 128
 APP_SOURCES = main.c
 LIB_DIR = ../../lib
 include $(LIB_DIR)/Makefile.rules
--- a/examples/gmr-test/main.c
+++ b/examples/gmr-test/main.c
@ -0,0 +1,578 @@
 /*
 * SVGA3D example: Test harness and low-level example program for
 * Guest Memory Regions.
 *
 * With Guest Memory Regions, the SVGA device can perform DMA
 * operations directly between guest system memory and host
 * VRAM. Guest drivers use the device's GMR registers to set up
 * regions of guest memory which can be accessed by the device, then
 * the driver refers to these regions by ID when sending pointers over
 * the command FIFO.
 *
 * GMRs support physically contiguous or discontiguous memory. This
 * example is a bit contrived because we're testing GMRs without an
 * operating system or a virtual memory subsystem. In a real OS,
 * support for physically discontiguous addresses would often be
 * required in order to ensure that the GMR's address space matches
 * that of a particular virtual address space in the OS. In this
 * example, we just test physically discontiguous regions for the sake
 * of testing them.
 *
 * This test harness is focused on system memory GMRs; however, it also
 * ends up testing much of the GLSurface and GLFBO code, since it
 * performs GMR-to-GMR copies by way of surface DMA operations.
 *
 * Copyright (C) 2008-2009 VMware, Inc. Licensed under the MIT
 * License, please see the README.txt. All rights reserved.
 */
 #include "svga.h"
 #include "svga3dutil.h"
 #include "svga3dtext.h"
 #include "console_vga.h"
 #include "gmr.h"
 #include "math.h"
 #include "mt19937ar.h"
 /* Maximum number of copy boxes we'll test with. The host has no limit. */
 #define MAX_COPY_BOXES     128
 /*
 * Global data
 */
 static uint32 tempSurfaceId;
 static uint32 randSeed;
 static uint32 testIters;
 static uint32 testRegionSize;
 static const char *testPass;
 /*
 * TestPattern_Write --
 * TestPattern_Check --
 *
 *    Write/check an arbitrary deterministic test pattern in the
 *    provided buffer. The buffer must be a multiple of 4 bytes long.
 *
 *    Instead of generating a unique random number for every word,
 *    which would be pretty slow, this generates a prime number of
 *    random words, which then repeat across the entire check range.
 */
 #define PATTERN_BUFFER_LEN  41  // Must be prime
 void
 TestPattern_Write(uint32 *buffer,
                  uint32 size)
 {
 #ifndef DISABLE_CHECKING
   uint32 pattern[PATTERN_BUFFER_LEN];
   int i;
   init_genrand(randSeed);
   for (i = 0; i < PATTERN_BUFFER_LEN; i++) {
      pattern[i] = genrand_int32();
   }
   i = 0;
   size /= sizeof *buffer;
   while (size--) {
      *(buffer++) = pattern[i];
      if (++i == PATTERN_BUFFER_LEN) {
         i = 0;
      }
   }
 #endif
 }
 void
 TestPattern_Check(uint32 *buffer,
                  uint32 size,
                  uint32 offset,
                  uint32 line,
                  uint32 index)
 {
 #ifndef DISABLE_CHECKING
   uint32 pattern[PATTERN_BUFFER_LEN];
   int i;
   init_genrand(randSeed);
   for (i = 0; i < PATTERN_BUFFER_LEN; i++) {
      pattern[i] = genrand_int32();
   }
   offset /= sizeof *buffer;
   size /= sizeof *buffer;
   i = offset % PATTERN_BUFFER_LEN;
   while (size) {
      uint32 v = pattern[i];
      if (++i == PATTERN_BUFFER_LEN) {
         i = 0;
      }
      if (*buffer != v) {
         SVGA_Disable();
         ConsoleVGA_Init();
         Console_Format("Test pattern mismatch on %4x.%4x\n"
                        "Test pass: %s\n"
                        "Mismatch at %08x, with %08x bytes left in block.\n\n",
                        line, index, testPass, buffer, size * sizeof *buffer);
         size = MIN(size, 16);
         while (size) {
            Console_Format("Actual: %08x  Expected: %08x\n",
                           *buffer, v);
            buffer++;
            size--;
            v = pattern[i];
            if (++i == PATTERN_BUFFER_LEN) {
               i = 0;
            }
         }
         Intr_Disable();
         Intr_Halt();
      }
      buffer++;
      size--;
   }
 #endif
 }
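The write/check scheme above can be exercised on an ordinary host. This is a rough sketch under stated assumptions: `fillPattern`, `patternWrite` and `patternCheck` are hypothetical names, a small linear congruential generator stands in for mt19937ar, and the checker returns a mismatch index instead of halting the machine. The key point it illustrates is the `wordOffset` argument, which lets a sub-range (such as one page of a discontiguous region) be verified against the right phase of the repeating pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PATTERN_LEN 41   /* prime, like PATTERN_BUFFER_LEN */

/* Stand-in generator (a simple LCG); the example uses mt19937ar instead. */
static void
fillPattern(uint32_t seed, uint32_t pattern[PATTERN_LEN])
{
   int i;

   for (i = 0; i < PATTERN_LEN; i++) {
      seed = seed * 1664525u + 1013904223u;
      pattern[i] = seed;
   }
}

/* Fill 'words' 32-bit words with the repeating pattern. */
void
patternWrite(uint32_t seed, uint32_t *buffer, size_t words)
{
   uint32_t pattern[PATTERN_LEN];
   size_t i;

   fillPattern(seed, pattern);
   for (i = 0; i < words; i++) {
      buffer[i] = pattern[i % PATTERN_LEN];
   }
}

/*
 * Check 'words' words that start 'wordOffset' words into the pattern.
 * Returns the index of the first mismatch, or -1 if all words match.
 */
long
patternCheck(uint32_t seed, const uint32_t *buffer,
             size_t words, size_t wordOffset)
{
   uint32_t pattern[PATTERN_LEN];
   size_t i;

   fillPattern(seed, pattern);
   for (i = 0; i < words; i++) {
      if (buffer[i] != pattern[(wordOffset + i) % PATTERN_LEN]) {
         return (long)i;
      }
   }
   return -1;
}
```

Because the period is not a power of two, the pattern does not line up with page or row boundaries, so data that lands at the wrong page-sized offset is likely to be caught by the check.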
 /*
 * GMR_GenericCopy --
 *
 *    Copy between two GMRs, using an arbitrarily shaped buffer
 *    surface and an arbitrary list of copy boxes.
 *
 *    In the copy boxes, the 'source' represents locations
 *    on both guest surfaces and the 'destination' represents
 *    locations in host VRAM.
 */
 void
 GMR_GenericCopy(SVGAGuestPtr *dest,
                SVGAGuestPtr *src,
                SVGA3dSize *surfSize,
                SVGA3dSurfaceFormat format,
                SVGA3dCopyBox *boxes,
                uint32 numBoxes)
 {
   SVGA3dSize *mipSizes;
   SVGA3dSurfaceFace *faces;
   SVGA3dCopyBox *dmaBoxes;
   SVGA3dGuestImage srcImage = { *src };
   SVGA3dGuestImage destImage = { *dest };
   SVGA3dSurfaceImageId hostImage = { tempSurfaceId };
   SVGA3D_BeginDefineSurface(tempSurfaceId, 0, format, &faces, &mipSizes, 1);
   faces[0].numMipLevels = 1;
   mipSizes[0] = *surfSize;
   SVGA_FIFOCommitAll();
   SVGA3D_BeginSurfaceDMA(&srcImage, &hostImage, SVGA3D_WRITE_HOST_VRAM,
                          &dmaBoxes, numBoxes);
   memcpy(dmaBoxes, boxes, numBoxes * sizeof boxes[0]);
   SVGA_FIFOCommitAll();
   SVGA3D_BeginSurfaceDMA(&destImage, &hostImage, SVGA3D_READ_HOST_VRAM,
                          &dmaBoxes, numBoxes);
   memcpy(dmaBoxes, boxes, numBoxes * sizeof boxes[0]);
   SVGA_FIFOCommitAll();
   SVGA3D_DestroySurface(tempSurfaceId);
   /* Wait for both DMA operations to finish. */
   SVGA_SyncToFence(SVGA_InsertFence());
 }
 /*
 * Display_BeginPass --
 *
 *    Begin a new test pass, and update the on-screen display.
 */
 void
 Display_BeginPass(const char *pass)
 {
   testPass = pass;
   Console_Clear();
   Console_Format("VMware SVGA3D Example:\n"
                  "Guest Memory Region stress-test.\n"
                  "\n"
                  "Host capabilities\n"
                  "-----------------\n"
                  "\n"
                  "            Max IDs: %d\n"
                  " Max Descriptor Len: %d\n"
                  "\n"
                  "Test status\n"
                  "-----------\n"
                  "\n"
                  "   Iterations: %d\n"
                  "         Seed: %08x\n"
                  "      Running: %s\n"
                  "\n"
 #ifdef DISABLE_CHECKING
                  "CHECKING DISABLED. This test can't fail.\n",
 #else
                  "Test is running successfully so far. Will Panic on failure.\n",
 #endif
                  gGMR.maxIds, gGMR.maxDescriptorLen, testIters,
                  randSeed, testPass);
   VMBackdoor_VGAScreenshot();
   SVGA3DText_Update();
   SVGA3DUtil_ClearFullscreen(CID, SVGA3D_CLEAR_COLOR, 0x000080, 1.0f, 0);
   SVGA3DText_Draw();
   SVGA3DUtil_PresentFullscreen();
 }
 /*
 * runTestPass --
 *
 *    Run one test pass: create two large GMRs, one contiguous and one
 *    discontiguous.  Copy a test pattern back and forth between the
 *    two buffers, using the provided surface size and type.
 */
 void
 runTestPass(uint32 testRegionSize,
            SVGA3dSize *surfSize,
            SVGA3dSurfaceFormat format,
            SVGA3dCopyBox *boxes,
            uint32 numBoxes)
 {
   SVGAGuestPtr contig = { 0, 0 };
   SVGAGuestPtr evenPages = { gGMR.maxIds - 1, 0 };
   int i;
   uint32 contigPages = GMR_DefineContiguous(contig.gmrId, gGMR.maxDescriptorLen * 2);
   uint32 discontigPages = GMR_DefineEvenPages(evenPages.gmrId, gGMR.maxDescriptorLen);
   /*
    * Write a test pattern into the contiguous GMR.
    */
   TestPattern_Write(PPN_POINTER(contigPages), testRegionSize);
   TestPattern_Check(PPN_POINTER(contigPages), testRegionSize, 0, __LINE__, 0);
   /*
    * Copy from contiguous to discontiguous.
    */
   GMR_GenericCopy(&evenPages, &contig, surfSize, format, boxes, numBoxes);
   /*
    * Check the discontiguous GMR, page-by-page.
    */
   for (i = 0; i < testRegionSize / PAGE_SIZE; i++) {
      TestPattern_Check(PPN_POINTER(discontigPages + 2*i),
                        PAGE_SIZE, PAGE_SIZE * i, __LINE__, i);
   }
   /*
    * Clear the contiguous GMR, then copy data back into it from the discontiguous GMR.
    */
   memset(PPN_POINTER(contigPages), 0x42, testRegionSize);
   GMR_GenericCopy(&contig, &evenPages, surfSize, format, boxes, numBoxes);
   /*
    * Check the contiguous GMR again.
    */
   TestPattern_Check(PPN_POINTER(contigPages), testRegionSize, 0, __LINE__, i);
   GMR_FreeAll();
   Heap_Reset();
 }
 /*
 * createBoxes --
 *
 *    Create an array of N copyboxes which cover an entire surface.
 *    This begins with a single large copybox, and iteratively splits
 *    small boxes off from a random face on the original box.
 *
 *    This function can and will generate degenerate copy boxes
 *    (zero-size).  The SVGA3D device must ignore those boxes.
 */
 void
 createBoxes(SVGA3dSize *size,
            SVGA3dCopyBox *boxes,
            uint32 numBoxes)
 {
   uint32 i;
   SVGA3dCopyBox space = {
      .w = size->width,
      .h = size->height,
      .d = size->depth,
   };
   init_genrand(randSeed);
   for (i = 0; i < numBoxes - 1; i++) {
      uint32 rand = genrand_int32();
      uint32 a;
      memcpy(&boxes[i], &space, sizeof space);
      switch (rand % 6) {
      case 0:                   /* X- */
         a = rand % space.w;
         boxes[i].w = a;
         space.x += a;
         space.w -= a;
         break;
      case 1:                   /* Y- */
         a = rand % space.h;
         boxes[i].h = a;
         space.y += a;
         space.h -= a;
         break;
      case 2:                   /* Z- */
         a = rand % space.d;
         boxes[i].d = a;
         space.z += a;
         space.d -= a;
         break;
      case 3:                   /* X+ */
         a = rand % space.w;
         boxes[i].w = a;
         space.w -= a;
         boxes[i].x += space.w;
         break;
      case 4:                   /* Y+ */
         a = rand % space.h;
         boxes[i].h = a;
         space.h -= a;
         boxes[i].y += space.h;
         break;
      case 5:                   /* Z+ */
         a = rand % space.d;
         boxes[i].d = a;
         space.d -= a;
         boxes[i].z += space.d;
         break;
      }
   }
   boxes[i] = space;
   for (i = 0; i < numBoxes; i++) {
      boxes[i].srcx = boxes[i].x;
      boxes[i].srcy = boxes[i].y;
      boxes[i].srcz = boxes[i].z;
   }
 }
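The subdivision in createBoxes has a property that is easy to check in isolation: each step carves a slice off one face of the remaining space, so the box volumes always sum to the volume of the whole surface. The sketch below illustrates this with a simplified, hypothetical `Box` type (not SVGA3dCopyBox) and an LCG in place of mt19937ar. It also picks the axis, the face and the slice thickness from different parts of the random word, which is a departure from the original.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for SVGA3dCopyBox: origin and extent per axis. */
typedef struct {
   uint32_t pos[3];   /* x, y, z */
   uint32_t len[3];   /* w, h, d */
} Box;

/* Stand-in random source (LCG); the upper bits are better distributed. */
static uint32_t
lcgNext(uint32_t *state)
{
   *state = *state * 1664525u + 1013904223u;
   return *state >> 8;
}

/* Split a size[0] x size[1] x size[2] volume into numBoxes boxes. */
void
splitBoxes(const uint32_t size[3], Box *boxes, uint32_t numBoxes, uint32_t seed)
{
   Box space = { {0, 0, 0}, {size[0], size[1], size[2]} };
   uint32_t i;

   assert(numBoxes >= 1);
   assert(size[0] > 0 && size[1] > 0 && size[2] > 0);

   for (i = 0; i + 1 < numBoxes; i++) {
      uint32_t r = lcgNext(&seed);
      uint32_t axis = r % 3;
      uint32_t a = (r / 6) % space.len[axis];   /* thickness, may be 0 */

      boxes[i] = space;
      boxes[i].len[axis] = a;
      space.len[axis] -= a;
      if ((r / 3) % 2 == 0) {
         /* Low face: the slice keeps the old origin, the rest moves up. */
         space.pos[axis] += a;
      } else {
         /* High face: the slice starts where the shrunken rest ends. */
         boxes[i].pos[axis] += space.len[axis];
      }
   }
   boxes[i] = space;   /* whatever is left becomes the last box */
}

/* Sum of w * h * d over all boxes. */
uint64_t
totalVolume(const Box *boxes, uint32_t numBoxes)
{
   uint64_t total = 0;
   uint32_t i;

   for (i = 0; i < numBoxes; i++) {
      total += (uint64_t)boxes[i].len[0] * boxes[i].len[1] * boxes[i].len[2];
   }
   return total;
}
```

Since the slice thickness is always strictly less than the remaining extent, the remainder never shrinks to zero along any axis, although individual slices can be zero-sized, matching the degenerate boxes mentioned in the comment above.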
 /*
 * createMisaligned1dBoxes --
 *
 *    Create an array of N 1-dimensional copyboxes, most of which
 *    have a width of PAGE_SIZE-1 bytes.
 *
 *    The boxes may extend past the end of 'size'. This is okay,
 *    the SVGA3D device is responsible for clipping them.
 */
 void
 createMisaligned1dBoxes(uint32 size,
                        SVGA3dCopyBox *boxes,
                        uint32 numBoxes)
 {
   uint32 offset = 0;
   uint32 i;
   memset(boxes, 0, sizeof *boxes * numBoxes);
   for (i = 0; i < numBoxes - 1; i++) {
      boxes[i].x = boxes[i].srcx = offset;
      boxes[i].w = PAGE_SIZE-1;
      boxes[i].h = 1;
      boxes[i].d = 1;
      offset += boxes[i].w;
   }
   boxes[i].x = boxes[i].srcx = offset;
   boxes[i].w = size - offset;
   boxes[i].h = 1;
   boxes[i].d = 1;
 }
 /*
 * runTests --
 *
 *    Main function to run one iteration of all tests.
 */
 void
 runTests(void)
 {
   /* Maximum size of worst-case-discontiguous region we can represent */
   uint32 largeRegionSize = gGMR.maxDescriptorLen * PAGE_SIZE;
   /* Smaller region, to speed up other testing. */
   uint32 regionSize = 0x20 * PAGE_SIZE;
   /* Smallest region, suitable for 1D textures. */
   uint32 tinyRegionSize = 1024;
   SVGA3dSize size1dLarge = {
      .width = largeRegionSize,
      .height = 1,
      .depth = 1,
   };
   SVGA3dSize size1d = {
      .width = tinyRegionSize,
      .height = 1,
      .depth = 1,
   };
   SVGA3dSize size2d = {
      .width = 0x100,
      .height = regionSize / 0x100,
      .depth = 1,
   };
   SVGA3dSize size3d = {
      .width = 0x40,
      .height = 0x40,
      .depth = regionSize / 0x1000,
   };
   /* A single maximally-sized 1D copybox. The host will clip it. */
   SVGA3dCopyBox maxBox1d = {
      .w = 0xFFFFFFFFUL,
      .h = 1,
      .d = 1,
   };
   SVGA3dCopyBox boxes[MAX_COPY_BOXES];
   /*
    * Basic per-surface-format tests.
    *
    * Note that 3D compressed textures are not expected to work yet,
    * so we skip those tests.
    */
 #define TEST_FORMAT_2D(f, b)                                               \
   {                                                                       \
      Display_BeginPass("Single copy via 1D " #f " surface.");             \
      runTestPass(tinyRegionSize*b, &size1d, SVGA3D_ ## f, &maxBox1d, 1);  \
                                                                           \
      Display_BeginPass("Single copy via 2D " #f " surface.");             \
      createBoxes(&size2d, boxes, 1);                                      \
      runTestPass(regionSize*b, &size2d, SVGA3D_ ## f, boxes, 1);          \
   }
 #define TEST_FORMAT(f, b)                                                  \
   {                                                                       \
      TEST_FORMAT_2D(f, b)                                                 \
                                                                           \
      Display_BeginPass("Single copy via 3D " #f " surface.");             \
      createBoxes(&size3d, boxes, 1);                                      \
      runTestPass(regionSize*b, &size3d, SVGA3D_ ## f, boxes, 1);          \
   }
   TEST_FORMAT(BUFFER, 1)       // Buffers use their own host VRAM type
   TEST_FORMAT(LUMINANCE8, 1)   // Test a simple 8bpp format
   TEST_FORMAT(ALPHA8, 1)       // To isolate alpha channel bugs
   TEST_FORMAT(A8R8G8B8, 4)     // ARGB surfaces have more readback paths than others
   TEST_FORMAT_2D(DXT2, 1)      // Test 4x4 block size, and compressed texture upload/download
 #undef TEST_FORMAT
 #undef TEST_FORMAT_2D
   /*
    * Test large buffers (Limited by max size of worst-case fragmented GMR)
    */
   Display_BeginPass("Single copy via 1D BUFFER surface. (Large region)");
   runTestPass(largeRegionSize, &size1dLarge, SVGA3D_BUFFER, &maxBox1d, 1);
   /*
    * Test with randomly subdivided copyboxes.
    */
 #define TEST_FORMAT_2D(f, b)                                                    \
   {                                                                            \
      Display_BeginPass("Subdivided copy via 2D " #f " surface.");              \
      createBoxes(&size2d, boxes, MAX_COPY_BOXES);                              \
      runTestPass(regionSize*b, &size2d, SVGA3D_ ## f, boxes, MAX_COPY_BOXES);  \
   }
 #define TEST_FORMAT(f, b)                                                       \
   {                                                                            \
      TEST_FORMAT_2D(f, b)                                                      \
                                                                                \
      Display_BeginPass("Subdivided copy via 3D " #f " surface.");              \
      createBoxes(&size3d, boxes, MAX_COPY_BOXES);                              \
      runTestPass(regionSize*b, &size3d, SVGA3D_ ## f, boxes, MAX_COPY_BOXES);  \
   }
   TEST_FORMAT(BUFFER, 1)
   TEST_FORMAT(ALPHA8, 1)
   TEST_FORMAT(A8R8G8B8, 4)
   TEST_FORMAT_2D(DXT2, 1)      // Test compressed texture rectangle clipping
 #undef TEST_FORMAT
 #undef TEST_FORMAT_2D
   /*
    * Test another large 1D copy, split into slightly misaligned chunks.
    */
   Display_BeginPass("Misaligned copies via 1D BUFFER surface. (Large region)");
   createMisaligned1dBoxes(largeRegionSize, boxes, MAX_COPY_BOXES);
   runTestPass(largeRegionSize, &size1dLarge, SVGA3D_BUFFER, boxes, MAX_COPY_BOXES);
 }
 /*
 * main --
 *
 *    Entry point and main loop for the example.
 */
 int
 main(void)
 {
   SVGA3DUtil_InitFullscreen(CID, 640, 480);
   SVGA3DText_Init();
   GMR_Init();
   Heap_Reset();
   tempSurfaceId = SVGA3DUtil_AllocSurfaceID();
   testRegionSize = gGMR.maxDescriptorLen * PAGE_SIZE;
   while (1) {
      runTests();
      randSeed = genrand_int32();
      testIters++;
   }
   return 0;
 }
--- a/examples/half-float-test/Makefile
+++ b/examples/half-float-test/Makefile
@ -0,0 +1,16 @@
 TARGET = half-float-test.img
 APP_SOURCES = main.c
 LIB_DIR = ../../lib
 include $(LIB_DIR)/Makefile.rules
 .PHONY: shaders
 shaders: cube_vs.h cube_ps.h
 cube_vs.h: cube.fx
 	wine fxc.exe /T vs_2_0 /E MyVertexShader /Fh cube_vs.h cube.fx
 cube_ps.h: cube.fx
 	wine fxc.exe /T ps_2_0 /E MyPixelShader /Fh cube_ps.h cube.fx
--- a/examples/half-float-test/cube.fx
+++ b/examples/half-float-test/cube.fx
@ -0,0 +1,36 @@
 float4x4 matView, matProj;
 struct VS_Input
 {
   float4  Pos      : POSITION;
   float4  Color    : COLOR0;
 };
 struct VS_Output
 {
   float4  Pos      : POSITION;
   float4  Color    : COLOR0;
 };
 VS_Output
 MyVertexShader(VS_Input Input)
 {
   VS_Output Output;
   Output.Pos = mul(mul(Input.Pos, matView), matProj);
   Output.Color = Input.Color;
   return Output;
 }
 struct PS_Input
 {
   float4  Color    : COLOR0;
 };
 float4
 MyPixelShader(PS_Input Input) : COLOR
 {
   return Input.Color;
 }
--- a/examples/half-float-test/cube_ps.h
+++ b/examples/half-float-test/cube_ps.h
@ -0,0 +1,21 @@
 #if 0
 //
 // Generated by Microsoft (R) D3DX9 Shader Compiler 
 //
 //   fxc /T ps_2_0 /E MyPixelShader /Fh cube_ps.h cube.fx
 //
    ps_2_0
    dcl v0
    mov oC0, v0
 // approximately 1 instruction slot used
 #endif
 const DWORD g_ps20_MyPixelShader[] =
 {
    0xffff0200, 0x0013fffe, 0x42415443, 0x0000001c, 0x00000023, 0xffff0200, 
    0x00000000, 0x00000000, 0x20000100, 0x0000001c, 0x325f7370, 0x4d00305f, 
    0x6f726369, 0x74666f73, 0x29522820, 0x44334420, 0x53203958, 0x65646168, 
    0x6f432072, 0x6c69706d, 0x00207265, 0x0200001f, 0x80000000, 0x900f0000, 
    0x02000001, 0x800f0800, 0x90e40000, 0x0000ffff
 };
--- a/examples/half-float-test/cube_vs.h
+++ b/examples/half-float-test/cube_vs.h
@ -0,0 +1,54 @@
 #if 0
 //
 // Generated by Microsoft (R) D3DX9 Shader Compiler 
 //
 //   fxc /T vs_2_0 /E MyVertexShader /Fh cube_vs.h cube.fx
 //
 //
 // Parameters:
 //
 //   float4x4 matProj;
 //   float4x4 matView;
 //
 //
 // Registers:
 //
 //   Name         Reg   Size
 //   ------------ ----- ----
 //   matView      c0       4
 //   matProj      c4       4
 //
    vs_2_0
    dcl_position v0
    dcl_color v1
    dp4 r0.x, v0, c0
    dp4 r0.y, v0, c1
    dp4 r0.z, v0, c2
    dp4 r0.w, v0, c3
    dp4 oPos.x, r0, c4
    dp4 oPos.y, r0, c5
    dp4 oPos.z, r0, c6
    dp4 oPos.w, r0, c7
    mov oD0, v1
 // approximately 9 instruction slots used
 #endif
 const DWORD g_vs20_MyVertexShader[] =
 {
    0xfffe0200, 0x0025fffe, 0x42415443, 0x0000001c, 0x0000006b, 0xfffe0200, 
    0x00000002, 0x0000001c, 0x20000100, 0x00000064, 0x00000044, 0x00040002, 
    0x00000004, 0x0000004c, 0x00000000, 0x0000005c, 0x00000002, 0x00000004, 
    0x0000004c, 0x00000000, 0x5074616d, 0x006a6f72, 0x00030003, 0x00040004, 
    0x00000001, 0x00000000, 0x5674616d, 0x00776569, 0x325f7376, 0x4d00305f, 
    0x6f726369, 0x74666f73, 0x29522820, 0x44334420, 0x53203958, 0x65646168, 
    0x6f432072, 0x6c69706d, 0x00207265, 0x0200001f, 0x80000000, 0x900f0000, 
    0x0200001f, 0x8000000a, 0x900f0001, 0x03000009, 0x80010000, 0x90e40000, 
    0xa0e40000, 0x03000009, 0x80020000, 0x90e40000, 0xa0e40001, 0x03000009, 
    0x80040000, 0x90e40000, 0xa0e40002, 0x03000009, 0x80080000, 0x90e40000, 
    0xa0e40003, 0x03000009, 0xc0010000, 0x80e40000, 0xa0e40004, 0x03000009, 
    0xc0020000, 0x80e40000, 0xa0e40005, 0x03000009, 0xc0040000, 0x80e40000, 
    0xa0e40006, 0x03000009, 0xc0080000, 0x80e40000, 0xa0e40007, 0x02000001, 
    0xd00f0000, 0x90e40001, 0x0000ffff
 };
--- a/examples/half-float-test/main.c
+++ b/examples/half-float-test/main.c
@ -0,0 +1,227 @@
 /*
 * Test support for half-precision (16-bit) floating point.
 *
 * This test draws four cubes, to test fixed-function and programmable
 * pipelines, and to test 16-bit and 32-bit float vertices.
 *
 * Copyright (C) 2008-2009 VMware, Inc. Licensed under the MIT
 * License, please see the README.txt. All rights reserved.
 */
 #include "svga3dutil.h"
 #include "svga3dtext.h"
 #include "matrix.h"
 #include "math.h"
 typedef uint32 DWORD;
 #include "cube_vs.h"
 #include "cube_ps.h"
 /* 16-bit floating point constants */
 #define HALF_0              0x0000
 #define HALF_POS_1          0x3c00
 #define HALF_NEG_1          0xbc00
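The three constants above are IEEE 754 binary16 bit patterns: 1 sign bit, 5 exponent bits with a bias of 15, and 10 mantissa bits. As a minimal sketch of where they come from, the hypothetical helper below (not part of the example) converts a 32-bit float to half-precision bits by truncation. It assumes `float` is IEEE 754 binary32 and only handles zero and values in the normal half range; rounding, denormals, infinities and NaNs are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Convert a float to binary16 bits by re-biasing the exponent
 * (127 -> 15) and dropping the low 13 mantissa bits.
 */
uint16_t
floatToHalfBits(float f)
{
   uint32_t bits, sign, mantissa;
   int32_t exponent;

   memcpy(&bits, &f, sizeof bits);      /* reinterpret without aliasing issues */
   sign = (bits >> 16) & 0x8000u;

   if ((bits & 0x7fffffffu) == 0) {
      return (uint16_t)sign;            /* +0.0 or -0.0 */
   }

   exponent = (int32_t)((bits >> 23) & 0xff) - 127 + 15;
   mantissa = (bits >> 13) & 0x3ffu;

   /* Outside the normal half range is not handled by this sketch. */
   assert(exponent >= 1 && exponent <= 30);

   return (uint16_t)(sign | ((uint32_t)exponent << 10) | mantissa);
}
```

With this helper, 1.0f maps to 0x3c00 (HALF_POS_1), -1.0f to 0xbc00 (HALF_NEG_1), and 0.0f to 0x0000 (HALF_0).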
 #define MY_VSHADER_ID       0
 #define MY_PSHADER_ID       0
 #define CONST_MAT_VIEW      0
 #define CONST_MAT_PROJ      4
 typedef struct {
   float  position32[3];
   uint16 position16[4];
   uint32 color;
 } MyVertex;
 static const MyVertex vertexData[] = {
   { {-1, -1, -1}, {HALF_NEG_1, HALF_NEG_1, HALF_NEG_1, HALF_POS_1}, 0xFFFFFF },
   { {-1, -1,  1}, {HALF_NEG_1, HALF_NEG_1, HALF_POS_1, HALF_POS_1}, 0xFFFF00 },
   { {-1,  1, -1}, {HALF_NEG_1, HALF_POS_1, HALF_NEG_1, HALF_POS_1}, 0xFF00FF },
   { {-1,  1,  1}, {HALF_NEG_1, HALF_POS_1, HALF_POS_1, HALF_POS_1}, 0xFF0000 },
   { { 1, -1, -1}, {HALF_POS_1, HALF_NEG_1, HALF_NEG_1, HALF_POS_1}, 0x00FFFF },
   { { 1, -1,  1}, {HALF_POS_1, HALF_NEG_1, HALF_POS_1, HALF_POS_1}, 0x00FF00 },
   { { 1,  1, -1}, {HALF_POS_1, HALF_POS_1, HALF_NEG_1, HALF_POS_1}, 0x0000FF },
   { { 1,  1,  1}, {HALF_POS_1, HALF_POS_1, HALF_POS_1, HALF_POS_1}, 0x000000 },
 };
 #define QUAD(a,b,c,d) a, b, d, d, c, a
 static const uint16 indexData[] = {
   QUAD(0,1,2,3), // -X
   QUAD(4,5,6,7), // +X
   QUAD(0,1,4,5), // -Y
   QUAD(2,3,6,7), // +Y
   QUAD(0,2,4,6), // -Z
   QUAD(1,3,5,7), // +Z
 };
 #undef QUAD
 const uint32 numTriangles = sizeof indexData / sizeof indexData[0] / 3;
 uint32 vertexSid, indexSid;
 Matrix perspectiveMat;
 FPSCounterState gFPS;
 /*
 * renderCube --
 *
 *   Render one cube at the supplied X/Y coordinate, using either
 *   shaders or fixed-function, and using either 16-bit or 32-bit
 *   vertex data.
 */
 void
 renderCube(float x,
           float y,
           Bool useShaders,
           Bool useHalf)
 {
   SVGA3dTextureState *ts;
   SVGA3dRenderState *rs;
   SVGA3dVertexDecl *decls;
   SVGA3dPrimitiveRange *ranges;
   static Matrix view;
   Matrix_Copy(view, gIdentityMatrix);
   Matrix_RotateX(view, 30.0 * M_PI / 180.0);
   Matrix_RotateY(view, gFPS.frame * 0.01f);
   Matrix_Translate(view, x, y, 15);
   if (useShaders) {
      SVGA3D_SetShader(CID, SVGA3D_SHADERTYPE_VS, MY_VSHADER_ID);
      SVGA3D_SetShader(CID, SVGA3D_SHADERTYPE_PS, MY_PSHADER_ID);
      SVGA3DUtil_SetShaderConstMatrix(CID, CONST_MAT_PROJ,
                                      SVGA3D_SHADERTYPE_VS, perspectiveMat);
      SVGA3DUtil_SetShaderConstMatrix(CID, CONST_MAT_VIEW,
                                      SVGA3D_SHADERTYPE_VS, view);
   } else {
      SVGA3D_SetShader(CID, SVGA3D_SHADERTYPE_VS, SVGA3D_INVALID_ID);
      SVGA3D_SetShader(CID, SVGA3D_SHADERTYPE_PS, SVGA3D_INVALID_ID);
      SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_VIEW, view);
      SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_WORLD, gIdentityMatrix);
      SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_PROJECTION, perspectiveMat);
   }
   SVGA3D_BeginSetRenderState(CID, &rs, 4);
   {
      rs[0].state     = SVGA3D_RS_BLENDENABLE;
      rs[0].uintValue = FALSE;
      rs[1].state     = SVGA3D_RS_ZENABLE;
      rs[1].uintValue = TRUE;
      rs[2].state     = SVGA3D_RS_ZWRITEENABLE;
      rs[2].uintValue = TRUE;
      rs[3].state     = SVGA3D_RS_ZFUNC;
      rs[3].uintValue = SVGA3D_CMP_LESS;
   }
   SVGA_FIFOCommitAll();
   SVGA3D_BeginSetTextureState(CID, &ts, 4);
   {
      ts[0].stage = 0;
      ts[0].name  = SVGA3D_TS_BIND_TEXTURE;
      ts[0].value = SVGA3D_INVALID_ID;
      ts[1].stage = 0;
      ts[1].name  = SVGA3D_TS_COLOROP;
      ts[1].value = SVGA3D_TC_SELECTARG1;
      ts[2].stage = 0;
      ts[2].name  = SVGA3D_TS_COLORARG1;
      ts[2].value = SVGA3D_TA_DIFFUSE;
      ts[3].stage = 0;
      ts[3].name  = SVGA3D_TS_ALPHAARG1;
      ts[3].value = SVGA3D_TA_DIFFUSE;
   }
   SVGA_FIFOCommitAll();
   SVGA3D_BeginDrawPrimitives(CID, &decls, 2, &ranges, 1);
   {
      decls[0].identity.usage = SVGA3D_DECLUSAGE_POSITION;
      decls[0].array.surfaceId = vertexSid;
      decls[0].array.stride = sizeof(MyVertex);
      if (useHalf) {
         decls[0].identity.type = SVGA3D_DECLTYPE_FLOAT16_4;
         decls[0].array.offset = offsetof(MyVertex, position16);
      } else {
         decls[0].identity.type = SVGA3D_DECLTYPE_FLOAT3;
         decls[0].array.offset = offsetof(MyVertex, position32);
      }
      decls[1].identity.type = SVGA3D_DECLTYPE_D3DCOLOR;
      decls[1].identity.usage = SVGA3D_DECLUSAGE_COLOR;
      decls[1].array.surfaceId = vertexSid;
      decls[1].array.stride = sizeof(MyVertex);
      decls[1].array.offset = offsetof(MyVertex, color);
      ranges[0].primType = SVGA3D_PRIMITIVE_TRIANGLELIST;
      ranges[0].primitiveCount = numTriangles;
      ranges[0].indexArray.surfaceId = indexSid;
      ranges[0].indexArray.stride = sizeof(uint16);
      ranges[0].indexWidth = sizeof(uint16);
   }
   SVGA_FIFOCommitAll();
   SVGA3D_SetShader(CID, SVGA3D_SHADERTYPE_VS, SVGA3D_INVALID_ID);
   SVGA3D_SetShader(CID, SVGA3D_SHADERTYPE_PS, SVGA3D_INVALID_ID);
 }
 /*
 * main --
 *
 *    Our example's entry point, invoked directly by the bootloader.
 */
 int
 main(void)
 {
   SVGA3DUtil_InitFullscreen(CID, 800, 600);
   SVGA3DText_Init();
   vertexSid = SVGA3DUtil_DefineStaticBuffer(vertexData, sizeof vertexData);
   indexSid = SVGA3DUtil_DefineStaticBuffer(indexData, sizeof indexData);
   SVGA3D_DefineShader(CID, MY_VSHADER_ID, SVGA3D_SHADERTYPE_VS,
                       g_vs20_MyVertexShader, sizeof g_vs20_MyVertexShader);
   SVGA3D_DefineShader(CID, MY_PSHADER_ID, SVGA3D_SHADERTYPE_PS,
                       g_ps20_MyPixelShader, sizeof g_ps20_MyPixelShader);
   Matrix_Perspective(perspectiveMat, 45.0f,
                      gSVGA.width / (float)gSVGA.height, 10.0f, 100.0f);
   while (1) {
      if (SVGA3DUtil_UpdateFPSCounter(&gFPS)) {
         Console_Clear();
         Console_Format("Half-precision floating point test.\n"
                        "You should see four identical cubes.\n"
                        "\n"
                        "Top row: Fixed function, Bottom row: Shaders.\n"
                        "Left column: 32-bit float, Right column: 16-bit float.\n"
                        "\n%s",
                        gFPS.text);
         SVGA3DText_Update();
         VMBackdoor_VGAScreenshot();
      }
      SVGA3DUtil_ClearFullscreen(CID, SVGA3D_CLEAR_COLOR | SVGA3D_CLEAR_DEPTH,
                                 0x113366, 1.0f, 0);
      renderCube(-2, 2, FALSE, FALSE);   /* Top-left */
      renderCube(2, 2, FALSE, TRUE);     /* Top-right */
      renderCube(-2, -2, TRUE, FALSE);   /* Bottom-left */
      renderCube(2, -2, TRUE, TRUE);     /* Bottom-right */
      SVGA3DText_Draw();
      SVGA3DUtil_PresentFullscreen();
   }
   return 0;
 }
--- a/examples/pong/Makefile
+++ b/examples/pong/Makefile
@ -0,0 +1,8 @@
 TARGET = pong.img
 APP_SOURCES = main.c
 DEFS = -DREALLY_TINY
 LIB_DIR = ../../lib
 include $(LIB_DIR)/Makefile.rules
--- a/examples/pong/main.c
+++ b/examples/pong/main.c
@ -0,0 +1,710 @@
 /*
 * PongOS v2.0
 *
 * Micah Dowty <micah@vmware.com>
 *
 * Copyright (C) 2008-2009 VMware, Inc. Licensed under the MIT
 * License, please see the README.txt. All rights reserved.
 */
 #include "svga.h"
 #include "intr.h"
 #include "io.h"
 #include "timer.h"
 #include "keyboard.h"
 #include "vmbackdoor.h"
 #define PONG_DOT_SIZE           8
 #define PONG_DIGIT_PIXEL_SIZE   10
 #define PONG_BG_COLOR           0x000000
 #define PONG_SPRITE_COLOR       0xFFFFFF
 #define PONG_PLAYFIELD_COLOR    0xAAAAAA
 #define PONG_FRAME_RATE         60
 #define MAX_DIRTY_RECTS         128
 #define MAX_SPRITES             8
 typedef struct {
   float x, y;
 } Vector2;
 typedef struct {
   int x, y, w, h;
 } Rect;
 typedef struct {
   Rect r;
   uint32 color;
 } FillRect;
 static struct {
   uint32 *buffer;
   Rect dirtyRects[MAX_DIRTY_RECTS];
   uint32 numDirtyRects;
 } back;
 static struct {
   FillRect paddles[2];
   FillRect ball;
   uint8 scores[2];
   float ballSpeed;
   float paddleVelocities[2];
   float paddlePos[2];
   Vector2 ballVelocity;
   Vector2 ballPos;
   Bool playfieldDirty;
 } pong;
 /*
 *-----------------------------------------------------------------------------
 *
 * Random32 --
 *
 *    "Random" number generator. To save code space, we actually just use
 *    the low bits of the TSC. This of course isn't actually random, but
 *    it's good enough for Pong.
 *
 *-----------------------------------------------------------------------------
 */
 static uint32
 Random32(void)
 {
   uint64 t;
   __asm__ __volatile__("rdtsc" : "=A" (t));
   return (uint32)t;
 }
 /*
 *-----------------------------------------------------------------------------
 *
 * RectTestIntersection --
 *
 *    Returns TRUE iff two Rects intersect with each other.
 *
 *-----------------------------------------------------------------------------
 */
 static Bool
 RectTestIntersection(Rect *a,  // IN
                     Rect *b)  // IN
 {
   return !(a->x + a->w < b->x ||
            a->x > b->x + b->w ||
            a->y + a->h < b->y ||
            a->y > b->y + b->h);
 }
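 /*
 * A minimal sketch of the edge behavior of the test above. The
 * comparisons in RectTestIntersection are inclusive, so two
 * rectangles that only share an edge count as intersecting. The
 * SketchRect type and both function names are hypothetical, and the
 * half-open variant is shown only for comparison.
 */

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the example's Rect type. */
typedef struct {
   int x, y, w, h;
} SketchRect;

/* Inclusive test, using the same comparisons as RectTestIntersection:
 * rectangles that merely touch along an edge are reported as hits. */
static bool
SketchRectTouchesOrOverlaps(const SketchRect *a, const SketchRect *b)
{
   return !(a->x + a->w < b->x ||
            a->x > b->x + b->w ||
            a->y + a->h < b->y ||
            a->y > b->y + b->h);
}

/* Half-open variant, treating each rectangle as [x, x+w) by [y, y+h).
 * Only rectangles that share at least one pixel are reported as hits. */
static bool
SketchRectOverlaps(const SketchRect *a, const SketchRect *b)
{
   assert(a->w >= 0 && a->h >= 0 && b->w >= 0 && b->h >= 0);
   return a->x < b->x + b->w && b->x < a->x + a->w &&
          a->y < b->y + b->h && b->y < a->y + a->h;
}
```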
 /*
 *-----------------------------------------------------------------------------
 *
 * BackFill --
 *
 *    Perform a color fill on the backbuffer.
 *
 *-----------------------------------------------------------------------------
 */
 static void
 BackFill(FillRect fr)   // IN
 {
   int i, j;
   for (i = 0; i < fr.r.h; i++) {
      uint32 *line = &back.buffer[(fr.r.y + i) * gSVGA.width + fr.r.x];
      for (j = 0; j < fr.r.w; j++) {
         line[j] = fr.color;
      }
   }
 }
 /*
 *-----------------------------------------------------------------------------
 *
 * BackMarkDirty --
 *
 *    Mark a region of the backbuffer as dirty. We'll copy it to the
 *    front buffer and ask the host to update it on the next
 *    BackUpdate().
 *
 *-----------------------------------------------------------------------------
 */
 static void
 BackMarkDirty(Rect rect)  // IN
 {
   back.dirtyRects[back.numDirtyRects++] = rect;
 }
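 /*
 * BackMarkDirty appends without checking MAX_DIRTY_RECTS. As far as
 * this file shows, PongDrawScreen queues at most seven rectangles per
 * frame, so the limit is not reached here. For code that queues more,
 * a bounded variant might look like the sketch below. The type and
 * function names, the small limit, and the fall-back to a single
 * full-screen rectangle are all assumptions made for illustration.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_DIRTY_RECTS 4   /* Hypothetical; the example uses 128 */

typedef struct {
   int x, y, w, h;
} SketchRect;

typedef struct {
   SketchRect rects[SKETCH_MAX_DIRTY_RECTS];
   uint32_t count;
   int screenW, screenH;
} SketchDirtyList;

/* Append a rectangle. Once the list is full, collapse it to one
 * full-screen rectangle so that no update is lost and no write lands
 * past the end of the array. */
static void
SketchMarkDirty(SketchDirtyList *list, SketchRect rect)
{
   assert(list != NULL);
   if (list->count < SKETCH_MAX_DIRTY_RECTS) {
      list->rects[list->count++] = rect;
   } else {
      SketchRect full = {0, 0, list->screenW, list->screenH};
      list->rects[0] = full;
      list->count = 1;
   }
}
```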
 /*
 *-----------------------------------------------------------------------------
 *
 * BackUpdate --
 *
 *    Copy all dirty regions of the backbuffer to the frontbuffer, and
 *    send updates to the SVGA device. Clears the dirtyRects list.
 *
 *    For flow control, this also waits for the host to process the
 *    batch of updates we just queued into the FIFO.
 *
 *-----------------------------------------------------------------------------
 */
 static void
 BackUpdate(void)
 {
   int rectNum;
   for (rectNum = 0; rectNum < back.numDirtyRects; rectNum++) {
      Rect rect = back.dirtyRects[rectNum];
      uint32 i, j;
      for (i = 0; i < rect.h; i++) {
         uint32 offset = (rect.y + i) * gSVGA.width + rect.x;
         uint32 *src = &back.buffer[offset];
         uint32 *dest = &((uint32*) gSVGA.fbMem)[offset];
         for (j = 0; j < rect.w; j++) {
            dest[j] = src[j];
         }
      }
      SVGA_Update(rect.x, rect.y, rect.w, rect.h);
   }
   back.numDirtyRects = 0;
   SVGA_SyncToFence(SVGA_InsertFence());
 }
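 /*
 * The per-rectangle copy in BackUpdate can be tried out on plain
 * arrays. This is a sketch only: SketchCopyRect and its parameters
 * are hypothetical, and it assumes both buffers share one pitch
 * (pixels per scanline), as the back and front buffers do above.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Copy a w-by-h rectangle whose top-left corner is (x, y) from src to
 * dest. Pixels outside the rectangle are left untouched. */
static void
SketchCopyRect(uint32_t *dest, const uint32_t *src, uint32_t pitch,
               uint32_t x, uint32_t y, uint32_t w, uint32_t h)
{
   uint32_t i, j;

   assert(x + w <= pitch);
   for (i = 0; i < h; i++) {
      uint32_t offset = (y + i) * pitch + x;
      for (j = 0; j < w; j++) {
         dest[offset + j] = src[offset + j];
      }
   }
}
```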
 /*
 *-----------------------------------------------------------------------------
 *
 * PongDrawString --
 *
 *    Draw a string of digits, using our silly blocky font. The
 *    string's origin is the top-middle.
 *
 *-----------------------------------------------------------------------------
 */
 static void
 PongDrawString(uint32 x,         // IN
               uint32 y,         // IN
               const char *str,  // IN
               uint32 strLen)    // IN
 {
   const int charW = 4;
   const int charH = 5;
   static const uint8 font[] = {
      0xF1,  // **** ...*
      0x91,  // *..* ...*
      0x91,  // *..* ...*
      0x91,  // *..* ...*
      0xF1,  // **** ...*
      0xFF,  // **** ****
      0x11,  // ...* ...*
      0xFF,  // **** ****
      0x81,  // *... ...*
      0xFF,  // **** ****
      0x9F,  // *..* ****
      0x98,  // *..* *...
      0xFF,  // **** ****
      0x11,  // ...* ...*
      0x1F,  // ...* ****
      0xFF,  // **** ****
      0x81,  // *... ...*
      0xF1,  // **** ...*
      0x91,  // *..* ...*
      0xF1,  // **** ...*
      0xFF,  // **** ****
      0x99,  // *..* *..*
      0xFF,  // **** ****
      0x91,  // *..* ...*
      0xF1,  // **** ...*
   };
   x -= (PONG_DIGIT_PIXEL_SIZE * (strLen * (charW + 1) - 1)) / 2;
   /* Walk exactly strLen characters; the caller's buffer may lack a NUL. */
   while (strLen-- > 0) {
      int digit = *str - '0';
      if (digit >= 0 && digit <= 9) {
         int i, j;
         for (j = 0; j < charH; j++) {
            for (i = 0; i < charW; i++) {
               if ((font[digit / 2 * 5 + j] << i) & (digit & 1 ? 0x08 : 0x80)) {
                  FillRect pixel = {
                     {x + i * PONG_DIGIT_PIXEL_SIZE,
                      y + j * PONG_DIGIT_PIXEL_SIZE,
                      PONG_DIGIT_PIXEL_SIZE,
                      PONG_DIGIT_PIXEL_SIZE},
                     PONG_PLAYFIELD_COLOR,
                  };
                  BackFill(pixel);
               }
            }
         }
      }
      x += PONG_DIGIT_PIXEL_SIZE * (charW + 1);
      str++;
   }
 }
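 /*
 * The font above packs two digits into each byte: the even digit of
 * a pair sits in the high nibble and the odd digit in the low nibble,
 * five rows per pair. The sketch below decodes single pixels for the
 * 0/1 pair only. The table rows are copied from the font; the names
 * sketchFont01 and SketchGlyphPixel are hypothetical.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rows for the digit pair 0/1, copied from the font table. */
static const uint8_t sketchFont01[5] = { 0xF1, 0x91, 0x91, 0x91, 0xF1 };

/* Return true if column 'col' (0 is leftmost) of row 'row' is lit for
 * 'digit', which must be 0 or 1 in this reduced table. Shifting the
 * row left by 'col' moves the wanted bit under a fixed mask: 0x80 for
 * the high nibble, 0x08 for the low nibble. */
static bool
SketchGlyphPixel(int digit, int row, int col)
{
   uint8_t mask = (digit & 1) ? 0x08 : 0x80;

   assert(digit >= 0 && digit <= 1);
   assert(row >= 0 && row < 5 && col >= 0 && col < 4);
   return ((sketchFont01[row] << col) & mask) != 0;
}
```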
 /*
 *-----------------------------------------------------------------------------
 *
 * DecDigit --
 *
 *    Utility for extracting a decimal digit.
 *
 *-----------------------------------------------------------------------------
 */
 static char
 DecDigit(int i, int div, Bool blank)
 {
   if (blank && i < div) {
      return ' ';
   }
   return (i / div) % 10 + '0';
 }
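 /*
 * A short worked example of DecDigit-style formatting: three calls
 * right-justify a value from 0 to 999 in a three-character field.
 * SketchDecDigit repeats the logic above so the block stands alone;
 * SketchFormat3 is a hypothetical helper, not part of the example.
 */

```c
#include <assert.h>

/* One decimal digit of i, or a blank for a leading zero when 'blank'
 * is nonzero. */
static char
SketchDecDigit(int i, int div, int blank)
{
   if (blank && i < div) {
      return ' ';
   }
   return (char)((i / div) % 10 + '0');
}

/* Right-justify 'value' into three characters followed by a NUL. */
static void
SketchFormat3(int value, char out[4])
{
   assert(value >= 0 && value <= 999);
   out[0] = SketchDecDigit(value, 100, 1);
   out[1] = SketchDecDigit(value, 10, 1);
   out[2] = SketchDecDigit(value, 1, 0);
   out[3] = '\0';
}
```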
 /*
 *-----------------------------------------------------------------------------
 *
 * PongDrawPlayfield --
 *
 *    Redraw the playfield for Pong.
 *
 *-----------------------------------------------------------------------------
 */
 static void
 PongDrawPlayfield()
 {
   int i;
   /*
    * Clear the screen
    */
   FillRect background = {
      {0, 0, gSVGA.width, gSVGA.height},
      PONG_BG_COLOR,
   };
   BackFill(background);
   /*
    * Draw the dotted dividing line
    */
   for (i = PONG_DOT_SIZE;
        i <= gSVGA.height - PONG_DOT_SIZE * 2;
        i += PONG_DOT_SIZE * 2) {
      FillRect dot = {
         {(gSVGA.width - PONG_DOT_SIZE) / 2, i,
          PONG_DOT_SIZE, PONG_DOT_SIZE},
         PONG_PLAYFIELD_COLOR,
      };
      BackFill(dot);
   }
   /*
    * Draw the score counters.
    *
    * sprintf() is big, so we'll format this the old-fashioned way.
    * Right-justify the left score, and left-justify the right score.
    */
   {
      char scoreStr[7] = "       ";
      char *p = scoreStr;
      *(p++) = DecDigit(pong.scores[0], 100, TRUE);
      *(p++) = DecDigit(pong.scores[0], 10, TRUE);
      *(p++) = DecDigit(pong.scores[0], 1, FALSE);
      p++;
      if (pong.scores[1] >= 100) {
         *(p++) = DecDigit(pong.scores[1], 100, TRUE);
      }
      if (pong.scores[1] >= 10) {
         *(p++) = DecDigit(pong.scores[1], 10, TRUE);
      }
      *(p++) = DecDigit(pong.scores[1], 1, FALSE);
      PongDrawString(gSVGA.width/2, PONG_DIGIT_PIXEL_SIZE,
                     scoreStr, sizeof scoreStr);
   }
 }
 /*
 *-----------------------------------------------------------------------------
 *
 * PongDrawScreen --
 *
 *    Top-level redraw function for Pong. This does a lot of unnecessary
 *    drawing to the backbuffer, but we're careful to only send update
 *    rectangles for a few things:
 *
 *      - When the playfield changes, we update the entire screen.
 *      - Each sprite (the paddles and ball) gets two rectangles:
 *          - One for its new position
 *          - One for its old position
 *
 *    None of these rectangles are ever merged.
 *
 *-----------------------------------------------------------------------------
 */
 static void
 PongDrawScreen()
 {
   PongDrawPlayfield();
   if (pong.playfieldDirty) {
      Rect r = {0, 0, gSVGA.width, gSVGA.height};
      BackMarkDirty(r);
      pong.playfieldDirty = FALSE;
   }
   /* Draw all sprites at the current positions */
   BackFill(pong.paddles[0]);
   BackMarkDirty(pong.paddles[0].r);
   BackFill(pong.paddles[1]);
   BackMarkDirty(pong.paddles[1].r);
   BackFill(pong.ball);
   BackMarkDirty(pong.ball.r);
   /* Commit this to the front buffer and the host's screen */
   BackUpdate();
   /* Make sure we erase all sprites at the current positions on the next frame */
   BackMarkDirty(pong.paddles[0].r);
   BackMarkDirty(pong.paddles[1].r);
   BackMarkDirty(pong.ball.r);
 }
 /*
 *-----------------------------------------------------------------------------
 *
 * PongLaunchBall --
 *
 *    Reset the ball position, and give it a random angle.
 *
 *-----------------------------------------------------------------------------
 */
 static void
 PongLaunchBall()
 {
   /* sin() from 0 to PI/2 */
   static const float sineTable[64] = {
      0.000000, 0.024931, 0.049846, 0.074730, 0.099568, 0.124344, 0.149042, 0.173648,
      0.198146, 0.222521, 0.246757, 0.270840, 0.294755, 0.318487, 0.342020, 0.365341,
      0.388435, 0.411287, 0.433884, 0.456211, 0.478254, 0.500000, 0.521435, 0.542546,
      0.563320, 0.583744, 0.603804, 0.623490, 0.642788, 0.661686, 0.680173, 0.698237,
      0.715867, 0.733052, 0.749781, 0.766044, 0.781831, 0.797133, 0.811938, 0.826239,
      0.840026, 0.853291, 0.866025, 0.878222, 0.889872, 0.900969, 0.911506, 0.921476,
      0.930874, 0.939693, 0.947927, 0.955573, 0.962624, 0.969077, 0.974928, 0.980172,
      0.984808, 0.988831, 0.992239, 0.995031, 0.997204, 0.998757, 0.999689, 1.000000,
   };
   int t;
   float sinT, cosT;
   pong.ballPos.x = gSVGA.width / 2;
   pong.ballPos.y = gSVGA.height / 2;
   /* Limit the random angle to avoid those within 45 degrees of vertical */
   t = 32 + (Random32() & 31);
   sinT = sineTable[t];
   cosT = -sineTable[(t + 32) & 63];
   sinT *= pong.ballSpeed;
   cosT *= pong.ballSpeed;
   switch (Random32() & 3) {
   case 0:
      pong.ballVelocity.x = sinT;
      pong.ballVelocity.y = cosT;
      break;
   case 1:
      pong.ballVelocity.x = -sinT;
      pong.ballVelocity.y = cosT;
      break;
   case 2:
      pong.ballVelocity.x = -sinT;
      pong.ballVelocity.y = -cosT;
      break;
   case 3:
      pong.ballVelocity.x = sinT;
      pong.ballVelocity.y = -cosT;
      break;
   }
 }
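 /*
 * A note on quarter-wave tables. A table that samples sin() at evenly
 * spaced angles from 0 to PI/2 inclusive also contains cos(): the
 * cosine of entry t is the sine at the mirrored index, last - t.
 * PongLaunchBall indexes its table differently, with (t + 32) & 63,
 * so this sketch illustrates the identity only and is not a
 * description of the code above. The five-entry table and both
 * function names are hypothetical.
 */

```c
#include <assert.h>

#define SKETCH_TABLE_LEN 5

/* sin() sampled at 0, 22.5, 45, 67.5 and 90 degrees. */
static const float sketchSine[SKETCH_TABLE_LEN] = {
   0.000000f, 0.382683f, 0.707107f, 0.923880f, 1.000000f,
};

static float
SketchSin(int t)
{
   assert(t >= 0 && t < SKETCH_TABLE_LEN);
   return sketchSine[t];
}

/* cos(x) = sin(PI/2 - x), which is the mirrored table entry. */
static float
SketchCos(int t)
{
   assert(t >= 0 && t < SKETCH_TABLE_LEN);
   return sketchSine[SKETCH_TABLE_LEN - 1 - t];
}
```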
 /*
 *-----------------------------------------------------------------------------
 *
 * PongInit --
 *
 *    Initialize all game variables, including sprite location/size/color.
 *    Requires that SVGA has already been initialized.
 *
 *-----------------------------------------------------------------------------
 */
 static void
 PongInit()
 {
   pong.scores[0] = 0;
   pong.scores[1] = 0;
   pong.playfieldDirty = TRUE;
   pong.paddlePos[0] = pong.paddlePos[1] = gSVGA.height / 2;
   pong.paddles[0].r.x = 10;
   pong.paddles[0].r.w = 16;
   pong.paddles[0].r.h = 64;
   pong.paddles[0].color = PONG_SPRITE_COLOR;
   pong.paddles[1].r.x = gSVGA.width - 16 - 10;
   pong.paddles[1].r.w = 16;
   pong.paddles[1].r.h = 64;
   pong.paddles[1].color = PONG_SPRITE_COLOR;
   pong.ball.r.w = 16;
   pong.ball.r.h = 16;
   pong.ball.color = PONG_SPRITE_COLOR;
   pong.ballSpeed = 400;
   PongLaunchBall();
 }
 /*
 *-----------------------------------------------------------------------------
 *
 * PongUpdateMotion --
 *
 *    Perform motion updates for the ball and paddles. This includes
 *    bounce/goal detection.
 *
 *-----------------------------------------------------------------------------
 */
 static void
 PongUpdateMotion(float dt)  // IN
 {
   int playableWidth = gSVGA.width - pong.ball.r.w;
   int playableHeight = gSVGA.height - pong.ball.r.h;
   int i;
   pong.ballPos.x += pong.ballVelocity.x * dt;
   pong.ballPos.y += pong.ballVelocity.y * dt;
   for (i = 0; i < 2; i++) {
      int pos = pong.paddlePos[i] + pong.paddleVelocities[i] * dt;
      pong.paddlePos[i] = MIN(gSVGA.height - pong.paddles[i].r.h, MAX(0, pos));
      pong.paddles[i].r.y = (int)pong.paddlePos[i];
   }
   if (pong.ballPos.x >= playableWidth) {
      /* Goal off the right edge */
      pong.scores[0]++;
      pong.playfieldDirty = TRUE;
      PongLaunchBall();
   }
   if (pong.ballPos.x <= 0) {
      /* Goal off the left edge */
      pong.scores[1]++;
      pong.playfieldDirty = TRUE;
      PongLaunchBall();
   }
   if (pong.ballPos.y >= playableHeight) {
      /* Bounce off the bottom edge */
      pong.ballVelocity.y = -pong.ballVelocity.y;
      pong.ballPos.y = playableHeight - (pong.ballPos.y - playableHeight);
   }
   if (pong.ballPos.y <= 0) {
      /* Bounce off the top edge */
      pong.ballVelocity.y = -pong.ballVelocity.y;
      pong.ballPos.y = -pong.ballPos.y;
   }
   pong.ballPos.y = MIN(playableHeight, pong.ballPos.y);
   pong.ballPos.y = MAX(0, pong.ballPos.y);
   pong.ball.r.x = (int)pong.ballPos.x;
   pong.ball.r.y = (int)pong.ballPos.y;
   /*
    * Lame collision detection between ball and paddles. Really we
    * should be testing the ball's entire path over this time step,
    * not just the ball's new position. Using the current
    * implementation, it's possible for the ball to move through a
    * paddle if it's going fast enough or our frame rate is slow
    * enough.
    */
   for (i = 0; i < 2; i++) {
      /*
       * Only bounce off the paddle when we're moving toward it, to
       * prevent the ball from getting stuck inside the paddle
       */
      if ((pong.paddles[i].r.x > gSVGA.width / 2) == (pong.ballVelocity.x > 0) &&
          RectTestIntersection(&pong.ball.r, &pong.paddles[i].r)) {
         /*
          * Boing! The ball bounces back, plus it gets a little spin
          * if the paddle itself was moving at the time.
          */
         pong.ballVelocity.x = -pong.ballVelocity.x;
         pong.ballVelocity.y += pong.paddleVelocities[i];
         pong.ballVelocity.y = MIN(pong.ballVelocity.y, pong.ballSpeed * 2);
         pong.ballVelocity.y = MAX(pong.ballVelocity.y, -pong.ballSpeed * 2);
      }
   }
 }
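 /*
 * The top and bottom bounces in PongUpdateMotion mirror the overshoot
 * back into the playfield and flip the velocity. A one-axis sketch of
 * that step follows, with the final clamp included. SketchBounceAxis
 * is a hypothetical name, and 'max' stands in for playableHeight.
 */

```c
#include <assert.h>

/* Reflect a position off the walls at 0 and 'max', flipping the
 * velocity on each bounce. */
static void
SketchBounceAxis(float *pos, float *vel, float max)
{
   assert(max > 0.0f);
   if (*pos >= max) {
      *vel = -*vel;
      *pos = max - (*pos - max);
   }
   if (*pos <= 0.0f) {
      *vel = -*vel;
      *pos = -*pos;
   }
   /* Clamp in case a single step overshot by more than the field size. */
   if (*pos > max) {
      *pos = max;
   }
   if (*pos < 0.0f) {
      *pos = 0.0f;
   }
}
```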
 /*
 *-----------------------------------------------------------------------------
 *
 * PongKeyboardPlayer --
 *
 *    A human player, using the up and down arrows on a keyboard.
 *
 *-----------------------------------------------------------------------------
 */
 static void
 PongKeyboardPlayer(int playerNum,   // IN
                   float maxSpeed,  // IN
                   float accel)     // IN
 {
   float v = pong.paddleVelocities[playerNum];
   Bool up = Keyboard_IsKeyPressed(KEY_UP);
   Bool down = Keyboard_IsKeyPressed(KEY_DOWN);
   if (up && !down) {
      v -= accel;
   } else if (down && !up) {
      v += accel;
   } else {
      v = 0;
   }
   v = MIN(maxSpeed, MAX(-maxSpeed, v));
   pong.paddleVelocities[playerNum] = v;
 }
 /*
 *-----------------------------------------------------------------------------
 *
 * PongAbsMousePlayer --
 *
 *    A human player, controlled with the Y axis of the absolute mouse.
 *
 *-----------------------------------------------------------------------------
 */
 static void
 PongAbsMousePlayer(int playerNum)   // IN
 {
   int currentY = pong.paddles[playerNum].r.y;
   int newY = currentY;
   VMMousePacket p;
   Bool mouseMoved = FALSE;
   while (VMBackdoor_MouseGetPacket(&p)) {
      newY = (p.y * gSVGA.height / 0xFFFF) - pong.paddles[playerNum].r.h / 2;
      newY = MAX(0, newY);
      newY = MIN(gSVGA.height - pong.paddles[playerNum].r.h, newY);
      mouseMoved = TRUE;
   }
   if (newY != currentY && mouseMoved) {
      pong.paddleVelocities[playerNum] = (newY - currentY) * (float)PONG_FRAME_RATE;
   }
 }
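 /*
 * The coordinate mapping in PongAbsMousePlayer, pulled out as a
 * sketch: scale the pointer's Y to screen pixels, center the paddle
 * on it, then clamp to the screen. The 0..0xFFFF input range is an
 * assumption read off the division by 0xFFFF above, and
 * SketchMouseToPaddleY is a hypothetical name.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Map an absolute pointer Y (assumed 0..0xFFFF) to the Y coordinate
 * of the paddle's top edge. */
static int
SketchMouseToPaddleY(uint32_t mouseY, int screenH, int paddleH)
{
   int y;

   assert(mouseY <= 0xFFFF);
   assert(screenH > 0 && paddleH >= 0 && paddleH <= screenH);
   y = (int)(mouseY * (uint32_t)screenH / 0xFFFF) - paddleH / 2;
   if (y < 0) {
      y = 0;
   }
   if (y > screenH - paddleH) {
      y = screenH - paddleH;
   }
   return y;
}
```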
 /*
 *-----------------------------------------------------------------------------
 *
 * PongComputerPlayer --
 *
 *    Simple computer player. Always moves its paddle toward the ball.
 *
 *-----------------------------------------------------------------------------
 */
 static void
 PongComputerPlayer(int playerNum,   // IN
                   float maxSpeed)  // IN
 {
   int paddleCenter = pong.paddles[playerNum].r.y + pong.paddles[playerNum].r.h / 2;
   int ballCenter = pong.ball.r.y + pong.ball.r.h / 2;
   int distance = ballCenter - paddleCenter;
   pong.paddleVelocities[playerNum] = distance / (float)gSVGA.height * maxSpeed;
 }
 /*
 *-----------------------------------------------------------------------------
 *
 * main --
 *
 *    Initialization and main loop.
 *
 *-----------------------------------------------------------------------------
 */
 void
 main(void)
 {
   Intr_Init();
   SVGA_Init();
   SVGA_SetMode(800, 600, 32);
   back.buffer = (uint32*) (gSVGA.fbMem + gSVGA.width * gSVGA.height * sizeof(uint32));
   Keyboard_Init();
   VMBackdoor_MouseInit(TRUE);
   PongInit();
   Timer_InitPIT(PIT_HZ / PONG_FRAME_RATE);
   Intr_SetMask(0, TRUE);
   while (1) {
      PongKeyboardPlayer(0, 1000, 50);
      PongAbsMousePlayer(0);
      PongComputerPlayer(1, 2000);
      PongUpdateMotion(1.0 / PONG_FRAME_RATE);
      PongDrawScreen();
      Intr_Halt();
   }
 }
--- a/examples/present-readback/Makefile
+++ b/examples/present-readback/Makefile
@ -0,0 +1,6 @@
 TARGET = presentReadback.img
 APP_SOURCES = main.c
 LIB_DIR = ../../lib
 include $(LIB_DIR)/Makefile.rules
--- a/examples/present-readback/main.c
+++ b/examples/present-readback/main.c
@ -0,0 +1,233 @@
 /*
 * SVGA3D example: Present Readback. This example tests the
 * presentReadback command, which synchronizes 3D and 2D rendering.
 * It draws a spinning cube using 3D. After every frame, parts of the
 * 2D framebuffer are updated with a present readback followed by a
 * 2D update. Parts of the 3D region are cleared before the present
 * readback command, to test that the last presented 3D data is what
 * gets copied to the 2D framebuffer. The cube should spin with no
 * flicker.
 *
 * Copyright (C) 2008-2009 VMware, Inc. Licensed under the MIT
 * License, please see the README.txt. All rights reserved.
 */
 #include "svga3dutil.h"
 #include "svga3dtext.h"
 #include "matrix.h"
 #include "math.h"
 typedef struct {
   float position[3];
   uint32 color;
 } MyVertex;
 static const MyVertex vertexData[] = {
   { {-1, -1, -1}, 0xFFFFFF },
   { {-1, -1,  1}, 0xFFFF00 },
   { {-1,  1, -1}, 0xFF00FF },
   { {-1,  1,  1}, 0xFF0000 },
   { { 1, -1, -1}, 0x00FFFF },
   { { 1, -1,  1}, 0x00FF00 },
   { { 1,  1, -1}, 0x0000FF },
   { { 1,  1,  1}, 0x000000 },
 };
 #define QUAD(a,b,c,d) a, b, d, d, c, a
 static const uint16 indexData[] = {
   QUAD(0,1,2,3), // -X
   QUAD(4,5,6,7), // +X
   QUAD(0,1,4,5), // -Y
   QUAD(2,3,6,7), // +Y
   QUAD(0,2,4,6), // -Z
   QUAD(1,3,5,7), // +Z
 };
 #undef QUAD
 const uint32 numTriangles = sizeof indexData / sizeof indexData[0] / 3;
 uint32 vertexSid, indexSid;
 Matrix perspectiveMat;
 FPSCounterState gFPS;
 VMMousePacket lastMouseState;
 /*
 * render --
 *
 *   Set up render state, and draw our cube scene from static index
 *   and vertex buffers.
 *
 *   This render state only needs to be set each frame because
 *   SVGA3DText_Draw() changes it.
 */
 void
 render(void)
 {
   SVGA3dTextureState *ts;
   SVGA3dRenderState *rs;
   SVGA3dVertexDecl *decls;
   SVGA3dPrimitiveRange *ranges;
   static Matrix view;
   Matrix_Copy(view, gIdentityMatrix);
   Matrix_Scale(view, 0.5, 0.5, 0.5, 1.0);
   if (lastMouseState.buttons & VMMOUSE_LEFT_BUTTON) {
      Matrix_RotateX(view, lastMouseState.y *  0.0001);
      Matrix_RotateY(view, lastMouseState.x * -0.0001);
   } else {
      Matrix_RotateX(view, 30.0 * M_PI / 180.0);
      Matrix_RotateY(view, gFPS.frame * 0.01f);
   }
   Matrix_Translate(view, 0, 0, 3);
   SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_VIEW, view);
   SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_WORLD, gIdentityMatrix);
   SVGA3D_SetTransform(CID, SVGA3D_TRANSFORM_PROJECTION, perspectiveMat);
   SVGA3D_BeginSetRenderState(CID, &rs, 4);
   {
      rs[0].state     = SVGA3D_RS_BLENDENABLE;
      rs[0].uintValue = FALSE;
      rs[1].state     = SVGA3D_RS_ZENABLE;
      rs[1].uintValue = TRUE;
      rs[2].state     = SVGA3D_RS_ZWRITEENABLE;
      rs[2].uintValue = TRUE;
      rs[3].state     = SVGA3D_RS_ZFUNC;
      rs[3].uintValue = SVGA3D_CMP_LESS;
   }
   SVGA_FIFOCommitAll();
   SVGA3D_BeginSetTextureState(CID, &ts, 4);
   {
      ts[0].stage = 0;
      ts[0].name  = SVGA3D_TS_BIND_TEXTURE;
      ts[0].value = SVGA3D_INVALID_ID;
      ts[1].stage = 0;
      ts[1].name  = SVGA3D_TS_COLOROP;
      ts[1].value = SVGA3D_TC_SELECTARG1;
      ts[2].stage = 0;
      ts[2].name  = SVGA3D_TS_COLORARG1;
      ts[2].value = SVGA3D_TA_DIFFUSE;
      ts[3].stage = 0;
      ts[3].name  = SVGA3D_TS_ALPHAARG1;
      ts[3].value = SVGA3D_TA_DIFFUSE;
   }
   SVGA_FIFOCommitAll();
   SVGA3D_BeginDrawPrimitives(CID, &decls, 2, &ranges, 1);
   {
      decls[0].identity.type = SVGA3D_DECLTYPE_FLOAT3;
      decls[0].identity.usage = SVGA3D_DECLUSAGE_POSITION;
      decls[0].array.surfaceId = vertexSid;
      decls[0].array.stride = sizeof(MyVertex);
      decls[0].array.offset = offsetof(MyVertex, position);
      decls[1].identity.type = SVGA3D_DECLTYPE_D3DCOLOR;
      decls[1].identity.usage = SVGA3D_DECLUSAGE_COLOR;
      decls[1].array.surfaceId = vertexSid;
      decls[1].array.stride = sizeof(MyVertex);
      decls[1].array.offset = offsetof(MyVertex, color);
      ranges[0].primType = SVGA3D_PRIMITIVE_TRIANGLELIST;
      ranges[0].primitiveCount = numTriangles;
      ranges[0].indexArray.surfaceId = indexSid;
      ranges[0].indexArray.stride = sizeof(uint16);
      ranges[0].indexWidth = sizeof(uint16);
   }
   SVGA_FIFOCommitAll();
 }
 /*
 * main --
 *
 *    Our example's entry point, invoked directly by the bootloader.
 */
 int
 main(void)
 {
   SVGAGuestPtr ptr;
   SVGA3DUtil_InitFullscreen(CID, 800, 600);
   SVGA3DUtil_AllocDMABuffer(gSVGA.width * gSVGA.height * 4, &ptr);
   SVGA3DText_Init();
   vertexSid = SVGA3DUtil_DefineStaticBuffer(vertexData, sizeof vertexData);
   indexSid = SVGA3DUtil_DefineStaticBuffer(indexData, sizeof indexData);
   Matrix_Perspective(perspectiveMat, 45.0f,
                      gSVGA.width / (float)gSVGA.height, 0.1f, 100.0f);
   while (1) {
      SVGA3dRect *rects;
      int halfWidth = gSVGA.width / 2;
      int halfHeight = gSVGA.height / 2;
      if (SVGA3DUtil_UpdateFPSCounter(&gFPS)) {
         Console_Clear();
         Console_Format("VMware SVGA3D Example:\n"
                        "Present Readback:\n"
                        "  - upper left quadrant:\n"
                        "       present\n"
                        "  - lower right quadrant:\n"
                        "       present -> presentReadback -> update\n"
                        "  - upper right and lower left quadrants:\n"
                        "       present -> clear -> presentReadback -> update\n"
                        "\n"
                        "The cube should appear to be smoothly spinning\n"
                        "with all quadrants of the screen in sync.\n\n%s",
                        gFPS.text);
         SVGA3DText_Update();
         VMBackdoor_VGAScreenshot();
      }
      while (VMBackdoor_MouseGetPacket(&lastMouseState));
      SVGA3DUtil_ClearFullscreen(CID, SVGA3D_CLEAR_COLOR | SVGA3D_CLEAR_DEPTH,
                                 0x113366, 1.0f, 0);
      render();
      SVGA3DText_Draw();
      SVGA3DUtil_PresentFullscreen();
      SVGA3D_BeginPresentReadback(&rects, 1);
      rects[0].x = halfWidth;
      rects[0].y = halfHeight;
      rects[0].w = halfWidth;
      rects[0].h = halfHeight;
      SVGA_FIFOCommitAll();
      SVGA_SyncToFence(SVGA_InsertFence());
      SVGA_Update(halfWidth, halfHeight, halfWidth, halfHeight);
      SVGA3DUtil_ClearFullscreen(CID, SVGA3D_CLEAR_COLOR | SVGA3D_CLEAR_DEPTH,
                                 0xff00ff, 1.0f, 0);
      SVGA3D_BeginPresentReadback(&rects, 2);
      rects[0].x = halfWidth;
      rects[0].y = 0;
      rects[0].w = halfWidth;
      rects[0].h = halfHeight;
      rects[1].x = 0;
      rects[1].y = halfHeight;
      rects[1].w = halfWidth;
      rects[1].h = halfHeight;
      SVGA_FIFOCommitAll();
      SVGA_SyncToFence(SVGA_InsertFence());
      SVGA_Update(halfWidth, 0, halfWidth, halfHeight);
      SVGA_Update(0, halfHeight, halfWidth, halfHeight);
   }
   return 0;
 }
--- a/examples/simple-blit/Makefile
+++ b/examples/simple-blit/Makefile
@ -0,0 +1,6 @@
 TARGET = simple_blit.img
 APP_SOURCES = main.c
 LIB_DIR = ../../lib
 include $(LIB_DIR)/Makefile.rules
--- a/examples/simple-blit/main.c
+++ b/examples/simple-blit/main.c
@ -0,0 +1,104 @@
 /*
 * SVGA3D example: Simple BLIT (Block Image Transfer) which updates
 *                 the render target surface ID.
 *
 * Copyright (C) 2008-2009 VMware, Inc. Licensed under the MIT
 * License, please see the README.txt. All rights reserved.
 */
 #include "types.h"
 #include "svga3dutil.h"
 #include "svga3dtext.h"
 FPSCounterState gFPS;
 /*
 * Alpha, red, green, blue components of color.
 */
 static uint32 a = 255, r = 0, g = 0, b = 0;
 /*
 * DMA pools allow for the allocation of GMR memory.
 * The re-use policy is handled by the allocation routines.
 */
 static uint32 blitSize = 0;
 DMAPool blitDMA;
 /*
 * render --
 *
 *   Set up render state, and use surface DMA to update the
 *   render target with a solid color that goes from black
 *   to white.
 *
 */
 void
 render(void)
 {
   DMAPoolBuffer *dma = NULL;
   uint32 *buffer = NULL;
   uint32 color;
   dma = SVGA3DUtil_DMAPoolGetBuffer(&blitDMA);
   buffer = (uint32 *)dma->buffer;
   /* Pack the color in A, R, G, B order, then fill with a uint32 memset. */
   color = ((a&255) << 24) | ((r&255) << 16) | ((g&255) << 8) | (b&255);
   memset32(buffer, color, blitSize / sizeof *buffer);
   r++; g++; b++;
   /*
    * Copy pixel data from our temporary memory in the GMR into
    * the render target.  This is a BLIT operation from memory
    * in the guest to the host render target.
    */
   SVGA3DUtil_SurfaceDMA2D(gFullscreen.colorImage.sid, &dma->ptr,
                           SVGA3D_WRITE_HOST_VRAM,
                           gSVGA.width, gSVGA.height);
   SVGA3DUtil_AsyncCall((AsyncCallFn) SVGA3DUtil_DMAPoolFreeBuffer, dma);
 }
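 /*
 * A sketch of the two steps render() performs before the DMA: pack
 * four 8-bit channels into one 32-bit pixel, then fill a buffer with
 * that value. It assumes alpha, red, green, blue order from the top
 * byte down, and that the fill length is a count of 32-bit elements
 * (as the blitSize / sizeof *buffer argument suggests). Both function
 * names are hypothetical stand-ins, not the library's memset32.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack channels as 0xAARRGGBB. Each channel is masked to 8 bits, so
 * a counter that keeps incrementing simply wraps around. */
static uint32_t
SketchPackARGB(uint32_t a, uint32_t r, uint32_t g, uint32_t b)
{
   return ((a & 255) << 24) | ((r & 255) << 16) | ((g & 255) << 8) | (b & 255);
}

/* Store 'value' into 'count' consecutive 32-bit elements. */
static void
SketchFill32(uint32_t *dest, uint32_t value, size_t count)
{
   size_t i;

   assert(dest != NULL || count == 0);
   for (i = 0; i < count; i++) {
      dest[i] = value;
   }
}
```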
 /*
 * main --
 *
 *    Our example's entry point, invoked directly by the bootloader.
 */
 int
 main(void)
 {
   SVGA3DUtil_InitFullscreen(CID, 800, 600);
   SVGA3DText_Init();
   /*
    * Allocate 4 buffers for DMA.  Each buffer is the size of the display
    * so that we can fill the buffers with color data and DMA that buffer
    * to the render target.
    */
   blitSize = gSVGA.width * gSVGA.height * sizeof(uint32);
   SVGA3DUtil_AllocDMAPool(&blitDMA, blitSize, 4);
   while (1) {
      if (SVGA3DUtil_UpdateFPSCounter(&gFPS)) {
         Console_Clear();
         Console_Format("VMware SVGA3D Example:\n"
                        "Simple BLIT of image into render target.\n%s",
                        gFPS.text);
         SVGA3DText_Update();
         VMBackdoor_VGAScreenshot();
      }
      SVGA3DUtil_ClearFullscreen(CID, SVGA3D_CLEAR_COLOR, 0x113366, 1.0f, 0);
      render();
      SVGA3DText_Draw();
      SVGA3DUtil_PresentFullscreen();
   }
   return 0;
 }
--- a/examples/simple-shaders/Makefile
+++ b/examples/simple-shaders/Makefile
@ -0,0 +1,16 @@
 TARGET = simple-shaders.img
 APP_SOURCES = main.c
 LIB_DIR = ../../lib
 include $(LIB_DIR)/Makefile.rules
 .PHONY: shaders
 shaders: simple_vs.h simple_ps.h
 simple_vs.h: simple.fx
 	wine fxc.exe /T vs_2_0 /E MyVertexShader /Fh simple_vs.h simple.fx
 simple_ps.h: simple.fx
 	wine fxc.exe /T ps_2_0 /E MyPixelShader /Fh simple_ps.h simple.fx
--- a/examples/simple-shaders/main.c
+++ b/examples/simple-shaders/main.c
@ -0,0 +1,249 @@
 /*
 * SVGA3D example: Simple Shaders.
 *
 * This is a simple example to demonstrate the programmable pixel
 * and vertex pipelines. A vertex shader animates a rippling surface,
 * and a pixel shader generates a procedural checkerboard pattern.
 *
 * For simplicity, this example generates shader bytecode at
 * compile-time using the Microsoft HLSL compiler.
 *
 * Copyright (C) 2008-2009 VMware, Inc. Licensed under the MIT
 * License, please see the README.txt. All rights reserved.
 */
 #include "svga3dutil.h"
 #include "svga3dtext.h"
 #include "matrix.h"
 #include "math.h"
 typedef uint32 DWORD;
 #include "simple_vs.h"
 #include "simple_ps.h"
 /*
 * Small integers to identify our shaders.
 */
 #define MY_VSHADER_ID       0
 #define MY_PSHADER_ID       0
 /*
 * Shader constants. These must match the constant registers in the
 * bytecode we send the device, so in this example the constants are
 * actually assigned by the Microsoft HLSL compiler.
 */
 #define CONST_MAT_WORLDVIEWPROJ   0
 #define CONST_TIMESTEP            4
 /*
 * Macros for the simple mesh we generate as input for the vertex
 * shader.  It's a static grid in the XY plane.
 */
 #define MESH_WIDTH      256
 #define MESH_HEIGHT     256
 #define MESH_NUM_VERTICES   (MESH_WIDTH * MESH_HEIGHT)
 #define MESH_NUM_QUADS      ((MESH_WIDTH-1) * (MESH_HEIGHT-1))
 #define MESH_NUM_TRIANGLES  (MESH_NUM_QUADS * 2)
 #define MESH_NUM_INDICES    (MESH_NUM_TRIANGLES * 3)
 #define MESH_ELEMENT(x, y)  (MESH_WIDTH * (y) + (x))
 typedef struct {
   float position[3];
 } MyVertex;
 typedef uint16 IndexType;
 uint32 vertexSid, indexSid;
 FPSCounterState gFPS;
 /*
 * render --
 *
 *    Set up render state that we load once per frame (because
 *    SVGA3DText clobbered it) and render the scene.
 */
 void
 render(void)
 {
   SVGA3dVertexDecl *decls;
   SVGA3dPrimitiveRange *ranges;
   SVGA3dRenderState *rs;
   float shaderTimestep[4] = { gFPS.frame * 0.01 };
   SVGA3D_SetShaderConst(CID, CONST_TIMESTEP, SVGA3D_SHADERTYPE_VS,
                         SVGA3D_CONST_TYPE_FLOAT, shaderTimestep);
   SVGA3D_BeginSetRenderState(CID, &rs, 4);
   {
      rs[0].state     = SVGA3D_RS_BLENDENABLE;
      rs[0].uintValue = FALSE;
      rs[1].state     = SVGA3D_RS_ZENABLE;
      rs[1].uintValue = TRUE;
      rs[2].state     = SVGA3D_RS_ZWRITEENABLE;
      rs[2].uintValue = TRUE;
      rs[3].state     = SVGA3D_RS_ZFUNC;
      rs[3].uintValue = SVGA3D_CMP_LESS;
   }
   SVGA_FIFOCommitAll();
   SVGA3D_SetShader(CID, SVGA3D_SHADERTYPE_VS, MY_VSHADER_ID);
   SVGA3D_SetShader(CID, SVGA3D_SHADERTYPE_PS, MY_PSHADER_ID);
   SVGA3D_BeginDrawPrimitives(CID, &decls, 1, &ranges, 1);
   {
      decls[0].identity.type = SVGA3D_DECLTYPE_FLOAT3;
      decls[0].identity.usage = SVGA3D_DECLUSAGE_POSITION;
      decls[0].array.surfaceId = vertexSid;
      decls[0].array.stride = sizeof(MyVertex);
      decls[0].array.offset = offsetof(MyVertex, position);
      ranges[0].primType = SVGA3D_PRIMITIVE_TRIANGLELIST;
      ranges[0].primitiveCount = MESH_NUM_TRIANGLES;
      ranges[0].indexArray.surfaceId = indexSid;
      ranges[0].indexArray.stride = sizeof(IndexType);
      ranges[0].indexWidth = sizeof(IndexType);
   }
   SVGA_FIFOCommitAll();
   SVGA3D_SetShader(CID, SVGA3D_SHADERTYPE_VS, SVGA3D_INVALID_ID);
   SVGA3D_SetShader(CID, SVGA3D_SHADERTYPE_PS, SVGA3D_INVALID_ID);
 }
 /*
 * createIndexBuffer --
 *
 *    Create a static index buffer that renders our vertices as a 2D
 *    mesh. For simplicity, we use a triangle list rather than a
 *    triangle strip.
 */
 uint32
 createIndexBuffer(void)
 {
   IndexType *indexBuffer;
   const uint32 bufferSize = MESH_NUM_INDICES * sizeof *indexBuffer;
   SVGAGuestPtr gPtr;
   uint32 sid;
   int x, y;
   sid = SVGA3DUtil_DefineSurface2D(bufferSize, 1, SVGA3D_BUFFER);
   indexBuffer = SVGA3DUtil_AllocDMABuffer(bufferSize, &gPtr);
   for (y = 0; y < (MESH_HEIGHT - 1); y++) {
      for (x = 0; x < (MESH_WIDTH - 1); x++) {
         indexBuffer[0] = MESH_ELEMENT(x,   y  );
         indexBuffer[1] = MESH_ELEMENT(x+1, y  );
         indexBuffer[2] = MESH_ELEMENT(x+1, y+1);
         indexBuffer[3] = MESH_ELEMENT(x+1, y+1);
         indexBuffer[4] = MESH_ELEMENT(x,   y+1);
         indexBuffer[5] = MESH_ELEMENT(x,   y  );
         indexBuffer += 6;
      }
   }
   SVGA3DUtil_SurfaceDMA2D(sid, &gPtr, SVGA3D_WRITE_HOST_VRAM, bufferSize, 1);
   return sid;
 }
 /*
 * createVertexBuffer --
 *
 *    Create a static vertex buffer containing a regular grid of
 *    points on the XY plane, spanning [-1, 1) on each axis. The
 *    vertex shader displaces these points along Z.
 */
 uint32
 createVertexBuffer(void)
 {
   MyVertex *vert;
   const uint32 bufferSize = MESH_NUM_VERTICES * sizeof(MyVertex);
   SVGAGuestPtr gPtr;
   uint32 sid;
   int x, y;
   sid = SVGA3DUtil_DefineSurface2D(bufferSize, 1, SVGA3D_BUFFER);
   vert = SVGA3DUtil_AllocDMABuffer(bufferSize, &gPtr);
   for (y = 0; y < MESH_HEIGHT; y++) {
      for (x = 0; x < MESH_WIDTH; x++) {
         vert->position[0] = x * (2.0 / MESH_WIDTH) - 1.0;
         vert->position[1] = y * (2.0 / MESH_HEIGHT) - 1.0;
         vert->position[2] = 0.0f;
         vert++;
      }
   }
   SVGA3DUtil_SurfaceDMA2D(sid, &gPtr, SVGA3D_WRITE_HOST_VRAM, bufferSize, 1);
   return sid;
 }
 /*
 * main --
 *
 *    Our example's entry point, invoked directly by the bootloader.
 */
 int
 main(void)
 {
   Matrix worldViewProj, proj;
   SVGA3DUtil_InitFullscreen(CID, 800, 600);
   SVGA3DText_Init();
   vertexSid = createVertexBuffer();
   indexSid = createIndexBuffer();
   SVGA3D_DefineShader(CID, MY_VSHADER_ID, SVGA3D_SHADERTYPE_VS,
                       g_vs20_MyVertexShader, sizeof g_vs20_MyVertexShader);
   SVGA3D_DefineShader(CID, MY_PSHADER_ID, SVGA3D_SHADERTYPE_PS,
                       g_ps20_MyPixelShader, sizeof g_ps20_MyPixelShader);
   /*
    * Compute a single matrix for the world, view, and projection
    * transforms, then upload that to the shader.
    */
   Matrix_Copy(worldViewProj, gIdentityMatrix);
   Matrix_RotateX(worldViewProj, 60.0 * PI_OVER_180);
   Matrix_Translate(worldViewProj, 0, 0, 3);
   Matrix_Perspective(proj, 45.0f, gSVGA.width / (float)gSVGA.height, 0.1f, 100.0f);
   Matrix_Multiply(worldViewProj, proj);
   SVGA3DUtil_SetShaderConstMatrix(CID, CONST_MAT_WORLDVIEWPROJ,
                                   SVGA3D_SHADERTYPE_VS, worldViewProj);
   while (1) {
      if (SVGA3DUtil_UpdateFPSCounter(&gFPS)) {
         Console_Clear();
         Console_Format("VMware SVGA3D Example:\n"
                        "Simple Shaders.\n\n%s",
                        gFPS.text);
         SVGA3DText_Update();
      }
      SVGA3DUtil_ClearFullscreen(CID, SVGA3D_CLEAR_COLOR | SVGA3D_CLEAR_DEPTH,
                                 0x113366, 1.0f, 0);
      render();
      SVGA3DText_Draw();
      SVGA3DUtil_PresentFullscreen();
   }
   return 0;
 }
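The index arithmetic used by createIndexBuffer() can be checked on the host with a small standalone sketch. This is not part of the example: grid_element() and build_grid_indices() are hypothetical names, and the code assumes the same row-major layout and winding as MESH_ELEMENT() and the loop above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring MESH_ELEMENT(): row-major vertex index. */
static uint16_t grid_element(unsigned width, unsigned x, unsigned y)
{
   return (uint16_t)(width * y + x);
}

/*
 * Fill 'out' with triangle-list indices for a width x height grid of
 * vertices: two triangles (six indices) per quad. Returns the number of
 * indices written, which is (width-1) * (height-1) * 6.
 */
static size_t build_grid_indices(uint16_t *out, unsigned width, unsigned height)
{
   size_t n = 0;
   unsigned x, y;

   /* Every vertex index must fit in 16 bits, as with IndexType above. */
   assert(width >= 2 && height >= 2);
   assert((size_t)width * height <= 65536u);

   for (y = 0; y + 1 < height; y++) {
      for (x = 0; x + 1 < width; x++) {
         out[n++] = grid_element(width, x,     y);
         out[n++] = grid_element(width, x + 1, y);
         out[n++] = grid_element(width, x + 1, y + 1);
         out[n++] = grid_element(width, x + 1, y + 1);
         out[n++] = grid_element(width, x,     y + 1);
         out[n++] = grid_element(width, x,     y);
      }
   }
   return n;
}
```

For a 256x256 grid the largest vertex index is 65535, so a 16-bit index type is just large enough. A larger mesh would need a wider IndexType.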
--- a/examples/simple-shaders/simple.fx
+++ b/examples/simple-shaders/simple.fx
@ -0,0 +1,60 @@
 float4x4 matWorldViewProj;
 float    timestep;
 struct VS_Output
 {
   float4  Pos      : POSITION;
   float4  Coord    : TEXCOORD0;
 };
 VS_Output
 MyVertexShader(float4 inputPos : POSITION)
 {
   VS_Output Output;
   float4 objectCoord = inputPos;
   float dist = pow(objectCoord.x, 2) + pow(objectCoord.y, 2);
   objectCoord.z = sin(dist * 8.0 + timestep) / (1 + dist * 10.0);
   Output.Pos = mul(objectCoord, matWorldViewProj);
   Output.Coord = objectCoord;
   return Output;
 }
 struct PS_Input
 {
   float4  Coord    : TEXCOORD0;
 };
 float4
 MyPixelShader(PS_Input Input) : COLOR
 {
   /*
    * Simple 2D procedural checkerboard.
    */
   const float4 color1 = { 0.25, 0.25, 0.25, 1.0 };
   const float4 color2 = { 1.0, 1.0, 1.0, 1.0 };
   const float checkerSize = 0.2;
   float2 s = fmod(Input.Coord.xy / checkerSize, 1);
   float check = ( (float)(s.x > 0.5 || (s.x < 0 && s.x > -0.5)) +
                   (float)(s.y > 0.5 || (s.y < 0 && s.y > -0.5)) );
   float4 color = lerp(color1, color2, fmod(check, 2));
   /*
    * Do a little fake shading
    */
   const float4 shadeTop = { 1.0, 1.0, 0.5, 1.0 };
   const float4 shadeBottom = { 0.5, 0.5, 1.0, 1.0 };
   float z = Input.Coord.z * 2;
   color = lerp(color, shadeBottom, clamp(z, 0, 0.25));
   color = lerp(color, shadeTop, clamp(-z, 0, 0.25));
   return color;
 }
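The checkerboard selection in MyPixelShader can be followed more easily as scalar code. The sketch below is a plain C restatement of that logic, not code from this package. frac_signed(), checker_half(), and checker_select() are hypothetical names, and frac_signed() stands in for HLSL fmod(v, 1) under the assumption that the input fits in an int.

```c
#include <assert.h>

/* Stand-in for HLSL fmod(v, 1): fractional part carrying the sign of v. */
static float frac_signed(float v)
{
   return v - (float)(int)v;
}

/*
 * Returns 1 when the fractional coordinate falls in the "second half" of a
 * checker cell. Negative coordinates are mirrored, which is why the shader
 * tests (s < 0 && s > -0.5) in addition to (s > 0.5).
 */
static int checker_half(float s)
{
   return (s > 0.5f) || (s < 0.0f && s > -0.5f);
}

/*
 * Returns 0 for color1 (dark gray) or 1 for color2 (white), matching
 * lerp(color1, color2, fmod(check, 2)) in the shader.
 */
static int checker_select(float x, float y, float checkerSize)
{
   assert(checkerSize > 0.0f);
   return (checker_half(frac_signed(x / checkerSize)) +
           checker_half(frac_signed(y / checkerSize))) % 2;
}
```

With checkerSize = 0.2, the point (0.05, 0.05) selects color1 and (0.15, 0.05) selects color2. Adding the two half-cell flags and taking the sum modulo 2 acts as an XOR, which is what produces the alternating pattern.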
--- a/examples/simple-shaders/simple_ps.h
+++ b/examples/simple-shaders/simple_ps.h
@ -0,0 +1,89 @@
 #if 0
 //
 // Generated by Microsoft (R) D3DX9 Shader Compiler 
 //
 //   fxc /T ps_2_0 /E MyPixelShader /Fh simple_ps.h simple.fx
 //
    ps_2_0
    def c0, 5, 0.5, 0, 1
    def c1, -0.5, 0.25, 0, 0
    def c2, 0.25, 0.25, 1, 0
    def c3, 1.5, 1.5, 0, 0
    def c4, 0.5, 1, 1, 0
    def c5, 1, 0.5, 1, 0
    dcl t0.xyz
    mul r0.xy, t0, c0.x
    abs r0.xy, r0
    frc r0.xy, r0
    cmp r0.xy, t0, r0, -r0
    add r0.w, -r0.x, c0.y
    cmp r0.w, r0.w, c0.z, c0.w
    add r1.w, -r0.x, c1.x
    cmp r1.w, r1.w, c0.z, c0.w
    cmp r2.w, r0.x, c0.z, c0.w
    mad r0.w, r2.w, r1.w, r0.w
    cmp r0.w, -r0.w, c0.z, c0.w
    add r1.w, -r0.y, c0.y
    cmp r1.w, r1.w, c0.z, c0.w
    add r2.w, -r0.y, c1.x
    cmp r3.w, r0.y, c0.z, c0.w
    cmp r2.w, r2.w, c0.z, c0.w
    mad r1.w, r3.w, r2.w, r1.w
    cmp r1.w, -r1.w, c0.z, c0.w
    add r0.w, r0.w, r1.w
    mul r0.w, r0.w, c0.y
    frc r0.w, r0.w
    mov r0.xyz, c3
    mad r0.xyz, r0.w, r0, c2
    add r0.w, t0.z, t0.z
    max r1.w, r0.w, c0.z
    max r2.w, -r0.w, c0.z
    min r0.w, r1.w, c1.y
    lrp r1.xyz, r0.w, c4, r0
    min r0.w, r2.w, c1.y
    lrp r2.xyz, r0.w, c5, r1
    mov r0.xy, r2.x
    mov r0.w, r2.z
    mov r0.z, r2.y
    mov oC0, r0
 // approximately 34 instruction slots used
 #endif
 const DWORD g_ps20_MyPixelShader[] =
 {
    0xffff0200, 0x0013fffe, 0x42415443, 0x0000001c, 0x00000023, 0xffff0200, 
    0x00000000, 0x00000000, 0x20000100, 0x0000001c, 0x325f7370, 0x4d00305f, 
    0x6f726369, 0x74666f73, 0x29522820, 0x44334420, 0x53203958, 0x65646168, 
    0x6f432072, 0x6c69706d, 0x00207265, 0x05000051, 0xa00f0000, 0x40a00000, 
    0x3f000000, 0x00000000, 0x3f800000, 0x05000051, 0xa00f0001, 0xbf000000, 
    0x3e800000, 0x00000000, 0x00000000, 0x05000051, 0xa00f0002, 0x3e800000, 
    0x3e800000, 0x3f800000, 0x00000000, 0x05000051, 0xa00f0003, 0x3fc00000, 
    0x3fc00000, 0x00000000, 0x00000000, 0x05000051, 0xa00f0004, 0x3f000000, 
    0x3f800000, 0x3f800000, 0x00000000, 0x05000051, 0xa00f0005, 0x3f800000, 
    0x3f000000, 0x3f800000, 0x00000000, 0x0200001f, 0x80000000, 0xb0070000, 
    0x03000005, 0x80030000, 0xb0e40000, 0xa0000000, 0x02000023, 0x80030000, 
    0x80e40000, 0x02000013, 0x80030000, 0x80e40000, 0x04000058, 0x80030000, 
    0xb0e40000, 0x80e40000, 0x81e40000, 0x03000002, 0x80080000, 0x81000000, 
    0xa0550000, 0x04000058, 0x80080000, 0x80ff0000, 0xa0aa0000, 0xa0ff0000, 
    0x03000002, 0x80080001, 0x81000000, 0xa0000001, 0x04000058, 0x80080001, 
    0x80ff0001, 0xa0aa0000, 0xa0ff0000, 0x04000058, 0x80080002, 0x80000000, 
    0xa0aa0000, 0xa0ff0000, 0x04000004, 0x80080000, 0x80ff0002, 0x80ff0001, 
    0x80ff0000, 0x04000058, 0x80080000, 0x81ff0000, 0xa0aa0000, 0xa0ff0000, 
    0x03000002, 0x80080001, 0x81550000, 0xa0550000, 0x04000058, 0x80080001, 
    0x80ff0001, 0xa0aa0000, 0xa0ff0000, 0x03000002, 0x80080002, 0x81550000, 
    0xa0000001, 0x04000058, 0x80080003, 0x80550000, 0xa0aa0000, 0xa0ff0000, 
    0x04000058, 0x80080002, 0x80ff0002, 0xa0aa0000, 0xa0ff0000, 0x04000004, 
    0x80080001, 0x80ff0003, 0x80ff0002, 0x80ff0001, 0x04000058, 0x80080001, 
    0x81ff0001, 0xa0aa0000, 0xa0ff0000, 0x03000002, 0x80080000, 0x80ff0000, 
    0x80ff0001, 0x03000005, 0x80080000, 0x80ff0000, 0xa0550000, 0x02000013, 
    0x80080000, 0x80ff0000, 0x02000001, 0x80070000, 0xa0e40003, 0x04000004, 
    0x80070000, 0x80ff0000, 0x80e40000, 0xa0e40002, 0x03000002, 0x80080000, 
    0xb0aa0000, 0xb0aa0000, 0x0300000b, 0x80080001, 0x80ff0000, 0xa0aa0000, 
    0x0300000b, 0x80080002, 0x81ff0000, 0xa0aa0000, 0x0300000a, 0x80080000, 
    0x80ff0001, 0xa0550001, 0x04000012, 0x80070001, 0x80ff0000, 0xa0e40004, 
    0x80e40000, 0x0300000a, 0x80080000, 0x80ff0002, 0xa0550001, 0x04000012, 
    0x80070002, 0x80ff0000, 0xa0e40005, 0x80e40001, 0x02000001, 0x80030000, 
    0x80000002, 0x02000001, 0x80080000, 0x80aa0002, 0x02000001, 0x80040000, 
    0x80550002, 0x02000001, 0x800f0800, 0x80e40000, 0x0000ffff
 };
--- a/examples/simple-shaders/simple_vs.h
+++ b/examples/simple-shaders/simple_vs.h
@ -0,0 +1,75 @@
 #if 0
 //
 // Generated by Microsoft (R) D3DX9 Shader Compiler 
 //
 //   fxc /T vs_2_0 /E MyVertexShader /Fh simple_vs.h simple.fx
 //
 //
 // Parameters:
 //
 //   float4x4 matWorldViewProj;
 //   float timestep;
 //
 //
 // Registers:
 //
 //   Name             Reg   Size
 //   ---------------- ----- ----
 //   matWorldViewProj c0       4
 //   timestep         c4       1
 //
    vs_2_0
    def c5, 8, 0.159154937, 0.5, 0
    def c6, 6.28318548, -3.14159274, 10, 1
    def c7, -1.55009923e-06, -2.17013894e-05, 0.00260416674, 0.00026041668
    def c8, -0.020833334, -0.125, 1, 0.5
    dcl_position v0
    mul r0.xy, v0, v0
    add r0.x, r0.y, r0.x
    mov r1.x, c5.x
    mad r0.y, r0.x, r1.x, c4.x
    mad r0.x, r0.x, c6.z, c6.w
    mad r0.y, r0.y, c5.y, c5.z
    frc r0.y, r0.y
    mad r0.y, r0.y, c6.x, c6.y
    sincos r1.y, r0.y, c7, c8
    rcp r0.x, r0.x
    mul r0.z, r1.y, r0.x
    mov r0.xyw, v0
    dp4 oPos.x, r0, c0
    dp4 oPos.y, r0, c1
    dp4 oPos.z, r0, c2
    dp4 oPos.w, r0, c3
    mov oT0, r0
 // approximately 24 instruction slots used
 #endif
 const DWORD g_vs20_MyVertexShader[] =
 {
    0xfffe0200, 0x002dfffe, 0x42415443, 0x0000001c, 0x0000008b, 0xfffe0200, 
    0x00000002, 0x0000001c, 0x20000100, 0x00000084, 0x00000044, 0x00000002, 
    0x00000004, 0x00000058, 0x00000000, 0x00000068, 0x00040002, 0x00000001, 
    0x00000074, 0x00000000, 0x5774616d, 0x646c726f, 0x77656956, 0x6a6f7250, 
    0xababab00, 0x00030003, 0x00040004, 0x00000001, 0x00000000, 0x656d6974, 
    0x70657473, 0xababab00, 0x00030000, 0x00010001, 0x00000001, 0x00000000, 
    0x325f7376, 0x4d00305f, 0x6f726369, 0x74666f73, 0x29522820, 0x44334420, 
    0x53203958, 0x65646168, 0x6f432072, 0x6c69706d, 0x00207265, 0x05000051, 
    0xa00f0005, 0x41000000, 0x3e22f983, 0x3f000000, 0x00000000, 0x05000051, 
    0xa00f0006, 0x40c90fdb, 0xc0490fdb, 0x41200000, 0x3f800000, 0x05000051, 
    0xa00f0007, 0xb5d00d01, 0xb7b60b61, 0x3b2aaaab, 0x39888889, 0x05000051, 
    0xa00f0008, 0xbcaaaaab, 0xbe000000, 0x3f800000, 0x3f000000, 0x0200001f, 
    0x80000000, 0x900f0000, 0x03000005, 0x80030000, 0x90e40000, 0x90e40000, 
    0x03000002, 0x80010000, 0x80550000, 0x80000000, 0x02000001, 0x80010001, 
    0xa0000005, 0x04000004, 0x80020000, 0x80000000, 0x80000001, 0xa0000004, 
    0x04000004, 0x80010000, 0x80000000, 0xa0aa0006, 0xa0ff0006, 0x04000004, 
    0x80020000, 0x80550000, 0xa0550005, 0xa0aa0005, 0x02000013, 0x80020000, 
    0x80550000, 0x04000004, 0x80020000, 0x80550000, 0xa0000006, 0xa0550006, 
    0x04000025, 0x80020001, 0x80550000, 0xa0e40007, 0xa0e40008, 0x02000006, 
    0x80010000, 0x80000000, 0x03000005, 0x80040000, 0x80550001, 0x80000000, 
    0x02000001, 0x800b0000, 0x90e40000, 0x03000009, 0xc0010000, 0x80e40000, 
    0xa0e40000, 0x03000009, 0xc0020000, 0x80e40000, 0xa0e40001, 0x03000009, 
    0xc0040000, 0x80e40000, 0xa0e40002, 0x03000009, 0xc0080000, 0x80e40000, 
    0xa0e40003, 0x02000001, 0xe00f0000, 0x80e40000, 0x0000ffff
 };
--- a/examples/video-formats/Makefile
+++ b/examples/video-formats/Makefile
@ -0,0 +1,6 @@
 TARGET = video-formats.img
 APP_SOURCES = main.c screen.png.data.o wols4x3.yuv.z.data.o
 LIB_DIR = ../../lib
 include $(LIB_DIR)/Makefile.rules
--- a/examples/video-formats/main.c
+++ b/examples/video-formats/main.c
@ -0,0 +1,297 @@
 /*
 * video-formats -- Demonstrate all supported video overlay formats.
 *
 * XXX: There are some known bugs in the currently released VMware
 *      products, which are exposed by this test:
 *
 *   1. The very first VideoFlush may not appear. In this test,
 *      the bug manifests as "No Overlay" for test #1.
 *
 *   2. Software emulated scaling is very low quality.
 *
 *   3. If the host is using hardware video overlay rather than
 *      its software fallback, it assumes that colorkey is always
 *      enabled. This means our video will only draw in the black
 *      portions of the background image (inside the "X" and
 *      the box around the "No overlay" text).
 *
 * Copyright (C) 2008-2009 VMware, Inc. Licensed under the MIT
 * License, please see the README.txt. All rights reserved.
 */
 #include "svga.h"
 #include "png.h"
 #include "intr.h"
 #include "datafile.h"
 /*
 * This is our video test card, in UYVY format.
 *
 * It's a 720x576 pixel 4:3 aspect test card designed by Barney
 * Wol. (http://www.barney-wol.net/testpatterns)
 */
 DECLARE_DATAFILE(testCardFile, wols4x3_yuv_z);
 #define TESTCARD_WIDTH  720
 #define TESTCARD_HEIGHT 576
 /*
 * Our background image, in PNG format.
 *
 * This has 'cutouts' where we're supposed to display the test
 * pattern. Each of these is described by the table of overlay
 * settings below.
 */
 DECLARE_DATAFILE(screenPNGFile, screen_png);
 #define OFFSET_YUY2  0x400000
 #define OFFSET_UYVY  0x500000
 #define OFFSET_YV12  0x600000
 static SVGAOverlayUnit overlays[] = {
   // #0 - YUY2 Large
   {
      .enabled = TRUE,
      .format = VMWARE_FOURCC_YUY2,
      .width = TESTCARD_WIDTH,
      .height = TESTCARD_HEIGHT,
      .srcWidth = TESTCARD_WIDTH,
      .srcHeight = TESTCARD_HEIGHT,
      .dstX = 109,
      .dstY = 407,
      .dstWidth = 320,
      .dstHeight = 240,
      .pitches[0] = TESTCARD_WIDTH * 2,
      .dataOffset = OFFSET_YUY2,
   },
   // #1 - YV12 Large
   {
      .enabled = TRUE,
      .format = VMWARE_FOURCC_YV12,
      .width = TESTCARD_WIDTH,
      .height = TESTCARD_HEIGHT,
      .srcWidth = TESTCARD_WIDTH,
      .srcHeight = TESTCARD_HEIGHT,
      .dstX = 564,
      .dstY = 58,
      .dstWidth = 320,
      .dstHeight = 240,
      .pitches[0] = TESTCARD_WIDTH,
      .pitches[1] = TESTCARD_WIDTH / 2,
      .pitches[2] = TESTCARD_WIDTH / 2,
      .dataOffset = OFFSET_YV12,
   },
   // #2 - UYVY Large
   {
      .enabled = TRUE,
      .format = VMWARE_FOURCC_UYVY,
      .width = TESTCARD_WIDTH,
      .height = TESTCARD_HEIGHT,
      .srcWidth = TESTCARD_WIDTH,
      .srcHeight = TESTCARD_HEIGHT,
      .dstX = 564,
      .dstY = 407,
      .dstWidth = 320,
      .dstHeight = 240,
      .pitches[0] = TESTCARD_WIDTH * 2,
      .dataOffset = OFFSET_UYVY,
   },
   // #3 - YUY2 Small
   {
      .enabled = TRUE,
      .format = VMWARE_FOURCC_YUY2,
      .width = TESTCARD_WIDTH,
      .height = TESTCARD_HEIGHT,
      .srcX = 34,
      .srcY = 31,
      .srcWidth = 76,
      .srcHeight = 79,
      .dstX = 109,
      .dstY = 652,
      .dstWidth = 64,
      .dstHeight = 64,
      .pitches[0] = TESTCARD_WIDTH * 2,
      .dataOffset = OFFSET_YUY2,
   },
   // #4 - YV12 Small
   {
      .enabled = TRUE,
      .format = VMWARE_FOURCC_YV12,
      .width = TESTCARD_WIDTH,
      .height = TESTCARD_HEIGHT,
      .srcX = 34,
      .srcY = 31,
      .srcWidth = 76,
      .srcHeight = 79,
      .dstX = 564,
      .dstY = 303,
      .dstWidth = 64,
      .dstHeight = 64,
      .pitches[0] = TESTCARD_WIDTH,
      .pitches[1] = TESTCARD_WIDTH / 2,
      .pitches[2] = TESTCARD_WIDTH / 2,
      .dataOffset = OFFSET_YV12,
   },
   // #5 - UYVY Small
   {
      .enabled = TRUE,
      .format = VMWARE_FOURCC_UYVY,
      .width = TESTCARD_WIDTH,
      .height = TESTCARD_HEIGHT,
      .srcX = 34,
      .srcY = 31,
      .srcWidth = 76,
      .srcHeight = 79,
      .dstX = 564,
      .dstY = 652,
      .dstWidth = 64,
      .dstHeight = 64,
      .pitches[0] = TESTCARD_WIDTH * 2,
      .dataOffset = OFFSET_UYVY,
   },
 };
 /*
 * convertUYVYtoYUY2 --
 *
 *    Convert the test card image from UYVY format to YUY2.
 *    Both of these are packed-pixel formats, they just use
 *    different byte orders.
 */
 static void
 convertUYVYtoYUY2(uint8 *src,   // IN
                  uint8 *dest)  // OUT
 {
   uint32 numWords = TESTCARD_WIDTH / 2 * TESTCARD_HEIGHT;
   while (numWords--) {
      uint8 u  = *(src++);
      uint8 y1 = *(src++);
      uint8 v  = *(src++);
      uint8 y2 = *(src++);
      *(dest++) = y1;
      *(dest++) = u;
      *(dest++) = y2;
      *(dest++) = v;
   }
 }
 /*
 * convertUYVYtoYV12 --
 *
 *    Convert the test card image from UYVY format (packed pixel) to
 *    YV12 (planar). This vertically decimates the chroma planes by
 *    1/2.
 */
 static void
 convertUYVYtoYV12(uint8 *src,   // IN
                  uint8 *dest)  // OUT
 {
   /*
    * Y plane, full resolution.
    */
   uint8 *s = src;
   uint32 numWords = TESTCARD_WIDTH / 2 * TESTCARD_HEIGHT;
   while (numWords--) {
      s++;                 // U
      *(dest++) = *(s++);  // Y1
      s++;                 // V
      *(dest++) = *(s++);  // Y2
   }
   /*
    * U and V planes, at 1/2 height.
    */
   uint32 x, y;
   const uint32 pitch = TESTCARD_WIDTH * 2;
   uint8 *line1 = src;
   uint8 *v = dest;
   uint8 *u = v + (TESTCARD_WIDTH * TESTCARD_HEIGHT) / 4;
   for (y = TESTCARD_HEIGHT/2; y; y--) {
      uint8 *line2 = line1 + pitch;
      for (x = TESTCARD_WIDTH/2; x; x--) {
         uint8 u1 = *(line1)++;  // U
         line1++;                // Y1
         uint8 v1 = *(line1)++;  // V
         line1++;                // Y2
         uint8 u2 = *(line2)++;  // U
         line2++;                // Y1
         uint8 v2 = *(line2)++;  // V
         line2++;                // Y2
         *(u++) = ((int)u1 + (int)u2) >> 1;
         *(v++) = ((int)v1 + (int)v2) >> 1;
      }
      line1 = line2;
   }
 }
 /*
 * main --
 *
 *    Set up the virtual hardware, decompress the YUV images, and
 *    program the overlay units.
 */
 int
 main(void)
 {
   PNGChunkIHDR *screenPNG = PNG_Header(screenPNGFile->ptr);
   uint32 width = bswap32(screenPNG->width);
   uint32 height = bswap32(screenPNG->height);
   uint32 streamId;
   Intr_Init();
   Intr_SetFaultHandlers(SVGA_DefaultFaultHandler);
   SVGA_Init();
   SVGA_SetMode(width, height, 32);
   /*
    * Draw the background image
    */
   PNG_DecompressBGRX(screenPNG, (uint32*) gSVGA.fbMem, gSVGA.pitch);
   SVGA_Update(0, 0, width, height);
   /*
    * Decompress the UYVY image, and use it to generate YUY2 and YV12 versions.
    */
   DataFile_Decompress(testCardFile, gSVGA.fbMem + OFFSET_UYVY, 0x100000);
   convertUYVYtoYUY2(gSVGA.fbMem + OFFSET_UYVY, gSVGA.fbMem + OFFSET_YUY2);
   convertUYVYtoYV12(gSVGA.fbMem + OFFSET_UYVY, gSVGA.fbMem + OFFSET_YV12);
   /*
    * Program the overlay units
    */
   for (streamId = 0; streamId < arraysize(overlays); streamId++) {
      SVGA_VideoSetAllRegs(streamId, &overlays[streamId], SVGA_VIDEO_PITCH_3);
      SVGA_VideoFlush(streamId);
   }
   return 0;
 }
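The buffer sizes and plane offsets that convertUYVYtoYV12() relies on can be worked out with a short standalone sketch. The Yv12Layout type and the yv12_layout() and packed422_size() functions are hypothetical names introduced here for illustration. The plane order (Y, then V, then U) follows the pointer setup in the function above.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offsets of each plane within a tightly packed YV12 image. */
typedef struct {
   size_t yOffset;
   size_t vOffset;
   size_t uOffset;
   size_t totalSize;
} Yv12Layout;

/*
 * YV12 stores a full-resolution Y plane followed by V and U planes that
 * are each subsampled by 2 horizontally and vertically.
 */
static Yv12Layout yv12_layout(size_t width, size_t height)
{
   Yv12Layout l;
   size_t lumaSize, chromaSize;

   assert(width % 2 == 0 && height % 2 == 0);

   lumaSize = width * height;
   chromaSize = (width / 2) * (height / 2);

   l.yOffset = 0;
   l.vOffset = lumaSize;
   l.uOffset = lumaSize + chromaSize;
   l.totalSize = lumaSize + 2 * chromaSize;
   return l;
}

/* Packed 4:2:2 formats (UYVY, YUY2) use two bytes per pixel. */
static size_t packed422_size(size_t width, size_t height)
{
   return width * 2 * height;
}
```

For the 720x576 test card this gives 829440 bytes for each packed image and 622080 bytes for the YV12 image. Both fit inside the 0x100000-byte spacing between OFFSET_YUY2, OFFSET_UYVY, and OFFSET_YV12.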
--- a/examples/video-formats/screen.png
+++ b/examples/video-formats/screen.png
--- a/examples/video-formats/wols4x3.yuv
+++ b/examples/video-formats/wols4x3.yuv
--- a/examples/video-sync/Makefile
+++ b/examples/video-sync/Makefile
@ -0,0 +1,6 @@
 TARGET = video-sync.img
 APP_SOURCES = main.c screen.png.data.o
 LIB_DIR = ../../lib
 include $(LIB_DIR)/Makefile.rules
--- a/examples/video-sync/main.c
+++ b/examples/video-sync/main.c
@ -0,0 +1,147 @@
 /*
 * video-sync -- Test video DMA synchronization, by displaying a
 * sequence of animated frames with flow control and multi-frame
 * buffering.
 *
 * Copyright (C) 2008-2009 VMware, Inc. Licensed under the MIT
 * License, please see the README.txt. All rights reserved.
 */
 #include "svga.h"
 #include "png.h"
 #include "intr.h"
 #include "datafile.h"
 /*
 * Our background image, in PNG format.
 */
 DECLARE_DATAFILE(screenPNGFile, screen_png);
 /*
 * generateFrame --
 *
 *    Generate one frame of video, in UYVY format.
 */
 static void
 generateFrame(uint8 *buffer,  // OUT
              uint32 width,   // IN
              uint32 height,  // IN
              uint32 frame)   // IN
 {
   uint32 wordPitch = width / 2;
   uint32 numWords = wordPitch * height;
   int x = frame % width;
   uint32 *linePtr = (uint32*)buffer + (x >> 1);
   uint32 lineWord;
   /*
    * Clear it multiple times, so it will be obvious if the
    * host reads a frame that we're still writing to.
    */
   //                 Y1VVY0UU
   memset32(buffer, 0xFFFFFFFF, numWords);
   memset32(buffer, 0x40804080, numWords);
   /*
    * Draw a vertical line that moves right on each frame.  This is
    * the easiest way to make it obvious when the image tears.
    *
    * This test will also show when the luminance bytes in the
    * packed-pixel decoder are out of order.
    */
   if (x & 1) {
      lineWord = 0xFF804080;
   } else {
      lineWord = 0x4080FF80;
   }
   while (height--) {
      *linePtr = lineWord;
      linePtr += wordPitch;
   }
 }
 /*
 * main --
 *
 *    Initialization and main loop.
 */
 int
 main(void)
 {
   PNGChunkIHDR *screenPNG = PNG_Header(screenPNGFile->ptr);
   uint32 width = bswap32(screenPNG->width);
   uint32 height = bswap32(screenPNG->height);
   Intr_Init();
   Intr_SetFaultHandlers(SVGA_DefaultFaultHandler);
   SVGA_Init();
   SVGA_SetMode(width, height, 32);
   /*
    * Draw the background image
    */
   PNG_DecompressBGRX(screenPNG, (uint32*) gSVGA.fbMem, gSVGA.pitch);
   SVGA_Update(0, 0, width, height);
   /*
    * Initialize the video overlay unit. We're displaying DVD-resolution
    * letterboxed 16:9 video, in UYVY (packed-pixel) format.
    */
   SVGAOverlayUnit overlay = {
      .enabled = TRUE,
      .format = VMWARE_FOURCC_UYVY,
      .width = 720,
      .height = 480,
      .srcWidth = 720,
      .srcHeight = 480,
      .dstX = 1,
      .dstY = 92,
      .dstWidth = 1022,
      .dstHeight = 574,
      .pitches[0] = 1440,
   };
   SVGA_VideoSetAllRegs(0, &overlay, SVGA_VIDEO_PITCH_3);
   /*
    * Main loop. Loop over each frame in the ring buffer repeatedly.
    * We wait for the DMA buffer to become available, fill it with the
    * next frame, then program the overlay unit to display that frame.
    */
   uint32 frameCounter = 0;
   uint32 baseOffset = width * height * 4;
   uint32 frameSize = overlay.pitches[0] * overlay.height;
   static uint32 fences[16];
   while (1) {
      uint32 bufferId;
      for (bufferId = 0; bufferId < arraysize(fences); bufferId++) {
         uint32 bufferOffset = baseOffset + bufferId * frameSize;
         uint8 *bufferPtr = gSVGA.fbMem + bufferOffset;
         SVGA_SyncToFence(fences[bufferId]);
         generateFrame(bufferPtr, overlay.width, overlay.height, frameCounter++);
         SVGA_VideoSetReg(0, SVGA_VIDEO_DATA_OFFSET, bufferOffset);
         SVGA_VideoFlush(0);
         fences[bufferId] = SVGA_InsertFence();
      }
   }
   return 0;
 }
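The 32-bit constants in generateFrame() are easier to read once the UYVY byte order is spelled out. The sketch below rebuilds them from individual components. pack_uyvy() and line_word() are hypothetical helpers, and the code assumes the word is stored to memory on a little-endian CPU, as it is on IA32.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack one UYVY macropixel (two horizontally adjacent pixels) into a
 * 32-bit word. In memory the bytes are U, Y0, V, Y1, so on a
 * little-endian machine the word reads 0xY1VVY0UU.
 */
static uint32_t pack_uyvy(uint8_t u, uint8_t y0, uint8_t v, uint8_t y1)
{
   return (uint32_t)u |
          ((uint32_t)y0 << 8) |
          ((uint32_t)v << 16) |
          ((uint32_t)y1 << 24);
}

/*
 * Word for the moving vertical line: neutral chroma (0x80), with full
 * luminance on whichever pixel of the pair sits at column x.
 */
static uint32_t line_word(unsigned x)
{
   const uint8_t gray = 0x40, white = 0xFF, neutral = 0x80;

   assert(gray < white);
   return (x & 1) ? pack_uyvy(neutral, gray, neutral, white)
                  : pack_uyvy(neutral, white, neutral, gray);
}
```

pack_uyvy(0x80, 0x40, 0x80, 0x40) yields 0x40804080, the background fill value, and line_word() reproduces the two lineWord constants. If a decoder swaps Y0 and Y1, the line appears one pixel away from where it should, which is the luminance-order check described in the comment above.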
--- a/examples/video-sync/screen.png
+++ b/examples/video-sync/screen.png
--- a/lib/Makefile.rules
+++ b/lib/Makefile.rules
@ -0,0 +1,132 @@
 #
 # Common GNU Make rules for the VMware SVGA examples.
 #
 # To build your own apps, you just need a makefile which
 # defines a few variables and includes this one. For example:
 #
 #    LIB_DIR = path/to/lib
 #    TARGET = myapp.img
 #    APP_SOURCES = main.c
 #    include $(LIB_DIR)/Makefile.rules
 #
 # All examples get compiled with all library code, and we let
 # GCC garbage collect modules that aren't being used.
 #
 # Basic options necessary to produce our standalone binary.
 # Produce 32-bit code, even on 64-bit machines. Don't use
 # the standard library at all. Begin the text segment at 1MB.
 CFLAGS := -m32 -ffreestanding -nostdinc -fno-stack-protector
 LDFLAGS := -nostdlib -Wl,-T,$(LIB_DIR)/metalkit/image.ld
 # Extra warnings
 CFLAGS += -Wall -Werror
 # Size Optimizations.
 CFLAGS += -Os -Wl,--gc-sections -ffunction-sections -fdata-sections
 # This enables extra gcc builtins for floating point math.
 CFLAGS += -march=i686 -ffast-math
 # Generate debug symbols. These only show up in the .elf file, not the
 # final image. Recent versions of VMware have a gdb debug stub that
 # you can use along with these symbols for source-level debugging of
 # Metalkit apps.
 CFLAGS += -g
 # Most of the examples only need 4MB of memory. Some examples
 # override this, so only set it if it isn't already defined.
 ifeq ($(VMX_MEMSIZE),)
  VMX_MEMSIZE = 4
 endif
 CFLAGS += \
   -I$(LIB_DIR)/metalkit \
   -I$(LIB_DIR)/util \
   -I$(LIB_DIR)/refdriver \
   -I$(LIB_DIR)/vmware
 SOURCES := \
   $(LIB_DIR)/metalkit/boot.S \
   $(LIB_DIR)/metalkit/pci.c \
   $(LIB_DIR)/metalkit/intr.c \
   $(LIB_DIR)/metalkit/console.c \
   $(LIB_DIR)/metalkit/console_vga.c \
   $(LIB_DIR)/metalkit/puff.c \
   $(LIB_DIR)/metalkit/timer.c \
   $(LIB_DIR)/metalkit/keyboard.c \
   $(LIB_DIR)/metalkit/bios.c \
   $(LIB_DIR)/metalkit/apm.c \
   $(LIB_DIR)/metalkit/gcc_support.c \
   $(LIB_DIR)/util/matrix.c \
   $(LIB_DIR)/util/svga3dutil.c \
   $(LIB_DIR)/util/svga3dtext.c \
   $(LIB_DIR)/util/vmbackdoor.c \
   $(LIB_DIR)/util/mt19937ar.c \
   $(LIB_DIR)/util/png.c \
   $(LIB_DIR)/refdriver/svga.c \
   $(LIB_DIR)/refdriver/svga3d.c \
   $(LIB_DIR)/refdriver/gmr.c \
   $(APP_SOURCES)
 ELF_TARGET := $(subst .img,.elf,$(TARGET))
 LST_TARGET := $(subst .img,.lst,$(TARGET))
 VMX_TARGET := $(subst .img,.vmx,$(TARGET))
 PLAIN_TARGET := $(subst .img,,$(TARGET))
 .PHONY: all target clean sizeprof listing
 target: $(TARGET) $(VMX_TARGET)
 %.lst: %.elf
 	objdump -d $< > $@
 %.img: %.elf
 	objcopy -O binary $< $@
 # Stackable rules for processing data files
 %.data.o: %
 	objcopy -I binary -O elf32-i386 -B i386 $< $@
 %.z: %
 	python $(LIB_DIR)/metalkit/deflate.py < $< > $@
 # To optimize size, we compile all input files in one step. This
 # lets GCC use information available from all files during its
 # optimization phase.
 $(ELF_TARGET): $(SOURCES)
 	$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(SOURCES)
 clean:
 	rm -f $(TARGET) $(ELF_TARGET) $(LST_TARGET) $(VMX_TARGET) *.o
 # This is a phony target which prints a list of symbols, sorted by
 # size, and excluding the BSS segment. This is a quick way to see
 # which functions and initialized data are taking the most space in
 # the final binary.
 sizeprof: $(ELF_TARGET)
 	@nm --size-sort -S $< | egrep -v " [bBsS] "
 # Another phony target, for convenience, which dumps an assembly
 # listing to stdout.
 listing: $(ELF_TARGET)
 	objdump -d $<
 # Generate a .vmx config file for VMware
 $(VMX_TARGET):
 	@echo config.version = 8 > $(VMX_TARGET)
 	@echo virtualHW.version = 7 >> $(VMX_TARGET)
 	@echo memsize = $(VMX_MEMSIZE) >> $(VMX_TARGET)
 	@echo displayname = $(PLAIN_TARGET) >> $(VMX_TARGET)
 	@echo guestOS = other >> $(VMX_TARGET)
 	@echo mks.enable3d = TRUE >> $(VMX_TARGET)
 	@echo floppy0.startConnected = TRUE >> $(VMX_TARGET)
 	@echo floppy0.fileType = file >> $(VMX_TARGET)
 	@echo floppy0.fileName = $(TARGET) >> $(VMX_TARGET)
--- a/lib/README
+++ b/lib/README
@ -0,0 +1,25 @@
 Library Code
 ------------
 metalkit -
   Open source (MIT-licensed) library code for writing programs
   that run on the IA32 architecture on the "bare metal", without
   an operating system.
 refdriver -
   Source code for the VMware SVGA reference driver.
 util -
   Utility code used by the accompanying examples. Includes higher
   level APIs built on top of the reference driver, as well as
   miscellaneous code such as text rendering and matrix math.
 vmware -
   VMware-provided header files, including the headers which define
   registers and FIFO commands used by the VMware SVGA device.
 win32 -
   A Win32 port of the VMware SVGA reference driver. This driver runs
   in userspace, using the kernel mode interface provided by VMware's
   proprietary kernel-mode graphics driver for Windows XP.
--- a/lib/metalkit/apm.c
+++ b/lib/metalkit/apm.c
@ -0,0 +1,164 @@
 /* -*- Mode: C; c-basic-offset: 3 -*-
 *
 * apm.c - Support for the legacy Advanced Power Management (APM) BIOS
 *
 * This file is part of Metalkit, a simple collection of modules for
 * writing software that runs on the bare metal. Get the latest code
 * at http://svn.navi.cx/misc/trunk/metalkit/
 *
 * Copyright (c) 2009 Micah Dowty
 *
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated documentation
 * files (the "Software"), to deal in the Software without
 * restriction, including without limitation the rights to use,
 * copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following
 * conditions:
 *
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 * OTHER DEALINGS IN THE SOFTWARE.
 */
 #include "apm.h"
 #include "bios.h"
 #include "intr.h"
 APMState gAPM;
 /*
 * APM_Init --
 *
 *    Probe for APM support. If APM is available, this connects to APM.
 *
 *    It would be easier to use the 16-bit real mode interface via
 *    Metalkit's BIOS module, but that wouldn't work very well for us
 *    because we can't handle interrupts during a real-mode BIOS call.
 *    So, any APM_Idle() call would hang!
 *
 *    Instead, we need to use 16-bit BIOS calls to bootstrap APM, then
 *    we do our real work via the 32-bit APM interface that is present
 *    in all APM 1.2 BIOSes.
 *
 *    On exit, gAPM will have a valid 'connected' flag, APM version,
 *    and flags.
 */
 fastcall void
 APM_Init()
 {
   APMState *self = &gAPM;
   Regs reg = {};
   /* Real mode "APM Installation Check" call */
   reg.ax = 0x5300;
   reg.bx = 0x0000;
   BIOS_Call(0x15, &reg);
   if (reg.bx == SIGNATURE_APM && reg.cf == 0) {
      self->version = reg.ax;
      self->flags = reg.cx;
   } else {
      return;
   }
   /* Real-mode interface connect */
   reg.ax = 0x5303;
   reg.bx = 0x0000;
   BIOS_Call(0x15, &reg);
   if (reg.cf != 0) {
      return;
   }
   /* Indicate that we want APM v1.2 */
   reg.ax = 0x530e;
   reg.bx = 0x0000;
   reg.cx = 0x0102;
   BIOS_Call(0x15, &reg);
   if (reg.cf != 0) {
      return;
   }
   /* Success! */
   self->connected = TRUE;
 }
 /*
 * APM_Idle --
 *
 *    If we're connected to APM, issue a "CPU Idle" call. The BIOS may
 *    halt the CPU until the next interrupt and/or slow or stop the
 *    CPU clock.
 *
 *    If we aren't connected to APM or the APM call is unsuccessful,
 *    this issues a CPU HLT instruction instead.
 */
 fastcall void
 APM_Idle()
 {
   /*
    * XXX: This doesn't actually work, because BIOS_Call disables
    *      interrupts!  To get idle calls working, we'll need to use
    *      the real 32-bit APM interface.
    */
 #if 0
   APMState *self = &gAPM;
   if (self->connected) {
      Regs reg = {};
      /* Real mode "CPU Idle" call */
      reg.ax = 0x5305;
      BIOS_Call(0x15, &reg);
      if (reg.cf == 0) {
         /* Success */
         return;
      }
   }
 #endif
   /* Fall back to CPU HLT */
   Intr_Halt();
 }
 /*
 * APM_SetPowerState --
 *
 *    Set the power state of all APM-managed devices.
 *    If we aren't connected to APM, always fails.
 *
 *    Returns TRUE on success, FALSE on error.
 */
 fastcall Bool
 APM_SetPowerState(uint16 state)
 {
   APMState *self = &gAPM;
   if (self->connected) {
      Regs reg = {};
      reg.ax = 0x5307;   // APM Set Power State
      reg.bx = 0x0001;   // All devices
      reg.cx = state;
      BIOS_Call(0x15, &reg);
      return reg.cf == 0;
   }
   return FALSE;
 }
--- a/lib/metalkit/apm.h
+++ b/lib/metalkit/apm.h
@ -0,0 +1,65 @@
 /* -*- Mode: C; c-basic-offset: 3 -*-
 *
 * apm.h - Support for the legacy Advanced Power Management (APM) BIOS
 *
 * This file is part of Metalkit, a simple collection of modules for
 * writing software that runs on the bare metal. Get the latest code
 * at http://svn.navi.cx/misc/trunk/metalkit/
 *
 * Copyright (c) 2008-2009 Micah Dowty
 *
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated documentation
 * files (the "Software"), to deal in the Software without
 * restriction, including without limitation the rights to use,
 * copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following
 * conditions:
 *
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 * OTHER DEALINGS IN THE SOFTWARE.
 */
 #ifndef __APM_H__
 #define __APM_H__
 #include "types.h"
 #include "bios.h"
 #define SIGNATURE_APM    0x504d    // "PM"
 #define APM_FLAG_16BIT             (1 << 0)
 #define APM_FLAG_32BIT             (1 << 1)
 #define APM_FLAG_SLOW_CPU_ON_IDLE  (1 << 2)
 #define APM_FLAG_DISABLED          (1 << 3)
 #define APM_FLAG_DISENGAGED        (1 << 4)
 /* APM power states */
 #define POWER_ON        0
 #define POWER_STANDBY   1
 #define POWER_SUSPEND   2
 #define POWER_OFF       3
 typedef struct {
   Bool         connected;         // Are we successfully connected to APM?
   uint16       version;           // Supported APM version in BCD, 0 if not supported
   uint16       flags;
 } APMState;
 extern APMState gAPM;
 fastcall void APM_Init();
 fastcall void APM_Idle();
 fastcall Bool APM_SetPowerState(uint16 state);
 #endif /* __APM_H__ */
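 /*
  * Example (not part of Metalkit): a minimal sketch of how a caller
  * might interpret the 'version' and 'flags' fields that APM_Init()
  * stores in gAPM. The helper names below are hypothetical. It assumes
  * the version word is BCD with the major number in the high byte, as
  * the APMState comment suggests (0x0102 means version 1.2).
  */

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits, copied from apm.h. */
#define APM_FLAG_16BIT   (1 << 0)
#define APM_FLAG_32BIT   (1 << 1)

/* Decode one packed BCD byte, e.g. 0x12 -> 12. */
static unsigned bcd_byte(uint8_t b)
{
   return (unsigned)(b >> 4) * 10u + (unsigned)(b & 0x0F);
}

/* Hypothetical helper: major version from the BCD version word. */
static unsigned apm_version_major(uint16_t version)
{
   return bcd_byte((uint8_t)(version >> 8));
}

/* Hypothetical helper: minor version from the BCD version word. */
static unsigned apm_version_minor(uint16_t version)
{
   return bcd_byte((uint8_t)(version & 0xFF));
}

/* Hypothetical helper: does the BIOS advertise a 32-bit interface? */
static int apm_has_32bit(uint16_t flags)
{
   return (flags & APM_FLAG_32BIT) != 0;
}
```

 /*
  * Usage: after APM_Init(), a caller could check gAPM.connected and then
  * pass gAPM.version and gAPM.flags to helpers like these.
  */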
--- a/lib/metalkit/bios.c
+++ b/lib/metalkit/bios.c
@ -0,0 +1,247 @@
 /* -*- Mode: C; c-basic-offset: 3 -*-
 *
 * bios.c - Make real-mode BIOS calls from protected mode.
 *          For simplicity and small size, this implementation
 *          switches back to real-mode rather than using virtual 8086
 *          mode. A v86 mode implementation may be more robust.
 *
 * This file is part of Metalkit, a simple collection of modules for
 * writing software that runs on the bare metal. Get the latest code
 * at http://svn.navi.cx/misc/trunk/metalkit/
 *
 * Copyright (c) 2008-2009 Micah Dowty
 *
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated documentation
 * files (the "Software"), to deal in the Software without
 * restriction, including without limitation the rights to use,
 * copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following
 * conditions:
 *
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 * OTHER DEALINGS IN THE SOFTWARE.
 */
 #include "bios.h"
 #include "boot.h"
 #include "intr.h"
 /*
 * BIOSCallInternal --
 *
 *    Internal implementation of BIOS_Call. This is a C function which
 *    wraps the assembly-language internal implementation. The only
 *    reason to use C here, really, is so we can easily calculate
 *    offsets into our shared C structures.
 *
 *    This function must not make any function calls, since we need to
 *    be able to trust the value of %esp. This is also why it must not
 *    be inlined into BIOS_Call itself.
 */
 static __attribute__((noinline)) void
 BIOSCallInternal(void)
 {
   /*
    * Save registers and stack in a safe place.
    */
   asm volatile ("pusha");
   asm volatile ("mov %%esp, %0" :"=m" (BIOS_SHARED->esp));
   /*
    * Jump to the relocated 16-bit trampoline (source code below).
    */
   asm volatile ("ljmp %0, %1"
                 :: "i" (BOOT_CODE16_SEG), "i" (BIOS_SHARED->trampoline));
   /*
    * This is where we return from the relocated trampoline.
    * We're back in protected mode, but the data segments
    * are still 16-bit. Restore them.
    */
   asm volatile ("BIOSReturn32: \n"
                 "mov %0, %%ax \n"
                 "mov %%ax, %%ss \n"
                 "mov %%ax, %%ds \n"
                 "mov %%ax, %%es \n"
                 "mov %%ax, %%fs \n"
                 "mov %%ax, %%gs \n"
                 :: "i" (BOOT_DATA_SEG));
   /*
    * Restore our stack and saved registers.
    * Now we can safely execute C code again.
    */
   asm volatile("mov %0, %%esp" ::"m" (BIOS_SHARED->esp));
   asm volatile ("popa");
   /*
    * Return here. The rest of this code is never run directly,
    * but we need to prevent GCC from optimizing it out.
    */
   asm volatile("jmp BIOSTrampolineEnd\n");
   /*
    * This is a 16-bit assembly-language trampoline, relocated at
    * runtime to low memory, which actually makes the BIOS call. It
    * handles saving/restoring registers, and it switches in and out
    * of real mode.
    *
    * This code is never run directly.
    */
   asm volatile("BIOSTrampoline: .code16");
    /*
     * Switch to our 16-bit data segment.
     */
   asm volatile("movw %0, %%ax \n"
                "movw %%ax, %%ds \n"
                "movw %%ax, %%es \n"
                "movw %%ax, %%ss \n"
                :: "i" (BOOT_DATA16_SEG));
   /*
    * Disable protected mode.
    */
   asm volatile("movl %cr0, %eax \n"
                "andl $(~1), %eax \n"
                "movl %eax, %cr0 \n");
   /*
    * Do another long jump to reset the real-mode %cs
    * register to a valid paragraph number. Right now
    * it's still a protected-mode-style selector index.
    *
    * XXX: I'm not sure how to do this address calculation cleanly.
    *      Currently I'm hardcoding the address of the relocated trampoline.
    */
   asm volatile("ljmp $0, $(BIOSTrampolineCS16 - BIOSTrampoline + 0x7C00)\n"
                "BIOSTrampolineCS16: \n");
   /*
    * Set up the real-mode stack (%ss and %esp).
    */
   asm volatile("xorw %%ax, %%ax \n"
                "mov %%ax, %%ss \n"
                "mov %0, %%esp \n"
                :: "i" (&BIOS_SHARED->stackTop[-sizeof(Regs)]));
   /*
    * Pop Regs off the stack.
    */
   asm volatile("pop %ds \n"
                "pop %es \n"
                "pop %eax \n"   // Ignore EFLAGS value.
                "popal \n");
   /*
    * This interrupt instruction is a placeholder that gets
    * patched at runtime (after relocation) to point to the
    * right interrupt vector.
    */
   asm volatile("BIOSTrampolineVector: \n"
                "int $0xFF");
   /*
    * Push Regs back onto the stack.
    */
   asm volatile("pushal \n"
                "pushfl \n"
                "push %es \n"
                "push %ds \n");
   /*
    * Enable protected mode.
    */
   asm volatile("movl %cr0, %eax \n"
                "orl $1, %eax \n"
                "movl %eax, %cr0 \n");
   /*
    * Return via a long 16-to-32 bit jump.
    */
   asm volatile("data32 ljmp %0, $BIOSReturn32 \n"
                :: "i" (BOOT_CODE_SEG));
   asm volatile("BIOSTrampolineEnd: .code32 \n");
 }
 extern struct {
   uint16 limit;
   uint32 base;
 } PACKED IDTDesc;
 /*
 * BIOS_Call --
 *
 *    Make BIOS calls after boot, by temporarily switching
 *    back into real mode.
 *
 *    This function relocates the trampoline and stack into
 *    real-mode-addressable low memory, then makes a 32-to-16-bit jump
 *    into the trampoline.
 */
 fastcall void
 BIOS_Call(uint8 vector, Regs *regs)
 {
   extern uint8 BIOSTrampoline[];
   extern uint8 BIOSTrampolineVector[];
   extern uint8 BIOSTrampolineEnd[];
   const uint32 trampSize = (uint8*)BIOSTrampolineEnd - (uint8*)BIOSTrampoline;
   const uint32 vectorOffset = (uint8*)BIOSTrampolineVector - (uint8*)BIOSTrampoline + 1;
   Bool iFlag = Intr_Save();
   Intr_Disable();
   /*
    * Relocate the trampoline code itself.
    */
   memcpy(BIOS_SHARED->trampoline, BIOSTrampoline, trampSize);
   /*
    * Save the 32-bit IDT descriptor, and set up a legacy 256-entry
    * 16-bit IDT descriptor.
    */
   asm volatile("sidt %0" : "=m" (BIOS_SHARED->idtr32));
   BIOS_SHARED->idtr16.base = 0;
   BIOS_SHARED->idtr16.limit = 0x3ff;
   asm volatile("lidt %0" :: "m" (BIOS_SHARED->idtr16));
   /*
    * Binary-patch the trampoline code with the right interrupt vector.
    */
   BIOS_SHARED->trampoline[vectorOffset] = vector;
   /*
    * Copy Regs onto the top of the 16-bit stack.
    */
   memcpy(&BIOS_SHARED->stackTop[-sizeof *regs], regs, sizeof *regs);
   BIOSCallInternal();
   /* Copy Regs back */
   memcpy(regs, &BIOS_SHARED->stackTop[-sizeof *regs], sizeof *regs);
   /*
    * Back to 32-bit IDT.
    */
   asm volatile("lidt %0" :: "m" (BIOS_SHARED->idtr32));
   Intr_Restore(iFlag);
 }
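 /*
  * Example (not part of Metalkit): a hosted-C sketch of the "relocate,
  * then binary-patch" step that BIOS_Call performs. The x86 "int imm8"
  * instruction encodes as opcode 0xCD followed by the vector byte, which
  * is why the patch lands one byte past the BIOSTrampolineVector label.
  * The template bytes and function names here are made up for
  * illustration; the real trampoline is the assembly above.
  */

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for trampoline code: NOP, INT 0xFF (placeholder), NOP. */
static const uint8_t trampoline_template[] = { 0x90, 0xCD, 0xFF, 0x90 };

/* Offset of the INT opcode in the template (hypothetical; bios.c
 * derives the equivalent value from linker-visible labels). */
enum { INT_OPCODE_OFFSET = 1 };

/* Overwrite the immediate operand of the INT instruction. */
static void patch_vector(uint8_t *code, size_t int_offset, uint8_t vector)
{
   assert(code[int_offset] == 0xCD);   /* Must point at an INT opcode */
   code[int_offset + 1] = vector;
}

/* Copy the template to 'dest', patch it, and return the patched byte. */
static uint8_t relocate_and_patch(uint8_t *dest, uint8_t vector)
{
   memcpy(dest, trampoline_template, sizeof trampoline_template);
   patch_vector(dest, INT_OPCODE_OFFSET, vector);
   return dest[INT_OPCODE_OFFSET + 1];
}
```

 /*
  * Patching the copy rather than the original keeps the template
  * reusable for the next call with a different vector.
  */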
--- a/lib/metalkit/bios.h
+++ b/lib/metalkit/bios.h
@ -0,0 +1,176 @@
 /* -*- Mode: C; c-basic-offset: 3 -*-
 *
 * bios.h - Make real-mode BIOS calls from protected mode.
 *          For simplicity and small size, this implementation
 *          switches back to real-mode rather than using virtual 8086
 *          mode. A v86 mode implementation may be more robust.
 *
 * This file is part of Metalkit, a simple collection of modules for
 * writing software that runs on the bare metal. Get the latest code
 * at http://svn.navi.cx/misc/trunk/metalkit/
 *
 * Copyright (c) 2008-2009 Micah Dowty
 *
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated documentation
 * files (the "Software"), to deal in the Software without
 * restriction, including without limitation the rights to use,
 * copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following
 * conditions:
 *
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 * OTHER DEALINGS IN THE SOFTWARE.
 */
 #ifndef __BIOS_H__
 #define __BIOS_H__
 #include "types.h"
 #include "boot.h"
 typedef struct Regs {
   /*
    * Subset of segment registers
    */
   uint16  ds;
   uint16  es;
   /*
    * CPU flags (Saved on BIOS exit, ignored on entry)
    */
   union {
      uint16 flags;
      uint32 eflags;
      struct {
         uint32 cf : 1;
         uint32 reserved_0 : 1;
         uint32 pf : 1;
         uint32 reserved_1 : 1;
         uint32 af : 1;
         uint32 reserved_2 : 1;
         uint32 zf : 1;
         uint32 sf : 1;
         uint32 tp : 1;
         uint32 intf : 1;
         uint32 df : 1;
         uint32 of : 1;
         uint32 iopl : 2;
         uint32 nt : 1;
         uint32 reserved_3 : 1;
         uint32 rf : 1;
         uint32 vm : 1;
         uint32 vif : 1;
         uint32 vip : 1;
         uint32 id : 1;
         uint32 reserved_4 : 10;
      };
   };
   /*
    * General purpose 32-bit registers, in the order expected by
    * pushad/popad.  Note that while most BIOS routines need only the
    * 16-bit portions of these registers, some 32-bit-aware routines
    * use them even in real mode.
    */
   union {
      uint32 edi;
      uint16 di;
   };
   union {
      uint32 esi;
      uint16 si;
   };
   union {
      uint32 ebp;
      uint16 bp;
   };
   union {           // Saved on BIOS exit, ignored on entry
      uint32 esp;
      uint16 sp;
   };
   union {
      uint32  ebx;
      uint16  bx;
      struct {
         uint8 bl;
         uint8 bh;
      };
   };
   union {
      uint32  edx;
      uint16  dx;
      struct {
         uint8 dl;
         uint8 dh;
      };
   };
   union {
      uint32  ecx;
      uint16  cx;
      struct {
         uint8 cl;
         uint8 ch;
      };
   };
   union {
      uint32  eax;
      uint16  ax;
      struct {
         uint8 al;
         uint8 ah;
      };
   };
 } PACKED Regs;
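 /*
  * Example (not part of Metalkit): a sketch that checks the field order
  * of Regs against the order in which the trampoline pops values
  * (16-bit %ds, 16-bit %es, 32-bit EFLAGS, then the eight registers
  * restored by popal). FlatRegs is a simplified, hypothetical stand-in
  * without the 8/16-bit aliases, so it compiles on any hosted compiler.
  */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field order as Regs, one plain field per stack slot. */
typedef struct {
   uint16_t ds;
   uint16_t es;
   uint32_t eflags;
   uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;
} FlatRegs;

/* Returns 1 if the layout matches the pop order used by the trampoline. */
static int flat_regs_layout_ok(void)
{
   return offsetof(FlatRegs, ds)     == 0  &&
          offsetof(FlatRegs, es)     == 2  &&
          offsetof(FlatRegs, eflags) == 4  &&
          offsetof(FlatRegs, edi)    == 8  &&
          offsetof(FlatRegs, esp)    == 20 &&
          offsetof(FlatRegs, eax)    == 36 &&
          sizeof(FlatRegs)           == 40;
}
```

 /*
  * The real structure is marked PACKED so that this layout holds
  * regardless of compiler padding rules.
  */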
 /*
 * This is the communication area between the real-mode BIOS
 * and protected mode. Parts of it are used internally by this
 * module, but the 'userdata' area is available to the caller.
 */
 struct BIOSShared {
   uint8 trampoline[512];
   uint8 stack[4096];
   uint8 stackTop[0];
   uint32 esp;
   struct {
      uint16 limit;
      uint32 base;
   } PACKED idtr16, idtr32;
   uint8 userdata[1024];
 } PACKED;
 #define BIOS_SHARED  ((struct BIOSShared*) BOOT_REALMODE_SCRATCH)
 /*
 * Macros for converting between 32-bit and 16-bit near/far pointers.
 */
 typedef uint32 far_ptr_t;
 #define PTR_32_TO_NEAR(p, seg)   ((uint16)((uint32)(p) - ((seg) << 4)))
 #define PTR_NEAR_TO_32(seg, off) ((void*)((((uint32)(seg)) << 4) + ((uint32)(off))))
 #define PTR_FAR_TO_32(p)         PTR_NEAR_TO_32((p) >> 16, (p) & 0xFFFF)
 /*
 * Public entry point.
 */
 fastcall void BIOS_Call(uint8 vector, Regs *regs);
 #endif /* __BIOS_H__ */
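 /*
  * Example (not part of Metalkit): the arithmetic behind the pointer
  * conversion macros, sketched with plain integers so it can run on a
  * hosted 64-bit system. A real-mode address is (segment << 4) + offset,
  * and a far_ptr_t packs the segment into the high 16 bits. The function
  * names are hypothetical.
  */

```c
#include <assert.h>
#include <stdint.h>

/* Linear address of a real-mode segment:offset pair. */
static uint32_t seg_off_to_linear(uint16_t seg, uint16_t off)
{
   return ((uint32_t)seg << 4) + (uint32_t)off;
}

/* Linear address of a packed far pointer (segment in the high word). */
static uint32_t far_to_linear(uint32_t far_ptr)
{
   return seg_off_to_linear((uint16_t)(far_ptr >> 16),
                            (uint16_t)(far_ptr & 0xFFFF));
}

/* Offset of a linear address relative to a given segment. */
static uint16_t linear_to_near(uint32_t linear, uint16_t seg)
{
   return (uint16_t)(linear - ((uint32_t)seg << 4));
}
```

 /*
  * For instance, segment 0x07C0 with offset 0 and segment 0 with offset
  * 0x7C00 both name the same linear address.
  */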
--- a/lib/metalkit/boot.S
+++ b/lib/metalkit/boot.S
@ -0,0 +1,586 @@
 /*
 * boot.S --
 *
 *    This is a tiny but relatively featureful bootloader for
 *    32-bit standalone apps and kernels. It compiles into one
 *    binary that can be used either stand-alone (loaded directly
 *    by the BIOS, from a floppy or USB disk image) or as a GNU
 *    Multiboot image, loaded by GRUB.
 *
 *    This bootloader loads itself and the attached main program
 *    at 1MB, with the available portions of the first megabyte of
 *    RAM set up as stack space by default.
 *
 *    This loader is capable of loading an arbitrarily big binary
 *    image from the boot device into high memory. If you're booting
 *    from a floppy, it can load the whole 1.44MB disk. If you're
 *    booting from USB, it can load any amount of data from the USB
 *    disk.
 *
 *    This loader works by using the BIOS's disk services, so we
 *    should be able to read the whole binary image off of any device
 *    the BIOS knows how to boot from. Since we have only a tiny
 *    amount of buffer space, and we need to store the resulting image
 *    above the 1MB boundary, we have to keep switching back and forth
 *    between real mode and protected mode.
 *
 *    To avoid device-specific CHS addressing madness, we require LBA
 *    mode to boot off of anything other than a 1.44MB floppy or a
 *    Multiboot loader. We try to use the INT 13h AH=42h "Extended Read
 *    Sectors From Drive" command, which uses LBA addressing. If this
 *    doesn't work, we fall back to floppy-disk-style CHS addressing.
 *
 *
 * This file is part of Metalkit, a simple collection of modules for
 * writing software that runs on the bare metal. Get the latest code
 * at http://svn.navi.cx/misc/trunk/metalkit/
 *
 * Copyright (c) 2008-2009 Micah Dowty
 *
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated documentation
 * files (the "Software"), to deal in the Software without
 * restriction, including without limitation the rights to use,
 * copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following
 * conditions:
 *
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 * OTHER DEALINGS IN THE SOFTWARE.
 */
 #define ASM
 #include "boot.h"
 /*
 * Constants that affect our early boot memory map.
 */
 #define BIOS_START_ADDRESS     0x7C00    // Defined by the BIOS
 #define EARLY_STACK_ADDRESS    0x2000    // In low DOS memory
 #define SECTORS_AT_A_TIME      18        // Must equal CHS sectors per head
 #define SECTOR_SIZE            512
 #define DISK_BUFFER            0x2800
 #define DISK_BUFFER_SIZE       (SECTORS_AT_A_TIME * SECTOR_SIZE)
 #define BIOS_PTR(x)            (x - _start + BIOS_START_ADDRESS)
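 /*
  * Example (not part of Metalkit): the BIOS_PTR() fixup as plain C
  * arithmetic. It assumes _start is linked at 1MB (0x100000), which is
  * what the header comment describes; the real value comes from the
  * linker script. The names LINK_START_ADDRESS and bios_ptr are
  * hypothetical.
  */

```c
#include <assert.h>
#include <stdint.h>

#define BIOS_START_ADDRESS  0x7C00u     /* Where the BIOS loads sector 0 */
#define LINK_START_ADDRESS  0x100000u   /* Assumption: _start linked at 1MB */

/* Translate a link-time address into its pre-relocation alias. */
static uint32_t bios_ptr(uint32_t link_address)
{
   return link_address - LINK_START_ADDRESS + BIOS_START_ADDRESS;
}
```

 /*
  * Only the boot sector itself is present at the alias, so this
  * translation is meaningful for the first 512 bytes of the image.
  */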
        .section .boot
        .global _start
        /*
         * External symbols. main() is self-explanatory, but these
         * other symbols must be provided by the linker script. See
         * "image.ld" for the actual partition size and LDT calculations.
         */
        .extern main
        .extern _end
        .extern _edata
        .extern _bss_size
        .extern _stack
        .extern _partition_chs_head
        .extern _partition_chs_sector_byte
        .extern _partition_chs_cylinder_byte
        .extern _partition_blocks
        .extern _ldt_byte0
        .extern _ldt_byte1
        .extern _ldt_byte2
        .extern _ldt_byte3
        /*
         * Other modules can optionally define an LDT in uninitialized
         * memory.  By default this LDT will be all zeroes, but this
         * is a simple and code-size-efficient way of letting other
         * Metalkit modules allocate segment descriptors when they
         * need to.
         *
         * Note that we page-align the LDT. This isn't strictly
         * necessary, but it might be useful for performance in
         * some environments.
         */
        .comm   LDT, BOOT_LDT_SIZE, 4096
        /*
         * This begins our 16-bit DOS MBR boot sector segment. This
         * sits in the first 512 bytes of our floppy image, and it
         * gets loaded by the BIOS at BIOS_START_ADDRESS.
         *
         * Until we've loaded the memory image off of disk into
         * its final location, this code is running at a different
         * address than the linker is expecting. Any absolute
         * addresses must be fixed up by the BIOS_PTR() macro.
         */
        .code16
 _start:
        ljmp    $0, $BIOS_PTR(bios_main)
        /*
         * gnu_multiboot --
         *
         *    GNU Multiboot header. This can come anywhere in the
         *    first 8192 bytes of the image file.
         */
        .p2align 2
        .code32
 gnu_multiboot:
 #define MULTIBOOT_MAGIC         0x1BADB002
 #define MULTIBOOT_FLAGS         0x00010000
        .long   MULTIBOOT_MAGIC
        .long   MULTIBOOT_FLAGS
        .long   -(MULTIBOOT_MAGIC + MULTIBOOT_FLAGS)
        .long   gnu_multiboot
        .long   _start
        .long   _edata
        .long   _end
        .long   entry32
        /*
         * String table, located in the boot sector.
         */
 loading_str:            .string "\r\nMETALKIT "
 disk_error_str:         .string " err!"
        /*
         * bios_main --
         *
         *    Main routine for our BIOS MBR based loader. We set up the
         *    stack, display some welcome text, then load the rest of
         *    the boot image from disk. We have to use real mode to
         *    call the BIOS's floppy driver, then protected mode to
         *    copy each disk block to its final location above the 1MB
         *    barrier.
         */
        .code16
 bios_main:
        /*
         * Early init: setup our stack and data segments, make sure
         * interrupts are off.
         */
        cli
        xorw    %ax, %ax
        movw    %ax, %ss
        movw    %ax, %ds
        movw    %ax, %es
        movw    $EARLY_STACK_ADDRESS, %sp
        /*
         * Save parameters that the BIOS gave us via registers.
         */
        mov     %dl, BIOS_PTR(disk_drive)
        /*
         * Switch on the A20 gate, so we can access more than 1MB
         * of memory. There are multiple ways to do this: The
         * original way was to write to bit 1 of the keyboard
         * controller's output port. There's also a bit on PS/2
         * System Control port A to enable A20.
         *
         * The keyboard controller method should always work, but
         * it's kind of slow and it takes a lot of code space in
         * our already-cramped bootloader. Instead, we ask the BIOS
         * to enable A20.
         *
         * If your computer doesn't support this BIOS interface,
         * you'll see our "err!" message before "METALKIT" appears.
         *
         * References:
         *    http://www.win.tue.nl/~aeb/linux/kbd/A20.html
         */
        mov     $0x2401, %ax    // Enable A20
        int     $0x15
        jc      fatal_error
        /*
         * Load our image, starting at the beginning of whatever disk
         * the BIOS told us we booted from. The Disk Address Packet
         * (DAP) has already been initialized statically.
         */
        mov     $BIOS_PTR(loading_str), %si
        call    print_str
        /*
         * Fill our DISK_BUFFER, reading SECTORS_AT_A_TIME sectors.
         *
         * First, try to use LBA addressing. This is required in
         * order to boot off of non-floppy devices, like USB drives.
         */
 disk_copy_loop:
        mov     $0x42, %ah
        mov     BIOS_PTR(disk_drive), %dl
        mov     $BIOS_PTR(dap_buffer), %si
        int     $0x13
        jnc     disk_success
        /*
         * If LBA fails, fall back to old fashioned CHS addressing.
         * This works everywhere, but only if we're on a 1.44MB floppy.
         */
        mov     $(0x0200 | SECTORS_AT_A_TIME), %ax
        mov     BIOS_PTR(chs_sector), %cx               // Sector and cylinder
        mov     BIOS_PTR(disk_drive), %dx               // Drive and head
        mov     $DISK_BUFFER, %bx
        int     $0x13
        jnc     disk_success
        /*
         * If both CHS and LBA fail, the error is fatal.
         */
 fatal_error:
        mov     $BIOS_PTR(disk_error_str), %si
        call    print_str
        cli
        hlt
 disk_success:
        mov     $'.', %al
        call    print_char
        /*
         * Enter protected mode, so we can copy this sector to
         * memory above the 1MB boundary.
         *
         * Note that we reset CS, DS, and ES, but we don't
         * modify the stack at all.
         */
        cli
        lgdt    BIOS_PTR(bios_gdt_desc)
        movl    %cr0, %eax
        orl     $1, %eax
        movl    %eax, %cr0
        ljmp    $BOOT_CODE_SEG, $BIOS_PTR(copy_enter32)
        .code32
 copy_enter32:
        movw    $BOOT_DATA_SEG, %ax
        movw    %ax, %ds
        movw    %ax, %es
        /*
         * Copy the buffer to high memory.
         */
        mov     $DISK_BUFFER, %esi
        mov     BIOS_PTR(dest_address), %edi
        mov     $(DISK_BUFFER_SIZE / 4), %ecx
        rep movsl
        /*
         * Next...
         *
         * Even though the CHS and LBA addresses are mutually exclusive,
         * there's no harm in incrementing them both. The LBA increment
         * is pretty straightforward, but CHS is of course less so.
         * We only support CHS on 1.44MB floppies. We always copy one
         * head at a time (SECTORS_AT_A_TIME must equal 18), so we have
         * to hop between disk head 0 and 1, and increment the cylinder
         * on every other head.
         *
         * When we're done copying, branch to entry32 while we're
         * still in protected mode. Also note that we do a long branch
         * to its final address, not its temporary BIOS_PTR() address.
         */
        addl    $DISK_BUFFER_SIZE, BIOS_PTR(dest_address)
        addl    $SECTORS_AT_A_TIME, BIOS_PTR(dap_sector)
        xorb    $1, BIOS_PTR(chs_head)
        jnz     same_cylinder
        incb    BIOS_PTR(chs_cylinder)
 same_cylinder:
        cmpl    $_edata, BIOS_PTR(dest_address)
        jl      not_done_copying
        ljmp    $BOOT_CODE_SEG, $entry32
 not_done_copying:
        /*
         * Back to 16-bit mode for the next copy.
         *
         * To understand this code, it's important to know the difference
         * between how segment registers are treated in protected-mode and
         * in real-mode. Loading a segment register in PM is actually a
         * request for the processor to fill the hidden portion of that
         * segment register with data from the GDT. When we switch to
         * real-mode, the segment registers change meaning (now they're
         * paragraph offsets again) but that hidden portion of the
         * register remains set.
         */
        /* 1. Load protected-mode segment registers (CS, DS, ES) */
        movw    $BOOT_DATA16_SEG, %ax
        movw    %ax, %ds
        movw    %ax, %es
        ljmp    $BOOT_CODE16_SEG, $BIOS_PTR(copy_enter16)
        /* (We're entering a 16-bit code segment now) */
        .code16
 copy_enter16:
        /* 2. Disable protected mode */
        movl    %cr0, %eax
        andl    $(~1), %eax
        movl    %eax, %cr0
        /*
         * 3. Load real-mode segment registers. (CS, DS, ES)
         */
        xorw    %ax, %ax
        movw    %ax, %ds
        movw    %ax, %es
        ljmp    $0, $BIOS_PTR(disk_copy_loop)
        /*
         * print_char --
         *
         *    Use the BIOS's TTY emulation to output one character, from %al.
         */
        .code16
 print_char:
        mov     $0x0E, %ah
        mov     $0x0001, %bx
        int     $0x10
 ret_label:
        ret
        /*
         * print_str --
         *
         *    Print a NUL-terminated string, starting at %si.
         */
        .code16
 print_str:
        lodsb
        test    %al, %al
        jz      ret_label
        call    print_char
        jmp     print_str
        /*
         * entry32 --
         *
         *    Main 32-bit entry point. To be here, we require that:
         *
         *      - We're running in protected mode
         *      - The A20 gate is enabled
         *      - The entire image is loaded at _start
         *
         *    We jump directly here from GNU Multiboot loaders (like
         *    GRUB), and this is where we jump directly from our
         *    protected mode disk block copy routine after we've copied
         *    the last block.
         *
         *    We still need to set up our final stack and GDT.
         */
        .code32
 entry32:
        cli
        lgdt    boot_gdt_desc
        movl    %cr0, %eax
        orl     $1, %eax
        movl    %eax, %cr0
        ljmp    $BOOT_CODE_SEG, $entry32_gdt_done
 entry32_gdt_done:
        movw    $BOOT_DATA_SEG, %ax
        movw    %ax, %ds
        movw    %ax, %ss
        movw    %ax, %es
        movw    %ax, %fs
        movw    %ax, %gs
        mov     $_stack, %esp
        /*
         * Zero out the BSS segment.
         */
        xor     %eax, %eax
        mov     $_bss_size, %ecx
        mov     $_edata, %edi
        rep stosb
        /*
         * Set our LDT segment as the current LDT.
         */
        mov     $BOOT_LDT_SEG, %ax
        lldt    %ax
        /*
         * Call main().
         *
         * If it returns, put the machine in a halt loop. We don't
         * disable interrupts: if main() has finished but the
         * application is still doing useful work in its interrupt
         * handlers, there's no reason to stop them.
         */
        call    main
 halt_loop:
        hlt
        jmp     halt_loop
        /*
         * boot_gdt --
         *
         *    This is a Global Descriptor Table that gives us a
         *    code and data segment, with a flat memory model.
         *
         *    See section 3.4.5 of the Intel IA32 software developer's manual.
         */
        .code32
        .p2align 3
 boot_gdt:
        /*
         * This is BOOT_NULL_SEG, the unusable segment zero.
         * Reuse this memory as bios_gdt_desc, a GDT descriptor
         * which uses our pre-relocation (BIOS_PTR) GDT address.
         */
 bios_gdt_desc:
        .word   (boot_gdt_end - boot_gdt - 1)
        .long   BIOS_PTR(boot_gdt)
        .word   0  // Unused
        .word   0xFFFF, 0x0000                  // BOOT_CODE_SEG
        .byte   0x00, 0x9A, 0xCF, 0x00
        .word   0xFFFF, 0x0000                  // BOOT_DATA_SEG
        .byte   0x00, 0x92, 0xCF, 0x00
        .word   0xFFFF, 0x0000                  // BOOT_CODE16_SEG
        .byte   0x00, 0x9A, 0x00, 0x00
        .word   0xFFFF, 0x0000                  // BOOT_DATA16_SEG
        .byte   0x00, 0x92, 0x00, 0x00
        .word   0xFFFF                          // BOOT_LDT_SEG
        .byte   _ldt_byte0
        .byte   _ldt_byte1
        .byte   _ldt_byte2
        .byte   0x82, 0x40
        .byte   _ldt_byte3
 boot_gdt_end:
 boot_gdt_desc:                                  // Uses final address
        .word   (boot_gdt_end - boot_gdt - 1)
        .long   boot_gdt
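        /*
         * Example (not part of Metalkit): a C sketch of how an 8-byte
         * segment descriptor is packed, which can be used to
         * cross-check the raw bytes in boot_gdt. gdt_encode is a
         * hypothetical helper; the field layout follows the IA-32
         * segment descriptor format.
         */

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Pack a descriptor. 'limit20' is the 20-bit limit, 'access' is the
 * access byte (e.g. 0x9A for code, 0x92 for data), and 'flags4' holds
 * the upper flag nibble (0xC = 4KB granularity + 32-bit default size).
 */
static void gdt_encode(uint8_t out[8], uint32_t base, uint32_t limit20,
                       uint8_t access, uint8_t flags4)
{
   out[0] = (uint8_t)(limit20 & 0xFF);
   out[1] = (uint8_t)((limit20 >> 8) & 0xFF);
   out[2] = (uint8_t)(base & 0xFF);
   out[3] = (uint8_t)((base >> 8) & 0xFF);
   out[4] = (uint8_t)((base >> 16) & 0xFF);
   out[5] = access;
   out[6] = (uint8_t)(((flags4 & 0x0F) << 4) | ((limit20 >> 16) & 0x0F));
   out[7] = (uint8_t)((base >> 24) & 0xFF);
}

/* Compare two descriptors byte for byte. */
static int gdt_entry_equals(const uint8_t a[8], const uint8_t b[8])
{
   return memcmp(a, b, 8) == 0;
}
```

        /*
         * With base 0, limit 0xFFFFF, access 0x9A and flags 0xC, this
         * yields the same bytes as the BOOT_CODE_SEG entry above.
         */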
        /*
         * dap_buffer --
         *
         *    The Disk Address Packet buffer holds the current LBA
         *    disk address. We pass this to BIOS INT 13h, and we
         *    statically initialize it here.
         *
         *    Note that the DAP is only used in LBA mode, not CHS mode.
         *
         * References:
         *    http://en.wikipedia.org/wiki/INT_13
         *        #INT_13h_AH.3D42h:_Extended_Read_Sectors_From_Drive
         */
 dap_buffer:
        .byte   0x10                    // DAP structure size
        .byte   0x00                    // (Unused)
        .byte   SECTORS_AT_A_TIME       // Number of sectors to read
        .byte   0x00                    // (Unused)
        .word   DISK_BUFFER             // Buffer offset
        .word   0x00                    // Buffer segment
 dap_sector:
        .long   0x00000000              // Disk sector number
        .long   0x00000000
        /*
         * Statically initialized disk addressing variables.  The CHS
         * address here is only used in CHS mode, not LBA mode, but
         * the disk drive number and dest address are always used.
         */
 chs_sector:                             // Order matters. Cylinder/sector and head/drive
        .byte   0x01                    //   are packed into words together.
 chs_cylinder:
        .byte   0x00
 disk_drive:
        .byte   0x00
 chs_head:
        .byte   0x00
 dest_address:
        .long   _start                  // Initial dest address for 16-to-32-bit copy.
        /*
         * Partition table and Boot Signature --
         *
         *    This must be at the end of the first 512-byte disk
         *    sector. The partition table marks the end of the
         *    portion of this binary which is loaded by the BIOS.
         *
         *    Each partition record is 16 bytes.
         *
         *    After installing Metalkit, a disk can be partitioned as
         *    long as the space used by the Metalkit binary itself is
         *    reserved. By default, we create a single "Non-FS data"
         *    partition which holds the Metalkit binary. Note that
         *    this default partition starts at sector 1 (the first
         *    sector) so it covers the entire Metalkit image, including
         *    the bootloader.
         *
         *    Partitions 2 through 4 are unused, and must be all zero
         *    or fdisk will complain.
         *
         * References:
         *    http://en.wikipedia.org/wiki/Master_boot_record
         */
        .org    0x1BE           // Partition 1
 boot_partition_table:
        .byte   0x80                     // Status (Bootable)
        .byte   0x00                     // First block (head, sector/cylinder, cylinder)
        .byte   0x01
        .byte   0x00
        .byte   0xda                     // Partition type ("Non-FS data" in fdisk)
        .byte   _partition_chs_head      // Last block (head, sector/cylinder, cylinder)
        .byte   _partition_chs_sector_byte
        .byte   _partition_chs_cylinder_byte
        .long   0                        // LBA of first sector
        .long   _partition_blocks        // Number of blocks in partition
        .org    0x1CE           // Partition 2 (Unused)
        .org    0x1DE           // Partition 3 (Unused)
        .org    0x1EE           // Partition 4 (Unused)
        .org    0x1FE           // Boot signature
        .byte   0x55, 0xAA      //   This marks the end of the 512-byte MBR.
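The sector layout above (one 16-byte partition record at offset 0x1BE, three all-zero records, and the 0x55 0xAA signature at offset 0x1FE) can be sketched in hosted C. This is an illustration only, not part of Metalkit: the names MBRPartition, put_le32 and mbr_write_default are hypothetical, and the last-block CHS bytes and block count, which the real image takes from linker symbols, are passed in as ordinary parameters.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MBR_SIZE              512
#define MBR_PARTITION_OFFSET  0x1BE
#define MBR_SIGNATURE_OFFSET  0x1FE

/* One 16-byte MBR partition record (hypothetical type name). */
typedef struct {
   uint8_t status;        /* 0x80 = bootable */
   uint8_t firstCHS[3];   /* head, sector/cylinder, cylinder */
   uint8_t type;          /* 0xda = "Non-FS data" in fdisk */
   uint8_t lastCHS[3];    /* head, sector/cylinder, cylinder */
   uint8_t firstLBA[4];   /* little-endian */
   uint8_t numBlocks[4];  /* little-endian */
} MBRPartition;

/* Store a 32-bit value in little-endian byte order. */
static void
put_le32(uint8_t *dest, uint32_t value)
{
   dest[0] = value & 0xff;
   dest[1] = (value >> 8) & 0xff;
   dest[2] = (value >> 16) & 0xff;
   dest[3] = (value >> 24) & 0xff;
}

/*
 * Fill in the partition table and boot signature of a 512-byte
 * sector, leaving the first 0x1BE bytes (the boot code) untouched.
 */
void
mbr_write_default(uint8_t sector[MBR_SIZE], const uint8_t lastCHS[3],
                  uint32_t numBlocks)
{
   MBRPartition p;

   assert(sizeof p == 16);
   memset(&p, 0, sizeof p);

   p.status = 0x80;
   p.firstCHS[1] = 0x01;          /* Head 0, sector 1, cylinder 0 */
   p.type = 0xda;
   memcpy(p.lastCHS, lastCHS, 3);
   put_le32(p.firstLBA, 0);
   put_le32(p.numBlocks, numBlocks);

   /* Partitions 2 through 4 stay all zero. */
   memset(sector + MBR_PARTITION_OFFSET, 0,
          MBR_SIGNATURE_OFFSET - MBR_PARTITION_OFFSET);
   memcpy(sector + MBR_PARTITION_OFFSET, &p, sizeof p);

   sector[MBR_SIGNATURE_OFFSET] = 0x55;
   sector[MBR_SIGNATURE_OFFSET + 1] = 0xAA;
}
```

Calling mbr_write_default(sector, lastCHS, numBlocks) on a sector buffer changes only bytes 0x1BE through 0x1FF, which mirrors how the .org directives above leave the boot code in place.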
--- a/lib/metalkit/boot.h
+++ b/lib/metalkit/boot.h
@ -0,0 +1,59 @@
 /* -*- Mode: C; c-basic-offset: 3 -*-
 *
 * boot.h - Definitions used by both the bootloader and
 *          the rest of the library. This file must be valid
 *          C and assembly.
 *
 * This file is part of Metalkit, a simple collection of modules for
 * writing software that runs on the bare metal. Get the latest code
 * at http://svn.navi.cx/misc/trunk/metalkit/
 *
 * Copyright (c) 2008-2009 Micah Dowty
 *
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated documentation
 * files (the "Software"), to deal in the Software without
 * restriction, including without limitation the rights to use,
 * copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following
 * conditions:
 *
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 * OTHER DEALINGS IN THE SOFTWARE.
 */
 #ifndef __BOOT_H__
 #define __BOOT_H__
 #define BOOT_NULL_SEG       0x00
 #define BOOT_CODE_SEG       0x08
 #define BOOT_DATA_SEG       0x10
 #define BOOT_CODE16_SEG     0x18
 #define BOOT_DATA16_SEG     0x20
 #define BOOT_LDT_SEG        0x28
 #define BOOT_LDT_ENTRIES    1024
 #define BOOT_LDT_SIZE       (BOOT_LDT_ENTRIES * 8)
 /* Unused real-mode-accessible scratch memory. */
 #define BOOT_REALMODE_SCRATCH   0x7C00
 /*
 * The bootloader defines an LDT table which can be modified
 * by C code, for loading segments dynamically.
 */
 #ifndef ASM
 extern unsigned char LDT[BOOT_LDT_SIZE];
 #endif
 #endif /* __BOOT_H__ */
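The BOOT_*_SEG values in boot.h are segment selectors: with the table-indicator and privilege bits clear, a selector is the GDT index times 8, so BOOT_LDT_SEG (0x28) names the sixth descriptor. As a rough sketch of how the 8-byte descriptors in the bootloader's GDT are put together, the hosted C below packs a base, limit, access byte and flags nibble into that layout. The function names gdt_encode and gdt_selector_index are hypothetical and are not part of Metalkit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack a segment descriptor into the 8-byte GDT format. 'limit' is
 * a 20-bit value, 'access' is the access byte (for example 0x92 for
 * a writable data segment), and 'flags' is the 4-bit nibble holding
 * the granularity and operand size bits.
 */
void
gdt_encode(uint8_t out[8], uint32_t base, uint32_t limit,
           uint8_t access, uint8_t flags)
{
   assert(limit <= 0xFFFFF);
   assert(flags <= 0xF);

   out[0] = limit & 0xff;
   out[1] = (limit >> 8) & 0xff;
   out[2] = base & 0xff;
   out[3] = (base >> 8) & 0xff;
   out[4] = (base >> 16) & 0xff;
   out[5] = access;
   out[6] = (uint8_t)((flags << 4) | ((limit >> 16) & 0x0f));
   out[7] = (base >> 24) & 0xff;
}

/* A selector with TI=0 and RPL=0 is the GDT index times 8. */
unsigned
gdt_selector_index(unsigned selector)
{
   return selector >> 3;
}
```

Note that the base address is split across bytes 2 to 4 and byte 7, which is why the linker script has to supply the LDT address as four separate byte-wide symbols.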
--- a/lib/metalkit/console.c
+++ b/lib/metalkit/console.c
@ -0,0 +1,291 @@
 /* -*- Mode: C; c-basic-offset: 3 -*-
 *
 * console.c - Abstract text console
 *
 * This file is part of Metalkit, a simple collection of modules for
 * writing software that runs on the bare metal. Get the latest code
 * at http://svn.navi.cx/misc/trunk/metalkit/
 *
 * Copyright (c) 2008-2009 Micah Dowty
 *
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated documentation
 * files (the "Software"), to deal in the Software without
 * restriction, including without limitation the rights to use,
 * copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following
 * conditions:
 *
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 * OTHER DEALINGS IN THE SOFTWARE.
 */
 #include "types.h"
 #include "console.h"
 #include "intr.h"
 ConsoleInterface gConsole;
 /*
 * Console_WriteString --
 *
 *    Write a NUL-terminated string.
 */
 fastcall void
 Console_WriteString(const char *str)
 {
   char c;
   while ((c = *(str++))) {
      Console_WriteChar(c);
   }
 }
 /*
 * Console_WriteUInt32 --
 *
 *    Write an unsigned 32-bit integer in any base from 2 to 16,
 *    up to 'digits' characters long. If 'padding' is non-NUL,
 *    this character is used for leading digits that would be zero.
 *    If padding is NUL, leading digits are suppressed entirely.
 */
 fastcall void
 Console_WriteUInt32(uint32 num, int digits, char padding, int base, Bool suppressZero)
 {
   if (digits == 0) {
      return;
   }
   Console_WriteUInt32(num / base, digits - 1, padding, base, TRUE);
   if (num == 0 && suppressZero) {
      if (padding) {
 	 Console_WriteChar(padding);
      }
   } else {
      uint8 digit = num % base;
      Console_WriteChar(digit >= 10 ? digit - 10 + 'A' : digit + '0');
   }
 }
 /*
 * Console_Format --
 * Console_FormatV --
 *
 *    Write a formatted string. This is for the most part a tiny
 *    subset of printf(). Supports the standard %c, %s, %d, %u,
 *    and %X specifiers.
 *
 *    Deviates from a standard printf() in a few ways, in the interest
 *    of low-level utility and small code size:
 *
 *     - Adds a nonstandard %b specifier, for binary numbers.
 *     - Width specifiers set an exact width, not a minimum width.
 *     - %x is treated as %X.
 */
 void
 Console_Format(const char *fmt, ...)
 {
   Console_FormatV(&fmt);
 }
 fastcall void
 Console_FormatV(const char **args)
 {
   char c;
   const char *fmt = *(args++);
   while ((c = *(fmt++))) {
      int width = 0;
      Bool isSigned = FALSE;
      char padding = '\0';
      if (c != '%') {
         Console_WriteChar(c);
         continue;
      }
      while ((c = *(fmt++))) {
         if (c == '0' && width == 0) {
            /* If we get a leading 0 in the width specifier, turn on zero-padding */
            padding = '0';
            continue;
         }
         if (c >= '0' && c <= '9') {
            /* Add another digit to the width specifier */
            width = (width * 10) + (c - '0');
            if (padding == '\0') {
               padding = ' ';
            }
            continue;
         }
         /*
          * Any other character means the width specifier has
          * ended. If it's still zero, set the defaults.
          */
         if (width == 0) {
            width = 32;
         }
         /*
          * Non-integer format specifiers
          */
         if (c == 's') {
            Console_WriteString((char*) *(args++));
            break;
         }
         if (c == 'c') {
            Console_WriteChar((char)(uint32) *(args++));
            break;
         }
         /*
          * Integers of different bases
          */
         int base = 0;
         if (c == 'X' || c == 'x') {
            base = 16;
         } else if (c == 'd') {
            base = 10;
            isSigned = TRUE;
         } else if (c == 'u') {
            base = 10;
         } else if (c == 'b') {
            base = 2;
         }
         if (base) {
            uint32 value = (uint32)*(args++);
            /*
             * Print the sign for negative numbers.
             */
            if (isSigned && 0 > (int32)value) {
               Console_WriteChar('-');
               width--;
               value = -value;
            }
            Console_WriteUInt32(value, width, padding, base, FALSE);
            break;
         }
         /* Unrecognized */
         Console_WriteChar(c);
         break;
      }
   }
 }
 /*
 * Console_HexDump --
 *
 *    Write a 32-bit hex dump to the console, labelling each
 *    line with addresses starting at 'startAddr'.
 */
 fastcall void
 Console_HexDump(uint32 *data, uint32 startAddr, uint32 numWords)
 {
   while (numWords) {
      int32 lineWords = 4;
      Console_Format("%08x:", startAddr);
      while (numWords && lineWords) {
         Console_Format(" %08x", *data);
         data++;
         startAddr += 4;
         numWords--;
         lineWords--;
      }
      Console_WriteChar('\n');
   }
 }
 /*
 * Console_UnhandledFault --
 *
 *    Display a fatal error message with register and stack trace when
 *    an unhandled fault occurs. This fault handler must be installed
 *    using the Intr module.
 */
 void
 Console_UnhandledFault(int vector)
 {
   IntrContext *ctx = Intr_GetContext(vector);
   /*
    * This is a static array because, with a regular inline string
    * constant, the linker can't optimize out the string when the
    * function isn't used.
    */
   static const char faultFmt[] =
      "Fatal error:\n"
      "Unhandled fault %d at %04x:%08x\n"
      "\n"
      "eax=%08x ebx=%08x ecx=%08x edx=%08x\n"
      "esi=%08x edi=%08x esp=%08x ebp=%08x\n"
      "eflags=%032b\n"
      "\n";
   Console_BeginPanic();
   /*
    * IntrContext's stack pointer includes the three values that were
    * pushed by the hardware interrupt. Advance past these, so the
    * stack trace shows the state of execution at the time of the
    * fault rather than at the time our interrupt trampoline was
    * invoked.
    */
   ctx->esp += 3 * sizeof(int);
   Console_Format(faultFmt,
                  vector, ctx->cs, ctx->eip,
                  ctx->eax, ctx->ebx, ctx->ecx, ctx->edx,
                  ctx->esi, ctx->edi, ctx->esp, ctx->ebp,
                  ctx->eflags);
   Console_HexDump((void*)ctx->esp, ctx->esp, 64);
   Console_Flush();
   Intr_Disable();
   Intr_Halt();
 }
 /*
 * Console_Panic --
 *
 *    Default panic handler. Prints a caller-defined message, and
 *    halts the machine.
 */
 void
 Console_Panic(const char *fmt, ...)
 {
   Console_BeginPanic();
   Console_WriteString("Panic:\n");
   Console_FormatV(&fmt);
   Console_Flush();
   Intr_Disable();
   Intr_Halt();
 }
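The recursive digit printing in Console_WriteUInt32 is easier to follow with a few concrete inputs. The sketch below is a hosted-C adaptation, not Metalkit code: it writes into a static buffer instead of the console, and the names gOut, out_char, write_uint32 and format_uint32 are hypothetical. The recursion itself follows the function above: more significant digits are emitted first, and a zero in a leading position becomes the padding character (or nothing, if padding is NUL).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static char gOut[80];
static unsigned gOutLen;

/* Stand-in for Console_WriteChar: append to a static buffer. */
static void
out_char(char c)
{
   assert(gOutLen < sizeof gOut - 1);
   gOut[gOutLen++] = c;
}

static void
write_uint32(uint32_t num, int digits, char padding, int base,
             int suppressZero)
{
   if (digits == 0) {
      return;
   }

   /* Emit the more significant digits first. */
   write_uint32(num / base, digits - 1, padding, base, 1);

   if (num == 0 && suppressZero) {
      /* Leading position: pad it, or skip it when padding is NUL. */
      if (padding) {
         out_char(padding);
      }
   } else {
      unsigned digit = num % base;
      out_char((char)(digit >= 10 ? digit - 10 + 'A' : digit + '0'));
   }
}

/* Format into the static buffer and return it NUL-terminated. */
const char *
format_uint32(uint32_t num, int digits, char padding, int base)
{
   memset(gOut, 0, sizeof gOut);
   gOutLen = 0;
   write_uint32(num, digits, padding, base, 0);
   return gOut;
}
```

Because the width is exact rather than a minimum, a value that needs more digits than requested loses its most significant digits: formatting 0x12345 with four hex digits yields the low four digits only.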
--- a/lib/metalkit/console.h
+++ b/lib/metalkit/console.h
@ -0,0 +1,63 @@
 /* -*- Mode: C; c-basic-offset: 3 -*-
 *
 * console.h - Abstract text console
 *
 * This file is part of Metalkit, a simple collection of modules for
 * writing software that runs on the bare metal. Get the latest code
 * at http://svn.navi.cx/misc/trunk/metalkit/
 *
 * Copyright (c) 2008-2009 Micah Dowty
 *
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated documentation
 * files (the "Software"), to deal in the Software without
 * restriction, including without limitation the rights to use,
 * copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following
 * conditions:
 *
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 * OTHER DEALINGS IN THE SOFTWARE.
 */
 #ifndef __CONSOLE_H__
 #define __CONSOLE_H__
 #include "types.h"
 typedef struct {
   fastcall void (*beginPanic)(void);       // Initialize the console for a Panic message
   fastcall void (*clear)(void);            // Clear the screen, home the cursor
   fastcall void (*moveTo)(int x, int y);   // Move the cursor
   fastcall void (*writeChar)(char c);      // Write one character, with support for control codes
   fastcall void (*flush)(void);            // Finish writing a string of characters
 } ConsoleInterface;
 extern ConsoleInterface gConsole;
 #define Console_BeginPanic()   gConsole.beginPanic()
 #define Console_Clear()        gConsole.clear()
 #define Console_MoveTo(x, y)   gConsole.moveTo(x, y)
 #define Console_WriteChar(c)   gConsole.writeChar(c)
 #define Console_Flush()        gConsole.flush()
 fastcall void Console_WriteString(const char *str);
 fastcall void Console_WriteUInt32(uint32 num, int digits, char padding, int base, Bool suppressZero);
 fastcall void Console_FormatV(const char **args);
 fastcall void Console_HexDump(uint32 *data, uint32 startAddr, uint32 numWords);
 void Console_Format(const char *fmt, ...);
 void Console_Panic(const char *fmt, ...);
 void Console_UnhandledFault(int vector);
 #endif /* __CONSOLE_H__ */
--- a/lib/metalkit/console_vga.c
+++ b/lib/metalkit/console_vga.c
@ -0,0 +1,269 @@
 /* -*- Mode: C; c-basic-offset: 3 -*-
 *
 * console_vga.c - Console driver for VGA text mode.
 *
 * This file is part of Metalkit, a simple collection of modules for
 * writing software that runs on the bare metal. Get the latest code
 * at http://svn.navi.cx/misc/trunk/metalkit/
 *
 * Copyright (c) 2008-2009 Micah Dowty
 *
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated documentation
 * files (the "Software"), to deal in the Software without
 * restriction, including without limitation the rights to use,
 * copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following
 * conditions:
 *
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 * OTHER DEALINGS IN THE SOFTWARE.
 */
 #include "types.h"
 #include "console_vga.h"
 #include "io.h"
 #include "intr.h"
 #define VGA_TEXT_FRAMEBUFFER     ((uint8*)0xB8000)
 #define VGA_CRTCREG_CURSOR_LOC_HIGH  0x0E
 #define VGA_CRTCREG_CURSOR_LOC_LOW   0x0F
 typedef struct {
   uint16 crtc_iobase;
   struct {
      int8 x, y;
   } cursor;
   int8 attr;
 } ConsoleVGAObject;
 ConsoleVGAObject gConsoleVGA[1];
 /*
 * ConsoleVGAWriteCRTC --
 *
 *    Write to a VGA CRT Control register.
 */
 static fastcall void
 ConsoleVGAWriteCRTC(uint8 addr, uint8 value)
 {
   ConsoleVGAObject *self = gConsoleVGA;
   IO_Out8(self->crtc_iobase, addr);
   IO_Out8(self->crtc_iobase + 1, value);
 }
 /*
 * ConsoleVGAMoveHardwareCursor --
 *
 *    Set the hardware cursor to the current cursor position.
 */
 static fastcall void
 ConsoleVGAMoveHardwareCursor(void)
 {
   ConsoleVGAObject *self = gConsoleVGA;
   uint16 loc = self->cursor.x + self->cursor.y * VGA_TEXT_WIDTH;
   ConsoleVGAWriteCRTC(VGA_CRTCREG_CURSOR_LOC_LOW, loc & 0xFF);
   ConsoleVGAWriteCRTC(VGA_CRTCREG_CURSOR_LOC_HIGH, loc >> 8);
 }
 /*
 * ConsoleVGAMoveTo --
 *
 *    Set the text insertion point. This will move the hardware cursor
 *    at the next Console_Flush().
 */
 static fastcall void
 ConsoleVGAMoveTo(int x, int y)
 {
   ConsoleVGAObject *self = gConsoleVGA;
   self->cursor.x = x;
   self->cursor.y = y;
 }
 /*
 * ConsoleVGAClear --
 *
 *    Clear the screen and move the cursor to the home position.
 */
 static fastcall void
 ConsoleVGAClear(void)
 {
   ConsoleVGAObject *self = gConsoleVGA;
   uint8 *fb = VGA_TEXT_FRAMEBUFFER;
   int i, j;
   ConsoleVGAMoveTo(0, 0);
   for (j = 0; j < VGA_TEXT_HEIGHT; j++) {
      for (i = 0; i < VGA_TEXT_WIDTH; i++) {
         fb[0] = ' ';
         fb[1] = self->attr;
         fb += 2;
      }
   }
 }
 /*
 * ConsoleVGA_SetColor --
 *
 *    Set the text foreground color.
 */
 fastcall void
 ConsoleVGA_SetColor(int8 fgColor)
 {
   ConsoleVGAObject *self = gConsoleVGA;
   self->attr &= 0xF0;
   self->attr |= fgColor;
 }
 /*
 * ConsoleVGA_SetBgColor --
 *
 *    Set the text background color.
 */
 fastcall void
 ConsoleVGA_SetBgColor(int8 bgColor)
 {
   ConsoleVGAObject *self = gConsoleVGA;
   self->attr &= 0x0F;
   self->attr |= bgColor << 4;
 }
 /*
 * ConsoleVGAWriteChar --
 *
 *    Write one character, TTY-style. Interprets \n, \t, and \b
 *    characters.
 */
 static fastcall void
 ConsoleVGAWriteChar(char c)
 {
   ConsoleVGAObject *self = gConsoleVGA;
   uint8 *fb = VGA_TEXT_FRAMEBUFFER;
   if (c == '\n') {
      self->cursor.y++;
      self->cursor.x = 0;
   } else if (c == '\t') {
      while (self->cursor.x & 7) {
         ConsoleVGAWriteChar(' ');
      }
   } else if (c == '\b') {
      if (self->cursor.x > 0) {
         self->cursor.x--;
         ConsoleVGAWriteChar(' ');
         self->cursor.x--;
      }
   } else {
      fb += self->cursor.x * 2 + self->cursor.y * VGA_TEXT_WIDTH * 2;
      fb[0] = c;
      fb[1] = self->attr;
      self->cursor.x++;
   }
   if (self->cursor.x >= VGA_TEXT_WIDTH) {
      self->cursor.x = 0;
      self->cursor.y++;
   }
   if (self->cursor.y >= VGA_TEXT_HEIGHT) {
      int i;
      uint8 *fb = VGA_TEXT_FRAMEBUFFER;
      const uint32 scrollSize = VGA_TEXT_WIDTH * 2 * (VGA_TEXT_HEIGHT - 1);
      self->cursor.y = VGA_TEXT_HEIGHT - 1;
      memcpy(fb, fb + VGA_TEXT_WIDTH * 2, scrollSize);
      fb += scrollSize;
      for (i = 0; i < VGA_TEXT_WIDTH; i++) {
         fb[0] = ' ';
         fb[1] = self->attr;
         fb += 2;
      }
   }
 }
 /*
 * ConsoleVGABeginPanic --
 *
 *    Prepare for a panic in VGA mode: Set up the panic colors,
 *    and clear the screen.
 */
 static fastcall void
 ConsoleVGABeginPanic(void)
 {
   ConsoleVGA_SetColor(VGA_COLOR_WHITE);
   ConsoleVGA_SetBgColor(VGA_COLOR_RED);
   ConsoleVGAClear();
   ConsoleVGAMoveHardwareCursor();
 }
 /*
 * ConsoleVGA_Init --
 *
 *    Perform first-time initialization for VGA text mode,
 *    set VGA as the current console driver, and clear the
 *    screen with a default color.
 */
 fastcall void
 ConsoleVGA_Init(void)
 {
   ConsoleVGAObject *self = gConsoleVGA;
   /*
    * Read the I/O address select bit, to determine where the CRTC
    * registers are.
    */
   if (IO_In8(0x3CC) & 1) {
      self->crtc_iobase = 0x3D4;
   } else {
      self->crtc_iobase = 0x3B4;
   }
   gConsole.beginPanic = ConsoleVGABeginPanic;
   gConsole.clear = ConsoleVGAClear;
   gConsole.moveTo = ConsoleVGAMoveTo;
   gConsole.writeChar = ConsoleVGAWriteChar;
   gConsole.flush = ConsoleVGAMoveHardwareCursor;
   ConsoleVGA_SetColor(VGA_COLOR_WHITE);
   ConsoleVGA_SetBgColor(VGA_COLOR_BLUE);
   ConsoleVGAClear();
   ConsoleVGAMoveHardwareCursor();
 }
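The arithmetic used by the VGA console driver is small enough to check by hand. In text mode each character cell is two bytes (character, then attribute), the attribute byte holds the background color in its high nibble and the foreground color in its low nibble, and the hardware cursor location is a cell index split across two CRTC registers. The hosted-C sketch below restates those three calculations; the names vga_cell_offset, vga_attr and vga_cursor_bytes are hypothetical, and nothing here touches real hardware.

```c
#include <assert.h>
#include <stdint.h>

#define TEXT_WIDTH   80
#define TEXT_HEIGHT  25

/*
 * Byte offset of the character cell at (x, y) from the start of the
 * text framebuffer. The attribute byte lives at offset + 1.
 */
unsigned
vga_cell_offset(int x, int y)
{
   assert(x >= 0 && x < TEXT_WIDTH);
   assert(y >= 0 && y < TEXT_HEIGHT);
   return (unsigned)(x + y * TEXT_WIDTH) * 2;
}

/* Attribute byte: background in the high nibble, foreground in the low. */
uint8_t
vga_attr(uint8_t fg, uint8_t bg)
{
   return (uint8_t)((bg << 4) | (fg & 0x0F));
}

/*
 * Cursor location as written to the CRTC: a cell index (not a byte
 * offset) split into a high byte and a low byte.
 */
void
vga_cursor_bytes(int x, int y, uint8_t *high, uint8_t *low)
{
   unsigned loc = vga_cell_offset(x, y) / 2;
   *high = (uint8_t)(loc >> 8);
   *low = (uint8_t)(loc & 0xFF);
}
```

With the color constants from console_vga.h, white on blue (the default set by ConsoleVGA_Init) is vga_attr(15, 1), and white on red (the panic colors) is vga_attr(15, 4).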
--- a/lib/metalkit/console_vga.h
+++ b/lib/metalkit/console_vga.h
@ -0,0 +1,63 @@
 /* -*- Mode: C; c-basic-offset: 3 -*-
 *
 * console_vga.h - Console driver for VGA text mode.
 *
 * This file is part of Metalkit, a simple collection of modules for
 * writing software that runs on the bare metal. Get the latest code
 * at http://svn.navi.cx/misc/trunk/metalkit/
 *
 * Copyright (c) 2008-2009 Micah Dowty
 *
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated documentation
 * files (the "Software"), to deal in the Software without
 * restriction, including without limitation the rights to use,
 * copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following
 * conditions:
 *
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 * OTHER DEALINGS IN THE SOFTWARE.
 */
 #ifndef __CONSOLE_VGA_H__
 #define __CONSOLE_VGA_H__
 #include "types.h"
 #include "console.h"
 #define VGA_COLOR_BLACK          0
 #define VGA_COLOR_BLUE           1
 #define VGA_COLOR_GREEN          2
 #define VGA_COLOR_CYAN           3
 #define VGA_COLOR_RED            4
 #define VGA_COLOR_MAGENTA        5
 #define VGA_COLOR_BROWN          6
 #define VGA_COLOR_LIGHT_GRAY     7
 #define VGA_COLOR_DARK_GRAY      8
 #define VGA_COLOR_LIGHT_BLUE     9
 #define VGA_COLOR_LIGHT_GREEN    10
 #define VGA_COLOR_LIGHT_CYAN     11
 #define VGA_COLOR_LIGHT_RED      12
 #define VGA_COLOR_LIGHT_MAGENTA  13
 #define VGA_COLOR_YELLOW         14
 #define VGA_COLOR_WHITE          15
 #define VGA_TEXT_WIDTH           80
 #define VGA_TEXT_HEIGHT          25
 fastcall void ConsoleVGA_Init(void);
 fastcall void ConsoleVGA_SetColor(int8 fgColor);
 fastcall void ConsoleVGA_SetBgColor(int8 bgColor);
 #endif /* __CONSOLE_VGA_H__ */
--- a/lib/metalkit/datafile.h
+++ b/lib/metalkit/datafile.h
@ -0,0 +1,71 @@
 /* -*- Mode: C; c-basic-offset: 3 -*-
 *
 * datafile.h - Macros for using raw data files included via objcopy.
 *
 * This file is part of Metalkit, a simple collection of modules for
 * writing software that runs on the bare metal. Get the latest code
 * at http://svn.navi.cx/misc/trunk/metalkit/
 *
 * Copyright (c) 2008-2009 Micah Dowty
 *
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated documentation
 * files (the "Software"), to deal in the Software without
 * restriction, including without limitation the rights to use,
 * copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following
 * conditions:
 *
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 * OTHER DEALINGS IN THE SOFTWARE.
 */
 #ifndef __DATAFILE_H__
 #define __DATAFILE_H__
 #include "types.h"
 #include "puff.h"
 typedef struct DataFile {
   uint8 *ptr;
   uint32 size;
 } DataFile;
 #define DECLARE_DATAFILE(symbol, filename)          \
   extern uint8 _binary_ ## filename ## _start[];   \
   extern uint8 _binary_ ## filename ## _size[];    \
   static const DataFile symbol[1] = {{             \
      (uint8*) _binary_ ## filename ## _start,      \
      (uint32) _binary_ ## filename ## _size,       \
   }}
 static inline uint32
 DataFile_Decompress(const DataFile *f, void *buffer, uint32 bufferSize)
 {
   unsigned long sourcelen = f->size;
   unsigned long destlen = bufferSize;
   if (puff(buffer, &destlen, f->ptr, &sourcelen)) {
      asm volatile ("int3");
   }
   return destlen;
 }
 static inline uint32
 DataFile_GetDecompressedSize(const DataFile *f)
 {
   return DataFile_Decompress(f, NULL, 0);
 }
 #endif /* __DATAFILE_H__ */
--- a/lib/metalkit/deflate.py
+++ b/lib/metalkit/deflate.py
@ -0,0 +1,25 @@
 #!/usr/bin/env python
 #
 # A simple Python script to compress data
 # with zlib's DEFLATE algorithm at build time.
 #
 import zlib, sys
 level = 9
 input = sys.stdin.read()
 zData = zlib.compress(input, level)
 # Strip off the zlib header, and return the raw DEFLATE data stream.
 # See the zlib RFC: http://www.gzip.org/zlib/rfc-zlib.html
 cmf = ord(zData[0])
 flg = ord(zData[1])
 assert (cmf & 0x0F) == 8   # DEFLATE algorithm
 assert (flg & 0x20) == 0   # No preset dictionary
 # Strip off 2-byte header and 4-byte checksum
 rawData = zData[2:len(zData)-4]
 sys.stdout.write(rawData)
--- a/lib/metalkit/gcc_support.c
+++ b/lib/metalkit/gcc_support.c
@ -0,0 +1,48 @@
 /* -*- Mode: C; c-basic-offset: 3 -*-
 *
 * gcc_support.c - Older versions of GCC will call functions for
 *                 common operations like memcpy/memset instead of
 *                 using compiler intrinsics. This file provides
 *                 non-inlined memcpy/memset functions for this
 *                 purpose, and it's a good place to put any other
 *                 compiler-specific functionality.
 *
 * This file is part of Metalkit, a simple collection of modules for
 * writing software that runs on the bare metal. Get the latest code
 * at http://svn.navi.cx/misc/trunk/metalkit/
 *
 * Copyright (c) 2008-2009 Micah Dowty
 *
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated documentation
 * files (the "Software"), to deal in the Software without
 * restriction, including without limitation the rights to use,
 * copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following
 * conditions:
 *
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 * OTHER DEALINGS IN THE SOFTWARE.
 */
 void
 memcpy(void *dest, const void *src, unsigned long size)
 {
   asm volatile ("cld; rep movsb" : "+c" (size), "+S" (src), "+D" (dest) :: "memory");
 }
 void
 memset(void *dest, unsigned char value, unsigned long size)
 {
   asm volatile ("cld; rep stosb" : "+c" (size), "+D" (dest) : "a" (value) : "memory");
 }
--- a/lib/metalkit/image.ld
+++ b/lib/metalkit/image.ld
@ -0,0 +1,110 @@
 /*
 * GNU Linker script for assembling a Metalkit binary image.
 *
 * Notable changes from ld's default behaviour:
 *
 *   - Load address is at the 1MB boundary.
 *
 *   - Our binary begins with a .boot section.
 *
 *   - The end of the data section is padded to a
 *     512-byte boundary, to make sure that our disk
 *     image ends on a sector boundary. (Required by QEMU)
 *
 *   - We calculate a few auxiliary values used by the
 *     bootloader, which depend on knowing the size of
 *     the entire binary.
 */
 OUTPUT_FORMAT("elf32-i386", "elf32-i386", "elf32-i386")
 OUTPUT_ARCH(i386)
 ENTRY(_start)
 /*
 * Stack starts at the top of the usable portion of the first 1MB, and
 * grows downward.
 */
 _stack = 0x9fffc;
 SECTIONS
 {
   . = 0x100000;
   .text : {
      _file_origin = .;
      *(.boot);
      *(.text .text.*);
    }
   .data : {
      *(.rodata .rodata.* .data .data.*)
      _edata = .;
      _sector_padding = .;
      . = ALIGN(512);
      _sector_padding_end = .;
   }
   .bss : {
      __bss_start = .;
      *(.bss .bss.*);
   }
   _end = .;
   /DISCARD/ : {
      *(.note .note.* .comment .comment.*);
   }
 }
 _bss_size = _end - _edata;
 _image_size = _edata - _file_origin;
 /*
 * Disk geometry. CHS geometry is mostly irrelevant these days, so we
 * just pick something that will make fdisk happy. It tries to
 * autodetect the disk size by looking at the disk's existing
 * partitions, so the easiest way to keep it happy is to align the
 * partition to a cylinder boundary.
 *
 * We'd like to use a floppy-disk-compatible geometry for images that
 * are small enough to fit on a 1.44 MB disk, but for larger images we
 * need to use a bigger geometry so that our cylinder numbers can fit
 * in 10 bits. This larger geometry has 1 megabyte cylinders, so we
 * can address 1 GB without breaking the 10 bit boundary.
 */
 _geom_large_disk = _image_size >= (2880 * 512);
 _geom_sectors_per_head = _geom_large_disk ? 32 : 18;
 _geom_heads_per_cylinder = _geom_large_disk ? 64 : 2;
 _geom_sectors_per_cylinder = _geom_sectors_per_head * _geom_heads_per_cylinder;
 /*
 * Partition is just big enough to hold our initialized data, rounded
 * up to the nearest cylinder. The "_partition_chs_cylinder" is the
 * number of the last cylinder in the partition. Also note that
 * sector numbers are 1-based.
 */
 _image_sectors = (_image_size + 511) / 512;
 _partition_chs_cylinder = _image_sectors / _geom_sectors_per_cylinder;
 _partition_blocks = (_partition_chs_cylinder + 1) * _geom_sectors_per_cylinder;
 _partition_chs_head = _geom_heads_per_cylinder - 1;
 _partition_chs_sector = _geom_sectors_per_head;
 /*
 * Encode the sector and cylinder bytes in the format expected by MBR
 * partition tables.
 */
 _partition_chs_cylinder_byte = _partition_chs_cylinder & 0xff;
 _partition_chs_sector_byte = _partition_chs_sector |
   ((_partition_chs_cylinder - _partition_chs_cylinder_byte) >> 2);
 /*
 * Split up the LDT address into byte-wide chunks, so we can write it
 * into the GDT at link time. We can't do this entirely in boot.S,
 * because the LDT address isn't contiguous in the GDT.
 */
 _ldt_byte0 = (LDT >> 0) & 0xff;
 _ldt_byte1 = (LDT >> 8) & 0xff;
 _ldt_byte2 = (LDT >> 16) & 0xff;
 _ldt_byte3 = (LDT >> 24) & 0xff;
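To see what the geometry and partition expressions in the linker script evaluate to, the same arithmetic can be restated in hosted C. This is a sketch for checking values by hand, not a replacement for the link-time calculation: the PartitionGeometry type and partition_geometry function are hypothetical names, and the image size is passed in as a parameter rather than derived from _edata and _file_origin.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
   uint32_t sectorsPerHead;
   uint32_t headsPerCylinder;
   uint32_t lastCylinder;     /* Number of the last cylinder */
   uint32_t blocks;           /* Partition size in 512-byte sectors */
   uint8_t  chsHead;
   uint8_t  chsSectorByte;    /* Sector plus cylinder bits 8-9 */
   uint8_t  chsCylinderByte;  /* Cylinder bits 0-7 */
} PartitionGeometry;

PartitionGeometry
partition_geometry(uint32_t imageSize)
{
   PartitionGeometry g;
   uint32_t sectorsPerCylinder, imageSectors;

   /* Floppy geometry for small images, 1 MB cylinders otherwise. */
   int large = imageSize >= 2880u * 512u;
   g.sectorsPerHead = large ? 32 : 18;
   g.headsPerCylinder = large ? 64 : 2;
   sectorsPerCylinder = g.sectorsPerHead * g.headsPerCylinder;

   /* Round the partition up to a whole number of cylinders. */
   imageSectors = (imageSize + 511) / 512;
   g.lastCylinder = imageSectors / sectorsPerCylinder;
   assert(g.lastCylinder < 1024);   /* Must fit in 10 bits */
   g.blocks = (g.lastCylinder + 1) * sectorsPerCylinder;

   /* Last block: highest head, highest (1-based) sector. */
   g.chsHead = (uint8_t)(g.headsPerCylinder - 1);
   g.chsCylinderByte = (uint8_t)(g.lastCylinder & 0xff);

   /*
    * Cylinder bits 8-9 go into bits 6-7 of the sector byte. For
    * cylinders below 1024 this equals the linker script's
    * (cylinder - cylinder_byte) >> 2.
    */
   g.chsSectorByte = (uint8_t)(g.sectorsPerHead |
                               ((g.lastCylinder >> 8) << 6));
   return g;
}
```

For a 100000-byte image this gives the floppy geometry, with last cylinder 5 and a 216-sector partition. A 300 MB image uses the large geometry and ends on cylinder 300, so the two high cylinder bits spill into the sector byte.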
--- a/lib/metalkit/intr.c
+++ b/lib/metalkit/intr.c
@ -0,0 +1,419 @@
 /* -*- Mode: C; c-basic-offset: 3 -*-
 *
 * intr.c - Interrupt vector management, interrupt routing,
 *          and low-level building blocks for multithreading.
 *
 * This file is part of Metalkit, a simple collection of modules for
 * writing software that runs on the bare metal. Get the latest code
 * at http://svn.navi.cx/misc/trunk/metalkit/
 *
 * Copyright (c) 2008-2009 Micah Dowty
 *
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated documentation
 * files (the "Software"), to deal in the Software without
 * restriction, including without limitation the rights to use,
 * copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following
 * conditions:
 *
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 * OTHER DEALINGS IN THE SOFTWARE.
 */
 #include "intr.h"
 #include "boot.h"
 #include "io.h"
 /*
 * Definitions for the two PIC chips.
 */
 #define PIC1_COMMAND_PORT  0x20
 #define PIC1_DATA_PORT     0x21
 #define PIC2_COMMAND_PORT  0xA0
 #define PIC2_DATA_PORT     0xA1
 /*
 * IDT table and IDT table descriptor. The table itself lives in the
 * BSS segment, the descriptor lives in the data segment.
 */
 typedef union {
   struct {
      uint16 offsetLow;
      uint16 segment;
      uint16 flags;
      uint16 offsetHigh;
   };
   struct {
      uint32 offsetLowSeg;
      uint32 flagsOffsetHigh;
   };
 } PACKED IDTType;
 /*
 * Note the IDT is page-aligned. Only 8-byte alignment is actually
 * necessary, though page alignment may help performance in some
 * environments.
 */
 static IDTType ALIGNED(4096) IDT[NUM_INTR_VECTORS];
 const struct {
   uint16 limit;
   void *address;
 } PACKED IDTDesc = {
   .limit = NUM_INTR_VECTORS * 8 - 1,
   .address = IDT,
 };
 /*
 * To save space, we don't include assembly-language trampolines for
 * each interrupt vector. Instead, we allocate a table in the BSS
 * segment which we can fill in at runtime with simple trampoline
 * functions. This structure actually describes executable 32-bit
 * code.
 */
 typedef struct {
   uint16      code1;
   uint32      arg;
   uint8       code2;
   IntrHandler handler;
   uint32      code3;
   uint32      code4;
   uint32      code5;
   uint32      code6;
   uint32      code7;
   uint32      code8;
 } PACKED IntrTrampolineType;
 static IntrTrampolineType ALIGNED(4) IntrTrampoline[NUM_INTR_VECTORS];
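Because this structure is really machine code, it can be useful to confirm on a host that the packed layout and the code1..code8 constants used in Intr_Init produce the byte sequence listed in that function's comment. The sketch below is an illustration only, under several assumptions: a little-endian host, a compiler that accepts GCC-style __attribute__((packed)), and a handler field stored as a 32-bit integer because host pointers may be 64 bits wide. HostTrampoline, FillTrampoline and TrampolineMatchesListing are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Host-side copy of IntrTrampolineType, with a 32-bit handler field. */
typedef struct {
   uint16_t code1;
   uint32_t arg;
   uint8_t  code2;
   uint32_t handler;
   uint32_t code3, code4, code5, code6, code7, code8;
} __attribute__((packed)) HostTrampoline;

/* Fill in one trampoline with the same constants Intr_Init uses. */
static void
FillTrampoline(HostTrampoline *t, uint32_t arg, uint32_t handler)
{
   assert(sizeof *t == 35);   /* 2 + 4 + 1 + 4 + 6 * 4 bytes, no padding */
   t->code1 = 0x6860;
   t->arg = arg;
   t->code2 = 0xb8;
   t->handler = handler;
   t->code3 = 0x8b58d0ff;
   t->code4 = 0x8d0c247c;
   t->code5 = 0x83282474;
   t->code6 = 0xa5fd08c7;
   t->code7 = 0x8b61a5a5;
   t->code8 = 0xcfec2464;
}

/* Return 1 if the filled structure matches the documented byte listing. */
static int
TrampolineMatchesListing(uint32_t arg, uint32_t handler)
{
   static const uint8_t tail[] = {
      0xff, 0xd0,              /* call  *%eax           */
      0x58,                    /* pop   %eax            */
      0x8b, 0x7c, 0x24, 0x0c,  /* mov   12(%esp), %edi  */
      0x8d, 0x74, 0x24, 0x28,  /* lea   40(%esp), %esi  */
      0x83, 0xc7, 0x08,        /* add   $8, %edi        */
      0xfd,                    /* std                   */
      0xa5, 0xa5, 0xa5,        /* movsl x3              */
      0x61,                    /* popa                  */
      0x8b, 0x64, 0x24, 0xec,  /* mov   -20(%esp), %esp */
      0xcf,                    /* iret                  */
   };
   HostTrampoline t;
   const uint8_t *bytes = (const uint8_t *) &t;

   FillTrampoline(&t, arg, handler);
   return bytes[0] == 0x60 &&                     /* pusha         */
          bytes[1] == 0x68 &&                     /* push <arg>    */
          memcmp(bytes + 2, &arg, 4) == 0 &&
          bytes[6] == 0xb8 &&                     /* mov <addr>, %eax */
          memcmp(bytes + 7, &handler, 4) == 0 &&
          memcmp(bytes + 11, tail, sizeof tail) == 0;
}
```

A check like this only validates the encoding of the constants. It does not execute the trampoline, which needs a 32-bit protected-mode environment.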
 /*
 * IntrDefaultHandler --
 *
 *    Default no-op interrupt handler.
 */
 static void
 IntrDefaultHandler(int vector)
 {
   /* Do nothing. */
 }
 /*
 * Intr_Init --
 *
 *    Initialize the interrupt descriptor table and the programmable
 *    interrupt controller (PIC). On return, interrupts are enabled
 *    but all handlers are no-ops.
 */
 fastcall void
 Intr_Init(void)
 {
   int i;
   Intr_Disable();
   IDTType *idt = IDT;
   IntrTrampolineType *tramp = IntrTrampoline;
   for (i = 0; i < NUM_INTR_VECTORS; i++) {
      uint32 trampolineAddr = (uint32) tramp;
      /*
       * Set up the IDT entry as a 32-bit interrupt gate, pointing at
       * our trampoline for this vector. Fill in the IDT with two 32-bit
       * writes, since GCC generates significantly smaller code for this
       * than when writing four 16-bit fields separately.
       */
      idt->offsetLowSeg = (trampolineAddr & 0x0000FFFF) | (BOOT_CODE_SEG << 16);
      idt->flagsOffsetHigh = (trampolineAddr & 0xFFFF0000) | 0x00008E00;
      /*
       * Set up the trampoline, pointing it at the default handler.
       * The trampoline function wraps our C interrupt handler, and
       * handles placing a vector number onto the stack. It also allows
       * interrupt handlers to switch stacks upon return by writing
       * to the saved 'esp' register.
       *
       * Note that the old stack and new stack may actually be different
       * stack frames on the same stack. We require that the new stack
       * is in a higher or equal stack frame, but the two stacks may
       * overlap. This is why the trampoline does its copy in reverse.
       *
       * Keep the trampoline function consistent with the definition
       * of IntrContext in intr.h.
       *
       * Stack layout:
       *
       *     8   eflags
       *     4   cs
       *     0   eip        <- esp on entry to IRQ handler
       *    -4   eax
       *    -8   ecx
       *   -12   edx
       *   -16   ebx
       *   -20   esp
       *   -24   ebp
       *   -28   esi
       *   -32   edi
       *   -36   <arg>      <- esp on entry to handler function
       *
       * Our trampolines each look like:
       *
       *    60                 pusha                   // Save general-purpose regs
       *    68 <32-bit arg>    push   <arg>            // Call handler(arg)
       *    b8 <32-bit addr>   mov    <addr>, %eax
       *    ff d0              call   *%eax
       *    58                 pop    %eax             // Remove arg from stack
       *    8b 7c 24 0c        mov    12(%esp), %edi   // Load new stack address
       *    8d 74 24 28        lea    40(%esp), %esi   // Addr of eflags on old stack
       *    83 c7 08           add    $8, %edi         // Addr of eflags on new stack
       *    fd                 std                     // Copy backwards
       *    a5                 movsl                   // Copy eflags
       *    a5                 movsl                   // Copy cs
       *    a5                 movsl                   // Copy eip
       *    61                 popa                    // Restore general-purpose regs
       *    8b 64 24 ec        mov    -20(%esp), %esp  // Switch stacks
       *    cf                 iret                    // Restore eip, cs, eflags
       *
       * Note: Surprisingly enough, it's actually more size-efficient to initialize
       * the structure in code like this than it is to memcpy() the trampoline from
       * a template in the data segment.
       */
      tramp->code1 = 0x6860;
      tramp->code2 = 0xb8;
      tramp->code3 = 0x8b58d0ff;
      tramp->code4 = 0x8d0c247c;
      tramp->code5 = 0x83282474;
      tramp->code6 = 0xa5fd08c7;
      tramp->code7 = 0x8b61a5a5;
      tramp->code8 = 0xcfec2464;
      tramp->handler = IntrDefaultHandler;
      tramp->arg = i;
      idt++;
      tramp++;
   }
   asm volatile ("lidt IDTDesc");
   typedef struct {
      uint8 port, data;
   } PortData8;
   static const PortData8 pitInit[] = {
      /*
       * Program the PIC to map all IRQs linearly starting at
       * IRQ_VECTOR_BASE.
       */
      { PIC1_COMMAND_PORT, 0x11 },       // Begin init, use 4 command words
      { PIC2_COMMAND_PORT, 0x11 },
      { PIC1_DATA_PORT, IRQ_VECTOR_BASE },
      { PIC2_DATA_PORT, IRQ_VECTOR_BASE + 8 },
      { PIC1_DATA_PORT, 0x04 },
      { PIC2_DATA_PORT, 0x02 },
      { PIC1_DATA_PORT, 0x03 },          // 8086 mode, auto-end-of-interrupt.
      { PIC2_DATA_PORT, 0x03 },
      /*
       * All IRQs start out masked, except for IRQ 2 (the cascade
       * input from the second PIC) and IRQ 4.
       */
      { PIC1_DATA_PORT, 0xEB },
      { PIC2_DATA_PORT, 0xFF },
   };
   const PortData8 *p = pitInit;
   for (i = arraysize(pitInit); i; i--, p++) {
      IO_Out8(p->port, p->data);
   }
   Intr_Enable();
 }
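The IDT entry packing done with two 32-bit writes in Intr_Init can be illustrated in isolation. This is a minimal host-side sketch rather than Metalkit code: HostGate, MakeInterruptGate and the three accessor functions are hypothetical names, and the selector value 0x08 used in the note below is only an example, since the real BOOT_CODE_SEG is defined in boot.h.

```c
#include <assert.h>
#include <stdint.h>

/* The two 32-bit words of one IDT entry, as written by Intr_Init. */
typedef struct {
   uint32_t offsetLowSeg;     /* offset[15:0] | selector << 16 */
   uint32_t flagsOffsetHigh;  /* flags[15:0]  | offset[31:16]  */
} HostGate;

/* Build a present, DPL 0, 32-bit interrupt gate (flags 0x8E00). */
static HostGate
MakeInterruptGate(uint32_t handlerAddr, uint16_t codeSeg)
{
   HostGate g;

   assert(sizeof g == 8);     /* Each IDT entry is 8 bytes. */
   g.offsetLowSeg = (handlerAddr & 0x0000FFFFu) | ((uint32_t) codeSeg << 16);
   g.flagsOffsetHigh = (handlerAddr & 0xFFFF0000u) | 0x00008E00u;
   return g;
}

/* Recover the individual fields, to show the packing is lossless. */
static uint32_t
GateOffset(HostGate g)
{
   return (g.offsetLowSeg & 0x0000FFFFu) | (g.flagsOffsetHigh & 0xFFFF0000u);
}

static uint16_t
GateSegment(HostGate g)
{
   return (uint16_t) (g.offsetLowSeg >> 16);
}

static uint16_t
GateFlags(HostGate g)
{
   return (uint16_t) (g.flagsOffsetHigh & 0x0000FFFFu);
}
```

With a handler address of 0x12345678 and an example selector of 0x08, the two words come out as 0x00085678 and 0x12348E00.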
 /*
 * Intr_SetHandler --
 *
 *    Set a C-language interrupt handler for a particular vector.
 *    Note that the argument is a vector number, not an IRQ.
 */
 fastcall void
 Intr_SetHandler(int vector, IntrHandler handler)
 {
   IntrTrampoline[vector].handler = handler;
 }
 /*
 * Intr_SetMask --
 *
 *    (Un)mask a particular IRQ.
 */
 fastcall void
 Intr_SetMask(int irq, Bool enable)
 {
   uint8 port, bit, mask;
   if (irq >= 8) {
      bit = 1 << (irq - 8);
      port = PIC2_DATA_PORT;
   } else {
      bit = 1 << irq;
      port = PIC1_DATA_PORT;
   }
   mask = IO_In8(port);
   /* A '1' bit in the mask inhibits the interrupt. */
   if (enable) {
      mask &= ~bit;
   } else {
      mask |= bit;
   }
   IO_Out8(port, mask);
 }
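The read-modify-write performed by Intr_SetMask can be separated from the port I/O so that the bit arithmetic is testable on a host. The following sketch assumes nothing beyond the port numbers defined at the top of intr.c. MaskUpdate and ComputeMaskUpdate are hypothetical names, and the current mask values are passed in as arguments instead of being read with IO_In8.

```c
#include <assert.h>
#include <stdint.h>

/* Which data port to write, and the new mask value to write there. */
typedef struct {
   uint16_t port;
   uint8_t  mask;
} MaskUpdate;

/*
 * Compute the port write needed to (un)mask one IRQ, given the
 * current mask registers of both PICs. A '1' bit inhibits the IRQ.
 */
static MaskUpdate
ComputeMaskUpdate(int irq, int enable, uint8_t pic1Mask, uint8_t pic2Mask)
{
   MaskUpdate u;
   uint8_t bit;

   assert(irq >= 0 && irq < 16);

   if (irq >= 8) {
      bit = (uint8_t) (1 << (irq - 8));
      u.port = 0xA1;          /* PIC2_DATA_PORT */
      u.mask = pic2Mask;
   } else {
      bit = (uint8_t) (1 << irq);
      u.port = 0x21;          /* PIC1_DATA_PORT */
      u.mask = pic1Mask;
   }

   if (enable) {
      u.mask &= (uint8_t) ~bit;
   } else {
      u.mask |= bit;
   }
   return u;
}
```

Starting from the initial masks written by Intr_Init (0xEB and 0xFF), enabling IRQ 1 results in a write of 0xE9 to port 0x21, and enabling IRQ 12 results in a write of 0xEF to port 0xA1.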
 /*
 * Intr_SetFaultHandlers --
 *
 *    Set all processor fault handlers to the provided function.
 */
 fastcall void
 Intr_SetFaultHandlers(IntrHandler handler)
 {
   int vector;
   for (vector = 0; vector < NUM_FAULT_VECTORS; vector++) {
      Intr_SetHandler(vector, handler);
   }
 }
 /*
 * Intr_InitContext --
 *
 *    Create an IntrContext representing a brand new thread of
 *    execution. This can be used as a primitive to implement
 *    light-weight cooperative or pre-emptive multithreading.
 *
 *    'Stack' points to the initial value of the stack pointer.
 *    Stacks grow downward, so this should point to the top word of
 *    the allocated stack memory.
 */
 fastcall void
 Intr_InitContext(IntrContext *ctx, uint32 *stack, IntrContextFn main)
 {
   Intr_SaveContext(ctx);
   ctx->esp = (uint32) stack;
   ctx->eip = (uint32) main;
 }
 /*
 * Intr_SaveContext --
 *
 *    This is a C-callable function which constructs an
 *    IntrContext representing the current execution
 *    context. This is nearly equivalent to invoking a
 *    software interrupt and saving the interrupt's
 *    IntrContext, but this implementation doesn't have the
 *    overhead of an actual interrupt invocation.
 */
 asm (".global Intr_SaveContext \n Intr_SaveContext:"
     "pusha \n"
     /*
      * Adjust the saved stack pointer. IntrContexts always
      * store an %esp which has three words on the stack
      * prior to the general-purpose regs, but since we don't
      * use cs or eflags we only have 1.
      */
     "sub     $8, 12(%esp) \n"
     /*
      * The stack now matches the layout of the first 9 words
      * of IntrContext. Copy these, then manually save CS and
      * eflags.
      */
     "mov     %esp, %esi \n"
     "mov     36(%esp), %edi \n"
     "mov     $9, %ecx \n"
     "rep movsl \n"
     "xor     %eax, %eax \n"
     "mov     %cs, %ax \n"
     "stosl \n"
     "pushf \n"
     "pop     %eax \n"
     "stosl \n"
     /* Return 0 when this function is called directly. */
     "popa \n"
     "xor     %eax, %eax \n"
     "ret" );
 /*
 * Intr_RestoreContext --
 *
 *    This is the inverse of Intr_SaveContext: copy the
 *    IntrContext onto the target context's stack frame,
 *    switch stacks, then restore the rest of the context's
 *    saved state.
 */
 asm(".global Intr_RestoreContext \n Intr_RestoreContext:"
    "mov     4(%esp), %esi \n"   // Load pointer to IntrContext
    "mov     12(%esi), %esp \n"  // Switch stacks
    /*
     * esp was saved with 3 words on the stack (eip, cs, eflags).
     * Position esp so we have 9 words instead. (General purpose
     * regs plus eip, but no cs/eflags.)
     */
    "sub     $24, %esp \n"
    // Copy the first 9 words of IntrContext back onto the stack.
    "mov     %esp, %edi \n"
    "mov     $9, %ecx \n"
    "rep movsl \n"
    // Restore the general purpose regs and eip
    "popa \n"
    "ret" );
--- a/lib/metalkit/intr.h
+++ b/lib/metalkit/intr.h
@ -0,0 +1,176 @@
 /* -*- Mode: C; c-basic-offset: 3 -*-
 *
 * intr.h - Interrupt vector management and interrupt routing.
 *
 * This file is part of Metalkit, a simple collection of modules for
 * writing software that runs on the bare metal. Get the latest code
 * at http://svn.navi.cx/misc/trunk/metalkit/
 *
 * Copyright (c) 2008-2009 Micah Dowty
 *
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated documentation
 * files (the "Software"), to deal in the Software without
 * restriction, including without limitation the rights to use,
 * copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following
 * conditions:
 *
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 * OTHER DEALINGS IN THE SOFTWARE.
 */
 #ifndef __INTR_H__
 #define __INTR_H__
 #include "types.h"
 #define NUM_INTR_VECTORS    256
 #define NUM_FAULT_VECTORS   0x20
 #define NUM_IRQ_VECTORS     0x10
 #define IRQ_VECTOR_BASE     NUM_FAULT_VECTORS
 #define IRQ_VECTOR(irq)     ((irq) + IRQ_VECTOR_BASE)
 #define USER_VECTOR_BASE    (IRQ_VECTOR_BASE + NUM_IRQ_VECTORS)
 #define USER_VECTOR(n)      ((n) + USER_VECTOR_BASE)
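The macros above divide the 256 vectors into three consecutive ranges: processor faults, hardware IRQs, and user-defined vectors. As a small illustration, the sketch below repeats those definitions so it compiles on its own and adds a classifier. The VectorClass enum and the ClassifyVector function are hypothetical and do not exist in Metalkit.

```c
#include <assert.h>

/* Repeated from intr.h so that this sketch is self-contained. */
#define NUM_INTR_VECTORS    256
#define NUM_FAULT_VECTORS   0x20
#define NUM_IRQ_VECTORS     0x10
#define IRQ_VECTOR_BASE     NUM_FAULT_VECTORS
#define IRQ_VECTOR(irq)     ((irq) + IRQ_VECTOR_BASE)
#define USER_VECTOR_BASE    (IRQ_VECTOR_BASE + NUM_IRQ_VECTORS)
#define USER_VECTOR(n)      ((n) + USER_VECTOR_BASE)

typedef enum {
   VECTOR_FAULT,   /* 0x00 - 0x1F: processor faults and exceptions */
   VECTOR_IRQ,     /* 0x20 - 0x2F: the 16 PIC interrupt lines      */
   VECTOR_USER,    /* 0x30 - 0xFF: free for software interrupts    */
} VectorClass;

/* Report which of the three ranges a vector number falls into. */
static VectorClass
ClassifyVector(int vector)
{
   assert(vector >= 0 && vector < NUM_INTR_VECTORS);

   if (vector < IRQ_VECTOR_BASE) {
      return VECTOR_FAULT;
   }
   if (vector < USER_VECTOR_BASE) {
      return VECTOR_IRQ;
   }
   return VECTOR_USER;
}
```

For example, IRQ_VECTOR(1) (the keyboard IRQ) is vector 0x21 and USER_VECTOR(0) is vector 0x30. Intr_SetHandler takes these vector numbers, not raw IRQ numbers.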
 #define IRQ_TIMER           0
 #define IRQ_KEYBOARD        1
 #define FAULT_DE            0x00    // Divide error
 #define FAULT_NMI           0x02    // Non-maskable interrupt
 #define FAULT_BP            0x03    // Breakpoint
 #define FAULT_OF            0x04    // Overflow
 #define FAULT_BR            0x05    // Bound range
 #define FAULT_UD            0x06    // Undefined opcode
 #define FAULT_NM            0x07    // No FPU
 #define FAULT_DF            0x08    // Double Fault
 #define FAULT_TS            0x0A    // Invalid TSS
 #define FAULT_NP            0x0B    // Segment not present
 #define FAULT_SS            0x0C    // Stack-segment fault
 #define FAULT_GP            0x0D    // General Protection Fault
 #define FAULT_PF            0x0E    // Page fault
 #define FAULT_MF            0x10    // Math fault
 #define FAULT_AC            0x11    // Alignment check
 #define FAULT_MC            0x12    // Machine check
 #define FAULT_XM            0x13    // SIMD floating point exception
 typedef void (*IntrHandler)(int vector);
 typedef void (*IntrContextFn)(void);
 fastcall void Intr_Init(void);
 fastcall void Intr_SetFaultHandlers(IntrHandler handler);
 fastcall void Intr_SetHandler(int vector, IntrHandler handler);
 fastcall void Intr_SetMask(int irq, Bool enable);
 static inline void
 Intr_Enable(void) {
   asm volatile ("sti");
 }
 static inline void
 Intr_Disable(void) {
   asm volatile ("cli");
 }
 static inline Bool
 Intr_Save(void) {
   uint32 eflags;
   asm volatile ("pushf; pop %0" : "=r" (eflags));
   return (eflags & 0x200) != 0;
 }
 static inline void
 Intr_Restore(Bool flag) {
   if (flag) {
      Intr_Enable();
   } else {
      Intr_Disable();
   }
 }
 static inline void
 Intr_Halt(void) {
   asm volatile ("hlt");
 }
 static inline void
 Intr_Break(void) {
   asm volatile ("int3");
 }
 /*
 * This structure describes all execution state that's saved when an
 * interrupt or a setjmp occurs. In the case of an interrupt, this
 * structure actually describes the stack frame of the interrupt
 * trampoline.
 *
 * An interrupt handler can get a pointer to its IntrContext by
 * passing its first argument to the Intr_GetContext macro. This
 * allows an interrupt handler to examine the execution context in
 * which the interrupt occurred, to modify the interrupt's return
 * address, or even to implement input and output for OS traps.
 *
 * This module also provides functions for directly saving and
 * restoring IntrContext structures. This can be used much like
 * setjmp/longjmp, or it can even be used for simple cooperative or
 * preemptive multithreading. An interrupt handler can perform a
 * context switch by overwriting its IntrContext with a saved context.
 *
 * The definition of this structure must be kept in sync with the
 * machine code in our interrupt trampolines, and with the
 * assembly-language implementation of SaveContext and RestoreContext.
 */
 typedef struct IntrContext {
   /*
    * General purpose registers. These are all saved after the value
    * of %esp is captured.
    */
   uint32  edi;
   uint32  esi;
   uint32  ebp;
   uint32  esp;
   uint32  ebx;
   uint32  edx;
   uint32  ecx;
   uint32  eax;
   /*
    * These values are saved by the CPU during an interrupt.  By
    * convention, these values are at the top of the stack when %esp
    * was saved.
    *
    * The values of cs and eflags are ignored by Intr_SaveContext
    * and Intr_RestoreContext.
    */
   uint32  eip;
   uint32  cs;
   uint32  eflags;
 } IntrContext;
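Since this structure must stay in sync with hand-written machine code, one way to guard against accidental reordering is to check the field offsets that the assembly hard-codes. The sketch below is an assumption-laden illustration, not part of Metalkit: HostIntrContext is a host-side copy of the structure using <stdint.h> types, and ContextLayoutMatchesAssembly is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side copy of IntrContext, with the fields in the same order. */
typedef struct HostIntrContext {
   uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;  /* pusha order */
   uint32_t eip, cs, eflags;                         /* saved by CPU */
} HostIntrContext;

/* Return 1 if the layout agrees with the offsets used in the assembly. */
static int
ContextLayoutMatchesAssembly(void)
{
   return
      /* Intr_RestoreContext: "mov 12(%esi), %esp" */
      offsetof(HostIntrContext, esp) == 12 &&
      /* eip is the 9th word copied by "mov $9, %ecx; rep movsl" */
      offsetof(HostIntrContext, eip) == 8 * sizeof(uint32_t) &&
      /* Intr_SaveContext stores cs and eflags right after those 9 words */
      offsetof(HostIntrContext, cs) == 36 &&
      /* Trampoline: "lea 40(%esp), %esi" addresses eflags */
      offsetof(HostIntrContext, eflags) == 40 &&
      /* 11 words in total */
      sizeof(HostIntrContext) == 44;
}
```

In a real build the same conditions could be expressed as compile-time assertions against IntrContext itself, so that a layout change fails the build instead of corrupting a context switch.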
 /*
 * Always use the 'volatile' keyword when storing the result
 * of Intr_GetContext. GCC can erroneously decide to optimize
 * out any copies to this pointer, because it doesn't know the
 * values will be used by our trampoline.
 */
 #define Intr_GetContext(arg)  ((IntrContext*) &(&arg)[1])
 uint32 Intr_SaveContext(IntrContext *ctx);
 void Intr_RestoreContext(IntrContext *ctx);
 fastcall void Intr_InitContext(IntrContext *ctx, uint32 *stack, IntrContextFn main);
 #endif /* __INTR_H__ */