Copyright (C) 1999-2009 VMware, Inc. All Rights Reserved VMware SVGA Device Interface and Programming Model -------------------------------------------------- Revision 3, 2009-04-12 Table of Contents: 1. Introduction 2. Examples and Reference Implementation 3. Virtual Hardware Overview 4. 2D Graphics Model 5. 3D Graphics Model 6. Overview of SVGA3D features 7. Programming the VMware SVGA Device XXX - Todo ---------- This document does not yet describe the 3D hardware in great detail. It is an architectural overview. See the accompanying sample and reference code for details. Section (7) is biased toward describing much older features of the virtual hardware. Many new capability flags and FIFO commands have been added, and these are sparsely documented in svga_reg.h. 1. Introduction ---------------- This document describes the virtual graphics adapter interface which is implemented by VMware products. The VMware SVGA Device is a virtual PCI video card. It does not directly correspond to any real video card, but it serves as an interface for exposing accelerated graphics capabilities to virtual machines in a hardware-independent way. In its simplest form, the VMware SVGA Device can be used as a basic memory-mapped framebuffer. In this mode, the main advantage of VMware's SVGA device over alternatives like VBE is that the virtual machine can explicitly indicate which ares of the screen have changed by sending update rectangles through the device's command FIFO. This allows VMware products to avoid reading areas of the framebuffer which haven't been modified by the virtualized OS. The VMware SVGA device also supports several advanced features: - Accelerated video overlays - 2D acceleration - Synchronization primitives - DMA transfers - Device-independent 3D acceleration, with shaders - Multiple monitors - Desktop resizing 2. Examples and Reference Implementation ----------------------------------------- This document is not yet complete, in that it doesn't describe the entire SVGA device interface in detail. It is an architectural overview of the entire device, as well as an introduction to a few basic areas of programming for the device. For deeper details, see the attached example code. The "examples" directory contains individual example applications which show various features in action. The "lib" directory contains support code for the example applications. Some of this support code is designed to act as a reference implementation. For example, the process of writing to the command FIFO safely and efficiently is very complicated. The attached reference implementation is a must-read for anyone attempting to write their own driver for the SVGA device. For simplicity and OS-neutrality, all examples compile to floppy disk images which execute "on the bare metal" in a VMware virtual machine. There are no run-time dependencies. At compile-time, most of the examples only require a GNU toolchain (GCC and Binutils). Some of the examples require Python at compile-time. Each example will generate a .vmx virtual machine configuration file which can be used to boot it in VMware Workstation or Fusion. The included example code focuses on advanced features of the SVGA device, such as 3D and synchronization primitives. There are also a couple examples that demonstrate 3D graphics and video overlays. For more examples of basic 2D usage, the Xorg driver is also a good reference. Header files and reference implementation files in 'lib': * svga_reg.h SVGA register definitions, SVGA capabilities, and FIFO command definitions. * svga_overlay.h Definitions required to use the SVGA device's hardware video overlay support. * svga_escape.h A list of definitions for the SVGA Escape commands, a way to send arbitrary data over the SVGA command FIFO. Escapes are used for video overlays, for vendor-specific extensions to the SVGA device, and for various tools internal to VMware. * svga3d_reg.h Defines the SVGA3D protocol, the set of FIFO commands used for hardware 3D acceleration. * svga3d_shaderdefs.h Defines the bytecode format for SVGA3D shaders. This is used for accelerated 3D with programmable vertex and pixel pipelines. * svga.c Reference implementation of low-level SVGA device functionality. This contains sample code for device initialization, safe and efficient FIFO writes, and various synchronization primitives. * svga3d.c Reference implementation for the SVGA3D protocol. This file uses the FIFO primitives in svga.c to speak the SVGA3D protocol. Includes a lot of in-line documentation. * svga3dutil.c This is a collection of high-level utilities which provide usage examples for svga3d.c and svga.c, and which demonstrate common SVGA3D idioms. 3. Virtual Hardware Overview ----------------------------- The VMware SVGA Device is a virtual PCI device. It provides the following low-level hardware features, which are used to implement various feature-specific protocols for 2D, 3D, and video overlays: * I/O space, at PCI Base Address Register 0 (BAR0) There are only a few I/O ports. Besides the ports used to access registers, these are generally either legacy features, or they are for I/O which is performance critical but may have side-effects. (Such as clearing IRQs after they occur.) * Registers, accessed indirectly via INDEX and VALUE I/O ports. The device's register space is the principal method by which configuration takes place. In general, registers are for actions which may have side-effects and which take place synchronously with the CPU. * Guest Framebuffer (BAR1) The SVGA device itself owns a variable amount of "framebuffer" memory, up to a maximum of 128MB. This memory size is fixed at power-on. The memory exists outside of the virtual machine's "main memory", and it's mapped into PCI space via BAR1. The size of this framebuffer may be determined either by probing BAR1 in typical PCI fashion, or by reading SVGA_REG_FB_SIZE. The beginning of framebuffer memory is reserved for the 2D framebuffer. The rest of framebuffer memory may be used as buffer space for DMA operations. * Command FIFO (BAR2) The SVGA device can be thought of as a co-processor which executes commands asynchronously with the virtual machine's CPU. To enqueue commands for this coprocessor, the SVGA device uses another device-owned memory region which is mapped into PCI space. The command FIFO is usually much smaller than the framebuffer. While the framebuffer usually ranges from 4MB to 128MB, the FIFO ranges in size from 256KB to 2MB. Like the framebuffer, the FIFO size is fixed at power-on. The FIFO is mapped via PCI BAR2. * FIFO Registers The beginning of FIFO memory is reserved for a set of "registers". Some of these are used to implement the FIFO command queueing protocol, but many of these are used for other purposes. The main difference between FIFO registers and non-FIFO registers is that FIFO registers are backed by normal RAM whereas non-FIFO registers require I/O operations to access. This means that only non-FIFO registers can have side-effects, but FIFO registers are much more efficient when side-effects aren't necessary. The FIFO register space is variable-sized. The driver is responsible for partitioning FIFO memory into register space and command space. * Synchronization Primitives Conceptually, the part of the SVGA device which processes FIFO commands can be thought of as a coprocessor or a separate thread of execution. The virtual machine may need to: - Wake up the FIFO processor when it's sleeping, to ensure that new commands are processed with low-latency. (FIFO doorbell) - Check whether a previously enqueued FIFO command has been processed. (FIFO fence) - Wait until the FIFO processor has passed a particular command. (Sync to fence) - Wait until more space is available in the FIFO. (Wait for FIFO progress) * Interrupts (Workstation 6.5 virtual machines and later only) On virtual machines which have been upgrade to Workstation 6.5 virtual hardware, the SVGA device provides an IRQ which can be used to notify the virtual machine when a synchronization event occurs. This allows implementing operations like "Sync to fence" without interfering with a virtual machine's ability to multitask. On older virtual hardware versions, the SVGA device only supports a "legacy sync" mechanism, in which a particular register access has the side-effect of waiting for host FIFO processing to occur. This older mechanism completely halts the virtual machine's CPU while the FIFO is being processed. * Physical VRAM The VMware SVGA device provides management of physical VRAM resources via "surface objects", however physical VRAM is never directly visible to the virtual machine. Physical VRAM can only be accessed via DMA transfers. Note that framebuffer memory is simply a convenient place to put DMA buffers. Even if a virtual machine only has 16MB of framebuffer memory allocated to it, it could be using gigabytes of physical VRAM if that memory is available to the physical GPU. * DMA engine The VMware SVGA device can asynchronously transfer surface data between phyiscal VRAM and guest-visible memory. This guest-visible memory could be part of framebuffer memory, or it could be part of guest system memory. The DMA engine uses a "Guest Pointer" abstraction to refer to any guest-visible memory. Guest pointer consist of an offset and a Guest Memory Region (GMR) ID. There is a pre-defined GMR which refers to framebuffer memory. The virtual machine can create additional GMRs to refer to regions of system memory which may or may not be physically contiguous. 4. 2D Graphics Model --------------------- Conceptually, the 2D portion of the VMware SVGA device is a compositor which displays a user-visible image composed of several planes. From back to front, those planes are: - The 2D framebuffer - 3D regions - Video overlay regions - The virtual hardware mouse cursor ("cursor bypass") - The physical hardware mouse cursor ("host cursor") It is important to note that host-executed 2D graphics commands do not necessarily modify the 2D framebuffer, they may write directly to the physical display or display window. Like a physical video card, the VMware SVGA device's framebuffer is never modified by a mouse cursor or video overlay. Unlike a physical video card, however, 3D display regions in the VMware SVGA device may or may not modify the 2D framebuffer. The following basic 2D operations are available: * Update Redraw a portion of the screen, using data from the 2D framebuffer. Any update rectangles are subtracted from the set of on-screen 3D regions, so 2D updates always overwrite 3D regions. 2D updates still appear behind video overlays and mouse cursors. An update command must be sent any time the driver wishes to make changes to the 2D framebuffer available. The user-visible screen is not guaranteed to update unless an explicit update command is sent. Also note that the SVGA device is allowed to read the 2D framebuffer even if no update command has been sent. For example, if the virtual machine is running in a partially obscured window, the SVGA device will read the 2D framebuffer immediately when the window is uncovered in order to draw the newly visible portion of the VM's window. This means that the virtual machine must not treat the 2D framebuffer as a back-buffer. It must contain a completely rendered image at all times. There is not yet any way to synchronize updates with the vertical refresh. Current VMware SVGA devices may suffer from tearing artifacts. * 2D acceleration operations These include fills, copies, and various kinds of blits. All 2D acceleration operations happen directly on the user-visible screen, not in 2D framebuffer memory. Use of the 2D acceleration operations is encouraged only in very limited circumstances. For example, when moving or scrolling windows. Mixing accelerated and unaccelerated 2D operations is difficult to implement properly, and incurs a significant synchronization penalty. * Present 3D surface "Present" is an SVGA3D command which copies a finished image from an SVGA3D surface to the user-visible screen. It may or may not update the 2D framebuffer in the process. Present commands effectively create a 3D overlay on top of part of the 2D framebuffer. This overlay can be overwritten by Update commands or by other Present commands. Present is the only way in which the 2D and 3D portions of the VMware SVGA device interact. * Video overlay operations The SVGA device defines a group of virtual "video overlay units", each of which can color-convert, scale, and display a frame of YUV video overlayed with the 2D framebuffer. Overlay units each have a set of virtual registers which are configured using the commands in svga_overlay.h. * Virtual mouse cursor operations The virtual mouse cursor is an overlay which shows the SVGA device's current cursor image at a particular location. It may not be hardware-accelerated by the physical machine, and it does not necessarily correspond with the position of the user's physical mouse. There are three "Cursor Bypass" mechanisms by which the virtual machine can set the position of the virtual mouse cursor. Cursor bypass 1 did not follow the overlay model described above, and it has long been obsolete. Cursor bypass 2 and 3 are functionally equivalent, except that cursor bypass 2 operates via non-FIFO registers and cursor bypass 3 operates via FIFO registers. If cursor bypass 3 is supported (SVGA_FIFO_CAP_CURSOR_BYPASS_3), it should be used instead of cursor bypass 2. For all forms of cursor bypass, the cursor image is defined by SVGA_CMD_DEFINE_CURSOR. * Physical mouse cursor operations The virtual machine does not define the location of the physical mouse cursor, but it can define the cursor image and hide/show it. It does so using the SVGA_CMD_DEFINE_CURSOR and SVGA_CMD_DISPLAY_CURSOR commands. 5. 3D Graphics Model --------------------- The VMware SVGA device supports hardware-independent accelerated 3D graphics via the "SVGA3D" protocol. This is a set of extended FIFO commands. SVGA3D utilizes the same underlying FIFO and synchronization primitives as the 2D portion of the SVGA device, but the 2D and 3D portions of the device are largely independent. The SVGA3D protocol is relatively high-level. The device is responsible for tracking render state among multiple contexts, for managing physical VRAM, and for implementing both fixed-function and programmable vertex and pixel processing. The SVGA3D protocol is designed to be vendor- and API-neutral, but for convenience it has been designed to be compatible with Direct3D in most places. The shader bytecode is fully binary-compatible with Direct3D bytecode, and most render states are identical to those defined by Direct3D. Note that the VMware SVGA device still supports 3D acceleration on all operating systems that VMware products run on. Internally, hardware accelerated 3D is implemented on top of the OpenGL graphics API. To summarize the SVGA3D device's design: * SVGA3D is an extension to the VMware SVGA device's command FIFO protocol. * In some ways it looks like a graphics API: o SVGA3D device manages all physical VRAM allocation. o High-level render states, relatively high-level shader bytecode. * In some ways it looks like hardware: o All commands are executed asynchronously. o Driver must track memory ownership, schedule DMA transfers. o All physical VRAM represented by generic "Surface" objects * Supports both fixed-function and programmable vertex and fragment pipelines. 6. Overview of SVGA3D features ------------------------------ * Capabilities o Extensible key/value pair list describes the SVGA3D device's capabilities. o Number of texture image units, max texture size, number of lights, texture formats, etc. * Surfaces o Formats: 8-bit RGB/RGBA, 16-bit RGB/RGBA, depth, packed depth/stencil, luminance/alpha, DXT compressed, signed, floating point, etc. o Supports 3D (volume) textures, cube maps,. o Surfaces are also used as vertex and index buffers. o Generic DMA blits between surfaces and system memory or offscreen "virtual VRAM". o Generic surface-to-surface blits, with and without scaling. * Contexts o Surfaces are global, other objects are per-context, render states are per-context. o Commands to create/delete contexts. * Render State (Mostly Direct3D-style) o Matrices o Texture stage states: Filtering, combiners, LOD, gamma correction, etc. o Stencil, depth, culling, blending, lighting, materials, etc. * Render Targets o Few restrictions on which surfaces can be used as render targets (More lenient than OpenGL FBOs) o Supports depth, stencil, color buffer(s) * Present o The "present" operation is a blit from an SVGA3D surface back to the user-visible screen. o May or may not update the guest-visible 2D framebuffer. * Occlusion queries o Submitted via FIFO commands o Results returned asynchronously: a results structure is filled in via DMA. * Shaders o We define an "SVGA3D bytecode", which is binary-compatible with Direct3D's shader bytecode. o SVGA3D may define extensions to the bytecode format in the future. * Drawing o A single generic "draw primitives" command performs a list of rendering operations from a list of vertex buffers. o Index buffer is optional. o Similar to drawing with OpenGL vertex arrays and VBOs. 7. Programming the VMware SVGA Device ------------------------------------- 1. Reading/writing a register: The SVGA registers are addressed by an index/value pair of 32 bit registers in the IO address space. The 0710 VMware SVGA chipset (PCI device ID PCI_DEVICE_ID_VMWARE_SVGA) has its index and value ports hardcoded at: index: SVGA_LEGACY_BASE_PORT + 4 * SVGA_INDEX_PORT value: SVGA_LEGACY_BASE_PORT + 4 * SVGA_VALUE_PORT The 0405 VMware SVGA chipset (PCI device ID PCI_DEVICE_ID_VMWARE_SVGA2) determines its index and value ports as a function of the first base address register in its PCI configuration space as: index: + SVGA_INDEX_PORT value: + SVGA_VALUE_PORT To read a register: Set the index port to the index of the register, using a dword OUT Do a dword IN from the value port To write a register: Set the index port to the index of the register, using a dword OUT Do a dword OUT to the value port Example, setting the width to 1024: mov eax, SVGA_REG_WIDTH mov edx, out dx, eax mov eax, 1024 mov edx, out dx, eax 2. Initialization Check the version number loop: Write into SVGA_REG_ID the maximum SVGA_ID_* the driver supports. Read from SVGA_REG_ID. Check if it is the value you wrote. If yes, VMware SVGA device supports it If no, decrement SVGA_ID_* and goto loop This algorithm converges. Map the frame buffer and the command FIFO Read SVGA_REG_FB_START, SVGA_REG_FB_SIZE, SVGA_REG_MEM_START, SVGA_REG_MEM_SIZE. Map the frame buffer (FB) and the FIFO memory (MEM). This step must occur after the version negotiation above, since by default the device is in a legacy-compatibility mode in which there is no command FIFO. Get the device capabilities and frame buffer dimensions Read SVGA_REG_CAPABILITIES, SVGA_REG_MAX_WIDTH, SVGA_REG_MAX_HEIGHT, and SVGA_REG_HOST_BITS_PER_PIXEL / SVGA_REG_BITS_PER_PIXEL. Note: The capabilities can and do change without the PCI device ID changing or the SVGA_REG_ID changing. A driver should always check the capabilities register when loading before expecting any capabilities-determined feature to be available. See below for a list of capabilities as of this writing. Note: If SVGA_CAP_8BIT_EMULATION is not set, then it is possible that SVGA_REG_HOST_BITS_PER_PIXEL does not exist and SVGA_REG_BITS_PER_PIXEL should be read instead. Optional: Report the Guest Operating System Write SVGA_REG_GUEST_ID with the appropriate value from . While not required in any way, this is useful information for the virtual machine to have available for reporting and sanity checking purposes. SetMode Set SVGA_REG_WIDTH, SVGA_REG_HEIGHT, SVGA_REG_BITS_PER_PIXEL Read SVGA_REG_FB_OFFSET (SVGA_REG_FB_OFFSET is the offset from SVGA_REG_FB_START of the visible portion of the frame buffer) Read SVGA_REG_BYTES_PER_LINE, SVGA_REG_DEPTH, SVGA_REG_PSEUDOCOLOR, SVGA_REG_RED_MASK, SVGA_REG_GREEN_MASK, SVGA_REG_BLUE_MASK Note: SVGA_REG_BITS_PER_PIXEL is readonly if SVGA_CAP_8BIT_EMULATION is not set in the capabilities register. Even if it is set, values other than 8 and SVGA_REG_HOST_BITS_PER_PIXEL will be ignored. Enable SVGA Set SVGA_REG_ENABLE to 1 (to disable SVGA, set SVGA_REG_ENABLE to 0. Setting SVGA_REG_ENABLE to 0 also enables VGA.) Initialize the command FIFO The FIFO is exclusively dword (32-bit) aligned. The first four dwords define the portion of the MEM area that is used for the command FIFO. These are values are all in byte offsets from the start of the MEM area. A minimum sized FIFO would have these values: mem[SVGA_FIFO_MIN] = 16; mem[SVGA_FIFO_MAX] = 16 + (10 * 1024); mem[SVGA_FIFO_NEXT_CMD] = 16; mem[SVGA_FIFO_STOP] = 16; Various addresses near the beginning of the FIFO are defined as "FIFO registers" with special meaning. If the driver wishes to take advantage of the special meaning of these addresses rather than using them as part of the command FIFO, the driver must reserve space for these registers when setting up the FIFO. Typically the driver will set MIN to SVGA_FIFO_NUM_REGS*4. Report the guest 3D version If your driver supports 3D, write the latest supported 3D version (SVGA3D_HWVERSION_CURRENT) to the SVGA_FIFO_GUEST_3D_HWVERSION register. Enable the command FIFO Set SVGA_REG_CONFIG_DONE to 1 after these values have been set. Note: Setting SVGA_REG_CONFIG_DONE to 0 will stop the device from reading the FIFO until it is reinitialized and SVGA_REG_CONFIG_DONE is set to 1 again. 3. SVGA command FIFO protocol The FIFO is empty when SVGA_FIFO_NEXT_CMD == SVGA_FIFO_STOP. The driver writes commands to the FIFO starting at the offset specified by SVGA_FIFO_NEXT_CMD, and then increments SVGA_FIFO_NEXT_CMD. The FIFO is full when SVGA_FIFO_NEXT_CMD is one word before SVGA_FIFO_STOP. When the FIFO becomes full, the driver must wait for space to become available. It can do this via various methods (busy-wait, legacy sync) but the preferred method is to use the FIFO_PROGRESS interrupt. The SVGA device does not guarantee that all of FIFO memory is valid at all times. The device is free to discard the contents of any memory which is not part of the active portion of the FIFO. The active portion of the FIFO is defined as the region with valid commands (starting at SVGA_FIFO_STOP and ending at SVGA_FIFO_NEXT_CMD) plus the reserved portion of the FIFO. By default, only one word of memory is 'reserved'. If the FIFO supports the SVGA_FIFO_CAP_RESERVE capability, the device supports reserving driver-defined amounts of memory. If both the device and driver support this operation, it's possible to write multiple words of data between updates to the FIFO control registers. The simplest way to use the FIFO is to write one word at a time, but the highest-performance way to use the FIFO is to reserve enough space for an entire command or group of commands, write the commands directly to FIFO memory, then "commit" the command(s) by updating the FIFO control registers. A reference implementation of this reserve/commit algorithm is provided in svga.c, in SVGA_FIFOReserve() and SVGA_FIFOCommit(). In the common case, this algorithm lets drivers assemble commands directly in FIFO memory without any additional copies or memory allocation. 4. Synchronization The primary synchronization primitive defined by the SVGA device is "Sync to fence". A "fence" is a numbered marker inserted into the FIFO command stream. The driver can insert fences at any time, and efficiently determine the value of the last fence processed by the device. "Sync to fence" is the process of waiting for a particular fence to be processed. This may be important for several reasons: - Flow control. For interactivity, it is important to put an upper limit on the amount by which the device may lag the application. - Waiting for DMA completion. If the driver needs to recycle a DMA buffer or complete a DMA operation synchronously, it must sync to a fence which occurred after the DMA operation in the command stream. - Waiting for accelerated 2D operations. If a 2D driver needs to write to a portion of the framebuffer which is affected by an accelerated blit, it should sync to a fence which occurred after the blit. There are multiple possible implementations of Sync to Fence, depending on the capabilities of the SVGA device you're driving. Very old versions of the VMware SVGA device did not support fences at all. For these devices, you must always perform a "legacy sync". New virtual machines with Workstation 6.5 virtual hardware or later support an IRQ-driven sync operation. For all other versions of the SVGA device, the best approach is a hybrid in which you synchronously use the SYNC/BUSY registers to process the FIFO until the sync has passed. FIFO synchronization is a very complex topic, and it isn't covered fully by this document. Please see the synchronization-related comments in svga_reg.h, and the reference implementation of these primitives in svga.c. 5. Cursor When SVGA_CAP_CURSOR is set, hardware cursor support is available. In practice, SVGA_CAP_CURSOR will only be set when SVGA_CAP_CURSOR_BYPASS is also set and drivers supporting a hardware cursor should only worry about SVGA_CAP_CURSOR_BYPASS and only use the FIFO to define the cursor. See below for more information. 6. Pseudocolor When the read-only register SVGA_REG_PSEUDOCOLOR is 1, the device is in a colormapped mode whose index width and color width are both SVGA_REG_DEPTH. Thus far, 8 is the only depth at which pseudocolor is ever used. In pseudocolor, the colormap is programmed by writing to the SVGA palette registers. These start at SVGA_PALETTE_BASE and are interpreted as follows: SVGA_PALETTE_BASE + 3*n - The nth red component SVGA_PALETTE_BASE + 3*n + 1 - The nth green component SVGA_PALETTE_BASE + 3*n + 2 - The nth blue component And n ranges from 0 to ((1<