Copyright (C) 1999-2009 VMware, Inc.
All Rights Reserved
VMware SVGA Device Interface and Programming Model
--------------------------------------------------
Revision 3, 2009-04-12
Table of Contents:
1. Introduction
2. Examples and Reference Implementation
3. Virtual Hardware Overview
4. 2D Graphics Model
5. 3D Graphics Model
6. Overview of SVGA3D features
7. Programming the VMware SVGA Device
XXX - Todo
----------
This document does not yet describe the 3D hardware in great
detail. It is an architectural overview. See the accompanying sample
and reference code for details.
Section (7) is biased toward describing much older features of the
virtual hardware. Many new capability flags and FIFO commands have
been added, and these are sparsely documented in svga_reg.h.
1. Introduction
----------------
This document describes the virtual graphics adapter interface which
is implemented by VMware products. The VMware SVGA Device is a virtual
PCI video card. It does not directly correspond to any real video
card, but it serves as an interface for exposing accelerated graphics
capabilities to virtual machines in a hardware-independent way.
In its simplest form, the VMware SVGA Device can be used as a basic
memory-mapped framebuffer. In this mode, the main advantage of
VMware's SVGA device over alternatives like VBE is that the virtual
machine can explicitly indicate which areas of the screen have changed
by sending update rectangles through the device's command FIFO. This
allows VMware products to avoid reading areas of the framebuffer which
haven't been modified by the virtualized OS.
The VMware SVGA device also supports several advanced features:
- Accelerated video overlays
- 2D acceleration
- Synchronization primitives
- DMA transfers
- Device-independent 3D acceleration, with shaders
- Multiple monitors
- Desktop resizing
2. Examples and Reference Implementation
-----------------------------------------
This document is not yet complete, in that it doesn't describe the
entire SVGA device interface in detail. It is an architectural
overview of the entire device, as well as an introduction to a few
basic areas of programming for the device.
For deeper details, see the attached example code. The "examples"
directory contains individual example applications which show various
features in action. The "lib" directory contains support code for the
example applications.
Some of this support code is designed to act as a reference
implementation. For example, the process of writing to the command
FIFO safely and efficiently is very complicated. The attached
reference implementation is a must-read for anyone attempting to write
their own driver for the SVGA device.
For simplicity and OS-neutrality, all examples compile to floppy disk
images which execute "on the bare metal" in a VMware virtual
machine. There are no run-time dependencies. At compile-time, most of
the examples only require a GNU toolchain (GCC and Binutils). Some of
the examples require Python at compile-time.
Each example will generate a .vmx virtual machine configuration file
which can be used to boot it in VMware Workstation or Fusion.
The included example code focuses on advanced features of the SVGA
device, such as 3D and synchronization primitives. There are also a
couple of examples that demonstrate 2D graphics and video overlays.
For more examples of basic 2D usage, the Xorg driver is also a good
reference.
Header files and reference implementation files in 'lib':
* svga_reg.h
SVGA register definitions, SVGA capabilities, and FIFO command
definitions.
* svga_overlay.h
Definitions required to use the SVGA device's hardware video overlay
support.
* svga_escape.h
A list of definitions for the SVGA Escape commands, a way to send
arbitrary data over the SVGA command FIFO. Escapes are used for video
overlays, for vendor-specific extensions to the SVGA device, and for
various tools internal to VMware.
* svga3d_reg.h
Defines the SVGA3D protocol, the set of FIFO commands used for hardware
3D acceleration.
* svga3d_shaderdefs.h
Defines the bytecode format for SVGA3D shaders. This is used for
accelerated 3D with programmable vertex and pixel pipelines.
* svga.c
Reference implementation of low-level SVGA device functionality.
This contains sample code for device initialization, safe and
efficient FIFO writes, and various synchronization primitives.
* svga3d.c
Reference implementation for the SVGA3D protocol. This file
uses the FIFO primitives in svga.c to speak the SVGA3D protocol.
Includes a lot of in-line documentation.
* svga3dutil.c
This is a collection of high-level utilities which provide usage
examples for svga3d.c and svga.c, and which demonstrate common
SVGA3D idioms.
3. Virtual Hardware Overview
-----------------------------
The VMware SVGA Device is a virtual PCI device. It provides the
following low-level hardware features, which are used to implement
various feature-specific protocols for 2D, 3D, and video overlays:
* I/O space, at PCI Base Address Register 0 (BAR0)
There are only a few I/O ports. Besides the ports used to access
registers, these are generally either legacy features, or they are for
I/O which is performance critical but may have side-effects, such as
clearing IRQs after they occur.
* Registers, accessed indirectly via INDEX and VALUE I/O ports.
The device's register space is the principal method by which
configuration takes place. In general, registers are for actions which
may have side-effects and which take place synchronously with the CPU.
* Guest Framebuffer (BAR1)
The SVGA device itself owns a variable amount of "framebuffer" memory,
up to a maximum of 128MB. This memory size is fixed at power-on. The
memory exists outside of the virtual machine's "main memory", and it's
mapped into PCI space via BAR1. The size of this framebuffer may be
determined either by probing BAR1 in typical PCI fashion, or by
reading SVGA_REG_FB_SIZE.
The beginning of framebuffer memory is reserved for the 2D
framebuffer. The rest of framebuffer memory may be used as buffer
space for DMA operations.
* Command FIFO (BAR2)
The SVGA device can be thought of as a co-processor which executes
commands asynchronously with the virtual machine's CPU. To enqueue
commands for this coprocessor, the SVGA device uses another
device-owned memory region which is mapped into PCI space.
The command FIFO is usually much smaller than the framebuffer. While
the framebuffer usually ranges from 4MB to 128MB, the FIFO ranges in
size from 256KB to 2MB. Like the framebuffer, the FIFO size is fixed
at power-on. The FIFO is mapped via PCI BAR2.
* FIFO Registers
The beginning of FIFO memory is reserved for a set of "registers".
Some of these are used to implement the FIFO command queueing
protocol, but many of these are used for other purposes. The main
difference between FIFO registers and non-FIFO registers is that FIFO
registers are backed by normal RAM whereas non-FIFO registers require
I/O operations to access. This means that only non-FIFO registers can
have side-effects, but FIFO registers are much more efficient when
side-effects aren't necessary.
The FIFO register space is variable-sized. The driver is responsible
for partitioning FIFO memory into register space and command space.
* Synchronization Primitives
Conceptually, the part of the SVGA device which processes FIFO
commands can be thought of as a coprocessor or a separate thread of
execution. The virtual machine may need to:
- Wake up the FIFO processor when it's sleeping, to ensure that
new commands are processed with low latency. (FIFO doorbell)
- Check whether a previously enqueued FIFO command has been
processed. (FIFO fence)
- Wait until the FIFO processor has passed a particular
command. (Sync to fence)
- Wait until more space is available in the FIFO. (Wait for
FIFO progress)
* Interrupts (Workstation 6.5 virtual machines and later only)
On virtual machines which have been upgraded to Workstation 6.5 virtual
hardware, the SVGA device provides an IRQ which can be used to notify
the virtual machine when a synchronization event occurs. This allows
implementing operations like "Sync to fence" without interfering with
a virtual machine's ability to multitask.
On older virtual hardware versions, the SVGA device only supports a
"legacy sync" mechanism, in which a particular register access has the
side-effect of waiting for host FIFO processing to occur. This older
mechanism completely halts the virtual machine's CPU while the FIFO is
being processed.
* Physical VRAM
The VMware SVGA device provides management of physical VRAM resources
via "surface objects". However, physical VRAM is never directly visible
to the virtual machine. Physical VRAM can only be accessed via DMA
transfers.
Note that framebuffer memory is simply a convenient place to put DMA
buffers. Even if a virtual machine only has 16MB of framebuffer memory
allocated to it, it could be using gigabytes of physical VRAM if that
memory is available to the physical GPU.
* DMA engine
The VMware SVGA device can asynchronously transfer surface data
between physical VRAM and guest-visible memory. This guest-visible
memory could be part of framebuffer memory, or it could be part of
guest system memory.
The DMA engine uses a "Guest Pointer" abstraction to refer to any
guest-visible memory. Guest pointers consist of an offset and a Guest
Memory Region (GMR) ID. There is a pre-defined GMR which refers to
framebuffer memory. The virtual machine can create additional GMRs to
refer to regions of system memory which may or may not be physically
contiguous.
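As an illustration of the Guest Pointer abstraction, the following is a
minimal sketch of how a driver might describe a DMA buffer placed in
framebuffer memory, after the region reserved for the 2D framebuffer.
The type name, the helper, and the GMR ID value are all hypothetical
placeholders; the real definitions are in svga_reg.h.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder ID for the pre-defined framebuffer GMR. The real value
 * must be taken from svga_reg.h; this one is illustrative only. */
#define EXAMPLE_GMR_FRAMEBUFFER 0xFFFFFFFEu

/* Hypothetical guest pointer: a GMR ID plus a byte offset into it. */
typedef struct {
    uint32_t gmr_id;
    uint32_t offset;
} ExampleGuestPtr;

/*
 * Build a guest pointer to a DMA buffer located in framebuffer memory.
 *
 *   fb_size      total framebuffer memory (e.g. from SVGA_REG_FB_SIZE)
 *   fb_2d_bytes  bytes at the start reserved for the 2D framebuffer
 *   buf_offset   offset of the buffer within the remaining DMA area
 *   buf_bytes    size of the buffer
 */
static ExampleGuestPtr example_fb_guest_ptr(uint32_t fb_size,
                                            uint32_t fb_2d_bytes,
                                            uint32_t buf_offset,
                                            uint32_t buf_bytes)
{
    ExampleGuestPtr p;

    /* The buffer must fit entirely inside the DMA area. */
    assert(fb_2d_bytes <= fb_size);
    assert(buf_offset <= fb_size - fb_2d_bytes);
    assert(buf_bytes <= fb_size - fb_2d_bytes - buf_offset);

    p.gmr_id = EXAMPLE_GMR_FRAMEBUFFER;
    p.offset = fb_2d_bytes + buf_offset;
    return p;
}
```

A buffer in guest system memory would use the same structure, with the
ID of a GMR that the driver created itself.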
4. 2D Graphics Model
---------------------
Conceptually, the 2D portion of the VMware SVGA device is a compositor
which displays a user-visible image composed of several planes. From
back to front, those planes are:
- The 2D framebuffer
- 3D regions
- Video overlay regions
- The virtual hardware mouse cursor ("cursor bypass")
- The physical hardware mouse cursor ("host cursor")
It is important to note that host-executed 2D graphics commands do not
necessarily modify the 2D framebuffer; they may write directly to the
physical display or display window. Like a physical video card, the
VMware SVGA device's framebuffer is never modified by a mouse cursor
or video overlay. Unlike a physical video card, however, 3D display
regions in the VMware SVGA device may or may not modify the 2D
framebuffer.
The following basic 2D operations are available:
* Update
Redraw a portion of the screen, using data from the 2D
framebuffer. Any update rectangles are subtracted from the set of
on-screen 3D regions, so 2D updates always overwrite 3D regions. 2D
updates still appear behind video overlays and mouse cursors.
An update command must be sent any time the driver wishes to make
changes to the 2D framebuffer available. The user-visible screen is
not guaranteed to update unless an explicit update command is sent.
Also note that the SVGA device is allowed to read the 2D framebuffer
even if no update command has been sent. For example, if the virtual
machine is running in a partially obscured window, the SVGA device
will read the 2D framebuffer immediately when the window is uncovered
in order to draw the newly visible portion of the VM's window.
This means that the virtual machine must not treat the 2D framebuffer
as a back-buffer. It must contain a completely rendered image at all
times.
There is not yet any way to synchronize updates with the vertical
refresh. Current VMware SVGA devices may suffer from tearing
artifacts.
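To make the Update operation concrete, here is a minimal sketch of
encoding one update rectangle as FIFO command words. It assumes that
the command consists of the command ID followed by x, y, width, and
height, one dword each, and that SVGA_CMD_UPDATE has the value 1; both
assumptions should be checked against svga_reg.h. The helper name is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { SVGA_CMD_UPDATE = 1 };        /* assumed to match svga_reg.h */
enum { EXAMPLE_UPDATE_WORDS = 5 };   /* command ID + x, y, width, height */

/* Fill 'out' with the dwords of one update command for the given
 * rectangle of the 2D framebuffer. The caller is responsible for
 * copying the words into the command FIFO. */
static void example_encode_update(uint32_t out[EXAMPLE_UPDATE_WORDS],
                                  uint32_t x, uint32_t y,
                                  uint32_t width, uint32_t height)
{
    assert(out != 0);
    assert(width > 0 && height > 0);

    out[0] = SVGA_CMD_UPDATE;
    out[1] = x;
    out[2] = y;
    out[3] = width;
    out[4] = height;
}
```

A driver would typically accumulate dirty rectangles while drawing and
emit one such command per rectangle once the framebuffer contents are
complete.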
* 2D acceleration operations
These include fills, copies, and various kinds of blits. All 2D
acceleration operations happen directly on the user-visible screen,
not in 2D framebuffer memory.
Use of the 2D acceleration operations is encouraged only in very
limited circumstances, for example when moving or scrolling
windows. Mixing accelerated and unaccelerated 2D operations is
difficult to implement properly, and incurs a significant
synchronization penalty.
* Present 3D surface
"Present" is an SVGA3D command which copies a finished image from an
SVGA3D surface to the user-visible screen. It may or may not update
the 2D framebuffer in the process.
Present commands effectively create a 3D overlay on top of part of the
2D framebuffer. This overlay can be overwritten by Update commands or
by other Present commands.
Present is the only way in which the 2D and 3D portions of the VMware
SVGA device interact.
* Video overlay operations
The SVGA device defines a group of virtual "video overlay units", each
of which can color-convert, scale, and display a frame of YUV video
overlaid with the 2D framebuffer. Overlay units each have a set of
virtual registers which are configured using the commands in
svga_overlay.h.
* Virtual mouse cursor operations
The virtual mouse cursor is an overlay which shows the SVGA device's
current cursor image at a particular location. It may not be
hardware-accelerated by the physical machine, and it does not
necessarily correspond with the position of the user's physical mouse.
There are three "Cursor Bypass" mechanisms by which the virtual
machine can set the position of the virtual mouse cursor. Cursor
bypass 1 did not follow the overlay model described above, and it has
long been obsolete. Cursor bypass 2 and 3 are functionally equivalent,
except that cursor bypass 2 operates via non-FIFO registers and cursor
bypass 3 operates via FIFO registers. If cursor bypass 3 is supported
(SVGA_FIFO_CAP_CURSOR_BYPASS_3), it should be used instead of cursor
bypass 2.
For all forms of cursor bypass, the cursor image is defined by
SVGA_CMD_DEFINE_CURSOR.
* Physical mouse cursor operations
The virtual machine does not define the location of the physical mouse
cursor, but it can define the cursor image and hide/show it. It does
so using the SVGA_CMD_DEFINE_CURSOR and SVGA_CMD_DISPLAY_CURSOR
commands.
5. 3D Graphics Model
---------------------
The VMware SVGA device supports hardware-independent accelerated 3D
graphics via the "SVGA3D" protocol. This is a set of extended FIFO
commands. SVGA3D utilizes the same underlying FIFO and synchronization
primitives as the 2D portion of the SVGA device, but the 2D and 3D
portions of the device are largely independent.
The SVGA3D protocol is relatively high-level. The device is
responsible for tracking render state among multiple contexts, for
managing physical VRAM, and for implementing both fixed-function and
programmable vertex and pixel processing.
The SVGA3D protocol is designed to be vendor- and API-neutral, but for
convenience it has been designed to be compatible with Direct3D in
most places. The shader bytecode is fully binary-compatible with
Direct3D bytecode, and most render states are identical to those
defined by Direct3D.
Note that the VMware SVGA device still supports 3D acceleration on all
operating systems that VMware products run on. Internally, hardware
accelerated 3D is implemented on top of the OpenGL graphics API.
To summarize the SVGA3D device's design:
* SVGA3D is an extension to the VMware SVGA device's command FIFO
protocol.
* In some ways it looks like a graphics API:
o SVGA3D device manages all physical VRAM allocation.
o High-level render states, relatively high-level shader bytecode.
* In some ways it looks like hardware:
o All commands are executed asynchronously.
o Driver must track memory ownership, schedule DMA transfers.
o All physical VRAM represented by generic "Surface" objects
* Supports both fixed-function and programmable vertex and fragment
pipelines.
6. Overview of SVGA3D features
------------------------------
* Capabilities
o Extensible key/value pair list describes the SVGA3D device's
capabilities.
o Number of texture image units, max texture size, number of
lights, texture formats, etc.
* Surfaces
o Formats: 8-bit RGB/RGBA, 16-bit RGB/RGBA, depth, packed
depth/stencil, luminance/alpha, DXT compressed, signed, floating
point, etc.
o Supports 3D (volume) textures and cube maps.
o Surfaces are also used as vertex and index buffers.
o Generic DMA blits between surfaces and system memory or
offscreen "virtual VRAM".
o Generic surface-to-surface blits, with and without scaling.
* Contexts
o Surfaces are global, other objects are per-context, render
states are per-context.
o Commands to create/delete contexts.
* Render State (Mostly Direct3D-style)
o Matrices
o Texture stage states: Filtering, combiners, LOD, gamma
correction, etc.
o Stencil, depth, culling, blending, lighting, materials, etc.
* Render Targets
o Few restrictions on which surfaces can be used as render
targets (More lenient than OpenGL FBOs)
o Supports depth, stencil, color buffer(s)
* Present
o The "present" operation is a blit from an SVGA3D surface back
to the user-visible screen.
o May or may not update the guest-visible 2D framebuffer.
* Occlusion queries
o Submitted via FIFO commands
o Results returned asynchronously: a results structure is filled
in via DMA.
* Shaders
o We define an "SVGA3D bytecode", which is binary-compatible
with Direct3D's shader bytecode.
o SVGA3D may define extensions to the bytecode format in the future.
* Drawing
o A single generic "draw primitives" command performs a list of
rendering operations from a list of vertex buffers.
o Index buffer is optional.
o Similar to drawing with OpenGL vertex arrays and VBOs.
7. Programming the VMware SVGA Device
-------------------------------------
1. Reading/writing a register:
The SVGA registers are addressed by an index/value pair of 32-bit
registers in the IO address space.
The 0710 VMware SVGA chipset (PCI device ID PCI_DEVICE_ID_VMWARE_SVGA) has
its index and value ports hardcoded at:
index: SVGA_LEGACY_BASE_PORT + 4 * SVGA_INDEX_PORT
value: SVGA_LEGACY_BASE_PORT + 4 * SVGA_VALUE_PORT
The 0405 VMware SVGA chipset (PCI device ID PCI_DEVICE_ID_VMWARE_SVGA2)
determines its index and value ports as a function of the first base
address register in its PCI configuration space as:
index: BAR0 base + SVGA_INDEX_PORT
value: BAR0 base + SVGA_VALUE_PORT
To read a register:
Set the index port to the index of the register, using a dword OUT
Do a dword IN from the value port
To write a register:
Set the index port to the index of the register, using a dword OUT
Do a dword OUT to the value port
Example, setting the width to 1024:
mov eax, SVGA_REG_WIDTH
mov edx, index_port      ; I/O address of the index port
out dx, eax
mov eax, 1024
mov edx, value_port      ; I/O address of the value port
out dx, eax
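The same index/value sequence can be sketched in C. The port I/O
primitives are passed in as callbacks so that the sketch is
self-contained; a real driver would replace them with wrappers around
the dword OUT and IN instructions. All names below are hypothetical,
and the small device model exists only to exercise the helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical port-I/O backend plus the two port addresses. */
typedef struct {
    void     (*out32)(void *ctx, uint16_t port, uint32_t value);
    uint32_t (*in32)(void *ctx, uint16_t port);
    void     *ctx;
    uint16_t  index_port;
    uint16_t  value_port;
} ExampleSvgaIo;

/* Read a register: select it via the index port, then read the value port. */
static uint32_t example_read_reg(const ExampleSvgaIo *io, uint32_t index)
{
    io->out32(io->ctx, io->index_port, index);
    return io->in32(io->ctx, io->value_port);
}

/* Write a register: select it via the index port, then write the value port. */
static void example_write_reg(const ExampleSvgaIo *io, uint32_t index,
                              uint32_t value)
{
    io->out32(io->ctx, io->index_port, index);
    io->out32(io->ctx, io->value_port, value);
}

/* Toy device model: 32 plain registers, index port 0, value port 1. */
typedef struct {
    uint32_t selected;
    uint32_t regs[32];
} ExampleFakeDevice;

static void example_fake_out32(void *ctx, uint16_t port, uint32_t value)
{
    ExampleFakeDevice *d = ctx;
    if (port == 0) {
        d->selected = value;
    } else {
        assert(d->selected < 32);
        d->regs[d->selected] = value;
    }
}

static uint32_t example_fake_in32(void *ctx, uint16_t port)
{
    ExampleFakeDevice *d = ctx;
    (void)port;
    assert(d->selected < 32);
    return d->regs[d->selected];
}
```

Because the index port holds state between the two accesses, a driver
that touches registers from more than one context (for example, from an
interrupt handler) needs to serialize each index/value pair.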
2. Initialization
Check the version number
loop:
Write into SVGA_REG_ID the maximum SVGA_ID_* the driver supports.
Read from SVGA_REG_ID.
Check if it is the value you wrote.
If yes, VMware SVGA device supports it
If no, decrement SVGA_ID_* and goto loop
This algorithm converges.
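The negotiation loop above might be sketched as follows. The register
accessors, the helper name, and the device model are hypothetical. The
ID values are passed in as parameters rather than hard-coded, and the
index of SVGA_REG_ID is assumed to be 0; check both against svga_reg.h.

```c
#include <assert.h>
#include <stdint.h>

#define SVGA_REG_ID 0   /* assumed index; verify against svga_reg.h */

/* Hypothetical register accessors. */
typedef struct ExampleRegs {
    uint32_t (*read)(struct ExampleRegs *self, uint32_t index);
    void     (*write)(struct ExampleRegs *self, uint32_t index, uint32_t value);
} ExampleRegs;

/*
 * Try every ID from max_id down to min_id. Returns 1 and stores the
 * agreed ID in *out if the device echoes one back, or 0 if the device
 * accepts none of them.
 */
static int example_negotiate_id(ExampleRegs *regs, uint32_t max_id,
                                uint32_t min_id, uint32_t *out)
{
    uint32_t id = max_id;

    assert(out != 0);
    assert(max_id >= min_id);

    for (;;) {
        regs->write(regs, SVGA_REG_ID, id);
        if (regs->read(regs, SVGA_REG_ID) == id) {
            *out = id;
            return 1;
        }
        if (id == min_id) {
            return 0;
        }
        id--;
    }
}

/* Toy device model: ignores writes of IDs newer than it supports. */
typedef struct {
    ExampleRegs base;   /* must be the first member */
    uint32_t newest_supported;
    uint32_t id;
} ExampleFakeDevice;

static void example_fake_write(ExampleRegs *self, uint32_t index, uint32_t value)
{
    ExampleFakeDevice *d = (ExampleFakeDevice *)self;
    if (index == SVGA_REG_ID && value <= d->newest_supported) {
        d->id = value;
    }
}

static uint32_t example_fake_read(ExampleRegs *self, uint32_t index)
{
    ExampleFakeDevice *d = (ExampleFakeDevice *)self;
    return index == SVGA_REG_ID ? d->id : 0;
}
```

The loop terminates because the candidate ID strictly decreases and the
search stops at min_id.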
Map the frame buffer and the command FIFO
Read SVGA_REG_FB_START, SVGA_REG_FB_SIZE, SVGA_REG_MEM_START,
SVGA_REG_MEM_SIZE.
Map the frame buffer (FB) and the FIFO memory (MEM).
This step must occur after the version negotiation above, since by
default the device is in a legacy-compatibility mode in which there
is no command FIFO.
Get the device capabilities and frame buffer dimensions
Read SVGA_REG_CAPABILITIES, SVGA_REG_MAX_WIDTH, SVGA_REG_MAX_HEIGHT,
and SVGA_REG_HOST_BITS_PER_PIXEL / SVGA_REG_BITS_PER_PIXEL.
Note: The capabilities can and do change without the PCI device ID
changing or the SVGA_REG_ID changing. A driver should always check
the capabilities register when loading before expecting any
capabilities-determined feature to be available. See below for a list
of capabilities as of this writing.
Note: If SVGA_CAP_8BIT_EMULATION is not set, then it is possible that
SVGA_REG_HOST_BITS_PER_PIXEL does not exist and
SVGA_REG_BITS_PER_PIXEL should be read instead.
Optional: Report the Guest Operating System
Write SVGA_REG_GUEST_ID with the appropriate value from .
While not required in any way, this is useful information for the
virtual machine to have available for reporting and sanity checking
purposes.
SetMode
Set SVGA_REG_WIDTH, SVGA_REG_HEIGHT, SVGA_REG_BITS_PER_PIXEL
Read SVGA_REG_FB_OFFSET
(SVGA_REG_FB_OFFSET is the offset from SVGA_REG_FB_START of the
visible portion of the frame buffer)
Read SVGA_REG_BYTES_PER_LINE, SVGA_REG_DEPTH, SVGA_REG_PSEUDOCOLOR,
SVGA_REG_RED_MASK, SVGA_REG_GREEN_MASK, SVGA_REG_BLUE_MASK
Note: SVGA_REG_BITS_PER_PIXEL is read-only if
SVGA_CAP_8BIT_EMULATION is not set in the capabilities register. Even
if it is set, values other than 8 and SVGA_REG_HOST_BITS_PER_PIXEL
will be ignored.
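A minimal sketch of the SetMode sequence follows. The register indices
are assumed to mirror svga_reg.h and should be verified against the
header; the accessor types, helper names, and the device model are
hypothetical. Note that the sketch reads SVGA_REG_BITS_PER_PIXEL back
after writing it, because the write may be ignored as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Register indices assumed to mirror svga_reg.h; verify against the header. */
enum {
    SVGA_REG_WIDTH          = 2,
    SVGA_REG_HEIGHT         = 3,
    SVGA_REG_BITS_PER_PIXEL = 7,
    SVGA_REG_BYTES_PER_LINE = 12,
    SVGA_REG_FB_OFFSET      = 14
};

/* Hypothetical register accessors supplied by the caller. */
typedef uint32_t (*ExampleReadReg)(void *dev, uint32_t index);
typedef void     (*ExampleWriteReg)(void *dev, uint32_t index, uint32_t value);

typedef struct {
    uint32_t bits_per_pixel;   /* value the device actually selected */
    uint32_t bytes_per_line;
    uint32_t fb_offset;        /* relative to SVGA_REG_FB_START */
} ExampleMode;

static ExampleMode example_set_mode(void *dev, ExampleReadReg rd,
                                    ExampleWriteReg wr, uint32_t width,
                                    uint32_t height, uint32_t bpp)
{
    ExampleMode m;

    assert(width > 0 && height > 0);
    wr(dev, SVGA_REG_WIDTH, width);
    wr(dev, SVGA_REG_HEIGHT, height);
    wr(dev, SVGA_REG_BITS_PER_PIXEL, bpp);   /* may be ignored */

    /* The device decides the layout, so read it back. */
    m.bits_per_pixel = rd(dev, SVGA_REG_BITS_PER_PIXEL);
    m.bytes_per_line = rd(dev, SVGA_REG_BYTES_PER_LINE);
    m.fb_offset      = rd(dev, SVGA_REG_FB_OFFSET);
    return m;
}

/* Byte offset of pixel (x, y) from SVGA_REG_FB_START. */
static uint32_t example_pixel_offset(const ExampleMode *m, uint32_t x, uint32_t y)
{
    return m->fb_offset + y * m->bytes_per_line + x * (m->bits_per_pixel / 8);
}

/* Toy device model. It reports a tightly packed pitch, which a real
 * device is not required to do. */
typedef struct { uint32_t regs[32]; } ExampleFakeDevice;

static void example_fake_write(void *dev, uint32_t index, uint32_t value)
{
    ExampleFakeDevice *d = dev;
    assert(index < 32);
    d->regs[index] = value;
    d->regs[SVGA_REG_BYTES_PER_LINE] =
        d->regs[SVGA_REG_WIDTH] * (d->regs[SVGA_REG_BITS_PER_PIXEL] / 8);
}

static uint32_t example_fake_read(void *dev, uint32_t index)
{
    ExampleFakeDevice *d = dev;
    assert(index < 32);
    return d->regs[index];
}
```

The pixel address calculation uses SVGA_REG_BYTES_PER_LINE rather than
width times bytes per pixel, since the two are not guaranteed to match.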
Enable SVGA
Set SVGA_REG_ENABLE to 1
(to disable SVGA, set SVGA_REG_ENABLE to 0. Setting SVGA_REG_ENABLE
to 0 also enables VGA.)
Initialize the command FIFO
The FIFO is exclusively dword (32-bit) aligned. The first four
dwords define the portion of the MEM area that is used for the
command FIFO. These values are all byte offsets from the
start of the MEM area.
A minimum sized FIFO would have these values:
mem[SVGA_FIFO_MIN] = 16;
mem[SVGA_FIFO_MAX] = 16 + (10 * 1024);
mem[SVGA_FIFO_NEXT_CMD] = 16;
mem[SVGA_FIFO_STOP] = 16;
Various addresses near the beginning of the FIFO are defined as
"FIFO registers" with special meaning. If the driver wishes to
take advantage of the special meaning of these addresses rather
than using them as part of the command FIFO, the driver must
reserve space for these registers when setting up the FIFO.
Typically the driver will set MIN to SVGA_FIFO_NUM_REGS*4.
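The FIFO setup described above can be sketched as a small helper. It
assumes that the four control registers occupy dword indices 0 through
3, in the order listed, and that the whole of the remaining MEM area is
used as command space. The helper name and its num_regs parameter are
hypothetical; a driver that wants all FIFO registers would pass
SVGA_FIFO_NUM_REGS.

```c
#include <assert.h>
#include <stdint.h>

/* Dword indices of the FIFO control registers (assumed to match svga_reg.h). */
enum {
    SVGA_FIFO_MIN      = 0,
    SVGA_FIFO_MAX      = 1,
    SVGA_FIFO_NEXT_CMD = 2,
    SVGA_FIFO_STOP     = 3
};

/*
 * Partition FIFO memory into register space and command space.
 *
 *   mem       start of the mapped MEM area
 *   mem_size  size of the MEM area in bytes (SVGA_REG_MEM_SIZE)
 *   num_regs  number of dwords to reserve for FIFO registers
 */
static void example_fifo_init(volatile uint32_t *mem, uint32_t mem_size,
                              uint32_t num_regs)
{
    uint32_t min = num_regs * 4u;

    assert(num_regs >= 4);            /* the control registers themselves */
    assert(mem_size % 4u == 0);       /* the FIFO is dword aligned */
    assert(mem_size > min + 4u);      /* room for at least one command word */

    mem[SVGA_FIFO_MIN]      = min;
    mem[SVGA_FIFO_MAX]      = mem_size;
    mem[SVGA_FIFO_NEXT_CMD] = min;    /* NEXT_CMD == STOP: FIFO is empty */
    mem[SVGA_FIFO_STOP]     = min;
}
```

All four values are byte offsets, even though the registers themselves
are addressed by dword index.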
Report the guest 3D version
If your driver supports 3D, write the latest supported 3D
version (SVGA3D_HWVERSION_CURRENT) to the
SVGA_FIFO_GUEST_3D_HWVERSION register.
Enable the command FIFO
Set SVGA_REG_CONFIG_DONE to 1 after these values have been set.
Note: Setting SVGA_REG_CONFIG_DONE to 0 will stop the device from
reading the FIFO until it is reinitialized and SVGA_REG_CONFIG_DONE is
set to 1 again.
3. SVGA command FIFO protocol
The FIFO is empty when SVGA_FIFO_NEXT_CMD == SVGA_FIFO_STOP. The
driver writes commands to the FIFO starting at the offset specified
by SVGA_FIFO_NEXT_CMD, and then increments SVGA_FIFO_NEXT_CMD.
The FIFO is full when SVGA_FIFO_NEXT_CMD is one word before SVGA_FIFO_STOP.
When the FIFO becomes full, the driver must wait for space to become
available. It can do this via various methods (busy-wait, legacy sync)
but the preferred method is to use the FIFO_PROGRESS interrupt.
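The simplest form of the protocol, writing one word at a time, might
look like the sketch below. It assumes that the command space is a ring
buffer in which SVGA_FIFO_NEXT_CMD wraps from SVGA_FIFO_MAX back to
SVGA_FIFO_MIN, and that the control registers sit at dword indices 0
through 3. The function name is hypothetical. A real driver would also
need memory barriers and a way to wait when the FIFO is full; see
svga.c for the complete algorithm.

```c
#include <assert.h>
#include <stdint.h>

/* Dword indices of the FIFO control registers (assumed to match svga_reg.h). */
enum {
    SVGA_FIFO_MIN      = 0,
    SVGA_FIFO_MAX      = 1,
    SVGA_FIFO_NEXT_CMD = 2,
    SVGA_FIFO_STOP     = 3
};

/*
 * Try to append one dword to the FIFO. Returns 1 on success, or 0 if
 * the FIFO is full and the caller must wait for the device to make
 * progress.
 */
static int example_fifo_try_write(volatile uint32_t *mem, uint32_t word)
{
    uint32_t min  = mem[SVGA_FIFO_MIN];
    uint32_t max  = mem[SVGA_FIFO_MAX];
    uint32_t next = mem[SVGA_FIFO_NEXT_CMD];
    uint32_t stop = mem[SVGA_FIFO_STOP];
    uint32_t after;

    assert(next % 4u == 0 && next >= min && next < max);

    /* Offset NEXT_CMD would have after this write, with wraparound. */
    after = next + 4u;
    if (after == max) {
        after = min;
    }

    /* Full: NEXT_CMD is one word before STOP. */
    if (after == stop) {
        return 0;
    }

    mem[next / 4u] = word;            /* store the command word */
    mem[SVGA_FIFO_NEXT_CMD] = after;  /* then publish it to the device */
    return 1;
}
```

Because one word is always left unused to distinguish a full FIFO from
an empty one, a command space of N words holds at most N - 1 words.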
The SVGA device does not guarantee that all of FIFO memory is valid
at all times. The device is free to discard the contents of any memory
which is not part of the active portion of the FIFO. The active portion
of the FIFO is defined as the region with valid commands (starting
at SVGA_FIFO_STOP and ending at SVGA_FIFO_NEXT_CMD) plus the reserved
portion of the FIFO.
By default, only one word of memory is 'reserved'. If the FIFO supports
the SVGA_FIFO_CAP_RESERVE capability, the device supports reserving
driver-defined amounts of memory. If both the device and driver support
this operation, it's possible to write multiple words of data between
updates to the FIFO control registers.
The simplest way to use the FIFO is to write one word at a time, but the
highest-performance way to use the FIFO is to reserve enough space for
an entire command or group of commands, write the commands directly to
FIFO memory, then "commit" the command(s) by updating the FIFO control
registers.
A reference implementation of this reserve/commit algorithm is provided
in svga.c, in SVGA_FIFOReserve() and SVGA_FIFOCommit(). In the common
case, this algorithm lets drivers assemble commands directly in FIFO
memory without any additional copies or memory allocation.
4. Synchronization
The primary synchronization primitive defined by the SVGA device is
"Sync to fence". A "fence" is a numbered marker inserted into the FIFO
command stream. The driver can insert fences at any time, and efficiently
determine the value of the last fence processed by the device.
"Sync to fence" is the process of waiting for a particular fence to be
processed. This may be important for several reasons:
- Flow control. For interactivity, it is important to put an upper
limit on the amount by which the device may lag the application.
- Waiting for DMA completion. If the driver needs to recycle a DMA
buffer or complete a DMA operation synchronously, it must sync
to a fence which occurred after the DMA operation in the command
stream.
- Waiting for accelerated 2D operations. If a 2D driver needs to
write to a portion of the framebuffer which is affected by
an accelerated blit, it should sync to a fence which occurred
after the blit.
There are multiple possible implementations of Sync to Fence, depending
on the capabilities of the SVGA device you're driving. Very old versions
of the VMware SVGA device did not support fences at all. For these
devices, you must always perform a "legacy sync". New virtual machines
with Workstation 6.5 virtual hardware or later support an IRQ-driven
sync operation. For all other versions of the SVGA device, the best
approach is a hybrid in which you synchronously use the SYNC/BUSY
registers to process the FIFO until the sync has passed.
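Independently of how the wait itself is implemented, a driver needs to
allocate fence numbers and to test whether a given fence has been
processed. The sketch below shows one way to do both. It assumes that
fences are 32-bit values which may wrap around, that the value 0 is
kept free to mean "no fence", and that the driver never has more than
2^31 fences outstanding. These are assumptions of this sketch, not
statements about the device; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Allocate the next fence number, skipping 0 when the counter wraps. */
static uint32_t example_next_fence(uint32_t *counter)
{
    assert(counter != 0);

    *counter += 1u;
    if (*counter == 0) {
        *counter = 1u;
    }
    return *counter;
}

/*
 * Return 1 if 'fence' has been processed, given the last fence value
 * reported by the device. The signed difference keeps the comparison
 * correct across 32-bit wraparound.
 */
static int example_fence_passed(uint32_t last_processed, uint32_t fence)
{
    return (int32_t)(last_processed - fence) >= 0;
}
```

A "Sync to fence" operation would repeat the example_fence_passed()
check between waits, using whichever wait mechanism the device
supports.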
FIFO synchronization is a very complex topic, and it isn't covered fully
by this document. Please see the synchronization-related comments in
svga_reg.h, and the reference implementation of these primitives in
svga.c.
5. Cursor
When SVGA_CAP_CURSOR is set, hardware cursor support is available. In
practice, SVGA_CAP_CURSOR will only be set when SVGA_CAP_CURSOR_BYPASS is
also set, so drivers supporting a hardware cursor need only check
SVGA_CAP_CURSOR_BYPASS and should only use the FIFO to define the cursor.
See below for more information.
6. Pseudocolor
When the read-only register SVGA_REG_PSEUDOCOLOR is 1, the device is in a
colormapped mode whose index width and color width are both SVGA_REG_DEPTH.
Thus far, 8 is the only depth at which pseudocolor is ever used.
In pseudocolor, the colormap is programmed by writing to the SVGA palette
registers. These start at SVGA_PALETTE_BASE and are interpreted as
follows:
SVGA_PALETTE_BASE + 3*n - The nth red component
SVGA_PALETTE_BASE + 3*n + 1 - The nth green component
SVGA_PALETTE_BASE + 3*n + 2 - The nth blue component
And n ranges from 0 to ((1<