A night's worth of work: documented EDRAM. Seems mostly right.

This commit is contained in:
Ben Vanik 2016-02-19 10:38:11 -08:00
parent 1dcc84a472
commit 8820c73532
1 changed files with 127 additions and 0 deletions

View File

@ -31,6 +31,133 @@ namespace vulkan {
// This allows us to have the same base address write to the same memory
// regardless of framebuffer format. Resolving then uses whatever format the
// resolve requests straight from the backing memory.
//
// EDRAM is a beast and we only approximate it as best we can. Basically,
// the 10MiB of EDRAM is composed of 2048 5120b tiles. Each tile is 80x16px.
// +-----+-----+-----+---
// |tile0|tile1|tile2|... 2048 times
// +-----+-----+-----+---
// Operations dealing with EDRAM deal in tile offsets, so base 0x100 is tile
// offset 256, 256*5120=1310720b into the buffer. All rendering operations are
// aligned to tiles so trying to draw at 256px wide will have a real width of
// 320px by rounding up to the next tile.
//
// MSAA and other settings will modify the exact pixel sizes, like 4X makes
// each tile effectively 40x8px, but they are still all 5120b. As we try to
// emulate this we adjust our viewport when rendering to stretch pixels as
// needed.
//
// The good news is that games cannot read EDRAM directly but must use a copy
// operation to get the data out. That gives us a chance to do whatever we
// need to (re-tile, etc) only when requested.
//
// To approximate the tiled EDRAM layout we use a single large chunk of memory.
// From this memory we create many VkImages (and VkImageViews) of various
// formats and dimensions as requested by the game. These are used as
// attachments during rendering and as sources during copies. They are also
// heavily aliased - lots of images will reference the same locations in the
// underlying EDRAM buffer. The only requirement is that there are no hazards
// with specific tiles (reading/writing the same tile through different images)
// and otherwise it should be ok *fingers crossed*.
//
// One complication is the copy/resolve process itself: we need to give back
// the data asked for in the format desired and where it goes is arbitrary
// (any address in physical memory). If the game is good we get resolves of
// EDRAM into fixed base addresses with scissored regions. If the game is bad
// we are broken.
//
// Resolves from EDRAM result in tiled textures - that's texture tiles, not
// EDRAM tiles. If we wanted to ensure byte-for-byte correctness we'd need to
// then tile the images as we wrote them out. For now, we just attempt to
// get the (X, Y) in linear space and do that. This really comes into play
// when multiple resolves write to the same texture or memory aliased by
// multiple textures - which is common due to predicated tiling. The examples
// below demonstrate what this looks like, but the important thing is that
// we are aware of partial textures and overlapping regions.
//
// TODO(benvanik): what, if any, barriers do we need? any transitions?
//
// Example with multiple render targets:
// Two color targets of 256x256px tightly packed in EDRAM:
// color target 0: base 0x0, pitch 320, scissor 0,0, 256x256
// starts at tile 0, buffer offset 0
// contains 64 tiles (320/80)*(256/16)
// color target 1: base 0x40, pitch 320, scissor 256,0, 256x256
// starts at tile 64 (after color target 0), buffer offset 327680b
// contains 64 tiles
// In EDRAM each set of 64 tiles is contiguous:
// +------+------+ +------+------+------+
// |ct0.0 |ct0.1 |...|ct0.63|ct1.0 |ct1.1 |...
// +------+------+ +------+------+------+
// To render into these, we setup two VkImages:
// image 0: bound to buffer offset 0, 320x256x4=327680b
// image 1: bound to buffer offset 327680b, 320x256x4=327680b
// So when we render to them:
// +------+-+ scissored to 256x256, actually 320x256
// | . | | <- . appears at some untiled offset in the buffer, but
// | | | consistent if aliased with the same format
// +------+-+
// In theory, this gives us proper aliasing in most cases.
//
// Example with horizontal predicated tiling:
// Trying to render 1024x576 @4X MSAA, splitting into two regions
// horizontally:
// +----------+
// | 1024x288 |
// +----------+
// | 1024x288 |
// +----------+
// EDRAM configured for 1056x288px with tile size 2112x567px (4X MSAA):
// color target 0: base 0x0, pitch 1080, 26x36 tiles
// First render (top):
// window offset 0,0
// scissor 0,0, 1024x288
// First resolve (top):
// RB_COPY_DEST_BASE 0x1F45D000
// RB_COPY_DEST_PITCH pitch=1024, height=576
// vertices: 0,0, 1024,0, 1024,288
// Second render (bottom):
// window offset 0,-288
// scissor 0,288, 1024x288
// Second resolve (bottom):
// RB_COPY_DEST_BASE 0x1F57D000 (+1179648b)
// RB_COPY_DEST_PITCH pitch=1024, height=576
// (exactly 1024x288*4b after first resolve)
// vertices: 0,288, 1024,288, 1024,576
// Resolving here is easy as the textures are contiguous in memory. We can
// snoop in the first resolve with the dest height to know the total size,
// and in the second resolve see that it overlaps and place it in the
// existing target.
//
// Example with vertical predicated tiling:
// Trying to render 1280x720 @2X MSAA, splitting into two regions
// vertically:
// +-----+-----+
// | 640 | 640 |
// | x | x |
// | 720 | 720 |
// +-----+-----+
// EDRAM configured for 640x736px with tile size 640x1472px (2X MSAA):
// color target 0: base 0x0, pitch 640, 8x92 tiles
// First render (left):
// window offset 0,0
// scissor 0,0, 640x720
// First resolve (left):
// RB_COPY_DEST_BASE 0x1BC6D000
// RB_COPY_DEST_PITCH pitch=1280, height=720
// vertices: 0,0, 640,0, 640,720
// Second render (right):
// window offset -640,0
// scissor 640,0, 640x720
// Second resolve (right):
// RB_COPY_DEST_BASE 0x1BC81000 (+81920b)
// RB_COPY_DEST_PITCH pitch=1280, height=720
// vertices: 640,0, 1280,0, 1280,720
// Resolving here is much more difficult as resolves are tiled and the right
// half of the texture is 81920b away:
// 81920/4bpp=20480px, /32 (texture tile size)=640px
// We know the texture size with the first resolve and with the second we
// must check for overlap then compute the offset (in both X and Y).
class RenderCache {
public:
RenderCache(RegisterFile* register_file, ui::vulkan::VulkanDevice* device);