A night's worth of work: documented EDRAM. Seems mostly right.
parent 1dcc84a472
commit 8820c73532
@@ -31,6 +31,133 @@ namespace vulkan {
// This allows us to have the same base address write to the same memory
// regardless of framebuffer format. Resolving then uses whatever format the
// resolve requests straight from the backing memory.
//
// EDRAM is a beast and we only approximate it as best we can. Basically,
// the 10MiB of EDRAM is composed of 2048 tiles of 5120b (bytes) each. Each
// tile is 80x16px at 32bpp.
// +-----+-----+-----+---
// |tile0|tile1|tile2|... 2048 times
// +-----+-----+-----+---
// Operations dealing with EDRAM deal in tile offsets, so base 0x100 is tile
// offset 256, 256*5120=1310720b into the buffer. All rendering operations are
// aligned to tiles so trying to draw at 256px wide will have a real width of
// 320px by rounding up to the next tile.
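As a rough illustration of the tile arithmetic above, the following sketch converts a tile base to a byte offset and rounds a width up to whole tiles. The constants come from the comment; the helper names are hypothetical and are not part of RenderCache.

```cpp
#include <cassert>
#include <cstdint>

// Constants taken from the comment: 2048 tiles, 5120 bytes each, 80px wide.
constexpr uint32_t kEdramTileCount = 2048;
constexpr uint32_t kEdramTileBytes = 5120;
constexpr uint32_t kEdramTileWidthPx = 80;

// Hypothetical helper: byte offset into the EDRAM buffer for a tile base.
inline uint32_t EdramTileToByteOffset(uint32_t tile_base) {
  assert(tile_base < kEdramTileCount);
  return tile_base * kEdramTileBytes;
}

// Hypothetical helper: real surface width after rounding up to whole tiles.
inline uint32_t RoundWidthUpToTiles(uint32_t width_px) {
  const uint32_t tiles = (width_px + kEdramTileWidthPx - 1) / kEdramTileWidthPx;
  return tiles * kEdramTileWidthPx;
}
```

With these, base 0x100 maps to byte offset 1310720 and a 256px wide draw gets a real width of 320px, matching the numbers in the comment.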
//
// MSAA and other settings will modify the exact pixel sizes, like 4X makes
// each tile effectively 40x8px, but they are still all 5120b. As we try to
// emulate this we adjust our viewport when rendering to stretch pixels as
// needed.
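A minimal sketch of how the effective tile extent might be derived per MSAA mode. The 4X value is from the comment; the 2X value (height halved only) is an inference from the 2X example later in this comment, where a 640x736px surface spans 640x1472 samples. The enum, struct, and function names are hypothetical, and 32bpp is assumed.

```cpp
#include <cassert>
#include <cstdint>

enum class MsaaMode { k1X, k2X, k4X };

struct TileExtent {
  uint32_t width_px;
  uint32_t height_px;
};

inline uint32_t SampleCount(MsaaMode mode) {
  switch (mode) {
    case MsaaMode::k2X: return 2;
    case MsaaMode::k4X: return 4;
    default: return 1;
  }
}

// Effective pixels covered by one EDRAM tile (assumption: 2X halves height
// only, 4X halves both width and height).
inline TileExtent EffectiveTileExtent(MsaaMode mode) {
  switch (mode) {
    case MsaaMode::k2X: return {80, 8};
    case MsaaMode::k4X: return {40, 8};
    default: return {80, 16};
  }
}

// Bytes per tile at 32bpp; expected to be 5120 for every mode.
inline uint32_t TileBytes(MsaaMode mode) {
  const TileExtent e = EffectiveTileExtent(mode);
  const uint32_t bytes = e.width_px * e.height_px * SampleCount(mode) * 4;
  assert(bytes == 5120);
  return bytes;
}
```

Whatever the mode, the product of pixels, samples, and bytes per sample stays at 5120, which is why the tile size in bytes never changes.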
//
// The good news is that games cannot read EDRAM directly but must use a copy
// operation to get the data out. That gives us a chance to do whatever we
// need to (re-tile, etc.) only when requested.
//
// To approximate the tiled EDRAM layout we use a single large chunk of memory.
// From this memory we create many VkImages (and VkImageViews) of various
// formats and dimensions as requested by the game. These are used as
// attachments during rendering and as sources during copies. They are also
// heavily aliased - lots of images will reference the same locations in the
// underlying EDRAM buffer. The only requirement is that there are no hazards
// with specific tiles (reading/writing the same tile through different images)
// and otherwise it should be ok *fingers crossed*.
//
// One complication is the copy/resolve process itself: we need to give back
// the data asked for in the format desired and where it goes is arbitrary
// (any address in physical memory). If the game is good we get resolves of
// EDRAM into fixed base addresses with scissored regions. If the game is bad
// we are broken.
//
// Resolves from EDRAM result in tiled textures - that's texture tiles, not
// EDRAM tiles. If we wanted to ensure byte-for-byte correctness we'd need to
// then tile the images as we wrote them out. For now, we just attempt to
// compute the (X, Y) in linear space and write there. This really comes into
// play when multiple resolves write to the same texture or memory aliased by
// multiple textures - which is common due to predicated tiling. The examples
// below demonstrate what this looks like, but the important thing is that
// we are aware of partial textures and overlapping regions.
//
// TODO(benvanik): what, if any, barriers do we need? any transitions?
//
// Example with multiple render targets:
//   Two color targets of 256x256px tightly packed in EDRAM:
//     color target 0: base 0x0, pitch 320, scissor 0,0, 256x256
//       starts at tile 0, buffer offset 0
//       contains 64 tiles (320/80)*(256/16)
//     color target 1: base 0x40, pitch 320, scissor 256,0, 256x256
//       starts at tile 64 (after color target 0), buffer offset 327680b
//       contains 64 tiles
//   In EDRAM each set of 64 tiles is contiguous:
//     +------+------+   +------+------+------+
//     |ct0.0 |ct0.1 |...|ct0.63|ct1.0 |ct1.1 |...
//     +------+------+   +------+------+------+
//   To render into these, we set up two VkImages:
//     image 0: bound to buffer offset 0, 320x256x4=327680b
//     image 1: bound to buffer offset 327680b, 320x256x4=327680b
//   So when we render to them:
//     +------+-+ scissored to 256x256, actually 320x256
//     | .    | | <- . appears at some untiled offset in the buffer, but
//     |      | |    consistent if aliased with the same format
//     +------+-+
//   In theory, this gives us proper aliasing in most cases.
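The packing in this example can be sketched as follows. This is an illustration only, assuming 32bpp targets without MSAA and a pitch that is already a multiple of 80px; `TargetDesc`, `TargetPlacement`, and `PackTargets` are hypothetical names, not part of the actual cache.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct TargetDesc {
  uint32_t pitch_px;   // already rounded up to a multiple of 80
  uint32_t height_px;
};

struct TargetPlacement {
  uint32_t tile_base;    // EDRAM base in tiles
  uint32_t tile_count;   // tiles covered by the target
  uint32_t byte_offset;  // offset into the backing buffer
  uint32_t byte_size;    // size of the image bound at that offset
};

// Packs targets back to back in EDRAM, one after another.
inline std::vector<TargetPlacement> PackTargets(
    const std::vector<TargetDesc>& targets) {
  std::vector<TargetPlacement> placements;
  uint32_t next_tile = 0;
  for (const TargetDesc& target : targets) {
    assert(target.pitch_px % 80 == 0);
    const uint32_t tiles_x = target.pitch_px / 80;
    const uint32_t tiles_y = (target.height_px + 15) / 16;
    const uint32_t tile_count = tiles_x * tiles_y;
    placements.push_back(
        {next_tile, tile_count, next_tile * 5120, tile_count * 5120});
    next_tile += tile_count;
  }
  return placements;
}
```

For two 320x256 targets this yields bases 0x0 and 0x40 with 64 tiles and 327680b each, as in the example.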
//
// Example with horizontal predicated tiling:
//   Trying to render 1024x576 @4X MSAA, splitting into two regions
//   horizontally:
//     +----------+
//     | 1024x288 |
//     +----------+
//     | 1024x288 |
//     +----------+
//   EDRAM configured for 1056x288px with tile size 2112x576px (4X MSAA):
//     color target 0: base 0x0, pitch 1080, 26x36 tiles
//   First render (top):
//     window offset 0,0
//     scissor 0,0, 1024x288
//   First resolve (top):
//     RB_COPY_DEST_BASE 0x1F45D000
//     RB_COPY_DEST_PITCH pitch=1024, height=576
//     vertices: 0,0, 1024,0, 1024,288
//   Second render (bottom):
//     window offset 0,-288
//     scissor 0,288, 1024x288
//   Second resolve (bottom):
//     RB_COPY_DEST_BASE 0x1F57D000 (+1179648b)
//     RB_COPY_DEST_PITCH pitch=1024, height=576
//       (exactly 1024x288*4b after first resolve)
//     vertices: 0,288, 1024,288, 1024,576
//   Resolving here is easy as the textures are contiguous in memory. We can
//   snoop in the first resolve with the dest height to know the total size,
//   and in the second resolve see that it overlaps and place it in the
//   existing target.
//
// Example with vertical predicated tiling:
//   Trying to render 1280x720 @2X MSAA, splitting into two regions
//   vertically:
//     +-----+-----+
//     | 640 | 640 |
//     |  x  |  x  |
//     | 720 | 720 |
//     +-----+-----+
//   EDRAM configured for 640x736px with tile size 640x1472px (2X MSAA):
//     color target 0: base 0x0, pitch 640, 8x92 tiles
//   First render (left):
//     window offset 0,0
//     scissor 0,0, 640x720
//   First resolve (left):
//     RB_COPY_DEST_BASE 0x1BC6D000
//     RB_COPY_DEST_PITCH pitch=1280, height=720
//     vertices: 0,0, 640,0, 640,720
//   Second render (right):
//     window offset -640,0
//     scissor 640,0, 640x720
//   Second resolve (right):
//     RB_COPY_DEST_BASE 0x1BC81000 (+81920b)
//     RB_COPY_DEST_PITCH pitch=1280, height=720
//     vertices: 640,0, 1280,0, 1280,720
//   Resolving here is much more difficult as resolves are tiled and the right
//   half of the texture is 81920b away:
//     81920/4bpp=20480px, /32 (texture tile size)=640px
//   We know the texture size with the first resolve and with the second we
//   must check for overlap then compute the offset (in both X and Y).
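One way the offset computation for both predicated tiling examples might look, as a hedged sketch. It assumes 32x32px texture tiles stored in plain row-major tile order, which is a simplification of the real tiled address layout; `TexelOffset` and `ResolveDeltaToOffset` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct TexelOffset {
  uint32_t x;
  uint32_t y;
};

// Maps the byte distance between two resolve destinations to an (X, Y)
// offset inside the destination texture.
// Assumption: 32x32 texture tiles laid out row-major across the pitch.
inline TexelOffset ResolveDeltaToOffset(uint32_t delta_bytes,
                                        uint32_t pitch_px,
                                        uint32_t bytes_per_pixel) {
  const uint32_t kTextureTileDim = 32;
  assert(pitch_px % kTextureTileDim == 0);
  const uint32_t tile_bytes =
      kTextureTileDim * kTextureTileDim * bytes_per_pixel;
  assert(delta_bytes % tile_bytes == 0);
  const uint32_t tile_index = delta_bytes / tile_bytes;
  const uint32_t tiles_per_row = pitch_px / kTextureTileDim;
  return {(tile_index % tiles_per_row) * kTextureTileDim,
          (tile_index / tiles_per_row) * kTextureTileDim};
}
```

Under that assumption the vertical example (+81920b, pitch 1280) lands at (640, 0) and the horizontal example (+1179648b, pitch 1024) lands at (0, 288), which agrees with the resolve vertices listed above.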
class RenderCache {
 public:
  RenderCache(RegisterFile* register_file, ui::vulkan::VulkanDevice* device);