* nothing works yet
* don't double buffer 3D framebuffers for the GL Renderer
looks like leftovers from when 3D+2D composition was done in the frontend
* oops
* it works!
* implement display capture for compute renderer
it's actually just all stolen from the regular OpenGL renderer
* fix bad indirect call
* handle cleanup properly
* add hires rendering to the compute shader renderer
* fix UB
also misc changes to use more unsigned multiplication
also fix framebuffer resize
* correct edge filling behaviour when AA is disabled
* fix full color textures
* fix edge marking (polygon id is 6-bit not 5)
also make the code a bit nicer
* take all edge cases into account for XMin/XMax calculation
* use hires coordinate again
* stop using fixed size buffers based on scale factor in shaders
this makes shader compile times tolerable on Wintel
- beginning of the shader cache
- increase size of tile idx in workdesc to 20 bits
* apparently & is not defined on bvec4
why does this even compile on Intel and Nvidia?
* put the texture cache into its own file
* add compute shader renderer properly to the GUI
also add option to toggle using high resolution vertex coordinates
* unbind sampler object in compute shader renderer
* fix GetRangedBitMask for 64-bit-aligned 64-bit ranges
pretty embarrassing
* convert NonStupidBitfield.h back to LF-only newlines
* actually adapt to latest changes
* fix stupid merge
* actually make compute shader renderer work with newest changes
* show progress on shader compilation
* remove merge leftover
* fix the pu region's end point overflowing
According to gericom it cannot overflow at all
* set a minimum and a better maximum for the pu region size
* fix pu logging
* PU regions with a size of 31 always take up the entire address space
also tweak some logging a little more
* start is actually force aligned by size, oops
* small tweaks
* hopefully more clear code
* math is for nerds
Precompute all 16 5-bit RGB palette colours into 8-bit RGBA to avoid
repeated and superfluous calculation within the nested loop at the
point of index lookup.
Timing with std::chrono::high_resolution_clock showed a speedup from
~7ms to a consistent 1ms (i.e. now practically instantaneous).
Also improve comprehensibility by using meaningful names for loop
counter variables where appropriate.
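
The palette precomputation described above could look roughly like the
sketch below. It is not the code from this change: the names
Expand5To8, Rgb555ToRgba8 and DecodeIndexed are hypothetical, and the
channel layout (red in the low 5 bits), the bit-replication expansion
and the opaque alpha are all assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand a 5-bit channel to 8 bits by replicating its top bits
// (assumed conversion; the real code may scale differently).
inline std::uint32_t Expand5To8(std::uint32_t c5)
{
    return (c5 << 3) | (c5 >> 2);
}

// Convert one 15-bit colour to RGBA packed as 0xAABBGGRR.
// Assumes red in bits 0-4, green in bits 5-9, blue in bits 10-14.
inline std::uint32_t Rgb555ToRgba8(std::uint16_t colour)
{
    std::uint32_t r = Expand5To8(colour & 0x1F);
    std::uint32_t g = Expand5To8((colour >> 5) & 0x1F);
    std::uint32_t b = Expand5To8((colour >> 10) & 0x1F);
    return r | (g << 8) | (b << 16) | (0xFFu << 24); // alpha forced opaque
}

// Decode an image of palette indices (one index per byte, 0..15).
std::vector<std::uint32_t> DecodeIndexed(const std::vector<std::uint8_t>& indices,
                                         std::size_t width, std::size_t height,
                                         const std::array<std::uint16_t, 16>& palette)
{
    // Convert all 16 entries once, outside the pixel loops.
    std::array<std::uint32_t, 16> paletteRgba{};
    for (std::size_t entry = 0; entry < paletteRgba.size(); entry++)
        paletteRgba[entry] = Rgb555ToRgba8(palette[entry]);

    std::vector<std::uint32_t> output(width * height);
    for (std::size_t row = 0; row < height; row++)
    {
        for (std::size_t column = 0; column < width; column++)
        {
            std::size_t pixel = row * width + column;
            // The inner loop is now a plain table lookup.
            output[pixel] = paletteRgba[indices[pixel] & 0xF];
        }
    }
    return output;
}
```

The point of the sketch is only that the colour conversion runs 16 times
per image instead of once per pixel; the loop counters are named after
what they index, in line with the naming note above.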