The significant change is what is now line 520 of TextureCacheBase.cpp:
((std::max(mipWidth, bsw) * std::max(mipHeight, bsh) * bsdepth) >> 1)
to
TexDecoder_GetTextureSizeInBytes(expanded_mip_width, expanded_mip_height, texformat);
Fixes issue 5328.
Fixes issue 5461.
Conflicts:
Source/Core/VideoCommon/Src/PixelShaderGen.cpp
Source/Plugins/Plugin_VideoSoftware/Src/SWRenderer.cpp
immediate-removal is a new created branch seperated from master but reverted the revert of immediate-removal
so we get less conflicts by merging
to use VAO, we must use VBO, so some legency code was removed:
- ARB_map_buffer_range must be available (OGL 3.0), don't call glBufferSubData if not
- ARB_draw_elements_base_vertex also (OGL 3.2), else we have to set the pointers every time
- USE_JIT was removed, it was broken and it isn't needed any more
And the index and vertex buffers are now synchronized, so that there will be one VAO per
NativeVertexFormat and Buffer.
should be done with VAO, but atm, this is not possible :-(
this also partial revert the fix in fb92c338af (activating texture0 globally).
Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
change naming in all the backends vertex managers to make more easy to continue with the merge an some future improvements.
please test this as i'm interested in knowing the performance in linux and windows with the different hardware platforms.
So to compensate lets bring back some speed to the emulation.
change a little the way the vertex are send to the gpu,
This first implementation changes dx9 a lot and dx11 a little to increase the parallelism between the cpu and gpu.
ogl: is my next step in ogl is a little more trickier so i have to take a little more time.
the original concept is Marcos idea, with my little touch to make it even more faster.
what to look for: SPEEEEEDDD :).
please test it a lot and let me know if you see any problem.
in dx9 the code is prepared to fall back to the previous implementation if your card does not support the amount of buffers needed.
So if you did not experience any speed gains you know where is the problem :).
for the ones with more experience and compression of the code please test changing the amount and size of the buffers to tune this for your specific machine.
The current values are the sweet spot for my machine.
All must Thanks Marcos, I hate him for giving good ideas when I'm full of work.
PS and VS making. Untested and won't work for now.
Add in program shader cache files.
Readd NativeVertexFormat stuffs.
Add in PS and VS cache things.
SetShaders in places.
Fixed EFB cache index computations in OpenGL renderer.
The previous computation was very likely to go out of array bounds,
which could result in crashes on EFB access.
Also, the cache size was rounded down instead of up. This is a problem
since EFB_HEIGHT (528) is not a multiple of EFB_CACHE_RECT_SIZE (64).
Reason:
- It's wrong, zcomploc can't be emulated perfectly in HW backends without severely impacting performance.
- It provides virtually no advantages over the previous hack while introducing lots of code.
- There is a better alternative: If people insist on having some sort of valid zcomploc emulation, I suggest rendering each primitive separately while using a _clean_ dual-pass approach to emulate zcomploc.
This reverts commit 0efd4e5c29.
This reverts commit b4ec836aca.
This reverts commit bb4c9e2205.
This reverts commit 146b02615c.