Yet another story of games loading weird shit into registers.
For some reason, Burnout 2 would (in rare situations) load invalid
addresses into cp_state.array_bases. What would the real hardware
do in this situation? Who knows, Burnout 2 doesn't actually enable
the vertex array with the invalid address so nothing kinky happens.
But dolphin tries to optimise things and starts using the address
as soon as it is loaded into memory. This causes GetPointer (which is
now much more vocal) to throw an error.
The Fix: We don't call GetPointer until we are sure the vertex array
has been enabled.
- FileSearch is now just one function, and it converts the original glob
into a regex on all platforms rather than relying on native Windows
pattern matching on there and a complete hack elsewhere. It now
supports recursion out of the box rather than manually expanding
into a full list of directories in multiple call sites.
- This adds a GCC >= 4.9 dependency due to older versions having
outright broken <regex>. MSVC is fine with it.
- ScanDirectoryTree returns the parent entry rather than filling parts
of it in via reference. The count is now stored in the entry like it
was for subdirectories.
- .glsl file search is now done with DoFileSearch.
- IOCTLV_READ_DIR now uses ScanDirectoryTree directly and sorts the
results after replacements for better determinism.
Through just returning the last written value sounds better, this crashes Paper Mario.
In my opinion, gfx issues are fine on older GPUs, but crashes should not happen.
There is no nice way to correctly "detect" the "used" memory, so we just say
we're fine to use 50% of the physical memory for custom textures.
This will fix out-of-memory crashes, but we still might run into swapping issues.
This was causing a race condition where the "absurdly large aux buffer"
panic alert would be triggered in the last bit of fifo processing on the
CPU thread in deterministic mode (i.e. netplay). SyncGPU is supposed to
move the auxiliary queue data to the beginning of the containing buffer
so we don't have to deal with wraparound; if GpuRunningState is false,
however, it just returns, because it's set to false by another thread -
thus it doesn't know whether RunGpuLoop is still executing (in which
case it can't just reset the pointers, because it may still be using the
buffer) or not (in which case the condition variable it normally waits
for to avoid the previous problem will never be signaled). However,
SyncGPU's caller PushFifoAuxBuffer wasn't aware of this, so if the
buffer was filling at just the right time, it'd stay full and that
function would complain that it was about to overflow it. Similar
problem with ReadDataFromFifoOnCPU afaik. Fix this by returning early
from those as well; other callers of SyncGPU should be safe. A
*slightly* cleaner alternative would be giving the CPU thread a way to
tell when RunGpuLoop has actually exited, but whatever, this works.
This drops the "feature" to load level 0 from the custom texture
and all other levels from the native one if the size matches.
But in my opinion, when a custom texture only provide one level,
no more should be used at all.
A number of games make an EFB copy in I4/I8 format, then use it as a
texture in C4/C8 format. Detect when this happens, and decode the copy on
the GPU using the specified palette.
This has a few advantages: it allows using EFB2Tex for a few more games,
it, it preserves the resolution of scaled EFB copies, and it's probably a
bit faster.
D3D only at the moment, but porting to OpenGL should be straightforward..
The obvious question here is, why does it matter if we round or truncate?
The key is that GC/Wii does fixed-point interpolation, where PC GPUs do
floating-point interpolation. Discarding fractional bits makes the conversion
from floating-point to fixed point give more consistent results.
I'm not confident this is really the right fix, or that my explanation is
completely correct; ideally, we don't want to depend on floating-point
interpolation at all.
This is the same trick which is used for Metroid's fonts/texts, but for all textures. If 2 different textures at the same address are loaded during the same frame, create a 2nd entry instead of overwriting the existing one. If the entry was overwritten in this case, there wouldn't be any caching, which results in a big performance drop.
The restriction to textures, which are loaded during the same frame, prevents creating lots of textures when textures are used in the regular way. This restriction is new. Overwriting textures, instead of creating new ones is faster, if the old ones are unlikely to be used again.
Since this would break efb copies, don't do it for efb copies.
Castlevania 3 goes from 80 fps to 115 fps for me.
There might be games that need a higher texture cache accuracy with this, but those games should also see a performance boost from this PR.
Some games, which use paletted textures, which are not efb copies, might be faster now. And also not require a higher texture cache accuracy anymore. (similar sitation as PR https://github.com/dolphin-emu/dolphin/pull/1916)
In nearly all direct loadstore cases we can use unscaled loadstores.
Still have a fallback in case we hit a situation that we /can't/ do a unscaled loadstore.
When enabled, the silent option will avoid popping up dialog boxes for
overwrite confirmation or codec selection. The codec selection defaults to
uncompressed RGB.
This is required for FifoCI on Windows which needs to drive Dolphin from the
command line exclusively.