read_size remained 0 when the "Read beyond end of disc" error occurs,
which made the while (nbytes) loop never end. As a fix, SeekToCluster
now explicitly sets available to 0 when the error occurs, and Read
checks for it.
As it is currently written, CallLambdaTrampoline does not return a
value. However, some of the functions that are being wrapped may
return a value that the JIT is expected to understand. A compiler
*cough cough clang* may opt to alter %rax after the wrapped lambda
returns, e.g. popping a previous value, which can clobber the
return value. If we actually have a return value, then the compiler
must not clobber it.
In a particular hashing heavy scene in Crazy Taxi the Murmur3 hash used 3.11% CPU time.
The new CRC32 hash in the same scene used 1.86%
This was tested on a Nvidia SHIELD Android TV with Cortex-A57s.
This will be a bit slower on the Nexus 9, the Denver CPU core is a bit slower with CRC32 texture hashing than Murmur3 texture hashing.
- Refactored Light Attenuation into inline function in Software Renderer
- Corrected zero length light direction vector to resolve with normal direction (essentially becomes LIGHTDIF_NONE which was what I was after)
- Change the API of this shared function to use points for output variables (degasus)
- Fixes remaining lighting issues (Mario Tennis, etc)
- Apply same fixes to Software Renderer
- Corrected zero length light direction vector to resolve with normal direction (essentially becomes LIGHTDIF_NONE which was what I was after)
The new implementation has 3 options:
SyncGpuMaxDistance
SyncGpuMinDistance
SyncGpuOverclock
The MaxDistance controlls how many CPU cycles the CPU is allowed to be in front
of the GPU. Too low values will slow down extremly, too high values are as
unsynchronized and half of the games will crash.
The -MinDistance (negative) set how many cycles the GPU is allowed to be in
front of the CPU. As we are used to emulate an infinitiv fast GPU, this may be
set to any high (negative) number.
The last parameter is to hack a faster (>1.0) or slower(<1.0) GPU. As we don't
emulate GPU timing very well (eg skip the timings of the pixel stage completely),
an overclock factor of ~0.5 is often much more accurate than 1.0
GC games with long names store two variations of the name in
opening.bnr. This makes the shorter of those names available.
For volumes other than GC discs, prefer_long is ignored.