This method doesn't involve messing around with the quirks of the x87
FPU and should be reasonably fast. As a bonus, it does the correct thing
for out-of-range doubles.
However, it is also a little slower and only benefits programs that rely
on undefined behavior so it is disabled for now.
Floating-point is complicated...
Some background: Denormals are floats that are too close to zero to be
stored in a normalized way (their exponent would need more bits). Since
they are stored unnormalized, they are hard to work with, even in
hardware. That's why both PowerPC and SSE can be configured to operate
in faster but non-standard-conpliant modes in which these numbers are
simply rounded ('flushed') to zero.
Internally, we do the same as the PowerPC CPU and store all floats in
double format. This means that for loading and storing singles we need a
conversion. The PowerPC CPU does this in hardware. We previously did
this using CVTSS2SD/CVTSD2SS. Unfortunately, these instructions are
considered arithmetic and therefore flush denormals to zero if non-IEEE
mode is active. This normally wouldn't be a problem since the next
arithmetic floating-point instruction would do the same anyway but as it
turns out some games actually use floating-point instructions for
copying arbitrary data.
My idea for fixing this problem was to use x87 instructions since the
x87 FPU never supported flush-to-zero and thus doesn't mangle denormals.
However, there is one more problem to deal with: SNaNs are automatically
converted to QNaNs (by setting the most-significant bit of the
fraction). I opted to fix this by manually resetting the QNaN bit of all
values with all-1s exponent.
VI isn't called as regular as we want to, so we have to create a new throttling event called regularly by coretiming.
Atm we throttle every 1 ms when we are too fast and skip throttling when we lack 40ms (to avoid fast boosts after slowdowns)
If there is an issue with a reported extension, disable it instead of failing out entirely.
Fixes an issue with buffer_storage that I had overlooked as well.
This changes from using logical and to bitwise and, which causes the compile time to drop from an absurd amount of time to around five seconds on my
crappy laptop.
The default async api allow us to set some latency options. The old one (simple API) was the lazy way to go for usual audio where latency doesn't matter.
This also streams audio, so it should be a bit faster then the old one.