This resolves a few issues with bounding box animations and others. Most noticably, it greatly reduces the bounding box slowdown seen on some NVIDIA cards and also fixes the odd overlay glitches when moving between rooms on the Excess Express.
The code was actually already rather well adapted for this.
We more or less just have to skip ParseDisc and run
ParsePartitionData directly. This required the PartitionHeader
struct to be removed (which wasn't that useful anyway).
If we start 31 KiB into a 32 KiB block and want to mark 2 KiB
of data as used, we need to mark 2 blocks as used, not just 1.
This problem is avoided when calling MarkAsUsed from
MarkAsUsedE, since MarkAsUsedE aligns to 32 KiB on its own.
Most calls to MarkAsUsed are from MarkAsUsedE, which is why
this hasn't been a noticeable problem in the past.
The constant DESIRED_BUFFER_SIZE was determined by multiplying the
old hardcoded value 32 with the default GCZ block size 16 KiB.
Not sure if it actually is the best value, but it seems fine.
Similar to what we do for addx. Since we're calculating b - a and
because subtraction is not communitative, we can only apply this when
source register a holds the constant.
Before:
45 8B EE mov r13d,r14d
41 83 ED 08 sub r13d,8
After:
45 8D 6E F8 lea r13d,[r14-8]
We can get away with skipping the addition when we know we're dealing
with a constant zero. Just a MOV will suffice in this case.
Once again, we don't bother to add separate handling for when overflow
is needed, because no titles would ever hit that path during my testing.
Before:
8B 7D F8 mov edi,dword ptr [rbp-8]
83 C7 00 add edi,0
After:
8B 7D F8 mov edi,dword ptr [rbp-8]
ADD has a smaller encoding for immediates that can be expressed as an
8-bit signed integer (in other words, between -128 and 127). MOV lacks
this compact representation.
Since addition allows us to swap the source registers, we can always get
the shortest sequence here by carefully checking if we're dealing with a
small immediate first. If we are, move the other source into the
destination and add the small immediate onto that. For large immediates
the reverse is preferrable.
Before:
41 BE 40 00 00 00 mov r14d,40h
44 03 75 A8 add r14d,dword ptr [rbp-58h]
After:
44 8B 75 A8 mov r14d,dword ptr [rbp-58h]
41 83 C6 40 add r14d,40h
Before:
44 8B 7D F8 mov r15d,dword ptr [rbp-8]
41 81 C7 00 68 00 CC add r15d,0CC006800h
After:
41 BF 00 68 00 CC mov r15d,0CC006800h
44 03 7D F8 add r15d,dword ptr [rbp-8]
When the source registers are a simple register and a constant zero and
overflow isn't needed, emitting LEA is kinda silly.
This will occasionally save a single byte for certain registers due to
how x86 encoding works. More importantly, LEA takes up execution
resources while MOV does not.
Before:
41 8D 7D 00 lea edi,[r13]
After:
41 8B FD mov edi,r13d
When the destination register matches a source register, the other
source register contains zero, and overflow isn't needed, the
instruction becomes a nop and we don't need to emit anything.
We could add specialized handling for the case where overflow is needed,
but none of the titles I tried would hit this path.
Before:
83 C7 00 add edi,0
After:
No functional change, just simplify some repeated logic in the case
where we're dealing with exactly one immediate and one simple register
when overflow isn't needed.
On some platforms (like Windows), the temporary file must be closed
before it can be renamed.
I guess nobody noticed this for so long because (1) the FS code has a
failsafe for missing FST entries (because existing users do not have
a FST), and most games do not care about file metadata;
(2) the write failures can only be seen in the logs.
Because we don't want this to break, I have turned the ERROR_LOGs into
PanicAlerts.