With the float instructions that only affect the lower 64 bits of the destination register, we need to make sure to load the full 128-bit register.
This ensures that we aren't saving garbage into the top 64 bits.
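Here is a minimal model of why this matters. It represents a 128-bit host register as two double lanes. `Reg128`, `LoadLow64`, `LoadFull128` and `EmulateScalarAdd` are hypothetical names, not Dolphin's JIT API. A scalar op writes only lane 0, and the whole register is written back afterwards, so whatever was loaded into lane 1 is what gets saved.

```cpp
#include <array>

// Hypothetical model: a 128-bit host register as two 64-bit lanes holding doubles.
using Reg128 = std::array<double, 2>;

// Scalar-style load: writes lane 0 only; lane 1 keeps its stale contents.
void LoadLow64(Reg128& host, const Reg128& guest) { host[0] = guest[0]; }

// Full 128-bit load: both lanes come from the guest register.
void LoadFull128(Reg128& host, const Reg128& guest) { host = guest; }

// Emulates a scalar add (only lane 0 changes) followed by a full 128-bit
// writeback of the host register to the guest register file.
Reg128 EmulateScalarAdd(Reg128 host, const Reg128& guest, double operand, bool full_load) {
  if (full_load)
    LoadFull128(host, guest);
  else
    LoadLow64(host, guest);
  host[0] += operand;  // lane 1 is untouched by the scalar op
  return host;         // whatever is in lane 1 gets saved
}
```

With `full_load == false`, a stale upper lane in `host` ends up in the saved result. With `full_load == true`, the guest's upper half survives the round trip.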
JitCommon is becoming a cluster of x86-specific things and things that are common to multiple recompilers.
This overlap is beginning to cause us issues.
Begin by breaking out the common ASM arrays into their own file and moving the x86-specific routines to their own folder.
This changes the behavior if both textures are available. The old code tried
to load the modified texID first; the new code tries the unmodified texID first.
Intellisense doesn't like defines in PCH files, and it doesn't like the deleted
constructor for BitField. (I think it's being overly strict about the
"must have no non-default constructors" rule for classes in unions.)
For floating loads with a known address, this eliminates the pattern of:
mov r12d, 80001014
mov rdx, r12d
mov rdx, dword ptr [rbp+rdx]
and generates a nice simple:
mov rdx, dword ptr [rbp+00001014]
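The decision the JIT makes can be sketched as follows. This assumes the displacement is the guest address with its top bit cleared, which is read off the 80001014 to 00001014 example above and is not a confirmed mapping. `FoldedDisplacement` and `kAssumedOffsetMask` are hypothetical names.

```cpp
#include <cstdint>
#include <optional>

// Assumption: the offset from the base register is the guest address with the
// top bit cleared, as the 80001014 -> 00001014 example suggests.
constexpr uint32_t kAssumedOffsetMask = 0x7FFFFFFF;

// Returns the displacement to encode directly in [rbp+disp] when the address is
// known at compile time, or nullopt when it must be computed into a register.
std::optional<int32_t> FoldedDisplacement(std::optional<uint32_t> known_address) {
  if (!known_address)
    return std::nullopt;  // fall back to the register-based sequence
  uint32_t offset = *known_address & kAssumedOffsetMask;
  // x86-64 displacements are signed 32-bit; after masking, offset <= 0x7FFFFFFF.
  return static_cast<int32_t>(offset);
}
```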
Instead of swapping each word of the ELF code section(s) while looking
for a match to our pattern, we swap the pattern just once (at
compile time) and test against our swapped pattern.
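Here is a sketch of the idea. It assumes a little-endian host reading big-endian PowerPC words and uses `blr` (0x4E800020) as a stand-in pattern. `Swap32`, `kPattern`, `kSwappedPattern` and `FindPattern` are hypothetical names. Because `Swap32` is `constexpr`, the compiler computes the swapped pattern, and the scan does one comparison per word with no swaps.

```cpp
#include <cstddef>
#include <cstdint>

// Byte swap usable in constant expressions.
constexpr uint32_t Swap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Hypothetical big-endian instruction word to search for (PowerPC "blr").
constexpr uint32_t kPattern = 0x4E800020;
// Swapped once, at compile time, instead of swapping every word in the section.
constexpr uint32_t kSwappedPattern = Swap32(kPattern);

// Scans raw, unswapped words as read from the ELF on a little-endian host.
std::ptrdiff_t FindPattern(const uint32_t* words, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i)
    if (words[i] == kSwappedPattern)
      return static_cast<std::ptrdiff_t>(i);
  return -1;  // not found
}
```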
* Don't claim to support any features we don't, like relocation
* Actually zero out BSS sections, as memory might not already be
zeroed.
* Deleted commented out code.
* Removed GetPointer, updated to more modern interface methods.
* Updated pointer types style from "u32 *x" to "u32* x"
* The file already exists, otherwise we wouldn't have gotten
this far in the boot.
* We have already checked whether it's a Wii or GameCube ELF;
besides, it's too late to change our minds now anyway.
* On Wii - Don't call EmulatedBS2; it can never succeed, as
it knows nothing about booting ELFs. Just call
SetupWiiMemory directly if needed.
* On GameCube - We still call EmulatedBS2_GC, but we stop
it from running the Apploader, which might boot something
unexpected from the default ISO or DVD root folder.
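The BSS zeroing is sketched here against program headers rather than sections for brevity. In ELF, a loadable segment's bytes beyond `p_filesz` up to `p_memsz` are defined to be zero. `Segment` and `LoadSegment` are hypothetical, and bounds checks are omitted for clarity.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified view of a loadable ELF segment.
struct Segment {
  uint32_t vaddr;   // destination address in guest memory
  uint32_t offset;  // where the initialized data starts in the file
  uint32_t filesz;  // bytes present in the file
  uint32_t memsz;   // bytes the segment occupies in memory (>= filesz)
};

// Copies the file-backed bytes, then explicitly zeroes the remainder (the BSS
// part), because guest memory is not guaranteed to be zeroed already.
void LoadSegment(std::vector<uint8_t>& ram, const std::vector<uint8_t>& file, const Segment& s) {
  std::memcpy(ram.data() + s.vaddr, file.data() + s.offset, s.filesz);
  if (s.memsz > s.filesz)
    std::memset(ram.data() + s.vaddr + s.filesz, 0, s.memsz - s.filesz);
}
```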
- Calculate ZSlope on every flush, but only set the PixelShader constant on Reset Buffer when zfreeze is enabled
- Fixed another Pixel Shader bug in D3D that was giving me grief
Results are still not correct, but things are getting closer.
* Don't cull CULLALL primitives so early so they can be used as reference
planes.
* Convert CalculateZSlope to screen-space coordinates.
* Convert the pixel shader to screen-space coordinates (instead of world-space
xy coordinates, which was totally wrong)
* Divide depth by 2^24 instead of clamping to 0.0-1.0 as was done
before.
Progress:
* Rogue Squadron 2/3 appear correct in game (videos in the RS2 save file
selection are missing)
* Shadows draw 100% correctly in NHL 2003.
* Mario Golf menu renders correctly.
* NFS: HP2 shadows sometimes render on top of the car or below the road.
* Mario Tennis courts and shadows render correctly, but at the wrong depth.
* Blood Omen 2 doesn't work.
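The screen-space Z slope can be illustrated by fitting a plane through a triangle's three screen-space vertices and reading dz/dx and dz/dy off the plane normal. This standard construction is offered as a sketch. `ScreenVertex`, `ZSlope`, `CalculateZSlopeSketch` and `NormalizeDepth24` are hypothetical names, not Dolphin's `CalculateZSlope`. The last function shows dividing depth by 2^24 in place of clamping.

```cpp
// Hypothetical vertex: x/y in screen space, z in 24-bit depth units.
struct ScreenVertex { float x, y, z; };

// Depth plane z(x, y) = z0 + dzdx * (x - x0) + dzdy * (y - y0).
struct ZSlope {
  float dzdx, dzdy;
  float x0, y0, z0;
  float At(float x, float y) const { return z0 + dzdx * (x - x0) + dzdy * (y - y0); }
};

// Fits a plane through three screen-space vertices; returns false for
// degenerate (zero-area) triangles.
bool CalculateZSlopeSketch(const ScreenVertex v[3], ZSlope* out) {
  float dx1 = v[1].x - v[0].x, dy1 = v[1].y - v[0].y, dz1 = v[1].z - v[0].z;
  float dx2 = v[2].x - v[0].x, dy2 = v[2].y - v[0].y, dz2 = v[2].z - v[0].z;
  // Plane normal (a, b, c) = cross product of the two edges.
  float a = dy1 * dz2 - dz1 * dy2;
  float b = dz1 * dx2 - dx1 * dz2;
  float c = dx1 * dy2 - dy1 * dx2;  // twice the signed screen-space area
  if (c == 0.0f)
    return false;
  *out = {-a / c, -b / c, v[0].x, v[0].y, v[0].z};
  return true;
}

// Maps a 24-bit depth value into [0, 1) by dividing by 2^24 instead of clamping.
float NormalizeDepth24(float z) { return z / 16777216.0f; }
```

For a triangle through (0,0,0), (1,0,2) and (0,1,3), this yields dz/dx = 2 and dz/dy = 3.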
Based on the feedback from pull request #1767, I have incorporated most of
degasus's suggestions.
I think we have a real winner here, as moving the code into a function in
VertexManagerBase has allowed OGL to use zfreeze now
:)
Correct use of the vertex pointer has also fixed most of the issues
that JMC47 reported in pull request #1767. For me, Mario Tennis now
works with no polygon spikes on the characters
anymore! Shadows are still an issue, here and probably in the other games
with shadow problems. Rebel Strike also seems better, but random skybox
glitches can show up.
Initial port of original zfreeze branch (3.5-1729) by neobrain into
most recent build of Dolphin.
Makes Rogue Squadron 2 very playable at full speed thanks to recent core
speedups made to Dolphin. Works only with the DirectX video plugin for now.
Enjoy, and Merry Xmas!
The backpatching routines didn't correctly determine where to find the real VFP register, so in most cases they were using D0.
Fixes bugs in the slowmem loadstore routines as well.
On locales that don't use a period as the decimal separator, this would break us.
For vector values in a configuration we use a comma as the separator, so a comma decimal separator caused the configuration to balloon to massive sizes because the values were never saved
correctly. Loading would then break, since it would load a million configuration options.
Fixes issue #7569.
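One way to make float formatting locale-independent with the standard library is to imbue the streams with `std::locale::classic()`, so the decimal separator is always '.'. That keeps ',' unambiguous as the vector separator. The function names are hypothetical, and this is not necessarily how the fix was implemented.

```cpp
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Formats with the classic "C" locale: '.' as decimal separator, always.
std::string FloatToConfigString(float value) {
  std::ostringstream out;
  out.imbue(std::locale::classic());
  out << value;
  return out.str();
}

// Joins vector components with ',', which can no longer clash with the decimals.
std::string JoinFloats(const std::vector<float>& values) {
  std::string result;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i != 0)
      result += ',';
    result += FloatToConfigString(values[i]);
  }
  return result;
}

// Parses with the same locale so values round-trip regardless of user settings.
bool ConfigStringToFloat(const std::string& text, float* value) {
  std::istringstream in(text);
  in.imbue(std::locale::classic());
  in >> *value;
  return !in.fail();
}
```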
Instead of doing vector operations and throwing away the top 64 bits of each operation, let's use scalar operations instead.
On Cortex-A57 this saves us three cycles per vector operation changed to scalar, so this saves 3-9 cycles per emulated instruction.
It also puts one fewer micro-op into the vector pipeline there.
On the Nvidia Denver I couldn't see any noticeable performance difference, but it's a quirky architecture so it may be noticing we are throwing away
the top bits anyway and optimizing it. The world may never know what's truly happening there.
We can compile with haptic support and then fail to initialize because haptics aren't available.
So if we are compiling with haptics, try initializing with haptics, and if that fails, attempt to initialize without haptics before bailing out.
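The fallback can be sketched as below. The flag names are hypothetical stand-ins (SDL, for example, has `SDL_INIT_JOYSTICK` and `SDL_INIT_HAPTIC`), and `init` is an injected callback standing in for the real library call.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical flag values standing in for the library's init flags.
constexpr uint32_t kInitJoystick = 1u << 0;
constexpr uint32_t kInitHaptic = 1u << 1;

// `init` returns true on success. Returns the flags that were actually
// initialized, or 0 if even the haptics-free attempt failed.
uint32_t InitWithOptionalHaptics(const std::function<bool(uint32_t)>& init,
                                 bool compiled_with_haptics) {
  if (compiled_with_haptics && init(kInitJoystick | kInitHaptic))
    return kInitJoystick | kInitHaptic;
  // Haptics unavailable (or not compiled in): retry without them before bailing out.
  if (init(kInitJoystick))
    return kInitJoystick;
  return 0;
}
```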
Allows the UI to easily check the current exclusive mode state.
This simplifies a few checks and prevents the user from ever getting stuck in fullscreen.
The maths appears to give crazy, impossible answers without this fix; the cause is that all the ints get "promoted" to unsigned because of the single unsigned division at the end.
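Here is a minimal illustration of the pitfall, assuming 32-bit `int`. In `delta / count`, the usual arithmetic conversions turn the negative `int32_t` into `uint32_t`, so -6 / 3u wraps instead of giving -2. The function names are hypothetical.

```cpp
#include <cstdint>

// Reproduces the bug: with one unsigned operand, the negative int32_t is
// converted to uint32_t before dividing (e.g. -6 becomes 4294967290u).
uint32_t BrokenDivide(int32_t delta, uint32_t count) {
  return delta / count;
}

// Casting the unsigned operand keeps the whole expression signed.
int32_t FixedDivide(int32_t delta, uint32_t count) {
  return delta / static_cast<int32_t>(count);  // assumes count <= INT32_MAX
}
```

`BrokenDivide(-6, 3u)` yields 1431655763, while `FixedDivide(-6, 3u)` yields -2.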
Don't change the texID depending on the tlut_hash for paletted textures that are EFB copies and don't have an entry in the cache for texID ^ tlut_hash. This makes those textures less broken when using EFB to texture.
This doesn't really fix those textures, but it's a step forward. The minimap in Twilight Princess, for example, is in grayscale with this and is more or less usable.
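The lookup rule above can be sketched with a plain `std::unordered_map` standing in for the texture cache. `CacheEntry` and `ChooseTexID` are hypothetical names, and the real cache stores much more than an EFB-copy flag.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical cache entry: only the flag this rule needs.
struct CacheEntry { bool is_efb_copy; };

// Normally the TLUT hash is mixed into the ID. If the plain ID is an EFB copy
// and nothing is cached under the mixed ID, keep the plain ID so the EFB copy is used.
uint64_t ChooseTexID(const std::unordered_map<uint64_t, CacheEntry>& cache,
                     uint64_t tex_id, uint64_t tlut_hash) {
  const uint64_t mixed = tex_id ^ tlut_hash;
  auto plain = cache.find(tex_id);
  if (plain != cache.end() && plain->second.is_efb_copy && cache.find(mixed) == cache.end())
    return tex_id;
  return mixed;
}
```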
For offsets that fit in the instruction encoding, we should just encode them directly in the instruction.
This saves an instruction in a large number of loadstores.
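Here is a sketch of the range check. It assumes the AArch64 LDR/STR unsigned-offset form, whose 12-bit immediate is scaled by the access size, so a 32-bit access reaches 0 to 16380 in steps of 4. `FitsScaledUImm12` is a hypothetical helper, and other addressing forms (such as the signed, unscaled LDUR offset) are ignored.

```cpp
#include <cstdint>

// True if `offset` can be encoded as a 12-bit unsigned immediate scaled by
// `access_size` (1, 2, 4 or 8 bytes); otherwise the offset needs its own register.
bool FitsScaledUImm12(int64_t offset, int64_t access_size) {
  if (offset < 0 || offset % access_size != 0)
    return false;
  return offset / access_size <= 4095;
}
```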
Someone thought it would be a good idea to have the location as the first argument on the instruction.
Changed it to how it is supposed to be disassembled.
There are a couple of things in this PR.
Fixes a bug where we would loop infinitely if we hit an invalid instruction.
Fixes an issue where on AArch64 it would show invalid instructions for all NEON instructions.
This was due to asimd and crc being optional extensions and LLVM not enabling them by default.
So we have to specify a CPU which has the feature. LLVM 3.6 will let us select by features instead of CPUs, but we don't have a release of that quite
yet.
If we are on an architecture that has a known instruction size, we will continue onward after hitting the invalid instruction. If we don't have a
known instruction size like on x86, we will instead just dump the rest of the block.
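The resulting control flow can be sketched with `decode` as an injected callback standing in for the LLVM disassembler call. It returns the decoded length, or 0 for an invalid instruction. `DisassembleBlock` and the marker strings are hypothetical, and a `fixed_insn_size` of 0 means the ISA has no fixed instruction size.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

using DecodeFn = std::function<std::size_t(const uint8_t*, std::size_t, std::string*)>;

std::vector<std::string> DisassembleBlock(const std::vector<uint8_t>& code,
                                          std::size_t fixed_insn_size,
                                          const DecodeFn& decode) {
  std::vector<std::string> lines;
  std::size_t pos = 0;
  while (pos < code.size()) {
    std::string text;
    std::size_t len = decode(code.data() + pos, code.size() - pos, &text);
    if (len != 0) {
      lines.push_back(text);
      pos += len;
    } else if (fixed_insn_size != 0) {
      // Known size (e.g. 4 on AArch64): mark it, skip it and keep going.
      lines.push_back("(invalid instruction)");
      pos += fixed_insn_size;
    } else {
      // Variable-length ISA such as x86: we can't resync, so stop decoding here.
      lines.push_back("(invalid instruction; rest of block dumped)");
      break;
    }
  }
  return lines;  // every iteration advances or breaks, so this always terminates
}
```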