Commit Graph

18746 Commits

Author SHA1 Message Date
degasus 0d92c8fb89 Jit64: Optimize dcbx 2015-08-24 18:33:23 +02:00
Tillmann Karras ac84d6d0fa Jit64: some cache flush changes
- dynamically allocate third scratch register instead of forcing ECX
- use LEA as 3 operand add if possible
- use BT,JC instead of SHR,TEST,JNZ
- merge MOV,TEST
- use appropriate ABI function (no asm change)
2015-08-24 18:33:23 +02:00
degasus 6f34b27323 Jit64: implement dcbf + dcbi 2015-08-24 18:33:19 +02:00
Markus Wick 0ad6fa8f62 Merge pull request #2903 from lioncash/cast
Memmap: Remove pointer casts
2015-08-24 15:42:56 +02:00
Lioncash abd3b124be Memmap: Remove pointer casts 2015-08-24 09:07:09 -04:00
flacs 4baf3e10c6 Merge pull request #2902 from Tilka/fpscr
Jit64: quickfix for mtfsfx
2015-08-24 13:19:26 +02:00
Tillmann Karras 33eefc2d86 Jit64: quickfix for mtfsfx 2015-08-24 12:12:31 +02:00
Ryan Houdek d3176fe22a [AArch64] Implement stfiwx
Improves povray performance by ~4%
2015-08-24 01:10:55 -05:00
Ryan Houdek 80fa9af9b1 Merge pull request #2898 from degasus/linking
JitArm64: Faster linking of continuous blocks
2015-08-23 18:09:02 -05:00
Markus Wick 8bc311ab3c Merge pull request #2897 from degasus/arm
JitArm64: Implement subfex, subfcx, addex, subfic, divwux, srwx
2015-08-23 23:52:35 +02:00
degasus 7320d519b4 JitArm64: Implement srwx 2015-08-23 23:29:48 +02:00
degasus 4722a69fd0 JitArm64: Implement divwux 2015-08-23 23:29:18 +02:00
degasus 9e4366963c JitArm64: Implement subfic 2015-08-23 23:29:07 +02:00
degasus 95be17772f JitArm64: Implement addex 2015-08-23 23:29:02 +02:00
degasus 025e7c835a JitArm64: Implement subfcx 2015-08-23 23:28:28 +02:00
degasus 550a90e691 JitArm64: Implement subfex 2015-08-23 23:28:24 +02:00
Ryan Houdek 561744819e [AArch64] Implement fctiwzx
Improves the povray benchmark time by 5.6%
2015-08-23 15:35:18 -05:00
Ryan Houdek 4fa23abbe1 [AArch64] Implement MOVI and ORR(imm) in the NEON emitter. 2015-08-23 15:34:53 -05:00
aroulin 0a0e012fab x64Emitter: add RCPPS and RCPSS SSE instructions 2015-08-23 16:59:27 +02:00
degasus 77a6798094 JitArm64: Faster linking of continuous blocks 2015-08-23 14:44:23 +02:00
Markus Wick 73067b1ef1 Merge pull request #2888 from degasus/jit64
Jit64: Faster linking of continuous blocks
2015-08-23 13:24:15 +02:00
Lioncash 2a1abf8dd6 Merge pull request #2896 from lioncash/using
Core: Minor CPU core typedef cleanup
2015-08-22 19:00:23 -04:00
Ryan Houdek cc3fb7e7b4 Merge pull request #2883 from degasus/master
Profiler: Sort output by total time
2015-08-22 17:52:54 -05:00
Markus Wick 8b881a6c34 Merge pull request #2891 from Sonicadvance1/aarch64_implement_crxxx
[AArch64] Implement the cr instructions
2015-08-23 00:44:47 +02:00
Lioncash fdafa5d063 Core: Move includes out of instruction table headers
These aren't necessary (and cause unnecessary indirect inclusions).
2015-08-22 14:15:02 -04:00
Lioncash a248a4d2ce Jit64/JitIL: Relocate instruction typedefs 2015-08-22 14:15:00 -04:00
Lioncash c56717e058 Core: Shorten the _interpreterInstruction typedef
The class itself already acts as a namespace trailer, so '_interpreter'
isn't necessary. This also gets rid of a duplicate typedef in the
Interpreter_Tables.
2015-08-22 14:14:49 -04:00
Ryan Houdek b4e4a4cef4 Disable OpenGL ES 3.1 on all Qualcomm Adreno devices.
Their new driver that supports GLES3.1 + AEP has issues with it.
At the very least they don't implement all of the geometry shader features fully which causes shader linker issues when we attempt to use them.
I don't have a device so I can't fully test, so until I do I'm going to blanket disable the whole thing.
2015-08-22 09:12:19 -05:00
Markus Wick a39c0910c4 Merge pull request #2893 from Sonicadvance1/aarch64_memory_base_register
[AArch64] Use a register as a constant for the memory base.
2015-08-22 15:41:57 +02:00
Ryan Houdek dba579c52f [AArch64] Use a register as a constant for the memory base.
Removes a /lot/ of redundant movk operations in fastmem loadstores.
Improves performance of the povray bench by ~5%
2015-08-22 08:36:34 -05:00
Markus Wick 3f5ff98c1b Merge pull request #2890 from lioncash/ptr
x64Emitter: Remove pointer casts from Write{8,16,32,64} functions
2015-08-22 10:09:28 +02:00
Markus Wick 2d505bc2a6 Merge pull request #2894 from Sonicadvance1/no_more_eaten_canary
Fix the shader overrunning our max shader size.
2015-08-22 10:08:14 +02:00
Markus Wick c2f38f1d16 Merge pull request #2892 from Sonicadvance1/aarch64_frsp
[AArch64] Implement frspx
2015-08-22 09:44:14 +02:00
Ryan Houdek 3242e1a617 Fix the shader overrunning our max shader size.
The Star Wars games really push the hardware to its limits, which can cause the shaders that are produced to be 18kb or more.
Double our maximum shader size to compensate.
Fixes issue #8860
2015-08-22 01:01:03 -05:00
Ryan Houdek ce32b76be3 [AArch64] Implement frspx
Improves performance in povray bench by 2%
2015-08-22 00:35:30 -05:00
Ryan Houdek d74eb0ea58 [AArch64] Fix the bugs in the cr instructions
Makes it a bit more efficient in the process.
2015-08-21 23:24:29 -05:00
degasus e9ade0abe1 JitArm64: implement crXXX 2015-08-21 20:49:08 -05:00
Lioncash a69755d9ee x64Emitter: Remove pointer casts from Write{8,16,32,64} functions
This also stops quite a few ubsan asserts from firing when the emitter is being used.
2015-08-21 18:09:48 -04:00
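
The idea behind removing pointer casts from byte-stream writers can be sketched as below. This is only an illustration: `CodeBuffer`, `Write`, and `WriteUnalignedExample` are hypothetical names, not the emitter's actual interface. It assumes the usual fix for this class of ubsan report, which is to copy the value with `std::memcpy` rather than store through a casted, possibly misaligned pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an emitter's output cursor (not the real class).
struct CodeBuffer
{
  std::uint8_t* cursor;

  // Copy the bytes of `value` into the stream. Unlike
  // *reinterpret_cast<T*>(cursor) = value, this is well defined even when
  // `cursor` is not aligned for T.
  template <typename T>
  void Write(T value)
  {
    std::memcpy(cursor, &value, sizeof(T));
    cursor += sizeof(T);
  }
};

// Writes one byte followed by a 32-bit value, so the 32-bit store lands on an
// odd offset. Returns the number of bytes written.
inline std::size_t WriteUnalignedExample(std::uint8_t* out)
{
  assert(out != nullptr);
  CodeBuffer buf{out};
  buf.Write<std::uint8_t>(0xE9);
  buf.Write<std::uint32_t>(0x11223344u);
  return static_cast<std::size_t>(buf.cursor - out);
}
```

Compilers typically lower a fixed-size `std::memcpy` like this to a single store, so the change is usually free at run time.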
flacs 95d958c03d Merge pull request #2889 from lioncash/interp
Interpreter: Use std::isnan instead of IsNAN
2015-08-21 21:43:08 +02:00
Lioncash caec42135d MathUtil: Remove IsNAN and IsINF
These aren't necessary, since the stdlib provides equivalents.
2015-08-21 15:05:43 -04:00
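
For reference, the standard library equivalents mentioned here are `std::isnan` and `std::isinf` from `<cmath>`. A minimal sketch of their use follows; `IsFiniteNumber` is a hypothetical helper written for this example and is not part of the codebase.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: true when x is neither NaN nor an infinity.
// Uses only the <cmath> classification functions.
inline bool IsFiniteNumber(double x)
{
  return !std::isnan(x) && !std::isinf(x);
}
```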
flacs bb7f3d1822 Merge pull request #2867 from Tilka/mtspr_hid0
Jit64: implement HID0 case of mtspr
2015-08-21 21:04:35 +02:00
flacs 01aea965ba Merge pull request #2864 from Tilka/fpscr
Jit64: implement FPSCR related instructions
2015-08-21 21:04:20 +02:00
Lioncash 18d658df1f Interpreter_FloatingPoint: Use std::isnan instead of IsNAN
Same thing, except one is part of the stdlib.
2015-08-21 15:04:03 -04:00
degasus 78aa01e06e Jit64: Faster linking of continuous blocks
We compile the blocks as they are executed, so it's common
to link them continuously. We end with calling JMP after every
block, but often just with a distance of 0.
So just emitting NOPs instead also "calls" the next block, but is
easier for the CPU.
2015-08-21 17:41:53 +02:00
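
The technique described in this commit message can be sketched roughly as follows. This is not the JIT's actual code: `EmitBlockLink` and `kLinkSize` are hypothetical names, and the sketch assumes a little-endian x86-64 host, the 5-byte `JMP rel32` encoding (`E9` plus a 32-bit displacement), and the 5-byte multi-byte NOP recommended in the Intel manuals (`0F 1F 44 00 00`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Size of an x86-64 "JMP rel32": opcode 0xE9 plus a 32-bit displacement.
constexpr std::size_t kLinkSize = 5;

// Writes a 5-byte link at `at` that continues execution at `target`.
// Both pointers are assumed to point into the same code buffer.
// Returns true if a NOP was emitted instead of a jump.
inline bool EmitBlockLink(std::uint8_t* at, const std::uint8_t* target)
{
  // The displacement of JMP rel32 is relative to the end of the instruction.
  const std::uint8_t* next = at + kLinkSize;
  const std::ptrdiff_t distance = target - next;

  if (distance == 0)
  {
    // The next block starts right after the link: fall through with a NOP.
    static const std::uint8_t nop5[kLinkSize] = {0x0F, 0x1F, 0x44, 0x00, 0x00};
    std::memcpy(at, nop5, kLinkSize);
    return true;
  }

  assert(distance >= INT32_MIN && distance <= INT32_MAX);
  const std::int32_t rel = static_cast<std::int32_t>(distance);
  at[0] = 0xE9;
  std::memcpy(at + 1, &rel, sizeof(rel));  // little-endian host assumed
  return false;
}
```

Keeping the NOP the same size as the jump means the link can later be rewritten in place if the target changes.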
Markus Wick c325c310d6 Merge pull request #2884 from lioncash/emitter
x64Emitter: Minor cleanup
2015-08-21 13:03:51 +02:00
Ryan Houdek 5f628749ff Merge pull request #2886 from Sonicadvance1/aarch64_faster_lfd
[AArch64] Optimize lfd instructions if possible.
2015-08-21 05:38:53 -05:00
Ryan Houdek df53b37253 [AArch64] Optimize lfd instructions if possible.
If we are going to be using lfd, then chances are it is going to be used in double-heavy areas of code.
If we only need to load the lower register, then we also need not worry about inserting into the low 64 bits of the guest register.
So add a new flag to the backpatching to let lfd load directly into the destination register.
This gives a ~3% performance improvement in Povray.
2015-08-21 04:31:54 -05:00
Markus Wick 4f45d71840 Merge pull request #2760 from Sonicadvance1/aarch64_fcmp
[AArch64] Implement fcmp{u,o}
2015-08-21 11:03:20 +02:00
Tillmann Karras 39ced2a2d7 AVIDump: fix -Wsign-compare warning
Cast the other side of the comparison to avoid a warning with newer
ffmpeg/libav versions (cb3591e69738c808d26ba15eb02414fedfcd91cc).
2015-08-21 10:26:35 +02:00
Markus Wick 6cb87a9227 Merge pull request #2837 from Sonicadvance1/aarch64_faster_nonpaired
[AArch64] Optimize cases when an FPR is only used for non-paired ops.
2015-08-21 09:51:45 +02:00