dolphin

Commit Graph

Author	SHA1	Message	Date
degasus	36902c58eb	JitArm64: Fix lwbrx and lhbrx	2015-09-05 13:48:29 +02:00
degasus	696f95d5f9	JitArm64: Fix subfic	2015-09-05 13:48:29 +02:00
degasus	baa28e13f4	JitArm64: Remove FLUSH_INTERPRETER It seems to be broken for some instructions, and there is no need for it any more.	2015-09-05 13:48:29 +02:00
Tillmann Karras	405554e327	Jit64: remove unnecessary indirection	2015-09-05 12:40:14 +02:00
Tillmann Karras	72eed1aa82	JitCache: drop unused method	2015-09-05 12:40:14 +02:00
Ryan Houdek	de051dac71	[AArch64] Implement integer gatherpipe writes.	2015-09-04 19:52:25 -05:00
Ryan Houdek	791c7d5a84	[AArch64] Clean up bogus vector FCVT{N,L} instruction usage. Replace the instruction with the scalar variant FCVT instruction. FCVT{N,L} has 8 cycles of latency on the Cortex-A57, while FCVT has five cycles of latency and slightly higher throughput. On the A72 all three of these instructions will have three cycles of latency, while FCVT{N,L} will have half the throughput.	2015-09-04 19:41:54 -05:00
Ryan Houdek	2c68f6bfc5	[AArch64] Implement Fiora's preemptive paired loadstore optimization. This provides a decent speed up in pretty much everything that touches pair loadstores because in most cases they are just regular non-quantizing float loadstores that happen.	2015-09-04 19:20:33 -05:00
Rohit Nirmal	8aed7589ae	Fix building with PCH disabled.	2015-09-04 10:34:45 -05:00
Markus Wick	7ada372ed9	Merge pull request #2944 from degasus/arm JitArm64: Cleanup floating point regcache	2015-09-04 13:14:29 +02:00
booto	97f55c0cc9	VI: Less log spam in Release build	2015-09-04 17:08:19 +08:00
Lioncash	a11ae2cf30	CommonFuncs: Remove SLEEP macro There's already a function in Thread for this.	2015-09-04 02:43:38 -04:00
shuffle2	a09b9bef8d	Merge pull request #2952 from lioncash/constexpr CommonFuncs: Replace ArraySize define with constexpr equivalent	2015-09-03 22:56:25 -07:00
Lioncash	3f1b488a12	CommonFuncs: Replace ArraySize define with constexpr equivalent	2015-09-03 23:47:14 -04:00
Pierre Bourdon	8dd80b8e97	Merge pull request #2943 from booto/vi-enb VI: Respect DisplayControlRegister ENB bit	2015-09-04 03:50:39 +02:00
Lioncash	4fd060ba11	Core: Use constexpr for default pad and attachment radius	2015-09-03 19:44:42 -04:00
Shawn Hoffman	aa7208e270	[windows] Update projects to vs2015.	2015-09-03 04:23:01 -07:00
Scott Mansell	a1538a30ef	Merge pull request #2941 from lioncash/gp GPFifo: Remove pointer casts	2015-09-03 13:47:26 +12:00
Lioncash	2d224bd3b1	ActionReplay: Remove an alloca call	2015-09-02 17:41:19 -04:00
degasus	5797111ef0	JitArm64: Optimize fpr.R()	2015-09-02 22:46:14 +02:00
degasus	dfd44730c8	JitArm64: simplify fpr call	2015-09-02 22:46:14 +02:00
booto	28d788ba2c	VI: Respect DisplayControlRegister ENB bit When ENB is set to 0 (default), VI should not generate clocks, and so shouldn't generate output.	2015-09-03 04:13:32 +08:00
Lioncash	f32b79e612	GPFifo: Get rid of pointer casts	2015-09-02 15:24:33 -04:00
Lioncash	db98efdc98	GPFifo: Adjust parameter names	2015-09-02 15:20:02 -04:00
Scott Mansell	ecbb83fa0f	Merge pull request #2686 from booto/field-timing VI: derive field timing from VI registers	2015-09-03 01:09:43 +12:00
flacs	3b134497dd	Merge pull request #2774 from AdmiralCurtiss/wiimote-extension-reconnect-on-button-press Wiimote: Extend emulated Wiimote reconnect-on-button-press to attachments.	2015-09-01 18:31:39 +02:00
booto	f6e4a8e680	FifoPlayer: Use VI derived timing, not hardcoded 60Hz	2015-09-01 20:24:42 +08:00
booto	8d6c39a89d	VI: Adjust forced-progressive hack per magumagu's suggestion	2015-09-01 20:24:41 +08:00
booto	acc9a74174	VI: Restore forced-progressive hack with option. Bugfix: TargetRefreshRate uses rounded result; NTSC's 59.94 was becoming 59 with integer division.	2015-09-01 20:24:40 +08:00
booto	480dbb22f2	VI: derive field timing from VI registers	2015-09-01 20:24:40 +08:00
Ryan Houdek	ae0a06a018	[AArch64] Implement dcbz instruction	2015-08-31 15:39:47 -05:00
Ryan Houdek	0f54aa48b4	Merge pull request #2928 from Sonicadvance1/aarch64_improved_singles [AArch64] Improve floating point single instructions.	2015-08-31 12:00:08 -05:00
Ryan Houdek	bcde1aa8ff	[AArch64] Improve floating point single instructions. Instead of having an "INS" instruction after every single instruction to duplicate the bottom 64bits in to the top 64bits of the register, create a new FPR register cache type to track when a register's lower 64bits are supposed to be duplicated in to the high 64bits, without necessarily having the lower bits actually duplicated in the host side register. This removes inefficient INS instructions from sequential single float instructions. In particular, a block in Animal Crossing that is very heavy in single float instructions went from 712 instructions down to 520 instructions (~27% fewer instructions!)	2015-08-31 11:09:17 -05:00
Ryan Houdek	d003934b8a	Merge pull request #2929 from Sonicadvance1/aarch64_optimize_gpr_flush Aarch64 optimize gpr flush	2015-08-31 10:55:45 -05:00
Ryan Houdek	8bf332cf08	[AArch64] Optimize GPR cache flushing. If we are flushing multiple sequential guest GPRs then we can store two in a single STP instruction. Ikaruga does this quite a bit in their blocks where they do an lmw at the very end and then we have to flush them all. Typically cuts 16 STR instructions down to 8 STP instructions there.	2015-08-30 23:07:12 -05:00
Scott Mansell	368867dba0	Merge pull request #2922 from aserna3/SDBlock Implemented ability to block writes to the SD card	2015-08-31 04:51:50 +12:00
Ryan Houdek	b907576510	[AArch64] Support profiling by cycle counters if they are available to EL0	2015-08-30 10:25:16 -05:00
Ryan Houdek	5110574c1f	Merge pull request #2921 from Sonicadvance1/aarch64_optimize_lmw [AArch64] Optimize lmw.	2015-08-30 10:23:57 -05:00
Lioncash	df19f11cb9	Jit_Util: Add missing override specifiers	2015-08-29 00:30:18 -04:00
Anthony Serna	db7fe9507e	Implemented ability to block writes to the SD card Renamed variable to be more accurate	2015-08-28 17:32:29 -07:00
Ryan Houdek	8d61706440	[AArch64] Optimize lmw. This instruction is fairly heavily used by Ikaruga to load a bunch of registers from the stack. In particular, at the start of the second stage there is a block that takes up ~20% CPU time and includes a usage of lmw to load half of the guest registers. The basic thing optimized here is changing from a single 32bit LDR to potentially a single 128bit LDR. A single 32bit LDR is fairly slow, so we can optimize in a few ways. If we have four or more registers to load, do a 64bit LDP in to two host registers, byteswap, and then move the high 32bits of the host registers in to the correct mapped guest register locations. If we have two registers to load, then do a 32bit LDP, which will load two guest registers in a single instruction. And then if we have only one register left to load, load it as before. This saves quite a few cycles since the Cortex-A57 and A72's LDR instruction takes a few cycles. Each 32bit LDR takes 4 cycles of latency, plus 1 cycle for post-index (which typically happens in parallel). Both the 32bit and 64bit LDP take the same amount of latency. So we are improving latencies and reducing code bloat here.	2015-08-28 14:40:30 -05:00
Ryan Houdek	2c3fa8da28	[AArch64] Fix a bug in the register caches. This is a bug that crops up if BindToRegister() is called multiple times in a row without a R() function call between them. How to reproduce the bug: 1) Have a completely filled cache with no host register remaining. 2) Call BindToRegister() with different guest registers. 3) Don't call R() between the BindToRegister() calls. This issue typically wouldn't be seen for a couple of reasons. Typically we have /plenty/ of registers in the cache, and in most cases we only call BindToRegister() once per instruction. In the off chance that it is called multiple times, it wouldn't update the last used counts and would flush the same register as the previous call to it.	2015-08-28 14:36:14 -05:00
Lioncash	d86d5fae9f	Merge pull request #2909 from aserna3/DollsAndElves Implemented .elf and .dol support in gamelist	2015-08-28 14:28:09 -04:00
Anthony Serna	faedf1bc5c	Implemented .elf and .dol support in gamelist Fixed a TON of structuring, formatting. removed README.txt files from themes at MaJoR's request Added platform icon for ELFs/DOLs	2015-08-28 11:10:03 -07:00
degasus	e516d4ef59	JitArm64: Implement rlwnmx	2015-08-26 21:59:10 +02:00
flacs	99e88a7af7	Merge pull request #2887 from Tilka/swap Jit64: some byte-swapping changes	2015-08-26 16:43:45 +02:00
flacs	eb6ac641be	Merge pull request #2906 from Tilka/fpscr Jit64: fix bugs in the FPSCR instructions	2015-08-26 16:43:28 +02:00
Tillmann Karras	6ec4bdf862	CoreTiming: remove unused functions	2015-08-26 15:40:15 +02:00
Tillmann Karras	0f4861cac2	CoreTiming: make loops easier to read	2015-08-26 14:53:58 +02:00
Ryan Houdek	ca51f1a4f6	[AArch64] Optimize paired registers being used in double operations. In particular this optimizes the case where a 32bit float is loaded via lfs, and then used in double operations. This happens very often in Gekko based code because the best way to load a 32bit value as a double is lfs since it automatically turns in to a double value. There are a few other implications of this in practice as well. Like if both of the paired registers are loaded via psq_l and then used in double operations it would be improved. Also if we implement a double register we've got to be careful to make sure we understand if it is in "lower" register or the full 128bit register.	2015-08-26 05:50:04 -05:00
Markus Wick	5716d18d10	Merge pull request #2910 from Sonicadvance1/aarch64_regcache_fix [AArch64] Fix a bug in the register cache.	2015-08-26 08:31:24 +02:00
Ryan Houdek	4f5f29a0fb	[AArch64] Fix a bug in the register cache. If the register was only a lower pair and it needed the full register, then we need to load the high 64bits. Which we weren't doing before.	2015-08-26 01:21:43 -05:00
Markus Wick	43d17cb360	Merge pull request #2904 from Sonicadvance1/aarch64_more_inst [AArch64] Implement fdivx/fdivsx/mfcr/mtcrf.	2015-08-26 07:48:24 +02:00
Tillmann Karras	ee4a12ffe2	Jit64: some byte-swapping changes	2015-08-26 05:41:18 +02:00
Ryan Houdek	6729a36d8d	[AArch64] Set BindToRegister's to_load correctly for double FP ops.	2015-08-25 21:29:27 -05:00
Lioncash	db4f692482	GCMemcard: Clean up memcard logging messages.	2015-08-25 21:55:52 -04:00
Tillmann Karras	ee50a2ef28	Jit64: fix bugs in the FPSCR instructions	2015-08-25 23:48:14 +02:00
Markus Wick	bd08c1b01a	Merge pull request #2901 from Sonicadvance1/aarch64_stfiwx [AArch64] Implement stfiwx	2015-08-25 22:47:39 +02:00
Markus Wick	24cb650078	Merge pull request #2663 from degasus/dcbx Jit64: dcbf + dcbi	2015-08-25 12:16:56 +02:00
Ryan Houdek	0666c0750b	[AArch64] Implement fdivx/fdivsx/mfcr/mtcrf. Gets the povray bench to better times than the Wii.	2015-08-24 15:32:19 -05:00
Ryan Houdek	d96be9250c	Merge pull request #2899 from Sonicadvance1/aarch64_fctiwzx [AArch64] Implement fctiwzx	2015-08-24 13:22:27 -05:00
degasus	0d92c8fb89	Jit64: Optimize dcbx	2015-08-24 18:33:23 +02:00
Tillmann Karras	ac84d6d0fa	Jit64: some cache flush changes: dynamically allocate third scratch register instead of forcing ECX; use LEA as 3 operand add if possible; use BT,JC instead of SHR,TEST,JNZ; merge MOV,TEST; use appropriate ABI function (no asm change)	2015-08-24 18:33:23 +02:00
degasus	6f34b27323	Jit64: implement dcbf + dcbi	2015-08-24 18:33:19 +02:00
Markus Wick	0ad6fa8f62	Merge pull request #2903 from lioncash/cast Memmap: Remove pointer casts	2015-08-24 15:42:56 +02:00
Lioncash	abd3b124be	Memmap: Remove pointer casts	2015-08-24 09:07:09 -04:00
Tillmann Karras	33eefc2d86	Jit64: quickfix for mtfsfx	2015-08-24 12:12:31 +02:00
Ryan Houdek	d3176fe22a	[AArch64] Implement stfiwx Improves povray performance by ~4%	2015-08-24 01:10:55 -05:00
Ryan Houdek	80fa9af9b1	Merge pull request #2898 from degasus/linking JitArm64: Faster linking of continuous blocks	2015-08-23 18:09:02 -05:00
degasus	7320d519b4	JitArm64: Implement srwx	2015-08-23 23:29:48 +02:00
degasus	4722a69fd0	JitArm64: Implement divwux	2015-08-23 23:29:18 +02:00
degasus	9e4366963c	JitArm64: Implement subfic	2015-08-23 23:29:07 +02:00
degasus	95be17772f	JitArm64: Implement addex	2015-08-23 23:29:02 +02:00
degasus	025e7c835a	JitArm64: Implement subfcx	2015-08-23 23:28:28 +02:00
degasus	550a90e691	JitArm64: Implement subfex	2015-08-23 23:28:24 +02:00
Ryan Houdek	561744819e	[AArch64] Implement fctiwzx Improves the povray benchmark time by 5.6%	2015-08-23 15:35:18 -05:00
degasus	77a6798094	JitArm64: Faster linking of continuous blocks	2015-08-23 14:44:23 +02:00
Markus Wick	73067b1ef1	Merge pull request #2888 from degasus/jit64 Jit64: Faster linking of continuous blocks	2015-08-23 13:24:15 +02:00
Lioncash	2a1abf8dd6	Merge pull request #2896 from lioncash/using Core: Minor CPU core typedef cleanup	2015-08-22 19:00:23 -04:00
Markus Wick	8b881a6c34	Merge pull request #2891 from Sonicadvance1/aarch64_implement_crxxx [AArch64] Implement the cr instructions	2015-08-23 00:44:47 +02:00
Lioncash	fdafa5d063	Core: Move includes out of instruction table headers These aren't necessary (and cause unnecessary indirect inclusions).	2015-08-22 14:15:02 -04:00
Lioncash	a248a4d2ce	Jit64/JitIL: Relocate instruction typedefs	2015-08-22 14:15:00 -04:00
Lioncash	c56717e058	Core: Shorten the _interpreterInstruction typedef The class itself already acts as a namespace trailer, so '_interpreter' isn't necessary. This also gets rid of a duplicate typedef in the Interpreter_Tables.	2015-08-22 14:14:49 -04:00
Markus Wick	a39c0910c4	Merge pull request #2893 from Sonicadvance1/aarch64_memory_base_register [AArch64] Use a register as a constant for the memory base.	2015-08-22 15:41:57 +02:00
Ryan Houdek	dba579c52f	[AArch64] Use a register as a constant for the memory base. Removes a /lot/ of redundant movk operations in fastmem loadstores. Improves performance of the povray bench by ~5%	2015-08-22 08:36:34 -05:00
Markus Wick	c2f38f1d16	Merge pull request #2892 from Sonicadvance1/aarch64_frsp [AArch64] Implement frspx	2015-08-22 09:44:14 +02:00
Ryan Houdek	ce32b76be3	[AArch64] Implement frspx Improves performance in povray bench by 2%	2015-08-22 00:35:30 -05:00
Ryan Houdek	d74eb0ea58	[AArch64] Fix the bugs in the cr instructions Makes it a bit more efficient in the process.	2015-08-21 23:24:29 -05:00
degasus	e9ade0abe1	JitArm64: implement crXXX	2015-08-21 20:49:08 -05:00
flacs	95d958c03d	Merge pull request #2889 from lioncash/interp Interpreter: Use std::isnan instead of IsNAN	2015-08-21 21:43:08 +02:00
flacs	bb7f3d1822	Merge pull request #2867 from Tilka/mtspr_hid0 Jit64: implement HID0 case of mtspr	2015-08-21 21:04:35 +02:00
flacs	01aea965ba	Merge pull request #2864 from Tilka/fpscr Jit64: implement FPSCR related instructions	2015-08-21 21:04:20 +02:00
Lioncash	18d658df1f	Interpreter_FloatingPoint: Use std::isnan instead of IsNAN Same thing, except one is part of the stdlib.	2015-08-21 15:04:03 -04:00
degasus	78aa01e06e	Jit64: Faster linking of continuous blocks. We compile the blocks as they are executed, so it's common to link them continuously. We end with calling JMP after every block, but often just with a distance of 0. So just emitting NOPs instead also "calls" the next block, but is easier for the CPU.	2015-08-21 17:41:53 +02:00
Ryan Houdek	5f628749ff	Merge pull request #2886 from Sonicadvance1/aarch64_faster_lfd [AArch64] Optimize lfd instructions if possible.	2015-08-21 05:38:53 -05:00
Ryan Houdek	df53b37253	[AArch64] Optimize lfd instructions if possible. If we are going to be using lfd, then chances are it is going to be used in double heavy areas of code. If we only need to load the lower register, then we should also not worry about having to insert in to the low 64bits of the guest register. So add a new flag to the backpatching to handle lfd loading directly to the destination register. This gives ~3% performance improvement to Povray.	2015-08-21 04:31:54 -05:00
Markus Wick	4f45d71840	Merge pull request #2760 from Sonicadvance1/aarch64_fcmp [AArch64] Implement fcmp{u,o}	2015-08-21 11:03:20 +02:00
Markus Wick	6cb87a9227	Merge pull request #2837 from Sonicadvance1/aarch64_faster_nonpaired [AArch64] Optimize cases when an FPR is only used for non-paired ops.	2015-08-21 09:51:45 +02:00
Ryan Houdek	7ce4c3138e	[AArch64] Optimize cases when an FPR is only used for non-paired ops.	2015-08-20 23:36:29 -05:00
Lioncash	95c57fcec1	Jit: Remove unnecessary namespace prefixes	2015-08-20 05:20:19 -04:00
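
The rounding bug mentioned in commit acc9a74174 (NTSC's 59.94 becoming 59 with integer division) can be illustrated with a minimal C++ sketch. The helper names and the 60000/1001 representation of the NTSC rate are assumptions made for illustration only; this is not Dolphin's actual TargetRefreshRate code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: plain integer division truncates toward zero,
// so a rate of ~59.94 Hz collapses to 59.
constexpr uint32_t TruncatedRate(uint32_t numerator, uint32_t denominator)
{
  return numerator / denominator;
}

// Hypothetical helper: adding half the denominator before dividing
// rounds to the nearest integer instead of truncating.
constexpr uint32_t RoundedRate(uint32_t numerator, uint32_t denominator)
{
  return (numerator + denominator / 2) / denominator;
}

// Assumption: NTSC rate expressed as 60000/1001 (about 59.94 Hz).
static_assert(TruncatedRate(60000, 1001) == 59, "truncation loses the fraction");
static_assert(RoundedRate(60000, 1001) == 60, "rounding gives the nearest integer");
```

Both checks are evaluated at compile time, so the sketch documents the difference between the two divisions without needing to be run.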


6192 Commits