xenia-canary

Commit Graph

Author	SHA1	Message	Date
Radosław Gliński	6bc3191b97	Merge pull request #60 from chrisps/canary_experimental Superbig boost in performance?	2022-08-13 23:14:31 +02:00
chss95cs@gmail.com	495b1f8bc8	once again return to spinloop	2022-08-13 14:05:35 -07:00
chss95cs@gmail.com	c9e4119428	Add branch of ffmpeg with non-recursive split_radix_permutation. Add branch of disruptorplus with working blocking_wait_stategy. Switch back to blocking wait for timer queue	2022-08-13 13:43:45 -07:00
chss95cs@gmail.com	020d64a1a1	revert to using old bad spinwait, disruptorplus' blocking_wait code does not compile	2022-08-13 13:20:35 -07:00
chss95cs@gmail.com	cb85fe401c	Huge set of performance improvements, combined with an architecture specific build and clang-cl users have reported absurd gains over master for some gains, in the range 50%-90%. But for normal msvc builds i would put it at around 30-50%. Added per-xexmodule caching of information per instruction, can be used to remember what code needs compiling at start up. Record what guest addresses wrote mmio and backpropagate that to future runs, eliminating dependence on exception trapping. this makes many games like h3 actually tolerable to run under a debugger. fixed a number of errors where temporaries were being passed by reference/pointer. Can now be compiled with clang-cl 14.0.1, requires -Werror off though and some other solution/project changes. Added macros wrapping compiler extensions like noinline, forceinline, __expect, and cold. Removed the "global lock" in guest code completely. It does not properly emulate the behavior of mfmsrd/mtmsr and it seriously cripples amd cpus. Removing this yielded around a 3x speedup in Halo Reach for me. Disabled the microprofiler for now. The microprofiler has a huge performance cost associated with it. Developers can re-enable it in the base/profiling header if they really need it. Disable the trace writer in release builds. despite just returning after checking if the file was open the trace functions were consuming about 0.60% cpu time total. Add IsValidReg, GetRegisterInfo is a huge (about 45k) branching function and using that to check if a register was valid consumed a significant chunk of time. Optimized RingBuffer::ReadAndSwap and RingBuffer::read_count. This gave us the largest overall boost in performance. The memcpies were unnecessary and one of them was always a no-op. Added simplification rules for multiplicative patterns like (x+x), (x<<1)+x. For the most frequently called win32 functions i added code to call their underlying NT implementations, which lets us skip a lot of MS code we don't care about/isnt relevant to our usecases. ^this can be toggled off in the platform_win header. handle indirect call true with constant function pointer, was occurring in h3. lookup host format swizzle in denser array. by default, don't check if a gpu register is unknown, instead just check if its out of range. controlled by a cvar. ^looking up whether its known or not took approx 0.3% cpu time. Changed some things in /cpu to make the project UNITYBUILD friendly. The timer thread was spinning way too much and consuming a ton of cpu, changed it to use a blocking wait instead. tagged some conditions as XE_UNLIKELY/LIKELY based on profiler feedback (will only affect clang builds). Shifted around some code in CommandProcessor::WriteRegister based on how frequently it was executed. added support for docdecaduple precision floating point so that we can represent our performance gains numerically. tons of other stuff im probably forgetting	2022-08-13 12:59:00 -07:00
Radosław Gliński	2f59487bf3	Merge pull request #59 from Uraniumm/canary_experimental Add nullptr check in CheckScalarConstCmp	2022-08-08 19:47:35 +02:00
Uraniumm	a16acbaf59	add nullptr check to mitigate crashes wip for reach untracked tags build fixes	2022-08-08 02:02:25 -04:00
Radosław Gliński	3ac99e0d7d	Merge pull request #58 from chrisps/canary_experimental [CPU] VKPKX Implementation, miscellaneous fixes	2022-08-08 07:54:26 +02:00
chss95cs@gmail.com	324a8eb818	A bunch of fixes for division logic: "turns out theres a lot of quirks with the div instructions we havent been covering. if the denom is 0, we jump to the end and mov eax/rax to dst, which is correct because ppc raises no exceptions for divide by 0 unlike x86. except we don't initialize eax before that jump, so whatever garbage from the previous sequence that has been left in eax/rax is what the result of the instruction will be. and then in our constant folding, we don't do the same zero check in Value::Div, so if we constant folded the denom to 0 we will host crash. the ppc manual says the result for a division by 0 is undefined, but in reality it seems it is always 0. there are a few posts i saw from googling about it, and tests on my rgh gave me 0, but then another issue came up and that is that we dont check for signed overflow in our division, so we raise an exception if guest code ever does (1<<signbit_pos) / -1. signed overflow in division also produces 0 on ppc. the last thing is that if src2 is constant we skip the 0 check for division without checking if its nonzero. all weird, likely very rare edge cases, except for maybe the signed overflow division. chrispy — Today at 9:51 AM oh yeah, and because the int members of constantvalue are all signed ints, we were actually doing signed division always with constant folding". fixed an earlier mistake by me with the precision of fresx. made some optimization disableable. implemented vkpkx. fixed possible bugs with vsr/vsl constant folding. disabled the nice imul code for now, there was a bug with int64 version and i dont have time to check. started on multiplication/addition/subtraction/division identities. Removed optimized VSL implementation, it's going to have to be rewritten anyway. Added ppc_ctx_t to xboxkrnl shim for direct context access. started working on KeSaveFloatingPointState, re'ed most of it. Exposed some more state/functionality to the kernel for implementing lower level routines like the save/restore ones. Add cvar to re-enable incorrect mxcsr behavior if a user doesnt care and wants better cpu performance. Stubbed out more impossible sequences, replace mul_hi_i32 with a 64 bit multiply	2022-08-07 10:41:26 -07:00
Gliniak	f45e9e5e9a	[Kernel] Improved handling of internal display resolution	2022-08-02 12:09:25 +02:00
Gliniak	0e1353aa71	Implemented Opcode: mcrf	2022-08-01 14:54:05 +02:00
Radosław Gliński	332f69f36b	Merge pull request #57 from chrisps/canary_experimental Add separate VMX/fpu mxcsr	2022-07-31 18:43:30 +02:00
chss95cs@gmail.com	968f656d96	Add separate VMX/fpu mxcsr. Add support for constant operands for most fpu instructions. Remove constant folding for most fpu code half float	2022-07-31 08:56:36 -07:00
Radosław Gliński	3185b0ac9c	Merge pull request #55 from Etokapa/patch-1 Adjustments for Building Canary	2022-07-31 09:24:55 +02:00
Etokapa	bdf9e8d65a	Adjustments for Building Canary	2022-07-30 11:13:17 -05:00
Gliniak	5d1b641197	[Emulator] Added possibility to install multiple packages at once	2022-07-30 15:52:41 +02:00
Gliniak	79ffbe3971	Merge branch 'importContent' of https://github.com/Gliniak/xenia.git into canary_experimental	2022-07-30 12:44:24 +02:00
Gliniak	0e3403d6da	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-30 12:42:51 +02:00
Gliniak	433a8a8a5e	[Emulator] Added option for content installation	2022-07-30 12:41:26 +02:00
Triang3l	7595cdb52b	[Vulkan] Non-GS point sprites + minor SPIR-V fixes	2022-07-27 17:14:28 +03:00
Triang3l	ff7ef05063	[SPIR-V] Clamp cube face using NClamp, not NMax/FMin	2022-07-26 17:08:12 +03:00
Triang3l	66c995f3aa	[SPIR-V] Saturate point sprite coordinates	2022-07-26 17:04:22 +03:00
Triang3l	8fb5da18ea	[Vulkan] Add forgotten fullDrawIndexUint32 check	2022-07-26 16:24:14 +03:00
Triang3l	9fa41c27bc	[Vulkan] Point sprite geometry shader	2022-07-26 16:01:20 +03:00
Gliniak	0c3019981c	[Video] Added option to set internal output resolution	2022-07-26 11:25:03 +02:00
Gliniak	76806e08c5	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-26 10:22:38 +02:00
Triang3l	f248e23079	[DXBC] Skip backface check in point PsParamGen	2022-07-25 21:48:25 +03:00
Triang3l	77e85ecaa4	[Vulkan] 32-bit index fetch without fullDrawIndexUint32	2022-07-25 16:53:12 +03:00
Gliniak	061000af01	[Base] Changed size of bitstream accessed data (Risky). This prevents crashing in situations when buffer_ + offset_bytes is at the end of the allocated memory range and can go into unallocated space	2022-07-25 10:52:21 +02:00
Gliniak	364137ef5f	[XAM] Send UI On notification on start of XamShowSigninUI	2022-07-25 10:50:32 +02:00
Gliniak	6730ffb7d3	Merge branch 'canary_experimental' of https://github.com/xenia-canary/xenia-canary into canary_experimental	2022-07-24 17:58:48 +02:00
Gliniak	6e501fbd61	[XAM] Set license mask for DLCs (Thanks Beeanyew)	2022-07-24 17:58:00 +02:00
Gliniak	98c2cb636f	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-24 17:38:08 +02:00
Triang3l	37579d3bf0	[GPU] Treat non-adaptive-tessellated patches as 1-control-point	2022-07-24 17:38:26 +03:00
Radosław Gliński	5c18f3a5dd	Merge pull request #53 from chrisps/canary_experimental Add special cases to DOT_PRODUCT_3/4 that detect whether they're calc…	2022-07-23 21:53:30 +02:00
chss95cs@gmail.com	33a6cfc0a7	Add special cases to DOT_PRODUCT_3/4 that detect whether they're calculating lengthsquared. Add alternate path to DOT_PRODUCT_3/4 for use_fast_dot_product that skips all the status register stuff and just remaps inf to qnan. Add OPCODE_TO_SINGLE to replace the CONVERT_F32_F64 - CONVERT_F64_F32 sequence we used to emit with the idea that a backend could implement a more correct rounding behavior if possible on its arch. Remove some impossible sequences like MUL_HI_I8/I16, MUL_ADD_F32, DIV_V128. These instructions have no equivalent in PPC. Many other instructions are unused/dead code and should be removed to make the x64 backend a better reference for future ones. Add backend_flags to Instr. Basically, flags field that a backend can use for whatever it wants when generating code. Add backend instr flag to x64 that tells it to not generate code for an instruction. this allows sequences to consume subsequent instructions. Generate actual x64 code for VSL instruction instead of using callnativesafe. Detect repeated COMPARE instructions w/ identical operands and reuse the results in FLAGS if so. this eliminates a ton of garbage compare/set instructions. If a COMPARE instructions destination is stored to context with no intervening instruction and no additional uses besides the store, do setx [ctx address]. Detect prefetchw and use it in CACHE_CONTROL if prefetch for write is requested instead of doing prefetch to all cache levels. Fixed an accident in an earlier commit by me, VECTOR_DENORMFLUSH was not being emitted at all, so denormal inputs to MUL_ADD_V128 were not becoming zero and outputs from DOT_PRODUCT_X were not either. I believe this introduced a bug into RDR where a wagon wouldnt spawn? (https://discord.com/channels/308194948048486401/308207592482668545/1000443975817252874) Compute fresx in double precision using RECIP_F64 and then round to single instead of doing (double)(1.0f / (float)value), matching original behavior better. Refactor some of ppc_emit_fpu, much of the InstrEmit function are identical except for whether they round to single or not. Added "tail emitters" to X64Emitter. These are callbacks that get invoked with their label and the X64Emitter after the epilog code. This allows us to move cold code out of the critical path and in the future place constant pools near functions. guest_to_host_thunk/host_to_guest_thunk now gets directly rel32 called, instead of doing a mov. Add X64BackendContext structure, represents data before the start of the PPCContext. Instead of doing branchless sequence, do a compare and jump to tail emitted code for address translation. This makes converting addresses a 3 uop affair in most cases. Do qnan move for dot product in a tail emitter. Detect whether EFLAGS bits are independent variables for the current cpu (not really detecting it ehe, just checking if zen) and if so generate inc/dec for add/sub 1. Detect whether low 32 bits of membase are 0. If they are then we can use membasereg.cvt32() in place of immediate 0 in many places, particularly in stores. Detect LOAD MODIFY STORE pattern for context variables (currently only done for 64 bit ones) and turn them into modify [context ptr]. This is done for add, sub, and, or, xor, not, neg. Tail emit error handling for TRAP opcodes. Stub out unused trap opcodes like TRAP_TRUE_I32, TRAP_TRUE_I64, TRAP_TRUE_I16 (the call_true/return_true opcodes for these types are also probably unused). Remove BackpropTruncations. It was poorly written and causes crashes on the game Viva pinata (https://discord.com/channels/308194948048486401/701111856600711208/1000249460451983420)	2022-07-23 12:10:07 -07:00
Gliniak	1fcac00924	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-23 13:26:31 +02:00
Triang3l	3c12814276	[GPU] EDRAM looped addressing (resolves #2031 )	2022-07-22 23:51:50 +03:00
Margen67	1d480f74fa	Add game patches to releases	2022-07-22 02:06:00 -07:00
Gliniak	0c782ade8e	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-21 18:52:33 +02:00
Margen67	74d83e4af8	Python/xenia-build/xb fixes. Use variable for Python version to make upgrading easier. xb.bat: Update copyright date. Add candidate paths. xb.ps1: Properly use found python executable. More consistency with .bat. Don't spew unnecessary errors, etc. EOF newline.	2022-07-21 08:31:35 -05:00
Triang3l	6ff312afb1	[DXBC] Update PsParamGen comment [ci skip]	2022-07-21 12:42:06 +03:00
Triang3l	1a95bef8b3	[GPU] Eliminate unused shader I/O, UCP culling, centroid on Vulkan. For more optimal usage of exports and the parameter cache on the host regardless of how effective the optimizations in the host GPU driver are. Also reserve space for Vulkan/Metal/D3D11-specific HostVertexShaderTypes to use one more bit for the host vertex shader type in the shader modification bits, so that won't have to be done in the future as that would require invalidating shader storages (which are invalidated by this commit) again.	2022-07-21 12:32:28 +03:00
Gliniak	0f60e23208	[Kernel] Removed input change notifications from initial notify list	2022-07-19 10:46:36 +02:00
Gliniak	bc315d21e0	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-19 10:45:14 +02:00
Triang3l	0a94b86cb8	[GPU] Remove orphaned GetPresentArea declaration [ci skip]	2022-07-18 21:02:34 +03:00
Gliniak	57b514ea6a	Removed (again) unnecessary include	2022-07-18 09:40:45 +02:00
Radosław Gliński	3757580f45	Merge pull request #52 from chrisps/canary_experimental Fix previous batch of CPU changes	2022-07-18 09:20:35 +02:00
Gliniak	fd78ab4dfc	[Patcher] Allow loading patches from non-utf8 paths	2022-07-18 08:46:04 +02:00
Joel Linn	a41770acc5	[xenia-build] Check for clang-format 14	2022-07-17 19:57:37 -05:00
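
Note on commit 324a8eb818: the division behavior that message describes (guest division by zero and signed overflow both produce 0 instead of raising a host exception) can be sketched in C++ roughly as below. This is only an illustration under stated assumptions: GuestDivS32 and GuestDivU32 are hypothetical names, not functions from the xenia source, and the zero results follow the commit author's reported hardware observations; the message itself notes that the PPC manual leaves division by zero undefined.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of xenia): signed 32-bit division with the
// results the commit message reports for real hardware.
int32_t GuestDivS32(int32_t num, int32_t denom) {
  // Division by zero: no exception on the guest, result reported as 0.
  if (denom == 0) return 0;
  // Signed overflow, i.e. (1 << signbit_pos) / -1: would trap on an x86
  // host, reported as 0 on the guest.
  if (num == std::numeric_limits<int32_t>::min() && denom == -1) return 0;
  return num / denom;
}

// Hypothetical helper (not part of xenia): unsigned 32-bit division.
// Kept separate so unsigned guest divides are never evaluated as signed.
uint32_t GuestDivU32(uint32_t num, uint32_t denom) {
  return denom == 0 ? 0u : num / denom;
}
```

A constant folder built on helpers like these would pick the signed or unsigned variant based on the opcode, which is the distinction the commit message says was previously lost because the constant fields were all signed.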