Commit Graph

  • 2e5c4937fd
    Add constant folding for LVR when 16 aligned, clean up prior commit by removing dead test code for LVR/LVL/STVL/STVR opcodes and legacy hir sequence
    Delay using mm_pause in KeAcquireSpinLockAtRaisedIrql_entry, a huge amount of time is spent spinning in halo3
    chss95cs@gmail.com 2022-09-04 11:44:29 -0700
  • c6010bd4b1 nasty commit with a bunch of test code left in, will clean up and pr chss95cs@gmail.com 2022-09-04 11:04:41 -0700
  • 892bce5abc
    Merge 35920a9bdc into 7595cdb52b Mohamad risza 2022-08-31 04:27:00 +0800
  • 35920a9bdc
    Galeri Mohamad risza 2022-08-31 04:26:46 +0800
  • c1d3e35eb9
    Merge pull request #66 from chrisps/canary_experimental Radosław Gliński 2022-08-29 00:55:24 +0200
  • 78c9a48bc2 also use vastcpy for shared memory page stuff chss95cs@gmail.com 2022-08-28 14:52:12 -0700
  • f31869092c
    Fixed a bug with readback_resolve and readback_memexport that was responsible for a large portion of their overhead. readback_memexport and resolve are now usable for games, depending on your hardware. In my case games that were slideshows now run at like 20-30 fps, and my hardware isn't the best for xenia.
    Add split_map class for mapping keys to values in a way that optimizes for frequent searches and infrequent insertions/removals
    Remove jump table implementation of GetColorRenderTargetFormatComponentCount, it was appearing relatively high in profiles. Instead pack the component counts into a single 32 bit word, which is indexed by shifting
    Add cvar to align all basic blocks to a boundary
    Add mmio aware load paths
    Liberally apply XE_RESTRICT in ringbuffer related code
    Removed the IS_TRUE and IS_FALSE opcodes, they were pointless duplicates of COMPARE_EQ/COMPARE_NE and i want to simplify our set of opcodes for future backends
    More work on LVSR/LVSL/STVR/STVL opcodes
    Optimized X64 translated code emission, now only compute instrkey once
    Add code for pre-computing integer division magic numbers
    Optimized GetHostViewportInfo a little
    Move args for GetHostViewportInfo into a class, cache the result and compare for future queries. Moved GetHostViewportInfo far lower on the profile
    Add (currently not functional, and very racy) asynchronous memcpy code. Will improve it and actually use it in future commits.
    Add non-temporal memcpy function for huge page-aligned allocations. Used for copying to shared memory/readback
    Hoist are_accumulated_render_targets_valid_ check out of loop in render_target_cache already bound check.
    Add stosb/movsb code for small constant memcpys/memsets that aren't worth the overhead of memcpy/memset
    chss95cs@gmail.com 2022-08-28 14:24:25 -0700
  • 335a390d43
    Merge pull request #64 from beeanyew/cpu-updates-raiden-fighters Radosław Gliński 2022-08-28 20:52:42 +0200
  • 3569e97e0e [CPU] Add rldicx implementation beeanyew 2022-08-28 20:02:39 +0200
  • 75ed343e72 [CPU] Add stub OE handling implementation for addex and negx beeanyew 2022-08-28 20:01:26 +0200
  • 04c9c02270 Guest crash message more useful illusion0001 2022-08-23 17:31:41 -0500
  • 9006b309af
    Merge pull request #62 from chrisps/canary_experimental Radosław Gliński 2022-08-23 00:01:24 +0200
  • 1ffd7ecae8 Remove vpcmov print chss95cs@gmail.com 2022-08-21 12:40:56 -0700
  • b5ef3453c7
    Disable most XOP code by default, the manual must be wrong for the shifts or we must be assembling them incorrectly, will return to it later and fix
    comparisons and select done by xop are fine though
    chss95cs@gmail.com 2022-08-21 12:32:33 -0700
  • b26c6ee1b8
    Fix some more constant folding
    fabsx does NOT set fpscr
    turns out that our vector unsigned compare instructions are a bit weird?
    chss95cs@gmail.com 2022-08-21 10:27:54 -0700
  • 0ebc109d4d
    Add initial xop codepaths, still need to finish the rest of the compares, and then do shifts, rotates, and PERMUTE
    Add vector simplification pass, so far it only recognizes whether VECTOR_DENORMFLUSH is useless and optimizes them away
    Tag restgplr/savegplr/restvmx/savevmx/restfpr/savefpr with useful information, i intend to inline them (they tend to be the most heavily called guest functions)
    chss95cs@gmail.com 2022-08-21 08:55:42 -0700
  • da00ede181 [XAM/Settings] Check if provided size doesn't exceed maximal setting size Gliniak 2022-08-21 17:46:00 +0200
  • 0b013fdc6b
    Merge pull request #61 from chrisps/canary_experimental Radosław Gliński 2022-08-21 09:31:09 +0200
  • d85bfc1894
    Don't constant evaluate MAX with V128!
    Fix signed zeroes behavior for vmaxfp emulation, was causing a block in sonic to move perpetually, very slowly
    chss95cs@gmail.com 2022-08-20 14:22:05 -0700
  • 010b59e81c [Emulator] Install Content: Create header for installed packages Gliniak 2022-08-20 20:41:08 +0200
  • 469d062a50 [Emulator] Updated "Install Content" function to match PR status Gliniak 2022-08-20 12:47:43 +0200
  • f19cb704aa [Emulator] Added error checking while creating directories Gliniak 2022-08-20 12:08:19 +0200
  • 457296850e
    Add OPCODE_NEGATED_MUL_ADD/OPCODE_NEGATED_MUL_SUB
    Proper handling of nans for VMX max/min on x64 (minps/maxps has special behavior depending on the operand order that vmx does not have for vminfp/vmaxfp)
    Add extremely unintrusive guest code profiler utilizing KUSER_SHARED systemtime. This profiler is disabled on platforms other than windows, and on windows is disabled by default by a cvar
    Repurpose GUEST_SCRATCH64 stack offset to instead be for storing guest function profile times, define GUEST_SCRATCH as 0 instead, since that's already meant to be a scratch area
    Fix xenia silently closing on config errors/other fatal errors by setting has_console_attached_'s default to false
    Add alternative code path for guest clock that uses kusershared systemtime instead of QueryPerformanceCounter. This is way faster and I have tested it and found it to be working, but i have disabled it because i do not know how well it works on wine or on processors other than mine
    Significantly reduce log spam by setting XELOGAPU and XELOGGPU to be LogLevel::Debug
    Changed some LOGI to LOGD in places to reduce log spam
    Mark VdSwap as kHighFrequency, it was spamming up logs
    Make logging calls less intrusive for the caller by forcing the test of log level inline and moving the format/AppendLogLine stuff to an outlined cold function
    Add swcache namespace for software cache operations like prefetches, streaming stores and streaming loads.
    Add XE_MSVC_REORDER_BARRIER for preventing msvc from propagating a value too close to its store or from its load
    Add xe_unlikely_mutex for locks we know have very little contention
    Add XE_HOST_CACHE_LINE_SIZE and XE_RESTRICT to platform.h
    Microoptimization: Changed most uses of size_t to ring_size_t in RingBuffer, this reduces the size of the inlined ringbuffer operations slightly by eliminating rex prefixes, depending on register allocation
    Add BeginPrefetchedRead to ringbuffer, which prefetches the second range if there is one according to the provided PrefetchTag
    Added inline_loadclock cvar, which will directly use the value of the guest clock from clock.cc in jitted guest code. Off by default
    Change uses of GUEST_SCRATCH64 to GUEST_SCRATCH
    Add fast vectorized xenos_half_to_float/xenos_float_to_half (currently resides in x64_seq_vector, move to gpu code maybe at some point)
    Add fast x64 codegen for PackFloat16_4/UnpackFloat16_4. Same code can be used for Float16_2 in future commit. This should speed up some games that use these functions heavily
    Remove cvar for toggling old float16 behavior
    Add VRSAVE register, support mfspr/mtspr vrsave
    Add cvar for toggling off codegen for trap instructions and set it to true by default.
    Add specialized methods to CommandProcessor: WriteRegistersFromMem, WriteRegisterRangeFromRing, and WriteOneRegisterFromRing. These reduce the overall cost of WriteRegister
    Use a fixed size vmem vector for upload ranges, realloc/memsetting on resize in the inner loop of requestranges was showing up on the profiler (the search in requestranges itself needs work)
    Rename fixed_vmem_vector to better fit xenia's naming convention
    Only log unknown register writes in WriteRegister if DEBUG :/. We're stuck on MSVC with c++17 so we have no way of influencing the branch ordering for that function without profile guided optimization
    Remove binding stride assert in shader_translator.cc, triangle told me it's leftover ogl stuff
    Mark xe::FatalError as noreturn
    If a controller is not connected, delay by 1.1 seconds before checking if it has been reconnected. Asking Xinput about a controller slot that is unused is extremely slow, and XinputGetState/SetState were taking up an enormous amount of time in profiles. This may have caused a bit of input lag
    Protect accesses to input_system with a lock
    Add proper handling for user_index >= 4 in XamInputGetState/SetState, properly return zeroed state in GetState
    Add missing argument to NtQueryVirtualMemory_entry
    Fixed RtlCompareMemoryUlong_entry, it actually does not care if the source is misaligned, and for length it aligns down
    Fixed RtlUpperChar and RtlLowerChar, added a table that has their correct return values precomputed
    chss95cs@gmail.com 2022-08-20 11:40:19 -0700
  • f551e59015
    CI: Remove game patches Margen67 2022-08-19 03:39:29 -0700
  • e06978e5be [Premake] Cleanup & Fixed references in cpu-tests Gliniak 2022-08-17 09:43:55 +0200
  • 0df92130e6 [Memory] Changed amount of kernel reserved pages. Gliniak 2022-08-15 17:51:29 +0200
  • 7cc364dcb8
    squash reallocs in command buffers by using large prealloced buffer, directly use virtual memory with it so os allocs on demand
    mark raw clock functions as noinline, the way msvc was inlining them and ordering the branches meant that rdtsc would often be speculatively executed
    add alternative clock impl for win, instead of using queryperformancecounter we grab systemtime from kusershared. it does not have the same precision as queryperformancecounter, we only have 100 nanosecond precision, but we round to milliseconds so it never made sense to use the performance counter in the first place
    stubbed out the "guest clock mutex"... (the entirety of clock.cc needs a rewrite)
    added some helpers for minf/maxf without the nan handling behavior
    chss95cs@gmail.com 2022-08-14 13:42:08 -0700
  • c9b2d10e17
    alternative mutex impl on windows works but i really can't tell if it helps much.
    use larger size in deferred_command_list to cut down on resizes in big scenes on m:dur
    chss95cs@gmail.com 2022-08-14 10:26:50 -0700
  • a037bdb2e8 Point ffmpeg submodule to the branch with the nonrecursive split_radix_permutation chss95cs@gmail.com 2022-08-14 09:20:04 -0700
  • e5d01af6a6 trying to get new disruptorplus module path to be used chss95cs@gmail.com 2022-08-14 09:16:40 -0700
  • 08f7a28920 Alternative mutex chss95cs@gmail.com 2022-08-14 08:59:11 -0700
  • 6bc3191b97
    Merge pull request #60 from chrisps/canary_experimental Radosław Gliński 2022-08-13 23:14:31 +0200
  • 495b1f8bc8 once again return to spinloop chss95cs@gmail.com 2022-08-13 14:05:35 -0700
  • c9e4119428
    Add branch of ffmpeg with non-recursive split_radix_permutation
    Add branch of disruptorplus with working blocking_wait_strategy
    Switch back to blocking wait for timer queue
    chss95cs@gmail.com 2022-08-13 13:43:45 -0700
  • 020d64a1a1 revert to using old bad spinwait, disruptorplus' blocking_wait code does not compile chss95cs@gmail.com 2022-08-13 13:20:35 -0700
  • cb85fe401c
    Huge set of performance improvements, combined with an architecture specific build and clang-cl users have reported absurd gains over master for some games, in the range 50%-90%. But for normal msvc builds i would put it at around 30-50%
    Added per-xexmodule caching of information per instruction, can be used to remember what code needs compiling at start up
    Record what guest addresses wrote mmio and backpropagate that to future runs, eliminating dependence on exception trapping. This makes many games like h3 actually tolerable to run under a debugger
    Fixed a number of errors where temporaries were being passed by reference/pointer
    Can now be compiled with clang-cl 14.0.1, requires -Werror off though and some other solution/project changes.
    Added macros wrapping compiler extensions like noinline, forceinline, __expect, and cold.
    Removed the "global lock" in guest code completely. It does not properly emulate the behavior of mfmsrd/mtmsr and it seriously cripples amd cpus. Removing this yielded around a 3x speedup in Halo Reach for me.
    Disabled the microprofiler for now. The microprofiler has a huge performance cost associated with it. Developers can re-enable it in the base/profiling header if they really need it
    Disable the trace writer in release builds. Despite just returning after checking if the file was open the trace functions were consuming about 0.60% cpu time total
    Add IsValidReg, GetRegisterInfo is a huge (about 45k) branching function and using that to check if a register was valid consumed a significant chunk of time
    Optimized RingBuffer::ReadAndSwap and RingBuffer::read_count. This gave us the largest overall boost in performance. The memcpies were unnecessary and one of them was always a no-op
    Added simplification rules for multiplicative patterns like (x+x), (x<<1)+x
    For the most frequently called win32 functions i added code to call their underlying NT implementations, which lets us skip a lot of MS code we don't care about/isn't relevant to our usecases
    ^this can be toggled off in the platform_win header
    Handle indirect call true with constant function pointer, was occurring in h3
    Lookup host format swizzle in denser array
    By default, don't check if a gpu register is unknown, instead just check if it's out of range. Controlled by a cvar
    ^looking up whether it's known or not took approx 0.3% cpu time
    Changed some things in /cpu to make the project UNITYBUILD friendly
    The timer thread was spinning way too much and consuming a ton of cpu, changed it to use a blocking wait instead
    Tagged some conditions as XE_UNLIKELY/LIKELY based on profiler feedback (will only affect clang builds)
    Shifted around some code in CommandProcessor::WriteRegister based on how frequently it was executed
    Added support for docdecaduple precision floating point so that we can represent our performance gains numerically
    Tons of other stuff i'm probably forgetting
    chss95cs@gmail.com 2022-08-13 12:59:00 -0700
  • fc4f2183b7 [Emulator] Added option for content installation Gliniak 2022-07-30 12:41:26 +0200
  • 2f59487bf3
    Merge pull request #59 from Uraniumm/canary_experimental Radosław Gliński 2022-08-08 19:47:35 +0200
  • a16acbaf59
    add nullptr check to mitigate crashes Uraniumm 2022-08-08 02:02:25 -0400
  • 3ac99e0d7d
    Merge pull request #58 from chrisps/canary_experimental Radosław Gliński 2022-08-08 07:54:26 +0200
  • 324a8eb818
    A bunch of fixes for division logic:
    "turns out there's a lot of quirks with the div instructions we haven't been covering
    if the denom is 0, we jump to the end and mov eax/rax to dst, which is correct because ppc raises no exceptions for divide by 0 unlike x86
    except we don't initialize eax before that jump, so whatever garbage from the previous sequence that has been left in eax/rax is what the result of the instruction will be
    and then in our constant folding, we don't do the same zero check in Value::Div, so if we constant folded the denom to 0 we will host crash
    the ppc manual says the result for a division by 0 is undefined, but in reality it seems it is always 0
    there are a few posts i saw from googling about it, and tests on my rgh gave me 0, but then another issue came up and that is that we don't check for signed overflow in our division, so we raise an exception if guest code ever does (1<<signbit_pos) / -1
    signed overflow in division also produces 0 on ppc
    the last thing is that if src2 is constant we skip the 0 check for division without checking if it's nonzero
    all weird, likely very rare edge cases, except for maybe the signed overflow division
    oh yeah, and because the int members of constantvalue are all signed ints, we were actually doing signed division always with constant folding"
    chss95cs@gmail.com 2022-08-07 10:41:26 -0700
  • f45e9e5e9a [Kernel] Improved handling of internal display resolution Gliniak 2022-08-02 12:09:25 +0200
  • 0e1353aa71 Implemented Opcode: mcrf Gliniak 2021-06-11 23:31:09 +0200
  • 332f69f36b
    Merge pull request #57 from chrisps/canary_experimental Radosław Gliński 2022-07-31 18:43:30 +0200
  • 968f656d96 Add separate VMX/fpu mxcsr Add support for constant operands for most fpu instructions Remove constant folding for most fpu code half float chss95cs@gmail.com 2022-07-31 08:56:36 -0700
  • 3185b0ac9c
    Merge pull request #55 from Etokapa/patch-1 Radosław Gliński 2022-07-31 09:24:55 +0200
  • bdf9e8d65a
    Adjustments for Building Canary Etokapa 2022-07-30 11:13:17 -0500
  • 5d1b641197 [Emulator] Added possibility to install multiple packages at once Gliniak 2022-07-30 15:52:41 +0200
  • 79ffbe3971 Merge branch 'importContent' of https://github.com/Gliniak/xenia.git into canary_experimental Gliniak 2022-07-30 12:44:24 +0200
  • 0e3403d6da Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental Gliniak 2022-07-30 12:42:51 +0200
  • 433a8a8a5e [Emulator] Added option for content installation Gliniak 2022-07-30 12:41:26 +0200
  • 7595cdb52b [Vulkan] Non-GS point sprites + minor SPIR-V fixes Triang3l 2022-07-27 17:14:28 +0300
  • ff7ef05063 [SPIR-V] Clamp cube face using NClamp, not NMax/FMin Triang3l 2022-07-26 17:08:12 +0300
  • 66c995f3aa [SPIR-V] Saturate point sprite coordinates Triang3l 2022-07-26 17:04:22 +0300
  • 8fb5da18ea [Vulkan] Add forgotten fullDrawIndexUint32 check Triang3l 2022-07-26 16:24:14 +0300
  • 9fa41c27bc [Vulkan] Point sprite geometry shader Triang3l 2022-07-26 16:01:20 +0300
  • 0c3019981c [Video] Added option to set internal output resolution Gliniak 2022-07-26 11:25:03 +0200
  • 76806e08c5 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental Gliniak 2022-07-26 10:22:38 +0200
  • f248e23079 [DXBC] Skip backface check in point PsParamGen Triang3l 2022-07-25 21:48:25 +0300
  • 77e85ecaa4 [Vulkan] 32-bit index fetch without fullDrawIndexUint32 Triang3l 2022-07-25 16:53:12 +0300
  • 061000af01 [Base] Changed size of bitstream accessed data (Risky) Gliniak 2022-07-25 10:52:21 +0200
  • 364137ef5f [XAM] Send UI On notification on start of XamShowSigninUI Gliniak 2022-07-25 10:50:32 +0200
  • 6730ffb7d3 Merge branch 'canary_experimental' of https://github.com/xenia-canary/xenia-canary into canary_experimental Gliniak 2022-07-24 17:58:48 +0200
  • 6e501fbd61 [XAM] Set license mask for DLCs (Thanks Beeanyew) Gliniak 2022-07-24 17:58:00 +0200
  • 98c2cb636f Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental Gliniak 2022-07-24 17:38:08 +0200
  • 37579d3bf0 [GPU] Treat non-adaptive-tessellated patches as 1-control-point Triang3l 2022-07-24 17:38:26 +0300
  • 5c18f3a5dd
    Merge pull request #53 from chrisps/canary_experimental Radosław Gliński 2022-07-23 21:53:30 +0200
  • 33a6cfc0a7
    Add special cases to DOT_PRODUCT_3/4 that detect whether they're calculating lengthsquared
    Add alternate path to DOT_PRODUCT_3/4 for use_fast_dot_product that skips all the status register stuff and just remaps inf to qnan
    Add OPCODE_TO_SINGLE to replace the CONVERT_F32_F64 - CONVERT_F64_F32 sequence we used to emit with the idea that a backend could implement a more correct rounding behavior if possible on its arch
    Remove some impossible sequences like MUL_HI_I8/I16, MUL_ADD_F32, DIV_V128. These instructions have no equivalent in PPC. Many other instructions are unused/dead code and should be removed to make the x64 backend a better reference for future ones
    Add backend_flags to Instr. Basically, flags field that a backend can use for whatever it wants when generating code.
    Add backend instr flag to x64 that tells it to not generate code for an instruction. This allows sequences to consume subsequent instructions
    Generate actual x64 code for VSL instruction instead of using callnativesafe
    Detect repeated COMPARE instructions w/ identical operands and reuse the results in FLAGS if so. This eliminates a ton of garbage compare/set instructions.
    If a COMPARE instruction's destination is stored to context with no intervening instruction and no additional uses besides the store, do setx [ctx address]
    Detect prefetchw and use it in CACHE_CONTROL if prefetch for write is requested instead of doing prefetch to all cache levels
    Fixed an accident in an earlier commit by me, VECTOR_DENORMFLUSH was not being emitted at all, so denormal inputs to MUL_ADD_V128 were not becoming zero and outputs from DOT_PRODUCT_X were not either. I believe this introduced a bug into RDR where a wagon wouldn't spawn? (https://discord.com/channels/308194948048486401/308207592482668545/1000443975817252874)
    Compute fresx in double precision using RECIP_F64 and then round to single instead of doing (double)(1.0f / (float)value), matching original behavior better
    Refactor some of ppc_emit_fpu, many of the InstrEmit functions are identical except for whether they round to single or not
    Added "tail emitters" to X64Emitter. These are callbacks that get invoked with their label and the X64Emitter after the epilog code. This allows us to move cold code out of the critical path and in the future place constant pools near functions
    guest_to_host_thunk/host_to_guest_thunk now gets directly rel32 called, instead of doing a mov
    Add X64BackendContext structure, represents data before the start of the PPCContext
    Instead of doing branchless sequence, do a compare and jump to tail emitted code for address translation. This makes converting addresses a 3 uop affair in most cases.
    Do qnan move for dot product in a tail emitter
    Detect whether EFLAGS bits are independent variables for the current cpu (not really detecting it ehe, just checking if zen) and if so generate inc/dec for add/sub 1
    Detect whether low 32 bits of membase are 0. If they are then we can use membasereg.cvt32() in place of immediate 0 in many places, particularly in stores
    Detect LOAD MODIFY STORE pattern for context variables (currently only done for 64 bit ones) and turn them into modify [context ptr]. This is done for add, sub, and, or, xor, not, neg
    Tail emit error handling for TRAP opcodes
    Stub out unused trap opcodes like TRAP_TRUE_I32, TRAP_TRUE_I64, TRAP_TRUE_I16 (the call_true/return_true opcodes for these types are also probably unused)
    Remove BackpropTruncations. It was poorly written and causes crashes on the game Viva pinata (https://discord.com/channels/308194948048486401/701111856600711208/1000249460451983420)
    chss95cs@gmail.com 2022-07-23 12:10:07 -0700
  • 1fcac00924 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental Gliniak 2022-07-23 13:26:31 +0200
  • 3c12814276 [GPU] EDRAM looped addressing (resolves #2031) Triang3l 2022-07-22 23:51:29 +0300
  • 1d480f74fa Add game patches to releases Margen67 2022-07-21 21:58:43 -0700
  • 97f2774fa9 Initial support for xex patching Gliniak 2020-10-09 21:02:32 +0200
  • cdf2213d3d Added Premake Files For PatchingSystem Gliniak 2020-10-02 22:33:42 +0200
  • 0c782ade8e Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental Gliniak 2022-07-21 18:52:33 +0200
  • 74d83e4af8 Python/xenia-build/xb fixes Margen67 2022-03-09 07:07:46 -0800
  • 6ff312afb1 [DXBC] Update PsParamGen comment [ci skip] Triang3l 2022-07-21 12:42:06 +0300
  • 1a95bef8b3 [GPU] Eliminate unused shader I/O, UCP culling, centroid on Vulkan Triang3l 2022-07-21 12:32:28 +0300
  • 0f60e23208 [Kernel] Removed input change notifications from initial notify list Gliniak 2022-07-19 10:46:36 +0200
  • bc315d21e0 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental Gliniak 2022-07-19 10:45:14 +0200
  • 0a94b86cb8 [GPU] Remove orphaned GetPresentArea declaration [ci skip] Triang3l 2022-07-18 21:02:07 +0300
  • 57b514ea6a Removed (again) unnecessary include Gliniak 2022-07-18 09:40:45 +0200
  • 3757580f45
    Merge pull request #52 from chrisps/canary_experimental Radosław Gliński 2022-07-18 09:20:35 +0200
  • fd78ab4dfc [Patcher] Allow loading patches from non-utf8 paths Gliniak 2022-07-18 08:46:04 +0200
  • a41770acc5 [xenia-build] Check for clang-format 14 Joel Linn 2022-07-17 18:54:21 +0200
  • 5bfa3bf56e [CI] Build all android targets Joel Linn 2022-07-17 16:16:01 +0200
  • f4f131aab9 [CI] Exclude "Wait on Timer" and "Wait on Multiple Timers" tests from drone Joel Linn 2022-07-17 16:14:46 +0200
  • 846fedfa47 [xenia-build] Report premake errors via exit code Joel Linn 2022-07-16 11:09:16 +0200
  • 92db8f65b7 [CI] Use ninja for cmake builds Joel Linn 2022-07-16 01:57:25 +0200
  • b3a00d0368 [CI] Update image to 2022-07-15 Joel Linn 2022-07-16 01:57:12 +0200
  • 11817f0a3b vshufps accident broke things, this fixes chss95cs@gmail.com 2022-07-17 14:44:09 -0700
  • c0354b8c20 [xenia-build] Check for clang-format 14 Joel Linn 2022-07-17 18:54:21 +0200
  • 8714fd79dc [CI] Build all android targets Joel Linn 2022-07-17 16:16:01 +0200
  • 8a23689d58 [CI] Exclude "Wait on Timer" and "Wait on Multiple Timers" tests from drone Joel Linn 2022-07-17 16:14:46 +0200
  • 61598a1c25 [xenia-build] Report premake errors via exit code Joel Linn 2022-07-16 11:09:16 +0200
  • 96585009a9 [CI] Use ninja for cmake builds Joel Linn 2022-07-16 01:57:25 +0200
  • 711ddc0eb2 [CI] Update image to 2022-07-15 Joel Linn 2022-07-16 01:57:12 +0200
  • 6e1e62378f Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental Gliniak 2022-07-17 21:27:52 +0200
  • 14fdf4b270 [GPU] Up to 7x7 resolution scaling Triang3l 2022-07-17 20:41:50 +0300
  • 3717167bbe
    Preload ThreeFloatMask in DOT_PRODUCT_3
    Use shuffle_ps instead of broadcastss, broadcastss is slower on many intel and amd processors and encodes to the same number of bytes as shuffle_ps
    Detect and optimize away PERMUTE with a zero src2 and src3 in constant_propagation_pass instead of in the x64 sequence
    For constant PERMUTE, do the Xor/And prior to LoadConstantXmm instead of in the generated code
    Simplified code for PERMUTE
    Added simplification rule that detects (lzcnt(x) >> log2(bitsizeof_x)) == (x == 0)
    Added set_srcN(value, idx) which can be used to set the nth source of an instruction, which makes more sense than having three different functions that only differ by the field they touch
    Added Value::VisitValueOperands for iterating all Value operands an instruction has.
    Add BackpropTruncations code to simplification_pass
    Changed the (void**) dereferences of raw_context that are done to grab thread_state to instead reference PPCContext and the thread_state field.
    Moved the thread_state field to the tail of PPCContext.
    Moved membase to the tail of PPCContext, since now it is reloaded very infrequently.
    Rearranged PPCContext so that the condition registers come first (most accesses to them can't get SSA'd), moved lr and ctr to after gp regs since they are not accessed as much as the main gpregs. This way the most frequently accessed registers will be accessible via a rel8 displacement instead of rel32 (ideally, we would have only certain CRs at the start, but xenia does pointer arithmetic on CR0's offset to get CRn)
    Use alignas(64) to ensure PPCContext's padding
    Map PPCContext specially so that the low 32 bits of the context register is 0xE0000000, for the 4k page offset check. Also allocate the page before, so that backends can store their own information that is not relevant to the PPCContext on that page and reference that data in the generated asm via 8-bit signed displ or 32-bit signed displ. Currently this page is not being utilized, but I plan on stashing some data critical to the x86 backend there
    Changed many wrong avx instructions, they worked but they were not intended for the data they operated on, meaning they transferred domains and caused 1-2 cycle stall each time
    Added SimdDomain checking/deduction to X64Emitter. Used SimdDomain code to fix a lot of float/int domain stalls
    chss95cs@gmail.com 2022-07-17 09:52:40 -0700
  • e8652e544a [GPU] Translucent trace viewer controls Triang3l 2022-07-17 17:29:41 +0300
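
Note on commit 324a8eb818: the division behavior described in that message (a zero divisor yields 0, signed overflow yields 0, and unsigned division must not be folded as signed) can be modeled in a few lines. The following is only a Python sketch of the arithmetic for 32-bit operands, not the emulator's code; every function and constant name here is hypothetical.

```python
INT32_MIN = -(1 << 31)
MASK32 = 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def guest_div_signed32(dividend: int, divisor: int) -> int:
    """Model of a signed 32-bit guest divide that never traps."""
    dividend, divisor = to_signed32(dividend), to_signed32(divisor)
    if divisor == 0:
        return 0  # the commit message reports 0 for divide by zero
    if dividend == INT32_MIN and divisor == -1:
        return 0  # signed overflow is also reported to produce 0
    # Python's // floors, so divide magnitudes to truncate toward zero.
    quotient = abs(dividend) // abs(divisor)
    return -quotient if (dividend < 0) != (divisor < 0) else quotient


def guest_div_unsigned32(dividend: int, divisor: int) -> int:
    """Unsigned variant: operands are masked, never sign-extended."""
    dividend, divisor = dividend & MASK32, divisor & MASK32
    return 0 if divisor == 0 else dividend // divisor
```

The same bit pattern gives different quotients depending on signedness, which is the constant-folding issue the message ends on: 0xFFFFFFFE divided by 2 is -1 when folded as signed but 0x7FFFFFFF when folded as unsigned.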
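
Note on commit f31869092c: the message describes replacing a jump table with component counts packed into a single 32 bit word that is indexed by shifting. A rough Python sketch of that idea follows. The table contents, the 4-bit entry width, and all names are assumptions made for illustration; the real format list and packing layout are not given in the message.

```python
# Hypothetical component counts for eight formats, indexed 0..7.
COMPONENT_COUNTS = [4, 4, 1, 2, 4, 2, 1, 4]
BITS_PER_ENTRY = 4  # eight 4-bit entries fill exactly one 32-bit word
ENTRY_MASK = (1 << BITS_PER_ENTRY) - 1

# Build the packed word once; in compiled code this would be a constant.
PACKED_COUNTS = 0
for index, count in enumerate(COMPONENT_COUNTS):
    PACKED_COUNTS |= count << (index * BITS_PER_ENTRY)


def component_count(format_index: int) -> int:
    """Look up a count with one shift and one mask, no branches."""
    return (PACKED_COUNTS >> (format_index * BITS_PER_ENTRY)) & ENTRY_MASK
```

The lookup touches no memory beyond the constant itself, which is presumably why it fell lower in profiles than a jump table.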
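
Note on commit 3717167bbe: the simplification rule (lzcnt(x) >> log2(bitsizeof_x)) == (x == 0) holds because a leading-zero count equals the full bit width only when the input is zero. A small Python check for the 32-bit case is sketched below; lzcnt32 is a hypothetical helper that models the instruction and is not a function from the codebase.

```python
def lzcnt32(x: int) -> int:
    """Count leading zero bits of a 32-bit value; returns 32 for zero."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()


def is_zero_via_lzcnt(x: int) -> int:
    """lzcnt32 is in 0..32, so shifting by log2(32) = 5 leaves 1 only for 32."""
    return lzcnt32(x) >> 5


# Spot-check the identity on a few representative inputs.
for sample in (0, 1, 2, 12345, 0x80000000, 0xFFFFFFFF):
    assert is_zero_via_lzcnt(sample) == int(sample == 0)
```

Recognizing the pattern lets a compiler pass replace the shift sequence with a plain compare against zero.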