Commit Graph

5861 Commits

Author SHA1 Message Date
Adrian 333d7c2767 [UI] Added build to exception message 2023-02-05 18:31:12 +01:00
Adrian 3d3810fa98 [App] Minor Fixes
Added F9
[Controller Hotkeys] Call RunTitle from the UI thread
2023-02-03 21:23:35 +01:00
Gliniak 036f19c66e [UI] Uplift of TR - 2092
Plus fixed invalid path, they're not stored anymore
2023-02-02 19:02:23 +01:00
Adrian 4c220770ec Dynamic TU patch
Load the correct TU patch for the loaded launch module
2023-01-31 21:11:38 +01:00
Adrian b10c84b340
Added controller hotkeys cvar (#119)
* Added controller hotkeys setting

Added option to disable controller hotkeys
Minor Changes

* Fixed locked input system

The input system lock should be released even if a controller is not connected.
2023-01-29 19:26:25 +01:00
Shoegzer 4a2f4d9cfe Add include to fix compiling 2023-01-29 21:10:20 +03:00
Gliniak 89f3598426 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2023-01-29 11:25:56 +01:00
Gliniak 4e87d1f9d1 [Kernel/Thread] Set TLS slot to 0 while freeing 2023-01-28 17:49:12 -06:00
Emma 4ecd1bffda
Add configurable toggle to mark code segments as writable (#100) 2023-01-23 23:44:59 +01:00
Margen67 aea9714bd0 Make present_safe_area 100
So people stop asking why their games are cropped off.
2023-01-23 01:22:43 -08:00
Adrian 504fb9f205 Title selection & bug fixes
Added title selection
Fixed controller hotkeys for multiple connected controllers
Fixed ImGUI dialog box stacking
Added a new font for ImGUI
2023-01-22 22:49:51 +01:00
Adrian 459497f0b6
Implemented Controller Hotkeys (#111)
Implemented Controller Hotkeys

Added controller hotkeys
Added guide button support for XInput and winkey

The hotkey configurations can be found in HID -> Display controller hotkeys
If the Xbox Gamebar overlay is enabled then use the Back button instead of the Guide button.

- Fixed hotkey thread destruction
- Fixed XINPUT_STATE by padding 4 bytes
- Added hotkey vibration for user feedback
- Replaced MessageBoxA with ImGuiDialog::ShowMessageBox

Co-authored-by: Margen67 <Margen67@users.noreply.github.com>
2023-01-13 09:17:43 +01:00
Gliniak ea1003c6bf [CPU] Check if flags pointer exists
There is specific situation in FM4 on terminating thread that runs dll
where flags pointer doesn't exist
2023-01-09 16:13:09 +01:00
Gliniak d24d3295c6 [XMA] Clear host data on context clear + swap buffer if decoding fails 2023-01-06 19:27:41 +01:00
Gliniak 668244adb7 [XAM] Fixed importing savefiles from different games (at least partially) 2023-01-06 16:35:40 +01:00
Gliniak 40f6b5b3ac [Kernel] Improvements to host and guest objects handling 2023-01-06 09:13:21 +01:00
Gliniak 5b0fe5aada [XBDM] Added Stub For: DmGetConsoleDebugMemoryStatus 2023-01-05 22:34:26 +01:00
Gliniak 81412c59c1 [Net] NetDll___WSAFDIsSet: Fixed incorrect endianness of fd_count
Plus: limit it to 64 entries
2023-01-05 20:33:12 +01:00
Gliniak 81aaf98e04 [Memory] Added option to ignore offset for ranged physical allocations
Certain titles check if input address matches output and if it doesn't then just crash
2023-01-05 08:56:22 +01:00
Gliniak 39c509b57f [APU] Resolved context stuck with is_stream_done_ flag and no space left 2023-01-03 19:49:21 +01:00
Gliniak b7bc0425ba Revert "[APU] Clear host data while reseting context"
This reverts commit 1451ca4266.
2023-01-01 11:23:47 +01:00
Gliniak 2afd2cc4d6 [D3D12] Added workaround for broken Geometry Shader in AMD 7900 Series GPU 2022-12-31 11:27:01 +01:00
Gliniak 26415cb8b1 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-12-31 11:19:01 +01:00
Wunkolo e55cb737c1 [x64] Add AX512 optimization for `OPCODE_SELECT`(F64) 2022-12-28 14:20:20 -06:00
Wunkolo ba75a016b4 [x64] Add AX512 optimization for `OPCODE_SELECT`(V128)
Uses `vpternlogd` to collapse the bitwise select operation into one
instruction. Though it needs a `vmovdqa` instruction since `vpternlogd`
reads and writes to the first argument.
2022-12-28 14:20:20 -06:00
Wunkolo 7c21b327ff [x64] Add `x64_util.h`
Used to help with generating instruction-specific constants.  Currently
used for the ternary-logic constants(`vpternlog*`).
2022-12-28 14:20:20 -06:00
Gliniak eb25fe4f4a [CPU] Increase amount of possible labels used in FinalizationPass
Instead of using decimal notaation for labels let's use hexadecimal.
That will increase amount of possible combination by a lot.
2022-12-28 14:19:55 -06:00
Joel Linn 9eef64d3fb [SDL2] Print version on startup 2022-12-28 14:19:02 -06:00
Gliniak 859bb89555 [Kernel] Support for loading achievement data 2022-12-28 14:16:43 -06:00
Joel Linn da9c90835b [ImGui] Use new key API 2022-12-28 14:16:32 -06:00
Joel Linn b641e39c4d [ImGui] Use new ImageButton signature 2022-12-28 14:16:32 -06:00
Joel Linn d6b5cbd634 [ImGui] Fix deprecated SetCursorPos use 2022-12-28 14:16:32 -06:00
Joel Linn 29985f69e1 [ImGui] Fix fsr sharpness slider scaling
https://github.com/ocornut/imgui/issues/3361#issuecomment-1287045159
2022-12-28 14:16:32 -06:00
Joel Linn f452d6a007 [UI] Fix UB (moved mem) in file picker
- References to vector data become UB after vector size changes.
- Add one extra level of indirection to pin the wide string memory
  location regardless of vector memory
2022-12-28 14:16:32 -06:00
Joel Linn 7877331d8a [ImGui] Use ImDrawCmd::IdxOffset field
c80e8b964c
https://github.com/ocornut/imgui/issues/4845#issuecomment-1003329113
2022-12-28 14:16:32 -06:00
Joel Linn 8f88bb237e [ImGui] Fix empty IDs 2022-12-28 14:16:32 -06:00
Joel Linn b107af68bc [ImGui] Fix removed flags 2022-12-28 14:16:32 -06:00
Gliniak 9f0d3d4c5b [Kernel] Simple support for loading achievement data 2022-12-25 16:11:59 +01:00
Gliniak a25ea03a28 [Threading] Revert sleepdelay0_for_maybeyield back to 0 2022-12-21 20:12:40 +01:00
Gliniak 783845c56e [XAM] Cleanup in xam_input.
- Added some missing xinput flags
- Simplified XamInputGetCapabilities
- Replaced hardcoded checks to flag
2022-12-20 19:33:55 +01:00
InvoxiPlayGames d13c3b7306 fix XamInputGetCapabilities breaking subtypes 2022-12-19 04:09:27 +00:00
Radosław Gliński 8c43160fc6
Merge pull request #106 from Bo98/canary_burnout5_pr1
CPU threading improvements, Kernel game region improvements
2022-12-18 11:16:53 +01:00
Bo Anderson 6c5423ceed app/emulator_window: store recent.toml in storage root
This makes it consistent with the config file.
2022-12-17 13:36:43 -05:00
Bo Anderson 422ee1fbdc kernal/xam/xam_info: base game region on user_country
It is debatable whether this is correct in the general case.

There's nothing really wrong with 0xFFFF logically.
Burnout Paradise however bases its in-game language on this and does not recognise 0xFFFF.
The game uses Japanese in the default case.

I've avoided the "rest of Asia" code since Burnout Paradise seems to use a different value (0x01F8) for that than what I expected (0x01FC).
2022-12-17 13:34:49 -05:00
Bo Anderson 2a77ed7246 cpu/backend/x64/x64_tracers: fix thread matching 2022-12-17 13:34:48 -05:00
Bo Anderson 9489161d0b kernel: separate host objects from regular handle range
Burnout Paradise statically expects certain thread handle values based on how many objects it knows it is allocating ahead of time.
From this, it calculates an ID by subtracting the thread handle from a base handle of what it expects the first such thread to be assigned.
The value is statically declared in the executable and is not determined automatically.

The host objects in the handle range made these thread handles higher than what the game expects.
Removing these, and allowing 0xF8000000 to be assigned, allows the thread handles to fit perfectly in the range the game expects.

It is not clear what handle range the host objects should be taking. For now though, they're 0-based rather than 0xF8000000-based.
2022-12-17 13:34:47 -05:00
MoistyMarley 4f165825ea Fixed three crashes and refactored
While a title is running any attempt to open another title will result in a crash this commit prevents these crashes via File->Open and File->Open Recent->Title

Opening a recent title with an invalid path caused a crash.
2022-12-16 18:39:42 +00:00
chss95cs@gmail.com 3952997116 Fix issue introduced yesterday where the final fetch constant would never be marked as written
Reorganized SystemPageFlags for sharedmemory, each field now goes into its own array, the three arrays are page aligned in a single virtual allocation
Refactored sharedmemory a bit, use tzcnt if available when finding ranges (faster on pre-zen4 amd cpus)
2022-12-15 08:35:36 -08:00
chrisps 8c6cefcce0
Merge branch 'xenia-canary:canary_experimental' into command_processor_optimizations 2022-12-14 11:35:13 -08:00
chss95cs@gmail.com f931c34ecb Cleaned up for commit, moved WriteRegistersFromMemCommonSense code into WriteRegistersFromMem
optimized copy_and_swap_32_unaligned further
2022-12-14 11:34:33 -08:00
chss95cs@gmail.com 754293ffc3 Fix mistake with fetch constant dirty mask 2022-12-14 10:13:13 -08:00
chss95cs@gmail.com 7a0fd0f32a Remove MaybeYields when vsync is off 2022-12-14 09:56:33 -08:00
chss95cs@gmail.com 080b6f4cbd Partially vectorized GetScissor (loading and unpacking the bitfields from the registers is still scalar) 2022-12-14 09:33:14 -08:00
chss95cs@gmail.com ab6d9dade0 add avx2 codepath for copy_and_swap_32_unaligned
use the new writerange approach in WriteRegisterRangeFromMem_WithKnownBound
2022-12-14 07:53:21 -08:00
Gliniak b2dd489151 [APU] Set first frame offset for next buffer + Note about edgecase 2022-12-13 22:57:36 +01:00
Gliniak e00feb7b0f [APU] Fixed incorrect frame count + removed hopefully useless check from now 2022-12-13 21:47:35 +01:00
chss95cs@gmail.com 82dcf3f951 faster/more compact MatchValueAndRef
Made commandprocessor GetcurrentRingReadcount inline, it was made noinline to match PGO decisions but i think PGO can make extra reg allocation decisions that make this inlining choice yield gains, whereas if we do it manually we lose a tiny bit of performance

Working on a more compact vectorized version of GetScissor to save icache on cmd processor thread
Add WriteRegisterForceinline, will probably end up amending to remove it

add ERMS path for vastcpy

Adding WriteRegistersFromMemCommonSense (name will be changed later), name is because i realized my approach with optimizing writeregisters has been backwards,  instead of handling all checks more quickly within the loop that writes the registers, we need a different loop for each range with unique handling. we also manage to hoist a lot of the logic out of the loops

Use 100ns delay for MaybeYield, noticed we often return almost immediately from the syscall so we end up wasting some cpu, instead we give up the cpu for the min waitable time (on my system, this is 0.5 ms)

Added a note about affinity mask/dynamic process affinity updates to threading_win

Add TextureFetchConstantsWritten
2022-12-13 11:25:33 -08:00
Gliniak 16580f5fae [APU] Fixed crash in error message caused by invalid arguments number 2022-12-13 11:02:59 +01:00
Gliniak 97fdf9c6dd [APU] Resolved crash related to negative amount of bits to copy
This is likely due to hitting somehow valid frame in invalid data
2022-12-13 08:46:48 +01:00
Gliniak 43d7fc5158 [APU] Shuffle checks to hopefully prevent crashing from logger 2022-12-11 21:06:47 +01:00
Gliniak c9cd6f15fc [APU] Fixed logged frame count
Until now info about frames that were provided in log was always incorrect by 1
2022-12-11 13:11:23 +01:00
Gliniak 82ccdd3db5 [APU] Misc Changes:
- Unified APU error messages
- Removed magic number from SetOffset call
- Commented out that annoying assertion from XmaContext::GetNextFrame
- Removed checks for current_input_packet_count and replaced with bool check

Not sure how to call it correctly. I know that calls with packet count == 1 is specific one
and probably handled differently. Is it streaming or how should it be called?
2022-12-11 12:22:23 +01:00
chss95cs@gmail.com 7d49b97e4c Print any module name+ offset in host exception reports
print thread name in host exception reports
trying to force win32 error descriptions to english
Return if output buffer block count is 0 in XmaContext, this is an attempt to fix a divide by zero crash many users have reported
2022-12-09 12:24:06 -08:00
Gliniak 7c5da821d4 [Kernel] Fixed invalid thread pointer in KeEnableFpuExceptions 2022-12-08 21:48:13 +01:00
Radosław Gliński 747fb42bdf
Merge pull request #98 from AdrianCassar/canary_experimental
Added a hotkey to open the previously played title
2022-12-05 18:09:52 +01:00
MoistyMarley 6d2724a861 Added a hotkey to open the previously played title 2022-12-05 17:01:16 +00:00
chss95cs@gmail.com a63f424c0a Directly check PEB for IsDebuggerAttached
Add constexpr getters to magicdiv class so it can be used from jitted x64/dxbc
Track the guest return address as well for guest/host sync, if multiple entries have the same guest stack find the first one with a matching guest retaddr. this fixes epic mickey 2 (which the previous guest-stack change had allowed to go ingame for a bit) and potentially also a crash in fable3.
Break if under debugger when stackpoints are overflowed

Add much more useful output for host exceptions, print out xenia_canary.exe relative offsets if exception is in module, formatmessage for ntstatus/win32err, strerror

Minor d3d12 microoptimization, instead of doing SetEventOnCompletion + WaitForSingleObject do SetEventOnCompletion w/ nullptr so that the wait happens in kernel mode, avoiding two extra context switches

add unimplemented kernel functions:
ExAllocatePoolWithTag
ObReferenceObject

ObDereferenceObject has no return value.
Log a message when ObDereferenceObject/Reference receive unregistered guest kernel objects
gave ObLookupThreadByThreadId its correct error status
hoist object_types initialization out of ObReferenceObjectByHandle
Fix out parameter values on error for a few kernel funcs
add note about msr to KeSetCurrentStackPointers
add X_STATUS_OBJECT_TYPE_MISMATCH check for xeNtSetEvent
add msr_mask field to X_KPCR
2022-12-04 12:38:19 -08:00
Triang3l e97eb75b94 [Vulkan] Update variableMultisampleRate comments (actually supported) [ci skip] 2022-12-04 14:55:56 +03:00
Triang3l 0b4f5ef286 [SPIR-V] Decorate whole gl_PerVertex with Invariant
Block members can be decorated with Invariant only since SPIR-V 1.5 Revision 2. In earlier versions, Invariant can be used only for variables. Mesa warns about this.
2022-12-03 14:27:43 +03:00
Gliniak 1eb61aa9ab Added reccently opened titles list 2022-11-29 10:47:30 +01:00
chrisps 0674b68143
Merge pull request #96 from chrisps/host_guest_stack_synchronization
Host/Guest stack sync, exception messagebox, kernel improvements, minor opt
2022-11-27 10:30:16 -08:00
Gliniak 12005acc98 [APU] Check if splitted frame length is valid 2022-11-27 18:40:27 +01:00
chss95cs@gmail.com 90c771526d "Fix" debug console, we were checking the cvar before any cvars were loaded, and the condition it checks in AttachConsole is somehow always false
Remove dead #if 0'd code in math.h

On amd64, page_size == 4096 constant, on amd64 w/ win32, allocation_granularity == 65536. These values for x86 windows havent changed over the last 20 years so this is probably safe
and gives a modest code size reduction

Enable XE_USE_KUSER_SHARED. This sources host time from KUSER_SHARED instead of from QueryPerformanceCounter, which is far faster, but only has a granularity of 100 nanoseconds.

In some games seemingly random crashes were happening that were hard to trace because
the faulting thread was actually not the one that was misbehaving, another threads stack was underflowing into the faulting thread.

Added a bunch of code to synchronize the guest stack and host stack so that if a guest longjmps the host's stack will be adjusted.
Changes were also made to allow the guest to call into a piece of an existing x64 function.

This synchronization might have a slight performance impact on lower end cpus, to disable it set enable_host_guest_stack_synchronization to false.
It is possible it may have introduced regressions, but i dont know of any yet

So far, i know the synchronization change fixes the "hub crash" in super sonic and allows the game "london 2012" to go ingame.

Removed emit_useless_fpscr_updates, not emitting these updates breaks the raiden game

MapGuestAddressToMachineCode now returns nullptr if no address was found, instead of the start of the function

add Processor::LookupModule

Add Backend::DeinitializeBackendContext
Use WriteRegisterRangeFromRing_WithKnownBound<0, 0xFFFF> in WriteRegisterRangeFromRing for inlining (previously regressed on performance of ExecutePacketType0)

add notes about flags that trap in XamInputGetCapabilities

0 == 3 in XamInputGetCapabilities

Name arg 2 of XamInputSetState

PrefetchW in critical section kernel funcs if available & doing cmpxchg

Add terminated field to X_KTHREAD, set it on termination

Expanded the logic of NtResumeThread/NtSuspendThread to include checking the type of the handle (in release, LookupObject doesnt seem to do anything with the type)
and returning X_STATUS_OBJECT_TYPE_MISMATCH if invalid. Do termination check in NtSuspendThread.

Add basic host exception messagebox, need to flesh it out more (maybe use the new stack tracking stuff if on guest thrd?)

Add rdrand patching hack, mostly affects users with nvidia cards who have many threads on zen

Use page_size_shift in more places

Once again disable precompilation! Raiden is mostly weird ppc asm which probably breaks the precompilation. The code is still useful for running the compiler over the whole of an xex in debug to test for issues
"Fix" debug console, we were checking the cvar before any cvars were loaded, and the condition it checks in AttachConsole is somehow always false

Remove dead #if 0'd code in math.h

On amd64, page_size == 4096 constant, on amd64 w/ win32, allocation_granularity == 65536. These values for x86 windows havent changed over the last 20 years so this is probably safe
and gives a modest code size reduction

Enable XE_USE_KUSER_SHARED. This sources host time from KUSER_SHARED instead of from QueryPerformanceCounter, which is far faster, but only has a granularity of 100 nanoseconds.

In some games seemingly random crashes were happening that were hard to trace because
the faulting thread was actually not the one that was misbehaving, another threads stack was underflowing into the faulting thread.

Added a bunch of code to synchronize the guest stack and host stack so that if a guest longjmps the host's stack will be adjusted.
Changes were also made to allow the guest to call into a piece of an existing x64 function.

This synchronization might have a slight performance impact on lower end cpus, to disable it set enable_host_guest_stack_synchronization to false.
It is possible it may have introduced regressions, but i dont know of any yet

So far, i know the synchronization change fixes the "hub crash" in super sonic and allows the game "london 2012" to go ingame.

Removed emit_useless_fpscr_updates, not emitting these updates breaks the raiden game

MapGuestAddressToMachineCode now returns nullptr if no address was found, instead of the start of the function

add Processor::LookupModule

Add Backend::DeinitializeBackendContext
Use WriteRegisterRangeFromRing_WithKnownBound<0, 0xFFFF> in WriteRegisterRangeFromRing for inlining (previously regressed on performance of ExecutePacketType0)

add notes about flags that trap in XamInputGetCapabilities

0 == 3 in XamInputGetCapabilities

Name arg 2 of XamInputSetState

PrefetchW in critical section kernel funcs if available & doing cmpxchg

Add terminated field to X_KTHREAD, set it on termination

Expanded the logic of NtResumeThread/NtSuspendThread to include checking the type of the handle (in release, LookupObject doesnt seem to do anything with the type)
and returning X_STATUS_OBJECT_TYPE_MISMATCH if invalid. Do termination check in NtSuspendThread.

Add basic host exception messagebox, need to flesh it out more (maybe use the new stack tracking stuff if on guest thrd?)

Add rdrand patching hack, mostly affects users with nvidia cards who have many threads on zen

Use page_size_shift in more places

Once again disable precompilation! Raiden is mostly weird ppc asm which probably breaks the precompilation. The code is still useful for running the compiler over the whole of an xex in debug to test for issues
2022-11-27 09:39:33 -08:00
Gliniak 1451ca4266 [APU] Clear host data while reseting context 2022-11-27 17:00:31 +01:00
Gliniak 9fdfd2ada9 [APU] Removed old hack that invalidates input on decoder error
Added returning parsing error while decoder fails
2022-11-26 17:25:39 +01:00
chss95cs@gmail.com 7a17fad88a fix crash from precompiling out of range funcs, add xexcache version, increment xexcache version (all priors are version 0 thanks to 0 initialization) 2022-11-07 05:40:18 -08:00
chss95cs@gmail.com e21fd22d09 add x_kthread priority/fpu_exceptions_on fields, set fpu_exceptions_on in KeEnableFpuExceptions, set priority in SetPriority
add msr field on context
write to msr for mtmsr/mfmsr, do not have correct default value for msr yet, nor has mtmsrd been reimplemented
do not evaluate assert expressions in release at all, while still avoiding unused variable warnings
2022-11-06 11:03:10 -08:00
chss95cs@gmail.com c1d922eebf Minor decoder optimizations, kernel fixes, cpu backend fixes 2022-11-05 10:50:33 -07:00
Gliniak ba66373d8c [APU][Janky] Fixed issues with incorrect frames on streamed data
This requires a lot more research and test data!
2022-11-03 20:56:36 +01:00
Gliniak dae508500a [APU] Clear remaining packets skip when we're done with current stream
Plus some additional logging
2022-11-03 12:59:47 +01:00
Margen67 4ba14bc35e [APU+HID] Optimizations 2022-11-03 03:56:13 -07:00
Gliniak b23566b823 [APU] Fix incorrect packet frame count when frame ends exactly where packet ends
This resolves looping background sound in GoW
2022-11-03 11:14:37 +01:00
Gliniak 259679d53c [APU] Handle exceeding input offset by switching buffer
This should resolve crashes in FH
2022-11-02 08:47:36 +01:00
chrisps 8186792113
Revert "Minor decoder optimizations, kernel fixes, cpu backend fixes" 2022-11-01 14:45:36 -07:00
chrisps 781871e2d5
Merge pull request #87 from chrisps/canary_experimental
Minor decoder optimizations, kernel fixes, cpu backend fixes
2022-11-01 11:49:10 -07:00
Gliniak c080e2e17c [APU] Resolved crashes related to out of bound readouts 2022-11-01 11:24:01 +01:00
Triang3l 778333b1b5 [UI] Fix ClearInput not called in ImGuiDrawer after deferred dialog removal
Also cleanup the code involved in dialog registration, and update the explanation of why dialog removal is delayed until the end of drawing (the original was written back when window listener and UI drawer callback registration during the execution of the callbacks was deferred, but that was wrong as that might result in execution of callbacks belonging to now-deleted objects).
2022-10-31 18:57:54 +03:00
chss95cs@gmail.com 06bfd624de fix failed debug build from loops variable assert 2022-10-30 12:33:08 -07:00
chss95cs@gmail.com bff264b5fd Fixed RtlCompareString and RtlCompareStringN, they were very wrong, for CompareString the params are struct ptrs not char ptrs
Fixed a ton of clang-cl compiler warnings about unused variables, still many left. Fixed a lot of inconsistent override ones too
2022-10-30 10:47:09 -07:00
chrisps 65b9d93551
Merge branch 'xenia-canary:canary_experimental' into canary_experimental 2022-10-30 09:05:40 -07:00
chss95cs@gmail.com 550d1d0a7c use much faster exp2/cos approximations in ffmpeg, large decrease in cpu usage on my machine on decoder thread
properly byteswap r13 for spinlock
Add PPCOpcodeBits
stub out broken fpscr updating in ppc_hir_builder. it's just code that repeatedly does nothing right now.
add note about 0 opcode bytes being executed to ppc_frontend
Add assert to check that function end is greater than function start, can happen with malformed functions
Disable prefetch and cachecontrol by default, automatic hardware prefetchers already do the job for the most part
minor cleanup in simplification_pass, dont loop optimizations, let the pass manager do it for us
Add experimental "delay_via_maybeyield" cvar, which uses MaybeYield to "emulate" the db16cyc instruction
Add much faster/simpler way of directly calling guest functions, no longer have to do a byte by byte search through the generated code
Generate label string ids on the fly
Fix unused function warnings for prefetch on clang, fix many other clang warnings
Eliminated majority of CallNativeSafes by replacing them with naive generic code paths.
^ Vector rotate left, vector shift left, vector shift right, vector shift arithmetic right, and vector average are included
These naive paths are implemented small loops that stash the two inputs to the stack and load them in gprs from there, they are not particularly fast but should be an order of magnitude faster than callnativesafe
to a host function, which would involve a call, stashing all volatile registers, an indirect call, potentially setting up a stack frame for the arrays that the inputs get stashed to, the actual operations, a return, loading all volatile registers, a return, etc
Added the fast SHR_V128 path back in
Implement signed vector average byte, signed vector average word. previously we were emitting no code for them. signed vector average byte appears in many games
Fix bug with signed vector average 32, we were doing unsigned shift, turning negative values into big positive ones potentially
2022-10-30 08:48:58 -07:00
Gliniak 55877f4c61 [APU] Force buffer swap at the end of stream
Plus some debugging messages and lint fixes
2022-10-25 17:20:45 +02:00
Gliniak 6b11787c93 [APU] Fixed typo that prevented last packet in stream to be processed 2022-10-24 21:33:25 +02:00
Gliniak fac2a89d0f Disallow offset to be set before header, header size fix, audio channels crashfix 2022-10-24 19:43:43 +02:00
Triang3l a37b57ca8d [GPU] Fix tiled mip tail extent calculation
Previously, for mips, the dimensions of the texture weren't rounded to powers of two before calculating the mip tail extent, resulting in the mip tail for a 260 blocks tall texture, that contains mips ending at Y of up to 36, having the Y extent calculated as 32. With rounding to powers of two, it would have been 64.

However, with the GetTiledAddressUpperBound functions, none of this is necessary at all (and neither is rounding the extents in TextureGuestLayout::Level to 32x32x4 blocks) - using the same code for calculating the XYZ extents of tiled textures as for linear textures now, which, for the mip tail, calculates the actual maximum coordinates of the mips stored in it - and rounding to tiles is done internally by GetTiledAddressUpperBound.
2022-10-23 21:26:47 +03:00
Triang3l 74f1f6bb6d [Vulkan] Check depthClamp feature 2022-10-23 19:01:17 +03:00
Triang3l 4add1f67b1 [D3D12] Replace unused shared memory view with a null view
Fixes the PIX validation warning about missing resource states on every guest draw. Also potentially prevents drivers from making assumptions about the shared memory buffer based on the bindings, though no such cases are currently known.
2022-10-23 18:09:47 +03:00
Wunkolo 5fde7c6aa5 [x64] Add AVX512 optimizations for `PERMUTE_V128`
Uses the single-instruction AVX512 `vperm*` instructions to accelerate
the `INT8_TYPE` and `INT16_TYPE` permutation opcodes.

The `INT8_TYPE` is accelerated using `AVX512VBMI` subset of AVX512.
Available since Icelake(Intel) and Zen4(AMD).
2022-10-21 08:47:31 -05:00
Wunkolo f207239349 [x64] Add `kX64EmitAVX512VBMI` feature-flag and detection
Allows access to byte-element 2-register permutations(32-byte look up
tables) and for 64-bit multi-shifts.
Particularly adding this to accelerate the assembly of our `PERMUTE`
opcode.
2022-10-21 08:47:31 -05:00
Wunkolo d73088e5ca [x64] Add AVX512 optimization for `OPCODE_VECTOR_SUB`(saturated)
Passes the `vsubuws` and `vsubsws` unit-tests from https://github.com/xenia-project/xenia/pull/1348
2022-10-21 08:45:43 -05:00