* Added controller hotkeys setting
Added option to disable controller hotkeys
Minor Changes
* Fixed locked input system
The input system lock should be released even if a controller is not connected.
Implemented Controller Hotkeys
Added controller hotkeys
Added guide button support for XInput and winkey
The hotkey configurations can be found in HID -> Display controller hotkeys
If the Xbox Gamebar overlay is enabled then use the Back button instead of the Guide button.
- Fixed hotkey thread destruction
- Fixed XINPUT_STATE by padding 4 bytes
- Added hotkey vibration for user feedback
- Replaced MessageBoxA with ImGuiDialog::ShowMessageBox
Co-authored-by: Margen67 <Margen67@users.noreply.github.com>
remove useless memorybarrier
remove double membarrier in wait pm4 cmd
add int64 cvar
use int64 cvar for x64 feature mask
Rework some functions that were frontend bound according to vtune placing some of their code in different noinline functions, profiling after indicating l1 cache misses decreased and perf of func increased
remove long vpinsrd dep chain code for conversion.h, instead do normal load+bswap or movbe if avail
Much faster entry table via split_map, code size could be improved though
GetResolveInfo was very large and had impact on icache, mark callees as noinline + msvc pragma optimize small
use log2 shifts instead of integer divides in memory
minor optimizations in PhysicalHeap::EnableAccessCallbacks, the majority of time in the function is spent looping, NOT calling Protect! Someone should optimize this function and rework the algo completely
remove wonky scheduling log message, it was spammy and unhelpful
lock count was unnecessary for criticalsection mutex, criticalsection is already a recursive mutex
brief notes i gotta run
Proper handling of nans for VMX max/min on x64 (minps/maxps has special behavior depending on the operand order that vmx does not have for vminfp/vmaxfp)
Add extremely unintrusive guest code profiler utilizing KUSER_SHARED systemtime. This profiler is disabled on platforms other than windows, and on windows is disabled by default by a cvar
Repurpose GUEST_SCRATCH64 stack offset to instead be for storing guest function profile times, define GUEST_SCRATCH as 0 instead, since thats already meant to be a scratch area
Fix xenia silently closing on config errors/other fatal errors by setting has_console_attached_'s default to false
Add alternative code path for guest clock that uses kusershared systemtime instead of QueryPerformanceCounter. This is way faster and I have tested it and found it to be working, but i have disabled it because i do not know how well it works on wine or on processors other than mine
Significantly reduce log spam by setting XELOGAPU and XELOGGPU to be LogLevel::Debug
Changed some LOGI to LOGD in places to reduce log spam
Mark VdSwap as kHighFrequency, it was spamming up logs
Make logging calls less intrusive for the caller by forcing the test of log level inline and moving the format/AppendLogLine stuff to an outlined cold function
Add swcache namespace for software cache operations like prefetches, streaming stores and streaming loads.
Add XE_MSVC_REORDER_BARRIER for preventing msvc from propagating a value too close to its store or from its load
Add xe_unlikely_mutex for locks we know have very little contention
add XE_HOST_CACHE_LINE_SIZE and XE_RESTRICT to platform.h
Microoptimization: Changed most uses of size_t to ring_size_t in RingBuffer, this reduces the size of the inlined ringbuffer operations slightly by eliminating rex prefixes, depending on register allocation
Add BeginPrefetchedRead to ringbuffer, which prefetches the second range if there is one according to the provided PrefetchTag
added inline_loadclock cvar, which will directly use the value of the guest clock from clock.cc in jitted guest code. off by default
change uses of GUEST_SCRATCH64 to GUEST_SCRATCH
Add fast vectorized xenos_half_to_float/xenos_float_to_half (currently resides in x64_seq_vector, move to gpu code maybe at some point)
Add fast x64 codegen for PackFloat16_4/UnpackFloat16_4. Same code can be used for Float16_2 in future commit. This should speed up some games that use these functions heavily
Remove cvar for toggling old float16 behavior
Add VRSAVE register, support mfspr/mtspr vrsave
Add cvar for toggling off codegen for trap instructions and set it to true by default.
Add specialized methods to CommandProcessor: WriteRegistersFromMem, WriteRegisterRangeFromRing, and WriteOneRegisterFromRing. These reduce the overall cost of WriteRegister
Use a fixed size vmem vector for upload ranges, realloc/memsetting on resize in the inner loop of requestranges was showing up on the profiler (the search in requestranges itself needs work)
Rename fixed_vmem_vector to better fit xenia's naming convention
Only log unknown register writes in WriteRegister if DEBUG :/. We're stuck on MSVC with c++17 so we have no way of influencing the branch ordering for that function without profile guided optimization
Remove binding stride assert in shader_translator.cc, triangle told me its leftover ogl stuff
Mark xe::FatalError as noreturn
If a controller is not connected, delay by 1.1 seconds before checking if it has been reconnected. Asking Xinput about a controller slot that is unused is extremely slow, and XinputGetState/SetState were taking up
an enormous amount of time in profiles. this may have caused a bit of input lag
Protect accesses to input_system with a lock
Add proper handling for user_index>= 4 in XamInputGetState/SetState, properly return zeroed state in GetState
Add missing argument to NtQueryVirtualMemory_entry
Fixed RtlCompareMemoryUlong_entry, it actually does not care if the source is misaligned, and for length it aligns down
Fixed RtlUpperChar and RtlLowerChar, added a table that has their correct return values precomputed
[GPU] Add FXAA post-processing
[UI] Add FidelityFX FSR and CAS post-processing
[UI] Add blue noise dithering from 10bpc to 8bpc
[GPU] Apply the DC PWL gamma ramp closer to the spec, supporting fully white color
[UI] Allow the GPU CP thread to present on the host directly, bypassing the UI thread OS paint event
[UI] Allow variable refresh rate (or tearing)
[UI] Present the newest frame (restart) on DXGI
[UI] Replace GraphicsContext with a far more advanced Presenter with more coherent surface connection and UI overlay state management
[UI] Connect presentation to windows via the Surface class, not native window handles
[Vulkan] Switch to simpler Vulkan setup with no instance/device separation due to interdependencies and to pass fewer objects around
[Vulkan] Lower the minimum required Vulkan version to 1.0
[UI/GPU] Various cleanup, mainly ComPtr usage
[UI] Support per-monitor DPI awareness v2 on Windows
[UI] DPI-scale Dear ImGui
[UI] Replace the remaining non-detachable window delegates with unified window event and input listeners
[UI] Allow listeners to safely destroy or close the window, and to register/unregister listeners without use-after-free and the ABA problem
[UI] Explicit Z ordering of input listeners and UI overlays, top-down for input, bottom-up for drawing
[UI] Add explicit window lifecycle phases
[UI] Replace Window virtual functions with explicit desired state, its application, actual state, its feedback
[UI] GTK: Apply the initial size to the drawing area
[UI] Limit internal UI frame rate to that of the monitor
[UI] Hide the cursor using a timer instead of polling due to no repeated UI thread paints with GPU CP thread presentation, and only within the window