xenia-canary

Commit Graph

Author	SHA1	Message	Date
Gliniak	ec267c348a	[LINT] Fixed lint issues after clang-format update	2024-06-13 20:56:56 +02:00
Gliniak	b3f2ab0e96	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2024-05-31 22:43:59 +02:00
Triang3l	a3304d252f	[Base/GPU] Cleanup float comparisons and NaN and -0 in clamping C++ relational operators are supposed to raise FE_INVALID if an argument is NaN, use std::isless/greater[equal] instead where they were easy to locate (though there are other places possibly, mostly min/max and clamp usage was checked). Also fixes a copy-paste error making the CPU shader interpreter execute MINs as MAXs instead.	2024-05-12 19:21:37 +03:00
Gliniak	b115823735	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2024-05-10 08:59:17 +02:00
Triang3l	e9f7a8bd48	[Vulkan] Optional functionality usage improvements Functional changes: - Enable only actually used features, as drivers may take more optimal paths when certain features are disabled. - Support VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE. - Fix the separateStencilMaskRef check doing the opposite. - Support shaderRoundingModeRTEFloat32. - Fix vkGetDeviceBufferMemoryRequirements pointer not passed to the Vulkan Memory Allocator. Stylistic changes: - Move all device extensions, properties and features to one structure, especially simplifying portability subset feature checks, and also making it easier to request new extension functionality in the future. - Remove extension suffixes from usage of promoted extensions.	2024-05-04 22:47:14 +03:00
Adrian	0fcdc12cb9	[APP] Create and Extract Zarchive packages	2024-04-11 19:39:27 +02:00
Radosław Gliński	06d7a5f0a3	[UI] Fixed incorrect characters range in ImGUI	2024-04-06 19:12:32 +02:00
Gliniak	b9061e6292	[LINT] Linted files + Added lint job to CI	2024-03-12 19:19:30 +01:00
Gliniak	5e0c67438c	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2023-04-09 17:28:04 +02:00
Gliniak	eb5da8e557	[UI] Changed default UI font to Tahoma. If it's not available use embedded font - Additionally allow user to provide own font size - Forced font scaling in notification window to be reasonable size	2023-03-11 13:07:07 +01:00
Gliniak	0ec65be5ff	[UI] Notification & Custom Font Support	2023-03-08 09:36:49 +01:00
Adrian	333d7c2767	[UI] Added build to exception message	2023-02-05 18:31:12 +01:00
Shoegzer	4a2f4d9cfe	Add include to fix compiling	2023-01-29 21:10:20 +03:00
Margen67	aea9714bd0	Make present_safe_area 100 So people stop asking why their games are cropped off.	2023-01-23 01:22:43 -08:00
Adrian	504fb9f205	Title selection & bug fixes Added title selection Fixed controller hotkeys for multiple connected controllers Fixed ImGUI dialog box stacking Added a new font for ImGUI	2023-01-22 22:49:51 +01:00
Adrian	459497f0b6	Implemented Controller Hotkeys (#111) Implemented Controller Hotkeys Added controller hotkeys Added guide button support for XInput and winkey The hotkey configurations can be found in HID -> Display controller hotkeys If the Xbox Gamebar overlay is enabled then use the Back button instead of the Guide button. - Fixed hotkey thread destruction - Fixed XINPUT_STATE by padding 4 bytes - Added hotkey vibration for user feedback - Replaced MessageBoxA with ImGuiDialog::ShowMessageBox Co-authored-by: Margen67 <Margen67@users.noreply.github.com>	2023-01-13 09:17:43 +01:00
Gliniak	26415cb8b1	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-12-31 11:19:01 +01:00
Joel Linn	da9c90835b	[ImGui] Use new key API	2022-12-28 14:16:32 -06:00
Joel Linn	f452d6a007	[UI] Fix UB (moved mem) in file picker - References to vector data become UB after vector size changes. - Add one extra level of indirection to pin the wide string memory location regardless of vector memory	2022-12-28 14:16:32 -06:00
Joel Linn	7877331d8a	[ImGui] Use ImDrawCmd::IdxOffset field `c80e8b964c` https://github.com/ocornut/imgui/issues/4845#issuecomment-1003329113	2022-12-28 14:16:32 -06:00
chss95cs@gmail.com	7d49b97e4c	Print any module name+ offset in host exception reports print thread name in host exception reports trying to force win32 error descriptions to english Return if output buffer block count is 0 in XmaContext, this is an attempt to fix a divide by zero crash many users have reported	2022-12-09 12:24:06 -08:00
chss95cs@gmail.com	a63f424c0a	Directly check PEB for IsDebuggerAttached Add constexpr getters to magicdiv class so it can be used from jitted x64/dxbc Track the guest return address as well for guest/host sync, if multiple entries have the same guest stack find the first one with a matching guest retaddr. this fixes epic mickey 2 (which the previous guest-stack change had allowed to go ingame for a bit) and potentially also a crash in fable3. Break if under debugger when stackpoints are overflowed Add much more useful output for host exceptions, print out xenia_canary.exe relative offsets if exception is in module, formatmessage for ntstatus/win32err, strerror Minor d3d12 microoptimization, instead of doing SetEventOnCompletion + WaitForSingleObject do SetEventOnCompletion w/ nullptr so that the wait happens in kernel mode, avoiding two extra context switches add unimplemented kernel functions: ExAllocatePoolWithTag ObReferenceObject ObDereferenceObject has no return value. Log a message when ObDereferenceObject/Reference receive unregistered guest kernel objects gave ObLookupThreadByThreadId its correct error status hoist object_types initialization out of ObReferenceObjectByHandle Fix out parameter values on error for a few kernel funcs add note about msr to KeSetCurrentStackPointers add X_STATUS_OBJECT_TYPE_MISMATCH check for xeNtSetEvent add msr_mask field to X_KPCR	2022-12-04 12:38:19 -08:00
chss95cs@gmail.com	90c771526d	"Fix" debug console, we were checking the cvar before any cvars were loaded, and the condition it checks in AttachConsole is somehow always false Remove dead #if 0'd code in math.h On amd64, page_size == 4096 constant, on amd64 w/ win32, allocation_granularity == 65536. These values for x86 windows havent changed over the last 20 years so this is probably safe and gives a modest code size reduction Enable XE_USE_KUSER_SHARED. This sources host time from KUSER_SHARED instead of from QueryPerformanceCounter, which is far faster, but only has a granularity of 100 nanoseconds. In some games seemingly random crashes were happening that were hard to trace because the faulting thread was actually not the one that was misbehaving, another threads stack was underflowing into the faulting thread. Added a bunch of code to synchronize the guest stack and host stack so that if a guest longjmps the host's stack will be adjusted. Changes were also made to allow the guest to call into a piece of an existing x64 function. This synchronization might have a slight performance impact on lower end cpus, to disable it set enable_host_guest_stack_synchronization to false. It is possible it may have introduced regressions, but i dont know of any yet So far, i know the synchronization change fixes the "hub crash" in super sonic and allows the game "london 2012" to go ingame. Removed emit_useless_fpscr_updates, not emitting these updates breaks the raiden game MapGuestAddressToMachineCode now returns nullptr if no address was found, instead of the start of the function add Processor::LookupModule Add Backend::DeinitializeBackendContext Use WriteRegisterRangeFromRing_WithKnownBound<0, 0xFFFF> in WriteRegisterRangeFromRing for inlining (previously regressed on performance of ExecutePacketType0) add notes about flags that trap in XamInputGetCapabilities 0 == 3 in XamInputGetCapabilities Name arg 2 of XamInputSetState PrefetchW in critical section kernel funcs if available & doing cmpxchg Add terminated field to X_KTHREAD, set it on termination Expanded the logic of NtResumeThread/NtSuspendThread to include checking the type of the handle (in release, LookupObject doesnt seem to do anything with the type) and returning X_STATUS_OBJECT_TYPE_MISMATCH if invalid. Do termination check in NtSuspendThread. Add basic host exception messagebox, need to flesh it out more (maybe use the new stack tracking stuff if on guest thrd?) Add rdrand patching hack, mostly affects users with nvidia cards who have many threads on zen Use page_size_shift in more places Once again disable precompilation! Raiden is mostly weird ppc asm which probably breaks the precompilation. The code is still useful for running the compiler over the whole of an xex in debug to test for issues	2022-11-27 09:39:33 -08:00
Triang3l	778333b1b5	[UI] Fix ClearInput not called in ImGuiDrawer after deferred dialog removal Also cleanup the code involved in dialog registration, and update the explanation of why dialog removal is delayed until the end of drawing (the original was written back when window listener and UI drawer callback registration during the execution of the callbacks was deferred, but that was wrong as that might result in execution of callbacks belonging to now-deleted objects).	2022-10-31 18:57:54 +03:00
Gliniak	d262214c1b	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-10-14 20:13:03 +02:00
Triang3l	45050b2380	[GPU] Vulkan fragment shader interlock RB and related fixes/cleanup Also fixes addressing of MSAA samples 2 and 3 for 64bpp color render targets in the ROV RB implementation on Direct3D 12. Additionally, with FSI/ROV, alpha test and alpha to coverage are done only if the render target 0 was dynamically written to (according to the Direct3D 9 rules for writing to color render targets, though not sure if they actually apply to the alpha tests on Direct3D 9, but for safety). There is also some code cleanup for things spotted during the development of the feature.	2022-10-09 22:06:41 +03:00
chss95cs@gmail.com	d8c94b1aee	Fix premake filter mistake that broke debug builds (and likely any build other than release)	2022-10-08 10:10:36 -07:00
chss95cs@gmail.com	b4c175d8a3	Enable SDL_LEAN_AND_MEAN, SDL_RENDER_DISABLED, saves about 500kb in final exe Build several projects that arent performance critical with /Os and /O1 under msvc windows	2022-09-29 07:26:38 -07:00
chss95cs@gmail.com	0fd4a2533b	Prevent clang-format from moving d3d12_nvapi above the require d3d12 headers	2022-09-11 14:35:33 -07:00
chss95cs@gmail.com	20638c2e61	use Sleep(0) instead of SwitchToThread, should waste less power and help the os with scheduling. PM4 buffer handling made a virtual member of commandprocessor, place the implementation/declaration into reusable macro files. this is probably the biggest boost here. Optimized SET_CONSTANT/ LOAD_CONSTANT pm4 ops based on the register range they start writing at, this was also a nice boost Expose X64 extension flags to code outside of x64 backend, so we can detect and use things like avx512, xop, avx2, etc in normal code Add freelists for HIR structures to try to reduce the number of last level cache misses during optimization (currently disabled... fixme later) Analyzed PGO feedback and reordered branches, uninlined functions, moved code out into different functions based on info from it in the PM4 functions, this gave like a 2% boost at best. Added support for the db16cyc opcode, which is used often in xb360 spinlocks. before it was just being translated to nop, now on x64 we translate it to _mm_pause but may change that in the future to reduce cpu time wasted texture util - all our divisors were powers of 2, instead we look up a shift. this made texture scaling slightly faster, more so on intel processors which seem to be worse at int divs. GetGuestTextureLayout is now a little faster, although it is still one of the heaviest functions in the emulator when scaling is on. xe_unlikely_mutex was not a good choice for the guest clock lock, (running theory) on intel processors another thread may take a significant time to update the clock? maybe because of the uint64 division? really not sure, but switched it to xe_mutex. This fixed audio stutter that i had introduced to 1 or 2 games, fixed performance on that n64 rare game with the monkeys. Took another crack at DMA implementation, another failure. 
Instead of passing as a parameter, keep the ringbuffer reader as the first member of commandprocessor so it can be accessed through this Added macro for noalias Applied noalias to Memory::LookupHeap. This reduced the size of the executable by 7 kb. Reworked kernel shim template, this shaved like 100kb off the exe and eliminated the indirect calls from the shim to the actual implementation. We still unconditionally generate string representations of kernel calls though :(, unless it is kHighFrequency Add nvapi extensions support, currently unused. Will use CPUVISIBLE memory at some point Inserted prefetches in a few places based on feedback from vtune. Add native implementation of SHA int8 if all elements are the same Vectorized comparisons for SetViewport, SetScissorRect Vectorized ranged comparisons for WriteRegister Add XE_MSVC_ASSUME Move FormatInfo::name out of the structure, instead look up the name in a different table. Debug related data and critical runtime data are best kept apart Templated UpdateSystemConstantValues based on ROV/RTV and primitive_polygonal Add ArchFloatMask functions, these are for storing the results of floating point comparisons without doing costly float->int pipeline transfers (vucomiss/setb) Use floatmasks in UpdateSystemConstantValues for checking if dirty, only transfer to int at end of function. Instead of dirty |= (x == y) in UpdateSystemConstantValues, now we do dirty_u32 |= (x^y). if any of them are not equal, dirty_u32 will be nz, else if theyre all equal it will be zero. 
This is more friendly to register renaming and the lack of dependencies on EFLAGS lets the compiler reorder better Add PrefetchSamplerParameters to D3D12TextureCache use PrefetchSamplerParameters in UpdateBindings to eliminate cache misses that vtune detected Add PrefetchTextureBinding to D3D12TextureCache Prefetch texture bindings to get rid of more misses vtune detected (more accesses out of order with random strides) Rewrote DMAC, still terrible though and have disabled it for now. Replace tiny memcmp of 6 U64 in render_target_cache with inline loop, msvc fails to make it a loop and instead does a thunk to their memcmp function, which is optimized for larger sizes PrefetchTextureBinding in AreActiveTextureSRVKeysUpToDate Replace memcmp calls for pipelinedescription with handwritten cmp Directly write some registers that dont have special handling in PM4 functions Changed EstimateMaxY to try to eliminate mispredictions that vtune was reporting, msvc ended up turning the changed code into a series of blends in ExecutePacketType3_EVENT_WRITE_EXT, instead of writing extents to an array on the stack and then doing xe_copy_and_swap_16 of the data to its dest, pre-swap each constant and then store those. msvc manages to unroll that into wider stores stop logging XE_SWAP every time we receive XE_SWAP, stop logging the start and end of each viz query Prefetch watch nodes in FireWatches based on feedback from vtune Removed dead code from texture_info.cc NOINLINE on GpuSwap, PGO builds did it so we should too.	2022-09-11 14:14:48 -07:00
chss95cs@gmail.com	cb85fe401c	Huge set of performance improvements, combined with an architecture specific build and clang-cl users have reported absurd gains over master for some gains, in the range 50%-90% But for normal msvc builds i would put it at around 30-50% Added per-xexmodule caching of information per instruction, can be used to remember what code needs compiling at start up Record what guest addresses wrote mmio and backpropagate that to future runs, eliminating dependence on exception trapping. this makes many games like h3 actually tolerable to run under a debugger fixed a number of errors where temporaries were being passed by reference/pointer Can now be compiled with clang-cl 14.0.1, requires -Werror off though and some other solution/project changes. Added macros wrapping compiler extensions like noinline, forceinline, __expect, and cold. Removed the "global lock" in guest code completely. It does not properly emulate the behavior of mfmsrd/mtmsr and it seriously cripples amd cpus. Removing this yielded around a 3x speedup in Halo Reach for me. Disabled the microprofiler for now. The microprofiler has a huge performance cost associated with it. Developers can re-enable it in the base/profiling header if they really need it Disable the trace writer in release builds. despite just returning after checking if the file was open the trace functions were consuming about 0.60% cpu time total Add IsValidReg, GetRegisterInfo is a huge (about 45k) branching function and using that to check if a register was valid consumed a significant chunk of time Optimized RingBuffer::ReadAndSwap and RingBuffer::read_count. This gave us the largest overall boost in performance. 
The memcpies were unnecessary and one of them was always a no-op Added simplification rules for multiplicative patterns like (x+x), (x<<1)+x For the most frequently called win32 functions i added code to call their underlying NT implementations, which lets us skip a lot of MS code we don't care about/isnt relevant to our usecases ^this can be toggled off in the platform_win header handle indirect call true with constant function pointer, was occurring in h3 lookup host format swizzle in denser array by default, don't check if a gpu register is unknown, instead just check if its out of range. controlled by a cvar ^looking up whether its known or not took approx 0.3% cpu time Changed some things in /cpu to make the project UNITYBUILD friendly The timer thread was spinning way too much and consuming a ton of cpu, changed it to use a blocking wait instead tagged some conditions as XE_UNLIKELY/LIKELY based on profiler feedback (will only affect clang builds) Shifted around some code in CommandProcessor::WriteRegister based on how frequently it was executed added support for docdecaduple precision floating point so that we can represent our performance gains numerically tons of other stuff im probably forgetting	2022-08-13 12:59:00 -07:00
Gliniak	5d1b641197	[Emulator] Added possibility to install multiple packages at once	2022-07-30 15:52:41 +02:00
Triang3l	373b143049	[Base] Cvars from Android Bundle/Intent	2022-07-16 13:13:08 +03:00
Triang3l	7b8281aee0	[UI] Android ImGui touch and mouse input	2022-07-14 21:13:40 +03:00
Triang3l	037310f8dc	[Android] Unified xenia-app with windowed apps and build prerequisites	2022-07-11 21:45:57 +03:00
Triang3l	b3edc56576	[Vulkan] Merge texture and sampler descriptors into a single descriptor set Put all descriptors used by translated shaders in up to 4 descriptor sets, which is the minimum required, and the most common on Android, `maxBoundDescriptorSets` device limit value	2022-07-09 17:10:28 +03:00
Triang3l	83e9984539	[Vulkan] Remove required feature checks Fallbacks for those will be added more or less soon, the stable version won't hard-require anything beyond 1.0 and the portability subset	2022-07-03 20:54:34 +03:00
Triang3l	f7ef051025	[Vulkan] Disable validation by default	2022-07-03 19:42:22 +03:00
Triang3l	001f64852c	[Vulkan] VMA for textures	2022-07-03 19:40:48 +03:00
Triang3l	ad1ef84145	Merge branch 'master' into vulkan	2022-07-01 19:53:08 +03:00
Triang3l	e37e3ef382	[GPU] Display swap output in the trace viewer Resolve output is unreliable because resolving may be done to a subregion of a texture and even to 3D textures, and to any color format	2022-07-01 19:50:19 +03:00
Triang3l	d174762a40	Merge branch 'master' into vulkan	2022-07-01 12:51:34 +03:00
Triang3l	28670d8ec2	[UI] Presenter: Rename display size to aspect ratio	2022-07-01 12:50:45 +03:00
Triang3l	fdcbf67623	[Vulkan] Enable VK_KHR_sampler_ycbcr_conversion	2022-06-25 15:46:02 +03:00
Triang3l	4b4205ba00	[Vulkan] Frontbuffer presentation	2022-06-25 14:33:43 +03:00
Triang3l	3fc7d8753c	Merge branch 'master' into vulkan	2022-06-24 23:38:04 +03:00
Triang3l	f4a634c617	[XeSL] xesl_writeStore > xesl_Store	2022-06-24 23:37:29 +03:00
Triang3l	7a4732e14f	[GPU] XeSL swap shaders	2022-06-24 23:24:30 +03:00
Triang3l	61c4c49d76	Merge branch 'master' into vulkan	2022-06-20 12:34:41 +03:00
Triang3l	3b4845511d	[Vulkan] Don't require an explicit uint64_t cast for SetDeviceObjectName	2022-06-20 12:25:52 +03:00
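
Commit a3304d252f above replaces relational operators with std::isless/isgreater[equal] because `<`, `>`, `<=` and `>=` are signaling comparisons that may raise FE_INVALID on a NaN operand. The following is a minimal sketch of that idea, not the code from the commit: `ClampQuiet` is a hypothetical helper name, and mapping a NaN input to the lower bound is an assumption made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper (not a Xenia function): clamps x to [lo, hi] using the
// quiet comparison functions from <cmath>, which do not raise FE_INVALID
// when an argument is a quiet NaN.
inline float ClampQuiet(float x, float lo, float hi) {
  // Precondition: the bounds are ordered and neither is NaN.
  assert(std::islessequal(lo, hi));
  // isgreaterequal returns false when x is NaN, so a NaN input maps to lo
  // (an assumption for this sketch; other policies are possible).
  if (!std::isgreaterequal(x, lo)) return lo;
  if (std::isgreater(x, hi)) return hi;
  return x;
}
```

With these definitions, `ClampQuiet(3.0f, 0.0f, 1.0f)` yields `1.0f`, and a quiet NaN input yields the lower bound without going through a signaling comparison.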

603 Commits
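
Commit 20638c2e61 above describes accumulating a dirty flag with XOR (`dirty_u32 |= (x^y)`) instead of per-field comparisons, so that the result is nonzero if any field differs. The sketch below illustrates that pattern on plain 32-bit integers. The `Constants` struct and both function names are hypothetical, and the comparison variant is written with `!=` on the assumption that the commit message means inequality.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants block; not Xenia's actual system constants layout.
struct Constants {
  uint32_t a;
  uint32_t b;
  uint32_t c;
};

// Comparison-based version: each field test produces a boolean, which on x86
// typically goes through the flags register.
inline bool DiffersCompare(Constants x, Constants y) {
  bool dirty = false;
  dirty |= (x.a != y.a);
  dirty |= (x.b != y.b);
  dirty |= (x.c != y.c);
  return dirty;
}

// XOR-based version: x ^ y is zero only when the two values are equal, so
// OR-ing the XORs together leaves a nonzero value if any field differs.
// The conversion to bool happens once, at the end.
inline bool DiffersXor(Constants x, Constants y) {
  uint32_t dirty_u32 = 0;
  dirty_u32 |= (x.a ^ y.a);
  dirty_u32 |= (x.b ^ y.b);
  dirty_u32 |= (x.c ^ y.c);
  return dirty_u32 != 0;
}
```

Both functions return the same result for any pair of inputs. Whether the XOR form is actually faster depends on the compiler and target, which this sketch does not measure.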