xenia-canary

Commit Graph

Author	SHA1	Message	Date
none	d8aa14da73	Small fixes for better cross-platform compatibility (#200) * Add ifdef check before the Microsoft-specific movsq in memory.cc * Added ifdef before Microsoft-specific movsq and replaced with memcpy in other cases. In memory.cc * Update image_sha_bytes_ from 16 to 20 xex_module.h The value 16 is less than the expected value 20, causing a buffer overflow during sha1 finalization. * Update image_sha_bytes_ loop from 16 to 20 iterations xex_module.cc * Update mapped_memory_posix.cc: Must resize file to map_length. * Should not map nullptr with MAP_FIXED flag. Update memory_posix.cc.	2023-10-21 17:07:29 +02:00
chss95cs@gmail.com	90c771526d	"Fix" debug console, we were checking the cvar before any cvars were loaded, and the condition it checks in AttachConsole is somehow always false Remove dead #if 0'd code in math.h On amd64, page_size == 4096 constant, on amd64 w/ win32, allocation_granularity == 65536. These values for x86 windows havent changed over the last 20 years so this is probably safe and gives a modest code size reduction Enable XE_USE_KUSER_SHARED. This sources host time from KUSER_SHARED instead of from QueryPerformanceCounter, which is far faster, but only has a granularity of 100 nanoseconds. In some games seemingly random crashes were happening that were hard to trace because the faulting thread was actually not the one that was misbehaving, another threads stack was underflowing into the faulting thread. Added a bunch of code to synchronize the guest stack and host stack so that if a guest longjmps the host's stack will be adjusted. Changes were also made to allow the guest to call into a piece of an existing x64 function. This synchronization might have a slight performance impact on lower end cpus, to disable it set enable_host_guest_stack_synchronization to false. It is possible it may have introduced regressions, but i dont know of any yet So far, i know the synchronization change fixes the "hub crash" in super sonic and allows the game "london 2012" to go ingame. 
Removed emit_useless_fpscr_updates, not emitting these updates breaks the raiden game MapGuestAddressToMachineCode now returns nullptr if no address was found, instead of the start of the function add Processor::LookupModule Add Backend::DeinitializeBackendContext Use WriteRegisterRangeFromRing_WithKnownBound<0, 0xFFFF> in WriteRegisterRangeFromRing for inlining (previously regressed on performance of ExecutePacketType0) add notes about flags that trap in XamInputGetCapabilities 0 == 3 in XamInputGetCapabilities Name arg 2 of XamInputSetState PrefetchW in critical section kernel funcs if available & doing cmpxchg Add terminated field to X_KTHREAD, set it on termination Expanded the logic of NtResumeThread/NtSuspendThread to include checking the type of the handle (in release, LookupObject doesnt seem to do anything with the type) and returning X_STATUS_OBJECT_TYPE_MISMATCH if invalid. Do termination check in NtSuspendThread. Add basic host exception messagebox, need to flesh it out more (maybe use the new stack tracking stuff if on guest thrd?) Add rdrand patching hack, mostly affects users with nvidia cards who have many threads on zen Use page_size_shift in more places Once again disable precompilation! Raiden is mostly weird ppc asm which probably breaks the precompilation. The code is still useful for running the compiler over the whole of an xex in debug to test for issues	2022-11-27 09:39:33 -08:00
chss95cs@gmail.com	7a17fad88a	fix crash from precompiling out of range funcs, add xexcache version, increment xexcache version (all priors are version 0 thanks to 0 initialization)	2022-11-07 05:40:18 -08:00
chss95cs@gmail.com	c1d922eebf	Minor decoder optimizations, kernel fixes, cpu backend fixes	2022-11-05 10:50:33 -07:00
chrisps	8186792113	Revert "Minor decoder optimizations, kernel fixes, cpu backend fixes"	2022-11-01 14:45:36 -07:00
chss95cs@gmail.com	550d1d0a7c	use much faster exp2/cos approximations in ffmpeg, large decrease in cpu usage on my machine on decoder thread properly byteswap r13 for spinlock Add PPCOpcodeBits stub out broken fpscr updating in ppc_hir_builder. it's just code that repeatedly does nothing right now. add note about 0 opcode bytes being executed to ppc_frontend Add assert to check that function end is greater than function start, can happen with malformed functions Disable prefetch and cachecontrol by default, automatic hardware prefetchers already do the job for the most part minor cleanup in simplification_pass, dont loop optimizations, let the pass manager do it for us Add experimental "delay_via_maybeyield" cvar, which uses MaybeYield to "emulate" the db16cyc instruction Add much faster/simpler way of directly calling guest functions, no longer have to do a byte by byte search through the generated code Generate label string ids on the fly Fix unused function warnings for prefetch on clang, fix many other clang warnings Eliminated majority of CallNativeSafes by replacing them with naive generic code paths. ^ Vector rotate left, vector shift left, vector shift right, vector shift arithmetic right, and vector average are included These naive paths are implemented small loops that stash the two inputs to the stack and load them in gprs from there, they are not particularly fast but should be an order of magnitude faster than callnativesafe to a host function, which would involve a call, stashing all volatile registers, an indirect call, potentially setting up a stack frame for the arrays that the inputs get stashed to, the actual operations, a return, loading all volatile registers, a return, etc Added the fast SHR_V128 path back in Implement signed vector average byte, signed vector average word. previously we were emitting no code for them. 
signed vector average byte appears in many games Fix bug with signed vector average 32, we were doing unsigned shift, turning negative values into big positive ones potentially	2022-10-30 08:48:58 -07:00
chss95cs@gmail.com	cb85fe401c	Huge set of performance improvements, combined with an architecture specific build and clang-cl users have reported absurd gains over master for some gains, in the range 50%-90% But for normal msvc builds i would put it at around 30-50% Added per-xexmodule caching of information per instruction, can be used to remember what code needs compiling at start up Record what guest addresses wrote mmio and backpropagate that to future runs, eliminating dependence on exception trapping. this makes many games like h3 actually tolerable to run under a debugger fixed a number of errors where temporaries were being passed by reference/pointer Can now be compiled with clang-cl 14.0.1, requires -Werror off though and some other solution/project changes. Added macros wrapping compiler extensions like noinline, forceinline, __expect, and cold. Removed the "global lock" in guest code completely. It does not properly emulate the behavior of mfmsrd/mtmsr and it seriously cripples amd cpus. Removing this yielded around a 3x speedup in Halo Reach for me. Disabled the microprofiler for now. The microprofiler has a huge performance cost associated with it. Developers can re-enable it in the base/profiling header if they really need it Disable the trace writer in release builds. despite just returning after checking if the file was open the trace functions were consuming about 0.60% cpu time total Add IsValidReg, GetRegisterInfo is a huge (about 45k) branching function and using that to check if a register was valid consumed a significant chunk of time Optimized RingBuffer::ReadAndSwap and RingBuffer::read_count. This gave us the largest overall boost in performance. 
The memcpies were unnecessary and one of them was always a no-op Added simplification rules for multiplicative patterns like (x+x), (x<<1)+x For the most frequently called win32 functions i added code to call their underlying NT implementations, which lets us skip a lot of MS code we don't care about/isnt relevant to our usecases ^this can be toggled off in the platform_win header handle indirect call true with constant function pointer, was occurring in h3 lookup host format swizzle in denser array by default, don't check if a gpu register is unknown, instead just check if its out of range. controlled by a cvar ^looking up whether its known or not took approx 0.3% cpu time Changed some things in /cpu to make the project UNITYBUILD friendly The timer thread was spinning way too much and consuming a ton of cpu, changed it to use a blocking wait instead tagged some conditions as XE_UNLIKELY/LIKELY based on profiler feedback (will only affect clang builds) Shifted around some code in CommandProcessor::WriteRegister based on how frequently it was executed added support for docdecaduple precision floating point so that we can represent our performance gains numerically tons of other stuff im probably forgetting	2022-08-13 12:59:00 -07:00
emoose	c889a8af3f	[CPU] Load alt-title-ids XEX header into XexModule::opt_alternate_title_ids_	2021-06-25 23:48:25 -05:00
Joel Linn	a86d7173e1	Refactor FourCC magic uses - Use new fourcc_t type - Improves compiler compatibility by removing multi chars	2021-06-02 22:28:43 -05:00
emoose	bb7c5b8266	[CPU/XEX] Move SecurityInfo conversion code to ReadSecurityInfo & call that during ApplyPatch	2021-01-31 23:18:54 -06:00
gibbed	5bf0b34445	C++17ification. C++17ification! - Filesystem interaction now uses std::filesystem::path. - Usage of const char*, std::string have been changed to std::string_view where appropriate. - Usage of printf-style functions changed to use fmt.	2020-04-07 16:09:41 -05:00
aerosoul	bc8b629092	[Kernel] Enable XEX1 loading	2019-11-20 18:09:28 -06:00
emoose	bbb5c938ec	[CPU] Fix XexModule::FindSaveRest not finding functions properly	2018-11-01 15:50:56 +00:00
Dr. Chat	7443b7e61f	[CPU] Rename ImportLibrary fields to follow naming conventions	2018-10-28 09:41:31 -05:00
emoose	265903fe66	[CPU] Add XEXP support to XexModule, if XEXP is in same folder as XEX This was a headache to work out, big thanks to the lack of documentation on .xexp files... a ton of guesswork was involved here but luckily it turned out well. I did have to make some pretty major changes to the way XEX files are loaded though. Previously it'd just load everything in one go: XEX headers -> decrypt/decompress data -> load imports/symbols -> set loader data table entries, etc... Now it's changed to something like this: - Load base XEX headers + decrypted/decompressed image data, return X_STATUS_PENDING - In the LoadFromFile call used to load the XEX, search for XEXP patch file (only .xexp in same folder atm) - If patch exists: load XEXP, decrypt headers/data, apply patch to base XEX, dispose of XEXP - Finish XEX load via LoadXexContinue() (handles imports/symbols/loader data...) This saves us from needing to reset the imports/function/symbol stuff after patching (since all the XEX code will be a lot different), but I'm not really sure if I went about it the best way.	2018-10-20 04:36:21 +01:00
emoose	0b7f7e1657	[CPU] Move XEX2 code into XexModule class, autodetect XEX key Code is mainly just copy/pasted from kernel/util/xex2.cc, I've tried fixing it up to work better in a class, but there's probably some things I missed. Also includes some minor improvements to the XEX loader, like being able to try both XEX keys (retail/devkit) automatically, and some fixes to how the base address is determined. (Previously there was code that would get base address from optional header, code that'd get it from xex_security_info, code that'd use a stored base address value... Now everything reads it from a single stored value instead, which is set either from the xex_security_info, or if it exists from the optional header instead. Maybe this can help improve compatibility with any weird XEX's that don't have a base address optional header?) Compressed XEX loader also has some extra checks to make sure the compressed data hash matches what's expected. Might increase loading times by a fraction, but could save reports from people unknowingly using corrupt XEXs. (still no checks for non-compressed data though, maybe need to compare data with xex_security_info->ImageHash?)	2018-10-20 04:18:18 +01:00
Dr. Chat	a148b965f1	KernelState should handle module launching	2016-10-24 11:01:10 -05:00
Ben Vanik	5e08889d93	More style cleanup.	2015-08-06 20:17:01 -07:00
Ben Vanik	eaa1a8ee3a	Refactoring SymbolInfo/FunctionInfo/Function into Symbol/Function.	2015-08-05 21:50:02 -07:00
Ben Vanik	12a29371e3	Clang fixes.	2015-07-19 18:32:48 -07:00
Dr. Chat	f9977a25af	Use std::vector to hold the xex header instead of new/delete	2015-07-06 19:45:10 -05:00
Dr. Chat	0388d17a72	Formatting	2015-07-06 10:57:32 -05:00
Dr. Chat	93f24d2047	XexModule keep track of whether it's loaded into memory or not	2015-07-06 10:40:35 -05:00
Dr. Chat	7f53b1d630	Allow unloading of user modules	2015-07-05 14:03:00 -05:00
Dr. Chat	0211135fd6	Fix potential corruption for GetOptHeader	2015-07-03 10:41:43 -05:00
Dr. Chat	362a521c79	Rewrite XexModule to drop dependency on old xex2 headers for imports	2015-07-03 08:17:23 -05:00
Dr. Chat	029babaf5d	Drop dependency on old-style xex2 headers	2015-07-03 08:15:53 -05:00
Dr. Chat	fe87c08424	Shuffle some code around.	2015-07-03 08:15:53 -05:00
Ben Vanik	fb1f4906d9	xb format --all (we are now format clean). Buildbot will yell at you.	2015-06-22 22:26:51 -07:00
Ben Vanik	08770a4ec0	Mass renaming. I love clang-format.	2015-05-31 16:58:12 -07:00
Dr. Chat	589e672d20	XexModule: Resolve user library imports	2015-05-18 01:31:59 -05:00
Ben Vanik	ade5388728	bool-ifying xe::cpu	2015-05-05 18:52:54 -07:00
Ben Vanik	78921c1a7e	Merging Runtime into Processor.	2015-05-03 22:28:25 -07:00
Ben Vanik	30f7effa73	Code cleanup: removing common.h	2015-05-02 01:25:59 -07:00
Ben Vanik	281abea955	Converting addresses in xe::cpu to 32bit.	2015-03-24 19:41:29 -07:00
Ben Vanik	9281d62106	Moving cpu/runtime/ to cpu/.	2015-03-24 08:25:58 -07:00
Ben Vanik	29912f44c0	Moving alloy/ into xenia/cpu/ to start simplifying things.	2015-03-24 07:46:18 -07:00
Ben Vanik	00e4a4fe1b	Fix #include format.	2015-01-31 22:49:47 -08:00
Ben Vanik	a0eebf8898	Removing old run loop/ref/core/etc.	2014-12-31 19:26:51 -08:00
Ben Vanik	9437d0b564	Sprucing up some of alloy.	2014-07-13 21:15:37 -07:00
Ben Vanik	4d92720109	Moving all kernel files around just to fuck with whoever's keeping track ;)	2014-01-04 17:12:46 -08:00
Ben Vanik	dcd9f8b6ff	Module info in json.	2013-12-24 17:25:29 -08:00
Ben Vanik	fdb6a5cfa3	Initial Alloy implementation. This is a regression in functionality and performance, but a much better foundation for the future of the project (I think). It can run basic apps under an SSA interpreter but doesn't support some of the features required to do real 360 apps yet.	2013-12-06 22:57:16 -08:00

43 Commits