Commit Graph

43 Commits

Author SHA1 Message Date
none d8aa14da73
Small fixes for better cross-platform compatibility (#200)
* Add ifdef check before the Microsoft-specific movsq in memory.cc

* Added ifdef before Microsoft-specific movsq and replaced with memcpy in other cases. In memory.cc

* Update image_sha_bytes_ from 16 to 20  xex_module.h

The value 16 is less than the expected value 20, causing a buffer overflow during sha1 finalization.

* Update image_sha_bytes_ loop from 16 to 20 iterations xex_module.cc

* Update mapped_memory_posix.cc: Must resize file to map_length.

* Should not map nullptr with MAP_FIXED flag. Update memory_posix.cc.
2023-10-21 17:07:29 +02:00
chss95cs@gmail.com 90c771526d "Fix" debug console, we were checking the cvar before any cvars were loaded, and the condition it checks in AttachConsole is somehow always false
Remove dead #if 0'd code in math.h

On amd64, page_size == 4096 constant, on amd64 w/ win32, allocation_granularity == 65536. These values for x86 windows havent changed over the last 20 years so this is probably safe
and gives a modest code size reduction

Enable XE_USE_KUSER_SHARED. This sources host time from KUSER_SHARED instead of from QueryPerformanceCounter, which is far faster, but only has a granularity of 100 nanoseconds.

In some games seemingly random crashes were happening that were hard to trace because
the faulting thread was actually not the one that was misbehaving, another threads stack was underflowing into the faulting thread.

Added a bunch of code to synchronize the guest stack and host stack so that if a guest longjmps the host's stack will be adjusted.
Changes were also made to allow the guest to call into a piece of an existing x64 function.

This synchronization might have a slight performance impact on lower end cpus, to disable it set enable_host_guest_stack_synchronization to false.
It is possible it may have introduced regressions, but i dont know of any yet

So far, i know the synchronization change fixes the "hub crash" in super sonic and allows the game "london 2012" to go ingame.

Removed emit_useless_fpscr_updates, not emitting these updates breaks the raiden game

MapGuestAddressToMachineCode now returns nullptr if no address was found, instead of the start of the function

add Processor::LookupModule

Add Backend::DeinitializeBackendContext
Use WriteRegisterRangeFromRing_WithKnownBound<0, 0xFFFF> in WriteRegisterRangeFromRing for inlining (previously regressed on performance of ExecutePacketType0)

add notes about flags that trap in XamInputGetCapabilities

0 == 3 in XamInputGetCapabilities

Name arg 2 of XamInputSetState

PrefetchW in critical section kernel funcs if available & doing cmpxchg

Add terminated field to X_KTHREAD, set it on termination

Expanded the logic of NtResumeThread/NtSuspendThread to include checking the type of the handle (in release, LookupObject doesnt seem to do anything with the type)
and returning X_STATUS_OBJECT_TYPE_MISMATCH if invalid. Do termination check in NtSuspendThread.

Add basic host exception messagebox, need to flesh it out more (maybe use the new stack tracking stuff if on guest thrd?)

Add rdrand patching hack, mostly affects users with nvidia cards who have many threads on zen

Use page_size_shift in more places

Once again disable precompilation! Raiden is mostly weird ppc asm which probably breaks the precompilation. The code is still useful for running the compiler over the whole of an xex in debug to test for issues
"Fix" debug console, we were checking the cvar before any cvars were loaded, and the condition it checks in AttachConsole is somehow always false

Remove dead #if 0'd code in math.h

On amd64, page_size == 4096 constant, on amd64 w/ win32, allocation_granularity == 65536. These values for x86 windows havent changed over the last 20 years so this is probably safe
and gives a modest code size reduction

Enable XE_USE_KUSER_SHARED. This sources host time from KUSER_SHARED instead of from QueryPerformanceCounter, which is far faster, but only has a granularity of 100 nanoseconds.

In some games seemingly random crashes were happening that were hard to trace because
the faulting thread was actually not the one that was misbehaving, another threads stack was underflowing into the faulting thread.

Added a bunch of code to synchronize the guest stack and host stack so that if a guest longjmps the host's stack will be adjusted.
Changes were also made to allow the guest to call into a piece of an existing x64 function.

This synchronization might have a slight performance impact on lower end cpus, to disable it set enable_host_guest_stack_synchronization to false.
It is possible it may have introduced regressions, but i dont know of any yet

So far, i know the synchronization change fixes the "hub crash" in super sonic and allows the game "london 2012" to go ingame.

Removed emit_useless_fpscr_updates, not emitting these updates breaks the raiden game

MapGuestAddressToMachineCode now returns nullptr if no address was found, instead of the start of the function

add Processor::LookupModule

Add Backend::DeinitializeBackendContext
Use WriteRegisterRangeFromRing_WithKnownBound<0, 0xFFFF> in WriteRegisterRangeFromRing for inlining (previously regressed on performance of ExecutePacketType0)

add notes about flags that trap in XamInputGetCapabilities

0 == 3 in XamInputGetCapabilities

Name arg 2 of XamInputSetState

PrefetchW in critical section kernel funcs if available & doing cmpxchg

Add terminated field to X_KTHREAD, set it on termination

Expanded the logic of NtResumeThread/NtSuspendThread to include checking the type of the handle (in release, LookupObject doesnt seem to do anything with the type)
and returning X_STATUS_OBJECT_TYPE_MISMATCH if invalid. Do termination check in NtSuspendThread.

Add basic host exception messagebox, need to flesh it out more (maybe use the new stack tracking stuff if on guest thrd?)

Add rdrand patching hack, mostly affects users with nvidia cards who have many threads on zen

Use page_size_shift in more places

Once again disable precompilation! Raiden is mostly weird ppc asm which probably breaks the precompilation. The code is still useful for running the compiler over the whole of an xex in debug to test for issues
2022-11-27 09:39:33 -08:00
chss95cs@gmail.com 7a17fad88a fix crash from precompiling out of range funcs, add xexcache version, increment xexcache version (all priors are version 0 thanks to 0 initialization) 2022-11-07 05:40:18 -08:00
chss95cs@gmail.com c1d922eebf Minor decoder optimizations, kernel fixes, cpu backend fixes 2022-11-05 10:50:33 -07:00
chrisps 8186792113
Revert "Minor decoder optimizations, kernel fixes, cpu backend fixes" 2022-11-01 14:45:36 -07:00
chss95cs@gmail.com 550d1d0a7c use much faster exp2/cos approximations in ffmpeg, large decrease in cpu usage on my machine on decoder thread
properly byteswap r13 for spinlock
Add PPCOpcodeBits
stub out broken fpscr updating in ppc_hir_builder. it's just code that repeatedly does nothing right now.
add note about 0 opcode bytes being executed to ppc_frontend
Add assert to check that function end is greater than function start, can happen with malformed functions
Disable prefetch and cachecontrol by default, automatic hardware prefetchers already do the job for the most part
minor cleanup in simplification_pass, dont loop optimizations, let the pass manager do it for us
Add experimental "delay_via_maybeyield" cvar, which uses MaybeYield to "emulate" the db16cyc instruction
Add much faster/simpler way of directly calling guest functions, no longer have to do a byte by byte search through the generated code
Generate label string ids on the fly
Fix unused function warnings for prefetch on clang, fix many other clang warnings
Eliminated majority of CallNativeSafes by replacing them with naive generic code paths.
^ Vector rotate left, vector shift left, vector shift right, vector shift arithmetic right, and vector average are included
These naive paths are implemented small loops that stash the two inputs to the stack and load them in gprs from there, they are not particularly fast but should be an order of magnitude faster than callnativesafe
to a host function, which would involve a call, stashing all volatile registers, an indirect call, potentially setting up a stack frame for the arrays that the inputs get stashed to, the actual operations, a return, loading all volatile registers, a return, etc
Added the fast SHR_V128 path back in
Implement signed vector average byte, signed vector average word. previously we were emitting no code for them. signed vector average byte appears in many games
Fix bug with signed vector average 32, we were doing unsigned shift, turning negative values into big positive ones potentially
2022-10-30 08:48:58 -07:00
chss95cs@gmail.com cb85fe401c Huge set of performance improvements, combined with an architecture specific build and clang-cl users have reported absurd gains over master for some gains, in the range 50%-90%
But for normal msvc builds i would put it at around 30-50%
Added per-xexmodule caching of information per instruction, can be used to remember what code needs compiling at start up
Record what guest addresses wrote mmio and backpropagate that to future runs, eliminating dependence on exception trapping. this makes many games like h3 actually tolerable to run under a debugger
fixed a number of errors where temporaries were being passed by reference/pointer
Can now be compiled with clang-cl 14.0.1, requires -Werror off though and some other solution/project changes.
Added macros wrapping compiler extensions like noinline, forceinline, __expect, and cold.
Removed the "global lock" in guest code completely. It does not properly emulate the behavior of mfmsrd/mtmsr and it seriously cripples amd cpus. Removing this yielded around a 3x speedup in Halo Reach for me.
Disabled the microprofiler for now. The microprofiler has a huge performance cost associated with it. Developers can re-enable it in the base/profiling header if they really need it
Disable the trace writer in release builds. despite just returning after checking if the file was open the trace functions were consuming about 0.60% cpu time total
Add IsValidReg, GetRegisterInfo is a huge (about 45k) branching function and using that to check if a register was valid consumed a significant chunk of time
Optimized RingBuffer::ReadAndSwap and RingBuffer::read_count. This gave us the largest overall boost in performance. The memcpies were unnecessary and one of them was always a no-op
Added simplification rules for multiplicative patterns like (x+x), (x<<1)+x
For the most frequently called win32 functions i added code to call their underlying NT implementations, which lets us skip a lot of MS code we don't care about/isnt relevant to our usecases
^this can be toggled off in the platform_win header
handle indirect call true with constant function pointer, was occurring in h3
lookup host format swizzle in denser array
by default, don't check if a gpu register is unknown, instead just check if its out of range. controlled by a cvar
^looking up whether its known or not took approx 0.3% cpu time
Changed some things in /cpu to make the project UNITYBUILD friendly
The timer thread was spinning way too much and consuming a ton of cpu, changed it to use a blocking wait instead
tagged some conditions as XE_UNLIKELY/LIKELY based on profiler feedback (will only affect clang builds)
Shifted around some code in CommandProcessor::WriteRegister based on how frequently it was executed
added support for docdecaduple precision floating point so that we can represent our performance gains numerically
tons of other stuff im probably forgetting
2022-08-13 12:59:00 -07:00
emoose c889a8af3f [CPU] Load alt-title-ids XEX header into XexModule::opt_alternate_title_ids_ 2021-06-25 23:48:25 -05:00
Joel Linn a86d7173e1 Refactor FourCC magic uses
- Use new fourcc_t type
- Improves compiler compatibility by removing multi chars
2021-06-02 22:28:43 -05:00
emoose bb7c5b8266 [CPU/XEX] Move SecurityInfo conversion code to ReadSecurityInfo & call that during ApplyPatch 2021-01-31 23:18:54 -06:00
gibbed 5bf0b34445 C++17ification.
C++17ification!

- Filesystem interaction now uses std::filesystem::path.
- Usage of const char*, std::string have been changed to
  std::string_view where appropriate.
- Usage of printf-style functions changed to use fmt.
2020-04-07 16:09:41 -05:00
aerosoul bc8b629092 [Kernel] Enable XEX1 loading 2019-11-20 18:09:28 -06:00
emoose bbb5c938ec [CPU] Fix XexModule::FindSaveRest not finding functions properly 2018-11-01 15:50:56 +00:00
Dr. Chat 7443b7e61f [CPU] Rename ImportLibrary fields to follow naming conventions 2018-10-28 09:41:31 -05:00
emoose 265903fe66 [CPU] Add XEXP support to XexModule, if XEXP is in same folder as XEX
This was a headache to work out, big thanks to the lack of documentation on .xexp files... a ton of guesswork was involved here but luckily it turned out well.

I did have to make some pretty major changes to the way XEX files are loaded though.
Previously it'd just load everything in one go: XEX headers -> decrypt/decompress data -> load imports/symbols -> set loader data table entries, etc...

Now it's changed to something like this:
- Load base XEX headers + decrypted/decompressed image data, return X_STATUS_PENDING
- In the LoadFromFile call used to load the XEX, search for XEXP patch file (only .xexp in same folder atm)
- If patch exists: load XEXP, decrypt headers/data, apply patch to base XEX, dispose of XEXP
- Finish XEX load via LoadXexContinue() (handles imports/symbols/loader data...)

This saves us from needing to reset the imports/function/symbol stuff after patching (since all the XEX code will be a lot different), but I'm not really sure if I went about it the best way.
2018-10-20 04:36:21 +01:00
emoose 0b7f7e1657 [CPU] Move XEX2 code into XexModule class, autodetect XEX key
Code is mainly just copy/pasted from kernel/util/xex2.cc, I've tried fixing it up to work better in a class, but there's probably some things I missed.

Also includes some minor improvements to the XEX loader, like being able to try both XEX keys (retail/devkit) automatically, and some fixes to how the base address is determined.

(Previously there was code that would get base address from optional header, code that'd get it from xex_security_info, code that'd use a stored base address value...
Now everything reads it from a single stored value instead, which is set either from the xex_security_info, or if it exists from the optional header instead.
Maybe this can help improve compatibility with any weird XEX's that don't have a base address optional header?)

Compressed XEX loader also has some extra checks to make sure the compressed data hash matches what's expected.
Might increase loading times by a fraction, but could save reports from people unknowingly using corrupt XEXs.
(still no checks for non-compressed data though, maybe need to compare data with xex_security_info->ImageHash?)
2018-10-20 04:18:18 +01:00
Dr. Chat a148b965f1 KernelState should handle module launching 2016-10-24 11:01:10 -05:00
Ben Vanik 5e08889d93 More style cleanup. 2015-08-06 20:17:01 -07:00
Ben Vanik eaa1a8ee3a Refactoring SymbolInfo/FunctionInfo/Function into Symbol/Function. 2015-08-05 21:50:02 -07:00
Ben Vanik 12a29371e3 Clang fixes. 2015-07-19 18:32:48 -07:00
Dr. Chat f9977a25af Use std::vector to hold the xex header instead of new/delete 2015-07-06 19:45:10 -05:00
Dr. Chat 0388d17a72 Formatting 2015-07-06 10:57:32 -05:00
Dr. Chat 93f24d2047 XexModule keep track of whether it's loaded into memory or not 2015-07-06 10:40:35 -05:00
Dr. Chat 7f53b1d630 Allow unloading of user modules 2015-07-05 14:03:00 -05:00
Dr. Chat 0211135fd6 Fix potential corruption for GetOptHeader 2015-07-03 10:41:43 -05:00
Dr. Chat 362a521c79 Rewrite XexModule to drop dependency on old xex2 headers for imports 2015-07-03 08:17:23 -05:00
Dr. Chat 029babaf5d Drop dependency on old-style xex2 headers 2015-07-03 08:15:53 -05:00
Dr. Chat fe87c08424 Shuffle some code around. 2015-07-03 08:15:53 -05:00
Ben Vanik fb1f4906d9 xb format --all (we are now format clean). Buildbot will yell at you. 2015-06-22 22:26:51 -07:00
Ben Vanik 08770a4ec0 Mass renaming. I love clang-format. 2015-05-31 16:58:12 -07:00
Dr. Chat 589e672d20 XexModule: Resolve user library imports 2015-05-18 01:31:59 -05:00
Ben Vanik ade5388728 bool-ifying xe::cpu 2015-05-05 18:52:54 -07:00
Ben Vanik 78921c1a7e Merging Runtime into Processor. 2015-05-03 22:28:25 -07:00
Ben Vanik 30f7effa73 Code cleanup: removing common.h 2015-05-02 01:25:59 -07:00
Ben Vanik 281abea955 Converting addresses in xe::cpu to 32bit. 2015-03-24 19:41:29 -07:00
Ben Vanik 9281d62106 Moving cpu/runtime/ to cpu/. 2015-03-24 08:25:58 -07:00
Ben Vanik 29912f44c0 Moving alloy/ into xenia/cpu/ to start simplifying things. 2015-03-24 07:46:18 -07:00
Ben Vanik 00e4a4fe1b Fix #include format. 2015-01-31 22:49:47 -08:00
Ben Vanik a0eebf8898 Removing old run loop/ref/core/etc. 2014-12-31 19:26:51 -08:00
Ben Vanik 9437d0b564 Sprucing up some of alloy. 2014-07-13 21:15:37 -07:00
Ben Vanik 4d92720109 Moving all kernel files around just to fuck with whoever's keeping track ;) 2014-01-04 17:12:46 -08:00
Ben Vanik dcd9f8b6ff Module info in json. 2013-12-24 17:25:29 -08:00
Ben Vanik fdb6a5cfa3 Initial Alloy implementation.
This is a regression in functionality and performance, but a much better
foundation for the future of the project (I think). It can run basic
apps under an SSA interpreter but doesn't support some of the features
required to do real 360 apps yet.
2013-12-06 22:57:16 -08:00