/**
 ******************************************************************************
 * Xenia : Xbox 360 Emulator Research Project                                 *
 ******************************************************************************
 * Copyright 2020 Ben Vanik. All rights reserved.                             *
 * Released under the BSD license - see LICENSE in the root for more details. *
 ******************************************************************************
 */

#ifndef XENIA_CPU_XEX_MODULE_H_
#define XENIA_CPU_XEX_MODULE_H_

#include <string>
#include <vector>

#include "xenia/base/mapped_memory.h"
#include "xenia/cpu/module.h"
#include "xenia/kernel/util/xex2_info.h"

namespace xe {
namespace kernel {
class KernelState;
}  // namespace kernel
}  // namespace xe

namespace xe {
namespace cpu {

constexpr fourcc_t kXEX1Signature = make_fourcc("XEX1");
constexpr fourcc_t kXEX2Signature = make_fourcc("XEX2");
constexpr fourcc_t kElfSignature = make_fourcc(0x7F, 'E', 'L', 'F');

class Runtime;
struct InfoCacheFlags {
  uint32_t was_resolved : 1;  // Has this address ever been called/requested
                              // via ResolveFunction?
  uint32_t accessed_mmio : 1;
  uint32_t is_syscall_func : 1;
  uint32_t reserved : 29;
};
struct XexInfoCache {
  struct InfoCacheFlagsHeader {
    unsigned char reserved[256];  // put xenia version here

    InfoCacheFlags* LookupFlags(unsigned offset) {
      // The flags array immediately follows this header in the mapped file.
      return &reinterpret_cast<InfoCacheFlags*>(&this[1])[offset];
    }
  };

  // For every 4-byte aligned address, records a 4-byte set of flags.
  std::unique_ptr<MappedMemory> executable_addr_flags_;

  void Init(class XexModule*);

  InfoCacheFlags* LookupFlags(unsigned offset) {
    offset /= 4;
    if (!executable_addr_flags_) {
      return nullptr;
    }
    uint8_t* data = executable_addr_flags_->data();
    if (!data) {
      return nullptr;
    }
    return reinterpret_cast<InfoCacheFlagsHeader*>(data)->LookupFlags(offset);
  }
};
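// Example (hypothetical usage, a sketch only): `cache` is assumed to have
// been initialized via Init(), and `offset` is assumed to be the byte offset
// of a 4-byte aligned instruction from the start of the executable range.
//
//   InfoCacheFlags* flags = cache.LookupFlags(offset);
//   if (flags && flags->accessed_mmio) {
//     // This instruction wrote MMIO on a previous run; codegen can plan for
//     // it instead of relying on exception trapping.
//   }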
class XexModule : public xe::cpu::Module {
 public:
  struct ImportLibraryFn {
   public:
    uint32_t ordinal;
    uint32_t value_address;
    uint32_t thunk_address;
  };

  struct ImportLibrary {
   public:
    std::string name;
    uint32_t id;
    xe_xex2_version_t version;
    xe_xex2_version_t min_version;
    std::vector<ImportLibraryFn> imports;
  };
  struct SecurityInfoContext {
    const char* rsa_signature;
    const char* aes_key;
    uint32_t image_size;
    uint32_t image_flags;
    uint32_t export_table;
    uint32_t load_address;
    uint32_t page_descriptor_count;
    const xex2_page_descriptor* page_descriptors;
  };

  enum XexFormat {
    kFormatUnknown,
    kFormatXex1,
    kFormatXex2,
  };
  XexModule(Processor* processor, kernel::KernelState* kernel_state);
  virtual ~XexModule();

  bool loaded() const { return loaded_; }
  const xex2_header* xex_header() const {
    return reinterpret_cast<const xex2_header*>(xex_header_mem_.data());
  }
  const SecurityInfoContext* xex_security_info() const {
    return &security_info_;
  }

  uint32_t image_size() const {
    assert_not_zero(base_address_);

    // Calculate the total size of the XEX image from its page descriptors.
    auto heap = memory()->LookupHeap(base_address_);
    uint32_t total_size = 0;
    for (uint32_t i = 0; i < xex_security_info()->page_descriptor_count; i++) {
      // Byteswap the bitfield manually.
      xex2_page_descriptor desc;
      desc.value =
          xe::byte_swap(xex_security_info()->page_descriptors[i].value);

      total_size += desc.page_count * heap->page_size();
    }
    return total_size;
  }
  const std::vector<ImportLibrary>* import_libraries() const {
    return &import_libs_;
  }

  const xex2_opt_execution_info* opt_execution_info() const {
    xex2_opt_execution_info* retval = nullptr;
    GetOptHeader(XEX_HEADER_EXECUTION_INFO, &retval);
    return retval;
  }
  const xex2_opt_file_format_info* opt_file_format_info() const {
    xex2_opt_file_format_info* retval = nullptr;
    GetOptHeader(XEX_HEADER_FILE_FORMAT_INFO, &retval);
    return retval;
  }

  std::vector<uint32_t> opt_alternate_title_ids() const {
    return opt_alternate_title_ids_;
  }
  uint32_t base_address() const { return base_address_; }
  bool is_dev_kit() const { return is_dev_kit_; }
  // Gets an optional header. Returns NULL if not found.
  // Special case: if key & 0xFF == 0x00, this function will return the value,
  // not a pointer! This assumes out_ptr points to a uint32_t.
  static bool GetOptHeader(const xex2_header* header, xex2_header_keys key,
                           void** out_ptr);
  bool GetOptHeader(xex2_header_keys key, void** out_ptr) const;
  // Ultra-cool templated version
  // Special case: if key & 0xFF == 0x00, this function will return the value,
  // not a pointer! This assumes out_ptr points to a uint32_t.
  template <typename T>
  static bool GetOptHeader(const xex2_header* header, xex2_header_keys key,
                           T* out_ptr) {
    return GetOptHeader(header, key, reinterpret_cast<void**>(out_ptr));
  }

  template <typename T>
  bool GetOptHeader(xex2_header_keys key, T* out_ptr) const {
    return GetOptHeader(key, reinterpret_cast<void**>(out_ptr));
  }
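  // Example (hypothetical usage, a sketch only; key names assume the
  // definitions in xex2_info.h): the templated overload hides the void**
  // cast. A pointer-style key fills in a pointer into the header data, while
  // a key whose low byte is 0x00 (e.g. XEX_HEADER_SYSTEM_FLAGS) stores the
  // value itself into a uint32_t.
  //
  //   xex2_opt_file_format_info* fmt = nullptr;
  //   if (module->GetOptHeader(XEX_HEADER_FILE_FORMAT_INFO, &fmt)) {
  //     // fmt points into the XEX header data.
  //   }
  //   uint32_t system_flags = 0;
  //   module->GetOptHeader(XEX_HEADER_SYSTEM_FLAGS, &system_flags);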
  static const void* GetSecurityInfo(const xex2_header* header);

  const PESection* GetPESection(const char* name);

  uint32_t GetProcAddress(uint16_t ordinal) const;
  uint32_t GetProcAddress(const std::string_view name) const;
|
2015-06-29 05:48:24 +00:00
|
|
|
|
2018-10-20 03:36:21 +00:00
|
|
|
int ApplyPatch(XexModule* module);
bool Load(const std::string_view name, const std::string_view path,
const void* xex_addr, size_t xex_length);
bool LoadContinue();
bool Unload();
bool ContainsAddress(uint32_t address) override;
const std::string& name() const override { return name_; }
bool is_executable() const override {
return (xex_header()->module_flags & XEX_MODULE_TITLE) != 0;
}
bool is_valid_executable() const {
assert_not_zero(base_address_);
if (!base_address_) {
return false;
}
uint8_t* buffer = memory()->TranslateVirtual(base_address_);
// Check for the little-endian 'MZ\x90' DOS/PE header magic.
return *reinterpret_cast<const uint32_t*>(buffer) == 0x905A4D;
}
bool is_patch() const {
assert_not_null(xex_header());
if (!xex_header()) {
return false;
}
return (xex_header()->module_flags &
(XEX_MODULE_MODULE_PATCH | XEX_MODULE_PATCH_DELTA |
XEX_MODULE_PATCH_FULL));
}
InfoCacheFlags* GetInstructionAddressFlags(uint32_t guest_addr);
void Precompile() override;
protected:
std::unique_ptr<Function> CreateFunction(uint32_t address) override;
private:
void PrecompileKnownFunctions();
void PrecompileDiscoveredFunctions();
std::vector<uint32_t> PreanalyzeCode();
friend struct XexInfoCache;
void ReadSecurityInfo();
// Reads/decrypts the image; use_dev_key selects the devkit key over retail
// (the loader tries both keys automatically).
int ReadImage(const void* xex_addr, size_t xex_length, bool use_dev_key);
int ReadImageUncompressed(const void* xex_addr, size_t xex_length);
int ReadImageBasicCompressed(const void* xex_addr, size_t xex_length);
int ReadImageCompressed(const void* xex_addr, size_t xex_length);
int ReadPEHeaders();
bool SetupLibraryImports(const std::string_view name,
const xex2_import_library* library);
bool FindSaveRest();
Processor* processor_ = nullptr;
kernel::KernelState* kernel_state_ = nullptr;
std::string name_;
std::string path_;
std::vector<uint8_t> xex_header_mem_; // Holds the xex header
std::vector<uint8_t> xexp_data_mem_; // Holds XEXP patch data
std::vector<ImportLibrary>
import_libs_; // pre-loaded import libraries for ease of use
std::vector<PESection> pe_sections_;
// XEX_HEADER_ALTERNATE_TITLE_IDS loaded into a safe std::vector
std::vector<uint32_t> opt_alternate_title_ids_;
uint8_t session_key_[0x10];
bool is_dev_kit_ = false;
bool loaded_ = false; // Loaded into memory?
bool finished_load_ = false; // PE/imports/symbols/etc all loaded?
uint32_t base_address_ = 0;
uint32_t low_address_ = 0;
uint32_t high_address_ = 0;
XexFormat xex_format_ = kFormatUnknown;
SecurityInfoContext security_info_ = {};
uint8_t image_sha_bytes_[16];
std::string image_sha_str_;
XexInfoCache info_cache_;
};
} // namespace cpu
} // namespace xe
#endif // XENIA_CPU_XEX_MODULE_H_