The C spec states that the value of the va_list argument is indeterminate
after returning from vfprintf. va_end and va_start must be called before the
variable is reused.
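
A minimal sketch of the pattern, not the actual patch. It uses vsnprintf
instead of vfprintf (the same rule applies to the whole v*printf family), and
format_message is a hypothetical helper name used only for illustration:

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: consumes the argument list twice, restarting it in between.
std::string format_message(const char* fmt, ...)
{
    va_list args;

    // First pass: ask vsnprintf how many characters are needed.
    va_start(args, fmt);
    const int len = std::vsnprintf(nullptr, 0, fmt, args);
    va_end(args); // args is indeterminate after the call, so end it here

    if (len < 0)
        return std::string();

    std::string out(static_cast<std::size_t>(len) + 1, '\0');

    // Second pass: restart the list before reusing it.
    va_start(args, fmt);
    std::vsnprintf(&out[0], out.size(), fmt, args);
    va_end(args);

    out.resize(static_cast<std::size_t>(len)); // drop the terminating NUL
    return out;
}
```

Reusing args for the second call without the va_end/va_start pair may appear
to work on some ABIs and break on others.
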
clang and earlier GCC versions do not provide the _xgetbv intrinsic.
GCC 8 does, but unfortunately it's broken
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85684).
Re-use our _xgetbv implementation to avoid the bug, but rename it to
avoid compilation errors as well.
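
A rough sketch of what such a renamed wrapper could look like with GCC/clang
inline assembly. The names read_xcr and os_supports_xsave are hypothetical and
are not taken from the actual change. The non-x86 branch only exists so the
snippet builds everywhere:

```cpp
#include <cstdint>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

// Renamed wrapper so it cannot clash with a compiler-provided _xgetbv.
static inline std::uint64_t read_xcr(std::uint32_t index)
{
    std::uint32_t eax, edx;
    // XGETBV reads the extended control register selected by ECX into EDX:EAX.
    __asm__ __volatile__("xgetbv" : "=a"(eax), "=d"(edx) : "c"(index));
    return (static_cast<std::uint64_t>(edx) << 32) | eax;
}

// XGETBV faults unless the OS enabled XSAVE, so check CPUID.1:ECX bit 27
// (OSXSAVE) before calling read_xcr.
static inline bool os_supports_xsave()
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx & (1u << 27)) != 0;
}
#else
// Fallback for other targets: report that XSAVE is unavailable.
static inline std::uint64_t read_xcr(std::uint32_t) { return 0; }
static inline bool os_supports_xsave() { return false; }
#endif
```

A caller would typically test os_supports_xsave() first and then inspect bits
1 and 2 of read_xcr(0) to see whether the OS saves the SSE and AVX state.
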
It's not really used, and the OSD uses a different API.
The specified calling convention (stdcall) is also incorrect since
variadic functions are caller-clean, not callee-clean. The compilers
ignore the stdcall and just use cdecl (I think), though it does trigger
a -Wcast-calling-convention on clang.
Mapping the full buffer is a killer for VTune (it either crashes or requires a huge processing time).
Instead, keep the same ID for code in the same buffer.
I think all buffers are correctly mapped now, but I am still missing the frame pointer
for VU code.
Cons:
* requires ~180MB of physical memory (virtual memory usage is unchanged, so it
doesn't impact the 4GB limit)
From Steam: 98.81% of users have at least 2GB of RAM. 83.62% have at least 4GB of RAM.
That being said, it might not really increase RAM requirements, as the OS could put the
new allocation in swap.
Pros:
* code is much simpler
* removes at least half of the signal listener
* last but not least, it is way easier for profilers/debuggers
Doesn't fully work yet:
* Unknown stack frame
* Outside any known module
Potential root causes:
* Nvidia driver
* VU code: ebp is required for emulation, so there is likely no frame pointer
Template is nicer but gives the compiler a hard time.
The new version compiles with both gcc and clang without hacks.
v2: add an uptr cast too for VS2013 sigh...
v3: use an ugly function pointer cast to please VS2013
* Avoid the generation of a memory barrier (mfence)
* Based on the fact that the previous code used to work without any
barrier
v2:
* keep basic code in reset path
* use relaxed access for isBusy. The variable doesn't carry a load/store
dependency but is instead a hint to optimize semaphore post
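
A small sketch of the idea using std::atomic, with hypothetical names
(WorkerHint, set_busy, kick) and a plain counter standing in for the real
semaphore. It shows the relaxed accesses only, not the actual thread code. On
x86 a default (sequentially consistent) store typically needs mfence or a
locked instruction, whereas a relaxed store is a plain mov:

```cpp
#include <atomic>

// Illustrative stand-in for the worker state; not the real class.
struct WorkerHint
{
    std::atomic<bool> is_busy{false};
    int posts = 0; // counts wake-ups instead of posting a real semaphore

    // Worker side: publish the hint without a full barrier.
    void set_busy(bool busy)
    {
        is_busy.store(busy, std::memory_order_relaxed);
    }

    // Producer side: skip the wake-up when the worker already looks busy.
    void kick()
    {
        if (!is_busy.load(std::memory_order_relaxed))
            ++posts; // real code would post the semaphore here
    }
};
```

Because the flag is only a hint, a stale read can cause a redundant or skipped
post. This sketch does not handle that case and assumes the surrounding
synchronization does.
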
The TLS buffers used by the FastFormatUnicode and FastFormatAscii
classes seem to be responsible for PCSX2 not terminating properly on
Windows under certain conditions (using MTVU before commit
1111e03901, using CDVDgigaherz without a
disc, possibly other conditions).
When PCSX2 shuts down and the FastFormatBuffers are being cleaned up,
the call to pthread_key_delete() would end up calling
WaitForSingleObject(e, INFINITE) and waiting indefinitely for an event
to trigger. It never does get triggered (for reasons unknown) and
therefore PCSX2 doesn't terminate properly.
Remove the usage of TLS buffers in the FastFormatString classes - it
fixes the termination issue on Windows and doesn't seem to have much
effect on performance.
Using __declspec(dllexport) causes duplicate export warnings to be
generated when compiling 64-bit builds. Name mangling also occurs on
functions that are exported this way, so it doesn't actually work with
the plugin system, which uses unmangled names.
The module definition file exports the functions without name mangling
and is sufficient on its own.