Ensures that upon construction of a JitBase instance, that all
underlying members within the option and state structs are guaranteed
to be initialized.
This prevents potentially using a member uninitialized in some form.
Also amends the condition that was being checked. Previously it was
checking if the destination register value was 0x80000000, however it's
actually the source register that should be checked.
This macro (that has unfortunately become the de-facto way of
introducing targets) has a lot of disadvantages that outweigh the fact
that you avoid writing two extra lines of CMake script.
- It encourages the use of variables. In a build system the last thing
we want to care about is mutable state that can be avoided.
- It only handles linking in the libraries and nothing else. It's a
laziness macro.
- We should be explicit about what we're doing by introducing the target
first, not last.
This gets the ball rolling by migrating Core off the macro. Note that
this is essentially 1-to-1 unrolling of the macro, therefore we're
still linking in all libraries as public, even though that may not be
necessary.
This can be revisited once everything is off the macro for a quicker
transition period.
Trims the direct usages of the global by making the code go through the
JIT interface (where it should have been going in the first place).
This also removes direct JIT header dependencies from the breakpoints as
well. Now, no code uses the JIT global other than JIT code itself, and
the unit tests.
All of these with the record bit set update condition register 1 with the
contents of the FPSCR's FX, FEX, VX and OX bits.
Helper_UpdateCR1() does exactly that, so we use it here to perform this
for us.
These are only used internally. This also allows us to eliminate some
symbols that get dumped into the exposed Gen namespace.
By extension this also hides the Write[X] functions from OpArg's public
interface. This is only used internally by XEmitter, so they shouldn't
be usable by anything else.
This wouldn't be much of a data reader if it can't access the
read-only data pointer in read-only contexts. Especially if it
can get a writable equivalent in contexts that aren't read-only.
It's questionable to not return a reference to the instance being
assigned to. It's also quite misleading in terms of expected behavior
relative to everything else. This fixes it to make it consistent with
other classes.
Allows the default constructor to be defaulted and ensures the default
values are associated with the member variables directly.
Also corrects a prefixed underscore in the two parameter constructor.
Given that this only contains functions from the VideoBackendBase class,
it makes more sense to move these to the relevant cpp file to keep them
all together.
This is only ever memset to zero and never used again.
This also gets rid of an instance of undefined behavior considering the
draft standard for C++17 (N4659) states at [dcl.type.cv] paragraph 5:
"
The semantics of an access through a volatile glvalue are implementation-defined.
If an attempt is made to access an object defined with a volatile-qualified type
through the use of a non-volatile glvalue, the behavior is undefined.
"
Before this, DolphinQt2 would crash at boot with an assertion error
when using a Windows debug build, at least if the Dolphin GUI
language was set to English.
Depending on which constructor is invoked, m_id or m_compute_program_id
can end up in an uninitialized state. We should ensure that the object
is completely initialized to something deterministic regardless of the
constructor taken.
This prevents Dolphin from crashing when the emulated software crashes.
AFAIK, there is absolutely no performance to enabling this with the
x64 JIT.
Eventually, we should probably just remove bMMU
(https://github.com/dolphin-emu/dolphin/pull/1831). We can't do that
yet because of the ARM JIT.
A very basic hardware test shows that the ARMMSG doesn't change until
IOS replies. (People could have disassembled IOS to verify this too...)
Console:
sending request at 00034640 - ARMMSG 133e0fa0
00000000000000000000000000000010(ack) - ARMMSG 133e0fa0
00000000000000000000000000000100(reply) - ARMMSG 00034640
Dolphin, prior to this fix:
sending request (00034640) - ARMMSG 133e0fa0
00000000000000000000000000000011(ack) - ARMMSG 00034640
00000000000000000000000000000100(reply) - ARMMSG 00034640
Dolphin, after this fix:
sending request at 00034640 - ARMMSG 133e0fa0
00000000000000000000000000000011(ack) - ARMMSG 133e0fa0
00000000000000000000000000000100(reply) - ARMMSG 00034640
(Yes, note that the X1 bit is still set. This is a bug that I will
fix in the next commit.)
The IPC interrupt is triggered when IY1/IY2 is set and Y1/Y2 is written
to even when this results in clearing the bit.
This shouldn't change anything in practice but it's a difference
that Dolphin wasn't taking into account, which made me waste some time
when I was writing a hwtest :/
This adjusts IOS IPC timing to be closer to actual hardware:
* Emulate the IPC interrupt delay. On a real Wii, from the point of
view of the PPC, the IPC interrupt appears to fire about 100 TB ticks
after Y1/Y2 is seen.
* Fix the IPC acknowledgement delay. Dolphin was much, much too fast.
* Fix Device::GetDefaultReply to return more reasonable delays. Again,
Dolphin was way too fast. We now use a more realistic, average reply
time for most requests.
Note: the previous result from https://dolp.in/pr6374 is flawed.
GetTicketViews definitely takes more than 25µs to reply.
The reason the reply delay was so low is because an invalid
parameter was passed to the libogc wrapper, which causes it to
immediately return an error code (-4100).
* Fix the response delay for various replies that come from the kernel:
fd table full, unknown resource manager / device, invalid fd,
unknown IPC command.
Source: https://github.com/leoetlino/hwtests/blob/af320e4/iostest/ipc_timing.cpp
This replaces usages of the non-standard __FUNCTION__ macro with the standard
mandated __func__ identifier.
__FUNCTION__ is a preprocessor definition that is provided as an
extension by compilers. This was the only convenient option to rely on
pre-C++11. However, C++11 and greater mandate the predefined identifier
__func__, which lets us accomplish the same thing.
The difference between the two, however, is that __func__ isn't a
preprocessor macro, it's an actual identifier that exists at function
scope. The C++17 draft standard (N4659) at section [dcl.fct.def.general]
paragraph 8 states:
"
The function-local predefined variable __func__ is defined as if a
definition of the form
static const char __func__[] = "function-name ";
had been provided, where function-name is an implementation-defined
string. It is unspecified whether such
a variable has an address distinct from that of any other object in the
program.
"
Thankfully, we don't do any macro or string concatenation with __FUNCTION__
that can't be modified to use __func__.
Currently, when immediately compile shaders is not enabled, the
ubershaders will be placed before any specialized shaders in the compile
queue in hybrid ubershaders mode. This means that Dolphin could
potentially use the ubershaders for a longer time than it would have if
we blocked startup until all shaders were compiled, leading to a drop in
performance.
- In D3D, shaders could be compiled on the main thread, blocking
startup.
- Reduced the latency between a pipeline being requested and used in all
backends in hybrid ubershader mode, when no shader stages were present.
- Fixed a case where async compilation could cause the same UID to be
appended multiple times to the UID cache.
- Fix incorrect number of threads being used when immediately compile
shaders was enabled.
Fixes a crash which could occur in platforms which do not support
buffer_storage, and EFB2RAM is enabled (which indirectly uses the
attributeless buffer).
While the code is namespaced out properly, the files weren't separated
into their own directory. This moves the files so that introducing a general
interface is easier in the future for supporting other architectures.
Lowest hanging fruit I could find with a profiler.
Not sure this stuff actually needs to be done, but assuming it is, why
not do it quickly? 10x faster, goes from 1% CPU to 0.09%.
Yes, this commit is only to blame OSX and Mali. Through the former supports unsynchronized mappings, the latter supports *no* way to stream dynamic data at all. Let's try to make bad news, as they ignore friendly feature requests. Maybe we just need to make more noise...
Some locales use non-breaking spaces as separators, so getting the
encoding right is important. If DolphinWX gets a string that isn't
valid UTF-8, it flat out won't display the string.
This enables shaders to be compiled while the game is starting, instead
of blocking startup. If a shader is needed before it is compiled,
emulation will block.
As these are stored in a map, operator< will become a hot function when
doing lookups, which happen every frame. std::tie generated a rather
large function here with quite a few branches.
We would want to improve the granularity here in the future, but for
now, this should avoid any performance loss from switching to the
VideoCommon shader cache.
This saves us from having to call GetPath when the same file is being
read over and over. (GetPath is more expensive than GetOffset due to
it iterating through parts of the file system and creating strings.)
The original reason I wanted to do this was so that we can replace
the Android-specific code with this in the future, but of course,
just deduplicating between DolphinWX and DolphinQt2 is nice too.
Fixes:
- DolphinQt2 showing the wrong size for split WBFS disc images.
- DolphinQt2 being case sensitive when checking if a file is a DOL/ELF.
- DolphinQt2 not detecting when a Wii banner has become available
after the game list cache was created.
Removes:
- DolphinWX's ability to load PNGs as custom banners. But it was
already rather broken (see https://bugs.dolphin-emu.org/issues/10365
and https://bugs.dolphin-emu.org/issues/10366). The reason I removed
this was because PNG decoding relied on wx code and we don't have any
good non-wx/Qt code for loading PNG files right now (let's not use
SOIL), but we should be able to use libpng directly to implement PNG
loading in the future.
- DolphinQt2's ability to ignore a cached game if the last modified
time differs. We currently don't have a non-wx/Qt way to get the time.
This saves us from having to hardcode strings, and it also gives
us strings in whatever format is appropriate on the current OS
(for instance, IIRC Windows uses Alt+F where other OSes use Alt-F).
This commit changes devices to always return IPCCommandResult rather
than just a return code for Open() and Close() in order to be able
to better emulate reply timing.
In hindsight, I should have considered we would want to emulate
timing when I cleaned up the device interface, but alas.
This rectifies that mistake.
There is code below that assumes the presence of those macros (by #undef'ing them), but none of the included headers provided them.
This fixes a build failure on OpenBSD where the undef'd macros _do_ get picked up later on in a compilation unit (through which include, I don't know), and thus shadow the Common::swap* functions.
tl;dr: This PR speedups dolphin on mobiles with the Mali GPU and ES 3.2
drivers by a factor of 10 by using the method with the biggest overhead.
Please keep care not to buy this shit!
The ARM driver team seems to care very well about their customers. But
bad luck, users and open source developers are *not* their customers. So
even device-independent feature requests are just ignored for *years*:
https://community.arm.com/graphics/f/discussions/4645/gl_ext_buffer_storage-support
The bad point, they neither implement any of the other common ways to
stream dynamic content in unextented GL:
- They just ignore the GL_MAP_UNSYNCHRONIZED_BIT flag
- They don't support on-device buffer updates and just stall with
glBufferSubData
It seems like no benchmark is using any dynamic content - and like no
customer cares about anything but benchmarks, or users...
We have a flag to disable the glBufferSubData way, this PR adds the flag
to also disable the unsychronized mapping way. The second one is
available since their ES 3.2 update, but slow as hell.
So how to continue? The last remaining technical way to stream dynamic
content at all is to alloc a new buffer per draw call with glBufferData.
This is very gross, but still a factor 10 speedup compared to stalling
the GPU. Small tests shows that you can expect another 3-5 times speedup
with EXT_buffer_data, so Mali would be on pair with Adreno here. So if
you have bought such a device unfortunately, please try to make noise on
your vendor forums/support and ask for this extension. If you are going
to buy a new mobile, I'd recormend to avoid *any* mobile with a Mali GPU
in it.
We now differentiate between a resize event and surface change/destroyed
event, reducing the overhead for resizes in the Vulkan backend. It is
also now now safe to change the surface multiple times if the video thread
is lagging behind.
The option is named DisableCopyToVRAM under the Hacks section in
GFX.ini. It is intentionally not exposed to the GUI, as users should not
need to use it under normal circumstances. The main use is debugging
issues in the EFB-to-RAM shaders.
This could cause glReadPixels() calls which assume no buffer is bound
(e.g. CPU EFB access) to fail. The problem was limited to devices which
don't support persistent mapping, as the map path is not otherwise.
Whenever udev_monitor_receive_device() returns a non-null pointer,
the device must be unref'd after use with udev_device_unref().
We previously missed some unref calls for non-evdev devices.
Both libusbhid (system library) and libhidapi (3rd party library)
provide a function called hid_init. Dolphin was being linked to both.
The WiimoteScannerHidapi constructor was calling hid_init without
arguments. libusbhid's hid_init expects one argument (a file path).
It was being called as if it was defined without arguments, which
resulted in a garbage path being passed in, and because of that,
the Qt GUI was failing to launch with the following error:
'dolphin-emu-qt2: @ : No such file or directory'
It's not guaranteed that the eventfd is smaller than the monitor fd,
because fds are not always monotonically allocated. To select()
correctly in all cases, use the max between the monitor fd and eventfd.
The console appears to behave against standard IEEE754 specification
here, in particular around how NaNs are handled. NaNs appear to have no
effect on the result, and are treated the same as positive or negative
infinity, based on the sign bit.
However, when the result would be NaN (inf - inf, or (-inf) - (-inf)),
this results in a completely fogged color, or unfogged color
respectively. We handle this by returning a constant zero for the A
varaible, and positive or negative infinity for C depending on the sign
bits of the A and C registers. This ensures that no NaN value is passed
to the GPU in the first place, and that the result of the fog
calculation cannot be NaN.
- Smplification of graphics backend startup/shutdown.
- Don't send complete message until CPU is ready to execute.
- Remove redundant stop message.
- Remove OSD message with backend name.
Otherwise we might get UB if the value we cast is larger than the
max value of the underlying type that the compiled picked for the enum.
I haven't done any extensive check through Dolphin to find cases
of this, I'm just fixing the cases I already know of.
Tested on a linux Intel Skylake integrated graphics with
blend_func_extended force-disabled, as it's the only platform I have
that doesn't crash with ubershaders and supports fb_fetch
It seems it doesn't like modifying inout variables in place - so instead
use a temporary for ocol0/ocol1 and only write them once at the end of
the shader
The advantage of std::list is that elements can be removed from the
middle efficiently, but we don't actually need that, because the
ordering of the elements doesn't matter for us. We can just replace the
element we want to remove with the last element and then call pop_back.
Replacing list with vector should speed up looping through the elements.
GameTracker's usage of GameFileCache is thread-safe even without
using a mutex. All of its access to GameFileCache happens on the
thread m_load_thread, except for the call to GameFileCache::Load,
which finishes before m_load_thread starts executing.
Starting with 5.0-5504, trying to launch DolphinQt2 from Visual Studio
shows the error message "The operation could not be completed. Undefined
error" instead of launching the exe file. (The exe gets created
correctly, it just doesn't get launched. It's possible to work around
the problem by launching the exe manually outside of Visual Studio, but
then you won't have an attached debugger automatically.) This commit
fixes that by removing headers from DolphinQt2.vcxproj's ClInclude list
that already are in the QtMoc list. (The problem was originally about
LogWidget.h and LogConfigWidget.h, but 5.0-5600 made the problem be
about CheatWarningWidget.h and GeckoCodeWidget.h instead.)
There are two reasons for this change:
1. It removes many repetitive lines of code.
2. I think it's a good idea to enable the use of old-style section
names even for settings that previously haven't been settable in game
INIs. Mixing the two styles in INIs (using the new style only for new
settings) is not ideal, and people on the forums don't even seem to
know that the new style exists (nobody knew a way to set ubershader
settings per game, for instance). Encouraging everyone to start using
only the new style might work long-term, but it would take take time
and effort to make everyone get used to it. Considering that this commit
*reduces* the amount of code by adding the ability to use old-style
names for more settings, I'd say that adding this ability is worth it.
Trying to force the XSI version by undefining _GNU_SOURCE can lead
to compilation errors on some systems because of headers expecting
that _GNU_SOURCE is defined.
This commit uses define checks to detect which version we have.
I tried making an overloaded function (int and const char*) instead,
but that led to a warning about one of the variants being unused.
The Time Base Register was added under the BAT registers. TBL and TBU
were ORed together to get one 64-bit value to display. It is labeled TB
The Graphics Quantisation Registers were added under the Segment
Registers. They are Labeled GQR0-GQR7.
All new registers are read only.
GLES doesn't support C-style array initialisers, so stuff like:
Type var[2] = {
VALUE_A,
VALUE_B
};
isn't supported in GLES (it was added in
GL_ARB_shading_language_420pack).
The texture conversion shader used this, so would fail to compile on
GLES.
The PC offset ADRP() path takes a s32 value, but the input offset was
being tested as abs(ptr) < 0xFFFFFFFF. This caused values between
0x80000000 and 0xFFFFFFFF to incorrectly use this path, despite the
offsets not being representable in an s32.
This caused a crash in the VertexLoader on android 8.1 immediate in wind
waker (and possibly all other apps on android 8.1) as the jit and data
sections happened to be loaded 4gb apart in virtual memory, causing some
pointers to hit this
Some homebrew expect exception handlers to be present -- which is
almost always the case on console, since most of the time homebrew are
launched from either a libogc or SDK title) -- and break if they are
not. To fix this, we just need to include default, dummy handlers.
Set HID0, HID4, GPR1 to values that are used by libogc for
initialisation. This makes boots more similar to a launch
from the HBC or another loader, since normally the registers
have already been initialised by the loader.
This fixes a crash in homebrew that assume GPR1 points to a correct
location and attempt to use it before initialising registers.
HLSL does not define roundEven(), only round(). This means that the
output may differ slightly for OpenGL vs Direct3D. However, it ensures
consistency across OpenGL drivers, as round() in GLSL can go either way.
These three instructions use the B field (bits 16-20 of the opcode)
to determine what the operand register is. However, the code was
using the path that uses the C field (bits 21-25).
This amends the code to use the B field (and also fixes the 64-bit PPC
opcodes, because why not?).
Fixes issue 10683.
This fixes the rendering of the scan visor in Metroid Prime 2: Echoes,
as seen in https://fifoci.dolphin-emu.org/dff/mp2-scanner/
The alpha channel was off-by-one on Ivy Bridge due to the rounding
after multiplication with colmat. This commit removes this matrix
altogether in most cases, making them simple GLSL swizzles.
This will generate one shader per copy format. For now, it is the same
shader with the colmat hard coded. So it should already improve the GPU
performance a bit, but a rewrite of the shader generator is suggested.
Half of the patch is done by linkmauve1:
VideoCommon: Reorganise the shader writes.
Currently, a simple typo in the system name will trigger an assert
message that complains about a "programming error". This is not
user friendly and misleading.
So this changes GetSystemFromName to return an std::optional, which
allows for callers to check whether the system exists and handle
failures better.
Manually convert each argument to a UTF-8 std::string, because the
implicit conversion for wxCmdLineArgsArray to char** calls ToAscii
(which is obviously undesired).
Fixes https://bugs.dolphin-emu.org/issues/10274
Also skips swapping the window system buffers in headless mode, as there
may not be a surface which can be swapped in the first place. Instead,
we call glFlush() at the end of a frame in this case.
Cel-damage uses the color from the lighting stage of the vertex pipeline
as texture coordinates, but sets numColorChans to zero.
We now calculate the colors in all cases, but override the color before
writing it from the vertex shader if numColorChans is set to a lower value.
This was causing an issue where DolphinQt couldn't save graphics options
(DolphinWX doesn't hit this code path), because this function was being
called before the in-memory config was flushed to disk.
With this PR, the in-memory config isn't reset, and only SYSCONF-related
variables may get changed.
7f0834c9 moved the locations of the Real XFB (now XFB to RAM) and
Disabled XFB (now Immediate Mode) settings. There are programs
other than Dolphin that parse DTM headers, so this is not good.
Note that Immediate XFB actually is the inversion of Disabled XFB.
I hope that's not too much of a problem...
All file scope variables are able to be made internally linked.
CD3DFont is essentially used as an extension to the utility interface, so
this is able to be made internal as well, removing a global from
external view.
This lets Dolphin know if a configured GameCube Controller should actually
be treated as connected or not.
Talked to @JMC47 a bit about this last night. My use-case is that all of
my controllers are the same hardware (Xbox One controllers) so share the
same configuration (modulo device number). Treating them all as always
connected isn't a problem for most games, but in some (Smash Bros.) it
forces me to go find a keyboard/mouse and unconfigure any controllers
that I don't actually have connected. Hotplugging devices (works on macOS,
at least) + this patch remove my need to ever touch the Controller Config
dialog while in a game.
This patch makes the following changes:
- A new `BooleanSetting` in `GCPadEmu` called "Always Connected", which
defaults to false.
- `ControllerEmu` tracks whether the default device is connected on every
call to `UpdateReferences()`.
- `GCPadEmu.GetStatus()` now sets err bit to `PAD_ERR_NO_CONTROLLER` if
the default device isn't connected.
- `SIDevice_GCController` handles `PAD_ERR_NO_CONTROLLER` by imitating the
behaviour of `SIDevice_Null` (as far as I can tell, this is the only use
of the error bit from `GCPadStatus`).
I wanted to add an OSD message akin to the ones when Wiimotes get
connected/disconnected, but I haven't yet found where to put the logic.
This is already initialized in the class definition. This would
previously cause a -Wreorder warning on macOS, as m_config is
defined after m_currently_mapped.
Originally, Layer contained a std::map of Sections, which containted a std::map
containing the (key, value) pairs. Here we flattern this structure so that only
one std::map is required, reducing the number of indirections required and
vastly simplifying the code.