- In D3D, shaders could be compiled on the main thread, blocking
startup.
- Reduced the latency between a pipeline being requested and used in all
backends in hybrid ubershader mode, when no shader stages were present.
- Fixed a case where async compilation could cause the same UID to be
appended multiple times to the UID cache.
- Fix incorrect number of threads being used when immediately compile
shaders was enabled.
Fixes a crash which could occur in platforms which do not support
buffer_storage, and EFB2RAM is enabled (which indirectly uses the
attributeless buffer).
While the code is namespaced out properly, the files weren't separated
into their own directory. This moves the files so that introducing a general
interface is easier in the future for supporting other architectures.
Lowest hanging fruit I could find with a profiler.
Not sure this stuff actually needs to be done, but assuming it is, why
not do it quickly? 10x faster, goes from 1% CPU to 0.09%.
Yes, this commit is only to blame OSX and Mali. Through the former supports unsynchronized mappings, the latter supports *no* way to stream dynamic data at all. Let's try to make bad news, as they ignore friendly feature requests. Maybe we just need to make more noise...
Some locales use non-breaking spaces as separators, so getting the
encoding right is important. If DolphinWX gets a string that isn't
valid UTF-8, it flat out won't display the string.
This enables shaders to be compiled while the game is starting, instead
of blocking startup. If a shader is needed before it is compiled,
emulation will block.
As these are stored in a map, operator< will become a hot function when
doing lookups, which happen every frame. std::tie generated a rather
large function here with quite a few branches.
We would want to improve the granularity here in the future, but for
now, this should avoid any performance loss from switching to the
VideoCommon shader cache.
This saves us from having to call GetPath when the same file is being
read over and over. (GetPath is more expensive than GetOffset due to
it iterating through parts of the file system and creating strings.)
The original reason I wanted to do this was so that we can replace
the Android-specific code with this in the future, but of course,
just deduplicating between DolphinWX and DolphinQt2 is nice too.
Fixes:
- DolphinQt2 showing the wrong size for split WBFS disc images.
- DolphinQt2 being case sensitive when checking if a file is a DOL/ELF.
- DolphinQt2 not detecting when a Wii banner has become available
after the game list cache was created.
Removes:
- DolphinWX's ability to load PNGs as custom banners. But it was
already rather broken (see https://bugs.dolphin-emu.org/issues/10365
and https://bugs.dolphin-emu.org/issues/10366). The reason I removed
this was because PNG decoding relied on wx code and we don't have any
good non-wx/Qt code for loading PNG files right now (let's not use
SOIL), but we should be able to use libpng directly to implement PNG
loading in the future.
- DolphinQt2's ability to ignore a cached game if the last modified
time differs. We currently don't have a non-wx/Qt way to get the time.
This saves us from having to hardcode strings, and it also gives
us strings in whatever format is appropriate on the current OS
(for instance, IIRC Windows uses Alt+F where other OSes use Alt-F).
This commit changes devices to always return IPCCommandResult rather
than just a return code for Open() and Close() in order to be able
to better emulate reply timing.
In hindsight, I should have considered we would want to emulate
timing when I cleaned up the device interface, but alas.
This rectifies that mistake.
There is code below that assumes the presence of those macros (by #undef'ing them), but none of the included headers provided them.
This fixes a build failure on OpenBSD where the undef'd macros _do_ get picked up later on in a compilation unit (through which include, I don't know), and thus shadow the Common::swap* functions.
tl;dr: This PR speedups dolphin on mobiles with the Mali GPU and ES 3.2
drivers by a factor of 10 by using the method with the biggest overhead.
Please keep care not to buy this shit!
The ARM driver team seems to care very well about their customers. But
bad luck, users and open source developers are *not* their customers. So
even device-independent feature requests are just ignored for *years*:
https://community.arm.com/graphics/f/discussions/4645/gl_ext_buffer_storage-support
The bad point, they neither implement any of the other common ways to
stream dynamic content in unextented GL:
- They just ignore the GL_MAP_UNSYNCHRONIZED_BIT flag
- They don't support on-device buffer updates and just stall with
glBufferSubData
It seems like no benchmark is using any dynamic content - and like no
customer cares about anything but benchmarks, or users...
We have a flag to disable the glBufferSubData way, this PR adds the flag
to also disable the unsychronized mapping way. The second one is
available since their ES 3.2 update, but slow as hell.
So how to continue? The last remaining technical way to stream dynamic
content at all is to alloc a new buffer per draw call with glBufferData.
This is very gross, but still a factor 10 speedup compared to stalling
the GPU. Small tests shows that you can expect another 3-5 times speedup
with EXT_buffer_data, so Mali would be on pair with Adreno here. So if
you have bought such a device unfortunately, please try to make noise on
your vendor forums/support and ask for this extension. If you are going
to buy a new mobile, I'd recormend to avoid *any* mobile with a Mali GPU
in it.
We now differentiate between a resize event and surface change/destroyed
event, reducing the overhead for resizes in the Vulkan backend. It is
also now now safe to change the surface multiple times if the video thread
is lagging behind.
The option is named DisableCopyToVRAM under the Hacks section in
GFX.ini. It is intentionally not exposed to the GUI, as users should not
need to use it under normal circumstances. The main use is debugging
issues in the EFB-to-RAM shaders.
This could cause glReadPixels() calls which assume no buffer is bound
(e.g. CPU EFB access) to fail. The problem was limited to devices which
don't support persistent mapping, as the map path is not otherwise.
Whenever udev_monitor_receive_device() returns a non-null pointer,
the device must be unref'd after use with udev_device_unref().
We previously missed some unref calls for non-evdev devices.
Both libusbhid (system library) and libhidapi (3rd party library)
provide a function called hid_init. Dolphin was being linked to both.
The WiimoteScannerHidapi constructor was calling hid_init without
arguments. libusbhid's hid_init expects one argument (a file path).
It was being called as if it was defined without arguments, which
resulted in a garbage path being passed in, and because of that,
the Qt GUI was failing to launch with the following error:
'dolphin-emu-qt2: @ : No such file or directory'
It's not guaranteed that the eventfd is smaller than the monitor fd,
because fds are not always monotonically allocated. To select()
correctly in all cases, use the max between the monitor fd and eventfd.