This was necessary to work around a FS timing issue which caused small
writes to take much longer than they should.
Now that we emulate timings for the FS module including its file cache,
we don't need to maintain this workaround anymore.
Everything that links in core doesn't need to see anything related to bochs, because it's only used internally.
Anything else that relies on bochs should be linking it in explicitly.
The general convention is to return a reference to the object that was
acted on, otherwise you can get into situations with errors because the
type wasn't being propagated properly
Using this in its current form would invoke undefined behavior, as it's
using a union to type pun between data types. It's also, well, unused,
so we don't need to keep it around.
We already read the necessary information with the
HostRead_Instruction() call. Internally, it calls HostRead_U32() as
well, so there's no difference in behavior.
If the locked cache isn't enabled, dcbz_l is illegal to execute
(locked cache is off, locked cache instructions don't work, makes sense)
This makes exception handling more accurate. It was previously possible to hit the DSI exception
handler when HID2[LCE] is set to zero, which isn't correct.
With this change we no longer hit the DSI handler, however we still have a lingering issue elsewhere
likely to do with exception precedence, we seem to hit the Floating Point exception handler instead
in some cases. This isn't due to the instruction itself directly however, so this is just another bug
to fix elsewhere.
CMake already has this functionality built-in. This lessens depending on the host system environment
and is more cross-platform friendly (which is always nice from a build-system point of view).
In the case we had X11 libs available, we'd allocate an XRRConfiguration instance and pass it
to the GraphicsWindow instance, but it would never actually be freed.
add_definitions and include_directories don't operate on a by-target basis, they act on a
by-directory basis (i.e. if we defined two targets, A and B, in this CMakeLists file, add_definitions
would add the definitions to the COMPILE_DEFINITIONS directory property which both A and B would
implicitly use).
The same idea applies to include_directories, only it appends to the INCLUDE_DIRECTORIES directory
property.
Instead, specify these on the target to keep scope as narrow as possible.
This is one of the conditions for an alignment exception documented in
the 750CL architecture reference manual in section 4.5.6, which also
applies to the Gekko microprocessor.
This is an exception condition documented within section 4.5.6 in the
architecture reference manual for the PPC 750CL, which also applies to
the Gekko microprocessor.
Also moves dcbz_l's implementation out of Interpreter_Paired and beside
dcbz where it belongs.
The effective address given to these instructions must be word (4 byte) aligned,
and if the address is not aligned like that, then an alignment exception
gets triggered.
We currently don't update the DSISR in this case properly, since we
didn't really handle alignment exceptions outside of ecowx and eciwx,
and even then the handling of it isn't really that great, considering
the DAR isn't updated with the address that caused the exception to
occur.
The DSISR will eventually be amended to be properly updated.
Prior to this change, it's possible for m_wake_me_up_again to be used
while it's in an uninitialized state from the exposed API.
e.g.
- Using SetEnable after construction would perform an uninitialized read.
- Using PushEvent would perform an uninitialized read by way of operator |=.
internally, an uninitialized read can happen if PullEventsInternal() is
executed before other functions.
Just to avoid the whole possibility of performing uninitialized reads,
we just give the class member a default value of false.
If the copy assignment operator is deleted, then the copy constructor
should be deleted as well, otherwise it's a hole in the API where copies
can be made (and if this were an intended case, it should be
documented).
So we delete the copy constructor and explicitly default the move
assignment and move constructor to signify this is intended to be a
move-only type.
Adjusts Common to use the ICONV_LIBRARIES variable directly and doesn't
append it to the LIBS variable.
After this, there's only one remaining usage where libraries are added
to the LIBS variable, after which it can be removed once the rest of
the targets are migrated off add_dolphin_library
This way, if we load a UID cache where the data was incomplete (e.g.
Dolphin crashed), we don't lose the existing UIDs which were previously
at the beginning.
These are bit manipulation functions, so they belong within BitUtils.
This also gets rid of duplicated code and avoids relying on compiler
reserved names existing or not existing to determine whether or not we
define a set of functions.
Optimizers are smart enough in GCC and clang to transform the code to a
ROR or ROL instruction in the respective functions.
The only place this library is needed (core) is already linked in the core target.
Also make the linkage private to create linkage failures if the dependency isn't
explicitly linked in elsewhere where it should be.
Reduces the dependency on the LIBS variable.
Return a FileHandle which will automatically close the FD when
the handle goes out of scope. For the rare cases where this behaviour
is undesirable, the FD can be released from the handle.
This is the large change in the branch.
This lets us use either the host filesystem or (in the future) a NAND
image exactly the same way, and make sure the IPC emulation code
behaves identically. Less duplicated code.
Note that "FileIO" and "FS" were merged, because it actually doesn't
make a lot of sense to split them: IOS handles requests for both
/dev/fs and files in the same resource manager, and as it turns out,
/dev/fs commands can *also* be sent to non /dev/fs file descriptors!
If we kept /dev/fs and files split, there would be no way to
emulate that correctly. I'm not aware of anything that does that (yet?)
but I think it's important to be correct.
Now that we have a proper filesystem interface, it makes more sense
to return it instead of the emulated IOS device (which isn't
really usable for any purpose other than emulated IPC).
Extract the existing FS code into a HostBackend implementing
the filesystem interface.
Compared to the original code, this uses less static state.
The open host files map is now a member variable
as it should have been. Filesystem handles are now also easier
to savestate. Some variable names and log messages were cleaned up.
Nothing else has been changed.
Add a new FileSystem class that can be used to perform operations
on the emulated Wii filesystem.
This will allow separating the IPC code (reading from and writing to
memory to handle IOS FS commands) and the actual filesystem code
in a much better way.
This also paves the way for implementing another filesystem backend
in the future -- NAND images for more complete emulation, including
filesystem metadata like permissions or file orders, which some games
and homebrew actually care about -- without needing to make any more
changes to the other parts of the codebase, in addition to making
filesystem behaviour tests easier to write.
This adds a lightweight, easy to use std::variant wrapper intended to
be used as a return type for functions that can return either a result
or an error code.
Makes our libraries explicitly link in which libraries they need.
This makes our dependencies explicit and removes the reliance on the
LIBS variable to contain the libraries that they need.
This will create a merge conflict if two PRs try to increment the
cache version at the same time, which makes it noticeable that the
PR that gets merged last needs to increment the cache version again.
We already use this for savestates and the game list cache.
Regression from 1f1dae3.
This problem doesn't happen in DolphinWX as far as I know, but if
you've ran into the problem in DolphinQt2, it will carry over to
DolphinWX because of the shared game list cache.
Minimizes repetition.
std::minmax_element can be used for the 256 * 2 case, as it's only performing byte comparisons
and thus, there will always be an element smaller than 0xffff, so it doesn't need to be included
in the set of compared values.
The SGI extension does not define calling SwapInterval with a parameter
of zero as valid. It was just lucky that drivers interpreted this as
vsync off. The EXT_swap_control extension defines zero as a valid value.
Mesa does not appear to support the EXT variant, so we fall back to
MESA_swap_control here, which also supports zero.
This is technically undefined behavior, but regardless of that, it's not
even necessary since we can just make a temporary around the MSR value
and just discard it when done with it, since all we do is query the FP
bit value with it.
Given how the hooking operates, we may not execute an instruction.
Instead of making the state a static local to the function, just make it
part of the lifecycle of the Interpreter class.
This was only ever used by the DSP assembler, and even then it was
sparsely used. Get rid of it to be consistent with types in other
sections of the DSP code.
These aren't necessary as the type being stored into a u32 are of the
same signedness and are smaller in data size, so there's no truncation
being performed.
Skip ubershader mode works the same as hybrid ubershaders in that the
shaders are compiled asynchronously. However, instead of using the
ubershader to draw the object, it skips it entirely until the
specialized shader is made available.
This mode will likely result in broken effects where a game creates an
EFB copy, and does not redraw it every frame. Therefore, it is not a
recommended option, however, it may result in better performance on
low-end systems.
The overflow check needs to occur before the condition register update
due to the fact that the summary overflow (SO) bit is used in the
updating of the condition register. If we set any overflow bits after
updating the CR, then we can potentially incorrectly report that an
overflow did not happen (in the case the SO bit wasn't set previously).
A search box is a common UI element. We don't need to explicitly tell
the user that they need to type a search term. Also,
Let's use the common "Search <items>..." placeholder text instead
and make the string shorter for localisation.
Also turns a std::string const reference into a value instance.
While this is well-defined, it does look out of place, given a new string
is being created.
ori can be used as a NOP if the two register operands are the same, and
the immediate is zero, not only if the two register operands are r0.
Also removes the check for !inst.Rc, as ori only has one encoding, and
said encoding doesn't even have a record bit in it.
This fix the awkwardness of having the symbols detection, parsing and loading related logs be in OS HLE while they don't have anything to do with that.
The OV bit is non-sticky. Therefore, after an overflow-enabled
instruction executes, if an overflow does *not* occur, then OV is
cleared. SO is sticky however, so it staying set in this case is
correct.
With this, JitAsm code doesn't have any reliance on the JIT global
variable. This means the core JIT64 code no longer relies on said
global at all. The Jit64 common code, however, still has some offenders.
Notably, EmuCodeBlock and Jit64AsmCommon are the remaining places in the
common code that make use of the global variable.
The AutoUpdate module is a generic update checker mechanism which can be
used by UI backends to trigger an auto-update check as well as the
actual update process.
Currently only configurable through .ini and the Qt implementation is
completely placeholder-y -- blocking the main thread on a network
request on startup, etc.
Ensures that upon construction of a JitBase instance, that all
underlying members within the option and state structs are guaranteed
to be initialized.
This prevents potentially using a member uninitialized in some form.
Also amends the condition that was being checked. Previously it was
checking if the destination register value was 0x80000000, however it's
actually the source register that should be checked.
This macro (that has unfortunately become the de-facto way of
introducing targets) has a lot of disadvantages that outweigh the fact
that you avoid writing two extra lines of CMake script.
- It encourages the use of variables. In a build system the last thing
we want to care about is mutable state that can be avoided.
- It only handles linking in the libraries and nothing else. It's a
laziness macro.
- We should be explicit about what we're doing by introducing the target
first, not last.
This gets the ball rolling by migrating Core off the macro. Note that
this is essentially 1-to-1 unrolling of the macro, therefore we're
still linking in all libraries as public, even though that may not be
necessary.
This can be revisited once everything is off the macro for a quicker
transition period.
Trims the direct usages of the global by making the code go through the
JIT interface (where it should have been going in the first place).
This also removes direct JIT header dependencies from the breakpoints as
well. Now, no code uses the JIT global other than JIT code itself, and
the unit tests.
All of these with the record bit set update condition register 1 with the
contents of the FPSCR's FX, FEX, VX and OX bits.
Helper_UpdateCR1() does exactly that, so we use it here to perform this
for us.
These are only used internally. This also allows us to eliminate some
symbols that get dumped into the exposed Gen namespace.
By extension this also hides the Write[X] functions from OpArg's public
interface. This is only used internally by XEmitter, so they shouldn't
be usable by anything else.
This wouldn't be much of a data reader if it can't access the
read-only data pointer in read-only contexts. Especially if it
can get a writable equivalent in contexts that aren't read-only.
It's questionable to not return a reference to the instance being
assigned to. It's also quite misleading in terms of expected behavior
relative to everything else. This fixes it to make it consistent with
other classes.
Allows the default constructor to be defaulted and ensures the default
values are associated with the member variables directly.
Also corrects a prefixed underscore in the two parameter constructor.
Given that this only contains functions from the VideoBackendBase class,
it makes more sense to move these to the relevant cpp file to keep them
all together.
This is only ever memset to zero and never used again.
This also gets rid of an instance of undefined behavior considering the
draft standard for C++17 (N4659) states at [dcl.type.cv] paragraph 5:
"
The semantics of an access through a volatile glvalue are implementation-defined.
If an attempt is made to access an object defined with a volatile-qualified type
through the use of a non-volatile glvalue, the behavior is undefined.
"
Before this, DolphinQt2 would crash at boot with an assertion error
when using a Windows debug build, at least if the Dolphin GUI
language was set to English.
Depending on which constructor is invoked, m_id or m_compute_program_id
can end up in an uninitialized state. We should ensure that the object
is completely initialized to something deterministic regardless of the
constructor taken.
This prevents Dolphin from crashing when the emulated software crashes.
AFAIK, there is absolutely no performance to enabling this with the
x64 JIT.
Eventually, we should probably just remove bMMU
(https://github.com/dolphin-emu/dolphin/pull/1831). We can't do that
yet because of the ARM JIT.
A very basic hardware test shows that the ARMMSG doesn't change until
IOS replies. (People could have disassembled IOS to verify this too...)
Console:
sending request at 00034640 - ARMMSG 133e0fa0
00000000000000000000000000000010(ack) - ARMMSG 133e0fa0
00000000000000000000000000000100(reply) - ARMMSG 00034640
Dolphin, prior to this fix:
sending request (00034640) - ARMMSG 133e0fa0
00000000000000000000000000000011(ack) - ARMMSG 00034640
00000000000000000000000000000100(reply) - ARMMSG 00034640
Dolphin, after this fix:
sending request at 00034640 - ARMMSG 133e0fa0
00000000000000000000000000000011(ack) - ARMMSG 133e0fa0
00000000000000000000000000000100(reply) - ARMMSG 00034640
(Yes, note that the X1 bit is still set. This is a bug that I will
fix in the next commit.)
The IPC interrupt is triggered when IY1/IY2 is set and Y1/Y2 is written
to even when this results in clearing the bit.
This shouldn't change anything in practice but it's a difference
that Dolphin wasn't taking into account, which made me waste some time
when I was writing a hwtest :/
This adjusts IOS IPC timing to be closer to actual hardware:
* Emulate the IPC interrupt delay. On a real Wii, from the point of
view of the PPC, the IPC interrupt appears to fire about 100 TB ticks
after Y1/Y2 is seen.
* Fix the IPC acknowledgement delay. Dolphin was much, much too fast.
* Fix Device::GetDefaultReply to return more reasonable delays. Again,
Dolphin was way too fast. We now use a more realistic, average reply
time for most requests.
Note: the previous result from https://dolp.in/pr6374 is flawed.
GetTicketViews definitely takes more than 25µs to reply.
The reason the reply delay was so low is because an invalid
parameter was passed to the libogc wrapper, which causes it to
immediately return an error code (-4100).
* Fix the response delay for various replies that come from the kernel:
fd table full, unknown resource manager / device, invalid fd,
unknown IPC command.
Source: https://github.com/leoetlino/hwtests/blob/af320e4/iostest/ipc_timing.cpp
This replaces usages of the non-standard __FUNCTION__ macro with the standard
mandated __func__ identifier.
__FUNCTION__ is a preprocessor definition that is provided as an
extension by compilers. This was the only convenient option to rely on
pre-C++11. However, C++11 and greater mandate the predefined identifier
__func__, which lets us accomplish the same thing.
The difference between the two, however, is that __func__ isn't a
preprocessor macro, it's an actual identifier that exists at function
scope. The C++17 draft standard (N4659) at section [dcl.fct.def.general]
paragraph 8 states:
"
The function-local predefined variable __func__ is defined as if a
definition of the form
static const char __func__[] = "function-name ";
had been provided, where function-name is an implementation-defined
string. It is unspecified whether such
a variable has an address distinct from that of any other object in the
program.
"
Thankfully, we don't do any macro or string concatenation with __FUNCTION__
that can't be modified to use __func__.
Currently, when immediately compile shaders is not enabled, the
ubershaders will be placed before any specialized shaders in the compile
queue in hybrid ubershaders mode. This means that Dolphin could
potentially use the ubershaders for a longer time than it would have if
we blocked startup until all shaders were compiled, leading to a drop in
performance.
- In D3D, shaders could be compiled on the main thread, blocking
startup.
- Reduced the latency between a pipeline being requested and used in all
backends in hybrid ubershader mode, when no shader stages were present.
- Fixed a case where async compilation could cause the same UID to be
appended multiple times to the UID cache.
- Fix incorrect number of threads being used when immediately compile
shaders was enabled.
Fixes a crash which could occur in platforms which do not support
buffer_storage, and EFB2RAM is enabled (which indirectly uses the
attributeless buffer).
While the code is namespaced out properly, the files weren't separated
into their own directory. This moves the files so that introducing a general
interface is easier in the future for supporting other architectures.
Lowest hanging fruit I could find with a profiler.
Not sure this stuff actually needs to be done, but assuming it is, why
not do it quickly? 10x faster, goes from 1% CPU to 0.09%.
Yes, this commit is only to blame OSX and Mali. Through the former supports unsynchronized mappings, the latter supports *no* way to stream dynamic data at all. Let's try to make bad news, as they ignore friendly feature requests. Maybe we just need to make more noise...
Some locales use non-breaking spaces as separators, so getting the
encoding right is important. If DolphinWX gets a string that isn't
valid UTF-8, it flat out won't display the string.
This enables shaders to be compiled while the game is starting, instead
of blocking startup. If a shader is needed before it is compiled,
emulation will block.
As these are stored in a map, operator< will become a hot function when
doing lookups, which happen every frame. std::tie generated a rather
large function here with quite a few branches.
We would want to improve the granularity here in the future, but for
now, this should avoid any performance loss from switching to the
VideoCommon shader cache.
This saves us from having to call GetPath when the same file is being
read over and over. (GetPath is more expensive than GetOffset due to
it iterating through parts of the file system and creating strings.)
The original reason I wanted to do this was so that we can replace
the Android-specific code with this in the future, but of course,
just deduplicating between DolphinWX and DolphinQt2 is nice too.
Fixes:
- DolphinQt2 showing the wrong size for split WBFS disc images.
- DolphinQt2 being case sensitive when checking if a file is a DOL/ELF.
- DolphinQt2 not detecting when a Wii banner has become available
after the game list cache was created.
Removes:
- DolphinWX's ability to load PNGs as custom banners. But it was
already rather broken (see https://bugs.dolphin-emu.org/issues/10365
and https://bugs.dolphin-emu.org/issues/10366). The reason I removed
this was because PNG decoding relied on wx code and we don't have any
good non-wx/Qt code for loading PNG files right now (let's not use
SOIL), but we should be able to use libpng directly to implement PNG
loading in the future.
- DolphinQt2's ability to ignore a cached game if the last modified
time differs. We currently don't have a non-wx/Qt way to get the time.
This saves us from having to hardcode strings, and it also gives
us strings in whatever format is appropriate on the current OS
(for instance, IIRC Windows uses Alt+F where other OSes use Alt-F).
This commit changes devices to always return IPCCommandResult rather
than just a return code for Open() and Close() in order to be able
to better emulate reply timing.
In hindsight, I should have considered we would want to emulate
timing when I cleaned up the device interface, but alas.
This rectifies that mistake.
There is code below that assumes the presence of those macros (by #undef'ing them), but none of the included headers provided them.
This fixes a build failure on OpenBSD where the undef'd macros _do_ get picked up later on in a compilation unit (through which include, I don't know), and thus shadow the Common::swap* functions.
tl;dr: This PR speedups dolphin on mobiles with the Mali GPU and ES 3.2
drivers by a factor of 10 by using the method with the biggest overhead.
Please keep care not to buy this shit!
The ARM driver team seems to care very well about their customers. But
bad luck, users and open source developers are *not* their customers. So
even device-independent feature requests are just ignored for *years*:
https://community.arm.com/graphics/f/discussions/4645/gl_ext_buffer_storage-support
The bad point, they neither implement any of the other common ways to
stream dynamic content in unextented GL:
- They just ignore the GL_MAP_UNSYNCHRONIZED_BIT flag
- They don't support on-device buffer updates and just stall with
glBufferSubData
It seems like no benchmark is using any dynamic content - and like no
customer cares about anything but benchmarks, or users...
We have a flag to disable the glBufferSubData way, this PR adds the flag
to also disable the unsychronized mapping way. The second one is
available since their ES 3.2 update, but slow as hell.
So how to continue? The last remaining technical way to stream dynamic
content at all is to alloc a new buffer per draw call with glBufferData.
This is very gross, but still a factor 10 speedup compared to stalling
the GPU. Small tests shows that you can expect another 3-5 times speedup
with EXT_buffer_data, so Mali would be on pair with Adreno here. So if
you have bought such a device unfortunately, please try to make noise on
your vendor forums/support and ask for this extension. If you are going
to buy a new mobile, I'd recormend to avoid *any* mobile with a Mali GPU
in it.