Dolphin is able to generate one with all correct default settings, so
we don't need to ship with a pre-generated SYSCONF and worry about
syncing default settings.
Additionally, this commit changes SysConf to work with session SYSCONFs
so that Dolphin is able to generate a default one even for Movie/TAS.
Which SYSCONF needs to be touched is explicitly specified to avoid
confusion about which file SysConf is managing.
(Another notable change is that the Wii root functions are moved into
Core to prevent Common from depending on Core.)
Size is internally stored as a size_t, so having an int parameter
would cause implicit sign-conversion from a signed value to an
unsigned value to occur.
Jan 04 22:55:01 <leoetlino> fwiw, it looks like there are new warnings in the RegCache code
Jan 04 22:55:04 <leoetlino> Source/Core/Core/PowerPC/Jit64/FPURegCache.cpp:13:33: warning: declaration shadows a variable in the global namespace [-Wshadow]
Jan 04 22:56:19 <@Lioncash> yeah, the jit global should have a g_ prefix.
This fixes shadowing warnings and adds the g_ prefix to a global.
Certain parts of the standard library try to determine whether or not a
transfer operation should either be a copy or a move. The prevalent notion
of move constructors/assignment operators is that they should not throw,
they simply move an already existing resource somewhere else.
This is typically done with 'std::move_if_noexcept'. Like the name says,
if a type's move constructor is noexcept, then the functions retrieves an
r-value reference (for move semantics), or an l-value (for copy semantics)
if it is not noexcept.
As IOFile deletes the copy constructor and copy assignment operators,
using IOFile with certain parts of the standard library can fail in
unexcepted ways (especially when used with various container
implementations). This prevents that.
'operator void*' is basically a pre-C++11-ism that was used, as C++03
only had the notion of implicit type-conversion operators, but not explicit type
conversion operators (allowing implicit conversion of a file handle to
bool can go downhill pretty quickly).
Windows-1252 was sometimes being referred to as ASCII or ANSI
in Dolphin, which is incorrect. ASCII is only a subset of
Windows-1252, and ANSI is (rather improperly) used in Windows
to refer to the current code page (which often is 1252 on
Western systems, but can also be something entirely different).
The commit also replaces "SJIS" with "Shift JIS". "SJIS"
isn't misleading, but "Shift JIS" is more commonly used.
Prevents path traversal without needing an absolute path
function, and also improves accuracy (character sequences
like ../ appear to have no special meaning in IOS).
This removes the creation and usage of /sys/replace,
because the new escapes are too complicated to all
be representable in its format and because no other
NAND handling software seems to use /sys/replace.
This reverts commit 141f3bfb3a.
The implementation of getting absolute paths wasn't working
on non-Windows systems, which is a huge problem for IOS HLE.
In the code view, it would never say r1 or r2, but rather sp (stack pointer) and rtoc (register of the table of content) respectively. In the register view, all it says is the register number. This is an inconvenience considering it might not be obvious which register belongs to which of these terms.
Also make r13 named the "sda" for small data area with the same convention as above.
This is done to remove confusions among potential debugger users and to also make it more accurately tell what this feature is actually doing. Despite being true that it is using a memcheck (and it certianly checks that memory), the idea being to break on a memory access isn't really obvious especially considering that memchecks are also used in full MMU emulation to handle exceptions. It also doesn't help that memchecks are now supported in every builds.
It also changes the corresponding log because this log would be wanted by the user which means it should be more obvious that it was caused by the "memory breakpoint".
Using cmake and GCC, logs would contain the full file path when logging making logs lines unnecessarily long. This is solved by just removing anything before "/Source/Core/" (where / is whatever your OS uses to separated directory).
This fixes an issue where the Bluetooth info section could be fully
filled up by syncing 5 Wiimotes in passthrough mode then switching to
emulated Bluetooth; emulated Wiimotes were then unable to be used.
The "real" SYSCONF section is now backed up before being replaced with
a blank section that the emulated BT adapter can always fill with 5
Wiimotes without issues.
This backup is restored by the passthrough code, instead of during
the Bluetooth mode switch because this should be done regardless of the
user interface, and even without UI (if the config file is edited
manually).
When standby mode is enabled, this causes games to ES_Launch the system
menu instead of directly asking IOS (the STM more precisely) to shut
down, which prevents graceful shutdown from working
(it'll appear to hang).
Dolphin never supported WC24 standby mode anyway, so this shouldn't
cause any issues. (This should be reverted if and when WC24 standby is
implemented…)
It was never used, even when the code tried to make sure it was initialised and passed correctly. This is a supplementary fix for the memCheck dialog as this option will now work correctly.
This changes GetSymbolFromName to not require the passed name to
completely match with the symbol name. Instead, we now match
against the stripped symbol name (i.e. only the function name).
This fixes a regression introduced by #4160, which prevented
HLE::PatchFunctions() from working properly.
It didn't really made sense to disable 2 logs levels in releases builds while the level LDEBUG should really be where logs that would impact performance be. Info should be logs that report potentially usefull information and debug should report info that would only be usefull in debug context as they are called very often. To make this work, a lot of info log would have to be made debug log.
It also avoid inaccurate logs level done due to not using debug builds. While searching through the code, I saw a ton of logs that should have been info log, likely done to avoid using a debug build (which shouldn't happen considering the level debug exists anyway).
The whole idea is to have more meaningful logs in release builds while maintaining minimal performance loss from choosing the highest level. This could potentially help to diagnose issues or to know more about what the emulator is actually doing.
The next commit aims to sort the log levels for this purpose.
clang-format really *wants* the two empty lines to be removed;
otherwise, it will always flag MemoryUtil as needing formatting changes
which is an annoyance when it is used as a git filter driver.
These are needed for the next commit. I had to modify the implementation of the DSP one too, but since it basically isn`t used, I don`t think it matters much. These options only matters when adding one.
Fundamentally, all this does is enforce the invariant that we always
translate effective addresses based on the current BAT registers and
page table before we do anything else with them.
This change can be logically divided into three parts. The first part is
creating a table to represent the current BAT state, and keeping it up to
date (PowerPC::IBATUpdated, PowerPC::DBATUpdated, etc.). This does
nothing by itself, but it's necessary for the other parts.
The second part (mostly in MMU.cpp) is simply removing all the hardcoded
checks for specific untranslated addresses, and consistently translating
addresses using the current BAT configuration. Very straightforward, but a
lot of code changes because we hardcoded assumptions all over the place.
The third part (mostly in Memmap.cpp) is making the fastmem arena reflect
the current BAT configuration. We do this by redoing the mapping (calling
memmap()) based on the BAT table whenever it changes.
One additional minor change is that translation can fail in two ways:
either the segment is a direct store segment, or page table lookup failed.
The difference doesn't usually matter, but the difference affects cache
instructions, like dcbz.
Replace adhoc linked list with a priority heap. Performance
characteristics are mostly the same, but is more cache friendly.
[Priority Queues have O(log n) push/pop compared to the linked
list's O(n) push/O(1) pop but the queue is not big enough for
that to matter, so linear is faster over linked. Very slight gains
when framelimit is unlimited (Wind Waker), 1900% -> 1950%]
memory check
Also fix an oddity in the case when the last memory check is deleted,
the jit cache was supposed to be cleared in that case, but it was out of
the for loop that finds the one to delete so it was never run.
Naturally, the same fix for the adding the first memory check was
applied.
This should help prevent breakage when the curl.h header is changed.
As far as I can tell this only increases the compile time by a hair, but prevents needing to create a PR every time curl.h gets updated. Alternatively I'm experimenting with CURL_STRICTER defined per a conversion with booto:
>booto | krakn: try having CURL_STRICTER defined for the build?
Credit goes to: flacs for the suggestion to include curl.h
Fix include
git diff --name-only already took care of only returning the name, so
the awk is unneeded and makes it return only empty file names.
Facepalm, I know. Sorry for this oversight.
(Also fixes something that lint didn't catch because of this)
* Focus "Hash Code" / "IP address" text box by default in "Connect"
* Focus game list in "Host" tab
* RETURN keypress now host/join depending on selected tab
* Remember last hosted game
* Remove PanicAlertT:
* Simply log message to netplay window
* Remove them when they are useless
* Show some netplay message in OSD
* Chat messages
* Pad buffer changes
* Desync alerts
* Stop the game consistently when another player disconnects / crashes
* Prettify chat textbox
* Log netplay ping to OSD
Join scenario:
* Copy netplay code
* Open netplay
* Paste code
* Press enter
Host scenario:
* Open netplay
* Go to host tab
* Press enter
Most modern Unix environments use 64-bit off_t by default: OpenBSD,
FreeBSD, OS X, and Linux libc implementations such as Musl.
glibc is the lone exception; it can default to 32 bits but this is
configurable by setting _FILE_OFFSET_BITS.
Avoiding the stat64()/fstat64() interfaces is desirable because they
are nonstandard and not implemented on many systems (including
OpenBSD and FreeBSD), and using 64 bits for stat()/fstat() is either
the default or trivial to set up.
CScript must be run as 64-bit regardless of the MSBuild bitness. Otherwise it won't find 64-bit Git installations.
However the "Sysnative" redirector is not available for 64-bit processes. So a fix is needed when 64-bit MSBuild is run.
The "ProgramFiles(x86)" Macro is only set for 64-bit, otherwise it is empty. Therefore it can be used as condition to check whether the current MSBuild process is 32 or 64-bit.
OS X uses a case insensitive filesystem by default: when I try to build,
a system header does #include <assert.h>, which picks up
Source/Core/Common/Assert.h. This only happens because CMakeLists adds
'${PROJECT_BINARY_DIR}/Source/Core/Common' as an include directory: in
an out-of-tree build, that directory contains no other source files, but
in an in-tree build PROJECT_BINARY_DIR is just the source root.
This is only used for scmrev.h. Change the include directory to
'${PROJECT_BINARY_DIR}/Source/Core' and the include to
"Common/scmrev.h", which is more consistent with normal headers anyway.
Fully opt-in, reports to analytics.dolphin-emu.org over SSL. Collects system
information and settings at Dolphin start time and game start time.
UI not implemented yet, so users are required to opt in through config editing.
allthough this is a mesa bug, this is a simple enough workaround for context
creation fails with EGL_CONTEXT_OPENGL_FORWARD_COMPATIBLE_BIT_KHR set.
Otherwise dolphin will fail to create 3.3+ core context with current mesa
version
EndPlayInput runs on the CPU thread so it can't directly call
UpdateWantDeterminism. PlayController also tries to ChangeDisc
from the CPU Thread which is also invalid. It now just pauses
execution and posts a request to the Host to fix it instead.
The Core itself also did dodgy things like PauseAndLock-ing
from the CPU Thread and SetState from EmuThread which have been
removed.
Fix Frame Advance and FifoPlayer pause/unpause/stop.
CPU::EnableStepping is not atomic but is called from multiple threads
which races and leaves the system in a random state; also instruction
stepping was unstable, m_StepEvent had an almost random value because
of the dual purpose it served which could cause races where CPU::Run
would SingleStep when it was supposed to be sleeping.
FifoPlayer never FinishStateMove()d which was causing it to deadlock.
Rather than partially reimplementing CPU::Run, just use CPUCoreBase
and then call CPU::Run(). More DRY and less likely to have weird bugs
specific to the player (i.e the previous freezing on pause/stop).
Refactor PowerPC::state into CPU since it manages the state of the
CPU Thread which is controlled by CPU, not PowerPC. This simplifies
the architecture somewhat and eliminates races that can be caused by
calling PowerPC state functions directly instead of using CPU's
(because they bypassed the EnableStepping lock).
bool is not always guaranteed to be the same size on every platform.
On some platforms it may be one byte, on others it can be 8 bytes if the
platform dictates it. It's implementation-defined.
This can be problematic when it comes to storing this
data to disk (it can also be space-inefficient, but that's not really an
issue). Also say for some reason you moved your savestates to another
platform, it's possible they won't load correctly due to differences in size.
This change stores all bools to savestates as if they were a byte in size
and handles the loading of them accordingly.
MSVC's implementation of numeric_limits currently generates incorrect
signaling NaNs. The resulting values are actually quiet NaNs instead.
This commit is based off of a solution by shuffle2. The only
difference is a template specialization for floats is also added
to cover all bases
This isn't necessary, as the member functions are deleted.
If someone tries to perform a copy, the compiler will now
indicate that the member functions/constructors are deleted,
rather than inaccessible.
Seems like NVidia just ignores the forward compatible flag.
Additionally, they neither enable any extension which was designed later...
So either compatible profile, or a huge list of core profiles....
This is because Google decided it was in their best interest to update eglext.h for android-21/arch-arm only and completely neglect all the other
architectures.
Sucks to suck.
This is being implemented here first under EGL since the infrastructure is already in place for this due to the Android code requiring some bits.
The rest of the interfaces will come in a little bit.
This will be required for threaded shader compiling in the near future.
Build Events are run in an 32 bit environment, therefore both program files environment strings resolve to the x86 program files folder on 64 Bit systems. If Git is 64 bit and installed into the x64 program files it can't be found by the script.
This fixes the crashes occuring at startup with a non-empty shader cache.
Because LinearDiskCache reads/writes to the storage of ShaderUid, ShaderUid must be trivially copyable.
Additionally, adds a static assert to LinearDiskCache to ensure this doesn't happen in the future.
The initialization of ShaderUid data has been moved to the code generation functions, so the above condition holds true.
Approximately three or four times now, the issue of pointers being
in an inconsistent state been an issue in the video backend renderers
with regards to tripping up other developers.
Global (ugh) resources are put into a unique_ptr and will always have a
well-defined state of being - null or not null
This removes some nonsense in the extension loader where under an ES context we would still pull all function pointers and just continue onward if we
fail to pull one.
Now function pointers are only pulled if the version of GL or ES actually supports that function.
fileplatform is moved so it's in the same place as the other platform
icons, and nobanner is moved just because it fits better in Resources.
Both of them were identical in all of Dolphin's themes.
- Change the path of the Sys folder to the executable's location
- Add LINUX_LOCAL_DEV flag to use relocatable version on Linux
- Add CMake definition for relocatable build
Currently only works on unix, but can be extended to other systems. Can
also be extended to do wiimotes.
Searches the Pipes folder for readable named pipes and creates a dolphin
input device out of them. Send controller inputs to the game by writing
to the file. Commands are described in Pipes.h.
It's used by both the GUI to do things like install WADs and check up on
the system menu, in which case the global root should be used, and by
/dev/es, in which case the local one should. The latter isn't
*terribly* useful today, since no contents will ever be installed in
temporary roots (although it's still relevant for data directories), but
converting the whole thing makes sense because then it will Just Work
once the entire NAND is synced.
Because it would have been a bit of work to split it up (but I can if
desired), this commit also contains some basic cleanup of
NANDContentLoader:
(1) The useless interface class INANDContentLoader is removed and the
methods are changed to just return CNANDContentLoader (the only
implementation);
(2) CNANDContentManager is changed to use unique_ptr and cleaned up a
bit.
Gets rid of magic numbers in cases where the array size is known at compile time.
This is also useful for future entries that are stack allocated arrays as these
functions prevent incorrect sizes being provided.
g_compressAndDumpStateSyncEvent was Set() before destruction of file object (i.e. before flushing changes and closing file).
Also, adds Common::ScopeGuard wrapper for RAII.
Wasn't masking by the size of the offset encoding so negative values were killing the instruction
Missed commiting this in my integer gatherpipe PR.
Fixes crashing on AArch64.
Uses delete to make the unimplemented functions detectable at compile time
and not link time if they are used.
The move constructor and assignment operator are removed as moves are not
copies, but transfers of ownership, which isn't suited for this class.
This is required to make sure two code spaces are relatively close to one another.
In this case I need the AArch64 JIT codespace and its farcode space to be within 128MB of one another for branches.
As it is currently written, CallLambdaTrampoline does not return a
value. However, some of the functions that are being wrapped may
return a value that the JIT is expected to understand. A compiler
*cough cough clang* may opt to alter %rax after the wrapped lambda
returns, e.g. popping a previous value, which can clobber the
return value. If we actually have a return value, then the compiler
must not clobber it.
In a particular hashing heavy scene in Crazy Taxi the Murmur3 hash used 3.11% CPU time.
The new CRC32 hash in the same scene used 1.86%
This was tested on a Nvidia SHIELD Android TV with Cortex-A57s.
This will be a bit slower on the Nexus 9, the Denver CPU core is a bit slower with CRC32 texture hashing than Murmur3 texture hashing.
The new implementation has 3 options:
SyncGpuMaxDistance
SyncGpuMinDistance
SyncGpuOverclock
The MaxDistance controlls how many CPU cycles the CPU is allowed to be in front
of the GPU. Too low values will slow down extremly, too high values are as
unsynchronized and half of the games will crash.
The -MinDistance (negative) set how many cycles the GPU is allowed to be in
front of the CPU. As we are used to emulate an infinitiv fast GPU, this may be
set to any high (negative) number.
The last parameter is to hack a faster (>1.0) or slower(<1.0) GPU. As we don't
emulate GPU timing very well (eg skip the timings of the pixel stage completely),
an overclock factor of ~0.5 is often much more accurate than 1.0
Using SDL_INIT_JOYSTICK implies SDL_INIT_EVENTS which installs a signal
handler for SIGINT and SIGTERM. There will be a way to prevent this in
2.0.4 but for now we'll need to handle SDL_QUIT.
Actually caused by IniFiles::GetLines leaving the output vector in its
old state if the section wasn't found, and Gecko::LoadCodes not checking
the return value. Fix by moving lines->clear() up.
- Change the Wiimote emulation SYSCONF R/W to use the temporary NAND if in use.
- Fix up SysConf API so this actually works.
Kind of a hack. Like I said, this can be cleaned up when configuration
is synced...
Eventually, netplay will be able to use the host's NAND, but this could
still be useful in some cases; for TAS it definitely makes sense to have
a way to avoid using any preexisting NAND.
In terms of implementation: remove D_WIIUSER_IDX, which was just WIIROOT
+ "/", as well as some other indices which are pointless to have as
separate variables rather than just using the actual path (fixed, since
they're actual Wii NAND paths) at the call site. Then split off
D_SESSION_WIIROOT_IDX, which can point to the dummy NAND directory, from
D_WIIROOT_IDX, which always points to the "real" one the user
configured.
- FileSearch is now just one function, and it converts the original glob
into a regex on all platforms rather than relying on native Windows
pattern matching on there and a complete hack elsewhere. It now
supports recursion out of the box rather than manually expanding
into a full list of directories in multiple call sites.
- This adds a GCC >= 4.9 dependency due to older versions having
outright broken <regex>. MSVC is fine with it.
- ScanDirectoryTree returns the parent entry rather than filling parts
of it in via reference. The count is now stored in the entry like it
was for subdirectories.
- .glsl file search is now done with DoFileSearch.
- IOCTLV_READ_DIR now uses ScanDirectoryTree directly and sorts the
results after replacements for better determinism.
There is no nice way to correctly "detect" the "used" memory, so we just say
we're fine to use 50% of the physical memory for custom textures.
This will fix out-of-memory crashes, but we still might run into swapping issues.
Replaces them with forward declarations of used types, or removes them entirely if they aren't used at all. This also replaces certain Common headers with less inclusive ones (in terms of definitions they pull in).
Change TMemCheck::Action to return whether to break rather than calling
PPCDebugInterface::BreakNow, as this simplified the implementation; then
remove said method, as that was its only caller. One "interface" method
down, many to go...
- Move JitState::memcheck to JitOptions because it's an option.
- Add JitOptions::fastmem; switch JIT code to checking that rather than
bFastmem directly.
- Add JitBase::UpdateMemoryOptions(), which sets both two JIT options
(replacing the duplicate lines in Jit64 and JitIL that set memcheck
from bMMU).
- (!) The ARM JITs both had some lines that checked js.memcheck
despite it being uninitialized in their cases. I've added
UpdateMemoryOptions to both. There is a chance this could make
something slower compared to the old behavior if the uninitialized
value happened to be nonzero... hdkr should check this.
- UpdateMemoryOptions forces jo.fastmem and jo.memcheck off and on,
respectively, if there are any watchpoints set.
- Also call that function from ClearCache.
- Have MemChecks call ClearCache when the {first,last} watchpoint is
{added,removed}.
Enabling jo.memcheck (bah, confusing names) is currently pointless
because hitting a watchpoint does not interrupt the basic block. That
will change in the next commit.
Otherwise, it would work but any async sending would be delayed by 4ms or
wait until the next packet was received.
Also increase the client timeout to 250ms, since enet_host_service is now
really interrupted.
With my previous changes Dolphin would fail to create the user directory if it didn't exist, and would dump all the configuration options in to the cwdir.
This was a bit more complicated to fix in a clean fashion, so I took to moving around code concerning user directories.
Instead of having GetUserPath serve a dual purpose of both getting and setting our user directories, break out to a new SetUserPath function.
GetUserPath will know only get the configured user path.
SetUserPath will set our user paths and setup the internal user path state.
This ending up being a lot cleaner overall, which is nice. Also less mind bending when attempting to read the code.
So now we won't dump all of our configuration in to the cwdir if ~/.dolphin-emu isn't found.
Fixes issue 8371.
Clamping a rectangle correctly requires fully clamping all four
coordinates in the general case.
This should fix issue 6923, sort of; at least, it fixes the part where a
rectangle ends up with a nonsensical height after being clamped.
A bit more efficient if we are only pushing two VFP registers.
We can probably be a bit more efficient in the future by mixing paired loadstores in to the other paths as well.
Previously on FPR pushing and popping we would do a single STR/LDR per quad FPR we wanted to push/pop.
In most of our cases when we are pushing and popping VFP registers they will be consecutive registers that will save more efficiently using the NEON
loadstores that can do up to four quad registers.
So this can potentially cutting instructions down to ~1/4th the amount of instructions if the registers are all consecutive.
On the Cortex-A57 this is basically just an icache improvement, but on the Nvidia Denver this may be optimized to be more efficient. Either way it's a
win.
The Load directory wasn't being properly reassigned when the user path changed, which causes a bunch of issues with things loading from the wrong
place when using the -U option in Dolphin.
The UI should decide on where it wants the user directory, not our core system.
This is in anticipation of some upcoming work on Android which will need proper user directory setting.
Since libcommon.a is also the last library to be linked, this has the
totally hacky but useful side-effect that it doesn't require people to
modify CMake files for temporarily adding VTune code to other Dolphin
libraries.
The PowerPC CPU has bits in MSR (DR and IR) which control whether
addresses are translated. We should respect these instead of mixing
physical addresses and translated addresses into the same address space.
This is mostly mass-renaming calls to memory accesses APIs from places
which expect address translation to use a different version from those
which do not expect address translation.
This does very little on its own, but it's the first step to a correct BAT
implementation.
The Windows implementations of CharArrayFromFormatV() and
StringFromFormat() use the "C"/".1252" locale instead of the user
locale (using _vsnprintf_l). On non-Windows, the user locale was used.
This leads to bugs on non-Windows: the Overclock parameter was
serialised with the user locale ("0,279322" in some locale) and was
interpreted back as "0" (because the C locale is used for parsing the
string).
Make non-Windows CharArrayFromFormatV() and StringFromFormat()
consistent with their Windows counterpart.
The locale code is not enables for Android:: uselocale is only
available since API 21 and API 21 only supports C and C.UTF-8.
If we are compiling in the CRC32 hash, clang has an issue with casting a s32 to a u64.
Change our lens argument to a unsigned integer to fix the issue.
Intellisense doesn't like defines in PCH files, and it doesn't like the deleted
constructor for BitField. (I think it's being overly strict about the
"must have no non-default constructors" rule for classes in unions.)
Someone thought it would be a good idea to have the location as the first argument on the instruction.
Changed it to how it is supposed to be disassembled.
Optimistically assume used GQRs are 0 in blocks that only use one GQR, and
bail at the start of the block and recompile if that assumption fails.
Many games use almost entirely unquantized stores (e.g. Rebel Strike, Sonic
Colors), so this will likely be a big performance improvement across the board
for games with heavy use of paired singles.
Won't work with all games, but provides a nice way to spend extra CPU to make
a variable framerate game faster (e.g. Spyro or The Last Story), or to make
a game use less CPU at the cost of a lower framerate (e.g. Rogue Leader).
If we have a shift amount that is the full length of the source register then we have an invalid instruction.
This can happen when dealing with a couple of PowerPC instructions.
This same adjustment is already in the ARMv7 emitter.
Fixes issues with negative offsets in loadstore instructions.
Adds ADRP/ADR instructions.
Optimizes MOVI2R function to take advantage of ADRP on pointers, can change a 3 instruction operation down to one.
Adds GPR push/pop operations for ABI related things.
The reason this didn't break is that bitwise instructions like VPAND,
VANDPS, and VANDPD do the exact same thing. The only difference is the
data type they are intended for.
The builtin byteswap routines cause critical failure on AArch64 when built with the Android toolchain.
I didn't experience this issue when building for Linux using a local qemu chroot.
Seems to be only an issue with the Android toolchain when building AArch64.
Use our generic version instead.
If the inputs are both float singles, and the top half is known to be identical
to the bottom half, we can use packed arithmetic instead of scalar to skip
the movddup.
This is slower on a few rather old CPUs, plus the Atom+Silvermont, so detect
Atom and disable it in that case.
Also avoid PPC_FP on stores if we know that the output came from a float op.
Move the JITed function/basic-block registration logic out of the CPU
subsystem in order to add JIT registration to JITed DSP and
Video/VertexLoader code.
This necessary in order to add /tmp/perf-$pid.map support to other
JITed code as they need to write to the same file.
When we cleaned up the code to calculate the shm_position and total_mem
in one step, we sometimes skipped over certain views because they were
Wii-only. When looking at the total memory, we'd look at the last field,
whether or not it was skipped. Since Wii-only fields are the last view,
this meant that the shm_position was 0, since it was skipped, causing us
to map a 0-sized field. Fix this by explicitly returning the total size
from MemoryMap_InitializeViews.
Additionally, the shm_position was being calculated incorrectly because
it was adding up the shm_position *before* the mirror, rather than after
it. Fix this by adopting a scheme similar to what we had before.
The code to calculate the offsets into the SHM file wasn't properly
respecting the skip flags, causing it to calculate offsets beyond
the end of the SHM file.
This code was ported from out_ptr, which was a double-pointer, and
wanted to double-check that the proper arena was actually allocated.
When I ported it to store the pointer directly in the view regardless
of whether out_ptr was non-NULL, I got confused here and instead
caused the code to only free the arena if the first byte was non-zero.
This code originally tried to map the "low space" for the Gamecube's
memory layout, but since has expanded to mapping all of the easily
mappable memory on the system. Change the name to "GrabSHMSegment" to
indicate that we're looking for a shared memory segment we can map into
our process space.
These are effectively unused, since the memmap already maps them in one
place. For 32-bit, they might have some slight advantage, but we already
special-case the regular "high-mem" pointer for 32-bit, so just use the
one we already have...
This is a higher level, more concise wrapper for bitsets which supports
efficiently counting and iterating over set bits. It's similar to
std::bitset, but the latter does not support efficient iteration (and at
least in libc++, the count algorithm is subpar, not that it really
matters). The converted uses include both bitsets and, notably,
considerably less efficient regular arrays (for in/out registers in
PPCAnalyst).
Unfortunately, this may slightly pessimize unoptimized builds.
We weren't dropping a newline character from the string, we were cutting off the last character of the hardware name.
This fixes my TK1 being called 'lagun' when it's name is 'laguna'
GCC has optimized this using the exact same code since 4.7 or 4.8.
Android building falls back to the __linux__ route.
No need to keep these around anymore since we aren't building on an old GCC version.
Before this commit, the two were reversed ("cpu_string" had the brand, e.g. "AuthenticAMD"; and "brand_string" had the CPU type, e.g. "AMD Phenom II X4 925").
This is good hygiene, and also happens to be required to build Dolphin
using Clang modules.
(Under this setup, each header file becomes a module, and each #include
is automatically translated to a module import. Recursive includes
still leak through (by default), but modules are compiled independently,
and can't depend on defines or types having previously been set up. The
main reason to retrofit it onto Dolphin is compilation performance - no
more textual includes whatsoever, rather than putting a few blessed
common headers into a PCH. Unfortunately, I found multiple Clang bugs
while trying to build Dolphin this way, so it's not ready yet, but I can
start with this prerequisite.)
I found it via clang complaining about a useless null check on an array,
but I decided to get rid of the array in favor of dynamic allocation, as
there was no reason to assume a maximum length of 0x32 bytes. Plus, add
a CFString type check just in case, and switch to UTF-8 in the
off-chance it matters.
The result has not actually been tested, as I have no CD drive.
It only ever did anything on 32-bit OS X.
Anyway, it wasn't even on the right functions, and these days
ABI_PushRegistersAndAdjustStack should handle maintaining the ABI
correctly.
This helps us avoid accidentally clobbering flags between two instructions
when the flags are expected to be maintained. Dolphin will of course crash
immediately, but at least it will crash loudly and alert us of the mistake,
instead of forcing hours of bisecting to find the subtle way in which the JIT
has managed to sneak a flag-modifying instruction where there shouldn't be one.
This is inconsistent with how other containers are used (i.e. with Do()), but making std::array be used with Do() seems rather confusing when there's also a DoArray available.
To avoid FPRs being pushed unnecessarily, I checked the uses: DSPEmitter
doesn't use FPRs, and VertexLoader doesn't use anything but RAX, so I
specified the register list accordingly. The regular JIT, however, does
use FPRs, and as far as I can tell, it was incorrect not to save them in
the outer routine. Since the dispatcher loop is only exited when
pausing or stopping, this should have no noticeable performance impact.
- Factor common work into a helper function.
- Replace confusingly named "noProlog" with "rsp_alignment". Now that
x86 is not supported, we can just specify it explicitly as 8 for
clarity.
- Add the option to include more frame size, which I'll need later.
- Revert a change by magumagu in March which replaced MOVAPD with MOVUPD
on account of 32-bit Windows, since it's no longer supported. True,
apparently recent processors don't execute the former any faster if the
pointer is, in fact, aligned, but there's no point using MOVUPD for
something that's guaranteed to be aligned...
(I discovered that GenFrsqrte and GenFres were incorrectly passing false
to noProlog - they were, in fact, functions without prologs, the
original meaning of the parameter - which caused the previous change to
break. This is now fixed.)
This is the bare minimum required to run a few games on AArch64.
Was able to run starfield and Animal Crossing to the Nintendo logo.
QEmu emulation is literally the slowest thing in the world, it maxes out at around 12mhz on my Core i7-4930MX.
I've tested a few instruction encodings and am expecting most to work as long as one stays away from VFP/SIMD.
This implements mostly instructions to bring up an initial JIT with integer support.
This can be improved to allow ease of use functions in the future, dealing with the raw imms/immr encodings is probably the worst thing ever.
Uses are split into three categories:
- Arbitrary (except for size savings) - constants like RSCRATCH are
used.
- ABI (i.e. RAX as return value) - ABI_RETURN is used.
- Fixed by architecture (RCX shifts, RDX/RAX for some instructions) -
explicit register is kept.
In theory this allows the assignments to be modified easily. I verified
that I was able to run Melee with all the registers changed, although
there may be issues if RSCRATCH[2] and ABI_PARAM{1,2} conflict.
The special case is where the registers are actually to be swapped (i.e.
func(ABI_PARAM2, ABI_PARAM1); this was previously impossible but would
be ugly not to handle anyway.
Prior to this change, it was possible to cause an infinite loop by making the string to be replaced and the replacing string the same thing.
e.g.
std::string some_str = "test";
ReplaceAll(some_str, "test", "test");
This also changes the replacing in a way that doesn't require starting from the beginning of the string on each replacement iteration.