Floating-point is complicated...
Some background: Denormals are floats that are too close to zero to be
stored in a normalized way (their exponent would need more bits). Since
they are stored unnormalized, they are hard to work with, even in
hardware. That's why both PowerPC and SSE can be configured to operate
in faster but non-standard-conpliant modes in which these numbers are
simply rounded ('flushed') to zero.
Internally, we do the same as the PowerPC CPU and store all floats in
double format. This means that for loading and storing singles we need a
conversion. The PowerPC CPU does this in hardware. We previously did
this using CVTSS2SD/CVTSD2SS. Unfortunately, these instructions are
considered arithmetic and therefore flush denormals to zero if non-IEEE
mode is active. This normally wouldn't be a problem since the next
arithmetic floating-point instruction would do the same anyway but as it
turns out some games actually use floating-point instructions for
copying arbitrary data.
My idea for fixing this problem was to use x87 instructions since the
x87 FPU never supported flush-to-zero and thus doesn't mangle denormals.
However, there is one more problem to deal with: SNaNs are automatically
converted to QNaNs (by setting the most-significant bit of the
fraction). I opted to fix this by manually resetting the QNaN bit of all
values with all-1s exponent.
VI isn't called as regular as we want to, so we have to create a new throttling event called regularly by coretiming.
Atm we throttle every 1 ms when we are too fast and skip throttling when we lack 40ms (to avoid fast boosts after slowdowns)
If there is an issue with a reported extension, disable it instead of failing out entirely.
Fixes an issue with buffer_storage that I had overlooked as well.
This changes from using logical and to bitwise and, which causes the compile time to drop from an absurd amount of time to around five seconds on my
crappy laptop.
The default async api allow us to set some latency options. The old one (simple API) was the lazy way to go for usual audio where latency doesn't matter.
This also streams audio, so it should be a bit faster then the old one.
This structure fields should match byte-to-byte the layout of MMIO registers:
it is addressed using the MMIO reg address when doing a CP MMIO read. This was
unfortunately not the case, causing CP reads to be mostly broken with the
software renderer.
This option was known to break every second game and only boost a bit.
It also seems to be broken because of streaming into pinned memory and buffer storage buffers.
v2: also remove dlc_desc
Older Qualcomm drivers rotated the framebuffer 90 degrees and this fix didn't work.
Now for some obscene reason it rotates a full 180 degrees.
This can at least be worked around by flipping around the image on our end.
This isn't the cleanup that GLInterface needs, but for now it makes it so it'll swap and not just black screen
A cleanup to GLInterface will be coming in a couple weeks.
On Windows, nvidia don't give us their driver version, so we can't workaround any issues.
As buffer_storage is broken on some drivers, we wanted to disble it for them.
So we can't.
Luckyly only "some" released driver versions are affected as this extension is only available since some months. Let's hope that nobody have to use one of this driver version, else they will get a black screen ...
OSX has their own driver, so performance issues aren't shared with the nvidia driver (unlike the closed source linux and windows nvidia driver). So now they'll also use the MapAndSync backend like all other osx drivers.
fixes issue 6596
I've also cleaned up the if/else block selecting the best backend a bit.
The new chapter title in Paper Mario TTYD had a small graphical bug due to the new code because it read one extra pixel, this fixes it.
I hope this gets everything, I though I had checked most bugs and yet here I am, commit-spamming...
Fixes some remaining bbox related bugs in Mickey's Magical Mirror and a slight graphical glitch in Paper Mario: TTYD when flipping and Vivian as your companion (I've been scratching my head for days to find this one).
Instead of being vertex-based, it is now primitive (point, line or dissected triangle) based, with proper clipping.
Also, screen position is now calculated based on viewport values, instead of "guesstimating".
This fixes many graphical glitches in Paper Mario: TTYD and Super Paper Mario.
Also, the new code allows Mickey's Magical Mirror and Disney's Hide & Sneak to work (mostly) bug-free. I changed their inis to use bbox.
These changes have a slight cost in performance when bbox is being used (rare), mostly due to the new clipping algorithm.
Please check for any regressions or crashes.
The only two devices that do this are Mesa software rasterizer and Intel Ironlake(With a few hacks).
Basically since it doesn't support OpenGL 3.0, it can't grab the version the new way.
So failing that, it sets to GL 2.1, and continues.
Further along, on Ironlake at least, it tries grabbing the extensions the new GL 3.0 way and fails.
So have a fallback that grabs the extensions string the old way, in probably the most elegant way possible.
The old way was to use big switch/case statements based on a type of buffer.
The new one is to use inheritance.
This change prohibits us to change the buffer type while running, but I doubt we'll ever do so.
Performance should also be a bit better. Also a nice cleanup.
Added some comments about this different kind of buffers.
This is a bit slower on map_and_* because of flushing and _very_ much slower on buffer(sub)?data because of a new memcpy.
But this design allow us to decode directly into a gpu buffer, eg vertexloader will profit :)
gl.h and glext.h provide most of the function pointer typedefs and defines for extensions and core features.
The only one it doesn't provide is GL 1.1 function typedefs, but this is to be expected.
If anything needs defines or typedefs in their header in the future, that's as easy as before.
Prior to this commit it was possible to assign the same keycode to more than one button.
ie. Say I assigned Open with the hotkey Ctrl+O; well, it was possible to also add it to another function as well, which leads to hotkey clashing.
Now, say I assign Open with Ctrl+O, but then assign that same hotkey to Refresh List; it will unbind the hotkey from Open and then assign it to refresh list.
This one was introduced to reduce the glBindTexture and glActiveTexture calls. But it was quite a bit of logic and only an improvment on uploading/creating a texture, which is done rarely.
This adds xfb support to the videosoftware backend, which increases it's
accuracy and more imporantly, enables the usage of many homebrew apps
which write directly to the xfb on the videosoftware backend.
Conflicts:
Source/Core/VideoBackends/Software/SWRenderer.cpp
Source/Core/VideoBackends/Software/SWmain.cpp
I give up. Merging the ppc_fp branch has caused issues in numerous games
and I can't find the bug. I'm leaving this merged to enable easy
recompilation for people who would like to play games that benefit from
non-IEEE mode emulation (e.g. Starfox Assault).
This branch is the final step of fully supporting both OpenGL and OpenGL ES in the same binary.
This of course only applies to EGL and won't work for GLX/AGL/WGL since they don't really support GL ES.
The changes here actually aren't too terrible, basically change every #ifdef USE_GLES to a runtime check.
This adds a DetectMode() function to the EGL context backend.
EGL will iterate through each of the configs and check for GL, GLES3_KHR, and GLES2 bits
After that it'll change the mode from _DETECT to whichever one is the best supported.
After that point we'll just create a context with the mode that was detected
As we do lots of writes to *Iptr, the compiler isn't allowed to cache any shared variable (neither index nor Iptr itself).
This commit inlines Iptr + index into the index generator functions, so the compiler know that they are const.
We are used to render them out of order as long as everything else matches, but rendering order does matter, so we have to flush on primitive switch. This commit implements this flush.
Also as we flush on primitive switch, we don't have to create three different index buffers. All indices are now stored in one buffer.
This will slow down games which switch often primitive types (eg ztp), but it should be more accurate.
add the GL include (back) to Base.props
use a similar technique to GLX.cpp (by Sonic) in WGL.cpp to get
wglSwapIntervalEXT without the WGLEW check
Conflicts:
Source/Core/VideoBackends/OGL/OGL.vcxproj
Source/Core/VideoBackends/OGL/OGL.vcxproj.filters
Source/VSProps/Base.props
This "u32 components" is a list of flags which attributes of the vertex loader are present.
We are used to append this variable to lots of vertex generation functions, but some of them don't need it at all.
The usual way to handle this kind of request is to rise a flag which the gpu thread polls.
The gpu thread itself either generates the result or just write zeros if disabled.
After this, it rise another flag which says that this work is done.
So if disabled, we still have the cpu-gpu round trip time. This commit just returns 0 on the cpu thread
instead of playing ping pong...
Move enums for max SI and EXI devices to their respective .h file, and rename them.
Use only those enums in BootManager.cpp. Same thing in Movie.cpp
Change one instance of MAX_BBMOTES to MAX_WIIMOTES in Movie.cpp, since movies do not support balance board.
Some information on this bug since this isn't quite true.
Seemingly with the v53 driver, Qualcomm has actually fixed this bug. So we can dynamically access UBO array members.
The issue that is cropping up is actually converting our attribute 'fposmtx' to an integer.
int posmtx = int(fpostmtx);
This line causes some seemingly garbage values to enter in to the posmtx variable.
Not sure exactly why it is failing, probably them just not actually converting the float to an integer and just handling the float directly as a integer.
So the bug is going to stay active with Qualcomm devices until we convert this vertex attribute from a float to a integer.
Let's talk a bit about this bug. 12nd oldest bug not fixed in Dolphin, it was a
lot of fun to debug and it kept me busy for a while :)
Shoutout to Nintendo for framework.map, without which this could have taken a
lot longer.
Basic debugging using apitrace shows that the heat effect is rendered in an
interesting way:
* An EFB copy texture is created, using the hardware scaler to divide the
texture resolution by two and that way create the blur effect.
* This texture is then warped using indirect texturing: a deformation map is
used to "move" the texture coordinates used to sample the framebuffer copy.
Pixel shader: http://pastie.org/private/25oe1pqn6s0h5yieks1jfw
Interestingly, when looking at apitrace, the deformation texture was only 4x4
pixels... weird. It also does not have any feature that you would expect from a
deformation map. Seeing how the heat effect glitches, this deformation texture
being wrong looks like a good candidate for the problem. Let's see how it's
loaded!
By NOPing random calls to GXSetTevIndirect, we find a call that when removed
breaks the effect completely. The parameters used for this call come from the
results of methods of JPAExTexShapeArc objects. 3 different objects go through
this code path, by breaking each one we can notice that the one "controlling"
the heat effect is the one at 0x81575b98.
Following the path of this object a bit more, we can see that it has a method
called "getIndTexId". When this is called, the returned texture ID is used to
index a map and get a JPATextureArc object stored at 0x81577bec.
Nice feature of JPATextureArc: they have a getName method. For this object, it
returns "AK_kagerouInd01". We can probably use that to see how this texture
should look like, by loading it "manually" from the Wind Waker DVD.
Unfortunately I don't know how to do that. Fortunately @Abahbob got me the
texture I wanted in less than 10min after I asked him on Twitter.
AK_kagerouInd01 is a 32x32 texture that really looks like a deformation map:
http://i.imgur.com/0TfZEVj.png . Fun fact: "kagerou" means "heat haze" in JP.
So apparently we're not using the right texture object when rendering! The
GXTexObj that maps to the JPATextureArc is at offset 0x81577bf0 and points to
data at 0x80ed0460, but we're loading texture data from 0x0039d860 instead.
I started to suspect the BP write that loads the texture parameters "did not
work" somehow. Logged that and yes: nothing gets loaded to texture stage 1! ...
but it turns out this is normal, the deformation map is loaded to texture stage
5 (hardcoded in the DOL). Wait, why is the TextureCache trying to load from
texture stage 1 then?!
Because someone sucked at hex.
Fixes issue 2338.
At the moment, custom textures with:
- invalid mipmap size
- invalid aspect ratio
- non-fractional scaling factors
are allowed. But they can't be loaded fine by the backend, so generate a warning if someone trys to load them.
fixes issue 6898
OpenGL defaults are GL_REPEAT, which is even more unlikely than GL_CLAMP_TO_EDGE.
As I can't test the behavoir of the real hardware, I changed it to how it works before,
but I guess just clip the texture makes more sense.
Some settings that bootmanger reads from game ini can be changed while a game is running, so we don't have to revert these back to what they were when starting the game, unless they were actually changed by the game ini.
Fix signed/unsigned warnings that pauldacheez pointed out.
It didn't make sense. The math was nonsensical. Calibration data was somehow applied twice. I don't even.
This reverts commit 4dad640d5f.
Fixed issue 6702.
SSE do support non-vector instructions, but they _all_ overwrite the dest register
if the src location isn't a register. (wtf?)
So we have to load the src into a temporay register :(
Parsing Gecko codes (in any manner) is much like parsing HTML with regex
- that w̷a̶y̸ l̵i̷e̴s̵ m̴̲a̵͈d̵̝n̵̙ę̵͎̞̼̙̼͔̞͖͎̝s̵̨̬̱͍͓͉̠̯̤͙̝s̷͍̲̲̭̼͍͎͖̤̭̘. Luckily, with the embedded codehandler.bin,
the monstrosity may remain at only one implementation. Anyway, removing
the inserted_asm_codes thing probably speeds up the interpreter a bit.
MemArena mmaps the emulated memory from a file in order to get the same
mapping at multiple addresses. A file which, formerly, was located at a
static filename: it was unlinked after creation, but the open did not
use O_EXCL, so if two instances started up on the same system at just
the right time, they would get the same memory. Naturally, this caused
extremely mysterious crashes, but only in Netplay, where the game is
automatically started when the client receives a broadcast from the
server, so races are actually quite likely.
And switch to shm_open, because it fits the bill better and avoids any
issues with using /tmp.
This flag wasn't cleared at all, so we set our constants dirty every time...
This could fix some performance regressions because of revision 6798a4763e
Real xfb didn't provide any read_stride, so there is a division by zero.
This commit calculates the correct read_stride for real_xfb, so there is also no hack for texture vs xfb needed.
This removes the redundant code and also implements this feature for OSX and Wayland.
But so it's dropped for non-wx builds...
imo DolphinWX still isn't the best place for this, but now it's in the same file as all other hotkeys. Maybe they'll be moved to InputCommon sometimes at once ...
This is done with a pixel buffer object. We still have to stall the GPU, but
we only do it once per efb2ram call.
As the cpu can't access the vram, it has to queue a memcpy for the gpu and
wait for the gpu to finish this copy. We did this for every cache line which
is just stupid. Now we copy the complete texture into a pbo and readback this
at once. So we don't have to wait for lots of round-trip-times.
Also use attributeless rendering. But we need the src rect, so set it by uniform.
If there is a slowdown here (I doubt as the driver likely has a fast path to update uniforms)
then we should check if this rect changes and only then update the uniform.
We use attributeless rendering, so officially we have to bind _any_ VAO.
As the state of this VAO doesn't matter, we don't have to switch it.
Also fix an AMD issue as they don't like to render from an empty VAO.
We neither scale nor render from subimages, so we by using gl_Position, we don't have to generate _any_ vertices for this converting.
Also remove the glTexSubImage optimization as every driver does it when needed. But there are some workflows (eg on APU) where it's better to realloc this texture instead of a second memcpy or stall.
This fixes Real XFB Jaggies in OpenGL on games which use yscaling, such
as most PAL games.
This fixes the last of the "Real XFB Macroblocking" issues for opengl,
see issue #6503
Seems OpenGL ES 3 Requires you must have an lod argument, while Desktop
versions require you must not have a lod argument if you are using a
Sampler2DRect (which doesn't do Mipmapping).