Our defines were never clear between what meant 64bit or x86_64
This makes a clear cut between bitness and architecture.
This commit also has the side effect of bringing up aarch64 compiling support.
- remove unused variables
- reduce the scope where it makes sense
- correct limits (did you know that strcat()'s last parameter does not
include the \0 that is always added?)
- set some free()'d pointers to NULL
inline halfxb
So we know which is the first pixel by masking.
inline xl
inline xb a bit
inline yl
inline uv1.x shift
remove likely wrong guessed ternary operator
add pixel layout comment
inline xel
optimize the shifts a bit
inline xb
optimize shifts in a second step
extract xb
rename all variables
calculate cache line by position.x
Revert 5115b459f40d53044cd7a858f52e6e876e1211b4 "optimize the shifts a bit"
It seems I was wrong, the other way is the more natural.
use x_virtual_position instead of uv1.x for x_offset_in_block
This looks more natural and the offset should be masked anyway.
substitude factor with cache_lines
move 32bit logic in a conditional block
This option was known to break every second game and only boost a bit.
It also seems to be broken because of streaming into pinned memory and buffer storage buffers.
v2: also remove dlc_desc
Older Qualcomm drivers rotated the framebuffer 90 degrees and this fix didn't work.
Now for some obscene reason it rotates a full 180 degrees.
This can at least be worked around by flipping around the image on our end.
On Windows, nvidia don't give us their driver version, so we can't workaround any issues.
As buffer_storage is broken on some drivers, we wanted to disble it for them.
So we can't.
Luckyly only "some" released driver versions are affected as this extension is only available since some months. Let's hope that nobody have to use one of this driver version, else they will get a black screen ...
OSX has their own driver, so performance issues aren't shared with the nvidia driver (unlike the closed source linux and windows nvidia driver). So now they'll also use the MapAndSync backend like all other osx drivers.
fixes issue 6596
I've also cleaned up the if/else block selecting the best backend a bit.
The new chapter title in Paper Mario TTYD had a small graphical bug due to the new code because it read one extra pixel, this fixes it.
I hope this gets everything, I though I had checked most bugs and yet here I am, commit-spamming...
Fixes some remaining bbox related bugs in Mickey's Magical Mirror and a slight graphical glitch in Paper Mario: TTYD when flipping and Vivian as your companion (I've been scratching my head for days to find this one).
Instead of being vertex-based, it is now primitive (point, line or dissected triangle) based, with proper clipping.
Also, screen position is now calculated based on viewport values, instead of "guesstimating".
This fixes many graphical glitches in Paper Mario: TTYD and Super Paper Mario.
Also, the new code allows Mickey's Magical Mirror and Disney's Hide & Sneak to work (mostly) bug-free. I changed their inis to use bbox.
These changes have a slight cost in performance when bbox is being used (rare), mostly due to the new clipping algorithm.
Please check for any regressions or crashes.
This branch is the final step of fully supporting both OpenGL and OpenGL ES in the same binary.
This of course only applies to EGL and won't work for GLX/AGL/WGL since they don't really support GL ES.
The changes here actually aren't too terrible, basically change every #ifdef USE_GLES to a runtime check.
This adds a DetectMode() function to the EGL context backend.
EGL will iterate through each of the configs and check for GL, GLES3_KHR, and GLES2 bits
After that it'll change the mode from _DETECT to whichever one is the best supported.
After that point we'll just create a context with the mode that was detected
As we do lots of writes to *Iptr, the compiler isn't allowed to cache any shared variable (neither index nor Iptr itself).
This commit inlines Iptr + index into the index generator functions, so the compiler know that they are const.
We are used to render them out of order as long as everything else matches, but rendering order does matter, so we have to flush on primitive switch. This commit implements this flush.
Also as we flush on primitive switch, we don't have to create three different index buffers. All indices are now stored in one buffer.
This will slow down games which switch often primitive types (eg ztp), but it should be more accurate.
This "u32 components" is a list of flags which attributes of the vertex loader are present.
We are used to append this variable to lots of vertex generation functions, but some of them don't need it at all.
The usual way to handle this kind of request is to rise a flag which the gpu thread polls.
The gpu thread itself either generates the result or just write zeros if disabled.
After this, it rise another flag which says that this work is done.
So if disabled, we still have the cpu-gpu round trip time. This commit just returns 0 on the cpu thread
instead of playing ping pong...
Some information on this bug since this isn't quite true.
Seemingly with the v53 driver, Qualcomm has actually fixed this bug. So we can dynamically access UBO array members.
The issue that is cropping up is actually converting our attribute 'fposmtx' to an integer.
int posmtx = int(fpostmtx);
This line causes some seemingly garbage values to enter in to the posmtx variable.
Not sure exactly why it is failing, probably them just not actually converting the float to an integer and just handling the float directly as a integer.
So the bug is going to stay active with Qualcomm devices until we convert this vertex attribute from a float to a integer.
Let's talk a bit about this bug. 12nd oldest bug not fixed in Dolphin, it was a
lot of fun to debug and it kept me busy for a while :)
Shoutout to Nintendo for framework.map, without which this could have taken a
lot longer.
Basic debugging using apitrace shows that the heat effect is rendered in an
interesting way:
* An EFB copy texture is created, using the hardware scaler to divide the
texture resolution by two and that way create the blur effect.
* This texture is then warped using indirect texturing: a deformation map is
used to "move" the texture coordinates used to sample the framebuffer copy.
Pixel shader: http://pastie.org/private/25oe1pqn6s0h5yieks1jfw
Interestingly, when looking at apitrace, the deformation texture was only 4x4
pixels... weird. It also does not have any feature that you would expect from a
deformation map. Seeing how the heat effect glitches, this deformation texture
being wrong looks like a good candidate for the problem. Let's see how it's
loaded!
By NOPing random calls to GXSetTevIndirect, we find a call that when removed
breaks the effect completely. The parameters used for this call come from the
results of methods of JPAExTexShapeArc objects. 3 different objects go through
this code path, by breaking each one we can notice that the one "controlling"
the heat effect is the one at 0x81575b98.
Following the path of this object a bit more, we can see that it has a method
called "getIndTexId". When this is called, the returned texture ID is used to
index a map and get a JPATextureArc object stored at 0x81577bec.
Nice feature of JPATextureArc: they have a getName method. For this object, it
returns "AK_kagerouInd01". We can probably use that to see how this texture
should look like, by loading it "manually" from the Wind Waker DVD.
Unfortunately I don't know how to do that. Fortunately @Abahbob got me the
texture I wanted in less than 10min after I asked him on Twitter.
AK_kagerouInd01 is a 32x32 texture that really looks like a deformation map:
http://i.imgur.com/0TfZEVj.png . Fun fact: "kagerou" means "heat haze" in JP.
So apparently we're not using the right texture object when rendering! The
GXTexObj that maps to the JPATextureArc is at offset 0x81577bf0 and points to
data at 0x80ed0460, but we're loading texture data from 0x0039d860 instead.
I started to suspect the BP write that loads the texture parameters "did not
work" somehow. Logged that and yes: nothing gets loaded to texture stage 1! ...
but it turns out this is normal, the deformation map is loaded to texture stage
5 (hardcoded in the DOL). Wait, why is the TextureCache trying to load from
texture stage 1 then?!
Because someone sucked at hex.
Fixes issue 2338.
At the moment, custom textures with:
- invalid mipmap size
- invalid aspect ratio
- non-fractional scaling factors
are allowed. But they can't be loaded fine by the backend, so generate a warning if someone trys to load them.
fixes issue 6898
OpenGL defaults are GL_REPEAT, which is even more unlikely than GL_CLAMP_TO_EDGE.
As I can't test the behavoir of the real hardware, I changed it to how it works before,
but I guess just clip the texture makes more sense.
This flag wasn't cleared at all, so we set our constants dirty every time...
This could fix some performance regressions because of revision 6798a4763e
This removes the redundant code and also implements this feature for OSX and Wayland.
But so it's dropped for non-wx builds...
imo DolphinWX still isn't the best place for this, but now it's in the same file as all other hotkeys. Maybe they'll be moved to InputCommon sometimes at once ...
Now OGL doesn't rely on WX for PNG saving.
FlipImageData supports (pixel data len > 3) now.
TextureToPng is now in ImageWrite.cpp/h
Video Common depends on zlib and png.
D3D no longer depends on zlib and png.
Revert "Actually, filename really does need to be a parameter because of some random debug thing."
Revert "fix non-HAVE_WX case"
Revert "Handle screenshot saving in RenderBase. Removes dependency on D3DX11 for screenshots (texture dumping is still broken)."
This reverts commits 00fe5057f1, 74b5fb3ab4, cd46138d29 and 5f72542e06 because taking screenshots in D3D still crashed for me so there was no point in the code changes (which I found ugly anyway).
This reverts commit 6cece6b486.
In fact, there was a _huge_ speedup on lots of games (mostly on nvidia+ogl), but there are some crashes on D3D.
I have to fix this crash and then I'll commit something like this again :-)
Conflicts:
Source/Core/VideoCommon/Src/TextureCacheBase.cpp
We often need the same native texture objects for new textures. This commit
try to avoid destroying and creation of this textures by pooling them.
This should be a big performance gain for some efb2ram games as they may
overwrites partially a cached texture (which would be deleted) and afterwards
try to read it.
Creating/destroying sounds like an easy task, but it isn't. eg the nvidia ogl
driver synchonize their threads do avoid use-after-free issues.
D3D doesn't allow bigger viewports than rendertargets. But flipper does, so the viewport will be clipped and the transformation matrix will be changed.
This was done in the D3D backend itself. This is now moved into VideoCommon. This don't reduce code, but in this way, VideoCommon doesn't depend on the backends.
This isn't needed for both OGL+D3D11 as they support sample shading directly. So we
could use the common MSAA util shaders instead of writing custom ones.
* Currently there is no DEBUGFAST configuration. Defining DEBUGFAST as a preprocessor definition in Base.props (or a global header) enables it for now, pending a better method. This was done to make managing the build harder to screw up. However it may not even be an issue anymore with the new .props usage.
* D3DX11SaveTextureToFile usage is dropped and not replaced.
* If you have $(DXSDK_DIR) in your global property sheets (Microsoft.Cpp.$(PlatformName).user), you need to remove it. The build will error out with a message if it's configured incorrectly.
* If you are on Windows 8 or above, you no longer need the June 2010 DirectX SDK installed to build dolphin. If you are in this situation, it is still required if you want your built binaries to be able to use XAudio2 and XInput on previous Windows versions.
* GLew updated to 1.10.0
* compiler switches added: /volatile:iso, /d2Zi+
* LTCG available via msbuild property: DolphinRelease
* SDL updated to 2.0.0
* All Externals (excl. OpenAL and SDL) are built from source.
* Now uses STL version of std::{mutex,condition_variable,thread}
* Now uses Build as root directory for *all* intermediate files
* Binary directory is populated as post-build msbuild action
* .gitignore is simplified
* UnitTests project is no longer compiled
D3D9 only supports 8 texcoords. But we need a new one for ppl, so we just store it in the first 4 texcoords in the free 4th component.
This isn't needed for both d3d11 and ogl3, so just remove it.
This isn't needed in VertexShaderManager as it's still in the old dirty flag way.
But it's very importend for PixelShaderManager as some float4s aren't initialized as 0.0f
The old way was to use a dirty flag per setter. Now we just update the const buffer per setter directly.
The old optimization isn't needed any more as the setters don't call the backend any more.
The follow parts are rewritten:
Alpha
ZTextureType
zbias
FogParam
FogColor
Color
TexDim
IndMatrix
MaterialColor
FogRangeAdjust
Lights
The upload in the backend isn't done, it's just pushed by the mostly removed SetMulti*SConstant4fv.
Also no optimizations was done on VideoCommon side, but I can start now :-)
Sorry for the hacky way, but I think this is a nice (working) snapshot for a much bigger change.
It changes the string in the Android backend select to just OpenGL ES.
Adds a check in the Android code to check for Tegra 4 and to enable the option to select the OpenGL ES backend.
Adds a DriverDetails bug under BUG_ISTEGRA as a blanket case of Tegra 4 support.
The changes that effects most lines in this change. Removing all float suffixes in the pixel/vertex/util shaders since OpenGL ES 2 doesn't support float suffixes.
Disables the shaders for reinterpreting the EFB format since Tegra 4 doesn't support integers.
Changes GLFunctions.cpp to grab the correct Tegra extension functions.
Readds the GLSL 1.2 'hacks' as GLSLES2 'hacks' since they are required for GLSL ES 2
Adds a GLSLES2 to the GLSL_VERSION enum.
Disable the SamplerCache on Tegra since Tegra doesn't support samplers...
Enable glBufferSubData on Tegra since it is the only mobile GPU to correctly work with it.
Disable glDrawRangeElements on Tegra since it doesn't support it, This uses glDrawElements instead.
- Call ABI_AlignStack even on x86-64.
- Have ABI_AlignStack respect the difference in current alignment
between the root JIT function, which has a prolog, and
ProtectFunction thunks, which do not. This was causing many games
to crash on start on OS X. Since this might otherwise mean changing
the stack pointer before every call...
- Have one prolog/epilog function rather than two (one of which
definitely did not do what it was thought to do), and make it
actually work like a normal one, so that the stack frame shows up
properly in the debugger. There should be no performance impact.
attn is sometimes very big (eg 1e27), so attn*attn doesn't fit into a float.
So the funny part here is: 0.0 * (1e27*1e27) = 0.0 * Inf = NaN
As the shader compiler is allowed to change the order of multiplications,
this issue isn't fixed completely.
Use strings internally, use a multimap and std::function for callbacks (instead
of a flat vector + loop over the vector to find the right callback type), fix
coding style issues. Simplify MainAndroid code a bit.
It's possible to configure to use the vertex color as lightning source without enabling the vertex color at all.
The old implementation will use zero, but it seems to be wrong (prooven by THPS3), more likely is to disable
the lightning and just return the global color.
This fixes THPS3 on OpenGL, but it isn't verifed on hardware
- rewrite loops to not use divisions and multiplications
- remove warnings as the current implementations seems to be correct (ignore additional vertices)
use always ppd is a huge gpu performance drop: 20%-50%
and always disable it cause some rendering issues
so there is an option again
But this time it's called "Fast Depth Calculation"
fix issue 6282
The Last Story seems to render a fan with two vertices. It is non-sense as it
shouldn't do anything, but the code underflows at (u32)numVerts-3
Block braces on new lines.
Also killed off trailing whitespace and dangling elses.
Spaced some things out to make them more readable (only in places where it looked like a bit of a clusterfuck).
See Render.cpp, PixelShaderGen.cpp, and PixelShaderManager.cpp for most of the changes.
See VertexShaderManager.cpp for a logging typo fix.
See SWRenderer.cpp for a small typo fix for a message that gets swprintf'd in DrawDebugText.
See SWVertexLoader.cpp for a typo fix of an assert message.
Should slightly improve the readability of some of those files.
This commit mainly elaborates on some messages a little more. Also fixes some typos that slipped through the last commit.
A large change in text can be seen in EXI_DeviceMemoryCard.cpp. I added more info as to why a write to a memory card may fail. (This actually was a reason I was unable to write to a memcard recently).
Elaborations can be seen in WGL.cpp
I did change some comments in some files that I was correcting logging messages in, however this is only if I spot a typo or if an abbreviation is lower-cased. Even in that case, the amount of changes done to comments is very minimal.
Sorry for a direct commit to the main branch but i need fast feedback, and i don't want to leave problematic code in the main branch for a long time.
if this approach does not work for the drivers with problems will transform dual source blend to an option in the D3D9 backend.
I appreciate the help of the people that tested my last commit and thanks to neobrain for pointing this solution.
Convert all quads+triangles into trangle_strip and uses primitive restart to split them.
Speed up triangle_strip, but slows down all others primitive formats.
Only implemented in ogl.
this implementation does not work in windows xp (sorry no support for dual source blending there).
this should improve speed on older hardware or in newer hardware using super sampling.
disable partial fix for 4x supersampling as I'm interested in knowing the original issue with the implementation to fix it correctly.
remove the deprecation label from the plugin while I'm working on it.
* Fast-EE:
Forced the exception check only for ARAM DMA transfers. Removed the Eternal Darkness boot hack and replaced it with an exception check.
Reverted rd76ca5783743 as it was made obsolete by r1d550f4496e4.
Removed the tracking of the FIFO Writes as it was made obsolete by r1d550f4496e4.
Forced the external exception check to occur sooner by changing the downcount.
# By skidau (30) and Pierre Bourdon (1)
* FIFO-BP: (31 commits)
Set g_bSignalTokenInterrupt on the main thread. Fixes the random hang in Harry Potter: Prisoner of Azkaban.
Used a scheduled event to generate the ARAM DMA interrupt if the DMA is greater than a certain size. Fixes NFS:HP2 GC.
Bumped up the disc transfer speed enough to prevent audio stuttering in Gauntlet: Dark Legacy.
Enabled Synchronise GPU on "SPEED CHALLENGE - Jacques Villeneuve's Racing Vision". Required to go in-game.
Added direct GameCube controller commands to the Serial Interface emulation. Fixes the controls in MaxPlay Classic Games Volume 1 and the Action Replay disc.
Increased the FIFO buffer size to 2MB from 1MB. Fixes Killer 7's Angel boss.
Used an immediate GenerateDSPInterrupt when transferring data from ARAM to MRAM and a scheduled DSP interrupt when transferring data from MRAM to ARAM.
Fixes the audio cutting in and out in the Resident Evil GC games using DSP HLE. Triggered the ARAM interrupt by the scheduler instead of directly in function.
Implemented proper timing for the sample counter in the AudioInterface, removing the previous hack. Cleaned up some of the audio streaming code.
Skipped the EE check if there is a CP interrupt pending.
Disabled "Speed up disc transfer" from the ZTP GC game ini.
Removed the disc seek times for GC games and removed the disc speed option on Wii games. Checked for external exceptions only in mtmsr.
Delayed the interrupts in the EXI Channel.
Merge aram-dma-fixes (r76a13604ef49b522281af75675f044d59a74e871)
Added a patch that bypasses the FIFO reset code in Wallace and Gromit: Project Zoo, allowing it to go in-game.
Made vertex loading take constant time.
Increased the cycle time of the vertex command. Fixes "Speed Challenge: Jacques Villeneuve's Racing Vision".
Moved the setting of the Finish interrupt signal back to the main thread as it was causing Wii games like Resident Evil 4 (Wii) to hang.
Profile stores, fp stores and ps stores only to the fifo write addresses list. This should make the JIT a little faster as it will not be checking for external exceptions unnecessarily.
...
Conflicts:
Source/Core/VideoCommon/Src/PixelEngine.cpp
Adds support for PE performance metrics.
Used in Super Mario Sunshine's "Scrubbing Sirena Beach" level to determine when enough goop has been cleaned up to finish the level.
Also used in TimeSplitters: Future Perfect to determine the appearance of flares around light sources (e.g. sun).
OpenGL and D3D11 only. D3D9 support unlikely to be added unless anyone bothers to do the work.
Initial work and D3D11 support by me. Kudos go to Billiard for adding the OpenGL support and reviving development of this branch that way :D
Slightly (~7%) decreases performance when performance metrics are used (and only then).
Fixes issue 1498.
Fixes issue 5368.
# By Jordan Woyak (46) and others
# Via Jordan Woyak (2) and others
* master: (70 commits)
Fixes two memory leaks, one is pretty bad for OSX. Yell at pauldachz if this doesn't work. Or... say thanks.
Added a BluetoothEnumerateInstalledServices call so that the wiimote remembers the pairing.
Make ARMJit core default CPU core on ARM architecture
Fix a StringUtil regression from the arm-noglsl merge
Small improvement to cmpli/cmpi in ARMJit.
Merge latest ArmEmitter changes from ppsspp while we're at it.
Ah. I blame vim on this typo entirely.
Add disabled code for authenticating wiimotes on Windows.
Add the missing FPR cache
Buildfix.
Yell at the user if they change window size while dumping frames, and some other avi dumping stuff.
Not sure if this is the right way to handle this, but it makes the save states perfectly stable. That's all that really matters, right?
Abort loading states from incompatible graphics backends.
ARM Support without GLSL
Improve VideoSoftware save states. They are fairly stable, but not perfect. OpcodeDecoder::DoState() needs to be fixed.
Begin implementing save states to video software. Kind of works, sometimes.
Make error message for loading save state with wrong dsp engine shorter.
Abort load state if it uses a different dsp engine, instead of crashing.
Update the gameini of F-zero. Efb to Ram is no longer the default choice.
fix last commit by neobrain
...
Conflicts:
Source/Core/VideoCommon/Src/Fifo.cpp
This reverts commit 4653adecf1.
Also dx9 isn't able to hanlde more than 11 varying registers.
More frustrating is the lightning issue by this commit. I don't know why it happens...
# By Jordan Woyak (9) and others
* master:
Fixed a buffer overflow in the OpenAL buffer.
TextureCache: Fix D3D backends crashing when a game uses multiple 1x1-sized LODs.
WII_IPC_HLE_Device_FileIO: don't rebuild the filename on every operation.
Some cleanup of CWII_IPC_HLE_Device_FileIO: The real file was never kept open for longer than a single operation so there was no point in dealing with it in DoState. Saving the real path in the savestate was also probably a bad idea. Savestates should be a bit more portable now.
Removing destination on rename when source isn't present doesn't make sense. IOCTL_RENAME_FILE still might not be totally correct.
Change some CNANDContentLoader logic to what was probably intended. Kills some warn logs when opening Dolphin.
Let's not CreateDir an empty string every time CreateFullPath is used, logging an error every time.
Fix a memleak. Probably/maybe improve USBGecko performance.
Remove the core count from the cpu info OSD message. It was often wrong and not rather important.
Use omp_get_num_procs to set the number of OpenMP threads rather than our core count detection.
Bulk send TCP data to the client with the emulated USB Gecko.
Added the ability to reverse the direction of the force feedback by allowing negative range values.
Changes/cleanup to TextureCache::Load and other mipmap related code. The significant change is what is now line 520 of TextureCacheBase.cpp: ((std::max(mipWidth, bsw) * std::max(mipHeight, bsh) * bsdepth) >> 1) to TexDecoder_GetTextureSizeInBytes(expanded_mip_width, expanded_mip_height, texformat);
The significant change is what is now line 520 of TextureCacheBase.cpp:
((std::max(mipWidth, bsw) * std::max(mipHeight, bsh) * bsdepth) >> 1)
to
TexDecoder_GetTextureSizeInBytes(expanded_mip_width, expanded_mip_height, texformat);
Fixes issue 5328.
Fixes issue 5461.
# By Jordan Woyak (24) and others
# Via Jordan Woyak (3) and others
* master: (66 commits)
Reduce some DI command delays. Fix DKCR hanging with DSP HLE. My other games continue to work.
Video_Software: Fix ZComploc option breaking stuff.
Video_Software: Fix the ZFreeze option doing nothing.
Video_Software: Toggable zfreeze and early_z support for testing.
Fix header guard and definitions not being set to 1
Add the option to turn on only the EGL interface to use desktop OpenGL with it.
Change the ugly "no banner" banner to the sexy "X" from the website.
Fix a crash in the FifoPlayer dialog.
Use different reply delays for various DI commands. Fixes issue 5983.
Revert "[bugfix] DX9::TextureCache: Use max_lod instead of min_lod where necessary."
Fix some potential issues when blending on EFB formats without alpha. Clean up state transition tables.
Disable play and record buttons if an iso was selected, but is later deselected.
Disable start/play recording buttons when no iso is selected.
Only delay DI and fs IPC replies. Fixes issue 5982.
Fix compilation with SDL2. (based on a patch from matthewharveys) Fixes issue 5971.
"Fix" using SDL from externals.
Clean up SDL includes a bit. Maybe fix an SDL2 problem.
Number "unknown" axes in OSX rather than call them all "unk".
Revert "Only delay DI command replies." Fix "Wii Party" again.
Hopefully make wiimote speaker less crappy.
...
Conflicts:
Source/Core/VideoCommon/Src/PixelShaderGen.cpp
Source/Plugins/Plugin_VideoSoftware/Src/SWRenderer.cpp
immediate-removal is a new created branch seperated from master but reverted the revert of immediate-removal
so we get less conflicts by merging
Depth calculations are always done in the pixel shader now.
Due to the unpredictability of our zcomploc hacks this commit probably changes the behavior of some games which use zcomploc.
That's what would've been done in the next TCB::Load() call, anyway.
Fixes issue 5742.
Additionally, change efb copies to specify 1 as the number of mipmaps because that makes more sense than anything else.
change naming in all the backends vertex managers to make more easy to continue with the merge an some future improvements.
please test this as i'm interested in knowing the performance in linux and windows with the different hardware platforms.
So to compensate lets bring back some speed to the emulation.
change a little the way the vertex are send to the gpu,
This first implementation changes dx9 a lot and dx11 a little to increase the parallelism between the cpu and gpu.
ogl: is my next step in ogl is a little more trickier so i have to take a little more time.
the original concept is Marcos idea, with my little touch to make it even more faster.
what to look for: SPEEEEEDDD :).
please test it a lot and let me know if you see any problem.
in dx9 the code is prepared to fall back to the previous implementation if your card does not support the amount of buffers needed.
So if you did not experience any speed gains you know where is the problem :).
for the ones with more experience and compression of the code please test changing the amount and size of the buffers to tune this for your specific machine.
The current values are the sweet spot for my machine.
All must Thanks Marcos, I hate him for giving good ideas when I'm full of work.
PS and VS making. Untested and won't work for now.
Add in program shader cache files.
Readd NativeVertexFormat stuffs.
Add in PS and VS cache things.
SetShaders in places.
Fixed EFB cache index computations in OpenGL renderer.
The previous computation was very likely to go out of array bounds,
which could result in crashes on EFB access.
Also, the cache size was rounded down instead of up. This is a problem
since EFB_HEIGHT (528) is not a multiple of EFB_CACHE_RECT_SIZE (64).
Very useful to compare performance between two builds, check the impact of
a configuration option, etc. FPS log is stored in User/Logs/fps.txt and is
reset each time you launch a game. Only enabled if you check the "Log FPS
to file" option in your graphics settings.
Could be improved a bit: currently logs only every 1s (so you can't really
see small variations), maybe output more infos to the fps.txt like
average/stddev (but Excel/Libreoffice/Google Docs can compute that easily
too).
Super Mario Sunshine is using a cool trick: To determine how much goop has been cleaned in ep. 6 of Sirena Beach, it counts the number of pixels that are input to the blending stage. For that it's using the PE performance registers ;)
Fixes issue 1498.
Reason:
- It's wrong, zcomploc can't be emulated perfectly in HW backends without severely impacting performance.
- It provides virtually no advantages over the previous hack while introducing lots of code.
- There is a better alternative: If people insist on having some sort of valid zcomploc emulation, I suggest rendering each primitive separately while using a _clean_ dual-pass approach to emulate zcomploc.
This reverts commit 0efd4e5c29.
This reverts commit b4ec836aca.
This reverts commit bb4c9e2205.
This reverts commit 146b02615c.
the intent is to replace the haphazard scheduling and finger-crossing associated with saving/loading with the correct and minimal necessary wait for each thread to reach a known safe location before commencing the savestate operation, and for any already-paused components to not need to be resumed to do so.
This branch reduces the number of useless state flushes in the video
emulation layer by checking whether a BP/XF change will have an effect
or not. Greatly reduces the number of GL calls per frame.
Thanks to degasus for his help!
When a game writes the same value that was already configured to a BP
register, Dolphin previously flushed the GPU pipeline and reconfigured
the internal video state (calling SetScissor/SetLineWidth/SetDepthMode).
Some of these useless writes still need to perform actions, for example
writes to the EFB copy trigger or the texture preload registers (which
need to reload the texture from memory).
This adds an "Analyzer" tab to the fifoplayer dialog which allows to conveniently browse through all register pokes that are being sent by the game each frame.
There's also a search function, but it doesn't work all that well for anything but simple searches at the moment. However, I'm merging this anyway since I'm not sure if I'm going to finish this.
Note that due to recent fifo changes, it's not yet possible to run fifoplayer in dual-core mode.
please test for regressions, speed and for other issues fixed, as a example, the black color in water splash in super mario galaxy are fixed with this rev.
please as soon as yo find a bug let me know.
a little code cleaning to avoid duplicated execution of AlphaPreTest and a little correction to some comments from the previous commits.
this change must behave exactly like last revision, if something is broken please let me know
to marcosvitali.
Added an external exception check when the CPU writes to the FIFO. This allows
the CPU time to service FIFO overflows. Fixes random hangs caused by FIFO
overflows and desyncs like in "The Last Story" and "Battalion Wars 2". Thanks
to marcosvitali for the research.
Added some code to unlink invalidated blocks so that the recompiled block can be
linked (speed-up).
This release still fixed the hangs produced by fifo overflow without sacrifice
performance. For example you can test Tutorial moves at the beginning of The last history now
is fluid 30/60.
Fixed possibles random hangs in DC mode.
Fixed hangs in DC mode in (Simpsons, Monkey Island, Pokemon XD, etc)
Implemented accurate management of Pixel Engine Interrupts. Now the GPU loop
is stopped when a PE Interrupt needs to be managed and resumed when Pixel Engine
finish.
Fixed Metroid Prime 3 and 2 desync. And other games with desync because of
FIFO Reset. That happens because FIFO_RW_DISTANCE_HI must be written first, for checking
fifo.CPReadWriteDistance == 0, so some fifo resets was not managed in the right
way.
Fixed Super Monkey Ball in some cases when the game write the
WriteReadDistance need to be safe like the SafeCPRead.
Improved the CheckException for the GatherPipe writes in JIT, now only the
External Exceptions are processed.
Fixed definitely Pokemon XD in dual core mode. This game is doing something
not allowed. It attach to CPU the same fifo attached to the GPU in multibuffer
mode. I added a check to prevent overwrite the GPU FIFO with the CPU FIFO. If
the game do that on breakpoint the solution can fail.
Fixed ReadWriteDistance calc when CPRead > CPWrite.
Added Token and Finish cause to GP Jit checking.
Additional cleanup in CommandProcessor.
Fixes issue 5209
Fixes issue 5055
Fixes issue 4889
Fixes issue 4061
Fixes issue 4010
Fixes issue 3902
This fix is not related with the previous commits, but the previous commits help me to see that because in the new scenery SMB was hanging. May be in the past also doesn't boot some times because of that.
Please Test FZero boot also. Thanks.
The external exceptions in dolphin are checking frequently but is different to real HW, so sometime the game is in a loop checking GPU STATUS, the exceptions doesn't checked, and the game hang.\
For solve this I need a trick: still waiting for the exception handler be linked but if CommandProcecsor is reading the GPStatus, resume this.
This fixed "TimeSplitters: Future Perfect" broken in the Revision c2e6fdf09f and surely others games.
That happens because FIFO_RW_DISTANCE_HI must be written first, for checking fifo.CPReadWriteDistance == 0, so some fifo resets was not managed in the right way.
I didn't test Metroid 2 desync reported in Issue 4336 but I think is the same.
About the flickering in MP2, I don't know for my is not related or yes, but you can test anyway.
Fixed Issue 3902
Well now the FIFO is 99.99% finished :)