Issue1: Depth buffer is wrongly invalidated: only the first page is detected.
Issue2: First page seems to be partially written. Could be a GSdx transfer bug.
Anyway, invalidation only supports page granularity.
So here is a quick workaround that clears the depth buffer in case of a very small partial write.
It might be worth checking for regressions on nocturne/digital saga
It would require some dynamic-width texture conversion shaders.
So as a quick solution, let's add a new CRC hack.
For issue #1362 (granted the CRC is correct)
Faster :) Further reduces the cost of accurate DATE
The optimization will clear the stencil to 1, so every pixel will have a
single sample that passes both the depth & stencil tests. No primitives
overlap, so the destination alpha test can be done directly in the
shader.
Games often use DATE to allow a single pixel to pass. If this
use case is detected, the stencil buffer will be cleared after the first pixels
that pass both the depth & stencil tests.
It seems to reduce the load on the GPU.
Note: with the help of texture barrier, maybe we could implement the algo
with a single pass.
If a color buffer is still attached and is smaller than depth buffer,
the latter won't be fully cleared.
As a faster alternative, use the GL4.4 clear texture function. It avoids fiddling with
the framebuffer and pixel tests.
Fix #1362x Ar Tonelico 2 map clip
So just reuse the GT HLE shader :) The Acid stage is now correct. However it might need
some tuning for other stages.
Still looks awful with upscaled resolution (note: the internal game framebuffer is around 160x128)
Until AMD releases a driver with a fix, I can't use the 2nd blending source with SSO.
So let's use the first source. Blending/Alpha will be wrong. But it is likely better
than an uninitialized alpha value.
"Enable Hardware Depth" removed from main dialog.
"Disable Depth Emulation" and "Fast Texture Invalidation" added to Hacks
dialog.
And fix lots of whitespace issues.
GSDX: Improvements to the config interface.
- GSDX: Add new logos to dialog
- GSDX: Remove all the extra null renderers
- GSDX: Changes to renderer combobox
- Sort all the renderers in ascending order. (the fact that D3D11 was
above D3D9 really annoyed me >_<)
- Properly display usage of D3D10/D3D11 on the combobox.
- Use highest available version of DX by default.
- GSDX: gray out upscaling hacks at native resolution
- GSDX-PSX: Modifications to the dialog
- Add new logos
- Remove SDL renderer from combobox since it was removed long ago.
GSOffset is already based on a lookup of PSM/BP/BW. Coverage only adds
the size parameters (so only 256 possibilities)
It replaces the hash lookup with a free array access.
Will use integral coordinate to avoid any rescaling.
Bilinear interpolation isn't supported. I don't think it is allowed to
filter a depth texture anyway.
The hypothesis is that game will use a depth (aka Z32/Z24/Z16/Z16S)
format when sampling depth texture as color. Technically one could use
a standard color format but block/pixel order won't be the same.
(otherwise I'm screwed)
=> Hypothesis invalid on GoW. They just do a scrambled rendering...
Lookup info:
* The first searched list is the depth pool as we search a depth
texture.
* 2nd one is the render target pool (if a depth was converted to a
render target already)
To avoid any CPU overhead, the source will be a pointer to the real texture
* Conversion (if float texture) will be done on the fly by the shader (GPU).
* Relative rescaling won't be supported. Texture must be fetched with
integral coordinate
Cache page coverage of texture into a hash map
Test done on Champion of Norrath (paltex + DisablePartialInvalidation)
Profiler:
Self of GSTextureCache::SourceMap::Add 5.39% => 0.23%
Self of GSTextureCache::LookupSource 15.27% => 10.82%
Hard to measure on CoN as it depends on memory transfer. Seems to be 5-10 fps faster.
This reverts commit 53690cf9d0.
Quoting user:
For aliasing, the option allow of reduce a little but always very
visible compared with DX11 even with anisotropic OFF, , furthermore
many textures bug added with option activated (predictable but not see
on DX11 with anisotropic ON).
TL;DR: it isn't worth it.
Note: it seems to work on DX because DX uses HW texturing in clamp region
mode (and other invalid cases). OpenGL uses SW texturing to ensure accuracy
* keep a reference to each program/pipeline created to ease deletion
* extend the API a bit to support multiple pipelines
The final goal is to use a pre-linked pipeline for SW shaders and
the default pipeline for HW shaders.
By default, anisotropic filtering was disabled when textures aren't continuous.
This hack allows forcing it. It can help to reduce aliasing but it can create
unexpected effects on texture boundaries.
Again, someone ought to add the option on Windows too
Someone ought to add the Windows option too (and DisablePartialInvalidation too)
It might break a couple of games but most of them run better with depth enabled.
Fixes an issue with the D3D backends crashing if the configure dialog
is accessed and ok is pressed. The D3Dcompiler dll is freed and a null
pointer is dereferenced.
It might break gsdxgui but GSshutdown really should not be called unless
GSdx is shutting down. GSDumpGUI on Windows provides the same (or
better) functionality.
* Silent Hill 2 doesn't need the CRC hack
* GSRenderer: no need to explicitly set bottom value for r.
* Texture Cache: Removed a check which couldn't possibly enter true
branch.
For some reason some Windows 7 systems (most are unaffected) cannot cope
with LoadLibraryEx and return error code 87 - "The parameter is
incorrect".
Switch to using LoadLibrary instead for any case where Windows 7 is
expected to successfully load the requested dll. Potentially Windows
Vista is also affected.
So let's increase the height. It will increase the memory requirement on some games
v2: try to do it automatically
(not sure it will be useful as most games will require it)
v3: let's go back to a hardcoded 1280 size. It generates too many issues
Try to avoid random black screen frame
v2: don't force the preload hack on the frame
It creates a ghost image over FMV
v3: support offset within a frame
The long story:
Game blits FMV far away from the RT, which is actually the input of the RO texture...
Currently GSdx suffers from 2 bugs.
1/ RT is too small
2/ texture isn't properly updated with the rendered value. Texture is invalidated
but it reads back the pixels from the GS memory whereas the correct
value is located on the GPU.
This commit will replace the standard draw with a manual blit. Therefore it avoids
the size issue and the bad upscaling issue.
v2:
* Use various copies to be more compatible with the DX API
* Move all parts of the hack into the BlitFMV function
v3: add log message
It often happens that the game tries to upload the FMV directly, which typically
gave a black screen.
This commit fixes Rule of Rose and, I hope, various black screen FMVs
Performance impact must be tested, and I'm afraid of strange texture cache behavior.
V2: check the size of the transfer too
V3: add support of 16 bits format
V4: avoid division by 0
Using D3DX11 requires the end user to install the DirectX redist files.
Switch to using D3DCompile, and distribute D3DCompiler_47.dll for
Windows Vista, 7 and 8 users (Windows 8.1 onwards supplies
D3DCompiler_47.dll with the OS).
It actually removes the previous hack that read the full target.
Unfortunately Snowblind engine games use a big target, so the read is very big too (1280x448),
which kills performance, whereas the game requires only 24x12 texels.
Gives a 2x speed boost on Champion of Norrath !!!
Games use very special textures with lots of repetition.
It is much faster to send the full texture rather than trying to partially invalidate it.
On my gs dump:
FPS: 29 => 68 !
Basically I ran
find . -name "*.vcxproj" -exec sed -i -e 's/_xp//' {} \;
This will likely break XP, but it paves the way on Windows for a PCSX2
that does not require the DirectX redistributables to be installed for
Windows 8, 8.1 and 10 users. Windows Vista and 7 users will still require
the DirectX redistributable files for XInput and XAudio, though PCSX2
should still be capable of running if a user does not actually use either
of them.
All GL4 extensions supported by DX10 class GPUs will soon be mandatory
Namely:
* GL_ARB_copy_image
* GL_ARB_texture_barrier
* GL_ARB_clip_control
* GL_ARB_direct_state_access
* GL_ARB_separate_shader_objects
* GL_ARB_buffer_storage
There are likely a few games (RE4) that constantly change the FBW register value, causing the framebuffer width to be updated at every interval. Adding a safe limit (512), similar to the one used for the frame buffer height, prevents such constant changes of the framebuffer width when FBW changes once again to an even lower value.
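A minimal sketch of the width floor described above, not the actual GSdx code. The helper name and constants are hypothetical; the only facts taken as given are the 512 limit from this message and that FBW is expressed in units of 64 pixels.

```cpp
#include <algorithm>

// FBW is expressed in units of 64 pixels.
constexpr int kPixelsPerFbwUnit = 64;
// Safe lower limit named in the commit message.
constexpr int kMinFramebufferWidth = 512;

// Hypothetical helper (assumption, not from the source): widths below the
// limit are raised to it, so small FBW values all map to the same width and
// no longer trigger a framebuffer resize.
inline int FramebufferWidthFromFbw(int fbw)
{
    return std::max(fbw * kPixelsPerFbwUnit, kMinFramebufferWidth);
}
```

With this floor, FBW values of 5 and 8 both give 512, while FBW 10 still gives 640.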
Valid values for png_compression_level are from 0 (no compression) to 9
(max compression). The default is 1.
v2: Use zlib Z_BEST_SPEED (1) and Z_BEST_COMPRESSION (9) defines.
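A minimal sketch of how a configured png_compression_level could be validated. The helper name is hypothetical, and falling back to the default (rather than clamping) for out-of-range values is an assumption. The constants mirror the zlib defines so the snippet builds without zlib.

```cpp
// These values mirror zlib's Z_NO_COMPRESSION, Z_BEST_SPEED and
// Z_BEST_COMPRESSION so the sketch builds without zlib.
constexpr int kNoCompression = 0;
constexpr int kBestSpeed = 1;
constexpr int kBestCompression = 9;

// Hypothetical helper (assumption, not from the source): out-of-range
// values fall back to the default level of 1 instead of being clamped.
inline int SanitizePngCompressionLevel(int level)
{
    if (level < kNoCompression || level > kBestCompression)
        return kBestSpeed;
    return level;
}
```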
The following patch uses the height value of the display rectangle rather than estimating the frame buffer height when the game uses a non-referenceable height (or) width.
Don't look up a depth buffer if the depth test always passes without write
Boost performance on Tekken5 when depth emulation is enabled in OpenGL
(Tekken5 sets same address for both the RT and the depth but depth is disabled)
v2:
Keep ds if DATE is enabled (some implementations use a stencil buffer)
Be more aggressive to avoid a useless depth lookup
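The rule above can be sketched as a small predicate. The struct and function names are hypothetical and this is not the GSdx implementation; the ZTST encoding follows the GS register values.

```cpp
// GS ZTST register encoding.
enum ZTest { ZTST_NEVER = 0, ZTST_ALWAYS = 1, ZTST_GEQUAL = 2, ZTST_GREATER = 3 };

// Hypothetical snapshot of the state that matters for the decision.
struct DepthState
{
    ZTest ztst;       // depth test function
    bool write_masked; // true when depth writes are disabled (ZMSK)
    bool date;        // destination alpha test enabled
};

// Returns true when a depth/stencil buffer must be looked up.
// A depth test that always passes and never writes cannot affect the draw,
// so the lookup is skipped unless DATE may need the stencil buffer.
inline bool NeedsDepthLookup(const DepthState& s)
{
    const bool depth_is_noop = (s.ztst == ZTST_ALWAYS) && s.write_masked;
    return !depth_is_noop || s.date;
}
```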
Texture coordinate could be dummy/float/int integral/int normalized.
Old behavior:
* VS was in charge of selecting the texture coordinate
* int integral format wasn't supported
New behavior:
* Always compute all formats
* FS will be in charge of selecting the right format
Impact:
* VS will be slightly slower but it reduces shaders permutation from
little to 0 (won't be bad for CPU)
* FS speed isn't impacted as 2 separate code paths were already required
to support both formats
* Rasterizer will be 33% slower but is unlikely to be the limiting factor of
the GPU
* In future we could directly use the integral format in the FS.
V2: remove useless PSin_t
It would be on by default. Unsafe & fast path.
The hack is a safety net in case someone encounters an issue
v2: update Windows gui file
v3: fix typo in tooltip and linux gui
Visual Studio Find and Replace can only be trusted if all the files are
included in the project. I suppose it's time to add any missing files
to the relevant projects...
The old implementation saved the current value of a GSSetting as uint in
a field called 'id'. The implementation of GSSettings suggests that
GSSettings could be saved in a database with id as primary key. This
would require a translation look up from id to value but could have all
advantages of a database. However the interface to GSSetting was never
implemented like that.
In the new implementation GSSetting has a 'value' field that stores an
int representative value of the desired state. Additionally the
constructor is 'overloaded' as template to reduce casting in the
consumer code. However all consumer values need to be castable to int.
Accordingly combobox initialization was adjusted.
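A minimal sketch of the shape described above, assuming a `name` and `note` alongside the `value` field. Only the `value` field and the templated constructor come from this message; the other members and the `DemoMode` enum are hypothetical.

```cpp
#include <string>

// Sketch of a setting entry that stores its state as an int.
struct GSSetting
{
    int value;        // int representation of the desired state
    std::string name; // assumed display name (not confirmed by the source)
    std::string note; // assumed extra note (not confirmed by the source)

    // Templated constructor: the caller can pass an enum or an unsigned
    // value without casting, as long as it is castable to int.
    template <typename T>
    GSSetting(T v, const char* setting_name, const char* setting_note)
        : value(static_cast<int>(v)), name(setting_name), note(setting_note)
    {
    }
};

// Hypothetical consumer-side enum used only for illustration.
enum class DemoMode { Off = 0, Fast = 1, Full = 2 };
```

For example, `GSSetting(DemoMode::Full, "Full", "")` stores 2 in `value` without a cast at the call site.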
Initially it was free to do the SW blending because safe fbmask
will already do a sw blending.
Unsafe version uses a fast path with a limited blending. Therefore
SW blending isn't free anymore.
Improve the speed of the previous speed hack (xenosaga 1)
The hack relies on the undefined behavior of the hardware so it can
potentially generate rendering corruption.
This new hack drops the cache flushing when only the alpha channel is masked.
Alpha is a direct copy of the fragment. Normally masked bits will be constant
everywhere (RT, FS output, texture cache) so it would likely work.
Just in case, code is only enabled with the new shiny hack
The previous behaviour loaded the saved renderer config whenever the
adapter combobox was changed. The renderer will now only change if the
new adapter doesn't support the currently selected renderer (i.e
Direct3D11 might not be supported, so it'll revert to Direct3D 9).
Fixes #1080.
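The new adapter-change behaviour can be sketched as follows. The enum, the helper name and the fallback parameter are all hypothetical; the dialog code itself is not shown in this message.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical renderer identifiers used only for illustration.
enum class DemoRenderer { Direct3D9, Direct3D11, OpenGL };

// Keep the currently selected renderer when the new adapter supports it,
// otherwise switch to the given fallback.
inline DemoRenderer KeepOrFallback(DemoRenderer current,
                                   const std::vector<DemoRenderer>& supported,
                                   DemoRenderer fallback)
{
    const bool ok = std::find(supported.begin(), supported.end(), current) != supported.end();
    return ok ? current : fallback;
}
```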
The Wild Arms Offset text was slightly cut off due to the label being
too small. Make the dialog slightly wider so the full text will fit.
Someone should probably make the dialog look nicer at some point.
Avoid a crash on Onimusha3 (PAL 60HZ)
In theory it will be better to find the root cause of overflow. I.e. somewhere in this
code below. Dirty rectangle is too big.
***********************************************************************
if(rowsize > 0 && offset % rowsize == 0)
{
int y = GSLocalMemory::m_psm[psm].pgs.y * offset / rowsize;
if(r.bottom > y)
{
GL_CACHE("TC: Dirty After Target(%s) %d (0x%x)", to_string(type),
t->m_texture ? t->m_texture->GetID() : 0,
t->m_TEX0.TBP0);
// TODO: do not add this rect above too
t->m_dirty.push_back(GSDirtyRect(GSVector4i(r.left, r.top - y, r.right, r.bottom - y), psm));
t->m_TEX0.TBW = bw;
continue;
}
}
***********************************************************************
So as a temporary solution (that will likely stay for a couple of
years), buffers were increased.
Height of the dirty rectangle must be the GS size of the RT. Of course
RT doesn't have any height so we compute the max safest value.
Fix issue #987
Candidate for 1.4 release
Both the Linux and Windows config dialogs now have a TV Shaders combobox,
so the F7 toggle can be made temporary. This makes the hotkey behaviour
consistent with all the other hotkeys.
The old one isn't working. I don't think there's a URL that redirects to
whatever language the user is using (unless my browser settings are
wrong), so I've just used the English US URL.
1. Add GS_Renderer Enum
Replace all instances of int/uint32 renderer identifier by a strongly
typed enum and appropriate casts.
Only instances in GS[*].cpp/h classes were touched. GPU[*].cpp/h classes
do not follow the same convention.
2. Add default renderer according to OS
The default renderer is OS dependent (Win -> Dx9HW, others -> OGLHW).
Consequently one should always check against the appropriate default
value on config load.
The old behaviour was only - if at all - problematic if the respective
element in the gsdx.ini was missing, and even then it probably didn't create
issues. The current implementation is still more stable and does not
depend on the implementation of GS.cpp -> GetConfig()
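A minimal sketch of both points, assuming the config stores the renderer as a plain int. The enumerator names and numeric values are placeholders, and `RendererFromConfig` is a hypothetical helper, not the real GetConfig path.

```cpp
// Strongly typed renderer identifier. The numeric values are placeholders
// and do not claim to match the values stored in gsdx.ini.
enum class GS_Renderer : int
{
    Dx9HW = 0,
    OGLHW = 1,
};

// OS dependent default: Windows -> Dx9HW, others -> OGLHW.
constexpr GS_Renderer DefaultRenderer()
{
#ifdef _WIN32
    return GS_Renderer::Dx9HW;
#else
    return GS_Renderer::OGLHW;
#endif
}

// Hypothetical config load: an explicit cast converts the stored int, and
// the OS default is used when the entry is missing from the ini file.
inline GS_Renderer RendererFromConfig(int stored, bool present)
{
    return present ? static_cast<GS_Renderer>(stored) : DefaultRenderer();
}
```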
The following patch adds Mipmap option (software mode exclusive) and Preload Data Frame (Hardware mode exclusive) to the GSDX plugin settings for debug purposes.
The goal is to check the impact on games that have wrong RT content.
It helps Smash Court Tennis Pro Tournament 2 a bit, but the game suffers from
another texture cache bug. (RT BW is 10 whereas texture BW is 8)
Note: Armored Core: Last Raven must be tested (the only game so far
that relies on the option, and I didn't want to add a new one).
Typical wrong draw:
1/ draw in 32 bits
2/ draw in 24 bits
3/ Use alpha as a texture. (Must reuse the GPU data)
4/ Write alpha from EE
5/ Use alpha as a texture. (Must upload new data)
This commit fixes step 5.
Fix #917 (Conflict - Desert Storm)
CID 146973 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member overflow is not initialized in this constructor nor in any functions that it calls.
runion/rempty/rinter requires x < z and y < w
Help issue #762 (accurate blending issue)
If you want to shine, please contribute better GSVector code (AVX512 is 2 instructions :p)
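A scalar sketch of the ordering requirement, using a plain struct instead of GSVector4i. The `Rect` type and both helpers are hypothetical; the field names follow the x/y/z/w (left/top/right/bottom) layout.

```cpp
#include <algorithm>

// Plain stand-in for a rectangle stored as x, y, z, w
// (left, top, right, bottom).
struct Rect
{
    int x, y, z, w;
};

// True when the rectangle satisfies the precondition x < z and y < w.
inline bool IsOrdered(const Rect& r)
{
    return r.x < r.z && r.y < r.w;
}

// Swap the coordinates that are in the wrong order. A degenerate rectangle
// (x == z or y == w) still fails IsOrdered after this call.
inline Rect Ordered(Rect r)
{
    if (r.x > r.z) std::swap(r.x, r.z);
    if (r.y > r.w) std::swap(r.y, r.w);
    return r;
}
```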
In the DATE42 algo, the first pass must find the primitive that writes the
bad alpha value. If the depth test fails, the alpha value won't be written, therefore
you mustn't keep the primitive id.
In theory, to ensure 100% correctness, depth would need to be fully executed
(currently depth write is disabled). However it requires copying the depth buffer,
which is likely bad for performance.
Issue reported on DBZInfWorld
Function pointers were mangled to avoid any collision. Nowadays all symbols
are hidden so there is no risk of collision.
The syntax is nicer; besides, it would allow bringing back GLES3.2. I think it
supports most of the extensions used.
glActiveTexture & glBlendColor are provided without symbol query.
CID 146834 (#2-1 of 2): Division or modulo by zero (DIVIDE_BY_ZERO)
9. divide_by_zero: In expression tpf * 10000ULL / ttpf, division by expression ttpf which may be zero has undefined behavior.
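A minimal sketch of a guard for this expression. The helper name is hypothetical, and returning 0 when ttpf is zero is an assumption; the actual fix may handle that case differently.

```cpp
#include <cstdint>

// Hypothetical helper: evaluates tpf * 10000 / ttpf, returning 0 instead of
// dividing when ttpf is zero.
inline uint64_t ScaledRatio(uint64_t tpf, uint64_t ttpf)
{
    return ttpf != 0 ? tpf * 10000ULL / ttpf : 0;
}
```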
CID 146839 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
11. var_deref_model: Passing null pointer fb_pages to UsePages, which dereferences it.
CID 146840 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
11. var_deref_model: Passing null pointer zb_pages to UsePages, which dereferences it.
* Prevent a potential null pointer dereference in ```void GSRendererSW::UsePages()```
All combobox text can now be seen in full without having to click on the
combobox.
The internal and custom resolution stuff has been moved into the Hardware
Mode Settings groupbox since it doesn't affect software mode.
The dialog has also been rearranged a bit.
shaders/GSdx.fx is now the default location and is no longer hardcoded.
The external shader and external shader config can now be selected. (The
OpenGL renderer already has this feature.)
Note: It is still possible to not use a config file, just use an invalid
value for shaderfx_conf.
Don't use D3DX compile from file and compile from resource functions -
use the compile from memory function instead. It does the same thing,
except you have to set things up yourself.
Benefits:
Allows external shaders to be split into a config file and a shader file
without hardcoding the config file name.
Less code.
Yes, I more or less used the same message as the dx11 one.
Don't use D3DX compile from file and compile from resource functions -
use the compile from memory function instead. It does the same thing,
except you have to set things up yourself.
Benefits:
Easier move to D3DCompile when it becomes necessary.
Allows external shaders to be split into a config file and a shader
file without hardcoding the config file name.
Less code.
OpenGL does not use the cdecl calling convention (which is the default
calling convention for GSdx on Windows). Since DebugOutputToFile is used
by OpenGL, it needs to use the same calling convention that OpenGL uses.
This fixes a debug build crash when the OpenGL renderers are used and
debug_opengl is nonzero in the ini.
A couple of useless members were removed too.
Also fix wnd initialization
Coverity:
CID 146955 (#1 of 1): Uninitialized pointer read (UNINIT)
18. uninit_use: Using uninitialized value wnd[i].
Coverity:
CID 146816 (#1 of 1): Calling risky function (DC.STREAM_BUFFER)
dont_call: fscanf(FILE *, char const *, ...) assumes an arbitrarily large string, so callers must use correct precision specifiers or never use fscanf(FILE *, char const *, ...)
Without this patch, if a user initiates a recording and then cancels at the GSdx
dialog, the audio was recording anyway, which is probably highly unexpected.
However, while probably highly unexpected, it could still be useful to record
only audio, but with this patch it's now impossible.
We can reconsider if it turns out that people are actually using this "feature",
though one might as well set the video setting to be very unobtrusive (very low
resolution/bitrate) such that it uses very little CPU.
This is the internal resolution which GSdx uses and recording at this resolution
is optimal, i.e. without any dumb scaling, with all relevant pixels and without
redundant pixels.
The resulting clip still doesn't have the correct aspect ratio set, but that's
just a property which can be set to the clip afterwards, which is where the DAR
becomes useful. Since it's usually anamorphic, when muxing later with the audio
use the DAR to set the playback aspect ratio.
gsdx changes:
Remove native resolution checkbox from GUI and rework associated code
Small changes to Windows and Linux GUI
Support 8x native resolution
Fix custom resolution width less than native width use case
Previously it was saving the display name to the config but trying to restore
according to the friendly name.
Now store and restore according to the "displayName" which is more unique than
"friendlyName" since it includes GUID[s], and handle it consistently as _bstr_t.
This fixes the following issues when custom resolution is selected.
- When the width is smaller than the native resolution width, the
texture cache targets are removed on every Vsync signal, causing a
black screen issue.
- The texture cache code needs a 1 returned for the custom resolution
upscale multiplier or there'll be some really funny graphical issues.
It also removes unnecessary GetConfig (which I think unconditionally
does a file read on Windows) calls if the width was increased - the
upscale multiplier is already stored, and the custom resolution width
and height calls are now unnecessary.
Also fix some whitespace issues.
Before this patch, when recording Progressive (frame) mode, it recorded all
the frames correctly but set the clip's fps property to 25/29.97, so when
played back it played at half the speed (but was fine when played at double speed).
This patch does not affect the number of frames recorded per second, but rather
only sets the resulting clip fps property to the correct value (double the previous one).
Also fixes a bug where, in a non-managed window in progressive mode, the title
displayed "200%" speed when it should have displayed 100% speed.
Fixes #832
upscale_multiplier function values have been changed to allocate native resolution and also move custom resolution to 9.
Remove the old native checkbox value and include Native in the combo box.
Internal GSDX functions have also been updated with this new update to the upscale_multiplier variable.
* Greatly reduce the number of clut reads (by a factor of 10)
* Avoid getting the wrong TEXA texture from the cache.
* Fix "jump depends on uninitialized variable" Valgrind warning.
Fix #748
I try my best to avoid any breakage of DX but please test it too.
* add a lengthy comment to explain the format
* Likely reduces the number of shader permutations
* Avoid slow AEM (on GPU)
Expect regressions because TC needs some fixes
v2: fix palette mode
CID 146833 (#2-1 of 2): Division or modulo by zero (DIVIDE_BY_ZERO)
divide_by_zero: In expression this->m_width / this->m_upscale_multiplier, division by expression this->m_upscale_multiplier which may be zero has undefined behavior.
CID 146835 (#1 of 1): Division or modulo by float zero (DIVIDE_BY_ZERO)
50. divide_by_zero: In expression (float)(end - start) / (float)frame_number, division by expression frame_number which may be zero has undefined behavior