Thanks to @colepcsx2 (https://github.com/PCSX2/pcsx2/pull/1896#commitcomment-21858717) for pointing it out!
I also updated the prefix in the inferior video mode detection of GSdx, I'm not even sure why we need the videomode info on the plugin side, might be useful someday.
The shadeboost options text (Contrast, Brightness, Saturation) were not
grayed out when shadeboost was disabled, it was sort of inconsistent
compared to the behavior of external shader, so added grayouts to them
when shadeboost is disabled.
Also changed "OpenGL Very Advanced Custom Settings" to "OpenGL Advanced
Settings", the verbosity didn't help much in my opinion.
Split code in 2 parts
* Base class (GSWndEGL) that implement the core EGL and GL context
* Derived class (GSWndEGL_X11/GSWndEGL_WL) that implement the backend to handle native resources
Note: Most backend code is only useful for GSopen1/PS1 mode. GSopen2 only requires
the AttachNativeWindow implementation
Code is based around EGL_EXT_platform extension that allow to select the platform at runtime.
Note: I think the extension was integrated in EGL 1.5
The X11 backend was mostly converted to XCB
The wayland backend is only a placeholder for future code
I don't know if MS windows is/could be supported with EGL_EXT_platform API
Code validated on Mesa. Proprietary drivers aren't yet tested.
Moved a CRC hack to Aggressive that can cause or fix VRAM and RAM spikes.
This way people can switch between each config if they experience problems with either.
Varies on userconfig , game version and maybe hardware.
Added missing CRC game version for GT4 pal.
Note: The issue might be the same on GT3 and GTConcept , the code might
need to be removed for those games as well.
Hack was used to remove garbage data rectangles from popping up on screen when objects and characters were added to or removed from the world.
This issue is now being handled by OI_DoubleHalfClear in GSRendererHW.cpp, so the hack is no longe necessary and has been removed.
Moves Resident Evil 4 hack to Aggressive level as it is no longer
required to fix any issues, but does offer a decent speed boost.
Moves The Getaway & The Getaway Black Monday CRC hack to Full level, as the issue can be resolved
when using the OpenGL Hardware renderer.
* Windows behavior must be checked
* remove glsl_source.h
v2: fix missing include
Big thanks to Turtleli
v3:
fix indentation in gsdx-res.xml
add dependency in cmake
remove old res/glsl/fxaa.fx symlink
add tfx.cl for OpenCL support on Linux
v4, v5
fix cmake indentation
It is done automatically on Linux. Strings are much
better with this NULL char ;)
All credits go to turtleli
v2: increase resize instead of push_back NULL char
add checkboxes for the 2 "new" hacks
Wrap gs memory & merge postprocessing sprite
add tooltip for OpenGL options
v2: based on turtleli feedback
use gtk_scrolled_window_set_propagate_natural_height on GTK 3.22+
use the nicer GTK_CHECK_VERSION macro
Move GL_ARB_copy_image to optional for OpenGL SW render.
It will allow Ivy Bridge to work with OpenGL SW as it's not required.
Sandy Bridge is not yet tested , would be nice if someone could test.
Clip Control is only used for the HW renderer.
It will help Nvidia DX10 GPU on Windows. Potentially old AMD GPU too.
Unfortunately Ivy bridge still misses texture copy
Note on Linux, you can use the free Mesa driver.
Otherwise, it is time to save money for a future upgrade :)
Adds merge sprite hack to GSDx hacks dialog
And ports merge sprite hack to Direct3D renderers.
Special thanks to my keyboards Ctrl, c and v buttons for all their hard
work in porting this hack.
Ports the "Unscale Point and Line" hack to the Direct3D11 Hardware renderer.
And enables the "Unscale Point and Line" hack for Custom Resolutions with Direct3D11 and OpenGL.
Adds Windows GUI elements of the split texture filtering options.
Bilinear Texture Filtering is moved to the top section of the main GSdx window,
and Trilinear Filtering is moved to Hacks.
Adds Texture Filtering Of Display option to the Shader dialog window Windows UI.
Updates the layouts of the Shader and OSD dialog windows to more closely resemble the Linux GUI.
Reorganizes Hacks dialog window.
Adds UI elements for the Memory Wrapping and HPO v2/Special commits
Adds advanced OpenGL functions "Geometry Shader" and "Image Load Store" to the Windows UI.
Renames "Configure Hacks" to "Advanced Settings and Hacks", to more closely resemble the Linux GUI.
Adds GS Memory Wrapping hack to Windows. Enabling the hack will fix cut-off cutscenes in Wallace & Gromit: The Curse of the Were-Rabbit and Thrillville.
Ace Combat 4 CRC hack removes clouds for a good speed boost, which removes both 3D clouds(invisible with Hardware renderers, but cause slowdown) and 2D background clouds.
Removes blur from player airplane.
This hack also removes rockets, shows explosions(invisible without CRC hack) as garbage data, causes flickering issues with the HUD, and in some (night) missions removes the HUD altogether.
The CRC hack has been moved to the aggressive level.
Aggressive is misspeled several times in the file, this has been adressed.
Changes to the dependencies of the generated logo files did not trigger
a rebuild of the files. Use add_custom_command instead of
execute_process so build dependencies can be specified.
Also prevent the generated files from polluting the source directory.
The pixdata format loader has been removed from recent versions of
gdk2-pixbuf, so the logo doesn't load. Avoid preprocessing the data and
leave the logo as an embedded bitmap file.
Allow the output circuit saturation to take place at cases where one of the output circuit is enabled with frame mode rendering, I'm not sure it would be safe to allow saturations when both of the output circuits are enabled with frame mode rendering. Unlike field mode rendering, frame mode doesn't use identical rectangles at same co-ordinates for output in two alternating fields and potentially they could use a much bigger output size when both of the output circuits are enabled and are separated without any intersection. So let's limit the saturation to only the cases where we detect a single output circuit for frame mode rendering.
Fixes a regression in Devil May Cry 3 and Sky Gunner.
If a user switches renderer they also have to remember to change the CRC
hack level for the best user experience with the selected renderer.
This commit adds a new automatic CRC level that autoselects the
recommended CRC level for the selected renderer, so that a user doesn't
have to make the change manually.
coauthor: turtleli
If OpenGL software is the saved ini renderer and F9 is pressed to toggle
to the hardware renderer, depth emulation will be disabled. This fixes
that issue.
GSdx ogl: SSO Workaround for AMD buggy drivers
All 2017 drivers are now blacklisted.
The BSOD/crash issue is still there so don't set Blending Accuracy to None!
Shortened the message in the console making it more appealing.
-1 is only returned when there is an encoding error, and the va_list
argument is indeterminate after being passed to vsnprintf.
Use the return value to determine the buffer length, and call va_end and
then va_start before vsnprintf is called again.
ICO uses a depth of field effect for the fog. Depth is extracted
into the alpha channel of a texture. And then used as blending factor.
You need a 1:1 texture/pixel mapping otherwise you will line at boundaries.
In order to extract the DoF, ICO moves the depth buffer around the GS
memory. Memory moves are implemented in the not-scaled world. It means
that we can't have the above 1:1 ratio. And we don't know anymore that
data are coming from the current depth buffer.
The solution: I reused an HLE channel shader to read the depth buffer directly.
This way I have the guarantee that pixel/depth are aligned.
Close#1816
Otherwise you have a write before read typical race condition. It works
most of the time because textures are stored in temporary buffers (aka
texture cache). So the race condition requires texture invalidation in the mix.
I hope the perf impact will be small enough.
Fix#1691
Blood Will Tell: gray scale effect description
Frame is renderer in 0x700
Sync 0x700 (RT will be used as input)
Foreach page of frame
// The missing Sync was this one. You can't copy new data to 0x2800
// until you finish the rendering that use 0x2800 as input texture
// (AKA end of this foreach loop)
Sync 0x2800 (not the first iteration, texture will be used as a RT)
Copy page from 0x700+offset to 0x2800
Sync 0x2800 (RT will be used as input)
Render Effect line1 from 0x2800 to 0x700
Alpha test should only be disabled when writes to all of the alpha bits in the Framebuffer are masked. Fixes a regression in Dragon Ball Z: Budokai 3 scouter image rendering.
It allow to do the division before the size multiplication
It avoid a float overflow if T is too big.
Old behavior: (T * size) / Q
New behavior: (T / Q) * size
Performance Note:
* Rcp was replaced by a slow division (more accurate)
* At least we avoid a 2nd loop on the vertex buffer
It helps on Pro Soccer Club and Galerians Ash rendering
Tric Note:
SPRITE must be handled differently because the 'q' of first vertex could
be invalid
Bilinear applies to all renderer
* Common code done in GSVertexTrace
* Extend it with forced but sprite (trade-off between linear/upscale glitches)
* Linux GUI option was moved at the top with the renderer selection
Trilinear is moved to OGL hack
close#1837
Thanks to Flatout for the review and feedback.
It will take care to update the Window GUI :)
Typical bug, missing/wrong texture on the SW renderer but working fine on the HW renderer
Debugged on ATV Quad Power Racing 2 but I suspect couple of game are impacted
Bug description:
GSdx flatten the Q value of sprite. So m_vt.m_eq.q is true when Q(2N+1) are the same.
Q(2N) values could be random. The fix replaces Q0 by Q1 for the uniform Q value.
Previously, the NTSC saturation was also applied for double scan mode (Interlaced and Frame) where the developers send double the height to the DISP registers, saturation shouldn't be performed at such cases as the developers could send a value of 780 while the real size of the output would be 390 due to double scan mode. Doing the saturation later after identifying the real size also seems a bit counter-intuitive as we haven't discovered any cases where double scan games require the NTSC saturation hack. So let's just apply the saturation only for Interlaced (Field) Mode and omit the saturation step for other modes.
Isolate all the hacks into a separate subroutine and properly document about them, should make it easier for people to understand the display rectangle setup code, the hacks were totally messing up the readability of the function earlier.
The buffer contains extra room to avoid a segmentation fault due to an overflow.
Unfortunately the end of the buffer wasn't initialized which can lead to unexpected behavior.
Based on issue #1806 it could impact Guilty Gear X2
Mesa AMD was updated :)
all drivers[1] that support GL_ARB_shader_image_load_store got GL_ARB_clear_texture
[1] Intel driver misses others extensions to run GSdx
Rendering will be corrupted (for advance effects) if the driver doesn't support it.
However it allow to run with Mesa software emulation (or inside a virtual machine)
Note: mesa still requires an override of the buffer storage extension
MESA_EXTENSION_OVERRIDE=GL_ARB_buffer_storage
Previously, the combobox will reach an indeterminate state whenever it's passed with a value out of range via ComboBoxInit(). To avoid such cases, let's initialize the current selection of the combobox with the front element of the settings vector whenever we detect an out of range value which is not declared in the vector.
To reproduce the issue, set "Renderer" to some sort of crazy value like 50 in the GSdx.ini file and it'll mess up the whole GSdx plugin dialog really bad. This patch prevents such undesirable behavior by simply selecting the front element in the vector when we read an unsupported value.
Previously, the auto output circuit selection of the GSdx wasn't good, it simply defaulted to the second output circuit even when the first output circuit is also enabled. The new algorithm for auto selecting returns the merged rectangle dimensions when both of the output circuits are enabled and if the condition for merge is not satisfied then it returns the bigger output circuit.
I didn't fix all the warnings (purpose was to realign code with "recent" update)
Linux note: only miss 2 major items
* res/tfx.cl loading
* device descriptor
* And various bug fixes ;)
Previously, when F8 was triggered multiple times in a single second, the latest captured image would replace the previous captured one as it has the same name as the previous image.
The following patch detects such cases and adds a number along with the filename when new image capture is requested under the same time as the previous capture.
This reverts commit
d6383e6c21
It created a regression in Everybody's Golf 4/Hot Shots Golf 4, breaking the renderering when depth emulation is disabled/when using a Direct3D Hardware renderer.
The code is now a mirror of the ::add. So 1 insert == 1 erase
This way it won't crash on future update. And it will support future GS
memory wrapping improvement.
ZoE2:
RemoveAt overhead plummet to 0.5%. It was 17% !
However insertion is a bit slower. Due to the begin() after the push_front
v2: use std:: for lists and arrays
Removes Alpine Racer 3 hack. Issue has been resolved.
Moves NanoBreaker hack. Issue has been resolved for OpenGL and hack has
been moved to DX only.
Moves Tri-Ace games hacks. Hacks are also necessary for OpenGL with "Partial" CRC Hack Level to prevent massive slowdown.
Move Tales Of Legendia hack back as it's also necessary for OpenGL with "Partial" CRC Hack Level to prevent graphical issues.
Close: https://github.com/PCSX2/pcsx2/issues/1698
Added PAL and NTSC-U CRC's for Ar tonelico II.
Unfortunately it requires at least GCC6. If a nice guy can check the generated code on GCC6.
I don't know clang status.
Here the only example, I have found on the web
https://developers.redhat.com/blog/2016/02/25/new-asm-flags-feature-for-x86-in-gcc-6/
Current generated code in GSTextureCache::SourceMap::Add
38b3: bsf eax,esi
38b6: add esp,0x10
38b9: test esi,esi
38bb: jne 387e <GSTextureCache::SourceMap::Add(GSTextureCache::Source*, GIFRegTEX0 const&, GSOffset*)+0x6e>
BSF already set the Z flag when input (esi) is 0. So it would be better
to not put a silly add before the jump and to skip the test operation.
OMG, Zone of Ender got a speed boost from 11 fps to 45 fps
Seriously, the goal is to allow benchmarking GSdx without too much overhead of the main renderer draw call
Note: unlike the null renderer, texture/vertex uploading, 2D draw, texture conversions are still done.
* move the post-processing frame into the OSD tab
* Rename Global Settings to Renderer Settings
* put monitor and indicator check box on the same line
At least we have a similar number of options by tab
This reverts commit f77c1900fa.
Conflicts:
plugins/GSdx/GSTextureCache.cpp
Another fix was done later for Jak cut scene (or FMV). One game got a regression (don't remember which)
It's only ever updated after the queue is updated, so its state will
always lag slightly behind it. It's sufficient to just use empty().
This seems to fix some caching issues that were noticeable on Skylake
CPUs (#998).
In the previous code, the worker thread would notify the MTGS thread
while the mutex is still locked, which could cause the MTGS thread to
wake up and immediately go back to sleep again since it can't lock the
mutex.
Use a separate mutex for waiting, which avoids the issue.
Some PSX games seem to store image data of the drawing results in an undeterminate area out of range from the current context buffer. At such cases, calculate the height of both the frame memory rectangles combined.
What happens on "Crash bash" -
* At first draw, scissoring is limited to SCAY0- 0 & SCAY1- 255
* At second draw, scissoring is limited to SCAY0- 255 & SCAY0-511
Previously, we limited the height to the value of one single output texture, so instead of that let's calculate the total height of both the two buffers combined to prevent such issues.
Previously, we only calculated the width of a single output circuit which lead to missing a single pixel from the other output circuit which in turn causes offset issues in Persona games, I have customized GetDisplayRect() to now also calculate the dimensions of the merged rectangle when both the output circuits are enabled through the PMODE register, so this hack is no longer needed. :)
TL;DR - The above commit of mine accurately handles the offset issues by calculating union of the rects, removing this stupid hack. (not insulting any other developers, this stupid hack was mine :)
Passes the merged output circuit as the base size for texture cache scaling code. Helps fixing scaling issues where games use both of the output circuits for rendering.
Future Note: Alter the behavior of IsEnabled() check always preferring the second output circuit for some weird reason. I plan on changing it to a better auto-output circuit selection mechanism but that could probably be done some time in the future.
* Upgrade the counter to signed 32 bits. 16 bits is too small to contains the 64K value.
* Read ThreadProc/m_count when the mutex is locked
* Use old value of the fetch instead to read back the new value
Use a reinterpret_cast instead of casting the function pointer address
to a void** and dereferencing it.
Also remove an unnecessary (void) and avoid including stdafx.h.
Don't adjust 'image' and just use an additional offset.
'success' was kinda unnecessary when true or false could just be
directly returned.
Move 'compression' clamping out to GSPng::Save instead.
And throw in a whole bunch of const for good measure.
JMMT uses a bigger display height on NTSC progressive scan mode, which is not really unusual hence adjust the saturation hack to only take effect on interlaced NTSC mode.
However, the whole double screen issue on FMV still exists. As a bit of information, this game has the second output disabled but seems to have some valid data inside of it, maybe the second output data is leaked into the first one? most likely a bug in the frambuffer data management rather than a CRTC issue (needs to be investigated)
Previously, the height of the frame offset was also considered for the total height of the texture which was obviously wrong as the portion before the offset value isn't part of the frame memory.
In the previous code, the threads were created and destroyed in the base
class constructor and destructor, so the threads could potentially be
active while the object is in a partially constructed or destroyed state.
The thread however, relies on a virtual function to process the queue
items, and the vtable might not be in the desired state when the object
is partially constructed or destroyed.
This probably only matters during object destruction - no items are in
the queue during object construction so the virtual function won't be
called, but items may still be queued up when the destructor is called,
so the virtual function can be called. It wasn't an issue because all
uses of the thread explicitly waited for the queues to be empty before
invoking the destructor.
Adjust the constructor to take a std::function parameter, which the
thread will use instead to process queue items, and avoid inheriting
from the GSJobQueue class. This will also eliminate the need to
explicitly wait for all jobs to finish (unless there are other external
factors, of course), which would probably make future code safer.
The previous implementation of HPO adds an offset on vertex position. It
doesn't always work beside it moves the rendering window.
The new implementation will add a texture offset so that instead to sample
the middle of the GS texel, we will sample the middle of the real texture texel.
It must be manually enabled with
* UserHacks_HalfPixelOffset_New = 1 (keep a small offset as intended by GS effect)
* UserHacks_HalfPixelOffset_New = 2 (no offset)
v2: always apply a 0.5 offset in case of float coordinates (Tales of Abyss)
Might break other games but few of them uses float coordinates to read
back the target
Doesn't fully work yet
* Unknown stack frame
* Outside any known module
Potential root cause:
* Nvidia driver
* VU code as ebp is required for emulation so likely no frame
* Use POD type to avoid SSE/AVX compilation dependency
* global object to reduce cache miss
* dynamically object so give a chance to allocate below 2GB (allow x64
optimization)
tex address is a3
vm address is a1
Could help to avoid REX prefix
Reduce prologue/epilogue register copy
Byte code size 41893 => 38912 (on my testcase)
* bin2hex.h is removed
* vptest/vpblendvb YMM support integrated upsteam
* better support of rip for 64 bits
* AVX512 support (only miss the CPU now)
Local change: add BSD3 clause
It will requires a generic (register naming) linear interpolation to use it properly
Gather instruction requires an extra mask register therefore all registers name will be shuffled
Perf wise, initial haswell implementation seems to be microcode emulated.
Based on Gabest's work.
* Miss mipmap
Note: dithering info
It is a bit tricky as a2 on linux was rdx register which overlap with fzm (dh/dl)
It might require dedicated windows code
The main FindMinMax methods is perf critical so instead I created a separate function
to ensure the constness of the depth
Fix letter regression on Xenosaga3
* Explicitly cast w_pages and h_pages into uint32.
* Prevent signed/unsigned comparison by converting lod into unsigned integer, honestly how coud a mipmapping level be negative?
Previously the dedicated custom resolution scaling equation was ignored for the second SetScale() call, generalizing the equations will also fix the DMC scaling issue on custom resolution. Also remove unnecessary checks for null on scale factors. The possibility for having a null scale factor value only exists on custom resolution and it will only happen on cases where the output circuit isn't ready yet. So the ideal way would be to handle all the required conditions of output circuit on "m_renderer->CanUpscale()" itself.
Cost ought to remain small. Worst case is 2 extra "and" operation by group of pixels in scanline renderer
I think PixelAddressN functions are mostly call in the init.
CloseThread is called in the GSJobQueue destructor, so don't call it
again in the GSThread destructor.
Fixes#392, which was caused by a use after free.
Also prevents pthread_join() from being called twice for each thread
on non-Windows operating systems, which is undefined behaviour.
Code can be enabled with "wrap_gs_mem = 1". Code only allow a single shared memory but
I don't think we need more anyway.
Linux only, Kernel panic expected with the HW renderer.
Fix FMV on Silent Hill 3 with the SW renderer
Hardcode location of interface to the location 0. If I understand the
spec correctly (unlikely), variable in interface will get successive
location.
Goal is to reduce driver work. Instead to compute some location based on
name matching approach (and silly validation), the driver can now use
static allocation.
Tests on future Mesa 13 are welcome
Just clear the buffer. The generic solution will be a copy from buffer A
to buffer B But it requires
1/ a big buffer A (otherwise it would overflow)
2/ a line width rescaling (+ the upscaling mess support)
* Add full PMODE register to replace slbg/mmod
* Add full EXTBUF register (will allow to emulate write feedback)
* Add a third source (which will actually be the destination of the
write feedback)
mipmap option 3. Actually maybe a separate tri-linear option will be better
m_mipmap == 2 => use manual PS2 trilinear/mipmap
Otherwise
m_filter == 3 => always use full automatic trilinear interpolation
m_filter == 4 => use automatic trilinear interpolation when PS2 uses mipmap
m_filter == 5 => like 4 but force bilinear interpolation inside layer
Fix Berserk #1526 Well done guys but we're more clever than you ;)
So instead to mask the color channels as any guy that RTFM, they decided to use the illegal 8H frame format