Using range loops where possible (correctly).
Using auto where possible (minimize code changes whenever it's decided to change back to a std container).
Use more efficient erase pattern (where possible).
Minor code tweaks.
PCSX2 sends a negative value (-1) to GSdx when adaptive mode is
specified for Vsync, this mode is exclusive to OpenGL at the moment
and is unimplemented on the D3D11 renderer. Also the present function
of swapchain only accepts values from 0 to 4 as parameter, hence
passing negative values to the function is undefined behavior.
So let's fallback to standard synchronization method on D3D11 when
PCSX2 requests for adaptive mode.
Fix CRC hacks on PAL version.
PAL version will no longer experience very high brightness/contrast
issues on stages in hw mode caused by an incorrect CRC hack.
Moved a CRC hack back to OpenGL mode only for the PAL version
because texture shuffling does not work properly on PAL games.
Forgot to replace `IDC_TEXT` with `IDC_VALUE` macros, due to this the
text containing the name of the options was being updated with the
current value of the option instead of updating the text designated for
holding the values.
DBY isn't an offset to the frame memory but rather an offset to read
output circuit inside the frame memory, hence the top offset should also
be calculated for the total height of the frame memory. Fixes software
mode regression in Beyond Good and Evil.
Also handle cases when GetFrameRect() is called without any paramerer to
avoid an illegal value access violation on the DISP register.
output 1 strip of 2 triangles instead of 2 strips of 1 triangle.
Potentially it would reduce the geometry shader overhead. And it
might avoid a middle line in sprite in some AMD GPU/driver/OS
bad combination
GSdx-ogl: Console messages v2
Follow up to
commit/ec63b04719fd9c05a6aeeacb55dc1c54f5ef145b
Add intel broken driver wiki link message in console (OpenGL).
Print intel / amd buggy driver message once in console (OpenGL).
Pring texture barrier and viewpoint array info once in console (OpenGL).
Enables character outlines to partially work on Full CRC.
DX9 has a small issue where a small black line at the bottom shakes when outlines are enabled.
You can either use Aggressive crc or x,y offset to fix the issue.
Removed unnecessary crc hack that caused shadows on stationary objects (trees) to move on Direct3D in a weird
motion blur type way when the player moved slightly.
* Cast return value of IsEof() to bool. (Avoids int -> bool performance
warning error)
* Cast field and index to the required parameter type of AppendRawData.
The sprite geometry shader was still being used even if the sprites were
converted on the CPUs.
Convert all sprites using the GPU - the fix isn't ideal, but it'll
likely have to do unless someone feels like porting over more of the
OpenGL changes to the D3D11 renderer.
Closes#1921.
Class member variables are initialised in order of declaration in the
class definition. Move native_buffer to the top of the class definition
to avoid initialising m_width and m_height to random values.
Move the custom resolution scaling code to a separate subroutine and
allow future RT buffer resize calls when the buffer size isn't enough.
(Example: when a game's CRTC/Framebuffer size changes. The older code
didn't consider such cases)
Added a more robust buffer size calculation mechanism for custom
resolutions. Improves performance in higher resolutions for games
which don't need a big buffer. There's a great boost in performance
at GS limited scenarios.
I don't even feel there's a need for the large framebuffer option right
now, For future - I plan on making the large framebuffer enabled version
as the default as the overhead is there only at situations when it's
necessary. Until then keeping the original code just to be on the safe
side in case any issue pops up.
An EOF only occurs after attempting to read past the end of the file.
Account for this correctly, which fixes a potential infinite loop when
reading back an xz compressed GS dump.
Add the bitfield structure of the undocumented SYNCV register,
potentially might be useful in proper height determination of the output
circuit for some weird games which still get it wrong but still haven't
figured out how it might be useful. Maybe some sort of black magic
formula with the vertical synchronization values?
The differential phase value seems to closely resemble the display
height value of the video modes (480 for NTSC, 576 for PAL) but after
some investigating into the differential phase, I have no clue on how
they might be even related. Hopefully the mystery will be unveiled in
the near future.
Reduce disk space. Easy to share.
It would be nice to port the code to Windows.
libzma code was taken from https://git.tukaani.org/xz.git
Note: only short dumps are supported so far. Big dump will freeze the interface during the compression.
Or will suck all the RAM.
Note2: a multithreaded encoder would badly impact the compression ratio
Thanks to Turtleli for all review comments
* replace gtk_table by gtk_grid
=> it still misses some paddings
* Use 3.22 monitor API to query screen size
=> need to be tested
* directly add scrolled windows into a container without bothering with
the viewport.
Code compile fine but wasn't tested.
v2: disable the code until I (or someone) get a chance to test and fix it.
Thanks to @colepcsx2 (https://github.com/PCSX2/pcsx2/pull/1896#commitcomment-21858717) for pointing it out!
I also updated the prefix in the inferior video mode detection of GSdx, I'm not even sure why we need the videomode info on the plugin side, might be useful someday.
The shadeboost options text (Contrast, Brightness, Saturation) were not
grayed out when shadeboost was disabled, it was sort of inconsistent
compared to the behavior of external shader, so added grayouts to them
when shadeboost is disabled.
Also changed "OpenGL Very Advanced Custom Settings" to "OpenGL Advanced
Settings", the verbosity didn't help much in my opinion.
Split code in 2 parts
* Base class (GSWndEGL) that implement the core EGL and GL context
* Derived class (GSWndEGL_X11/GSWndEGL_WL) that implement the backend to handle native resources
Note: Most backend code is only useful for GSopen1/PS1 mode. GSopen2 only requires
the AttachNativeWindow implementation
Code is based around EGL_EXT_platform extension that allow to select the platform at runtime.
Note: I think the extension was integrated in EGL 1.5
The X11 backend was mostly converted to XCB
The wayland backend is only a placeholder for future code
I don't know if MS windows is/could be supported with EGL_EXT_platform API
Code validated on Mesa. Proprietary drivers aren't yet tested.
Moved a CRC hack to Aggressive that can cause or fix VRAM and RAM spikes.
This way people can switch between each config if they experience problems with either.
Varies on userconfig , game version and maybe hardware.
Added missing CRC game version for GT4 pal.
Note: The issue might be the same on GT3 and GTConcept , the code might
need to be removed for those games as well.
Hack was used to remove garbage data rectangles from popping up on screen when objects and characters were added to or removed from the world.
This issue is now being handled by OI_DoubleHalfClear in GSRendererHW.cpp, so the hack is no longe necessary and has been removed.
Moves Resident Evil 4 hack to Aggressive level as it is no longer
required to fix any issues, but does offer a decent speed boost.
Moves The Getaway & The Getaway Black Monday CRC hack to Full level, as the issue can be resolved
when using the OpenGL Hardware renderer.
* Windows behavior must be checked
* remove glsl_source.h
v2: fix missing include
Big thanks to Turtleli
v3:
fix indentation in gsdx-res.xml
add dependency in cmake
remove old res/glsl/fxaa.fx symlink
add tfx.cl for OpenCL support on Linux
v4, v5
fix cmake indentation
It is done automatically on Linux. Strings are much
better with this NULL char ;)
All credits go to turtleli
v2: increase resize instead of push_back NULL char
add checkboxes for the 2 "new" hacks
Wrap gs memory & merge postprocessing sprite
add tooltip for OpenGL options
v2: based on turtleli feedback
use gtk_scrolled_window_set_propagate_natural_height on GTK 3.22+
use the nicer GTK_CHECK_VERSION macro
Move GL_ARB_copy_image to optional for OpenGL SW render.
It will allow Ivy Bridge to work with OpenGL SW as it's not required.
Sandy Bridge is not yet tested , would be nice if someone could test.
Clip Control is only used for the HW renderer.
It will help Nvidia DX10 GPU on Windows. Potentially old AMD GPU too.
Unfortunately Ivy bridge still misses texture copy
Note on Linux, you can use the free Mesa driver.
Otherwise, it is time to save money for a future upgrade :)
Adds merge sprite hack to GSDx hacks dialog
And ports merge sprite hack to Direct3D renderers.
Special thanks to my keyboards Ctrl, c and v buttons for all their hard
work in porting this hack.
Ports the "Unscale Point and Line" hack to the Direct3D11 Hardware renderer.
And enables the "Unscale Point and Line" hack for Custom Resolutions with Direct3D11 and OpenGL.
Adds Windows GUI elements of the split texture filtering options.
Bilinear Texture Filtering is moved to the top section of the main GSdx window,
and Trilinear Filtering is moved to Hacks.
Adds Texture Filtering Of Display option to the Shader dialog window Windows UI.
Updates the layouts of the Shader and OSD dialog windows to more closely resemble the Linux GUI.
Reorganizes Hacks dialog window.
Adds UI elements for the Memory Wrapping and HPO v2/Special commits
Adds advanced OpenGL functions "Geometry Shader" and "Image Load Store" to the Windows UI.
Renames "Configure Hacks" to "Advanced Settings and Hacks", to more closely resemble the Linux GUI.
Adds GS Memory Wrapping hack to Windows. Enabling the hack will fix cut-off cutscenes in Wallace & Gromit: The Curse of the Were-Rabbit and Thrillville.
Ace Combat 4 CRC hack removes clouds for a good speed boost, which removes both 3D clouds(invisible with Hardware renderers, but cause slowdown) and 2D background clouds.
Removes blur from player airplane.
This hack also removes rockets, shows explosions(invisible without CRC hack) as garbage data, causes flickering issues with the HUD, and in some (night) missions removes the HUD altogether.
The CRC hack has been moved to the aggressive level.
Aggressive is misspeled several times in the file, this has been adressed.
Changes to the dependencies of the generated logo files did not trigger
a rebuild of the files. Use add_custom_command instead of
execute_process so build dependencies can be specified.
Also prevent the generated files from polluting the source directory.
The pixdata format loader has been removed from recent versions of
gdk2-pixbuf, so the logo doesn't load. Avoid preprocessing the data and
leave the logo as an embedded bitmap file.
Allow the output circuit saturation to take place at cases where one of the output circuit is enabled with frame mode rendering, I'm not sure it would be safe to allow saturations when both of the output circuits are enabled with frame mode rendering. Unlike field mode rendering, frame mode doesn't use identical rectangles at same co-ordinates for output in two alternating fields and potentially they could use a much bigger output size when both of the output circuits are enabled and are separated without any intersection. So let's limit the saturation to only the cases where we detect a single output circuit for frame mode rendering.
Fixes a regression in Devil May Cry 3 and Sky Gunner.
If a user switches renderer they also have to remember to change the CRC
hack level for the best user experience with the selected renderer.
This commit adds a new automatic CRC level that autoselects the
recommended CRC level for the selected renderer, so that a user doesn't
have to make the change manually.
coauthor: turtleli
If OpenGL software is the saved ini renderer and F9 is pressed to toggle
to the hardware renderer, depth emulation will be disabled. This fixes
that issue.
GSdx ogl: SSO Workaround for AMD buggy drivers
All 2017 drivers are now blacklisted.
The BSOD/crash issue is still there so don't set Blending Accuracy to None!
Shortened the message in the console making it more appealing.
-1 is only returned when there is an encoding error, and the va_list
argument is indeterminate after being passed to vsnprintf.
Use the return value to determine the buffer length, and call va_end and
then va_start before vsnprintf is called again.
ICO uses a depth of field effect for the fog. Depth is extracted
into the alpha channel of a texture. And then used as blending factor.
You need a 1:1 texture/pixel mapping otherwise you will line at boundaries.
In order to extract the DoF, ICO moves the depth buffer around the GS
memory. Memory moves are implemented in the not-scaled world. It means
that we can't have the above 1:1 ratio. And we don't know anymore that
data are coming from the current depth buffer.
The solution: I reused an HLE channel shader to read the depth buffer directly.
This way I have the guarantee that pixel/depth are aligned.
Close#1816
Otherwise you have a write before read typical race condition. It works
most of the time because textures are stored in temporary buffers (aka
texture cache). So the race condition requires texture invalidation in the mix.
I hope the perf impact will be small enough.
Fix#1691
Blood Will Tell: gray scale effect description
Frame is renderer in 0x700
Sync 0x700 (RT will be used as input)
Foreach page of frame
// The missing Sync was this one. You can't copy new data to 0x2800
// until you finish the rendering that use 0x2800 as input texture
// (AKA end of this foreach loop)
Sync 0x2800 (not the first iteration, texture will be used as a RT)
Copy page from 0x700+offset to 0x2800
Sync 0x2800 (RT will be used as input)
Render Effect line1 from 0x2800 to 0x700
Alpha test should only be disabled when writes to all of the alpha bits in the Framebuffer are masked. Fixes a regression in Dragon Ball Z: Budokai 3 scouter image rendering.
It allow to do the division before the size multiplication
It avoid a float overflow if T is too big.
Old behavior: (T * size) / Q
New behavior: (T / Q) * size
Performance Note:
* Rcp was replaced by a slow division (more accurate)
* At least we avoid a 2nd loop on the vertex buffer
It helps on Pro Soccer Club and Galerians Ash rendering
Tric Note:
SPRITE must be handled differently because the 'q' of first vertex could
be invalid
Bilinear applies to all renderer
* Common code done in GSVertexTrace
* Extend it with forced but sprite (trade-off between linear/upscale glitches)
* Linux GUI option was moved at the top with the renderer selection
Trilinear is moved to OGL hack
close#1837
Thanks to Flatout for the review and feedback.
It will take care to update the Window GUI :)
Typical bug, missing/wrong texture on the SW renderer but working fine on the HW renderer
Debugged on ATV Quad Power Racing 2 but I suspect couple of game are impacted
Bug description:
GSdx flatten the Q value of sprite. So m_vt.m_eq.q is true when Q(2N+1) are the same.
Q(2N) values could be random. The fix replaces Q0 by Q1 for the uniform Q value.
Previously, the NTSC saturation was also applied for double scan mode (Interlaced and Frame) where the developers send double the height to the DISP registers, saturation shouldn't be performed at such cases as the developers could send a value of 780 while the real size of the output would be 390 due to double scan mode. Doing the saturation later after identifying the real size also seems a bit counter-intuitive as we haven't discovered any cases where double scan games require the NTSC saturation hack. So let's just apply the saturation only for Interlaced (Field) Mode and omit the saturation step for other modes.
Isolate all the hacks into a separate subroutine and properly document about them, should make it easier for people to understand the display rectangle setup code, the hacks were totally messing up the readability of the function earlier.
The buffer contains extra room to avoid a segmentation fault due to an overflow.
Unfortunately the end of the buffer wasn't initialized which can lead to unexpected behavior.
Based on issue #1806 it could impact Guilty Gear X2
Mesa AMD was updated :)
all drivers[1] that support GL_ARB_shader_image_load_store got GL_ARB_clear_texture
[1] Intel driver misses others extensions to run GSdx
Rendering will be corrupted (for advance effects) if the driver doesn't support it.
However it allow to run with Mesa software emulation (or inside a virtual machine)
Note: mesa still requires an override of the buffer storage extension
MESA_EXTENSION_OVERRIDE=GL_ARB_buffer_storage
Previously, the combobox will reach an indeterminate state whenever it's passed with a value out of range via ComboBoxInit(). To avoid such cases, let's initialize the current selection of the combobox with the front element of the settings vector whenever we detect an out of range value which is not declared in the vector.
To reproduce the issue, set "Renderer" to some sort of crazy value like 50 in the GSdx.ini file and it'll mess up the whole GSdx plugin dialog really bad. This patch prevents such undesirable behavior by simply selecting the front element in the vector when we read an unsupported value.
Previously, the auto output circuit selection of the GSdx wasn't good, it simply defaulted to the second output circuit even when the first output circuit is also enabled. The new algorithm for auto selecting returns the merged rectangle dimensions when both of the output circuits are enabled and if the condition for merge is not satisfied then it returns the bigger output circuit.
I didn't fix all the warnings (purpose was to realign code with "recent" update)
Linux note: only miss 2 major items
* res/tfx.cl loading
* device descriptor
* And various bug fixes ;)
Previously, when F8 was triggered multiple times in a single second, the latest captured image would replace the previous captured one as it has the same name as the previous image.
The following patch detects such cases and adds a number along with the filename when new image capture is requested under the same time as the previous capture.
This reverts commit
d6383e6c21
It created a regression in Everybody's Golf 4/Hot Shots Golf 4, breaking the renderering when depth emulation is disabled/when using a Direct3D Hardware renderer.
The code is now a mirror of the ::add. So 1 insert == 1 erase
This way it won't crash on future update. And it will support future GS
memory wrapping improvement.
ZoE2:
RemoveAt overhead plummet to 0.5%. It was 17% !
However insertion is a bit slower. Due to the begin() after the push_front
v2: use std:: for lists and arrays
Removes Alpine Racer 3 hack. Issue has been resolved.
Moves NanoBreaker hack. Issue has been resolved for OpenGL and hack has
been moved to DX only.
Moves Tri-Ace games hacks. Hacks are also necessary for OpenGL with "Partial" CRC Hack Level to prevent massive slowdown.
Move Tales Of Legendia hack back as it's also necessary for OpenGL with "Partial" CRC Hack Level to prevent graphical issues.
Close: https://github.com/PCSX2/pcsx2/issues/1698
Added PAL and NTSC-U CRC's for Ar tonelico II.
Unfortunately it requires at least GCC6. If a nice guy can check the generated code on GCC6.
I don't know clang status.
Here the only example, I have found on the web
https://developers.redhat.com/blog/2016/02/25/new-asm-flags-feature-for-x86-in-gcc-6/
Current generated code in GSTextureCache::SourceMap::Add
38b3: bsf eax,esi
38b6: add esp,0x10
38b9: test esi,esi
38bb: jne 387e <GSTextureCache::SourceMap::Add(GSTextureCache::Source*, GIFRegTEX0 const&, GSOffset*)+0x6e>
BSF already set the Z flag when input (esi) is 0. So it would be better
to not put a silly add before the jump and to skip the test operation.
OMG, Zone of Ender got a speed boost from 11 fps to 45 fps
Seriously, the goal is to allow benchmarking GSdx without too much overhead of the main renderer draw call
Note: unlike the null renderer, texture/vertex uploading, 2D draw, texture conversions are still done.
* move the post-processing frame into the OSD tab
* Rename Global Settings to Renderer Settings
* put monitor and indicator check box on the same line
At least we have a similar number of options by tab
This reverts commit f77c1900fa.
Conflicts:
plugins/GSdx/GSTextureCache.cpp
Another fix was done later for Jak cut scene (or FMV). One game got a regression (don't remember which)
It's only ever updated after the queue is updated, so its state will
always lag slightly behind it. It's sufficient to just use empty().
This seems to fix some caching issues that were noticeable on Skylake
CPUs (#998).
In the previous code, the worker thread would notify the MTGS thread
while the mutex is still locked, which could cause the MTGS thread to
wake up and immediately go back to sleep again since it can't lock the
mutex.
Use a separate mutex for waiting, which avoids the issue.
Some PSX games seem to store image data of the drawing results in an undeterminate area out of range from the current context buffer. At such cases, calculate the height of both the frame memory rectangles combined.
What happens on "Crash bash" -
* At first draw, scissoring is limited to SCAY0- 0 & SCAY1- 255
* At second draw, scissoring is limited to SCAY0- 255 & SCAY0-511
Previously, we limited the height to the value of one single output texture, so instead of that let's calculate the total height of both the two buffers combined to prevent such issues.
Previously, we only calculated the width of a single output circuit which lead to missing a single pixel from the other output circuit which in turn causes offset issues in Persona games, I have customized GetDisplayRect() to now also calculate the dimensions of the merged rectangle when both the output circuits are enabled through the PMODE register, so this hack is no longer needed. :)
TL;DR - The above commit of mine accurately handles the offset issues by calculating union of the rects, removing this stupid hack. (not insulting any other developers, this stupid hack was mine :)
Passes the merged output circuit as the base size for texture cache scaling code. Helps fixing scaling issues where games use both of the output circuits for rendering.
Future Note: Alter the behavior of IsEnabled() check always preferring the second output circuit for some weird reason. I plan on changing it to a better auto-output circuit selection mechanism but that could probably be done some time in the future.
* Upgrade the counter to signed 32 bits. 16 bits is too small to contains the 64K value.
* Read ThreadProc/m_count when the mutex is locked
* Use old value of the fetch instead to read back the new value
Use a reinterpret_cast instead of casting the function pointer address
to a void** and dereferencing it.
Also remove an unnecessary (void) and avoid including stdafx.h.
Don't adjust 'image' and just use an additional offset.
'success' was kinda unnecessary when true or false could just be
directly returned.
Move 'compression' clamping out to GSPng::Save instead.
And throw in a whole bunch of const for good measure.
JMMT uses a bigger display height on NTSC progressive scan mode, which is not really unusual hence adjust the saturation hack to only take effect on interlaced NTSC mode.
However, the whole double screen issue on FMV still exists. As a bit of information, this game has the second output disabled but seems to have some valid data inside of it, maybe the second output data is leaked into the first one? most likely a bug in the frambuffer data management rather than a CRTC issue (needs to be investigated)
Previously, the height of the frame offset was also considered for the total height of the texture which was obviously wrong as the portion before the offset value isn't part of the frame memory.
In the previous code, the threads were created and destroyed in the base
class constructor and destructor, so the threads could potentially be
active while the object is in a partially constructed or destroyed state.
The thread however, relies on a virtual function to process the queue
items, and the vtable might not be in the desired state when the object
is partially constructed or destroyed.
This probably only matters during object destruction - no items are in
the queue during object construction so the virtual function won't be
called, but items may still be queued up when the destructor is called,
so the virtual function can be called. It wasn't an issue because all
uses of the thread explicitly waited for the queues to be empty before
invoking the destructor.
Adjust the constructor to take a std::function parameter, which the
thread will use instead to process queue items, and avoid inheriting
from the GSJobQueue class. This will also eliminate the need to
explicitly wait for all jobs to finish (unless there are other external
factors, of course), which would probably make future code safer.
The previous implementation of HPO adds an offset on vertex position. It
doesn't always work beside it moves the rendering window.
The new implementation will add a texture offset so that instead to sample
the middle of the GS texel, we will sample the middle of the real texture texel.
It must be manually enabled with
* UserHacks_HalfPixelOffset_New = 1 (keep a small offset as intended by GS effect)
* UserHacks_HalfPixelOffset_New = 2 (no offset)
v2: always apply a 0.5 offset in case of float coordinates (Tales of Abyss)
Might break other games but few of them uses float coordinates to read
back the target
Doesn't fully work yet
* Unknown stack frame
* Outside any known module
Potential root cause:
* Nvidia driver
* VU code as ebp is required for emulation so likely no frame
* Use POD type to avoid SSE/AVX compilation dependency
* global object to reduce cache miss
* dynamically object so give a chance to allocate below 2GB (allow x64
optimization)
tex address is a3
vm address is a1
Could help to avoid REX prefix
Reduce prologue/epilogue register copy
Byte code size 41893 => 38912 (on my testcase)
* bin2hex.h is removed
* vptest/vpblendvb YMM support integrated upsteam
* better support of rip for 64 bits
* AVX512 support (only miss the CPU now)
Local change: add BSD3 clause