Fixes Coverity CID 127721: Program hangs
Change the sleep to a condition variable wait, which has the added
benefit of allowing the plugin to close ever so slightly faster if
there's no disc in the drive.
tex address is a3
vm address is a1
Could help to avoid REX prefix
Reduce prologue/epilogue register copy
Byte code size 41893 => 38912 (on my testcase)
* bin2hex.h is removed
* vptest/vpblendvb YMM support integrated upsteam
* better support of rip for 64 bits
* AVX512 support (only miss the CPU now)
Local change: add BSD3 clause
It will requires a generic (register naming) linear interpolation to use it properly
Gather instruction requires an extra mask register therefore all registers name will be shuffled
Perf wise, initial haswell implementation seems to be microcode emulated.
Based on Gabest's work.
* Miss mipmap
Note: dithering info
It is a bit tricky as a2 on linux was rdx register which overlap with fzm (dh/dl)
It might require dedicated windows code
The main FindMinMax methods is perf critical so instead I created a separate function
to ensure the constness of the depth
Fix letter regression on Xenosaga3
Also refactor the default drive selection and GUI code so optical drive
detection is shared.
Note: This breaks the current config, but there's only one setting
anyway.
Don't use a RAW_READ_INFO struct when only the LARGE_INTEGER member is
used. Use SetFilePointerEx which is slightly simpler and doesn't require
checking GetLastError() in some circumstances to check whether the read
has actually failed.
Also use a mutex to prevent simultaneous access from both the read
thread and the keepalive thread to prevent overlapping SetFilePointerEx
calls from causing the wrong data to be read.
And print error messages should a failure occur.
Also set the max drive speed to 4x DVD and 24xCD (down from 8x DVD and
36x CD) - it seems to reduce pausing slightly since the drive doesn't
require as much time to spin up to the desired speed.
Also set the disc speed at the correct time - CDROM SET SPEED only stays
in effect till the disc is removed.
Also fix a memleak in CDVDopen when the drive cannot be accessed.
It's rather unnecessary to use the same ioctls multiple times per disc
when the info returned doesn't change. Just use each ioctl once and
read/calculate all the necessary info all at onace.
This also fixes an issue where the IOCTL_DVD_START_SESSION ioctl is
repeatedly used if the returned session ID is 0. The previous code
assumed that 0 was not a valid session ID and would repeatedly use the
ioctl to obtain a non-zero session ID. However, 0 is a valid session ID,
and it seems IOCTL_DVD_START_SESSION can repeatedly return a 0 session
ID even if the corresponding IOCTL_DVD_END_SESSION has not been called.
In our case, a DVD session is only necessary for DVD detection and
reading the physical format information. This fix seems to alter drive
speed behaviour.
There doesn't seem to be any issues calling CreateFile with
GENERIC_WRITE access (which is necessary for SPTI) on a standard user
account, so the SPTI code should work in all cases.
Adds separate bindings for each of the pad types (DualShock2,
Guitar,Pop'n Music). This allows the user to change the button
configuration to better suit the Guitar and Pop'n Music pads without
messing up the bindings already setup for the DS2.
Close#1576.
* Explicitly cast w_pages and h_pages into uint32.
* Prevent signed/unsigned comparison by converting lod into unsigned integer, honestly how coud a mipmapping level be negative?
Previously the dedicated custom resolution scaling equation was ignored for the second SetScale() call, generalizing the equations will also fix the DMC scaling issue on custom resolution. Also remove unnecessary checks for null on scale factors. The possibility for having a null scale factor value only exists on custom resolution and it will only happen on cases where the output circuit isn't ready yet. So the ideal way would be to handle all the required conditions of output circuit on "m_renderer->CanUpscale()" itself.
Cost ought to remain small. Worst case is 2 extra "and" operation by group of pixels in scanline renderer
I think PixelAddressN functions are mostly call in the init.
CloseThread is called in the GSJobQueue destructor, so don't call it
again in the GSThread destructor.
Fixes#392, which was caused by a use after free.
Also prevents pthread_join() from being called twice for each thread
on non-Windows operating systems, which is undefined behaviour.
Code can be enabled with "wrap_gs_mem = 1". Code only allow a single shared memory but
I don't think we need more anyway.
Linux only, Kernel panic expected with the HW renderer.
Fix FMV on Silent Hill 3 with the SW renderer
Hardcode location of interface to the location 0. If I understand the
spec correctly (unlikely), variable in interface will get successive
location.
Goal is to reduce driver work. Instead to compute some location based on
name matching approach (and silly validation), the driver can now use
static allocation.
Tests on future Mesa 13 are welcome
Just clear the buffer. The generic solution will be a copy from buffer A
to buffer B But it requires
1/ a big buffer A (otherwise it would overflow)
2/ a line width rescaling (+ the upscaling mess support)
* Add full PMODE register to replace slbg/mmod
* Add full EXTBUF register (will allow to emulate write feedback)
* Add a third source (which will actually be the destination of the
write feedback)
mipmap option 3. Actually maybe a separate tri-linear option will be better
m_mipmap == 2 => use manual PS2 trilinear/mipmap
Otherwise
m_filter == 3 => always use full automatic trilinear interpolation
m_filter == 4 => use automatic trilinear interpolation when PS2 uses mipmap
m_filter == 5 => like 4 but force bilinear interpolation inside layer
A multi sector raw disk sector read that reads data from two tracks of
different types will not complete successfully. Reading the sectors one
at a time should fix the issue.
Fix Berserk #1526 Well done guys but we're more clever than you ;)
So instead to mask the color channels as any guy that RTFM, they decided to use the illegal 8H frame format
WMS/WMT 2 is the region clamping mode.
Hw unit can't emulate it right so it can give you bad filtering (Fix#1025)
Note: I only did the fix because I wanted to remove the TEXA hack. Otherwise
it is still recommended to use openGL
It must work fine without it now.
From the google code comments:
It would be nice to test those games
* Ar Tonelico 2 (line in sprite regression?)
* breath of fire dragon quarter (overlayed user interface in the game)
v2: update Dx code to use the good format
* As sw renderer, don't bother to bypass it when it is ATST_ALWAYS
* Don't update the ATE register value
=> It is a really bad idea. Next draw call will be wrong if TEST register isn't written.
The TryAlphaTest context could have been updated
GS really uses an invalid texture located at 0.
Improve the rounding for R&C. The idea is to avoid the corner case were only
the corner of the triangle touch the 7/16 edge.
* Always do +1 before the draw call
* Prefix texture name with i (as input) to keep them before the FB
Goal is to ensure that all renderers share the same draw call value.
Game: harley davidson
* write tex0 ctx0
* write tex0 ctx1
* draw ctx 0
Previous GSdx behavior will load the clut every write of TEX0. In the
above case the draw will take the wrong clut.
To be honest, it could be a wrong emulation on the EE core emulation.
The hardware likely got a single clut (1KB cache is quite expensive)
So clut loading must be skipped if the context is wrong.
Next draw will use the ctx1 clut so I apply TEX0 when the context is switched
Please test harley davidson :)
v2: detect context switch from UpdateContext function
V3: always set m_env.CTXT[i].offset.tex, avoid crash (Thanks to FlatOutPS2 that spot the issue)
V4: move bad psm correction code (rebase put it in the wrong place)
Free bt
3 0xe676d194 in ~Source ../plugins/GSdx/GSTextureCache.cpp:1526
4 0xe676d194 in GSTextureCache::SourceMap::RemoveAt(GSTextureCache::Source*) ../plugins/GSdx/GSTextureCache.cpp:1990
5 0xe676f0fe in GSTextureCache::IncAge() ../plugins/GSdx/GSTextureCache.cpp:1022
Use bt
0 0xe6772a83 in GSTextureCache::LookupSource(GIFRegTEX0 const&, GIFRegTEXA const&, GSVector4i const&) ../plugins/GSdx/GSTextureCache.cpp:204
1 0xe66b0c9f in GSRendererHW::Draw() ../plugins/GSdx/GSRendererHW.cpp:579
2 0xe66fb43e in GSState::FlushPrim() ../plugins/GSdx/GSState.cpp:1509
Hypothesis the m_map array of list contains an invalid pointer
It is populated GSTextureCache::SourceMap::Add based on the coverage. The coverage is based on the offset.
So offset is potentially wrong. As mipmap code hack the offset value. It would be a nice culprit.
This commit avoids a potential bad transition between MIPMAP (which
overwrite the "offset") and the base layer (which wrongly keep an old "offset")
Conclusion, pray for my soul as it is very hard to reproduce
Ratchet & Clank (the third) uses an address of 0 for invalid mipmap.
It would be very awkward to put the middle layer of texture in start of
memory. So let's use this information to correct the lod.
It make the game more robust on the lod rounding
* Use texraw for the unconverted texture (keep index fmt)
=> avoid bad filename order with the multiple texture layers
* add the real mipmap address
* Use a nice string format
The SPTI code is unused, and it's simpler to just use the Windows
ioctls/API if they work (only raw disk sector reading is an issue and
the SPTI workaround is already in place).
It doesn't support dual layer ISO images, and the ini has to be edited
manually so it loads an ISO image ("$" has to be prepended to the ISO
path as well). The PCSX2 internal ISO file reader is probably better in
most/all aspects and I don't think it's worth copying the logic from
PCSX2 into the plugin.
gsdx ogl: only use geometry shader to convert big enough draw call
The purpose of geometry shader is to reduce bandwidth (72 bytes by sprite)
and CPU load.
Unfortunately it increases CPU load due to extra shader validations.
So geometry shader will only be enabled for draw call with more than
16 sprites (arbitrarily, smallest number before shadow hearts plummet)
v2: don't disable geometry shader in replayer.
It is easier to spot sprite rendering and to manually read vertex info.
Warning can be reenabled on GCC
A warning isn't fixed as potentially the code is wrong
../pcsx2/gui/MemoryCardFolder.cpp: In member function ‘void FolderMemoryCard::FlushFileEntries(u32, u32, const wxString&, MemoryCardFileMetadataReference*)’:
../pcsx2/gui/MemoryCardFolder.cpp:1027:10: warning: unused variable ‘filenameCleaned’ [-Wunused-variable]
bool filenameCleaned = FileAccessHelper::CleanMemcardFilename( cleanName );
Strangely the game uses large texture to handle texture buffer.
I think it plays with WMS/WMT. I'm not sure texture shuffling is 100%
correct here. But without it, it's completely broken.
* 0x9C712FF0, Jak1, EU
* 0x472E7699, Jak1, US
* 0x2479F4A9, Jak2, EU
* 0x12804727, Jak3, EU
* 0xDF659E77, JakX, EU
Please report me the CRC of the US version too so I can add them.
Please test the shadows rendering (openGL HW + accurate blending at least basic)
The game sets the framebuffer as an input texture. So I did the same for
openGL. Code is protected with a CRC. It is working because the game want to sample
pixels.
For the record, I tested it GTA too, it doesn't work as expected because
the game will resize the framebuffer to a smaller one. So you don't have
the guarantee that pixel will be read before a data write.
Note: it requires at least accurate blending set on basic
Note: I need CRC of all Jak games that suffers of this issue. Thanks you :)
Default copy-constructor is eight 32 bits move
GSRendererOGL::Lines2Sprites code shrinks from 510B to 398B
(loop of the function 296B => 181B). Hopefully it will reduce the cost
to convert line to sprite on the CPU (i.e. when geometry shader is disabled)
It impacts all renderers. It ought to fix issue in GTA radiosity,
Shadows in Jak series. (note shadows will suck in upscaling)
Implementation is really brutal. Expect a massive slow down, but at least we can test the effect
easily.
Normally perf impact will remain reasonable if the game doesn't use a Read-Write effect
Performances number are welcomes (my guess is really awful in HW mode, slow in SW mode).
You can enable it with "UserHacks_AutoFlush = 1"
GS memory is only 4MB but rendering is allowed to be 2048x2048
with 32 bits format (so 16MB). Technically the frame/depth buffer can start
at the end of the GS memory. Let's not waste too much memory.
Fix crash with BASARAX
(game draws a 2048x1664 32 bits area)
The hack only fix the HW renderer but not the SW renderer. However I'm not sure
the issue is from GSdx.
The hack will disable alpha test that used to generate empty draw call.
Manual gives all setup to upload a palette from the host. But nothing forbid to render
directly in the palette buffer. (GS rule nb 1, there is no rule ^^)
Fix Virtua Fighter 2 dark colors
However I'm not sure we can fix HW renderer. Rendering is done on the GPU but palette
handling is done on the CPU... So we need to read back data (ouch, and slow). A quick
test didn't get the expected results. Potentially there are others bugs (aka not gonna
happen on the HW renderer)
IOCTL_CDROM_RAW_READ apparently does not work for some read modes on
some optical drives, which makes some CD-ROM games unplayable from the
disc.
Work around the issue by using SPTI to retrieve the raw sector data. The
old reading method has been retained in case SPTI cannot be used (if the
device could not be opened with write access).
Fix motocross mania missing texture. Close#1319
As far as I understand, transfer is initialized in DIR. But the real
write only occured later so the blit buffer could have been overwritten
by a new value.
BLIT 0 13700
TREG 40 40
DIR 0 0
BLIT 0 13f00 <=== the bad guy
Write! ... => 0x3f00 W:1 F:C_32 (DIR 00), dPos(0 0) size(64 64)
v2: set a value in m_tr.m_blit for load state
It creates a regression on game that uses a small temporary target to
upload textures of various sizes. Inital code was done to handle direct
frame write (background, FMV) so big target
Purpose is to control the filtering when final image is displayed on the screen
Could improve the sharpness of the output in some games (ofc, it will be pixelated)
When depth primitive is constant and depth test is greater or equal, we can
execute the depth write after color (depth status will only depends on the initial
value)
New case for RGB_ONLY ate:
If the blending equation uses a fixed alpha or a source alpha. We can postpone the alpha write
in a 2nd pass.
If depth can also be postponed, we can guarantee the order of correctness of the value.
1st pass => do RGB
2nd pass => do Alpha & Depth
It fixed Stuntman letter rendering :) Remaining of the game is still broken :(
At higher resolutions it takes too much time to save a screenshot at the
maximum compression level. So let's allow the user to set the
compression level.
This re-uses the png_compression_level setting. The default compression
level is 1 for speed, but if the user wishes to increase the compression
level (without using an external tool) and doesn't mind if the
screenshot takes more time to save then they can increase the
compression level up to a maximum of 9 (which can take quite a while).
Fixes#1527.
PNG_LIBRARIES adds both libpng and zlib to the command line.
PNG_LIBRARY only adds libpng to the linker command line, and the cmake
documentation also suggests not to use it.
Value seems wrongly rounded and you can't distinguish 0xFFFF from 0xFFFE
Instead check that depth is constant for the draw call and the value from the vertex buffer
Fix recent regression on GTA (and likely various games)
In FB_ONLY mode the alpha test impacts (discard) only the depth value.
If there is no depth buffer, we don't care about depth write. So alpha
test is useless and we can do the draw with a single draw call and no program
switch
Extend GSVector to support float move
Initial code likely used integer move for performance reason. However due to
the nan correction, register is now in float domain.
I hope it wasn't done on purpose.
CID 168624 (#1 of 1): Missing break in switch (MISSING_BREAK)
unterminated_case: The case for value CMD_CONFIG_MODE is not terminated by a 'break' statement.
CID 168626 (#1 of 1): Uninitialized scalar field
uninit_member: Non-static class member m_end_block is not initialized in this constructor nor in any functions that it calls.
Fix rendering issue on letters on Kengo/burnout 3/...
Default algo will execute the alpha test in 2 passes. However due to blending
you can't handle accurately the color.
Fortunately for us, the rendering uses an always pass depth test so you
can execute first all the color rendering (which doesn't depends on the alpha test)
And then the depth part which depends on the alpha test.
* Code was factorized a bit with the help of max_z
* Add an extra optimization if test is ZTST_GEQUAL and min z value is
the biggest value. Z test will always be pass.
Note: due to float rounding (23 bits mantissa vs 24 bits depth) the test
is done against 0xFF_FFFE and not 0xFF_FFFF. It is wrong but GPU will
also use float so impact will be null.
CreateEvent and CreateThread return NULL on failure, not
INVALID_HANDLE_VALUE. This should have been done in
0477e03965, I didn't check thoroughly
enough.
The solution files are unused and for ancient Visual Studio versions -
GSDumpGUI has its own solution file, and bin2cpp is included in the main
solution file.
The property sheets have either fallen out of use or were never used in
the first place.
Fixes a regression introduced by 46ba9aa117,
where the Linux GS replayer would always use the options in inis/GSdx.ini
(or use the default options if that doesn't exist) to replay the dump,
instead of using the GSdx.ini from the specified ini folder.
Update done on f712c5c6d0
Previous code use the size of the draw to compute latest block. I
don't know why I use .x/.y which are the origin offset so the start of the block.
Check the instruction set first in GPUinit, GPUconfigure and GPUtext
to prevent unsupported vector instructions from being executed.
Move the vector initialisation in GPUinit to a separate function - it
avoids a vzeroupper instruction.
vector push_back causes a SIGILL signal on a Nehalem (SSE4.2) QEMU VM
when compiled with GCC 6.1.1.
However, an empty constructor causes illegal instruction exceptions to be
generated on a Windows VM.
So here's an inbetween that looks stupid but works on what I've tested.
Not tested
* rumble
Save/load state will be implemented in the next commit
v2:
* Print current deteced pad mode
* fix dpad button tranmission
close#366
Fix#1457 (GTA)
The game uses a depth format for a pure color buffer (cokes do ravage
in gaming industry)
However I'm really afraid that it migth break another effect in other games.
Combine all the different configurations together so the project files
are more generic and maintainable.
Also standardise the layout so all the project files will be similar and
all have the same standard elements (even if empty).
Add 64-bit configurations.
Additional specifics:
spu2-x: FLOAT_SAMPLES preprocessor definition removed since it's unused.
Use a relaxed atomic to read the exit variable in the hot path
Wait that exit is deasserted in the destructor, so we are sure the
thread will "soon" return
Value could range from 1 to 9. Default is 4 and it is potentially the
best option. Feel free to test some values on your system, behavior
might depends on the core number and thread number
Value is exponential so 4 is 2 times more pixels than 3.
Small value increased thread overhead, big value increase wait/sync latency
memory overhead by thead is only 256KB
However it will reduce the probability to block the push thread to nearly 0
I tested a couple of dumps and only manage 4000 element with 1 extrathread.
Add a factor 2 on the VRAM to get the quantity of available memory for the textures.
The driver is allowed to put some textures in RAM. Of course it is bad for performance
but it won't crash.
Due to the 4GB by process limit, I keep a (reasonable) maximum of 3.8GB.
In order to avoid a crash when memory is too low an exception will be risen
with no guarantee on rendering and big performance impact. In this situation
you ought to reduce upscaling/disable large framebuffer.
* Does the first vsync (start counter) after the sleep
* Dump data after the rendering, avoid to count extra destructor,sleep time
* Dump data into a basic csv file (if people want nice graph)
There is only a single event queue, so you need to detect the pad based
on the configuration
Mouse/Wiimote is limited to first pad
Related to issue #1441
In file included from GSRenderer.cpp:23:0:
GSRenderer.h: In constructor ‘GSRenderer::GSRenderer()’:
GSRenderer.h:58:12: warning: ‘GSRenderer::m_dev’ will be initialized after [-Wreorder]
GSDevice* m_dev;
^
GSRenderer.h:52:13: warning: ‘GSVector2i GSRenderer::m_real_size’ [-Wreorder]
GSVector2i m_real_size;
^
GSRenderer.cpp:32:1: warning: when initialized here [-Wreorder]
usb-kbd: Remove unused variable.
usb-ohci: Add proper casts for the variables.
vl: Add proper casts for the variables.
USB: Add proper casts for the variables.
GLLoader: cast passed parameters to required type.
GSDeviceOGL: cast variables to required type and silence warnings.
GSRendererOGL: cast variables to required type and silence warnings.
It creates some slowdowns for unknown reason. My best hypothesis is
that stencil will be cleared too which is slow.
Let's keep the code for the future when stencil will be dropped.
Fix#1420
If baseline and display rectangle offsets differ by small values then consider the status of frame memory offsets, prevents blurring on Tenchu: Fatal Shadows, Worms 3D, ProStroke Golf, Vexx
* Improve frame buffer height management on custom resolution. Width seems to be fine with the same size as scaled image output.
* Prevent offset issues on Persona 3 based on the data from merge circuit.
Note: Fixes custom resolution upscaling on ICO 50Hz/60Hz mode when large frame buffer is enabled. previously 60Hz mode only displayed half of the screen and 50Hz mode only worked due to the scissor hack.
* Ignore Frame memory offsets for calculating dimensions value of display rectangle.
* Remove hack which limited scaling size based on the scissor value.
Note: With the following commit, SilentHill 2 now properly outputs the desired resolution by the users on custom resolution. Previously if we set 1024 x 1024 , it'll output a lower height value which was caused by the hack removed on this commit.
Fixes regression introduced by the pop'n music controller support PR.
When modifying the axis direction combo box in the Configure Binding
group, the modified binding's info would get deleted and replaced by the
next binding's info. This results in incorrect info being passed to
BindCommand().
This commit reverts the incorrect code so the binding info is backed up
before deletion takes place, therefore ensuring the correct info is
passed to BindCommand().
It ought to be the same in performance but code will be easier this way
v2: print the sync status
v3: use a performance print so it doesn't spam the console
* Support Mesa Nouveau IR (free driver for Nvidia's GPU)
=> Print intermediate representation + final shader
=> Dump GPR usage
* Move dumped shader in /tmp/GSdx_Shader/<sub_dir>
=> Avoid the landing of 3 thousands of files in $PWD ^^
* Use function instead of macro
The layout of the buttons is improved to more closely resemble a modern analog controller/DualShock configuration, and the Device column of the list view has been reduced slighty so by default the horizontal scroll bar isn't visible.
Using __declspec(dllexport) causes duplicate export warnings to be
generated when compiling 64-bit builds. Name mangling also occurs on
functions that are exported this way, so it doesn't actually work with
the plugin system, which uses unmangled names.
The module definition file exports the functions without name mangling
and is sufficient on its own.
I accidentally removed it in a previous commit. It probably didn't
affect anyone though (you'd need to be using a DS3 via libusb, most
people will be using other methods).
Increase the performance on the free driver (Nouveau)
Currently the driver validates all UBO when only 1 is updated. It
is clearly a bad idea to put all UBO in a single common headers.