Commit Graph

9267 Commits

Author SHA1 Message Date
Gregory Hainaut d812222061 vif: use u32 code instead of u8/u16
It avoids memory stalls and greatly reduces the overhead of the dVifUnpack function

Here a vtune summary of this branch (done on SotC init)

dVifUnpack<1> was 14.5% of effective VU thread time
dVifUnpack<1> is now 3.8% of effective VU thread time

I hope it will translate to better fps
2016-12-18 22:44:24 +01:00
Gregory Hainaut ef75b36013 vif: move back the cache seach in the unpack function
Avoid the various move to return the value (actually due to the pointer)
2016-12-18 22:44:22 +01:00
Gregory Hainaut e4c2c53b19 vif: inline dVifsetVUptr function
It avoid a double cmp/jmp on the dynarec/interpreter mode.
2016-12-18 22:44:01 +01:00
Gregory Hainaut 6ae082dab2 vif: compute the length during the compilation stage 2016-12-18 22:44:00 +01:00
Gregory Hainaut 7a33cda122 vif: replace sse cmp code with standard cmp
Standard instruction are faster to execute besides the CPU can optimize the cmp/jne

SSE

  e0:	add    ecx,0x10
  e3:	cmp    eax,0x7
  e6:	jg     1b0 <void dVifUnpack<0>(unsigned char const*, bool)+0x1b0>
enter_loop:
  ec:	vpcmpeqd xmm0,xmm1,XMMWORD PTR [ecx]
  f0:	vmovmskps eax,xmm0
  f4:	cmp    eax,0x7
  f7:	jne    e0 <void dVifUnpack<0>(unsigned char const*, bool)+0xe0>

Standard cmp

  d8:	add    eax,0x10
  db:	mov    esi,DWORD PTR [eax+0xc]
  de:	test   esi,esi
  e0:	je     190 <void dVifUnpack<0>(unsigned char const*, bool)+0x190>
enter_loop:
  e6:	cmp    ecx,DWORD PTR [eax+0x4]
  e9:	jne    d8 <void dVifUnpack<0>(unsigned char const*, bool)+0xd8>
  eb:	cmp    DWORD PTR [eax+0x8],ebx
  ee:	jne    d8 <void dVifUnpack<0>(unsigned char const*, bool)+0xd8>

v2: use reference instead of a pointer for find parameter
2016-12-18 22:43:07 +01:00
Gregory Hainaut 2320efeb55 vif: increase buckets number to 64K
It allow to compare only 8B in the lookup so SSE could be replaced with general instruction

As a bonus, it allow to compute the hash key with a mov rather than modulo (which was an 'and')
2016-12-18 14:05:55 +01:00
Gregory Hainaut 1a32062439 vif: repack nVifBlock struct
cl/wl can fit in a single byte. Add a 2B length field instead.
It will contains the pre computed length to reduce dVifsetVUptr overhead
2016-12-18 14:05:55 +01:00
Gregory Hainaut d34e99b38b vif: handle the special case 0 in the compilation stage (rather than lookup) 2016-12-18 14:05:55 +01:00
Gregory Hainaut 555c96a941 vif: reorganize dVifUnpack
Inline the execution part
Add a num parameter to dVifsetVUptr
Use a local variable for the nVifBlock instead of a global struct state

The goal is to ease future update of the nVifBlock struct
2016-12-18 14:05:55 +01:00
Gregory Hainaut 10b3d429fe vif: new implementation of the hash bucket
Previous implementation saved the both the chain pointer and the chain size
Rational: size is useful to add new element and to detect the end of the chain
Vif cache is rarely miss. So 'add' is barely called and the end of a chain is
barely reached.

New implementation will add a null cell at the end of the chain. As a
cell contains a x86 pointer, if is null you could conclude that you
reach the end of the chain.

The 'add' function will traverse the chain to get the current size. It is
a cold path besides the chain is often short (< 4).

The 'find' function only need to check the startPtr bytes to detect the end
of the loop.

Note: SizeChain was replaced with a std::array
2016-12-18 14:05:53 +01:00
Gregory Hainaut c58b04979f vif: remove the type template of HashBucket
The class is designed and optimized for the layout of nVifBlock.
Besides it will ease future improvement.
2016-12-18 13:41:14 +01:00
Gregory Hainaut c368618d09 vif: use intrinsic cast instead of ugly define 2016-12-18 13:41:14 +01:00
Gregory Hainaut 1acc81c25d vif: don't allocate vifblock hash on the heap
Avoid an extra indirection to access the hash bucket (Find function)
2016-12-18 13:41:14 +01:00
Gregory Hainaut 3dc7dc0cdc vif: improve block compilation management
Safety:
* check remaining space before compilation
* clear hash if recompiler is reset

Perf:
* don't research the hash after a miss
* reduce branching in Unpack/ExecuteUnpack

Note: a potential speed optimization for dVifsetVUptr
Precompute the length and store in the cache. However it need 2B on the
nVifBlock struct. Maybe we can compact cl/wl. Or merge aligned with upkType
(if some bits are useless)
2016-12-18 13:41:13 +01:00
Gregory Hainaut b0b5c27fec vif: remove useless state from nVifStruct 2016-12-18 13:23:07 +01:00
Gregory Hainaut c2587abcea mVU: always call perf before leaving the compilation function
I misses some early return in my first tentative. Now VTune shows me
properly the time in VU recompiler.

Note: It seem some block overlap (likely due to the branching mess). But it is still way better than no data
2016-12-16 22:01:06 +01:00
Gregory Hainaut 632b4971de common: remove memset duplicates
Use standard memset instead of memset_8

Move memzero/memset8 in a common OS file.
2016-12-16 20:45:22 +01:00
Gregory Hainaut b3474b5a71 MTVU/gif: prebuilt the fake packet
GS_Packet constructor calls memset which is quite slow and useless as data is overwritten

Vtune overhead of Gif_Unit::Execute goes from 5.8% to 3.0% (EE thread)
2016-12-16 10:31:23 +01:00
ramapcsx2 29d229264d Merge pull request #1696 from FlatOutPS2/master
psxmode: Correct exe name for several PSX titles
2016-12-13 23:54:58 +01:00
FlatOutPS2 ff98dac104 psxmode: Correct exe name for several PSX titles
Several PSX titles lack a backslash in the elf path, which made the disc
serial contain 'cdrom:', this caused savestate issues in those ganes.

Solves: https://github.com/PCSX2/pcsx2/issues/1692
2016-12-13 17:32:26 +01:00
Jonathan Li 61669d1f3f gsdx:png: Fix accidental resource leak
Oops.

Unfortunately it'll reintroduce the clobbering warning on gcc 4.9.
2016-12-12 23:08:30 +00:00
Jonathan Li b178423166 gsdx-replayer:cmake: Reduce build time/filesize
Avoid building GSdx twice if the replayer is being built.
2016-12-12 18:54:54 +00:00
Jonathan Li 2c3fd160c3 gsdx-replayer:linux: Fix strict-aliasing warnings
Use a reinterpret_cast instead of casting the function pointer address
to a void** and dereferencing it.

Also remove an unnecessary (void) and avoid including stdafx.h.
2016-12-12 18:14:38 +00:00
Jonathan Li d4a6e18c01 gsdx:png: Fix gcc clobber warnings
Don't adjust 'image' and just use an additional offset.
'success' was kinda unnecessary when true or false could just be
directly returned.
Move 'compression' clamping out to GSPng::Save instead.

And throw in a whole bunch of const for good measure.
2016-12-12 17:39:05 +00:00
Jonathan Li 415090d249 common: Avoid wchar_t in pxTextWrapper
wchar_t is 16-bits on Windows, which can't actually properly fit all
Unicode characters.

Use the wx3.0.x wxTextWrapper approach of using iterators that increment
by actual characters to fix the issue, and also switch to using the
std::string style functions in wxString.
2016-12-10 22:30:27 +00:00
Jonathan Li afe86a5f66 cmake: Only use -fprofile-dir when PGO is used
It stops clang from warning that '-fprofile-dir' is not supported.
2016-12-10 21:51:21 +00:00
Akash a83042d5c0 PCSX2-WX: Update strings in Language dialog 2016-12-10 12:35:57 +00:00
Akash 83eb79c9d9 PCSX2-WX: Proper source medium on menuitem
Previously the boot menu items always displayed "Boot CDVD" regardless of the current source medium, this behavior has been fixed to properly adjust the text when source medium is changed. Now it'll display Boot CDVD/ISO/BIOS with respect to the current source medium.

v2: Some instances of "Iso" have been changed to "ISO" for consistency.
v3: Remove the unnecessary "Reboot" on menu item labels, saves some string translations.
v4: Add a new shortcut key for the primary boot menu item.
2016-12-10 12:35:57 +00:00
Akash b86518ef24 CDVD: Convert CDVD_SourceType into enum class
* Add a template function for underlying type conversions of enumerations
2016-12-10 12:35:57 +00:00
Akash f367fa5a98 PCSX2-WX: Fix Shutdown menu item behavior
There is already a dedicated bind event to handle the gray out of the menu item, so let's just gray it out initially and let the bind event handler do it's thing.

The previous behavior would only gray out the menu item when all the plugins are in a non-active state which didn't seem ideal as the plugins were shutdown only when closing PCSX2 (or) switching plugins.
2016-12-10 12:35:57 +00:00
Akash 259b81317d PCSX2-WX: Disable HostFs for release builds 2016-12-10 12:35:57 +00:00
FlatOutPS2 947b6b5503 LilyPad: Add Device Select option
Adds a device select option that hides bindings and disables binding new
inputs from all non-selected devices on the bindings list. This also
avoids input conflict issues when one controller is recognized as
several devices through different APIs.
2016-12-10 12:16:44 +00:00
FlatOutPS2 872ab9d2b1 LilyPad: Add Configure on bind option
Part of the GUI update, this function switches to the configuration page
immediately after binding an input instead of staying on the bindings
page.
2016-12-10 12:16:44 +00:00
FlatOutPS2 1f8608f6dd LilyPad: GUI update
Updates the UI by reducing the height of the plugin window. This has
been achieved by removing some buttons below the diagnostics and
bindings list and incorporating those functions into the
lists(accessible by right-clicking in the list). The binding
configurations on the Pad tabs have been moved to a separate page, like
the Forcefeedback bindings, to separate the configuration from the
bindings.
2016-12-10 12:16:44 +00:00
FlatOutPS2 deaceb6b08 LilyPad: Add skip deadzone option
Adds a skip deadzone option to the Pad tabs.

With the normal deadzone, if the control input value is below the
deadzone threshold, the input is ignored.
However, some controllers also benefit from shortening the input range
by skipping a deadzone.
2016-12-10 12:16:44 +00:00
Akash 61a6fe9cd9 GSDX: Apply saturation only to interlaced video mode
JMMT uses a bigger display height on NTSC progressive scan mode, which is not really unusual hence adjust the saturation hack to only take effect on interlaced NTSC mode.

However, the whole double screen issue on FMV still exists. As a bit of information, this game has the second output disabled but seems to have some valid data inside of it, maybe the second output data is leaked into the first one? most likely a bug in the frambuffer data management rather than a CRTC issue (needs to be investigated)
2016-12-10 11:29:10 +01:00
np511 b9d57843eb Adds PGO support. Profile data is stored in a folder called profile
in the top-level source directory. The build folder should NOT be
transferred between computers when PGO is used, though I don't
see why anyone would be doing so anyway.

Also adds support for PGO and LTO to the build.sh script.
2016-12-10 11:26:16 +01:00
Gregory Hainaut 40ac87c9bc Merge pull request #1690 from PCSX2/greg/vtune
Greg/vtune
2016-12-10 11:25:58 +01:00
Gregory Hainaut 7f64f39c05 vtune: count the number of ERET to trigger a quick exit
The purpose is to stop vtune profiling in a predictable way. It allows
to compare multiple runs.

ERET is called every syscall/interrupt return so it is proportional to
the EE program execution.
2016-12-10 11:07:35 +01:00
Gregory Hainaut 031b6e6372 common: improve vtune merge support
Mapping the full buffer is killer on Vtune (either crash or requires a huge processing time).
Instead keep the same ID for code in the same buffers.

I think all buffers are correctly mapped now but I still miss the frame pointer
for VU code.
2016-12-09 09:28:19 +01:00
Gregory Hainaut b9369e7c00 pcsx2: remove the reserve feature of recompiler memory
Cons:
* requires ~180MB of physical memory (virtual memory is the same so it
  doesn't impact the 4GB limit)

From steam: 98.81% got at least 2GB of RAM. 83.62% got at least 4GB of RAM.
That being said, it might not really increase RAM requirements as OS could put the
new allocation in the swap.

Pro:
* code is much easier
* remove at least half of the signal listener
* last but not least, it is way easier for profiler/debugger
2016-12-09 09:28:19 +01:00
Gregory Hainaut 903d3595e5 pcsx2: add a --profiling cli option
Disable Framelimiter and Vsync

So you can profile real data instead of the idle time between vsync ;)
2016-12-09 09:28:19 +01:00
Gregory Hainaut 0453e5cad8 cmake: improve vtune integration
Year is included in the path so search in order 2018/2017/2016

Not ideal but at least all logic is inside the FindVtune module
2016-12-09 09:28:19 +01:00
Akash 07d7905896 GSDX: Fix output texture height calculation
Previously, the height of the frame offset was also considered for the total height of the texture which was obviously wrong as the portion before the offset value isn't part of the frame memory.
2016-12-08 22:14:05 +01:00
Gregory Hainaut 4d39bbe329 Merge pull request #1688 from turtleli/gsdx-thread
gsdx: Use std::thread and std::function for GSJobQueue
2016-12-08 22:07:36 +01:00
Jonathan Li ac78688a32 gsdx: Make GSJobQueue non-inheritable
In the previous code, the threads were created and destroyed in the base
class constructor and destructor, so the threads could potentially be
active while the object is in a partially constructed or destroyed state.
The thread however, relies on a virtual function to process the queue
items, and the vtable might not be in the desired state when the object
is partially constructed or destroyed.

This probably only matters during object destruction - no items are in
the queue during object construction so the virtual function won't be
called, but items may still be queued up when the destructor is called,
so the virtual function can be called. It wasn't an issue because all
uses of the thread explicitly waited for the queues to be empty before
invoking the destructor.

Adjust the constructor to take a std::function parameter, which the
thread will use instead to process queue items, and avoid inheriting
from the GSJobQueue class. This will also eliminate the need to
explicitly wait for all jobs to finish (unless there are other external
factors, of course), which would probably make future code safer.
2016-12-08 01:18:17 +00:00
Jonathan Li cdeed349e3 gsdx: Replace platform-specific threads with std::thread
GSThread now doesn't seem to have a purpose, so it's been removed.
2016-12-08 00:36:32 +00:00
Jonathan Li faa46bb62d gui: Fix Plugin Selector panel memory leak
SafeList is totally unsafe for non-POD objects.
2016-12-07 20:25:45 +00:00
Jonathan Li 592d4b024a cdvdgigaherz:linux: Swap Ok and Cancel button order
This now matches the usual GTK GUI button order.

Also bump the version number.
2016-12-07 01:40:44 +00:00
Jonathan Li 1d634f9b44 cdvdgigaherz:linux: Use pread instead of lseek + read
It'll make it unnecessary to use a lock when reading disc sectors.
2016-12-07 00:54:11 +00:00