Commit Graph

6193 Commits

Author SHA1 Message Date
rogerman c7bb41e4b1 matrix.cpp: Rework all matrix function parameters for explicit array sizing in order to aid compiler optimization and (hopefully) aid in code readability. Also add SSE4.1 versions for the main matrix functions. 2018-02-19 11:43:55 -08:00
rogerman 5a61a08727 matrix.cpp: Do some more code cleanup. 2018-02-16 20:11:36 -08:00
rogerman 249afccfca matrix.cpp: Do a bunch of code cleanup. 2018-02-16 11:59:19 -08:00
rogerman c41a006b2a GPU: Add additional basic SIMD-accelerated functions for memset_u16(), memset_u16_fast(), memset_u32(), and memset_u32_fast() for AVX2 and Altivec. 2018-02-13 14:45:17 -08:00
rogerman 5fbaa53b46 GPU: If a custom-sized layer is to be rendered first, GPUEngineBase::_TransitionLineNativeToCustom() will do a line clear instead of an upscaled line copy.
- Since this is a very common occurrence in many games, and since doing a clear is faster than doing an upscaled copy, this should give a small performance improvement for the larger framebuffer sizes.
2018-02-13 13:54:10 -08:00
rogerman 43d3883986 SoftRasterizer: Framebuffer clears are now accelerated using AVX2 and Altivec. 2018-02-12 18:03:52 -08:00
rogerman ab18de05ef SoftRasterizer: Oops! Fix a performance regression in SoftRasterizerRenderer_SSE2::ClearUsingValues() where the framebuffer was accidentally being cleared twice. (Regression from commit 7509d46.) 2018-02-12 13:42:42 -08:00
rogerman 7509d469b9 SoftRasterizer: Do some multithreading improvements, and also clean up and refactor RasterizerUnit.
- Completely encapsulate all stray global variables into the SoftRasterizer class where they belong.
- Framebuffer clears are now fully multithreaded, significantly improving clearing performance.
- Doing multithreaded texture loads and vertex calculations now requires a minimum of 2 threads, down from 4 threads.
- The maximum number of SoftRasterizer threads has been increased from 16 to 32.
2018-02-12 11:35:21 -08:00
rogerman 9e3b694ace Cocoa Port: Do some minor code cleanup. 2018-02-05 20:29:09 -08:00
rogerman d1dcbb8218 Cocoa Port: Fix a potential deadlock that may occur on emulation reset. 2018-02-04 13:07:59 -08:00
rogerman 9ee7cd8ec0 NDSSystem.cpp: Check for the GPU struct before calling GPUSubsystem::ForceFrameStop() in GameInfo::closeROM(). 2018-02-03 21:59:00 -08:00
rogerman 23be799a67 Cocoa Port: Metal display views no longer lose visible frames when running multiple display views. 2018-02-03 21:21:54 -08:00
rogerman 01c508f93a Cocoa Port: Remove and replace the high-overhead NSThread with the lower-overhead pthread_t. Improves video display performance when the frame rate is very high (greater than 600 FPS). 2018-02-03 11:31:41 -08:00
rogerman f9c32c9e79 Cocoa Port: Rework triple buffering for Metal display views yet again. This should fix the performance regression introduced in commit a65ceae9 for the larger custom framebuffer sizes. 2018-01-30 16:26:05 -08:00
zeromus 2a58246eb5
Merge pull request #123 from keelimeguy/master
Windows Port: Adding Pen and Touch support for touch screen devices
2018-01-08 16:04:11 -06:00
Keelin Wheeler b11bde4be4 Windows Port: Adding Pen and Touch support for touch screen devices 2018-01-08 16:37:20 -05:00
rogerman f2f3680a7c Cocoa Port: Fix a bug where Metal display view backing textures weren't updating their custom framebuffer sizes. (Regression from commit 4c01e66.) 2017-12-19 15:35:20 -08:00
rogerman 4c01e66a8a GPU: Instead of using fixed double-buffered output framebuffers, allow clients to request any number of framebuffer pages between 1 and 8.
- For all non-Cocoa ports, reduce the number of framebuffer pages from 2 to 1, reducing the memory usage for those ports.
- For the Cocoa port, increase the number of framebuffer pages from 2 to 3 in preparation for a new triple-buffered display scheme.
2017-12-19 14:33:48 -08:00
rogerman d3b628af47 Cocoa Port: Rework synchronization for Metal display views yet again. It should be a lot better now. 2017-12-17 20:35:00 -08:00
rogerman a65ceae98f Cocoa Port: For Metal display views, be much smarter about how we do synchronization. Should fix the performance issues introduced with commit 26ac91ed. 2017-12-11 16:28:42 -08:00
rogerman 1ea95cdde4 Cocoa Port: Do some minor code cleanup. 2017-12-11 16:17:02 -08:00
rogerman cd2f75e43a Cocoa Port: Replace all POSIX named semaphores with Mach semaphores and GCD semaphores, which are both faster than POSIX named semaphores. 2017-12-08 11:49:49 -08:00
rogerman 1e36b36bef Cocoa Port: Remove now obsolete locks from sound functions, since we now call SPU_Emulate_user() in the emulation thread again. 2017-12-07 23:00:28 -08:00
rogerman cee6867bd8 Cocoa Port: In RunCoreThread(), don’t use a potentially more expensive wait method before doing a cheaper time comparison first. 2017-12-07 21:01:59 -08:00
rogerman bac10c7618 Cocoa Port: OpenGL display views no longer use glFlush() when rendering for final flush, since glFlush() has been found to not actually be necessary. 2017-12-05 17:13:24 -08:00
rogerman 26ac91edd0 Cocoa Port: For Metal display views, replace all locks with semaphores, which are the correct synchronization primitive to use here.
- Also change the CocoaDSOutput list lock from a mutex to a rwlock, since testing has shown that there is more thread contention here than I previously thought.
2017-12-05 13:43:30 -08:00
rogerman f9109568b8 Cocoa Port: Improve stability of Metal display views when running CPU-based pixel scalers.
- Also fix a bug where, when restoring multiple display windows on startup, only the last display window shown would work properly.
2017-12-03 00:18:30 -08:00
rogerman 39039f2396 Cocoa Port: Have the HUD Settings panel title show the number of the display window that is currently in focus, just like all the other panels. 2017-12-02 19:33:08 -08:00
rogerman b48666ea9c Cocoa Port: Add new HUD item, “Show Execution Speed”, which displays the emulator’s execution speed as a percentage. 2017-12-02 15:35:51 -08:00
zeromus 87335dd57a
Merge pull request #121 from spiveeworks/master
Update README.LIN (autogen.sh, dependencies)
2017-12-01 01:11:48 -06:00
Spivee 93d01f7bf5
Update README.LIN (autogen.sh, dependencies)
I found these dependencies harder to figure out than usual, since I'm used to installing packages with pregenerated `configure` scripts.
In particular, if `glib` is missing then `configure` will generate with unexpanded macros, which is confusing.
This extra paragraph should be helpful for others.

Thanks for a great program :)
2017-12-01 18:07:11 +11:00
rogerman 66e8a95657 Cocoa Port: Stability improvements for Metal display views. 2017-11-29 21:31:39 -08:00
rogerman 02a3b58edd Cocoa Port: Fix memory leaks with Metal display views.
- Also fix a bug where Metal display views fail on macOS High Sierra if a CPU-based pixel scaler was used.
2017-11-29 20:02:39 -08:00
rogerman c81df97a92 Cocoa Port: Restore the ability to use Metal display views on macOS High Sierra.
- Also rework the way the HQnx LUTs are loaded in Metal.
2017-11-28 14:06:34 -08:00
rogerman cd6fbcd5ea Cocoa Port: In the Metal framebuffer fetcher, further optimize 18-bit to 32-bit color conversions whenever the master brightness does not need to be applied, which is the most typical use case. 2017-11-28 00:53:50 -08:00
rogerman f0564cc4ac Cocoa Port: Fix a couple of rare edge-case bugs with Metal display views. 2017-11-27 22:53:18 -08:00
rogerman 258ebfd6ea Cocoa Port: Synchronously force a framebuffer fetch on startup to guarantee that all display windows will appear black. 2017-11-27 21:17:16 -08:00
rogerman e18dd27d30 Cocoa Port: Fix the Screenshot Capture Tool when running Metal. (Regression from commit f5ead86.) 2017-11-27 21:15:26 -08:00
rogerman 7213c6373b GPU: All fields for NDSDisplayInfo should be set consistently relative to the NDSDisplayID, not the GPUEngineID.
- In practice, this should change nothing, since all pointers somehow managed to point to the correct buffer locations. This should be nothing more than a programming consistency and readability improvement.
2017-11-27 21:07:14 -08:00
zeromus fa4b027dbd winport: add --windowed-fullscreen 2017-11-27 18:16:40 -06:00
rogerman d5b62d3d02 OpenGL Renderer: Improve the robustness of error-checking OpenGL drivers. 2017-11-24 02:02:05 -08:00
rogerman 010efff31b Linux Port (GTK): Fix OSMesa context creation. (Regression from r4905. Fixes #119.)
- Also do some code cleanup.
2017-11-24 00:28:49 -08:00
rogerman b9ada994df Linux Port (GTK): Remove and replace legacy colorspace handling routines with the new SIMD-optimized colorspace handling routines. 2017-11-22 17:43:17 -08:00
rogerman 96bd35517b Cocoa Port: Signal a fetch at startup, after reading the user defaults for GPU Scaling Factor and GPU Color Depth, in order to guarantee that the client-side fetch buffers will be cleared. 2017-11-21 19:05:11 -08:00
rogerman 5890540007 OpenGL: Maintain one more flag to ensure that textures are always initialized. (Fixes #116.) 2017-11-21 11:17:52 -08:00
rogerman 4269925258 GPU: Properly initialize the newer NDSDisplayInfo fields.
- This has the side-effect of having the Windows port’s display window start up with a white screen and HUD showing (if enabled) just like before, rather than a black screen and HUD possibly hidden.
2017-11-20 23:51:42 -08:00
rogerman 25eae6e1ed Cocoa Port (OpenEmu Plug-in): Change the video format to 18-bit RGB666, which matches a hardware NDS. 2017-11-20 21:04:50 -08:00
rogerman a7065311cc Core: Change some default settings to use more compatible and true-to-hardware settings.
- GPU Color Depth (from 24-bit to 18-bit), Advanced SPU Logic (from Disabled to Enabled), SPU Interpolation (from Linear to Cosine), Synchronization Mode (from Dual SPU Sync/Async to Synchronous)
- Just like the previous change to the default JIT block size, let the users themselves disable these settings so that they are more aware that they are sacrificing compatibility for speed.
2017-11-20 16:39:50 -08:00
rogerman 48fee8d590 Windows Port: Don't resize the display window after changing GPU Scaling Factor.
- Also remove TCommonSettings.GFX3D_PrescaleHD. It is a useless setting in core because the internal resolution is not limited to integer-multiplied scaling.
- Also fix spelling on the "Maintain Aspect Ratio" menu option.
2017-11-20 14:37:30 -08:00
rogerman 24108e35e2 Core: Change the default max. JIT block size from 100 to 12, since 12 has been tested to be more compatible and safer to use.
- There is only a negligible performance difference between 100 and 12.
- It is better for users to change the JIT block size from 12 to 100 themselves, since it might make them more aware that they are sacrificing compatibility in favor of speed.
2017-11-20 12:57:24 -08:00