135 Commits

Author SHA1 Message Date
Soren Jorvang 6d93d473ad Shut up recent versions of GCC about the OpenMP pragmas.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@7316 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-03-07 20:23:04 +00:00
Rodolfo Osvaldo Bogado 25231f8007 More fixes for OpenGL using OpenMP; I hope this is the last one :)
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@7300 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-03-05 15:30:38 +00:00
Rodolfo Osvaldo Bogado 5eb99178dd more fixes for my openmp commit
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@7299 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-03-05 15:03:36 +00:00
Soren Jorvang f20731b1c3 Fix OS X build.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@7298 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-03-05 13:23:37 +00:00
Rodolfo Osvaldo Bogado 15ca7b72e6 some fixes for my last commit
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@7289 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-03-05 03:57:09 +00:00
Glenn Rice b8b81001bd Linux build fix.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@7288 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-03-05 02:32:09 +00:00
Rodolfo Osvaldo Bogado c569b33829 First: revert my changes to VertexLoader.cpp. I don't own the games that hit the error, so I am reverting the changes until I can test them myself.
Second:
An experiment: implemented parallelization in texture decoding using OpenMP. It is mostly an experiment to test performance on different OSes and platforms. On my system (Windows x64, AMD 1055T) it gives a speedup on large textures, but on an Intel dual core I tested it gives a slowdown, so I limited its use to large textures and CPUs with more than 3 cores.
Please test and let me know if it improves or degrades the speed.
Linux and OS X users: to test this code you will have to enable OpenMP support in your compiler.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@7284 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-03-04 22:48:54 +00:00
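The gating described in this commit (thread only for large textures on CPUs with more than 3 cores) might look roughly like the sketch below. It is not the committed code: `DecodeI8ToRGBA`, `kMinParallelTexels`, and the threshold value are hypothetical names and numbers chosen for illustration, and the decode itself is a simple intensity expansion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>

// Hypothetical threshold: textures with fewer texels stay single-threaded.
constexpr std::size_t kMinParallelTexels = 256 * 256;

// Expand 8-bit intensity texels to 32-bit values with the intensity in every byte.
// Build with OpenMP enabled (e.g. -fopenmp on GCC) to get threading; without it
// the pragma is compiled out and the loop runs serially.
void DecodeI8ToRGBA(const std::uint8_t* src, std::uint32_t* dst, int width, int height)
{
    assert(src != nullptr && dst != nullptr && width >= 0 && height >= 0);

    const std::size_t texels = static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    // hardware_concurrency() may return 0 when the core count is unknown,
    // in which case this stays on the serial path.
    const bool use_threads = texels >= kMinParallelTexels &&
                             std::thread::hardware_concurrency() > 3;
    (void)use_threads; // silences the unused warning in non-OpenMP builds

#ifdef _OPENMP
#pragma omp parallel for if(use_threads)
#endif
    for (int y = 0; y < height; ++y)
    {
        const std::size_t row = static_cast<std::size_t>(y) * static_cast<std::size_t>(width);
        for (int x = 0; x < width; ++x)
        {
            const std::uint32_t i = src[row + x];
            dst[row + x] = i * 0x01010101u; // replicate the byte into all four channels
        }
    }
}
```

Each row writes a disjoint slice of `dst`, so the rows can be split across threads without locking.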
Rodolfo Osvaldo Bogado eef715b1cf Added the possibility to allocate aligned memory, and use it to allocate the buffer used in texture decoding. This will make it a little easier to use aligned writes where possible in SSE2/SSE3-optimized algorithms.
Some code additions for future use ;).
GCC users, please test this, as I don't have the opportunity to test it myself; I only worked from reference code.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@7247 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-02-25 20:35:05 +00:00
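One portable way to get an aligned buffer is to over-allocate and round the pointer up. The sketch below illustrates that technique only; `AllocateAligned` and `FreeAligned` are hypothetical names, not the helpers added by this commit, which may well wrap a platform call instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns a block of `size` bytes whose address is a
// multiple of `alignment`, or nullptr if the underlying malloc fails.
void* AllocateAligned(std::size_t size, std::size_t alignment)
{
    // The rounding trick below only works for nonzero powers of two.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    // Room for the payload, worst-case padding, and the saved raw pointer.
    void* raw = std::malloc(size + alignment - 1 + sizeof(void*));
    if (raw == nullptr)
        return nullptr;

    const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    const std::uintptr_t aligned =
        (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);

    // Stash the raw pointer just below the aligned block so it can be freed later.
    std::memcpy(reinterpret_cast<void*>(aligned - sizeof(void*)), &raw, sizeof(void*));
    return reinterpret_cast<void*>(aligned);
}

// Hypothetical counterpart: releases a block obtained from AllocateAligned.
void FreeAligned(void* ptr)
{
    if (ptr == nullptr)
        return;
    void* raw = nullptr;
    std::memcpy(&raw, static_cast<unsigned char*>(ptr) - sizeof(void*), sizeof(void*));
    std::free(raw);
}
```

A 16-byte aligned decode buffer would then come from `AllocateAligned(bytes, 16)` and must be released with `FreeAligned`, never with plain `free`.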
Soren Jorvang 9c21d003af Remove the global namespace a bit and remove some dead code.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@7043 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-02-02 18:21:20 +00:00
Soren Jorvang 1bcad428ea Link the video plugin statically into the main binary on OS X.
This makes the OS X build more robust and should help pave the
way for the integration of the video plugins as well as LTO.

There are now no more global class level namespace conflicts left,
as evidenced by the fact that Dolphin can be linked with -all_load,
not that you would want to.


git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6958 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-29 04:52:19 +00:00
xsacha 1a72beead0 TextureDecoder: Some misc clean ups. Backport code to SSE2 version. Remove redundancy in RGBA8 (5% speedup).
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6789 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-09 05:06:53 +00:00
xsacha 5c725262ba Refactor all the SSSE3 functions in TextureDecoder so that the cpu_info check isn't looped over. Speeds up most textures dramatically (where it has previously slowed them).
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6784 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-08 09:08:36 +00:00
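The refactor described here amounts to testing the CPU feature once per decode call rather than once per block. A minimal sketch of that shape follows, assuming a hypothetical `CpuFeatures` record and two placeholder per-texel routines (neither is Dolphin's actual code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a CPU feature record; a real one would be filled from CPUID.
struct CpuFeatures
{
    bool has_fast_path;
};

// Placeholder "optimized" routine: replicate the byte with one multiply.
static std::uint32_t ExpandFast(std::uint8_t i)
{
    return i * 0x01010101u;
}

// Placeholder reference routine: replicate the byte with shifts.
static std::uint32_t ExpandReference(std::uint8_t i)
{
    const std::uint32_t v = i;
    return v | (v << 8) | (v << 16) | (v << 24);
}

void DecodeAll(const CpuFeatures& cpu, const std::uint8_t* src, std::uint32_t* dst, std::size_t count)
{
    assert(src != nullptr && dst != nullptr);

    // The feature test sits outside the loops, so it runs once per call
    // instead of once per texel.
    if (cpu.has_fast_path)
    {
        for (std::size_t n = 0; n < count; ++n)
            dst[n] = ExpandFast(src[n]);
    }
    else
    {
        for (std::size_t n = 0; n < count; ++n)
            dst[n] = ExpandReference(src[n]);
    }
}
```

The cost is one duplicated loop per code path, which is the usual trade for keeping the branch out of the hot loop.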
xsacha 394534814b New SSSE3 implementation for I4 texture decode. 14% speedup over the previous SSE4 implementation (so it was scrapped).
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6783 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-08 08:07:45 +00:00
xsacha 3cf8003a55 From my last commit: Fix build on Linux. Use SSSE3 instead of SSE3.
Remove some unused vars from the SSE2 CMPR.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6781 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-08 04:59:26 +00:00
xsacha f667c03d55 New SSSE3 implementation of RGB5A3. About 40% improvement (fewer cycles) on the plain C version and 17% on the SSE2 version.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6779 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-08 02:52:07 +00:00
xsacha 62b79028ef This needs to be in the right place to work for <sse4. Going to bed now :P.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6776 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-07 17:55:26 +00:00
xsacha 87bd4dd4b9 Probably want to store the result for sse4.
Makes I4 textures appear again for SSE4 codepath.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6775 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-07 17:52:53 +00:00
xsacha 5c1f30060e Fix a missing 'else' in last commit.
Remove more redundancy in CMPR (may make it faster - not tested).

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6774 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-07 17:30:48 +00:00
xsacha 9efa62b0ed Add SSSE3 implementation for RGBA8 texture decode. It is 25% faster (3/4 of the cycles) than the SSE2 version.
Remove a bit of redundancy in CMPR.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6773 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-07 17:17:26 +00:00
xsacha 9cb3340754 An extra 5-10% speedup for I4 texture decoding with SSE4.1 intrinsics.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6772 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-07 16:00:39 +00:00
xsacha dcbfd4ea4c SSSE3 implementation of IA8 texture decode. Roughly 50% faster than SSE2 version on my computer (SSSE3: 77%, SSE2: 57% vs reference C on Core2 Duo). About half as many cycles.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6770 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-07 14:55:05 +00:00
xsacha a6acc99a89 Last commit only requires SSSE3, not SSE4.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6769 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-07 13:54:18 +00:00
xsacha 53474403e2 An SSE4 implementation for I8 texture decode. Slightly faster than SSE2 version on my computer (SSE4: 60%, SSE2: 55% vs reference C on Core2 Duo).
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6768 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-07 13:40:32 +00:00
james.jdunne 1671917472 Missed one MSVC-ism. Should fix build for Linux. Last revision should still work for Windows. No functionality changes this time.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6762 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-06 16:50:09 +00:00
james.jdunne 2841d67ce3 Faster SSE2 optimized GX_TF_CMPR texture decoder which gets ~40% speed improvement on x64 and ~50% improvement on x86 as compared to reference C code.
The code now uses direct pointer access from C code to write the colors to the destination texture instead of trying to force them back up into an __m128i and a single write call. This is what produces the major speed-up.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6761 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-06 16:41:20 +00:00
Soren Jorvang 95b6d3f445 Kill HAVE_OPENCL.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6756 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-06 01:11:32 +00:00
james.jdunne 5ca3adde3c Enabled SSE2 optimization of GX_TF_CMPR decoder only for x86 builds. It can't compete with the x64 optimized reference C code.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6755 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-06 00:32:52 +00:00
james.jdunne 8f9f1b64ff Fixed Linux build.
Fixed small undiscovered bug in WII_IPC_HLE_Device_FileIO.cpp when looking at 0-length strings.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6745 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-05 03:19:53 +00:00
james.jdunne 6eff71b893 Fixed S3TC DXT1 decoder implementation. I see ~40% speed improvements running on x86 Intel hardware and 0% improvements running on x64 AMD hardware. Strange. More investigation to follow!
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6744 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-05 02:48:32 +00:00
james.jdunne dd7d325453 Temporarily reverting to unoptimized until I can figure this out. Apologies for the SNAFU :)
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6741 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-04 22:30:58 +00:00
james.jdunne 99b4f4703d ~40% speed improvement for decoding GX_TF_CMPR (S3TC) textures using SSE2 intrinsics.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6740 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-04 19:54:30 +00:00
james.jdunne b7f7a248c5 Fix for bit reduction regression in GX_TF_RGB565 textures from previous commit.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6726 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-03 01:24:35 +00:00
james.jdunne 4b15325acd GX_TF_RGB565 texture decoder optimized with SSE2 producing a ~78% speed increase over reference C implementation.
Fixed crash in debugger when attempting to enable profiler before having run any game.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6724 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-02 22:53:25 +00:00
james.jdunne 7703018632 Possibly fixed game crash issues by switching to unaligned SSE2 loads/stores.
Removed unnecessary work being done in the file system when logging is disabled.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6714 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-01 20:54:15 +00:00
james.jdunne 60082853ec GX_TF_I4 texture decoder optimized with SSE2 producing a ~76% speed increase over reference C implementation.
GX_TF_RGBA8 texture decoder optimized with SSE2 producing a ~68% speed increase over reference C implementation.
TABified the entire document per NeoBrainX. :)

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6706 8ced0084-cf51-0410-be5f-012b33b47a6e
2011-01-01 03:52:32 +00:00
james.jdunne 134fca9b82 ~68% increase in GX_TF_IA8 decoding speed. Not an oft-used texture format. An example use is the Wii cursor in MKWii in the menus.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6699 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-12-31 10:20:43 +00:00
james.jdunne 343e3f7c75 ~80% speed improvement in decoding GX_TF_I8 textures. Yes, EIGHTY PERCENT. However, for MKWii movie playback I still can't break the fluffin' 48 FPS boundary on my machine! There's something else at play here because this decoder is ridonkulously fast.
~25% speed improvement in decoding GX_TF_RGB5A3 textures which aren't used very much. I thought it would help for movie playback but I misled myself. Video playback has nothing to do with this texture format.
Next I'll see if I can knock out some of these other texture decoders. Byte swizzling I'm sure can somehow be accomplished using _mm_unpacklo_epi8 trickery, so that'd be another big win I hope.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6698 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-12-31 07:23:17 +00:00
james.jdunne b038df64bf TextureDecoder.cpp: new SSE2 optimized GX_TF_I8 decoder. Probably not ultimately optimal SSE2 code, but provably better (on my machine) than the memset version. Tested with __rdtsc counts in an independent project. I get about 6-7 FPS more on average during the intro movie playback in Mario Kart Wii. Hope this compiles for GCC okay.
TextureDecoder.cpp: merged two functionally identical decode5A3RGBA and decode5A3rgba methods.
OpcodeDecoding.cpp and DLCache.cpp: optimization for GX_LOAD_XF_REG. The PSHUFB solution sounds better for SSSE3, but this is a small win for the default case.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6692 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-12-30 19:17:08 +00:00
Shawn Hoffman 444854601c CMPR texel blocks are 8x8 in hardware, even though the source format says 4x4.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6655 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-12-24 11:15:51 +00:00
nodchip d888f13bb9 VideoCommon: An experimental fix for Issue 3493. Changed _mm_load_si128 to _mm_loadu_si128. I could not test the bug because I don't have Sonic Colors.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6402 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-11-14 04:29:20 +00:00
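The difference between the two intrinsics named in this commit is that `_mm_load_si128` requires a 16-byte aligned address, while `_mm_loadu_si128` accepts any address. The sketch below shows the unaligned form; `Copy16Unaligned` and the `SKETCH_HAVE_SSE2` macro are hypothetical, and a `memcpy` fallback is included only so the example builds on targets without SSE2.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#else
#define SKETCH_HAVE_SSE2 0
#endif

// Copy 16 bytes without assuming that either pointer is 16-byte aligned.
void Copy16Unaligned(const std::uint8_t* src, std::uint8_t* dst)
{
    assert(src != nullptr && dst != nullptr);
#if SKETCH_HAVE_SSE2
    // The aligned variants (_mm_load_si128 / _mm_store_si128) may fault when
    // the address is not a multiple of 16; the "u" variants do not.
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), v);
#else
    std::memcpy(dst, src, 16); // portable fallback for non-SSE2 targets
#endif
}
```

Texture data handed over by the emulated game has no alignment guarantee, which is why the unaligned form is the safe default for source reads.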
Rodolfo Osvaldo Bogado 9b0357b5e2 Sometimes to advance you have to take a step back.
Use plain vertex arrays instead of VBOs to render in the OpenGL plugin, as the nature of the data makes VBOs slower. Depending on the implementation, this should bring a good speedup in OpenGL.
On my system OpenGL and D3D9 now differ by 1 to 5 fps depending on the game.
Some cleanup and a little work pointing toward future improvements in the rendering path.
Please test and check for any errors.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6139 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-08-28 15:09:42 +00:00
Soren Jorvang 453f7c67cd Newer versions of GCC's <tmmintrin.h> check for __SSSE3__ (-mssse3).
No matter. We don't actually need it for our purposes.


git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6016 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-07-31 15:26:46 +00:00
Soren Jorvang 824b509d2e Make the SSSE3 VideoCommon code available in GCC builds.
The GCC model for extended instructions like these is that you compile
with -msse3 etc. These affect code generation for whole compilation units,
so the idea is that you have a separate .c file for each instruction set
class and then indirect to the desired one at runtime.

Without e.g. -msse4.1, the GCC built-ins used by <foointrin.h> are not
available. However, in our specific case of compiling with -msse2 and
wanting to use SSSE3 code, enough built-ins are available that we only
need to provide a little hack for pshufb.

Upgrading this to also use SSE4.1 instructions doesn't appear feasible
without a lot of undesirable duplication of GCC built-in functions and
headers, so we'd probably have to move to the GCC model of separate
source files for that.


git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6014 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-07-31 14:40:01 +00:00
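The "separate source file per instruction set, indirect at runtime" model described in this commit can be sketched as a function pointer chosen once. Everything below is illustrative: the file names in the comments, `Swap32_Baseline`, `Swap32_SSSE3`, `CpuHasSSSE3`, and `SelectSwap32` are hypothetical, and both bodies are plain C++ placeholders so the example builds as a single file. The detection uses the GCC/Clang `__builtin_cpu_supports` built-in where available and otherwise falls back to the baseline.

```cpp
#include <cassert>
#include <cstdint>

using Swap32Fn = std::uint32_t (*)(std::uint32_t);

// Would live in e.g. swap_sse2.cpp, built with -msse2 (hypothetical file name).
std::uint32_t Swap32_Baseline(std::uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Would live in e.g. swap_ssse3.cpp, built with -mssse3 (hypothetical file name).
// A real version would use pshufb; this placeholder reuses the scalar code.
std::uint32_t Swap32_SSSE3(std::uint32_t v)
{
    return Swap32_Baseline(v);
}

// Runtime check. On other compilers or architectures it reports "no",
// which selects the baseline implementation.
bool CpuHasSSSE3()
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("ssse3") != 0;
#else
    return false;
#endif
}

// Pick the implementation once; later calls go through the cached pointer.
Swap32Fn SelectSwap32()
{
    static const Swap32Fn chosen = CpuHasSSSE3() ? Swap32_SSSE3 : Swap32_Baseline;
    assert(chosen != nullptr);
    return chosen;
}
```

Callers write `SelectSwap32()(value)` or cache the pointer; only the file built with the wider `-m` flag ever contains the extended instructions, so the rest of the binary stays safe on older CPUs.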
xsacha 21fb4cb96c Add a toggle option for OpenCL in Config (in Advanced Settings). Default is off.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5768 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-06-22 13:17:01 +00:00
Orphis c2e32371f6 Refactor and prepare the OpenCL texture decoder for decoding textures to RGBA format required by DX11.
Fix the decoder codepath when OpenCL is enabled and the DX11 plugin is used.
Added the DX11 plugin to the Dolphin project dependencies.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5764 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-06-22 00:52:17 +00:00
Rodolfo Osvaldo Bogado 4ab0e4b8a0 Fix for RGBA8 decoding that causes problems in NSMBW.
Fix for screen clearing in OpenGL and D3D.

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5749 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-06-19 21:12:09 +00:00
Rodolfo Osvaldo Bogado 8c6ae1f6f4 Add a path to the texture decoder that produces only RGBA textures. This will make texture loading in DX11 a lot easier and give a little performance boost too.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5746 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-06-19 13:31:40 +00:00
Soren Jorvang f2609d1af3 #if 0 work-in-progress code.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5488 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-05-26 20:52:44 +00:00
nitsuja- 22551a0a8a a few minor code fixes.
also added a user file that should simplify running from VS for newcomers

git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5338 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-04-12 02:00:15 +00:00
nodchip 32794fc028 VideoCommon: Fixed the bug where some textures become black in the SSSE3 code.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5309 8ced0084-cf51-0410-be5f-012b33b47a6e
2010-04-10 01:37:51 +00:00